5.05b Unbiased estimates: of population mean and variance

259 questions

OCR S2 2010 June Q7
11 marks Standard +0.3
7 A machine is designed to make paper with mean thickness 56.80 micrometres. The thicknesses, \(x\) micrometres, of a random sample of 300 sheets are summarised by $$n = 300 , \quad \Sigma x = 17085.0 , \quad \Sigma x ^ { 2 } = 973847.0 .$$ Test, at the \(10 \%\) significance level, whether the machine is producing paper of the designed thickness.
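As a hedged illustration of the first step in this question, the sketch below computes unbiased estimates of the population mean and variance from the summary statistics, then a z statistic against the designed mean. The function name `unbiased_estimates` is hypothetical, and treating the sample mean as approximately normal (because n = 300 is large) is an assumption.

```python
import math

def unbiased_estimates(n, sum_x, sum_x2):
    """Return (mean, variance) estimates from n, sum of x and sum of x^2."""
    mean = sum_x / n
    # Divide the corrected sum of squares by n - 1, not n, for an unbiased estimate.
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance

mean, variance = unbiased_estimates(300, 17085.0, 973847.0)
# Test statistic for H0: mu = 56.80, assuming the sample mean is approximately normal.
z = (mean - 56.80) / math.sqrt(variance / 300)
print(mean, variance, z)
```

For a two-tailed test at the 10% level, |z| would be compared with the standard normal critical value 1.645.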
OCR S2 2012 June Q5
11 marks Moderate -0.3
5 The acidity \(A\) (measured in pH ) of soil of a particular type has a normal distribution. The pH values of a random sample of 80 soil samples from a certain region can be summarised as $$\Sigma a = 496 , \quad \Sigma a ^ { 2 } = 3126 .$$ Test, at the \(10 \%\) significance level, whether in this region the mean pH of soil is 6.1 .
OCR S4 2009 June Q6
13 marks Challenging +1.8
6 The continuous random variable \(X\) has probability density function given by $$\mathrm { f } ( x ) = \begin{cases} 0 & x < a , \\ \mathrm { e } ^ { - ( x - a ) } & x \geqslant a , \end{cases}$$ where \(a\) is a constant. \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\) are \(n\) independent observations of \(X\), where \(n \geqslant 4\).
  1. Show that \(\mathrm { E } ( X ) = a + 1\).
\(T _ { 1 }\) and \(T _ { 2 }\) are proposed estimators of \(a\), where $$T _ { 1 } = X _ { 1 } + 2 X _ { 2 } - X _ { 3 } - X _ { 4 } - 1 \quad \text { and } \quad T _ { 2 } = \frac { X _ { 1 } + X _ { 2 } } { 4 } + \frac { X _ { 3 } + X _ { 4 } + \ldots + X _ { n } } { 2 ( n - 2 ) } - 1 .$$
  2. Show that \(T _ { 1 }\) and \(T _ { 2 }\) are unbiased estimators of \(a\).
  3. Determine which is the more efficient estimator.
  4. Suggest another unbiased estimator of \(a\) using all of the \(n\) observations.
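The unbiasedness and efficiency claims in this question can be checked numerically. The following is a minimal Monte Carlo sketch, not a proof: the values `a = 2.0` and `n = 6`, the seed, the trial count and the helper name `simulate` are all illustrative assumptions rather than part of the question.

```python
import random
import statistics

def simulate(a, n, trials, seed=0):
    """Draw `trials` samples of size n from f(x) = exp(-(x - a)), x >= a,
    and return the lists of T1 and T2 values."""
    rng = random.Random(seed)
    t1_values, t2_values = [], []
    for _ in range(trials):
        # X = a + Exp(1) has the shifted exponential density in the question.
        x = [a + rng.expovariate(1.0) for _ in range(n)]
        t1_values.append(x[0] + 2 * x[1] - x[2] - x[3] - 1)
        t2_values.append((x[0] + x[1]) / 4 + sum(x[2:]) / (2 * (n - 2)) - 1)
    return t1_values, t2_values

a, n = 2.0, 6  # illustrative values, not from the question
t1, t2 = simulate(a, n, trials=100_000)
print(statistics.fmean(t1), statistics.variance(t1))
print(statistics.fmean(t2), statistics.variance(t2))
```

For comparison, independence together with \(\operatorname { Var } ( X ) = 1\) gives \(\operatorname { Var } ( T _ { 1 } ) = 7\) and \(\operatorname { Var } ( T _ { 2 } ) = \frac { 1 } { 8 } + \frac { 1 } { 4 ( n - 2 ) }\), so the simulated variances should lie close to 7 and 0.1875 when \(n = 6\).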
OCR S4 2010 June Q7
15 marks Challenging +1.2
7 The continuous random variable \(X\) has probability density function given by $$f ( x ) = \begin{cases} \frac { x } { 2 \theta ^ { 2 } } & 0 \leqslant x \leqslant 2 \theta \\ 0 & \text { otherwise } \end{cases}$$ where \(\theta\) is an unknown positive constant.
  1. Find \(\mathrm { E } \left( X ^ { n } \right)\), where \(n \neq - 2\), and hence write down the value of \(\mathrm { E } ( X )\).
  2. Find
    1. \(\operatorname { Var } ( X )\),
    2. \(\operatorname { Var } \left( X ^ { 2 } \right)\).
  3. Find \(\mathrm { E } \left( X _ { 1 } + X _ { 2 } + X _ { 3 } \right)\) and \(\mathrm { E } \left( X _ { 1 } ^ { 2 } + X _ { 2 } ^ { 2 } + X _ { 3 } ^ { 2 } \right)\), where \(X _ { 1 } , X _ { 2 }\) and \(X _ { 3 }\) are independent observations of \(X\). Hence construct unbiased estimators, \(T _ { 1 }\) and \(T _ { 2 }\), of \(\theta\) and \(\operatorname { Var } ( X )\) respectively, which are based on \(X _ { 1 } , X _ { 2 }\) and \(X _ { 3 }\).
  4. Find \(\operatorname { Var } \left( T _ { 2 } \right)\).
OCR S4 2015 June Q8
12 marks Challenging +1.2
8 The independent random variables \(X _ { 1 }\) and \(X _ { 2 }\) have the distributions \(\mathrm { B } \left( n _ { 1 } , \theta \right)\) and \(\mathrm { B } \left( n _ { 2 } , \theta \right)\) respectively. Two possible estimators for \(\theta\) are $$T _ { 1 } = \frac { 1 } { 2 } \left( \frac { X _ { 1 } } { n _ { 1 } } + \frac { X _ { 2 } } { n _ { 2 } } \right) \text { and } T _ { 2 } = \frac { X _ { 1 } + X _ { 2 } } { n _ { 1 } + n _ { 2 } } .$$
  1. Show that \(T _ { 1 }\) and \(T _ { 2 }\) are both unbiased estimators, and calculate their variances.
  2. Find \(\frac { \operatorname { Var } \left( T _ { 1 } \right) } { \operatorname { Var } \left( T _ { 2 } \right) }\). Given that \(n _ { 1 } \neq n _ { 2 }\), use the inequality \(\left( n _ { 1 } - n _ { 2 } \right) ^ { 2 } > 0\) to find which of \(T _ { 1 }\) and \(T _ { 2 }\) is the more efficient estimator.
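A quick exact check of the variance ratio can be made with rational arithmetic. This is a sketch under stated assumptions: the values `n1 = 10`, `n2 = 30` and `theta = 1/4` are illustrative only, and the helper name `variances` is hypothetical.

```python
from fractions import Fraction

def variances(n1, n2, theta):
    """Exact variances of T1 and T2 when X_i ~ B(n_i, theta) independently."""
    # Var(X_i / n_i) = theta * (1 - theta) / n_i for a binomial count.
    spread = theta * (1 - theta)
    var_t1 = Fraction(1, 4) * (spread / n1 + spread / n2)
    var_t2 = spread / (n1 + n2)
    return var_t1, var_t2

# Illustrative values only; the question leaves n1, n2 and theta general.
var_t1, var_t2 = variances(10, 30, Fraction(1, 4))
ratio = var_t1 / var_t2
print(var_t1, var_t2, ratio)
```

With these values the ratio agrees with \(\frac { \left( n _ { 1 } + n _ { 2 } \right) ^ { 2 } } { 4 n _ { 1 } n _ { 2 } }\), which does not depend on \(\theta\).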
OCR S4 2018 June Q7
15 marks Challenging +1.2
7 Two independent observations \(X _ { 1 }\) and \(X _ { 2 }\) are made of a continuous random variable with probability density function $$f ( x ) = \begin{cases} \frac { 1 } { \theta } & 0 \leqslant x \leqslant \theta \\ 0 & \text { otherwise } \end{cases}$$ where \(\theta\) is a parameter whose value is to be estimated.
  1. Find \(\mathrm { E } ( X )\).
  2. Show that \(S _ { 1 } = X _ { 1 } + X _ { 2 }\) is an unbiased estimator of \(\theta\).
\(L\) is the larger of \(X _ { 1 }\) and \(X _ { 2 }\), or their common value if they are equal.
  3. Show that the probability density function of \(L\) is \(\frac { 2 l } { \theta ^ { 2 } }\) for \(0 \leqslant l \leqslant \theta\).
  4. Find \(\mathrm { E } ( L )\).
  5. Find an unbiased estimator \(S _ { 2 }\) of \(\theta\), based on \(L\).
  6. Determine which of the two estimators \(S _ { 1 }\) and \(S _ { 2 }\) is the more efficient.
OCR MEI S4 2009 June Q1
24 marks Challenging +1.2
1 An industrial process produces components. Some of the components contain faults. The number of faults in a component is modelled by the random variable \(X\) with probability function $$\mathrm { P } ( X = x ) = \theta ( 1 - \theta ) ^ { x } \quad \text { for } x = 0,1,2 , \ldots$$ where \(\theta\) is a parameter with \(0 < \theta < 1\). The numbers of faults in different components are independent.
A random sample of \(n\) components is inspected. \(n _ { 0 }\) are found to have no faults, \(n _ { 1 }\) to have one fault and the remainder \(\left( n - n _ { 0 } - n _ { 1 } \right)\) to have two or more faults.
  1. Find \(\mathrm { P } ( X \geqslant 2 )\) and hence show that the likelihood is $$\mathrm { L } ( \theta ) = \theta ^ { n _ { 0 } + n _ { 1 } } ( 1 - \theta ) ^ { 2 n - 2 n _ { 0 } - n _ { 1 } }$$
  2. Find the maximum likelihood estimator \(\hat { \theta }\) of \(\theta\). You are not required to verify that any turning point you locate is a maximum.
  3. Show that \(\mathrm { E } ( X ) = \frac { 1 - \theta } { \theta }\). Deduce that another plausible estimator of \(\theta\) is \(\tilde { \theta } = \frac { 1 } { 1 + \bar { X } }\) where \(\bar { X }\) is the sample mean. What additional information is needed in order to calculate the value of this estimator?
  4. You are given that, in large samples, \(\tilde { \theta }\) may be taken as Normally distributed with mean \(\theta\) and variance \(\theta ^ { 2 } ( 1 - \theta ) / n\). Use this to obtain a \(95 \%\) confidence interval for \(\theta\) for the case when 100 components are inspected and it is found that 92 have no faults, 6 have one fault and the remaining 2 have exactly four faults each.
OCR MEI S4 2011 June Q1
24 marks Standard +0.8
1 The random variable \(X\) has the Normal distribution with mean 0 and variance \(\theta\), so that its probability density function is $$\mathrm { f } ( x ) = \frac { 1 } { \sqrt { 2 \pi \theta } } \mathrm { e } ^ { - x ^ { 2 } / 2 \theta } , \quad - \infty < x < \infty$$ where \(\theta ( \theta > 0 )\) is unknown. A random sample of \(n\) observations from \(X\) is denoted by \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\).
  1. Find \(\hat { \theta }\), the maximum likelihood estimator of \(\theta\).
  2. Show that \(\hat { \theta }\) is an unbiased estimator of \(\theta\).
  3. In large samples, the variance of \(\hat { \theta }\) may be estimated by \(\frac { 2 \hat { \theta } ^ { 2 } } { n }\). Use this and the results of parts (i) and (ii) to find an approximate \(95 \%\) confidence interval for \(\theta\) in the case when \(n = 100\) and \(\Sigma X _ { i } ^ { 2 } = 1000\).
CAIE FP2 2011 June Q6
7 marks Standard +0.8
6 The independent random variables \(X\) and \(Y\) have distributions with the same variance \(\sigma ^ { 2 }\). Random samples of 5 observations of \(X\) and \(n\) observations of \(Y\) are made and the results are summarised by $$\Sigma x = 5.5 , \quad \Sigma x ^ { 2 } = 15.05 , \quad \Sigma y = 8.0 , \quad \Sigma y ^ { 2 } = 36.4$$ Given that the pooled estimate of \(\sigma ^ { 2 }\) is 3 , find the value of \(n\).
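One way to sanity-check an answer to this question is to evaluate the pooled estimate for each candidate sample size. The sketch below assumes the usual pooled formula \(\left( S _ { x x } + S _ { y y } \right) / \left( n _ { x } + n _ { y } - 2 \right)\); the function name `pooled_variance` and the search range of 1 to 100 are assumptions made for illustration.

```python
def pooled_variance(n_x, sum_x, sum_x2, n_y, sum_y, sum_y2):
    """Pooled unbiased estimate of a common variance from two samples."""
    s_xx = sum_x2 - sum_x ** 2 / n_x  # corrected sum of squares for x
    s_yy = sum_y2 - sum_y ** 2 / n_y  # corrected sum of squares for y
    return (s_xx + s_yy) / (n_x + n_y - 2)

# Try each candidate sample size n for Y and keep those giving the stated estimate of 3.
matches = [
    n for n in range(1, 101)
    if abs(pooled_variance(5, 5.5, 15.05, n, 8.0, 36.4) - 3) < 1e-9
]
print(matches)
```

The same function could be reused for other pooled-estimate questions by changing the summary statistics passed in; an algebraic solution would instead set the pooled expression equal to 3 and solve the resulting quadratic in \(n\).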
CAIE FP2 2015 June Q6
4 marks Standard +0.8
6 The independent random variables \(X\) and \(Y\) have distributions with the same variance \(\sigma ^ { 2 }\). Random samples of \(N\) observations of \(X\) and 10 observations of \(Y\) are taken, and the results are summarised by $$\Sigma x = 5 , \quad \Sigma x ^ { 2 } = 11 , \quad \Sigma y = 10 , \quad \Sigma y ^ { 2 } = 160 .$$ These data give a pooled estimate of 12 for \(\sigma ^ { 2 }\). Find \(N\).
CAIE FP2 2015 June Q8
12 marks Standard +0.8
8 A large number of long jumpers are competing in a national long jump competition. The distances, in metres, jumped by a random sample of 7 competitors are as follows. $$\begin{array} { l l l l l l l } 6.25 & 7.01 & 5.74 & 6.89 & 7.24 & 5.64 & 6.52 \end{array}$$ Assuming that distances are normally distributed, test, at the \(5 \%\) significance level, whether the mean distance jumped by long jumpers in this competition is greater than 6.2 metres. The distances jumped by another random sample of 8 long jumpers in this competition are recorded. Using the data from this sample of 8 long jumpers, a \(95 \%\) confidence interval for the population mean, \(\mu\) metres, is calculated as \(5.89 < \mu < 6.75\). Find the unbiased estimates for the population mean and population variance used in this calculation.
CAIE FP2 2015 June Q10 OR
Challenging +1.3
The times taken, in hours, by cyclists from two different clubs, \(A\) and \(B\), to complete a 50 km time trial are being compared. The times taken by a cyclist from club \(A\) and by a cyclist from club \(B\) are denoted by \(t _ { A }\) and \(t _ { B }\) respectively. A random sample of 50 cyclists from \(A\) and a random sample of 60 cyclists from \(B\) give the following summarised data. $$\Sigma t _ { A } = 102.0 \quad \Sigma t _ { A } ^ { 2 } = 215.18 \quad \Sigma t _ { B } = 129.0 \quad \Sigma t _ { B } ^ { 2 } = 282.3$$ Using a 5\% significance level, test whether, on average, cyclists from club \(A\) take less time to complete the time trial than cyclists from club \(B\). A test at the \(\alpha \%\) significance level shows that there is evidence that the population mean time for cyclists from club \(B\) exceeds the population mean time for cyclists from club \(A\) by more than 0.05 hours. Find the set of possible values of \(\alpha\).
CAIE FP2 2016 June Q7
8 marks Standard +0.3
7 A random sample of 9 observations of a normal variable \(X\) is taken. The results are summarised as follows. $$\Sigma x = 24.6 \quad \Sigma x ^ { 2 } = 68.5$$ Test, at the \(5 \%\) significance level, whether the population mean is greater than 2.5.
CAIE FP2 2016 June Q11 OR
Challenging +1.8
Petra is studying a particular species of bird. She takes a random sample of 12 birds from nature reserve \(A\) and measures the wing span, \(x \mathrm {~cm}\), for each bird. She then calculates a \(95 \%\) confidence interval for the population mean wing span, \(\mu \mathrm { cm }\), for birds of this species, assuming that wing spans are normally distributed. Later, she is not able to find the summary of the results for the sample, but she knows that the \(95 \%\) confidence interval is \(25.17 \leqslant \mu \leqslant 26.83\). Find the values of \(\sum x\) and \(\sum x ^ { 2 }\) for this sample. Petra also measures the wing spans of a random sample of 7 birds from nature reserve \(B\). Their wing spans, \(y \mathrm {~cm}\), are as follows. $$\begin{array} { l l l l l l l } 23.2 & 22.4 & 27.6 & 25.3 & 28.4 & 26.5 & 23.6 \end{array}$$ She believes that the mean wing span of birds found in nature reserve \(A\) is greater than the mean wing span of birds found in nature reserve \(B\). Assuming that this second sample also comes from a normal distribution, with variance the same as the first distribution, test, at the \(10 \%\) significance level, whether there is evidence to support Petra's belief.
CAIE FP2 2008 November Q6
3 marks Standard +0.8
6 The independent random variables \(X\) and \(Y\) have normal distributions with the same variance \(\sigma ^ { 2 }\). Samples of 5 observations of \(X\) and 10 observations of \(Y\) are made, and the results are summarised by \(\Sigma x = 15 , \Sigma x ^ { 2 } = 128 , \Sigma y = 36\) and \(\Sigma y ^ { 2 } = 980\). Find a pooled estimate of \(\sigma ^ { 2 }\).
CAIE FP2 2013 November Q7
7 marks Standard +0.8
7 Two independent random variables \(X\) and \(Y\) have distributions with the same variance \(\sigma ^ { 2 }\). Random samples of \(n\) observations of \(X\) and \(2 n\) observations of \(Y\) are taken and the results are summarised by $$\Sigma x = 10.0 , \quad \Sigma x ^ { 2 } = 25.0 , \quad \Sigma y = 15.0 , \quad \Sigma y ^ { 2 } = 43.5 .$$ Given that the pooled estimate of \(\sigma ^ { 2 }\) is 2 , find the value of \(n\).
CAIE FP2 2014 November Q11 EITHER
Challenging +1.8
\includegraphics[max width=\textwidth, alt={}]{2c6b6722-ebba-4ade-9a9d-dd70e61cf52b-5_595_522_477_810}
A uniform plane object consists of three identical circular rings, \(X , Y\) and \(Z\), enclosed in a larger circular ring \(W\). Each of the inner rings has mass \(m\) and radius \(r\). The outer ring has mass \(3 m\) and radius \(R\). The centres of the inner rings lie at the vertices of an equilateral triangle of side \(2 r\). The outer ring touches each of the inner rings and the rings are rigidly joined together. The fixed axis \(A B\) is the diameter of \(W\) that passes through the centre of \(X\) and the point of contact of \(Y\) and \(Z\) (see diagram). It is given that \(R = \left( 1 + \frac { 2 } { 3 } \sqrt { 3 } \right) r\).
  1. Show that the moment of inertia of the object about \(A B\) is \(( 7 + 2 \sqrt { 3 } ) m r ^ { 2 }\).
The line \(C D\) is the diameter of \(W\) that is perpendicular to \(A B\). A particle of mass \(9 m\) is attached to \(D\). The object is now held with its plane horizontal. It is released from rest and rotates freely about the fixed horizontal axis \(A B\).
  2. Find, in terms of \(g\) and \(r\), the angular speed of the object when it has rotated through \(60 ^ { \circ }\).
CAIE FP2 2014 November Q11 OR
Standard +0.8
Fish of a certain species live in two separate lakes, \(A\) and \(B\). A zoologist claims that the mean length of fish in \(A\) is greater than the mean length of fish in \(B\). To test his claim, he catches a random sample of 8 fish from \(A\) and a random sample of 6 fish from \(B\). The lengths of the 8 fish from \(A\), in appropriate units, are as follows. $$\begin{array} { l l l l l l l l } 15.3 & 12.0 & 15.1 & 11.2 & 14.4 & 13.8 & 12.4 & 11.8 \end{array}$$ Assuming a normal distribution, find a \(95 \%\) confidence interval for the mean length of fish in \(A\). The lengths of the 6 fish from \(B\), in the same units, are as follows. $$\begin{array} { l l l l l l } 15.0 & 10.7 & 13.6 & 12.4 & 11.6 & 12.6 \end{array}$$ Stating any assumptions that you make, test at the \(5 \%\) significance level whether the mean length of fish in \(A\) is greater than the mean length of fish in \(B\). Calculate a 95\% confidence interval for the difference in the mean lengths of fish from \(A\) and from \(B\).
CAIE FP2 2016 November Q6
7 marks Standard +0.3
6 A random sample of 8 observations of a normal random variable \(X\) has mean \(\bar { x }\), where $$\bar { x } = 6.246 \quad \text { and } \quad \Sigma ( x - \bar { x } ) ^ { 2 } = 0.784$$ Test, at the \(5 \%\) significance level, whether the population mean of \(X\) is less than 6.44.
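Because this question gives \(\Sigma ( x - \bar { x } ) ^ { 2 }\) directly, the unbiased variance estimate needs only a division by \(n - 1\). The sketch below shows that step and the resulting one-sample t statistic; the variable names are illustrative, and it assumes the standard t test for a normal population with unknown variance.

```python
import math

n = 8
x_bar = 6.246
sum_sq_dev = 0.784  # sum of (x - x_bar)^2, as given in the question

# Unbiased variance estimate: divide the sum of squared deviations by n - 1.
s2 = sum_sq_dev / (n - 1)
# One-sample t statistic for H0: mu = 6.44, with n - 1 = 7 degrees of freedom.
t = (x_bar - 6.44) / math.sqrt(s2 / n)
print(s2, t)
```

The statistic would then be compared with the lower-tail 5% critical value of the t distribution with 7 degrees of freedom, taken from tables.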
Edexcel Paper 3 2023 June Q2
9 marks Moderate -0.8
2. A machine fills packets with sweets and \(\frac { 1 } { 7 }\) of the packets also contain a prize.
The packets of sweets are placed in boxes before being delivered to shops. There are 40 packets of sweets in each box. The random variable \(T\) represents the number of packets of sweets that contain a prize in each box.
  1. State a condition needed for \(T\) to be modelled by \(\mathrm { B } \left( 40 , \frac { 1 } { 7 } \right)\).
A box is selected at random.
  2. Using \(T \sim \mathrm {~B} \left( 40 , \frac { 1 } { 7 } \right)\) find
    1. the probability that the box has exactly 6 packets containing a prize,
    2. the probability that the box has fewer than 3 packets containing a prize.
Kamil's sweet shop buys 5 boxes of these sweets.
  3. Find the probability that exactly 2 of these 5 boxes have fewer than 3 packets containing a prize.
Kamil claims that the proportion of packets containing a prize is less than \(\frac { 1 } { 7 }\). A random sample of 110 packets is taken and 9 packets contain a prize.
  4. Use a suitable test to assess Kamil's claim. You should
Edexcel Paper 3 2024 June Q4
6 marks Standard +0.3
4. The proportion of left-handed adults in a country is \(10 \%\).
Freya believes that the proportion of left-handed adults under the age of 25 in this country is different from 10\%. She takes a random sample of 40 adults under the age of 25 from this country to investigate her belief.
  1. Find the critical region for a suitable test to assess Freya's belief. You should
    • state your hypotheses clearly
    • use a \(5 \%\) level of significance
    • state the probability of rejection in each tail
  2. Write down the actual significance level of your test in part (a).
In Freya's sample 7 adults were left-handed.
  3. With reference to your answer in part (a) comment on Freya's belief.
AQA Further AS Paper 2 Statistics Specimen Q8
9 marks Standard +0.3
8 In a small town, the number of properties sold during a week in spring by a local estate agent, Keith, can be regarded as occurring independently and with constant mean \(\mu\). Data from several years have shown the value of \(\mu\) to be 3.5 . A new housing development was built on the outskirts of the town and the properties on this development were offered for sale by the builder of the development, not by the local estate agents. During the first four weeks in spring, when properties on the new development were offered for sale by the builder, Keith sold a total of 8 properties. Keith claims that the sale of new properties by the builder reduced his mean number of properties sold during a week in spring.
  1. Investigate Keith's claim, using the \(5 \%\) level of significance.
    [6 marks]
  2. For your test carried out in part (a) state, in context, the meaning of a Type II error.
    [1 mark]
  3. State one advantage and one disadvantage of using a 1\% significance level rather than a 5\% level of significance in a hypothesis test.
    [2 marks]
OCR Further Statistics 2021 November Q4
9 marks Standard +0.3
4 A random sample of 160 observations of a random variable \(X\) is selected. The sample can be summarised as follows. \(n = 160 \quad \sum x = 2688 \quad \sum x ^ { 2 } = 48398\)
  1. Calculate unbiased estimates of the following.
    1. \(\mathrm { E } ( X )\)
    2. \(\operatorname { Var } ( X )\)
  2. Find a 99\% confidence interval for \(\mathrm { E } ( X )\), giving the end-points of the interval correct to 4 significant figures.
  3. Explain whether it was necessary to use the Central Limit Theorem in answering
    1. part (a),
    2. part (b).
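The calculations in parts (a) and (b) can be sketched as follows. This is a minimal illustration rather than a mark-scheme solution: the variable names are hypothetical, and the interval assumes the sample mean is approximately normal with the variance replaced by its unbiased estimate.

```python
import math
from statistics import NormalDist

n, sum_x, sum_x2 = 160, 2688, 48398

mean = sum_x / n
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # unbiased: divisor n - 1

# Two-sided 99% interval, assuming the sample mean is approximately normal (n is large).
z = NormalDist().inv_cdf(0.995)
half_width = z * math.sqrt(variance / n)
lower, upper = mean - half_width, mean + half_width
print(mean, variance, lower, upper)
```

Note that the point estimates use only arithmetic on the summary statistics, whereas the interval relies on a distributional assumption for the sample mean, which is the distinction part (c) asks about.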
Edexcel S2 2015 January Q3
11 marks Moderate -0.8
3. Explain what you understand by
  1. a statistic,
  2. a sampling distribution.
A factory stores screws in packets. A small packet contains 100 screws and a large packet contains 200 screws. The factory keeps small and large packets in the ratio 4:3 respectively.
  3. Find the mean and the variance of the number of screws in the packets stored at the factory.
A random sample of 3 packets is taken from the factory and \(Y _ { 1 } , Y _ { 2 }\) and \(Y _ { 3 }\) denote the number of screws in each of these packets.
  4. List all the possible samples.
  5. Find the sampling distribution of \(\bar { Y }\).
Edexcel S2 2015 June Q2
15 marks Standard +0.3
2. A company produces chocolate chip biscuits. The number of chocolate chips per biscuit has a Poisson distribution with mean 8.
  1. Find the probability that one of these biscuits, selected at random, does not contain 8 chocolate chips.
A small packet contains 4 of these biscuits, selected at random.
  2. Find the probability that each biscuit in the packet contains at least 8 chocolate chips.
A large packet contains 9 of these biscuits, selected at random.
  3. Use a suitable approximation to find the probability that there are more than 75 chocolate chips in the packet.
A shop sells packets of biscuits, randomly, at a rate of 1.5 packets per hour. Following an advertising campaign, 11 packets are sold in 4 hours.
  4. Test, at the \(5 \%\) level of significance, whether or not there is evidence that the rate of sales of packets of biscuits has increased. State your hypotheses clearly.