Estimator properties and bias

A question belongs to this type if and only if it asks the student to prove that an estimator is unbiased, to find its bias, or to compare the properties of different estimators.

15 questions · Standard +0.7

OCR MEI S3 2007 January Q1
18 marks Standard +0.3
1 The continuous random variable \(X\) has probability density function $$f ( x ) = k ( 1 - x ) \quad \text { for } 0 \leqslant x \leqslant 1$$ where \(k\) is a constant.
  1. Show that \(k = 2\). Sketch the graph of the probability density function.
  2. Find \(\mathrm { E } ( X )\) and show that \(\operatorname { Var } ( X ) = \frac { 1 } { 18 }\).
  3. Derive the cumulative distribution function of \(X\). Hence find the probability that \(X\) is greater than the mean.
  4. Verify that the median of \(X\) is \(1 - \frac { 1 } { \sqrt { 2 } }\).
  5. \(\bar { X }\) is the mean of a random sample of 100 observations of \(X\). Write down the approximate distribution of \(\bar { X }\).
OCR MEI S3 2007 June Q1
18 marks Standard +0.3
1 A manufacturer of fireworks is investigating the lengths of time for which the fireworks burn. For a particular type of firework this length of time, in minutes, is modelled by the random variable \(T\) with probability density function $$\mathrm { f } ( t ) = k t ^ { 3 } ( 2 - t ) \quad \text { for } 0 < t \leqslant 2$$ where \(k\) is a constant.
  1. Show that \(k = \frac { 5 } { 8 }\).
  2. Find the modal time.
  3. Find \(\mathrm { E } ( T )\) and show that \(\operatorname { Var } ( T ) = \frac { 8 } { 63 }\).
  4. A large random sample of \(n\) fireworks of this type is tested. Write down in terms of \(n\) the approximate distribution of \(\bar { T }\), the sample mean time.
  5. For a random sample of 100 such fireworks the times are summarised as follows. $$\Sigma t = 145.2 \quad \Sigma t ^ { 2 } = 223.41$$ Find a 95\% confidence interval for the mean time for this type of firework and hence comment on the appropriateness of the model.
OCR S4 2011 June Q7
14 marks Challenging +1.2
7 The continuous random variable \(U\) has unknown mean \(\mu\) and known variance \(\sigma ^ { 2 }\). In order to estimate \(\mu\), two random samples, one of 4 observations of \(U\) and the other of 6 observations of \(U\), are taken. The sample means are denoted by \(\bar { U } _ { 4 }\) and \(\bar { U } _ { 6 }\) respectively. One estimator \(S\), given by \(S = \frac { 1 } { 2 } \left( \bar { U } _ { 4 } + \bar { U } _ { 6 } \right)\), is proposed.
  1. Show that \(S\) is unbiased and find \(\operatorname { Var } ( S )\) in terms of \(\sigma ^ { 2 }\).
    A second estimator \(T\) of the form \(a \bar { U } _ { 4 } + b \bar { U } _ { 6 }\) is proposed, where \(a\) and \(b\) are chosen such that \(T\) is an unbiased estimator for \(\mu\) with the smallest possible variance.
  2. Find the values of \(a\) and \(b\) and the corresponding variance of \(T\).
  3. State, giving a reason, which of \(S\) and \(T\) is the better estimator.
  4. Compare the efficiencies of this preferred estimator and the mean of all 10 observations.
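The variance comparison in this question can be checked with exact arithmetic. The following is a minimal sketch, assuming the two samples are independent so that \(\operatorname{Var}(a \bar{U}_4 + b \bar{U}_6) = a^2 \sigma^2 / 4 + b^2 \sigma^2 / 6\); the helper name `var_weighted` and the grid of step 0.01 are illustrative choices, not part of the question.

```python
from fractions import Fraction

def var_weighted(a, b):
    """Var(a*U4bar + b*U6bar) in units of sigma^2, assuming independent samples of 4 and 6."""
    return a * a * Fraction(1, 4) + b * b * Fraction(1, 6)

# S = (U4bar + U6bar) / 2
var_S = var_weighted(Fraction(1, 2), Fraction(1, 2))

# Unbiasedness requires a + b = 1, so scan a over an exact grid (step 0.01 is an assumption).
candidates = [Fraction(i, 100) for i in range(101)]
best_a = min(candidates, key=lambda a: var_weighted(a, 1 - a))
best_b = 1 - best_a
var_T = var_weighted(best_a, best_b)

# The mean of all 10 observations has variance sigma^2 / 10.
var_pooled = Fraction(1, 10)

print(var_S, best_a, best_b, var_T, var_pooled)
```

A grid search only suggests where the minimum lies; the exam answer still needs the calculus or completing-the-square argument.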
OCR MEI S4 2012 June Q1
24 marks Standard +0.3
1 In a certain country, any baby born is equally likely to be a boy or a girl, independently for all births. The birthweight of a baby boy is given by the continuous random variable \(X _ { B }\) with probability density function (pdf) \(\mathrm { f } _ { B } ( x )\) and cumulative distribution function (cdf) \(\mathrm { F } _ { B } ( x )\). The birthweight of a baby girl is given by the continuous random variable \(X _ { G }\) with pdf \(\mathrm { f } _ { G } ( x )\) and cdf \(\mathrm { F } _ { G } ( x )\). The continuous random variable \(X\) denotes the birthweight of a baby selected at random.
  1. By considering $$\mathrm { P } ( X \leqslant x ) = \mathrm { P } ( X \leqslant x \mid \text { boy } ) \mathrm { P } ( \text { boy } ) + \mathrm { P } ( X \leqslant x \mid \text { girl } ) \mathrm { P } ( \text { girl } ) ,$$ find the cdf of \(X\) in terms of \(\mathrm { F } _ { B } ( x )\) and \(\mathrm { F } _ { G } ( x )\), and deduce that the pdf of \(X\) is $$\mathrm { f } ( x ) = \frac { 1 } { 2 } \left\{ \mathrm { f } _ { B } ( x ) + \mathrm { f } _ { G } ( x ) \right\} .$$
  2. The birthweights of baby boys and girls have means \(\mu _ { B }\) and \(\mu _ { G }\) respectively. Deduce that $$\mathrm { E } ( X ) = \frac { 1 } { 2 } \left( \mu _ { B } + \mu _ { G } \right) .$$
  3. The birthweights of baby boys and girls have common variance \(\sigma ^ { 2 }\). Find an expression for \(\mathrm { E } \left( X ^ { 2 } \right)\) in terms of \(\mu _ { B } , \mu _ { G }\) and \(\sigma ^ { 2 }\), and deduce that $$\operatorname { Var } ( X ) = \sigma ^ { 2 } + \frac { 1 } { 4 } \left( \mu _ { B } - \mu _ { G } \right) ^ { 2 } .$$
  4. A random sample of size \(2 n\) is taken from all the babies born in a certain period. The mean birthweight of the babies in this sample is \(\bar { X }\). Write down an approximation to the sampling distribution of \(\bar { X }\) if \(n\) is large.
  5. Suppose instead that a stratified sample of size \(2 n\) is taken by selecting \(n\) baby boys at random and, independently, \(n\) baby girls at random. The mean birthweight of the \(2 n\) babies in this sample is \(\bar { X } _ { s t }\). Write down the expected value of \(\bar { X } _ { s t }\) and find the variance of \(\bar { X } _ { s t }\).
  6. Deduce that both \(\bar { X }\) and \(\bar { X } _ { s t }\) are unbiased estimators of the population mean birthweight. Find which is the more efficient.
OCR S2 2009 June Q7
16 marks Standard +0.3
7 The continuous random variable \(X\) has probability density function given by $$f ( x ) = \begin{cases} \frac { 2 } { 9 } x ( 3 - x ) & 0 \leqslant x \leqslant 3 , \\ 0 & \text { otherwise } . \end{cases}$$
  1. Find the variance of \(X\).
  2. Show that the probability that a single observation of \(X\) lies between 0.0 and 0.5 is \(\frac { 2 } { 27 }\).
  3. 108 observations of \(X\) are obtained. Using a suitable approximation, find the probability that at least 10 of the observations lie between 0.0 and 0.5.
  4. The mean of 108 observations of \(X\) is denoted by \(\bar { X }\). Write down the approximate distribution of \(\bar { X }\), giving the value(s) of any parameter(s).
OCR MEI S3 2011 January Q4
17 marks Standard +0.3
4 A timber supplier cuts wooden fence posts from felled trees. The posts are of length \(( k + X ) \mathrm { cm }\) where \(k\) is a constant and \(X\) is a random variable which has probability density function $$f ( x ) = \begin{cases} 1 + x & - 1 \leqslant x < 0 \\ 1 - x & 0 \leqslant x \leqslant 1 \\ 0 & \text { elsewhere } \end{cases}$$
  1. Sketch \(\mathrm { f } ( x )\).
  2. Write down the value of \(\mathrm { E } ( X )\) and find \(\operatorname { Var } ( X )\).
  3. Write down, in terms of \(k\), the approximate distribution of \(\bar { L }\), the mean length of a random sample of 50 fence posts. Justify your choice of distribution.
  4. In a particular sample of 50 posts, the mean length is 90.06 cm. Find a \(95 \%\) confidence interval for the true mean length of the fence posts.
  5. Explain whether it is reasonable to suppose that \(k = 90\).
Edexcel S3 2014 June Q6
8 marks Standard +0.3
6. A random sample \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\) is taken from a population with mean \(\mu\).
  1. Show that \(\bar { X } = \frac { 1 } { n } \left( X _ { 1 } + X _ { 2 } + \ldots + X _ { n } \right)\) is an unbiased estimator of the population mean \(\mu\).
    A company produces small jars of coffee. Five jars of coffee were taken at random and weighed. The weights, in grams, were as follows: $$\begin{array} { l l l l l } 197 & 203 & 205 & 201 & 195 \end{array}$$
  2. Calculate unbiased estimates of the population mean and variance of the weights of the jars produced by the company.
    It is known from previous results that the weights are normally distributed with standard deviation 4.8 g. The manager is going to take a second random sample. He wishes to ensure that there is at least a \(95 \%\) probability that the estimate of the population mean is within 1.25 g of its true value.
  3. Find the minimum sample size required.
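The numerical parts of this question can be sanity-checked with Python's standard library. This is a sketch only: it assumes the usual \(n - 1\) divisor for the unbiased variance estimate and a two-sided normal critical value for the sample-size condition; the variable names are illustrative.

```python
import math
from statistics import NormalDist, mean, variance

weights = [197, 203, 205, 201, 195]  # grams, from the question

mean_hat = mean(weights)      # unbiased estimate of the population mean
var_hat = variance(weights)   # sample variance with divisor n - 1

sigma = 4.8        # known standard deviation (grams)
margin = 1.25      # required half-width (grams)
confidence = 0.95

# Two-sided critical value z with P(|Z| < z) = confidence.
z = NormalDist().inv_cdf(0.5 + confidence / 2)

# Smallest n with z * sigma / sqrt(n) <= margin.
n_min = math.ceil((z * sigma / margin) ** 2)

print(mean_hat, var_hat, n_min)
```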
Edexcel S4 2007 June Q2
11 marks Standard +0.3
2. The value of orders, in \(\pounds\), made to a firm over the internet has distribution \(\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)\). A random sample of \(n\) orders is taken and \(\bar { X }\) denotes the sample mean.
  1. Write down the mean and variance of \(\bar { X }\) in terms of \(\mu\) and \(\sigma ^ { 2 }\).
    A second sample of \(m\) orders is taken and \(\bar { Y }\) denotes the mean of this sample.
    An estimator of the population mean is given by $$U = \frac { n \bar { X } + m \bar { Y } } { n + m }$$
  2. Show that \(U\) is an unbiased estimator for \(\mu\).
  3. Show that the variance of \(U\) is \(\frac { \sigma ^ { 2 } } { n + m }\).
  4. State which of \(\bar { X }\) or \(U\) is a better estimator for \(\mu\). Give a reason for your answer.
Edexcel S4 2008 June Q1
13 marks Standard +0.3
1. A random sample \(X _ { 1 } , X _ { 2 } , \ldots , X _ { 10 }\) is taken from a population with mean \(\mu\) and variance \(\sigma ^ { 2 }\).
  1. Determine the bias, if any, of each of the following estimators of \(\mu\).
    $$\begin{aligned} & \theta _ { 1 } = \frac { X _ { 3 } + X _ { 4 } + X _ { 5 } } { 3 } \\ & \theta _ { 2 } = \frac { X _ { 10 } - X _ { 1 } } { 3 } \\ & \theta _ { 3 } = \frac { 3 X _ { 1 } + 2 X _ { 2 } + X _ { 10 } } { 6 } \end{aligned}$$
  2. Find the variance of each of these estimators.
  3. State, giving reasons, which of these three estimators for \(\mu\) is
    1. the best estimator,
    2. the worst estimator.
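For linear estimators of the form \(\sum c _ { i } X _ { i }\) built from independent observations, the expectation is \(\mu \sum c _ { i }\) and the variance is \(\sigma ^ { 2 } \sum c _ { i } ^ { 2 }\). The sketch below applies that rule to the three estimators; the function name `coefficient_summary` is hypothetical and the results are expressed as multiples of \(\mu\) and \(\sigma ^ { 2 }\).

```python
from fractions import Fraction

def coefficient_summary(coeffs):
    """Return (bias / mu, variance / sigma^2) for the estimator sum(c_i * X_i).

    Assumes the X_i are independent with common mean mu and variance sigma^2.
    """
    bias_multiple = sum(coeffs) - 1           # E = mu * sum(c_i), so bias = (sum - 1) * mu
    var_multiple = sum(c * c for c in coeffs)  # Var = sigma^2 * sum(c_i^2)
    return bias_multiple, var_multiple

theta1 = [Fraction(1, 3)] * 3                               # (X3 + X4 + X5) / 3
theta2 = [Fraction(1, 3), Fraction(-1, 3)]                  # (X10 - X1) / 3
theta3 = [Fraction(3, 6), Fraction(2, 6), Fraction(1, 6)]   # (3X1 + 2X2 + X10) / 6

for name, coeffs in [("theta1", theta1), ("theta2", theta2), ("theta3", theta3)]:
    print(name, coefficient_summary(coeffs))
```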
Edexcel S4 2011 June Q6
16 marks Challenging +1.2
6. A random sample \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\) is taken from a population where each of the \(X _ { i }\) has a continuous uniform distribution over the interval \([ 0 , \beta ]\). The random variable \(Y = \max \left\{ X _ { 1 } , X _ { 2 } , \ldots , X _ { n } \right\}\). The probability density function of \(Y\) is given by $$f ( y ) = \left\{ \begin{array} { c c } \frac { n } { \beta ^ { n } } y ^ { n - 1 } & 0 \leqslant y \leqslant \beta \\ 0 & \text { otherwise } \end{array} \right.$$
  1. Show that \(\mathrm { E } \left( Y ^ { m } \right) = \frac { n } { n + m } \beta ^ { m }\).
  2. Write down \(\mathrm { E } ( Y )\).
  3. Using your answers to parts 1 and 2, or otherwise, show that $$\operatorname { Var } ( Y ) = \frac { n } { ( n + 1 ) ^ { 2 } ( n + 2 ) } \beta ^ { 2 }$$
  4. State, giving your reasons, whether or not \(Y\) is a consistent estimator of \(\beta\).
    The random variables \(M = 2 \bar { X }\), where \(\bar { X } = \frac { 1 } { n } \left( X _ { 1 } + X _ { 2 } + \ldots + X _ { n } \right)\), and \(S = k Y\), where \(k\) is a constant, are both unbiased estimators of \(\beta\).
  5. Find the value of \(k\) in terms of \(n\).
  6. State, giving your reasons, which of \(M\) and \(S\) is the better estimator of \(\beta\) in this case.
    Five observations of \(X\) are: $$\begin{array} { l l l l l } 8.5 & 6.3 & 5.4 & 9.1 & 7.6 \end{array}$$
  7. Calculate the better estimate of \(\beta\).
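The two candidate estimates can be computed directly from the five observations. The sketch below assumes \(k = \frac { n + 1 } { n }\), which follows from \(\mathrm { E } ( Y ) = \frac { n } { n + 1 } \beta\), and uses \(\operatorname { Var } ( M ) = \frac { \beta ^ { 2 } } { 3 n }\) and \(\operatorname { Var } ( S ) = \frac { \beta ^ { 2 } } { n ( n + 2 ) }\), both derived from the results stated in the question; the function names are illustrative.

```python
from fractions import Fraction

observations = [8.5, 6.3, 5.4, 9.1, 7.6]
n = len(observations)

# M = 2 * Xbar
m_estimate = 2 * sum(observations) / n

# S = k * Y with k = (n + 1) / n (assumed from E(Y) = n / (n + 1) * beta)
k = (n + 1) / n
s_estimate = k * max(observations)

def var_M(size):
    """Var(2 * Xbar) in units of beta^2: 4 * (1 / 12) / size."""
    return Fraction(1, 3 * size)

def var_S(size):
    """Var(k * Y) in units of beta^2: 1 / (size * (size + 2))."""
    return Fraction(1, size * (size + 2))

print(m_estimate, s_estimate, var_M(n), var_S(n))
```

For \(n = 5\) the variance multiples are \(\frac { 1 } { 15 }\) and \(\frac { 1 } { 35 }\), which is the kind of comparison part 6 asks for.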
Edexcel S4 2013 June Q8
12 marks Challenging +1.2
8. A random sample \(W _ { 1 } , W _ { 2 } , \ldots , W _ { n }\) is taken from a distribution with mean \(\mu\) and variance \(\sigma ^ { 2 }\).
  1. Write down \(\mathrm { E } \left( \sum _ { i = 1 } ^ { n } W _ { i } \right)\) and show that \(\mathrm { E } \left( \sum _ { i = 1 } ^ { n } W _ { i } ^ { 2 } \right) = n \left( \sigma ^ { 2 } + \mu ^ { 2 } \right)\).
    An estimator for \(\mu\) is $$\bar { X } = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i }$$
  2. Show that \(\bar { X }\) is a consistent estimator for \(\mu\).
    An estimator of \(\sigma ^ { 2 }\) is $$U = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i } ^ { 2 } - \left( \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i } \right) ^ { 2 }$$
  3. Find the bias of \(U\).
  4. Write down an unbiased estimator of \(\sigma ^ { 2 }\) in the form \(k U\), where \(k\) is in terms of \(n\).
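The bias of \(U\) can be illustrated exactly on a small discrete example. The sketch below is not the general proof: it assumes, purely for illustration, that each \(W _ { i }\) is the score on a fair six-sided die and that \(n = 3\), then enumerates every possible sample to obtain \(\mathrm { E } ( U )\) as an exact fraction.

```python
from fractions import Fraction
from itertools import product

values = range(1, 7)  # fair die scores (an illustrative population, not from the question)
n = 3                 # illustrative sample size

def u_statistic(sample):
    """U = mean of squares minus square of mean, computed exactly."""
    size = len(sample)
    mean_of_squares = Fraction(sum(w * w for w in sample), size)
    square_of_mean = Fraction(sum(sample), size) ** 2
    return mean_of_squares - square_of_mean

# Every ordered sample is equally likely, so E(U) is the plain average over all of them.
samples = list(product(values, repeat=n))
expected_U = sum(u_statistic(s) for s in samples) / len(samples)

mu = Fraction(sum(values), 6)
sigma2 = Fraction(sum(v * v for v in values), 6) - mu ** 2
bias = expected_U - sigma2

print(expected_U, sigma2, bias)
```

In this example the bias equals \(- \sigma ^ { 2 } / n\), and rescaling \(U\) by \(n / ( n - 1 )\) recovers \(\sigma ^ { 2 }\) exactly.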
Edexcel S4 2014 June Q6
19 marks Challenging +1.2
6. Emily is monitoring the level of pollution in a river. Over a period of time she has found that the amount of pollution, \(X\), in a 100 ml sample of river water has a continuous distribution with probability density function \(\mathrm { f } ( x )\) given by $$f ( x ) = \left\{ \begin{array} { c c } \frac { 2 x } { a ^ { 2 } } & 0 \leqslant x \leqslant a \\ 0 & \text { otherwise } \end{array} \right.$$ where \(a\) is a constant. Emily takes a random sample \(X _ { 1 } , X _ { 2 } , X _ { 3 } , \ldots , X _ { n }\) to try to estimate the value of \(a\).
  1. Show that \(\mathrm { E } ( \bar { X } ) = \frac { 2 a } { 3 }\) and \(\operatorname { Var } ( \bar { X } ) = \frac { a ^ { 2 } } { 18 n }\).
    The random variable \(S = p \bar { X }\), where \(p\) is a constant, is an unbiased estimator of \(a\).
  2. Write down the value of \(p\) and find \(\operatorname { Var } ( S )\).
    Felix suggests using the statistic \(M = \max \left\{ X _ { 1 } , X _ { 2 } , X _ { 3 } , \ldots , X _ { n } \right\}\) as an estimator of \(a\).
    He calculates \(\mathrm { E } ( M ) = \frac { 2 n } { 2 n + 1 } a\) and \(\operatorname { Var } ( M ) = \frac { n } { ( n + 1 ) ( 2 n + 1 ) ^ { 2 } } a ^ { 2 }\).
  3. State, giving your reasons, whether or not \(M\) is a consistent estimator of \(a\).
    The random variable \(T = q M\), where \(q\) is a constant, is an unbiased estimator of \(a\).
  4. Write down, in terms of \(n\), the value of \(q\) and find \(\operatorname { Var } ( T )\).
  5. State, giving your reasons, which of \(S\) or \(T\) you would recommend Emily use as an estimator of \(a\).
    Emily took a sample of 5 values of \(X\) and obtained the following:
    $$\begin{array} { l l l l l } 5.3 & 4.3 & 5.7 & 7.8 & 6.9 \end{array}$$
  6. Calculate the estimate of \(a\) using your recommended estimator from part 5.
  7. Find the standard error of your estimate, giving your answer to 2 decimal places.
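The variances of the two unbiased estimators can be compared exactly in units of \(a ^ { 2 }\). The sketch below assumes \(p = \frac { 3 } { 2 }\) and \(q = \frac { 2 n + 1 } { 2 n }\), which follow from the stated expectations, and estimates the standard error by substituting the estimate of \(a\) into \(\sqrt { \operatorname { Var } ( T ) }\); that plug-in step is an assumption about the intended method.

```python
import math
from fractions import Fraction

def var_S(n):
    """Var(S) / a^2 for S = (3/2) * Xbar, using Var(Xbar) = a^2 / (18 n)."""
    return Fraction(9, 4) * Fraction(1, 18 * n)

def var_T(n):
    """Var(T) / a^2 for T = q * M with q = (2n + 1) / (2n)."""
    q = Fraction(2 * n + 1, 2 * n)
    return q * q * Fraction(n, (n + 1) * (2 * n + 1) ** 2)

sample = [5.3, 4.3, 5.7, 7.8, 6.9]
n = len(sample)

# Estimate of a from the scaled sample maximum.
t_estimate = (2 * n + 1) / (2 * n) * max(sample)

# Plug-in standard error: estimate of a times sqrt(Var(T) / a^2).
standard_error = t_estimate * math.sqrt(var_T(n))

print(var_S(n), var_T(n), t_estimate, round(standard_error, 2))
```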
Edexcel S4 2016 June Q6
15 marks Challenging +1.2
6. A random sample of size \(n\) is taken from the random variable \(X\), which has a continuous uniform distribution over the interval \([ 0 , a ]\), \(a > 0\). The sample mean is denoted by \(\bar { X }\).
  1. Show that \(Y = 2 \bar { X }\) is an unbiased estimator of \(a\).
    The maximum value, \(M\), in the sample has probability density function $$f ( m ) = \left\{ \begin{array} { c c } \frac { n m ^ { n - 1 } } { a ^ { n } } & 0 \leqslant m \leqslant a \\ 0 & \text { otherwise } \end{array} \right.$$
  2. Find \(\mathrm { E } ( M )\).
  3. Show that \(\operatorname { Var } ( M ) = \frac { n a ^ { 2 } } { ( n + 2 ) ( n + 1 ) ^ { 2 } }\).
    The estimator \(S\) is defined by \(S = \frac { n + 1 } { n } M\).
  4. Given that \(n > 1\), state which of \(Y\) or \(S\) is the better estimator for \(a\). Give a reason for your answer.
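A short Monte Carlo experiment gives a feel for how \(Y\) and \(S\) behave. This is a rough sketch, not a proof: the values \(a = 1\), \(n = 5\), the number of trials and the seed are arbitrary assumptions, and the function name `simulate` is illustrative.

```python
import random
from statistics import fmean, pvariance

def simulate(a=1.0, n=5, trials=20000, seed=0):
    """Draw repeated uniform samples on [0, a] and record both estimators."""
    rng = random.Random(seed)
    y_vals, s_vals = [], []
    for _ in range(trials):
        sample = [rng.uniform(0, a) for _ in range(n)]
        y_vals.append(2 * fmean(sample))           # Y = 2 * Xbar
        s_vals.append((n + 1) / n * max(sample))   # S = (n + 1) / n * M
    return y_vals, s_vals

y_vals, s_vals = simulate()
mean_y, mean_s = fmean(y_vals), fmean(s_vals)
var_y, var_s = pvariance(y_vals), pvariance(s_vals)

print(mean_y, mean_s, var_y, var_s)
```

Both simulated means should sit close to \(a\), while the simulated variances should be near \(\frac { a ^ { 2 } } { 3 n }\) and \(\frac { a ^ { 2 } } { n ( n + 2 ) }\) respectively.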
Edexcel S4 Q6
18 marks Standard +0.3
6. A statistics student is trying to estimate the probability, \(p\), of rolling a 6 with a particular die. The die is rolled 10 times and the random variable \(X _ { 1 }\) represents the number of sixes obtained. The random variable \(R _ { 1 } = \frac { X _ { 1 } } { 10 }\) is proposed as an estimator of \(p\).
  1. Show that \(R _ { 1 }\) is an unbiased estimator of \(p\).
    The student decided to roll the die again \(n\) times ( \(n > 10\) ) and the random variable \(X _ { 2 }\) represents the number of sixes in these \(n\) rolls. The random variable \(R _ { 2 } = \frac { X _ { 2 } } { n }\) and the random variable \(Y = \frac { 1 } { 2 } \left( R _ { 1 } + R _ { 2 } \right)\).
  2. Show that both \(R _ { 2 }\) and \(Y\) are unbiased estimators of \(p\).
  3. Find \(\operatorname { Var } \left( R _ { 2 } \right)\) and \(\operatorname { Var } ( Y )\).
  4. State, giving a reason, which of the 3 estimators \(R _ { 1 } , R _ { 2 }\) and \(Y\) are consistent estimators of \(p\).
  5. For the case \(n = 20\) state, giving a reason, which of the 3 estimators \(R _ { 1 } , R _ { 2 }\) and \(Y\) you would recommend.
    The student's teacher pointed out that a better estimator could be found based on the random variable \(X _ { 1 } + X _ { 2 }\).
  6. Find a suitable estimator and explain why it is better than \(R _ { 1 } , R _ { 2 }\) and \(Y\).
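The variances in this question are all multiples of \(p ( 1 - p )\), so they can be compared exactly. The sketch below assumes the two sets of rolls are independent and takes \(\frac { X _ { 1 } + X _ { 2 } } { 10 + n }\) as one plausible candidate for the pooled estimator in the final part; the function name `variances` is illustrative.

```python
from fractions import Fraction

def variances(n):
    """Variance of each estimator in units of p(1 - p), assuming independent rolls."""
    r1 = Fraction(1, 10)             # Var(X1 / 10)
    r2 = Fraction(1, n)              # Var(X2 / n)
    y = Fraction(1, 4) * (r1 + r2)   # Var((R1 + R2) / 2)
    pooled = Fraction(1, 10 + n)     # Var((X1 + X2) / (10 + n)), assumed candidate
    return {"R1": r1, "R2": r2, "Y": y, "pooled": pooled}

v = variances(20)
for name, value in v.items():
    print(name, value)
```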
Pre-U Pre-U 9795/2 2018 June Q6
Challenging +1.3
6 In a certain city there are \(N\) taxis. Each taxi displays a different licensing number which is an integer in the range 1 to \(N\). A visitor to the city attempts to estimate the value of \(N\), assuming that the licensing number of each taxi observed is equally likely to be any integer from 1 to \(N\) inclusive.
  1. The visitor observes one randomly chosen licensing number, \(X\). Using standard results for \(\sum _ { r = 1 } ^ { n } r\) and \(\sum _ { r = 1 } ^ { n } r ^ { 2 }\), show that \(\mathrm { E } ( X ) = \frac { 1 } { 2 } ( N + 1 )\) and \(\operatorname { Var } ( X ) = \frac { 1 } { 12 } \left( N ^ { 2 } - 1 \right)\).
    The mean of 40 independent observations of \(X\) is denoted by \(A\).
  2. Find an unbiased estimator \(E _ { 1 }\) of \(N\) based on \(A\), and state the approximate distribution of \(E _ { 1 }\), giving the value(s) of any parameter(s).
    \(B\) is another random variable based on a random sample of 40 independent observations of \(X\). It is given that \(\mathrm { E } ( B ) = \frac { 40 } { 27 } N\) and that \(\operatorname { Var } ( B ) = \alpha N ^ { 2 }\) where \(\alpha\) is a constant.
  3. Find an unbiased estimator \(E _ { 2 }\) of \(N\) based on \(B\), and determine the set of values of \(\alpha\) for which \(\operatorname { Var } \left( E _ { 2 } \right) > \operatorname { Var } \left( E _ { 1 } \right)\) for all values of \(N\).
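The stated mean and variance of a single observation can be confirmed by direct enumeration for particular values of \(N\). The sketch below is a numerical check rather than the requested algebraic proof; the chosen values of \(N\) are arbitrary, and the function `var_E1` assumes the estimator \(E _ { 1 } = 2 A - 1\), which follows from \(\mathrm { E } ( A ) = \frac { 1 } { 2 } ( N + 1 )\).

```python
from fractions import Fraction

def moments(N):
    """Exact mean and variance of a uniform draw from 1..N."""
    values = range(1, N + 1)
    mean = Fraction(sum(values), N)
    var = Fraction(sum(v * v for v in values), N) - mean ** 2
    return mean, var

def var_E1(N, sample_size=40):
    """Var(2A - 1) where A is the mean of sample_size independent observations."""
    _, var_x = moments(N)
    return 4 * var_x / sample_size

# Compare the enumeration with the closed forms for a few arbitrary N.
for N in (1, 5, 40, 1000):
    mean, var = moments(N)
    assert mean == Fraction(N + 1, 2)
    assert var == Fraction(N * N - 1, 12)

print(moments(10), var_E1(10))
```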