OCR MEI S4 (Statistics 4) 2012 June

Question 1
1 In a certain country, any baby born is equally likely to be a boy or a girl, independently for all births. The birthweight of a baby boy is given by the continuous random variable \(X _ { B }\) with probability density function (pdf) \(\mathrm { f } _ { B } ( x )\) and cumulative distribution function (cdf) \(\mathrm { F } _ { B } ( x )\). The birthweight of a baby girl is given by the continuous random variable \(X _ { G }\) with pdf \(\mathrm { f } _ { G } ( x )\) and cdf \(\mathrm { F } _ { G } ( x )\). The continuous random variable \(X\) denotes the birthweight of a baby selected at random.
  1. By considering $$\mathrm { P } ( X \leqslant x ) = \mathrm { P } ( X \leqslant x \mid \text { boy } ) \mathrm { P } ( \text { boy } ) + \mathrm { P } ( X \leqslant x \mid \text { girl } ) \mathrm { P } ( \text { girl } ) ,$$ find the cdf of \(X\) in terms of \(\mathrm { F } _ { B } ( x )\) and \(\mathrm { F } _ { G } ( x )\), and deduce that the pdf of \(X\) is $$\mathrm { f } ( x ) = \frac { 1 } { 2 } \left\{ \mathrm { f } _ { B } ( x ) + \mathrm { f } _ { G } ( x ) \right\} .$$
  2. The birthweights of baby boys and girls have means \(\mu _ { B }\) and \(\mu _ { G }\) respectively. Deduce that $$\mathrm { E } ( X ) = \frac { 1 } { 2 } \left( \mu _ { B } + \mu _ { G } \right) .$$
  3. The birthweights of baby boys and girls have common variance \(\sigma ^ { 2 }\). Find an expression for \(\mathrm { E } \left( X ^ { 2 } \right)\) in terms of \(\mu _ { B } , \mu _ { G }\) and \(\sigma ^ { 2 }\), and deduce that $$\operatorname { Var } ( X ) = \sigma ^ { 2 } + \frac { 1 } { 4 } \left( \mu _ { B } - \mu _ { G } \right) ^ { 2 } .$$
  4. A random sample of size \(2 n\) is taken from all the babies born in a certain period. The mean birthweight of the babies in this sample is \(\bar { X }\). Write down an approximation to the sampling distribution of \(\bar { X }\) if \(n\) is large.
  5. Suppose instead that a stratified sample of size \(2 n\) is taken by selecting \(n\) baby boys at random and, independently, \(n\) baby girls at random. The mean birthweight of the \(2 n\) babies in this sample is \(\bar { X } _ { s t }\). Write down the expected value of \(\bar { X } _ { s t }\) and find the variance of \(\bar { X } _ { s t }\).
  6. Deduce that both \(\bar { X }\) and \(\bar { X } _ { s t }\) are unbiased estimators of the population mean birthweight. Find which is the more efficient.
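
The variance results in parts 3 to 6 can be checked with exact arithmetic. The sketch below is not part of the paper: the values chosen for \(\mu_B\), \(\mu_G\), \(\sigma^2\) and \(n\) are hypothetical illustrations, and the function names are made up for this check. It builds \(\mathrm{E}(X^2)\) from the component moments and compares the variances of the two sample means.

```python
from fractions import Fraction


def mixture_moments(mu_b, mu_g, sigma2):
    """Mean and variance of an equal-weight mixture of two distributions
    with means mu_b, mu_g and common variance sigma2."""
    mean = (mu_b + mu_g) / 2
    # E(X^2) = 1/2 (sigma^2 + mu_B^2) + 1/2 (sigma^2 + mu_G^2)
    second_moment = ((sigma2 + mu_b**2) + (sigma2 + mu_g**2)) / 2
    return mean, second_moment - mean**2


def sampling_variances(mu_b, mu_g, sigma2, n):
    """Variances of the simple random sample mean (size 2n) and of the
    stratified sample mean (n boys and n girls)."""
    _, var_x = mixture_moments(mu_b, mu_g, sigma2)
    var_srs = var_x / (2 * n)
    # Stratified mean is (boys' mean + girls' mean) / 2, with independent strata.
    var_strat = (sigma2 / n + sigma2 / n) / 4
    return var_srs, var_strat


# Hypothetical values, chosen only for illustration.
mu_b, mu_g, sigma2, n = Fraction(7, 2), Fraction(17, 5), Fraction(1, 4), 50

mean_x, var_x = mixture_moments(mu_b, mu_g, sigma2)
var_srs, var_strat = sampling_variances(mu_b, mu_g, sigma2, n)

print(mean_x, var_x)       # mixture mean and variance
print(var_srs, var_strat)  # the stratified variance is never the larger one
```

With these inputs the two variances differ by \((\mu_B - \mu_G)^2 / (8n)\), so the two estimators are equally efficient only when \(\mu_B = \mu_G\).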
Question 2
2 The random variable \(X ( X = 1,2,3,4,5,6 )\) denotes the score when a fair six-sided die is rolled.
  1. Write down the mean of \(X\) and show that \(\operatorname { Var } ( X ) = \frac { 35 } { 12 }\).
  2. Show that \(\mathrm { G } ( t )\), the probability generating function (pgf) of \(X\), is given by $$\mathrm { G } ( t ) = \frac { t \left( 1 - t ^ { 6 } \right) } { 6 ( 1 - t ) }$$

The random variable \(N ( N = 0,1,2 , \ldots )\) denotes the number of heads obtained when an unbiased coin is tossed repeatedly until a tail is first obtained.

  3. Show that \(\mathrm { P } ( N = r ) = \left( \frac { 1 } { 2 } \right) ^ { r + 1 }\) for \(r = 0,1,2 , \ldots\).
  4. Hence show that \(\mathrm { H } ( t )\), the pgf of \(N\), is given by \(\mathrm { H } ( t ) = ( 2 - t ) ^ { - 1 }\).
  5. Use \(\mathrm { H } ( t )\) to find the mean and variance of \(N\).

A game consists of tossing an unbiased coin repeatedly until a tail is first obtained and, each time a head is obtained in this sequence of tosses, rolling a fair six-sided die. The die is not rolled on the first occasion that a tail is obtained and the game ends at that point. The random variable \(Q ( Q = 0,1,2 , \ldots )\) denotes the total score on all the rolls of the die. Thus, in the notation above, \(Q = X _ { 1 } + X _ { 2 } + \ldots + X _ { N }\) where the \(X _ { i }\) are independent random variables each distributed as \(X\), with \(Q = 0\) if \(N = 0\). The pgf of \(Q\) is denoted by \(\mathrm { K } ( t )\). The familiar result that the pgf of a sum of independent random variables is the product of their pgfs does not apply to \(\mathrm { K } ( t )\) because \(N\) is a random variable and not a fixed number; you should instead use without proof the result that \(\mathrm { K } ( t ) = \mathrm { H } ( \mathrm { G } ( t ) )\).

  6. Show that \(\mathrm { K } ( t ) = 6 \left( 12 - t - t ^ { 2 } - \ldots - t ^ { 6 } \right) ^ { - 1 }\).
    [Hint. \(\left( 1 - t ^ { 6 } \right) = ( 1 - t ) \left( 1 + t + t ^ { 2 } + \ldots + t ^ { 5 } \right)\).]
  7. Use \(\mathrm { K } ( t )\) to find the mean and variance of \(Q\).
  8. Using your results from parts 1, 5 and 7, verify the result that (in the usual notation for means and variances) $$\sigma _ { Q } { } ^ { 2 } = \sigma _ { N } { } ^ { 2 } \mu _ { X } { } ^ { 2 } + \mu _ { N } \sigma _ { X } { } ^ { 2 } .$$
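
The identity in the last part can be checked exactly from the derivatives of the pgfs at \(t = 1\). The sketch below is not part of the paper. It assumes the standard facts that a pgf \(\mathrm{G}\) gives mean \(\mathrm{G}'(1)\) and variance \(\mathrm{G}''(1) + \mathrm{G}'(1) - \mathrm{G}'(1)^2\), and it applies the chain rule to \(\mathrm{K}(t) = \mathrm{H}(\mathrm{G}(t))\), using \(\mathrm{G}(1) = 1\). The helper name `mean_var` is made up for this check.

```python
from fractions import Fraction


def mean_var(d1, d2):
    """Mean and variance from a pgf's first and second derivatives at t = 1."""
    return d1, d2 + d1 - d1**2


# Die score X: G'(1) = E(X) and G''(1) = E(X(X - 1)), summed over the six faces.
p = Fraction(1, 6)
g1 = sum(k * p for k in range(1, 7))
g2 = sum(k * (k - 1) * p for k in range(1, 7))

# Number of heads N: H(t) = (2 - t)^-1, so H'(t) = (2 - t)^-2 and
# H''(t) = 2 (2 - t)^-3, giving H'(1) = 1 and H''(1) = 2.
h1, h2 = Fraction(1), Fraction(2)

# Total score Q: K = H(G(t)), so by the chain rule at t = 1 (where G(1) = 1)
# K'(1) = H'(1) G'(1) and K''(1) = H''(1) G'(1)^2 + H'(1) G''(1).
k1 = h1 * g1
k2 = h2 * g1**2 + h1 * g2

mu_x, var_x = mean_var(g1, g2)
mu_n, var_n = mean_var(h1, h2)
mu_q, var_q = mean_var(k1, k2)

print(mu_x, var_x)
print(mu_n, var_n)
print(mu_q, var_q)
print(var_q == var_n * mu_x**2 + mu_n * var_x)
```

Working in `Fraction` keeps every value exact, so the final comparison tests the identity itself and is not affected by rounding.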
Question 3
3 At an agricultural research station, trials are being made of two fertilisers, A and B, to see whether they differ in their effects on the yield of a crop. Preliminary investigations have established that the underlying variances of the distributions of yields using the two fertilisers may be assumed equal. Scientific analysis of the fertilisers has suggested that fertiliser A may be inferior in that it leads, on the whole, to lower yield. A statistical analysis is being carried out to investigate this. The crop is grown in carefully controlled conditions in 14 experimental plots, 6 with fertiliser A and 8 with fertiliser B. The yields, in kg per plot, are as follows, arranged in ascending order for each fertiliser.

| Fertiliser | Yields (kg per plot) |
|---|---|
| A | 9.8, 10.2, 10.9, 11.5, 12.7, 13.3 |
| B | 10.8, 11.9, 12.0, 12.2, 12.9, 13.5, 13.6, 13.7 |

  1. Carry out a Wilcoxon rank sum test at the \(5 \%\) significance level to examine appropriate hypotheses.
  2. Carry out a \(t\) test at the \(5 \%\) significance level to examine appropriate hypotheses.
  3. Goodness of fit tests based on more extensive data sets from other trials with these fertilisers have failed to reject hypotheses of underlying Normal distributions. Discuss the relative merits of the analyses in parts 1 and 2.
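
The two test statistics can be computed from the yields with the Python standard library. The sketch below is not part of the paper. It assumes the usual definitions: the Wilcoxon rank sum is the sum of the pooled ranks of the fertiliser A yields (midranks if values tie), and the \(t\) statistic uses the pooled variance estimate on \(6 + 8 - 2 = 12\) degrees of freedom. The function names are made up for this check, and critical values still have to be read from tables.

```python
from math import sqrt
from statistics import mean, variance

yields_a = [9.8, 10.2, 10.9, 11.5, 12.7, 13.3]
yields_b = [10.8, 11.9, 12.0, 12.2, 12.9, 13.5, 13.6, 13.7]


def rank_sum(sample, other):
    """Sum of the ranks of `sample` within the pooled data (midranks for ties)."""
    pooled = sorted(sample + other)

    def midrank(value):
        first = pooled.index(value) + 1  # rank of the first occurrence
        count = pooled.count(value)      # number of tied values
        return first + (count - 1) / 2

    return sum(midrank(v) for v in sample)


def pooled_t(a, b):
    """Two-sample t statistic under the equal-variance assumption."""
    m, n = len(a), len(b)
    # statistics.variance is the sample variance (divisor len - 1).
    s2 = ((m - 1) * variance(a) + (n - 1) * variance(b)) / (m + n - 2)
    return (mean(a) - mean(b)) / sqrt(s2 * (1 / m + 1 / n))


w_a = rank_sum(yields_a, yields_b)
t_stat = pooled_t(yields_a, yields_b)

print(w_a)     # rank sum for fertiliser A
print(t_stat)  # negative when A has the lower sample mean
```

Both statistics are one-sided here because the alternative hypothesis is that fertiliser A leads to lower yield.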