OCR MEI S4 (Statistics 4) 2015 June

Question 1
1 The random variable \(X\) has the following probability density function, in which \(a\) is a (positive) parameter. $$\mathrm { f } ( x ) = \frac { 2 } { a } x \mathrm { e } ^ { - x ^ { 2 } / a } , \quad x \geqslant 0 .$$
  1. Verify that \(\int _ { 0 } ^ { \infty } \mathrm { f } ( x ) \mathrm { d } x = 1\).
  2. Show that \(\mathrm { E } \left( X ^ { 2 } \right) = a\) and \(\mathrm { E } \left( X ^ { 4 } \right) = 2 a ^ { 2 }\).
The parameter \(a\) is to be estimated by maximum likelihood based on an independent random sample from the distribution, \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\).
  3. Show that the logarithm of the likelihood function is $$n \ln 2 - n \ln a + \sum _ { i = 1 } ^ { n } \ln X _ { i } - \frac { 1 } { a } \sum _ { i = 1 } ^ { n } X _ { i } ^ { 2 }$$ Hence obtain the maximum likelihood estimator, \(\hat { a }\), for \(a\).
    [You are not required to verify that any turning point you find is a maximum.]
  4. Using the results from part (ii), show that \(\hat { a }\) is unbiased for \(a\) and find the variance of \(\hat { a }\).
  5. In a particular random sample from this distribution, \(n = 100\) and \(\sum x _ { i } ^ { 2 } = 147.1\). Obtain an approximate 95\% confidence interval for \(a\). (You may assume that the Central Limit Theorem holds in this case.)
Option 2: Generating Functions
Question 2
2 The random variable \(Z\) has the standard Normal distribution. The random variable \(Y\) is defined by \(Y = Z ^ { 2 }\).
You are given that \(Y\) has the following probability density function. $$\mathrm { f } ( y ) = \frac { 1 } { \sqrt { 2 \pi y } } \mathrm { e } ^ { - \frac { 1 } { 2 } y } , \quad y > 0$$
  1. Show that the moment generating function (mgf) of \(Y\) is given by $$\mathrm { M } _ { Y } ( \theta ) = ( 1 - 2 \theta ) ^ { - \frac { 1 } { 2 } }$$
  2. Use the mgf to obtain \(\mathrm { E } ( Y )\) and \(\operatorname { Var } ( Y )\).
The random variable \(U\) is defined by $$U = Z _ { 1 } ^ { 2 } + Z _ { 2 } ^ { 2 } + \ldots + Z _ { n } ^ { 2 } ,$$ where \(Z _ { 1 } , Z _ { 2 } , \ldots , Z _ { n }\) are independent standard Normal random variables.
  3. State an appropriate general theorem for mgfs and hence write down the mgf of \(U\). State the values of \(\mathrm { E } ( U )\) and \(\operatorname { Var } ( U )\).
The random variable \(W\) is defined by $$W = \frac { U - n } { \sqrt { 2 n } }$$
  4. Show that the logarithm of the mgf of \(W\) is $$- \sqrt { \frac { n } { 2 } } \theta - \frac { n } { 2 } \ln \left( 1 - \sqrt { \frac { 2 } { n } } \theta \right) .$$ Use the series expansion of \(\ln ( 1 - t )\) to show that, as \(n \rightarrow \infty\), this expression tends to \(\frac { 1 } { 2 } \theta ^ { 2 }\).
    State what this implies about the distribution of \(W\) for large \(n\).
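The limit in part 4 of Question 2 can be checked numerically before attempting the series argument. The sketch below is only an illustration, not a substitute for the proof: it evaluates the stated expression for the logarithm of the mgf of \(W\) at an arbitrarily chosen \(\theta = 0.5\) and compares it with \(\frac { 1 } { 2 } \theta ^ { 2 }\) for increasing \(n\). The function name `log_mgf_w` and the chosen values of \(n\) are assumptions made for this example.

```python
import math


def log_mgf_w(theta: float, n: int) -> float:
    """Logarithm of the mgf of W = (U - n) / sqrt(2n), using the expression in part 4."""
    t = math.sqrt(2.0 / n) * theta
    if t >= 1.0:
        raise ValueError("the expression is only defined for theta < sqrt(n / 2)")
    # log1p(-t) computes ln(1 - t) accurately when t is close to zero.
    return -math.sqrt(n / 2.0) * theta - (n / 2.0) * math.log1p(-t)


theta = 0.5  # arbitrary illustrative value (an assumption, not from the question)
limit = 0.5 * theta ** 2
gaps = []
for n in (10, 1_000, 100_000, 10_000_000):
    value = log_mgf_w(theta, n)
    gaps.append(value - limit)
    print(f"n = {n:>10,d}   log M_W = {value:.6f}   gap to theta^2/2 = {value - limit:.2e}")
```

The gap shrinks roughly like \(1 / \sqrt { n }\), which is consistent with the leading neglected term in the expansion of \(\ln ( 1 - t )\).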
Question 3
3 At an agricultural research station, trials are being carried out to compare a standard variety of tomato with one that has been genetically modified (GM). The trials are concerned with the mean weight of the tomatoes and also with the aesthetic appearance of the tomatoes.
    1. Tomatoes of the standard and GM varieties are grown under similar conditions. The tomatoes are weighed and the data are summarised as follows.
      | Variety | Sample size | Sum of weights \(( \mathrm { g } )\) | Sum of squares of weights \(\left( \mathrm { g } ^ { 2 } \right)\) |
      |---|---|---|---|
      | Standard | 30 | 3218.3 | 349257 |
      | GM | 26 | 2954.1 | 338691 |
      Carry out a test, using the Normal distribution, to investigate whether there is evidence, at the 5\% level of significance, that the two varieties of tomato differ in mean weight. State one assumption required for this test to be valid.
    2. The data in part (i) could have been used to carry out a test for the equality of means based on the \(t\) distribution. State two additional assumptions required for this test to be valid. Discuss briefly which test would be preferable in this case.
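The arithmetic for the Normal test in part (i) can be sketched as follows. This is a minimal illustration under stated assumptions: the sample variances (divisor \(n - 1\)) are used in place of the unknown population variances, which relies on the samples being large enough, and the test is two-sided. The helper names `summary_to_mean_var` and `two_sample_z` are hypothetical, introduced only for this example.

```python
import math
from statistics import NormalDist


def summary_to_mean_var(n: int, total: float, total_sq: float) -> tuple[float, float]:
    """Sample mean and unbiased sample variance from n, sum of x and sum of x^2."""
    mean = total / n
    var = (total_sq - n * mean ** 2) / (n - 1)
    return mean, var


def two_sample_z(n1, sum1, sumsq1, n2, sum2, sumsq2) -> tuple[float, float]:
    """Two-sided z test for equal means, with sample variances standing in
    for the population variances (a large-sample assumption)."""
    m1, v1 = summary_to_mean_var(n1, sum1, sumsq1)
    m2, v2 = summary_to_mean_var(n2, sum2, sumsq2)
    standard_error = math.sqrt(v1 / n1 + v2 / n2)
    z = (m1 - m2) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Summary data from the table: standard variety first, then GM.
z, p_value = two_sample_z(30, 3218.3, 349257, 26, 2954.1, 338691)
critical = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value

print(f"z = {z:.3f}, critical value = {critical:.3f}, p = {p_value:.4f}")
print("reject H0 at 5%" if abs(z) > critical else "do not reject H0 at 5%")
```

The sign of `z` depends only on the order in which the two varieties are passed; the two-sided conclusion is unaffected.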
  1. In order to judge whether, on the whole, GM tomatoes have a better aesthetic appearance than standard tomatoes, a trial is carried out as follows. 10 of each variety are chosen and a consumer panel is asked to arrange the 20 tomatoes in order according to their appearance.
    1. State two important features of the way in which this trial should be designed. Comment briefly on how reliable the evidence from the trial is likely to be.
    2. The order in which the consumer panel arranges the tomatoes is as follows. The tomato with best appearance is listed first. \(G\) and \(S\) denote GM and standard tomatoes respectively. $$\begin{array} { c c c c c c c c c c c c c c c c c c c c } G & G & G & S & G & G & G & S & G & S & S & S & G & G & S & G & S & S & S & S \end{array}$$ Carry out an appropriate test at the \(1 \%\) level of significance.
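One way to check a rank-based test on the ordering above is to compute the rank sum for the GM tomatoes and its exact null distribution. The sketch below assumes a Wilcoxon rank-sum test with rank 1 given to the best-looking tomato and a one-sided alternative that GM tomatoes tend to receive lower (better) ranks; the question itself only says "an appropriate test", so this choice is an assumption. The function name `rank_sum_null_counts` is hypothetical. An exam answer would compare the rank sum with a tabulated critical value rather than compute an exact probability.

```python
from math import comb

ORDER = "GGGSGGGSGSSSGGSGSSSS"  # panel ordering, best appearance first


def rank_sum_null_counts(m: int, total: int) -> list[int]:
    """counts[s] = number of ways to choose m distinct ranks from 1..total with sum s."""
    max_sum = sum(range(total - m + 1, total + 1))
    ways = [[0] * (max_sum + 1) for _ in range(m + 1)]
    ways[0][0] = 1
    for rank in range(1, total + 1):
        # Go through subset sizes in descending order so each rank is used at most once.
        for k in range(min(m, rank), 0, -1):
            for s in range(rank, max_sum + 1):
                ways[k][s] += ways[k - 1][s - rank]
    return ways[m]


gm_ranks = [i + 1 for i, label in enumerate(ORDER) if label == "G"]
w_gm = sum(gm_ranks)  # rank sum for the GM sample

null_counts = rank_sum_null_counts(len(gm_ranks), len(ORDER))
# Under H0 every set of 10 ranks is equally likely; small rank sums favour GM.
p_lower = sum(null_counts[: w_gm + 1]) / comb(len(ORDER), len(gm_ranks))

print(f"GM rank sum = {w_gm}, exact one-sided p = {p_lower:.4f}")
print("significant at 1%" if p_lower < 0.01 else "not significant at 1%")
```

Working with the exact distribution avoids a Normal approximation, which is a design choice for this illustration; with 10 tomatoes in each group the table lookup and the exact calculation should lead to the same decision.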