OCR MEI S4 (Statistics 4) 2007 June

Question 1 24 marks

1 The random variable $X$ has the continuous uniform distribution with probability density function $$\mathrm { f } ( x ) = \frac { 1 } { \theta } , \quad 0 \leqslant x \leqslant \theta$$ where $\theta ( \theta > 0 )$ is an unknown parameter.
A random sample of $n$ observations from $X$ is denoted by $X _ { 1 } , X _ { 2 } , \ldots , X _ { n }$, with sample mean $\bar { X } = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } X _ { i }$.

(i) Show that $2 \bar { X }$ is an unbiased estimator of $\theta$.
(ii) Evaluate $2 \bar { X }$ for a case where, with $n = 5$, the observed values of the random sample are 0.4, 0.2, 1.0, 0.1, 0.6. Hence comment on a disadvantage of $2 \bar { X }$ as an estimator of $\theta$.
For a general random sample of size $n$, let $Y$ represent the sample maximum, $Y = \max \left( X _ { 1 } , X _ { 2 } , \ldots , X _ { n } \right)$. You are given that the probability density function of $Y$ is $$g ( y ) = \frac { n y ^ { n - 1 } } { \theta ^ { n } } , \quad 0 \leqslant y \leqslant \theta$$
(iii) An estimator $k Y$ is to be used to estimate $\theta$, where $k$ is a constant to be chosen. Show that the mean square error of $k Y$ is $$k ^ { 2 } \mathrm { E } \left( Y ^ { 2 } \right) - 2 k \theta \mathrm { E } ( Y ) + \theta ^ { 2 }$$ and hence find the value of $k$ for which the mean square error is minimised.
(iv) Comment on whether $k Y$ with the value of $k$ found in part (iii) suffers from the disadvantage identified in part (ii).
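
The following is a minimal numerical sketch for checking work on Question 1, not a model answer. It evaluates $2 \bar { X }$ for the sample in part (ii) and the mean square error expression from part (iii), using the moments $\mathrm { E } ( Y ) = n \theta / ( n + 1 )$ and $\mathrm { E } ( Y ^ { 2 } ) = n \theta ^ { 2 } / ( n + 2 )$ that follow from integrating against $g ( y )$. The function names `two_xbar`, `mse_kY` and `best_k` are illustrative choices, and $\theta = 1$ is assumed only for the comparison of mean square errors.

```python
from fractions import Fraction

# Observed sample from part (ii), held as exact fractions to avoid rounding.
sample = [Fraction(v) for v in ("0.4", "0.2", "1.0", "0.1", "0.6")]


def two_xbar(xs):
    """Twice the sample mean."""
    return 2 * sum(xs) / len(xs)


def mse_kY(k, n, theta=1):
    """k^2 E(Y^2) - 2 k theta E(Y) + theta^2, with moments taken from g(y)."""
    e_y = Fraction(n, n + 1) * theta          # E(Y)   = n*theta/(n+1)
    e_y2 = Fraction(n, n + 2) * theta ** 2    # E(Y^2) = n*theta^2/(n+2)
    return k * k * e_y2 - 2 * k * theta * e_y + theta ** 2


def best_k(n):
    """Vertex of the quadratic in k: theta*E(Y)/E(Y^2) = (n+2)/(n+1)."""
    return Fraction(n + 2, n + 1)


estimate = two_xbar(sample)
k = best_k(len(sample))

print("2 * xbar        =", float(estimate))
print("sample maximum  =", float(max(sample)))
print("k minimising MSE =", k)
print("k * Y            =", float(k * max(sample)))
print("MSE at k (theta=1) =", mse_kY(k, len(sample)))
```

Comparing the printed value of $2 \bar { X }$ with the sample maximum is one way to see the issue that parts (ii) and (iv) ask about.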

Question 2 24 marks

2 The random variable $X$ has the binomial distribution with parameters $n$ and $p$, i.e. $X \sim \mathrm { B } ( n , p )$.

(i) Show that the probability generating function of $X$ is $\mathrm { G } ( t ) = ( q + p t ) ^ { n }$, where $q = 1 - p$.
(ii) Hence obtain the mean $\mu$ and variance $\sigma ^ { 2 }$ of $X$.
(iii) Write down the mean and variance of the random variable $Z = \frac { X - \mu } { \sigma }$.
(iv) Write down the moment generating function of $X$ and use the linear transformation result to show that the moment generating function of $Z$ is $$\mathrm { M } _ { Z } ( \theta ) = \left( q \mathrm { e } ^ { - \frac { p \theta } { \sqrt { n p q } } } + p \mathrm { e } ^ { \frac { q \theta } { \sqrt { n p q } } } \right) ^ { n } .$$
(v) By expanding the exponential terms in $\mathrm { M } _ { Z } ( \theta )$, show that the limit of $\mathrm { M } _ { Z } ( \theta )$ as $n \rightarrow \infty$ is $\mathrm { e } ^ { \theta ^ { 2 } / 2 }$. You may use the result $\lim _ { n \rightarrow \infty } \left( 1 + \frac { y + \mathrm { f } ( n ) } { n } \right) ^ { n } = \mathrm { e } ^ { y }$ provided $\mathrm { f } ( n ) \rightarrow 0$ as $n \rightarrow \infty$.
(vi) What does the result in part (v) imply about the distribution of $Z$ as $n \rightarrow \infty$? Explain your reasoning briefly.
(vii) What does the result in part (vi) imply about the distribution of $X$ as $n \rightarrow \infty$?
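
As a rough numerical illustration of the limit in part (v), and not a substitute for the algebraic argument, the sketch below evaluates the stated expression for $\mathrm { M } _ { Z } ( \theta )$ at increasing $n$ and compares it with $\mathrm { e } ^ { \theta ^ { 2 } / 2 }$. The values $p = 0.3$ and $\theta = 1$ are arbitrary assumptions chosen for the demonstration, and `mgf_z` is an illustrative name.

```python
import math


def mgf_z(theta, n, p):
    """M_Z(theta) for the standardised B(n, p) variable, as given in part (iv)."""
    q = 1 - p
    s = math.sqrt(n * p * q)
    return (q * math.exp(-p * theta / s) + p * math.exp(q * theta / s)) ** n


theta, p = 1.0, 0.3            # arbitrary illustrative values
limit = math.exp(theta ** 2 / 2)

for n in (10, 100, 10_000, 1_000_000):
    value = mgf_z(theta, n, p)
    print(f"n = {n:>9}: M_Z = {value:.6f}, gap from limit = {abs(value - limit):.6f}")
```

The gap shrinks as $n$ grows, which is consistent with the limit the question asks for, although a numerical check of this kind cannot prove it.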

Question 3 24 marks

3 An engineering company buys a certain type of component from two suppliers, A and B. It is important that, on the whole, the strengths of these components are the same from both suppliers. The company can measure the strengths in its laboratory. Random samples of seven components from supplier A and five from supplier B give the following strengths, in a convenient unit.

Supplier A	25.8	27.4	26.2	23.5	28.3	26.4	27.2
Supplier B	25.6	24.9	23.7	25.8	26.9

The underlying distributions of strengths are assumed to be Normal for both suppliers, with variances 2.45 for supplier A and 1.40 for supplier B.

(i) Test at the $5 \%$ level of significance whether it is reasonable to assume that the mean strengths from the two suppliers are equal.
(ii) Provide a two-sided $90 \%$ confidence interval for the true mean difference.
(iii) Show that the test procedure used in part (i), with samples of sizes 7 and 5 and a $5 \%$ significance level, leads to acceptance of the null hypothesis of equal means if $- 1.556 < \bar { x } - \bar { y } < 1.556$, where $\bar { x }$ and $\bar { y }$ are the observed sample means from suppliers A and B. Hence find the probability of a Type II error for this test procedure if in fact the true mean strength from supplier A is 2.0 units more than that from supplier B.
(iv) A manager suggests that the Wilcoxon rank sum test should be used instead, comparing the median strengths for the samples of sizes 7 and 5. Give one reason why this suggestion might be sensible and two why it might not.
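
The arithmetic behind Question 3 can be checked with a short sketch such as the one below. It assumes a two-sample Normal test with the stated known variances, which is one natural reading of the question, and uses only the Python standard library. Variable names are illustrative, and the output is a check on working rather than a model answer.

```python
from math import sqrt
from statistics import NormalDist, mean

supplier_a = [25.8, 27.4, 26.2, 23.5, 28.3, 26.4, 27.2]
supplier_b = [25.6, 24.9, 23.7, 25.8, 26.9]
var_a, var_b = 2.45, 1.40      # known variances stated in the question

# Standard error of (xbar - ybar) when both variances are known.
se = sqrt(var_a / len(supplier_a) + var_b / len(supplier_b))
diff = mean(supplier_a) - mean(supplier_b)
z = diff / se

std_normal = NormalDist()
crit_5pct = std_normal.inv_cdf(0.975)     # two-sided 5% critical value
crit_10pct = std_normal.inv_cdf(0.95)     # used for the 90% interval

# Acceptance region for xbar - ybar is (-half_width, +half_width).
half_width = crit_5pct * se
ci_90 = (diff - crit_10pct * se, diff + crit_10pct * se)

# Type II error: probability of landing in the acceptance region
# when the true mean difference is 2.0.
alt = NormalDist(mu=2.0, sigma=se)
type_ii = alt.cdf(half_width) - alt.cdf(-half_width)

print(f"difference in means = {diff:.3f}, test statistic z = {z:.3f}")
print(f"acceptance region   = +/- {half_width:.3f}")
print(f"90% CI              = ({ci_90[0]:.3f}, {ci_90[1]:.3f})")
print(f"P(Type II error)    = {type_ii:.4f}")
```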

Question 4 24 marks

4 An agricultural company conducts a trial of five fertilisers (A, B, C, D, E) in an experimental field at its research station. The fertilisers are applied to plots of the field according to a completely randomised design. The yields of the crop from the plots, measured in a standard unit, are analysed by the one-way analysis of variance, from which it appears that there are no real differences among the effects of the fertilisers. A statistician notes that the residual mean square in the analysis of variance is considerably larger than had been anticipated from knowledge of the general behaviour of the crop, and therefore suspects that there is some inadequacy in the design of the trial.

(i) Explain briefly why the statistician should be suspicious of the design.
(ii) Explain briefly why an inflated residual leads to difficulty in interpreting the results of the analysis of variance, in particular that the null hypothesis is more likely to be accepted erroneously.
Further investigation indicates that the soil at the west side of the experimental field is naturally more fertile than that at the east side, with a consistent 'fertility gradient' from west to east.
(iii) What experimental design can accommodate this feature? Provide a simple diagram of the experimental field indicating a suitable layout.
The company decides to conduct a new trial in its glasshouse, where experimental conditions can be controlled so that a completely randomised design is appropriate. The yields are as follows.

Fertiliser A	Fertiliser B	Fertiliser C	Fertiliser D	Fertiliser E
23.6	26.0	18.8	29.0	17.7
18.2	35.3	16.7	37.2	16.5
32.4	30.5	23.0	32.6	12.8
20.8	31.4	28.3	31.4	20.4

[The sum of these data items is 502.6 and the sum of their squares is 13610.22.]
(iv) Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a $5 \%$ significance level. Report briefly on your conclusions.
(v) State the assumptions about the distribution of the experimental error that underlie your analysis in part (iv).
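
The sums of squares for the analysis of variance in part (iv) can be checked with a sketch like the following. It uses only the standard library and confirms the quoted total (502.6) and sum of squares (13610.22). The critical value 3.06 is an assumption taken from standard tables for the upper $5 \%$ point of $F _ { 4 , 15 }$ rather than computed here, and all names are illustrative.

```python
# Glasshouse yields, one list per fertiliser.
yields = {
    "A": [23.6, 18.2, 32.4, 20.8],
    "B": [26.0, 35.3, 30.5, 31.4],
    "C": [18.8, 16.7, 23.0, 28.3],
    "D": [29.0, 37.2, 32.6, 31.4],
    "E": [17.7, 16.5, 12.8, 20.4],
}

all_values = [y for ys in yields.values() for y in ys]
n_total = len(all_values)
grand_total = sum(all_values)
sum_of_squares = sum(y * y for y in all_values)

# Correction factor, then the usual decomposition of the total sum of squares.
correction = grand_total ** 2 / n_total
ss_total = sum_of_squares - correction
ss_between = sum(sum(ys) ** 2 / len(ys) for ys in yields.values()) - correction
ss_residual = ss_total - ss_between

df_between = len(yields) - 1
df_residual = n_total - len(yields)
ms_between = ss_between / df_between
ms_residual = ss_residual / df_residual
f_ratio = ms_between / ms_residual

F_CRIT_5PCT = 3.06   # assumed tabulated upper 5% point of F(4, 15)

print(f"{'Source':<20}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between fertilisers':<20}{ss_between:>10.3f}{df_between:>5}{ms_between:>10.3f}{f_ratio:>8.2f}")
print(f"{'Residual':<20}{ss_residual:>10.3f}{df_residual:>5}{ms_residual:>10.3f}")
print(f"{'Total':<20}{ss_total:>10.3f}{n_total - 1:>5}")
print("Exceeds 5% critical value:", f_ratio > F_CRIT_5PCT)
```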
