OCR MEI S4 (Statistics 4) 2013 June

Question 1

Traffic engineers are studying the flow of vehicles along a road. At an initial stage of the investigation, they assume that the average flow remains the same throughout the working day. An automatic counter records the number of vehicles passing a certain point per minute during the working day. A random sample of these records is selected; the sample values are denoted by $x_{1}, x_{2}, \ldots, x_{n}$.

The engineers model the underlying random variable $X$ by a Poisson distribution with unknown parameter $\theta$. Obtain the likelihood of $x _ { 1 } , x _ { 2 } , \ldots , x _ { n }$ and hence find the maximum likelihood estimate of $\theta$.
Write down the maximum likelihood estimate of the probability that no vehicles pass during a minute.
The engineers note that, in a sample of size 1000 with sample mean $\bar { x } = 5$, there are no observations of zero. Suggest why this might cast some doubt on the investigation.
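As a rough numerical illustration of why a sample with no zeros is surprising, the sketch below assumes the plain Poisson model with $\theta$ set equal to the sample mean 5. The variable names are illustrative and not part of the question.

```python
import math

n = 1000          # sample size quoted in the question
theta_hat = 5.0   # assumption: theta estimated by the sample mean

# Probability of a zero count in one minute under Poisson(theta_hat).
p_zero = math.exp(-theta_hat)

# Expected number of zero counts among n independent observations.
expected_zeros = n * p_zero

# Probability that none of the n observations is zero.
p_no_zeros = (1.0 - p_zero) ** n

print(f"P(X = 0)             = {p_zero:.5f}")
print(f"expected zeros       = {expected_zeros:.2f}")
print(f"P(no zeros in {n}) = {p_no_zeros:.5f}")
```

Under these assumptions several zeros would be expected, so observing none at all is a very unlikely outcome.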
On checking the automatic counter, the engineers find that, due to a fault, no record at all is made if no vehicle passes in a minute. They therefore model $X$ as a Poisson random variable, again with an unknown parameter $\theta$, except that the value $x = 0$ cannot occur. Show that, under this model, $$\mathrm{P}(X = x) = \frac{\theta^{x}}{\left(\mathrm{e}^{\theta} - 1\right) x!}, \quad x = 1, 2, \ldots$$ and hence show that the maximum likelihood estimate of $\theta$ satisfies the equation $$\frac{\theta \mathrm{e}^{\theta}}{\mathrm{e}^{\theta} - 1} = \bar{x}.$$
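The likelihood equation for the truncated model has no closed-form solution, so it would normally be solved numerically. The following is a minimal sketch using bisection. The function names are hypothetical, and the value $\bar{x} = 5$ is reused from the earlier part purely for illustration.

```python
import math

def truncated_mean(theta):
    """Left-hand side of the likelihood equation: theta*e^theta / (e^theta - 1).

    Rewritten as theta / (1 - e^(-theta)); expm1 keeps it accurate for small theta.
    """
    return theta / -math.expm1(-theta)

def solve_theta(xbar, tol=1e-12):
    """Solve truncated_mean(theta) = xbar by bisection (requires xbar > 1)."""
    if xbar <= 1.0:
        raise ValueError("a zero-truncated sample must have mean greater than 1")
    lo, hi = 1e-9, xbar  # truncated_mean(lo) is about 1, truncated_mean(xbar) > xbar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < xbar:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of mid
    return 0.5 * (lo + hi)

theta_hat = solve_theta(5.0)
print(f"theta_hat = {theta_hat:.4f}")
```

Bisection is used here only because the left-hand side is increasing in $\theta$, which makes the bracketing simple; Newton's method would be an equally reasonable choice.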

Question 2

The random variable $X$ takes values $-2$, $0$ and $2$, each with probability $\frac{1}{3}$.

Write down the values of
(A) $\mu$, the mean of $X$,
(B) $\mathrm { E } \left( X ^ { 2 } \right)$,
(C) $\sigma ^ { 2 }$, the variance of $X$.
Obtain the moment generating function (mgf) of $X$.
A random sample of $n$ independent observations on $X$ has sample mean $\bar{X}$, and the standardised mean is denoted by $Z$ where $$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}.$$
Stating carefully the required general results for mgfs of sums and of linear transformations, show that the mgf of $Z$ is $$\mathrm{M}_{Z}(\theta) = \left\{ \frac{1}{3} \left( 1 + \mathrm{e}^{\frac{\theta \sqrt{3}}{\sqrt{2n}}} + \mathrm{e}^{-\frac{\theta \sqrt{3}}{\sqrt{2n}}} \right) \right\}^{n}.$$
By expanding the exponential functions in $\mathrm{M}_{Z}(\theta)$, show that, for large $n$, $$\mathrm{M}_{Z}(\theta) \approx \left( 1 + \frac{\theta^{2}}{2n} \right)^{n}.$$
Use the result $\mathrm { e } ^ { y } = \lim _ { n \rightarrow \infty } \left( 1 + \frac { y } { n } \right) ^ { n }$ to find the limit of $\mathrm { M } _ { Z } ( \theta )$ as $n \rightarrow \infty$, and deduce the approximate distribution of $Z$ for large $n$.
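The limiting behaviour can be checked numerically. The sketch below evaluates the stated expression for $\mathrm{M}_{Z}(\theta)$ at increasing $n$ and compares it with $\mathrm{e}^{\theta^{2}/2}$, the value that the quoted limit result gives with $y = \theta^{2}/2$. The function name and the choice $\theta = 1$ are illustrative assumptions.

```python
import math

def mgf_z(theta, n):
    """M_Z(theta) for the standardised mean of n observations on X in {-2, 0, 2}."""
    a = theta * math.sqrt(3.0) / math.sqrt(2.0 * n)
    return ((1.0 + math.exp(a) + math.exp(-a)) / 3.0) ** n

theta = 1.0
limit = math.exp(theta ** 2 / 2.0)  # value implied by (1 + y/n)^n -> e^y

for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9}: M_Z = {mgf_z(theta, n):.6f}   limit = {limit:.6f}")
```

This is a sanity check only and does not replace the algebraic argument the question asks for.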

Question 3

(i) Explain the meaning of the following terms in the context of hypothesis testing: Type I error, Type II error, operating characteristic, power.
(ii) A test is to be carried out concerning a parameter $\theta$. The null hypothesis is that $\theta$ has the particular value $\theta_{0}$. The alternative hypothesis is $\theta \neq \theta_{0}$. Draw a sketch of the operating characteristic for a perfect test that never makes an error.
(iii) The random variable $X$ is distributed as $\mathrm{N}(\mu, 9)$. A random sample of size 25 is available. The null hypothesis $\mu = 0$ is to be tested against the alternative hypothesis $\mu \neq 0$. The null hypothesis will be accepted if $-1 < \bar{x} < 1$, where $\bar{x}$ is the value of the sample mean; otherwise it will be rejected. Calculate the probability of a Type I error. Calculate the probability of a Type II error if in fact $\mu = 0.5$; comment on the value of this probability.
(iv) Without carrying out any further calculations, draw a sketch of the operating characteristic for the test in part (iii).
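The error probabilities for the test on the normal mean can be checked with a short calculation. The sketch below assumes the sample mean is distributed as $\mathrm{N}(\mu, 9/25)$ and treats the operating characteristic as the probability of accepting the null hypothesis as a function of $\mu$. The function names are illustrative.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

SE = 3.0 / math.sqrt(25)  # standard error of the sample mean: sigma / sqrt(n)

def accept_prob(mu):
    """Operating characteristic: P(-1 < Xbar < 1) when the true mean is mu."""
    return phi((1.0 - mu) / SE) - phi((-1.0 - mu) / SE)

type_i = 1.0 - accept_prob(0.0)   # reject H0 although mu = 0
type_ii = accept_prob(0.5)        # accept H0 although mu = 0.5

print(f"P(Type I error)            = {type_i:.4f}")
print(f"P(Type II error, mu = 0.5) = {type_ii:.4f}")

# A few points on the operating characteristic, symmetric about mu = 0.
for mu in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"mu = {mu:+.1f}: P(accept H0) = {accept_prob(mu):.4f}")
```

Under these assumptions the two probabilities come out at roughly 0.096 and 0.79, and the tabulated points indicate the shape of the curve to be sketched.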

Question 4

Explain the advantages of randomisation and replication in a statistically designed experiment.
The usual statistical model underlying the one-way analysis of variance is given, in the usual notation, by $$x_{ij} = \mu + \alpha_{i} + e_{ij}$$ where $x_{ij}$ denotes the $j$th observation on the $i$th treatment. Define carefully all the terms in this model and state the properties of the term that represents experimental error.
A trial of five fertilisers is carried out at an agricultural research station according to a completely randomised design in which each fertiliser is applied to four experimental plots of a crop (so that there are 20 experimental units altogether). The sums of squares in a one-way analysis of variance of the resulting data on yields of the crop are as follows.
Source of variation    Sum of squares
Between fertilisers    219.2
Residual               304.5
Total                  523.7
State the customary null and alternative hypotheses that are tested. Provide the degrees of freedom for each sum of squares. Hence copy and complete the analysis of variance table and carry out the test at the 5% level.
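The arithmetic for completing the table can be sketched as follows. The degrees of freedom follow from five treatments with four plots each. The 5% critical value of $F_{4,15}$ is an assumption taken from standard tables (about 3.06) rather than computed, and the variable names are illustrative.

```python
# Sums of squares quoted in the question.
ss_between = 219.2
ss_residual = 304.5
ss_total = 523.7

treatments = 5
plots_per_treatment = 4
n_units = treatments * plots_per_treatment  # 20 experimental units

# Degrees of freedom for a one-way layout.
df_between = treatments - 1
df_total = n_units - 1
df_residual = df_total - df_between

# Mean squares and the F ratio.
ms_between = ss_between / df_between
ms_residual = ss_residual / df_residual
f_ratio = ms_between / ms_residual

# Assumption: upper 5% point of F(4, 15), read from standard tables.
F_CRIT_5PCT = 3.06

print(f"{'Source':<22}{'SS':>8}{'df':>5}{'MS':>9}{'F':>8}")
print(f"{'Between fertilisers':<22}{ss_between:>8.1f}{df_between:>5}{ms_between:>9.2f}{f_ratio:>8.2f}")
print(f"{'Residual':<22}{ss_residual:>8.1f}{df_residual:>5}{ms_residual:>9.2f}")
print(f"{'Total':<22}{ss_total:>8.1f}{df_total:>5}")
print("Reject H0 at 5%" if f_ratio > F_CRIT_5PCT else "Do not reject H0 at 5%")
```

With these figures the F ratio falls below the tabulated critical value, so the sketch reports that the null hypothesis is not rejected at the 5% level.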
