OCR MEI S4 (Statistics 4) 2011 June

Question 1

The random variable $X$ has the Normal distribution with mean 0 and variance $\theta$, so that its probability density function is $$\mathrm { f } ( x ) = \frac { 1 } { \sqrt { 2 \pi \theta } } \mathrm { e } ^ { - x ^ { 2 } / ( 2 \theta ) } , \quad - \infty < x < \infty$$ where $\theta$ ($\theta > 0$) is unknown. A random sample of $n$ observations from $X$ is denoted by $X _ { 1 } , X _ { 2 } , \ldots , X _ { n }$.

(i) Find $\hat { \theta }$, the maximum likelihood estimator of $\theta$.
(ii) Show that $\hat { \theta }$ is an unbiased estimator of $\theta$.
(iii) In large samples, the variance of $\hat { \theta }$ may be estimated by $\frac { 2 \hat { \theta } ^ { 2 } } { n }$. Use this and the results of parts (i) and (ii) to find an approximate $95 \%$ confidence interval for $\theta$ in the case when $n = 100$ and $\Sigma X _ { i } ^ { 2 } = 1000$.
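
The arithmetic in the final part can be checked with a short Python sketch. It assumes the standard result that the maximum likelihood estimator here is $\hat { \theta } = \frac { 1 } { n } \Sigma X _ { i } ^ { 2 }$; the question asks for this to be derived, so it is an assumption of the sketch rather than something stated in the paper. The function name `approx_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_ci(n, sum_sq, level=0.95):
    """Approximate confidence interval for theta.

    Assumes (not stated in the question) that theta_hat = sum(x_i^2) / n.
    """
    theta_hat = sum_sq / n                      # assumed form of the MLE
    se = sqrt(2 * theta_hat ** 2 / n)           # large-sample variance estimate quoted in the question
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided Normal critical value
    return theta_hat - z * se, theta_hat + z * se


lower, upper = approx_ci(n=100, sum_sq=1000)
print(round(lower, 2), round(upper, 2))
```

The interval is symmetric about $\hat { \theta }$ because it relies on the large-sample Normal approximation to the distribution of $\hat { \theta }$.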

Question 2

The random variable $X$ has the $\chi _ { n } ^ { 2 }$ distribution. This distribution has moment generating function $\mathrm { M } ( \theta ) = ( 1 - 2 \theta ) ^ { - \frac { 1 } { 2 } n }$, where $\theta < \frac { 1 } { 2 }$.

Verify the expression for $\mathrm { M } ( \theta )$ quoted above for the cases $n = 2$ and $n = 4$, given that the probability density functions of $X$ in these cases are as follows. $$\begin{array} { l l } n = 2 : & \mathrm { f } ( x ) = \frac { 1 } { 2 } \mathrm { e } ^ { - \frac { 1 } { 2 } x } \quad ( x > 0 ) \\
n = 4 : & \mathrm { f } ( x ) = \frac { 1 } { 4 } x \mathrm { e } ^ { - \frac { 1 } { 2 } x } \quad ( x > 0 ) \end{array}$$
For the general case, use $\mathrm { M } ( \theta )$ to find the mean and variance of $X$ in terms of $n$.
$Y _ { 1 } , Y _ { 2 } , \ldots , Y _ { k }$ are independent random variables, each with the $\chi _ { 1 } ^ { 2 }$ distribution. Show that $W = \sum _ { i = 1 } ^ { k } Y _ { i }$ has the $\chi _ { k } ^ { 2 }$ distribution.
Use the Central Limit Theorem to find an approximation for $\mathrm { P } ( W < 118.5 )$ for the case $k = 100$.
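
The Central Limit Theorem approximation in the last part can be sketched numerically. The sketch assumes the standard results that a $\chi _ { k } ^ { 2 }$ variable has mean $k$ and variance $2k$ (the question asks for these to be obtained from $\mathrm { M } ( \theta )$), and the helper name `clt_prob_below` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def clt_prob_below(w, k):
    """Normal approximation to P(W < w) where W is a sum of k chi-squared(1) variables."""
    mean, var = k, 2 * k   # assumed mean and variance of chi-squared with k d.f.
    return NormalDist(mu=mean, sigma=sqrt(var)).cdf(w)


p = clt_prob_below(118.5, k=100)
print(round(p, 3))
```

No continuity correction is applied, since $W$ is a continuous random variable.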

Question 3

(i) Explain the meaning of the following terms in the context of hypothesis testing: Type I error, Type II error, operating characteristic, power.
(ii) A market research organisation is designing a sample survey to investigate whether expenditure on everyday food items has increased in 2011 compared with 2010. For one of the populations being studied, the random variable $X$ is used to model weekly expenditure, in $\pounds$, on these items in 2011, where $X$ is Normally distributed with mean $\mu$ and variance $\sigma ^ { 2 }$. As the corresponding mean value in 2010 was 94, the hypotheses to be examined are $$\begin{aligned} & \mathrm { H } _ { 0 } : \mu = 94 \\
& \mathrm { H } _ { 1 } : \mu > 94 \end{aligned}$$ By comparison with the corresponding 2010 value, $\sigma ^ { 2 }$ is assumed to be 25.
The following criteria for the survey are laid down.
- If in fact $\mu = 94$, the probability of concluding that $\mu > 94$ must be only $2 \%$.
- If in fact $\mu = 97$, the probability of concluding that $\mu > 94$ must be $95 \%$.
A random sample of size $n$ is to be taken and the usual Normal test based on $\bar { X }$ is to be used, with a critical value of $c$ such that $\mathrm { H } _ { 0 }$ is rejected if the value of $\bar { X }$ exceeds $c$. Find $c$ and the smallest value of $n$ that is required.
(iii) Sketch the power function of an ideal test for examining the hypotheses in part (ii).
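
For the sample-size calculation, the two survey criteria can be read as a pair of inequalities for $c$, one from the $2 \%$ significance requirement at $\mu = 94$ and one from the $95 \%$ power requirement at $\mu = 97$. The following Python sketch solves them under that reading; the function name `design_test` and its parameter names are hypothetical, and $\sigma = 5$ comes from the assumed variance of 25.

```python
from math import ceil, sqrt
from statistics import NormalDist


def design_test(mu0, mu1, sigma, alpha, power):
    """Smallest n, and a matching critical value c, for a one-sided Normal test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper alpha point under H0
    z_power = NormalDist().inv_cdf(power)       # point giving the required power at mu1
    # Requirement: mu0 + z_alpha*sigma/sqrt(n) <= c <= mu1 - z_power*sigma/sqrt(n)
    n = ceil(((z_alpha + z_power) * sigma / (mu1 - mu0)) ** 2)
    c = mu0 + z_alpha * sigma / sqrt(n)         # c fixed by the significance criterion
    return n, c


n, c = design_test(mu0=94, mu1=97, sigma=5, alpha=0.02, power=0.95)
print(n, round(c, 2))
```

The bound on $n$ lies very close to an integer here, so quantiles rounded to three decimal places, as in printed tables, may give a value one larger than full-precision quantiles do.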

Question 4

Provide an example of an experimental situation where there is one factor of primary interest and where a suitable experimental design would be
1. randomised blocks,
2. a Latin square.

In each case, explain carefully why the design is suitable and why the other design would not be appropriate.
An industrial experiment to compare four treatments for increasing the tensile strength of steel is carried out according to a completely randomised design. For various reasons, it is not possible to use the same number of replicates for each treatment. The increases, in a suitable unit of tensile strength, are as follows.
Treatment A  Treatment B  Treatment C  Treatment D
10.1 21.1 9.2 22.6
21.2 20.3 8.8 17.4
11.6 16.0 15.2 23.1
13.6 15.0 19.2
[The sum of these data items is 256.8 and the sum of their squares is 4471.92.] Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a $5 \%$ significance level.
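
The layout of a one-way analysis of variance with unequal replication can be sketched as below. This is a minimal illustration, not a worked answer: the function name `one_way_anova` is hypothetical, and the demonstration values are made up and are NOT the tensile-strength data from the question.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_ratio)."""
    values = [x for g in groups for x in g]
    n_total = len(values)
    correction = sum(values) ** 2 / n_total                  # (grand total)^2 / N
    ss_total = sum(x * x for x in values) - correction
    # Between-treatments sum of squares allows for unequal group sizes
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between                        # residual sum of squares
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Hypothetical groups of unequal size, NOT the data from the question
demo = [[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 6.0, 7.0, 6.0]]
print(one_way_anova(demo))
```

The resulting ratio would then be compared with the upper $5 \%$ point of the $F$ distribution on the between-treatments and residual degrees of freedom.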