OCR MEI S4 (Statistics 4) 2009 June

Question 1 24 marks

An industrial process produces components. Some of the components contain faults. The number of faults in a component is modelled by the random variable $X$ with probability function $$\mathrm { P } ( X = x ) = \theta ( 1 - \theta ) ^ { x } \quad \text { for } x = 0,1,2 , \ldots$$ where $\theta$ is a parameter with $0 < \theta < 1$. The numbers of faults in different components are independent.
A random sample of $n$ components is inspected. $n _ { 0 }$ are found to have no faults, $n _ { 1 }$ to have one fault and the remainder $\left( n - n _ { 0 } - n _ { 1 } \right)$ to have two or more faults.

Find $\mathrm { P } ( X \geqslant 2 )$ and hence show that the likelihood is $$\mathrm { L } ( \theta ) = \theta ^ { n _ { 0 } + n _ { 1 } } ( 1 - \theta ) ^ { 2 n - 2 n _ { 0 } - n _ { 1 } }$$
Find the maximum likelihood estimator $\hat { \theta }$ of $\theta$. You are not required to verify that any turning point you locate is a maximum.
Show that $\mathrm { E } ( X ) = \frac { 1 - \theta } { \theta }$. Deduce that another plausible estimator of $\theta$ is $\tilde { \theta } = \frac { 1 } { 1 + \bar { X } }$ where $\bar { X }$ is the sample mean. What additional information is needed in order to calculate the value of this estimator?
You are given that, in large samples, $\tilde { \theta }$ may be taken as Normally distributed with mean $\theta$ and variance $\theta ^ { 2 } ( 1 - \theta ) / n$. Use this to obtain a $95 \%$ confidence interval for $\theta$ for the case when 100 components are inspected and it is found that 92 have no faults, 6 have one fault and the remaining 2 have exactly four faults each.
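As a numerical illustration of the final part, the following Python sketch evaluates both estimators for the quoted data. It assumes the closed form $\hat { \theta } = ( n _ { 0 } + n _ { 1 } ) / ( 2 n - n _ { 0 } )$, which comes from setting the derivative of $\ln \mathrm { L } ( \theta )$ to zero, and it estimates the standard error by substituting $\tilde { \theta }$ for $\theta$ in the given variance. The function names are illustrative and are not part of the question.

```python
from math import sqrt
from statistics import NormalDist


def mle_theta(n: int, n0: int, n1: int) -> float:
    """MLE from the grouped likelihood (assumed closed form from d/dtheta ln L = 0)."""
    return (n0 + n1) / (2 * n - n0)


def moment_estimate_ci(total_faults: int, n: int, level: float = 0.95):
    """Return theta-tilde = 1 / (1 + sample mean) and a large-sample Normal interval."""
    x_bar = total_faults / n
    theta = 1 / (1 + x_bar)
    # Variance theta^2 (1 - theta) / n, with theta replaced by its estimate.
    se = sqrt(theta**2 * (1 - theta) / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return theta, (theta - z * se, theta + z * se)


# Data from the question: 92 with no faults, 6 with one fault, 2 with four faults each.
n, n0, n1 = 100, 92, 6
total_faults = 0 * 92 + 1 * 6 + 4 * 2
theta_hat = mle_theta(n, n0, n1)
theta_tilde, (lower, upper) = moment_estimate_ci(total_faults, n)
print(f"MLE: {theta_hat:.4f}")
print(f"theta-tilde: {theta_tilde:.4f}, 95% CI: ({lower:.4f}, {upper:.4f})")
```

The sketch also shows why the extra information matters: $\tilde { \theta }$ needs the total number of faults, so the exact counts in the "two or more" group must be known, whereas $\hat { \theta }$ uses only $n$, $n _ { 0 }$ and $n _ { 1 }$.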

Question 2 24 marks

(i) The random variable $Z$ has the standard Normal distribution with probability density function $$\mathrm { f } ( z ) = \frac { 1 } { \sqrt { 2 \pi } } \mathrm { e } ^ { - z ^ { 2 } / 2 } , \quad - \infty < z < \infty$$ Obtain the moment generating function of $Z$.
(ii) Let $\mathrm { M } _ { Y } ( t )$ denote the moment generating function of the random variable $Y$. Show that the moment generating function of the random variable $a Y + b$, where $a$ and $b$ are constants, is $\mathrm { e } ^ { b t } \mathrm { M } _ { Y } ( a t )$.
(iii) Use the results in parts (i) and (ii) to obtain the moment generating function $\mathrm { M } _ { X } ( t )$ of the random variable $X$ having the Normal distribution with parameters $\mu$ and $\sigma ^ { 2 }$.
(iv) If $W = \mathrm { e } ^ { X }$ where $X$ is as in part (iii), $W$ is said to have a lognormal distribution. Show that, for any positive integer $k$, the expected value of $W ^ { k }$ is $\mathrm { M } _ { X } ( k )$. Use this result to find the expected value and variance of the lognormal distribution.
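
The chain of results in this question can be sketched as follows. This is an outline of the standard results rather than a full solution; the key step in the first line is completing the square in the exponent, after which the remaining integrand is the density of a Normal distribution with mean $t$ and variance 1.

$$\mathrm { M } _ { Z } ( t ) = \int _ { - \infty } ^ { \infty } \frac { 1 } { \sqrt { 2 \pi } } \mathrm { e } ^ { t z - z ^ { 2 } / 2 } \, \mathrm { d } z = \mathrm { e } ^ { t ^ { 2 } / 2 } \int _ { - \infty } ^ { \infty } \frac { 1 } { \sqrt { 2 \pi } } \mathrm { e } ^ { - ( z - t ) ^ { 2 } / 2 } \, \mathrm { d } z = \mathrm { e } ^ { t ^ { 2 } / 2 }$$

Writing $X = \sigma Z + \mu$ and using the result for $a Y + b$ with $a = \sigma$, $b = \mu$:

$$\mathrm { M } _ { X } ( t ) = \mathrm { e } ^ { \mu t } \mathrm { M } _ { Z } ( \sigma t ) = \mathrm { e } ^ { \mu t + \sigma ^ { 2 } t ^ { 2 } / 2 }$$

Since $W ^ { k } = \mathrm { e } ^ { k X }$, it follows that $\mathrm { E } ( W ^ { k } ) = \mathrm { E } ( \mathrm { e } ^ { k X } ) = \mathrm { M } _ { X } ( k )$, so that

$$\mathrm { E } ( W ) = \mathrm { e } ^ { \mu + \sigma ^ { 2 } / 2 } , \quad \operatorname { Var } ( W ) = \mathrm { M } _ { X } ( 2 ) - \mathrm { M } _ { X } ( 1 ) ^ { 2 } = \mathrm { e } ^ { 2 \mu + 2 \sigma ^ { 2 } } - \mathrm { e } ^ { 2 \mu + \sigma ^ { 2 } } = \mathrm { e } ^ { 2 \mu + \sigma ^ { 2 } } \left( \mathrm { e } ^ { \sigma ^ { 2 } } - 1 \right)$$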

Question 3 24 marks

(i) At a waste disposal station, two methods for incinerating some of the rubbish are being compared. Of interest is the amount of particulates in the exhaust, which can be measured over the working day in a convenient unit of concentration. It is assumed that the underlying distributions of concentrations of particulates are Normal. It is also assumed that the underlying variances are equal. During a period of several months, measurements are made for method A on a random sample of 10 working days and for method B on a separate random sample of 7 working days, with results, in the convenient unit, as follows.

| Method | Concentrations of particulates |
|---|---|
| A | 124.8, 136.4, 116.6, 129.1, 140.7, 120.2, 124.6, 127.5, 111.8, 130.3 |
| B | 130.4, 136.2, 119.8, 150.6, 143.5, 126.1, 130.7 |

Use a $t$ test at the $10 \%$ level of significance to examine whether either method is better in resulting, on the whole, in a lower concentration of particulates. State the null and alternative hypotheses under test.
(ii) The company's statistician criticises the design of the trial in part (i) on the grounds that it is not paired. Summarise the arguments the statistician will have used. A new trial is set up with a paired design, measuring the concentrations of particulates on a random sample of 9 paired occasions. The results are as follows.

| Pair | I | II | III | IV | V | VI | VII | VIII | IX |
|---|---|---|---|---|---|---|---|---|---|
| Method A | 119.6 | 127.6 | 141.3 | 139.5 | 141.3 | 124.1 | 116.6 | 136.2 | 128.8 |
| Method B | 112.2 | 128.8 | 130.2 | 134.0 | 135.1 | 120.4 | 116.9 | 134.4 | 125.2 |

Use a $t$ test at the $5 \%$ level of significance to examine the same hypotheses as in part (i). State the underlying distributional assumption that is needed in this case.
(iii) State the names of procedures that could be used in the situations of parts (i) and (ii) if the underlying distributional assumptions could not be made. What hypotheses would be under test?
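
The two $t$ statistics asked for in this question can be computed with a short Python sketch using only the standard library. It assumes two-sided alternatives (the question asks whether either method is better) and takes the critical values 1.753 for $t _ { 15 }$ and 2.306 for $t _ { 8 }$ from standard tables rather than computing them. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Unpaired trial: separate random samples of working days.
unpaired_a = [124.8, 136.4, 116.6, 129.1, 140.7, 120.2, 124.6, 127.5, 111.8, 130.3]
unpaired_b = [130.4, 136.2, 119.8, 150.6, 143.5, 126.1, 130.7]

# Paired trial: both methods measured on the same 9 occasions.
paired_a = [119.6, 127.6, 141.3, 139.5, 141.3, 124.1, 116.6, 136.2, 128.8]
paired_b = [112.2, 128.8, 130.2, 134.0, 135.1, 120.4, 116.9, 134.4, 125.2]


def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate; returns (t, df)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def paired_t(x, y):
    """Paired t statistic based on the differences x - y; returns (t, df)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / sqrt(variance(diffs) / n)
    return t, n - 1


# Two-sided critical values from standard t tables (assumed, not computed here).
T15_TWO_SIDED_10_PERCENT = 1.753
T8_TWO_SIDED_5_PERCENT = 2.306

t_unpaired, df_unpaired = pooled_t(unpaired_a, unpaired_b)
t_paired, df_paired = paired_t(paired_a, paired_b)

print(f"unpaired: t = {t_unpaired:.3f} on {df_unpaired} df, "
      f"reject H0: {abs(t_unpaired) > T15_TWO_SIDED_10_PERCENT}")
print(f"paired:   t = {t_paired:.3f} on {df_paired} df, "
      f"reject H0: {abs(t_paired) > T8_TWO_SIDED_5_PERCENT}")
```

The contrast between the two results illustrates the statistician's point: pairing removes the day-to-day variation that inflates the pooled variance in the unpaired comparison.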

Question 4 24 marks

Describe, with the aid of a specific example, an experimental situation for which a Latin square design is appropriate, indicating carefully the features which show that a completely randomised or randomised blocks design would be inappropriate.
The model for the one-way analysis of variance may be written, in a customary notation, as $$x _ { i j } = \mu + \alpha _ { i } + e _ { i j }$$ State the distributional assumptions underlying $e _ { i j }$ in this model. What is the interpretation of the term $\alpha _ { i }$ ?
An experiment for comparing 5 treatments is carried out, with a total of 20 observations. A partial one-way analysis of variance table for the analysis of the results is as follows.

| Source of variation | Sums of squares | Degrees of freedom | Mean squares | Mean square ratio |
|---|---|---|---|---|
| Between treatments | | | | |
| Residual | 68.76 | | | |
| Total | 161.06 | | | |

Copy and complete the table, and carry out the appropriate test using a $1 \%$ significance level.
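
The missing entries follow from the two sums of squares given and the design (5 treatments, 20 observations). A minimal Python sketch of the arithmetic is shown below; the function name and dictionary keys are illustrative, and the upper $1 \%$ point of $F _ { 4,15 }$ is taken as 4.89 from standard tables rather than computed.

```python
def complete_one_way_anova(ss_total, ss_residual, n_treatments, n_observations):
    """Fill in a one-way ANOVA table from the total and residual sums of squares."""
    df_between = n_treatments - 1
    df_total = n_observations - 1
    df_residual = df_total - df_between
    ss_between = ss_total - ss_residual
    ms_between = ss_between / df_between
    ms_residual = ss_residual / df_residual
    return {
        "ss_between": ss_between,
        "df_between": df_between,
        "df_residual": df_residual,
        "df_total": df_total,
        "ms_between": ms_between,
        "ms_residual": ms_residual,
        "f_ratio": ms_between / ms_residual,
    }


# Upper 1% point of F(4, 15) from standard tables (assumed, not computed here).
F_4_15_ONE_PERCENT = 4.89

table = complete_one_way_anova(
    ss_total=161.06, ss_residual=68.76, n_treatments=5, n_observations=20
)
for name, value in table.items():
    print(f"{name}: {value:.4g}")
print("reject H0 of equal treatment effects:", table["f_ratio"] > F_4_15_ONE_PERCENT)
```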
