OCR MEI S4 (Statistics 4) 2009 June

Question 1
1 An industrial process produces components. Some of the components contain faults. The number of faults in a component is modelled by the random variable \(X\) with probability function $$\mathrm { P } ( X = x ) = \theta ( 1 - \theta ) ^ { x } \quad \text { for } x = 0,1,2 , \ldots$$ where \(\theta\) is a parameter with \(0 < \theta < 1\). The numbers of faults in different components are independent.
A random sample of \(n\) components is inspected. \(n _ { 0 }\) are found to have no faults, \(n _ { 1 }\) to have one fault and the remainder \(\left( n - n _ { 0 } - n _ { 1 } \right)\) to have two or more faults.
  1. Find \(\mathrm { P } ( X \geqslant 2 )\) and hence show that the likelihood is $$\mathrm { L } ( \theta ) = \theta ^ { n _ { 0 } + n _ { 1 } } ( 1 - \theta ) ^ { 2 n - 2 n _ { 0 } - n _ { 1 } }$$
  2. Find the maximum likelihood estimator \(\hat { \theta }\) of \(\theta\). You are not required to verify that any turning point you locate is a maximum.
  3. Show that \(\mathrm { E } ( X ) = \frac { 1 - \theta } { \theta }\). Deduce that another plausible estimator of \(\theta\) is \(\tilde { \theta } = \frac { 1 } { 1 + \bar { X } }\) where \(\bar { X }\) is the sample mean. What additional information is needed in order to calculate the value of this estimator?
  4. You are given that, in large samples, \(\tilde { \theta }\) may be taken as Normally distributed with mean \(\theta\) and variance \(\theta ^ { 2 } ( 1 - \theta ) / n\). Use this to obtain a \(95 \%\) confidence interval for \(\theta\) for the case when 100 components are inspected and it is found that 92 have no faults, 6 have one fault and the remaining 2 have exactly four faults each.
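The arithmetic behind parts 2 and 4 can be sketched numerically as follows. This is a minimal sketch rather than a model answer. The function names are hypothetical. The closed form in `mle_theta` is what results from setting the derivative of \(\log \mathrm{L}(\theta)\) to zero. The value 1.96 is the usual two-sided 5% point of the standard Normal distribution.

```python
import math

def mle_theta(n, n0, n1):
    # Solving (n0 + n1)/theta = (2n - 2n0 - n1)/(1 - theta) for theta.
    return (n0 + n1) / (2 * n - n0)

def moment_estimate(fault_counts):
    # theta-tilde = 1 / (1 + sample mean); needs the actual fault counts,
    # not just the numbers of components with 0, 1 and 2-or-more faults.
    xbar = sum(fault_counts) / len(fault_counts)
    return 1 / (1 + xbar)

def approx_ci(theta, n, z=1.96):
    # Large-sample interval using the stated variance theta^2 (1 - theta) / n,
    # with theta replaced by its estimate (an approximation).
    se = math.sqrt(theta ** 2 * (1 - theta) / n)
    return theta - z * se, theta + z * se

# Data from part 4: 92 with no faults, 6 with one fault, 2 with four faults each.
faults = [0] * 92 + [1] * 6 + [4] * 2
theta_hat = mle_theta(n=100, n0=92, n1=6)
theta_tilde = moment_estimate(faults)
ci_low, ci_high = approx_ci(theta_tilde, len(faults))
print(theta_hat, theta_tilde, (ci_low, ci_high))
```

The two estimates differ because the maximum likelihood estimator uses only the grouped counts, whereas \(\tilde{\theta}\) uses the exact number of faults in every component.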
Question 2
2
  1. The random variable \(Z\) has the standard Normal distribution with probability density function $$\mathrm { f } ( z ) = \frac { 1 } { \sqrt { 2 \pi } } \mathrm { e } ^ { - z ^ { 2 } / 2 } , \quad - \infty < z < \infty$$ Obtain the moment generating function of \(Z\).
  2. Let \(\mathrm { M } _ { Y } ( t )\) denote the moment generating function of the random variable \(Y\). Show that the moment generating function of the random variable \(a Y + b\), where \(a\) and \(b\) are constants, is \(\mathrm { e } ^ { b t } \mathrm { M } _ { Y } ( a t )\).
  3. Use the results in parts (i) and (ii) to obtain the moment generating function \(\mathrm { M } _ { X } ( t )\) of the random variable \(X\) having the Normal distribution with parameters \(\mu\) and \(\sigma ^ { 2 }\).
  4. If \(W = \mathrm { e } ^ { X }\) where \(X\) is as in part (iii), \(W\) is said to have a lognormal distribution. Show that, for any positive integer \(k\), the expected value of \(W ^ { k }\) is \(\mathrm { M } _ { X } ( k )\). Use this result to find the expected value and variance of the lognormal distribution.
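The result in part 4 can be checked numerically. The sketch below assumes the standard closed form \(\mathrm{M}_X(t) = \mathrm{e}^{\mu t + \sigma^2 t^2 / 2}\) for the Normal moment generating function and compares it with a direct midpoint-rule integration of \(\mathrm{e}^{kx}\) against the Normal density. The function names and the values \(\mu = 0.3\), \(\sigma = 0.5\) are illustrative assumptions, not part of the question.

```python
import math

def normal_mgf(t, mu, sigma):
    # Closed form of the Normal(mu, sigma^2) moment generating function.
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

def lognormal_moment_numeric(k, mu, sigma, steps=20000, width=12.0):
    # Midpoint-rule approximation of E(W^k) = E(e^{kX}) over mu +/- width*sigma.
    a = mu - width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        pdf = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += math.exp(k * x) * pdf * h
    return total

mu, sigma = 0.3, 0.5  # illustrative values only
mean_w = normal_mgf(1, mu, sigma)                 # E(W) = M_X(1)
var_w = normal_mgf(2, mu, sigma) - mean_w ** 2    # Var(W) = M_X(2) - M_X(1)^2
print(mean_w, lognormal_moment_numeric(1, mu, sigma))
print(var_w, lognormal_moment_numeric(2, mu, sigma) - lognormal_moment_numeric(1, mu, sigma) ** 2)
```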
Question 3
3
  1. At a waste disposal station, two methods for incinerating some of the rubbish are being compared. Of interest is the amount of particulates in the exhaust, which can be measured over the working day in a convenient unit of concentration. It is assumed that the underlying distributions of concentrations of particulates are Normal. It is also assumed that the underlying variances are equal. During a period of several months, measurements are made for method A on a random sample of 10 working days and for method B on a separate random sample of 7 working days, with results, in the convenient unit, as follows.
    | Method A | 124.8 | 136.4 | 116.6 | 129.1 | 140.7 | 120.2 | 124.6 | 127.5 | 111.8 | 130.3 |
    | Method B | 130.4 | 136.2 | 119.8 | 150.6 | 143.5 | 126.1 | 130.7 |
    Use a \(t\) test at the \(10 \%\) level of significance to examine whether either method is better in resulting, on the whole, in a lower concentration of particulates. State the null and alternative hypotheses under test.
  2. The company's statistician criticises the design of the trial in part (i) on the grounds that it is not paired. Summarise the arguments the statistician will have used. A new trial is set up with a paired design, measuring the concentrations of particulates on a random sample of 9 paired occasions. The results are as follows.
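    | Pair     | I     | II    | III   | IV    | V     | VI    | VII   | VIII  | IX    |
    |----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
    | Method A | 119.6 | 127.6 | 141.3 | 139.5 | 141.3 | 124.1 | 116.6 | 136.2 | 128.8 |
    | Method B | 112.2 | 128.8 | 130.2 | 134.0 | 135.1 | 120.4 | 116.9 | 134.4 | 125.2 |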
    Use a \(t\) test at the \(5 \%\) level of significance to examine the same hypotheses as in part (i). State the underlying distributional assumption that is needed in this case.
  3. State the names of procedures that could be used in the situations of parts (i) and (ii) if the underlying distributional assumptions could not be made. What hypotheses would be under test?
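The two test statistics in this question can be sketched with the Python standard library. This is a hedged illustration, not a model answer: the function names are hypothetical, and the critical values are hard-coded from standard \(t\) tables (two-sided 10% point on 15 degrees of freedom, two-sided 5% point on 8 degrees of freedom).

```python
import math
import statistics

method_a = [124.8, 136.4, 116.6, 129.1, 140.7, 120.2, 124.6, 127.5, 111.8, 130.3]
method_b = [130.4, 136.2, 119.8, 150.6, 143.5, 126.1, 130.7]

def pooled_t(x, y):
    # Two-sample t statistic with a pooled variance estimate (equal variances assumed).
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2

paired_a = [119.6, 127.6, 141.3, 139.5, 141.3, 124.1, 116.6, 136.2, 128.8]
paired_b = [112.2, 128.8, 130.2, 134.0, 135.1, 120.4, 116.9, 134.4, 125.2]

def paired_t(x, y):
    # One-sample t statistic on the within-pair differences x - y.
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1

T15_TWO_SIDED_10_PERCENT = 1.753  # from standard t tables
T8_TWO_SIDED_5_PERCENT = 2.306    # from standard t tables

t_unpaired, df_unpaired = pooled_t(method_a, method_b)
t_paired, df_paired = paired_t(paired_a, paired_b)
print(t_unpaired, df_unpaired, abs(t_unpaired) > T15_TWO_SIDED_10_PERCENT)
print(t_paired, df_paired, abs(t_paired) > T8_TWO_SIDED_5_PERCENT)
```

Each printed line gives the statistic, its degrees of freedom and whether it exceeds the tabulated critical value in absolute size.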
Question 4
4
  1. Describe, with the aid of a specific example, an experimental situation for which a Latin square design is appropriate, indicating carefully the features which show that a completely randomised or randomised blocks design would be inappropriate.
  2. The model for the one-way analysis of variance may be written, in a customary notation, as $$x _ { i j } = \mu + \alpha _ { i } + e _ { i j }$$ State the distributional assumptions underlying \(e _ { i j }\) in this model. What is the interpretation of the term \(\alpha _ { i }\) ?
  3. An experiment for comparing 5 treatments is carried out, with a total of 20 observations. A partial one-way analysis of variance table for the analysis of the results is as follows.
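    | Source of variation | Sums of squares | Degrees of freedom | Mean squares | Mean square ratio |
    |---------------------|-----------------|--------------------|--------------|-------------------|
    | Between treatments  |                 |                    |              |                   |
    | Residual            | 68.76           |                    |              |                   |
    | Total               | 161.06          |                    |              |                   |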
    Copy and complete the table, and carry out the appropriate test using a \(1 \%\) significance level.
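The missing entries in part 3 follow from the additivity of sums of squares and degrees of freedom. The sketch below is illustrative only: the function name is hypothetical, and the 1% critical value of \(F_{4,15}\) is hard-coded from standard tables.

```python
def complete_one_way_anova(ss_total, ss_residual, treatments, observations):
    # Between-treatments SS is total minus residual; degrees of freedom split
    # as (treatments - 1) and (observations - treatments).
    ss_between = ss_total - ss_residual
    df_between = treatments - 1
    df_residual = observations - treatments
    ms_between = ss_between / df_between
    ms_residual = ss_residual / df_residual
    return {
        "ss_between": ss_between,
        "df_between": df_between,
        "df_residual": df_residual,
        "df_total": observations - 1,
        "ms_between": ms_between,
        "ms_residual": ms_residual,
        "f_ratio": ms_between / ms_residual,
    }

F_4_15_ONE_PERCENT = 4.893  # upper 1% point of F(4, 15), from standard tables

table = complete_one_way_anova(ss_total=161.06, ss_residual=68.76,
                               treatments=5, observations=20)
print(table)
print(table["f_ratio"] > F_4_15_ONE_PERCENT)
```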