OCR MEI S4 (Statistics 4) 2008 June

Question 1
1 The random variable \(X\) has the Poisson distribution with parameter \(\theta\) so that its probability function is $$\mathrm { P } ( X = x ) = \frac { \mathrm { e } ^ { - \theta } \theta ^ { x } } { x ! } , \quad x = 0,1,2 , \ldots$$ where \(\theta ( \theta > 0 )\) is unknown. A random sample of \(n\) observations from \(X\) is denoted by \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\).
  1. Find \(\hat { \theta }\), the maximum likelihood estimator of \(\theta\).

     The value of \(\mathrm { P } ( X = 0 )\) is denoted by \(\lambda\).
  2. Write down an expression for \(\lambda\) in terms of \(\theta\).
  3. Let \(R\) denote the number of observations in the sample with value zero. By considering the binomial distribution with parameters \(n\) and \(\mathrm { e } ^ { - \theta }\), write down \(\mathrm { E } ( R )\) and \(\operatorname { Var } ( R )\). Deduce that the observed proportion of observations in the sample with value zero, denoted by \(\tilde { \lambda }\), is an unbiased estimator of \(\lambda\) with variance \(\frac { \mathrm { e } ^ { - \theta } \left( 1 - \mathrm { e } ^ { - \theta } \right) } { n }\).
  4. In large samples, the variance of the maximum likelihood estimator of \(\lambda\) may be taken as \(\frac { \theta \mathrm { e } ^ { - 2 \theta } } { n }\). Use this and the appropriate result from part 3 to show that the relative efficiency of \(\tilde { \lambda }\) with respect to the maximum likelihood estimator is \(\frac { \theta } { \mathrm { e } ^ { \theta } - 1 }\). Show that this expression is always less than 1. Show also that it is near 1 if \(\theta\) is small and near 0 if \(\theta\) is large.
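The behaviour of the relative efficiency in the last part of Question 1 can be checked numerically. The sketch below is only an illustration, not a substitute for the algebraic argument the question asks for: it forms the ratio of the two variances quoted in the question (the factor \(1/n\) cancels) and compares it with the closed form \(\theta / (\mathrm{e}^{\theta} - 1)\). The function name and the chosen values of \(\theta\) are assumptions made for this example.

```python
import math

def relative_efficiency(theta: float) -> float:
    """Var(MLE of lambda) / Var(lambda-tilde); the common factor 1/n cancels."""
    var_mle = theta * math.exp(-2 * theta)                  # n * large-sample variance of the MLE
    var_prop = math.exp(-theta) * (1 - math.exp(-theta))    # n * variance of the observed proportion
    return var_mle / var_prop

# Illustrative values of theta (chosen for this sketch, not taken from the question)
for theta in (0.01, 0.5, 1.0, 5.0, 20.0):
    closed_form = theta / math.expm1(theta)                 # theta / (e^theta - 1)
    print(f"theta = {theta:5.2f}  ratio = {relative_efficiency(theta):.6f}  "
          f"closed form = {closed_form:.6f}")
```

The printed ratios stay below 1, approach 1 for small \(\theta\) and approach 0 for large \(\theta\), which is what the question asks to be shown analytically.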
Question 2
2 Independent trials, on each of which the probability of a 'success' is \(p ( 0 < p < 1 )\), are being carried out. The random variable \(X\) counts the number of trials up to and including that on which the first success is obtained. The random variable \(Y\) counts the number of trials up to and including that on which the \(n\)th success is obtained.
  1. Write down an expression for \(\mathrm { P } ( X = x )\) for \(x = 1,2 , \ldots\). Show that the probability generating function of \(X\) is $$\mathrm { G } ( t ) = p t ( 1 - q t ) ^ { - 1 }$$ where \(q = 1 - p\), and hence that the mean and variance of \(X\) are $$\mu = \frac { 1 } { p } \quad \text { and } \quad \sigma ^ { 2 } = \frac { q } { p ^ { 2 } }$$ respectively.
  2. Explain why the random variable \(Y\) can be written as $$Y = X _ { 1 } + X _ { 2 } + \ldots + X _ { n }$$ where the \(X _ { i }\) are independent random variables each distributed as \(X\). Hence write down the probability generating function, the mean and the variance of \(Y\).
  3. State an approximation to the distribution of \(Y\) for large \(n\).
  4. The aeroplane used on a certain flight seats 140 passengers. The airline seeks to fill the plane, but its experience is that not all the passengers who buy tickets will turn up for the flight. It uses the random variable \(Y\) to model the situation, with \(p = 0.8\) as the probability that a passenger turns up. Find the probability that it needs to sell at least 160 tickets to get 140 passengers who turn up. Suggest a reason why the model might not be appropriate.
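For the airline calculation in the last part of Question 2, a minimal numerical sketch is given here. It assumes the Normal approximation from part 3 with mean \(n/p\) and variance \(nq/p^{2}\), and it applies a continuity correction, which is a choice made for this example rather than something the question specifies. As a cross-check it also uses the fact that \(Y \ge 160\) exactly when the first 159 tickets produce at most 139 passengers who turn up, which is a binomial probability. All variable names are illustrative.

```python
import math

def phi(z: float) -> float:
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 140, 0.8            # seats to fill, probability a ticket holder turns up
q = 1 - p
mean_y = n / p             # mean of Y, the sum of n independent geometric variables
var_y = n * q / p**2       # variance of Y

# Normal approximation to P(Y >= 160), with a continuity correction (an assumption here)
approx = 1 - phi((159.5 - mean_y) / math.sqrt(var_y))

# Exact cross-check: Y >= 160 iff at most 139 successes occur in the first 159 trials
exact = sum(math.comb(159, k) * p**k * q**(159 - k) for k in range(140))

print(f"E(Y) = {mean_y:.1f}, Var(Y) = {var_y:.2f}")
print(f"Normal approximation: {approx:.4f}")
print(f"Exact binomial value: {exact:.4f}")
```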
Question 3
3
  1. Explain the meaning of the following terms in the context of hypothesis testing: Type I error, Type II error, operating characteristic.

     A machine fills salt containers that will be sold in shops. The containers are supposed to contain 750 g of salt. The machine operates in such a way that the amount of salt delivered to each container is a Normally distributed random variable with standard deviation 20 g. The machine should be calibrated in such a way that the mean amount delivered, \(\mu\), is 750 g. Each hour, a random sample of 9 containers is taken from the previous hour's output and the sample mean amount of salt is determined. If this is between 735 g and 765 g, the previous hour's output is accepted. If not, the previous hour's output is rejected and the machine is recalibrated.
  2. Find the probability of rejecting the previous hour's output if the machine is properly calibrated. Comment on your result.
  3. Find the probability of accepting the previous hour's output if \(\mu = 725 \mathrm {~g}\). Comment on your result.
  4. Obtain an expression for the operating characteristic of this testing procedure in terms of the cumulative distribution function \(\Phi ( z )\) of the standard Normal distribution. Evaluate the operating characteristic for the following values (in g) of \(\mu\): 720, 730, 740, 750, 760, 770, 780.
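The acceptance probabilities in Question 3 can be tabulated with a short sketch. It assumes the operating characteristic is taken as the probability of accepting the output for a given \(\mu\), and uses the fact that the mean of 9 containers is Normal with standard deviation \(20/\sqrt{9}\) g. The constant and function names below are illustrative only.

```python
import math

SIGMA = 20.0                     # standard deviation of a single fill, in g
N = 9                            # containers in each hourly sample
LOWER, UPPER = 735.0, 765.0      # acceptance limits for the sample mean, in g

def phi(z: float) -> float:
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def operating_characteristic(mu: float) -> float:
    """P(sample mean lies between LOWER and UPPER) when the true mean is mu."""
    se = SIGMA / math.sqrt(N)    # standard deviation of the sample mean
    return phi((UPPER - mu) / se) - phi((LOWER - mu) / se)

for mu in (720, 730, 740, 750, 760, 770, 780):
    print(f"mu = {mu} g: P(accept) = {operating_characteristic(mu):.4f}")
```

The rejection probability when the machine is properly calibrated (part 2) is one minus the value at \(\mu = 750\), and part 3 corresponds to `operating_characteristic(725)`.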
Question 4
4
  1. State the usual model, including the accompanying distributional assumptions, for the one-way analysis of variance. Interpret the terms in the model.
  2. An examinations authority is considering using an external contractor for the typesetting and printing of its examination papers. Four contractors are being investigated. A random sample of 20 examination papers over the entire range covered by the authority is selected and 5 are allocated at random to each contractor for preparation. The authority carefully checks the printed papers for errors and assigns a score to each to indicate the overall quality (higher scores represent better quality). The scores are as follows.
    | Contractor A | Contractor B | Contractor C | Contractor D |
    |---|---|---|---|
    | 41 | 54 | 56 | 41 |
    | 49 | 45 | 45 | 36 |
    | 50 | 50 | 54 | 46 |
    | 44 | 50 | 50 | 38 |
    | 56 | 47 | 49 | 35 |
    [The sum of these data items is 936 and the sum of their squares is 44544.]
    Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a \(5 \%\) significance level. Report briefly on your conclusions.
  3. The authority thinks that there might be differences in the ways the contractors cope with the preparation of examination papers in different subject areas. For this purpose, the subject areas are broadly divided into mathematics, sciences, languages, humanities, and others. The authority wishes to design a further investigation, ensuring that each of these subject areas is covered by each contractor. Name the experimental design that should be used and describe briefly the layout of the investigation.
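The sums of squares for the one-way analysis of variance in part 2 of Question 4 can be computed as in the sketch below. It uses only the scores given in the question. The critical value is the tabulated upper 5% point of \(F_{3,16}\), entered by hand rather than computed. Variable names are illustrative.

```python
scores = {
    "A": [41, 49, 50, 44, 56],
    "B": [54, 45, 50, 50, 47],
    "C": [56, 45, 54, 50, 49],
    "D": [41, 36, 46, 38, 35],
}

all_scores = [x for group in scores.values() for x in group]
n_total = len(all_scores)        # 20 papers in total
k = len(scores)                  # 4 contractors

correction = sum(all_scores) ** 2 / n_total
ss_total = sum(x * x for x in all_scores) - correction
ss_between = sum(sum(g) ** 2 / len(g) for g in scores.values()) - correction
ss_within = ss_total - ss_between

df_between, df_within = k - 1, n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

F_CRIT_5PCT = 3.24               # upper 5% point of F(3, 16), taken from standard tables

print(f"{'Source':<20}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between contractors':<20}{ss_between:>10.1f}{df_between:>5}{ms_between:>10.2f}{f_ratio:>8.2f}")
print(f"{'Residual':<20}{ss_within:>10.1f}{df_within:>5}{ms_within:>10.2f}")
print(f"{'Total':<20}{ss_total:>10.1f}{n_total - 1:>5}")
print("Reject equal means at 5%:", f_ratio > F_CRIT_5PCT)
```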