Questions S4 (270 questions)

OCR MEI S4 2007 June Q4
4 An agricultural company conducts a trial of five fertilisers (A, B, C, D, E) in an experimental field at its research station. The fertilisers are applied to plots of the field according to a completely randomised design. The yields of the crop from the plots, measured in a standard unit, are analysed by the one-way analysis of variance, from which it appears that there are no real differences among the effects of the fertilisers. A statistician notes that the residual mean square in the analysis of variance is considerably larger than had been anticipated from knowledge of the general behaviour of the crop, and therefore suspects that there is some inadequacy in the design of the trial.
  1. Explain briefly why the statistician should be suspicious of the design.
  2. Explain briefly why an inflated residual leads to difficulty in interpreting the results of the analysis of variance, in particular that the null hypothesis is more likely to be accepted erroneously.
Further investigation indicates that the soil at the west side of the experimental field is naturally more fertile than that at the east side, with a consistent 'fertility gradient' from west to east.
  3. What experimental design can accommodate this feature? Provide a simple diagram of the experimental field indicating a suitable layout.
The company decides to conduct a new trial in its glasshouse, where experimental conditions can be controlled so that a completely randomised design is appropriate. The yields are as follows.

| Fertiliser A | Fertiliser B | Fertiliser C | Fertiliser D | Fertiliser E |
|---|---|---|---|---|
| 23.6 | 26.0 | 18.8 | 29.0 | 17.7 |
| 18.2 | 35.3 | 16.7 | 37.2 | 16.5 |
| 32.4 | 30.5 | 23.0 | 32.6 | 12.8 |
| 20.8 | 31.4 | 28.3 | 31.4 | 20.4 |

[The sum of these data items is 502.6 and the sum of their squares is 13610.22.]
  4. Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a \(5 \%\) significance level. Report briefly on your conclusions.
  5. State the assumptions about the distribution of the experimental error that underlie your analysis in part (iv).
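As a cross-check on the arithmetic for part (iv), the sketch below computes the sums of squares for a one-way analysis of variance directly from the glasshouse yields. The names `yields` and `one_way_anova` are illustrative and not part of the question; the 5% critical value still has to be read from F tables.

```python
# Yields from the glasshouse trial, one list per fertiliser (the table columns).
yields = {
    "A": [23.6, 18.2, 32.4, 20.8],
    "B": [26.0, 35.3, 30.5, 31.4],
    "C": [18.8, 16.7, 23.0, 28.3],
    "D": [29.0, 37.2, 32.6, 31.4],
    "E": [17.7, 16.5, 12.8, 20.4],
}


def one_way_anova(groups):
    """Return (ss_between, df_between, ss_residual, df_residual, f_ratio)."""
    all_values = [y for g in groups.values() for y in g]
    n_total = len(all_values)
    correction = sum(all_values) ** 2 / n_total
    ss_total = sum(y * y for y in all_values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
    ss_residual = ss_total - ss_between
    df_between = len(groups) - 1
    df_residual = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_residual / df_residual)
    return ss_between, df_between, ss_residual, df_residual, f_ratio


ss_b, df_b, ss_r, df_r, f_ratio = one_way_anova(yields)
print(f"Between fertilisers: SS = {ss_b:.3f}, df = {df_b}")
print(f"Residual:            SS = {ss_r:.3f}, df = {df_r}")
print(f"F ratio = {f_ratio:.3f}")
```

The F ratio is then compared with the upper 5% point of the F distribution on (4, 15) degrees of freedom.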
OCR MEI S4 2008 June Q1
1 The random variable \(X\) has the Poisson distribution with parameter \(\theta\) so that its probability function is $$\mathrm { P } ( X = x ) = \frac { \mathrm { e } ^ { - \theta } \theta ^ { x } } { x ! } , \quad x = 0,1,2 , \ldots$$ where \(\theta ( \theta > 0 )\) is unknown. A random sample of \(n\) observations from \(X\) is denoted by \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\).
  1. Find \(\hat { \theta }\), the maximum likelihood estimator of \(\theta\).
The value of \(\mathrm { P } ( X = 0 )\) is denoted by \(\lambda\).
  2. Write down an expression for \(\lambda\) in terms of \(\theta\).
  3. Let \(R\) denote the number of observations in the sample with value zero. By considering the binomial distribution with parameters \(n\) and \(\mathrm { e } ^ { - \theta }\), write down \(\mathrm { E } ( R )\) and \(\operatorname { Var } ( R )\). Deduce that the observed proportion of observations in the sample with value zero, denoted by \(\tilde { \lambda }\), is an unbiased estimator of \(\lambda\) with variance \(\frac { \mathrm { e } ^ { - \theta } \left( 1 - \mathrm { e } ^ { - \theta } \right) } { n }\).
  4. In large samples, the variance of the maximum likelihood estimator of \(\lambda\) may be taken as \(\frac { \theta \mathrm { e } ^ { - 2 \theta } } { n }\). Use this and the appropriate result from part (iii) to show that the relative efficiency of \(\tilde { \lambda }\) with respect to the maximum likelihood estimator is \(\frac { \theta } { \mathrm { e } ^ { \theta } - 1 }\). Show that this expression is always less than 1 . Show also that it is near 1 if \(\theta\) is small and near 0 if \(\theta\) is large.
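The limiting behaviour asked for in part (iv) can be explored numerically. The sketch below takes the two variances quoted in parts (iii) and (iv), with the common factor \(1/n\) dropped, and compares their ratio with the closed form \(\theta / (\mathrm{e}^{\theta} - 1)\). The function name is an illustrative choice.

```python
import math


def relative_efficiency(theta):
    """Ratio Var(MLE of lambda) / Var(lambda tilde); the 1/n factors cancel."""
    var_mle = theta * math.exp(-2 * theta)
    var_proportion = math.exp(-theta) * (1 - math.exp(-theta))
    return var_mle / var_proportion


for theta in (0.01, 0.5, 1.0, 5.0, 10.0):
    closed_form = theta / math.expm1(theta)  # theta / (e^theta - 1)
    print(f"theta = {theta:5.2f}: ratio = {relative_efficiency(theta):.6f}, "
          f"closed form = {closed_form:.6f}")
```

This is only a numerical illustration; the question still requires an algebraic argument that the ratio is below 1 for every \(\theta > 0\).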
OCR MEI S4 2008 June Q2
2 Independent trials, on each of which the probability of a 'success' is \(p ( 0 < p < 1 )\), are being carried out. The random variable \(X\) counts the number of trials up to and including that on which the first success is obtained. The random variable \(Y\) counts the number of trials up to and including that on which the \(n\)th success is obtained.
  1. Write down an expression for \(\mathrm { P } ( X = x )\) for \(x = 1,2 , \ldots\). Show that the probability generating function of \(X\) is $$\mathrm { G } ( t ) = p t ( 1 - q t ) ^ { - 1 }$$ where \(q = 1 - p\), and hence that the mean and variance of \(X\) are $$\mu = \frac { 1 } { p } \quad \text { and } \quad \sigma ^ { 2 } = \frac { q } { p ^ { 2 } }$$ respectively.
  2. Explain why the random variable \(Y\) can be written as $$Y = X _ { 1 } + X _ { 2 } + \ldots + X _ { n }$$ where the \(X _ { i }\) are independent random variables each distributed as \(X\). Hence write down the probability generating function, the mean and the variance of \(Y\).
  3. State an approximation to the distribution of \(Y\) for large \(n\).
  4. The aeroplane used on a certain flight seats 140 passengers. The airline seeks to fill the plane, but its experience is that not all the passengers who buy tickets will turn up for the flight. It uses the random variable \(Y\) to model the situation, with \(p = 0.8\) as the probability that a passenger turns up. Find the probability that it needs to sell at least 160 tickets to get 140 passengers who turn up. Suggest a reason why the model might not be appropriate.
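For part (iv), a minimal sketch of the Normal approximation, using the mean \(n/p\) and variance \(nq/p^{2}\) from part (ii). The continuity correction is an assumption of this sketch, not something the question specifies. The exact value is included as a cross-check: at least 160 tickets are needed exactly when the first 159 ticket holders include at most 139 who turn up.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 140, 0.8
q = 1 - p
mean_y = n / p            # n * (1/p)
var_y = n * q / p ** 2    # n * (q/p^2)

# Normal approximation with a continuity correction: P(Y >= 160) ~ P(N > 159.5).
approx = 1 - NormalDist(mean_y, sqrt(var_y)).cdf(159.5)

# Exact cross-check via the binomial count of arrivals among 159 ticket holders.
exact = sum(comb(159, k) * p ** k * q ** (159 - k) for k in range(140))

print(f"mean = {mean_y:.2f}, variance = {var_y:.2f}")
print(f"Normal approximation: {approx:.4f}")
print(f"Exact probability:    {exact:.4f}")
```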
OCR MEI S4 2008 June Q3
3
  1. Explain the meaning of the following terms in the context of hypothesis testing: Type I error, Type II error, operating characteristic.
A machine fills salt containers that will be sold in shops. The containers are supposed to contain 750 g of salt. The machine operates in such a way that the amount of salt delivered to each container is a Normally distributed random variable with standard deviation 20 g. The machine should be calibrated in such a way that the mean amount delivered, \(\mu\), is 750 g. Each hour, a random sample of 9 containers is taken from the previous hour's output and the sample mean amount of salt is determined. If this is between 735 g and 765 g, the previous hour's output is accepted. If not, the previous hour's output is rejected and the machine is recalibrated.
  2. Find the probability of rejecting the previous hour's output if the machine is properly calibrated. Comment on your result.
  3. Find the probability of accepting the previous hour's output if \(\mu = 725 \mathrm {~g}\). Comment on your result.
  4. Obtain an expression for the operating characteristic of this testing procedure in terms of the cumulative distribution function \(\Phi ( z )\) of the standard Normal distribution. Evaluate the operating characteristic for the following values (in g) of \(\mu\) : 720, 730, 740, 750, 760, 770, 780.
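The operating characteristic in part (iv) is the probability that the sample mean falls between 735 g and 765 g, viewed as a function of \(\mu\). A minimal sketch, assuming the stated standard deviation of 20 g and sample size 9 (the constant and function names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

SIGMA, SAMPLE_SIZE = 20.0, 9
LOWER, UPPER = 735.0, 765.0
phi = NormalDist().cdf  # standard Normal cdf, Phi(z)


def operating_characteristic(mu):
    """P(accept the hour's output) when the true mean is mu."""
    se = SIGMA / sqrt(SAMPLE_SIZE)
    return phi((UPPER - mu) / se) - phi((LOWER - mu) / se)


for mu in (720, 730, 740, 750, 760, 770, 780):
    print(f"mu = {mu}: OC = {operating_characteristic(mu):.4f}")
```

The value at \(\mu = 750\) is one minus the rejection probability in part (ii), and the value at \(\mu = 725\) is the acceptance probability in part (iii).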
OCR MEI S4 2008 June Q4
4
  1. State the usual model, including the accompanying distributional assumptions, for the one-way analysis of variance. Interpret the terms in the model.
  2. An examinations authority is considering using an external contractor for the typesetting and printing of its examination papers. Four contractors are being investigated. A random sample of 20 examination papers over the entire range covered by the authority is selected and 5 are allocated at random to each contractor for preparation. The authority carefully checks the printed papers for errors and assigns a score to each to indicate the overall quality (higher scores represent better quality). The scores are as follows.
    | Contractor A | Contractor B | Contractor C | Contractor D |
    |---|---|---|---|
    | 41 | 54 | 56 | 41 |
    | 49 | 45 | 45 | 36 |
    | 50 | 50 | 54 | 46 |
    | 44 | 50 | 50 | 38 |
    | 56 | 47 | 49 | 35 |

    [The sum of these data items is 936 and the sum of their squares is 44544.]
    Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a \(5 \%\) significance level. Report briefly on your conclusions.
  3. The authority thinks that there might be differences in the ways the contractors cope with the preparation of examination papers in different subject areas. For this purpose, the subject areas are broadly divided into mathematics, sciences, languages, humanities, and others. The authority wishes to design a further investigation, ensuring that each of these subject areas is covered by each contractor. Name the experimental design that should be used and describe briefly the layout of the investigation.
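For the analysis of variance in part (ii), the bracketed totals allow a shortcut: the total sum of squares comes from the quoted sum of squares, and the between-contractors sum of squares from the column totals. The sketch below follows that route. The column totals 240, 246, 254 and 196 were obtained by adding the scores for each contractor, and the variable names are illustrative.

```python
# Column totals for contractors A-D; grand total and raw sum of squares
# are the bracketed values quoted in the question.
totals = {"A": 240, "B": 246, "C": 254, "D": 196}
n_per_group = 5
grand_total, sum_of_squares = 936, 44544

n_total = n_per_group * len(totals)
correction = grand_total ** 2 / n_total            # (grand total)^2 / N
ss_total = sum_of_squares - correction
ss_between = sum(t ** 2 for t in totals.values()) / n_per_group - correction
ss_residual = ss_total - ss_between

df_between = len(totals) - 1
df_residual = n_total - len(totals)
f_ratio = (ss_between / df_between) / (ss_residual / df_residual)

print(f"Between contractors: SS = {ss_between:.1f}, df = {df_between}")
print(f"Residual:            SS = {ss_residual:.1f}, df = {df_residual}")
print(f"Total:               SS = {ss_total:.1f}, df = {n_total - 1}")
print(f"F ratio = {f_ratio:.3f}")
```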
OCR MEI S4 2010 June Q1
1 The random variable \(X\) has probability density function $$\mathrm { f } ( x ) = \frac { x \mathrm { e } ^ { - x / \lambda } } { \lambda ^ { 2 } } \quad ( x > 0 )$$ where \(\lambda\) is a parameter \(( \lambda > 0 )\). \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\) are \(n\) independent observations on \(X\), and \(\bar { X } = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } X _ { i }\) is their mean.
  1. Obtain \(\mathrm { E } ( X )\) and deduce that \(\hat { \lambda } = \frac { 1 } { 2 } \bar { X }\) is an unbiased estimator of \(\lambda\).
  2. Obtain \(\operatorname { Var } ( \hat { \lambda } )\).
  3. Explain why the results in parts (i) and (ii) indicate that \(\hat { \lambda }\) is a good estimator of \(\lambda\) in large samples.
  4. Suppose that \(n = 3\) and consider the alternative estimator $$\tilde { \lambda } = \frac { 1 } { 8 } X _ { 1 } + \frac { 1 } { 4 } X _ { 2 } + \frac { 1 } { 8 } X _ { 3 } .$$ Show that \(\tilde { \lambda }\) is an unbiased estimator of \(\lambda\). Find the relative efficiency of \(\tilde { \lambda }\) compared with \(\hat { \lambda }\). Which estimator do you prefer in this case?
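Both estimators in part (iv) are linear combinations of the observations, so their means and variances follow from the weights alone. The sketch below assumes \(\mathrm{E}(X) = 2\lambda\) and \(\operatorname{Var}(X) = 2\lambda^{2}\), which are the results this density gives in parts (i) and (ii), and works in units of \(\lambda\) and \(\lambda^{2}\). The helper names are illustrative.

```python
from fractions import Fraction

MEAN_X = Fraction(2)  # E(X) in units of lambda (assumed from part (i))
VAR_X = Fraction(2)   # Var(X) in units of lambda^2 (assumed from part (ii))

# lambda-hat = (1/2) * mean of three observations, i.e. weight 1/6 on each.
weights_hat = [Fraction(1, 6)] * 3
weights_tilde = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 8)]


def expectation(weights):
    return sum(weights) * MEAN_X


def variance(weights):
    # Independent observations: variances add with squared weights.
    return sum(w * w for w in weights) * VAR_X


efficiency = variance(weights_hat) / variance(weights_tilde)
print("E(hat), E(tilde) in units of lambda:",
      expectation(weights_hat), expectation(weights_tilde))
print("Var(hat), Var(tilde) in units of lambda^2:",
      variance(weights_hat), variance(weights_tilde))
print("Relative efficiency of tilde with respect to hat:", efficiency)
```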
OCR MEI S4 2010 June Q2
2 The random variable \(X\) has the Poisson distribution with parameter \(\lambda\).
  1. Show that the probability generating function of \(X\) is \(\mathrm { G } ( t ) = \mathrm { e } ^ { \lambda ( t - 1 ) }\).
  2. Hence obtain the mean \(\mu\) and variance \(\sigma ^ { 2 }\) of \(X\).
  3. Write down the mean and variance of the random variable \(Z = \frac { X - \mu } { \sigma }\).
  4. Write down the moment generating function of \(X\). State the linear transformation result for moment generating functions and use it to show that the moment generating function of \(Z\) is $$\mathrm { M } _ { Z } ( \theta ) = \mathrm { e } ^ { \mathrm { f } ( \theta ) } \quad \text { where } \mathrm { f } ( \theta ) = \lambda \left( \mathrm { e } ^ { \theta / \sqrt { \lambda } } - \frac { \theta } { \sqrt { \lambda } } - 1 \right)$$
  5. Show that the limit of \(\mathrm { M } _ { Z } ( \theta )\) as \(\lambda \rightarrow \infty\) is \(\mathrm { e } ^ { \theta ^ { 2 } / 2 }\).
  6. Explain briefly why this implies that the distribution of \(Z\) tends to \(\mathrm { N } ( 0,1 )\) as \(\lambda \rightarrow \infty\). What does this imply about the distribution of \(X\) as \(\lambda \rightarrow \infty\) ?
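The limit in part (v) can be illustrated numerically by evaluating \(\mathrm{f}(\theta)\) from part (iv) for increasing \(\lambda\). This is a sketch rather than a proof; it uses `math.expm1` to avoid cancellation when \(\theta / \sqrt{\lambda}\) is small, and the function name is illustrative.

```python
from math import expm1, sqrt


def log_mgf_z(theta, lam):
    """f(theta) = lam * (exp(theta/sqrt(lam)) - theta/sqrt(lam) - 1)."""
    u = theta / sqrt(lam)
    # expm1(u) = e^u - 1, computed accurately for small u.
    return lam * (expm1(u) - u)


theta = 1.0
for lam in (1, 10, 100, 10_000, 1_000_000):
    print(f"lambda = {lam:>9}: f(theta) = {log_mgf_z(theta, lam):.6f} "
          f"(theta^2 / 2 = {theta ** 2 / 2})")
```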
OCR MEI S4 2010 June Q3
3 At a factory, two production lines are in use for making steel rods. A critical dimension is the diameter of a rod. For the first production line, it is assumed from experience that the diameters are Normally distributed with standard deviation 1.2 mm. For the second production line, it is assumed from experience that the diameters are Normally distributed with standard deviation 1.4 mm. It is desired to test whether the mean diameters for the two production lines, \(\mu _ { 1 }\) and \(\mu _ { 2 }\), are equal. A random sample of 8 rods is taken from the first production line and, independently, a random sample of 10 rods is taken from the second production line.
  1. Find the acceptance region for the customary test based on the Normal distribution for the null hypothesis \(\mu _ { 1 } = \mu _ { 2 }\), against the alternative hypothesis \(\mu _ { 1 } \neq \mu _ { 2 }\), at the \(5 \%\) level of significance.
  2. The sample means are found to be 25.8 mm and 24.4 mm respectively. What is the result of the test? Provide a two-sided \(99 \%\) confidence interval for \(\mu _ { 1 } - \mu _ { 2 }\).
The production lines are modified so that the diameters may be assumed to be of equal (but unknown) variance. However, they may no longer be Normally distributed. A two-sided test of the equality of the population medians is required, at the \(5 \%\) significance level.
  3. The diameters in independent random samples of sizes 6 and 8 are as follows, in mm.
    First production line: 25.9, 25.8, 25.3, 24.7, 24.4, 25.4
    Second production line: 23.8, 25.6, 24.0, 23.5, 24.1, 24.5, 24.3, 25.1
    Use an appropriate procedure to carry out the test.
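Parts (i) and (ii) rest on the standard error of \(\bar{X}_1 - \bar{X}_2\) with known standard deviations. A minimal sketch of the acceptance region and the 99% interval, with illustrative variable names and percentage points taken from `statistics.NormalDist` rather than from printed tables:

```python
from math import sqrt
from statistics import NormalDist

sigma1, n1 = 1.2, 8
sigma2, n2 = 1.4, 10
se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # s.e. of the difference in means

z_95 = NormalDist().inv_cdf(0.975)  # two-sided 5% point
z_99 = NormalDist().inv_cdf(0.995)  # two-sided 1% point

# Accept H0 when |xbar1 - xbar2| is below this half-width.
half_width_95 = z_95 * se

observed_diff = 25.8 - 24.4
reject = abs(observed_diff) > half_width_95
ci_99 = (observed_diff - z_99 * se, observed_diff + z_99 * se)

print(f"standard error = {se:.4f}")
print(f"acceptance region: |xbar1 - xbar2| < {half_width_95:.4f}")
print(f"observed difference = {observed_diff:.1f}, reject H0: {reject}")
print(f"99% CI for mu1 - mu2: ({ci_99[0]:.3f}, {ci_99[1]:.3f})")
```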
OCR MEI S4 2010 June Q4
4 At an agricultural research station, a trial is made of four varieties (A, B, C, D) of a certain crop in an experimental field. The varieties are grown on plots in the field and their yields are measured in a standard unit.
  1. It is at first thought that there may be a consistent trend in the natural fertility of the soil in the field from the west side to the east, though no other trends are known. Name an experimental design that should be used in these circumstances and give an example of an experimental layout.
Initial analysis suggests that any natural fertility trend may in fact be ignored, so the data from the trial are analysed by one-way analysis of variance.
  2. The usual model for one-way analysis of variance of the yields \(y _ { i j }\) may be written as $$y _ { i j } = \mu + \alpha _ { i } + e _ { i j }$$ where the \(e _ { i j }\) represent the experimental errors. Interpret the other terms in the model. State the usual distributional assumptions for the \(e _ { i j }\).
  3. The data for the yields are as follows, each variety having been used on 5 plots.
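    | Variety A | Variety B | Variety C | Variety D |
    |---|---|---|---|
    | 12.3 | 14.2 | 14.1 | 13.6 |
    | 11.9 | 13.1 | 13.2 | 12.8 |
    | 12.8 | 13.1 | 14.6 | 13.3 |
    | 12.2 | 12.5 | 13.7 | 14.3 |
    | 13.5 | 12.7 | 13.4 | 13.8 |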
    $$\left[ \Sigma \Sigma y _ { i j } = 265.1 , \quad \Sigma \Sigma y _ { i j } ^ { 2 } = 3524.31 . \right]$$ Construct the usual one-way analysis of variance table and carry out the usual test, at the 5\% significance level. Report briefly on your conclusions.
OCR MEI S4 2012 June Q1
1 In a certain country, any baby born is equally likely to be a boy or a girl, independently for all births. The birthweight of a baby boy is given by the continuous random variable \(X _ { B }\) with probability density function (pdf) \(\mathrm { f } _ { B } ( x )\) and cumulative distribution function (cdf) \(\mathrm { F } _ { B } ( x )\). The birthweight of a baby girl is given by the continuous random variable \(X _ { G }\) with pdf \(\mathrm { f } _ { G } ( x )\) and cdf \(\mathrm { F } _ { G } ( x )\). The continuous random variable \(X\) denotes the birthweight of a baby selected at random.
  1. By considering $$\mathrm { P } ( X \leqslant x ) = \mathrm { P } ( X \leqslant x \mid \text { boy } ) \mathrm { P } ( \text { boy } ) + \mathrm { P } ( X \leqslant x \mid \text { girl } ) \mathrm { P } ( \text { girl } ) ,$$ find the cdf of \(X\) in terms of \(\mathrm { F } _ { B } ( x )\) and \(\mathrm { F } _ { G } ( x )\), and deduce that the pdf of \(X\) is $$\mathrm { f } ( x ) = \frac { 1 } { 2 } \left\{ \mathrm { f } _ { B } ( x ) + \mathrm { f } _ { G } ( x ) \right\} .$$
  2. The birthweights of baby boys and girls have means \(\mu _ { B }\) and \(\mu _ { G }\) respectively. Deduce that $$\mathrm { E } ( X ) = \frac { 1 } { 2 } \left( \mu _ { B } + \mu _ { G } \right) .$$
  3. The birthweights of baby boys and girls have common variance \(\sigma ^ { 2 }\). Find an expression for \(\mathrm { E } \left( X ^ { 2 } \right)\) in terms of \(\mu _ { B } , \mu _ { G }\) and \(\sigma ^ { 2 }\), and deduce that $$\operatorname { Var } ( X ) = \sigma ^ { 2 } + \frac { 1 } { 4 } \left( \mu _ { B } - \mu _ { G } \right) ^ { 2 } .$$
  4. A random sample of size \(2 n\) is taken from all the babies born in a certain period. The mean birthweight of the babies in this sample is \(\bar { X }\). Write down an approximation to the sampling distribution of \(\bar { X }\) if \(n\) is large.
  5. Suppose instead that a stratified sample of size \(2 n\) is taken by selecting \(n\) baby boys at random and, independently, \(n\) baby girls at random. The mean birthweight of the \(2 n\) babies in this sample is \(\bar { X } _ { s t }\). Write down the expected value of \(\bar { X } _ { s t }\) and find the variance of \(\bar { X } _ { s t }\).
  6. Deduce that both \(\bar { X }\) and \(\bar { X } _ { s t }\) are unbiased estimators of the population mean birthweight. Find which is the more efficient.
OCR MEI S4 2012 June Q2
2 The random variable \(X ( X = 1,2,3,4,5,6 )\) denotes the score when a fair six-sided die is rolled.
  1. Write down the mean of \(X\) and show that \(\operatorname { Var } ( X ) = \frac { 35 } { 12 }\).
  2. Show that \(\mathrm { G } ( t )\), the probability generating function (pgf) of \(X\), is given by $$\mathrm { G } ( t ) = \frac { t \left( 1 - t ^ { 6 } \right) } { 6 ( 1 - t ) }$$
The random variable \(N ( N = 0,1,2 , \ldots )\) denotes the number of heads obtained when an unbiased coin is tossed repeatedly until a tail is first obtained.
  3. Show that \(\mathrm { P } ( N = r ) = \left( \frac { 1 } { 2 } \right) ^ { r + 1 }\) for \(r = 0,1,2 , \ldots\).
  4. Hence show that \(\mathrm { H } ( t )\), the pgf of \(N\), is given by \(\mathrm { H } ( t ) = ( 2 - t ) ^ { - 1 }\).
  5. Use \(\mathrm { H } ( t )\) to find the mean and variance of \(N\).
A game consists of tossing an unbiased coin repeatedly until a tail is first obtained and, each time a head is obtained in this sequence of tosses, rolling a fair six-sided die. The die is not rolled on the first occasion that a tail is obtained and the game ends at that point. The random variable \(Q ( Q = 0,1,2 , \ldots )\) denotes the total score on all the rolls of the die. Thus, in the notation above, \(Q = X _ { 1 } + X _ { 2 } + \ldots + X _ { N }\) where the \(X _ { i }\) are independent random variables each distributed as \(X\), with \(Q = 0\) if \(N = 0\). The pgf of \(Q\) is denoted by \(\mathrm { K } ( t )\). The familiar result that the pgf of a sum of independent random variables is the product of their pgfs does not apply to \(\mathrm { K } ( t )\) because \(N\) is a random variable and not a fixed number; you should instead use without proof the result that \(\mathrm { K } ( t ) = \mathrm { H } ( \mathrm { G } ( t ) )\).
  6. Show that \(\mathrm { K } ( t ) = 6 \left( 12 - t - t ^ { 2 } - \ldots - t ^ { 6 } \right) ^ { - 1 }\).
    [Hint. \(1 - t ^ { 6 } = ( 1 - t ) \left( 1 + t + t ^ { 2 } + \ldots + t ^ { 5 } \right)\).]
  7. Use \(\mathrm { K } ( t )\) to find the mean and variance of \(Q\).
  8. Using your results from parts (i), (v) and (vii), verify the result that (in the usual notation for means and variances) $$\sigma _ { Q } { } ^ { 2 } = \sigma _ { N } { } ^ { 2 } \mu _ { X } { } ^ { 2 } + \mu _ { N } \sigma _ { X } { } ^ { 2 } .$$
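The identity in part (viii) can be checked numerically. The sketch below differentiates \(\mathrm{K}(t)\) from part (vi) at \(t = 1\) by central finite differences, which is an approximation and not the algebraic differentiation the question intends, and compares the resulting variance with \(\sigma_N^2 \mu_X^2 + \mu_N \sigma_X^2\). The values \(\mu_N = 1\) and \(\sigma_N^2 = 2\) are what \(\mathrm{H}(t) = (2 - t)^{-1}\) gives in part (v).

```python
from fractions import Fraction

mu_x, var_x = Fraction(7, 2), Fraction(35, 12)  # part (i)
mu_n, var_n = Fraction(1), Fraction(2)          # part (v)


def K(t):
    """pgf of Q from part (vi): 6 / (12 - t - t^2 - ... - t^6)."""
    return 6.0 / (12.0 - sum(t ** j for j in range(1, 7)))


h = 1e-4
k1 = (K(1 + h) - K(1 - h)) / (2 * h)            # approximates K'(1) = E(Q)
k2 = (K(1 + h) - 2 * K(1) + K(1 - h)) / h ** 2  # approximates K''(1) = E(Q(Q-1))

mean_q = k1
var_q = k2 + k1 - k1 ** 2
formula = var_n * mu_x ** 2 + mu_n * var_x

print(f"mean of Q ~ {mean_q:.4f}")
print(f"variance of Q ~ {var_q:.4f}")
print(f"formula value = {formula} = {float(formula):.4f}")
```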
OCR MEI S4 2012 June Q3
3 At an agricultural research station, trials are being made of two fertilisers, A and B, to see whether they differ in their effects on the yield of a crop. Preliminary investigations have established that the underlying variances of the distributions of yields using the two fertilisers may be assumed equal. Scientific analysis of the fertilisers has suggested that fertiliser A may be inferior in that it leads, on the whole, to lower yield. A statistical analysis is being carried out to investigate this. The crop is grown in carefully controlled conditions in 14 experimental plots, 6 with fertiliser A and 8 with fertiliser B. The yields, in kg per plot, are as follows, arranged in ascending order for each fertiliser.
Fertiliser A: 9.8, 10.2, 10.9, 11.5, 12.7, 13.3
Fertiliser B: 10.8, 11.9, 12.0, 12.2, 12.9, 13.5, 13.6, 13.7
  1. Carry out a Wilcoxon rank sum test at the \(5 \%\) significance level to examine appropriate hypotheses.
  2. Carry out a \(t\) test at the \(5 \%\) significance level to examine appropriate hypotheses.
  3. Goodness of fit tests based on more extensive data sets from other trials with these fertilisers have failed to reject hypotheses of underlying Normal distributions. Discuss the relative merits of the analyses in parts (i) and (ii).
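The two test statistics for parts (i) and (ii) can be computed side by side from the raw yields. In the sketch below the function names are illustrative, and the ranking helper assumes no tied values, which holds for these data. Critical values still have to be taken from Wilcoxon and \(t\) tables.

```python
from math import sqrt
from statistics import mean, variance

fert_a = [9.8, 10.2, 10.9, 11.5, 12.7, 13.3]
fert_b = [10.8, 11.9, 12.0, 12.2, 12.9, 13.5, 13.6, 13.7]


def rank_sum(sample, other):
    """Sum of the ranks of `sample` in the pooled data (valid when there are no ties)."""
    pooled = sorted(sample + other)
    return sum(pooled.index(v) + 1 for v in sample)


def pooled_t(sample1, sample2):
    """Two-sample t statistic with a pooled variance estimate."""
    m, n = len(sample1), len(sample2)
    pooled_var = ((m - 1) * variance(sample1)
                  + (n - 1) * variance(sample2)) / (m + n - 2)
    return (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / m + 1 / n))


w_a = rank_sum(fert_a, fert_b)
t_stat = pooled_t(fert_a, fert_b)
print(f"Rank sum for fertiliser A = {w_a}")
print(f"t statistic = {t_stat:.4f} on {len(fert_a) + len(fert_b) - 2} degrees of freedom")
```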
OCR MEI S4 2014 June Q2
2
  1. The probability density function of the random variable \(X\) is $$\mathrm { f } ( x ) = \frac { x ^ { k - 1 } \mathrm { e } ^ { - x / \phi } } { \phi ^ { k } ( k - 1 ) ! } , x > 0$$ where \(k\) is a known positive integer and \(\phi\) is an unknown parameter ( \(\phi > 0\) ). Show that the moment generating function (mgf) of \(X\) is $$\mathrm { M } _ { X } ( \theta ) = ( 1 - \phi \theta ) ^ { - k }$$ for \(\theta < \frac { 1 } { \phi }\).
  2. Write down the mgf of the random variable \(W = \sum _ { i = 1 } ^ { n } X _ { i }\) where \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\) are independent random variables each with the same distribution as \(X\).
  3. Write down the mgf of the random variable \(Y = \frac { 2 W } { \phi }\). Given that the mgf of the random variable \(V\) having the \(\chi _ { m } ^ { 2 }\) distribution is \(\mathrm { M } _ { V } ( \theta ) = ( 1 - 2 \theta ) ^ { - m / 2 }\) (for \(\theta < \frac { 1 } { 2 }\) ), deduce the distribution of \(Y\).
  4. Deduce that \(\mathrm { P } \left( l < \frac { 2 W } { \phi } < u \right) = 0.95\) where \(l\) and \(u\) are the lower and upper \(2 \frac { 1 } { 2 } \%\) points of the \(\chi _ { 2 n k } ^ { 2 }\) distribution. Hence deduce that a \(95 \%\) confidence interval for \(\phi\) is given by \(\left( \frac { 2 w } { u } , \frac { 2 w } { l } \right)\) where \(w\) is an observation on the random variable \(W\).
  5. For the case \(k = 2\) and \(n = 10\), use percentage points of the \(\chi ^ { 2 }\) distribution to write down, in terms of \(w\), an expression for a \(95 \%\) confidence interval for \(\phi\). By considering the mgf of \(W\), find in terms of \(\phi\) the expected length of this interval.
OCR MEI S4 2014 June Q3
3
  1. Explain the meaning of the following terms in the context of hypothesis testing: Type I error, Type II error, operating characteristic, power.
  2. A chemical manufacturer is endeavouring to reduce the amount of a certain impurity in one of its bulk products by improving the production process. The amount of impurity is measured in a convenient unit of concentration, and this is modelled by the Normally distributed random variable \(X\). In the old production process, the mean of \(X\), denoted by \(\mu\), was 63 and the standard deviation of \(X\) was 3.7. Experimental batches of the product are to be made using the new process, and it is desired to examine the hypotheses \(\mathrm { H } _ { 0 } : \mu = 63\) and \(\mathrm { H } _ { 1 } : \mu < 63\) for the new process. Investigation of the variability in the new process has established that the standard deviation may be assumed unchanged. The usual Normal test based on \(\bar { X }\) is to be used, where \(\bar { X }\) is the mean of \(X\) over \(n\) experimental batches (regarded as a random sample), with a critical value \(c\) such that \(\mathrm { H } _ { 0 }\) is rejected if the value of \(\bar { X }\) is less than \(c\). The following criteria are set out.
    • If in fact \(\mu = 63\), the probability of concluding that \(\mu < 63\) must be only \(1 \%\).
    • If in fact \(\mu = 60\), the probability of concluding that \(\mu < 63\) must be \(90 \%\).
    Find \(c\) and the smallest value of \(n\) that is required. With these values, what is the power of the test if in fact \(\mu = 58.5\) ?
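The two criteria give two equations in \(c\) and \(n\): \(c = 63 - z_{0.01}\,\sigma/\sqrt{n}\) and \(c = 60 + z_{0.10}\,\sigma/\sqrt{n}\). The sketch below solves them for \(n\), rounds up, and then fixes \(c\) so that the 1% criterion holds exactly. That last choice is an assumption of this sketch; a different convention for \(c\) would shift the numbers slightly. Variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()
mu0, mu_alt, sigma = 63.0, 60.0, 3.7

z_alpha = std_normal.inv_cdf(0.99)  # upper 1% point
z_beta = std_normal.inv_cdf(0.90)   # upper 10% point

# Eliminating c between the two equations gives sqrt(n) directly.
n = ceil(((z_alpha + z_beta) * sigma / (mu0 - mu_alt)) ** 2)
c = mu0 - z_alpha * sigma / sqrt(n)


def power(mu):
    """P(reject H0) = P(sample mean < c) when the true mean is mu."""
    return std_normal.cdf((c - mu) / (sigma / sqrt(n)))


print(f"n = {n}, c = {c:.3f}")
print(f"power at mu = 60:   {power(60.0):.4f}")
print(f"power at mu = 58.5: {power(58.5):.4f}")
```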
OCR MEI S4 2015 June Q1
1 The random variable \(X\) has the following probability density function, in which \(a\) is a (positive) parameter. $$\mathrm { f } ( x ) = \frac { 2 } { a } x \mathrm { e } ^ { - x ^ { 2 } / a } , \quad x \geqslant 0 .$$
  1. Verify that \(\int _ { 0 } ^ { \infty } \mathrm { f } ( x ) \mathrm { d } x = 1\).
  2. Show that \(\mathrm { E } \left( X ^ { 2 } \right) = a\) and \(\mathrm { E } \left( X ^ { 4 } \right) = 2 a ^ { 2 }\).
The parameter \(a\) is to be estimated by maximum likelihood based on an independent random sample from the distribution, \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\).
  3. Show that the logarithm of the likelihood function is $$n \ln 2 - n \ln a + \sum _ { i = 1 } ^ { n } \ln X _ { i } - \frac { 1 } { a } \sum _ { i = 1 } ^ { n } X _ { i } ^ { 2 }$$ Hence obtain the maximum likelihood estimator, \(\hat { a }\), for \(a\).
    [You are not required to verify that any turning point you find is a maximum.]
  4. Using the results from part (ii), show that \(\hat { a }\) is unbiased for \(a\) and find the variance of \(\hat { a }\).
  5. In a particular random sample from this distribution, \(n = 100\) and \(\sum x _ { i } ^ { 2 } = 147.1\). Obtain an approximate 95\% confidence interval for \(a\). (You may assume that the Central Limit Theorem holds in this case.)
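A minimal sketch of the interval in part (v), assuming the estimator from part (iii) is \(\hat{a} = \frac{1}{n} \sum X_i^2\) and that its variance from part (iv) is \(a^2 / n\), with \(a\) replaced by \(\hat{a}\) when forming the standard error. Both are assumptions stated here rather than results quoted in the question.

```python
from math import sqrt
from statistics import NormalDist

n, sum_x_sq = 100, 147.1

a_hat = sum_x_sq / n
se = a_hat / sqrt(n)              # sqrt(a^2 / n) with a estimated by a_hat
z = NormalDist().inv_cdf(0.975)   # two-sided 5% point of N(0, 1)

interval = (a_hat - z * se, a_hat + z * se)
print(f"a_hat = {a_hat:.3f}, s.e. ~ {se:.4f}")
print(f"approximate 95% CI for a: ({interval[0]:.3f}, {interval[1]:.3f})")
```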
OCR MEI S4 2015 June Q2
2 The random variable \(Z\) has the standard Normal distribution. The random variable \(Y\) is defined by \(Y = Z ^ { 2 }\).
You are given that \(Y\) has the following probability density function. $$\mathrm { f } ( y ) = \frac { 1 } { \sqrt { 2 \pi y } } \mathrm { e } ^ { - \frac { 1 } { 2 } y } , \quad y > 0$$
  1. Show that the moment generating function (mgf) of \(Y\) is given by $$\mathrm { M } _ { Y } ( \theta ) = ( 1 - 2 \theta ) ^ { - \frac { 1 } { 2 } }$$
  2. Use the mgf to obtain \(\mathrm { E } ( Y )\) and \(\operatorname { Var } ( Y )\).
The random variable \(U\) is defined by $$U = Z _ { 1 } ^ { 2 } + Z _ { 2 } ^ { 2 } + \ldots + Z _ { n } ^ { 2 } ,$$ where \(Z _ { 1 } , Z _ { 2 } , \ldots , Z _ { n }\) are independent standard Normal random variables.
  3. State an appropriate general theorem for mgfs and hence write down the mgf of \(U\). State the values of \(\mathrm { E } ( U )\) and \(\operatorname { Var } ( U )\).
The random variable \(W\) is defined by $$W = \frac { U - n } { \sqrt { 2 n } }$$
  4. Show that the logarithm of the mgf of \(W\) is $$- \sqrt { \frac { n } { 2 } } \theta - \frac { n } { 2 } \ln \left( 1 - \sqrt { \frac { 2 } { n } } \theta \right) .$$ Use the series expansion of \(\ln ( 1 - t )\) to show that, as \(n \rightarrow \infty\), this expression tends to \(\frac { 1 } { 2 } \theta ^ { 2 }\).
    State what this implies about the distribution of \(W\) for large \(n\).
OCR MEI S4 2015 June Q3
3 At an agricultural research station, trials are being carried out to compare a standard variety of tomato with one that has been genetically modified (GM). The trials are concerned with the mean weight of the tomatoes and also with the aesthetic appearance of the tomatoes.
    1. Tomatoes of the standard and GM varieties are grown under similar conditions. The tomatoes are weighed and the data are summarised as follows.
      | Variety | Sample size | Sum of weights \(( \mathrm { g } )\) | Sum of squares of weights \(\left( \mathrm { g } ^ { 2 } \right)\) |
      |---|---|---|---|
      | Standard | 30 | 3218.3 | 349257 |
      | GM | 26 | 2954.1 | 338691 |
      Carry out a test, using the Normal distribution, to investigate whether there is evidence, at the 5\% level of significance, that the two varieties of tomato differ in mean weight. State one assumption required for this test to be valid.
    2. The data in part (i) could have been used to carry out a test for the equality of means based on the \(t\) distribution. State two additional assumptions required for this test to be valid. Discuss briefly which test would be preferable in this case.
  1. In order to judge whether, on the whole, GM tomatoes have a better aesthetic appearance than standard tomatoes, a trial is carried out as follows. 10 of each variety are chosen and a consumer panel is asked to arrange the 20 tomatoes in order according to their appearance.
    1. State two important features of the way in which this trial should be designed. Comment briefly on how reliable the evidence from the trial is likely to be.
    2. The order in which the consumer panel arranges the tomatoes is as follows. The tomato with best appearance is listed first. \(G\) and \(S\) denote GM and standard tomatoes respectively. $$\begin{array} { c c c c c c c c c c c c c c c c c c c c } G & G & G & S & G & G & G & S & G & S & S & S & G & G & S & G & S & S & S & S \end{array}$$ Carry out an appropriate test at the \(1 \%\) level of significance.
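The test statistics for the weight comparison and for the appearance ranking can both be computed from the information given. The sketch below assumes the unbiased sample variances are used in the large-sample Normal test, and that rank 1 is assigned to the tomato listed first. All names are illustrative, and the critical values must still be taken from tables.

```python
from math import sqrt

# Weight comparison: (sample size, sum of weights, sum of squared weights).
standard = (30, 3218.3, 349257.0)
gm = (26, 2954.1, 338691.0)


def mean_and_variance(n, total, total_sq):
    """Sample mean and unbiased sample variance from summary totals."""
    mean = total / n
    variance = (total_sq - total ** 2 / n) / (n - 1)
    return mean, variance


mean_s, var_s = mean_and_variance(*standard)
mean_g, var_g = mean_and_variance(*gm)
z = (mean_s - mean_g) / sqrt(var_s / standard[0] + var_g / gm[0])
print(f"means: standard = {mean_s:.2f}, GM = {mean_g:.2f}; z = {z:.3f}")

# Appearance ranking: position in the panel's ordering, 1 = best appearance.
ordering = "GGGSGGGSGSSSGGSGSSSS"
rank_sum_gm = sum(i for i, label in enumerate(ordering, start=1) if label == "G")
rank_sum_std = sum(i for i, label in enumerate(ordering, start=1) if label == "S")
print(f"rank sums: GM = {rank_sum_gm}, standard = {rank_sum_std}")
```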
OCR MEI S4 2016 June Q1
1 The random variable \(X\) has a Cauchy distribution centred on \(m\). Its probability density function (pdf) is \(\mathrm { f } ( x )\) where $$\mathrm { f } ( x ) = \frac { 1 } { \pi } \frac { 1 } { 1 + ( x - m ) ^ { 2 } } , \quad \text { for } - \infty < x < \infty$$
  1. Sketch the pdf. Show that the mode and median are at \(x = m\).
  2. A sample of size 1, consisting of the observation \(x _ { 1 }\), is taken from this distribution. Show that the maximum likelihood estimate (MLE) of \(m\) is \(x _ { 1 }\).
  3. Now suppose that a sample of size 2 , consisting of observations \(x _ { 1 }\) and \(x _ { 2 }\), is taken from the distribution. By considering the logarithm of the likelihood function or otherwise, show that the MLE, \(\hat { m }\), satisfies the cubic equation $$\left( 2 \hat { m } - \left( x _ { 1 } + x _ { 2 } \right) \right) \left( \hat { m } ^ { 2 } - \left( x _ { 1 } + x _ { 2 } \right) \hat { m } + 1 + x _ { 1 } x _ { 2 } \right) = 0$$
  4. Obtain expressions for the three roots of this equation. Show that if \(\left| x _ { 1 } - x _ { 2 } \right| < 2\) then only one root is real. How do you know, without doing further calculations, that in this case the real root will be the MLE of \(m\) ?
  5. Obtain the three possible values of \(\hat { m }\) in the case \(x _ { 1 } = - 2\) and \(x _ { 2 } = 2\). Evaluate the likelihood function for each value of \(\hat { m }\) and comment on your answer.
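Parts (iv) and (v) can be explored numerically. The cubic in part (iii) has the root \(\hat{m} = \frac{1}{2}(x_1 + x_2)\), and the quadratic factor has discriminant \((x_1 - x_2)^2 - 4\), so further real roots exist only when \(|x_1 - x_2| > 2\). The sketch below lists the real roots and evaluates the likelihood at each; the function names are illustrative.

```python
from math import pi, sqrt


def likelihood(m, xs):
    """Product of Cauchy densities centred on m, evaluated at the observations xs."""
    result = 1.0
    for x in xs:
        result *= 1.0 / (pi * (1.0 + (x - m) ** 2))
    return result


def real_stationary_points(x1, x2):
    """Real roots of the cubic from part (iii), in increasing order."""
    mid = (x1 + x2) / 2
    disc = (x1 - x2) ** 2 - 4  # discriminant of the quadratic factor
    if disc <= 0:
        return [mid]
    half = sqrt(disc) / 2
    return [mid - half, mid, mid + half]


xs = (-2.0, 2.0)
for m in real_stationary_points(*xs):
    print(f"m = {m:+.4f}: likelihood = {likelihood(m, xs):.6f}")
```

For these two observations the likelihood is larger at the two outer roots than at the sample mean, which is the point to comment on in part (v).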
OCR MEI S4 2016 June Q2
2 The random variable \(X\) has probability density function \(\mathrm { f } ( x )\) where $$\mathrm { f } ( x ) = \lambda \mathrm { e } ^ { - \lambda x } , \quad x > 0 .$$
  1. Obtain the moment generating function (mgf) of \(X\).
  2. Use the mgf to find \(\mathrm { E } ( X )\) and \(\operatorname { Var } ( X )\).
  The random variable \(Y\) is defined as follows: $$Y = X _ { 1 } + \ldots + X _ { n } ,$$ where the \(X _ { i }\) are independently and identically distributed as \(X\).
  3. Write down expressions for \(\mathrm { E } ( Y )\) and \(\operatorname { Var } ( Y )\). Obtain the mgf of \(Y\).
  4. Find the mgf of \(Z\) where \(Z = \frac { Y - \frac { n } { \lambda } } { \frac { \sqrt { n } } { \lambda } }\).
  5. By considering the logarithm of the mgf of \(Z\), show that the distribution of \(Z\) tends to the standard Normal distribution as \(n\) tends to infinity.
OCR MEI S4 2016 June Q3
3 A large department in a university wished to compare the standards of literacy and numeracy of its students. A random sample of 24 students was taken and sub-divided, randomly, into two groups of 12. The students in one group took a literacy assessment (scores denoted by \(x\)); the students in the other group took a numeracy assessment (scores denoted by \(y\)). The two assessments were designed to give the same distributions of scores when taken by random samples from the general population. The scores obtained by the students on the two assessments are shown in the table.
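$$\begin{array}{c|cccccccccccc}
x & 23 & 42 & 43 & 46 & 48 & 48 & 50 & 54 & 58 & 59 & 62 & 65 \\
y & 44 & 36 & 63 & 55 & 53 & 58 & 63 & 80 & 61 & 57 & 83 & 54
\end{array}$$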
$$\sum x = 598 \quad \sum x ^ { 2 } = 31196 \quad \sum y = 707 \quad \sum y ^ { 2 } = 43543$$
  1. Carry out an appropriate \(t\) test, at the \(5 \%\) level of significance, to compare the standards of literacy and numeracy.
  2. State the distributional assumptions required for the \(t\) test to be valid. Name the test that you would use if the assumptions required for the \(t\) test are thought not to hold. State the hypotheses for this new test. Explain, in general terms, which of the two tests is more powerful, and why.
  A statistician at the university looked at the data and commented that a paired sample design would have been better.
  3. Explain how a paired sample design would be applied in this context, and how the data would be analysed. Explain also why it would be better than the design used.
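For part 1, the test statistic can be checked from the summary totals given in the question. The sketch below assumes the usual pooled-variance two-sample \(t\) statistic is the intended one; the variable names are illustrative, and the comparison with tables and the conclusion are left to the reader.

```python
import math

# Summary statistics quoted in the question
n_x = n_y = 12
sum_x, sum_x2 = 598, 31196
sum_y, sum_y2 = 707, 43543

mean_x = sum_x / n_x
mean_y = sum_y / n_y

# Corrected sums of squares about each sample mean
sxx = sum_x2 - sum_x ** 2 / n_x
syy = sum_y2 - sum_y ** 2 / n_y

# Pooled variance estimate (assumes equal population variances)
dof = n_x + n_y - 2
pooled_var = (sxx + syy) / dof

t_stat = (mean_x - mean_y) / math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
print(f"means: {mean_x:.3f}, {mean_y:.3f}")
print(f"pooled variance: {pooled_var:.3f}, t = {t_stat:.3f} on {dof} degrees of freedom")
```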
OCR S4 2016 June Q1
1 Ten archers shot at targets with two types of bow. Their scores out of 100 are shown in the table.
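$$\begin{array}{l|cccccccccc}
\text{Archer} & A & B & C & D & E & F & G & H & I & J \\
\hline
\text{Bow type } P & 95 & 97 & 92 & 85 & 87 & 92 & 90 & 89 & 98 & 77 \\
\text{Bow type } Q & 91 & 91 & 88 & 90 & 80 & 88 & 93 & 85 & 94 & 84
\end{array}$$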
  1. Use the sign test, at the \(5 \%\) level of significance, to test the hypothesis that bow type \(P\) is better than bow type \(Q\).
  2. Why would a Wilcoxon signed rank test, if valid, be a better test than the sign test?
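The arithmetic behind the sign test in part 1 can be sketched as follows, assuming a one-tailed test in which the number of positive differences \(P - Q\) is compared with \(\mathrm{B}(n, 0.5)\). The variable names are illustrative only.

```python
from math import comb

p_scores = [95, 97, 92, 85, 87, 92, 90, 89, 98, 77]  # bow type P, archers A to J
q_scores = [91, 91, 88, 90, 80, 88, 93, 85, 94, 84]  # bow type Q, archers A to J

diffs = [p - q for p, q in zip(p_scores, q_scores)]
n_plus = sum(d > 0 for d in diffs)
n_minus = sum(d < 0 for d in diffs)
n = n_plus + n_minus  # zero differences, if any, would be discarded

# One-tailed probability of at least n_plus positive signs under B(n, 0.5)
p_value = sum(comb(n, k) for k in range(n_plus, n + 1)) / 2 ** n
print(f"positive: {n_plus}, negative: {n_minus}, tail probability: {p_value:.4f}")
```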
OCR S4 2016 June Q2
2 Low density lipoprotein (LDL) cholesterol is known as 'bad' cholesterol.
15 randomly chosen patients, each with an LDL level of 190 mg per decilitre of blood, were given one of two treatments, chosen at random. After twelve weeks their LDL levels, in mg per decilitre, were as follows.
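$$\begin{array}{l|cccccccc}
\text{Treatment } A & 189 & 168 & 176 & 186 & 183 & 187 & 188 & \\
\text{Treatment } B & 177 & 179 & 173 & 180 & 178 & 170 & 175 & 174
\end{array}$$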
Use a Wilcoxon rank sum test, at the \(5 \%\) level of significance, to test whether the LDL levels of patients given treatment \(B\) are lower than the LDL levels of patients given treatment \(A\).
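The ranking step of this test can be sketched as below. The statistic \(W\) is taken here as the smaller of \(R_m\) and \(m(m+n+1) - R_m\) for the smaller sample, which is an assumed convention; check it against the tables being used. Variable names are illustrative.

```python
treatment_a = [189, 168, 176, 186, 183, 187, 188]
treatment_b = [177, 179, 173, 180, 178, 170, 175, 174]

combined = sorted(treatment_a + treatment_b)
# This simple ranking is only valid when there are no tied values
assert len(set(combined)) == len(combined)
rank = {value: i + 1 for i, value in enumerate(combined)}

rank_sum_a = sum(rank[v] for v in treatment_a)
rank_sum_b = sum(rank[v] for v in treatment_b)

m, n = len(treatment_a), len(treatment_b)  # m is the smaller sample size
w = min(rank_sum_a, m * (m + n + 1) - rank_sum_a)
print(f"rank sum A = {rank_sum_a}, rank sum B = {rank_sum_b}, W = {w}")
```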
OCR S4 2016 June Q3
3 The table shows the joint probability distribution of two random variables \(X\) and \(Y\).
$$\begin{array}{cc|ccc}
 & & & Y & \\
 & & 0 & 1 & 2 \\
\hline
 & 0 & 0.07 & 0.07 & 0.16 \\
X & 1 & 0.06 & 0.09 & 0.15 \\
 & 2 & 0.07 & 0.14 & 0.19
\end{array}$$
  1. Find \(\operatorname { Cov } ( X , Y )\).
  2. Are \(X\) and \(Y\) independent? Give a reason for your answer.
  3. Find \(\mathrm { P } ( X = 1 \mid X Y = 2 )\).
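The quantities asked for can be checked directly from the joint table. This is a minimal sketch using exact fractions to avoid rounding error; the dictionary `joint` and the helper `expect` are illustrative names, not part of the question.

```python
from fractions import Fraction

# joint[(x, y)] = P(X = x, Y = y), read from the table in the question
joint = {
    (0, 0): Fraction("0.07"), (0, 1): Fraction("0.07"), (0, 2): Fraction("0.16"),
    (1, 0): Fraction("0.06"), (1, 1): Fraction("0.09"), (1, 2): Fraction("0.15"),
    (2, 0): Fraction("0.07"), (2, 1): Fraction("0.14"), (2, 2): Fraction("0.19"),
}
total = sum(joint.values())

def expect(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would need P(X = x, Y = y) = P(X = x) P(Y = y) in every cell
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
cell_matches = joint[(0, 0)] == p_x0 * p_y0

# P(X = 1 | XY = 2): restrict to the cells where the product equals 2
p_xy2 = sum(p for (x, y), p in joint.items() if x * y == 2)
cond = joint[(1, 2)] / p_xy2

print(f"total = {total}, Cov(X, Y) = {cov}")
print(f"P(X=0, Y=0) equals P(X=0)P(Y=0)? {cell_matches}")
print(f"P(X = 1 | XY = 2) = {cond}")
```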
OCR S4 2016 June Q4
4 The continuous random variable \(Y\) has a uniform (rectangular) distribution on \([ a , b ]\), where \(a\) and \(b\) are constants.
  1. Show that the moment generating function \(\mathrm { M } _ { Y } ( t )\) of \(Y\) is \(\frac { \left( \mathrm { e } ^ { b t } - \mathrm { e } ^ { a t } \right) } { t ( b - a ) }\).
  2. Use the series expansion of \(\mathrm { e } ^ { x }\) to show that the mean and variance of \(Y\) are \(\frac { 1 } { 2 } ( a + b )\) and \(\frac { 1 } { 12 } ( b - a ) ^ { 2 }\), respectively.
OCR S4 2016 June Q5
5 Events \(A\) and \(B\) are such that \(\mathrm { P } ( A ) = 0.5 , \mathrm { P } ( B ) = 0.6\) and \(\mathrm { P } \left( A \mid B ^ { \prime } \right) = 0.75\).
  1. Find \(\mathrm { P } ( A \cap B )\) and \(\mathrm { P } ( A \cup B )\).
  2. Determine, giving a reason in each case,
    (a) whether \(A\) and \(B\) are mutually exclusive,
    (b) whether \(A\) and \(B\) are independent.
  3. A further event \(C\) is such that \(\mathrm { P } ( A \cup B \cup C ) = 1\) and \(\mathrm { P } ( A \cap B \cap C ) = 0.05\). It is also given that \(\mathrm { P } \left( A \cap B ^ { \prime } \cap C \right) = \mathrm { P } \left( A ^ { \prime } \cap B \cap C \right) = x\) and \(\mathrm { P } \left( A \cap B ^ { \prime } \cap C ^ { \prime } \right) = 2 x\).
    Find \(\mathrm { P } ( C )\).