OCR Further Statistics (Further Statistics) 2021 November

Question 1
1 At a seaside resort the number \(X\) of ice-creams sold and the temperature \(Y ^ { \circ } \mathrm { F }\) were recorded on 20 randomly chosen summer days. The data can be summarised as follows.
\(\sum x = 1506 \quad \sum x ^ { 2 } = 127542 \quad \sum y = 1431 \quad \sum y ^ { 2 } = 104451 \quad \sum x y = 111297\)
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
  2. Explain the significance for the regression line of the quantity \(\sum \left[ y _ { i } - \left( a + b x _ { i } \right) \right] ^ { 2 }\).
  3. It is decided to measure the temperature in degrees Centigrade instead of degrees Fahrenheit. If the same temperature is measured both as \(f ^ { \circ }\) Fahrenheit and \(c ^ { \circ }\) Centigrade, the relationship between \(f\) and \(c\) is \(c = \frac { 5 } { 9 } ( f - 32 )\). Find the equation of the new regression line.
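The arithmetic for parts 1 and 3 can be checked with a short script. This is only a sketch of one possible working, not a mark scheme: the variable names are illustrative, and the Centigrade line is obtained by assuming the linear recoding \(c = \frac{5}{9}(y - 32)\) can be applied directly to the fitted coefficients.

```python
# Summary statistics quoted in Question 1 (n = 20 days).
n = 20
sum_x, sum_x2 = 1506, 127542
sum_y, sum_xy = 1431, 111297

# Corrected sum of squares for x and corrected sum of products.
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares regression line of y on x, in the form y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Recoding y as c = (5/9)(y - 32) is linear in y, so the intercept and
# gradient of the fitted line transform in the same way.
a_c = 5 / 9 * (a - 32)
b_c = 5 / 9 * b

print(f"Fahrenheit: y = {a:.4g} + {b:.4g} x")
print(f"Centigrade: c = {a_c:.4g} + {b_c:.4g} x")
```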
Question 2
2 A discrete random variable \(D\) has the following probability distribution, where \(a\) is a constant.
\(\begin{array}{c|cccc}
d & 0 & 2 & 4 & 6 \\ \hline
\mathrm{P}(D = d) & a & 0.1 & 0.3 & 0.2
\end{array}\)
Determine the value of \(\operatorname { Var } ( 3 D + 4 )\).
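A possible check of the working, assuming the table reads \(d = 0, 2, 4, 6\) with probabilities \(a, 0.1, 0.3, 0.2\). The sketch uses the facts that probabilities sum to 1 and that \(\operatorname{Var}(3D + 4) = 9 \operatorname{Var}(D)\); the names are illustrative.

```python
# Known probabilities from the table; P(D = 0) = a is found from the total.
known = {2: 0.1, 4: 0.3, 6: 0.2}
a = 1 - sum(known.values())
probs = {0: a, **known}

# E(D) and E(D^2) from the definition of expectation.
mean_d = sum(d * p for d, p in probs.items())
mean_d2 = sum(d ** 2 * p for d, p in probs.items())
var_d = mean_d2 - mean_d ** 2

# Adding 4 does not change the variance; multiplying by 3 scales it by 9.
var_3d_plus_4 = 3 ** 2 * var_d

print(a, mean_d, var_d, var_3d_plus_4)
```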
Question 3
3 In a large collection of coloured marbles of identical size, the proportion of green marbles is \(p\). One marble is chosen randomly, its colour is noted, and it is then replaced. This process is repeated until a green marble is chosen. The first green marble chosen is the \(X\)th marble chosen.
  1. You are given that \(p = 0.3\).
    1. Find \(\mathrm { P } ( 5 \leqslant X \leqslant 10 )\).
    2. Determine the smallest value of \(n\) for which \(\mathrm { P } ( X = n ) < 0.1\).
  2. You are given instead that \(\operatorname { Var } ( X ) = 42\). Determine the value of \(\mathrm { E } ( X )\).
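The following sketch treats \(X\) as geometric with \(\mathrm{P}(X = n) = p(1 - p)^{n - 1}\), which is an assumption about the intended model rather than something the question states by name. It checks each part numerically; the variable names are illustrative.

```python
from math import sqrt

p = 0.3
q = 1 - p

# P(X >= k) = q**(k - 1), so P(5 <= X <= 10) = P(X >= 5) - P(X >= 11).
prob_5_to_10 = q ** 4 - q ** 10

# Smallest n with P(X = n) < 0.1, found by stepping through n = 1, 2, ...
n = 1
while p * q ** (n - 1) >= 0.1:
    n += 1

# Part 2: Var(X) = (1 - p) / p**2 = 42 rearranges to 42 p**2 + p - 1 = 0.
# Only the positive root is a valid probability.
p_new = (-1 + sqrt(1 + 4 * 42)) / (2 * 42)
mean_x = 1 / p_new

print(prob_5_to_10, n, p_new, mean_x)
```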
Question 4
4 A random sample of 160 observations of a random variable \(X\) is selected. The sample can be summarised as follows.
\(n = 160 \quad \sum x = 2688 \quad \sum x ^ { 2 } = 48398\)
  1. Calculate unbiased estimates of the following.
    1. \(\mathrm { E } ( X )\)
    2. \(\operatorname { Var } ( X )\)
  2. Find a \(99 \%\) confidence interval for \(\mathrm { E } ( X )\), giving the end-points of the interval correct to 4 significant figures.
  3. Explain whether it was necessary to use the Central Limit Theorem in answering
    1. part (a),
    2. part (b).
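One way to check the estimates and the interval, sketched with the Python standard library. It assumes a normal approximation for the sample mean with the unbiased variance estimate in place of the population variance; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sum_x, sum_x2 = 160, 2688, 48398

# Unbiased estimates of E(X) and Var(X).
mean_hat = sum_x / n
var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Two-sided 99% interval: 0.5% in each tail of the standard normal.
z = NormalDist().inv_cdf(0.995)
half_width = z * sqrt(var_hat / n)
lower, upper = mean_hat - half_width, mean_hat + half_width

print(f"mean = {mean_hat:.4g}, variance = {var_hat:.4g}")
print(f"99% CI: ({lower:.4g}, {upper:.4g})")
```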
Question 5
5 The numbers of each of 9 items sold in two different supermarkets in a week are given in the following table.
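\(\begin{array}{l|rrrrrrrrr}
\text{Item} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline
\text{Supermarket } A & 17 & 28 & 41 & 43 & 62 & 69 & 75 & 93 & 115 \\
\text{Supermarket } B & 24 & 7 & 18 & 12 & 47 & 29 & 58 & 42 & 37
\end{array}\)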
A researcher wants to test whether there is association between the numbers of these items sold in the two supermarkets. However, it is known that the collection of data in Supermarket \(B\) was done inaccurately and each of the numbers in the corresponding row of the table could have been in error by as much as 2 items greater or 2 items fewer.
  1. Explain why Spearman's rank correlation coefficient might be preferred to the use of Pearson's product-moment correlation coefficient in this context.
  2. Carry out the test at the \(5 \%\) significance level using Spearman's rank correlation coefficient.
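The coefficient itself can be computed as below. This sketch reads the Supermarket \(B\) row as 24, 7, 18, 12, 47, 29, 58, 42, 37, which is an interpretation of the table, and the helper `ranks` is illustrative. The critical value still has to be taken from tables, so the script stops at \(r_s\). It also reports the smallest gap between the \(B\) values, which is relevant to whether errors of up to 2 in each value could change the ranking.

```python
a_sales = [17, 28, 41, 43, 62, 69, 75, 93, 115]
b_sales = [24, 7, 18, 12, 47, 29, 58, 42, 37]  # assumed reading of row B


def ranks(values):
    """Rank from 1 (smallest). Assumes no tied values, as is the case here."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_a, rank_b = ranks(a_sales), ranks(b_sales)
n = len(a_sales)

# Spearman's coefficient from the sum of squared rank differences.
sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Smallest gap between neighbouring B values once sorted.
sorted_b = sorted(b_sales)
min_gap_b = min(hi - lo for lo, hi in zip(sorted_b, sorted_b[1:]))

print(sum_d2, round(r_s, 4), min_gap_b)
```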
Question 6 3 marks
6 A practice examination paper is taken by 500 candidates, and the organiser wishes to know what continuous distribution could be used to model the actual time, \(X\) minutes, taken by candidates to complete the paper. The organiser starts by carrying out a goodness-of-fit test for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) at the \(5 \%\) significance level. The grouped data and the results of some of the calculations are shown in the following table.
\(\begin{array}{l|ccccc}
\text{Time} & 0 \leqslant X < 80 & 80 \leqslant X < 90 & 90 \leqslant X < 100 & 100 \leqslant X < 110 & X \geqslant 110 \\ \hline
\text{Observed frequency } O & 36 & 95 & 137 & 129 & 103 \\
\text{Expected frequency } E & 45.606 & 80.641 & 123.754 & 123.754 & 126.246 \\
\frac{(O - E)^2}{E} & 2.023 & 2.557 & 1.418 & 0.222 & 4.280
\end{array}\)
  1. State suitable hypotheses for the test.
  2. Show how the figures 123.754 and 0.222 in the column for \(100 \leqslant X < 110\) were obtained. [3]
  3. Carry out the test.
  4. The organiser now wants to suggest an improved model for the data.
    1. Suggest an aspect of the data that the organiser should take into account in considering an improved model.
    2. The graph of the probability density function for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) is shown in the diagram in the Printed Answer Booklet. On the same diagram sketch the probability density function of an improved model that takes into account the aspect of the data in part (d)(i).
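The tabulated figures can be reproduced roughly as follows. This is a sketch using `statistics.NormalDist` for the \(\mathrm{N}(100, 15^2)\) model; the observed and expected frequencies are read from the table in the order of the five time classes, and the test statistic is left to be compared with a critical value from tables.

```python
from statistics import NormalDist

model = NormalDist(mu=100, sigma=15)
total = 500

observed = [36, 95, 137, 129, 103]
expected = [45.606, 80.641, 123.754, 123.754, 126.246]

# Expected frequency for 100 <= X < 110: total times the model probability.
e_100_110 = total * (model.cdf(110) - model.cdf(100))

# Contribution of that class to the chi-squared statistic (observed 129).
contrib_100_110 = (129 - e_100_110) ** 2 / e_100_110

# Test statistic: sum of (O - E)^2 / E over all five classes.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(e_100_110, 3), round(contrib_100_110, 3), round(chi_sq, 3))
```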
Question 7
7 In a school opinion poll a random sample of 8 pupils were asked to rate school lunches on a scale of 0 to 20. The results were as follows.
\(\begin{array} { l l l l l l l l } 0 & 1 & 2 & 3 & 4 & 10 & 11 & 13 \end{array}\)
After a new menu was introduced, the test was repeated with a different random sample of 8 pupils. The results were as follows.
\(\begin{array} { l l l l l l l l } 7 & 8 & 9 & 14 & 15 & 17 & 19 & 20 \end{array}\)
  1. Carry out an appropriate Wilcoxon test at the \(5 \%\) significance level to test whether pupils' opinions of school lunches have changed.
A statistics student tells the organisers of the opinion poll that it would have been better to have asked the same 8 pupils both times.
  2. Explain why the statistics student's suggestion would produce a better test.
  3. State which test should be used if the student's suggestion is followed.
  4. You are given that there are 12870 ways in which 8 different integers can be chosen from the integers 1 to 16 inclusive. Estimate the number of ways of selecting 8 different integers between 1 and 16 inclusive that have a sum less than or equal to the critical value used in the test in part (a).
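A sketch of the rank-sum calculation for part 1, assuming the two samples are pooled and ranked together (there are no tied scores). The helper `count_rank_sets` is hypothetical. It counts exactly how many sets of 8 ranks have a sum at most `limit`, which can be compared with the estimate asked for in part 4 once a critical value has been looked up in tables. No critical value is hard-coded here.

```python
from itertools import combinations

before = [0, 1, 2, 3, 4, 10, 11, 13]
after = [7, 8, 9, 14, 15, 17, 19, 20]

# Pool the two samples and rank from 1 (smallest). There are no ties.
pooled = sorted(before + after)
rank_sum_before = sum(pooled.index(v) + 1 for v in before)

# Rank-sum statistic: the smaller of R and m(m + n + 1) - R.
m, n = len(before), len(after)
w = min(rank_sum_before, m * (m + n + 1) - rank_sum_before)


def count_rank_sets(limit, size=8, top=16):
    """Count the size-8 subsets of 1..16 whose sum is at most `limit`."""
    return sum(
        1 for c in combinations(range(1, top + 1), size) if sum(c) <= limit
    )


print(rank_sum_before, w)
```

Calling `count_rank_sets` with the tabulated critical value gives an exact count to set against the estimate.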
Question 8
8 The continuous random variable \(Y\) has a uniform distribution on \([0, 2]\).
  1. It is given that \(\mathrm { E } [ a \cos ( a Y ) ] = 0.3\), where \(a\) is a constant between 0 and 1, and \(a Y\) is measured in radians. Determine the value of the constant \(a\).
  2. Determine the \(60 ^ { \text {th } }\) percentile of \(Y ^ { 2 }\).
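A numerical check of both parts, offered as a sketch. It assumes the density of \(Y\) is \(\frac{1}{2}\) on \([0, 2]\), so that \(\mathrm{E}[a \cos(aY)] = \frac{1}{2} \sin(2a)\) and \(\mathrm{P}(Y^2 \leqslant t) = \frac{1}{2} \sqrt{t}\) for \(0 \leqslant t \leqslant 4\). The midpoint-rule integration is there only to confirm the closed form.

```python
from math import asin, cos

# Part 1: sin(2a) / 2 = 0.3, and the principal value keeps a between 0 and 1.
a = asin(2 * 0.3) / 2

# Confirm by integrating a*cos(a*y) * (1/2) over [0, 2] with the midpoint rule.
steps = 100_000
width = 2 / steps
check = sum(
    a * cos(a * (i + 0.5) * width) * 0.5 * width for i in range(steps)
)

# Part 2: sqrt(t) / 2 = 0.6 gives the 60th percentile of Y^2.
percentile_60 = (2 * 0.6) ** 2

print(round(a, 4), round(check, 6), percentile_60)
```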