OCR MEI Further Statistics Major, June 2022

Question 1
1 During a meteor shower, the number of meteors that can be seen at a particular location can be modelled by a Poisson distribution with mean 1.2 per minute.
  1. Find the probability that exactly 2 meteors are seen in a period of 1 minute.
  2. Find the probability that more than 3 meteors are seen in a period of 1 minute.
  3. Find the probability that no more than 8 meteors are seen in a period of 10 minutes.
  4. Explain what the fact that the number of meteors seen can be modelled by a Poisson distribution tells you about the occurrence of meteors.
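Below is a minimal Python sketch, using only the standard library, that checks parts 1 to 3 numerically. It assumes the Poisson model with mean 1.2 per minute from the question. For the 10-minute period it scales the mean to \(10 \times 1.2 = 12\). The helper names are illustrative only.

```python
from math import exp, factorial

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

def poisson_cdf(r: int, lam: float) -> float:
    """P(X <= r) for X ~ Po(lam), summing the pmf from 0 to r."""
    return sum(poisson_pmf(i, lam) for i in range(r + 1))

rate = 1.2  # mean number of meteors per minute

p_exactly_2 = poisson_pmf(2, rate)             # part 1
p_more_than_3 = 1 - poisson_cdf(3, rate)       # part 2: complement of P(X <= 3)
p_at_most_8_in_10 = poisson_cdf(8, 10 * rate)  # part 3: mean scales to 12 over 10 minutes

print(f"{p_exactly_2:.4f} {p_more_than_3:.4f} {p_at_most_8_in_10:.4f}")  # approx 0.2169 0.0338 0.1550
```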
Question 2
2 A manufacturer is testing how long coloured LED lights will last before the battery runs out, using two different battery types. The times in hours before the battery runs out are modelled by independent Normal distributions with means and standard deviations as shown in the table.
\begin{tabular}{|c|c|c|}
\hline
 & \multicolumn{2}{c|}{Time (hours)} \\
\cline{2-3}
Type & Mean & Standard deviation \\
\hline
A & 23 & 2.8 \\
B & 35 & 3.6 \\
\hline
\end{tabular}
  1. In a particular test, a battery of type A is used and the time taken for it to run out is recorded. This process is repeated until a total of 5 randomly selected batteries have been used. Determine the probability that the total time the 5 batteries last is at least 120 hours.
  2. In a similar test, 3 randomly selected batteries of type A are used, one after the other. Then 2 randomly selected batteries of type B are used, one after the other. Determine the probability that the 3 type A batteries last longer in total than the 2 type B batteries.
  3. Explain why it is necessary that the Normal distributions are independent in order to be able to find the probability in part 2.
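Here is a Python sketch of parts 1 and 2. It assumes the table reads as means of 23 and 35 hours and standard deviations of 2.8 and 3.6 hours for types A and B. Means add or subtract, but variances always add. That rule for a difference of Normal variables relies on the variables being independent.

```python
from math import sqrt
from statistics import NormalDist

# Values from the table: time in hours before the battery runs out
mean_a, sd_a = 23, 2.8
mean_b, sd_b = 35, 3.6

# Part 1: total of 5 independent type A batteries.
# Means add; variances (not standard deviations) add.
total = NormalDist(5 * mean_a, sqrt(5 * sd_a**2))
p_total_at_least_120 = 1 - total.cdf(120)

# Part 2: D = (A1 + A2 + A3) - (B1 + B2); the variances still add for a difference.
diff = NormalDist(3 * mean_a - 2 * mean_b, sqrt(3 * sd_a**2 + 2 * sd_b**2))
p_a_longer = 1 - diff.cdf(0)  # P(D > 0)

print(round(p_total_at_least_120, 4), round(p_a_longer, 4))
```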
Question 3
3 The table shows the probability distribution of the random variable \(X\), where \(a\) and \(b\) are constants.
\begin{tabular}{|c|c|c|c|c|c|}
\hline
\(r\) & 0 & 1 & 2 & 3 & 4 \\
\hline
\(\mathrm{P}(X = r)\) & \(a\) & \(b\) & 0.24 & 0.32 & \(b^{2}\) \\
\hline
\end{tabular}
  1. Given that \(\mathrm{E}(X) = 1.8\), determine the values of \(a\) and \(b\).
The random variable \(Y\) is given by \(Y = 10 - 3X\).
  2. Using the values of \(a\) and \(b\) which you found in part 1, find each of the following.
    • \(\mathrm { E } ( Y )\)
    • \(\operatorname { Var } ( Y )\)
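For part 1, \(\mathrm{E}(X) = b + 2(0.24) + 3(0.32) + 4b^2 = 1.8\) gives \(4b^2 + b - 0.36 = 0\). The probabilities must also sum to 1. The Python sketch below solves these equations, then applies \(\mathrm{E}(Y) = 10 - 3\mathrm{E}(X)\) and \(\operatorname{Var}(Y) = 9\operatorname{Var}(X)\). It is meant as an arithmetic check, not a model answer.

```python
from math import sqrt

# E(X) = 0a + 1b + 2(0.24) + 3(0.32) + 4b^2 = 1.8  =>  4b^2 + b - 0.36 = 0
qa, qb, qc = 4, 1, -0.36
b = (-qb + sqrt(qb**2 - 4 * qa * qc)) / (2 * qa)  # positive root; the other root is negative
# Probabilities sum to 1: a + b + 0.24 + 0.32 + b^2 = 1
a = 1 - b - 0.24 - 0.32 - b**2

values = [0, 1, 2, 3, 4]
probs = [a, b, 0.24, 0.32, b**2]

e_x = sum(r * p for r, p in zip(values, probs))
e_x2 = sum(r * r * p for r, p in zip(values, probs))
var_x = e_x2 - e_x**2

# Y = 10 - 3X: E(Y) = 10 - 3E(X), Var(Y) = (-3)^2 Var(X)
e_y = 10 - 3 * e_x
var_y = 9 * var_x
print(round(a, 4), round(b, 4), round(e_y, 4), round(var_y, 4))
```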
Question 4
4 A pack of \(k\) cards is labelled \(1,2 , \ldots , k\). A card is drawn at random from the pack. The random variable \(X\) represents the number on the card.
  1. Given that \(k > 10\), find \(\mathrm{P}(X \geqslant 10)\).
You are now given that \(k = 20\).
  2. A card is drawn at random from the pack and the number on it is noted. The card is then returned to the pack. This process is repeated until the second occasion on which the number noted is less than 9. Find the probability that no more than 4 cards have to be drawn.
Section B (95 marks)
Answer all the questions.
Question 5
5 A motorist is investigating the relationship between tyre pressure and temperature. As the temperature increases during a hot day, she records the pressure (measured in bars) of one of her car tyres at specific temperatures of \(20^{\circ}\mathrm{C}, 22^{\circ}\mathrm{C}, \ldots, 36^{\circ}\mathrm{C}\). The results are shown in Table 5.1. \begin{table}[h]
\begin{tabular}{|l|ccccccccc|}
\hline
Temperature (\(t^{\circ}\mathrm{C}\)) & 20 & 22 & 24 & 26 & 28 & 30 & 32 & 34 & 36 \\
\hline
Tyre pressure (\(P\) bar) & 2.012 & 2.036 & 2.065 & 2.074 & 2.114 & 2.140 & 2.149 & 2.176 & 2.192 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 5.1}
\end{table}
  1. Calculate the equation of the regression line of pressure on temperature. Give your answer in the form \(P = a t + b\), giving the values of \(a\) and \(b\) to \(\mathbf { 4 }\) significant figures.
  2. Table 5.2 shows the residuals for most of the data values. Complete the copy of the table in the Printed Answer Booklet. \begin{table}[h]
    \begin{tabular}{|l|ccccccccc|}
    \hline
    Temperature & 20 & 22 & 24 & 26 & 28 & 30 & 32 & 34 & 36 \\
    \hline
    Residual tyre pressure & \(-0.003\) & \(-0.002\) & 0.004 & \(-0.010\) & & 0.011 & \(-0.003\) & 0.001 & \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 5.2}
    \end{table}
  3. With reference to the values of the residuals, comment on the goodness of fit of the regression line.
  4. Use your answer to part 1 to calculate an estimate of the pressure in the tyre at each of the following temperatures, giving your answers to \(\mathbf{3}\) decimal places.
    • \(25 ^ { \circ } \mathrm { C }\)
    • \(10 ^ { \circ } \mathrm { C }\)
    Comment on the reliability of each of your estimates.
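Here is a Python sketch of the least-squares calculation for part 1, using the Table 5.1 data, with \(a = S_{tP}/S_{tt}\) and \(b = \bar{P} - a\bar{t}\). It also recomputes the residuals (observed minus fitted, rounded to 3 decimal places) for comparison with Table 5.2, and produces the two estimates for part 4. The \(25^{\circ}\mathrm{C}\) estimate is an interpolation. The \(10^{\circ}\mathrm{C}\) estimate is an extrapolation outside the observed range.

```python
# Table 5.1 data
temps = [20, 22, 24, 26, 28, 30, 32, 34, 36]
pressures = [2.012, 2.036, 2.065, 2.074, 2.114, 2.140, 2.149, 2.176, 2.192]

n = len(temps)
t_bar = sum(temps) / n
p_bar = sum(pressures) / n

# Least-squares regression of P on t: a = S_tP / S_tt, b = P_bar - a * t_bar
s_tt = sum((t - t_bar) ** 2 for t in temps)
s_tp = sum((t - t_bar) * (p - p_bar) for t, p in zip(temps, pressures))
a = s_tp / s_tt
b = p_bar - a * t_bar

def predict(t: float) -> float:
    """Fitted pressure from the regression line P = a t + b."""
    return a * t + b

# Residual = observed - fitted, rounded to 3 decimal places
residuals = [round(p - predict(t), 3) for t, p in zip(temps, pressures)]

estimate_25 = predict(25)  # interpolation: 25 lies inside the data range 20 to 36
estimate_10 = predict(10)  # extrapolation: 10 lies outside the data range

print(f"a = {a:.4g}, b = {b:.4g}")
print(residuals)
print(round(estimate_25, 3), round(estimate_10, 3))
```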
Question 6
  1. Determine a 95\% confidence interval for the mean weight of liquid paraffin in a tub.
  2. Explain whether the confidence interval supports the researcher's belief.
  3. Explain why the sample has to be random in order to construct the confidence interval.
  4. A 95\% confidence interval for the mean weight in grams of another ingredient in the skin cream is [1.202, 1.398]. This confidence interval is based on a large sample, and the unbiased estimate of the population variance calculated from the sample is 0.25. Find each of the following.
    • The mean of the sample
    • The size of the sample
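The interval in part 4 is symmetric about the sample mean. Its half-width is \(1.96\sqrt{s^2/n}\), where 1.96 comes from the Normal approximation that a large sample allows. A short Python sketch recovering both quantities:

```python
lower, upper = 1.202, 1.398   # 95% confidence interval from part 4
s_squared = 0.25              # unbiased estimate of the population variance
z = 1.96                      # 97.5% point of N(0, 1), used because the sample is large

sample_mean = (lower + upper) / 2      # the interval is symmetric about the sample mean
half_width = (upper - lower) / 2       # half-width = z * sqrt(s^2 / n)
n = s_squared * (z / half_width) ** 2  # rearranged to make n the subject

print(round(sample_mean, 3), round(n))
```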
Question 7
7 Amir is trying to thread a needle. On each attempt the probability that he is successful is 0.3, independently of any other attempt. The random variable \(X\) represents the number of attempts that he takes to thread the needle.
  1. Find \(\mathrm { P } ( X = 5 )\).
  2. During the course of a day, Amir has to thread 6 needles. Determine the probability that it takes him more than 3 attempts to be successful for at least 4 of the 6 needles.
  3. Amaya is also trying to thread a needle. On each attempt the probability that she is successful is \(p\), independently of any other attempt. The probability that Amaya takes 2 attempts to thread a particular needle is \(\frac { 28 } { 121 }\). Determine the possible values of \(p\).
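Below is a Python sketch of the three parts, based on the geometric distribution \(\mathrm{P}(X = r) = (1-p)^{r-1}p\). Part 2 treats each needle as a trial whose "success" is needing more than 3 attempts, which has probability \(0.7^3\). The count over 6 needles is then modelled as \(\mathrm{B}(6, 0.343)\). Part 3 solves \((1-p)p = \frac{28}{121}\) using exact fractions.

```python
from fractions import Fraction
from math import comb, isqrt

p = 0.3
q = 1 - p

# Part 1: geometric distribution, P(X = r) = q^(r-1) p
p_x_equals_5 = q**4 * p

# Part 2: for one needle, P(X > 3) = q^3 (the first three attempts all fail).
# The number of needles (out of 6) needing more than 3 attempts is B(6, q^3).
p_more_than_3 = q**3
p_at_least_4_of_6 = sum(
    comb(6, k) * p_more_than_3**k * (1 - p_more_than_3) ** (6 - k) for k in range(4, 7)
)

# Part 3: (1 - p) p = 28/121  =>  p^2 - p + 28/121 = 0
disc = 1 - 4 * Fraction(28, 121)                  # = 9/121, a perfect square
root = Fraction(isqrt(disc.numerator), isqrt(disc.denominator))
assert root * root == disc                        # confirm the square root is exact
possible_p = sorted({(1 - root) / 2, (1 + root) / 2})

print(round(p_x_equals_5, 5), round(p_at_least_4_of_6, 4), possible_p)
```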
Question 8
8 A swimming coach is investigating whether there is correlation between the times taken by teenage swimmers to swim 50 m Butterfly and 50 m Freestyle. The coach selects a random sample of 11 teenage swimmers and records the time that each of them takes for each event. The spreadsheet shows the data, together with a scatter diagram to illustrate the data.
[Spreadsheet of the swimmers' times, with a scatter diagram of Freestyle time against Butterfly time]
  1. In the scatter diagram, Butterfly times have been plotted on the horizontal axis and Freestyle times on the vertical axis. A student states that the variables should have been plotted the other way around. Explain whether the student is correct.
The student decides to carry out a hypothesis test to investigate whether there is any correlation between the times taken for the two events.
  2. Explain why the student decides to carry out a test based on Spearman's rank correlation coefficient.
  3. In this question you must show detailed reasoning. Carry out the test at the 5\% significance level.
  4. The student concludes that there is definitely no correlation between the times. Comment on the student's conclusion.
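The swimmers' times appear only in the image, so they are not reproduced here. The Python sketch below shows only the Spearman calculation, \(r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}\). It assumes there are no tied times. With ties, you would need average ranks and the product-moment formula applied to the ranks. You would then compare the resulting \(r_s\) for \(n = 11\) with the two-tailed 5\% critical value from tables.

```python
def ranks(values: list[float]) -> list[int]:
    """Rank from 1 (smallest) upwards; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rs(x: list[float], y: list[float]) -> float:
    """Spearman's r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)), valid only when there are no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```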
Question 9
9 The random variable \(X\) has a discrete uniform distribution over the values \(\{ 0,1,2 , \ldots , 20 \}\).
  1. Find \(\mathrm { P } ( X \leqslant 7 )\).
  2. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
    The spreadsheet shows a simulation of the distribution of \(X\). Each of the 25 rows of the spreadsheet below the heading row shows a simulation of 10 independent values of \(X\) together with the value of the mean of the 10 values, denoted by \(Y\).
    1\(X _ { 1 }\)\(X _ { 2 }\)\(X _ { 3 }\)\(X _ { 4 }\)\(X _ { 5 }\)\(X _ { 6 }\)\(X _ { 7 }\)\(X _ { 8 }\)\(X _ { 9 }\)\(X _ { 10 }\)\(Y\)
    216211864911116.9
    313141224111601608.8
    441711641012218139.7
    5281214161221588.0
    6715160471130208.3
    71513101120201516610.8
    81413171221816189412.3
    9202123173018151310.3
    10212512260910157.3
    115111310917104201511.4
    12149976202211169.6
    1315191819766203812.1
    1451064119158171810.3
    150315151112039168.4
    16112115041111926.6
    171250838121913129.2
    1895113541811197.6
    19162202012172782012.4
    20181732818701169.0
    211510720405611149.2
    223910142186076.0
    23111011101911371009.2
    241214665201118101411.6
    25111514111011205.6
    26014711185102011910.5
  3. Use the spreadsheet to estimate \(\mathrm { P } ( Y \leqslant 7 )\).
  4. Explain why the true value of \(\mathrm { P } ( Y \leqslant 7 )\) is less than \(\mathrm { P } ( X \leqslant 7 )\), relating your answer to \(\operatorname { Var } ( X )\) and \(\operatorname { Var } ( Y )\).
  5. The random variable \(W\) is the mean of 30 independent values of \(X\). Determine an estimate of \(\mathrm { P } ( W \leqslant 7 )\).
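Here is a Python sketch of parts 1, 2 and 5. It uses \(\operatorname{Var}(X) = \frac{n^2-1}{12}\) for \(n = 21\) equally likely consecutive integers, and \(\operatorname{Var}(Y) = \operatorname{Var}(X)/10\). For \(W\) it uses a Normal approximation from the Central Limit Theorem without a continuity correction. That is an assumption: applying a continuity correction to the total of the 30 values gives a slightly larger probability.

```python
from math import sqrt
from statistics import NormalDist

n_values = 21                    # X is uniform on {0, 1, ..., 20}
p_x_le_7 = 8 / n_values          # outcomes 0 to 7 inclusive

e_x = 20 / 2                     # mean of a uniform distribution on 0..20
var_x = (n_values**2 - 1) / 12   # (n^2 - 1)/12 for n equally likely consecutive integers

# Y is the mean of 10 values: Var(Y) = Var(X)/10 is much smaller than Var(X),
# so Y is more concentrated around 10 and P(Y <= 7) < P(X <= 7).
var_y = var_x / 10

# W is the mean of 30 values; by the Central Limit Theorem W is approximately Normal.
w = NormalDist(e_x, sqrt(var_x / 30))
p_w_le_7 = w.cdf(7)

print(round(p_x_le_7, 4), e_x, round(var_x, 4), round(var_y, 4), round(p_w_le_7, 5))
```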
Question 10
10 A scientist is researching dietary fat intake and cholesterol level. A random sample of 60 people is selected and their dietary fat intakes and cholesterol levels are measured. Dietary fat intakes are classified as low, medium and high, and cholesterol levels are classified as normal and high. The scientist decides to carry out a chi-squared test to investigate whether there is any association between dietary fat intake and cholesterol level. Tables \(\mathbf{10.1}\) and \(\mathbf{10.2}\) show the data and some of the expected frequencies for the test. \begin{table}[h]
\begin{tabular}{|ll|ccc|c|}
\hline
 & & \multicolumn{3}{c|}{Dietary fat intake} & \\
 & & Low & Medium & High & Total \\
\hline
Cholesterol level & Normal & 9 & 18 & 5 & 32 \\
 & High & 3 & 13 & 12 & 28 \\
\hline
 & Total & 12 & 31 & 17 & 60 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 10.1}
\end{table} \begin{table}[h]
\begin{tabular}{|ll|ccc|}
\hline
\multicolumn{2}{|l|}{Expected frequency} & \multicolumn{3}{c|}{Dietary fat intake} \\
 & & Low & Medium & High \\
\hline
Cholesterol level & Normal & & & 9.0667 \\
 & High & & & 7.9333 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 10.2}
\end{table}
  1. Complete the table of expected frequencies in the Printed Answer Booklet.
  2. Determine the contribution to the chi-squared test statistic for people with normal cholesterol level and high dietary fat intake, giving your answer to \(\mathbf{4}\) decimal places.
    The contributions to the chi-squared test statistic for the remaining categories are shown in Table 10.3. \begin{table}[h]
    \begin{tabular}{|ll|ccc|}
    \hline
     & & \multicolumn{3}{c|}{Dietary fat intake} \\
     & & Low & Medium & High \\
    \hline
    Cholesterol level & Normal & 1.0563 & 0.1301 & \\
     & High & 1.2071 & 0.1487 & 2.0846 \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 10.3} \end{table}
  3. In this question you must show detailed reasoning. Carry out the test at the 5\% significance level.
  4. For each level of dietary fat intake, give a brief interpretation of what the data suggest about the level of cholesterol.
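Below is a Python sketch of the whole calculation from Table 10.1. It computes each expected frequency as row total times column total divided by the grand total, then the cell contributions \((O - E)^2/E\) and the test statistic. There are \((2-1)(3-1) = 2\) degrees of freedom, so the statistic is compared with the 5\% critical value 5.991. For exactly 2 degrees of freedom the chi-squared upper-tail probability is \(e^{-x/2}\), so the sketch can also give a \(p\)-value without a statistics library.

```python
from math import exp

# Table 10.1 observed frequencies: rows = cholesterol (normal, high),
# columns = dietary fat intake (low, medium, high)
observed = [[9, 18, 5],
            [3, 13, 12]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell: (O - E)^2 / E
contributions = [[(o - e) ** 2 / e for o, e in zip(orow, erow)]
                 for orow, erow in zip(observed, expected)]
statistic = sum(map(sum, contributions))

dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(3 - 1) = 2
# For 2 degrees of freedom the chi-squared upper tail is exactly exp(-x / 2)
p_value = exp(-statistic / 2)

print(round(contributions[0][2], 4), round(statistic, 4), dof, round(p_value, 4))
```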
Question 11
11 A particular dietary supplement, when taken for a period of 1 month, is claimed to increase the lean body mass of adults by an average of 1 kg. A researcher believes that this claim overestimates the increase. She selects a random sample of 10 adults, who then each take the supplement for a month. The increases in lean body mass, in kg, are as follows. $$\begin{array} { l l l l l l l l l l } - 0.84 & - 0.76 & - 0.16 & 0.43 & 1.31 & 1.32 & 1.47 & 1.64 & 1.93 & 2.14 \end{array}$$ A Normal probability plot and the \(p\)-value of the Kolmogorov-Smirnov test for these data are shown below.
[Normal probability plot of the increases, with the \(p\)-value of the Kolmogorov-Smirnov test]
  1. The researcher decides to carry out a hypothesis test in order to investigate the claim. Comment on the type of hypothesis test that should be used. You should refer to
    • the Normal probability plot
    • the \(p\)-value of the Kolmogorov-Smirnov test.
  2. Carry out a test at the \(5 \%\) significance level to investigate whether the researcher's belief may be correct.
  3. If the Normal probability plot had been different, giving a \(p\)-value of 0.65 for the Kolmogorov-Smirnov test, a different procedure could have been used to investigate the researcher's belief.
    • State what alternative test could have been used in this case.
    • State what the hypotheses would have been.
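The wording of part 3 suggests that the actual Kolmogorov-Smirnov \(p\)-value was small. That would make Normality doubtful, so a Wilcoxon signed rank test on the median (\(\mathrm{H}_0\): median \(= 1\), \(\mathrm{H}_1\): median \(< 1\)) is one reasonable choice. The Python sketch below computes \(W_+\). Instead of looking up a critical value in tables, it finds an exact lower-tail \(p\)-value by enumerating all \(2^{10}\) sign patterns. It also computes the one-sample \(t\) statistic that would be relevant if Normality were plausible, as in part 3. This is an illustrative check, not necessarily the intended method.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

increases = [-0.84, -0.76, -0.16, 0.43, 1.31, 1.32, 1.47, 1.64, 1.93, 2.14]
claimed = 1.0  # claimed average increase in kg

# Wilcoxon signed rank test, H0: median = 1, H1: median < 1
diffs = [x - claimed for x in increases]   # no zero differences in these data
by_size = sorted(diffs, key=abs)           # no tied |d| values either, so ranks are 1..n
w_plus = sum(rank for rank, d in enumerate(by_size, start=1) if d > 0)

# Exact lower-tail p-value: under H0 each rank is equally likely to carry either sign,
# so count the sign patterns whose positive-rank sum is at most the observed W+.
n = len(diffs)
count = sum(
    1 for signs in product((0, 1), repeat=n)
    if sum(rank * s for rank, s in zip(range(1, n + 1), signs)) <= w_plus
)
p_value = count / 2**n

# If Normality were plausible: one-sample t statistic with n - 1 = 9 degrees of freedom
t_stat = (mean(increases) - claimed) / (stdev(increases) / sqrt(n))

print(w_plus, round(p_value, 4), round(t_stat, 3))
```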
Question 12
12 The continuous random variable \(X\) has cumulative distribution function given by $$F(x) = \begin{cases} 0 & x < 0 \\ k\left(ax - 0.5x^{2}\right) & 0 \leqslant x \leqslant a \\ 1 & x > a \end{cases}$$ where \(a\) and \(k\) are positive constants.
  1. Determine the median of \(X\) in terms of \(a\).
  2. Given that \(a = 10\), determine the probability that \(X\) is within one standard deviation of its mean.
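Since \(F(a) = 1\), the equation \(k(a^2 - 0.5a^2) = 1\) gives \(k = 2/a^2\). Setting \(F(m) = 0.5\) then gives \(m^2 - 2am + \frac{a^2}{2} = 0\). Hence \(m = a\left(1 - \frac{1}{\sqrt{2}}\right)\); the other root lies above \(a\) and is rejected. The Python sketch below checks this numerically for \(a = 10\). It then computes part 2 from the pdf \(f(x) = k(a - x)\), for which \(\mathrm{E}(X) = a/3\) and \(\operatorname{Var}(X) = a^2/18\).

```python
from math import sqrt

def cdf(x: float, a: float) -> float:
    """F(x) with k = 2 / a^2, which follows from F(a) = 1."""
    k = 2 / a**2
    if x < 0:
        return 0.0
    if x > a:
        return 1.0
    return k * (a * x - 0.5 * x**2)

a = 10.0
median = a * (1 - 1 / sqrt(2))  # solves F(m) = 0.5; the other root a(1 + 1/sqrt(2)) exceeds a

# f(x) = F'(x) = k(a - x) on [0, a]; integrate x f(x) and x^2 f(x) over [0, a]
k = 2 / a**2
e_x = k * (a * a**2 / 2 - a**3 / 3)   # = a/3
e_x2 = k * (a * a**3 / 3 - a**4 / 4)  # = a^2/6
sd = sqrt(e_x2 - e_x**2)              # = a / sqrt(18)

p_within_1_sd = cdf(e_x + sd, a) - cdf(e_x - sd, a)
print(round(median, 4), round(e_x, 4), round(sd, 4), round(p_within_1_sd, 4))
```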