OCR MEI Further Statistics A AS (Further Statistics A AS) 2020 November

Question 1
1 The random variable \(X\) represents the number of cars arriving at a car wash per 10-minute period. From observations over a number of days, an estimate was made of the probability distribution of \(X\). Table 1 shows this estimated probability distribution. \begin{table}[h]
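\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
\(r\) & 0 & 1 & 2 & 3 & 4 & \(> 4\) \\
\hline
\(\mathrm { P } ( X = r )\) & 0.30 & 0.38 & 0.19 & 0.08 & 0.05 & 0 \\
\hline
\end{tabular}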
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
  1. In this question you must show detailed reasoning. Use Table 1 to calculate estimates of each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
  2. Explain how your answers to part (a) indicate that a Poisson distribution may be a suitable model for \(X\).
    You should now assume that \(X\) can be modelled by a Poisson distribution with mean equal to the value which you calculated in part (a).
  3. Find each of the following.
    • \(\mathrm { P } ( X = 2 )\)
    • \(\mathrm { P } ( X > 3 )\)
  4. Given that the probability that there is at least 1 car arriving in a period of \(k\) minutes is at least 0.99, find the least possible value of \(k\).
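The arithmetic in Question 1 can be checked numerically. The following is a minimal sketch in Python, not a model answer: the variable names are illustrative, and the last step assumes that the number of arrivals in a \(k\)-minute period is Poisson with mean proportional to the length of the period.

```python
import math

# Estimated distribution from Table 1: r -> P(X = r).
dist = {0: 0.30, 1: 0.38, 2: 0.19, 3: 0.08, 4: 0.05}

# E(X) and Var(X) = E(X^2) - E(X)^2 from the table.
mean = sum(r * p for r, p in dist.items())
var = sum(r * r * p for r, p in dist.items()) - mean ** 2


def poisson_pmf(r, lam):
    """P(X = r) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)


# Poisson model with mean equal to the estimated E(X).
lam = mean
p_two = poisson_pmf(2, lam)
p_more_than_three = 1 - sum(poisson_pmf(r, lam) for r in range(4))

# Assumption: a k-minute period has Poisson mean lam * k / 10.
# P(at least 1 car) = 1 - exp(-lam * k / 10) >= 0.99 gives k >= 10 ln(100) / lam.
k_min = 10 * math.log(100) / lam

print(mean, var, p_two, p_more_than_three, k_min)
```

A mean and variance that come out close to each other are what part (b) is pointing at, since a Poisson distribution has equal mean and variance.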
Question 2
2 A researcher is investigating the concentration of bacteria and fungi in the air in buildings. The researcher selects a random sample of 12 buildings and measures the concentrations of bacteria, \(x\), and fungi, \(y\), in the air in each building. Both concentrations are measured in the same standard units. Fig. 2 illustrates the data collected. The researcher wishes to test for a relationship between \(x\) and \(y\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{ba3fcd3c-6834-4116-be0e-d5b27aed0a7e-3_595_844_513_255} \captionsetup{labelformat=empty} \caption{Fig. 2}
\end{figure}
  1. Explain why a test based on the product moment correlation coefficient is likely to be appropriate for these data.
    Summary statistics for the data are as follows.
    \(n = 12 \quad \sum x = 18030 \quad \sum y = 15550 \quad \sum x ^ { 2 } = 31458700 \quad \sum y ^ { 2 } = 21980500 \quad \sum x y = 25626800\)
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient between \(x\) and \(y\).
  3. Carry out a test at the \(5 \%\) significance level based on the product moment correlation coefficient to investigate whether there is any correlation between concentrations of bacteria and fungi.
  4. Explain why, in order for proper inference to be undertaken, the sample should be chosen randomly.
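The product moment correlation coefficient in Question 2 can be obtained from the summary statistics via \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\). The sketch below is one way to check the arithmetic, with illustrative variable names; the critical value for the test still has to be read from tables for \(n = 12\).

```python
import math

# Summary statistics quoted in the question.
n = 12
sum_x, sum_y = 18030, 15550
sum_x2, sum_y2, sum_xy = 31458700, 21980500, 25626800

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Product moment correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

print(s_xx, s_yy, s_xy, r)
```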
Question 3
3 A child is trying to draw court cards from an ordinary pack of 52 cards (court cards are Kings, Queens and Jacks; there are 12 in a pack). She draws cards, one at a time, with replacement, from the pack. Find the probabilities of the following events.
  1. She draws a court card for the first time on the sixth try.
  2. She draws a court card at least once in the first six tries.
  3. She draws a court card for the second time on the sixth try.
  4. She draws at least two court cards in the first six tries.
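Because the cards are drawn with replacement, each try in Question 3 is an independent trial with success probability \(12/52\). The following sketch, using exact fractions and illustrative names, shows one way the four probabilities could be checked; it is not the only valid method.

```python
from fractions import Fraction
from math import comb

p = Fraction(12, 52)  # probability of a court card on any one try
q = 1 - p             # probability of a non-court card

# 1. First court card on the sixth try: five failures, then a success.
first_on_sixth = q ** 5 * p

# 2. At least one court card in six tries: complement of six failures.
at_least_one = 1 - q ** 6

# 3. Second court card on the sixth try: exactly one success in the
#    first five tries, then a success on the sixth.
second_on_sixth = comb(5, 1) * p * q ** 4 * p

# 4. At least two court cards in six tries: complement of zero or one.
at_least_two = 1 - q ** 6 - comb(6, 1) * p * q ** 5

for value in (first_on_sixth, at_least_one, second_on_sixth, at_least_two):
    print(value, float(value))
```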
Question 4
4 A fair 8-sided dice has faces labelled \(1, 2, \ldots, 8\). The random variable \(X\) represents the score when the dice is rolled once.
  1. State the distribution of \(X\).
  2. Find \(\mathrm { P } ( X < 4 )\).
  3. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
  4. The random variable \(Y\) is defined by \(Y = 10 X + 5\). Find each of the following.
    • \(\mathrm { E } ( Y )\)
    • \(\operatorname { Var } ( Y )\)
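The quantities in Question 4 can be checked by direct enumeration of the eight equally likely scores. This is a small sketch with illustrative names; it uses the standard results \(\mathrm{E}(aX + b) = a\,\mathrm{E}(X) + b\) and \(\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)\) for the linear transformation.

```python
from fractions import Fraction

scores = range(1, 9)   # faces 1, 2, ..., 8
p = Fraction(1, 8)     # each face is equally likely

# P(X < 4) by counting favourable faces.
p_less_than_4 = sum(p for x in scores if x < 4)

# E(X) and Var(X) by direct summation.
e_x = sum(x * p for x in scores)
var_x = sum(x * x * p for x in scores) - e_x ** 2

# Y = 10X + 5.
e_y = 10 * e_x + 5
var_y = 10 ** 2 * var_x

print(p_less_than_4, e_x, var_x, e_y, var_y)
```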
Question 5
5 A doctor is investigating the relationship between the levels in the blood of a particular hormone and of calcium in healthy adults. The levels of the hormone and of calcium, each measured in suitable units, are denoted by \(x\) and \(y\) respectively. The doctor selects a random sample of 14 adults and measures the hormone and calcium levels in each of them. The spreadsheet in Fig. 5 shows the values obtained, together with a scatter diagram which illustrates the data. The equation of the regression line of \(y\) on \(x\) is shown on the scatter diagram, together with the value of the square of the product moment correlation coefficient. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{ba3fcd3c-6834-4116-be0e-d5b27aed0a7e-5_801_1644_646_255} \captionsetup{labelformat=empty} \caption{Fig. 5}
\end{figure}
  1. Use the equation of the regression line to estimate the mean calcium level of people with the following hormone levels.
    • 150
    • 250
  2. Explain which of your two estimates is likely to be more reliable.
  3. Comment on the goodness of fit of the regression line.
  4. Explain whether it would be appropriate to plot the scatter diagram the other way around with calcium level on the horizontal axis and hormone level on the vertical axis.
  5. Calculate the equation of a regression line which would be suitable for estimating the mean hormone level of people with a known calcium level.
Question 6
6 A researcher is investigating whether there is any relationship between whether a cyclist wears a helmet and the distance, \(x \mathrm {~m}\), the cyclist is from the kerb (the edge of the road). Data are collected at a particular location for a random sample of 250 cyclists. The researcher carries out a chi-squared test. Fig. 6 is a screenshot showing part of a spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted. \begin{table}[h]
\begin{tabular}{|c|l|l|c|c|c|c|c|}
\hline
 & A & B & C & D & E & F & G \\
\hline
1 & & & \multicolumn{4}{c|}{Observed frequency} & \\
\hline
2 & & & \(x \leq 0.3\) & \(0.3 < x \leq 0.5\) & \(0.5 < x \leq 0.8\) & \(x > 0.8\) & Totals \\
\hline
3 & Wears helmet & Yes & 26 & 27 & 23 & 46 & 122 \\
\hline
4 & & No & 45 & 31 & 21 & 31 & 128 \\
\hline
5 & & Totals & 71 & 58 & 44 & 77 & 250 \\
\hline
6 & & & & & & & \\
\hline
7 & & & \multicolumn{4}{c|}{Expected frequency} & \\
\hline
8 & & & \(x \leq 0.3\) & \(0.3 < x \leq 0.5\) & \(0.5 < x \leq 0.8\) & \(x > 0.8\) & \\
\hline
9 & Wears helmet & Yes & 34.6480 & & & 37.5760 & \\
\hline
10 & & No & 36.3520 & & & 39.4240 & \\
\hline
11 & & & & & & & \\
\hline
12 & & & \multicolumn{4}{c|}{Contribution to the test statistic} & \\
\hline
13 & & & \(x \leq 0.3\) & \(0.3 < x \leq 0.5\) & \(0.5 < x \leq 0.8\) & \(x > 0.8\) & \\
\hline
14 & Wears helmet & Yes & 2.1585 & 0.0601 & 0.1087 & 1.8885 & \\
\hline
15 & & No & 2.0573 & 0.0573 & & 1.8000 & \\
\hline
16 & & & & & & & \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 6}
\end{table}
  1. Showing your calculations, find the missing values in each of the following cells.
    • E10
    • E15
  2. In this question you must show detailed reasoning. Carry out a hypothesis test at the \(10 \%\) significance level to investigate whether there is any association between helmet wearing and distance from the kerb.
  3. Discuss briefly what the data suggest about helmet wearing for different distances from the kerb.
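The spreadsheet values in Fig. 6 follow the usual chi-squared contingency-table calculation: expected frequency = row total × column total / grand total, and contribution = (observed − expected)² / expected. The sketch below recomputes them from the observed frequencies; the list layout and names are illustrative, with row index 1 standing for "No" and column index 2 for \(0.5 < x \leq 0.8\) (spreadsheet column E).

```python
# Observed frequencies from Fig. 6: rows are Yes / No,
# columns are the four distance bands in order.
observed = [
    [26, 27, 23, 46],
    [45, 31, 21, 31],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell under independence.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Contribution of each cell to the test statistic.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

test_statistic = sum(sum(row) for row in contributions)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print("E10 =", round(expected[1][2], 4))
print("E15 =", round(contributions[1][2], 4))
print("test statistic =", round(test_statistic, 3), "df =", degrees_of_freedom)
```

The test statistic is then compared with the chi-squared critical value for the stated degrees of freedom at the 10% level, taken from tables.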