OCR MEI Further Statistics Minor 2022 June

Question 1
1 In a quiz, a contestant is asked up to four questions. The contestant's turn ends once the contestant gets a question wrong or has answered all four questions. The probability that a particular contestant gets any question correct is 0.6, independently of other questions. The discrete random variable \(X\) models the number of questions which the contestant gets correct in a turn.
  1. Show that \(\mathrm { P } ( X = 4 ) = 0.1296\). The probability distribution of \(X\) is shown in Fig. 1.1. \begin{table}[h]
    \begin{tabular}{|c|c|c|c|c|c|}
    \hline
    \(r\) & 0 & 1 & 2 & 3 & 4 \\
    \hline
    \(\mathrm { P } ( X = r )\) & 0.4 & 0.24 & 0.144 & 0.0864 & 0.1296 \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Fig. 1.1}
    \end{table}
  2. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
    The number of points that a contestant scores is shown in Fig. 1.2. \begin{table}[h]
    \begin{tabular}{|c|c|}
    \hline
    Number of questions correct & Number of points scored \\
    \hline
    0 or 1 & 0 \\
    2 & 2 \\
    3 & 3 \\
    4 & 5 \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Fig. 1.2}
    \end{table}
    The discrete random variable \(Y\) models the number of points which the contestant scores.
  3. Without doing any working, explain whether each of the following will be less than, equal to or greater than the corresponding value for \(X\).
    • \(\mathrm { E } ( Y )\)
    • \(\operatorname { Var } ( Y )\)
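As a cross-check on parts 1 and 2, and on the direction of each comparison in part 3, the Python sketch below builds the distribution of \(X\) from the stated success probability of 0.6 and then computes the mean and variance of both \(X\) and \(Y\). The points mapping is read from Fig. 1.2; the helper `mean_and_variance` is a hypothetical name introduced only for this illustration. Part 3 asks for an argument without working, so the code only confirms the conclusions rather than replacing them.

```python
# Sketch: distributions of X (questions correct) and Y (points scored) for Question 1.
# Assumes P(correct) = 0.6 independently for each question, as stated.
p = 0.6

# X = r for r = 0..3 means r correct answers followed by a wrong one.
dist_x = {r: p**r * (1 - p) for r in range(4)}
dist_x[4] = p**4  # all four questions answered correctly

# Points scored, from Fig. 1.2: 0 or 1 correct -> 0, 2 -> 2, 3 -> 3, 4 -> 5.
points = {0: 0, 1: 0, 2: 2, 3: 3, 4: 5}

def mean_and_variance(dist, f=lambda r: r):
    """Return E(f(X)) and Var(f(X)) for a finite distribution {value: probability}."""
    mean = sum(f(r) * pr for r, pr in dist.items())
    mean_sq = sum(f(r) ** 2 * pr for r, pr in dist.items())
    return mean, mean_sq - mean**2

e_x, var_x = mean_and_variance(dist_x)
e_y, var_y = mean_and_variance(dist_x, lambda r: points[r])

print(f"E(X) = {e_x:.4f}, Var(X) = {var_x:.4f}")
print(f"E(Y) = {e_y:.4f}, Var(Y) = {var_y:.4f}")
```

Moving from \(X\) to \(Y\) removes 1 point with probability 0.24 (one correct answer) and adds 1 point with probability 0.1296 (four correct answers), so the mean falls while the values spread further apart.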
Question 2
2 A forester is investigating the relationship between the diameter and the height of young beech trees. She selects a random sample of 15 young beech trees in a forest and records their diameters, \(d \mathrm {~cm}\), and their heights, \(h \mathrm {~m}\). The data are illustrated in the scatter diagram.
\includegraphics[max width=\textwidth, alt={}, center]{e8624e9b-5143-49d2-9683-cc3a1082694e-3_649_1116_386_230}
  1. State whether either or both of the variables \(d\) and \(h\) are random variables.
    Summary data for the diameters and heights are as follows.
    $$n = 15 \quad \sum d = 84.9 \quad \sum h = 124.7 \quad \sum d^{2} = 624.55 \quad \sum h^{2} = 1230.57 \quad \sum dh = 866.63$$
  2. Find the equation of the regression line of \(h\) on \(d\). Give your answer in the form \(h = a d + b\), giving the values of \(a\) and \(b\) correct to \(\mathbf { 2 }\) decimal places.
  3. Use the regression line to predict the heights of beech trees with the following diameters.
    • 7.5 cm
    • 20.0 cm
    • Comment on the reliability of your predictions.
    • There are many mature beech trees with a diameter of 60 cm or greater. However, there are no beech trees with a height of more than 50 m. Comment on this in relation to your regression line.
  4. State the coordinates of the point at which the regression line of \(d\) on \(h\) meets the line which you calculated in part 2.
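The regression arithmetic for this question can be checked with a short Python sketch that works only from the summary statistics given (\(\sum h^2\) is not needed for the regression line of \(h\) on \(d\)). It forms \(S_{dd}\) and \(S_{dh}\), takes the gradient \(a = S_{dh}/S_{dd}\) and the intercept \(b = \bar h - a\bar d\), makes the two predictions, and reports the mean point \((\bar d, \bar h)\), through which both regression lines pass. The variable names and the `predict` helper are illustrative choices, not part of the question.

```python
# Sketch: least-squares regression line of h on d from the Question 2 summary data.
n = 15
sum_d, sum_h = 84.9, 124.7
sum_d2, sum_dh = 624.55, 866.63

# Corrected sum of squares and sum of products.
s_dd = sum_d2 - sum_d**2 / n
s_dh = sum_dh - sum_d * sum_h / n

d_bar, h_bar = sum_d / n, sum_h / n
a = s_dh / s_dd          # gradient of the line of h on d
b = h_bar - a * d_bar    # chosen so the line passes through (d_bar, h_bar)

def predict(d):
    """Predicted height in metres for a diameter d in centimetres (unrounded a, b)."""
    return a * d + b

print(f"h = {a:.2f}d + {b:.2f}")
for d in (7.5, 20.0):
    print(f"d = {d} cm -> predicted h = {predict(d):.2f} m")

# Both regression lines (h on d, and d on h) pass through the mean point.
print(f"mean point: ({d_bar:.2f}, {h_bar:.2f})")
```

Because `predict` uses the unrounded coefficients, its values can differ in the second decimal place from predictions made with the 2 d.p. line. Judging the reliability of each prediction still depends on the range of diameters in the scatter diagram, which the code does not see.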
Question 3
3 Jane wonders whether the number of wasps entering a wasps' nest per 5-second interval can be modelled by a Poisson distribution with mean \(\mu\). She counts the number of wasps entering the nest over 60 randomly selected 5-second intervals. The results are shown in Fig. 3.1. \begin{table}[h]
\begin{tabular}{|l|c|c|c|c|c|c|c|c|c|c|c|}
\hline
Number of wasps & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & \(\geqslant 10\) \\
\hline
Frequency & 0 & 2 & 5 & 5 & 12 & 10 & 10 & 11 & 1 & 4 & 0 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 3.1}
\end{table}
  1. Show that a suitable estimate for the value of \(\mu\) is 5.1. Fig. 3.2 shows part of a screenshot for a \(\chi ^ { 2 }\) test to assess the goodness of fit of a Poisson model. The sample mean has been used as an estimate for the population mean. Some of the values in the spreadsheet have been deliberately omitted. \begin{table}[h]
    \begin{tabular}{|c|c|c|c|c|c|}
    \hline
     & A & B & C & D & E \\
    \hline
    1 & Number of wasps & Observed frequency & Poisson probability & Expected frequency & Chi-squared contribution \\
    \hline
    2 & \(\leqslant 2\) & 7 & 0.1165 & 6.9887 & 0.0000 \\
    3 & 3 & 5 & & 8.0874 & 1.1786 \\
    4 & 4 & 12 & & & 0.2765 \\
    5 & 5 & 10 & & & 0.0255 \\
    6 & 6 & 10 & 0.1490 & 8.9400 & 0.1257 \\
    7 & 7 & 11 & 0.1086 & 6.5134 & 3.0904 \\
    8 & \(\geqslant 8\) & 5 & 0.1440 & 8.6414 & \\
    9 & & & & & \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Fig. 3.2}
    \end{table}
  2. Determine the missing values in each of the following cells, giving your answers correct to 4 decimal places.
    • C3
    • D5
    • E8
    • Explain why some of the frequencies have been combined into the categories \(\leqslant 2\) and \(\geqslant 8\).
    • In this question you must show detailed reasoning. Carry out the hypothesis test at the 5\% significance level.
  3. Jane also carries out a \(\chi ^ { 2 }\) test for the number of wasps leaving another nest. As part of her calculations, she finds that the probability of no wasps leaving the nest in a 5-second period is 0.0053. She finds that a Poisson distribution is also an appropriate model in this case. Find a suitable estimate for the value of the mean number of wasps leaving the nest per 5-second period.
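The goodness-of-fit calculation in this question can be reproduced with the Python sketch below, which uses only the standard library and the frequencies in Fig. 3.1. It estimates \(\mu\) by the sample mean, groups the cells as in Fig. 3.2, computes the Poisson probabilities, expected frequencies and contributions, and compares the total with a critical value. The critical value 11.070 (upper 5\% point of \(\chi^2\) on 5 degrees of freedom) is hard-coded from standard tables rather than computed; the names `freq`, `poisson_pmf` and `critical_5pc` are illustrative.

```python
import math

# Sketch of the Question 3 goodness-of-fit test, using the data in Fig. 3.1.
# Keys are numbers of wasps; the ">= 10" column had frequency 0, so it is omitted.
freq = {0: 0, 1: 2, 2: 5, 3: 5, 4: 12, 5: 10, 6: 10, 7: 11, 8: 1, 9: 4}
n = sum(freq.values())                            # 60 intervals
mu = sum(x * f for x, f in freq.items()) / n      # sample mean, 306 / 60

def poisson_pmf(x, mean):
    """P(X = x) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean**x / math.factorial(x)

# Cells as in spreadsheet rows 2 to 8 of Fig. 3.2: <= 2, 3, 4, 5, 6, 7, >= 8.
observed = [sum(freq[x] for x in range(3))]
observed += [freq[x] for x in range(3, 8)]
observed.append(sum(f for x, f in freq.items() if x >= 8))

probs = [sum(poisson_pmf(x, mu) for x in range(3))]
probs += [poisson_pmf(x, mu) for x in range(3, 8)]
probs.append(1 - sum(probs))                      # P(X >= 8) as the remainder

expected = [n * p for p in probs]
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

# 7 cells, less 1 for the fixed total and 1 for estimating mu from the data.
dof = len(observed) - 2
critical_5pc = 11.070  # upper 5% point of chi-squared on 5 d.f., from standard tables

print(f"mu = {mu:.1f}")
print(f"C3 = {probs[1]:.4f}, D5 = {expected[3]:.4f}, E8 = {contributions[6]:.4f}")
print(f"X^2 = {chi_sq:.3f} on {dof} d.f.; exceeds critical value: {chi_sq > critical_5pc}")

# Part 3: P(no wasps) = exp(-m) = 0.0053, so m = -ln(0.0053).
m_leaving = -math.log(0.0053)
print(f"estimated mean leaving per 5 seconds: {m_leaving:.2f}")
```

With these figures the test statistic is about 6.23, which is below 11.07, so at the 5\% level there is no evidence against the Poisson model. The grouping into \(\leqslant 2\) and \(\geqslant 8\) keeps every expected frequency above 5.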
Question 4
4 Alex is practising bowling at a cricket wicket. Every time she bowls a ball, she has a \(30 \%\) chance of hitting the wicket.
  1. Assuming that successive bowls are independent, determine the probability that Alex first hits the wicket on her third attempt.
  2. Determine the probability that Alex hits the wicket for the fourth time on her tenth attempt.
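Both parts of this question are standard waiting-time calculations. The first is geometric (two misses, then a hit). The second is negative binomial: exactly 3 hits in the first 9 bowls, followed by a hit on the 10th. The Python sketch below evaluates both, assuming that successive bowls are independent in part 2 as well. It uses `math.comb`, which needs Python 3.8 or later.

```python
from math import comb

p = 0.3  # probability that a single bowl hits the wicket

# Part 1 (geometric): miss, miss, then hit on the third attempt.
p_first_hit_third = (1 - p) ** 2 * p

# Part 2 (negative binomial): 3 hits among the first 9 bowls, then a hit on the 10th.
p_fourth_hit_tenth = comb(9, 3) * p**3 * (1 - p) ** 6 * p

print(f"P(first hit on 3rd attempt)   = {p_first_hit_third:.4f}")
print(f"P(fourth hit on 10th attempt) = {p_fourth_hit_tenth:.4f}")
```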
Question 5
5 A medical researcher is investigating whether there is any relationship between the age of a person and the level of a particular protein in the person's blood. She measures the protein level (in suitable units) in a random sample of 12 hospital patients of various ages (in years). The spreadsheet shows the values obtained, together with a scatter diagram which illustrates the data.
\includegraphics[max width=\textwidth, alt={}, center]{e8624e9b-5143-49d2-9683-cc3a1082694e-5_736_1470_1087_246}
  1. The researcher decides that a test based on Pearson's product moment correlation coefficient may not be valid. Explain why she comes to this conclusion.
  2. Calculate the value of Spearman's rank correlation coefficient.
  3. Carry out a test based on this coefficient at the \(5 \%\) significance level to investigate whether there is any association between age and protein level.
  4. Explain why the researcher chose a sample that was random.
  5. The researcher had originally intended to use a sample size of 6 rather than the 12 that she actually used. Explain what advantage there is in using the larger sample size.
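The calculation in part 2 follows the usual recipe: rank each variable separately, take the rank differences \(d\), and use \(r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}\). The patients' data appear only in the spreadsheet image, so the Python sketch below defines the procedure and is exercised on made-up inputs, not on the researcher's values. The helper names `ranks` and `spearman_rs` are hypothetical. Tied values receive the mean of their ranks; with ties present the \(\sum d^2\) formula is only an approximation.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman's rank correlation coefficient, r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)).
    Exact when there are no tied ranks."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))
```

For example, `spearman_rs([1, 2, 3], [1, 3, 2])` gives 0.5, since \(\sum d^2 = 2\) and \(n(n^2 - 1) = 24\).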
Question 6
6 The random variable \(X\) has a uniform distribution over the values \(\{ 1,4,7 , \ldots , 3 n - 2 \}\), where \(n\) is a positive integer.
  1. Determine \(\operatorname { Var } ( X )\) in terms of \(n\).
  2. Given that \(n = 100\), find the probability that \(X\) is within one standard deviation of the mean.
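One route to part 1 is to write \(X = 3U - 2\), where \(U\) is uniform on \(\{1, 2, \ldots, n\}\). Since \(\operatorname{Var}(U) = \frac{n^2 - 1}{12}\), this gives \(\operatorname{Var}(X) = 9 \cdot \frac{n^2 - 1}{12} = \frac{3(n^2 - 1)}{4}\). The Python sketch below checks that closed form against a direct exact calculation using `fractions.Fraction`, then counts the values lying within one standard deviation of the mean when \(n = 100\). The function name `var_x` is an illustrative choice.

```python
import math
from fractions import Fraction

def var_x(n):
    """Exact variance of X uniform on {1, 4, ..., 3n - 2}, from the definition."""
    values = [3 * k - 2 for k in range(1, n + 1)]
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n

# Closed form from X = 3U - 2: Var(X) = 3(n^2 - 1)/4.
for n in (1, 2, 5, 100):
    assert var_x(n) == Fraction(3 * (n * n - 1), 4)

n = 100
values = [3 * k - 2 for k in range(1, n + 1)]
mean = sum(values) / n                 # 149.5
sd = math.sqrt(3 * (n * n - 1) / 4)   # sqrt(7499.25), roughly 86.6
within = [v for v in values if abs(v - mean) <= sd]
print(len(within) / n)
```

No value of \(X\) lies exactly one standard deviation from the mean, so it makes no difference whether "within" is read as a strict or non-strict inequality.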