OCR MEI Further Statistics Minor (Further Statistics Minor) 2024 June

Question 1

1 When a footballer takes a penalty kick the result is that either a goal is scored or a goal is not scored. It is known that, on average, a certain footballer scores a goal on $85 \%$ of penalty kicks. During one practice session, the footballer decides to take penalty kicks until a goal is not scored. You may assume that the outcome of each penalty kick that the footballer takes is independent of the outcome of each other penalty kick. The random variable representing the number of penalty kicks up to and including the first penalty kick that does not result in a goal is denoted by $X$.

State one further assumption that is necessary for $X$ to be modelled by a Geometric distribution. For the remainder of this question you may assume that this assumption is valid.
Find each of the following.
- $\mathrm { E } ( X )$
- $\operatorname { Var } ( X )$
- Find the probability that the footballer takes exactly 3 penalty kicks.
- Find the probability that the footballer takes at least 5 penalty kicks.
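
A possible working for the calculations above, offered as a sketch rather than an official solution. It assumes $X \sim \mathrm{Geo}(p)$ with $p = 0.15$ (the probability that a single penalty kick does not result in a goal) and uses the standard results for the Geometric distribution.

\[ \mathrm{E}(X) = \frac{1}{p} = \frac{1}{0.15} \approx 6.67, \qquad \operatorname{Var}(X) = \frac{1 - p}{p^{2}} = \frac{0.85}{0.15^{2}} \approx 37.8 \]

\[ \mathrm{P}(X = 3) = 0.85^{2} \times 0.15 \approx 0.108, \qquad \mathrm{P}(X \geq 5) = 0.85^{4} \approx 0.522 \]

The last result uses the fact that at least 5 penalty kicks are taken exactly when the first 4 all result in goals.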

Question 2

2 The sides of a fair 12-sided spinner are labelled $1, 2, \ldots, 12$. The spinner is spun and $X$ is the random variable denoting the number on the side of the spinner that it lands on.

Suggest a suitable distribution to model $X$. You should state the value(s) of any parameter(s).
Find each of the following.
- $\mathrm { E } ( X )$
- $\operatorname { Var } ( X )$
You are given that $\mathrm { E } ( X )$ is denoted by $\mu$ and $\operatorname { Var } ( X )$ is denoted by $\sigma ^ { 2 }$.
Determine $\mathrm { P } \left( \left| \frac { 2 ( X - \mu ) } { \sigma } \right| > 1 \right)$.
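
One way to set out the working, assuming the discrete uniform distribution on $\{1, 2, \ldots, 12\}$ is the chosen model (a sketch, not a mark scheme). For a uniform distribution on $1, \ldots, n$ the standard results are $\mathrm{E}(X) = \frac{n+1}{2}$ and $\operatorname{Var}(X) = \frac{n^{2}-1}{12}$, so with $n = 12$:

\[ \mu = \frac{13}{2} = 6.5, \qquad \sigma^{2} = \frac{143}{12} \approx 11.92, \qquad \sigma \approx 3.45 \]

\[ \mathrm{P}\left( \left| \frac{2(X - \mu)}{\sigma} \right| > 1 \right) = \mathrm{P}\left( |X - 6.5| > \tfrac{1}{2}\sigma \approx 1.73 \right) = \mathrm{P}(X \leq 4) + \mathrm{P}(X \geq 9) = \frac{8}{12} = \frac{2}{3} \]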

Question 3

3 The scatter diagram below illustrates data concerning average annual income per person, $\$ x$, and average life expectancy, $y$ years, for 45 randomly selected cities.
\includegraphics[max width=\textwidth, alt={}, center]{464c80be-007b-4d5a-9fe5-2f35100bdea6-3_860_1465_354_244}

State whether neither variable, one variable or both variables can be considered to be random in this situation.

A student is researching possible positive association between average annual income and average life expectancy. The student decides that the data point labelled A on the scatter diagram is an outlier.

Describe the apparent relationship between average annual income and average life expectancy for this data point relative to the rest of the data.

The data for point A is removed. The student now wishes to carry out a hypothesis test using the product moment correlation coefficient for the remaining 44 data points to investigate whether there is positive correlation between average annual income and average life expectancy.

Explain why this type of hypothesis test is appropriate in this situation. Justify your answer.

The summary statistics for these 44 data points are as follows.
$\sum x = 751120, \quad \sum y = 2397.1, \quad \sum x ^ { 2 } = 14363849200, \quad \sum y ^ { 2 } = 133014.63, \quad \sum x y = 42465962$
Determine the value of the product moment correlation coefficient.
Carry out the test at the 1\% significance level.
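
A sketch of how the correlation coefficient might be computed from the summary statistics with $n = 44$. The symbols $S_{xx}$, $S_{yy}$ and $S_{xy}$ are notation introduced here for illustration and do not appear in the question; the numerical values are rounded.

\[ S_{xx} = \sum x^{2} - \frac{\left(\sum x\right)^{2}}{n} \approx 1.5415 \times 10^{9}, \qquad S_{yy} = \sum y^{2} - \frac{\left(\sum y\right)^{2}}{n} \approx 2421.7, \qquad S_{xy} = \sum xy - \frac{\sum x \sum y}{n} \approx 1.5453 \times 10^{6} \]

\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \approx 0.800 \]

For the test, the hypotheses would be $\mathrm{H}_{0}: \rho = 0$ against $\mathrm{H}_{1}: \rho > 0$, where $\rho$ is the population correlation coefficient, and $r$ would be compared with the one-tailed 1\% critical value for $n = 44$ read from the statistical tables supplied (not reproduced here).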

Question 4

4 A genetics researcher is investigating whether there is any association between natural hair colour and natural eye colour. A random sample of 800 adults is selected. Each adult can categorise their natural hair colour as blonde, brown, black or red and their natural eye colour as brown, blue or green.

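Explain the benefit of using a random sample in this investigation. The data collected from the sample are summarised in Table 4.1.
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Table 4.1}
\begin{tabular}{|l|l|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Observed frequency}} & \multicolumn{5}{c|}{Hair Colour} \\
\cline{3-7}
\multicolumn{2}{|c|}{} & Blonde & Brown & Black & Red & Total \\
\hline
\multirow{3}{*}{Eye Colour} & Brown & 47 & 153 & 196 & 36 & 432 \\
\cline{2-7}
 & Blue & 61 & 78 & 115 & 26 & 280 \\
\cline{2-7}
 & Green & 19 & 22 & 31 & 16 & 88 \\
\hline
 & Total & 127 & 253 & 342 & 78 & 800 \\
\hline
\end{tabular}
\end{table}
The researcher decides to carry out a chi-squared test.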
Determine the expected frequencies for each eye colour in the blonde hair category. You are given that the test statistic is 28.62 to 2 decimal places.
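
A sketch of the expected-frequency calculation for the blonde hair column, assuming the usual rule under the null hypothesis of no association (expected frequency = row total $\times$ column total $\div$ grand total) and using the totals from Table 4.1:

\[ \text{Brown eyes: } \frac{432 \times 127}{800} = 68.58, \qquad \text{Blue eyes: } \frac{280 \times 127}{800} = 44.45, \qquad \text{Green eyes: } \frac{88 \times 127}{800} = 13.97 \]

As a check, these three values sum to the blonde column total of 127.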

Carry out the chi-squared test at the 10\% significance level. Table 4.2 shows the chi-squared contributions for some of the categories. The contributions for the categories relating to green eye colour have been deliberately omitted.
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Table 4.2}
\begin{tabular}{|l|l|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{} & \multicolumn{4}{c|}{Hair Colour} \\
\cline{3-6}
\multicolumn{2}{|c|}{} & Blonde & Brown & Black & Red \\
\hline
\multirow{3}{*}{Eye Colour} & Brown & 6.791 & 1.964 & 0.694 & 0.889 \\
\cline{2-6}
 & Blue & 6.162 & 1.257 & 0.185 & 0.062 \\
\cline{2-6}
 & Green & & & & \\
\hline
\end{tabular}
\end{table}

Calculate the chi-squared contribution for the green eye and blonde hair category.
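
A possible working, given as a sketch. It assumes the standard contribution formula $\frac{(O - E)^{2}}{E}$, with observed frequency $O = 19$ from Table 4.1 and expected frequency $E = \frac{88 \times 127}{800} = 13.97$ for the green eye and blonde hair category:

\[ \frac{(19 - 13.97)^{2}}{13.97} = \frac{25.3009}{13.97} \approx 1.811 \]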

With reference to the values in Table 4.2, discuss what the data suggest about brown eye colour and blue eye colour for people with blonde hair.

A different researcher, carrying out the same investigation, independently takes a different random sample of size 800 and performs the same hypothesis test, but at the 1\% significance level, reaching the same conclusion as the original test. By comparing only the significance level of the two tests, specify which test, the one at the 10\% significance level or the one at the 1\% significance level, provides stronger evidence for the conclusion. Justify your answer.

Question 5

5 Over a long period of time, it is found that the mean number of mistakes made by a certain player when playing a particular piece of music is 5. The number of mistakes that the player makes when playing the piece is denoted by the random variable $Y$.

State two assumptions necessary for $Y$ to be modelled by a Poisson distribution. For the remainder of this question you may assume that $Y$ can be modelled by a Poisson distribution.
1. Find the probability that the player makes exactly 3 mistakes when playing the piece.
2. Find the probability that the player makes fewer than 3 mistakes when playing the piece.
3. Find the probability that the player makes fewer than 6 mistakes in total when playing the piece twice, assuming that the performances are independent.

In a recording studio, the player plays the piece once in the morning and once in the afternoon each day for one week (7 days). It can be assumed that all the performances are independent of each other. The performances are recorded onto two CDs, one for each of two critics, A and B, to review. The critics are interested in the total number of mistakes made by the player per day. Unfortunately, there is a recording error in one of the CDs; on this CD, every piece that is supposed to be an afternoon recording is in fact just a repeat of that morning’s recording. The random variables $M _ { 1 }$ and $M _ { 2 }$ represent the total number of mistakes per day for the correctly recorded CD and for the wrongly recorded CD respectively.

By considering the values of $\mathrm { E } \left( M _ { 1 } \right)$ and $\mathrm { E } \left( M _ { 2 } \right)$ explain why it is not possible to use the mean number of mistakes per day on the CDs to determine which critic received the wrongly recorded CD.

Each critic counts the total number of mistakes made per day, for each of the 7 days of recordings on their CD. Summary data for this is given below.

Critic A: $\quad n = 7 , \quad \sum x _ { A } = 70 , \quad \sum x _ { A } ^ { 2 } = 812$
Critic B: $\quad n = 7 , \quad \sum x _ { B } = 72 , \quad \sum x _ { B } ^ { 2 } = 800$
By considering the values of $\operatorname { Var } \left( M _ { 1 } \right)$ and $\operatorname { Var } \left( M _ { 2 } \right)$ determine which critic is likely to have received the wrongly recorded CD.
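
A sketch of one way the calculations in this question might be set out, assuming $Y \sim \mathrm{Po}(5)$. The symbols $Y_{1}$, $Y_{2}$ (the morning and afternoon counts on one day), $T$ (the total over two independent performances) and $s^{2}$ (sample variance) are introduced here for illustration.

\[ \mathrm{P}(Y = 3) = \frac{e^{-5}\, 5^{3}}{3!} \approx 0.1404, \qquad \mathrm{P}(Y < 3) = e^{-5}\left(1 + 5 + \frac{5^{2}}{2!}\right) \approx 0.1247 \]

\[ T \sim \mathrm{Po}(10), \qquad \mathrm{P}(T < 6) = \mathrm{P}(T \leq 5) \approx 0.0671 \]

With $M_{1} = Y_{1} + Y_{2}$ and $M_{2} = 2Y_{1}$, the means agree but the variances differ:

\[ \mathrm{E}(M_{1}) = 5 + 5 = 10, \qquad \mathrm{E}(M_{2}) = 2 \times 5 = 10 \]

\[ \operatorname{Var}(M_{1}) = 5 + 5 = 10, \qquad \operatorname{Var}(M_{2}) = 2^{2} \times 5 = 20 \]

\[ s_{A}^{2} = \frac{1}{6}\left(812 - \frac{70^{2}}{7}\right) \approx 18.7, \qquad s_{B}^{2} = \frac{1}{6}\left(800 - \frac{72^{2}}{7}\right) \approx 9.9 \]

On this working, Critic A's sample variance is closer to 20 and Critic B's is closer to 10, which would suggest that Critic A is the more likely to have received the wrongly recorded CD.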

Question 6

6 The probability distribution of a discrete random variable, $X$, is shown in the table below.

$x$	0	1	2
$\mathrm { P } ( X = x )$	$1 - a - b$	$a$	$b$

Find $\mathrm { E } ( X )$ in terms of $a$ and $b$.
1. In the case where $\mathrm { E } ( X ) = a + 0.4$, find an expression for $\operatorname { Var } ( X )$ in terms of $a$.
2. In this case, show that the greatest possible value of $\operatorname { Var } ( X )$ is 0.65. You must state the associated value of $a$.
You are now given instead that $\mathrm { E } ( X )$ is not known.
1. State the least possible value of $\operatorname { Var } ( X )$.
2. Give all possible pairs of values of $a$ and $b$ which give the least possible value of $\operatorname { Var } ( X )$ stated in part (c)(i).
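
A sketch of the algebra for this question, using only the table of probabilities above (an illustrative working, not an official solution):

\[ \mathrm{E}(X) = 0(1 - a - b) + 1 \cdot a + 2 \cdot b = a + 2b, \qquad \mathrm{E}(X^{2}) = a + 4b \]

If $\mathrm{E}(X) = a + 0.4$ then $b = 0.2$, so

\[ \operatorname{Var}(X) = (a + 0.8) - (a + 0.4)^{2} = -a^{2} + 0.2a + 0.64 = 0.65 - (a - 0.1)^{2} \]

which is greatest, with value 0.65, when $a = 0.1$ (a valid choice, since $0 \leq a \leq 0.8$ is needed for the probabilities to be non-negative).

When $\mathrm{E}(X)$ is not fixed, a variance cannot be negative and equals 0 only when $X$ takes a single value with probability 1, which corresponds to $(a, b) = (0, 0)$, $(1, 0)$ or $(0, 1)$.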
