OCR MEI Further Statistics Major (Further Statistics Major) 2021 November

Question 1
1 When babies are born, their head circumferences are measured. A random sample of 50 newborn female babies is selected. The sample mean head circumference is 34.711 cm. The sample standard deviation of the head circumferences is 1.530 cm.
  1. Determine a 95\% confidence interval for the population mean head circumference of newborn female babies.
  2. Explain why you can calculate this interval even though the distribution of the population of head circumferences of newborn female babies is unknown.
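A numerical cross-check for the interval can be sketched in Python as follows. This is only a sketch: it assumes that the quoted 1.530 cm is the usual sample standard deviation (divisor n - 1) and that, with n = 50, a Normal critical value is acceptable. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 50
sample_mean = 34.711   # cm
sample_sd = 1.530      # cm, assumed to use divisor n - 1

# Two-sided 95% critical value of the standard Normal distribution.
z = NormalDist().inv_cdf(0.975)

half_width = z * sample_sd / sqrt(n)
lower = sample_mean - half_width
upper = sample_mean + half_width

print(f"95% CI: ({lower:.3f}, {upper:.3f}) cm")
```

The large sample is what justifies the Normal critical value here, since the sample mean is then approximately Normally distributed whatever the population distribution.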
Question 2
2 In a game at a charity fair, a player rolls 3 unbiased six-sided dice. The random variable \(X\) represents the difference between the highest and lowest scores.
  1. Show that \(\mathrm { P } ( X = 0 ) = \frac { 1 } { 36 }\).
    The table shows the probability distribution of \(X\).
    \begin{tabular}{|c|c|c|c|c|c|c|}
    \hline
    \(r\) & 0 & 1 & 2 & 3 & 4 & 5 \\
    \hline
    \(\mathrm { P } ( X = r )\) & \(\frac { 1 } { 36 }\) & \(\frac { 5 } { 36 }\) & \(\frac { 2 } { 9 }\) & \(\frac { 1 } { 4 }\) & \(\frac { 2 } { 9 }\) & \(\frac { 5 } { 36 }\) \\
    \hline
    \end{tabular}
  2. Draw a graph to illustrate the distribution.
  3. Describe the shape of the distribution.
  4. In this question you must show detailed reasoning. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
    As a result of playing the game, the player receives \(30 X\) pence from the organiser of the game.
  5. Find the variance of the amount that the player receives.
  6. The player pays \(k\) pence to play the game. Given that the average profit made by the organiser is 12.5 pence per game, determine the value of \(k\).
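The distribution of \(X\) and the quantities asked for can be checked by brute-force enumeration of all 216 outcomes. The sketch below is a check only, not the detailed reasoning the question requires; it uses exact fractions, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Range (highest minus lowest) for every ordered roll of three fair dice.
counts = Counter(max(roll) - min(roll) for roll in product(range(1, 7), repeat=3))
pmf = {r: Fraction(c, 6**3) for r, c in sorted(counts.items())}

mean = sum(r * p for r, p in pmf.items())
var = sum(r * r * p for r, p in pmf.items()) - mean**2

# The player receives 30X pence, so the variance scales by 30^2.
var_payout = 30**2 * var

# Organiser's average profit per game is k - 30 E(X) = 12.5 pence.
k = Fraction(25, 2) + 30 * mean

print(pmf)
print("E(X) =", mean, " Var(X) =", var)
print("Var(30X) =", float(var_payout), " k =", k)
```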
Question 3
3 In air traffic management, air traffic controllers send radio messages to pilots. On receiving a message, the pilot repeats it back to the controller to check that it has been understood correctly. At a particular site, on average \(4 \%\) of messages sent by controllers are not repeated back correctly and so have been misunderstood. You should assume that instances of messages being misunderstood occur randomly and independently.
  1. Find the probability that exactly 2 messages are misunderstood in a sequence of 50 messages.
  2. Find the probability that in a sequence of messages, the 10th message is the first one which is misunderstood.
  3. Find the probability that in a sequence of 20 messages, there are no misunderstood messages.
  4. Determine the expected number of messages required for 3 of them to be misunderstood.
  5. Determine the probability that in a sequence of messages, the 3rd misunderstood message is the 60th message in the sequence.
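The five parts correspond to binomial, geometric and negative binomial calculations with \(p = 0.04\). A minimal sketch, assuming independence as the question states and using illustrative variable names:

```python
from math import comb

p = 0.04        # probability that a message is misunderstood
q = 1 - p

# Exactly 2 misunderstood in 50 messages: binomial.
p_two_in_50 = comb(50, 2) * p**2 * q**48

# First misunderstood message is the 10th: geometric.
p_first_at_10 = q**9 * p

# No misunderstood messages in 20.
p_none_in_20 = q**20

# Expected number of messages needed for 3 misunderstandings.
expected_trials = 3 / p

# 3rd misunderstood message is the 60th: exactly 2 among the first 59,
# then a misunderstanding on the 60th.
p_third_at_60 = comb(59, 2) * p**2 * q**57 * p

for name, value in [
    ("P(exactly 2 in 50)", p_two_in_50),
    ("P(first at 10th)", p_first_at_10),
    ("P(none in 20)", p_none_in_20),
    ("E(messages for 3)", expected_trials),
    ("P(3rd at 60th)", p_third_at_60),
]:
    print(f"{name}: {value:.4f}")
```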
Question 4
4 A radioactive source contains 1000000 nuclei of a particular radioisotope. On average 1 in 200000 of these nuclei will decay in a period of 1 second. The random variable \(X\) represents the number of nuclei which decay in a period of 1 second. You should assume that nuclei decay randomly and independently of each other.
  1. Explain why you could use either a binomial distribution or a Poisson distribution to model the distribution of \(X\).
    Use a Poisson distribution to answer parts 2 and 3.
  2. Calculate each of the following probabilities.
    • \(\mathrm { P } ( X = 6 )\)
    • \(\mathrm { P } ( X > 6 )\)
  3. Determine an estimate of the probability that at least 60 nuclei decay in a period of 10 seconds.
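The Poisson calculations can be sketched as below. The mean of 5 per second follows from 1000000 × 1/200000. For the 10-second period, the sketch assumes a Normal approximation to Po(50) with a continuity correction as one reasonable way to "estimate" the probability, and prints the exact Poisson tail alongside for comparison. The helper `poisson_pmf` is illustrative.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam = 1_000_000 / 200_000          # mean decays per second = 5

p_eq_6 = poisson_pmf(6, lam)
p_gt_6 = 1 - sum(poisson_pmf(k, lam) for k in range(7))

# Ten seconds: Po(50), approximated by N(50, 50) with continuity correction.
lam_10 = 10 * lam
p_normal = 1 - NormalDist(lam_10, sqrt(lam_10)).cdf(59.5)
p_exact = 1 - sum(poisson_pmf(k, lam_10) for k in range(60))

print(f"P(X = 6) = {p_eq_6:.4f}")
print(f"P(X > 6) = {p_gt_6:.4f}")
print(f"P(at least 60 in 10 s): Normal approx {p_normal:.4f}, exact Poisson {p_exact:.4f}")
```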
Question 5
5 A manufacturer uses three types of capacitor in a particular electronic device. The capacitances, measured in suitable units, are modelled by independent Normal distributions with means and standard deviations as shown in the table.
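\begin{tabular}{|c|c|c|}
\cline{2-3}
\multicolumn{1}{c|}{} & \multicolumn{2}{c|}{Capacitance} \\
\hline
Type & Mean & Standard deviation \\
\hline
A & 3.9 & 0.32 \\
\hline
B & 7.8 & 0.41 \\
\hline
C & 30.2 & 0.64 \\
\hline
\end{tabular}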
  1. Determine the probability that the total capacitance of a randomly chosen capacitor of Type B and two randomly chosen capacitors of Type A is at least 16 units.
  2. Determine the probability that the capacitance of a randomly chosen capacitor of Type C is within 1 unit of the total capacitance of four randomly chosen capacitors of Type B.
    When the manufacturer gets a new batch of 1000 capacitors from the supplier, a random sample of 10 of them is tested to check the capacitances. For a new batch of Type C capacitors, summary statistics for the capacitances, \(x\) units, of the random sample are as follows.
    $$n = 10 \quad \sum x = 299.6 \quad \sum x ^ { 2 } = 8981.0$$
    You should assume that the capacitances of the sample come from a Normally distributed population, but you should not assume that the standard deviation is 0.64 as for previous Type C capacitors.
  3. In this question you must show detailed reasoning. Carry out a hypothesis test at the \(5 \%\) significance level to check whether it is reasonable to assume that the capacitors in this batch have the specified mean capacitance for Type C of 30.2 units.
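The two probability parts use sums and differences of independent Normal variables, and the final part is a one-sample t test. The sketch below is a numerical check under those assumptions. The critical value 2.262 (two-tailed 5%, 9 degrees of freedom) is taken from standard t tables rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mean = {"A": 3.9, "B": 7.8, "C": 30.2}
sd = {"A": 0.32, "B": 0.41, "C": 0.64}

# One Type B plus two independent Type A capacitors.
total = NormalDist(mean["B"] + 2 * mean["A"],
                   sqrt(sd["B"] ** 2 + 2 * sd["A"] ** 2))
p_total_at_least_16 = 1 - total.cdf(16)

# Type C minus the total of four independent Type B capacitors.
diff = NormalDist(mean["C"] - 4 * mean["B"],
                  sqrt(sd["C"] ** 2 + 4 * sd["B"] ** 2))
p_within_1 = diff.cdf(1) - diff.cdf(-1)

# One-sample t test of H0: mu = 30.2 against H1: mu != 30.2.
n, sum_x, sum_x2 = 10, 299.6, 8981.0
x_bar = sum_x / n
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)      # unbiased sample variance
t_stat = (x_bar - mean["C"]) / sqrt(s2 / n)
T_CRIT = 2.262                               # assumed: t tables, 9 df, two-tailed 5%
reject_h0 = abs(t_stat) > T_CRIT

print(f"P(total >= 16) = {p_total_at_least_16:.4f}")
print(f"P(|C - 4B total| < 1) = {p_within_1:.4f}")
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```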
Question 6
6 Cosmic rays passing through the upper atmosphere cause muons, and other types of particle, to be formed. Muons can be detected when they reach the surface of the earth. It is known that the mean number of muons reaching a particular detector is 1.7 per second. The numbers of muons reaching this detector in 200 randomly selected periods of 1 second are shown in Fig. 6.1. \begin{table}[h]
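\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Number of muons & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \(\geqslant 7\) \\
\hline
Frequency & 34 & 65 & 55 & 24 & 14 & 6 & 2 & 0 \\
\hline
\end{tabular}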
\captionsetup{labelformat=empty} \caption{Fig. 6.1}
\end{table}
  1. Use the values of the sample mean and sample variance to discuss the suitability of a Poisson distribution as a model. The screenshot in Fig. 6.2 shows part of a spreadsheet to assess the goodness of fit of the distribution Po(1.7). \begin{table}[h]
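    \begin{tabular}{|c|c|c|c|c|c|}
    \hline
     & A & B & C & D & E \\
    \hline
    1 & Number of muons & Observed frequency & Poisson probability & Expected frequency & Chi-squared contribution \\
    \hline
    2 & 0 & 34 & 0.1827 & 36.5367 & 0.1761 \\
    \hline
    3 & 1 & 65 &  &  &  \\
    \hline
    4 & 2 & 55 & 0.2640 & 52.7955 & 0.0920 \\
    \hline
    5 & 3 & 24 & 0.1496 & 29.9175 & 1.1704 \\
    \hline
    6 & 4 & 14 &  &  & 0.1299 \\
    \hline
    7 & \(\geqslant 5\) & 8 & 0.0296 & 5.9230 & 0.7284 \\
    \hline
    \end{tabular}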
    \captionsetup{labelformat=empty} \caption{Fig. 6.2}
    \end{table}
  2. Calculate the missing values in each of the following cells.
    • C3
    • D3
    • E3
  3. Explain why the numbers for 5, 6 and at least 7 muons have been combined into the single category of at least 5 muons, as shown in Fig. 6.2.
  4. In this question you must show detailed reasoning.
    Carry out the test at the 5\% significance level.
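The sample statistics, the values for cells C3, D3 and E3, and the goodness-of-fit statistic can be checked numerically as sketched below. The sketch assumes 5 degrees of freedom (six categories, with the mean 1.7 given rather than estimated from the data) and takes the 5% critical value 11.07 from standard chi-squared tables. The variable names are illustrative.

```python
from math import exp, factorial

# Fig. 6.1 frequencies (the ">= 7" category has frequency 0).
freq = {0: 34, 1: 65, 2: 55, 3: 24, 4: 14, 5: 6, 6: 2}
n = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / n
var = (sum(x * x * f for x, f in freq.items()) - n * mean**2) / (n - 1)

lam = 1.7


def pmf(k):
    """P(X = k) for X ~ Po(1.7)."""
    return exp(-lam) * lam**k / factorial(k)


# Row 3 of the spreadsheet (1 muon).
c3 = pmf(1)
d3 = n * c3
e3 = (freq[1] - d3) ** 2 / d3

# Categories 0, 1, 2, 3, 4 and ">= 5", as in Fig. 6.2.
observed = [34, 65, 55, 24, 14, 8]
probs = [pmf(k) for k in range(5)]
probs.append(1 - sum(probs))
chi_sq = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

CHI2_CRIT = 11.07   # assumed: chi-squared tables, 5 df, 5% level
reject_h0 = chi_sq > CHI2_CRIT

print(f"mean = {mean:.3f}, variance = {var:.3f}")
print(f"C3 = {c3:.4f}, D3 = {d3:.4f}, E3 = {e3:.4f}")
print(f"chi-squared = {chi_sq:.3f}, reject H0: {reject_h0}")
```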
Question 7
7 A physiotherapist is investigating hand grip strength in adult women under 30 years old. She thinks that the grip strength of the dominant hand will be on average 2 kg higher than the grip strength of the non-dominant hand. The physiotherapist selects a random sample of 12 adult women under 30 years old and measures the grip strength of each of their hands. She then uses software to produce a \(95 \%\) confidence interval for the mean difference in grip strength between the two hands (dominant minus non-dominant), as shown in Fig. 7. \begin{table}[h]
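\begin{tabular}{|l|l|}
\hline
\multicolumn{2}{|l|}{T Estimate of a Mean} \\
\hline
Confidence Level & 0.95 \\
\hline
\multicolumn{2}{|l|}{Sample} \\
\hline
\multicolumn{2}{|l|}{Result} \\
\hline
\multicolumn{2}{|l|}{T Estimate of a Mean} \\
\hline
Mean & 2.79 \\
s & 3.92 \\
SE & 1.13161 \\
N & 12 \\
df & 11 \\
Lower Limit & 0.29935 \\
Upper Limit & 5.28065 \\
Interval & \(2.79 \pm 2.49065\) \\
\hline
\end{tabular}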
\captionsetup{labelformat=empty} \caption{Fig. 7} \end{table}
  1. Explain why the physiotherapist used the same people for testing their dominant and non-dominant grip strengths.
  2. State any assumptions necessary in order to construct the confidence interval shown in Fig. 7.
  3. Explain whether the confidence interval supports the physiotherapist's belief.
  4. The physiotherapist then finds some data which have previously been collected on grip strength using a sample of 100 adult women. A 95\% confidence interval, based on this sample and calculated using a Normal distribution, for the mean difference in grip strength between the two hands (dominant minus non-dominant) is (1.94, 2.84).
    1. For this sample, find
      • the mean difference
      • the standard deviation of the differences.
    2. Explain what you would need to know about the nature of this sample if you wanted to draw conclusions about the mean difference in grip strength in the population of adult women.
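The figures in the software output, and the mean and standard deviation implied by the interval (1.94, 2.84), can be cross-checked as sketched below. The sketch assumes that the second interval has the form mean ± z × s/√n with n = 100. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Fig. 7: standard error and the t multiplier implied by the half-width.
s, n = 3.92, 12
se = s / sqrt(n)
implied_multiplier = 2.49065 / se     # should match t tables for 11 df

# Earlier study: 95% Normal-based interval from a sample of 100.
lower, upper, n_prev = 1.94, 2.84, 100
mean_diff = (lower + upper) / 2
z = NormalDist().inv_cdf(0.975)
sd_diff = (upper - lower) / 2 * sqrt(n_prev) / z

print(f"SE = {se:.5f}, implied t multiplier = {implied_multiplier:.3f}")
print(f"mean difference = {mean_diff:.2f}, sd of differences = {sd_diff:.3f}")
```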
Question 8
8
  1. \(\mathrm { VO } _ { 2 \max }\) is a measure of athletic fitness. Since \(\mathrm { VO } _ { 2 \max }\) is fairly time-consuming and expensive to measure, an exercise scientist wants to predict \(\mathrm { VO } _ { 2 \max }\) from data such as times for running different distances. The scientist uses these data for a random sample of 15 athletes to predict their \(\mathrm { VO } _ { 2 \max }\) values, denoted by \(y\), in suitable units. She also obtains accurate measurements of the \(\mathrm { VO } _ { 2 \max }\) values, denoted by \(x\), in the same units. The scatter diagram in Fig. 8.1 shows the values of \(x\) and \(y\) obtained, together with the equation of the regression line of \(y\) on \(x\) and the value of \(r ^ { 2 }\). \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-08_750_1324_660_317} \captionsetup{labelformat=empty} \caption{Fig. 8.1}
    \end{figure}
    1. Use the regression line to estimate the predicted \(\mathrm { VO } _ { 2 \max }\) of an athlete whose accurately measured \(\mathrm { VO } _ { 2 \max }\) is 50.
    2. Comment on the reliability of your estimate.
    3. The equation of the regression line of \(x\) on \(y\) is \(x = 0.7565 y + 10.493\). Find the coordinates of the point at which the two regression lines meet.
    4. State what the point you found in part (iii) represents.
  2. It is known that there is negative correlation between \(\mathrm { VO } _ { 2 \max }\) and marathon times in very good runners (those whose best marathon times are under 3 hours). The exercise scientist wishes to know whether the same applies to runners who take longer to run a marathon. She selects a random sample of 20 runners whose best marathon times are between \(3 \frac { 1 } { 2 }\) hours and \(4 \frac { 1 } { 2 }\) hours and accurately measures their \(\mathrm { VO } _ { 2 \max }\). Fig. 8.2 is a scatter diagram of accurately measured \(\mathrm { VO } _ { 2 \max }\), \(v\) units, against best marathon time, \(t\) hours, for these runners. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-09_671_1064_648_319} \captionsetup{labelformat=empty} \caption{Fig. 8.2}
    \end{figure}
    1. Explain why the exercise scientist comes to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid. Summary statistics for the 20 runners are as follows. $$\sum t = 80.37 \quad \sum v = 970.86 \quad \sum t ^ { 2 } = 324.71 \quad \sum v ^ { 2 } = 47829.24 \quad \sum t v = 3886.53$$
    2. Find the value of Pearson's product moment correlation coefficient.
    3. Carry out a test at the \(5 \%\) significance level to investigate whether there is negative correlation between accurately measured \(\mathrm { VO } _ { 2 \max }\) and best marathon time for runners whose best marathon times are between \(3 \frac { 1 } { 2 }\) hours and \(4 \frac { 1 } { 2 }\) hours.
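Pearson's product moment correlation coefficient for the 20 runners can be computed from the summary statistics as sketched below. The critical value 0.3783 (one-tailed 5%, n = 20) is an assumption taken from standard correlation tables, and the variable names are illustrative. The regression-line parts are not covered because they depend on the equation shown in Fig. 8.1.

```python
from math import sqrt

n = 20
sum_t, sum_v = 80.37, 970.86
sum_t2, sum_v2, sum_tv = 324.71, 47829.24, 3886.53

# Corrected sums of squares and products.
s_tt = sum_t2 - sum_t**2 / n
s_vv = sum_v2 - sum_v**2 / n
s_tv = sum_tv - sum_t * sum_v / n

r = s_tv / sqrt(s_tt * s_vv)

# One-tailed test of H0: rho = 0 against H1: rho < 0.
R_CRIT = 0.3783          # assumed: tables, n = 20, one-tailed 5%
reject_h0 = r < -R_CRIT

print(f"S_tt = {s_tt:.4f}, S_vv = {s_vv:.3f}, S_tv = {s_tv:.4f}")
print(f"r = {r:.4f}, reject H0: {reject_h0}")
```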
Question 9
9 The discrete random variable \(X\) has a uniform distribution over the set of all integers between \(- n\) and \(n\) inclusive, where \(n\) is a positive integer.
  1. Given that \(n\) is odd, determine \(\mathrm { P } \left( X > \frac { 1 } { 2 } n \right)\), giving your answer as a single fraction in terms of \(n\).
  2. Determine the variance of the sum of 10 independent values of \(X\), giving your answer in the form \(a n ^ { 2 } + b n\), where \(a\) and \(b\) are constants.
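Any closed-form answers can be tested against direct enumeration for small odd \(n\). The sketch below compares brute-force values with the candidate forms \(\frac{n+1}{4n+2}\) and \(\frac{10}{3}n^2 + \frac{10}{3}n\). These were derived for this check and are not quoted from a mark scheme; the helper `check` is illustrative.

```python
from fractions import Fraction


def check(n):
    """Return (P(X > n/2), Var of a sum of 10 independent X) by enumeration."""
    values = range(-n, n + 1)                 # 2n + 1 equally likely integers
    p = Fraction(sum(1 for x in values if 2 * x > n), len(values))
    var_x = Fraction(sum(x * x for x in values), len(values))  # mean is 0 by symmetry
    return p, 10 * var_x


results = {n: check(n) for n in (1, 3, 5, 7, 9)}

for n, (p, var_sum) in results.items():
    matches = (p == Fraction(n + 1, 4 * n + 2)
               and var_sum == Fraction(10, 3) * (n * n + n))
    print(n, p, var_sum, "matches closed forms:", matches)
```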
Question 10
10 Sarah takes a bus to work each weekday morning and returns each evening. The times in minutes that she has to wait for the bus in the morning and evening are modelled by uniform distributions over the intervals \([ 0,10 ]\) and \([ 0,6 ]\) respectively. The times in minutes for the bus journeys in the morning and evening are modelled by \(\mathrm { N } ( 25,4 )\) and \(\mathrm { N } ( 28,16 )\) respectively. You should assume that all of the times are independent. The total time in minutes that she takes for her two journeys, including the waiting times, is denoted by the random variable \(T\). The spreadsheet below shows the first 20 rows of a simulation of 500 return journeys. It also shows in column H the numbers of values of \(T\) that are less than or equal to the corresponding values in column G. For example, there are 156 out of the 500 simulated values of \(T\) which are less than or equal to 58 minutes. All of the times have been rounded to 2 decimal places.
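\begin{tabular}{|c|c|c|c|c|c|c|c|c|}
\hline
 & A & B & C & D & E & F & G & H \\
\hline
1 & Waiting time morning & Journey time morning & Waiting time evening & Journey time evening & Total time &  & Total time \(t\) & Number \(\leqslant t\) \\
\hline
2 & 0.89 & 20.78 & 1.88 & 26.30 & 49.86 &  & 46 & 0 \\
\hline
3 & 3.55 & 21.24 & 1.04 & 29.61 & 55.44 &  & 48 & 4 \\
\hline
4 & 2.13 & 21.83 & 2.40 & 28.64 & 55.00 &  & 50 & 13 \\
\hline
5 & 5.12 & 25.04 & 3.13 & 24.30 & 57.60 &  & 52 & 35 \\
\hline
6 & 4.03 & 27.49 & 2.19 & 30.81 & 64.52 &  & 54 & 57 \\
\hline
7 & 2.47 & 20.54 & 4.32 & 34.61 & 61.93 &  & 56 & 104 \\
\hline
8 & 3.21 & 26.93 & 3.78 & 27.66 & 61.58 &  & 58 & 156 \\
\hline
9 & 9.72 & 24.15 & 0.63 & 27.53 & 62.03 &  & 60 & 218 \\
\hline
10 & 1.59 & 28.45 & 0.08 & 35.87 & 65.99 &  & 62 & 288 \\
\hline
11 & 7.34 & 23.04 & 4.02 & 24.77 & 59.17 &  & 64 & 357 \\
\hline
12 & 1.04 & 24.69 & 1.66 & 31.95 & 59.33 &  & 66 & 408 \\
\hline
13 & 7.17 & 22.16 & 2.55 & 25.39 & 57.28 &  & 68 & 441 \\
\hline
14 & 5.20 & 26.97 & 2.41 & 30.05 & 64.62 &  & 70 & 475 \\
\hline
15 & 5.01 & 26.84 & 1.88 & 36.21 & 69.93 &  & 72 & 490 \\
\hline
16 & 3.76 & 26.03 & 2.21 & 30.96 & 62.96 &  & 74 & 496 \\
\hline
17 & 0.96 & 23.72 & 2.55 & 29.36 & 56.59 &  & 76 & 500 \\
\hline
18 & 8.64 & 24.97 & 2.82 & 26.39 & 62.82 &  &  &  \\
\hline
19 & 0.59 & 20.82 & 4.57 & 31.41 & 57.38 &  &  &  \\
\hline
20 & 9.85 & 23.68 & 5.54 & 29.92 & 68.99 &  &  &  \\
\hline
\end{tabular}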
  1. Use the spreadsheet output to estimate each of the following.
    • \(\mathrm { P } ( T \leqslant 56 )\)
    • \(\mathrm { P } ( T > 61 )\)
  2. The random variable \(W\) is Normally distributed with the same mean and variance as \(T\). Find each of the following.
    • \(\mathrm { P } ( W \leqslant 56 )\)
    • \(\mathrm { P } ( W > 61 )\)
  3. Explain why, if many more journeys were used in the simulation, you would expect \(\mathrm { P } ( T > 61 )\) to be extremely close to \(\mathrm { P } ( W > 61 )\).
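The simulation estimates and the Normal probabilities for \(W\) can be checked as sketched below. Since 61 is not tabulated in the spreadsheet, the sketch assumes linear interpolation between the cumulative counts at 60 and 62, which is one reasonable choice rather than the only one. The mean and variance of \(T\) use the standard results for continuous uniform and Normal distributions. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Cumulative counts read from columns G and H of the spreadsheet.
cumulative = {56: 104, 60: 218, 62: 288}
runs = 500

p_t_le_56 = cumulative[56] / runs

# Assumption: interpolate linearly between t = 60 and t = 62 to reach t = 61.
count_le_61 = (cumulative[60] + cumulative[62]) / 2
p_t_gt_61 = 1 - count_le_61 / runs

# T = U[0,10] + N(25,4) + U[0,6] + N(28,16), all independent.
mean_t = 10 / 2 + 25 + 6 / 2 + 28
var_t = 10**2 / 12 + 4 + 6**2 / 12 + 16

w = NormalDist(mean_t, sqrt(var_t))
p_w_le_56 = w.cdf(56)
p_w_gt_61 = 1 - w.cdf(61)

print(f"Simulation: P(T <= 56) = {p_t_le_56:.3f}, P(T > 61) = {p_t_gt_61:.3f}")
print(f"E(T) = {mean_t}, Var(T) = {var_t:.4f}")
print(f"Normal: P(W <= 56) = {p_w_le_56:.4f}, P(W > 61) = {p_w_gt_61:.4f}")
```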
Question 11
11 The continuous random variable \(X\) has probability density function given by
\(f ( x ) = \begin{cases} a x ^ { 2 } & 0 \leqslant x < 2 , \\
b ( 3 - x ) ^ { 2 } & 2 \leqslant x \leqslant 3 , \\
0 & \text { otherwise } \end{cases}\)
where \(a\) and \(b\) are positive constants.
  1. Given that \(\mathrm { E } ( X ) = 2\), determine the values of \(a\) and \(b\).
  2. Determine the median value of \(X\).
  3. A random sample of 50 observations of \(X\) is selected. Given that \(\operatorname { Var } ( X ) = 0.2\), determine an estimate of the probability that the mean value of the 50 observations is less than 1.9.
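The constants, the median and the final probability can be checked numerically as sketched below. Integrating the density gives the two conditions \(8a + b = 3\) (total probability 1) and \(16a + 3b = 8\) (mean 2). These were derived for this check and are worth confirming by hand. The last part assumes the Central Limit Theorem applies to the mean of 50 observations. Variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist

# Solve 8a + b = 3 and 16a + 3b = 8 by Cramer's rule.
det = 8 * 3 - 1 * 16
a = Fraction(3 * 3 - 8 * 1, det)
b = Fraction(8 * 8 - 16 * 3, det)

# F(2) = 8a/3 < 0.5, so the median m lies in [2, 3]:
# P(X > m) = b (3 - m)^3 / 3 = 0.5.
cdf_at_2 = a * Fraction(8, 3)
median = 3 - float(Fraction(3, 2) / b) ** (1 / 3)

# Sample mean of 50 observations is approximately N(2, 0.2 / 50).
p_mean_below = NormalDist(2, sqrt(0.2 / 50)).cdf(1.9)

print(f"a = {a}, b = {b}, F(2) = {cdf_at_2}")
print(f"median = {median:.4f}")
print(f"P(sample mean < 1.9) = {p_mean_below:.4f}")
```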