OCR MEI Further Statistics Major Specimen

Question 1
1 In a promotion for a new type of cereal, a toy dinosaur is included in each pack. There are three different types of dinosaur to collect. They are distributed, with equal probability, randomly and independently in the packs. Sam is trying to collect all three of the dinosaurs.
  1. Find the probability that Sam has to open only 3 packs in order to collect all three dinosaurs.

Sam continues to open packs until she has collected all three dinosaurs, but once she has opened 6 packs she gives up even if she has not found all three. The random variable \(X\) represents the number of packs which Sam opens.
  2. Complete the table below, using the copy in the Printed Answer Booklet, to show the probability distribution of \(X\).
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \(r\) & 3 & 4 & 5 & 6 \\
    \hline
    \(\mathrm { P } ( X = r )\) & & \(\frac { 2 } { 9 }\) & \(\frac { 14 } { 81 }\) & \\
    \hline
    \end{tabular}
  3. In this question you must show detailed reasoning. Find
    • \(\mathrm { E } ( X )\) and
    • \(\operatorname { Var } ( X )\).
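The distribution of \(X\) can be cross-checked by exact enumeration. The sketch below is illustrative and not part of the paper; the helper name `pack_distribution` is hypothetical. It assumes, as the question states, that the three types are equally likely and independent between packs. It lists every equally likely sequence of 6 packs and records when Sam stops. Its output should agree with the two tabulated values \(\frac{2}{9}\) and \(\frac{14}{81}\).

```python
from fractions import Fraction
from itertools import product


def pack_distribution(types=3, max_packs=6):
    """Exact distribution of X, the number of packs opened.

    Sam stops as soon as all `types` toys have been found, or after
    `max_packs` packs, whichever comes first.
    """
    weight = Fraction(1, types ** max_packs)  # each sequence is equally likely
    dist = {}
    for seq in product(range(types), repeat=max_packs):
        seen = set()
        opened = max_packs  # default: she gives up at the cap
        for i, toy in enumerate(seq, start=1):
            seen.add(toy)
            if len(seen) == types:
                opened = i
                break
        dist[opened] = dist.get(opened, 0) + weight
    return dist


dist = pack_distribution()
mean = sum(r * p for r, p in dist.items())
var = sum(r * r * p for r, p in dist.items()) - mean ** 2

for r in sorted(dist):
    print(r, dist[r])
print("E(X) =", mean, "Var(X) =", var)
```

Working in `Fraction` keeps every probability exact, so the results can be compared directly with a hand calculation.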
Question 2
2 The continuous random variable \(X\) takes values in the interval \(- 1 \leq x \leq 1\) and has probability density function $$f ( x ) = \left\{ \begin{array} { l r } a & - 1 \leq x < 0 \\
a + x ^ { 2 } & 0 \leq x \leq 1 \end{array} \right.$$ where \(a\) is a constant.
  1. (A) Sketch the probability density function.
    (B) Show that \(a = \frac { 1 } { 3 }\).
  2. Find
    (A) \(\mathrm { P } \left( X < \frac { 1 } { 2 } \right)\),
    (B) the mean of \(X\).
  3. Show that the median of \(X\) satisfies the equation \(2 m ^ { 3 } + 2 m - 1 = 0\).
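The median equation in part 3 can be sanity-checked numerically. The sketch below is an illustration rather than the expected exam method; the function names are hypothetical. It takes \(a = \frac{1}{3}\) from part 1(B), solves \(2m^{3} + 2m - 1 = 0\) by bisection on \([0, 1]\), and checks that the cumulative probability at the root is one half.

```python
def median_equation(m):
    """Left-hand side of 2m^3 + 2m - 1 = 0 from part 3."""
    return 2 * m ** 3 + 2 * m - 1


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    if f(lo) * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


a = 1 / 3  # value given in part 1(B)
m = bisect(median_equation, 0.0, 1.0)

# For 0 <= m <= 1 the cumulative probability is
# a * 1 (area on [-1, 0]) + a * m + m^3 / 3 (area on [0, m]).
cdf_at_m = a * (1 + m) + m ** 3 / 3

print(f"median = {m:.4f}, F(median) = {cdf_at_m:.6f}")
```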
Question 3
3 A researcher is investigating factors that might affect how many hours per day different species of mammals spend asleep. First she investigates human beings. She collects data on body mass index, \(x\), and hours of sleep, \(y\), for a random sample of people. A scatter diagram of the data is shown in Fig. 3.1 together with the regression line of \(y\) on \(x\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-04_885_1584_598_274} \captionsetup{labelformat=empty} \caption{Fig. 3.1}
\end{figure}
  1. Calculate the residual for the data point which has the residual with the greatest magnitude.
  2. Use the equation of the regression line to estimate the mean number of hours spent asleep by a person with body mass index
    (A) 26,
    (B) 16,
    commenting briefly on each of your predictions.

The researcher then collects additional data for a large number of species of mammals and analyses different factors for effect size. Definitions of the variables measured for a typical animal of the species, the correlations between these variables, and guidelines often used when considering effect size are given in Fig. 3.2.
    \begin{table}[h]
    \begin{tabular}{|l|p{10cm}|}
    \hline
    Variable & Definition \\
    \hline
    Body mass & Mass of animal in kg \\
    Brain mass & Mass of brain in g \\
    Hours of sleep/day & Number of hours per day spent asleep \\
    Life span & How many years the animal lives \\
    Danger & A measure of how dangerous the animal's situation is when asleep, taking into account predators and how protected the animal's den is: higher value indicates greater danger. \\
    \hline
    \end{tabular}

    \begin{tabular}{|l|r|r|r|r|r|}
    \hline
    Correlations (pmcc) & Body mass & Brain mass & Hours of sleep/day & Life span & Danger \\
    \hline
    Body mass & 1.00 & & & & \\
    Brain mass & 0.93 & 1.00 & & & \\
    Hours of sleep/day & -0.31 & -0.36 & 1.00 & & \\
    Life span & 0.30 & 0.51 & -0.41 & 1.00 & \\
    Danger & 0.13 & 0.15 & -0.59 & 0.06 & 1.00 \\
    \hline
    \end{tabular}

    \begin{tabular}{|c|l|}
    \hline
    Product moment correlation coefficient & Effect size \\
    \hline
    0.1 & Small \\
    0.3 & Medium \\
    0.5 & Large \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Fig. 3.2}
    \end{table}
  3. State two conclusions the researcher might draw from these tables, relevant to her investigation into how many hours mammals spend asleep.

One of the researcher's students notices the high correlation between body mass and brain mass and produces a scatter diagram for these two variables, shown in Fig. 3.3 below. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-05_675_698_1802_735} \captionsetup{labelformat=empty} \caption{Fig. 3.3}
    \end{figure}
  4. Comment on the suitability of a linear model for these two variables.
Question 4
4 A fair six-sided dice is rolled repeatedly. Find the probability of the following events.
  1. A five occurs for the first time on the fourth roll.
  2. A five occurs at least once in the first four rolls.
  3. A five occurs for the second time on the third roll.
  4. At least two fives occur in the first three rolls.

The dice is rolled repeatedly until a five occurs for the second time.
  5. Find the expected number of rolls required for two fives to occur. Justify your answer.
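The five parts of Question 4 can be sketched as exact fraction arithmetic. This is an illustrative check rather than a model answer; the variable names are hypothetical. It treats the rolls as independent with \(\mathrm{P}(\text{five}) = \frac{1}{6}\), and for part 5 it assumes the waiting time for the second five is the sum of two independent geometric waiting times.

```python
from fractions import Fraction

p = Fraction(1, 6)  # probability of a five on any one roll
q = 1 - p           # probability of not rolling a five

# Part 1: three non-fives, then a five.
first_five_on_4th = q ** 3 * p

# Part 2: complement of "no five in four rolls".
at_least_one_in_4 = 1 - q ** 4

# Part 3: exactly one five in rolls 1-2 (two orders), then a five on roll 3.
second_five_on_3rd = 2 * p * q * p

# Part 4: exactly two fives (three positions for the non-five) or three fives.
at_least_two_in_3 = 3 * p ** 2 * q + p ** 3

# Part 5: two geometric waiting times, each with mean 1/p.
expected_rolls_for_two = 2 / p

print(first_five_on_4th, at_least_one_in_4, second_five_on_3rd,
      at_least_two_in_3, expected_rolls_for_two)
```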
Question 5
5 A particular brand of pasta is sold in bags of two different sizes. The mass of pasta in the large bags is advertised as being 1500 g; in fact it is Normally distributed with mean 1515 g and standard deviation 4.7 g. The mass of pasta in the small bags is advertised as being 500 g; in fact it is Normally distributed with mean 508 g and standard deviation 3.3 g.
  1. Find the probability that the total mass of pasta in 5 randomly selected small bags is less than 2550 g.
  2. Find the probability that the mass of pasta in a randomly selected large bag is greater than three times the mass of pasta in a randomly selected small bag.
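The two parts differ in how the variances combine: part 1 sums five independent bags, while part 2 scales a single bag by 3. The sketch below illustrates this using Python's standard-library `statistics.NormalDist`. It is not a model answer; it assumes the bag masses are independent, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

large = NormalDist(mu=1515, sigma=4.7)
small = NormalDist(mu=508, sigma=3.3)

# Part 1: sum of 5 independent small bags, so the variance is 5 * 3.3^2.
total_small = NormalDist(mu=5 * small.mean, sigma=sqrt(5 * small.variance))
p_total_below_2550 = total_small.cdf(2550)

# Part 2: D = L - 3S with ONE small bag scaled by 3,
# so the variance is 4.7^2 + 3^2 * 3.3^2.
diff = NormalDist(
    mu=large.mean - 3 * small.mean,
    sigma=sqrt(large.variance + 9 * small.variance),
)
p_large_exceeds_triple = 1 - diff.cdf(0)

print(f"P(total of 5 small < 2550) = {p_total_below_2550:.4f}")
print(f"P(L > 3S) = {p_large_exceeds_triple:.4f}")
```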
Question 6
6 Fig. 6 shows the wages earned in the last 12 months by each of a random sample of American males aged between 16 and 65. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-07_771_1278_340_392} \captionsetup{labelformat=empty} \caption{Fig. 6}
\end{figure} A researcher wishes to test whether the sample provides evidence of a tendency for higher wages to be earned by older men in the age range 16 to 65 in America.
  1. The researcher needs to decide whether to use a test based on Pearson's product moment correlation coefficient or Spearman's rank correlation coefficient. Use the information in Fig. 6 to decide which test is more appropriate.
  2. Should it be a one-tail or a two-tail test? Justify your answer.
Question 7
7 A newspaper reports that the average price of unleaded petrol in the UK is 110.2 p per litre. The price, in pence, of a litre of unleaded petrol at a random sample of 15 petrol stations in Yorkshire is shown below together with some output from software used to analyse the data.
\begin{tabular}{ccccc}
116.9 & 114.9 & 110.9 & 113.9 & 114.9 \\
117.9 & 112.9 & 99.9 & 114.9 & 103.9 \\
123.9 & 105.7 & 108.9 & 102.9 & 112.7 \\
\end{tabular}
\begin{table}[h]
\begin{tabular}{|l|r|}
\hline
\multicolumn{2}{|l|}{Statistics} \\
\hline
\(n\) & 15 \\
Mean & 111.6733 \\
\(\sigma\) & 6.1877 \\
\(s\) & 6.4048 \\
\(\Sigma x\) & 1675.1 \\
\(\Sigma x ^ { 2 }\) & 187638.31 \\
Min & 99.9 \\
Q1 & 105.7 \\
Median & 112.9 \\
Q3 & 114.9 \\
Max & 123.9 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 7.1}
\end{table}
\begin{tabular}{|l|l|}
\hline
\(n\) & 15 \\
\hline
Kolmogorov-Smirnov test & \(p > 0.15\) \\
\hline
Null hypothesis & The data can be modelled by a Normal distribution \\
\hline
Alternative hypothesis & The data cannot be modelled by a Normal distribution \\
\hline
\end{tabular}
  1. Select a suitable hypothesis test to investigate whether there is any evidence that the average price of unleaded petrol in Yorkshire is different from 110.2 p. Justify your choice of test.
  2. Conduct the hypothesis test at the \(5 \%\) level of significance.
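The summary values in Fig. 7.1 can be reproduced from the 15 prices, which also gives a check on the data as printed. The sketch below is illustrative only. It assumes that a one-sample \(t\) test is the test chosen in part 1 (Normality plausible from the Kolmogorov-Smirnov output, population variance unknown, small sample). The critical value 2.145 is the standard two-tailed 5\% value for 14 degrees of freedom, taken from tables.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

prices = [116.9, 114.9, 110.9, 113.9, 114.9,
          117.9, 112.9, 99.9, 114.9, 103.9,
          123.9, 105.7, 108.9, 102.9, 112.7]

n = len(prices)
x_bar = mean(prices)
sigma = pstdev(prices)  # divisor n, the "sigma" row of Fig. 7.1
s = stdev(prices)       # divisor n - 1, the "s" row of Fig. 7.1

mu_0 = 110.2            # reported UK average, pence per litre
t_stat = (x_bar - mu_0) / (s / sqrt(n))

T_CRIT = 2.145          # two-tailed 5% critical value, t with 14 df (tables)
reject_h0 = abs(t_stat) > T_CRIT

print(f"mean = {x_bar:.4f}, sigma = {sigma:.4f}, s = {s:.4f}")
print(f"t = {t_stat:.3f}, reject H0 at 5%: {reject_h0}")
```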
Question 8
8 Natural background radiation consists of various particles, including neutrons. A detector is used to count the number of neutrons per second at a particular location.
  1. State the conditions required for a Poisson distribution to be a suitable model for the number of neutrons detected per second.

The number of neutrons detected per second due to background radiation only is modelled by a Poisson distribution with mean 1.1.
  2. Find the probability that the detector detects
    (A) no neutrons in a randomly chosen second,
    (B) at least 60 neutrons in a randomly chosen period of 1 minute.

A neutron source is switched on. It emits neutrons which should all be contained in a protective casing. The detector is used to check whether any neutrons have not been contained; these are known as stray neutrons. If the detector detects more than 8 neutrons in a period of 1 second, an alarm will be triggered in case this high reading is due to stray neutrons.
  3. Suppose that there are no stray neutrons and so the neutrons detected are all due to the background radiation. Find the expected number of times the alarm is triggered in 1000 randomly chosen periods of 1 second.
  4. Suppose instead that stray neutrons are being produced at a rate of 3.4 per second in addition to the natural background radiation. Find the probability that at least one alarm will be triggered in 10 randomly chosen periods of 1 second. You should assume that all stray neutrons produced are detected.
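The Poisson calculations in parts 2 to 4 can be sketched with the standard library alone. This is an illustration, not a model answer; the helper names are hypothetical. It assumes that counts in different seconds are independent and that background and stray neutrons arrive independently, so that their combined count is Poisson with mean \(1.1 + 3.4 = 4.5\).

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


background = 1.1  # mean background count per second
stray = 3.4       # mean stray count per second (part 4)

# Part 2(A): no neutrons in one second.
p_none = poisson_pmf(0, background)

# Part 2(B): at least 60 in one minute, mean 60 * 1.1 = 66.
p_at_least_60_per_min = 1 - poisson_cdf(59, 60 * background)

# Part 3: alarm means more than 8 in one second; expected count in 1000 s.
p_alarm_background = 1 - poisson_cdf(8, background)
expected_alarms = 1000 * p_alarm_background

# Part 4: combined mean 4.5; at least one alarm in 10 independent seconds.
p_alarm_stray = 1 - poisson_cdf(8, background + stray)
p_at_least_one_alarm = 1 - (1 - p_alarm_stray) ** 10

print(p_none, p_at_least_60_per_min, expected_alarms, p_at_least_one_alarm)
```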
Question 9
9 A random sample of adults in the UK were asked to state their primary source of news: television (T), internet (I), newspapers (N) or radio (R). The responses were classified by age group, and an analysis was carried out to see if there is any association between age group and primary source of news. Fig. 9 is a screenshot showing part of the spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted. \begin{table}[h]
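\begin{tabular}{|c|l|c|c|c|c|c|}
\hline
 & A & B & C & D & E & F \\
\hline
1 & Source & \multicolumn{4}{c|}{Age group} & \\
2 & of news & 18-32 & 33-47 & 48-64 & 65+ & \\
3 & T & 63 & 61 & 71 & 80 & 275 \\
4 & I & 33 & 33 & 22 & 12 & 100 \\
5 & N & 9 & 8 & 11 & 20 & 48 \\
6 & R & 4 & 9 & 9 & 5 & 27 \\
7 & & 109 & 111 & 113 & 117 & 450 \\
8 & & & & & & \\
9 & \multicolumn{6}{l|}{Expected frequencies} \\
10 & & 66.61 & 67.83 & 69.06 & 71.50 & \\
11 & & 24.22 & 24.67 & & 26.00 & \\
12 & & 11.63 & 11.84 & 12.05 & 12.48 & \\
13 & & 6.54 & 6.66 & 6.78 & 7.02 & \\
14 & & & & & & \\
15 & \multicolumn{6}{l|}{Contributions to the test statistic} \\
16 & & 0.20 & 0.69 & 0.05 & 1.01 & \\
17 & & 3.18 & 2.82 & & 7.54 & \\
18 & & 0.59 & & 0.09 & 4.53 & \\
19 & & 0.99 & 0.82 & 0.73 & 0.58 & \\
20 & test statistic & 25.45 & & & & \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 9}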
\end{table}
  1. (A) State the sample size.
    (B) Give the name of the appropriate hypothesis test.
    (C) State the null and alternative hypotheses.
  2. Showing your calculations, find the missing values in cells
    • D11,
    • D17 and
    • C18.
  3. Complete the appropriate hypothesis test at the \(5 \%\) level of significance.
  4. Discuss briefly what the data suggest about primary source of news. You should make a comment for each age group.
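The spreadsheet in Fig. 9 can be rebuilt from the observed frequencies alone, which shows where each omitted cell comes from. The sketch below is illustrative and not a model answer; the variable names are hypothetical. It uses the usual chi-squared contingency table formulas: expected frequency = row total \(\times\) column total \(\div\) sample size, and contribution = \((O - E)^{2} / E\).

```python
observed = [
    [63, 61, 71, 80],  # T
    [33, 33, 22, 12],  # I
    [9, 8, 11, 20],    # N
    [4, 9, 9, 5],      # R
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Spreadsheet rows 10-13: expected frequencies under no association.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Spreadsheet rows 16-19: contributions to the test statistic.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

test_statistic = sum(sum(row) for row in contributions)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# The omitted cells: rows T, I, N, R map to list indices 0-3 and
# spreadsheet columns B-E map to list indices 0-3.
d11 = expected[1][2]
d17 = contributions[1][2]
c18 = contributions[2][1]

print(f"n = {n}, test statistic = {test_statistic:.2f}, dof = {dof}")
print(f"D11 = {d11:.2f}, D17 = {d17:.2f}, C18 = {c18:.2f}")
```

The total can be compared with the 25.45 shown in row 20. For the test in part 3, the 5\% critical value of \(\chi^{2}\) with 9 degrees of freedom is 16.92, from tables.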
Question 10
10 The label on a particular size of milk carton states that it contains 1.5 litres of milk. In an investigation at the packaging plant the contents, \(x\) litres, of each of 60 randomly selected cartons are measured. The data are summarised as follows. $$\Sigma x = 89.758 \quad \Sigma x ^ { 2 } = 134.280$$
  1. Estimate the variance of the underlying population.
  2. Find a 95\% confidence interval for the mean of the underlying population.
  3. What does the confidence interval which you have calculated suggest about the statement on the carton?

Each day for 300 days a random sample of 60 cartons is selected and for each sample a \(95 \%\) confidence interval is constructed.
  4. Explain why the confidence intervals will not be identical.
  5. What is the expected number of confidence intervals to contain the population mean?
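The arithmetic behind parts 1, 2 and 5 can be sketched from the two summary totals. This is an illustration under stated assumptions, not a model answer. It uses the unbiased estimate with divisor \(n - 1\) and the Normal critical value 1.96, on the assumption that \(n = 60\) is large enough for the Central Limit Theorem to apply. The variable names are hypothetical.

```python
from math import sqrt

n = 60
sum_x = 89.758
sum_x2 = 134.280

x_bar = sum_x / n

# Part 1: unbiased estimate of the population variance (divisor n - 1).
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Part 2: 95% interval for the mean; z = 1.96 is an assumption (large n).
z = 1.96
half_width = z * sqrt(s2 / n)
ci = (x_bar - half_width, x_bar + half_width)

# Part 5: each interval covers the population mean with probability 0.95.
expected_covering = 0.95 * 300

print(f"mean = {x_bar:.5f}, s^2 = {s2:.3e}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"expected intervals containing the mean = {expected_covering:.0f}")
```

Because \(\Sigma x^{2}\) and \((\Sigma x)^{2}/n\) agree to several significant figures, the variance estimate is sensitive to rounding in the given totals.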
Question 11
11 Two girls, Lili and Hui, play a game with a fair six-sided dice. The dice is thrown 10 times.
\(X _ { 1 } , X _ { 2 } , \ldots , X _ { 10 }\) represent the scores on the \(1 ^ { \text {st } } , 2 ^ { \text {nd } } , \ldots , 10 ^ { \text {th } }\) throws of the dice.
\(L\) denotes Lili's score and \(L = 10 X _ { 1 }\).
\(H\) denotes Hui's score and \(H = X _ { 1 } + X _ { 2 } + X _ { 3 } + \ldots + X _ { 10 }\).
  1. Calculate
    • \(\mathrm { P } ( L = 60 )\) and
    • \(\mathrm { P } ( H = 60 )\).
  2. Without doing any further calculations, explain which girl's score has the greater standard deviation.
  3. Write down
    • the name of the probability distribution of \(X _ { 1 }\),
    • the value of \(\mathrm { E } \left( X _ { 1 } \right)\),
    • the value of \(\operatorname { Var } \left( X _ { 1 } \right)\).
  4. Find
    (A) \(\mathrm { E } ( L )\),
    (B) \(\operatorname { Var } ( L )\),
    (C) \(\mathrm { E } ( H )\),
    (D) \(\operatorname { Var } ( H )\).

The spreadsheet below shows a simulation of 25 plays of the game. The cell E3, highlighted, shows the score when the dice is thrown the fourth time in the first game. \begin{table}[h]
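    \begin{tabular}{|c|l|c|c|c|c|c|c|c|c|c|c|c|c|c|}
    \hline
     & A & B & C & D & E & F & G & H & I & J & K & L & M & N \\
    \hline
    1 & & \multicolumn{10}{c|}{Throw of dice} & Lili's & Hui's & \\
    2 & & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & score & score & \\
    3 & Game 1 & 3 & 5 & 2 & 1 & 1 & 3 & 1 & 1 & 1 & 4 & 30 & 22 & \\
    4 & Game 2 & 6 & 3 & 2 & 4 & 4 & 3 & 5 & 3 & 3 & 5 & 60 & 38 & \\
    5 & Game 3 & 6 & 4 & 2 & 6 & 5 & 2 & 1 & 5 & 2 & 3 & 60 & 36 & \\
    6 & Game 4 & 1 & 5 & 1 & 6 & 6 & 3 & 1 & 4 & 6 & 2 & 10 & 35 & \\
    7 & Game 5 & 4 & 4 & 3 & 1 & 6 & 4 & 4 & 1 & 6 & 2 & 40 & 35 & \\
    8 & Game 6 & 2 & 1 & 5 & 1 & 2 & 5 & 1 & 5 & 2 & 3 & 20 & 27 & \\
    9 & Game 7 & 1 & 1 & 3 & 4 & 4 & 5 & 6 & 3 & 4 & 2 & 10 & 33 & \\
    10 & Game 8 & 1 & 1 & 3 & 6 & 3 & 4 & 4 & 5 & 2 & 3 & 10 & 32 & \\
    11 & Game 9 & 2 & 2 & 2 & 4 & 3 & 2 & 1 & 5 & 5 & 6 & 20 & 32 & \\
    12 & Game 10 & 3 & 5 & 3 & 3 & 5 & 3 & 4 & 3 & 1 & 1 & 30 & 31 & \\
    13 & Game 11 & 5 & 3 & 6 & 5 & 5 & 4 & 2 & 1 & 1 & 5 & 50 & 37 & \\
    14 & Game 12 & 6 & 4 & 3 & 2 & 4 & 1 & 3 & 3 & 5 & 3 & 60 & 34 & \\
    15 & Game 13 & 2 & 3 & 2 & 1 & 2 & 2 & 2 & 2 & 2 & 1 & 20 & 19 & \\
    16 & Game 14 & 4 & 1 & 3 & 3 & 1 & 2 & 6 & 6 & 1 & 3 & 40 & 30 & \\
    17 & Game 15 & 5 & 1 & 2 & 6 & 3 & 4 & 6 & 3 & 6 & 4 & 50 & 40 & \\
    18 & Game 16 & 3 & 6 & 1 & 1 & 5 & 3 & 1 & 3 & 3 & 3 & 30 & 29 & \\
    19 & Game 17 & 5 & 2 & 5 & 2 & 4 & 5 & 2 & 2 & 3 & 4 & 50 & 34 & \\
    20 & Game 18 & 3 & 6 & 3 & 5 & 5 & 2 & 3 & 1 & 1 & 2 & 30 & 31 & \\
    21 & Game 19 & 6 & 6 & 3 & 1 & 5 & 6 & 3 & 4 & 1 & 6 & 60 & 41 & \\
    22 & Game 20 & 2 & 6 & 4 & 5 & 6 & 5 & 2 & 4 & 3 & 3 & 20 & 40 & \\
    23 & Game 21 & 5 & 3 & 5 & 4 & 5 & 3 & 3 & 6 & 6 & 1 & 50 & 41 & \\
    24 & Game 22 & 6 & 3 & 5 & 5 & 6 & 3 & 5 & 6 & 1 & 1 & 60 & 41 & \\
    25 & Game 23 & 5 & 4 & 5 & 5 & 6 & 4 & 2 & 1 & 3 & 6 & 50 & 41 & \\
    26 & Game 24 & 3 & 5 & 2 & 3 & 2 & 4 & 3 & 2 & 3 & 3 & 30 & 30 & \\
    27 & Game 25 & 5 & 2 & 4 & 2 & 4 & 5 & 2 & 2 & 5 & 2 & 50 & 33 & \\
    28 & & & & & & & & & & & & & & \\
    29 & mean & & & & & & & & & & & 37.60 & 33.68 & \\
    30 & sd & & & & & & & & & & & 17.39 & 5.77 & \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Fig. 11}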
    \end{table}
  5. Use the simulation to estimate \(\mathrm { P } ( L > 40 )\) and \(\mathrm { P } ( H > 40 )\).
  6. (A) Calculate the exact value of \(\mathrm { P } ( L > 40 )\).
    (B) Comment on how the exact value compares with your estimate of \(\mathrm { P } ( L > 40 )\) in part 5.

Hui wonders whether it is appropriate to use the Central Limit Theorem to approximate the distribution of \(X _ { 1 } + X _ { 2 } + X _ { 3 } + \ldots + X _ { 10 }\).
  7. (A) State what type of diagram Hui could draw, based on the output from the spreadsheet, to investigate this.
    (B) Explain how she should interpret the diagram.
  8. (A) Calculate an approximate value of \(\mathrm { P } \left( X _ { 1 } + X _ { 2 } + X _ { 3 } + \ldots + X _ { 10 } > 40 \right)\) using the Central Limit Theorem.
    (B) Comment on how this value compares with your estimate of \(\mathrm { P } ( H > 40 )\) in part 5.

\section*{Copyright Information:} OCR is committed to seeking permission to reproduce all third-party content that it uses in the assessment materials. OCR has attempted to identify and contact all copyright holders whose work is used in this paper. To avoid the issue of disclosure of answer-related information to candidates, all copyright acknowledgements are reproduced in the OCR Copyright Acknowledgements booklet. This is produced for each series of examinations and is freely available to download from our public website (\href{http://www.ocr.org.uk}{www.ocr.org.uk}) after the live examination series. If OCR has unwittingly failed to correctly acknowledge or clear any third-party content in this assessment material, OCR will be happy to correct its mistake at the earliest possible opportunity.

For queries or further information please contact the Copyright Team, First Floor, 9 Hills Road, Cambridge CB2 1GE.

OCR is part of the Cambridge Assessment Group; Cambridge Assessment is the brand name of University of Cambridge Local Examinations Syndicate (UCLES), which is itself a department of the University of Cambridge.