OCR MEI Further Statistics Major (Further Statistics Major) 2019 June

Question 1
1 A fair six-sided dice is rolled three times.
The random variable \(X\) represents the lowest of the three scores.
The probability distribution of \(X\) is given by the formula
\(\mathrm { P } ( X = r ) = k \left( 127 - 39 r + 3 r ^ { 2 } \right)\) for \(r = 1,2,3,4,5,6\).
  1. Complete the copy of the table in the Printed Answer Booklet.
    \(\begin{array} { c | c c c c c c }
    r & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
    \mathrm { P } ( X = r ) & 91 k & 61 k & 37 k & & &
    \end{array}\)
  2. Show that \(k = \frac { 1 } { 216 }\).
  3. Draw a graph to illustrate the distribution.
  4. Comment briefly on the shape of the distribution.
  5. In this question you must show detailed reasoning. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
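
The formula quoted in Question 1 can be checked by brute force. The sketch below is not part of the paper: it enumerates all \(6^3 = 216\) equally likely outcomes of three rolls and compares the resulting distribution of the lowest score with \(k \left( 127 - 39 r + 3 r ^ { 2 } \right)\), taking \(k = \frac { 1 } { 216 }\) as stated in part 2. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def lowest_score_distribution():
    """Exact distribution of the lowest of three fair dice, by enumeration."""
    counts = {r: 0 for r in range(1, 7)}
    for rolls in product(range(1, 7), repeat=3):
        counts[min(rolls)] += 1
    total = 6 ** 3  # 216 equally likely outcomes
    return {r: Fraction(c, total) for r, c in counts.items()}


def formula_probability(r, k=Fraction(1, 216)):
    """P(X = r) from the formula in the question (k assumed to be 1/216)."""
    return k * (127 - 39 * r + 3 * r ** 2)


if __name__ == "__main__":
    dist = lowest_score_distribution()
    for r in range(1, 7):
        print(r, dist[r], formula_probability(r), dist[r] == formula_probability(r))
```

Exact fractions are used rather than floats so that the comparison between the enumerated and formula values is not affected by rounding.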
Question 2
2 A special railway coach detects faults in the railway track before they become dangerous.
  1. Write down the conditions required for the numbers of faults in the track to be modelled by a Poisson distribution.

  You should now assume that these conditions do apply, and that the mean number of faults in a 5 km length of track is 1.6.
  2. Find the probability that there are at least 2 faults in a randomly chosen 5 km length of track.
  3. Find the probability that there are at most 10 faults in a randomly chosen 25 km length of track.
  4. On a particular day the coach is used to check 10 randomly chosen 1 km lengths of track. Find the probability that exactly 1 fault, in total, is found.
Question 3
3 The weights of bananas sold by a supermarket are modelled by a Normal distribution with mean 205 g and standard deviation 11 g.
  1. Find the probability that the total weight of 5 randomly selected bananas is at least 1 kg.

  When a banana is peeled the change in its weight is modelled as being a reduction of \(35 \%\).
  2. Find the probability that the weight of a randomly selected peeled banana is at most 150 g.

  Andy makes smoothies. Each smoothie is made using 2 peeled bananas and 20 strawberries from the supermarket, all the items being randomly chosen. The weight of a strawberry is modelled by a Normal distribution with mean 22.5 g and standard deviation 2.7 g.
  3. Find the probability that the total weight of a smoothie is less than 700 g.
Question 4
4 Shellfish in the sea near nuclear power stations are regularly monitored for levels of radioactivity. On a particular occasion, the levels of caesium-137 (a radioactive isotope) in a random sample of 8 cockles, measured in becquerels per kilogram, were as follows.
\(\begin{array} { l l l l l l l l } 2.36 & 2.97 & 2.69 & 3.00 & 2.51 & 2.45 & 2.21 & 2.63 \end{array}\)

Software is used to produce a 95\% confidence interval for the level of caesium-137 in the cockles. The output from the software is shown in Fig. 4. The value for 'SE' has been deliberately omitted.

\begin{table}[h]
\begin{tabular}{ll}
\multicolumn{2}{l}{T Estimate of a Mean} \\
Confidence Level & 0.95 \\
\multicolumn{2}{l}{Sample} \\
Mean & 2.6025 \\
s & 0.2793 \\
N & 8 \\
\multicolumn{2}{l}{Result: T Estimate of a Mean} \\
Mean & 2.6025 \\
s & 0.2793 \\
SE & \\
N & 8 \\
df & 7 \\
Interval & \(2.6025 \pm 0.2335\) \\
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 4}
\end{table}
  1. State an assumption necessary for the use of the \(t\) distribution in the construction of this confidence interval.
  2. State the confidence interval which the software gives in the form \(a < \mu < b\).
  3. In the software output shown in Fig. 4, SE stands for standard error. Find the standard error in this case.
  4. Show how the value of 0.2335 in the confidence interval was calculated.
  5. State how, using this sample, a wider confidence interval could be produced.
Question 5
5 In an investigation into the possible relationship between smoking and weight in adults in a particular country, a researcher selected a random sample of 500 adults.
The adults in the sample were classified according to smoking status (non-smoker, light smoker or heavy smoker, where light smoker indicates less than 10 cigarettes per day) and body weight (underweight, normal weight or overweight). Fig. 5 is a screenshot showing part of the spreadsheet used to calculate the contributions for a chi-squared test. Some values in the spreadsheet have been deliberately omitted.

\begin{table}[h]
\begin{tabular}{r|l|r|r|r|r|r}
 & A & B & C & D & E & F \\ \hline
1 & Observed frequencies & & & & & \\
2 & & Underweight & Normal & Overweight & Totals & \\
3 & Non-smoker & 8 & 52 & 178 & 238 & \\
4 & Light smoker & 10 & 40 & 68 & 118 & \\
5 & Heavy smoker & 5 & 47 & 92 & 144 & \\
6 & Totals & 23 & 139 & 338 & 500 & \\
7 & & & & & & \\
8 & Expected frequencies & & & & & \\
9 & Non-smoker & 10.9480 & 66.1640 & 160.8880 & & \\
10 & Light smoker & 5.4280 & & 79.7680 & & \\
11 & Heavy smoker & & 40.0320 & 97.3440 & & \\
12 & & & & & & \\
13 & & & & & & \\
14 & Non-smoker & 0.7938 & & 1.8200 & & \\
15 & Light smoker & 3.8510 & 1.5785 & 1.7361 & & \\
16 & Heavy smoker & 0.3982 & 1.2129 & 0.2934 & & \\
17 & & & & & & \\
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 5}
\end{table}
  1. Showing your calculations, find the missing values in each of the following cells.
    • B11
    • C10
    • C14
  2. Complete the hypothesis test at the \(1 \%\) level of significance.
  3. For each smoking status, give a brief interpretation of the largest of the three contributions to the test statistic.
Question 6
6
  1. A researcher is investigating the date of the 'start of spring' at different locations around the country.
    A suitable date (measured in days from the start of the year) can be identified by checking, for example, when buds first appear for certain species of trees and plants, but this is time-consuming and expensive. Satellite data, measuring microwave emissions, can alternatively be used to estimate the date that land-based measurements would give. The researcher chooses a random sample of 12 locations, and obtains land-based measurements for the start of spring date at each location, together with relevant satellite measurements. The scatter diagram in Fig. 6.1 shows the results; the land-based measurements are denoted by \(x\) days and the corresponding values derived from satellite measurements by \(y\) days. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{3a89edc4-ac93-4691-ade8-4d4665b55202-06_732_1342_781_333} \captionsetup{labelformat=empty} \caption{Fig. 6.1}
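    \end{figure}
    Fig. 6.2 shows part of a spreadsheet used to analyse the data. Some rows of the spreadsheet have been deliberately omitted.

    \begin{table}[h]
    \begin{tabular}{r|l|r|r|r|r|r}
     & A & B & C & D & E & F \\ \hline
    1 & & \(\boldsymbol { x }\) & \(\boldsymbol { y }\) & \(\boldsymbol { x } ^ { \mathbf { 2 } }\) & \(\boldsymbol { y } ^ { \mathbf { 2 } }\) & \(\boldsymbol { x y }\) \\
    2 & & 90 & 102 & 8100 & 10404 & 9180 \\
    3 & & & & & & \\
    10 & & & & & & \\
    11 & & & & & & \\
    12 & & 94 & 97 & 8836 & 9409 & 9118 \\
    13 & & 99 & 101 & 9801 & 10201 & 9999 \\
    14 & Sum & 1131 & 1227 & 107783 & 126725 & 116724 \\
    15 & & & & & & \\
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Fig. 6.2}
    \end{table}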
    (i) Calculate the equation of a regression line suitable for estimating the land-based date of the start of spring from satellite measurements.
    (ii) Using this equation, estimate the land-based date of the start of spring for the following dates from satellite measurements.
      • 95 days
      • 60 days
    (iii) Comment on the reliability of each of your estimates.
  2. The researcher is also investigating whether there is any correlation between the average temperature during a month in spring and the total rainfall during that month at a particular location. The average temperatures in degrees Celsius and total rainfall in mm for a random selection, over several years, of 10 spring months at this location are as follows.
    \begin{tabular}{l|cccccccccc}
    Temperature & 4.2 & 7.1 & 5.6 & 3.5 & 8.6 & 6.5 & 2.7 & 5.9 & 6.7 & 4.1 \\
    Rainfall & 18 & 26 & 42 & 76 & 15 & 43 & 84 & 53 & 66 & 36 \\
    \end{tabular}
    The researcher plots the scatter diagram shown in Fig. 6.3 to check which type of test to carry out. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{3a89edc4-ac93-4691-ade8-4d4665b55202-07_693_880_1174_338} \captionsetup{labelformat=empty} \caption{Fig. 6.3}
    \end{figure}
    (i) Explain why the researcher might come to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.
    (ii) Find the value of Pearson's product moment correlation coefficient.
    (iii) Carry out a test at the \(5 \%\) significance level to investigate whether there is any correlation between temperature and rainfall.
Question 7
7 A swimming coach believes that times recorded by people using stopwatches are on average 0.2 seconds faster than those recorded by an electronic timing system. In order to test this, the coach takes a random sample of 40 competitors' times recorded by both methods, and finds the differences between the times recorded by the two methods. The mean difference in the times (electronic time minus stopwatch time) is 0.1442 s and the standard deviation of the differences is 0.2580 s.
  1. Find a 95\% confidence interval for the mean difference between electronic and stopwatch times.
  2. Explain whether there is evidence to suggest that the coach’s belief is correct.
  3. Explain how you can calculate the confidence interval in part (a) even though you do not know the distribution of the parent population of differences.
  4. If the coach wanted to produce a \(95 \%\) confidence interval of width no more than 0.12 s, what is the minimum sample size that would be needed, assuming that the standard deviation remains the same?
Question 8
8 A student doing a school project wants to test a claim which she read in a newspaper that drinking a cup of tea will improve a person's arithmetic skills.
She chooses 13 students from her school and gets each of them to drink a cup of tea. She then gives each of them an arithmetic test. She knows that the average score for this test in students of the same age group as those she has chosen is 33.5.
The scores of the students she tests, arranged in ascending order, are as follows.
\(\begin{array} { l l l l l l l l l l l l l } 26 & 28 & 29 & 30 & 31 & 32 & 34 & 42 & 49 & 54 & 55 & 56 & 61 \end{array}\)

The student decides to use software to draw a Normal probability plot for these data, and to carry out a Normality test as shown in Fig. 8. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{3a89edc4-ac93-4691-ade8-4d4665b55202-09_536_1234_792_244} \captionsetup{labelformat=empty} \caption{Fig. 8}
\end{figure}
  1. The student uses the output from the software to help in deciding on a suitable hypothesis test to use for investigating the claim about drinking tea.
    Explain what the student should conclude.
  2. The student's teacher agrees with the student's choice of hypothesis test, but says that even this test may not be valid as there may be some unsatisfactory features in the student's project. Give three features that the teacher might identify as unsatisfactory.
  3. Assuming that the student's procedures can be justified, carry out an appropriate test at the \(5 \%\) significance level to investigate the claim about drinking tea.
Question 9 3 marks
9 Every weekday Jonathan takes an underground train to work. On any weekday the time in minutes that he has to wait at the station for a train is modelled by the continuous uniform distribution over \([ 0,5 ]\).
  1. Find the probability that Jonathan has to wait at least 3 minutes for a train.

  The total time that Jonathan has to wait on two days is modelled by the continuous random variable \(X\) with probability density function given by
    \(\mathrm { f } ( x ) = \begin{cases} \frac { 1 } { 25 } x & 0 \leqslant x \leqslant 5 , \\
    \frac { 1 } { 25 } ( 10 - x ) & 5 < x \leqslant 10 , \\
    0 & \text { otherwise } . \end{cases}\)
  2. Find the probability that Jonathan has to wait a total of at most 6 minutes on two days.

  Jonathan's friend suggests that the total waiting time for 5 days, \(T\) minutes, will almost certainly be less than 18 minutes. In order to investigate this suggestion, Jonathan constructs the simulation shown in Fig. 9. All of the numbers in the simulation have been rounded to 2 decimal places.

    \begin{table}[h]
    \begin{tabular}{r|r|r|r|r|r|r}
     & A & B & C & D & E & F \\ \hline
    1 & Mon & Tue & Wed & Thu & Fri & Total \(T\) \\
    2 & 1.78 & 4.36 & 2.74 & 3.88 & 4.64 & 17.41 \\
    3 & 0.95 & 1.30 & 4.83 & 4.29 & 1.81 & 13.18 \\
    4 & 4.27 & 4.90 & 4.57 & 1.41 & 3.66 & 18.81 \\
    5 & 0.80 & 0.06 & 3.20 & 1.76 & 0.35 & 6.17 \\
    6 & 0.03 & 4.82 & 1.26 & 3.53 & 0.13 & 9.77 \\
    7 & 3.88 & 4.73 & 1.19 & 3.75 & 1.29 & 14.84 \\
    8 & 4.11 & 3.54 & 4.33 & 0.77 & 4.50 & 17.25 \\
    9 & 3.54 & 0.11 & 3.85 & 2.86 & 1.58 & 11.94 \\
    10 & 1.87 & 1.82 & 3.00 & 3.53 & 1.83 & 12.05 \\
    11 & 4.00 & 2.98 & 4.59 & 1.73 & 1.76 & 15.06 \\
    12 & 1.91 & 3.85 & 2.08 & 1.72 & 2.82 & 12.38 \\
    13 & 0.10 & 4.86 & 2.51 & 0.52 & 2.17 & 10.15 \\
    14 & 1.24 & 4.26 & 0.95 & 1.33 & 1.78 & 9.57 \\
    15 & 2.99 & 0.69 & 3.85 & 3.41 & 2.42 & 13.36 \\
    16 & 4.67 & 1.76 & 2.13 & 3.48 & 3.10 & 15.14 \\
    17 & 1.94 & 1.07 & 0.91 & 0.63 & 3.34 & 7.89 \\
    18 & 0.11 & 2.29 & 0.71 & 4.21 & 0.86 & 8.18 \\
    19 & 0.43 & 4.58 & 4.89 & 1.86 & 2.84 & 14.60 \\
    20 & 4.23 & 0.88 & 2.71 & 4.88 & 4.20 & 16.91 \\
    21 & 3.72 & 4.58 & 3.11 & 4.89 & 3.18 & 19.49 \\
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Fig. 9}
    \end{table}
  3. Use the simulation to estimate \(\mathrm { P } ( T > 18 )\).
  4. Explain how Jonathan could obtain a better estimate.

  Jonathan thinks that he can use the Central Limit Theorem to provide a very good approximation to the distribution of \(T\).
  5. Find each of the following.
    • \(\mathrm { E } ( T )\)
    • \(\operatorname { Var } ( T )\)
  6. Use the Central Limit Theorem to estimate \(\mathrm { P } ( T > 18 )\).
  7. Comment briefly on the use of the Central Limit Theorem in this case.

  Jonathan travels to work on 200 days in a year.
  8. Find the probability that the total waiting time for Jonathan in a year is more than 510 minutes. [3]
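
The spreadsheet in Fig. 9 contains 20 simulated weeks. The same kind of simulation can be sketched in Python with far more trials. This is an illustration only, not the method used in the paper: the function name, the default number of trials and the seed are assumptions, and each daily wait is drawn independently from the continuous uniform distribution over \([ 0,5 ]\) as described in the question.

```python
import random


def estimate_prob_total_exceeds(threshold=18.0, days=5, trials=100_000, seed=0):
    """Monte Carlo estimate of P(T > threshold), where T is the total of
    `days` independent waits, each uniform on [0, 5] minutes."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    hits = 0
    for _ in range(trials):
        total = sum(rng.uniform(0.0, 5.0) for _ in range(days))
        if total > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(estimate_prob_total_exceeds())
```

Increasing `trials` reduces the sampling variability of the estimate, at the cost of a longer run time.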
Question 10
10 The probability density function of the continuous random variable \(X\) is given by
\(f ( x ) = \begin{cases} k x ^ { m } & 0 \leqslant x \leqslant a , \\
0 & \text { otherwise, } \end{cases}\)
where \(a , k\) and \(m\) are positive constants.
  1. Show that \(k = \frac { m + 1 } { a ^ { m + 1 } }\).
  2. Find the cumulative distribution function of \(X\) in terms of \(x , a\) and \(m\).
  3. Given that \(\mathrm { P } \left( \frac { 1 } { 4 } a < X < \frac { 1 } { 2 } a \right) = \frac { 1 } { 10 }\),
    1. show that \(2 p ^ { 2 } - 10 p + 5 = 0\), where \(p = 2 ^ { m }\),
    2. find the value of \(m\).

\section*{END OF QUESTION PAPER}