OCR MEI Further Statistics Major (Further Statistics Major) 2024 June

Question 1
1 The number of insurance policy sales made per month by a salesperson is modelled by the random variable \(X\), with probability distribution shown in the table.
\begin{tabular}{|c|c|c|c|c|c|c|c|}
\hline
\(r\) & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\
\hline
\(\mathrm { P } ( X = r )\) & 0.05 & 0.1 & 0.25 & 0.3 & 0.15 & 0.1 & 0.05 \\
\hline
\end{tabular}
  1. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
    The salesperson is paid a basic salary of \(\pounds 1000\) per month plus \(\pounds 500\) for each policy that is sold.
  2. Find the mean and standard deviation of the salesperson's monthly salary.
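The quantities asked for in Question 1 can be checked numerically. The following is a minimal sketch, not a model answer: it assumes the probabilities read from the table above, and the helper name `mean_and_variance` is illustrative only.

```python
from math import sqrt

# Probability distribution of X, copied from the table in Question 1.
values = [0, 1, 2, 3, 4, 5, 6]
probs = [0.05, 0.1, 0.25, 0.3, 0.15, 0.1, 0.05]


def mean_and_variance(values, probs):
    """Return (E(X), Var(X)) for a discrete distribution."""
    mean = sum(r * p for r, p in zip(values, probs))
    mean_sq = sum(r * r * p for r, p in zip(values, probs))
    return mean, mean_sq - mean ** 2


mean_x, var_x = mean_and_variance(values, probs)

# Salary S = 1000 + 500X, so E(S) = 1000 + 500 E(X) and sd(S) = 500 sd(X).
mean_salary = 1000 + 500 * mean_x
sd_salary = 500 * sqrt(var_x)

print(round(mean_x, 4), round(var_x, 4))
print(round(mean_salary, 2), round(sd_salary, 2))
```

The fixed basic salary shifts the mean but leaves the standard deviation unaffected, which is why only the factor of 500 appears in `sd_salary`.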
Question 2
2 The number of cars arriving per minute to queue at a drive-through fast-food restaurant is modelled by the random variable \(X\). The standard deviation of \(X\) is 0.6. You should assume that arrivals are random and independent and occur at a constant average rate.
  1. Find the mean of \(X\).
  2. 
    1. Calculate \(\mathrm { P } ( X = 1 )\).
    2. Calculate \(\mathrm { P } ( X > 1 )\).
  3. Find the probability that fewer than 5 cars arrive in a randomly chosen 20-minute period.
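The assumptions stated in Question 2 (random, independent arrivals at a constant average rate) point to a Poisson model, for which the variance equals the mean. The sketch below works under that assumption; the helper names `poisson_pmf` and `poisson_cdf` are illustrative and not part of the question.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


sd = 0.6
# Assumption: Poisson model, so mean = variance = sd squared.
mean_per_minute = sd ** 2

p_exactly_one = poisson_pmf(1, mean_per_minute)
p_more_than_one = 1 - poisson_cdf(1, mean_per_minute)

# Over 20 minutes the mean scales to 20 times the per-minute mean.
mean_20_minutes = 20 * mean_per_minute
p_fewer_than_five = poisson_cdf(4, mean_20_minutes)

print(round(p_exactly_one, 4), round(p_more_than_one, 4))
print(round(p_fewer_than_five, 4))
```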
Question 3
3 At a launderette the process of cleaning a load of clothes consists of three stages: washing, drying and folding. The times in minutes for each process are modelled by independent Normal distributions with means and standard deviations as shown in the table.
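\begin{tabular}{|l|c|c|}
\cline { 2 - 3 } \multicolumn{1}{c|}{} & Mean & Standard deviation \\
\hline
Washing & 35 & 2.4 \\
\hline
Drying & 46 & 3.1 \\
\hline
Folding & 12 & 2.2 \\
\hline
\end{tabular}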
  1. Find the probability that drying a randomly chosen load of clothes takes more than 50 minutes.
  2. It is given that for \(99 \%\) of loads of clothes the washing time is less than \(k\) minutes. Find the value of \(k\).
  3. Determine the probability that the drying time for a randomly chosen load of clothes is less than the total of the washing and folding times.
  4. Determine the probability that the mean time for cleaning 5 randomly chosen loads of clothes is less than 90 minutes. You should assume that the time for cleaning any load is independent of the time for cleaning any other load.
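The probabilities in Question 3 rest on the fact that sums and differences of independent Normal variables are Normal, with means combining linearly and variances adding. A hedged sketch using `statistics.NormalDist` from the Python standard library follows; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Stage-time models (minutes), taken from the table in Question 3.
washing = NormalDist(35, 2.4)
drying = NormalDist(46, 3.1)
folding = NormalDist(12, 2.2)

# P(drying time > 50)
p_drying_over_50 = 1 - drying.cdf(50)

# k such that P(washing time < k) = 0.99
k = washing.inv_cdf(0.99)

# D - (W + F): means subtract, variances add (independence assumed).
total_variance = washing.variance + drying.variance + folding.variance
difference = NormalDist(drying.mean - washing.mean - folding.mean,
                        sqrt(total_variance))
p_drying_less_than_rest = difference.cdf(0)

# Mean cleaning time of 5 independent loads: same mean, variance / 5.
total_mean = washing.mean + drying.mean + folding.mean
mean_of_five = NormalDist(total_mean, sqrt(total_variance / 5))
p_mean_under_90 = mean_of_five.cdf(90)

print(round(p_drying_over_50, 4), round(k, 2))
print(round(p_drying_less_than_rest, 4), round(p_mean_under_90, 4))
```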
Question 4
4 An archer fires arrows at a circular target of radius 50 cm. The distance in cm that an arrow lands from the centre of the target is modelled by the random variable \(X\), with probability density function given by
\(f ( x ) = \begin{cases} a x & 0 \leqslant x \leqslant 50 , \\ 0 & \text { otherwise, } \end{cases}\)
where \(a\) is a constant.
  1. Determine the value of \(a\).
  2. Determine the probability that an arrow will land within 5 cm of the centre of the target.
  3. Determine the median distance from the centre of the target that an arrow will land.
Question 5
5 A researcher is investigating whether doing yoga has any effect on quality of sleep in older people. The researcher selects a random sample of 40 older people, who then complete a yoga course. Before they start the course and again at the end, the 40 people fill in a questionnaire which measures their perceived sleep quality. The higher the score, the better is the perceived quality of sleep. The researcher uses software to produce a 90\% confidence interval for the difference in mean sleep quality (sleep quality after the course minus sleep quality before the course). The output from the software is shown below.
\begin{tabular}{|l|l|}
\hline
\multicolumn{2}{|l|}{Z Estimate of a Mean} \\
\hline
Confidence level & 0.9 \\
\hline
\multicolumn{2}{|l|}{Sample} \\
\hline
Mean & 0.586 \\
\hline
\(s\) & 2.14 \\
\hline
N & 40 \\
\hline
\multicolumn{2}{|l|}{Result} \\
\hline
\multicolumn{2}{|l|}{Z Estimate of a Mean} \\
\hline
Mean & 0.586 \\
\hline
s & 2.14 \\
\hline
SE & 0.3384 \\
\hline
N & 40 \\
\hline
Lower limit & 0.029 \\
\hline
Upper limit & 1.143 \\
\hline
Interval & \(0.586 \pm 0.557\) \\
\hline
\end{tabular}
  1. Explain why the confidence interval is based on the Normal distribution even though the distribution of the population of differences is not known.
  2. Explain whether the confidence interval suggests that the mean sleep qualities before and after completing a yoga course are different.
  3. In the output from the software, SE stands for 'standard error'.
    1. Explain what standard error is.
    2. Show how the standard error was calculated in this case.
  4. A colleague of the researcher suggests that the confidence level should have been \(95 \%\) rather than \(90 \%\). Determine whether this would have made a difference to your answer to part (b).
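The figures in the software output of Question 5 can be reproduced from the sample mean, \(s\) and \(N\). The sketch below assumes the usual large-sample formula (standard error \(s / \sqrt{N}\), limits at the mean plus or minus a Normal quantile times the standard error); the function name `z_interval` is hypothetical and is not the software's own routine.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, s, n, level):
    """Return (standard error, lower limit, upper limit) of a z interval."""
    se = s / sqrt(n)
    # Two-sided interval: put (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * se
    return se, mean - half_width, mean + half_width


se, lower, upper = z_interval(mean=0.586, s=2.14, n=40, level=0.90)
print(round(se, 4), round(lower, 3), round(upper, 3))
```

Calling `z_interval` with a different `level` shows how the width of the interval changes with the confidence level.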
Question 6
6 A student is investigating the relationship between age and grip strength in adults. The student selects 10 people and records their ages in years and the grip strengths of their dominant hand, measured in kg. The data are shown in the table below, together with a scatter diagram to illustrate the data.
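\begin{tabular}{|l|c|c|c|c|c|c|c|c|c|c|}
\hline
Age & 22 & 29 & 36 & 39 & 53 & 57 & 60 & 71 & 76 & 82 \\
\hline
Grip strength & 38 & 46 & 42 & 49 & 37 & 47 & 36 & 33 & 34 & 24 \\
\hline
\end{tabular}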
\includegraphics[max width=\textwidth, alt={}]{bab116b3-6e5f-44db-ac86-670e4040d649-05_634_1107_641_239}
The student decides to carry out a hypothesis test to investigate whether there is negative association between age and grip strength.
  1. Explain why the student decides to carry out a test based on Spearman's rank correlation coefficient.
  2. State what property of the sample is required in order for it to be valid to carry out a hypothesis test.
  3. In this question you must show detailed reasoning. Assuming that the property in part (b) holds, carry out the test at the \(5 \%\) significance level.
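For the data in Question 6, Spearman's rank correlation coefficient can be computed from the rank differences with \(r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}\), which is valid here because there are no tied values. A minimal sketch follows; the helper names `ranks` and `spearman` are illustrative, and the critical value still has to be read from tables.

```python
# Data copied from the table in Question 6.
ages = [22, 29, 36, 39, 53, 57, 60, 71, 76, 82]
grips = [38, 46, 42, 49, 37, 47, 36, 33, 34, 24]


def ranks(data):
    """Rank values from 1 (smallest). Assumes there are no ties."""
    ordered = sorted(data)
    return [ordered.index(value) + 1 for value in data]


def spearman(x, y):
    """Spearman's rank correlation via the sum of squared rank differences."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


r_s = spearman(ages, grips)
print(round(r_s, 4))
```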
Question 7
7 An environmental investigator wants to check whether the level of selenium in carrots in fields near a mine is different from the usual level in the country, which is \(9.4 \mathrm { ng } / \mathrm { g }\) (nanograms per gram). She takes a random sample of 10 carrots from fields near the mine and measures the selenium level of each of them in \(\mathrm { ng } / \mathrm { g }\), with results as follows.
\(\begin{array} { l l l l l l l l l l } 6.20 & 10.72 & 11.42 & 16.32 & 15.33 & 10.56 & 8.83 & 9.21 & 7.78 & 14.32 \end{array}\)
  1. Find estimates of each of the following.
    • The population mean
    • The population standard deviation
    The investigator produces a Normal probability plot and carries out a Kolmogorov-Smirnov test for these data as shown in the diagram.
    \includegraphics[max width=\textwidth, alt={}, center]{bab116b3-6e5f-44db-ac86-670e4040d649-06_583_1499_959_242}
  2. Comment on what the Normal probability plot and the \(p\)-value of the test suggest about the data.
  3. State the null hypothesis for the Kolmogorov-Smirnov test for Normality.
  4. In this question you must show detailed reasoning. Carry out a test at the \(5 \%\) significance level to investigate whether the mean selenium level in carrots from fields near the mine is different from \(9.4 \mathrm { ng } / \mathrm { g }\).
  5. If the \(p\)-value of the Kolmogorov-Smirnov test for Normality had been 0.007, explain what procedure you could have used to investigate the selenium level in carrots from fields near the mine.
Question 8
8 An estate agent collects data for a random selection of 13 flats in order to investigate the link between the floor areas of flats and their price. The scatter diagram shows the floor areas, \(x \mathrm {~m} ^ { 2 }\), and prices, \(\pounds y\) thousand, of the 13 flats.
\includegraphics[max width=\textwidth, alt={}, center]{bab116b3-6e5f-44db-ac86-670e4040d649-07_613_1246_386_242}
  1. The estate agent notes that two of the data points are outliers. One is Flat A which has a large floor area but is in poor condition. The other is Flat B which has a balcony with a desirable view overlooking the sea. Label these two data points on the copy of the scatter diagram in the Printed Answer Booklet.
    The estate agent decides to remove these two data points from the analysis. Summary statistics for the remaining 11 flats are as follows.
    $$\sum x = 652.5 \quad \sum y = 5067 \quad \sum x ^ { 2 } = 41987.35 \quad \sum y ^ { 2 } = 2456813 \quad \sum x y = 315928.2$$
  2. In this question you must show detailed reasoning. Calculate the equation of a regression line which is suitable for estimating the price of a flat from its floor area.
  3. Use the regression line to estimate the price for the following floor areas.
    • \(40 \mathrm {~m} ^ { 2 }\)
    • \(110 \mathrm {~m} ^ { 2 }\)
  4. Given that the value of the product moment correlation coefficient for these 11 data items is 0.765, comment on the reliability of your estimates.
  5. The estate agent thinks that he can predict the floor area of a flat from its price, using the equation of the regression line found in part (b). Comment briefly on the estate agent's idea.
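The regression line of \(y\) on \(x\) in Question 8 can be obtained from the summary statistics via \(S_{xx}\), \(S_{xy}\) and \(S_{yy}\). The sketch below is a numerical check under the standard least-squares formulas, not the detailed reasoning the question asks for; `predict_price` is a hypothetical helper name.

```python
from math import sqrt

# Summary statistics for the remaining 11 flats (Question 8).
n = 11
sum_x, sum_y = 652.5, 5067
sum_x2, sum_y2, sum_xy = 41987.35, 2456813, 315928.2

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares line of y on x: y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Product moment correlation coefficient, as a consistency check.
r = s_xy / sqrt(s_xx * s_yy)


def predict_price(area):
    """Estimated price in thousands of pounds for a floor area in square metres."""
    return a + b * area


print(round(a, 2), round(b, 4), round(r, 3))
print(round(predict_price(40), 1), round(predict_price(110), 1))
```

The computed value of `r` agrees with the 0.765 quoted in the question, which suggests the summary statistics have been read correctly.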
Question 9
9 A cyclist has 3 bicycles, a road bike, a gravel bike and an electric bike. She wishes to know if the bicycle which she is riding makes any difference to whether she reaches a speed of 25 mph or greater on a journey. She selects a random sample of 120 journeys and notes the bicycle and whether or not her maximum speed was 25 mph or greater. She decides to carry out a chi-squared test to investigate whether there is any association between bicycle type and whether her maximum speed is 25 mph or greater. Tables 9.1 and 9.2 show the data and some of the expected frequencies for the test.
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Table 9.1}
\begin{tabular}{|c|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{}} & \multicolumn{3}{c|}{Bicycle} & \\
\cline { 3 - 5 }
\multicolumn{2}{|c|}{} & Road & Gravel & Electric & Total \\
\hline
\multirow{2}{*}{Maximum speed} & Less than 25 mph & 2 & 21 & 19 & 42 \\
\cline { 2 - 6 }
 & 25 mph or greater & 13 & 47 & 18 & 78 \\
\hline
\multicolumn{2}{|c|}{Total} & 15 & 68 & 37 & 120 \\
\hline
\end{tabular}
\end{table}
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Table 9.2}
\begin{tabular}{|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Expected frequency}} & \multicolumn{3}{c|}{Bicycle} \\
\cline { 3 - 5 }
\multicolumn{2}{|c|}{} & Road & Gravel & Electric \\
\hline
\multirow{2}{*}{Maximum speed} & Less than 25 mph & & & 12.95 \\
\cline { 2 - 5 }
 & 25 mph or greater & & & 24.05 \\
\hline
\end{tabular}
\end{table}
  1. Complete the table of expected frequencies in the Printed Answer Booklet.
  2. Determine the contribution to the chi-squared test statistic for the Electric bicycle and maximum speed 25 mph or greater. Give your answer correct to 4 decimal places.
    The contributions to the chi-squared test statistic for the remaining categories are shown in Table 9.3.
    \begin{table}[h]
    \captionsetup{labelformat=empty} \caption{Table 9.3}
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \multicolumn{2}{|c|}{\multirow{2}{*}{Contribution to the test statistic}} & \multicolumn{3}{c|}{Bicycle} \\
    \cline { 3 - 5 }
    \multicolumn{2}{|c|}{} & Road & Gravel & Electric \\
    \hline
    \multirow{2}{*}{Maximum speed} & Less than 25 mph & 2.0119 & 0.3294 & 2.8264 \\
    \cline { 2 - 5 }
     & 25 mph or greater & 1.0833 & 0.1774 & \\
    \hline
    \end{tabular}
    \end{table}
  3. In this question you must show detailed reasoning. Carry out the test at the 5\% significance level.
  4. For each type of bicycle, give a brief interpretation of what the data suggest about maximum speed.
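The expected frequencies and contributions in Question 9 follow from the row and column totals of Table 9.1: each expected frequency is (row total × column total) / grand total, and each contribution is \((O - E)^2 / E\). A sketch of that arithmetic is given below; the variable names are illustrative, and the critical value for the test still has to be taken from tables.

```python
# Observed frequencies from Table 9.1.
# Rows: less than 25 mph, 25 mph or greater. Columns: Road, Gravel, Electric.
observed = [
    [2, 21, 19],
    [13, 47, 18],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell = (observed - expected)^2 / expected.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

test_statistic = sum(sum(row) for row in contributions)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

for row in expected:
    print([round(e, 2) for e in row])
for row in contributions:
    print([round(c, 4) for c in row])
print(round(test_statistic, 4), degrees_of_freedom)
```

The printed values for the Electric column of `expected` and for five of the six `contributions` can be compared against the entries already given in Tables 9.2 and 9.3.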
Question 10
10 Ben takes an underground train to work and back home each day. The waiting time is defined as the time from when he reaches the station platform until he boards the train. On his way to work the waiting time is \(X\) minutes, where \(X\) is modelled by a continuous uniform distribution on \([ 0,6 ]\). On his way back from work, the waiting time is \(Y\) minutes, where \(Y\) is modelled by a continuous uniform distribution on \([ 0,4 ]\). Ben's total waiting time for both journeys is \(Z\) minutes, where \(Z = X + Y\). You should assume that \(X\) and \(Y\) are independent.
  1. Find \(\mathrm { E } ( Z )\).
  2. Ben thinks that \(Z\) will be well modelled by a continuous uniform distribution on \([ 0,10 ]\). By considering variances, show that he is not correct.
  3. Ben's friend Jamila constructs the spreadsheet below, which shows a simulation of 20 values of \(X , Y\) and \(Z\). All of the values have been rounded to 2 decimal places.
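    \begin{tabular}{|c|c|c|c|}
    \hline
     & A & B & C \\
    \hline
    1 & \(X\) & \(Y\) & \(Z\) \\
    \hline
    2 & 1.17 & 3.83 & 5.01 \\
    \hline
    3 & 2.01 & 0.81 & 2.82 \\
    \hline
    4 & 1.27 & 1.52 & 2.78 \\
    \hline
    5 & 1.41 & 3.94 & 5.35 \\
    \hline
    6 & 4.11 & 2.94 & 7.05 \\
    \hline
    7 & 1.76 & 0.96 & 2.72 \\
    \hline
    8 & 3.29 & 0.98 & 4.27 \\
    \hline
    9 & 0.77 & 0.22 & 0.99 \\
    \hline
    10 & 0.99 & 1.44 & 2.43 \\
    \hline
    11 & 4.79 & 2.43 & 7.22 \\
    \hline
    12 & 3.82 & 3.93 & 7.75 \\
    \hline
    13 & 5.25 & 2.74 & 7.99 \\
    \hline
    14 & 2.64 & 0.48 & 3.12 \\
    \hline
    15 & 1.54 & 2.18 & 3.72 \\
    \hline
    16 & 2.71 & 1.66 & 4.36 \\
    \hline
    17 & 0.04 & 3.24 & 3.28 \\
    \hline
    18 & 5.95 & 3.12 & 9.07 \\
    \hline
    19 & 5.22 & 1.21 & 6.42 \\
    \hline
    20 & 4.16 & 0.11 & 4.27 \\
    \hline
    21 & 1.02 & 0.99 & 2.01 \\
    \hline
    22 & & & \\
    \hline
    \end{tabular}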
    Write down an estimate of \(\mathrm { P } ( Z > 6 )\).
  4. Use a Normal approximation to determine the probability that Ben's total waiting time when travelling to and from work on 40 days is more than 210 minutes.
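Two of the ideas in Question 10 lend themselves to a quick numerical check: the simulated estimate of \(\mathrm{P}(Z > 6)\) is the proportion of the 20 simulated \(Z\) values exceeding 6, and the variance comparison uses \(\operatorname{Var} = (b - a)^2 / 12\) for a continuous uniform distribution on \([a, b]\). The sketch below assumes the \(Z\) column as read from the spreadsheet; `uniform_variance` is an illustrative helper name.

```python
# Simulated values of Z, read from column C of the spreadsheet (rows 2 to 21).
z_values = [
    5.01, 2.82, 2.78, 5.35, 7.05, 2.72, 4.27, 0.99, 2.43, 7.22,
    7.75, 7.99, 3.12, 3.72, 4.36, 3.28, 9.07, 6.42, 4.27, 2.01,
]

# Estimate of P(Z > 6): proportion of simulated values above 6.
count_above_6 = sum(1 for z in z_values if z > 6)
estimate = count_above_6 / len(z_values)


def uniform_variance(a, b):
    """Variance of a continuous uniform distribution on [a, b]."""
    return (b - a) ** 2 / 12


# Var(Z) = Var(X) + Var(Y) for independent X and Y.
var_z = uniform_variance(0, 6) + uniform_variance(0, 4)
var_uniform_0_10 = uniform_variance(0, 10)

print(count_above_6, estimate)
print(round(var_z, 4), round(var_uniform_0_10, 4))
```

With only 20 simulated values the estimate is coarse, so a larger simulation would be expected to give a somewhat different proportion.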
Question 11
11 The discrete random variable \(X\) has a uniform distribution over the set of all integers between 25 and \(n\) inclusive, where \(n\) is a positive integer with \(n > 25\).
  1. Determine \(\mathrm { P } \left( X < \frac { n + 25 } { 2 } \right)\) in each of the following cases.
    • \(n\) is even
    • \(n\) is odd
  2. Determine an expression in terms of \(n\) for the variance of the mean of 100 independent values of \(X\).
  3. Given that \(n = 75\), calculate an estimate of the probability that the mean of 100 independent values of \(X\) is less than 48.
Question 12
12 The cumulative distribution function of the continuous random variable \(X\) is given by
\(F ( x ) = \begin{cases} 0 & x < 20 , \\ a \left( x ^ { 2 } + b x + c \right) & 20 \leqslant x \leqslant 30 , \\ 1 & x > 30 , \end{cases}\)
where \(a\), \(b\) and \(c\) are constants.
You are given that \(\mathrm { P } ( X < 25 ) = \frac { 11 } { 24 }\).
  1. Find \(\mathrm { P } ( X > 27 )\).
  2. Find the 90th percentile of \(X\).