OCR MEI Further Statistics Major 2023 June

Question 1
1 A website simulates the outcome of throwing four fair dice. Ten thousand people take part in a challenge using the website in which they have one attempt at getting four sixes in the four throws of the dice. The number of people who succeed in getting four sixes is denoted by the random variable \(X\).
  1. Show that, for each person, the probability that the person gets four sixes is equal to \(\frac { 1 } { 1296 }\).
  2. Explain why you could use either a binomial distribution or a Poisson distribution to model the distribution of \(X\).
  3. Use a Poisson distribution to calculate each of the following probabilities.
    • \(\mathrm { P } ( X = 10 )\)
    • \(\mathrm { P } ( X > 10 )\)
  4. In another challenge on the website, 50 people are each given 20 independent attempts to try to get four sixes as often as they can. Determine the probability that no more than 2 people succeed in getting four sixes at least once in their 20 attempts.
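The calculations in Question 1 can be checked numerically. The sketch below is one possible approach, not a mark scheme: it assumes the Poisson mean is \(10000 \times \frac{1}{1296}\) and, for the second challenge, models the number of successful people as binomial with 50 trials. All variable names are illustrative.

```python
from math import comb, exp, factorial

p_four_sixes = (1 / 6) ** 4      # probability of four sixes in one attempt
lam = 10_000 * p_four_sixes      # assumed Poisson mean for X

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

p_x_eq_10 = poisson_pmf(10, lam)
p_x_gt_10 = 1 - sum(poisson_pmf(k, lam) for k in range(11))

# Second challenge: one person succeeds at least once in 20 attempts
q = 1 - (1 - p_four_sixes) ** 20
# Number of successful people modelled as B(50, q); P(at most 2)
p_at_most_2 = sum(comb(50, k) * q ** k * (1 - q) ** (50 - k) for k in range(3))

print(round(p_x_eq_10, 4), round(p_x_gt_10, 4), round(p_at_most_2, 4))
```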
Question 2
2 A student is investigating the link between temperature and electricity consumption in the winter months. The student finds the average minimum temperature, \(x ^ { \circ } \mathrm { C }\), from across the country on a day. The student then finds the total electricity consumption for that day, \(y \mathrm { GWh }\). The scatter diagram below shows the values of \(x\) and \(y\) obtained from a random sample of 10 winter days. It also shows the equation of the regression line of \(y\) on \(x\) and the value of \(r ^ { 2 }\), where \(r\) is the product moment correlation coefficient.
\includegraphics[max width=\textwidth, alt={}, center]{c692fb20-436f-4bc1-89bd-10fdba41ceba-03_776_1043_609_244}
  1. Use the regression line to estimate the electricity consumption at each of the following average minimum temperatures.
    • \(5 ^ { \circ } \mathrm { C }\)
    • \(- 4 ^ { \circ } \mathrm { C }\)
  2. Comment on the reliability of your estimates.
Question 3
3 A tennis player is practising her serve. Each time she serves, she has a \(55 \%\) chance of being successful (getting the serve in the required area without hitting the net). You should assume that whether she is successful on any serve is independent of whether she is successful on any other serve.
  1. Find the probability that the player is not successful in any of her first three serves.
  2. Determine the probability that the player is successful at least 10 times in her first 20 serves.
  3. Determine the probability that the player is successful for the first time on her fifth serve.
  4. Determine the probability that the player is successful for the fifth time on her tenth serve.
Another player is also practising his serve. Each time he serves, he has a probability \(p\) of being successful. You should assume that whether he is successful on any serve is independent of whether he is successful on any other serve. The probability that he is successful for the first time on his second serve is 0.2496 and the probability that he is successful on both of his first two serves is less than 0.25.
  5. Determine the value of \(p\).
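A short numerical sketch of the binomial, geometric and negative binomial calculations in Question 3 follows. It assumes the usual reading of each part (for example, "fifth success on the tenth serve" means exactly four successes in the first nine serves, then a success). Variable names are illustrative.

```python
from math import comb, sqrt

p = 0.55  # probability of a successful serve

p_none_in_three = (1 - p) ** 3
p_at_least_10_of_20 = sum(
    comb(20, k) * p ** k * (1 - p) ** (20 - k) for k in range(10, 21)
)
p_first_on_fifth = (1 - p) ** 4 * p
# Four successes in the first nine serves, then a success on the tenth
p_fifth_on_tenth = comb(9, 4) * p ** 4 * (1 - p) ** 5 * p

# Second player: (1 - p) * p = 0.2496 gives p^2 - p + 0.2496 = 0
disc = sqrt(1 - 4 * 0.2496)
roots = [(1 - disc) / 2, (1 + disc) / 2]
# Keep the root for which P(success on both of first two serves) < 0.25
p_second_player = [r for r in roots if r * r < 0.25][0]

print(p_none_in_three, p_at_least_10_of_20, p_first_on_fifth,
      p_fifth_on_tenth, p_second_player)
```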
Question 4
4 A machine manufactures batches of 100 titanium sheets. The thickness of every sheet in a batch is Normally distributed with mean \(\mu \mathrm { mm }\) and standard deviation 0.03 mm . You should assume that each sheet is of uniform thickness and that the thicknesses of different sheets are independent of each other. The values of \(\mu\) for three different batches, A, B and C, are 3.125, 3.117 and 3.109 respectively.
  1. Determine the probability that the total thickness of 10 sheets from Batch A is less than 31.0 mm .
  2. Determine the probability that, if a single sheet from Batch A is cut into pieces and 10 of the pieces are stacked together, the total thickness of the stack is less than 31.0 mm .
  3. Determine the probability that, if one sheet from each of Batches A, B and C are stacked together, the total thickness of the stack is at least 9.4 mm .
  4. Determine the probability that the total thickness of 10 sheets from Batch A is less than the total thickness of 10 sheets from Batch B.
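The four parts of Question 4 differ mainly in how the variance combines. The sketch below is an illustration using Python's `statistics.NormalDist`: it assumes sums of independent sheets add variances, whereas ten pieces cut from one sheet scale a single thickness by 10.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 0.03                            # sd of one sheet's thickness (mm)
MU_A, MU_B, MU_C = 3.125, 3.117, 3.109  # batch means (mm)

# 1. Ten independent sheets from A: variance 10 * sigma^2
p_ten_sheets = NormalDist(10 * MU_A, SIGMA * sqrt(10)).cdf(31.0)

# 2. Ten pieces of the same sheet: total is 10X, so sd is 10 * sigma
p_ten_pieces = NormalDist(10 * MU_A, 10 * SIGMA).cdf(31.0)

# 3. One sheet from each batch: variance 3 * sigma^2
p_stack_abc = 1 - NormalDist(MU_A + MU_B + MU_C, SIGMA * sqrt(3)).cdf(9.4)

# 4. (Total of 10 from A) - (total of 10 from B): variance 20 * sigma^2
p_a_less_than_b = NormalDist(10 * (MU_A - MU_B), SIGMA * sqrt(20)).cdf(0)

print(p_ten_sheets, p_ten_pieces, p_stack_abc, p_a_less_than_b)
```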
Question 5
5 Amari is investigating how accurately people can estimate a short time period. He asks each of a random sample of 40 people to estimate a period of 20 seconds. For each person, he starts a stopwatch and then stops it when they tell him that they think that 20 s has elapsed. The times which he records are denoted by \(x \mathrm {~s}\). You are given that
\(\sum x = 765 , \quad \sum x ^ { 2 } = 15065\).
  1. Determine a 95\% confidence interval for the mean estimated time.
  2. Amari says that the confidence interval supports the suggestion that people can estimate 20 s accurately. Make two comments about Amari's statement.
  3. Discuss whether you could have constructed the confidence interval if there had only been 10 people involved in the experiment.
Amari thinks that people would be able to estimate more accurately if he gave them a second attempt. He repeats the experiment with each person and again records the times. Software is used to produce a \(95 \%\) confidence interval for the mean estimated time. The output from the software is shown below.
\begin{tabular}{|l|l|}
\hline
\multicolumn{2}{|l|}{Z Estimate of a Mean} \\
\hline
Confidence level & 0.95 \\
\hline
\multicolumn{2}{|l|}{Sample} \\
\hline
Mean & 19.68 \\
s & 1.38 \\
N & 40 \\
\hline
\multicolumn{2}{|l|}{Result} \\
\hline
\multicolumn{2}{|l|}{Z Estimate of a Mean} \\
\hline
Mean & 19.68 \\
s & 1.38 \\
SE & 0.2182 \\
N & 40 \\
Interval & \(19.68 \pm 0.4277\) \\
\hline
\end{tabular}
  4. State the confidence interval in the form \(\mathrm { a } < \mu < \mathrm { b }\).
  5. Make two comments based on this confidence interval about Amari's opinion that second attempts result in more accurate estimates.
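The two confidence intervals in Question 5 can be reproduced as follows. This is a sketch assuming a Normal (z) interval based on the sample standard deviation with divisor \(n - 1\), which matches the "Z Estimate of a Mean" software output. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sum_x, sum_x2 = 40, 765, 15065

mean = sum_x / n
s = sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))  # sample standard deviation
z = NormalDist().inv_cdf(0.975)                # two-sided 95% z value
half_width = z * s / sqrt(n)
first_ci = (mean - half_width, mean + half_width)

# Second attempt: interval read from the software output, 19.68 +/- 0.4277
second_ci = (19.68 - 0.4277, 19.68 + 0.4277)

print(first_ci, second_ci)
```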
Question 6
6 A student wonders if there is any correlation between download and upload speeds of data to and from the internet. The student decides to carry out a hypothesis test to investigate this and so measures the download speed \(x\) and upload speed \(y\) in suitable units on 20 randomly chosen occasions. The scatter diagram below illustrates the data which the student collected.
\includegraphics[max width=\textwidth, alt={}, center]{c692fb20-436f-4bc1-89bd-10fdba41ceba-07_824_1411_440_246}
  1. Explain why the student decides to carry out a test based on the product moment correlation coefficient.
Summary statistics for the 20 occasions are as follows.
$$\sum x = 342.10 \quad \sum y = 273.65 \quad \sum x ^ { 2 } = 5989.53 \quad \sum y ^ { 2 } = 3919.53 \quad \sum x y = 4713.62$$
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient.
  3. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is any correlation between download speed and upload speed.
  4. Both of the variables, download speed and upload speed, are random. Explain why, if download speed had been a non-random variable, the student could not have carried out the hypothesis test to investigate whether there was any correlation between download speed and upload speed.
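The product moment correlation coefficient in Question 6 can be computed from the summary statistics via \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\). The sketch below is illustrative only. The critical value 0.4438 (two-tailed 5%, \(n = 20\)) is quoted from standard tables as an assumption and should be checked against the formula booklet.

```python
from math import sqrt

n = 20
sum_x, sum_y = 342.10, 273.65
sum_x2, sum_y2, sum_xy = 5989.53, 3919.53, 4713.62

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / sqrt(s_xx * s_yy)

# Assumed critical value from tables: two-tailed 5% test, n = 20
CRITICAL_VALUE = 0.4438
reject_h0 = abs(r) > CRITICAL_VALUE

print(round(r, 4), reject_h0)
```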
Question 7
7 An analyst routinely examines bottles of hair shampoo in order to check that the average percentage of a particular chemical which the shampoo contains does not exceed the value of \(1.0 \%\) specified by the manufacturer. The percentages of the chemical in a random sample of 12 bottles of the shampoo are as follows.
\(\begin{array} { l l l l l l l l l l l l } 1.087 & 1.171 & 1.047 & 0.846 & 0.909 & 1.052 & 1.042 & 0.893 & 1.021 & 1.085 & 1.096 & 0.931 \end{array}\)
The analyst uses software to draw a Normal probability plot for these data, and to carry out a Normality test as shown below.
\includegraphics[max width=\textwidth, alt={}, center]{c692fb20-436f-4bc1-89bd-10fdba41ceba-08_524_1539_694_264}
  1. The analyst is going to carry out a hypothesis test to check whether the average percentage exceeds 1.0\%. Explain which test the analyst should use, referring to each of the following.
    • The Normal probability plot
    • The \(p\)-value of the Kolmogorov-Smirnov test
  2. In this question you must show detailed reasoning. Carry out the test at the 5\% significance level.
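If the Normality checks in Question 7 are taken to support a one-sample \(t\) test (an assumption here, since the plot and \(p\)-value are not reproduced in this text), the test statistic can be sketched as below. The one-tailed 5% critical value for 11 degrees of freedom, 1.796, is quoted from standard tables.

```python
from math import sqrt
from statistics import mean, stdev

percentages = [1.087, 1.171, 1.047, 0.846, 0.909, 1.052,
               1.042, 0.893, 1.021, 1.085, 1.096, 0.931]

n = len(percentages)
x_bar = mean(percentages)
s = stdev(percentages)           # sample sd, divisor n - 1

# H0: mu = 1.0, H1: mu > 1.0
t_stat = (x_bar - 1.0) / (s / sqrt(n))

T_CRITICAL = 1.796               # assumed from tables: t, 11 df, 5% one-tailed
reject_h0 = t_stat > T_CRITICAL

print(round(x_bar, 4), round(s, 4), round(t_stat, 3), reject_h0)
```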
Question 8
8 The random variable \(X\) has a continuous uniform distribution over [0,10].
  1. Find the probability that, if two independent values of \(X\) are taken, one is less than 3 and the other is greater than 3.
The random variable \(T\) denotes the sum of 5 independent values of \(X\).
  2. State the value of \(\mathrm { P } ( T \leqslant 25 )\).
The spreadsheet below shows the heading row and the first 20 data rows from a total of 100 data rows of a simulation of the distribution of \(X\). Each of the 100 rows shows a simulation of 5 independent values of \(X\), together with \(T\), the sum of the 5 values. All of the values have been rounded to 2 decimal places. In column I the spreadsheet shows the number of values of \(T\) that are less than or equal to the corresponding values in column H. For example, there are 75 simulated values of \(T\) that are less than or equal to 30.
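\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|}
\hline
 & A & B & C & D & E & F & G & H & I \\
\hline
1 & \(\mathrm { X } _ { 1 }\) & \(\mathrm { X } _ { 2 }\) & \(\mathrm { X } _ { 3 }\) & \(\mathrm { X } _ { 4 }\) & \(\mathrm { X } _ { 5 }\) & T & & t & Number \(\leqslant \mathrm { t }\) \\
\hline
2 & 3.73 & 6.65 & 4.93 & 0.41 & 9.33 & 25.06 & & 0 & 0 \\
\hline
3 & 4.95 & 6.58 & 4.48 & 2.51 & 7.26 & 25.79 & & 5 & 0 \\
\hline
4 & 8.10 & 4.87 & 4.26 & 3.83 & 0.79 & 21.85 & & 10 & 1 \\
\hline
5 & 6.70 & 4.10 & 5.10 & 1.82 & 6.76 & 24.48 & & 15 & 4 \\
\hline
6 & 3.73 & 8.38 & 8.49 & 9.87 & 1.31 & 31.79 & & 20 & 23 \\
\hline
7 & 3.22 & 4.36 & 0.12 & 1.34 & 9.49 & 18.53 & & 25 & 48 \\
\hline
8 & 9.17 & 7.13 & 5.47 & 4.35 & 2.44 & 28.55 & & 30 & 75 \\
\hline
9 & 3.42 & 1.93 & 6.04 & 2.99 & 8.85 & 23.24 & & 35 & 93 \\
\hline
10 & 0.98 & 0.68 & 9.82 & 9.83 & 7.28 & 28.58 & & 40 & 99 \\
\hline
11 & 5.86 & 1.67 & 7.77 & 4.08 & 7.14 & 26.52 & & 45 & 100 \\
\hline
12 & 9.20 & 0.31 & 5.82 & 5.31 & 6.45 & 27.10 & & 50 & 100 \\
\hline
13 & 7.04 & 4.30 & 2.06 & 0.06 & 4.16 & 17.62 & & & \\
\hline
14 & 0.31 & 5.02 & 1.48 & 5.37 & 1.77 & 13.94 & & & \\
\hline
15 & 3.77 & 6.04 & 1.21 & 7.67 & 5.01 & 23.69 & & & \\
\hline
16 & 1.21 & 5.54 & 1.90 & 1.43 & 6.91 & 17.00 & & & \\
\hline
17 & 9.27 & 1.98 & 5.80 & 9.37 & 9.34 & 35.76 & & & \\
\hline
18 & 4.30 & 5.66 & 2.80 & 1.56 & 1.19 & 15.51 & & & \\
\hline
19 & 7.15 & 3.19 & 6.89 & 5.41 & 2.18 & 24.82 & & & \\
\hline
20 & 6.18 & 6.32 & 3.01 & 6.49 & 9.12 & 31.13 & & & \\
\hline
21 & 5.03 & 5.99 & 5.19 & 6.97 & 3.55 & 26.73 & & & \\
\hline
\end{tabular}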
  3. Use the spreadsheet output to estimate each of the following.
    • \(\mathrm { P } ( T \leqslant 25 )\)
    • \(\mathrm { P } ( T > 35 )\)
  4. In this question you must show detailed reasoning. The random variable \(Y\) is the mean of 100 independent values of \(T\). Determine an estimate of \(\mathrm { P } ( Y > 26 )\).
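The quantities in Question 8 can be sketched numerically as below. The spreadsheet estimates use the counts 48 and 93 from column I. The final part assumes the Central Limit Theorem applies, with the theoretical mean and variance of \(T\) taken from the uniform distribution on \([0, 10]\). Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Part 1: one value below 3 and the other above 3, in either order
p_one_each_side = 2 * 0.3 * 0.7

# Part 3: estimates read from column I of the spreadsheet (100 simulated rows)
est_p_t_le_25 = 48 / 100
est_p_t_gt_35 = 1 - 93 / 100

# Part 4: T is a sum of 5 independent U[0, 10] values
mean_t = 5 * 5                   # E(X) = 5
var_t = 5 * 10 ** 2 / 12         # Var(X) = 100/12
# Y is the mean of 100 values of T; approximately Normal by the CLT
p_y_gt_26 = 1 - NormalDist(mean_t, sqrt(var_t / 100)).cdf(26)

print(p_one_each_side, est_p_t_le_25, est_p_t_gt_35, round(p_y_gt_26, 4))
```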
Question 9
9 A cyclist who lives on an island suspects that car drivers with locally registered number plates allow more space when passing her than those with non-locally registered number plates. She decides to carry out a hypothesis test and so over a period of time selects a random sample of 250 cars which pass her. For each car she estimates whether the car driver allows at least the recommended 1.5 metres when passing her. The table shows the data which she collected.
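\begin{tabular}{|l|l|c|c|}
\cline { 3 - 4 }
\multicolumn{2}{c|}{} & \multicolumn{2}{c|}{Where registered} \\
\cline { 3 - 4 }
\multicolumn{2}{c|}{} & Local & Non-local \\
\hline
\multirow{2}{*}{Passing distance} & Under 1.5 m & 12 & 11 \\
\cline { 2 - 4 }
 & At least 1.5 m & 157 & 70 \\
\hline
\end{tabular}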
  1. In this question you must show detailed reasoning. Carry out the test at the \(5 \%\) significance level to examine whether there is any association between where the car is registered and passing distance.
  2. A friend of the cyclist suggests that there may be a problem with the data, since the cyclist may have introduced some bias in estimating whether cars were allowing the recommended distance. Explain how any bias might have arisen.
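The chi-squared statistic for the \(2 \times 2\) table in Question 9 can be sketched as follows. This version assumes no Yates' continuity correction is applied and compares against the 5% critical value 3.841 for 1 degree of freedom, quoted from standard tables. The \(p\)-value uses the identity \(\mathrm{P}(\chi^2_1 > c) = \operatorname{erfc}(\sqrt{c/2})\).

```python
from math import erfc, sqrt

# Rows: under 1.5 m, at least 1.5 m; columns: local, non-local
observed = [[12, 11], [157, 70]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

CRITICAL_VALUE = 3.841            # assumed from tables: chi-squared, 1 df, 5%
reject_h0 = chi_sq > CRITICAL_VALUE
p_value = erfc(sqrt(chi_sq / 2))  # valid for 1 degree of freedom only

print(round(chi_sq, 3), round(p_value, 4), reject_h0)
```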
Question 10
10 The continuous random variable \(X\) has probability density function given by
\(f ( x ) = \begin{cases} \frac { 4 } { 15 } \left( \frac { a } { x ^ { 2 } } + 3 x ^ { 2 } - \frac { 7 } { 2 } \right) & 1 \leqslant x \leqslant 2 , \\ 0 & \text { otherwise, } \end{cases}\)
where \(a\) is a positive constant.
  1. Find the cumulative distribution function of \(X\) in terms of \(a\).
  2. Hence or otherwise determine the value of \(a\).
  3. Show that the median value \(m\) of \(X\) satisfies the equation $$8 m ^ { 4 } - 28 m ^ { 2 } + 9 m - 4 = 0 .$$
  4. Verify that the median value of \(X\) is 1.74, correct to \(\mathbf { 2 }\) decimal places.
  5. Find \(\mathrm { E } ( X )\).
  6. Determine the mode of \(X\).
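A numerical check of Question 10 is sketched below. It assumes the cumulative distribution function obtained by integrating the density from 1 to \(x\), and the value \(a = \frac{1}{2}\) that makes \(F(2) = 1\). The median check looks for a sign change of the quartic between 1.735 and 1.745, and the mode is located on a grid. The function names are illustrative.

```python
from math import log

def pdf(x, a):
    """Density on 1 <= x <= 2."""
    return 4 / 15 * (a / x ** 2 + 3 * x ** 2 - 3.5)

def cdf(x, a):
    """F(x) on 1 <= x <= 2, from integrating the density from 1 to x."""
    return 4 / 15 * (-a / x + x ** 3 - 3.5 * x + a + 2.5)

a = 0.5  # assumed solution of cdf(2, a) = 1, i.e. (4/15)(a/2 + 7/2) = 1

def quartic(m):
    return 8 * m ** 4 - 28 * m ** 2 + 9 * m - 4

# Median is 1.74 to 2 d.p. if the quartic changes sign on [1.735, 1.745]
median_verified = quartic(1.735) < 0 < quartic(1.745)

# E(X) = (4/15) * [a ln x + 3x^4/4 - 7x^2/4] evaluated from 1 to 2
expected_x = 4 / 15 * (a * log(2) + 6)

# Mode: largest density on a grid over [1, 2]
grid = [1 + i / 1000 for i in range(1001)]
mode = max(grid, key=lambda x: pdf(x, a))

print(cdf(2, a), median_verified, round(expected_x, 4), mode)
```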
Question 11
11 The random variable \(X\) takes the value 1 with probability \(p\) and the value 0 with probability \(1 - p\).
  1. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
  2. The random variable \(Y \sim \mathrm {~B} ( 50,0.2 )\) has mean \(\mu\) and variance \(\sigma ^ { 2 }\). Use the results of part 1 to prove that
    • \(\mu = 10\)
    • \(\sigma ^ { 2 } = 8\).
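The argument in Question 11 treats \(Y\) as a sum of 50 independent copies of \(X\) with \(p = 0.2\), so that means and variances add. The sketch below illustrates this numerically and cross-checks against the binomial probability function. It is not a substitute for the algebraic proof the question asks for.

```python
from math import comb

def bernoulli_moments(p):
    """Mean and variance of X, where P(X = 1) = p and P(X = 0) = 1 - p."""
    mean = 1 * p + 0 * (1 - p)
    variance = (1 ** 2 * p + 0 ** 2 * (1 - p)) - mean ** 2
    return mean, variance

n, p = 50, 0.2
mean_x, var_x = bernoulli_moments(p)

# Y is a sum of n independent copies of X, so mean and variance scale by n
mu = n * mean_x
sigma_sq = n * var_x

# Cross-check directly from the B(50, 0.2) probabilities
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
mu_check = sum(k * pk for k, pk in enumerate(pmf))
var_check = sum(k * k * pk for k, pk in enumerate(pmf)) - mu_check ** 2

print(mu, sigma_sq, mu_check, var_check)
```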