Edexcel S3 (Statistics 3) 2022 January

Question 1
The weights, \(x \mathrm {~kg}\), of each of 10 watermelons selected at random from Priya's shop were recorded. The results are summarised as follows.
$$\sum x = 114.2 \quad \sum x ^ { 2 } = 1310.464$$
  (a) Calculate unbiased estimates of the mean and the variance of the weights of the watermelons in Priya’s shop.
Priya researches the weight of watermelons, for the variety she has in her shop, and discovers that the weights of these watermelons are normally distributed with a standard deviation of 0.8 kg.
  (b) Calculate a \(95 \%\) confidence interval for the mean weight of watermelons in Priya’s shop. Give the limits of your confidence interval to 2 decimal places.
Priya claims that the confidence interval in part (b) suggests that nearly all of the watermelons in her shop weigh more than 10.5 kg.
  (c) Use your answer to part (b) to estimate the smallest proportion of watermelons in her shop that weigh less than 10.5 kg.
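The arithmetic behind Question 1 can be sketched in a few lines of Python. This is only an illustrative check, not a mark scheme: the variable names are invented here, and the last step assumes that the "smallest proportion" is obtained by placing the mean at the upper confidence limit.

```python
from math import sqrt
from statistics import NormalDist

n = 10
sum_x = 114.2
sum_x2 = 1310.464

# Unbiased estimates of the mean and variance from the summary statistics
mean = sum_x / n
var_unbiased = (sum_x2 - sum_x**2 / n) / (n - 1)

# 95% confidence interval with known population standard deviation 0.8 kg
sigma = 0.8
z = NormalDist().inv_cdf(0.975)
half_width = z * sigma / sqrt(n)
ci_lower = mean - half_width
ci_upper = mean + half_width

# Assumed reading of the last part: the proportion below 10.5 kg is
# smallest when the mean sits at the upper limit of the interval
p_below = NormalDist(round(ci_upper, 2), sigma).cdf(10.5)

print(f"mean = {mean:.2f}, variance = {var_unbiased:.2f}")
print(f"95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
print(f"estimated proportion below 10.5 kg = {p_below:.4f}")
```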
Question 2
Secondary schools in a region conduct ability testing at the start of Year 7 and the start of Year 8. Each year a regional education officer randomly selects 240 Year 7 students and 240 Year 8 students from across the region. The results for last year are summarised in the table below.

| | Mean score | Variance of scores |
|---|---|---|
| Year 7 | 101 | 38 |
| Year 8 | 103 | 42 |

The regional education officer claims that there is no difference between the mean scores of these two year groups.
  (a) Test the regional education officer's claim at the \(1 \%\) significance level. You should state your hypotheses, test statistic and critical value clearly.
  (b) Explain the significance of the Central Limit Theorem in part (a).
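A rough sketch of the two-sample test statistic for Question 2 follows. It assumes the usual large-sample z test for a difference of means, with the tabulated variances used in the standard error; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n7, mean7, var7 = 240, 101, 38
n8, mean8, var8 = 240, 103, 42

# H0: mu7 = mu8, H1: mu7 != mu8 (two-tailed)
standard_error = sqrt(var7 / n7 + var8 / n8)
z_stat = (mean7 - mean8) / standard_error

# Two-tailed critical value at the 1% level
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)

reject_h0 = abs(z_stat) > z_crit
print(f"z = {z_stat:.3f}, critical value = +/-{z_crit:.4f}, reject H0: {reject_h0}")
```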
Question 3
A medical research team carried out an investigation into the metabolic rate, MR, of men aged between 30 years and 60 years.
A random sample of 10 men was taken from this age group.
The table below shows for each man his MR and his body mass index, BMI. The table also shows the rank for the level of daily physical activity, DPA, which was assessed by the medical research team. Rank 1 was assigned to the man with the highest level of daily physical activity.

| Man | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) | \(J\) |
|---|---|---|---|---|---|---|---|---|---|---|
| MR ( \(\boldsymbol { x }\) ) | 6.24 | 5.94 | 6.83 | 6.53 | 6.31 | 7.44 | 7.32 | 8.70 | 7.88 | 7.78 |
| BMI ( \(\boldsymbol { y }\) ) | 19.6 | 19.2 | 23.6 | 21.4 | 20.2 | 20.8 | 22.9 | 25.5 | 23.3 | 25.1 |
| DPA rank | 10 | 7 | 9 | 8 | 6 | 3 | 1 | 4 | 5 | 2 |

$$\text { [You may use } \quad \mathrm { S } _ { x y } = 15.1608 \quad \mathrm {~S} _ { x x } = 6.90181 \quad \mathrm {~S} _ { y y } = 45.304 \text { ] }$$
  (a) Calculate the value of the product moment correlation coefficient between MR and BMI for these 10 men.
  (b) Use your value of the product moment correlation coefficient to test, at the 5\% significance level, whether or not there is evidence of a positive correlation between MR and BMI. State your hypotheses clearly.
  (c) State an assumption that must be made to carry out the test in part (b).
  (d) Calculate the value of Spearman's rank correlation coefficient between MR and DPA for these 10 men.
  (e) Use a two-tailed test and a \(5 \%\) level of significance to assess whether or not there is evidence of a correlation between MR and DPA.
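The two correlation coefficients in Question 3 can be checked with the short sketch below. It assumes MR is ranked with rank 1 for the highest value, to match the direction of the DPA ranking; ranking the other way would flip the sign of Spearman's coefficient. The critical values still have to be read from statistical tables and are not computed here.

```python
from math import sqrt

# Product moment correlation coefficient from the given summary values
s_xy, s_xx, s_yy = 15.1608, 6.90181, 45.304
r = s_xy / sqrt(s_xx * s_yy)

mr = {"A": 6.24, "B": 5.94, "C": 6.83, "D": 6.53, "E": 6.31,
      "F": 7.44, "G": 7.32, "H": 8.70, "I": 7.88, "J": 7.78}
dpa_rank = {"A": 10, "B": 7, "C": 9, "D": 8, "E": 6,
            "F": 3, "G": 1, "H": 4, "I": 5, "J": 2}

# Assumption: rank 1 goes to the highest MR (no tied values in the data)
ordered = sorted(mr, key=mr.get, reverse=True)
mr_rank = {man: position + 1 for position, man in enumerate(ordered)}

# Spearman's coefficient via the sum of squared rank differences
n = len(mr)
sum_d2 = sum((mr_rank[man] - dpa_rank[man]) ** 2 for man in mr)
r_s = 1 - 6 * sum_d2 / (n * (n**2 - 1))

print(f"r = {r:.4f}")
print(f"sum of d^2 = {sum_d2}, r_s = {r_s:.4f}")
```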
Question 4
A survey was carried out with students that had studied Maths, Physics and Chemistry at a college between 2016 and 2020. The students were divided into two groups \(A\) and \(B\).
  (a) Explain how a sample could be obtained from this population using quota sampling.
The students were asked which of the three subjects they enjoyed the most. The results of the survey are shown in the table.

Subject enjoyed the most

| | Maths | Physics | Chemistry | Total |
|---|---|---|---|---|
| Group A | 16 | 10 | 13 | 39 |
| Group B | 38 | 13 | 10 | 61 |
| Total | 54 | 23 | 23 | 100 |

  (b) Test, at the \(5 \%\) level of significance, whether the subject enjoyed the most is independent of group. You should state your hypotheses, expected frequencies, test statistic and the critical value used for this test.
The Headteacher discovered later that the results were actually based on a random sample of 200 students but had been recorded in the table as percentages.
  (c) For the test in part (b), state with reasons the effect, if any, that this information would have on
    (i) the null and alternative hypotheses,
    (ii) the critical value,
    (iii) the value of the test statistic,
    (iv) the conclusion of the test.
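The contingency-table calculation in Question 4 can be sketched as follows. The helper `chi_squared` is a name made up for this illustration. The critical value uses the fact that, for 2 degrees of freedom, the chi-squared upper tail is \(e^{-x/2}\), so no tables are needed. The second call assumes the true frequencies are the tabulated percentages of 200, that is, every cell doubled.

```python
from math import log

def chi_squared(table):
    """Return (expected frequencies, test statistic) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return expected, stat

observed = [[16, 10, 13],   # Group A: Maths, Physics, Chemistry
            [38, 13, 10]]   # Group B

expected, stat = chi_squared(observed)

# Degrees of freedom = (2 - 1)(3 - 1) = 2, where P(X > x) = exp(-x / 2)
critical = -2 * log(0.05)

# Assumed true counts if the table shows percentages of 200 students
doubled = [[2 * cell for cell in row] for row in observed]
_, stat_200 = chi_squared(doubled)

print("expected:", [[round(e, 2) for e in row] for row in expected])
print(f"statistic = {stat:.3f}, critical value = {critical:.3f}")
print(f"statistic with n = 200: {stat_200:.3f}")
```

Doubling every cell leaves the hypotheses and degrees of freedom unchanged but doubles the statistic, which is why the conclusion can change.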
Question 5
Charlie is training for three events: a 1500 m swim, a 40 km bike ride and a 10 km run.
From past experience his times, in minutes, for each of the three events independently have the following distributions.
$$\begin{aligned} & S \sim \mathrm {~N} \left( 41,5.2 ^ { 2 } \right) \text { represents the time for the swim } \\
& B \sim \mathrm {~N} \left( 81,4.2 ^ { 2 } \right) \text { represents the time for the bike ride } \\
& R \sim \mathrm {~N} \left( 57,6.6 ^ { 2 } \right) \text { represents the time for the run } \end{aligned}$$
  (a) Find the probability that Charlie's total time for a randomly selected swim, bike ride and run exceeds 3 hours.
  (b) Find the probability that the time for a randomly selected swim will be at least 20 minutes quicker than the time for a randomly selected run.
Given that \(\mathrm { P } ( S + B + R > t ) = 0.95\)
  (c) find the value of \(t\).
A triathlon consists of a 1500 m swim, immediately followed by a 40 km bike ride, immediately followed by a 10 km run. Charlie uses the answer to part (a) to find the probability that, in 6 successive independent triathlons, his time will exceed 3 hours on at least one occasion.
  (d) Find the answer Charlie should obtain.
Jane says that Charlie should not have used the answer to part (a) for the calculation in part (d).
  (e) Explain whether or not Jane is correct.
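Parts (a) to (d) of Question 5 rest on the rule that means add, and variances add, for sums and differences of independent normal variables. The sketch below applies that rule with illustrative variable names. It takes the independence stated in the question at face value, which is exactly the assumption part (e) asks about.

```python
from math import sqrt
from statistics import NormalDist

mu_s, sd_s = 41, 5.2   # swim
mu_b, sd_b = 81, 4.2   # bike ride
mu_r, sd_r = 57, 6.6   # run

# Total time T = S + B + R: means add, variances add
total = NormalDist(mu_s + mu_b + mu_r, sqrt(sd_s**2 + sd_b**2 + sd_r**2))

# (a) P(T > 180 minutes)
p_a = 1 - total.cdf(180)

# (b) swim at least 20 minutes quicker than run: P(R - S >= 20)
diff = NormalDist(mu_r - mu_s, sqrt(sd_r**2 + sd_s**2))
p_b = 1 - diff.cdf(20)

# (c) P(T > t) = 0.95, so t is the 5th percentile of T
t = total.inv_cdf(0.05)

# (d) at least one of 6 independent triathlons exceeds 3 hours
p_d = 1 - (1 - p_a) ** 6

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) t = {t:.2f}  (d) {p_d:.4f}")
```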
Question 6
A farmer sells strawberries in baskets. The contents of each of 100 randomly selected baskets were weighed and the results, given to the nearest gram, are shown below.

| Weight of strawberries (grams) | Number of baskets |
|---|---|
| 302-303 | 5 |
| 304-305 | 13 |
| 306-307 | 10 |
| 308-309 | 18 |
| 310-311 | 25 |
| 312-313 | 20 |
| 314-315 | 5 |
| 316-317 | 4 |

The farmer proposes that the weight of strawberries per basket, in grams, should be modelled by a normal distribution with a mean of 310 g and standard deviation 4 g. Using his model, the farmer obtains the following expected frequencies.

| Weight of strawberries (\(s\), grams) | Expected frequency |
|---|---|
| \(s \leqslant 303.5\) | \(a\) |
| \(303.5 < s \leqslant 305.5\) | 7.8 |
| \(305.5 < s \leqslant 307.5\) | 13.6 |
| \(307.5 < s \leqslant 309.5\) | 18.4 |
| \(309.5 < s \leqslant 311.5\) | 19.6 |
| \(311.5 < s \leqslant 313.5\) | 16.3 |
| \(313.5 < s \leqslant 315.5\) | 10.6 |
| \(s > 315.5\) | \(b\) |

  (a) Find the value of \(a\) and the value of \(b\). Give your answers correct to one decimal place.
Before \(s \leqslant 303.5\) and \(s > 315.5\) are included, for the remaining cells,
$$\sum \frac { ( O - E ) ^ { 2 } } { E } = 9.71$$
  (b) Using a 5\% significance level, test whether the data are consistent with the model. You should state your hypotheses, the test statistic and the critical value used.
An alternative model uses estimates for the population mean and standard deviation from the data given. Using these estimated values no expected frequency is below 5.
Another test is to be carried out, using a \(5 \%\) significance level, to assess whether the data are consistent with this alternative model.
  (c) State the effect, if any, on the critical value for this test. Give a reason for your answer.
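For Question 6, the missing expected frequencies and the completed test statistic can be sketched as below. It assumes the one-decimal-place values of \(a\) and \(b\) are the ones carried into the statistic, and that no cells need combining because every expected frequency is at least 5. The critical value for 7 degrees of freedom is a standard table value quoted in a comment, not something the code computes.

```python
from statistics import NormalDist

model = NormalDist(310, 4)
n_baskets = 100

# Expected frequencies for the two tail cells, to one decimal place
a = round(n_baskets * model.cdf(303.5), 1)
b = round(n_baskets * (1 - model.cdf(315.5)), 1)

# Observed counts in the tail cells: 302-303 and 316-317
observed_low, observed_high = 5, 4

# Add the tail contributions to the given sum for the six middle cells
middle_sum = 9.71
stat = (middle_sum
        + (observed_low - a) ** 2 / a
        + (observed_high - b) ** 2 / b)

# 8 cells and no estimated parameters give 8 - 1 = 7 degrees of freedom;
# the 5% critical value from standard chi-squared tables is 14.067
critical = 14.067

print(f"a = {a}, b = {b}")
print(f"statistic = {stat:.2f}, critical value = {critical}")
```

Under the alternative model two parameters are estimated from the data, so the degrees of freedom fall by 2 and the critical value is read from a different row of the table.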