| Exam Board | OCR MEI |
|---|---|
| Module | S2 (Statistics 2) |
| Marks | 18 |
| Topic | Linear regression |
| Type | Interpret features of scatter diagram |
| Difficulty | Standard +0.3. This is a straightforward multi-part question testing standard linear regression techniques: calculating a regression line from summary statistics (routine formula application), computing a residual (simple substitution), discussing outlier effects (conceptual understanding), and performing a correlation hypothesis test (standard procedure). All parts follow textbook methods with no novel problem-solving required, making it slightly easier than average. |
| Spec | 5.09c Calculate regression line; 5.09d Linear coding: effect on regression; 5.09e Use regression: for estimation in context |
# Question 3:
## Part (i)
| Answer | Marks | Guidance |
|---|---|---|
| $P(75.5 \leq X < 76.5) = P\left(\frac{75.5-76}{12} \leq Z < \frac{76.5-76}{12}\right)$ | M1 | Standardising both |
| $= P(-0.0417 \leq Z < 0.0417)$ | A1 | Both $z$-values correct |
| $= 2\Phi(0.0417) - 1 = 2(0.5166) - 1 = 0.0332$ | A1 A1 | Final answer |
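The calculation above can be checked numerically. The following is a minimal sketch, assuming the model $X \sim N(76, 12^2)$ that the standardisation implies and a mark reported to the nearest integer; the names `marks` and `prob_reported_mark` are illustrative only.

```python
from statistics import NormalDist

# Assumed model, read off the standardisation above: X ~ N(76, 12^2).
marks = NormalDist(mu=76, sigma=12)

def prob_reported_mark(k: int) -> float:
    """P(k - 0.5 <= X < k + 0.5): probability that the rounded mark equals k."""
    return marks.cdf(k + 0.5) - marks.cdf(k - 0.5)

p_76 = prob_reported_mark(76)
print(round(p_76, 4))
```

Because the interval is symmetric about the mean, this difference of two cumulative probabilities equals $2\Phi(z) - 1$ with $z = 0.5/12$.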
## Part (ii)
| Answer | Marks | Guidance |
|---|---|---|
| $P(\text{reported mark} \geq 80) = P(X \geq 79.5) = P\left(Z \geq \frac{79.5-76}{12}\right)$ | M1 | Continuity correction |
| $= P(Z \geq 0.2917)$ | A1 | $z$-value |
| $= 1 - 0.6147 = 0.3853$ | A1 | Answer |
## Part (iii)
| Answer | Marks | Guidance |
|---|---|---|
| $P(\text{exactly one}) = \binom{3}{1}(0.3853)^1(0.6147)^2$ | M1 | Correct structure |
| $= 3 \times 0.3853 \times 0.3778 = 0.4367$ | A1 | Answer |
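Parts (ii) and (iii) chain a normal tail probability into a binomial probability. A rough sketch of that chain follows, assuming $X \sim N(76, 12^2)$ as in the working and three independent trials, as the $\binom{3}{1}$ term suggests; the variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

# Assumed model, as in the working above: X ~ N(76, 12^2).
marks = NormalDist(mu=76, sigma=12)

# Part (ii): a reported mark of at least 80 corresponds to X >= 79.5.
p_at_least_80 = 1 - marks.cdf(79.5)

# Part (iii): exactly one success in three independent trials.
n_trials = 3
p_exactly_one = (
    comb(n_trials, 1) * p_at_least_80 * (1 - p_at_least_80) ** (n_trials - 1)
)

print(round(p_at_least_80, 4), round(p_exactly_one, 4))
```

Carrying the unrounded probability through can shift the last decimal place slightly compared with the table-based working.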
## Part (iv)
| Answer | Marks | Guidance |
|---|---|---|
| Need $P(X \geq k - 0.5) \leq 0.10$ where $k$ is integer boundary | M1 | Setting up inequality |
| $\frac{(k-0.5)-76}{12} \geq 1.282$ | M1 | Using $z = 1.282$ |
| $k - 0.5 \geq 91.384$ | | |
| $k \geq 91.884$ | A1 | Correct value |
| Lowest mark $= 92$ | A1 A1 | Final answer |
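The inequality above amounts to finding the 90th percentile and then rounding up to a whole mark. The sketch below assumes the same $N(76, 12^2)$ model; `boundary` and `k` are names chosen here for illustration.

```python
from math import ceil
from statistics import NormalDist

# Assumed model, as in the working above: X ~ N(76, 12^2).
marks = NormalDist(mu=76, sigma=12)

# P(X >= x) <= 0.10 holds exactly when x is at or above the 90th percentile.
boundary = marks.inv_cdf(0.90)

# The reported mark k covers X >= k - 0.5, so k - 0.5 must reach the boundary.
k = ceil(boundary + 0.5)

print(round(boundary, 3), k)
```

The percentile differs from 91.384 in the third decimal place because `inv_cdf` uses an unrounded $z$-value rather than 1.282; the integer answer is unaffected.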
## Part (v)
| Answer | Marks | Guidance |
|---|---|---|
| $P(\text{reported} \leq 50) = P(X < 50.5) = 0.20$ | M1 | Continuity correction |
| $\frac{50.5 - \mu}{12} = -0.8416$ | M1 A1 | $z$-value correct |
| $\mu = 50.5 + 12 \times 0.8416 = 60.6$ | A1 | Answer |
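Solving for the mean can be sketched in the same way. This assumes, as the working does, that the standard deviation stays at 12 and that 20% of reported marks are 50 or below; the variable names are illustrative.

```python
from statistics import NormalDist

sigma = 12            # standard deviation used in the working above
lower_tail = 0.20     # P(X < 50.5), taken from the working above

# z-value with 20% of the standard normal distribution below it (negative).
z = NormalDist().inv_cdf(lower_tail)

# Rearranging (50.5 - mu) / sigma = z for mu.
mu = 50.5 - sigma * z

print(round(z, 4), round(mu, 1))
```

Substituting the result back, `NormalDist(mu, sigma).cdf(50.5)` should return a value very close to 0.20, which is a quick check on the sign of $z$.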
---
3 In a triathlon, competitors have to swim 600 metres, cycle 40 kilometres and run 10 kilometres. To improve her strength, a triathlete undertakes a training programme in which she carries weights in a rucksack whilst running. She runs a specific course and notes the total time taken for each run. Her coach is investigating the relationship between time taken and weight carried. The times taken with eight different weights are illustrated on the scatter diagram below, together with the summary statistics for these data. The variables $x$ and $y$ represent weight carried in kilograms and time taken in minutes respectively.\\
\includegraphics[max width=\textwidth, alt={}, center]{d138173d-c70c-46db-b9b9-d5f19334c5f1-04_627_1536_630_281}
Summary statistics: $n = 8$, $\Sigma x = 36$, $\Sigma y = 214.8$, $\Sigma x^2 = 204$, $\Sigma y^2 = 5775.28$, $\Sigma xy = 983.6$.\\
(i) Calculate the equation of the regression line of $y$ on $x$.
On one of the eight runs, the triathlete was carrying 4 kilograms and took 27.5 minutes. On this run she was delayed when she tripped and fell over.\\
(ii) Calculate the value of the residual for this weight.\\
(iii) The coach decides to recalculate the equation of the regression line without the data for this run. Would it be preferable to use this recalculated equation or the equation found in part (i) to estimate the delay when the triathlete tripped and fell over? Explain your answer.
The triathlete's coach claims that there is positive correlation between cycling and swimming times in triathlons. The product moment correlation coefficient of the times of twenty randomly selected competitors in these two sections is 0.209.\\
(iv) Carry out a hypothesis test at the $5\%$ level to examine the coach's claim, explaining your conclusions clearly.\\
(v) What distributional assumption is necessary for this test to be valid? How can you use a scatter diagram to decide whether this assumption is likely to be true?
\hfill \mbox{\textit{OCR MEI S2 Q3 [18]}}
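
For parts (i) and (ii), one possible way to organise the arithmetic is sketched below. It is not taken from the mark scheme; it applies the standard least-squares formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$ to the summary statistics quoted in the question, and the helper `predict` is a name introduced here for illustration.

```python
# Summary statistics quoted in the question.
n = 8
sum_x, sum_y = 36, 214.8
sum_x2, sum_xy = 204, 983.6

# Corrected sum of squares and sum of products.
s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares gradient and intercept for the line y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

def predict(x: float) -> float:
    """Time in minutes predicted by the fitted line for a weight of x kg."""
    return a + b * x

# Residual = observed - predicted, for the run with 4 kg and 27.5 minutes.
residual = 27.5 - predict(4)

print(f"y = {a:.2f} + {b:.3f}x, residual = {residual:.2f}")
```

A positive residual means the observed time lies above the fitted line, which is consistent with the delay described in the question.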