| Exam Board | Edexcel |
|---|---|
| Module | S1 (Statistics 1) |
| Year | 2024 |
| Session | January |
| Marks | 12 |
| Topic | Linear regression |
| Type | Calculate PMCC from raw data |
| Difficulty | Moderate (-0.8). This is a standard S1 regression question requiring routine calculations: identifying an outlier using the IQR rule (a straightforward quartile calculation), deriving a regression line from given summary statistics using standard formulas, and interpreting the results. All steps follow textbook procedures with no problem-solving insight required. The calculations are mechanical and the concepts are core S1 material, making this easier than average for A-level. |
| Spec | 2.02f Measures of average and spread; 2.02g Calculate mean and standard deviation; 2.02h Recognize outliers; 5.09a Dependent/independent variables; 5.09c Calculate regression line |

| Student | A | B | C | D | E | F | G | H | I | J | K |
|---|---|---|---|---|---|---|---|---|---|---|---|
| French mark (\(f\)) | 24 | 30 | 32 | 32 | 36 | 36 | 40 | 44 | 50 | 60 | 68 |
| Spanish mark (\(s\)) | 16 | 90 | 24 | 28 | 32 | 36 | 38 | 44 | 48 | 48 | 68 |
## Question 4:
### Part (a):
| Answer | Marks | Guidance |
|---|---|---|
| $LQ = 28$ or $UQ = 48$ | B1 | For either LQ or UQ correct (may be seen in calculation for M1) |
| $48 + 1.5(48-28)[=78]$ | M1 | Correct use of $Q_3 + 1.5\times(Q_3-Q_1)$ ft their LQ and UQ provided UQ > LQ |
| $90 > 78$ so 90 is an outlier | A1* | For both LQ and UQ correct and identifying $90>78$ or 90 is an outlier; answer is given so no incorrect working can be seen |
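
The quartile and fence calculation above can be checked numerically. This is a minimal sketch, assuming the usual S1 convention for discrete data (find $n/4$ or $3n/4$; if it is not a whole number, round up to the next position; if it is, take the midpoint of that value and the next). The helper name `s1_quartile` is hypothetical and not part of the mark scheme.

```python
import math

# Spanish marks for students A to K, as given in the question.
spanish = [16, 90, 24, 28, 32, 36, 38, 44, 48, 48, 68]

def s1_quartile(data, fraction):
    """Quartile under the assumed S1 discrete-data convention."""
    ordered = sorted(data)
    position = fraction * len(ordered)
    if position == int(position):
        # Whole number: midpoint of this value and the next one.
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise round up to the next position (1-indexed).
    return ordered[math.ceil(position) - 1]

q1 = s1_quartile(spanish, 0.25)
q3 = s1_quartile(spanish, 0.75)
upper_fence = q3 + 1.5 * (q3 - q1)
lower_fence = q1 - 1.5 * (q3 - q1)

# Values outside the fences are outliers under the definition in the question.
outliers = [x for x in spanish if x > upper_fence or x < lower_fence]
print(q1, q3, upper_fence, outliers)
```

With these marks the upper fence works out to 78, so 90 is the only value flagged.
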
### Part (b):
| Answer | Marks | Guidance |
|---|---|---|
| $b = \frac{1735.6}{1667.6}[=1.04\ldots]$ | M1 | For correct method to find gradient |
| $a = 38.2 - b(42.2)[=-5.72\ldots]$ | M1 | For correct method to find intercept (division by 11 is M0) |
| $s = -5.72 + 1.04f$ | A1* | cao (dep on both M marks); must see printed answer $s=-5.72+1.04f$ |
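
As a cross-check on the working above, the summary statistics and the regression coefficients can be recomputed from the raw marks. The sketch below assumes the standard definitions $S_{ff} = \sum f^2 - (\sum f)^2/n$ and $S_{fs} = \sum fs - (\sum f)(\sum s)/n$; the variable names are illustrative only.

```python
# Raw marks for students A to K, as given in the question.
french = [24, 30, 32, 32, 36, 36, 40, 44, 50, 60, 68]
spanish = [16, 90, 24, 28, 32, 36, 38, 44, 48, 48, 68]

# Ignore the point (30, 90), as Greg does in part (b).
pairs = [(f, s) for f, s in zip(french, spanish) if (f, s) != (30, 90)]
n = len(pairs)

sum_f = sum(f for f, _ in pairs)
sum_s = sum(s for _, s in pairs)
s_ff = sum(f * f for f, _ in pairs) - sum_f ** 2 / n
s_fs = sum(f * s for f, s in pairs) - sum_f * sum_s / n

# Least squares line of s on f: gradient b, intercept a.
b = s_fs / s_ff
a = sum_s / n - b * (sum_f / n)
print(round(a, 2), round(b, 2))
```

Note that the means use $n = 10$, which is why dividing by 11 earns M0 in the guidance.
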
### Part (c):
| Answer | Marks | Guidance |
|---|---|---|
| For every extra mark (oe) in French/$f$, Spanish/$s$ goes up (oe) by [on average] **1.04 marks** | B1 | For correct numerical interpretation of gradient in context which must include marks at least once |
### Part (d)(i):
| Answer | Marks | Guidance |
|---|---|---|
| $s = -5.72 + 1.04\times 55 = 51.48$ awrt 51.5 | M1 A1 | For correct substitution into regression equation; allow 51 or 52 |
### Part (d)(ii):
| Answer | Marks | Guidance |
|---|---|---|
| $s = -5.72 + 1.04\times 18 = 13$ | A1 | 13 or awrt 13.0 |
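
Both substitutions in part (d) follow the same pattern, which can be sketched as a small function. The name `predict_spanish` is hypothetical; the default coefficients are the 3 significant figure values printed in part (b).

```python
def predict_spanish(french_mark, intercept=-5.72, gradient=1.04):
    """Estimate a Spanish mark from a French mark using s = a + b*f."""
    return intercept + gradient * french_mark

estimate_55 = predict_spanish(55)  # part (d)(i)
estimate_18 = predict_spanish(18)  # part (d)(ii)
print(round(estimate_55, 2), round(estimate_18, 2))
```
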
### Part (e):
- The first estimate is an interpolation / the second estimate is an extrapolation
- 55 is within the range of data / 18 is not within the range of data
- 55 is closer to the mean / 18 is further away from the mean

| Answer | Marks | Guidance |
|---|---|---|
| $\ldots$ so 51.5 is the more reliable estimate | M1 | For any equivalent correct reason; ignore extraneous non-contradictory comments; must be clear referring to French marks $(24 \leq f \leq 68)$; do not allow comments referring to range of Spanish marks; do not allow '55 is closer to the median (than 18)' |
| | A1 | For clearly identifying the estimate from part (d)(i): 51.5 or 55 or (i) or 'the first estimate' |
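
The reasons accepted in part (e) can be illustrated with a short check of each French mark against the observed French marks. This is only a sketch; `within_observed_range` is a hypothetical helper, and the list holds the 10 French marks used to fit the line in part (b).

```python
# French marks of the 10 students used to fit the line (student B removed).
french_used = [24, 32, 32, 36, 36, 40, 44, 50, 60, 68]
mean_f = sum(french_used) / len(french_used)

def within_observed_range(f_value, observed):
    """True when f_value lies inside the observed range (interpolation)."""
    return min(observed) <= f_value <= max(observed)

for f_value in (55, 18):
    inside = within_observed_range(f_value, french_used)
    kind = "interpolation" if inside else "extrapolation"
    distance = round(abs(f_value - mean_f), 1)
    print(f_value, kind, "distance from mean:", distance)
```

The check is made on the French marks only, in line with the guidance that comments about the range of Spanish marks are not accepted.
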
---
\begin{enumerate}
\item A French test and a Spanish test were sat by 11 students.
\end{enumerate}
The table below shows their marks.
\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|}
\hline
Student & A & B & C & D & E & F & G & H & I & J & K \\
\hline
French mark ($f$) & 24 & 30 & 32 & 32 & 36 & 36 & 40 & 44 & 50 & 60 & 68 \\
\hline
Spanish mark ($s$) & 16 & 90 & 24 & 28 & 32 & 36 & 38 & 44 & 48 & 48 & 68 \\
\hline
\end{tabular}
\end{center}
Greg says that if these points were plotted on a scatter diagram, then the point $( 30,90 )$ would be an outlier because 90 is an outlier for the Spanish marks.
An outlier is defined as a value that is
$$\text { greater than } Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \text { or smaller than } Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$
(a) Show that 90 is an outlier for the Spanish marks.
Ignoring the point (30, 90), Greg calculated the following summary statistics.
$$\sum f = 422 \quad \sum s = 382 \quad S _ { f f } = 1667.6 \quad S _ { f s } = 1735.6$$
(b) Use these summary statistics to show that the equation of the least squares regression line of $s$ on $f$ for the remaining 10 students is
$$s = - 5.72 + 1.04 f$$
where the values of the intercept and gradient are given to 3 significant figures. You must show your working.\\
(c) Give an interpretation of the gradient of the regression line.
Two further students sat the French test but missed the Spanish test.\\
(d) Using the equation given in part (b), estimate\\
(i) a Spanish mark for the student who scored 55 marks in their French test,\\
(ii) a Spanish mark for the student who scored 18 marks in their French test.\\
(e) State, giving a reason, which of the two estimates found in part (d) would be the more reliable estimate.
\hfill \mbox{\textit{Edexcel S1 2024 Q4 [12]}}