| Exam Board | Edexcel |
|---|---|
| Module | S1 (Statistics 1) |
| Year | 2021 |
| Session | June |
| Marks | 16 |
| Topic | Linear regression |
| Type | Calculate from summary statistics |
| Difficulty | Standard +0.3. This is a standard S1 regression question requiring routine application of formulas for Sxx, Syy, and PMCC from summary statistics, plus interpretation of extrapolation reliability. All parts follow textbook procedures with no novel problem-solving required, making it slightly easier than average for A-level. |
| Spec | 2.02c Scatter diagrams and regression lines; 5.08a Pearson correlation: calculate pmcc; 5.09a Dependent/independent variables; 5.09c Calculate regression line; 5.09e Use regression: for estimation in context |
# Question 6:
## Part (a)
| Answer | Marks | Guidance |
|---|---|---|
| $\{S_{yy} =\} 42.63 - \frac{23.7^2}{16} = [7.524375]$ | B1 | Value given so must see correct expression – allow 561.69 for $23.7^2$ |

**(1 mark)**
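
As a quick numerical check of the expression above, the following minimal Python sketch evaluates $S_{yy}$ from the summary statistics in the question. The variable names are illustrative choices, not part of the mark scheme.

```python
# Summary statistics given in the question
n = 16
sum_y = 23.7
sum_y2 = 42.63

# S_yy = sum(y^2) - (sum y)^2 / n
s_yy = sum_y2 - sum_y**2 / n

print(round(s_yy, 6))
```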
## Part (b)
| Answer | Marks | Guidance |
|---|---|---|
| Use of $\bar{y} = 3.684 - 0.3242\bar{x}$; so $\sum x = 16 \times \left(\frac{3.684 - \frac{23.7}{16}}{0.3242}\right) = 108.71067...$ | M1; A1 | 1st M1 for clear use of regression line with $\bar{y}$ or $\sum y$; 1st A1 for $\sum x =$ awrt 109 |
| $\{S_{xx} =\} 756.81 - \frac{(\text{"108.71..."})^2}{16}$; $= 18.18435...$ awrt **18.2** | M1; A1 | 2nd M1 for correct expression for $S_{xx}$ ft their $\Sigma x$; 2nd A1 for awrt 18.2 |

**(4 marks)**
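
The two steps in part (b) rely on the fact that a least squares regression line passes through $(\bar{x}, \bar{y})$. A minimal Python sketch of the calculation, using illustrative variable names and only the values given in the question, might look like this:

```python
# Summary statistics and regression line given in the question
n = 16
sum_y = 23.7
sum_x2 = 756.81
a, b = 3.684, -0.3242  # intercept and gradient of y = a + b x

# The regression line of y on x passes through (x_bar, y_bar),
# so y_bar = a + b * x_bar can be solved for x_bar.
y_bar = sum_y / n
x_bar = (y_bar - a) / b
sum_x = n * x_bar

# S_xx = sum(x^2) - (sum x)^2 / n
s_xx = sum_x2 - sum_x**2 / n

print(f"sum x = {sum_x:.4f}")
print(f"S_xx  = {s_xx:.4f}")
```

Carrying the unrounded $\sum x$ into the $S_{xx}$ step avoids premature rounding, which matters here because $S_{xx}$ is a small difference of two large numbers.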
## Part (c)
| Answer | Marks | Guidance |
|---|---|---|
| $b = \frac{S_{xy}}{S_{xx}} \Rightarrow S_{xy} = \text{"18.1843..."} \times (-0.3242) [= -5.8953...]$ | M1 | for use of gradient to find $S_{xy}$ |
| $r = \frac{\text{"}-5.89536\text{"}}{\sqrt{\text{"18.184..."} \times 7.524375}}$ | M1 | for correct expression for $r$ ft their $S_{xy}$ and $S_{xx}$ |
| $r = -0.50399...$, accept **$-0.49 \sim -0.51$** | A1 | for answer in range $-0.49 \sim -0.51$ |

**(3 marks)**
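
The chain from gradient to $S_{xy}$ to $r$ can be checked with a short Python sketch. It takes the mark scheme's quoted value of $S_{xx}$ from part (b) as an input; the variable names are illustrative.

```python
from math import sqrt

b = -0.3242      # gradient of the regression line of y on x
s_xx = 18.18435  # value from part (b), as quoted in the mark scheme
s_yy = 7.524375  # value from part (a)

# b = S_xy / S_xx, so S_xy = b * S_xx
s_xy = b * s_xx

# Product moment correlation coefficient
r = s_xy / sqrt(s_xx * s_yy)

print(f"S_xy = {s_xy:.4f}")
print(f"r    = {r:.4f}")
```

Since $S_{xx}$ and $S_{yy}$ are both positive, $r$ necessarily has the same sign as the gradient, which gives a quick sanity check on the answer.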
## Part (d)
| Answer | Marks | Guidance |
|---|---|---|
| Sub $x = 2$ in the regression line gives $y = 3.0356$ | B1 | for sight of $y = 3.03...$ or better. Allow 3.04 |

**(1 mark)**
## Part (e)
| Answer | Marks | Guidance |
|---|---|---|
| $\text{St.dev} = \sqrt{\frac{S_{xx}}{n}} = \sqrt{\frac{\text{"18.184..."}}{16}} = 1.066...$ | M1 | for correct attempt at st. dev. ft their $S_{xx}$ or $\sqrt{\frac{756.81}{16} - \left(\frac{\text{"108.71..."}}{16}\right)^2}$ ft their $\Sigma x$ |
| So limits are: $\frac{\text{"108.71..."}}{16} \pm 3 \times \text{"1.066..."} = 3.5965... \sim 9.9929...$ = awrt **3.6~10** | M1, A1 | 2nd M1 for one correct calc...ft their values; A1 for a range awrt 3.6~10 |

**(3 marks)**
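
Andi's formula can be applied in a few lines of Python. This sketch takes the values of $\sum x$ and $S_{xx}$ quoted in the mark scheme for part (b) as inputs, so its later decimal places may differ slightly from the mark scheme's working; the variable names are illustrative.

```python
from math import sqrt

n = 16
sum_x = 108.71067  # from part (b), as quoted in the mark scheme
s_xx = 18.18435    # from part (b), as quoted in the mark scheme

mean_x = sum_x / n
sd_x = sqrt(s_xx / n)  # standard deviation with divisor n, as in the mark scheme

# Andi's formula: range = mean +/- 3 * standard deviation
lower = mean_x - 3 * sd_x
upper = mean_x + 3 * sd_x

print(f"estimated minimum x: {lower:.1f}")
print(f"estimated maximum x: {upper:.1f}")
```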
## Part (f)
| Answer | Marks | Guidance |
|---|---|---|
| The probability of $x = 2$ being in the range is very small; so Behrouz's estimate is unreliable | B1ft; dB1ft | 1st B1ft for correct reason ft their range e.g. $x=2$ is outside the range. Allow extrapolation; 2nd dB1ft dep on 1st B1 for stating correct conclusion |

**(2 marks)**
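
The reasoning in parts (d) and (f) amounts to evaluating the line at $x = 2$ and then checking whether $x = 2$ lies inside the estimated data range. A minimal sketch, assuming the rounded range 3.6 to 10 from part (e) and using illustrative variable names:

```python
a, b = 3.684, -0.3242  # regression line y = a + b x from the question
x_claim = 2.0          # Behrouz's unemployment value (%)

# Part (d): value predicted by the regression line
y_pred = a + b * x_claim

# Part (f): compare x_claim with the estimated range of x from part (e)
lower, upper = 3.6, 10.0
is_extrapolation = not (lower <= x_claim <= upper)

print(f"predicted wage increase: {y_pred:.4f}%")
print(f"extrapolation: {is_extrapolation}")
```

Because $x = 2$ falls below the estimated minimum, the prediction is an extrapolation, which is the reason the mark scheme accepts for calling the claim unreliable.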
## Part (g)
| Answer | Marks | Guidance |
|---|---|---|
| Should use regression of $x$ on $y$ to estimate unemployment or equivalent | B1 | for suitable reason based on reg line, e.g. regression line ($y$ on $x$) can only be used to estimate wages. Allow $x$ instead of unemployment and $y$ instead of wages |
| So Andi's suggestion is not suitable or not to be recommended | dB1 | dep on 1st B1 for suggesting not suitable (or equivalent) |

**(2 marks)**
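
To illustrate why the $y$ on $x$ line should not simply be rearranged, the sketch below compares the gradient of the regression line of $x$ on $y$, namely $S_{xy}/S_{yy}$, with the gradient obtained by rearranging the given line for $x$. This goes beyond what the mark scheme requires; the inputs are the values quoted in parts (a) and (b), and the variable names are illustrative.

```python
b_y_on_x = -0.3242  # gradient of the given regression line of y on x
s_xx = 18.18435     # from part (b), as quoted in the mark scheme
s_yy = 7.524375     # from part (a)

s_xy = b_y_on_x * s_xx

# Gradient of the regression line of x on y
b_x_on_y = s_xy / s_yy

# Gradient obtained by rearranging y = a + b x to make x the subject
b_rearranged = 1 / b_y_on_x

print(f"x on y gradient:     {b_x_on_y:.4f}")
print(f"rearranged gradient: {b_rearranged:.4f}")
```

The two gradients agree only when $r^2 = 1$; with $r \approx -0.5$ here they differ substantially, so the two lines would give quite different estimates of unemployment.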
**[Total: 16 marks]**
\begin{enumerate}
\item Two economics students, Andi and Behrouz, are studying some data relating to unemployment, $x \%$, and increase in wages, $y \%$, for a European country. The least squares regression line of $y$ on $x$ has equation
\end{enumerate}
$$y = 3.684 - 0.3242 x$$
and
$$\sum y = 23.7 \quad \sum y ^ { 2 } = 42.63 \quad \sum x ^ { 2 } = 756.81 \quad n = 16$$
(a) Show that $\mathrm { S } _ { y y } = 7.524375$\\
(b) Find $\mathrm { S } _ { x x }$\\
(c) Find the product moment correlation coefficient between $x$ and $y$.
Behrouz claims that, assuming the model is valid, the data show that when unemployment is 2\%, wages increase at over 3\%.\\
(d) Explain how Behrouz could have come to this conclusion.
Andi uses the formula
$$\text { range } = \text { mean } \pm 3 \times \text { standard deviation }$$
to estimate the range of values for $x$.\\
(e) Find estimates of the minimum value and the maximum value of $x$ in these data using Andi's formula.\\
(f) Comment, giving a reason, on the reliability of Behrouz's claim.
Andi suggests using the regression line with equation $y = 3.684 - 0.3242 x$ to estimate unemployment when wages are increasing at $2\%$.\\
(g) Comment, giving a reason, on Andi's suggestion.\\
\hfill \mbox{\textit{Edexcel S1 2021 Q6 [16]}}