Edexcel S1 2021 June — Question 6 16 marks

Exam Board: Edexcel
Module: S1 (Statistics 1)
Year: 2021
Session: June
Marks: 16
Topic: Linear regression
Type: Calculate from summary statistics
Difficulty: Standard +0.3. This is a standard S1 regression question requiring routine application of formulas for Sxx, Syy, and PMCC from summary statistics, plus interpretation of extrapolation reliability. All parts follow textbook procedures with no novel problem-solving required, making it slightly easier than average for A-level.
Spec: 2.02c Scatter diagrams and regression lines; 5.08a Pearson correlation: calculate pmcc; 5.09a Dependent/independent variables; 5.09c Calculate regression line; 5.09e Use regression: for estimation in context

6. Two economics students, Andi and Behrouz, are studying some data relating to unemployment, \(x \%\), and increase in wages, \(y \%\), for a European country. The least squares regression line of \(y\) on \(x\) has equation

$$y = 3.684 - 0.3242 x$$

and

$$\sum y = 23.7 \quad \sum y ^ { 2 } = 42.63 \quad \sum x ^ { 2 } = 756.81 \quad n = 16$$

(a) Show that \(\mathrm { S } _ { y y } = 7.524375\)

(b) Find \(\mathrm { S } _ { x x }\)

(c) Find the product moment correlation coefficient between \(x\) and \(y\).

Behrouz claims that, assuming the model is valid, the data show that when unemployment is 2\% wages increase at over 3\%

(d) Explain how Behrouz could have come to this conclusion.

Andi uses the formula

$$\text { range } = \text { mean } \pm 3 \times \text { standard deviation }$$

to estimate the range of values for \(x\).

(e) Find estimates of the minimum value and the maximum value of \(x\) in these data using Andi's formula.

(f) Comment, giving a reason, on the reliability of Behrouz's claim.

Andi suggests using the regression line with equation \(y = 3.684 - 0.3242 x\) to estimate unemployment when wages are increasing at \(2 \%\)

(g) Comment, giving a reason, on Andi's suggestion.

# Question 6:

## Part (a)
| Answer | Marks | Guidance |
|---|---|---|
| $\{S_{yy} =\}\ 42.63 - \frac{23.7^2}{16} = [7.524375]$ | B1 | Value is given, so the correct expression must be seen. Allow 561.69 for $23.7^2$ |

**(1 mark)**
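The expression in part (a) can be checked numerically. The following is a minimal Python sketch; the function name `s_yy` is an illustrative choice and not part of the mark scheme.

```python
def s_yy(sum_y: float, sum_y2: float, n: int) -> float:
    """Return S_yy = sum(y^2) - (sum(y))^2 / n from summary statistics."""
    return sum_y2 - sum_y ** 2 / n


# Summary statistics given in the question
syy = s_yy(sum_y=23.7, sum_y2=42.63, n=16)
print(f"S_yy = {syy:.6f}")
```

In the exam the value is given, so the written expression, not the number, earns the mark.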

## Part (b)
| Answer | Marks | Guidance |
|---|---|---|
| Use of $\bar{y} = 3.684 - 0.3242\bar{x}$, so $\sum x = 16 \times \left(\frac{3.684 - \frac{23.7}{16}}{0.3242}\right) = 108.71067...$ | M1; A1 | 1st M1 for clear use of regression line with $\bar{y}$ or $\sum y$; 1st A1 for $\sum x =$ awrt 109 |
| $\{S_{xx} =\}\ 756.81 - \frac{(\text{"108.71..."})^2}{16} = 18.18435...$, awrt **18.2** | M1; A1 | 2nd M1 for correct expression for $S_{xx}$ ft their $\Sigma x$; 2nd A1 for awrt 18.2 |

**(4 marks)**
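Part (b) relies on the fact that a least squares regression line passes through $(\bar{x}, \bar{y})$. A short Python sketch of the two steps follows; the variable names are illustrative, and the line is written as $y = a + bx$ with $b$ negative.

```python
# Values given in the question (regression line written as y = a + b*x)
a, b = 3.684, -0.3242
sum_y, sum_x2, n = 23.7, 756.81, 16

# The least squares line passes through (x_bar, y_bar), so solve for x_bar
y_bar = sum_y / n
x_bar = (y_bar - a) / b
sum_x = n * x_bar

# S_xx = sum(x^2) - (sum(x))^2 / n
sxx = sum_x2 - sum_x ** 2 / n
print(f"sum x = {sum_x:.2f}, S_xx = {sxx:.2f}")
```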

## Part (c)
| Answer | Marks | Guidance |
|---|---|---|
| $b = \frac{S_{xy}}{S_{xx}} \Rightarrow S_{xy} = \text{"18.1843..."} \times (-0.3242)\ [= -5.8953...]$ | M1 | for use of gradient to find $S_{xy}$ |
| $r = \frac{\text{"}-5.89536\text{"}}{\sqrt{\text{"18.184..."} \times 7.524375}}$ | M1 | for correct expression for $r$ ft their $S_{xy}$ and $S_{xx}$ |
| $= -0.50399...$; accept **$-0.49 \sim -0.51$** | A1 | for answer in range $-0.49 \sim -0.51$ |

**(3 marks)**
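The chain from the gradient to the correlation coefficient in part (c) can be sketched in Python as below. It recomputes $S_{xx}$ and $S_{yy}$ from the summary statistics so that it runs on its own; the variable names are illustrative.

```python
import math

# Values given in the question (regression line written as y = a + b*x)
a, b = 3.684, -0.3242
sum_y, sum_y2, sum_x2, n = 23.7, 42.63, 756.81, 16

# Rebuild the sums of squares from the summary statistics
syy = sum_y2 - sum_y ** 2 / n
sum_x = n * (sum_y / n - a) / b
sxx = sum_x2 - sum_x ** 2 / n

# Gradient b = S_xy / S_xx, so S_xy = b * S_xx
sxy = b * sxx

# Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)
r = sxy / math.sqrt(sxx * syy)
print(f"S_xy = {sxy:.4f}, r = {r:.4f}")
```

The sign of $r$ matches the sign of the gradient, which is a quick sanity check on the answer.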

## Part (d)
| Answer | Marks | Guidance |
|---|---|---|
| Substituting $x = 2$ in the regression line gives $y = 3.0356$ | B1 | for sight of $y = 3.03...$ or better. Allow 3.04 |

**(1 mark)**
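Behrouz's figure in part (d) comes from a direct substitution into the regression line. A minimal Python sketch, with the hypothetical helper name `predicted_wage_increase`:

```python
def predicted_wage_increase(unemployment: float) -> float:
    """Evaluate the regression line y = 3.684 - 0.3242 x from the question."""
    return 3.684 - 0.3242 * unemployment


y_hat = predicted_wage_increase(2.0)
print(f"y = {y_hat:.4f}")  # compare with Behrouz's claim of "over 3%"
```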

## Part (e)
| Answer | Marks | Guidance |
|---|---|---|
| $\text{St.dev} = \sqrt{\frac{S_{xx}}{n}} = \sqrt{\frac{\text{"18.184..."}}{16}} = 1.066...$ | M1 | for correct attempt at st. dev. ft their $S_{xx}$, or $\sqrt{\frac{756.81}{16} - \left(\frac{\text{"108.71..."}}{16}\right)^2}$ ft their $\Sigma x$ |
| So limits are: $\frac{\text{"108.71..."}}{16} \pm 3 \times \text{"1.066..."} = 3.5965... \sim 9.9929...$ = awrt **3.6~10** | M1, A1 | 2nd M1 for one correct calculation ft their values; A1 for a range awrt 3.6~10 |

**(3 marks)**
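Andi's formula in part (e) can be evaluated with the Python sketch below. It uses the S1 convention of dividing by $n$ (not $n - 1$) for the standard deviation, as in the mark scheme; the variable names are illustrative.

```python
import math

# Values given in the question (regression line written as y = a + b*x)
a, b = 3.684, -0.3242
sum_y, sum_x2, n = 23.7, 756.81, 16

# Mean of x from the regression line, then S_xx = sum(x^2) - n * x_bar^2
x_bar = (sum_y / n - a) / b
sxx = sum_x2 - n * x_bar ** 2

# Standard deviation with divisor n
sd = math.sqrt(sxx / n)

# Andi's formula: range = mean +/- 3 * standard deviation
x_min = x_bar - 3 * sd
x_max = x_bar + 3 * sd
print(f"sd = {sd:.3f}, range = {x_min:.1f} to {x_max:.1f}")
```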

## Part (f)
| Answer | Marks | Guidance |
|---|---|---|
| The probability of $x = 2$ being in the range is very small, so Behrouz's estimate is unreliable | B1ft; dB1ft | 1st B1ft for correct reason ft their range, e.g. $x=2$ is outside the range. Allow extrapolation; 2nd dB1ft dep on 1st B1 for stating correct conclusion |

**(2 marks)**
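The reasoning in part (f) is a range check: an estimate at an $x$ value outside the data range is extrapolation. A minimal Python sketch, assuming the rounded range from part (e) and a hypothetical helper `is_extrapolation`:

```python
def is_extrapolation(x: float, x_min: float, x_max: float) -> bool:
    """Return True if x lies outside the estimated data range [x_min, x_max]."""
    return not (x_min <= x <= x_max)


# Range from part (e), rounded as in the mark scheme (awrt 3.6 to 10)
outside = is_extrapolation(2.0, x_min=3.6, x_max=10.0)
print("extrapolation" if outside else "interpolation")
```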

## Part (g)
| Answer | Marks | Guidance |
|---|---|---|
| Should use regression of $x$ on $y$ to estimate unemployment, or equivalent | B1 | for suitable reason based on regression line, e.g. regression line ($y$ on $x$) can only be used to estimate wages. Allow $x$ instead of unemployment and $y$ instead of wages |
| So Andi's suggestion is not suitable or not to be recommended | dB1 | dep on 1st B1 for suggesting not suitable (or equivalent) |

**(2 marks)**
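To see why the choice of line matters in part (g), the Python sketch below compares rearranging the $y$ on $x$ line with using the regression of $x$ on $y$, whose gradient is $S_{xy} / S_{yy}$. This comparison goes beyond what the mark scheme requires and is offered only as an illustration; the variable names are assumptions.

```python
# Values given in the question (regression line written as y = a + b*x)
a, b = 3.684, -0.3242
sum_y, sum_y2, sum_x2, n = 23.7, 42.63, 756.81, 16

# Means and sums of squares rebuilt from the summary statistics
y_bar = sum_y / n
x_bar = (y_bar - a) / b
sxx = sum_x2 - n * x_bar ** 2
syy = sum_y2 - sum_y ** 2 / n
sxy = b * sxx

# Regression of x on y: x = c + d*y with d = S_xy / S_yy
d = sxy / syy
c = x_bar - d * y_bar

y_target = 2.0
x_from_x_on_y = c + d * y_target
x_from_inverting = (y_target - a) / b  # Andi's suggestion: rearrange y on x

print(f"x on y line: {x_from_x_on_y:.2f}, rearranged y on x line: {x_from_inverting:.2f}")
```

The two estimates differ because the two regression lines only coincide when $r = \pm 1$.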

**[Total: 16 marks]**