| Exam Board | Edexcel |
|---|---|
| Module | FS2 (Further Statistics 2) |
| Year | 2022 |
| Session | June |
| Marks | 7 |
| Topic | Linear regression |
| Type | Hypothesis test for regression slope |
| Difficulty | Standard +0.3 This is a straightforward FS2 question testing standard linear regression concepts: prediction from a regression line (simple substitution), definition of residual (recall), calculating PMCC from given summary statistics (formula application), and interpreting correlation/residual plots (standard textbook interpretation). All parts are routine applications of learned techniques with no novel problem-solving required, making it slightly easier than average. |
| Spec | 5.08a Pearson correlation: calculate pmcc; 5.09a Dependent/independent variables; 5.09b Least squares regression: concepts; 5.09c Calculate regression line; 5.09d Linear coding: effect on regression; 5.09e Use regression: for estimation in context |
# Question 1:
## Part (a)
$[20 \times 45.5 + 2080] = 2990$ [kg/ha] | B1 (1 mark) | cao
## Part (b)
(A residual is the) difference between the observed value (oe) and the predicted value (oe) (of the dependent variable) | B1 (1 mark) | Correct definition. Allow equivalent wording. Distance from regression line on its own is B0, but allow if vertical distance or $y$ is referenced.
## Part (c)
$1666567 = 1774155(1-r^2)$ | M1 | Use of correct expression for $r$ or $r^2$. Allow use of $S_{ty} = 45.5 \times 52.0\ [=2366]$, or RSS: $1774155 - \frac{(S_{ty})^2}{52} = 1666567 \rightarrow [S_{ty} = 2365.2\ldots]$, and then $r = \frac{\text{awrt } 2365 \text{ or awrt } 2366}{\sqrt{52.0 \times 1774155}}$
$r = 0.246\ldots$ | A1 (2 marks) | awrt 0.246 ($-0.246$ or $\pm 0.246$ scores M1A0).
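
As a quick check of the part (c) arithmetic, the Python sketch below reproduces $r$ from the summary statistics given in the question, using both routes the mark scheme allows. The variable names are illustrative only, and the positive root is taken on the assumption that the sign of $r$ matches the positive gradient.

```python
import math

# Summary statistics quoted in the question
rss = 1666567      # residual sum of squares for the y on t model
s_tt = 52.0
s_yy = 1774155
gradient = 45.5

# Route 1: RSS = S_yy * (1 - r^2)
r_squared = 1 - rss / s_yy
r_from_rss = math.sqrt(r_squared)  # positive root, since the gradient is positive

# Route 2: S_ty = gradient * S_tt, then r = S_ty / sqrt(S_tt * S_yy)
s_ty = gradient * s_tt
r_from_sty = s_ty / math.sqrt(s_tt * s_yy)

print(round(r_from_rss, 3), round(r_from_sty, 3))
```

Both routes round to 0.246; they differ slightly beyond the third decimal place because the quoted gradient of 45.5 is itself rounded.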
## Part (d)(i)
Since $r$ is close to 0/weak correlation | B1 (1 mark) | Correct explanation
## Part (d)(ii)
e.g. (For $t > 20\ldots$) the residuals do not appear randomly scattered about 0. | B1 (1 mark) | Correct evaluation of the fit of the model's residuals (e.g. variance either side of $t = 20$ does not appear to be the same). 'Residuals not randomly scattered' on its own is B0.
## Part (e)
Kwame's conclusion cannot be supported using RSS since the two values of RSS do not have the same units. | B1 (1 mark) | Correct assessment of the conclusion involving the units/size of the variables used to calculate the RSS
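
One way to see why the two RSS values cannot be compared directly is to write each in the form used in part (c). The symbols $S_{ww}$, $r_{ty}$ and $r_{tw}$ are notation assumed here for illustration; $S_{ww}$ is not given in the question, so the correlation for the $w$ on $t$ model cannot be computed from the information provided.

$$\begin{aligned}
& \mathrm{RSS}_{y \text{ on } t} = S_{yy}\left(1 - r_{ty}^2\right) \quad \text{measured in } (\mathrm{kg/ha})^2 \\
& \mathrm{RSS}_{w \text{ on } t} = S_{ww}\left(1 - r_{tw}^2\right) \quad \text{measured in } \mathrm{mm}^2
\end{aligned}$$

Each RSS depends on the scale and spread of its own dependent variable as well as on the correlation, so a smaller RSS may simply reflect a smaller $S_{ww}$ rather than a stronger linear relationship.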
**Total: 7 marks**
\begin{enumerate}
\item Kwame is investigating a possible relationship between average March temperature, $t^{\circ}\mathrm{C}$, and tea yield, $y$ kg/hectare, for tea grown in a particular location. He uses 30 years of past data to produce the following summary statistics for a linear regression model, with tea yield as the dependent variable.
\end{enumerate}
$$\begin{aligned}
& \text { Residual Sum of Squares } ( \mathrm { RSS } ) = 1666567 \quad \mathrm {~S} _ { t t } = 52.0 \quad \mathrm {~S} _ { y y } = 1774155 \\
& \text { least squares regression line: } \quad \text { gradient } = 45.5 \quad y \text {-intercept } = 2080
\end{aligned}$$
(a) Use the regression model to predict the tea yield for an average March temperature of $20 ^ { \circ } \mathrm { C }$
He also produces the following residual plot for the data.\\
\includegraphics[max width=\textwidth, alt={}, center]{d139840b-16ec-42ce-8501-f79c263c8017-02_663_880_868_589}\\
(b) Explain what you understand by the term residual.\\
(c) Calculate the product moment correlation coefficient between $t$ and $y$\\
(d) Explain why the linear model may not be a good fit for the data\\
(i) with reference to your answer to part (c)\\
(ii) with reference to the residual plot.
Kwame also collects data on total March rainfall, $w \mathrm {~mm}$, for each of these 30 years. For a linear regression model of $w$ on $t$ the following summary statistic is found.
$$\text { Residual Sum of Squares (RSS) = } 86754$$
Kwame concludes that since this model has a smaller RSS, there must be a stronger linear relationship between $w$ and $t$ than between $y$ and $t$ (where RSS $= 1666567$ )\\
(e) State, giving a reason, whether or not you agree with the reasoning that led to Kwame's conclusion.
\hfill \mbox{\textit{Edexcel FS2 2022 Q1 [7]}}