| Exam Board | Edexcel |
|---|---|
| Module | FS2 AS (Further Statistics 2 AS) |
| Year | 2019 |
| Session | June |
| Marks | 11 |
| Topic | Linear regression |
| Type | Calculate PMCC from summary statistics |
| Difficulty | Standard +0.3. This is a standard Further Statistics question testing routine PMCC calculation from summary statistics and understanding of linear coding. Part (a) requires direct formula application, (b) tests knowledge that correlation is invariant under linear transformations, (c)-(e) involve straightforward algebraic manipulation and RSS calculation, and (f) requires basic interpretation of residual plots. While it's a Further Maths topic, the techniques are mechanical with no novel problem-solving required, making it slightly easier than average overall. |
| Spec | 5.08a Pearson correlation: calculate pmcc; 5.09a Dependent/independent variables; 5.09c Calculate regression line; 5.09e Use regression: for estimation in context |
## Question 3:
### Part (a):
| Working/Answer | Mark | Guidance |
|---|---|---|
| $S_{ll} = 26.2326 - \frac{16.06^2}{10} = 0.44024$ | [given] | — |
| $r = \frac{42.786}{\sqrt{9936.9 \times \text{"0.44024"}}}$ | M1 | 1.1b — complete correct method to find $r$ |
| $r = 0.64689...$ awrt $0.647$ | A1 | 1.1b — for awrt 0.647 |
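
The calculation in part (a) can be checked numerically. The sketch below is only an illustration: the variable names are chosen here and do not come from the mark scheme, and the values are the summary statistics quoted in the question.

```python
import math

# Summary statistics quoted in the question (n = 10 years of data).
# Variable names are illustrative, not part of the mark scheme.
n = 10
S_wl = 42.786
S_ww = 9936.9
sum_l_sq = 26.2326
sum_l = 16.06

# S_ll = sum(l^2) - (sum l)^2 / n
S_ll = sum_l_sq - sum_l**2 / n

# r = S_wl / sqrt(S_ww * S_ll)
r = S_wl / math.sqrt(S_ww * S_ll)

print(f"S_ll = {S_ll:.5f}")
print(f"r    = {r:.5f}")
```

Working from the unrounded value of `S_ll` avoids carrying a rounding error into `r`, which is why the mark scheme quotes 0.44024 in full before rounding the final answer to 0.647.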
### Part (b):
| Working/Answer | Mark | Guidance |
|---|---|---|
| "0.647" coding has no effect on the pmcc | B1ft | 1.1b — stating their answer to (a) and a correct reason |
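
One way to justify the statement in part (b) is to note that subtracting a constant shifts the mean by the same constant, so every deviation from the mean is unchanged. The subscripted symbols $r_{st}$ and $r_{wl}$ are notation introduced here for the two coefficients; they do not appear in the mark scheme.

$$\bar{s} = \bar{w} - 6, \quad \bar{t} = \bar{l} - 20 \;\Rightarrow\; s_i - \bar{s} = w_i - \bar{w}, \quad t_i - \bar{t} = l_i - \bar{l}$$

$$S_{st} = S_{wl}, \quad S_{ss} = S_{ww}, \quad S_{tt} = S_{ll} \;\Rightarrow\; r_{st} = \frac{S_{st}}{\sqrt{S_{ss}\,S_{tt}}} = \frac{S_{wl}}{\sqrt{S_{ww}\,S_{ll}}} = r_{wl}$$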
### Part (c):
| Working/Answer | Mark | Guidance |
|---|---|---|
| $l - 20 = 0.00431(w - 6) - 18.87$ | M1 | 3.1a — use of a correct model, correct expression for $b$ |
| $l = 0.00431w + \ldots$ | M1 | 1.1b — correct expression (ft) for $a$ |
| $l = 0.00431w + 1.10414$ | A1 | 1.1b — correct model, awrt 0.00431 and awrt 1.10 |
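
The substitution in part (c) can be sketched in code as below. This is a minimal illustration, with all names chosen here. The check at `w_test = 50` uses an arbitrary value picked only to confirm that the coded and decoded lines agree; it is not data from the question.

```python
# Coded regression line from the question: t = b*s + c,
# where s = w - 6 and t = l - 20. Names are illustrative.
b = 0.00431
c = -18.87
w_shift = 6
l_shift = 20

# Substitute the coding: l - 20 = b*(w - 6) + c
# => l = b*w + (c + 20 - 6*b), so the gradient is unchanged.
a = c + l_shift - b * w_shift

# Sanity check at an arbitrary rainfall value (not from the question).
w_test = 50
l_via_coded = (b * (w_test - w_shift) + c) + l_shift
l_via_decoded = a + b * w_test

print(f"l = {a:.5f} + {b}w")
print(abs(l_via_coded - l_via_decoded) < 1e-12)
```

Only the intercept changes under the coding; the gradient 0.00431 carries over directly because both codings are pure shifts with no scaling.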
### Part (d):
| Working/Answer | Mark | Guidance |
|---|---|---|
| $l = 0.00431 \times 100 + 1.10 = 1.53$ | B1ft | 3.4 — correct answer using their equation and $w=100$; allow awrt 1.53/1.54 |
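
The estimate in part (d) depends slightly on whether the intercept is rounded before substituting $w = 100$. This may be why the guidance accepts both 1.53 and 1.54, although that reading is an interpretation and is not stated in the mark scheme.

$$0.00431 \times 100 + 1.10 = 1.531 \qquad\qquad 0.00431 \times 100 + 1.10414 = 1.53514$$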
### Part (e):
| Working/Answer | Mark | Guidance |
|---|---|---|
| $\text{RSS} = \text{"0.44024"} - \frac{(42.786)^2}{9936.9}$ or $\text{"0.44024"}(1 - \text{"0.647"}^2)$ | M1 | 1.1b — correct expression for RSS |
| $\text{RSS} = 0.2560$ | A1 | 1.1b — awrt 0.256 |
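
The two expressions for the RSS accepted in part (e) are algebraically the same, as a quick numerical check suggests. The sketch below uses the summary statistics from the question, with variable names chosen here for illustration.

```python
# Summary statistics quoted in the question; names are illustrative.
n = 10
S_wl = 42.786
S_ww = 9936.9
S_ll = 26.2326 - 16.06**2 / n

# Form 1: RSS = S_ll - S_wl^2 / S_ww
rss_direct = S_ll - S_wl**2 / S_ww

# Form 2: RSS = S_ll * (1 - r^2), with r^2 = S_wl^2 / (S_ww * S_ll)
r_squared = S_wl**2 / (S_ww * S_ll)
rss_from_r = S_ll * (1 - r_squared)

print(f"RSS (direct)   = {rss_direct:.4f}")
print(f"RSS (from r^2) = {rss_from_r:.4f}")
```

Using the rounded value $r = 0.647$ in the second form gives a slightly different figure, which still rounds to 0.256 and so stays within the "awrt" tolerance.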
### Part (f):
| Working/Answer | Mark | Guidance |
|---|---|---|
| (i) Points appear **randomly** scattered above and below zero giving no reason to doubt the suitability of the linear model | B1 | 3.5a — explaining why model may be suitable; allow randomly scattered around $w$ ($x$) axis; do not allow "most residuals close to zero" or "not randomly scattered" |
| (ii) There is a possible outlier that could be removed (and the regression line recalculated) | B1 | 3.5c — explaining how the fit of the model might be improved |
---
\begin{enumerate}
\item Two students, Jim and Dora, collected data on the mean annual rainfall, $w$ cm, and the annual yield of leeks, $l$ tonnes per hectare, for 10 years.
\end{enumerate}
Jim summarised the data as follows:
$$S_{wl} = 42.786 \quad S_{ww} = 9936.9 \quad \sum l^{2} = 26.2326 \quad \sum l = 16.06$$
(a) Find the product moment correlation coefficient between $l$ and $w$.
Dora decided to code the data first using $s = w - 6$ and $t = l - 20$.\\
(b) Write down the value of the product moment correlation coefficient between $s$ and $t$. Give a justification for your answer.
Dora calculates the equation of the regression line of $t$ on $s$ to be $t = 0.00431 s - 18.87$\\
(c) Find the equation of the regression line of $l$ on $w$ in the form $l = a + b w$, giving the values of $a$ and $b$ to 3 significant figures.\\
(d) Use your equation to estimate the yield of leeks when $w$ is 100 cm.\\
(e) Calculate the residual sum of squares.
The graph shows the residual for each value of $l$\\
\includegraphics[max width=\textwidth, alt={}, center]{7e46e14a-0f5a-4d02-8f00-a92bc4def6d7-08_716_1594_1594_239}\\
(f) (i) State whether this graph suggests that the use of a linear regression model is suitable for these data. Give a reason for your answer.\\
(ii) Other than collecting more data, suggest how to improve the fit of the model in part (c) to the data.
\hfill \mbox{\textit{Edexcel FS2 AS 2019 Q3 [11]}}