| Exam Board | Edexcel |
|---|---|
| Module | FS2 AS (Further Statistics 2 AS) |
| Year | 2018 |
| Session | June |
| Marks | 11 |
| Topic | Bivariate data |
| Type | Calculate r from summary statistics |
| Difficulty | Moderate (-0.3). This is a standard Further Statistics question testing routine application of correlation and regression formulas with given summary statistics. Parts (a)-(d) require direct formula substitution with no conceptual challenges. Parts (e)-(f) test basic understanding of outliers and their effect on correlation. Although it is Further Maths content, the calculations are mechanical and the interpretation is straightforward, making it slightly easier than an average A-level question overall. |
| Spec | 5.08a Pearson correlation: calculate pmcc; 5.09a Dependent/independent variables; 5.09b Least squares regression: concepts; 5.09c Calculate regression line; 5.09d Linear coding: effect on regression; 5.09e Use regression: for estimation in context |
# Question 1:
## Part (a)
| Working | Mark | Guidance |
|---------|------|----------|
| $[S_{pp} = 4748 - \frac{254^2}{16} = 715.75]$, $r = \frac{1115}{\sqrt{1846 \times (\text{"715.75"})}}$ | M1 | Complete correct method for finding $r$ |
| $r = 0.970014...$ awrt $\mathbf{0.970}$ | A1 | Allow 0.97 from correct working |
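
The substitution above can be checked numerically. The following is a minimal sketch using only the summary statistics quoted in the question; the variable names are illustrative and not part of the mark scheme.

```python
import math

# Summary statistics quoted in the question (n = 16 students).
n = 16
sum_p = 254       # sum of physics scores
sum_p_sq = 4748   # sum of squared physics scores
S_mm = 1846
S_mp = 1115

# S_pp = sum(p^2) - (sum p)^2 / n
S_pp = sum_p_sq - sum_p**2 / n

# Product moment correlation coefficient: r = S_mp / sqrt(S_mm * S_pp)
r = S_mp / math.sqrt(S_mm * S_pp)

print(f"S_pp = {S_pp}")  # S_pp = 715.75
print(f"r = {r:.4f}")    # r = 0.9700
```
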
## Part (b)
| Working | Mark | Guidance |
|---------|------|----------|
| $b = \frac{1115}{1846} [= 0.6040...]$ | M1 | Correct expression for $b$ |
| $a = \frac{254}{16} - \text{"b"}\frac{392}{16} [= 1.076...]$ | M1 | Correct (ft) expression for $a$ |
| $p = 1.08 + 0.604m$ | A1 | awrt 1.08 and awrt 0.604; no fractions; must be in terms of $p$ and $m$ |
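
As a quick check of the two coefficients, the sketch below recomputes the gradient $b = S_{mp}/S_{mm}$ and the intercept $a = \bar{p} - b\bar{m}$ from the summary statistics in the question. The variable names are illustrative only.

```python
# Summary statistics quoted in the question (n = 16 students).
n = 16
sum_m, sum_p = 392, 254
S_mm, S_mp = 1846, 1115

# Gradient of the regression line of p on m.
b = S_mp / S_mm

# Intercept: the line passes through the mean point (m_bar, p_bar).
a = sum_p / n - b * sum_m / n

print(f"p = {a:.2f} + {b:.3f}m")  # p = 1.08 + 0.604m
```
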
## Part (c)
| Working | Mark | Guidance |
|---------|------|----------|
| $\text{RSS} = \text{"715.75"} - \frac{1115^2}{1846}$ or $\text{RSS} = \text{"715.75"}(1 - \text{"0.970"}^2)$ | M1 | Correct expression for RSS |
| $\text{RSS} = 42.28033...$ awrt $\mathbf{42.3}$ | A1 | |
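
Both expressions accepted for the M1 mark give the same value, which the following sketch confirms numerically. It assumes nothing beyond the summary statistics in the question; the names `rss_direct` and `rss_via_r` are illustrative.

```python
import math

# Summary statistics quoted in the question (n = 16 students).
S_pp = 4748 - 254**2 / 16   # 715.75
S_mm, S_mp = 1846, 1115

# Form 1: RSS = S_pp - S_mp^2 / S_mm
rss_direct = S_pp - S_mp**2 / S_mm

# Form 2: RSS = S_pp * (1 - r^2), using the unrounded r
r = S_mp / math.sqrt(S_mm * S_pp)
rss_via_r = S_pp * (1 - r**2)

print(f"RSS (direct) = {rss_direct:.1f}")  # RSS (direct) = 42.3
print(f"RSS (via r)  = {rss_via_r:.1f}")   # RSS (via r)  = 42.3
```

Using the rounded value $r = 0.970$ in the second form loses a little accuracy, which is why the mark scheme quotes the answer as awrt 42.3.
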
## Part (d)
| Working | Mark | Guidance |
|---------|------|----------|
| $p = 1.08 + 0.604(30) + \text{residual}$ | M1 | Substitution of $m = 30$ into regression equation and adding residual |
| $p = \mathbf{18}$ | A1ft | |
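
The method here is observed value = fitted value + residual. The residual itself has to be read from Figure 1, which is not reproduced in this text, so the value of about $-1.2$ used in the sketch below is an assumption inferred from the mark scheme's answer of 18, not a value taken from the plot. The function name `observed_score` is hypothetical.

```python
def observed_score(m, residual, a=1.08, b=0.604):
    """Observed physics score = fitted value (a + b*m) + residual."""
    return a + b * m + residual

# Fitted value on the regression line at m = 30.
fitted = observed_score(30, residual=0)
print(f"fitted value = {fitted:.1f}")  # fitted value = 19.2

# ASSUMPTION: residual of roughly -1.2, back-derived from the answer p = 18.
# In the exam this value is read off the residual plot in Figure 1.
assumed_residual = -1.2
print(round(observed_score(30, assumed_residual)))  # 18
```
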
## Part (e)
| Working | Mark | Guidance |
|---------|------|----------|
| $\lvert\text{residual}\rvert$ is large/may be an outlier | B1 | Residual is far from 0; it may be an outlier/anomaly/does not fit the trend |
## Part (f)
| Working | Mark | Guidance |
|---------|------|----------|
| New $r$ should be closer to 1 than part (a) since remaining points are likely to be closer to the new regression line | B1 | Closer to 1 than part (a) / increase; correct supporting reason about relative strength of correlation (condone outlier removed) |
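
The reasoning in part (f) can be illustrated with a toy data set. The numbers below are hypothetical and are not the exam data (the question gives only summary statistics): five points lie exactly on a line and one point sits well below it. The helper `pmcc` is an illustrative implementation of the usual formula $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# HYPOTHETICAL toy data, not the exam data: points follow p = 1 + 0.6m
# except at m = 20, where p = 4 instead of 13 (a large negative residual).
m_toy = [10, 15, 20, 25, 30, 35]
p_toy = [7, 10, 4, 16, 19, 22]

r_with = pmcc(m_toy, p_toy)

# Remove the outlying point and recalculate.
kept = [(m, p) for m, p in zip(m_toy, p_toy) if m != 20]
r_without = pmcc([m for m, _ in kept], [p for _, p in kept])

print(f"r with outlier:    {r_with:.3f}")
print(f"r without outlier: {r_without:.3f}")
```

With the outlying point removed, the remaining toy points are perfectly collinear, so the recalculated coefficient moves closer to 1, which mirrors the argument the mark scheme expects.
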
---
\begin{enumerate}
\item The scores achieved on a maths test, $m$, and the scores achieved on a physics test, $p$, by 16 students are summarised below.
\end{enumerate}
$$\sum m = 392 \quad \sum p = 254 \quad \sum p^{2} = 4748 \quad S_{mm} = 1846 \quad S_{mp} = 1115$$
(a) Find the product moment correlation coefficient between $m$ and $p$.\\
(b) Find the equation of the linear regression line of $p$ on $m$.
Figure 1 shows a plot of the residuals.
\begin{figure}[h]
\begin{center}
\includegraphics[alt={},max width=\textwidth]{0fcb4d83-9763-4edd-8006-93f75a44c596-02_808_1222_997_429}
\captionsetup{labelformat=empty}
\caption{Figure 1}
\end{center}
\end{figure}
(c) Calculate the residual sum of squares (RSS).
For the person who scored 30 marks on the maths test,\\
(d) find the score on the physics test.
The data for the person who scored 20 on the maths test is removed from the data set.\\
(e) Suggest a reason why.
The product moment correlation coefficient between $m$ and $p$ is now recalculated for the remaining 15 students.\\
(f) Without carrying out any further calculations, suggest how you would expect this recalculated value to compare with your answer to part (a).\\
Give a reason for your answer.
\hfill \mbox{\textit{Edexcel FS2 AS 2018 Q1 [11]}}