| Exam Board | Edexcel |
|---|---|
| Module | S1 (Statistics 1) |
| Year | 2002 |
| Session | January |
| Marks | 19 |
| Topic | Bivariate data |
| Type | Calculate r from summary statistics |
| Difficulty | Moderate (-0.3). A standard S1 bivariate data question requiring routine application of the correlation and regression formulas to given summary statistics. All calculations follow textbook procedures (PMCC formula, regression line equation), though the question requires careful arithmetic and an understanding of how outliers affect a regression line. The conceptual demands are minimal, so it is slightly easier than average thanks to the provided summaries and straightforward interpretation questions. |
| Spec | 2.02c Scatter diagrams and regression lines; 5.08a Pearson correlation: calculate pmcc; 5.09a Dependent/independent variables |
## Part (a)
| Answer | Marks | Guidance |
|---|---|---|
| Scatter plot | B1 | |
| Points | B2 (3) | 8 or 9 points correct $\Rightarrow$ B1 |
## Part (b)
| Answer | Marks | Guidance |
|---|---|---|
| $S_{ss} = 694650 - \frac{2310^2}{10} = 161040$ | M1 A1 | |
| $S_{tt} = 66490$; $S_{st} = 87235$ | A1 A1 | |
| $r = \frac{87235}{\sqrt{66490 \times 161040}} = 0.843035\ldots$ | M1 | $\frac{S_{st}}{\sqrt{S_{tt} S_{ss}}}$ |
| $r = 0.843$ | A1 (7) | 0.843 |
| Special case: $0.843$ without working $\Rightarrow$ B1 only | | |
## Part (c)
| Answer | Marks | Guidance |
|---|---|---|
| No change; coding does not affect $r$ | B1; B1 (2) | |
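
The values in parts (b) and (c) can be checked numerically from the table in the question. The following is a minimal Python sketch, assuming the raw data as printed; the helper name `pmcc` is illustrative and not part of the mark scheme.

```python
from math import sqrt

# Data from the question's table: true content t and mean guess s.
t = [170, 90, 80, 10, 260, 75, 60, 270, 165, 160]
s = [420, 160, 110, 70, 360, 135, 115, 350, 390, 200]


def pmcc(x, y):
    """Product moment correlation coefficient computed from raw data."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return s_xy / sqrt(s_xx * s_yy)


r = pmcc(t, s)

# Part (c): add 50 to every guess and recompute.
r_shifted = pmcc(t, [v + 50 for v in s])

print(round(r, 3))          # 0.843
print(round(r_shifted, 3))  # 0.843
```

Adding a constant to every guess shifts $\bar{s}$ by the same amount, so each deviation $s - \bar{s}$ is unchanged and $r$ stays the same.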
## Part (d)
| Answer | Marks | Guidance |
|---|---|---|
| $\hat{b} = \frac{72587.5}{63671.875} = 1.14002\ldots$ | M1 | |
| $\hat{a} = 187.5 - (1.140024\ldots \times 125.625) = 44.2844\ldots$ | M1 | |
| $\therefore s = 44.3 + 1.14t$ | A1 (3) | must use $s$ and $t$ |
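
The regression line in part (d) can be reproduced directly from the eight remaining foods. This is a sketch under the assumption that the table data are as printed in the question; `regression_s_on_t` is a hypothetical helper name, not something the mark scheme defines.

```python
# (food, true calories t, mean guess s), copied from the question's table.
data = [
    ("Packet of biscuits", 170, 420),
    ("1 potato", 90, 160),
    ("1 apple", 80, 110),
    ("Crisp breads", 10, 70),
    ("Chocolate bar", 260, 360),
    ("1 slice white bread", 75, 135),
    ("1 slice brown bread", 60, 115),
    ("Portion of beef curry", 270, 350),
    ("Portion of rice pudding", 165, 390),
    ("Half a pint of milk", 160, 200),
]

# Part (d) drops the two foods that sit outside the linear pattern.
excluded = {"Packet of biscuits", "Portion of rice pudding"}
kept = [(t, s) for name, t, s in data if name not in excluded]


def regression_s_on_t(points):
    """Return (a, b) for the least-squares line s = a + b*t."""
    n = len(points)
    sum_t = sum(t for t, _ in points)
    sum_s = sum(s for _, s in points)
    s_tt = sum(t * t for t, _ in points) - sum_t ** 2 / n
    s_st = sum(t * s for t, s in points) - sum_t * sum_s / n
    b = s_st / s_tt
    a = sum_s / n - b * sum_t / n
    return a, b


a, b = regression_s_on_t(kept)
print(f"s = {a:.1f} + {b:.2f}t")  # s = 44.3 + 1.14t
```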
## Part (e)
| Answer | Marks |
|---|---|
| Regression line plotted on the scatter diagram | B1 $\checkmark$ |
| Correct line | B1 (2) |
## Part (f)
| Answer | Marks |
|---|---|
| Both points lie above the line, so including them would move the line up | B1 |
| Prediction of $s$ from $t$ would be less accurate | B1 (2) |
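
The effect described in part (f) can be illustrated by fitting the line with and without the two excluded foods. The sketch below assumes the table data from the question; the function name `fit` and the index set `outliers` are illustrative choices only.

```python
# Data from the question's table: true content t and mean guess s.
t = [170, 90, 80, 10, 260, 75, 60, 270, 165, 160]
s = [420, 160, 110, 70, 360, 135, 115, 350, 390, 200]

# Positions of the packet of biscuits and the rice pudding in the lists.
outliers = {0, 8}


def fit(ts, ss):
    """Return (a, b) for the least-squares line s = a + b*t."""
    n = len(ts)
    s_tt = sum(v * v for v in ts) - sum(ts) ** 2 / n
    s_st = sum(x * y for x, y in zip(ts, ss)) - sum(ts) * sum(ss) / n
    b = s_st / s_tt
    a = sum(ss) / n - b * sum(ts) / n
    return a, b


t8 = [v for i, v in enumerate(t) if i not in outliers]
s8 = [v for i, v in enumerate(s) if i not in outliers]

a8, b8 = fit(t8, s8)    # line from the eight remaining foods
a10, b10 = fit(t, s)    # line from all ten foods

# Vertical distances of the two excluded foods from the eight-food line;
# positive values mean the points lie above that line.
residuals = [s[i] - (a8 + b8 * t[i]) for i in sorted(outliers)]

print(f"8 foods:  s = {a8:.1f} + {b8:.2f}t")    # 8 foods:  s = 44.3 + 1.14t
print(f"10 foods: s = {a10:.1f} + {b10:.2f}t")  # 10 foods: s = 55.2 + 1.31t
print(all(e > 0 for e in residuals))            # True
```

With these data both the intercept and the gradient increase when the two foods are included, which is consistent with the line being pulled upwards.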
---
A number of people were asked to guess the calorific content of 10 foods. The mean $s$ of the guesses for each food and the true calorific content $t$ are given in the table below.
\begin{center}
\begin{tabular}{|l|c|c|}
\hline
Food & $t$ & $s$ \\
\hline
Packet of biscuits & 170 & 420 \\
1 potato & 90 & 160 \\
1 apple & 80 & 110 \\
Crisp breads & 10 & 70 \\
Chocolate bar & 260 & 360 \\
1 slice white bread & 75 & 135 \\
1 slice brown bread & 60 & 115 \\
Portion of beef curry & 270 & 350 \\
Portion of rice pudding & 165 & 390 \\
Half a pint of milk & 160 & 200 \\
\hline
\end{tabular}
\end{center}
[You may assume that $\Sigma t = 1340$, $\Sigma s = 2310$, $\Sigma ts = 396775$, $\Sigma t^2 = 246050$, $\Sigma s^2 = 694650$.]
\begin{enumerate}[label=(\alph*)]
\item Draw a scatter diagram, indicating clearly which is the explanatory (independent) and which is the response (dependent) variable. [3]
\item Calculate, to 3 significant figures, the product moment correlation coefficient for the above data. [7]
\item State, with a reason, whether or not the value of the product moment correlation coefficient changes if all the guesses are 50 calories higher than the values in the table. [2]
\end{enumerate}
The means of the guesses for the portion of rice pudding and for the packet of biscuits lie outside the linear relationship shown by the other eight foods.
\begin{enumerate}[label=(\alph*)]
\setcounter{enumi}{3}
\item Find the equation of the regression line of $s$ on $t$ excluding the values for rice pudding and biscuits. [3]
\end{enumerate}
[You may now assume that $S_{tt} = 63671.875$, $S_{st} = 72587.5$, $\bar{t} = 125.625$, $\bar{s} = 187.5$.]
\begin{enumerate}[label=(\alph*)]
\setcounter{enumi}{4}
\item Draw the regression line on your scatter diagram. [2]
\item State, with a reason, what the effect would be on the regression line of including the values for a portion of rice pudding and a packet of biscuits. [2]
\end{enumerate}
\hfill \mbox{\textit{Edexcel S1 2002 Q7 [19]}}