Edexcel S1 2019 January, Question 6 (18 marks)

- Exam board: Edexcel
- Module: S1 (Statistics 1)
- Year: 2019
- Session: January
- Marks: 18
- Topic: Bivariate data
- Type: Calculate r from summary statistics
- Difficulty: Moderate (-0.3). This is a standard S1 correlation and regression question requiring routine calculations from given summary statistics. Although it has several parts (finding the means, $S_{xy}$, the correlation coefficient $r$ and the regression line), every technique is a straightforward application of a formula, with no conceptual challenge or novel problem-solving required. It is slightly easier than average because it is highly procedural.
- Spec:
  - 2.02c Scatter diagrams and regression lines
  - 2.02d Informal interpretation of correlation
  - 2.02f Measures of average and spread
  - 2.02g Calculate mean and standard deviation
  - 5.08a Pearson correlation: calculate pmcc
  - 5.09a Dependent/independent variables
  - 5.09c Calculate regression line

Following some school examinations, Chetna is studying the results of the 16 students in her class. The mark for paper 1, \(x\), and the mark for paper 2, \(y\), for each student are summarised in the following statistics.
$$\bar { x } = 35.75 \quad \bar { y } = 25.75 \quad \sigma _ { x } = 7.79 \quad \sigma _ { y } = 11.91 \quad \sum x y = 15837$$

(a) Comment on the differences between the marks of the students on paper 1 and paper 2. (2)

Chetna decides to examine these data in more detail and plots the marks for each of the 16 students on the scatter diagram opposite.

(b) (i) Explain why the circled point \((38, 0)\) is possibly an outlier. (1)

(ii) Suggest a possible reason for this result. (1)

Chetna decides to omit the data point \((38, 0)\) and examine the other 15 students' marks.

(c) Find the value of \(\bar{x}\) and the value of \(\bar{y}\) for these 15 students. (3)

(d) (i) Explain why, for these 15 students, \(\sum xy\) is still 15837. (1)

(ii) Show that, for these 15 students, \(\mathrm{S}_{xy} = 1169.8\). (2)

For these 15 students, Chetna calculates \(\mathrm{S}_{xx} = 965.6\) and \(\mathrm{S}_{yy} = 1561.7\), correct to 1 decimal place.

(e) Calculate the product moment correlation coefficient for these 15 students. (2)

(f) Calculate the equation of the line of regression of \(y\) on \(x\) for these 15 students, giving your answer in the form \(y = a + bx\). (4)

The product moment correlation coefficient between \(x\) and \(y\) for all 16 students is 0.746.

(g) Explain how your calculation in part (e) supports Chetna's decision to omit the point \((38, 0)\) before calculating the equation of the linear regression line. (1)

(h) Estimate the mark in the second paper for a student who scored 38 marks in the first paper. (1)

\includegraphics[max width=\textwidth, alt={}]{d3f4450d-60eb-49b6-be1b-d2fcfad0451f-17_1127_1146_301_406}
\includegraphics[max width=\textwidth, alt={}]{d3f4450d-60eb-49b6-be1b-d2fcfad0451f-20_2630_1828_121_121}

# Question 6: mark scheme

## Part (a)
| Answer | Marks | Guidance |
|--------|-------|----------|
| Mean, median, average, marks, results score: on P2 ($y$) is lower than on P1 ($x$) | B1 | One of these 5 terms seen for 1st B1 |
| Spread, dispersion, range, st. dev, var(iance): on P2 is greater than on P1 | B1 | **(2)** |
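
As a quick numerical check of the two comparisons rewarded above, the sketch below computes the gap between the means and compares the standard deviations. The variable names are illustrative only.

```python
# Summary statistics for all 16 students, as given in the question.
mean_x, mean_y = 35.75, 25.75   # paper 1 (x), paper 2 (y)
sd_x, sd_y = 7.79, 11.91

mean_gap = mean_x - mean_y          # how much lower paper 2 is on average
paper2_more_spread = sd_y > sd_x    # True if paper 2 marks vary more

print(f"Paper 2 mean is {mean_gap:.2f} marks lower than paper 1")
print(f"Paper 2 marks more spread out: {paper2_more_spread}")
```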

## Part (b)(i)
| Answer | Marks | Guidance |
|--------|-------|----------|
| e.g. (38, 0) doesn't follow the pattern/trend or out of range of other points, or far from (best fit) line / other points | B1 | Suitable explanation; saying "extreme point" is B0 |
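
One way to make "far from the best-fit line" concrete is sketched below. It is an illustration, not a required method: it assumes the regression line for the other 15 students (parts (c) to (f)) and uses the residual standard deviation with divisor $n - 2$, a measure that is not on the S1 specification, to describe the typical scatter about that line.

```python
import math

# Statistics for the other 15 students, from the question and parts (c) to (d).
n = 15
mean_x, mean_y = 534 / 15, 412 / 15
s_xx, s_yy, s_xy = 965.6, 1561.7, 1169.8

# Regression line of y on x fitted to those 15 students.
b = s_xy / s_xx
a = mean_y - b * mean_x

# Vertical distance (residual) of the circled point from that line.
x0, y0 = 38, 0
residual = y0 - (a + b * x0)

# Typical scatter of the 15 points about the line: residual standard
# deviation with divisor n - 2 (illustrative; not an S1 formula).
ss_res = s_yy - s_xy ** 2 / s_xx
resid_sd = math.sqrt(ss_res / (n - 2))

print(f"residual = {residual:.1f} marks, typical scatter = {resid_sd:.1f} marks")
```

The circled point sits about 30 marks below the line, many times the typical scatter of the other points.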

## Part (b)(ii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| The student was absent when paper 2 was taken | B1 | e.g. teacher didn't mark it, wrongly recorded/plotted **(2)** |

## Part (c)
| Answer | Marks | Guidance |
|--------|-------|----------|
| New $\bar{x} = \frac{35.75 \times 16 - 38}{15} = \frac{534}{15} = \textbf{35.6}$ | M1, A1 | M1 for correct method; list requires $\Sigma x = 534$ and $\div 15$; A1 for 35.6 or e.g. $35\frac{3}{5}$ |
| New $\bar{y} = \frac{25.75 \times 16}{15} = 27.4\dot{6}$ awrt **27.5** (allow $\frac{412}{15}$) | B1 | B1 for awrt 27.5 **(3)** |
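
The part (c) arithmetic can be checked exactly with Python's `fractions.Fraction`, which keeps $\bar{y} = 412/15$ exact instead of rounding it. A minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

n_all = 16
mean_x_all, mean_y_all = Fraction("35.75"), Fraction("25.75")
x_removed, y_removed = 38, 0

# Recover the totals from the means, remove the omitted point, then divide by 15.
sum_x = mean_x_all * n_all - x_removed   # 572 - 38 = 534
sum_y = mean_y_all * n_all - y_removed   # 412 - 0 = 412
new_mean_x = sum_x / (n_all - 1)
new_mean_y = sum_y / (n_all - 1)

print(new_mean_x, float(new_mean_x))  # exact fraction, then decimal
print(new_mean_y, float(new_mean_y))
```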

## Part (d)(i)
| Answer | Marks | Guidance |
|--------|-------|----------|
| New $\sum xy = 15837 - 38 \times 0$ so no change | B1 | For explanation with sight of "$38 \times 0$"; e.g. for (38, 0) or omitted point, $xy = 0$ |

## Part (d)(ii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $S_{xy} = 15837 - \frac{(35.75 \times 16 - 38)(25.75 \times 16)}{15}$ or $-\frac{\text{"534"} \times \text{"412"}}{15}$ or $-\frac{220008}{15}$ | M1 | For a correct expression (can ft their 534 and their 412 if stated in (c)) |
| $= \textbf{1169.8}$ (*) | A1cso | Dependent on M1 with no incorrect working; may be seen in (e) **(3)** |
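
A sketch of the part (d) calculation, using $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ with $n = 15$. The variable names are illustrative.

```python
n = 15
# (d)(i): the omitted point (38, 0) contributes 38 * 0 = 0 to the sum of xy.
sum_xy = 15837 - 38 * 0
# Totals for the 15 students, recovered from the 16-student means.
sum_x = 35.75 * 16 - 38   # 534
sum_y = 25.75 * 16 - 0    # 412

# S_xy = sum(xy) - (sum x)(sum y) / n
s_xy = sum_xy - sum_x * sum_y / n
print(round(s_xy, 1))
```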

## Part (e)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $r = \frac{1169.8}{\sqrt{965.6 \times 1561.7}} = 0.9526079\ldots$ awrt **0.953** | M1, A1 | M1 for correct method (implied by ans = awrt 0.95); A1 for awrt 0.953 **(2)** |
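
The part (e) formula $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ as a short Python check, using the values given in the question.

```python
import math

# S_xx and S_yy as given (1 d.p.); S_xy from part (d)(ii).
s_xx, s_yy, s_xy = 965.6, 1561.7, 1169.8

# Product moment correlation coefficient for the 15 students.
r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))
```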

## Part (f)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $b = \frac{1169.8}{965.6}\ [= 1.21147\ldots]$ | M1 | 1st M1 for correct expression for $b$ |
| $a = \text{"27.5"} - \text{"b"} \times \text{"35.6"}\ [= -15.6618\ldots]$ | M1 | 2nd M1 for correct expression for $a$ (ft means in (c)) |
| $y = -15.7 + 1.21x$ (or $y = -15.6 + 1.21x$); $b$ = awrt **1.2**, $a$ = awrt **$-15.6$ or $-15.7$** | A1, A1 | $a$ and $b$ must be in an $x, y$ equation; 1st A1 for $b$, 2nd A1 for $a$ **(4)** |
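
A sketch of part (f): $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$. For comparison it also computes the intercept obtained when the rounded mean 27.5 is used in place of $412/15$. Variable names are illustrative.

```python
# Values for the 15 students from the question and parts (c) and (d).
s_xx, s_xy = 965.6, 1169.8
mean_x, mean_y = 534 / 15, 412 / 15

b = s_xy / s_xx              # gradient
a = mean_y - b * mean_x      # the line passes through (mean_x, mean_y)
print(f"y = {a:.2f} + {b:.3f}x")

# For comparison: the intercept obtained if the rounded mean 27.5 is used.
a_rounded_mean = 27.5 - b * 35.6
print(round(a, 1), round(a_rounded_mean, 1))
```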

## Part (g)
| Answer | Marks | Guidance |
|--------|-------|----------|
| Value of $r$ increased from 0.746 to 0.953 so points lie closer to a straight line | B1 | Suitable comment e.g. linear relationship **stronger** or stronger linear correlation **(1)** |
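
The value 0.746 for all 16 students can be reproduced from the original summary statistics, assuming $\sigma_x$ and $\sigma_y$ use divisor $n$, so that $S_{xx} = n\sigma_x^2$ and $S_{yy} = n\sigma_y^2$. That assumption is supported by the fact that it gives 0.746. A sketch comparing this with the 15-student value from part (e):

```python
import math

# All 16 students, from the question; sd values assumed to use divisor n.
n = 16
mean_x, mean_y = 35.75, 25.75
sd_x, sd_y = 7.79, 11.91
sum_xy = 15837

# S_xy = sum(xy) - n * mean_x * mean_y, and sqrt(S_xx * S_yy) = n * sd_x * sd_y.
s_xy_16 = sum_xy - n * mean_x * mean_y
r_16 = s_xy_16 / (n * sd_x * sd_y)

# 15 students, from part (e).
r_15 = 1169.8 / math.sqrt(965.6 * 1561.7)
print(round(r_16, 3), round(r_15, 3))
```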

## Part (h)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $y = \text{"1.21..."} \times 38 - \text{"15.66..."}$ or awrt **30** | B1ft | ft for awrt 30 or ft expression using $x = 38$ in their equation (need not be evaluated) **(1)** |
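
Part (h) substitutes $x = 38$ into the 15-student regression line. A minimal sketch; `predict_paper2` is a hypothetical helper name, not part of the question.

```python
# Regression line of y on x for the 15 students (as in part (f)).
s_xx, s_xy = 965.6, 1169.8
mean_x, mean_y = 534 / 15, 412 / 15
b = s_xy / s_xx
a = mean_y - b * mean_x

def predict_paper2(paper1_mark: float) -> float:
    """Hypothetical helper: estimated paper 2 mark for a given paper 1 mark."""
    return a + b * paper1_mark

estimate = predict_paper2(38)
print(round(estimate))  # rounded to the nearest whole mark
```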

**[18 marks total]**