OCR S1 2005 January, Question 9 (15 marks)

Exam Board: OCR
Module: S1 (Statistics 1)
Year: 2005
Session: January
Marks: 15
Topic: Linear regression
Type: Calculate regression line then predict
Difficulty: Standard +0.3. This is a standard S1 regression question requiring routine application of formulas (calculating the gradient and intercept, making predictions, working with residuals). Part (i) uses the given summaries with standard formulas; parts (ii) to (v) involve straightforward arithmetic and recall of residual properties. Although the question is multi-part with several calculations, each step follows textbook procedures without requiring problem-solving insight or novel approaches.
Spec: 5.09a Dependent/independent variables; 5.09b Least squares regression: concepts; 5.09c Calculate regression line; 5.09d Linear coding: effect on regression

9 Five observations of bivariate data produce the following results, denoted as $(x_i, y_i)$ for $i = 1, 2, 3, 4, 5$.

$$\begin{aligned}
& (13, 2.7) \\
& \left[ \Sigma x = 90,\ \Sigma y = 15.0,\ \Sigma x^2 = 1720,\ \Sigma y^2 = 46.86,\ \Sigma xy = 264.0 \right]
\end{aligned}$$

(i) Show that the regression line of $y$ on $x$ has gradient $-0.06$, and find its equation in the form $y = a + bx$.

(ii) The regression line is used to estimate the value of $y$ corresponding to $x = 20$, but the value $x = 20$ is accurate only to the nearest whole number. Calculate the difference between the largest and the smallest values that the estimated value of $y$ could take.

The numbers $e_1, e_2, e_3, e_4, e_5$ are defined by

$$e_i = a + b x_i - y_i \quad \text{for } i = 1, 2, 3, 4, 5.$$

(iii) The values of $e_1$, $e_2$ and $e_3$ are $0.6$, $-0.7$ and $0.2$ respectively. Calculate the values of $e_4$ and $e_5$.

(iv) Calculate the value of $e_1^2 + e_2^2 + e_3^2 + e_4^2 + e_5^2$ and explain the relevance of this quantity to the regression line found in part (i).

(v) Find the mean and the variance of $e_1, e_2, e_3, e_4, e_5$.

# Question 9:

## Part (i)
| Answer | Mark | Guidance |
|--------|------|----------|
| $\frac{264 - \frac{90 \times 15}{5}}{1720 - \frac{90^2}{5}}$ or $\frac{264 - 5 \times 18 \times 3}{1720 - 5 \times 18^2}$ | M1 | Formula correctly used |
| $= -0.06$ AG | A1 | $-0.06$ correctly obtained |
| $y - \frac{15}{5} = -0.06\left(x - \frac{90}{5}\right)$ | M1 | or $a = \frac{15}{5} - (-0.06) \times \frac{90}{5}$ |
| $y = 4.08 - 0.06x$ | A1 | 4 marks, complete equation correct |

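The part (i) working can be checked numerically. The following is a minimal sketch rather than part of the mark scheme: it uses exact fractions to avoid rounding, and the function name `regression_y_on_x` is made up for this illustration.

```python
from fractions import Fraction

# Summary statistics quoted in the question (n = 5 observations).
n = 5
sum_x, sum_y = Fraction(90), Fraction(15)
sum_x2, sum_xy = Fraction(1720), Fraction(264)


def regression_y_on_x(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the least-squares line y = a + b x."""
    s_xy = sum_xy - sum_x * sum_y / n  # 264 - 270 = -6
    s_xx = sum_x2 - sum_x ** 2 / n     # 1720 - 1620 = 100
    b = s_xy / s_xx                    # gradient
    a = sum_y / n - b * sum_x / n      # line passes through (x-bar, y-bar)
    return a, b


a, b = regression_y_on_x(n, sum_x, sum_y, sum_x2, sum_xy)
print(float(a), float(b))  # 4.08 -0.06
```

The intercept follows from the fact that the regression line passes through the mean point $(18, 3)$, which is the second M1 step in the table above.
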
## Part (ii)
| Answer | Mark | Guidance |
|--------|------|----------|
| Substitute $x = 20.5\ (y = 2.85)$ | M1 | Allow $20\ (y = 2.88)$ or $20.49$ |
| Substitute $x = 19.5\ (y = 2.91)$ | M1 | |
| $2.91 - 2.85 = 0.06$ | A1 | 3 marks, answer $0.06$ or $-0.06$, c.w.d |

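As a quick check of part (ii), the sketch below evaluates the line from part (i) at the two ends of the rounding interval for $x = 20$. The helper name `predict` is an assumption made for this example.

```python
from fractions import Fraction

# Regression line from part (i): y = 4.08 - 0.06 x, held as exact fractions.
a, b = Fraction(102, 25), Fraction(-3, 50)


def predict(x):
    """Estimated y for a given x on the line y = a + b x."""
    return a + b * x


# x = 20 to the nearest whole number means 19.5 <= x < 20.5.
x_low, x_high = Fraction(39, 2), Fraction(41, 2)

y_max = predict(x_low)   # gradient is negative, so the smallest x gives the largest y
y_min = predict(x_high)
spread = y_max - y_min

print(float(y_max), float(y_min), float(spread))  # 2.91 2.85 0.06
```

Because the interval for $x$ has width 1, the spread equals the magnitude of the gradient, $0.06$.
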
## Part (iii)
| Answer | Mark | Guidance |
|--------|------|----------|
| $-0.6$ | B1 | $-0.6$ correct |
| $0.5$ | B1 | 2 marks, $0.5$ correct |

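The answers to part (iii) can be sanity-checked with a short sketch. Two things are assumed here: that the single data pair $(13, 2.7)$ shown in the question is the first observation (its residual does match the quoted $e_1 = 0.6$), and the standard result that the residuals of a least-squares line sum to zero. The name `residual` is illustrative only.

```python
from fractions import Fraction as F

# Regression line from part (i): y = 4.08 - 0.06 x.
a, b = F(102, 25), F(-3, 50)


def residual(x, y):
    """e = a + b x - y, the sign convention defined in the question."""
    return a + b * x - y


# Assumption: (13, 2.7) is observation 1.
e1_check = residual(F(13), F(27, 10))

# e1..e3 are given in the question; e4, e5 are the mark-scheme answers.
e = [F(6, 10), F(-7, 10), F(2, 10), F(-6, 10), F(5, 10)]

print(float(e1_check))  # 0.6
print(sum(e))           # 0
```
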
## Part (iv)
| Answer | Mark | Guidance |
|--------|------|----------|
| $1.5$ | B1 | |
| Calculated equation minimises this quantity | B1 | 2 marks, not "Low value for $\Sigma e^2$ means points near line" |

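The claim that the calculated line minimises $\Sigma e^2$ can be illustrated from the summary statistics alone, since $\Sigma (a + bx - y)^2$ expands into terms involving only $n$, $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma y^2$ and $\Sigma xy$. The sketch below is a hedged illustration; the function name `sse` and the perturbation size are choices made for this example.

```python
from fractions import Fraction as F

# Summary statistics from the question.
n = 5
sum_x, sum_y = F(90), F(15)
sum_x2, sum_y2, sum_xy = F(1720), F(4686, 100), F(264)


def sse(a, b):
    """Sum of squared residuals for y = a + b x, expanded in terms of the summaries."""
    return (n * a * a + b * b * sum_x2 + sum_y2
            + 2 * a * b * sum_x - 2 * a * sum_y - 2 * b * sum_xy)


a, b = F(102, 25), F(-3, 50)                           # line from part (i)
e = [F(6, 10), F(-7, 10), F(2, 10), F(-6, 10), F(5, 10)]  # residuals from part (iii)

best = sse(a, b)
from_residuals = sum(v * v for v in e)
print(float(best), float(from_residuals))  # 1.5 1.5

# Nudging either coefficient away from the fitted values increases the total.
delta = F(1, 100)
for da, db in [(delta, 0), (-delta, 0), (0, delta), (0, -delta)]:
    print(float(sse(a + da, b + db)) > float(best))  # True each time
```
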
## Part (v)
| Answer | Mark | Guidance |
|--------|------|----------|
| $\bar{e} = \Sigma e_i / 5$ | M1 | $\Sigma e_i / 5$ used |
| $= 0$ | A1 | Answer $0$, cwd, cao |
| $\Sigma e_i^2 / 5 - (\text{candidate's } \bar{e})^2$ | M1 | $\Sigma e_i^2 / 5$ |
| $= 0.3$ | A1 | 4 marks, $0.3$ only, must see $-0^2$ or $-0$ in variance, i.e. with no working: $\bar{e} = 0$ earns M1A1; Var $= 0.3$ earns M1A0 |
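
The part (v) values can be reproduced with a few lines. This sketch assumes the divisor-$n$ form of the variance, $\Sigma e_i^2 / n - \bar{e}^2$, which is the form shown in the table above.

```python
from fractions import Fraction as F

# Residuals: e1..e3 from the question, e4 and e5 from part (iii).
e = [F(6, 10), F(-7, 10), F(2, 10), F(-6, 10), F(5, 10)]
n = len(e)

mean_e = sum(e) / n                                # 0 / 5
var_e = sum(v * v for v in e) / n - mean_e ** 2    # 1.5 / 5 - 0^2

print(float(mean_e), float(var_e))  # 0.0 0.3
```

The subtraction of $\bar{e}^2$ changes nothing numerically here, but the mark scheme requires it to be shown.
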