OCR S1 2012 January — Question 2 10 marks

Exam Board: OCR
Module: S1 (Statistics 1)
Year: 2012
Session: January
Marks: 10
Topic: Bivariate data
Type: Distinguish dependent and independent variables
Difficulty: Easy (-1.8). This is a straightforward S1 question testing basic definitions and standard formula application. Part (i) requires simple recall of what 'independent variable' means, part (ii) is direct substitution into the PMCC formula, and part (iii) involves a standard regression line calculation plus a comment on interpolation versus extrapolation. All are routine textbook exercises with no problem-solving or novel insight required.
Spec: 5.08a Pearson correlation: calculate pmcc; 5.09a Dependent/independent variables; 5.09c Calculate regression line; 5.09d Linear coding: effect on regression

2 In an experiment, the percentage sand content, \(y\), of soil in a given region was measured at nine different depths, \(x \mathrm{~cm}\), taken at intervals of 6 cm from 0 cm to 48 cm. The results are summarised below.

$$n = 9 \quad \Sigma x = 216 \quad \Sigma x^{2} = 7344 \quad \Sigma y = 512.4 \quad \Sigma y^{2} = 30595 \quad \Sigma xy = 10674$$

  (i) State, with a reason, which variable is the independent variable.
  (ii) Calculate the product moment correlation coefficient between \(x\) and \(y\).
  (iii) (a) Calculate the equation of the appropriate regression line.
    (b) This regression line is used to estimate the percentage sand content at depths of 25 cm and 100 cm. Comment on the reliability of each of these estimates. You are not asked to find the estimates.
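
As a quick sanity check on the summary data, the depths described in the question (0 cm to 48 cm in 6 cm steps) can be rebuilt and their sums compared with the quoted \(n\), \(\Sigma x\) and \(\Sigma x^2\). This is only an illustrative sketch; the variable names are assumptions, and the \(y\) values cannot be checked this way because the raw sand percentages are not given.

```python
# Depths implied by the question: 0 cm to 48 cm in 6 cm steps (assumed from the wording).
depths = list(range(0, 49, 6))

n = len(depths)                       # number of depths measured
sum_x = sum(depths)                   # compare with the quoted sum of x
sum_x2 = sum(d * d for d in depths)   # compare with the quoted sum of x squared

print(n, sum_x, sum_x2)
```

The three printed values can be compared directly with \(n = 9\), \(\Sigma x = 216\) and \(\Sigma x^2 = 7344\) from the question.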

## Question 2:

### Part (i)
| Answer | Marks | Guidance | Additional guidance |
|--------|-------|----------|---------------------|
| $x$, because values (or depths) are fixed (or controlled or chosen or predetermined or manipulated or given) oe; because they can be changed or it is changed or because it is not measured ie not "read off" oe; or because we change the values ourselves | B1 [1] | Allow "because it goes up in intervals" or "because it is taken at set intervals"; Ignore all else; NB "$x$ is changed" B1, but "$x$ changes" B0 | NOT: $x$ as values are constant; $x$ as $y$ depends on $x$; $x$ as % sand depends on depth; Depth as not affected by % sand content; $x$ as it is not dependent; $x$ because $y$ is measured; $x$ because it changes; $y$ which is the depth and this is controlled |

### Part (ii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $S_{xx} = 7344 - \frac{216^2}{9}$ $(= 2160)$ | | |
| $S_{yy} = 30595 - \frac{512.4^2}{9}$ $(= 1422.36)$ | | |
| $S_{xy} = 10674 - \frac{216 \times 512.4}{9}$ $(= -1623.6)$ | M1 | correct substitution in any $S$ formula |
| $r = \frac{-1623.6}{\sqrt{2160 \times 1422.36}}$ | M1 | correct substitution in all $S$s and in $r$ |
| $= -0.926$ (3 sf) | A1 [3] | |
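
The substitutions in the table above can be reproduced numerically from the summary statistics. The following is a minimal sketch rather than part of the mark scheme; the variable names are illustrative assumptions.

```python
import math

# Summary statistics quoted in the question.
n = 9
sum_x, sum_x2 = 216, 7344
sum_y, sum_y2 = 512.4, 30595
sum_xy = 10674

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Product moment correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(s_xx, 2), round(s_yy, 2), round(s_xy, 2), round(r, 3))
```

The rounded values should agree with the $S_{xx}$, $S_{yy}$, $S_{xy}$ and $r$ entries in the table.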

### Part (iii)(a)
| Answer | Marks | Guidance | Additional guidance |
|--------|-------|----------|---------------------|
| $b = \frac{-1623.6}{2160}$ or $-0.75\ldots$ or $-\frac{451}{600}$ | M1 | ft $S_{xy}$ and $S_{xx}$ from (ii) | If ans to (i) is $y$, and $x$ on $y$ found here: $b' = \frac{-1623.6}{1422.36}$ $(= -1.14)$ M1 |
| $y - \frac{512.4}{9} = \text{"-}0.75\ldots\text{"}(x - \frac{216}{9})$ | M1 | or $a = \frac{512.4}{9} - 0.75\ldots \times (-\frac{216}{9})$ or $\frac{5623}{75}$ | $x - \frac{216}{9} = \text{"-}1.14\text{"}(y - \frac{512.4}{9})$ M1 |
| $y = -0.75x + 75(.0)$ (2 sf) or $y = -\frac{451}{600}x + \frac{5623}{75}$ | A1 [3] | 2 sf is enough; Allow $y = -0.75x + (-75)$ | $x = -1.14y + 89(.0)$ A1 |
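
The regression line of $y$ on $x$ can be checked the same way, using $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$. This is a sketch under the assumption that the $S$ values from part (ii) are carried forward; the variable names are illustrative.

```python
# Values carried forward from part (ii) and the question.
n = 9
sum_x, sum_y = 216, 512.4
s_xx, s_xy = 2160, -1623.6

# Gradient and intercept of the regression line of y on x.
b = s_xy / s_xx
x_bar = sum_x / n
y_bar = sum_y / n
a = y_bar - b * x_bar

print(f"y = {b:.2f}x + {a:.1f}")
```

To two significant figures this matches the mark scheme's $y = -0.75x + 75$.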

### Part (iii)(b)
| Answer | Marks | Guidance | Additional guidance |
|--------|-------|----------|---------------------|
| $r$ close to $-1$ (or high or strong), $\lvert r \rvert$ close to 1 | B1 | Allow strong or good or high correlation or relationship etc | NOT strong neg correlation; Award even if comment linked to 100 instead of 25; BUT: "$r$ close to -1, so unreliable": B0 |
| 25 within range of data oe, so reliable | B1 | or … so more reliable | |
| 100 outside range of data oe, so unreliable | B1 [3] | or … so less reliable | or 100 gives neg %age; If (ii) $\lvert r \rvert < 0.7$: poor corr'n oe B1f; 25 unreliable B1f; 100 unreliable B1f |
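
The interpolation versus extrapolation argument can be illustrated with a short check. This is a sketch only: the helper names `within_data_range` and `predict` are hypothetical, the coefficients are the exact fractions from part (iii)(a), and the question itself does not ask for the estimates to be found.

```python
# Range of depths actually sampled in the experiment (cm).
X_MIN, X_MAX = 0, 48

# Regression coefficients from part (iii)(a), as exact fractions.
A = 5623 / 75     # intercept
B = -451 / 600    # gradient


def within_data_range(depth_cm):
    """Return True if the depth lies inside the sampled range (interpolation)."""
    return X_MIN <= depth_cm <= X_MAX


def predict(depth_cm):
    """Estimated percentage sand content from the regression line."""
    return A + B * depth_cm


for depth in (25, 100):
    kind = "interpolation" if within_data_range(depth) else "extrapolation"
    print(depth, kind, round(predict(depth), 1))
```

For 100 cm the line gives a negative percentage, which is the "100 gives neg %age" point noted in the additional guidance.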

---
2 In an experiment, the percentage sand content, $y$, of soil in a given region was measured at nine different depths, $x \mathrm{~cm}$, taken at intervals of 6 cm from 0 cm to 48 cm. The results are summarised below.

$$n = 9 \quad \Sigma x = 216 \quad \Sigma x ^ { 2 } = 7344 \quad \Sigma y = 512.4 \quad \Sigma y ^ { 2 } = 30595 \quad \Sigma x y = 10674$$
\begin{enumerate}[label=(\roman*)]
\item State, with a reason, which variable is the independent variable.
\item Calculate the product moment correlation coefficient between $x$ and $y$.
\item (a) Calculate the equation of the appropriate regression line.\\
(b) This regression line is used to estimate the percentage sand content at depths of 25 cm and 100 cm. Comment on the reliability of each of these estimates. You are not asked to find the estimates.
\end{enumerate}

\hfill \mbox{\textit{OCR S1 2012 Q2 [10]}}