OCR Further Statistics AS 2023 June, Question 3 (8 marks)

Exam Board: OCR
Module: Further Statistics AS
Year: 2023
Session: June
Marks: 8
Topic: Linear regression
Type: Calculate regression line then predict
Difficulty: Standard +0.3. This is a standard linear regression question requiring routine application of the formulas for variance, the regression line, and prediction. Part (d) adds mild interpretation, using the standard deviation to assess the data range, and part (e) requires a comment on correlation strength; both are straightforward for Further Maths students. The calculations are mechanical, with no novel problem-solving required.
Spec: 5.08a Pearson correlation: calculate PMCC; 5.09a Dependent/independent variables; 5.09b Least squares regression: concepts; 5.09c Calculate regression line

3 An insurance company collected data concerning the age, \(x\) years, of policy holders and the average size of claim, \(\pounds y\) thousand. The data is summarised as follows.

\(n = 32 \quad \sum x = 1340 \quad \sum y = 612 \quad \sum x^{2} = 64282 \quad \sum y^{2} = 13418 \quad \sum xy = 27794\)

(a) Find the variance of \(x\).

(b) Find the equation of the regression line of \(y\) on \(x\).

(c) Hence estimate the expected size of claim from a policy holder of age 48.

Tom is aged 48. He claims that the range of the data probably does not include people of his age because the mean age for the data is 41.875, and 48 is not close to this.

(d) Use your answer to part (a) to determine how likely it is that Tom's claim is correct.

(e) Comment on the reliability of your estimate in part (c). You should refer to the value of the product-moment correlation coefficient for the data, which is 0.579 correct to 3 significant figures.

## Question 3:

### Part (a)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $64282/32 - (1340/32)^2 = 255(.297)$ | B1[1] | Awrt 255. Allow $263.53$ from $n/(n-1)$. Don't give ISW for $\sqrt{255}$ |

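The variance in part (a) can be checked directly from the summary statistics. The following is a minimal Python sketch, not part of the mark scheme; the variable names `var_population` and `var_sample` are illustrative. It computes the divisor-$n$ variance that the scheme expects, together with the divisor-$(n-1)$ version that the guidance also allows.

```python
# Summary statistics quoted in the question.
n = 32
sum_x = 1340
sum_x2 = 64282

mean_x = sum_x / n                          # 41.875
var_population = sum_x2 / n - mean_x ** 2   # divisor n
var_sample = var_population * n / (n - 1)   # divisor n - 1

print(round(var_population, 3))  # 255.297
print(round(var_sample, 2))      # 263.53
```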
### Part (b)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $y = 8.02 + 0.265(2)x\quad \left[\frac{131039}{16339} + \frac{4333}{16339}x\right]$ | B2[2] | Coefficients exact or correct to 3 sf, allow 8.03, letters correct. One error: B1 |

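The exact fractions quoted in the answer to part (b) can be reproduced with rational arithmetic. This is a sketch under the usual least-squares formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$; the names `s_xx` and `s_xy` are illustrative rather than taken from the mark scheme.

```python
from fractions import Fraction

# Summary statistics quoted in the question.
n = 32
sum_x, sum_y = 1340, 612
sum_x2, sum_xy = 64282, 27794

# Corrected sums of squares and products, kept as exact fractions.
s_xx = Fraction(sum_x2) - Fraction(sum_x ** 2, n)
s_xy = Fraction(sum_xy) - Fraction(sum_x * sum_y, n)

b = s_xy / s_xx                                  # gradient
a = Fraction(sum_y, n) - b * Fraction(sum_x, n)  # intercept

print(a, b)                                  # 131039/16339 4333/16339
print(round(float(a), 2), round(float(b), 4))  # 8.02 0.2652
```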
### Part (c)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $8.02 + 0.2652 \times 48 = \pounds 20\,700\ (3\ \text{sf})\ (20749)$ | B1[1] | Awrt 20700 (not 20.7) or in range [20740, 20750]. Ignore absence of £. NB: can be obtained from calculator even if **(b)** is wrong; B1 for this |

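The prediction in part (c) is a single substitution into the regression line, followed by a conversion from thousands of pounds to pounds. The sketch below is illustrative only. It evaluates the line with the exact coefficients and again with coefficients rounded to 3 sf, which suggests, as an assumption about the scheme's intent, why a range of answers is accepted.

```python
# Exact regression coefficients from part (b); y is in thousands of pounds.
a_exact = 131039 / 16339
b_exact = 4333 / 16339
age = 48

claim_exact = (a_exact + b_exact * age) * 1000  # convert to pounds
claim_3sf = (8.02 + 0.265 * age) * 1000         # coefficients rounded to 3 sf

print(round(claim_exact))  # 20749
print(round(claim_3sf))    # 20740
```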
### Part (d)
| Answer | Marks | Guidance |
|--------|-------|----------|
| SD is $\sqrt{255} \approx 16$ and 48 is only about 6 away from $\bar{x}$ (well under one SD), so extremely likely that range includes 48 | B1 B1[2] | Relevant calculation, e.g. $1340/32 \pm 2\sqrt{255}$, or difference is $0.383\sigma$. SD or variance mentioned and nuanced conclusion e.g. "very likely that Tom is wrong" or more extreme, but not "Tom is wrong". SC: Only variance mentioned: max (B0)B1 |

#### Part (d) – further examples:
| Response | Marks |
|----------|-------|
| (A) The standard deviation is $\approx 16$, so Tom is likely to be right | B0 |
| (B) Variance is large so very likely that Tom is wrong *(SC – but not "variance is very large so results inaccurate")* | B0B1 |
| (C) Less than 2 SD above mean, so Tom is incorrect *(B1, but not nuanced so B0)* | B1B0 |
| (D) Variance is large so results vary a lot, so likely to be data above 48, so unlikely that Tom's claim is correct | B0B1 |
| (E) Less than one standard deviation away from mean [consistent with **(a)**], so Tom is very unlikely to be right *(minimum for B1B1)* | B1B1 |

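The "relevant calculation" for part (d) amounts to expressing Tom's age as a number of standard deviations from the mean. A minimal sketch follows; the interval $\bar{x} \pm 2\sigma$ is used only as the rough guide suggested in the guidance, not as a formal test, and the variable names are illustrative.

```python
import math

# Summary statistics quoted in the question.
n = 32
sum_x = 1340
sum_x2 = 64282
tom_age = 48

mean_x = sum_x / n
sd_x = math.sqrt(sum_x2 / n - mean_x ** 2)

# Distance of Tom's age from the mean, in standard deviations.
z = (tom_age - mean_x) / sd_x
print(round(sd_x, 2))  # 15.98
print(round(z, 3))     # 0.383

# Rough guide: ages within two standard deviations of the mean.
low, high = mean_x - 2 * sd_x, mean_x + 2 * sd_x
print(low < tom_age < high)  # True
```

Because 48 lies well inside this interval, the data very probably include ages around 48, which is the nuanced conclusion the scheme rewards.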
### Part (e)
| Answer | Marks | Guidance |
|--------|-------|----------|
| (48 almost certainly within range but) correlation only moderate so not very reliable | M1 A1[2] | Comment on size of PMCC, allow comparison with CV. Nuanced conclusion, but *not* from "significant evidence of correlation". OE (a significance test asks "is there evidence that $\rho > 0$?", but here the issue is "how close is $\rho$ to $\pm 1$?", so a significance test is irrelevant) |

#### Part (e) – further examples:
| Response | Marks |
|----------|-------|
| (F) PMCC shows quite strong correlation and probably within range, so reliable | M1A0 |
| (G) PMCC shows quite strong correlation so fairly reliable | M1A1 |
| (H) Not very reliable as PMCC is low and might be extrapolating | M1A1 |
| (I) Not very reliable as PMCC is low | M1A1 |

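The value of the product-moment correlation coefficient quoted in part (e) can be confirmed from the same summary statistics using $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$. The sketch below is illustrative. It also prints $r^2$, which for a simple linear regression gives the proportion of the variation in $y$ accounted for by the line; this is one informal way to see why the correlation counts as only moderate, not something the mark scheme requires.

```python
import math

# Summary statistics quoted in the question.
n = 32
sum_x, sum_y = 1340, 612
sum_x2, sum_y2, sum_xy = 64282, 13418, 27794

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))       # 0.579
print(round(r ** 2, 3))  # 0.335
```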
---
3 An insurance company collected data concerning the age, $x$ years, of policy holders and the average size of claim, $\pounds y$ thousand. The data is summarised as follows.\\
$n = 32 \quad \sum x = 1340 \quad \sum y = 612 \quad \sum x ^ { 2 } = 64282 \quad \sum y ^ { 2 } = 13418 \quad \sum x y = 27794$
\begin{enumerate}[label=(\alph*)]
\item Find the variance of $x$.
\item Find the equation of the regression line of $y$ on $x$.
\item Hence estimate the expected size of claim from a policy holder of age 48.

Tom is aged 48. He claims that the range of the data probably does not include people of his age because the mean age for the data is 41.875, and 48 is not close to this.
\item Use your answer to part (a) to determine how likely it is that Tom's claim is correct.
\item Comment on the reliability of your estimate in part (c). You should refer to the value of the product-moment correlation coefficient for the data, which is 0.579 correct to 3 significant figures.
\end{enumerate}

\hfill \mbox{\textit{OCR Further Statistics AS 2023 Q3 [8]}}