Edexcel S1 2016 October — Question 4 15 marks

Exam Board: Edexcel
Module: S1 (Statistics 1)
Year: 2016
Session: October
Marks: 15
Topic: Linear regression
Type: Calculate y on x from summary statistics
Difficulty: Moderate (-0.3). This is a standard S1 linear regression question requiring routine calculations of S_ff, S_hh, the correlation coefficient, and the regression line from summary statistics, plus a normal distribution probability at the end. All steps follow textbook procedures with no novel insight required, making it slightly easier than average due to its mechanical nature, though the multiple parts and careful arithmetic keep it close to typical difficulty.
Spec: 2.04e Normal distribution: as model N(mu, sigma^2); 5.04a Linear combinations: E(aX+bY), Var(aX+bY); 5.08a Pearson correlation: calculate pmcc; 5.09b Least squares regression: concepts; 5.09c Calculate regression line; 5.09e Use regression: for estimation in context

4. A doctor is studying the scans of 30-week old foetuses. She takes a random sample of 8 scans and measures the length, \(f \mathrm {~mm}\), of the leg bone called the femur. She obtains the following results.

$$\begin{array} { l l l l l l l l } 52 & 53 & 56 & 57 & 57 & 59 & 60 & 62 \end{array}$$

(a) Show that \(\mathrm { S } _ { f f } = 80\)

The doctor also measures the head circumference, \(h \mathrm {~mm}\), of each foetus and her results are summarised as

$$\sum h = 2209 \quad \sum h ^ { 2 } = 610463 \quad \mathrm {~S} _ { f h } = 182$$

(b) Find \(\mathrm { S } _ { h h }\)

(c) Calculate the product moment correlation coefficient between the length of the femur and the head circumference for these data.

The doctor believes that there is a linear relationship between the length of the femur and the head circumference of 30-week old foetuses.

(d) State, giving a reason, whether or not your calculation in part (c) supports the doctor's belief.

(e) Find an equation of the regression line of \(h\) on \(f\).

The doctor plans in future to measure the femur length, \(f\), and then use the regression line to estimate the corresponding head circumference, \(h\).

A statistician points out that there will always be the chance of an error between the true head circumference and the estimated value of the head circumference.

Given that the error, \(E \mathrm {~mm}\), has the normal distribution \(\mathrm { N } \left( 0,4 ^ { 2 } \right)\)

(f) find the probability that the estimate is within 3 mm of the true value.

| Part | Answer | Marks | Guidance |
| --- | --- | --- | --- |
| **4(a)** | $\sum f = 456$ or $\overline{f} = 57$ and $\sum f^2 = 26\,072$ | M1, A1 | M1 for attempt at $\sum f$ and $\sum f^2$ where $400 < \sum f < 500$ and $\sum f^2 =$ awrt 30 000; 1st A1 for $\sum f = 456$ and $\sum f^2 = 26\,072$ (456 may be implied by 207936 or 25992) |
| | $[S_{ff}] = 26072 - \frac{456^2}{8}$ or $26072 - \frac{207936}{8}$ or $26072 - 25992 = 80$ (*) | A1 cso | 2nd A1cso for correct expression leading to 80 with no incorrect working seen. Allow 3/3 for 80 only or 80 following no incorrect working |
| **4(b)** | $[S_{hh}] = 610\,463 - \frac{2209^2}{8} = 502.875$, awrt 503 | M1, A1 | M1 for correct expression for $S_{hh}$; A1 for awrt 503 (Answer only scores 2/2 and condone $\frac{4023}{8}$) |
| **4(c)** | $r = \frac{182}{\sqrt{80 \times \text{"502.875"}}} = 0.90739\ldots = \text{awrt } 0.907$ | M1, A1 | M1 for correct expression using their value for $S_{hh}$ but 182 and 80 must be correct. For expression M1A0 is common; A1 for awrt 0.907 (Answer only M1A1; awrt 0.91 with no expression M1A0) |
| **4(d)** | $r$ is close to 1 or "strong" (positive) correlation (idea of "strong" required) ... so does support the belief | B1ft | For correct comment that uses their value of $r$ as support, e.g. "yes since strong correlation". For $\lvert r \rvert < 0.5$ allow comment that does not support |
| **4(e)** | $b = \frac{182}{80} = [2.275]$, $a = \frac{2209}{8} - b \times \frac{\text{"456"}}{8} = [276.125 - 129.675 = 146.45]$ | M1, M1 | 1st M1 for correct expression for $b$ or awrt 2.28 (Allow 2.27 here as well); 2nd M1 for correct expression for $a$ ft their value for $b$ and their 456 (using $b = r$ is M0) |
| | So $h = 146 + 2.28f$ [Accept $h = 146 + 2.27f$] | A1A1 | 1st A1 for either awrt 146 or awrt 2.28 correctly placed (so M1M0A1A0 is possible); 2nd A1 for both correct. Accept $h - 276 = 2.28(f - 57)$ [awrt 3 sf]. (Allow 2.27 instead of 2.28 for value of $b$, since $b = 2.275$) [Use of $y$ and $x$ is 2nd A0] |
| **4(f)** | $P(-3 < E < 3)$ or $2P(0 < E < 3) = \{2\} P\left(0 < Z < \frac{3}{4}\right) = 2(0.7734 - 0.5) = 0.5468 =$ awrt 0.547 | M1, M1, A1 | 1st M1 for correct probability statement (either of these two expressions); 2nd M1 for standardising with 3, 0 and 4 (allow $\pm 0.75$ as answer). Ignore ×2 and 0 <… ; A1 for awrt 0.547 [NB M0M1A0 is common] |
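
The arithmetic in parts (a) to (e) of the mark scheme can be checked numerically. The following is a minimal Python sketch, not part of the mark scheme. It assumes the usual S1 formulas $S_{xx} = \sum x^2 - (\sum x)^2 / n$, $r = S_{fh} / \sqrt{S_{ff} S_{hh}}$, $b = S_{fh} / S_{ff}$ and $a = \bar{h} - b\bar{f}$; the variable names are illustrative only.

```python
from math import sqrt

# Femur lengths f (mm), copied from the question
f = [52, 53, 56, 57, 57, 59, 60, 62]
n = len(f)

# Summary statistics for head circumference h (mm), as given in the question
sum_h = 2209
sum_h2 = 610463
S_fh = 182

sum_f = sum(f)
sum_f2 = sum(x * x for x in f)

# (a), (b): S_xx = sum(x^2) - (sum x)^2 / n
S_ff = sum_f2 - sum_f**2 / n
S_hh = sum_h2 - sum_h**2 / n

# (c): product moment correlation coefficient
r = S_fh / sqrt(S_ff * S_hh)

# (e): regression line of h on f, written as h = a + b f
b = S_fh / S_ff
a = sum_h / n - b * sum_f / n

print(f"sum f = {sum_f}, sum f^2 = {sum_f2}")
print(f"S_ff = {S_ff}, S_hh = {S_hh}")
print(f"r = {r:.5f}")
print(f"a = {a:.2f}, b = {b:.3f}")
```

Rounding `a` and `b` to three significant figures gives the line quoted in the mark scheme. Because $b = 2.275$ sits exactly on a rounding boundary, the scheme accepts either 2.27 or 2.28 for the gradient.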

---
\begin{enumerate}
  \item A doctor is studying the scans of 30-week old foetuses. She takes a random sample of 8 scans and measures the length, $f \mathrm {~mm}$, of the leg bone called the femur. She obtains the following results.
\end{enumerate}

$$\begin{array} { l l l l l l l l } 
52 & 53 & 56 & 57 & 57 & 59 & 60 & 62
\end{array}$$

(a) Show that $\mathrm { S } _ { f f } = 80$

The doctor also measures the head circumference, $h \mathrm {~mm}$, of each foetus and her results are summarised as

$$\sum h = 2209 \quad \sum h ^ { 2 } = 610463 \quad \mathrm {~S} _ { f h } = 182$$

(b) Find $\mathrm { S } _ { h h }$\\
(c) Calculate the product moment correlation coefficient between the length of the femur and the head circumference for these data.

The doctor believes that there is a linear relationship between the length of the femur and the head circumference of 30-week old foetuses.\\
(d) State, giving a reason, whether or not your calculation in part (c) supports the doctor's belief.\\
(e) Find an equation of the regression line of $h$ on $f$.

The doctor plans in future to measure the femur length, $f$, and then use the regression line to estimate the corresponding head circumference, $h$.

A statistician points out that there will always be the chance of an error between the true head circumference and the estimated value of the head circumference.

Given that the error, $E \mathrm {~mm}$, has the normal distribution $\mathrm { N } \left( 0,4 ^ { 2 } \right)$\\
(f) find the probability that the estimate is within 3 mm of the true value.

\hfill \mbox{\textit{Edexcel S1 2016 Q4 [15]}}
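
Part (f) asks for $P(-3 < E < 3)$ with $E \sim \mathrm{N}(0, 4^2)$, which can be checked without statistical tables. The sketch below is one way to do this in Python using only the standard library. The helper name `normal_cdf` is hypothetical, introduced here for illustration, and it relies on the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$.

```python
from math import erf, sqrt


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), computed via the error function (illustrative helper)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


sigma = 4.0  # E ~ N(0, 4^2), from the question
tol = 3.0    # the estimate is "within 3 mm" of the true value

# P(-3 < E < 3) = Phi(3/4) - Phi(-3/4)
p = normal_cdf(tol, 0.0, sigma) - normal_cdf(-tol, 0.0, sigma)
print(f"P(-3 < E < 3) = {p:.3f}")
```

A table-based answer uses $\Phi(0.75) = 0.7734$ rounded to four decimal places, so it can differ from this computed value in the fourth decimal place. Both agree to three decimal places.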