AQA S1 2014 June: Question 3 (11 marks)

Exam Board: AQA
Module: S1 (Statistics 1)
Year: 2014
Session: June
Marks: 11
Topic: Linear regression
Type: Calculate y on x from raw data table
Difficulty: Moderate (-0.8). This is a routine S1 regression question requiring standard calculations (finding the regression line from raw data, making a prediction, recognising extrapolation, computing a residual) with no conceptual challenges. All parts follow textbook procedures: calculate \(S_{xx}\) and \(S_{xy}\) for the line equation, substitute a value, identify extrapolation issues, and compute a residual. The only mild challenge is the data entry and arithmetic; the methods are entirely algorithmic.
Spec: 5.09a Dependent/independent variables; 5.09b Least squares regression: concepts; 5.09c Calculate regression line; 5.09d Linear coding: effect on regression

3 The table shows the body mass index (BMI), \(x\), and the systolic blood pressure (SBP), \(y\) mmHg, for each of a random sample of 10 men, aged between 35 years and 40 years, from a particular population.

| \(x\) | 13 | 23 | 29 | 35 | 17 | 34 | 25 | 20 | 31 | 27 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 103 | 115 | 124 | 126 | 108 | 120 | 113 | 117 | 118 | 119 |

  (a) Calculate the equation of the least squares regression line of \(y\) on \(x\).
  (b) Use your equation to estimate the SBP of a man from this population who is aged 38 years and who has a BMI of 30.
  (c) State why your equation might not be appropriate for estimating the SBP of a man from this population:
    (i) who is aged 38 years and who has a BMI of 45;
    (ii) who is aged 50 years and who has a BMI of 25.
  (d) Find the value of the residual for the point \((20, 117)\).
  (e) The mean of the vertical distances of the 10 points from the regression line calculated in part (a) is 2.71, correct to three significant figures. Comment on the likely accuracy of your estimate in part (b). [1 mark]

# Question 3:

## Part (a)

| Answer/Working | Marks | Guidance |
|---|---|---|
| $\sum x = 254$, $\sum y = 1163$, $\bar{x} = 25.4$, $\bar{y} = 116.3$ | B1 | May be implied |
| $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 6924 - \frac{254^2}{10} = 472.4$ | M1 | Correct formula used |
| $S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 29942 - \frac{254 \times 1163}{10} = 401.8$ | M1 | Correct formula used |
| $b = \frac{S_{xy}}{S_{xx}} = \frac{401.8}{472.4} = 0.8505...$ | A1 | Accept 0.851 |
| $a = \bar{y} - b\bar{x} = 116.3 - 0.8505... \times 25.4 = 94.69...$ | A1 | Accept 94.7 |
| $\hat{y} = 94.7 + 0.851x$ | | Accept awrt |
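
The part (a) working can be checked numerically from the raw data. The following is a minimal sketch in plain Python that applies the same $S_{xx}$ and $S_{xy}$ formulas to the table in the question; the variable names are illustrative only and are not part of the mark scheme.

```python
# Raw data from the question: BMI (x) and SBP in mmHg (y) for the 10 men.
x = [13, 23, 29, 35, 17, 34, 25, 20, 31, 27]
y = [103, 115, 124, 126, 108, 120, 113, 117, 118, 119]
n = len(x)

# Summary statistics built directly from the table.
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# S_xx and S_xy using the formulas quoted in the mark scheme.
s_xx = sum_xx - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

b = s_xy / s_xx                  # gradient of y on x
a = sum_y / n - b * sum_x / n    # intercept: y-bar minus b times x-bar

print(f"sum x = {sum_x}, sum y = {sum_y}")
print(f"S_xx = {s_xx:.1f}, S_xy = {s_xy:.1f}")
print(f"y-hat = {a:.1f} + {b:.3f} x")
```

Working from the raw values rather than typed-in totals guards against the data-entry slips that are the main source of lost marks in this part.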

## Part (b)

| Answer/Working | Marks | Guidance |
|---|---|---|
| $\hat{y} = 94.7 + 0.851(30) = 120.2...$ (mmHg) | M1 | Substituting $x = 30$ into their equation |
| $\approx 120$ | A1 | Accept answers from correct working |
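
The substitution in part (b) can be reproduced with a short script. This sketch assumes a hypothetical helper, `fit_line`, that refits the least squares line from the question's data so that the prediction uses unrounded coefficients.

```python
def fit_line(x, y):
    """Return (intercept a, gradient b) of the least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Raw data from the question: BMI (x) and SBP in mmHg (y).
x = [13, 23, 29, 35, 17, 34, 25, 20, 31, 27]
y = [103, 115, 124, 126, 108, 120, 113, 117, 118, 119]

a, b = fit_line(x, y)

# Part (b): estimated SBP for a BMI of 30 (inside the observed BMI range).
sbp_at_30 = a + b * 30
print(f"Estimated SBP at BMI 30: {sbp_at_30:.1f} mmHg")
```

Carrying full precision through to the final step and rounding only the answer avoids the small drift that comes from substituting into a rounded equation.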

## Part (c)

| Answer/Working | Marks | Guidance |
|---|---|---|
| **(i)** BMI of 45 is outside the range of the data (extrapolation) | B1 | Must mention BMI/x value out of range |
| **(ii)** Age 50 is outside the range of the data / equation only applies to men aged 35–40 | B1 | Must mention age out of range |
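
Both answers in part (c) amount to checking whether a new case lies inside the ranges covered by the sample. A minimal sketch of that check follows; the function name `within_sample` and the `AGE_RANGE` constant are assumptions introduced for illustration, with the age limits taken from the question's wording.

```python
# Observed BMI values from the question's table.
x = [13, 23, 29, 35, 17, 34, 25, 20, 31, 27]

# The sample consisted of men aged between 35 and 40 years.
AGE_RANGE = (35, 40)


def within_sample(bmi, age):
    """True only if both BMI and age lie inside the ranges that were sampled."""
    bmi_ok = min(x) <= bmi <= max(x)
    age_ok = AGE_RANGE[0] <= age <= AGE_RANGE[1]
    return bmi_ok and age_ok


print(within_sample(30, 38))  # part (b): both values inside the sample ranges
print(within_sample(45, 38))  # part (c)(i): BMI 45 exceeds the largest observed BMI
print(within_sample(25, 50))  # part (c)(ii): age 50 is outside 35 to 40
```

A `False` result does not mean the line is wrong for that case, only that the data give no evidence that the fitted relationship holds there.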

## Part (d)

| Answer/Working | Marks | Guidance |
|---|---|---|
| Predicted value: $\hat{y} = 94.7 + 0.851(20) = 111.7...$ | M1 | Substituting $x = 20$ |
| Residual $= 117 - 111.7... = 5.3$ (awrt) | A1 | Accept $\approx 5.29$ |
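
The residual in part (d) is the observed value minus the fitted value at the same $x$. The sketch below recomputes it from the raw data, using illustrative variable names and unrounded coefficients.

```python
# Raw data from the question: BMI (x) and SBP in mmHg (y).
x = [13, 23, 29, 35, 17, 34, 25, 20, 31, 27]
y = [103, 115, 124, 126, 108, 120, 113, 117, 118, 119]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Residual = observed y minus predicted y for the point (20, 117).
x0, y0 = 20, 117
fitted = a + b * x0
residual = y0 - fitted
print(f"fitted = {fitted:.2f}, residual = {residual:.2f}")
```

The positive sign shows that this man's observed SBP lies above the regression line.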

## Part (e)

| Answer/Working | Marks | Guidance |
|---|---|---|
| Mean vertical distance of 2.71 mmHg is small compared with the SBP values (about 120), so the points lie close to the line and the estimate is likely to be fairly accurate/reliable | B1 | Must relate the mean distance to accuracy of estimate in part (b) |
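
The figure of 2.71 quoted in part (e) can be reproduced as the mean absolute residual over the 10 points. The following sketch, with illustrative names only, computes it and compares it with the part (b) estimate.

```python
# Raw data from the question: BMI (x) and SBP in mmHg (y).
x = [13, 23, 29, 35, 17, 34, 25, 20, 31, 27]
y = [103, 115, 124, 126, 108, 120, 113, 117, 118, 119]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Vertical distance of each point from the line = absolute residual.
distances = [abs(yi - (a + b * xi)) for xi, yi in zip(x, y)]
mean_distance = sum(distances) / n

# Compare the typical distance with the size of the part (b) estimate.
estimate = a + b * 30
print(f"mean vertical distance = {mean_distance:.2f} mmHg")
print(f"as a fraction of the estimate: {mean_distance / estimate:.1%}")
```

A typical deviation of roughly 2% of the estimated SBP supports the comment that the estimate is likely to be fairly accurate.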