| Exam Board | AQA |
|---|---|
| Module | S1 (Statistics 1) |
| Year | 2011 |
| Session | January |
| Marks | 14 |
| Topic | Linear regression |
| Type | Calculate y on x from raw data table |
| Difficulty | Moderate (-0.3). A straightforward S1 regression question requiring standard calculations (means, Sxx, Sxy, regression line) and interpretation. The arithmetic is tedious but routine, and every part follows a textbook procedure with no novel problem-solving, so it is slightly easier than average. |
| Spec | 5.09a Dependent/independent variables; 5.09b Least squares regression: concepts; 5.09c Calculate regression line; 5.09e Use regression: for estimation in context |

| \(\boldsymbol{x}\) | 0 | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| \(\boldsymbol{y}\) | 31 | 42 | 32 | 58 | 47 | 56 | 79 | 68 | 89 | 95 | 85 |

# Question 5:
## Part (a)
| Answer | Marks | Guidance |
|---|---|---|
| The time taken depends on when Craig leaves home / $x$ (departure time) is controlled/chosen by Craig, whereas $y$ (journey time) cannot be controlled | B1 | Accept equivalent explanation that $y$ depends on $x$ |
## Part (b)
| Answer | Marks | Guidance |
|---|---|---|
| $\bar{x} = \frac{275}{11} = 25$, $\bar{y} = \frac{682}{11} = 62$ (exact values) | B1 | Correct means |
| $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$, $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ | M1 | Correct formula used |
| $S_{xx} = 9625 - \frac{275^2}{11} = 2750$ | A1 | Correct $S_{xx}$ |
| $S_{xy} = 20575 - \frac{275 \times 682}{11} = 3525$ (allow correct values) | A1 | Correct $S_{xy}$ |
| $b = \frac{S_{xy}}{S_{xx}} = \frac{3525}{2750} = \frac{141}{110} \approx 1.28$ | A1 | Correct $b$ |
| $a = \bar{y} - b\bar{x} = 62 - \frac{141}{110}(25) \approx 30.0$; $y = 30.0 + 1.28x$ | A1 | Correct equation |
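
The summary statistics can be checked directly from the data table. The following is a minimal Python sketch, not part of the mark scheme; the variable names are illustrative, and it recomputes $S_{xx}$, $S_{xy}$, $b$ and $a$ from the 11 data pairs using the formulas above.

```python
# Data from the question: x = minutes after 7.30 am, y = journey time (minutes).
x = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
y = [31, 42, 32, 58, 47, 56, 79, 68, 89, 95, 85]

n = len(x)
sum_x, sum_y = sum(x), sum(y)

# S_xx = sum(x^2) - (sum x)^2 / n   and   S_xy = sum(xy) - (sum x)(sum y) / n
s_xx = sum(v * v for v in x) - sum_x ** 2 / n
s_xy = sum(u * v for u, v in zip(x, y)) - sum_x * sum_y / n

b = s_xy / s_xx                  # gradient of the line of y on x
a = sum_y / n - b * sum_x / n    # intercept: a = y_bar - b * x_bar

print(f"S_xx = {s_xx:.0f}, S_xy = {s_xy:.0f}")
print(f"y = {a:.1f} + {b:.2f}x")
```

Running this prints `S_xx = 2750, S_xy = 3525` and `y = 30.0 + 1.28x`.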
## Part (c)
| Answer | Marks | Guidance |
|---|---|---|
| $x = 15$ (leaves at 7.45 am, which is 15 minutes after 7.30 am) | M1 | Correct value of $x$ |
| $y = 30.0 + 1.28(15) \approx 49.2$ minutes | M1 | Substitution into regression line |
| Arrives approximately $49$ minutes after 7.45 am, i.e. at approximately 8.34 am | M1 | Correct time calculation |
| From 7.45 am to 9.00 am is 75 minutes, so minutes before 9.00 am $= 75 - 49 = 26$ minutes (approx) | A1 A1 | Correct final answer to nearest minute |
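
The clock arithmetic in this part is easy to get wrong, so it may help to work entirely in minutes after 7.30 am. The sketch below is illustrative only: `fit_line` is a hypothetical helper, not something from the paper, and it refits the line from the data table before estimating the arrival margin.

```python
x = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
y = [31, 42, 32, 58, 47, 56, 79, 68, 89, 95, 85]

def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b x (hypothetical helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys))
    s_xx = sum((u - x_bar) ** 2 for u in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

a, b = fit_line(x, y)

leave = 15      # 7.45 am is 15 minutes after 7.30 am
deadline = 90   # 9.00 am is 90 minutes after 7.30 am

journey = a + b * leave                       # estimated journey time in minutes
minutes_early = deadline - (leave + journey)  # margin before 9.00 am

print(round(journey, 1), round(minutes_early))
```

With the data above, the estimated journey time is about 49.2 minutes and the margin rounds to 26 minutes.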
## Part (d)(i)
| Answer | Marks | Guidance |
|---|---|---|
| $y = 30.0 + 1.28(85) \approx 139$ minutes | B1 | Correct answer (allow values from their equation) |
## Part (d)(ii)
| Answer | Marks |
|---|---|
| Statistical reason: $x = 85$ is outside the range of the data (extrapolation) | B1 |
| Contextual reason: e.g. Craig's journey time is unlikely to increase linearly / traffic conditions may differ significantly for such a late departure | B1 |
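
The statistical reason can be made concrete by comparing $x = 85$ with the observed range of $x$. This is a small sketch under stated assumptions: the coefficients are taken as $b = 3525/2750$ and $a = 62 - 25b$, the values obtained from the data table, and the variable names are illustrative.

```python
x = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

# Assumed coefficients: b = S_xy / S_xx and a = y_bar - b * x_bar from the data table.
b = 3525 / 2750
a = 62 - 25 * b

x_new = 85                    # 85 minutes after 7.30 am, i.e. leaving at 8.55 am
estimate = a + b * x_new      # estimated journey time in minutes

# The line is only supported by the data for min(x) <= x <= max(x).
is_extrapolation = not (min(x) <= x_new <= max(x))

print(round(estimate), is_extrapolation)
```

The estimate is roughly 139 minutes, and the flag is `True` because 85 lies well beyond the largest observed value of 50.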
5 Craig uses his car to travel regularly from his home to the area hospital for treatment. He leaves home at $x$ minutes after 7.30 am and then takes $y$ minutes to arrive at the hospital's reception desk.
His results for 11 mornings are shown in the table.
\begin{center}
\begin{tabular}{ | c | c | c | c | c | c | c | c | c | c | c | c | }
\hline
$\boldsymbol { x }$ & 0 & 5 & 10 & 15 & 20 & 25 & 30 & 35 & 40 & 45 & 50 \\
\hline
$\boldsymbol { y }$ & 31 & 42 & 32 & 58 & 47 & 56 & 79 & 68 & 89 & 95 & 85 \\
\hline
\end{tabular}
\end{center}
\begin{enumerate}[label=(\alph*)]
\item Explain why the time taken by Craig between leaving home and arriving at the hospital's reception desk is the response variable.
\item Calculate the equation of the least squares regression line of $y$ on $x$, writing your answer in the form $y = a + b x$.
\item On a particular day, Craig needs to arrive at the hospital's reception desk no later than 9.00 am. He leaves home at 7.45 am.
Estimate the number of minutes before 9.00 am that Craig will arrive at the hospital's reception desk. Give your answer to the nearest minute.
\item \begin{enumerate}[label=(\roman*)]
\item Use your equation to estimate $y$ when $x = 85$.
\item Give one statistical reason and one reason based on the context of this question as to why your estimate in part (d)(i) is unlikely to be realistic.
\end{enumerate}\end{enumerate}
\hfill \mbox{\textit{AQA S1 2011 Q5 [14]}}