| Exam Board | Edexcel |
|---|---|
| Module | S1 (Statistics 1) |
| Year | 2019 |
| Session | June |
| Marks | 13 |
| Topic | Bivariate data |
| Type | Use regression line for prediction |
| Difficulty | Moderate -0.8 This is a standard S1 regression question requiring routine application of formulas for correlation coefficient, regression line equation, and interpretation. All necessary summary statistics are provided, requiring only substitution into standard formulas with no problem-solving or novel insight needed. Slightly easier than average due to the straightforward computational nature. |
| Spec | 5.08a Pearson correlation: calculate pmcc; 5.08b Linear coding: effect on pmcc |
| Answer | Marks | Guidance |
|---|---|---|
| \([\sum y = 16 \times 20.5 = 328] \quad S_{yy} = 8266 - \frac{328^2}{16}\) | M1 | For an attempt at a correct expression for \(S_{yy}\) (ft their 328 provided intention is \(\sum y\)) |
| \(= 1542\) (allow awrt 1540) | A1 | For 1542 (allow awrt 1540; it leads to \(r = -0.83788\ldots\) and scores 2nd A0) |
| \([r =] \frac{-630.9}{\sqrt{368.16 \times \text{"1542"}}}\) | M1 | For a correct expression for r (ft their \(S_{yy}\) but use of 8266 is M0 here) |
| \(= -0.837336\ldots \quad \text{awrt } \mathbf{-0.837}\) | A1 | For awrt – 0.837 (ans only 4/4; awrt – 0.838 M1A1M1A0; – 0.84 M1A0M1A0) |
| As the distance from the hospital increases the percentage of referrals decreases (o.e.) e.g. smaller % of patients attend from clinics further away | B1 | For an interpretation of negative correlation in context (just "strong neg correlation" B0) |
| e.g. Points close to a straight line (of negative gradient) so does support belief | B1 | For "points close to a straight line" and stating does support manager's belief or allow "r is close to – 1" or "strong (negative) correlation" and supports manager's claim or for a curve drawn on scatter diagram and comment that non-linear model may be better |
| \(b = \frac{-630.9}{368.16} \; [= -1.7136\ldots]\) | M1 | For a correct expression for b |
| \(a = 20.5 - (\text{"}-1.7136\ldots\text{"}) \times 8.1 \; [= 34.3806\ldots]\) | M1 | For a correct expression for a (ft their value of b or even letter b in correct formula) |
| \(y = 34.4 - 1.71x\) | A1, A1 | 1st A1 (dep on 1st M1) for b = awrt – 1.71 in an equation in y and x (no fractions); 2nd A1 (dep on 2nd M1) for a = awrt 34.4 in an equation in y and x |
| \([\text{On average}]\) each km further from the hospital reduces the % attendance by 1.7% | B1 | For a comment with their b (<0) relating distance from hospital to % attendance/referrals Allow "as distance increases by 1 the % referrals decreases by 1.7" (o.e.) |
| Correct line drawn on scatter diagram (use overlay within guidelines) | B1 | For drawing the line on scatter diagram (within guidelines of overlay; check both graphs) |
| Correct point circled (3.2,19) [Allow coords stated instead of point circled but if both, prioritise circled point] | B1 | For correct point on scatter diagram circled (more than one point circled is B0) |
**Notes:**
**(a)** 1st M1 for an attempt at a correct expression for $S_{yy}$ (ft their 328 provided intention is $\sum y$)
1st A1 for 1542 (allow awrt 1540; it leads to $r = -0.83788\ldots$ and scores 2nd A0)
2nd M1 for a correct expression for r (ft their $S_{yy}$ but use of 8266 is M0 here)
2nd A1 for awrt – 0.837 (ans only 4/4; awrt – 0.838 M1A1M1A0; – 0.84 M1A0M1A0)
**(b)** B1 for an interpretation of negative correlation **in context** (just "strong neg correlation" B0)
**(c)** B1 for "points close to a straight line" **and** stating does support manager's belief
or allow "r is close to – 1" or "strong (negative) correlation" and supports manager's claim
or for a curve drawn on scatter diagram and comment that non-linear model may be better
**(d)** 1st M1 for a correct expression for b
2nd M1 for a correct expression for a (ft their value of b or even letter b in correct formula)
1st A1 (dep on 1st M1) for b = awrt – 1.71 in an equation in y and x (no fractions)
2nd A1 (dep on 2nd M1) for a = awrt 34.4 in an equation in y and x
**(e)** B1 for a comment with their b (<0) relating distance from hospital to % attendance/referrals
Allow "as distance increases by 1 the % referrals decreases by 1.7" (o.e.)
**(f)** B1 for drawing the line on scatter diagram (within guidelines of overlay; check both graphs)
**(g)** B1 for correct point on scatter diagram circled (more than one point circled is B0)
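
The arithmetic behind parts (a) and (d) can be checked with a short script. This is only a sketch that substitutes the summary statistics quoted in the question into the standard S1 formulas; the variable names are illustrative and are not part of the mark scheme.

```python
import math

# Summary statistics quoted in the question (n = 16 clinics)
n = 16
x_bar, y_bar = 8.1, 20.5
sum_y2 = 8266
s_xx, s_xy = 368.16, -630.9

# Part (a): S_yy = sum(y^2) - (sum y)^2 / n, with sum y = n * y_bar
sum_y = n * y_bar
s_yy = sum_y2 - sum_y**2 / n
r = s_xy / math.sqrt(s_xx * s_yy)

# Part (d): regression line of y on x, y = a + b x
b = s_xy / s_xx
a = y_bar - b * x_bar

print(f"S_yy = {s_yy:.0f}, r = {r:.3f}")
print(f"y = {a:.1f} {b:+.2f}x")
```

Rounded as the mark scheme requires, the values agree with the answers above: $S_{yy} = 1542$, $r$ awrt $-0.837$, $b$ awrt $-1.71$ and $a$ awrt $34.4$.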
\begin{enumerate}
\item Ranpose hospital offers services to a large number of clinics that refer patients to a range of hospitals.\\
The manager at Ranpose hospital took a random sample of 16 clinics and recorded
\end{enumerate}
\begin{itemize}
\item the distance, $x \mathrm {~km}$, of the clinic from Ranpose hospital
\item the percentage, $y \%$, of the referrals from the clinic who attend Ranpose hospital.
\end{itemize}
The data are summarised as
$$\bar { x } = 8.1 \quad \bar { y } = 20.5 \quad \sum y ^ { 2 } = 8266 \quad \mathrm {~S} _ { x x } = 368.16 \quad \mathrm {~S} _ { x y } = - 630.9$$
(a) Find the product moment correlation coefficient for these data.\\
(b) Give an interpretation of your correlation coefficient.
The manager at Ranpose hospital believes that there may be a linear relationship between the distance of a clinic from the hospital and the percentage of the referrals who attend the hospital. She drew the following scatter diagram for these data.\\
\includegraphics[max width=\textwidth, alt={}, center]{9ac7647f-b291-4a64-9518-fa6438a0cc7d-20_1106_926_1133_511}\\
(c) State, giving a reason, whether or not these data support the manager's belief.\\
(1)
\section*{[The summary data and the scatter diagram are repeated below.]}
The data are summarised as
$$\bar { x } = 8.1 \quad \bar { y } = 20.5 \quad \sum y ^ { 2 } = 8266 \quad \mathrm {~S} _ { x x } = 368.16 \quad \mathrm {~S} _ { x y } = - 630.9$$
\includegraphics[max width=\textwidth, alt={}, center]{9ac7647f-b291-4a64-9518-fa6438a0cc7d-22_1118_936_612_504}\\
(d) Find the equation of the regression line of $y$ on $x$, giving your answer in the form
$$y = a + b x$$
(e) Give an interpretation of the gradient of your regression line.\\
(f) Draw your regression line on the scatter diagram.
The manager believes that Ranpose hospital should be attracting an "above average" percentage of referrals from clinics that are less than 5 km from the hospital. She proposes to target one clinic with some extra publicity about the services Ranpose offers.\\
(g) On the scatter diagram circle the point representing the clinic she should target.
\hfill \mbox{\textit{Edexcel S1 2019 Q6 [13]}}