| Exam Board | OCR MEI |
|---|---|
| Module | Further Statistics Major |
| Year | 2020 |
| Session | November |
| Marks | 13 |
| Topic | Linear regression |
| Type | Identify response/explanatory variables |
| Difficulty | Moderate -0.3. This is a straightforward application of standard linear regression techniques from A-level Further Maths Statistics. Part (a) tests understanding of experimental design (counterbalancing), part (b) is a routine calculation of a regression line from summary statistics, parts (c) and (d) involve standard interpolation and extrapolation with commentary on reliability, and part (e) requires identifying an outlier from a scatter diagram. All parts are textbook exercises requiring recall and direct application rather than problem-solving or novel insight. Slightly easier than average because of the step-by-step scaffolding and lack of conceptual depth. |
| Spec | 5.09a Dependent/independent variables; 5.09b Least squares regression: concepts; 5.09c Calculate regression line; 5.09d Linear coding: effect on regression |
Question 5:
5 | (a) | If all did the same test first, the experience gained in the
first test might affect their performance in the second
test | E1
E1
[2] | 3.4
2.4 | Allow 1 mark for ‘To avoid bias’ or ‘because one test might affect
the other test’ oe
5 | (b) | $\bar{x} = 12.375$, $\bar{y} = 11.79375$
$b = \dfrac{S_{xy}}{S_{xx}} = \dfrac{2554.87 - (198.0 \times 188.7 / 16)}{2936.92 - 198.0^{2} / 16} = \dfrac{219.7075}{486.67} = 0.4515$
For correct line ($y$ on $x$)
Hence the regression line equation is:
$y - \bar{y} = b(x - \bar{x})$
$\Rightarrow y - 11.79375 = 0.4515(x - 12.375)$
$\Rightarrow y = 0.4515x + 6.207$ | M1
A1
B1
M1
A1
[5] | 1.1a
1.1
1.1
3.3
1.1 | For attempt at gradient (b)
For 0.4515 cao
For equation of line
FT their b
5 | (c) | Prediction for 12 is 11.6
Prediction for 25 is 17.5 | B1
B1
[2] | 3.4
1.1 | FT Allow B1B0 if answers given to
more than 2 dp
FT
5 | (d) | Because the points do not lie very close to the line, the
first prediction is only moderately reliable.
The second prediction is rather less reliable because in
addition it is extrapolation. | E1
E1
[2] | 2.2a
2.4 | Allow 1 mark for either not very
close to line and so not very reliable
or for second value is extrapolation
so unreliable.
5 | (e) | Coordinates (18, 1.9)
The expert should check whether this data item is
genuine and if not then remove it from the analysis | B1
E1
[2] | 1.1
3.5c | Allow y-coordinate between 1.7 and 2
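
The working for parts (b) and (c) can be checked numerically. The following is a minimal sketch, not part of the mark scheme: it uses only the summary statistics quoted in the question, and the function name `regression_from_sums` is a hypothetical helper introduced here for illustration.

```python
# Summary statistics quoted in the question (n = 16 participants).
n = 16
sum_x, sum_y = 198.0, 188.7
sum_xx, sum_xy = 2936.92, 2554.87


def regression_from_sums(n, sum_x, sum_y, sum_xx, sum_xy):
    """Return (gradient, intercept) of the least squares line of y on x.

    Hypothetical helper: applies b = S_xy / S_xx and a = y_bar - b * x_bar.
    """
    s_xy = sum_xy - sum_x * sum_y / n   # S_xy
    s_xx = sum_xx - sum_x ** 2 / n      # S_xx
    b = s_xy / s_xx                     # gradient
    a = sum_y / n - b * sum_x / n       # intercept
    return b, a


b, a = regression_from_sums(n, sum_x, sum_y, sum_xx, sum_xy)
print(f"y = {b:.4f} x + {a:.3f}")

# Part (c): estimates for laboratory-based scores of 12 and 25.
for x in (12, 25):
    print(f"x = {x}: predicted y = {a + b * x:.1f}")
```

Rounded as in the mark scheme, this should reproduce the gradient 0.4515, the intercept 6.207, and the estimates 11.6 and 17.5. The code does not assess reliability: whether an estimate is an interpolation or an extrapolation still has to be judged against the scatter diagram.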
5 A hearing expert is investigating whether web-based hearing tests can be used instead of hearing tests in a hearing laboratory. The expert selects a random sample of 16 people with normal hearing. Each of them is given two hearing tests, one in the laboratory and one web-based. The scores in the laboratory-based test, $x$, and the web-based test, $y$, are both measured in the same suitable units.
\begin{enumerate}[label=(\alph*)]
\item Half of the participants do the laboratory-based test first and the other half do the web-based test first.
Explain why the expert adopts this approach.
The scatter diagram in Fig. 5 shows the data that the expert collected.
\begin{figure}[h]
\begin{center}
\includegraphics[alt={},max width=\textwidth]{8d36bc92-07ac-40c3-9e75-26f2bc9d2fcc-05_785_1360_1009_242}
\captionsetup{labelformat=empty}
\caption{Fig. 5}
\end{center}
\end{figure}
Summary statistics for these data are as follows.
$$\Sigma x = 198.0 \quad \Sigma x ^ { 2 } = 2936.92 \quad \Sigma y = 188.7 \quad \Sigma y ^ { 2 } = 2605.35 \quad \Sigma x y = 2554.87$$
\item Calculate the equation of the regression line suitable for estimating web-based scores from laboratory-based scores.
\item Estimate the web-based scores of people whose laboratory-based scores were as follows.
\begin{itemize}
\item 12
\item 25
\end{itemize}
\item Comment on the reliability of each of your estimates.
\item A colleague of the expert suggests that the regression line is not valid because one of the data values is an outlier.
Stating the approximate coordinates of the outlier, suggest what the expert should do.
\end{enumerate}
\hfill \mbox{\textit{OCR MEI Further Statistics Major 2020 Q5 [13]}}