Question 5 - A-Level Maths

OCR MEI Further Statistics Major 2020 November — Question 5 13 marks

Exam Board	OCR MEI
Module	Further Statistics Major
Year	2020
Session	November
Marks	13
Topic	Linear regression
Type	Identify response/explanatory variables
Difficulty	Moderate -0.3 This is a straightforward application of standard linear regression techniques from A-level Further Maths Statistics. Part (a) tests understanding of experimental design (counterbalancing), part (b) is routine calculation of regression line from summary statistics, parts (c-d) involve standard interpolation/extrapolation with commentary on reliability, and part (e) requires identifying an outlier from a scatter diagram. All parts are textbook exercises requiring recall and direct application rather than problem-solving or novel insight. Slightly easier than average due to the step-by-step scaffolding and lack of conceptual depth.
Spec	5.09a Dependent/independent variables 5.09b Least squares regression: concepts 5.09c Calculate regression line 5.09d Linear coding: effect on regression

5 A hearing expert is investigating whether web-based hearing tests can be used instead of hearing tests in a hearing laboratory. The expert selects a random sample of 16 people with normal hearing. Each of them is given two hearing tests, one in the laboratory and one web-based. The scores in the laboratory-based test, $x$, and the web-based test, $y$, are both measured in the same suitable units.

(a) Half of the participants do the laboratory-based test first and the other half do the web-based test first. Explain why the expert adopts this approach.

The scatter diagram in Fig. 5 shows the data that the expert collected.

\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{8d36bc92-07ac-40c3-9e75-26f2bc9d2fcc-05_785_1360_1009_242}
\captionsetup{labelformat=empty}
\caption{Fig. 5}
\end{figure}

Summary statistics for these data are as follows.

$$\Sigma x = 198.0 \quad \Sigma x ^ { 2 } = 2936.92 \quad \Sigma y = 188.7 \quad \Sigma y ^ { 2 } = 2605.35 \quad \Sigma x y = 2554.87$$

(b) Calculate the equation of the regression line suitable for estimating web-based scores from laboratory-based scores.

(c) Estimate the web-based scores of people whose laboratory-based scores were 12 and 25.

(d) Comment on the reliability of each of your estimates.

(e) A colleague of the expert suggests that the regression line is not valid because one of the data values is an outlier. Stating the approximate coordinates of the outlier, suggest what the expert should do.

Mark scheme

Question 5:
Question | Part | Answer | Marks | AO | Guidance
5 | (a) | If all did the same test first, the experience gained in the first test might affect their performance in the second test. | E1, E1 [2] | 3.4, 2.4 | Allow 1 mark for 'To avoid bias' or 'because one test might affect the other test' oe
5 | (b) | $\bar{x} = 12.375$, $\bar{y} = 11.79375$. $b = \dfrac{S_{xy}}{S_{xx}} = \dfrac{2554.87 - (198.0 \times 188.7 / 16)}{2936.92 - 198.0^{2} / 16} = \dfrac{219.7075}{486.67} = 0.4515$. For correct line ($y$ on $x$). Hence regression line equation is: $y - \bar{y} = b(x - \bar{x})$ $\Rightarrow$ $y - 11.79375 = 0.4515(x - 12.375)$ $\Rightarrow$ $y = 0.4515x + 6.207$ | M1, A1, B1, M1, A1 [5] | 1.1a, 1.1, 1.1, 3.3, 1.1 | For attempt at gradient ($b$). For 0.4515 cao. For equation of line. FT their $b$
5 | (c) | Prediction for 12 is 11.6. Prediction for 25 is 17.5 | B1, B1 [2] | 3.4, 1.1 | FT for both. Allow B1B0 if answers given to more than 2 dp
5 | (d) | Because the points do not lie very close to the line, the first prediction is only moderately reliable. The second prediction is rather less reliable because in addition it is extrapolation. | E1, E1 [2] | 2.2a, 2.4 | Allow 1 mark for either 'not very close to line and so not very reliable' or 'second value is extrapolation so unreliable'
5 | (e) | Coordinates (18, 1.9). The expert should check whether this data item is genuine and if not then remove it from the analysis. | B1, E1 [2] | 1.1, 3.5c | Allow y-coordinate between 1.7 and 2
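
The working for parts (b) and (c) can be checked numerically. The following is a minimal Python sketch, not part of the mark scheme. It assumes only the summary statistics quoted in the question and n = 16, and the function name `regression_y_on_x` is a hypothetical helper introduced for illustration.

```python
# Summary statistics quoted in the question (n = 16 participants).
n = 16
sum_x, sum_xx = 198.0, 2936.92
sum_y = 188.7
sum_xy = 2554.87


def regression_y_on_x(n, sum_x, sum_xx, sum_y, sum_xy):
    """Return (gradient b, intercept a) of the least-squares line y = a + b x."""
    s_xy = sum_xy - sum_x * sum_y / n  # S_xy
    s_xx = sum_xx - sum_x ** 2 / n     # S_xx
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n      # the line passes through (x-bar, y-bar)
    return b, a


b, a = regression_y_on_x(n, sum_x, sum_xx, sum_y, sum_xy)
print(f"y = {b:.4f} x + {a:.3f}")

# Part (c): predicted web-based scores for laboratory scores of 12 and 25.
for x in (12, 25):
    print(f"x = {x}: predicted y = {a + b * x:.1f}")
```

Carrying the unrounded gradient through to the intercept is what gives 6.207. Substituting the rounded value 0.4515 into the intercept calculation gives about 6.206 instead.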

LaTeX source

5 A hearing expert is investigating whether web-based hearing tests can be used instead of hearing tests in a hearing laboratory. The expert selects a random sample of 16 people with normal hearing. Each of them is given two hearing tests, one in the laboratory and one web-based. The scores in the laboratory-based test, $x$, and the web-based test, $y$, are both measured in the same suitable units.
\begin{enumerate}[label=(\alph*)]
\item Half of the participants do the laboratory-based test first and the other half do the web-based test first.

Explain why the expert adopts this approach.

The scatter diagram in Fig. 5 shows the data that the expert collected.

\begin{figure}[h]
\begin{center}
  \includegraphics[alt={},max width=\textwidth]{8d36bc92-07ac-40c3-9e75-26f2bc9d2fcc-05_785_1360_1009_242}
\captionsetup{labelformat=empty}
\caption{Fig. 5}
\end{center}
\end{figure}

Summary statistics for these data are as follows.

$$\Sigma x = 198.0 \quad \Sigma x ^ { 2 } = 2936.92 \quad \Sigma y = 188.7 \quad \Sigma y ^ { 2 } = 2605.35 \quad \Sigma x y = 2554.87$$
\item Calculate the equation of the regression line suitable for estimating web-based scores from laboratory-based scores.
\item Estimate the web-based scores of people whose laboratory-based scores were as follows.

\begin{itemize}
  \item 12
  \item 25
\end{itemize}
\item Comment on the reliability of each of your estimates.
\item A colleague of the expert suggests that the regression line is not valid because one of the data values is an outlier.

Stating the approximate coordinates of the outlier, suggest what the expert should do.
\end{enumerate}

\hfill \mbox{\textit{OCR MEI Further Statistics Major 2020 Q5 [13]}}
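
The comment on reliability in part (d) rests on how closely the points follow the line, which the question judges from the scatter diagram. As a rough numerical cross-check, the product-moment correlation coefficient can be computed from the same summary statistics. This is a sketch under stated assumptions, not something the question or mark scheme asks for: it assumes the quoted sums cover all 16 points, including the suspected outlier.

```python
from math import sqrt

# Summary statistics quoted in the question (n = 16 participants).
n = 16
sum_x, sum_xx = 198.0, 2936.92
sum_y, sum_yy = 188.7, 2605.35
sum_xy = 2554.87

s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Product-moment correlation coefficient between x and y.
r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

The value comes out at roughly 0.51, a moderate positive correlation. That is consistent with describing the interpolated estimate as only moderately reliable.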
