| Exam Board | OCR MEI |
|---|---|
| Module | Further Statistics Major |
| Year | 2021 |
| Session | November |
| Marks | 16 |
| Topic | Linear regression |
| Type | Hypothesis test for zero correlation |
| Difficulty | Standard +0.3. This is a straightforward application of standard linear regression concepts from Further Statistics: substituting into a regression line, interpreting r², finding the intersection of two regression lines (the mean point), calculating the pmcc from summary statistics and performing a routine hypothesis test for zero correlation. All parts require direct application of learned techniques, with no novel problem-solving or proof required. |
| Spec | 5.08a Pearson correlation: calculate pmcc; 5.09c Calculate regression line; 5.09d Linear coding: effect on regression |
Question 8:

| Question | Answer | Marks | AO | Guidance |
|---|---|---|---|---|
| 8(a)(i) | Predicted = 50.5 | B1 [1] | 1.1 | Do not allow answer to more than 2 dp |
| 8(a)(ii) | Although this point lies within the data (interpolation), the points do not lie too close to the line and the value of $r^2$ is not too close to 1, so the estimate is only moderately reliable | B1 | 2.2a | Mention of 1 of the three points |
| | | B1 [2] | 3.5b | Mention of at least 2 points with correct conclusion |
| 8(a)(iii) | Coordinates (47.3, 48.7) | B1 [1] | 1.1 | |
| 8(a)(iv) | This is the point whose coordinates are the means of the $x$- and $y$-values respectively | B1 [1] | 1.1 | Allow ‘This is the centroid’ |

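
Parts (a)(iii) and (a)(iv) rest on the fact that the regression line of $y$ on $x$ and the regression line of $x$ on $y$ both pass through $(\bar{x}, \bar{y})$. The $y$-on-$x$ equation for Fig. 8.1 appears only in the figure, so the Python sketch below checks the fact numerically on the part (b) summary statistics instead; the variable names are illustrative choices, not part of the question. It also confirms that the mark-scheme point (47.3, 48.7) satisfies $x = 0.7565y + 10.493$ to 1 decimal place.

```python
# Part (b) summary statistics: t = best marathon time (hours), v = VO2max (units as in the question)
n = 20
sum_t, sum_v = 80.37, 970.86
sum_t2, sum_v2, sum_tv = 324.71, 47829.24, 3886.53

t_bar, v_bar = sum_t / n, sum_v / n
s_tt = sum_t2 - sum_t ** 2 / n
s_vv = sum_v2 - sum_v ** 2 / n
s_tv = sum_tv - sum_t * sum_v / n

# Regression line of v on t: v = a + b*t
b = s_tv / s_tt
a = v_bar - b * t_bar
# Regression line of t on v: t = c + d*v
d = s_tv / s_vv
c = t_bar - d * v_bar

# Substitute v = a + b*t into t = c + d*v and solve for t (valid because b*d = r^2 < 1)
t_meet = (c + d * a) / (1 - b * d)
v_meet = a + b * t_meet
print(f"lines meet at ({t_meet:.4f}, {v_meet:.4f}); means are ({t_bar:.4f}, {v_bar:.4f})")

# Part (a) check: put y = 48.7 into the given x-on-y line x = 0.7565y + 10.493
x_from_line = 0.7565 * 48.7 + 10.493
print(f"x on the x-on-y line at y = 48.7: {x_from_line:.2f}")
```

Algebraically, the substitution gives $t(1 - bd) = c + ad$, and $c + ad = \bar{t}(1 - bd)$, so the solution is $t = \bar{t}$ whenever $bd = r^2 \neq 1$.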
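
| Question | Answer | Marks | AO | Guidance |
|---|---|---|---|---|
| 8(b)(i) | The scatter diagram is very roughly elliptical, so the distribution may be bivariate Normal | E1 | 3.5a | |
| | | E1 [2] | 2.4 | |
| 8(b)(ii) | $S_{tv} = 3886.53 - \frac{1}{20} \times 80.37 \times 970.86 \; (= -14.87\ldots)$ | M1 | 1.1a | Numerical evaluations are not required at this stage |
| | $S_{tt} = 324.71 - \frac{1}{20} \times 80.37^2 \; (= 1.743\ldots)$ and $S_{vv} = 47829.24 - \frac{1}{20} \times 970.86^2 \; (= 700.78\ldots)$ | M1 | 1.1 | For either $S_{tt}$ or $S_{vv}$ |
| | $r = \dfrac{S_{tv}}{\sqrt{S_{tt} S_{vv}}} = \dfrac{-14.87}{\sqrt{1.743 \times 700.78}}$ | M1 | 3.3 | For general form including square root |
| | $= -0.4255$ | A1 [4] | 1.1 | BC |
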
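
The method marks for (b)(ii) follow the standard route from summary statistics: $S_{tt} = \sum t^2 - (\sum t)^2/n$, $S_{vv} = \sum v^2 - (\sum v)^2/n$, $S_{tv} = \sum tv - \sum t \sum v / n$ and $r = S_{tv}/\sqrt{S_{tt} S_{vv}}$. The short Python sketch below repeats that arithmetic using only the sums given in the question; it is a check on the working, not part of the expected written answer (the A1 is awarded by calculator).

```python
import math

# Summary statistics for the 20 runners, as given in part (b)
n = 20
sum_t, sum_v = 80.37, 970.86
sum_t2, sum_v2, sum_tv = 324.71, 47829.24, 3886.53

# Corrected sums of squares and products
s_tv = sum_tv - sum_t * sum_v / n   # about -14.87
s_tt = sum_t2 - sum_t ** 2 / n      # about 1.743
s_vv = sum_v2 - sum_v ** 2 / n      # about 700.78

# Pearson's product moment correlation coefficient
r = s_tv / math.sqrt(s_tt * s_vv)
print(f"S_tv = {s_tv:.4f}, S_tt = {s_tt:.4f}, S_vv = {s_vv:.4f}, r = {r:.4f}")
```

The three corrected sums agree with the bracketed values in the mark scheme, and rounding `r` to 4 decimal places gives −0.4255.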

| Question | Answer | Marks | AO | Guidance |
|---|---|---|---|---|
| 8(b)(iii) | $H_0: \rho = 0$, $H_1: \rho < 0$ | B1 | 3.3 | For both hypotheses |
| | where $\rho$ is the population pmcc between $t$ and $v$ | B1 | 2.5 | For defining $\rho$ |
| | For $n = 20$, the 5% critical value is 0.3783 | B1 | 3.4 | For correct critical value |
| | Since $\lvert -0.4255 \rvert > 0.3783$ the result is significant, so there is sufficient evidence to reject $H_0$ | M1 | 1.1 | For comparison and conclusion. Allow $-0.4255 < -0.3783$ |
| | There is sufficient evidence at the 5% level to suggest that there is negative correlation between marathon time and $\mathrm{VO}_{2\,\mathrm{max}}$ | A1FT [5] | 2.2b | FT for conclusion in words |

Additional guidance for (b)(iii): do not allow $r$ in place of $\rho$; hypotheses in words only get B1 unless the population is mentioned; the answer must be in context.

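
The critical value 0.3783 in (b)(iii) is read from tables. As a sketch, assuming the bivariate Normal model that (b)(i) argues for, it can be reproduced from the null distribution of $r$: when $\rho = 0$ the density of $r$ for a sample of size $n$ is $(1 - r^2)^{(n-4)/2} / B\!\left(\tfrac{1}{2}, \tfrac{n-2}{2}\right)$ on $-1 < r < 1$. The Python code below integrates that density with Simpson's rule and uses bisection to find the one-tailed 5% point. The functions `pmcc_tail_prob` and `critical_value` are hypothetical helpers written for this illustration, not from any library.

```python
import math


def pmcc_tail_prob(c: float, n: int, steps: int = 2000) -> float:
    """Hypothetical helper: P(R >= c) when rho = 0, for a bivariate Normal sample of size n >= 4.

    Under H0 the density of r is (1 - r^2)^((n - 4)/2) / B(1/2, (n - 2)/2) on (-1, 1);
    this integrates it from c to 1 with composite Simpson's rule (steps must be even).
    """
    k = (n - 4) / 2
    beta = math.exp(math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2))
    h = (1 - c) / steps
    total = 0.0
    for i in range(steps + 1):
        r = c + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 == 1 else 2)
        total += weight * max(0.0, 1 - r * r) ** k  # max() guards against rounding past r = 1
    return total * h / 3 / beta


def critical_value(n: int, alpha: float = 0.05, iterations: int = 50) -> float:
    """Hypothetical helper: the c with P(R >= c) = alpha, found by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if pmcc_tail_prob(mid, n) > alpha:
            lo = mid  # tail area still above alpha, so the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


n = 20
r = -0.4255                      # sample pmcc from part (b)(ii)
crit = critical_value(n)         # one-tailed 5% point for n = 20
p_value = pmcc_tail_prob(-r, n)  # P(R <= r) = P(R >= -r) by symmetry of the null density
reject = r < -crit               # H1: rho < 0, so H0 is rejected in the lower tail
print(f"critical value = {crit:.4f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Working with the exact null density rather than a Normal approximation keeps the sketch accurate for a sample as small as 20; it is equivalent to the usual route via $t = r\sqrt{(n-2)/(1-r^2)}$ on $n - 2$ degrees of freedom.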
8
\begin{enumerate}[label=(\alph*)]
\item $\mathrm{VO}_{2\,\mathrm{max}}$ is a measure of athletic fitness. Since $\mathrm{VO}_{2\,\mathrm{max}}$ is fairly time-consuming and expensive to measure, an exercise scientist wants to predict $\mathrm{VO}_{2\,\mathrm{max}}$ from data such as times for running different distances. The scientist uses these data for a random sample of 15 athletes to predict their $\mathrm{VO}_{2\,\mathrm{max}}$ values, denoted by $y$, in suitable units. She also obtains accurate measurements of the $\mathrm{VO}_{2\,\mathrm{max}}$ values, denoted by $x$, in the same units.
The scatter diagram in Fig. 8.1 shows the values of $x$ and $y$ obtained, together with the equation of the regression line of $y$ on $x$ and the value of $r ^ { 2 }$.
\begin{figure}[h]
\begin{center}
\includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-08_750_1324_660_317}
\captionsetup{labelformat=empty}
\caption{Fig. 8.1}
\end{center}
\end{figure}
\begin{enumerate}[label=(\roman*)]
\item Use the regression line to estimate the predicted $\mathrm{VO}_{2\,\mathrm{max}}$ of an athlete whose accurately measured $\mathrm{VO}_{2\,\mathrm{max}}$ is 50.
\item Comment on the reliability of your estimate.
\item The equation of the regression line of $x$ on $y$ is $x = 0.7565 y + 10.493$.
Find the coordinates of the point at which the two regression lines meet.
\item State what the point you found in part (iii) represents.
\end{enumerate}\item It is known that there is negative correlation between $\mathrm{VO}_{2\,\mathrm{max}}$ and marathon times in very good runners (those whose best marathon times are under 3 hours). The exercise scientist wishes to know whether the same applies to runners who take longer to run a marathon. She selects a random sample of 20 runners whose best marathon times are between $3 \frac { 1 } { 2 }$ hours and $4 \frac { 1 } { 2 }$ hours and accurately measures their $\mathrm{VO}_{2\,\mathrm{max}}$.
Fig. 8.2 is a scatter diagram of accurately measured $\mathrm{VO}_{2\,\mathrm{max}}$, $v$ units, against best marathon time, $t$ hours, for these runners.
\begin{figure}[h]
\begin{center}
\includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-09_671_1064_648_319}
\captionsetup{labelformat=empty}
\caption{Fig. 8.2}
\end{center}
\end{figure}
\begin{enumerate}[label=(\roman*)]
\item Explain why the exercise scientist comes to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.
Summary statistics for the 20 runners are as follows.
$$\sum t = 80.37 \quad \sum v = 970.86 \quad \sum t ^ { 2 } = 324.71 \quad \sum v ^ { 2 } = 47829.24 \quad \sum t v = 3886.53$$
\item Find the value of Pearson's product moment correlation coefficient.
\item Carry out a test at the $5 \%$ significance level to investigate whether there is negative correlation between accurately measured $\mathrm{VO}_{2\,\mathrm{max}}$ and best marathon time for runners whose best marathon times are between $3 \frac { 1 } { 2 }$ hours and $4 \frac { 1 } { 2 }$ hours.
\end{enumerate}\end{enumerate}
\hfill \mbox{\textit{OCR MEI Further Statistics Major 2021 Q8 [16]}}