OCR MEI Further Statistics Major 2021 November — Question 8 16 marks

Exam Board: OCR MEI
Module: Further Statistics Major
Year: 2021
Session: November
Marks: 16
Topic: Linear regression
Type: Hypothesis test for zero correlation
Difficulty: Standard +0.3. This is a straightforward application of standard linear regression concepts from Further Statistics: substituting into a regression line, interpreting r², finding the intersection of two regression lines (which is the mean point), and performing a routine hypothesis test for zero correlation. All parts require direct application of learned techniques, with no novel problem-solving or proof.
Spec: 5.08a Pearson correlation: calculate pmcc; 5.09c Calculate regression line; 5.09d Linear coding: effect on regression

8
  1. \(\mathrm{VO}_{2\max}\) is a measure of athletic fitness. Since \(\mathrm{VO}_{2\max}\) is fairly time-consuming and expensive to measure, an exercise scientist wants to predict \(\mathrm{VO}_{2\max}\) from data such as times for running different distances. The scientist uses these data for a random sample of 15 athletes to predict their \(\mathrm{VO}_{2\max}\) values, denoted by \(y\), in suitable units. She also obtains accurate measurements of the \(\mathrm{VO}_{2\max}\) values, denoted by \(x\), in the same units. The scatter diagram in Fig. 8.1 shows the values of \(x\) and \(y\) obtained, together with the equation of the regression line of \(y\) on \(x\) and the value of \(r^{2}\). \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-08_750_1324_660_317} \captionsetup{labelformat=empty} \caption{Fig. 8.1}
    \end{figure}
    1. Use the regression line to estimate the predicted \(\mathrm{VO}_{2\max}\) of an athlete whose accurately measured \(\mathrm{VO}_{2\max}\) is 50.
    2. Comment on the reliability of your estimate.
    3. The equation of the regression line of \(x\) on \(y\) is \(x = 0.7565 y + 10.493\). Find the coordinates of the point at which the two regression lines meet.
    4. State what the point you found in part (iii) represents.
  2. It is known that there is negative correlation between \(\mathrm{VO}_{2\max}\) and marathon times in very good runners (those whose best marathon times are under 3 hours). The exercise scientist wishes to know whether the same applies to runners who take longer to run a marathon. She selects a random sample of 20 runners whose best marathon times are between \(3\frac{1}{2}\) hours and \(4\frac{1}{2}\) hours and accurately measures their \(\mathrm{VO}_{2\max}\). Fig. 8.2 is a scatter diagram of accurately measured \(\mathrm{VO}_{2\max}\), \(v\) units, against best marathon time, \(t\) hours, for these runners. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-09_671_1064_648_319} \captionsetup{labelformat=empty} \caption{Fig. 8.2}
    \end{figure}
    1. Explain why the exercise scientist comes to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.

       Summary statistics for the 20 runners are as follows.
       $$\sum t = 80.37 \quad \sum v = 970.86 \quad \sum t^{2} = 324.71 \quad \sum v^{2} = 47829.24 \quad \sum tv = 3886.53$$
    2. Find the value of Pearson's product moment correlation coefficient.
    3. Carry out a test at the \(5\%\) significance level to investigate whether there is negative correlation between accurately measured \(\mathrm{VO}_{2\max}\) and best marathon time for runners whose best marathon times are between \(3\frac{1}{2}\) hours and \(4\frac{1}{2}\) hours.

Question 8:
8 | (a) | (i) | Predicted = 50.5 | B1 [1] | 1.1 | Do not allow answer to more than 2dp
8 | (a) | (ii) | Although this point lies within the data (interpolation), the points do not lie too close to the line and the value of \(r^{2}\) is not too close to 1, so the estimate is only moderately reliable | B1, B1 [2] | 2.2a, 3.5b | Mention of 1 of the three points; mention of at least 2 points with correct conclusion
8 | (a) | (iii) | Coordinates (47.3, 48.7) | B1 [1] | 1.1 |
8 | (a) | (iv) | This is the point whose coordinates are the means of the x- and y-values respectively | B1 [1] | 1.1 | Allow 'This is the centroid'
8 | (b) | (i) | The scatter diagram is very roughly elliptical and so the distribution may be bivariate Normal | E1, E1 [2] | 3.5a, 2.4 |
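8 | (b) | (ii) | \(S_{tv} = 3886.53 - \frac{1}{20} \times 80.37 \times 970.86\) \((= -14.87\ldots)\); \(S_{tt} = 324.71 - \frac{1}{20} \times 80.37^{2}\) \((= 1.743\ldots)\); \(S_{vv} = 47829.24 - \frac{1}{20} \times 970.86^{2}\) \((= 700.78\ldots)\); \(r = \frac{S_{tv}}{\sqrt{S_{tt} S_{vv}}} = \frac{-14.87}{\sqrt{1.743 \times 700.78}} = -0.4255\) | M1, M1, M1, A1 [4] | 1.1a, 1.1, 3.3, 1.1 | Numerical evaluations are not required at this stage; for either \(S_{tt}\) or \(S_{vv}\); for general form including sq. root; BC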
8 | (b) | (iii) | \(H_{0}: \rho = 0\), \(H_{1}: \rho < 0\), where \(\rho\) is the population pmcc between \(t\) and \(v\). For \(n = 20\), the 5% critical value is 0.3783. Since \(|-0.4255| > 0.3783\) the result is significant, so there is sufficient evidence to reject \(H_{0}\). There is sufficient evidence at the 5% level to suggest that there is negative correlation between marathon time and \(\mathrm{VO}_{2\max}\) | B1, B1, B1, M1, A1FT [5] | 3.3, 2.5, 3.4, 1.1, 2.2b | For both hypotheses; for defining \(\rho\); for correct critical value; for comparison and conclusion (allow \(-0.4255 < -0.3783\)); FT for conclusion in words | Do not allow \(r\) in place of \(\rho\); hypotheses in words only get B1 unless population mentioned; answer must be in context
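
As a hedged check on part (a)(iii): the regression line of y on x appears only in Fig. 8.1, so its coefficients are written below as placeholder symbols \(a\) and \(b\) (assumed notation, not values from the paper). The sketch shows the general substitution and then confirms that the mark-scheme point satisfies the x on y line quoted in the question.

```latex
% y on x: y = a + b x   (a, b are placeholders for the values shown in Fig. 8.1)
% x on y: x = 0.7565 y + 10.493   (given in the question)
\begin{align*}
y &= a + b\,(0.7565\,y + 10.493)
  \quad\Longrightarrow\quad
  y = \frac{a + 10.493\,b}{1 - 0.7565\,b}, \\
x &= 0.7565\,y + 10.493 .
\end{align*}
% Check of the mark-scheme point (47.3, 48.7) against the x on y line:
\[
0.7565 \times 48.7 + 10.493 \approx 47.33 \approx 47.3 .
\]
```

Both least-squares regression lines pass through \((\bar{x}, \bar{y})\), which is why the intersection in part (iii) is the mean point named in part (iv).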
8
\begin{enumerate}[label=(\alph*)]
\item $\mathrm{VO}_{2\max}$ is a measure of athletic fitness. Since $\mathrm{VO}_{2\max}$ is fairly time-consuming and expensive to measure, an exercise scientist wants to predict $\mathrm{VO}_{2\max}$ from data such as times for running different distances. The scientist uses these data for a random sample of 15 athletes to predict their $\mathrm{VO}_{2\max}$ values, denoted by $y$, in suitable units. She also obtains accurate measurements of the $\mathrm{VO}_{2\max}$ values, denoted by $x$, in the same units.

The scatter diagram in Fig. 8.1 shows the values of $x$ and $y$ obtained, together with the equation of the regression line of $y$ on $x$ and the value of $r ^ { 2 }$.

\begin{figure}[h]
\begin{center}
  \includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-08_750_1324_660_317}
\captionsetup{labelformat=empty}
\caption{Fig. 8.1}
\end{center}
\end{figure}
\begin{enumerate}[label=(\roman*)]
\item Use the regression line to estimate the predicted $\mathrm{VO}_{2\max}$ of an athlete whose accurately measured $\mathrm{VO}_{2\max}$ is 50.
\item Comment on the reliability of your estimate.
\item The equation of the regression line of $x$ on $y$ is $x = 0.7565 y + 10.493$.

Find the coordinates of the point at which the two regression lines meet.
\item State what the point you found in part (iii) represents.
\end{enumerate}
\item It is known that there is negative correlation between $\mathrm{VO}_{2\max}$ and marathon times in very good runners (those whose best marathon times are under 3 hours). The exercise scientist wishes to know whether the same applies to runners who take longer to run a marathon. She selects a random sample of 20 runners whose best marathon times are between $3\frac{1}{2}$ hours and $4\frac{1}{2}$ hours and accurately measures their $\mathrm{VO}_{2\max}$.

Fig. 8.2 is a scatter diagram of accurately measured $\mathrm{VO}_{2\max}$, $v$ units, against best marathon time, $t$ hours, for these runners.

\begin{figure}[h]
\begin{center}
  \includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-09_671_1064_648_319}
\captionsetup{labelformat=empty}
\caption{Fig. 8.2}
\end{center}
\end{figure}
\begin{enumerate}[label=(\roman*)]
\item Explain why the exercise scientist comes to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.

Summary statistics for the 20 runners are as follows.

$$\sum t = 80.37 \quad \sum v = 970.86 \quad \sum t ^ { 2 } = 324.71 \quad \sum v ^ { 2 } = 47829.24 \quad \sum t v = 3886.53$$
\item Find the value of Pearson's product moment correlation coefficient.
\item Carry out a test at the $5\%$ significance level to investigate whether there is negative correlation between accurately measured $\mathrm{VO}_{2\max}$ and best marathon time for runners whose best marathon times are between $3\frac{1}{2}$ hours and $4\frac{1}{2}$ hours.
\end{enumerate}\end{enumerate}

\hfill \mbox{\textit{OCR MEI Further Statistics Major 2021 Q8 [16]}}
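
The calculation in part (b)(ii) and the comparison in part (b)(iii) can be checked numerically. What follows is a minimal Python sketch, assuming only the summary statistics quoted in the question and the one-tailed 5% critical value 0.3783 quoted in the mark scheme; the variable names are illustrative and not part of the original paper.

```python
import math

# Summary statistics quoted in part (b) for the n = 20 runners.
n = 20
sum_t, sum_v = 80.37, 970.86
sum_tt, sum_vv, sum_tv = 324.71, 47829.24, 3886.53

# Corrected sums of squares and products.
s_tv = sum_tv - sum_t * sum_v / n
s_tt = sum_tt - sum_t ** 2 / n
s_vv = sum_vv - sum_v ** 2 / n

# Pearson's product moment correlation coefficient.
r = s_tv / math.sqrt(s_tt * s_vv)

# One-tailed 5% critical value for n = 20, as quoted in the mark scheme
# (hard-coded here as an assumption rather than computed).
critical_value = 0.3783

# H1 is rho < 0, so H0 is rejected when r falls below -critical_value.
reject_h0 = r < -critical_value

print(f"S_tv = {s_tv:.4f}, S_tt = {s_tt:.4f}, S_vv = {s_vv:.4f}")
print(f"r = {r:.4f}, reject H0: {reject_h0}")
```

Because the alternative hypothesis is one-sided, the sketch compares r with the negative of the critical value, which matches the form of comparison the mark scheme allows.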