OCR MEI Further Statistics Minor 2022 June — Question 5 14 marks

Exam Board: OCR MEI
Module: Further Statistics Minor
Year: 2022
Session: June
Marks: 14
Topic: Hypothesis test of Spearman’s rank correlation coefficient
Type: Hypothesis test for association
Difficulty: Standard +0.3. This is a straightforward application of Spearman's rank correlation test with standard parts: identifying why Pearson's is inappropriate (the scatter diagram is funnel-shaped rather than elliptical, so the distribution is probably not bivariate Normal), calculating ranks and \(r_s\), performing a hypothesis test against critical values, and explaining sampling concepts. All parts are routine textbook exercises requiring no novel insight, though the multi-part structure and the calculation of Spearman's coefficient add some work compared to the most basic questions.
Spec: 2.01a Population and sample: terminology; 5.08e Spearman rank correlation; 5.08f Hypothesis test: Spearman rank; 5.08g Compare: Pearson vs Spearman

5 A medical researcher is investigating whether there is any relationship between the age of a person and the level of a particular protein in the person's blood. She measures the levels of the protein (measured in suitable units) in a random sample of 12 hospital patients of various ages (in years). The spreadsheet shows the values obtained, together with a scatter diagram which illustrates the data.
[Figure: spreadsheet of the sample values with a scatter diagram of the data]
  (a) The researcher decides that a test based on Pearson's product moment correlation coefficient may not be valid. Explain why she comes to this conclusion.
  (b) Calculate the value of Spearman's rank correlation coefficient.
  (c) Carry out a test based on this coefficient at the \(5 \%\) significance level to investigate whether there is any association between age and protein level.
  (d) Explain why the researcher chose a sample that was random.
  (e) The researcher had originally intended to use a sample size of 6 rather than the 12 that she actually used. Explain what advantage there is in using the larger sample size.

Question 5:
5 | (a) | Because the scatter diagram does not appear to be elliptical (but more of a funnel shape) | E1 | 3.5a | For not elliptical
  |     | so the distribution is probably not bivariate Normal. | E1 | 2.4 | For full answer (dependent on first mark) | “Data is not bivariate Normal” is E0; “Normal bivariate” is E0
  |     |  | [2] |  |
5 | (b) | Rank A: 1 2 3 4 5 6 7 8 9 10 11 12 | M1 | 1.1 | For ranking Age
  |     | Rank P: 8 12 6 10 11 7 5 9 4 3 1 2 | M1 | 1.1 | For ranking Protein consistent with ranking for age
  |     | Spearman’s rank coefficient \(= -0.78(32) \left(= -\frac{112}{143}\right)\) | A1 | 1.1 | BC | Ranks may be reversed
  |     |  | [3] |  |
5 | (c) | \(H_0\): There is no association between age and protein (level) in the population | B1 | 3.3 | Need to see context and population in at least one of the hypotheses
  |     | \(H_1\): There is some association between age and protein (level) in the population | B1 | 1.2 |
  |     | Critical value is \((\pm)0.5874\) | B1 | 3.4 | \(n = 12\), 2-tailed 5%
  |     | \(\lvert -0.7832 \rvert > 0.5874\) (so reject \(H_0\)) | M1 | 1.1 | For comparison of their \(r_s\) and sensible critical value, provided \(\lvert r_s \rvert < 1\)
  |     | There is sufficient evidence to suggest that there is association between age and protein level (in the population) | A1FT | 2.2b | FT their \(r_s\) and sensible critical value. Hypotheses need to have been stated the right way round | Conclusion must not be too assertive and must refer to context
  |     |  | [5] |  |
5 | (d) | (Because a random sample) enables (proper) inference about the population to be undertaken | B2 | 2.4 2.4 | B2 for correct explanation, as shown | SC B1 for partially correct explanation, e.g. a random sample is less likely to be biased
  |     |  | [2] |  |
5 | (e) | Because as the sample size increases, the random variation in the sample tends to decrease. | E1 | 2.2b | Allow E1 for ‘the influence of outliers is reduced’ or for ‘gives a more reliable result’ oe if there is no further explanation.
  |     | The sample Spearman’s rank correlation coefficient tends to get closer to the population correlation coefficient. | E1 | 2.2b |
  |     |  | [2] |  |
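
The mark scheme treats part (b) as a calculator result (BC) and shows no working. As a hedged check, the value can be reproduced by hand from the ranks given in part (b), assuming the usual no-ties formula \(r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}\), where \(d\) is the difference between each patient's age rank and protein rank and \(n = 12\). This working is a reconstruction, not part of the published scheme.

```latex
\[
\sum d^2 = 49 + 100 + 9 + 36 + 36 + 1 + 4 + 1 + 25 + 49 + 100 + 100 = 510
\]
\[
r_s = 1 - \frac{6 \times 510}{12 \times (12^2 - 1)}
    = 1 - \frac{3060}{1716}
    = -\frac{1344}{1716}
    = -\frac{112}{143}
    \approx -0.7832
\]
```

The result agrees with the value \(-0.78(32)\) quoted in the scheme.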
Rank A | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Rank P | 8 | 12 | 6 | 10 | 11 | 7 | 5 | 9 | 4 | 3 | 1 | 2
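
The ranks above are enough to check parts (b) and (c) numerically. The following is a minimal Python sketch, not part of the mark scheme: the function name `spearman_no_ties` and the constant `CRITICAL_VALUE` are illustrative names, the formula assumes there are no tied ranks, and the critical value is simply the figure quoted in the scheme for \(n = 12\), 2-tailed 5%.

```python
from fractions import Fraction

# Ranks copied from the mark scheme for part (b).
rank_age = list(range(1, 13))
rank_protein = [8, 12, 6, 10, 11, 7, 5, 9, 4, 3, 1, 2]


def spearman_no_ties(x_ranks, y_ranks):
    """Exact r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only without tied ranks."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - Fraction(6 * sum_d2, n * (n * n - 1))


r_s = spearman_no_ties(rank_age, rank_protein)

# Two-tailed 5% critical value for n = 12, as quoted in the mark scheme.
CRITICAL_VALUE = 0.5874

# Two-tailed decision rule: reject H0 when |r_s| exceeds the critical value.
reject_h0 = abs(r_s) > CRITICAL_VALUE

print(f"r_s = {r_s} (about {float(r_s):.4f}); reject H0: {reject_h0}")
```

Using `Fraction` keeps the coefficient exact, so it can be compared directly with the \(-\frac{112}{143}\) shown in the scheme before rounding. The decision step only compares magnitudes; the hypotheses and the non-assertive conclusion in context still have to be written out as the scheme requires.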