OCR MEI Further Statistics Major 2023 June: Question 6 (12 marks)

Exam Board: OCR MEI
Module: Further Statistics Major
Year: 2023
Session: June
Marks: 12
Topic: Hypothesis test of Pearson’s product-moment correlation coefficient
Type: Two-tailed test for any correlation
Difficulty: Standard +0.3. This is a standard hypothesis test for correlation with routine calculations. Part (a) requires recognizing from the scatter diagram that the data may come from a bivariate Normal distribution, (b) is a direct application of the formula to the given summary statistics, (c) follows the standard test procedure of comparing $r$ with the critical value, and (d) tests understanding of the test's assumptions. All parts are textbook-standard with no novel problem-solving required, although the question sits slightly above average difficulty because it is Further Maths content and requires careful attention to the bivariate Normal assumption.
Spec: 5.08a Pearson correlation: calculate pmcc; 5.08b Linear coding: effect on pmcc; 5.08d Hypothesis test: Pearson correlation

6 A student wonders if there is any correlation between download and upload speeds of data to and from the internet. The student decides to carry out a hypothesis test to investigate this and so measures the download speed $x$ and upload speed $y$ in suitable units on 20 randomly chosen occasions. The scatter diagram below illustrates the data which the student collected.\\
\includegraphics[max width=\textwidth, alt={}, center]{c692fb20-436f-4bc1-89bd-10fdba41ceba-07_824_1411_440_246}
\begin{enumerate}[label=(\alph*)]
\item Explain why the student decides to carry out a test based on the product moment correlation coefficient.

Summary statistics for the 20 occasions are as follows.

$$\sum x = 342.10 \quad \sum y = 273.65 \quad \sum x ^ { 2 } = 5989.53 \quad \sum y ^ { 2 } = 3919.53 \quad \sum x y = 4713.62$$
\item In this question you must show detailed reasoning.

Calculate the product moment correlation coefficient.
\item Carry out a hypothesis test at the $5 \%$ significance level to investigate whether there is any correlation between download speed and upload speed.
\item Both of the variables, download speed and upload speed, are random.

Explain why, if download speed had been a non-random variable, the student could not have carried out the hypothesis test to investigate whether there was any correlation between download speed and upload speed.
\end{enumerate}

Question 6: mark scheme

6(a)
Answer: The scatter diagram appears to be roughly elliptical, so the distribution may be bivariate Normal.
Marks: E1, E1 [2]. AO: 3.5a, 2.4.
Guidance: Condone ‘The data is bivariate normal’ or ‘The data comes from a bivariate normal distribution’.
6(b) DR (detailed reasoning required)
Answer:
$$S_{xy} = 4713.62 - \frac{1}{20} \times 342.10 \times 273.65 \quad [= 32.837]$$
$$S_{xx} = 5989.53 - \frac{1}{20} \times 342.10^{2} \quad [= 137.91]$$
$$S_{yy} = 3919.53 - \frac{1}{20} \times 273.65^{2} \quad [= 175.31]$$
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{32.837}{\sqrt{137.91 \times 175.31}} = 0.211$$
Marks: M1, M1, M1, A1 [4]. AO: 1.1a, 1.1, 3.3, 1.1.
Guidance:
M1 for $S_{xy}$.
M1 for either $S_{xx}$ or $S_{yy}$.
M1 for the general form of $r$, including the square root.
A1: allow awrt 0.21 without incorrect working.
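The arithmetic for part (b) can be checked with a short Python sketch. It applies $S_{xy} = \sum xy - \frac{1}{n}\sum x \sum y$, $S_{xx} = \sum x^2 - \frac{1}{n}(\sum x)^2$, $S_{yy} = \sum y^2 - \frac{1}{n}(\sum y)^2$ and $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ to the summary statistics given in the question. The helper name `pmcc_from_summary` is hypothetical and is not part of the mark scheme.

```python
import math


def pmcc_from_summary(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (r, S_xx, S_yy, S_xy) computed from the usual summary statistics."""
    s_xy = sum_xy - sum_x * sum_y / n   # S_xy = sum(xy) - (sum x)(sum y)/n
    s_xx = sum_x2 - sum_x ** 2 / n      # S_xx = sum(x^2) - (sum x)^2/n
    s_yy = sum_y2 - sum_y ** 2 / n      # S_yy = sum(y^2) - (sum y)^2/n
    r = s_xy / math.sqrt(s_xx * s_yy)   # r = S_xy / sqrt(S_xx * S_yy)
    return r, s_xx, s_yy, s_xy


# Summary statistics for the 20 occasions, as given in the question.
r, s_xx, s_yy, s_xy = pmcc_from_summary(20, 342.10, 273.65, 5989.53, 3919.53, 4713.62)
print(f"S_xy = {s_xy:.3f}")  # S_xy = 32.837
print(f"S_xx = {s_xx:.2f}")  # S_xx = 137.91
print(f"S_yy = {s_yy:.2f}")  # S_yy = 175.31
print(f"r = {r:.3f}")        # r = 0.211
```

Working from the summary statistics, rather than raw data, mirrors the DR requirement: each intermediate $S$ value is visible before the final division.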
6(c)
Answer:
$H_0: \rho = 0$, $H_1: \rho \neq 0$, where $\rho$ is the population pmcc between $x$ and $y$.
For $n = 20$, the 5% critical value is 0.4438.
Since $0.211 < 0.4438$, the result is not significant, so there is insufficient evidence to reject $H_0$.
There is insufficient evidence at the 5% level to suggest that there is correlation between download and upload speed.
Marks: B1, B1, B1, M1, A1 [5]. AO: 3.3, 2.5, 3.4, 1.1, 2.2b.
Guidance:
B1 for both hypotheses. Allow any symbol in place of $\rho$ if it is defined as the population pmcc.
B1 for defining $\rho$. NB Hypotheses in words only get B1 unless the population is mentioned. ‘Between $x$ and $y$’ may be seen in the hypotheses alongside the correct hypotheses in symbols, rather than in the definition of $\rho$.
B1 for the correct critical value.
M1 for comparison and conclusion. FT their pmcc and cv (provided they are between $-1$ and $+1$) for M1 but not for A1.
A1: the conclusion must be in context.
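The critical value 0.4438 for $n = 20$ is read from tables. As a cross-check, and assuming the data come from a bivariate Normal distribution, under $H_0: \rho = 0$ the statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ follows a $t$ distribution with $n - 2$ degrees of freedom, so the two-tailed critical value of $r$ is $t^*/\sqrt{t^{*2} + n - 2}$. This $t$ route is a swapped-in alternative to the table look-up, not the mark scheme's method. The value $t^* = 2.100922$ (upper 2.5% point of $t_{18}$) is hard-coded from standard $t$ tables as an assumption, and the helper names are hypothetical.

```python
import math


def critical_r(t_star, n):
    """Two-tailed critical value of r for sample size n, derived from the
    t critical value with n - 2 degrees of freedom: r* = t* / sqrt(t*^2 + n - 2)."""
    return t_star / math.sqrt(t_star ** 2 + (n - 2))


def t_statistic(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) for H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


n = 20
r = 0.211            # sample pmcc from part (b)
T_STAR = 2.100922    # upper 2.5% point of t with 18 df, copied from t tables (assumed)

r_star = critical_r(T_STAR, n)
t = t_statistic(r, n)
print(f"critical r = {r_star:.4f}")  # critical r = 0.4438

# |r| > r* exactly when |t| > t*, so either comparison gives the same decision.
if abs(r) > r_star:
    print("Significant: reject H0")
else:
    print("Not significant: insufficient evidence to reject H0")
```

Because $t$ is an increasing function of $|r|$, comparing $|r|$ with $r^*$ and comparing $|t|$ with $t^*$ always lead to the same conclusion.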
6(d)
Answer: Because download speed would not have a probability distribution (so the distributional assumption could not be met).
Marks: E1 [1]. AO: 2.2a.
Guidance: Allow ‘the situation is not sampling from a bivariate Normal population’ or ‘Statistical inference cannot be carried out for non-random data’. Allow other suitable answers. E0 for ‘pmcc can only be found for random variables’.