| Exam Board | OCR MEI |
|---|---|
| Module | Further Statistics Major |
| Year | 2023 |
| Session | June |
| Marks | 12 |
| Topic | Hypothesis test of Pearson’s product-moment correlation coefficient |
| Type | Two-tailed test for any correlation |
| Difficulty | Standard +0.3. This is a standard hypothesis test for correlation with routine calculations. Part (a) requires recognizing from the scatter diagram that the data are roughly elliptical, and so plausibly bivariate Normal; (b) is direct formula application with given summary statistics; (c) follows the standard test procedure of comparing r with a critical value; and (d) tests understanding of the test's assumptions. All parts are textbook-standard with no novel problem-solving required, although the question is slightly above average difficulty because it is Further Maths content and requires careful attention to the bivariate Normal assumption. |
| Spec | 5.08a Pearson correlation: calculate pmcc; 5.08b Linear coding: effect on pmcc; 5.08d Hypothesis test: Pearson correlation |

Question 6:

| Question | Answer | Marks | AO | Guidance |
|---|---|---|---|---|
| 6(a) | Scatter diagram appears to be roughly elliptical, so the distribution may be bivariate Normal | E1, E1 [2] | 3.5a, 2.4 | Condone ‘The data is bivariate normal’ or ‘The data comes from a bivariate normal distribution’ |
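
| Question | Answer | Marks | AO | Guidance |
|---|---|---|---|---|
| 6(b) | DR: $S_{xy} = 4713.62 - \frac{1}{20} \times 342.10 \times 273.65 \; [= 32.837]$ | M1 | 1.1a | For $S_{xy}$ |
| | $S_{xx} = 5989.53 - \frac{1}{20} \times 342.10^2 \; [= 137.91]$, $S_{yy} = 3919.53 - \frac{1}{20} \times 273.65^2 \; [= 175.31]$ | M1 | 1.1 | For either $S_{xx}$ or $S_{yy}$ |
| | $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{32.837}{\sqrt{137.91 \times 175.31}}$ | M1 | 3.3 | For general form including square root |
| | $= 0.211$ | A1 | 1.1 | Allow awrt 0.21 without incorrect working |
| | | [4] | | |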
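
The arithmetic in part (b) can be checked with a short script. This is a minimal sketch, not part of the mark scheme: the function name `pmcc_from_summary` is hypothetical, and it simply applies $S_{xy} = \sum xy - \frac{1}{n}\sum x \sum y$ (and the analogous $S_{xx}$, $S_{yy}$) before forming $r = S_{xy}/\sqrt{S_{xx} S_{yy}}$.

```python
import math


def pmcc_from_summary(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Hypothetical helper: return (S_xx, S_yy, S_xy, r) from summary statistics."""
    s_xy = sum_xy - sum_x * sum_y / n  # S_xy = sum(xy) - (sum x)(sum y)/n
    s_xx = sum_x2 - sum_x ** 2 / n     # S_xx = sum(x^2) - (sum x)^2/n
    s_yy = sum_y2 - sum_y ** 2 / n     # S_yy = sum(y^2) - (sum y)^2/n
    r = s_xy / math.sqrt(s_xx * s_yy)  # product moment correlation coefficient
    return s_xx, s_yy, s_xy, r


# Summary statistics for the 20 occasions, as given in the question
s_xx, s_yy, s_xy, r = pmcc_from_summary(20, 342.10, 273.65, 5989.53, 3919.53, 4713.62)
print(f"S_xx = {s_xx:.2f}, S_yy = {s_yy:.2f}, S_xy = {s_xy:.3f}, r = {r:.3f}")
# S_xx = 137.91, S_yy = 175.31, S_xy = 32.837, r = 0.211
```

In the exam the same values must still be shown by hand, since the question demands detailed reasoning.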

| Question | Answer | Marks | AO | Guidance |
|---|---|---|---|---|
| 6(c) | $H_0: \rho = 0$, $H_1: \rho \neq 0$ | B1 | 3.3 | For both hypotheses. Allow any symbol in place of $\rho$ if defined as population pmcc |
| | where $\rho$ is the population pmcc between $x$ and $y$ | B1 | 2.5 | For defining $\rho$. NB Hypotheses in words only get B1 unless population mentioned. ‘between $x$ and $y$’ may be seen in the hypotheses alongside the correct hypotheses in symbols, rather than in the definition of $\rho$ |
| | For $n = 20$, the 5% critical value is 0.4438 | B1 | 3.4 | For correct critical value |
| | Since $0.211 < 0.4438$ the result is not significant, so there is insufficient evidence to reject $H_0$ | M1 | 1.1 | For comparison and conclusion. FT their pmcc and cv (provided between $-1$ and $+1$) for M1 but not for A1 |
| | There is insufficient evidence at the 5% level to suggest that there is correlation between download and upload speed | A1 | 2.2b | Must be in context |
| | | [5] | | |
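
In the exam the critical value 0.4438 is read from tables. As a hedged cross-check, the sketch below recomputes it from the null distribution of $r$: assuming the data are a random sample from a bivariate Normal population with $\rho = 0$, the sample pmcc has density proportional to $(1 - r^2)^{(n-4)/2}$ on $(-1, 1)$. The helper names are hypothetical; the integral uses Simpson's rule and the critical value is found by bisection, so all results are numerical approximations.

```python
def null_kernel(r: float, n: int) -> float:
    """Unnormalised density of the sample pmcc r under H0: rho = 0 (bivariate Normal, size n)."""
    return max(0.0, 1.0 - r * r) ** ((n - 4) / 2)


def integral_to_one(c: float, n: int, steps: int = 2000) -> float:
    """Composite Simpson's rule for the kernel integrated from c to 1 (steps must be even)."""
    h = (1.0 - c) / steps
    total = null_kernel(c, n) + null_kernel(1.0, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * null_kernel(c + i * h, n)
    return total * h / 3


def two_tailed_p(r: float, n: int) -> float:
    """P(|R| >= |r|) under H0; the null density is symmetric about 0."""
    if n < 4:
        raise ValueError("this sketch needs n >= 4 (the kernel is unbounded at r = 1 when n = 3)")
    return integral_to_one(abs(r), n) / integral_to_one(0.0, n)


def critical_value(n: int, alpha: float = 0.05) -> float:
    """Two-tailed critical value of r at level alpha, found by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, n) > alpha:
            lo = mid  # not yet significant, so the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2


n, r = 20, 0.211
cv = critical_value(n)  # should be close to the tabulated 0.4438
p = two_tailed_p(r, n)
print(f"critical value ~ {cv:.4f}, p-value ~ {p:.3f}")
print("significant" if abs(r) >= cv else "not significant at the 5% level")
```

Both views lead to the same decision: $r = 0.211$ lies well inside the acceptance region, so $H_0$ is not rejected. The p-value is shown only as an alternative to comparing $r$ with the tabulated critical value.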

| Question | Answer | Marks | AO | Guidance |
|---|---|---|---|---|
| 6(d) | Because download speed would not have a probability distribution (so the distributional assumption could not be met). | E1 [1] | 2.2a | Allow ‘the situation is not sampling from a bivariate Normal population’ or ‘Statistical inference cannot be carried out for non-random data’. Allow other suitable answers. E0 for ‘pmcc can only be found for random variables’ |

6 A student wonders if there is any correlation between download and upload speeds of data to and from the internet. The student decides to carry out a hypothesis test to investigate this and so measures the download speed $x$ and upload speed $y$ in suitable units on 20 randomly chosen occasions. The scatter diagram below illustrates the data which the student collected.\\
\includegraphics[max width=\textwidth, alt={}, center]{c692fb20-436f-4bc1-89bd-10fdba41ceba-07_824_1411_440_246}
\begin{enumerate}[label=(\alph*)]
\item Explain why the student decides to carry out a test based on the product moment correlation coefficient.
Summary statistics for the 20 occasions are as follows.
$$\sum x = 342.10 \quad \sum y = 273.65 \quad \sum x ^ { 2 } = 5989.53 \quad \sum y ^ { 2 } = 3919.53 \quad \sum x y = 4713.62$$
\item In this question you must show detailed reasoning.
Calculate the product moment correlation coefficient.
\item Carry out a hypothesis test at the $5 \%$ significance level to investigate whether there is any correlation between download speed and upload speed.
\item Both of the variables, download speed and upload speed, are random.
Explain why, if download speed had been a non-random variable, the student could not have carried out the hypothesis test to investigate whether there was any correlation between download speed and upload speed.
\end{enumerate}
\hfill \mbox{\textit{OCR MEI Further Statistics Major 2023 Q6 [12]}}