| Exam Board | OCR MEI |
|---|---|
| Module | S2 (Statistics 2) |
| Year | 2012 |
| Session | June |
| Marks | 19 |
| Topic | Hypothesis test of Pearson’s product-moment correlation coefficient |
| Type | Calculate PMCC from summary statistics |
| Difficulty | Standard +0.3. This is a standard Further Maths Statistics question requiring routine application of the PMCC formula and the hypothesis test procedure. Although it has several parts, each involves straightforward recall and calculation with no novel problem-solving. The conceptual parts ((iii) to (v)) require only textbook understanding of assumptions and significance levels, making it slightly easier than average overall. |
| Spec | 5.08a Pearson correlation: calculate PMCC; 5.08d Hypothesis test: Pearson correlation |
# Question 1
## Part (iii)
| Answer | Mark | Guidance |
|--------|------|----------|
| The underlying population must have a bivariate Normal distribution. | B1 | Condone "bivariate Normal distribution", "underlying bivariate Normal distribution", but **do not allow** "the data have a bivariate Normal distribution" |
| The points in the scatter diagram should have a roughly elliptical shape. | E1 | Condone 'oval' or suitable diagram |
## Part (iv)
| Answer | Mark | Guidance |
|--------|------|----------|
| The hypothesis test has shown that there appears to be correlation. | E1 | For a relevant comment: relating the test result, or the positive value of $r$, to supporting the commentator's suggestion (unless FT leads to not supporting it); or noting that correlation does not imply causation, since there may be a third factor; or questioning the use of the word 'must'. |
| However, there could be a third causal factor. | E1 | Allow any two suitable, statistically based comments. |
## Part (v)(A)
| Answer | Mark | Guidance |
|--------|------|----------|
| Yes, because the critical value at the 1% level is 0.7646, which is larger than the test statistic. | B1* | B1 for 0.7646 seen |
| | E1dep* | E1 for comment consistent with their (ii) provided $\lvert r \rvert < 1$ |
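
As a worked check of this comparison, assuming the standard tabulated two-tailed critical values of $r$ for $n = 10$ (0.5494 at the 10% level, 0.7646 at the 1% level) and the value $r \approx 0.677$ obtained from the summary statistics in the question:

$$0.5494 < r \approx 0.677 < 0.7646$$

so $H_0$ is rejected at the 10% level but not at the 1% level, and the conclusion would change.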
## Part (v)(B)
| Answer | Mark | Guidance |
|--------|------|----------|
| One advantage of a 1% level is that one is less likely to reject the null hypothesis when it is true. | E1 | Wording must be clear. |
| One disadvantage of a 1% level is that one is more likely to accept the null hypothesis when it is false. | E1 | o.e. |
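
The trade-off above can be illustrated by simulation. The sketch below is a hypothetical illustration, not part of the mark scheme: it draws samples of size $n = 10$ from a bivariate Normal population, computes $r$ for each, and counts how often $H_0: \rho = 0$ is rejected using the tabulated two-tailed critical values 0.5494 (10%) and 0.7646 (1%), which are assumed from standard tables. When $\rho = 0$ the rejection rate estimates the probability of a Type I error; for the arbitrarily chosen $\rho = 0.6$, one minus the rejection rate estimates the probability of a Type II error. The function names are hypothetical.

```python
import math
import random

# Assumed two-tailed critical values of r for n = 10 (standard tables)
CRIT = {"10%": 0.5494, "1%": 0.7646}

def pmcc(xs, ys):
    """Sample product moment correlation coefficient of paired data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def rejection_rates(rho, n=10, trials=10000, seed=1):
    """Fraction of simulated samples for which H0: rho = 0 is rejected.

    Each sample has n pairs from a standard bivariate Normal population
    with correlation rho (hypothetical helper for illustration).
    """
    rng = random.Random(seed)
    counts = {level: 0 for level in CRIT}
    for _trial in range(trials):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(z1)
            # Gives corr(x, y) = rho for independent standard Normals z1, z2
            ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        r = pmcc(xs, ys)
        for level, critical in CRIT.items():
            if abs(r) > critical:
                counts[level] += 1
    return {level: k / trials for level, k in counts.items()}

# H0 true: rates estimate P(Type I error) at each significance level
print(rejection_rates(0.0))
# H0 false (rho = 0.6, an arbitrary choice): 1 - rate estimates P(Type II error)
print(rejection_rates(0.6))
```

With $\rho = 0$ the estimated rates should be close to 0.10 and 0.01. With $\rho = 0.6$ the 1% test rejects less often, so it accepts a false $H_0$ more often. Because the same samples are used for both levels and $0.7646 > 0.5494$, every rejection at the 1% level is also a rejection at the 10% level.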
---
1 The times, in seconds, taken by ten randomly selected competitors for the first and last sections of an Olympic bobsleigh run are denoted by $x$ and $y$ respectively. Summary statistics for these data are as follows.
$$\Sigma x = 113.69 \quad \Sigma y = 52.81 \quad \Sigma x ^ { 2 } = 1292.56 \quad \Sigma y ^ { 2 } = 278.91 \quad \Sigma x y = 600.41 \quad n = 10$$
\begin{enumerate}[label=(\roman*)]
\item Calculate the sample product moment correlation coefficient.
\item Carry out a hypothesis test at the $10 \%$ significance level to investigate whether there is any correlation between times taken for the first and last sections of the bobsleigh run.
\item State the distributional assumption which is necessary for this test to be valid. Explain briefly how a scatter diagram may be used to check whether this assumption is likely to be valid.
\item A commentator says that in order to have a fast time on the last section, you must have a fast time on the first section. Comment briefly on this suggestion.
\item (A) Would your conclusion in part (ii) have been different if you had carried out the hypothesis test at the $1 \%$ level rather than the $10 \%$ level? Explain your answer.\\
(B) State one advantage and one disadvantage of using a $1 \%$ significance level rather than a $10 \%$ significance level in a hypothesis test.
\end{enumerate}
\hfill \mbox{\textit{OCR MEI S2 2012 Q1 [19]}}
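
The calculation in part (i) and the test in part (ii) can be sketched in Python. This is a minimal sketch, assuming a two-tailed test of $H_0: \rho = 0$ against $H_1: \rho \neq 0$ and the tabulated critical value 0.5494 for $n = 10$ at the 10% level; the function name `pmcc_from_sums` is hypothetical.

```python
import math

def pmcc_from_sums(sum_x, sum_y, sum_x2, sum_y2, sum_xy, n):
    """Sample PMCC r computed from summary statistics.

    Hypothetical helper: sum_x2 is Σx², sum_y2 is Σy², sum_xy is Σxy.
    """
    # Centred sums: S_xx = Σx² - (Σx)²/n, S_yy = Σy² - (Σy)²/n, S_xy = Σxy - ΣxΣy/n
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Summary statistics from the question
r = pmcc_from_sums(113.69, 52.81, 1292.56, 278.91, 600.41, 10)
print(f"r = {r:.3f}")  # r = 0.677

# Two-tailed test at the 10% level: H0: rho = 0, H1: rho != 0.
# Assumed critical value for n = 10 from standard tables.
CRITICAL_10_PERCENT = 0.5494
if abs(r) > CRITICAL_10_PERCENT:
    print("Reject H0: evidence of correlation between first- and last-section times")
else:
    print("Do not reject H0: insufficient evidence of correlation")
```

Here $S_{xx} \approx 0.0184$ and $S_{yy} \approx 0.0204$ are small differences of large, nearly equal numbers, so the summary statistics should be used at full precision; rounding them first could change $r$ noticeably.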