OCR MEI S2 2006 June — Question 3 18 marks

Exam Board: OCR MEI
Module: S2 (Statistics 2)
Year: 2006
Session: June
Marks: 18
Topic: Hypothesis test of Pearson’s product-moment correlation coefficient
Type: Calculate PMCC from summary statistics
Difficulty: Standard +0.3. This is a standard S2 hypothesis testing question requiring routine application of the PMCC formula from summary statistics and comparison with critical values. Part (i) is a straightforward calculation; part (ii) is a textbook hypothesis test procedure; parts (iii) and (iv) test understanding of significance levels and experimental design but require only recall of standard concepts. Slightly easier than average because it is methodical rather than requiring problem-solving insight.
Spec: 5.08a Pearson correlation: calculate PMCC; 5.08d Hypothesis test: Pearson correlation

3 A student is investigating the relationship between the length \(x \mathrm {~mm}\) and circumference \(y \mathrm {~mm}\) of plums from a large crop. The student measures the dimensions of a random sample of 10 plums from this crop. Summary statistics for these dimensions are as follows.

$$\begin{aligned} & \sum x = 4715 \quad \sum y = 13175 \quad \sum x ^ { 2 } = 2237725 \\ & \sum y ^ { 2 } = 17455825 \quad \sum x y = 6235575 \quad n = 10 \end{aligned}$$

  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to determine whether there is any correlation between length and circumference of plums from this crop. State your hypotheses clearly, defining any symbols which you use.
  3. (A) Explain the meaning of a \(5 \%\) significance level.
    (B) State one advantage and one disadvantage of using a \(1 \%\) significance level rather than a \(5 \%\) significance level in a hypothesis test.

The student decides to take another random sample of 10 plums. Using the same hypotheses as in part (ii), the correlation coefficient for this second sample is significant at the \(5 \%\) level. The student decides to ignore the first result and concludes that there is correlation between the length and circumference of plums in the crop.

  4. Comment on the student's decision to ignore the first result. Suggest a better way in which the student could proceed.

# Question 3:

## Part (i)
$S_{xx} = 2237725 - \frac{4715^2}{10} = 14602.5$ | M1 |
$S_{yy} = 17455825 - \frac{13175^2}{10} = 97762.5$ | A1 |
$S_{xy} = 6235575 - \frac{4715 \times 13175}{10} = 23562.5$ | A1 |
$r = \frac{23562.5}{\sqrt{14602.5 \times 97762.5}}$ | M1 |
$r = 0.624$ (3 s.f.) | A1 |

## Part (ii)
$H_0: \rho = 0$; $H_1: \rho \neq 0$, where $\rho$ is the population correlation coefficient between length and circumference | B1 |
Two-tailed critical value at the 5% level for $n = 10$: $0.6319$ | B1 |
$0.624 < 0.6319$, so do not reject $H_0$ | M1 |
Insufficient evidence at the 5% level of correlation between length and circumference of plums in the crop | A1 A1 A1 |
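
The arithmetic in parts (i) and (ii) can be checked with a short script. This is a minimal sketch: the variable names are illustrative, and the critical value 0.6319 is taken from the mark scheme rather than computed.

```python
from math import sqrt

# Summary statistics quoted in the question (random sample of n = 10 plums).
n = 10
sum_x, sum_y = 4715, 13175
sum_xx, sum_yy, sum_xy = 2237725, 17455825, 6235575

# Corrected sums of squares and products.
s_xx = sum_xx - sum_x**2 / n
s_yy = sum_yy - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Sample product moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Two-tailed 5% critical value for n = 10, as quoted in the mark scheme.
CRITICAL_5PCT = 0.6319
reject_h0 = abs(r) > CRITICAL_5PCT

print(f"Sxx = {s_xx}, Syy = {s_yy}, Sxy = {s_xy}")
print(f"r = {r:.3f}, reject H0 at 5%: {reject_h0}")
```

Comparing `abs(r)` with the critical value reflects the two-tailed alternative $H_1: \rho \neq 0$.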

## Part (iii)(A)
There is a 5% probability of rejecting $H_0$ when it is in fact true | B1 B1 |

## Part (iii)(B)
Advantage: less likely to incorrectly reject $H_0$ / fewer Type I errors | B1 |
Disadvantage: less likely to detect a genuine correlation / more Type II errors | B1 |

## Part (iv)
Wrong to ignore first result; both results should be considered | B1 |
Could combine samples or take larger sample | B1 B1 |
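
Combining two samples only needs their summary statistics, because each sum over the pooled sample is the sum of the two separate sums. The sketch below shows this with made-up measurements (the question gives no raw data for either sample, so these numbers are purely hypothetical), and the helper names are illustrative.

```python
from math import sqrt

def summarise(xs, ys):
    """Return (n, sum x, sum y, sum x^2, sum y^2, sum xy) for paired data."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
        sum(x * y for x, y in zip(xs, ys)),
    )

def pool(first, second):
    """Combine two sets of summary statistics by adding them term by term."""
    return tuple(a + b for a, b in zip(first, second))

def pmcc_from_summary(summary):
    """Product moment correlation coefficient from summary statistics."""
    n, sum_x, sum_y, sum_xx, sum_yy, sum_xy = summary
    s_xx = sum_xx - sum_x**2 / n
    s_yy = sum_yy - sum_y**2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical measurements in mm, NOT the plum data from the question.
first_x, first_y = [45, 47, 50, 44, 49], [130, 133, 131, 129, 136]
second_x, second_y = [46, 51, 48, 43, 52], [131, 137, 130, 128, 135]

pooled = pool(summarise(first_x, first_y), summarise(second_x, second_y))
r_pooled = pmcc_from_summary(pooled)

# Same result as computing directly from the concatenated raw data.
r_direct = pmcc_from_summary(summarise(first_x + second_x, first_y + second_y))

print(f"pooled n = {pooled[0]}, r = {r_pooled:.3f}")
```

For the plum data the pooled sample would have $n = 20$, so the critical value would need to be read from tables for $n = 20$ rather than reusing the value for $n = 10$.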

---

\hfill \mbox{\textit{OCR MEI S2 2006 Q3 [18]}}