| Exam Board | OCR MEI |
|---|---|
| Module | Paper 2 |
| Year | 2020 |
| Session | November |
| Marks | 7 |
| Topic | Hypothesis test of Pearson’s product-moment correlation coefficient |
| Type | Interpret p-value for correlation test |
| Difficulty | Moderate -0.5. This question tests basic interpretation of p-values in hypothesis testing, a fundamental statistical concept. Part (a) requires identifying two simple errors in Lee's reasoning: comparing the p-value with 0 instead of 0.05, and comparing r with 0.05 instead of with 0 (or a critical value). This is straightforward recall and application of the hypothesis testing procedure, with no complex calculations or novel insights required. It is slightly easier than average because it asks only for conceptual identification of errors rather than for the test to be carried out. |
| Spec | 5.08d Hypothesis test: Pearson correlation |
## Question 13:
### Part (a):
| Answer | Marks | Guidance |
|--------|-------|----------|
| Lee is wrong because he should compare $0.033$ with $0.05$ | B1 (2.2a) | if B0B0: SC1 for "Lee has confused $r$ with $p$" or for "$0.37154$ suggests positive correlation" |
| he should compare $0.37154$ with $0$ | B1 [2] (2.2b) | allow "he should have compared $r$ with the critical value" |
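
The comparison the mark scheme asks for can be sketched in Python. This is a minimal illustration and not part of the mark scheme: the function name `evidence_of_positive_correlation` and its parameters are hypothetical, and it assumes, as the mark scheme does, that the quoted $p$-value is compared directly with the significance level.

```python
def evidence_of_positive_correlation(r: float, p_value: float, alpha: float = 0.05) -> bool:
    """Hypothetical helper: True when the sample gives evidence of positive correlation.

    Assumes the quoted p-value is compared directly with the significance
    level alpha, and that the sign of r is checked against 0.
    """
    return r > 0 and p_value < alpha


# Values quoted in the question for Fig. 13.1.
r, p = 0.37154, 0.033

# Lee concluded there was no evidence; that is right only if the check fails.
lee_is_right = not evidence_of_positive_correlation(r, p)
print(lee_is_right)
```

Under these assumptions the check succeeds for the Fig. 13.1 values, because $0.033 < 0.05$ and $0.37154 > 0$, so Lee's conclusion does not follow.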
### Part (b):
| Answer | Marks | Guidance |
|--------|-------|----------|
| $465467 + 2 \times 204356$ | M1 (2.1) | condone use of $201236$ instead of $204356$; ignore work relating to lower tail; or $521000 + 1.5 \times (521000 - 342500)$ |
| awrt $874180$ (or $867940$ from use of $201236$); from the scatter diagram the outliers are approximately $920\,000$ and $1\,200\,000$ | A1 [2] (2.2b) | numerical values must be mentioned; or $788750$, in which case accept two or three outliers identified (the extra one is approximately $800\,000$) |
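
The two outlier limits used in the mark scheme can be checked with a short Python sketch. The summary statistics are copied from Fig. 13.3; the function names are hypothetical, and the two high prices are the approximate scatter-diagram readings quoted in the mark scheme, not exact data.

```python
# Summary statistics copied from Fig. 13.3.
MEAN = 465467.9697
S = 204356.2606
Q1 = 342500
Q3 = 521000


def sd_upper_limit(mean: float, sd: float, k: float = 2.0) -> float:
    """Upper outlier limit: mean plus k standard deviations."""
    return mean + k * sd


def iqr_upper_limit(q1: float, q3: float, k: float = 1.5) -> float:
    """Upper outlier limit: Q3 plus k times the interquartile range."""
    return q3 + k * (q3 - q1)


sd_limit = sd_upper_limit(MEAN, S)
iqr_limit = iqr_upper_limit(Q1, Q3)

# Approximate readings from the scatter diagram, as quoted in the mark scheme.
approx_high_prices = [920_000, 1_200_000]
outliers = [x for x in approx_high_prices if x > sd_limit]

print(round(sd_limit), iqr_limit, outliers)
```

Both limits (about $874180$ and $788750$) lie below the two readings, so either rule identifies at least two outliers.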
### Part (c):
| Answer | Marks | Guidance |
|--------|-------|----------|
| the pmcc would (probably) be closer to 0 because the scatter is less well modelled by a straight line | B1 (2.2b) | if B0B0 allow SC1 for $r$ closer to 0 and $p$-value larger |
| the $p$-value would increase because a value which is closer to 0 is more likely assuming there is no correlation | B1 [2] (2.2b) | |
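
The link between a correlation coefficient closer to 0 and a larger $p$-value can be illustrated with the usual $t$-statistic for testing zero correlation, $t = r\sqrt{n-2}/\sqrt{1-r^2}$. The mark scheme does not refer to this statistic, so treat the sketch below as an assumption-laden illustration: `t_statistic` is a hypothetical helper, and the value $-0.2$ for the coefficient after removing two outliers is invented purely for demonstration.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t-statistic for testing zero correlation from a sample pmcc r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Values quoted in the question for Fig. 13.2 (all 33 regions).
t_with_outliers = t_statistic(-0.33278, 33)

# HYPOTHETICAL value of r after removing two outliers; not taken from the data.
R_WITHOUT_OUTLIERS = -0.2
t_without_outliers = t_statistic(R_WITHOUT_OUTLIERS, 31)

# A statistic closer to 0 is less extreme under the no-correlation hypothesis,
# which corresponds to a larger p-value.
print(abs(t_without_outliers) < abs(t_with_outliers))
```

Under these assumptions the statistic moves closer to 0 once the outliers are removed, which is consistent with the mark scheme's statement that the $p$-value would increase.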
### Part (d):
| Answer | Marks | Guidance |
|--------|-------|----------|
| the student's suggestion is reasonable, since there are other regions defined in the LDS | B1 [1] (2.2b) | |
13 The pre-release material contains information concerning median house prices, recycling rates and employment rates. Fig. 13.1 shows a scatter diagram of recycling rate against employment rate for a random sample of 33 regions.
\begin{figure}[h]
\begin{center}
\includegraphics[alt={},max width=\textwidth]{cea67565-8074-4703-8e1a-09b98e380baf-14_629_1424_397_242}
\captionsetup{labelformat=empty}
\caption{Fig. 13.1}
\end{center}
\end{figure}
The product moment correlation coefficient for this sample is 0.37154 and the associated $p$-value is 0.033.
Lee conducts a hypothesis test at the $5 \%$ level to test whether there is any evidence to suggest there is positive correlation between recycling rate and employment rate. He concludes that there is no evidence to suggest positive correlation because $0.033 \approx 0$ and $0.37154 > 0.05$.
\begin{enumerate}[label=(\alph*)]
\item Explain whether Lee's reasoning is correct.
Fig. 13.2 shows a scatter diagram of recycling rate against median house price for a random sample of 33 regions.
\begin{figure}[h]
\begin{center}
\includegraphics[alt={},max width=\textwidth]{cea67565-8074-4703-8e1a-09b98e380baf-14_648_1474_1758_242}
\captionsetup{labelformat=empty}
\caption{Fig. 13.2}
\end{center}
\end{figure}
The product moment correlation coefficient for this sample is $-0.33278$ and the associated $p$-value is 0.058.
Fig. 13.3 shows summary statistics for the median house prices for the data in this sample.
\begin{table}[h]
\begin{center}
\begin{tabular}{ | l | l | }
\hline
\multicolumn{2}{|l|}{Statistics} \\
\hline
$n$ & 33 \\
\hline
Mean & 465467.9697 \\
\hline
$\sigma$ & 201236.1345 \\
\hline
$s$ & 204356.2606 \\
\hline
$\Sigma x$ & 15360443 \\
\hline
$\Sigma x ^ { 2 }$ & 8486161617387 \\
\hline
Min & 243500 \\
\hline
Q1 & 342500 \\
\hline
Median & 410000 \\
\hline
Q3 & 521000 \\
\hline
Max & 1200000 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Fig. 13.3}
\end{center}
\end{table}
\item Use the information in Fig. 13.3 and Fig. 13.2 to show that there are at least two outliers.
\item Describe the effect of removing the outliers on
\begin{itemize}
\item the product moment correlation coefficient between recycling rate and median house price,
\item the $p$-value associated with this correlation coefficient,
\end{itemize}
in each case explaining your answer. \hfill [2]

All 33 items in the sample are areas in London. A student suggests that it is very unlikely that only areas in London would be selected in a random sample.
\item Use your knowledge of the pre-release material to explain whether you think the student's suggestion is reasonable.
\end{enumerate}
\hfill \mbox{\textit{OCR MEI Paper 2 2020 Q13 [7]}}