OCR MEI Paper 2 2020 November — Question 13 7 marks

Exam Board: OCR MEI
Module: Paper 2
Year: 2020
Session: November
Marks: 7
Topic: Hypothesis test of Pearson’s product-moment correlation coefficient
Type: Interpret p-value for correlation test
Difficulty: Moderate (-0.5). This question tests basic interpretation of p-values in hypothesis testing, which is a fundamental statistical concept. Part (a) requires identifying two simple errors in Lee's reasoning: he compares the p-value with 0 instead of 0.05, and he compares r with 0.05 instead of with 0 (or the critical value). This is straightforward recall and application of the hypothesis-testing procedure, with no complex calculations or novel insight required. It is slightly easier than average because it mainly asks for conceptual identification of errors rather than carrying out the test.
Spec: 5.08d Hypothesis test: Pearson correlation


## Question 13:

### Part (a):
| Answer | Marks | Guidance |
|--------|-------|----------|
| Lee is wrong because he should make the comparison of $0.033$ with $0.05$ | B1 (2.2a) | if B0B0: SC1 for Lee has confused $r$ with $p$ or for $0.37154$ suggests positive correlation |
| he should make the comparison of $0.37154$ with $0$ | B1 [2] (2.2b) | allow he should have compared $r$ with the critical value |
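
The decision rule that Lee should have applied can be sketched in a few lines of Python. This is only an illustration: the function name `has_evidence` is hypothetical, and the p-values are the ones quoted in the question.

```python
def has_evidence(p_value, significance_level=0.05):
    """Return True when the p-value is below the significance level.

    The p-value is compared with the significance level (here 5%),
    not with 0, and the correlation coefficient is not compared with 0.05.
    """
    return p_value < significance_level


# p-value quoted for recycling rate against employment rate (Fig. 13.1)
employment_result = has_evidence(0.033)
# p-value quoted for recycling rate against median house price (Fig. 13.2)
house_price_result = has_evidence(0.058)

print(employment_result, house_price_result)
```

With the quoted values, the first test gives evidence of correlation at the 5% level and the second does not.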

### Part (b):
| Answer | Marks | Guidance |
|--------|-------|----------|
| $465467 + 2 \times 204356$ | M1 (2.1) | condone use of $201236$ instead of $204356$; ignore work relating to lower tail; or $521000 + 1.5 \times (521000 - 342500)$ |
| awrt $874180$ (or $867940$ from use of $201236$); from the scatter diagram the outliers are approximately $920\,000$ and $1\,200\,000$ | A1 [2] (2.2b) | numerical values must be mentioned; or $788750$, in which case accept two or three outliers identified (the extra one is approximately $800\,000$) |
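
The thresholds in Part (b) can be checked with a short Python sketch. The summary statistics are copied from Fig. 13.3. The candidate prices are the approximate readings from the scatter diagram quoted in the mark scheme, not exact data values, and the helper names are hypothetical.

```python
# Summary statistics for median house price, copied from Fig. 13.3.
mean = 465467.9697
s = 204356.2606      # sample standard deviation
sigma = 201236.1345  # population standard deviation
q1, q3 = 342500, 521000


def sd_upper_limit(mean, sd, k=2):
    """Upper outlier limit: mean plus k standard deviations."""
    return mean + k * sd


def iqr_upper_limit(q1, q3, k=1.5):
    """Upper outlier limit: Q3 plus k times the interquartile range."""
    return q3 + k * (q3 - q1)


limit_s = sd_upper_limit(mean, s)          # about 874180
limit_sigma = sd_upper_limit(mean, sigma)  # about 867940
limit_iqr = iqr_upper_limit(q1, q3)        # 788750.0

# Approximate prices read from Fig. 13.2, as quoted in the mark scheme.
candidates = [800_000, 920_000, 1_200_000]
outliers_sd = [x for x in candidates if x > limit_s]
outliers_iqr = [x for x in candidates if x > limit_iqr]

print(round(limit_s), round(limit_sigma), limit_iqr)
print(outliers_sd, outliers_iqr)
```

Under the two-standard-deviation rule the two largest values are outliers. Under the 1.5 × IQR rule the value near 800 000 also counts, which is why the mark scheme accepts two or three outliers.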

### Part (c):
| Answer | Marks | Guidance |
|--------|-------|----------|
| the pmcc would (probably) be closer to 0 because the scatter is less well modelled by a straight line | B1 (2.2b) | if B0B0 allow SC1 for $r$ closer to 0 and $p$-value larger |
| the $p$-value would increase because a value which is closer to 0 is more likely assuming there is no correlation | B1 [2] (2.2b) | |
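
The link between a correlation coefficient closer to 0 and a larger p-value can be illustrated numerically. The sketch below assumes that the quoted p-values are two-tailed values from the standard t-test for a correlation coefficient, with $t = r\sqrt{(n-2)/(1-r^2)}$ on $n-2$ degrees of freedom; the question does not state how they were obtained. The function names and the value $r = -0.2$ are hypothetical, chosen only to show the direction of the change.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(r, n, steps=2000):
    """Two-tailed p-value for a sample correlation r from n data pairs.

    Assumes the usual t-test for correlation. The area between 0 and |t|
    is found with Simpson's rule (steps must be even).
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and |t|
    return 1 - central_area


p_original = two_tailed_p(-0.33278, 33)  # value of r quoted for Fig. 13.2
p_weaker = two_tailed_p(-0.2, 33)        # hypothetical r closer to 0

print(p_original, p_weaker)
```

For a fixed sample size, a value of $r$ closer to 0 gives a smaller $|t|$ and therefore a larger p-value. Removing outliers also reduces $n$ slightly, which this sketch ignores.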

### Part (d):
| Answer | Marks | Guidance |
|--------|-------|----------|
| the student's suggestion is reasonable, since there are other regions defined in the LDS | B1 [1] (2.2b) | |

13 The pre-release material contains information concerning median house prices, recycling rates and employment rates. Fig. 13.1 shows a scatter diagram of recycling rate against employment rate for a random sample of 33 regions.

\begin{figure}[h]
\begin{center}
  \includegraphics[alt={},max width=\textwidth]{cea67565-8074-4703-8e1a-09b98e380baf-14_629_1424_397_242}
\captionsetup{labelformat=empty}
\caption{Fig. 13.1}
\end{center}
\end{figure}

The product moment correlation coefficient for this sample is 0.37154 and the associated $p$-value is 0.033.

Lee conducts a hypothesis test at the $5 \%$ level to test whether there is any evidence to suggest there is positive correlation between recycling rate and employment rate. He concludes that there is no evidence to suggest positive correlation because $0.033 \approx 0$ and $0.37154 > 0.05$.
\begin{enumerate}[label=(\alph*)]
\item Explain whether Lee's reasoning is correct.

Fig. 13.2 shows a scatter diagram of recycling rate against median house price for a random sample of 33 regions.

\begin{figure}[h]
\begin{center}
  \includegraphics[alt={},max width=\textwidth]{cea67565-8074-4703-8e1a-09b98e380baf-14_648_1474_1758_242}
\captionsetup{labelformat=empty}
\caption{Fig. 13.2}
\end{center}
\end{figure}

The product moment correlation coefficient for this sample is $-0.33278$ and the associated $p$-value is 0.058.

Fig. 13.3 shows summary statistics for the median house prices for the data in this sample.

\begin{table}[h]
\begin{center}
\begin{tabular}{ | l | l | }
\hline
\multicolumn{2}{|l|}{Statistics} \\
\hline
$n$ & 33 \\
\hline
Mean & 465467.9697 \\
\hline
$\sigma$ & 201236.1345 \\
\hline
$s$ & 204356.2606 \\
\hline
$\Sigma x$ & 15360443 \\
\hline
$\Sigma x ^ { 2 }$ & 8486161617387 \\
\hline
Min & 243500 \\
\hline
Q1 & 342500 \\
\hline
Median & 410000 \\
\hline
Q3 & 521000 \\
\hline
Max & 1200000 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Fig. 13.3}
\end{center}
\end{table}
\item Use the information in Fig. 13.3 and Fig. 13.2 to show that there are at least two outliers.
\item Describe the effect of removing the outliers on

\begin{itemize}
  \item the product moment correlation coefficient between recycling rate and median house price,
  \item the $p$-value associated with this correlation coefficient,
\end{itemize}
in each case explaining your answer.\\[0pt]
[2]

All 33 items in the sample are areas in London. A student suggests that it is very unlikely that only areas in London would be selected in a random sample.
\item Use your knowledge of the pre-release material to explain whether you think the student's suggestion is reasonable.
\end{enumerate}

\hfill \mbox{\textit{OCR MEI Paper 2 2020 Q13 [7]}}