| Exam Board | OCR MEI |
|---|---|
| Module | Paper 2 |
| Year | 2023 |
| Session | June |
| Marks | 8 |
| Topic | Bivariate data |
| Type | Interpret census or real-world data |
| Difficulty | Moderate -0.8. This is a straightforward data interpretation question requiring basic statistical literacy. Part (a) asks students to identify missing data (#N/A) that must be removed, a simple data cleaning concept. Part (b) requires recognising that the weak correlation (0.37) and the non-linear scatter pattern suggest a linear model is inappropriate; both are standard A-level observations requiring no complex calculation or novel insight. |
| Spec | 2.02c Scatter diagrams and regression lines; 2.02j Clean data: missing data, errors; 5.08a Pearson correlation: calculate pmcc |
## Question 14(a):
| Answer | Marks | Guidance |
|--------|-------|----------|
| discard City of London (as part of the data not available) or discard any regions where one or more pieces of data are missing oe | B1 | **LDS** advantage; do not allow if answer spoiled; eg because it's an anomaly; eg because it's an outlier |
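
The cleaning step in this answer can be sketched in a few lines of Python. This is only an illustration, not part of the mark scheme: the rows are the six shown in Fig. 14.1, `None` is an assumed stand-in for the `#N/A` entry, and the names `rows`, `clean`, `cleaned` and `discarded` are hypothetical.

```python
# Rows copied from Fig. 14.1: (area, median income in pounds, percentage of pupils).
# None is an assumed stand-in for the "#N/A" entry in the source table.
rows = [
    ("City of London", 61100, None),
    ("Barking and Dagenham", 21800, 54.0),
    ("Barnet", 27100, 70.1),
    ("Bexley", 24400, 55.0),
    ("Brent", 22700, 60.0),
    ("Bromley", 28100, 68.0),
]


def clean(table):
    """Keep only the rows in which every value is present."""
    return [row for row in table if all(value is not None for value in row)]


cleaned = clean(rows)

# Names of the areas that were dropped because a value was missing.
discarded = [row[0] for row in rows if row not in cleaned]
print(discarded)
```

The row is dropped because a value is missing, not because its income is unusually high, which is the distinction the guidance column insists on.
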
---
## Question 14(b):
| Answer | Marks | Guidance |
|--------|-------|----------|
| **scatter** does not look linear oe | B1 | ignore extra comments unless they contradict an otherwise correct answer |
| pmcc not close to 1 oe | B1 | ignore extra comments unless they contradict an otherwise correct answer |
---
## Question 14(c):
| Answer | Marks | Guidance |
|--------|-------|----------|
| $27216 \pm 2 \times 4177.5$ or $61.0 \pm 2 \times 5.32$ | M1 | use of 2 standard deviation check for one of the 4 calculations soi |
| $m < 18861$ or $m > 35571$ | A1 | allow $\leq$ and $\geq$ |
| percentage $< 50.36$ or percentage $> 71.64$ | A1 | allow $\leq$ and $\geq$; if **M1A0A0** allow **M1 SCB1** for all 4 correct values seen |
| [scatter diagram with outliers circled] | A1 | |
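
As a check on the boundary values quoted in this scheme, the arithmetic behind the two-standard-deviation rule, using only the means and standard deviations given in Fig. 14.3, is:

$$
\begin{aligned}
27216 - 2 \times 4177.5 &= 27216 - 8355 = 18861, & 27216 + 2 \times 4177.5 &= 27216 + 8355 = 35571, \\
61.0 - 2 \times 5.32 &= 61.0 - 10.64 = 50.36, & 61.0 + 2 \times 5.32 &= 61.0 + 10.64 = 71.64.
\end{aligned}
$$
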
---
## Question 14(d):
| Answer | Marks | Guidance |
|--------|-------|----------|
| between 0 and 0.3743 since eg outliers gave a false impression of linearity; eg scatter will be more like a circle | B1 | need to refer to the shape of the scatter oe |
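
The reasoning in this answer can be illustrated with a small sketch. The points below are entirely hypothetical and are not the exam data (the full data set is not reproduced in the question); they are chosen only so that a roughly circular cluster is joined by two extreme points lying along a line. The helper `pmcc` and the variable names are assumptions made for this illustration.

```python
from math import sqrt


def pmcc(points):
    """Product moment correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_yy = sum((y - mean_y) ** 2 for _, y in points)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: a circular cluster with no linear trend...
core = [(-1, 0), (1, 0), (0, -1), (0, 1)]
# ...plus two hypothetical extreme points that suggest a linear trend.
outliers = [(5, 5), (-5, -5)]

r_with_outliers = pmcc(core + outliers)
r_without_outliers = pmcc(core)
print(r_with_outliers, r_without_outliers)
```

In this made-up example the coefficient drops once the extreme points are removed, which matches the direction of change the mark scheme expects; the size of the drop in the real data would depend on the actual points.
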
---
14 The pre-release material contains information concerning the median income of taxpayers in $\pounds$ and the percentage of all pupils at the end of KS4 achieving 5 or more GCSEs at grade A*-C, including English and Maths, for different areas of London.
Some of the data for 2014/15 is shown in Fig. 14.1.
\begin{table}[h]
\begin{center}
\captionsetup{labelformat=empty}
\caption{Fig. 14.1}
\begin{tabular}{|l|l|l|}
\hline
 & Median Income of Taxpayers in $\pounds$ & Percentage of Pupils Achieving 5 or more A*-C, including English and Maths \\
\hline
City of London & 61100 & \#N/A \\
\hline
Barking and Dagenham & 21800 & 54.0 \\
\hline
Barnet & 27100 & 70.1 \\
\hline
Bexley & 24400 & 55.0 \\
\hline
Brent & 22700 & 60.0 \\
\hline
Bromley & 28100 & 68.0 \\
\hline
\end{tabular}
\end{center}
\end{table}
A student investigated whether there is any relationship between median income of taxpayers and percentage of pupils achieving 5 or more GCSEs at grade A*-C, including English and Maths.
\begin{enumerate}[label=(\alph*)]
\item With reference to Fig. 14.1, explain how the data should be cleaned before any analysis can take place.
After the data was cleaned, the student used software to draw the scatter diagram shown in Fig. 14.2.
Scatter diagram to show percentage of pupils achieving 5 A*-C grades against median income of taxpayers
\begin{figure}[h]
\begin{center}
\captionsetup{labelformat=empty}
\caption{Fig. 14.2}
\includegraphics[alt={},max width=\textwidth]{11788aaf-98fb-4a78-8a40-a40743b1fe15-10_574_1481_1900_241}
\end{center}
\end{figure}
The student calculated that the product moment correlation coefficient for these data is 0.3743.
\item Give two reasons why it may not be appropriate to use a linear model for the relationship between median income of taxpayers in $\pounds$ and the percentage of all pupils at the end of KS4 achieving 5 or more GCSEs at grade A*-C.
The student carried out some further analysis. The results are shown in Fig. 14.3.
\begin{table}[h]
\begin{center}
\captionsetup{labelformat=empty}
\caption{Fig. 14.3}
\begin{tabular}{ | l | c | c | }
\hline
& \begin{tabular}{ l }
median income of \\
taxpayers in $\pounds$ \\
\end{tabular} & \begin{tabular}{ l }
percentage of pupils \\
achieving 5+ A*-C \\
\end{tabular} \\
\hline
mean & 27216 & 61.0 \\
\hline
standard deviation & 4177.5 & 5.32 \\
\hline
\end{tabular}
\end{center}
\end{table}
The student identified three outliers in total.
\item \begin{itemize}
\item Use the information in Fig. 14.3 to determine the range of values of the median income of taxpayers in $\pounds$ which are outliers.
\item Use the information in Fig. 14.3 to determine the range of values of the percentage of all pupils at the end of KS4 achieving 5 or more GCSEs at grade A*-C which are outliers.
\item On the copy of Fig. 14.2 in the Printed Answer Booklet, circle the three outliers identified by the student.
\end{itemize}
The student decided to remove these outliers and recalculate the product moment correlation coefficient.
\item Explain whether the new value of the product moment correlation coefficient would be between 0.3743 and 1 or between 0 and 0.3743.
\end{enumerate}
\hfill \mbox{\textit{OCR MEI Paper 2 2023 Q14 [8]}}