| Exam Board | OCR MEI |
|---|---|
| Module | AS Paper 2 |
| Year | 2023 |
| Session | June |
| Marks | 4 |
| Topic | Bivariate data |
| Type | Assess appropriateness of correlation analysis |
| Difficulty | Moderate -0.5. This question tests understanding of data cleaning and of the appropriateness of correlation analysis, both fundamental statistical concepts. Parts (a) and (b) involve straightforward identification of missing data (#N/A) and of an outlier on a scatter diagram. Part (c) requires explaining why using the line of best fit may be inappropriate (likely because little or no correlation is visible in the scatter diagram), and part (d) involves contextual reasoning about whether pulse rates above 100 are errors. These are standard AS-level statistics tasks requiring interpretation rather than calculation, making the question slightly easier than average. |
| Spec | 2.02c Scatter diagrams and regression lines; 2.02j Clean data: missing data, errors |
# Question 8:
| Part | Answer | Marks | Guidance |
|---|---|---|---|
| (a) | Remove any data where #N/A is in the column, as there is no data available | B1 | Must refer to N/A/missing data. 'Remove data without a pulse reading' is B0. Comments such as 'exclude 20/81 year old female' are B0 |
| (b) | $(62.77, 84)$ ringed and no others | B1 | Point at approximately BMI≈62.77, Pulse Rate≈84 must be circled and no other points circled |
| (c) | No evidence of a linear relationship, so unlikely to be reliable | B1 | Any comment relating to interpolation or extrapolation is B0; need comment on appropriateness of model considering scatter diagram. Condone 'there appears to be little correlation/weak positive correlation' etc. 'No/Zero correlation' is B0, e.g. PMCC could be 0.1 etc |
| (d) | None of the pulse rates are that unusual so should not be removed | B1 | Need 'no/keep' and reason. Condone 'No as they are not outliers'. Accept 'Higher pulse rates are not uncommon' |
---
8 The pre-release material contains information on Pulse Rate and Body Mass Index (BMI). A student is investigating whether there is a relationship between pulse rate and BMI. A section of the available data is shown in the table.
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
Sex & Age & BMI & Pulse \\
\hline
Male & 62 & 29.54 & 60 \\
\hline
Female & 20 & 23.68 & \#N/A \\
\hline
Male & 17 & 26.97 & 72 \\
\hline
Male & 35 & 24.7 & 64 \\
\hline
Male & 17 & 20.09 & 54 \\
\hline
Male & 85 & 23.86 & 54 \\
\hline
Female & 81 & 24.04 & \#N/A \\
\hline
\end{tabular}
\end{center}
The student decides to draw a scatter diagram.
\begin{enumerate}[label=(\alph*)]
\item With reference to the table, explain which data should be cleaned before any analysis takes place.
The student cleans the data for BMI and Pulse Rate in the pre-release material and draws a scatter diagram.
\begin{figure}[h]
\begin{center}
\captionsetup{labelformat=empty}
\caption{Scatter diagram of Pulse Rate against BMI}
\includegraphics[alt={},max width=\textwidth]{82438df0-6550-4ffd-92d8-3c67bec59a6b-06_869_1575_1585_246}
\end{center}
\end{figure}
The student identifies one outlier.
\item On the copy of the scatter diagram in the Printed Answer Booklet, circle this outlier.
The student decides to remove this outlier from the data. They then use the LINEST function in the spreadsheet to obtain the following formula for the line of best fit.\\
$P = 0.29Q + 64.2$,\\
where $P$ = Pulse Rate and $Q$ = BMI.
They use this to estimate the Pulse Rate of a person with BMI 23.68.\\
They obtain a value of 71 correct to the nearest whole number.
\item With reference to the scatter diagram, explain whether it is appropriate to use the formula for the line of best fit.
It is suggested that all pairs of values where the pulse rate is above 100 should also be cleaned from the data, as they must be incorrect.
\item Use your knowledge of the pre-release material to explain whether or not all pairs of values with a pulse rate of more than 100 should be cleaned from the data.
\end{enumerate}
\hfill \mbox{\textit{OCR MEI AS Paper 2 2023 Q8 [4]}}
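
The cleaning step in part (a) and the estimate quoted before part (c) can be checked with a short Python sketch. It is illustrative only: the rows are copied from the table in the question, while the names `ROWS`, `clean` and `predict_pulse` are hypothetical helpers introduced here, not part of the pre-release material or the mark scheme. The sketch only reproduces the arithmetic (the formula gives about 71.07, which rounds to 71); it says nothing about whether using the line is appropriate, which is what part (c) asks.

```python
# Rows copied from the table in the question: (sex, age, bmi, pulse).
# "#N/A" marks a missing pulse reading, as in the spreadsheet extract.
ROWS = [
    ("Male", 62, 29.54, "60"),
    ("Female", 20, 23.68, "#N/A"),
    ("Male", 17, 26.97, "72"),
    ("Male", 35, 24.7, "64"),
    ("Male", 17, 20.09, "54"),
    ("Male", 85, 23.86, "54"),
    ("Female", 81, 24.04, "#N/A"),
]


def clean(rows):
    """Drop rows whose pulse entry is missing (hypothetical helper)."""
    return [
        (sex, age, bmi, int(pulse))
        for sex, age, bmi, pulse in rows
        if pulse != "#N/A"
    ]


def predict_pulse(bmi, gradient=0.29, intercept=64.2):
    """Evaluate the line of best fit from the question, P = 0.29Q + 64.2."""
    return gradient * bmi + intercept


cleaned = clean(ROWS)
estimate = predict_pulse(23.68)

print(len(cleaned), "rows kept after removing #N/A entries")
print("Estimated pulse rate:", round(estimate))
```

Keeping the pulse column as text until after the filter mirrors how the spreadsheet stores `#N/A`, so the missing entries are removed explicitly rather than silently converted.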