OCR MEI AS Paper 2 2023 June — Question 8 4 marks

Exam Board: OCR MEI
Module: AS Paper 2
Year: 2023
Session: June
Marks: 4
Topic: Bivariate data
Type: Assess appropriateness of correlation analysis
Difficulty: Moderate -0.5. This question tests understanding of data cleaning and of when correlation analysis is appropriate, both fundamental statistical ideas. Parts (a) and (b) involve straightforward identification of missing data (#N/A) and of an outlier on a scatter diagram. Part (c) requires explaining why linear regression may be inappropriate (likely because the scatter shows weak or no correlation), and part (d) involves contextual reasoning about whether pulse rates above 100 are errors. These are standard AS-level statistics tasks requiring interpretation rather than calculation, so the question is slightly easier than average.
Spec: 2.02c Scatter diagrams and regression lines; 2.02j Clean data: missing data, errors


# Question 8:

## Part (a)
Remove any data where #N/A is in the column, as there is no data available | **B1** | Must refer to N/A/missing data. 'Remove data without a pulse reading' is B0. Comments such as 'exclude 20/81-year-old female' are B0.

## Part (b)
$(62.77, 84)$ ringed and no others | **B1** | The point at approximately BMI ≈ 62.77, Pulse Rate ≈ 84 must be circled, and no other points circled.

## Part (c)
No evidence of a linear relationship, so unlikely to be reliable | **B1** | Any comment relating to interpolation or extrapolation is B0; need a comment on the appropriateness of the model considering the scatter diagram. Condone 'there appears to be little correlation/weak positive correlation' etc. 'No/Zero correlation' is B0, e.g. PMCC could be 0.1.

## Part (d)
None of the pulse rates are that unusual, so they should not be removed | **B1** | Need 'no/keep' and a reason. Condone 'No as they are not outliers'. Accept 'Higher pulse rates are not uncommon'.
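
The cleaning step described in Part (a) can be illustrated with a minimal Python sketch. It is not part of the mark scheme: the rows are copied from the table excerpt in the question, and the names `rows`, `MISSING` and `clean` are hypothetical, chosen only for this example. It assumes that a missing pulse reading always appears as the literal text `#N/A`.

```python
# Rows copied from the table excerpt in the question: (sex, age, BMI, pulse).
# Pulse is held as text because the spreadsheet marks missing values as "#N/A".
rows = [
    ("Male", 62, 29.54, "60"),
    ("Female", 20, 23.68, "#N/A"),
    ("Male", 17, 26.97, "72"),
    ("Male", 35, 24.7, "64"),
    ("Male", 17, 20.09, "54"),
    ("Male", 85, 23.86, "54"),
    ("Female", 81, 24.04, "#N/A"),
]

MISSING = "#N/A"  # assumed marker for a missing value


def clean(records):
    """Keep only records with a pulse reading, converting the pulse to int."""
    return [
        (sex, age, bmi, int(pulse))
        for sex, age, bmi, pulse in records
        if pulse != MISSING
    ]


cleaned = clean(rows)
removed = len(rows) - len(cleaned)
print(f"kept {len(cleaned)} rows, removed {removed}")
```

The filter tests for the missing-value marker itself rather than for particular people, which mirrors the mark scheme's requirement that the reason given is the missing data and not, for example, the age or sex of the individuals concerned.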

---
8 The pre-release material contains information on Pulse Rate and Body Mass Index (BMI). A student is investigating whether there is a relationship between pulse rate and BMI. A section of the available data is shown in the table.

\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
Sex & Age & BMI & Pulse \\
\hline
Male & 62 & 29.54 & 60 \\
\hline
Female & 20 & 23.68 & \#N/A \\
\hline
Male & 17 & 26.97 & 72 \\
\hline
Male & 35 & 24.7 & 64 \\
\hline
Male & 17 & 20.09 & 54 \\
\hline
Male & 85 & 23.86 & 54 \\
\hline
Female & 81 & 24.04 & \#N/A \\
\hline
\end{tabular}
\end{center}

The student decides to draw a scatter diagram.
\begin{enumerate}[label=(\alph*)]
\item With reference to the table, explain which data should be cleaned before any analysis takes place.

The student cleans the data for BMI and Pulse Rate in the pre-release material and draws a scatter diagram.

\begin{figure}[h]
\begin{center}
\captionsetup{labelformat=empty}
\caption{Scatter diagram of Pulse Rate against BMI}
  \includegraphics[alt={},max width=\textwidth]{82438df0-6550-4ffd-92d8-3c67bec59a6b-06_869_1575_1585_246}
\end{center}
\end{figure}

The student identifies one outlier.
\item On the copy of the scatter diagram in the Printed Answer Booklet, circle this outlier.

The student decides to remove this outlier from the data. They then use the LINEST function in the spreadsheet to obtain the following formula for the line of best fit.\\
$P = 0.29Q + 64.2$,\\
where $P$ = Pulse Rate and $Q$ = BMI.

They use this to estimate the Pulse Rate of a person with BMI 23.68.\\
They obtain a value of 71 correct to the nearest whole number.
\item With reference to the scatter diagram, explain whether it is appropriate to use the formula for the line of best fit.

It is suggested that all pairs of values where the pulse rate is above 100 should also be cleaned from the data, as they must be incorrect.
\item Use your knowledge of the pre-release material to explain whether or not all pairs of values with a pulse rate of more than 100 should be cleaned from the data.
\end{enumerate}

\hfill \mbox{\textit{OCR MEI AS Paper 2 2023 Q8 [4]}}
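
As a check on the student's arithmetic (this working is an editorial addition, not part of the original question), substituting $Q = 23.68$ into the stated line of best fit gives
\[
P = 0.29 \times 23.68 + 64.2 = 6.8672 + 64.2 = 71.0672 \approx 71,
\]
which agrees with the value quoted to the nearest whole number. This confirms only the calculation; whether the line of best fit is an appropriate model is the separate issue raised in part (c).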