OCR MEI Further Statistics Major 2024 June — Question 8 14 marks

Exam Board: OCR MEI
Module: Further Statistics Major
Year: 2024
Session: June
Marks: 14
Topic: Linear regression
Type: Interpret features of scatter diagram
Difficulty: Moderate -0.3. This is a straightforward linear regression question requiring standard calculations (finding a regression line from summary statistics, making predictions, interpreting correlation). Part (a) tests basic scatter diagram interpretation, parts (b)-(c) are routine formula application, and parts (d)-(e) require standard commentary on correlation strength and appropriate use of regression lines. Although it is a multi-part question worth 14 marks, all components are textbook exercises with no novel problem-solving required, making it slightly easier than average.
Spec: 5.09b Least squares regression: concepts; 5.09c Calculate regression line; 5.09e Use regression: for estimation in context

8 An estate agent collects data for a random selection of 13 flats in order to investigate the link between the floor areas of flats and their price. The scatter diagram shows the floor areas, \(x \mathrm {~m} ^ { 2 }\), and prices, \(\pounds y\) thousand, of the 13 flats. \includegraphics[max width=\textwidth, alt={}, center]{bab116b3-6e5f-44db-ac86-670e4040d649-07_613_1246_386_242}
  1. The estate agent notes that two of the data points are outliers. One is Flat A which has a large floor area but is in poor condition. The other is Flat B which has a balcony with a desirable view overlooking the sea. Label these two data points on the copy of the scatter diagram in the Printed Answer Booklet. The estate agent decides to remove these two data points from the analysis. Summary statistics for the remaining 11 flats are as follows. $$\sum x = 652.5 \quad \sum y = 5067 \quad \sum x ^ { 2 } = 41987.35 \quad \sum y ^ { 2 } = 2456813 \quad \sum x y = 315928.2$$
  2. In this question you must show detailed reasoning. Calculate the equation of a regression line which is suitable for estimating the price of a flat from its floor area.
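  3. Use the regression line to estimate the price for the following floor areas.
    - \(40 \mathrm {~m} ^ { 2 }\)
    - \(110 \mathrm {~m} ^ { 2 }\)
  4. Given that the value of the product moment correlation coefficient for these 11 data items is 0.765, comment on the reliability of your estimates.
  5. The estate agent thinks that he can predict the floor area of a flat from its price, using the equation of the regression line found in part (b). Comment briefly on the estate agent's idea.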

Question 8:
8(a)
Answer: Label Flat A at the point approx. (120, 600). Label Flat B at the point approx. (90, 1000).
Marks: B1, B1 [2]
AOs: 3.3, 1.1
Guidance: B0 unless point labelled A. B0 unless point labelled B.
8(b)
Answer: DR. NB: $\bar{x} = \frac{652.5}{11} = 59.318$, $\bar{y} = \frac{5067}{11} = 460.63$
$$b = \frac{S_{xy}}{S_{xx}} = \frac{315928.2 - (652.5 \times 5067 / 11)}{41987.35 - 652.5^{2} / 11} = \frac{15362.97}{3282.236} = 4.6806\ldots$$
For correct line (y on x), so the equation is $y - \bar{y} = b(x - \bar{x})$
$$y - 460.63 = 4.6806(x - 59.318)$$
$$y = 4.6806x + 182.99$$
Marks: M1, A1, B1, M1, A1 [5]
AOs: 1.1a, 1.1, 3.3, 1.1, 1.1
Guidance: For attempt at gradient (b). Use of 13 instead of 11 can get Max M1A0B1M1A0, which would lead to $y = 6.669x + 55.05$. Allow 4.7 or better. For equation of line. Condone use of x on y regression line for Max M1A0B0M1A0.
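The gradient and intercept in part (b) can be checked numerically from the summary statistics. The following is a minimal sketch, not part of the mark scheme; the function name `regression_y_on_x` is a hypothetical helper introduced only for illustration.

```python
def regression_y_on_x(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (gradient, intercept) of the least squares line of y on x."""
    s_xy = sum_xy - sum_x * sum_y / n   # corrected sum of products, S_xy
    s_xx = sum_x2 - sum_x ** 2 / n      # corrected sum of squares of x, S_xx
    gradient = s_xy / s_xx
    intercept = sum_y / n - gradient * sum_x / n
    return gradient, intercept


# Summary statistics for the 11 remaining flats, copied from the question.
b, a = regression_y_on_x(11, 652.5, 5067, 41987.35, 315928.2)
print(f"y = {b:.4f}x + {a:.2f}")  # y = 4.6806x + 182.99
```

Calling the same function with n = 13 gives a gradient of about 6.669, the value the guidance associates with using the wrong sample size.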
8(c)
Answer: Area 40 ⇒ £370 thousand. Area 110 ⇒ £698 thousand.
Marks: B1, B1 [2]
AOs: 1.1, 1.1
Guidance: FT provided y on x. Allow B1B0 if answers given to more than the nearest whole number of thousands or if thousands omitted, and B0B0 if both. FT provided y on x. n = 13 leads to £321 thousand and £788 thousand.
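The two estimates in part (c) follow from substituting each floor area into the line from part (b). A small sketch, assuming the rounded coefficients 4.6806 and 182.99 quoted in the mark scheme; `predict_price` is a hypothetical name used only here.

```python
def predict_price(area_m2, gradient=4.6806, intercept=182.99):
    """Estimated price in thousands of pounds, using the part (b) line."""
    return gradient * area_m2 + intercept


for area in (40, 110):
    # Round to the nearest whole number of thousands, as the guidance requires.
    print(f"{area} m^2 -> £{round(predict_price(area))} thousand")
# 40 m^2 -> £370 thousand
# 110 m^2 -> £698 thousand
```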
8(d)
Answer: Although the prediction for 40 $\mathrm{m}^{2}$ lies within the data (interpolation), the points do not lie too close to the line, so it is not too reliable. The value of $r^{2}$ is 0.585, which is not close to 1, which further suggests that the estimate is only moderately reliable. The prediction for 110 $\mathrm{m}^{2}$ is even less reliable since it is an extrapolation.
Marks: B1, B1, B1 [3]
AOs: 2.2a, 3.5b, 3.5b
Guidance: Allow first B1 for any correct comment about 40 $\mathrm{m}^{2}$. Condone 'Near the centre of the data'. Condone comment about the PMCC for first B1. Allow second B1 for all 3 correct comments about 40 $\mathrm{m}^{2}$, and must use $r^{2}$ rather than $r$. Allow '$r^{2}$ is reasonably close to 1 and the points are fairly close to a straight line'. Max 2 out of 3 if any wrong comments seen.
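The correlation coefficient quoted in part (d) can be recovered from the same summary statistics, and squaring the quoted value gives the 0.585 used in the mark scheme. This is an illustrative check only; the variable names are assumptions, not notation from the paper.

```python
from math import sqrt

# Summary statistics for the 11 remaining flats, copied from the question.
n = 11
sum_x, sum_y = 652.5, 5067
sum_x2, sum_y2, sum_xy = 41987.35, 2456813, 315928.2

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Product moment correlation coefficient from the corrected sums.
r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))           # 0.765, matching the value given in part (d)

# The mark scheme squares the quoted (rounded) value of r.
print(round(0.765 ** 2, 3))  # 0.585
```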
8(e)
Answer: The regression line of x on y would be needed. It would not be sensible since the line in part (b) only measures the average cost for a given area and not the reverse.
Marks: B1, B1 [2]
AOs: 3.5b, 3.5c
Guidance: Any suitable context. 'The regression line of floor area on price would be needed' gets B1B1. Condone 'the regression coefficient will be calculated using $\frac{S_{xy}}{S_{xx}}$ so the line found in part (b) cannot be used' for B1.
8 An estate agent collects data for a random selection of 13 flats in order to investigate the link between the floor areas of flats and their price. The scatter diagram shows the floor areas, $x \mathrm {~m} ^ { 2 }$, and prices, $\pounds y$ thousand, of the 13 flats.\\
\includegraphics[max width=\textwidth, alt={}, center]{bab116b3-6e5f-44db-ac86-670e4040d649-07_613_1246_386_242}
\begin{enumerate}[label=(\alph*)]
\item The estate agent notes that two of the data points are outliers. One is Flat A which has a large floor area but is in poor condition. The other is Flat B which has a balcony with a desirable view overlooking the sea.

Label these two data points on the copy of the scatter diagram in the Printed Answer Booklet.

The estate agent decides to remove these two data points from the analysis. Summary statistics for the remaining 11 flats are as follows.

$$\sum x = 652.5 \quad \sum y = 5067 \quad \sum x ^ { 2 } = 41987.35 \quad \sum y ^ { 2 } = 2456813 \quad \sum x y = 315928.2$$
\item In this question you must show detailed reasoning.

Calculate the equation of a regression line which is suitable for estimating the price of a flat from its floor area.
\item Use the regression line to estimate the price for the following floor areas.

\begin{itemize}
  \item $40 \mathrm {~m} ^ { 2 }$
  \item $110 \mathrm {~m} ^ { 2 }$
\end{itemize}
\item Given that the value of the product moment correlation coefficient for these 11 data items is 0.765, comment on the reliability of your estimates.
\item The estate agent thinks that he can predict the floor area of a flat from its price, using the equation of the regression line found in part (b).

Comment briefly on the estate agent's idea.
\end{enumerate}

\hfill \mbox{\textit{OCR MEI Further Statistics Major 2024 Q8 [14]}}