OCR Further Statistics 2023 June, Question 2 (8 marks)

Exam Board: OCR
Module: Further Statistics
Year: 2023
Session: June
Marks: 8
Topic: Linear regression
Type: Calculate y on x from summary statistics
Difficulty: Standard +0.3. This is a straightforward application of the standard linear regression formulae to given summary statistics. Students substitute values into the formulae for gradient and intercept, then perform routine calculations. Part (b) tests understanding of units (a conceptual check), part (c) is simple substitution, and part (d) requires standard interpretation of correlation strength and of interpolation versus extrapolation. All steps are procedural, with no novel problem-solving required, making the question slightly easier than average.
Spec: 5.09c Calculate regression line

Question 2:

(a) Answer: $h = 0.322p + 1.64$ (1.636), or $h = \frac{975}{596} + \frac{961}{2980}p$
Marks: B1 B1 [2], AO 1.1, 1.1
Guidance: a in range [1.63, 1.64] or b in range [0.322, 0.323]. All correct including h and p, but allow if a and b are correct and a numerical or sign error is made in writing out the final equation. NB: not b = 1.66. SC: h = 32.2p + 164: B1.

(b) Answer: New equation is $h = 0.0322p + 0.1635$, or “both a and b divided by 10”
Marks: B1ft [1], AO 2.2a
Guidance: Both their coefficients divided by 10; ignore letters.

(c) Answer: 17.8 hundred
Marks: B1 [1], AO 1.1
Guidance: In range [17.7, 17.8] hundred, or in range [1770, 1780], or 1.8 thousand. Not just 17.8 or 1.8.

(d) Marks: B1 B1 B1 B1 [4], AO 1.1, 1.1, 2.4, 2.3
B1 (AO 1.1): Fair correlation only (so only fairly reliable). Guidance: any comment based on size of r; allow comparison with CV but not with b; allow ‘association’.
B1 (AO 1.1): 50 is in range of data. Guidance: state that 50 is in the data range, but not just “50 is not one of the data values”.
B1 (AO 2.4): e.g. audiences may be different for a charity event, or attendance depends on causal factors, or not based on a random sample, or sample size small, etc. Guidance: any reasonable relevant comment based on context or sampling, but not “correlation not causality” or a similar rote statement.
B1 (AO 2.3): Overall not very reliable. Guidance: final nuanced conclusion between “fairly reliable” and “not reliable” inclusive, based on at least two or three of the above, but if they use 0.642 > CV (or other wrong statement) this cannot count towards the ‘two or three’. Not just separate assessments for each statement.
[Comparison with critical values is not relevant. The issue is not “is there any correlation?” (i.e., is $\rho = 0$?) but how strong the correlation is (is $\rho$ close to 1?), and hypothesis tests don’t tell you this.]

A: “0.642 quite high so fairly reliable; 50 is in data range so reliable; charity event not typical so less reliable.” Marks: B1 B1 B1 B0 (maximum 3/4, i.e., don’t give the final B1 for successive separate assessments, e.g., “fairly reliable, more reliable, less reliable”).
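
The figures quoted for parts (a) to (c) can be checked from the summary statistics $n = 20$, $\sum p = 1080$, $\sum h = 381$, $\sum p^2 = 61300$, $\sum ph = 21535$. The following is only an illustrative working, not part of the published mark scheme; the symbols $S_{pp}$, $S_{ph}$, $a$, $b$ and $h_{\text{thousands}}$ are notation assumed here for the corrected sums, the intercept, the gradient and the rescaled audience size.

\[
S_{pp} = \sum p^2 - \frac{\left(\sum p\right)^2}{n} = 61300 - \frac{1080^2}{20} = 2980,
\qquad
S_{ph} = \sum ph - \frac{\sum p \sum h}{n} = 21535 - \frac{1080 \times 381}{20} = 961
\]
\[
b = \frac{S_{ph}}{S_{pp}} = \frac{961}{2980} \approx 0.3225,
\qquad
a = \bar{h} - b\,\bar{p} = \frac{381}{20} - \frac{961}{2980} \times \frac{1080}{20} = \frac{975}{596} \approx 1.636
\]
\[
h_{\text{thousands}} = \frac{h}{10} \approx 0.1636 + 0.03225\,p,
\qquad
p = 50:\quad h \approx 1.636 + 0.3225 \times 50 \approx 17.76 \text{ (hundred)} \approx 1.8 \text{ thousand}
\]

Measuring $H$ in thousands divides every $h$ value by 10, so both coefficients scale by the same factor and the prediction itself is unchanged apart from its units.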
2 The director of a concert hall wishes to investigate if the price of the most expensive concert tickets affects attendance. The director collects data about the price, $\pounds P$, of the most expensive tickets and the number of people in the audience, $H$ hundred (rounded to the nearest hundred), for 20 concerts. For each price there are several different concerts. The results are shown in the table.

\begin{center}
\begin{tabular}{|l|l|l|l|l|l|}
\hline
$P$ (£) & 75 & 65 & 55 & 45 & 35 \\
\hline
\multirow[t]{5}{*}{$H$ (hundred)} & 27 & 27 & 27 & 26 & 15 \\
\hline
 & 27 & 27 & 20 & 21 & 12 \\
\hline
 &  & 22 & 18 & 16 & 9 \\
\hline
 &  & 19 & 18 & 13 &  \\
\hline
 &  & 12 & 16 & 9 &  \\
\hline
\end{tabular}
\end{center}

$n = 20 \quad \sum p = 1080 \quad \sum h = 381 \quad \sum p^{2} = 61300 \quad \sum h^{2} = 8011 \quad \sum ph = 21535$
\begin{enumerate}[label=(\alph*)]
\item Calculate the equation of the regression line of $h$ on $p$.
\item State what change, if any, there would be to your answer to part (a) if $H$ had been measured in thousands (to 1 decimal place) rather than in hundreds.

For a special charity concert, the most expensive tickets cost $\pounds 50$.
\item Use your answer to part (b) to estimate the expected size of the audience for this concert. Give your answer correct to $\mathbf { 1 }$ decimal place.
\item Comment on the reliability of your answer to part (c). You should refer to

\begin{itemize}
  \item the value of the product-moment correlation coefficient for the data, which is 0.642
  \item the value of $\pounds 50$
  \item any one other relevant factor that should be taken into account.
\end{itemize}
\end{enumerate}
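
The correlation coefficient quoted in part (d) can be reproduced from the same summary statistics. The working below is a sketch for checking purposes only; $S_{pp}$, $S_{hh}$ and $S_{ph}$ are notation assumed here for the corrected sums of squares and products.

\[
S_{pp} = 61300 - \frac{1080^2}{20} = 2980,
\qquad
S_{hh} = 8011 - \frac{381^2}{20} = 752.95,
\qquad
S_{ph} = 21535 - \frac{1080 \times 381}{20} = 961
\]
\[
r = \frac{S_{ph}}{\sqrt{S_{pp}\,S_{hh}}} = \frac{961}{\sqrt{2980 \times 752.95}} \approx 0.642,
\qquad
r^2 \approx 0.41
\]

On this working the fitted line accounts for roughly 41\% of the variation in $h$, which is consistent with describing the correlation as moderate rather than strong.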

\hfill \mbox{\textit{OCR Further Statistics 2023 Q2 [8]}}