Question 2 - A-Level Maths

OCR Further Statistics 2023 June — Question 2 8 marks

Exam Board	OCR
Module	Further Statistics
Year	2023
Session	June
Marks	8
Topic	Linear regression
Type	Calculate y on x from summary statistics
Difficulty	Standard +0.3. A straightforward application of the standard linear regression formulae to given summary statistics: students substitute values into the formulae for gradient and intercept and carry out routine calculations. Part (b) is a conceptual check on the effect of changing units, part (c) is simple substitution, and part (d) requires standard interpretation of correlation strength and of interpolation versus extrapolation. All steps are procedural, with no novel problem-solving required, making the question slightly easier than average.
Spec	5.09c Calculate regression line

2 The director of a concert hall wishes to investigate if the price of the most expensive concert tickets affects attendance. The director collects data about the price, \(\pounds P\), of the most expensive tickets and the number of people in the audience, \(H\) hundred (rounded to the nearest hundred), for 20 concerts. For each price there are several different concerts. The results are shown in the table.

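\begin{center}
\begin{tabular}{|l|c|c|c|c|c|}
\hline
\(P\) (£) & 75 & 65 & 55 & 45 & 35 \\
\hline
\(H\) (hundred) & 27 & 27 & 27 & 26 & 15 \\
 & 27 & 27 & 20 & 21 & 12 \\
 &  & 22 & 18 & 16 & 9 \\
 &  & 19 & 18 & 13 &  \\
 &  & 12 & 16 & 9 &  \\
\hline
\end{tabular}
\end{center}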

\(n = 20 \quad \sum p = 1080 \quad \sum h = 381 \quad \sum p^{2} = 61300 \quad \sum h^{2} = 8011 \quad \sum ph = 21535\)
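As a sanity check, the summary statistics can be recomputed from the table. The following is a minimal Python sketch; the names `data`, `pairs` and `summary` are illustrative only and are not part of the question.

```python
# Raw data from the table: ticket price p (pounds) -> audience sizes h (hundreds).
data = {
    75: [27, 27],
    65: [27, 27, 22, 19, 12],
    55: [27, 20, 18, 18, 16],
    45: [26, 21, 16, 13, 9],
    35: [15, 12, 9],
}

# Flatten the table into one (p, h) pair per concert.
pairs = [(p, h) for p, hs in data.items() for h in hs]

summary = {
    "n": len(pairs),
    "sum_p": sum(p for p, _ in pairs),
    "sum_h": sum(h for _, h in pairs),
    "sum_pp": sum(p * p for p, _ in pairs),
    "sum_hh": sum(h * h for _, h in pairs),
    "sum_ph": sum(p * h for p, h in pairs),
}

print(summary)
```

The printed values should agree with the six summary statistics quoted in the question, which confirms that the table has been read column by column (two concerts at £75, five each at £65, £55 and £45, and three at £35).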

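\begin{enumerate}[label=(\alph*)]
\item Calculate the equation of the regression line of \(h\) on \(p\).
\item State what change, if any, there would be to your answer to part (a) if \(H\) had been measured in thousands (to 1 decimal place) rather than in hundreds.

For a special charity concert, the most expensive tickets cost \(\pounds 50\).
\item Use your answer to part (b) to estimate the expected size of the audience for this concert. Give your answer correct to \(\mathbf { 1 }\) decimal place.
\item Comment on the reliability of your answer to part (c). You should refer to

\begin{itemize}
  \item the value of the product-moment correlation coefficient for the data, which is 0.642
  \item the value of \(\pounds 50\)
  \item any one other relevant factor that should be taken into account.
\end{itemize}
\end{enumerate}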


Mark scheme, Question 2:

Part | Answer | Marks | AO | Guidance
(a) | \(h = 0.322p + 1.64\) (1.636), or \(h = \frac{975}{596} + \frac{961}{2980}p\) | B1 | 1.1 | \(a\) in range [1.63, 1.64] or \(b\) in range [0.322, 0.323]
 | | B1 | 1.1 | All correct including \(h\) and \(p\), but allow if \(a\) and \(b\) correct and numerical or sign error made in writing out final equation. NB: not b = 1.66. SC: h = 32.2p + 164: B1
 | | [2] | |
(b) | New equation is \(h = 0.0322p + 0.1635\), or “both a and b divided by 10” | B1ft | 2.2a | Both their coefficients divided by 10, ignore letters
 | | [1] | |
(c) | 17.8 hundred | B1 | 1.1 | In range [17.7, 17.8] hundred, or in range [1770, 1780], or 1.8 thousand. Not just 17.8 or 1.8.
 | | [1] | |
(d) | Fair correlation only (so only fairly reliable) | B1 | 1.1 | Any comment based on size of \(r\), allow comparison with CV but not with \(b\), allow ‘association’
 | 50 is in range of data | B1 | 1.1 | State that 50 is in the data range, but not just “50 is not one of the data values”
 | e.g. audiences may be different for a charity event, or attendance depends on causal factors, or not based on a random sample, or sample size small, etc | B1 | 2.4 | Any reasonable relevant comment based on context or sampling, but not “correlation not causality”, or similar rote statement.
 | Overall not very reliable | B1 | 2.3 | Final nuanced conclusion between “fairly reliable” and “not reliable” inclusive, based on at least two or three of the above, but if they use 0.642 > CV (or other wrong statement) this cannot count towards the ‘two or three’. Not just separate assessments for each statement
 | | [4] | |

[Comparison with critical values is not relevant. The issue is not “is there any correlation?” (i.e., is \(\rho = 0\)?) but how strong is the correlation (is \(\rho\) close to 1?), and hypothesis tests don’t tell you this.]

Example A | 0.642 quite high so fairly reliable; 50 is in data range so reliable; charity event not typical so less reliable (maximum 3/4: i.e., don’t give the final B1 for successive separate assessments, e.g., “fairly reliable, more reliable, less reliable”) | B1B1B1B0
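The mark-scheme values for parts (a) and (c) can be reproduced from the summary statistics. The working below is a sketch, not part of the published scheme; the symbols \(S_{pp}\) and \(S_{ph}\) are assumed notation for the corrected sums of squares and products.

\[
S_{pp} = \sum p^{2} - \frac{\left(\sum p\right)^{2}}{n} = 61300 - \frac{1080^{2}}{20} = 2980,
\qquad
S_{ph} = \sum ph - \frac{\sum p \sum h}{n} = 21535 - \frac{1080 \times 381}{20} = 961
\]

\[
b = \frac{S_{ph}}{S_{pp}} = \frac{961}{2980} \approx 0.3225,
\qquad
a = \bar{h} - b\,\bar{p} = 19.05 - \frac{961}{2980} \times 54 = \frac{975}{596} \approx 1.636
\]

\[
p = 50: \quad h = \frac{975}{596} + \frac{961}{2980} \times 50 \approx 17.76
\]

This gives 17.8 hundred (equivalently 1.8 thousand) to 1 decimal place, consistent with the accepted range in part (c).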

LaTeX source

2 The director of a concert hall wishes to investigate if the price of the most expensive concert tickets affects attendance. The director collects data about the price, $\pounds P$, of the most expensive tickets and the number of people in the audience, $H$ hundred (rounded to the nearest hundred), for 20 concerts. For each price there are several different concerts. The results are shown in the table.

\begin{center}
\begin{tabular}{|l|l|l|l|l|l|}
\hline
$P$ (£) & 75 & 65 & 55 & 45 & 35 \\
\hline
\multirow[t]{5}{*}{$H$ (hundred)} & 27 & 27 & 27 & 26 & 15 \\
\hline
 & 27 & 27 & 20 & 21 & 12 \\
\hline
 &  & 22 & 18 & 16 & 9 \\
\hline
 &  & 19 & 18 & 13 &  \\
\hline
 &  & 12 & 16 & 9 &  \\
\hline
\end{tabular}
\end{center}

$n = 20 \quad \sum p = 1080 \quad \sum h = 381 \quad \sum p^{2} = 61300 \quad \sum h^{2} = 8011 \quad \sum ph = 21535$
\begin{enumerate}[label=(\alph*)]
\item Calculate the equation of the regression line of $h$ on $p$.
\item State what change, if any, there would be to your answer to part (a) if $H$ had been measured in thousands (to 1 decimal place) rather than in hundreds.

For a special charity concert, the most expensive tickets cost $\pounds 50$.
\item Use your answer to part (b) to estimate the expected size of the audience for this concert. Give your answer correct to $\mathbf { 1 }$ decimal place.
\item Comment on the reliability of your answer to part (c). You should refer to

\begin{itemize}
  \item the value of the product-moment correlation coefficient for the data, which is 0.642
  \item the value of $\pounds 50$
  \item any one other relevant factor that should be taken into account.
\end{itemize}
\end{enumerate}

\hfill \mbox{\textit{OCR Further Statistics 2023 Q2 [8]}}
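The effect of the change of units in part (b) can be illustrated numerically. The sketch below, in Python, fits the line from the summary statistics twice, once with \(h\) in hundreds and once in thousands. The helper `fit_from_summary` and all variable names are hypothetical, introduced only for this illustration. Because \(H\) is already rounded to the nearest hundred, measuring in thousands to 1 decimal place amounts to dividing every \(h\) value by 10.

```python
import math


def fit_from_summary(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Return (intercept a, gradient b, correlation r) for the line y = a + b*x."""
    s_xx = sum_xx - sum_x ** 2 / n
    s_yy = sum_yy - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n
    r = s_xy / math.sqrt(s_xx * s_yy)
    return a, b, r


# Summary statistics from the question, with h measured in hundreds.
n, sum_p, sum_h, sum_pp, sum_hh, sum_ph = 20, 1080, 381, 61300, 8011, 21535

a_hundreds, b_hundreds, r_hundreds = fit_from_summary(
    n, sum_p, sum_h, sum_pp, sum_hh, sum_ph
)

# With h in thousands every h value is divided by 10, so sum_h and sum_ph
# are divided by 10 and sum_hh by 100; the sums involving only p are unchanged.
a_thousands, b_thousands, r_thousands = fit_from_summary(
    n, sum_p, sum_h / 10, sum_pp, sum_hh / 100, sum_ph / 10
)

print(f"hundreds:  h = {a_hundreds:.4f} + {b_hundreds:.4f} p, r = {r_hundreds:.3f}")
print(f"thousands: h = {a_thousands:.4f} + {b_thousands:.4f} p, r = {r_thousands:.3f}")
print(f"estimate at p = 50: {a_hundreds + 50 * b_hundreds:.1f} hundred")
```

Both coefficients are divided by 10 under the change of units, while the correlation coefficient is unchanged, which is why the reliability comments in part (d) do not depend on the units chosen for \(H\).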
