| Exam Board | OCR |
|---|---|
| Module | Further Statistics (Further Statistics) |
| Year | 2023 |
| Session | June |
| Marks | 8 |
| Topic | Linear regression |
| Type | Calculate y on x from summary statistics |
| Difficulty | Standard +0.3 This is a straightforward application of standard linear regression formulas using given summary statistics. Students substitute values into the formulae for gradient and intercept, then perform routine calculations. Part (b) tests understanding of units (a conceptual check), part (c) is simple substitution, and part (d) requires standard interpretation of correlation strength and interpolation vs extrapolation. All steps are procedural with no novel problem-solving required, making it slightly easier than average. |
| Spec | 5.09c Calculate regression line |
Question 2:

| Part | Answer | Marks | AO | Guidance |
|---|---|---|---|---|
| (a) | h = 0.322p + 1.64 (1.636), or \(h = \frac{975}{596} + \frac{961}{2980}p\) | B1 | 1.1 | a in range [1.63, 1.64] or b in range [0.322, 0.323] |
| | | B1 | 1.1 | All correct including h and p, but allow if a and b correct and numerical or sign error made in writing out final equation. NB: not b = 1.66. SC: h = 32.2p + 164: B1 |
| | | [2] | | |
| (b) | New equation is h = 0.0322p + 0.1635, or “both a and b divided by 10” | B1ft | 2.2a | Both their coefficients divided by 10, ignore letters |
| | | [1] | | |
| (c) | 17.8 hundred | B1 | 1.1 | In range [17.7, 17.8] hundred, or in range [1770, 1780], or 1.8 thousand. Not just 17.8 or 1.8. |
| | | [1] | | |
| (d) | Fair correlation only (so only fairly reliable) | B1 | 1.1 | Any comment based on size of r, allow comparison with CV but not with b, allow ‘association’ |
| | 50 is in range of data | B1 | 1.1 | State that 50 is in the data range, but not just “50 is not one of the data values” |
| | e.g. audiences may be different for a charity event, or attendance depends on causal factors, or not based on a random sample, or sample size small, etc. | B1 | 2.4 | Any reasonable relevant comment based on context or sampling, but not “correlation not causality”, or similar rote statement. |
| | Overall not very reliable | B1 | 2.3 | Final nuanced conclusion between “fairly reliable” and “not reliable” inclusive, based on at least two or three of the above, but if they use 0.642 > CV (or other wrong statement) this cannot count towards the ‘two or three’. Not just separate assessments for each statement. |
| | | [4] | | [Comparison with critical values is not relevant. The issue is not “is there any correlation?” (i.e., is \(\rho = 0\)?) but how strong the correlation is (is \(\rho\) close to 1?), and hypothesis tests don’t tell you this.] |
| A | 0.642 quite high so fairly reliable; 50 is in data range so reliable; charity event not typical so less reliable (maximum 3/4, i.e., don’t give the final B1 for successive separate assessments, e.g., “fairly reliable, more reliable, less reliable”) | B1B1B1B0 | | |

2 The director of a concert hall wishes to investigate if the price of the most expensive concert tickets affects attendance. The director collects data about the price, $\pounds P$, of the most expensive tickets and the number of people in the audience, $H$ hundred (rounded to the nearest hundred), for 20 concerts. For each price there are several different concerts. The results are shown in the table.
\begin{center}
\begin{tabular}{|l|l|l|l|l|l|}
\hline
$P$ (£) & 75 & 65 & 55 & 45 & 35 \\
\hline
\multirow[t]{5}{*}{$H$ (hundred)} & 27 & 27 & 27 & 26 & 15 \\
\hline
& 27 & 27 & 20 & 21 & 12 \\
\hline
& & 22 & 18 & 16 & 9 \\
\hline
& & 19 & 18 & 13 & \\
\hline
& & 12 & 16 & 9 & \\
\hline
\end{tabular}
\end{center}
$n = 20 \quad \sum p = 1080 \quad \sum h = 381 \quad \sum p^{2} = 61300 \quad \sum h^{2} = 8011 \quad \sum ph = 21535$
\begin{enumerate}[label=(\alph*)]
\item Calculate the equation of the regression line of $h$ on $p$.
\item State what change, if any, there would be to your answer to part (a) if $H$ had been measured in thousands (to 1 decimal place) rather than in hundreds.
For a special charity concert, the most expensive tickets cost $\pounds 50$.
\item Use your answer to part (b) to estimate the expected size of the audience for this concert. Give your answer correct to $\mathbf { 1 }$ decimal place.
\item Comment on the reliability of your answer to part (c). You should refer to
\begin{itemize}
\item the value of the product-moment correlation coefficient for the data, which is 0.642
\item the value of $\pounds 50$
\item any one other relevant factor that should be taken into account.
\end{itemize}
\end{enumerate}
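
\noindent A worked sketch of parts (a) to (c), computed here from the summary statistics above and consistent with the ranges the mark scheme accepts. The symbols $\bar{p}$, $\bar{h}$, $S_{pp}$, $S_{ph}$, $a$ and $b$ are notation assumed for this sketch, with the line written as $h = a + bp$; they are not defined in the question itself.
\[
\bar{p} = \frac{1080}{20} = 54, \qquad \bar{h} = \frac{381}{20} = 19.05
\]
\[
S_{pp} = 61300 - \frac{1080^{2}}{20} = 2980, \qquad S_{ph} = 21535 - \frac{1080 \times 381}{20} = 961
\]
\[
b = \frac{S_{ph}}{S_{pp}} = \frac{961}{2980} \approx 0.3225, \qquad a = \bar{h} - b\,\bar{p} = 19.05 - \frac{961}{2980} \times 54 = \frac{975}{596} \approx 1.636
\]
So for part (a) the line is approximately $h = 1.64 + 0.322p$. For part (b), measuring $H$ in thousands divides every $h$ value by 10, so both coefficients are divided by 10, giving approximately $h = 0.164 + 0.0322p$. For part (c), substituting $p = 50$ into the part (a) line gives
\[
h \approx 1.636 + 0.3225 \times 50 \approx 17.8 \text{ (hundred)},
\]
that is, roughly 1780 people, or equivalently about 1.8 thousand from the part (b) line.
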
\hfill \mbox{\textit{OCR Further Statistics 2023 Q2 [8]}}