Edexcel S1 2022 January — Question 6 13 marks

Exam Board: Edexcel
Module: S1 (Statistics 1)
Year: 2022
Session: January
Marks: 13
Topic: Linear regression
Type: Direct prediction from given regression line
Difficulty: Moderate -0.8. This is a straightforward S1 statistics question testing basic understanding of regression lines. Parts (a)-(e) involve simple substitution, interpretation, and algebraic manipulation of a given regression equation. Part (f) requires recalculating summary statistics after a data correction, but follows a standard procedure. All techniques are routine for this module with no novel problem-solving required.
Spec: 5.09b Least squares regression: concepts; 5.09e Use regression: for estimation in context

  1. Students on a psychology course were given a pre-test at the start of the course and a final exam at the end of the course. The teacher recorded the number of marks achieved on the pre-test, \(p\), and the number of marks achieved on the final exam, \(f\), for 34 students and displayed them on the scatter diagram.

\includegraphics[max width=\textwidth, alt={}, center]{fa1cb8a2-dab9-4133-b7a1-9108888c37d7-22_1121_1136_447_438}

The equation of the least squares regression line for these data is found to be $$f = 10.8 + 0.748 p$$ For these students, the mean number of marks on the pre-test is 62.4

  (a) Use the regression model to find the mean number of marks on the final exam.
  (b) Give an interpretation of the gradient of the regression line.

Considering the equation of the regression line, Priya says that she would expect someone who scored 0 marks on the pre-test to score 10.8 marks on the final exam.

  (c) Comment on the reliability of Priya's statement.
  (d) Write down the number of marks achieved on the final exam for the student who exceeded the expectation of the regression model by the largest number of marks.
  (e) Find the range of values of \(p\) for which this regression model, \(f = 10.8 + 0.748 p\), predicts a greater number of marks on the final exam than on the pre-test.

Later the teacher discovers an error in the recorded data. The student who achieved a score of 98 on the pre-test, scored 92 not 29 on the final exam. The summary statistics used for the model \(f = 10.8 + 0.748 p\) are corrected to include this information and a new least squares regression line is found.

Given the original summary statistics were, $$n = 34 \quad \sum p = 2120 \quad \sum p f = 133486 \quad \mathrm {~S} _ { p p } = 15573.76 \quad \mathrm {~S} _ { p f } = 11648.35$$

  (f) calculate the gradient of the new regression line. Show your working clearly.

| Part | Scheme | Marks | Guidance |
|------|--------|-------|----------|
| (a) | $\hat{f} = 10.8 + 0.748\bar{p} = 10.8 + 0.748(62.4)$ | M1 A1 (2) | M1 for substituting 62.4 into the regression equation. Allow answer between 57 and 58. A1 awrt 57.5 |
| | awrt **57.5** | | |
| (b) | For each additional mark scored on the pre-test, the average mark on the final exam increases by 0.748 | B1 (1) | B1 must include context and reference to 0.748. Needs to refer to each mark being worth 0.748, or a multiple, e.g. 10 marks is 7.48. Allow equivalent words, e.g. score/point for mark, pre or test for pre-test, exam or final for final exam |
| (c) | The statement is not reliable as there is no data below 19 (extrapolation). | B1 (1) | B1 Not reliable with correct supporting reason, e.g. it (10.8) is an outlier, outside the range |
| (d) | 76 | B1 (1) | B1 76 cao |
| (e) | $p < 10.8 + 0.748p$ | M1 | 1st M1 for setting up inequality in $p$ only or for drawing the line $f = p$ on the graph. May be implied by $p < n$ (ignore any lower limit) where $40 \leq n < 46$ (allow incorrect inequality sign or =) |
| | $0.252p < 10.8$ | M1 | 2nd M1 rearranging to the form $ap < b$ with correct inequality sign. Allow $(1 - 0.748)p < 10.8$. May be implied by $p < n$ (ignore any lower limit) where $42 < n < 44$ |
| | awrt **42.9** | A1 (3) | A1 $p <$ awrt 42.9 (ignore any lower limit) ISW |
| (f) | [No change to] $S_{pp} = 15573.76$ | | |
| | $\sum pf = 133486 - 2842 + 9016\ [= 139660]$. Alternatively: $\sum pf$ increases by $98(92 - 29)\ [= 6174]$ | M1 | 1st M1 Correct method to find new $\sum pf$ or change in $\sum pf$ |
| | $\sum f = "57.47" \times 34 + (92 - 29)$ or $\frac{133486 - 11648.35}{2120} \times 34 + (92 - 29)\ [= 1954 + 92 - 29 \approx 2017]$. Alternatively: $\frac{\sum p \sum f}{n}$ increases by $\frac{2120(92-29)}{34}\ [= 3928.235\ldots]$ | M1 | 2nd M1 Correct method to find new $\sum f$ or change in $\sum f$. Allow 2018 or 2017 |
| | $S_{pf} = "139660" - \frac{2120 \times "2017"}{34}\ [= 13894\ldots]$. Alternatively: $S_{pf}$ increases by $"6174" - "3928.235"\ [= 2245.764\ldots]$ | dM1 | 3rd dM1 dep on both previous method marks being awarded. Correct method to find new $S_{pf}$ with their changed $\sum pf$ and $\sum f$ or change in $S_{pf}$ |
| | $b = \frac{"13894\ldots"}{15573.76}\ [= 0.89\ldots]$. Alternatively: $b = \frac{11648.35 + "2245.764"}{15573.76}$ | M1 | 4th M1 expression for $b = \frac{S_{pf}}{15573.76}$ with their changed $S_{pf}$ and unchanged $S_{pp}$ |
| | awrt **0.9** | A1 (5) | A1 awrt 0.9 (from correct working) |
| | | [13] | |
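
The arithmetic in parts (a), (e) and (f) can be checked numerically. The following is a minimal Python sketch, not part of the official mark scheme. The variable names are illustrative assumptions, and the original $\sum f$ is recovered from the identity $S_{pf} = \sum pf - \frac{\sum p \sum f}{n}$ rather than taken from the question.

```python
# Original summary statistics, as given in the question.
n = 34
sum_p = 2120
sum_pf = 133486
S_pp = 15573.76
S_pf = 11648.35

# Part (a): the regression line passes through (mean p, mean f).
mean_f = 10.8 + 0.748 * 62.4

# Part (e): p < 10.8 + 0.748p rearranges to (1 - 0.748)p < 10.8.
p_upper = 10.8 / (1 - 0.748)

# Part (f): the student with p = 98 scored f = 92, not 29.
p_err, f_wrong, f_right = 98, 29, 92

# Recover the original sum of f from S_pf = sum_pf - sum_p * sum_f / n.
sum_f = (sum_pf - S_pf) * n / sum_p

# Only the sums involving f change; S_pp is unaffected by the correction.
new_sum_pf = sum_pf + p_err * (f_right - f_wrong)
new_sum_f = sum_f + (f_right - f_wrong)
new_S_pf = new_sum_pf - sum_p * new_sum_f / n
new_b = new_S_pf / S_pp

print(f"(a) mean f    = {mean_f:.4f}")
print(f"(e) p below   = {p_upper:.3f}")
print(f"(f) new S_pf  = {new_S_pf:.3f}")
print(f"(f) gradient  = {new_b:.4f}")
```

Under these assumptions the printed values should agree with the scheme's awrt 57.5, awrt 42.9 and awrt 0.9 to the stated accuracy.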

---
\begin{enumerate}
  \item Students on a psychology course were given a pre-test at the start of the course and a final exam at the end of the course. The teacher recorded the number of marks achieved on the pre-test, $p$, and the number of marks achieved on the final exam, $f$, for 34 students and displayed them on the scatter diagram.\\
\includegraphics[max width=\textwidth, alt={}, center]{fa1cb8a2-dab9-4133-b7a1-9108888c37d7-22_1121_1136_447_438}
\end{enumerate}

The equation of the least squares regression line for these data is found to be

$$f = 10.8 + 0.748 p$$

For these students, the mean number of marks on the pre-test is 62.4\\
(a) Use the regression model to find the mean number of marks on the final exam.\\
(b) Give an interpretation of the gradient of the regression line.

Considering the equation of the regression line, Priya says that she would expect someone who scored 0 marks on the pre-test to score 10.8 marks on the final exam.\\
(c) Comment on the reliability of Priya's statement.\\
(d) Write down the number of marks achieved on the final exam for the student who exceeded the expectation of the regression model by the largest number of marks.

(e) Find the range of values of $p$ for which this regression model, $f = 10.8 + 0.748 p$, predicts a greater number of marks on the final exam than on the pre-test.

Later the teacher discovers an error in the recorded data. The student who achieved a score of 98 on the pre-test, scored 92 not 29 on the final exam.

The summary statistics used for the model $f = 10.8 + 0.748 p$ are corrected to include this information and a new least squares regression line is found.

Given the original summary statistics were,

$$n = 34 \quad \sum p = 2120 \quad \sum p f = 133486 \quad \mathrm {~S} _ { p p } = 15573.76 \quad \mathrm {~S} _ { p f } = 11648.35$$

(f) calculate the gradient of the new regression line. Show your working clearly.\\

\hfill \mbox{\textit{Edexcel S1 2022 Q6 [13]}}