Edexcel S1 2024 January — Question 4

Exam Board: Edexcel
Module: S1 (Statistics 1)
Year: 2024
Session: January
Topic: Linear regression
Type: Calculate PMCC from raw data

  1. A French test and a Spanish test were sat by 11 students.
The table below shows their marks.
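
| Student | A | B | C | D | E | F | G | H | I | J | K |
|---|---|---|---|---|---|---|---|---|---|---|---|
| French mark (\(f\)) | 24 | 30 | 32 | 32 | 36 | 36 | 40 | 44 | 50 | 60 | 68 |
| Spanish mark (\(s\)) | 16 | 90 | 24 | 28 | 32 | 36 | 38 | 44 | 48 | 48 | 68 |
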
Greg says that if these points were plotted on a scatter diagram, then the point \((30, 90)\) would be an outlier because 90 is an outlier for the Spanish marks. An outlier is defined as a value that is
$$\text{greater than } Q_3 + 1.5 \times (Q_3 - Q_1) \text{ or smaller than } Q_1 - 1.5 \times (Q_3 - Q_1)$$
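
The outlier rule above can be checked numerically. The following is a minimal sketch, assuming the quartile convention commonly taught for S1 with discrete data: with \(n\) values, compute \(np\); if it is a whole number, average that ordered value with the next one, otherwise round up. The function names `quartile` and `outlier_fences` are illustrative and not part of the question.

```python
import math

# Spanish marks for students A to K, taken from the table in the question.
spanish = [16, 90, 24, 28, 32, 36, 38, 44, 48, 48, 68]


def quartile(data, p):
    """Return the p-th quantile under an assumed S1-style convention.

    Assumption: with n values, compute n * p. If it is a whole number,
    average that ordered value with the next one; otherwise round up.
    """
    ordered = sorted(data)
    position = len(ordered) * p
    if position == int(position):
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(position) - 1]


def outlier_fences(data):
    """Return (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1 = quartile(data, 0.25)
    q3 = quartile(data, 0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


lower, upper = outlier_fences(spanish)
print(f"Q1 = {quartile(spanish, 0.25)}, Q3 = {quartile(spanish, 0.75)}")
print(f"fences: {lower} to {upper}")
print("90 is an outlier:", 90 > upper or 90 < lower)
```

Under this convention \(Q_1 = 28\) and \(Q_3 = 48\), so the upper fence is \(48 + 1.5 \times 20 = 78\) and 90 lies above it. A different quartile convention could shift the fences slightly.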
  (a) Show that 90 is an outlier for the Spanish marks.

Ignoring the point \((30, 90)\), Greg calculated the following summary statistics.
$$\sum f = 422 \quad \sum s = 382 \quad S_{ff} = 1667.6 \quad S_{fs} = 1735.6$$

  (b) Use these summary statistics to show that the equation of the least squares regression line of \(s\) on \(f\) for the remaining 10 students is
$$s = -5.72 + 1.04f$$
where the values of the intercept and gradient are given to 3 significant figures. You must show your working.

  (c) Give an interpretation of the gradient of the regression line.

Two further students sat the French test but missed the Spanish test.

  (d) Using the equation given in part (b), estimate
    (i) a Spanish mark for the student who scored 55 marks in their French test,
    (ii) a Spanish mark for the student who scored 18 marks in their French test.

  (e) State, giving a reason, which of the two estimates found in part (d) would be the more reliable estimate.
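
The regression line and the two estimates can be checked with a short calculation. This is a minimal sketch, assuming the standard least squares formulas \(b = S_{fs} / S_{ff}\) and \(a = \bar{s} - b\bar{f}\). The helper `estimate_spanish` and the variable names are illustrative and not part of the question.

```python
# Summary statistics quoted in the question (point (30, 90) excluded).
n = 10
sum_f, sum_s = 422, 382
s_ff, s_fs = 1667.6, 1735.6

# Gradient b = S_fs / S_ff and intercept a = mean(s) - b * mean(f).
b = s_fs / s_ff
a = sum_s / n - b * sum_f / n
print(f"s = {a:.3g} + {b:.3g} f")


def estimate_spanish(f, intercept=-5.72, gradient=1.04):
    """Estimate a Spanish mark from a French mark using the rounded line."""
    return intercept + gradient * f


# French marks of the remaining 10 students, from the table.
french = [24, 32, 32, 36, 36, 40, 44, 50, 60, 68]

for f in (55, 18):
    inside = min(french) <= f <= max(french)
    kind = "interpolation" if inside else "extrapolation"
    print(f"f = {f}: estimated s = {estimate_spanish(f):.2f} ({kind})")
```

The French mark 55 lies inside the observed range of 24 to 68, whereas 18 lies below it, so the second estimate relies on extrapolation and is usually regarded as the less reliable of the two.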