Calculate regression line then predict

A question is this sub-type if and only if the student must first calculate the regression line equation from summary statistics (using formulas for gradient and intercept) before making a prediction.
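
For reference, the gradient and intercept formulas this sub-type relies on can be sketched as follows. This is the usual textbook least squares result for \(n\) pairs and the line \(y = a + b x\), written in the \(S _ { x x }\), \(S _ { x y }\) notation that several of the questions use; individual exam boards may present it slightly differently.

$$S _ { x x } = \sum x ^ { 2 } - \frac { \left( \sum x \right) ^ { 2 } } { n } , \quad S _ { x y } = \sum x y - \frac { \sum x \sum y } { n } , \quad b = \frac { S _ { x y } } { S _ { x x } } , \quad a = \bar { y } - b \bar { x }$$

The prediction at a given value \(x _ { 0 }\) is then \(a + b x _ { 0 }\).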

11 questions · Moderate -0.2

5.09c Calculate regression line
OCR S1 2005 January Q9
15 marks Standard +0.3
9 Five observations of bivariate data produce the following results, denoted as ( \(x _ { i } , y _ { i }\) ) for \(i = 1,2,3,4,5\). $$\begin{aligned} & ( 13,2.7 ) \\ & { \left[ \Sigma x = 90 , \Sigma y = 15.0 , \Sigma x ^ { 2 } = 1720 , \Sigma y ^ { 2 } = 46.86 , \Sigma x y = 264.0 . \right] } \end{aligned}$$
  1. Show that the regression line of \(y\) on \(x\) has gradient \(-0.06\), and find its equation in the form \(y = a + b x\).
  2. The regression line is used to estimate the value of \(y\) corresponding to \(x = 20\), but the value \(x = 20\) is accurate only to the nearest whole number. Calculate the difference between the largest and the smallest values that the estimated value of \(y\) could take.
    The numbers \(e _ { 1 } , e _ { 2 } , e _ { 3 } , e _ { 4 } , e _ { 5 }\) are defined by $$e _ { i } = a + b x _ { i } - y _ { i } \quad \text { for } i = 1,2,3,4,5$$
  3. The values of \(e _ { 1 } , e _ { 2 }\) and \(e _ { 3 }\) are \(0.6 , - 0.7\) and 0.2 respectively. Calculate the values of \(e _ { 4 }\) and \(e _ { 5 }\).
  4. Calculate the value of \(e _ { 1 } ^ { 2 } + e _ { 2 } ^ { 2 } + e _ { 3 } ^ { 2 } + e _ { 4 } ^ { 2 } + e _ { 5 } ^ { 2 }\) and explain the relevance of this quantity to the regression line found in part (i).
  5. Find the mean and the variance of \(e _ { 1 } , e _ { 2 } , e _ { 3 } , e _ { 4 } , e _ { 5 }\).
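
As a check on the first two parts of this question, the summary sums can be turned into a line with a few lines of Python. This is a minimal sketch rather than a model answer: the function name `regression_from_sums` and the variable names are illustrative, and the interval 19.5 to 20.5 is the usual reading of "accurate to the nearest whole number".

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the least squares line y = a + b*x."""
    s_xx = sum_x2 - sum_x ** 2 / n        # corrected sum of squares of x
    s_xy = sum_xy - sum_x * sum_y / n     # corrected sum of products
    b = s_xy / s_xx                       # gradient
    a = sum_y / n - b * sum_x / n         # intercept: line passes through the mean point
    return a, b


# Summary statistics quoted in the question
a, b = regression_from_sums(n=5, sum_x=90, sum_y=15.0, sum_x2=1720, sum_xy=264.0)

# x = 20 to the nearest whole number is taken to mean 19.5 <= x < 20.5,
# so the estimate varies over a range of width |b| * 1
spread = abs((a + b * 20.5) - (a + b * 19.5))

print(round(b, 4), round(a, 4), round(spread, 4))
```

The gradient agrees with the value \(-0.06\) that the question asks the student to show.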
CAIE FP2 2011 June Q9
11 marks Standard +0.3
9 The marks achieved by a random sample of 15 college students in a Physics examination ( \(x\) ) and in a General Studies examination ( \(y\) ) are summarised as follows. $$\Sigma x = 752 \quad \Sigma x ^ { 2 } = 38814 \quad \Sigma y = 773 \quad \Sigma y ^ { 2 } = 45351 \quad \Sigma x y = 40236$$
  1. Find the mean values, \(\bar { x }\) and \(\bar { y }\).
  2. Another college student achieved a mark of 56 in the General Studies examination, but was unable to take the Physics examination. Use the equation of a suitable regression line to estimate the mark that the student would have obtained in the Physics examination.
  3. Find the product moment correlation coefficient for the given data.
  4. Stating your hypotheses, test at the \(5 \%\) level of significance whether there is a non-zero product moment correlation coefficient between examination marks in Physics and in General Studies achieved by college students.
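
In this question the known mark is \(y\) (General Studies) and the unknown is \(x\) (Physics), so the "suitable" line is normally taken to be the regression line of \(x\) on \(y\). The sketch below works under that assumption; the function name `regress_x_on_y` is illustrative and not part of the question.

```python
def regress_x_on_y(n, sum_x, sum_y, sum_y2, sum_xy):
    """Return (c, d) for the least squares line x = c + d*y."""
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    d = s_xy / s_yy                   # gradient of x on y
    c = sum_x / n - d * sum_y / n     # intercept: line passes through the mean point
    return c, d


# Summary statistics quoted in the question (15 students)
c, d = regress_x_on_y(n=15, sum_x=752, sum_y=773, sum_y2=45351, sum_xy=40236)

# Estimated Physics mark for a General Studies mark of 56
physics_estimate = c + d * 56
print(round(physics_estimate, 1))
```

Using the line of \(y\) on \(x\) and rearranging would give a different estimate, which is why the choice of line matters here.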
OCR Further Statistics AS 2023 June Q3
8 marks Standard +0.3
3 An insurance company collected data concerning the age, \(x\) years, of policy holders and the average size of claim, \(\pounds y\) thousand. The data is summarised as follows. \(n = 32 \quad \sum x = 1340 \quad \sum y = 612 \quad \sum x ^ { 2 } = 64282 \quad \sum y ^ { 2 } = 13418 \quad \sum x y = 27794\)
  1. Find the variance of \(x\).
  2. Find the equation of the regression line of \(y\) on \(x\).
  3. Hence estimate the expected size of claim from a policy holder of age 48.
    Tom is aged 48. He claims that the range of the data probably does not include people of his age because the mean age for the data is 41.875, and 48 is not close to this.
  4. Use your answer to part (a) to determine how likely it is that Tom's claim is correct.
  5. Comment on the reliability of your estimate in part (c). You should refer to the value of the product-moment correlation coefficient for the data, which is 0.579 correct to 3 significant figures.
Edexcel S1 2018 October Q1
11 marks Moderate -0.8
1 The heights above sea level ( \(h\) hundred metres) and the temperatures ( \(t ^ { \circ } \mathrm { C }\) ) at 12 randomly selected places in France, at 7 am on July 31st, were recorded.
The data are summarised as follows $$\sum h = 112 \quad \sum t = 136 \quad \sum t ^ { 2 } = 1828 \quad S _ { h t } = - 236 \quad S _ { h h } = 297$$
  1. Find the value of \(S _ { t t }\)
  2. Calculate the product moment correlation coefficient for these data.
  3. Interpret the relationship between \(t\) and \(h\).
  4. Find an equation of the regression line of \(t\) on \(h\).
    At 7 am on July 31st Yinka is on holiday in South Africa. He uses the regression equation to estimate the temperature when the height above sea level is 500 m.
  5. Find the estimated temperature Yinka calculates.
  6. Comment on the validity of your answer in part (e).
Edexcel S1 2004 November Q2
4 marks Moderate -0.8
2. An experiment carried out by a student yielded pairs of \(( x , y )\) observations such that $$\bar { x } = 36 , \quad \bar { y } = 28.6 , \quad S _ { x x } = 4402 , \quad S _ { x y } = 3477.6$$
  1. Calculate the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\). Give your values of \(a\) and \(b\) to 2 decimal places.
  2. Find the value of \(y\) when \(x = 45\).
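
When the question already supplies \(\bar { x }\), \(\bar { y }\), \(S _ { x x }\) and \(S _ { x y }\), the calculation reduces to two divisions and a substitution. The sketch below uses the figures from this question; predicting with the coefficients rounded to 2 decimal places is an assumption about how the answer is meant to be carried forward.

```python
# Summary statistics quoted in the question
x_bar, y_bar = 36, 28.6
s_xx, s_xy = 4402, 3477.6

b = s_xy / s_xx           # gradient
a = y_bar - b * x_bar     # intercept: line passes through (x_bar, y_bar)

# The question asks for a and b to 2 decimal places
a_2dp, b_2dp = round(a, 2), round(b, 2)

# Prediction at x = 45 using the rounded coefficients (an assumption)
y_at_45 = a_2dp + b_2dp * 45
print(a_2dp, b_2dp, round(y_at_45, 2))
```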
AQA S1 2008 June Q1
6 marks Moderate -0.8
1 The table shows the times taken, \(y\) minutes, for a wood glue to dry at different air temperatures, \(x ^ { \circ } \mathrm { C }\).
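| \(\boldsymbol { x }\) | 10 | 12 | 15 | 18 | 20 | 22 | 25 | 28 | 30 |
|---|---|---|---|---|---|---|---|---|---|
| \(\boldsymbol { y }\) | 42.9 | 40.6 | 38.5 | 35.4 | 33.0 | 30.7 | 28.0 | 25.3 | 22.6 |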
  1. Calculate the equation of the least squares regression line \(y = a + b x\).
  2. Estimate the time taken for the glue to dry when the air temperature is \(21 ^ { \circ } \mathrm { C }\).
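
Here the raw data are given rather than summary statistics, so \(S _ { x x }\) and \(S _ { x y }\) have to be built from the table first (in the exam this would normally be done with a calculator's regression mode). A minimal sketch, with illustrative variable names:

```python
# Data from the table: air temperature x (deg C) and drying time y (minutes)
x = [10, 12, 15, 18, 20, 22, 25, 28, 30]
y = [42.9, 40.6, 38.5, 35.4, 33.0, 30.7, 28.0, 25.3, 22.6]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums computed from deviations about the means
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx           # gradient (negative: warmer air, shorter drying time)
a = y_bar - b * x_bar     # intercept

# Estimated drying time at 21 deg C, which lies inside the observed range of x
time_at_21 = a + b * 21
print(round(a, 2), round(b, 3), round(time_at_21, 1))
```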
Edexcel FS2 AS 2022 June Q3
10 marks Standard +0.3
3 Gabriela is investigating a particular type of fish, called bream. She wants to create a model to predict the weight, \(w\) grams, of bream based on their length, \(x \mathrm {~cm}\).
For a sample of 27 bream, some summary statistics are given below. $$\begin{gathered} \bar { x } = 31.07 \quad \bar { w } = 628.59 \quad \sum w ^ { 2 } = 11386134 \\ \mathrm {~S} _ { x w } = 13082.3 \quad \mathrm {~S} _ { x x } = 260.8 \end{gathered}$$
  1. Find the value of the product moment correlation coefficient between \(x\) and \(w\)
  2. Explain whether the answer to part (a) is consistent with a linear model for these data.
  3. Find the equation of the regression line of \(w\) on \(x\) in the form \(w = a + b x\)
    A residual plot for these data is shown below. \includegraphics[max width=\textwidth, alt={}, center]{128c408d-3e08-4f74-8f19-d33ecd5c882f-06_931_1790_1107_139}
    One of the bream in the sample has a length of 32 cm.
  4. Find its weight.
  5. With reference to the residual plot, comment on the model for bream with lengths above 33 cm .
CAIE FP2 2015 November Q9
11 marks Standard +0.3
A random sample of 8 students is chosen from those sitting examinations in both Mathematics and French. Their marks in Mathematics, \(x\), and in French, \(y\), are summarised as follows. $$\Sigma x = 472 \qquad \Sigma x^2 = 29950 \qquad \Sigma y = 400 \qquad \Sigma y^2 = 21226 \qquad \Sigma xy = 24879$$
Another student scored 72 marks in the Mathematics examination but was unable to sit the French examination. Estimate the mark that this student would have obtained in the French examination. [5]
Test, at the 5% significance level, whether there is non-zero correlation between marks in Mathematics and marks in French. [6]
Edexcel S1 Q6
15 marks Standard +0.3
The marks out of 75 obtained by a group of ten students in their first and second Statistics modules were as follows:
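| Student | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) | \(J\) |
|---|---|---|---|---|---|---|---|---|---|---|
| Module 1 \((x)\) | \(54\) | \(33\) | \(42\) | \(71\) | \(60\) | \(27\) | \(39\) | \(46\) | \(59\) | \(64\) |
| Module 2 \((y)\) | \(50\) | \(22\) | \(44\) | \(58\) | \(42\) | \(19\) | \(35\) | \(46\) | \(55\) | \(60\) |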
  1. Find \(\sum x\) and \(\sum y\). [2 marks]
    Given that \(\sum x^2 = 26353\) and \(\sum xy = 22991\),
  2. obtain the equation of the regression line of \(y\) on \(x\). [5 marks]
  3. Estimate the Module 2 result of a student whose mark in Module 1 was (i) 65, (ii) 5. Explain why one of these estimates is less reliable than the other. [4 marks]
    The equation of the regression line of \(x\) on \(y\) is \(x = 0.921y + 9.81\).
  4. Deduce the product moment correlation coefficient between \(x\) and \(y\), and briefly interpret its value. [4 marks]
OCR MEI S2 2007 January Q1
18 marks Moderate -0.8
In a science investigation into energy conservation in the home, a student is collecting data on the time taken for an electric kettle to boil as the volume of water in the kettle is varied. The student's data are shown in the table below, where \(v\) litres is the volume of water in the kettle and \(t\) seconds is the time taken for the kettle to boil (starting with the water at room temperature in each case). Also shown are summary statistics and a scatter diagram on which the regression line of \(t\) on \(v\) is drawn.
| \(v\) | 0.2 | 0.4 | 0.6 | 0.8 | 1.0 |
|---|---|---|---|---|---|
| \(t\) | 44 | 78 | 114 | 156 | 172 |
\(n = 5\), \(\Sigma v = 3.0\), \(\Sigma t = 564\), \(\Sigma v^2 = 2.20\), \(\Sigma vt = 405.2\). \includegraphics{figure_1}
  1. Calculate the equation of the regression line of \(t\) on \(v\), giving your answer in the form \(t = a + bv\). [5]
  2. Use this equation to predict the time taken for the kettle to boil when the amount of water which it contains is
    1. 0.5 litres,
    2. 1.5 litres.
    Comment on the reliability of each of these predictions. [4]
  3. In the equation of the regression line found in part (i), explain the role of the coefficient of \(v\) in the relationship between time taken and volume of water. [2]
  4. Calculate the values of the residuals for \(v = 0.8\) and \(v = 1.0\). [4]
  5. Explain how, on a scatter diagram with the regression line drawn accurately on it, a residual could be measured and its sign determined. [3]
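
The numerical parts of this question (the line, the two predictions and the two residuals) can be sketched as below. The residual is taken here as observed minus fitted, which is an assumption about the convention intended; note that it is the opposite sign to the \(e _ { i }\) defined in the OCR S1 2005 question above.

```python
# Data and summary statistics quoted in the question
v = [0.2, 0.4, 0.6, 0.8, 1.0]     # volume in litres
t = [44, 78, 114, 156, 172]       # boiling time in seconds
n, sum_v, sum_t, sum_v2, sum_vt = 5, 3.0, 564, 2.20, 405.2

s_vv = sum_v2 - sum_v ** 2 / n
s_vt = sum_vt - sum_v * sum_t / n

b = s_vt / s_vv                   # extra seconds per extra litre
a = sum_t / n - b * sum_v / n     # intercept

# 0.5 litres is inside the data range; 1.5 litres is an extrapolation
predictions = {vol: a + b * vol for vol in (0.5, 1.5)}

# Residual = observed t minus fitted t (sign convention assumed)
residuals = {vi: ti - (a + b * vi) for vi, ti in zip(v, t) if vi in (0.8, 1.0)}

print(round(a, 1), round(b, 1))
print({k: round(val, 1) for k, val in predictions.items()})
print({k: round(val, 1) for k, val in residuals.items()})
```

A positive residual corresponds to a data point lying above the regression line, and a negative residual to a point below it.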
WJEC Further Unit 2 2023 June Q2
8 marks Moderate -0.3
For a set of 30 pairs of observations of the variables \(x\) and \(y\), it is known that \(\sum x = 420\) and \(\sum y = 240\). The least squares regression line of \(y\) on \(x\) passes through the point with coordinates \((19, 20)\).
  1. Show that the equation of the regression line of \(y\) on \(x\) is \(y = 2.4x - 25.6\) and use it to predict the value of \(y\) when \(x = 26\). [6]
  2. State two reasons why your prediction in part (a) may not be reliable. [2]
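
This question gives no \(\sum x ^ { 2 }\) or \(\sum x y\), so the line has to come from two points: the mean point \(( \bar { x } , \bar { y } )\), which every least squares regression line of \(y\) on \(x\) passes through, and the given point \((19, 20)\). A minimal sketch of that reasoning, with illustrative variable names:

```python
# Summary statistics quoted in the question
n, sum_x, sum_y = 30, 420, 240
x_bar, y_bar = sum_x / n, sum_y / n    # mean point, which lies on the line

# Second point on the line, given in the question
px, py = 19, 20

b = (py - y_bar) / (px - x_bar)        # gradient from two points on the line
a = y_bar - b * x_bar                  # intercept

y_at_26 = a + b * 26
print(round(b, 2), round(a, 2), round(y_at_26, 2))
```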