5.09c Calculate regression line

235 questions

OCR S1 2014 June Q5
9 marks Moderate -0.8
5 Tariq collected information about typical prices, \(\pounds y\) million, of four-bedroomed houses at varying distances, \(x\) miles, from a large city. He chose houses at 10-mile intervals from the city. His results are shown below.
| \(x\) | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 |
|---|---|---|---|---|---|---|---|---|
| \(y\) | 1.2 | 1.4 | 1.2 | 0.9 | 0.8 | 0.5 | 0.5 | 0.3 |
$$n = 8 \quad \Sigma x = 360 \quad \Sigma x ^ { 2 } = 20400 \quad \Sigma y = 6.8 \quad \Sigma y ^ { 2 } = 6.88 \quad \Sigma x y = 241$$
  1. Use an appropriate formula to calculate the product moment correlation coefficient, \(r\), showing that \(- 1.0 < r < - 0.9\).
  2. State what this value of \(r\) shows in this context.
  3. Tariq decides to recalculate the value of \(r\) with the house prices measured in hundreds of thousands of pounds, instead of millions of pounds. State what effect, if any, this will have on the value of \(r\).
  4. Calculate the equation of the regression line of \(y\) on \(x\).
  5. Explain why the regression line of \(y\) on \(x\), rather than \(x\) on \(y\), should be used for estimating a value of \(x\) from a given value of \(y\).
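As a rough check on parts 1 and 4, the summary statistics can be put through the usual formulas \(S_{xx} = \Sigma x^2 - (\Sigma x)^2 / n\), \(S_{yy} = \Sigma y^2 - (\Sigma y)^2 / n\) and \(S_{xy} = \Sigma xy - \Sigma x \Sigma y / n\). The sketch below is illustrative only; the helper name `line_from_sums` is hypothetical and not part of the question.

```python
from math import sqrt

def line_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Return (r, a, b) where y = a + b*x is the regression line of y on x."""
    s_xx = sum_x2 - sum_x * sum_x / n
    s_yy = sum_y2 - sum_y * sum_y / n
    s_xy = sum_xy - sum_x * sum_y / n
    r = s_xy / sqrt(s_xx * s_yy)      # product moment correlation coefficient
    b = s_xy / s_xx                   # gradient of y on x
    a = sum_y / n - b * sum_x / n     # line passes through (x_bar, y_bar)
    return r, a, b

# Summary statistics quoted in the question
r, a, b = line_from_sums(8, 360, 20400, 6.8, 6.88, 241)
print(f"r = {r:.3f}, y = {a:.3f} + ({b:.4f})x")
```

Under these formulas \(r \approx -0.956\), which lies in the required interval, and the line is roughly \(y = 1.546 - 0.0155x\).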
OCR S1 2015 June Q1
6 marks Moderate -0.8
1 For the top 6 clubs in the 2010/11 season of the English Premier League, the table shows the annual salary, \(\pounds x\) million, of the highest paid player and the number of points scored, \(y\).
| Club | Manchester United | Manchester City | Chelsea | Arsenal | Tottenham | Liverpool |
|---|---|---|---|---|---|---|
| \(x\) | 5.6 | 7.4 | 6.5 | 4.1 | 3.6 | 6.5 |
| \(y\) | 80 | 71 | 71 | 68 | 62 | 58 |
$$n = 6 \quad \sum x = 33.7 \quad \sum x ^ { 2 } = 200.39 \quad \sum y = 410 \quad \sum y ^ { 2 } = 28314 \quad \sum x y = 2313.9$$
  1. Use a suitable formula to calculate the product moment correlation coefficient, \(r\), between \(x\) and \(y\), showing that \(0 < r < 0.2\).
  2. State what this value of \(r\) shows in this context.
  3. A fan suggests that the data should be used to draw a regression line in order to estimate the number of points that would be scored by another Premier League club, whose highest paid player's salary is \(\pounds 1.7\) million. Give two reasons why such an estimate would be unlikely to be reliable.
OCR S1 2015 June Q4
9 marks Moderate -0.3
4 The table shows the load a lorry was carrying, \(x\) tonnes, and the fuel economy, \(y \mathrm {~km}\) per litre, for 8 different journeys. You should assume that neither variable is controlled.
| Load (\(x\) tonnes) | 5.1 | 5.8 | 6.5 | 7.1 | 7.6 | 8.4 | 9.5 | 10.5 |
|---|---|---|---|---|---|---|---|---|
| Fuel economy (\(y \mathrm {~km}\) per litre) | 6.2 | 6.1 | 5.9 | 5.6 | 5.3 | 5.4 | 5.3 | 5.1 |
$$n = 8 \quad \sum x = 60.5 \quad \sum y = 44.9 \quad \sum x ^ { 2 } = 481.13 \quad \sum y ^ { 2 } = 253.17 \quad \sum x y = 334.65$$
  1. Calculate the equation of the regression line of \(y\) on \(x\).
  2. Estimate the fuel economy for a load of 9.2 tonnes.
  3. An analyst calculated the equation of the regression line of \(x\) on \(y\). Without calculating this equation, state the coordinates of the point where the two regression lines intersect.
  4. Describe briefly the method required to estimate the load when the fuel economy is 5.8 km per litre.
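Part 3 relies on the fact that both regression lines pass through \((\bar{x}, \bar{y})\). The sketch below checks this numerically from the quoted summary statistics by building both lines and solving them simultaneously; the variable names are illustrative assumptions, not notation from the question.

```python
n = 8
sum_x, sum_y = 60.5, 44.9
sum_x2, sum_y2, sum_xy = 481.13, 253.17, 334.65

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
x_bar, y_bar = sum_x / n, sum_y / n

# Regression line of y on x: y = a + b*x
b = s_xy / s_xx
a = y_bar - b * x_bar

# Regression line of x on y: x = c + d*y
d = s_xy / s_yy
c = x_bar - d * y_bar

# Substitute x = c + d*y into y = a + b*x and solve for y
y_cross = (a + b * c) / (1 - b * d)
x_cross = c + d * y_cross

# Part 2: fuel economy estimate for a load of 9.2 tonnes (y on x line)
estimate_at_9_2 = a + b * 9.2

print(x_cross, y_cross, estimate_at_9_2)
```

The intersection coincides with the means, \((7.5625, 5.6125)\), and the estimate for 9.2 tonnes comes out near 5.27 km per litre.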
OCR MEI S2 2009 January Q1
20 marks Moderate -0.3
1 A researcher is investigating whether there is a relationship between the population size of cities and the average walking speed of pedestrians in the city centres. Data for the population size, \(x\) thousands, and the average walking speed of pedestrians, \(y \mathrm {~m} \mathrm {~s} ^ { - 1 }\), of eight randomly selected cities are given in the table below.
| \(x\) | 18 | 43 | 52 | 94 | 98 | 206 | 784 | 1530 |
|---|---|---|---|---|---|---|---|---|
| \(y\) | 1.15 | 0.97 | 1.26 | 1.35 | 1.28 | 1.42 | 1.32 | 1.64 |
  1. Calculate the value of Spearman's rank correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to determine whether there is any association between population size and average walking speed.
    In another investigation, the researcher selects a random sample of six adult males of particular ages and measures their maximum walking speeds. The data are shown in the table below, where \(t\) years is the age of the adult and \(w \mathrm {~m} \mathrm {~s} ^ { - 1 }\) is the maximum walking speed. Also shown are summary statistics and a scatter diagram on which the regression line of \(w\) on \(t\) is drawn.
    | \(t\) | 20 | 30 | 40 | 50 | 60 | 70 |
    |---|---|---|---|---|---|---|
    | \(w\) | 2.49 | 2.41 | 2.38 | 2.14 | 1.97 | 2.03 |
    $$n = 6 \quad \Sigma t = 270 \quad \Sigma w = 13.42 \quad \Sigma t ^ { 2 } = 13900 \quad \Sigma w ^ { 2 } = 30.254 \quad \Sigma t w = 584.6$$
    [Figure: scatter diagram of \(w\) against \(t\), with the regression line of \(w\) on \(t\) drawn]
  3. Calculate the equation of the regression line of \(w\) on \(t\).
  4. (A) Use this equation to calculate an estimate of maximum walking speed of an 80-year-old male.
    (B) Explain why it might not be appropriate to use the equation to calculate an estimate of maximum walking speed of a 10-year-old male.
OCR MEI S2 2010 January Q1
19 marks Moderate -0.3
1 A pilot records the take-off distance for his light aircraft on runways at various altitudes. The data are shown in the table below, where \(a\) metres is the altitude and \(t\) metres is the take-off distance. Also shown are summary statistics for these data.
| \(a\) | 0 | 300 | 600 | 900 | 1200 | 1500 | 1800 |
|---|---|---|---|---|---|---|---|
| \(t\) | 635 | 704 | 776 | 836 | 923 | 1008 | 1105 |
$$n = 7 \quad \Sigma a = 6300 \quad \Sigma t = 5987 \quad \Sigma a ^ { 2 } = 8190000 \quad \Sigma t ^ { 2 } = 5288931 \quad \Sigma a t = 6037800$$
  1. Draw a scatter diagram to illustrate these data.
  2. State which of the two variables \(a\) and \(t\) is the independent variable and which is the dependent variable. Briefly explain your answer.
  3. Calculate the equation of the regression line of \(t\) on \(a\).
  4. Use the equation of the regression line to calculate estimates of the take-off distance for altitudes
    (A) 800 metres,
    (B) 2500 metres.
    Comment on the reliability of each of these estimates.
  5. Calculate the value of the residual for the data point where \(a = 1200\) and \(t = 923\), and comment on its sign.
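A residual is the observed value minus the value predicted by the regression line. The following minimal sketch fits the line of \(t\) on \(a\) from the tabulated data and evaluates the residual at \(a = 1200\); the names `predict`, `gradient` and `intercept` are illustrative choices rather than anything defined in the question.

```python
altitude = [0, 300, 600, 900, 1200, 1500, 1800]
takeoff = [635, 704, 776, 836, 923, 1008, 1105]

n = len(altitude)
a_bar = sum(altitude) / n
t_bar = sum(takeoff) / n

# Corrected sums of squares and products
s_aa = sum((a - a_bar) ** 2 for a in altitude)
s_at = sum((a - a_bar) * (t - t_bar) for a, t in zip(altitude, takeoff))

gradient = s_at / s_aa
intercept = t_bar - gradient * a_bar

def predict(a):
    """Take-off distance predicted by the regression line of t on a."""
    return intercept + gradient * a

# Residual = observed - predicted
residual = 923 - predict(1200)
print(round(gradient, 4), round(intercept, 1), round(residual, 2))
```

The residual is about \(-9.6\); the negative sign means the observed take-off distance lies below the line at that altitude.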
OCR MEI S2 2013 January Q1
19 marks Standard +0.3
1 A manufacturer of playground safety tiles is testing a new type of tile. Tiles of various thicknesses are tested to estimate the maximum height at which people would be unlikely to sustain injury if they fell onto a tile. The results of the test are as follows.
| Thickness \(( t \mathrm {~mm} )\) | 20 | 40 | 60 | 80 | 100 |
|---|---|---|---|---|---|
| Maximum height \(( h \mathrm {~m} )\) | 0.72 | 1.09 | 1.62 | 1.97 | 2.34 |
  1. Draw a scatter diagram to illustrate these data.
  2. State which of the two variables is the independent variable, giving a reason for your answer.
  3. Calculate the equation of the regression line of maximum height on thickness.
  4. Use the equation of the regression line to calculate estimates of the maximum height for thicknesses of
    (A) 70 mm,
    (B) 120 mm.
    Comment on the reliability of each of these estimates.
  5. Calculate the value of the residual for the data point at which \(t = 40\).
  6. In a further experiment, the manufacturer tests a tile with a thickness of 200 mm and finds that the corresponding maximum height is 2.96 m. What can be said about the relationship between tile thickness and maximum height?
OCR MEI S2 2011 June Q1
18 marks Easy -1.2
1 An experiment is performed to determine the response of maize to nitrogen fertilizer. Data for the amount of nitrogen fertilizer applied, \(x \mathrm {~kg} / \mathrm { hectare }\), and the average yield of maize, \(y\) tonnes/hectare, in 5 experimental plots are given in the table below.
| \(x\) | 0 | 30 | 60 | 90 | 120 |
|---|---|---|---|---|---|
| \(y\) | 0.5 | 2.5 | 4.7 | 6.2 | 7.4 |
  1. Draw a scatter diagram to illustrate these data.
  2. Calculate the equation of the regression line of \(y\) on \(x\).
  3. Draw your regression line on your scatter diagram and comment briefly on its fit.
  4. Calculate the value of the residual for the data point where \(x = 30\) and \(y = 2.5\).
  5. Use the equation of the regression line to calculate estimates of average yield with nitrogen fertilizer applications of
    (A) \(45 \mathrm {~kg} / \mathrm { hectare }\),
    (B) \(150 \mathrm {~kg} / \mathrm { hectare }\).
  6. In a plot where \(150 \mathrm {~kg} / \mathrm { hectare }\) of nitrogen fertilizer is applied, the average yield of maize is 8.7 tonnes/hectare. Comment on this result.
OCR MEI S2 2015 June Q1
17 marks Moderate -0.5
1 A random sample of wheat seedlings is planted and their growth is measured. The table shows their average growth, \(y \mathrm {~mm}\), at half-day intervals.
| Time \(t\) days | 0 | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 |
|---|---|---|---|---|---|---|---|
| Average growth \(y \mathrm {~mm}\) | 0 | 7 | 21 | 33 | 45 | 56 | 62 |
  1. Draw a scatter diagram to illustrate these data.
  2. Calculate the equation of the regression line of \(y\) on \(t\).
  3. Calculate the value of the residual for the data point at which \(t = 2\).
  4. Use the equation of the regression line to calculate an estimate of the average growth after 5 days for wheat seedlings. Comment on the reliability of this estimate.
    It is suggested that it would be better to replace the regression line by a line which passes through the origin. You are given that the equation of such a line is \(y = a t\), where \(a = \frac { \sum y t } { \sum t ^ { 2 } }\).
  5. Find the equation of this line and plot the line on your scatter diagram.
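The through-origin gradient \(a = \frac{\sum yt}{\sum t^2}\) quoted in the question can be evaluated directly. This sketch assumes the growth row of the table reads 0, 7, 21, 33, 45, 56, 62 against times 0 to 3 days in half-day steps; the list names are illustrative.

```python
# Times in days and average growth in mm (assumed reading of the table)
t = [0, 0.5, 1, 1.5, 2, 2.5, 3]
y = [0, 7, 21, 33, 45, 56, 62]

sum_yt = sum(yi * ti for yi, ti in zip(y, t))
sum_t2 = sum(ti * ti for ti in t)

# Gradient of the line y = a*t constrained to pass through the origin
a = sum_yt / sum_t2
print(sum_yt, sum_t2, round(a, 2))
```

With these values \(\sum yt = 490\) and \(\sum t^2 = 22.75\), giving \(a \approx 21.5\).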
CAIE FP2 2009 June Q7
8 marks Standard +0.3
7 An experiment was carried out to determine how much weedkiller to apply per \(100 \mathrm {~m} ^ { 2 }\) in a large field. Ten \(100 \mathrm {~m} ^ { 2 }\) areas of the field were randomly chosen and sprayed with predetermined volumes of the weedkiller. The volume of the weedkiller is denoted by \(x\) litres and the number of weeds that survived is denoted by \(y\). The results are given in the table.
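| \(x\) | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50 | 0.55 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 48 | 40 | 44 | 35 | 39 | 24 | 10 | 13 | 9 | 6 |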
$$\left[ \Sigma x = 3.25 , \Sigma x ^ { 2 } = 1.2625 , \Sigma y = 268 , \Sigma y ^ { 2 } = 9548 , \Sigma x y = 66.10 . \right]$$
It is given that the product moment correlation coefficient for the data is \(-0.951\), correct to 3 decimal places.
  1. Calculate the equation of a suitable regression line, giving a reason for your choice of line.
  2. Estimate the best volume of weedkiller to apply, and comment on the reliability of your estimate.
CAIE FP2 2010 June Q9
9 marks Standard +0.3
9 A set of 20 pairs of bivariate data \(( x , y )\) is summarised by $$\Sigma x = 200 , \quad \Sigma x ^ { 2 } = 2125 , \quad \Sigma y = 240 , \quad \Sigma y ^ { 2 } = 8245 .$$ The product moment correlation coefficient is \(-0.992\).
  1. What does the value of the product moment correlation coefficient indicate about a scatter diagram of the data points?
  2. Find the equation of the regression line of \(y\) on \(x\).
  3. The equation of the regression line of \(x\) on \(y\) is \(x = a ^ { \prime } + b ^ { \prime } y\). Find the value of \(b ^ { \prime }\).
CAIE FP2 2011 June Q10
10 marks Standard +0.3
10 The mid-day temperature, \(x ^ { \circ } \mathrm { C }\), and the amount of sunshine, \(y\) hours, were recorded at a winter holiday resort on each of 12 days, chosen at random during the winter season. The results are summarised as follows. $$\Sigma x = 18.7 \quad \Sigma x ^ { 2 } = 106.43 \quad \Sigma y = 34.7 \quad \Sigma y ^ { 2 } = 133.43 \quad \Sigma x y = 92.01$$
  1. Find the product moment correlation coefficient for the data.
  2. Stating your hypotheses, test at the \(1 \%\) significance level whether there is a non-zero correlation between mid-day temperature and amount of sunshine.
  3. Use the equation of a suitable regression line to estimate the number of hours of sunshine on a day when the mid-day temperature is \(2 ^ { \circ } \mathrm { C }\).
CAIE FP2 2011 June Q9
11 marks Standard +0.3
9 The marks achieved by a random sample of 15 college students in a Physics examination (\(x\)) and in a General Studies examination (\(y\)) are summarised as follows. $$\Sigma x = 752 \quad \Sigma x ^ { 2 } = 38814 \quad \Sigma y = 773 \quad \Sigma y ^ { 2 } = 45351 \quad \Sigma x y = 40236$$
  1. Find the mean values, \(\bar { x }\) and \(\bar { y }\).
  2. Another college student achieved a mark of 56 in the General Studies examination, but was unable to take the Physics examination. Use the equation of a suitable regression line to estimate the mark that the student would have obtained in the Physics examination.
  3. Find the product moment correlation coefficient for the given data.
  4. Stating your hypotheses, test at the \(5 \%\) level of significance whether there is a non-zero product moment correlation coefficient between examination marks in Physics and in General Studies achieved by college students.
CAIE FP2 2012 June Q11 OR
Challenging +1.2
For a random sample of 5 pairs of values of \(x\) and \(y\), the equations of the regression lines of \(y\) on \(x\) and \(x\) on \(y\) are respectively $$y = - 0.5 x + 5 \quad \text { and } \quad x = - 1.2 y + 7.6$$ Find the value of the product moment correlation coefficient for this sample. Test, at the \(5 \%\) significance level, whether the population product moment correlation coefficient differs from zero. The following table shows the sample data.
| \(x\) | 1 | 2 | 5 | 5 | \(p\) |
|---|---|---|---|---|---|
| \(y\) | 5 | 3 | 4 | 2 | \(q\) |
Find the values of \(p\) and \(q\).
CAIE FP2 2013 June Q10 OR
Standard +0.8
The regression line of \(y\) on \(x\), obtained from a random sample of five pairs of values of \(x\) and \(y\), has equation $$y = x + k$$ where \(k\) is a constant. The following table shows the data.
| \(x\) | 2 | 3 | 3 | 4 | \(p\) |
|---|---|---|---|---|---|
| \(y\) | 4 | 5 | 8 | 4 | 2 |
Find the two possible values of \(p\). For the smaller of these two values of \(p\), find
  1. the product moment correlation coefficient,
  2. the equation of the regression line of \(x\) on \(y\).
CAIE FP2 2013 June Q9
9 marks Standard +0.8
9 A researcher records a random sample of \(n\) pairs of values of \(( x , y )\), giving the following summarised data. $$\Sigma x = 24 \quad \Sigma x ^ { 2 } = 160 \quad \Sigma y = 34 \quad \Sigma y ^ { 2 } = 324 \quad \Sigma x y = 192$$ The gradient of the regression line of \(y\) on \(x\) is \(- \frac { 3 } { 4 }\). Find
  1. the value of \(n\),
  2. the equation of the regression line of \(x\) on \(y\) in the form \(x = A y + B\), where \(A\) and \(B\) are constants to be determined,
  3. the product moment correlation coefficient.
    Another researcher records the same data in the form \(\left( x ^ { \prime } , y ^ { \prime } \right)\), where \(x ^ { \prime } = \frac { x } { k } , y ^ { \prime } = \frac { y } { k }\) and \(k\) is a constant. Without further calculation, state the equation of the regression line of \(x ^ { \prime }\) on \(y ^ { \prime }\).
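One possible route to part 1, offered as a sketch rather than the intended mark-scheme method, is to write the gradient of the regression line of \(y\) on \(x\) in terms of \(n\) using the summarised data and set it equal to \(-\frac{3}{4}\):

$$b = \frac{\Sigma xy - \frac{\Sigma x \, \Sigma y}{n}}{\Sigma x^2 - \frac{(\Sigma x)^2}{n}} = \frac{192 - \frac{816}{n}}{160 - \frac{576}{n}} = -\frac{3}{4}$$

$$4(192n - 816) = -3(160n - 576) \implies 768n - 3264 = -480n + 1728 \implies 1248n = 4992 \implies n = 4$$

As a check, \(n = 4\) gives \(S_{xx} = 160 - 144 = 16\) and \(S_{xy} = 192 - 204 = -12\), so \(b = -\frac{12}{16} = -\frac{3}{4}\) as required.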
CAIE FP2 2014 June Q10
11 marks Standard +0.3
10 Samples of rock from a number of geological sites were analysed for the quantities of two types, \(X\) and \(Y\), of rare minerals. The results, in milligrams, for 10 randomly chosen samples, each of 10 kg, are summarised as follows. $$\Sigma x = 866 \quad \Sigma x ^ { 2 } = 121276 \quad \Sigma y = 639 \quad \Sigma y ^ { 2 } = 55991 \quad \Sigma x y = 73527$$ Find the product moment correlation coefficient. Stating your hypotheses, test at the \(5 \%\) significance level whether there is non-zero correlation between quantities of the two rare minerals. Find the equation of the regression line of \(x\) on \(y\) in the form \(x = p y + q\), where \(p\) and \(q\) are constants to be determined.
CAIE FP2 2015 June Q10
13 marks Standard +0.3
10 Young children at a primary school are learning to throw a ball as far as they can. The distance thrown at the beginning of the school year and the distance thrown at the end of the same school year are recorded for each child. The distance thrown, in metres, at the beginning of the year is denoted by \(x\); the distance thrown, in metres, at the end of the year is denoted by \(y\). For a random sample of 10 children, the results are shown in the following table.
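| Child | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) | \(J\) |
|---|---|---|---|---|---|---|---|---|---|---|
| \(x\) | 5.2 | 4.1 | 3.7 | 5.4 | 7.6 | 6.1 | 3.2 | 4.0 | 3.5 | 8.0 |
| \(y\) | 6.2 | 4.8 | 5.0 | 5.6 | 7.7 | 7.0 | 4.0 | 4.5 | 3.6 | 8.5 |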
$$\left[ \Sigma x = 50.8 , \quad \Sigma x ^ { 2 } = 284.16 , \quad \Sigma y = 56.9 , \quad \Sigma y ^ { 2 } = 347.59 , \quad \Sigma x y = 313.28 . \right]$$
A particular child threw the ball a distance of 7.0 metres at the beginning of the year, but he could not throw at the end of the year because he had broken his arm. By finding the equation of an appropriate regression line, estimate the distance this child would have thrown at the end of the year.
The teacher suspects that, on average, the distance thrown by a child increases between the two throws by more than 0.4 metres. Stating suitable hypotheses and assuming a normal distribution, test the teacher's suspicion at the \(5 \%\) significance level.
CAIE FP2 2015 June Q7
11 marks Standard +0.8
7 For a random sample of 10 observations of pairs of values \(( x , y )\), the equation of the regression line of \(y\) on \(x\) is \(y = 3.25 x - 4.27\). The sum of the ten \(x\) values is 15.6 and the product moment correlation coefficient for the sample is 0.56. Find the equation of the regression line of \(x\) on \(y\). Test, at the \(5 \%\) significance level, whether there is evidence of non-zero correlation between the variables.
CAIE FP2 2016 June Q10
11 marks Standard +0.3
10 For a random sample of 6 observations of pairs of values \(( x , y )\), where \(0 < x < 21\) and \(0 < y < 14\), the following results are obtained. $$\Sigma x ^ { 2 } = 844.20 \quad \Sigma y ^ { 2 } = 481.50 \quad \Sigma x y = 625.59$$ It is also found that the variance of the \(x\)-values is 36.66 and the variance of the \(y\)-values is 9.69.
  1. Find the product moment correlation coefficient for the sample.
  2. Find the equations of the regression lines of \(y\) on \(x\) and \(x\) on \(y\).
  3. Use the appropriate regression line to estimate the value of \(x\) when \(y = 6.4\) and comment on the reliability of your estimate.
CAIE FP2 2018 June Q11 OR
Standard +0.8
The regression line of \(y\) on \(x\), obtained from a random sample of 6 pairs of values of \(x\) and \(y\), has equation $$y = 0.25 x + k$$ where \(k\) is a constant. The values from the sample are shown in the following table.
| \(x\) | 4 | 5 | 7 | 8 | 10 | 14 |
|---|---|---|---|---|---|---|
| \(y\) | 5 | 8 | \(p\) | 7 | \(p\) | 9 |
  1. Find the value of \(p\) and the value of \(k\).
  2. Find the product moment correlation coefficient for the data.
  3. Test, at the \(5 \%\) significance level, whether there is evidence of positive correlation between the variables.
CAIE FP2 2018 June Q8
9 marks Challenging +1.2
8 For a random sample of 6 observations of pairs of values \(( x , y )\), the equation of the regression line of \(y\) on \(x\) is \(y = b x + 1.306\), where \(b\) is a constant. The corresponding equation of the regression line of \(x\) on \(y\) is \(x = 0.6331 y + d\), where \(d\) is a constant. The values of \(x\) from the sample are $$\begin{array} { l l l l l l } 2.3 & 2.8 & 3.7 & p & 6.1 & 6.4 \end{array}$$ and the sum of the values of \(y\) is 46.5. The product moment correlation coefficient is 0.9797.
  1. Find the value of \(b\) correct to 3 decimal places.
  2. Find the value of \(p\).
  3. Use the equation of the regression line of \(x\) on \(y\) to estimate the value of \(x\) when \(y = 8.5\).
CAIE FP2 2019 June Q10
11 marks Standard +0.3
10 The values from a random sample of five pairs \(( x , y )\) taken from a bivariate distribution are shown below.
| \(x\) | 3 | 4 | 4 | 6 | 8 |
|---|---|---|---|---|---|
| \(y\) | 5 | 7 | \(q\) | 6 | 7 |
The equation of the regression line of \(x\) on \(y\) is given by \(x = \frac { 5 } { 4 } y + c\).
  1. Given that \(q\) is an integer, find its value.
  2. Find the value of \(c\).
  3. Find the value of the product moment correlation coefficient.
CAIE FP2 2019 June Q10
12 marks Moderate -0.3
10 The means and variances for a random sample of 8 pairs of values of \(x\) and \(y\) taken from a bivariate distribution are given in the following table.
| | Mean | Variance |
|---|---|---|
| \(x\) | 3.3125 | 3.3086 |
| \(y\) | 6.7375 | 7.9473 |
The product moment correlation coefficient for the sample is 0.5815, correct to 4 decimal places.
  1. Find the equation of the regression line of \(y\) on \(x\).
  2. Test at the \(5 \%\) significance level whether there is evidence of positive correlation between \(x\) and \(y\).
  3. Calculate an estimate of \(y\) when \(x = 6.0\) and comment on the reliability of your estimate.
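When only means, variances and \(r\) are given, the gradient of the regression line of \(y\) on \(x\) can be recovered from \(b = \frac{S_{xy}}{S_{xx}} = r \sqrt{\frac{S_{yy}}{S_{xx}}}\), and the ratio \(\frac{S_{yy}}{S_{xx}}\) equals the ratio of the variances because both use the same divisor. A sketch of the arithmetic, with values rounded and \(\sigma_x^2\), \(\sigma_y^2\) used here as assumed labels for the tabulated variances:

$$b = r \sqrt{\frac{\sigma_y^2}{\sigma_x^2}} = 0.5815 \sqrt{\frac{7.9473}{3.3086}} \approx 0.901, \qquad a = \bar{y} - b \bar{x} \approx 6.7375 - 0.901 \times 3.3125 \approx 3.75$$

so the line is approximately \(y = 3.75 + 0.901x\), giving \(y \approx 9.16\) when \(x = 6.0\).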
CAIE FP2 2008 November Q8
9 marks Moderate -0.3
8 The equations of the regression lines for a random sample of 25 pairs of data \(( x , y )\) from a bivariate population are $$\begin{array} { c c } y \text { on } x : & y = 1.28 - 0.425 x , \\ x \text { on } y : & x = 1.05 - 0.516 y . \end{array}$$
  1. Find the sample means, \(\bar { x }\) and \(\bar { y }\).
  2. Find the product moment correlation coefficient for the sample.
  3. Test at the \(5 \%\) significance level whether the population correlation coefficient differs from zero.
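Two standard facts link a pair of regression lines to the sample: they intersect at \((\bar{x}, \bar{y})\), and the product of their gradients equals \(r^2\), with \(r\) taking the sign of the gradients. The sketch below applies both to the lines quoted in the question; the variable names are illustrative only.

```python
from math import copysign, sqrt

# y on x: y = a + b*x        x on y: x = c + d*y
a, b = 1.28, -0.425
c, d = 1.05, -0.516

# Both lines pass through the means, so solve them simultaneously:
# x = c + d*(a + b*x)  =>  x*(1 - b*d) = c + d*a
x_bar = (c + d * a) / (1 - b * d)
y_bar = a + b * x_bar

# b*d = r^2, and r has the same sign as the two gradients
r = copysign(sqrt(b * d), b)

print(round(x_bar, 3), round(y_bar, 3), round(r, 3))
```

This gives means of roughly \(\bar{x} \approx 0.499\) and \(\bar{y} \approx 1.068\), with \(r \approx -0.468\).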
CAIE FP2 2011 November Q10 OR
Standard +0.8
The regression line of \(y\) on \(x\) obtained from a random sample of five pairs of values of \(x\) and \(y\) is $$y = 2.5 x - 1.5$$ The data is given in the following table.
| \(x\) | 1 | 2 | 4 | 2 | 6 |
|---|---|---|---|---|---|
| \(y\) | 2 | 3 | 6 | \(p\) | \(q\) |
  1. Show that \(p + q = 19\).
  2. Find the values of \(p\) and \(q\).
  3. Determine the value of the product moment correlation coefficient for this sample.
  4. It is later discovered that the values of \(x\) given in the table have each been divided by 10 (that is, the actual values are \(10, 20, 40, 20, 60\)). Without any further calculation, state
    1. the equation of the actual regression line of \(y\) on \(x\),
    2. the value of the actual product moment correlation coefficient.
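The effect of rescaling \(x\) in part 4 can be checked numerically. The sketch below assumes \(p = 4\) and \(q = 15\), which is one consistent solution of \(p + q = 19\) together with the gradient 2.5; the helper `fit` is hypothetical. Multiplying every \(x\) by 10 should divide the gradient by 10 and leave the intercept and \(r\) unchanged.

```python
from math import sqrt

def fit(xs, ys):
    """Return (a, b, r) for the regression line y = a + b*x of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b, s_xy / sqrt(s_xx * s_yy)

x = [1, 2, 4, 2, 6]
y = [2, 3, 6, 4, 15]          # assumes p = 4, q = 15

a1, b1, r1 = fit(x, y)                       # table values
a2, b2, r2 = fit([10 * v for v in x], y)     # actual x values

print(a1, b1, a2, b2, round(r1, 3), round(r2, 3))
```

Under that assumption the table values reproduce \(y = 2.5x - 1.5\), while the rescaled data give gradient 0.25 with the same intercept and the same correlation coefficient.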