5.09c Calculate regression line

CAIE FP2 2019 June Q10
11 marks Standard +0.3
The values from a random sample of five pairs \((x, y)\) taken from a bivariate distribution are shown below.
\(x\) | 3 | 4 | 4 | 6 | 8
\(y\) | 5 | 7 | \(q\) | 6 | 7
The equation of the regression line of \(x\) on \(y\) is given by \(x = \frac{5}{4}y + c\).
  1. Given that \(q\) is an integer, find its value. [5]
  2. Find the value of \(c\). [3]
  3. Find the value of the product moment correlation coefficient. [3]
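As a rough check on parts 1 and 2, the sketch below tries integer values of \(q\) and keeps those for which the gradient \(S_{xy}/S_{yy}\) of the regression line of \(x\) on \(y\) equals \(\frac{5}{4}\) exactly. The helper name `gradient_x_on_y` and the search range of -50 to 50 are illustrative assumptions, not part of the question.

```python
from fractions import Fraction

xs = [3, 4, 4, 6, 8]

def gradient_x_on_y(q):
    """Gradient S_xy / S_yy of the regression line of x on y for a trial q."""
    ys = [5, 7, q, 6, 7]
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - Fraction(sum_x * sum_y, n)
    s_yy = sum(y * y for y in ys) - Fraction(sum_y * sum_y, n)
    return s_xy / s_yy

# The search range is an arbitrary illustrative choice (assumption).
matches = [q for q in range(-50, 51) if gradient_x_on_y(q) == Fraction(5, 4)]

# With q fixed, the line passes through the point of means, so c = mean(x) - (5/4) mean(y).
q = matches[0]
ys = [5, 7, q, 6, 7]
c = Fraction(sum(xs), len(xs)) - Fraction(5, 4) * Fraction(sum(ys), len(ys))
print(matches, c)
```

Exact fractions are used so that the comparison with \(\frac{5}{4}\) is not affected by floating-point rounding.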
CAIE FP2 2009 November Q11
28 marks Standard +0.3
Answer only one of the following two alternatives.
EITHER
A light elastic string, of natural length \(l\) and modulus of elasticity \(4mg\), is attached at one end to a fixed point and has a particle \(P\) of mass \(m\) attached to the other end. When \(P\) is hanging in equilibrium under gravity it is given a velocity \(\sqrt{(gl)}\) vertically downwards. At time \(t\) the downward displacement of \(P\) from its equilibrium position is \(x\). Show that, while the string is taut, $$\ddot{x} = -\frac{4g}{l}x.$$ [4]
Find the speed of \(P\) when the length of the string is \(l\). [4]
Show that the time taken for \(P\) to move from the lowest point to the highest point of its motion is $$\left(\frac{\pi}{3} + \frac{\sqrt{3}}{2}\right)\sqrt{\left(\frac{l}{g}\right)}.$$ [6]
OR
\includegraphics{figure_11}
The scatter diagram shows a sample of size 5 of bivariate data, together with the regression line of \(y\) on \(x\). State what is minimised in obtaining this regression line, illustrating your answer on a copy of this diagram. [2]
State, giving a reason, whether, for the data shown, the regression line of \(y\) on \(x\) is the same as the regression line of \(x\) on \(y\). [1]
A car is travelling along a stretch of road with speed \(v\) km h\(^{-1}\) when the brakes are applied. The car comes to rest after travelling a further distance of \(z\) m. The values of \(z\) (and \(\sqrt{z}\)) for 8 different values of \(v\) are given in the table, correct to 2 decimal places.
\(v\) | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60
\(z\) | 2.83 | 4.63 | 4.84 | 5.29 | 9.73 | 10.30 | 14.82 | 15.21
\(\sqrt{z}\) | 1.68 | 2.15 | 2.20 | 2.30 | 3.12 | 3.21 | 3.85 | 3.90
[\(\sum v = 340\), \(\sum v^2 = 15500\), \(\sum \sqrt{z} = 22.41\), \(\sum z = 67.65\), \(\sum v\sqrt{z} = 1022.15\).]
  1. Calculate the product moment correlation coefficient between \(v\) and \(\sqrt{z}\). What does this indicate about the scatter diagram of the points \((v, \sqrt{z})\)? [4]
  2. Given that the product moment correlation coefficient between \(v\) and \(z\) is 0.965, correct to 3 decimal places, state why the regression line of \(\sqrt{z}\) on \(v\) is more suitable than the regression line of \(z\) on \(v\), and find the equation of the regression line of \(\sqrt{z}\) on \(v\). [5]
  3. Comment, in the context of the question, on the value of the constant term in the equation of the regression line of \(\sqrt{z}\) on \(v\). [2]
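The calculation for the OR alternative can be sketched from the quoted sums alone. The sketch writes \(w = \sqrt{z}\) and takes the quoted \(\sum z\) as \(\sum w^2\), which is an assumption that ignores rounding in the tabulated values. The variable names are illustrative.

```python
from math import sqrt

n = 8
sum_v, sum_v2 = 340, 15500
sum_w, sum_w2 = 22.41, 67.65   # w = sqrt(z); sum of w^2 taken as the quoted sum of z
sum_vw = 1022.15

# Corrected sums of squares and products.
s_vv = sum_v2 - sum_v**2 / n
s_ww = sum_w2 - sum_w**2 / n
s_vw = sum_vw - sum_v * sum_w / n

r = s_vw / sqrt(s_vv * s_ww)       # product moment correlation coefficient
b = s_vw / s_vv                    # gradient of sqrt(z) on v
a = sum_w / n - b * sum_v / n      # constant term of sqrt(z) on v
print(f"r = {r:.3f}; sqrt(z) = {a:.4f} + {b:.4f} v")
```

The value of `r` can then be compared with the 0.965 quoted for \(v\) and \(z\), and the size of `a` is what part 3 asks about.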
CAIE FP2 2010 November Q10
13 marks Standard +0.3
For each month of a certain year, a weather station recorded the average rainfall per day, \(x\) mm, and the average amount of sunshine per day, \(y\) hours. The results are summarised below. \(n = 12\), \(\Sigma x = 24.29\), \(\Sigma x^2 = 50.146\), \(\Sigma y = 45.8\), \(\Sigma y^2 = 211.16\), \(\Sigma xy = 88.415\).
  1. Find the mean values, \(\bar{x}\) and \(\bar{y}\). [1]
  2. Calculate the gradient of the line of regression of \(y\) on \(x\). [2]
  3. Use the answers to parts (i) and (ii) to obtain the equation of the line of regression of \(y\) on \(x\). [2]
  4. Find the product moment correlation coefficient and comment, in context, on its value. [4]
  5. Stating your hypotheses, test at the 1% level of significance whether there is negative correlation between average rainfall per day and average amount of sunshine per day. [4]
CAIE FP2 2014 November Q9
11 marks Standard +0.3
A random sample of 10 pairs of values of \(x\) and \(y\) is given in the following table.
\(x\)466827121495
\(y\)24686109865
  1. Find the equation of the regression line of \(y\) on \(x\). [4]
  2. Find the product moment correlation coefficient for the sample. [2]
  3. Find the estimated value of \(y\) when \(x = 10\), and comment on the reliability of this estimate. [2]
  4. Another sample of \(N\) pairs of data from the same population has the same product moment correlation coefficient as the first sample given. A test, at the 1% significance level, on this second sample indicates that there is sufficient evidence to conclude that there is positive correlation. Find the set of possible values of \(N\). [3]
CAIE FP2 2015 November Q9
11 marks Standard +0.3
A random sample of 8 students is chosen from those sitting examinations in both Mathematics and French. Their marks in Mathematics, \(x\), and in French, \(y\), are summarised as follows. $$\Sigma x = 472 \qquad \Sigma x^2 = 29950 \qquad \Sigma y = 400 \qquad \Sigma y^2 = 21226 \qquad \Sigma xy = 24879$$ Another student scored 72 marks in the Mathematics examination but was unable to sit the French examination. Estimate the mark that this student would have obtained in the French examination. [5] Test, at the 5% significance level, whether there is non-zero correlation between marks in Mathematics and marks in French. [6]
CAIE FP2 2018 November Q10
12 marks Standard +0.8
For a random sample of 10 observations of pairs of values \((x, y)\), the equation of the regression line of \(y\) on \(x\) is \(y = 1.1664 + 0.4604x\). It is given that $$\Sigma x^2 = 1419.98 \quad \text{and} \quad \Sigma y^2 = 439.68.$$ The mean value of \(y\) is 6.24.
  1. Find the equation of the regression line of \(x\) on \(y\). [6]
  2. Find the product moment correlation coefficient. [2]
  3. Test at the 5\% significance level whether there is evidence of positive correlation between the two variables. [4]
CAIE FP2 2019 November Q9
10 marks Standard +0.8
A random sample of five pairs of values of \(x\) and \(y\) is taken from a bivariate distribution. The values are shown in the following table, where \(p\) and \(q\) are constants.
\(x\) | 1 | 2 | 3 | 4 | 5
\(y\) | 4 | \(p\) | \(q\) | 2 | 1
The equation of the regression line of \(y\) on \(x\) is \(y = -0.5x + 3.5\).
  1. Find the values of \(p\) and \(q\). [7]
  2. Find the value of the product moment correlation coefficient. [3]
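One possible route to part 1, sketched under the standard facts that the regression line of \(y\) on \(x\) passes through \((\bar{x}, \bar{y})\) and has gradient \(S_{xy}/S_{xx}\) (the symbols \(S_{xy}\) and \(S_{xx}\) are introduced here and do not appear in the question). Since \(\bar{x} = 3\), the line gives \(\bar{y} = -0.5(3) + 3.5 = 2\), so $$4 + p + q + 2 + 1 = 10 \quad\Rightarrow\quad p + q = 3.$$ Also \(S_{xx} = 55 - \frac{15^2}{5} = 10\), so \(S_{xy} = -0.5 \times 10 = -5\), and $$S_{xy} = (4 + 2p + 3q + 8 + 5) - \frac{15 \times 10}{5} = 2p + 3q - 13 = -5 \quad\Rightarrow\quad 2p + 3q = 8.$$ Solving the two linear equations together gives \(p = 1\) and \(q = 2\).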
Edexcel S1 2023 June Q2
13 marks Moderate -0.3
Two students, Olive and Shan, collect data on the weight, \(w\) grams, and the tail length, \(t\) cm, of 15 mice. Olive summarised the data as follows. $$S_{tt} = 5.3173 \qquad \sum w^2 = 6089.12 \qquad \sum tw = 2304.53 \qquad \sum w = 297.8 \qquad \sum t = 114.8$$
  1. Calculate the value of \(S_{ww}\) and the value of \(S_{tw}\) [3]
  2. Calculate the value of the product moment correlation coefficient between \(w\) and \(t\) [2]
  3. Show that the equation of the regression line of \(w\) on \(t\) can be written as $$w = -16.7 + 4.77t$$ [3]
  4. Give an interpretation of the gradient of the regression line. [1]
  5. Explain why it would not be appropriate to use the regression line in part (c) to estimate the weight of a mouse with a tail length of 2cm. [2]
Shan decided to code the data using \(x = t - 6\) and \(y = \frac{w}{2} - 5\)
  6. Write down the value of the product moment correlation coefficient between \(x\) and \(y\) [1]
  7. Write down an equation of the regression line of \(y\) on \(x\) You do not need to simplify your equation. [1]
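A short sketch of how Shan's coding acts on the results, assuming only that least-squares regression and the product moment correlation coefficient behave in the usual way under linear coding. Both coded variables are increasing linear functions of the originals (\(x = t - 6\) has scale factor 1 and \(y = \frac{w}{2} - 5\) has scale factor \(\frac{1}{2}\)), so the correlation coefficient is unchanged. Substituting \(t = x + 6\) and \(w = 2(y + 5)\) into the stated line \(w = -16.7 + 4.77t\) gives, without simplifying, $$2(y + 5) = -16.7 + 4.77(x + 6).$$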
Edexcel S1 2011 June Q7
12 marks Moderate -0.8
A teacher took a random sample of 8 children from a class. For each child the teacher recorded the length of their left foot, \(f\) cm, and their height, \(h\) cm. The results are given in the table below.
\(f\) | 23 | 26 | 23 | 22 | 27 | 24 | 20 | 21
\(h\) | 135 | 144 | 134 | 136 | 140 | 134 | 130 | 132
(You may use \(\sum f = 186 \quad \sum h = 1085 \quad S_{ff} = 39.5 \quad S_{hh} = 139.875 \quad \sum fh = 25291\))
  1. Calculate \(S_{fh}\) [2]
  2. Find the equation of the regression line of \(h\) on \(f\) in the form \(h = a + bf\). Give the value of \(a\) and the value of \(b\) correct to 3 significant figures. [5]
  3. Use your equation to estimate the height of a child with a left foot length of 25 cm. [2]
  4. Comment on the reliability of your estimate in (c), giving a reason for your answer. [2]
The left foot length of the teacher is 25 cm.
  5. Give a reason why the equation in (b) should not be used to estimate the teacher's height. [1]
Edexcel S1 2002 November Q5
12 marks Standard +0.3
An agricultural researcher collected data, in appropriate units, on the annual rainfall \(x\) and the annual yield of wheat \(y\) at 8 randomly selected places. The data were coded using \(s = x - 6\) and \(t = y - 20\) and the following summations were obtained. \(\Sigma s = 48.5\), \(\Sigma t = 65.0\), \(\Sigma s^2 = 402.11\), \(\Sigma t^2 = 701.80\), \(\Sigma st = 523.23\)
  1. Find the equation of the regression line of \(t\) on \(s\) in the form \(t = p + qs\). [7]
  2. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + bx\), giving \(a\) and \(b\) to 3 decimal places. [3]
The value of the product moment correlation coefficient between \(s\) and \(t\) is 0.943, to 3 decimal places.
  3. Write down the value of the product moment correlation coefficient between \(x\) and \(y\). Give a justification for your answer. [2]
Edexcel S1 Specimen Q4
14 marks Moderate -0.3
A drilling machine can run at various speeds, but in general the higher the speed the sooner the drill needs to be replaced. Over several months, 15 pairs of observations relating to speed, \(s\) revolutions per minute, and life of drill, \(h\) hours, are collected. For convenience the data are coded so that \(x = s - 20\) and \(y = h - 100\) and the following summations obtained. \(\Sigma x = 143; \Sigma y = 391; \Sigma x^2 = 2413; \Sigma y^2 = 22441; \Sigma xy = 484\).
  1. Find the equation of the regression line of \(h\) on \(s\). [10]
  2. Interpret the slope of your regression line. [2]
  3. Estimate the life of a drill revolving at 30 revolutions per minute. [2]
Edexcel S1 Q3
13 marks Moderate -0.3
The marks obtained by ten students in a Geography test and a History test were as follows:
Student | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) | \(J\)
Geography (\(x\)) | 34 | 57 | 49 | 21 | 84 | 53 | 10 | 77 | 61 | 85
History (\(y\)) | 40 | 49 | 55 | 40 | | 71 | 39 | 47 | 65 | 73
  1. Given that \(\sum y = 547\), calculate the mark obtained by student \(E\) in History. [1 mark]
Given further that \(\sum x^2 = 34087\), \(\sum y^2 = 31575\) and \(\sum xy = 31342\), calculate
  2. the product moment correlation coefficient between \(x\) and \(y\), [4 marks]
  3. an equation of the regression line of \(y\) on \(x\), [4 marks]
  4. an estimate of the History mark of student \(K\), who scored 70 in Geography. [2 marks]
  5. State, with a reason, whether you would expect your answer to part (d) to be reliable. [2 marks]
Edexcel S1 Q3
13 marks Moderate -0.3
Twenty pairs of observations are made of two variables \(x\) and \(y\), which are believed to be related. It is found that $$\sum x = 200, \quad \sum y = 174, \quad \sum x^2 = 6201, \quad \sum y^2 = 5102, \quad \sum xy = 5200.$$ Find
  1. the product-moment correlation coefficient between \(x\) and \(y\), [3 marks]
  2. the equation of the regression line of \(y\) on \(x\). [4 marks]
Given that \(p = x + 30\) and \(q = y + 50\),
  3. find the equation of the regression line of \(q\) on \(p\), in the form \(q = mp + c\). [3 marks]
  4. Estimate the value of \(q\) when \(p = 46\), stating any assumptions you make. [3 marks]
Edexcel S1 Q7
15 marks Moderate -0.3
The following data was collected for seven cars, showing their engine size, \(x\) litres, and their fuel consumption, \(y\) km per litre, on a long journey.
Car | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\)
\(x\) | 0.95 | 1.20 | 1.37 | 1.76 | 2.25 | 2.50 | 2.875
\(y\) | 21.3 | 17.2 | 15.5 | 19.1 | 14.7 | 11.4 | 9.0
\(\sum x = 12.905\), \(\sum x^2 = 26.8951\), \(\sum y = 108.2\), \(\sum y^2 = 1781.64\), \(\sum xy = 183.176\).
  1. Calculate the equation of the regression line of \(x\) on \(y\), expressing your answer in the form \(x = ay + b\). [6 marks]
  2. Calculate the product moment correlation coefficient between \(y\) and \(x\) and give a brief interpretation of its value. [4 marks]
  3. Use the equation of the regression line to estimate the value of \(x\) when \(y = 12\). State, with a reason, how accurate you would expect this estimate to be. [3 marks]
  4. Comment on the use of the line to find values of \(x\) as \(y\) gets very small. [2 marks]
Edexcel S1 Q6
15 marks Standard +0.3
The marks out of 75 obtained by a group of ten students in their first and second Statistics modules were as follows:
Student | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) | \(J\)
Module 1 \((x)\) | \(54\) | \(33\) | \(42\) | \(71\) | \(60\) | \(27\) | \(39\) | \(46\) | \(59\) | \(64\)
Module 2 \((y)\) | \(50\) | \(22\) | \(44\) | \(58\) | \(42\) | \(19\) | \(35\) | \(46\) | \(55\) | \(60\)
  1. Find \(\sum x\) and \(\sum y\). [2 marks]
Given that \(\sum x^2 = 26353\) and \(\sum xy = 22991\),
  2. obtain the equation of the regression line of \(y\) on \(x\). [5 marks]
  3. Estimate the Module 2 result of a student whose mark in Module 1 was (i) 65, (ii) 5. Explain why one of these estimates is less reliable than the other. [4 marks]
The equation of the regression line of \(x\) on \(y\) is \(x = 0.921y + 9.81\).
  4. Deduce the product moment correlation coefficient between \(x\) and \(y\), and briefly interpret its value. [4 marks]
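The final part relies on the identity \(r^2 = b_{yx} \, b_{xy}\), where \(b_{yx}\) and \(b_{xy}\) are the gradients of the two regression lines (notation introduced here, not used in the question). A minimal sketch, computing \(b_{yx}\) from the table and taking \(b_{xy} = 0.921\) as quoted:

```python
from math import sqrt

x = [54, 33, 42, 71, 60, 27, 39, 46, 59, 64]   # Module 1 marks
y = [50, 22, 44, 58, 42, 19, 35, 46, 55, 60]   # Module 2 marks
n = len(x)

s_xx = sum(u * u for u in x) - sum(x) ** 2 / n
s_xy = sum(u * v for u, v in zip(x, y)) - sum(x) * sum(y) / n

b_y_on_x = s_xy / s_xx      # gradient of the regression line of y on x
b_x_on_y = 0.921            # gradient of x on y, as quoted in the question

# Both gradients are positive, so the positive square root is taken.
r = sqrt(b_y_on_x * b_x_on_y)
print(f"b(y on x) = {b_y_on_x:.4f}, r = {r:.3f}")
```

Because the quoted gradient 0.921 is itself rounded, the resulting `r` is only reliable to about three significant figures.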
Edexcel S1 Q6
13 marks Standard +0.3
Two variables \(x\) and \(y\) are such that, for a sample of ten pairs of values, $$\sum x = 104.5, \quad \sum y = 113.6, \quad \sum x^2 = 1954.1, \quad \sum y^2 = 2100.6.$$ The regression line of \(x\) on \(y\) has gradient 0.8. Find
  1. \(\sum xy\), [4 marks]
  2. the equation of the regression line of \(y\) on \(x\), [5 marks]
  3. the product moment correlation coefficient between \(y\) and \(x\). [3 marks]
  4. Describe the kind of correlation indicated by your answer to (c). [1 mark]
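A sketch of how the given gradient can be worked backwards to \(\sum xy\): the gradient of \(x\) on \(y\) is \(S_{xy}/S_{yy}\), so \(S_{xy} = 0.8\,S_{yy}\), and \(\sum xy\) follows by adding back \(\frac{\sum x \sum y}{n}\). The variable names are illustrative.

```python
n = 10
sum_x, sum_y = 104.5, 113.6
sum_x2, sum_y2 = 1954.1, 2100.6
grad_x_on_y = 0.8

s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n

# Gradient of x on y is S_xy / S_yy, so S_xy can be recovered directly.
s_xy = grad_x_on_y * s_yy
sum_xy = s_xy + sum_x * sum_y / n

# Regression line of y on x: y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

r = s_xy / (s_xx * s_yy) ** 0.5
print(f"sum xy = {sum_xy:.1f}; y = {a:.2f} + {b:.3f} x; r = {r:.3f}")
```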
OCR S1 2013 January Q3
12 marks Moderate -0.3
The Gross Domestic Product per Capita (GDP), \(x\) dollars, and the Infant Mortality Rate per thousand (IMR), \(y\), of 6 African countries were recorded and summarised as follows. $$n = 6 \quad \sum x = 7000 \quad \sum x^2 = 8700000 \quad \sum y = 456 \quad \sum y^2 = 36262 \quad \sum xy = 509900$$
  1. Calculate the equation of the regression line of \(y\) on \(x\) for these 6 countries. [4]
The original data were plotted on a scatter diagram and the regression line of \(y\) on \(x\) was drawn, as shown below. \includegraphics{figure_3}
  2. The GDP for another country, Tanzania, is 1300 dollars. Use the regression line in the diagram to estimate the IMR of Tanzania. [1]
  3. The GDP for Nigeria is 2400 dollars. Give two reasons why the regression line is unlikely to give a reliable estimate for the IMR for Nigeria. [2]
  4. The actual value of the IMR for Tanzania is 96. The data for Tanzania (\(x = 1300, y = 96\)) is now included with the original 6 countries. Calculate the value of the product moment correlation coefficient, \(r\), for all 7 countries. [4]
  5. The IMR is now redefined as the infant mortality rate per hundred instead of per thousand, and the value of \(r\) is recalculated for all 7 countries. Without calculation state what effect, if any, this would have on the value of \(r\) found in part (iv). [1]
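The last two parts can be explored numerically by folding the Tanzania point into the quoted sums and then rescaling \(y\). In this sketch the helper `pmcc` and the scale factor `k = 0.1` (per hundred rather than per thousand) are illustrative assumptions.

```python
from math import sqrt

def pmcc(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Product moment correlation coefficient from summary sums."""
    s_xx = sum_x2 - sum_x**2 / n
    s_yy = sum_y2 - sum_y**2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)

# Summary for the original 6 countries, as quoted in the question.
n, sum_x, sum_x2, sum_y, sum_y2, sum_xy = 6, 7000, 8700000, 456, 36262, 509900

# Fold in Tanzania (x = 1300, y = 96).
x_new, y_new = 1300, 96
n7 = n + 1
sx, sx2 = sum_x + x_new, sum_x2 + x_new**2
sy, sy2 = sum_y + y_new, sum_y2 + y_new**2
sxy = sum_xy + x_new * y_new
r7 = pmcc(n7, sx, sx2, sy, sy2, sxy)

# Rescale every y by a positive constant k and recompute.
k = 0.1
r7_scaled = pmcc(n7, sx, sx2, k * sy, k**2 * sy2, k * sxy)
print(round(r7, 3), round(r7_scaled, 3))
```

Comparing `r7` with `r7_scaled` illustrates the effect of a positive change of scale on the coefficient.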
OCR S1 2013 June Q5
9 marks Moderate -0.3
The table shows some of the values of the seasonally adjusted Unemployment Rate (UR), \(x\)\%, and the Consumer Price Index (CPI), \(y\)\%, in the United Kingdom from April 2008 to July 2010.
Date | April 2008 | July 2008 | October 2008 | January 2009 | April 2009 | July 2009 | October 2009 | January 2010 | April 2010 | July 2010
UR, \(x\)\% | 5.2 | 5.7 | 6.1 | 6.8 | 7.5 | 7.8 | 7.8 | 7.9 | 7.8 | 7.7
CPI, \(y\)\% | 3.0 | 4.4 | 4.5 | 3.0 | 2.3 | 1.8 | 1.5 | 3.5 | 3.7 | 3.1
These data are summarised below. $$n = 10 \quad \sum x = 70.3 \quad \sum x^2 = 503.45 \quad \sum y = 30.8 \quad \sum y^2 = 103.94 \quad \sum xy = 211.9$$
  1. Calculate the product moment correlation coefficient, \(r\), for the data, showing that \(-0.6 < r < -0.5\). [3]
  2. Karen says "The negative value of \(r\) shows that when the Unemployment Rate increases, it causes the Consumer Price Index to decrease." Give a criticism of this statement. [1]
    1. Calculate the equation of the regression line of \(x\) on \(y\). [3]
    2. Use your equation to estimate the value of the Unemployment Rate in a month when the Consumer Price Index is 4.0\%. [2]
Edexcel S1 Q6
17 marks Moderate -0.3
Penshop have stores selling stationery in each of 6 towns. The population, \(P\), in tens of thousands and the monthly turnover, \(T\), in thousands of pounds for each of the shops are as recorded below.
Town | Abberton | Bember | Claster | Deller | Edgeton | Figland
\(P\) (tens of thousands) | 3.2 | 7.6 | 5.2 | 9.0 | 8.1 | 4.8
\(T\) (£ thousands) | 11.1 | 12.4 | 13.3 | 19.3 | 17.9 | 11.8
  1. Represent these data on a scatter diagram with \(T\) on the vertical axis. [4]
    1. Which town's shop might appear to be underachieving given the populations of the towns?
    2. Suggest two other factors that might affect each shop's turnover. [3]
You may assume that $$\Sigma P = 37.9, \quad \Sigma T = 85.8, \quad \Sigma P^2 = 264.69, \quad \Sigma T^2 = 1286, \quad \Sigma PT = 574.25.$$
  3. Find the equation of the regression line of \(T\) on \(P\). [7]
  4. Estimate the monthly turnover that might be expected if a shop were opened in Gratton, a town with a population of 68 000. [2]
  5. Why might the management of Penshop be reluctant to use the regression line to estimate the monthly turnover they could expect if a shop were opened in Haggin, a town with a population of 172 000? [1]
Edexcel S1 Q4
12 marks Standard +0.3
The owner of a mobile burger-bar believes that hot weather reduces his sales. To investigate the effect on his business he collected data on his daily sales, \(£P\), and the maximum temperature, \(T\)°C, on each of 20 days. He then coded the data, using \(x = T - 20\) and \(y = P - 300\), and calculated the summary statistics given below. $$\Sigma x = 57, \quad \Sigma y = 2222, \quad \Sigma x^2 = 401, \quad \Sigma y^2 = 305576, \quad \Sigma xy = 3871.$$
  1. Find an equation of the regression line of \(P\) on \(T\). [9 marks]
The owner of the bar doesn't believe it is profitable for him to run the bar if he takes less than £460 in a day.
  2. According to your regression line at what maximum daily temperature, to the nearest degree Celsius, does it become unprofitable for him to run the bar? [3 marks]
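One way to organise the working is to fit the line in the coded variables first and then undo the coding. The sketch below follows that route. The names `slope_P`, `intercept_P` and `threshold_T` are illustrative, and the final rounding to the nearest degree is left to the reader.

```python
n = 20
sum_x, sum_y = 57, 2222
sum_x2, sum_xy = 401, 3871

# Regression of y on x in the coded variables: y = a + b x.
s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Undo the coding x = T - 20, y = P - 300:
#   P - 300 = a + b (T - 20)  =>  P = (300 + a - 20 b) + b T
slope_P = b
intercept_P = 300 + a - 20 * b

# Temperature at which the fitted sales equal 460 pounds.
threshold_T = (460 - intercept_P) / slope_P
print(f"P = {intercept_P:.1f} + ({slope_P:.2f}) T; P = 460 at T = {threshold_T:.2f}")
```

Since the fitted gradient is negative, fitted sales drop below £460 for temperatures above `threshold_T`.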
Edexcel S1 Q6
16 marks Moderate -0.3
The Principal of a school believes that more students are absent on days when the temperature is lower. Over a two-week period in December she records the percentage of students who are absent, \(A\%\), and the temperature, \(T\)°C, at 9 am each morning giving these results.
\(T\) (°C) | 4 | \(-3\) | \(-2\) | \(-6\) | 0 | 3 | 7 | \(-1\) | 3 | 2
\(A\) (\%) | 8.5 | 14.1 | 17.0 | 20.3 | 17.9 | 15.5 | 12.4 | 12.8 | 13.7 | 11.6
  1. Represent these data on a scatter diagram. [4 marks]
You may use $$\Sigma T = 7, \quad \Sigma A = 143.8, \quad \Sigma T^2 = 137, \quad \Sigma A^2 = 2172.66, \quad \Sigma TA = 20.7$$
  2. Calculate the product moment correlation coefficient for these data and comment on the Principal's hypothesis. [6 marks]
  3. Find an equation of the regression line of \(A\) on \(T\) in the form \(A = p + qT\). [4 marks]
  4. Draw the regression line on your scatter diagram. [2 marks]
OCR MEI S2 2007 January Q1
18 marks Moderate -0.8
In a science investigation into energy conservation in the home, a student is collecting data on the time taken for an electric kettle to boil as the volume of water in the kettle is varied. The student's data are shown in the table below, where \(v\) litres is the volume of water in the kettle and \(t\) seconds is the time taken for the kettle to boil (starting with the water at room temperature in each case). Also shown are summary statistics and a scatter diagram on which the regression line of \(t\) on \(v\) is drawn.
\(v\) | 0.2 | 0.4 | 0.6 | 0.8 | 1.0
\(t\) | 44 | 78 | 114 | 156 | 172
\(n = 5\), \(\Sigma v = 3.0\), \(\Sigma t = 564\), \(\Sigma v^2 = 2.20\), \(\Sigma vt = 405.2\). \includegraphics{figure_1}
  1. Calculate the equation of the regression line of \(t\) on \(v\), giving your answer in the form \(t = a + bv\). [5]
  2. Use this equation to predict the time taken for the kettle to boil when the amount of water which it contains is
    1. 0.5 litres,
    2. 1.5 litres.
    Comment on the reliability of each of these predictions. [4]
  3. In the equation of the regression line found in part (i), explain the role of the coefficient of \(v\) in the relationship between time taken and volume of water. [2]
  4. Calculate the values of the residuals for \(v = 0.8\) and \(v = 1.0\). [4]
  5. Explain how, on a scatter diagram with the regression line drawn accurately on it, a residual could be measured and its sign determined. [3]
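The regression line and the residuals asked for above can be sketched directly from the table. A residual is taken here as observed \(t\) minus fitted \(t\), which is the usual convention but is stated as an assumption. The variable names are illustrative.

```python
v = [0.2, 0.4, 0.6, 0.8, 1.0]     # volume in litres
t = [44, 78, 114, 156, 172]       # boiling time in seconds
n = len(v)

s_vv = sum(vi * vi for vi in v) - sum(v) ** 2 / n
s_vt = sum(vi * ti for vi, ti in zip(v, t)) - sum(v) * sum(t) / n

b = s_vt / s_vv                   # gradient: extra seconds per extra litre
a = sum(t) / n - b * sum(v) / n   # constant term of t = a + b v

# Residual = observed time minus time predicted by the line.
residuals = {vi: ti - (a + b * vi) for vi, ti in zip(v, t)}
print(f"t = {a:.1f} + {b:.1f} v")
print({vi: round(res, 1) for vi, res in residuals.items()})
```

A positive residual means the point lies above the line and a negative one means it lies below, which matches the graphical reading asked for in the last part.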
OCR Further Statistics AS Specimen Q8
10 marks Standard +0.3
The following table gives the mean per capita consumption of mozzarella cheese per annum, \(x\) pounds, and the number of civil engineering doctorates awarded, \(y\), in the United States in each of 10 years.
\(x\) | 9.3 | 9.7 | 9.7 | 9.7 | 9.9 | 10.2 | 10.5 | 11.0 | 10.6 | 10.6
\(y\) | 480 | 501 | 540 | 552 | 547 | 622 | 655 | 701 | 712 | 708
source: www.tylervigen.com
  1. Find the equation of the regression line of \(y\) on \(x\). [2]
You are given that the product moment correlation coefficient is 0.959.
  2. Explain whether this value would be different if \(x\) is measured in kilograms instead of pounds. [1]
It is desired to carry out a hypothesis test to investigate whether there is correlation between these two variables.
  3. Assume that the data is a random sample of all years.
    1. Carry out the test at the 10\% significance level. [6]
    2. Explain whether your conclusion suggests that manufacturers of mozzarella cheese could increase consumption by sponsoring doctoral candidates in civil engineering. [1]
WJEC Unit 2 2018 June Q05
6 marks Easy -1.2
A baker is aware that the pH of his sourdough, \(y\), and the hydration, \(x\), affect the taste and texture of the final product. The hydration is measured in ml of water per 100 g of flour (ml/100 g). The baker researches how the pH of his sourdough changes as the hydration changes. The results of his research are shown in the diagram below. \includegraphics{figure_5}
  1. Describe the relationship between pH and hydration. [2]
  2. The equation of the regression line for \(y\) on \(x\) is $$y = 5.4 - 0.02x.$$
    1. Interpret the gradient and intercept of the regression line in this context.
    2. Estimate the pH of the sourdough when the hydration is 20 ml/100 g. Comment on the reliability of this estimate. [4]
WJEC Unit 2 Specimen Q4
7 marks Easy -1.3
A researcher wishes to investigate the relationship between the amount of carbohydrate and the number of calories in different fruits. He compiles a list of 90 different fruits, e.g. apricots, kiwi fruits, raspberries. As he does not have enough time to collect data for each of the 90 different fruits, he decides to select a simple random sample of 14 different fruits from the list. For each fruit selected, he then uses a dieting website to find the number of calories (kcal) and the amount of carbohydrate (g) in a typical adult portion (e.g. a whole apple, a bunch of 10 grapes, half a cup of strawberries). He enters these data into a spreadsheet for analysis.
  1. Explain how the random number function on a calculator could be used to select this sample of 14 different fruits. [3]
  2. The scatter graph represents 'Number of calories' against 'Carbohydrate' for the sample of 14 different fruits.
    1. Describe the correlation between 'Number of calories' and 'Carbohydrate'. [1]
    2. Interpret the correlation between 'Number of calories' and 'Carbohydrate' in this context. [1]
    \includegraphics{figure_1}
  3. The equation of the regression line for this dataset is: 'Number of calories' = 12.4 + 2.9 × 'Carbohydrate'
    1. Interpret the gradient of the regression line in this context. [1]
    2. Explain why it is reasonable for the regression line to have a non-zero intercept in this context. [1]