5.09e Use regression: for estimation in context

129 questions

OCR S1 2005 June Q4
9 marks Moderate -0.3
4 The table shows the latitude, \(x\) (in degrees correct to 3 significant figures), and the average rainfall \(y\) (in cm correct to 3 significant figures) of five European cities.
City | \(x\) | \(y\)
Berlin | 52.5 | 58.2
Bucharest | 44.4 | 58.7
Moscow | 55.8 | 53.3
St Petersburg | 60.0 | 47.8
Warsaw | 52.3 | 56.6
$$\left[ n = 5 , \Sigma x = 265.0 , \Sigma y = 274.6 , \Sigma x ^ { 2 } = 14176.54 , \Sigma y ^ { 2 } = 15162.22 , \Sigma x y = 14464.10 . \right]$$
  1. Calculate the product moment correlation coefficient.
  2. The values of \(y\) in the table were in fact obtained from measurements in inches and converted into centimetres by multiplying by 2.54. State what effect it would have had on the value of the product moment correlation coefficient if it had been calculated using inches instead of centimetres.
  3. It is required to estimate the annual rainfall at Bergen, where \(x = 60.4\). Calculate the equation of an appropriate line of regression, giving your answer in simplified form, and use it to find the required estimate.
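As an illustration only (this is not the official mark scheme), the arithmetic can be checked with a short Python sketch. It assumes the standard corrected sums \(S_{xx} = \Sigma x^2 - (\Sigma x)^2/n\), and similarly \(S_{yy}\) and \(S_{xy}\); the variable names are hypothetical.

```python
import math

# Summary statistics quoted in the question
n = 5
sum_x, sum_y = 265.0, 274.6
sum_x2, sum_y2, sum_xy = 14176.54, 15162.22, 14464.10

# Corrected sums of squares and products
Sxx = sum_x2 - sum_x**2 / n
Syy = sum_y2 - sum_y**2 / n
Sxy = sum_xy - sum_x * sum_y / n

r = Sxy / math.sqrt(Sxx * Syy)

# Rainfall is estimated from latitude, so use the line of y on x: y = a + b x
b = Sxy / Sxx
a = sum_y / n - b * sum_x / n
estimate_bergen = a + b * 60.4

# Working in inches divides every y by 2.54: S_xy and sqrt(S_yy) both shrink
# by the same factor, which cancels in r
r_inches = (Sxy / 2.54) / math.sqrt(Sxx * Syy / 2.54**2)

print(f"r = {r:.3f}, y = {a:.2f} {b:+.4f}x, estimate = {estimate_bergen:.1f} cm")
```

With these figures \(r \approx -0.868\) and the estimate is about 49.9 cm. Note that \(x = 60.4\) lies just above the largest latitude in the table (60.0), so the estimate involves slight extrapolation.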
OCR S1 Specimen Q8
13 marks Moderate -0.8
8 An experiment was conducted to see whether there was any relationship between the maximum tidal current, \(y \mathrm {~cm} \mathrm {~s} ^ { - 1 }\), and the tidal range, \(x\) metres, at a particular marine location. [The tidal range is the difference between the height of high tide and the height of low tide.] Readings were taken over a period of 12 days, and the results are shown in the following table.
\(x\) | 2.0 | 2.4 | 3.0 | 3.1 | 3.4 | 3.7 | 3.8 | 3.9 | 4.0 | 4.5 | 4.6 | 4.9
\(y\) | 15.2 | 22.0 | 25.2 | 33.0 | 33.1 | 34.2 | 51.0 | 42.3 | 45.0 | 50.7 | 61.0 | 59.2
$$\left[ \Sigma x = 43.3 , \Sigma y = 471.9 , \Sigma x ^ { 2 } = 164.69 , \Sigma y ^ { 2 } = 20915.75 , \Sigma x y = 1837.78 . \right]$$
The scatter diagram below illustrates the data.
\includegraphics[max width=\textwidth, alt={}, center]{2fb25fc5-0445-44fa-a23e-647d14b1a376-4_462_793_1464_644}
  1. Calculate the product moment correlation coefficient for the data, and comment briefly on your answer with reference to the appearance of the scatter diagram.
  2. Calculate the equation of the regression line of maximum tidal current on tidal range.
  3. Estimate the maximum tidal current on a day when the tidal range is 4.2 m, and comment briefly on how reliable you consider your estimate is likely to be.
  4. It is suggested that the equation found in part (ii) could be used to predict the maximum tidal current on a day when the tidal range is 15 m. Comment briefly on the validity of this suggestion.
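A minimal sketch of the same calculation from the summary statistics, with a hypothetical helper `estimate` that also flags whether a tidal range lies inside the observed interval (2.0 m to 4.9 m). Treat it as a check on the arithmetic rather than a model answer.

```python
import math

x_values = [2.0, 2.4, 3.0, 3.1, 3.4, 3.7, 3.8, 3.9, 4.0, 4.5, 4.6, 4.9]   # tidal range, m
n = len(x_values)
sum_x, sum_y = 43.3, 471.9
sum_x2, sum_y2, sum_xy = 164.69, 20915.75, 1837.78

Sxx = sum_x2 - sum_x**2 / n
Syy = sum_y2 - sum_y**2 / n
Sxy = sum_xy - sum_x * sum_y / n
r = Sxy / math.sqrt(Sxx * Syy)

# Regression of maximum tidal current (y) on tidal range (x)
b = Sxy / Sxx
a = sum_y / n - b * sum_x / n

def estimate(x):
    """Return the fitted y and whether x lies inside the observed range of x."""
    kind = "interpolation" if min(x_values) <= x <= max(x_values) else "extrapolation"
    return a + b * x, kind

current_42, kind_42 = estimate(4.2)
current_15, kind_15 = estimate(15)
```

The estimate at 4.2 m (about 48.8 cm s\(^{-1}\)) lies inside the data, whereas 15 m is far outside it, so the fitted line gives no real support there.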
Edexcel S1 2021 January Q5
17 marks Moderate -0.8
  1. A company director wants to introduce a performance-related pay structure for her managers. A random sample of 15 managers is taken and the annual salary, \(y\) in \(\pounds 1000\), was recorded for each manager. The director then calculated a performance score, \(x\), for each of these managers.
    The results are shown on the scatter diagram in Figure 1 on the next page.
    1. Describe the correlation between performance score and annual salary.
    The results are also summarised in the following statistics. $$\sum x = 465 \quad \sum y = 562 \quad \mathrm {~S} _ { x x } = 2492 \quad \sum y ^ { 2 } = 23140 \quad \sum x y = 19428$$
    1. Show that \(\mathrm { S } _ { x y } = 2006\)
    2. Find \(\mathrm { S } _ { y y }\)
  2. Find the product moment correlation coefficient between performance score and annual salary.
    The director believes that there is a linear relationship between performance score and annual salary.
  3. State, giving a reason, whether or not these data are consistent with the director's belief.
  4. Calculate the equation of the regression line of \(y\) on \(x\), in the form \(y = a + b x\). Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  5. Give an interpretation of the value of \(b\).
  6. Plot your regression line on the scatter diagram in Figure 1.
    The director hears that one of the managers in the sample seems to be underperforming.
  7. On the scatter diagram, circle the point that best identifies this manager.
    The director decides to use this regression line for the new performance-related pay structure.
    1. Estimate, to 3 significant figures, the new salary of a manager with a performance score of 30.
      \begin{figure}[h]
      \includegraphics[alt={},max width=\textwidth]{4f034b9a-94c8-42f2-bd77-9adec277aba6-15_1390_1408_299_187} \captionsetup{labelformat=empty} \caption{Figure 1}
      \end{figure}
      \begin{figure}[h]
      \captionsetup{labelformat=empty} \caption{Only use this scatter diagram if you need to redraw your line.} \includegraphics[alt={},max width=\textwidth]{4f034b9a-94c8-42f2-bd77-9adec277aba6-17_1378_1143_402_468}
      \end{figure}
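The following Python sketch, which is not part of the question, checks \(S_{xy}\), \(S_{yy}\), \(r\), the regression line and the final estimate from the given statistics. It keeps full precision throughout, so rounding \(a\) and \(b\) to 3 significant figures first may change the last figure of the estimate; the names are illustrative.

```python
import math

n = 15
sum_x, sum_y = 465, 562
Sxx, sum_y2, sum_xy = 2492, 23140, 19428   # S_xx is given directly

Sxy = sum_xy - sum_x * sum_y / n           # the "show that" value: should equal 2006
Syy = sum_y2 - sum_y**2 / n
r = Sxy / math.sqrt(Sxx * Syy)

# Regression of salary (y, in £1000) on performance score (x)
b = Sxy / Sxx
a = sum_y / n - b * sum_x / n
salary_30 = a + b * 30                     # estimated salary in £1000
```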
Edexcel S1 2014 June Q1
12 marks Moderate -0.8
  1. A medical researcher is studying the relationship between age (\(x\) years) and volume of blood (\(y \mathrm { ml }\)) pumped by each contraction of the heart. The researcher obtained the following data from a random sample of 8 patients.
Age (\(x\)) | 20 | 25 | 30 | 45 | 55 | 60 | 65 | 70
Volume (\(y\)) | 74 | 76 | 77 | 72 | 68 | 67 | 64 | 62
[You may use \(\sum x = 370 , \mathrm {~S} _ { x x } = 2587.5 , \sum y = 560 , \sum y ^ { 2 } = 39418 , \mathrm {~S} _ { x y } = - 710\) ]
  1. Calculate \(\mathrm { S } _ { y y }\)
  2. Calculate the product moment correlation coefficient for these data.
  3. Interpret your value of the correlation coefficient.
    The researcher believes that a linear regression model may be appropriate to describe these data.
  4. State, giving a reason, whether or not your value of the correlation coefficient supports the researcher's belief.
  5. Find the equation of the regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
    Jack is a 40-year-old patient.
    1. Use your regression line to estimate the volume of blood pumped by each contraction of Jack's heart.
    2. Comment, giving a reason, on the reliability of your estimate.
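A short sketch, assuming the bracketed statistics above are exact, that computes \(S_{yy}\), \(r\), the line of \(y\) on \(x\) and the estimate for Jack. The `ages` list is copied from the table only to check that 40 lies inside the sampled range; all names are hypothetical.

```python
import math

n = 8
ages = [20, 25, 30, 45, 55, 60, 65, 70]
sum_x, sum_y, sum_y2 = 370, 560, 39418
Sxx, Sxy = 2587.5, -710                    # given directly in the question

Syy = sum_y2 - sum_y**2 / n
r = Sxy / math.sqrt(Sxx * Syy)

# Line of volume (y) on age (x): y = a + b x
b = Sxy / Sxx
a = sum_y / n - b * sum_x / n
jack_estimate = a + b * 40
within_range = min(ages) <= 40 <= max(ages)
```

This gives \(S_{yy} = 218\), \(r \approx -0.945\) and an estimate of about 71.7 ml, with age 40 inside the range of ages sampled.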
Edexcel S1 2015 June Q2
13 marks Moderate -0.3
2. Paul believes there is a relationship between the value and the floor size of a house. He takes a random sample of 20 houses and records the value, \(\pounds v\), and the floor size, \(s \mathrm {~m} ^ { 2 }\). The data were coded using \(x = \frac { s - 50 } { 10 }\) and \(y = \frac { v } { 100000 }\) and the following statistics obtained.
$$\sum x = 441.5 , \quad \sum y = 59.8 , \quad \sum x ^ { 2 } = 11261.25 , \quad \sum y ^ { 2 } = 196.66 , \quad \sum x y = 1474.1$$
  1. Find the value of \(S _ { x y }\) and the value of \(S _ { x x }\)
  2. Find the equation of the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\).
    The least squares regression line of \(v\) on \(s\) is \(v = c + d s\).
  3. Show that \(d = 1020\) to 3 significant figures and find the value of \(c\)
  4. Estimate the value of a house of floor size \(130 \mathrm {~m} ^ { 2 }\)
  5. Interpret the value of \(d\).
    Paul wants to increase the value of his house. He decides to add an extension to increase the floor size by \(31 \mathrm {~m} ^ { 2 }\).
  6. Estimate the increase in the value of Paul's house after adding the extension.
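The key step in this question is undoing the coding. The sketch below (illustrative names, not from the source) fits the coded line and then substitutes \(y = v/100000\) and \(x = (s - 50)/10\) to recover \(c\) and \(d\).

```python
n = 20
sum_x, sum_y = 441.5, 59.8
sum_x2, sum_xy = 11261.25, 1474.1

Sxy = sum_xy - sum_x * sum_y / n
Sxx = sum_x2 - sum_x**2 / n
b = Sxy / Sxx                      # gradient of y on x (coded units)
a = sum_y / n - b * sum_x / n      # intercept (coded units)

# Undo the coding: v = 100000*y and x = (s - 50)/10, so
# v = 100000*(a + b*(s - 50)/10) = (100000*a - 500000*b) + (10000*b)*s
d = 10000 * b
c = 100000 * a - 500000 * b

value_130 = c + d * 130            # estimated value (£) for 130 m^2
increase_31 = d * 31               # the intercept cancels when comparing two floor sizes
```

Because the extension changes \(s\) by 31 while nothing else changes, the estimated increase is simply \(31d\), about £31 500; the intercept \(c\) plays no part.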
Edexcel S1 2004 January Q1
13 marks Moderate -0.8
  1. An office has the heating switched on at 7.00 a.m. each morning. On a particular day, the temperature of the office, \(t { } ^ { \circ } \mathrm { C }\), was recorded \(m\) minutes after 7.00 a.m. The results are shown in the table below.
\(m\) | 0 | 10 | 20 | 30 | 40 | 50
\(t\) | 6.0 | 8.9 | 11.8 | 13.5 | 15.3 | 16.1
  1. Calculate the exact values of \(S _ { m t }\) and \(S _ { m m }\).
  2. Calculate the equation of the regression line of \(t\) on \(m\) in the form \(t = a + b m\).
  3. Use your equation to estimate the value of \(t\) at 7.35 a.m.
  4. State, giving a reason, whether or not you would use the regression equation in (b) to estimate the temperature
    1. at 9.00 a.m. that day,
    2. at 7.15 a.m. one month later.
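Because part 1 asks for exact values, this sketch uses Python's `fractions.Fraction` to avoid floating-point rounding. It works from the raw table (no summary statistics are given), and the names are illustrative.

```python
from fractions import Fraction

m = [0, 10, 20, 30, 40, 50]                                               # minutes after 7.00 a.m.
t = [Fraction(s) for s in ("6.0", "8.9", "11.8", "13.5", "15.3", "16.1")]  # exact decimals
n = len(m)

# Exact corrected sums: S_mm = Σm² - (Σm)²/n and S_mt = Σmt - ΣmΣt/n
S_mm = sum(mi * mi for mi in m) - Fraction(sum(m) ** 2, n)
S_mt = sum(mi * ti for mi, ti in zip(m, t)) - sum(m) * sum(t) / n

b = S_mt / S_mm
a = sum(t) / n - b * Fraction(sum(m), n)
t_735 = a + b * 35            # 7.35 a.m. corresponds to m = 35

m_9am = 120                   # 9.00 a.m. corresponds to m = 120
outside_range = not (min(m) <= m_9am <= max(m))
```

It gives \(S_{mm} = 1750\), \(S_{mt} = 357\) and \(t = \frac{41}{6} + 0.204m\) (about \(6.83 + 0.204m\)), so the estimate at 7.35 a.m. is about 14.0 °C. The time 9.00 a.m. (\(m = 120\)) lies well outside the observed range of \(m\).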
OCR S1 2009 January Q2
8 marks Moderate -0.8
2 The table shows the age, \(x\) years, and the mean diameter, \(y \mathrm {~cm}\), of the trunk of each of seven randomly selected trees of a certain species.
Age (\(x\) years) | 11 | 12 | 20 | 28 | 35 | 45 | 51
Mean trunk diameter (\(y\) cm) | 12.2 | 16.0 | 26.4 | 39.2 | 39.6 | 51.3 | 60.6
$$\left[ n = 7 , \Sigma x = 202 , \Sigma y = 245.3 , \Sigma x ^ { 2 } = 7300 , \Sigma y ^ { 2 } = 10510.65 , \Sigma x y = 8736.9 . \right]$$
  1. (a) Use an appropriate formula to show that the gradient of the regression line of \(y\) on \(x\) is 1.13, correct to 2 decimal places.
    (b) Find the equation of the regression line of \(y\) on \(x\).
  2. Use your equation to estimate the mean trunk diameter of a tree of this species with age
    (a) 30 years,
    (b) 100 years.
  It is given that the value of the product moment correlation coefficient for the data in the table is 0.988, correct to 3 decimal places.
  3. Comment on the reliability of each of your two estimates.
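A sketch, assuming the summary statistics are exact, that confirms the gradient of 1.13 and evaluates both estimates. The `ages` list is copied from the table only to flag which estimate lies outside the observed ages; names are hypothetical.

```python
n = 7
ages = [11, 12, 20, 28, 35, 45, 51]
sum_x, sum_y = 202, 245.3
sum_x2, sum_xy = 7300, 8736.9

Sxx = sum_x2 - sum_x**2 / n
Sxy = sum_xy - sum_x * sum_y / n
b = Sxy / Sxx                              # gradient of y on x
a = sum_y / n - b * sum_x / n

# Estimate at each age and note whether it interpolates or extrapolates
estimates = {}
for age in (30, 100):
    inside = min(ages) <= age <= max(ages)
    estimates[age] = (a + b * age, "interpolation" if inside else "extrapolation")
```

The line is roughly \(y = 2.51 + 1.13x\); the estimate at 30 years (about 36.3 cm) lies within the data, while the one at 100 years (about 115 cm) is well beyond the oldest tree sampled (51 years).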
OCR S1 2011 January Q3
12 marks Moderate -0.8
3 A firm wishes to assess whether there is a linear relationship between the annual amount spent on advertising, \(\pounds x\) thousand, and the annual profit, \(\pounds y\) thousand. A summary of the figures for 12 years is as follows. $$n = 12 \quad \Sigma x = 86.6 \quad \Sigma y = 943.8 \quad \Sigma x ^ { 2 } = 658.76 \quad \Sigma y ^ { 2 } = 83663.00 \quad \Sigma x y = 7351.12$$
  1. Calculate the product moment correlation coefficient, showing that it is greater than 0.9.
  2. Comment briefly on this value in this context.
  3. A manager claims that this result shows that spending more money on advertising in the future will result in greater profits. Make two criticisms of this claim.
  4. Calculate the equation of the regression line of \(y\) on \(x\).
  5. Estimate the annual profit during a year when \(\pounds 7400\) was spent on advertising.
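One easy slip here is units: \(x\) is in £ thousand, so £7400 corresponds to \(x = 7.4\). The sketch below (illustrative, not a model answer) makes that conversion explicit.

```python
import math

n = 12
sum_x, sum_y = 86.6, 943.8
sum_x2, sum_y2, sum_xy = 658.76, 83663.00, 7351.12

Sxx = sum_x2 - sum_x**2 / n
Syy = sum_y2 - sum_y**2 / n
Sxy = sum_xy - sum_x * sum_y / n
r = Sxy / math.sqrt(Sxx * Syy)

# Regression of profit (y) on advertising spend (x)
b = Sxy / Sxx
a = sum_y / n - b * sum_x / n

# x is measured in £ thousand, so £7400 corresponds to x = 7.4
profit = a + b * (7400 / 1000)   # also in £ thousand
```

This gives \(r \approx 0.956\) and an estimated profit of about £81.6 thousand.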
OCR S1 2015 June Q4
9 marks Moderate -0.3
4 The table shows the load a lorry was carrying, \(x\) tonnes, and the fuel economy, \(y \mathrm {~km}\) per litre, for 8 different journeys. You should assume that neither variable is controlled.
Load (\(x\) tonnes) | 5.1 | 5.8 | 6.5 | 7.1 | 7.6 | 8.4 | 9.5 | 10.5
Fuel economy (\(y\) km per litre) | 6.2 | 6.1 | 5.9 | 5.6 | 5.3 | 5.4 | 5.3 | 5.1
$$n = 8 \quad \sum x = 60.5 \quad \sum y = 44.9 \quad \sum x ^ { 2 } = 481.13 \quad \sum y ^ { 2 } = 253.17 \quad \sum x y = 334.65$$
  1. Calculate the equation of the regression line of \(y\) on \(x\).
  2. Estimate the fuel economy for a load of 9.2 tonnes.
  3. An analyst calculated the equation of the regression line of \(x\) on \(y\). Without calculating this equation, state the coordinates of the point where the two regression lines intersect.
  4. Describe briefly the method required to estimate the load when the fuel economy is 5.8 km per litre.
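Since neither variable is controlled, estimating \(x\) from a given \(y\) calls for the regression line of \(x\) on \(y\). The sketch below, with hypothetical names, fits both lines from the summary statistics and solves for their intersection, which should coincide with the mean point \((\bar{x}, \bar{y})\).

```python
n = 8
sum_x, sum_y = 60.5, 44.9
sum_x2, sum_y2, sum_xy = 481.13, 253.17, 334.65

Sxx = sum_x2 - sum_x**2 / n
Syy = sum_y2 - sum_y**2 / n
Sxy = sum_xy - sum_x * sum_y / n
x_bar, y_bar = sum_x / n, sum_y / n

# y on x: y = a + b x (estimates y from a given x)
b = Sxy / Sxx
a = y_bar - b * x_bar
economy_9_2 = a + b * 9.2

# x on y: x = c + d y (estimates x from a given y)
d = Sxy / Syy
c = x_bar - d * y_bar
load_5_8 = c + d * 5.8

# Intersection: substitute x = c + d*y into y = a + b*x and solve for y
y_int = (a + b * c) / (1 - b * d)
x_int = c + d * y_int
```

The estimates come out at about 5.27 km per litre for a 9.2-tonne load and about 6.78 tonnes for 5.8 km per litre, and the lines meet at \((7.5625, 5.6125)\).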
OCR MEI S2 2010 January Q1
19 marks Moderate -0.3
1 A pilot records the take-off distance for his light aircraft on runways at various altitudes. The data are shown in the table below, where \(a\) metres is the altitude and \(t\) metres is the take-off distance. Also shown are summary statistics for these data.
\(a\) | 0 | 300 | 600 | 900 | 1200 | 1500 | 1800
\(t\) | 635 | 704 | 776 | 836 | 923 | 1008 | 1105
$$n = 7 \quad \Sigma a = 6300 \quad \Sigma t = 5987 \quad \Sigma a ^ { 2 } = 8190000 \quad \Sigma t ^ { 2 } = 5288931 \quad \Sigma a t = 6037800$$
  1. Draw a scatter diagram to illustrate these data.
  2. State which of the two variables \(a\) and \(t\) is the independent variable and which is the dependent variable. Briefly explain your answer.
  3. Calculate the equation of the regression line of \(t\) on \(a\).
  4. Use the equation of the regression line to calculate estimates of the take-off distance for altitudes
    (A) 800 metres,
    (B) 2500 metres.
    Comment on the reliability of each of these estimates.
  5. Calculate the value of the residual for the data point where \(a = 1200\) and \(t = 923\), and comment on its sign.
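The residual asked for in part 5 is the observed value minus the fitted value. A minimal sketch, assuming the summary statistics above, computes the line of \(t\) on \(a\), the two estimates and that residual; the function name `predict` is illustrative.

```python
n = 7
sum_a, sum_t = 6300, 5987
sum_a2, sum_at = 8190000, 6037800

S_aa = sum_a2 - sum_a**2 / n
S_at = sum_at - sum_a * sum_t / n
gradient = S_at / S_aa
intercept = sum_t / n - gradient * sum_a / n

def predict(altitude):
    """Fitted take-off distance (m) from the line of t on a."""
    return intercept + gradient * altitude

estimate_800 = predict(800)
estimate_2500 = predict(2500)          # beyond the largest altitude in the data (1800 m)
residual_1200 = 923 - predict(1200)    # residual = observed - fitted
```

The residual comes out at about \(-9.61\): the observed take-off distance at 1200 m is shorter than the line predicts.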
OCR MEI S2 2013 January Q1
19 marks Standard +0.3
1 A manufacturer of playground safety tiles is testing a new type of tile. Tiles of various thicknesses are tested to estimate the maximum height at which people would be unlikely to sustain injury if they fell onto a tile. The results of the test are as follows.
Thickness (\(t\) mm) | 20 | 40 | 60 | 80 | 100
Maximum height (\(h\) m) | 0.72 | 1.09 | 1.62 | 1.97 | 2.34
  1. Draw a scatter diagram to illustrate these data.
  2. State which of the two variables is the independent variable, giving a reason for your answer.
  3. Calculate the equation of the regression line of maximum height on thickness.
  4. Use the equation of the regression line to calculate estimates of the maximum height for thicknesses of
    (A) 70 mm,
    (B) 120 mm.
    Comment on the reliability of each of these estimates.
  5. Calculate the value of the residual for the data point at which \(t = 40\).
  6. In a further experiment, the manufacturer tests a tile with a thickness of 200 mm and finds that the corresponding maximum height is 2.96 m. What can be said about the relationship between tile thickness and maximum height?
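No summary statistics are given for this question, so the sketch below computes the regression line directly from the table using deviations from the mean. The names are hypothetical, and the final line simply compares the observed height at 200 mm with the extrapolated prediction.

```python
thickness = [20, 40, 60, 80, 100]          # t mm
height = [0.72, 1.09, 1.62, 1.97, 2.34]    # h m
n = len(thickness)

t_bar = sum(thickness) / n
h_bar = sum(height) / n
S_tt = sum((t - t_bar) ** 2 for t in thickness)
S_th = sum((t - t_bar) * (h - h_bar) for t, h in zip(thickness, height))

gradient = S_th / S_tt
intercept = h_bar - gradient * t_bar

def predict(t):
    """Fitted maximum height (m) for a tile of thickness t mm."""
    return intercept + gradient * t

residual_40 = 1.09 - predict(40)
gap_200 = 2.96 - predict(200)   # observed minus extrapolated prediction at 200 mm
```

The fitted line is \(h = 0.312 + 0.0206t\), so the prediction at 200 mm (about 4.43 m) is well above the observed 2.96 m, which suggests the linear relationship does not continue far beyond the tested thicknesses.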
OCR MEI S2 2011 June Q1
18 marks Easy -1.2
1 An experiment is performed to determine the response of maize to nitrogen fertilizer. Data for the amount of nitrogen fertilizer applied, \(x \mathrm {~kg} / \mathrm { hectare }\), and the average yield of maize, \(y\) tonnes/hectare, in 5 experimental plots are given in the table below.
\(x\) | 0 | 30 | 60 | 90 | 120
\(y\) | 0.5 | 2.5 | 4.7 | 6.2 | 7.4
  1. Draw a scatter diagram to illustrate these data.
  2. Calculate the equation of the regression line of \(y\) on \(x\).
  3. Draw your regression line on your scatter diagram and comment briefly on its fit.
  4. Calculate the value of the residual for the data point where \(x = 30\) and \(y = 2.5\).
  5. Use the equation of the regression line to calculate estimates of average yield with nitrogen fertilizer applications of
    (A) \(45 \mathrm {~kg} / \mathrm { hectare }\),
    (B) \(150 \mathrm {~kg} /\) hectare.
  6. In a plot where \(150 \mathrm {~kg} /\) hectare of nitrogen fertilizer is applied, the average yield of maize is 8.7 tonnes/hectare. Comment on this result.
OCR MEI S2 2015 June Q1
17 marks Moderate -0.5
1 A random sample of wheat seedlings is planted and their growth is measured. The table shows their average growth, \(y \mathrm {~mm}\), at half-day intervals.
Time (\(t\) days) | 0 | 0.5 | 1 | 1.5 | 2 | 2.5 | 3
Average growth (\(y\) mm) | 0 | 7 | 21 | 33 | 45 | 56 | 62
  1. Draw a scatter diagram to illustrate these data.
  2. Calculate the equation of the regression line of \(y\) on \(t\).
  3. Calculate the value of the residual for the data point at which \(t = 2\).
  4. Use the equation of the regression line to calculate an estimate of the average growth after 5 days for wheat seedlings. Comment on the reliability of this estimate.
    It is suggested that it would be better to replace the regression line by a line which passes through the origin. You are given that the equation of such a line is \(y = a t\), where \(a = \frac { \sum y t } { \sum t ^ { 2 } }\).
  5. Find the equation of this line and plot the line on your scatter diagram.
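The sketch below contrasts the ordinary least squares line with the line through the origin described in the question, using the formula \(a = \Sigma yt / \Sigma t^2\) exactly as given. It works from the table, and all names are illustrative.

```python
t = [0, 0.5, 1, 1.5, 2, 2.5, 3]
y = [0, 7, 21, 33, 45, 56, 62]
n = len(t)

t_bar, y_bar = sum(t) / n, sum(y) / n
S_tt = sum((ti - t_bar) ** 2 for ti in t)
S_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
b = S_ty / S_tt
a0 = y_bar - b * t_bar            # ordinary least squares line: y = a0 + b t

residual_2 = 45 - (a0 + b * 2)
estimate_5 = a0 + b * 5           # t = 5 lies outside the observed range 0 to 3

# Line forced through the origin, y = a t, using the formula quoted in the question
a_origin = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti ** 2 for ti in t)
```

The least squares line is \(y = 22t - 1\), giving a residual of 2 at \(t = 2\) and an estimate of 109 mm at \(t = 5\), well beyond the 3 days observed; the line through the origin has gradient \(490/22.75 \approx 21.5\).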
CAIE FP2 2009 June Q7
8 marks Standard +0.3
7 An experiment was carried out to determine how much weedkiller to apply per \(100 \mathrm {~m} ^ { 2 }\) in a large field. Ten \(100 \mathrm {~m} ^ { 2 }\) areas of the field were randomly chosen and sprayed with predetermined volumes of the weedkiller. The volume of the weedkiller is denoted by \(x\) litres and the number of weeds that survived is denoted by \(y\). The results are given in the table.
\(x\) | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50 | 0.55
\(y\) | 48 | 40 | 44 | 35 | 39 | 24 | 10 | 13 | 9 | 6
$$\left[ \Sigma x = 3.25 , \Sigma x ^ { 2 } = 1.2625 , \Sigma y = 268 , \Sigma y ^ { 2 } = 9548 , \Sigma x y = 66.10 . \right]$$
It is given that the product moment correlation coefficient for the data is \(-0.951\), correct to 3 decimal places.
  1. Calculate the equation of a suitable regression line, giving a reason for your choice of line.
  2. Estimate the best volume of weedkiller to apply, and comment on the reliability of your estimate.
CAIE FP2 2014 June Q10
11 marks Standard +0.3
10 Samples of rock from a number of geological sites were analysed for the quantities of two types, \(X\) and \(Y\), of rare minerals. The results, in milligrams, for 10 randomly chosen samples, each of 10 kg, are summarised as follows.
$$\Sigma x = 866 \quad \Sigma x ^ { 2 } = 121276 \quad \Sigma y = 639 \quad \Sigma y ^ { 2 } = 55991 \quad \Sigma x y = 73527$$
Find the product moment correlation coefficient.
Stating your hypotheses, test at the \(5 \%\) significance level whether there is non-zero correlation between quantities of the two rare minerals.
Find the equation of the regression line of \(x\) on \(y\) in the form \(x = p y + q\), where \(p\) and \(q\) are constants to be determined.
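An illustrative sketch of the three calculations. The critical value 0.6319 (two-tailed 5%, \(n = 10\)) is taken from standard tables of the product moment correlation coefficient and is an assumption of this sketch, not something stated in the question.

```python
import math

n = 10
sum_x, sum_x2 = 866, 121276
sum_y, sum_y2 = 639, 55991
sum_xy = 73527

Sxx = sum_x2 - sum_x**2 / n
Syy = sum_y2 - sum_y**2 / n
Sxy = sum_xy - sum_x * sum_y / n
r = Sxy / math.sqrt(Sxx * Syy)

# H0: rho = 0, H1: rho != 0. Two-tailed 5% critical value for n = 10,
# assumed from standard tables
r_crit = 0.6319
reject_h0 = abs(r) > r_crit

# Regression of x on y: x = p y + q
p = Sxy / Syy
q = sum_x / n - p * sum_y / n
```

On these figures \(r \approx 0.687\), which exceeds 0.6319, and the line of \(x\) on \(y\) is roughly \(x = 1.20y + 9.92\).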
CAIE FP2 2015 June Q10
13 marks Standard +0.3
10 Young children at a primary school are learning to throw a ball as far as they can. The distance thrown at the beginning of the school year and the distance thrown at the end of the same school year are recorded for each child. The distance thrown, in metres, at the beginning of the year is denoted by \(x\); the distance thrown, in metres, at the end of the year is denoted by \(y\). For a random sample of 10 children, the results are shown in the following table.
Child | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) | \(J\)
\(x\) | 5.2 | 4.1 | 3.7 | 5.4 | 7.6 | 6.1 | 3.2 | 4.0 | 3.5 | 8.0
\(y\) | 6.2 | 4.8 | 5.0 | 5.6 | 7.7 | 7.0 | 4.0 | 4.5 | 3.6 | 8.5
$$\left[ \Sigma x = 50.8 , \quad \Sigma x ^ { 2 } = 284.16 , \quad \Sigma y = 56.9 , \quad \Sigma y ^ { 2 } = 347.59 , \quad \Sigma x y = 313.28 . \right]$$
A particular child threw the ball a distance of 7.0 metres at the beginning of the year, but he could not throw at the end of the year because he had broken his arm. By finding the equation of an appropriate regression line, estimate the distance this child would have thrown at the end of the year.
The teacher suspects that, on average, the distance thrown by a child increases between the two throws by more than 0.4 metres. Stating suitable hypotheses and assuming a normal distribution, test the teacher's suspicion at the \(5 \%\) significance level.
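A sketch of both parts, assuming the differences \(d = y - x\) are normally distributed as the question allows. The one-tailed 5% critical value \(t_9 = 1.833\) comes from standard \(t\) tables and is hard-coded as an assumption; names are illustrative.

```python
import math

x = [5.2, 4.1, 3.7, 5.4, 7.6, 6.1, 3.2, 4.0, 3.5, 8.0]
y = [6.2, 4.8, 5.0, 5.6, 7.7, 7.0, 4.0, 4.5, 3.6, 8.5]
n = len(x)

# Regression of y on x (estimating the end-of-year distance from the start-of-year one)
x_bar, y_bar = sum(x) / n, sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = Sxy / Sxx
a = y_bar - b * x_bar
estimate_7 = a + b * 7.0

# Paired t-test on d = y - x: H0: mu_d = 0.4, H1: mu_d > 0.4
d = [yi - xi for xi, yi in zip(x, y)]
d_bar = sum(d) / n
s2 = sum((di - d_bar) ** 2 for di in d) / (n - 1)   # unbiased variance estimate
t_stat = (d_bar - 0.4) / math.sqrt(s2 / n)

t_crit = 1.833          # one-tailed 5%, 9 degrees of freedom (assumed from tables)
reject_h0 = t_stat > t_crit
```

It gives an estimate of about 7.47 m and a test statistic of about 1.64, below 1.833, so on these figures the teacher's suspicion is not supported at the 5% level.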
CAIE FP2 2016 June Q10
11 marks Standard +0.3
10 For a random sample of 6 observations of pairs of values \(( x , y )\), where \(0 < x < 21\) and \(0 < y < 14\), the following results are obtained.
$$\Sigma x ^ { 2 } = 844.20 \quad \Sigma y ^ { 2 } = 481.50 \quad \Sigma x y = 625.59$$
It is also found that the variance of the \(x\)-values is 36.66 and the variance of the \(y\)-values is 9.69.
  1. Find the product moment correlation coefficient for the sample.
  2. Find the equations of the regression lines of \(y\) on \(x\) and \(x\) on \(y\).
  3. Use the appropriate regression line to estimate the value of \(x\) when \(y = 6.4\) and comment on the reliability of your estimate.
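Here the means are not given directly. Assuming the quoted variances use divisor \(n\) (so that \(\text{Var}(x) = \Sigma x^2/n - \bar{x}^2\)), the sketch below recovers \(\bar{x}\) and \(\bar{y}\) and then the covariance; names are illustrative.

```python
import math

n = 6
sum_x2, sum_y2, sum_xy = 844.20, 481.50, 625.59
var_x, var_y = 36.66, 9.69          # assumed to use divisor n

# var = Σx²/n - mean², and the means are positive because 0 < x < 21 and 0 < y < 14
x_bar = math.sqrt(sum_x2 / n - var_x)
y_bar = math.sqrt(sum_y2 / n - var_y)
cov = sum_xy / n - x_bar * y_bar

r = cov / math.sqrt(var_x * var_y)

# y on x and x on y, both passing through (x_bar, y_bar)
b_yx = cov / var_x
b_xy = cov / var_y
x_at_6_4 = x_bar + b_xy * (6.4 - y_bar)   # estimating x, so use the x-on-y line
```

This gives \(\bar{x} = 10.2\), \(\bar{y} = 8.4\), \(r \approx 0.986\) and \(x \approx 6.36\) when \(y = 6.4\).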
CAIE FP2 2019 June Q10
12 marks Moderate -0.3
10 The means and variances for a random sample of 8 pairs of values of \(x\) and \(y\) taken from a bivariate distribution are given in the following table.
Variable | Mean | Variance
\(x\) | 3.3125 | 3.3086
\(y\) | 6.7375 | 7.9473
The product moment correlation coefficient for the sample is 0.5815, correct to 4 decimal places.
  1. Find the equation of the regression line of \(y\) on \(x\).
  2. Test at the \(5 \%\) significance level whether there is evidence of positive correlation between \(x\) and \(y\). [4]
  3. Calculate an estimate of \(y\) when \(x = 6.0\) and comment on the reliability of your estimate.
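When only means, variances and \(r\) are known, the gradient of \(y\) on \(x\) is \(b = r\, s_y / s_x\), because the common divisor cancels. A minimal sketch of that route, with the one-tailed 5% critical value for \(n = 8\) (0.6215) taken from standard tables as an assumption:

```python
import math

n = 8
x_bar, var_x = 3.3125, 3.3086
y_bar, var_y = 6.7375, 7.9473
r = 0.5815

# Gradient of y on x: b = S_xy / S_xx = r * s_y / s_x
b = r * math.sqrt(var_y / var_x)
a = y_bar - b * x_bar
estimate_6 = a + b * 6.0

# H0: rho = 0, H1: rho > 0; one-tailed 5% critical value for n = 8 (assumed from tables)
r_crit = 0.6215
positive_evidence = r > r_crit
```

This gives roughly \(y = 3.75 + 0.901x\), an estimate of about 9.16 at \(x = 6.0\), and no significant evidence of positive correlation since 0.5815 < 0.6215.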
CAIE FP2 2017 Specimen Q9
11 marks Standard +0.8
9 A random sample of 8 students is chosen from those sitting examinations in both Mathematics and French. Their marks in Mathematics, \(x\), and in French, \(y\), are summarised as follows.
$$\Sigma x = 472 \quad \Sigma x ^ { 2 } = 29950 \quad \Sigma y = 400 \quad \Sigma y ^ { 2 } = 21226 \quad \Sigma x y = 24879$$
Another student scored 72 marks in the Mathematics examination but was unable to sit the French examination.
  1. Estimate the mark that this student would have obtained in the French examination.
  2. Test, at the \(5 \%\) significance level, whether there is non-zero correlation between marks in Mathematics and marks in French.
OCR MEI S2 Q3
18 marks Standard +0.3
3 In a triathlon, competitors have to swim 600 metres, cycle 40 kilometres and run 10 kilometres. To improve her strength, a triathlete undertakes a training programme in which she carries weights in a rucksack whilst running. She runs a specific course and notes the total time taken for each run. Her coach is investigating the relationship between time taken and weight carried. The times taken with eight different weights are illustrated on the scatter diagram below, together with the summary statistics for these data. The variables \(x\) and \(y\) represent weight carried in kilograms and time taken in minutes respectively.
\includegraphics[max width=\textwidth, alt={}, center]{d138173d-c70c-46db-b9b9-d5f19334c5f1-04_627_1536_630_281}
Summary statistics: \(n = 8 , \Sigma x = 36 , \Sigma y = 214.8 , \Sigma x ^ { 2 } = 204 , \Sigma y ^ { 2 } = 5775.28 , \Sigma x y = 983.6\).
  1. Calculate the equation of the regression line of \(y\) on \(x\).
    On one of the eight runs, the triathlete was carrying 4 kilograms and took 27.5 minutes. On this run she was delayed when she tripped and fell over.
  2. Calculate the value of the residual for this weight.
  3. The coach decides to recalculate the equation of the regression line without the data for this run. Would it be preferable to use this recalculated equation or the equation found in part (i) to estimate the delay when the triathlete tripped and fell over? Explain your answer.
    The triathlete's coach claims that there is positive correlation between cycling and swimming times in triathlons. The product moment correlation coefficient of the times of twenty randomly selected competitors in these two sections is 0.209.
  4. Carry out a hypothesis test at the \(5 \%\) level to examine the coach's claim, explaining your conclusions clearly.
  5. What distributional assumption is necessary for this test to be valid? How can you use a scatter diagram to decide whether this assumption is likely to be true?
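A sketch of the regression, the residual and the test, assuming the summary statistics are exact. The one-tailed 5% critical value for \(n = 20\) (0.3783) is an assumption taken from standard tables, and the names are illustrative.

```python
n = 8
sum_x, sum_y = 36, 214.8
sum_x2, sum_xy = 204, 983.6

Sxx = sum_x2 - sum_x**2 / n
Sxy = sum_xy - sum_x * sum_y / n
b = Sxy / Sxx
a = sum_y / n - b * sum_x / n

residual_4 = 27.5 - (a + b * 4)   # observed minus fitted time for the run with the fall

# Coach's claim: H0: rho = 0, H1: rho > 0, with r = 0.209 from 20 competitors.
# One-tailed 5% critical value for n = 20, assumed from standard tables.
r, r_crit = 0.209, 0.3783
claim_supported = r > r_crit
```

The line is roughly \(y = 25.03 + 0.405x\), the residual is about 0.852 minutes, and since 0.209 < 0.3783 the sketch does not reject \(H_0\).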
Edexcel AS Paper 2 2019 June Q1
5 marks Easy -1.2
  1. A sixth form college has 84 students in Year 12 and 56 students in Year 13.
The head teacher selects a stratified sample of 40 students, stratified by year group.
  1. Describe how this sample could be taken.
    The head teacher is investigating the relationship between the amount of sleep, \(s\) hours, that each student had the night before they took an aptitude test and their performance in the test, \(p\) marks.
    For the sample of 40 students, he finds the equation of the regression line of \(p\) on \(s\) to be $$p = 26.1 + 5.60 s$$
  2. With reference to this equation, describe the effect that an extra 0.5 hours of sleep may have, on average, on a student's performance in the aptitude test.
  3. Describe one limitation of this regression model.
Edexcel Paper 3 2019 June Q3
9 marks Standard +0.3
3. Barbara is investigating the relationship between average income (GDP per capita), \(x\) US dollars, and average annual carbon dioxide ( \(\mathrm { CO } _ { 2 }\) ) emissions, \(y\) tonnes, for different countries. She takes a random sample of 24 countries and finds the product moment correlation coefficient between average annual \(\mathrm { CO } _ { 2 }\) emissions and average income to be 0.446
  1. Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether or not the product moment correlation coefficient for all countries is greater than zero.
    Barbara believes that a non-linear model would be a better fit to the data.
    She codes the data using the coding \(m = \log _ { 10 } x\) and \(c = \log _ { 10 } y\) and obtains the model \(c = - 1.82 + 0.89 m\). The product moment correlation coefficient between \(c\) and \(m\) is found to be 0.882.
  2. Explain how this value supports Barbara's belief.
  3. Show that the relationship between \(y\) and \(x\) can be written in the form \(y = a x ^ { n }\) where \(a\) and \(n\) are constants to be found.
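Reversing the logarithmic coding gives \(\log_{10} y = -1.82 + 0.89 \log_{10} x\), that is \(y = 10^{-1.82} x^{0.89}\). The sketch below checks numerically that the two forms agree and includes the part 1 comparison; the critical value 0.3438 (one-tailed 5%, \(n = 24\)) is an assumption from standard tables, and the names are illustrative.

```python
import math

# Model fitted to the coded data: c = -1.82 + 0.89 m, where m = log10(x) and c = log10(y)
intercept, slope = -1.82, 0.89

# log10(y) = -1.82 + 0.89 log10(x)  =>  y = 10**(-1.82) * x**0.89
a = 10 ** intercept
n_exp = slope

def y_power(x):
    """Power-law form y = a x^n."""
    return a * x ** n_exp

def y_via_logs(x):
    """The same model evaluated through the coded (log) form."""
    return 10 ** (intercept + slope * math.log10(x))

# Part 1: H0: rho = 0, H1: rho > 0 with r = 0.446, n = 24 (critical value assumed from tables)
r, r_crit = 0.446, 0.3438
reject_h0 = r > r_crit
```

So \(a = 10^{-1.82} \approx 0.0151\) and \(n = 0.89\).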
OCR Further Statistics 2024 June Q7
8 marks Standard +0.3
7 The coordinates of a set of 10 points are denoted by \((x_i, y_i)\) for \(i = 1, 2, \ldots, 10\). For a particular set of values of \((x_i, y_i)\) and any constants \(a\) and \(b\) it can be shown that \(\Sigma \left( y_i - a - b x_i \right)^2 = 10 (11 - a - 6b)^2 + 126 \left( b - \frac{83}{42} \right)^2 + \frac{139}{14}\).
    1. Explain why \(\sum \left( y_i - a - b x_i \right)^2\) is minimised by taking \(b = \frac{83}{42}\) and \(a = 11 - 6b\).
    2. Hence explain why the equation of the regression line of \(y\) on \(x\) for these points is given by the corresponding values of \(a\) and \(b\) (so that the equation is \(y = \frac{83}{42} x - \frac{6}{7}\)).
  1. State which of the following terms cannot apply to the variable \(X\) if the regression line of \(y\) on \(x\) can be used for estimating values of \(Y\): Dependent, Independent, Controlled, Response.
  2. Use the regression line to estimate the value of \(y\) corresponding to \(x = 8\).
  3. State what must be true of the value \(x = 8\) if the estimate in part (c) is to be reliable.
  4. Variables \(u\) and \(v\) are related to \(x\) and \(y\) by the following relationships: \(u = 2 + 4 x\), \(v = 8 - 2 y\). Show that the gradient of the regression line of \(v\) on \(u\) is very close to \(-1\).
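Comparing the given identity with the standard decomposition \(\Sigma (y_i - a - bx_i)^2 = n(\bar{y} - a - b\bar{x})^2 + S_{xx}(b - S_{xy}/S_{xx})^2 + \left(S_{yy} - S_{xy}^2/S_{xx}\right)\) suggests \(n = 10\), \((\bar{x}, \bar{y}) = (6, 11)\) and \(S_{xx} = 126\). The sketch below uses exact fractions to check the minimising values, the estimate at \(x = 8\) and the gradient of \(v\) on \(u\); the helper `sum_sq` is hypothetical.

```python
from fractions import Fraction as F

def sum_sq(a, b):
    """The identity from the question: sum of (y_i - a - b x_i)^2 for any a, b."""
    return 10 * (11 - a - 6 * b) ** 2 + 126 * (b - F(83, 42)) ** 2 + F(139, 14)

b_hat = F(83, 42)
a_hat = 11 - 6 * b_hat            # makes the first squared term zero
y_at_8 = a_hat + b_hat * 8

# From the decomposition: S_xx = 126, and the least squares gradient is S_xy / S_xx
S_xx = 126
S_xy = b_hat * S_xx

# u = 2 + 4x, v = 8 - 2y: S_uu = 16 S_xx and S_uv = (4)(-2) S_xy
grad_v_on_u = (4 * -2 * S_xy) / (16 * S_xx)
```

The gradient of \(v\) on \(u\) is \(-83/84 \approx -0.988\), and the estimate at \(x = 8\) is \(314/21 \approx 14.95\).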
OCR Further Statistics 2021 November Q1
6 marks Standard +0.3
1 At a seaside resort the number \(X\) of ice-creams sold and the temperature \(Y ^ { \circ } \mathrm { F }\) were recorded on 20 randomly chosen summer days. The data can be summarised as follows.
\(\sum x = 1506 \quad \sum x ^ { 2 } = 127542 \quad \sum y = 1431 \quad \sum y ^ { 2 } = 104451 \quad \sum x y = 111297\)
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
  2. Explain the significance for the regression line of the quantity \(\sum \left[ y _ { i } - \left( a + b x _ { i } \right) \right] ^ { 2 }\).
  3. It is decided to measure the temperature in degrees Centigrade instead of degrees Fahrenheit. If the same temperature is measured both as \(f ^ { \circ }\) Fahrenheit and \(c ^ { \circ }\) Centigrade, the relationship between \(f\) and \(c\) is \(c = \frac{5}{9}(f - 32)\). Find the equation of the new regression line.
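Because \(c = \frac{5}{9}(f - 32)\) is linear, the least squares line for the new temperature scale can be obtained by substituting the fitted line into the conversion. A sketch, assuming the summary statistics above and using illustrative names:

```python
n = 20
sum_x, sum_x2 = 1506, 127542
sum_y, sum_xy = 1431, 111297

Sxx = sum_x2 - sum_x**2 / n
Sxy = sum_xy - sum_x * sum_y / n
b = Sxy / Sxx
a = sum_y / n - b * sum_x / n          # y = a + b x, with y in degrees F

# c = 5/9 (f - 32) is linear, so the new least squares line of c on x is
# c = 5/9 (a - 32) + (5/9) b x
a_c = 5 / 9 * (a - 32)
b_c = 5 / 9 * b
```

This gives roughly \(y = 52.7 + 0.251x\) in °F and \(y = 11.5 + 0.139x\) in °C.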
OCR Further Statistics Specimen Q1
6 marks Easy -1.2
1 The table below shows the typical stopping distances \(d\) metres for a particular car travelling at \(v\) miles per hour.
\(v\) | 20 | 30 | 40 | 50 | 60 | 70
\(d\) | 13 | 24 | 36 | 52 | 72 | 94
  1. State each of the following words that describe the variable \(v\): Independent, Dependent, Controlled, Response.
  2. Calculate the equation of the regression line of \(d\) on \(v\).
  3. Use the equation found in part (ii) to estimate the typical stopping distance when this car is travelling at 45 miles per hour.
    It is given that the product moment correlation coefficient for the data is 0.990, correct to three significant figures.
  4. Explain whether your estimate found in part (iii) is reliable.
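A high value of \(r\) does not by itself show that a straight line is the right model. The sketch below (names illustrative) fits \(d\) on \(v\) from the table, recomputes \(r\), and lists the sign of each residual.

```python
import math

v = [20, 30, 40, 50, 60, 70]
d = [13, 24, 36, 52, 72, 94]
n = len(v)

v_bar, d_bar = sum(v) / n, sum(d) / n
S_vv = sum((vi - v_bar) ** 2 for vi in v)
S_dd = sum((di - d_bar) ** 2 for di in d)
S_vd = sum((vi - v_bar) * (di - d_bar) for vi, di in zip(v, d))
r = S_vd / math.sqrt(S_vv * S_dd)

b = S_vd / S_vv
a = d_bar - b * v_bar
estimate_45 = a + b * 45          # 45 mph is the mean speed, so this equals d_bar

# Residuals (observed - fitted); a systematic sign pattern suggests curvature
residuals = [di - (a + b * vi) for vi, di in zip(v, d)]
signs = "".join("+" if e > 0 else "-" for e in residuals)
```

The residual signs run + - - - - +, a curved pattern, even though \(r \approx 0.990\). Since 45 mph is the mean speed, the estimate there is just \(\bar{d} = 48.5\) m.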