Interpret regression line parameters

A question is this type if and only if it asks to interpret the meaning of the gradient, intercept, or other feature of a regression line in context.

9 questions

OCR MEI S2 2005 June Q3
3 In a triathlon, competitors have to swim 600 metres, cycle 40 kilometres and run 10 kilometres. To improve her strength, a triathlete undertakes a training programme in which she carries weights in a rucksack whilst running. She runs a specific course and notes the total time taken for each run. Her coach is investigating the relationship between time taken and weight carried. The times taken with eight different weights are illustrated on the scatter diagram below, together with the summary statistics for these data. The variables \(x\) and \(y\) represent weight carried in kilograms and time taken in minutes respectively.
\includegraphics[max width=\textwidth, alt={}, center]{be463718-caf7-4bc8-b838-143ab4681d6e-4_627_1536_630_281} Summary statistics: \(n = 8 , \Sigma x = 36 , \Sigma y = 214.8 , \Sigma x ^ { 2 } = 204 , \Sigma y ^ { 2 } = 5775.28 , \Sigma x y = 983.6\).
  1. Calculate the equation of the regression line of \(y\) on \(x\).
On one of the eight runs, the triathlete was carrying 4 kilograms and took 27.5 minutes. On this run she was delayed when she tripped and fell over.
  2. Calculate the value of the residual for this weight.
  3. The coach decides to recalculate the equation of the regression line without the data for this run. Would it be preferable to use this recalculated equation or the equation found in part (i) to estimate the delay when the triathlete tripped and fell over? Explain your answer.
The triathlete's coach claims that there is positive correlation between cycling and swimming times in triathlons. The product moment correlation coefficient of the times of twenty randomly selected competitors in these two sections is 0.209.
  4. Carry out a hypothesis test at the \(5 \%\) level to examine the coach's claim, explaining your conclusions clearly.
  5. What distributional assumption is necessary for this test to be valid? How can you use a scatter diagram to decide whether this assumption is likely to be true?
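Worked sketch (not part of the original paper): parts 1 and 2 can be checked numerically from the summary statistics alone. The variable names and the helper `predict` are illustrative choices, not notation from the question.

```python
# Summary statistics quoted in the question (n = 8 runs).
n = 8
sum_x, sum_y = 36.0, 214.8
sum_x2, sum_xy = 204.0, 983.6

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x**2 / n        # 204 - 162
s_xy = sum_xy - sum_x * sum_y / n   # 983.6 - 966.6

# Least squares line of y on x, written as y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n


def predict(x):
    """Time in minutes predicted by the fitted line for a weight of x kg."""
    return a + b * x


# Residual = observed - predicted, for the run with 4 kg and 27.5 minutes.
residual = 27.5 - predict(4)

print(f"y = {a:.2f} + {b:.3f} x")
print(f"residual at x = 4: {residual:.2f} minutes")
```

Under these figures the line is approximately \(y = 25.03 + 0.405x\) and the residual is about 0.85 minutes. The positive residual is consistent with the run having been slower than the line predicts.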
Edexcel S1 2023 January Q6
6. A research student is investigating the maximum weight, \(y\) grams, of sugar that will dissolve in 100 grams of water at various temperatures, \(x ^ { \circ } \mathrm { C }\), where \(10 \leqslant x \leqslant 80\).
The research student calculated the regression line of \(y\) on \(x\) and found it to be $$y = 151.2 + 2.72 x$$
  1. Give an interpretation of the gradient of the regression line.
  2. Use the regression line to estimate the maximum weight of sugar that will dissolve in 100 grams of water when the temperature is \(90 ^ { \circ } \mathrm { C }\).
  3. Comment on the reliability of your estimate, giving a reason for your answer.
Using the regression line of \(y\) on \(x\) and the following summary statistics $$\sum y = 3119 \quad \sum y ^ { 2 } = 851093 \quad \sum x ^ { 2 } = 24500 \quad n = 12$$
  4. show that the product moment correlation coefficient for these data is 0.988 to 3 decimal places.
The research student's supervisor plotted the original data on a scatter diagram, shown below. With reference to both the scatter diagram and the correlation coefficient,
  5. discuss the suitability of a linear regression model to describe the relationship between \(x\) and \(y\).
    \includegraphics[max width=\textwidth, alt={}]{c316fa29-dedc-4890-bd82-31eb0bb819f9-23_990_1138_205_356}
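Worked sketch (not part of the original paper): one possible route to the "show that" result uses the fact that a least squares line passes through \((\bar{x}, \bar{y})\), which recovers \(\sum x\) from the given line. The variable names are illustrative, and other routes may be equally acceptable.

```python
from math import sqrt

a, b = 151.2, 2.72   # regression line y = a + b x, as given in the question
n = 12
sum_y, sum_y2, sum_x2 = 3119.0, 851093.0, 24500.0

# Estimate at 90 degrees C, which lies outside the 10 to 80 data range.
estimate_90 = a + b * 90

# The line passes through (x_bar, y_bar), so x_bar follows from y_bar.
y_bar = sum_y / n
x_bar = (y_bar - a) / b
sum_x = n * x_bar

s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = b * s_xx      # rearranged from b = S_xy / S_xx

r = s_xy / sqrt(s_xx * s_yy)
print(f"estimate at 90 degrees C: {estimate_90:.1f} g")
print(f"r = {r:.3f}")
```

Keeping \(\bar{x}\) unrounded matters here: rounding \(\sum x\) to 480 before computing \(S_{xx}\) gives a value nearer 0.985 than 0.988.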
Edexcel AS Paper 2 2019 June Q1
1. A sixth form college has 84 students in Year 12 and 56 students in Year 13.
The head teacher selects a stratified sample of 40 students, stratified by year group.
  1. Describe how this sample could be taken.
The head teacher is investigating the relationship between the amount of sleep, \(s\) hours, that each student had the night before they took an aptitude test and their performance in the test, \(p\) marks.
For the sample of 40 students, he finds the equation of the regression line of \(p\) on \(s\) to be $$p = 26.1 + 5.60 s$$
  2. With reference to this equation, describe the effect that an extra 0.5 hours of sleep may have, on average, on a student's performance in the aptitude test.
  3. Describe one limitation of this regression model.
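Worked sketch (not part of the original paper): the effect of extra sleep depends only on the gradient, because the intercept cancels when two predictions are subtracted. The baseline sleep values below are hypothetical and chosen only for illustration.

```python
def predicted_marks(s):
    """Marks predicted by the question's line p = 26.1 + 5.60 s."""
    return 26.1 + 5.60 * s


# Hypothetical baseline amounts of sleep, in hours.
for s in (6.0, 8.0):
    change = predicted_marks(s + 0.5) - predicted_marks(s)
    print(f"s = {s}: change = {change:.2f} marks")

# The same change obtained directly as gradient times extra sleep.
change_from_gradient = 5.60 * 0.5
```

Both baselines give the same change, which is why the answer can be stated as an average increase in marks without knowing how much sleep a student started with.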
Edexcel AS Paper 2 2022 June Q1
1. The relationship between two variables \(p\) and \(t\) is modelled by the regression line with equation
$$p = 22 - 1.1 t$$
The model is based on observations of the independent variable, \(t\), between 1 and 10.
  1. Describe the correlation between \(p\) and \(t\) implied by this model.
Given that \(p\) is measured in centimetres and \(t\) is measured in days,
  2. state the units of the gradient of the regression line.
Using the model,
  3. calculate the change in \(p\) over a 3-day period.
Tisam uses this model to estimate the value of \(p\) when \(t = 19\).
  4. Comment, giving a reason, on the reliability of this estimate.
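Worked sketch (not part of the original paper): the change over 3 days and the extrapolation check can be written out as below. The function names and the particular pair of days are illustrative assumptions.

```python
T_MIN, T_MAX = 1, 10   # range of t on which the model is based


def p_model(t):
    """p in centimetres predicted for t in days, from p = 22 - 1.1 t."""
    return 22 - 1.1 * t


def is_extrapolation(t):
    """True when t lies outside the range of the observed data."""
    return not (T_MIN <= t <= T_MAX)


# Any two values of t that are 3 days apart give the same change in p.
change_3_days = p_model(5) - p_model(2)

print(f"change over 3 days: {change_3_days:.1f} cm")
print(f"p at t = 19: {p_model(19):.1f} cm, extrapolation: {is_extrapolation(19)}")
```

The gradient carries units of centimetres per day, so multiplying it by 3 days gives a change in centimetres.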
Edexcel Paper 3 2024 June Q2
2. Amar is studying the flight of a bird from its nest.
He measures the bird's height above the ground, \(h\) metres, at time \(t\) seconds for 10 values of \(t\).
Amar finds the equation of the regression line for the data to be \(h = 38.6 - 1.28 t\).
  1. Interpret the gradient of this line.
The product moment correlation coefficient between \(h\) and \(t\) is \(-0.510\).
  2. Test whether or not there is evidence of a negative correlation between the height above the ground and the time during the flight.
    You should
    • state your hypotheses clearly
    • use a \(5 \%\) level of significance
    • state the critical value used
Jane draws the following scatter diagram for Amar’s data.
\includegraphics[max width=\textwidth, alt={}, center]{ab7f7951-e6fe-4853-bb69-8016cf3e796c-06_1024_1033_1135_516}
  3. With reference to the scatter diagram, state, giving a reason, whether or not the regression line \(h = 38.6 - 1.28 t\) is an appropriate model for these data.
Jane suggests an improved model using the variable \(u = ( t - k ) ^ { 2 }\) where \(k\) is a constant.
She obtains the equation \(h = 38.1 - 0.78 u\).
  4. Choose a suitable value for \(k\) to write Jane's improved model for \(h\) in terms of \(t\) only.
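Worked sketch (not part of the original paper): the decision step of the test in part 2 compares \(r\) with a tabulated critical value. The figure \(-0.5494\) is an assumption here, taken as the one-tailed 5% critical value for a sample of size 10 in standard product moment correlation tables; it should be checked against the tables supplied with the paper. The value of \(k\) in part 4 depends on the scatter diagram and is not attempted.

```python
r = -0.510   # product moment correlation coefficient from the question
n = 10       # number of (t, h) pairs

# Assumed tabulated value: one-tailed test, 5% level, sample size 10.
critical_value = -0.5494

# H0: rho = 0, H1: rho < 0. Reject H0 only if r falls in the lower tail.
reject_h0 = r < critical_value

if reject_h0:
    print("Reject H0: evidence of negative correlation.")
else:
    print("Do not reject H0: insufficient evidence of negative correlation.")
```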
OCR MEI Further Statistics A AS 2018 June Q6
6 A researcher is investigating various bodily characteristics of frogs of various species. She collects data on length, \(x \mathrm {~mm}\), and head width, \(y \mathrm {~mm}\), of a random sample of 14 frogs of a particular species. A scatter diagram of the data is shown in Fig. 6, together with the equation of the regression line of \(y\) on \(x\) and also the value of \(r ^ { 2 }\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{e3ac0ba0-9692-4018-894e-2b04b07eaf32-6_949_1616_450_228} \captionsetup{labelformat=empty} \caption{Fig. 6}
\end{figure}
  1. (A) Use the equation of the regression line to estimate the mean head width for frogs of each of the following lengths.
    • 45 mm
    • 60 mm
    (B) Comment briefly on each of the estimates in part (i)(A).
  2. Explain how the mean length of frogs with head width 16 mm should be estimated.
  3. Calculate the value of the product moment correlation coefficient.
  4. In the light of the information in the scatter diagram, comment on the goodness of fit of the regression line.
OCR MEI Further Statistics A AS 2019 June Q6
6 A meteorologist is investigating the relationship between altitude \(x\) metres and mean annual temperature \(y ^ { \circ } \mathrm { C }\) in an American state.
She selects 12 locations at various altitudes and then stations a remote monitoring device at each of them to measure the temperature over the course of a year. Fig. 6 illustrates the data which she obtains. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{fd496303-10f1-450e-bbeb-421ab6f4de21-6_686_1477_486_292} \captionsetup{labelformat=empty} \caption{Fig. 6}
\end{figure}
  1. Explain why it would not be appropriate to carry out a hypothesis test for correlation based on the product moment correlation coefficient.
  2. Explain why altitude has been plotted on the horizontal axis in Fig. 6.
Summary statistics for \(x\) and \(y\) are as follows. $$\sum x = 21200 \quad \sum y = 105.4 \quad \sum x ^ { 2 } = 39100000 \quad \sum y ^ { 2 } = 1004 \quad \sum x y = 176090$$
  3. Calculate the equation of the regression line of \(y\) on \(x\).
  4. Use the equation of the regression line to predict the values of the mean annual temperature at each of the following altitudes.
    • 2000 metres
    • 3000 metres
  5. Comment on the reliability of your predictions in part (d).
  6. Calculate the value of the residual for the data point \((1600, 8.1)\).
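Worked sketch (not part of the original paper): the regression line, the two predictions and the residual can be checked from the summary statistics. The sample size of 12 is taken from the question stem. The range of altitudes in Fig. 6 is not reproduced here, so the reliability comment is not computed.

```python
# Summary statistics from the question (12 locations).
n = 12
sum_x, sum_y = 21200.0, 105.4
sum_x2, sum_xy = 39100000.0, 176090.0

s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares line of y on x, written as y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n


def predict(x):
    """Mean annual temperature in degrees C predicted at altitude x metres."""
    return a + b * x


# Residual = observed - predicted for the data point (1600, 8.1).
residual = 8.1 - predict(1600)

print(f"y = {a:.2f} + ({b:.5f}) x")
print(f"2000 m: {predict(2000):.2f}, 3000 m: {predict(3000):.2f}")
print(f"residual at (1600, 8.1): {residual:.2f}")
```

With these figures the gradient is about \(-0.00614\) degrees C per metre, which corresponds to a fall of roughly 6.1 degrees C for each additional 1000 metres of altitude.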
WJEC Further Unit 2 2022 June Q7
7. Data from a large dataset shows the percentage of children enrolled in secondary education and the percentage of the adult population who are literate. The following graphs show data from 30 randomly selected regions from each of the Arab World, Africa and Asia. In each case, the least squares regression line of '\% Literacy' on '\% Enrolled in Secondary Education' is shown.
\includegraphics[max width=\textwidth, alt={}, center]{77fd7ad7-f5a3-4947-afc6-e5ef45bef7a8-6_682_1200_584_395} \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{Africa} \includegraphics[alt={},max width=\textwidth]{77fd7ad7-f5a3-4947-afc6-e5ef45bef7a8-6_623_1191_1548_397}
\end{figure} \includegraphics[max width=\textwidth, alt={}, center]{77fd7ad7-f5a3-4947-afc6-e5ef45bef7a8-7_665_1200_331_434}
  1. Calculate the equation of the least squares regression line of '\% Literacy' (\(y\)) on '\% Enrolled in Secondary Education' (\(x\)) for Asia, given the following summary statistics. $$\begin{array} { l l l } \sum x = 2850.836 & \sum y = 2738.656 & S _ { x x } = 88.42142 \\ S _ { y y } = 204.733 & S _ { x y } = 96.60984 & n = 30 \end{array}$$
  2. The Arab World, Africa and Asia each contain a region where \(70 \%\) are enrolled in secondary education. The three regression lines are used to estimate the corresponding \% Literacy. Which of these estimates is likely to be the most reliable? Clearly explain your reasoning.
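Worked sketch (not part of the original paper): because \(S_{xx}\) and \(S_{xy}\) are given directly, the line for Asia needs only the two means in addition. Variable names are illustrative. Part 2 depends on the graphs, which are not reproduced here, so it is not attempted.

```python
# Summary statistics for Asia, as given in the question.
n = 30
sum_x, sum_y = 2850.836, 2738.656
s_xx, s_xy = 88.42142, 96.60984

x_bar = sum_x / n
y_bar = sum_y / n

# Least squares line of y on x, written as y = a + b x.
b = s_xy / s_xx
a = y_bar - b * x_bar

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.2f}")
print(f"y = {a:.2f} + {b:.3f} x")
```

On these figures the gradient is close to 1.09, so each additional percentage point enrolled in secondary education is associated with roughly 1.09 additional percentage points of literacy within this sample.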
Edexcel AS Paper 2 2018 June Q1
1. A company is introducing a job evaluation scheme. Points (\(x\)) will be awarded to each job based on the qualifications and skills needed and the level of responsibility. Pay (\(\pounds y\)) will then be allocated to each job according to the number of points awarded.
Before the scheme is introduced, a random sample of 8 employees was taken and the linear regression equation of pay on points was \(y = 4.5 x - 47\).
  1. Describe the correlation between points and pay.
  2. Give an interpretation of the gradient of this regression line.
  3. Explain why this model might not be appropriate for all jobs in the company.
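Worked sketch (not part of the original paper): one reason the model may not suit every job is that the negative intercept makes the predicted pay negative for low points scores. The value of 10 points below is a hypothetical example, not data from the question.

```python
def pay(points):
    """Pay in pounds predicted by the question's line y = 4.5 x - 47."""
    return 4.5 * points - 47


# Points score at which the predicted pay is exactly zero.
break_even_points = 47 / 4.5

# Hypothetical low-scoring job, used only to illustrate the problem.
low_points = 10

print(f"pay({low_points}) = {pay(low_points):.2f}")
print(f"predicted pay is negative below {break_even_points:.1f} points")
```

Other valid answers exist, such as the small sample of 8 employees or jobs whose points fall outside the range sampled.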