5.09e Use regression: for estimation in context

129 questions

Edexcel S1 2017 October Q5
13 marks Moderate -0.8
  1. A company wants to pay its employees according to their performance at work. Last year's performance score \(x\) and annual salary \(y\), in thousands of dollars, were recorded for a random sample of 10 employees of the company.
The performance scores were $$\begin{array} { l l l l l l l l l l } 15 & 24 & 32 & 39 & 41 & 18 & 16 & 22 & 34 & 42 \end{array}$$ (You may use \(\sum x ^ { 2 } = 9011\) )
  1. Find the mean and the variance of these performance scores.
The corresponding \(y\) values for these 10 employees are summarised by $$\sum y = 306.1 \quad \text { and } \quad \mathrm { S } _ { y y } = 546.3$$
  2. Find the mean and the variance of these \(y\) values.
The regression line of \(y\) on \(x\) based on this sample is $$y = 12.0 + 0.659 x$$
  3. Find the product moment correlation coefficient for these data.
  4. State, giving a reason, whether or not the value of the product moment correlation coefficient supports the use of a regression line to model the relationship between performance score and annual salary.
The company decides to use this regression model to determine future salaries.
  5. Find the proposed annual salary, in dollars, for an employee who has a performance score of 35.
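As a rough check on the correlation and prediction parts of this question, the correlation coefficient can be recovered from the gradient using \(r = b \sqrt{S_{xx} / S_{yy}}\), which follows from \(b = S_{xy} / S_{xx}\). The sketch below assumes the rounded gradient 0.659 quoted in the question and a variance with divisor \(n\), so the results are only good to about 3 significant figures. All variable names are illustrative.

```python
import math

# Performance scores from the question
x = [15, 24, 32, 39, 41, 18, 16, 22, 34, 42]
n = len(x)
sum_x = sum(x)                    # 283
sum_x2 = sum(v * v for v in x)    # 9011, matching the value given in the question

mean_x = sum_x / n
var_x = sum_x2 / n - mean_x ** 2  # variance with divisor n (an assumption here)

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = 546.3                      # given in the question
b = 0.659                         # gradient of the given regression line (rounded)
a = 12.0                          # intercept of the given regression line

# Since b = S_xy / S_xx and r = S_xy / sqrt(S_xx * S_yy):
r = b * math.sqrt(s_xx / s_yy)

# Predicted salary for a score of 35; y is measured in thousands of dollars
salary_dollars = (a + b * 35) * 1000

print(f"mean = {mean_x:.1f}, variance = {var_x:.2f}")
print(f"r = {r:.3f}")
print(f"salary = ${salary_dollars:,.0f}")
```

A score of 35 lies inside the observed range of scores (15 to 42), so this prediction is an interpolation.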
Edexcel S1 2003 June Q7
16 marks Moderate -0.8
  1. Eight students took tests in mathematics and physics. The marks for each student are given in the table below where \(m\) represents the mathematics mark and \(p\) the physics mark.
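$$\begin{array} { l | c c c c c c c c }
\text { Student } & A & B & C & D & E & F & G & H \\ \hline
\text { Mark } m & 9 & 14 & 13 & 10 & 7 & 8 & 20 & 17 \\
\text { Mark } p & 11 & 23 & 21 & 15 & 19 & 10 & 31 & 26
\end{array}$$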
A science teacher believes that students' marks in physics depend upon their mathematical ability. The teacher decides to investigate this relationship using the test marks.
  1. Write down which is the explanatory variable in this investigation.
  2. Draw a scatter diagram to illustrate these data.
  3. Showing your working, find the equation of the regression line of \(p\) on \(m\).
  4. Draw the regression line on your scatter diagram.
A ninth student was absent for the physics test, but she sat the mathematics test and scored 15.
  5. Using this model, estimate the mark she would have scored in the physics test.
AQA S1 2005 January Q3
12 marks Moderate -0.8
3 [Figure 1, printed on the insert, is provided for use in this question.]
A parcel delivery company has a depot on the outskirts of a town. Each weekday, a van leaves the depot to deliver parcels across a nearby area. The table below shows, for a random sample of 10 weekdays, the number, \(x\), of parcels to be delivered and the total time, \(y\) minutes, that the van is out of the depot.
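$$\begin{array} { l | c c c c c c c c c c }
\boldsymbol { x } & 9 & 16 & 22 & 11 & 19 & 26 & 14 & 10 & 11 & 17 \\ \hline
\boldsymbol { y } & 79 & 127 & 172 & 109 & 152 & 214 & 131 & 80 & 94 & 148
\end{array}$$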
  1. On Figure 1, plot a scatter diagram of these data.
  2. Calculate the equation of the least squares regression line of \(y\) on \(x\) and draw your line on Figure 1.
  3. Use your regression equation to estimate the total time that the van is out of the depot when delivering:
    1. 15 parcels;
    2. 35 parcels.
    Comment on the likely reliability of each of your estimates.
  4. The time that the van is out of the depot delivering parcels may be thought of as the time needed to travel to and from the area plus an amount of time proportional to the number of parcels to be delivered. Given that the regression line of \(y\) on \(x\) is of the form \(y = a + b x\), give an interpretation, in context, for each of your values of \(a\) and \(b\).
    (2 marks)
AQA S1 2007 January Q7
15 marks Moderate -0.8
7 [Figure 1, printed on the insert, is provided for use in this question.]
Stan is a retired academic who supplements his pension by mowing lawns for customers who live nearby. As part of a review of his charges for this work, he measures the areas, \(x \mathrm {~m} ^ { 2 }\), of a random sample of eight of his customers' lawns and notes the times, \(y\) minutes, that it takes him to mow these lawns. His results are shown in the table.
$$\begin{array} { l | c c c c c c c c }
\text { Customer } & \mathbf { A } & \mathbf { B } & \mathbf { C } & \mathbf { D } & \mathbf { E } & \mathbf { F } & \mathbf { G } & \mathbf { H } \\ \hline
\boldsymbol { x } & 360 & 140 & 860 & 600 & 1180 & 540 & 260 & 480 \\
\boldsymbol { y } & 50 & 25 & 135 & 70 & 140 & 90 & 55 & 70
\end{array}$$
  1. On Figure 1, plot a scatter diagram of these data.
  2. Calculate the equation of the least squares regression line of \(y\) on \(x\). Draw your line on Figure 1.
  3. Calculate the value of the residual for Customer H and indicate how your value is confirmed by your scatter diagram.
  4. Given that Stan charges \(\pounds 12\) per hour, estimate the charge for mowing a customer's lawn that has an area of \(560 \mathrm {~m} ^ { 2 }\).
AQA S1 2010 January Q3
8 marks Moderate -0.3
3 The table shows, for each of a random sample of 7 weeks, the number of customers, \(x\), who purchased fuel from a filling station, together with the total volume, \(y\) litres, of fuel purchased by these customers.
$$\begin{array} { l | c c c c c c c }
\boldsymbol { x } & 230 & 184 & 165 & 147 & 241 & 174 & 210 \\ \hline
\boldsymbol { y } & 4551 & 3410 & 3252 & 3756 & 3787 & 4024 & 4254
\end{array}$$
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  2. Estimate the volume of fuel sold during a week in which 200 customers purchase fuel.
  3. Comment on the likely reliability of your estimate in part (b), given that, for the regression line calculated in part (a), the values of the 7 residuals lie between approximately \(-415\) litres and \(+430\) litres.
AQA S1 2005 June Q4
12 marks Moderate -0.8
4 The time taken for a fax machine to scan an A4 sheet of paper is dependent, in part, on the number of lines of print on the sheet. The table below shows, for each of a random sample of 8 sheets of A4 paper, the number, \(x\), of lines of print and the scanning time, \(y\) seconds, taken by the fax machine.
$$\begin{array} { l | c c c c c c c c }
\text { Sheet } & \mathbf { 1 } & \mathbf { 2 } & \mathbf { 3 } & \mathbf { 4 } & \mathbf { 5 } & \mathbf { 6 } & \mathbf { 7 } & \mathbf { 8 } \\ \hline
\boldsymbol { x } & 10 & 16 & 23 & 27 & 31 & 35 & 38 & 44 \\
\boldsymbol { y } & 2.4 & 3.5 & 3.2 & 4.1 & 4.1 & 5.6 & 4.6 & 5.3
\end{array}$$
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  2. The following table lists some of the residuals for the regression line.
    $$\begin{array} { l | c c c c c c c c }
    \text { Sheet } & \mathbf { 1 } & \mathbf { 2 } & \mathbf { 3 } & \mathbf { 4 } & \mathbf { 5 } & \mathbf { 6 } & \mathbf { 7 } & \mathbf { 8 } \\ \hline
    \text { Residual } & -0.174 & 0.418 & & 0.085 & -0.254 & 0.906 & & -0.157
    \end{array}$$
    1. Calculate the values of the residuals for sheets 3 and 7.
    2. Hence explain what can be deduced about the regression line.
  3. The time, \(z\) seconds, to transmit an A4 page after scanning is given by: $$z = 0.80 + 0.05 x$$ Estimate the total time to scan and transmit an A4 page containing:
    1. 15 lines of print;
    2. 75 lines of print.
    In each case comment on the likely reliability of your estimate.
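The fax machine question combines a least squares fit, residuals and an estimate that mixes two linear models. A minimal sketch of those calculations is given below, using the data in the table above. The helper name `total_time` is hypothetical, and the fit uses the standard formulas \(b = S_{xy} / S_{xx}\) and \(a = \bar{y} - b\bar{x}\).

```python
# Data from the table: lines of print (x) and scanning time in seconds (y)
x = [10, 16, 23, 27, 31, 35, 38, 44]
y = [2.4, 3.5, 3.2, 4.1, 4.1, 5.6, 4.6, 5.3]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = s_xy / s_xx           # gradient
a = mean_y - b * mean_x   # intercept

# Residual = observed y minus fitted y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]


def total_time(lines):
    """Scan time from the fitted line plus transmit time z = 0.80 + 0.05x."""
    return (a + b * lines) + (0.80 + 0.05 * lines)


print(f"y = {a:.3f} + {b:.4f}x")
for sheet, res in enumerate(residuals, start=1):
    print(f"sheet {sheet}: residual {res:+.3f}")
for lines in (15, 75):
    inside = min(x) <= lines <= max(x)
    print(f"{lines} lines: {total_time(lines):.2f} s, within data range: {inside}")
```

The residuals of a least squares line always sum to zero (up to rounding), which gives a quick check on the two missing values. The estimate for 75 lines lies well outside the observed range of 10 to 44 lines, so it is an extrapolation.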
AQA S1 2006 June Q3
11 marks Moderate -0.8
3 A new car tyre is fitted to a wheel. The tyre is inflated to its recommended pressure of 265 kPa and the wheel left unused. At 3-month intervals thereafter, the tyre pressure is measured with the following results:
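$$\begin{array} { l | c c c c c c c c c }
\text { Time after fitting } ( x \text { months} ) & 0 & 3 & 6 & 9 & 12 & 15 & 18 & 21 & 24 \\ \hline
\text { Tyre pressure } ( y \text { kPa} ) & 265 & 250 & 240 & 235 & 225 & 215 & 210 & 195 & 180
\end{array}$$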
    1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
    2. Interpret in context the value for the gradient of your line.
    3. Comment on the value for the intercept with the \(y\)-axis of your line.
  1. The tyre manufacturer states that, when one of these new tyres is fitted to the wheel of a car and then inflated to 265 kPa, a suitable regression equation is of the form $$y = 265 + b x$$ The manufacturer also states that, as the car is used, the tyre pressure will decrease at twice the rate of that found in part (a).
    1. Suggest a suitable value for \(b\).
    2. One of these new tyres is fitted to the wheel of a car and inflated to 265 kPa. The car is then used for 8 months, after which the tyre pressure is checked for the first time. Show that, accepting the manufacturer's statements, the tyre pressure can be expected to have fallen below its minimum safety value of 220 kPa.
      (2 marks)
AQA S1 2015 June Q5
11 marks Moderate -0.8
5 The table shows the number of customers, \(x\), and the takings, \(\pounds y\), recorded to the nearest \(\pounds 10\), at a local butcher's shop on each of 10 randomly selected weekdays.
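$$\begin{array} { l | c c c c c c c c c c }
\boldsymbol { x } & 86 & 60 & 65 & 46 & 71 & 93 & 56 & 81 & 75 & 57 \\ \hline
\boldsymbol { y } & 940 & 790 & 620 & 530 & 770 & 1050 & 690 & 780 & 860 & 550
\end{array}$$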
  1. The first 6 pairs of data values in this table are plotted on the scatter diagram shown on the opposite page. Plot the final 4 pairs of data values on the scatter diagram.
    1. Calculate the equation of the least squares regression line in the form \(y = a + b x\) and draw your line on the scatter diagram.
    2. Interpret your value for \(b\) in the context of the question.
    3. State why your value for \(a\) has no practical interpretation.
  2. Estimate, to the nearest \(\pounds 10\), the shop's takings when the number of customers is 50.
    [1 mark]
    \includegraphics[max width=\textwidth, alt={}]{4c679380-894f-4d36-aec8-296b662058e2-14_1255_1705_1448_155}
AQA S1 2015 June Q4
15 marks Moderate -0.3
4 Stephan is a roofing contractor who is often required to replace loose ridge tiles on house roofs. In order to help him to quote more accurately the prices for such jobs in the future, he records, for each of 11 recently repaired roofs, the number of ridge tiles replaced, \(x _ { i }\), and the time taken, \(y _ { i }\) hours. His results are shown in the table.
$$\begin{array} { l | c c c c c c c c c c c }
\text { Roof } ( \boldsymbol { i } ) & \mathbf { 1 } & \mathbf { 2 } & \mathbf { 3 } & \mathbf { 4 } & \mathbf { 5 } & \mathbf { 6 } & \mathbf { 7 } & \mathbf { 8 } & \mathbf { 9 } & \mathbf { 10 } & \mathbf { 11 } \\ \hline
\boldsymbol { x } _ { \boldsymbol { i } } & 8 & 11 & 14 & 14 & 16 & 20 & 22 & 22 & 25 & 27 & 30 \\
\boldsymbol { y } _ { \boldsymbol { i } } & 5.0 & 5.2 & 6.3 & 7.2 & 8.0 & 8.8 & 10.6 & 11.0 & 11.8 & 12.1 & 13.0
\end{array}$$
  1. The pairs of data values for roofs 1 to 7 are plotted on the scatter diagram shown on the opposite page. Plot the 4 pairs of data values for roofs 8 to 11 on the scatter diagram.
    1. Calculate the equation of the least squares regression line of \(y _ { i }\) on \(x _ { i }\), and draw your line on the scatter diagram.
    2. Interpret your values for the gradient and for the intercept of this regression line.
  2. Estimate the time that it would take Stephan to replace 15 loose ridge tiles on a house roof.
  3. Given that \(r _ { i }\) denotes the residual for the point representing roof \(i\):
    1. calculate the value of \(r _ { 6 }\);
    2. state why the value of \(\sum _ { i = 1 } ^ { 11 } r _ { i }\) gives no useful information about the connection between the number of ridge tiles replaced and the time taken.
      [1 mark]
      \includegraphics[max width=\textwidth, alt={}]{6fbb8891-e6de-42fe-a195-ea643552fdcf-11_2385_1714_322_155}
OCR S1 Q4
8 marks Moderate -0.3
4 The table shows the latitude, \(x\) (in degrees correct to 3 significant figures), and the average rainfall \(y\) (in cm correct to 3 significant figures) of five European cities.
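$$\begin{array} { l c c }
\text { City } & x & y \\ \hline
\text { Berlin } & 52.5 & 58.2 \\
\text { Bucharest } & 44.4 & 58.7 \\
\text { Moscow } & 55.8 & 53.3 \\
\text { St Petersburg } & 60.0 & 47.8 \\
\text { Warsaw } & 52.3 & 56.6
\end{array}$$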
$$\left[ n = 5 , \Sigma x = 265.0 , \Sigma y = 274.6 , \Sigma x ^ { 2 } = 14176.54 , \Sigma y ^ { 2 } = 15162.22 , \Sigma x y = 14464.10 . \right]$$
  1. Calculate the product moment correlation coefficient.
  2. The values of \(y\) in the table were in fact obtained from measurements in inches and converted into centimetres by multiplying by 2.54. State what effect it would have had on the value of the product moment correlation coefficient if it had been calculated using inches instead of centimetres.
  3. It is required to estimate the annual rainfall at Bergen, where \(x = 60.4\). Calculate the equation of an appropriate line of regression, giving your answer in simplified form, and use it to find the required estimate.
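When only summary totals are supplied, as in the rainfall question above, the quantities \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\) can be formed directly from them. The sketch below assumes the regression line of \(y\) on \(x\) is the appropriate one, since rainfall is being estimated from latitude. The function name `summary_to_s` is hypothetical.

```python
import math


def summary_to_s(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (S_xx, S_yy, S_xy) computed from raw summary totals."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


n, sum_x, sum_y = 5, 265.0, 274.6
s_xx, s_yy, s_xy = summary_to_s(n, sum_x, sum_y, 14176.54, 15162.22, 14464.10)

r = s_xy / math.sqrt(s_xx * s_yy)

# Rescaling y from centimetres to inches divides S_xy by 2.54 and S_yy by 2.54^2,
# so the scale factor cancels in r.
r_inches = (s_xy / 2.54) / math.sqrt(s_xx * s_yy / 2.54 ** 2)

# y on x is assumed appropriate because rainfall (y) is estimated from latitude (x)
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n
bergen = a + b * 60.4

print(f"r = {r:.3f} (cm), {r_inches:.3f} (inches)")
print(f"y = {a:.1f} + ({b:.3f})x")
print(f"estimate at x = 60.4: {bergen:.1f} cm")
```

The value \(x = 60.4\) lies just beyond the largest latitude in the sample (60.0), so the estimate is a slight extrapolation from only five cities.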
OCR MEI Further Statistics Major Specimen Q3
11 marks Standard +0.3
3 A researcher is investigating factors that might affect how many hours per day different species of mammals spend asleep. First she investigates human beings. She collects data on body mass index, \(x\), and hours of sleep, \(y\), for a random sample of people. A scatter diagram of the data is shown in Fig. 3.1 together with the regression line of \(y\) on \(x\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-04_885_1584_598_274} \captionsetup{labelformat=empty} \caption{Fig. 3.1}
\end{figure}
  1. Calculate the residual for the data point which has the residual with the greatest magnitude.
  2. Use the equation of the regression line to estimate the mean number of hours spent asleep by a person with body mass index
    (A) 26,
    (B) 16,
    commenting briefly on each of your predictions.
The researcher then collects additional data for a large number of species of mammals and analyses different factors for effect size. Definitions of the variables measured for a typical animal of the species, the correlations between these variables, and guidelines often used when considering effect size are given in Fig. 3.2.
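    $$\begin{array} { l l }
    \text { Variable } & \text { Definition } \\ \hline
    \text { Body mass } & \text { Mass of animal in kg } \\
    \text { Brain mass } & \text { Mass of brain in g } \\
    \text { Hours of sleep/day } & \text { Number of hours per day spent asleep } \\
    \text { Life span } & \text { How many years the animal lives } \\
    \text { Danger } & \text { A measure of how dangerous the animal's situation is when asleep, taking into account predators and how protected the animal's den is: higher value indicates greater danger. }
    \end{array}$$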
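    $$\begin{array} { l | c c c c c }
    \text { Correlations (pmcc) } & \text { Body mass } & \text { Brain mass } & \text { Hours of sleep/day } & \text { Life span } & \text { Danger } \\ \hline
    \text { Body mass } & 1.00 & & & & \\
    \text { Brain mass } & 0.93 & 1.00 & & & \\
    \text { Hours of sleep/day } & -0.31 & -0.36 & 1.00 & & \\
    \text { Life span } & 0.30 & 0.51 & -0.41 & 1.00 & \\
    \text { Danger } & 0.13 & 0.15 & -0.59 & 0.06 & 1.00
    \end{array}$$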
    \begin{table}[h]
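    $$\begin{array} { c l }
    \text { Product moment correlation coefficient } & \text { Effect size } \\ \hline
    0.1 & \text { Small } \\
    0.3 & \text { Medium } \\
    0.5 & \text { Large }
    \end{array}$$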
    \captionsetup{labelformat=empty} \caption{Fig. 3.2}
    \end{table}
  3. State two conclusions the researcher might draw from these tables, relevant to her investigation into how many hours mammals spend asleep.
One of the researcher's students notices the high correlation between body mass and brain mass and produces a scatter diagram for these two variables, shown in Fig. 3.3 below. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-05_675_698_1802_735} \captionsetup{labelformat=empty} \caption{Fig. 3.3}
    \end{figure}
  4. Comment on the suitability of a linear model for these two variables.
Edexcel S1 2024 October Q2
Moderate -0.8
  1. A biologist records the length, \(y \mathrm {~cm}\), and the weight, \(w \mathrm {~kg}\), of 50 rabbits. The following summary statistics are calculated from these data.
$$\sum y = 2015 \quad \sum y ^ { 2 } = 81938.5 \quad \sum w = 125 \quad \mathrm {~S} _ { w w } = 72.25 \quad \mathrm {~S} _ { y w } = 219.55$$
    1. Show that \(\mathrm { S } _ { y y } = 734\)
    2. Calculate the product moment correlation coefficient for these data. Give your answer to 3 decimal places.
  1. Interpret your value of the product moment correlation coefficient.
The biologist believes that a linear regression model may be appropriate to describe these data.
  2. State, with a reason, whether or not your value of the product moment correlation coefficient is consistent with the biologist’s belief.
  3. Find the equation of the regression line of \(w\) on \(y\), giving your answer in the form \(w = a + b y\).
Jeff has a pet rabbit of length 45 cm.
  4. Use your regression equation to estimate the weight of Jeff's rabbit.
Pre-U Pre-U 9794/3 2017 June Q2
9 marks Moderate -0.8
2 The table shows the turnover, in millions of pounds, of a small company at 3-year intervals over a period of 15 years, starting in 2000.
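$$\begin{array} { l | c c c c c c }
\text { Year since 2000 } & 0 & 3 & 6 & 9 & 12 & 15 \\ \hline
\text { Turnover } ( \pounds \text { millions} ) & 2.30 & 2.94 & 3.37 & 3.97 & 4.93 & 6.13
\end{array}$$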
  1. For the information in the table find the equation of the least squares regression line of \(y\) on \(x\), where \(x\) is the year since 2000 and \(y\) is the turnover in millions of pounds.
  2. Use the equation of the regression line to calculate the residual for 2009.
  3. Use the equation of the regression line to estimate the turnover in 2024, and explain why it is inadvisable to rely on this estimate.
Pre-U Pre-U 9794/3 2018 June Q2
9 marks Moderate -0.3
2 A teacher is monitoring the progress of students. The length of time, \(x\) hours, spent revising in a given week is compared to the score, \(y\), achieved in an assessment at the end of the week. The scatter diagram for a random sample of 8 students is shown below. \includegraphics[max width=\textwidth, alt={}, center]{35d24778-1203-4d5d-be4b-bb375344fe09-2_866_967_715_589} The data are summarised as \(\Sigma x = 24.6 , \Sigma y = 404 , \Sigma x ^ { 2 } = 105.56 , \Sigma y ^ { 2 } = 20820\) and \(\Sigma x y = 1350.2\).
  1. Find the equation of the least squares regression line of \(y\) on \(x\).
  2. Calculate the product moment correlation coefficient for the data.
  3. A ninth student, Jane, revises for 1.5 hours.
    1. Estimate her score in the assessment.
    2. Comment on the reliability of this estimate.
Edexcel S1 2023 June Q2
13 marks Moderate -0.3
Two students, Olive and Shan, collect data on the weight, \(w\) grams, and the tail length, \(t\) cm, of 15 mice. Olive summarised the data as follows: $$S_{tt} = 5.3173 \quad \sum w^2 = 6089.12 \quad \sum tw = 2304.53 \quad \sum w = 297.8 \quad \sum t = 114.8$$
  1. Calculate the value of \(S_{ww}\) and the value of \(S_{tw}\) [3]
  2. Calculate the value of the product moment correlation coefficient between \(w\) and \(t\) [2]
  3. Show that the equation of the regression line of \(w\) on \(t\) can be written as $$w = -16.7 + 4.77t$$ [3]
  4. Give an interpretation of the gradient of the regression line. [1]
  5. Explain why it would not be appropriate to use the regression line in part (c) to estimate the weight of a mouse with a tail length of 2cm. [2]
Shan decided to code the data using \(x = t - 6\) and \(y = \frac{w}{2} - 5\)
  1. Write down the value of the product moment correlation coefficient between \(x\) and \(y\) [1]
  2. Write down an equation of the regression line of \(y\) on \(x\) You do not need to simplify your equation. [1]
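One way to see the effect of Shan's coding, sketched here on the assumption that the rounded line \(w = -16.7 + 4.77t\) from the earlier part is used: rearranging the coding gives \(t = x + 6\) and \(w = 2(y + 5)\), and substituting these into the line gives
$$\begin{aligned} 2(y + 5) &= -16.7 + 4.77(x + 6) \\ y &= \frac{-16.7 + 4.77(x + 6)}{2} - 5 \\ &= 0.96 + 2.385x \end{aligned}$$
The second line is already an acceptable unsimplified answer. Because both codings are linear with positive scale factors, the product moment correlation coefficient between \(x\) and \(y\) equals that between \(t\) and \(w\).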
Edexcel S1 2011 June Q7
12 marks Moderate -0.8
A teacher took a random sample of 8 children from a class. For each child the teacher recorded the length of their left foot, \(f\) cm, and their height, \(h\) cm. The results are given in the table below.
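$$\begin{array} { l | c c c c c c c c }
f & 23 & 26 & 23 & 22 & 27 & 24 & 20 & 21 \\ \hline
h & 135 & 144 & 134 & 136 & 140 & 134 & 130 & 132
\end{array}$$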
(You may use \(\sum f = 186 \quad \sum h = 1085 \quad S_{ff} = 39.5 \quad S_{hh} = 139.875 \quad \sum fh = 25291\))
  1. Calculate \(S_{fh}\) [2]
  2. Find the equation of the regression line of \(h\) on \(f\) in the form \(h = a + bf\). Give the value of \(a\) and the value of \(b\) correct to 3 significant figures. [5]
  3. Use your equation to estimate the height of a child with a left foot length of 25 cm. [2]
  4. Comment on the reliability of your estimate in (c), giving a reason for your answer. [2]
The left foot length of the teacher is 25 cm.
  1. Give a reason why the equation in (b) should not be used to estimate the teacher's height. [1]
Edexcel S1 Specimen Q4
14 marks Moderate -0.3
A drilling machine can run at various speeds, but in general the higher the speed the sooner the drill needs to be replaced. Over several months, 15 pairs of observations relating to speed, \(s\) revolutions per minute, and life of drill, \(h\) hours, are collected. For convenience the data are coded so that \(x = s - 20\) and \(y = h - 100\) and the following summations obtained. \(\Sigma x = 143; \Sigma y = 391; \Sigma x^2 = 2413; \Sigma y^2 = 22441; \Sigma xy = 484\).
  1. Find the equation of the regression line of \(h\) on \(s\). [10]
  2. Interpret the slope of your regression line. [2]
  3. Estimate the life of a drill revolving at 30 revolutions per minute. [2]
Edexcel S1 Q3
13 marks Moderate -0.3
The marks obtained by ten students in a Geography test and a History test were as follows:
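$$\begin{array} { l | c c c c c c c c c c }
\text { Student } & A & B & C & D & E & F & G & H & I & J \\ \hline
\text { Geography } ( x ) & 34 & 57 & 49 & 21 & 84 & 53 & 10 & 77 & 61 & 85 \\
\text { History } ( y ) & 40 & 49 & 55 & 40 & & 71 & 39 & 47 & 65 & 73
\end{array}$$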
  1. Given that \(\sum y = 547\), calculate the mark obtained by student \(E\) in History. [1 mark]
Given further that \(\sum x^2 = 34087\), \(\sum y^2 = 31575\) and \(\sum xy = 31342\), calculate
  2. the product moment correlation coefficient between \(x\) and \(y\), [4 marks]
  3. an equation of the regression line of \(y\) on \(x\), [4 marks]
  4. an estimate of the History mark of student \(K\), who scored 70 in Geography. [2 marks]
  5. State, with a reason, whether you would expect your answer to part (d) to be reliable. [2 marks]
Edexcel S1 Q5
13 marks Standard +0.3
The following marks out of 50 were given by two judges to the contestants in a talent contest:
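$$\begin{array} { l | c c c c c c c c }
\text { Contestant } & A & B & C & D & E & F & G & H \\ \hline
\text { Judge 1 } ( x ) & 43 & 32 & 40 & 21 & 47 & 11 & 29 & 38 \\
\text { Judge 2 } ( y ) & 39 & 25 & 40 & 22 & 36 & 13 & 27 & 32
\end{array}$$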
Given that \(\sum x = 261\), \(\sum x^2 = 9529\) and \(\sum xy = 8373\),
  1. calculate the product-moment correlation coefficient between the two judges' marks [5 marks]
  2. Find an equation of the regression line of \(x\) on \(y\). [4 marks]
Contestant \(I\) was awarded 45 marks by Judge 2.
  1. Estimate the mark that this contestant would have received from Judge 1. [2 marks]
  2. Comment, with explanation, on the probable accuracy of your answer. [2 marks]
Edexcel S1 Q6
15 marks Standard +0.3
The marks out of 75 obtained by a group of ten students in their first and second Statistics modules were as follows:
$$\begin{array} { l | c c c c c c c c c c }
\text { Student } & A & B & C & D & E & F & G & H & I & J \\ \hline
\text { Module 1 } ( x ) & 54 & 33 & 42 & 71 & 60 & 27 & 39 & 46 & 59 & 64 \\
\text { Module 2 } ( y ) & 50 & 22 & 44 & 58 & 42 & 19 & 35 & 46 & 55 & 60
\end{array}$$
  1. Find \(\sum x\) and \(\sum y\). [2 marks]
Given that \(\sum x^2 = 26353\) and \(\sum xy = 22991\),
  1. obtain the equation of the regression line of \(y\) on \(x\). [5 marks]
  2. Estimate the Module 2 result of a student whose mark in Module 1 was (i) 65, (ii) 5. Explain why one of these estimates is less reliable than the other. [4 marks]
The equation of the regression line of \(x\) on \(y\) is \(x = 0.921y + 9.81\).
  1. Deduce the product moment correlation coefficient between \(x\) and \(y\), and briefly interpret its value. [4 marks]
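The word "deduce" in the final part points to the identity \(r^2 = b_{yx} \, b_{xy}\), where \(b_{yx}\) and \(b_{xy}\) are the gradients of the two regression lines. It holds because \(b_{yx} = S_{xy}/S_{xx}\) and \(b_{xy} = S_{xy}/S_{yy}\). A minimal sketch, assuming the rounded gradient 0.921 given in the question and illustrative variable names:

```python
import math

x = [54, 33, 42, 71, 60, 27, 39, 46, 59, 64]   # Module 1 marks
y = [50, 22, 44, 58, 42, 19, 35, 46, 55, 60]   # Module 2 marks
n = len(x)
sum_x, sum_y = sum(x), sum(y)                   # 495 and 431

s_xx = 26353 - sum_x ** 2 / n                   # uses the given sum of x^2
s_xy = 22991 - sum_x * sum_y / n                # uses the given sum of xy

b_yx = s_xy / s_xx                              # gradient of y on x
b_xy = 0.921                                    # gradient of x on y, as given (rounded)

# b_yx * b_xy = S_xy^2 / (S_xx * S_yy) = r^2; r takes the sign of the gradients
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(f"b_yx = {b_yx:.3f}")
print(f"r = {r:.3f}")
```

Both gradients are positive here, so the positive square root is taken.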
OCR S1 2010 June Q3
10 marks Moderate -0.8
  1. Some values, \((x, y)\), of a bivariate distribution are plotted on a scatter diagram and a regression line is to be drawn. Explain how to decide whether the regression line of \(y\) on \(x\) or the regression line of \(x\) on \(y\) is appropriate. [2]
  2. In an experiment the temperature, \(x\) °C, of a rod was gradually increased from 0 °C, and the extension, \(y\), was measured nine times at 50 °C intervals. The results are summarised below. $$n = 9 \quad \Sigma x = 1800 \quad \Sigma y = 14.4 \quad \Sigma x^2 = 510000 \quad \Sigma y^2 = 32.6416 \quad \Sigma xy = 4080$$
    1. Show that the gradient of the regression line of \(y\) on \(x\) is 0.008 and find the equation of this line. [4]
    2. Use your equation to estimate the temperature when the extension is 2.5 mm. [1]
    3. Use your equation to estimate the extension for a temperature of \(-50\) °C. [1]
    4. Comment on the meaning and the reliability of your estimate in part (c). [2]
OCR S1 2013 June Q5
9 marks Moderate -0.3
The table shows some of the values of the seasonally adjusted Unemployment Rate (UR), \(x\)\%, and the Consumer Price Index (CPI), \(y\)\%, in the United Kingdom from April 2008 to July 2010.
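$$\begin{array} { l c c }
\text { Date } & \text { UR, } x \% & \text { CPI, } y \% \\ \hline
\text { April 2008 } & 5.2 & 3.0 \\
\text { July 2008 } & 5.7 & 4.4 \\
\text { October 2008 } & 6.1 & 4.5 \\
\text { January 2009 } & 6.8 & 3.0 \\
\text { April 2009 } & 7.5 & 2.3 \\
\text { July 2009 } & 7.8 & 1.8 \\
\text { October 2009 } & 7.8 & 1.5 \\
\text { January 2010 } & 7.9 & 3.5 \\
\text { April 2010 } & 7.8 & 3.7 \\
\text { July 2010 } & 7.7 & 3.1
\end{array}$$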
These data are summarised below. $$n = 10 \quad \sum x = 70.3 \quad \sum x^2 = 503.45 \quad \sum y = 30.8 \quad \sum y^2 = 103.94 \quad \sum xy = 211.9$$
  1. Calculate the product moment correlation coefficient, \(r\), for the data, showing that \(-0.6 < r < -0.5\). [3]
  2. Karen says "The negative value of \(r\) shows that when the Unemployment Rate increases, it causes the Consumer Price Index to decrease." Give a criticism of this statement. [1]
    1. Calculate the equation of the regression line of \(x\) on \(y\). [3]
    2. Use your equation to estimate the value of the Unemployment Rate in a month when the Consumer Price Index is 4.0\%. [2]
Edexcel S1 Q6
17 marks Moderate -0.3
Penshop have stores selling stationery in each of 6 towns. The population, \(P\), in tens of thousands and the monthly turnover, \(T\), in thousands of pounds for each of the shops are as recorded below.
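$$\begin{array} { l | c c c c c c }
\text { Town } & \text { Abberton } & \text { Bember } & \text { Claster } & \text { Deller } & \text { Edgeton } & \text { Figland } \\ \hline
P \text { (tens of thousands) } & 3.2 & 7.6 & 5.2 & 9.0 & 8.1 & 4.8 \\
T \text { (thousands of pounds) } & 11.1 & 12.4 & 13.3 & 19.3 & 17.9 & 11.8
\end{array}$$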
  1. Represent these data on a scatter diagram with \(T\) on the vertical axis. [4]
    1. Which town's shop might appear to be underachieving given the populations of the towns?
    2. Suggest two other factors that might affect each shop's turnover. [3]
You may assume that $$\Sigma P = 37.9, \quad \Sigma T = 85.8, \quad \Sigma P^2 = 264.69, \quad \Sigma T^2 = 1286, \quad \Sigma PT = 574.25.$$
  1. Find the equation of the regression line of \(T\) on \(P\). [7]
  2. Estimate the monthly turnover that might be expected if a shop were opened in Gratton, a town with a population of 68 000. [2]
  3. Why might the management of Penshop be reluctant to use the regression line to estimate the monthly turnover they could expect if a shop were opened in Haggin, a town with a population of 172 000? [1]
Edexcel S1 Q4
12 marks Standard +0.3
The owner of a mobile burger-bar believes that hot weather reduces his sales. To investigate the effect on his business he collected data on his daily sales, \(£P\), and the maximum temperature, \(T\)°C, on each of 20 days. He then coded the data, using \(x = T - 20\) and \(y = P - 300\), and calculated the summary statistics given below. $$\Sigma x = 57, \quad \Sigma y = 2222, \quad \Sigma x^2 = 401, \quad \Sigma y^2 = 305576, \quad \Sigma xy = 3871.$$
  1. Find an equation of the regression line of \(P\) on \(T\). [9 marks]
The owner of the bar doesn't believe it is profitable for him to run the bar if he takes less than £460 in a day.
  1. According to your regression line at what maximum daily temperature, to the nearest degree Celsius, does it become unprofitable for him to run the bar? [3 marks]
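For coded data such as the burger-bar question above, one approach is to fit the line in the coded variables and then undo the coding algebraically. The sketch below assumes the standard summary-statistic formulas; the names `a_coded` and `t_break_even` are illustrative only.

```python
n = 20
sum_x, sum_y, sum_x2, sum_xy = 57, 2222, 401, 3871   # coded totals from the question

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

b = s_xy / s_xx                       # gradient is unchanged by subtracting constants
a_coded = sum_y / n - b * sum_x / n   # intercept of y on x

# Undo the coding x = T - 20, y = P - 300:
#   P - 300 = a_coded + b (T - 20)  =>  P = (300 + a_coded - 20 b) + b T
a = 300 + a_coded - 20 * b

# Temperature at which the predicted sales equal 460 pounds
t_break_even = (460 - a) / b

print(f"P = {a:.1f} + ({b:.2f})T")
print(f"predicted sales reach 460 at T = {t_break_even:.1f}")
```

Because the gradient is negative, predicted sales fall below £460 for temperatures above the break-even value, which then needs rounding to the nearest degree as the question asks.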
OCR MEI S2 2007 January Q1
18 marks Moderate -0.8
In a science investigation into energy conservation in the home, a student is collecting data on the time taken for an electric kettle to boil as the volume of water in the kettle is varied. The student's data are shown in the table below, where \(v\) litres is the volume of water in the kettle and \(t\) seconds is the time taken for the kettle to boil (starting with the water at room temperature in each case). Also shown are summary statistics and a scatter diagram on which the regression line of \(t\) on \(v\) is drawn.
$$\begin{array} { l | c c c c c }
v & 0.2 & 0.4 & 0.6 & 0.8 & 1.0 \\ \hline
t & 44 & 78 & 114 & 156 & 172
\end{array}$$
\(n = 5\), \(\Sigma v = 3.0\), \(\Sigma t = 564\), \(\Sigma v^2 = 2.20\), \(\Sigma vt = 405.2\). \includegraphics{figure_1}
  1. Calculate the equation of the regression line of \(t\) on \(v\), giving your answer in the form \(t = a + bv\). [5]
  2. Use this equation to predict the time taken for the kettle to boil when the amount of water which it contains is
    1. 0.5 litres,
    2. 1.5 litres.
    Comment on the reliability of each of these predictions. [4]
  3. In the equation of the regression line found in part (i), explain the role of the coefficient of \(v\) in the relationship between time taken and volume of water. [2]
  4. Calculate the values of the residuals for \(v = 0.8\) and \(v = 1.0\). [4]
  5. Explain how, on a scatter diagram with the regression line drawn accurately on it, a residual could be measured and its sign determined. [3]
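The kettle question turns on the difference between interpolation and extrapolation. As a sketch, the helper below (the hypothetical `predict`) returns each prediction together with a flag showing whether the volume lies inside the observed range. The line is fitted from the summary totals given in the question.

```python
v = [0.2, 0.4, 0.6, 0.8, 1.0]     # volume in litres
t = [44, 78, 114, 156, 172]       # time to boil in seconds
n = len(v)

# Summary totals as given in the question
sum_v, sum_t, sum_v2, sum_vt = 3.0, 564, 2.20, 405.2

b = (sum_vt - sum_v * sum_t / n) / (sum_v2 - sum_v ** 2 / n)
a = sum_t / n - b * sum_v / n


def predict(volume):
    """Return (predicted time, True if volume lies inside the observed range)."""
    return a + b * volume, min(v) <= volume <= max(v)


# Residual = observed time minus time predicted by the line
residuals = {vi: ti - predict(vi)[0] for vi, ti in zip(v, t)}

print(f"t = {a:.1f} + {b:.0f}v")
for volume in (0.5, 1.5):
    time, interpolating = predict(volume)
    print(f"v = {volume}: t = {time:.1f} s, interpolation: {interpolating}")
print(f"residuals: v=0.8 -> {residuals[0.8]:+.1f}, v=1.0 -> {residuals[1.0]:+.1f}")
```

A positive residual means the observed point lies above the line and a negative one means it lies below, which is what the final part asks to be read off the scatter diagram.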