Calculate y on x from raw data table

Questions that provide raw bivariate data in a table and ask to find the regression line of y on x.

66 questions

Edexcel S1 Q6
6. Penshop have stores selling stationery in each of 6 towns. The population, \(P\), in tens of thousands, and the monthly turnover, \(T\), in thousands of pounds, for each of the shops are recorded below.
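
| Town | Abberton | Bember | Claster | Deller | Edgeton | Figland |
| --- | --- | --- | --- | --- | --- | --- |
| \(P\) (tens of thousands) | 3.2 | 7.6 | 5.2 | 9.0 | 8.1 | 4.8 |
| \(T\) (£ thousands) | 11.1 | 12.4 | 13.3 | 19.3 | 17.9 | 11.8 |
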
  1. Represent these data on a scatter diagram with \(T\) on the vertical axis.
    1. Which town's shop might appear to be underachieving given the populations of the towns?
    2. Suggest two other factors that might affect each shop's turnover.
    You may assume that $$\Sigma P = 37.9 , \quad \Sigma T = 85.8 , \quad \Sigma P ^ { 2 } = 264.69 , \quad \Sigma T ^ { 2 } = 1286 , \quad \Sigma P T = 574.25 .$$
  2. Find the equation of the regression line of \(T\) on \(P\).
  3. Estimate the monthly turnover that might be expected if a shop were opened in Gratton, a town with a population of 68000.
  4. Why might the management of Penshop be reluctant to use the regression line to estimate the monthly turnover they could expect if a shop were opened in Haggin, a town with a population of 172000?
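
The regression line of \(T\) on \(P\) can be found from the quoted sums alone. The following is a minimal Python sketch, assuming the standard least squares formulas \(b = S_{PT} / S_{PP}\) and \(a = \bar{T} - b \bar{P}\); the variable names are illustrative and not part of the question.

```python
# Summary statistics quoted in the question (6 towns).
n = 6
sum_P, sum_T = 37.9, 85.8
sum_PP, sum_PT = 264.69, 574.25

# Corrected sums of squares and products.
S_PP = sum_PP - sum_P ** 2 / n
S_PT = sum_PT - sum_P * sum_T / n

# Least squares gradient and intercept for T = a + b P.
b = S_PT / S_PP
a = sum_T / n - b * sum_P / n

# Gratton: 68000 people corresponds to P = 6.8 (tens of thousands),
# which lies inside the observed range of P (3.2 to 9.0).
T_gratton = a + b * 6.8

print(f"T = {a:.2f} + {b:.2f} P")
print(f"Estimated turnover at P = 6.8: {T_gratton:.1f} (thousands of pounds)")
```

Haggin corresponds to \(P = 17.2\), well outside the observed range, so applying the same line there would be an extrapolation.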
Edexcel S1 Q7
7. A doctor wished to investigate the effects of staying awake for long periods on a person's ability to complete simple tasks. She recorded the number of times, \(n\), that a subject could clench his or her fist in 30 seconds after being awake for \(h\) hours. The results for one subject were as follows.
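
| \(h\) (hours) | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(n\) | 116 | 114 | 109 | 101 | 94 | 94 | 86 | 81 | 80 |
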
  1. Plot a scatter diagram of \(n\) against \(h\) for these results. You may use $$\Sigma h = 180 , \quad \Sigma n = 875 , \quad \Sigma h ^ { 2 } = 3660 , \quad \Sigma h n = 17204 .$$
  2. Obtain the equation of the regression line of \(n\) on \(h\) in the form \(n = a + b h\).
  3. Give a practical interpretation of the constant \(b\).
  4. Explain why this regression line would be unlikely to be appropriate for values of \(h\) between 0 and 16.
    (2 marks)
    Another subject underwent the same tests giving rise to a regression line of \(n = 213.4 - 5.87 h\).
  5. After how many hours of being awake together would you expect these two subjects to be able to clench their fists the same number of times in 30 seconds?
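
The last part amounts to finding where two regression lines meet. A hedged Python sketch follows, assuming the first subject's line is fitted from the quoted sums with the usual formulas; `N` (the sample size) and the other names are illustrative.

```python
# Summary statistics for the first subject (9 readings, h = 16 to 24).
N = 9
sum_h, sum_n = 180, 875
sum_hh, sum_hn = 3660, 17204

S_hh = sum_hh - sum_h ** 2 / N
S_hn = sum_hn - sum_h * sum_n / N

# First subject's line n = a1 + b1 h.
b1 = S_hn / S_hh
a1 = sum_n / N - b1 * sum_h / N

# Second subject's line as given in the question: n = 213.4 - 5.87 h.
a2, b2 = 213.4, -5.87

# The predicted counts are equal where a1 + b1 h = a2 + b2 h.
h_equal = (a2 - a1) / (b1 - b2)

print(f"First subject: n = {a1:.1f} + ({b1:.2f}) h")
print(f"Lines meet at h = {h_equal:.1f} hours")
```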
Edexcel S1 Q6
6. A school introduced a new programme of support lessons in 1994 with a view to improving grades in GCSE English. The table below shows the number of years since 1994, \(n\), and the corresponding percentage of students achieving A to C grades in GCSE English, \(p\), for each year.

| \(n\) | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| \(p ( \% )\) | 35.2 | 37.1 | 40.6 | 39.0 | 43.4 | 44.8 |

  1. Represent these data on a scatter diagram. You may use the following values. $$\Sigma n = 21 , \quad \Sigma p = 240.1 , \quad \Sigma n ^ { 2 } = 91 , \quad \Sigma p ^ { 2 } = 9675.41 , \quad \Sigma n p = 873 .$$
  2. Find an equation of the regression line of \(p\) on \(n\) and draw it on your graph.
  3. Calculate the product moment correlation coefficient for these data and comment on the suitability of a linear model for the relationship between \(n\) and \(p\) during this period.
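
Both the regression line and the product moment correlation coefficient follow from the quoted sums. This is a sketch only, assuming the standard definitions \(r = S_{np} / \sqrt{S_{nn} S_{pp}}\) and \(b = S_{np} / S_{nn}\); `N` denotes the number of years and is not a symbol used in the question.

```python
from math import sqrt

# Summary statistics quoted in the question (6 years of data).
N = 6
sum_n, sum_p = 21, 240.1
sum_nn, sum_pp, sum_np = 91, 9675.41, 873

# Corrected sums of squares and products.
S_nn = sum_nn - sum_n ** 2 / N
S_pp = sum_pp - sum_p ** 2 / N
S_np = sum_np - sum_n * sum_p / N

# Regression line of p on n: p = a + b n.
b = S_np / S_nn
a = sum_p / N - b * sum_n / N

# Product moment correlation coefficient.
r = S_np / sqrt(S_nn * S_pp)

print(f"p = {a:.2f} + {b:.3f} n, r = {r:.3f}")
```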
Edexcel S1 Q7
7. Pipes-R-us manufacture a special lightweight aluminium tubing. The price \(\pounds P\), for each length, \(l\) metres, that the company sells is shown in the table.
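
| \(l\) (metres) | 0.5 | 0.8 | 1.0 | 1.5 | 2 | 4 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(P ( \pounds )\) | 2.50 | 3.40 | 4.00 | 5.20 | 6.00 | 10.50 | 15.00 |
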
  1. Represent these data on a scatter diagram. You may use $$\Sigma l = 15.8 , \quad \Sigma P = 46.6 , \quad \Sigma l ^ { 2 } = 60.14 , \quad \Sigma l P = 159.77$$
  2. Find the equation of the regression line of \(P\) on \(l\) in the form \(P = a + b l\).
  3. Give a practical interpretation of the constant \(b\).
    In response to customer demand, Pipes-R-us decide to start selling tubes cut to specific lengths. Initially the company decides to use the regression line found in part (b) as a pricing formula for this new service.
  4. Calculate the price that Pipes-R-us should charge for 5.2 metres of the tubing.
  5. Suggest a reason why Pipes-R-us might not offer prices based on the regression line for any length of tubing.
Edexcel S1 Q6
6. A physics student recorded the length, \(l \mathrm {~cm}\), of a spring when different masses, \(m\) grams, were suspended from it giving the following results.
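
| \(m ( \mathrm {~g} )\) | 50 | 100 | 200 | 300 | 400 | 500 | 600 | 700 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(l ( \mathrm {~cm} )\) | 7.8 | 10.7 | 16.5 | 22.1 | 28.0 | 33.9 | 35.2 | 35.6 |
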
  1. Represent these data on a scatter diagram with \(l\) on the vertical axis. The student decides to find the equation of a regression line of the form \(l = a + b m\) using only the data for \(m \leq 500 \mathrm {~g}\).
  2. Give a reason to support the fitting of such a regression line and explain why the student is excluding two of his values.
    (2 marks)
    You may use $$\Sigma m = 1550 , \quad \Sigma l = 119 , \quad \Sigma m ^ { 2 } = 552500 , \quad \Sigma l ^ { 2 } = 2869.2 , \quad \Sigma m l = 39540 .$$
  3. Find the values of \(a\) and \(b\).
  4. Explain the significance of the values of \(a\) and \(b\) in this situation.
OCR MEI Further Statistics Major 2022 June Q5
5 A motorist is investigating the relationship between tyre pressure and temperature. As the temperature increases during a hot day, she records the pressure (measured in bars) of one of her car tyres at specific temperatures of \(20 ^ { \circ } \mathrm { C } , 22 ^ { \circ } \mathrm { C } , \ldots , 36 ^ { \circ } \mathrm { C }\). The results are shown in Table 5.1.

| Temperature \(\left( t ^ { \circ } \mathrm { C } \right)\) | 20 | 22 | 24 | 26 | 28 | 30 | 32 | 34 | 36 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tyre pressure (\(P\) bar) | 2.012 | 2.036 | 2.065 | 2.074 | 2.114 | 2.140 | 2.149 | 2.176 | 2.192 |

Table 5.1

  1. Calculate the equation of the regression line of pressure on temperature. Give your answer in the form \(P = a t + b\), giving the values of \(a\) and \(b\) to \(\mathbf { 4 }\) significant figures.
  2. Table 5.2 shows the residuals for most of the data values. Complete the copy of the table in the Printed Answer Booklet.

    | Temperature | 20 | 22 | 24 | 26 | 28 | 30 | 32 | 34 | 36 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Residual tyre pressure | -0.003 | -0.002 | 0.004 | -0.010 | | 0.011 | -0.003 | 0.001 | |

    Table 5.2

  3. With reference to the values of the residuals, comment on the goodness of fit of the regression line.
  4. Use your answer to part (a) to calculate an estimate of the pressure in the tyre at each of the following temperatures, giving your answers to \(\mathbf { 3 }\) decimal places.
    • \(25 ^ { \circ } \mathrm { C }\)
    • \(10 ^ { \circ } \mathrm { C }\)
    • Comment on the reliability of each of your estimates.
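
A residual is the observed value minus the value predicted by the fitted line. The Python sketch below, written under the assumption that the line is fitted by ordinary least squares to the nine points in Table 5.1, computes the line \(P = a t + b\) and every residual; the list names are illustrative.

```python
# Data from Table 5.1.
t = [20, 22, 24, 26, 28, 30, 32, 34, 36]
P = [2.012, 2.036, 2.065, 2.074, 2.114, 2.140, 2.149, 2.176, 2.192]

n = len(t)
t_bar = sum(t) / n
P_bar = sum(P) / n

S_tt = sum((ti - t_bar) ** 2 for ti in t)
S_tP = sum((ti - t_bar) * (Pi - P_bar) for ti, Pi in zip(t, P))

# The question writes the line as P = a t + b, so a is the gradient here.
a = S_tP / S_tt
b = P_bar - a * t_bar

# Residual = observed pressure minus fitted pressure.
residuals = [Pi - (a * ti + b) for ti, Pi in zip(t, P)]

for ti, ri in zip(t, residuals):
    print(f"t = {ti}: residual = {ri:+.3f}")

# 25 degrees is inside the observed range; 10 degrees is outside it.
for temp in (25, 10):
    inside = min(t) <= temp <= max(t)
    print(f"t = {temp}: P = {a * temp + b:.3f}, inside data range: {inside}")
```

The residuals of a least squares line with an intercept sum to zero, which gives a quick check on the arithmetic.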
Edexcel FS2 Specimen Q6
6. A random sample of 10 female pigs was taken. The number of piglets, \(x\), born to each female pig and their average weight at birth, \(m \mathrm {~kg}\), was recorded. The results were as follows:

| Number of piglets, \(x\) | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Average weight at birth, \(m\) kg | 1.50 | 1.20 | 1.40 | 1.40 | 1.23 | 1.30 | 1.20 | 1.15 | 1.25 | 1.15 |

(You may use \(\mathrm { S } _ { x x } = 82.5\) and \(\mathrm { S } _ { m m } = 0.12756\) and \(\mathrm { S } _ { x m } = - 2.29\).)
  1. Find the equation of the regression line of \(m\) on \(x\) in the form \(m = a + b x\) as a model for these results.
  2. Show that the residual sum of squares (RSS) is 0.064 to 3 decimal places.
  3. Calculate the residual values.
  4. Write down the outlier.
    1. Comment on the validity of ignoring this outlier.
    2. Ignoring the outlier, produce another model.
    3. Use this model to estimate the average weight at birth if \(x = 15\)
    4. Comment, giving a reason, on the reliability of your estimate.
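
The residual sum of squares can be checked two ways: from the summary statistics, using the standard identity \(\mathrm{RSS} = S_{mm} - (S_{xm})^2 / S_{xx}\), and directly by squaring the residuals. A minimal Python sketch under those assumptions follows; the variable names are illustrative.

```python
# Data from the table and the summary statistics quoted in the question.
x = list(range(4, 14))
m = [1.50, 1.20, 1.40, 1.40, 1.23, 1.30, 1.20, 1.15, 1.25, 1.15]
S_xx, S_mm, S_xm = 82.5, 0.12756, -2.29

n = len(x)
x_bar = sum(x) / n
m_bar = sum(m) / n

# Regression line m = a + b x.
b = S_xm / S_xx
a = m_bar - b * x_bar

# RSS from the summary statistics.
rss_shortcut = S_mm - S_xm ** 2 / S_xx

# RSS directly from the residuals (observed minus fitted).
residuals = [mi - (a + b * xi) for xi, mi in zip(x, m)]
rss_direct = sum(r ** 2 for r in residuals)

# The point with the largest absolute residual is a natural outlier candidate.
x_largest, r_largest = max(zip(x, residuals), key=lambda pair: abs(pair[1]))

print(f"m = {a:.3f} + ({b:.4f}) x")
print(f"RSS = {rss_shortcut:.3f} (shortcut), {rss_direct:.3f} (direct)")
print(f"Largest |residual| at x = {x_largest}: {r_largest:+.3f}")
```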
Edexcel S1 2003 June Q7
7. Eight students took tests in mathematics and physics. The marks for each student are given in the table below where \(m\) represents the mathematics mark and \(p\) the physics mark.

| Student | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mark \(m\) | 9 | 14 | 13 | 10 | 7 | 8 | 20 | 17 |
| Mark \(p\) | 11 | 23 | 21 | 15 | 19 | 10 | 31 | 26 |

A science teacher believes that students' marks in physics depend upon their mathematical ability. The teacher decides to investigate this relationship using the test marks.
  1. Write down which is the explanatory variable in this investigation.
  2. Draw a scatter diagram to illustrate these data.
  3. Showing your working, find the equation of the regression line of \(p\) on \(m\).
  4. Draw the regression line on your scatter diagram.
    A ninth student was absent for the physics test, but she sat the mathematics test and scored 15.
  5. Using this model, estimate the mark she would have scored in the physics test.
AQA S1 2005 January Q3
3 [Figure 1, printed on the insert, is provided for use in this question.]
A parcel delivery company has a depot on the outskirts of a town. Each weekday, a van leaves the depot to deliver parcels across a nearby area. The table below shows, for a random sample of 10 weekdays, the number, \(x\), of parcels to be delivered and the total time, \(y\) minutes, that the van is out of the depot.
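
| \(x\) | 9 | 16 | 22 | 11 | 19 | 26 | 14 | 10 | 11 | 17 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(y\) | 79 | 127 | 172 | 109 | 152 | 214 | 131 | 80 | 94 | 148 |
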
  1. On Figure 1, plot a scatter diagram of these data.
  2. Calculate the equation of the least squares regression line of \(y\) on \(x\) and draw your line on Figure 1.
  3. Use your regression equation to estimate the total time that the van is out of the depot when delivering:
    1. 15 parcels;
    2. 35 parcels. Comment on the likely reliability of each of your estimates.
  4. The time that the van is out of the depot delivering parcels may be thought of as the time needed to travel to and from the area plus an amount of time proportional to the number of parcels to be delivered. Given that the regression line of \(y\) on \(x\) is of the form \(y = a + b x\), give an interpretation, in context, for each of your values of \(a\) and \(b\).
    (2 marks)
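
When only the raw table is given, the line can be fitted directly from the data. The sketch below assumes ordinary least squares and adds a simple range check to distinguish interpolation from extrapolation; `fit_line` and `estimate` are hypothetical helper names, not part of any library.

```python
# Raw data from the table: parcels delivered (x) and minutes out of depot (y).
x = [9, 16, 22, 11, 19, 26, 14, 10, 11, 17]
y = [79, 127, 172, 109, 152, 214, 131, 80, 94, 148]


def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


a, b = fit_line(x, y)


def estimate(parcels):
    """Return the predicted time and whether parcels lies in the observed range."""
    inside = min(x) <= parcels <= max(x)
    return a + b * parcels, inside


time_15, inside_15 = estimate(15)
time_35, inside_35 = estimate(35)

print(f"y = {a:.1f} + {b:.2f} x")
print(f"15 parcels: {time_15:.0f} min (inside range: {inside_15})")
print(f"35 parcels: {time_35:.0f} min (inside range: {inside_35})")
```

In this reading of the model, \(a\) plays the role of the fixed travel time and \(b\) the additional minutes per parcel.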
AQA S1 2007 January Q7
7 [Figure 1, printed on the insert, is provided for use in this question.]
Stan is a retired academic who supplements his pension by mowing lawns for customers who live nearby. As part of a review of his charges for this work, he measures the areas, \(x \mathrm {~m} ^ { 2 }\), of a random sample of eight of his customers' lawns and notes the times, \(y\) minutes, that it takes him to mow these lawns. His results are shown in the table.

| Customer | A | B | C | D | E | F | G | H |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(x\) | 360 | 140 | 860 | 600 | 1180 | 540 | 260 | 480 |
| \(y\) | 50 | 25 | 135 | 70 | 140 | 90 | 55 | 70 |

  1. On Figure 1, plot a scatter diagram of these data.
  2. Calculate the equation of the least squares regression line of \(y\) on \(x\). Draw your line on Figure 1.
  3. Calculate the value of the residual for Customer H and indicate how your value is confirmed by your scatter diagram.
  4. Given that Stan charges \(\pounds 12\) per hour, estimate the charge for mowing a customer's lawn that has an area of \(560 \mathrm {~m} ^ { 2 }\).
AQA S1 2010 January Q3
3 The table shows, for each of a random sample of 7 weeks, the number of customers, \(x\), who purchased fuel from a filling station, together with the total volume, \(y\) litres, of fuel purchased by these customers.

| \(x\) | 230 | 184 | 165 | 147 | 241 | 174 | 210 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(y\) | 4551 | 3410 | 3252 | 3756 | 3787 | 4024 | 4254 |

  1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  2. Estimate the volume of fuel sold during a week in which 200 customers purchase fuel.
  3. Comment on the likely reliability of your estimate in part (b), given that, for the regression line calculated in part (a), the values of the 7 residuals lie between approximately \(-415\) litres and \(+430\) litres.
AQA S1 2005 June Q4
4 The time taken for a fax machine to scan an A4 sheet of paper is dependent, in part, on the number of lines of print on the sheet. The table below shows, for each of a random sample of 8 sheets of A4 paper, the number, \(x\), of lines of print and the scanning time, \(y\) seconds, taken by the fax machine.

| Sheet | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(x\) | 10 | 16 | 23 | 27 | 31 | 35 | 38 | 44 |
| \(y\) | 2.4 | 3.5 | 3.2 | 4.1 | 4.1 | 5.6 | 4.6 | 5.3 |

  1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  2. The following table lists some of the residuals for the regression line.

    | Sheet | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Residual | -0.174 | 0.418 | | 0.085 | -0.254 | 0.906 | | -0.157 |

    1. Calculate the values of the residuals for sheets 3 and 7.
    2. Hence explain what can be deduced about the regression line.
  3. The time, \(z\) seconds, to transmit an A4 page after scanning is given by: $$z = 0.80 + 0.05 x$$ Estimate the total time to scan and transmit an A4 page containing:
    1. 15 lines of print;
    2. 75 lines of print. In each case comment on the likely reliability of your estimate.
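
The listed residuals can be reproduced, and the missing ones filled in, by fitting the line to the raw data. This is a hedged Python sketch assuming ordinary least squares; `total_time` is a hypothetical helper that adds the fitted scan time to the transmit time \(z = 0.80 + 0.05 x\) given in the question.

```python
# Raw data: lines of print (x) and scanning time in seconds (y) for sheets 1 to 8.
x = [10, 16, 23, 27, 31, 35, 38, 44]
y = [2.4, 3.5, 3.2, 4.1, 4.1, 5.6, 4.6, 5.3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
S_xx = sum((xi - x_bar) ** 2 for xi in x)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares line y = a + b x.
b = S_xy / S_xx
a = y_bar - b * x_bar

# Residual for each sheet, keyed by sheet number.
residuals = {
    sheet: yi - (a + b * xi)
    for sheet, (xi, yi) in enumerate(zip(x, y), start=1)
}


def total_time(lines):
    """Fitted scan time plus the transmit time z = 0.80 + 0.05 x."""
    return (a + b * lines) + (0.80 + 0.05 * lines)


for sheet in (3, 7):
    print(f"Sheet {sheet}: residual = {residuals[sheet]:+.3f}")
print(f"15 lines: {total_time(15):.2f} s; 75 lines: {total_time(75):.2f} s")
```

Note that 75 lines is far beyond the largest observed value of 44, so the second total relies on extrapolating the scan-time line.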
AQA S1 2006 June Q3
3 A new car tyre is fitted to a wheel. The tyre is inflated to its recommended pressure of 265 kPa and the wheel left unused. At 3-month intervals thereafter, the tyre pressure is measured with the following results:
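
| Time after fitting (\(x\) months) | 0 | 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tyre pressure (\(y\) kPa) | 265 | 250 | 240 | 235 | 225 | 215 | 210 | 195 | 180 |
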
    1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
    2. Interpret in context the value for the gradient of your line.
    3. Comment on the value for the intercept with the \(y\)-axis of your line.
  1. The tyre manufacturer states that, when one of these new tyres is fitted to the wheel of a car and then inflated to 265 kPa, a suitable regression equation is of the form $$y = 265 + b x$$ The manufacturer also states that, as the car is used, the tyre pressure will decrease at twice the rate of that found in part (a).
    1. Suggest a suitable value for \(b\).
    2. One of these new tyres is fitted to the wheel of a car and inflated to 265 kPa. The car is then used for 8 months, after which the tyre pressure is checked for the first time. Show that, accepting the manufacturer's statements, the tyre pressure can be expected to have fallen below its minimum safety value of 220 kPa.
      (2 marks)
AQA S1 2015 June Q5
5 The table shows the number of customers, \(x\), and the takings, \(\pounds y\), recorded to the nearest \(\pounds 10\), at a local butcher's shop on each of 10 randomly selected weekdays.
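
| \(x\) | 86 | 60 | 65 | 46 | 71 | 93 | 56 | 81 | 75 | 57 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(y\) | 940 | 790 | 620 | 530 | 770 | 1050 | 690 | 780 | 860 | 550 |
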
  1. The first 6 pairs of data values in this table are plotted on the scatter diagram shown on the opposite page. Plot the final 4 pairs of data values on the scatter diagram.
    1. Calculate the equation of the least squares regression line in the form \(y = a + b x\) and draw your line on the scatter diagram.
    2. Interpret your value for \(b\) in the context of the question.
    3. State why your value for \(a\) has no practical interpretation.
  2. Estimate, to the nearest \(\pounds 10\), the shop's takings when the number of customers is 50.
    [1 mark]

[Figure: scatter diagram for the butcher's shop data (answer space for question 5)]
AQA S1 2015 June Q4
4 Stephan is a roofing contractor who is often required to replace loose ridge tiles on house roofs. In order to help him to quote more accurately the prices for such jobs in the future, he records, for each of 11 recently repaired roofs, the number of ridge tiles replaced, \(x _ { i }\), and the time taken, \(y _ { i }\) hours. His results are shown in the table.

| Roof (\(i\)) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(x _ { i }\) | 8 | 11 | 14 | 14 | 16 | 20 | 22 | 22 | 25 | 27 | 30 |
| \(y _ { i }\) | 5.0 | 5.2 | 6.3 | 7.2 | 8.0 | 8.8 | 10.6 | 11.0 | 11.8 | 12.1 | 13.0 |

  1. The pairs of data values for roofs 1 to 7 are plotted on the scatter diagram shown on the opposite page. Plot the 4 pairs of data values for roofs 8 to 11 on the scatter diagram.
    1. Calculate the equation of the least squares regression line of \(y _ { i }\) on \(x _ { i }\), and draw your line on the scatter diagram.
    2. Interpret your values for the gradient and for the intercept of this regression line.
  2. Estimate the time that it would take Stephan to replace 15 loose ridge tiles on a house roof.
  3. Given that \(r _ { i }\) denotes the residual for the point representing roof \(i\) :
    1. calculate the value of \(r _ { 6 }\);
    2. state why the value of \(\sum _ { i = 1 } ^ { 11 } r _ { i }\) gives no useful information about the connection between the number of ridge tiles replaced and the time taken.
      [1 mark]

[Figure: scatter diagram (answer space for question 4)]
OCR S1 Q4
4 The table shows the latitude, \(x\) (in degrees correct to 3 significant figures), and the average rainfall \(y\) (in cm correct to 3 significant figures) of five European cities.

| City | \(x\) | \(y\) |
| --- | --- | --- |
| Berlin | 52.5 | 58.2 |
| Bucharest | 44.4 | 58.7 |
| Moscow | 55.8 | 53.3 |
| St Petersburg | 60.0 | 47.8 |
| Warsaw | 52.3 | 56.6 |

$$\left[ n = 5 , \Sigma x = 265.0 , \Sigma y = 274.6 , \Sigma x ^ { 2 } = 14176.54 , \Sigma y ^ { 2 } = 15162.22 , \Sigma x y = 14464.10 . \right]$$
  1. Calculate the product moment correlation coefficient.
  2. The values of \(y\) in the table were in fact obtained from measurements in inches and converted into centimetres by multiplying by 2.54. State what effect it would have had on the value of the product moment correlation coefficient if it had been calculated using inches instead of centimetres.
  3. It is required to estimate the annual rainfall at Bergen, where \(x = 60.4\). Calculate the equation of an appropriate line of regression, giving your answer in simplified form, and use it to find the required estimate.
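
The three parts can be illustrated from the bracketed sums. The Python sketch below assumes that the line of \(y\) on \(x\) is the appropriate one, since rainfall is being estimated from latitude; `pmcc` is a hypothetical helper, and the conversion to inches is done by rescaling the sums rather than the raw data.

```python
from math import sqrt

# Summary statistics quoted in the question (5 cities, y in cm).
n = 5
sum_x, sum_y = 265.0, 274.6
sum_xx, sum_yy, sum_xy = 14176.54, 15162.22, 14464.10


def pmcc(sx, sy, sxx, syy, sxy, count):
    """Product moment correlation coefficient from raw sums."""
    s_xx = sxx - sx ** 2 / count
    s_yy = syy - sy ** 2 / count
    s_xy = sxy - sx * sy / count
    return s_xy / sqrt(s_xx * s_yy)


r_cm = pmcc(sum_x, sum_y, sum_xx, sum_yy, sum_xy, n)

# Rescaling y from cm to inches divides sum_y and sum_xy by 2.54
# and sum_yy by 2.54 squared; r should be unchanged.
k = 2.54
r_inches = pmcc(sum_x, sum_y / k, sum_xx, sum_yy / k ** 2, sum_xy / k, n)

# Regression line of y on x: y = a + b x.
S_xx = sum_xx - sum_x ** 2 / n
S_xy = sum_xy - sum_x * sum_y / n
b = S_xy / S_xx
a = sum_y / n - b * sum_x / n

# Bergen: x = 60.4, just beyond the largest observed latitude of 60.0.
y_bergen = a + b * 60.4

print(f"r (cm) = {r_cm:.3f}, r (inches) = {r_inches:.3f}")
print(f"y = {a:.1f} + ({b:.3f}) x; estimate at x = 60.4: {y_bergen:.1f} cm")
```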