Calculate y on x from raw data table

Questions that provide raw bivariate data in a table and ask to find the regression line of y on x.

66 questions

OCR S1 2007 January Q5
5 A chemical solution was gradually heated. At five-minute intervals the time, \(x\) minutes, and the temperature, \(y ^ { \circ } \mathrm { C }\), were noted.
| \(x\) | 0 | 5 | 10 | 15 | 20 | 25 | 30 | 35 |
|---|---|---|---|---|---|---|---|---|
| \(y\) | 0.8 | 3.0 | 6.8 | 10.9 | 15.6 | 19.6 | 23.4 | 26.7 |
$$\left[ n = 8 , \Sigma x = 140 , \Sigma y = 106.8 , \Sigma x ^ { 2 } = 3500 , \Sigma y ^ { 2 } = 2062.66 , \Sigma x y = 2685.0 . \right]$$
  1. Calculate the equation of the regression line of \(y\) on \(x\).
  2. Use your equation to estimate the temperature after 12 minutes.
  3. It is given that the value of the product moment correlation coefficient is close to +1. Comment on the reliability of using your equation to estimate \(y\) when
    (a) \(x = 17\),
    (b) \(x = 57\).
OCR S1 Specimen Q8
8 An experiment was conducted to see whether there was any relationship between the maximum tidal current, \(y \mathrm {~cm} \mathrm {~s} ^ { - 1 }\), and the tidal range, \(x\) metres, at a particular marine location. [The tidal range is the difference between the height of high tide and the height of low tide.] Readings were taken over a period of 12 days, and the results are shown in the following table.
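| \(x\) | 2.0 | 2.4 | 3.0 | 3.1 | 3.4 | 3.7 | 3.8 | 3.9 | 4.0 | 4.5 | 4.6 | 4.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 15.2 | 22.0 | 25.2 | 33.0 | 33.1 | 34.2 | 51.0 | 42.3 | 45.0 | 50.7 | 61.0 | 59.2 |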
$$\left[ \Sigma x = 43.3 , \Sigma y = 471.9 , \Sigma x ^ { 2 } = 164.69 , \Sigma y ^ { 2 } = 20915.75 , \Sigma x y = 1837.78 . \right]$$ The scatter diagram below illustrates the data.
\includegraphics[max width=\textwidth, alt={}, center]{2fb25fc5-0445-44fa-a23e-647d14b1a376-4_462_793_1464_644}
  1. Calculate the product moment correlation coefficient for the data, and comment briefly on your answer with reference to the appearance of the scatter diagram.
  2. Calculate the equation of the regression line of maximum tidal current on tidal range.
  3. Estimate the maximum tidal current on a day when the tidal range is 4.2 m, and comment briefly on how reliable you consider your estimate is likely to be.
  4. It is suggested that the equation found in part 2 could be used to predict the maximum tidal current on a day when the tidal range is 15 m. Comment briefly on the validity of this suggestion.
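The correlation calculation in part 1 can be sketched in Python as follows, assuming the usual definition \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\). The helper name `pmcc_from_sums` is hypothetical; the inputs are the summary statistics given in the question, with \(n = 12\) taken from the 12 days of readings.

```python
import math


def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product moment correlation coefficient from summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Summary statistics from OCR S1 Specimen Q8
r = pmcc_from_sums(n=12, sum_x=43.3, sum_y=471.9,
                   sum_x2=164.69, sum_y2=20915.75, sum_xy=1837.78)
print(f"r = {r:.3f}")
```

A value of \(r\) near 1 is consistent with points lying close to a straight line of positive gradient, which is the comparison with the scatter diagram that part 1 asks for.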
Edexcel S1 2014 January Q3
3. Jean works for an insurance company. She randomly selects 8 people and records the price of their car insurance, \(\pounds p\), and the time, \(t\) years, since they passed their driving test. The data is shown in the table below.
| \(t\) | 10 | 13 | 17 | 18 | 22 | 24 | 25 | 27 |
|---|---|---|---|---|---|---|---|---|
| \(p\) | 720 | 650 | 430 | 490 | 500 | 390 | 280 | 300 |
$$\text { (You may use } \bar { t } = 19.5 , \bar { p } = 470 , S _ { t p } = - 6080 , S _ { t t } = 254 , S _ { p p } = 169200 \text { ) }$$
  1. On the graph below draw a scatter diagram for these data.
  2. Comment on the relationship between \(p\) and \(t\).
  3. Find the equation of the regression line of \(p\) on \(t\).
  4. Use your regression equation to estimate the price of car insurance for someone who passed their driving test 20 years ago.
    Jack passed his test 39 years ago and decides to use Jean's data to predict the price of his car insurance.
  5. Comment on Jack's decision. Give a reason for your answer.
    \includegraphics[max width=\textwidth, alt={}, center]{a839a89a-17f0-473b-ac10-bcec3dbe97f7-06_951_1365_1603_294}
Edexcel S1 2014 June Q1
  1. A medical researcher is studying the relationship between age ( \(x\) years) and volume of blood ( \(y \mathrm { ml }\) ) pumped by each contraction of the heart. The researcher obtained the following data from a random sample of 8 patients.
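| Age (\(x\)) | 20 | 25 | 30 | 45 | 55 | 60 | 65 | 70 |
|---|---|---|---|---|---|---|---|---|
| Volume (\(y\)) | 74 | 76 | 77 | 72 | 68 | 67 | 64 | 62 |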
[You may use \(\sum x = 370 , \mathrm {~S} _ { x x } = 2587.5 , \sum y = 560 , \sum y ^ { 2 } = 39418 , \mathrm {~S} _ { x y } = - 710\) ]
  1. Calculate \(\mathrm { S } _ { y y }\)
  2. Calculate the product moment correlation coefficient for these data.
  3. Interpret your value of the correlation coefficient.
    The researcher believes that a linear regression model may be appropriate to describe these data.
  4. State, giving a reason, whether or not your value of the correlation coefficient supports the researcher's belief.
  5. Find the equation of the regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
    Jack is a 40-year-old patient.
    1. Use your regression line to estimate the volume of blood pumped by each contraction of Jack's heart.
    2. Comment, giving a reason, on the reliability of your estimate.
Edexcel S1 2004 January Q1
  1. An office has the heating switched on at 7.00 a.m. each morning. On a particular day, the temperature of the office, \(t { } ^ { \circ } \mathrm { C }\), was recorded \(m\) minutes after 7.00 a.m. The results are shown in the table below.
| \(m\) | 0 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|
| \(t\) | 6.0 | 8.9 | 11.8 | 13.5 | 15.3 | 16.1 |
  1. Calculate the exact values of \(S _ { m t }\) and \(S _ { m m }\).
  2. Calculate the equation of the regression line of \(t\) on \(m\) in the form \(t = a + b m\).
  3. Use your equation to estimate the value of \(t\) at 7.35 a.m.
  4. State, giving a reason, whether or not you would use the regression equation in part 2 to estimate the temperature
    1. at 9.00 a.m. that day,
    2. at 7.15 a.m. one month later.
OCR S1 2009 January Q2
2 The table shows the age, \(x\) years, and the mean diameter, \(y \mathrm {~cm}\), of the trunk of each of seven randomly selected trees of a certain species.
| Age (\(x\) years) | 11 | 12 | 20 | 28 | 35 | 45 | 51 |
|---|---|---|---|---|---|---|---|
| Mean trunk diameter (\(y \mathrm{~cm}\)) | 12.2 | 16.0 | 26.4 | 39.2 | 39.6 | 51.3 | 60.6 |
$$\left[ n = 7 , \Sigma x = 202 , \Sigma y = 245.3 , \Sigma x ^ { 2 } = 7300 , \Sigma y ^ { 2 } = 10510.65 , \Sigma x y = 8736.9 . \right]$$
  1. (a) Use an appropriate formula to show that the gradient of the regression line of \(y\) on \(x\) is 1.13, correct to 2 decimal places.
    (b) Find the equation of the regression line of \(y\) on \(x\).
  2. Use your equation to estimate the mean trunk diameter of a tree of this species with age
    (a) 30 years,
    (b) 100 years.
    It is given that the value of the product moment correlation coefficient for the data in the table is 0.988, correct to 3 decimal places.
  3. Comment on the reliability of each of your two estimates.
OCR MEI S2 2010 January Q1
1 A pilot records the take-off distance for his light aircraft on runways at various altitudes. The data are shown in the table below, where \(a\) metres is the altitude and \(t\) metres is the take-off distance. Also shown are summary statistics for these data.
| \(a\) | 0 | 300 | 600 | 900 | 1200 | 1500 | 1800 |
|---|---|---|---|---|---|---|---|
| \(t\) | 635 | 704 | 776 | 836 | 923 | 1008 | 1105 |
$$n = 7 \quad \Sigma a = 6300 \quad \Sigma t = 5987 \quad \Sigma a ^ { 2 } = 8190000 \quad \Sigma t ^ { 2 } = 5288931 \quad \Sigma a t = 6037800$$
  1. Draw a scatter diagram to illustrate these data.
  2. State which of the two variables \(a\) and \(t\) is the independent variable and which is the dependent variable. Briefly explain your answer.
  3. Calculate the equation of the regression line of \(t\) on \(a\).
  4. Use the equation of the regression line to calculate estimates of the take-off distance for altitudes
    (A) 800 metres,
    (B) 2500 metres.
    Comment on the reliability of each of these estimates.
  5. Calculate the value of the residual for the data point where \(a = 1200\) and \(t = 923\), and comment on its sign.
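A hedged Python sketch of the residual calculation in part 5, assuming the convention residual = observed value minus value predicted by the regression line of \(t\) on \(a\). Variable names are illustrative; the sums are the summary statistics given in the question.

```python
# Summary statistics from OCR MEI S2 2010 January Q1
n = 7
sum_a, sum_t = 6300, 5987
sum_a2, sum_at = 8190000, 6037800

# Regression line of t on a: t = intercept + slope * a
s_aa = sum_a2 - sum_a ** 2 / n
s_at = sum_at - sum_a * sum_t / n
slope = s_at / s_aa
intercept = sum_t / n - slope * sum_a / n

# Residual at the data point (a, t) = (1200, 923)
observed = 923
fitted = intercept + slope * 1200
residual = observed - fitted
print(f"fitted = {fitted:.1f}, residual = {residual:.1f}")
```

Under this convention a negative residual means the observed take-off distance lies below the regression line at that altitude.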
OCR MEI S2 2015 June Q1
1 A random sample of wheat seedlings is planted and their growth is measured. The table shows their average growth, \(y \mathrm {~mm}\), at half-day intervals.
| Time \(t\) days | 0 | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 |
|---|---|---|---|---|---|---|---|
| Average growth \(y \mathrm{~mm}\) | 0 | 7 | 21 | 33 | 45 | 56 | 62 |
  1. Draw a scatter diagram to illustrate these data.
  2. Calculate the equation of the regression line of \(y\) on \(t\).
  3. Calculate the value of the residual for the data point at which \(t = 2\).
  4. Use the equation of the regression line to calculate an estimate of the average growth after 5 days for wheat seedlings. Comment on the reliability of this estimate.
    It is suggested that it would be better to replace the regression line by a line which passes through the origin. You are given that the equation of such a line is \(y = a t\), where \(a = \frac { \sum y t } { \sum t ^ { 2 } }\).
  5. Find the equation of this line and plot the line on your scatter diagram.
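The through-origin formula quoted in the question, \(a = \sum yt / \sum t^2\), can be evaluated directly. This is a small Python sketch, assuming the seven data pairs are read from the table as \(t = 0, 0.5, \dots, 3\) and \(y = 0, 7, 21, 33, 45, 56, 62\); variable names are illustrative.

```python
# Data pairs from OCR MEI S2 2015 June Q1 (as read from the table)
t = [0, 0.5, 1, 1.5, 2, 2.5, 3]
y = [0, 7, 21, 33, 45, 56, 62]

sum_yt = sum(ti * yi for ti, yi in zip(t, y))
sum_t2 = sum(ti ** 2 for ti in t)

# Gradient of the line y = a*t constrained to pass through the origin
a = sum_yt / sum_t2
print(f"y = {a:.2f} t")
```

Unlike the ordinary regression line, this line has no intercept term, so it is forced to predict zero growth at \(t = 0\).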
CAIE FP2 2009 June Q7
7 An experiment was carried out to determine how much weedkiller to apply per \(100 \mathrm {~m} ^ { 2 }\) in a large field. Ten \(100 \mathrm {~m} ^ { 2 }\) areas of the field were randomly chosen and sprayed with predetermined volumes of the weedkiller. The volume of the weedkiller is denoted by \(x\) litres and the number of weeds that survived is denoted by \(y\). The results are given in the table.
| \(x\) | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50 | 0.55 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 48 | 40 | 44 | 35 | 39 | 24 | 10 | 13 | 9 | 6 |
$$\left[ \Sigma x = 3.25 , \Sigma x ^ { 2 } = 1.2625 , \Sigma y = 268 , \Sigma y ^ { 2 } = 9548 , \Sigma x y = 66.10 . \right]$$
It is given that the product moment correlation coefficient for the data is -0.951, correct to 3 decimal places.
  1. Calculate the equation of a suitable regression line, giving a reason for your choice of line.
  2. Estimate the best volume of weedkiller to apply, and comment on the reliability of your estimate.
CAIE FP2 2012 November Q10
10 Delegates who travelled to a conference were asked to report the distance, \(y \mathrm {~km}\), that they had travelled and the time taken, \(x\) minutes. The values reported by a random sample of 8 delegates are given in the following table.
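| Delegate | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
|---|---|---|---|---|---|---|---|---|
| \(x\) | 90 | 46 | 72 | 98 | 52 | 65 | 105 | 82 |
| \(y\) | 90 | 55 | 69 | 85 | 45 | 50 | 110 | 74 |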
$$\left[ \Sigma x = 610 , \Sigma x ^ { 2 } = 49682 , \Sigma y = 578 , \Sigma y ^ { 2 } = 45212 , \Sigma x y = 47136 . \right]$$
Find the equations of the regression lines of \(y\) on \(x\) and of \(x\) on \(y\). Estimate the time taken by a delegate who travelled 100 km to the conference. Calculate the product moment correlation coefficient for this sample.
CAIE FP2 2014 November Q9
9 A random sample of 10 pairs of values of \(x\) and \(y\) is given in the following table.
\(x\)466827121495
\(y\)24686109865
  1. Find the equation of the regression line of \(y\) on \(x\).
  2. Find the product moment correlation coefficient for the sample.
  3. Find the estimated value of \(y\) when \(x = 10\), and comment on the reliability of this estimate.
  4. Another sample of \(N\) pairs of data from the same population has the same product moment correlation coefficient as the first sample given. A test, at the \(1 \%\) significance level, on this second sample indicates that there is sufficient evidence to conclude that there is positive correlation. Find the set of possible values of \(N\).
Edexcel S1 Specimen Q6
  1. A travel agent sells flights to different destinations from Beerow airport. The distance \(d\), measured in 100 km, of the destination from the airport and the fare \(\pounds f\) are recorded for a random sample of 6 destinations.
| Destination | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) |
|---|---|---|---|---|---|---|
| \(d\) | 2.2 | 4.0 | 6.0 | 2.5 | 8.0 | 5.0 |
| \(f\) | 18 | 20 | 25 | 23 | 32 | 28 |
$$\text { [You may use } \sum d ^ { 2 } = 152.09 \quad \sum f ^ { 2 } = 3686 \quad \sum f d = 723.1 \text { ] }$$
  1. Using the axes below, complete a scatter diagram to illustrate this information.
  2. Explain why a linear regression model may be appropriate to describe the relationship between \(f\) and \(d\).
  3. Calculate \(S _ { d d }\) and \(S _ { f d }\)
  4. Calculate the equation of the regression line of \(f\) on \(d\) giving your answer in the form \(f = a + b d\).
  5. Give an interpretation of the value of \(b\).
    Jane is planning her holiday and wishes to fly from Beerow airport to a destination \(t \mathrm {~km}\) away. A rival travel agent charges 5p per km.
  6. Find the range of values of \(t\) for which the first travel agent is cheaper than the rival.
    \includegraphics[max width=\textwidth, alt={}, center]{61983561-79f7-4883-8ae7-ab1f4955d444-20_967_1630_1722_164}
Edexcel S1 2001 January Q6
6. A local authority is investigating the cost of reconditioning its incinerators. Data from 10 randomly chosen incinerators were collected. The variables monitored were the operating time \(x\) (in thousands of hours) since last reconditioning and the reconditioning cost \(y\) (in \(\pounds 1000\) ). None of the incinerators had been used for more than 3000 hours since last reconditioning. The data are summarised below, $$\Sigma x = 25.0 , \Sigma x ^ { 2 } = 65.68 , \Sigma y = 50.0 , \Sigma y ^ { 2 } = 260.48 , \Sigma x y = 130.64 .$$
  1. Find \(\mathrm { S } _ { x x } , \mathrm {~S} _ { x y } , \mathrm {~S} _ { y y }\).
  2. Calculate the product moment correlation coefficient between \(x\) and \(y\).
  3. Explain why this value might support the fitting of a linear regression model of the form \(y = a + b x\).
  4. Find the values of \(a\) and \(b\).
  5. Give an interpretation of \(a\).
  6. Estimate
    1. the reconditioning cost for an operating time of 2400 hours,
    2. the financial effect of an increase of 1500 hours in operating time.
  7. Suggest why the authority might be cautious about making a prediction of the reconditioning cost of an incinerator which had been operating for 4500 hours since its last reconditioning.
Edexcel S1 2002 January Q7
7. A number of people were asked to guess the calorific content of 10 foods. The mean \(s\) of the guesses for each food and the true calorific content \(t\) are given in the table below.
| Food | \(t\) | \(s\) |
|---|---|---|
| Packet of biscuits | 170 | 420 |
| 1 potato | 90 | 160 |
| 1 apple | 80 | 110 |
| Crisp breads | 10 | 70 |
| Chocolate bar | 260 | 360 |
| 1 slice white bread | 75 | 135 |
| 1 slice brown bread | 60 | 115 |
| Portion of beef curry | 270 | 350 |
| Portion of rice pudding | 165 | 390 |
| Half a pint of milk | 160 | 200 |
[You may assume that \(\Sigma t = 1340 , \Sigma s = 2310 , \Sigma t s = 396775 , \Sigma t ^ { 2 } = 246050 , \Sigma s ^ { 2 } = 694650\).]
  1. Draw a scatter diagram, indicating clearly which is the explanatory (independent) and which is the response (dependent) variable.
  2. Calculate, to 3 significant figures, the product moment correlation coefficient for the above data.
  3. State, with a reason, whether or not the value of the product moment correlation coefficient changes if all the guesses are 50 calories higher than the values in the table.
    The mean of the guesses for the portion of rice pudding and for the packet of biscuits are outside the linear relation of the other eight foods.
  4. Find the equation of the regression line of \(s\) on \(t\) excluding the values for rice pudding and biscuits.
    [You may now assume that \(S _ { t s } = 72587 , S _ { t t } = 63671.875 , \bar { t } = 125.625 , \bar { s } = 187.5\).]
  5. Draw the regression line on your scatter diagram.
  6. State, with a reason, what the effect would be on the regression line of including the values for a portion of rice pudding and a packet of biscuits.
Edexcel S1 2003 January Q6
6. The chief executive of Rex cars wants to investigate the relationship between the number of new car sales and the amount of money spent on advertising. She collects data from company records on the number of new car sales, \(c\), and the cost of advertising each year, \(p\) (£000). The data are shown in the table below.
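| Year | Number of new car sales, \(c\) | Cost of advertising (£000), \(p\) |
|---|---|---|
| 1990 | 4240 | 120 |
| 1991 | 4380 | 126 |
| 1992 | 4420 | 132 |
| 1993 | 4440 | 134 |
| 1994 | 4430 | 137 |
| 1995 | 4520 | 144 |
| 1996 | 4590 | 148 |
| 1997 | 4660 | 150 |
| 1998 | 4700 | 153 |
| 1999 | 4790 | 158 |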
  1. Using the coding \(x = ( p - 100 )\) and \(y = \frac { 1 } { 10 } ( c - 4000 )\), draw a scatter diagram to represent these data. Explain why \(x\) is the explanatory variable.
  2. Find the equation of the least squares regression line of \(y\) on \(x\). $$\text { [Use } \left. \Sigma x = 402 , \Sigma y = 517 , \Sigma x ^ { 2 } = 17538 \text { and } \Sigma x y = 22611 . \right]$$
  3. Deduce the equation of the least squares regression line of \(c\) on \(p\) in the form \(c = a + b p\).
  4. Interpret the value of \(a\).
  5. Predict the number of extra new car sales for an increase of \(\pounds 2000\) in advertising budget. Comment on the validity of your answer.
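One way to see the deduction in part 3 is to substitute the coding into the coded regression line and rearrange. The sketch below assumes the line found in part 2 is written \(y = a' + b' x\); the primed symbols are introduced here only for illustration and do not appear in the question.

$$\frac { 1 } { 10 } ( c - 4000 ) = a' + b' ( p - 100 ) \quad \Longrightarrow \quad c = \left( 4000 + 10 a' - 1000 b' \right) + 10 b' p$$

Comparing with \(c = a + b p\) gives \(a = 4000 + 10 a' - 1000 b'\) and \(b = 10 b'\), so the gradient is scaled by the factor of 10 in the coding of \(c\), while the shift of 100 in \(p\) affects only the intercept.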
Edexcel S1 2008 January Q4
4. A second hand car dealer has 10 cars for sale. She decides to investigate the link between the age of the cars, \(x\) years, and the mileage, \(y\) thousand miles. The data collected from the cars are shown in the table below.
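| Age, \(x\) (years) | 2 | 2.5 | 3 | 4 | 4.5 | 4.5 | 5 | 3 | 6 | 6.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mileage, \(y\) (thousands) | 22 | 34 | 33 | 37 | 40 | 45 | 49 | 30 | 58 | 58 |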
[You may assume that \(\sum x = 41 , \sum y = 406 , \sum x ^ { 2 } = 188 , \sum x y = 1818.5\) ]
  1. Find \(S _ { x x }\) and \(S _ { x y }\).
  2. Find the equation of the least squares regression line in the form \(y = a + b x\). Give the values of \(a\) and \(b\) to 2 decimal places.
  3. Give a practical interpretation of the slope \(b\).
  4. Using your answer to part 2, find the mileage predicted by the regression line for a 5 year old car.
Edexcel S1 2009 January Q1
  1. A teacher is monitoring the progress of students using a computer based revision course. The improvement in performance, \(y\) marks, is recorded for each student along with the time, \(x\) hours, that the student spent using the revision course. The results for a random sample of 10 students are recorded below.
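| \(x\) (hours) | 1.0 | 3.5 | 4.0 | 1.5 | 1.3 | 0.5 | 1.8 | 2.5 | 2.3 | 3.0 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) (marks) | 5 | 30 | 27 | 10 | -3 | -5 | 7 | 15 | -10 | 20 |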
$$\text { [You may use } \sum x = 21.4 , \quad \sum y = 96 , \quad \sum x ^ { 2 } = 57.22 , \quad \sum x y = 313.7 \text { ] }$$
  1. Calculate \(S _ { x x }\) and \(S _ { x y }\).
  2. Find the equation of the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\).
  3. Give an interpretation of the gradient of your regression line.
    Rosemary spends 3.3 hours using the revision course.
  4. Predict her improvement in marks.
    Lee spends 8 hours using the revision course, claiming that this should give him an improvement in performance of over 60 marks.
  5. Comment on Lee's claim.
Edexcel S1 2010 January Q6
  1. The blood pressures, \(p\) mmHg, and the ages, \(t\) years, of 7 hospital patients are shown in the table below.
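| Patient | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| \(t\) | 42 | 74 | 48 | 35 | 56 | 26 | 60 |
| \(p\) | 98 | 130 | 120 | 88 | 182 | 80 | 135 |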
$$\left[ \sum t = 341 , \sum p = 833 , \sum t ^ { 2 } = 18181 , \sum p ^ { 2 } = 106397 , \sum t p = 42948 \right]$$
  1. Find \(S _ { p p } , S _ { t p }\) and \(S _ { t t }\) for these data.
  2. Calculate the product moment correlation coefficient for these data.
  3. Interpret the correlation coefficient.
  4. On the graph paper on page 17, draw the scatter diagram of blood pressure against age for these 7 patients.
  5. Find the equation of the regression line of \(p\) on \(t\).
  6. Plot your regression line on your scatter diagram.
  7. Use your regression line to estimate the blood pressure of a 40 year old patient.
    \includegraphics[alt={},max width=\textwidth]{a0058e3c-046f-4271-aee4-33a74c719e2a-12_2071_1729_386_157}
Edexcel S1 2013 January Q3
3. A biologist is comparing the intervals ( \(m\) seconds) between the mating calls of a certain species of tree frog and the surrounding temperature ( \(t { } ^ { \circ } \mathrm { C }\) ). The following results were obtained.
| \(t { } ^ { \circ } \mathrm { C }\) | 8 | 13 | 14 | 15 | 15 | 20 | 25 | 30 |
|---|---|---|---|---|---|---|---|---|
| \(m\) secs | 6.5 | 4.5 | 6 | 5 | 4 | 3 | 2 | 1 |
$$\text { (You may use } \sum t m = 469.5 , \quad \mathrm {~S} _ { t t } = 354 , \quad \mathrm {~S} _ { m m } = 25.5 \text { ) }$$
  1. Show that \(\mathrm { S } _ { t m } = - 90.5\)
  2. Find the equation of the regression line of \(m\) on \(t\) giving your answer in the form \(m = a + b t\).
  3. Use your regression line to estimate the time interval between mating calls when the surrounding temperature is \(10 ^ { \circ } \mathrm { C }\).
  4. Comment on the reliability of this estimate, giving a reason for your answer.
Edexcel S1 2001 June Q7
7. A music teacher monitored the sight-reading ability of one of her pupils over a 10 week period. At the end of each week, the pupil was given a new piece to sight-read and the teacher noted the number of errors \(y\). She also recorded the number of hours \(x\) that the pupil had practised each week. The data are shown in the table below.
| \(x\) | 12 | 15 | 7 | 11 | 1 | 8 | 4 | 6 | 9 | 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 8 | 4 | 13 | 8 | 18 | 12 | 15 | 14 | 12 | 16 |
  1. Plot these data on a scatter diagram.
  2. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\). $$\text { (You may use } \left. \Sigma x ^ { 2 } = 746 , \Sigma x y = 749 . \right)$$
  3. Give an interpretation of the slope and the intercept of your regression line.
  4. State whether or not you think the regression model is reasonable
    1. for the range of \(x\)-values given in the table,
    2. for all possible \(x\)-values.
    In each case justify your answer either by giving a reason for accepting the model or by suggesting an alternative model.
Edexcel S1 2002 June Q7
7. An ice cream seller believes that there is a relationship between the temperature on a summer day and the number of ice creams sold. Over a period of 10 days he records the temperature at 1 p.m., \(t ^ { \circ } \mathrm { C }\), and the number of ice creams sold, \(c\), in the next hour. The data he collects is summarised in the table below.
| \(t\) | \(c\) |
|---|---|
| 13 | 24 |
| 22 | 55 |
| 17 | 35 |
| 20 | 45 |
| 10 | 20 |
| 15 | 30 |
| 19 | 39 |
| 12 | 19 |
| 18 | 36 |
| 23 | 54 |
[Use \(\left. \Sigma t ^ { 2 } = 3025 , \Sigma c ^ { 2 } = 14245 , \Sigma c t = 6526 .\right]\)
  1. Calculate the value of the product moment correlation coefficient between \(t\) and \(c\).
  2. State whether or not your value supports the use of a regression equation to predict the number of ice creams sold. Give a reason for your answer.
  3. Find the equation of the least squares regression line of \(c\) on \(t\) in the form \(c = a + b t\).
  4. Interpret the value of \(b\).
  5. Estimate the number of ice creams sold between 1 p.m. and 2 p.m. when the temperature at 1 p.m. is \(16 ^ { \circ } \mathrm { C }\).
  6. At 1 p.m. on a particular day, the highest temperature for 50 years was recorded. Give a reason why you should not use the regression equation to predict ice cream sales on that day.
Edexcel S1 2004 June Q2
2. A researcher thinks there is a link between a person's height and level of confidence. She measured the height \(h\), to the nearest cm, of a random sample of 9 people. She also devised a test to measure the level of confidence \(c\) of each person. The data are shown in the table below.
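| \(h\) | 179 | 169 | 187 | 166 | 162 | 193 | 161 | 177 | 168 |
|---|---|---|---|---|---|---|---|---|---|
| \(c\) | 569 | 561 | 579 | 561 | 540 | 598 | 542 | 565 | 573 |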
[You may use \(\Sigma h ^ { 2 } = 272094 , \Sigma c ^ { 2 } = 2878966 , \Sigma h c = 884484\) ]
  1. Draw a scatter diagram to illustrate these data.
  2. Find exact values of \(S _ { h c }\), \(S _ { h h }\) and \(S _ { c c }\).
  3. Calculate the value of the product moment correlation coefficient for these data.
  4. Give an interpretation of your correlation coefficient.
  5. Calculate the equation of the regression line of \(c\) on \(h\) in the form \(c = a + b h\).
  6. Estimate the level of confidence of a person of height 180 cm.
  7. State the range of values of \(h\) for which estimates of \(c\) are reliable.
Edexcel S1 2006 June Q3
  1. A metallurgist measured the length, \(l \mathrm {~mm}\), of a copper rod at various temperatures, \(t ^ { \circ } \mathrm { C }\), and recorded the following results.
| \(t\) | \(l\) |
|---|---|
| 20.4 | 2461.12 |
| 27.3 | 2461.41 |
| 32.1 | 2461.73 |
| 39.0 | 2461.88 |
| 42.9 | 2462.03 |
| 49.7 | 2462.37 |
| 58.3 | 2462.69 |
| 67.4 | 2463.05 |
The results were then coded such that \(x = t\) and \(y = l - 2460.00\).
  1. Calculate \(S _ { x y }\) and \(S _ { x x }\).
    (You may use \(\Sigma x ^ { 2 } = 15965.01\) and \(\Sigma x y = 757.467\) )
  2. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\).
  3. Estimate the length of the rod at \(40 ^ { \circ } \mathrm { C }\).
  4. Find the equation of the regression line of \(l\) on \(t\).
  5. Estimate the length of the rod at \(90 ^ { \circ } \mathrm { C }\).
  6. Comment on the reliability of your estimate in part 5.
Edexcel S1 2007 June Q3
3. A student is investigating the relationship between the price ( \(y\) pence) of 100 g of chocolate and the percentage ( \(x \%\) ) of cocoa solids in the chocolate. The following data is obtained.
| Chocolate brand | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
|---|---|---|---|---|---|---|---|---|
| \(x\) (\% cocoa) | 10 | 20 | 30 | 35 | 40 | 50 | 60 | 70 |
| \(y\) (pence) | 35 | 55 | 40 | 100 | 60 | 90 | 110 | 130 |
(You may use: \(\sum x = 315 , \sum x ^ { 2 } = 15225 , \sum y = 620 , \sum y ^ { 2 } = 56550 , \sum x y = 28750\) )
  1. On the graph paper on page 9 draw a scatter diagram to represent these data.
  2. Show that \(S _ { x y } = 4337.5\) and find \(S _ { x x }\).
    The student believes that a linear relationship of the form \(y = a + b x\) could be used to describe these data.
  3. Use linear regression to find the value of \(a\) and the value of \(b\), giving your answers to 1 decimal place.
  4. Draw the regression line on your scatter diagram.
    The student believes that one brand of chocolate is overpriced.
  5. Use the scatter diagram to
    1. state which brand is overpriced,
    2. suggest a fair price for this brand.
    Give reasons for both your answers.
      \includegraphics[max width=\textwidth, alt={}]{045e10d2-1766-4399-aa0a-5619dd0cce0f-06_2454_1485_282_228}
Edexcel S1 2009 June Q5
5. The weight, \(w\) grams, and the length, \(l \mathrm {~mm}\), of 10 randomly selected newborn turtles are given in the table below.
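| \(l\) | 49.0 | 52.0 | 53.0 | 54.5 | 54.1 | 53.4 | 50.0 | 51.6 | 49.5 | 51.2 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(w\) | 29 | 32 | 34 | 39 | 38 | 35 | 30 | 31 | 29 | 30 |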
$$\text { (You may use } \mathrm { S } _ { l l } = 33.381 \quad \mathrm {~S} _ { w l } = 59.99 \quad \mathrm {~S} _ { w w } = 120.1 \text { ) }$$
  1. Find the equation of the regression line of \(w\) on \(l\) in the form \(w = a + b l\).
  2. Use your regression line to estimate the weight of a newborn turtle of length 60 mm.
  3. Comment on the reliability of your estimate giving a reason for your answer.