5.09c Calculate regression line

235 questions

CAIE FP2 2012 November Q8
11 marks Moderate -0.8
8 The yield of a particular crop on a farm is thought to depend principally on the amount of sunshine during the growing season. For a random sample of 8 years, the average yield, \(y\) kilograms per square metre, and the average amount of sunshine per day, \(x\) hours, are recorded. The results are given in the following table.
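$$\begin{array} { c | c c c c c c c c } x & 12.2 & 10.4 & 5.2 & 6.3 & 11.8 & 10.0 & 14.2 & 2.3 \\ \hline y & 15 & 9 & 10 & 7 & 8 & 11 & 12 & 6 \end{array}$$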
$$\left[ \Sigma x = 72.4 , \Sigma x ^ { 2 } = 769.9 , \Sigma y = 78 , \Sigma y ^ { 2 } = 820 , \Sigma x y = 761.3 . \right]$$
  1. Find the equation of the regression line of \(y\) on \(x\).
  2. Find the product moment correlation coefficient.
  3. Test, at the \(5 \%\) significance level, whether there is positive correlation between the average yield and the average amount of sunshine per day.
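As a rough check on parts 1 and 2, the summary statistics can be turned into the regression line and the correlation coefficient using \(S_{xx}\), \(S_{xy}\) and \(S_{yy}\). This is only a sketch: the helper name `regression_from_sums` is hypothetical and is not part of the question.

```python
import math

def regression_from_sums(n, sx, sxx, sy, syy, sxy):
    """Return (a, b, r) for the line y = a + b*x, given raw sums (hypothetical helper)."""
    S_xx = sxx - sx * sx / n          # corrected sum of squares of x
    S_yy = syy - sy * sy / n          # corrected sum of squares of y
    S_xy = sxy - sx * sy / n          # corrected sum of products
    b = S_xy / S_xx                   # gradient of y on x
    a = sy / n - b * sx / n           # line passes through (x_bar, y_bar)
    r = S_xy / math.sqrt(S_xx * S_yy) # product moment correlation coefficient
    return a, b, r

# Sums quoted in the question, with n = 8 years.
a, b, r = regression_from_sums(8, 72.4, 769.9, 78, 820, 761.3)
print(f"y = {a:.3f} + {b:.3f} x, r = {r:.3f}")
```

The hypothesis test in part 3 still needs a critical value from the tables supplied with the paper, so it is not attempted here.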
CAIE FP2 2012 November Q10
10 marks Moderate -0.3
10 Delegates who travelled to a conference were asked to report the distance, \(y \mathrm {~km}\), that they had travelled and the time taken, \(x\) minutes. The values reported by a random sample of 8 delegates are given in the following table.
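$$\begin{array} { c | c c c c c c c c } \text { Delegate } & A & B & C & D & E & F & G & H \\ \hline x & 90 & 46 & 72 & 98 & 52 & 65 & 105 & 82 \\ \hline y & 90 & 55 & 69 & 85 & 45 & 50 & 110 & 74 \end{array}$$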
$$\left[ \Sigma x = 610 , \Sigma x ^ { 2 } = 49682 , \Sigma y = 578 , \Sigma y ^ { 2 } = 45212 , \Sigma x y = 47136 . \right]$$ Find the equations of the regression lines of \(y\) on \(x\) and of \(x\) on \(y\). Estimate the time taken by a delegate who travelled 100 km to the conference. Calculate the product moment correlation coefficient for this sample.
CAIE FP2 2013 November Q9
11 marks Standard +0.3
9 For a random sample of 10 observations of pairs of values \(( x , y )\), the equations of the regression lines of \(y\) on \(x\) and of \(x\) on \(y\) are $$y = 4.21 x - 0.862 \quad \text { and } \quad x = 0.043 y + 6.36$$ respectively.
  1. Find the value of the product moment correlation coefficient for the sample.
  2. Test, at the \(10 \%\) significance level, whether there is evidence of non-zero correlation between the variables.
  3. Find the mean values of \(x\) and \(y\) for this sample.
  4. Estimate the value of \(x\) when \(y = 2.3\) and comment on the reliability of your answer.
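Parts 1 and 3 rest on two standard facts: the product of the two regression gradients equals \(r^2\), and both lines pass through \(( \bar { x } , \bar { y } )\). A minimal sketch of that arithmetic, with variable names chosen here purely for illustration:

```python
import math

b_yx, a_yx = 4.21, -0.862   # y = 4.21x - 0.862
b_xy, a_xy = 0.043, 6.36    # x = 0.043y + 6.36

# r^2 is the product of the two gradients; r takes their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Both lines pass through (x_bar, y_bar): substitute the first line into the second.
x_bar = (b_xy * a_yx + a_xy) / (1 - b_xy * b_yx)
y_bar = b_yx * x_bar + a_yx

print(round(r, 3), round(x_bar, 2), round(y_bar, 2))
```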
CAIE FP2 2013 November Q10
11 marks Standard +0.3
10 The lengths, \(x \mathrm {~m}\), and masses, \(y \mathrm {~kg}\), of 12 randomly chosen babies born at a particular hospital last year are summarised as follows. $$\Sigma x = 7.50 \quad \Sigma x ^ { 2 } = 4.73 \quad \Sigma y = 38.6 \quad \Sigma y ^ { 2 } = 124.84 \quad \Sigma x y = 24.25$$ Find the value of the product moment correlation coefficient for this sample. Obtain an estimate for the mass of a baby, born last year at the hospital, whose length is 0.64 m . Test, at the \(2 \%\) significance level, whether there is non-zero correlation between the two variables.
CAIE FP2 2014 November Q9
11 marks Standard +0.8
9 A random sample of 10 pairs of values of \(x\) and \(y\) is given in the following table.
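$$\begin{array} { c | c c c c c c c c c c } x & 4 & 6 & 6 & 8 & 2 & 7 & 12 & 14 & 9 & 5 \\ \hline y & 2 & 4 & 6 & 8 & 6 & 10 & 9 & 8 & 6 & 5 \end{array}$$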
  1. Find the equation of the regression line of \(y\) on \(x\).
  2. Find the product moment correlation coefficient for the sample.
  3. Find the estimated value of \(y\) when \(x = 10\), and comment on the reliability of this estimate.
  4. Another sample of \(N\) pairs of data from the same population has the same product moment correlation coefficient as the first sample given. A test, at the \(1 \%\) significance level, on this second sample indicates that there is sufficient evidence to conclude that there is positive correlation. Find the set of possible values of \(N\).
CAIE FP2 2016 November Q10 OR
Challenging +1.2
For a random sample, \(A\), of 5 pairs of values of \(x\) and \(y\), the equations of the regression lines of \(y\) on \(x\) and \(x\) on \(y\) are respectively \(y = 4.5 + 0.3 x\) and \(x = 3 y - 13\). Four of the five pairs of data are given in the following table.
$$\begin{array} { c | c c c c } x & 1 & 5 & 7 & 9 \\ \hline y & 5 & 6 & 7 & 7 \end{array}$$
Find
  1. the fifth pair of values of \(x\) and \(y\),
  2. the value of the product moment correlation coefficient. A second random sample, \(B\), of 5 pairs of values of \(x\) and \(y\) is summarised as follows. $$\Sigma x = 20 \quad \Sigma x ^ { 2 } = 100 \quad \Sigma y = 17 \quad \Sigma y ^ { 2 } = 69 \quad \Sigma x y = 75$$ The two samples, \(A\) and \(B\), are combined to form a single random sample of size 10 .
  3. Use this combined sample to test, at the \(5 \%\) significance level, whether the population product moment correlation coefficient is different from zero.
CAIE FP2 2017 November Q11 OR
Moderate -0.3
A large number of people attended a course to improve the speed of their logical thinking. The times taken to complete a particular type of logic puzzle at the beginning of the course and at the end of the course are recorded for each person. The time taken, in minutes, at the beginning of the course is denoted by \(x\) and the time taken, in minutes, at the end of the course is denoted by \(y\). For a random sample of 9 people, the results are summarised as follows. $$\Sigma x = 45.3 \quad \Sigma x ^ { 2 } = 245.59 \quad \Sigma y = 40.5 \quad \Sigma y ^ { 2 } = 195.11 \quad \Sigma x y = 218.72$$ Ken attended the course, but his time to complete the puzzle at the beginning of the course was not recorded. His time to complete the puzzle at the end of the course was 4.2 minutes.
  1. By finding, showing all necessary working, the equation of a suitable regression line, find an estimate for the time that Ken would have taken to complete the puzzle at the beginning of the course.
    The values of \(x - y\) for the sample of 9 people are as follows. $$\begin{array} { l l l l l l l l l } 0.2 & 0.8 & 0.5 & 1.0 & 0.2 & 0.6 & 0.2 & 0.5 & 0.8 \end{array}$$ The organiser of the course believes that, on average, the time taken to complete the puzzle decreases between the beginning and the end of the course by more than 0.3 minutes.
  2. Stating suitable hypotheses and assuming a normal distribution, test the organiser's belief at the \(2 \frac { 1 } { 2 } \%\) significance level.
CAIE FP2 2017 Specimen Q9
11 marks Standard +0.8
9 A random sample of 8 students is chosen from those sitting examinations in both Mathematics and French. Their marks in Mathematics, \(x\), and in French, \(y\), are summarised as follows. $$\Sigma x = 472 \quad \Sigma x ^ { 2 } = 29950 \quad \Sigma y = 400 \quad \Sigma y ^ { 2 } = 21226 \quad \Sigma x y = 24879$$ Another student scored 72 marks in the Mathematics examination but was unable to sit the French examination.
  1. Estimate the mark that this student would have obtained in the French examination.
  2. Test, at the \(5 \%\) significance level, whether there is non-zero correlation between marks in Mathematics and marks in French.
OCR MEI S2 Q3
18 marks Standard +0.3
3 In a triathlon, competitors have to swim 600 metres, cycle 40 kilometres and run 10 kilometres. To improve her strength, a triathlete undertakes a training programme in which she carries weights in a rucksack whilst running. She runs a specific course and notes the total time taken for each run. Her coach is investigating the relationship between time taken and weight carried. The times taken with eight different weights are illustrated on the scatter diagram below, together with the summary statistics for these data. The variables \(x\) and \(y\) represent weight carried in kilograms and time taken in minutes respectively. \includegraphics[max width=\textwidth, alt={}, center]{d138173d-c70c-46db-b9b9-d5f19334c5f1-04_627_1536_630_281} Summary statistics: \(n = 8 , \Sigma x = 36 , \Sigma y = 214.8 , \Sigma x ^ { 2 } = 204 , \Sigma y ^ { 2 } = 5775.28 , \Sigma x y = 983.6\).
  1. Calculate the equation of the regression line of \(y\) on \(x\). On one of the eight runs, the triathlete was carrying 4 kilograms and took 27.5 minutes. On this run she was delayed when she tripped and fell over.
  2. Calculate the value of the residual for this weight.
  3. The coach decides to recalculate the equation of the regression line without the data for this run. Would it be preferable to use this recalculated equation or the equation found in part (i) to estimate the delay when the triathlete tripped and fell over? Explain your answer. The triathlete's coach claims that there is positive correlation between cycling and swimming times in triathlons. The product moment correlation coefficient of the times of twenty randomly selected competitors in these two sections is 0.209 .
  4. Carry out a hypothesis test at the \(5 \%\) level to examine the coach's claim, explaining your conclusions clearly.
  5. What distributional assumption is necessary for this test to be valid? How can you use a scatter diagram to decide whether this assumption is likely to be true?
Edexcel AS Paper 2 2019 June Q1
5 marks Easy -1.2
  1. A sixth form college has 84 students in Year 12 and 56 students in Year 13.
The head teacher selects a stratified sample of 40 students, stratified by year group.
  1. Describe how this sample could be taken. The head teacher is investigating the relationship between the amount of sleep, \(s\) hours, that each student had the night before they took an aptitude test and their performance in the test, \(p\) marks.
    For the sample of 40 students, he finds the equation of the regression line of \(p\) on \(s\) to be $$p = 26.1 + 5.60 s$$
  2. With reference to this equation, describe the effect that an extra 0.5 hours of sleep may have, on average, on a student's performance in the aptitude test.
  3. Describe one limitation of this regression model.
OCR MEI Paper 2 2020 November Q11
10 marks Moderate -0.8
11 The pre-release material contains information concerning median house prices over the period 2004-2015. A spreadsheet has been used to generate a time series graph for two areas: the London borough of "Barking and Dagenham" and "North West". This is shown together with the raw data in Fig. 11.1. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{cea67565-8074-4703-8e1a-09b98e380baf-12_572_1751_447_159} \captionsetup{labelformat=empty} \caption{Fig. 11.1}
\end{figure} Dr Procter suggests that it is unusual for median house prices in a London borough to be consistently higher than those in other parts of the country.
  1. Use your knowledge of the large data set to comment on Dr Procter's suggestion. Dr Procter wishes to predict the median house price in Barking and Dagenham in 2016. She uses the spreadsheet function LINEST to find the equation of the line of best fit for the given data. She obtains the equation \(P = 4897 Y - 9657847\), where \(P\) is the median house price in pounds and \(Y\) is the calendar year, for example 2015.
  2. Use Dr Procter's equation to predict the median house price in Barking and Dagenham in
    Professor Jackson uses a simpler model by using the data from 2014 and 2015 only to form a straight-line model.
  3. Find the equation Professor Jackson uses in her model.
  4. Use Professor Jackson's equation to predict the median house price in Barking and Dagenham in
    Professor Jackson carries out some research online. She finds some information about median house prices in Barking and Dagenham, which is shown in Fig. 11.2. \begin{table}[h]
    $$\begin{array} { c c } 2016 & 2017 \\ \hline \pounds 290000 & \pounds 300000 \end{array}$$
    \captionsetup{labelformat=empty} \caption{Fig. 11.2}
    \end{table}
  5. Comment on how well
OCR Further Statistics AS 2018 June Q7
8 marks Moderate -0.8
7 An environmentalist measures the mean concentration, \(c\) milligrams per litre, of a particular chemical in a group of rivers, and the mean mass, \(m\) pounds, of fish of a certain species found in those rivers. The results are given in the table.
$$\begin{array} { c | c c c c c c } c & 1.94 & 1.78 & 1.62 & 1.51 & 1.52 & 1.4 \\ \hline m & 6.5 & 7.2 & 7.4 & 7.6 & 8.3 & 9.7 \end{array}$$
  1. State which, if either, of \(m\) and \(c\) is an independent variable.
  2. Calculate the equation of the least squares regression line of \(c\) on \(m\).
  3. State what effect, if any, there would be on your answer to part (ii) if the masses of the fish had been recorded in kilograms rather than pounds. ( \(1 \mathrm {~kg} \approx 2.2\) pounds.)
  4. The data is illustrated in the scatter diagram. Explain what is meant by 'least squares', illustrating your answer using the copy of this diagram in the Printed Answer Booklet.
    [diagram]
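The unit-change idea in part 3 can be explored numerically. The sketch below fits \(c\) on \(m\) twice, once with masses in pounds and once in kilograms, using the data values as read from the table and the approximate conversion quoted in the question. The function name `fit` is an illustrative choice.

```python
def fit(xs, ys):
    """Least squares line ys = a + b*xs; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

c = [1.94, 1.78, 1.62, 1.51, 1.52, 1.4]
m_lb = [6.5, 7.2, 7.4, 7.6, 8.3, 9.7]
m_kg = [m / 2.2 for m in m_lb]   # 1 kg is taken as 2.2 pounds, as in the question

a_lb, b_lb = fit(m_lb, c)        # c on m, masses in pounds
a_kg, b_kg = fit(m_kg, c)        # c on m, masses in kilograms
print(a_lb, b_lb)
print(a_kg, b_kg)
```

Rescaling the explanatory variable leaves the intercept alone and multiplies the gradient by the conversion factor, which is what the two printed lines should show.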
OCR Further Statistics AS 2022 June Q1
8 marks Moderate -0.3
1 A geography student chose a certain point in a stream and took measurements of the speed of flow, \(v \mathrm {~ms} ^ { - 1 }\), of water at various depths, \(d \mathrm {~m}\), below the surface at that point. The results are shown in the table.
$$\begin{array} { c | c c c c c c c c c } d & 0.1 & 0.15 & 0.2 & 0.25 & 0.3 & 0.35 & 0.4 & 0.45 & 0.5 \\ \hline v & 0.8 & 0.5 & 0.7 & 1.2 & 1.1 & 1.3 & 1.6 & 1.4 & 0.4 \end{array}$$
\(n = 9 \quad \sum d = 2.7 \quad \sum v = 9.0 \quad \sum d ^ { 2 } = 0.96 \quad \sum v ^ { 2 } = 10.4 \quad \sum d v = 2.85\)
    1. Explain why \(d\) is an example of an independent, controlled variable.
    2. Use two relevant terms to describe the variable \(v\) in a similar way. A statistician believes that the point ( \(0.5,0.4\) ) may be an anomaly.
  1. Calculate the equation of the least squares regression line of \(v\) on \(d\) for all the points in the table apart from ( \(0.5,0.4\) ).
  2. Use the equation of the line found in part (b) to estimate the value of \(v\) when \(d = 0.5\).
  3. Use your answer to part (c) to comment on the statistician's belief.
  4. Use the diagram in the Printed Answer Booklet (which does not illustrate the data in this question) to explain what is meant by "least squares regression line".
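One way to check the line that omits the suspected anomaly is to drop the pair \(( 0.5 , 0.4 )\) from the tabulated data and refit. This is a sketch under the assumption that the table values are as listed in the code; the variable names are illustrative.

```python
d = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]
v = [0.8, 0.5, 0.7, 1.2, 1.1, 1.3, 1.6, 1.4, 0.4]

# Drop the suspected anomaly (0.5, 0.4) before fitting.
pairs = [(di, vi) for di, vi in zip(d, v) if (di, vi) != (0.5, 0.4)]
n = len(pairs)
d_bar = sum(di for di, _ in pairs) / n
v_bar = sum(vi for _, vi in pairs) / n
s_dv = sum((di - d_bar) * (vi - v_bar) for di, vi in pairs)
s_dd = sum((di - d_bar) ** 2 for di, _ in pairs)

b = s_dv / s_dd              # gradient of v on d
a = v_bar - b * d_bar        # intercept
v_at_half = a + b * 0.5      # estimate at the depth of the removed point
print(f"v = {a:.3f} + {b:.3f} d; at d = 0.5, v is about {v_at_half:.2f}")
```

Comparing the printed estimate with the recorded value 0.4 gives a numerical basis for the comment asked for in the part that follows the estimate.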
OCR Further Statistics AS 2023 June Q3
8 marks Standard +0.3
3 An insurance company collected data concerning the age, \(x\) years, of policy holders and the average size of claim, \(\pounds y\) thousand. The data is summarised as follows. \(n = 32 \quad \sum x = 1340 \quad \sum y = 612 \quad \sum x ^ { 2 } = 64282 \quad \sum y ^ { 2 } = 13418 \quad \sum x y = 27794\)
  1. Find the variance of \(x\).
  2. Find the equation of the regression line of \(y\) on \(x\).
  3. Hence estimate the expected size of claim from a policy holder of age 48. Tom is aged 48. He claims that the range of the data probably does not include people of his age because the mean age for the data is 41.875 , and 48 is not close to this.
  4. Use your answer to part (a) to determine how likely it is that Tom's claim is correct.
  5. Comment on the reliability of your estimate in part (c). You should refer to the value of the product-moment correlation coefficient for the data, which is 0.579 correct to 3 significant figures.
OCR Further Statistics AS 2024 June Q3
11 marks Standard +0.3
3 The ages, \(x\) years, and the reaction time, \(t\) seconds, in an experiment carried out on a sample of 15 volunteers are summarised as follows. \(n = 15 \quad \sum x = 762 \quad \sum t = 8.7 \quad \sum x ^ { 2 } = 44204 \quad \sum t ^ { 2 } = 5.65 \quad \sum x t = 490.1\)
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(t\).
  2. Calculate the equation of the line of regression of \(t\) on \(x\). Give your answer in the form \(t = a + b x\) where \(a\) and \(b\) are constants to be determined.
  3. Explain the relevance of the quantity \(\sum ( t - a - b x ) ^ { 2 }\) to your answer to part (b).
  4. Estimate the reaction time, in seconds, for a volunteer aged 42. It is subsequently decided to measure the reaction time in tenths of a second rather than in seconds (so, for example, a time of 0.6 seconds would now be recorded as 6 ).
    1. State what effect, if any, this change would have on your answer to part (a).
    2. State what effect, if any, this change would have on your answer to part (b). It is known that the sample of 15 volunteers consisted almost entirely of students and retired people.
  5. Using this information, and the value of the product moment correlation coefficient, comment on the reliability of your estimate in part (d).
OCR Further Statistics AS 2020 November Q3
9 marks Moderate -0.3
3 An investor obtains data about the profits of 8 randomly chosen investment accounts over two one-year periods. The profit in the first year for each account is \(p \%\) and the profit in the second year for each account is \(q \%\). The results are shown in the table and in the scatter diagram.
$$\begin{array} { c | c c c c c c c c } \text { Account } & A & B & C & D & E & F & G & H \\ \hline p & 1.6 & 2.1 & 2.4 & 2.7 & 2.8 & 3.3 & 5.2 & 8.4 \\ \hline q & 1.6 & 2.3 & 2.2 & 2.2 & 3.1 & 2.9 & 7.6 & 4.8 \end{array}$$
\(n = 8 \quad \sum p = 28.5 \quad \sum q = 26.7 \quad \sum p ^ { 2 } = 136.35 \quad \sum q ^ { 2 } = 116.35 \quad \sum p q = 116.70\) \includegraphics[max width=\textwidth, alt={}, center]{bf1468d1-e02e-47d2-bf41-5bc8f5b4d7c4-3_782_1280_998_242}
  1. State which, if either, of the variables \(p\) and \(q\) is independent.
  2. Calculate the equation of the regression line of \(q\) on \(p\).
    1. Use the regression line to estimate the value of \(q\) for an investment account for which \(p = 2.5\).
    2. Give two reasons why this estimate could be considered reliable.
  3. Comment on the reliability of using the regression line to predict the value of \(q\) when \(p = 7.0\).
OCR Further Statistics AS 2021 November Q3
7 marks Moderate -0.3
3
  1. Using the scatter diagram in the Printed Answer Booklet, explain what is meant by least squares in the context of a regression line of \(y\) on \(x\).
  2. A set of bivariate data \(( t , u )\) is summarised as follows. \(n = 5 \quad \sum t = 35 \quad \sum u = 54\) \(\sum t ^ { 2 } = 285 \quad \sum u ^ { 2 } = 758 \quad \sum t u = 460\)
    1. Calculate the equation of the regression line of \(u\) on \(t\).
    2. The variables \(t\) and \(u\) are now scaled using the following scaling. \(v = 2 t , w = u + 4\) Find the equation of the regression line of \(w\) on \(v\), giving your equation in the form \(w = f ( v )\).
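A worked sketch of the arithmetic for the bivariate data \(( t , u )\), assuming the usual definitions \(S_{tt} = \sum t^2 - ( \sum t )^2 / n\) and \(S_{tu} = \sum tu - \sum t \sum u / n\) (symbols introduced here, not in the question):

$$S_{tt} = 285 - \frac { 35 ^ { 2 } } { 5 } = 40 , \quad S_{tu} = 460 - \frac { 35 \times 54 } { 5 } = 82 , \quad b = \frac { 82 } { 40 } = 2.05 , \quad a = \frac { 54 } { 5 } - 2.05 \times \frac { 35 } { 5 } = - 3.55$$

so the line of \(u\) on \(t\) is \(u = - 3.55 + 2.05 t\). Because the scaling is linear, substituting \(t = v / 2\) and \(u = w - 4\) gives

$$w - 4 = - 3.55 + 2.05 \times \frac { v } { 2 } \quad \Rightarrow \quad w = 0.45 + 1.025 v$$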
OCR Further Statistics 2019 June Q1
5 marks Standard +0.3
1 A set of bivariate data ( \(X , Y\) ) is summarised as follows. \(n = 25 , \sum x = 9.975 , \sum y = 11.175 , \sum x ^ { 2 } = 5.725 , \sum y ^ { 2 } = 46.200 , \sum x y = 11.575\)
  1. Calculate the value of Pearson's product-moment correlation coefficient.
  2. Calculate the equation of the regression line of \(y\) on \(x\). It is desired to know whether the regression line of \(y\) on \(x\) will provide a reliable estimate of \(y\) when \(x = 0.75\).
  3. State one reason for believing that the estimate will be reliable.
  4. State what further information is needed in order to determine whether the estimate is reliable.
OCR Further Statistics 2023 June Q2
8 marks Standard +0.3
2 The director of a concert hall wishes to investigate if the price of the most expensive concert tickets affects attendance. The director collects data about the price, \(\pounds P\), of the most expensive tickets and the number of people in the audience, \(H\) hundred (rounded to the nearest hundred), for 20 concerts. For each price there are several different concerts. The results are shown in the table.
$$\begin{array} { c | c c c c c } P \text { (\pounds) } & 75 & 65 & 55 & 45 & 35 \\ \hline H \text { (hundred) } & 27 & 27 & 27 & 26 & 15 \\ & 27 & 27 & 20 & 21 & 12 \\ & & 22 & 18 & 16 & 9 \\ & & 19 & 18 & 13 & \\ & & 12 & 16 & 9 & \end{array}$$
\(n = 20 \quad \sum p = 1080 \quad \sum h = 381 \quad \sum p ^ { 2 } = 61300 \quad \sum h ^ { 2 } = 8011 \quad \sum p h = 21535\)
  1. Calculate the equation of the regression line of \(h\) on \(p\).
  2. State what change, if any, there would be to your answer to part (a) if \(H\) had been measured in thousands (to 1 decimal place) rather than in hundreds. For a special charity concert, the most expensive tickets cost \(\pounds 50\).
  3. Use your answer to part (b) to estimate the expected size of the audience for this concert. Give your answer correct to \(\mathbf { 1 }\) decimal place.
  4. Comment on the reliability of your answer to part (c). You should refer to
OCR Further Statistics 2024 June Q7
8 marks Standard +0.3
7 The coordinates of a set of 10 points are denoted by \(( x _ { i } , y _ { i } )\) for \(i = 1,2 , \ldots , 10\). For a particular set of values of \(( x _ { i } , y _ { i } )\) and any constants \(a\) and \(b\) it can be shown that \(\Sigma \left( y _ { i } - a - b x _ { i } \right) ^ { 2 } = 10 ( 11 - a - 6 b ) ^ { 2 } + 126 \left( b - \frac { 83 } { 42 } \right) ^ { 2 } + \frac { 139 } { 14 }\).
    1. Explain why \(\sum \left( y _ { i } - a - b x _ { i } \right) ^ { 2 }\) is minimised by taking \(b = \frac { 83 } { 42 }\) and \(a = 11 - 6 b\).
    2. Hence explain why the equation of the regression line of \(y\) on \(x\) for these points is given by the corresponding values of \(a\) and \(b\) (so that the equation is \(y = \frac { 83 } { 42 } x - \frac { 6 } { 7 }\) ).
  1. State which of the following terms cannot apply to the variable \(X\) if the regression line of \(y\) on \(x\) can be used for estimating values of \(Y\). Dependent Independent Controlled Response
  2. Use the regression line to estimate the value of \(y\) corresponding to \(x = 8\).
  3. State what must be true of the value \(x = 8\) if the estimate in part (c) is to be reliable.
  4. Variables \(u\) and \(v\) are related to \(x\) and \(y\) by the following relationships. \(u = 2 + 4 x \quad v = 8 - 2 y\) Show that the gradient of the regression line of \(v\) on \(u\) is very close to - 1 .
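A short sketch of why the gradient in the last part is close to \(- 1\), writing \(S_{xx}\) and \(S_{xy}\) for the corrected sums (notation assumed here). Adding constants does not change these sums, and multiplying by constants scales them:

$$S_{uu} = 4 ^ { 2 } S_{xx} = 16 S_{xx} , \qquad S_{uv} = ( 4 ) ( - 2 ) S_{xy} = - 8 S_{xy}$$

$$\frac { S_{uv} } { S_{uu} } = - \frac { 1 } { 2 } \cdot \frac { S_{xy} } { S_{xx} } = - \frac { 1 } { 2 } \times \frac { 83 } { 42 } = - \frac { 83 } { 84 } \approx - 0.988$$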
OCR Further Statistics 2021 November Q1
6 marks Standard +0.3
1 At a seaside resort the number \(X\) of ice-creams sold and the temperature \(Y ^ { \circ } \mathrm { F }\) were recorded on 20 randomly chosen summer days. The data can be summarised as follows. \(\sum x = 1506 \quad \sum x ^ { 2 } = 127542 \quad \sum y = 1431 \quad \sum y ^ { 2 } = 104451 \quad \sum x y = 111297\)
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
  2. Explain the significance for the regression line of the quantity \(\sum \left[ y _ { i } - \left( a x _ { i } + b \right) \right] ^ { 2 }\).
  3. It is decided to measure the temperature in degrees Centigrade instead of degrees Fahrenheit. If the same temperature is measured both as \(f ^ { \circ }\) Fahrenheit and \(c ^ { \circ }\) Centigrade, the relationship between \(f\) and \(c\) is \(\mathrm { c } = \frac { 5 } { 9 } ( \mathrm { f } - 32 )\). Find the equation of the new regression line.
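Parts 1 and 3 can be checked numerically from the summary statistics. Since \(c = \frac { 5 } { 9 } ( f - 32 )\) is a linear change of the response variable, the sketch below applies it directly to the fitted line rather than refitting. Variable names such as `a_f` and `b_c` are illustrative only.

```python
n = 20
sum_x, sum_x2 = 1506, 127542
sum_y, sum_xy = 1431, 111297

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
b_f = s_xy / s_xx                    # gradient, degrees F per ice-cream sold
a_f = sum_y / n - b_f * sum_x / n    # intercept in degrees F

# c = (5/9)(f - 32) is linear, so transform the fitted line directly.
b_c = 5 / 9 * b_f
a_c = 5 / 9 * (a_f - 32)

print(f"Fahrenheit: y = {a_f:.2f} + {b_f:.4f} x")
print(f"Centigrade: y = {a_c:.2f} + {b_c:.4f} x")
```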
OCR Further Statistics Specimen Q1
6 marks Easy -1.2
1 The table below shows the typical stopping distances \(d\) metres for a particular car travelling at \(v\) miles per hour.
$$\begin{array} { c | c c c c c c } v & 20 & 30 & 40 & 50 & 60 & 70 \\ \hline d & 13 & 24 & 36 & 52 & 72 & 94 \end{array}$$
  1. State each of the following words that describe the variable \(v\). Independent Dependent Controlled Response
  2. Calculate the equation of the regression line of \(d\) on \(v\).
  3. Use the equation found in part (ii) to estimate the typical stopping distance when this car is travelling at 45 miles per hour. It is given that the product moment correlation coefficient for the data is 0.990 correct to three significant figures.
  4. Explain whether your estimate found in part (iii) is reliable.
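A minimal sketch of the calculation of the line of \(d\) on \(v\) and the estimate at 45 miles per hour, working from the tabulated values rather than from a calculator's regression mode:

```python
v = [20, 30, 40, 50, 60, 70]     # speeds in miles per hour
d = [13, 24, 36, 52, 72, 94]     # stopping distances in metres

n = len(v)
v_bar, d_bar = sum(v) / n, sum(d) / n
s_vd = sum(vi * di for vi, di in zip(v, d)) - n * v_bar * d_bar
s_vv = sum(vi ** 2 for vi in v) - n * v_bar ** 2

b = s_vd / s_vv                  # gradient of d on v
a = d_bar - b * v_bar            # intercept
estimate = a + b * 45            # estimated stopping distance at 45 mph
print(f"d = {a:.2f} + {b:.3f} v; estimate at 45 mph: {estimate:.1f} m")
```

Here 45 happens to be the mean of the tabulated speeds, so the estimate coincides with the mean stopping distance.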
Edexcel S1 2016 June Q1
12 marks Moderate -0.8
  1. The percentage oil content, \(p\), and the weight, \(w\) milligrams, of each of 10 randomly selected sunflower seeds were recorded. These data are summarised below.
$$\sum w ^ { 2 } = 41252 \quad \sum w p = 27557.8 \quad \sum w = 640 \quad \sum p = 431 \quad \mathrm {~S} _ { p p } = 2.72$$
  1. Find the value of \(\mathrm { S } _ { w w }\) and the value of \(\mathrm { S } _ { w p }\)
  2. Calculate the product moment correlation coefficient between \(p\) and \(w\)
  3. Give an interpretation of your product moment correlation coefficient. The equation of the regression line of \(p\) on \(w\) is given in the form \(p = a + b w\)
  4. Find the equation of the regression line of \(p\) on \(w\)
  5. Hence estimate the percentage oil content of a sunflower seed which weighs 60 milligrams.
Edexcel S1 2018 June Q1
13 marks Moderate -0.3
  1. A random sample of 10 cars of different makes and sizes is taken and the published miles per gallon, \(p\), and the actual miles per gallon, \(m\), are recorded. The data are coded using variables \(x = \frac { p } { 10 }\) and \(y = m - 25\)
The results for the coded data are summarised below.
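$$\begin{array} { c | c c c c c c c c c c } \boldsymbol { x } & 6.89 & 3.67 & 5.92 & 5.04 & 4.87 & 3.92 & 4.71 & 5.14 & 3.65 & 5.23 \\ \hline \boldsymbol { y } & 30 & 3 & 22 & 15 & 13 & 8 & 15 & 13.5 & 3 & 19 \end{array}$$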
(You may use \(\sum y ^ { 2 } = 2628.25 \quad \sum x y = 768.58 \quad \mathrm {~S} _ { x x } = 9.25924 \quad \mathrm {~S} _ { x y } = 74.664\) )
  1. Show that \(\mathrm { S } _ { y y } = 626.025\)
  2. Find the product moment correlation coefficient between \(x\) and \(y\).
  3. Give a reason to support fitting a regression model of the form \(y = a + b x\) to these data.
  4. Find the equation of the regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
    Give the value of \(a\) and the value of \(b\) to 3 significant figures. A car's published miles per gallon is 44
  5. Estimate the actual miles per gallon for this particular car.
  6. Comment on the reliability of your estimate in part (e). Give a reason for your answer.
Edexcel S1 2020 June Q5
15 marks Moderate -0.3
  1. A large company rents shops in different parts of the country. A random sample of 10 shops was taken and the floor area, \(x\) in \(10 \mathrm {~m} ^ { 2 }\), and the annual rent, \(y\) in thousands of dollars, were recorded.
    The data are summarised by the following statistics
$$\sum x = 900 \quad \sum x ^ { 2 } = 84818 \quad \sum y = 183 \quad \sum y ^ { 2 } = 3434$$ and the regression line of \(y\) on \(x\) has equation \(y = 6.066 + 0.136 x\)
  1. Use the regression line to estimate the annual rent in dollars for a shop with a floor area of \(800 \mathrm {~m} ^ { 2 }\)
  2. Find \(\mathrm { S } _ { y y }\) and \(\mathrm { S } _ { x x }\)
  3. Find the product moment correlation coefficient between \(y\) and \(x\). An 11th shop is added to the sample. The floor area is \(900 \mathrm {~m} ^ { 2 }\) and the annual rent is 15000 dollars.
  4. Use the formula \(\mathrm { S } _ { x y } = \sum ( x - \bar { x } ) ( y - \bar { y } )\) to show that the value of \(\mathrm { S } _ { x y }\) for the 11 shops will be the same as it was for the original 10 shops.
  5. Find the new equation of the regression line of \(y\) on \(x\) for the 11 shops. The company is considering renting a larger shop with area of \(3000 \mathrm {~m} ^ { 2 }\)
  6. Comment on the suitability of using the new regression line to estimate the annual rent. Give a reason for your answer.
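A sketch of the reasoning behind parts 4 and 5, assuming \(x\) is measured in units of \(10 \mathrm {~m} ^ { 2 }\) so that the 11th shop has \(x = 90\) and \(y = 15\). The original mean is \(\bar { x } = 900 / 10 = 90\), so the new shop sits exactly at \(\bar { x }\):

$$\bar { x } _ { \text {new} } = \frac { 900 + 90 } { 11 } = 90 , \qquad \bar { y } _ { \text {new} } = \frac { 183 + 15 } { 11 } = 18$$

The new point has \(x - \bar { x } = 0\), so it adds nothing to \(\sum ( x - \bar { x } ) ( y - \bar { y } )\) or to \(\sum ( x - \bar { x } ) ^ { 2 }\), and the change in \(\bar { y }\) has no effect because \(\sum ( x - \bar { x } ) = 0\). The gradient is therefore unchanged and only the intercept moves. Using the gradient as rounded in the given line:

$$b \approx 0.136 , \qquad a = 18 - 0.136 \times 90 = 5.76 , \qquad y \approx 5.76 + 0.136 x$$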