5.09b Least squares regression: concepts

144 questions

CAIE FP2 2017 Specimen Q9
11 marks Standard +0.8
9 A random sample of 8 students is chosen from those sitting examinations in both Mathematics and French. Their marks in Mathematics, \(x\), and in French, \(y\), are summarised as follows. $$\Sigma x = 472 \quad \Sigma x ^ { 2 } = 29950 \quad \Sigma y = 400 \quad \Sigma y ^ { 2 } = 21226 \quad \Sigma x y = 24879$$ Another student scored 72 marks in the Mathematics examination but was unable to sit the French examination.
  1. Estimate the mark that this student would have obtained in the French examination.
  2. Test, at the \(5 \%\) significance level, whether there is non-zero correlation between marks in Mathematics and marks in French.
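A minimal sketch of the arithmetic behind this question, assuming the standard summary-statistic formulas \(S_{xx} = \Sigma x^2 - (\Sigma x)^2/n\) and \(S_{xy} = \Sigma xy - \Sigma x \Sigma y / n\). The variable names are illustrative and not part of the question.

```python
# Regression line of y on x from summary statistics (data from the question above).
n = 8
sum_x, sum_x2 = 472, 29950
sum_y, sum_y2 = 400, 21226
sum_xy = 24879

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares gradient and intercept for y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Estimated French mark for a Mathematics mark of 72.
french_estimate = a + b * 72

# Product moment correlation coefficient, the statistic used in the test in part 2.
r = s_xy / (s_xx * s_yy) ** 0.5

print(f"y = {a:.2f} + {b:.3f} x, estimate = {french_estimate:.1f}, r = {r:.3f}")
```

The value of `r` would still need to be compared with a tabulated critical value for \(n = 8\); that lookup is not reproduced here.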
Edexcel Paper 3 2022 June Q6
9 marks Standard +0.3
6. Anna is investigating the relationship between exercise and resting heart rate. She takes a random sample of 19 people in her year at school and records for each person
  • their resting heart rate, \(h\) beats per minute
  • the number of minutes, \(m\), spent exercising each week
Her results are shown on the scatter diagram. \includegraphics[max width=\textwidth, alt={}, center]{3a09f809-fa28-4b3d-bb69-ea074433bd8f-16_531_551_653_740}
  1. Interpret the nature of the relationship between \(h\) and \(m\). Anna codes the data using the formulae $$\begin{aligned} & x = \log _ { 10 } m \\ & y = \log _ { 10 } h \end{aligned}$$ The product moment correlation coefficient between \(x\) and \(y\) is -0.897.
  2. Test whether or not there is significant evidence of a negative correlation between \(x\) and \(y\) You should
    The equation of the line of best fit of \(y\) on \(x\) is $$y = - 0.05 x + 1.92$$
  3. Use the equation of the line of best fit of \(y\) on \(x\) to find a model for \(h\) on \(m\) in the form $$h = a m ^ { k }$$ where \(a\) and \(k\) are constants to be found.
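The back-transformation in the last part can be sketched as follows. Since \(\log_{10} h = 1.92 - 0.05 \log_{10} m\), raising 10 to both sides gives \(h = 10^{1.92} m^{-0.05}\). The function name `predicted_h` and the test input `m = 100` are illustrative assumptions.

```python
# Turn the fitted line y = -0.05 x + 1.92 (x = log10 m, y = log10 h)
# back into a power model h = a * m**k.
import math

gradient, intercept = -0.05, 1.92

# log10 h = intercept + gradient * log10 m  =>  h = 10**intercept * m**gradient
a = 10 ** intercept
k = gradient

def predicted_h(m):
    """Resting heart rate predicted by the power model for m minutes of exercise."""
    return a * m ** k

# Consistency check at an arbitrary (hypothetical) value of m.
m = 100
assert math.isclose(math.log10(predicted_h(m)), intercept + gradient * math.log10(m))

print(f"h = {a:.1f} * m^({k})")
```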
OCR Further Statistics AS 2023 June Q3
8 marks Standard +0.3
3 An insurance company collected data concerning the age, \(x\) years, of policy holders and the average size of claim, \(\pounds y\) thousand. The data is summarised as follows. \(n = 32 \quad \sum x = 1340 \quad \sum y = 612 \quad \sum x ^ { 2 } = 64282 \quad \sum y ^ { 2 } = 13418 \quad \sum x y = 27794\)
  1. Find the variance of \(x\).
  2. Find the equation of the regression line of \(y\) on \(x\).
  3. Hence estimate the expected size of claim from a policy holder of age 48. Tom is aged 48. He claims that the range of the data probably does not include people of his age because the mean age for the data is 41.875, and 48 is not close to this.
  4. Use your answer to part (a) to determine how likely it is that Tom's claim is correct.
  5. Comment on the reliability of your estimate in part (c). You should refer to the value of the product-moment correlation coefficient for the data, which is 0.579 correct to 3 significant figures.
OCR Further Statistics AS 2024 June Q3
11 marks Standard +0.3
3 The ages, \(x\) years, and the reaction time, \(t\) seconds, in an experiment carried out on a sample of 15 volunteers are summarised as follows. \(n = 15 \quad \sum x = 762 \quad \sum t = 8.7 \quad \sum x ^ { 2 } = 44204 \quad \sum t ^ { 2 } = 5.65 \quad \sum x t = 490.1\)
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(t\).
  2. Calculate the equation of the line of regression of \(t\) on \(x\). Give your answer in the form \(\mathrm { t } = \mathrm { a } + \mathrm { bx }\) where \(a\) and \(b\) are constants to be determined.
  3. Explain the relevance of the quantity \(\sum ( t - a - b x ) ^ { 2 }\) to your answer to part (b).
  4. Estimate the reaction time, in seconds, for a volunteer aged 42. It is subsequently decided to measure the reaction time in tenths of a second rather than in seconds (so, for example, a time of 0.6 seconds would now be recorded as 6).
    1. State what effect, if any, this change would have on your answer to part (a).
    2. State what effect, if any, this change would have on your answer to part (b). It is known that the sample of 15 volunteers consisted almost entirely of students and retired people.
  5. Using this information, and the value of the product moment correlation coefficient, comment on the reliability of your estimate in part (d).
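The effect of the change of units can be checked numerically. The sketch below assumes the usual summary-statistic formulas; the helper `summarise` is a hypothetical name introduced only for this illustration.

```python
# Effect of recording t in tenths of a second (t' = 10 t) on r and on the regression line.
def summarise(n, sum_x, sum_t, sum_x2, sum_t2, sum_xt):
    """Return (r, a, b) for the regression line t = a + b x."""
    s_xx = sum_x2 - sum_x**2 / n
    s_tt = sum_t2 - sum_t**2 / n
    s_xt = sum_xt - sum_x * sum_t / n
    b = s_xt / s_xx
    a = sum_t / n - b * sum_x / n
    r = s_xt / (s_xx * s_tt) ** 0.5
    return r, a, b

# Summary statistics from the question (t in seconds).
r, a, b = summarise(15, 762, 8.7, 44204, 5.65, 490.1)

# Multiplying every t by 10 multiplies sum t and sum xt by 10, and sum t^2 by 100.
r10, a10, b10 = summarise(15, 762, 8.7 * 10, 44204, 5.65 * 100, 490.1 * 10)

print(f"seconds: r = {r:.3f}, t = {a:.3f} + {b:.5f} x")
print(f"tenths:  r = {r10:.3f}, t = {a10:.2f} + {b10:.4f} x")
```

Under these assumptions the correlation coefficient is unchanged, while both the intercept and the gradient are multiplied by 10.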
OCR Further Statistics AS 2020 November Q3
9 marks Moderate -0.3
3 An investor obtains data about the profits of 8 randomly chosen investment accounts over two one-year periods. The profit in the first year for each account is \(p \%\) and the profit in the second year for each account is \(q \%\). The results are shown in the table and in the scatter diagram.
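| Account | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| \(p\) | 1.6 | 2.1 | 2.4 | 2.7 | 2.8 | 3.3 | 5.2 | 8.4 |
| \(q\) | 1.6 | 2.3 | 2.2 | 2.2 | 3.1 | 2.9 | 7.6 | 4.8 |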
\(n = 8 \quad \sum \mathrm { p } = 28.5 \quad \sum \mathrm { q } = 26.7 \quad \sum \mathrm { p } ^ { 2 } = 136.35 \quad \sum \mathrm { q } ^ { 2 } = 116.35 \quad \sum \mathrm { pq } = 116.70\) \includegraphics[max width=\textwidth, alt={}, center]{bf1468d1-e02e-47d2-bf41-5bc8f5b4d7c4-3_782_1280_998_242}
  1. State which, if either, of the variables \(p\) and \(q\) is independent.
  2. Calculate the equation of the regression line of \(q\) on \(p\).
    1. Use the regression line to estimate the value of \(q\) for an investment account for which \(p = 2.5\).
    2. Give two reasons why this estimate could be considered reliable.
  3. Comment on the reliability of using the regression line to predict the value of \(q\) when \(p = 7.0\).
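A sketch of the calculation from the raw data, together with a rough indication of how much data lies near each prediction point. The helper `neighbours` and its `width` of 1.0 are arbitrary illustrative choices, not a standard reliability measure.

```python
# Regression of q on p from the tabulated accounts A to H.
p = [1.6, 2.1, 2.4, 2.7, 2.8, 3.3, 5.2, 8.4]
q = [1.6, 2.3, 2.2, 2.2, 3.1, 2.9, 7.6, 4.8]
n = len(p)

p_bar = sum(p) / n
q_bar = sum(q) / n
s_pp = sum((pi - p_bar) ** 2 for pi in p)
s_pq = sum((pi - p_bar) * (qi - q_bar) for pi, qi in zip(p, q))

b = s_pq / s_pp
a = q_bar - b * p_bar

def estimate(p_value):
    """Value of q predicted by the regression line of q on p."""
    return a + b * p_value

def neighbours(p_value, width=1.0):
    """Number of data points with p within `width` of p_value (width is an arbitrary choice)."""
    return sum(abs(pi - p_value) <= width for pi in p)

for p_value in (2.5, 7.0):
    print(p_value, round(estimate(p_value), 2), neighbours(p_value))
```

The point \(p = 2.5\) sits among several observations, whereas \(p = 7.0\) falls in a gap between the two largest values of \(p\), which is one way to frame the reliability comments the question asks for.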
OCR Further Statistics AS 2021 November Q3
7 marks Moderate -0.3
3
  1. Using the scatter diagram in the Printed Answer Booklet, explain what is meant by least squares in the context of a regression line of \(y\) on \(x\).
  2. A set of bivariate data \(( t , u )\) is summarised as follows. \(n = 5 \quad \sum t = 35 \quad \sum u = 54\) \(\sum t ^ { 2 } = 285 \quad \sum u ^ { 2 } = 758 \quad \sum \mathrm { tu } = 460\)
    1. Calculate the equation of the regression line of \(u\) on \(t\).
    2. The variables \(t\) and \(u\) are now scaled using the following scaling. \(\mathrm { v } = 2 \mathrm { t } , \mathrm { w } = \mathrm { u } + 4\) Find the equation of the regression line of \(w\) on \(v\), giving your equation in the form \(w = f ( v )\).
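The scaling step can be sketched by substituting \(t = v/2\) and \(u = w - 4\) into the fitted line \(u = a + bt\). The names `a_w` and `b_w` are illustrative.

```python
# Regression line of u on t, then the line of w on v under v = 2t, w = u + 4.
n = 5
sum_t, sum_u = 35, 54
sum_t2, sum_tu = 285, 460

s_tt = sum_t2 - sum_t**2 / n
s_tu = sum_tu - sum_t * sum_u / n
b = s_tu / s_tt                  # gradient of u on t
a = sum_u / n - b * sum_t / n    # intercept of u on t

# Substitute t = v / 2 and u = w - 4 into u = a + b t:
#   w - 4 = a + b (v / 2)  =>  w = (a + 4) + (b / 2) v
a_w = a + 4
b_w = b / 2

print(f"u = {a:.2f} + {b:.2f} t")
print(f"w = {a_w:.2f} + {b_w:.3f} v")
```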
OCR Further Statistics 2024 June Q7
8 marks Standard +0.3
7 The coordinates of a set of 10 points are denoted by ( \(\mathrm { x } _ { \mathrm { i } } , \mathrm { y } _ { \mathrm { i } }\) ) for \(i = 1,2 , \ldots , 10\). For a particular set of values of ( \(\mathrm { x } _ { \mathrm { i } } , \mathrm { y } _ { \mathrm { i } }\) ) and any constants \(a\) and \(b\) it can be shown that \(\Sigma \left( y _ { i } - a - b x _ { i } \right) ^ { 2 } = 10 ( 11 - a - 6 b ) ^ { 2 } + 126 \left( b - \frac { 83 } { 42 } \right) ^ { 2 } + \frac { 139 } { 14 }\).
    1. Explain why \(\sum \left( \mathrm { y } _ { \mathrm { i } } - \mathrm { a } - \mathrm { bx } _ { \mathrm { i } } \right) ^ { 2 }\) is minimised by taking \(b = \frac { 83 } { 42 }\) and \(\mathrm { a } = 11 - 6 \mathrm {~b}\).
    2. Hence explain why the equation of the regression line of \(y\) on \(x\) for these points is given by the corresponding values of \(a\) and \(b\) (so that the equation is \(\mathrm { y } = \frac { 83 } { 42 } \mathrm { x } - \frac { 6 } { 7 }\) ).
  1. State which of the following terms cannot apply to the variable \(X\) if the regression line of \(y\) on \(x\) can be used for estimating values of \(Y\). Dependent Independent Controlled Response
  2. Use the regression line to estimate the value of \(y\) corresponding to \(x = 8\).
  3. State what must be true of the value \(x = 8\) if the estimate in part (c) is to be reliable.
  4. Variables \(u\) and \(v\) are related to \(x\) and \(y\) by the following relationships. \(u = 2 + 4 x \quad v = 8 - 2 y\) Show that the gradient of the regression line of \(v\) on \(u\) is very close to -1.
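A sketch of the reasoning using exact fractions: both squared terms in the given identity are non-negative, so the sum is smallest when each is zero. The function name `rss` is an illustrative assumption; the coding step uses the standard result that a linear coding \(u = 2 + 4x\), \(v = 8 - 2y\) multiplies the gradient by \(-2/4\).

```python
# Read the least squares line off the completed-square form
#   10 (11 - a - 6b)^2 + 126 (b - 83/42)^2 + 139/14
from fractions import Fraction

# Both squared terms vanish at these values, giving the minimum.
b = Fraction(83, 42)
a = 11 - 6 * b

def rss(a_val, b_val):
    """Sum of squared residuals, as given by the identity in the question."""
    return (10 * (11 - a_val - 6 * b_val) ** 2
            + 126 * (b_val - Fraction(83, 42)) ** 2
            + Fraction(139, 14))

# Estimate of y at x = 8 from the regression line y = a + b x.
y_at_8 = a + b * 8

# With u = 2 + 4x and v = 8 - 2y, the gradient of v on u is (-2 / 4) * b.
gradient_v_on_u = Fraction(-2, 4) * b

print(a, b, rss(a, b), y_at_8, gradient_v_on_u)
```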
OCR Further Statistics 2021 November Q1
6 marks Standard +0.3
1 At a seaside resort the number \(X\) of ice-creams sold and the temperature \(Y ^ { \circ } \mathrm { F }\) were recorded on 20 randomly chosen summer days. The data can be summarised as follows. \(\sum x = 1506 \quad \sum x ^ { 2 } = 127542 \quad \sum y = 1431 \quad \sum y ^ { 2 } = 104451 \quad \sum x y = 111297\)
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
  2. Explain the significance for the regression line of the quantity \(\sum \left[ y _ { i } - \left( a x _ { i } + b \right) \right] ^ { 2 }\).
  3. It is decided to measure the temperature in degrees Centigrade instead of degrees Fahrenheit. If the same temperature is measured both as \(f ^ { \circ }\) Fahrenheit and \(c ^ { \circ }\) Centigrade, the relationship between \(f\) and \(c\) is \(\mathrm { c } = \frac { 5 } { 9 } ( \mathrm { f } - 32 )\). Find the equation of the new regression line.
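The role of the sum of squared residuals can be illustrated numerically. The sketch below writes the line as \(y = a + bx\), as in part 1, and expands \(\sum (y_i - a - b x_i)^2\) so that only the summary totals are needed; the function name `rss` and the perturbation sizes are illustrative assumptions.

```python
# The least squares line minimises the residual sum of squares (RSS).
n = 20
sum_x, sum_x2 = 1506, 127542
sum_y, sum_y2 = 1431, 104451
sum_xy = 111297

def rss(a, b):
    """Sum of (y_i - (a + b x_i))^2, expanded in terms of the summary totals."""
    return (sum_y2 - 2 * a * sum_y - 2 * b * sum_xy
            + n * a**2 + 2 * a * b * sum_x + b**2 * sum_x2)

s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n
b_hat = s_xy / s_xx
a_hat = sum_y / n - b_hat * sum_x / n

# Nudging either coefficient away from the fitted value increases the RSS.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.01), (0.0, -0.01)]:
    assert rss(a_hat + da, b_hat + db) > rss(a_hat, b_hat)

print(f"y = {a_hat:.2f} + {b_hat:.4f} x, minimum RSS = {rss(a_hat, b_hat):.1f}")
```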
OCR Further Statistics Specimen Q1
6 marks Easy -1.2
1 The table below shows the typical stopping distances \(d\) metres for a particular car travelling at \(v\) miles per hour.
| \(v\) | 20 | 30 | 40 | 50 | 60 | 70 |
|---|---|---|---|---|---|---|
| \(d\) | 13 | 24 | 36 | 52 | 72 | 94 |
  1. State each of the following words that describe the variable \(v\): Independent, Dependent, Controlled, Response.
  2. Calculate the equation of the regression line of \(d\) on \(v\).
  3. Use the equation found in part (ii) to estimate the typical stopping distance when this car is travelling at 45 miles per hour. It is given that the product moment correlation coefficient for the data is 0.990 correct to three significant figures.
  4. Explain whether your estimate found in part (iii) is reliable.
Edexcel S1 2016 October Q4
15 marks Moderate -0.3
  1. A doctor is studying the scans of 30-week old foetuses. She takes a random sample of 8 scans and measures the length, \(f \mathrm {~mm}\), of the leg bone called the femur. She obtains the following results.
$$\begin{array} { l l l l l l l l } 52 & 53 & 56 & 57 & 57 & 59 & 60 & 62 \end{array}$$
  1. Show that \(\mathrm { S } _ { f f } = 80\) The doctor also measures the head circumference, \(h \mathrm {~mm}\), of each foetus and her results are summarised as $$\sum h = 2209 \quad \sum h ^ { 2 } = 610463 \quad \mathrm {~S} _ { f h } = 182$$
  2. Find \(\mathrm { S } _ { h h }\)
  3. Calculate the product moment correlation coefficient between the length of the femur and the head circumference for these data. The doctor believes that there is a linear relationship between the length of the femur and the head circumference of 30-week old foetuses.
  4. State, giving a reason, whether or not your calculation in part (c) supports the doctor's belief.
  5. Find an equation of the regression line of \(h\) on \(f\). The doctor plans in future to measure the femur length, \(f\), and then use the regression line to estimate the corresponding head circumference, \(h\). A statistician points out that there will always be the chance of an error between the true head circumference and the estimated value of the head circumference. Given that the error, \(E \mathrm {~mm}\), has the normal distribution \(\mathrm { N } \left( 0,4 ^ { 2 } \right)\)
  6. find the probability that the estimate is within 3 mm of the true value.
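A sketch of the numerical steps, assuming the usual definition \(S_{ff} = \Sigma f^2 - (\Sigma f)^2/n\) and using Python's `statistics.NormalDist` for the final probability. Variable names are illustrative.

```python
# Check S_ff from the raw femur lengths, then P(|E| < 3) for E ~ N(0, 4^2).
from statistics import NormalDist

femur = [52, 53, 56, 57, 57, 59, 60, 62]
n = len(femur)

s_ff = sum(f**2 for f in femur) - sum(femur) ** 2 / n

# Summary statistics given for head circumference h.
sum_h, sum_h2, s_fh = 2209, 610463, 182
s_hh = sum_h2 - sum_h**2 / n
r = s_fh / (s_ff * s_hh) ** 0.5

# Regression line h = a + b f.
b = s_fh / s_ff
a = sum_h / n - b * sum(femur) / n

# Probability that the estimate is within 3 mm of the true value.
error = NormalDist(mu=0, sigma=4)
p_within_3 = error.cdf(3) - error.cdf(-3)

print(s_ff, s_hh, round(r, 3), round(a, 2), round(b, 3), round(p_within_3, 4))
```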
Edexcel S1 2018 October Q1
11 marks Moderate -0.8
  1. The heights above sea level ( \(h\) hundred metres) and the temperatures ( \(t ^ { \circ } \mathrm { C }\) ) at 12 randomly selected places in France, at 7 am on July 31st, were recorded.
    The data are summarised as follows $$\sum h = 112 \quad \sum t = 136 \quad \sum t ^ { 2 } = 1828 \quad S _ { h t } = - 236 \quad S _ { h h } = 297$$
    1. Find the value of \(S _ { t t }\)
    2. Calculate the product moment correlation coefficient for these data.
    3. Interpret the relationship between \(t\) and \(h\).
    4. Find an equation of the regression line of \(t\) on \(h\).
    At 7 am on July 31st Yinka is on holiday in South Africa. He uses the regression equation to estimate the temperature when the height above sea level is 500 m.
  2. Find the estimated temperature Yinka calculates.
  3. Comment on the validity of your answer in part (e).
Edexcel S1 2022 October Q2
13 marks Moderate -0.5
  1. The production cost, \(\pounds c\) million, of a film and the total ticket sales, \(\pounds t\) million, earned by the film are recorded for a sample of 40 films.
Some summary statistics are given below. $$\sum c = 1634 \quad \sum t = 1361 \quad \sum t ^ { 2 } = 82873 \quad \sum c t = 83634 \quad \mathrm {~S} _ { c c } = 28732.1$$
  1. Find the exact value of \(\mathrm { S } _ { t t }\) and the exact value of \(\mathrm { S } _ { c t }\)
  2. Calculate the value of the product moment correlation coefficient for these data.
  3. Give an interpretation of your answer to part (b)
  4. Show that the equation of the linear regression line of \(t\) on \(c\) can be written as $$t = - 5.84 + 0.976 c$$ where the values of the intercept and gradient are given to 3 significant figures.
  5. Find the expected total ticket sales for a film with a production cost of \(\pounds 90\) million. Using the regression line in part (d)
  6. find the range of values of the production cost of a film for which the total ticket sales are less than \(80 \%\) of its production cost.
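The final inequality can be sketched directly from the rounded regression line quoted in the question. The function name `predicted_sales` is illustrative, and the rounded coefficients are used as given.

```python
# For which production costs c does the line t = -5.84 + 0.976 c
# predict ticket sales below 80% of the production cost?
intercept, gradient = -5.84, 0.976

def predicted_sales(c):
    """Ticket sales (in millions of pounds) predicted for production cost c."""
    return intercept + gradient * c

# intercept + gradient * c < 0.8 c  <=>  (gradient - 0.8) c < -intercept
# gradient - 0.8 is positive here, so dividing keeps the inequality direction.
threshold = -intercept / (gradient - 0.8)

print(f"sales at c = 90: {predicted_sales(90):.1f}")
print(f"sales < 80% of cost when c < {threshold:.1f}")
```

Since a production cost cannot be negative, the range would be read as \(0 < c <\) `threshold` under this model.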
Edexcel S1 2023 October Q6
12 marks Moderate -0.3
  1. The variables \(x\) and \(y\) have the following regression equations based on the same 12 observations.
| | Regression equation |
|---|---|
| \(y\) on \(x\) | \(y = 1.4 x + 1.5\) |
| \(x\) on \(y\) | \(x = 1.2 + 0.2 y\) |
    1. Find the point of intersection of these lines.
    2. Hence show that \(\sum x = 25\) Given that $$\sum x y = \frac { 6961 } { 60 }$$
  1. Find \(S _ { x y }\)
  2. Find the product moment correlation coefficient between \(x\) and \(y\)
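A sketch of the chain of reasoning with exact fractions. It relies on two standard facts: both regression lines pass through \((\bar{x}, \bar{y})\), and the product of the two gradients equals \(r^2\), with \(r\) taking the sign of the gradients. Variable names are illustrative.

```python
# Two regression lines: intersection, sum of x, S_xy and r.
from fractions import Fraction

n = 12
b_yx, a_yx = Fraction(14, 10), Fraction(15, 10)   # y = 1.4 x + 1.5
b_xy, a_xy = Fraction(2, 10), Fraction(12, 10)    # x = 1.2 + 0.2 y

# Substitute y = a_yx + b_yx x into x = a_xy + b_xy y and solve for x.
x_bar = (a_xy + b_xy * a_yx) / (1 - b_xy * b_yx)
y_bar = a_yx + b_yx * x_bar

# The intersection is the mean point, so the totals follow from n.
sum_x = n * x_bar
sum_y = n * y_bar

s_xy = Fraction(6961, 60) - sum_x * sum_y / n

# b_yx = S_xy / S_xx and b_xy = S_xy / S_yy, so r^2 = b_yx * b_xy.
# Both gradients are positive, so r is the positive square root.
r = float(b_yx * b_xy) ** 0.5

print(x_bar, y_bar, sum_x, s_xy, round(r, 3))
```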
Edexcel S1 2018 Specimen Q1
12 marks Moderate -0.8
  1. The percentage oil content, \(p\), and the weight, \(w\) milligrams, of each of 10 randomly selected sunflower seeds were recorded. These data are summarised below.
$$\sum w ^ { 2 } = 41252 \quad \sum w p = 27557.8 \quad \sum w = 640 \quad \sum p = 431 \quad \mathrm {~S} _ { p p } = 2.72$$
  1. Find the value of \(\mathrm { S } _ { w w }\) and the value of \(\mathrm { S } _ { w p }\)
  2. Calculate the product moment correlation coefficient between \(p\) and \(w\)
  3. Give an interpretation of your product moment correlation coefficient. The equation of the regression line of \(p\) on \(w\) is given in the form \(p = a + b w\)
  4. Find the equation of the regression line of \(p\) on \(w\)
  5. Hence estimate the percentage oil content of a sunflower seed which weighs 60 milligrams.
Edexcel S1 Specimen Q6
14 marks Moderate -0.8
  1. A travel agent sells flights to different destinations from Beerow airport. The distance \(d\), measured in 100 km, of the destination from the airport and the fare \(\pounds f\) are recorded for a random sample of 6 destinations.

| Destination | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) |
|---|---|---|---|---|---|---|
| \(d\) | 2.2 | 4.0 | 6.0 | 2.5 | 8.0 | 5.0 |
| \(f\) | 18 | 20 | 25 | 23 | 32 | 28 |
$$\text { [You may use } \sum d ^ { 2 } = 152.09 \quad \sum f ^ { 2 } = 3686 \quad \sum f d = 723.1 \text { ] }$$
  1. Using the axes below, complete a scatter diagram to illustrate this information.
  2. Explain why a linear regression model may be appropriate to describe the relationship between \(f\) and \(d\).
  3. Calculate \(S _ { d d }\) and \(S _ { f d }\)
  4. Calculate the equation of the regression line of \(f\) on \(d\) giving your answer in the form \(f = a + b d\).
  5. Give an interpretation of the value of \(b\). Jane is planning her holiday and wishes to fly from Beerow airport to a destination \(t \mathrm {~km}\) away. A rival travel agent charges 5 p per km.
  6. Find the range of values of \(t\) for which the first travel agent is cheaper than the rival. \includegraphics[max width=\textwidth, alt={}, center]{61983561-79f7-4883-8ae7-ab1f4955d444-20_967_1630_1722_164}
Edexcel S1 2001 January Q6
18 marks Moderate -0.8
6. A local authority is investigating the cost of reconditioning its incinerators. Data from 10 randomly chosen incinerators were collected. The variables monitored were the operating time \(x\) (in thousands of hours) since last reconditioning and the reconditioning cost \(y\) (in \(\pounds 1000\) ). None of the incinerators had been used for more than 3000 hours since last reconditioning. The data are summarised below, $$\Sigma x = 25.0 , \Sigma x ^ { 2 } = 65.68 , \Sigma y = 50.0 , \Sigma y ^ { 2 } = 260.48 , \Sigma x y = 130.64 .$$
  1. Find \(\mathrm { S } _ { x x } , \mathrm {~S} _ { x y } , \mathrm {~S} _ { y y }\).
  2. Calculate the product moment correlation coefficient between \(x\) and \(y\).
  3. Explain why this value might support the fitting of a linear regression model of the form \(y = a + b x\).
  4. Find the values of \(a\) and \(b\).
  5. Give an interpretation of \(a\).
  6. Estimate
    1. the reconditioning cost for an operating time of 2400 hours,
    2. the financial effect of an increase of 1500 hours in operating time.
  7. Suggest why the authority might be cautious about making a prediction of the reconditioning cost of an incinerator which had been operating for 4500 hours since its last reconditioning.
Edexcel S1 2003 January Q6
19 marks Moderate -0.3
6. The chief executive of Rex cars wants to investigate the relationship between the number of new car sales and the amount of money spent on advertising. She collects data from company records on the number of new car sales, \(c\), and the cost of advertising each year, \(p\) (£000). The data are shown in the table below.
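| Year | Number of new car sales, \(c\) | Cost of advertising (£000), \(p\) |
|---|---|---|
| 1990 | 4240 | 120 |
| 1991 | 4380 | 126 |
| 1992 | 4420 | 132 |
| 1993 | 4440 | 134 |
| 1994 | 4430 | 137 |
| 1995 | 4520 | 144 |
| 1996 | 4590 | 148 |
| 1997 | 4660 | 150 |
| 1998 | 4700 | 153 |
| 1999 | 4790 | 158 |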
  1. Using the coding \(x = ( p - 100 )\) and \(y = \frac { 1 } { 10 } ( c - 4000 )\), draw a scatter diagram to represent these data. Explain why \(x\) is the explanatory variable.
  2. Find the equation of the least squares regression line of \(y\) on \(x\). $$\text { [Use } \left. \Sigma x = 402 , \Sigma y = 517 , \Sigma x ^ { 2 } = 17538 \text { and } \Sigma x y = 22611 . \right]$$
  3. Deduce the equation of the least squares regression line of \(c\) on \(p\) in the form \(c = a + b p\).
  4. Interpret the value of \(a\).
  5. Predict the number of extra new car sales for an increase of \(\pounds 2000\) in advertising budget. Comment on the validity of your answer.
Edexcel S1 2005 January Q3
15 marks Easy -1.3
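3. The following table shows the height \(x\), to the nearest cm, and the weight \(y\), to the nearest kg, of a random sample of 12 students.

| \(x\) | 148 | 164 | 156 | 172 | 147 | 184 | 162 | 155 | 182 | 165 | 175 | 152 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 39 | 59 | 56 | 77 | 44 | 77 | 65 | 49 | 80 | 72 | 70 | 52 |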
  1. On graph paper, draw a scatter diagram to represent these data.
  2. Write down, with a reason, whether the correlation coefficient between \(x\) and \(y\) is positive or negative. The data in the table can be summarised as follows. $$\Sigma x = 1962 , \quad \Sigma y = 740 , \quad \Sigma y ^ { 2 } = 47746 , \quad \Sigma x y = 122783 , \quad S _ { x x } = 1745 .$$
  3. Find \(S _ { x y }\). The equation of the regression line of \(y\) on \(x\) is \(y = - 106.331 + b x\).
  4. Find, to 3 decimal places, the value of \(b\).
  5. Find, to 3 significant figures, the mean \(\bar { y }\) and the standard deviation \(s\) of the weights of this sample of students.
  6. Find the values of \(\bar { y } \pm 1.96 s\).
  7. Comment on whether or not you think that the weights of these students could be modelled by a normal distribution.
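A sketch of the later parts computed from the tabulated data. The standard deviation here uses divisor \(n\), which is an assumption since the question does not say which form of \(s\) is intended; the count of values inside \(\bar{y} \pm 1.96 s\) is only an informal check on the normal model.

```python
# S_xy, the gradient b, and the interval y_bar +/- 1.96 s for the 12 students.
x = [148, 164, 156, 172, 147, 184, 162, 155, 182, 165, 175, 152]
y = [39, 59, 56, 77, 44, 77, 65, 49, 80, 72, 70, 52]
n = len(y)

s_xx = 1745  # given in the question
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
b = s_xy / s_xx

y_bar = sum(y) / n
# Standard deviation with divisor n (an assumption; divisor n - 1 gives a larger value).
s = (sum(yi**2 for yi in y) / n - y_bar**2) ** 0.5

low, high = y_bar - 1.96 * s, y_bar + 1.96 * s
inside = sum(low <= yi <= high for yi in y)

print(f"S_xy = {s_xy:.0f}, b = {b:.3f}, mean = {y_bar:.1f}, s = {s:.1f}")
print(f"interval = ({low:.1f}, {high:.1f}), values inside = {inside} of {n}")
```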
Edexcel S1 2006 January Q3
18 marks Easy -1.2
3. A manufacturer stores drums of chemicals. During storage, evaporation takes place. A random sample of 10 drums was taken and the time in storage, \(x\) weeks, and the evaporation loss, \(y \mathrm { ml }\), are shown in the table below.
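| \(x\) | 3 | 5 | 6 | 8 | 10 | 12 | 13 | 15 | 16 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 36 | 50 | 53 | 61 | 69 | 79 | 82 | 90 | 88 | 96 |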
  1. On graph paper, draw a scatter diagram to represent these data.
  2. Give a reason to support fitting a regression model of the form \(y = a + b x\) to these data.
  3. Find, to 2 decimal places, the value of \(a\) and the value of \(b\). $$\text { (You may use } \Sigma x ^ { 2 } = 1352 , \Sigma y ^ { 2 } = 53112 \text { and } \Sigma x y = 8354 \text {.) }$$
  4. Give an interpretation of the value of \(b\).
  5. Using your model, predict the amount of evaporation that would take place after
    1. 19 weeks,
    2. 35 weeks.
  6. Comment, with a reason, on the reliability of each of your predictions.
Edexcel S1 2008 January Q4
10 marks Moderate -0.8
4. A second hand car dealer has 10 cars for sale. She decides to investigate the link between the age of the cars, \(x\) years, and the mileage, \(y\) thousand miles. The data collected from the cars are shown in the table below.
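| Age, \(x\) (years) | 2 | 2.5 | 3 | 4 | 4.5 | 4.5 | 5 | 3 | 6 | 6.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mileage, \(y\) (thousands) | 22 | 34 | 33 | 37 | 40 | 45 | 49 | 30 | 58 | 58 |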
[You may assume that \(\sum x = 41 , \sum y = 406 , \sum x ^ { 2 } = 188 , \sum x y = 1818.5\) ]
  1. Find \(S _ { x x }\) and \(S _ { x y }\).
  2. Find the equation of the least squares regression line in the form \(y = a + b x\). Give the values of \(a\) and \(b\) to 2 decimal places.
  3. Give a practical interpretation of the slope \(b\).
  4. Using your answer to part (b), find the mileage predicted by the regression line for a 5 year old car.
Edexcel S1 2009 January Q1
11 marks Moderate -0.8
  1. A teacher is monitoring the progress of students using a computer based revision course. The improvement in performance, \(y\) marks, is recorded for each student along with the time, \(x\) hours, that the student spent using the revision course. The results for a random sample of 10 students are recorded below.
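| \(x\) (hours) | 1.0 | 3.5 | 4.0 | 1.5 | 1.3 | 0.5 | 1.8 | 2.5 | 2.3 | 3.0 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) (marks) | 5 | 30 | 27 | 10 | -3 | -5 | 7 | 15 | -10 | 20 |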
$$\text { [You may use } \sum x = 21.4 , \quad \sum y = 96 , \quad \sum x ^ { 2 } = 57.22 , \quad \sum x y = 313.7 \text { ] }$$
  1. Calculate \(S _ { x x }\) and \(S _ { x y }\).
  2. Find the equation of the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\).
  3. Give an interpretation of the gradient of your regression line. Rosemary spends 3.3 hours using the revision course.
  4. Predict her improvement in marks. Lee spends 8 hours using the revision course claiming that this should give him an improvement in performance of over 60 marks.
  5. Comment on Lee's claim.
Edexcel S1 2011 January Q4
6 marks Moderate -0.8
  1. A farmer collected data on the annual rainfall, \(x \mathrm {~cm}\), and the annual yield of peas, \(p\) tonnes per acre.
The data for annual rainfall was coded using \(v = \frac { x - 5 } { 10 }\) and the following statistics were found. $$S _ { v v } = 5.753 \quad S _ { p v } = 1.688 \quad S _ { p p } = 1.168 \quad \bar { p } = 3.22 \quad \bar { v } = 4.42$$
  1. Find the equation of the regression line of \(p\) on \(v\) in the form \(p = a + b v\).
  2. Using your regression line estimate the annual yield of peas per acre when the annual rainfall is 85 cm.
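A sketch of both parts, assuming the usual formulas \(b = S_{pv}/S_{vv}\) and \(a = \bar{p} - b\bar{v}\). The rainfall must be coded before it is substituted into the line; the helper name `code_rainfall` is illustrative.

```python
# Regression of p on the coded variable v, then an estimate at rainfall x = 85 cm.
s_vv, s_pv = 5.753, 1.688
p_bar, v_bar = 3.22, 4.42

b = s_pv / s_vv
a = p_bar - b * v_bar

def code_rainfall(x_cm):
    """Apply the coding v = (x - 5) / 10 from the question."""
    return (x_cm - 5) / 10

v = code_rainfall(85)
yield_estimate = a + b * v

print(f"p = {a:.2f} + {b:.3f} v, v = {v}, estimated yield = {yield_estimate:.2f}")
```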
Edexcel S1 2012 January Q5
15 marks Moderate -0.8
  1. The age, \(t\) years, and weight, \(w\) grams, of each of 10 coins were recorded. These data are summarised below.
$$\sum t ^ { 2 } = 2688 \quad \sum t w = 1760.62 \quad \sum t = 158 \quad \sum w = 111.75 \quad S _ { w w } = 0.16$$
  1. Find \(S _ { t t }\) and \(S _ { t w }\) for these data.
  2. Calculate, to 3 significant figures, the product moment correlation coefficient between \(t\) and \(w\).
  3. Find the equation of the regression line of \(w\) on \(t\) in the form \(w = a + b t\)
  4. State, with a reason, which variable is the explanatory variable.
  5. Using this model, estimate
    1. the weight of a coin which is 5 years old,
    2. the effect of an increase of 4 years in age on the weight of a coin. It was discovered that a coin in the original sample, which was 5 years old and weighed 20 grams, was a fake.
  6. State, without any further calculations, whether the exclusion of this coin would increase or decrease the value of the product moment correlation coefficient. Give a reason for your answer.
Edexcel S1 2013 January Q3
10 marks Moderate -0.8
3. A biologist is comparing the intervals ( \(m\) seconds) between the mating calls of a certain species of tree frog and the surrounding temperature ( \(t { } ^ { \circ } \mathrm { C }\) ). The following results were obtained.
| \(t { } ^ { \circ } \mathrm { C }\) | 8 | 13 | 14 | 15 | 15 | 20 | 25 | 30 |
|---|---|---|---|---|---|---|---|---|
| \(m\) secs | 6.5 | 4.5 | 6 | 5 | 4 | 3 | 2 | 1 |
$$\text { (You may use } \sum t m = 469.5 , \quad \mathrm {~S} _ { t t } = 354 , \quad \mathrm {~S} _ { m m } = 25.5 \text { ) }$$
  1. Show that \(\mathrm { S } _ { t m } = - 90.5\)
  2. Find the equation of the regression line of \(m\) on \(t\) giving your answer in the form \(m = a + b t\).
  3. Use your regression line to estimate the time interval between mating calls when the surrounding temperature is \(10 ^ { \circ } \mathrm { C }\).
  4. Comment on the reliability of this estimate, giving a reason for your answer.
Edexcel S1 2001 June Q7
16 marks Moderate -0.3
7. A music teacher monitored the sight-reading ability of one of her pupils over a 10 week period. At the end of each week, the pupil was given a new piece to sight-read and the teacher noted the number of errors \(y\). She also recorded the number of hours \(x\) that the pupil had practised each week. The data are shown in the table below.

| \(x\) | 12 | 15 | 7 | 11 | 1 | 8 | 4 | 6 | 9 | 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 8 | 4 | 13 | 8 | 18 | 12 | 15 | 14 | 12 | 16 |
  1. Plot these data on a scatter diagram.
  2. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\). $$\text { (You may use } \left. \Sigma x ^ { 2 } = 746 , \Sigma x y = 749 . \right)$$
  3. Give an interpretation of the slope and the intercept of your regression line.
  4. State whether or not you think the regression model is reasonable
    1. for the range of \(x\)-values given in the table,
    2. for all possible \(x\)-values. In each case justify your answer either by giving a reason for accepting the model or by suggesting an alternative model.