2.02c Scatter diagrams and regression lines

115 questions

OCR MEI Paper 2 2023 June Q14
8 marks Moderate -0.8
14 The pre-release material contains information concerning the median income of taxpayers in \(\pounds\) and the percentage of all pupils at the end of KS4 achieving 5 or more GCSEs at grade A*-C, including English and Maths, for different areas of London. Some of the data for 2014/15 is shown in Fig. 14.1. \begin{table}[h]
\captionsetup{labelformat=empty} \caption{Fig. 14.1}
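\begin{tabular}{|l|r|r|}
\hline
 & Median income of taxpayers in \(\pounds\) & Percentage of pupils achieving 5 or more A*-C, including English and Maths \\
\hline
City of London & 61100 & \#N/A \\
Barking and Dagenham & 21800 & 54.0 \\
Barnet & 27100 & 70.1 \\
Bexley & 24400 & 55.0 \\
Brent & 22700 & 60.0 \\
Bromley & 28100 & 68.0 \\
\hline
\end{tabular}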
\end{table} A student investigated whether there is any relationship between median income of taxpayers and percentage of pupils achieving 5 or more GCSEs at grade A*-C, including English and Maths.
  1. With reference to Fig. 14.1, explain how the data should be cleaned before any analysis can take place. After the data was cleaned, the student used software to draw the scatter diagram shown in Fig. 14.2. Scatter diagram to show percentage of pupils achieving 5 A*-C grades against median income of taxpayers \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Fig. 14.2} \includegraphics[alt={},max width=\textwidth]{11788aaf-98fb-4a78-8a40-a40743b1fe15-10_574_1481_1900_241}
    \end{figure} The student calculated that the product moment correlation coefficient for these data is 0.3743.
  2. Give two reasons why it may not be appropriate to use a linear model for the relationship between median income of taxpayers in \(\pounds\) and the percentage of all pupils at the end of KS4 achieving 5 or more GCSEs at grade A*-C. The student carried out some further analysis. The results are shown in Fig. 14.3. \begin{table}[h]
    \captionsetup{labelformat=empty} \caption{Fig. 14.3}
    \begin{tabular}{|l|r|r|}
    \hline
     & median income of taxpayers in \(\pounds\) & percentage of pupils achieving \(5 + \mathrm { A } ^ { * } - \mathrm { C }\) \\
    \hline
    mean & 27216 & 61.0 \\
    standard deviation & 4177.5 & 5.32 \\
    \hline
    \end{tabular}
    \end{table} The student identified three outliers in total.
    The student decided to remove these outliers and recalculate the product moment correlation coefficient.
  3. Explain whether the new value of the product moment correlation coefficient would be between 0.3743 and 1 or between 0 and 0.3743.
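A minimal Python sketch of the cleaning and outlier screening described above. It assumes the common convention that a value more than two standard deviations from the mean is an outlier (the question does not say which rule the student used), represents the `#N/A` entry as `None`, and only includes the six rows shown in Fig. 14.1, so it cannot reproduce the student's three outliers.

```python
# Rows shown in Fig. 14.1: (area, median income in £, % achieving 5+ A*-C).
# None stands in for the "#N/A" entry.
rows = [
    ("City of London", 61100, None),
    ("Barking and Dagenham", 21800, 54.0),
    ("Barnet", 27100, 70.1),
    ("Bexley", 24400, 55.0),
    ("Brent", 22700, 60.0),
    ("Bromley", 28100, 68.0),
]

# Cleaning: drop any area with a missing value before analysis.
clean = [row for row in rows if row[1] is not None and row[2] is not None]

# Summary statistics from Fig. 14.3.
INCOME_MEAN, INCOME_SD = 27216, 4177.5
PCT_MEAN, PCT_SD = 61.0, 5.32


def limits(mean, sd, k=2):
    """Return the (lower, upper) outlier limits mean -/+ k standard deviations."""
    return mean - k * sd, mean + k * sd


income_lo, income_hi = limits(INCOME_MEAN, INCOME_SD)
pct_lo, pct_hi = limits(PCT_MEAN, PCT_SD)

# Flag any cleaned row with either value outside its limits.
flagged = [
    name
    for name, income, pct in clean
    if not (income_lo <= income <= income_hi and pct_lo <= pct <= pct_hi)
]

print(f"income limits: {income_lo:.0f} to {income_hi:.0f}")
print(f"percentage limits: {pct_lo:.2f} to {pct_hi:.2f}")
print("flagged:", flagged)
```

The City of London income would also lie above the upper limit, but that row is already removed at the cleaning stage because of its missing percentage.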
OCR MEI Paper 2 2024 June Q11
5 marks Moderate -0.8
11 A householder is investigating whether there is any relationship between his monthly cost of gas and his monthly cost of electricity, both measured in pounds (\(\pounds\)). The householder collects a random sample of monthly costs and presents them in the scatter diagram below. \includegraphics[max width=\textwidth, alt={}, center]{8e48bbd3-2166-49e7-8906-833261f331ca-08_604_1452_392_244} One of the points on the diagram represents the energy costs in a month when the householder was away on holiday for three weeks. The other points represent the energy costs in months when the householder did not go away on holiday.
  1. On the copy of the diagram in the Printed Answer Booklet, circle the point which represents the month when the householder was most likely to have been away on holiday for three weeks.
  2. With reference to the diagram, describe the relationship between the cost of gas and the cost of electricity. The householder decides to test whether there is evidence to suggest that there is any association between the monthly cost of gas and the monthly cost of electricity. The value of Spearman's rank correlation coefficient for this sample is 0.4359 and the associated \(p\)-value is 0.09195.
  3. Determine whether there is any evidence to suggest, at the \(5 \%\) level, that there is any association between the monthly cost of gas and the monthly cost of electricity.
Edexcel S1 2021 June Q6
16 marks Standard +0.3
  1. Two economics students, Andi and Behrouz, are studying some data relating to unemployment, \(x \%\), and increase in wages, \(y \%\), for a European country. The least squares regression line of \(y\) on \(x\) has equation
$$y = 3.684 - 0.3242 x$$ and $$\sum y = 23.7 \quad \sum y ^ { 2 } = 42.63 \quad \sum x ^ { 2 } = 756.81 \quad n = 16$$
  1. Show that \(\mathrm { S } _ { y y } = 7.524375\)
  2. Find \(\mathrm { S } _ { x x }\)
  3. Find the product moment correlation coefficient between \(x\) and \(y\). Behrouz claims that, assuming the model is valid, the data show that when unemployment is 2\%, wages increase at over 3\%.
  4. Explain how Behrouz could have come to this conclusion. Andi uses the formula $$\text { range } = \text { mean } \pm 3 \times \text { standard deviation }$$ to estimate the range of values for \(x\).
  5. Find estimates of the minimum value and the maximum value of \(x\) in these data using Andi's formula.
  6. Comment, giving a reason, on the reliability of Behrouz's claim. Andi suggests using the regression line with equation \(y = 3.684 - 0.3242 x\) to estimate unemployment when wages are increasing at \(2 \%\).
  7. Comment, giving a reason, on Andi's suggestion.
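A short Python sketch for checking the arithmetic in this question. It recovers \(\bar{x}\) from the given line, using the fact that a least squares line passes through \((\bar{x}, \bar{y})\). As an assumption not stated in the question, it takes the standard deviation in Andi's formula as \(\sqrt{S_{xx}/n}\).

```python
import math

# Summary values from the question (n = 16 data pairs).
n = 16
sum_y, sum_y2, sum_x2 = 23.7, 42.63, 756.81
a, b = 3.684, -0.3242  # given regression line y = a + b x

# (a) S_yy = sum(y^2) - (sum y)^2 / n
S_yy = sum_y2 - sum_y**2 / n

# (b) The least squares line passes through (x_bar, y_bar),
#     so x_bar = (y_bar - a) / b.
y_bar = sum_y / n
x_bar = (y_bar - a) / b
S_xx = sum_x2 - n * x_bar**2  # same as sum(x^2) - (sum x)^2 / n

# (c) b = S_xy / S_xx, so r = S_xy / sqrt(S_xx * S_yy) = b * sqrt(S_xx / S_yy).
r = b * math.sqrt(S_xx / S_yy)

# (d) Behrouz's prediction when unemployment is 2%.
y_at_2 = a + b * 2

# (e) Andi's formula, assuming standard deviation = sqrt(S_xx / n).
sd_x = math.sqrt(S_xx / n)
x_min, x_max = x_bar - 3 * sd_x, x_bar + 3 * sd_x

print(f"S_yy = {S_yy:.6f}, S_xx = {S_xx:.2f}, r = {r:.3f}")
print(f"y at x = 2: {y_at_2:.4f}; estimated x range {x_min:.2f} to {x_max:.2f}")
```

Comparing `x_min` with 2 shows whether Behrouz's value of \(x\) lies inside the estimated range of the data.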
Edexcel S1 2022 June Q2
14 marks Moderate -0.8
  1. Stuart is investigating the relationship between Gross Domestic Product (GDP) and the size of the population for a particular country.
    He takes a random sample of 9 years and records the size of the population, \(t\) millions, and the GDP, \(g\) billion dollars for each of these years.
The data are summarised as $$n = 9 \quad \sum t = 7.87 \quad \sum g = 144.84 \quad \sum g ^ { 2 } = 3624.41 \quad S _ { t t } = 1.29 \quad S _ { t g } = 40.25$$
  1. Calculate the product moment correlation coefficient between \(t\) and \(g\)
  2. Give an interpretation of your product moment correlation coefficient.
  3. Find the equation of the least squares regression line of \(g\) on \(t\) in the form \(g = a + b t\)
  4. Give an interpretation of the value of \(b\) in your regression line.
    1. Use the regression line from part (c) to estimate the GDP, in billions of dollars, for a population of 7000000
    2. Comment on the reliability of your answer in part (i). Give a reason, in context, for your answer. Using the regression line from part (c), Stuart estimates that for a population increase of \(x\) million there will be an increase of 0.1 billion dollars in GDP.
  5. Find the value of \(x\)
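A minimal Python sketch of the calculations in this question, using only the summary values given. \(S_{gg}\) is not supplied, so it is computed from \(\sum g\) and \(\sum g^2\) first.

```python
import math

n = 9
sum_t, sum_g, sum_g2 = 7.87, 144.84, 3624.41
S_tt, S_tg = 1.29, 40.25

# (a) S_gg from the sums, then the product moment correlation coefficient.
S_gg = sum_g2 - sum_g**2 / n
r = S_tg / math.sqrt(S_tt * S_gg)

# (c) Least squares line g = a + b t.
b = S_tg / S_tt
a = sum_g / n - b * sum_t / n  # the line passes through (t_bar, g_bar)

# (e)(i) A population of 7 000 000 corresponds to t = 7 (t is in millions).
g_at_7 = a + b * 7

# (f) An increase of x million in t changes g by b * x, so solve b * x = 0.1.
x = 0.1 / b

print(f"S_gg = {S_gg:.4f}, r = {r:.3f}")
print(f"g = {a:.2f} + {b:.2f} t; g(7) = {g_at_7:.1f}; x = {x:.6f} million")
```

The sampled populations average \(\sum t / n \approx 0.87\) million, which is worth keeping in mind when judging the estimate at \(t = 7\).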
Edexcel S1 2024 June Q4
13 marks Moderate -0.3
  1. A biologist is studying bears. The biologist records the length, \(d \mathrm {~cm}\), and the girth, \(g \mathrm {~cm}\), of 8 bears. The biologist summarises the data as follows
$$\begin{gathered} \sum d = 1456.8 \quad \sum g = 713.2 \quad \sum d g = 141978.84 \quad \sum g ^ { 2 } = 72675.98 \\ S _ { d d } = 16769.78 \end{gathered}$$
  1. Calculate the exact value of \(S _ { d g }\) and the exact value of \(S _ { g g }\)
  2. Calculate the value of the product moment correlation coefficient between \(d\) and \(g\)
  3. Show that the equation of the regression line of \(g\) on \(d\) can be written as $$g = - 42.3 + 0.722 d$$ where the values of the intercept and gradient are given to 3 significant figures.
  4. Give an interpretation, in context, of the gradient of the regression line. Using the equation of the regression line given in part (c)
    1. estimate the girth of a bear with a length of 2.5 metres,
    2. explain why an estimate for the girth of a bear with a length of 0.5 metres is not reliable. Using the regression line from part (c), the biologist estimates that for each \(x \mathrm {~cm}\) increase in the length of a bear there will be a 17.3 cm increase in the girth.
  5. Find the value of \(x\)
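A short Python sketch for checking parts (a) to (c), (e)(i) and (f). Parts (e)(i) and (f) use the rounded line \(g = -42.3 + 0.722d\) given in part (c), and the length 2.5 m is converted to 250 cm to match the units of \(d\).

```python
import math

n = 8
sum_d, sum_g = 1456.8, 713.2
sum_dg, sum_g2 = 141978.84, 72675.98
S_dd = 16769.78

# (a) Corrected sums of products and squares.
S_dg = sum_dg - sum_d * sum_g / n
S_gg = sum_g2 - sum_g**2 / n

# (b) Product moment correlation coefficient.
r = S_dg / math.sqrt(S_dd * S_gg)

# (c) Regression line of g on d: g = a + b d.
b = S_dg / S_dd
a = sum_g / n - b * sum_d / n

# (e)(i) 2.5 m = 250 cm; use the rounded line given in part (c).
girth_250 = -42.3 + 0.722 * 250

# (f) Each 1 cm of length adds 0.722 cm of girth, so solve 0.722 x = 17.3.
x = 17.3 / 0.722

print(f"S_dg = {S_dg:.2f}, S_gg = {S_gg:.2f}, r = {r:.3f}")
print(f"g = {a:.1f} + {b:.3f} d; girth(250) = {girth_250:.1f}; x = {x:.1f}")
```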
Edexcel S1 2005 January Q3
15 marks Easy -1.3
3. The following table shows the height \(x\), to the nearest cm, and the weight \(y\), to the nearest kg, of a random sample of 12 students.
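$$\begin{array}{c|cccccccccccc} x & 148 & 164 & 156 & 172 & 147 & 184 & 162 & 155 & 182 & 165 & 175 & 152 \\ \hline y & 39 & 59 & 56 & 77 & 44 & 77 & 65 & 49 & 80 & 72 & 70 & 52 \end{array}$$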
  1. On graph paper, draw a scatter diagram to represent these data.
  2. Write down, with a reason, whether the correlation coefficient between \(x\) and \(y\) is positive or negative. The data in the table can be summarised as follows. $$\Sigma x = 1962 , \quad \Sigma y = 740 , \quad \Sigma y ^ { 2 } = 47746 , \quad \Sigma x y = 122783 , \quad S _ { x x } = 1745 .$$
  3. Find \(S _ { x y }\). The equation of the regression line of \(y\) on \(x\) is \(y = - 106.331 + b x\).
  4. Find, to 3 decimal places, the value of \(b\).
  5. Find, to 3 significant figures, the mean \(\bar { y }\) and the standard deviation \(s\) of the weights of this sample of students.
  6. Find the values of \(\bar { y } \pm 1.96 s\).
  7. Comment on whether or not you think that the weights of these students could be modelled by a normal distribution.
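A Python sketch that recomputes the summary values from the raw table, which also confirms that the table reproduces the given sums. The standard deviation uses the divisor \(n\); this is an assumption, and using \(n - 1\) would give a slightly larger \(s\).

```python
import math

x = [148, 164, 156, 172, 147, 184, 162, 155, 182, 165, 175, 152]  # height, cm
y = [39, 59, 56, 77, 44, 77, 65, 49, 80, 72, 70, 52]              # weight, kg
n = len(x)

# Corrected sums, recomputed from the raw table.
S_xx = sum(v * v for v in x) - sum(x) ** 2 / n
S_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n   # (c)

# (d) Gradient of the regression line of y on x.
b = S_xy / S_xx

# (e) Mean and standard deviation of the weights (divisor n assumed).
y_bar = sum(y) / n
s = math.sqrt(sum(v * v for v in y) / n - y_bar ** 2)

# (f) Interval y_bar +/- 1.96 s, and how many weights fall inside it.
lower, upper = y_bar - 1.96 * s, y_bar + 1.96 * s
inside = sum(lower <= v <= upper for v in y)

print(f"S_xx = {S_xx}, S_xy = {S_xy}, b = {b:.3f}")
print(f"mean = {y_bar:.3g}, s = {s:.3g}, interval {lower:.1f} to {upper:.1f}, inside: {inside}/{n}")
```

For a normal distribution about 95% of values lie within \(\bar{y} \pm 1.96s\), so the count `inside` is one piece of evidence for part (g).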
Edexcel S1 2006 January Q3
18 marks Easy -1.2
3. A manufacturer stores drums of chemicals. During storage, evaporation takes place. A random sample of 10 drums was taken and the time in storage, \(x\) weeks, and the evaporation loss, \(y \mathrm { ml }\), are shown in the table below.
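$$\begin{array}{c|cccccccccc} x & 3 & 5 & 6 & 8 & 10 & 12 & 13 & 15 & 16 & 18 \\ \hline y & 36 & 50 & 53 & 61 & 69 & 79 & 82 & 90 & 88 & 96 \end{array}$$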
  1. On graph paper, draw a scatter diagram to represent these data.
  2. Give a reason to support fitting a regression model of the form \(y = a + b x\) to these data.
  3. Find, to 2 decimal places, the value of \(a\) and the value of \(b\). $$\text { (You may use } \Sigma x ^ { 2 } = 1352 , \Sigma y ^ { 2 } = 53112 \text { and } \Sigma x y = 8354 \text {.) }$$
  4. Give an interpretation of the value of \(b\).
  5. Using your model, predict the amount of evaporation that would take place after
    1. 19 weeks,
    2. 35 weeks.
  6. Comment, with a reason, on the reliability of each of your predictions.
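A minimal Python sketch of parts (c) and (e), working from the raw table rather than the given sums, so the two can be cross-checked.

```python
x = [3, 5, 6, 8, 10, 12, 13, 15, 16, 18]           # weeks in storage
y = [36, 50, 53, 61, 69, 79, 82, 90, 88, 96]       # evaporation loss, ml
n = len(x)

S_xx = sum(v * v for v in x) - sum(x) ** 2 / n
S_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n

# (c) Least squares line y = a + b x.
b = S_xy / S_xx
a = sum(y) / n - b * sum(x) / n

# (e) Predictions at 19 and 35 weeks.
pred_19 = a + b * 19
pred_35 = a + b * 35

print(f"y = {a:.2f} + {b:.2f} x")
print(f"19 weeks: {pred_19:.1f} ml, 35 weeks: {pred_35:.1f} ml")
```

The observed storage times run from 3 to 18 weeks, which is the relevant range when commenting on each prediction in part (f).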
Edexcel S1 2012 January Q5
15 marks Moderate -0.8
  1. The age, \(t\) years, and weight, \(w\) grams, of each of 10 coins were recorded. These data are summarised below.
$$\sum t ^ { 2 } = 2688 \quad \sum t w = 1760.62 \quad \sum t = 158 \quad \sum w = 111.75 \quad S _ { w w } = 0.16$$
  1. Find \(S _ { t t }\) and \(S _ { t w }\) for these data.
  2. Calculate, to 3 significant figures, the product moment correlation coefficient between \(t\) and \(w\).
  3. Find the equation of the regression line of \(w\) on \(t\) in the form \(w = a + b t\)
  4. State, with a reason, which variable is the explanatory variable.
  5. Using this model, estimate
    1. the weight of a coin which is 5 years old,
    2. the effect of an increase of 4 years in age on the weight of a coin. It was discovered that a coin in the original sample, which was 5 years old and weighed 20 grams, was a fake.
  6. State, without any further calculations, whether the exclusion of this coin would increase or decrease the value of the product moment correlation coefficient. Give a reason for your answer.
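A short Python sketch for checking parts (a) to (c) and (e) from the summary values.

```python
import math

n = 10
sum_t, sum_w = 158, 111.75
sum_t2, sum_tw = 2688, 1760.62
S_ww = 0.16

# (a) Corrected sums.
S_tt = sum_t2 - sum_t**2 / n
S_tw = sum_tw - sum_t * sum_w / n

# (b) Product moment correlation coefficient.
r = S_tw / math.sqrt(S_tt * S_ww)

# (c) Regression line w = a + b t.
b = S_tw / S_tt
a = sum_w / n - b * sum_t / n

# (e)(i) Weight at 5 years; (e)(ii) change in weight over 4 extra years.
w_at_5 = a + b * 5
change_4_years = 4 * b

print(f"S_tt = {S_tt:.1f}, S_tw = {S_tw:.2f}, r = {r:.3f}")
print(f"w = {a:.2f} + ({b:.4f}) t; w(5) = {w_at_5:.2f} g; 4 years: {change_4_years:.3f} g")
```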
Edexcel S1 2013 January Q1
7 marks Easy -1.2
  1. A teacher asked a random sample of 10 students to record the number of hours of television, \(t\), they watched in the week before their mock exam. She then calculated their grade, \(g\), in their mock exam. The results are summarised as follows.
$$\sum t = 258 \quad \sum t ^ { 2 } = 8702 \quad \sum g = 63.6 \quad \mathrm {~S} _ { g g } = 7.864 \quad \sum g t = 1550.2$$
  1. Find \(\mathrm { S } _ { t t }\) and \(\mathrm { S } _ { g t }\)
  2. Calculate, to 3 significant figures, the product moment correlation coefficient between \(t\) and \(g\). The teacher also recorded the number of hours of revision, \(v\), these 10 students completed during the week before their mock exam. The correlation coefficient between \(t\) and \(v\) was \(-0.753\).
  3. Describe, giving a reason, the nature of the correlation you would expect to find between \(v\) and \(g\).
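A minimal Python sketch for parts (a) and (b); part (c) is a reasoning question and is not computed here.

```python
import math

n = 10
sum_t, sum_t2 = 258, 8702
sum_g, sum_gt = 63.6, 1550.2
S_gg = 7.864

# (a) Corrected sums.
S_tt = sum_t2 - sum_t**2 / n
S_gt = sum_gt - sum_t * sum_g / n

# (b) Product moment correlation coefficient between t and g.
r = S_gt / math.sqrt(S_tt * S_gg)

print(f"S_tt = {S_tt:.1f}, S_gt = {S_gt:.2f}, r = {r:.3f}")
```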
Edexcel S1 2013 January Q3
10 marks Moderate -0.8
3. A biologist is comparing the intervals (\(m\) seconds) between the mating calls of a certain species of tree frog and the surrounding temperature (\(t\,^{\circ}\mathrm{C}\)). The following results were obtained.
$$\begin{array}{l|cccccccc} t\ ({}^{\circ}\mathrm{C}) & 8 & 13 & 14 & 15 & 15 & 20 & 25 & 30 \\ \hline m\ (\text{seconds}) & 6.5 & 4.5 & 6 & 5 & 4 & 3 & 2 & 1 \end{array}$$
$$\text { (You may use } \sum t m = 469.5 , \quad \mathrm {~S} _ { t t } = 354 , \quad \mathrm {~S} _ { m m } = 25.5 \text { ) }$$
  1. Show that \(\mathrm { S } _ { t m } = - 90.5\)
  2. Find the equation of the regression line of \(m\) on \(t\) giving your answer in the form \(m = a + b t\).
  3. Use your regression line to estimate the time interval between mating calls when the surrounding temperature is \(10 ^ { \circ } \mathrm { C }\).
  4. Comment on the reliability of this estimate, giving a reason for your answer.
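A Python sketch that recomputes \(S_{tm}\) and \(S_{tt}\) from the table and then fits the line for parts (b) and (c).

```python
t = [8, 13, 14, 15, 15, 20, 25, 30]        # temperature, °C
m = [6.5, 4.5, 6, 5, 4, 3, 2, 1]           # interval between calls, seconds
n = len(t)

# (a) Corrected sums from the raw data.
S_tm = sum(p * q for p, q in zip(t, m)) - sum(t) * sum(m) / n
S_tt = sum(v * v for v in t) - sum(t) ** 2 / n

# (b) Regression line m = a + b t.
b = S_tm / S_tt
a = sum(m) / n - b * sum(t) / n

# (c) Estimate at 10 °C.
m_at_10 = a + b * 10

print(f"S_tm = {S_tm}, S_tt = {S_tt}")
print(f"m = {a:.2f} + ({b:.3f}) t; m(10) = {m_at_10:.2f} s")
```

10 °C lies inside the observed temperature range of 8 °C to 30 °C, so this estimate is an interpolation.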
Edexcel S1 2001 June Q2
5 marks Easy -1.2
2. On a particular day in summer 1993 at 0800 hours the height above sea level, \(x\) metres, and the temperature, \(y ^ { \circ } \mathrm { C }\), were recorded in 10 Mediterranean towns. The following summary statistics were calculated from the results. $$\Sigma x = 7300 , \Sigma x ^ { 2 } = 6599600 , S _ { x y } = - 13060 , S _ { y y } = 140.9 .$$
  1. Find \(S _ { x x }\).
  2. Calculate, to 3 significant figures, the product moment correlation coefficient between \(x\) and \(y\).
  3. Give an interpretation of your coefficient.
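A minimal Python sketch for parts (a) and (b), using the summary statistics given.

```python
import math

n = 10
sum_x, sum_x2 = 7300, 6599600
S_xy, S_yy = -13060, 140.9

# (a) S_xx = sum(x^2) - (sum x)^2 / n
S_xx = sum_x2 - sum_x**2 / n

# (b) Product moment correlation coefficient.
r = S_xy / math.sqrt(S_xx * S_yy)

print(f"S_xx = {S_xx:.0f}, r = {r:.3f}")
```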
Edexcel S1 2001 June Q7
16 marks Moderate -0.3
7. A music teacher monitored the sight-reading ability of one of her pupils over a 10 week period. At the end of each week, the pupil was given a new piece to sight-read and the teacher noted the number of errors \(y\). She also recorded the number of hours \(x\) that the pupil had practised each week. The data are shown in the table below.
$$\begin{array}{c|cccccccccc} x & 12 & 15 & 7 & 11 & 1 & 8 & 4 & 6 & 9 & 3 \\ \hline y & 8 & 4 & 13 & 8 & 18 & 12 & 15 & 14 & 12 & 16 \end{array}$$
  1. Plot these data on a scatter diagram.
  2. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\). (You may use \(\Sigma x^2 = 746\), \(\Sigma xy = 749\).)
  3. Give an interpretation of the slope and the intercept of your regression line.
  4. State whether or not you think the regression model is reasonable
    1. for the range of \(x\)-values given in the table,
    2. for all possible \(x\)-values. In each case justify your answer either by giving a reason for accepting the model or by suggesting an alternative model.
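A short Python sketch for part (b). The flattened table has been read as \(x = 12, 15, 7, 11, 1, 8, 4, 6, 9, 3\), a split that reproduces the given \(\Sigma x^2 = 746\) and \(\Sigma xy = 749\); the code re-checks those sums before fitting.

```python
x = [12, 15, 7, 11, 1, 8, 4, 6, 9, 3]          # hours of practice
y = [8, 4, 13, 8, 18, 12, 15, 14, 12, 16]      # number of errors
n = len(x)

# Cross-check the reading of the table against the given sums.
assert sum(v * v for v in x) == 746
assert sum(p * q for p, q in zip(x, y)) == 749

S_xx = sum(v * v for v in x) - sum(x) ** 2 / n
S_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n

# (b) Regression line y = a + b x.
b = S_xy / S_xx
a = sum(y) / n - b * sum(x) / n

print(f"S_xx = {S_xx:.1f}, S_xy = {S_xy:.0f}")
print(f"y = {a:.2f} + ({b:.3f}) x")
```

With a negative slope, the line eventually predicts a negative number of errors for large \(x\), which is relevant to part (d)(ii).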
Edexcel S1 2002 June Q7
16 marks Moderate -0.8
7. An ice cream seller believes that there is a relationship between the temperature on a summer day and the number of ice creams sold. Over a period of 10 days he records the temperature at 1 p.m., \(t ^ { \circ } \mathrm { C }\), and the number of ice creams sold, \(c\), in the next hour. The data he collects is summarised in the table below.
$$\begin{array}{c|c} t & c \\ \hline 13 & 24 \\ 22 & 55 \\ 17 & 35 \\ 20 & 45 \\ 10 & 20 \\ 15 & 30 \\ 19 & 39 \\ 12 & 19 \\ 18 & 36 \\ 23 & 54 \end{array}$$
[Use \(\Sigma t^2 = 3025\), \(\Sigma c^2 = 14245\), \(\Sigma ct = 6526\).]
  1. Calculate the value of the product moment correlation coefficient between \(t\) and \(c\).
  2. State whether or not your value supports the use of a regression equation to predict the number of ice creams sold. Give a reason for your answer.
  3. Find the equation of the least squares regression line of \(c\) on \(t\) in the form \(c = a + b t\).
  4. Interpret the value of \(b\).
  5. Estimate the number of ice creams sold between 1 p.m. and 2 p.m. when the temperature at 1 p.m. is \(16 ^ { \circ } \mathrm { C }\).
  6. At 1 p.m. on a particular day, the highest temperature for 50 years was recorded. Give a reason why you should not use the regression equation to predict ice cream sales on that day.
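A minimal Python sketch for parts (a), (c) and (e), computed from the table; the given sums are re-checked along the way.

```python
import math

t = [13, 22, 17, 20, 10, 15, 19, 12, 18, 23]   # temperature at 1 p.m., °C
c = [24, 55, 35, 45, 20, 30, 39, 19, 36, 54]   # ice creams sold in the next hour
n = len(t)

S_tt = sum(v * v for v in t) - sum(t) ** 2 / n
S_cc = sum(v * v for v in c) - sum(c) ** 2 / n
S_tc = sum(p * q for p, q in zip(t, c)) - sum(t) * sum(c) / n

r = S_tc / math.sqrt(S_tt * S_cc)       # (a) correlation coefficient
b = S_tc / S_tt                         # (c) gradient of c on t
a = sum(c) / n - b * sum(t) / n         #     intercept
c_at_16 = a + b * 16                    # (e) estimate at 16 °C

print(f"r = {r:.3f}; c = {a:.1f} + {b:.2f} t; c(16) = {c_at_16:.0f}")
```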
Edexcel S1 2004 June Q2
18 marks Moderate -0.8
2. A researcher thinks there is a link between a person's height and level of confidence. She measured the height \(h\), to the nearest cm, of a random sample of 9 people. She also devised a test to measure the level of confidence \(c\) of each person. The data are shown in the table below.
$$\begin{array}{c|ccccccccc} h & 179 & 169 & 187 & 166 & 162 & 193 & 161 & 177 & 168 \\ \hline c & 569 & 561 & 579 & 561 & 540 & 598 & 542 & 565 & 573 \end{array}$$
[You may use \(\Sigma h ^ { 2 } = 272094 , \Sigma c ^ { 2 } = 2878966 , \Sigma h c = 884484\) ]
  1. Draw a scatter diagram to illustrate these data.
  2. Find exact values of \(S _ { h c }\), \(S _ { h h }\) and \(S _ { c c }\).
  3. Calculate the value of the product moment correlation coefficient for these data.
  4. Give an interpretation of your correlation coefficient.
  5. Calculate the equation of the regression line of \(c\) on \(h\) in the form \(c = a + b h\).
  6. Estimate the level of confidence of a person of height 180 cm.
  7. State the range of values of \(h\) for which estimates of \(c\) are reliable.
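Because part (b) asks for exact values, this Python sketch keeps the corrected sums as `Fraction` objects and only converts to floating point for the square root. It is a checking aid under those choices, not the only way to present the working.

```python
import math
from fractions import Fraction

h = [179, 169, 187, 166, 162, 193, 161, 177, 168]   # height, cm
c = [569, 561, 579, 561, 540, 598, 542, 565, 573]   # confidence score
n = len(h)

# (b) Exact corrected sums, kept as fractions so nothing is rounded.
S_hh = sum(v * v for v in h) - Fraction(sum(h) ** 2, n)
S_cc = sum(v * v for v in c) - Fraction(sum(c) ** 2, n)
S_hc = sum(p * q for p, q in zip(h, c)) - Fraction(sum(h) * sum(c), n)

# (c) Correlation coefficient (converted to float for the square root).
r = float(S_hc) / math.sqrt(float(S_hh * S_cc))

# (e) Regression line c = a + b h, still exact.
b = S_hc / S_hh
a = Fraction(sum(c), n) - b * Fraction(sum(h), n)

# (f) Estimate for a height of 180 cm.
c_at_180 = float(a + b * 180)

print(f"S_hc = {S_hc}, S_hh = {S_hh}, S_cc = {S_cc}")
print(f"r = {r:.3f}; c = {float(a):.1f} + {float(b):.3f} h; c(180) = {c_at_180:.0f}")
```

The sampled heights run from 161 cm to 193 cm, which is the natural starting point for part (g).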
Edexcel S1 2005 June Q1
6 marks Easy -1.2
  1. The scatter diagrams below were drawn by a student.
[Scatter diagrams of \(y\) against \(x\): figure not reproduced.] The student calculated the value of the product moment correlation coefficient for each of the sets of data. The values were $$\begin{array} { l l l } 0.68 & - 0.79 & 0.08 \end{array}$$ Write down, with a reason, which value corresponds to which scatter diagram.
Edexcel S1 2007 June Q3
15 marks Moderate -0.3
3. A student is investigating the relationship between the price (\(y\) pence) of 100 g of chocolate and the percentage (\(x\%\)) of cocoa solids in the chocolate.
The following data is obtained
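$$\begin{array}{l|cccccccc} \text{Chocolate brand} & A & B & C & D & E & F & G & H \\ \hline x\ (\%\ \text{cocoa}) & 10 & 20 & 30 & 35 & 40 & 50 & 60 & 70 \\ y\ (\text{pence}) & 35 & 55 & 40 & 100 & 60 & 90 & 110 & 130 \end{array}$$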
(You may use: \(\sum x = 315 , \sum x ^ { 2 } = 15225 , \sum y = 620 , \sum y ^ { 2 } = 56550 , \sum x y = 28750\) )
  1. On graph paper, draw a scatter diagram to represent these data.
  2. Show that \(S _ { x y } = 4337.5\) and find \(S _ { x x }\). The student believes that a linear relationship of the form \(y = a + b x\) could be used to describe these data.
  3. Use linear regression to find the value of \(a\) and the value of \(b\), giving your answers to 1 decimal place.
  4. Draw the regression line on your scatter diagram. The student believes that one brand of chocolate is overpriced.
  5. Use the scatter diagram to
    1. state which brand is overpriced,
    2. suggest a fair price for this brand. Give reasons for both your answers.
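A Python sketch of parts (b), (c) and (e). The question expects the overpriced brand to be read off the scatter diagram; here, as an assumption, the brand with the largest positive residual (the point furthest above the line) is taken as the overpriced one, and its fair price is read from the fitted line.

```python
brands = "ABCDEFGH"
x = [10, 20, 30, 35, 40, 50, 60, 70]       # % cocoa
y = [35, 55, 40, 100, 60, 90, 110, 130]    # price in pence
n = len(x)

S_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
S_xx = sum(p * p for p in x) - sum(x) ** 2 / n

# (c) Least squares line y = a + b x.
b = S_xy / S_xx
a = sum(y) / n - b * sum(x) / n

# (e) Assumption: the brand furthest above the line is the overpriced one.
residuals = {brand: yi - (a + b * xi) for brand, xi, yi in zip(brands, x, y)}
overpriced = max(residuals, key=residuals.get)
fair_price = a + b * x[brands.index(overpriced)]

print(f"S_xy = {S_xy}, S_xx = {S_xx}")
print(f"y = {a:.1f} + {b:.1f} x; overpriced: {overpriced}, fair price about {fair_price:.0f}p")
```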
Edexcel S1 2010 June Q6
14 marks Moderate -0.8
6. A travel agent sells flights to different destinations from Beerow airport. The distance \(d\), measured in 100 km, of the destination from the airport and the fare \(\pounds f\) are recorded for a random sample of 6 destinations.
$$\begin{array}{c|cccccc} \text{Destination} & A & B & C & D & E & F \\ \hline d & 2.2 & 4.0 & 6.0 & 2.5 & 8.0 & 5.0 \\ f & 18 & 20 & 25 & 23 & 32 & 28 \end{array}$$
$$\text { [You may use } \sum d ^ { 2 } = 152.09 \quad \sum f ^ { 2 } = 3686 \quad \sum f d = 723.1 \text { ] }$$
  1. Using the axes below, complete a scatter diagram to illustrate this information.
  2. Explain why a linear regression model may be appropriate to describe the relationship between \(f\) and \(d\).
  3. Calculate \(S _ { d d }\) and \(S _ { f d }\)
  4. Calculate the equation of the regression line of \(f\) on \(d\) giving your answer in the form \(f = a + b d\).
  5. Give an interpretation of the value of \(b\). Jane is planning her holiday and wishes to fly from Beerow airport to a destination \(t \mathrm {~km}\) away. A rival travel agent charges 5 p per km.
  6. Find the range of values of \(t\) for which the first travel agent is cheaper than the rival. \includegraphics[max width=\textwidth, alt={}, center]{039e6fcf-3222-40cc-95ea-37b8dc4a4ddb-11_1013_1701_1718_116}
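A minimal Python sketch for parts (c), (d) and (f). The key step in (f) is a unit conversion: a trip of \(t\) km has \(d = t/100\), and the rival's 5p per km is \(\pounds 0.05t\).

```python
d = [2.2, 4.0, 6.0, 2.5, 8.0, 5.0]   # distance in hundreds of km
f = [18, 20, 25, 23, 32, 28]         # fare in £
n = len(d)

# (c) Corrected sums.
S_dd = sum(v * v for v in d) - sum(d) ** 2 / n
S_fd = sum(p * q for p, q in zip(d, f)) - sum(d) * sum(f) / n

# (d) Regression line f = a + b d.
b = S_fd / S_dd
a = sum(f) / n - b * sum(d) / n

# (f) First agent: a + b * t / 100 pounds; rival: 0.05 * t pounds.
# First agent is cheaper when a + b * t / 100 < 0.05 * t, i.e. t * (0.05 - b / 100) > a.
rate_gap = 0.05 - b / 100          # positive here, so dividing keeps the inequality direction
t_threshold = a / rate_gap

print(f"S_dd = {S_dd:.4f}, S_fd = {S_fd:.4f}")
print(f"f = {a:.2f} + {b:.2f} d; first agent cheaper for t > {t_threshold:.0f} km")
```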
Edexcel S1 2012 June Q3
15 marks Moderate -0.5
3. A scientist is researching whether or not birds of prey exposed to pollutants lay eggs with thinner shells. He collects a random sample of egg shells from each of 6 different nests and tests for pollutant level, \(p\), and measures the thinning of the shell, \(t\). The results are shown in the table below.
$$\begin{array}{c|cccccc} p & 3 & 8 & 30 & 25 & 15 & 12 \\ \hline t & 1 & 3 & 9 & 10 & 5 & 6 \end{array}$$
[You may use \(\sum p ^ { 2 } = 1967\) and \(\sum p t = 694\) ]
  1. Draw a scatter diagram on the axes on page 7 to represent these data.
  2. Explain why a linear regression model may be appropriate to describe the relationship between \(p\) and \(t\).
  3. Calculate the value of \(S _ { p t }\) and the value of \(S _ { p p }\).
  4. Find the equation of the regression line of \(t\) on \(p\), giving your answer in the form \(t = a + b p\).
  5. Plot the point ( \(\bar { p } , \bar { t }\) ) and draw the regression line on your scatter diagram. The scientist reviews similar studies and finds that pollutant levels above 16 are likely to result in the death of a chick soon after hatching.
  6. Estimate the minimum thinning of the shell that is likely to result in the death of a chick. \includegraphics[max width=\textwidth, alt={}, center]{0593544d-392d-465b-b922-c9cb1435abb5-05_1257_1568_301_173}
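A short Python sketch for parts (c), (d) and (f). The flattened pollutant row has been read as the six values \(3, 8, 30, 25, 15, 12\), which reproduces the given \(\sum p^2 = 1967\) and \(\sum pt = 694\); the code re-checks this.

```python
p = [3, 8, 30, 25, 15, 12]   # pollutant level
t = [1, 3, 9, 10, 5, 6]      # thinning of the shell
n = len(p)

# Cross-check the reading of the table against the given sums.
assert sum(v * v for v in p) == 1967
assert sum(a_ * b_ for a_, b_ in zip(p, t)) == 694

# (c) Corrected sums.
S_pt = sum(a_ * b_ for a_, b_ in zip(p, t)) - sum(p) * sum(t) / n
S_pp = sum(v * v for v in p) - sum(p) ** 2 / n

# (d) Regression line t = a + b p.
b = S_pt / S_pp
a = sum(t) / n - b * sum(p) / n

# (f) Thinning predicted at the threshold pollutant level p = 16.
t_at_16 = a + b * 16

print(f"S_pt = {S_pt}, S_pp = {S_pp}")
print(f"t = {a:.3f} + {b:.3f} p; t(16) = {t_at_16:.2f}")
```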
Edexcel S1 2014 June Q3
16 marks Moderate -0.8
3. A large company is analysing how much money it spends on paper in its offices every year. The number of employees, \(x\), and the amount of money spent on paper, \(p\) ( \(\pounds\) hundreds), in 8 randomly selected offices are given in the table below.
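$$\begin{array}{l|cccccccc} x & 8 & 9 & 12 & 14 & 7 & 3 & 16 & 19 \\ \hline p\ (\pounds\ \text{hundreds}) & 40.5 & 36.1 & 30.4 & 39.4 & 32.6 & 31.1 & 43.4 & 45.7 \end{array}$$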
$$\text { (You may use } \sum x ^ { 2 } = 1160 \quad \sum p = 299.2 \quad \sum p ^ { 2 } = 11422 \quad \sum x p = 3449.5 \text { ) }$$
  1. Show that \(S _ { p p } = 231.92\) and find the value of \(S _ { x x }\) and the value of \(S _ { x p }\)
  2. Calculate the product moment correlation coefficient between \(x\) and \(p\). The equation of the regression line of \(p\) on \(x\) is given in the form \(p = a + b x\).
  3. Show that, to 3 significant figures, \(b = 0.824\) and find the value of \(a\).
  4. Estimate the amount of money spent on paper in an office with 10 employees.
  5. Explain the effect each additional employee has on the amount of money spent on paper. Later the company realised it had made a mistake in adding up its costs, \(p\). The true costs were actually half of the values recorded. The product moment correlation coefficient and the equation of the linear regression line are recalculated using this information.
  6. Write down the new value of
    1. the product moment correlation coefficient,
    2. the gradient of the regression line.
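A Python sketch of parts (a) to (d) and (f). A small helper, `fit` (a name introduced here for illustration), returns the corrected sums, \(r\) and the line; calling it again on halved costs shows that rescaling \(p\) leaves \(r\) unchanged and halves the gradient.

```python
import math


def fit(xs, ys):
    """Return S_xx, S_yy, S_xy, r and the least squares line y = a + b x."""
    n = len(xs)
    s_xx = sum(v * v for v in xs) - sum(xs) ** 2 / n
    s_yy = sum(v * v for v in ys) - sum(ys) ** 2 / n
    s_xy = sum(p * q for p, q in zip(xs, ys)) - sum(xs) * sum(ys) / n
    r = s_xy / math.sqrt(s_xx * s_yy)
    b = s_xy / s_xx
    a = sum(ys) / n - b * sum(xs) / n
    return s_xx, s_yy, s_xy, r, a, b


x = [8, 9, 12, 14, 7, 3, 16, 19]                           # employees
p = [40.5, 36.1, 30.4, 39.4, 32.6, 31.1, 43.4, 45.7]       # £ hundreds

S_xx, S_pp, S_xp, r, a, b = fit(x, p)
estimate_10 = a + b * 10                                    # (d) 10 employees

# (f) True costs are half the recorded values.
_, _, _, r_half, a_half, b_half = fit(x, [v / 2 for v in p])

print(f"S_pp = {S_pp:.2f}, S_xx = {S_xx:.0f}, S_xp = {S_xp:.1f}, r = {r:.3f}")
print(f"p = {a:.2f} + {b:.3f} x; p(10) = {estimate_10:.2f} (£ hundreds)")
print(f"halved costs: r = {r_half:.3f}, gradient = {b_half:.3f}")
```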
Edexcel S1 2014 June Q3
13 marks Easy -1.2
3. The table shows data on the number of visitors to the UK in a month, \(v\) (1000s), and the amount of money they spent, \(m\) ( \(\pounds\) millions), for each of 8 months.
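$$\begin{array}{l|cccccccc} \text{Number of visitors, } v\ (1000\text{s}) & 2450 & 2480 & 2540 & 2420 & 2350 & 2290 & 2400 & 2460 \\ \hline \text{Amount of money spent, } m\ (\pounds\ \text{millions}) & 1370 & 1350 & 1400 & 1330 & 1270 & 1210 & 1330 & 1350 \end{array}$$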
You may use \(S _ { v v } = 42587.5 \quad S _ { v m } = 31512.5 \quad S _ { m m } = 25187.5 \quad \sum v = 19390 \quad \sum m = 10610\)
  1. Find the product moment correlation coefficient between \(m\) and \(v\).
  2. Give a reason to support fitting a regression model of the form \(m = a + b v\) to these data.
  3. Find the value of \(b\) correct to 3 decimal places.
  4. Find the equation of the regression line of \(m\) on \(v\).
  5. Interpret your value of \(b\).
  6. Use your answer to part (d) to estimate the amount of money spent when the number of visitors to the UK in a month is 2500000
  7. Comment on the reliability of your estimate in part (f). Give a reason for your answer.
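A minimal Python sketch for parts (a), (c), (d) and (f), using the \(S\) values and sums given. Since \(v\) is measured in thousands, 2 500 000 visitors corresponds to \(v = 2500\).

```python
import math

n = 8
S_vv, S_vm, S_mm = 42587.5, 31512.5, 25187.5
sum_v, sum_m = 19390, 10610

# (a) Product moment correlation coefficient.
r = S_vm / math.sqrt(S_vv * S_mm)

# (c), (d) Regression line m = a + b v.
b = S_vm / S_vv
a = sum_m / n - b * sum_v / n

# (f) 2 500 000 visitors is v = 2500 (thousands).
m_at_2500 = a + b * 2500

print(f"r = {r:.3f}; m = {a:.1f} + {b:.3f} v; m(2500) = {m_at_2500:.0f} (£ millions)")
```

The recorded monthly visitor numbers run from 2290 to 2540 thousand, so \(v = 2500\) lies inside the data range.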
Edexcel S1 2016 June Q1
11 marks Moderate -0.8
  1. A biologist is studying the behaviour of bees in a hive. Once a bee has located a source of food, it returns to the hive and performs a dance to indicate to the other bees how far away the source of the food is. The dance consists of a series of wiggles. The biologist records the distance, \(d\) metres, of the food source from the hive and the average number of wiggles, \(w\), in the dance.
$$\begin{array}{l|cccccccc} \text{Distance, } d\ \text{m} & 30 & 50 & 80 & 100 & 150 & 400 & 500 & 650 \\ \hline \text{Average number of wiggles, } w & 0.725 & 1.210 & 1.775 & 2.250 & 3.518 & 6.382 & 8.185 & 9.555 \end{array}$$
[You may use \(\sum w = 33.6\), \(\sum d w = 13833\), \(\mathrm {~S} _ { d d } = 394600\), \(\mathrm {~S} _ { w w } = 80.481\) (to 3 decimal places)]
  1. Show that \(\mathrm { S } _ { d w } = 5601\)
  2. State, giving a reason, which is the response variable.
  3. Calculate the product moment correlation coefficient for these data.
  4. Calculate the equation of the regression line of \(w\) on \(d\), giving your answer in the form \(w = a + b d\) A new source of food is located 350 m from the hive.
    1. Use your regression equation to estimate the average number of wiggles in the corresponding dance.
    2. Comment, giving a reason, on the reliability of your estimate.
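A short Python sketch that recomputes the corrected sums from the table and then answers parts (a), (c), (d) and (e)(i).

```python
import math

d = [30, 50, 80, 100, 150, 400, 500, 650]                       # distance, m
w = [0.725, 1.210, 1.775, 2.250, 3.518, 6.382, 8.185, 9.555]    # average wiggles
n = len(d)

S_dd = sum(v * v for v in d) - sum(d) ** 2 / n
S_ww = sum(v * v for v in w) - sum(w) ** 2 / n
S_dw = sum(p * q for p, q in zip(d, w)) - sum(d) * sum(w) / n   # (a)

r = S_dw / math.sqrt(S_dd * S_ww)                               # (c)

# (d) Regression line w = a + b d, with d as the explanatory variable.
b = S_dw / S_dd
a = sum(w) / n - b * sum(d) / n

# (e)(i) Estimate for a food source 350 m away.
w_at_350 = a + b * 350

print(f"S_dw = {S_dw:.1f}, S_ww = {S_ww:.3f}, r = {r:.3f}")
print(f"w = {a:.3f} + {b:.4f} d; w(350) = {w_at_350:.2f}")
```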
Edexcel S1 Q6
16 marks Moderate -0.8
6. To test the heating of tyre material, tyres are run on a test rig at chosen speeds under given conditions of load, pressure and surrounding temperature. The following table gives values of \(x\), the test rig speed in miles per hour (mph), and the temperature, \(y ^ { \circ } \mathrm { C }\), generated in the shoulder of the tyre for a particular tyre material.
$$\begin{array}{l|cccccccc} x\ (\mathrm{mph}) & 15 & 20 & 25 & 30 & 35 & 40 & 45 & 50 \\ \hline y\ ({}^{\circ}\mathrm{C}) & 53 & 55 & 63 & 65 & 78 & 83 & 91 & 101 \end{array}$$
  1. Draw a scatter diagram to represent these data.
  2. Give a reason to support the fitting of a regression line of the form \(y = a + b x\) through these points.
  3. Find the values of \(a\) and \(b\).
    (You may use \(\Sigma x ^ { 2 } = 9500 , \Sigma y ^ { 2 } = 45483 , \Sigma x y = 20615\) )
  4. Give an interpretation for each of \(a\) and \(b\).
  5. Use your line to estimate the temperature at 50 mph and explain why this estimate differs from the value given in the table. A tyre specialist wants to estimate the temperature of this tyre material at 12 mph and 85 mph .
  6. Explain briefly whether or not you would recommend the specialist to use this regression equation to obtain these estimates.
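A minimal Python sketch for parts (c) and (e). It also prints the residual at 50 mph, which is the gap between the observed temperature and the value on the fitted line.

```python
x = [15, 20, 25, 30, 35, 40, 45, 50]          # test rig speed, mph
y = [53, 55, 63, 65, 78, 83, 91, 101]         # shoulder temperature, °C
n = len(x)

S_xx = sum(v * v for v in x) - sum(x) ** 2 / n
S_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n

# (c) Regression line y = a + b x.
b = S_xy / S_xx
a = sum(y) / n - b * sum(x) / n

# (e) Value on the line at 50 mph, and the residual for the observed point.
y_at_50 = a + b * 50
residual_50 = y[x.index(50)] - y_at_50

print(f"y = {a:.1f} + {b:.2f} x; line at 50 mph: {y_at_50:.1f} °C; residual {residual_50:.1f}")
```

The observed speeds run from 15 to 50 mph, so both 12 mph and 85 mph lie outside the data.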
Edexcel S1 2003 November Q1
16 marks Moderate -0.8
  1. A company wants to pay its employees according to their performance at work. The performance score \(x\) and the annual salary, \(y\), in \(\pounds 100\)s, for a random sample of 10 of its employees for last year were recorded. The results are shown in the table below.
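$$\begin{array}{c|cccccccccc} x & 15 & 40 & 27 & 39 & 27 & 15 & 20 & 30 & 19 & 24 \\ \hline y & 216 & 384 & 234 & 399 & 226 & 132 & 175 & 316 & 187 & 196 \end{array}$$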
[You may assume \(\Sigma xy = 69798\), \(\Sigma x^2 = 7266\).]
  1. Draw a scatter diagram to represent these data.
  2. Calculate exact values of \(S _ { x y }\) and \(S _ { x x }\).
    1. Calculate the equation of the regression line of \(y\) on \(x\), in the form \(y = a + b x\). Give the values of \(a\) and \(b\) to 3 significant figures.
    2. Draw this line on your scatter diagram.
  3. Interpret the gradient of the regression line. The company decides to use this regression model to determine future salaries.
  4. Find the proposed annual salary for an employee who has a performance score of 35 .
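A Python sketch for parts (b), (c)(i) and (e). The salary in part (e) is computed here with the unrounded coefficients; using \(a\) and \(b\) rounded to 3 significant figures gives a slightly different figure.

```python
x = [15, 40, 27, 39, 27, 15, 20, 30, 19, 24]             # performance score
y = [216, 384, 234, 399, 226, 132, 175, 316, 187, 196]   # salary, £100s
n = len(x)

# (b) Exact corrected sums.
S_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
S_xx = sum(v * v for v in x) - sum(x) ** 2 / n

# (c)(i) Regression line y = a + b x.
b = S_xy / S_xx
a = sum(y) / n - b * sum(x) / n

# (e) Proposed salary for a performance score of 35, in £100s.
salary_35 = a + b * 35

print(f"S_xy = {S_xy:.0f}, S_xx = {S_xx:.1f}")
print(f"y = {a:.3g} + {b:.3g} x; salary(35) = {salary_35:.1f} (£100s)")
```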
AQA S1 2006 January Q5
11 marks Easy -1.2
5 [Figure 1, printed on the insert, is provided for use in this question.]
The table shows the times, in seconds, taken by a random sample of 10 boys from a junior swimming club to swim 50 metres freestyle and 50 metres backstroke.
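$$\begin{array}{l|cccccccccc} \text{Boy} & A & B & C & D & E & F & G & H & I & J \\ \hline \text{Freestyle } (x \text{ seconds}) & 30.2 & 32.8 & 25.1 & 31.8 & 31.2 & 35.6 & 32.4 & 38.0 & 36.1 & 34.1 \\ \text{Backstroke } (y \text{ seconds}) & 33.5 & 35.4 & 37.4 & 27.2 & 34.7 & 38.2 & 37.7 & 41.4 & 42.3 & 38.4 \end{array}$$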
  1. On Figure 1, complete the scatter diagram for these data.
  2. Hence:
    1. give two distinct comments on what your scatter diagram reveals;
    2. state, without calculation, which of the following 3 values is most likely to be the value of the product moment correlation coefficient for the data in your scatter diagram. $$0.912 \quad 0.088 \quad 0.462$$
  3. In the sample of 10 boys, one boy is a junior-champion freestyle swimmer and one boy is a junior-champion backstroke swimmer. Identify the two most likely boys.
  4. Removing the data for the two boys whom you identified in part (c):
    1. calculate the value of the product moment correlation coefficient for the remaining 8 pairs of values of \(x\) and \(y\);
    2. comment, in context, on the value that you obtain.
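A Python sketch for parts (c) and (d)(i). It assumes, as a reasonable reading of part (c), that each champion is the boy with the fastest (smallest) time in that stroke; the correlation is then recomputed for the remaining eight boys.

```python
import math

boys = "ABCDEFGHIJ"
x = [30.2, 32.8, 25.1, 31.8, 31.2, 35.6, 32.4, 38.0, 36.1, 34.1]  # freestyle, seconds
y = [33.5, 35.4, 37.4, 27.2, 34.7, 38.2, 37.7, 41.4, 42.3, 38.4]  # backstroke, seconds


def pmcc(xs, ys):
    """Product moment correlation coefficient S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((p - x_bar) ** 2 for p in xs)
    s_yy = sum((q - y_bar) ** 2 for q in ys)
    s_xy = sum((p - x_bar) * (q - y_bar) for p, q in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


# (c) Assumption: each champion has the fastest (smallest) time in that stroke.
free_champ = boys[x.index(min(x))]
back_champ = boys[y.index(min(y))]

# (d)(i) Remove those two boys and recompute r for the remaining 8 pairs.
keep = [i for i, boy in enumerate(boys) if boy not in (free_champ, back_champ)]
r_8 = pmcc([x[i] for i in keep], [y[i] for i in keep])

print(f"freestyle champion: {free_champ}, backstroke champion: {back_champ}")
print(f"r for remaining 8 boys = {r_8:.3f}")
```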
AQA S1 2008 January Q4
12 marks Moderate -0.3
4 [Figure 1, printed on the insert, is provided for use in this question.]
Roseen is a self-employed decorator who wishes to estimate the times that it will take her to decorate bedrooms based upon their floor areas. She records the floor area, \(x \mathrm {~m} ^ { 2 }\), and the decorating time, \(y\) hours, for each of 10 bedrooms she has recently decorated.
$$\begin{array}{c|cccccccccc} x & 11.0 & 22.0 & 7.5 & 21.0 & 13.0 & 16.5 & 14.0 & 16.0 & 18.5 & 20.5 \\ \hline y & 15.0 & 35.0 & 16.0 & 23.5 & 24.0 & 17.5 & 14.5 & 27.5 & 22.5 & 34.5 \end{array}$$
  1. On Figure 1, plot a scatter diagram of these data.
  2. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  3. Draw your regression line on Figure 1.
    1. Use your regression equation to estimate the time that Roseen will take to decorate a bedroom with a floor area of \(15 \mathrm {~m} ^ { 2 }\).
    2. Making reference to Figure 1, comment on the likely reliability of your estimate in part (d)(i).
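A minimal Python sketch for parts (b) and (d)(i), computing the least squares line of \(y\) on \(x\) from the table.

```python
x = [11.0, 22.0, 7.5, 21.0, 13.0, 16.5, 14.0, 16.0, 18.5, 20.5]   # floor area, m^2
y = [15.0, 35.0, 16.0, 23.5, 24.0, 17.5, 14.5, 27.5, 22.5, 34.5]   # decorating time, hours
n = len(x)

S_xx = sum(v * v for v in x) - sum(x) ** 2 / n
S_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n

# (b) Least squares regression line y = a + b x.
b = S_xy / S_xx
a = sum(y) / n - b * sum(x) / n

# (d)(i) Estimated time for a floor area of 15 m^2.
y_at_15 = a + b * 15

print(f"S_xx = {S_xx}, S_xy = {S_xy}")
print(f"y = {a:.2f} + {b:.2f} x; y(15) = {y_at_15:.1f} hours")
```

The points near \(x = 15\) lie on both sides of the line at some distance from it, which is what part (d)(ii) asks you to weigh up from Figure 1.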