5.09e Use regression: for estimation in context

129 questions

Edexcel S1 2009 June Q5
9 marks Moderate -0.8
5. The weight, \(w\) grams, and the length, \(l \mathrm {~mm}\), of 10 randomly selected newborn turtles are given in the table below.
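| \(l\) | 49.0 | 52.0 | 53.0 | 54.5 | 54.1 | 53.4 | 50.0 | 51.6 | 49.5 | 51.2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(w\) | 29 | 32 | 34 | 39 | 38 | 35 | 30 | 31 | 29 | 30 |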
$$\text { (You may use } \mathrm { S } _ { l l } = 33.381 \quad \mathrm {~S} _ { w l } = 59.99 \quad \mathrm {~S} _ { w w } = 120.1 \text { ) }$$
  1. Find the equation of the regression line of \(w\) on \(l\) in the form \(w = a + b l\).
  2. Use your regression line to estimate the weight of a newborn turtle of length 60 mm .
  3. Comment on the reliability of your estimate giving a reason for your answer.
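The method behind these three parts can be sketched in Python. This is a minimal illustration, not a mark-scheme solution: it uses the standard results \(b = S_{wl} / S_{ll}\) and \(a = \bar{w} - b \bar{l}\) with the summary statistics quoted in the question. The helper name `regression_from_summaries` and the variable names are assumptions introduced here.

```python
# Sketch: regression line of w on l from the quoted summary statistics.
lengths = [49.0, 52.0, 53.0, 54.5, 54.1, 53.4, 50.0, 51.6, 49.5, 51.2]  # l, mm
weights = [29, 32, 34, 39, 38, 35, 30, 31, 29, 30]                      # w, grams

S_ll = 33.381  # given in the question
S_wl = 59.99   # given in the question


def regression_from_summaries(s_xy, s_xx, x_mean, y_mean):
    """Return (a, b) for the line y = a + b x (hypothetical helper)."""
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    return a, b


l_mean = sum(lengths) / len(lengths)
w_mean = sum(weights) / len(weights)
a, b = regression_from_summaries(S_wl, S_ll, l_mean, w_mean)

# Estimate the weight at l = 60 mm and flag whether 60 lies outside the data.
estimate_at_60 = a + b * 60
is_extrapolation = not (min(lengths) <= 60 <= max(lengths))

print(f"w = {a:.2f} + {b:.3f} l")
print(f"estimate at 60 mm: {estimate_at_60:.1f} g (extrapolation: {is_extrapolation})")
```

Because 60 mm is above the largest observed length (54.5 mm), the flag records that the estimate is an extrapolation, which is the usual basis for the reliability comment.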
Edexcel S1 2010 June Q6
14 marks Moderate -0.8
6. A travel agent sells flights to different destinations from Beerow airport. The distance \(d\), measured in 100 km , of the destination from the airport and the fare \(\pounds f\) are recorded for a random sample of 6 destinations.
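| Destination | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) |
| --- | --- | --- | --- | --- | --- | --- |
| \(d\) | 2.2 | 4.0 | 6.0 | 2.5 | 8.0 | 5.0 |
| \(f\) | 18 | 20 | 25 | 23 | 32 | 28 |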
$$\text { [You may use } \sum d ^ { 2 } = 152.09 \quad \sum f ^ { 2 } = 3686 \quad \sum f d = 723.1 \text { ] }$$
  1. Using the axes below, complete a scatter diagram to illustrate this information.
  2. Explain why a linear regression model may be appropriate to describe the relationship between \(f\) and \(d\).
  3. Calculate \(S _ { d d }\) and \(S _ { f d }\)
  4. Calculate the equation of the regression line of \(f\) on \(d\) giving your answer in the form \(f = a + b d\).
  5. Give an interpretation of the value of \(b\).
    Jane is planning her holiday and wishes to fly from Beerow airport to a destination \(t \mathrm {~km}\) away. A rival travel agent charges 5p per km.
  6. Find the range of values of \(t\) for which the first travel agent is cheaper than the rival.
    \includegraphics[max width=\textwidth, alt={}, center]{039e6fcf-3222-40cc-95ea-37b8dc4a4ddb-11_1013_1701_1718_116}
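A hedged sketch of the arithmetic for this question follows. It computes \(S_{dd}\) and \(S_{fd}\) from the quoted sums, fits the line of \(f\) on \(d\), and finds where the fitted fare meets the rival's fare. The unit conversion (5p per km is £5 per 100 km, so the rival's fare is \(5d\)) and all variable names are assumptions of this illustration, not part of the paper.

```python
# Sketch: S_dd, S_fd, the regression of f on d, and the break-even distance.
d = [2.2, 4.0, 6.0, 2.5, 8.0, 5.0]   # distance in units of 100 km
f = [18, 20, 25, 23, 32, 28]         # fare in pounds
n = len(d)

sum_d, sum_f = sum(d), sum(f)
sum_dd = 152.09   # given: sum of d^2
sum_fd = 723.1    # given: sum of f*d

S_dd = sum_dd - sum_d ** 2 / n
S_fd = sum_fd - sum_d * sum_f / n

b = S_fd / S_dd                  # gradient: pounds per extra 100 km
a = sum_f / n - b * sum_d / n    # intercept: pounds

# Rival fare: 0.05 pounds per km = 5 pounds per 100 km, so fare = 5 * d.
rival_rate = 5.0
# The first agent is cheaper when a + b*d < 5*d, i.e. d > a / (5 - b).
d_break_even = a / (rival_rate - b)
t_break_even_km = 100 * d_break_even

print(f"S_dd = {S_dd:.3f}, S_fd = {S_fd:.3f}")
print(f"f = {a:.2f} + {b:.3f} d")
print(f"first agent cheaper for t above about {t_break_even_km:.0f} km")
```

The break-even distance falls inside the sampled range of \(d\) (2.2 to 8.0), but the inequality is only trustworthy for distances within that range.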
Edexcel S1 2012 June Q3
15 marks Moderate -0.5
3. A scientist is researching whether or not birds of prey exposed to pollutants lay eggs with thinner shells. He collects a random sample of egg shells from each of 6 different nests and tests for pollutant level, \(p\), and measures the thinning of the shell, \(t\). The results are shown in the table below.
| \(p\) | 3 | 8 | 30 | 25 | 15 | 12 |
| --- | --- | --- | --- | --- | --- | --- |
| \(t\) | 1 | 3 | 9 | 10 | 5 | 6 |
[You may use \(\sum p ^ { 2 } = 1967\) and \(\sum p t = 694\) ]
  1. Draw a scatter diagram on the axes on page 7 to represent these data.
  2. Explain why a linear regression model may be appropriate to describe the relationship between \(p\) and \(t\).
  3. Calculate the value of \(S _ { p t }\) and the value of \(S _ { p p }\).
  4. Find the equation of the regression line of \(t\) on \(p\), giving your answer in the form \(t = a + b p\).
  5. Plot the point \(( \bar { p } , \bar { t } )\) and draw the regression line on your scatter diagram.
    The scientist reviews similar studies and finds that pollutant levels above 16 are likely to result in the death of a chick soon after hatching.
  6. Estimate the minimum thinning of the shell that is likely to result in the death of a chick.
    \includegraphics[max width=\textwidth, alt={}, center]{0593544d-392d-465b-b922-c9cb1435abb5-05_1257_1568_301_173}
Edexcel S1 2013 June Q1
13 marks Moderate -0.8
  1. A meteorologist believes that there is a relationship between the height above sea level, \(h \mathrm {~m}\), and the air temperature, \(t ^ { \circ } \mathrm { C }\). Data is collected at the same time from 9 different places on the same mountain. The data is summarised in the table below.
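| \(h\) | 1400 | 1100 | 260 | 840 | 900 | 550 | 1230 | 100 | 770 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(t\) | 3 | 10 | 20 | 9 | 10 | 13 | 5 | 24 | 16 |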
[You may assume that \(\sum h = 7150 , \sum t = 110 , \sum h ^ { 2 } = 7171500 , \sum t ^ { 2 } = 1716\), \(\sum t h = 64980\) and \(\mathrm { S } _ { t t } = 371.56\) ]
  1. Calculate \(\mathrm { S } _ { t h }\) and \(\mathrm { S } _ { h h }\). Give your answers to 3 significant figures.
  2. Calculate the product moment correlation coefficient for this data.
  3. State whether or not your value supports the use of a regression equation to predict the air temperature at different heights on this mountain. Give a reason for your answer.
  4. Find the equation of the regression line of \(t\) on \(h\) giving your answer in the form \(t = a + b h\).
  5. Interpret the value of \(b\).
  6. Estimate the difference in air temperature between a height of 500 m and a height of 1000 m .
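The calculations in this question can be sketched from the quoted totals alone. The block below is an illustration under the standard formulas \(S_{th} = \sum th - \frac{\sum t \sum h}{n}\), \(S_{hh} = \sum h^2 - \frac{(\sum h)^2}{n}\) and \(r = S_{th} / \sqrt{S_{hh} S_{tt}}\). The variable names are assumptions introduced here.

```python
import math

# Sketch: S_th, S_hh, the PMCC and the regression of t on h from quoted sums.
n = 9
sum_h, sum_t = 7150, 110
sum_hh, sum_th = 7171500, 64980
S_tt = 371.56  # given in the question

S_th = sum_th - sum_h * sum_t / n
S_hh = sum_hh - sum_h ** 2 / n

r = S_th / math.sqrt(S_hh * S_tt)   # product moment correlation coefficient
b = S_th / S_hh                     # degrees C per metre of height
a = sum_t / n - b * sum_h / n       # fitted temperature at h = 0

# The difference between two fitted values depends only on the gradient.
temp_change_500_to_1000 = b * (1000 - 500)

print(f"S_th = {S_th:.0f}, S_hh = {S_hh:.0f}, r = {r:.3f}")
print(f"t = {a:.1f} + ({b:.4f}) h")
print(f"change from 500 m to 1000 m: {temp_change_500_to_1000:.1f} degrees C")
```

Both 500 m and 1000 m lie within the observed heights (100 m to 1400 m), so the difference is an interpolated estimate.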
Edexcel S1 2014 June Q3
13 marks Easy -1.2
3. The table shows data on the number of visitors to the UK in a month, \(v\) (1000s), and the amount of money they spent, \(m\) ( \(\pounds\) millions), for each of 8 months.
| Number of visitors, \(v\) (1000s) | 2450 | 2480 | 2540 | 2420 | 2350 | 2290 | 2400 | 2460 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Amount of money spent, \(m\) (\(\pounds\) millions) | 1370 | 1350 | 1400 | 1330 | 1270 | 1210 | 1330 | 1350 |
You may use \(S _ { v v } = 42587.5 \quad S _ { v m } = 31512.5 \quad S _ { m m } = 25187.5 \quad \sum v = 19390 \quad \sum m = 10610\)
  1. Find the product moment correlation coefficient between \(m\) and \(v\).
  2. Give a reason to support fitting a regression model of the form \(m = a + b v\) to these data.
  3. Find the value of \(b\) correct to 3 decimal places.
  4. Find the equation of the regression line of \(m\) on \(v\).
  5. Interpret your value of \(b\).
  6. Use your answer to part (d) to estimate the amount of money spent when the number of visitors to the UK in a month is 2500000
  7. Comment on the reliability of your estimate in part (f). Give a reason for your answer.
Edexcel S1 2015 June Q4
14 marks Easy -1.2
  1. Statistical models can provide a cheap and quick way to describe a real world situation.
    1. Give two other reasons why statistical models are used.
    A scientist wants to develop a model to describe the relationship between the average daily temperature, \(x ^ { \circ } \mathrm { C }\), and her household's daily energy consumption, \(y \mathrm { kWh }\), in winter. A random sample of the average daily temperature and her household's daily energy consumption are taken from 10 winter days and shown in the table.
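    | \(x\) | -0.4 | -0.2 | 0.3 | 0.8 | 1.1 | 1.4 | 1.8 | 2.1 | 2.5 | 2.6 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | \(y\) | 28 | 30 | 26 | 25 | 26 | 27 | 26 | 24 | 22 | 21 |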
    $$\text { [You may use } \sum x ^ { 2 } = 24.76 \quad \sum y = 255 \quad \sum x y = 283.8 \quad \mathrm {~S} _ { x x } = 10.36 \text { ] }$$
  2. Find \(\mathrm { S } _ { x y }\) for these data.
  3. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\). Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  4. Give an interpretation of the value of \(a\).
  5. Estimate her household's daily energy consumption when the average daily temperature is \(2 ^ { \circ } \mathrm { C }\).
    The scientist wants to use the linear regression model to predict her household's energy consumption in the summer.
  6. Discuss the reliability of using this model to predict her household's energy consumption in the summer.
Edexcel S1 2016 June Q1
11 marks Moderate -0.8
  1. A biologist is studying the behaviour of bees in a hive. Once a bee has located a source of food, it returns to the hive and performs a dance to indicate to the other bees how far away the source of the food is. The dance consists of a series of wiggles. The biologist records the distance, \(d\) metres, of the food source from the hive and the average number of wiggles, \(w\), in the dance.
| Distance, \(d\) m | 30 | 50 | 80 | 100 | 150 | 400 | 500 | 650 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Average number of wiggles, \(w\) | 0.725 | 1.210 | 1.775 | 2.250 | 3.518 | 6.382 | 8.185 | 9.555 |

[You may use \(\sum w = 33.6 \quad \sum d w = 13833 \quad \mathrm { S } _ { d d } = 394600 \quad \mathrm { S } _ { w w } = 80.481\) (to 3 decimal places)]
  1. Show that \(\mathrm { S } _ { d w } = 5601\)
  2. State, giving a reason, which is the response variable.
  3. Calculate the product moment correlation coefficient for these data.
  4. Calculate the equation of the regression line of \(w\) on \(d\), giving your answer in the form \(w = a + b d\).
    A new source of food is located 350 m from the hive.
    1. Use your regression equation to estimate the average number of wiggles in the corresponding dance.
    2. Comment, giving a reason, on the reliability of your estimate.
Edexcel S1 2017 June Q1
14 marks Moderate -0.5
  1. A clothes shop manager records the weekly sales figures, \(\pounds s\), and the average weekly temperature, \(t ^ { \circ } \mathrm { C }\), for 6 weeks during the summer. The sales figures were coded so that \(w = \frac { s } { 1000 }\)
The data are summarised as follows $$\mathrm { S } _ { w w } = 50 \quad \sum w t = 784 \quad \sum t ^ { 2 } = 2435 \quad \sum t = 119 \quad \sum w = 42$$
  1. Find \(\mathrm { S } _ { w t }\) and \(\mathrm { S } _ { t t }\)
  2. Write down the value of \(\mathrm { S } _ { s s }\) and the value of \(\mathrm { S } _ { s t }\)
  3. Find the product moment correlation coefficient between \(s\) and \(t\).
    The manager of the clothes shop believes that a linear regression model may be appropriate to describe these data.
  4. State, giving a reason, whether or not your value of the correlation coefficient supports the manager's belief.
  5. Find the equation of the regression line of \(w\) on \(t\), giving your answer in the form \(w = a + b t\)
  6. Hence find the equation of the regression line of \(s\) on \(t\), giving your answer in the form \(s = c + d t\), where \(c\) and \(d\) are correct to 3 significant figures.
  7. Using your equation in part (f), interpret the effect of a \(1 ^ { \circ } \mathrm { C }\) increase in average weekly temperature on weekly sales during the summer.
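The effect of the coding \(w = s / 1000\) can be checked numerically. The sketch below is an illustration under the standard scaling rules (\(S_{ss} = 1000^2 S_{ww}\), \(S_{st} = 1000 S_{wt}\), correlation unchanged by linear coding), not the mark scheme. The names `k`, `r_wt` and `r_st` are assumptions introduced here; `c` and `d` follow the form \(s = c + d t\) in the question.

```python
import math

# Sketch: summary statistics for the coded variable w = s / 1000.
n = 6
S_ww = 50
sum_wt, sum_tt, sum_t, sum_w = 784, 2435, 119, 42

S_wt = sum_wt - sum_w * sum_t / n
S_tt = sum_tt - sum_t ** 2 / n

# Undo the coding: s = k * w with k = 1000.
k = 1000
S_ss = k ** 2 * S_ww
S_st = k * S_wt

# The correlation coefficient is unaffected by the scale factor.
r_wt = S_wt / math.sqrt(S_ww * S_tt)
r_st = S_st / math.sqrt(S_ss * S_tt)

# Regression of w on t, then rescale both coefficients to get s on t.
b = S_wt / S_tt
a = sum_w / n - b * sum_t / n
c, d = k * a, k * b

print(f"S_wt = {S_wt:.1f}, S_tt = {S_tt:.3f}, r = {r_wt:.3f}")
print(f"w = {a:.2f} + ({b:.3f}) t")
print(f"s = {c:.0f} + ({d:.0f}) t")
```

Under these figures the gradient `d` is negative, so each \(1 ^ { \circ } \mathrm { C }\) rise in average weekly temperature corresponds to a fall in fitted weekly sales of about `abs(d)` pounds.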
Edexcel S1 Q6
16 marks Moderate -0.8
6. To test the heating of tyre material, tyres are run on a test rig at chosen speeds under given conditions of load, pressure and surrounding temperature. The following table gives values of \(x\), the test rig speed in miles per hour (mph), and the temperature, \(y ^ { \circ } \mathrm { C }\), generated in the shoulder of the tyre for a particular tyre material.
| \(x\) (mph) | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(y\) (\({ } ^ { \circ } \mathrm { C }\)) | 53 | 55 | 63 | 65 | 78 | 83 | 91 | 101 |
  1. Draw a scatter diagram to represent these data.
  2. Give a reason to support the fitting of a regression line of the form \(y = a + b x\) through these points.
  3. Find the values of \(a\) and \(b\).
    (You may use \(\Sigma x ^ { 2 } = 9500 , \Sigma y ^ { 2 } = 45483 , \Sigma x y = 20615\) )
  4. Give an interpretation for each of \(a\) and \(b\).
  5. Use your line to estimate the temperature at 50 mph and explain why this estimate differs from the value given in the table.
    A tyre specialist wants to estimate the temperature of this tyre material at 12 mph and 85 mph.
  6. Explain briefly whether or not you would recommend the specialist to use this regression equation to obtain these estimates.
Edexcel S1 2003 November Q1
16 marks Moderate -0.8
  1. A company wants to pay its employees according to their performance at work. The performance score \(x\) and the annual salary, \(y\) in \(\pounds 100\) s, for a random sample of 10 of its employees for last year were recorded. The results are shown in the table below.
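| \(x\) | 15 | 40 | 27 | 39 | 27 | 15 | 20 | 30 | 19 | 24 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(y\) | 216 | 384 | 234 | 399 | 226 | 132 | 175 | 316 | 187 | 196 |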
$$\text { [You may assume } \left. \Sigma x y = 69798 , \Sigma x ^ { 2 } = 7266 \right]$$
  1. Draw a scatter diagram to represent these data.
  2. Calculate exact values of \(S _ { x y }\) and \(S _ { x x }\).
    1. Calculate the equation of the regression line of \(y\) on \(x\), in the form \(y = a + b x\). Give the values of \(a\) and \(b\) to 3 significant figures.
    2. Draw this line on your scatter diagram.
  3. Interpret the gradient of the regression line.
    The company decides to use this regression model to determine future salaries.
  4. Find the proposed annual salary for an employee who has a performance score of 35 .
AQA S1 2006 January Q1
11 marks Moderate -0.8
1 At a certain small restaurant, the waiting time is defined as the time between sitting down at a table and a waiter first arriving at the table. This waiting time is dependent upon the number of other customers already seated in the restaurant. Alex is a customer who visited the restaurant on 10 separate days. The table shows, for each of these days, the number, \(x\), of customers already seated and his waiting time, \(y\) minutes.
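| \(\boldsymbol { x }\) | 9 | 3 | 4 | 10 | 8 | 12 | 7 | 11 | 2 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(\boldsymbol { y }\) | 11 | 6 | 5 | 11 | 9 | 13 | 9 | 12 | 4 | 7 |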
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\).
  2. Give an interpretation, in context, for each of your values of \(a\) and \(b\).
  3. Use your regression equation to estimate Alex's waiting time when the number of customers already seated in the restaurant is:
    1. 5 ;
    2. 25 .
  4. Comment on the likely reliability of each of your estimates in part (c), given that, for the regression line calculated in part (a), the values of the 10 residuals lie between + 1.1 minutes and - 1.1 minutes.
AQA S1 2008 January Q4
12 marks Moderate -0.3
4 [Figure 1, printed on the insert, is provided for use in this question.]
Roseen is a self-employed decorator who wishes to estimate the times that it will take her to decorate bedrooms based upon their floor areas. She records the floor area, \(x \mathrm {~m} ^ { 2 }\), and the decorating time, \(y\) hours, for each of 10 bedrooms she has recently decorated.
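| \(\boldsymbol { x }\) | 11.0 | 22.0 | 7.5 | 21.0 | 13.0 | 16.5 | 14.0 | 16.0 | 18.5 | 20.5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(\boldsymbol { y }\) | 15.0 | 35.0 | 16.0 | 23.5 | 24.0 | 17.5 | 14.5 | 27.5 | 22.5 | 34.5 |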
  1. On Figure 1, plot a scatter diagram of these data.
  2. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  3. Draw your regression line on Figure 1.
    1. Use your regression equation to estimate the time that Roseen will take to decorate a bedroom with a floor area of \(15 \mathrm {~m} ^ { 2 }\).
    2. Making reference to Figure 1, comment on the likely reliability of your estimate in part (d)(i).
AQA S1 2009 January Q6
15 marks Moderate -0.3
6 [Figure 1, printed on the insert, is provided for use in this question.]
For a random sample of 10 patients who underwent hip-replacement operations, records were kept of their ages, \(x\) years, and of the number of days, \(y\), following their operations before they were able to walk unaided safely.
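| Patient | \(\mathbf { A }\) | \(\mathbf { B }\) | \(\mathbf { C }\) | \(\mathbf { D }\) | \(\mathbf { E }\) | \(\mathbf { F }\) | \(\mathbf { G }\) | \(\mathbf { H }\) | \(\mathbf { I }\) | \(\mathbf { J }\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(\boldsymbol { x }\) | 55 | 51 | 62 | 66 | 72 | 59 | 78 | 55 | 62 | 70 |
| \(\boldsymbol { y }\) | 34 | 33 | 39 | 49 | 48 | 43 | 51 | 41 | 46 | 51 |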
  1. On Figure 1, complete the scatter diagram for these data.
  2. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  3. Draw your regression line on Figure 1.
  4. In fact, patients H, I and J were males and the other 7 patients were females.
    1. Calculate the mean of the residuals for the 3 male patients.
    2. Hence estimate, for a male patient aged 65 years, the number of days following his hip-replacement operation before he is able to walk unaided safely.
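The residual-based adjustment in the last part can be sketched as follows. This is one plausible reading of "Hence estimate", assumed here rather than taken from the mark scheme: fit the line to all 10 patients, average the residuals of the three male patients, and add that average to the fitted value at age 65. All variable names are hypothetical.

```python
# Sketch: least squares line of y on x, residuals, and a residual-adjusted estimate.
patients = "ABCDEFGHIJ"
ages = [55, 51, 62, 66, 72, 59, 78, 55, 62, 70]   # x, years
days = [34, 33, 39, 49, 48, 43, 51, 41, 46, 51]   # y, days

n = len(ages)
x_mean = sum(ages) / n
y_mean = sum(days) / n

S_xx = sum((x - x_mean) ** 2 for x in ages)
S_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(ages, days))

b = S_xy / S_xx
a = y_mean - b * x_mean

# Residual = observed value minus fitted value.
residuals = {p: y - (a + b * x) for p, x, y in zip(patients, ages, days)}

# Patients H, I and J are the males.
male_mean_residual = sum(residuals[p] for p in "HIJ") / 3

# Assumption: shift the fitted value at 65 by the mean male residual.
estimate_male_65 = a + b * 65 + male_mean_residual

print(f"y = {a:.3f} + {b:.3f} x")
print(f"mean male residual = {male_mean_residual:.2f} days")
print(f"adjusted estimate for a male aged 65 = {estimate_male_65:.1f} days")
```

A positive mean residual for the males indicates that, in this sample, they took longer than the line fitted to all patients would suggest, which is why the adjustment raises the estimate.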
AQA S1 2011 January Q5
14 marks Moderate -0.3
5 Craig uses his car to travel regularly from his home to the area hospital for treatment. He leaves home at \(x\) minutes after 7.30 am and then takes \(y\) minutes to arrive at the hospital's reception desk. His results for 11 mornings are shown in the table.
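| \(\boldsymbol { x }\) | 0 | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(\boldsymbol { y }\) | 31 | 42 | 32 | 58 | 47 | 56 | 79 | 68 | 89 | 95 | 85 |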
  1. Explain why the time taken by Craig between leaving home and arriving at the hospital's reception desk is the response variable.
  2. Calculate the equation of the least squares regression line of \(y\) on \(x\), writing your answer in the form \(y = a + b x\).
  3. On a particular day, Craig needs to arrive at the hospital's reception desk no later than 9.00 am . He leaves home at 7.45 am . Estimate the number of minutes before 9.00 am that Craig will arrive at the hospital's reception desk. Give your answer to the nearest minute.
    1. Use your equation to estimate \(y\) when \(x = 85\).
    2. Give one statistical reason and one reason based on the context of this question as to why your estimate in part (d)(i) is unlikely to be realistic.
AQA S1 2009 June Q4
8 marks Moderate -0.8
4 As part of an investigation, a chlorine block is immersed in a large tank of water held at a constant temperature. The block slowly dissolves, and its weight, \(y\) grams, is noted \(x\) days after immersion. The results are shown in the table.
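| \(\boldsymbol { x }\) days | 5 | 10 | 15 | 20 | 30 | 40 | 50 | 60 | 75 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(\boldsymbol { y }\) grams | 47 | 44 | 42 | 38 | 35 | 27 | 23 | 16 | 9 |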
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  2. Hence estimate, to the nearest gram, the initial weight of the block.
  3. A company which markets the chlorine blocks claims that a block will usually dissolve completely after about 13 weeks. Comment, with justification, on this claim.
AQA S1 2010 June Q6
14 marks Moderate -0.3
6 During a study of reaction times, each of a random sample of 12 people, aged between 40 and 80 years, was asked to react as quickly as possible to a stimulus displayed on a computer screen. Their ages, \(x\) years, and reaction times, \(y\) milliseconds, are shown in the table.
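| Person | Age (\(\boldsymbol { x }\) years) | Reaction time (\(y \mathrm {~ms}\)) |
| --- | --- | --- |
| A | 41 | 520 |
| B | 54 | 750 |
| C | 66 | 650 |
| D | 72 | 920 |
| E | 71 | 280 |
| F | 57 | 620 |
| G | 60 | 740 |
| H | 47 | 950 |
| I | 77 | 970 |
| J | 65 | 780 |
| K | 51 | 550 |
| L | 59 | 730 |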
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
    1. Draw your regression line on the scatter diagram on page 16.
    2. Comment on what this reveals.
  2. It was later discovered that the reaction times for persons E and H had been recorded incorrectly. The values should have been 820 and 590 respectively. After making these corrections, computations gave $$S _ { x x } = 1272 \quad S _ { x y } = 14760 \quad \bar { x } = 60 \quad \bar { y } = 720$$
    1. Using the symbol ⋅ , plot the correct values for persons E and H on the scatter diagram on page 16.
    2. Recalculate the equation of the least squares regression line of \(y\) on \(x\), and draw this regression line on the scatter diagram on page 16.
    3. Hence revise as necessary your comments in part (b)(ii).
      \includegraphics[max width=\textwidth, alt={}]{c4844a30-6a86-49e3-b6aa-8e213dfc8ca1-15_2484_1709_223_153}
      \section*{Reaction Times}
      \includegraphics[max width=\textwidth, alt={}]{c4844a30-6a86-49e3-b6aa-8e213dfc8ca1-16_1943_1301_351_292}
      \includegraphics[max width=\textwidth, alt={}]{c4844a30-6a86-49e3-b6aa-8e213dfc8ca1-17_2484_1707_223_155}
AQA S1 2011 June Q3
15 marks Moderate -0.8
3
  1. During a particular summer holiday, Rick worked in a fish and chip shop at a seaside resort. He suspected that the shop's takings, \(\pounds y\), on a weekday were dependent upon the forecast of that day's maximum temperature, \(x ^ { \circ } \mathrm { C }\), in the resort, made at 6.00 pm on the previous day. To investigate this suspicion, he recorded values of \(x\) and \(y\) for a random sample of 7 weekdays during July.
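    | \(\boldsymbol { x }\) | 23 | 18 | 27 | 19 | 25 | 20 | 22 |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | \(\boldsymbol { y }\) | 4290 | 3188 | 5106 | 3829 | 5057 | 4264 | 4485 |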
    1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
    2. Estimate the shop's takings on a weekday during July when the maximum temperature was forecast to be \(24 ^ { \circ } \mathrm { C }\).
    3. Explain why your equation may not be suitable for estimating the shop's takings on a weekday during February.
    4. Describe, in the context of this question, a variable other than the maximum temperature, \(x\), that may affect \(y\).
  2. Seren, who also worked in the fish and chip shop, investigated the possible linear relationship between the shop's takings, \(\pounds z\), recorded in \(\pounds 000\) s, and each of two other explanatory variables, \(v\) and \(w\).
    1. She calculated correctly that the regression line of \(z\) on \(v\) had a \(z\)-intercept of \(-1\) and a gradient of 0.15. Draw this line, for values of \(v\) from 0 to 40, on Figure 1 on page 4.
    2. She also calculated correctly that the regression line of \(z\) on \(w\) had a \(z\)-intercept of 5 and a gradient of \(-0.40\). Draw this line, for values of \(w\) from 0 to 10, on Figure 2 below.
      \begin{figure}[h]
      \captionsetup{labelformat=empty} \caption{Figure 1} \includegraphics[alt={},max width=\textwidth]{767ec629-6350-41d9-bbb9-e059a5fd8c70-4_792_604_680_717}
      \end{figure}
      \begin{figure}[h]
      \captionsetup{labelformat=empty} \caption{Figure 2} \includegraphics[alt={},max width=\textwidth]{767ec629-6350-41d9-bbb9-e059a5fd8c70-4_792_696_1692_687}
      \end{figure}
AQA S1 2016 June Q4
9 marks Moderate -0.8
4 As part of her science project, a student found the mass, \(y\) grams, of a particular compound that dissolved in 100 ml of water at each of 12 different set temperatures, \(x ^ { \circ } \mathrm { C }\). The results are shown in the table.
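| \(\boldsymbol { x }\) | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | 65 | 70 | 75 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(\boldsymbol { y }\) | 242 | 262 | 269 | 290 | 298 | 310 | 326 | 355 | 359 | 375 | 390 | 412 |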
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  2. Interpret, in context, your value for the gradient of this regression line.
  3. Use your equation to estimate the mass of the compound which will dissolve in 100 ml of water at \(68 ^ { \circ } \mathrm { C }\).
  4. Given that the values of the 12 residuals for the regression line of \(y\) on \(x\) lie between - 7 and + 9 , comment, with justification, on the likely accuracy of your estimate in part (c).
    [2 marks]
Edexcel S1 Q7
17 marks Moderate -0.8
7. A doctor wished to investigate the effects of staying awake for long periods on a person's ability to complete simple tasks. She recorded the number of times, \(n\), that a subject could clench his or her fist in 30 seconds after being awake for \(h\) hours. The results for one subject were as follows.
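| \(h\) (hours) | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(n\) | 116 | 114 | 109 | 101 | 94 | 94 | 86 | 81 | 80 |

  1. Plot a scatter diagram of \(n\) against \(h\) for these results.
    You may use $$\Sigma h = 180 , \quad \Sigma n = 875 , \quad \Sigma h ^ { 2 } = 3660 , \quad \Sigma h n = 17204 .$$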
  2. Obtain the equation of the regression line of \(n\) on \(h\) in the form \(n = a + b h\).
  3. Give a practical interpretation of the constant \(b\).
  4. Explain why this regression line would be unlikely to be appropriate for values of \(h\) between 0 and 16.
    (2 marks)
    Another subject underwent the same tests giving rise to a regression line of \(n = 213.4 - 5.87 h\).
  5. After how many hours of being awake together would you expect these two subjects to be able to clench their fists the same number of times in 30 seconds?
Edexcel S1 Q7
15 marks Moderate -0.8
7. Pipes-R-us manufacture a special lightweight aluminium tubing. The price \(\pounds P\), for each length, \(l\) metres, that the company sells is shown in the table.
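| \(l\) (metres) | 0.5 | 0.8 | 1.0 | 1.5 | 2 | 4 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(P\) (\(\pounds\)) | 2.50 | 3.40 | 4.00 | 5.20 | 6.00 | 10.50 | 15.00 |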
  1. Represent these data on a scatter diagram.
    You may use $$\Sigma l = 15.8 , \quad \Sigma P = 46.6 , \quad \Sigma l ^ { 2 } = 60.14 , \quad \Sigma l P = 159.77$$
  2. Find the equation of the regression line of \(P\) on \(l\) in the form \(P = a + b l\).
  3. Give a practical interpretation of the constant \(b\).
    In response to customer demand Pipes-R-us decide to start selling tubes cut to specific lengths. Initially the company decides to use the regression line found in part (b) as a pricing formula for this new service.
  4. Calculate the price that Pipes-R-us should charge for 5.2 metres of the tubing.
  5. Suggest a reason why Pipes-R-us might not offer prices based on the regression line for any length of tubing.
Edexcel S1 Q7
17 marks Standard +0.3
7. A new vaccine is tested over a six-month period in one health authority. The table shows the number of new cases of the disease, \(d\), reported in the \(m\) th month after the trials began.
| \(m\) | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| \(d\) | 102 | 69 | 61 | 58 | 52 | 48 |
A doctor suggests that a relationship of the form \(d = a + b x\) where \(x = \frac { 1 } { m }\) can be used to model the situation.
  1. Tabulate the values of \(x\) corresponding to the given values of \(d\) and plot a scatter diagram of \(d\) against \(x\).
  2. Explain how your scatter diagram supports the suggested model.
    You may use $$\Sigma x = 2.45 , \quad \Sigma d = 390 , \quad \Sigma x ^ { 2 } = 1.491 , \quad \Sigma x d = 189.733$$
  3. Find an equation of the regression line \(d\) on \(x\) in the form \(d = a + b x\).
  4. Use your regression line to estimate how many new cases of the disease there will be in the 13th month after the trial began.
  5. Comment on the reliability of your answer to part (d).
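The transformed-variable model \(d = a + b x\) with \(x = 1/m\) can be sketched as below. This is an illustration only: it recomputes the sums from the raw data rather than using the rounded totals quoted in the question, so its coefficients may differ slightly from an answer based on \(\Sigma x ^ { 2 } = 1.491\). The variable names are assumptions introduced here.

```python
# Sketch: linearise with x = 1/m, fit d on x, then estimate month 13.
months = [1, 2, 3, 4, 5, 6]
cases = [102, 69, 61, 58, 52, 48]      # d, new cases in month m
n = len(months)

x = [1 / m for m in months]            # transformed explanatory variable

sum_x, sum_d = sum(x), sum(cases)
sum_xx = sum(v * v for v in x)
sum_xd = sum(v * c for v, c in zip(x, cases))

S_xx = sum_xx - sum_x ** 2 / n
S_xd = sum_xd - sum_x * sum_d / n

b = S_xd / S_xx
a = sum_d / n - b * sum_x / n

# Month 13 corresponds to x = 1/13, which is below the smallest sampled x (1/6).
x_13 = 1 / 13
estimate_month_13 = a + b * x_13
is_extrapolation = not (min(x) <= x_13 <= max(x))

print(f"d = {a:.2f} + {b:.2f} x, where x = 1/m")
print(f"estimate for month 13: {estimate_month_13:.1f} (extrapolation: {is_extrapolation})")
```

Since \(x = 1/13\) lies outside the range of transformed values used to fit the line, the estimate is an extrapolation, which bears on the reliability comment in the final part.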
Edexcel S1 Q6
14 marks Moderate -0.8
6. A physics student recorded the length, \(l \mathrm {~cm}\), of a spring when different masses, \(m\) grams, were suspended from it giving the following results.
| \(m\) (g) | 50 | 100 | 200 | 300 | 400 | 500 | 600 | 700 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(l\) (cm) | 7.8 | 10.7 | 16.5 | 22.1 | 28.0 | 33.9 | 35.2 | 35.6 |

  1. Represent these data on a scatter diagram with \(l\) on the vertical axis.
    The student decides to find the equation of a regression line of the form \(l = a + b m\) using only the data for \(m \leq 500 \mathrm {~g}\).
  2. Give a reason to support the fitting of such a regression line and explain why the student is excluding two of his values.
    (2 marks)
    You may use $$\Sigma m = 1550 , \quad \Sigma l = 119 , \quad \Sigma m ^ { 2 } = 552500 , \quad \Sigma l ^ { 2 } = 2869.2 , \quad \Sigma m l = 39540 .$$
  3. Find the values of \(a\) and \(b\).
  4. Explain the significance of the values of \(a\) and \(b\) in this situation.
Edexcel S1 Q4
11 marks Standard +0.3
  1. An engineer tested a new material under extreme conditions in a wind tunnel. He recorded the number of microfractures, \(n\), that formed and the wind speed, \(v\) metres per second, for 8 different values of \(v\) with all other conditions remaining constant. He then coded the data using \(x = v - 700\) and \(y = n - 20\) and calculated the following summary statistics.
$$\Sigma x = 100 , \quad \Sigma y = 23 , \quad \Sigma x ^ { 2 } = 215000 , \quad \Sigma x y = 11600 .$$
  1. Find an equation of the regression line of \(y\) on \(x\).
  2. Hence, find an equation of the regression line of \(n\) on \(v\).
  3. Use your regression line to estimate the number of microfractures that would be formed if the material was tested in a wind speed of 900 metres per second with all other conditions remaining constant.
    (2 marks)
OCR MEI Further Statistics A AS 2019 June Q5
13 marks Standard +0.3
5 A researcher is investigating births of females and males in a particular species of animal which very often produces litters of 7 offspring.
The table shows some data about the number of females per litter in 200 litters of 7 offspring. The researcher thinks that a binomial distribution \(\mathrm { B } ( 7 , p )\) may be an appropriate model for these data.
(c) Complete the test at the \(5 \%\) significance level.
Fig. 5 shows the probability distribution \(\mathrm { B } ( 7,0.35 )\) together with the relative frequencies of the observed data (the numbers of litters each divided by 200).
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{fd496303-10f1-450e-bbeb-421ab6f4de21-5_659_1285_342_319} \captionsetup{labelformat=empty} \caption{Fig. 5}
\end{figure}
(d) Comment on the result of the test completed in part (c) by considering Fig. 5.
OCR MEI Further Statistics A AS 2019 June Q6
13 marks Standard +0.3
6 A meteorologist is investigating the relationship between altitude \(x\) metres and mean annual temperature \(y ^ { \circ } \mathrm { C }\) in an American state.
She selects 12 locations at various altitudes and then stations a remote monitoring device at each of them to measure the temperature over the course of a year. Fig. 6 illustrates the data which she obtains.
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{fd496303-10f1-450e-bbeb-421ab6f4de21-6_686_1477_486_292} \captionsetup{labelformat=empty} \caption{Fig. 6}
\end{figure}
  1. Explain why it would not be appropriate to carry out a hypothesis test for correlation based on the product moment correlation coefficient.
  2. Explain why altitude has been plotted on the horizontal axis in Fig. 6.
    Summary statistics for \(x\) and \(y\) are as follows. $$\sum x = 21200 \quad \sum y = 105.4 \quad \sum x ^ { 2 } = 39100000 \quad \sum y ^ { 2 } = 1004 \quad \sum x y = 176090$$
  3. Calculate the equation of the regression line of \(y\) on \(x\).
  4. Use the equation of the regression line to predict the values of the mean annual temperature at each of the following altitudes.