5.09d Linear coding: effect on regression

65 questions

Edexcel S1 2003 January Q6
19 marks Moderate -0.3
6. The chief executive of Rex cars wants to investigate the relationship between the number of new car sales and the amount of money spent on advertising. She collects data from company records on the number of new car sales, \(c\), and the cost of advertising each year, \(p\) (£000). The data are shown in the table below.
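| Year | Number of new car sales, \(c\) | Cost of advertising (£000), \(p\) |
|---|---|---|
| 1990 | 4240 | 120 |
| 1991 | 4380 | 126 |
| 1992 | 4420 | 132 |
| 1993 | 4440 | 134 |
| 1994 | 4430 | 137 |
| 1995 | 4520 | 144 |
| 1996 | 4590 | 148 |
| 1997 | 4660 | 150 |
| 1998 | 4700 | 153 |
| 1999 | 4790 | 158 |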
  1. Using the coding \(x = ( p - 100 )\) and \(y = \frac { 1 } { 10 } ( c - 4000 )\), draw a scatter diagram to represent these data. Explain why \(x\) is the explanatory variable.
  2. Find the equation of the least squares regression line of \(y\) on \(x\). $$\text { [Use } \left. \Sigma x = 402 , \Sigma y = 517 , \Sigma x ^ { 2 } = 17538 \text { and } \Sigma x y = 22611 . \right]$$
  3. Deduce the equation of the least squares regression line of \(c\) on \(p\) in the form \(c = a + b p\).
  4. Interpret the value of \(a\).
  5. Predict the number of extra new car sales for an increase of \(\pounds 2000\) in advertising budget. Comment on the validity of your answer.
    (2)
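The effect of the coding on the regression line can be checked numerically. The sketch below is illustrative only and is not a mark scheme: it uses the summary statistics quoted in the question, fits \(y\) on \(x\), then substitutes \(x = p - 100\) and \(y = \frac{1}{10}(c - 4000)\) to recover the line of \(c\) on \(p\). All variable names are assumptions made for this example.

```python
# Summary statistics quoted in the question (coded data, n = 10 years).
n = 10
sum_x, sum_y, sum_x2, sum_xy = 402, 517, 17538, 22611

# Regression of y on x: b = S_xy / S_xx, a = y_bar - b * x_bar.
s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Undo the coding x = p - 100, y = (c - 4000) / 10:
#   (c - 4000) / 10 = a + b (p - 100)
#   c = (4000 + 10 a - 1000 b) + (10 b) p
slope_c = 10 * b
intercept_c = 4000 + 10 * a - 1000 * b

print(f"y = {a:.3f} + {b:.3f} x")
print(f"c = {intercept_c:.0f} + {slope_c:.2f} p")
```

The scale factor \(\frac{1}{10}\) on \(y\) multiplies the gradient by 10, while the two shifts only change the intercept.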
Edexcel S1 2011 January Q4
6 marks Moderate -0.8
  1. A farmer collected data on the annual rainfall, \(x \mathrm {~cm}\), and the annual yield of peas, \(p\) tonnes per acre.
The data for annual rainfall was coded using \(v = \frac { x - 5 } { 10 }\) and the following statistics were found. $$S _ { v v } = 5.753 \quad S _ { p v } = 1.688 \quad S _ { p p } = 1.168 \quad \bar { p } = 3.22 \quad \bar { v } = 4.42$$
  1. Find the equation of the regression line of \(p\) on \(v\) in the form \(p = a + b v\).
  2. Using your regression line estimate the annual yield of peas per acre when the annual rainfall is 85 cm .
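A possible outline of the working, using only the statistics quoted above (rounded values are approximate and shown for illustration):

$$b = \frac{S_{pv}}{S_{vv}} = \frac{1.688}{5.753} \approx 0.2934 , \qquad a = \bar{p} - b \bar{v} \approx 3.22 - 0.2934 \times 4.42 \approx 1.923$$

$$x = 85 \;\Rightarrow\; v = \frac{85 - 5}{10} = 8 , \qquad p \approx 1.923 + 0.2934 \times 8 \approx 4.27$$

The rainfall must be coded before it is substituted, because the line is expressed in terms of \(v\), not \(x\).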
Edexcel S1 2005 June Q3
10 marks Moderate -0.3
  1. A long distance lorry driver recorded the distance travelled, \(m\) miles, and the amount of fuel used, \(f\) litres, each day. Summarised below are data from the driver's records for a random sample of 8 days.
The data are coded such that \(x = m - 250\) and \(y = f - 100\). $$\Sigma x = 130 \quad \Sigma y = 48 \quad \Sigma x y = 8880 \quad \mathrm {~S} _ { x x } = 20487.5$$
  1. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\).
  2. Hence find the equation of the regression line of \(f\) on \(m\).
  3. Predict the amount of fuel used on a journey of 235 miles.
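A minimal sketch of the calculation, assuming the quoted summary statistics and \(n = 8\); the variable names are illustrative. Because the coding here is a pure shift (\(x = m - 250\), \(y = f - 100\)), the gradient is unchanged and only the intercept moves.

```python
# Summary statistics quoted in the question (coded data, n = 8 days).
n = 8
sum_x, sum_y, sum_xy, s_xx = 130, 48, 8880, 20487.5

# Regression of y on x.
s_xy = sum_xy - sum_x * sum_y / n
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Undo the coding: f - 100 = a + b (m - 250)
#   f = (100 + a - 250 b) + b m
slope_f = b
intercept_f = 100 + a - 250 * b

# Prediction for a journey of 235 miles.
fuel_235 = intercept_f + slope_f * 235

print(f"y = {a:.3f} + {b:.3f} x")
print(f"f = {intercept_f:.3f} + {slope_f:.3f} m")
print(f"fuel for 235 miles: {fuel_235:.1f} litres")
```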
Edexcel S1 2013 June Q1
10 marks Moderate -0.8
  1. Sammy is studying the number of units of gas, \(g\), and the number of units of electricity, \(e\), used in her house each week. A random sample of 10 weeks use was recorded and the data for each week were coded so that \(x = \frac { g - 60 } { 4 }\) and \(y = \frac { e } { 10 }\). The results for the coded data are summarised below
$$\sum x = 48.0 \quad \sum y = 58.0 \quad \mathrm {~S} _ { x x } = 312.1 \quad \mathrm {~S} _ { y y } = 2.10 \quad \mathrm {~S} _ { x y } = 18.35$$
  1. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\). Give the values of \(a\) and \(b\) correct to 3 significant figures.
  2. Hence find the equation of the regression line of \(e\) on \(g\) in the form \(e = c + d g\). Give the values of \(c\) and \(d\) correct to 2 significant figures.
  3. Use your regression equation to estimate the number of units of electricity used in a week when 100 units of gas were used.
  4. Find the probability distribution of \(X\) .
  5. Write down the value of \(\mathrm { F } ( 1.8 )\) .
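For the gas and electricity regression above, the decoding step might be outlined as follows (approximate values, for illustration only). With \(\bar{x} = 4.8\) and \(\bar{y} = 5.8\):

$$b = \frac{S_{xy}}{S_{xx}} = \frac{18.35}{312.1} \approx 0.0588 , \qquad a = \bar{y} - b \bar{x} \approx 5.8 - 0.0588 \times 4.8 \approx 5.52$$

Substituting \(x = \frac{g - 60}{4}\) and \(y = \frac{e}{10}\):

$$\frac{e}{10} = a + b \left( \frac{g - 60}{4} \right) \;\Rightarrow\; e = ( 10 a - 150 b ) + 2.5 b \, g \approx 46 + 0.15 g$$

so a week with \(g = 100\) gives an estimate of roughly 61 units of electricity.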
Edexcel S1 2014 June Q3
13 marks Easy -1.2
3. The table shows data on the number of visitors to the UK in a month, \(v\) (1000s), and the amount of money they spent, \(m\) ( \(\pounds\) millions), for each of 8 months.
| Number of visitors, \(v\) (1000s) | 2450 | 2480 | 2540 | 2420 | 2350 | 2290 | 2400 | 2460 |
|---|---|---|---|---|---|---|---|---|
| Amount of money spent, \(m\) (£ millions) | 1370 | 1350 | 1400 | 1330 | 1270 | 1210 | 1330 | 1350 |
You may use \(S _ { v v } = 42587.5 \quad S _ { v m } = 31512.5 \quad S _ { m m } = 25187.5 \quad \sum v = 19390 \quad \sum m = 10610\)
  1. Find the product moment correlation coefficient between \(m\) and \(v\).
  2. Give a reason to support fitting a regression model of the form \(m = a + b v\) to these data.
  3. Find the value of \(b\) correct to 3 decimal places.
  4. Find the equation of the regression line of \(m\) on \(v\).
  5. Interpret your value of \(b\).
  6. Use your answer to part (d) to estimate the amount of money spent when the number of visitors to the UK in a month is 2500000.
  7. Comment on the reliability of your estimate in part (f). Give a reason for your answer.
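The calculations in this question can be sketched in a few lines, assuming the quoted values of \(S_{vv}\), \(S_{vm}\), \(S_{mm}\), \(\sum v\) and \(\sum m\). The point to watch is the units: \(v\) is recorded in thousands, so 2500000 visitors corresponds to \(v = 2500\). Variable names are illustrative.

```python
from math import sqrt

# Statistics quoted in the question (n = 8 months).
n = 8
s_vv, s_vm, s_mm = 42587.5, 31512.5, 25187.5
sum_v, sum_m = 19390, 10610

# Product moment correlation coefficient.
r = s_vm / sqrt(s_vv * s_mm)

# Regression line of m on v: m = a + b v.
b = s_vm / s_vv
a = sum_m / n - b * sum_v / n

# v is measured in thousands of visitors, so 2 500 000 visitors is v = 2500.
estimate = a + b * 2500

print(f"r = {r:.3f}")
print(f"m = {a:.1f} + {b:.3f} v")
print(f"estimate: {estimate:.0f} (millions of pounds)")
```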
Edexcel S1 2015 June Q4
14 marks Easy -1.2
  1. Statistical models can provide a cheap and quick way to describe a real world situation.
    1. Give two other reasons why statistical models are used.
    A scientist wants to develop a model to describe the relationship between the average daily temperature, \(x ^ { \circ } \mathrm { C }\), and her household's daily energy consumption, \(y \mathrm { kWh }\), in winter. A random sample of the average daily temperature and her household's daily energy consumption are taken from 10 winter days and shown in the table.
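    | \(x\) | -0.4 | -0.2 | 0.3 | 0.8 | 1.1 | 1.4 | 1.8 | 2.1 | 2.5 | 2.6 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | \(y\) | 28 | 30 | 26 | 25 | 26 | 27 | 26 | 24 | 22 | 21 |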
    $$\text { [You may use } \sum x ^ { 2 } = 24.76 \quad \sum y = 255 \quad \sum x y = 283.8 \quad \mathrm {~S} _ { x x } = 10.36 \text { ] }$$
  2. Find \(\mathrm { S } _ { x y }\) for these data.
  3. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\). Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  4. Give an interpretation of the value of \(a\).
  5. Estimate her household's daily energy consumption when the average daily temperature is \(2 ^ { \circ } \mathrm { C }\).
    The scientist wants to use the linear regression model to predict her household's energy consumption in the summer.
  6. Discuss the reliability of using this model to predict her household's energy consumption in the summer.
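As a rough check of the working, note that \(\sum x\) is not quoted and has to be taken from the table. The sketch below assumes the table values as listed and the quoted \(\sum y\), \(\sum xy\) and \(S_{xx}\); the variable names are illustrative.

```python
# Temperatures from the table; sum of x is not given in the question.
x = [-0.4, -0.2, 0.3, 0.8, 1.1, 1.4, 1.8, 2.1, 2.5, 2.6]
n = len(x)
sum_x = sum(x)

# Values quoted in the question.
sum_y, sum_xy, s_xx = 255, 283.8, 10.36

# S_xy, then the regression line y = a + b x.
s_xy = sum_xy - sum_x * sum_y / n
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Estimate at 2 degrees C, which lies inside the range of the data.
estimate_2 = a + b * 2

print(f"S_xy = {s_xy:.1f}")
print(f"y = {a:.1f} + ({b:.2f}) x")
print(f"estimate at 2 C: {estimate_2:.1f} kWh")
```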
Edexcel S1 Q6
16 marks Moderate -0.8
6. To test the heating of tyre material, tyres are run on a test rig at chosen speeds under given conditions of load, pressure and surrounding temperature. The following table gives values of \(x\), the test rig speed in miles per hour (mph), and the temperature, \(y ^ { \circ } \mathrm { C }\), generated in the shoulder of the tyre for a particular tyre material.
| \(x\) (mph) | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 |
|---|---|---|---|---|---|---|---|---|
| \(y\) (\({ } ^ { \circ } \mathrm { C }\)) | 53 | 55 | 63 | 65 | 78 | 83 | 91 | 101 |
  1. Draw a scatter diagram to represent these data.
  2. Give a reason to support the fitting of a regression line of the form \(y = a + b x\) through these points.
  3. Find the values of \(a\) and \(b\).
    (You may use \(\Sigma x ^ { 2 } = 9500 , \Sigma y ^ { 2 } = 45483 , \Sigma x y = 20615\) )
  4. Give an interpretation for each of \(a\) and \(b\).
  5. Use your line to estimate the temperature at 50 mph and explain why this estimate differs from the value given in the table. A tyre specialist wants to estimate the temperature of this tyre material at 12 mph and 85 mph .
  6. Explain briefly whether or not you would recommend the specialist to use this regression equation to obtain these estimates.
Edexcel S1 2003 November Q1
16 marks Moderate -0.8
  1. A company wants to pay its employees according to their performance at work. The performance score \(x\) and the annual salary, \(y\) in \(\pounds 100\) s, for a random sample of 10 of its employees for last year were recorded. The results are shown in the table below.
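| \(x\) | 15 | 40 | 27 | 39 | 27 | 15 | 20 | 30 | 19 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 216 | 384 | 234 | 399 | 226 | 132 | 175 | 316 | 187 | 196 |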
$$\text { [You may assume } \left. \Sigma x y = 69798 , \Sigma x ^ { 2 } = 7266 \right]$$
  1. Draw a scatter diagram to represent these data.
  2. Calculate exact values of \(S _ { x y }\) and \(S _ { x x }\).
    1. Calculate the equation of the regression line of \(y\) on \(x\), in the form \(y = a + b x\). Give the values of \(a\) and \(b\) to 3 significant figures.
    2. Draw this line on your scatter diagram.
  3. Interpret the gradient of the regression line. The company decides to use this regression model to determine future salaries.
  4. Find the proposed annual salary for an employee who has a performance score of 35 .
AQA S1 2012 January Q5
17 marks Moderate -0.8
5 An experiment was undertaken to collect information on the burning of a specific type of wood as a source of energy. At given fixed levels of the wood's moisture content, \(x\) per cent, its corresponding calorific value, \(y \mathrm { MWh } /\) tonne, on burning was determined. The results are shown in the table.
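| \(\boldsymbol { x }\) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | 65 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \(\boldsymbol { y }\) | 5.2 | 4.7 | 4.3 | 4.0 | 3.2 | 2.8 | 2.5 | 2.2 | 1.8 | 1.5 | 1.3 | 1.0 | 0.6 |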
  1. Explain why calorific value is the response variable.
  2. Calculate the equation of the least squares regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
  3. Interpret, in context, your values for \(a\) and \(b\).
  4. Use your equation to estimate the wood's calorific value when it has a moisture content of 27 per cent.
  5. Calculate the value of the residual for the point \(( 35,2.5 )\).
  6. Given that the values of the 13 residuals lie between - 0.28 and + 0.23 , comment on the likely accuracy of your estimate in part (d).
    1. Give a general reason why your equation should not be used to estimate the wood's calorific value when it has a moisture content of 80 per cent.
    2. Give a specific reason, based on the context of this question and with numerical support, why your equation cannot be used to estimate the wood's calorific value when it has a moisture content of 80 per cent.
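The regression, residual and extrapolation parts can be explored with a short script. This is a sketch from the tabulated data, not an official solution; `least_squares` is a helper written for this example.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

moisture = list(range(5, 70, 5))  # 5, 10, ..., 65 per cent
calorific = [5.2, 4.7, 4.3, 4.0, 3.2, 2.8, 2.5, 2.2, 1.8, 1.5, 1.3, 1.0, 0.6]

a, b = least_squares(moisture, calorific)

estimate_27 = a + b * 27            # interpolation
residual_35 = 2.5 - (a + b * 35)    # observed minus fitted at x = 35
estimate_80 = a + b * 80            # extrapolation beyond the data

print(f"y = {a:.3f} + ({b:.4f}) x")
print(f"estimate at 27%: {estimate_27:.2f}")
print(f"residual at (35, 2.5): {residual_35:.2f}")
print(f"fitted value at 80%: {estimate_80:.2f}")
```

The fitted value at 80 per cent comes out negative, which is the kind of numerical support the final part asks for.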
AQA S1 2013 January Q1
9 marks Moderate -0.8
1 Bob, a church warden, decides to investigate the lifetime of a particular manufacturer's brand of beeswax candle. Each candle is 30 cm in length. From a box containing a large number of such candles, he selects one candle at random. He lights the candle and, after it has burned continuously for \(x\) hours, he records its length, \(y \mathrm {~cm}\), to the nearest centimetre. His results are shown in the table.
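| \(\boldsymbol { x }\) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 |
|---|---|---|---|---|---|---|---|---|---|
| \(\boldsymbol { y }\) | 27 | 25 | 21 | 19 | 16 | 11 | 9 | 5 | 2 |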
  1. State the value that you would expect for \(a\) in the equation of the least squares regression line, \(y = a + b x\).
    1. Calculate the equation of the least squares regression line, \(y = a + b x\).
    2. Interpret the value that you obtain for \(b\).
    3. It is claimed by the candle manufacturer that the total length of time that such candles are likely to burn for is more than 50 hours. Comment on this claim, giving a numerical justification for your answer.
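One way to examine the manufacturer's claim is to fit the line and find the time at which the fitted length reaches zero. The sketch below uses the tabulated data; `least_squares` and `burn_out_hours` are names introduced for this example, and using the fitted line this far assumes the linear trend continues to the end of the candle.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

hours = [5, 10, 15, 20, 25, 30, 35, 40, 45]
length_cm = [27, 25, 21, 19, 16, 11, 9, 5, 2]

a, b = least_squares(hours, length_cm)

# Fitted length is zero when a + b x = 0.
burn_out_hours = -a / b

print(f"y = {a:.2f} + ({b:.2f}) x")
print(f"fitted length reaches 0 cm after about {burn_out_hours:.1f} hours")
```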
AQA S1 2007 June Q5
13 marks Moderate -0.8
5 Bob, a gardener, measures the time taken, \(y\) minutes, for 60 grams of weedkiller pellets to dissolve in 10 litres of water at different set temperatures, \(x ^ { \circ } \mathrm { C }\). His results are shown in the table.
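| \(\boldsymbol { x }\) | 16 | 20 | 24 | 28 | 32 | 36 | 40 | 44 | 48 | 52 | 56 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| \(\boldsymbol { y }\) | 4.7 | 4.3 | 3.8 | 3.5 | 3.0 | 2.7 | 2.4 | 2.0 | 1.8 | 1.6 | 1.1 |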
  1. State why the explanatory variable is temperature.
  2. Calculate the equation of the least squares regression line \(y = a + b x\).
    1. Interpret, in the context of this question, your value for \(b\).
    2. Explain why no sensible practical interpretation can be given for your value of \(a\).
    1. Estimate the time taken to dissolve 60 grams of weedkiller pellets in 10 litres of water at \(30 ^ { \circ } \mathrm { C }\).
    2. Show why the equation cannot be used to make a valid estimate of the time taken to dissolve 60 grams of weedkiller pellets in 10 litres of water at \(75 ^ { \circ } \mathrm { C }\). (2 marks)
AQA S1 2008 June Q1
6 marks Moderate -0.8
1 The table shows the times taken, \(y\) minutes, for a wood glue to dry at different air temperatures, \(x ^ { \circ } \mathrm { C }\).
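| \(\boldsymbol { x }\) | 10 | 12 | 15 | 18 | 20 | 22 | 25 | 28 | 30 |
|---|---|---|---|---|---|---|---|---|---|
| \(\boldsymbol { y }\) | 42.9 | 40.6 | 38.5 | 35.4 | 33.0 | 30.7 | 28.0 | 25.3 | 22.6 |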
  1. Calculate the equation of the least squares regression line \(y = a + b x\).
  2. Estimate the time taken for the glue to dry when the air temperature is \(21 ^ { \circ } \mathrm { C }\).
AQA S1 2012 June Q3
11 marks Moderate -0.3
3 The table shows the maximum weight, \(y _ { A }\) grams, of Salt \(A\) that will dissolve in 100 grams of water at various temperatures, \(x ^ { \circ } \mathrm { C }\).
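| \(\boldsymbol { x }\) | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 60 | 70 | 80 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \(\boldsymbol { y } _ { \boldsymbol { A } }\) | 20 | 35 | 48 | 57 | 77 | 92 | 101 | 111 | 121 | 137 | 159 | 182 |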
  1. Calculate the equation of the least squares regression line of \(y _ { A }\) on \(x\).
  2. The data in the above table are plotted on the scatter diagram on page 4. Draw your regression line on this scatter diagram.
  3. For water temperatures in the range \(10 ^ { \circ } \mathrm { C }\) to \(80 ^ { \circ } \mathrm { C }\), the maximum weight, \(y _ { B }\) grams, of Salt \(B\) that will dissolve in 100 grams of water is given by the equation $$y _ { B } = 60.1 + 0.255 x$$
    1. Draw this line on the scatter diagram.
    2. Estimate the water temperature at which the maximum weight of Salt \(A\) that will dissolve in 100 grams of water is the same as that of Salt B.
    3. For Salt \(A\) and Salt \(B\), compare the effects of water temperature on the maximum weight that will dissolve in 100 grams of water. Your answer should identify two distinct differences.
      [Scatter diagram: Temperatures and Maximum Weights]
AQA S1 2014 June Q3
11 marks Moderate -0.8
3 The table shows the body mass index (BMI), \(x\), and the systolic blood pressure (SBP), \(y \mathrm { mmHg }\), for each of a random sample of 10 men, aged between 35 years and 40 years, from a particular population.
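| \(\boldsymbol { x }\) | 13 | 23 | 29 | 35 | 17 | 34 | 25 | 20 | 31 | 27 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(\boldsymbol { y }\) | 103 | 115 | 124 | 126 | 108 | 120 | 113 | 117 | 118 | 119 |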
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  2. Use your equation to estimate the SBP of a man from this population who is aged 38 years and who has a BMI of 30 .
  3. State why your equation might not be appropriate for estimating the SBP of a man from this population:
    1. who is aged 38 years and who has a BMI of 45 ;
    2. who is aged 50 years and who has a BMI of 25 .
  4. Find the value of the residual for the point \(( 20,117 )\).
  5. The mean of the vertical distances of the 10 points from the regression line calculated in part (a) is 2.71, correct to three significant figures. Comment on the likely accuracy of your estimate in part (b).
    [1 mark]
AQA S1 2014 June Q6
12 marks Moderate -0.8
6 A rubber seal is fitted to the bottom of a flood barrier. When no pressure is applied, the depth of the seal is 15 cm . When pressure is applied, a watertight seal is created between the flood barrier and the ground. The table shows the pressure, \(x\) kilopascals ( kPa ), applied to the seal and the resultant depth, \(y\) centimetres, of the seal.
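| \(\boldsymbol { x }\) | 25 | 50 | 75 | 100 | 125 | 150 | 175 | 200 | 250 | 300 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(\boldsymbol { y }\) | 14.7 | 13.4 | 12.8 | 11.9 | 11.0 | 10.3 | 9.7 | 9.0 | 7.5 | 6.7 |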
    1. State the value that you would expect for \(a\) in the equation of the least squares regression line, \(y = a + b x\).
    2. Calculate the equation of the least squares regression line, \(y = a + b x\).
    3. Interpret, in context, your value for \(b\).
  1. Calculate an estimate of the depth of the seal when it is subjected to a pressure of 225 kPa .
    1. Give a statistical reason as to why your equation is unlikely to give a realistic estimate of the depth of the seal if it were to be subjected to a pressure of 400 kPa .
    2. Give a reason based on the context of this question as to why your equation will not give a realistic estimate of the depth of the seal if it were to be subjected to a pressure of 525 kPa .
      [3 marks]
Edexcel S1 Q4
10 marks Moderate -0.8
4. An internet service provider runs a series of television adverts at weekly intervals. To investigate the effectiveness of the adverts the company recorded the viewing figures in millions, \(v\), for the programme in which the advert was shown, and the number of new customers, \(c\), who signed up for their service the next day. The results are summarised as follows. $$\bar { v } = 4.92 , \quad \bar { c } = 104.4 , \quad S _ { v c } = 594.05 , \quad S _ { v v } = 85.44 .$$
  1. Calculate the equation of the regression line of \(c\) on \(v\) in the form \(c = a + b v\).
  2. Give an interpretation of the constants \(a\) and \(b\) in this context.
  3. Estimate the number of customers that will sign up with the company the day after an advert is shown during a programme watched by 3.7 million viewers.
  4. State two other factors besides viewing figures that will affect the success of an advert in gaining new customers for the company.
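A possible outline of the numerical parts, using only the quoted summary statistics (approximate values, for illustration):

$$b = \frac{S_{vc}}{S_{vv}} = \frac{594.05}{85.44} \approx 6.953 , \qquad a = \bar{c} - b \bar{v} \approx 104.4 - 6.953 \times 4.92 \approx 70.19$$

$$v = 3.7 \;\Rightarrow\; c \approx 70.19 + 6.953 \times 3.7 \approx 95.9$$

which suggests roughly 96 new customers.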
Edexcel S1 Q7
15 marks Moderate -0.8
7. Pipes-R-us manufacture a special lightweight aluminium tubing. The price \(\pounds P\), for each length, \(l\) metres, that the company sells is shown in the table.
| \(l\) (metres) | 0.5 | 0.8 | 1.0 | 1.5 | 2 | 4 | 6 |
|---|---|---|---|---|---|---|---|
| \(P\) (£) | 2.50 | 3.40 | 4.00 | 5.20 | 6.00 | 10.50 | 15.00 |
  1. Represent these data on a scatter diagram. You may use $$\Sigma l = 15.8 , \quad \Sigma P = 46.6 , \quad \Sigma l ^ { 2 } = 60.14 , \quad \Sigma l P = 159.77$$
  2. Find the equation of the regression line of \(P\) on \(l\) in the form \(P = a + b l\).
  3. Give a practical interpretation of the constant \(b\).
    In response to customer demand, Pipes-R-us decide to start selling tubes cut to specific lengths. Initially the company decides to use the regression line found in part (b) as a pricing formula for this new service.
  4. Calculate the price that Pipes-R-us should charge for 5.2 metres of the tubing.
  5. Suggest a reason why Pipes-R-us might not offer prices based on the regression line for any length of tubing.
Edexcel S1 Q7
17 marks Standard +0.3
7. A new vaccine is tested over a six-month period in one health authority. The table shows the number of new cases of the disease, \(d\), reported in the \(m\) th month after the trials began.
| \(m\) | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| \(d\) | 102 | 69 | 61 | 58 | 52 | 48 |
A doctor suggests that a relationship of the form \(d = a + b x\) where \(x = \frac { 1 } { m }\) can be used to model the situation.
  1. Tabulate the values of \(x\) corresponding to the given values of \(d\) and plot a scatter diagram of \(d\) against \(x\).
  2. Explain how your scatter diagram supports the suggested model. You may use $$\Sigma x = 2.45 , \quad \Sigma d = 390 , \quad \Sigma x ^ { 2 } = 1.491 , \quad \Sigma x d = 189.733$$
  3. Find an equation of the regression line \(d\) on \(x\) in the form \(d = a + b x\).
  4. Use your regression line to estimate how many new cases of the disease there will be in the 13th month after the trial began.
  5. Comment on the reliability of your answer to part (d).
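Here the coding \(x = \frac{1}{m}\) is not linear in \(m\), so the fitted model is a straight line in \(x\) but a curve in \(m\). The sketch below assumes the quoted (rounded) summary statistics with \(n = 6\) and illustrative variable names; results agree with the exact data only to the accuracy of those rounded sums.

```python
# Summary statistics quoted in the question (n = 6 months, x = 1/m).
n = 6
sum_x, sum_d, sum_x2, sum_xd = 2.45, 390, 1.491, 189.733

# Regression of d on x: d = a + b x.
s_xx = sum_x2 - sum_x**2 / n
s_xd = sum_xd - sum_x * sum_d / n
b = s_xd / s_xx
a = sum_d / n - b * sum_x / n

# Month 13 must be transformed to x = 1/13 before substituting.
estimate_13 = a + b * (1 / 13)

print(f"d = {a:.1f} + {b:.1f} x")
print(f"estimate for month 13: {estimate_13:.1f} cases")
```

Month 13 corresponds to \(x \approx 0.077\), which lies outside the range of \(x\) values used to fit the line (from \(\frac{1}{6}\) to 1).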
Edexcel S1 Q6
14 marks Moderate -0.8
6. A physics student recorded the length, \(l \mathrm {~cm}\), of a spring when different masses, \(m\) grams, were suspended from it giving the following results.
| \(m\) (g) | 50 | 100 | 200 | 300 | 400 | 500 | 600 | 700 |
|---|---|---|---|---|---|---|---|---|
| \(l\) (cm) | 7.8 | 10.7 | 16.5 | 22.1 | 28.0 | 33.9 | 35.2 | 35.6 |
  1. Represent these data on a scatter diagram with \(l\) on the vertical axis. The student decides to find the equation of a regression line of the form \(l = a + b m\) using only the data for \(m \leq 500 \mathrm {~g}\).
  2. Give a reason to support the fitting of such a regression line and explain why the student is excluding two of his values.
    (2 marks)
    You may use $$\Sigma m = 1550 , \quad \Sigma l = 119 , \quad \Sigma m ^ { 2 } = 552500 , \quad \Sigma l ^ { 2 } = 2869.2 , \quad \Sigma m l = 39540 .$$
  3. Find the values of \(a\) and \(b\).
  4. Explain the significance of the values of \(a\) and \(b\) in this situation.
Edexcel S1 Q4
11 marks Standard +0.3
  1. An engineer tested a new material under extreme conditions in a wind tunnel. He recorded the number of microfractures, \(n\), that formed and the wind speed, \(v\) metres per second, for 8 different values of \(v\) with all other conditions remaining constant. He then coded the data using \(x = v - 700\) and \(y = n - 20\) and calculated the following summary statistics.
$$\Sigma x = 100 , \quad \Sigma y = 23 , \quad \Sigma x ^ { 2 } = 215000 , \quad \Sigma x y = 11600 .$$
  1. Find an equation of the regression line of \(y\) on \(x\).
  2. Hence, find an equation of the regression line of \(n\) on \(v\).
  3. Use your regression line to estimate the number of microfractures that would be formed if the material was tested in a wind speed of 900 metres per second with all other conditions remaining constant.
    (2 marks)
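A short sketch of the decoding step, assuming the quoted summary statistics with \(n = 8\); variable names are illustrative. Both codings are pure shifts, so the gradient of \(n\) on \(v\) equals the gradient of \(y\) on \(x\).

```python
# Summary statistics quoted in the question (coded data, 8 wind speeds).
count = 8
sum_x, sum_y, sum_x2, sum_xy = 100, 23, 215000, 11600

# Regression of y on x.
s_xx = sum_x2 - sum_x**2 / count
s_xy = sum_xy - sum_x * sum_y / count
b = s_xy / s_xx
a = sum_y / count - b * sum_x / count

# Undo the coding x = v - 700, y = n - 20:
#   n = (20 + a - 700 b) + b v
slope_n = b
intercept_n = 20 + a - 700 * b

# Estimate at v = 900 metres per second.
estimate_900 = intercept_n + slope_n * 900

print(f"y = {a:.3f} + {b:.4f} x")
print(f"n = {intercept_n:.2f} + {slope_n:.4f} v")
print(f"estimate at v = 900: {estimate_900:.1f}")
```

Whether 900 metres per second lies within the tested range cannot be determined from the summary statistics alone, so the estimate should be treated with caution.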
OCR MEI Further Statistics A AS 2019 June Q5
13 marks Standard +0.3
5 A researcher is investigating births of females and males in a particular species of animal which very often produces litters of 7 offspring.
The table shows some data about the number of females per litter in 200 litters of 7 offspring. The researcher thinks that a binomial distribution \(\mathrm { B } ( 7 , p )\) may be an appropriate model for these data.
(c) Complete the test at the \(5 \%\) significance level.
Fig. 5 shows the probability distribution \(\mathrm { B } ( 7,0.35 )\) together with the relative frequencies of the observed data (the numbers of litters each divided by 200).
[Fig. 5]
(d) Comment on the result of the test completed in part (c) by considering Fig. 5.
OCR MEI Further Statistics A AS 2020 November Q5
8 marks Moderate -0.3
5 A doctor is investigating the relationship between the levels in the blood of a particular hormone and of calcium in healthy adults. The levels of the hormone and of calcium, each measured in suitable units, are denoted by \(x\) and \(y\) respectively. The doctor selects a random sample of 14 adults and measures the hormone and calcium levels in each of them. The spreadsheet in Fig. 5 shows the values obtained, together with a scatter diagram which illustrates the data. The equation of the regression line of \(y\) on \(x\) is shown on the scatter diagram, together with the value of the square of the product moment correlation coefficient.
[Fig. 5]
  1. Use the equation of the regression line to estimate the mean calcium level of people with the following hormone levels.
OCR MEI Further Statistics A AS 2021 November Q6
11 marks Moderate -0.3
6 A health researcher is investigating the relationship between age and maximum heart rate. A commonly quoted formula states that 'maximum heart rate \(= 220\) - age in years'. The researcher wants to check if this formula is a satisfactory model for people who work in the large hospital where she is employed. The researcher selects a random sample of 20 people who work in her hospital, and measures their maximum heart rates.
  1. Explain why the researcher selects a sample, rather than using all of the people who work in the hospital.
    The ages, \(x\) years, and maximum heart rates, \(y\) beats per minute, of the people in the researcher's sample are summarised as follows.
    \(n = 20 \quad \sum x = 922 \quad \sum y = 3638 \quad \sum x ^ { 2 } = 47250 \quad \sum y ^ { 2 } = 664610 \quad \sum x y = 164998\)
    These data are illustrated below.
    [Scatter diagram of maximum heart rate against age]
    1. Draw the line which represents the formula 'maximum heart rate \(= 220 -\) age in years' on the copy of the scatter diagram in the Printed Answer Booklet.
    2. Comment on how well this model fits the data.
  2. Determine the equation of the regression line of maximum heart rate on age.
  3. Use the equation of the regression line to predict the values of the maximum heart rate for each of the following ages.
OCR MEI Further Statistics Major 2020 November Q5
13 marks Moderate -0.3
5 A hearing expert is investigating whether web-based hearing tests can be used instead of hearing tests in a hearing laboratory. The expert selects a random sample of 16 people with normal hearing. Each of them is given two hearing tests, one in the laboratory and one web-based. The scores in the laboratory-based test, \(x\), and the web-based test, \(y\), are both measured in the same suitable units.
  1. Half of the participants do the laboratory-based test first and the other half do the web-based test first. Explain why the expert adopts this approach.
    The scatter diagram in Fig. 5 shows the data that the expert collected.
    [Fig. 5]
    Summary statistics for these data are as follows. $$\Sigma x = 198.0 \quad \Sigma x ^ { 2 } = 2936.92 \quad \Sigma y = 188.7 \quad \Sigma y ^ { 2 } = 2605.35 \quad \Sigma x y = 2554.87$$
  2. Calculate the equation of the regression line suitable for estimating web-based scores from laboratory-based scores.
  3. Estimate the web-based scores of people whose laboratory-based scores were as follows.
    Stating the approximate coordinates of the outlier, suggest what the expert should do.
OCR MEI Further Statistics Major 2021 November Q8
16 marks Standard +0.3
8
  1. \(\mathrm { VO } _ { 2 \max }\) is a measure of athletic fitness. Since \(\mathrm { VO } _ { 2 \max }\) is fairly time-consuming and expensive to measure, an exercise scientist wants to predict \(\mathrm { VO } _ { 2 \max }\) from data such as times for running different distances. The scientist uses these data for a random sample of 15 athletes to predict their \(\mathrm { VO } _ { 2 \max }\) values, denoted by \(y\), in suitable units. She also obtains accurate measurements of the \(\mathrm { VO } _ { 2 \max }\) values, denoted by \(x\), in the same units. The scatter diagram in Fig. 8.1 shows the values of \(x\) and \(y\) obtained, together with the equation of the regression line of \(y\) on \(x\) and the value of \(r ^ { 2 }\).
    [Fig. 8.1]
    1. Use the regression line to estimate the predicted \(\mathrm { VO } _ { 2 \max }\) of an athlete whose accurately measured \(\mathrm { VO } _ { 2 \max }\) is 50.
    2. Comment on the reliability of your estimate.
    3. The equation of the regression line of \(x\) on \(y\) is \(x = 0.7565 y + 10.493\). Find the coordinates of the point at which the two regression lines meet.
    4. State what the point you found in part (iii) represents.
  2. It is known that there is negative correlation between \(\mathrm { VO } _ { 2 \max }\) and marathon times in very good runners (those whose best marathon times are under 3 hours). The exercise scientist wishes to know whether the same applies to runners who take longer to run a marathon. She selects a random sample of 20 runners whose best marathon times are between \(3 \frac { 1 } { 2 }\) hours and \(4 \frac { 1 } { 2 }\) hours and accurately measures their \(\mathrm { VO } _ { 2 \max }\). Fig. 8.2 is a scatter diagram of accurately measured \(\mathrm { VO } _ { 2 \max }\), \(v\) units, against best marathon time, \(t\) hours, for these runners.
    [Fig. 8.2]
    1. Explain why the exercise scientist comes to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid. Summary statistics for the 20 runners are as follows. $$\sum t = 80.37 \quad \sum v = 970.86 \quad \sum t ^ { 2 } = 324.71 \quad \sum v ^ { 2 } = 47829.24 \quad \sum t v = 3886.53$$
    2. Find the value of Pearson's product moment correlation coefficient.
    3. Carry out a test at the \(5 \%\) significance level to investigate whether there is negative correlation between accurately measured \(\mathrm { VO } _ { 2 \max }\) and best marathon time for runners whose best marathon times are between \(3 \frac { 1 } { 2 }\) hours and \(4 \frac { 1 } { 2 }\) hours.
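The correlation coefficient for the 20 runners can be computed directly from the quoted summary statistics, as sketched below. The critical value 0.3783 (one-tailed, 5%, \(n = 20\)) is an assumption taken from standard tables of Pearson's coefficient and should be checked against the formula booklet in use; variable names are illustrative.

```python
from math import sqrt

# Summary statistics quoted in the question (n = 20 runners).
n = 20
sum_t, sum_v = 80.37, 970.86
sum_t2, sum_v2, sum_tv = 324.71, 47829.24, 3886.53

s_tt = sum_t2 - sum_t**2 / n
s_vv = sum_v2 - sum_v**2 / n
s_tv = sum_tv - sum_t * sum_v / n

# Pearson's product moment correlation coefficient.
r = s_tv / sqrt(s_tt * s_vv)

# Assumed one-tailed 5% critical value for n = 20 (check against tables).
critical_value = 0.3783

# H0: rho = 0, H1: rho < 0. Reject H0 if r is below -critical_value.
reject_h0 = r < -critical_value

print(f"r = {r:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```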