Convert regression equation between coded and original variables

Questions that require finding the regression equation in coded variables and then converting it to original variables, or vice versa, using the coding transformations.

13 questions
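
The conversion these questions ask for can be sketched in a few lines of Python. This is a minimal sketch, not a mark-scheme method: the function name `decode_regression` and its parameter names are illustrative assumptions, and it assumes linear codings of the form u = (x - x_shift) / x_scale and v = (y - y_shift) / y_scale. The example values come from the 2018 January question below, where the line \(y = 60.75 - 1.75 w\) is given with \(w = \frac { x - 20000 } { 1000 }\).

```python
def decode_regression(p, q, x_shift, x_scale, y_shift=0.0, y_scale=1.0):
    """Convert a coded regression line v = p + q*u into y = a + b*x.

    Assumed codings (hypothetical parameter names):
        u = (x - x_shift) / x_scale
        v = (y - y_shift) / y_scale
    Substituting both into v = p + q*u and solving for y gives
        y = (y_shift + y_scale*p - b*x_shift) + b*x,  b = y_scale*q / x_scale
    """
    b = y_scale * q / x_scale
    a = y_shift + y_scale * p - b * x_shift
    return a, b


# y = 60.75 - 1.75 w with w = (x - 20000) / 1000; y itself is not coded.
a, b = decode_regression(p=60.75, q=-1.75, x_shift=20000, x_scale=1000)
print(f"y = {a:.2f} + ({b:.5f}) x")

# Both forms of the line should give the same estimate at x = 21000 (w = 1).
estimate_original = a + b * 21000
estimate_coded = 60.75 - 1.75 * 1
```

A useful check after any conversion is the one at the end of the sketch: the coded and decoded lines must give the same prediction for the same data point.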

Edexcel S1 2016 January Q3
3. A publisher collects information about the amount spent on advertising, \(\pounds x\), and the sales, \(y\) books, for some of her publications. She collects information for a random sample of 8 textbooks and codes the data using \(v = \frac { x + 50 } { 200 }\) and \(s = \frac { y } { 1000 }\) to give

| \(v\) | 0.60 | 8.10 | 4.30 | 0.40 | 1.60 | 6.40 | 2.50 | 5.10 |
|---|---|---|---|---|---|---|---|---|
| \(s\) | 1.84 | 6.73 | 5.95 | 1.30 | 2.45 | 7.46 | 4.82 | 6.25 |

[You may use: \(\sum v = 29 \quad \sum s = 36.8 \quad \sum s ^ { 2 } = 209.72 \quad \sum v s = 177.311 \quad \mathrm { S } _ { v v } = 55.275\)]
  1. Find \(\mathrm { S } _ { v s }\) and \(\mathrm { S } _ { s s }\)
  2. Calculate the product moment correlation coefficient for these data.

The publisher believes that a linear regression model may be appropriate to describe these data.

  3. State, giving a reason, whether or not your answer to part 2 supports the publisher's belief.
  4. Find the equation of the regression line of \(s\) on \(v\), giving your answer in the form \(s = a + b v\)
  5. Hence find the equation of the regression line of \(y\) on \(x\) for the sample of textbooks, giving your answer in the form \(y = c + d x\)

The publisher calculated the regression line for a sample of novels and obtained the equation $$y = 3100 + 1.2 x$$ She wants to increase the sales of books by spending more money on advertising.

  6. State, giving your reasons, whether the publisher should spend more money on advertising textbooks or novels.
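
The calculations in this question can be checked numerically. The following is a rough sketch rather than a model answer: it assumes the eight \((v, s)\) pairs are read from the table as 0.60, 8.10, ... and 1.84, 6.73, ... (these reproduce the given sums \(\sum v = 29\) and \(\sum s = 36.8\)), and the variable names are illustrative only.

```python
from math import sqrt

# Coded data as read from the table (assumed parsing of the eight pairs).
v = [0.60, 8.10, 4.30, 0.40, 1.60, 6.40, 2.50, 5.10]
s = [1.84, 6.73, 5.95, 1.30, 2.45, 7.46, 4.82, 6.25]
n = len(v)

# Summary statistics from their definitions.
S_vv = sum(t * t for t in v) - sum(v) ** 2 / n
S_ss = sum(t * t for t in s) - sum(s) ** 2 / n
S_vs = sum(p * q for p, q in zip(v, s)) - sum(v) * sum(s) / n

# Product moment correlation coefficient.
r = S_vs / sqrt(S_vv * S_ss)

# Regression line of s on v: s = a + b v.
b = S_vs / S_vv
a = sum(s) / n - b * sum(v) / n

# Decode with s = y / 1000 and v = (x + 50) / 200:
#   y = 1000 * (a + b * (x + 50) / 200) = (1000a + 250b) + 5b x
c = 1000 * a + 250 * b
d = 5 * b

print(f"S_vs = {S_vs:.3f}, S_ss = {S_ss:.2f}, r = {r:.3f}")
print(f"s = {a:.2f} + {b:.3f} v")
print(f"y = {c:.0f} + {d:.2f} x")
```

Under these assumptions the decoded gradient `d` is the number to compare with the gradient 1.2 of the novels line in the final part.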
Edexcel S1 2018 January Q5
5. Franca is the manager of an accountancy firm. She is investigating the relationship between the salary, \(\pounds x\), and the length of commute, \(y\) minutes, for employees at the firm. She collected this information from 9 randomly selected employees. The salary of each employee was then coded using \(w = \frac { x - 20000 } { 1000 }\). The table shows the values of \(w\) and \(y\) for the 9 employees.

| \(w\) | 6 | 8 | 8 | \(-1\) | 25 | 15 | 3 | \(-2\) | 19 |
|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 45 | 50 | 35 | 65 | 25 | 40 | 50 | 75 | 20 |

(You may use \(\sum w = 81 \quad \sum y = 405 \quad \sum w y = 2490 \quad S _ { w w } = 660 \quad S _ { y y } = 2500\) )
  1. Calculate the salary of the employee with \(w = - 2\)
  2. Show that, to 3 significant figures, the value of the product moment correlation coefficient between \(w\) and \(y\) is \(-0.899\)
  3. State, giving a reason, the value of the product moment correlation coefficient between \(x\) and \(y\)

The least squares regression line of \(y\) on \(w\) is \(y = 60.75 - 1.75 w\)

  4. Find the equation of the least squares regression line of \(y\) on \(x\) giving your answer in the form \(y = a + b x\)
  5. Estimate the length of commute for an employee with a salary of \(\pounds 21000\)

Franca uses the regression line to estimate the length of commute for employees with salaries between \(\pounds 25000\) and \(\pounds 40000\)

  6. State, giving a reason, whether or not these estimates are reliable.
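
The idea behind the correlation part of this question, that linear coding leaves the product moment correlation coefficient unchanged, can be illustrated numerically. This is a small sketch under stated assumptions: the helper `pmcc` is a hypothetical name, and the nine \((w, y)\) pairs are read from the table as 6, 8, 8, -1, ... and 45, 50, 35, 65, ... (they reproduce \(\sum w = 81\) and \(\sum y = 405\)).

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient from raw data (illustrative helper)."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


# Coded salaries and commute times as read from the table (assumed parsing).
w = [6, 8, 8, -1, 25, 15, 3, -2, 19]
y = [45, 50, 35, 65, 25, 40, 50, 75, 20]

# Undo the coding w = (x - 20000) / 1000 to recover the salaries.
x = [1000 * wi + 20000 for wi in w]

r_wy = pmcc(w, y)
r_xy = pmcc(x, y)
print(f"r(w, y) = {r_wy:.3f}, r(x, y) = {r_xy:.3f}")
```

The two values agree because the coding only shifts and rescales the salary by a positive factor, which cancels in the ratio that defines the coefficient.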
Edexcel S1 2018 June Q1
1. A random sample of 10 cars of different makes and sizes is taken and the published miles per gallon, \(p\), and the actual miles per gallon, \(m\), are recorded. The data are coded using variables \(x = \frac { p } { 10 }\) and \(y = m - 25\).
The results for the coded data are summarised below.

| \(\boldsymbol { x }\) | 6.89 | 3.67 | 5.92 | 5.04 | 4.87 | 3.92 | 4.71 | 5.14 | 3.65 | 5.23 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(\boldsymbol { y }\) | 30 | 3 | 22 | 15 | 13 | 8 | 15 | 13.5 | 3 | 19 |

(You may use \(\sum y ^ { 2 } = 2628.25 \quad \sum x y = 768.58 \quad \mathrm {~S} _ { x x } = 9.25924 \quad \mathrm {~S} _ { x y } = 74.664\) )
  1. Show that \(\mathrm { S } _ { y y } = 626.025\)
  2. Find the product moment correlation coefficient between \(x\) and \(y\).
  3. Give a reason to support fitting a regression model of the form \(y = a + b x\) to these data.
  4. Find the equation of the regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
    Give the value of \(a\) and the value of \(b\) to 3 significant figures.

A car's published miles per gallon is 44.

  5. Estimate the actual miles per gallon for this particular car.
  6. Comment on the reliability of your estimate in part 5. Give a reason for your answer.
Edexcel S1 2011 January Q4
4. A farmer collected data on the annual rainfall, \(x \mathrm {~cm}\), and the annual yield of peas, \(p\) tonnes per acre.
The data for annual rainfall was coded using \(v = \frac { x - 5 } { 10 }\) and the following statistics were found. $$S _ { v v } = 5.753 \quad S _ { p v } = 1.688 \quad S _ { p p } = 1.168 \quad \bar { p } = 3.22 \quad \bar { v } = 4.42$$
  1. Find the equation of the regression line of \(p\) on \(v\) in the form \(p = a + b v\).
  2. Using your regression line, estimate the annual yield of peas per acre when the annual rainfall is 85 cm.
Edexcel S1 2005 June Q3
3. A long distance lorry driver recorded the distance travelled, \(m\) miles, and the amount of fuel used, \(f\) litres, each day. Summarised below are data from the driver's records for a random sample of 8 days.
The data are coded such that \(x = m - 250\) and \(y = f - 100\). $$\Sigma x = 130 \quad \Sigma y = 48 \quad \Sigma x y = 8880 \quad \mathrm {~S} _ { x x } = 20487.5$$
  1. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\).
  2. Hence find the equation of the regression line of \(f\) on \(m\).
  3. Predict the amount of fuel used on a journey of 235 miles.
Edexcel S1 2013 June Q1
1. Sammy is studying the number of units of gas, \(g\), and the number of units of electricity, \(e\), used in her house each week. A random sample of 10 weeks' use was recorded and the data for each week were coded so that \(x = \frac { g - 60 } { 4 }\) and \(y = \frac { e } { 10 }\). The results for the coded data are summarised below.
$$\sum x = 48.0 \quad \sum y = 58.0 \quad \mathrm {~S} _ { x x } = 312.1 \quad \mathrm {~S} _ { y y } = 2.10 \quad \mathrm {~S} _ { x y } = 18.35$$
  1. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\). Give the values of \(a\) and \(b\) correct to 3 significant figures.
  2. Hence find the equation of the regression line of \(e\) on \(g\) in the form \(e = c + d g\). Give the values of \(c\) and \(d\) correct to 2 significant figures.
  3. Use your regression equation to estimate the number of units of electricity used in a week when 100 units of gas were used.
Edexcel S1 2002 November Q5
5. An agricultural researcher collected data, in appropriate units, on the annual rainfall \(x\) and the annual yield of wheat \(y\) at 8 randomly selected places. The data were coded using \(s = x - 6\) and \(t = y - 20\) and the following summations were obtained. $$\Sigma s = 48.5 , \quad \Sigma t = 65.0 , \quad \Sigma s ^ { 2 } = 402.11 , \quad \Sigma t ^ { 2 } = 701.80 , \quad \Sigma s t = 523.23$$
  1. Find the equation of the regression line of \(t\) on \(s\) in the form \(t = p + q s\).
  2. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\), giving \(a\) and \(b\) to 3 decimal places.

The value of the product moment correlation coefficient between \(s\) and \(t\) is 0.943, to 3 decimal places.

  3. Write down the value of the product moment correlation coefficient between \(x\) and \(y\). Give a justification for your answer.
Edexcel S1 Specimen Q4
4. A drilling machine can run at various speeds, but in general the higher the speed the sooner the drill needs to be replaced. Over several months, 15 pairs of observations relating to speed, \(s\) revolutions per minute, and life of drill, \(h\) hours, are collected. For convenience the data are coded so that \(x = s - 20\) and \(y = h - 100\) and the following summations obtained.
\(\Sigma x = 143 ; \Sigma y = 391 ; \Sigma x ^ { 2 } = 2413 ; \Sigma y ^ { 2 } = 22441 ; \Sigma x y = 484\).
  1. Find the equation of the regression line of \(h\) on \(s\).
  2. Interpret the slope of your regression line. Estimate the life of a drill revolving at 30 revolutions per minute. (2 marks)
Edexcel S1 Q3
3. Twenty pairs of observations are made of two variables \(x\) and \(y\), which are believed to be related. It is found that $$\sum x = 200 , \quad \sum y = 174 , \quad \sum x ^ { 2 } = 6201 , \quad \sum y ^ { 2 } = 5102 , \quad \sum x y = 5200 .$$ Find
  1. the product-moment correlation coefficient between \(x\) and \(y\),
  2. the equation of the regression line of \(y\) on \(x\).

Given that \(p = x + 30\) and \(q = y + 50\),

  3. find the equation of the regression line of \(q\) on \(p\), in the form \(q = m p + c\).
  4. Estimate the value of \(q\) when \(p = 46\), stating any assumptions you make.
Edexcel S1 Q6
6. In a survey for a computer magazine, the times \(t\) seconds taken by eight laser printers to print a page of text were compared with the prices \(\pounds p\) of the printers. The data were coded using the equations \(x = t - 10\) and \(y = p - 150\), and it was found that $$\sum x = 42.4 , \quad \sum x ^ { 2 } = 314.5 , \quad \sum y = 560 , \quad \sum y ^ { 2 } = 60600 , \quad \sum x y = 1592 .$$
  1. Find the mean time and the mean price for the eight printers.
  2. Find the variance of the times.
  3. Find the equation of the regression line of \(p\) on \(t\).
  4. Estimate the price of a printer which takes 11.3 seconds to print the page.
Edexcel S1 Q4
4. The owner of a mobile burger-bar believes that hot weather reduces his sales. To investigate the effect on his business he collected data on his daily sales, \(\pounds P\), and the maximum temperature, \(T ^ { \circ } \mathrm { C }\), on each of 20 days. He then coded the data, using \(x = T - 20\) and \(y = P - 300\), and calculated the summary statistics given below. $$\Sigma x = 57 , \quad \Sigma y = 2222 , \quad \Sigma x ^ { 2 } = 401 , \quad \Sigma y ^ { 2 } = 305576 , \quad \Sigma x y = 3871 .$$
  1. Find an equation of the regression line of \(P\) on \(T\).

The owner of the bar doesn't believe it is profitable for him to run the bar if he takes less than \(\pounds 460\) in a day.

  2. According to your regression line, at what maximum daily temperature, to the nearest degree Celsius, does it become unprofitable for him to run the bar? (3 marks)
Edexcel S1 Q4
4. An engineer tested a new material under extreme conditions in a wind tunnel. He recorded the number of microfractures, \(n\), that formed and the wind speed, \(v\) metres per second, for 8 different values of \(v\) with all other conditions remaining constant. He then coded the data using \(x = v - 700\) and \(y = n - 20\) and calculated the following summary statistics.
$$\Sigma x = 100 , \quad \Sigma y = 23 , \quad \Sigma x ^ { 2 } = 215000 , \quad \Sigma x y = 11600 .$$
  1. Find an equation of the regression line of \(y\) on \(x\).
  2. Hence, find an equation of the regression line of \(n\) on \(v\).
  3. Use your regression line to estimate the number of microfractures that would be formed if the material was tested in a wind speed of 900 metres per second with all other conditions remaining constant. (2 marks)
Edexcel FS2 2024 June Q1
1. Two students are experimenting with some water in a plastic bottle. The bottle is filled with water and a hole is put in the bottom of the bottle. The students record the time, \(t\) seconds, it takes for the water level to fall to each of 10 given values of the height, \(h \mathrm {~cm}\), above the hole.
Student \(A\) models the data with an equation of the form \(t = a + b \sqrt { h }\)
The data is coded using \(v = t - 40\) and \(w = \sqrt { h }\) and the following information is obtained. $$\sum v = 626 \quad \sum v ^ { 2 } = 64678 \quad \sum w = 22.47 \quad \mathrm {~S} _ { w w } = 4.52 \quad \mathrm {~S} _ { v w } = - 338.83$$
  1. Find the equation of the regression line of \(t\) on \(\sqrt { h }\) in the form \(t = a + b \sqrt { h }\)

The time it takes the water level to fall to a height of 9 cm above the hole is 47 seconds.

  2. Calculate the residual for this data point. Give your answer to 2 decimal places.

Given that the residual sum of squares (RSS) for the model of \(t\) on \(\sqrt { h }\) is the same as the RSS for the model of \(v\) on \(w\),

  3. calculate the RSS for these 10 data points.

Student \(B\) models the data with an equation of the form \(t = c + d h\)
The regression line of \(t\) on \(h\) is calculated and the residual sum of squares (RSS) is found to be 980 to 3 significant figures.

  4. With reference to part 3 state, giving a reason, whether Student B's model or Student A's model is the more suitable for these data.
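
The arithmetic for this question can be sketched as follows. This is an illustrative check, not the published mark scheme: the variable names are assumptions, and the RSS is computed with the standard identity \(\mathrm{RSS} = \mathrm { S } _ { v v } - \frac { ( \mathrm { S } _ { v w } ) ^ { 2 } } { \mathrm { S } _ { w w } }\).

```python
# Summary statistics given in the question (n = 10 data points).
n = 10
sum_v, sum_v2, sum_w = 626, 64678, 22.47
S_ww, S_vw = 4.52, -338.83

# Regression of v on w, where v = t - 40 and w = sqrt(h).
b = S_vw / S_ww
a_coded = sum_v / n - b * sum_w / n

# Decode: t = v + 40, so only the intercept shifts; the gradient is unchanged.
a = a_coded + 40

# Residual = observed - predicted, for h = 9 (so sqrt(h) = 3) and t = 47.
residual = 47 - (a + b * 3)

# Residual sum of squares for the v-on-w model.
S_vv = sum_v2 - sum_v ** 2 / n
rss = S_vv - S_vw ** 2 / S_ww

print(f"t = {a:.1f} + ({b:.1f}) sqrt(h)")
print(f"residual = {residual:.2f}, RSS = {rss:.1f}")
```

Under these assumptions the RSS for Student A's model comes out well below the 980 quoted for Student B's model, which is the comparison the final part asks about.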