Calculate y on x from summary statistics

Questions that give summary statistics (sums, means, variances, \(S_{xx}\), \(S_{xy}\), etc.) and ask for the regression line of \(y\) on \(x\).

11 questions
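
The questions below all rest on the same two formulas, \(b = S_{xy} / S_{xx}\) and \(a = \bar{y} - b \bar{x}\). As a rough sketch of that calculation from raw sums (the function name `regression_from_sums` and the sample numbers are illustrative assumptions, not taken from any paper):

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the least squares line y = a + b*x.

    Assumes the usual definitions
        Sxx = sum(x^2) - (sum x)^2 / n
        Sxy = sum(xy)  - (sum x)(sum y) / n
    """
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b = s_xy / s_xx                 # gradient
    a = sum_y / n - b * sum_x / n   # intercept: the line passes through the mean point
    return a, b


# Made-up data lying exactly on y = 2x: (1, 2), (2, 4), (3, 6)
a, b = regression_from_sums(n=3, sum_x=6, sum_y=12, sum_x2=14, sum_xy=28)
print(a, b)
```

When a question supplies \(S_{xx}\) or \(S_{xy}\) directly, the first two lines of the function can be skipped and the given values used instead.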

Edexcel S1 2017 January Q3
  1. A scientist measured the salinity of water, \(x \mathrm {~g} / \mathrm { kg }\), and recorded the temperature at which the water froze, \(y ^ { \circ } \mathrm { C }\), for 12 different water samples. The summary statistics are listed below.
$$\begin{gathered} \sum x = 504 \quad \sum y = - 27 \quad \sum x ^ { 2 } = 22842 \quad \sum y ^ { 2 } = 62.98 \\
\sum x y = - 1190.7 \quad \mathrm {~S} _ { x x } = 1674 \quad \mathrm {~S} _ { y y } = 2.23 \end{gathered}$$
  1. Find the mean and variance of the recorded temperatures. (3)
    Priya believes that the higher the salinity of water, the higher the temperature at which the water freezes.
    1. Calculate the product moment correlation coefficient between \(x\) and \(y\)
    2. State, with a reason, whether or not this value supports Priya's belief.
  2. Find the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\). Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  3. Estimate the temperature at which water freezes when the salinity is \(32 \mathrm {~g} / \mathrm { kg }\).
    The coding \(w = 1.8 y + 32\) is used to convert the recorded temperatures from \({ } ^ { \circ } \mathrm { C }\) to \({ } ^ { \circ } \mathrm { F }\).
  4. Find an equation of the least squares regression line of \(w\) on \(x\) in the form \(w = c + d x\)
  5. Find
    1. the variance of the recorded temperatures when converted to \({ } ^ { \circ } \mathrm { F }\)
    2. the product moment correlation coefficient between \(w\) and \(x\)
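
A linear coding of the response, such as \(w = 1.8 y + 32\), changes the regression line, the variance and the correlation in a predictable way. A minimal sketch, assuming a coding \(w = p y + q\) with \(p \neq 0\) applied to a fitted line \(y = a + b x\) (the function name and the numbers are illustrative only):

```python
def code_response(a, b, var_y, r, p, q):
    """Apply the coding w = p*y + q (p != 0) to a fitted line y = a + b*x.

    Returns (c, d, var_w, r_wx) where w = c + d*x is the regression line of
    w on x, var_w is the variance of w, and r_wx is the correlation of w and x.
    """
    c = p * a + q              # intercept is scaled, then shifted
    d = p * b                  # gradient is scaled only
    var_w = p ** 2 * var_y     # a shift does not change the spread
    r_wx = r if p > 0 else -r  # correlation keeps its size; sign flips if p < 0
    return c, d, var_w, r_wx


# Illustrative values, not taken from the question
print(code_response(a=1.0, b=2.0, var_y=0.5, r=-0.8, p=3.0, q=4.0))
```

The same idea covers a Fahrenheit to Centigrade change such as \(c = \frac{5}{9}(f - 32)\), which has \(p = \frac{5}{9}\) and \(q = -\frac{160}{9}\).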
CAIE FP2 2019 June Q10
4 marks
10 The means and variances for a random sample of 8 pairs of values of \(x\) and \(y\) taken from a bivariate distribution are given in the following table.
$$\begin{array} { l | c c } & \text { Mean } & \text { Variance } \\ \hline x & 3.3125 & 3.3086 \\ y & 6.7375 & 7.9473 \end{array}$$
The product moment correlation coefficient for the sample is 0.5815, correct to 4 decimal places.
  1. Find the equation of the regression line of \(y\) on \(x\).
  2. Test at the \(5 \%\) significance level whether there is evidence of positive correlation between \(x\) and \(y\). [4]
  3. Calculate an estimate of \(y\) when \(x = 6.0\) and comment on the reliability of your estimate.
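
When only means, variances and \(r\) are given, the gradient can be recovered from \(b = r \, s_y / s_x\), which follows from \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\) and \(b = S_{xy} / S_{xx}\). A sketch, assuming both variances were computed with the same divisor (the function name and values are hypothetical):

```python
import math


def regression_from_moments(mean_x, mean_y, var_x, var_y, r):
    """Return (a, b) for y = a + b*x given means, variances and correlation r.

    Uses b = r * sqrt(var_y / var_x); the ratio is the same whether both
    variances were computed with divisor n or n - 1.
    """
    b = r * math.sqrt(var_y / var_x)
    a = mean_y - b * mean_x
    return a, b


# Hypothetical summary values chosen so the arithmetic is easy to follow
a, b = regression_from_moments(mean_x=2.0, mean_y=5.0, var_x=4.0, var_y=9.0, r=0.5)
print(a, b)
```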
CAIE FP2 2010 November Q10
10 For each month of a certain year, a weather station recorded the average rainfall per day, \(x \mathrm {~mm}\), and the average amount of sunshine per day, \(y\) hours. The results are summarised below. $$n = 12 , \quad \Sigma x = 24.29 , \quad \Sigma x ^ { 2 } = 50.146 , \quad \Sigma y = 45.8 , \quad \Sigma y ^ { 2 } = 211.16 , \quad \Sigma x y = 88.415 .$$
  1. Find the mean values, \(\bar { x }\) and \(\bar { y }\).
  2. Calculate the gradient of the line of regression of \(y\) on \(x\).
  3. Use the answers to parts (i) and (ii) to obtain the equation of the line of regression of \(y\) on \(x\).
  4. Find the product moment correlation coefficient and comment, in context, on its value.
  5. Stating your hypotheses, test at the \(1 \%\) level of significance whether there is negative correlation between average rainfall per day and average amount of sunshine per day.
CAIE FP2 2017 November Q11 OR
A large number of people attended a course to improve the speed of their logical thinking. The times taken to complete a particular type of logic puzzle at the beginning of the course and at the end of the course are recorded for each person. The time taken, in minutes, at the beginning of the course is denoted by \(x\) and the time taken, in minutes, at the end of the course is denoted by \(y\). For a random sample of 9 people, the results are summarised as follows. $$\Sigma x = 45.3 \quad \Sigma x ^ { 2 } = 245.59 \quad \Sigma y = 40.5 \quad \Sigma y ^ { 2 } = 195.11 \quad \Sigma x y = 218.72$$ Ken attended the course, but his time to complete the puzzle at the beginning of the course was not recorded. His time to complete the puzzle at the end of the course was 4.2 minutes.
  1. By finding, showing all necessary working, the equation of a suitable regression line, find an estimate for the time that Ken would have taken to complete the puzzle at the beginning of the course.
    The values of \(x - y\) for the sample of 9 people are as follows. $$\begin{array} { l l l l l l l l l } 0.2 & 0.8 & 0.5 & 1.0 & 0.2 & 0.6 & 0.2 & 0.5 & 0.8 \end{array}$$ The organiser of the course believes that, on average, the time taken to complete the puzzle decreases between the beginning and the end of the course by more than 0.3 minutes.
  2. Stating suitable hypotheses and assuming a normal distribution, test the organiser's belief at the \(2 \frac { 1 } { 2 } \%\) significance level.
Edexcel S1 2022 June Q2
  1. Stuart is investigating the relationship between Gross Domestic Product (GDP) and the size of the population for a particular country.
    He takes a random sample of 9 years and records the size of the population, \(t\) millions, and the GDP, \(g\) billion dollars for each of these years.
The data are summarised as $$n = 9 \quad \sum t = 7.87 \quad \sum g = 144.84 \quad \sum g ^ { 2 } = 3624.41 \quad S _ { t t } = 1.29 \quad S _ { t g } = 40.25$$
  1. Calculate the product moment correlation coefficient between \(t\) and \(g\)
  2. Give an interpretation of your product moment correlation coefficient.
  3. Find the equation of the least squares regression line of \(g\) on \(t\) in the form \(g = a + b t\)
  4. Give an interpretation of the value of \(b\) in your regression line.
    1. Use the regression line from part (c) to estimate the GDP, in billions of dollars, for a population of 7000000
    2. Comment on the reliability of your answer in part (i). Give a reason, in context, for your answer.
    Using the regression line from part (c), Stuart estimates that for a population increase of \(x\) million there will be an increase of 0.1 billion dollars in GDP.
  5. Find the value of \(x\)
Edexcel S1 2016 October Q4
  1. A doctor is studying the scans of 30-week-old foetuses. She takes a random sample of 8 scans and measures the length, \(f \mathrm {~mm}\), of the leg bone called the femur. She obtains the following results.
$$\begin{array} { l l l l l l l l } 52 & 53 & 56 & 57 & 57 & 59 & 60 & 62 \end{array}$$
  1. Show that \(\mathrm { S } _ { f f } = 80\)
    The doctor also measures the head circumference, \(h \mathrm {~mm}\), of each foetus and her results are summarised as $$\sum h = 2209 \quad \sum h ^ { 2 } = 610463 \quad \mathrm {~S} _ { f h } = 182$$
  2. Find \(\mathrm { S } _ { h h }\)
  3. Calculate the product moment correlation coefficient between the length of the femur and the head circumference for these data.
    The doctor believes that there is a linear relationship between the length of the femur and the head circumference of 30-week-old foetuses.
  4. State, giving a reason, whether or not your calculation in part (c) supports the doctor's belief.
  5. Find an equation of the regression line of \(h\) on \(f\).
    The doctor plans in future to measure the femur length, \(f\), and then use the regression line to estimate the corresponding head circumference, \(h\). A statistician points out that there will always be the chance of an error between the true head circumference and the estimated value of the head circumference.
    Given that the error, \(E \mathrm {~mm}\), has the normal distribution \(\mathrm { N } \left( 0,4 ^ { 2 } \right)\),
  6. find the probability that the estimate is within 3 mm of the true value.
Edexcel S1 Q4
4. An internet service provider runs a series of television adverts at weekly intervals. To investigate the effectiveness of the adverts the company recorded the viewing figures in millions, \(v\), for the programme in which the advert was shown, and the number of new customers, \(c\), who signed up for their service the next day. The results are summarised as follows. $$\bar { v } = 4.92 , \quad \bar { c } = 104.4 , \quad S _ { v c } = 594.05 , \quad S _ { v v } = 85.44 .$$
  1. Calculate the equation of the regression line of \(c\) on \(v\) in the form \(c = a + b v\).
  2. Give an interpretation of the constants \(a\) and \(b\) in this context.
  3. Estimate the number of customers that will sign up with the company the day after an advert is shown during a programme watched by 3.7 million viewers.
  4. State two other factors besides viewing figures that will affect the success of an advert in gaining new customers for the company.
Edexcel FS2 AS Specimen Q3
  1. A scientist wants to develop a model to describe the relationship between the average daily temperature, \(x ^ { \circ } \mathrm { C }\), and a household's daily energy consumption, \(y \mathrm {~kWh}\), in winter.
A random sample of the average temperature and energy consumption are taken from 10 winter days and are summarised below. $$\begin{gathered} \sum x = 12 \quad \sum x ^ { 2 } = 24.76 \quad \sum y = 251 \quad \sum y ^ { 2 } = 6341 \quad \sum x y = 284.8 \\
S _ { x x } = 10.36 \quad S _ { y y } = 40.9 \end{gathered}$$
  1. Find the product moment correlation coefficient between \(y\) and \(x\).
  2. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\)
  3. Use your equation to estimate the daily energy consumption when the average daily temperature is \(2 ^ { \circ } \mathrm { C }\)
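  4. Calculate the residual sum of squares (RSS).
    The table shows the residual for each value of \(x\).
$$\begin{array} { l | c c c c c c c c c c } x & - 0.4 & - 0.2 & 0.3 & 0.8 & 1.1 & 1.4 & 1.8 & 2.1 & 2.5 & 2.6 \\ \hline \text { Residual } & - 0.63 & - 0.32 & - 0.52 & - 0.73 & 0.74 & 2.22 & 1.84 & 0.32 & f & - 1.88 \end{array}$$
  5. Find the value of \(f\).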
  6. By considering the signs of the residuals, explain whether or not the linear regression model is a suitable model for these data.
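
Two standard facts sit behind the residual parts of a question like this: the residual sum of squares can be written as \(\mathrm{RSS} = S_{yy} - S_{xy}^{2} / S_{xx}\), and the residuals of a least squares line with an intercept sum to zero. A short sketch of both (function names and numbers are illustrative assumptions, not taken from the question):

```python
def residual_sum_of_squares(s_xx, s_yy, s_xy):
    """RSS for the least squares line of y on x, from summary statistics."""
    return s_yy - s_xy ** 2 / s_xx


def missing_residual(known_residuals):
    """Residuals of a least squares line with an intercept sum to zero,
    so a single unknown residual is minus the sum of the others."""
    return -sum(known_residuals)


# Illustrative values only
print(residual_sum_of_squares(s_xx=2.0, s_yy=10.0, s_xy=4.0))
print(missing_residual([0.5, -0.25, 1.0]))
```

With residuals quoted to two decimal places, the sum is only approximately zero, so a value recovered this way carries the same rounding.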
SPS SPS FM Statistics 2022 February Q1
  1. At a seaside resort the number \(X\) of ice-creams sold and the temperature \(Y ^ { \circ } \mathrm { F }\) were recorded on 20 randomly chosen summer days. The data can be summarised as follows.
$$\sum x = 1506 \quad \sum x ^ { 2 } = 127542 \quad \sum y = 1431 \quad \sum y ^ { 2 } = 104451 \quad \sum x y = 111297$$
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
  2. Explain the significance for the regression line of the quantity \(\sum \left[ y _ { i } - \left( a x _ { i } + b \right) \right] ^ { 2 }\).
  3. It is decided to measure the temperature in degrees Centigrade instead of degrees Fahrenheit. If the same temperature is measured both as \(f ^ { \circ }\) Fahrenheit and \(c ^ { \circ }\) Centigrade, the relationship between \(f\) and \(c\) is \(c = \frac { 5 } { 9 } ( f - 32 )\). Find the equation of the new regression line.
SPS SPS FM Statistics 2024 January Q2
2. At a seaside resort the number \(X\) of ice-creams sold and the temperature \(Y ^ { \circ } \mathrm { F }\) were recorded on 20 randomly chosen summer days. The data can be summarised as follows. $$\sum x = 1506 \quad \sum x ^ { 2 } = 127542 \quad \sum y = 1431 \quad \sum y ^ { 2 } = 104451 \quad \sum x y = 111297$$
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
  2. Explain the significance for the regression line of the quantity \(\sum \left[ y _ { i } - \left( a x _ { i } + b \right) \right] ^ { 2 }\).
  3. It is decided to measure the temperature in degrees Centigrade instead of degrees Fahrenheit. If the same temperature is measured both as \(f ^ { \circ }\) Fahrenheit and \(c ^ { \circ }\) Centigrade, the relationship between \(f\) and \(c\) is \(c = \frac { 5 } { 9 } ( f - 32 )\). Find the equation of the new regression line.
Edexcel S1 2021 October Q2
2. A large company is analysing how much money it spends on paper in its offices each year. The number of employees in the office, \(x\), and the amount spent on paper in a year, \(p\) (\$ hundreds), in each of 12 randomly selected offices were recorded. The results are summarised in the following statistics. $$\sum x = 93 \quad \mathrm {~S} _ { x x } = 148.25 \quad \sum p = 273 \quad \sum p ^ { 2 } = 6602.72 \quad \sum x p = 2347$$
  1. Show that \(\mathrm { S } _ { x p } = 231.25\)
  2. Find the product moment correlation coefficient for these data.
  3. Find the equation of the regression line of \(p\) on \(x\) in the form \(p = a + b x\)
  4. Give an interpretation of the gradient of your regression line.
    The director of the company wants to reduce the amount spent on paper each year. He wants each office to aim for a model of the form \(p = \frac { 4 } { 5 } a + \frac { 1 } { 2 } b x\), where \(a\) and \(b\) are the values found in part (c).
    Using the data for the 93 employees from the 12 offices,
  5. estimate the percentage saving in the amount spent on paper each year by the company using the director's model.