Calculate y on x from summary statistics

Questions that provide summary statistics (sums, means, variances, Sxx, Sxy, etc.) and ask to find the regression line of y on x.
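As a minimal sketch of the calculation these questions share, the following assumes the standard definitions \(S_{xx} = \sum x^2 - (\sum x)^2 / n\) and \(S_{xy} = \sum xy - \sum x \sum y / n\), with gradient \(b = S_{xy} / S_{xx}\) and intercept \(a = \bar{y} - b \bar{x}\). The function name `regression_from_sums` and the check data are illustrative only.

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the least squares line y = a + b x."""
    s_xx = sum_x2 - sum_x ** 2 / n        # corrected sum of squares of x
    s_xy = sum_xy - sum_x * sum_y / n     # corrected sum of products
    b = s_xy / s_xx                       # gradient
    a = sum_y / n - b * sum_x / n         # line passes through (x-bar, y-bar)
    return a, b


# Made-up check data lying exactly on y = 2x: (1, 2), (2, 4), (3, 6)
a, b = regression_from_sums(n=3, sum_x=6, sum_y=12, sum_x2=14, sum_xy=28)
print(a, b)
```

When a question supplies \(S_{xx}\) and \(S_{xy}\) directly, the first two lines of the function can be skipped.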

12 questions · Moderate -0.3

5.09c Calculate regression line
Edexcel S1 2017 January Q3
17 marks Moderate -0.3
  1. A scientist measured the salinity of water, \(x \mathrm {~g} / \mathrm { kg }\), and recorded the temperature at which the water froze, \(y ^ { \circ } \mathrm { C }\), for 12 different water samples. The summary statistics are listed below.
$$\begin{gathered} \sum x = 504 \quad \sum y = - 27 \quad \sum x ^ { 2 } = 22842 \quad \sum y ^ { 2 } = 62.98 \\ \sum x y = - 1190.7 \quad \mathrm {~S} _ { x x } = 1674 \quad \mathrm {~S} _ { y y } = 2.23 \end{gathered}$$
  1. Find the mean and variance of the recorded temperatures. (3)
    Priya believes that the higher the salinity of water, the higher the temperature at which the water freezes.
    1. Calculate the product moment correlation coefficient between \(x\) and \(y\)
    2. State, with a reason, whether or not this value supports Priya's belief.
  2. Find the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\). Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  3. Estimate the temperature at which water freezes when the salinity is \(32 \mathrm {~g} / \mathrm { kg }\).
    The coding \(w = 1.8 y + 32\) is used to convert the recorded temperatures from \({ } ^ { \circ } \mathrm { C }\) to \({ } ^ { \circ } \mathrm { F }\).
  4. Find an equation of the least squares regression line of \(w\) on \(x\) in the form \(w = c + d x\)
  5. Find
    1. the variance of the recorded temperatures when converted to \({ } ^ { \circ } \mathrm { F }\)
    2. the product moment correlation coefficient between \(w\) and \(x\)
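The effect of the coding \(w = 1.8 y + 32\) on the regression line can be sketched numerically. This is an illustrative check using the summary statistics quoted in the question, not an official mark scheme; the variable names are assumptions.

```python
n = 12
sum_x, sum_y, sum_xy = 504, -27, -1190.7
s_xx, s_yy = 1674, 2.23

s_xy = sum_xy - sum_x * sum_y / n          # -1190.7 + 1134 = -56.7
b = s_xy / s_xx                            # gradient of y on x
a = sum_y / n - b * sum_x / n              # intercept of y on x
r = s_xy / (s_xx * s_yy) ** 0.5            # product moment correlation

# Coding w = 1.8 y + 32 rescales and shifts y only, so
# w = 1.8 (a + b x) + 32 = (1.8 a + 32) + (1.8 b) x.
c = 1.8 * a + 32
d = 1.8 * b
var_w = 1.8 ** 2 * (s_yy / n)              # variance scales by 1.8 squared

print(f"a = {a:.3g}, b = {b:.3g}, r = {r:.3f}")
print(f"c = {c:.3g}, d = {d:.3g}, Var(w) = {var_w:.3f}")
```

Because the coding is linear with a positive multiplier, the correlation coefficient between \(w\) and \(x\) is the same as that between \(y\) and \(x\).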
CAIE FP2 2019 June Q10
12 marks Moderate -0.3
10 The means and variances for a random sample of 8 pairs of values of \(x\) and \(y\) taken from a bivariate distribution are given in the following table.
$$\begin{array} { l | l l }
 & \text{Mean} & \text{Variance} \\ \hline
x & 3.3125 & 3.3086 \\
y & 6.7375 & 7.9473
\end{array}$$
The product moment correlation coefficient for the sample is 0.5815, correct to 4 decimal places.
  1. Find the equation of the regression line of \(y\) on \(x\).
  2. Test at the \(5 \%\) significance level whether there is evidence of positive correlation between \(x\) and \(y\). [4]
  3. Calculate an estimate of \(y\) when \(x = 6.0\) and comment on the reliability of your estimate.
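When only means, variances and \(r\) are given, the gradient can be recovered from \(b = r \sqrt{S_{yy} / S_{xx}}\). The sketch below assumes that identity and uses the tabulated values; it is a numerical illustration rather than a model answer.

```python
import math

mean_x, var_x = 3.3125, 3.3086
mean_y, var_y = 6.7375, 7.9473
r = 0.5815

# b = S_xy / S_xx = r * sqrt(S_yy / S_xx); the ratio of variances equals
# S_yy / S_xx whichever divisor (n or n - 1) produced the table.
b = r * math.sqrt(var_y / var_x)
a = mean_y - b * mean_x

y_at_6 = a + b * 6.0
print(f"y = {a:.3f} + {b:.3f} x; estimate at x = 6.0 is {y_at_6:.2f}")
```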
CAIE FP2 2017 November Q11 OR
Moderate -0.3
A large number of people attended a course to improve the speed of their logical thinking. The times taken to complete a particular type of logic puzzle at the beginning of the course and at the end of the course are recorded for each person. The time taken, in minutes, at the beginning of the course is denoted by \(x\) and the time taken, in minutes, at the end of the course is denoted by \(y\). For a random sample of 9 people, the results are summarised as follows. $$\Sigma x = 45.3 \quad \Sigma x ^ { 2 } = 245.59 \quad \Sigma y = 40.5 \quad \Sigma y ^ { 2 } = 195.11 \quad \Sigma x y = 218.72$$ Ken attended the course, but his time to complete the puzzle at the beginning of the course was not recorded. His time to complete the puzzle at the end of the course was 4.2 minutes.
  1. By finding, showing all necessary working, the equation of a suitable regression line, find an estimate for the time that Ken would have taken to complete the puzzle at the beginning of the course.
    The values of \(x - y\) for the sample of 9 people are as follows. $$\begin{array} { l l l l l l l l l } 0.2 & 0.8 & 0.5 & 1.0 & 0.2 & 0.6 & 0.2 & 0.5 & 0.8 \end{array}$$ The organiser of the course believes that, on average, the time taken to complete the puzzle decreases between the beginning and the end of the course by more than 0.3 minutes.
  2. Stating suitable hypotheses and assuming a normal distribution, test the organiser's belief at the \(2 \frac { 1 } { 2 } \%\) significance level.
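One reading of "a suitable regression line" here is the line of \(x\) on \(y\), since \(x\) is being estimated from a known \(y\). The sketch below follows that assumption and also computes the paired \(t\) statistic for the second part; the variable names are illustrative.

```python
import math

n = 9
sum_x, sum_x2 = 45.3, 245.59
sum_y, sum_y2 = 40.5, 195.11
sum_xy = 218.72

# Regression of x on y: gradient is S_xy / S_yy (not S_xy / S_xx).
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
b_xy = s_xy / s_yy
a_xy = sum_x / n - b_xy * sum_y / n
ken_start = a_xy + b_xy * 4.2

# Paired t statistic for H0: mean difference = 0.3 against H1: > 0.3.
d = [0.2, 0.8, 0.5, 1.0, 0.2, 0.6, 0.2, 0.5, 0.8]
d_bar = sum(d) / n
s2 = (sum(v * v for v in d) - n * d_bar ** 2) / (n - 1)   # unbiased variance
t_stat = (d_bar - 0.3) / math.sqrt(s2 / n)

print(f"x = {a_xy:.3f} + {b_xy:.3f} y, Ken's estimate = {ken_start:.2f}")
print(f"t = {t_stat:.3f}")
```

The statistic would then be compared with the tabulated one-tailed \(2 \frac { 1 } { 2 } \%\) critical value of \(t\) with 8 degrees of freedom, 2.306.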
OCR Further Statistics 2023 June Q2
8 marks Standard +0.3
2 The director of a concert hall wishes to investigate if the price of the most expensive concert tickets affects attendance. The director collects data about the price, \(\pounds P\), of the most expensive tickets and the number of people in the audience, \(H\) hundred (rounded to the nearest hundred), for 20 concerts. For each price there are several different concerts. The results are shown in the table.
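$$\begin{array} { l | l l l l l }
P\ (\pounds) & 75 & 65 & 55 & 45 & 35 \\ \hline
H \text{ (hundred)} & 27 & 27 & 27 & 26 & 15 \\
 & 27 & 27 & 20 & 21 & 12 \\
 & & 22 & 18 & 16 & 9 \\
 & & 19 & 18 & 13 & \\
 & & 12 & 16 & 9 &
\end{array}$$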
\(\mathrm { n } = 20 \quad \sum \mathrm { p } = 1080 \quad \sum \mathrm {~h} = 381 \quad \sum \mathrm { p } ^ { 2 } = 61300 \quad \sum \mathrm {~h} ^ { 2 } = 8011 \quad \sum \mathrm { ph } = 21535\)
  1. Calculate the equation of the regression line of \(h\) on \(p\).
  2. State what change, if any, there would be to your answer to part (a) if \(H\) had been measured in thousands (to 1 decimal place) rather than in hundreds.
    For a special charity concert, the most expensive tickets cost \(\pounds 50\).
  3. Use your answer to part (b) to estimate the expected size of the audience for this concert. Give your answer correct to \(\mathbf { 1 }\) decimal place.
  4. Comment on the reliability of your answer to part (c). You should refer to
Edexcel S1 2022 June Q2
14 marks Moderate -0.8
  1. Stuart is investigating the relationship between Gross Domestic Product (GDP) and the size of the population for a particular country.
    He takes a random sample of 9 years and records the size of the population, \(t\) millions, and the GDP, \(g\) billion dollars for each of these years.
The data are summarised as $$n = 9 \quad \sum t = 7.87 \quad \sum g = 144.84 \quad \sum g ^ { 2 } = 3624.41 \quad S _ { t t } = 1.29 \quad S _ { t g } = 40.25$$
  1. Calculate the product moment correlation coefficient between \(t\) and \(g\)
  2. Give an interpretation of your product moment correlation coefficient.
  3. Find the equation of the least squares regression line of \(g\) on \(t\) in the form \(g = a + b t\)
  4. Give an interpretation of the value of \(b\) in your regression line.
    1. Use the regression line from part (c) to estimate the GDP, in billions of dollars, for a population of 7000000
    2. Comment on the reliability of your answer in part (i). Give a reason, in context, for your answer.
    Using the regression line from part (c), Stuart estimates that for a population increase of \(x\) million there will be an increase of 0.1 billion dollars in GDP.
  5. Find the value of \(x\)
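The last part relies on reading the gradient \(b\) as the change in \(g\) per unit change in \(t\). A short sketch under that interpretation, using the quoted summary statistics (variable names are assumptions):

```python
n = 9
sum_t, sum_g, sum_g2 = 7.87, 144.84, 3624.41
s_tt, s_tg = 1.29, 40.25

s_gg = sum_g2 - sum_g ** 2 / n
r = s_tg / (s_tt * s_gg) ** 0.5
b = s_tg / s_tt                      # billion dollars per million people
a = sum_g / n - b * sum_t / n

# A population of 7 000 000 is t = 7, far above the sample mean of t.
g_at_7 = a + b * 7
# A rise of 0.1 in g corresponds to a rise of 0.1 / b in t.
x = 0.1 / b

print(f"r = {r:.3f}, g = {a:.2f} + {b:.2f} t")
print(f"g(7) = {g_at_7:.1f}, x = {x:.5f}")
```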
Edexcel S1 2016 October Q4
15 marks Moderate -0.3
  1. A doctor is studying the scans of 30-week-old foetuses. She takes a random sample of 8 scans and measures the length, \(f \mathrm {~mm}\), of the leg bone called the femur. She obtains the following results.
$$\begin{array} { l l l l l l l l } 52 & 53 & 56 & 57 & 57 & 59 & 60 & 62 \end{array}$$
  1. Show that \(\mathrm { S } _ { f f } = 80\).
    The doctor also measures the head circumference, \(h \mathrm {~mm}\), of each foetus and her results are summarised as $$\sum h = 2209 \quad \sum h ^ { 2 } = 610463 \quad \mathrm {~S} _ { f h } = 182$$
  2. Find \(\mathrm { S } _ { h h }\)
  3. Calculate the product moment correlation coefficient between the length of the femur and the head circumference for these data.
    The doctor believes that there is a linear relationship between the length of the femur and the head circumference of 30-week-old foetuses.
  4. State, giving a reason, whether or not your calculation in part (c) supports the doctor's belief.
  5. Find an equation of the regression line of \(h\) on \(f\).
    The doctor plans in future to measure the femur length, \(f\), and then use the regression line to estimate the corresponding head circumference, \(h\). A statistician points out that there will always be the chance of an error between the true head circumference and the estimated value of the head circumference.
    Given that the error, \(E \mathrm {~mm}\), has the normal distribution \(\mathrm { N } \left( 0,4 ^ { 2 } \right)\),
  6. find the probability that the estimate is within 3 mm of the true value.
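The chain of calculations in this question (raw data to \(\mathrm{S}_{ff}\), then \(\mathrm{S}_{hh}\), \(r\), the line of \(h\) on \(f\), and a normal probability) can be checked with a short sketch. It uses only the Python standard library; the normal probability is computed from `math.erf` rather than from tables.

```python
import math

f = [52, 53, 56, 57, 57, 59, 60, 62]
n = len(f)
sum_h, sum_h2, s_fh = 2209, 610463, 182

mean_f = sum(f) / n
s_ff = sum((v - mean_f) ** 2 for v in f)       # expected to equal 80
s_hh = sum_h2 - sum_h ** 2 / n
r = s_fh / math.sqrt(s_ff * s_hh)

b = s_fh / s_ff                                 # regression of h on f
a = sum_h / n - b * mean_f

# P(-3 < E < 3) for E ~ N(0, 4^2), using Phi(z) = (1 + erf(z / sqrt 2)) / 2,
# so P(|Z| < z) = erf(z / sqrt 2).
p_within_3 = math.erf((3 / 4) / math.sqrt(2))

print(s_ff, s_hh, round(r, 3))
print(f"h = {a:.2f} + {b:.3f} f, P(|E| < 3) = {p_within_3:.4f}")
```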
Edexcel S1 Q4
10 marks Moderate -0.8
4. An internet service provider runs a series of television adverts at weekly intervals. To investigate the effectiveness of the adverts the company recorded the viewing figures in millions, \(v\), for the programme in which the advert was shown, and the number of new customers, \(c\), who signed up for their service the next day. The results are summarised as follows. $$\bar { v } = 4.92 , \quad \bar { c } = 104.4 , \quad S _ { v c } = 594.05 , \quad S _ { v v } = 85.44 .$$
  1. Calculate the equation of the regression line of \(c\) on \(v\) in the form \(c = a + b v\).
  2. Give an interpretation of the constants \(a\) and \(b\) in this context.
  3. Estimate the number of customers that will sign up with the company the day after an advert is shown during a programme watched by 3.7 million viewers.
  4. State two other factors besides viewing figures that will affect the success of an advert in gaining new customers for the company.
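When the means and the corrected sums are given directly, the line follows in two steps. The sketch below is an illustrative calculation from the quoted statistics, with comments giving one possible reading of the constants in context.

```python
mean_v, mean_c = 4.92, 104.4
s_vc, s_vv = 594.05, 85.44

b = s_vc / s_vv                 # extra customers per extra million viewers
a = mean_c - b * mean_v         # predicted customers at zero viewers
c_at_3_7 = a + b * 3.7

print(f"c = {a:.1f} + {b:.2f} v, estimate at v = 3.7: {c_at_3_7:.0f}")
```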
Edexcel FS2 AS Specimen Q3
11 marks Standard +0.3
  1. A scientist wants to develop a model to describe the relationship between the average daily temperature, \(x ^ { \circ } \mathrm { C }\), and a household's daily energy consumption, \(y \mathrm {~kWh}\), in winter.
A random sample of the average temperature and energy consumption are taken from 10 winter days and are summarised below. $$\begin{gathered} \sum x = 12 \quad \sum x ^ { 2 } = 24.76 \quad \sum y = 251 \quad \sum y ^ { 2 } = 6341 \quad \sum x y = 284.8 \\ S _ { x x } = 10.36 \quad S _ { y y } = 40.9 \end{gathered}$$
  1. Find the product moment correlation coefficient between \(y\) and \(x\).
  2. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\)
  3. Use your equation to estimate the daily energy consumption when the average daily temperature is \(2 ^ { \circ } \mathrm { C }\)
  4. Calculate the residual sum of squares (RSS).
    The table shows the residual for each value of \(x\).
    $$\begin{array} { l | l l l l l l l l l l }
    x & -0.4 & -0.2 & 0.3 & 0.8 & 1.1 & 1.4 & 1.8 & 2.1 & 2.5 & 2.6 \\ \hline
    \text{Residual} & -0.63 & -0.32 & -0.52 & -0.73 & 0.74 & 2.22 & 1.84 & 0.32 & f & -1.88
    \end{array}$$
  5. Find the value of \(f\).
  6. By considering the signs of the residuals, explain whether or not the linear regression model is a suitable model for these data.
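The residual parts can be sketched using two standard facts: \(\mathrm{RSS} = S_{yy} - (S_{xy})^2 / S_{xx}\), and the residuals from a least squares line sum to zero. The tabulated residuals are rounded to 2 decimal places, so the value of \(f\) found this way is approximate to the same precision.

```python
n = 10
sum_x, sum_y, sum_xy = 12, 251, 284.8
s_xx, s_yy = 10.36, 40.9

s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / (s_xx * s_yy) ** 0.5
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n
rss = s_yy - s_xy ** 2 / s_xx          # RSS = S_yy - (S_xy)^2 / S_xx

# Least squares residuals sum to zero, which fixes the unknown entry f.
known = [-0.63, -0.32, -0.52, -0.73, 0.74, 2.22, 1.84, 0.32, -1.88]
f = -sum(known)

print(f"r = {r:.3f}, y = {a:.1f} + {b:.2f} x, RSS = {rss:.2f}, f = {f:.2f}")
```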
Edexcel S1 2021 October Q2
12 marks Moderate -0.5
2. A large company is analysing how much money it spends on paper in its offices each year. The number of employees in the office, \(x\), and the amount spent on paper in a year, \(p\) (\$ hundreds), in each of 12 randomly selected offices were recorded. The results are summarised in the following statistics. $$\sum x = 93 \quad \mathrm {~S} _ { x x } = 148.25 \quad \sum p = 273 \quad \sum p ^ { 2 } = 6602.72 \quad \sum x p = 2347$$
  1. Show that \(\mathrm { S } _ { x p } = 231.25\)
  2. Find the product moment correlation coefficient for these data.
  3. Find the equation of the regression line of \(p\) on \(x\) in the form \(p = a + b x\)
  4. Give an interpretation of the gradient of your regression line.
    The director of the company wants to reduce the amount spent on paper each year. He wants each office to aim for a model of the form \(p = \frac { 4 } { 5 } a + \frac { 1 } { 2 } b x\), where \(a\) and \(b\) are the values found in part (c).
    Using the data for the 93 employees from the 12 offices,
  5. estimate the percentage saving in the amount spent on paper each year by the company using the director's model.
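One way to approach the final part is to total the director's model over all 12 offices and compare it with the recorded total \(\sum p = 273\). The sketch below takes that reading of the question as an assumption; other interpretations may be possible.

```python
n = 12
sum_x, sum_p, sum_xp = 93, 273, 2347
s_xx = 148.25

s_xp = sum_xp - sum_x * sum_p / n      # should reproduce 231.25
b = s_xp / s_xx
a = sum_p / n - b * sum_x / n

# Total spend under p = (4/5) a + (1/2) b x, summed over the 12 offices:
# sum of (0.8 a + 0.5 b x) = 12 * 0.8 a + 0.5 b * (sum of x).
new_total = n * 0.8 * a + 0.5 * b * sum_x
saving_pct = 100 * (sum_p - new_total) / sum_p

print(f"p = {a:.2f} + {b:.2f} x, saving = {saving_pct:.1f}%")
```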
Pre-U Pre-U 9794/3 2018 June Q2
9 marks Moderate -0.3
2 A teacher is monitoring the progress of students. The length of time, \(x\) hours, spent revising in a given week is compared to the score, \(y\), achieved in an assessment at the end of the week. The scatter diagram for a random sample of 8 students is shown below. \includegraphics[max width=\textwidth, alt={}, center]{35d24778-1203-4d5d-be4b-bb375344fe09-2_866_967_715_589} The data are summarised as \(\Sigma x = 24.6 , \Sigma y = 404 , \Sigma x ^ { 2 } = 105.56 , \Sigma y ^ { 2 } = 20820\) and \(\Sigma x y = 1350.2\).
  1. Find the equation of the least squares regression line of \(y\) on \(x\).
  2. Calculate the product moment correlation coefficient for the data.
  3. A ninth student, Jane, revises for 1.5 hours.
    1. Estimate her score in the assessment.
    2. Comment on the reliability of this estimate.
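Here all three corrected sums have to be built from the raw totals before the line and \(r\) can be found. A sketch of that working, with illustrative variable names:

```python
import math

n = 8
sum_x, sum_y = 24.6, 404
sum_x2, sum_y2, sum_xy = 105.56, 20820, 1350.2

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

b = s_xy / s_xx
a = sum_y / n - b * sum_x / n
r = s_xy / math.sqrt(s_xx * s_yy)
jane = a + b * 1.5                     # estimate for 1.5 hours of revision

print(f"y = {a:.1f} + {b:.2f} x, r = {r:.3f}, Jane's estimate = {jane:.1f}")
```

Whether 1.5 hours lies inside the range of the sampled revision times has to be read from the scatter diagram, which is not reproduced in the summary statistics.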
Pre-U Pre-U 9794/1 Specimen Q13
9 marks Moderate -0.3
13 A seed company investigated how well African Marigold seeds germinated when the seeds were past their sell-by date. The table shows the average number of seeds which germinated per packet, \(y\), and the number of months past their sell-by date, \(t\).
$$\begin{array} { l | l l l l l }
t & 10 & 20 & 30 & 40 & 50 \\ \hline
y & 24.5 & 24.0 & 21.7 & 18.6 & 12.4
\end{array}$$
The summary data for the investigation were as follows. $$\Sigma t = 150 \quad \Sigma t ^ { 2 } = 5500 \quad \Sigma y = 101.2 \quad \Sigma y ^ { 2 } = 2146.86 \quad \Sigma t y = 2740$$
  1. Calculate the equation of the regression line of \(y\) on \(t\).
  2. Use your regression line to calculate \(y\) when \(t = 10\). Compare your answer with the value of \(y\) when \(t = 10\) in the table and comment on the result.
  3. Use your regression line to calculate \(y\) when \(t = 100\). Comment on the validity of this result.
  4. Suggest with reasons whether the regression line provides a good model for predicting the germination of seeds past their sell-by date.
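Because this question gives the raw table as well as the summary data, the sums can be recomputed as a consistency check before fitting the line. The sketch below does that and then evaluates the line at \(t = 10\) and \(t = 100\).

```python
t = [10, 20, 30, 40, 50]
y = [24.5, 24.0, 21.7, 18.6, 12.4]
n = len(t)

# Recompute the summary sums from the table as a consistency check.
sum_t, sum_y = sum(t), sum(y)
sum_t2 = sum(v * v for v in t)
sum_ty = sum(u * v for u, v in zip(t, y))

s_tt = sum_t2 - sum_t ** 2 / n
s_ty = sum_ty - sum_t * sum_y / n
b = s_ty / s_tt
a = sum_y / n - b * sum_t / n

for months in (10, 100):
    print(months, round(a + b * months, 2))
```

On this sketch the fitted value at \(t = 100\) is negative, which cannot be a number of germinated seeds.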
WJEC Further Unit 2 2018 June Q7
7 marks Moderate -0.3
A university professor conducted some research into factors that affect job satisfaction. The four factors considered were Interesting work, Good wages, Job security and Appreciation of work done. The professor interviewed workers at 14 different companies and asked them to rate their companies on each of the factors. The workers' ratings were averaged to give each company a score out of 5 on each factor. Each company was also given a score out of 100 for Job satisfaction. The following graph shows the part of the research concerning Job Satisfaction versus Interesting work. \includegraphics{figure_2}
  1. Calculate the equation of the least squares regression line of Job satisfaction (\(y\)) on Interesting work (\(x\)), given the following summary statistics. [5] $$\sum x = 46.2 \quad \sum y = 898 \quad S_{xx} = 3.48 \quad S_{xy} = 49.45 \quad S_{yy} = 1437.714 \quad n = 14$$
  2. Give two reasons why it would be inappropriate for the professor to use this equation to calculate the score for Interesting work from a Job satisfaction score of 90. [2]
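The difference between the line of \(y\) on \(x\) and the line of \(x\) on \(y\) can be illustrated numerically. The sketch below computes both from the quoted statistics and compares the \(x\) value each gives at \(y = 90\); it is an illustration only and does not claim to reproduce the intended answer to the second part.

```python
n = 14
sum_x, sum_y = 46.2, 898
s_xx, s_xy, s_yy = 3.48, 49.45, 1437.714
mean_x, mean_y = sum_x / n, sum_y / n

# Line of y on x: minimises vertical (y) residuals.
b = s_xy / s_xx
a = mean_y - b * mean_x

# Rearranging y on x to get x from y = 90 ...
x_from_inverting = (90 - a) / b
# ... differs from the line of x on y, whose gradient is S_xy / S_yy.
b_xy = s_xy / s_yy
x_from_x_on_y = mean_x + b_xy * (90 - mean_y)

print(f"y = {a:.2f} + {b:.2f} x")
print(f"x at y = 90: {x_from_inverting:.2f} (inverted), {x_from_x_on_y:.2f} (x on y)")
```

With these figures the inverted line gives a value above 5, the maximum possible factor score, while the line of \(x\) on \(y\) does not.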