Calculate from summary statistics

A question is this sub-type if and only if it provides summary statistics (such as Σx, Σy, Σx², Σy², Σxy, n) and asks to calculate Sxx, Syy, or Sxy using the standard formulas.
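
For reference, the standard formulas referred to above are usually written as follows (the notation \(r\) for the product moment correlation coefficient and \(a\), \(b\) for the regression line \(y = a + bx\) is assumed here, matching the questions below):
$$S _ { x x } = \sum x ^ { 2 } - \frac { \left( \sum x \right) ^ { 2 } } { n } , \quad S _ { y y } = \sum y ^ { 2 } - \frac { \left( \sum y \right) ^ { 2 } } { n } , \quad S _ { x y } = \sum x y - \frac { \sum x \sum y } { n }$$
$$r = \frac { S _ { x y } } { \sqrt { S _ { x x } S _ { y y } } } , \quad b = \frac { S _ { x y } } { S _ { x x } } , \quad a = \bar { y } - b \bar { x }$$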

7 questions · Moderate -0.3

5.08a Pearson correlation: calculate pmcc · 5.09c Calculate regression line · 5.09e Use regression: for estimation in context
Edexcel S1 2015 June Q2
13 marks Moderate -0.3
Paul believes there is a relationship between the value and the floor size of a house. He takes a random sample of 20 houses and records the value, \(\pounds v\), and the floor size, \(s \mathrm {~m} ^ { 2 }\). The data were coded using \(x = \frac { s - 50 } { 10 }\) and \(y = \frac { v } { 100000 }\) and the following statistics were obtained. $$\sum x = 441.5 , \quad \sum y = 59.8 , \quad \sum x ^ { 2 } = 11261.25 , \quad \sum y ^ { 2 } = 196.66 , \quad \sum x y = 1474.1$$
  1. Find the value of \(S _ { x y }\) and the value of \(S _ { x x }\)
  2. Find the equation of the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\).
The least squares regression line of \(v\) on \(s\) is \(v = c + d s\).
  3. Show that \(d = 1020\) to 3 significant figures and find the value of \(c\)
  4. Estimate the value of a house of floor size \(130 \mathrm {~m} ^ { 2 }\)
  5. Interpret the value \(d\).
Paul wants to increase the value of his house. He decides to add an extension to increase the floor size by \(31 \mathrm {~m} ^ { 2 }\).
  6. Estimate the increase in the value of Paul's house after adding the extension.
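A minimal sketch of the arithmetic behind parts 1 to 3, assuming the standard summary-statistic formulas and the coding given in the question. The variable names are illustrative only, and the approach is one possible route rather than the mark scheme's method.

```python
# Summary statistics quoted in the question (coded data, n = 20 houses).
n = 20
sum_x, sum_y = 441.5, 59.8
sum_x2, sum_xy = 11261.25, 1474.1

# Standard formulas: S_xy = sum(xy) - sum(x)sum(y)/n, S_xx = sum(x^2) - sum(x)^2/n.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n

# Regression of y on x: gradient b and intercept a.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Undo the coding x = (s - 50)/10, y = v/100000:
# v = 100000*(a + b*(s - 50)/10) = (100000*a - 50*d) + d*s, where d = 10000*b.
d = 10000 * b
c = 100000 * a - 50 * d

print(f"S_xy = {s_xy:.3f}, S_xx = {s_xx:.4f}")
print(f"b = {b:.4f}, a = {a:.4f}, d = {d:.1f}, c = {c:.0f}")
```

Rounding should be left to the final step, since \(d\) is only required to 3 significant figures.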
Edexcel S1 2020 June Q5
15 marks Moderate -0.3
A large company rents shops in different parts of the country. A random sample of 10 shops was taken and the floor area, \(x\) in \(10 \mathrm {~m} ^ { 2 }\), and the annual rent, \(y\) in thousands of dollars, were recorded. The data are summarised by the following statistics
$$\sum x = 900 \quad \sum x ^ { 2 } = 84818 \quad \sum y = 183 \quad \sum y ^ { 2 } = 3434$$ and the regression line of \(y\) on \(x\) has equation \(y = 6.066 + 0.136 x\)
  1. Use the regression line to estimate the annual rent in dollars for a shop with a floor area of \(800 \mathrm {~m} ^ { 2 }\)
  2. Find \(\mathrm { S } _ { y y }\) and \(\mathrm { S } _ { x x }\)
  3. Find the product moment correlation coefficient between \(y\) and \(x\).
An 11th shop is added to the sample. The floor area is \(900 \mathrm {~m} ^ { 2 }\) and the annual rent is 15000 dollars.
  4. Use the formula \(\mathrm { S } _ { x y } = \sum ( x - \bar { x } ) ( y - \bar { y } )\) to show that the value of \(\mathrm { S } _ { x y }\) for the 11 shops will be the same as it was for the original 10 shops.
  5. Find the new equation of the regression line of \(y\) on \(x\) for the 11 shops.
The company is considering renting a larger shop with an area of \(3000 \mathrm {~m} ^ { 2 }\).
  6. Comment on the suitability of using the new regression line to estimate the annual rent. Give a reason for your answer.
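A short sketch of parts 2 and 3, assuming the standard formulas. Since \(\sum xy\) is not given, \(S_{xy}\) is recovered here from the quoted gradient via \(b = S_{xy} / S_{xx}\). That gradient is rounded to 3 decimal places, so the resulting correlation coefficient is approximate. Variable names are illustrative.

```python
from math import sqrt

# Summary statistics quoted in the question (n = 10 shops).
n = 10
sum_x, sum_x2 = 900, 84818
sum_y, sum_y2 = 183, 3434
gradient = 0.136  # from the given line y = 6.066 + 0.136x (rounded value)

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

# b = S_xy / S_xx, so S_xy = b * S_xx (approximate because b is rounded).
s_xy = gradient * s_xx

# Product moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

print(f"S_xx = {s_xx:.1f}, S_yy = {s_yy:.1f}, S_xy ~ {s_xy:.1f}, r ~ {r:.3f}")
```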
Edexcel S1 2021 June Q6
16 marks Standard +0.3
Two economics students, Andi and Behrouz, are studying some data relating to unemployment, \(x \%\), and increase in wages, \(y \%\), for a European country. The least squares regression line of \(y\) on \(x\) has equation
$$y = 3.684 - 0.3242 x$$ and $$\sum y = 23.7 \quad \sum y ^ { 2 } = 42.63 \quad \sum x ^ { 2 } = 756.81 \quad n = 16$$
  1. Show that \(\mathrm { S } _ { y y } = 7.524375\)
  2. Find \(\mathrm { S } _ { x x }\)
  3. Find the product moment correlation coefficient between \(x\) and \(y\).
Behrouz claims that, assuming the model is valid, the data show that when unemployment is 2\% wages increase at over 3\%.
  4. Explain how Behrouz could have come to this conclusion.
Andi uses the formula $$\text { range } = \text { mean } \pm 3 \times \text { standard deviation }$$ to estimate the range of values for \(x\).
  5. Find estimates of the minimum value and the maximum value of \(x\) in these data using Andi's formula.
  6. Comment, giving a reason, on the reliability of Behrouz's claim.
Andi suggests using the regression line with equation \(y = 3.684 - 0.3242 x\) to estimate unemployment when wages are increasing at \(2 \%\).
  7. Comment, giving a reason, on Andi's suggestion.
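A sketch of one way to approach parts 1 and 2. Since \(\sum x\) is not given, it relies on the fact that a least squares regression line passes through \((\bar{x}, \bar{y})\), so \(\bar{x}\) can be recovered from the given equation. Variable names are illustrative, and the result for \(S_{xx}\) is limited by the rounding of the quoted coefficients.

```python
# Summary statistics and regression line quoted in the question.
n = 16
sum_y, sum_y2, sum_x2 = 23.7, 42.63, 756.81
intercept, gradient = 3.684, -0.3242  # y = 3.684 - 0.3242x

# Part 1: S_yy from the standard formula.
s_yy = sum_y2 - sum_y ** 2 / n

# The regression line passes through (x_bar, y_bar), so solve for x_bar.
y_bar = sum_y / n
x_bar = (y_bar - intercept) / gradient

# Part 2: S_xx = sum(x^2) - n * x_bar^2 (equivalent to sum(x^2) - sum(x)^2 / n).
s_xx = sum_x2 - n * x_bar ** 2

print(f"S_yy = {s_yy:.6f}, x_bar = {x_bar:.3f}, S_xx ~ {s_xx:.2f}")
```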
Edexcel S1 2022 October Q2
13 marks Moderate -0.5
The production cost, \(\pounds c\) million, of a film and the total ticket sales, \(\pounds t\) million, earned by the film are recorded for a sample of 40 films.
Some summary statistics are given below. $$\sum c = 1634 \quad \sum t = 1361 \quad \sum t ^ { 2 } = 82873 \quad \sum c t = 83634 \quad \mathrm {~S} _ { c c } = 28732.1$$
  1. Find the exact value of \(\mathrm { S } _ { t t }\) and the exact value of \(\mathrm { S } _ { c t }\)
  2. Calculate the value of the product moment correlation coefficient for these data.
  3. Give an interpretation of your answer to part (b)
  4. Show that the equation of the linear regression line of \(t\) on \(c\) can be written as $$t = - 5.84 + 0.976 c$$ where the values of the intercept and gradient are given to 3 significant figures.
  5. Find the expected total ticket sales for a film with a production cost of \(\pounds 90\) million.
Using the regression line in part (d),
  6. find the range of values of the production cost of a film for which the total ticket sales are less than \(80 \%\) of its production cost.
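A minimal numerical check of the quantities in this question, assuming the standard formulas. It computes \(S_{tt}\) and \(S_{ct}\) from the given sums and confirms that the regression coefficients round to the values quoted in the "show that" part. Variable names are illustrative.

```python
from math import sqrt

# Summary statistics quoted in the question (n = 40 films).
n = 40
sum_c, sum_t = 1634, 1361
sum_t2, sum_ct = 82873, 83634
s_cc = 28732.1

s_tt = sum_t2 - sum_t ** 2 / n
s_ct = sum_ct - sum_c * sum_t / n

# Product moment correlation coefficient.
r = s_ct / sqrt(s_cc * s_tt)

# Regression of t on c: t = a + b*c.
b = s_ct / s_cc
a = sum_t / n - b * sum_c / n

print(f"S_tt = {s_tt:.3f}, S_ct = {s_ct:.2f}, r = {r:.3f}")
print(f"t = {a:.2f} + {b:.3f}c")
```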
Edexcel S1 2017 June Q1
14 marks Moderate -0.5
A clothes shop manager records the weekly sales figures, \(\pounds s\), and the average weekly temperature, \(t ^ { \circ } \mathrm { C }\), for 6 weeks during the summer. The sales figures were coded so that \(w = \frac { s } { 1000 }\).
The data are summarised as follows $$\mathrm { S } _ { w w } = 50 \quad \sum w t = 784 \quad \sum t ^ { 2 } = 2435 \quad \sum t = 119 \quad \sum w = 42$$
  1. Find \(\mathrm { S } _ { w t }\) and \(\mathrm { S } _ { t t }\)
  2. Write down the value of \(\mathrm { S } _ { s s }\) and the value of \(\mathrm { S } _ { s t }\)
  3. Find the product moment correlation coefficient between \(s\) and \(t\).
The manager of the clothes shop believes that a linear regression model may be appropriate to describe these data.
  4. State, giving a reason, whether or not your value of the correlation coefficient supports the manager's belief.
  5. Find the equation of the regression line of \(w\) on \(t\), giving your answer in the form \(w = a + b t\)
  6. Hence find the equation of the regression line of \(s\) on \(t\), giving your answer in the form \(s = c + d t\), where \(c\) and \(d\) are correct to 3 significant figures.
  7. Using your equation in part (f), interpret the effect of a \(1 ^ { \circ } \mathrm { C }\) increase in average weekly temperature on weekly sales during the summer.
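The effect of the coding \(w = \frac { s } { 1000 }\) on these quantities can be sketched as follows. Since \(s = 1000 w\), each deviation \(s - \bar { s }\) is 1000 times the corresponding \(w - \bar { w }\), which gives
$$\mathrm { S } _ { s s } = 1000 ^ { 2 } \, \mathrm { S } _ { w w } , \quad \mathrm { S } _ { s t } = 1000 \, \mathrm { S } _ { w t }$$
The factor of 1000 cancels in \(\frac { \mathrm { S } _ { s t } } { \sqrt { \mathrm { S } _ { s s } \mathrm { S } _ { t t } } }\), so the product moment correlation coefficient between \(s\) and \(t\) equals that between \(w\) and \(t\). By the same scaling, if \(w = a + b t\) then \(s = 1000 a + 1000 b t\), so \(c = 1000 a\) and \(d = 1000 b\).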
Edexcel S1 2017 June Q5
15 marks Moderate -0.3
Tomas is studying the relationship between temperature and hours of sunshine in Seapron. He records the midday temperature, \(t ^ { \circ } \mathrm { C }\), and the hours of sunshine, \(s\) hours, for a random sample of 9 days in October. He calculated the following statistics
$$\sum s = 15 \quad \sum s ^ { 2 } = 44.22 \quad \sum t = 127 \quad \mathrm {~S} _ { t t } = 10.89$$
  1. Calculate \(\mathrm { S } _ { s s }\).
Tomas calculated the product moment correlation coefficient between \(s\) and \(t\) to be 0.832 correct to 3 decimal places.
  2. State, giving a reason, whether or not this correlation coefficient supports the use of a linear regression model to describe the relationship between midday temperature and hours of sunshine.
  3. State, giving a reason, why the hours of sunshine would be the explanatory variable in a linear regression model between midday temperature and hours of sunshine.
  4. Find \(\mathrm { S } _ { s t }\)
  5. Calculate a suitable linear regression equation to model the relationship between midday temperature and hours of sunshine.
  6. Calculate the standard deviation of \(s\).
Tomas uses this model to estimate the midday temperature in Seapron for a day in October with 5 hours of sunshine.
  7. State the value of Tomas' estimate.
Given that the values of \(s\) are all within 2 standard deviations of the mean,
  8. comment, giving your reason, on the reliability of this estimate.
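A sketch of how the missing \(\mathrm{S}_{st}\) might be recovered here, assuming the standard formulas. With \(\sum st\) not given, the definition \(r = \mathrm{S}_{st} / \sqrt{\mathrm{S}_{ss} \mathrm{S}_{tt}}\) is rearranged using the quoted \(r = 0.832\). The standard deviation uses the divisor \(n\), which is an assumption about the intended convention. Variable names are illustrative.

```python
from math import sqrt

# Summary statistics quoted in the question (n = 9 days).
n = 9
sum_s, sum_s2 = 15, 44.22
sum_t = 127
s_tt = 10.89
r = 0.832  # given correct to 3 decimal places

s_ss = sum_s2 - sum_s ** 2 / n

# Rearranged definition of r (approximate, because r is rounded).
s_st = r * sqrt(s_ss * s_tt)

# Regression of t on s (sunshine as explanatory variable): t = a + b*s.
b = s_st / s_ss
a = sum_t / n - b * sum_s / n

# Standard deviation of s, using divisor n (assumed convention).
sd_s = sqrt(s_ss / n)
upper_limit = sum_s / n + 2 * sd_s  # mean + 2 standard deviations

print(f"S_ss = {s_ss:.2f}, S_st ~ {s_st:.2f}, t = {a:.2f} + {b:.3f}s")
print(f"sd of s = {sd_s:.3f}, mean + 2 sd = {upper_limit:.2f}")
```

Comparing the 5 hours of sunshine with the computed upper limit indicates whether the estimate involves extrapolation.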
Edexcel S1 2011 June Q7
12 marks Moderate -0.8
A teacher took a random sample of 8 children from a class. For each child the teacher recorded the length of their left foot, \(f\) cm, and their height, \(h\) cm. The results are given in the table below.
| \(f\) | 23 | 26 | 23 | 22 | 27 | 24 | 20 | 21 |
|---|---|---|---|---|---|---|---|---|
| \(h\) | 135 | 144 | 134 | 136 | 140 | 134 | 130 | 132 |
(You may use \(\sum f = 186 \quad \sum h = 1085 \quad S_{ff} = 39.5 \quad S_{hh} = 139.875 \quad \sum fh = 25291\))
  1. Calculate \(S_{fh}\) [2]
  2. Find the equation of the regression line of \(h\) on \(f\) in the form \(h = a + bf\). Give the value of \(a\) and the value of \(b\) correct to 3 significant figures. [5]
  3. Use your equation to estimate the height of a child with a left foot length of 25 cm. [2]
  4. Comment on the reliability of your estimate in (c), giving a reason for your answer. [2]
The left foot length of the teacher is 25 cm.
  5. Give a reason why the equation in (b) should not be used to estimate the teacher's height. [1]
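As a check on the quoted summary statistics, the following sketch recomputes them from the eight foot lengths and heights in the table, then applies the standard formulas. Variable names are illustrative, and this is a verification aid rather than the expected exam method.

```python
# Raw data from the table: left foot length f (cm) and height h (cm).
f = [23, 26, 23, 22, 27, 24, 20, 21]
h = [135, 144, 134, 136, 140, 134, 130, 132]
n = len(f)

sum_f, sum_h = sum(f), sum(h)
sum_fh = sum(fi * hi for fi, hi in zip(f, h))

# Standard formulas for S_fh, S_ff and S_hh.
s_fh = sum_fh - sum_f * sum_h / n
s_ff = sum(fi * fi for fi in f) - sum_f ** 2 / n
s_hh = sum(hi * hi for hi in h) - sum_h ** 2 / n

# Regression of h on f: h = a + b*f.
b = s_fh / s_ff
a = sum_h / n - b * sum_f / n

print(f"sum f = {sum_f}, sum h = {sum_h}, sum fh = {sum_fh}")
print(f"S_fh = {s_fh}, S_ff = {s_ff}, S_hh = {s_hh}")
print(f"h = {a:.1f} + {b:.2f}f")
```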