Calculate from summary statistics

A question belongs to this sub-type if and only if it provides summary statistics (such as Σx, Σy, Σx², Σy², Σxy, n) and asks for Sxx, Syy, or Sxy to be calculated using the standard formulas.
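As a rough illustration of the standard formulas, the sketch below computes Sxx and Sxy directly from summary statistics. The function names `s_xx` and `s_xy` are hypothetical helpers introduced here for illustration; the figures are the coded statistics quoted in Edexcel S1 2015 June Q2.

```python
def s_xx(sum_x, sum_x2, n):
    """Sxx = sum(x^2) - (sum(x))^2 / n. Syy uses the same form with y."""
    return sum_x2 - sum_x ** 2 / n


def s_xy(sum_x, sum_y, sum_xy, n):
    """Sxy = sum(xy) - sum(x) * sum(y) / n."""
    return sum_xy - sum_x * sum_y / n


# Summary statistics from Edexcel S1 2015 June Q2 (coded data, 20 houses).
n = 20
sxy = s_xy(sum_x=441.5, sum_y=59.8, sum_xy=1474.1, n=n)
sxx = s_xx(sum_x=441.5, sum_x2=11261.25, n=n)

# Rounded only for display; expected values are about 154.015 and 1515.1375.
print(round(sxy, 3), round(sxx, 4))
```

Keeping the two helpers separate mirrors how these questions are usually set: Sxx and Syy need only one variable's sums, while Sxy needs both totals and the cross-product sum.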

6 questions

Edexcel S1 2015 June Q2
2. Paul believes there is a relationship between the value and the floor size of a house. He takes a random sample of 20 houses and records the value, \(\pounds v\), and the floor size, \(s \mathrm {~m} ^ { 2 }\). The data were coded using \(x = \frac { s - 50 } { 10 }\) and \(y = \frac { v } { 100000 }\) and the following statistics were obtained. $$\sum x = 441.5 , \quad \sum y = 59.8 , \quad \sum x ^ { 2 } = 11261.25 , \quad \sum y ^ { 2 } = 196.66 , \quad \sum x y = 1474.1$$
  1. Find the value of \(S _ { x y }\) and the value of \(S _ { x x }\).
  2. Find the equation of the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\).
    The least squares regression line of \(v\) on \(s\) is \(v = c + d s\).
  3. Show that \(d = 1020\) to 3 significant figures and find the value of \(c\).
  4. Estimate the value of a house of floor size \(130 \mathrm {~m} ^ { 2 }\).
  5. Interpret the value \(d\).
    Paul wants to increase the value of his house. He decides to add an extension to increase the floor size by \(31 \mathrm {~m} ^ { 2 }\).
  6. Estimate the increase in the value of Paul's house after adding the extension.
Edexcel S1 2020 June Q5
5. A large company rents shops in different parts of the country. A random sample of 10 shops was taken and the floor area, \(x\) in \(10 \mathrm {~m} ^ { 2 }\), and the annual rent, \(y\) in thousands of dollars, were recorded.
    The data are summarised by the following statistics
$$\sum x = 900 \quad \sum x ^ { 2 } = 84818 \quad \sum y = 183 \quad \sum y ^ { 2 } = 3434$$ and the regression line of \(y\) on \(x\) has equation \(y = 6.066 + 0.136 x\).
  1. Use the regression line to estimate the annual rent in dollars for a shop with a floor area of \(800 \mathrm {~m} ^ { 2 }\).
  2. Find \(\mathrm { S } _ { y y }\) and \(\mathrm { S } _ { x x }\).
  3. Find the product moment correlation coefficient between \(y\) and \(x\).
    An 11th shop is added to the sample. The floor area is \(900 \mathrm {~m} ^ { 2 }\) and the annual rent is 15000 dollars.
  4. Use the formula \(\mathrm { S } _ { x y } = \sum ( x - \bar { x } ) ( y - \bar { y } )\) to show that the value of \(\mathrm { S } _ { x y }\) for the 11 shops will be the same as it was for the original 10 shops.
  5. Find the new equation of the regression line of \(y\) on \(x\) for the 11 shops.
    The company is considering renting a larger shop with an area of \(3000 \mathrm {~m} ^ { 2 }\).
  6. Comment on the suitability of using the new regression line to estimate the annual rent. Give a reason for your answer.
Edexcel S1 2021 June Q6
6. Two economics students, Andi and Behrouz, are studying some data relating to unemployment, \(x \%\), and increase in wages, \(y \%\), for a European country. The least squares regression line of \(y\) on \(x\) has equation
$$y = 3.684 - 0.3242 x$$ and $$\sum y = 23.7 \quad \sum y ^ { 2 } = 42.63 \quad \sum x ^ { 2 } = 756.81 \quad n = 16$$
  1. Show that \(\mathrm { S } _ { y y } = 7.524375\).
  2. Find \(\mathrm { S } _ { x x }\).
  3. Find the product moment correlation coefficient between \(x\) and \(y\).
    Behrouz claims that, assuming the model is valid, the data show that when unemployment is 2\% wages increase at over 3\%.
  4. Explain how Behrouz could have come to this conclusion.
    Andi uses the formula $$\text { range } = \text { mean } \pm 3 \times \text { standard deviation }$$ to estimate the range of values for \(x\).
  5. Find estimates of the minimum value and the maximum value of \(x\) in these data using Andi's formula.
  6. Comment, giving a reason, on the reliability of Behrouz's claim.
    Andi suggests using the regression line with equation \(y = 3.684 - 0.3242 x\) to estimate unemployment when wages are increasing at \(2 \%\).
  7. Comment, giving a reason, on Andi's suggestion.
Edexcel S1 2022 October Q2
2. The production cost, \(\pounds c\) million, of a film and the total ticket sales, \(\pounds t\) million, earned by the film are recorded for a sample of 40 films.
Some summary statistics are given below. $$\sum c = 1634 \quad \sum t = 1361 \quad \sum t ^ { 2 } = 82873 \quad \sum c t = 83634 \quad \mathrm {~S} _ { c c } = 28732.1$$
  1. Find the exact value of \(\mathrm { S } _ { t t }\) and the exact value of \(\mathrm { S } _ { c t }\).
  2. Calculate the value of the product moment correlation coefficient for these data.
  3. Give an interpretation of your answer to part 2.
  4. Show that the equation of the linear regression line of \(t\) on \(c\) can be written as $$t = - 5.84 + 0.976 c$$ where the values of the intercept and gradient are given to 3 significant figures.
  5. Find the expected total ticket sales for a film with a production cost of \(\pounds 90\) million.
  6. Using the regression line in part 4, find the range of values of the production cost of a film for which the total ticket sales are less than \(80 \%\) of its production cost.
Edexcel S1 2017 June Q1
1. A clothes shop manager records the weekly sales figures, \(\pounds s\), and the average weekly temperature, \(t ^ { \circ } \mathrm { C }\), for 6 weeks during the summer. The sales figures were coded so that \(w = \frac { s } { 1000 }\).
The data are summarised as follows $$\mathrm { S } _ { w w } = 50 \quad \sum w t = 784 \quad \sum t ^ { 2 } = 2435 \quad \sum t = 119 \quad \sum w = 42$$
  1. Find \(\mathrm { S } _ { w t }\) and \(\mathrm { S } _ { t t }\).
  2. Write down the value of \(\mathrm { S } _ { s s }\) and the value of \(\mathrm { S } _ { s t }\).
  3. Find the product moment correlation coefficient between \(s\) and \(t\).
    The manager of the clothes shop believes that a linear regression model may be appropriate to describe these data.
  4. State, giving a reason, whether or not your value of the correlation coefficient supports the manager's belief.
  5. Find the equation of the regression line of \(w\) on \(t\), giving your answer in the form \(w = a + b t\).
  6. Hence find the equation of the regression line of \(s\) on \(t\), giving your answer in the form \(s = c + d t\), where \(c\) and \(d\) are correct to 3 significant figures.
  7. Using your equation in part 6, interpret the effect of a \(1 ^ { \circ } \mathrm { C }\) increase in average weekly temperature on weekly sales during the summer.
Edexcel S1 2017 June Q5
5. Tomas is studying the relationship between temperature and hours of sunshine in Seapron. He records the midday temperature, \(t ^ { \circ } \mathrm { C }\), and the hours of sunshine, \(s\) hours, for a random sample of 9 days in October. He calculated the following statistics
$$\sum s = 15 \quad \sum s ^ { 2 } = 44.22 \quad \sum t = 127 \quad \mathrm {~S} _ { t t } = 10.89$$
  1. Calculate \(\mathrm { S } _ { s s }\).
    Tomas calculated the product moment correlation coefficient between \(s\) and \(t\) to be 0.832 correct to 3 decimal places.
  2. State, giving a reason, whether or not this correlation coefficient supports the use of a linear regression model to describe the relationship between midday temperature and hours of sunshine.
  3. State, giving a reason, why the hours of sunshine would be the explanatory variable in a linear regression model between midday temperature and hours of sunshine.
  4. Find \(\mathrm { S } _ { s t }\).
  5. Calculate a suitable linear regression equation to model the relationship between midday temperature and hours of sunshine.
  6. Calculate the standard deviation of \(s\).
    Tomas uses this model to estimate the midday temperature in Seapron for a day in October with 5 hours of sunshine.
  7. State the value of Tomas' estimate.
  8. Given that the values of \(s\) are all within 2 standard deviations of the mean, comment, giving your reason, on the reliability of this estimate.