Calculate PMCC from summary statistics

Questions that provide summary statistics (such as Sxx, Syy, Sxy, sums of x, y, x², y², xy) and require calculating the product moment correlation coefficient using these given values.
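
As a reminder, the standard identities these questions rely on are collected here. The symbol \(n\) (the number of data pairs) and the generic variable names \(x\), \(y\) are notational assumptions and are not tied to any one paper.

$$S_{xx} = \sum x^{2} - \frac{\left(\sum x\right)^{2}}{n} \quad S_{yy} = \sum y^{2} - \frac{\left(\sum y\right)^{2}}{n} \quad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$$

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

For the regression line of \(y\) on \(x\), written \(y = a + bx\), the gradient is \(b = S_{xy} / S_{xx}\), the intercept is \(a = \bar{y} - b\bar{x}\), and the residual sum of squares is \(S_{yy} - S_{xy}^{2} / S_{xx} = S_{yy}(1 - r^{2})\).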

9 questions

Edexcel S1 2018 January Q3
3. Martin is investigating the relationship between a person's daily caffeine consumption, \(c\) milligrams, and the amount of sleep they get, \(h\) hours, per night. He collected this information from 20 people and the results are summarised below. $$\begin{gathered} \sum c = 3660 \quad \sum h = 126 \quad \sum c ^ { 2 } = 973228 \\ \sum c h = 20023.4 \quad S _ { c c } = 303448 \quad S _ { c h } = - 3034.6 \end{gathered}$$ Martin calculates the product moment correlation coefficient for these data and obtains \(-0.833\).
  1. Give a reason why this value supports a linear relationship between \(c\) and \(h\).
The amount of sleep per night is the response variable.
  2. Explain what you understand by the term 'response variable'.
Martin says that for each additional 100 mg of caffeine consumed, the expected number of hours of sleep decreases by 1.
  3. Determine, by calculation, whether or not the data support this statement.
  4. Use the data to calculate an estimate for the expected number of hours of sleep per night when no caffeine is consumed.
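One possible route through parts 3 and 4 is sketched below: it finds the regression line of \(h\) on \(c\) from the quoted values. The variable names are illustrative, and it is a working sketch, not a mark scheme.

```python
# Summary statistics quoted in the question (20 people).
n = 20
sum_c, sum_h = 3660, 126
s_cc, s_ch = 303448, -3034.6

# Regression line of h on c: h = a + b c.
b = s_ch / s_cc
a = sum_h / n - b * sum_c / n

# Expected change in hours of sleep per additional 100 mg of caffeine.
change_per_100mg = 100 * b

print(f"gradient b = {b:.5f} hours per mg")
print(f"change per 100 mg = {change_per_100mg:.3f} hours")
print(f"intercept a = {a:.2f} hours (estimate with no caffeine)")
```

A change per 100 mg close to \(-1\) would be consistent with Martin's statement, and the intercept is the estimate asked for in part 4.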
OCR S1 2011 January Q3
3 A firm wishes to assess whether there is a linear relationship between the annual amount spent on advertising, \(\pounds x\) thousand, and the annual profit, \(\pounds y\) thousand. A summary of the figures for 12 years is as follows. $$n = 12 \quad \Sigma x = 86.6 \quad \Sigma y = 943.8 \quad \Sigma x ^ { 2 } = 658.76 \quad \Sigma y ^ { 2 } = 83663.00 \quad \Sigma x y = 7351.12$$
  1. Calculate the product moment correlation coefficient, showing that it is greater than 0.9.
  2. Comment briefly on this value in this context.
  3. A manager claims that this result shows that spending more money on advertising in the future will result in greater profits. Make two criticisms of this claim.
  4. Calculate the equation of the regression line of \(y\) on \(x\).
  5. Estimate the annual profit during a year when \(\pounds 7400\) was spent on advertising.
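Because this question gives only raw sums, the corrected sums \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\) have to be built first. The following minimal sketch does that and then forms the coefficient and the regression line; the variable names are assumptions made for illustration.

```python
from math import sqrt

# Summary statistics quoted in the question (12 years of data).
n = 12
sum_x, sum_y = 86.6, 943.8
sum_x2, sum_y2, sum_xy = 658.76, 83663.00, 7351.12

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Product moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Regression line of y on x: y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# 7400 pounds corresponds to x = 7.4, since x is in thousands of pounds.
profit_estimate = a + b * 7.4

print(f"r = {r:.3f}")
print(f"y = {a:.2f} + {b:.2f} x")
print(f"estimated profit = {profit_estimate:.1f} thousand pounds")
```

The unit conversion in the last step is the usual trap: substituting 7400 instead of 7.4 gives a wildly extrapolated value.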
Edexcel S1 2018 June Q6
6. A group of climbers collected information about the height above sea level, \(h\) metres, and the air temperature, \(t ^ { \circ } \mathrm { C }\), at the same time at 8 different points on the same mountain. The data are summarised by $$\sum h = 6370 \quad \sum t = 61 \quad \sum t h = 31070 \quad \sum t ^ { 2 } = 693$$
  1. Show that \(S_{th} = -17501.25\) and \(S_{tt} = 227.875\).
The product moment correlation coefficient for these data is \(-0.985\).
  2. State, giving a reason, whether or not this value supports the use of a regression equation to predict the air temperature at different heights on this mountain.
  3. Find the equation of the regression line of \(t\) on \(h\), giving your answer in the form \(t = a + b h\). Give the value of your coefficients to 3 significant figures.
  4. Give an interpretation of your value of \(a\).
One of the climbers has just stopped for a short break before climbing the next 150 metres.
  5. Estimate the drop in temperature over this 150 metre climb.
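The "show that" in part 1 can be checked numerically with the short sketch below (variable names are illustrative). It stops at part 1 because the regression line of \(t\) on \(h\) also needs \(S_{hh}\), which cannot be formed from the four sums listed here.

```python
# Summary statistics quoted in the question (8 points on the mountain).
n = 8
sum_h, sum_t = 6370, 61
sum_th, sum_t2 = 31070, 693

# Corrected sum of products and corrected sum of squares.
s_th = sum_th - sum_t * sum_h / n
s_tt = sum_t2 - sum_t**2 / n

print(f"S_th = {s_th}")
print(f"S_tt = {s_tt}")
```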
Edexcel FS2 AS 2019 June Q3
3. Two students, Jim and Dora, collected data on the mean annual rainfall, \(w \mathrm {~cm}\), and the annual yield of leeks, \(l\) tonnes per hectare, for 10 years.
Jim summarised the data as follows $$\mathrm { S } _ { w l } = 42.786 \quad \mathrm {~S} _ { w w } = 9936.9 \quad \sum l ^ { 2 } = 26.2326 \quad \sum l = 16.06$$
  1. Find the product moment correlation coefficient between \(l\) and \(w\).
Dora decided to code the data first using \(s = w - 6\) and \(t = l - 20\).
  2. Write down the value of the product moment correlation coefficient between \(s\) and \(t\). Give a justification for your answer.
Dora calculates the equation of the regression line of \(t\) on \(s\) to be \(t = 0.00431 s - 18.87\).
  3. Find the equation of the regression line of \(l\) on \(w\) in the form \(l = a + b w\), giving the values of \(a\) and \(b\) to 3 significant figures.
  4. Use your equation to estimate the yield of leeks when \(w\) is 100 cm.
  5. Calculate the residual sum of squares.
The graph shows the residual for each value of \(l\).
    \includegraphics[max width=\textwidth, alt={}, center]{7e46e14a-0f5a-4d02-8f00-a92bc4def6d7-08_716_1594_1594_239}
    1. State whether this graph suggests that the use of a linear regression model is suitable for these data. Give a reason for your answer.
    2. Other than collecting more data, suggest how to improve the fit of the model in part (c) to the data.
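Here \(S_{ll}\) is not quoted, so it has to come from \(\sum l^{2}\) and \(\sum l\) before the coefficient can be found. The sketch below, with illustrative variable names, does that and also evaluates the residual sum of squares using \(S_{ll} - S_{wl}^{2} / S_{ww}\).

```python
from math import sqrt

# Jim's summary statistics (10 years of data).
n = 10
s_wl, s_ww = 42.786, 9936.9
sum_l, sum_l2 = 16.06, 26.2326

# Corrected sum of squares for l, built from the raw sums.
s_ll = sum_l2 - sum_l**2 / n

# Product moment correlation coefficient between l and w.
r = s_wl / sqrt(s_ww * s_ll)

# Residual sum of squares for the regression of l on w.
rss = s_ll - s_wl**2 / s_ww

print(f"S_ll = {s_ll:.5f}")
print(f"r = {r:.3f}")
print(f"RSS = {rss:.3f}")
```

Dora's coding only subtracts constants from each variable, and a linear coding of that kind leaves the coefficient unchanged, so the same value of \(r\) applies to \(s\) and \(t\).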
Edexcel FS2 AS 2023 June Q3
3. Pat is investigating the relationship between the height of professional tennis players and the speed of their serve. Data from 9 randomly selected professional male tennis players were collected. The variables recorded were the height of each player, \(h\) metres, and the maximum speed of their serve, \(v \mathrm {~km} / \mathrm { h }\).
Pat summarised these data as follows $$\sum h = 17.63 \quad \sum v = 2174.9 \quad \sum v ^ { 2 } = 526407.8 \quad S _ { h h } = 0.0487 \quad S _ { h v } = 5.1376$$
  1. Calculate the product moment correlation coefficient between \(h\) and \(v\)
  2. Explain whether the answer to part (a) is consistent with a linear model for these data.
  3. Find the equation of the regression line of \(v\) on \(h\) in the form \(v = a + b h\) where \(a\) and \(b\) are to be given to one decimal place.
Pat calculated the sum of the residuals for the 9 tennis players as 1.04.
  4. Without doing a calculation, explain how you know Pat has made a mistake.
Pat made one mistake in the calculation. For the tennis player of height 1.96 m Pat misread the residual as 2.27.
  5. Find the maximum speed of serve, in km/h, for the tennis player of height 1.96 m
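A hedged sketch of the arithmetic follows, with illustrative variable names. The last step assumes the intended argument is that residuals about a least squares line sum to zero, so the misread residual must account for the whole reported total of 1.04.

```python
from math import sqrt

# Pat's summary statistics (9 players).
n = 9
sum_h, sum_v, sum_v2 = 17.63, 2174.9, 526407.8
s_hh, s_hv = 0.0487, 5.1376

# S_vv is not quoted, so build it from the raw sums.
s_vv = sum_v2 - sum_v**2 / n
r = s_hv / sqrt(s_hh * s_vv)

# Regression line of v on h: v = a + b h.
b = s_hv / s_hh
a = sum_v / n - b * sum_h / n

# Assumption: the true residuals sum to zero, so the misread value
# 2.27 exceeds the true residual by the reported total of 1.04.
true_residual = 2.27 - 1.04
v_196 = a + b * 1.96 + true_residual

print(f"r = {r:.3f}")
print(f"v = {a:.1f} + {b:.1f} h")
print(f"serve speed at h = 1.96 m: {v_196:.0f} km/h")
```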
Edexcel FS2 AS 2024 June Q5
5. A random sample of 24 adults is taken. The height, \(h\) metres, and the arm span, \(s\) metres, for each adult are recorded.
These data are summarised below. $$\mathrm { S } _ { h h } = 0.377 \quad \mathrm {~S} _ { s h } = 0.352 \quad \bar { s } = 1.70 \quad \bar { h } = 1.68$$ The least squares regression line of \(h\) on \(s\) is $$h = a + 0.919 s$$ where \(a\) is a constant.
  1. Calculate the product moment correlation coefficient.
A doctor uses the least squares regression line of \(h\) on \(s\) as a model to predict a person's height based on their arm span.
  2. Use the model to predict the height of an adult with arm span 1.79 metres.
Ewan has an arm span of 1.70 metres and a height of 1.75 metres. His information is added to the sample as the 25th adult.
  3. Explain how the gradient of the regression line for the sample of 25 adults compares with the gradient of the regression line for the original sample of 24 adults.
    Give a reason for your answer.
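This question does not quote \(S_{ss}\), but the gradient of the line of \(h\) on \(s\) equals \(S_{sh} / S_{ss}\), so it can be recovered from the given gradient. The sketch below takes that route, which is one possible approach and not necessarily the intended one; because 0.919 is itself rounded, the results are approximate.

```python
from math import sqrt

# Summary statistics quoted in the question (24 adults).
s_hh, s_sh = 0.377, 0.352
s_bar, h_bar = 1.70, 1.68
b = 0.919  # quoted gradient of the regression line of h on s

# Gradient of h on s is S_sh / S_ss, so S_ss can be recovered from b.
s_ss = s_sh / b

# Product moment correlation coefficient.
r = s_sh / sqrt(s_ss * s_hh)

# The regression line passes through (s_bar, h_bar), which fixes a.
a = h_bar - b * s_bar
h_pred = a + b * 1.79

print(f"r = {r:.3f}")
print(f"a = {a:.4f}")
print(f"predicted height for arm span 1.79 m: {h_pred:.2f} m")
```

For part 3, note that Ewan's arm span equals \(\bar{s}\), so his deviation \(s - \bar{s}\) is zero and he adds nothing to \(S_{ss}\) or \(S_{sh}\).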
Edexcel FS2 2019 June Q2
2 A large field of wheat is split into 8 plots of equal area. Each plot is treated with a different amount of fertiliser, \(f\) grams \(/ \mathrm { m } ^ { 2 }\). The yield of wheat, \(w\) tonnes, from each plot is recorded. The results are summarised below. $$\sum f = 28 \quad \sum w = 303 \quad \sum w ^ { 2 } = 13447 \quad \mathrm {~S} _ { f f } = 42 \quad \mathrm {~S} _ { f w } = 269.5$$
  1. Calculate the product moment correlation coefficient between \(f\) and \(w\)
  2. Interpret the value of your product moment correlation coefficient.
  3. Find the equation of the regression line of \(w\) on \(f\) in the form \(w = a + b f\)
  4. Using your equation, estimate the decrease in yield when the amount of fertiliser decreases by 0.5 grams \(/ \mathrm { m } ^ { 2 }\).
The residuals of the data recorded are calculated and plotted on the graph below.
    \includegraphics[max width=\textwidth, alt={}, center]{67df73d4-6ce4-45f7-8a69-aa94292ea814-04_1232_1294_1169_301}
  5. With reference to this graph, comment on the suitability of the model you found in part (c).
  6. Suggest how you might be able to refine your model.
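Parts 1, 3 and 4 reduce to a few lines of arithmetic once \(S_{ww}\) has been formed from the raw sums. This is an illustrative sketch with assumed variable names, not a model answer.

```python
from math import sqrt

# Summary statistics quoted in the question (8 plots).
n = 8
sum_f, sum_w, sum_w2 = 28, 303, 13447
s_ff, s_fw = 42, 269.5

# Corrected sum of squares for the yield w.
s_ww = sum_w2 - sum_w**2 / n

# Product moment correlation coefficient between f and w.
r = s_fw / sqrt(s_ff * s_ww)

# Regression line of w on f: w = a + b f.
b = s_fw / s_ff
a = sum_w / n - b * sum_f / n

# A change of 0.5 g/m^2 in fertiliser changes the estimate by 0.5 * b.
yield_drop = 0.5 * b

print(f"r = {r:.3f}")
print(f"w = {a:.2f} + {b:.2f} f")
print(f"estimated decrease in yield = {yield_drop:.2f} tonnes")
```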
Edexcel FS2 2021 June Q4
4. A researcher is investigating the relationship between elevation, \(x\) metres, and annual mean temperature, \(t ^ { \circ } \mathrm { C }\).
From a random sample of 20 weather stations in Switzerland, the following results were obtained $$\mathrm { S } _ { x x } = 8820655 \quad \mathrm {~S} _ { t t } = 444.7 \quad \sum x = 28130 \quad \sum t = 94.62$$ The product moment correlation coefficient for these data is found to be \(-0.959\).
  1. Interpret the value of this correlation coefficient.
  2. Show that the equation of the regression line of \(t\) on \(x\) can be written as $$t = 14.3 - 0.00681 x$$
The random variable \(W\) represents the elevations of the weather stations in kilometres.
  3. Write down the equation of the regression line of \(t\) on \(w\) for these 20 weather stations in the form \(t = a + b w\)
  4. Show that the residual sum of squares (RSS) for the model for \(t\) and \(x\) is 35.7 correct to one decimal place.
One of the weather stations in the sample had a recorded elevation of 1100 metres and an annual mean temperature of \(1.4 ^ { \circ } \mathrm { C }\).
    1. Calculate this weather station's contribution to the residual sum of squares. Give your answer as a percentage.
    2. Comment on the data for this weather station in light of your answer to part (e)(i).
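In this question \(S_{xt}\) is not given, so the sketch below works backwards from the quoted coefficient using \(S_{xt} = r\sqrt{S_{xx} S_{tt}}\). The variable names are illustrative, and the final step assumes the rounded line \(t = 14.3 - 0.00681x\) and the rounded RSS of 35.7 are the values to use.

```python
from math import sqrt

# Summary statistics quoted in the question (20 weather stations).
n = 20
s_xx, s_tt = 8820655, 444.7
sum_x, sum_t = 28130, 94.62
r = -0.959

# Recover S_xt from the quoted correlation coefficient.
s_xt = r * sqrt(s_xx * s_tt)

# Regression line of t on x: t = a + b x.
b = s_xt / s_xx
a = sum_t / n - b * sum_x / n

# Residual sum of squares via S_tt * (1 - r^2).
rss = s_tt * (1 - r**2)

# Station at 1100 m with mean temperature 1.4 degrees C.
# Assumption: use the rounded line and rounded RSS from the question.
residual = 1.4 - (14.3 - 0.00681 * 1100)
share = 100 * residual**2 / 35.7

print(f"t = {a:.1f} + ({b:.5f}) x")
print(f"RSS = {rss:.1f}")
print(f"contribution to RSS = {share:.0f}%")
```

Changing the unit of elevation from metres to kilometres multiplies the gradient by 1000 and leaves the intercept unchanged, which is all part 3 requires.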
Edexcel FS2 2023 June Q1
1. Baako is investigating the times taken by children to run a 100 m race, \(x\) seconds, and a 500 m race, \(y\) seconds. For a sample of 20 children, Baako obtains the time taken by each child to run each race.
Here are Baako's summary statistics. $$\begin{gathered} \mathrm { S } _ { x x } = 314.55 \quad \mathrm {~S} _ { y y } = 9026 \quad \mathrm {~S} _ { x y } = 1610 \\ \bar { x } = 19.65 \quad \bar { y } = 108 \end{gathered}$$
  1. Calculate the product moment correlation coefficient between the times taken to run the 100 m race and the times taken to run the 500 m race.
  2. Show that the equation of the regression line of \(y\) on \(x\) can be written as $$y = 5.12 x + 7.42$$ where the gradient and \(y\) intercept are given to 3 significant figures.
The child who completed the 100 m race in 20 seconds took 104 seconds to complete the 500 m race.
  3. Find the residual for this child.
The table below shows the signs of the residuals for the 20 children in order of finishing time for the 100 m race.
    Sign of residual: + + + + - - + - - - - - - - - + + + + +
  4. Explain what the signs of the residuals show about the model's predictions of the 500 m race times for the children who are fastest and slowest over the 100 m race.
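With all three corrected sums supplied, parts 1 to 3 follow directly. The short sketch below uses illustrative variable names and, for the residual, assumes the rounded line \(y = 5.12x + 7.42\) from part 2 is the one to use.

```python
from math import sqrt

# Baako's summary statistics (20 children).
s_xx, s_yy, s_xy = 314.55, 9026, 1610
x_bar, y_bar = 19.65, 108

# Product moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Regression line of y on x: y = a + b x.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual = observed value minus predicted value, here for the child
# with x = 20 and y = 104, using the rounded line from part 2.
residual = 104 - (5.12 * 20 + 7.42)

print(f"r = {r:.3f}")
print(f"y = {b:.2f} x + {a:.2f}")
print(f"residual = {residual:.2f} seconds")
```

A negative residual means the child ran the 500 m faster than the line predicts.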