Calculate r from summary statistics

Questions that provide pre-calculated summary statistics (such as Σx, Σy, Σx², Σy², Σxy, or Sxx, Syy, Sxy) and ask to calculate r using these given values.
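Most questions in this set rest on the same three corrected sums, \(S_{xx} = \Sigma x^2 - (\Sigma x)^2/n\), \(S_{yy} = \Sigma y^2 - (\Sigma y)^2/n\) and \(S_{xy} = \Sigma xy - \Sigma x\,\Sigma y/n\), combined as \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\). A minimal Python sketch of that route follows. The helper names `corrected_sums` and `pmcc` are hypothetical, and the figures are those of OCR S1 2011 June Q1 below, whose stated answer of \(-0.122\) the sketch should reproduce.

```python
import math


def corrected_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (Sxx, Syy, Sxy) from the raw sums Σx, Σy, Σx², Σy², Σxy."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


def pmcc(s_xx, s_yy, s_xy):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    return s_xy / math.sqrt(s_xx * s_yy)


# OCR S1 2011 June Q1: n = 5, Σx = 251, Σy = 65, Σx² = 14323, Σy² = 855, Σxy = 3247
s_xx, s_yy, s_xy = corrected_sums(5, 251, 65, 14323, 855, 3247)
r = pmcc(s_xx, s_yy, s_xy)
print(f"Sxx = {s_xx:.1f}, Syy = {s_yy:.1f}, Sxy = {s_xy:.1f}, r = {r:.3f}")
```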

27 questions

OCR S1 2005 June Q4
4 The table shows the latitude, \(x\) (in degrees correct to 3 significant figures), and the average rainfall \(y\) (in cm correct to 3 significant figures) of five European cities.
| City | \(x\) | \(y\) |
|---|---|---|
| Berlin | 52.5 | 58.2 |
| Bucharest | 44.4 | 58.7 |
| Moscow | 55.8 | 53.3 |
| St Petersburg | 60.0 | 47.8 |
| Warsaw | 52.3 | 56.6 |
$$\left[ n = 5 , \Sigma x = 265.0 , \Sigma y = 274.6 , \Sigma x ^ { 2 } = 14176.54 , \Sigma y ^ { 2 } = 15162.22 , \Sigma x y = 14464.10 . \right]$$
  1. Calculate the product moment correlation coefficient.
  2. The values of \(y\) in the table were in fact obtained from measurements in inches and converted into centimetres by multiplying by 2.54. State what effect it would have had on the value of the product moment correlation coefficient if it had been calculated using inches instead of centimetres.
  3. It is required to estimate the annual rainfall at Bergen, where \(x = 60.4\). Calculate the equation of an appropriate line of regression, giving your answer in simplified form, and use it to find the required estimate.
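For part 3, a sketch of the regression step, assuming (as is usual when \(y\) is predicted from a known \(x\)) that the line of \(y\) on \(x\) is the appropriate one: its gradient is \(b = S_{xy}/S_{xx}\) and it passes through \((\bar{x}, \bar{y})\). Variable names are illustrative only.

```python
# OCR S1 2005 June Q4 summary statistics (x = latitude in degrees, y = rainfall in cm)
n = 5
sum_x, sum_y = 265.0, 274.6
sum_x2, sum_xy = 14176.54, 14464.10

s_xx = sum_x2 - sum_x ** 2 / n        # corrected sum of squares for x
s_xy = sum_xy - sum_x * sum_y / n     # corrected sum of products

b = s_xy / s_xx                       # gradient of the regression line of y on x
a = sum_y / n - b * sum_x / n         # the line passes through (x̄, ȳ)

x_bergen = 60.4
y_estimate = a + b * x_bergen
print(f"y = {a:.2f} {b:+.4f}x, estimate at x = {x_bergen}: {y_estimate:.1f} cm")
```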
Edexcel S1 2019 January Q6
  1. Following some school examinations, Chetna is studying the results of the 16 students in her class. The mark for paper 1, \(x\), and the mark for paper 2, \(y\), for each student are summarised in the following statistics.
$$\bar { x } = 35.75 \quad \bar { y } = 25.75 \quad \sigma _ { x } = 7.79 \quad \sigma _ { y } = 11.91 \quad \sum x y = 15837$$
  1. Comment on the differences between the marks of the students on paper 1 and paper 2.
    Chetna decides to examine these data in more detail and plots the marks for each of the 16 students on the scatter diagram opposite.
    1. Explain why the circled point \(( 38,0 )\) is possibly an outlier.
    2. Suggest a possible reason for this result.
    Chetna decides to omit the data point \(( 38,0 )\) and examine the other 15 students' marks.
  2. Find the value of \(\bar { x }\) and the value of \(\bar { y }\) for these 15 students.
    For these 15 students
    1. explain why \(\sum x y\) is still 15837
    2. show that \(\mathrm { S } _ { x y } = 1169.8\)
    For these 15 students, Chetna calculates \(\mathrm { S } _ { x x } = 965.6\) and \(\mathrm { S } _ { y y } = 1561.7\) correct to 1 decimal place.
  3. Calculate the product moment correlation coefficient for these 15 students.
  4. Calculate the equation of the line of regression of \(y\) on \(x\) for these 15 students, giving your answer in the form \(y = a + b x\).
    The product moment correlation coefficient between \(x\) and \(y\) for all 16 students is 0.746.
  5. Explain how your calculation in part (e) (the product moment correlation coefficient for these 15 students) supports Chetna's decision to omit the point \(( 38,0 )\) before calculating the equation of the linear regression line.
    (1)
  6. Estimate the mark in the second paper for a student who scored 38 marks in the first paper.
    \includegraphics[max width=\textwidth, alt={}]{d3f4450d-60eb-49b6-be1b-d2fcfad0451f-17_1127_1146_301_406}
    \includegraphics[max width=\textwidth, alt={}]{d3f4450d-60eb-49b6-be1b-d2fcfad0451f-20_2630_1828_121_121}
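The 15-student statistics can be rebuilt from the 16-student summary: multiply each mean by 16 to recover \(\Sigma x\) and \(\Sigma y\), subtract the omitted point, and note that \(\Sigma xy\) loses only \(38 \times 0 = 0\). The Python sketch below does this and then takes \(S_{xx}\) and \(S_{yy}\) as given in the question; it is an illustration under those assumptions, not a mark-scheme solution.

```python
import math

# Edexcel S1 2019 January Q6: summary for all 16 students
n_all = 16
mean_x_all, mean_y_all = 35.75, 25.75
sum_xy_all = 15837

# Recover the raw totals from the means, then remove the point (38, 0)
n = n_all - 1
sum_x = mean_x_all * n_all - 38     # 572 - 38 = 534
sum_y = mean_y_all * n_all - 0      # 412 - 0 = 412
sum_xy = sum_xy_all - 38 * 0        # the removed point contributes 38 * 0 = 0

s_xy = sum_xy - sum_x * sum_y / n
s_xx, s_yy = 965.6, 1561.7          # values given in the question (1 d.p.)

r = s_xy / math.sqrt(s_xx * s_yy)
b = s_xy / s_xx                     # regression of y on x: y = a + b x
a = sum_y / n - b * sum_x / n
print(f"x̄ = {sum_x / n:.2f}, ȳ = {sum_y / n:.2f}, Sxy = {s_xy:.1f}, r = {r:.3f}")
print(f"y = {a:.2f} + {b:.3f}x")
```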
OCR S1 2010 January Q3
3 The heights, \(h \mathrm {~m}\), and weights, \(m \mathrm {~kg}\), of five men were measured. The results are plotted on the diagram.
\includegraphics[max width=\textwidth, alt={}, center]{5c25d6cf-2c23-4b49-88fb-e4abe6c281e4-3_738_956_386_593}
The results are summarised as follows. $$n = 5 \quad \Sigma h = 9.02 \quad \Sigma m = 377.7 \quad \Sigma h ^ { 2 } = 16.382 \quad \Sigma m ^ { 2 } = 28558.67 \quad \Sigma h m = 681.612$$
  1. Use the summarised data to calculate the value of the product moment correlation coefficient, \(r\).
  2. Comment on your value of \(r\) in relation to the diagram.
  3. It was decided to re-calculate the value of \(r\) after converting the heights to feet and the masses to pounds. State what effect, if any, this will have on the value of \(r\).
  4. One of the men had height 1.63 m and mass 78.4 kg. The data for this man were removed and the value of \(r\) was re-calculated using the original data for the remaining four men. State in general terms what effect, if any, this will have on the value of \(r\).
OCR S1 2011 June Q1
1 Five salesmen from a certain firm were selected at random for a survey. For each salesman, the annual income, \(x\) thousand pounds, and the distance driven last year, \(y\) thousand miles, were recorded. The results were summarised as follows. $$n = 5 \quad \Sigma x = 251 \quad \Sigma x ^ { 2 } = 14323 \quad \Sigma y = 65 \quad \Sigma y ^ { 2 } = 855 \quad \Sigma x y = 3247$$
  1. (a) Show that the product moment correlation coefficient, \(r\), between \(x\) and \(y\) is \(-0.122\), correct to 3 significant figures.
    (b) State what this value of \(r\) shows about the relationship between annual income and distance driven last year for these five salesmen.
    (c) It was decided to recalculate \(r\) with the distances measured in kilometres instead of miles. State what effect, if any, this would have on the value of \(r\).
  2. Another salesman from the firm is selected at random. His annual income is known to be \(\pounds 52000\), but the distance that he drove last year is unknown. In order to estimate this distance, a regression line based on the above data is used. Comment on the reliability of such an estimate.
OCR S1 2015 June Q1
1 For the top 6 clubs in the 2010/11 season of the English Premier League, the table shows the annual salary, \(\pounds x\) million, of the highest paid player and the number of points scored, \(y\).
| Club | Manchester United | Manchester City | Chelsea | Arsenal | Tottenham | Liverpool |
|---|---|---|---|---|---|---|
| \(x\) | 5.6 | 7.4 | 6.5 | 4.1 | 3.6 | 6.5 |
| \(y\) | 80 | 71 | 71 | 68 | 62 | 58 |
$$n = 6 \quad \sum x = 33.7 \quad \sum x ^ { 2 } = 200.39 \quad \sum y = 410 \quad \sum y ^ { 2 } = 28314 \quad \sum x y = 2313.9$$
  1. Use a suitable formula to calculate the product moment correlation coefficient, \(r\), between \(x\) and \(y\), showing that \(0 < r < 0.2\).
  2. State what this value of \(r\) shows in this context.
  3. A fan suggests that the data should be used to draw a regression line in order to estimate the number of points that would be scored by another Premier League club, whose highest paid player's salary is \(\pounds 1.7\) million. Give two reasons why such an estimate would be unlikely to be reliable.
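When a question supplies both the raw table and the sums, it can be worth confirming that they agree before relying on them. This Python sketch rebuilds the OCR S1 2015 summary statistics from the table and then checks the stated bound \(0 < r < 0.2\); the variable names are illustrative.

```python
import math

# OCR S1 2015 June Q1: top salary x (£ million) and points y for the top 6 clubs
x = [5.6, 7.4, 6.5, 4.1, 3.6, 6.5]
y = [80, 71, 71, 68, 62, 58]
n = len(x)

# Rebuild the summary statistics from the table as a cross-check
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(u * v for u, v in zip(x, y))

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"Σx = {sum_x:.1f}, Σx² = {sum_x2:.2f}, Σxy = {sum_xy:.1f}, r = {r:.3f}")
```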
OCR Further Statistics AS 2024 June Q3
3 The ages, \(x\) years, and the reaction time, \(t\) seconds, in an experiment carried out on a sample of 15 volunteers are summarised as follows.
\(n = 15 \quad \sum x = 762 \quad \sum t = 8.7 \quad \sum x ^ { 2 } = 44204 \quad \sum t ^ { 2 } = 5.65 \quad \sum x t = 490.1\)
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(t\).
  2. Calculate the equation of the line of regression of \(t\) on \(x\). Give your answer in the form \(t = a + b x\), where \(a\) and \(b\) are constants to be determined.
  3. Explain the relevance of the quantity \(\sum ( t - a - b x ) ^ { 2 }\) to your answer to part (b).
  4. Estimate the reaction time, in seconds, for a volunteer aged 42.
    It is subsequently decided to measure the reaction time in tenths of a second rather than in seconds (so, for example, a time of 0.6 seconds would now be recorded as 6).
    1. State what effect, if any, this change would have on your answer to part (a).
    2. State what effect, if any, this change would have on your answer to part (b).
    It is known that the sample of 15 volunteers consisted almost entirely of students and retired people.
  5. Using this information, and the value of the product moment correlation coefficient, comment on the reliability of your estimate in part (d).
Edexcel S1 Specimen Q1
  1. Gary compared the total attendance, \(x\), at home matches and the total number of goals, \(y\), scored at home during a season for each of 12 football teams playing in a league. He correctly calculated:
$$S _ { x x } = 1022500 \quad S _ { y y } = 130.9 \quad S _ { x y } = 8825$$
  1. Calculate the product moment correlation coefficient for these data.
  2. Interpret the value of the correlation coefficient.
    Helen was given the same data to analyse. In view of the large numbers involved she decided to divide the attendance figures by 100. She then calculated the product moment correlation coefficient between \(\frac { x } { 100 }\) and \(y\).
  3. Write down the value Helen should have obtained.
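Helen's coding multiplies every \(x\) by \(1/100\), so \(S_{xx}\) is scaled by \(1/100^2\), \(S_{xy}\) by \(1/100\) and \(S_{yy}\) is untouched; the factors cancel in \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\). A short Python check of that cancellation, using Gary's values (the function name `pmcc` is hypothetical):

```python
import math


def pmcc(s_xx, s_yy, s_xy):
    """r = Sxy / sqrt(Sxx * Syy)."""
    return s_xy / math.sqrt(s_xx * s_yy)


# Edexcel S1 Specimen Q1: Gary's corrected sums (x = attendance, y = home goals)
s_xx, s_yy, s_xy = 1022500, 130.9, 8825
r = pmcc(s_xx, s_yy, s_xy)

# Helen's coding x' = x / k with k = 100:
# Sx'x' = Sxx / k^2, Sx'y = Sxy / k, Syy unchanged
k = 100
r_helen = pmcc(s_xx / k ** 2, s_yy, s_xy / k)
print(f"Gary: r = {r:.3f}, Helen: r = {r_helen:.3f}")
```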
Edexcel S1 2007 June Q1
  1. A young family were looking for a new 3 bedroom semi-detached house. A local survey recorded the price \(x\), in \(\pounds 1000\), and the distance \(y\), in miles, from the station of such houses. The following summary statistics were provided
$$S _ { x x } = 113573 , \quad S _ { y y } = 8.657 , \quad S _ { x y } = - 808.917$$
  1. Use these values to calculate the product moment correlation coefficient.
  2. Give an interpretation of your answer to part (a).
    Another family asked for the distances to be measured in km rather than miles.
  3. State the value of the product moment correlation coefficient in this case.
Edexcel S1 2012 June Q2
2. A bank reviews its customer records at the end of each month to find out how many customers have become unemployed, \(u\), and how many have had their house repossessed, \(h\), during that month. The bank codes the data using variables \(x = \frac { u - 100 } { 3 }\) and \(y = \frac { h - 20 } { 7 }\). The results for the 12 months of 2009 are summarised below. $$\sum x = 477 \quad S _ { x x } = 5606.25 \quad \sum y = 480 \quad S _ { y y } = 4244 \quad \sum x y = 23070$$
  1. Calculate the value of the product moment correlation coefficient for \(x\) and \(y\).
  2. Write down the product moment correlation coefficient for \(u\) and \(h\).
    The bank claims that an increase in unemployment among its customers is associated with an increase in house repossessions.
  3. State, with a reason, whether or not the bank's claim is supported by these data.
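Here the summary is mixed: \(S_{xx}\) and \(S_{yy}\) are given directly, but \(S_{xy}\) must still be formed from \(\Sigma xy\), \(\Sigma x\) and \(\Sigma y\) with \(n = 12\). A Python sketch with illustrative names; because the coding only subtracts constants and divides by the positive numbers 3 and 7, the same value of \(r\) applies to \(u\) and \(h\).

```python
import math

# Edexcel S1 2012 June Q2: coded x = (u - 100) / 3 and y = (h - 20) / 7, n = 12 months
n = 12
sum_x, s_xx = 477, 5606.25
sum_y, s_yy = 480, 4244
sum_xy = 23070

# Sxy is not given, so build it from Σxy, Σx and Σy
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / math.sqrt(s_xx * s_yy)

# The coding subtracts constants and divides by positive numbers (3 and 7),
# so this value of r is also the correlation between u and h.
print(f"Sxy = {s_xy:.0f}, r = {r:.3f}")
```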
Edexcel S1 2016 June Q3
3. Before going on holiday to Seapron, Tania records the weekly rainfall (\(x\) mm) at Seapron for 8 weeks during the summer. Her results are summarised as $$\sum x = 86.8 \quad \sum x ^ { 2 } = 985.88$$
  1. Find the standard deviation, \(\sigma _ { x }\), for these data.
    (3)
    Tania also records the number of hours of sunshine (\(y\) hours) per week at Seapron for these 8 weeks and obtains the following $$\bar { y } = 58 \quad \sigma _ { y } = 9.461 \text { (correct to } 4 \text { significant figures) } \quad \sum x y = 4900.5$$
  2. Show that \(\mathrm { S } _ { y y } = 716\) (correct to 3 significant figures)
  3. Find \(\mathrm { S } _ { x y }\)
  4. Calculate the product moment correlation coefficient, \(r\), for these data.
    During Tania's week-long holiday at Seapron there are 14 mm of rain and 70 hours of sunshine.
  5. State, giving a reason, what the effect of adding this information to the above data would be on the value of the product moment correlation coefficient.
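This question works partly from standard deviations, so \(S_{yy}\) has to be recovered as \(n\sigma_y^2\) (the divisor-\(n\) convention, which is what makes \(S_{yy} = 716\) come out) and \(\Sigma y\) as \(n\bar{y}\). A Python sketch of parts 1 to 4 under that convention; variable names are illustrative.

```python
import math

# Edexcel S1 2016 June Q3: n = 8 weeks, x = rainfall (mm), y = sunshine (hours)
n = 8
sum_x, sum_x2 = 86.8, 985.88
mean_y, sigma_y = 58, 9.461
sum_xy = 4900.5

# Standard deviation with divisor n: σ² = Σx²/n - x̄²
sigma_x = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)

# Convert σ_y and ȳ back into the quantities needed for r
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = n * sigma_y ** 2          # because σ_y² = Syy / n under this convention
sum_y = n * mean_y
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"σx = {sigma_x:.3f}, Syy = {s_yy:.1f}, Sxy = {s_xy:.1f}, r = {r:.3f}")
```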
AQA S1 2008 January Q2
2 The head and body length, \(x\) millimetres, and tail length, \(y\) millimetres, of each of a sample of 20 adult dormice were measured. The following statistics are derived from the results. $$S _ { x x } = 1280.55 \quad S _ { y y } = 281.8 \quad S _ { x y } = 416.3$$
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
  2. Interpret your value in the context of this question.
  3. Write down the value of the product moment correlation coefficient if the measurements had been recorded in centimetres.
  4. Give a reason why it is not generally advisable to calculate the value of the product moment correlation coefficient without first viewing a scatter diagram of the data. Illustrate your answer with a sketch.
AQA S1 2012 June Q1
1 A production line in a rolling mill produces lengths of steel.
A random sample of 20 lengths of steel from the production line was selected. The minimum width, \(x\) centimetres, and the minimum thickness, \(y\) millimetres, of each selected length were recorded. The following summarised information was then calculated from these records. $$S _ { x x } = 2.030 \quad S _ { y y } = 1.498 \quad S _ { x y } = - 0.410$$
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
  2. Interpret your value in the context of the question.
AQA S1 2014 June Q4
4 Every year, usually during early June, the Isle of Man hosts motorbike races. Each race consists of three consecutive laps of the island's course. To compete in a race, a rider must first complete at least one qualifying lap. The data refer to the lightweight motorbike class in 2012 and show, for each of a random sample of 10 riders, values of $$u = x - 100 \quad \text { and } \quad v = y - 100$$ where
\(x\) denotes the average speed, in mph, for the rider's fastest qualifying lap and
\(y\) denotes the average speed, in mph, for the rider's three laps of the race.
| Rider | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| \(u\) | 7.88 | 13.02 | 4.29 | 2.88 | 6.26 | 7.03 | 3.60 | 11.78 | 13.15 | 11.69 |
| \(v\) | 6.63 | 10.16 | 3.63 | 0.47 | 5.70 | 8.01 | 3.30 | 7.31 | 13.08 | 11.82 |
    1. Calculate the value of \(r _ { u v }\), the product moment correlation coefficient between \(u\) and \(v\).
    2. Hence state the value of \(r _ { x y }\), giving a reason for your answer.
  1. Interpret your value of \(r _ { x y }\) in the context of this question.
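Part 2 relies on the fact that subtracting 100 from every value shifts the data without changing \(S_{uu}\), \(S_{vv}\) or \(S_{uv}\). The Python sketch below (with a hypothetical helper `pmcc`) computes \(r_{uv}\) from the tabulated values and repeats the calculation on \(x = u + 100\), \(y = v + 100\) to show that the two agree up to floating-point rounding.

```python
import math


def pmcc(xs, ys):
    """Pearson's r built from the summary sums of two equal-length lists."""
    n = len(xs)
    s_xx = sum(v * v for v in xs) - sum(xs) ** 2 / n
    s_yy = sum(v * v for v in ys) - sum(ys) ** 2 / n
    s_xy = sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)


# AQA S1 2014 June Q4: u = x - 100 and v = y - 100 for riders A to J
u = [7.88, 13.02, 4.29, 2.88, 6.26, 7.03, 3.60, 11.78, 13.15, 11.69]
v = [6.63, 10.16, 3.63, 0.47, 5.70, 8.01, 3.30, 7.31, 13.08, 11.82]

r_uv = pmcc(u, v)
# Undo the coding: adding 100 shifts every value but leaves the spread alone
r_xy = pmcc([val + 100 for val in u], [val + 100 for val in v])
print(f"r_uv = {r_uv:.3f}, r_xy = {r_xy:.3f}")
```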
Edexcel S1 Q5
5. The following marks out of 50 were given by two judges to the contestants in a talent contest:
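| Contestant | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
|---|---|---|---|---|---|---|---|---|
| Judge 1 (\(x\)) | 43 | 32 | 40 | 21 | 47 | 11 | 29 | 38 |
| Judge 2 (\(y\)) | 39 | 25 | 40 | 22 | 36 | 13 | 27 | 32 |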
Given that \(\sum x = 261 , \sum x ^ { 2 } = 9529\) and \(\sum x y = 8373\),
  1. calculate the product-moment correlation coefficient between the two judges' marks. \section*{STATISTICS 1 (A)TEST PAPER 6 Page 2} 5 continued...
  2. Find an equation of the regression line of \(x\) on \(y\).
    Contestant \(I\) was awarded 45 marks by Judge 2.
  3. Estimate the mark that this contestant would have received from Judge 1.
  4. Comment, with explanation, on the probable accuracy of your answer.
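Two details distinguish this question: \(\Sigma y\) and \(\Sigma y^2\) are not given and must be read off the table, and the line required is \(x\) on \(y\), whose gradient is \(S_{xy}/S_{yy}\) rather than \(S_{xy}/S_{xx}\). A Python sketch with illustrative names; with these figures the estimate comes out at about 51.9, above the 50-mark maximum and based on a Judge 2 mark (45) outside the observed range of 13 to 40, which bears on part 4.

```python
# Edexcel S1 Q5 (talent contest): regression of x (Judge 1) on y (Judge 2)
y = [39, 25, 40, 22, 36, 13, 27, 32]   # Judge 2 marks from the table
n = len(y)

# Σx, Σx² and Σxy are given in the question; Σy and Σy² come from the table
sum_x, sum_x2, sum_xy = 261, 9529, 8373
sum_y = sum(y)
sum_y2 = sum(v * v for v in y)

s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Line of x on y: x = c + d*y, with d = Sxy / Syy (note Syy, not Sxx)
d = s_xy / s_yy
c = sum_x / n - d * sum_y / n

y_new = 45
x_estimate = c + d * y_new
print(f"x = {c:.2f} + {d:.3f}y, estimate for y = {y_new}: {x_estimate:.1f}")
```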
Edexcel S1 Q2
2. A tennis coach believes that taller players are generally capable of hitting faster serves. To investigate this hypothesis he collects data on the 20 adult male players he coaches. The height, \(h\), in metres and the speed of each player's fastest serve, \(v\), in miles per hour were recorded and summarised as follows: $$\Sigma h = 36.22 , \quad \Sigma v = 2275 , \quad \Sigma h ^ { 2 } = 65.7396 , \quad \Sigma v ^ { 2 } = 259853 , \quad \Sigma h v = 4128.03 .$$
  1. Calculate the product moment correlation coefficient for these data.
  2. Comment on the coach's hypothesis.
Edexcel S1 Q2
2. A supermarket manager believes that those of her staff on lower rates of pay tend to work more hours of overtime.
  1. Suggest why this might be the case.
    To investigate her theory the manager recorded the number of hours of overtime, \(h\), worked by each of the store's 18 full-time staff during one week. She also recorded each employee's hourly rate of pay, \(\pounds p\), and summarised her results as follows: $$\Sigma p = 86 , \quad \Sigma h = 104.5 , \quad \Sigma p ^ { 2 } = 420.58 , \quad \Sigma h ^ { 2 } = 830.25 , \quad \Sigma p h = 487.3$$
  2. Calculate the product moment correlation coefficient for these data.
  3. Comment on the manager's hypothesis.
Edexcel S1 Q2
2. A statistics student gave a questionnaire to a random sample of 50 pupils at his school. The sample included pupils aged from 11 to 18 years old. The student summarised the data on age in completed years, \(A\), and the number of hours spent doing homework in the previous week, \(H\), giving the following: $$\Sigma A = 703 , \quad \Sigma H = 217 , \quad \Sigma A ^ { 2 } = 10131 , \quad \Sigma H ^ { 2 } = 1338.5 , \quad \Sigma A H = 3253.5$$
  1. Calculate the product moment correlation coefficient for these data and explain what is shown by your result.
    (6 marks)
    The student also asked each pupil how many hours of paid work they had done in the previous week. He then calculated the product moment correlation coefficient for the data on hours doing homework and hours doing paid work, giving a value of \(r = 0.5213\) The student concluded that paid work did not interfere with homework as pupils doing more paid work also tended to do more homework.
  2. Explain why this conclusion may not be valid.
  3. Explain briefly how the student could more effectively investigate the effect of paid work on homework.
    (2 marks)
Edexcel S1 Q1
  1. A shop recorded the number of pairs of gloves, \(n\), that it sold and the average daytime temperature, \(T ^ { \circ } \mathrm { C }\), for each month over a 12-month period.
The data was then summarised as follows: $$\Sigma T = 124 , \quad \Sigma n = 384 , \quad \Sigma T ^ { 2 } = 1802 , \quad \Sigma n ^ { 2 } = 18518 , \quad \Sigma T n = 2583 .$$
  1. Calculate the product moment correlation coefficient for these data.
  2. Comment on what your value shows and suggest a reason for this.
OCR MEI Further Statistics Minor 2019 June Q6
6 The discrete random variable \(X\) has a uniform distribution over \(\{ n , n + 1 , \ldots , 2 n \}\).
  1. Given that \(n\) is odd, find \(\mathrm { P } \left( X < \frac { 3 } { 2 } n \right)\).
  2. Given instead that \(n\) is even, find \(\mathrm { P } \left( X < \frac { 3 } { 2 } n \right)\), giving your answer as a single algebraic fraction.
  3. The sum of 6 independent values of \(X\) is denoted by \(Y\). Find \(\operatorname { Var } ( Y )\).
Edexcel FS2 AS 2018 June Q1
  1. The scores achieved on a maths test, \(m\), and the scores achieved on a physics test, \(p\), by 16 students are summarised below.
$$\sum m = 392 \quad \sum p = 254 \quad \sum p ^ { 2 } = 4748 \quad \mathrm {~S} _ { m m } = 1846 \quad \mathrm {~S} _ { m p } = 1115$$
  1. Find the product moment correlation coefficient between \(m\) and \(p\)
  2. Find the equation of the linear regression line of \(p\) on \(m\).
    Figure 1 shows a plot of the residuals.
    \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{0fcb4d83-9763-4edd-8006-93f75a44c596-02_808_1222_997_429} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure}
  3. Calculate the residual sum of squares (RSS).
    For the person who scored 30 marks on the maths test,
  4. find the score on the physics test.
    The data for the person who scored 20 on the maths test is removed from the data set.
  5. Suggest a reason why.
    The product moment correlation coefficient between \(m\) and \(p\) is now recalculated for the remaining 15 students.
  6. Without carrying out any further calculations, suggest how you would expect this recalculated value to compare with your answer to part (a).
    Give a reason for your answer.
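Since \(S_{pp}\) is not supplied, it has to be built from \(\Sigma p^2\) and \(\Sigma p\); the residual sum of squares then follows as \(\mathrm{RSS} = S_{pp} - S_{mp}^2/S_{mm}\), equivalently \(S_{pp}(1 - r^2)\). A Python sketch of parts 1 to 3, with illustrative names:

```python
import math

# Edexcel FS2 AS 2018 June Q1: n = 16, m = maths score, p = physics score
n = 16
sum_m, sum_p, sum_p2 = 392, 254, 4748
s_mm, s_mp = 1846, 1115

s_pp = sum_p2 - sum_p ** 2 / n          # not given directly, so rebuild it
r = s_mp / math.sqrt(s_mm * s_pp)

b = s_mp / s_mm                         # regression of p on m: p = a + b m
a = sum_p / n - b * sum_m / n

# Residual sum of squares for the least-squares line
rss = s_pp - s_mp ** 2 / s_mm
print(f"r = {r:.3f}, p = {a:.3f} + {b:.3f}m, RSS = {rss:.2f}")
```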
Edexcel S1 Q2
  1. Plot a scatter diagram showing these data.
    The student wanted to investigate further whether or not her data provided evidence of an increase in temperature in June each year. Using \(Y\) for the number of years since 1993 and \(T\) for the mean temperature, she calculated the following summary statistics. $$\Sigma Y = 28 , \quad \Sigma T = 182.5 , \quad \Sigma Y ^ { 2 } = 140 , \quad \Sigma T ^ { 2 } = 4173.93 , \quad \Sigma Y T = 644.7 .$$
  2. Calculate the product moment correlation coefficient for these data.
  3. Comment on your result in relation to the student's enquiry.
SPS SPS FM Statistics 2021 January Q3
3. A large field of wheat is split into 8 plots of equal area. Each plot is treated with a different amount of fertiliser, \(f\) grams \(/ \mathrm { m } ^ { 2 }\). The yield of wheat, \(w\) tonnes, from each plot is recorded. The results are summarised below. $$\sum f = 28 \quad \sum w = 303 \quad \sum w ^ { 2 } = 13447 \quad \mathrm {~S} _ { f f } = 42 \quad \mathrm {~S} _ { f w } = 269.5$$
  1. Calculate the product moment correlation coefficient between \(f\) and \(w\)
  2. Interpret the value of your product moment correlation coefficient.
  3. Find the equation of the regression line of \(w\) on \(f\) in the form \(w = a + b f\)
  4. Using your equation, estimate the decrease in yield when the amount of fertiliser decreases by 0.5 grams \(/ \mathrm { m } ^ { 2 }\)
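For part 4, only the gradient matters: on the line \(w = a + bf\), a change of \(\Delta f\) in fertiliser changes the predicted yield by \(b\,\Delta f\). A Python sketch of the whole question, using \(S_{ww}\) rebuilt from \(\Sigma w^2\) and \(\Sigma w\); names are illustrative.

```python
import math

# SPS FM Statistics 2021 January Q3: n = 8 plots
n = 8
sum_f, sum_w, sum_w2 = 28, 303, 13447
s_ff, s_fw = 42, 269.5

s_ww = sum_w2 - sum_w ** 2 / n
r = s_fw / math.sqrt(s_ff * s_ww)

b = s_fw / s_ff                   # tonnes of yield per 1 g/m² of fertiliser
a = sum_w / n - b * sum_f / n

# A change of delta_f in fertiliser moves the predicted yield by b * delta_f
delta_f = 0.5
decrease = b * delta_f
print(f"r = {r:.3f}, w = {a:.2f} + {b:.2f}f, decrease ≈ {decrease:.2f} tonnes")
```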
SPS SPS FM Statistics 2023 January Q3
3. A large field of wheat is split into 8 plots of equal area. Each plot is treated with a different amount of fertiliser, \(f\) grams \(/ \mathrm { m } ^ { 2 }\). The yield of wheat, \(w\) tonnes, from each plot is recorded. The results are summarised below. $$\sum f = 28 \quad \sum w = 303 \quad \sum w ^ { 2 } = 13447 \quad \mathrm {~S} _ { f f } = 42 \quad \mathrm {~S} _ { f w } = 269.5$$
  1. Calculate the product moment correlation coefficient between \(f\) and \(w\)
  2. Interpret the value of your product moment correlation coefficient.
  3. Find the equation of the regression line of \(w\) on \(f\) in the form \(w = a + b f\)
SPS SPS FM Statistics 2026 January Q3
4 marks
3. A student is investigating the relationship between different electricity generation methods and cost of electricity in a particular country. The student first checks whether there is any correlation between the cost per unit of electricity, \(x\) euros, and the amount of electricity being generated by wind, \(y \mathrm { GW }\). The data from 30 observations are summarised as follows.
\(n = 30 \quad \sum x = 2.219 \quad \sum y = 357.7 \quad \sum x ^ { 2 } = 0.2368 \quad \sum y ^ { 2 } = 4648 \quad \sum x y = 25.01\)
  1. In this question you must show detailed reasoning. Determine the product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) level to investigate whether there is any correlation between the cost per unit of electricity and the amount of electricity generated by wind.
    [4]
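A sketch of both parts in Python. Part 2 would normally compare \(|r|\) with a tabulated critical value; here the sketch instead uses the equivalent statistic \(t = r\sqrt{(n-2)/(1-r^2)}\), which under \(H_0: \rho = 0\) (assuming bivariate normality) has a \(t\)-distribution with \(n - 2 = 28\) degrees of freedom. The critical value 2.048 is hard-coded from standard \(t\)-tables as an assumption, and is also converted back to a critical value of \(r\) for comparison.

```python
import math

# SPS FM Statistics 2026 January Q3: x = cost per unit (euros), y = wind generation (GW)
n = 30
sum_x, sum_y = 2.219, 357.7
sum_x2, sum_y2, sum_xy = 0.2368, 4648, 25.01

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / math.sqrt(s_xx * s_yy)

# Two-tailed test of H0: rho = 0 against H1: rho != 0 at the 5% level
T_CRIT = 2.048  # assumed upper 2.5% point of t with n - 2 = 28 df (from tables)
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
r_crit = T_CRIT / math.sqrt(T_CRIT ** 2 + n - 2)  # same cut-off on the r scale

reject_h0 = abs(t_stat) > T_CRIT
print(f"r = {r:.4f}, t = {t_stat:.3f}, critical r = {r_crit:.3f}, reject H0: {reject_h0}")
```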
OCR Further Statistics 2018 September Q7
7 The table shows the values of 5 observations of bivariate data \(( x , y )\).
| \(x\) | 4.6 | 5.9 | 6.5 | 7.8 | 8.3 |
|---|---|---|---|---|---|
| \(y\) | 15.6 | 10.8 | 10.4 | 10.1 | 9.7 |
$$n = 5 , \Sigma x = 33.1 , \Sigma y = 56.6 , \Sigma x ^ { 2 } = 227.95 , \Sigma y ^ { 2 } = 664.26 , \Sigma x y = 362.37$$
  1. Calculate Pearson's product-moment correlation coefficient \(r\) for the data.
  2. State what this value of \(r\) tells you about a scatter diagram illustrating the data.
  3. Test at the \(5 \%\) significance level whether there is association between \(x\) and \(y\).
  4. State the value of Spearman's rank correlation coefficient \(r _ { s }\) for the data.
  5. State whether \(r\), \(r _ { s }\), both or neither is changed when the values of \(x\) are replaced by
    (a) \(3 x - 2\),
    (b) \(\sqrt { x }\).
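Part 5 turns on which coefficients survive a transformation of \(x\). The Python sketch below (helper names hypothetical; the ranking assumes no tied values, which holds for these data) computes \(r\) and \(r_s\) for \(x\), \(3x - 2\) and \(\sqrt{x}\). Because \(3x - 2\) is an increasing linear map, \(r\) should be unchanged by it, whereas \(\sqrt{x}\) is increasing but non-linear, so only \(r_s\) is guaranteed to survive that one.

```python
import math


def pearson(xs, ys):
    """Pearson's r from the corrected sums Sxx, Syy, Sxy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    s_xx = sum((a - mx) ** 2 for a in xs)
    s_yy = sum((b - my) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def ranks(values):
    """Rank from 1 (smallest) upwards; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(xs, ys):
    """Spearman's r_s as Pearson's r applied to the ranks (valid without ties)."""
    return pearson(ranks(xs), ranks(ys))


# OCR Further Statistics 2018 September Q7
x = [4.6, 5.9, 6.5, 7.8, 8.3]
y = [15.6, 10.8, 10.4, 10.1, 9.7]

for label, new_x in [("x", x),
                     ("3x - 2", [3 * v - 2 for v in x]),
                     ("sqrt(x)", [math.sqrt(v) for v in x])]:
    print(f"{label:8}  r = {pearson(new_x, y):.4f}  r_s = {spearman(new_x, y):.1f}")
```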