5.08a Pearson correlation: calculate PMCC

246 questions

Pre-U 9794/3 2013 June Q3
12 marks Moderate -0.8
At a local athletics club, data on the ages of the members and their times to run a 10 km course are recorded. For a random sample of 25 club members aged between 20 and 60, their ages (\(x\) years) and times (\(y\) minutes) are summarised as follows. $$n = 25 \quad \Sigma x = 1002 \quad \Sigma x ^ { 2 } = 43508 \quad \Sigma y = 1865 \quad \Sigma y ^ { 2 } = 142749 \quad \Sigma x y = 77532$$
  1. Calculate the product moment correlation coefficient for these data.
  2. Show that the equation of the least squares regression line of \(y\) on \(x\) is \(y = 0.83 x + 41.28\), where the coefficients are given correct to 2 decimal places.
  3. Use the equation given in part 2 to estimate the time taken by someone who is (i) 50 years old, (ii) 65 years old. Comment on the validity of each of these estimates.
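A minimal Python sketch for checking the arithmetic: it forms the corrected sums \(S_{xx}\), \(S_{yy}\), \(S_{xy}\) from the summary statistics, then applies \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\) and \(b = S_{xy}/S_{xx}\). The helper `corrected_sum` is our own illustrative name, not a library function.

```python
import math

def corrected_sum(n, sum_a, sum_b, sum_ab):
    # S_ab = sum(ab) - sum(a) * sum(b) / n
    return sum_ab - sum_a * sum_b / n

n, sx, sx2, sy, sy2, sxy = 25, 1002, 43508, 1865, 142749, 77532

s_xx = corrected_sum(n, sx, sx, sx2)
s_yy = corrected_sum(n, sy, sy, sy2)
s_xy = corrected_sum(n, sx, sy, sxy)

r = s_xy / math.sqrt(s_xx * s_yy)   # product moment correlation coefficient
b = s_xy / s_xx                     # gradient of y on x
a = sy / n - b * sx / n             # intercept: y-bar minus b times x-bar

print(f"r = {r:.3f}, y = {b:.2f}x + {a:.2f}")
```

With these sums the sketch gives \(r \approx 0.799\) and reproduces \(y = 0.83x + 41.28\) to 2 decimal places.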
Pre-U 9794/3 2018 June Q2
9 marks Moderate -0.3
A teacher is monitoring the progress of students. The length of time, \(x\) hours, spent revising in a given week is compared to the score, \(y\), achieved in an assessment at the end of the week. The scatter diagram for a random sample of 8 students is shown below.
[Figure: scatter diagram of \(y\) against \(x\) for the 8 students]
The data are summarised as \(\Sigma x = 24.6 , \Sigma y = 404 , \Sigma x ^ { 2 } = 105.56 , \Sigma y ^ { 2 } = 20820\) and \(\Sigma x y = 1350.2\).
  1. Find the equation of the least squares regression line of \(y\) on \(x\).
  2. Calculate the product moment correlation coefficient for the data.
  3. A ninth student, Jane, revises for 1.5 hours.
    1. Estimate her score in the assessment.
    2. Comment on the reliability of this estimate.
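The same corrected-sum approach, sketched in Python for the revision data. Here `jane` is simply the fitted line evaluated at \(x = 1.5\); whether that estimate is reliable still depends on where 1.5 hours sits relative to the plotted data.

```python
import math

n = 8
sx, sy, sx2, sy2, sxy = 24.6, 404, 105.56, 20820, 1350.2

s_xx = sx2 - sx**2 / n
s_yy = sy2 - sy**2 / n
s_xy = sxy - sx * sy / n

b = s_xy / s_xx                      # gradient of y on x
a = sy / n - b * sx / n              # intercept
r = s_xy / math.sqrt(s_xx * s_yy)    # product moment correlation coefficient

jane = a + b * 1.5                   # estimated score for 1.5 hours of revision
```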
CAIE FP2 2010 June Q9
9 marks Moderate -0.3
A set of \(20\) pairs of bivariate data \((x, y)\) is summarised by $$\Sigma x = 200, \quad \Sigma x^2 = 2125, \quad \Sigma y = 240, \quad \Sigma y^2 = 8245.$$ The product moment correlation coefficient is \(-0.992\).
  1. What does the value of the product moment correlation coefficient indicate about a scatter diagram of the data points? [1]
  2. Find the equation of the regression line of \(y\) on \(x\). [6]
  3. The equation of the regression line of \(x\) on \(y\) is \(x = a' + b'y\). Find the value of \(b'\). [2]
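Here \(\Sigma xy\) is not given, so one possible route is to recover \(S_{xy} = r\sqrt{S_{xx}S_{yy}}\) first. A Python sketch of that route (variable names are our own). Since the two gradients are \(S_{xy}/S_{xx}\) and \(S_{xy}/S_{yy}\), their product equals \(r^2\), which gives a quick check.

```python
import math

n = 20
sx, sx2, sy, sy2 = 200, 2125, 240, 8245
r = -0.992

s_xx = sx2 - sx**2 / n               # 125
s_yy = sy2 - sy**2 / n               # 5365
s_xy = r * math.sqrt(s_xx * s_yy)    # Sxy is not given, so recover it from r

b = s_xy / s_xx                      # gradient of y on x
a = sy / n - b * sx / n              # intercept of y on x
b_prime = s_xy / s_yy                # gradient of x on y
```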
CAIE FP2 2017 June Q10
11 marks Standard +0.3
A random sample of 5 pairs of values \((x, y)\) is given in the following table.
| \(x\) | 1 | 2 | 4 | 5 | 8 |
|---|---|---|---|---|---|
| \(y\) | 7 | 5 | 8 | 6 | 4 |
  1. Find, showing all necessary working, the equation of the regression line of \(y\) on \(x\). [4]
  2. Find, showing all necessary working, the value of the product moment correlation coefficient for this sample. [3]
  3. Test, at the 10% significance level, whether there is evidence of non-zero correlation between the variables. [4]
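For raw data, a Python sketch can compute the same quantities in deviation form, \(S_{xy} = \sum (x-\bar{x})(y-\bar{y})\), which is algebraically equal to \(\sum xy - \frac{\sum x \sum y}{n}\). It covers parts 1 and 2; part 3 additionally needs the tabulated critical value for \(n = 5\), which the sketch does not reproduce.

```python
import math

xs = [1, 2, 4, 5, 8]
ys = [7, 5, 8, 6, 4]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = s_xy / s_xx          # gradient of y on x
a = y_bar - b * x_bar    # intercept
r = s_xy / math.sqrt(s_xx * s_yy)
```

For this table the sketch gives \(y = 7.2 - 0.3x\) and \(r \approx -0.520\).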
CAIE FP2 2017 June Q7
6 marks Standard +0.3
A random sample of twelve pairs of values of \(x\) and \(y\) is taken from a bivariate distribution. The equations of the regression lines of \(y\) on \(x\) and of \(x\) on \(y\) are respectively $$y = 0.46x + 1.62 \quad \text{and} \quad x = 0.93y + 8.24.$$
  1. Find the value of the product moment correlation coefficient for this sample. [2]
  2. Using a \(5\%\) significance level, test whether there is non-zero correlation between the variables. [4]
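Since the gradients of the two regression lines are \(S_{xy}/S_{xx}\) and \(S_{xy}/S_{yy}\), their product is \(r^2\). A short Python sketch, taking the sign of \(r\) from the common sign of the two gradients:

```python
import math

b_yx = 0.46   # gradient of y on x
b_xy = 0.93   # gradient of x on y

# r^2 = b_yx * b_xy; r takes the common sign of the two gradients
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
```

This gives \(r \approx 0.654\) for part 1.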
CAIE FP2 2019 June Q10
11 marks Standard +0.3
The values from a random sample of five pairs \((x, y)\) taken from a bivariate distribution are shown below.
| \(x\) | 3 | 4 | 4 | 6 | 8 |
|---|---|---|---|---|---|
| \(y\) | 5 | 7 | \(q\) | 6 | 7 |
The equation of the regression line of \(x\) on \(y\) is given by \(x = \frac{5}{4}y + c\).
  1. Given that \(q\) is an integer, find its value. [5]
  2. Find the value of \(c\). [3]
  3. Find the value of the product moment correlation coefficient. [3]
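One way to find \(q\) is to search integers for the value that makes \(S_{xy}/S_{yy} = \frac{5}{4}\), using exact fractions to avoid rounding. The search range \(-100 \le q \le 100\) is an arbitrary assumption of this sketch; algebraically the condition reduces to \(2q^2 - 23q + 65 = 0\), whose roots are \(5\) and \(6.5\).

```python
import math
from fractions import Fraction

XS = [3, 4, 4, 6, 8]

def sums(q):
    """Return (ys, Sxx, Syy, Sxy) for a trial integer q, as exact Fractions."""
    ys = [5, 7, q, 6, 7]
    n = len(XS)
    sx, sy = sum(XS), sum(ys)
    s_xx = sum(x * x for x in XS) - Fraction(sx * sx, n)
    s_yy = sum(y * y for y in ys) - Fraction(sy * sy, n)
    s_xy = sum(x * y for x, y in zip(XS, ys)) - Fraction(sx * sy, n)
    return ys, s_xx, s_yy, s_xy

# The gradient of x on y is Sxy / Syy; compare Sxy with (5/4) Syy to avoid dividing.
candidates = [q for q in range(-100, 101)
              if sums(q)[3] == Fraction(5, 4) * sums(q)[2]]
q = candidates[0]

ys, s_xx, s_yy, s_xy = sums(q)
x_bar = Fraction(sum(XS), len(XS))
y_bar = Fraction(sum(ys), len(ys))
c = x_bar - Fraction(5, 4) * y_bar        # the x-on-y line passes through (x-bar, y-bar)
r = float(s_xy) / math.sqrt(s_xx * s_yy)
```

This gives \(q = 5\), \(c = -2.5\) and \(r = 0.625\).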
CAIE FP2 2010 November Q10
13 marks Standard +0.3
For each month of a certain year, a weather station recorded the average rainfall per day, \(x\) mm, and the average amount of sunshine per day, \(y\) hours. The results are summarised below. \(n = 12\), \(\Sigma x = 24.29\), \(\Sigma x^2 = 50.146\), \(\Sigma y = 45.8\), \(\Sigma y^2 = 211.16\), \(\Sigma xy = 88.415\).
  1. Find the mean values, \(\bar{x}\) and \(\bar{y}\). [1]
  2. Calculate the gradient of the line of regression of \(y\) on \(x\). [2]
  3. Use the answers to parts (i) and (ii) to obtain the equation of the line of regression of \(y\) on \(x\). [2]
  4. Find the product moment correlation coefficient and comment, in context, on its value. [4]
  5. Stating your hypotheses, test at the 1% level of significance whether there is negative correlation between average rainfall per day and average amount of sunshine per day. [4]
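A Python sketch for parts 1 to 4, assuming the usual corrected-sum formulas. The hypothesis test in part 5 would still compare \(r\) with a tabulated one-tailed critical value for \(n = 12\), which this sketch does not attempt to reproduce.

```python
import math

n = 12
sx, sx2, sy, sy2, sxy = 24.29, 50.146, 45.8, 211.16, 88.415

x_bar, y_bar = sx / n, sy / n          # part 1
s_xx = sx2 - sx**2 / n
s_yy = sy2 - sy**2 / n
s_xy = sxy - sx * sy / n

b = s_xy / s_xx                        # part 2: gradient of y on x
a = y_bar - b * x_bar                  # part 3: intercept
r = s_xy / math.sqrt(s_xx * s_yy)      # part 4
```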
CAIE FP2 2014 November Q9
11 marks Standard +0.3
A random sample of 10 pairs of values of \(x\) and \(y\) is given in the following table.
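| \(x\) | 4 | 6 | 6 | 8 | 2 | 7 | 12 | 14 | 9 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 2 | 4 | 6 | 8 | 6 | 10 | 9 | 8 | 6 | 5 |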
  1. Find the equation of the regression line of \(y\) on \(x\). [4]
  2. Find the product moment correlation coefficient for the sample. [2]
  3. Find the estimated value of \(y\) when \(x = 10\), and comment on the reliability of this estimate. [2]
  4. Another sample of \(N\) pairs of data from the same population has the same product moment correlation coefficient as the first sample given. A test, at the 1% significance level, on this second sample indicates that there is sufficient evidence to conclude that there is positive correlation. Find the set of possible values of \(N\). [3]
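A Python sketch for parts 1 to 3, reading the table as \(x\): 4, 6, 6, 8, 2, 7, 12, 14, 9, 5 and \(y\): 2, 4, 6, 8, 6, 10, 9, 8, 6, 5. Part 4 depends on tabulated critical values, which the sketch does not include.

```python
import math

xs = [4, 6, 6, 8, 2, 7, 12, 14, 9, 5]
ys = [2, 4, 6, 8, 6, 10, 9, 8, 6, 5]
n = len(xs)

s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

b = s_xy / s_xx                        # gradient of y on x
a = sum(ys) / n - b * sum(xs) / n      # intercept
r = s_xy / math.sqrt(s_xx * s_yy)
y_at_10 = a + 10 * b                   # part 3 (x = 10 lies inside the observed range 2 to 14)
```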
CAIE FP2 2018 November Q10
12 marks Standard +0.8
For a random sample of 10 observations of pairs of values \((x, y)\), the equation of the regression line of \(y\) on \(x\) is \(y = 1.1664 + 0.4604x\). It is given that $$\Sigma x^2 = 1419.98 \quad \text{and} \quad \Sigma y^2 = 439.68.$$ The mean value of \(y\) is 6.24.
  1. Find the equation of the regression line of \(x\) on \(y\). [6]
  2. Find the product moment correlation coefficient. [2]
  3. Test at the 5% significance level whether there is evidence of positive correlation between the two variables. [4]
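A Python sketch for parts 1 and 2. The key step is that the regression line of \(y\) on \(x\) passes through \((\bar{x}, \bar{y})\), so \(\bar{x}\) can be recovered from the line and \(\bar{y}\); then \(S_{xy} = b\,S_{xx}\).

```python
import math

n = 10
a, b = 1.1664, 0.4604          # y = a + b x
sum_x2, sum_y2 = 1419.98, 439.68
y_bar = 6.24

x_bar = (y_bar - a) / b        # the regression line passes through (x-bar, y-bar)
s_xx = sum_x2 - n * x_bar**2
s_yy = sum_y2 - n * y_bar**2
s_xy = b * s_xx                # since b = Sxy / Sxx

b_prime = s_xy / s_yy          # gradient of x on y
a_prime = x_bar - b_prime * y_bar
r = s_xy / math.sqrt(s_xx * s_yy)
```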
CAIE FP2 2019 November Q9
10 marks Standard +0.8
A random sample of five pairs of values of \(x\) and \(y\) is taken from a bivariate distribution. The values are shown in the following table, where \(p\) and \(q\) are constants.
| \(x\) | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| \(y\) | 4 | \(p\) | \(q\) | 2 | 1 |
The equation of the regression line of \(y\) on \(x\) is \(y = -0.5x + 3.5\).
  1. Find the values of \(p\) and \(q\). [7]
  2. Find the value of the product moment correlation coefficient. [3]
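A Python sketch of one possible route. The line passes through \((\bar{x}, \bar{y})\), which fixes \(p + q\); the gradient fixes \(S_{xy} = -0.5\,S_{xx}\), hence \(\sum xy\) and so \(2p + 3q\). Exact fractions avoid rounding.

```python
import math
from fractions import Fraction

xs = [1, 2, 3, 4, 5]
n = len(xs)
grad, intercept = Fraction(-1, 2), Fraction(7, 2)   # y = -0.5x + 3.5

x_bar = Fraction(sum(xs), n)
s_xx = sum(x * x for x in xs) - n * x_bar**2
y_bar = grad * x_bar + intercept                    # the line passes through (x-bar, y-bar)

# Known y-values are 4 (x = 1), 2 (x = 4), 1 (x = 5); p sits at x = 2 and q at x = 3.
sum_pq = n * y_bar - (4 + 2 + 1)                    # p + q
sum_xy = grad * s_xx + n * x_bar * y_bar            # Sxy = grad * Sxx, then add back n x-bar y-bar
two_p_three_q = sum_xy - (1 * 4 + 4 * 2 + 5 * 1)    # 2p + 3q
q = two_p_three_q - 2 * sum_pq
p = sum_pq - q

ys = [4, p, q, 2, 1]
s_yy = sum(y * y for y in ys) - n * y_bar**2
r = float(grad * s_xx) / math.sqrt(s_xx * s_yy)
```

This gives \(p = 1\), \(q = 2\) and \(r \approx -0.645\).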
Edexcel S1 2023 June Q2
13 marks Moderate -0.3
Two students, Olive and Shan, collect data on the weight, \(w\) grams, and the tail length, \(t\) cm, of 15 mice. Olive summarised the data as follows: \(S_{tt} = 5.3173\), \(\sum w^2 = 6089.12\), \(\sum tw = 2304.53\), \(\sum w = 297.8\), \(\sum t = 114.8\).
  1. Calculate the value of \(S_{ww}\) and the value of \(S_{tw}\) [3]
  2. Calculate the value of the product moment correlation coefficient between \(w\) and \(t\) [2]
  3. Show that the equation of the regression line of \(w\) on \(t\) can be written as $$w = -16.7 + 4.77t$$ [3]
  4. Give an interpretation of the gradient of the regression line. [1]
  5. Explain why it would not be appropriate to use the regression line in part 3 to estimate the weight of a mouse with a tail length of 2 cm. [2]
Shan decided to code the data using \(x = t - 6\) and \(y = \frac{w}{2} - 5\).
  6. Write down the value of the product moment correlation coefficient between \(x\) and \(y\). [1]
  7. Write down an equation of the regression line of \(y\) on \(x\). You do not need to simplify your equation. [1]
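A Python sketch of parts 1 to 3, plus a numerical check of why coding leaves the correlation coefficient unchanged. It works only from the given summary values; no raw data are assumed.

```python
import math

n = 15
s_tt = 5.3173
sum_w2, sum_tw, sum_w, sum_t = 6089.12, 2304.53, 297.8, 114.8

s_ww = sum_w2 - sum_w**2 / n
s_tw = sum_tw - sum_t * sum_w / n
r = s_tw / math.sqrt(s_tt * s_ww)

b = s_tw / s_tt                      # gradient of w on t
a = sum_w / n - b * sum_t / n        # intercept

# Coding x = t - 6, y = w/2 - 5: shifts leave S-values unchanged, while halving w
# scales S_ww by 1/4 and S_tw by 1/2, so r is unchanged.
s_xx, s_yy, s_xy = s_tt, s_ww / 4, s_tw / 2
r_coded = s_xy / math.sqrt(s_xx * s_yy)
```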
Edexcel S1 2002 January Q7
19 marks Moderate -0.3
A number of people were asked to guess the calorific content of 10 foods. The mean \(s\) of the guesses for each food and the true calorific content \(t\) are given in the table below.
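| Food | \(t\) | \(s\) |
|---|---|---|
| Packet of biscuits | 170 | 420 |
| 1 potato | 90 | 160 |
| 1 apple | 80 | 110 |
| Crisp breads | 10 | 70 |
| Chocolate bar | 260 | 360 |
| 1 slice white bread | 75 | 135 |
| 1 slice brown bread | 60 | 115 |
| Portion of beef curry | 270 | 350 |
| Portion of rice pudding | 165 | 390 |
| Half a pint of milk | 160 | 200 |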
[You may assume that \(\Sigma t = 1340\), \(\Sigma s = 2310\), \(\Sigma ts = 396775\), \(\Sigma t^2 = 246050\), \(\Sigma s^2 = 694650\).]
  1. Draw a scatter diagram, indicating clearly which is the explanatory (independent) and which is the response (dependent) variable. [3]
  2. Calculate, to 3 significant figures, the product moment correlation coefficient for the above data. [7]
  3. State, with a reason, whether or not the value of the product moment correlation coefficient changes if all the guesses are 50 calories higher than the values in the table. [2]
The means of the guesses for the portion of rice pudding and for the packet of biscuits lie outside the linear relation shown by the other eight foods.
  4. Find the equation of the regression line of \(s\) on \(t\), excluding the values for rice pudding and biscuits. [3]
[You may now assume that \(S_{tt} = 63671.875\), \(S_{st} = 72587.5\), \(\bar{t} = 125.625\), \(\bar{s} = 187.5\).]
  5. Draw the regression line on your scatter diagram. [2]
  6. State, with a reason, what the effect would be on the regression line of including the values for a portion of rice pudding and a packet of biscuits. [2]
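A Python sketch that works directly from the table, so it doubles as a check on the summary values quoted in the question. It computes \(r\) for all ten foods, confirms that adding 50 to every guess leaves \(r\) unchanged, and refits the line of \(s\) on \(t\) without rice pudding and biscuits. The helper names `summary` and `pmcc` are our own.

```python
import math

# (true calories t, mean guess s) for each food, read from the table
foods = {
    "Packet of biscuits": (170, 420),
    "1 potato": (90, 160),
    "1 apple": (80, 110),
    "Crisp breads": (10, 70),
    "Chocolate bar": (260, 360),
    "1 slice white bread": (75, 135),
    "1 slice brown bread": (60, 115),
    "Portion of beef curry": (270, 350),
    "Portion of rice pudding": (165, 390),
    "Half a pint of milk": (160, 200),
}

def summary(pairs):
    """Return (t_bar, s_bar, S_tt, S_ss, S_ts) for a list of (t, s) pairs."""
    n = len(pairs)
    t_bar = sum(t for t, _ in pairs) / n
    s_bar = sum(s for _, s in pairs) / n
    s_tt = sum((t - t_bar) ** 2 for t, _ in pairs)
    s_ss = sum((s - s_bar) ** 2 for _, s in pairs)
    s_ts = sum((t - t_bar) * (s - s_bar) for t, s in pairs)
    return t_bar, s_bar, s_tt, s_ss, s_ts

def pmcc(pairs):
    _, _, s_tt, s_ss, s_ts = summary(pairs)
    return s_ts / math.sqrt(s_tt * s_ss)

all_pairs = list(foods.values())
r_all = pmcc(all_pairs)                                  # part 2
r_shifted = pmcc([(t, s + 50) for t, s in all_pairs])    # part 3: every guess 50 higher

# Part 4: refit without the two outlying foods.
kept = [v for k, v in foods.items()
        if k not in ("Portion of rice pudding", "Packet of biscuits")]
t_bar, s_bar, s_tt, _, s_ts = summary(kept)
gradient = s_ts / s_tt
intercept = s_bar - gradient * t_bar
```

Recomputed from the table, the eight remaining foods give \(S_{tt} = 63671.875\) and \(S_{st} = 72587.5\), and the refitted line is roughly \(s = 44.3 + 1.14t\).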
Edexcel S1 2010 January Q6
18 marks Moderate -0.8
The blood pressures, \(p\) mmHg, and the ages, \(t\) years, of 7 hospital patients are shown in the table below.
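| Patient | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| \(t\) | 42 | 74 | 48 | 35 | 56 | 26 | 60 |
| \(p\) | 98 | 130 | 120 | 88 | 182 | 80 | 135 |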
[\(\sum t = 341\), \(\sum p = 833\), \(\sum t^2 = 18181\), \(\sum p^2 = 106397\), \(\sum tp = 42948\)]
  1. Find \(S_{tt}\), \(S_{pp}\) and \(S_{tp}\) for these data. [4]
  2. Calculate the product moment correlation coefficient for these data. [3]
  3. Interpret the correlation coefficient. [1]
  4. On graph paper, draw the scatter diagram of blood pressure against age for these 7 patients. [2]
  5. Find the equation of the regression line of \(p\) on \(t\). [4]
  6. Plot your regression line on your scatter diagram. [2]
  7. Use your regression line to estimate the blood pressure of a 40 year old patient. [2]
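A Python sketch covering the numerical parts (1, 2, 5 and 7), using only the bracketed sums.

```python
import math

n = 7
sum_t, sum_p, sum_t2, sum_p2, sum_tp = 341, 833, 18181, 106397, 42948

s_tt = sum_t2 - sum_t**2 / n
s_pp = sum_p2 - sum_p**2 / n
s_tp = sum_tp - sum_t * sum_p / n

r = s_tp / math.sqrt(s_tt * s_pp)
b = s_tp / s_tt                   # gradient of p on t
a = sum_p / n - b * sum_t / n     # intercept
p_at_40 = a + 40 * b              # part 7 estimate
```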
Edexcel S1 2011 June Q1
7 marks Moderate -0.8
On a particular day the height above sea level, \(x\) metres, and the mid-day temperature, \(y\)°C, were recorded in 8 north European towns. These data are summarised below \(S_{xx} = 3\,535\,237.5 \quad \sum y = 181 \quad \sum y^2 = 4305 \quad S_{xy} = -23\,726.25\)
  1. Find \(S_{yy}\). [2]
  2. Calculate, to 3 significant figures, the product moment correlation coefficient for these data. [2]
  3. Give an interpretation of your coefficient. [1]
A student thought that the calculations would be simpler if the height above sea level were measured in kilometres, so used the variable \(h = \frac{x}{1000}\) instead of \(x\).
  4. Write down the value of \(S_{hh}\). [1]
  5. Write down the value of the correlation coefficient between \(h\) and \(y\). [1]
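A Python sketch, taking the fourth summary value, \(-23\,726.25\), as \(S_{xy}\) (part 1 asks for \(S_{yy}\), so that value cannot itself be \(S_{yy}\)). It also shows why rescaling \(x\) to kilometres divides \(S_{xx}\) by \(10^6\) and \(S_{xy}\) by \(10^3\) but leaves \(r\) alone.

```python
import math

n = 8
s_xx = 3_535_237.5
s_xy = -23_726.25
sum_y, sum_y2 = 181, 4305

s_yy = sum_y2 - sum_y**2 / n          # part 1
r = s_xy / math.sqrt(s_xx * s_yy)     # part 2

# h = x / 1000: every deviation in x shrinks by a factor of 1000.
s_hh = s_xx / 1000**2
s_hy = s_xy / 1000
r_h = s_hy / math.sqrt(s_hh * s_yy)
```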
Edexcel S1 2002 November Q5
12 marks Standard +0.3
An agricultural researcher collected data, in appropriate units, on the annual rainfall \(x\) and the annual yield of wheat \(y\) at 8 randomly selected places. The data were coded using \(s = x - 6\) and \(t = y - 20\) and the following summations were obtained. \(\Sigma s = 48.5\), \(\Sigma t = 65.0\), \(\Sigma s^2 = 402.11\), \(\Sigma t^2 = 701.80\), \(\Sigma st = 523.23\)
  1. Find the equation of the regression line of \(t\) on \(s\) in the form \(t = p + qs\). [7]
  2. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + bx\), giving \(a\) and \(b\) to 3 decimal places. [3]
The value of the product moment correlation coefficient between \(s\) and \(t\) is 0.943, to 3 decimal places.
  3. Write down the value of the product moment correlation coefficient between \(x\) and \(y\). Give a justification for your answer. [2]
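A Python sketch of parts 1 and 2, with a check on the quoted coefficient. Because the coding only shifts each variable, the gradient carries over unchanged and only the intercept moves: substituting \(s = x - 6\), \(t = y - 20\) into \(t = p + qs\) gives \(y = (20 + p - 6q) + qx\).

```python
import math

n = 8
sum_s, sum_t, sum_s2, sum_t2, sum_st = 48.5, 65.0, 402.11, 701.80, 523.23

s_ss = sum_s2 - sum_s**2 / n
s_tt = sum_t2 - sum_t**2 / n
s_st = sum_st - sum_s * sum_t / n

q = s_st / s_ss                      # gradient of t on s
p = sum_t / n - q * sum_s / n        # intercept, so t = p + q s

# Undo the coding s = x - 6, t = y - 20:  y - 20 = p + q (x - 6)
b = q
a = 20 + p - 6 * q

r = s_st / math.sqrt(s_ss * s_tt)    # unchanged by the coding
```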
Edexcel S3 2005 June Q4
13 marks Standard +0.3
Over a period of time, researchers took 10 blood samples from one patient with a blood disease. For each sample, they measured the levels of serum magnesium, \(s\) mg/dl, in the blood and the corresponding level of the disease protein, \(d\) mg/dl. The results are shown in the table.
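| \(s\) | 1.2 | 1.9 | 3.2 | 3.9 | 2.5 | 4.5 | 5.7 | 4.0 | 1.1 | 5.9 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(d\) | 3.8 | 7.0 | 11.0 | 12.0 | 9.0 | 12.0 | 13.5 | 12.2 | 2.0 | 13.9 |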
[Use \(\sum s^2 = 141.51\), \(\sum d^2 = 1081.74\) and \(\sum sd = 386.32\)]
  1. Draw a scatter diagram to represent these data. [3]
  2. State what is measured by the product moment correlation coefficient. [1]
  3. Calculate \(S_{ss}\), \(S_{dd}\) and \(S_{sd}\). [3]
  4. Calculate the value of the product moment correlation coefficient \(r\) between \(s\) and \(d\). [2]
  5. Stating your hypotheses clearly, test, at the 1% significance level, whether or not the correlation coefficient is greater than zero. [3]
  6. With reference to your scatter diagram, comment on your result in part 5. [1]
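A Python sketch for parts 3 and 4, taking \(\sum s\) and \(\sum d\) from the table and the remaining totals from the question. The test in part 5 would then compare \(r\) with the one-tailed 1% critical value for \(n = 10\) from tables.

```python
import math

s = [1.2, 1.9, 3.2, 3.9, 2.5, 4.5, 5.7, 4.0, 1.1, 5.9]
d = [3.8, 7.0, 11.0, 12.0, 9.0, 12.0, 13.5, 12.2, 2.0, 13.9]
n = len(s)

# Corrected sums, using the totals quoted in the question for the squared and product terms.
sum_s2, sum_d2, sum_sd = 141.51, 1081.74, 386.32
s_ss = sum_s2 - sum(s) ** 2 / n
s_dd = sum_d2 - sum(d) ** 2 / n
s_sd = sum_sd - sum(s) * sum(d) / n

r = s_sd / math.sqrt(s_ss * s_dd)
```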
Edexcel S3 Q7
16 marks Standard +0.3
For one of the activities at a gymnastics competition, 8 gymnasts were awarded marks out of 10 for each of artistic performance and technical ability. The results were as follows.
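| Gymnast | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
|---|---|---|---|---|---|---|---|---|
| Technical ability | 8.5 | 8.6 | 9.5 | 7.5 | 6.8 | 9.1 | 9.4 | 9.2 |
| Artistic performance | 6.2 | 7.5 | 8.2 | 6.7 | 6.0 | 7.2 | 8.0 | 9.1 |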
The value of the product moment correlation coefficient for these data is 0.774.
  1. Stating your hypotheses clearly and using a 1% level of significance, interpret this value. [5]
  2. Calculate the value of the rank correlation coefficient for these data. [6]
  3. Stating your hypotheses clearly and using a 1% level of significance, interpret this coefficient. [3]
  4. Explain why the rank correlation coefficient might be the better one to use with these data. [2]
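A Python sketch of Spearman's coefficient \(r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}\), which is valid here because neither set of marks contains ties, followed by a cross-check of the quoted PMCC. The `ranks` helper is our own and assumes no ties.

```python
import math

technical = [8.5, 8.6, 9.5, 7.5, 6.8, 9.1, 9.4, 9.2]   # gymnasts A to H
artistic = [6.2, 7.5, 8.2, 6.7, 6.0, 7.2, 8.0, 9.1]

def ranks(values):
    """Rank 1 = highest mark. Assumes no tied values (true for both lists here)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

n = len(technical)
d2 = sum((rt - ra) ** 2 for rt, ra in zip(ranks(technical), ranks(artistic)))
r_s = 1 - 6 * d2 / (n * (n**2 - 1))   # Spearman's formula, valid without ties

# Cross-check the quoted PMCC of 0.774.
mt, ma = sum(technical) / n, sum(artistic) / n
s_ta = sum((t - mt) * (a - ma) for t, a in zip(technical, artistic))
s_tt = sum((t - mt) ** 2 for t in technical)
s_aa = sum((a - ma) ** 2 for a in artistic)
r = s_ta / math.sqrt(s_tt * s_aa)
```

This gives \(\sum d^2 = 10\) and \(r_s \approx 0.881\).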
Edexcel S1 Q3
13 marks Moderate -0.3
The marks obtained by ten students in a Geography test and a History test were as follows:
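| Student | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) | \(J\) |
|---|---|---|---|---|---|---|---|---|---|---|
| Geography (\(x\)) | 34 | 57 | 49 | 21 | 84 | 53 | 10 | 77 | 61 | 85 |
| History (\(y\)) | 40 | 49 | 55 | 40 |  | 71 | 39 | 47 | 65 | 73 |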
  1. Given that \(\sum y = 547\), calculate the mark obtained by student \(E\) in History. [1 mark]
Given further that \(\sum x^2 = 34087\), \(\sum y^2 = 31575\) and \(\sum xy = 31342\), calculate
  2. the product moment correlation coefficient between \(x\) and \(y\), [4 marks]
  3. an equation of the regression line of \(y\) on \(x\), [4 marks]
  4. an estimate of the History mark of student \(K\), who scored 70 in Geography. [2 marks]
  5. State, with a reason, whether you would expect your answer to part 4 to be reliable. [2 marks]
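A Python sketch for parts 1 to 4. The nine known History marks are read from the table in order A to J with E's entry missing; that reading is ours, and it is consistent with \(\sum y^2 = 31575\) and \(\sum xy = 31342\) once E's mark is filled in.

```python
import math

geography = [34, 57, 49, 21, 84, 53, 10, 77, 61, 85]         # students A to J
history_known = [40, 49, 55, 40, 71, 39, 47, 65, 73]         # A to J, omitting E

history_e = 547 - sum(history_known)                          # part 1
history = history_known[:4] + [history_e] + history_known[4:]  # insert E in 5th place

n = len(geography)
sum_x, sum_y = sum(geography), sum(history)
s_xx = 34087 - sum_x**2 / n
s_yy = 31575 - sum_y**2 / n
s_xy = 31342 - sum_x * sum_y / n

r = s_xy / math.sqrt(s_xx * s_yy)     # part 2
b = s_xy / s_xx                       # part 3: y = a + b x
a = sum_y / n - b * sum_x / n
k_estimate = a + 70 * b               # part 4: student K scored 70 in Geography
```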
Edexcel S1 Q3
13 marks Moderate -0.3
Twenty pairs of observations are made of two variables \(x\) and \(y\), which are believed to be related. It is found that $$\sum x = 200, \quad \sum y = 174, \quad \sum x^2 = 6201, \quad \sum y^2 = 5102, \quad \sum xy = 5200.$$ Find
  1. the product-moment correlation coefficient between \(x\) and \(y\), [3 marks]
  2. the equation of the regression line of \(y\) on \(x\). [4 marks]
Given that \(p = x + 30\) and \(q = y + 50\),
  3. find the equation of the regression line of \(q\) on \(p\), in the form \(q = mp + c\). [3 marks]
  4. Estimate the value of \(q\) when \(p = 46\), stating any assumptions you make. [3 marks]
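A Python sketch of all four parts. The coding step only shifts the line: substituting \(x = p - 30\) and \(y = q - 50\) into \(y = a + bx\) keeps the gradient and changes the intercept to \(a + 50 - 30b\).

```python
import math

n = 20
sum_x, sum_y, sum_x2, sum_y2, sum_xy = 200, 174, 6201, 5102, 5200

s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / math.sqrt(s_xx * s_yy)
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n        # y = a + b x

# p = x + 30, q = y + 50: substitute x = p - 30, y = q - 50 into y = a + b x.
m = b
c = a + 50 - 30 * b                  # q = m p + c
q_at_46 = m * 46 + c
```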
Edexcel S1 Q7
15 marks Moderate -0.3
The following data was collected for seven cars, showing their engine size, \(x\) litres, and their fuel consumption, \(y\) km per litre, on a long journey.
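| Car | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) |
|---|---|---|---|---|---|---|---|
| \(x\) | 0.95 | 1.20 | 1.37 | 1.76 | 2.25 | 2.50 | 2.875 |
| \(y\) | 21.3 | 17.2 | 15.5 | 19.1 | 14.7 | 11.4 | 9.0 |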
\(\sum x = 12.905\), \(\sum x^2 = 26.8951\), \(\sum y = 108.2\), \(\sum y^2 = 1781.64\), \(\sum xy = 183.176\).
  1. Calculate the equation of the regression line of \(x\) on \(y\), expressing your answer in the form \(x = ay + b\). [6 marks]
  2. Calculate the product moment correlation coefficient between \(y\) and \(x\) and give a brief interpretation of its value. [4 marks]
  3. Use the equation of the regression line to estimate the value of \(x\) when \(y = 12\). State, with a reason, how accurate you would expect this estimate to be. [3 marks]
  4. Comment on the use of the line to find values of \(x\) as \(y\) gets very small. [2 marks]
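A Python sketch for parts 1 to 3. Note that for the regression of \(x\) on \(y\) the gradient is \(S_{xy}/S_{yy}\), not \(S_{xy}/S_{xx}\).

```python
import math

n = 7
sum_x, sum_x2, sum_y, sum_y2, sum_xy = 12.905, 26.8951, 108.2, 1781.64, 183.176

s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

a = s_xy / s_yy                      # gradient of x on y (divide by Syy)
b = sum_x / n - a * sum_y / n        # x = a y + b
r = s_xy / math.sqrt(s_xx * s_yy)
x_at_12 = a * 12 + b                 # part 3
```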
Edexcel S1 Q4
10 marks Moderate -0.8
The heights, \(h\) m, of eight children were measured, giving the following values of \(h\): 1.20, 1.12, 1.43, 0.98, 1.31, 1.26, 1.02, 1.41.
  1. Find the mean height of the children. [2 marks]
  2. Calculate the variance of the heights. [3 marks]
The children were also weighed. It was found that their masses, \(w\) kg, were such that $$\sum w = 324, \quad \sum w^2 = 13532, \quad \sum wh = 403.$$
  3. Calculate the product-moment correlation coefficient between \(w\) and \(h\). [4 marks]
  4. Comment briefly on the value you have obtained. [1 mark]
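A Python sketch; it assumes the divisor \(n\) for the variance, as in the formula \(\frac{\sum h^2}{n} - \bar{h}^2\) (use \(n - 1\) instead if your course expects the sample variance). The heights come from the list above and the mass totals from the question.

```python
import math

h = [1.20, 1.12, 1.43, 0.98, 1.31, 1.26, 1.02, 1.41]
n = len(h)

mean_h = sum(h) / n                              # part 1
var_h = sum(x * x for x in h) / n - mean_h**2    # part 2, divisor n

sum_w, sum_w2, sum_wh = 324, 13532, 403
s_ww = sum_w2 - sum_w**2 / n
s_hh = sum(x * x for x in h) - sum(h) ** 2 / n
s_wh = sum_wh - sum_w * sum(h) / n
r = s_wh / math.sqrt(s_ww * s_hh)                # part 3
```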
Edexcel S1 Q5
13 marks Standard +0.3
The following marks out of 50 were given by two judges to the contestants in a talent contest:
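| Contestant | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
|---|---|---|---|---|---|---|---|---|
| Judge 1 (\(x\)) | 43 | 32 | 40 | 21 | 47 | 11 | 29 | 38 |
| Judge 2 (\(y\)) | 39 | 25 | 40 | 22 | 36 | 13 | 27 | 32 |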
Given that \(\sum x = 261\), \(\sum x^2 = 9529\) and \(\sum xy = 8373\),
  1. calculate the product-moment correlation coefficient between the two judges' marks [5 marks]
  2. Find an equation of the regression line of \(x\) on \(y\). [4 marks]
Contestant \(I\) was awarded 45 marks by Judge 2.
  3. Estimate the mark that this contestant would have received from Judge 1. [2 marks]
  4. Comment, with explanation, on the probable accuracy of your answer. [2 marks]
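A Python sketch working from the two rows of marks, so it also recovers \(\sum y\) and \(\sum y^2\), which the question leaves to be calculated.

```python
import math

judge1 = [43, 32, 40, 21, 47, 11, 29, 38]   # x, contestants A to H
judge2 = [39, 25, 40, 22, 36, 13, 27, 32]   # y
n = len(judge1)

sum_x, sum_y = sum(judge1), sum(judge2)
s_xx = sum(x * x for x in judge1) - sum_x**2 / n
s_yy = sum(y * y for y in judge2) - sum_y**2 / n
s_xy = sum(x * y for x, y in zip(judge1, judge2)) - sum_x * sum_y / n

r = s_xy / math.sqrt(s_xx * s_yy)           # part 1
b = s_xy / s_yy                             # part 2: x = a + b y
a = sum_x / n - b * sum_y / n
x_for_45 = a + 45 * b                       # contestant I
```

The estimate comes out at about 51.9, above the maximum possible mark of 50; since 45 also exceeds every Judge 2 mark in the sample, it is an extrapolation.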
Edexcel S1 Q6
21 marks Standard +0.3
A missile was fired vertically upwards and its height above ground level, \(h\) metres, was found at various times \(t\) seconds after it was released. The results are given in the following table:
| \(t\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| \(h\) | 68 | 126 | 174 | 216 | 240 | 252 | 266 |
It is thought that this data can be fitted to the formula \(h = pt - qt^2\).
  1. Show that this equation can be written as \(\frac{h}{t} = p - qt\). [1 mark]
  2. Plot a scatter diagram of \(\frac{h}{t}\) against \(t\). [5 marks]
Given that \(\sum h = 1342\), \(\sum \frac{h}{t} = 371\) and \(\sum \frac{h^2}{t^2} = 20385\),
  3. find the equation of the regression line of \(\frac{h}{t}\) on \(t\) and hence write down the values of \(p\) and \(q\). [8 marks]
  4. Use your equation to find the value of \(h\) when \(t = 10\). Comment on the implication of your answer. [3 marks]
  5. Find the product-moment correlation coefficient between \(\frac{h}{t}\) and \(t\) and state the significance of its value. [4 marks]
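A Python sketch that computes \(u = h/t\) directly from the table (so \(\sum \frac{h}{t} = 371\) and \(\sum \frac{h^2}{t^2} = 20385\) are reproduced rather than assumed) and fits \(u = p - qt\) by least squares. Note that \(\sum t \cdot \frac{h}{t} = \sum h = 1342\), which is why that total is supplied.

```python
import math

t = [1, 2, 3, 4, 5, 6, 7]
h = [68, 126, 174, 216, 240, 252, 266]
u = [hv / tv for hv, tv in zip(h, t)]    # u = h / t
n = len(t)

s_tt = sum(x * x for x in t) - sum(t) ** 2 / n
s_uu = sum(x * x for x in u) - sum(u) ** 2 / n
s_tu = sum(x * y for x, y in zip(t, u)) - sum(t) * sum(u) / n

slope = s_tu / s_tt                      # regression of u on t: u = p - q t
p = sum(u) / n - slope * sum(t) / n
q = -slope
r = s_tu / math.sqrt(s_tt * s_uu)

h_at_10 = p * 10 - q * 10**2             # part 4
```

The fitted curve gives about 226 m at \(t = 10\), lower than the height observed at \(t = 7\), so the model has the missile falling by then; this relies on extrapolating beyond \(t = 7\).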
Edexcel S1 Q6
15 marks Standard +0.3
The marks out of 75 obtained by a group of ten students in their first and second Statistics modules were as follows:
| Student | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) | \(J\) |
|---|---|---|---|---|---|---|---|---|---|---|
| Module 1 \((x)\) | 54 | 33 | 42 | 71 | 60 | 27 | 39 | 46 | 59 | 64 |
| Module 2 \((y)\) | 50 | 22 | 44 | 58 | 42 | 19 | 35 | 46 | 55 | 60 |
  1. Find \(\sum x\) and \(\sum y\). [2 marks]
Given that \(\sum x^2 = 26353\) and \(\sum xy = 22991\),
  2. obtain the equation of the regression line of \(y\) on \(x\). [5 marks]
  3. Estimate the Module 2 result of a student whose mark in Module 1 was (i) 65, (ii) 5. Explain why one of these estimates is less reliable than the other. [4 marks]
The equation of the regression line of \(x\) on \(y\) is \(x = 0.921y + 9.81\).
  4. Deduce the product moment correlation coefficient between \(x\) and \(y\), and briefly interpret its value. [4 marks]
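A Python sketch of all four parts. For the last part it uses the fact that the two regression gradients multiply to \(r^2\), taking the positive root because both gradients are positive.

```python
import math

x = [54, 33, 42, 71, 60, 27, 39, 46, 59, 64]   # Module 1, students A to J
y = [50, 22, 44, 58, 42, 19, 35, 46, 55, 60]   # Module 2
n = len(x)

sum_x, sum_y = sum(x), sum(y)                  # part 1
s_xx = 26353 - sum_x**2 / n
s_xy = 22991 - sum_x * sum_y / n

b = s_xy / s_xx                                # part 2: y = a + b x
a = sum_y / n - b * sum_x / n
estimates = {mark: a + b * mark for mark in (65, 5)}   # part 3

# Part 4: 0.921 is the given gradient of x on y; positive root since both gradients are positive.
r = math.sqrt(b * 0.921)
```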