5.08a Pearson correlation: calculate pmcc

246 questions

OCR S1 2011 June Q1
7 marks Moderate -0.8
1 Five salesmen from a certain firm were selected at random for a survey. For each salesman, the annual income, \(x\) thousand pounds, and the distance driven last year, \(y\) thousand miles, were recorded. The results were summarised as follows. $$n = 5 \quad \Sigma x = 251 \quad \Sigma x ^ { 2 } = 14323 \quad \Sigma y = 65 \quad \Sigma y ^ { 2 } = 855 \quad \Sigma x y = 3247$$
  1. (a) Show that the product moment correlation coefficient, \(r\), between \(x\) and \(y\) is \(-0.122\), correct to 3 significant figures.
    (b) State what this value of \(r\) shows about the relationship between annual income and distance driven last year for these five salesmen.
    (c) It was decided to recalculate \(r\) with the distances measured in kilometres instead of miles. State what effect, if any, this would have on the value of \(r\).
  2. Another salesman from the firm is selected at random. His annual income is known to be \(\pounds 52000\), but the distance that he drove last year is unknown. In order to estimate this distance, a regression line based on the above data is used. Comment on the reliability of such an estimate.
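A minimal sketch of the check in part (a), assuming the usual formula \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\) applied to the summary totals given in the question. The function name `pmcc_from_sums` and its parameter names are illustrative assumptions, not part of the question.

```python
from math import sqrt

def pmcc_from_sums(n, sx, sxx, sy, syy, sxy):
    """Product moment correlation coefficient from summary totals.

    sx = sum of x, sxx = sum of x^2, sy = sum of y,
    syy = sum of y^2, sxy = sum of x*y (hypothetical parameter names).
    """
    s_xx = sxx - sx**2 / n      # corrected sum of squares for x
    s_yy = syy - sy**2 / n      # corrected sum of squares for y
    s_xy = sxy - sx * sy / n    # corrected sum of products
    return s_xy / sqrt(s_xx * s_yy)

# Totals quoted in the question for the five salesmen.
r = pmcc_from_sums(n=5, sx=251, sxx=14323, sy=65, syy=855, sxy=3247)
print(round(r, 3))  # -0.122
```

Here \(S_{xx} = 1722.8\), \(S_{yy} = 10\) and \(S_{xy} = -16\), so the value is close to zero and negative.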
OCR S1 2011 June Q7
6 marks Moderate -0.8
7 The diagram shows the results of an experiment involving some bivariate data. The least squares regression line of \(y\) on \(x\) for these results is also shown. \includegraphics[max width=\textwidth, alt={}, center]{48ffcd44-d933-40e0-818a-20d6db607298-5_748_919_390_612}
  1. Given that the least squares regression line of \(y\) on \(x\) is used for an estimation, state which of \(x\) or \(y\) is treated as the independent variable.
  2. Use the diagram to explain what is meant by 'least squares'.
  3. State, with a reason, the value of Spearman's rank correlation coefficient for these data.
  4. What can be said about the value of the product moment correlation coefficient for these data?
OCR S1 2012 June Q1
9 marks Moderate -0.8
1 For each of the last five years the number of tourists, \(x\) thousands, visiting Sackton, and the average weekly sales, \(\pounds y\) thousands, in Sackton Stores were noted. The table shows the results.
| Year | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|
| \(x\) | 250 | 270 | 264 | 290 | 292 |
| \(y\) | 4.2 | 3.7 | 3.2 | 3.5 | 3.0 |
  1. Calculate the product moment correlation coefficient \(r\) between \(x\) and \(y\).
  2. It is required to estimate the average weekly sales at Sackton Stores in a year when the number of tourists is 280000. Calculate the equation of an appropriate regression line, and use it to find this estimate.
  3. Over a longer period the value of \(r\) is \(-0.8\). The mayor says, "This shows that having more tourists causes sales at Sackton Stores to decrease." Give a reason why this statement is not correct.
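The first two parts can be sketched numerically as follows, assuming the table reads \(x = 250, 270, 264, 290, 292\) and \(y = 4.2, 3.7, 3.2, 3.5, 3.0\), and taking the regression line of \(y\) on \(x\) as the appropriate one for estimating sales from tourist numbers. Variable names are illustrative.

```python
from math import sqrt

x = [250, 270, 264, 290, 292]   # tourists, in thousands
y = [4.2, 3.7, 3.2, 3.5, 3.0]   # average weekly sales, in thousands of pounds
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and products about the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / sqrt(s_xx * s_yy)    # product moment correlation coefficient

# Least squares regression line of y on x: y = a + b x.
b = s_xy / s_xx
a = y_bar - b * x_bar

# 280000 tourists corresponds to x = 280 (x is in thousands).
estimate = a + b * 280

print(round(r, 3), round(b, 4), round(a, 2), round(estimate, 2))
```

Note that 280 lies inside the range of the observed \(x\) values, so the estimate is an interpolation.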
OCR S1 2014 June Q5
9 marks Moderate -0.8
5 Tariq collected information about typical prices, \(\pounds y\) million, of four-bedroomed houses at varying distances, \(x\) miles, from a large city. He chose houses at 10-mile intervals from the city. His results are shown below.
| \(x\) | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 |
|---|---|---|---|---|---|---|---|---|
| \(y\) | 1.2 | 1.4 | 1.2 | 0.9 | 0.8 | 0.5 | 0.5 | 0.3 |
$$n = 8 \quad \Sigma x = 360 \quad \Sigma x ^ { 2 } = 20400 \quad \Sigma y = 6.8 \quad \Sigma y ^ { 2 } = 6.88 \quad \Sigma x y = 241$$
  1. Use an appropriate formula to calculate the product moment correlation coefficient, \(r\), showing that \(- 1.0 < r < - 0.9\).
  2. State what this value of \(r\) shows in this context.
  3. Tariq decides to recalculate the value of \(r\) with the house prices measured in hundreds of thousands of pounds, instead of millions of pounds. State what effect, if any, this will have on the value of \(r\).
  4. Calculate the equation of the regression line of \(y\) on \(x\).
  5. Explain why the regression line of \(y\) on \(x\), rather than \(x\) on \(y\), should be used for estimating a value of \(x\) from a given value of \(y\).
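A short sketch illustrating the rescaling in part 3: multiplying every \(y\) value by a positive constant (here 10, converting millions of pounds to hundreds of thousands) leaves \(r\) unchanged. The data are the table values as read from the question; the helper `pmcc` is a hypothetical name.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient from raw paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    s_yy = sum((b - y_bar) ** 2 for b in ys)
    return s_xy / sqrt(s_xx * s_yy)

x = [10, 20, 30, 40, 50, 60, 70, 80]                      # miles from the city
y_millions = [1.2, 1.4, 1.2, 0.9, 0.8, 0.5, 0.5, 0.3]     # price in millions of pounds
y_hundred_thousands = [10 * v for v in y_millions]        # same prices, rescaled units

r_millions = pmcc(x, y_millions)
r_rescaled = pmcc(x, y_hundred_thousands)

# The two values agree up to floating-point rounding.
print(round(r_millions, 3), round(r_rescaled, 3))
```

The scale factor multiplies \(S_{xy}\) by 10 and \(\sqrt{S_{yy}}\) by 10, so it cancels in the ratio.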
OCR S1 2015 June Q1
6 marks Moderate -0.8
1 For the top 6 clubs in the 2010/11 season of the English Premier League, the table shows the annual salary, \(\pounds x\) million, of the highest paid player and the number of points scored, \(y\).
| Club | Manchester United | Manchester City | Chelsea | Arsenal | Tottenham | Liverpool |
|---|---|---|---|---|---|---|
| \(x\) | 5.6 | 7.4 | 6.5 | 4.1 | 3.6 | 6.5 |
| \(y\) | 80 | 71 | 71 | 68 | 62 | 58 |
$$n = 6 \quad \sum x = 33.7 \quad \sum x ^ { 2 } = 200.39 \quad \sum y = 410 \quad \sum y ^ { 2 } = 28314 \quad \sum x y = 2313.9$$
  1. Use a suitable formula to calculate the product moment correlation coefficient, \(r\), between \(x\) and \(y\), showing that \(0 < r < 0.2\).
  2. State what this value of \(r\) shows in this context.
  3. A fan suggests that the data should be used to draw a regression line in order to estimate the number of points that would be scored by another Premier League club, whose highest paid player's salary is \(\pounds 1.7\) million. Give two reasons why such an estimate would be unlikely to be reliable.
OCR MEI S2 2011 January Q1
17 marks Standard +0.3
1 The scatter diagram below shows the birth rates \(x\), and death rates \(y\), measured in standard units, in a random sample of 14 countries in a particular year. Summary statistics for the data are as follows. $$\Sigma x = 139.8 \quad \Sigma y = 140.4 \quad \Sigma x ^ { 2 } = 1411.66 \quad \Sigma y ^ { 2 } = 1417.88 \quad \Sigma x y = 1398.56 \quad n = 14$$ \includegraphics[max width=\textwidth, alt={}, center]{cd1a8f39-dd3c-44c9-90b0-6a919361d593-2_643_1047_488_550}
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to determine whether there is any correlation between birth rates and death rates.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly in the light of the scatter diagram why it appears that the assumption may be valid.
  4. The values of \(x\) and \(y\) for another country in that year are 14.4 and 7.8 respectively. If these values are included, the value of the sample product moment correlation coefficient is \(-0.5694\). Explain why this one observation causes such a large change to the value of the sample product moment correlation coefficient. Discuss whether this brings the validity of the test into question.
OCR MEI S2 2009 June Q1
16 marks Standard +0.3
1 An investment analyst thinks that there may be correlation between the cost of oil, \(x\) dollars per barrel, and the price of a particular share, \(y\) pence. The analyst selects 50 days at random and records the values of \(x\) and \(y\). Summary statistics for these data are shown below, together with a scatter diagram. $$\Sigma x = 2331.3 \quad \Sigma y = 6724.3 \quad \Sigma x ^ { 2 } = 111984 \quad \Sigma y ^ { 2 } = 921361 \quad \Sigma x y = 316345 \quad n = 50$$ \includegraphics[max width=\textwidth, alt={}, center]{ae79cdd9-a57c-490e-a9f3-f47c7c8a1aa6-2_857_905_516_621}
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate the analyst's belief. State your hypotheses clearly, defining any symbols which you use.
  3. An assumption that there is a bivariate Normal distribution is required for this test to be valid. State whether it is the sample or the population which is required to have such a distribution. State, with a reason, whether in this case the assumption appears to be justified.
  4. Explain why a 2-tail test is appropriate even though it is clear from the scatter diagram that the sample has a positive correlation coefficient.
OCR MEI S2 2012 June Q1
19 marks Standard +0.3
1 The times, in seconds, taken by ten randomly selected competitors for the first and last sections of an Olympic bobsleigh run are denoted by \(x\) and \(y\) respectively. Summary statistics for these data are as follows. $$\Sigma x = 113.69 \quad \Sigma y = 52.81 \quad \Sigma x ^ { 2 } = 1292.56 \quad \Sigma y ^ { 2 } = 278.91 \quad \Sigma x y = 600.41 \quad n = 10$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(10 \%\) significance level to investigate whether there is any correlation between times taken for the first and last sections of the bobsleigh run.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly how a scatter diagram may be used to check whether this assumption is likely to be valid.
  4. A commentator says that in order to have a fast time on the last section, you must have a fast time on the first section. Comment briefly on this suggestion.
  5. (A) Would your conclusion in part 2 have been different if you had carried out the hypothesis test at the \(1 \%\) level rather than the \(10 \%\) level? Explain your answer.
    (B) State one advantage and one disadvantage of using a \(1 \%\) significance level rather than a \(10 \%\) significance level in a hypothesis test.
OCR MEI S2 2013 June Q1
18 marks Standard +0.3
1 Salbutamol is a drug used to improve lung function. In a medical trial, a random sample of 60 people with impaired lung function was selected. The forced expiratory volume in one second (FEV1) was measured for each person, both before being given salbutamol and again after a two-week course of the drug. The variables \(x\) and \(y\), measured in suitable units, represent FEV1 before and after the two-week course respectively. The data are illustrated in the scatter diagram below, together with the summary statistics for these data. \includegraphics[max width=\textwidth, alt={}, center]{f3690bc0-3392-4f29-86f7-797d33fab4f1-2_682_1024_502_516} Summary statistics: $$n = 60 , \quad \sum x = 43.62 , \quad \sum y = 55.15 , \quad \sum x ^ { 2 } = 32.68 , \quad \sum y ^ { 2 } = 51.44 , \quad \sum x y = 40.66$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is positive correlation between FEV1 before and after the course.
  3. State the distributional assumption which is necessary for this test to be valid. State, with a reason, whether the assumption appears to be valid.
  4. Explain the meaning of the term 'significance level'.
  5. Calculate the values of the summary statistics if the data point \(x = 0.55 , y = 1.00\) had been incorrectly recorded as \(x = 1.00 , y = 0.55\).
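One way to approach part 5 is to subtract the contribution of the original point from each total and add the contribution of the replacement point. The sketch below assumes that reading of the question (the point \((0.55, 1.00)\) is swapped for \((1.00, 0.55)\)); the dictionary keys and the helper `replace_point` are illustrative names.

```python
# Summary totals quoted in the question (n = 60 is unchanged by the swap).
sums = {"x": 43.62, "y": 55.15, "xx": 32.68, "yy": 51.44, "xy": 40.66}

def replace_point(totals, old, new):
    """Return new totals after swapping one (x, y) point for another."""
    (xo, yo), (xn, yn) = old, new
    return {
        "x": totals["x"] - xo + xn,
        "y": totals["y"] - yo + yn,
        "xx": totals["xx"] - xo**2 + xn**2,
        "yy": totals["yy"] - yo**2 + yn**2,
        "xy": totals["xy"] - xo * yo + xn * yn,
    }

adjusted = replace_point(sums, old=(0.55, 1.00), new=(1.00, 0.55))
for key, value in adjusted.items():
    print(key, round(value, 4))
```

Because the two coordinates are simply exchanged, the product \(xy\) for the point is the same before and after, so \(\sum xy\) does not change.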
OCR MEI S2 2014 June Q1
18 marks Standard +0.3
1 A medical student is investigating the claim that young adults with high diastolic blood pressure tend to have high systolic blood pressure. The student measures the diastolic and systolic blood pressures of a random sample of ten young adults. The data are shown in the table and illustrated in the scatter diagram.
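| Diastolic blood pressure | 60 | 61 | 62 | 63 | 73 | 76 | 84 | 87 | 90 | 95 |
|---|---|---|---|---|---|---|---|---|---|---|
| Systolic blood pressure | 98 | 121 | 118 | 114 | 108 | 112 | 132 | 130 | 134 | 139 |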
\includegraphics[max width=\textwidth, alt={}, center]{17e474c4-f5be-4ca1-b7c3-e444b46c3bec-2_865_809_593_628}
  1. Calculate the value of Spearman's rank correlation coefficient for these data.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to examine whether there is positive association between diastolic blood pressure and systolic blood pressure in the population of young adults.
  3. Explain why, in the light of the scatter diagram, it might not be valid to carry out a test based on the product moment correlation coefficient.
    The product moment correlation coefficient between the diastolic and systolic blood pressures of a random sample of 10 athletes is 0.707.
  4. Carry out a hypothesis test at the \(1 \%\) significance level to investigate whether there appears to be positive correlation between these two variables in the population of athletes. You may assume that in this case such a test is valid.
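For part 1, Spearman's coefficient can be sketched with the formula \(r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}\), which applies here because neither variable has tied values. The data are the table values as read from the question, and the helper `ranks` is a hypothetical name that assumes no ties.

```python
diastolic = [60, 61, 62, 63, 73, 76, 84, 87, 90, 95]
systolic = [98, 121, 118, 114, 108, 112, 132, 130, 134, 139]

def ranks(values):
    """Rank each value, with 1 for the smallest. Assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

n = len(diastolic)

# Sum of squared differences between paired ranks.
d_squared = sum(
    (rx - ry) ** 2 for rx, ry in zip(ranks(diastolic), ranks(systolic))
)

r_s = 1 - 6 * d_squared / (n * (n**2 - 1))
print(d_squared, round(r_s, 4))  # 40 0.7576
```

The hypothesis tests in parts 2 and 4 additionally need critical values from statistical tables, which are not reproduced here.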
CAIE FP2 2010 June Q9
9 marks Standard +0.3
9 A set of 20 pairs of bivariate data \(( x , y )\) is summarised by $$\Sigma x = 200 , \quad \Sigma x ^ { 2 } = 2125 , \quad \Sigma y = 240 , \quad \Sigma y ^ { 2 } = 8245 .$$ The product moment correlation coefficient is \(-0.992\).
  1. What does the value of the product moment correlation coefficient indicate about a scatter diagram of the data points?
  2. Find the equation of the regression line of \(y\) on \(x\).
  3. The equation of the regression line of \(x\) on \(y\) is \(x = a ^ { \prime } + b ^ { \prime } y\). Find the value of \(b ^ { \prime }\).
CAIE FP2 2010 June Q9
10 marks Moderate -0.3
9
  1. The following are values of the product moment correlation coefficient between the \(x\) and \(y\) values of three different large samples of bivariate data. State what each indicates about the appearance of a scatter diagram illustrating the data.
    1. \(-1\),
    2. 0.02 ,
    3. 0.92 .
  2. In 1852 Dr William Farr published data on deaths due to cholera during an outbreak of the disease in London. The table shows the altitude (in feet, above the level of the river Thames) at which people lived and the corresponding number of deaths from cholera per 10000 people.
    | Altitude, \(x\) | 10 | 30 | 50 | 70 | 90 | 100 | 350 |
    |---|---|---|---|---|---|---|---|
    | Number of deaths, \(y\) | 102 | 65 | 34 | 27 | 22 | 17 | 8 |
    $$\left[ \Sigma x = 700 , \Sigma x ^ { 2 } = 149000 , \Sigma y = 275 , \Sigma y ^ { 2 } = 17351 , \Sigma x y = 13040 . \right]$$
    1. Calculate the product moment correlation coefficient.
    2. Test, at the \(5 \%\) significance level, whether there is evidence of negative correlation.
CAIE FP2 2011 June Q10
10 marks Standard +0.3
10 The mid-day temperature, \(x ^ { \circ } \mathrm { C }\), and the amount of sunshine, \(y\) hours, were recorded at a winter holiday resort on each of 12 days, chosen at random during the winter season. The results are summarised as follows. $$\Sigma x = 18.7 \quad \Sigma x ^ { 2 } = 106.43 \quad \Sigma y = 34.7 \quad \Sigma y ^ { 2 } = 133.43 \quad \Sigma x y = 92.01$$
  1. Find the product moment correlation coefficient for the data.
  2. Stating your hypotheses, test at the \(1 \%\) significance level whether there is a non-zero correlation between mid-day temperature and amount of sunshine.
  3. Use the equation of a suitable regression line to estimate the number of hours of sunshine on a day when the mid-day temperature is \(2 ^ { \circ } \mathrm { C }\).
CAIE FP2 2011 June Q9
11 marks Standard +0.3
9 The marks achieved by a random sample of 15 college students in a Physics examination (\(x\)) and in a General Studies examination (\(y\)) are summarised as follows. $$\Sigma x = 752 \quad \Sigma x ^ { 2 } = 38814 \quad \Sigma y = 773 \quad \Sigma y ^ { 2 } = 45351 \quad \Sigma x y = 40236$$
  1. Find the mean values, \(\bar { x }\) and \(\bar { y }\).
  2. Another college student achieved a mark of 56 in the General Studies examination, but was unable to take the Physics examination. Use the equation of a suitable regression line to estimate the mark that the student would have obtained in the Physics examination.
  3. Find the product moment correlation coefficient for the given data.
  4. Stating your hypotheses, test at the \(5 \%\) level of significance whether there is a non-zero product moment correlation coefficient between examination marks in Physics and in General Studies achieved by college students.
CAIE FP2 2013 June Q10 OR
Standard +0.8
The regression line of \(y\) on \(x\), obtained from a random sample of five pairs of values of \(x\) and \(y\), has equation $$y = x + k$$ where \(k\) is a constant. The following table shows the data.
| \(x\) | 2 | 3 | 3 | 4 | \(p\) |
|---|---|---|---|---|---|
| \(y\) | 4 | 5 | 8 | 4 | 2 |
Find the two possible values of \(p\). For the smaller of these two values of \(p\), find
  1. the product moment correlation coefficient,
  2. the equation of the regression line of \(x\) on \(y\).
CAIE FP2 2013 June Q6
6 marks Moderate -0.8
6 Six pairs of values of variables \(x\) and \(y\) are measured. Draw a sketch of a possible scatter diagram of the data for each of the following cases:
  1. the product moment correlation coefficient is approximately zero;
  2. the product moment correlation coefficient is exactly \(-1\).
    On your diagram for part 1, sketch the regression line of \(y\) on \(x\) and the regression line of \(x\) on \(y\), labelling each line. On your diagram for part 2, sketch the regression line of \(y\) on \(x\) and state its relationship to the regression line of \(x\) on \(y\).
CAIE FP2 2013 June Q9
9 marks Standard +0.8
9 A researcher records a random sample of \(n\) pairs of values of \(( x , y )\), giving the following summarised data. $$\Sigma x = 24 \quad \Sigma x ^ { 2 } = 160 \quad \Sigma y = 34 \quad \Sigma y ^ { 2 } = 324 \quad \Sigma x y = 192$$ The gradient of the regression line of \(y\) on \(x\) is \(- \frac { 3 } { 4 }\). Find
  1. the value of \(n\),
  2. the equation of the regression line of \(x\) on \(y\) in the form \(x = A y + B\), where \(A\) and \(B\) are constants to be determined,
  3. the product moment correlation coefficient.
    Another researcher records the same data in the form \(\left( x ^ { \prime } , y ^ { \prime } \right)\), where \(x ^ { \prime } = \frac { x } { k } , y ^ { \prime } = \frac { y } { k }\) and \(k\) is a constant. Without further calculation, state the equation of the regression line of \(x ^ { \prime }\) on \(y ^ { \prime }\).
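Part 1 of this question can be sketched as a search: the gradient of the regression line of \(y\) on \(x\) is \(S_{xy} / S_{xx}\), and both quantities depend on \(n\). The code below tries integer values of \(n\) and keeps those giving exactly \(-\frac{3}{4}\). The search range of 2 to 100 and the function name `gradient_y_on_x` are assumptions made for illustration.

```python
from fractions import Fraction

# Summary totals quoted in the question.
sx, sxx, sy, syy, sxy = 24, 160, 34, 324, 192
target = Fraction(-3, 4)

def gradient_y_on_x(n):
    """Gradient S_xy / S_xx for a given n, or None if S_xx is not positive."""
    s_xy = sxy - Fraction(sx * sy, n)
    s_xx = sxx - Fraction(sx**2, n)
    if s_xx <= 0:
        return None  # not a valid data set for this n
    return s_xy / s_xx

# Exact rational arithmetic avoids floating-point comparison issues.
matches = [n for n in range(2, 101) if gradient_y_on_x(n) == target]
print(matches)  # [4]
```

Algebraically, \(192 - \frac{816}{n} = -\frac{3}{4}\left(160 - \frac{576}{n}\right)\) rearranges to \(312 = \frac{1248}{n}\), which gives the same value.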
CAIE FP2 2014 June Q10
11 marks Standard +0.3
10 Samples of rock from a number of geological sites were analysed for the quantities of two types, \(X\) and \(Y\), of rare minerals. The results, in milligrams, for 10 randomly chosen samples, each of 10 kg, are summarised as follows. $$\Sigma x = 866 \quad \Sigma x ^ { 2 } = 121276 \quad \Sigma y = 639 \quad \Sigma y ^ { 2 } = 55991 \quad \Sigma x y = 73527$$ Find the product moment correlation coefficient. Stating your hypotheses, test at the \(5 \%\) significance level whether there is non-zero correlation between quantities of the two rare minerals. Find the equation of the regression line of \(x\) on \(y\) in the form \(x = p y + q\), where \(p\) and \(q\) are constants to be determined.
CAIE FP2 2015 June Q8
8 marks Standard +0.3
8
  1. For a random sample of ten pairs of values of \(x\) and \(y\) taken from a bivariate distribution, the equations of the regression lines of \(y\) on \(x\) and of \(x\) on \(y\) are, respectively, $$y = 0.38 x + 1.41 \quad \text { and } \quad x = 0.96 y + 7.47$$
    1. Find the value of the product moment correlation coefficient for this sample.
    2. Using a \(5 \%\) significance level, test whether there is positive correlation between the variables.
  2. For a random sample of \(n\) pairs of values of \(u\) and \(v\) taken from another bivariate distribution, the value of the product moment correlation coefficient is 0.507. Using a test at the \(5 \%\) significance level, there is evidence of non-zero correlation between the variables. Find the least possible value of \(n\).
CAIE FP2 2015 June Q7
11 marks Standard +0.8
7 For a random sample of 10 observations of pairs of values \(( x , y )\), the equation of the regression line of \(y\) on \(x\) is \(y = 3.25 x - 4.27\). The sum of the ten \(x\) values is 15.6 and the product moment correlation coefficient for the sample is 0.56. Find the equation of the regression line of \(x\) on \(y\). Test, at the \(5 \%\) significance level, whether there is evidence of non-zero correlation between the variables.
CAIE FP2 2016 June Q10
11 marks Standard +0.3
10 For a random sample of 6 observations of pairs of values \(( x , y )\), where \(0 < x < 21\) and \(0 < y < 14\), the following results are obtained. $$\Sigma x ^ { 2 } = 844.20 \quad \Sigma y ^ { 2 } = 481.50 \quad \Sigma x y = 625.59$$ It is also found that the variance of the \(x\)-values is 36.66 and the variance of the \(y\)-values is 9.69.
  1. Find the product moment correlation coefficient for the sample.
  2. Find the equations of the regression lines of \(y\) on \(x\) and \(x\) on \(y\).
  3. Use the appropriate regression line to estimate the value of \(x\) when \(y = 6.4\) and comment on the reliability of your estimate.
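A hedged sketch of part 1: the means are not given, but they can be recovered from \(\text{variance} = \frac{\Sigma x^2}{n} - \bar{x}^2\), assuming the quoted variances use divisor \(n\) (an assumption, though it yields round values for the means). The positive square roots are taken because the question states \(x > 0\) and \(y > 0\).

```python
from math import sqrt

n = 6
sum_xx, sum_yy, sum_xy = 844.20, 481.50, 625.59
var_x, var_y = 36.66, 9.69   # assumed to be divisor-n variances

# variance = (sum of squares)/n - mean^2, rearranged for the mean.
x_bar = sqrt(sum_xx / n - var_x)   # positive root since 0 < x < 21
y_bar = sqrt(sum_yy / n - var_y)   # positive root since 0 < y < 14

# Divisor-n covariance, then r = cov / (sd_x * sd_y).
cov_xy = sum_xy / n - x_bar * y_bar
r = cov_xy / sqrt(var_x * var_y)

print(round(x_bar, 2), round(y_bar, 2), round(r, 3))
```

The same means and covariance feed directly into the two regression lines asked for in part 2.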
CAIE FP2 2018 June Q11 OR
Standard +0.8
The regression line of \(y\) on \(x\), obtained from a random sample of 6 pairs of values of \(x\) and \(y\), has equation $$y = 0.25 x + k$$ where \(k\) is a constant. The values from the sample are shown in the following table.
| \(x\) | 4 | 5 | 7 | 8 | 10 | 14 |
|---|---|---|---|---|---|---|
| \(y\) | 5 | 8 | \(p\) | 7 | \(p\) | 9 |
  1. Find the value of \(p\) and the value of \(k\).
  2. Find the product moment correlation coefficient for the data.
  3. Test, at the \(5 \%\) significance level, whether there is evidence of positive correlation between the variables.
CAIE FP2 2018 June Q8
9 marks Challenging +1.2
8 For a random sample of 6 observations of pairs of values \(( x , y )\), the equation of the regression line of \(y\) on \(x\) is \(y = b x + 1.306\), where \(b\) is a constant. The corresponding equation of the regression line of \(x\) on \(y\) is \(x = 0.6331 y + d\), where \(d\) is a constant. The values of \(x\) from the sample are $$\begin{array} { l l l l l l } 2.3 & 2.8 & 3.7 & p & 6.1 & 6.4 \end{array}$$ and the sum of the values of \(y\) is 46.5 . The product moment correlation coefficient is 0.9797 .
  1. Find the value of \(b\) correct to 3 decimal places.
  2. Find the value of \(p\).
  3. Use the equation of the regression line of \(x\) on \(y\) to estimate the value of \(x\) when \(y = 8.5\).
CAIE FP2 2019 June Q10
11 marks Standard +0.3
10 The values from a random sample of five pairs \(( x , y )\) taken from a bivariate distribution are shown below.
| \(x\) | 3 | 4 | 4 | 6 | 8 |
|---|---|---|---|---|---|
| \(y\) | 5 | 7 | \(q\) | 6 | 7 |
The equation of the regression line of \(x\) on \(y\) is given by \(x = \frac { 5 } { 4 } y + c\).
  1. Given that \(q\) is an integer, find its value.
  2. Find the value of \(c\).
  3. Find the value of the product moment correlation coefficient.
CAIE FP2 2008 November Q8
9 marks Moderate -0.3
8 The equations of the regression lines for a random sample of 25 pairs of data \(( x , y )\) from a bivariate population are $$\begin{array} { c c } y \text { on } x : & y = 1.28 - 0.425 x , \\ x \text { on } y : & x = 1.05 - 0.516 y . \end{array}$$
  1. Find the sample means, \(\bar { x }\) and \(\bar { y }\).
  2. Find the product moment correlation coefficient for the sample.
  3. Test at the \(5 \%\) significance level whether the population correlation coefficient differs from zero.
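Parts 1 and 2 rest on two standard facts: both regression lines pass through \((\bar{x}, \bar{y})\), and \(r^2\) equals the product of the two gradients, with \(r\) taking the sign of the gradients. A minimal sketch, with illustrative variable names:

```python
from math import copysign, sqrt

# y on x: y = a1 + b1 x ;  x on y: x = a2 + b2 y  (coefficients from the question)
a1, b1 = 1.28, -0.425
a2, b2 = 1.05, -0.516

# Both lines pass through the point of means, so solve the pair simultaneously:
# x = a2 + b2 (a1 + b1 x)  =>  x (1 - b1 b2) = a2 + b2 a1
x_bar = (a2 + b2 * a1) / (1 - b1 * b2)
y_bar = a1 + b1 * x_bar

# r^2 = b1 * b2, and r has the same sign as the gradients.
r = copysign(sqrt(b1 * b2), b1)

print(round(x_bar, 3), round(y_bar, 3), round(r, 3))
```

The test in part 3 then compares \(|r|\) with a tabulated critical value for \(n = 25\), which is not reproduced here.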