Calculate PMCC from summary statistics

A question is of this type if and only if it asks for Pearson's product moment correlation coefficient to be calculated from summary statistics (Σx, Σy, Σx², Σy², Σxy, n) or from Sxx, Syy, Sxy.
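As a reminder of the calculation behind these questions, the standard identities are $$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \quad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}, \quad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \quad r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$ The Python sketch below applies them to the plum data from OCR MEI S2 2006 June Q3 on this page. The function name `pmcc_from_sums` and its argument names are illustrative choices, not part of any exam specification.

```python
from math import sqrt


def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (Sxx, Syy, Sxy, r) computed from the six summary statistics."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    r = s_xy / sqrt(s_xx * s_yy)
    return s_xx, s_yy, s_xy, r


# Plum data: n = 10, sums taken from OCR MEI S2 2006 June Q3
s_xx, s_yy, s_xy, r = pmcc_from_sums(10, 4715, 13175, 2237725, 17455825, 6235575)
print(f"Sxx = {s_xx}, Syy = {s_yy}, Sxy = {s_xy}, r = {r:.4f}")
```

For these sums the sketch gives a value of r of about 0.624. Working through Sxx, Syy and Sxy first mirrors how the intermediate values are usually set out in written solutions.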

19 questions · Standard +0.1

OCR MEI S2 2006 June Q3
18 marks Standard +0.3
3 A student is investigating the relationship between the length \(x \mathrm {~mm}\) and circumference \(y \mathrm {~mm}\) of plums from a large crop. The student measures the dimensions of a random sample of 10 plums from this crop. Summary statistics for these dimensions are as follows. $$\begin{aligned} & \sum x = 4715 \quad \sum y = 13175 \quad \sum x ^ { 2 } = 2237725 \\ & \sum y ^ { 2 } = 17455825 \quad \sum x y = 6235575 \quad n = 10 \end{aligned}$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to determine whether there is any correlation between length and circumference of plums from this crop. State your hypotheses clearly, defining any symbols which you use.
  3. (A) Explain the meaning of a \(5 \%\) significance level.
    (B) State one advantage and one disadvantage of using a \(1 \%\) significance level rather than a \(5 \%\) significance level in a hypothesis test.
  The student decides to take another random sample of 10 plums. Using the same hypotheses as in part (ii), the correlation coefficient for this second sample is significant at the \(5 \%\) level. The student decides to ignore the first result and concludes that there is correlation between the length and circumference of plums in the crop.
  4. Comment on the student's decision to ignore the first result. Suggest a better way in which the student could proceed.
OCR MEI S2 2008 June Q1
18 marks Standard +0.3
1 A researcher believes that there is a negative correlation between money spent by the government on education and population growth in various countries. A random sample of 48 countries is selected to investigate this belief. The level of government spending on education \(x\), measured in suitable units, and the annual percentage population growth rate \(y\), are recorded for these countries. Summary statistics for these data are as follows. $$\Sigma x = 781.3 \quad \Sigma y = 57.8 \quad \Sigma x ^ { 2 } = 14055 \quad \Sigma y ^ { 2 } = 106.3 \quad \Sigma x y = 880.1 \quad n = 48$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate the researcher's belief. State your hypotheses clearly, defining any symbols which you use.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly how a scatter diagram may be used to check whether this assumption is likely to be valid.
  4. A student suggests that if the variables are negatively correlated then population growth rates can be reduced by increasing spending on education. Explain why the student may be wrong. Discuss an alternative explanation for the correlation.
  5. State briefly one advantage and one disadvantage of using a smaller sample size in this investigation.
OCR MEI S2 2011 January Q1
17 marks Standard +0.3
1 The scatter diagram below shows the birth rates \(x\), and death rates \(y\), measured in standard units, in a random sample of 14 countries in a particular year. Summary statistics for the data are as follows. $$\Sigma x = 139.8 \quad \Sigma y = 140.4 \quad \Sigma x ^ { 2 } = 1411.66 \quad \Sigma y ^ { 2 } = 1417.88 \quad \Sigma x y = 1398.56 \quad n = 14$$
[Figure: scatter diagram of the birth rate and death rate data]
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to determine whether there is any correlation between birth rates and death rates.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly in the light of the scatter diagram why it appears that the assumption may be valid.
  4. The values of \(x\) and \(y\) for another country in that year are 14.4 and 7.8 respectively. If these values are included, the value of the sample product moment correlation coefficient is \(-0.5694\). Explain why this one observation causes such a large change to the value of the sample product moment correlation coefficient. Discuss whether this brings the validity of the test into question.
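The effect of the extra country in part 4 can be checked directly from the summary statistics, since including one more point simply adds its contribution to each sum. The sketch below is a minimal illustration; the helper names `pmcc` and `add_point` are hypothetical.

```python
from math import sqrt


def pmcc(n, sx, sy, sxx, syy, sxy):
    """PMCC from n, sum x, sum y, sum x^2, sum y^2, sum xy."""
    return (sxy - sx * sy / n) / sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))


def add_point(stats, x, y):
    """Return the six summary statistics after including one extra (x, y) pair."""
    n, sx, sy, sxx, syy, sxy = stats
    return (n + 1, sx + x, sy + y, sxx + x * x, syy + y * y, sxy + x * y)


original = (14, 139.8, 140.4, 1411.66, 1417.88, 1398.56)
extended = add_point(original, 14.4, 7.8)

r_before = pmcc(*original)
r_after = pmcc(*extended)
print(f"r for the 14 countries = {r_before:.4f}")
print(f"r with the extra country = {r_after:.4f}")
```

With the extra point the sketch reproduces the value \(-0.5694\) quoted in the question, compared with roughly \(-0.276\) for the original 14 countries.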
OCR MEI S2 2009 June Q1
16 marks Standard +0.3
1 An investment analyst thinks that there may be correlation between the cost of oil, \(x\) dollars per barrel, and the price of a particular share, \(y\) pence. The analyst selects 50 days at random and records the values of \(x\) and \(y\). Summary statistics for these data are shown below, together with a scatter diagram. $$\Sigma x = 2331.3 \quad \Sigma y = 6724.3 \quad \Sigma x ^ { 2 } = 111984 \quad \Sigma y ^ { 2 } = 921361 \quad \Sigma x y = 316345 \quad n = 50$$
[Figure: scatter diagram of the oil cost and share price data]
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate the analyst's belief. State your hypotheses clearly, defining any symbols which you use.
  3. An assumption that there is a bivariate Normal distribution is required for this test to be valid. State whether it is the sample or the population which is required to have such a distribution. State, with a reason, whether in this case the assumption appears to be justified.
  4. Explain why a 2-tail test is appropriate even though it is clear from the scatter diagram that the sample has a positive correlation coefficient.
OCR MEI S2 2012 June Q1
19 marks Standard +0.3
1 The times, in seconds, taken by ten randomly selected competitors for the first and last sections of an Olympic bobsleigh run are denoted by \(x\) and \(y\) respectively. Summary statistics for these data are as follows. $$\Sigma x = 113.69 \quad \Sigma y = 52.81 \quad \Sigma x ^ { 2 } = 1292.56 \quad \Sigma y ^ { 2 } = 278.91 \quad \Sigma x y = 600.41 \quad n = 10$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(10 \%\) significance level to investigate whether there is any correlation between times taken for the first and last sections of the bobsleigh run.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly how a scatter diagram may be used to check whether this assumption is likely to be valid.
  4. A commentator says that in order to have a fast time on the last section, you must have a fast time on the first section. Comment briefly on this suggestion.
  5. (A) Would your conclusion in part (ii) have been different if you had carried out the hypothesis test at the \(1 \%\) level rather than the \(10 \%\) level? Explain your answer.
    (B) State one advantage and one disadvantage of using a \(1 \%\) significance level rather than a \(10 \%\) significance level in a hypothesis test.
OCR MEI S2 2013 June Q1
18 marks Standard +0.3
1 Salbutamol is a drug used to improve lung function. In a medical trial, a random sample of 60 people with impaired lung function was selected. The forced expiratory volume in one second (FEV1) was measured for each person, both before being given salbutamol and again after a two-week course of the drug. The variables \(x\) and \(y\), measured in suitable units, represent FEV1 before and after the two-week course respectively. The data are illustrated in the scatter diagram below, together with the summary statistics for these data.
[Figure: scatter diagram of the FEV1 data]
Summary statistics: $$n = 60 , \quad \sum x = 43.62 , \quad \sum y = 55.15 , \quad \sum x ^ { 2 } = 32.68 , \quad \sum y ^ { 2 } = 51.44 , \quad \sum x y = 40.66$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is positive correlation between FEV1 before and after the course.
  3. State the distributional assumption which is necessary for this test to be valid. State, with a reason, whether the assumption appears to be valid.
  4. Explain the meaning of the term 'significance level'.
  5. Calculate the values of the summary statistics if the data point \(x = 0.55 , y = 1.00\) had been incorrectly recorded as \(x = 1.00 , y = 0.55\).
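Part 5 comes down to removing the contribution of the correct point from each sum and adding the contribution of the mis-recorded one. A minimal sketch of that bookkeeping follows; the helper name `replace_point` is hypothetical.

```python
def replace_point(stats, old, new):
    """Swap one (x, y) pair for another in (n, sum x, sum y, sum x^2, sum y^2, sum xy)."""
    n, sx, sy, sxx, syy, sxy = stats
    (x0, y0), (x1, y1) = old, new
    return (
        n,
        sx - x0 + x1,
        sy - y0 + y1,
        sxx - x0 ** 2 + x1 ** 2,
        syy - y0 ** 2 + y1 ** 2,
        sxy - x0 * y0 + x1 * y1,
    )


given = (60, 43.62, 55.15, 32.68, 51.44, 40.66)
misrecorded = replace_point(given, old=(0.55, 1.00), new=(1.00, 0.55))

labels = ("n", "sum x", "sum y", "sum x^2", "sum y^2", "sum xy")
for label, value in zip(labels, misrecorded):
    print(label, "=", round(value, 4))
```

Because the two coordinates are simply interchanged, the product \(xy\) for the point is the same either way, so \(\sum xy\) and \(n\) stay as given while the other four sums change.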
CAIE FP2 2010 June Q9
10 marks Moderate -0.3
9
  1. The following are values of the product moment correlation coefficient between the \(x\) and \(y\) values of three different large samples of bivariate data. State what each indicates about the appearance of a scatter diagram illustrating the data.
    1. \(-1\),
    2. \(0.02\),
    3. \(0.92\).
  2. In 1852 Dr William Farr published data on deaths due to cholera during an outbreak of the disease in London. The table shows the altitude (in feet, above the level of the river Thames) at which people lived and the corresponding number of deaths from cholera per 10000 people.
    | Altitude, \(x\) | 10 | 30 | 50 | 70 | 90 | 100 | 350 |
    |---|---|---|---|---|---|---|---|
    | Number of deaths, \(y\) | 102 | 65 | 34 | 27 | 22 | 17 | 8 |
    $$\left[ \Sigma x = 700 , \Sigma x ^ { 2 } = 149000 , \Sigma y = 275 , \Sigma y ^ { 2 } = 17351 , \Sigma x y = 13040 . \right]$$
    1. Calculate the product moment correlation coefficient.
    2. Test, at the \(5 \%\) significance level, whether there is evidence of negative correlation.
CAIE FP2 2012 June Q11 OR
Standard +0.3
A new restaurant \(S\) has recently opened in a particular town. In order to investigate any effect of \(S\) on an existing restaurant \(R\), the daily takings, \(x\) and \(y\) in thousands of dollars, at \(R\) and \(S\) respectively are recorded for a random sample of 8 days during a six-month period. The results are shown in the following table.
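| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| \(x\) | 1.2 | 1.4 | 0.9 | 1.1 | 0.8 | 1.0 | 0.6 | 1.5 |
| \(y\) | 0.3 | 0.4 | 0.6 | 0.6 | 0.25 | 0.75 | 0.6 | 0.35 |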
  1. Calculate the product moment correlation coefficient for this sample.
  2. Stating your hypotheses, test, at the \(2.5 \%\) significance level, whether there is negative correlation between daily takings at the two restaurants and comment on your result in the context of the question.
  Another sample is taken over \(N\) randomly chosen days and the product moment correlation coefficient is found to be \(-0.431\). A test, at the \(5 \%\) significance level, shows that there is evidence of negative correlation between daily takings in the two restaurants.
  3. Find the range of possible values of \(N\).
CAIE FP2 2014 June Q10
11 marks Standard +0.3
10 Samples of rock from a number of geological sites were analysed for the quantities of two types, \(X\) and \(Y\), of rare minerals. The results, in milligrams, for 10 randomly chosen samples, each of 10 kg, are summarised as follows. $$\Sigma x = 866 \quad \Sigma x ^ { 2 } = 121276 \quad \Sigma y = 639 \quad \Sigma y ^ { 2 } = 55991 \quad \Sigma x y = 73527$$
Find the product moment correlation coefficient.
Stating your hypotheses, test at the \(5 \%\) significance level whether there is non-zero correlation between quantities of the two rare minerals.
Find the equation of the regression line of \(x\) on \(y\) in the form \(x = p y + q\), where \(p\) and \(q\) are constants to be determined.
CAIE FP2 2015 June Q8
8 marks Standard +0.3
8
  1. For a random sample of ten pairs of values of \(x\) and \(y\) taken from a bivariate distribution, the equations of the regression lines of \(y\) on \(x\) and of \(x\) on \(y\) are, respectively, $$y = 0.38 x + 1.41 \quad \text { and } \quad x = 0.96 y + 7.47$$
    1. Find the value of the product moment correlation coefficient for this sample.
    2. Using a \(5 \%\) significance level, test whether there is positive correlation between the variables.
  2. For a random sample of \(n\) pairs of values of \(u\) and \(v\) taken from another bivariate distribution, the value of the product moment correlation coefficient is 0.507. Using a test at the \(5 \%\) significance level, there is evidence of non-zero correlation between the variables. Find the least possible value of \(n\).
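For part 1 of this question no sums are given, only the two regression lines. The gradient of \(y\) on \(x\) is \(b = S_{xy} / S_{xx}\) and the gradient of \(x\) on \(y\) is \(d = S_{xy} / S_{yy}\), so \(bd = S_{xy}^2 / (S_{xx} S_{yy}) = r^2\), with \(r\) taking the common sign of the two gradients. A minimal sketch, assuming a hypothetical helper named `pmcc_from_gradients`:

```python
from math import copysign, sqrt


def pmcc_from_gradients(b_y_on_x, b_x_on_y):
    """r from the two regression gradients: r**2 = b * d, sign follows the gradients."""
    product = b_y_on_x * b_x_on_y
    if product < 0 or product > 1:
        raise ValueError("gradients are inconsistent: need 0 <= b * d <= 1")
    return copysign(sqrt(product), b_y_on_x)


# Gradients read from y = 0.38x + 1.41 and x = 0.96y + 7.47
r = pmcc_from_gradients(0.38, 0.96)
print(f"r = {r:.3f}")
```

For the gradients 0.38 and 0.96 this gives a value of r of about 0.604. The intercepts play no part in the calculation.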
CAIE FP2 2017 June Q7
6 marks Standard +0.3
7 A random sample of twelve pairs of values of \(x\) and \(y\) is taken from a bivariate distribution. The equations of the regression lines of \(y\) on \(x\) and of \(x\) on \(y\) are respectively $$y = 0.46 x + 1.62 \quad \text { and } \quad x = 0.93 y + 8.24$$
  1. Find the value of the product moment correlation coefficient for this sample.
  2. Using a \(5 \%\) significance level, test whether there is non-zero correlation between the variables.
OCR Further Statistics AS 2018 June Q4
8 marks Standard +0.3
4 Judith believes that mathematical ability and chess-playing ability are related. She asks 20 randomly chosen chess players, with known British Chess Federation (BCF) ratings \(X\), to take a mathematics aptitude test, with scores \(Y\). The results are summarised as follows. $$n = 20 , \sum x = 3600 , \sum x ^ { 2 } = 660500 , \sum y = 1440 , \sum y ^ { 2 } = 105280 , \sum x y = 260990$$
  1. Calculate the value of Pearson's product-moment correlation coefficient \(r\).
  2. State an assumption needed to be able to carry out a significance test on the value of \(r\).
  3. Assume now that the assumption in part (ii) is valid. Test at the \(5 \%\) significance level whether there is evidence that chess players with higher BCF ratings are better at mathematics.
  4. There are two different grading systems for chess players, the BCF system and the international ELO system. The two sets of ratings are related by $$\text { ELO rating } = 8 \times \text { BCF rating } + 650$$ Magnus says that the experiment should have used ELO ratings instead of BCF ratings. Comment on Magnus's suggestion.
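Magnus's suggestion in part 4 can be explored numerically: recoding every \(x\) as \(u = ax + b\) changes the sums in a predictable way, and the resulting coefficient can be compared with the original. The sketch below is illustrative only; `pmcc` and `recode_x` are hypothetical helper names.

```python
from math import sqrt


def pmcc(n, sx, sy, sxx, syy, sxy):
    """PMCC from n, sum x, sum y, sum x^2, sum y^2, sum xy."""
    return (sxy - sx * sy / n) / sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))


def recode_x(stats, a, b):
    """Summary statistics after replacing every x by u = a * x + b."""
    n, sx, sy, sxx, syy, sxy = stats
    su = a * sx + b * n
    suu = a * a * sxx + 2 * a * b * sx + b * b * n
    suy = a * sxy + b * sy
    return (n, su, sy, suu, syy, suy)


bcf = (20, 3600, 1440, 660500, 105280, 260990)
elo = recode_x(bcf, 8, 650)  # ELO rating = 8 * BCF rating + 650

r_bcf = pmcc(*bcf)
r_elo = pmcc(*elo)
print(f"r using BCF ratings = {r_bcf:.4f}")
print(f"r using ELO ratings = {r_elo:.4f}")
```

Both values come out at about 0.400, illustrating that a linear recoding with positive multiplier leaves the coefficient unchanged. A negative multiplier would reverse its sign.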
OCR Further Statistics AS Specimen Q8
10 marks Standard +0.3
8 The following table gives the mean per capita consumption of mozzarella cheese per annum, \(x\) pounds, and the number of civil engineering doctorates awarded, \(y\), in the United States in each of 10 years.
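| \(x\) | 9.3 | 9.7 | 9.7 | 9.7 | 9.9 | 10.2 | 10.5 | 11.0 | 10.6 | 10.6 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 480 | 501 | 540 | 552 | 547 | 622 | 655 | 701 | 712 | 708 |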
  1. Find the equation of the regression line of \(y\) on \(x\).
  You are given that the product moment correlation coefficient is 0.959.
  2. Explain whether this value would be different if \(x\) is measured in kilograms instead of pounds.
  It is desired to carry out a hypothesis test to investigate whether there is correlation between these two variables.
  3. Assume that the data is a random sample of all years.
    (a) Carry out the test at the \(10 \%\) significance level.
    (b) Explain whether your conclusion suggests that manufacturers of mozzarella cheese could increase consumption by sponsoring doctoral candidates in civil engineering.
OCR Further Statistics 2024 June Q2
9 marks Standard +0.3
2 A newspaper article claimed that "taller dog owners have taller dogs as pets". Alex investigated this claim and obtained data from a random sample of 16 fellow students who owned exactly one dog. The results are summarised as follows, where the height of the student, in cm, is denoted by \(h\) and the height, in cm, of their dog is denoted by \(d\). $$n = 16 \quad \sum h = 2880 \quad \sum d = 660 \quad \sum h ^ { 2 } = 519276 \quad \sum d ^ { 2 } = 30000 \quad \sum h d = 119425$$
  1. Calculate the value of Pearson's product moment correlation coefficient for the data.
  2. State what your answer tells you about a scatter diagram illustrating the data.
  3. Use the data to test, at the \(5 \%\) significance level, the claim of the newspaper article.
  4. Explain whether the answer to part (a) would be likely to be different if the dogs' weights had been used instead of their heights.
OCR Further Statistics 2020 November Q2
8 marks Moderate -0.3
2 A book collector compared the prices of some books, \(\pounds x\), when new in 1972 and the prices of copies of the same books, \(\pounds y\), on a second-hand website in 2018.
The results are shown in Table 1 and are summarised below the table.
| Book | A | B | C | D | E | F | G | H | I | J | K | L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \(x\) | 0.95 | 0.65 | 0.70 | 0.90 | 0.55 | 1.40 | 1.50 | 0.50 | 1.15 | 0.35 | 0.20 | 0.35 |
| \(y\) | 6.06 | 7.00 | 2.00 | 5.87 | 4.00 | 5.36 | 7.19 | 2.50 | 3.00 | 8.29 | 1.37 | 2.00 |
Table 1
$$n = 12 , \sum x = 9.20 , \sum y = 54.64 , \sum x ^ { 2 } = 8.9950 , \sum y ^ { 2 } = 310.4572 , \sum x y = 46.0545$$
  1. It is given that the value of Pearson’s product-moment correlation coefficient for the data is 0.381, correct to 3 significant figures.
    1. State what this information tells you about a scatter diagram illustrating the data.
    2. Test at the \(5 \%\) significance level whether there is evidence of positive correlation between prices in 1972 and prices in 2018.
  2. The collector noticed that the second-hand copy of book J was unusually expensive and he decided to ignore the data for book J. Calculate the value of Pearson's product-moment correlation coefficient for the other 11 books.
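For part 2 there is no need to re-enter the remaining 11 books: the contribution of book J (\(x = 0.35\), \(y = 8.29\)) can be subtracted from each of the given sums. A minimal sketch of that approach, with hypothetical helper names `pmcc` and `remove_point`:

```python
from math import sqrt


def pmcc(n, sx, sy, sxx, syy, sxy):
    """PMCC from n, sum x, sum y, sum x^2, sum y^2, sum xy."""
    return (sxy - sx * sy / n) / sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))


def remove_point(stats, x, y):
    """Return the six summary statistics after deleting one (x, y) pair."""
    n, sx, sy, sxx, syy, sxy = stats
    return (n - 1, sx - x, sy - y, sxx - x * x, syy - y * y, sxy - x * y)


all_books = (12, 9.20, 54.64, 8.9950, 310.4572, 46.0545)
without_j = remove_point(all_books, 0.35, 8.29)  # book J

r_all = pmcc(*all_books)
r_without_j = pmcc(*without_j)
print(f"r for all 12 books = {r_all:.3f}")
print(f"r without book J   = {r_without_j:.3f}")
```

The first value agrees with the 0.381 quoted in part 1, which is a useful check on the sums before book J is removed. Without book J the sketch gives a coefficient of about 0.650.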
Edexcel S3 2022 January Q3
12 marks Moderate -0.3
1. A medical research team carried out an investigation into the metabolic rate, MR, of men aged between 30 years and 60 years.
A random sample of 10 men was taken from this age group.
The table below shows for each man his MR and his body mass index, BMI. The table also shows the rank for the level of daily physical activity, DPA, which was assessed by the medical research team. Rank 1 was assigned to the man with the highest level of daily physical activity.
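| Man | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) | \(J\) |
|---|---|---|---|---|---|---|---|---|---|---|
| MR (\(\boldsymbol { x }\)) | 6.24 | 5.94 | 6.83 | 6.53 | 6.31 | 7.44 | 7.32 | 8.70 | 7.88 | 7.78 |
| BMI (\(\boldsymbol { y }\)) | 19.6 | 19.2 | 23.6 | 21.4 | 20.2 | 20.8 | 22.9 | 25.5 | 23.3 | 25.1 |
| DPA rank | 10 | 7 | 9 | 8 | 6 | 3 | 1 | 4 | 5 | 2 |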
$$\text { [You may use } \quad \mathrm { S } _ { x y } = 15.1608 \quad \mathrm {~S} _ { x x } = 6.90181 \quad \mathrm {~S} _ { y y } = 45.304 \text { ] }$$
  1. Calculate the value of the product moment correlation coefficient between MR and BMI for these 10 men.
  2. Use your value of the product moment correlation coefficient to test, at the \(5 \%\) significance level, whether or not there is evidence of a positive correlation between MR and BMI.
    State your hypotheses clearly.
  3. State an assumption that must be made to carry out the test in part (b).
  4. Calculate the value of Spearman's rank correlation coefficient between MR and DPA for these 10 men.
  5. Use a two-tailed test and a \(5 \%\) level of significance to assess whether or not there is evidence of a correlation between MR and DPA.
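When \(S_{xy}\), \(S_{xx}\) and \(S_{yy}\) are supplied directly, as in this question, the coefficient is a single step, \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\). The sketch below covers only the product moment part of the question; the function name `pmcc_from_s` is hypothetical.

```python
from math import sqrt


def pmcc_from_s(s_xx, s_yy, s_xy):
    """r = Sxy / sqrt(Sxx * Syy), with a basic check on the inputs."""
    if s_xx <= 0 or s_yy <= 0:
        raise ValueError("Sxx and Syy must be positive")
    return s_xy / sqrt(s_xx * s_yy)


# Values quoted in the question for MR (x) and BMI (y)
r = pmcc_from_s(s_xx=6.90181, s_yy=45.304, s_xy=15.1608)
print(f"r = {r:.3f}")
```

For the quoted values this gives a coefficient of about 0.857.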
Edexcel S3 2005 June Q4
13 marks Moderate -0.3
4. Over a period of time, researchers took 10 blood samples from one patient with a blood disease. For each sample, they measured the levels of serum magnesium, \(s \mathrm { mg } / \mathrm { dl }\), in the blood and the corresponding level of the disease protein, \(d \mathrm { mg } / \mathrm { dl }\). The results are shown in the table.
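| \(s\) | 1.2 | 1.9 | 3.2 | 3.9 | 2.5 | 4.5 | 5.7 | 4.0 | 1.1 | 5.9 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(d\) | 3.8 | 7.0 | 11.0 | 12.0 | 9.0 | 12.0 | 13.5 | 12.2 | 2.0 | 13.9 |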
$$\text { [Use } \sum s ^ { 2 } = 141.51 , \sum d ^ { 2 } = 1081.74 \text { and } \sum s d = 386.32 \text { ] }$$
  1. Draw a scatter diagram to represent these data.
  2. State what is measured by the product moment correlation coefficient.
  3. Calculate \(S _ { s s } , S _ { d d }\) and \(S _ { s d }\).
  4. Calculate the value of the product moment correlation coefficient \(r\) between \(s\) and \(d\).
  5. Stating your hypotheses clearly, test, at the \(1 \%\) significance level, whether or not the correlation coefficient is greater than zero.
  6. With reference to your scatter diagram, comment on your result in part (e).
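Here only \(\sum s^2\), \(\sum d^2\) and \(\sum sd\) are quoted, so \(\sum s\) and \(\sum d\) have to come from the table. The sketch below rebuilds all the sums from the ten pairs, which also checks the quoted values, and then forms \(S_{ss}\), \(S_{dd}\), \(S_{sd}\) and \(r\). The variable names are illustrative.

```python
from math import sqrt

# Serum magnesium (s) and disease protein (d) readings from the table
s = [1.2, 1.9, 3.2, 3.9, 2.5, 4.5, 5.7, 4.0, 1.1, 5.9]
d = [3.8, 7.0, 11.0, 12.0, 9.0, 12.0, 13.5, 12.2, 2.0, 13.9]
n = len(s)

sum_s, sum_d = sum(s), sum(d)
sum_s2 = sum(v * v for v in s)              # quoted in the question as 141.51
sum_d2 = sum(v * v for v in d)              # quoted as 1081.74
sum_sd = sum(a * b for a, b in zip(s, d))   # quoted as 386.32

s_ss = sum_s2 - sum_s ** 2 / n
s_dd = sum_d2 - sum_d ** 2 / n
s_sd = sum_sd - sum_s * sum_d / n
r = s_sd / sqrt(s_ss * s_dd)

print(f"Sss = {s_ss:.3f}, Sdd = {s_dd:.3f}, Ssd = {s_sd:.3f}, r = {r:.3f}")
```

The rebuilt sums agree with the three quoted in the question, and the resulting coefficient is about 0.935.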
AQA S3 2012 June Q1
6 marks Moderate -0.8
1 A wildlife expert measured the neck lengths, \(x\) metres, and the tail lengths, \(y\) metres, of a sample of 12 mature male giraffes as part of a study into their physical characteristics. The results are shown in the table.
AQA S3 2015 June Q1
6 marks Moderate -0.8
1 A demographer measured the length of the right foot, \(x\) millimetres, and the length of the right hand, \(y\) millimetres, of each of a sample of 12 males aged between 19 years and 25 years. The results are given in the table.