5.08a Pearson correlation: calculate pmcc

246 questions

Edexcel S1 Q6
13 marks Standard +0.3
Two variables \(x\) and \(y\) are such that, for a sample of ten pairs of values, $$\sum x = 104.5, \quad \sum y = 113.6, \quad \sum x^2 = 1954.1, \quad \sum y^2 = 2100.6.$$ The regression line of \(x\) on \(y\) has gradient 0.8. Find
  1. \(\sum xy\), [4 marks]
  2. the equation of the regression line of \(y\) on \(x\), [5 marks]
  3. the product moment correlation coefficient between \(y\) and \(x\). [3 marks]
  4. Describe the kind of correlation indicated by your answer to (c). [1 mark]
OCR S1 2010 January Q3
7 marks Moderate -0.8
The heights, \(h\) m, and weights, \(m\) kg, of five men were measured. The results are plotted on the diagram. \includegraphics{figure_3} The results are summarised as follows. \(n = 5\) \(\Sigma h = 9.02\) \(\Sigma m = 377.7\) \(\Sigma h^2 = 16.382\) \(\Sigma m^2 = 28558.67\) \(\Sigma hm = 681.612\)
  1. Use the summarised data to calculate the value of the product moment correlation coefficient, \(r\). [3]
  2. Comment on your value of \(r\) in relation to the diagram. [2]
  3. It was decided to re-calculate the value of \(r\) after converting the heights to feet and the masses to pounds. State what effect, if any, this will have on the value of \(r\). [1]
  4. One of the men had height 1.63 m and mass 78.4 kg. The data for this man were removed and the value of \(r\) was re-calculated using the original data for the remaining four men. State in general terms what effect, if any, this will have on the value of \(r\). [1]
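The calculation in part 1 can be sketched in Python using the usual summary-statistics form \(r = S_{hm} / \sqrt{S_{hh} S_{mm}}\). This is a minimal sketch for checking arithmetic, not a mark scheme; the function name `pmcc_from_sums` is a hypothetical helper, not something taken from the question.

```python
from math import sqrt

def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from summary sums (hypothetical helper name)."""
    s_xx = sum_x2 - sum_x ** 2 / n      # corrected sum of squares for x
    s_yy = sum_y2 - sum_y ** 2 / n      # corrected sum of squares for y
    s_xy = sum_xy - sum_x * sum_y / n   # corrected sum of products
    return s_xy / sqrt(s_xx * s_yy)

# Summary values quoted in the question (h in metres, m in kilograms).
r = pmcc_from_sums(5, 9.02, 377.7, 16.382, 28558.67, 681.612)
print(round(r, 3))
```

The same helper applies to any question in this list that gives \(n\) and the five sums.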
OCR S1 2010 January Q6
7 marks Standard +0.3
  1. A student calculated the values of the product moment correlation coefficient, \(r\), and Spearman's rank correlation coefficient, \(r_s\), for two sets of bivariate data, \(A\) and \(B\). His results are given below. $$A: \quad r = 0.9 \text{ and } r_s = 1$$ $$B: \quad r = 1 \quad \text{and } r_s = 0.9$$ With the aid of a diagram where appropriate, explain why the student's results for \(A\) could both be correct but his results for \(B\) cannot both be correct. [3]
  2. An old research paper has been partially destroyed. The surviving part of the paper contains the following incomplete information about some bivariate data from an experiment. \includegraphics{figure_6} The mean of \(x\) is 4.5. The equation of the regression line of \(y\) on \(x\) is \(y = 2.4x + 3.7\). The equation of the regression line of \(x\) on \(y\) is \(x = 0.40y\) + [missing constant] Calculate the missing constant at the end of the equation of the second regression line. [4]
OCR S1 2013 January Q3
12 marks Moderate -0.3
The Gross Domestic Product per Capita (GDP), \(x\) dollars, and the Infant Mortality Rate per thousand (IMR), \(y\), of 6 African countries were recorded and summarised as follows. \(n = 6\) \quad \(\sum x = 7000\) \quad \(\sum x^2 = 8700000\) \quad \(\sum y = 456\) \quad \(\sum y^2 = 36262\) \quad \(\sum xy = 509900\)
  1. Calculate the equation of the regression line of \(y\) on \(x\) for these 6 countries. [4]
The original data were plotted on a scatter diagram and the regression line of \(y\) on \(x\) was drawn, as shown below. \includegraphics{figure_3}
  2. The GDP for another country, Tanzania, is 1300 dollars. Use the regression line in the diagram to estimate the IMR of Tanzania. [1]
  3. The GDP for Nigeria is 2400 dollars. Give two reasons why the regression line is unlikely to give a reliable estimate for the IMR for Nigeria. [2]
  4. The actual value of the IMR for Tanzania is 96. The data for Tanzania (\(x = 1300, y = 96\)) is now included with the original 6 countries. Calculate the value of the product moment correlation coefficient, \(r\), for all 7 countries. [4]
  5. The IMR is now redefined as the infant mortality rate per hundred instead of per thousand, and the value of \(r\) is recalculated for all 7 countries. Without calculation state what effect, if any, this would have on the value of \(r\) found in part (iv). [1]
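Including Tanzania means each summary sum is updated by one extra term before \(r\) is calculated. The following is a minimal Python sketch of that bookkeeping, using only the figures quoted in the question; the variable names are illustrative.

```python
from math import sqrt

# Summary sums for the original 6 countries, as given in the question.
n, sx, sx2, sy, sy2, sxy = 6, 7000, 8700000, 456, 36262, 509900

# Fold in the extra observation for Tanzania (x = 1300, y = 96).
x_new, y_new = 1300, 96
n += 1
sx += x_new
sx2 += x_new ** 2
sy += y_new
sy2 += y_new ** 2
sxy += x_new * y_new

# Corrected sums of squares and products for all 7 countries.
s_xx = sx2 - sx ** 2 / n
s_yy = sy2 - sy ** 2 / n
s_xy = sxy - sx * sy / n
r7 = s_xy / sqrt(s_xx * s_yy)
print(round(r7, 3))
```

Removing a data point, as in questions that discard an outlier, works the same way with subtraction in place of addition.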
OCR S1 2009 June Q3
8 marks Moderate -0.3
In an agricultural experiment, the relationship between the amount of water supplied, \(x\) units, and the yield, \(y\) units, was investigated. Six values of \(x\) were chosen and for each value of \(x\) the corresponding value of \(y\) was measured. The results are shown in the table.
| \(x\) | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| \(y\) | 3 | 6 | 8 | 8 | 11 | 10 |
These results, together with the regression line of \(y\) on \(x\), are plotted on the graph. \includegraphics{figure_1}
  1. Give a reason why the regression line of \(x\) on \(y\) is not suitable in this context. [1]
  2. Explain the significance, for the regression line of \(y\) on \(x\), of the distances shown by the vertical dotted lines in the diagram. [2]
  3. Calculate the value of the product moment correlation coefficient, \(r\). [3]
  4. Comment on your value of \(r\) in relation to the diagram. [2]
OCR S1 2013 June Q5
9 marks Moderate -0.3
The table shows some of the values of the seasonally adjusted Unemployment Rate (UR), \(x\)\%, and the Consumer Price Index (CPI), \(y\)\%, in the United Kingdom from April 2008 to July 2010.
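| Date | April 2008 | July 2008 | October 2008 | January 2009 | April 2009 | July 2009 | October 2009 | January 2010 | April 2010 | July 2010 |
|---|---|---|---|---|---|---|---|---|---|---|
| UR, \(x\)\% | 5.2 | 5.7 | 6.1 | 6.8 | 7.5 | 7.8 | 7.8 | 7.9 | 7.8 | 7.7 |
| CPI, \(y\)\% | 3.0 | 4.4 | 4.5 | 3.0 | 2.3 | 1.8 | 1.5 | 3.5 | 3.7 | 3.1 |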
These data are summarised below. $$n = 10 \quad \sum x = 70.3 \quad \sum x^2 = 503.45 \quad \sum y = 30.8 \quad \sum y^2 = 103.94 \quad \sum xy = 211.9$$
  1. Calculate the product moment correlation coefficient, \(r\), for the data, showing that \(-0.6 < r < -0.5\). [3]
  2. Karen says "The negative value of \(r\) shows that when the Unemployment Rate increases, it causes the Consumer Price Index to decrease." Give a criticism of this statement. [1]
    1. Calculate the equation of the regression line of \(x\) on \(y\). [3]
    2. Use your equation to estimate the value of the Unemployment Rate in a month when the Consumer Price Index is 4.0\%. [2]
Edexcel S1 Q1
6 marks Moderate -0.8
  1. Draw two separate scatter diagrams, each with eight points, to illustrate the relationship between \(x\) and \(y\) in the cases where they have a product moment correlation coefficient equal to
    1. exactly \(+1\),
    2. about \(-0.4\). [4 marks]
  2. Explain briefly how the conclusion you would draw from a product moment correlation coefficient of \(+0.3\) would vary according to the number of pairs of data used in its calculation. [2 marks]
Edexcel S1 Q6
16 marks Moderate -0.3
The Principal of a school believes that more students are absent on days when the temperature is lower. Over a two-week period in December she records the percentage of students who are absent, \(A\%\), and the temperature, \(T\) °C, at 9 am each morning, giving these results.
| \(T\) (°C) | 4 | \(-3\) | \(-2\) | \(-6\) | 0 | 3 | 7 | \(-1\) | 3 | 2 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(A\) (\%) | 8.5 | 14.1 | 17.0 | 20.3 | 17.9 | 15.5 | 12.4 | 12.8 | 13.7 | 11.6 |
  1. Represent these data on a scatter diagram. [4 marks]
You may use $$\Sigma T = 7, \quad \Sigma A = 143.8, \quad \Sigma T^2 = 137, \quad \Sigma A^2 = 2172.66, \quad \Sigma TA = 20.7$$
  2. Calculate the product moment correlation coefficient for these data and comment on the Principal's hypothesis. [6 marks]
  3. Find an equation of the regression line of \(A\) on \(T\) in the form \(A = p + qT\). [4 marks]
  4. Draw the regression line on your scatter diagram. [2 marks]
Edexcel S3 Q7
11 marks Standard +0.3
A sports scientist wishes to examine the link between resting pulse and fitness. He records the resting pulse, \(p\), of 20 volunteers and the length of time, \(t\) minutes, that each one can run comfortably at 4 metres per second on a treadmill. The results are summarised by $$\Sigma p = 1176, \quad \Sigma t = 511, \quad \Sigma p^2 = 70932, \quad \Sigma t^2 = 19213, \quad \Sigma pt = 27188.$$
  1. Calculate the product moment correlation coefficient for these data. [5 marks]
  2. Stating your hypotheses clearly, test at the 1\% level of significance whether there is evidence of people with a lower resting pulse having a higher level of fitness as measured by the test. [4 marks]
  3. State an assumption necessary to carry out the test in part (b) and comment on its validity in this case. [2 marks]
AQA Paper 3 Specimen Q10
7 marks Moderate -0.8
Shona calculated four correlation coefficients using data from the Large Data Set. In each case she calculated the correlation coefficient between the masses of the cars and the CO₂ emissions for varying sample sizes. A summary of these calculations, labelled A to D, is given in the table below.
| | Sample size | Correlation coefficient |
|---|---|---|
| A | 3827 | 0.088 |
| B | 3735 | 0.246 |
| C | 24 | 0.400 |
| D | 1250 | -1.183 |
Shona would like to use calculation A to test whether there is evidence of positive correlation between mass and CO₂ emissions. She finds the critical value for a one-tailed test at the 5% level for a sample of size 3827 is 0.027.
    1. State appropriate hypotheses for Shona to use in her test. [1 mark]
    2. Determine if there is sufficient evidence to reject the null hypothesis. Fully justify your answer. [1 mark]
  1. Shona's teacher tells her to remove calculation D from the table as it is incorrect. Explain how the teacher knew it was incorrect. [1 mark]
  2. Before performing calculation B, Shona cleaned the data. She removed all cars from the Large Data Set that had incorrect masses. Using your knowledge of the large data set, explain what was incorrect about the masses which were removed from the calculation. [1 mark]
  3. Apart from CO₂ and CO emissions, state one other type of emission that Shona could investigate using the Large Data Set. [1 mark]
  4. Wesley claims that calculation C shows that a heavier car causes higher CO₂ emissions. Give two reasons why Wesley's claim may be incorrect. [2 marks]
OCR MEI Paper 2 Specimen Q9
4 marks Moderate -0.8
A geyser is a hot spring which erupts from time to time. For two geysers, the duration of each eruption, \(x\) minutes, and the waiting time until the next eruption, \(y\) minutes, are recorded.
  1. For a random sample of 50 eruptions of the first geyser, the correlation coefficient between \(x\) and \(y\) is 0.758. The critical value for a 2-tailed hypothesis test for correlation at the 5% level is 0.279. Explain whether or not there is evidence of correlation in the population of eruptions. [2]
The scatter diagram in Fig. 9 shows the data from a random sample of 50 eruptions of the second geyser. \includegraphics{figure_9}
  2. Stella claims the scatter diagram shows evidence of correlation between duration of eruption and waiting time. Make two comments about Stella's claim. [2]
OCR Further Statistics AS Specimen Q8
10 marks Standard +0.3
The following table gives the mean per capita consumption of mozzarella cheese per annum, \(x\) pounds, and the number of civil engineering doctorates awarded, \(y\), in the United States in each of 10 years.
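| \(x\) | 9.3 | 9.7 | 9.7 | 9.7 | 9.9 | 10.2 | 10.5 | 11.0 | 10.6 | 10.6 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 480 | 501 | 540 | 552 | 547 | 622 | 655 | 701 | 712 | 708 |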
source: www.tylervigen.com
  1. Find the equation of the regression line of \(y\) on \(x\). [2]
You are given that the product moment correlation coefficient is 0.959.
  2. Explain whether this value would be different if \(x\) is measured in kilograms instead of pounds. [1]
It is desired to carry out a hypothesis test to investigate whether there is correlation between these two variables.
  3. Assume that the data is a random sample of all years.
    1. Carry out the test at the 10\% significance level. [6]
    2. Explain whether your conclusion suggests that manufacturers of mozzarella cheese could increase consumption by sponsoring doctoral candidates in civil engineering. [1]
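The question about changing units can be explored numerically. The sketch below recomputes \(r\) from the tabulated data with \(x\) in pounds and again in kilograms; the helper name `pmcc` is hypothetical, and the conversion factor of 0.45359237 kg per pound is a standard value that is not stated in the question.

```python
from math import sqrt

def pmcc(xs, ys):
    """Pearson's r from raw paired data (hypothetical helper name)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Data from the table in the question.
pounds = [9.3, 9.7, 9.7, 9.7, 9.9, 10.2, 10.5, 11.0, 10.6, 10.6]
doctorates = [480, 501, 540, 552, 547, 622, 655, 701, 712, 708]

KG_PER_POUND = 0.45359237  # assumption: standard conversion, not from the question
kilograms = [KG_PER_POUND * x for x in pounds]

r_pounds = pmcc(pounds, doctorates)
r_kilograms = pmcc(kilograms, doctorates)
print(round(r_pounds, 3), round(r_kilograms, 3))
```

Both values agree because multiplying every \(x\) by a positive constant scales \(S_{xy}\) and \(\sqrt{S_{xx}}\) by the same factor, which cancels in the ratio.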
OCR Further Statistics 2020 November Q2
8 marks Standard +0.3
A book collector compared the prices of some books, \(£x\), when new in 1972 and the prices of copies of the same books, \(£y\), on a second-hand website in 2018. The results are shown in Table 1 and are summarised below the table.
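| Book | A | B | C | D | E | F | G | H | I | J | K | L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \(x\) | 0.95 | 0.65 | 0.70 | 0.90 | 0.55 | 1.40 | 1.50 | 0.50 | 1.15 | 0.35 | 0.20 | 0.35 |
| \(y\) | 6.06 | 7.00 | 2.00 | 5.87 | 4.00 | 5.36 | 7.19 | 2.50 | 3.00 | 8.29 | 1.37 | 2.00 |

Table 1

\(n = 12, \Sigma x = 9.20, \Sigma y = 54.64, \Sigma x^2 = 8.9950, \Sigma y^2 = 310.4572, \Sigma xy = 46.0545\)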
  1. It is given that the value of Pearson's product-moment correlation coefficient for the data is 0.381, correct to 3 significant figures.
    1. State what this information tells you about a scatter diagram illustrating the data. [1]
    2. Test at the 5\% significance level whether there is evidence of positive correlation between prices in 1972 and prices in 2018. [5]
  2. The collector noticed that the second-hand copy of book J was unusually expensive and he decided to ignore the data for book J. Calculate the value of Pearson's product-moment correlation coefficient for the other 11 books. [2]
WJEC Unit 4 2019 June Q5
9 marks Moderate -0.8
A bowling alley manager in the UK is concerned about falling revenues. He collects data from the United States, hoping to use what he finds to revive his business in the UK. He finds data which seem to show correlation between margarine consumption and bowling alley revenue. He attempts to carry out some statistical analysis in order to present his findings to the board of directors. He produces the scatter diagram shown below. \includegraphics{figure_5} The product moment correlation coefficient for these data is \(-0.7617\). He carries out a one-tailed test at the 1\% level of significance and concludes that higher margarine consumption is associated with lower revenue generated by bowling alleys.
  1. Show all the working for this test. [5]
The manager also conducts a significance test for bowling alley revenue and fish consumption per person. He produces the computer output, shown below, for the analysis of bowling alley revenue versus fish consumption per person.
\# Pearson's product-moment correlation
\# data: revenue and fish
\# t = 3.8303, df = 8, p-value = 0.005215
\# alternative hypothesis: true correlation is not equal to 0
\# sample estimates:
\# correlation
\# 0.802423
  2. Comment on the correlation between bowling alley revenue and fish consumption per person and what the board of directors should do in light of the manager's findings in part (a) and part (b). [3]
  3. Give one possible reason why the board of directors might not be happy with the manager's analysis. [1]
WJEC Further Unit 2 Specimen Q4
9 marks Moderate -0.8
A year 12 student wishes to study at a Welsh university. For a randomly chosen year between 2000 and 2017 she collected data for seven universities in Wales from the Complete University Guide website. The data are for the variables:
• 'Entry standards' – the average UCAS tariff score of new undergraduate students;
• 'Student satisfaction' – a measure of student views of the teaching quality at the university taken from the National Student Survey (maximum 5);
• 'Graduate prospects' – a measure of the employability of a university's first degree graduates (maximum 100);
• 'Research quality' – a measure of the quality of the research undertaken in the university (maximum 4).
  1. Pearson's product-moment correlation coefficients, for each pairing of the four variables, are shown in the table below. Discuss the correlation between graduate prospects and the other three variables. [2]
    | Variable | Entry standards | Student satisfaction | Graduate prospects | Research quality |
    |---|---|---|---|---|
    | Entry standards | 1 | | | |
    | Student satisfaction | -0.030 | 1 | | |
    | Graduate prospects | 0.772 | 0.236 | 1 | |
    | Research quality | 0.866 | 0.066 | 0.827 | 1 |
  2. Calculate the equation of the least squares regression line to predict 'Entry standards'(y) from 'Research quality'(x), given the summary statistics: $$\sum x = 22.24, \sum y = 2522, S_{xx} = 1.0542, S_{xy} = 20193.5, S_{yy} = 122.72.$$ [5]
  3. The data for one of the Welsh universities are missing. This university has a research quality of 3.00. Use your equation to predict the entry standard for this university. [2]
SPS SPS FM Statistics 2021 January Q3
7 marks Moderate -0.3
A large field of wheat is split into 8 plots of equal area. Each plot is treated with a different amount of fertiliser, \(f\) grams/m². The yield of wheat, \(w\) tonnes, from each plot is recorded. The results are summarised below. $$\sum f = 28 \quad \sum w = 303 \quad \sum w^2 = 13447 \quad S_{ff} = 42 \quad S_{fw} = 269.5$$
  1. Calculate the product moment correlation coefficient between \(f\) and \(w\). [2]
  2. Interpret the value of your product moment correlation coefficient. [1]
  3. Find the equation of the regression line of \(w\) on \(f\) in the form \(w = a + bf\). [3]
  4. Using your equation, estimate the decrease in yield when the amount of fertiliser decreases by 0.5 grams/m² [1]
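When \(S_{ff}\) and \(S_{fw}\) are given directly, only \(S_{ww}\) has to be built from the raw sums before \(r\) and the regression coefficients follow. The sketch below works through that arithmetic in Python under the standard least-squares formulas \(b = S_{fw}/S_{ff}\) and \(a = \bar{w} - b\bar{f}\); the variable names are illustrative.

```python
from math import sqrt

# Summary values quoted in the question.
n = 8
sum_f, sum_w, sum_w2 = 28, 303, 13447
s_ff, s_fw = 42, 269.5

# Only S_ww needs to be derived from the raw sums.
s_ww = sum_w2 - sum_w ** 2 / n
r = s_fw / sqrt(s_ff * s_ww)

# Least-squares regression line of w on f: w = a + b f.
b = s_fw / s_ff
a = sum_w / n - b * sum_f / n

# Change in predicted yield for a 0.5 g/m^2 reduction in fertiliser.
decrease = b * 0.5
print(round(r, 3), round(a, 2), round(b, 2), round(decrease, 2))
```

The final line uses the fact that the gradient \(b\) is the predicted change in \(w\) per unit change in \(f\), so the intercept plays no part in part 4.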
OCR Further Statistics 2021 June Q1
5 marks Moderate -0.3
A set of bivariate data \((X, Y)\) is summarised as follows. \(n = 25\), \(\Sigma x = 9.975\), \(\Sigma y = 11.175\), \(\Sigma x^2 = 5.725\), \(\Sigma y^2 = 46.200\), \(\Sigma xy = 11.575\)
  1. Calculate the value of Pearson's product-moment correlation coefficient. [1]
  2. Calculate the equation of the regression line of \(y\) on \(x\). [2]
It is desired to know whether the regression line of \(y\) on \(x\) will provide a reliable estimate of \(y\) when \(x = 0.75\).
  3. State one reason for believing that the estimate will be reliable. [1]
  4. State what further information is needed in order to determine whether the estimate is reliable. [1]
OCR FS1 AS 2017 Specimen Q8
10 marks Standard +0.3
The following table gives the mean per capita consumption of mozzarella cheese per annum, \(x\) pounds, and the number of civil engineering doctorates awarded, \(y\), in the United States in each of 10 years.
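| \(x\) | 9.3 | 9.7 | 9.7 | 9.7 | 9.9 | 10.2 | 10.5 | 11.0 | 10.6 | 10.6 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 480 | 501 | 540 | 552 | 547 | 622 | 655 | 701 | 712 | 708 |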
source: www.tylervigen.com
  1. Find the equation of the regression line of \(y\) on \(x\). [2]
You are given that the product moment correlation coefficient is 0.959.
  2. Explain whether this value would be different if \(x\) is measured in kilograms instead of pounds. [1]
It is desired to carry out a hypothesis test to investigate whether there is correlation between these two variables.
  3. Assume that the data is a random sample of all years.
    1. Carry out the test at the 10\% significance level. [6]
    2. Explain whether your conclusion suggests that manufacturers of mozzarella cheese could increase consumption by sponsoring doctoral candidates in civil engineering. [1]
Pre-U Pre-U 9794/1 2010 June Q13
10 marks Moderate -0.3
A survey was conducted into the annual salary offered for 19 different jobs in 2008. The results were as follows, in thousands of pounds.
15 16 18 19 21 36 36 38 41 41
43 47 51 55 56 60 62 64 110
It was decided to undertake a further study to see if self-esteem was correlated with level of annual salary. A random sample of 11 employees was taken and self-esteem was rated on a scale of 1 to 10 with the highest self-esteem being 10. The results were as follows.
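| Salary in £10 000's | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Self-esteem | 4 | 3 | 5 | 1 | 7 | 7 | 8 | 5 | 10 | 7 | 9 |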
Pre-U Pre-U 9794/3 2016 June Q1
4 marks Moderate -0.8
The following data refer to the annual rate of inflation and the annual percentage pay increase measured on 10 randomly chosen occasions.
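| Inflation rate (\%) | 0.9 | 1.2 | 1.6 | 1.5 | 1.7 | 3.0 | 4.1 | 3.7 | 2.8 | 4.2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Pay increase (\%) | 4.8 | 4.7 | 3.8 | 4.4 | 5.6 | 5.5 | 2.4 | 0.4 | 0.6 | 1.7 |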
Show that, for these data, the product moment correlation coefficient between the rate of inflation and the annual pay increase is \(-0.679\), correct to 3 significant figures. [4]
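A "show that" question of this kind needs the intermediate quantities \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\) as well as the final value. The Python sketch below builds them from the tabulated data as a way of checking hand working; the variable names are illustrative, with \(x\) standing for inflation rate and \(y\) for pay increase.

```python
from math import sqrt

# Data from the table in the question.
inflation = [0.9, 1.2, 1.6, 1.5, 1.7, 3.0, 4.1, 3.7, 2.8, 4.2]
pay_rise = [4.8, 4.7, 3.8, 4.4, 5.6, 5.5, 2.4, 0.4, 0.6, 1.7]
n = len(inflation)

# Raw sums that would normally be written down in the working.
sum_x = sum(inflation)
sum_y = sum(pay_rise)
sum_x2 = sum(x * x for x in inflation)
sum_y2 = sum(y * y for y in pay_rise)
sum_xy = sum(x * y for x, y in zip(inflation, pay_rise))

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / sqrt(s_xx * s_yy)
print(round(s_xx, 3), round(s_yy, 3), round(s_xy, 3), round(r, 3))
```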
CAIE FP2 2013 November Q9
Standard +0.3
For a random sample of 10 observations of pairs of values \((x, y)\), the equations of the regression lines of \(y\) on \(x\) and of \(x\) on \(y\) are $$y = 4.21x - 0.862 \quad \text{and} \quad x = 0.043y + 6.36,$$ respectively.
  1. Find the value of the product moment correlation coefficient for the sample.
  2. Test, at the \(10\%\) significance level, whether there is evidence of non-zero correlation between the variables.
  3. Find the mean values of \(x\) and \(y\) for this sample.
  4. Estimate the value of \(x\) when \(y = 2.3\) and comment on the reliability of your answer.
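Two standard facts link a pair of regression lines to the quantities asked for here: the product of the two gradients equals \(r^2\), since \((S_{xy}/S_{xx})(S_{xy}/S_{yy}) = r^2\), and both lines pass through \((\bar{x}, \bar{y})\). The Python sketch below applies them to the lines in the question; the variable names are illustrative, and it is offered as a check on hand working rather than as a model answer.

```python
from math import sqrt

# Gradients and intercepts of the two regression lines from the question.
b_yx, a_yx = 4.21, -0.862   # y = b_yx * x + a_yx
b_xy, a_xy = 0.043, 6.36    # x = b_xy * y + a_xy

# Product of gradients is r^2; r takes the common sign of the gradients.
r = sqrt(b_yx * b_xy) if b_yx > 0 else -sqrt(b_yx * b_xy)

# Both lines pass through the mean point, so solve them simultaneously:
# x = b_xy * (b_yx * x + a_yx) + a_xy.
x_bar = (b_xy * a_yx + a_xy) / (1 - b_yx * b_xy)
y_bar = b_yx * x_bar + a_yx
print(round(r, 3), round(x_bar, 2), round(y_bar, 2))
```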