5.08d Hypothesis test: Pearson correlation

109 questions

AQA Paper 3 2019 June Q15
3 marks Moderate -0.8
Jamal, a farmer, claims that the larger the rainfall, the greater the yield of wheat from his farm. He decides to investigate his claim at the 5\% level of significance. He measures the rainfall in centimetres and the yield in kilograms for a random sample of ten years. He correctly calculates the product moment correlation coefficient between rainfall and yield for his sample to be 0.567. The table below shows the critical values for correlation coefficients for a sample size of 10 for different significance levels, for both 1- and 2-tailed tests.

| 1-tailed test significance level | 5% | 2.5% | 1% | 0.5% |
|---|---|---|---|---|
| 2-tailed test significance level | 10% | 5% | 2% | 1% |
| Critical value | 0.549 | 0.632 | 0.716 | 0.765 |

Determine what Jamal's conclusion to his investigation should be, justifying your answer. [3 marks]
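
The critical values in the table can be reproduced, assuming (as the standard tables do) that rainfall and yield come from a bivariate normal population. Under \(H_0: \rho = 0\) the sample PMCC has density proportional to \((1 - t^2)^{(n-4)/2}\) on \([-1, 1]\). The Python sketch below integrates that density numerically, finds each critical value by bisection, and then applies the one-tailed test \(H_1: \rho > 0\) to Jamal's value. The function names `null_upper_tail` and `critical_r` are illustrative, not from any library.

```python
import math


def null_upper_tail(r: float, n: int, steps: int = 1000) -> float:
    """P(R >= r) under H0: rho = 0, assuming a bivariate normal population.

    Under H0 the sample PMCC R has density proportional to (1 - t^2)^((n - 4) / 2)
    on [-1, 1], normalised by the Beta function B(1/2, (n - 2) / 2).
    The integral over [r, 1] uses composite Simpson's rule (steps must be even).
    Requires n >= 4 so that the integrand is finite at t = 1.
    """
    k = (n - 4) / 2
    h = (1.0 - r) / steps
    total = 0.0
    for i in range(steps + 1):
        t = r + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 == 1 else 2)
        total += weight * max(0.0, 1.0 - t * t) ** k  # max() guards rounding past t = 1
    log_beta = math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2)
    return (total * h / 3) / math.exp(log_beta)


def critical_r(n: int, alpha: float) -> float:
    """One-tailed critical value: the r with P(R >= r) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if null_upper_tail(mid, n) > alpha:
            lo = mid  # upper tail still larger than alpha: critical value lies further right
        else:
            hi = mid
    return hi


n, r = 10, 0.567
for alpha in (0.05, 0.025, 0.01, 0.005):
    print(f"one-tailed {alpha:.1%}: critical value {critical_r(n, alpha):.3f}")
print(f"p-value for r = {r}: {null_upper_tail(r, n):.4f}")
print("Reject H0" if r > critical_r(n, 0.05) else "Do not reject H0")
```

Since \(0.567 > 0.549\), the sketch rejects \(H_0\) at the 5% level: there is evidence of positive correlation between rainfall and yield, supporting Jamal's claim.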
AQA Paper 3 Specimen Q10
7 marks Moderate -0.8
Shona calculated four correlation coefficients using data from the Large Data Set. In each case she calculated the correlation coefficient between the masses of the cars and the CO₂ emissions for varying sample sizes. A summary of these calculations, labelled A to D, is listed in the table below.

| Calculation | Sample size | Correlation coefficient |
|---|---|---|
| A | 3827 | 0.088 |
| B | 3735 | 0.246 |
| C | 24 | 0.400 |
| D | 1250 | -1.183 |

  1. Shona would like to use calculation A to test whether there is evidence of positive correlation between mass and CO₂ emissions. She finds the critical value for a one-tailed test at the 5% level for a sample of size 3827 is 0.027.
      1. State appropriate hypotheses for Shona to use in her test. [1 mark]
      2. Determine if there is sufficient evidence to reject the null hypothesis. Fully justify your answer. [1 mark]
  2. Shona's teacher tells her to remove calculation D from the table as it is incorrect. Explain how the teacher knew it was incorrect. [1 mark]
  3. Before performing calculation B, Shona cleaned the data. She removed all cars from the Large Data Set that had incorrect masses. Using your knowledge of the Large Data Set, explain what was incorrect about the masses which were removed from the calculation. [1 mark]
  4. Apart from CO₂ and CO emissions, state one other type of emission that Shona could investigate using the Large Data Set. [1 mark]
  5. Wesley claims that calculation C shows that a heavier car causes higher CO₂ emissions. Give two reasons why Wesley's claim may be incorrect. [2 marks]
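
Two of the checks above are mechanical: a PMCC must lie in \([-1, 1]\), which rules out calculation D, and the test for calculation A (\(H_0: \rho = 0\), \(H_1: \rho > 0\)) compares \(r = 0.088\) with the critical value 0.027. The Python sketch below does both. As an extra assumption not made in the question, it also approximates the critical value for \(n = 3827\) from \(r_{\text{crit}} = t/\sqrt{n - 2 + t^2}\), replacing the \(t\) quantile with a standard normal quantile; that substitution is only reasonable for large samples.

```python
from statistics import NormalDist

# Calculations A to D from the table: label -> (sample size, correlation coefficient)
calculations = {
    "A": (3827, 0.088),
    "B": (3735, 0.246),
    "C": (24, 0.400),
    "D": (1250, -1.183),
}


def is_valid_pmcc(r: float) -> bool:
    """A product moment correlation coefficient must lie in [-1, 1]."""
    return -1.0 <= r <= 1.0


def approx_critical_r(n: int, alpha: float) -> float:
    """Approximate one-tailed critical value of r for a large sample.

    Uses r_crit = t / sqrt(n - 2 + t^2) with the t quantile replaced by the
    standard normal quantile (an assumption that only suits large n).
    """
    z = NormalDist().inv_cdf(1 - alpha)
    return z / (n - 2 + z * z) ** 0.5


invalid = [label for label, (_, r) in calculations.items() if not is_valid_pmcc(r)]
print("Invalid calculations:", invalid)

n_a, r_a = calculations["A"]
cv = approx_critical_r(n_a, 0.05)
print(f"Approximate critical value for n = {n_a}: {cv:.4f}")
print("Reject H0" if r_a > cv else "Do not reject H0")
```

The approximation gives about 0.0266, which rounds to the quoted 0.027. Since \(0.088 > 0.027\), \(H_0\) is rejected: there is evidence of positive correlation between mass and CO₂ emissions.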
OCR MEI Paper 2 Specimen Q9
4 marks Moderate -0.8
A geyser is a hot spring which erupts from time to time. For two geysers, the duration of each eruption, \(x\) minutes, and the waiting time until the next eruption, \(y\) minutes, are recorded.
  1. For a random sample of 50 eruptions of the first geyser, the correlation coefficient between \(x\) and \(y\) is 0.758. The critical value for a 2-tailed hypothesis test for correlation at the 5% level is 0.279. Explain whether or not there is evidence of correlation in the population of eruptions. [2]
The scatter diagram in Fig. 9 shows the data from a random sample of 50 eruptions of the second geyser. \includegraphics{figure_9}
  2. Stella claims the scatter diagram shows evidence of correlation between duration of eruption and waiting time. Make two comments about Stella's claim. [2]
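
For part 1, the decision rule is to reject \(H_0: \rho = 0\) in favour of \(H_1: \rho \neq 0\) when \(|r|\) exceeds the critical value, and here \(0.758 > 0.279\). As a cross-check, the Python sketch below swaps in Fisher's \(z\)-transformation, a large-sample approximation in which \(\operatorname{artanh}(r)\sqrt{n-3}\) is roughly standard normal under \(H_0\). This is not the method the question expects, and it reproduces the tabulated critical value only to within about 0.001.

```python
import math
from statistics import NormalDist


def fisher_z_test(r: float, n: int) -> tuple[float, float]:
    """Approximate two-tailed test of H0: rho = 0 via Fisher's z-transformation.

    Returns the statistic atanh(r) * sqrt(n - 3) and its two-tailed p-value.
    """
    stat = math.atanh(r) * math.sqrt(n - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value


def fisher_critical_r(n: int, alpha: float) -> float:
    """Approximate two-tailed critical value of |r| at significance level alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z / math.sqrt(n - 3))


n, r = 50, 0.758
stat, p_value = fisher_z_test(r, n)
cv = fisher_critical_r(n, 0.05)
print(f"approximate critical value: {cv:.4f}")
print(f"z = {stat:.2f}, p = {p_value:.2e}")
print("Reject H0" if abs(r) > cv else "Do not reject H0")
```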
OCR Further Statistics AS Specimen Q8
10 marks Standard +0.3
The following table gives the mean per capita consumption of mozzarella cheese per annum, \(x\) pounds, and the number of civil engineering doctorates awarded, \(y\), in the United States in each of 10 years.
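
| \(x\) | 9.3 | 9.7 | 9.7 | 9.7 | 9.9 | 10.2 | 10.5 | 11.0 | 10.6 | 10.6 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 480 | 501 | 540 | 552 | 547 | 622 | 655 | 701 | 712 | 708 |
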
source: www.tylervigen.com
  1. Find the equation of the regression line of \(y\) on \(x\). [2]
You are given that the product moment correlation coefficient is 0.959.
  2. Explain whether this value would be different if \(x\) is measured in kilograms instead of pounds. [1]
It is desired to carry out a hypothesis test to investigate whether there is correlation between these two variables.
  3. Assume that the data is a random sample of all years.
      1. Carry out the test at the 10\% significance level. [6]
      2. Explain whether your conclusion suggests that manufacturers of mozzarella cheese could increase consumption by sponsoring doctoral candidates in civil engineering. [1]
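
The numerical parts of this question can be checked directly from the table. The Python sketch below computes the least-squares line of \(y\) on \(x\) and the PMCC, recomputes the PMCC with \(x\) converted to kilograms (1 lb = 0.45359237 kg) to show that a positive linear rescaling leaves \(r\) unchanged, and compares \(r\) with 0.5494, the standard two-tailed 10% critical value for \(n = 10\) (the same value appears, rounded to 0.549, in the AQA table above). The critical value is an input assumed from tables, not something the code derives.

```python
import math

# Table values: x = mozzarella consumption (pounds per capita), y = doctorates
x = [9.3, 9.7, 9.7, 9.7, 9.9, 10.2, 10.5, 11.0, 10.6, 10.6]
y = [480, 501, 540, 552, 547, 622, 655, 701, 712, 708]


def summary(xs, ys):
    """Means and the sums of squares/products about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xx = sum((xi - mx) ** 2 for xi in xs)
    s_yy = sum((yi - my) ** 2 for yi in ys)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    return mx, my, s_xx, s_yy, s_xy


def regression_y_on_x(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    mx, my, s_xx, _, s_xy = summary(xs, ys)
    b = s_xy / s_xx
    return my - b * mx, b


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient."""
    _, _, s_xx, s_yy, s_xy = summary(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)


a, b = regression_y_on_x(x, y)
r = pmcc(x, y)
r_kg = pmcc([xi * 0.45359237 for xi in x], y)  # 1 lb = 0.45359237 kg

CRITICAL_10 = 0.5494  # two-tailed 10%, n = 10 (assumed from standard tables)
print(f"y = {b:.1f}x {a:+.1f}")
print(f"r = {r:.3f}, r with x in kg = {r_kg:.3f}")
print("Reject H0" if abs(r) > CRITICAL_10 else "Do not reject H0")
```

The sketch gives approximately \(y = 157.1x - 988.1\) and \(r \approx 0.959\). Rejecting \(H_0\) indicates correlation, but these are observational data with no plausible causal link, so the result gives no support to the sponsorship idea.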
OCR Further Statistics 2020 November Q2
8 marks Standard +0.3
A book collector compared the prices of some books, \(£x\), when new in 1972 and the prices of copies of the same books, \(£y\), on a second-hand website in 2018. The results are shown in Table 1 and are summarised below the table.
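
| Book | A | B | C | D | E | F | G | H | I | J | K | L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \(x\) | 0.95 | 0.65 | 0.70 | 0.90 | 0.55 | 1.40 | 1.50 | 0.50 | 1.15 | 0.35 | 0.20 | 0.35 |
| \(y\) | 6.06 | 7.00 | 2.00 | 5.87 | 4.00 | 5.36 | 7.19 | 2.50 | 3.00 | 8.29 | 1.37 | 2.00 |
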
Table 1 \(n = 12, \Sigma x = 9.20, \Sigma y = 54.64, \Sigma x^2 = 8.9950, \Sigma y^2 = 310.4572, \Sigma xy = 46.0545\)
  1. It is given that the value of Pearson's product-moment correlation coefficient for the data is 0.381, correct to 3 significant figures.
    1. State what this information tells you about a scatter diagram illustrating the data. [1]
    2. Test at the 5\% significance level whether there is evidence of positive correlation between prices in 1972 and prices in 2018. [5]
  2. The collector noticed that the second-hand copy of book J was unusually expensive and he decided to ignore the data for book J. Calculate the value of Pearson's product-moment correlation coefficient for the other 11 books. [2]
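
Both values of the PMCC follow from the summary statistics via \(S_{xx} = \Sigma x^2 - (\Sigma x)^2/n\), with \(S_{yy}\) and \(S_{xy} = \Sigma xy - \Sigma x \Sigma y/n\) defined similarly, and \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\). Ignoring book J just means subtracting its contributions (\(x = 0.35\), \(y = 8.29\)) from each sum. The Python sketch below does this. The one-tailed 5% critical value 0.4973 for \(n = 12\) is taken from standard tables and is an assumed input, since the question does not give it.

```python
import math


def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from n, Σx, Σy, Σx², Σy² and Σxy."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Summary statistics for all 12 books, as given below Table 1
stats = dict(n=12, sum_x=9.20, sum_y=54.64, sum_x2=8.9950, sum_y2=310.4572, sum_xy=46.0545)
r_all = pmcc_from_sums(**stats)

# Drop book J by subtracting its contribution to every sum
xj, yj = 0.35, 8.29
r_without_j = pmcc_from_sums(
    n=stats["n"] - 1,
    sum_x=stats["sum_x"] - xj,
    sum_y=stats["sum_y"] - yj,
    sum_x2=stats["sum_x2"] - xj ** 2,
    sum_y2=stats["sum_y2"] - yj ** 2,
    sum_xy=stats["sum_xy"] - xj * yj,
)

CRITICAL_12 = 0.4973  # one-tailed 5%, n = 12 (assumed from standard tables)
print(f"r (12 books) = {r_all:.3f}")
print("Reject H0" if r_all > CRITICAL_12 else "Do not reject H0")
print(f"r (without J) = {r_without_j:.3f}")
```

With all 12 books, \(0.381 < 0.4973\), so there is insufficient evidence at the 5% level of positive correlation. Without book J the sketch gives \(r \approx 0.650\).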
WJEC Unit 4 2019 June Q5
9 marks Moderate -0.8
A bowling alley manager in the UK is concerned about falling revenues. He collects data from the United States, hoping to use what he finds to revive his business in the UK. He finds data which seem to show correlation between margarine consumption and bowling alley revenue. He attempts to carry out some statistical analysis in order to present his findings to the board of directors. He produces the scatter diagram shown below.
\includegraphics{figure_5}
The product moment correlation coefficient for these data is \(-0.7617\). He carries out a one-tailed test at the 1\% level of significance and concludes that higher margarine consumption is associated with lower revenue generated by bowling alleys.
  1. Show all the working for this test. [5]
The manager also conducts a significance test for bowling alley revenue and fish consumption per person. He produces the computer output, shown below, for the analysis of bowling alley revenue versus fish consumption per person.
\# Pearson's product-moment correlation
\# data: revenue and fish
\# t = 3.8303, df = 8, p-value = 0.005215
\# alternative hypothesis: true correlation is not equal to 0
\# sample estimates:
\# correlation
\# 0.802423
  2. Comment on the correlation between bowling alley revenue and fish consumption per person and what the board of directors should do in light of the manager's findings in part (a) and part (b). [3]
  3. Give one possible reason why the board of directors might not be happy with the manager's analysis. [1]
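
The \(p\)-value in the computer output can be reproduced from \(r\) alone, assuming a bivariate normal population: under \(H_0: \rho = 0\) the sample PMCC has density proportional to \((1 - t^2)^{(n-4)/2}\), and for even \(n\) this is a polynomial that integrates exactly. The Python sketch below applies that to the fish output (\(n = 10\), since df \(= n - 2 = 8\)) and to part 1 (\(H_1: \rho < 0\)). For part 1 it assumes \(n = 10\) as well, because the margarine sample size is not stated in the text and the scatter diagram is not reproduced here. The helper name `null_upper_tail_exact` is illustrative.

```python
import math


def null_upper_tail_exact(r: float, n: int) -> float:
    """P(R >= r) under H0: rho = 0, for a bivariate normal sample of even size n >= 4.

    The null density of R is proportional to (1 - t^2)^k with k = (n - 4) / 2.
    For even n, k is a whole number, so the binomial expansion of (1 - t^2)^k
    can be integrated exactly term by term.
    """
    if n < 4 or n % 2 == 1:
        raise ValueError("this sketch only handles even n >= 4")
    k = (n - 4) // 2

    def antiderivative(t: float) -> float:
        return sum(
            math.comb(k, j) * (-1) ** j * t ** (2 * j + 1) / (2 * j + 1)
            for j in range(k + 1)
        )

    whole = 2 * antiderivative(1.0)  # integral of the kernel over [-1, 1] (odd antiderivative)
    return (antiderivative(1.0) - antiderivative(r)) / whole


# Part 1: one-tailed at 1%, H1: rho < 0; n = 10 is an assumption (see above)
r_margarine = -0.7617
p_margarine = null_upper_tail_exact(-r_margarine, 10)  # P(R <= r) = P(R >= -r) by symmetry
print(f"margarine: one-tailed p = {p_margarine:.4f}")
print("Reject H0" if p_margarine < 0.01 else "Do not reject H0")

# Fish output: two-tailed, df = 8 so n = 10
r_fish = 0.802423
p_fish = 2 * null_upper_tail_exact(r_fish, 10)
print(f"fish: two-tailed p = {p_fish:.6f}")  # compare with p-value = 0.005215 in the output
```

Under the \(n = 10\) assumption, the margarine \(p\)-value is about 0.005, below 1%, which is consistent with the manager's conclusion.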
OCR FS1 AS 2017 Specimen Q8
10 marks Standard +0.3
This question is identical to OCR Further Statistics AS Specimen Q8 above.
CAIE FP2 2013 November Q9
Standard +0.3
For a random sample of 10 observations of pairs of values \(( x , y )\), the equations of the regression lines of \(y\) on \(x\) and of \(x\) on \(y\) are $$y = 4.21 x - 0.862 \quad \text { and } \quad x = 0.043 y + 6.36 ,$$ respectively.
  1. Find the value of the product moment correlation coefficient for the sample.
  2. Test, at the \(10 \%\) significance level, whether there is evidence of non-zero correlation between the variables.
  3. Find the mean values of \(x\) and \(y\) for this sample.
  4. Estimate the value of \(x\) when \(y = 2.3\) and comment on the reliability of your answer.
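
All four parts follow from two facts about a pair of regression lines: the product of their gradients is \(r^2\) (with \(r\) taking their common sign), and both lines pass through \((\bar{x}, \bar{y})\). The Python sketch below applies them. The 10% two-tailed critical value 0.5494 for \(n = 10\) is an input assumed from standard tables.

```python
import math

# y on x: y = b1*x + c1;  x on y: x = b2*y + c2
b1, c1 = 4.21, -0.862
b2, c2 = 0.043, 6.36

# r^2 = b1 * b2, and r has the same sign as the gradients
r = math.copysign(math.sqrt(b1 * b2), b1)

CRITICAL_10 = 0.5494  # two-tailed 10%, n = 10 (assumed from standard tables)
print(f"r = {r:.4f}:", "reject H0" if abs(r) > CRITICAL_10 else "do not reject H0")

# The lines meet at the means: substitute y = b1*x + c1 into x = b2*y + c2
x_bar = (b2 * c1 + c2) / (1 - b1 * b2)
y_bar = b1 * x_bar + c1
print(f"mean of x = {x_bar:.3f}, mean of y = {y_bar:.3f}")

# Estimating x from a given y uses the x-on-y line
x_est = b2 * 2.3 + c2
print(f"estimate of x at y = 2.3: {x_est:.3f}")
```

The sketch gives \(r \approx 0.425\), \(\bar{x} \approx 7.72\) and \(\bar{y} \approx 31.6\). The estimate \(x \approx 6.46\) at \(y = 2.3\) deserves little trust: the correlation is not significant at 10%, and \(y = 2.3\) is far from \(\bar{y}\), so it may lie outside the range of the data.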
CAIE FP2 2014 June Q11
Challenging +1.2
Answer only one of the following two alternatives.
EITHER
A particle \(P\) of mass \(m\) is suspended from a fixed point by a light elastic string of natural length \(l\), and hangs in equilibrium. The particle is pulled vertically down to a position where the length of the string is \(\frac { 13 } { 7 } l\). The particle is released from rest in this position and reaches its greatest height when the length of the string is \(\frac { 11 } { 7 } l\).
  1. Show that the modulus of elasticity of the string is \(\frac { 7 } { 5 } m g\).
  2. Show that \(P\) moves in simple harmonic motion about the equilibrium position and state the period of the motion.
  3. Find the time after release when the speed of \(P\) is first equal to half of its maximum value.
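
An outline of the EITHER alternative, sketched under the standard model (Hooke's law, light string, no air resistance): energy conservation between the two points of rest gives \(\lambda\), and the equilibrium extension then locates the centre of oscillation, midway between the two extreme lengths. The string stays taut throughout, since its shortest length \(\frac{11}{7}l\) exceeds \(l\), so the equation of motion holds for the whole oscillation.

```latex
\begin{aligned}
&\textbf{(1)}\ \text{Energy between the points of rest (extensions } \tfrac{6}{7}l \text{ and } \tfrac{4}{7}l\text{, height gained } \tfrac{2}{7}l\text{):}\\
&\quad \frac{\lambda}{2l}\left[\left(\tfrac{6}{7}l\right)^2-\left(\tfrac{4}{7}l\right)^2\right] = mg\cdot\tfrac{2}{7}l
  \;\Rightarrow\; \frac{10\lambda l}{49} = \frac{2mgl}{7}
  \;\Rightarrow\; \lambda = \tfrac{7}{5}mg\\[4pt]
&\textbf{(2)}\ \text{Equilibrium: } mg = \frac{\lambda e}{l} \Rightarrow e = \tfrac{5}{7}l.\ \text{With } x \text{ the displacement below equilibrium,}\\
&\quad m\ddot{x} = mg - \frac{\lambda(e + x)}{l} = -\frac{7mg}{5l}\,x
  \;\Rightarrow\; \omega^2 = \frac{7g}{5l},\quad T = 2\pi\sqrt{\frac{5l}{7g}}\\[4pt]
&\textbf{(3)}\ \text{Amplitude } \tfrac{13}{7}l - \tfrac{12}{7}l = \tfrac{1}{7}l,\ \text{so } x = \tfrac{l}{7}\cos\omega t \text{ and } |\dot{x}| = \tfrac{l}{7}\,\omega\,|\sin\omega t|.\\
&\quad |\dot{x}| = \tfrac{1}{2}v_{\max} \iff |\sin\omega t| = \tfrac{1}{2}
  \;\Rightarrow\; t = \frac{\pi}{6\omega} = \frac{\pi}{6}\sqrt{\frac{5l}{7g}}
\end{aligned}
```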

OR

For a random sample of 12 observations of pairs of values \(( x , y )\), the equation of the regression line of \(y\) on \(x\) and the equation of the regression line of \(x\) on \(y\) are $$y = b x + 4.5 \quad \text { and } \quad x = a y + c$$ respectively, where \(a , b\) and \(c\) are constants. The product moment correlation coefficient for the sample is 0.6.
  1. Test, at the \(5 \%\) significance level, whether there is evidence of positive correlation between the variables.
  2. Given that \(b - a = 0.5\), find the values of \(a\) and \(b\).
  3. Given that the sum of the \(x\)-values in the sample data is 66, find the value of \(c\) and sketch the two regression lines on the same diagram.

For each of the 12 pairs of values of \(( x , y )\) in the sample, another variable \(z\) is considered, where \(z = 5 y\).
  4. State the coefficient of \(x\) in the equation of the regression line of \(z\) on \(x\) and find the value of the product moment correlation coefficient between \(x\) and \(z\), justifying your answer.
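
For the OR alternative, the same two facts about regression lines do the work: \(ab = r^2 = 0.36\), and both lines pass through \((\bar{x}, \bar{y})\) with \(\bar{x} = 66/12\). Because \(r > 0\), both gradients are positive, so the positive root of the resulting quadratic is the relevant one. The Python sketch below solves for \(a\), \(b\) and \(c\); the one-tailed 5% critical value 0.4973 for \(n = 12\) is an input assumed from standard tables.

```python
import math

n, r = 12, 0.6
gap = 0.5          # b - a
sum_x = 66
intercept_y = 4.5  # y = b*x + 4.5

CRITICAL_12 = 0.4973  # one-tailed 5%, n = 12 (assumed from standard tables)
print("Reject H0" if r > CRITICAL_12 else "Do not reject H0")

# a*b = r^2 and b = a + gap  =>  a^2 + gap*a - r^2 = 0; r > 0 so take the positive root
a = (-gap + math.sqrt(gap ** 2 + 4 * r ** 2)) / 2
b = a + gap

# (x_bar, y_bar) lies on both lines
x_bar = sum_x / n
y_bar = b * x_bar + intercept_y
c = x_bar - a * y_bar
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}")

# z = 5y multiplies the z-on-x gradient by 5; a positive rescaling leaves r unchanged
print(f"gradient of z on x: {5 * b:.2f}, r(x, z) = {r}")
```

The sketch gives \(a = 0.4\), \(b = 0.9\) and \(c = 1.72\); the coefficient of \(x\) in the regression line of \(z\) on \(x\) is \(5b = 4.5\), and the PMCC between \(x\) and \(z\) stays at 0.6.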