Hypothesis test of Pearson’s product-moment correlation coefficient

Calculate PMCC from summary statistics

A question is this type if and only if it asks to calculate Pearson's product moment correlation coefficient given summary statistics (Σx, Σy, Σx², Σy², Σxy, n) or Sxx, Syy, Sxy.

19 Standard +0.1

30.6% of questions

Example

1 A wildlife expert measured the neck lengths, $x$ metres, and the tail lengths, $y$ metres, of a sample of 12 mature male giraffes as part of a study into their physical characteristics. The results are shown in the table.

Easiest question: Moderate -0.8

1 A wildlife expert measured the neck lengths, $x$ metres, and the tail lengths, $y$ metres, of a sample of 12 mature male giraffes as part of a study into their physical characteristics. The results are shown in the table.

Hardest question: Standard +0.3

3 A student is investigating the relationship between the length $x \mathrm {~mm}$ and circumference $y \mathrm {~mm}$ of plums from a large crop. The student measures the dimensions of a random sample of 10 plums from this crop. Summary statistics for these dimensions are as follows. $$\begin{aligned} & \sum x = 4715 \quad \sum y = 13175 \quad \sum x ^ { 2 } = 2237725 \\ & \sum y ^ { 2 } = 17455825 \quad \sum x y = 6235575 \quad n = 10 \end{aligned}$$

(i) Calculate the sample product moment correlation coefficient.
(ii) Carry out a hypothesis test at the $5 \%$ significance level to determine whether there is any correlation between length and circumference of plums from this crop. State your hypotheses clearly, defining any symbols which you use.
(iii) (A) Explain the meaning of a $5 \%$ significance level.
(iii) (B) State one advantage and one disadvantage of using a $1 \%$ significance level rather than a $5 \%$ significance level in a hypothesis test.

The student decides to take another random sample of 10 plums. Using the same hypotheses as in part (ii), the correlation coefficient for this second sample is significant at the $5 \%$ level. The student decides to ignore the first result and concludes that there is correlation between the length and circumference of plums in the crop.
(iv) Comment on the student's decision to ignore the first result. Suggest a better way in which the student could proceed.
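
As a rough check on the calculation part of the plums question, the summary statistics can be converted into $S_{xx}$, $S_{yy}$ and $S_{xy}$ and then into $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$. The sketch below assumes the usual definitions, for example $S_{xx} = \sum x^2 - (\sum x)^2 / n$; the function name `pmcc_from_sums` is made up for illustration and does not come from the question.

```python
from math import sqrt


def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return Pearson's r from summary statistics (hypothetical helper)."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)


# Summary statistics quoted in the plums question (n = 10).
r = pmcc_from_sums(10, 4715, 13175, 2237725, 17455825, 6235575)
print(round(r, 4))  # 0.6236
```

For $n = 10$, the two-tailed $5 \%$ critical value listed in the critical value table quoted elsewhere in this collection is 0.6319, so on this reading the first sample falls just short of significance.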

One-tailed test for positive correlation

A question is this type if and only if it asks to test whether there is positive correlation between two variables using a one-tailed hypothesis test with H₁: ρ > 0.

17 Standard +0.3

27.4% of questions

Example

2 A shopper estimates the cost, $\pounds X$ per item, of each of 12 items in a supermarket. The shopper's estimates are compared with the actual cost, $\pounds Y$ per item, of each item. The results are summarised as follows. $$\begin{aligned} & n = 12 \quad \sum x = 399 \quad \sum y = 623.88 \\ & \sum x ^ { 2 } = 28127 \quad \sum y ^ { 2 } = 116509.0212 \quad \sum x y = 45006.01 \end{aligned}$$

Test at the $1 \%$ significance level whether the shopper's estimates are positively correlated with the actual cost of the items.

Easiest question: Moderate -0.3

1 The best performances of a random sample of 20 junior athletes in the long jump, $x$ metres, and in the high jump, $y$ metres, were recorded. The following statistics were calculated from the results. $$S _ { x x } = 7.0036 \quad S _ { y y } = 0.8464 \quad S _ { x y } = 1.3781$$

Calculate the value of the product moment correlation coefficient between $x$ and $y$.
(2 marks)
Assuming that these data come from a bivariate normal distribution, investigate, at the $1 \%$ level of significance, the claim that for junior athletes there is a positive correlation between $x$ and $y$.
Interpret your conclusion in the context of this question.
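
For the first part of the long jump question, $r$ follows directly from the three quoted sums. A worked sketch, rounded to three decimal places:

$$r = \frac { S _ { x y } } { \sqrt { S _ { x x } S _ { y y } } } = \frac { 1.3781 } { \sqrt { 7.0036 \times 0.8464 } } \approx 0.566$$

This value would then be compared with the one-tailed $1 \%$ critical value for $n = 20$ from the tables supplied with the exam, which are not reproduced in the question text here.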

Hardest question: Standard +0.8

9 The land areas $x$ (in suitable units) and populations $y$ (in millions) for a sample of 8 randomly chosen cities are given in the following table.

Land area $( x )$	1.0	4.5	2.4	1.6	3.8	8.6	7.5	6.5
Population $( y )$	0.8	8.4	4.2	1.6	2.2	10.2	4.2	5.2

$$\left[ \Sigma x = 35.9 , \Sigma x ^ { 2 } = 216.47 , \Sigma y = 36.8 , \Sigma y ^ { 2 } = 244.96 , \Sigma x y = 212.62 . \right]$$

Find, showing all necessary working, the value of the product moment correlation coefficient for this sample.
Using a $1 \%$ significance level, test whether there is positive correlation between land area and population of cities.
The land areas and populations for another randomly chosen sample of cities, this time of size $n$, give a product moment correlation coefficient of 0.651. Using a test at the $1 \%$ significance level, there is evidence of non-zero correlation between the variables.
Find the least possible value of $n$, justifying your answer.
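
The "showing all necessary working" part of the cities question can be sketched in code: rebuild the five totals from the table, check them against the bracketed values, and then form $r$. The variable names are illustrative only, and the formulas assume the usual definitions of $S_{xx}$, $S_{yy}$ and $S_{xy}$.

```python
from math import sqrt

# Data copied from the table in the question.
land_area = [1.0, 4.5, 2.4, 1.6, 3.8, 8.6, 7.5, 6.5]
population = [0.8, 8.4, 4.2, 1.6, 2.2, 10.2, 4.2, 5.2]

n = len(land_area)
sum_x = sum(land_area)
sum_y = sum(population)
sum_x2 = sum(x * x for x in land_area)
sum_y2 = sum(y * y for y in population)
sum_xy = sum(x * y for x, y in zip(land_area, population))

# These should match the bracketed totals 35.9, 216.47, 36.8, 244.96, 212.62.
print(round(sum_x, 2), round(sum_x2, 2), round(sum_y, 2),
      round(sum_y2, 2), round(sum_xy, 2))

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # 0.733
```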

Two-tailed test for any correlation

A question is this type if and only if it asks to test whether there is any correlation (non-zero correlation) between two variables using a two-tailed hypothesis test.

14 Standard +0.0

22.6% of questions

Example

6 A random sample of 15 observations of pairs of values of two variables gives a product moment correlation coefficient of 0.430.

Test at the $10 \%$ significance level whether there is evidence of non-zero correlation between the variables.
A second random sample of $N$ observations gives a product moment correlation coefficient of 0.615. Using a $5 \%$ significance level, there is evidence of positive correlation between the variables.
Find the least possible value of $N$, justifying your answer.

Easiest question: Easy -1.2

3. Laxmi wishes to test whether there is linear correlation between the mass and the height of adult males.

State, with a reason, whether Laxmi should use a 1-tail or a 2-tail test.
Laxmi chooses a random sample of 40 adult males and calculates Pearson's product-moment correlation coefficient, $r$. She finds that $r = 0.2705$.
Use the table below to carry out the test at the $5 \%$ significance level.

Critical values of Pearson's product-moment correlation coefficient

1-tail test	$5 \%$	$2.5 \%$	$1 \%$	$0.5 \%$
2-tail test	$10 \%$	$5 \%$	$2 \%$	$1 \%$
$n = 38$	0.2709	0.3202	0.3760	0.4128
$n = 39$	0.2673	0.3160	0.3712	0.4076
$n = 40$	0.2638	0.3120	0.3665	0.4026
$n = 41$	0.2605	0.3081	0.3621	0.3978
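
The table lookup and comparison in Laxmi's test can be sketched as follows. This assumes the first column of the table is the sample size $n$ and uses only the two-tailed $5 \%$ column; the dictionary and function names are hypothetical.

```python
# Two-tailed 5% critical values copied from the table, keyed by sample size n
# (assumption: the first column of the table is n).
CRITICAL_5PC_TWO_TAIL = {38: 0.3202, 39: 0.3160, 40: 0.3120, 41: 0.3081}


def two_tailed_test(r, n, table):
    """Return True if H0: rho = 0 is rejected in favour of H1: rho != 0."""
    return abs(r) > table[n]


reject = two_tailed_test(0.2705, 40, CRITICAL_5PC_TWO_TAIL)
print(reject)  # False
```

Because $0.2705 < 0.3120$, the sketch does not reject $\mathrm{H}_0$, so there would be insufficient evidence at the $5 \%$ level of linear correlation between mass and height.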

Hardest question: Standard +0.3

6 A random sample of 15 observations of pairs of values of two variables gives a product moment correlation coefficient of 0.430.

Test at the $10 \%$ significance level whether there is evidence of non-zero correlation between the variables.
A second random sample of $N$ observations gives a product moment correlation coefficient of 0.615. Using a $5 \%$ significance level, there is evidence of positive correlation between the variables.
Find the least possible value of $N$, justifying your answer.

One-tailed test for negative correlation

A question is this type if and only if it asks to test whether there is negative correlation between two variables using a one-tailed hypothesis test with H₁: ρ < 0.

4 Standard +0.0

6.5% of questions

Example

5 For a random sample of 12 observations of pairs of values $( x , y )$, the product moment correlation coefficient is $- 0.456$. Test, at the $5 \%$ significance level, whether there is evidence of negative correlation between the variables.
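
A sketch of how such a test might be set out, assuming the one-tailed $5 \%$ critical value for $n = 12$ is 0.4973 (the value listed for $n = 12$ in the critical value table quoted elsewhere in this collection):

$$\begin{aligned} & \mathrm { H } _ { 0 } : \rho = 0 \quad \mathrm { H } _ { 1 } : \rho < 0 \\ & r = - 0.456 > - 0.4973 \quad \Rightarrow \quad \text { do not reject } \mathrm { H } _ { 0 } \end{aligned}$$

On this reading there is insufficient evidence, at the $5 \%$ level, of negative correlation between the variables.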

Describe correlation from scatter diagram

A question is this type if and only if it shows a scatter diagram and asks to describe the correlation (strength and direction) without calculation.

3 Moderate -0.4

4.8% of questions

Example

A random sample of 15 days is taken from the large data set for Perth in June and July 1987. The scatter diagram in Figure 1 displays the values of two of the variables for these 15 days.

[Figure 1: scatter diagram]

Describe the correlation.
The variable on the $x$-axis is Daily Mean Temperature measured in ${ } ^ { \circ } \mathrm { C }$.
Using your knowledge of the large data set,
1. suggest which variable is on the $y$-axis,
2. state the units that are used in the large data set for this variable.

Stav believes that there is a correlation between Daily Total Sunshine and Daily Maximum Relative Humidity at Heathrow. He calculates the product moment correlation coefficient between these two variables for a random sample of 30 days and obtains $r = - 0.377$.
Carry out a suitable test to investigate Stav's belief at a $5 \%$ level of significance. State clearly
- your hypotheses
- your critical value

On a random day at Heathrow the Daily Maximum Relative Humidity was $97 \%$.
Comment on the number of hours of sunshine you would expect on that day, giving a reason for your answer.

Interpret p-value for correlation test

A question is this type if and only if it provides a p-value and asks to interpret it or use it to reach a conclusion about correlation.

1 Moderate -0.5

1.6% of questions

Example

13 The pre-release material contains information concerning median house prices, recycling rates and employment rates. Fig. 13.1 shows a scatter diagram of recycling rate against employment rate for a random sample of 33 regions.

[Fig. 13.1: scatter diagram of recycling rate against employment rate]

The product moment correlation coefficient for this sample is 0.37154 and the associated $p$-value is 0.033. Lee conducts a hypothesis test at the $5 \%$ level to test whether there is any evidence to suggest there is positive correlation between recycling rate and employment rate. He concludes that there is no evidence to suggest positive correlation because $0.033 \approx 0$ and $0.37154 > 0.05$.

Explain whether Lee's reasoning is correct.

Fig. 13.2 shows a scatter diagram of recycling rate against median house price for a random sample of 33 regions.

[Fig. 13.2: scatter diagram of recycling rate against median house price]

The product moment correlation coefficient for this sample is $- 0.33278$ and the associated $p$-value is 0.058. Fig. 13.3 shows summary statistics for the median house prices for the data in this sample.

Statistics
$n$	33
Mean	465467.9697
$\sigma$	201236.1345
$s$	204356.2606
$\Sigma x$	15360443
$\Sigma x ^ { 2 }$	8486161617387
Min	243500
Q1	342500
Median	410000
Q3	521000
Max	1200000
Fig. 13.3

Use the information in Fig. 13.3 and Fig. 13.2 to show that there are at least two outliers.
Describe the effect of removing the outliers on
- the product moment correlation coefficient between recycling rate and median house price,
- the $p$-value associated with this correlation coefficient,
in each case explaining your answer. [2]

All 33 items in the sample are areas in London. A student suggests that it is very unlikely that only areas in London would be selected in a random sample.
Use your knowledge of the pre-release material to explain whether you think the student's suggestion is reasonable.
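
Two of the checks in this question can be sketched in code. The first compares a $p$-value with the significance level, which is the comparison Lee should have made instead of comparing $r$ with 0.05; it assumes the quoted $p$-value corresponds to the test being carried out. The second computes outlier fences from Fig. 13.3 assuming the $1.5 \times \mathrm{IQR}$ rule, which is one common convention (mean $\pm$ 2 standard deviations is another). The function name `reject_h0` is hypothetical.

```python
def reject_h0(p_value, significance_level):
    """Reject H0 when the p-value is below the significance level."""
    return p_value < significance_level


# p-values quoted in the question, each tested at the 5% level.
print(reject_h0(0.033, 0.05))  # True
print(reject_h0(0.058, 0.05))  # False

# Outlier fences from Fig. 13.3, assuming the 1.5 x IQR rule.
q1, q3 = 342500, 521000
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
print(lower_fence, upper_fence)  # 74750.0 788750.0

# The quoted maximum lies above the upper fence, so it is an outlier under
# this rule; a second one would have to be read off Fig. 13.2.
print(1200000 > upper_fence)  # True
```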

Use critical value table directly

A question is this type if and only if it provides a table of critical values and asks to carry out a hypothesis test by comparing the calculated r to the critical value.

1 Moderate -0.8

1.6% of questions

Example

10 A researcher plans to carry out a statistical investigation to test whether there is linear correlation between the time ( $T$ weeks) from conception to birth, and the birth weight ( $W$ grams) of new-born babies.

Explain why a 1-tail test is appropriate in this context.
The researcher records the values of $T$ and $W$ for a random sample of 11 babies. They calculate Pearson's product-moment correlation coefficient for the sample and find that the value is 0.722.
Use the table below to carry out the test at the $1 \%$ significance level.

Critical values of Pearson's product-moment correlation coefficient

1-tail test	$5 \%$	$2.5 \%$	$1 \%$	$0.5 \%$
2-tail test	$10 \%$	$5 \%$	$2 \%$	$1 \%$
$n = 10$	0.5494	0.6319	0.7155	0.7646
$n = 11$	0.5214	0.6021	0.6851	0.7348
$n = 12$	0.4973	0.5760	0.6581	0.7079
$n = 13$	0.4762	0.5529	0.6339	0.6835
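
One way the test might be set out, reading the 1-tail $1 \%$ column of the table at $n = 11$:

$$\begin{aligned} & \mathrm { H } _ { 0 } : \rho = 0 \quad \mathrm { H } _ { 1 } : \rho > 0 \\ & n = 11 , \quad \text { critical value } = 0.6851 \\ & r = 0.722 > 0.6851 \quad \Rightarrow \quad \text { reject } \mathrm { H } _ { 0 } \end{aligned}$$

On this reading there is evidence, at the $1 \%$ level, of positive correlation between $T$ and $W$.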

Compare PMCC with Spearman's rank

A question is this type if and only if it asks to test both product moment and Spearman's rank correlation coefficients and compare results or explain which is more appropriate.

1 Standard +0.3

1.6% of questions

Example

2. A random sample of 8 students sat examinations in Geography and Statistics. The product moment correlation coefficient between their results was 0.572 and the Spearman rank correlation coefficient was 0.655 .

Test both of these values for positive correlation. Use a $5 \%$ level of significance.
Comment on your results.
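
The two comparisons can be sketched as below. The critical values are an assumption: they are the one-tailed $5 \%$ values for $n = 8$ as they usually appear in published statistical tables, not figures given in the question, so they should be checked against the tables supplied with the exam.

```python
# Assumed one-tailed 5% critical values for n = 8, taken from standard
# published tables rather than from the question itself.
PMCC_CRITICAL = 0.6215
SPEARMAN_CRITICAL = 0.6429

# Coefficients quoted in the question.
pmcc, spearman = 0.572, 0.655

pmcc_significant = pmcc > PMCC_CRITICAL
spearman_significant = spearman > SPEARMAN_CRITICAL
print(pmcc_significant, spearman_significant)  # False True
```

If those critical values are right, only the rank coefficient is significant, which is the kind of disagreement the final "Comment on your results" part asks about.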

Comment on causation vs correlation

A question is this type if and only if it asks to comment on a claim about causation or to explain why correlation does not imply causation.

1 Moderate -0.5

1.6% of questions

Example

10

(i) State appropriate hypotheses for Shona to use in her test.
(ii) Determine if there is sufficient evidence to reject the null hypothesis.
Fully justify your answer.
[1 mark]
Shona's teacher tells her to remove calculation $D$ from the table as it is incorrect.
Explain how the teacher knew it was incorrect.
[1 mark]
Before performing calculation B, Shona cleaned the data. She removed all cars from the Large Data Set that had incorrect masses. Using your knowledge of the large data set, explain what was incorrect about the masses which were removed from the calculation.
[1 mark]
Apart from CO₂ and CO emissions, state one other type of emission that Shona could investigate using the Large Data Set.
Wesley claims that calculation C shows that a heavier car causes higher CO₂ emissions. Give two reasons why Wesley's claim may be incorrect.

Unclassified

1 question

1.6% of questions

2 The table below shows the heart rates, $x$ beats per minute, and the systolic blood pressures, $y$ millimetres of mercury, of a random sample of 10 patients undergoing kidney dialysis.

Patient	$\mathbf { 1 }$	$\mathbf { 2 }$	$\mathbf { 3 }$	$\mathbf { 4 }$	$\mathbf { 5 }$	$\mathbf { 6 }$	$\mathbf { 7 }$	$\mathbf { 8 }$	$\mathbf { 9 }$	$\mathbf { 1 0 }$
$\boldsymbol { x }$	83	86	88	92	94	98	101	111	115	121
$\boldsymbol { y }$	157	172	161	154	171	169	179	180	192	182

Calculate the value of the product moment correlation coefficient for these data.
Assuming that these data come from a bivariate normal distribution, investigate, at the $1 \%$ level of significance, the claim that, for patients undergoing kidney dialysis, there is a positive correlation between heart rate and systolic blood pressure.
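
When only raw data are given, $r$ can also be built from deviations about the means, $S_{xx} = \sum (x - \bar{x})^2$ and so on. The sketch below does this for the dialysis data and compares the result with 0.7155, the 1-tail $1 \%$ critical value listed for $n = 10$ in the critical value table quoted elsewhere in this collection; using that table for this question is an assumption, and the variable names are illustrative.

```python
from math import sqrt

# Data copied from the table in the question.
heart_rate = [83, 86, 88, 92, 94, 98, 101, 111, 115, 121]
blood_pressure = [157, 172, 161, 154, 171, 169, 179, 180, 192, 182]

n = len(heart_rate)
mean_x = sum(heart_rate) / n
mean_y = sum(blood_pressure) / n

# Sums of squares and products of deviations about the means.
s_xx = sum((x - mean_x) ** 2 for x in heart_rate)
s_yy = sum((y - mean_y) ** 2 for y in blood_pressure)
s_xy = sum((x - mean_x) * (y - mean_y)
           for x, y in zip(heart_rate, blood_pressure))

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # 0.82

# One-tailed test of H0: rho = 0 against H1: rho > 0 at the 1% level.
critical_value = 0.7155  # assumed from the table for n = 10
print(r > critical_value)  # True
```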
