Hypothesis test for zero correlation

Questions that require testing whether the population correlation coefficient is zero (or equivalently, whether there is significant correlation) using the product moment correlation coefficient and t-distribution or critical value tables.

5 questions · Standard +0.2

5.09c Calculate regression line
CAIE FP2 2011 June Q10
10 marks Standard +0.3
10 The mid-day temperature, \(x ^ { \circ } \mathrm { C }\), and the amount of sunshine, \(y\) hours, were recorded at a winter holiday resort on each of 12 days, chosen at random during the winter season. The results are summarised as follows. $$\Sigma x = 18.7 \quad \Sigma x ^ { 2 } = 106.43 \quad \Sigma y = 34.7 \quad \Sigma y ^ { 2 } = 133.43 \quad \Sigma x y = 92.01$$
  1. Find the product moment correlation coefficient for the data.
  2. Stating your hypotheses, test at the \(1 \%\) significance level whether there is a non-zero correlation between mid-day temperature and amount of sunshine.
  3. Use the equation of a suitable regression line to estimate the number of hours of sunshine on a day when the mid-day temperature is \(2 ^ { \circ } \mathrm { C }\).
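One way to check the arithmetic in this question is the short sketch below. It is not a model answer: it assumes the t-statistic form of the test, and the critical value `T_CRIT` is an assumption taken from standard t tables rather than from the question. All variable names are illustrative.

```python
import math

# Summary statistics quoted in the question (n = 12 days).
n = 12
sum_x, sum_x2 = 18.7, 106.43
sum_y, sum_y2 = 34.7, 133.43
sum_xy = 92.01

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Part 1: product moment correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Part 2: H0: rho = 0 against H1: rho != 0, two-tailed at the 1% level.
t_stat = r * math.sqrt((n - 2) / (1 - r**2))
T_CRIT = 3.169  # assumed: upper 0.5% point of t with 10 d.f. (standard tables)
reject_h0 = abs(t_stat) > T_CRIT

# Part 3: regression line of y on x (sunshine on temperature), at x = 2.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n
y_at_2 = a + b * 2

print(f"r = {r:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"y = {a:.3f} + {b:.3f}x, estimate at x = 2: {y_at_2:.2f} hours")
```

The line of y on x is used in part 3 because sunshine is being estimated from a given temperature. Comparing r directly with a tabulated critical value of the correlation coefficient should lead to the same decision as the t statistic.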
CAIE FP2 2017 Specimen Q9
11 marks Standard +0.8
9 A random sample of 8 students is chosen from those sitting examinations in both Mathematics and French. Their marks in Mathematics, \(x\), and in French, \(y\), are summarised as follows. $$\Sigma x = 472 \quad \Sigma x ^ { 2 } = 29950 \quad \Sigma y = 400 \quad \Sigma y ^ { 2 } = 21226 \quad \Sigma x y = 24879$$ Another student scored 72 marks in the Mathematics examination but was unable to sit the French examination.
  1. Estimate the mark that this student would have obtained in the French examination.
  2. Test, at the \(5 \%\) significance level, whether there is non-zero correlation between marks in Mathematics and marks in French.
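The sketch below works through both parts from the summary statistics. It is a hedged illustration only: the critical t value is an assumption taken from standard tables, and the critical value of r is derived from it rather than read from a correlation table.

```python
import math

# Summary statistics quoted in the question (n = 8 students).
n = 8
sum_x, sum_x2 = 472, 29950
sum_y, sum_y2 = 400, 21226
sum_xy = 24879

s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Part 1: regression line of y (French) on x (Mathematics), evaluated at x = 72.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n
french_estimate = a + b * 72

# Part 2: two-tailed test of H0: rho = 0 at the 5% level.
r = s_xy / math.sqrt(s_xx * s_yy)
T_CRIT = 2.447  # assumed: upper 2.5% point of t with 6 d.f. (standard tables)
# Equivalent critical value for r, since t = r * sqrt((n - 2) / (1 - r^2)).
r_crit = T_CRIT / math.sqrt(T_CRIT**2 + n - 2)
reject_h0 = abs(r) > r_crit

print(f"Estimated French mark: {french_estimate:.1f}")
print(f"r = {r:.3f}, critical r = {r_crit:.3f}, reject H0: {reject_h0}")
```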
OCR MEI Further Statistics Major 2021 November Q8
16 marks Standard +0.3
8
  1. \(\mathrm { VO } _ { 2 \max }\) is a measure of athletic fitness. Since \(\mathrm { VO } _ { 2 \max }\) is fairly time-consuming and expensive to measure, an exercise scientist wants to predict \(\mathrm { VO } _ { 2 \max }\) from data such as times for running different distances. The scientist uses these data for a random sample of 15 athletes to predict their \(\mathrm { VO } _ { 2 \max }\) values, denoted by \(y\), in suitable units. She also obtains accurate measurements of the \(\mathrm { VO } _ { 2 \max }\) values, denoted by \(x\), in the same units. The scatter diagram in Fig. 8.1 shows the values of \(x\) and \(y\) obtained, together with the equation of the regression line of \(y\) on \(x\) and the value of \(r ^ { 2 }\). \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-08_750_1324_660_317} \captionsetup{labelformat=empty} \caption{Fig. 8.1}
    \end{figure}
    1. Use the regression line to estimate the predicted \(\mathrm { VO } _ { 2 \max }\) of an athlete whose accurately measured \(\mathrm { VO } _ { 2 \max }\) is 50.
    2. Comment on the reliability of your estimate.
    3. The equation of the regression line of \(x\) on \(y\) is \(x = 0.7565 y + 10.493\). Find the coordinates of the point at which the two regression lines meet.
    4. State what the point you found in part (iii) represents.
  2. It is known that there is negative correlation between \(\mathrm { VO } _ { 2 \max }\) and marathon times in very good runners (those whose best marathon times are under 3 hours). The exercise scientist wishes to know whether the same applies to runners who take longer to run a marathon. She selects a random sample of 20 runners whose best marathon times are between \(3 \frac { 1 } { 2 }\) hours and \(4 \frac { 1 } { 2 }\) hours and accurately measures their \(\mathrm { VO } _ { 2 \max }\). Fig. 8.2 is a scatter diagram of accurately measured \(\mathrm { VO } _ { 2 \max }\), \(v\) units, against best marathon time, \(t\) hours, for these runners. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-09_671_1064_648_319} \captionsetup{labelformat=empty} \caption{Fig. 8.2}
    \end{figure}
    1. Explain why the exercise scientist comes to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.

    Summary statistics for the 20 runners are as follows. $$\sum t = 80.37 \quad \sum v = 970.86 \quad \sum t ^ { 2 } = 324.71 \quad \sum v ^ { 2 } = 47829.24 \quad \sum t v = 3886.53$$
    2. Find the value of Pearson's product moment correlation coefficient.
    3. Carry out a test at the \(5 \%\) significance level to investigate whether there is negative correlation between accurately measured \(\mathrm { VO } _ { 2 \max }\) and best marathon time for runners whose best marathon times are between \(3 \frac { 1 } { 2 }\) hours and \(4 \frac { 1 } { 2 }\) hours.
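For the second half of this question, the sketch below computes Pearson's coefficient from the summary statistics for the 20 runners and runs a one-tailed test. It is an illustration under stated assumptions: the t-statistic form of the test is used, and `T_CRIT` is assumed from standard t tables.

```python
import math

# Summary statistics for the 20 runners (t = marathon time, v = VO2max).
n = 20
sum_t, sum_t2 = 80.37, 324.71
sum_v, sum_v2 = 970.86, 47829.24
sum_tv = 3886.53

s_tt = sum_t2 - sum_t**2 / n
s_vv = sum_v2 - sum_v**2 / n
s_tv = sum_tv - sum_t * sum_v / n

# Pearson's product moment correlation coefficient.
r = s_tv / math.sqrt(s_tt * s_vv)

# One-tailed test: H0: rho = 0 against H1: rho < 0 at the 5% level.
t_stat = r * math.sqrt((n - 2) / (1 - r**2))
T_CRIT = 1.734  # assumed: upper 5% point of t with 18 d.f. (standard tables)
reject_h0 = t_stat < -T_CRIT  # only large negative values count as evidence

print(f"r = {r:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Because the alternative hypothesis is one-sided, the whole 5% sits in the lower tail, so the statistic is compared with the negative of the critical value rather than with its absolute value.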
WJEC Further Unit 2 2019 June Q7
13 marks Moderate -0.5
7. An article published in a medical journal investigated sports injuries in adolescents' ball games: football, handball and basketball. In a study of 906 randomly selected adolescent players in the three ball games, 379 players incurred injuries over the course of one year of playing the sport. Rhian wants to test whether there is an association between the site of injury and the sport played. A summary of the injuries is shown in the table below.
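Observed values (rows: sport; columns: site of injury)

| Sport | Shoulder/Arm | Hand/Fingers | Thigh/Leg | Knee | Ankle | Foot | Other | Total |
|---|---|---|---|---|---|---|---|---|
| Football | 8 | 3 | 45 | 36 | 51 | 36 | 12 | 191 |
| Handball | 14 | 26 | 6 | 15 | 42 | 6 | 6 | 115 |
| Basketball | 4 | 28 | 4 | 4 | 22 | 1 | 10 | 73 |
| Total | 26 | 57 | 55 | 55 | 115 | 43 | 28 | 379 |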
  1. Calculate the values of \(A , B , C\) in the tables below.

    Expected values (rows: sport; columns: site of injury)

    | Sport | Shoulder/Arm | Hand/Fingers | Thigh/Leg | Knee | Ankle | Foot | Other |
    |---|---|---|---|---|---|---|---|
    | Football | 13.1029 | 28.7256 | 27.7177 | 27.7177 | 57.9551 | 21.6702 | 14.1108 |
    | Handball | 7.8892 | 17.2955 | 16.6887 | 16.6887 | \(A\) | 13.0475 | 8.4960 |
    | Basketball | 5.0079 | 10.9789 | 10.5937 | 10.5937 | 22.1504 | 8.2823 | 5.3931 |

    Chi-squared contributions (rows: sport; columns: site of injury)

    | Sport | Shoulder/Arm | Hand/Fingers | Thigh/Leg | Knee | Ankle | Foot | Other |
    |---|---|---|---|---|---|---|---|
    | Football | 1.98732 | 23.03890 | 10.77575 | 2.47484 | \(B\) | 9.47586 | 0.31575 |
    | Handball | 4.73333 | 4.38079 | \(C\) | 0.17087 | 1.44690 | 3.80664 | 0.73331 |
    | Basketball | 0.20286 | 26.38865 | 4.10400 | 4.10400 | 0.00102 | 6.40306 | 3.93521 |

  2. Given that the test statistic, \(X ^ { 2 }\), is 116.16, carry out the significance test at the \(5 \%\) level.
  3. Which site of injury most affects the conclusion of this test? Comment on your answer.

    Rhian also analyses the data on the type of contact that caused the injuries and the sport in which they occur, shown in the table below.
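
    | Observed values | Ball | Opponent | Surface | None | Total |
    |---|---|---|---|---|---|
    | Football | 17 | 68 | 17 | 92 | 194 |
    | Handball | 23 | 34 | 19 | 38 | 114 |
    | Basketball | 28 | 17 | 12 | 14 | 71 |
    | Total | 68 | 119 | 48 | 144 | 379 |
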
    The chi-squared test statistic is 46.0937 . Rhian notes that this value is smaller than 116.16 , the test statistic in part (b). She concludes that there is weaker evidence for association in this case than there was in part (b).
  4. State Rhian's misconception and explain what she should consider instead.
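The first three parts of this question can be checked with the sketch below. It rebuilds the expected values and chi-squared contributions from the observed site-of-injury counts. The critical value `CHI_CRIT` is an assumption taken from standard chi-squared tables, and all names are illustrative.

```python
# Observed injuries by sport (rows) and site of injury (columns).
sites = ["Shoulder/Arm", "Hand/Fingers", "Thigh/Leg", "Knee", "Ankle", "Foot", "Other"]
observed = {
    "Football":   [8, 3, 45, 36, 51, 36, 12],
    "Handball":   [14, 26, 6, 15, 42, 6, 6],
    "Basketball": [4, 28, 4, 4, 22, 1, 10],
}

row_totals = {sport: sum(counts) for sport, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected value = row total * column total / grand total.
expected = {
    sport: [row_totals[sport] * c / grand_total for c in col_totals]
    for sport in observed
}
# Contribution = (observed - expected)^2 / expected.
contrib = {
    sport: [(o - e) ** 2 / e for o, e in zip(observed[sport], expected[sport])]
    for sport in observed
}

# Part 1: the three missing table entries.
A = expected["Handball"][sites.index("Ankle")]
B = contrib["Football"][sites.index("Ankle")]
C = contrib["Handball"][sites.index("Thigh/Leg")]

# Part 2: test of association at the 5% level.
chi_sq = sum(sum(row) for row in contrib.values())
df = (len(observed) - 1) * (len(sites) - 1)
CHI_CRIT = 21.026  # assumed: upper 5% point of chi-squared with 12 d.f.
reject_h0 = chi_sq > CHI_CRIT

# Part 3: site with the largest total contribution to the statistic.
site_sums = [sum(col) for col in zip(*contrib.values())]
largest_site = sites[site_sums.index(max(site_sums))]

print(f"A = {A:.4f}, B = {B:.5f}, C = {C:.5f}")
print(f"X^2 = {chi_sq:.2f} on {df} d.f., reject H0: {reject_h0}")
print(f"Largest contribution: {largest_site}")
```

The degrees of freedom depend on the table dimensions, so a statistic from a 3 by 4 table is not directly comparable with one from a 3 by 7 table. This is relevant to the final part of the question.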
OCR FS1 AS 2017 Specimen Q8
10 marks Standard +0.3
The following table gives the mean per capita consumption of mozzarella cheese per annum, \(x\) pounds, and the number of civil engineering doctorates awarded, \(y\), in the United States in each of 10 years.
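
| \(x\) | 9.3 | 9.7 | 9.7 | 9.7 | 9.9 | 10.2 | 10.5 | 11.0 | 10.6 | 10.6 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 480 | 501 | 540 | 552 | 547 | 622 | 655 | 701 | 712 | 708 |
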
source: www.tylervigen.com
  1. Find the equation of the regression line of \(y\) on \(x\). [2]

You are given that the product moment correlation coefficient is 0.959.

  2. Explain whether this value would be different if \(x\) is measured in kilograms instead of pounds. [1]

It is desired to carry out a hypothesis test to investigate whether there is correlation between these two variables.

  3. Assume that the data is a random sample of all years.
    1. Carry out the test at the 10\% significance level. [6]
    2. Explain whether your conclusion suggests that manufacturers of mozzarella cheese could increase consumption by sponsoring doctoral candidates in civil engineering. [1]
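The sketch below works directly from the ten data pairs. It computes the regression line, confirms the quoted coefficient of 0.959, and checks numerically that rescaling \(x\) leaves the coefficient unchanged. The helper `pmcc_and_line`, the pounds-to-kilograms factor, and the critical value `T_CRIT` are assumptions introduced for illustration and are not part of the question.

```python
import math

# Data from the table: x = mozzarella consumption (pounds), y = doctorates.
x_lb = [9.3, 9.7, 9.7, 9.7, 9.9, 10.2, 10.5, 11.0, 10.6, 10.6]
y = [480, 501, 540, 552, 547, 622, 655, 701, 712, 708]


def pmcc_and_line(x, y):
    """Return (r, a, b) where y = a + b*x is the regression line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return s_xy / math.sqrt(s_xx * s_yy), a, b


r, a, b = pmcc_and_line(x_lb, y)

# Linear rescaling of x (pounds to kilograms) should leave r unchanged.
LB_TO_KG = 0.45359237  # assumed conversion factor, not given in the question
r_kg, _, _ = pmcc_and_line([xi * LB_TO_KG for xi in x_lb], y)

# Two-tailed test of H0: rho = 0 at the 10% level.
n = len(x_lb)
t_stat = r * math.sqrt((n - 2) / (1 - r**2))
T_CRIT = 1.860  # assumed: upper 5% point of t with 8 d.f. (standard tables)
reject_h0 = abs(t_stat) > T_CRIT

print(f"y = {a:.1f} + {b:.1f}x, r = {r:.3f}, r in kg = {r_kg:.3f}")
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

A significant result here shows only that the two series move together over these years. It does not by itself support a causal link in either direction.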