Hypothesis test for zero correlation

Questions that require testing whether the population correlation coefficient is zero (equivalently, whether there is significant correlation), using the product moment correlation coefficient together with either the t-distribution or critical value tables.
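For reference, the test described above can be written out as below. This is a sketch of the standard result and assumes, beyond what is stated here, that the pairs \((x, y)\) come from a bivariate normal population; \(\rho\) is the population correlation coefficient, \(r\) the sample value and \(n\) the number of pairs.

```latex
H_0 : \rho = 0, \qquad
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \qquad
t = r \sqrt{\frac{n - 2}{1 - r^{2}}} \sim t_{n-2} \quad \text{under } H_0 .
```

Critical value tables for \(r\) encode the same test, so comparing \(r\) with the tabulated value should lead to the same decision as comparing \(t\) with the corresponding t critical value.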

5 questions

CAIE FP2 2011 June Q10
10. The mid-day temperature, \(x ^ { \circ } \mathrm { C }\), and the amount of sunshine, \(y\) hours, were recorded at a winter holiday resort on each of 12 days, chosen at random during the winter season. The results are summarised as follows.
$$\Sigma x = 18.7 \quad \Sigma x ^ { 2 } = 106.43 \quad \Sigma y = 34.7 \quad \Sigma y ^ { 2 } = 133.43 \quad \Sigma x y = 92.01$$
  1. Find the product moment correlation coefficient for the data.
  2. Stating your hypotheses, test at the \(1 \%\) significance level whether there is a non-zero correlation between mid-day temperature and amount of sunshine.
  3. Use the equation of a suitable regression line to estimate the number of hours of sunshine on a day when the mid-day temperature is \(2 ^ { \circ } \mathrm { C }\).
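One way to check the arithmetic for this question numerically is sketched below. It is not a model answer: the variable names are illustrative, the critical value `T_CRIT` is an assumed figure from standard t tables (10 degrees of freedom, 0.5% in each tail), and the regression line of \(y\) on \(x\) is assumed to be the suitable one because sunshine is being estimated from temperature.

```python
import math

# Summary statistics quoted in the question (n = 12 days).
n = 12
sum_x, sum_x2 = 18.7, 106.43
sum_y, sum_y2 = 34.7, 133.43
sum_xy = 92.01

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Part 1: product moment correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Part 2: H0: rho = 0 against H1: rho != 0, two-tailed test at the 1% level.
# The t statistic has n - 2 degrees of freedom under H0.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
T_CRIT = 3.169  # assumed table value: t, 10 d.f., 0.5% in each tail
reject_h0 = abs(t_stat) > T_CRIT

# Part 3: regression line of y on x, evaluated at x = 2.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n
y_at_2 = a + b * 2

print(f"r = {r:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"y = {a:.3f} + {b:.3f}x, estimate at x = 2: {y_at_2:.2f} hours")
```

With these figures \(r\) comes out at roughly 0.75, which also exceeds the tabulated critical value for \(r\) with 12 pairs, so both routes point to rejecting \(H_0\).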
CAIE FP2 2017 June Q10
10. A random sample of 5 pairs of values \(( x , y )\) is given in the following table.
\begin{tabular}{c|ccccc}
\(x\) & 1 & 2 & 4 & 5 & 8 \\
\hline
\(y\) & 7 & 5 & 8 & 6 & 4 \\
\end{tabular}
  1. Find, showing all necessary working, the equation of the regression line of \(y\) on \(x\).
  2. Find, showing all necessary working, the value of the product moment correlation coefficient for this sample.
  3. Test, at the \(10 \%\) significance level, whether there is evidence of non-zero correlation between the variables.
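The three parts of this question can be checked with a short sketch like the one below. It assumes the table reads \(x = 1, 2, 4, 5, 8\) and \(y = 7, 5, 8, 6, 4\); `R_CRIT` is an assumed value from standard critical value tables for \(r\) (5 pairs, two-tailed 10% level), and the names are illustrative.

```python
import math

# Sample values as read from the table (assumed reading of the five pairs).
x = [1, 2, 4, 5, 8]
y = [7, 5, 8, 6, 4]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Corrected sums of squares and products.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Part 1: regression line of y on x, y = a + b x.
b = s_xy / s_xx
a = mean_y - b * mean_x

# Part 2: product moment correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Part 3: two-tailed test of H0: rho = 0 at the 10% level.
R_CRIT = 0.805  # assumed table value for n = 5, two-tailed 10%
reject_h0 = abs(r) > R_CRIT

print(f"y = {a:.2f} + ({b:.2f})x, r = {r:.3f}, reject H0: {reject_h0}")
```

Here the magnitude of \(r\) is well below the assumed critical value, so the sketch does not reject \(H_0\); with only 5 pairs a fairly large sample correlation is needed before it counts as significant.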
OCR MEI Further Statistics Major 2021 November Q8
8
  1. \(\mathrm { VO } _ { 2 \max }\) is a measure of athletic fitness. Since \(\mathrm { VO } _ { 2 \max }\) is fairly time-consuming and expensive to measure, an exercise scientist wants to predict \(\mathrm { VO } _ { 2 \max }\) from data such as times for running different distances. The scientist uses these data for a random sample of 15 athletes to predict their \(\mathrm { VO } _ { 2 \max }\) values, denoted by \(y\), in suitable units. She also obtains accurate measurements of the \(\mathrm { VO } _ { 2 \max }\) values, denoted by \(x\), in the same units. The scatter diagram in Fig. 8.1 shows the values of \(x\) and \(y\) obtained, together with the equation of the regression line of \(y\) on \(x\) and the value of \(r ^ { 2 }\).
    \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-08_750_1324_660_317}
    \captionsetup{labelformat=empty}
    \caption{Fig. 8.1}
    \end{figure}
    1. Use the regression line to estimate the predicted \(\mathrm { VO } _ { 2 \max }\) of an athlete whose accurately measured \(\mathrm { VO } _ { 2 \max }\) is 50.
    2. Comment on the reliability of your estimate.
    3. The equation of the regression line of \(x\) on \(y\) is \(x = 0.7565 y + 10.493\). Find the coordinates of the point at which the two regression lines meet.
    4. State what the point you found in part (iii) represents.
  2. It is known that there is negative correlation between \(\mathrm { VO } _ { 2 \max }\) and marathon times in very good runners (those whose best marathon times are under 3 hours). The exercise scientist wishes to know whether the same applies to runners who take longer to run a marathon. She selects a random sample of 20 runners whose best marathon times are between \(3 \frac { 1 } { 2 }\) hours and \(4 \frac { 1 } { 2 }\) hours and accurately measures their \(\mathrm { VO } _ { 2 \max }\). Fig. 8.2 is a scatter diagram of accurately measured \(\mathrm { VO } _ { 2 \max }\), \(v\) units, against best marathon time, \(t\) hours, for these runners.
    \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{ce557137-f9eb-4c09-a7e3-e4ec626109dc-09_671_1064_648_319}
    \captionsetup{labelformat=empty}
    \caption{Fig. 8.2}
    \end{figure}
    1. Explain why the exercise scientist comes to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.
    Summary statistics for the 20 runners are as follows.
    $$\sum t = 80.37 \quad \sum v = 970.86 \quad \sum t ^ { 2 } = 324.71 \quad \sum v ^ { 2 } = 47829.24 \quad \sum t v = 3886.53$$
    2. Find the value of Pearson's product moment correlation coefficient.
    3. Carry out a test at the \(5 \%\) significance level to investigate whether there is negative correlation between accurately measured \(\mathrm { VO } _ { 2 \max }\) and best marathon time for runners whose best marathon times are between \(3 \frac { 1 } { 2 }\) hours and \(4 \frac { 1 } { 2 }\) hours.
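The correlation coefficient and the one-tailed test for the 20 runners can be checked with a sketch along the following lines. The helper `pmcc_from_sums` is a hypothetical name introduced here, and `R_CRIT` is an assumed value from standard critical value tables for \(r\) (20 pairs, one-tailed 5% level).

```python
import math


def pmcc_from_sums(n, sum_a, sum_b, sum_a2, sum_b2, sum_ab):
    """Return Pearson's r from raw summary totals for n pairs (a, b)."""
    s_aa = sum_a2 - sum_a ** 2 / n
    s_bb = sum_b2 - sum_b ** 2 / n
    s_ab = sum_ab - sum_a * sum_b / n
    return s_ab / math.sqrt(s_aa * s_bb)


# Summary statistics quoted in the question: t = marathon time, v = VO2max.
r = pmcc_from_sums(
    n=20,
    sum_a=80.37,
    sum_b=970.86,
    sum_a2=324.71,
    sum_b2=47829.24,
    sum_ab=3886.53,
)

# H0: rho = 0 against H1: rho < 0, one-tailed test at the 5% level.
R_CRIT = 0.3783  # assumed table value for n = 20, one-tailed 5%
reject_h0 = r < -R_CRIT

print(f"r = {r:.4f}, reject H0: {reject_h0}")
```

Because the alternative hypothesis is one-sided, the sketch compares \(r\) with the negative of the critical value rather than using its magnitude.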
WJEC Further Unit 2 2019 June Q7
7. An article published in a medical journal investigated sports injuries in adolescents' ball games: football, handball and basketball. In a study of 906 randomly selected adolescent players in the three ball games, 379 players incurred injuries over the course of one year of playing the sport. Rhian wants to test whether there is an association between the site of injury and the sport played. A summary of the injuries is shown in the table below.
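\begin{tabular}{l|rrrrrrr|r}
\multicolumn{9}{c}{Observed values, by site of injury} \\
Sport & Shoulder/Arm & Hand/Fingers & Thigh/Leg & Knee & Ankle & Foot & Other & Total \\
\hline
Football & 8 & 3 & 45 & 36 & 51 & 36 & 12 & 191 \\
Handball & 14 & 26 & 6 & 15 & 42 & 6 & 6 & 115 \\
Basketball & 4 & 28 & 4 & 4 & 22 & 1 & 10 & 73 \\
\hline
Total & 26 & 57 & 55 & 55 & 115 & 43 & 28 & 379 \\
\end{tabular}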
  1. Calculate the values of \(A , B , C\) in the tables below.
    \begin{tabular}{l|rrrrrrr}
    \multicolumn{8}{c}{Expected values, by site of injury} \\
    Sport & Shoulder/Arm & Hand/Fingers & Thigh/Leg & Knee & Ankle & Foot & Other \\
    \hline
    Football & 13.1029 & 28.7256 & 27.7177 & 27.7177 & 57.9551 & 21.6702 & 14.1108 \\
    Handball & 7.8892 & 17.2955 & 16.6887 & 16.6887 & \(A\) & 13.0475 & 8.4960 \\
    Basketball & 5.0079 & 10.9789 & 10.5937 & 10.5937 & 22.1504 & 8.2823 & 5.3931 \\
    \end{tabular}
    \begin{tabular}{l|rrrrrrr}
    \multicolumn{8}{c}{Chi-squared contributions, by site of injury} \\
    Sport & Shoulder/Arm & Hand/Fingers & Thigh/Leg & Knee & Ankle & Foot & Other \\
    \hline
    Football & 1.98732 & 23.03890 & 10.77575 & 2.47484 & \(B\) & 9.47586 & 0.31575 \\
    Handball & 4.73333 & 4.38079 & \(C\) & 0.17087 & 1.44690 & 3.80664 & 0.73331 \\
    Basketball & 0.20286 & 26.38865 & 4.10400 & 4.10400 & 0.00102 & 6.40306 & 3.93521 \\
    \end{tabular}
  2. Given that the test statistic, \(X ^ { 2 }\), is 116.16, carry out the significance test at the \(5 \%\) level.
  3. Which site of injury most affects the conclusion of this test? Comment on your answer.
    Rhian also analyses the data on the type of contact that caused the injuries and the sport in which they occur, shown in the table below.
    \begin{tabular}{l|rrrr|r}
    Observed values & Ball & Opponent & Surface & None & Total \\
    \hline
    Football & 17 & 68 & 17 & 92 & 194 \\
    Handball & 23 & 34 & 19 & 38 & 114 \\
    Basketball & 28 & 17 & 12 & 14 & 71 \\
    \hline
    Total & 68 & 119 & 48 & 144 & 379 \\
    \end{tabular}
    The chi-squared test statistic is 46.0937. Rhian notes that this value is smaller than 116.16, the test statistic in part (b). She concludes that there is weaker evidence for association in this case than there was in part (b).
  4. State Rhian's misconception and explain what she should consider instead.
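The missing entries \(A\), \(B\) and \(C\) and the quoted test statistic can be checked from the observed site-of-injury counts with a sketch like the one below. The counts are the assumed reading of the observed values table, the function names are illustrative, and `CHI2_CRIT` is an assumed figure from standard chi-squared tables (12 degrees of freedom, 5% upper tail).

```python
# Observed injury counts by sport and site (assumed reading of the table).
sites = ["Shoulder/Arm", "Hand/Fingers", "Thigh/Leg", "Knee", "Ankle", "Foot", "Other"]
observed = {
    "Football": [8, 3, 45, 36, 51, 36, 12],
    "Handball": [14, 26, 6, 15, 42, 6, 6],
    "Basketball": [4, 28, 4, 4, 22, 1, 10],
}

row_totals = {sport: sum(counts) for sport, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())


def expected(sport, site):
    """Expected count under independence: row total * column total / grand total."""
    return row_totals[sport] * col_totals[sites.index(site)] / grand_total


def contribution(sport, site):
    """Chi-squared contribution (O - E)^2 / E for one cell."""
    o = observed[sport][sites.index(site)]
    e = expected(sport, site)
    return (o - e) ** 2 / e


A = expected("Handball", "Ankle")
B = contribution("Football", "Ankle")
C = contribution("Handball", "Thigh/Leg")

# Full test statistic and degrees of freedom, (rows - 1) * (columns - 1).
x2 = sum(contribution(sport, site) for sport in observed for site in sites)
dof = (len(observed) - 1) * (len(sites) - 1)
CHI2_CRIT = 21.026  # assumed table value: chi-squared, 12 d.f., 5% upper tail
reject_h0 = x2 > CHI2_CRIT

print(f"A = {A:.4f}, B = {B:.5f}, C = {C:.5f}")
print(f"X^2 = {x2:.2f} on {dof} d.f., reject H0: {reject_h0}")
```

The contact-type table has 3 rows and 4 columns, so its statistic would be referred to a distribution with 6 degrees of freedom rather than 12, which is worth keeping in mind when the two statistics are compared.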
WJEC Further Unit 2 Specimen Q4
4. A year 12 student wishes to study at a Welsh university. For a randomly chosen year between 2000 and 2017 she collected data for seven universities in Wales from the Complete University Guide website. The data are for the variables:
  • 'Entry standards' - the average UCAS tariff score of new undergraduate students;
  • 'Student satisfaction' - a measure of student views of the teaching quality at the university taken from the National Student Survey (maximum 5);
  • 'Graduate prospects' - a measure of the employability of a university's first degree graduates (maximum 100);
  • 'Research quality' - a measure of the quality of the research undertaken in the university (maximum 4).
  1. Pearson's product-moment correlation coefficients, for each pairing of the four variables, are shown in the table below.
    \begin{tabular}{l|cccc}
    Variable & Entry standards & Student satisfaction & Graduate prospects & Research quality \\
    \hline
    Entry standards & 1 & & & \\
    Student satisfaction & \(-0.030\) & 1 & & \\
    Graduate prospects & 0.772 & 0.236 & 1 & \\
    Research quality & 0.866 & 0.066 & 0.827 & 1 \\
    \end{tabular}
    Discuss the correlation between graduate prospects and the other three variables.
  2. Calculate the equation of the least squares regression line to predict 'Entry standards' (\(y\)) from 'Research quality' (\(x\)), given the summary statistics:
    $$\sum x = 22.24 , \quad \sum y = 2522 , \quad S _ { x x } = 1.0542 , \quad S _ { y y } = 20193.5 , \quad S _ { x y } = 122.72 .$$
  3. The data for one of the Welsh universities are missing. This university has a research quality of 3.00. Use your equation to predict the entry standard for this university.
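The regression calculation and the prediction can be checked with the sketch below. It assumes the summary statistics are based on \(n = 7\) universities, as the opening of the question suggests; the variable names are illustrative.

```python
# Summary statistics quoted in the question; n = 7 is an assumption taken
# from "seven universities" in the question stem.
n = 7
sum_x, sum_y = 22.24, 2522
s_xx, s_xy = 1.0542, 122.72

# Least squares regression line of y on x: y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Predicted entry standard for a research quality of 3.00.
prediction = a + b * 3.00

print(f"y = {a:.2f} + {b:.2f}x, prediction at x = 3.00: {prediction:.1f}")
```

The intercept is negative, which has no sensible interpretation as an entry standard; the line is only meant to be used across the range of research quality values in the data.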