Hypothesis test for association

Questions require calculating Spearman's rank correlation coefficient and carrying out a two-tailed hypothesis test to determine whether there is any association or correlation (H₁: ρₛ ≠ 0).
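
As a reference sketch, using the usual textbook notation (the symbols \(d\) and \(n\) are assumed here, not taken from any one question below), the coefficient for data with no tied ranks is usually computed as

$$r_s = 1 - \frac{6 \Sigma d^2}{n\left(n^2 - 1\right)}$$

where \(d\) is the difference between the two ranks of each item and \(n\) is the number of pairs. For the two-tailed test, \(\mathrm{H}_0: \rho_s = 0\) is rejected in favour of \(\mathrm{H}_1: \rho_s \neq 0\) when \(|r_s|\) exceeds the tabulated critical value for that \(n\) and significance level.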

19 questions · Standard +0.2

5.08e Spearman rank correlation · 5.08f Hypothesis test: Spearman rank
OCR MEI S2 2009 January Q1
20 marks Moderate -0.3
1 A researcher is investigating whether there is a relationship between the population size of cities and the average walking speed of pedestrians in the city centres. Data for the population size, \(x\) thousands, and the average walking speed of pedestrians, \(y \mathrm {~m} \mathrm {~s} ^ { - 1 }\), of eight randomly selected cities are given in the table below.
| \(x\) | 18 | 43 | 52 | 94 | 98 | 206 | 784 | 1530 |
|---|---|---|---|---|---|---|---|---|
| \(y\) | 1.15 | 0.97 | 1.26 | 1.35 | 1.28 | 1.42 | 1.32 | 1.64 |
  1. Calculate the value of Spearman's rank correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to determine whether there is any association between population size and average walking speed.

    In another investigation, the researcher selects a random sample of six adult males of particular ages and measures their maximum walking speeds. The data are shown in the table below, where \(t\) years is the age of the adult and \(w \mathrm {~m} \mathrm {~s} ^ { - 1 }\) is the maximum walking speed. Also shown are summary statistics and a scatter diagram on which the regression line of \(w\) on \(t\) is drawn.

    | \(t\) | 20 | 30 | 40 | 50 | 60 | 70 |
    |---|---|---|---|---|---|---|
    | \(w\) | 2.49 | 2.41 | 2.38 | 2.14 | 1.97 | 2.03 |
    $$n = 6 \quad \Sigma t = 270 \quad \Sigma w = 13.42 \quad \Sigma t ^ { 2 } = 13900 \quad \Sigma w ^ { 2 } = 30.254 \quad \Sigma t w = 584.6$$ \includegraphics[max width=\textwidth, alt={}, center]{77b97142-afb6-41d6-8fec-e982b7a7501b-2_728_1091_1379_529}
  3. Calculate the equation of the regression line of \(w\) on \(t\).
  4. (A) Use this equation to calculate an estimate of the maximum walking speed of an 80-year-old male.
    (B) Explain why it might not be appropriate to use the equation to calculate an estimate of the maximum walking speed of a 10-year-old male.
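
The following is a minimal Python sketch of parts 1 and 2, not a mark-scheme method. It ranks the eight city values from the table, applies the untied-ranks formula, and replaces the critical-value table lookup with an exact permutation p-value. The helper names `ranks` and `spearman` are illustrative only.

```python
from itertools import permutations

# Data from the table: population (thousands) and walking speed (m/s).
x = [18, 43, 52, 94, 98, 206, 784, 1530]
y = [1.15, 0.97, 1.26, 1.35, 1.28, 1.42, 1.32, 1.64]

def ranks(values):
    """Rank from 1 (smallest) to n; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(rx, ry):
    """r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)), valid for untied ranks."""
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

rx, ry = ranks(x), ranks(y)
r_s = spearman(rx, ry)

# Under H0 every ordering of the y ranks is equally likely, so the exact
# two-tailed p-value is the share of orderings at least as extreme as ours.
perms = list(permutations(range(1, len(x) + 1)))
extreme = sum(1 for p in perms if abs(spearman(rx, p)) >= abs(r_s) - 1e-12)
p_value = extreme / len(perms)

print(f"r_s = {r_s:.4f}, exact two-tailed p = {p_value:.4f}")
```

In an exam the same decision is reached by comparing \(|r_s|\) with the tabulated two-tailed 5% critical value for \(n = 8\). The permutation count is the calculation that such a table summarises.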
OCR MEI S2 2012 January Q1
17 marks Standard +0.3
1 Nine long-distance runners are starting an exercise programme to improve their strength. During the first session, each of them has to do a 100 metre run and to do as many push-ups as possible in one minute. The times taken for the run, together with the number of push-ups each runner achieves, are shown in the table.
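| Runner | A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|---|
| 100 metre time (seconds) | 13.2 | 11.6 | 10.9 | 12.3 | 14.7 | 13.1 | 11.7 | 13.6 | 12.4 |
| Push-ups achieved | 32 | 42 | 22 | 36 | 41 | 27 | 37 | 38 | 33 |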
  1. Draw a scatter diagram to illustrate the data.
  2. Calculate the value of Spearman's rank correlation coefficient.
  3. Carry out a hypothesis test at the \(5 \%\) significance level to examine whether there is any association between time taken for the run and number of push-ups achieved.
  4. Under what circumstances is it appropriate to carry out a hypothesis test based on the product moment correlation coefficient? State, with a reason, which test is more appropriate for these data.
OCR Further Statistics AS 2022 June Q2
7 marks Standard +0.3
2 Eight runners took part in two races. The positions in which the runners finished in the two races are shown in the table.
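| Runner | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| First race | 3 | 1 | 5 | 6 | 2 | 8 | 7 | 4 |
| Second race | 4 | 3 | 8 | 7 | 2 | 5 | 6 | 1 |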
Test at the 5\% significance level whether those runners who do better in one race tend to do better in the other.
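
Here the data are already ranks and the alternative hypothesis is one-sided (positive association). The sketch below is one possible way to check the conclusion, not the examiners' method. It computes \(\Sigma d^2\) and \(r_s\), then finds the exact one-tailed probability of a \(\Sigma d^2\) at least as small as the observed value. The function name `sum_d2` is illustrative.

```python
from itertools import permutations
from math import factorial

# Finishing positions for runners A..H, read from the table.
first = [3, 1, 5, 6, 2, 8, 7, 4]
second = [4, 3, 8, 7, 2, 5, 6, 1]

def sum_d2(r1, r2):
    """Sum of squared rank differences."""
    return sum((a - b) ** 2 for a, b in zip(r1, r2))

n = len(first)
observed = sum_d2(first, second)
r_s = 1 - 6 * observed / (n * (n * n - 1))

# Positive association corresponds to a SMALL sum of d^2, so the one-tailed
# p-value counts rankings whose sum of d^2 is no larger than the observed one.
count = sum(1 for p in permutations(range(1, n + 1))
            if sum_d2(first, p) <= observed)
p_value = count / factorial(n)

print(f"sum d^2 = {observed}, r_s = {r_s:.4f}, one-tailed p = {p_value:.4f}")
```

Because the test is one-tailed, \(r_s\) is compared with the one-tailed 5% critical value, not the two-tailed one used in the questions on "any association".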
OCR Further Statistics 2021 November Q5
10 marks Standard +0.3
5 The numbers of each of 9 items sold in two different supermarkets in a week are given in the following table.
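| Item | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Supermarket \(A\) | 17 | 28 | 41 | 43 | 62 | 69 | 75 | 93 | 115 |
| Supermarket \(B\) | 24 | 7 | 18 | 12 | 47 | 29 | 58 | 42 | 37 |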
A researcher wants to test whether there is association between the numbers of these items sold in the two supermarkets. However, it is known that the collection of data in Supermarket \(B\) was done inaccurately and each of the numbers in the corresponding row of the table could have been in error by as much as 2 items greater or 2 items fewer.
  1. Explain why Spearman's rank correlation coefficient might be preferred to the use of Pearson's product-moment correlation coefficient in this context.
  2. Carry out the test at the \(5 \%\) significance level using Spearman's rank correlation coefficient.
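
Part 1 turns on whether errors of up to 2 either way could change the ranks. The sketch below checks this numerically. It assumes the Supermarket \(B\) row reads 24, 7, 18, 12, 47, 29, 58, 42, 37, which is the only split of the digits that gives nine untied values, each more than 4 apart. Treat that reading as an assumption. The names `ranks` and `ranks_stable` are illustrative.

```python
# Weekly sales for items 1..9.
a = [17, 28, 41, 43, 62, 69, 75, 93, 115]
b = [24, 7, 18, 12, 47, 29, 58, 42, 37]  # assumed reading of the table row

def ranks(values):
    """Rank from 1 (smallest) to n; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Two B values can move towards each other by at most 2 + 2 = 4, so the
# ranks cannot change (or tie) if every gap between neighbours exceeds 4.
sorted_b = sorted(b)
gaps = [hi - lo for lo, hi in zip(sorted_b, sorted_b[1:])]
ranks_stable = min(gaps) > 4

n = len(a)
sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
r_s = 1 - 6 * sum_d2 / (n * (n * n - 1))

print(f"smallest gap = {min(gaps)}, ranks stable: {ranks_stable}")
print(f"sum d^2 = {sum_d2}, r_s = {r_s:.4f}")
```

Under that reading the recording errors leave \(r_s\) unchanged, whereas Pearson's coefficient would vary with the exact values.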
Edexcel S3 2023 June Q1
9 marks Standard +0.3
  1. (a) State two conditions under which it might be more appropriate to use Spearman's rank correlation coefficient rather than the product moment correlation coefficient.
A random sample of 10 melons was taken from a market stall. The length, in centimetres, and maximum diameter, in centimetres, of each melon were recorded. The Spearman's rank correlation coefficient between the results was -0.673.
(b) Test, at the \(5 \%\) level of significance, whether or not there is evidence of a correlation. State clearly your hypotheses and the critical value used.
The product moment correlation coefficient between the results was -0.525.
(c) Test, at the \(5 \%\) level of significance, whether or not there is evidence of a negative correlation. State clearly your hypotheses and the critical value used.
Edexcel S3 2021 October Q3
14 marks Standard +0.3
3. A cafe owner wishes to know whether the price of strawberry jam is related to the taste of the jam. He finds a website that lists the price per 100 grams and a mark for the taste, out of 100, awarded by a judge, for 9 different strawberry jams \(A , B , C , D , E , F , G , H\) and \(I\). He then ranks the marks for taste and the prices. The ranks are shown in the table below.
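| Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Price | \(A\) | \(B\) | \(E\) | \(C\) | \(D\) | \(F\) | \(G\) | \(H\) | \(I\) |
| Taste | \(A\) | \(B\) | \(F\) | \(E\) | \(H\) | \(G\) | \(I\) | \(C\) | \(D\) |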
  1. Calculate Spearman's rank correlation coefficient for these data.
  2. Test, at the \(5 \%\) level of significance, whether or not there is a relationship between the price and the taste of these strawberry jams. State your hypotheses clearly.

    A friend suggests that it would be better to use the price per 100 grams, \(c\), and the mark for the taste, \(m\), for each strawberry jam rather than rank them. Given that $$\mathrm { S } _ { c c } = 2.0455 \quad \mathrm {~S} _ { m m } = 243.5556 \quad \mathrm {~S} _ { c m } = 16.4943$$
  3. calculate the product moment correlation coefficient between the price and the mark for taste of these strawberry jams, giving your answer correct to 3 decimal places.
  4. Use your value of the product moment correlation coefficient to test, at the \(5 \%\) level of significance, whether or not there is evidence of a positive correlation between the price and the mark for taste of these 9 strawberry jams. State your hypotheses clearly.
  5. State which of the tests in parts (b) and (d) is more appropriate for the cafe owner to use. Give a reason for your answer.
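
In this question the table lists which jam holds each rank, so the ranks have to be looked up per jam before differencing. The sketch below is an illustration with made-up variable names (`price_order`, `taste_order`), not the published solution. It computes \(r_s\) this way and then the product moment coefficient from the given summary statistics as \(S_{cm} / \sqrt{S_{cc} S_{mm}}\).

```python
from math import sqrt

# Jams listed in rank order 1..9, read from the two table rows.
price_order = "ABECDFGHI"
taste_order = "ABFEHGICD"

# Turn "which jam is at rank k" into "what rank does each jam have".
price_rank = {jam: k + 1 for k, jam in enumerate(price_order)}
taste_rank = {jam: k + 1 for k, jam in enumerate(taste_order)}

n = len(price_order)
sum_d2 = sum((price_rank[j] - taste_rank[j]) ** 2 for j in price_order)
r_s = 1 - 6 * sum_d2 / (n * (n * n - 1))

# Product moment correlation coefficient from the quoted summary statistics.
S_cc, S_mm, S_cm = 2.0455, 243.5556, 16.4943
r = S_cm / sqrt(S_cc * S_mm)

print(f"sum d^2 = {sum_d2}, r_s = {r_s:.4f}, r = {r:.3f}")
```

The two coefficients are then tested against different tables: Spearman critical values for \(r_s\) and product moment critical values for \(r\), each with \(n = 9\).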
Edexcel S3 2006 January Q7
12 marks Standard +0.3
7. The numbers of deaths from pneumoconiosis and lung cancer in a developing country are given in the table.
| Age group (years) | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70 and over |
|---|---|---|---|---|---|---|
| Deaths from pneumoconiosis (1000s) | 12.5 | 5.9 | 18.5 | 19.4 | 31.2 | 31.0 |
| Deaths from lung cancer (1000s) | 3.7 | 9.0 | 10.2 | 19.0 | 13.0 | 18.0 |
The correlation between the number of deaths in the different age groups for each disease is to be investigated.
  1. Give one reason why Spearman's rank correlation coefficient should be used.
  2. Calculate Spearman's rank correlation coefficient for these data.
  3. Use a suitable test, at the \(5 \%\) significance level, to interpret your result. State your hypotheses clearly.
    (5)
Edexcel S3 2003 June Q6
11 marks Standard +0.3
6. Two judges ranked 8 ice skaters in a competition according to the table below.
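| Judge \ Skater | (i) | (ii) | (iii) | (iv) | (v) | (vi) | (vii) | (viii) |
|---|---|---|---|---|---|---|---|---|
| A | 2 | 5 | 3 | 7 | 8 | 1 | 4 | 6 |
| B | 3 | 2 | 6 | 5 | 7 | 4 | 1 | 8 |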
  1. Evaluate Spearman's rank correlation coefficient between the ranks of the two judges.
  2. Use a suitable test, at the \(5 \%\) level of significance, to interpret this result.
Edexcel S3 2012 June Q1
12 marks Standard +0.3
  1. Interviews for a job are carried out by two managers. Candidates are given a score by each manager and the results for a random sample of 8 candidates are shown in the table below.
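| Candidate | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
|---|---|---|---|---|---|---|---|---|
| Manager \(X\) | 62 | 56 | 87 | 54 | 65 | 15 | 12 | 10 |
| Manager \(Y\) | 54 | 47 | 71 | 50 | 49 | 25 | 30 | 44 |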
  1. Calculate Spearman's rank correlation coefficient for these data.
  2. Test, at the \(5 \%\) level of significance, whether there is agreement between the rankings awarded by each manager. State your hypotheses clearly.

    Manager \(Y\) later discovered he had miscopied his score for candidate \(D\) and it should be 54.
  3. Without carrying out any further calculations, explain how you would calculate Spearman's rank correlation in this case.
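
Part 3 asks only for an explanation, but the idea can be sketched in code. With the corrected score, candidates \(A\) and \(D\) tie for Manager \(Y\), so each takes the average of the two ranks they share and the coefficient is found as the product moment correlation coefficient of the ranks, since the \(\Sigma d^2\) formula is only approximate when ranks are tied. The function names `average_ranks` and `pearson` are illustrative.

```python
from math import sqrt

# Scores for candidates A..H; Manager Y's score for D is corrected to 54.
x = [62, 56, 87, 54, 65, 15, 12, 10]
y_corrected = [54, 47, 71, 54, 49, 25, 30, 44]

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(values)
    # index(v) is the 0-based position of the first copy of v, so a group of
    # k equal values occupies ranks index+1 .. index+k, with mean index+(k+1)/2.
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]

def pearson(u, v):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)

rank_x = average_ranks(x)
rank_y = average_ranks(y_corrected)
r_s_tied = pearson(rank_x, rank_y)

print(rank_y)
print(f"r_s with tied ranks = {r_s_tied:.4f}")
```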
Edexcel S3 2013 June Q2
8 marks Standard +0.3
2. The table below shows the number of students per member of staff and the student satisfaction scores for 7 universities.
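| University | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) |
|---|---|---|---|---|---|---|---|
| Number of students per member of staff | 14.2 | 13.1 | 13.3 | 11.7 | 10.5 | 15.9 | 10.8 |
| Student satisfaction score | 4.1 | 4.2 | 3.8 | 4.0 | 3.9 | 4.3 | 3.7 |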
  1. Calculate Spearman's rank correlation coefficient for these data.
  2. Stating your hypotheses clearly test, at the \(5 \%\) level of significance, whether or not there is evidence of a correlation between the number of students per member of staff and the student satisfaction score.
Edexcel S3 Q5
12 marks Standard +0.3
5. A marathon runner believes that she is more likely to win a medal at her national championships the higher the temperature is on the day of the race. She records the temperature at the start of each of eight races against fields of a similar standard and her finishing position in each race. Her results are shown in the table below.
| Temperature \(\left( { } ^ { \circ } \mathrm { C } \right)\) | 16 | 9 | 11 | 5 | 7 | 21 | 12 | 15 |
|---|---|---|---|---|---|---|---|---|
| Finishing position | 2 | 15 | 5 | 19 | 10 | 4 | 6 | 11 |
  1. Calculate Spearman's rank correlation coefficient for these data.
  2. Using a 5\% level of significance and stating your hypotheses clearly, interpret your result. Another runner suggests that she should use her time in each race instead of her finishing position and calculate the product moment correlation coefficient for the data.
  3. Comment on this suggestion.
Edexcel S3 Q4
12 marks Standard +0.3
4. For a project a student collects data on engine size and sales over a period of time for the models of cars made by one particular manufacturer. Her results are shown in the table below.
| Engine capacity (litres) | 1.1 | 1.3 | 1.6 | 2.1 | 2.4 | 2.6 | 2.8 | 3.0 |
|---|---|---|---|---|---|---|---|---|
| Sales | 527 | 632 | 840 | 619 | 350 | 425 | 487 | 401 |
  1. Calculate Spearman's rank correlation coefficient for these data.
  2. Stating your hypotheses clearly, test at the \(5 \%\) level of significance whether or not there is any evidence of correlation.
  3. Explain why it is more appropriate to use Spearman's rank correlation coefficient for this test than the product moment correlation coefficient.
    (2 marks)
OCR MEI Further Statistics A AS 2022 June Q3
10 marks Standard +0.3
3 A biology student is doing an experiment in which plants are inoculated with a particular microorganism in an attempt to help them grow. She is investigating whether there is any association between the percentage of roots which have been colonised by the microorganism and the dry weight of the plant shoots. After the plants have grown for a few weeks, the student takes a random sample of 10 plants and measures the percentage of roots which have been colonised by the microorganism and the dry weight of the plant shoots. The spreadsheet output shows the data, together with a scatter diagram to illustrate the data. \includegraphics[max width=\textwidth, alt={}, center]{8f1e0c68-a334-4657-823e-386ab0994c02-3_722_1648_635_244}
  1. The student decides that a test based on Pearson's product moment correlation coefficient may not be valid. Explain why she comes to this conclusion.
  2. Calculate the value of Spearman's rank correlation coefficient.
  3. Carry out a test based on this coefficient, at the \(5 \%\) significance level, to investigate whether there is any association between percentage colonisation and shoot dry weight.
OCR MEI Further Statistics Minor 2022 June Q5
14 marks Standard +0.3
5 A medical researcher is investigating whether there is any relationship between the age of a person and the level of a particular protein in the person's blood. She measures the levels of the protein (measured in suitable units) in a random sample of 12 hospital patients of various ages (in years). The spreadsheet shows the values obtained, together with a scatter diagram which illustrates the data. \includegraphics[max width=\textwidth, alt={}, center]{e8624e9b-5143-49d2-9683-cc3a1082694e-5_736_1470_1087_246}
  1. The researcher decides that a test based on Pearson's product moment correlation coefficient may not be valid. Explain why she comes to this conclusion.
  2. Calculate the value of Spearman's rank correlation coefficient.
  3. Carry out a test based on this coefficient at the \(5 \%\) significance level to investigate whether there is any association between age and protein level.
  4. Explain why the researcher chose a sample that was random.
  5. The researcher had originally intended to use a sample size of 6 rather than the 12 that she actually used. Explain what advantage there is in using the larger sample size.
OCR MEI Further Statistics Minor 2023 June Q6
10 marks Standard +0.3
6 Each competitor in a lumberjacking competition has to perform various disciplines for which they are timed. A spectator thinks that the times for two of the disciplines, chopping wood and sawing wood, are related. The table and the scatter diagram below show the times of a random sample of 8 competitors in these two disciplines.
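| Competitor | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| Sawing | 17.1 | 16.7 | 14.3 | 14.0 | 12.8 | 21.5 | 15.3 | 14.4 |
| Chopping | 23.5 | 20.6 | 21.9 | 18.8 | 21.5 | 24.8 | 19.7 | 19.3 |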
\includegraphics[max width=\textwidth, alt={}, center]{72215d69-c3e6-492d-bb3e-bdc28aeb4613-6_786_1130_708_239}
  1. The spectator decides to carry out a hypothesis test to investigate whether there is any relationship. Explain why the spectator decides that a test based on Pearson's product moment correlation coefficient may not be valid.
  2. Determine the value of Spearman's rank correlation coefficient.
  3. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is positive association between sawing and chopping times.
OCR MEI Further Statistics Minor 2020 November Q5
17 marks Moderate -0.3
5 A student is investigating immunisation. He wonders if there is any relationship between the percentage of young children who have been given measles vaccine and the percentage who have been given BCG vaccine in various countries. He takes a random sample of 8 countries and finds the data for the two variables. The spreadsheet in Fig. 5.1 shows the values obtained, together with a scatter diagram which illustrates the data. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{882f9f3c-40d8-4abb-822a-49bd505a33ea-5_910_1653_541_246} \captionsetup{labelformat=empty} \caption{Fig. 5.1}
\end{figure}
  1. The student decides that a test based on Pearson's product moment correlation coefficient is not valid. Explain why he comes to this conclusion. The student carries out a test based on Spearman's rank correlation coefficient.
  2. Calculate the value of Spearman's rank correlation coefficient.
  3. Carry out a test based on this coefficient at the \(5 \%\) significance level to investigate whether there is any association between measles and BCG vaccination levels. The student then decides to investigate the relationship between number of doctors per 1000 people in a country and unemployment rate in that country (unemployment rate is the percentage of the working age population who are not in paid work). He selects a random sample of 6 countries. The spreadsheet in Fig. 5.2 shows the values obtained, together with a scatter diagram which illustrates the data. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{882f9f3c-40d8-4abb-822a-49bd505a33ea-6_776_1649_495_248} \captionsetup{labelformat=empty} \caption{Fig. 5.2}
    \end{figure}
  4. Use your calculator to write down the equation of the regression line of unemployment rate on doctors per 1000.
  5. Use the regression line to estimate the unemployment rate for a country with 2.00 doctors per 1000.
  6. Comment briefly on the reliability of your answer to part (e). The student decides to add the data for another country with 3.99 doctors per 1000 and unemployment rate 11.42 to his diagram.
  7. Add this point to the scatter diagram in the Printed Answer Booklet.
  8. Without doing any further calculations, comment on what difference, if any, including this extra data point would make to the usefulness of a regression line of unemployment rate on doctors per 1000.
OCR MEI Further Statistics Major 2022 June Q8
14 marks Standard +0.3
8 A swimming coach is investigating whether there is correlation between the times taken by teenage swimmers to swim 50 m Butterfly and 50 m Freestyle. The coach selects a random sample of 11 teenage swimmers and records the times that each of them take for each event. The spreadsheet shows the data, together with a scatter diagram to illustrate the data. \includegraphics[max width=\textwidth, alt={}, center]{77eabbd6-a058-457f-9601-d66f3c2db005-06_712_1465_456_274}
  1. In the scatter diagram, Butterfly times have been plotted on the horizontal axis and Freestyle times on the vertical axis. A student states that the variables should have been plotted the other way around. Explain whether the student is correct. The student decides to carry out a hypothesis test to investigate whether there is any correlation between the times taken for the two events.
  2. Explain why the student decides to carry out a test based on Spearman's rank correlation coefficient.
  3. In this question you must show detailed reasoning. Carry out the test at the 5\% significance level.
  4. The student concludes that there is definitely no correlation between the times. Comment on the student's conclusion.
OCR FS1 AS 2017 December Q6
9 marks Standard +0.3
6 Arlosh, Sarah and Desi are investigating the ratings given to six different films by two critics.
  1. Arlosh calculates Spearman's rank correlation coefficient \(r _ { s }\) for the critics' ratings. He calculates that \(\Sigma d ^ { 2 } = 72\). Show that this value must be incorrect.
  2. Arlosh checks his working with Sarah, whose answer \(r _ { s } = \frac { 29 } { 35 }\) is correct. Find the correct value of \(\Sigma d ^ { 2 }\).
  3. Carry out an appropriate two-tailed significance test of the value of \(r _ { s }\) at the \(5 \%\) significance level, stating your hypotheses clearly. Each critic gives a score out of 100 to each film. Desi uses these scores to calculate Pearson's product-moment correlation coefficient. She carries out a two-tailed significance test of this value at the \(5 \%\) significance level.
  4. Explain with a reason whether you would expect the conclusion of Desi's test to be the same as the result of the test in part (iii).
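
Parts 1 and 2 are short enough to check with exact arithmetic. The sketch below assumes the standard untied-ranks formula with \(n = 6\) and uses Python's `Fraction` to avoid rounding. The function name `r_s_from_sum_d2` is illustrative.

```python
from fractions import Fraction

n = 6  # six films

def r_s_from_sum_d2(sum_d2):
    """Exact r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied ranks."""
    return 1 - Fraction(6 * sum_d2, n * (n * n - 1))

# Part 1: Arlosh's value puts r_s outside the interval [-1, 1].
claimed = r_s_from_sum_d2(72)

# The largest possible sum of d^2 occurs when the rankings are exactly reversed.
reversed_d2 = sum((r - (n + 1 - r)) ** 2 for r in range(1, n + 1))

# Part 2: rearrange the formula to recover sum d^2 from r_s = 29/35.
correct_sum_d2 = (1 - Fraction(29, 35)) * n * (n * n - 1) / 6

print(f"r_s for sum d^2 = 72: {claimed}")
print(f"largest possible sum d^2: {reversed_d2}")
print(f"sum d^2 for r_s = 29/35: {correct_sum_d2}")
```

Either observation answers part 1: the implied \(r_s\) is below \(-1\), or equivalently 72 exceeds the largest \(\Sigma d^2\) that six ranks can produce.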
SPS SPS ASFM Statistics 2021 May Q5
8 marks Moderate -0.3
Arlosh, Sarah and Desi are investigating the ratings given to six different films by two critics.
  1. Arlosh calculates Spearman's rank correlation coefficient \(r_s\) for the critics' ratings. He calculates that \(\Sigma d^2 = 72\). Show that this value must be incorrect. [2]
  2. Arlosh checks his working with Sarah, whose answer \(r_s = \frac{29}{35}\) is correct. Find the correct value of \(\Sigma d^2\). [2]
  3. Carry out an appropriate two-tailed significance test of the value of \(r_s\) at the 5% significance level, stating your hypotheses clearly. [4]