5.08d Hypothesis test: Pearson correlation

109 questions

Edexcel S3 2023 January Q2
12 marks Standard +0.3
2 The table shows the season's best times, \(x\) seconds, for the 8 athletes who took part in the 200 m final in the 2021 Tokyo Olympics. It also shows their finishing position in the race.
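| Athlete | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Season's best time | 19.89 | 19.83 | 19.74 | 19.84 | 19.91 | 19.99 | 20.13 | 20.10 |
| Finishing position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |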
Given that the fastest season's best time is ranked number 1
  1. calculate the value of the Spearman's rank correlation coefficient for these data.
  2. Stating your hypotheses clearly, test, at the \(1 \%\) level of significance, whether or not there is evidence of a positive correlation between the rank of the season's best time and the finishing position for these athletes.
Chris suggests that it would be better to use the actual finishing time, \(y\) seconds, of these athletes rather than their finishing position.
Given that $$S _ { x x } = 0.1286875 \quad S _ { y y } = 0.55275 \quad S _ { x y } = 0.225175$$
  3. calculate the product moment correlation coefficient between the season's best time and the finishing time for these athletes.
    Give your answer correct to 3 decimal places.
  4. Use your value of the product moment correlation coefficient to test, at the \(1 \%\) level of significance, whether or not there is evidence of a positive correlation between the season's best time and the finishing time for these athletes.
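As a rough numerical check on part 3 (a sketch, not a mark scheme), the product moment correlation coefficient can be computed from the quoted summary values using \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\). The function name `pmcc_from_s` is an illustrative choice and does not come from the question.

```python
import math


def pmcc_from_s(s_xx: float, s_yy: float, s_xy: float) -> float:
    """Return r = S_xy / sqrt(S_xx * S_yy)."""
    if s_xx <= 0 or s_yy <= 0:
        raise ValueError("S_xx and S_yy must be positive")
    return s_xy / math.sqrt(s_xx * s_yy)


# Summary values quoted in the question above.
r = pmcc_from_s(0.1286875, 0.55275, 0.225175)
print(round(r, 3))  # 0.844
```

The value of `r` would then be compared with the tabulated one-tailed critical value for a sample of 8 at the 1% level, which has to be read from the statistical tables in use.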
Edexcel S3 2024 January Q3
12 marks Standard +0.3
3. The table shows the annual tea consumption, \(t\) (kg/person), and population, \(p\) (millions), for a random sample of 7 European countries.

| Country | A | B | C | D | E | F | G |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Annual tea consumption, \(\boldsymbol { t }\) (kg/person) | 0.27 | 0.15 | 0.42 | 0.06 | 1.94 | 0.78 | 0.44 |
| Population, \(\boldsymbol { p }\) (millions) | 5.4 | 5.8 | 9 | 10.2 | 67.9 | 17.1 | 8.7 |

$$\text { (You may use } \mathrm { S } _ { t t } = 2.486 \quad \mathrm {~S} _ { p p } = 3026.234 \quad \mathrm {~S} _ { p t } = 83.634 \text { ) }$$
Angela suggests using the product moment correlation coefficient to calculate the correlation between annual tea consumption and population.
  1. Use Angela's suggestion to test, at the \(5 \%\) level of significance, whether or not there is evidence of any correlation between annual tea consumption and population. State your hypotheses clearly and the critical value used.
Johan suggests using Spearman's rank correlation coefficient to calculate the correlation between the rank of annual tea consumption and the rank of population.
  2. Calculate Spearman's rank correlation coefficient between the rank of annual tea consumption and the rank of population.
  3. Use Johan's suggestion to test, at the \(5 \%\) level of significance, whether or not there is evidence of a positive correlation between annual tea consumption and population.
    State your hypotheses clearly and the critical value used.
Edexcel S3 2014 June Q4
12 marks Standard +0.3
4. In a survey 10 randomly selected men had their systolic blood pressure, \(x\), and weight, \(w\), measured. Their results are as follows
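| Man | \(\boldsymbol { A }\) | \(\boldsymbol { B }\) | \(\boldsymbol { C }\) | \(\boldsymbol { D }\) | \(\boldsymbol { E }\) | \(\boldsymbol { F }\) | \(\boldsymbol { G }\) | \(\boldsymbol { H }\) | \(\boldsymbol { I }\) | \(\boldsymbol { J }\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(x\) | 123 | 128 | 137 | 143 | 149 | 153 | 154 | 159 | 162 | 168 |
| \(w\) | 78 | 93 | 85 | 83 | 75 | 98 | 88 | 87 | 95 | 99 |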
  1. Calculate the value of Spearman's rank correlation coefficient between \(x\) and \(w\).
  2. Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether or not there is evidence of a positive correlation between systolic blood pressure and weight.
The product moment correlation coefficient for these data is 0.5114.
  3. Use the value of the product moment correlation coefficient to test, at the \(5 \%\) level of significance, whether or not there is evidence of a positive correlation between systolic blood pressure and weight.
  4. Using your conclusions to part (b) and part (c), describe the relationship between systolic blood pressure and weight.
Edexcel S3 2018 June Q1
12 marks Standard +0.3
  1. A random sample of 9 footballers is chosen to participate in an obstacle course. The time taken, \(y\) seconds, for each footballer to complete the obstacle course is recorded, together with the footballer's Body Mass Index, \(x\). The results are shown in the table below.
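| Footballer | Body Mass Index, \(\boldsymbol { x }\) | Time taken to complete the obstacle course, \(y\) seconds |
| --- | --- | --- |
| A | 18.7 | 690 |
| B | 19.5 | 801 |
| C | 20.2 | 723 |
| D | 20.4 | 633 |
| E | 20.8 | 660 |
| F | 21.9 | 655 |
| G | 23.2 | 711 |
| H | 24.3 | 642 |
| I | 24.8 | 607 |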
Russell claims, that for footballers, as Body Mass Index increases the time taken to complete the obstacle course tends to decrease.
  1. Find, to 3 decimal places, Spearman's rank correlation coefficient between \(x\) and \(y\).
  2. Use your value of Spearman's rank correlation coefficient to test Russell's claim. Use a \(5 \%\) significance level and state your hypotheses clearly.
The product moment correlation coefficient for these data is \(-0.5594\).
  3. Use the value of the product moment correlation coefficient to test for evidence of a negative correlation between Body Mass Index and the time taken to complete the obstacle course. Use a 5\% significance level.
  4. Using your conclusions to part (b) and part (c), describe the relationship between Body Mass Index and the time taken to complete the obstacle course.
Edexcel S3 2023 June Q1
9 marks Standard +0.3
1. (a) State two conditions under which it might be more appropriate to use Spearman's rank correlation coefficient rather than the product moment correlation coefficient.
A random sample of 10 melons was taken from a market stall. The length, in centimetres, and maximum diameter, in centimetres, of each melon were recorded. The Spearman's rank correlation coefficient between the results was \(-0.673\).
(b) Test, at the \(5 \%\) level of significance, whether or not there is evidence of a correlation. State clearly your hypotheses and the critical value used.
The product moment correlation coefficient between the results was \(-0.525\).
(c) Test, at the \(5 \%\) level of significance, whether or not there is evidence of a negative correlation. State clearly your hypotheses and the critical value used.
Edexcel S3 2021 October Q3
14 marks Standard +0.3
3. A cafe owner wishes to know whether the price of strawberry jam is related to the taste of the jam. He finds a website that lists the price per 100 grams and a mark for the taste, out of 100, awarded by a judge, for 9 different strawberry jams \(A , B , C , D , E , F , G , H\) and \(I\). He then ranks the marks for taste and the prices. The ranks are shown in the table below.
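| Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Price | \(A\) | \(B\) | \(E\) | \(C\) | \(D\) | \(F\) | \(G\) | \(H\) | \(I\) |
| Taste | \(A\) | \(B\) | \(F\) | \(E\) | \(H\) | \(G\) | \(I\) | \(C\) | \(D\) |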
  1. Calculate Spearman's rank correlation coefficient for these data.
  2. Test, at the \(5 \%\) level of significance, whether or not there is a relationship between the price and the taste of these strawberry jams. State your hypotheses clearly.
A friend suggests that it would be better to use the price per 100 grams, \(c\), and the mark for the taste, \(m\), for each strawberry jam rather than rank them.
Given that $$\mathrm { S } _ { c c } = 2.0455 \quad \mathrm {~S} _ { m m } = 243.5556 \quad \mathrm {~S} _ { c m } = 16.4943$$
  3. calculate the product moment correlation coefficient between the price and the mark for taste of these strawberry jams, giving your answer correct to 3 decimal places.
  4. Use your value of the product moment correlation coefficient to test, at the \(5 \%\) level of significance, whether or not there is evidence of a positive correlation between the price and the mark for taste of these 9 strawberry jams. State your hypotheses clearly.
  5. State which of the tests in parts (b) and (d) is more appropriate for the cafe owner to use. Give a reason for your answer.
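For part 1, a minimal sketch of the standard formula \(r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}\) for untied ranks is given below. The helper `spearman_from_orders` and its string-based input are illustrative assumptions; each string simply lists the jam labels from rank 1 to rank 9, as in the table.

```python
def spearman_from_orders(order_a: str, order_b: str) -> float:
    """Spearman's r_s for two untied rankings of the same labels.

    Each argument lists the labels from rank 1 to rank n.
    """
    n = len(order_a)
    if n < 2 or len(set(order_a)) != n or sorted(order_a) != sorted(order_b):
        raise ValueError("both orders must rank the same distinct labels")
    rank_a = {label: i + 1 for i, label in enumerate(order_a)}
    rank_b = {label: i + 1 for i, label in enumerate(order_b)}
    # Sum of squared rank differences, one term per label.
    sum_d2 = sum((rank_a[k] - rank_b[k]) ** 2 for k in rank_a)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Orders read from the Price and Taste rows of the table.
r_s = spearman_from_orders("ABECDFGHI", "ABFEHGICD")
print(round(r_s, 3))  # 0.533
```

Here \(\sum d^2 = 56\), so \(r_s = 1 - 336/720\). The shortcut formula only applies when there are no tied ranks, which is the case for these data.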
Edexcel S3 2013 June Q3
13 marks Standard +0.3
3. The table below shows the population and the number of council employees for different towns and villages.
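| Town or village | Population | Number of council employees |
| --- | --- | --- |
| \(A\) | 211 | 10 |
| \(B\) | 356 | 2 |
| \(C\) | 1047 | 12 |
| \(D\) | 2463 | 21 |
| \(E\) | 4892 | 16 |
| \(F\) | 6479 | 25 |
| \(G\) | 6571 | 67 |
| \(H\) | 6573 | 45 |
| \(I\) | 9845 | 48 |
| \(J\) | 14784 | 34 |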
  1. Find, to 3 decimal places, Spearman's rank correlation coefficient between the population and the number of council employees.
  2. Use your value of Spearman's rank correlation coefficient to test for evidence of a positive correlation between the population and the number of council employees. Use a \(2.5 \%\) significance level. State your hypotheses clearly.
It is suggested that a product moment correlation coefficient would be a more suitable calculation in this case. The product moment correlation coefficient for these data is 0.627 to 3 decimal places.
  3. Use the value of the product moment correlation coefficient to test for evidence of a positive correlation between the population and the number of council employees. Use a \(2.5 \%\) significance level.
  4. Interpret and comment on your results from part (b) and part (c).
Edexcel S3 2014 June Q8
16 marks Standard +0.3
8. The heights, in metres, and weights, in kilograms, of a random sample of 9 men are shown in the table below
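| Man | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Height \(( x )\) | 1.68 | 1.74 | 1.75 | 1.76 | 1.78 | 1.82 | 1.84 | 1.88 | 1.98 |
| Weight \(( y )\) | 75 | 76 | 100 | 77 | 90 | 95 | 110 | 96 | 120 |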
  1. Given that \(\mathrm { S } _ { x x } = 0.0632 , \mathrm {~S} _ { y y } = 1957.5556\) and \(\mathrm { S } _ { x y } = 9.3433\) calculate, to 3 decimal places, the product moment correlation coefficient between height and weight for these men.
  2. Use your value of the product moment correlation coefficient to test whether or not there is evidence of a positive correlation between the height and weight of men. Use a \(5 \%\) significance level. State your hypotheses clearly.
Peter does not know the heights or weights of the 9 men. He is given photographs of them and asked to put them in order of increasing weight. He puts them in the order $$A C E B G D I F H$$
  3. Find, to 3 decimal places, Spearman's rank correlation coefficient between Peter's order and the actual order.
  4. Use your value of Spearman's rank correlation coefficient to test for evidence of Peter's ability to correctly order men, by their weight, from their photographs. Use a 5\% significance level and state your hypotheses clearly.
Edexcel S3 2018 June Q1
13 marks Standard +0.3
  1. Phil measures the concentration of a radioactive element, \(c\), and the amount of dissolved solids, \(a\), of 8 random samples of groundwater. His results are shown in the table below.
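| Sample | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(c\) | 625 | 700 | 650 | 645 | 720 | 600 | 825 | 665 |
| \(a\) | 1.28 | 1.30 | 1.00 | 1.20 | 1.55 | 1.15 | 1.40 | 1.45 |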
Given that $$\mathrm { S } _ { c c } = 34787.5 \quad \mathrm {~S} _ { a a } = 0.2172875 \quad \mathrm {~S} _ { c a } = 47.7625$$
  1. calculate, to 3 decimal places, the product moment correlation coefficient between the concentration of the radioactive element and the amount of dissolved solids for these groundwater samples.
  2. Use your value of the product moment correlation coefficient to test whether or not there is evidence of a positive correlation between the concentration of this radioactive element and the amount of dissolved solids in groundwater. Use a \(5 \%\) significance level. State your hypotheses clearly.
  3. Calculate, to 3 decimal places, Spearman's rank correlation coefficient between the concentration of the radioactive element and the amount of dissolved solids.
  4. Use your value of Spearman's rank correlation coefficient to test for evidence of a positive correlation between the concentration of the radioactive element and the amount of dissolved solids. Use a \(5 \%\) significance level. State your hypotheses clearly.
  5. Using your conclusions in part (b) and part (d), comment on the possible relationship between these variables.
AQA S1 2006 January Q5
11 marks Easy -1.2
5 [Figure 1, printed on the insert, is provided for use in this question.]
The table shows the times, in seconds, taken by a random sample of 10 boys from a junior swimming club to swim 50 metres freestyle and 50 metres backstroke.
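| Boy | A | B | C | D | E | F | G | H | I | J |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Freestyle ( \(\boldsymbol { x }\) seconds) | 30.2 | 32.8 | 25.1 | 31.8 | 31.2 | 35.6 | 32.4 | 38.0 | 36.1 | 34.1 |
| Backstroke ( \(y\) seconds) | 33.5 | 35.4 | 37.4 | 27.2 | 34.7 | 38.2 | 37.7 | 41.4 | 42.3 | 38.4 |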
  1. On Figure 1, complete the scatter diagram for these data.
  2. Hence:
    1. give two distinct comments on what your scatter diagram reveals;
    2. state, without calculation, which of the following 3 values is most likely to be the value of the product moment correlation coefficient for the data in your scatter diagram. $$0.912 \quad 0.088 \quad 0.462$$
  3. In the sample of 10 boys, one boy is a junior-champion freestyle swimmer and one boy is a junior-champion backstroke swimmer. Identify the two most likely boys.
  4. Removing the data for the two boys whom you identified in part (c):
    1. calculate the value of the product moment correlation coefficient for the remaining 8 pairs of values of \(x\) and \(y\);
    2. comment, in context, on the value that you obtain.
Edexcel S1 Q6
17 marks Moderate -0.8
6. A school introduced a new programme of support lessons in 1994 with a view to improving grades in GCSE English. The table below shows the number of years since 1994, \(n\), and the corresponding percentage of students achieving A to C grades in GCSE English, \(p\), for each year.

| \(n\) | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| \(p ( \% )\) | 35.2 | 37.1 | 40.6 | 39.0 | 43.4 | 44.8 |

  1. Represent these data on a scatter diagram.
You may use the following values. $$\Sigma n = 21 , \quad \Sigma p = 240.1 , \quad \Sigma n ^ { 2 } = 91 , \quad \Sigma p ^ { 2 } = 9675.41 , \quad \Sigma n p = 873 .$$
  2. Find an equation of the regression line of \(p\) on \(n\) and draw it on your graph.
  3. Calculate the product moment correlation coefficient for these data and comment on the suitability of a linear model for the relationship between \(n\) and \(p\) during this period.
Edexcel S1 Q2
8 marks Easy -1.2
2. A supermarket manager believes that those of her staff on lower rates of pay tend to work more hours of overtime.
  1. Suggest why this might be the case.
To investigate her theory the manager recorded the number of hours of overtime, \(h\), worked by each of the store's 18 full-time staff during one week. She also recorded each employee's hourly rate of pay, \(\pounds p\), and summarised her results as follows: $$\Sigma p = 86 , \quad \Sigma h = 104.5 , \quad \Sigma p ^ { 2 } = 420.58 , \quad \Sigma h ^ { 2 } = 830.25 , \quad \Sigma p h = 487.3$$
  2. Calculate the product moment correlation coefficient for these data.
  3. Comment on the manager's hypothesis.
Edexcel S1 Q1
7 marks Moderate -0.8
  1. A shop recorded the number of pairs of gloves, \(n\), that it sold and the average daytime temperature, \(T ^ { \circ } \mathrm { C }\), for each month over a 12-month period.
The data was then summarised as follows: $$\Sigma T = 124 , \quad \Sigma n = 384 , \quad \Sigma T ^ { 2 } = 1802 , \quad \Sigma n ^ { 2 } = 18518 , \quad \Sigma T n = 2583 .$$
  1. Calculate the product moment correlation coefficient for these data.
  2. Comment on what your value shows and suggest a reason for this.
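When only raw sums are given, as in this question, the usual route is to form \(S_{xx} = \Sigma x^2 - (\Sigma x)^2 / N\), \(S_{yy}\) and \(S_{xy}\) first and then divide. The sketch below follows that route; the function name `pmcc_from_sums` and its argument names are hypothetical, and \(N = 12\) is the number of months stated in the question.

```python
import math


def pmcc_from_sums(count, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product moment correlation coefficient from summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / count
    s_yy = sum_y2 - sum_y ** 2 / count
    s_xy = sum_xy - sum_x * sum_y / count
    return s_xy / math.sqrt(s_xx * s_yy)


# x is the temperature T, y is the number of pairs of gloves n.
r = pmcc_from_sums(12, 124, 384, 1802, 18518, 2583)
print(round(r, 3))  # -0.769
```

The intermediate values here are \(S_{TT} \approx 520.67\), \(S_{nn} = 6230\) and \(S_{Tn} = -1385\), so the sign of \(r\) comes directly from the sign of \(S_{Tn}\).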
AQA S3 2008 June Q1
7 marks Moderate -0.3
1 The best performances of a random sample of 20 junior athletes in the long jump, \(x\) metres, and in the high jump, \(y\) metres, were recorded. The following statistics were calculated from the results. $$S _ { x x } = 7.0036 \quad S _ { y y } = 0.8464 \quad S _ { x y } = 1.3781$$
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
    (2 marks)
  2. Assuming that these data come from a bivariate normal distribution, investigate, at the \(1 \%\) level of significance, the claim that for junior athletes there is a positive correlation between \(x\) and \(y\).
  3. Interpret your conclusion in the context of this question.
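The decision step in part 2 amounts to comparing the sample value of \(r\) with a tabulated critical value. The sketch below shows only that comparison for \(H_0: \rho = 0\) against \(H_1: \rho > 0\). The critical value 0.5155 is an assumption here, taken to be the one-tailed 1% table entry for a sample of 20; it should be checked against the tables supplied with the paper. The function name is illustrative.

```python
import math


def reject_h0_positive(r: float, critical_value: float) -> bool:
    """One-tailed test of H0: rho = 0 against H1: rho > 0.

    H0 is rejected when the sample r exceeds the critical value.
    """
    return r > critical_value


# r from the summary values quoted in the question.
r = 1.3781 / math.sqrt(7.0036 * 0.8464)

# Assumed table value for n = 20, one-tailed 1% level (verify in your tables).
critical_value = 0.5155

reject_h0 = reject_h0_positive(r, critical_value)
print(round(r, 3), reject_h0)
```

With these inputs \(r \approx 0.566\), which exceeds the assumed critical value, so the sketch reports that \(H_0\) is rejected. The conclusion still needs to be written in context, as part 3 asks.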
AQA S3 2012 June Q1
6 marks Moderate -0.8
1 A wildlife expert measured the neck lengths, \(x\) metres, and the tail lengths, \(y\) metres, of a sample of 12 mature male giraffes as part of a study into their physical characteristics. The results are shown in the table.
Edexcel S3 Q2
6 marks Standard +0.3
2. A Geography teacher is interested in the link between mathematical ability and the ability to visualise three-dimensional situations. He gives a group of 15 students a test and records each student's score, \(m\), on the mathematics questions and each student's score, \(v\), on the visio-spatial questions. He calculates the following summary statistics: $$S _ { m m } = 3747.73 , \quad S _ { v v } = 2791.33 , \quad S _ { m v } = 2564.33$$
  1. Calculate the product moment correlation coefficient for these data.
  2. Stating your hypotheses clearly and using a \(5 \%\) level of significance, test the theory that students who are good at Mathematics tend to have better visio-spatial awareness.
    (4 marks)
OCR MEI Further Statistics A AS 2019 June Q4
8 marks Moderate -0.3
4 A student is investigating correlations between various personality traits, two of which are conscientiousness and openness to new experiences.
She selects a random sample of 10 students at her university and uses standard tests to measure their conscientiousness and their openness. The product moment correlation coefficient between these two variables for the 10 students is 0.476.
  1. Assuming that the underlying population has a bivariate Normal distribution, carry out a hypothesis test at the \(10 \%\) significance level to investigate whether there is any correlation between openness and conscientiousness in students.
Table 4.1 below shows the values of the product moment correlation coefficients between 5 different personality traits for a much larger sample of students. Those correlations that are significant at the \(5 \%\) level are denoted by a * after the value of the correlation.

|  | Neuroticism | Extroversion | Openness | Agreeableness | Conscientiousness |
| --- | --- | --- | --- | --- | --- |
| Neuroticism | 1 |  |  |  |  |
| Extroversion | -0.296* | 1 |  |  |  |
| Openness | -0.044 | 0.405* | 1 |  |  |
| Agreeableness | -0.190* | 0.061 | 0.042 | 1 |  |
| Conscientiousness | -0.485* | 0.145 | 0.235* | 0.112 | 1 |

Table 4.1

The student analyses these factors for effect size.
Guidelines often used when considering effect size are given in Table 4.2 below.

| Product moment correlation coefficient | Effect size |
| --- | --- |
| 0.1 | Small |
| 0.3 | Medium |
| 0.5 | Large |

Table 4.2
  2. The student notes that, despite the result of the test in part (a), the correlation between openness and conscientiousness is significant at the \(5 \%\) level with this second sample. Comment briefly on why this may be the case.
  3. The student intends to summarise her findings about relationships between these factors, including effect sizes, in a report.
    Use the information in Tables 4.1 and 4.2 to identify two summary points the student could make.
OCR MEI Further Statistics A AS 2023 June Q5
10 marks Standard +0.3
5 Two practice GCSE examinations in mathematics are given to all of the students in a large year group. A teacher wants to check whether there is a positive relationship between the marks obtained by the students in the two examinations. She selects a random sample of 20 students. Summary data for the marks obtained in the first and second practice examinations, \(x\) and \(y\) respectively, are as follows. $$\sum x = 565 \quad \sum y = 724 \quad \sum x ^ { 2 } = 17103 \quad \sum y ^ { 2 } = 29286 \quad \sum x y = 21635$$ The teacher decides to carry out a hypothesis test based on Pearson's product moment correlation coefficient.
  1. In this question you must show detailed reasoning. Calculate the value of Pearson's product moment correlation coefficient.
  2. Carry out the test at the \(5 \%\) significance level.
  3. Given that the teacher did not draw a scatter diagram before carrying out the test, comment on the validity of the test.
OCR MEI Further Statistics A AS 2024 June Q5
10 marks Moderate -0.3
5 A student is investigating possible association between the amount of coffee that an adult drinks each day and the number of hours that they remain awake each day. In an initial investigation, a random sample of 8 adults is selected. The student obtains the following information from each of these adults: the amount of coffee that they drink each day and the number of hours that they remain awake each day. The student analyses the data and finds that the associated product moment correlation coefficient is 0.6030.
  1. State one assumption that must be made for a hypothesis test based on the product moment correlation coefficient to be carried out. For the remainder of this question you may assume that this assumption is true.
  2. Carry out a test at the \(5 \%\) significance level to investigate whether there is any correlation between amount of coffee drunk and number of hours awake.
The student conducts a second investigation which is similar to the first but this time based on a random sample of 30 adults. The product moment correlation coefficient for the new data is 0.5487. The student carries out an equivalent hypothesis test to the one carried out in part (b), again using a \(5 \%\) significance level.
  3. Identify any differences between the two tests and their results. You do not need to restate the hypotheses or explain the conclusion in context.
  4. You may assume the following guidelines for considering effect size.

    | Product moment correlation coefficient | Effect size |
    | --- | --- |
    | 0.1 | Small |
    | 0.3 | Medium |
    | 0.5 | Large |

    Explain briefly why the results of the student's second investigation are likely to be more reliable than the results of the initial investigation.
OCR MEI Further Statistics A AS 2020 November Q2
12 marks Standard +0.3
2 A researcher is investigating the concentration of bacteria and fungi in the air in buildings. The researcher selects a random sample of 12 buildings and measures the concentrations of bacteria, \(x\), and fungi, \(y\), in the air in each building. Both concentrations are measured in the same standard units. Fig. 2 illustrates the data collected. The researcher wishes to test for a relationship between \(x\) and \(y\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{ba3fcd3c-6834-4116-be0e-d5b27aed0a7e-3_595_844_513_255} \captionsetup{labelformat=empty} \caption{Fig. 2}
\end{figure}
  1. Explain why a test based on the product moment correlation coefficient is likely to be appropriate for these data.
Summary statistics for the data are as follows. \(n = 12 \quad \sum x = 18030 \quad \sum y = 15550 \quad \sum x ^ { 2 } = 31458700 \quad \sum y ^ { 2 } = 21980500 \quad \sum x y = 25626800\)
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient between \(x\) and \(y\).
  3. Carry out a test at the \(5 \%\) significance level based on the product moment correlation coefficient to investigate whether there is any correlation between concentrations of bacteria and fungi.
  4. Explain why, in order for proper inference to be undertaken, the sample should be chosen randomly.
OCR MEI Further Statistics A AS 2021 November Q3
9 marks Standard +0.3
3 A student is investigating the link between temperature (in degrees Celsius) and electricity consumption (in Gigawatt-hours) in the country in which he lives. The student has read that there is strong negative correlation between daily mean temperature over the whole country and daily electricity consumption during a year. He wonders if this applies to an individual season. He therefore obtains data on the mean temperature and electricity consumption on ten randomly selected days in the summer. The spreadsheet output below shows the data, together with a scatter diagram to illustrate the data. \includegraphics[max width=\textwidth, alt={}, center]{5be067ff-4668-48d6-8ed2-b8dfa3e678f7-3_798_1593_639_251}
  1. Calculate Pearson's product moment correlation coefficient between daily mean temperature and daily electricity consumption.
The student decides to carry out a hypothesis test to investigate whether there is negative correlation between daily mean temperature and daily electricity consumption during the summer.
  2. Explain why the student decides to carry out a test based on Pearson's product moment correlation coefficient.
  3. Show that the test at the \(5 \%\) significance level does not result in the null hypothesis being rejected.
  4. The student concludes that there is no correlation between the variables in the summer months. Comment on the student's conclusion.
OCR MEI Further Statistics Minor 2019 June Q5
16 marks Standard +0.3
5 A student wants to know if there is a positive correlation between the amounts of two pollutants, sulphur dioxide and PM10 particulates, on different days in the area of London in which he lives; these amounts, measured in suitable units, are denoted by \(s\) and \(p\) respectively.
He uses a government website to obtain data for a random sample of 15 days on which the amounts of these pollutants were measured simultaneously. Fig. 5.1 is a scatter diagram showing the data. Summary statistics for these 15 values of \(s\) and \(p\) are as follows. \(\sum s = 155.4 \quad \sum p = 518.9 \quad \sum s ^ { 2 } = 2322.7 \quad \sum p ^ { 2 } = 21270.5 \quad \sum s p = 6009.1\) \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{4a4d5816-5b53-49a1-b72f-f8bcf3b4e8bc-4_935_1134_683_260} \captionsetup{labelformat=empty} \caption{Fig. 5.1}
\end{figure}
  1. Explain why the student might come to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.
  2. Find the value of Pearson's product moment correlation coefficient.
  3. Carry out a test at the \(5 \%\) significance level to investigate whether there is positive correlation between the amounts of sulphur dioxide and PM10 particulates.
  4. Explain why the student made sure that the sample chosen was a random sample.
The student also wishes to model the relationship between the amounts of nitrogen dioxide, \(n\), and PM10 particulates, \(p\). He takes a random sample of 54 values of the two variables, both measured at the same times. Fig. 5.2 is a scatter diagram which shows the data, together with the regression line of \(n\) on \(p\), the equation of the regression line and the value of \(r ^ { 2 }\). \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{4a4d5816-5b53-49a1-b72f-f8bcf3b4e8bc-5_824_1230_495_258} \captionsetup{labelformat=empty} \caption{Fig. 5.2}
    \end{figure}
  5. Predict the value of \(n\) for \(p = 150\).
  6. Discuss the reliability of your prediction in part (e).
OCR MEI Further Statistics Minor 2024 June Q3
13 marks Standard +0.3
3 The scatter diagram below illustrates data concerning average annual income per person, \(\$ x\), and average life expectancy, \(y\) years, for 45 randomly selected cities. \includegraphics[max width=\textwidth, alt={}, center]{464c80be-007b-4d5a-9fe5-2f35100bdea6-3_860_1465_354_244}
  1. State whether neither variable, one variable or both variables can be considered to be random in this situation.
A student is researching possible positive association between average annual income and average life expectancy. The student decides that the data point labelled A on the scatter diagram is an outlier.
  2. Describe the apparent relationship between average annual income and average life expectancy for this data point relative to the rest of the data.
The data for point A is removed. The student now wishes to carry out a hypothesis test using the product moment correlation coefficient for the remaining 44 data points to investigate whether there is positive correlation between average annual income and average life expectancy.
  3. Explain why this type of hypothesis test is appropriate in this situation. Justify your answer.
The summary statistics for these 44 data points are as follows. \(\sum x = 751120 \quad \sum y = 2397.1 \quad \sum x ^ { 2 } = 14363849200 \quad \sum y ^ { 2 } = 133014.63 \quad \sum x y = 42465962\)
  4. Determine the value of the product moment correlation coefficient.
  5. Carry out the test at the 1\% significance level.
OCR MEI Further Statistics Minor 2021 November Q4
14 marks Standard +0.3
4 A scientist is investigating sea salinity (the level of salt in the sea) in a particular area. She wishes to check whether satellite measurements, \(y\), of salinity are similar to those directly measured, \(x\). Both variables are measured in parts per thousand in suitable units. The scientist obtains a random sample of 10 values of \(x\) and the related values of \(y\). Below is a screenshot of a scatter diagram to illustrate the data. She decides to carry out a hypothesis test to check if there is any correlation between direct measurement, \(x\), and satellite measurement, \(y\). \includegraphics[max width=\textwidth, alt={}, center]{691e8b55-e9a1-4fff-b9ee-a71ff1f73ead-5_830_837_589_246}
  1. Explain why the scientist might decide to carry out a test based on the product moment correlation coefficient.
Summary statistics for \(x\) and \(y\) are as follows. \(n = 10 \quad \sum x = 351.9 \quad \sum y = 350.0 \quad \sum x ^ { 2 } = 12384.5 \quad \sum y ^ { 2 } = 12251.2 \quad \sum x y = 12317.2\)
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient.
  3. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is positive correlation between directly measured and satellite measured salinity levels.
  4. Explain why it would be preferable to use a larger sample.
The scientist is also interested in whether there is any correlation between salinity and numbers of a particular species of shrimp in the water. She takes a large sample and finds that the product moment correlation coefficient for this sample is 0.165. The result of a test based on this sample is to reject the null hypothesis and conclude that there is correlation between salinity and numbers of shrimp.
  5. Comment on the outcome of the hypothesis test with reference to the effect size of 0.165.
OCR MEI Further Statistics Major 2023 June Q6
12 marks Standard +0.3
6 A student wonders if there is any correlation between download and upload speeds of data to and from the internet. The student decides to carry out a hypothesis test to investigate this and so measures the download speed \(x\) and upload speed \(y\) in suitable units on 20 randomly chosen occasions. The scatter diagram below illustrates the data which the student collected. \includegraphics[max width=\textwidth, alt={}, center]{c692fb20-436f-4bc1-89bd-10fdba41ceba-07_824_1411_440_246}
  1. Explain why the student decides to carry out a test based on the product moment correlation coefficient.
Summary statistics for the 20 occasions are as follows. $$\sum x = 342.10 \quad \sum y = 273.65 \quad \sum x ^ { 2 } = 5989.53 \quad \sum y ^ { 2 } = 3919.53 \quad \sum x y = 4713.62$$
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient.
  3. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is any correlation between download speed and upload speed.
  4. Both of the variables, download speed and upload speed, are random. Explain why, if download speed had been a non-random variable, the student could not have carried out the hypothesis test to investigate whether there was any correlation between download speed and upload speed.