OCR MEI S2 2016 June — Question 1 18 marks

Exam Board: OCR MEI
Module: S2 (Statistics 2)
Year: 2016
Session: June
Marks: 18
Topic: Hypothesis test of Spearman’s rank correlation coefficient
Type: Hypothesis test for negative correlation
Difficulty: Standard +0.3. This is a straightforward application of Spearman's rank correlation test following standard procedures. It requires several steps (ranking the data, calculating \(r_s\), carrying out the hypothesis test), but each is routine and well practised. Parts (ii), (v) and (vi) test standard bookwork. The calculation in part (iii) is mechanical but time-consuming with 11 data points. Overall, this is slightly easier than average: a textbook-style question with no novel problem-solving required.
Spec: 2.02c Scatter diagrams and regression lines; 5.08e Spearman rank correlation; 5.08f Hypothesis test: Spearman rank

1 A researcher believes that there may be negative association between the quantity of fertiliser used and the percentage of the population who live in rural areas in different countries. The data below show the percentage of the population who live in rural areas and the fertiliser use measured in kg per hectare, for a random sample of 11 countries.
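
| Percentage of population | 33 | 6 | 58 | 35 | 81 | 69 | 61 | 7 | 74 | 71 | 17 |
|--------------------------|----|----|----|----|----|----|----|-----|----|-----|-----|
| Fertiliser use | 76 | 44 | 6 | 68 | 3 | 10 | 7 | 176 | 5 | 137 | 157 |
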
  1. Draw a scatter diagram to illustrate the data.
  2. Explain why it might not be valid to carry out a test based on the product moment correlation coefficient in this case.
  3. Calculate the value of Spearman's rank correlation coefficient.
  4. Carry out a hypothesis test at the \(1 \%\) significance level to investigate the researcher's belief.
  5. Explain the meaning of ' \(1 \%\) significance level'.
  6. In order to carry out a test based on Spearman's rank correlation coefficient, what modelling assumptions, if any, are required about the underlying distribution?

# Question 1:

## Part (i)
| Answer | Marks | Guidance |
|--------|-------|----------|
| Suitably labelled scatter diagram with 11 points plotted | G1 | For suitably labelled axes. Condone absence of scale here. |
| 11 points correctly plotted relative to a suitable linear scale | G2,1,0 | G1 if 9 or 10 correctly plotted. G0 if 3 or more incorrectly plotted/omitted. Allow axes interchanged. |
| **[3]** | | |

## Part (ii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| Points do not appear to be roughly elliptical | E1 | For "not elliptical" |
| The population may not have a bivariate Normal distribution | E1 | For not **underlying** bivariate Normal. Do not allow "the data" in place of population/underlying. Allow "data is not from a bivariate Normal distribution". Do not allow "Normal bivariate…" |
| **[2]** | | |

## Part (iii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| Rankings calculated correctly | M1 | For ranking (allow ranks reversed). **NB No ranking scores 0/5** |
| $d^2$ values: 16, 25, 9, 4, 100, 9, 9, 81, 64, 0, 49 | M1 | For $d^2$ |
| $\Sigma d^2 = 366$ | A1 | For $\Sigma d^2$ (may be embedded in calculation) |
| $r_s = 1 - \dfrac{6\Sigma d^2}{n(n^2-1)} = 1 - \dfrac{6 \times 366}{11 \times 120} = 1 - \dfrac{2196}{1320} = 1 - 1.6636$ | M1 | For method for $r_s$ |
| $= -0.664$ (to 3 s.f.) [allow $-0.66$ to 2 s.f. or $-73/110$] | A1 | FT their $\Sigma d^2$ provided $-1 < r_s < 0$, and ranking used. **NB No ranking scores 0/5** |
| **[5]** | | |
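
The ranking and $\Sigma d^2$ steps above can be checked with a short script. This is a minimal sketch rather than a prescribed method: the helper names `ranks` and `spearman` are illustrative, ranks run from 1 for the smallest value, and the ranking helper assumes no tied values (true for this data set).

```python
# Data from the question: one value per country, in the order given.
rural_pct = [33, 6, 58, 35, 81, 69, 61, 7, 74, 71, 17]
fertiliser = [76, 44, 6, 68, 3, 10, 7, 176, 5, 137, 157]


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Return (d_squared, r_s) using r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = [(a - b) ** 2 for a, b in zip(rank_x, rank_y)]
    n = len(x)
    r_s = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))
    return d_squared, r_s


d_squared, r_s = spearman(rural_pct, fertiliser)
print(d_squared)       # [16, 25, 9, 4, 100, 9, 9, 81, 64, 0, 49]
print(sum(d_squared))  # 366
print(round(r_s, 3))   # -0.664
```

Reversing both rankings (largest value ranked 1) leaves every $d^2$ unchanged, which is why the mark scheme allows ranks reversed.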

## Part (iv)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $H_0$: no association between percentage of population living in rural areas and fertiliser use (in the population of countries) | B1 | For null hypothesis in context. **NB** $H_0$, $H_1$ not in terms of $\rho$ |
| $H_1$: **negative** association between percentage of population living in rural areas and fertiliser use (in the population of countries) | B1 | For alternative hypothesis in context. Context needed in at least one hypothesis. |
| | B1 | For **population of countries** or **underlying population** |
| One tail test critical value at 1% level is $-0.7091$ | B1 | For $\pm 0.7091$. **No further marks from here if incorrect** |
| Since $-0.664 > -0.7091$ [or $0.664 < 0.7091$] there is… | M1 | For sensible comparison of "$-0.664$" with $\pm 0.7091$ seen, leading to conclusion, only if $-1 <$ their $r_s < 0$ |
| …insufficient evidence to reject $H_0$. There is insufficient evidence to suggest that there is **negative** association between percentage of population living in rural areas and fertiliser use (in the population of countries) | A1 | For not significant, oe, and correct conclusion in context. FT their $r_s$ with correct cv. |
| **[6]** | | |
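
The comparison step of the test can be sketched as below. The critical value 0.7091 and $\Sigma d^2 = 366$ are taken from the mark scheme; the function name `reject_h0_lower_tail` is illustrative, and rounding $r_s$ to 4 d.p. before comparing is an assumption made here so that the statistic is compared at the same precision as the tabulated value.

```python
def reject_h0_lower_tail(r_s, critical_value):
    """One-tailed test for negative association.

    Reject H0 only if r_s is at or below -critical_value.
    r_s is rounded to 4 d.p. to match the precision of the tabulated value.
    """
    return round(r_s, 4) <= -critical_value


n = 11
sum_d_squared = 366      # from part (iii)
CRITICAL_VALUE = 0.7091  # one-tailed, 1% level, n = 11 (from the mark scheme)

r_s = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))
reject = reject_h0_lower_tail(r_s, CRITICAL_VALUE)

print(round(r_s, 4), reject)  # -0.6636 False
```

`False` corresponds to the mark scheme's conclusion: there is insufficient evidence to reject $H_0$.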

## Part (v)
| Answer | Marks | Guidance |
|--------|-------|----------|
| It means that the probability of rejecting $H_0$ given that it is correct is 1% o.e. | E1 | Allow "the probability of a false positive is 1%", "the probability of a Type I Error is 1%". Do not allow "It means that the probability rejecting $H_0$ when it should have been accepted is 1%" |
| **[1]** | | |
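
The meaning of the significance level can be illustrated by simulation. This is a rough sketch under a stated assumption: when $H_0$ is true, every pairing of the two sets of ranks is taken to be equally likely. The function name, the number of trials and the seed are arbitrary choices for illustration. The script estimates how often the one-tailed test with critical value $-0.7091$ would wrongly reject $H_0$ for $n = 11$.

```python
import random


def simulated_type_i_rate(n=11, critical_value=0.7091, trials=20000, seed=0):
    """Estimate P(reject H0 | H0 true) for the lower-tail Spearman test."""
    rng = random.Random(seed)
    base = list(range(1, n + 1))
    shuffled = base[:]
    rejections = 0
    for _ in range(trials):
        # Assumption: under H0 every pairing of ranks is equally likely.
        rng.shuffle(shuffled)
        sum_d_squared = sum((a - b) ** 2 for a, b in zip(base, shuffled))
        r_s = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))
        # Round to 4 d.p. to match the precision of the tabulated value.
        if round(r_s, 4) <= -critical_value:
            rejections += 1
    return rejections / trials


rate = simulated_type_i_rate()
print(rate)
```

The printed proportion should come out close to 1%. Because $r_s$ can only take a limited set of values when $n = 11$, the true rejection probability need not equal 1% exactly.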

## Part (vi)
| Answer | Marks | Guidance |
|--------|-------|----------|
| None | E1 | |
| **[1]** | | |

---
1 A researcher believes that there may be negative association between the quantity of fertiliser used and the percentage of the population who live in rural areas in different countries. The data below show the percentage of the population who live in rural areas and the fertiliser use measured in kg per hectare, for a random sample of 11 countries.

\begin{center}
\begin{tabular}{ | l | c | c | c | c | c | c | c | c | c | c | c | }
\hline
Percentage of population & 33 & 6 & 58 & 35 & 81 & 69 & 61 & 7 & 74 & 71 & 17 \\
\hline
Fertiliser use & 76 & 44 & 6 & 68 & 3 & 10 & 7 & 176 & 5 & 137 & 157 \\
\hline
\end{tabular}
\end{center}

(i) Draw a scatter diagram to illustrate the data.\\
(ii) Explain why it might not be valid to carry out a test based on the product moment correlation coefficient in this case.\\
(iii) Calculate the value of Spearman's rank correlation coefficient.\\
(iv) Carry out a hypothesis test at the $1 \%$ significance level to investigate the researcher's belief.\\
(v) Explain the meaning of ' $1 \%$ significance level'.\\
(vi) In order to carry out a test based on Spearman's rank correlation coefficient, what modelling assumptions, if any, are required about the underlying distribution?

\hfill \mbox{\textit{OCR MEI S2 2016 Q1 [18]}}