Question 1 - A-Level Maths

OCR MEI S2 2016 June — Question 1 18 marks

Exam Board	OCR MEI
Module	S2 (Statistics 2)
Year	2016
Session	June
Marks	18
Topic	Hypothesis test of Spearman’s rank correlation coefficient
Type	Hypothesis test for negative correlation
Difficulty	Standard +0.3. This is a straightforward application of Spearman's rank correlation test following standard procedures. It requires several steps (ranking the data, calculating \(r_s\), hypothesis testing), but each step is routine and well practised. The conceptual questions (parts ii, v, vi) test standard bookwork. The calculation in part (iii) is mechanical but time-consuming with 11 data points. Overall, this is slightly easier than average: a textbook-style question with no novel problem-solving required.
Spec	2.02c Scatter diagrams and regression lines; 5.08e Spearman rank correlation; 5.08f Hypothesis test: Spearman rank

1 A researcher believes that there may be negative association between the quantity of fertiliser used and the percentage of the population who live in rural areas in different countries. The data below show the percentage of the population who live in rural areas and the fertiliser use measured in kg per hectare, for a random sample of 11 countries.

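| Percentage of population | 33 | 6 | 58 | 35 | 81 | 69 | 61 | 7 | 74 | 71 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Fertiliser use (kg per hectare) | 76 | 44 | 6 | 68 | 3 | 10 | 7 | 176 | 5 | 137 | 157 |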

(i) Draw a scatter diagram to illustrate the data.
(ii) Explain why it might not be valid to carry out a test based on the product moment correlation coefficient in this case.
(iii) Calculate the value of Spearman's rank correlation coefficient.
(iv) Carry out a hypothesis test at the $1\%$ significance level to investigate the researcher's belief.
(v) Explain the meaning of '$1\%$ significance level'.
(vi) In order to carry out a test based on Spearman's rank correlation coefficient, what modelling assumptions, if any, are required about the underlying distribution?

# Question 1:

## Part (i)
| Answer | Marks | Guidance |
|--------|-------|----------|
| Suitably labelled scatter diagram with 11 points plotted | G1 | For suitably labelled axes. Condone absence of scale here. |
| 11 points correctly plotted relative to a suitable linear scale | G2,1,0 | G1 if 9 or 10 correctly plotted. G0 if 3 or more incorrectly plotted/omitted. Allow axes interchanged. |
| **[3]** | | |

## Part (ii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| Points do not appear to be roughly elliptical | E1 | For "not elliptical" |
| The population may not have a bivariate Normal distribution | E1 | For not **underlying** bivariate Normal. Do not allow "the data" in place of population/underlying. Allow "data is not from a bivariate Normal distribution". Do not allow "Normal bivariate…" |
| **[2]** | | |

## Part (iii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| Rankings calculated correctly | M1 | For ranking (allow ranks reversed). **NB No ranking scores 0/5** |
| $d^2$ values: 16, 25, 9, 4, 100, 9, 9, 81, 64, 0, 49 | M1 | For $d^2$ |
| $\Sigma d^2 = 366$ | A1 | For $\Sigma d^2$ (may be embedded in calculation) |
| $r_s = 1 - \dfrac{6\Sigma d^2}{n(n^2-1)} = 1 - \dfrac{6 \times 366}{11 \times 120} = 1 - \dfrac{2196}{1320} = 1 - 1.6636$ | M1 | For method for $r_s$ |
| $= -0.664$ (to 3 s.f.) [allow $-0.66$ to 2 s.f. or $-73/110$] | A1 | FT their $\Sigma d^2$ provided $-1 < r_s < 0$, and ranking used. **NB No ranking scores 0/5** |
| **[5]** | | |
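
The ranking and $\Sigma d^2$ steps in Part (iii) can be checked with a short script. This is a minimal sketch rather than part of the mark scheme: the helper `ranks` and the variable names are illustrative, and the ranking assumes there are no tied values, which holds for this data set.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Data from the question
rural = [33, 6, 58, 35, 81, 69, 61, 7, 74, 71, 17]
fertiliser = [76, 44, 6, 68, 3, 10, 7, 176, 5, 137, 157]

rank_rural = ranks(rural)
rank_fert = ranks(fertiliser)

# Squared rank differences, in the order the countries are listed
d_squared = [(a - b) ** 2 for a, b in zip(rank_rural, rank_fert)]
sum_d2 = sum(d_squared)

n = len(rural)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(d_squared)       # [16, 25, 9, 4, 100, 9, 9, 81, 64, 0, 49]
print(sum_d2)          # 366
print(round(r_s, 3))   # -0.664
```

Ranking both variables in the reverse direction (1 = largest) leaves every $d^2$ unchanged, which is why the mark scheme allows reversed ranks.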

## Part (iv)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $H_0$: no association between percentage of population living in rural areas and fertiliser use (in the population of countries) | B1 | For null hypothesis in context. **NB** $H_0$, $H_1$ not in terms of $\rho$ |
| $H_1$: **negative** association between percentage of population living in rural areas and fertiliser use (in the population of countries) | B1 | For alternative hypothesis in context. Context needed in at least one hypothesis. |
| | B1 | For **population of countries** or **underlying population** |
| One tail test critical value at 1% level is $-0.7091$ | B1 | For $\pm 0.7091$. **No further marks from here if incorrect** |
| Since $-0.664 > -0.7091$ [or $0.664 < 0.7091$] there is… | M1 | For sensible comparison of "$-0.664$" with $\pm 0.7091$ seen, leading to conclusion, only if $-1 <$ their $r_s < 0$ |
| …insufficient evidence to reject $H_0$. There is insufficient evidence to suggest that there is **negative** association between percentage of population living in rural areas and fertiliser use (in the population of countries) | A1 | For not significant, oe, and correct conclusion in context. FT their $r_s$ with correct cv. |
| **[6]** | | |
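
The decision step in Part (iv) amounts to a single lower-tail comparison. The sketch below is illustrative only: the function name is hypothetical, the critical value $-0.7091$ is the one quoted in the mark scheme (read from tables, not computed here), and $r_s$ is entered as the exact fraction $-73/110$ from Part (iii).

```python
def reject_h0_negative_association(r_s, critical_value):
    """One-tailed (lower-tail) test: reject H0 only when r_s is at or
    below the negative critical value."""
    return r_s <= critical_value

r_s = -73 / 110            # Spearman's coefficient from Part (iii), about -0.664
critical_value = -0.7091   # 1% one-tailed critical value for n = 11, from tables

reject = reject_h0_negative_association(r_s, critical_value)
print("Reject H0" if reject else "Insufficient evidence to reject H0")
```

Because $-0.664$ lies above $-0.7091$, the sample value is not in the critical region, matching the mark scheme's conclusion.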

## Part (v)
| Answer | Marks | Guidance |
|--------|-------|----------|
| It means that the probability of rejecting $H_0$ given that it is correct is 1% o.e. | E1 | Allow "the probability of a false positive is 1%", "the probability of a Type I Error is 1%". Do not allow "It means that the probability rejecting $H_0$ when it should have been accepted is 1%" |
| **[1]** | | |
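
One way to see what the 1% figure means is to simulate the test when $H_0$ is true. The sketch below assumes that, under no association, every pairing of the 11 ranks is equally likely, and estimates how often the one-tailed test would wrongly reject $H_0$. The function names, trial count and seed are illustrative choices; the critical value $-0.7091$ is the one quoted in Part (iv). Each simulated $r_s$ is rounded to 4 d.p. before the comparison because the tabulated critical value is itself rounded.

```python
import random

def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient from two lists of untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def estimate_type_i_rate(n=11, critical_value=-0.7091, trials=20000, seed=0):
    """Estimate P(reject H0 | H0 true) by pairing ranks at random."""
    rng = random.Random(seed)
    base = list(range(1, n + 1))
    rejections = 0
    for _ in range(trials):
        shuffled = base[:]
        rng.shuffle(shuffled)  # a random pairing models "no association"
        r_s = spearman_from_ranks(base, shuffled)
        if round(r_s, 4) <= critical_value:
            rejections += 1
    return rejections / trials

rate = estimate_type_i_rate()
print(f"Estimated P(reject H0 | H0 true) = {rate:.4f}")
```

The estimate should come out in the region of 0.01. It need not equal 0.01 exactly, both because of simulation noise and because $r_s$ takes only discrete values.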

## Part (vi)
| Answer | Marks | Guidance |
|--------|-------|----------|
| None | E1 | |
| **[1]** | | |

