| Exam Board | OCR MEI |
|---|---|
| Module | S2 (Statistics 2) |
| Year | 2006 |
| Session | January |
| Marks | 18 |
| Topic | Hypothesis test of Spearman’s rank correlation coefficient |
| Type | Hypothesis test for positive correlation |
| Difficulty | Standard +0.3. This is a standard textbook-style hypothesis test for Spearman's rank correlation with straightforward calculations (ranking the data, computing \(r_s\), comparing with a critical value). Parts (iii)-(v) require recall of standard assumptions and interpretations. The multi-part structure and routine nature place it slightly easier than average for A-level Further Maths statistics. |
| Spec | 5.08e Spearman rank correlation; 5.08f Hypothesis test: Spearman rank |
# Question 3:
## Part (i):
| Answer | Mark | Guidance |
|--------|------|----------|
| Rank table computed with $d$ values: $-1, 1, -1, -1, -3, 1, 0, -3, 6, 1$ | M1 | for ranking (allow all ranks reversed) |
| $d^2$ values: $1, 1, 1, 1, 9, 1, 0, 9, 36, 1$ | M1 | for $d^2$ |
| $\Sigma d^2 = 60$ | A1 | CAO for $\Sigma d^2$ |
| $r_s = 1 - \frac{6\Sigma d^2}{n(n^2-1)} = 1 - \frac{6 \times 60}{10 \times 99}$ | M1 | for structure of $r_s$ using their $\Sigma d^2$ |
| $= 0.636$ (to 3 s.f.) [allow 0.64 to 2 s.f.] | A1 ft | for $|r_s| < 1$; NB No ranking scores zero **5 marks** |
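
The ranking and arithmetic in part (i) can be checked with a short Python sketch. It assumes there are no tied values, which holds for these data, so no averaged ranks are needed. The helper `ranks` is hypothetical and written only for this illustration. It ranks in ascending order, which gives the same $d^2$ values as ranking both variables in descending order.

```python
def ranks(values):
    """Rank values 1..n in ascending order (assumes no ties, as in this data)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

x = [13.3, 17.2, 16.9, 18.7, 18.4, 19.3, 23.1, 15.0, 20.6, 14.4]
y = [9, 11, 14, 26, 43, 25, 52, 15, 10, 7]

rx, ry = ranks(x), ranks(y)
d = [a - b for a, b in zip(rx, ry)]      # rank differences
sum_d2 = sum(di * di for di in d)        # sum of squared differences
n = len(x)
rs = 1 - 6 * sum_d2 / (n * (n * n - 1))  # Spearman's r_s (no-ties formula)

print(d)             # [-1, 1, -1, -1, -3, 1, 0, -3, 6, 1]
print(sum_d2)        # 60
print(round(rs, 3))  # 0.636
```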
## Part (ii):
| Answer | Mark | Guidance |
|--------|------|----------|
| $H_0$: no association between $x$ and $y$ | B1 | for $H_0$ |
| $H_1$: positive association between $x$ and $y$ | B1 | for $H_1$; NB $H_0$, $H_1$ not in terms of $\rho$ |
| Looking for positive association (one-tail test): | | |
| Critical value at 5% level is 0.5636 | B1 | for $\pm 0.5636$ (FT their $H_1$) |
| Since $0.636 > 0.5636$, there is sufficient evidence to reject $H_0$ | M1 | for comparison with c.v., provided $|r_s| < 1$ |
| i.e. conclude that there appears to be positive association between temperature and nitrous oxide level. | A1 | for conclusion in words; f.t. their $r_s$ and sensible c.v. **5 marks** |
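
The decision rule in part (ii) can be sketched as follows. The one-tailed 5% critical value for $n = 10$ (0.5636) is taken from the mark scheme above, not computed. The function names are hypothetical, and the sketch treats values of $r_s$ at or above the critical value as lying in the critical region.

```python
def spearman_rs(sum_d2: int, n: int) -> float:
    """Spearman's coefficient from the sum of squared rank differences."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def reject_h0_positive(rs: float, critical_value: float) -> bool:
    """One-tailed test: reject H0 (no association) in favour of positive
    association when rs reaches the tabulated critical value."""
    return rs >= critical_value

# Tabulated one-tailed 5% critical value for n = 10 (quoted, not derived).
CRITICAL_5PC_ONE_TAIL_N10 = 0.5636

rs = spearman_rs(sum_d2=60, n=10)
print(f"r_s = {rs:.3f}, reject H0: {reject_h0_positive(rs, CRITICAL_5PC_ONE_TAIL_N10)}")
# prints "r_s = 0.636, reject H0: True"
```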
## Part (iii):
| Answer | Mark | Guidance |
|--------|------|----------|
| Underlying distribution must be bivariate normal. | B1 | CAO for bivariate normal |
| If the distribution is bivariate normal then the scatter diagram will have an elliptical shape. | B1 | indep for elliptical shape |
| This scatter diagram is not elliptical and so a PMCC test would not be valid. (Allow comment indicating that the sample is too small to draw a firm conclusion on ellipticity and so on validity) | E1 dep | for conclusion **3 marks** |
## Part (iv):
| Answer | Mark | Guidance |
|--------|------|----------|
| $n = 60$, PMCC critical value is $r = 0.2997$ | B1 | |
| So the critical region is $r \geq 0.2997$ | B1 FT | their sensible c.v. **2 marks** |
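
As a cross-check, here is a sketch that is not part of the mark scheme. The tabulated value is consistent with the standard link between the PMCC and Student's $t$ distribution. Under bivariate normality and $H_0: \rho = 0$, the statistic $t = r\sqrt{\dfrac{n-2}{1-r^2}}$ has a $t$ distribution with $n-2$ degrees of freedom. Solving for $r$ gives the critical value. The quantile $t_{58}(0.99) \approx 2.3924$ is quoted from standard tables rather than derived here.

$$
\begin{aligned}
r_{\text{crit}} &= \frac{t_{n-2}}{\sqrt{t_{n-2}^2 + n - 2}}, \qquad n = 60,\quad t_{58}(0.99) \approx 2.3924 \\
r_{\text{crit}} &\approx \frac{2.3924}{\sqrt{2.3924^2 + 58}} = \frac{2.3924}{\sqrt{63.724}} \approx 0.2997
\end{aligned}
$$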
## Part (v):
| Answer | Mark | Guidance |
|--------|------|----------|
| Any three of the following: | | |
| • Correlation does not imply causation | E1 | |
| • There could be a third factor (causing the correlation between temperature and ozone level) | E1 | |
| • the claim could be true | E1 | |
| • increased ozone could cause higher temperatures | | **3 marks** |
---
3 A researcher is investigating the relationship between temperature and levels of the air pollutant nitrous oxide at a particular site. The researcher believes that there will be a positive correlation between the daily maximum temperature, $x$, and nitrous oxide level, $y$. Data are collected for 10 randomly selected days. The data, measured in suitable units, are given in the table and illustrated on the scatter diagram.
\begin{center}
\begin{tabular}{ | c | c | c | c | c | c | c | c | c | c | c | }
\hline
$x$ & 13.3 & 17.2 & 16.9 & 18.7 & 18.4 & 19.3 & 23.1 & 15.0 & 20.6 & 14.4 \\
\hline
$y$ & 9 & 11 & 14 & 26 & 43 & 25 & 52 & 15 & 10 & 7 \\
\hline
\end{tabular}
\end{center}
\begin{tikzpicture}
\begin{axis}[
width=12cm, height=9cm,
axis lines=left,
xmin=8.5, xmax=27,
ymin=0, ymax=65,
% Tick positions on each axis
xtick={10, 15, 20, 25},
ytick={0, 10, 20, 30, 40, 50, 60},
% Suppress default axis labels; they are placed manually with \node below
xlabel={}, ylabel={},
axis line style={-latex},
tick label style={font=\small},
clip=false
]
% Scatter plot of the data points from the table
\addplot[only marks, mark=x, mark size=3pt, thick] coordinates {
(13.3, 9) (17.2, 11) (16.9, 14) (18.7, 26) (18.4, 43)
(19.3, 25) (23.1, 52) (15.0, 15) (20.6, 10) (14.4, 7)
};
% Manually placed axis labels
\node[anchor=east, align=right, font=\small] at (axis cs:8.3, 60) {Nitrous oxide \\ level};
\node[anchor=west, font=\small] at (axis cs:8.5, 61.5) {$y$};
\node[anchor=south east, font=\small] at (axis cs:26.5, 0.5) {$x$};
\node[anchor=north, font=\small] at (axis cs:23.5, -6) {Temperature};
% Creating the axis break (zigzag) on the x-axis
\draw[white, line width=3pt] (axis cs:9.2, 0) -- (axis cs:9.8, 0);
\draw[thick] (axis cs:9.0, -1.8) -- (axis cs:9.4, 1.8) -- (axis cs:9.7, -1.8) -- (axis cs:10.1, 1.8);
\end{axis}
\end{tikzpicture}\\
(i) Calculate the value of Spearman's rank correlation coefficient for these data.\\
(ii) Perform a hypothesis test at the $5 \%$ level to check the researcher's belief, stating your hypotheses clearly.\\
(iii) It is suggested that it would be preferable to carry out a test based on the product moment correlation coefficient. State the distributional assumption required for such a test to be valid. Explain how a scatter diagram can be used to check whether the distributional assumption is likely to be valid and comment on the validity in this case.\\
(iv) A statistician investigates data over a much longer period and finds that the assumptions for the use of the product moment correlation coefficient are in fact valid. Give the critical region for the test at the $1 \%$ level, based on a sample of 60 days.\\
(v) In a different research project, into the correlation between daily temperature and ozone pollution levels, a positive correlation is found. It is argued that this shows that high temperatures cause increased ozone levels. Comment on this claim.
\hfill \mbox{\textit{OCR MEI S2 2006 Q3 [18]}}