OCR MEI S2 2006 January — Question 3 18 marks

Exam Board: OCR MEI
Module: S2 (Statistics 2)
Year: 2006
Session: January
Marks: 18
Topic: Hypothesis test of Spearman's rank correlation coefficient
Type: Hypothesis test for positive correlation
Difficulty: Standard +0.3. This is a standard textbook-style hypothesis test for Spearman's rank correlation with straightforward calculations (ranking data, computing \(r_s\), comparing to a critical value). Parts (iii)-(v) require recall of standard assumptions and interpretations. The multi-part structure and routine nature place it slightly easier than average for A-level Further Maths statistics.
Spec: 5.08e Spearman rank correlation; 5.08f Hypothesis test: Spearman rank

3 A researcher is investigating the relationship between temperature and levels of the air pollutant nitrous oxide at a particular site. The researcher believes that there will be a positive correlation between the daily maximum temperature, \(x\), and nitrous oxide level, \(y\). Data are collected for 10 randomly selected days. The data, measured in suitable units, are given in the table and illustrated on the scatter diagram.
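
| \(x\) | 13.3 | 17.2 | 16.9 | 18.7 | 18.4 | 19.3 | 23.1 | 15.0 | 20.6 | 14.4 |
|-------|------|------|------|------|------|------|------|------|------|------|
| \(y\) | 9 | 11 | 14 | 26 | 43 | 25 | 52 | 15 | 10 | 7 |
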
  1. Calculate the value of Spearman's rank correlation coefficient for these data.
  2. Perform a hypothesis test at the \(5 \%\) level to check the researcher's belief, stating your hypotheses clearly.
  3. It is suggested that it would be preferable to carry out a test based on the product moment correlation coefficient. State the distributional assumption required for such a test to be valid. Explain how a scatter diagram can be used to check whether the distributional assumption is likely to be valid and comment on the validity in this case.
  4. A statistician investigates data over a much longer period and finds that the assumptions for the use of the product moment correlation coefficient are in fact valid. Give the critical region for the test at the \(1 \%\) level, based on a sample of 60 days.
  5. In a different research project, into the correlation between daily temperature and ozone pollution levels, a positive correlation is found. It is argued that this shows that high temperatures cause increased ozone levels. Comment on this claim.

# Question 3:

## Part (i):
| Answer | Mark | Guidance |
|--------|------|----------|
| Rank table computed with $d$ values: $-1, 1, -1, -1, -3, 1, 0, -3, 6, 1$ | M1 | for ranking (allow all ranks reversed) |
| $d^2$ values: $1, 1, 1, 1, 9, 1, 0, 9, 36, 1$ | M1 | for $d^2$ |
| $\Sigma d^2 = 60$ | A1 | CAO for $\Sigma d^2$ |
| $r_s = 1 - \frac{6\Sigma d^2}{n(n^2-1)} = 1 - \frac{6 \times 60}{10 \times 99}$ | M1 | for structure of $r_s$ using their $\Sigma d^2$ |
| $= 0.636$ (to 3 s.f.) [allow 0.64 to 2 s.f.] | A1 ft | for $\lvert r_s \rvert < 1$; NB No ranking scores zero **5 marks** |
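
The ranking and $\Sigma d^2$ arithmetic above can be cross-checked with a short script. This is a minimal sketch rather than part of the mark scheme: the helper name `ranks` is hypothetical, and it assumes there are no tied values (true for this data set), so no tie correction is applied.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties, as in this data set."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


# Data from the question: daily maximum temperature x and nitrous oxide level y
x = [13.3, 17.2, 16.9, 18.7, 18.4, 19.3, 23.1, 15.0, 20.6, 14.4]
y = [9, 11, 14, 26, 43, 25, 52, 15, 10, 7]

# Differences between the rank of x and the rank of y for each day
d = [rx - ry for rx, ry in zip(ranks(x), ranks(y))]
sum_d2 = sum(di ** 2 for di in d)

# Spearman's formula: r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1))
n = len(x)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(d)             # [-1, 1, -1, -1, -3, 1, 0, -3, 6, 1]
print(sum_d2)        # 60
print(round(r_s, 3)) # 0.636
```

Reversing both rankings (largest value ranked 1) changes the sign of every $d$ but leaves $d^2$, and therefore $r_s$, unchanged, which is why the mark scheme allows all ranks reversed.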

## Part (ii):
| Answer | Mark | Guidance |
|--------|------|----------|
| $H_0$: no association between $x$ and $y$ | B1 | for $H_0$ |
| $H_1$: positive association between $x$ and $y$ | B1 | for $H_1$; NB $H_0$, $H_1$ not in terms of $\rho$ |
| Looking for positive association (one-tail test): | | |
| Critical value at 5% level is 0.5636 | B1 | for $\pm 0.5636$ (FT their $H_1$) |
| Since $0.636 > 0.5636$, there is sufficient evidence to reject $H_0$ | M1 | for comparison with c.v., provided $\lvert r_s \rvert < 1$ |
| i.e. conclude that there appears to be positive association between temperature and nitrous oxide level. | A1 | for conclusion in words f.t. their $r_s$ and sensible cv **5 marks** |
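
The decision step of the one-tailed test can be sketched as a single comparison. The function name `reject_h0_one_tailed` is hypothetical, and the critical value 0.5636 is taken directly from the mark scheme (tabulated for $n = 10$ at the 5% level, one tail) rather than computed.

```python
def reject_h0_one_tailed(r_s, critical_value):
    """Upper-tail test: reject H0 only if r_s exceeds the tabulated critical value."""
    return r_s > critical_value


r_s = 1 - 6 * 60 / (10 * (10 ** 2 - 1))  # from part (i), sum of d^2 = 60, n = 10
critical_value = 0.5636                   # quoted in the mark scheme, not derived here

decision = reject_h0_one_tailed(r_s, critical_value)
print(f"r_s = {r_s:.4f}, reject H0: {decision}")  # r_s = 0.6364, reject H0: True
```

Because the alternative hypothesis is positive association, only a large positive $r_s$ counts as evidence against $H_0$; a large negative value would not lead to rejection in this test.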

## Part (iii):
| Answer | Mark | Guidance |
|--------|------|----------|
| Underlying distribution must be bivariate normal. | B1 | CAO for bivariate normal |
| If the distribution is bivariate normal then the scatter diagram will have an elliptical shape. | B1 | indep for elliptical shape |
| This scatter diagram is not elliptical and so a PMCC test would not be valid. (Allow comment indicating that the sample is too small to draw a firm conclusion on ellipticity and so on validity) | E1 dep | for conclusion **3 marks** |

## Part (iv):
| Answer | Mark | Guidance |
|--------|------|----------|
| $n = 60$, PMCC critical value is $r = 0.2997$ | B1 | |
| So the critical region is $r \geq 0.2997$ | B1 FT | their sensible c.v. **2 marks** |
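
The mark scheme reads 0.2997 from PMCC tables. As a rough cross-check, under $H_0$ with bivariate normal data the statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ follows a $t$ distribution with $n-2$ degrees of freedom, so a critical $t$ value can be converted back to a critical $r$. The sketch below assumes an upper 1% point of about 2.392 for 58 degrees of freedom, a value taken from standard $t$ tables and not from the source; the function name `pmcc_critical_value` is hypothetical.

```python
import math


def pmcc_critical_value(t_crit, n):
    """Convert a critical t value (n - 2 degrees of freedom) into a critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


t_crit = 2.392  # ASSUMPTION: approximate upper 1% point of t with 58 d.f., from tables
r_crit = pmcc_critical_value(t_crit, 60)

print(f"{r_crit:.3f}")  # 0.300
```

The result agrees with the tabulated 0.2997 to three decimal places; the small difference in the fourth place comes from rounding the assumed $t$ value.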

## Part (v):
| Answer | Mark | Guidance |
|--------|------|----------|
| Any three of the following: | | |
| • Correlation does not imply causation | E1 | |
| • There could be a third factor (causing the correlation between temperature and ozone level) | E1 | |
| • The claim could be true | E1 | |
| • Increased ozone could cause higher temperatures | | **3 marks** |

---
3 A researcher is investigating the relationship between temperature and levels of the air pollutant nitrous oxide at a particular site. The researcher believes that there will be a positive correlation between the daily maximum temperature, $x$, and nitrous oxide level, $y$. Data are collected for 10 randomly selected days. The data, measured in suitable units, are given in the table and illustrated on the scatter diagram.

\begin{center}
\begin{tabular}{ | c | c | c | c | c | c | c | c | c | c | c | }
\hline
$x$ & 13.3 & 17.2 & 16.9 & 18.7 & 18.4 & 19.3 & 23.1 & 15.0 & 20.6 & 14.4 \\
\hline
$y$ & 9 & 11 & 14 & 26 & 43 & 25 & 52 & 15 & 10 & 7 \\
\hline
\end{tabular}
\end{center}

\begin{tikzpicture}
    \begin{axis}[
        width=12cm, height=9cm,
        axis lines=left,
        xmin=8.5, xmax=27,
        ymin=0, ymax=65,
        % Define ticks based on the image
        xtick={10, 15, 20, 25},
        ytick={0, 10, 20, 30, 40, 50, 60},
        % Remove default labels to place them manually like the image
        xlabel={}, ylabel={},
        axis line style={-latex},
        tick label style={font=\small},
        clip=false
    ]
        % Scatter plot of the data points from the table
        \addplot[only marks, mark=x, mark size=3pt, thick] coordinates {
            (13.3, 9) (17.2, 11) (16.9, 14) (18.7, 26) (18.4, 43)
            (19.3, 25) (23.1, 52) (15.0, 15) (20.6, 10) (14.4, 7)
        };

        % Replicating specific label placements
        \node[anchor=east, align=right, font=\small] at (axis cs:8.3, 60) {Nitrous oxide \\ level};
        \node[anchor=west, font=\small] at (axis cs:8.5, 61.5) {$y$};
        \node[anchor=south east, font=\small] at (axis cs:26.5, 0.5) {$x$};
        \node[anchor=north, font=\small] at (axis cs:23.5, -6) {Temperature};

        % Creating the axis break (zigzag) on the x-axis
        \draw[white, line width=3pt] (axis cs:9.2, 0) -- (axis cs:9.8, 0);
        \draw[thick] (axis cs:9.0, -1.8) -- (axis cs:9.4, 1.8) -- (axis cs:9.7, -1.8) -- (axis cs:10.1, 1.8);
    \end{axis}
\end{tikzpicture}\\
(i) Calculate the value of Spearman's rank correlation coefficient for these data.\\
(ii) Perform a hypothesis test at the $5 \%$ level to check the researcher's belief, stating your hypotheses clearly.\\
(iii) It is suggested that it would be preferable to carry out a test based on the product moment correlation coefficient. State the distributional assumption required for such a test to be valid. Explain how a scatter diagram can be used to check whether the distributional assumption is likely to be valid and comment on the validity in this case.\\
(iv) A statistician investigates data over a much longer period and finds that the assumptions for the use of the product moment correlation coefficient are in fact valid. Give the critical region for the test at the $1 \%$ level, based on a sample of 60 days.\\
(v) In a different research project, into the correlation between daily temperature and ozone pollution levels, a positive correlation is found. It is argued that this shows that high temperatures cause increased ozone levels. Comment on this claim.

\hfill \mbox{\textit{OCR MEI S2 2006 Q3 [18]}}