| Exam Board | OCR MEI |
|---|---|
| Module | S2 (Statistics 2) |
| Year | 2013 |
| Session | June |
| Marks | 18 |
| Topic | Hypothesis test of Pearson’s product-moment correlation coefficient |
| Type | Calculate PMCC from summary statistics |
| Difficulty | Standard +0.3. This is a straightforward application of standard PMCC formulas and hypothesis testing procedures. Part (i) requires substituting into the correlation coefficient formula, (ii) is a routine one-tailed test comparing $r$ to a critical value, (iii) and (iv) are bookwork recall, and (v) involves simple arithmetic adjustments to summary statistics. All steps are mechanical with no problem-solving or novel insight required, making it slightly easier than average. |
| Spec | 5.08a Pearson correlation: calculate pmcc; 5.08d Hypothesis test: Pearson correlation |
# Question 1:
## Part (ii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $H_0: \rho = 0$ | B1 | For $H_0$, $H_1$ in symbols. Hypotheses in words must refer to population. Do not allow alternative symbols unless clearly defined as the population correlation coefficient. |
| $H_1: \rho > 0$ (one-tailed test) | | |
| where $\rho$ is the population correlation coefficient | B1 | For defining $\rho$. Condone omission of "population" if correct notation $\rho$ is used, but if $\rho$ is defined as the **sample** correlation coefficient then award **B0**. Allow "$\rho$ is the pmcc". |
| For $n = 60$, 5% critical value $= 0.2144$ | B1 | For critical value |
| Since $0.665 > 0.2144$, the result is significant. | M1 | For sensible comparison leading to a conclusion provided that $|r| < 1$. Sensible comparison: e.g. $0.665 > 0.2144$ is 'sensible' whereas $0.665 > -0.2144$ is 'not sensible'. Reversed inequality e.g. $0.665 < 0.2144$ gets max M1 A0. |
| Thus we have sufficient evidence to reject $H_0$ | A1 | For reject $H_0$ o.e. FT their $r$ and critical value from 5% 1-tail column. |
| There is sufficient evidence at the 5% level to **suggest** that there is **positive** correlation between FEV1 before and after the two-week course. | E1 | For correct, **non-assertive** conclusion in context (allow '$x$ and $y$' for context). E0 if $H_0$ and $H_1$ not stated, reversed or mention a value other than zero for $\rho$ in $H_0$. |
| | **[6]** | |
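
The value $r = 0.665$ used in the comparison above can be checked from the summary statistics given in the question. The following is a minimal sketch, assuming the standard formula $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$; the function name `pmcc` and the variable names are illustrative and do not come from the mark scheme. The critical value 0.2144 is taken directly from the table above rather than computed.

```python
from math import sqrt

# Summary statistics as given in the question (n = 60).
n = 60
sum_x, sum_y = 43.62, 55.15
sum_x2, sum_y2 = 32.68, 51.44
sum_xy = 40.66


def pmcc(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Sample product moment correlation coefficient from summary statistics."""
    s_xy = sum_xy - sum_x * sum_y / n   # S_xy
    s_xx = sum_x2 - sum_x ** 2 / n      # S_xx
    s_yy = sum_y2 - sum_y ** 2 / n      # S_yy
    return s_xy / sqrt(s_xx * s_yy)


r = pmcc(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)

# 5% one-tailed critical value for n = 60, quoted from the mark scheme.
CRITICAL_VALUE = 0.2144

# One-tailed test of H0: rho = 0 against H1: rho > 0.
reject_h0 = r > CRITICAL_VALUE

print(f"r = {r:.3f}, reject H0: {reject_h0}")
```

Run as written, this should give a value of $r$ that rounds to 0.665 and a rejection of $H_0$, in agreement with the mark scheme.
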
## Part (iii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| The underlying population must have a bivariate Normal distribution. | B1 | Condone "bivariate Normal distribution", "underlying bivariate Normal distribution", but **do not allow** "the **data** have a bivariate Normal distribution" |
| Yes, since the scatter diagram appears to have a roughly elliptical shape. | E1 | Condone 'oval' or suitable diagram |
| | **[2]** | |
## Part (iv)
| Answer | Marks | Guidance |
|--------|-------|----------|
| The significance level is the probability of rejecting the null hypothesis when in fact it is true. | E1* | For "probability of rejecting $H_0$" or "probability of a significant result". |
| | E1dep* | For "when $H_0$ is true" |
| | **[2]** | |
## Part (v)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $\sum x = 43.62 + 0.45 = 44.07$ | B1 | For $\sum x$ or $\sum y$ or $\sum xy$ |
| $\sum y = 55.15 - 0.45 = 54.70$ | | |
| $\sum xy = 40.66$ | | |
| $\sum x^2 = 32.68 + 1 - 0.55^2 = 33.3775$ | B1 | For $\sum x^2$ or $\sum y^2$ (to 2 dp) |
| $\sum y^2 = 51.44 - 1 + 0.55^2 = 50.7425$ | | |
| | B1 | For all correct (ignore $n$) |
| | **[3]** | |
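
The adjustments in part (v) amount to removing the contribution of the correct point $(0.55, 1.00)$ from each sum and adding the contribution of the misrecorded point $(1.00, 0.55)$. The sketch below illustrates that arithmetic; the helper name `replace_point` and the dictionary layout are assumptions made for illustration only.

```python
# Original summary statistics from the question.
stats = {
    "sum_x": 43.62,
    "sum_y": 55.15,
    "sum_x2": 32.68,
    "sum_y2": 51.44,
    "sum_xy": 40.66,
}


def replace_point(stats, old, new):
    """Return summary statistics with data point `old` swapped for `new`."""
    (x0, y0), (x1, y1) = old, new
    return {
        "sum_x": stats["sum_x"] - x0 + x1,
        "sum_y": stats["sum_y"] - y0 + y1,
        "sum_x2": stats["sum_x2"] - x0 ** 2 + x1 ** 2,
        "sum_y2": stats["sum_y2"] - y0 ** 2 + y1 ** 2,
        "sum_xy": stats["sum_xy"] - x0 * y0 + x1 * y1,
    }


# Correct point (0.55, 1.00) recorded as (1.00, 0.55).
adjusted = replace_point(stats, old=(0.55, 1.00), new=(1.00, 0.55))

for name, value in adjusted.items():
    print(f"{name} = {value:.4f}")
```

Because the two points have the same product $xy = 0.55$, $\sum xy$ is unchanged, which is why the mark scheme leaves it at 40.66.
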
---
1 Salbutamol is a drug used to improve lung function. In a medical trial, a random sample of 60 people with impaired lung function was selected. The forced expiratory volume in one second (FEV1) was measured for each person, both before being given salbutamol and again after a two-week course of the drug. The variables $x$ and $y$, measured in suitable units, represent FEV1 before and after the two-week course respectively. The data are illustrated in the scatter diagram below, together with the summary statistics for these data.\\
\includegraphics[max width=\textwidth, alt={}, center]{f3690bc0-3392-4f29-86f7-797d33fab4f1-2_682_1024_502_516}
Summary statistics:
$$n = 60 , \quad \sum x = 43.62 , \quad \sum y = 55.15 , \quad \sum x ^ { 2 } = 32.68 , \quad \sum y ^ { 2 } = 51.44 , \quad \sum x y = 40.66$$
(i) Calculate the sample product moment correlation coefficient.\\
(ii) Carry out a hypothesis test at the $5 \%$ significance level to investigate whether there is positive correlation between FEV1 before and after the course.\\
(iii) State the distributional assumption which is necessary for this test to be valid. State, with a reason, whether the assumption appears to be valid.\\
(iv) Explain the meaning of the term 'significance level'.\\
(v) Calculate the values of the summary statistics if the data point $x = 0.55 , y = 1.00$ had been incorrectly recorded as $x = 1.00 , y = 0.55$.
\hfill \mbox{\textit{OCR MEI S2 2013 Q1 [18]}}