OCR MEI S4 2016 June — Question 4

Exam Board: OCR MEI
Module: S4 (Statistics 4)
Year: 2016
Session: June

4 The cardiovascular unit of a hospital is studying the effect on patients' heart rates of three different light exercises, A, B and C. Patients are given an exercise to do and the increases in their pulse rates are measured after 5 minutes. There are 16 patients in the study: 5 are chosen randomly and allocated to exercise A, 6 to exercise B, and 5 to exercise C. The data obtained are as follows.

| A | B | C |
|---|---|---|
| 63 | 69 | 56 |
| 41 | 72 | 44 |
| 42 | 52 | 65 |
| 51 | 64 | 48 |
| 47 | 54 | 53 |

| | A | B | C |
|---|---|---|---|
| Sum of data | 244 | 368 | 266 |
| Sum of squares | 12224 | 22910 | 14410 |

(i) State the usual one-way analysis of variance model. Explain what the terms in the model mean in this context. State the distributional assumptions required for the standard test. Carry out the test at the 5% level of significance and report your conclusions.

(ii) Someone unfamiliar with analysis of variance analysed these data. They used three $t$ tests to compare A with B, B with C, and C with A. The test comparing A with B was significant at the 5% level; the other two tests were not significant at the 5% level. Comment on this analysis, explaining whether it is better than, worse than or equivalent to the analysis carried out in part (i). Your comments should include consideration of the independence of the $t$ tests and the overall level of significance of the procedure.
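
The quoted summary rows can be checked against the listed observations. The following is a minimal Python sketch, not part of the original question; the names `listed`, `quoted` and `check_column` are illustrative. It assumes the data table is read column by column as shown, where column B lists five values even though six patients were allocated to exercise B.

```python
# Observations as listed in the data table (column B shows only five values there).
listed = {
    "A": [63, 41, 42, 51, 47],
    "B": [69, 72, 52, 64, 54],
    "C": [56, 44, 65, 48, 53],
}

# Summary rows quoted in the question: group -> (sum of data, sum of squares).
quoted = {"A": (244, 12224), "B": (368, 22910), "C": (266, 14410)}


def check_column(values, quoted_sum, quoted_squares):
    """Return (quoted sum - listed sum, quoted sum of squares - listed sum of squares)."""
    listed_sum = sum(values)
    listed_squares = sum(v * v for v in values)
    return quoted_sum - listed_sum, quoted_squares - listed_squares


shortfalls = {g: check_column(listed[g], *quoted[g]) for g in listed}

for group, (d_sum, d_squares) in shortfalls.items():
    print(group, d_sum, d_squares)
```

Columns A and C reproduce the quoted totals exactly. Column B falls short by 57 in the sum and by 3249 in the sum of squares; since 57² = 3249, this is consistent with a single unlisted sixth observation of 57, although that is an inference from the totals and not something the table states. The summary rows alone are enough for the analysis of variance.
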

## Part (i)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $Y_{ij}=\mu+\alpha_i+\varepsilon_{ij}$ | B1 | |
| where $Y_{ij}$ is the $j$th value in the $i$th group | B1 | |
| $\mu$ is the global mean in the underlying population | B1 | |
| $\alpha_i$ is the 'treatment effect' in the $i$th group | B1 | Or $\mu_i-\mu$ |
| $\varepsilon_{ij}$ is a random error term | B1 | Accept "residual" |
| In this context, $\mu$ measures the average effect of the exercise regimes, and the $\alpha_i$ represent the differences from the mean for the three regimes | E1 | Context explained at least once; 'Groups' are exercise regimes |
| $\varepsilon_{ij}$ iid $N(0,\sigma^2)$ | B1 | Distributional assumption |
| $H_0$: the three exercise regimes give the same (population) increase in mean pulse rate | B1 | Or: $\alpha_1=\alpha_2=\alpha_3(=0)$ |
| $H_1$: the three exercise regimes do not give the same (population) increase in mean pulse rate | | Not all $\alpha_i$ the same |
| $\sum\frac{T_i^2}{n_i}-\frac{T^2}{n}=\frac{244^2}{5}+\frac{368^2}{6}+\frac{266^2}{5}-\frac{878^2}{16}=448.8167$ | M1A1 | |
| $\sum\sum y_{ij}^2-\frac{T^2}{n}=49544-\frac{878^2}{16}=1363.75$ | M1A1 | |
| ANOVA table, Between Groups: SS = 448.8167, df = 2, MS = 224.41 | | |
| Within Groups: SS = 914.9333, df = 13, MS = 70.379 | A1Ft | Within Groups Sum Sq; Ft their Total SS $-$ BGSS |
| Total: SS = 1363.75, df = 15 | B1 | Df all 3 |
| F ratio = 3.1885 | A1Ft | Ft their Sum Sqs |
| F critical = 3.8056 | B1 | 3.81 from tables |
| Result not significant | M1 | |
| Insufficient evidence to suppose that the exercise regimes have different effects on pulse rate on average | A1 | |
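
The sums of squares and the F ratio above can be reproduced from the summary rows in the question. This is a minimal Python sketch under that assumption, not the examiners' method; `one_way_anova` and the dictionary layout are illustrative names, and the critical value is copied from the mark scheme, not computed.

```python
# Summary statistics quoted in the question: group -> (n, sum of data, sum of squares).
groups = {"A": (5, 244, 12224), "B": (6, 368, 22910), "C": (5, 266, 14410)}


def one_way_anova(groups):
    """Build the one-way ANOVA quantities from per-group n, sum and sum of squares."""
    n = sum(size for size, _, _ in groups.values())
    grand_total = sum(total for _, total, _ in groups.values())
    grand_squares = sum(squares for _, _, squares in groups.values())

    correction = grand_total ** 2 / n  # T^2 / n
    ss_between = sum(total ** 2 / size for size, total, _ in groups.values()) - correction
    ss_total = grand_squares - correction
    ss_within = ss_total - ss_between

    df_between = len(groups) - 1
    df_within = n - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


result = one_way_anova(groups)

# 5% critical value of F on (2, 13) degrees of freedom, as given in the mark scheme.
F_CRITICAL = 3.8056
significant = result["f_ratio"] > F_CRITICAL

for name, value in result.items():
    print(f"{name}: {value:.4f}")
print("significant at 5%:", significant)
```

The F ratio falls below the critical value, so `significant` is `False`, matching the "Result not significant" line in the mark scheme.
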

## Part (ii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| The analysis using three tests is not equivalent to ANOVA, and the multiple comparisons procedure is worse than ANOVA | B1 | Other points could be made; e.g. Multiple comparisons are likely to generate more type I errors than the nominal significance level would suggest |
| The three tests are not independent | B1 | |
| The significance level of the whole procedure is therefore impossible to assess | B1 | However, multiple comparisons are useful post hoc to identify where the largest differences have occurred |
| A comparison with the different result obtained in (i) | B1 | |
| and why this may be so | B1 | |
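
The pairwise analysis described in part (ii) can be sketched from the same summary rows. The question does not say which form of $t$ test was used; the Python sketch below assumes two-tailed, pooled-variance two-sample tests, and the names `pooled_t`, `summary` and `T_CRITICAL` are illustrative. The critical values are the usual two-tailed 5% table values for 8 and 9 degrees of freedom.

```python
from itertools import combinations
from math import sqrt

# Summary statistics quoted in the question: group -> (n, sum of data, sum of squares).
summary = {"A": (5, 244, 12224), "B": (6, 368, 22910), "C": (5, 266, 14410)}

# Two-tailed 5% critical values of t from standard tables, keyed by degrees of freedom.
T_CRITICAL = {8: 2.306, 9: 2.262}


def pooled_t(first, second):
    """Pooled-variance two-sample t statistic and its degrees of freedom (assumed test form)."""
    n1, total1, squares1 = first
    n2, total2, squares2 = second
    ss1 = squares1 - total1 ** 2 / n1  # corrected sum of squares, group 1
    ss2 = squares2 - total2 ** 2 / n2  # corrected sum of squares, group 2
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (total1 / n1 - total2 / n2) / standard_error, df


results = {}
for a, b in combinations(summary, 2):
    t_value, df = pooled_t(summary[a], summary[b])
    results[(a, b)] = (t_value, df, abs(t_value) > T_CRITICAL[df])
    print(f"{a} vs {b}: t = {t_value:.3f}, df = {df}, significant = {results[(a, b)][2]}")

# If the three tests were independent, the chance of at least one type I error would be:
independent_rate = 1 - 0.95 ** 3
print(f"familywise rate if independent: {independent_rate:.4f}")
```

Under these assumptions only the A versus B comparison exceeds its critical value, which agrees with the outcome described in the question. The final figure of about 0.14 is only indicative: each pair of tests shares a sample, so the tests are not independent and the true overall significance level cannot be obtained this way, which is the point made in the mark scheme.
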