OCR MEI S3 2012 January: Question 3 (18 marks)

Exam Board: OCR MEI
Module: S3 (Statistics 3)
Year: 2012
Session: January
Marks: 18
Topic: Chi-squared goodness of fit
Type: Chi-squared goodness of fit: Poisson
Difficulty: Standard +0.3. This is a straightforward application of the chi-squared goodness of fit test with expected frequencies already provided, requiring only calculation of the test statistic and comparison with the critical value. The Wilcoxon test in part (b) is also routine. Both parts involve standard procedures with no conceptual challenges, making this slightly easier than average for Further Maths Statistics.
Spec: 5.06b Fit prescribed distribution: chi-squared test; 5.07b Sign test and Wilcoxon signed-rank

3
  1. A medical researcher is looking into the delay, in years, between first and second myocardial infarctions (heart attacks). The following table shows the results for a random sample of 225 patients.
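    | Delay (years) | \(0 -\) | \(1 -\) | \(2 -\) | \(3 -\) | \(4 - 10\) |
    |---------------|---------|---------|---------|---------|------------|
    | Number of patients | 160 | 40 | 13 | 9 | 3 |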
    The mean of this sample is used to construct a model which gives the following expected frequencies.
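    | Delay (years) | \(0 -\) | \(1 -\) | \(2 -\) | \(3 -\) | \(4 - 10\) |
    |---------------|---------|---------|---------|---------|------------|
    | Number of patients | 142.23 | 52.32 | 19.25 | 7.08 | 4.12 |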
    Carry out a test, using a \(2.5 \%\) level of significance, of the goodness of fit of the model to the data.
  2. A further piece of research compares the incidence of myocardial infarction in men aged 55 to 70 with that in women aged 55 to 70. Incidence is measured by the number of infarctions per 10000 of the population. For a random sample of 8 health authorities across the UK, the following results for the year 2010 were obtained.
    | Health authority | A | B | C | D | E | F | G | H |
    |------------------|---|---|---|---|---|---|---|---|
    | Incidence in men | 47 | 56 | 15 | 51 | 45 | 54 | 50 | 32 |
    | Incidence in women | 36 | 30 | 30 | 47 | 54 | 55 | 27 | 27 |
    A Wilcoxon paired sample test, using the hypotheses \(\mathrm{H}_0 : m = 0\) and \(\mathrm{H}_1 : m \neq 0\) where \(m\) is the population median difference, is to be carried out to investigate whether there is any difference between men and women on the whole.
    1. Explain why a paired test is being used in this context.
    2. Carry out the test using a \(10 \%\) level of significance.

# Question 3:

## Part (a)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $H_0$: The model for the delay fits the data. $H_1$: The model for the delay does not fit the data. | B1, B1 | Do not allow hypotheses of the form "Data fit model" |
| Observed frequencies: 160, 40, 13, 9, 3. Expected frequencies: 142.23, 52.32, 19.25, 7.08, 4.12 | | |
| Merge last 2 cells: Obs 12, Exp 11.2 | M1 | |
| $X^2 = 2.2202 + 2.9010 + 2.0292 + 0.0571 = 7.207(5)$ | M1, A1 | Calculation of $X^2$. cao. If not merged, $X^2 = 7.975(5...)$ |
| Refer to $\chi^2_2$. | M1 | No ft if wrong. Allow correct dof (= cells $- 2$) from wrongly grouped table |
| Upper 2.5% point is 7.378. | A1 | c.a.o. Upper 2.5% point for c's dof. $P(X^2 > 7.2075) = 0.0272$ |
| $7.207 < 7.378$, $\therefore$ Not Significant. Suggests it is reasonable to suppose the model fits the data. | A1, A1 | ft only c's test statistic. "Non-assertive" conclusion in words |
| **[9]** | | |
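
The arithmetic in the table above can be checked with a short script. This is a minimal sketch rather than part of the mark scheme: the helper name `merge_small_cells` and the threshold of 5 for expected frequencies are assumptions (the usual rule of thumb, consistent with merging the last two cells), the critical value 7.378 is copied from the table rather than computed, and the closed-form tail probability $e^{-x/2}$ is valid only for 2 degrees of freedom.

```python
import math

observed = [160, 40, 13, 9, 3]
expected = [142.23, 52.32, 19.25, 7.08, 4.12]


def merge_small_cells(obs, exp, minimum=5.0):
    """Pool cells from the right until the last expected frequency is at least `minimum`.

    Hypothetical helper: it only handles small cells in the upper tail,
    which is all this data set needs.
    """
    obs, exp = list(obs), list(exp)
    while len(exp) > 1 and exp[-1] < minimum:
        last_obs, last_exp = obs.pop(), exp.pop()
        obs[-1] += last_obs
        exp[-1] += last_exp
    return obs, exp


merged_observed, merged_expected = merge_small_cells(observed, expected)

# Pearson's statistic: sum of (O - E)^2 / E over the merged cells.
contributions = [(o - e) ** 2 / e for o, e in zip(merged_observed, merged_expected)]
x_squared = sum(contributions)

# One degree of freedom is lost for the fixed total and one for the
# mean estimated from the sample, giving cells - 2.
degrees_of_freedom = len(merged_observed) - 2

critical_value = 7.378  # upper 2.5% point of chi-squared(2), taken from the mark scheme
significant = x_squared > critical_value

# For 2 degrees of freedom only, P(X^2 > x) = exp(-x / 2).
p_value = math.exp(-x_squared / 2)

print(merged_observed, round(x_squared, 4), degrees_of_freedom, significant)
```

The merged cells are 160, 40, 13, 12 against 142.23, 52.32, 19.25, 11.2, and the statistic stays below the critical value, in line with the "Not Significant" conclusion in the table.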

## Part (b)(i)
| Answer | Marks | Guidance |
|--------|-------|----------|
| A paired test is used in this context in order to eliminate differences between health authorities. | E1 | oe |
| **[1]** | | |

## Part (b)(ii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| Differences: 11, 26, $-15$, 4, $-9$, $-1$, 23, 5; Ranks: 5, 8, 6, 2, 4, 1, 7, 3 | M1, M1, A1 | For differences (ZERO if not used); For ranks; ft from here if ranks wrong |
| $W_- = 1 + 4 + 6 = 11$ | B1 | (or $W_+ = 2+3+5+7+8 = 25$) |
| Refer to tables of Wilcoxon paired (single sample) statistic for $n = 8$. | M1 | No ft from here if wrong |
| Lower 5% tail is 5 (or upper is 31 if 25 used). | A1 | ie a 2-tail test. No ft from here if wrong |
| $11 > 5$, $\therefore$ Result is not significant. | A1 | ft only c's test statistic |
| No evidence to suggest a difference between the incidences of myocardial infarction in men and women on the whole. | A1 | ft only c's test statistic. "Non-assertive" conclusion in context to include "on the whole" |
| **[8]** | | |
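
The ranking and the critical value used above can be reproduced with a short script. This is a sketch under stated assumptions, not the mark scheme's method: the function name `lower_tail_probability` is hypothetical, the critical value is recovered by brute-force enumeration of sign patterns instead of being read from tables, and the simple ranking relies on this data having no zero differences and no tied magnitudes.

```python
from itertools import product

men = [47, 56, 15, 51, 45, 54, 50, 32]
women = [36, 30, 30, 47, 54, 55, 27, 27]

# Paired differences (men minus women) for health authorities A to H.
differences = [m - w for m, w in zip(men, women)]
n = len(differences)

# Rank the absolute differences from smallest (rank 1) to largest (rank n).
order = sorted(range(n), key=lambda i: abs(differences[i]))
ranks = [0] * n
for rank, index in enumerate(order, start=1):
    ranks[index] = rank

w_minus = sum(r for d, r in zip(differences, ranks) if d < 0)
w_plus = sum(r for d, r in zip(differences, ranks) if d > 0)
test_statistic = min(w_minus, w_plus)


def lower_tail_probability(n, w):
    """P(W <= w) under H0, enumerating all 2**n equally likely sign patterns."""
    favourable = 0
    for signs in product((0, 1), repeat=n):
        rank_sum = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if rank_sum <= w:
            favourable += 1
    return favourable / 2 ** n


# Two-tailed test at the 10% level: 5% in each tail.
max_rank_sum = n * (n + 1) // 2
critical_value = max(
    w for w in range(max_rank_sum + 1) if lower_tail_probability(n, w) <= 0.05
)

# The result is significant only if the smaller rank sum is at or below the critical value.
significant = test_statistic <= critical_value

print(differences, ranks, w_minus, w_plus, critical_value, significant)
```

The enumeration gives the same lower 5% critical value of 5 for $n = 8$ as the tables, and the smaller rank sum of 11 exceeds it, so the result is not significant.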

---
3
\begin{enumerate}[label=(\alph*)]
\item A medical researcher is looking into the delay, in years, between first and second myocardial infarctions (heart attacks). The following table shows the results for a random sample of 225 patients.

\begin{center}
\begin{tabular}{ | l | c | c | c | c | c | }
\hline
Delay (years) & $0 -$ & $1 -$ & $2 -$ & $3 -$ & $4 - 10$ \\
\hline
Number of patients & 160 & 40 & 13 & 9 & 3 \\
\hline
\end{tabular}
\end{center}

The mean of this sample is used to construct a model which gives the following expected frequencies.

\begin{center}
\begin{tabular}{ | l | c | c | c | c | c | }
\hline
Delay (years) & $0 -$ & $1 -$ & $2 -$ & $3 -$ & $4 - 10$ \\
\hline
Number of patients & 142.23 & 52.32 & 19.25 & 7.08 & 4.12 \\
\hline
\end{tabular}
\end{center}

Carry out a test, using a $2.5 \%$ level of significance, of the goodness of fit of the model to the data.
\item A further piece of research compares the incidence of myocardial infarction in men aged 55 to 70 with that in women aged 55 to 70. Incidence is measured by the number of infarctions per 10000 of the population. For a random sample of 8 health authorities across the UK, the following results for the year 2010 were obtained.

\begin{center}
\begin{tabular}{ | l | c | c | c | c | c | c | c | c | }
\hline
Health authority & A & B & C & D & E & F & G & H \\
\hline
Incidence in men & 47 & 56 & 15 & 51 & 45 & 54 & 50 & 32 \\
\hline
Incidence in women & 36 & 30 & 30 & 47 & 54 & 55 & 27 & 27 \\
\hline
\end{tabular}
\end{center}

A Wilcoxon paired sample test, using the hypotheses $\mathrm{H}_0 : m = 0$ and $\mathrm{H}_1 : m \neq 0$ where $m$ is the population median difference, is to be carried out to investigate whether there is any difference between men and women on the whole.
\begin{enumerate}[label=(\roman*)]
\item Explain why a paired test is being used in this context.
\item Carry out the test using a $10 \%$ level of significance.
\end{enumerate}
\end{enumerate}

\hfill \mbox{\textit{OCR MEI S3 2012 Q3 [18]}}