Question 3 - A-Level Maths

OCR MEI S3 2012 January — Question 3 18 marks

Exam Board	OCR MEI
Module	S3 (Statistics 3)
Year	2012
Session	January
Marks	18
Topic	Chi-squared goodness of fit
Type	Chi-squared goodness of fit: Poisson
Difficulty	Standard +0.3 This is a straightforward application of chi-squared goodness of fit with expected frequencies already provided, requiring only calculation of test statistic and comparison to critical value. The Wilcoxon test in part (b) is also routine. Both parts involve standard procedures with no conceptual challenges, making this slightly easier than average for Further Maths Statistics.
Spec	5.06b Fit prescribed distribution: chi-squared test; 5.07b Sign test: and Wilcoxon signed-rank

(a) A medical researcher is looking into the delay, in years, between first and second myocardial infarctions (heart attacks). The following table shows the results for a random sample of 225 patients.
Delay (years)	\(0 -\)	\(1 -\)	\(2 -\)	\(3 -\)	\(4 - 10\)
Number of patients	160	40	13	9	3
The mean of this sample is used to construct a model which gives the following expected frequencies.
Delay (years)	\(0 -\)	\(1 -\)	\(2 -\)	\(3 -\)	\(4 - 10\)
Number of patients	142.23	52.32	19.25	7.08	4.12
Carry out a test, using a \(2.5\%\) level of significance, of the goodness of fit of the model to the data.
(b) A further piece of research compares the incidence of myocardial infarction in men aged 55 to 70 with that in women aged 55 to 70. Incidence is measured by the number of infarctions per 10000 of the population. For a random sample of 8 health authorities across the UK, the following results for the year 2010 were obtained.
Health authority	A	B	C	D	E	F	G	H
Incidence in men	47	56	15	51	45	54	50	32
Incidence in women	36	30	30	47	54	55	27	27
A Wilcoxon paired sample test, using the hypotheses \(\mathrm{H}_0 : m = 0\) and \(\mathrm{H}_1 : m \neq 0\) where \(m\) is the population median difference, is to be carried out to investigate whether there is any difference between men and women on the whole.
(i) Explain why a paired test is being used in this context.
(ii) Carry out the test using a \(10\%\) level of significance.


Question 3:

Part (a)

Answer	Marks	Guidance
\(H_0\): The model for the delay fits the data. \(H_1\): The model for the delay does not fit the data.	B1, B1	Do not allow hypotheses of the form "Data fit model"
Observed frequencies: 160, 40, 13, 9, 3. Expected frequencies: 142.23, 52.32, 19.25, 7.08, 4.12
Merge last 2 cells: Obs 12, Exp 11.2	M1
\(X^2 = 2.2202 + 2.9010 + 2.0292 + 0.0571 = 7.207(5)\)	M1, A1	Calculation of \(X^2\). cao. If not merged, \(X^2 = 7.975(5...)\)
Refer to \(\chi^2_2\).	M1	No ft if wrong. Allow correct dof (= cells \(- 2\)) from wrongly grouped table
Upper 2.5% point is 7.378.	A1	c.a.o. Upper 2.5% point for c's dof. \(P(X^2 > 7.2075) = 0.0272\)
\(7.207 < 7.378\), \(\therefore\) Not Significant. Suggests it is reasonable to suppose the model fits the data.	A1, A1	ft only c's test statistic. "Non-assertive" conclusion in words
[9]
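
The arithmetic in Part (a) can be checked with a short script. This is a minimal sketch, not part of the mark scheme: the variable and function names are illustrative, the critical value 7.378 is copied from the mark scheme rather than computed, and the reason given for merging cells (the expected frequency 4.12 being below the usual threshold of 5) is an assumption, since the mark scheme only says to merge.

```python
import math

# Observed and expected frequencies as given in the question.
observed = [160, 40, 13, 9, 3]
expected = [142.23, 52.32, 19.25, 7.08, 4.12]


def merge_last_two(cells):
    """Combine the final two cells into one, as the mark scheme does."""
    return cells[:-2] + [cells[-2] + cells[-1]]


# Assumed rationale: the last expected frequency (4.12) is below 5.
obs_merged = merge_last_two(observed)   # 160, 40, 13, 12
exp_merged = merge_last_two(expected)   # 142.23, 52.32, 19.25, about 11.2

# Each cell contributes (O - E)^2 / E to the test statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(obs_merged, exp_merged)]
x2 = sum(contributions)

# Degrees of freedom: cells - 1 (totals agree) - 1 (mean estimated from data).
dof = len(obs_merged) - 2

# Upper 2.5% point of chi-squared with 2 dof, taken from the mark scheme.
critical = 7.378

# For exactly 2 degrees of freedom the upper tail probability is exp(-x/2).
# This shortcut does not hold for other degrees of freedom.
p_value = math.exp(-x2 / 2)

significant = x2 > critical

print([round(c, 4) for c in contributions])
print(round(x2, 4), dof, round(p_value, 4), significant)
```

The statistic falls just below the critical value, so the result is not significant at the 2.5% level, in line with the conclusion in the mark scheme.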

Part (b)(i)

Answer	Marks	Guidance
A paired test is used in this context in order to eliminate differences between health authorities.	E1	oe
[1]

Part (b)(ii)

Answer	Marks	Guidance
Differences: 11, 26, \(-15\), 4, \(-9\), \(-1\), 23, 5; Ranks: 5, 8, 6, 2, 4, 1, 7, 3	M1, M1, A1	For differences (ZERO if not used); For ranks; ft from here if ranks wrong
\(W_- = 1 + 4 + 6 = 11\)	B1	(or \(W_+ = 2+3+5+7+8 = 25\))
Refer to tables of Wilcoxon paired (single sample) statistic for \(n = 8\).	M1	No ft from here if wrong
Lower 5% tail is 5 (or upper is 31 if 25 used).	A1	ie a 2-tail test. No ft from here if wrong
\(11 > 5\), \(\therefore\) Result is not significant.	A1	ft only c's test statistic
No evidence to suggest a difference between the incidences of myocardial infarction in men and women on the whole.	A1	ft only c's test statistic. "Non-assertive" conclusion in context to include "on the whole"
[8]
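
The ranking and rank sums in Part (b)(ii) can be reproduced as follows. This is a minimal sketch with illustrative names: the ranking shortcut assumes there are no tied or zero differences (true for this data, but not in general), and the critical value 5 is copied from the mark scheme rather than looked up programmatically.

```python
# Incidence per 10000 for health authorities A to H, from the question.
men = [47, 56, 15, 51, 45, 54, 50, 32]
women = [36, 30, 30, 47, 54, 55, 27, 27]

# Paired differences (men minus women), one per health authority.
differences = [m - w for m, w in zip(men, women)]

# Rank the absolute differences from smallest (rank 1) to largest.
# Looking up the position in the sorted list is only valid because
# this data has no ties and no zero differences.
abs_sorted = sorted(abs(d) for d in differences)
ranks = [abs_sorted.index(abs(d)) + 1 for d in differences]

# Sum the ranks separately for negative and positive differences.
w_minus = sum(r for d, r in zip(differences, ranks) if d < 0)
w_plus = sum(r for d, r in zip(differences, ranks) if d > 0)

# Two-tailed test: compare the smaller rank sum with the lower 5% point.
test_statistic = min(w_minus, w_plus)
critical = 5  # n = 8, lower 5% tail, as quoted in the mark scheme

# The result is significant only if the statistic is at or below the critical value.
significant = test_statistic <= critical

print(differences)
print(ranks)
print(w_minus, w_plus, significant)
```

As a consistency check, the two rank sums should add up to n(n + 1)/2 = 36 for n = 8.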
