| Exam Board | OCR MEI |
|---|---|
| Module | S2 (Statistics 2) |
| Year | 2007 |
| Session | June |
| Marks | 18 |
| Topic | Chi-squared test of independence |
| Type | Interpret association after test |
| Difficulty | Standard +0.3. A standard chi-squared test of independence with straightforward calculations: computing expected frequencies from row and column totals, calculating cell contributions, and comparing the test statistic to a critical value. Part (ii) requires basic interpretation of the data, and part (iii) is a routine approximation to a binomial distribution (Poisson, with a Normal approximation also accepted). All techniques are textbook applications with no novel insight required, making the question slightly easier than average. |
| Spec | 2.04e Normal distribution: as model N(mu, sigma^2); 2.04f Find normal probabilities: Z transformation; 2.05a Hypothesis testing language: null, alternative, p-value, significance; 2.05c Significance levels: one-tail and two-tail; 5.06a Chi-squared: contingency tables |

| Age group (observed) | Male | Female | Row totals |
|---|---|---|---|
| Under 40 | 70 | 54 | 124 |
| 40–49 | 76 | 36 | 112 |
| 50 and over | 52 | 12 | 64 |
| Column totals | 198 | 102 | 300 |

# Question 4:
## Part (i)
| Answer/Working | Marks | Guidance |
|---|---|---|
| $H_0$: no association between age group and sex; $H_1$: some association between age group and sex | B1 | In context |
| Expected values: Under 40: 81.84, 42.16; 40–49: 73.92, 38.08; 50+: 42.24, 21.76 | M1 A1 | To 2dp |
| Valid attempt at $\frac{(O-E)^2}{E}$ | M1 | |
| Contributions: 1.713, 3.325, 0.059, 0.114, 2.255, 4.378 summed | M1dep | For summation |
| $X^2 = 11.84$ | A1 | CAO |
| 2 degrees of freedom; critical value at 5% $= 5.991$ | B1 B1 | B1 for 2 d.o.f.; B1 CAO for cv |
| Result is significant; some association between age group and sex | B1 E1 | B1 dep on their cv & $X^2$; E1 conclusion in context |
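
The arithmetic in this table can be checked with a short script. The following is a minimal sketch, not part of the mark scheme: the variable names are illustrative, and the closed-form critical value assumes exactly 2 degrees of freedom, where $P(\chi^2 > x) = e^{-x/2}$.

```python
import math

# Observed counts from the question: age group -> (male, female).
observed = {
    "Under 40": (70, 54),
    "40-49": (76, 36),
    "50 and over": (52, 12),
}

row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

# Expected frequency under H0 = row total * column total / grand total.
expected = {
    group: tuple(row_totals[group] * c / grand_total for c in col_totals)
    for group in observed
}

# Contribution of each cell to the test statistic = (O - E)^2 / E.
contributions = {
    group: tuple((o - e) ** 2 / e for o, e in zip(observed[group], expected[group]))
    for group in observed
}

statistic = sum(sum(cells) for cells in contributions.values())
dof = (len(observed) - 1) * (len(col_totals) - 1)

# Assumption: this closed form is valid only for 2 degrees of freedom,
# where the upper-tail probability is exp(-x / 2).
critical_value = -2 * math.log(0.05)
p_value = math.exp(-statistic / 2)

for group in observed:
    print(group,
          [round(e, 2) for e in expected[group]],
          [round(c, 3) for c in contributions[group]])
print(f"X^2 = {statistic:.2f}, dof = {dof}, critical value = {critical_value:.3f}")
print("Reject H0" if statistic > critical_value else "Do not reject H0")
```

For tables with other dimensions the critical value would have to come from chi-squared tables or a statistics library rather than the closed form used here.
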
## Part (ii)
| Answer/Working | Marks | Guidance |
|---|---|---|
| More females in under 40 group and fewer in 50+ than expected if no association | E1 | |
| Reverse is true for males; data support the suggestion | E1 E1dep | E1dep on at least one previous E1 |
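
One way to see the direction of the association is to compare the proportion of female runners in each age group with the overall proportion. This is a descriptive sketch only, with illustrative variable names; the mark scheme itself argues from observed versus expected frequencies.

```python
# Observed counts from the question: age group -> (male, female).
observed = {
    "Under 40": (70, 54),
    "40-49": (76, 36),
    "50 and over": (52, 12),
}

# Proportion of each age group that is female.
female_share = {
    group: female / (male + female)
    for group, (male, female) in observed.items()
}

# Overall proportion of female runners in the sample.
total_female = sum(female for _, female in observed.values())
total_runners = sum(male + female for male, female in observed.values())
overall_female_share = total_female / total_runners

for group, share in female_share.items():
    direction = "above" if share > overall_female_share else "below"
    print(f"{group}: {share:.1%} female ({direction} the overall share)")
print(f"Overall: {overall_female_share:.1%} female")
```

The female share falls with each successive age group in this sample, which is consistent with the conclusion in the table above.
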
## Part (iii)
| Answer/Working | Marks | Guidance |
|---|---|---|
| Binomial$(300, 0.03)$; $n=300$, $p=0.03$ | B1 | CAO |
| Use Poisson approximation with $\lambda = np = 9$ | B1 B1dep | B1 for Poisson; B1dep for Poisson(9) |
| $P(X \geq 12) = 1 - P(X \leq 11) = 1 - 0.8030 = 0.197$ | M1 A1 | M1 for tables to find $1 - P(X \leq 11)$ |
| OR: Normal approx. $N(9, 8.73)$; $P(X > 11.5) = P\!\left(Z > \frac{11.5-9}{\sqrt{8.73}}\right) = P(Z > 0.846) = 1 - 0.8012 = 0.199$ | B1 B1dep M1 A1 | B1 for Normal; B1dep for parameters; M1 for correct tail (cc not required for M1) |
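
Both approximations can be compared with the exact binomial probability. The sketch below uses only the Python standard library; the helper names `poisson_cdf` and `normal_cdf` are illustrative, and the exact binomial value is included only as a reference point, since the question asks for an approximation.

```python
import math

n, p = 300, 0.03


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** r / math.factorial(r) for r in range(k + 1))


def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Poisson approximation: lambda = np = 9, P(X >= 12) = 1 - P(X <= 11).
lam = n * p
p_poisson = 1 - poisson_cdf(11, lam)

# Normal approximation N(np, np(1 - p)) with continuity correction at 11.5.
mean = n * p
variance = n * p * (1 - p)
z = (11.5 - mean) / math.sqrt(variance)
p_normal = 1 - normal_cdf(z)

# Exact binomial tail, for comparison only.
p_exact = 1 - sum(
    math.comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(12)
)

print(f"Poisson(9) approximation: {p_poisson:.3f}")
print(f"Normal approximation:     {p_normal:.3f} (z = {z:.3f})")
print(f"Exact binomial:           {p_exact:.3f}")
```

With $n$ large and $p$ small, the Poisson approximation is the natural first choice here, which matches the ordering of the two methods in the mark scheme.
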
4 The sexes and ages of a random sample of 300 runners taking part in marathons are classified as follows.
\begin{center}
\begin{tabular}{ | c | l | c | c | c | }
\hline
\multicolumn{2}{|c|}{Observed} & \multicolumn{2}{c|}{Sex} & \multirow{2}{*}{Row totals} \\
\cline { 3 - 4 }
& Male & Female & \\
\hline
\multirow{3}{*}{\begin{tabular}{ c }
Age \\
group \\
\end{tabular}} & Under 40 & 70 & 54 & 124 \\
\cline { 2 - 4 }
& $40 - 49$ & 76 & 36 & 112 \\
\cline { 2 - 5 }
& 50 and over & 52 & 12 & 64 \\
\hline
\multicolumn{2}{|c|}{Column totals} & 198 & 102 & 300 \\
\hline
\end{tabular}
\end{center}
(i) Carry out a test at the $5 \%$ significance level to examine whether there is any association between age group and sex. State carefully your null and alternative hypotheses. Your working should include a table showing the contributions of each cell to the test statistic.\\
(ii) Does your analysis support the suggestion that women are less likely than men to enter marathons as they get older? Justify your answer.

For marathons in general, on average $3 \%$ of runners are 'Female, 50 and over'. The random variable $X$ represents the number of 'Female, 50 and over' runners in a random sample of size 300.\\
(iii) Use a suitable approximating distribution to find $\mathrm { P } ( X \geqslant 12 )$.
\hfill \mbox{\textit{OCR MEI S2 2007 Q4 [18]}}