| Exam Board | OCR MEI |
|---|---|
| Module | S3 (Statistics 3) |
| Year | 2008 |
| Session | January |
| Marks | 18 |
| Topic | Hypothesis test of binomial distributions |
| Type | Binomial parameters from given information |
| Difficulty | Standard +0.3. This is a standard chi-squared goodness-of-fit test with straightforward parameter estimation from grouped data. Part (a)(i) requires calculating a weighted mean and relating it to np, part (a)(ii) is a routine application of chi-squared testing with expected frequencies, and part (b) tests basic sampling knowledge. The calculations are mechanical, with no conceptual challenges beyond the A-level statistics curriculum. |
| Spec | 5.02c Linear coding: effects on mean and variance; 5.06b Fit prescribed distribution: chi-squared test |

| Number of girls | Number of families |
|---|---|
| 0 | 32 |
| 1 | 110 |
| 2 | 154 |
| 3 | 125 |
| 4 | 63 |
| 5 | 16 |
# Question 4:
## Part (a)(i)
| Answer | Mark | Guidance |
|--------|------|----------|
| $\bar{x} = \frac{1125}{500} = 2.25$; For binomial $E(X) = n \times p$ | B1, M1 | Use of mean of binomial distribution. May be implicit. |
| $\therefore \hat{p} = \frac{2.25}{5} = 0.45$ | A1 | Beware: answer given. |
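
The arithmetic behind this estimate can be checked with a short Python sketch. The frequencies are those given in the question; the variable names are illustrative only.

```python
# Observed data from the question: number of girls -> number of families.
observed = {0: 32, 1: 110, 2: 154, 3: 125, 4: 63, 5: 16}
n_children = 5  # every family in the study has 5 children

total_families = sum(observed.values())
total_girls = sum(girls * families for girls, families in observed.items())

# Sample mean number of girls per family.
mean_girls = total_girls / total_families

# For B(n, p) the mean is n * p, so p is estimated by mean / n.
p_hat = mean_girls / n_children

print(total_girls, total_families, mean_girls, p_hat)
```
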
## Part (a)(ii)
| Answer | Mark | Guidance |
|--------|------|----------|
| Expected frequencies calculated: $f_e(\text{calc})$: 25.164, 102.944, 168.455, 137.827, 56.384, 9.226; $f_e(\text{tables})$: 25.15, 102.95, 168.45, 137.85, 56.35, 9.25 | M1, A1 | Calculation of expected frequencies. All correct. |
| $X^2 = 1.8571 + 0.4836 + 1.2404 + 1.1938 + 0.7763 + 4.9737 = 10.52(49)$ | M1, A1 | Or using tables: $1.8657 + 0.4828 + 1.2396 + 1.1978 + 0.7848 + 4.9257$; c.a.o. Or using tables: $10.49(64)$ |
| Refer to $\chi^2_4$ | M1 | Allow correct df $(= \text{cells} - 2)$ from wrongly grouped or ungrouped table, and FT. Otherwise, no FT if wrong. |
| Upper 5% point is 9.488. Significant. Suggests binomial model does not fit. | A1, A1, A1 | No ft from here if wrong. ft only c's test statistic. ft only c's test statistic. |
| The model appears to overestimate in the middle and to underestimate at the tails. The biggest discrepancy is at $X = 5$. | E1, E1 | Accept also any other sensible comment e.g. at 2.5% significance, the result would NOT have been significant. |
| A binomial model assumes all trials are independent with a constant probability of "success". It seems unlikely that there will be independence within families and/or that $p$ will be the same for all families. | E2 | (E2, 1, 0) Any sensible comment which addresses independence and constant $p$. |
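
The test above can be reproduced with a minimal Python sketch, assuming the expected frequencies are computed directly from B(5, 0.45) without the intermediate rounding used in the mark scheme. The 5% critical value 9.488 is copied from the mark scheme rather than computed, and the variable names are illustrative only. With unrounded expected frequencies the statistic comes out at about 10.52, matching the calculated value above to two decimal places.

```python
from math import comb

# Observed frequencies for 0, 1, ..., 5 girls (from the question).
observed = [32, 110, 154, 125, 63, 16]
n, p = 5, 0.45  # p is the estimate from part (a)(i)
total = sum(observed)

# Expected frequency for k girls: total * P(X = k) with X ~ B(n, p).
expected = [total * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Contribution of each cell to the test statistic: (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)

# Degrees of freedom: cells - 1, minus 1 more because p was estimated.
df = len(observed) - 1 - 1

critical_5pct = 9.488  # upper 5% point of chi-squared with 4 df, from the mark scheme
significant = x2 > critical_5pct

for k, (o, e, c) in enumerate(zip(observed, expected, contributions)):
    print(f"{k}: observed={o}, expected={e:.3f}, contribution={c:.4f}")
print(f"X^2 = {x2:.4f}, df = {df}, significant at 5%: {significant}")
```

All expected frequencies exceed 5, so no cells need to be combined before the statistic is summed.
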
## Part (b)
| Answer | Mark | Guidance |
|--------|------|----------|
| She should try to choose a simple random sample | E1 | |
| which would involve establishing a sampling frame and using some form of random number generator. | E1, E1 | Allow sensible discussion of practical limitations of choosing a random sample. Allow other sensible suggestions. E.g. systematic sample - choosing every tenth family; stratified sample - by the number of girls in a family. |
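
The procedure described above can be sketched in Python, assuming the sampling frame is simply the 500 families numbered 1 to 500. The numbering and the seed are hypothetical, chosen only to make the illustration repeatable.

```python
import random

# Sampling frame: an assumed numbering of the 500 families from part (a).
sampling_frame = list(range(1, 501))

# Seeded generator so the illustration is repeatable; the seed is arbitrary.
rng = random.Random(2008)

# Simple random sample: 50 distinct families, each subset of size 50 equally likely.
sample = sorted(rng.sample(sampling_frame, k=50))

print(sample)
```

`random.sample` draws without replacement, so no family can be selected twice.
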
4
\begin{enumerate}[label=(\alph*)]
\item In Germany, towards the end of the nineteenth century, a study was undertaken into the distribution of the sexes in families of various sizes. The table shows some data about the numbers of girls in 500 families, each with 5 children. It is thought that the binomial distribution $\mathrm{B}(5, p)$ should model these data.
\begin{center}
\begin{tabular}{ | c | c | }
\hline
Number of girls & Number of families \\
\hline
0 & 32 \\
\hline
1 & 110 \\
\hline
2 & 154 \\
\hline
3 & 125 \\
\hline
4 & 63 \\
\hline
5 & 16 \\
\hline
\end{tabular}
\end{center}
\begin{enumerate}[label=(\roman*)]
\item Use this information to calculate an estimate for the mean number of girls per family of 5 children. Hence show that 0.45 can be taken as an estimate of $p$.
\item Investigate at a $5\%$ significance level whether the binomial model with $p$ estimated as 0.45 fits the data. Comment on your findings and also on the extent to which the conditions for a binomial model are likely to be met.
\end{enumerate}
\item A researcher wishes to select 50 families from the 500 in part (a) for further study. Suggest what sort of sample she might choose and describe how she should go about choosing it.
\end{enumerate}
\hfill \mbox{\textit{OCR MEI S3 2008 Q4 [18]}}