Question 4 - A-Level Maths

OCR MEI S3 2008 January — Question 4 18 marks

Exam Board	OCR MEI
Module	S3 (Statistics 3)
Year	2008
Session	January
Marks	18
Topic	Hypothesis test of binomial distributions
Type	Binomial parameters from given information
Difficulty	Standard +0.3. This is a standard chi-squared goodness-of-fit test with straightforward parameter estimation from grouped data. Part (a)(i) requires calculating a weighted mean and relating it to np, part (a)(ii) is a routine application of chi-squared testing with expected frequencies, and part (b) tests basic sampling knowledge. The calculations are mechanical, with no conceptual challenges beyond the A-level statistics curriculum.
Spec	5.02c Linear coding: effects on mean and variance; 5.06b Fit prescribed distribution: chi-squared test

(a) In Germany, towards the end of the nineteenth century, a study was undertaken into the distribution of the sexes in families of various sizes. The table shows some data about the numbers of girls in 500 families, each with 5 children. It is thought that the binomial distribution \(\mathrm{B}(5, p)\) should model these data.

Number of girls	Number of families
0	32
1	110
2	154
3	125
4	63
5	16

(i) Use this information to calculate an estimate for the mean number of girls per family of 5 children. Hence show that 0.45 can be taken as an estimate of \(p\).
(ii) Investigate at a \(5\%\) significance level whether the binomial model with \(p\) estimated as 0.45 fits the data. Comment on your findings and also on the extent to which the conditions for a binomial model are likely to be met.

(b) A researcher wishes to select 50 families from the 500 in part (a) for further study. Suggest what sort of sample she might choose and describe how she should go about choosing it.


Question 4:

Part (a)(i)

Answer	Marks	Guidance
\(\bar{x} = \frac{1125}{500} = 2.25\); For binomial \(E(X) = n \times p\)	B1, M1	Use of mean of binomial distribution. May be implicit.
\(\therefore \hat{p} = \frac{2.25}{5} = 0.45\)	A1	Beware: answer given.
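
The arithmetic behind this estimate can be checked with a short sketch. The frequencies are those given in the question; the names `observed`, `n_children` and `estimate_p` are illustrative choices, not part of the mark scheme.

```python
# Observed data from the question: number of girls -> number of families.
observed = {0: 32, 1: 110, 2: 154, 3: 125, 4: 63, 5: 16}
n_children = 5  # each family has 5 children


def estimate_p(freqs, n):
    """Return (sample mean, estimate of p) for grouped data modelled by B(n, p)."""
    total_families = sum(freqs.values())
    total_girls = sum(girls * count for girls, count in freqs.items())
    mean_girls = total_girls / total_families
    # For B(n, p) the mean is n * p, so p is estimated by mean / n.
    return mean_girls, mean_girls / n


mean_girls, p_hat = estimate_p(observed, n_children)
print(f"mean = {mean_girls:.2f}, p_hat = {p_hat:.2f}")
```

The weighted total is 1125 girls across 500 families, which gives the mean of 2.25 and the estimate 0.45 quoted above.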

Part (a)(ii)

Answer	Marks	Guidance
Expected frequencies calculated: \(f_e(\text{calc})\): 25.164, 102.944, 168.455, 137.827, 56.384, 9.226; \(f_e(\text{tables})\): 25.15, 102.95, 168.45, 137.85, 56.35, 9.25	M1, A1	Calculation of expected frequencies. All correct.
\(X^2 = 1.8571 + 0.4836 + 1.2404 + 1.1938 + 0.7763 + 4.9737 = 10.52(49)\)	M1, A1	c.a.o. Or using tables: \(1.8657 + 0.4828 + 1.2396 + 1.1978 + 0.7848 + 4.9257 = 10.49(64)\)
Refer to \(\chi^2_4\)	M1	Allow correct df \((= \text{cells} - 2)\) from wrongly grouped or ungrouped table, and FT. Otherwise, no FT if wrong.
Upper 5% point is 9.488. Significant. Suggests binomial model does not fit.	A1, A1, A1	No ft from here if wrong. ft only c's test statistic. ft only c's test statistic.
The model appears to overestimate in the middle and to underestimate at the tails. The biggest discrepancy is at \(X = 5\).	E1, E1	Accept also any other sensible comment e.g. at 2.5% significance, the result would NOT have been significant.
A binomial model assumes all trials are independent with a constant probability of "success". It seems unlikely that there will be independence within families and/or that \(p\) will be the same for all families.	E2	(E2, 1, 0) Any sensible comment which addresses independence and constant \(p\).
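
As a rough check on the figures above, the following sketch recomputes the expected frequencies and the test statistic using only the standard library. The variable names are illustrative, and the critical value 9.488 is hard-coded from the mark scheme rather than computed, which is an assumption of this sketch.

```python
from math import comb

# Observed frequencies for 0..5 girls, taken from the question.
observed = [32, 110, 154, 125, 63, 16]
n, p = 5, 0.45  # binomial model B(5, p) with p estimated in part (a)(i)
total = sum(observed)


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Expected frequency for each cell is 500 * P(X = k).
expected = [total * binomial_pmf(k, n, p) for k in range(n + 1)]

# Each cell contributes (O - E)^2 / E to the test statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)

# Degrees of freedom: cells - 1, less 1 more because p was estimated from the data.
df = len(observed) - 1 - 1

# Upper 5% point of chi-squared with 4 degrees of freedom, as quoted in the mark scheme.
CRITICAL_5_PERCENT = 9.488
reject = x2 > CRITICAL_5_PERCENT

print([round(e, 3) for e in expected])
print(round(x2, 4), df, reject)
```

Every expected frequency exceeds 5 (the smallest is about 9.2), so no cells need to be combined, which is why the degrees of freedom are 6 - 2 = 4.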

Part (b)

Answer	Marks	Guidance
She should try to choose a simple random sample	E1
which would involve establishing a sampling frame and using some form of random number generator.	E1, E1	Allow sensible discussion of practical limitations of choosing a random sample. Allow other sensible suggestions. E.g. systematic sample - choosing every tenth family; stratified sample - by the number of girls in a family.
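
One possible way to carry out the selection is sketched below. It assumes the sampling frame is simply the 500 families numbered 1 to 500; the function names and the seed are hypothetical and only make the example reproducible. The systematic alternative mentioned in the guidance is included for comparison.

```python
import random


def simple_random_sample(frame, k, seed=None):
    """Choose k distinct items from the frame, each subset being equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, k)


def systematic_sample(frame, k, seed=None):
    """Choose every (len(frame) // k)-th item, starting from a random position."""
    rng = random.Random(seed)
    step = len(frame) // k
    start = rng.randrange(step)
    return frame[start::step][:k]


# Assumed sampling frame: the 500 families numbered 1..500.
sampling_frame = list(range(1, 501))

srs = simple_random_sample(sampling_frame, 50, seed=2008)
systematic = systematic_sample(sampling_frame, 50, seed=2008)
print(sorted(srs))
print(systematic)
```

With 500 families and a sample of 50, the systematic version takes every tenth family, matching the example given in the guidance.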


