| Exam Board | Edexcel |
|---|---|
| Module | FS1 AS (Further Statistics 1 AS) |
| Year | 2021 |
| Session | June |
| Marks | 10 |
| Topic | Chi-squared goodness of fit |
| Type | Chi-squared goodness of fit: Binomial |
| Difficulty | Standard +0.3. This is a standard chi-squared goodness of fit test with a binomial distribution. Part (a) requires simple subtraction to find the missing expected frequencies, part (b) is a routine hypothesis test following a standard template (combining cells, calculating the test statistic, comparing it with the critical value), and part (c) involves finding the sample mean and converting it to a probability estimate. All steps are procedural with no novel insight required, making it slightly easier than average. |
| Spec | 5.02b Expectation and variance: discrete random variables; 5.02c Linear coding: effects on mean and variance; 5.06b Fit prescribed distribution: chi-squared test; 5.06c Fit other distributions: discrete and continuous |
# Question 1:
## Part (a)
| Answer/Working | Mark | Guidance |
|---|---|---|
| $r = 125 \times P(X = 2)$ **or** $s = 125 \times (1 - P(X \leq 4))$ **or** $s = 125 - (24.42 + 40.70 + \text{"their } r\text{"} + 17.45 + 6.73)$ **or** $r = 125 - (24.42 + 40.70 + \text{"their } s\text{"} + 17.45 + 6.73)$ | M1 (AO3.4) | A correct method for finding $r$ or $s$; implied by a correct value for $s$ or $r$ |
| $r = \text{awrt } 33.07$, $s = \text{awrt } 2.63$ | A1 (AO1.1b) | Both correct and to two decimal places |
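
As a check on part (a), the expected frequencies can be reproduced directly from the binomial probability function. The sketch below is a minimal Python illustration, assuming only the standard library (`math.comb`, Python 3.8+); the names `binom_pmf`, `N_PACKETS`, `N_SEEDS` and `P_FAIL` are hypothetical and not part of the mark scheme. It computes $r = 125 \times P(X = 2)$ and $s = 125 \times (1 - P(X \leq 4))$ for $X \sim B(40, 0.04)$.

```python
from math import comb

N_PACKETS = 125  # packets planted by Amodita
N_SEEDS = 40     # seeds per packet
P_FAIL = 0.04    # advertised probability that a seed does not germinate


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Expected frequencies for exactly 0, 1, 2, 3, 4 non-germinating seeds
expected = [N_PACKETS * binom_pmf(k, N_SEEDS, P_FAIL) for k in range(5)]

r = expected[2]                # 125 * P(X = 2)
s = N_PACKETS - sum(expected)  # 125 * (1 - P(X <= 4)), the "5 or more" cell

print(", ".join(f"{e:.2f}" for e in expected))
print(f"r = {r:.2f}, s = {s:.2f}")
```

Here $s$ is found from unrounded probabilities; subtracting the rounded table values from 125, as in the mark scheme, also gives 2.63.
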
## Part (b)
| Answer/Working | Mark | Guidance |
|---|---|---|
| $H_0$: $B(40, 0.04)$ is a suitable model; $H_1$: $B(40, 0.04)$ is not a suitable model | B1 (AO3.4) | Both hypotheses correct. Must include $B(40, 0.04)$ in at least one hypothesis |
| Cells are combined when expected frequencies $< 5$, therefore combine last 2 cells | M1 (AO2.1) | May be implied by awrt 9.75/9.76 but not $df = 4$ |
| $\chi^2 = \frac{(15-24.42)^2}{24.42} + \frac{(35-40.70)^2}{40.70} + \cdots + \frac{(15-(6.73+\text{"}2.63\text{"}))^2}{6.73+\text{"}2.63\text{"}}$ | M1 (AO1.1b) | Correct method for finding $\chi^2$; if no method is shown, the value must be correct (allow M0 M1 for awrt 10.1) |
| $= 9.752$ | A1 (AO1.1b) | awrt 9.75/9.76 |
| Degrees of freedom $= 5 - 1 = 4$ | B1 (AO1.1b) | For use of one constraint, e.g. sight of $5-1=4$; **or** just 4 if working shows cells combined or a $\chi^2$ value of awrt 9.8; **or** allow just 5 if working shows cells not combined or a $\chi^2$ value of awrt 10.1 |
| There is significant evidence to reject $H_0$ as $9.752 > 9.488$, therefore Amodita's model is not supported | A1cao (AO3.5a) | Dep on both M marks. Correct conclusion in context; must have sight of CV 9.488 (condone 9.49). Allow $B(40, 0.04)$ is not a suitable model. NB condone missing distribution if already penalised in hypotheses |
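
The part (b) calculation can be sketched as follows. This is an illustrative Python version, assuming the rounded expected frequencies from the question's table; the helper names `pool_upper_tail` and `chi2_sf_even_df` are hypothetical. It pools the upper tail until the last expected frequency is at least 5, sums $(O-E)^2/E$, and removes one degree of freedom for the constraint that the totals agree. The critical value 9.488 is the tabulated upper 5% point of $\chi^2_4$; as an extra cross-check, the code also computes the upper-tail probability using the closed form valid for an even number of degrees of freedom, $P(\chi^2_{2m} > x) = e^{-x/2}\sum_{k=0}^{m-1} (x/2)^k/k!$.

```python
from math import exp

# Observed and expected frequencies for 0, 1, 2, 3, 4, "5 or more" seeds not germinating.
# Observed "5 or more" = 5 + 0 (the "6 or more" class was empty).
observed = [15, 35, 38, 22, 10, 5]
expected = [24.42, 40.70, 33.07, 17.45, 6.73, 2.63]

CRITICAL_VALUE = 9.488  # upper 5% point of chi-squared with 4 df, from tables


def pool_upper_tail(obs, exp_, threshold=5.0):
    """Merge the last cell into its neighbour while its expected frequency is below threshold.

    Only the upper tail is pooled here, which is all this data set needs.
    """
    obs, exp_ = list(obs), list(exp_)
    while len(exp_) > 1 and exp_[-1] < threshold:
        last_o, last_e = obs.pop(), exp_.pop()  # remove the small last cell first
        obs[-1] += last_o                       # then add it to the new last cell
        exp_[-1] += last_e
    return obs, exp_


def chi2_sf_even_df(x, df):
    """P(chi-squared_df > x) for even df, via exp(-x/2) * sum_{k<df/2} (x/2)^k / k!."""
    half = x / 2
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total


obs_pooled, exp_pooled = pool_upper_tail(observed, expected)
stat = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1  # one constraint (totals agree); no parameters estimated
p_value = chi2_sf_even_df(stat, df)

print(f"pooled expected: {[round(e, 2) for e in exp_pooled]}")
print(f"chi^2 = {stat:.3f}, df = {df}, p = {p_value:.4f}")
print("reject H0" if stat > CRITICAL_VALUE else "do not reject H0")
```

The upper-tail probability comes out at about 0.045, just under 0.05, which is consistent with 9.752 only narrowly exceeding the critical value 9.488.
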
## Part (c)
| Answer/Working | Mark | Guidance |
|---|---|---|
| $\frac{[0\times15+]1\times35+2\times38+3\times22+4\times10+5\times5}{125\times40} = [0.0484]$ | M1 (AO1.1b) | Using the data to find a value of $p$ |
| $p = 0.0484$ | A1 (AO1.1b) | Allow any of 0.048, 0.05 if working shown |
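
The part (c) estimate equates the sample mean number of non-germinating seeds per packet with the binomial mean $E(X) = 40p$, so $\hat{p} = \bar{x}/40$. A minimal Python sketch of that arithmetic, assuming the "6 or more" class contributes nothing because its frequency is 0 (variable names are illustrative only):

```python
N_SEEDS = 40  # seeds per packet, as in B(40, p)

# Frequency of packets by number of non-germinating seeds ("6 or more" had frequency 0)
frequency = {0: 15, 1: 35, 2: 38, 3: 22, 4: 10, 5: 5}

n_packets = sum(frequency.values())                        # 125 packets
total_failures = sum(k * f for k, f in frequency.items())  # 242 seeds did not germinate
sample_mean = total_failures / n_packets                   # estimate of E(X) = 40p
p_hat = sample_mean / N_SEEDS                              # equivalently 242 / (125 * 40)

print(f"mean = {sample_mean:.3f}, p = {p_hat:.4f}")
```

This gives $\bar{x} = 1.936$ and $\hat{p} = 242/5000 = 0.0484$, matching the mark scheme.
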
\begin{enumerate}
\item Flobee sells tomato seeds in packets, each containing 40 seeds. Flobee advertises that only 4\% of its tomato seeds do not germinate.
\end{enumerate}
Amodita is investigating the germination of Flobee's tomato seeds. She plants 125 packets of Flobee's tomato seeds and records the number of seeds that do not germinate in each packet.
\begin{center}
\begin{tabular}{ | l | c | c | c | c | c | c | c | }
\hline
Number of seeds that do not germinate & 0 & 1 & 2 & 3 & 4 & 5 & 6 or more \\
\hline
Frequency & 15 & 35 & 38 & 22 & 10 & 5 & 0 \\
\hline
\end{tabular}
\end{center}
Amodita wants to test whether the binomial distribution $\mathrm { B } ( 40,0.04 )$ is a suitable model for these data.
The table below shows the expected frequencies, to 2 decimal places, using this model.
\begin{center}
\begin{tabular}{ | l | c | c | c | c | c | c | }
\hline
Number of seeds that do not germinate & 0 & 1 & 2 & 3 & 4 & 5 or more \\
\hline
Expected Frequency & 24.42 & 40.70 & $r$ & 17.45 & 6.73 & $s$ \\
\hline
\end{tabular}
\end{center}
(a) Calculate the value of $r$ and the value of $s$\\
(b) Stating your hypotheses clearly, carry out the test at the $5 \%$ level of significance. You should state the number of degrees of freedom, critical value and conclusion clearly.
Amodita believes that Flobee should use a more realistic value for the percentage of their tomato seeds that do not germinate.\\
She decides to test the data using a new model $\mathrm { B } ( 40 , p )$\\
(c) Showing your working, suggest a more realistic value for $p$
\hfill \mbox{\textit{Edexcel FS1 AS 2021 Q1 [10]}}