| Exam Board | Edexcel |
|---|---|
| Module | S3 (Statistics 3) |
| Year | 2021 |
| Session | January |
| Marks | 10 |
| Topic | Chi-squared goodness of fit |
| Type | Chi-squared goodness of fit: Uniform |
| Difficulty | Standard +0.3. This is a straightforward chi-squared test question covering standard S3 material. Part (a) requires calculating expected frequencies for a uniform distribution (88/4 = 22) and comparing a given test statistic with a critical value from tables. Part (b) asks for standard hypotheses for a test of independence. Part (c) requires one expected frequency calculation using row and column totals. All steps are routine applications of formulas with no problem-solving insight required, making it slightly easier than average. |
| Spec | 5.06a Chi-squared: contingency tables |
| Answer/Working | Marks | Guidance |
|---|---|---|
| All expected frequencies are $(88 \div 4) = 22$ | B1 | 1st B1 for 22 |
| Degrees of freedom = 3, so critical value $\chi_3^2(5\%) = 7.815$ | B1, B1ft | 2nd B1 for degrees of freedom = 3 (can be implied by sight of 7.815 as cv); 3rd B1ft for 7.815 (or better - cal: 7.8147279..., or correct 5% cv for their d.f.) |
| Not significant so insufficient evidence to suggest not uniformly distributed | B1 | 4th B1 for comment suggesting uniform distribution is a suitable model. Must follow from comparing 6.09 with their cv. Do not allow contradictory statements e.g. "significant" so uniform dist' is suitable |
| **Subtotal: (4 marks)** | | |
| e.g. $H_0$: School is independent of club chosen; $H_1$: Club chosen depends on which school a student is from | B1 | B1 for both hypotheses with some context ("club" and "school" mentioned at least once). Use of "independence" or "association" |
| **Subtotal: (1 mark)** | | |
| $\frac{28 \times 17}{88} = 5.409...$ awrt **5.41** | B1 | B1 for correct expression or awrt 5.41 (allow $\frac{119}{22}$) |
| **Subtotal: (1 mark)** | | |
| Expected frequency for Music and School C is $4.77 < 5$ (Allow $\frac{105}{22}$ for 4.77). So combine Music column with another column giving 3×3 table so 4 df | B1, B1 | 1st B1 for identifying that Music & School C has $E_i$ that is < 5 (a value to 2 sf should be seen, may be in (c), but must state this $E_i < 5$ as well). 2nd B1 for pooling Music with another column leading to 3×3 table and 4 degrees of freedom. Must clearly state the pooling and evidence for 4 df e.g. allow $(3-1) \times (4-1-1)$ [NB test statistic when pooling with Art is 4.3987..., with Sports 4.3247..., with Computers 7.2879...] |
| **Subtotal: (2 marks)** | | |
| Critical value $\chi_4^2(5\%) = 9.488$ | B1 | 1st B1 for 9.488 (or awrt 9.488) |
| [Not significant so] insufficient evidence of an association between school and choice of club | B1 | 2nd B1 for correct, not significant, conclusion mentioning school and clubs |
| **Subtotal: (2 marks)** | | |
**Total: [10 marks]**
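The goodness-of-fit working for part (a) can be checked numerically. The following is a minimal sketch in plain Python, not part of the mark scheme: the variable names are illustrative, and the critical value 7.815 is copied from the mark scheme rather than computed.

```python
# Observed club counts from Table 1 (Music, Art, Sports, Computers).
observed = [14, 28, 27, 19]

n = sum(observed)                                # 88 students in the sample
expected = [n / len(observed)] * len(observed)   # uniform model: 22 per club

# Pearson statistic: sum of (O - E)^2 / E over the four clubs.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1        # 4 categories, 1 constraint (the fixed total)
critical_value = 7.815        # chi-squared(3) at 5%, taken from the mark scheme

# Reject the uniform model only if the statistic exceeds the critical value.
reject_h0 = statistic > critical_value

print(f"expected = {expected[0]:.0f}, statistic = {statistic:.2f}, df = {df}")
print("reject uniform model" if reject_h0
      else "insufficient evidence against uniform model")
```

The computed statistic agrees with the 6.09 quoted in the question, and because it is below 7.815 the result is not significant.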
---
3. The students in a group of schools can choose a club to join. There are 4 clubs available: Music, Art, Sports and Computers. The director collected information about the number of students in each club, using a random sample of 88 students from across the schools. The results are given in Table 1 below.
\begin{table}[h]
\begin{center}
\begin{tabular}{ | c | c | c | c | c | }
\cline { 2 - 5 }
\multicolumn{1}{c|}{} & Music & Art & Sports & Computers \\
\hline
No. of students & 14 & 28 & 27 & 19 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Table 1}
\end{center}
\end{table}
The director uses a chi-squared test to determine whether or not the students are uniformly distributed across the 4 clubs.
\begin{enumerate}[label=(\alph*)]
\item \begin{enumerate}[label=(\roman*)]
\item Find the expected frequencies he should use.
Given that the test statistic he calculated was 6.09 (to 3 significant figures)
\item use a $5 \%$ level of significance to complete the test. You should state the degrees of freedom and the critical value used.
\end{enumerate}
The director wishes to examine the situation in more detail and takes a second random sample of 88 students. The director assumes that within each school, students select their clubs independently. The students come from 3 schools and the distribution of the students from each school amongst the clubs is given in Table 2 below.
\begin{table}[h]
\begin{center}
\begin{tabular}{ | l | c | c | c | c | }
\hline
School Club & Music & Art & Sports & Computers \\
\hline
School $\boldsymbol { A }$ & 3 & 10 & 9 & 8 \\
\hline
School $\boldsymbol { B }$ & 1 & 11 & 13 & 5 \\
\hline
School $\boldsymbol { C }$ & 11 & 6 & 7 & 4 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Table 2}
\end{center}
\end{table}
The director wishes to test for an association between a student's school and the club they choose.
\item State hypotheses suitable for such a test.
\item Calculate the expected frequency for School $C$ and the Computers club.
The director calculates the test statistic to be 7.29 (to 3 significant figures) with 4 degrees of freedom.
\item Explain clearly why his test has 4 degrees of freedom.
\item Complete the test using a $5 \%$ level of significance and stating clearly your critical value.
\end{enumerate}
\hfill \mbox{\textit{Edexcel S3 2021 Q3 [10]}}
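The contingency-table working for parts (c) to (e) can be reproduced in the same way. This is a sketch under stated assumptions: the helper names `expected_table` and `chi_squared` are illustrative, the critical value 9.488 is read from tables, and pooling Music with Computers is inferred from the mark scheme note that this pairing gives 7.2879..., which matches the quoted 7.29.

```python
# Observed counts from Table 2; columns are Music, Art, Sports, Computers.
observed = {
    "A": [3, 10, 9, 8],
    "B": [1, 11, 13, 5],
    "C": [11, 6, 7, 4],
}

def expected_table(rows):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    return [[rt * ct / grand for ct in col_totals] for rt in row_totals]

def chi_squared(rows):
    """Pearson statistic summed over every cell of the table."""
    exp = expected_table(rows)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(rows, exp)
               for o, e in zip(obs_row, exp_row))

rows = list(observed.values())
full_expected = expected_table(rows)

e_c_computers = full_expected[2][3]   # part (c): 28 * 17 / 88
e_c_music = full_expected[2][0]       # 28 * 15 / 88, which is below 5
small_cells = sum(e < 5 for row in full_expected for e in row)

# Assumption: pool Music (index 0) with Computers (index 3), giving a 3x3 table.
pooled = [[r[0] + r[3], r[1], r[2]] for r in rows]

statistic = chi_squared(pooled)
df = (len(pooled) - 1) * (len(pooled[0]) - 1)   # (3 - 1) * (3 - 1)
critical_value = 9.488                          # chi-squared(4) at 5%, from tables
reject_h0 = statistic > critical_value

print(f"E(C, Computers) = {e_c_computers:.2f}, E(C, Music) = {e_c_music:.2f}")
print(f"statistic = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

Pooling is applied to whole columns so that every expected frequency in the reduced table is at least 5, which is why the degrees of freedom drop from 6 to 4.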