Edexcel S3 2021 January — Question 3 10 marks

Exam Board: Edexcel
Module: S3 (Statistics 3)
Year: 2021
Session: January
Marks: 10
Topic: Chi-squared goodness of fit
Type: Chi-squared goodness of fit: Uniform
Difficulty: Standard +0.3. This is a straightforward chi-squared test question covering standard S3 material. Part (a) requires calculating the expected frequencies for a uniform distribution (88/4 = 22) and comparing a given test statistic with a critical value from tables. Part (b) asks for standard hypotheses for a test of independence. Part (c) requires one expected frequency calculated from row and column totals. Parts (d) and (e) ask for a justification of the 4 degrees of freedom and the completion of the second test. All steps are routine applications of formulas with no problem-solving insight required, making it slightly easier than average.
Spec: 5.06a Chi-squared: contingency tables


| Answer/Working | Marks | Guidance |
|---|---|---|
| All expected frequencies are $(88 \div 4) = 22$ | B1 | 1st B1 for 22 |
| Degrees of freedom = 3, so critical value $\chi_3^2(5\%) = 7.815$ | B1, B1ft | 2nd B1 for degrees of freedom = 3 (can be implied by sight of 7.815 as cv); 3rd B1ft for 7.815 (or better - cal: 7.8147279..., or correct 5% cv for their d.f.) |
| Not significant so insufficient evidence to suggest not uniformly distributed | B1 | 4th B1 for comment suggesting uniform distribution is a suitable model. Must follow from comparing 6.09 with their cv. Do not allow contradictory statements e.g. "significant" so uniform dist' is suitable |
| **Subtotal: (4 marks)** | | |
| e.g. $H_0$: School is independent of club chosen; $H_1$: Club chosen depends on which school a student is from | B1 | B1 for both hypotheses with some context ("club" and "school" mentioned at least once). Use of "independence" or "association" |
| **Subtotal: (1 mark)** | | |
| $\frac{28 \times 17}{88} = 5.409...$ awrt **5.41** | B1 | B1 for correct expression or awrt 5.41 (allow $\frac{119}{22}$) |
| **Subtotal: (1 mark)** | | |
| Expected frequency for Music and School C is $4.77 < 5$ (allow $\frac{105}{22}$ for 4.77), so combine the Music column with another column, giving a $3 \times 3$ table and hence 4 df | B1, B1 | 1st B1 for identifying that Music & School C has $E_i < 5$ (a value to 2 sf should be seen, possibly in (c), but the statement $E_i < 5$ must also be made). 2nd B1 for pooling Music with another column, leading to a $3 \times 3$ table and 4 degrees of freedom. Must clearly state the pooling and give evidence for 4 df, e.g. allow $(3-1) \times (4-1-1) = 4$ [NB pooling with Art gives 4.3987..., with Sports 4.3247..., with Computers 7.2879...] |
| **Subtotal: (2 marks)** | | |
| Critical value $\chi_4^2(5\%) = 9.488$ | B1 | 1st B1 for 9.488 (or awrt 9.488) |
| [Not significant so] insufficient evidence of an association between school and choice of club | B1 | 2nd B1 for correct, not significant, conclusion mentioning school and clubs |
| **Subtotal: (2 marks)** | | |

**Total: [10 marks]**
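
The figures quoted in the guidance for parts (c) to (e) can be reproduced with a short script. The following is a minimal sketch in plain Python and is not part of the mark scheme: the helper names (`expected_frequencies`, `chi_squared`, `pool_music_with`) are illustrative, the observed counts are those of Table 2 in the question, and the critical value 9.488 is copied from the mark scheme rather than computed.

```python
# Observed counts from Table 2 (rows: Schools A, B, C; columns as in CLUBS).
CLUBS = ["Music", "Art", "Sports", "Computers"]
OBSERVED = [
    [3, 10, 9, 8],
    [1, 11, 13, 5],
    [11, 6, 7, 4],
]

# 5% critical value for 4 degrees of freedom, as quoted in the mark scheme.
CRITICAL_4DF_5PCT = 9.488


def expected_frequencies(table):
    """Return row total * column total / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Sum (O - E)^2 / E over all cells of a contingency table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def pool_music_with(table, j):
    """Merge the Music column (index 0) with column j, giving a 3x3 table."""
    return [
        [row[0] + row[j]] + [v for k, v in enumerate(row) if k not in (0, j)]
        for row in table
    ]


expected = expected_frequencies(OBSERVED)
e_school_c_computers = expected[2][3]  # part (c)
e_school_c_music = expected[2][0]      # the cell below 5 that forces pooling

# Test statistic for each possible pooling partner of Music.
statistics = {
    CLUBS[j]: chi_squared(pool_music_with(OBSERVED, j)) for j in (1, 2, 3)
}

# A 3x3 table after pooling has (3 - 1) * (3 - 1) degrees of freedom.
degrees_of_freedom = (3 - 1) * (3 - 1)

print(f"E(School C, Computers) = {e_school_c_computers:.3f}")
print(f"E(School C, Music)     = {e_school_c_music:.3f}")
for club, value in statistics.items():
    verdict = "significant" if value > CRITICAL_4DF_5PCT else "not significant"
    print(f"Music pooled with {club}: {value:.4f} ({verdict})")
```

The director's statistic of 7.29 matches the case where Music is pooled with Computers; whichever column is chosen, the statistic stays below 9.488, so the conclusion in (e) does not depend on the pooling choice.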

---
3. The students in a group of schools can choose a club to join. There are 4 clubs available: Music, Art, Sports and Computers. The director collected information about the number of students in each club, using a random sample of 88 students from across the schools. The results are given in Table 1 below.

\begin{table}[h]
\begin{center}
\begin{tabular}{ | c | c | c | c | c | }
\cline { 2 - 5 }
\multicolumn{1}{c|}{} & Music & Art & Sports & Computers \\
\hline
No. of students & 14 & 28 & 27 & 19 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Table 1}
\end{center}
\end{table}

The director uses a chi-squared test to determine whether or not the students are uniformly distributed across the 4 clubs.
\begin{enumerate}[label=(\alph*)]
\item \begin{enumerate}[label=(\roman*)]
\item Find the expected frequencies he should use.

Given that the test statistic he calculated was 6.09 (to 3 significant figures)
\item use a $5 \%$ level of significance to complete the test. You should state the degrees of freedom and the critical value used.
\end{enumerate}

The director wishes to examine the situation in more detail and takes a second random sample of 88 students. The director assumes that within each school, students select their clubs independently. The students come from 3 schools and the distribution of the students from each school amongst the clubs is given in Table 2 below.

\begin{table}[h]
\begin{center}
\begin{tabular}{ | l | c | c | c | c | }
\hline
School / Club & Music & Art & Sports & Computers \\
\hline
School $\boldsymbol { A }$ & 3 & 10 & 9 & 8 \\
\hline
School $\boldsymbol { B }$ & 1 & 11 & 13 & 5 \\
\hline
School $\boldsymbol { C }$ & 11 & 6 & 7 & 4 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Table 2}
\end{center}
\end{table}

The director wishes to test for an association between a student's school and the club they choose.
\item State hypotheses suitable for such a test.
\item Calculate the expected frequency for School $C$ and the Computers club.

The director calculates the test statistic to be 7.29 (to 3 significant figures) with 4 degrees of freedom.
\item Explain clearly why his test has 4 degrees of freedom.
\item Complete the test using a $5 \%$ level of significance and stating clearly your critical value.
\end{enumerate}

\hfill \mbox{\textit{Edexcel S3 2021 Q3 [10]}}
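
As a check on part (a), the quoted test statistic can be reproduced by hand. This is a sketch that assumes the usual goodness-of-fit statistic $\sum (O_i - E_i)^2 / E_i$, which the question does not write out, with $O_i$ the observed counts in Table 1 and $E_i = 22$ for every club:
\[
\sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}
= \frac{(14-22)^2 + (28-22)^2 + (27-22)^2 + (19-22)^2}{22}
= \frac{64 + 36 + 25 + 9}{22}
= \frac{134}{22} \approx 6.09
\]
With 4 classes and the single constraint that the totals agree, there are $4 - 1 = 3$ degrees of freedom. Since $6.09 < 7.815$, the result is not significant at the $5 \%$ level.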