OCR Further Statistics AS 2020 November — Question 5 12 marks

Exam Board: OCR
Module: Further Statistics AS
Year: 2020
Session: November
Marks: 12
Topic: Chi-squared test of independence
Type: Cell combining required
Difficulty: Standard +0.3. This is a standard chi-squared test of independence with routine cell combination. Part (a) requires checking expected frequencies against the rule of 5; part (b) is a straightforward calculation of expected frequencies and chi-squared contributions using standard formulas; part (c) compares the test statistic with the critical value; part (d) interprets the largest contributions. All steps are textbook procedures with no novel insight required, making it slightly easier than average.
Spec: 5.06a Chi-squared: contingency tables


# Question 5:

## Part (a)
| Answer | Marks | Guidance |
|---|---|---|
| Expected frequency for Middle/25 to 60 is $16 \times 44 / 160 = 4.4$, which is $< 5$, so cells must be combined | **B1\*ft, depB1** [2] | Correctly obtain this $F_E$, ft on addition errors; "$< 5$" explicit and correct deduction |
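
The check in part (a) can be reproduced with a short script. This is a minimal sketch rather than part of the mark scheme: it assumes the usual formula (row total × column total ÷ grand total) for expected frequencies, and the variable names are chosen here for illustration only.

```python
# Observed frequencies from Table 1 (rows: age group, columns: session).
observed = {
    "< 25": [24, 20, 40],
    "25 to 60": [4, 2, 10],
    "> 60": [28, 22, 10],
}
sessions = ["Early", "Middle", "Late"]

row_totals = {age: sum(counts) for age, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected frequency under independence: row total * column total / grand total.
expected = {
    age: [row_totals[age] * c / grand_total for c in col_totals]
    for age in observed
}

# Collect every cell whose expected frequency is below 5.
small_cells = [
    (age, sessions[j], round(e, 2))
    for age, row in expected.items()
    for j, e in enumerate(row)
    if e < 5
]
print(small_cells)
```

Only the Middle/25 to 60 cell falls below 5, which is why a single row merge is enough.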

## Part (b)
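Expected frequencies (Table 2):

| Age group | Early | Middle | Late |
|---|---|---|---|
| $< 25$ | $29.4$ | $23.1$ | $31.5$ |
| $\geqslant 25$ | $26.6$ | $20.9$ | $28.5$ |

Contributions to $\chi^2$ (Table 3):

| Age group | Early | Middle | Late |
|---|---|---|---|
| $< 25$ | $0.9918$ | $0.4160$ | $2.2937$ |
| $\geqslant 25$ | $1.0962$ | $0.4598$ | $2.5351$ |

| Marks | Guidance |
|---|---|
| **B1** | Both expected frequencies; allow $28.4$ for $28.5$ |
| **B1, B1** [3] | $2.2937$: awrt $2.29$, but allow $2.3$; $2.5351$: in range $[2.53, 2.54]$ |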
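
The completed tables can be checked numerically. The sketch below is an illustration, not the examiner's working: it merges the "25 to 60" and "> 60" rows, then computes each expected frequency and each contribution $(O - E)^2 / E$. All names are hypothetical.

```python
sessions = ["Early", "Middle", "Late"]
under_25 = [24, 20, 40]
aged_25_to_60 = [4, 2, 10]
over_60 = [28, 22, 10]

# Combine the second and third rows of Table 1 into a single ">= 25" row.
observed = [under_25, [a + b for a, b in zip(aged_25_to_60, over_60)]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequencies under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Contribution of each cell to the test statistic: (O - E)^2 / E.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for label, exp_row, con_row in zip(["< 25", ">= 25"], expected, contributions):
    print(label, [round(e, 1) for e in exp_row], [round(c, 4) for c in con_row])
```

The Late column of each printed row gives the entries that were missing from Tables 2 and 3.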

## Part (c)
| Answer | Marks | Guidance |
|---|---|---|
| $H_0$: no association between session and age group. $H_1$: some association | **B1** | Both. Allow "independent" etc |
| $\Sigma X^2 = 7.793$ | **B1** | Correct value of $X^2$, awrt $7.79$ (allow even if wrong in **(b)**) |
| $\nu = 2,\ \chi^2(2)_{\text{crit}} = 5.991$ | **B1** | Correct CV and comparison |
| Reject $H_0$ | **M1ft** | Correct first conclusion, FT on their TS only |
| Significant evidence of association between session attended and age group | **A1ft** [5] | Contextualised, not too assertive |
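
The test in part (c) can be sketched in a few lines. The critical value $5.991$ is taken from the mark scheme rather than computed. The p-value is an extra cross-check that the mark scheme does not ask for; it relies on the closed form $e^{-x/2}$ for the upper tail of $\chi^2$, which holds only for 2 degrees of freedom.

```python
import math

# Observed frequencies after combining rows (rows: < 25, >= 25).
observed = [[24, 20, 40], [32, 24, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Test statistic: sum of (O - E)^2 / E over all cells, with E = r * c / n.
statistic = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(col_totals) - 1)

# 5% critical value for chi-squared with 2 d.f., as quoted in the mark scheme.
critical_value = 5.991

# Upper-tail probability; this closed form is valid for 2 d.f. only.
p_value = math.exp(-statistic / 2)

reject_h0 = statistic > critical_value
print(round(statistic, 3), dof, round(p_value, 4), reject_h0)
```

Since the statistic exceeds the critical value, the script reaches the same decision as the mark scheme: reject $H_0$.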

## Part (d)
| Answer | Marks | Guidance |
|---|---|---|
| The two biggest contributions to $\chi^2$ are both for the late session … when the proportion of younger people is higher, and of older people is lower, than the null hypothesis would suggest | **M1ft, A1ft** [2] | Refer to biggest contribution(s), FT on their answers to **(b)**, needs "reject $H_0$"; Full answer, referring to at least one cell (ignore comments on next highest cells) |
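
One way to see which cells drive the result is to rank the contributions and record whether each observed count lies above or below its expected value. This is an illustrative sketch using the completed Table 2 values; the names and output format are assumptions made here.

```python
sessions = ["Early", "Middle", "Late"]
ages = ["< 25", ">= 25"]
observed = [[24, 20, 40], [32, 24, 20]]
expected = [[29.4, 23.1, 31.5], [26.6, 20.9, 28.5]]  # completed Table 2

# One tuple per cell: (contribution, age group, session, direction of O versus E).
cells = [
    (round((o - e) ** 2 / e, 4), ages[i], sessions[j], "more" if o > e else "fewer")
    for i, (obs_row, exp_row) in enumerate(zip(observed, expected))
    for j, (o, e) in enumerate(zip(obs_row, exp_row))
]

# The two largest contributions to the test statistic.
largest = sorted(cells, reverse=True)[:2]
for contribution, age, session, direction in largest:
    print(f"{age}, {session}: {contribution} ({direction} than expected)")
```

Both of the largest contributions sit in the Late column, with more under-25s and fewer over-25s than independence would predict, matching the comment expected in part (d).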

---
5 At a cinema there are three film sessions each Saturday, "early", "middle" and "late". The numbers of the audience, in different age groups, at the three showings on a randomly chosen Saturday are given in Table 1.

\begin{table}[h]
\begin{center}
\begin{tabular}{|l|l|l|l|l|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Observed frequencies}} & \multicolumn{3}{|c|}{Session} \\
\hline
 &  & Early & Middle & Late \\
\hline
\multirow{3}{*}{Age group} & $< 25$ & 24 & 20 & 40 \\
\hline
 & 25 to 60 & 4 & 2 & 10 \\
\hline
 & $> 60$ & 28 & 22 & 10 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Table 1}
\end{center}
\end{table}

The cinema manager carries out a test of whether there is any association between age group and session attended.
\begin{enumerate}[label=(\alph*)]
\item Show that it is necessary to combine cells in order to carry out the test.

It is decided to combine the second and third rows of the table. Some of the expected frequencies for the table with rows combined, and the corresponding contributions to the $\chi ^ { 2 }$ test statistic, are shown in the following incomplete tables.

\begin{table}[h]
\begin{center}
\begin{tabular}{|l|l|l|l|l|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Expected frequencies}} & \multicolumn{3}{|c|}{Session} \\
\hline
 &  & Early & Middle & Late \\
\hline
\multirow{2}{*}{Age group} & $< 25$ & 29.4 & 23.1 &  \\
\hline
 & $\geqslant 25$ & 26.6 & 20.9 &  \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Table 2}
\end{center}
\end{table}

\begin{table}[h]
\begin{center}
\begin{tabular}{|l|l|l|l|l|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Contribution to $\chi ^ { 2 }$}} & \multicolumn{3}{|c|}{Session} \\
\hline
 &  & Early & Middle & Late \\
\hline
\multirow{2}{*}{Age group} & $< 25$ & 0.9918 & 0.4160 &  \\
\hline
 & $\geqslant 25$ & 1.0962 & 0.4598 &  \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Table 3}
\end{center}
\end{table}
\item In the Printed Answer Booklet, complete both tables.
\item Carry out the test at the $5 \%$ significance level.
\item Use the figures in your completed Table 3 to comment on the numbers of the audience in different age groups.
\end{enumerate}

\hfill \mbox{\textit{OCR Further Statistics AS 2020 Q5 [12]}}