OCR MEI Further Statistics Minor 2021 November — Question 3 13 marks

Exam Board: OCR MEI
Module: Further Statistics Minor
Year: 2021
Session: November
Marks: 13
Topic: Chi-squared goodness of fit
Type: Spreadsheet-based chi-squared test
Difficulty: Standard +0.3. This is a standard chi-squared test for independence with straightforward calculations. Students need to recall validity conditions, compute expected frequencies and contributions using given formulas, perform a hypothesis test, and interpret results. While it requires multiple steps and an understanding of the chi-squared distribution, all techniques are routine for Further Statistics students, with no novel problem-solving required.
Spec: 5.06a Chi-squared: contingency tables

3 A student wants to know whether there is any association between age and whether or not people smoke. The student takes a sample of 120 adults and asks each of them whether or not they smoke. Below is a screenshot showing part of a spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted.
   | A              | B          | C       | D      | E
1  |                |            | Observed frequency
2  |                |            | Age
3  |                |            | 16-34   | 35-59  | 60 and over
4  | Smoking status | Smoker     | 13      | 7      | 3
5  |                | Non-smoker | 28      | 43     | 26
6  |                |            |
7  |                |            | Expected frequency
8  |                |            | 7.8583  |        |
9  |                |            | 33.1417 |        |
10 |                |            |
11 |                |            | Contributions to the test statistic
12 |                |            | 3.3642  | 0.6964 | 1.1775
13 |                |            |         | 0.1651 | 0.2792
14 |                |            |         |        |
  (a) The student wants to carry out a chi-squared test to analyse the data. State a requirement of the sample if the test is to be valid. For the rest of this question, you should assume that this requirement is met.
  (b) Determine the missing values in each of the following cells.
      - E8
      - C13
  (c) In this question you must show detailed reasoning. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is any association between age and smoking status.
  (d) Discuss what the data suggest about the smoking status for each different age group.
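
The expected frequencies and contributions in a spreadsheet of this kind follow the usual contingency-table formulas: expected frequency = row total × column total / grand total, and contribution = (observed − expected)² / expected. A minimal Python sketch, assuming the observed counts are typed in by hand (the variable names are illustrative and are not part of the original spreadsheet), reproduces the visible cells and can be used to check the omitted ones:

```python
# Observed counts from rows 4-5, columns C-E of the spreadsheet.
age_groups = ["16-34", "35-59", "60 and over"]
observed = {
    "Smoker": [13, 7, 3],
    "Non-smoker": [28, 43, 26],
}

# Marginal totals.
row_totals = {status: sum(counts) for status, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected frequency under no association:
# row total * column total / grand total.
expected = {
    status: [row_totals[status] * c / grand_total for c in col_totals]
    for status in observed
}

# Contribution of each cell to the test statistic: (O - E)^2 / E.
contributions = {
    status: [(o - e) ** 2 / e
             for o, e in zip(observed[status], expected[status])]
    for status in observed
}

for status in observed:
    for group, e, c in zip(age_groups, expected[status], contributions[status]):
        print(f"{status:10s} {group:12s} expected={e:.4f} contribution={c:.4f}")
```

Rounded to four decimal places, the first smoker entry should agree with the 7.8583 shown in cell C8 and the first non-smoker entry with the 33.1417 shown in cell C9.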

Question 3:
Part | Answer | Marks | AO | Guidance
3(a) | The sample must be random | B1 [1] | 1.2 |
3(b) | E8: \(\frac{23 \times 29}{120} = 5.5583\) | B1 | 1.1 |
     | C13: \(\frac{(28 - 33.1417)^2}{33.1417}\) | M1 | 1.1a |
     | \(= 0.7977\) | A1 [3] | 1.1 |
3(c) | \(H_0\): No association between age and smoking (status). \(H_1\): Some association between age and smoking (status) | B1 | 3.4 | Both hypotheses needed. Use of ‘correlation’ in place of ‘association’ is B0
     | Degrees of freedom = 2 | B1 | 3.3 |
     | Critical value = 5.991 | B1 | 1.1 | or p-value = 0.0392, or \(\chi^2_2(6.4801) = 0.9608\)
     | Test statistic = 3.3642 + 0.6964 + ... + 0.2792 = 6.4801 | B1FT | 1.1 | FT their value of C13
     | 6.4801 > 5.991 | M1 | 2.2b | or 0.9608 > 0.95 or 0.0392 < 0.05. Comparing their test and critical values leading to a conclusion
     | There is sufficient evidence at the 5% level to suggest that there is association between age and smoking (status) | A1 [6] | 3.5a | Correct test and critical values required. Use of ‘correlation’ in place of ‘association’ is A0. Conclusion in context
3(d) | For 16-34 year olds the contribution of 3.3642 suggests that more are smokers than would be expected. | E1 | 2.3 | Max of 2 marks out of 3 if no contributions are mentioned. Allow equivalent statements about non-smokers. Should take each age group in turn and discuss status. Max 2 marks if done differently
     | For 35-59 year olds things are (approximately) as expected if there were no association. | E1 | 3.5a |
     | For people aged 60 and over the contribution of 1.1775 suggests that fewer are smokers than would be expected. | E1 [3] | 3.2a |
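
As a rough check of the working in part (c), the sketch below sums the six contributions quoted in the mark scheme and compares the total with the 5% critical value. It relies on a property specific to this question: with 2 degrees of freedom the chi-squared distribution is exponential with mean 2, so the upper-tail probability is exp(−x/2) and no statistics library is needed. For any other number of degrees of freedom this shortcut does not apply. The variable names are illustrative.

```python
import math

# Contributions (O - E)^2 / E as quoted in the mark scheme, to 4 d.p.
contributions = [3.3642, 0.6964, 1.1775, 0.7977, 0.1651, 0.2792]
test_statistic = sum(contributions)

# Degrees of freedom for a 2 x 3 contingency table.
rows, cols = 2, 3
df = (rows - 1) * (cols - 1)

alpha = 0.05

# ASSUMPTION: df == 2, so P(X > x) = exp(-x / 2).
# Inverting this gives the critical value -2 * ln(alpha).
assert df == 2, "closed form below is only valid for 2 degrees of freedom"
critical_value = -2 * math.log(alpha)
p_value = math.exp(-test_statistic / 2)

reject_h0 = test_statistic > critical_value

print(f"test statistic = {test_statistic:.4f}")
print(f"critical value = {critical_value:.3f}")
print(f"p-value        = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The comparison of the test statistic with the critical value and the comparison of the p-value with 0.05 are equivalent, which is why the mark scheme accepts either route.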
3 A student wants to know whether there is any association between age and whether or not people smoke. The student takes a sample of 120 adults and asks each of them whether or not they smoke. Below is a screenshot showing part of a spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted.

\begin{center}
\begin{tabular}{|l|l|l|l|l|l|}
\hline
 & A & B & C & D & E \\
\hline
1 & \multicolumn{2}{|c|}{\multirow{3}{*}{}} & \multicolumn{3}{|c|}{Observed frequency} \\
\hline
2 &  &  & \multicolumn{3}{|c|}{Age} \\
\hline
3 &  &  & 16-34 & 35-59 & 60 and over \\
\hline
4 & \multirow{2}{*}{Smoking status} & Smoker & 13 & 7 & 3 \\
\hline
5 &  & Non-smoker & 28 & 43 & 26 \\
\hline
6 &  &  & \multicolumn{3}{|c|}{} \\
\hline
7 &  &  & \multicolumn{3}{|c|}{Expected frequency} \\
\hline
8 &  &  & 7.8583 &  &  \\
\hline
9 &  &  & 33.1417 &  &  \\
\hline
10 &  &  & \multicolumn{3}{|c|}{} \\
\hline
11 &  &  & \multicolumn{3}{|c|}{Contributions to the test statistic} \\
\hline
12 &  &  & 3.3642 & 0.6964 & 1.1775 \\
\hline
13 &  &  &  & 0.1651 & 0.2792 \\
\hline
14 &  &  &  &  &  \\
\hline
\end{tabular}
\end{center}
\begin{enumerate}[label=(\alph*)]
\item The student wants to carry out a chi-squared test to analyse the data.

State a requirement of the sample if the test is to be valid.

For the rest of this question, you should assume that this requirement is met.
\item Determine the missing values in each of the following cells.

\begin{itemize}
  \item E8
  \item C13
\end{itemize}
\item In this question you must show detailed reasoning.

Carry out a hypothesis test at the $5 \%$ significance level to investigate whether there is any association between age and smoking status.
\item Discuss what the data suggest about the smoking status for each different age group.
\end{enumerate}

\hfill \mbox{\textit{OCR MEI Further Statistics Minor 2021 Q3 [13]}}