Moderate -0.3 This is a standard chi-squared test of independence with straightforward calculations. Part (a) is basic statistical reasoning, part (b) requires simple expected frequency calculation using row/column totals, and part (c) involves comparing a given test statistic to critical values. The test statistic is provided, eliminating the most tedious calculation. This is slightly easier than average as it's a routine application of a standard technique with no conceptual challenges.
Question 4:
Question | Part | Answer | Marks | AO | Guidance | Additional guidance
4 | (a) | If a sample is random then it is valid to draw (statistical) inferences from it. | B1 [1] | 2.4 | No context necessary; just the ideas that if random then the sample probably represents the population and if this is so then conclusions we draw are likely to be valid. | Needs a reference to the purpose of the sample, e.g. inference, investigation, analysis, test, statistic, conclusion…, and a word that qualifies validity, e.g. unbiased, proper, accurate.
4 | (b) | $800 \times (127/800) \times (m/800)$ or $127 \times (m/800)$, where $m = 432$, $280$ or $88$. Brown: 68.58, Blue: 44.45, Green: 13.97 | M1, A1 [2] | 1.1, 1.1 | Showing any one correct calculation for expected frequency. For this mark condone any confusion between eye colours. At least 3 sf. | If no working shown, all three must be correct to at least 3 s.f. for both marks.
4 | (c) | $H_0$: There is no association between hair colour and eye colour, and $H_1$: There is some association between hair colour and eye colour. Degrees of freedom $= (4 - 1)(3 - 1) = 6$. $\chi^2_6(10\%) = 10.64$. $28.62 > 10.64$ so $H_0$ is rejected. There is sufficient evidence (at the 10\% level) to suggest that there is some association between (natural) hair colour and (natural) eye colour. | B1, B1, B1, M1, A1 [5] | 3.3, 1.1, 3.4, 1.1, 2.2b | Or $H_0$: Hair colour and eye colour are independent; $H_1$: Hair colour and eye colour are not independent. Making correct comparison between given value and their CV and drawing consistent inference. Non-assertive, contextual conclusion from correct critical value. | p-value is $7.18$ (or $7.19$) $\times 10^{-5} < 0.1$ so reject $H_0$.
4 | (d) | $(19 - 13.97)^2 / 13.97 = 1.811$ | B1FT [1] | 1.1 | FT their expected value from (b). | Answer should be quoted to 4 sf. 1.786 comes from using 14 or 14.0.
4 | (e) | The high levels of the ($\chi^2$-) contributions imply that the number of blonde people with blue eyes is different/higher than expected and the number of blonde people with brown eyes is different/lower than expected. The fact that $61 > 44.45$ suggests that more people with blonde hair have blue eyes than would be expected (if there were no association), and $47 < 68.58$ suggests fewer people with blonde hair have brown eyes than expected. | B1, B1 [2] | 3.5a, 3.5a | Ignore comments about blonde hair/green eyes. If B0B0 then SC1 for fewer people with blonde hair have brown eyes than would be expected and more people with blonde hair have blue eyes than expected, provided $61 > 44.45$ and $47 < 68.58$ is quoted.
4 | (f) | The test at the 1\% significance level, since the test statistic exceeding the critical value is less likely to have been caused by random factors (i.e. if the null hypothesis is true). | B1 [1] | 3.5a | Or the chance that $H_0$ is rejected when true is lower, or the chance of a false positive is less. Need a comparative like less or lower. | If the conclusion to (c) is that there is no association then B1 can be awarded for “The test at the 10\% level since if there is a small association it is more likely to be considered significant by this test so the fact that this test did not reject $H_0$ is more informative” oe.
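As a worked check of parts (b) to (d), the following sketch reproduces the mark scheme values from the totals in Table 4.1. The symbols $O$ (observed frequency) and $E$ (expected frequency) are notation assumed here; the mark scheme does not name them.
\[
E = \frac{\mathrm{row\ total} \times \mathrm{column\ total}}{\mathrm{grand\ total}}
\]
\[
E_{\mathrm{brown}} = \frac{432 \times 127}{800} = 68.58, \qquad
E_{\mathrm{blue}} = \frac{280 \times 127}{800} = 44.45, \qquad
E_{\mathrm{green}} = \frac{88 \times 127}{800} = 13.97
\]
The contribution for the green eye, blonde hair cell is then
\[
\frac{(O - E)^2}{E} = \frac{(19 - 13.97)^2}{13.97} = \frac{25.3009}{13.97} \approx 1.811
\]
For (c), the quoted p-value can be checked by hand because the number of degrees of freedom is even. With 6 degrees of freedom, the upper-tail probability of the chi-squared distribution has the closed form below (a standard identity, not part of the mark scheme):
\[
P(\chi^2_6 > x) = e^{-x/2}\left(1 + \frac{x}{2} + \frac{(x/2)^2}{2}\right)
\]
\[
P(\chi^2_6 > 28.62) = e^{-14.31}\,(1 + 14.31 + 102.388) \approx 7.18 \times 10^{-5}
\]
Since this is below $0.1$, it agrees with the critical-value comparison $28.62 > 10.64$.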
4 A genetics researcher is investigating whether there is any association between natural hair colour and natural eye colour. A random sample of 800 adults is selected. Each adult can categorise their natural hair colour as blonde, brown, black or red and their natural eye colour as brown, blue or green.
\begin{enumerate}[label=(\alph*)]
\item Explain the benefit of using a random sample in this investigation.
The data collected from the sample are summarised in Table 4.1.
\begin{table}[h]
\begin{center}
\captionsetup{labelformat=empty}
\caption{Table 4.1}
\begin{tabular}{|l|l|l|l|l|l|l|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Observed frequency}} & \multicolumn{4}{|c|}{Hair Colour} & \\
\cline{3-7}
& & Blonde & Brown & Black & Red & Total \\
\hline
\multirow{3}{*}{Eye Colour} & Brown & 47 & 153 & 196 & 36 & 432 \\
\cline{2-7}
& Blue & 61 & 78 & 115 & 26 & 280 \\
\cline{2-7}
& Green & 19 & 22 & 31 & 16 & 88 \\
\hline
& Total & 127 & 253 & 342 & 78 & 800 \\
\hline
\end{tabular}
\end{center}
\end{table}
The researcher decides to carry out a chi-squared test.
\item Determine the expected frequencies for each eye colour in the blonde hair category.
You are given that the test statistic is 28.62 to 2 decimal places.
\item Carry out the chi-squared test at the 10\% significance level.
Table 4.2 shows the chi-squared contributions for some of the categories. The contributions for the categories relating to green eye colour have been deliberately omitted.
\begin{table}[h]
\begin{center}
\captionsetup{labelformat=empty}
\caption{Table 4.2}
\begin{tabular}{ | c | l | c | c | c | c | }
\hline
\multicolumn{2}{|c|}{\begin{tabular}{ c }
Chi-squared \\
contributions \\
\end{tabular}} & \multicolumn{4}{|c|}{Hair Colour} \\
\cline{3-6}
\multicolumn{2}{|c|}{} & Blonde & Brown & Black & Red \\
\hline
\multirow{3}{*}{\begin{tabular}{ c }
Eye \\
Colour \\
\end{tabular}} & Brown & 6.791 & 1.964 & 0.694 & 0.889 \\
\cline { 2 - 6 }
& Blue & 6.162 & 1.257 & 0.185 & 0.062 \\
\cline { 2 - 6 }
& Green & & & & \\
\hline
\end{tabular}
\end{center}
\end{table}
\item Calculate the chi-squared contribution for the green eye and blonde hair category.
\item With reference to the values in Table 4.2, discuss what the data suggest about brown eye colour and blue eye colour for people with blonde hair.
\item A different researcher, carrying out the same investigation, independently takes a different random sample of size 800 and performs the same hypothesis test, but at the 1\% significance level, reaching the same conclusion as the original test.
By comparing only the significance level of the two tests, specify which test, the one at the 10\% significance level or the one at the 1\% significance level, provides stronger evidence for the conclusion. Justify your answer.
\end{enumerate}
\hfill \mbox{\textit{OCR MEI Further Statistics Minor 2024 Q4 [12]}}
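
The given test statistic of 28.62 can be cross-checked against Table 4.2 by filling in the omitted green eye row. The following is a sketch using the usual rule $E = (\mathrm{row\ total} \times \mathrm{column\ total}) / 800$, with $O$ and $E$ denoting observed and expected frequencies (notation assumed here). The rounded values are worked for illustration and are not given in the question.
\[
\begin{array}{lccc}
\mathrm{Hair\ colour} & O & E = 88 \times \mathrm{column\ total} / 800 & (O - E)^2 / E \\
\hline
\mathrm{Blonde} & 19 & 13.97 & 1.811 \\
\mathrm{Brown} & 22 & 27.83 & 1.221 \\
\mathrm{Black} & 31 & 37.62 & 1.165 \\
\mathrm{Red} & 16 & 8.58 & 6.417 \\
\end{array}
\]
Adding the row totals of the contributions (brown eyes, blue eyes, green eyes) gives
\[
10.338 + 7.666 + 10.614 = 28.618 \approx 28.62
\]
which is consistent with the test statistic quoted in the question.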