Chi-squared test of independence

A question is this type if and only if it involves testing whether two categorical variables are independent using a contingency table and chi-squared test.

157 questions · Standard +0.2

AQA Further AS Paper 2 Statistics 2024 June Q2
1 mark Easy -1.2
2 A test for association is to be carried out. The tables below show the observed frequencies and the expected frequencies that are to be used for the test.
| Observed | X | Y | Z |
| --- | --- | --- | --- |
| A | 28 | 6 | 66 |
| B | 8 | 8 | 4 |
| C | 54 | 16 | 10 |

| Expected | X | Y | Z |
| --- | --- | --- | --- |
| A | 45 | 15 | 40 |
| B | 9 | 3 | 8 |
| C | 36 | 12 | 32 |
It is necessary to merge some rows or columns before the test can be carried out.
Find the entry in the tables that provides evidence for this.
Circle your answer.
[1 mark]
Observed A-Z
Observed B-Z
Expected A-X
Expected B-Y
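As an illustration of how the need to merge can be spotted, the sketch below recomputes each expected frequency as row total × column total ÷ grand total and lists any that fall below 5. The threshold of 5 is the usual textbook rule and is an assumption here, as are the variable names.

```python
# Observed frequencies as read from the question (rows A, B, C; columns X, Y, Z).
rows = ["A", "B", "C"]
cols = ["X", "Y", "Z"]
observed = [[28, 6, 66],
            [8, 8, 4],
            [54, 16, 10]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

# Cells whose expected frequency is below the assumed threshold of 5.
small_cells = [(rows[i], cols[j], expected[i][j])
               for i in range(len(rows))
               for j in range(len(cols))
               if expected[i][j] < 5]
print(small_cells)
```

The check is made on the expected frequencies, not the observed ones, so a small observed count on its own does not force a merge.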
AQA Further Paper 3 Statistics 2019 June Q6
9 marks Standard +0.3
6 During August, 102 candidates took their driving test at centre \(A\) and 60 passed. During the same month, 110 candidates took their driving test at centre \(B\) and 80 passed.
  1. Test whether the driving test result is independent of the driving test centre using the \(5 \%\) level of significance.
  2. Rebecca claims that if the result of the test in part (a) is to reject the null hypothesis then it is easier to pass a driving test at centre \(B\) than centre \(A\). State, with a reason, whether or not you agree with Rebecca's claim.
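A minimal sketch of the calculation for this kind of \(2 \times 2\) table follows. It assumes Yates' continuity correction is applied because the table is \(2 \times 2\), and it takes the \(5 \%\) critical value for 1 degree of freedom (3.841) from standard tables; both are assumptions about the convention in use, not something stated in the question.

```python
# Observed frequencies: rows are centres A and B, columns are passed and failed.
observed = [[60, 102 - 60],
            [80, 110 - 80]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand_total = sum(row_totals)

expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

# Yates' continuity correction (assumed here because the table is 2x2).
statistic = sum((abs(o - e) - 0.5) ** 2 / e
                for o_row, e_row in zip(observed, expected)
                for o, e in zip(o_row, e_row))

CRITICAL_5_PERCENT_DF1 = 3.841  # from standard chi-squared tables
reject_h0 = statistic > CRITICAL_5_PERCENT_DF1
print(round(statistic, 3), reject_h0)
```

The test only detects association. It says nothing about direction, so any claim about which centre is easier needs a separate comparison of the observed and expected pass counts.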
AQA Further Paper 3 Statistics 2020 June Q8
6 marks Standard +0.3
8 Ray is conducting a hypothesis test with the hypotheses
\(\mathrm { H } _ { 0 }\) : There is no association between time of day and number of snacks eaten
\(\mathrm { H } _ { 1 }\) : There is an association between time of day and number of snacks eaten
He calculates expected frequencies correct to two decimal places, which are given in the following table.
| Time of day | Number of snacks eaten: 0 | 1 | 2 or more |
| --- | --- | --- | --- |
| Day | 23.68 | 21.05 | 5.26 |
| Night | 21.32 | 18.95 | 4.74 |
Ray calculates his test statistic using \(\sum \frac { ( O - E ) ^ { 2 } } { E }\)
  1. State, with a reason, the error Ray has made and describe any changes Ray will need to make to his test.
  2. Having made the necessary corrections as described in part (a), the correct value of the test statistic is 8.74. Complete Ray's hypothesis test using a \(1 \%\) level of significance.
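The sketch below illustrates one way to handle a table like Ray's, assuming the usual rule that every expected frequency should be at least 5. It finds the small cell, merges the last two columns, and compares the given statistic of 8.74 with the \(1 \%\) critical value for the reduced table. The critical value 6.635 is taken from standard tables and the variable names are hypothetical.

```python
# Expected frequencies from the question: one row per time of day,
# columns for 0, 1 and "2 or more" snacks.
expected = [[23.68, 21.05, 5.26],
            [21.32, 18.95, 4.74]]

# Cells (row index, column index) with an expected frequency below 5.
small = [(r, c) for r, row in enumerate(expected)
         for c, e in enumerate(row) if e < 5]

# Merge the last two columns into a single "1 or more" column.
merged = [[row[0], round(row[1] + row[2], 2)] for row in expected]

# Degrees of freedom for the merged table: (rows - 1) * (columns - 1).
df = (len(merged) - 1) * (len(merged[0]) - 1)

CRITICAL_1_PERCENT_DF1 = 6.635  # from standard chi-squared tables
test_statistic = 8.74           # value given in the question
reject_h0 = test_statistic > CRITICAL_1_PERCENT_DF1
print(small, merged, df, reject_h0)
```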
AQA Further Paper 3 Statistics 2021 June Q6
7 marks Moderate -0.5
6 Danai is investigating the number of speeding offences in different towns in a country. She carries out a hypothesis test to test for association between town and number of speeding offences per year.
  1. State the hypotheses for this test.
  2. The observed frequencies, \(O\), have been collected and the expected frequencies, \(E\), have been calculated in an \(n \times m\) contingency table, where \(n > 3\) and \(m > 3\). One of the values of \(E\) is less than 5.
    1. Explain what steps Danai should take before calculating the test statistic.
    2. State an expression for the test statistic Danai should calculate.
  3. Danai correctly calculates the value of the test statistic to be 45.22. The number of degrees of freedom for the test is 25.
    Determine the outcome of Danai's test, using the \(1 \%\) level of significance.
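Where tables are not to hand, the upper-tail probability of a chi-squared distribution with integer degrees of freedom can be computed directly. The sketch below is one possible approach, not a prescribed method: it uses the recurrence \(Q(a+1, z) = Q(a, z) + z^{a} e^{-z} / \Gamma(a+1)\) for the regularised upper incomplete gamma function, starting from \(Q(1, z) = e^{-z}\) or \(Q(\tfrac12, z) = \operatorname{erfc}(\sqrt z)\). The function name `chi2_sf` is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    # Step a up to df / 2, adding z^a * e^(-z) / Gamma(a + 1) each time.
    while a < df / 2:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1))
        a += 1
    return q

# Danai's test: statistic 45.22 on 25 degrees of freedom, 1% level.
p_value = chi2_sf(45.22, 25)
reject_h0 = p_value < 0.01
print(p_value, reject_h0)
```

Comparing the p-value with 0.01 is equivalent to comparing the statistic with the tabulated \(1 \%\) critical value for 25 degrees of freedom.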
AQA Further Paper 3 Statistics 2023 June Q5
8 marks Standard +0.3
5 A school management team oversees 11 different schools.
The school management team allows each student in the schools to choose one enrichment activity from 11 possible activities. The school management team count the number of students in each school choosing each of the possible activities. They then conduct a \(\chi ^ { 2 }\)-test for association with the data they have gathered.
  1. Exactly one of the calculated expected frequencies for the \(\chi ^ { 2 }\)-test is less than 5.
    Explain why the number of degrees of freedom for the test is 90.
  2. The school management team claims that there is an association between the school a student attends and the activity they choose. The test statistic is 124.8. Test the claim using the \(1 \%\) level of significance.
  3. During the hypothesis test, the value of \(\frac { ( O - E ) ^ { 2 } } { E }\), where \(O\) is the observed frequency and \(E\) is the expected frequency, was calculated for each group of students. The values for four groups of students are shown in the table below.
    | Group | \(\frac { ( O - E ) ^ { 2 } } { E }\) |
    | --- | --- |
    | Attends school 3 and chose activity 1 | 0.01 |
    | Attends school 8 and chose activity 3 | 18.5 |
    | Attends school 8 and chose activity 7 | 24.2 |
    | Attends school 11 and chose activity 7 | 49.0 |
    State, with a reason, which of the four groups of students represents the strongest source of association.
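The three parts can be sketched numerically as follows. This assumes the single small expected frequency is dealt with by merging two rows (merging two columns gives the same count), and it takes the \(1 \%\) critical value for 90 degrees of freedom (124.116) from standard tables. The dictionary keys are hypothetical labels.

```python
schools, activities = 11, 11

# Assumption: merging two rows leaves a 10 x 11 table.
merged_schools = schools - 1
df = (merged_schools - 1) * (activities - 1)

CRITICAL_1_PERCENT_DF90 = 124.116  # from standard chi-squared tables
test_statistic = 124.8             # value given in the question
reject_h0 = test_statistic > CRITICAL_1_PERCENT_DF90

# Contributions (O - E)^2 / E for the four groups in the table.
contributions = {
    "school 3, activity 1": 0.01,
    "school 8, activity 3": 18.5,
    "school 8, activity 7": 24.2,
    "school 11, activity 7": 49.0,
}
# The largest contribution marks the strongest source of association.
strongest = max(contributions, key=contributions.get)
print(df, reject_h0, strongest)
```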
AQA Further Paper 3 Statistics 2024 June Q9
11 marks Standard +0.3
9 A company owns three shops, A, B and C, which are based in different towns. Each shop gives a questionnaire to 250 of their customers, and every customer completes the questionnaire. One of the questions asks whether the customer rates the shop as good, satisfactory or poor. For shop A, 26\% of customers rate the shop as good and 38\% of customers rate the shop as poor. For shop B, 32\% of customers rate the shop as good and 40\% of customers rate the shop as satisfactory. Altogether, there are 210 good ratings and 261 satisfactory ratings.
  1. Complete the following table with the observed frequencies.
    | Shop | Rating: Good | Satisfactory | Poor |
    | --- | --- | --- | --- |
    | A | | | |
    | B | | | |
    | C | | | |
  2. Carry out a test for association between shop and rating, using the 1\% level of significance.
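A sketch of how the table can be filled in from the percentages and totals, followed by the test statistic, is given below. It is illustrative only: the \(1 \%\) critical value for 4 degrees of freedom (13.277) is taken from standard tables, and the variable names are hypothetical.

```python
N = 250  # questionnaires per shop

# Shop A: 26% good, 38% poor, the rest satisfactory.
a_good, a_poor = 26 * N // 100, 38 * N // 100
a_sat = N - a_good - a_poor
# Shop B: 32% good, 40% satisfactory, the rest poor.
b_good, b_sat = 32 * N // 100, 40 * N // 100
b_poor = N - b_good - b_sat
# Shop C: deduced from the overall totals of 210 good and 261 satisfactory.
c_good = 210 - a_good - b_good
c_sat = 261 - a_sat - b_sat
c_poor = N - c_good - c_sat

# Rows: shops A, B, C. Columns: good, satisfactory, poor.
observed = [[a_good, a_sat, a_poor],
            [b_good, b_sat, b_poor],
            [c_good, c_sat, c_poor]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand_total = sum(row_totals)
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

statistic = sum((o - e) ** 2 / e
                for o_row, e_row in zip(observed, expected)
                for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_1_PERCENT_DF4 = 13.277  # from standard chi-squared tables
reject_h0 = statistic > CRITICAL_1_PERCENT_DF4
print(observed, round(statistic, 3), df, reject_h0)
```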
OCR FS1 AS 2021 June Q3
12 marks Standard +0.3
3 At a cinema there are three film sessions each Saturday, "early", "middle" and "late". The numbers of the audience, in different age groups, at the three showings on a randomly chosen Saturday are given in Table 1.
| Question | Solution | Marks | AOs | Guidance |
| --- | --- | --- | --- | --- |
| 1(a) | -0.954 BC | B2 [2] | 1.1, 1.1 | SC: If B0, give B1 if two of 7.04, 29.0[4], -13.6[4] (or 35.2, 145[.2], -68.2) seen |
| 1(b) | Points lie close to a straight line. Line has negative gradient | B1 B1 [2] | 2.2b, 1.1 | Must refer to line, not just "negative correlation" |
| 1(c) | No, it will be the same as \(x \rightarrow a\) is a linear transformation | B1 [1] | 2.2a | OE. Either "same" with correct reason, or "disagree" with correct reason. Allow any clear valid technical term |
| 2(a) | Neither | B1 [1] | 1.2 | |
| 2(b) | \(q = 1.13 + 0.620 p\) | B1 B1 B1 [3] | 1.1, 1.1, 1.1 | 0.62(0) correct; both numbers correct. Fully correct answer including letters |
| 2(c)(i) | 2.68 | B1ft [1] | 1.1 | awrt 2.68, ft on their (b) if letters correct |
| 2(c)(ii) | 2.5 is within data range, and points (here) are close to line/well correlated | B1 B1 [2] | 2.2b, 2.2b | At least one reason, allow "no because points not close to line". Full argument, two reasons needed |
| 2(d) | Not much data here/points scattered/possible outliers. So not very reliable | M1 A1 [2] | 2.3, 1.1 | Reason for not very reliable (not "extrapolation"). Full argument and conclusion, not too assertive (not wholly unreliable!) |
| 3(a) | Expected frequency for Middle/25 to 60 is 4.4 which is < 5 so must combine cells | B1*ft depB1 [2] | 2.4, 3.5b | Correctly obtain this \(F _ { E }\), ft on addition errors. "< 5" explicit and correct deduction |
| 3(b) | Expected frequencies (Early, Middle, Late): first row 29.4, 23.1, 31.5; second row 26.6, 20.9, 28.5. Contributions to \(X ^ { 2 }\) (Early, Middle, Late): first row 0.9918, 0.4160, 2.2937; second row 1.0962, 0.4598, 2.5351 | B1 | 1.1 | Both, allow 28.4 for 28.5. awrt 2.29, but allow 2.3. In range [2.53, 2.54] |
| Question | Solution | Marks | AOs | Guidance |
| --- | --- | --- | --- | --- |
| 3(c) | \(\mathrm { H } _ { 0 }\) : no association between session and age group. \(\mathrm { H } _ { 1 }\) : some association | B1 | 1.1 | Both. Allow "independent" etc |
| | \(\Sigma X ^ { 2 } = 7.793\) | B1 | 1.1 | Correct value of \(X ^ { 2 }\), awrt 7.79 (allow even if wrong in (b)) |
| | \(\nu = 2 , \chi ^ { 2 } ( 2 ) _ { \text {crit } } = 5.991\) | B1 | 1.1 | Correct CV and comparison |
| | Reject \(\mathrm { H } _ { 0 }\). | M1ft | 1.1 | Correct first conclusion, FT on their TS only |
| | Significant evidence of association between session attended and age group. | A1ft [5] | 2.2b | Contextualised, not too assertive |
| 3(d) | The two biggest contributions to \(\chi ^ { 2 }\) are both for the late session, when the proportion of younger people is higher, and of older people is lower, than the null hypothesis would suggest. | M1ft A1ft [2] | 1.1, 2.4 | Refer to biggest contribution(s), FT on their answers to (b), needs "reject \(\mathrm { H } _ { 0 }\)". Full answer, referring to at least one cell (ignore comments on next highest cells) |
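The figures in the mark scheme for question 3 can be cross-checked with a short sketch: summing the six contributions listed for part (b) should reproduce the test statistic quoted in part (c), and the two largest contributions should both sit in the Late column, as part (d) states. The row order is taken as given in the mark scheme, since the age-group labels are not shown, and the critical value 5.991 is the one quoted there.

```python
sessions = ["Early", "Middle", "Late"]

# Contributions (O - E)^2 / E from part (b), one row per combined age group.
contributions = [[0.9918, 0.4160, 2.2937],
                 [1.0962, 0.4598, 2.5351]]

statistic = sum(sum(row) for row in contributions)
df = (len(contributions) - 1) * (len(sessions) - 1)

CRITICAL_5_PERCENT_DF2 = 5.991  # value quoted in the mark scheme
reject_h0 = statistic > CRITICAL_5_PERCENT_DF2

# Sessions of the two largest contributions.
cells = [(value, sessions[j]) for row in contributions
         for j, value in enumerate(row)]
top_two_sessions = [s for _, s in sorted(cells, reverse=True)[:2]]
print(round(statistic, 3), df, reject_h0, top_two_sessions)
```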
| Question | Solution | Marks | AOs | Guidance |
| --- | --- | --- | --- | --- |
| 4 | OR: \(\frac { { } ^ { 2 m } C _ { 2 } \times m } { { } ^ { 3 m } C _ { 3 } }\) | M1 M1 | 3.1b, 3.1b | Use \({ } ^ { 2 m } C _ { 2 }\) and \(m\). Divide by \({ } ^ { 3 m } C _ { 3 }\) |
| | \(= \frac { 2 m ( 2 m - 1 ) } { 2 } \times m \div \frac { 3 m ( 3 m - 1 ) ( 3 m - 2 ) } { 6 } = \frac { 2 m ( 2 m - 1 ) } { ( 3 m - 1 ) ( 3 m - 2 ) }\) | A1 | 2.1 | Correct expression in terms of \(m\) (allow with \(m\) not cancelled yet) |
| | \(\frac { 2 m ( 2 m - 1 ) } { ( 3 m - 1 ) ( 3 m - 2 ) } = \frac { 28 } { 55 }\) | M1 | 3.1a | Equate to \(\frac { 28 } { 55 }\) and simplify to three-term quadratic |
| | \(\Rightarrow 16 m ^ { 2 } - 71 m + 28 = 0\) | A1 | 2.1 | Correct simplified quadratic, or (quadratic) \(\times m , = 0\), aef |
| | \(m = 4\) BC | M1 | 1.1 | Solve to get both 4 and \(\frac { 7 } { 16 }\) |
| | Reject \(m = \frac { 7 } { 16 }\) as \(m\) is an integer | A1 [7] | 3.2a | Explicitly reject \(m = \frac { 7 } { 16 }\) |
| | \(\frac { 2 m ( 2 m - 1 ) \times m \times 3 ! } { 3 m ( 3 m - 1 ) ( 3 m - 2 ) \times 2 }\) then as above | | | Multiplication method can get full marks, but if no 3 or 3!, max M1M0A0 M1A0M0A0 |
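The algebra in the question 4 mark scheme can be checked numerically. The sketch below evaluates the expression \({}^{2m}C_2 \times m \div {}^{3m}C_3\) exactly for small positive integers \(m\) and keeps those that give \(\frac{28}{55}\). The function name and the search range are arbitrary choices made for illustration.

```python
from fractions import Fraction
from math import comb

def probability(m):
    """Exact value of C(2m, 2) * m / C(3m, 3) for a positive integer m."""
    return Fraction(comb(2 * m, 2) * m, comb(3 * m, 3))

target = Fraction(28, 55)
solutions = [m for m in range(1, 50) if probability(m) == target]

# The simplified quadratic from the mark scheme should vanish at each solution.
quadratic_values = [16 * m ** 2 - 71 * m + 28 for m in solutions]
print(solutions, quadratic_values)
```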