Chi-squared test of independence

A question is this type if and only if it involves testing whether two categorical variables are independent using a contingency table and chi-squared test.
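
For reference, the standard calculation behind questions of this type can be summarised as follows. The symbols \(O_{ij}\), \(E_{ij}\), \(r\), \(c\) and \(\nu\) are notation introduced here for illustration and are not taken from any particular paper. For an \(r \times c\) contingency table with observed frequencies \(O_{ij}\):
\[ E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}, \qquad X^{2} = \sum_{i,j} \frac{(O_{ij} - E_{ij})^{2}}{E_{ij}}, \qquad \nu = (r - 1)(c - 1). \]
Under the null hypothesis of independence, \(X^{2}\) is approximately \(\chi^{2}_{\nu}\) distributed, and the null hypothesis is rejected when \(X^{2}\) exceeds the tabulated critical value for the chosen significance level.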

157 questions · Standard +0.2

CAIE FP2 2010 June Q10
13 marks Standard +0.3
10 Three new flu vaccines, \(A\), \(B\) and \(C\), were tested on 500 volunteers. The vaccines were assigned randomly to the volunteers and 178 received \(A\), 149 received \(B\) and 173 received \(C\). During the following year, 30 of the volunteers given \(A\) caught flu, 29 of the volunteers given \(B\) caught flu, and 16 of the volunteers given \(C\) caught flu. Carry out a suitable test for independence at the 5\% significance level. Without using a statistical test, decide which of the vaccines appears to be most effective.
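As a rough illustration of the working this question asks for, the sketch below builds the 2 x 3 table from the figures in the question and computes the expected frequencies and test statistic in plain Python. The function and variable names are hypothetical, the "no flu" row is derived by subtraction, and the critical value 5.991 is the usual tabulated \(\chi^{2}\) value for 2 degrees of freedom at the 5\% level. It is a sketch of the standard method, not a mark-scheme solution.

```python
# Counts taken from the question; the "no flu" row is derived here as
# (number given the vaccine) - (number who caught flu).
vaccinated = [178, 149, 173]   # vaccines A, B, C
caught_flu = [30, 29, 16]
observed = [caught_flu, [n - f for n, f in zip(vaccinated, caught_flu)]]


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(table, expected)
        for o, e in zip(o_row, e_row)
    )


statistic = chi_squared(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_5_PERCENT_2_DOF = 5.991   # tabulated value, assumed from standard tables
reject_independence = statistic > CRITICAL_5_PERCENT_2_DOF

# For the "without using a statistical test" part: proportion catching flu.
flu_rates = [f / n for f, n in zip(caught_flu, vaccinated)]

print(round(statistic, 2), dof, reject_independence)
print([round(p, 3) for p in flu_rates])
```

The same two helper functions apply unchanged to any of the other tables on this page once the counts are entered as a list of rows.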
CAIE FP2 2011 June Q6
7 marks Standard +0.3
6 A random sample of residents in a town took part in a survey. They were asked whether they would prefer the local council to spend money on improving the local bus service or on improving the quality of road surfaces. The responses are shown in the following table, classified according to the area of the town in which the residents live.
| | Area 1 | Area 2 | Area 3 |
|---|---|---|---|
| Local bus service | 73 | 36 | 30 |
| Road surfaces | 47 | 44 | 20 |
Using a \(5 \%\) significance level, test whether there is an association between the area lived in and preference for improving the local bus service or improving the quality of road surfaces.
CAIE FP2 2012 June Q10
11 marks Challenging +1.2
10 Random samples of employees are taken from two companies, \(A\) and \(B\). Each employee is asked which of three types of coffee (Cappuccino, Latte, Ground) they prefer. The results are shown in the following table.
| | Cappuccino | Latte | Ground |
|---|---|---|---|
| Company \(A\) | 60 | 52 | 32 |
| Company \(B\) | 35 | 40 | 31 |
Test, at the 5\% significance level, whether coffee preferences of employees are independent of their company. Larger random samples, consisting of \(N\) times as many employees from each company, are taken. In each company, the proportions of employees preferring the three types of coffee remain unchanged. Find the least possible value of \(N\) that would lead to the conclusion, at the \(1 \%\) significance level, that coffee preferences of employees are not independent of their company.
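The second part of this question rests on how the test statistic behaves when a table is scaled. A short derivation, using \(O_{ij}\) and \(E_{ij}\) for observed and expected frequencies and \(X^{2}_{N}\) for the statistic from the enlarged sample (notation introduced here, not in the question): if every observed frequency is multiplied by \(N\), then every row total, column total and the grand total is also multiplied by \(N\), so each expected frequency becomes \(\frac{(N R_i)(N C_j)}{N T} = N E_{ij}\). Hence
\[ X^{2}_{N} = \sum_{i,j} \frac{(N O_{ij} - N E_{ij})^{2}}{N E_{ij}} = N \sum_{i,j} \frac{(O_{ij} - E_{ij})^{2}}{E_{ij}} = N X^{2}_{1}. \]
The number of degrees of freedom is unchanged, so the least \(N\) is the smallest integer for which \(N X^{2}_{1}\) exceeds the critical value at the \(1 \%\) level.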
CAIE FP2 2012 June Q8
9 marks Standard +0.3
8 Residents of three towns \(A , B\) and \(C\) were asked to grade the reliability of their digital television signal as good, satisfactory or poor. A random sample of responses from each town is taken and the numbers in each category are given in the following table.
| | Good | Satisfactory | Poor |
|---|---|---|---|
| Town \(A\) | 24 | 34 | 14 |
| Town \(B\) | 58 | 60 | 26 |
| Town \(C\) | 20 | 34 | 30 |
Test, at the 2.5\% significance level, whether grade of reliability is independent of town. Identify which town makes the greatest contribution to the test statistic and relate your answer to the context of the question.
CAIE FP2 2013 June Q11 OR
Challenging +1.8
A researcher is investigating the relationship between the political allegiance of university students and their childhood environment. He chooses a random sample of 100 students and finds that 60 have political allegiance to the Alliance party. He also classifies their childhood environment as rural or urban, and finds that 45 had a rural childhood. The researcher carries out a test, at the \(10 \%\) significance level, on this data and finds that political allegiance is independent of childhood environment. Given that \(A\) is the number of students in the sample who both support the Alliance party and have a rural childhood, find the greatest and least possible values of \(A\). A second random sample of size \(100 N\), where \(N\) is an integer, is taken from the university student population. It is found that the proportions supporting the Alliance party from urban and rural childhoods are the same as in the first sample. Given that the value of \(A\) in the first sample was 29, find the greatest possible value of \(N\) that would lead to the same conclusion (that political allegiance is independent of childhood environment) from a test, at the \(10 \%\) significance level, on this second set of data.
CAIE FP2 2015 June Q6
8 marks Moderate -0.3
6 The reliability of the broadband connection received from two suppliers, \(A\) and \(B\), is classified as good, fair or poor by a random sample of householders. The information collected is summarised in the following table.
| Supplier / Reliability | Good | Fair | Poor |
|---|---|---|---|
| \(A\) | 65 | 63 | 33 |
| \(B\) | 51 | 44 | 44 |
Test, at the 5\% significance level, whether reliability is independent of supplier.
CAIE FP2 2018 June Q8
8 marks Standard +0.3
8 A manufacturer produces three types of car: hatchbacks, saloons and estates. Each type of car is available in one of three colours: silver, blue and red. The manufacturer wants to know whether the popularity of the colour of the car is related to the type of car. A random sample of 300 cars chosen by customers gives the information summarised in the following table.
| Type of car / Colour of car | Silver | Blue | Red |
|---|---|---|---|
| Hatchback | 53 | 36 | 41 |
| Saloon | 29 | 40 | 31 |
| Estate | 28 | 24 | 18 |
Test at the \(10 \%\) significance level whether the colour of car chosen by customers is independent of the type of car.
CAIE FP2 2019 June Q8
8 marks Standard +0.3
8 Two salesmen, \(A\) and \(B\), work at a company that arranges different types of holidays: self-catering, hotel and cruise. The table shows, for a random sample of 150 holidays, the number of each type arranged by each salesman.
| Salesman / Type of holiday | Self-catering | Hotel | Cruise |
|---|---|---|---|
| \(A\) | 25 | 38 | 21 |
| \(B\) | 28 | 21 | 17 |
Test at the 10\% significance level whether the type of holiday arranged is independent of the salesman.
CAIE FP2 2010 November Q8
7 marks Standard +0.3
8 The owner of three driving schools, \(A , B\) and \(C\), wished to assess whether there was an association between passing the driving test and the school attended. He selected a random sample of learner drivers from each of his schools and recorded the numbers of passes and failures at each school. The results that he obtained are shown in the table below.
| Result / Driving school attended | \(A\) | \(B\) | \(C\) |
|---|---|---|---|
| Passes | 23 | 15 | 17 |
| Failures | 27 | 25 | 43 |
Using a \(\chi ^ { 2 }\)-test and a \(5 \%\) level of significance, test whether there is an association between passing or failing the driving test and the driving school attended.
CAIE FP2 2013 November Q10
12 marks Challenging +1.2
10 Customers were asked which of three brands of coffee, \(A , B\) and \(C\), they prefer. For a random sample of 80 male customers and 60 female customers, the numbers preferring each brand are shown in the following table.
| | \(A\) | \(B\) | \(C\) |
|---|---|---|---|
| Male | 32 | 36 | 12 |
| Female | 18 | 30 | 12 |
Test, at the \(5 \%\) significance level, whether there is a difference between coffee preferences of male and female customers. A larger random sample is now taken. It consists of \(80 n\) male customers and \(60 n\) female customers, where \(n\) is a positive integer. It is found that the proportions choosing each brand are identical to those in the smaller sample. Find the least value of \(n\) that would lead to a different conclusion for the 5\% significance level hypothesis test.
CAIE FP2 2017 November Q8
8 marks Standard +0.3
8 Members of a Statistics club are voting to elect a new president of the club. Members must choose to vote either by post or by text or by email. The method of voting chosen by a random sample of 60 male members and 40 female members is given in the following table.
| Gender / Method of voting | Post | Text | Email |
|---|---|---|---|
| Male | 10 | 12 | 38 |
| Female | 5 | 21 | 14 |
Test, at the \(1 \%\) significance level, whether there is an association between method of voting and gender.
AQA Further AS Paper 2 Statistics 2021 June Q7
11 marks Standard +0.3
7 Two employees, \(A\) and \(B\), both produce the same toy for a company. The company records the total number of errors made per day by each employee during a 40-day period. The results are summarised in the following table.
| Employee / Number of errors made per day | 0 | 1 | 2 | 3 or more | Total |
|---|---|---|---|---|---|
| \(A\) | 8 | 10 | 20 | 2 | 40 |
| \(B\) | 18 | 4 | 15 | 3 | 40 |
| Total | 26 | 14 | 35 | 5 | 80 |
The company claims that there is an association between employee and number of errors made per day.
  1. Test the company's claim, using the \(5 \%\) level of significance.
  2. By considering observed and expected frequencies, interpret in context the association between employee and number of errors made per day.
AQA Further AS Paper 2 Statistics Specimen Q7
9 marks Standard +0.3
7 A dairy industry researcher, Robyn, decided to investigate the milk yield, classified as low, medium or high, obtained from four different breeds of cow, A, B, C and D. The milk yield of a sample of 105 cows was monitored and the results are summarised in contingency Table 1.
Table 1
| Breed / Yield | Low | Medium | High | Total |
|---|---|---|---|---|
| A | 4 | 5 | 12 | 21 |
| B | 10 | 6 | 4 | 20 |
| C | 8 | 17 | 7 | 32 |
| D | 5 | 20 | 7 | 32 |
| Total | 27 | 48 | 30 | 105 |
The sample of cows may be regarded as random.
Robyn decides to carry out a \(\chi ^ { 2 }\)-test for association between milk yield and breed using the information given in Table 1.
  1. Contingency Table 2 gives some of the expected frequencies for this test.
    Complete Table 2 with the missing expected values.
    Table 2
    | Breed / Yield | Low | Medium | High |
    |---|---|---|---|
    | A | | | 6 |
    | B | 5.14 | 9.14 | 5.71 |
    | C | | | |
    | D | 8.23 | 14.63 | 9.14 |
  2. For Robyn's test, the test statistic \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 19.4\) correct to three significant figures.
    1. Use this information to carry out Robyn's test, using the \(1 \%\) level of significance.
    2. By considering the observed frequencies given in Table 1 with the expected frequencies in Table 2, interpret, in context, the association, if any, between milk yield and breed.
OCR Further Statistics AS 2022 June Q4
7 marks Standard +0.3
4 A school pupil keeps a note of whether her journeys to school and from school are delayed. The results for a random sample of journeys are shown in the table.
| Direction of journey | To school | From school |
|---|---|---|
| Delayed | 64 | 56 |
| Not delayed | 74 | 106 |
Test at the 10\% significance level whether there is association between delays and the direction of the journey.
OCR Further Statistics AS 2024 June Q2
8 marks Standard +0.3
2 For a random sample of 160 employees of a large company, the principal method of transport for getting to work, arranged according to grade of employee, is shown in the table.
| Grade | Walk or cycle | Private motorised transport | Public transport |
|---|---|---|---|
| A | 9 | 13 | 6 |
| B | 16 | 43 | 41 |
| C | 11 | 8 | 13 |
A test is carried out at the \(5 \%\) significance level of whether there is association between grade of employee and method of transport.
  1. State appropriate hypotheses for the test.
    The contributions to the test statistic are shown in the following table, correct to 3 decimal places.
    | Grade | Walk or cycle | Private motorised transport | Public transport |
    |---|---|---|---|
    | A | 1.157 | 0.289 | 1.929 |
    | B | 1.878 | 0.225 | 0.327 |
    | C | 2.006 | 1.800 | 0.083 |
  2. Show how the value 0.225 is obtained.
  3. Complete the test, stating the conclusion.
  4. Which combination of grade of employee and method of transport most strongly suggests association? Justify your answer.
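The table of contributions in this question can be reproduced cell by cell. The sketch below uses the observed counts from the first table, computes each expected frequency as row total times column total over the grand total, and then each contribution \((O - E)^{2} / E\). All names are hypothetical and chosen for illustration only.

```python
# Observed counts from the question, one list per grade.
methods = ["Walk or cycle", "Private motorised transport", "Public transport"]
observed = {
    "A": [9, 13, 6],
    "B": [16, 43, 41],
    "C": [11, 8, 13],
}

grand_total = sum(sum(row) for row in observed.values())
col_totals = [sum(col) for col in zip(*observed.values())]

# Contribution of each (grade, method) cell to the test statistic.
contributions = {}
for grade, row in observed.items():
    row_total = sum(row)
    for method, o, col_total in zip(methods, row, col_totals):
        e = row_total * col_total / grand_total   # expected frequency
        contributions[(grade, method)] = (o - e) ** 2 / e

statistic = sum(contributions.values())
largest_cell = max(contributions, key=contributions.get)

for cell, value in contributions.items():
    print(cell, round(value, 3))
print("statistic:", round(statistic, 3), "largest:", largest_cell)
```

For grade B and private motorised transport the expected frequency is \(100 \times 64 / 160 = 40\), so the contribution is \((43 - 40)^{2} / 40 = 0.225\), in agreement with the printed table.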
OCR Further Statistics AS 2020 November Q5
12 marks Standard +0.3
5 At a cinema there are three film sessions each Saturday, "early", "middle" and "late". The numbers of the audience, in different age groups, at the three showings on a randomly chosen Saturday are given in Table 1.
Table 1: Observed frequencies
| Age group / Session | Early | Middle | Late |
|---|---|---|---|
| < 25 | 24 | 20 | 40 |
| 25 to 60 | 4 | 2 | 10 |
| > 60 | 28 | 22 | 10 |
The cinema manager carries out a test of whether there is any association between age group and session attended.
  1. Show that it is necessary to combine cells in order to carry out the test.
    It is decided to combine the second and third rows of the table. Some of the expected frequencies for the table with rows combined, and the corresponding contributions to the \(\chi ^ { 2 }\) test statistic, are shown in the following incomplete tables.
    Table 2: Expected frequencies
    | Age group / Session | Early | Middle | Late |
    |---|---|---|---|
    | < 25 | 29.4 | 23.1 | |
    | \(\geqslant 25\) | 26.6 | 20.9 | |
    Table 3: Contribution to \(\chi ^ { 2 }\)
    | Age group / Session | Early | Middle | Late |
    |---|---|---|---|
    | < 25 | 0.9918 | 0.4160 | |
    | \(\geqslant 25\) | 1.0962 | 0.4598 | |
  2. In the Printed Answer Booklet, complete both tables.
  3. Carry out the test at the \(5 \%\) significance level.
  4. Use the figures in your completed Table 3 to comment on the numbers of the audience in different age groups.
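The need to combine cells in this question comes from the usual rule of thumb that every expected frequency should be at least 5. The sketch below applies that check to Table 1 and again after merging the second and third rows. The threshold of 5 is the conventional assumption, and the helper names are hypothetical.

```python
sessions = ["Early", "Middle", "Late"]
observed = {
    "< 25": [24, 20, 40],
    "25 to 60": [4, 2, 10],
    "> 60": [28, 22, 10],
}


def expected(table):
    """Expected frequencies under independence for a dict of labelled rows."""
    col_totals = [sum(col) for col in zip(*table.values())]
    grand = sum(col_totals)
    return {
        label: [sum(row) * c / grand for c in col_totals]
        for label, row in table.items()
    }


def small_cells(table, threshold=5):
    """List the (row, session) cells whose expected frequency is below threshold."""
    return [
        (label, sessions[j])
        for label, row in expected(table).items()
        for j, e in enumerate(row)
        if e < threshold
    ]


# Merge the "25 to 60" and "> 60" rows, as the question describes.
combined = {
    "< 25": observed["< 25"],
    ">= 25": [a + b for a, b in zip(observed["25 to 60"], observed["> 60"])],
}

print(small_cells(observed))   # cells that violate the rule of thumb
print(small_cells(combined))   # should be empty once rows are merged
```

After merging, the expected frequencies in the Early and Middle columns agree with the values printed in Table 2 (29.4, 23.1, 26.6 and 20.9).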
OCR Further Statistics AS Specimen Q3
8 marks Standard +0.3
3 Carl believes that the proportions of men and women who own black cars are different. He obtained a random sample of people who each owned exactly one car. The results are summarised in the table below.
| | Black | Non-black |
|---|---|---|
| Men | 69 | 71 |
| Women | 30 | 55 |
Test at the 5\% significance level whether Carl's belief is justified.
OCR Further Statistics 2019 June Q6
10 marks Standard +0.3
6 Yusha investigates the proportion of left-handed people living in two cities, \(A\) and \(B\). He obtains data from random samples from the two cities. His results are shown in the table, in which \(L\) denotes "left-handed".
| | \(L\) | \(L ^ { \prime }\) |
|---|---|---|
| \(A\) | 14 | 9 |
| \(B\) | 26 | 51 |
  1. Test at the 10\% significance level whether there is association between being left-handed and living in a particular city. A person is chosen at random from one of the cities \(A\) and \(B\).
    Let \(A\) denote "the person lives in city \(A\) ".
  2. State the relationship between \(\mathrm { P } ( L )\) and \(P ( L \mid A )\) according to the model implied by the null hypothesis of your test.
  3. Use the data in the table to suggest a value for \(P ( L \mid A )\) given by an improved model.
Edexcel S3 2022 January Q4
10 marks Standard +0.3
4. A manager at a large estate agency believes that the type of property affects the time taken to sell it. A random sample of 125 properties sold is shown in the table.
| Time taken to sell / Type of property | Bungalow | Flat | House | Total |
|---|---|---|---|---|
| Sold within three months | 7 | 29 | 46 | 82 |
| Sold in more than three months | 9 | 19 | 15 | 43 |
| Total | 16 | 48 | 61 | 125 |
Test, at the \(5 \%\) level of significance, whether there is evidence for an association between the type of property and the time taken to sell it. You should state your hypotheses, expected frequencies, test statistic and the critical value used for this test.
Edexcel S3 2022 January Q4
14 marks Standard +0.3
  1. A survey was carried out with students that had studied Maths, Physics and Chemistry at a college between 2016 and 2020. The students were divided into two groups \(A\) and \(B\).
    1. Explain how a sample could be obtained from this population using quota sampling.
    The students were asked which of the three subjects they enjoyed the most. The results of the survey are shown in the table.
    | Group / Subject enjoyed the most | Maths | Physics | Chemistry | Total |
    |---|---|---|---|---|
    | Group A | 16 | 10 | 13 | 39 |
    | Group B | 38 | 13 | 10 | 61 |
    | Total | 54 | 23 | 23 | 100 |
  2. Test, at the \(5 \%\) level of significance, whether the subject enjoyed the most is independent of group. You should state your hypotheses, expected frequencies, test statistic and the critical value used for this test. The Headteacher discovered later that the results were actually based on a random sample of 200 students but had been recorded in the table as percentages.
  3. For the test in part (b), state with reasons the effect, if any, that this information would have on
    1. the null and alternative hypotheses,
    2. the critical value,
    3. the value of the test statistic,
    4. the conclusion of the test.
Edexcel S3 2024 January Q1
8 marks Standard +0.3
  1. Chen is treating vines to prevent fungus appearing. One month after the treatment, Chen monitors the vines to see if fungus is present.
The contingency table shows information about the type of treatment for a sample of 150 vines and whether or not fungus is present.
| Fungus / Type of treatment | None | Sulphur | Copper sulphate |
|---|---|---|---|
| No fungus present | 20 | 55 | 48 |
| Fungus present | 10 | 8 | 9 |
Test, at the \(5 \%\) level of significance, whether or not there is any association between the type of treatment and the presence of fungus.
Show your working clearly, stating your hypotheses, expected frequencies, test statistic and critical value.
Edexcel S3 2014 June Q5
12 marks Standard +0.3
  1. A random sample of 200 people were asked which hot drink they preferred from tea, coffee and hot chocolate. The results are given below.
| Gender | Tea | Coffee | Hot Chocolate | Total |
|---|---|---|---|---|
| Males | 57 | 26 | 11 | 94 |
| Females | 42 | 47 | 17 | 106 |
| Total | 99 | 73 | 28 | 200 |
  1. Test, at the \(5 \%\) significance level, whether or not there is an association between type of drink preferred and gender. State your hypotheses and show your working clearly. You should state your expected frequencies to 2 decimal places.
  2. State what difference using a \(0.5 \%\) significance level would make to your conclusion. Give a reason for your answer.
Edexcel S3 2015 June Q5
12 marks Standard +0.3
  1. A Head of Department at a large university believes that gender is independent of the grade obtained by students on a Business Foundation course. A random sample was taken of 200 male students and 160 female students who had studied the course.
The results are summarised below.
| Grade | Male | Female |
|---|---|---|
| Distinction | \(18.5 \%\) | \(27.5 \%\) |
| Merit | \(63.5 \%\) | \(60.0 \%\) |
| Unsatisfactory | \(18.0 \%\) | \(12.5 \%\) |
Stating your hypotheses clearly, test the Head of Department's belief using a 5\% level of significance. Show your working clearly.
Edexcel S3 2016 June Q2
12 marks Moderate -0.3
2. A researcher investigates the results of candidates who took their driving test at one of three driving test centres. A random sample of 620 candidates gave the following results.
| Result / Driving test centre | \(A\) | \(B\) | \(C\) | Total |
|---|---|---|---|---|
| Pass | 99 | 110 | 68 | 277 |
| Fail | 108 | 116 | 119 | 343 |
| Total | 207 | 226 | 187 | 620 |
  1. Test, at the \(5 \%\) level of significance, whether there is an association between the results of candidates' driving tests and the driving test centre. State your hypotheses and show your working clearly. You should state your expected frequencies correct to 2 decimal places. The researcher decides to conduct a further investigation into the results of candidates' driving tests.
  2. State which driving test centre you would recommend for further investigation. Give a reason for your answer.
Edexcel S3 2017 June Q2
10 marks Standard +0.3
2. A school uses online report cards to promote both hard work and good behaviour of its pupils. Each card details a pupil's recent achievement and contains exactly one of three inspirational messages \(A , B\) or \(C\), chosen by the pupil's teacher. The headteacher believes that there is an association between the pupil's gender and the inspirational message chosen. He takes a random sample of 225 pupils and examines the card for each pupil. His results are shown in Table 1.
Table 1
| Pupil's gender / Inspirational message | \(A\) | \(B\) | \(C\) | Total |
|---|---|---|---|---|
| Male | 25 | 37 | 45 | 107 |
| Female | 32 | 50 | 36 | 118 |
| Total | 57 | 87 | 81 | 225 |
Stating your hypotheses clearly, test, at the \(10 \%\) level of significance, whether or not there is evidence to support the headteacher's belief. Show your working clearly. You should state your expected frequencies correct to 2 decimal places.