5.06a Chi-squared: contingency tables

179 questions

CAIE FP2 2015 June Q6
8 marks Moderate -0.3
6 The reliability of the broadband connection received from two suppliers, \(A\) and \(B\), is classified as good, fair or poor by a random sample of householders. The information collected is summarised in the following table.
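\begin{tabular}{|c|c|c|c|c|}
\cline{3-5}
\multicolumn{2}{c|}{} & \multicolumn{3}{c|}{Reliability} \\
\cline{3-5}
\multicolumn{2}{c|}{} & Good & Fair & Poor \\
\hline
\multirow{2}{*}{Supplier} & \(A\) & 65 & 63 & 33 \\
\cline{2-5}
 & \(B\) & 51 & 44 & 44 \\
\hline
\end{tabular}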
Test, at the 5\% significance level, whether reliability is independent of supplier.
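The calculation behind a test of this kind can be sketched in Python. This is an illustrative sketch rather than a mark scheme: the helper name `chi_squared_independence` is hypothetical, the counts are read from the table as \(A\): 65, 63, 33 and \(B\): 51, 44, 44, and 5.991 is the standard tabulated 5% critical value of \(\chi^2\) with 2 degrees of freedom.

```python
def chi_squared_independence(observed):
    """Return (expected, statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Under independence, E = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


# Counts as read from the table (rows: supplier A, supplier B).
observed = [[65, 63, 33], [51, 44, 44]]
expected, statistic, df = chi_squared_independence(observed)

CRITICAL_5_PERCENT_DF2 = 5.991  # tabulated value, assumed here
reject_h0 = statistic > CRITICAL_5_PERCENT_DF2
print(f"X^2 = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these counts the statistic comes out at about 5.05, below the critical value, so the sketch does not reject independence at the 5% level.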
CAIE FP2 2018 June Q8
8 marks Standard +0.3
8 A manufacturer produces three types of car: hatchbacks, saloons and estates. Each type of car is available in one of three colours: silver, blue and red. The manufacturer wants to know whether the popularity of the colour of the car is related to the type of car. A random sample of 300 cars chosen by customers gives the information summarised in the following table.
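\begin{tabular}{|c|c|c|c|c|}
\cline{3-5}
\multicolumn{2}{c|}{} & \multicolumn{3}{c|}{Colour of car} \\
\cline{3-5}
\multicolumn{2}{c|}{} & Silver & Blue & Red \\
\hline
\multirow{3}{*}{Type of car} & Hatchback & 53 & 36 & 41 \\
\cline{2-5}
 & Saloon & 29 & 40 & 31 \\
\cline{2-5}
 & Estate & 28 & 24 & 18 \\
\hline
\end{tabular}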
Test at the \(10 \%\) significance level whether the colour of car chosen by customers is independent of the type of car.
CAIE FP2 2019 June Q8
8 marks Standard +0.3
8 Two salesmen, \(A\) and \(B\), work at a company that arranges different types of holidays: self-catering, hotel and cruise. The table shows, for a random sample of 150 holidays, the number of each type arranged by each salesman.
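\begin{tabular}{|c|c|c|c|c|}
\cline{3-5}
\multicolumn{2}{|c|}{} & \multicolumn{3}{c|}{Type of holiday} \\
\cline{3-5}
\multicolumn{2}{|c|}{} & Self-catering & Hotel & Cruise \\
\hline
\multirow{2}{*}{Salesman} & \(A\) & 25 & 38 & 21 \\
\cline{2-5}
 & \(B\) & 28 & 21 & 17 \\
\hline
\end{tabular}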
Test at the 10\% significance level whether the type of holiday arranged is independent of the salesman.
CAIE FP2 2017 November Q8
8 marks Standard +0.3
8 Members of a Statistics club are voting to elect a new president of the club. Members must choose to vote either by post or by text or by email. The method of voting chosen by a random sample of 60 male members and 40 female members is given in the following table.
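\begin{tabular}{|c|c|c|c|c|}
\cline{3-5}
\multicolumn{2}{c|}{} & \multicolumn{3}{c|}{Method of voting} \\
\cline{3-5}
\multicolumn{2}{c|}{} & Post & Text & Email \\
\hline
\multirow{2}{*}{Gender} & Male & 10 & 12 & 38 \\
\cline{2-5}
 & Female & 5 & 21 & 14 \\
\hline
\end{tabular}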
Test, at the \(1 \%\) significance level, whether there is an association between method of voting and gender.
AQA Further AS Paper 2 Statistics 2021 June Q7
11 marks Standard +0.3
7 Two employees, \(A\) and \(B\), both produce the same toy for a company. The company records the total number of errors made per day by each employee during a 40-day period. The results are summarised in the following table.
\begin{tabular}{|c|c|c|c|c|c|}
\hline
\multirow{2}{*}{Employee} & \multicolumn{4}{c|}{Number of errors made per day} & \\
\cline{2-6}
 & 0 & 1 & 2 & 3 or more & Total \\
\hline
\(A\) & 8 & 10 & 20 & 2 & 40 \\
\hline
\(B\) & 18 & 4 & 15 & 3 & 40 \\
\hline
Total & 26 & 14 & 35 & 5 & 80 \\
\hline
\end{tabular}
The company claims that there is an association between employee and number of errors made per day.
  1. Test the company's claim, using the \(5 \%\) level of significance.
  2. By considering observed and expected frequencies, interpret in context the association between employee and number of errors made per day.
AQA Further AS Paper 2 Statistics Specimen Q7
9 marks Standard +0.3
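7 A dairy industry researcher, Robyn, decided to investigate the milk yield, classified as low, medium or high, obtained from four different breeds of cow, A, B, C and D. The milk yield of a sample of 105 cows was monitored and the results are summarised in contingency Table 1.
\begin{tabular}{|c|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Table 1}} & \multicolumn{3}{c|}{Yield} & \\
\cline{3-6}
\multicolumn{2}{|c|}{} & Low & Medium & High & Total \\
\hline
\multirow{4}{*}{Breed} & A & 4 & 5 & 12 & 21 \\
\cline{2-6}
 & B & 10 & 6 & 4 & 20 \\
\cline{2-6}
 & C & 8 & 17 & 7 & 32 \\
\cline{2-6}
 & D & 5 & 20 & 7 & 32 \\
\hline
\multicolumn{2}{|c|}{Total} & 27 & 48 & 30 & 105 \\
\hline
\end{tabular}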
The sample of cows may be regarded as random.
Robyn decides to carry out a \(\chi ^ { 2 }\)-test for association between milk yield and breed using the information given in Table 1.
  1. Contingency Table 2 gives some of the expected frequencies for this test.
    Complete Table 2 with the missing expected values.
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \multicolumn{2}{|c|}{\multirow{2}{*}{Table 2}} & \multicolumn{3}{c|}{Yield} \\
    \cline{3-5}
    \multicolumn{2}{|c|}{} & Low & Medium & High \\
    \hline
    \multirow{4}{*}{Breed} & A & & & 6 \\
    \cline{2-5}
     & B & 5.14 & 9.14 & 5.71 \\
    \cline{2-5}
     & C & & & \\
    \cline{2-5}
     & D & 8.23 & 14.63 & 9.14 \\
    \hline
    \end{tabular}
  2. (i) For Robyn's test, the test statistic \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 19.4\) correct to three significant figures.
    Use this information to carry out Robyn's test, using the \(1 \%\) level of significance.
    (ii) By considering the observed frequencies given in Table 1 with the expected frequencies in Table 2, interpret, in context, the association, if any, between milk yield and breed.
OCR Further Statistics AS 2019 June Q7
9 marks Standard +0.3
7 In a standard model from genetic theory, the ratios of types \(a , b , c\) and \(d\) of a characteristic from a genetic cross are predicted to be 9:3:3:1. Andrei collects 120 specimens from such a cross, and the numbers corresponding to each type of the characteristic are given in the table.
\begin{tabular}{|c|c|c|c|c|}
\hline
Type & \(a\) & \(b\) & \(c\) & \(d\) \\
\hline
Frequency & 51 & 33 & 30 & 6 \\
\hline
\end{tabular}
Andrei tests, at the 1\% significance level, whether the observed frequencies are consistent with the standard model.
  1. State appropriate hypotheses for the test.
  2. Carry out the test.
  3. State with a reason which one of the frequencies is least consistent with the standard model.
  4. Suggest a different, improved model by changing exactly two of the ratio values.
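A goodness-of-fit calculation of this kind can be sketched as follows. The variable names are illustrative, the frequencies are read from the table as 51, 33, 30 and 6, and 11.345 is the standard tabulated 1% critical value of \(\chi^2\) with 3 degrees of freedom; treat it as a sketch under those assumptions.

```python
ratios = [9, 3, 3, 1]          # predicted ratios for types a, b, c, d
observed = [51, 33, 30, 6]     # frequencies as read from the table
types = ["a", "b", "c", "d"]

n = sum(observed)
# Expected frequency for each type is n times its share of the ratio total.
expected = [n * r / sum(ratios) for r in ratios]
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)
df = len(observed) - 1  # no parameters are estimated from the data

CRITICAL_1_PERCENT_DF3 = 11.345  # tabulated value, assumed here
reject_h0 = statistic > CRITICAL_1_PERCENT_DF3

# The cell with the largest contribution is the one least consistent with the model.
least_consistent = types[contributions.index(max(contributions))]
print(expected, round(statistic, 3), reject_h0, least_consistent)
```

Comparing the individual contributions, rather than only their sum, is what identifies which frequency fits the model worst.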
OCR Further Statistics AS 2022 June Q4
7 marks Standard +0.3
4 A school pupil keeps a note of whether her journeys to school and from school are delayed. The results for a random sample of journeys are shown in the table.
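\begin{tabular}{|c|c|c|}
\cline{2-3}
\multicolumn{1}{c|}{} & \multicolumn{2}{c|}{Direction of journey} \\
\cline{2-3}
\multicolumn{1}{c|}{} & To school & From school \\
\hline
Delayed & 64 & 56 \\
\hline
Not delayed & 74 & 106 \\
\hline
\end{tabular}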
Test at the 10\% significance level whether there is association between delays and the direction of the journey.
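For a \(2 \times 2\) table such as this one, a continuity (Yates') correction is often applied. The question itself does not say whether one is expected, so the sketch below computes the statistic both ways. The function name `chi_squared_2x2` is hypothetical, the counts are read from the table as 64, 56, 74 and 106, and 2.706 is the standard tabulated 10% critical value for 1 degree of freedom.

```python
def chi_squared_2x2(observed, yates=True):
    """Chi-squared statistic for a 2x2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            # Yates' correction reduces each |O - E| by 0.5 before squaring.
            diff = abs(o - e) - (0.5 if yates else 0.0)
            statistic += diff ** 2 / e
    return statistic


# Rows: delayed, not delayed. Columns: to school, from school.
observed = [[64, 56], [74, 106]]
with_yates = chi_squared_2x2(observed, yates=True)
without_yates = chi_squared_2x2(observed, yates=False)

CRITICAL_10_PERCENT_DF1 = 2.706  # tabulated value, assumed here
print(round(with_yates, 3), round(without_yates, 3))
print(with_yates > CRITICAL_10_PERCENT_DF1)
```

With these counts both versions of the statistic exceed 2.706, so the conclusion does not depend on whether the correction is used.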
OCR Further Statistics AS 2024 June Q2
8 marks Standard +0.3
2 For a random sample of 160 employees of a large company, the principal method of transport for getting to work, arranged according to grade of employee, is shown in the table.
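\begin{tabular}{|c|c|c|c|}
\hline
Grade & Walk or cycle & Private motorised transport & Public transport \\
\hline
A & 9 & 13 & 6 \\
\hline
B & 16 & 43 & 41 \\
\hline
C & 11 & 8 & 13 \\
\hline
\end{tabular}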
A test is carried out at the \(5 \%\) significance level of whether there is association between grade of employee and method of transport.
  1. State appropriate hypotheses for the test.
    The contributions to the test statistic are shown in the following table, correct to 3 decimal places.
    \begin{tabular}{|c|c|c|c|}
    \hline
    Grade & Walk or cycle & Private motorised transport & Public transport \\
    \hline
    A & 1.157 & 0.289 & 1.929 \\
    \hline
    B & 1.878 & 0.225 & 0.327 \\
    \hline
    C & 2.006 & 1.800 & 0.083 \\
    \hline
    \end{tabular}
  2. Show how the value 0.225 is obtained.
  3. Complete the test, stating the conclusion.
  4. Which combination of grade of employee and method of transport most strongly suggests association? Justify your answer.
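The individual contributions quoted in the question can be reproduced with a short sketch. The names below are illustrative, the counts are read from the table as A: 9, 13, 6; B: 16, 43, 41; C: 11, 8, 13, and 9.488 is the standard tabulated 5% critical value for 4 degrees of freedom.

```python
grades = ["A", "B", "C"]
methods = ["Walk or cycle", "Private motorised transport", "Public transport"]
observed = [[9, 13, 6], [16, 43, 41], [11, 8, 13]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

contributions = {}
for i, grade in enumerate(grades):
    for j, method in enumerate(methods):
        e = row_totals[i] * col_totals[j] / total  # expected frequency
        contributions[(grade, method)] = (observed[i][j] - e) ** 2 / e

# Grade B, private transport: E = 100 * 64 / 160 = 40, so (43 - 40)^2 / 40.
cell_b_private = contributions[("B", "Private motorised transport")]

statistic = sum(contributions.values())
CRITICAL_5_PERCENT_DF4 = 9.488  # tabulated value, assumed here
reject_h0 = statistic > CRITICAL_5_PERCENT_DF4
largest_cell = max(contributions, key=contributions.get)
print(round(cell_b_private, 3), round(statistic, 3), reject_h0, largest_cell)
```

The cell returned as `largest_cell` is the combination whose observed count departs most, relative to its expected count, from what independence predicts.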
OCR Further Statistics AS 2020 November Q5
12 marks Standard +0.3
5 At a cinema there are three film sessions each Saturday, "early", "middle" and "late". The numbers of the audience, in different age groups, at the three showings on a randomly chosen Saturday are given in Table 1.
\begin{table}[h]
\begin{tabular}{|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Observed frequencies}} & \multicolumn{3}{c|}{Session} \\
\cline{3-5}
\multicolumn{2}{|c|}{} & Early & Middle & Late \\
\hline
\multirow{3}{*}{Age group} & \(< 25\) & 24 & 20 & 40 \\
\cline{2-5}
 & 25 to 60 & 4 & 2 & 10 \\
\cline{2-5}
 & \(> 60\) & 28 & 22 & 10 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
The cinema manager carries out a test of whether there is any association between age group and session attended.
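  1. Show that it is necessary to combine cells in order to carry out the test.
    It is decided to combine the second and third rows of the table. Some of the expected frequencies for the table with rows combined, and the corresponding contributions to the \(\chi ^ { 2 }\) test statistic, are shown in the following incomplete tables.
    \begin{table}[h]
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \multicolumn{2}{|c|}{\multirow{2}{*}{Expected frequencies}} & \multicolumn{3}{c|}{Session} \\
    \cline{3-5}
    \multicolumn{2}{|c|}{} & Early & Middle & Late \\
    \hline
    \multirow{2}{*}{Age group} & \(< 25\) & 29.4 & 23.1 & \\
    \cline{2-5}
     & \(\geqslant 25\) & 26.6 & 20.9 & \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 2}
    \end{table}
    \begin{table}[h]
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \multicolumn{2}{|c|}{\multirow{2}{*}{Contribution to \(\chi ^ { 2 }\)}} & \multicolumn{3}{c|}{Session} \\
    \cline{3-5}
    \multicolumn{2}{|c|}{} & Early & Middle & Late \\
    \hline
    \multirow{2}{*}{Age group} & \(< 25\) & 0.9918 & 0.4160 & \\
    \cline{2-5}
     & \(\geqslant 25\) & 1.0962 & 0.4598 & \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 3}
    \end{table}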
  2. In the Printed Answer Booklet, complete both tables.
  3. Carry out the test at the \(5 \%\) significance level.
  4. Use the figures in your completed Table 3 to comment on the numbers of the audience in different age groups.
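The need to combine cells, and the effect of combining the second and third rows, can be checked with a sketch like the one below. It assumes the usual rule that every expected frequency should be at least 5, reads the counts from Table 1 as 24, 20, 40; 4, 2, 10; 28, 22, 10, and uses 5.991 as the tabulated 5% critical value for 2 degrees of freedom. The helper names are hypothetical.

```python
def expected_frequencies(observed):
    """Expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (O - E)^2 / E over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: under 25, 25 to 60, over 60. Columns: early, middle, late.
observed = [[24, 20, 40], [4, 2, 10], [28, 22, 10]]
expected = expected_frequencies(observed)

# Cells whose expected frequency falls below 5 (assumed threshold).
small_cells = [
    (i, j)
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < 5
]

# Combine the second and third rows into a single "25 and over" row.
combined = [observed[0], [a + b for a, b in zip(observed[1], observed[2])]]
statistic = chi_squared(combined)
df = (len(combined) - 1) * (len(combined[0]) - 1)

CRITICAL_5_PERCENT_DF2 = 5.991  # tabulated value, assumed here
print(small_cells, round(statistic, 3), df, statistic > CRITICAL_5_PERCENT_DF2)
```

Only the "25 to 60, middle" cell has an expected frequency below 5 (it is 4.4), which is why rows are merged before the statistic is computed.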
OCR Further Statistics 2019 June Q6
10 marks Standard +0.3
6 Yusha investigates the proportion of left-handed people living in two cities, \(A\) and \(B\). He obtains data from random samples from the two cities. His results are shown in the table, in which \(L\) denotes "left-handed".
\begin{tabular}{|c|c|c|}
\hline
 & \(L\) & \(L ^ { \prime }\) \\
\hline
\(A\) & 14 & 9 \\
\hline
\(B\) & 26 & 51 \\
\hline
\end{tabular}
  1. Test at the 10\% significance level whether there is association between being left-handed and living in a particular city.
    A person is chosen at random from one of the cities \(A\) and \(B\).
    Let \(A\) denote "the person lives in city \(A\)".
  2. State the relationship between \(\mathrm { P } ( L )\) and \(P ( L \mid A )\) according to the model implied by the null hypothesis of your test.
  3. Use the data in the table to suggest a value for \(P ( L \mid A )\) given by an improved model.
Edexcel S3 2021 January Q3
10 marks Standard +0.3
3. The students in a group of schools can choose a club to join. There are 4 clubs available: Music, Art, Sports and Computers. The director collected information about the number of students in each club, using a random sample of 88 students from across the schools. The results are given in Table 1 below.
\begin{table}[h]
\begin{tabular}{|c|c|c|c|c|}
\cline{2-5}
\multicolumn{1}{c|}{} & Music & Art & Sports & Computers \\
\hline
No. of students & 14 & 28 & 27 & 19 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
The director uses a chi-squared test to determine whether or not the students are uniformly distributed across the 4 clubs.
    1. Find the expected frequencies he should use.
    Given that the test statistic he calculated was 6.09 (to 3 significant figures)
    2. use a \(5 \%\) level of significance to complete the test. You should state the degrees of freedom and the critical value used.
    The director wishes to examine the situation in more detail and takes a second random sample of 88 students. The director assumes that within each school, students select their clubs independently. The students come from 3 schools and the distribution of the students from each school amongst the clubs is given in Table 2 below.
    \begin{table}[h]
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    School / Club & Music & Art & Sports & Computers \\
    \hline
    School \(\boldsymbol { A }\) & 3 & 10 & 9 & 8 \\
    \hline
    School \(\boldsymbol { B }\) & 1 & 11 & 13 & 5 \\
    \hline
    School \(\boldsymbol { C }\) & 11 & 6 & 7 & 4 \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 2}
    \end{table}
    The director wishes to test for an association between a student's school and the club they choose.
  1. State hypotheses suitable for such a test.
  2. Calculate the expected frequency for School \(C\) and the Computers club.
    The director calculates the test statistic to be 7.29 (to 3 significant figures) with 4 degrees of freedom.
  3. Explain clearly why his test has 4 degrees of freedom.
  4. Complete the test using a \(5 \%\) level of significance and stating clearly your critical value.
Edexcel S3 2022 January Q4
10 marks Standard +0.3
4. A manager at a large estate agency believes that the type of property affects the time taken to sell it. A random sample of 125 properties sold is shown in the table.
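\begin{tabular}{|c|c|c|c|c|}
\hline
\multirow{2}{*}{} & \multicolumn{3}{c|}{Type of property} & \\
\cline{2-5}
 & Bungalow & Flat & House & Total \\
\hline
Sold within three months & 7 & 29 & 46 & 82 \\
\hline
Sold in more than three months & 9 & 19 & 15 & 43 \\
\hline
Total & 16 & 48 & 61 & 125 \\
\hline
\end{tabular}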
Test, at the \(5 \%\) level of significance, whether there is evidence for an association between the type of property and the time taken to sell it. You should state your hypotheses, expected frequencies, test statistic and the critical value used for this test.
Edexcel S3 2022 January Q4
14 marks Standard +0.3
  1. A survey was carried out with students that had studied Maths, Physics and Chemistry at a college between 2016 and 2020. The students were divided into two groups \(A\) and \(B\).
    1. Explain how a sample could be obtained from this population using quota sampling.
    The students were asked which of the three subjects they enjoyed the most. The results of the survey are shown in the table.
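    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \multirow{2}{*}{} & \multicolumn{3}{c|}{Subject enjoyed the most} & \\
    \cline{2-5}
     & Maths & Physics & Chemistry & Total \\
    \hline
    Group A & 16 & 10 & 13 & 39 \\
    \hline
    Group B & 38 & 13 & 10 & 61 \\
    \hline
    Total & 54 & 23 & 23 & 100 \\
    \hline
    \end{tabular}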
  2. Test, at the \(5 \%\) level of significance, whether the subject enjoyed the most is independent of group. You should state your hypotheses, expected frequencies, test statistic and the critical value used for this test.
    The Headteacher discovered later that the results were actually based on a random sample of 200 students but had been recorded in the table as percentages.
  3. For the test in part (b), state with reasons the effect, if any, that this information would have on
    1. the null and alternative hypotheses,
    2. the critical value,
    3. the value of the test statistic,
    4. the conclusion of the test.
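The effect of treating percentages as counts can be explored numerically. The sketch below reads the table as Group A: 16, 10, 13 and Group B: 38, 13, 10, doubles every entry to represent a sample of 200, and compares the two statistics. The function name `chi_squared` is hypothetical and 5.991 is the tabulated 5% critical value for 2 degrees of freedom.

```python
def chi_squared(observed):
    """Chi-squared statistic for independence in a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            statistic += (o - e) ** 2 / e
    return statistic


as_recorded = [[16, 10, 13], [38, 13, 10]]                 # entries taken as counts
as_counts = [[2 * x for x in row] for row in as_recorded]  # percentages of 200 students

stat_recorded = chi_squared(as_recorded)
stat_counts = chi_squared(as_counts)

CRITICAL_5_PERCENT_DF2 = 5.991  # unchanged: the table is still 2 by 3
print(round(stat_recorded, 3), round(stat_counts, 3))
print(stat_recorded > CRITICAL_5_PERCENT_DF2, stat_counts > CRITICAL_5_PERCENT_DF2)
```

Doubling every observed count doubles every expected count too, so each \((O - E)^2 / E\) term, and hence the statistic, doubles while the degrees of freedom stay the same.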
Edexcel S3 2023 January Q3
9 marks Moderate -0.8
3 A mobile phone company offers an insurance policy to its customers when they purchase a mobile phone. The company conducted a survey on the age of the customers and whether or not claims were made. A random sample of 1200 customers from this company was investigated for 2020 and the results are shown in the table below.
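\begin{tabular}{|c|c|c|c|c|}
\cline{3-5}
\multicolumn{2}{c|}{} & Claim made in 2020 & No claim made in 2020 & Total \\
\hline
\multirow{3}{*}{Age} & 17-20 years & 24 & 176 & 200 \\
\cline{2-5}
 & 21-50 years & 48 & 652 & 700 \\
\cline{2-5}
 & 51 years and over & 14 & 286 & 300 \\
\hline
\multicolumn{2}{|c|}{Total} & 86 & 1114 & 1200 \\
\hline
\end{tabular}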
The data are to be used to determine whether or not making a claim is independent of age.
  1. Calculate the expected frequencies for the age group 51 years and over that
    1. made a claim in 2020
    2. did not make a claim in 2020
    The 4 classes of customers aged between 17 and 50 give a value of \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 7.123\) correct to 3 decimal places.
  2. Test, at the \(1 \%\) level of significance, whether or not making a claim is independent of age. Show your working clearly, stating your hypotheses, the degrees of freedom, the test statistic and the critical value used.
Edexcel S3 2024 January Q1
8 marks Standard +0.3
  1. Chen is treating vines to prevent fungus appearing. One month after the treatment, Chen monitors the vines to see if fungus is present.
The contingency table shows information about the type of treatment for a sample of 150 vines and whether or not fungus is present.
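\begin{tabular}{|c|c|c|c|}
\hline
\multirow{2}{*}{} & \multicolumn{3}{c|}{Type of treatment} \\
\cline{2-4}
 & None & Sulphur & Copper sulphate \\
\hline
No fungus present & 20 & 55 & 48 \\
\hline
Fungus present & 10 & 8 & 9 \\
\hline
\end{tabular}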
Test, at the \(5 \%\) level of significance, whether or not there is any association between the type of treatment and the presence of fungus.
Show your working clearly, stating your hypotheses, expected frequencies, test statistic and critical value.
Edexcel S3 2014 June Q5
12 marks Standard +0.3
  1. A random sample of 200 people were asked which hot drink they preferred from tea, coffee and hot chocolate. The results are given below.
\begin{tabular}{|c|c|c|c|c|c|}
\cline{3-6}
\multicolumn{2}{|c|}{} & Tea & Coffee & Hot Chocolate & Total \\
\hline
\multirow{2}{*}{Gender} & Males & 57 & 26 & 11 & 94 \\
\cline{2-6}
 & Females & 42 & 47 & 17 & 106 \\
\hline
\multicolumn{2}{|c|}{Total} & 99 & 73 & 28 & 200 \\
\hline
\end{tabular}
  1. Test, at the \(5 \%\) significance level, whether or not there is an association between type of drink preferred and gender. State your hypotheses and show your working clearly. You should state your expected frequencies to 2 decimal places.
  2. State what difference using a \(0.5 \%\) significance level would make to your conclusion. Give a reason for your answer.
Edexcel S3 2016 June Q2
12 marks Moderate -0.3
2. A researcher investigates the results of candidates who took their driving test at one of three driving test centres. A random sample of 620 candidates gave the following results.
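\begin{tabular}{|c|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{}} & \multicolumn{3}{c|}{Driving test centre} & \multirow{2}{*}{Total} \\
\cline{3-5}
\multicolumn{2}{|c|}{} & \(\boldsymbol { A }\) & \(\boldsymbol { B }\) & \(\boldsymbol { C }\) & \\
\hline
\multirow{2}{*}{Result} & Pass & 99 & 110 & 68 & 277 \\
\cline{2-6}
 & Fail & 108 & 116 & 119 & 343 \\
\hline
\multicolumn{2}{|c|}{Total} & 207 & 226 & 187 & 620 \\
\hline
\end{tabular}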
  1. Test, at the \(5 \%\) level of significance, whether there is an association between the results of candidates' driving tests and the driving test centre. State your hypotheses and show your working clearly. You should state your expected frequencies correct to 2 decimal places.
    The researcher decides to conduct a further investigation into the results of candidates' driving tests.
  2. State which driving test centre you would recommend for further investigation. Give a reason for your answer.
Edexcel S3 2017 June Q2
10 marks Standard +0.3
2. A school uses online report cards to promote both hard work and good behaviour of its pupils. Each card details a pupil's recent achievement and contains exactly one of three inspirational messages \(A , B\) or \(C\), chosen by the pupil's teacher. The headteacher believes that there is an association between the pupil's gender and the inspirational message chosen. He takes a random sample of 225 pupils and examines the card for each pupil. His results are shown in Table 1.
\begin{table}[h]
\begin{tabular}{|c|c|c|c|c|c|}
\cline{3-6}
\multicolumn{2}{c|}{} & \multicolumn{3}{c|}{Inspirational message} & \multirow{2}{*}{Total} \\
\cline{3-5}
\multicolumn{2}{c|}{} & \(\boldsymbol { A }\) & \(\boldsymbol { B }\) & \(\boldsymbol { C }\) & \\
\hline
\multirow{2}{*}{Pupil's gender} & Male & 25 & 37 & 45 & 107 \\
\cline{2-6}
 & Female & 32 & 50 & 36 & 118 \\
\hline
\multicolumn{2}{|c|}{Total} & 57 & 87 & 81 & 225 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
Stating your hypotheses clearly, test, at the \(10 \%\) level of significance, whether or not there is evidence to support the headteacher's belief. Show your working clearly. You should state your expected frequencies correct to 2 decimal places.
Edexcel S3 2018 June Q4
9 marks Standard +0.3
4. A company selects a random sample of five of its warehouses. The table below summarises the number of employees, in thousands, at each warehouse and the number of reported first aid incidents at each warehouse during 2017.
\begin{tabular}{|l|c|c|c|c|c|}
\hline
Warehouse & \(A\) & \(B\) & \(C\) & \(D\) & \(E\) \\
\hline
Number of employees (in thousands) & 2 & 1 & 3.8 & 3 & 2.2 \\
\hline
Number of reported first aid incidents & 15 & 10 & 40 & 26 & 23 \\
\hline
\end{tabular}
The personnel manager claims that the mean number of reported first aid incidents per 1000 employees is the same at each of the company's warehouses.
  1. Stating your hypotheses clearly, use a \(5 \%\) level of significance to test the manager's claim.
    Jean, the safety officer at warehouse \(C\), kept a record of each reported first aid incident at warehouse \(C\) in 2017. Jean wishes to select a systematic sample of 10 records from warehouse \(C\).
  2. Explain, in detail, how Jean should obtain such a sample.
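Under the manager's claim, the expected number of incidents at each warehouse is proportional to its number of employees. A sketch of that goodness-of-fit calculation follows; the employee figures are read from the table as 2, 1, 3.8, 3 and 2.2 thousand, the variable names are illustrative, and 9.488 is the tabulated 5% critical value for 4 degrees of freedom.

```python
employees_thousands = [2, 1, 3.8, 3, 2.2]   # warehouses A to E
incidents = [15, 10, 40, 26, 23]

# Common rate per 1000 employees implied by the claim.
rate = sum(incidents) / sum(employees_thousands)
# Expected incidents scale with the number of employees at each warehouse.
expected = [rate * e for e in employees_thousands]

statistic = sum((o - e) ** 2 / e for o, e in zip(incidents, expected))
df = len(incidents) - 1  # only the total is fixed by the data

CRITICAL_5_PERCENT_DF4 = 9.488  # tabulated value, assumed here
reject_h0 = statistic > CRITICAL_5_PERCENT_DF4
print(round(rate, 2), [round(e, 1) for e in expected], round(statistic, 3), reject_h0)
```

With these figures the statistic is about 1.72, well below the critical value, so this sketch finds no evidence against the manager's claim.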
Edexcel S3 2021 June Q2
9 marks Standard +0.3
  1. A doctor believes that the diet of her patients and their health are not independent.
She takes a random sample of 200 patients and records whether they are in good health or poor health and whether they have a good diet or a poor diet. The results are summarised in the table below.
\begin{tabular}{|c|c|c|}
\cline{2-3}
\multicolumn{1}{c|}{} & Good health & Poor health \\
\hline
Good diet & 86 & 8 \\
\hline
Poor diet & 91 & 15 \\
\hline
\end{tabular}
Stating your hypotheses clearly, test the doctor's belief using a \(5 \%\) level of significance. Show your working for your test statistic and state your critical value clearly.
Edexcel S3 2023 June Q2
10 marks Moderate -0.3
  1. A business accepts cash, bank cards or mobile apps as payment methods.
The manager wishes to test whether or not there is an association between the payment amount and the payment method used. The manager takes a random sample of 240 payments and records the payment amount and the payment method used. The manager's results are shown in the table.
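\begin{tabular}{|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{}} & \multicolumn{3}{c|}{Payment amount} \\
\cline{3-5}
\multicolumn{2}{|c|}{} & Under £50 & £50 to £150 & Over £150 \\
\hline
\multirow{3}{*}{Payment method} & Cash & 23 & 19 & 18 \\
\cline{2-5}
 & Bank card & 21 & 32 & 31 \\
\cline{2-5}
 & Mobile app & 16 & 39 & 41 \\
\hline
\end{tabular}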
Using these results,
  1. calculate the expected frequencies for the payment amount under \(\pounds 50\) that
    1. use cash
    2. use a bank card
    3. use a mobile app
    Given that for the other 6 classes \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 2.4048\) to 4 decimal places,
  2. test, at the \(5 \%\) level of significance, whether or not there is evidence for an association between the payment amount and the payment method used. You should state the hypotheses, the test statistic, the degrees of freedom and the critical value used for this test.
Edexcel S3 2024 June Q4
11 marks Standard +0.3
  1. The manager of a company making ice cream believes that the proportions of people in the population who prefer vanilla, chocolate, strawberry and other are in the ratio \(10 : 5 : 2 : 3\).
The manager takes a random sample of 400 customers and records their age and favourite ice cream flavour. The results are shown in the table below.
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{}} & \multicolumn{4}{c|}{Ice cream flavour} & \\
\cline{3-7}
\multicolumn{2}{|c|}{} & Vanilla & Chocolate & Strawberry & Other & Total \\
\hline
\multirow{3}{*}{Age} & Child & 95 & 25 & 13 & 25 & 158 \\
\cline{2-7}
 & Teenager & 57 & 20 & 17 & 36 & 130 \\
\cline{2-7}
 & Adult & 36 & 50 & 10 & 16 & 112 \\
\hline
\multicolumn{2}{|c|}{Total} & 188 & 95 & 40 & 77 & 400 \\
\hline
\end{tabular}
  1. Use the data in the table to test, at the \(5 \%\) level of significance, the manager's belief. You should state your hypotheses, test statistic, critical value and conclusion clearly.
    A researcher wants to investigate whether or not there is a relationship between the age of a customer and their favourite ice cream flavour. In order to test whether favourite ice cream flavour and age are related, the researcher plans to carry out a \(\chi ^ { 2 }\) test.
  2. Use the table to calculate expected frequencies for the group
    1. teenagers whose favourite ice cream flavour is vanilla,
    2. adults whose favourite ice cream flavour is chocolate.
  3. Write down the number of degrees of freedom for this \(\chi ^ { 2 }\) test.
Edexcel S3 2020 October Q2
9 marks Moderate -0.3
2. A university awards its graduates a degree in one of three categories, Distinction, Merit or Pass. Table 1 shows information about a random sample of 200 graduates from three departments, Arts, Humanities and Sciences.
\begin{table}[h]
\begin{tabular}{|c|c|c|c|c|}
\cline{2-5}
\multicolumn{1}{c|}{} & Arts & Humanities & Sciences & Total \\
\hline
Distinction & 22 & 32 & 38 & 92 \\
\hline
Merit & 15 & 30 & 13 & 58 \\
\hline
Pass & 18 & 15 & 17 & 50 \\
\hline
Total & 55 & 77 & 68 & \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
Xiu wants to carry out a test of independence between the category of degree and the department. Table 2 shows some of the values of \(\frac { ( O - E ) ^ { 2 } } { E }\) for this test.
\begin{table}[h]
\begin{tabular}{|c|c|c|c|c|}
\cline{2-5}
\multicolumn{1}{c|}{} & Arts & Humanities & Sciences & Total \\
\hline
Distinction & 0.43 & 0.33 & 1.44 & 2.20 \\
\hline
Merit & 0.06 & 2.63 & 2.29 & 4.98 \\
\hline
Pass & & & & \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 2}
\end{table}
  1. Complete Table 2
  2. Hence, complete Xiu's hypothesis test using a \(5 \%\) level of significance. You should state the hypotheses, the degrees of freedom and the critical value used for this test.
Edexcel S3 2021 October Q4
11 marks Moderate -0.3
  1. A local village radio station, LSB, decides to survey adults in its broadcasting area about the programmes it produces. LSB broadcasts to 4 villages A, B, C and D.
    The number of households in each of the villages is given below.
\begin{tabular}{|c|c|}
\hline
Village & Number of households \\
\hline
A & 41 \\
\hline
B & 164 \\
\hline
C & 123 \\
\hline
D & 82 \\
\hline
\end{tabular}
LSB decides to take a stratified sample of 200 households.
  1. Explain how to select the households for this stratified sample.
    One of the questions in the survey related to the age group of each member of the household and whether they listen to LSB. The data received are shown below.
    \begin{tabular}{|c|c|c|c|}
    \hline
    \multirow{2}{*}{} & \multicolumn{3}{c|}{Age group} \\
    \cline{2-4}
     & 18-49 & 50-69 & Older than 69 \\
    \hline
    Listen to LSB & 130 & 162 & 65 \\
    \hline
    Do not listen to LSB & 78 & 98 & 62 \\
    \hline
    \end{tabular}
    The data are to be used to determine whether or not there is an association between the age group and whether they listen to LSB.
  2. Calculate the expected frequencies for the age group 50-69 that
    1. listen to LSB
    2. do not listen to LSB
    Given that for the other 4 classes \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 4.657\) to 3 decimal places,
  3. test at the \(5 \%\) level of significance, whether or not there is evidence of an association between age and listening to LSB. Show your working clearly, stating the degrees of freedom and the critical value.
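Both the stratified allocation and the completion of the test from a partial sum can be sketched in a few lines. The household numbers are read from the table as 41, 164, 123 and 82, the survey counts as 130, 162, 65 and 78, 98, 62, the variable names are illustrative, and 5.991 is the tabulated 5% critical value for 2 degrees of freedom.

```python
# Stratified sample: each village contributes in proportion to its households.
households = {"A": 41, "B": 164, "C": 123, "D": 82}
sample_size = 200
total_households = sum(households.values())
allocation = {
    village: round(sample_size * n / total_households)
    for village, n in households.items()
}

# Contingency table. Rows: listen, do not listen. Columns: 18-49, 50-69, older than 69.
observed = [[130, 162, 65], [78, 98, 62]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected frequencies for the 50-69 column (index 1).
expected_50_69 = [r * col_totals[1] / total for r in row_totals]
contrib_50_69 = sum(
    (observed[i][1] - e) ** 2 / e for i, e in enumerate(expected_50_69)
)

OTHER_CLASSES_SUM = 4.657  # given in the question for the other 4 classes
statistic = OTHER_CLASSES_SUM + contrib_50_69
df = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_5_PERCENT_DF2 = 5.991  # tabulated value, assumed here
print(allocation, expected_50_69, round(statistic, 3), statistic > CRITICAL_5_PERCENT_DF2)
```

Within each village the allocated number of households would then be chosen by simple random sampling, for example by numbering the households and drawing random numbers.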