Chi-squared test of independence

A question is of this type if and only if it involves testing whether two categorical variables are independent, using a contingency table and a chi-squared test.

157 questions · Standard +0.2

Edexcel S3 2021 June Q2
9 marks Standard +0.3
  1. A doctor believes that the diet of her patients and their health are not independent.
She takes a random sample of 200 patients and records whether they are in good health or poor health and whether they have a good diet or a poor diet. The results are summarised in the table below.
\begin{tabular}{l|cc}
 & Good health & Poor health \\
\hline
Good diet & 86 & 8 \\
Poor diet & 91 & 15 \\
\end{tabular}
Stating your hypotheses clearly, test the doctor's belief using a \(5 \%\) level of significance. Show your working for your test statistic and state your critical value clearly.
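The standard calculation for a table like this can be sketched in Python. The helper name `chi_squared_independence` is hypothetical; it takes each expected frequency as row total × column total ÷ grand total, sums \((O - E)^2 / E\) over every cell, applies no continuity correction, and uses \((r - 1)(c - 1)\) degrees of freedom. The critical value 3.841 is the tabulated upper 5% point for 1 degree of freedom.

```python
def chi_squared_independence(observed):
    """Return (statistic, degrees of freedom, expected table) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency under H0: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Sum of (O - E)^2 / E over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof, expected


# Rows: good diet, poor diet; columns: good health, poor health
diet_health = [[86, 8], [91, 15]]
stat, dof, expected = chi_squared_independence(diet_health)
critical_value = 3.841  # upper 5% point of chi-squared with 1 degree of freedom
print(f"X^2 = {stat:.3f}, df = {dof}, reject H0: {stat > critical_value}")
```

The same helper works unchanged for any r × c table, since the degrees of freedom are read off the table's shape.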
Edexcel S3 2021 October Q4
11 marks Moderate -0.3
  1. A local village radio station, LSB, decides to survey adults in its broadcasting area about the programmes it produces. LSB broadcasts to 4 villages: A, B, C and D.
    The number of households in each of the villages is given below.
\begin{tabular}{l|c}
Village & Number of households \\
\hline
A & 41 \\
B & 164 \\
C & 123 \\
D & 82 \\
\end{tabular}
LSB decides to take a stratified sample of 200 households.
  1. Explain how to select the households for this stratified sample. (3)
    One of the questions in the survey related to the age group of each member of the household and whether they listen to LSB. The data received are shown below.
    \begin{tabular}{l|ccc}
     & \multicolumn{3}{c}{Age group} \\
     & 18-49 & 50-69 & Older than 69 \\
    \hline
    Listen to LSB & 130 & 162 & 65 \\
    Do not listen to LSB & 78 & 98 & 62 \\
    \end{tabular}
    The data are to be used to determine whether or not there is an association between the age group and whether they listen to LSB.
  2. Calculate the expected frequencies for the age group 50-69 that
    1. listen to LSB
    2. do not listen to LSB (2)
    Given that for the other 4 classes \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 4.657\) to 3 decimal places,
  3. test, at the \(5 \%\) level of significance, whether or not there is evidence of an association between age and listening to LSB. Show your working clearly, stating the degrees of freedom and the critical value.
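The expected-frequency step and the final total can be checked with a short Python sketch. It assumes the flattened table reads 130, 162, 65 (listen) and 78, 98, 62 (do not listen); 5.991 is the tabulated upper 5% point of chi-squared with 2 degrees of freedom.

```python
# Observed counts: rows = listen / do not listen; columns = 18-49, 50-69, older than 69
observed = [[130, 162, 65], [78, 98, 62]]

row_totals = [sum(row) for row in observed]        # [357, 238]
col_totals = [sum(col) for col in zip(*observed)]  # [208, 260, 127]
grand_total = sum(row_totals)                      # 595

# Expected frequencies for the 50-69 column (index 1)
e_listen = row_totals[0] * col_totals[1] / grand_total
e_not_listen = row_totals[1] * col_totals[1] / grand_total

# Add the two new (O - E)^2 / E contributions to the value quoted for the other 4 cells
other_four = 4.657
statistic = (
    other_four
    + (observed[0][1] - e_listen) ** 2 / e_listen
    + (observed[1][1] - e_not_listen) ** 2 / e_not_listen
)
dof = (2 - 1) * (3 - 1)
critical_value = 5.991  # upper 5% point of chi-squared with 2 degrees of freedom
print(e_listen, e_not_listen, round(statistic, 3), statistic > critical_value)
```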
Edexcel S3 2018 Specimen Q5
12 marks Standard +0.3
  1. A Head of Department at a large university believes that gender is independent of the grade obtained by students on a Business Foundation course. A random sample was taken of 200 male students and 160 female students who had studied the course.
The results are summarised below.
\begin{tabular}{ll|cc}
 & & Male & Female \\
\hline
\multirow{3}{*}{Grade} & Distinction & 18.5\% & 27.5\% \\
 & Merit & 63.5\% & 60.0\% \\
 & Unsatisfactory & 18.0\% & 12.5\% \\
\end{tabular}
Stating your hypotheses clearly, test the Head of Department's belief using a \(5 \%\) level of significance. Show your working clearly.
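The chi-squared statistic has to be computed from observed frequencies, so the percentages must first be converted back to counts using the sample sizes in the question (200 male, 160 female). A minimal Python sketch of that step:

```python
# Percentages from the table, in the order distinction, merit, unsatisfactory
male_pct = [18.5, 63.5, 18.0]
female_pct = [27.5, 60.0, 12.5]
n_male, n_female = 200, 160

# Convert each percentage to a frequency before finding expected values
male_counts = [round(p * n_male / 100) for p in male_pct]
female_counts = [round(p * n_female / 100) for p in female_pct]

# Contingency table: one row per grade, columns (male, female)
observed = [list(pair) for pair in zip(male_counts, female_counts)]
print(observed)
```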
Edexcel S3 Specimen Q5
10 marks Standard +0.3
5. A random sample of 100 people were asked if their finances were worse, the same or better than this time last year. The sample was split according to their annual income and the results are shown in the table below.
\begin{tabular}{l|ccc}
\backslashbox{Annual income}{Finances} & Worse & Same & Better \\
\hline
Under £15 000 & 14 & 11 & 9 \\
£15 000 and above & 17 & 20 & 29 \\
\end{tabular}
Test, at the \(5 \%\) level of significance, whether or not the relative state of their finances is independent of their income range. State your hypotheses and show your working clearly.
Edexcel S3 2006 January Q4
9 marks Moderate -0.3
4. People over the age of 65 are offered an annual flu injection. A health official took a random sample from a list of patients who were over 65. She recorded their gender and whether the offer of an annual flu injection was accepted or rejected. The results are summarised below.
\begin{tabular}{l|cc}
Gender & Accepted & Rejected \\
\hline
Male & 170 & 110 \\
Female & 280 & 140 \\
\end{tabular}
Using a \(5 \%\) significance level, test whether or not there is an association between gender and acceptance or rejection of an annual flu injection. State your hypotheses clearly.
Edexcel S3 2002 June Q5
11 marks Standard +0.3
5. The manager of a leisure centre collected data on the usage of the facilities in the centre by its members. A random sample from her records is summarised below.
\begin{tabular}{l|cc}
Facility & Male & Female \\
\hline
Pool & 40 & 68 \\
Jacuzzi & 26 & 33 \\
Gym & 52 & 31 \\
\end{tabular}
Making your method clear, test whether or not there is any evidence of an association between gender and use of the club facilities. State your hypotheses clearly and use a \(5 \%\) level of significance.
(11)
Edexcel S3 2003 June Q4
11 marks Moderate -0.3
4. A new drug to treat the common cold was used with a randomly selected group of 100 volunteers. Each was given the drug and their health was monitored to see if they caught a cold. A randomly selected control group of 100 volunteers was treated with a dummy pill. The results are shown in the table below.
\begin{tabular}{l|cc}
 & Cold & No cold \\
\hline
Drug & 34 & 66 \\
Dummy pill & 45 & 55 \\
\end{tabular}
Using a \(5 \%\) significance level, test whether or not the chance of catching a cold is affected by taking the new drug. State your hypotheses clearly.
Edexcel S3 2004 June Q5
12 marks Standard +0.3
5. A random sample of 500 adults completed a questionnaire on how often they took part in some form of exercise. They gave a response of 'never', 'sometimes' or 'regularly'. Of those asked, \(52 \%\) were female, of whom \(10 \%\) never exercised and \(35 \%\) exercised regularly. Of the males, \(12.5 \%\) never exercised and \(55 \%\) sometimes exercised. Test, at the \(5 \%\) level of significance, whether or not there is any association between gender and the amount of exercise. State your hypotheses clearly.
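Here the contingency table has to be rebuilt from the percentages in the text, with each unstated category found by subtraction. A Python sketch of that reconstruction:

```python
n_total = 500
n_female = round(n_total * 52 / 100)   # 260
n_male = n_total - n_female            # 240

# Female counts: 10% never, 35% regularly, remainder sometimes
f_never = round(n_female * 10 / 100)
f_regularly = round(n_female * 35 / 100)
f_sometimes = n_female - f_never - f_regularly

# Male counts: 12.5% never, 55% sometimes, remainder regularly
m_never = round(n_male * 12.5 / 100)
m_sometimes = round(n_male * 55 / 100)
m_regularly = n_male - m_never - m_sometimes

# Columns: never, sometimes, regularly
observed = [
    [f_never, f_sometimes, f_regularly],  # female
    [m_never, m_sometimes, m_regularly],  # male
]
print(observed)
```

The resulting 2 × 3 table then goes through the usual expected-frequency and \((O - E)^2 / E\) steps with 2 degrees of freedom.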
Edexcel S3 2005 June Q1
4 marks Standard +0.3
  1. (a) State two reasons why stratified sampling might be chosen as a method of sampling when carrying out a statistical survey.
    (b) State one advantage and one disadvantage of quota sampling.
  2. A sample of size 5 is taken from a population that is normally distributed with mean 10 and standard deviation 3. Find the probability that the sample mean lies between 7 and 10.
    (Total 6 marks)
  3. A researcher carried out a survey of three treatments for a fruit tree disease. The contingency table below shows the results of a survey of a random sample of 60 diseased trees.
\begin{tabular}{l|ccc}
 & No action & Remove diseased branches & Spray with chemicals \\
\hline
Tree died within 1 year & 10 & 5 & 6 \\
Tree survived for 1-4 years & 5 & 9 & 7 \\
Tree survived beyond 4 years & 5 & 6 & 7 \\
\end{tabular}
Test, at the \(5 \%\) level of significance, whether or not there is any association between the treatment of the trees and their survival. State your hypotheses and conclusion clearly.
(Total 11 marks)
Edexcel S3 2006 June Q6
11 marks Standard +0.3
6. A research worker studying colour preference and the age of a random sample of 50 children obtained the results shown below.
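\begin{tabular}{l|cc|c}
Age in years & Red & Blue & Totals \\
\hline
4 & 12 & 6 & 18 \\
8 & 10 & 7 & 17 \\
12 & 6 & 9 & 15 \\
\hline
Totals & 28 & 22 & 50 \\
\end{tabular}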
Using a \(5 \%\) significance level, carry out a test to decide whether or not there is an association between age and colour preference. State your hypotheses clearly.
Edexcel S3 2007 June Q2
10 marks Standard +0.3
  1. The Director of Studies at a large college believed that students' grades in Mathematics were independent of their grades in English. She examined the results of a random group of candidates who had studied both subjects and she recorded the number of candidates in each of the 6 categories shown.
\begin{tabular}{l|ccc}
 & Maths grade A or B & Maths grade C or D & Maths grade E or U \\
\hline
English grade A or B & 25 & 25 & 10 \\
English grade C to U & 15 & 30 & 15 \\
\end{tabular}
  1. Stating your hypotheses clearly, test the Director's belief using a \(10 \%\) level of significance. You must show each step of your working.
    The Head of English suggested that the Director was losing accuracy by combining the English grades C to U in one row. He suggested that the Director should split the English grades into two rows, grades C or D and grades E or U, as for Mathematics.
  2. State why this might lead to problems in performing the test.
Edexcel S3 2008 June Q2
11 marks Standard +0.3
2. Students in a mixed sixth form college are classified as taking courses in either Arts, Science or Humanities. A random sample of students from the college gave the following results.
\begin{tabular}{ll|ccc}
 & & \multicolumn{3}{c}{Course} \\
 & & Arts & Science & Humanities \\
\hline
\multirow{2}{*}{Gender} & Boy & 30 & 50 & 35 \\
 & Girl & 40 & 20 & 42 \\
\end{tabular}
Showing your working clearly, test, at the \(1 \%\) level of significance, whether or not there is an association between gender and the type of course taken. State your hypotheses clearly.
Edexcel S3 2010 June Q5
10 marks Standard +0.3
  1. A random sample of 100 people were asked if their finances were worse, the same or better than this time last year. The sample was split according to their annual income and the results are shown in the table below.
\begin{tabular}{l|ccc}
\backslashbox{Annual income}{Finances} & Worse & Same & Better \\
\hline
Under \(\pounds 15000\) & 14 & 11 & 9 \\
\(\pounds 15000\) and above & 17 & 20 & 29 \\
\end{tabular}
Test, at the \(5 \%\) level of significance, whether or not the relative state of their finances is independent of their income range. State your hypotheses and show your working clearly.
Edexcel S3 2011 June Q3
10 marks Moderate -0.3
3. A factory manufactures batches of an electronic component. Each component is manufactured in one of three shifts. A component may have one of two types of defect, \(D _ { 1 }\) or \(D _ { 2 }\), at the end of the manufacturing process. A production manager believes that the type of defect is dependent upon the shift that manufactured the component. He examines 200 randomly selected defective components and classifies them by defect type and shift. The results are shown in the table below.
\begin{tabular}{l|cc}
\backslashbox{Shift}{Defect type} & \(D _ { 1 }\) & \(D _ { 2 }\) \\
\hline
First shift & 45 & 18 \\
Second shift & 55 & 20 \\
Third shift & 50 & 12 \\
\end{tabular}
Stating your hypotheses, test, at the \(10 \%\) level of significance, whether or not there is evidence to support the manager's belief. Show your working clearly.
Edexcel S3 2012 June Q4
10 marks Standard +0.3
  1. Two breeds of chicken are surveyed to measure their egg yield. The results are shown in the table below.
\begin{tabular}{l|ccc}
\backslashbox{Breed}{Egg yield} & Low & Medium & High \\
\hline
Leghorn & 22 & 52 & 26 \\
Cornish & 14 & 32 & 4 \\
\end{tabular}
Showing each stage of your working clearly, test, at the \(5 \%\) significance level, whether or not there is an association between egg yield and breed of chicken. State your hypotheses clearly.
Edexcel S3 2013 June Q4
12 marks Standard +0.3
  1. John thinks that a person's eye colour is related to their hair colour. He takes a random sample of 600 people and records their eye and hair colours. The results are shown in Table 1.
\begin{table}[h]
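\begin{tabular}{ll|ccccc}
 & & \multicolumn{5}{c}{Hair colour} \\
 & & Black & Brown & Red & Blonde & Total \\
\hline
\multirow{4}{*}{Eye colour} & Brown & 45 & 125 & 15 & 58 & 243 \\
 & Blue & 34 & 90 & 10 & 58 & 192 \\
 & Hazel & 20 & 38 & 16 & 26 & 100 \\
 & Green & 6 & 29 & 7 & 23 & 65 \\
\hline
 & Total & 105 & 282 & 48 & 165 & 600 \\
\end{tabular}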
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table} John carries out a \(\chi ^ { 2 }\) test in order to test whether eye colour and hair colour are related. He calculates the expected frequencies shown in Table 2. \begin{table}[h]
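\begin{tabular}{ll|cccc}
 & & \multicolumn{4}{c}{Hair colour} \\
 & & Black & Brown & Red & Blonde \\
\hline
\multirow{4}{*}{Eye colour} & Brown & 42.5 & 114.2 & 19.4 & 66.8 \\
 & Blue & 33.6 & 90.2 & 15.4 & 52.8 \\
 & Hazel & 17.5 & 47 & 8 & 27.5 \\
 & Green & 11.4 & 30.6 & 5.2 & 17.9 \\
\end{tabular}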
\captionsetup{labelformat=empty} \caption{Table 2}
\end{table}
  1. Show how the value 47 in Table 2 has been calculated.
  2. Write down the number of degrees of freedom John should use in this \(\chi ^ { 2 }\) test.
    Given that the value of the \(\chi ^ { 2 }\) statistic is 20.6, to 3 significant figures,
  3. find the smallest value of \(\alpha\) for which the null hypothesis will be rejected at the \(\alpha \%\) level of significance.
  4. Use the data from Table 1 to test, at the \(5 \%\) level of significance, whether or not the proportions of people in the population with black, brown, red and blonde hair are in the ratio 2:6:1:3. State your hypotheses clearly.
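The four parts can be checked with a short Python sketch. For part 3 it assumes the intended answer is the smallest *tabulated* significance level whose critical value 20.6 exceeds; the 9-degree-of-freedom and 3-degree-of-freedom points below are standard chi-squared table values. Part 4 is a goodness-of-fit test on the hair-colour totals, not a test of independence.

```python
# Column totals from Table 1 (black, brown, red, blonde)
hair_totals = [105, 282, 48, 165]
grand_total = 600
hazel_total = 100

# Part 1: expected frequency for hazel eyes and brown hair
e_hazel_brown = hazel_total * hair_totals[1] / grand_total

# Part 2: a 4 x 4 table has (4 - 1)(4 - 1) degrees of freedom
dof_independence = (4 - 1) * (4 - 1)

# Part 3: upper-tail points for 9 df from standard tables (level in % -> value)
table_9df = {10: 14.684, 5: 16.919, 2.5: 19.023, 1: 21.666, 0.5: 23.589}
smallest_level = min(level for level, cv in table_9df.items() if 20.6 > cv)

# Part 4: goodness of fit of the hair-colour totals to the ratio 2:6:1:3
ratio = [2, 6, 1, 3]
expected = [grand_total * r / sum(ratio) for r in ratio]
gof = sum((o - e) ** 2 / e for o, e in zip(hair_totals, expected))
dof_fit = len(ratio) - 1
critical_value = 7.815  # upper 5% point of chi-squared with 3 degrees of freedom
print(e_hazel_brown, dof_independence, smallest_level, round(gof, 2), gof > critical_value)
```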
Edexcel S3 2013 June Q1
10 marks Moderate -0.3
  1. A doctor takes a random sample of 100 patients and measures their intake of saturated fats in their food and the level of cholesterol in their blood. The results are summarised in the table below.
\begin{tabular}{l|cc}
\backslashbox{Intake of saturated fats}{Cholesterol level} & High & Low \\
\hline
High & 12 & 8 \\
Low & 26 & 54 \\
\end{tabular}
Using a \(5 \%\) level of significance, test whether or not there is an association between cholesterol level and intake of saturated fats. State your hypotheses and show your working clearly.
Edexcel S3 2014 June Q2
7 marks Standard +0.3
  1. A survey asked a random sample of 200 people their age and the main use of their mobile phone.
The results are shown in Table 1 below. \begin{table}[h]
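\begin{tabular}{ll|ccc}
 & & \multicolumn{3}{c}{Main use of their mobile phone} \\
 & & Internet & Texts & Phone calls \\
\hline
\multirow{3}{*}{Age} & Under 20 & 27 & 14 & 9 \\
 & From 20 to 40 & 32 & 34 & 29 \\
 & Over 40 & 15 & 19 & 21 \\
\end{tabular}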
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table} The data are to be used to test whether or not age and main use of their mobile phone are independent. Table 2 shows the expected frequencies for each group, assuming people's age and main use of their mobile phone are independent. \begin{table}[h]
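\begin{tabular}{ll|ccc}
 & & \multicolumn{3}{c}{Main use of their mobile phone} \\
 & & Internet & Texts & Phone calls \\
\hline
\multirow{3}{*}{Age} & Under 20 & 18.5 & 16.75 & 14.75 \\
 & From 20 to 40 & 35.15 & 31.825 & 28.025 \\
 & Over 40 & 20.35 & 18.425 & 16.225 \\
\end{tabular}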
\captionsetup{labelformat=empty} \caption{Table 2}
\end{table}
  1. For users under 20 choosing the Internet as the main use of their mobile phone,
    1. verify that the expected frequency is 18.5
    2. show that the contribution to the \(\chi ^ { 2 }\) test statistic is 3.91 to 3 significant figures.
  2. Given that the \(\chi ^ { 2 }\) test statistic for the data is 9.893 to 3 decimal places, test at the \(5 \%\) level of significance whether or not age and main use of their mobile phone are independent. State your hypotheses clearly.
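The values the question asks you to verify can be reproduced from Table 1 with a Python sketch; 9.488 is the tabulated upper 5% point of chi-squared with 4 degrees of freedom.

```python
# Table 1 counts: rows under 20, 20 to 40, over 40; columns internet, texts, phone calls
observed = [[27, 14, 9], [32, 34, 29], [15, 19, 21]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Part 1: expected frequency and contribution for under-20 internet users
e_00 = expected[0][0]
contribution_00 = (observed[0][0] - e_00) ** 2 / e_00

# Part 2: full statistic, degrees of freedom and 5% critical value
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (3 - 1) * (3 - 1)
critical_value = 9.488  # upper 5% point of chi-squared with 4 degrees of freedom
print(e_00, round(contribution_00, 2), round(statistic, 3), statistic > critical_value)
```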
Edexcel S3 2014 June Q3
10 marks Standard +0.3
3. A number of males and females were asked to rate their happiness under the headings "not happy", "fairly happy" and "very happy". The results are shown in the table below.
\begin{tabular}{ll|ccc|c}
 & & \multicolumn{3}{c|}{Happiness} & \multirow{2}{*}{Total} \\
 & & Not happy & Fairly happy & Very happy & \\
\hline
\multirow{2}{*}{Gender} & Female & 9 & 43 & 34 & 86 \\
 & Male & 13 & 25 & 16 & 54 \\
\hline
 & Total & 22 & 68 & 50 & 140 \\
\end{tabular}
Stating your hypotheses, test at the \(5 \%\) level of significance, whether or not there is evidence of an association between happiness and gender. Show your working clearly.
Edexcel S3 2016 June Q1
4 marks Standard +0.3
  1. (a) State two reasons why stratified sampling might be a more suitable sampling method than simple random sampling.
    (b) State two reasons why stratified sampling might be a more suitable sampling method than quota sampling.
  2. A new drug to vaccinate against influenza was given to 110 randomly chosen volunteers. The volunteers were given the drug in one of 3 different concentrations, \(A\), \(B\) and \(C\), and then were monitored to see if they caught influenza. The results are shown in the table below.
\begin{tabular}{l|ccc}
 & A & B & C \\
\hline
Influenza & 12 & 29 & 9 \\
No influenza & 15 & 23 & 22 \\
\end{tabular}
Test, at the \(10 \%\) level of significance, whether or not there is an association between catching influenza and the concentration of the new drug. State your hypotheses and show your working clearly. You should state your expected frequencies to 2 decimal places.
(10)
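The expected frequencies to 2 decimal places, and the statistic built from them, can be sketched in Python. The sketch assumes the influenza row reads 12, 29, 9 (the only split of the flattened digits that, with the no-influenza row 15, 23, 22, gives 110 volunteers); 4.605 is the tabulated upper 10% point of chi-squared with 2 degrees of freedom.

```python
# Rows: influenza, no influenza; columns: concentrations A, B, C
observed = [[12, 29, 9], [15, 23, 22]]

row_totals = [sum(row) for row in observed]        # [50, 60]
col_totals = [sum(col) for col in zip(*observed)]  # [27, 52, 31]
grand_total = sum(row_totals)                      # 110

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
# The question asks for expected frequencies to 2 decimal places
expected_2dp = [[round(e, 2) for e in row] for row in expected]

# Statistic from the unrounded expected values
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
critical_value = 4.605  # upper 10% point of chi-squared with 2 degrees of freedom
print(expected_2dp, round(statistic, 2), statistic > critical_value)
```

Using the rounded expected values instead changes the statistic only in the later decimal places.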
Edexcel S3 2017 June Q4
14 marks Standard +0.3
4. A psychologist carries out a survey of the perceived body weight of 150 randomly chosen people. He asks them if they think they are underweight, about right or overweight. His results are summarised in the table below.
\begin{tabular}{l|ccc}
 & Underweight & About right & Overweight \\
\hline
Male & 20 & 22 & 30 \\
Female & 16 & 28 & 34 \\
\end{tabular}
The psychologist calculates two of the expected frequencies, to 2 decimal places, for a test of independence between perceived body weight and gender. These results are shown in the table below.
\begin{tabular}{l|ccc}
 & Underweight & About right & Overweight \\
\hline
Male & 17.28 & & \\
Female & 18.72 & & \\
\end{tabular}
  1. Complete the table of expected frequencies shown above.
  2. Test, at the \(10 \%\) level of significance, whether or not perceived body weight is independent of gender. State your hypotheses clearly.
    The psychologist now combines the male and female data to test whether or not body weight types are chosen equally.
  3. Find the smallest significance level, from the tables in the formula booklet, for which there is evidence of a preference.
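All three parts can be sketched in Python. Part 3 is treated as a goodness-of-fit test of the combined totals against equal expected counts, and the 2-degree-of-freedom points below are standard chi-squared table values; reading "smallest significance level" as the smallest tabulated level at which the statistic exceeds the critical value is an assumption about the intended answer.

```python
# Rows: male, female; columns: underweight, about right, overweight
observed = [[20, 22, 30], [16, 28, 34]]

row_totals = [sum(row) for row in observed]        # [72, 78]
col_totals = [sum(col) for col in zip(*observed)]  # [36, 50, 64]
grand_total = sum(row_totals)                      # 150

# Part 1: full table of expected frequencies under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Part 2: test of independence, 2 degrees of freedom, 10% level
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
reject_independence = statistic > 4.605  # upper 10% point, 2 df

# Part 3: combined totals against equal preference for each type
uniform_e = grand_total / len(col_totals)
gof = sum((o - uniform_e) ** 2 / uniform_e for o in col_totals)
# Upper-tail points for 2 df from standard tables (level in % -> value)
table_2df = {10: 4.605, 5: 5.991, 2.5: 7.378, 1: 9.210, 0.5: 10.597}
smallest_level = min(level for level, cv in table_2df.items() if gof > cv)
print([[round(e, 2) for e in row] for row in expected], round(statistic, 3), smallest_level)
```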
Edexcel S3 Q1
5 marks Moderate -0.3
  1. A random sample \(X _ { 1 } , X _ { 2 } , \ldots , X _ { 10 }\) is taken from a normal population with mean 100 and standard deviation 14.
    1. Write down the distribution of \(\bar { X }\), the mean of this sample.
    2. Find \(\mathrm { P } ( | \bar { X } - 100 | > 5 )\).
    3. A random sample of the invoices, for books purchased by the customers of a large bookshop, was classified by book cover (hardback, paperback) and type of book (novel, textbook, general interest). As part of the analysis of these invoices, an approximate \(\chi ^ { 2 }\) statistic was calculated and found to be 11.09.
    Assuming that there was no need to amalgamate any of the classifications, carry out an appropriate test to determine whether or not there was any association between book cover and type of book. State your hypotheses clearly and use a \(5 \%\) level of significance.
    (6 marks)
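A worked check of the decision step, assuming the table is 2 × 3 (two cover types by three types of book) with no classes merged, and taking 5.991 as the tabulated upper 5% point of chi-squared with 2 degrees of freedom:

\[
\nu = (2-1)(3-1) = 2, \qquad \chi^2_{2}(0.05) = 5.991, \qquad 11.09 > 5.991,
\]

so on these assumptions \(H_0\) (no association between book cover and type of book) would be rejected at the \(5 \%\) level.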
Edexcel S3 Specimen Q7
11 marks Standard +0.3
7. A survey in a college was commissioned to investigate whether or not there was any association between gender and passing a driving test. A group of 50 male and 50 female students were asked whether they passed or failed their driving test at the first attempt. All the students asked had taken the test. The results were as follows.
\begin{tabular}{l|cc}
 & Pass & Fail \\
\hline
Male & 23 & 27 \\
Female & 32 & 18 \\
\end{tabular}
Stating your hypotheses clearly, test, at the \(10 \%\) level, whether or not there is any evidence of an association between gender and passing a driving test at the first attempt.
AQA S2 2006 January Q2
12 marks Standard +0.3
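2. Year 12 students at Newstatus School choose to participate in one of four sports during the Spring term. The students' choices are summarised in the table.
\begin{tabular}{l|cccc|c}
 & Squash & Badminton & Archery & Hockey & Total \\
\hline
Male & 5 & 16 & 30 & 19 & 70 \\
Female & 4 & 20 & 33 & 53 & 110 \\
\hline
Total & 9 & 36 & 63 & 72 & 180 \\
\end{tabular}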
  1. Use a \(\chi ^ { 2 }\) test, at the \(5 \%\) level of significance, to determine whether the choice of sport is independent of gender.
  2. Interpret your result in part (a) as it relates to students choosing hockey.
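With only 9 students choosing squash, at least one expected frequency falls below 5, so classes usually need combining before the test. A Python sketch of that check follows; the helper `expected_table` and the choice to merge squash with badminton (both racket sports) are assumptions for illustration, not part of the question. 5.991 is the tabulated upper 5% point of chi-squared with 2 degrees of freedom.

```python
def expected_table(observed):
    """Expected frequencies under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: male, female; columns: squash, badminton, archery, hockey
observed = [[5, 16, 30, 19], [4, 20, 33, 53]]
expected = expected_table(observed)

# Cells whose expected frequency falls below the usual threshold of 5
small_cells = [
    (i, j)
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < 5
]

# Hypothetical merge: squash combined with badminton
merged = [[row[0] + row[1]] + row[2:] for row in observed]
merged_expected = expected_table(merged)
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(merged, merged_expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (2 - 1) * (3 - 1)
critical_value = 5.991  # upper 5% point of chi-squared with 2 degrees of freedom
print(small_cells, round(statistic, 2), statistic > critical_value)
```

The hockey column is unaffected by this merge: 19 males and 53 females chose hockey against expected frequencies of 28 and 44.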
AQA S2 2007 January Q7
10 marks Standard +0.3
7. A statistics unit is required to determine whether or not there is an association between students' performances in mathematics at Key Stage 3 and at GCE. A survey of the results of 500 students showed the following information:
\begin{tabular}{ll|cccc|c}
 & & \multicolumn{4}{c|}{GCE Grade} & \multirow{2}{*}{Total} \\
 & & A & B & C & Below C & \\
\hline
\multirow{3}{*}{Key Stage 3 Level} & 8 & 60 & 55 & 47 & 43 & 205 \\
 & 7 & 55 & 32 & 31 & 26 & 144 \\
 & 6 & 40 & 38 & 35 & 38 & 151 \\
\hline
 & Total & 155 & 125 & 113 & 107 & 500 \\
\end{tabular}
  1. Use a \(\chi ^ { 2 }\) test at the \(10 \%\) level of significance to determine whether there is an association between students' performances in mathematics at Key Stage 3 and at GCE.
  2. Comment on the number of students who gained a grade A at GCE having gained a level 7 at Key Stage 3.
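A Python sketch of both parts: the full test with \((3-1)(4-1) = 6\) degrees of freedom, then the single cell part 2 asks about. The critical value 10.645 is the tabulated upper 10% point of chi-squared with 6 degrees of freedom.

```python
# Rows: Key Stage 3 level 8, 7, 6; columns: GCE grade A, B, C, below C
observed = [[60, 55, 47, 43], [55, 32, 31, 26], [40, 38, 35, 38]]

row_totals = [sum(row) for row in observed]        # [205, 144, 151]
col_totals = [sum(col) for col in zip(*observed)]  # [155, 125, 113, 107]
grand_total = sum(row_totals)                      # 500
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (3 - 1) * (4 - 1)
critical_value = 10.645  # upper 10% point of chi-squared with 6 degrees of freedom

# Part 2: level 7 at Key Stage 3 and grade A at GCE (row 1, column 0)
obs_7a, exp_7a = observed[1][0], expected[1][0]
print(round(statistic, 2), statistic > critical_value, obs_7a, round(exp_7a, 2))
```

The observed count for level 7 and grade A (55) sits well above its expected frequency, which is the cell with the largest single \((O - E)^2 / E\) contribution in this table.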