Chi-squared test of independence

A question is this type if and only if it involves testing whether two categorical variables are independent using a contingency table and chi-squared test.
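Every question below follows the same three steps. Each expected frequency under independence is (row total × column total) / grand total. The statistic is the sum of (O − E)²/E over all cells. The degrees of freedom are (r − 1)(c − 1). The following is a minimal Python sketch of those steps, assuming the table is a list of rows of observed counts. The function name `chi2_independence` and the 2 × 2 example table are illustrative only and do not come from any of the questions.

```python
def chi2_independence(observed):
    """Return (statistic, degrees of freedom, expected table) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Under H0 (independence): E = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Pearson statistic: sum of (O - E)^2 / E over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof, expected


# Illustrative 2 x 2 table (not from any question): every expected frequency is 30 * 30 / 60 = 15
stat, dof, expected = chi2_independence([[10, 20], [20, 10]])
print(round(stat, 3), dof, expected)
```

The statistic is then compared with the upper-tail critical value of \(\chi^2\) with that many degrees of freedom at the stated significance level, and H0 is rejected if the statistic is larger. Several of the questions below also require every expected frequency to be at least 5, with rows or columns combined when one is not.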

157 questions · Standard +0.2

WJEC Further Unit 2 2018 June Q6
8 marks Moderate -0.5
6. A student, considering options for the future, collects data on education and salary. The table below shows the highest level of education attained and the salary bracket of a random sample of 664 people.
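Salary bracket | Fewer than 5 GCSE | 5 or more GCSE | 3 A Levels | University degree | Post graduate qualification | Total
Less than £20000 | 18 | 32 | 20 | 28 | 10 | 108
£20000 to £60000 | 50 | 95 | 112 | 155 | 50 | 462
More than £60000 | 3 | 22 | 29 | 35 | 5 | 94
Total | 71 | 149 | 161 | 218 | 65 | 664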
By conducting a chi-squared test for independence, the student investigates the relationship between the highest level of education attained and the salary earned.
  1. State the null and alternative hypotheses.
  2. The table below shows the expected values. Calculate the value of \(k\).
    Expected values | Fewer than 5 GCSE | 5 or more GCSE | 3 A Levels | University degree | Post graduate qualification
    Less than £20000 | \(k\) | 24.23 | 26.19 | 35.46 | 10.57
    £20000 to £60000 | 49.40 | 103.67 | 112.02 | 151.68 | 45.23
    More than £60000 | 10.05 | 21.09 | 22.79 | 30.86 | 9.20
  3. The following computer output is obtained. Calculate the values of \(m\) and \(n\).
    Chi-squared contributions | Fewer than 5 GCSE | 5 or more GCSE | 3 A Levels | University degree | Post graduate qualification
    Less than £20000 | 3.604530799 | \(m\) | 1.46165 | 1.5686 | 0.03098
    £20000 to £60000 | 0.007272735 | 0.72535 | 4E-06 | 0.07264 | 0.50396
    More than £60000 | 4.946619863 | 0.03897 | 1.69081 | 0.55498 | \(n\)
    X-squared = 19.61301, df = 8, p-value = 0.0119
    1. Without carrying out any further calculations, explain how X-squared = 19.61301 (the \(\chi^2\) test statistic) was calculated.
    2. Comment on the values in the "Fewer than 5 GCSE" column of the table in part (c).
  4. The student says that the highest levels of education lead to the highest paying jobs. Comment on the accuracy of the student's statement.
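The computer output in this question can be reproduced with a short Python sketch built from the observed table above. `k` is one expected frequency, `m` and `n` are single cell contributions, and X-squared is simply the sum of all 15 contributions. The p-value comes from the standard upper-tail probability of the \(\chi^2\) distribution, computed with `math.erfc`, `math.exp` and `math.gamma` via the recurrence Q(d + 2) = Q(d) + (x/2)^(d/2) e^(−x/2) / Γ(d/2 + 1). The helper name `chi2_sf` is hypothetical.

```python
import math

# Observed counts: rows = salary bracket, columns = highest level of education
observed = [
    [18, 32, 20, 28, 10],    # Less than £20000
    [50, 95, 112, 155, 50],  # £20000 to £60000
    [3, 22, 29, 35, 5],      # More than £60000
]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n_total = sum(row_totals)
expected = [[r * c / n_total for c in col_totals] for r in row_totals]
contrib = [[(o - e) ** 2 / e for o, e in zip(orow, erow)]
           for orow, erow in zip(observed, expected)]

k = expected[0][0]   # Less than £20000, Fewer than 5 GCSE
m = contrib[0][1]    # Less than £20000, 5 or more GCSE
n = contrib[2][4]    # More than £60000, Post graduate qualification
x_squared = sum(map(sum, contrib))
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf(x, dof):
    """Upper-tail probability P(X > x) for X ~ chi-squared(dof), dof a positive integer."""
    # Start from dof = 1 (erfc form) or dof = 2 (exponential form), then step up by 2
    if dof % 2:
        q, d = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, d = math.exp(-x / 2), 2
    while d < dof:
        q += (x / 2) ** (d / 2) * math.exp(-x / 2) / math.gamma(d / 2 + 1)
        d += 2
    return q


print(f"k={k:.2f} m={m:.3f} n={n:.3f} X2={x_squared:.5f} df={df} p={chi2_sf(x_squared, df):.4f}")
```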
WJEC Further Unit 2 2022 June Q6
11 marks Standard +0.3
6. An online survey on the use of social media asked the following question: "Do you use any form of social media?" The results for a total of 1953 respondents are shown in the table below.
Age in years
Use social media | 18-29 | 30-49 | 50-64 | 65 or older | Total
Yes | 310 | 412 | 348 | 196 | 1266
No | 42 | 116 | 196 | 333 | 687
Total | 352 | 528 | 544 | 529 | 1953
To test whether there is a relationship between social media use and age, a significance test is carried out at the \(5 \%\) level.
  1. State the null and alternative hypotheses.
  2. Show how the expected frequency 228.18 is calculated in the table below.
    Expected values (Age in years)
    Use social media | 18-29 | 30-49 | 50-64 | 65 or older
    Yes | 228.18 | 342.27 | 352.64 | 342.92
    No | 123.82 | 185.73 | 191.36 | 186.08
  3. Determine the value of \(s\) in the table below.
    Chi-squared contributions (Age in years)
    Use social media | 18-29 | 30-49 | 50-64 | 65 or older
    Yes | 29.34 | \(s\) | 0.06 | 62.94
    No | 54.07 | 26.18 | 0.11 | 115.99
  4. Complete the significance test, showing all your working.
  5. A student, analysing these data on a spreadsheet, obtains the following output. [Spreadsheet output not reproduced.] Explain why the student must have made an error in calculating the \(p\)-value.
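A minimal Python sketch of parts 2 to 4, using the observed table above: it rebuilds the expected frequency 228.18 as 1266 × 352 / 1953, recovers \(s\) from the 30-49 "Yes" cell, and sums all eight contributions. The 5% critical value 7.815 for 3 degrees of freedom is the standard table value; the variable names are illustrative.

```python
# Observed counts: rows = Yes / No, columns = age groups 18-29, 30-49, 50-64, 65 or older
observed = [[310, 412, 348, 196], [42, 116, 196, 333]]
row_totals = [sum(r) for r in observed]          # 1266, 687
col_totals = [sum(c) for c in zip(*observed)]    # 352, 528, 544, 529
n = sum(row_totals)                              # 1953

expected = [[r * c / n for c in col_totals] for r in row_totals]
contrib = [[(o - e) ** 2 / e for o, e in zip(orow, erow)]
           for orow, erow in zip(observed, expected)]

e_yes_18_29 = expected[0][0]   # 1266 * 352 / 1953
s = contrib[0][1]              # Yes, 30-49
stat = sum(map(sum, contrib))
df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical_5pc = 7.815           # upper 5% point of chi-squared with 3 df (standard tables)
print(round(e_yes_18_29, 2), round(s, 2), round(stat, 2), df, stat > critical_5pc)
```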
WJEC Further Unit 2 2024 June Q5
12 marks Moderate -0.5
5. Lily is interested in the relationship between the way in which students learned Welsh and their attitude towards the Welsh language. Students were categorised as having learned Welsh in one of three ways:
  • from one Welsh-speaking parent/carer at home,
  • from two Welsh-speaking parents/carers at home,
  • at school only, for those with no Welsh-speaking parents/carers at home.
The students were asked to rate their attitude towards the Welsh language from 'Very negative' to 'Very positive'. The following data for a random sample of 253 students were collected as part of a project.
Learned Welsh
Attitude | From two parents/carers | From one parent/carer | At school only | Total
Very negative | 2 | 14 | 30 | 46
Slightly negative | 4 | 20 | 21 | 45
Neutral | 12 | 17 | 8 | 37
Slightly positive | 21 | 19 | 11 | 51
Very positive | 25 | 21 | 28 | 74
Total | 64 | 91 | 98 | 253
Lily intends to carry out a chi-squared test for independence at the \(5 \%\) level. She produces the following tables which are incomplete.
Expected frequencies (Learned Welsh)
Attitude | From two parents/carers | From one parent/carer | At school only
Very negative | 11.64 | 16.55 | 17.82
Slightly negative | 11.38 | 16.19 | 17.43
Neutral | 9.36 | 13.31 | 14.33
Slightly positive | 12.90 | 18.34 | 19.75
Very positive | F | 26.62 | 28.66
Chi-squared contributions (Learned Welsh)
Attitude | From two parents/carers | From one parent/carer | At school only
Very negative | 7.98 | 0.39 | 8.33
Slightly negative | 4.79 | 0.90 | 0.73
Neutral | 0.74 | 1.02 | G
Slightly positive | 5.08 | 0.02 | 3.88
Very positive | 2.11 | 1.19 | 0.02
Total | 20.70 | 3.52 | H
  1. Calculate the values of \(F , G\) and \(H\).
  2. Carry out Lily's chi-squared test for independence at the \(5 \%\) level.
  3. By referring to the figures in the tables above, give two comments on the relationship between the way students learned Welsh and their attitude towards the Welsh language.
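A Python sketch of parts 1 and 2 using the observed table above: `F` is the expected frequency for "Very positive, from two parents/carers", `G` is the "Neutral, at school only" contribution, and `H` is the column total of contributions for "At school only". The 5% critical value 15.507 for 8 degrees of freedom is the standard table value; the helper names are illustrative.

```python
# Observed counts: rows = attitude (Very negative ... Very positive),
# columns = from two parents/carers, from one parent/carer, at school only
observed = [
    [2, 14, 30],
    [4, 20, 21],
    [12, 17, 8],
    [21, 19, 11],
    [25, 21, 28],
]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)   # 253


def expected(i, j):
    return row_totals[i] * col_totals[j] / n


def contribution(i, j):
    e = expected(i, j)
    return (observed[i][j] - e) ** 2 / e


F = expected(4, 0)                              # Very positive, from two parents/carers
G = contribution(2, 2)                          # Neutral, at school only
H = sum(contribution(i, 2) for i in range(5))   # column total, at school only
stat = sum(contribution(i, j) for i in range(5) for j in range(3))
df = (5 - 1) * (3 - 1)
critical_5pc = 15.507                           # upper 5% point of chi-squared with 8 df
print(round(F, 2), round(G, 2), round(H, 2), round(stat, 2), df, stat > critical_5pc)
```

Adding the rounded table entries 8.33 + 0.73 + 2.80 + 3.88 + 0.02 gives 15.76 rather than the unrounded 15.75; the difference is only rounding.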
WJEC Further Unit 2 Specimen Q7
12 marks Moderate -0.5
7. The Pew Research Center's Internet Project offers scholars access to raw data sets from their research. One of the Pew Research Center's projects was on teenagers and technology. A random sample of American families was selected to complete a questionnaire. For each of their children, between and including the ages of 13 and 15, parents of these families were asked: Do you know your child's password for any of [his/her] social media accounts?
Responses to this question were received from 493 families. The table below provides a summary of their responses.
Age (years)
Parent knows password | 13 | 14 | 15 | Total
Yes | 76 | 75 | 67 | 218
No | 66 | 103 | 106 | 275
Total | 142 | 178 | 173 | 493
  1. A test for significance is to be undertaken to see whether there is an association between whether a parent knows any of their child's social media passwords and the age of the child.
    1. Clearly state the null and alternative hypotheses.
    2. Obtain the expected value that is missing from the table below, indicating clearly how it is calculated from the data values given in the table above. Expected values:
      Age (years)
      Parent knows password | 13 | 14 | 15
      Yes | 62.79 | 78.71 | 76.50
      No |  | 99.29 | 96.50
    3. Obtain the two chi-squared contributions that are missing from the table below. Chi-squared contributions:
      Age (years)
      Parent knows password | 13 | 14 | 15
      Yes |  | 0.175 | 1.180
      No | 2.203 |  | 0.935
      The following output was obtained from the statistical package that was used to undertake the analysis: $$\text { Pearson chi-squared } ( 2 ) = 7.409 \quad p \text {-value } = 0.0305$$
    4. Indicate how the degrees of freedom have been calculated for the chi-squared statistic.
    5. Interpret the output obtained from the statistical test in terms of the initial hypotheses.
  2. Comment on the nature of the association observed, based on the contributions to the test statistic calculated in (a).
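A minimal Python sketch of parts (a)(ii) to (a)(iv): it fills in the missing expected value (No, age 13), the two missing contributions (Yes/13 and No/14), and shows that the degrees of freedom are (2 − 1)(3 − 1) = 2. The variable names are illustrative; the observed counts are those in the first table above.

```python
# Observed counts: rows = parent knows password (Yes, No), columns = age 13, 14, 15
observed = [[76, 75, 67], [66, 103, 106]]
row_totals = [sum(r) for r in observed]        # 218, 275
col_totals = [sum(c) for c in zip(*observed)]  # 142, 178, 173
n = sum(row_totals)                            # 493

expected = [[r * c / n for c in col_totals] for r in row_totals]
contrib = [[(o - e) ** 2 / e for o, e in zip(orow, erow)]
           for orow, erow in zip(observed, expected)]

missing_expected = expected[1][0]                  # No, age 13: 275 * 142 / 493
missing_contribs = (contrib[0][0], contrib[1][1])  # Yes/13 and No/14
stat = sum(map(sum, contrib))                      # should match the reported 7.409
df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(round(missing_expected, 2), [round(c, 3) for c in missing_contribs], round(stat, 3), df)
```

The largest contributions sit in the age-13 column, which is the kind of pattern part (b) asks about.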
AQA Further Paper 3 Statistics Specimen Q5
8 marks Standard +0.3
5 Students at a science department of a university are offered the opportunity to study an optional language module, either German or Mandarin, during their second year of study. From a sample of 50 students who opted to study a language module, 31 were female. Of those who opted to study Mandarin, 8 were female and 12 were male. Test, using the \(5 \%\) level of significance, whether choice of language is independent of gender. The sample of students may be regarded as random.
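This question gives only totals, so the 2 × 2 table has to be rebuilt first: 20 students chose Mandarin (8 female, 12 male), so 30 chose German, of whom 31 − 8 = 23 were female and 7 were male. The Python sketch below computes the Pearson statistic and, separately, the Yates-corrected statistic Σ(|O − E| − 0.5)²/E. Some specifications apply Yates' continuity correction to 2 × 2 tables, so both are shown rather than assuming which one this paper expects. The function name `chi2_2x2` is hypothetical, and 3.841 is the standard 5% critical value for 1 degree of freedom.

```python
def chi2_2x2(observed, yates=False):
    """Pearson statistic for a 2 x 2 table, optionally with Yates' continuity correction."""
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n
            # Yates' correction reduces each |O - E| by 0.5 before squaring
            diff = abs(o - e) - 0.5 if yates else o - e
            stat += diff ** 2 / e
    return stat


# Rebuild the full table from the information given in the question
total_students, total_female = 50, 31
mandarin = [8, 12]                                      # [female, male]
german_total = total_students - sum(mandarin)           # 30
german_female = total_female - mandarin[0]              # 23
german = [german_female, german_total - german_female]  # [23, 7]
observed = [german, mandarin]

pearson = chi2_2x2(observed)
corrected = chi2_2x2(observed, yates=True)
critical_5pc = 3.841  # upper 5% point of chi-squared with 1 df
print(round(pearson, 3), round(corrected, 3), pearson > critical_5pc, corrected > critical_5pc)
```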
Edexcel FS1 AS 2018 June Q4
7 marks Standard +0.3
Abram carried out a survey of two treatments for a plant fungus. The contingency table below shows the results of a survey of a random sample of 125 plants with the fungus.
Treatment
Outcome | No action | Plant sprayed once | Plant sprayed every day
Plant died within a month | 15 | 16 | 25
Plant survived for 1-6 months | 8 | 25 | 10
Plant survived beyond 6 months | 7 | 14 | 5
Abram calculates expected frequencies to carry out a suitable test. Seven of these are given in the partly-completed table below.
Treatment
Outcome | No action | Plant sprayed once | Plant sprayed every day
Plant died within a month |  |  | 17.92
Plant survived for 1-6 months | 10.32 | 18.92 | 13.76
Plant survived beyond 6 months | 6.24 | 11.44 | 8.32
The value of \(\sum \frac { ( O - E ) ^ { 2 } } { E }\) for the 7 given values is 8.29
Test at the \(2.5 \%\) level of significance, whether or not there is an association between the treatment of the plants and their survival. State your hypotheses and conclusion clearly.
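A Python sketch of the remaining working, assuming the observed table above: it finds the two expected frequencies missing from the partly-completed table (both in the "died within a month" row), adds their contributions to the given partial sum 8.29, and compares the total with 11.143, the standard 2.5% critical value for 4 degrees of freedom. Variable names are illustrative.

```python
observed = [
    [15, 16, 25],  # plant died within a month
    [8, 25, 10],   # plant survived for 1-6 months
    [7, 14, 5],    # plant survived beyond 6 months
]  # columns: no action, sprayed once, sprayed every day
row_totals = [sum(r) for r in observed]          # 56, 43, 26
col_totals = [sum(c) for c in zip(*observed)]    # 30, 55, 40
n = sum(row_totals)                              # 125

# The two expected frequencies missing from the partly-completed table
e_no_action = row_totals[0] * col_totals[0] / n      # 56 * 30 / 125
e_sprayed_once = row_totals[0] * col_totals[1] / n   # 56 * 55 / 125

partial_sum = 8.29  # given: sum of (O - E)^2 / E over the other seven cells
stat = (partial_sum
        + (observed[0][0] - e_no_action) ** 2 / e_no_action
        + (observed[0][1] - e_sprayed_once) ** 2 / e_sprayed_once)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical_2_5pc = 11.143  # upper 2.5% point of chi-squared with 4 df
print(round(e_no_action, 2), round(e_sprayed_once, 2), round(stat, 2), stat > critical_2_5pc)
```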
Edexcel FS1 AS 2019 June Q1
6 marks Standard +0.3
A leisure club offers a choice of one of three activities to its 150 members on a Tuesday evening. The manager believes that there may be an association between the choice of activity and the age of the member and collected the following data.
Activity
Age \(a\) years | Badminton | Bowls | Snooker
\(a < 20\) | 9 | 3 | 3
\(20 \leqslant a < 40\) | 10 | 10 | 14
\(40 \leqslant a < 50\) | 16 | 15 | 5
\(50 \leqslant a < 60\) | 15 | 13 | 11
\(a \geqslant 60\) | 4 | 19 | 3
  1. Write down suitable hypotheses for a test of the manager's belief. The manager calculated expected frequencies to use in the test.
  2. Calculate the expected frequency of members aged 60 or over who choose snooker, used by the manager.
  3. Explain why there are 6 degrees of freedom used in this test. The test statistic used to test the manager's belief is 19.583.
  4. Using a \(5 \%\) level of significance, complete the test of the manager's belief.
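A 5 × 3 table would give (5 − 1)(3 − 1) = 8 degrees of freedom, so 6 suggests that two rows were combined. The Python sketch below assumes the manager merged the two youngest age groups because one expected frequency (a < 20, snooker) is 3.6, below 5. It checks that the merged 4 × 3 table gives 6 degrees of freedom and reproduces the quoted statistic 19.583. The helper names are hypothetical, and 12.592 is the standard 5% critical value for 6 degrees of freedom.

```python
def expected_table(rows):
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_stat(rows):
    exp = expected_table(rows)
    return sum((o - e) ** 2 / e for orow, erow in zip(rows, exp) for o, e in zip(orow, erow))


# Rows: a < 20, 20 <= a < 40, 40 <= a < 50, 50 <= a < 60, a >= 60
# Columns: badminton, bowls, snooker
observed = [[9, 3, 3], [10, 10, 14], [16, 15, 5], [15, 13, 11], [4, 19, 3]]

min_before = min(min(row) for row in expected_table(observed))  # a < 20 and snooker
# Assumption: the two youngest groups are merged so every expected frequency is at least 5
merged = [[a + b for a, b in zip(observed[0], observed[1])]] + observed[2:]
min_after = min(min(row) for row in expected_table(merged))
df = (len(merged) - 1) * (len(merged[0]) - 1)                   # (4 - 1)(3 - 1)
e_60_snooker = expected_table(observed)[4][2]                   # 26 * 36 / 150
stat = chi2_stat(merged)
critical_5pc = 12.592  # upper 5% point of chi-squared with 6 df
print(min_before, round(min_after, 2), df, round(e_60_snooker, 2), round(stat, 3))
```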
Edexcel FS1 AS 2020 June Q2
15 marks Standard +0.3
In an experiment, James flips a coin 3 times and records the number of heads. He carries out the experiment 100 times with his left hand and 100 times with his right hand.
Number of heads
Hand | 0 | 1 | 2 | 3
Left hand | 7 | 29 | 42 | 22
Right hand | 13 | 35 | 36 | 16
  1. Test, at the \(5 \%\) level of significance, whether or not there is an association between the hand he flips the coin with and the number of heads. You should state your hypotheses, the degrees of freedom and the critical value used for this test.
  2. Assuming the coin is unbiased, write down the distribution of the number of heads in 3 flips.
  3. Carry out a \(\chi ^ { 2 }\) test, at the \(10 \%\) level of significance, to test whether or not the distribution you wrote down in part (b) is a suitable model for the number of heads obtained in the 200 trials of James' experiment. You should state your hypotheses, the degrees of freedom and the critical value used for this test.
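Part 1 is a test of independence on a 2 × 4 table, while part 3 is a goodness-of-fit test of B(3, 0.5) against the combined 200 trials. The Python sketch below puts the two calculations side by side. Since the probability 0.5 is specified rather than estimated, the goodness-of-fit test keeps 4 − 1 = 3 degrees of freedom. Critical values 7.815 (5%, 3 df) and 6.251 (10%, 3 df) are standard table values; variable names are illustrative.

```python
from math import comb

left = [7, 29, 42, 22]    # number of heads 0, 1, 2, 3
right = [13, 35, 36, 16]


def chi2(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Part 1: independence test on the 2 x 4 table
col_totals = [a + b for a, b in zip(left, right)]        # 20, 64, 78, 38
n = sum(col_totals)                                      # 200
# Both row totals are 100, so the two rows share the same expected frequencies
row_expected = [sum(left) * c / n for c in col_totals]   # 10, 32, 39, 19
indep_stat = chi2(left, row_expected) + chi2(right, row_expected)
indep_df = (2 - 1) * (4 - 1)
indep_critical_5pc = 7.815    # upper 5% point of chi-squared with 3 df

# Part 3: goodness of fit of B(3, 0.5) to all 200 trials
probs = [comb(3, k) * 0.5 ** 3 for k in range(4)]        # 1/8, 3/8, 3/8, 1/8
gof_expected = [n * p for p in probs]                    # 25, 75, 75, 25
gof_stat = chi2(col_totals, gof_expected)
gof_df = len(col_totals) - 1  # p = 0.5 is specified, not estimated from the data
gof_critical_10pc = 6.251     # upper 10% point of chi-squared with 3 df
print(round(indep_stat, 3), indep_df, round(gof_stat, 3), gof_df)
```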
Edexcel FS1 AS 2022 June Q1
7 marks Moderate -0.3
Stuart is investigating a treatment for a disease that affects fruit trees. He has 400 fruit trees and applies the treatment to a random sample of these trees. The remainder of the trees have no treatment. He records the number of years, \(y\), that each fruit tree remains free from this disease.
The results are summarised in the table below.
Treatment
Number of years free from this disease | Applied | Not applied
\(y < 1\) | 15 | 25
\(1 \leqslant y < 2\) | 35 | 61
\(2 \leqslant y\) | 124 | 140
The data are to be used to determine whether or not there is an association between the application of the treatment and the number of years that a fruit tree remains free from this disease.
  1. Calculate the expected frequencies for
    1. Applied and \(y < 1\)
    2. Not applied and \(1 \leqslant y < 2\). The value of \(\sum \frac { ( O - E ) ^ { 2 } } { E }\) for the other four classes is 2.642 to 3 decimal places.
  2. Test, at the \(5 \%\) level of significance, whether or not there is an association between the application of the treatment and the number of years a fruit tree remains free from this disease. You should state your hypotheses, test statistic, critical value and conclusion clearly.
Edexcel FS1 AS 2023 June Q2
6 marks Standard +0.3
A bag contains a large number of balls, all of the same size and weight. The balls are coloured Red, Blue or Yellow.
Jasmine asks each child in a group of 150 children to close their eyes, select a ball from the bag and show it to her. The child then replaces the ball and repeats the process a second time. If both balls are the same colour the child receives a prize.
The results are given in the table below.
1st colour
2nd colour | Red | Blue | Yellow | Total
Red | 31 | 11 | 18 | 60
Blue | 8 | 10 | 9 | 27
Yellow | 21 | 9 | 33 | 63
Total | 60 | 30 | 60 | 150
Jasmine carries out a test, at the \(5 \%\) level of significance, to see whether or not the colour of the 2nd ball is independent of the colour of the 1st ball.
  1. Calculate the expected frequencies for the cases where both balls are the same colour. The test statistic Jasmine obtained was 12.712 to three decimal places.
  2. Use this value to complete the test, stating the critical value and conclusion clearly.
  3. With reference to your calculations in part (a) and the nature of the experiment, give a plausible reason why Jasmine may have obtained her conclusion in part (b).
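A Python sketch of parts (a) and (b): it computes the three "same colour" expected frequencies on the diagonal of the table above and confirms the quoted statistic 12.712. It also totals observed and expected same-colour counts, which is the comparison part (c) refers to. The 5% critical value 9.488 for 4 degrees of freedom is the standard table value; names are illustrative.

```python
# Rows: 2nd colour; columns: 1st colour (Red, Blue, Yellow)
observed = [[31, 11, 18], [8, 10, 9], [21, 9, 33]]
row_totals = [sum(r) for r in observed]          # 60, 27, 63
col_totals = [sum(c) for c in zip(*observed)]    # 60, 30, 60
n = sum(row_totals)                              # 150
expected = [[r * c / n for c in col_totals] for r in row_totals]

same_colour_expected = [expected[i][i] for i in range(3)]   # Red/Red, Blue/Blue, Yellow/Yellow
same_colour_observed = [observed[i][i] for i in range(3)]
stat = sum((o - e) ** 2 / e for orow, erow in zip(observed, expected) for o, e in zip(orow, erow))
df = (3 - 1) * (3 - 1)
critical_5pc = 9.488  # upper 5% point of chi-squared with 4 df
print([round(e, 1) for e in same_colour_expected], sum(same_colour_observed),
      round(sum(same_colour_expected), 1), round(stat, 3), stat > critical_5pc)
```

Same-colour pairs occur 74 times against 54.6 expected under independence.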
Edexcel FS1 AS 2024 June Q1
6 marks Moderate -0.3
Sharma believes that each computer game he sells appeals equally to all age ranges.
To investigate this, he takes a random sample of 100 people who play these games and asks them which of the games \(A , B\) or \(C\) they prefer.
The results are summarised in the table below.
Computer game
Age range | \(A\) | \(B\) | \(C\)
\(< 20\) | 8 | 15 | 6
\(20 - 30\) | 21 | 12 | 9
\(> 30\) | 6 | 10 | 13
  1. Write down hypotheses for a suitable test to assess Sharma's belief.
  2. For the test, calculate the expected frequency for
    1. those players aged under 20 who prefer game \(C\)
    2. those players aged between 20 and 30 who prefer game \(A\)
  3. State the degrees of freedom of the test statistic for this test. Sharma correctly calculates the test statistic for this test to be 11.542 (to 3 decimal places).
  4. Using a \(5 \%\) significance level, and stating your critical value, comment on Sharma's belief.
Edexcel FS1 AS Specimen Q1
8 marks Standard +0.3
A university foreign language department carried out a survey of prospective students to find out which of three languages they were most interested in studying.
A random sample of 150 prospective students gave the following results.
Language
Gender | French | Spanish | Mandarin
Male | 23 | 22 | 20
Female | 38 | 32 | 15
A test is carried out at the \(1 \%\) level of significance to determine whether or not there is an association between gender and choice of language.
  1. State the null hypothesis for this test.
  2. Show that the expected frequency for females choosing Spanish is 30.6
  3. Calculate the test statistic for this test, stating the expected frequencies you have used.
  4. State whether or not the null hypothesis is rejected. Justify your answer.
  5. Explain whether or not the null hypothesis would be rejected if the test was carried out at the \(10 \%\) level of significance.
Edexcel FS1 2024 June Q3
6 marks Standard +0.3
Tisam took a survey of students' favourite colours. The results are summarised in the table below.
Colour
Year group | Red | Blue | Green | Yellow | Black | Total
1-5 | 34 | 15 | 14 | 22 | 3 | 88
6-9 | 23 | 32 | 12 | 9 | 8 | 84
10-12 | 5 | 28 | 19 | 8 | 8 | 68
Total | 62 | 75 | 45 | 39 | 19 | 240
Tisam carries out a suitable test to see if there is any association between favourite colour and year group.
  1. Write down the hypotheses for a suitable test. For her table, Tisam only needs to check one cell to show that none of the expected frequencies are less than 5.
    1. Identify this cell, giving your reason.
    2. Calculate the expected frequency for this cell. The test statistic for Tisam's test is 38.449.
  2. Using a \(1 \%\) level of significance, complete the test. You should state your critical value and conclusion clearly.
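Because each expected frequency is (row total × column total) / 240, the smallest one must pair the smallest row total with the smallest column total. That is why a single cell is enough to check. The Python sketch below locates that cell, confirms by brute force that no other cell is smaller, and states the degrees of freedom. The 1% critical value 20.090 for 8 degrees of freedom is the standard table value; names are illustrative.

```python
# Rows: year groups 1-5, 6-9, 10-12; columns: red, blue, green, yellow, black
observed = [
    [34, 15, 14, 22, 3],
    [23, 32, 12, 9, 8],
    [5, 28, 19, 8, 8],
]
row_totals = [sum(r) for r in observed]          # 88, 84, 68
col_totals = [sum(c) for c in zip(*observed)]    # 62, 75, 45, 39, 19
n = sum(row_totals)                              # 240

# E = row total * column total / n is smallest where both totals are smallest
i = row_totals.index(min(row_totals))            # year group 10-12
j = col_totals.index(min(col_totals))            # black
smallest = row_totals[i] * col_totals[j] / n     # 68 * 19 / 240

# Brute-force check over every cell
all_expected = [r * c / n for r in row_totals for c in col_totals]
df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical_1pc = 20.090  # upper 1% point of chi-squared with 8 df
print(round(smallest, 3), min(all_expected) == smallest, df, 38.449 > critical_1pc)
```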
OCR MEI Further Statistics A AS Specimen Q3
10 marks Standard +0.3
3 In this question you must show detailed reasoning. A student is investigating what people think about organic food. She wishes to see if there is any difference between the opinions of females and males. She takes a random sample of 100 people and asks each of them if they think that organic food is better for their health than non-organic food. She will use the data to conduct a hypothesis test. The table below shows the opinions of these 100 people.
Sex
Opinion on organic food | Female | Male
Organic better | 35 | 18
Not better | 22 | 25
  1. Explain why the student should use a random sample.
  2. Carry out a test at the \(5 \%\) significance level to examine whether there is any association between a person's sex and their opinion on organic food. Show your calculations.
OCR MEI Further Statistics Minor 2020 November Q3
8 marks Standard +0.3
3 In this question you must show detailed reasoning. In a survey into pet ownership, one of the questions was 'Do you own either a cat or a dog (or both)?'. A total of 121 people took part in the survey and you should assume that they form a random sample of people in a particular town. The results, classified by the age of the person being surveyed, are shown in Table 3.
Table 3
Ownership of cat or dog
Age | Does own | Does not own
Over 45 years | 38 | 29
Under 45 years | 23 | 31
Carry out a test at the \(10 \%\) significance level to investigate whether, for people in this town, there is any association between age and ownership of a cat or dog.
OCR Further Statistics 2018 March Q6
10 marks Standard +0.3
6 The captain of a sports team analyses the team's results according to the weather conditions, classified as "sunny" and "not sunny". The frequencies are shown in the following table.
Results
Weather | Win | Draw | Lose
Sunny | 12 | 3 | 5
Not sunny | 8 | 12 | 10
  1. Test at the \(5 \%\) significance level whether the team's performances are associated with weather conditions.
  2. (a) Identify the cell that gives the largest contribution to the test statistic.
    (b) Interpret your answer to part (ii)(a).
OCR FS1 AS 2018 March Q7
11 marks Standard +0.3
7 The numbers of students taking A levels in three subjects at a school were classified by the year in which they entered the school as follows.
Subject
Year of entry | Mathematics | English | Physics
Year 7 | 17 | 16 | 7
Year 12 | 13 | 2 | 5
The Head of the school carries out a significance test at the \(10 \%\) level to test whether subjects taken are independent of year of entry.
  1. Show that in carrying out the test it is necessary to combine columns.
  2. Suggest a reason why it is more sensible to combine the columns for Mathematics and Physics than the columns for Physics and English.
  3. Carry out the test.
  4. State which cell gives the largest contribution to the test statistic.
  5. Interpret your answer to part (iv).
Edexcel S3 Q5
11 marks Standard +0.3
5. The manager of a leisure centre collected data on the usage of the facilities in the centre by its members. A random sample from her records is summarised below.
Facility | Male | Female
Pool | 40 | 68
Jacuzzi | 26 | 33
Gym | 52 | 31
Making your method clear, test whether or not there is any evidence of an association between gender and use of the club facilities. State your hypotheses clearly and use a \(5 \%\) level of significance.
AQA S2 2009 January Q1
11 marks Standard +0.3
1 Fortune High School gave its students a wider choice of subjects to study. The table shows the number of students, of each gender, who chose to study each of the additional subjects during the school year 2007/08.
Gender | Bulgarian | Climate Change | Finance | Polish
Male | 7 | 31 | 25 | 40
Female | 2 | 24 | 22 | 19
Assuming that these data form a random sample, use a \(\chi ^ { 2 }\) test, at the \(10 \%\) level of significance, to test whether the choice of these subjects is independent of gender.
(11 marks)
AQA S2 2007 June Q1
10 marks Standard +0.3
1 Two groups of patients, suffering from the same medical condition, took part in a clinical trial of a new drug. One of the groups was given the drug whilst the other group was given a placebo, a drug that has no physical effect on their medical condition. The table shows the number of patients in each group and whether or not their condition improved.
Outcome | Placebo | Drug
Condition improved | 20 | 46
Condition did not improve | 55 | 29
Conduct a \(\chi ^ { 2 }\) test, at the \(5 \%\) level of significance, to determine whether the condition of the patients at the conclusion of the trial is associated with the treatment that they were given.
(10 marks)
AQA S2 2009 June Q3
12 marks Standard +0.3
3 A sample survey, conducted to determine the attitudes of residents to a proposed reorganisation of local schools, gave the following results.
Age of resident | Against reorganisation | Not against reorganisation
16-17 | 9 | 2
18-21 | 17 | 10
22-49 | 115 | 90
50-65 | 41 | 34
Over 65 | 3 | 4
Use a \(\chi ^ { 2 }\) test, at the \(5 \%\) level of significance, to determine whether there is an association between the ages of residents and their attitudes to the proposed reorganisation of local schools.
AQA Further AS Paper 2 Statistics 2018 June Q8
10 marks Standard +0.3
8 An insurance company groups its vehicle insurance policies into two categories, car insurance and motorbike insurance. The number of claims in a random sample of 80 policies was monitored and the results summarised in contingency Table 1.
Table 1
Number of claims
Type of insurance policy | 0 | 1 | 2 | 3 or more | Total
Car | 9 | 10 | 11 | 5 | 35
Motorbike | 19 | 13 | 8 | 5 | 45
Total | 28 | 23 | 19 | 10 | 80
The insurance company decides to carry out a \(\chi ^ { 2 }\)-test for association between number of claims and type of insurance policy using the information given in Table 1.
  1. The contingency table shown in Table 2 gives some of the exact expected frequencies for this test. Complete Table 2 with the missing exact expected values.
    Table 2
    Number of claims
    Type of insurance policy | 0 | 1 | 2 | 3 or more
    Car |  | 10.0625 |  | 4.375
    Motorbike |  |  | 10.6875 | 
  2. Carry out the insurance company's test, using the \(10 \%\) level of significance.
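A Python sketch of both parts: `fractions.Fraction` keeps the expected frequencies exact, as Table 2 asks. Because the Car, "3 or more" cell has expected value 4.375 < 5, the sketch assumes the last two columns are combined into "2 or more" before the test. That choice is left to the candidate and is not stated in the question. The 10% critical value 4.605 for 2 degrees of freedom is the standard table value; helper names are illustrative.

```python
from fractions import Fraction

# Rows: car, motorbike; columns: 0, 1, 2, 3 or more claims
observed = [[9, 10, 11, 5], [19, 13, 8, 5]]


def expected_table(rows):
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    # Fraction keeps each row total * column total / n exact
    return [[Fraction(r * c, n) for c in col_totals] for r in row_totals]


expected = expected_table(observed)
print([[float(e) for e in row] for row in expected])

# Assumption: merge "2" and "3 or more" into "2 or more" since 4.375 < 5
merged = [row[:2] + [row[2] + row[3]] for row in observed]
exp_merged = expected_table(merged)
stat = sum((o - e) ** 2 / e for orow, erow in zip(merged, exp_merged) for o, e in zip(orow, erow))
df = (len(merged) - 1) * (len(merged[0]) - 1)   # (2 - 1)(3 - 1)
critical_10pc = 4.605  # upper 10% point of chi-squared with 2 df
print(round(float(stat), 3), df, float(stat) > critical_10pc)
```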
AQA Further AS Paper 2 Statistics 2019 June Q7
9 marks Standard +0.3
7 Mohammed is conducting a medical trial to study the effect of two drugs, \(A\) and \(B\), on the amount of time it takes to recover from a particular illness. Drug \(A\) is used by one group of 60 patients and drug \(B\) is used by a second group of 60 patients. The results are summarised in the table:
AQA Further AS Paper 2 Statistics 2022 June Q7
8 marks Standard +0.3
7 Wade and Odelia are investigating whether there is an association between the region where a person lives and the brand of washing powder they use. They decide to conduct a \(\chi ^ { 2 }\)-test for association and survey a random sample of 200 people. The expected frequencies for the test have been calculated and are shown in the contingency table below.
AQA Further AS Paper 2 Statistics 2023 June Q7
10 marks Standard +0.3
7 A theatre has morning, afternoon and evening shows. On one particular day, the theatre asks all of its customers to state whether they enjoyed or did not enjoy the show. The results are summarised in the table.
Response | Morning show | Afternoon show | Evening show | Total
Enjoyed | 62 | 91 | 172 | 325
Not enjoyed | 25 | 35 | 115 | 175
Total | 87 | 126 | 287 | 500
The theatre claims that there is no association between the show that a customer attends and whether they enjoyed the show.
  1. Investigate the theatre's claim, using a \(2.5 \%\) level of significance.
  2. By considering observed and expected frequencies, interpret in context the association between the show that a customer attends and whether they enjoyed the show.
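A Python sketch of both parts, using the table above: it computes the statistic for the 2 × 3 table and then prints, for each show, whether more or fewer customers enjoyed it than independence would predict. This is the observed-versus-expected comparison part 2 asks for. The 2.5% critical value 7.378 for 2 degrees of freedom is the standard table value; names are illustrative.

```python
# Rows: enjoyed, not enjoyed; columns: morning, afternoon, evening show
observed = [[62, 91, 172], [25, 35, 115]]
row_totals = [sum(r) for r in observed]          # 325, 175
col_totals = [sum(c) for c in zip(*observed)]    # 87, 126, 287
n = sum(row_totals)                              # 500
expected = [[r * c / n for c in col_totals] for r in row_totals]
stat = sum((o - e) ** 2 / e for orow, erow in zip(observed, expected) for o, e in zip(orow, erow))
df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical_2_5pc = 7.378  # upper 2.5% point of chi-squared with 2 df

# Sign of O - E in the "Enjoyed" row shows which shows did better or worse than independence predicts
shows = ["Morning", "Afternoon", "Evening"]
for show, o, e in zip(shows, observed[0], expected[0]):
    print(f"{show}: enjoyed {o} vs expected {e:.2f} ({'more' if o > e else 'fewer'} than expected)")
print(round(stat, 3), df, stat > critical_2_5pc)
```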