Chi-squared test of independence

158 questions · 14 question types identified

Standard 2×3 contingency table

A question is this type if and only if the data form a 2-row by 3-column (or 3-row by 2-column) contingency table requiring a chi-squared test of independence with 2 degrees of freedom, with no need to combine cells.

36 Standard +0.1
22.8% of questions
Show example »
7 Mohammed is conducting a medical trial to study the effect of two drugs, \(A\) and \(B\), on the amount of time it takes to recover from a particular illness. Drug \(A\) is used by one group of 60 patients and drug \(B\) is used by a second group of 60 patients. The results are summarised in the table:
Hardest question Standard +0.3 »
1 Young children are learning to read using two different reading schemes, \(A\) and \(B\). The standards achieved are measured against the national average standard achieved and classified as above average, average or below average. For two randomly chosen groups of young children, the numbers in each category are shown in the table.
             | Standard achieved
             | Above average | Average | Below average
Scheme \(A\) | 31            | 35      | 22
Scheme \(B\) | 19            | 50      | 43
Test at the \(5 \%\) significance level whether standard achieved is independent of the reading scheme used.
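As a rough illustration of the calculation this question type calls for (a sketch worked from the counts in the table above, not an official solution), the statistic can be built from expected frequencies of the form row total × column total ÷ grand total. The helper names and the tabulated critical value 5.991 (2 degrees of freedom, 5% level) are assumptions introduced here for the example.

```python
# Observed counts (rows: Scheme A, Scheme B;
# columns: above average, average, below average).
observed = [[31, 35, 22], [19, 50, 43]]


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Pearson statistic: the sum of (O - E)^2 / E over every cell."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


statistic = chi_squared(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1)(columns - 1)
CRITICAL_5_PERCENT_DF2 = 5.991  # tabulated upper 5% point, chi-squared, 2 d.f.
reject_h0 = statistic > CRITICAL_5_PERCENT_DF2

print(f"chi-squared = {statistic:.3f} on {df} d.f.; reject H0: {reject_h0}")
```

Comparing the statistic with the tabulated critical value mirrors what a written solution would do; a p-value could be used instead.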
Standard 2×2 contingency table

A question is this type if and only if the data form a 2-row by 2-column contingency table requiring a chi-squared test of independence with 1 degree of freedom.

24 Standard +0.1
15.2% of questions
Show example »
Carl believes that the proportions of men and women who own black cars are different. He obtained a random sample of people who each owned exactly one car. The results are summarised in the table below.
      | Black | Non-black
Men   | 69    | 71
Women | 30    | 55
Test at the 5\% significance level whether Carl's belief is justified. [8]
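For 2×2 tables some syllabuses expect Yates' continuity correction and others do not. The sketch below, using the counts from the table above, computes both versions so they can be compared with the tabulated 5% critical value for 1 degree of freedom (3.841). The function names are illustrative, and which version a mark scheme wants is not stated in the source.

```python
# Observed counts (rows: men, women; columns: black, non-black).
observed = [[69, 71], [30, 55]]


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_2x2(table, yates=False):
    """Pearson statistic; with yates=True each |O - E| is reduced by 0.5.

    Assumes every |O - E| is at least 0.5 when the correction is used.
    """
    expected = expected_frequencies(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e) - (0.5 if yates else 0.0)
            total += diff ** 2 / e
    return total


plain = chi_squared_2x2(observed)
corrected = chi_squared_2x2(observed, yates=True)
CRITICAL_5_PERCENT_DF1 = 3.841  # tabulated upper 5% point, chi-squared, 1 d.f.

print(f"uncorrected = {plain:.3f}, Yates-corrected = {corrected:.3f}")
print(f"critical value = {CRITICAL_5_PERCENT_DF1}")
```

With these counts the two versions fall on opposite sides of 3.841, so the conclusion depends on which convention is being followed.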
Easiest question Moderate -0.3 »
6 Certain types of food are now sold in metric units. A random sample of 1000 shoppers was asked whether they were in favour of the change to metric units or not. The results, classified according to age, were as shown in the table.
                        | Age of shopper
                        | Under 35 | 35 and over | Total
In favour of change     | 187      | 161         | 348
Not in favour of change | 283      | 369         | 652
Total                   | 470      | 530         | 1000
  (i) Use a \(\chi ^ { 2 }\) test to show that there is very strong evidence that shoppers' views about changing to metric units are not independent of their ages.
  (ii) The data may also be regarded as consisting of two random samples of shoppers; one sample consists of 470 shoppers aged under 35, of whom 187 were in favour of change, and the second sample consists of 530 shoppers aged 35 or over, of whom 161 were in favour of change. Determine whether a test for equality of population proportions supports the conclusion in part (i).
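The two parts of this question are closely linked: for a 2×2 table, the uncorrected chi-squared statistic equals the square of the pooled two-proportion z statistic. The following sketch checks that numerically for the shopper data; the variable names are illustrative and no continuity correction is applied.

```python
import math

favour_under35, n_under35 = 187, 470
favour_over35, n_over35 = 161, 530

# Rows: in favour, not in favour; columns: under 35, 35 and over.
observed = [
    [favour_under35, favour_over35],
    [n_under35 - favour_under35, n_over35 - favour_over35],
]


def chi_squared(table):
    """Pearson statistic with expected counts row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
        / (row_totals[i] * col_totals[j] / grand_total)
        for i in range(len(table))
        for j in range(len(table[0]))
    )


# Pooled two-proportion z statistic.
p1 = favour_under35 / n_under35
p2 = favour_over35 / n_over35
pooled = (favour_under35 + favour_over35) / (n_under35 + n_over35)
standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_under35 + 1 / n_over35))
z = (p1 - p2) / standard_error

statistic = chi_squared(observed)
print(f"z = {z:.3f}, z^2 = {z * z:.3f}, chi-squared = {statistic:.3f}")
```

Because the two statistics agree in this way, the two-sided proportions test and the uncorrected chi-squared test lead to the same decision at any given level.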
Hardest question Standard +0.8 »
4 A study in 1981 investigated the effect of water fluoridation on children's dental health. In a town with fluoridation, 61 out of a random sample of 107 children showed signs of increased tooth decay after six months. In a town without fluoridation the corresponding number was 106 out of a random sample of 143 children. The population proportions of children with increased tooth decay are denoted by \(p _ { 1 }\) and \(p _ { 2 }\) for the towns with fluoridation and without fluoridation respectively. A test is carried out of the null hypothesis \(p _ { 1 } = p _ { 2 }\) against the alternative hypothesis \(p _ { 1 } < p _ { 2 }\). Find the smallest significance level at which the null hypothesis is rejected.
Expected frequencies partially provided

A question is this type if and only if some expected frequencies or contributions to the test statistic are given in a table and the student must complete the table and/or verify specific values before carrying out the test.

22 Standard +0.2
13.9% of questions
Show example »
7 Wade and Odelia are investigating whether there is an association between the region where a person lives and the brand of washing powder they use. They decide to conduct a \(\chi ^ { 2 }\)-test for association and survey a random sample of 200 people. The expected frequencies for the test have been calculated and are shown in the contingency table below.
Easiest question Moderate -0.3 »
5 Gloria is a market trader who sells jeans. She trades on Mondays, Wednesdays and Fridays. Wishing to investigate whether the volume of trade depends on the day of the week, Gloria analysed a random sample of 150 days' sales and classified them by day and volume (low, medium and high). The results are given in the table below.
Volume | Day
       | Monday | Wednesday | Friday
Low    | 15     | 13        | 2
Medium | 23     | 26        | 23
High   | 12     | 9         | 27
Gloria asked a statistician to perform a suitable test of independence and, as part of this test, expected frequencies were calculated. These are shown in the table below.
Volume | Day
       | Monday | Wednesday | Friday
Low    | 10.00  | 9.60      | 10.40
Medium | 24.00  | 23.04     | 24.96
High   | 16.00  | 15.36     | 16.64
  1. Show how the value 23.04 for medium volume on Wednesday has been obtained.
  2. State, giving a reason, if it is necessary to combine any rows or columns in order to carry out the test.
The value of the test statistic is found to be 21.15, correct to 2 decimal places.
  3. Stating suitable hypotheses for the test, give its conclusion using a \(1 \%\) significance level.
Gloria wishes to hold a sale and asks the statistician to advise her on which day to hold it in order to sell as much as possible.
  4. State the day that the statistician should advise and give a reason for the choice.
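The quoted expected frequency and test statistic can be reproduced from the observed table. This is a minimal sketch under the usual assumption that each expected count is row total × column total ÷ grand total; the function and variable names are made up for the illustration.

```python
# Observed counts (rows: low, medium, high; columns: Monday, Wednesday, Friday).
observed = [[15, 13, 2], [23, 26, 23], [12, 9, 27]]


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
medium_wednesday = expected[1][1]  # medium row total 72, Wednesday column total 48

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
smallest_expected = min(min(row) for row in expected)

print(f"E(medium, Wednesday) = {medium_wednesday:.2f}")
print(f"smallest expected frequency = {smallest_expected:.2f}")
print(f"chi-squared = {statistic:.2f}")
```

The smallest expected frequency is the quantity that matters for part 2, since the usual rule of thumb asks for every expected count to be at least 5.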
Hardest question Standard +0.3 »
4 A student is investigating whether there is any association between the species of shellfish that occur on a rocky shore and where they are located. A random sample of 160 shellfish is selected and the numbers of shellfish in each category are summarised in the table below.
Species | Location
        | Exposed | Sheltered | Pool
Limpet  | 24      | 32        | 16
Mussel  | 24      | 11        | 3
Other   | 5       | 22        | 23
  1. Write down null and alternative hypotheses for a test to examine whether there is any association between species and location.
The contributions to the test statistic for the usual \(\chi ^ { 2 }\) test are shown in the table below.
Species | Contribution by location
        | Exposed | Sheltered | Pool
Limpet  | 0.0009  | 0.2585    | 0.4450
Mussel  | 10.3472 | 1.2756    | 4.8773
Other   | 8.0719  | 0.1402    | 7.4298
The sum of these contributions is 32.85.
  2. Calculate the expected frequency for mussels in pools. Verify the corresponding contribution 4.8773 to the test statistic.
  3. Carry out the test at the \(5 \%\) level of significance, stating your conclusion clearly.
  4. For each species, comment briefly on how its distribution compares with what would be expected if there were no association.
  5. If 3 of the 160 shellfish are selected at random, one from each of the 3 types of location, find the probability that all 3 of them are limpets.
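As a sketch of the arithmetic behind part 2 (worked from the observed table, not taken from a mark scheme): the mussel row total is 24 + 11 + 3 = 38 and the pool column total is 16 + 3 + 23 = 42, so the expected frequency and its contribution are as follows.

```latex
\[
E_{\text{mussel, pool}} = \frac{38 \times 42}{160} = 9.975,
\qquad
\frac{(O - E)^2}{E} = \frac{(3 - 9.975)^2}{9.975}
                    = \frac{48.650625}{9.975} \approx 4.8773 .
\]
```

The observed count of 3 is well below the expected 9.975, which is the kind of comparison part 4 asks for.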
Interpret association after test

A question is this type if and only if, after performing the chi-squared test, the student is explicitly asked to interpret or comment on the nature of the association by comparing observed and expected frequencies for specific cells or categories.

18 Standard +0.3
11.4% of questions
Show example »
6 During August, 102 candidates took their driving test at centre \(A\) and 60 passed. During the same month, 110 candidates took their driving test at centre \(B\) and 80 passed.
  (a) Test whether the driving test result is independent of the driving test centre using the \(5 \%\) level of significance.
  (b) Rebecca claims that if the result of the test in part (a) is to reject the null hypothesis then it is easier to pass a driving test at centre \(B\) than centre \(A\). State, with a reason, whether or not you agree with Rebecca's claim.
Easiest question Moderate -0.3 »
2 A large multinational company recruits employees from all four countries in the UK. For a sample of 250 recruits, the percentages of males and females from each of the countries are shown in Table 1.
Table 1
       | England | Scotland | Wales | Northern Ireland
Male   | 22.8    | 17.6     | 10.8  | 6.8
Female | 15.6    | 17.2     | 7.6   | 1.6
  1. Add the frequencies to the contingency table, Table 2, below.
  2. Carry out a \(\chi ^ { 2 }\)-test at the \(10 \%\) significance level to investigate whether there is an association between country and gender of recruits.
  3. By comparing observed and expected values, make one comment about the distribution of female recruits. [1 mark]
Table 2
       | England | Scotland | Wales | Northern Ireland | Total
Male   |         |          |       |                  | 145
Female |         |          |       |                  | 105
Total  |         |          |       |                  | 250
Hardest question Standard +0.3 »
5 Two companies, \(P\) and \(Q\), produce a certain type of paint brush. An independent examiner rates the quality of the brushes produced as poor, satisfactory or good. He takes a random sample of brushes from each company. The examiner's ratings are summarised in the table.
Company | Poor | Satisfactory | Good
\(P\)   | 18   | 43           | 64
\(Q\)   | 22   | 22           | 31
  1. Test, at the \(5 \%\) significance level, whether quality of brushes is independent of company.
  2. Compare the quality of the brushes produced by the two companies.
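One common way to approach the interpretation part of such a question is to look at the individual contributions \((O - E)^2 / E\) and see which cell dominates. The sketch below does this for the paint brush counts; it is an illustration under that assumption, with invented helper names, rather than a model answer.

```python
companies = ["P", "Q"]
ratings = ["Poor", "Satisfactory", "Good"]
observed = [[18, 43, 64], [22, 22, 31]]


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)

# Contribution of each cell to the test statistic, keyed by (company, rating).
contributions = {
    (companies[i], ratings[j]): (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(companies))
    for j in range(len(ratings))
}
statistic = sum(contributions.values())

largest_cell = max(contributions, key=contributions.get)
i, j = companies.index(largest_cell[0]), ratings.index(largest_cell[1])
direction = "more" if observed[i][j] > expected[i][j] else "fewer"

print(f"chi-squared = {statistic:.3f}")
print(f"largest contribution: {largest_cell}, observed {observed[i][j]}, "
      f"expected {expected[i][j]:.1f} ({direction} than expected)")
```

A written comment would then describe that cell in context, for example by saying which company has more or fewer brushes in that rating than independence would predict.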
Cell combining required

A question is this type if and only if the student is explicitly required to combine rows or columns before carrying out the chi-squared test because one or more expected frequencies would otherwise fall below 5.

15 Standard +0.1
9.5% of questions
Show example »
3 A sample survey, conducted to determine the attitudes of residents to a proposed reorganisation of local schools, gave the following results.
Age of resident | Against reorganisation | Not against reorganisation
16-17           | 9                      | 2
18-21           | 17                     | 10
22-49           | 115                    | 90
50-65           | 41                     | 34
Over 65         | 3                      | 4
Use a \(\chi ^ { 2 }\) test, at the \(5 \%\) level of significance, to determine whether there is an association between the ages of residents and their attitudes to the proposed reorganisation of local schools.
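The need to combine cells can be checked mechanically: compute the expected frequencies, flag any row containing a value below 5, then merge and recompute. The sketch below does this for the age data above. Merging each sparse group with its neighbouring age group is an assumption made here for illustration; other sensible groupings are possible.

```python
age_groups = ["16-17", "18-21", "22-49", "50-65", "Over 65"]
# Columns: against reorganisation, not against reorganisation.
observed = [[9, 2], [17, 10], [115, 90], [41, 34], [3, 4]]


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Pearson statistic: the sum of (O - E)^2 / E over every cell."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Age groups with at least one expected frequency below 5.
sparse_groups = [
    group
    for group, row in zip(age_groups, expected_frequencies(observed))
    if min(row) < 5
]

# Assumption: merge each sparse group with the adjacent age group.
combined = [
    [a + b for a, b in zip(observed[0], observed[1])],  # 16-21
    observed[2],                                        # 22-49
    [a + b for a, b in zip(observed[3], observed[4])],  # 50 and over
]
min_expected = min(min(row) for row in expected_frequencies(combined))
statistic = chi_squared(combined)
df = (len(combined) - 1) * (len(combined[0]) - 1)

print(f"sparse groups: {sparse_groups}")
print(f"after merging: min expected = {min_expected:.2f}, "
      f"chi-squared = {statistic:.3f} on {df} d.f.")
```

Note that the degrees of freedom are counted after merging, so they come from the 3×2 table rather than the original 5×2 table.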
Easiest question Easy -1.2 »
2 A test for association is to be carried out. The tables below show the observed frequencies and the expected frequencies that are to be used for the test.
Observed | X  | Y  | Z
A        | 28 | 6  | 66
B        | 8  | 8  | 4
C        | 54 | 16 | 10

Expected | X  | Y  | Z
A        | 45 | 15 | 40
B        | 9  | 3  | 8
C        | 36 | 12 | 32
It is necessary to merge some rows or columns before the test can be carried out.
Find the entry in the tables that provides evidence for this.
Circle your answer.
[1 mark]
Observed A-Z
Observed B-Z
Expected A-X
Expected B-Y
Hardest question Standard +0.3 »
6 A scientist is investigating whether the ability to remember depends on age. A random sample of 150 students in different age groups is chosen. Each student is shown a set of 20 objects for thirty seconds and then asked to list as many as they can remember. The students are graded \(A\) or \(B\) according to how many objects they remembered correctly: grade \(A\) for 16 or more correct and grade \(B\) for fewer than 16 correct. The results are shown in the table.
            | Age of students
            | \(11 - 12\) years | \(13 - 14\) years | \(15 - 16\) years
Grade \(A\) | 25                | 16                | 19
Grade \(B\) | 28                | 45                | 17
  (a) Carry out a \(\chi ^ { 2 }\)-test at the \(2.5 \%\) significance level to test whether grade is independent of age of student.
The scientist decides instead to use three grades: grade \(A\) for 16 or more correct, grade \(B\) for 10 to 15 correct and grade \(C\) for fewer than 10 correct. The results are shown in the following table.
            | Age of students
            | 11-12 years | 13-14 years | 15-16 years
Grade \(A\) | 25          | 16          | 19
Grade \(B\) | 12          | 27          | 11
Grade \(C\) | 16          | 18          | 6
With this second set of data, the test statistic is calculated as 10.91.
  (b) Complete the \(\chi ^ { 2 }\)-test at the \(2.5 \%\) significance level for this second set of data.
  (c) State, with a reason, whether you would prefer to use the result from part (a) or part (b) to investigate whether the ability to remember depends on age.
Standard 3×3 contingency table

A question is this type if and only if the data form a 3-row by 3-column contingency table requiring a chi-squared test of independence with 4 degrees of freedom, with no need to combine cells.

14 Standard +0.4
8.9% of questions
Show example »
6. A market researcher recorded the number of adverts for vehicles in each of three categories on ITV, Channel 4 and Channel 5 over a period of time. The results are shown in the table below.
                 | ITV | Channel 4 | Channel 5
Family Saloon    | 69  | 35        | 28
Sports Car       | 20  | 28        | 18
Off-road Vehicle | 12  | 22        | 8
  (a) Stating your hypotheses clearly, test at the \(5 \%\) level of significance whether or not there is evidence of the proportion of adverts for each type of vehicle being dependent on the channel.
  (b) Suggest a reason for your result in part (a).
Easiest question Standard +0.3 »
3 There are three bus companies in a city. The council is investigating whether the buses reliably arrive at their destination on time. The results from random samples of buses from each company are summarised in the following table.
Arrival | Bus company
        | \(A\) | \(B\) | \(C\) | Total
Early   | 22    | 22    | 10    | 54
On time | 30    | 52    | 42    | 124
Late    | 28    | 26    | 18    | 72
Total   | 80    | 100   | 70    | 250
Test, at the \(5 \%\) significance level, whether the reliability of buses is independent of bus company.
Hardest question Standard +0.8 »
4 An agricultural company conducts a trial of five fertilisers (A, B, C, D, E) in an experimental field at its research station. The fertilisers are applied to plots of the field according to a completely randomised design. The yields of the crop from the plots, measured in a standard unit, are analysed by the one-way analysis of variance, from which it appears that there are no real differences among the effects of the fertilisers. A statistician notes that the residual mean square in the analysis of variance is considerably larger than had been anticipated from knowledge of the general behaviour of the crop, and therefore suspects that there is some inadequacy in the design of the trial.
  (i) Explain briefly why the statistician should be suspicious of the design.
  (ii) Explain briefly why an inflated residual leads to difficulty in interpreting the results of the analysis of variance, in particular that the null hypothesis is more likely to be accepted erroneously.
Further investigation indicates that the soil at the west side of the experimental field is naturally more fertile than that at the east side, with a consistent 'fertility gradient' from west to east.
  (iii) What experimental design can accommodate this feature? Provide a simple diagram of the experimental field indicating a suitable layout.
The company decides to conduct a new trial in its glasshouse, where experimental conditions can be controlled so that a completely randomised design is appropriate. The yields are as follows.
Fertiliser A | Fertiliser B | Fertiliser C | Fertiliser D | Fertiliser E
23.6         | 26.0         | 18.8         | 29.0         | 17.7
18.2         | 35.3         | 16.7         | 37.2         | 16.5
32.4         | 30.5         | 23.0         | 32.6         | 12.8
20.8         | 31.4         | 28.3         | 31.4         | 20.4
[The sum of these data items is 502.6 and the sum of their squares is 13610.22.]
  (iv) Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a \(5 \%\) significance level. Report briefly on your conclusions.
  (v) State the assumptions about the distribution of the experimental error that underlie your analysis in part (iv).
Larger contingency table (4+ categories)

A question is this type if and only if the contingency table has at least one dimension with 4 or more categories and no cell-combining is required, resulting in 6 or more degrees of freedom.

8 Standard +0.0
5.1% of questions
Show example »
A \(\chi^2\) test is carried out in a school to test for association between the class a student belongs to and the number of times they are late to school in a week. The contingency table below gives the expected values for the test.
      | Number of times late
Class | 0     | 1    | 2     | 3    | 4
A     | 8.12  | 14   | 15.12 | 14   | 4.76
B     | 8.99  | 15.5 | 16.74 | 15.5 | 5.27
C     | 11.89 | 20.5 | 22.14 | 20.5 | 6.97
Find a possible value for the degrees of freedom for the test. Circle your answer. [1 mark]
6    8    12    15
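For reference, the degrees of freedom of a contingency table with \(r\) rows and \(c\) columns follow the standard formula below. The two evaluations are a sketch only: the first uses the table exactly as printed, the second assumes the last two columns are merged because the expected value 4.76 is below 5. The source does not say which reading is intended.

```latex
\[
\nu = (r - 1)(c - 1)
\]
\[
\text{as printed } (r = 3,\ c = 5):\quad \nu = (3 - 1)(5 - 1) = 8
\]
\[
\text{columns ``3'' and ``4'' merged } (r = 3,\ c = 4):\quad \nu = (3 - 1)(4 - 1) = 6
\]
```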
Test statistic given, complete the test

A question is this type if and only if the value of the chi-squared test statistic is provided and the student is only required to complete the hypothesis test by comparing to a critical value and stating a conclusion.

6 Moderate -0.2
3.8% of questions
Show example »
In a test of association of two factors, \(A\) and \(B\), a \(2 \times 2\) contingency table yielded \(5.63\) for the value of \(\chi^2\) with Yates' correction.
  1. State the null hypothesis and alternative hypothesis for the test. [1]
  2. State how Yates' correction is applied, and whether it increases or decreases the value of \(\chi^2\). [2]
  3. Carry out the test at the \(2\frac{1}{2}\%\) significance level. [3]
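When the statistic is already given, completing the test amounts to one comparison. The sketch below compares 5.63 with the tabulated \(2\frac{1}{2}\%\) critical value for 1 degree of freedom (5.024) and, as a cross-check, computes the p-value using the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal. The function name is invented for the illustration.

```python
import math

statistic = 5.63                  # Yates-corrected value quoted in the question
CRITICAL_2_5_PERCENT_DF1 = 5.024  # tabulated upper 2.5% point, chi-squared, 1 d.f.


def chi2_upper_tail_df1(x):
    """P(X > x) for chi-squared with 1 d.f., via X = Z^2 with Z standard normal."""
    return math.erfc(math.sqrt(x / 2))


p_value = chi2_upper_tail_df1(statistic)
reject_h0 = statistic > CRITICAL_2_5_PERCENT_DF1

print(f"p-value = {p_value:.4f}; reject H0 at 2.5%: {reject_h0}")
```

Yates' correction subtracts 0.5 from each \(|O - E|\) before squaring, so it normally lowers the statistic relative to the uncorrected version.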
Percentages given, table construction required

A question is this type if and only if the observed frequencies are presented as percentages rather than raw counts, requiring the student to first construct the contingency table of actual frequencies before performing the test.

5 Standard +0.2
3.2% of questions
Show example »
5. A random sample of 500 adults completed a questionnaire on how often they took part in some form of exercise. They gave a response of 'never', 'sometimes' or 'regularly'. Of those asked, \(52 \%\) were females of whom \(10 \%\) never exercised and \(35 \%\) exercised regularly. Of the males, \(12.5 \%\) never exercised and \(55 \%\) sometimes exercised. Test, at the \(5 \%\) level of significance, whether or not there is any association between gender and the amount of exercise. State your hypotheses clearly.
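The first step in a question like this is turning the percentages into counts. A minimal sketch of that conversion follows, reading the percentages exactly as stated (52% of the 500 are female, and the remaining percentages apply within each gender). The variable names and the final statistic are illustrative rather than a model answer.

```python
sample_size = 500
females = round(sample_size * 52 / 100)
males = sample_size - females

female_never = round(females * 10 / 100)
female_regularly = round(females * 35 / 100)
female_sometimes = females - female_never - female_regularly

male_never = round(males * 12.5 / 100)
male_sometimes = round(males * 55 / 100)
male_regularly = males - male_never - male_sometimes

# Rows: female, male; columns: never, sometimes, regularly.
observed = [
    [female_never, female_sometimes, female_regularly],
    [male_never, male_sometimes, male_regularly],
]


def chi_squared(table):
    """Pearson statistic with expected counts row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            total += (o - e) ** 2 / e
    return total


statistic = chi_squared(observed)
print(observed)
print(f"chi-squared = {statistic:.3f} on 2 d.f.")
```

The test must be carried out on these counts, not on the percentages themselves, because the statistic depends on the sample size.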
Contingency table construction from description

A question is this type if and only if the contingency table is not fully given but must be constructed by the student from a written description of counts before the chi-squared test can be performed.

4 Standard +0.3
2.5% of questions
Show example »
5 Students at a science department of a university are offered the opportunity to study an optional language module, either German or Mandarin, during their second year of study. From a sample of 50 students who opted to study a language module, 31 were female. Of those who opted to study Mandarin, 8 were female and 12 were male. Test, using the \(5 \%\) level of significance, whether choice of language is independent of gender. The sample of students may be regarded as random.
[8 marks]
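Here the table has to be deduced from the description before any test can start. A short sketch of that bookkeeping, assuming every student in the sample chose exactly one of the two languages, is shown below. The variable names are illustrative.

```python
total_students, total_female = 50, 31
mandarin_female, mandarin_male = 8, 12

total_male = total_students - total_female
german_female = total_female - mandarin_female
german_male = total_male - mandarin_male

# Rows: female, male; columns: German, Mandarin.
observed = [[german_female, mandarin_female], [german_male, mandarin_male]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
expected = [
    [r * c / total_students for c in col_totals] for r in row_totals
]
min_expected = min(min(row) for row in expected)

print(observed)
print(f"smallest expected frequency = {min_expected:.1f}")
```

Checking the smallest expected frequency at this stage shows whether the usual "at least 5" condition holds before the statistic is calculated.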
Scaled sample, find minimum N

A question is this type if and only if, after an initial chi-squared test, the student is asked to find the minimum integer scaling factor N such that a larger sample with the same proportions would lead to a different conclusion at a given significance level.

3 Challenging +1.1
1.9% of questions
Show example »
Random samples of employees are taken from two companies, \(A\) and \(B\). Each employee is asked which of three types of coffee (Cappuccino, Latte, Ground) they prefer. The results are shown in the following table.
              | Cappuccino | Latte | Ground
Company \(A\) | 60         | 52    | 32
Company \(B\) | 35         | 40    | 31
Test, at the 5\% significance level, whether coffee preferences of employees are independent of their company. [7]
Larger random samples, consisting of \(N\) times as many employees from each company, are taken. In each company, the proportions of employees preferring the three types of coffee remain unchanged. Find the least possible value of \(N\) that would lead to the conclusion, at the 1\% significance level, that coffee preferences of employees are not independent of their company. [4]
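The key idea in this question type is that multiplying every observed count by \(N\) also multiplies every expected count by \(N\), so the test statistic is multiplied by \(N\). A sketch of the resulting search for the least \(N\) follows, using the counts in the table above and the tabulated 1% critical value for 2 degrees of freedom (9.210). The names are illustrative.

```python
import math

# Observed counts (rows: Company A, Company B; columns: cappuccino, latte, ground).
observed = [[60, 52, 32], [35, 40, 31]]


def chi_squared(table):
    """Pearson statistic with expected counts row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            total += (o - e) ** 2 / e
    return total


base_statistic = chi_squared(observed)
CRITICAL_1_PERCENT_DF2 = 9.210  # tabulated upper 1% point, chi-squared, 2 d.f.

# Least integer N with N * base_statistic > critical value.
least_n = math.floor(CRITICAL_1_PERCENT_DF2 / base_statistic) + 1

# Cross-check by scaling the table directly.
scaled = [[least_n * o for o in row] for row in observed]
scaled_statistic = chi_squared(scaled)

print(f"base statistic = {base_statistic:.3f}, least N = {least_n}, "
      f"scaled statistic = {scaled_statistic:.3f}")
```

The direct recomputation on the scaled table is only a check; the linear scaling argument is what a written solution would rely on.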
Type I / Type II error interpretation

A question is this type if and only if, following a chi-squared test, the student is asked to identify or explain a Type I or Type II error in the context of the specific investigation.

2 Standard +0.0
1.3% of questions
Show example »
6 A survey is carried out in an attempt to determine whether the salary achieved by the age of 30 is associated with having had a university education. The results of this survey are given in the table.
                        | Salary < £30000 | Salary \(\geq\) £30000 | Total
University education    | 52              | 78                     | 130
No university education | 63              | 57                     | 120
Total                   | 115             | 135                    | 250
  1. Use a \(\chi ^ { 2 }\) test, at the \(10 \%\) level of significance, to determine whether the salary achieved by the age of 30 is associated with having had a university education.
  2. What do you understand by a Type I error in this context?
Chi-squared test with algebraic entries

A question is this type if and only if one or more cells in the contingency table are expressed as algebraic unknowns (e.g. a, b, e) and the student must work with these expressions to find expected frequencies or interpret the test statistic.

1 Standard +0.3
0.6% of questions
Show example »
5. Charlie carried out a survey on the main type of investment people have. The contingency table below shows the results of a survey of a random sample of people.
            | Main type of investment
Age         | Bonds | Cash       | Stocks
\(25 - 44\) | \(a\) | \(b - e\)  | \(e\)
\(45 - 75\) | \(c\) | \(d - 59\) | 59
  1. Find an expression, in terms of \(a , b , c\) and \(d\), for the difference between the observed and the expected value \(( O - E )\) for the group whose main type of investment is Bonds and who are aged \(45 - 75\). Express your answer as a single fraction in its simplest form.
Given that \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 9.62\) for this information,
  2. test, at the \(5 \%\) level of significance, whether or not there is evidence of an association between the age of a person and the main type of investment they have. You should state your hypotheses, critical value and conclusion clearly. You may assume that no cells need to be combined.
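One way to set up part 1 (a sketch of the algebra, not an official solution) is to note that the unknown \(e\) cancels from the first row total and the 59 cancels from the second, so the row totals are \(a + b\) and \(c + d\), the Bonds column total is \(a + c\), and the grand total is \(a + b + c + d\).

```latex
\[
E = \frac{(c + d)(a + c)}{a + b + c + d}
\]
\[
O - E = c - \frac{(c + d)(a + c)}{a + b + c + d}
      = \frac{c(a + b + c + d) - (c + d)(a + c)}{a + b + c + d}
      = \frac{bc - ad}{a + b + c + d}
\]
```

For part 2 the table is 2×3, so there are \((2 - 1)(3 - 1) = 2\) degrees of freedom and the given value 9.62 would be compared with the tabulated 5% critical value 5.991.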
Independence vs association, state hypotheses only

A question is this type if and only if the primary task is to write down null and alternative hypotheses for a chi-squared test without necessarily completing the full test procedure.

0
0.0% of questions