Cell combining required

A question is this type if and only if the student is explicitly required to combine rows or columns before carrying out the chi-squared test because one or more expected frequencies would otherwise fall below 5.

15 questions · Standard +0.1

5.06a Chi-squared: contingency tables
CAIE Further Paper 4 2023 June Q6
10 marks Standard +0.3
6 A scientist is investigating whether the ability to remember depends on age. A random sample of 150 students in different age groups is chosen. Each student is shown a set of 20 objects for thirty seconds and then asked to list as many as they can remember. The students are graded \(A\) or \(B\) according to how many objects they remembered correctly: grade \(A\) for 16 or more correct and grade \(B\) for fewer than 16 correct. The results are shown in the table.
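\begin{tabular}{|l|c|c|c|}
\cline{2-4}
\multicolumn{1}{c|}{} & \multicolumn{3}{c|}{Age of students} \\
\cline{2-4}
\multicolumn{1}{c|}{} & \(11 - 12\) years & \(13 - 14\) years & \(15 - 16\) years \\
\hline
Grade \(A\) & 25 & 16 & 19 \\
\hline
Grade \(B\) & 28 & 45 & 17 \\
\hline
\end{tabular}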
  1. Carry out a \(\chi ^ { 2 }\)-test at the \(2.5 \%\) significance level to test whether grade is independent of age of student.
    The scientist decides instead to use three grades: grade \(A\) for 16 or more correct, grade \(B\) for 10 to 15 correct and grade \(C\) for fewer than 10 correct. The results are shown in the following table.
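    \begin{tabular}{|l|c|c|c|}
    \hline
    \multirow{2}{*}{} & \multicolumn{3}{c|}{Age of students} \\
    \cline{2-4}
     & 11-12 years & 13-14 years & 15-16 years \\
    \hline
    Grade \(A\) & 25 & 16 & 19 \\
    \hline
    Grade \(B\) & 12 & 27 & 11 \\
    \hline
    Grade \(C\) & 16 & 18 & 6 \\
    \hline
    \end{tabular}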
    With this second set of data, the test statistic is calculated as 10.91.
  2. Complete the \(\chi ^ { 2 }\)-test at the \(2.5 \%\) significance level for this second set of data.
  3. State, with a reason, whether you would prefer to use the result from part (a) or part (b) to investigate whether the ability to remember depends on age.
OCR S3 2014 June Q7
9 marks Standard +0.3
7 A random sample of 100 adults with a chronic disease was chosen. Each adult was randomly assigned to one of three different treatments. After six months of treatment, each adult was then assessed and classified as 'much improved', 'improved', 'slightly improved' or 'no change'. The results are summarised in Table 1. \begin{table}[h]
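\begin{tabular}{|l|c|c|c|}
\hline
 & Treatment \(A\) & Treatment \(B\) & Treatment \(C\) \\
\hline
Much improved & 12 & 16 & 4 \\
\hline
Improved & 13 & 12 & 6 \\
\hline
Slightly improved & 7 & 6 & 7 \\
\hline
No change & 5 & 3 & 9 \\
\hline
\end{tabular}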
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table} A \(\chi ^ { 2 }\) test, at the \(5 \%\) significance level, is to be carried out.
  1. State suitable hypotheses. Combining the last two rows of Table 1 gives Table 2. \begin{table}[h]
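    \begin{tabular}{|l|c|c|c|}
    \hline
     & Treatment \(A\) & Treatment \(B\) & Treatment \(C\) \\
    \hline
    Much improved & 12 & 16 & 4 \\
    \hline
    Improved & 13 & 12 & 6 \\
    \hline
    Slightly improved/No change & 12 & 9 & 16 \\
    \hline
    \end{tabular}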
    \captionsetup{labelformat=empty} \caption{Table 2}
    \end{table}
  2. By considering the expected frequencies for Treatment \(C\) in Table 1, explain why it was necessary to combine rows.
  3. Show that the contribution to the \(\chi ^ { 2 }\) value for the cell 'slightly improved/no change, Treatment \(C\)' is 4.231, correct to 3 decimal places. You are given that the \(\chi ^ { 2 }\) test statistic is 10.51, correct to 2 decimal places.
  4. Carry out the test.
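The figures quoted in this question can be checked numerically. The following is a minimal sketch in plain Python, not a model answer: it takes the observed counts from Table 1, computes the expected frequencies for Treatment \(C\), merges the last two rows to form Table 2, and recomputes the cell contribution and the test statistic. The function and variable names are illustrative assumptions.

```python
# Observed frequencies from Table 1 (rows: outcome; columns: Treatments A, B, C).
table1 = [
    [12, 16, 4],   # Much improved
    [13, 12, 6],   # Improved
    [7, 6, 7],     # Slightly improved
    [5, 3, 9],     # No change
]


def expected_frequencies(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def contributions(observed, expected):
    """Per-cell (O - E)^2 / E terms of the chi-squared statistic."""
    return [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(observed, expected)]


# Expected frequencies for the Treatment C column of Table 1.
treatment_c_expected = [row[2] for row in expected_frequencies(table1)]

# Table 2: merge the last two rows cell by cell, as the question does.
table2 = table1[:2] + [[a + b for a, b in zip(table1[2], table1[3])]]
contrib2 = contributions(table2, expected_frequencies(table2))

merged_c_contribution = contrib2[2][2]
statistic = sum(sum(row) for row in contrib2)

print([round(e, 2) for e in treatment_c_expected])
print(round(merged_c_contribution, 3), round(statistic, 2))
```

Only the 'No change' cell for Treatment \(C\) has an expected frequency below 5, which is the condition that prompts combining rows here.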
OCR S3 2013 June Q6
13 marks Standard +0.3
6 A random sample of 80 students who had all studied Biology, Chemistry and Art at a college was each asked which they enjoyed most. The results, classified according to gender, are given in the table.
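\begin{tabular}{|c|c|c|c|c|}
\cline{3-5}
\multicolumn{2}{c|}{} & \multicolumn{3}{c|}{Subject} \\
\cline{3-5}
\multicolumn{2}{c|}{} & Biology & Chemistry & Art \\
\hline
\multirow{2}{*}{Gender} & Male & 13 & 4 & 11 \\
\cline{2-5}
 & Female & 37 & 8 & 7 \\
\hline
\end{tabular}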
It is required to carry out a test of independence between subject most enjoyed and gender at the \(2 \frac { 1 } { 2 } \%\) significance level.
  1. Calculate the expected values for the cells.
  2. Explain why it is necessary to combine cells, and choose a suitable combination.
  3. Carry out the test.
OCR Further Statistics AS 2020 November Q5
12 marks Standard +0.3
5 At a cinema there are three film sessions each Saturday, "early", "middle" and "late". The numbers of the audience, in different age groups, at the three showings on a randomly chosen Saturday are given in Table 1. \begin{table}[h]
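\begin{tabular}{|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Observed frequencies}} & \multicolumn{3}{c|}{Session} \\
\cline{3-5}
\multicolumn{2}{|c|}{} & Early & Middle & Late \\
\hline
\multirow{3}{*}{Age group} & \(< 25\) & 24 & 20 & 40 \\
\cline{2-5}
 & 25 to 60 & 4 & 2 & 10 \\
\cline{2-5}
 & \(> 60\) & 28 & 22 & 10 \\
\hline
\end{tabular}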
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table} The cinema manager carries out a test of whether there is any association between age group and session attended.
  1. Show that it is necessary to combine cells in order to carry out the test. It is decided to combine the second and third rows of the table. Some of the expected frequencies for the table with rows combined, and the corresponding contributions to the \(\chi ^ { 2 }\) test statistic, are shown in the following incomplete tables. \begin{table}[h]
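    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \multicolumn{2}{|c|}{\multirow{2}{*}{Expected frequencies}} & \multicolumn{3}{c|}{Session} \\
    \cline{3-5}
    \multicolumn{2}{|c|}{} & Early & Middle & Late \\
    \hline
    \multirow{2}{*}{Age group} & \(< 25\) & 29.4 & 23.1 & \\
    \cline{2-5}
     & \(\geqslant 25\) & 26.6 & 20.9 & \\
    \hline
    \end{tabular}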
    \captionsetup{labelformat=empty} \caption{Table 2}
    \end{table} \begin{table}[h]
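    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \multicolumn{2}{|c|}{\multirow{2}{*}{Contribution to \(\chi ^ { 2 }\)}} & \multicolumn{3}{c|}{Session} \\
    \cline{3-5}
    \multicolumn{2}{|c|}{} & Early & Middle & Late \\
    \hline
    \multirow{2}{*}{Age group} & \(< 25\) & 0.9918 & 0.4160 & \\
    \cline{2-5}
     & \(\geqslant 25\) & 1.0962 & 0.4598 & \\
    \hline
    \end{tabular}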
    \captionsetup{labelformat=empty} \caption{Table 3}
    \end{table}
  2. In the Printed Answer Booklet, complete both tables.
  3. Carry out the test at the \(5 \%\) significance level.
  4. Use the figures in your completed Table 3 to comment on the numbers of the audience in different age groups.
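The partially completed Tables 2 and 3 can be reproduced from the observed counts in Table 1. This sketch, in plain Python with illustrative names that are not part of the question, first checks whether any expected frequency in the uncombined table is below 5, then combines the second and third rows and recomputes the expected frequencies and the contributions to \(\chi ^ { 2 }\), including the Late column that the question leaves blank.

```python
# Observed frequencies from Table 1 (rows: age group; columns: Early, Middle, Late).
observed = [
    [24, 20, 40],  # < 25
    [4, 2, 10],    # 25 to 60
    [28, 22, 10],  # > 60
]


def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def contributions(table, expected):
    """Per-cell (O - E)^2 / E terms of the chi-squared statistic."""
    return [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(table, expected)]


# Part 1: is any expected frequency in the original 3x3 table below 5?
needs_combining = any(e < 5 for row in expected_frequencies(observed) for e in row)

# Combine the second and third rows, as the question specifies.
combined = [observed[0], [a + b for a, b in zip(observed[1], observed[2])]]
table2 = expected_frequencies(combined)   # expected frequencies (Table 2)
table3 = contributions(combined, table2)  # contributions (Table 3)

print(needs_combining)
for row in table2:
    print([round(e, 1) for e in row])
for row in table3:
    print([round(c, 4) for c in row])
```

The Early and Middle entries printed by the sketch should agree with the values already given in Tables 2 and 3.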
AQA S2 2013 January Q2
12 marks Moderate -0.3
2 A large estate agency would like all the properties that it handles to be sold within three months. A manager wants to know whether the type of property affects the time taken to sell it. The data for a random sample of properties sold are tabulated below.
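\begin{tabular}{|l|c|c|c|c|c|}
\hline
\multirow{2}{*}{} & \multicolumn{5}{c|}{Type of property} \\
\cline{2-6}
 & Flat & Terraced & Semidetached & Detached & Total \\
\hline
Sold within three months & 4 & 34 & 28 & 18 & 84 \\
\hline
Sold in more than three months & 9 & 18 & 8 & 6 & 41 \\
\hline
Total & 13 & 52 & 36 & 24 & 125 \\
\hline
\end{tabular}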
  1. Conduct a \(\chi ^ { 2 }\)-test, at the \(10 \%\) level of significance, to determine whether there is an association between the type of property and the time taken to sell it. Explain why it is necessary to combine two columns before carrying out this test.
  2. The manager plans to spend extra money on advertising for one type of property in an attempt to increase the number sold within three months. Explain why the manager might choose:
    1. terraced properties;
    2. flats.
      (2 marks)
AQA S2 2015 June Q5
10 marks Standard +0.3
5 In a particular town, a survey was conducted on a sample of 200 residents aged 41 years to 50 years. The survey questioned these residents to discover the age at which they had left full-time education and the greatest rate of income tax that they were paying at the time of the survey. The summarised data obtained from the survey are shown in the table.
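\begin{tabular}{|l|c|c|c|c|}
\hline
\multirow{2}{*}{Greatest rate of income tax paid} & \multicolumn{3}{c|}{Age when leaving education (years)} & \multirow[b]{2}{*}{Total} \\
\cline{2-4}
 & 16 or less & 17 or 18 & 19 or more & \\
\hline
Zero & 32 & 3 & 4 & 39 \\
\hline
Basic & 102 & 12 & 17 & 131 \\
\hline
Higher & 17 & 5 & 8 & 30 \\
\hline
Total & 151 & 20 & 29 & 200 \\
\hline
\end{tabular}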
  1. Use a \(\chi ^ { 2 }\)-test, at the \(5 \%\) level of significance, to investigate whether there is an association between age when leaving education and greatest rate of income tax paid.
  2. It is believed that residents of this town who had left education at a later age were more likely to be paying the higher rate of income tax. Comment on this belief.
    [1 mark]
WJEC Further Unit 2 2024 June Q5
12 marks Moderate -0.5
5. Lily is interested in the relationship between the way in which students learned Welsh and their attitude towards the Welsh language. Students were categorised as having learned Welsh in one of three ways:
  • from one Welsh-speaking parent/carer at home,
  • from two Welsh-speaking parents/carers at home,
  • at school only, for those with no Welsh-speaking parents/carers at home.
The students were asked to rate their attitude towards the Welsh language from 'Very negative' to 'Very positive'. The following data for a random sample of 253 students were collected as part of a project.
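\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{3}{c|}{Learned Welsh} & \\
\hline
Attitude & From two parents/carers & From one parent/carer & At school only & Total \\
\hline
Very negative & 2 & 14 & 30 & 46 \\
\hline
Slightly negative & 4 & 20 & 21 & 45 \\
\hline
Neutral & 12 & 17 & 8 & 37 \\
\hline
Slightly positive & 21 & 19 & 11 & 51 \\
\hline
Very positive & 25 & 21 & 28 & 74 \\
\hline
Total & 64 & 91 & 98 & 253 \\
\hline
\end{tabular}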
Lily intends to carry out a chi-squared test for independence at the \(5 \%\) level. She produces the following tables which are incomplete.
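\begin{tabular}{|l|c|c|c|}
\hline
Expected Frequencies & \multicolumn{3}{c|}{Learned Welsh} \\
\hline
Attitude & From two parents/carers & From one parent/carer & At school only \\
\hline
Very negative & 11.64 & 16.55 & 17.82 \\
\hline
Slightly negative & 11.38 & 16.19 & 17.43 \\
\hline
Neutral & 9.36 & 13.31 & 14.33 \\
\hline
Slightly positive & 12.90 & 18.34 & 19.75 \\
\hline
Very positive & \(F\) & 26.62 & 28.66 \\
\hline
\end{tabular}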
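\begin{tabular}{|l|c|c|c|}
\hline
Chi-Squared Contributions & \multicolumn{3}{c|}{Learned Welsh} \\
\hline
Attitude & From two parents/carers & From one parent/carer & At school only \\
\hline
Very negative & 7.98 & 0.39 & 8.33 \\
\hline
Slightly negative & 4.79 & 0.90 & 0.73 \\
\hline
Neutral & 0.74 & 1.02 & \(G\) \\
\hline
Slightly positive & 5.08 & 0.02 & 3.88 \\
\hline
Very positive & 2.11 & 1.19 & 0.02 \\
\hline
Total & 20.70 & 3.52 & \(H\) \\
\hline
\end{tabular}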
  1. Calculate the values of \(F , G\) and \(H\).
  2. Carry out Lily's chi-squared test for independence at the \(5 \%\) level.
  3. By referring to the figures in the tables on pages 16 and 17, give two comments on the relationship between the way students learned Welsh and their attitude towards the Welsh language.
Edexcel FS1 AS 2019 June Q1
6 marks Standard +0.3
1. A leisure club offers a choice of one of three activities to its 150 members on a Tuesday evening. The manager believes that there may be an association between the choice of activity and the age of the member and collected the following data.
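\begin{tabular}{|c|c|c|c|}
\hline
\backslashbox{Age \(\boldsymbol { a }\) years}{Activity} & Badminton & Bowls & Snooker \\
\hline
\(a < 20\) & 9 & 3 & 3 \\
\hline
\(20 \leqslant a < 40\) & 10 & 10 & 14 \\
\hline
\(40 \leqslant a < 50\) & 16 & 15 & 5 \\
\hline
\(50 \leqslant a < 60\) & 15 & 13 & 11 \\
\hline
\(a \geqslant 60\) & 4 & 19 & 3 \\
\hline
\end{tabular}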
  1. Write down suitable hypotheses for a test of the manager's belief. The manager calculated expected frequencies to use in the test.
  2. Calculate the expected frequency of members aged 60 or over who choose snooker, used by the manager.
  3. Explain why there are 6 degrees of freedom used in this test. The test statistic used to test the manager's belief is 19.583
  4. Using a 5\% level of significance, complete the test of the manager's belief.
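The link between the quoted 6 degrees of freedom and cell combining can be sketched numerically. The plain-Python example below uses illustrative names and is not the examiners' method. It finds the rows of the observed table that contain an expected frequency below 5, then merges the \(a < 20\) row with the adjacent \(20 \leqslant a < 40\) row. The choice of which rows to merge is an assumption; it is consistent with the quoted test statistic of 19.583, which the sketch reproduces.

```python
# Observed frequencies (rows: age bands; columns: Badminton, Bowls, Snooker).
observed = [
    [9, 3, 3],      # a < 20
    [10, 10, 14],   # 20 <= a < 40
    [16, 15, 5],    # 40 <= a < 50
    [15, 13, 11],   # 50 <= a < 60
    [4, 19, 3],     # a >= 60
]


def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Indices of rows containing at least one expected frequency below 5.
low_rows = [i for i, row in enumerate(expected_frequencies(observed)) if min(row) < 5]

# Assumption: merge the a < 20 row into the adjacent 20 <= a < 40 row.
merged = [[x + y for x, y in zip(observed[0], observed[1])]] + observed[2:]
merged_expected = expected_frequencies(merged)

# Chi-squared statistic for the merged 4x3 table.
statistic = sum((o - e) ** 2 / e
                for o_row, e_row in zip(merged, merged_expected)
                for o, e in zip(o_row, e_row))

print(low_rows, degrees_of_freedom(merged), round(statistic, 3))
```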
OCR FS1 AS 2018 March Q7
11 marks Standard +0.3
7 The numbers of students taking A levels in three subjects at a school were classified by the year in which they entered the school as follows.
\begin{tabular}{|c|c|c|c|c|}
\cline{2-5}
\multicolumn{1}{c|}{} & Subject & Mathematics & English & Physics \\
\hline
\multirow{2}{*}{Year of Entry} & Year 7 & 17 & 16 & 7 \\
\cline{2-5}
 & Year 12 & 13 & 2 & 5 \\
\hline
\end{tabular}
The Head of the school carries out a significance test at the \(10 \%\) level to test whether subjects taken are independent of year of entry.
  1. Show that in carrying out the test it is necessary to combine columns.
  2. Suggest a reason why it is more sensible to combine the columns for Mathematics and Physics than the columns for Physics and English.
  3. Carry out the test.
  4. State which cell gives the largest contribution to the test statistic.
  5. Interpret your answer to part (iv).
AQA S2 2009 June Q3
12 marks Standard +0.3
3 A sample survey, conducted to determine the attitudes of residents to a proposed reorganisation of local schools, gave the following results.
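\begin{tabular}{|c|c|c|c|}
\cline{3-4}
\multicolumn{2}{c|}{} & Against reorganisation & Not against reorganisation \\
\hline
\multirow{5}{*}{Age of resident} & 16-17 & 9 & 2 \\
\cline{2-4}
 & 18-21 & 17 & 10 \\
\cline{2-4}
 & 22-49 & 115 & 90 \\
\cline{2-4}
 & 50-65 & 41 & 34 \\
\cline{2-4}
 & Over 65 & 3 & 4 \\
\hline
\end{tabular}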
Use a \(\chi ^ { 2 }\) test, at the \(5 \%\) level of significance, to determine whether there is an association between the ages of residents and their attitudes to the proposed reorganisation of local schools.
AQA Further AS Paper 2 Statistics 2024 June Q2
1 mark Easy -1.2
2 A test for association is to be carried out. The tables below show the observed frequencies and the expected frequencies that are to be used for the test.
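\begin{tabular}{|c|c|c|c|}
\hline
Observed & X & Y & Z \\
\hline
A & 28 & 6 & 66 \\
\hline
B & 8 & 8 & 4 \\
\hline
C & 54 & 16 & 10 \\
\hline
\end{tabular}

\begin{tabular}{|c|c|c|c|}
\hline
Expected & \(\mathbf { X }\) & \(\mathbf { Y }\) & \(\mathbf { Z }\) \\
\hline
\(\mathbf { A }\) & 45 & 15 & 40 \\
\hline
\(\mathbf { B }\) & 9 & 3 & 8 \\
\hline
\(\mathbf { C }\) & 36 & 12 & 32 \\
\hline
\end{tabular}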
It is necessary to merge some rows or columns before the test can be carried out.
Find the entry in the tables that provides evidence for this.
Circle your answer.
[1 mark]
Observed A-Z
Observed B-Z
Expected A-X
Expected B-Y
AQA Further Paper 3 Statistics 2020 June Q8
6 marks Standard +0.3
8 Ray is conducting a hypothesis test with the hypotheses
\(\mathrm { H } _ { 0 }\): There is no association between time of day and number of snacks eaten
\(\mathrm { H } _ { 1 }\): There is an association between time of day and number of snacks eaten
He calculates expected frequencies correct to two decimal places, which are given in the following table.
\begin{tabular}{|c|c|c|c|c|}
\cline{3-5}
\multicolumn{2}{c|}{} & \multicolumn{3}{c|}{Number of snacks eaten} \\
\cline{3-5}
\multicolumn{2}{c|}{} & 0 & 1 & 2 or more \\
\hline
\multirow{2}{*}{Time of day} & Day & 23.68 & 21.05 & 5.26 \\
\cline{2-5}
 & Night & 21.32 & 18.95 & 4.74 \\
\hline
\end{tabular}
Ray calculates his test statistic using \(\sum \frac { ( O - E ) ^ { 2 } } { E }\)
  1. State, with a reason, the error Ray has made and describe any changes Ray will need to make to his test.
  2. Having made the necessary corrections as described in part (a), the correct value of the test statistic is 8.74. Complete Ray's hypothesis test using a \(1 \%\) level of significance.
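One reading of the correction, offered as a sketch under stated assumptions rather than as the mark scheme: the expected frequency 4.74 is below 5, so the '1' and '2 or more' columns would be merged, giving expected frequencies \(21.05 + 5.26 = 26.31\) and \(18.95 + 4.74 = 23.69\) in the merged column. The table is then \(2 \times 2\), for which Yates' continuity correction is conventionally applied, and the degrees of freedom \(\nu\) (a symbol introduced here) fall to 1.

```latex
\[
\chi^{2}_{\text{Yates}} = \sum \frac{\left( |O - E| - 0.5 \right)^{2}}{E},
\qquad
\nu = (2 - 1)(2 - 1) = 1
\]
```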
AQA Further Paper 3 Statistics 2021 June Q6
7 marks Moderate -0.5
6 Danai is investigating the number of speeding offences in different towns in a country. She carries out a hypothesis test to test for association between town and number of speeding offences per year.
  1. State the hypotheses for this test.
  2. The observed frequencies, \(O\), have been collected and the expected frequencies, \(E\), have been calculated in an \(n \times m\) contingency table, where \(n > 3\) and \(m > 3\). One of the values of \(E\) is less than 5.
    (i) Explain what steps Danai should take before calculating the test statistic.
    (ii) State an expression for the test statistic Danai should calculate.
  3. Danai correctly calculates the value of the test statistic to be 45.22. The number of degrees of freedom for the test is 25.
    Determine the outcome of Danai's test, using the \(1 \%\) level of significance.
AQA Further Paper 3 Statistics 2023 June Q5
8 marks Standard +0.3
5 A school management team oversees 11 different schools.
The school management team allows each student in the schools to choose one enrichment activity from 11 possible activities. The school management team count the number of students in each school choosing each of the possible activities. They then conduct a \(\chi ^ { 2 }\)-test for association with the data they have gathered.
  1. Exactly one of the calculated expected frequencies for the \(\chi ^ { 2 }\)-test is less than 5.
    Explain why the number of degrees of freedom for the test is 90.
  2. The school management team claims that there is an association between the school a student attends and the activity they choose. The test statistic is 124.8. Test the claim using the \(1 \%\) level of significance.
  3. During the hypothesis test, the value of \(\frac { ( O - E ) ^ { 2 } } { E }\), where \(O\) is the observed frequency and \(E\) is the expected frequency, was calculated for each group of students. The values for four groups of students are shown in the table below.
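    \begin{tabular}{|l|c|}
    \hline
    Group & \(\frac { ( O - E ) ^ { 2 } } { E }\) \\
    \hline
    Attends school 3 and chose activity 1 & 0.01 \\
    \hline
    Attends school 8 and chose activity 3 & 18.5 \\
    \hline
    Attends school 8 and chose activity 7 & 24.2 \\
    \hline
    Attends school 11 and chose activity 7 & 49.0 \\
    \hline
    \end{tabular}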
    State, with a reason, which of the four groups of students represents the strongest source of association.
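The degrees of freedom quoted in this question follow from the usual formula for an \(r \times c\) contingency table. As a sketch, assuming the single low expected frequency is handled by merging one row with another row (or one column with another column), the \(11 \times 11\) table becomes \(10 \times 11\). The symbols \(\nu\), \(r\) and \(c\) are introduced here for illustration.

```latex
\[
\nu = (r - 1)(c - 1)
\]
\[
\text{Before merging: } (11 - 1)(11 - 1) = 100,
\qquad
\text{after merging: } (10 - 1)(11 - 1) = 90
\]
```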
AQA S2 2016 June Q5
13 marks Standard +0.3
A car manufacturer keeps a record of how many of the new cars that it has sold experience mechanical problems during the first year. The manufacturer also records whether the cars have a petrol engine or a diesel engine. Data for a random sample of 250 cars are shown in the table.
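\begin{tabular}{|l|c|c|c|c|}
\hline
 & Problems during first 3 months & Problems during first year but after first 3 months & No problems during first year & Total \\
\hline
Petrol engine & 10 & 35 & 170 & 215 \\
\hline
Diesel engine & 4 & 8 & 23 & 35 \\
\hline
Total & 14 & 43 & 193 & 250 \\
\hline
\end{tabular}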
  1. Use a \(\chi^2\)-test to investigate, at the 10% significance level, whether there is an association between the mechanical problems experienced by a new car from this manufacturer and the type of engine. [11 marks]
  2. Arisa is planning to buy a new car from this manufacturer. She would prefer to buy a car with a diesel engine, but a friend has told her that cars with diesel engines experience more mechanical problems. Based on your answer to part (a), state, with a reason, the advice that you would give to Arisa. [2 marks]