5.06a Chi-squared: contingency tables

CAIE Further Paper 4 2020 June Q1
6 marks Standard +0.3
1 Young children are learning to read using two different reading schemes, \(A\) and \(B\). The standards achieved are measured against the national average standard achieved and classified as above average, average or below average. For two randomly chosen groups of young children, the numbers in each category are shown in the table.
| | Standard achieved | | |
|---|---|---|---|
| | Above average | Average | Below average |
| Scheme \(A\) | 31 | 35 | 22 |
| Scheme \(B\) | 19 | 50 | 43 |
Test at the \(5 \%\) significance level whether standard achieved is independent of the reading scheme used.
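The calculation behind a test of independence can be sketched in Python. This is a minimal illustration rather than a model answer: the function name `chi_squared_independence` is hypothetical, and the critical value 5.991 (2 degrees of freedom, 5% level) is assumed from standard \(\chi ^ { 2 }\) tables.

```python
def chi_squared_independence(observed):
    """Return (statistic, degrees of freedom, expected) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Under independence, E_ij = (row total i) * (column total j) / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected

# Reading schemes A and B: above average, average, below average.
observed = [[31, 35, 22], [19, 50, 43]]
statistic, dof, expected = chi_squared_independence(observed)

CRITICAL_5_PERCENT_DF2 = 5.991  # assumed from standard chi-squared tables
reject_h0 = statistic > CRITICAL_5_PERCENT_DF2
print(round(statistic, 2), dof, reject_h0)
```

The same function applies unchanged to any of the larger tables in this list, as long as the matching critical value is looked up for the new degrees of freedom.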
CAIE Further Paper 4 2021 June Q2
7 marks Standard +0.3
2 A driving school employs four instructors to prepare people for their driving test. The allocation of people to instructors is random. For each of the instructors, the following table gives the number of people who passed and the number who failed their driving test last year.
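| | Instructor \(A\) | Instructor \(B\) | Instructor \(C\) | Instructor \(D\) | Total |
|---|---|---|---|---|---|
| Pass | 72 | 42 | 52 | 68 | 234 |
| Fail | 33 | 34 | 41 | 58 | 166 |
| Total | 105 | 76 | 93 | 126 | 400 |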
Test at the 10\% significance level whether success in the driving test is independent of the instructor.
CAIE Further Paper 4 2020 June Q1
6 marks Standard +0.3
1 Two randomly selected groups of students, with similar ranges of abilities, take the same examination in different rooms. One group of 140 students takes the examination with background music playing. The other group of 210 students takes the examination in silence. Each student is awarded a grade for their performance in the examination and the numbers from each group gaining each grade are shown in the following table.
| | Grade awarded | | |
|---|---|---|---|
| | A | B | C |
| Background music | 49 | 51 | 40 |
| Silence | 93 | 68 | 49 |
Test at the 10\% significance level whether grades awarded are independent of whether background music is playing during the examination.
CAIE Further Paper 4 2022 June Q2
7 marks Standard +0.3
2 A scientist is investigating the size of shells at various beach locations. She selects four beach locations and takes a random sample of shells from each of these beaches. She classifies each shell as large or small. Her results are summarised in the following table.
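| | | Beach location | | | | |
|---|---|---|---|---|---|---|
| | | \(A\) | \(B\) | \(C\) | \(D\) | Total |
| Size of shell | Large | 68 | 69 | 96 | 81 | 314 |
| | Small | 28 | 55 | 64 | 39 | 186 |
| | Total | 96 | 124 | 160 | 120 | 500 |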
Test, at the 10\% significance level, whether the size of shell is independent of the beach location.
CAIE Further Paper 4 2023 June Q6
10 marks Standard +0.3
6 A scientist is investigating whether the ability to remember depends on age. A random sample of 150 students in different age groups is chosen. Each student is shown a set of 20 objects for thirty seconds and then asked to list as many as they can remember. The students are graded \(A\) or \(B\) according to how many objects they remembered correctly: grade \(A\) for 16 or more correct and grade \(B\) for fewer than 16 correct. The results are shown in the table.
| | Age of students | | |
|---|---|---|---|
| | \(11 - 12\) years | \(13 - 14\) years | \(15 - 16\) years |
| Grade \(A\) | 25 | 16 | 19 |
| Grade \(B\) | 28 | 45 | 17 |
  1. Carry out a \(\chi ^ { 2 }\)-test at the \(2.5 \%\) significance level to test whether grade is independent of age of student.
    The scientist decides instead to use three grades: grade \(A\) for 16 or more correct, grade \(B\) for 10 to 15 correct and grade \(C\) for fewer than 10 correct. The results are shown in the following table.
    | | Age of students | | |
    |---|---|---|---|
    | | \(11 - 12\) years | \(13 - 14\) years | \(15 - 16\) years |
    | Grade \(A\) | 25 | 16 | 19 |
    | Grade \(B\) | 12 | 27 | 11 |
    | Grade \(C\) | 16 | 18 | 6 |
    With this second set of data, the test statistic is calculated as 10.91.
  2. Complete the \(\chi ^ { 2 }\)-test at the \(2.5 \%\) significance level for this second set of data.
  3. State, with a reason, whether you would prefer to use the result from part (a) or part (b) to investigate whether the ability to remember depends on age.
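For the second data set the table has 3 rows and 3 columns, so the test uses \((3 - 1)(3 - 1) = 4\) degrees of freedom. As a hedged sketch, the upper tail of a \(\chi ^ { 2 }\) distribution with an even number of degrees of freedom has a closed form, which gives a p-value for the quoted statistic 10.91 without tables. The function name `chi2_upper_tail_even_dof` is hypothetical.

```python
import math

def chi2_upper_tail_even_dof(x, dof):
    """P(X > x) for X ~ chi-squared with an even, positive number of dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form only covers even, positive dof")
    half = x / 2
    # exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(dof // 2)
    )

dof = (3 - 1) * (3 - 1)        # 3 grades by 3 age groups
p_value = chi2_upper_tail_even_dof(10.91, dof)
reject_h0 = p_value < 0.025    # 2.5% significance level
print(dof, round(p_value, 4), reject_h0)
```

Comparing the p-value with 0.025 is equivalent to comparing 10.91 with the tabulated critical value for 4 degrees of freedom.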
CAIE Further Paper 4 2024 June Q5
8 marks Standard +0.3
5 Two companies, \(P\) and \(Q\), produce a certain type of paint brush. An independent examiner rates the quality of the brushes produced as poor, satisfactory or good. He takes a random sample of brushes from each company. The examiner's ratings are summarised in the table.
| Company | Poor | Satisfactory | Good |
|---|---|---|---|
| \(P\) | 18 | 43 | 64 |
| \(Q\) | 22 | 22 | 31 |
  1. Test, at the \(5 \%\) significance level, whether quality of brushes is independent of company.
  2. Compare the quality of the brushes produced by the two companies.
CAIE Further Paper 4 2024 June Q3
7 marks Standard +0.3
3 There are three bus companies in a city. The council is investigating whether the buses reliably arrive at their destination on time. The results from random samples of buses from each company are summarised in the following table.
| | | Bus company | | | |
|---|---|---|---|---|---|
| | | \(A\) | \(B\) | \(C\) | Total |
| Arrival | Early | 22 | 22 | 10 | 54 |
| | On time | 30 | 52 | 42 | 124 |
| | Late | 28 | 26 | 18 | 72 |
| | Total | 80 | 100 | 70 | 250 |
Test, at the \(5 \%\) significance level, whether the reliability of buses is independent of bus company.
CAIE Further Paper 4 2022 November Q2
7 marks Standard +0.3
2 In the colleges in three regions of a particular country, students are given individual targets to achieve. Their performance is measured against their individual target and graded as 'above target', 'on target' or 'below target'. For a random sample of students from each of the three regions, the observed frequencies are summarised in the following table.
| | | Region | | | |
|---|---|---|---|---|---|
| | | A | B | C | Total |
| Performance | Above target | 62 | 41 | 44 | 147 |
| | On target | 102 | 94 | 95 | 291 |
| | Below target | 56 | 45 | 61 | 162 |
| | Total | 220 | 180 | 200 | 600 |
Test, at the 10\% significance level, whether performance is independent of region.
CAIE Further Paper 4 2023 November Q2
7 marks Moderate -0.3
2 A town council has published its plans for redeveloping the town centre and residents are being asked whether they approve or disapprove. A random sample of 250 responses has been selected from residents in the four main streets in the town: North, East, South and West Streets. The results are shown in the table.
| | North Street | East Street | South Street | West Street |
|---|---|---|---|---|
| Approve | 33 | 54 | 42 | 26 |
| Disapprove | 19 | 39 | 28 | 9 |
Test, at the \(5 \%\) significance level, whether the opinions of the residents are independent of the streets on which they live.
OCR MEI S2 2006 January Q4
18 marks Standard +0.3
4 The table summarises the usual method of travelling to school for 200 randomly selected pupils from primary and secondary schools in a city.
| | | Primary | Secondary |
|---|---|---|---|
| Method of travel | Bus | 21 | 49 |
| | Car | 65 | 15 |
| | Cycle or Walk | 34 | 16 |
  1. Write down null and alternative hypotheses for a test to examine whether there is any association between method of travel and type of school.
  2. Calculate the expected frequency for primary school bus users. Calculate also the corresponding contribution to the test statistic for the usual \(\chi ^ { 2 }\) test.
  3. Given that the value of the test statistic for the usual \(\chi ^ { 2 }\) test is 42.64, carry out the test at the \(5 \%\) level of significance, stating your conclusion clearly.
    The mean travel time for pupils who travel by bus is known to be 18.3 minutes. A survey is carried out to determine whether the mean travel time to school by car is different from 18.3 minutes. In the survey, 20 pupils who travel by car are selected at random. Their mean travel time is found to be 22.4 minutes.
  4. Assuming that car travel times are Normally distributed with standard deviation 8.0 minutes, carry out a test at the \(10 \%\) level, stating your hypotheses and conclusion clearly.
  5. Comment on the suggestion that pupils should use a bus if they want to get to school quickly.
OCR MEI S2 2008 January Q4
19 marks Standard +0.3
4
  1. A researcher believes that there may be some association between a student's sex and choice of certain subjects at A-level. A random sample of 250 A-level students is selected. The table below shows, for each sex, how many study either or both of the two subjects, Mathematics and English.
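    | | Mathematics only | English only | Both | Neither | Row totals |
    |---|---|---|---|---|---|
    | Male | 38 | 19 | 6 | 32 | 95 |
    | Female | 42 | 55 | 9 | 49 | 155 |
    | Column totals | 80 | 74 | 15 | 81 | 250 |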
    Carry out a test at the \(5 \%\) significance level to examine whether there is any association between a student's sex and choice of subjects. State carefully your null and alternative hypotheses. Your working should include a table showing the contributions of each cell to the test statistic. [12]
  2. Over a long period it has been determined that the mean score of students in a particular English module is 67.4 and the standard deviation is 8.9. A new teaching method is introduced with the aim of improving the results. A random sample of 12 students taught by the new method is selected. Their mean score is found to be 68.3. Carry out a test at the \(10 \%\) level to investigate whether the new method appears to have been successful. State carefully your null and alternative hypotheses. You should assume that the scores are Normally distributed and that the standard deviation is unchanged.
OCR MEI S2 2006 June Q4
18 marks Standard +0.3
4 A survey of a random sample of 250 people is carried out. Their musical preferences are categorized as pop, classical or jazz. Their ages are categorized as under 25, 25 to 50, or over 50. The results are as follows.
| | | Musical preference | | | Row totals |
|---|---|---|---|---|---|
| | | Pop | Classical | Jazz | |
| Age group | Under 25 | 57 | 15 | 12 | 84 |
| | 25-50 | 43 | 21 | 21 | 85 |
| | Over 50 | 22 | 32 | 27 | 81 |
| | Column totals | 122 | 68 | 60 | 250 |
  1. Carry out a test at the \(5 \%\) significance level to examine whether there is any association between musical preference and age group. State carefully your null and alternative hypotheses. Your working should include a table showing the contributions of each cell to the test statistic.
  2. Discuss briefly how musical preferences vary between the age groups, as shown by the contributions to the test statistic.
OCR MEI S2 2007 June Q4
18 marks Standard +0.3
4 The sexes and ages of a random sample of 300 runners taking part in marathons are classified as follows.
| Observed | | Sex | | Row totals |
|---|---|---|---|---|
| | | Male | Female | |
| Age group | Under 40 | 70 | 54 | 124 |
| | \(40 - 49\) | 76 | 36 | 112 |
| | 50 and over | 52 | 12 | 64 |
| | Column totals | 198 | 102 | 300 |
  1. Carry out a test at the \(5 \%\) significance level to examine whether there is any association between age group and sex. State carefully your null and alternative hypotheses. Your working should include a table showing the contributions of each cell to the test statistic.
  2. Does your analysis support the suggestion that women are less likely than men to enter marathons as they get older? Justify your answer.
    For marathons in general, on average \(3 \%\) of runners are 'Female, 50 and over'. The random variable \(X\) represents the number of 'Female, 50 and over' runners in a random sample of size 300.
  3. Use a suitable approximating distribution to find \(\mathrm { P } ( X \geqslant 12 )\).
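The last part can be illustrated numerically. The sketch below assumes that the intended approximation to \(X \sim \mathrm{B}(300, 0.03)\) is Poisson with mean \(300 \times 0.03 = 9\), since \(n\) is large and \(p\) is small; the helper `poisson_cdf` is a hypothetical name, and the exact binomial tail is included only for comparison.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(
        math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1)
    )

n, p = 300, 0.03
lam = n * p                                  # Poisson mean, 9

# P(X >= 12) = 1 - P(X <= 11) under the Poisson approximation.
p_at_least_12 = 1 - poisson_cdf(11, lam)

# Exact binomial tail, for comparison only.
exact = 1 - sum(
    math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(12)
)
print(round(p_at_least_12, 4), round(exact, 4))
```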
OCR MEI S2 2008 June Q4
18 marks Standard +0.3
4 A student is investigating whether there is any association between the species of shellfish that occur on a rocky shore and where they are located. A random sample of 160 shellfish is selected and the numbers of shellfish in each category are summarised in the table below.
| | | Location | | |
|---|---|---|---|---|
| | | Exposed | Sheltered | Pool |
| Species | Limpet | 24 | 32 | 16 |
| | Mussel | 24 | 11 | 3 |
| | Other | 5 | 22 | 23 |
  1. Write down null and alternative hypotheses for a test to examine whether there is any association between species and location.
    The contributions to the test statistic for the usual \(\chi ^ { 2 }\) test are shown in the table below.

    | Contribution | | Location | | |
    |---|---|---|---|---|
    | | | Exposed | Sheltered | Pool |
    | Species | Limpet | 0.0009 | 0.2585 | 0.4450 |
    | | Mussel | 10.3472 | 1.2756 | 4.8773 |
    | | Other | 8.0719 | 0.1402 | 7.4298 |

    The sum of these contributions is 32.85.
  2. Calculate the expected frequency for mussels in pools. Verify the corresponding contribution 4.8773 to the test statistic.
  3. Carry out the test at the \(5 \%\) level of significance, stating your conclusion clearly.
  4. For each species, comment briefly on how its distribution compares with what would be expected if there were no association.
  5. If 3 of the 160 shellfish are selected at random, one from each of the 3 types of location, find the probability that all 3 of them are limpets.
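The expected frequency and contribution for a single cell can be checked with a short sketch. The dictionary layout and the function name `contribution` are illustrative choices, not part of the question.

```python
observed = {
    "Limpet": [24, 32, 16],
    "Mussel": [24, 11, 3],
    "Other": [5, 22, 23],
}
locations = ["Exposed", "Sheltered", "Pool"]

n = sum(sum(row) for row in observed.values())
col_totals = [sum(col) for col in zip(*observed.values())]

def contribution(species, location):
    """Return (expected frequency, contribution to chi-squared) for one cell."""
    j = locations.index(location)
    o = observed[species][j]
    e = sum(observed[species]) * col_totals[j] / n
    return e, (o - e) ** 2 / e

expected_mussel_pool, contrib_mussel_pool = contribution("Mussel", "Pool")
total = sum(contribution(s, loc)[1] for s in observed for loc in locations)
print(round(expected_mussel_pool, 3), round(contrib_mussel_pool, 4), round(total, 2))
```

Summing `contribution` over all nine cells reproduces the quoted total of the contributions to 2 decimal places.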
OCR S3 2007 January Q7
15 marks Standard +0.3
7 It is thought that a person's eye colour is related to the reaction of the person's skin to ultra-violet light. As part of a study, a random sample of 140 people were treated with a standard dose of ultra-violet light. The degree of reaction was classified as None, Mild or Strong. The results are given in Table 1. The corresponding expected frequencies for a \(\chi ^ { 2 }\) test of association between eye colour and reaction are shown in Table 2.

Table 1: Observed frequencies

| | | Eye colour | | | |
|---|---|---|---|---|---|
| | | Blue | Brown | Other | Total |
| Reaction | None | 12 | 17 | 10 | 39 |
| | Mild | 31 | 21 | 11 | 63 |
| | Strong | 22 | 4 | 12 | 38 |
| | Total | 65 | 42 | 33 | 140 |

Table 2: Expected frequencies

| | | Eye colour | | |
|---|---|---|---|---|
| | | Blue | Brown | Other |
| Reaction | None | 18.11 | 11.70 | 9.19 |
| | Mild | 29.25 | 18.90 | 14.85 |
| | Strong | 17.64 | 11.40 | 8.96 |

  1. (a) State suitable hypotheses for the test.
    (b) Show how the expected frequency of 18.11 in Table 2 is obtained.
    (c) Show that the three cells in the top row together contribute 4.53 to the calculated value of \(\chi ^ { 2 }\), correct to 2 decimal places.
    (d) You are given that the total calculated value of \(\chi ^ { 2 }\) is 12.78, correct to 2 decimal places. Give the smallest value of \(\alpha\) obtained from the tables for which the null hypothesis would be rejected at the \(\alpha \%\) significance level.
  2. Test, at the \(5 \%\) significance level, whether the proportions of people in the whole population with blue eyes, brown eyes and other colours are in the ratios \(2 : 2 : 1\).
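The final part is a goodness-of-fit test rather than a test of association. A minimal sketch, assuming the observed counts are the eye-colour column totals from Table 1 and taking the critical value 5.991 (2 degrees of freedom, 5% level) from standard tables:

```python
observed = [65, 42, 33]          # Blue, Brown, Other column totals from Table 1
ratio = [2, 2, 1]                # hypothesised population ratio

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

dof = len(observed) - 1          # no parameters are estimated from the data
CRITICAL_5_PERCENT_DF2 = 5.991   # assumed from standard chi-squared tables
reject_h0 = statistic > CRITICAL_5_PERCENT_DF2
print(expected, round(statistic, 3), reject_h0)
```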
OCR S3 2008 January Q6
15 marks Standard +0.3
6 The Research and Development department of a paint manufacturer has produced paint of three different shades of grey, \(G _ { 1 } , G _ { 2 }\) and \(G _ { 3 }\). In order to find the reaction of the public to these shades, each of a random sample of 120 people was asked to state which shade they preferred. The results, classified by gender, are shown in Table 1.

Table 1

| | | Shade | | |
|---|---|---|---|---|
| | | \(G _ { 1 }\) | \(G _ { 2 }\) | \(G _ { 3 }\) |
| Gender | Male | 11 | 24 | 23 |
| | Female | 18 | 13 | 31 |

Table 2 shows the corresponding expected values, correct to 2 decimal places, for a test of independence.

Table 2

| | | Shade | | |
|---|---|---|---|---|
| | | \(G _ { 1 }\) | \(G _ { 2 }\) | \(G _ { 3 }\) |
| Gender | Male | 14.02 | 17.88 | 26.10 |
| | Female | 14.98 | 19.12 | 27.90 |

  1. Show how the value 17.88 for Male, \(G _ { 2 }\) was obtained.
  2. Test, at the \(5 \%\) significance level, whether gender and preferred shade are independent.
  3. Determine the smallest significance level obtained from tables or calculator for which there is evidence that not all shades are equally preferred by people in general, irrespective of gender.
OCR S3 2011 January Q7
11 marks Standard +0.3
7
  1. When should Yates' correction be applied when carrying out a \(\chi ^ { 2 }\) test?
    Two vaccines against typhoid fever, \(A\) and \(B\), were tested on a total of 700 people in Nepal during a particular year. The vaccines were allocated randomly and whether or not typhoid had developed was noted during the following year. The results are shown in the table.

    | | Vaccines | |
    |---|---|---|
    | | \(A\) | \(B\) |
    | Developed typhoid | 19 | 4 |
    | Did not develop typhoid | 310 | 367 |

  2. Carry out a suitable \(\chi ^ { 2 }\) test at the \(1 \%\) significance level to determine whether the outcome depends on the vaccine used. Comment on the result.
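Yates' continuity correction is conventionally used for a \(2 \times 2\) table (1 degree of freedom): each \(|O - E|\) is reduced by 0.5 before squaring. The sketch below applies it to the vaccine data; the function name `yates_chi_squared` is hypothetical and the critical value 6.635 (1 degree of freedom, 1% level) is assumed from standard tables.

```python
def yates_chi_squared(table):
    """Chi-squared statistic with Yates' correction for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            # Subtract 0.5 from |O - E| before squaring.
            statistic += (abs(table[i][j] - e) - 0.5) ** 2 / e
    return statistic

# Rows: developed typhoid, did not develop typhoid. Columns: vaccine A, B.
table = [[19, 4], [310, 367]]
statistic = yates_chi_squared(table)

CRITICAL_1_PERCENT_DF1 = 6.635   # assumed from standard chi-squared tables
reject_h0 = statistic > CRITICAL_1_PERCENT_DF1
print(round(statistic, 2), reject_h0)
```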
OCR S3 2006 June Q2
6 marks Standard +0.3
2 The manager of a factory with a large number of employees investigated when accidents to employees occurred during 8-hour shifts. An analysis was made of 600 randomly chosen accidents that occurred over a year. The following table shows the numbers of accidents occurring in the four consecutive 2-hour periods of the 8-hour shifts.
| Period | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Number of accidents | 138 | 127 | 165 | 170 |
Test, at the \(5 \%\) significance level, whether the proportions of all accidents that occur in the four time periods differ.
OCR S3 2006 June Q5
9 marks Moderate -0.3
5 Gloria is a market trader who sells jeans. She trades on Mondays, Wednesdays and Fridays. Wishing to investigate whether the volume of trade depends on the day of the week, Gloria analysed a random sample of 150 days' sales and classified them by day and volume (low, medium and high). The results are given in the table below.
| | | Day | | |
|---|---|---|---|---|
| | | Monday | Wednesday | Friday |
| Volume | Low | 15 | 13 | 2 |
| | Medium | 23 | 26 | 23 |
| | High | 12 | 9 | 27 |
Gloria asked a statistician to perform a suitable test of independence and, as part of this test, expected frequencies were calculated. These are shown in the table below.
| | | Day | | |
|---|---|---|---|---|
| | | Monday | Wednesday | Friday |
| Volume | Low | 10.00 | 9.60 | 10.40 |
| | Medium | 24.00 | 23.04 | 24.96 |
| | High | 16.00 | 15.36 | 16.64 |
  1. Show how the value 23.04 for medium volume on Wednesday has been obtained.
  2. State, giving a reason, if it is necessary to combine any rows or columns in order to carry out the test.
    The value of the test statistic is found to be 21.15, correct to 2 decimal places.
  3. Stating suitable hypotheses for the test, give its conclusion using a \(1 \%\) significance level.
    Gloria wishes to hold a sale and asks the statistician to advise her on which day to hold it in order to sell as much as possible.
  4. State the day that the statistician should advise and give a reason for the choice.
OCR S3 2007 June Q4
9 marks Standard +0.3
4 The students in a large university department take a trial examination some time before the proper examination. A random sample of 60 students took both examinations during a particular course. 42 students passed the trial examination, 36 passed the proper examination and 13 failed both examinations.
  1. Copy and complete the following contingency table.
    | | | Proper | | |
    |---|---|---|---|---|
    | | | Pass | Fail | Total |
    | Trial | Pass | | | 42 |
    | | Fail | | 13 | |
    | | Total | 36 | | 60 |
  2. Carry out a test of independence at the \(\frac { 1 } { 2 } \%\) level of significance.
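The missing entries of the contingency table follow from the margins by subtraction. A short sketch of that arithmetic, with illustrative variable names:

```python
n = 60
trial_pass, proper_pass, fail_both = 42, 36, 13

trial_fail = n - trial_pass                        # failed the trial
proper_fail = n - proper_pass                      # failed the proper exam
fail_trial_pass_proper = trial_fail - fail_both    # failed trial, passed proper
pass_both = proper_pass - fail_trial_pass_proper   # passed both
pass_trial_fail_proper = trial_pass - pass_both    # passed trial, failed proper

# Rows: trial pass, trial fail. Columns: proper pass, proper fail.
table = [
    [pass_both, pass_trial_fail_proper],
    [fail_trial_pass_proper, fail_both],
]
print(table, trial_fail, proper_fail)
```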
OCR S3 Specimen Q6
14 marks Moderate -0.3
6 Certain types of food are now sold in metric units. A random sample of 1000 shoppers was asked whether they were in favour of the change to metric units or not. The results, classified according to age, were as shown in the table.
| | Age of shopper | | |
|---|---|---|---|
| | Under 35 | 35 and over | Total |
| In favour of change | 187 | 161 | 348 |
| Not in favour of change | 283 | 369 | 652 |
| Total | 470 | 530 | 1000 |
  1. Use a \(\chi ^ { 2 }\) test to show that there is very strong evidence that shoppers' views about changing to metric units are not independent of their ages.
  2. The data may also be regarded as consisting of two random samples of shoppers; one sample consists of 470 shoppers aged under 35, of whom 187 were in favour of change, and the second sample consists of 530 shoppers aged 35 or over, of whom 161 were in favour of change. Determine whether a test for equality of population proportions supports the conclusion in part (i).
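The two approaches in this question are closely linked: for a \(2 \times 2\) table, the square of the pooled two-proportion \(z\) statistic equals the Pearson \(\chi ^ { 2 }\) statistic computed without Yates' correction. The sketch below checks this numerically; it is an illustration under that no-correction assumption, not the mark-scheme method.

```python
import math

x1, n1 = 187, 470    # in favour, under 35
x2, n2 = 161, 530    # in favour, 35 and over

# Pooled two-proportion z statistic.
p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Pearson chi-squared for the same 2 x 2 table, without Yates' correction.
observed = [[x1, x2], [n1 - x1, n2 - x2]]
row_totals = [sum(row) for row in observed]
col_totals = [n1, n2]
n = n1 + n2
chi2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (observed[i][j] - e) ** 2 / e

print(round(z, 3), round(z ** 2, 3), round(chi2, 3))
```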
OCR MEI S4 2006 June Q4
24 marks Standard +0.3
4 An experiment is carried out to compare five industrial paints, A, B, C, D, E, that are intended to be used to protect exterior surfaces in polluted urban environments. Five different types of surface (I, II, III, IV, V) are to be used in the experiment, and five specimens of each type of surface are available. Five different external locations (\(1, 2, 3, 4, 5\)) are used in the experiment. The paints are applied to the specimens of the surfaces which are then left in the locations for a period of six months. At the end of this period, a "score" is given to indicate how effective the paint has been in protecting the surface.
  1. Name a suitable experimental design for this trial and give an example of an experimental layout.
    Initial analysis of the data indicates that any differences between the types of surface are negligible, as also are any differences between the locations. It is therefore decided to analyse the data by one-way analysis of variance.
  2. State the usual model, including the accompanying distributional assumptions, for the one-way analysis of variance. Interpret the terms in the model.
  3. The data for analysis are as follows. Higher scores indicate better performance.
    | Paint A | Paint B | Paint C | Paint D | Paint E |
    |---|---|---|---|---|
    | 64 | 66 | 59 | 65 | 64 |
    | 58 | 68 | 56 | 78 | 52 |
    | 73 | 76 | 69 | 69 | 56 |
    | 60 | 70 | 60 | 72 | 61 |
    | 67 | 71 | 63 | 71 | 58 |

    [The sum of these data items is 1626 and the sum of their squares is 106838.]
    Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a 5\% significance level. Report briefly on your conclusions. [12]
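The sums of squares for the one-way analysis of variance table can be sketched as follows. Variable names are illustrative, and the critical value 2.87 for \(F _ { 4,20 }\) at the 5% level is assumed from standard tables.

```python
scores = {
    "A": [64, 58, 73, 60, 67],
    "B": [66, 68, 76, 70, 71],
    "C": [59, 56, 69, 60, 63],
    "D": [65, 78, 69, 72, 71],
    "E": [64, 52, 56, 61, 58],
}

all_scores = [x for group in scores.values() for x in group]
n, k = len(all_scores), len(scores)
correction = sum(all_scores) ** 2 / n             # (grand total)^2 / n

ss_total = sum(x * x for x in all_scores) - correction
ss_between = sum(sum(g) ** 2 / len(g) for g in scores.values()) - correction
ss_within = ss_total - ss_between                 # residual sum of squares

ms_between = ss_between / (k - 1)                 # 4 degrees of freedom
ms_within = ss_within / (n - k)                   # 20 degrees of freedom
f_ratio = ms_between / ms_within

CRITICAL_F_4_20_5_PERCENT = 2.87                  # assumed from standard F tables
reject_h0 = f_ratio > CRITICAL_F_4_20_5_PERCENT
print(round(ss_between, 2), round(ss_within, 2), round(ss_total, 2), round(f_ratio, 2))
```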
OCR S3 2014 June Q7
9 marks Standard +0.3
7 A random sample of 100 adults with a chronic disease was chosen. Each adult was randomly assigned to one of three different treatments. After six months of treatment, each adult was then assessed and classified as 'much improved', 'improved', 'slightly improved' or 'no change'. The results are summarised in Table 1.

Table 1

| | Treatment \(A\) | Treatment \(B\) | Treatment \(C\) |
|---|---|---|---|
| Much improved | 12 | 16 | 4 |
| Improved | 13 | 12 | 6 |
| Slightly improved | 7 | 6 | 7 |
| No change | 5 | 3 | 9 |

A \(\chi ^ { 2 }\) test, at the \(5 \%\) significance level, is to be carried out.
  1. State suitable hypotheses.
    Combining the last two rows of Table 1 gives Table 2.

    Table 2

    | | Treatment \(A\) | Treatment \(B\) | Treatment \(C\) |
    |---|---|---|---|
    | Much improved | 12 | 16 | 4 |
    | Improved | 13 | 12 | 6 |
    | Slightly improved/No change | 12 | 9 | 16 |

  2. By considering the expected frequencies for Treatment \(C\) in Table 1, explain why it was necessary to combine rows.
  3. Show that the contribution to the \(\chi ^ { 2 }\) value for the cell 'slightly improved/no change, Treatment \(C\)' is 4.231, correct to 3 decimal places.
    You are given that the \(\chi ^ { 2 }\) test statistic is 10.51, correct to 2 decimal places.
  4. Carry out the test.
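The usual rule of thumb is that every expected frequency should be at least 5. The sketch below, with illustrative names, lists the Table 1 cells that fall short, merges the last two rows as in Table 2, and recomputes the contribution of the merged Treatment \(C\) cell.

```python
table1 = {
    "Much improved": [12, 16, 4],
    "Improved": [13, 12, 6],
    "Slightly improved": [7, 6, 7],
    "No change": [5, 3, 9],
}

def expected_frequencies(rows):
    """Expected frequencies under independence, keyed by row name."""
    col_totals = [sum(col) for col in zip(*rows.values())]
    n = sum(col_totals)
    return {name: [sum(r) * c / n for c in col_totals] for name, r in rows.items()}

expected1 = expected_frequencies(table1)
# Cells whose expected frequency is below 5.
small = [(name, e) for name, row in expected1.items() for e in row if e < 5]

# Merge the last two rows, as in Table 2.
merged = "Slightly improved/No change"
table2 = dict(list(table1.items())[:2])
table2[merged] = [a + b for a, b in zip(table1["Slightly improved"], table1["No change"])]

expected2 = expected_frequencies(table2)
o, e = table2[merged][2], expected2[merged][2]    # Treatment C cell
contribution_c = (o - e) ** 2 / e
print(small, round(contribution_c, 3))
```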
OCR MEI S2 2009 January Q4
17 marks Standard +0.3
4 A gardening research organisation is running a trial to examine the growth and the size of flowers of various plants.
  1. In the trial, seeds of three types of plant are sown. The growth of each plant is classified as good, average or poor. The results are shown in the table.
    | | | Growth | | | Row totals |
    |---|---|---|---|---|---|
    | | | Good | Average | Poor | |
    | Type of plant | Coriander | 12 | 28 | 15 | 55 |
    | | Aster | 7 | 18 | 23 | 48 |
    | | Fennel | 14 | 22 | 11 | 47 |
    | | Column totals | 33 | 68 | 49 | 150 |
    Carry out a test at the \(5 \%\) significance level to examine whether there is any association between growth and type of plant. State carefully your null and alternative hypotheses. Include a table of the contributions of each cell to the test statistic.
  2. It is known that the diameter of marigold flowers is Normally distributed with mean 47 mm and standard deviation 8.5 mm. A certain fertiliser is expected to cause flowers to have a larger mean diameter, but without affecting the standard deviation. A large number of marigolds are grown using this fertiliser. The diameters of a random sample of 50 of the flowers are measured and the mean diameter is found to be 49.2 mm. Carry out a hypothesis test at the \(1 \%\) significance level to check whether flowers grown with this fertiliser appear to be larger on average. Use hypotheses \(\mathrm { H } _ { 0 } : \mu = 47 , \mathrm { H } _ { 1 } : \mu > 47\), where \(\mu \mathrm { mm }\) represents the mean diameter of all marigold flowers grown with this fertiliser.
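The one-tailed test on the mean in the last part can be sketched with the standard library's `statistics.NormalDist`. Variable names are illustrative; the known standard deviation 8.5 mm is taken from the question.

```python
import math
from statistics import NormalDist

mu0, sigma = 47.0, 8.5          # H0 mean and known standard deviation (mm)
n, sample_mean = 50, 49.2

# Test statistic for the sample mean under H0.
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

critical = NormalDist().inv_cdf(0.99)   # upper-tail critical value, 1% level
p_value = 1 - NormalDist().cdf(z)
reject_h0 = z > critical
print(round(z, 3), round(critical, 3), round(p_value, 4), reject_h0)
```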
OCR MEI S2 2010 January Q4
18 marks Moderate -0.3
4 A council provides waste paper recycling services for local businesses. Some businesses use the standard service for recycling paper, others use a special service for dealing with confidential documents, and others use both. Businesses are classified as small or large. A survey of a random sample of 285 businesses gives the following data for size of business and recycling service.
| | | Recycling Service | | |
|---|---|---|---|---|
| | | Standard | Special | Both |
| Size of business | Small | 35 | 26 | 44 |
| | Large | 55 | 52 | 73 |
  1. Write down null and alternative hypotheses for a test to examine whether there is any association between size of business and recycling service used.
    The contributions to the test statistic for the usual \(\chi ^ { 2 }\) test are shown in the table below.

    | | | Recycling Service | | |
    |---|---|---|---|---|
    | | | Standard | Special | Both |
    | Size of business | Small | 0.1023 | 0.2607 | 0.0186 |
    | | Large | 0.0597 | 0.1520 | 0.0108 |

    The sum of these contributions is 0.6041.
  2. Calculate the expected frequency for large businesses using the special service. Verify the corresponding contribution 0.1520 to the test statistic.
  3. Carry out the test at the \(5 \%\) level of significance, stating your conclusion clearly.
    The council is also investigating the weight of rubbish in domestic dustbins. In 2008 the average weight of rubbish in bins was 32.8 kg. The council has now started a recycling initiative and wishes to determine whether there has been a reduction in the weight of rubbish in bins. A random sample of 50 domestic dustbins is selected and it is found that the mean weight of rubbish per bin is now 30.9 kg, and the standard deviation is 3.4 kg.
  4. Carry out a test at the \(5 \%\) level to investigate whether the mean weight of rubbish has been reduced in comparison with 2008. State carefully your null and alternative hypotheses.