5.06a Chi-squared: contingency tables

179 questions

OCR MEI Further Statistics Major Specimen Q9
13 marks Standard +0.3
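9 A random sample of adults in the UK were asked to state their primary source of news: television (T), internet (I), newspapers (N) or radio (R). The responses were classified by age group, and an analysis was carried out to see if there is any association between age group and primary source of news. Fig. 9 is a screenshot showing part of the spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted.
\begin{table}[h]
\begin{tabular}{|r|l|r|r|r|r|r|}
\hline
 & A & B & C & D & E & F \\ \hline
1 & Source & \multicolumn{4}{c|}{Age group} & \\ \hline
2 & of news & 18-32 & 33-47 & 48-64 & 65+ & \\ \hline
3 & T & 63 & 61 & 71 & 80 & 275 \\ \hline
4 & I & 33 & 33 & 22 & 12 & 100 \\ \hline
5 & N & 9 & 8 & 11 & 20 & 48 \\ \hline
6 & R & 4 & 9 & 9 & 5 & 27 \\ \hline
7 & & 109 & 111 & 113 & 117 & 450 \\ \hline
8 & & & & & & \\ \hline
9 & \multicolumn{6}{l|}{Expected frequencies} \\ \hline
10 & & 66.61 & 67.83 & 69.06 & 71.50 & \\ \hline
11 & & 24.22 & 24.67 & & 26.00 & \\ \hline
12 & & 11.63 & 11.84 & 12.05 & 12.48 & \\ \hline
13 & & 6.54 & 6.66 & 6.78 & 7.02 & \\ \hline
14 & & & & & & \\ \hline
15 & \multicolumn{6}{l|}{Contributions to the test statistic} \\ \hline
16 & & 0.20 & 0.69 & 0.05 & 1.01 & \\ \hline
17 & & 3.18 & 2.82 & & 7.54 & \\ \hline
18 & & 0.59 & & 0.09 & 4.53 & \\ \hline
19 & & 0.99 & 0.82 & 0.73 & 0.58 & \\ \hline
20 & \multicolumn{4}{r|}{test statistic} & 25.45 & \\ \hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 9}
\end{table}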
  1. (A) State the sample size.
    (B) Give the name of the appropriate hypothesis test.
    (C) State the null and alternative hypotheses.
  2. Showing your calculations, find the missing values in cells
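The spreadsheet calculations behind Fig. 9 can be sketched in a few lines of Python. This is only an illustration of the usual method (expected frequency = row total × column total ÷ grand total, contribution = (O − E)² ÷ E); the variable names are hypothetical and the observed counts are copied from rows 3 to 6 of the figure.

```python
# Observed counts from Fig. 9 (rows T, I, N, R; columns 18-32, 33-47, 48-64, 65+).
observed = [
    [63, 61, 71, 80],
    [33, 33, 22, 12],
    [9, 8, 11, 20],
    [4, 9, 9, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell to the test statistic: (O - E)^2 / E.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

test_statistic = sum(sum(row) for row in contributions)
print(grand_total, round(test_statistic, 2))  # prints: 450 25.45
```

The rounded total agrees with the value shown in row 20 of the spreadsheet.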
OCR FS1 AS 2021 June Q3
12 marks Standard +0.3
3 At a cinema there are three film sessions each Saturday, "early", "middle" and "late". The numbers of the audience, in different age groups, at the three showings on a randomly chosen Saturday are given in Table 1.
| Question | Solution | Marks | AOs | Guidance |
|---|---|---|---|---|
| 1(a) | -0.954 BC | B2 [2] | 1.1 1.1 | SC: If B0, give B1 if two of 7.04, 29.0[4], -13.6[4] (or 35.2, 145[.2], -68.2) seen |
| 1(b) | Points lie close to a straight line. Line has negative gradient | B1 B1 [2] | 2.2b 1.1 | Must refer to line, not just "negative correlation" |
| 1(c) | No, it will be the same as \(x \rightarrow a\) is a linear transformation | B1 [1] | 2.2a | OE. Either "same" with correct reason, or "disagree" with correct reason. Allow any clear valid technical term |
| 2(a) | Neither | B1 [1] | 1.2 | |
| 2(b) | \(q = 1.13 + 0.620p\) | B1B1 B1 [3] | 1.1, 1.1 1.1 | 0.62(0) correct; both numbers correct. Fully correct answer including letters |
| 2(c)(i) | 2.68 | B1ft [1] | 1.1 | awrt 2.68, ft on their (b) if letters correct |
| 2(c)(ii) | 2.5 is within data range, and points (here) are close to line/well correlated | B1 B1 [2] | 2.2b 2.2b | At least one reason, allow "no because points not close to line". Full argument, two reasons needed |
| 2(d) | Not much data here/points scattered/possible outliers. So not very reliable | M1 A1 [2] | 2.3 1.1 | Reason for not very reliable (not "extrapolation"). Full argument and conclusion, not too assertive (not wholly unreliable!) |
| 3(a) | Expected frequency for Middle/25 to 60 is 4.4 which is < 5 so must combine cells | B1*ft depB1 [2] | 2.4 3.5b | Correctly obtain this \(F_E\), ft on addition errors. " < 5" explicit and correct deduction |
| 3(b) | First table (Early, Middle, Late): 29.4, 23.1, 31.5; 26.6, 20.9, 28.5. Second table (Early, Middle, Late): 0.9918, 0.4160, 2.2937; 1.0962, 0.4598, 2.5351 | B1 | 1.1 | Both, allow 28.4 for 28.5. awrt 2.29, but allow 2.3. In range [2.53, 2.54] |
| Question | Solution | Marks | AOs | Guidance |
|---|---|---|---|---|
| 3(c) | \(\mathrm{H}_0\): no association between session and age group. \(\mathrm{H}_1\): some association | B1 | 1.1 | Both. Allow "independent" etc |
| | \(\Sigma X^2 = 7.793\) | B1 | 1.1 | Correct value of \(X^2\), awrt 7.79 (allow even if wrong in (b)) |
| | \(v = 2, \chi^2(2)_{\text{crit}} = 5.991\) | B1 | 1.1 | Correct CV and comparison |
| | Reject \(\mathrm{H}_0\). | M1ft | 1.1 | Correct first conclusion, FT on their TS only |
| | Significant evidence of association between session attended and age group. | A1ft [5] | 2.2b | Contextualised, not too assertive |
| 3(d) | The two biggest contributions to \(\chi^2\) are both for the late session ... | M1ft | 1.1 | Refer to biggest contribution(s), FT on their answers to (b), needs "reject \(\mathrm{H}_0\)" |
| | ... when the proportion of younger people is higher, and of older people is lower, than the null hypothesis would suggest. | A1ft [2] | 2.4 | Full answer, referring to at least one cell (ignore comments on next highest cells) |
| 4 | \(\frac{{}^{2m}C_2 \times m}{{}^{3m}C_3}\) | M1 | 3.1b | Use \({}^{2m}C_2\) and \(m\) |
| | | M1 | 3.1b | Divide by \({}^{3m}C_3\) |
| | \(= \frac{2m(2m-1)}{2} \times m \div \frac{3m(3m-1)(3m-2)}{6} = \frac{2m(2m-1)}{(3m-1)(3m-2)}\) | A1 | 2.1 | Correct expression in terms of \(m\) (allow with \(m\) not cancelled yet) |
| | \(\frac{2m(2m-1)}{(3m-1)(3m-2)} = \frac{28}{55}\) | M1 | 3.1a | Equate to \(\frac{28}{55}\) and simplify to three-term quadratic |
| | \(\Rightarrow 16m^2 - 71m + 28 = 0\) | A1 | 2.1 | Correct simplified quadratic, or (quadratic) \(\times m, = 0\), aef |
| | \(m = 4\) BC | M1 | 1.1 | Solve to get both 4 and \(\frac{7}{16}\) |
| | Reject \(m = \frac{7}{16}\) as \(m\) is an integer | A1 [7] | 3.2a | Explicitly reject \(m = \frac{7}{16}\) |
| | OR: \(\frac{2m(2m-1) \times m \times 3!}{3m(3m-1)(3m-2) \times 2}\) then as above | | | Multiplication method can get full marks, but if no 3 or 3!, max M1M0A0 M1A0M0A0 |
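The figures quoted for 3(b) and 3(c) in the mark scheme above can be cross-checked with a short Python sketch. It assumes the six contributions are exactly those listed in 3(b), and it uses the fact that for 2 degrees of freedom the upper tail of the χ² distribution is \(e^{-x/2}\), so the critical value at level α is \(-2\ln\alpha\). The variable names are illustrative only.

```python
import math

# Contributions to the test statistic as quoted in 3(b):
# two (combined) age-group rows, three sessions (Early, Middle, Late).
contributions = [
    [0.9918, 0.4160, 2.2937],
    [1.0962, 0.4598, 2.5351],
]
test_statistic = sum(sum(row) for row in contributions)

rows, cols = len(contributions), len(contributions[0])
dof = (rows - 1) * (cols - 1)

# With 2 degrees of freedom, P(X > x) = exp(-x / 2),
# so the critical value at significance level alpha is -2 * ln(alpha).
alpha = 0.05
critical_value = -2 * math.log(alpha)

reject_h0 = test_statistic > critical_value
print(round(test_statistic, 3), dof, round(critical_value, 3), reject_h0)
```

The closed form only applies when the degrees of freedom equal 2; for other cases a table or a statistics library would be needed.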
OCR FS1 AS 2021 June Q2
9 marks Moderate -0.5
2 After a holiday organised for a group, the company organising the holiday obtained scores out of 10 for six different aspects of the holiday. The company obtained responses from 100 couples and 100 single travellers. The total scores for each of the aspects are given in the following table.
| Question | Answer | Mark | AO | Guidance |
|---|---|---|---|---|
| 1(a) | \(\frac{1}{0.2} = 5\) | M1 A1 [2] | 3.3 1.1 | Geometric distribution soi. 5 (or \(5.00\ldots\)) only |
| 1(b) | \(0.8^2 - 0.8^{10} = \mathbf{0.533}\) \((0.5326258\ldots)\) | M1 A1 [2] | 1.1 3.4 | Allow for powers 2, 3, 4 and 9, 10, 11. Awrt 0.533, www. [5201424/9765625]. Or \(0.2\left(0.8^2 + \ldots + 0.8^9\right)\), \(\pm 1\) term at either end [0.506, 0.378, 0.275, 0.405, 0.302, 0.554, 0.426, 0.324] |
| 1(c) | \(\mathrm{P}(\geq 10) = 0.8^9\) | M1 | 3.1b | Or \(0.8^{10}\). Can be implied by correct \(p\) |
| | \(= 0.1342\ldots\) | A1 | 1.1 | [0.10737... is M1A0 here] |
| | B(30, 0.1342...) | M1 | 3.1b | Stated or implied, their \(0.8^9\) or \(0.8^{10}\) |
| | Variance \(= npq = 3.486\ldots\) | A1ft [4] | 1.1 | In range [3.48, 3.49] |
| | | | | SC: 0.134(2) oe not properly shown: B2 for correct final answer. |
| | | | | SC: 2.875 from \(0.8^{10}\): M1A0M1A1ft |
| Question | Answer | Mark | AO | Guidance |
|---|---|---|---|---|
| 2(a) | Test is for rankings/rankings arbitrary/not bivariate normal etc | B1 [1] | 2.4 | OE |
| 2(b) | \(\mathrm{H}_0: \rho_s = 0\), \(\mathrm{H}_1: \rho_s > 0\), where \(\rho_s\) is the population rank correlation coefficient. Ranks 5 4 3 6 1 2 and 5 1 2 6 4 3. \(\Sigma d^2 = 20\). \(r_s = 1 - \frac{6 \times 20}{6 \times 35} = 3/7\) or \(0.42857\ldots\). \(< 0.9429\) | B1 B1 M1 A1 B1 | 1.1 1.1 1.1 1.1 1.1 | Allow \(\rho_s\) not defined; allow \(\rho\). Allow: \(\mathrm{H}_0\): no association between rankings. \(\mathrm{H}_1\): positive association (but not \(\mathrm{H}_1\): association) |
| | Do not reject \(\mathrm{H}_0\). Insufficient evidence of association between ranking given by the two categories | M1ft A1ft [7] | 1.1 2.2b | FT on their \(\Sigma d^2\) only |
| 2(c) | Not dependent on any distributional assumptions | B1 [1] | 1.2 | Oe (cf. Specification, 5.08f) |
| Question | Answer | Mark | AO | Guidance |
|---|---|---|---|---|
| 3(a) | Failures occur to no fixed pattern/are not predictable | B1 [1] | 1.1 | OE. NOT "independent" |
| 3(b) | Failures occur independently of one another and at constant average rate | B1 | 1.1 | Not recoverable from (a) if independence not restated here; must be contextualised |
| | | B1 [2] | 1.1 | Ignore "singly". Allow "uniform" rate, not "constant rate" or "constant probability"; must be contextualised |
| 3(c) | Variance (1.6384) \(\approx\) mean | M1 | 1.1 | Compare variance (or SD). Allow square/square-root confusion |
| | So suggests that it is likely to be well modelled | A1 [2] | 3.5a | Correct comparison and conclusion, 1.64 or better seen |
| 3(d) | \(\mathrm{e}^{-1.61}\) | B1 [1] | 3.4 | Exact needed, allow even if \(0!\) or \(1.61^0\) or both left in |
| 3(e) | 1: 0.322; \(\geq 2\): 0.478 | B1 | 3.4 | One correct e.g. 0.3218 |
| | | B1 [2] | 1.1 | Other correct e.g. 0.4783 |
| 3(f) | \(\mathrm{P}(F = 1)\) will be smaller as single failures are less likely | B1* depB1 [2] | 3.5c 3.3 | OE. Partial answer: B1 |
CAIE FP2 2010 June Q10
13 marks Standard +0.3
Three new flu vaccines, \(A\), \(B\) and \(C\), were tested on \(500\) volunteers. The vaccines were assigned randomly to the volunteers and \(178\) received \(A\), \(149\) received \(B\) and \(173\) received \(C\). During the following winter, of the volunteers given \(A\) caught flu, \(29\) of the volunteers given \(B\) caught flu, and \(16\) of the volunteers given \(C\) caught flu. Carry out a suitable test for independence at the \(5\%\) significance level. [10] Without using a statistical test, decide which of the vaccines appears to be most effective. [3]
CAIE FP2 2012 June Q10
11 marks Standard +0.8
Random samples of employees are taken from two companies, \(A\) and \(B\). Each employee is asked which of three types of coffee (Cappuccino, Latte, Ground) they prefer. The results are shown in the following table.
| | Cappuccino | Latte | Ground |
|---|---|---|---|
| Company \(A\) | 60 | 52 | 32 |
| Company \(B\) | 35 | 40 | 31 |
Test, at the 5% significance level, whether coffee preferences of employees are independent of their company. [7] Larger random samples, consisting of \(N\) times as many employees from each company, are taken. In each company, the proportions of employees preferring the three types of coffee remain unchanged. Find the least possible value of \(N\) that would lead to the conclusion, at the 1% significance level, that coffee preferences of employees are not independent of their company. [4]
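The second part relies on the fact that multiplying every observed count by \(N\) multiplies the χ² statistic by \(N\) as well, because each expected frequency scales by the same factor. A minimal Python sketch of that reasoning follows; the helper `chi_squared` is a hypothetical name, and the 1% critical value for 2 degrees of freedom is taken from the closed form \(-2\ln 0.01\).

```python
import math

# Observed coffee preferences (Cappuccino, Latte, Ground) for companies A and B.
observed = [[60, 52, 32], [35, 40, 31]]


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )


base_statistic = chi_squared(observed)

# (2 - 1)(3 - 1) = 2 degrees of freedom, so the upper tail is exp(-x / 2).
critical_1pct = -2 * math.log(0.01)

# Scaling all counts by N scales the statistic by N; find the least integer N
# for which N * base_statistic exceeds the 1% critical value.
least_n = math.floor(critical_1pct / base_statistic) + 1
print(round(base_statistic, 2), round(critical_1pct, 2), least_n)
```

The sketch treats "leads to the conclusion" as the scaled statistic strictly exceeding the critical value.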
CAIE FP2 2010 November Q8
7 marks Standard +0.3
The owner of three driving schools, \(A\), \(B\) and \(C\), wished to assess whether there was an association between passing the driving test and the school attended. He selected a random sample of learner drivers from each of his schools and recorded the numbers of passes and failures at each school. The results that he obtained are shown in the table below.
| Driving school attended | \(A\) | \(B\) | \(C\) |
|---|---|---|---|
| Passes | 23 | 15 | 17 |
| Failures | 27 | 25 | 43 |
Using a \(\chi^2\)-test and a 5% level of significance, test whether there is an association between passing or failing the driving test and the driving school attended. [7]
Edexcel S3 2015 June Q5
12 marks Standard +0.3
A Head of Department at a large university believes that gender is independent of the grade obtained by students on a Business Foundation course. A random sample was taken of 200 male students and 160 female students who had studied the course. The results are summarised below.
| | Male | Female |
|---|---|---|
| Distinction | 18.5\% | 27.5\% |
| Merit | 63.5\% | 60.0\% |
| Unsatisfactory | 18.0\% | 12.5\% |
Stating your hypotheses clearly, test the Head of Department's belief using a 5\% level of significance. Show your working clearly. [12]
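The table gives percentages, whereas a χ² test is carried out on frequencies, so the first step is to convert each percentage into a count using the sample sizes of 200 and 160. A small Python sketch of that conversion (the dictionary names are illustrative only):

```python
# Sample sizes stated in the question.
sample_sizes = {"Male": 200, "Female": 160}

# Percentages from the table, by grade and gender.
percentages = {
    "Distinction": {"Male": 18.5, "Female": 27.5},
    "Merit": {"Male": 63.5, "Female": 60.0},
    "Unsatisfactory": {"Male": 18.0, "Female": 12.5},
}

# Observed frequency = percentage of the relevant sample size.
observed = {
    grade: {
        gender: round(pct * sample_sizes[gender] / 100)
        for gender, pct in by_gender.items()
    }
    for grade, by_gender in percentages.items()
}

for grade, counts in observed.items():
    print(grade, counts["Male"], counts["Female"])
```

Each column of counts should add back up to its sample size, which is a useful check before calculating expected frequencies.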
Edexcel S3 Q5
11 marks Standard +0.3
The manager of a leisure centre collected data on the usage of the facilities in the centre by its members. A random sample from her records is summarised below.
| Facility | Male | Female |
|---|---|---|
| Pool | 40 | 68 |
| Jacuzzi | 26 | 33 |
| Gym | 52 | 31 |
Making your method clear, test whether or not there is any evidence of an association between gender and use of the club facilities. State your hypotheses clearly and use a 5\% level of significance. [11]
Edexcel S3 2005 June Q3
Standard +0.3
A researcher carried out a survey of three treatments for a fruit tree disease. The contingency table below shows the results of a survey of a random sample of 60 diseased trees.
| | No action | Remove diseased branches | Spray with chemicals |
|---|---|---|---|
| Tree died within 1 year | 10 | 5 | 6 |
| Tree survived for 1–4 years | 5 | 9 | 7 |
| Tree survived beyond 4 years | 5 | 6 | 7 |
Test, at the 5\% level of significance, whether or not there is any association between the treatment of the trees and their survival. State your hypotheses and conclusion clearly. (Total 11 marks)
Edexcel S3 2006 June Q6
11 marks Standard +0.3
A research worker studying colour preference and the age of a random sample of 50 children obtained the results shown below.
| Age in years | Red | Blue | Totals |
|---|---|---|---|
| 4 | 12 | 6 | 18 |
| 8 | 10 | 7 | 17 |
| 12 | 6 | 9 | 15 |
| Totals | 28 | 22 | 50 |
Using a 5\% significance level, carry out a test to decide whether or not there is an association between age and colour preference. State your hypotheses clearly. [11]
Edexcel S3 2011 June Q3
10 marks Standard +0.3
A factory manufactures batches of an electronic component. Each component is manufactured in one of three shifts. A component may have one of two types of defect, \(D_1\) or \(D_2\), at the end of the manufacturing process. A production manager believes that the type of defect is dependent upon the shift that manufactured the component. He examines 200 randomly selected defective components and classifies them by defect type and shift. The results are shown in the table below.
| | \(D_1\) | \(D_2\) |
|---|---|---|
| First shift | 45 | 18 |
| Second shift | 55 | 20 |
| Third shift | 50 | 12 |
Stating your hypotheses, test, at the 10\% level of significance, whether or not there is evidence to support the manager's belief. Show your working clearly. [10]
Edexcel S3 2016 June Q2
Standard +0.3
A new drug to vaccinate against influenza was given to 110 randomly chosen volunteers. The volunteers were given the drug in one of 3 different concentrations, \(A\), \(B\) and \(C\), and then were monitored to see if they caught influenza. The results are shown in the table below.
| | \(A\) | \(B\) | \(C\) |
|---|---|---|---|
| Influenza | 12 | 29 | 9 |
| No influenza | 15 | 23 | 22 |
Test, at the 10\% level of significance, whether or not there is an association between catching influenza and the concentration of the new drug. State your hypotheses and show your working clearly. You should state your expected frequencies to 2 decimal places. (10)
Edexcel S3 Q2
6 marks Standard +0.3
A random sample of the invoices, for books purchased by the customers of a large bookshop, was classified by book cover (hardback, paperback) and type of book (novel, textbook, general interest). As part of the analysis of these invoices, an approximate \(\chi^2\) statistic was calculated and found to be 11.09. Assuming that there was no need to amalgamate any of the classifications, carry out an appropriate test to determine whether or not there was any association between book cover and type of book. State your hypotheses clearly and use a 5% level of significance. [6]
Edexcel S3 Specimen Q7
11 marks Moderate -0.3
A survey in a college was commissioned to investigate whether or not there was any association between gender and passing a driving test. A group of 50 male and 50 female students were asked whether they passed or failed their driving test at the first attempt. All the students asked had taken the test. The results were as follows.
| | Pass | Fail |
|---|---|---|
| Male | 23 | 27 |
| Female | 32 | 18 |
Stating your hypotheses clearly test, at the 10\% level, whether or not there is any evidence of an association between gender and passing a driving test at the first attempt. [11]
AQA S2 2010 June Q2
8 marks Standard +0.3
It is claimed that a new drug is effective in the prevention of sickness in holiday-makers. A sample of \(100\) holiday-makers was surveyed, with the following results.
| | Sickness | No sickness | Total |
|---|---|---|---|
| Drug taken | 24 | 56 | 80 |
| No drug taken | 11 | 9 | 20 |
| Total | 35 | 65 | 100 |
Assuming that the \(100\) holiday-makers are a random sample, use a \(\chi^2\) test, at the \(5\%\) level of significance, to investigate the claim. [8 marks]
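For a \(2 \times 2\) table such as this one, many syllabuses expect Yates' continuity correction, which replaces \((O - E)^2\) by \((|O - E| - 0.5)^2\) in each contribution. The Python sketch below computes the statistic both with and without the correction; whether the correction is required here is an assumption, not something the question states. The critical value for 1 degree of freedom is obtained as the square of the standard normal 97.5% point.

```python
from statistics import NormalDist

# Observed counts: rows are drug taken / no drug taken,
# columns are sickness / no sickness.
observed = [[24, 56], [11, 9]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

cells = [
    (o, e)
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
]

# Pearson statistic without any correction.
uncorrected = sum((o - e) ** 2 / e for o, e in cells)

# Yates' correction: subtract 0.5 from |O - E| before squaring.
yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)

# Chi-squared with 1 degree of freedom is the square of a standard normal,
# so the 5% critical value is the square of the 97.5% normal quantile.
critical_value = NormalDist().inv_cdf(0.975) ** 2

print(round(uncorrected, 2), round(yates, 2), round(critical_value, 3))
```

With these data the two versions of the statistic fall on opposite sides of the critical value, so it matters which convention the syllabus follows.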
AQA S2 2016 June Q5
13 marks Standard +0.3
A car manufacturer keeps a record of how many of the new cars that it has sold experience mechanical problems during the first year. The manufacturer also records whether the cars have a petrol engine or a diesel engine. Data for a random sample of 250 cars are shown in the table.
| | Problems during first 3 months | Problems during first year but after first 3 months | No problems during first year | Total |
|---|---|---|---|---|
| Petrol engine | 10 | 35 | 170 | 215 |
| Diesel engine | 4 | 8 | 23 | 35 |
| Total | 14 | 43 | 193 | 250 |
  1. Use a \(\chi^2\)-test to investigate, at the 10% significance level, whether there is an association between the mechanical problems experienced by a new car from this manufacturer and the type of engine. [11 marks]
  2. Arisa is planning to buy a new car from this manufacturer. She would prefer to buy a car with a diesel engine, but a friend has told her that cars with diesel engines experience more mechanical problems. Based on your answer to part (a), state, with a reason, the advice that you would give to Arisa. [2 marks]
OCR MEI S2 2007 January Q4
18 marks Standard +0.3
Two educational researchers are investigating the relationship between personal ambitions and home location of students. The researchers classify students into those whose main personal ambition is good academic results and those who have some other ambition. A random sample of 480 students is selected.
  1. One researcher summarises the data as follows.
    | Observed: Ambition \ Home location | City | Non-city |
    |---|---|---|
    | Good results | 102 | 147 |
    | Other | 75 | 156 |
    Carry out a test at the 5\% significance level to examine whether there is any association between home location and ambition. State carefully your null and alternative hypotheses. Your working should include a table showing the contributions of each cell to the test statistic. [9]
  2. The other researcher summarises the same data in a different way as follows.
    | Observed: Ambition \ Home location | City | Town | Country |
    |---|---|---|---|
    | Good results | 102 | 83 | 64 |
    | Other | 75 | 64 | 92 |
    1. Calculate the expected frequencies for both 'Country' cells. [2]
    2. The test statistic for these data is 10.94. Carry out a test at the 5\% level based on this table, using the same hypotheses as in part (i). [3]
    3. The table below gives the contribution of each cell to the test statistic. Discuss briefly how personal ambitions are related to home location. [2]
      | Contribution to the test statistic: Ambition \ Home location | City | Town | Country |
      |---|---|---|---|
      | Good results | 1.129 | 0.596 | 3.540 |
      | Other | 1.217 | 0.643 | 3.816 |
  3. Comment briefly on whether the analysis in part (ii) means that the conclusion in part (i) is invalid. [2]
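The expected frequencies and contributions for the two 'Country' cells in the three-column table can be checked with a short Python sketch. It uses the standard formulas (row total × column total ÷ grand total, then (O − E)² ÷ E); the variable names are illustrative.

```python
# Observed counts from the City / Town / Country table.
observed = {
    "Good results": {"City": 102, "Town": 83, "Country": 64},
    "Other": {"City": 75, "Town": 64, "Country": 92},
}

grand_total = sum(sum(row.values()) for row in observed.values())
row_totals = {ambition: sum(row.values()) for ambition, row in observed.items()}
country_total = sum(row["Country"] for row in observed.values())

# Expected frequency for each 'Country' cell under independence.
expected_country = {
    ambition: row_totals[ambition] * country_total / grand_total
    for ambition in observed
}

# Contribution of each 'Country' cell to the test statistic.
contribution_country = {
    ambition: (observed[ambition]["Country"] - e) ** 2 / e
    for ambition, e in expected_country.items()
}

for ambition in observed:
    print(ambition, round(expected_country[ambition], 3),
          round(contribution_country[ambition], 3))
```

The two contributions agree with the 'Country' column of the contributions table given in the question.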
OCR S3 2012 January Q1
6 marks Moderate -0.8
In a test of association of two factors, \(A\) and \(B\), a \(2 \times 2\) contingency table yielded \(5.63\) for the value of \(\chi^2\) with Yates' correction.
  1. State the null hypothesis and alternative hypothesis for the test. [1]
  2. State how Yates' correction is applied, and whether it increases or decreases the value of \(\chi^2\). [2]
  3. Carry out the test at the \(2\frac{1}{2}\%\) significance level. [3]
OCR MEI S3 2010 June Q3
18 marks Standard +0.3
  1. In order to prevent and/or control the spread of infectious diseases, the Government has various vaccination programmes. One such programme requires people to receive a booster injection at the age of 18. It is felt that the proportion of people receiving this booster could be increased and a publicity campaign is undertaken for this purpose. In order to assess the effectiveness of this campaign, health authorities across the country are asked to report the percentage of 18-year-olds receiving the booster before and after the campaign. The results for a randomly chosen sample of 9 authorities are as follows.
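    | Authority | A | B | C | D | E | F | G | H | I |
    |---|---|---|---|---|---|---|---|---|---|
    | Before | 76 | 98 | 88 | 81 | 86 | 84 | 83 | 93 | 80 |
    | After | 82 | 97 | 93 | 77 | 83 | 95 | 91 | 95 | 89 |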
    This sample is to be tested to see whether the campaign appears to have been successful in raising the percentage receiving the booster.
    1. Explain why the use of paired data is appropriate in this context. [1]
    2. Carry out an appropriate Wilcoxon signed rank test using these data, at the 5\% significance level. [10]
  2. Benford's Law predicts the following probability distribution for the first significant digit in some large data sets.
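    | Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
    |---|---|---|---|---|---|---|---|---|---|
    | Probability | 0.301 | 0.176 | 0.125 | 0.097 | 0.079 | 0.067 | 0.058 | 0.051 | 0.046 |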
    On one particular day, the first significant digits of the stock market prices of the shares of a random sample of 200 companies gave the following results.
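    | Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
    |---|---|---|---|---|---|---|---|---|---|
    | Frequency | 55 | 34 | 27 | 16 | 15 | 17 | 12 | 15 | 9 |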
    Test at the 10\% level of significance whether Benford's Law provides a reasonable model in the context of share prices. [7]
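The Benford's Law part is a goodness-of-fit test rather than a contingency-table test: the expected frequencies come directly from the stated probabilities. A minimal Python sketch of the calculation is given below. The critical value 13.362 is the usual tabulated upper 10% point for 8 degrees of freedom, hard-coded here as an assumption rather than computed.

```python
# Benford probabilities and observed first-digit frequencies for digits 1 to 9.
probabilities = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
observed = [55, 34, 27, 16, 15, 17, 12, 15, 9]

n = sum(observed)

# Expected frequency for each digit is n times its Benford probability.
expected = [n * p for p in probabilities]

# Goodness-of-fit statistic: sum of (O - E)^2 / E over the nine digits.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# No parameters are estimated from the data, so df = number of classes - 1.
dof = len(observed) - 1

# Tabulated upper 10% point of chi-squared with 8 degrees of freedom (assumed).
critical_value = 13.362

reject_h0 = statistic > critical_value
print(n, round(statistic, 2), dof, reject_h0)
```

All nine expected frequencies are above 5, so no classes need to be combined before the statistic is calculated.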
Edexcel S3 Q4
11 marks Standard +0.3
A hospital administrator is assessing staffing needs for its Accident and Emergency Department at different times of day. The administrator already has data on the number of admissions at different times of day but needs to know if the proportion of the cases that are serious remains constant. Staff are asked to assess whether each person arriving at Accident and Emergency has a "minor" or "serious" problem and the results for three different time periods are shown below.
| | Minor | Serious |
|---|---|---|
| 8 a.m. – 6 p.m. | 45 | 11 |
| 6 p.m. – 2 a.m. | 49 | 22 |
| 2 a.m. – 8 a.m. | 14 | 7 |
Stating your hypotheses clearly, test at the 5% level of significance whether or not there is evidence of the proportion of serious injuries being different at different times of day. [11]
Edexcel S3 Q6
11 marks Standard +0.3
Two schools in the same town advertise at the same time for new heads of English and History departments. The number of applicants for each post are shown in the table below.
| | English | History |
|---|---|---|
| Highfield School | 32 | 14 |
| Rowntree School | 48 | 26 |
Stating your hypotheses clearly, test at the 10\% level of significance whether or not there is evidence of the proportion of applicants for each job being different in the two schools. [11 marks]
Edexcel S3 Q6
15 marks Standard +0.3
A survey found that of the 320 people questioned who had passed their driving test aged under twenty-five, 104 had been involved in an accident in the two years following their test. Of the 80 people in the survey who were aged twenty-five or over when they passed their test, 16 had been involved in an accident in the following two years.
  1. Draw up a contingency table showing this information. [2]
It is desired to test whether the proportion of drivers having accidents within two years of passing their test is different for those who were aged under twenty-five at the time of passing their test than for those aged twenty-five or over.
    1. Stating your hypotheses clearly, carry out the test at the 5\% level of significance.
    2. Explain clearly why there is only one degree of freedom. [11]
It is found that 12 people who were aged under twenty-five when they took their test and had been involved in an accident in the following two years had been omitted from the information given.
  1. Explain why you do not need to repeat the calculation to know the correct result of the test. [2]
AQA Further AS Paper 2 Statistics 2020 June Q2
1 mark Moderate -0.8
A \(\chi^2\) test is carried out in a school to test for association between the class a student belongs to and the number of times they are late to school in a week. The contingency table below gives the expected values for the test.
| Class \ Number of times late | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| A | 8.12 | 14 | 15.12 | 14 | 4.76 |
| B | 8.99 | 15.5 | 16.74 | 15.5 | 5.27 |
| C | 11.89 | 20.5 | 22.14 | 20.5 | 6.97 |

Find a possible value for the degrees of freedom for the test. Circle your answer. [1 mark]
6    8    12    15
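The degrees of freedom for a contingency table are (rows − 1)(columns − 1), counted after any classes have been combined. The Python sketch below assumes the common convention that a column containing an expected value below 5 is merged with a neighbouring column; the helper name `degrees_of_freedom` is hypothetical.

```python
# Expected values from the table (rows: classes A, B, C; columns: 0 to 4 times late).
expected = [
    [8.12, 14, 15.12, 14, 4.76],
    [8.99, 15.5, 16.74, 15.5, 5.27],
    [11.89, 20.5, 22.14, 20.5, 6.97],
]


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an n_rows by n_cols contingency table."""
    return (n_rows - 1) * (n_cols - 1)


n_rows, n_cols = len(expected), len(expected[0])

# Degrees of freedom if the table is used exactly as printed.
dof_as_given = degrees_of_freedom(n_rows, n_cols)

# Columns (0-based index) that contain at least one expected value below 5.
small_columns = [j for j in range(n_cols) if any(row[j] < 5 for row in expected)]

# Assumed convention: each such column is merged into a neighbour,
# which removes one column from the table per merge.
dof_after_merge = degrees_of_freedom(n_rows, n_cols - len(small_columns))

print(dof_as_given, small_columns, dof_after_merge)
```

Which of the two values applies depends on whether the small expected value is treated as requiring a merge.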
AQA Further AS Paper 2 Statistics 2020 June Q7
6 marks Moderate -0.8
A restaurant has asked Sylvia to conduct a \(\chi^2\) test for association between meal ordered and age of customer.
  1. State the hypotheses that Sylvia should use for her test. [1 mark]
  2. Sylvia correctly calculates her value of the test statistic to be 44.1. She uses a 5% level of significance and the degrees of freedom for the test is 30. Sylvia accepts the null hypothesis. Explain whether or not Sylvia was correct to accept the null hypothesis. [4 marks]
  3. State in context the correct conclusion to Sylvia's test. [1 mark]
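One way to check Sylvia's decision without tables is to compute the upper-tail probability directly. When the degrees of freedom are even, say \(2k\), the χ² upper tail has the closed form \(e^{-x/2}\sum_{i=0}^{k-1}\frac{(x/2)^i}{i!}\). The Python sketch below implements that formula; the function name is hypothetical and it only handles even degrees of freedom.

```python
import math


def chi2_upper_tail_even_df(x, dof):
    """P(X > x) for a chi-squared variable with an even number of degrees of freedom."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even number of degrees of freedom")
    half = x / 2
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    # Accumulate (x/2)^i / i! for i = 1, ..., dof/2 - 1.
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Sylvia's test statistic and degrees of freedom from the question.
p_value = chi2_upper_tail_even_df(44.1, 30)
significant_at_5pct = p_value < 0.05
print(significant_at_5pct)
```

Comparing the p-value with 0.05 is equivalent to comparing 44.1 with the tabulated 5% critical value for 30 degrees of freedom.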