Standard 3×3 contingency table

A question is this type if and only if the data form a 3-row by 3-column contingency table requiring a chi-squared test of independence with 4 degrees of freedom, with no need to combine cells.
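
For reference, the quantities behind this definition can be written as follows. This is a sketch in assumed notation (none of these symbols appear in the definition above): \(O_{ij}\) and \(E_{ij}\) are the observed and expected frequencies in row \(i\), column \(j\), \(R_i\) and \(C_j\) are the row and column totals, and \(N\) is the grand total.

$$E_{ij} = \frac{R_i C_j}{N}, \qquad X^2 = \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad \nu = (3-1)(3-1) = 4$$

Cells are usually combined only when some \(E_{ij}\) falls below 5, which is the situation this type excludes.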

14 questions · Standard +0.4

CAIE Further Paper 4 2024 June Q3
7 marks Standard +0.3
3 There are three bus companies in a city. The council is investigating whether the buses reliably arrive at their destination on time. The results from random samples of buses from each company are summarised in the following table.
| Arrival | Bus company \(A\) | Bus company \(B\) | Bus company \(C\) | Total |
|---|---|---|---|---|
| Early | 22 | 22 | 10 | 54 |
| On time | 30 | 52 | 42 | 124 |
| Late | 28 | 26 | 18 | 72 |
| Total | 80 | 100 | 70 | 250 |
Test, at the \(5 \%\) significance level, whether the reliability of buses is independent of bus company.
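
A minimal Python sketch of the calculation this question asks for, using only the counts in the table above. The function name `chi_squared_independence` and the variable names are illustrative assumptions, and the critical value 9.488 is the tabulated upper 5% point of \(\chi^2\) with 4 degrees of freedom.

```python
# Observed counts from the bus table: rows = Early, On time, Late; columns = A, B, C.
observed = [
    [22, 22, 10],
    [30, 52, 42],
    [28, 26, 18],
]


def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom, expected) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


CRITICAL_5_PERCENT_DF4 = 9.488  # from chi-squared tables, 4 degrees of freedom

statistic, dof, expected = chi_squared_independence(observed)
print(f"X^2 = {statistic:.3f} on {dof} degrees of freedom")
print("Reject H0" if statistic > CRITICAL_5_PERCENT_DF4 else "Do not reject H0")
```

The same function applies unchanged to the other 3×3 tables in this list; only the critical value changes with the significance level.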
CAIE Further Paper 4 2022 November Q2
7 marks Standard +0.3
2 In the colleges in three regions of a particular country, students are given individual targets to achieve. Their performance is measured against their individual target and graded as 'above target', 'on target' or 'below target'. For a random sample of students from each of the three regions, the observed frequencies are summarised in the following table.
| Performance | Region A | Region B | Region C | Total |
|---|---|---|---|---|
| Above target | 62 | 41 | 44 | 147 |
| On target | 102 | 94 | 95 | 291 |
| Below target | 56 | 45 | 61 | 162 |
| Total | 220 | 180 | 200 | 600 |
Test, at the 10\% significance level, whether performance is independent of region.
OCR MEI S4 2006 June Q4
24 marks Standard +0.3
4 An experiment is carried out to compare five industrial paints, A, B, C, D, E, that are intended to be used to protect exterior surfaces in polluted urban environments. Five different types of surface (I, II, III, IV, V) are to be used in the experiment, and five specimens of each type of surface are available. Five different external locations ( \(1,2,3,4,5\) ) are used in the experiment. The paints are applied to the specimens of the surfaces which are then left in the locations for a period of six months. At the end of this period, a "score" is given to indicate how effective the paint has been in protecting the surface.
  1. Name a suitable experimental design for this trial and give an example of an experimental layout.
Initial analysis of the data indicates that any differences between the types of surface are negligible, as also are any differences between the locations. It is therefore decided to analyse the data by one-way analysis of variance.
  2. State the usual model, including the accompanying distributional assumptions, for the one-way analysis of variance. Interpret the terms in the model.
  3. The data for analysis are as follows. Higher scores indicate better performance.
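    | Paint A | Paint B | Paint C | Paint D | Paint E |
    |---|---|---|---|---|
    | 64 | 66 | 59 | 65 | 64 |
    | 58 | 68 | 56 | 78 | 52 |
    | 73 | 76 | 69 | 69 | 56 |
    | 60 | 70 | 60 | 72 | 61 |
    | 67 | 71 | 63 | 71 | 58 |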
    [The sum of these data items is 1626 and the sum of their squares is 106838 .]
    Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a 5\% significance level. Report briefly on your conclusions.
    [12]
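
The one-way analysis of variance table requested in the last part can be sketched in Python as follows. The function `one_way_anova` and its dictionary keys are illustrative assumptions, not part of the question; the critical value 2.87 is the tabulated upper 5% point of \(F_{4,20}\).

```python
# Scores by paint, read down the columns of the data table (five specimens each).
scores = {
    "A": [64, 58, 73, 60, 67],
    "B": [66, 68, 76, 70, 71],
    "C": [59, 56, 69, 60, 63],
    "D": [65, 78, 69, 72, 71],
    "E": [64, 52, 56, 61, 58],
}


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F ratio for one-way ANOVA."""
    values = [x for g in groups.values() for x in g]
    n, k = len(values), len(groups)
    correction = sum(values) ** 2 / n  # (grand total)^2 / n
    total_ss = sum(x * x for x in values) - correction
    between_ss = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
    residual_ss = total_ss - between_ss
    between_df, residual_df = k - 1, n - k
    f_ratio = (between_ss / between_df) / (residual_ss / residual_df)
    return {
        "between_ss": between_ss, "between_df": between_df,
        "residual_ss": residual_ss, "residual_df": residual_df,
        "total_ss": total_ss, "f_ratio": f_ratio,
    }


CRITICAL_F_4_20 = 2.87  # from F tables, 5% level, (4, 20) degrees of freedom

anova = one_way_anova(scores)
print(f"Between paints: SS = {anova['between_ss']:.2f}, df = {anova['between_df']}")
print(f"Residual:       SS = {anova['residual_ss']:.2f}, df = {anova['residual_df']}")
print(f"Total:          SS = {anova['total_ss']:.2f}")
print(f"F = {anova['f_ratio']:.2f} (critical value {CRITICAL_F_4_20})")
```

The total sum of squares computed here uses the same sum (1626) and sum of squares (106838) quoted in the question.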
OCR MEI S4 2007 June Q4
24 marks Standard +0.8
4 An agricultural company conducts a trial of five fertilisers (A, B, C, D, E) in an experimental field at its research station. The fertilisers are applied to plots of the field according to a completely randomised design. The yields of the crop from the plots, measured in a standard unit, are analysed by the one-way analysis of variance, from which it appears that there are no real differences among the effects of the fertilisers. A statistician notes that the residual mean square in the analysis of variance is considerably larger than had been anticipated from knowledge of the general behaviour of the crop, and therefore suspects that there is some inadequacy in the design of the trial.
  1. Explain briefly why the statistician should be suspicious of the design.
  2. Explain briefly why an inflated residual leads to difficulty in interpreting the results of the analysis of variance, in particular that the null hypothesis is more likely to be accepted erroneously.
Further investigation indicates that the soil at the west side of the experimental field is naturally more fertile than that at the east side, with a consistent 'fertility gradient' from west to east.
  3. What experimental design can accommodate this feature? Provide a simple diagram of the experimental field indicating a suitable layout.
The company decides to conduct a new trial in its glasshouse, where experimental conditions can be controlled so that a completely randomised design is appropriate. The yields are as follows.
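    | Fertiliser A | Fertiliser B | Fertiliser C | Fertiliser D | Fertiliser E |
    |---|---|---|---|---|
    | 23.6 | 26.0 | 18.8 | 29.0 | 17.7 |
    | 18.2 | 35.3 | 16.7 | 37.2 | 16.5 |
    | 32.4 | 30.5 | 23.0 | 32.6 | 12.8 |
    | 20.8 | 31.4 | 28.3 | 31.4 | 20.4 |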
    [The sum of these data items is 502.6 and the sum of their squares is 13610.22 .]
  4. Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a \(5 \%\) significance level. Report briefly on your conclusions.
  5. State the assumptions about the distribution of the experimental error that underlie your analysis in part (iv).
OCR MEI S4 2008 June Q4
24 marks Standard +0.3
4
  1. State the usual model, including the accompanying distributional assumptions, for the one-way analysis of variance. Interpret the terms in the model.
  2. An examinations authority is considering using an external contractor for the typesetting and printing of its examination papers. Four contractors are being investigated. A random sample of 20 examination papers over the entire range covered by the authority is selected and 5 are allocated at random to each contractor for preparation. The authority carefully checks the printed papers for errors and assigns a score to each to indicate the overall quality (higher scores represent better quality). The scores are as follows.
    | Contractor A | Contractor B | Contractor C | Contractor D |
    |---|---|---|---|
    | 41 | 54 | 56 | 41 |
    | 49 | 45 | 45 | 36 |
    | 50 | 50 | 54 | 46 |
    | 44 | 50 | 50 | 38 |
    | 56 | 47 | 49 | 35 |
    [The sum of these data items is 936 and the sum of their squares is 44544 .]
    Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a \(5 \%\) significance level. Report briefly on your conclusions.
  3. The authority thinks that there might be differences in the ways the contractors cope with the preparation of examination papers in different subject areas. For this purpose, the subject areas are broadly divided into mathematics, sciences, languages, humanities, and others. The authority wishes to design a further investigation, ensuring that each of these subject areas is covered by each contractor. Name the experimental design that should be used and describe briefly the layout of the investigation.
OCR MEI S4 2010 June Q4
24 marks Standard +0.3
4 At an agricultural research station, a trial is made of four varieties (A, B, C, D) of a certain crop in an experimental field. The varieties are grown on plots in the field and their yields are measured in a standard unit.
  1. It is at first thought that there may be a consistent trend in the natural fertility of the soil in the field from the west side to the east, though no other trends are known. Name an experimental design that should be used in these circumstances and give an example of an experimental layout.
Initial analysis suggests that any natural fertility trend may in fact be ignored, so the data from the trial are analysed by one-way analysis of variance.
  2. The usual model for one-way analysis of variance of the yields \(y _ { i j }\) may be written as $$y _ { i j } = \mu + \alpha _ { i } + e _ { i j }$$ where the \(e _ { i j }\) represent the experimental errors. Interpret the other terms in the model. State the usual distributional assumptions for the \(e _ { i j }\).
  3. The data for the yields are as follows, each variety having been used on 5 plots.
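    | Variety A | Variety B | Variety C | Variety D |
    |---|---|---|---|
    | 12.3 | 14.2 | 14.1 | 13.6 |
    | 11.9 | 13.1 | 13.2 | 12.8 |
    | 12.8 | 13.1 | 14.6 | 13.3 |
    | 12.2 | 12.5 | 13.7 | 14.3 |
    | 13.5 | 12.7 | 13.4 | 13.8 |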
    $$\left[ \Sigma \Sigma y _ { i j } = 265.1 , \quad \Sigma \Sigma y _ { i j } ^ { 2 } = 3524.31 . \right]$$ Construct the usual one-way analysis of variance table and carry out the usual test, at the 5\% significance level. Report briefly on your conclusions.
OCR MEI S2 2009 January Q4
17 marks Standard +0.3
4 A gardening research organisation is running a trial to examine the growth and the size of flowers of various plants.
  1. In the trial, seeds of three types of plant are sown. The growth of each plant is classified as good, average or poor. The results are shown in the table.
    | Type of plant | Good growth | Average growth | Poor growth | Row totals |
    |---|---|---|---|---|
    | Coriander | 12 | 28 | 15 | 55 |
    | Aster | 7 | 18 | 23 | 48 |
    | Fennel | 14 | 22 | 11 | 47 |
    | Column totals | 33 | 68 | 49 | 150 |
    Carry out a test at the \(5 \%\) significance level to examine whether there is any association between growth and type of plant. State carefully your null and alternative hypotheses. Include a table of the contributions of each cell to the test statistic.
  2. It is known that the diameter of marigold flowers is Normally distributed with mean 47 mm and standard deviation 8.5 mm . A certain fertiliser is expected to cause flowers to have a larger mean diameter, but without affecting the standard deviation. A large number of marigolds are grown using this fertiliser. The diameters of a random sample of 50 of the flowers are measured and the mean diameter is found to be 49.2 mm . Carry out a hypothesis test at the \(1 \%\) significance level to check whether flowers grown with this fertiliser appear to be larger on average. Use hypotheses \(\mathrm { H } _ { 0 } : \mu = 47 , \mathrm { H } _ { 1 } : \mu > 47\), where \(\mu \mathrm { mm }\) represents the mean diameter of all marigold flowers grown with this fertiliser.
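
The second part is a one-tailed test on a Normal mean with known standard deviation. A hedged sketch using Python's standard `statistics.NormalDist`; the variable names are assumptions chosen for illustration, and all numbers come from the question.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 47.0          # mean diameter under H0 (mm)
sigma = 8.5         # known standard deviation (mm)
n = 50              # sample size
sample_mean = 49.2  # observed sample mean (mm)
alpha = 0.01        # significance level, upper-tailed test

# Standardised test statistic for the sample mean.
z = (sample_mean - mu0) / (sigma / sqrt(n))
# Upper-tail critical value and p-value from the standard Normal distribution.
critical = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}, critical value = {critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if z > critical else "Do not reject H0")
```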
OCR MEI S2 2011 January Q4
18 marks Standard +0.3
4 A researcher is investigating the sizes of pebbles at various locations in a river. Three sites in the river are chosen and each pebble sampled at each site is classified as large, medium or small. The results are as follows.
| Pebble size | Site A | Site B | Site C | Row totals |
|---|---|---|---|---|
| Large | 15 | 12 | 10 | 37 |
| Medium | 28 | 17 | 45 | 90 |
| Small | 47 | 33 | 36 | 116 |
| Column totals | 90 | 62 | 91 | 243 |
  1. Carry out a test at the \(5 \%\) significance level to examine whether there is any association between pebble size and site. Your working should include a table of the contributions of each cell to the test statistic.
  2. By referring to each site, comment briefly on how the size of the pebbles compares with what would be expected if there were no association. You should support your answers by referring to your table of contributions.
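
Both parts rely on the table of cell contributions. One way to build it is sketched below in Python; the dictionary layout and the "more"/"fewer" labels are illustrative assumptions rather than anything prescribed by the question.

```python
sizes = ["Large", "Medium", "Small"]
sites = ["A", "B", "C"]
observed = [
    [15, 12, 10],
    [28, 17, 45],
    [47, 33, 36],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Map (size, site) -> (contribution to X^2, direction relative to expected).
contributions = {}
for i, size in enumerate(sizes):
    for j, site in enumerate(sites):
        expected = row_totals[i] * col_totals[j] / grand_total
        diff = observed[i][j] - expected
        direction = "more" if diff > 0 else "fewer"
        contributions[(size, site)] = (diff ** 2 / expected, direction)

test_statistic = sum(value for value, _ in contributions.values())
largest_cell = max(contributions, key=lambda cell: contributions[cell][0])

for (size, site), (value, direction) in contributions.items():
    print(f"{size:6s} site {site}: {value:.4f} ({direction} than expected)")
print(f"X^2 = {test_statistic:.3f}; largest contribution from {largest_cell}")
```

Recording the direction alongside each contribution makes the site-by-site commentary in the second part easier to support.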
OCR S3 2016 June Q2
7 marks Standard +0.3
2 A random sample of 200 American voters were asked about which political party they supported and their attitude to a proposed new form of taxation. The voters' responses are summarised in the table.
| Party | Attitude: In favour | Attitude: Neutral | Attitude: Against |
|---|---|---|---|
| Democrat | 58 | 16 | 16 |
| Independent | 25 | 4 | 11 |
| Republican | 17 | 20 | 33 |
Carry out a \(\chi ^ { 2 }\) test, at the \(1 \%\) level of significance, to investigate whether there is an association between party supported and attitude to the proposed form of taxation.
OCR MEI S4 Q4
12 marks Standard +0.8
4 An experiment is carried out to compare five industrial paints, A, B, C, D, E, that are intended to be used to protect exterior surfaces in polluted urban environments. Five different types of surface (I, II, III, IV, V) are to be used in the experiment, and five specimens of each type of surface are available. Five different external locations ( \(1,2,3,4,5\) ) are used in the experiment. The paints are applied to the specimens of the surfaces which are then left in the locations for a period of six months. At the end of this period, a "score" is given to indicate how effective the paint has been in protecting the surface.
  1. Name a suitable experimental design for this trial and give an example of an experimental layout.
Initial analysis of the data indicates that any differences between the types of surface are negligible, as also are any differences between the locations. It is therefore decided to analyse the data by one-way analysis of variance.
  2. State the usual model, including the accompanying distributional assumptions, for the one-way analysis of variance. Interpret the terms in the model.
  3. The data for analysis are as follows. Higher scores indicate better performance. The underlying distributions of strengths are assumed to be Normal for both suppliers, with variances 2.45 for supplier A and 1.40 for supplier B.
  4. Test at the \(5 \%\) level of significance whether it is reasonable to assume that the mean strengths from the two suppliers are equal.
  5. Provide a two-sided 90\% confidence interval for the true mean difference.
  6. Show that the test procedure used in part (i), with samples of sizes 7 and 5 and a \(5 \%\) significance level, leads to acceptance of the null hypothesis of equal means if \(- 1.556 < \bar { x } - \bar { y } < 1.556\), where \(\bar { x }\) and \(\bar { y }\) are the observed sample means from suppliers A and B . Hence find the probability of a Type II error for this test procedure if in fact the true mean strength from supplier A is 2.0 units more than that from supplier B.
  7. A manager suggests that the Wilcoxon rank sum test should be used instead, comparing the median strengths for the samples of sizes 7 and 5 . Give one reason why this suggestion might be sensible and two why it might not.
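
The acceptance region and Type II error probability in the supplier parts can be checked numerically. The sketch below uses Python's standard `statistics.NormalDist`, with the variances, sample sizes and true difference taken from the question; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

var_a, var_b = 2.45, 1.40  # known variances for suppliers A and B
n_a, n_b = 7, 5            # sample sizes
alpha = 0.05               # two-sided significance level
true_difference = 2.0      # true mean of A minus true mean of B

# Standard error of the difference in sample means.
standard_error = sqrt(var_a / n_a + var_b / n_b)
# H0 is accepted when |x_bar - y_bar| is below this bound.
bound = NormalDist().inv_cdf(1 - alpha / 2) * standard_error

# Distribution of x_bar - y_bar when the true difference is 2.0.
difference_dist = NormalDist(mu=true_difference, sigma=standard_error)
type_ii_probability = difference_dist.cdf(bound) - difference_dist.cdf(-bound)

print(f"Accept H0 if |x_bar - y_bar| < {bound:.3f}")
print(f"P(Type II error) = {type_ii_probability:.4f}")
```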
CAIE FP2 2018 June Q8
8 marks Standard +0.3
8 A manufacturer produces three types of car: hatchbacks, saloons and estates. Each type of car is available in one of three colours: silver, blue and red. The manufacturer wants to know whether the popularity of the colour of the car is related to the type of car. A random sample of 300 cars chosen by customers gives the information summarised in the following table.
| Type of car | Colour: Silver | Colour: Blue | Colour: Red |
|---|---|---|---|
| Hatchback | 53 | 36 | 41 |
| Saloon | 29 | 40 | 31 |
| Estate | 28 | 24 | 18 |
Test at the \(10 \%\) significance level whether the colour of car chosen by customers is independent of the type of car.
Edexcel S3 Q6
14 marks Standard +0.3
6. A market researcher recorded the number of adverts for vehicles in each of three categories on ITV, Channel 4 and Channel 5 over a period of time. The results are shown in the table below.
| | ITV | Channel 4 | Channel 5 |
|---|---|---|---|
| Family Saloon | 69 | 35 | 28 |
| Sports Car | 20 | 28 | 18 |
| Off-road Vehicle | 12 | 22 | 8 |
  1. Stating your hypotheses clearly, test at the \(5 \%\) level of significance whether or not there is evidence of the proportion of adverts for each type of vehicle being dependent on the channel.
  2. Suggest a reason for your result in part (a).
Edexcel FS1 AS 2023 June Q2
6 marks Standard +0.3
  1. A bag contains a large number of balls, all of the same size and weight. The balls are coloured Red, Blue or Yellow.
Jasmine asks each child in a group of 150 children to close their eyes, select a ball from the bag and show it to her. The child then replaces the ball and repeats the process a second time. If both balls are the same colour the child receives a prize.
The results are given in the table below.
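| 2nd colour | 1st colour: Red | 1st colour: Blue | 1st colour: Yellow | Total |
|---|---|---|---|---|
| Red | 31 | 11 | 18 | 60 |
| Blue | 8 | 10 | 9 | 27 |
| Yellow | 21 | 9 | 33 | 63 |
| Total | 60 | 30 | 60 | 150 |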
Jasmine carries out a test, at the \(5 \%\) level of significance, to see whether or not the colour of the 2nd ball is independent of the colour of the 1st ball.
  1. Calculate the expected frequencies for the cases where both balls are the same colour.
The test statistic Jasmine obtained was 12.712 to three decimal places.
  2. Use this value to complete the test, stating the critical value and conclusion clearly.
  3. With reference to your calculations in part (a) and the nature of the experiment, give a plausible reason why Jasmine may have obtained her conclusion in part (b).
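
A short Python sketch of the arithmetic in this question, assuming the row and column totals shown in the table. The dictionary names are illustrative, and 9.488 is the tabulated upper 5% point of \(\chi^2\) with 4 degrees of freedom.

```python
# Totals from the table: rows are the 2nd colour, columns are the 1st colour.
second_colour_totals = {"Red": 60, "Blue": 27, "Yellow": 63}
first_colour_totals = {"Red": 60, "Blue": 30, "Yellow": 60}
grand_total = 150

# Observed counts where both balls are the same colour (table diagonal).
observed_same = {"Red": 31, "Blue": 10, "Yellow": 33}

# Expected frequency under independence: row total * column total / grand total.
expected_same = {
    colour: second_colour_totals[colour] * first_colour_totals[colour] / grand_total
    for colour in observed_same
}

TEST_STATISTIC = 12.712        # value quoted in the question
CRITICAL_5_PERCENT_DF4 = 9.488 # from chi-squared tables, 4 degrees of freedom
reject_h0 = TEST_STATISTIC > CRITICAL_5_PERCENT_DF4

for colour, expected in expected_same.items():
    print(f"{colour}: observed {observed_same[colour]}, expected {expected:.1f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Comparing each observed diagonal count with its expected value is the comparison the final part asks the candidate to interpret.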
OCR FS1 AS 2021 June Q3
12 marks Standard +0.3
3 At a cinema there are three film sessions each Saturday, "early", "middle" and "late". The numbers of the audience, in different age groups, at the three showings on a randomly chosen Saturday are given in Table 1.
| Question | Solution | Marks | AOs | Guidance |
|---|---|---|---|---|
| 3(a) | Expected frequency for Middle/25 to 60 is 4.4, which is \(< 5\) | B1*ft | 2.4 | Correctly obtain this \(F_E\), ft on addition errors |
| | so must combine cells | depB1 [2] | 3.5b | "\(< 5\)" explicit and correct deduction |
| 3(b) | Expected frequencies (Early, Middle, Late): 29.4, 23.1, 31.5 and 26.6, 20.9, 28.5 | B1 | 1.1 | Both, allow 28.4 for 28.5 |
| | Contributions (Early, Middle, Late): 0.9918, 0.4160, 2.2937 and 1.0962, 0.4598, 2.5351 | | | awrt 2.29, but allow 2.3; in range [2.53, 2.54] |
| 3(c) | \(\mathrm{H}_0\): no association between session and age group. \(\mathrm{H}_1\): some association | B1 | 1.1 | Both. Allow "independent" etc |
| | \(\Sigma X^2 = 7.793\) | B1 | 1.1 | Correct value of \(X^2\), awrt 7.79 (allow even if wrong in (b)) |
| | \(v = 2\), \(\chi^2(2)_{\text{crit}} = 5.991\) | B1 | 1.1 | Correct CV and comparison |
| | Reject \(\mathrm{H}_0\). | M1ft | 1.1 | Correct first conclusion, FT on their TS only |
| | Significant evidence of association between session attended and age group. | A1ft [5] | 2.2b | Contextualised, not too assertive |
| 3(d) | The two biggest contributions to \(\chi^2\) are both for the late session ... | M1ft | 1.1 | Refer to biggest contribution(s), FT on their answers to (b), needs "reject \(\mathrm{H}_0\)" |
| | ... when the proportion of younger people is higher, and of older people is lower, than the null hypothesis would suggest. | A1ft [2] | 2.4 | Full answer, referring to at least one cell (ignore comments on next highest cells) |