Chi-squared goodness of fit: Given ratios

A question is this type if and only if it tests whether observed frequencies match specified theoretical ratios or proportions.

16 questions · Standard +0.4

OCR S3 2014 June Q2
6 marks Standard +0.3
2 In a study of the inheritance of skin colouration in corn snakes, a researcher found 865 snakes with black and orange bodies, 320 snakes with black bodies, 335 snakes with orange bodies and 112 snakes with bodies of other colours. Theory predicts that snakes of these colours should occur in the ratios \(9 : 3 : 3 : 1\). Test, at the \(5 \%\) significance level, whether these experimental results are compatible with theory.
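
As a minimal sketch of the calculation this question asks for (not a model answer), the Python below turns the ratio 9:3:3:1 into expected frequencies and sums the (O - E)²/E contributions. The helper name `chi_squared_from_ratio` is our own, and the 5% critical value for 3 degrees of freedom (7.815) is hard-coded from standard chi-squared tables.

```python
def chi_squared_from_ratio(observed, ratio):
    """Return (expected frequencies, chi-squared statistic) for a given ratio."""
    total = sum(observed)
    parts = sum(ratio)
    expected = [total * r / parts for r in ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic


observed = [865, 320, 335, 112]   # snake counts from the question
ratio = [9, 3, 3, 1]              # ratio predicted by theory
expected, statistic = chi_squared_from_ratio(observed, ratio)

# 4 categories with fully specified ratios: 4 - 1 = 3 degrees of freedom.
CRITICAL_5_PERCENT_3_DF = 7.815
reject_h0 = statistic > CRITICAL_5_PERCENT_3_DF

print(expected, round(statistic, 3), reject_h0)
```

Under these assumptions the statistic is about 7.43, which is below 7.815, so the sketch does not reject the 9:3:3:1 model at the 5% level.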
OCR S3 2013 January Q6
7 marks Standard +0.3
6 A large population of plants consists of five species \(A , B , C , D\) and \(E\) in the proportions \(p _ { A } , p _ { B } , p _ { C } , p _ { D }\) and \(p _ { E }\) respectively. A random sample of 120 plants consisted of \(23,14,24,27\) and 32 of \(A , B , C , D\) and \(E\) respectively. Carry out a test at the \(10 \%\) significance level of the null hypothesis that the proportions are \(p _ { \mathrm { A } } = p _ { \mathrm { B } } = 0.15 , p _ { \mathrm { C } } = p _ { \mathrm { D } } = 0.25\) and \(p _ { \mathrm { E } } = 0.2\).
OCR MEI S3 2009 January Q4
18 marks Standard +0.3
4
  1. Explain the meaning of 'opportunity sampling'. Give one reason why it might be used and state one disadvantage of using it.
    A market researcher is conducting an 'on-street' survey in a busy city centre, for which he needs to stop and interview 100 people. For each interview the researcher counts the number of people he has to ask until one agrees to be interviewed. The data collected are as follows.
    | No. of people asked | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Frequency | 26 | 19 | 17 | 13 | 11 | 8 | 6 |
    A model for these data is proposed as follows, where \(p\) (assumed constant throughout) is the probability that a person asked agrees to be interviewed, and \(q = 1 - p\).
    | No. of people asked | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Probability | \(p\) | \(p q\) | \(p q ^ { 2 }\) | \(p q ^ { 3 }\) | \(p q ^ { 4 }\) | \(p q ^ { 5 }\) | \(q ^ { 6 }\) |
  2. Verify that these probabilities add to 1 whatever the value of \(p\).
  3. Initially it is thought that on average 1 in 4 people asked agree to be interviewed. Test at the \(10 \%\) level of significance whether it is reasonable to suppose that the model applies with \(p = 0.25\).
  4. Later an estimate of \(p\) obtained from the data is used in the analysis. The value of the test statistic (with no combining of cells) is found to be 9.124 . What is the outcome of this new test? Comment on your answer in relation to the outcome of the test in part (iii).
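
The model in this question can be checked numerically. The sketch below is one possible illustration, not a model answer: it builds the seven cell probabilities for \(p = 0.25\), confirms they sum to 1, and computes the test statistic. The function name `truncated_geometric_probs` is our own, and the critical values (10.645 for 6 degrees of freedom, 9.236 for 5) are hard-coded from standard 10% chi-squared tables.

```python
def truncated_geometric_probs(p, cells=7):
    """Cells 1..cells-1 have probability p*q**(k-1); the last cell is q**(cells-1)."""
    q = 1 - p
    probs = [p * q ** (k - 1) for k in range(1, cells)]
    probs.append(q ** (cells - 1))
    return probs


observed = [26, 19, 17, 13, 11, 8, 6]
probs = truncated_geometric_probs(0.25)
total_probability = sum(probs)

expected = [sum(observed) * pr for pr in probs]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CRITICAL_10_PERCENT_6_DF = 10.645  # p specified in advance: 7 - 1 = 6 df
CRITICAL_10_PERCENT_5_DF = 9.236   # p estimated from the data: 7 - 1 - 1 = 5 df

reject_with_p_quarter = statistic > CRITICAL_10_PERCENT_6_DF
reject_with_estimated_p = 9.124 > CRITICAL_10_PERCENT_5_DF  # 9.124 is quoted in the question

print(total_probability, round(statistic, 2))
print(reject_with_p_quarter, reject_with_estimated_p)
```

On this reading the fit with \(p = 0.25\) is rejected narrowly (statistic about 10.98), while the fit with the estimated \(p\) is not, mainly because the last cell contributes far less once \(p\) is chosen to match the data.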
OCR MEI S3 2016 June Q2
18 marks Standard +0.3
2
  1. A genetic model involving body colour and eye colour of fruit flies predicts that offspring will consist of four phenotypes in the ratio \(9 : 3 : 3 : 1\). A random sample of 200 such offspring is taken. Their phenotypes are found to be as follows.
    | Phenotype | Brown body Red eye | Brown body Brown eye | Black body Red eye | Black body Brown eye |
    | --- | --- | --- | --- | --- |
    | Frequency | 125 | 37 | 32 | 6 |
    | Relative proportion from model | 9 | 3 | 3 | 1 |
    Carry out a test, using a \(2.5 \%\) level of significance, of the goodness of fit of the genetic model to these data.
  2. The median length of European fruit flies is 2.5 mm. South American fruit flies are believed to be larger than European fruit flies. A random sample of 12 South American fruit flies is taken. The flies are found to have the following lengths (in mm).
    \(1.7 \quad 1.4 \quad 3.1 \quad 3.5 \quad 3.8 \quad 4.2 \quad 2.2 \quad 2.9 \quad 4.4 \quad 2.6 \quad 3.9 \quad 3.2\)
    Carry out a Wilcoxon signed rank test, using a \(5 \%\) level of significance, to test this belief.
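
For the Wilcoxon part, the rank sums can be sketched as follows. This is an illustrative calculation only: the variable names are our own, and it relies on the fact that this particular sample has no tied or zero differences, so simple ranks 1 to 12 are enough.

```python
lengths = [1.7, 1.4, 3.1, 3.5, 3.8, 4.2, 2.2, 2.9, 4.4, 2.6, 3.9, 3.2]
hypothesised_median = 2.5

# Round to 1 d.p. so floating-point noise cannot disturb the ranking.
differences = [round(x - hypothesised_median, 1) for x in lengths]

# Order the differences by absolute size; position in this list is the rank.
ordered = sorted(differences, key=abs)

w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)

print(w_minus, w_plus)
```

The smaller rank sum would then be compared with the tabulated one-tailed 5% critical value for a sample of size 12.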
CAIE FP2 2009 June Q9
9 marks Standard +0.3
9 The proportions of blood types \(\mathrm { A } , \mathrm { B } , \mathrm { AB }\) and O in the Australian population are \(38 \% , 10 \% , 3 \%\) and \(49 \%\) respectively. In order to test whether the population in Sydney conforms to these figures, a random sample of 200 residents is selected. The table shows the observed frequencies of these types in the sample.
| Blood Type | A | B | AB | O |
| --- | --- | --- | --- | --- |
| Frequency | 57 | 24 | 9 | 110 |
Carry out a suitable test at the 5\% significance level. Find the smallest sample size that could be used for the test.
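
A hedged sketch of both parts of this question follows. The statistic uses the stated population percentages; the sample-size part assumes the usual convention that every expected frequency should be at least 5, which the question does not state explicitly. The 5% critical value for 3 degrees of freedom (7.815) is hard-coded from standard tables.

```python
import math

proportions = {"A": 0.38, "B": 0.10, "AB": 0.03, "O": 0.49}
observed = {"A": 57, "B": 24, "AB": 9, "O": 110}

n = sum(observed.values())
expected = {t: n * p for t, p in proportions.items()}
statistic = sum((observed[t] - expected[t]) ** 2 / expected[t] for t in observed)

CRITICAL_5_PERCENT_3_DF = 7.815
reject_h0 = statistic > CRITICAL_5_PERCENT_3_DF

# Assumed convention: the rarest category must still have expected count >= 5.
MIN_EXPECTED = 5
smallest_n = math.ceil(MIN_EXPECTED / min(proportions.values()))

print(round(statistic, 3), reject_h0, smallest_n)
```

With these assumptions the statistic is about 8.52, which exceeds 7.815, and the rarest type (AB at 3%) forces a sample of at least 167.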
CAIE FP2 2010 June Q7
8 marks Challenging +1.2
7 Benford's Law states that, in many tables containing large numbers of numerical values, the probability distribution of the leading non-zero digit \(D\) is given by $$\mathrm { P } ( D = d ) = \log _ { 10 } \left( \frac { d + 1 } { d } \right) , \quad d = 1,2 , \ldots , 9 .$$ The following table shows a summary of a random sample of 100 non-zero leading digits taken from a table of cumulative probabilities for the Poisson distribution.
| Leading digit | 1 | 2 | 3 | 4 | 5 | \(\geqslant 6\) |
| --- | --- | --- | --- | --- | --- | --- |
| Frequency | 22 | 21 | 13 | 11 | 11 | 22 |
Carry out a suitable goodness of fit test at the 10\% significance level.
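
The Benford probabilities and the merged final cell can be sketched as follows. This is an illustration under stated assumptions rather than a model answer: the function name `benford` is our own, and the 10% critical value for 5 degrees of freedom (9.236) is hard-coded from standard tables.

```python
import math


def benford(d):
    """P(D = d) = log10((d + 1) / d) for d = 1, ..., 9."""
    return math.log10((d + 1) / d)


# Digits 1 to 5 individually, then digits 6 to 9 merged to match the table.
probs = [benford(d) for d in range(1, 6)]
probs.append(sum(benford(d) for d in range(6, 10)))

observed = [22, 21, 13, 11, 11, 22]
expected = [sum(observed) * pr for pr in probs]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 6 cells, no parameters estimated: 6 - 1 = 5 degrees of freedom.
CRITICAL_10_PERCENT_5_DF = 9.236
reject_h0 = statistic > CRITICAL_10_PERCENT_5_DF

print([round(e, 2) for e in expected], round(statistic, 2), reject_h0)
```

The merged cell has probability \(\log_{10}(10/6)\), because the logarithms for digits 6 to 9 telescope.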
OCR Further Statistics AS 2019 June Q7
9 marks Standard +0.3
7 In a standard model from genetic theory, the ratios of types \(a , b , c\) and \(d\) of a characteristic from a genetic cross are predicted to be 9:3:3:1. Andrei collects 120 specimens from such a cross, and the numbers corresponding to each type of the characteristic are given in the table.
| Type | \(a\) | \(b\) | \(c\) | \(d\) |
| --- | --- | --- | --- | --- |
| Frequency | 51 | 33 | 30 | 6 |
Andrei tests, at the 1\% significance level, whether the observed frequencies are consistent with the standard model.
  1. State appropriate hypotheses for the test.
  2. Carry out the test.
  3. State with a reason which one of the frequencies is least consistent with the standard model.
  4. Suggest a different, improved model by changing exactly two of the ratio values.
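
One way to approach the test and the "least consistent" part is to look at each cell's contribution to the statistic. The sketch below does this; treating the largest contribution as the least consistent frequency is our reading of the question, and the 1% critical value for 3 degrees of freedom (11.345) is hard-coded from standard tables.

```python
types = ["a", "b", "c", "d"]
observed = [51, 33, 30, 6]
ratio = [9, 3, 3, 1]

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]

# Contribution of each type to the chi-squared statistic.
contributions = {
    t: (o - e) ** 2 / e for t, o, e in zip(types, observed, expected)
}
statistic = sum(contributions.values())

CRITICAL_1_PERCENT_3_DF = 11.345
reject_h0 = statistic > CRITICAL_1_PERCENT_3_DF
largest = max(contributions, key=contributions.get)

print(expected, round(statistic, 3), reject_h0, largest)
```

Seeing which cells are over- and under-represented relative to the expected values also hints at which two ratio values one might adjust in the final part.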
OCR Further Statistics AS 2023 June Q6
12 marks Standard +0.3
6 A machine is used to toss a coin repeatedly. Rosa believes that the outcome of each toss made by the machine is not independent of the previous toss. Rosa gets the machine to toss a coin 6 times and record the number of heads, \(X\), obtained. After recording the number of heads obtained, Rosa resets the machine and gets it to toss the coin 6 more times. Rosa again records the number of heads obtained and she repeats this procedure until she has recorded 88 independent values of \(X\).
  1. The sample mean and sample variance of \(X\) are 3.35 and 3.392 respectively. Explain what these results suggest about the validity of a binomial model \(\mathrm { B } ( 6 , p )\) for the data.
    Rosa uses a computer spreadsheet to work out the probabilities for a more sophisticated model in which the outcome of each toss is dependent on the outcome of the previous toss. Her model suggests that the probabilities \(\mathrm { P } ( X = x )\), for \(x = 0,1,2,3,4,5,6\), are approximately in the ratio \(5 : 6 : 7 : 8 : 7 : 6 : 5\). She carries out a \(\chi ^ { 2 }\) test to investigate whether this model is a good fit for the data. The following table shows the full results of the experiments, together with some of the calculations needed for the test.
    | \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | Total |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Observed frequency | 7 | 10 | 16 | 15 | 15 | 11 | 14 | 88 |
    | Expected frequency | | | | | | | | |
    | Contribution to \(\chi ^ { 2 }\) statistic | 0.9 | 0.3333 | 0.2857 | 0.0625 | 0.0714 | | | |
  2. In the Printed Answer Booklet, complete the table.
  3. Carry out the test, using a 10\% significance level.
  4. Rosa says that the results definitely show that one of the two proposed models is correct. Comment on this statement.
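
The table in this question can be completed with a few lines of Python. The sketch below is illustrative only: it first compares the sample variance with what a binomial model with the same mean would give, then fills in the expected frequencies and contributions for the ratio model. The 10% critical value for 6 degrees of freedom (10.645) is hard-coded from standard tables.

```python
# Binomial check: B(6, p) with mean 3.35 would have variance 6p(1 - p).
p_hat = 3.35 / 6
binomial_variance = 6 * p_hat * (1 - p_hat)   # compare with the observed 3.392

observed = [7, 10, 16, 15, 15, 11, 14]
ratio = [5, 6, 7, 8, 7, 6, 5]

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)

# Rounded to 4 d.p. for comparison with the entries printed in the table.
rounded = [round(c, 4) for c in contributions]

CRITICAL_10_PERCENT_6_DF = 10.645  # ratios fixed by the model: 7 - 1 = 6 df
reject_h0 = statistic > CRITICAL_10_PERCENT_6_DF

print(round(binomial_variance, 3), expected, rounded, round(statistic, 3), reject_h0)
```

The first five rounded contributions agree with the values printed in the question, which is a useful check that the expected frequencies are right.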
OCR Further Statistics 2022 June Q9
10 marks Challenging +1.2
9 The head teacher of a school believes that, on average, pupil absences on the days Monday, Tuesday, Wednesday, Thursday and Friday are in the ratio \(3 : 2 : 2 : 2 : 3\). The head teacher takes a random sample of 120 pupil absences. The results are as follows.
| Day of week | Monday | Tuesday | Wednesday | Thursday | Friday |
| --- | --- | --- | --- | --- | --- |
| Number of absences | 28 | 16 | 24 | 16 | 36 |
  1. Test at the \(5 \%\) significance level whether these results are consistent with the head teacher's belief.
    A significance test at the \(5 \%\) level is also carried out on a second, independent, random sample of \(n\) pupil absences. All the numbers of absences are integers. The ratio of the numbers of absences for each day in this sample is identical to the ratio of the numbers of absences for each day in the original sample of size 120.
  2. Determine the smallest value of \(n\) for which the conclusion of this significance test is that the data are not consistent with the head teacher's belief.
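
Both parts can be explored with a short search. The sketch below is one possible approach, not a model answer: it computes the statistic for the original sample, reduces the observed counts to their smallest integer pattern, and scales that pattern up until the 5% critical value for 4 degrees of freedom (9.488, hard-coded from standard tables) is exceeded.

```python
from functools import reduce
from math import gcd

observed = [28, 16, 24, 16, 36]
ratio = [3, 2, 2, 2, 3]
CRITICAL_5_PERCENT_4_DF = 9.488


def chi_squared(obs, ratio):
    """Chi-squared statistic for counts `obs` against the given ratio."""
    n, parts = sum(obs), sum(ratio)
    expected = [n * r / parts for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(obs, expected))


statistic_120 = chi_squared(observed, ratio)

# Smallest integer counts with the same day-to-day ratio as the sample.
g = reduce(gcd, observed)
pattern = [o // g for o in observed]
step = sum(pattern)   # any valid n must be a multiple of this

# Scale the pattern up until the test first rejects the belief.
k = 1
while chi_squared([k * c for c in pattern], ratio) <= CRITICAL_5_PERCENT_4_DF:
    k += 1
smallest_n = k * step

print(round(statistic_120, 3), pattern, smallest_n)
```

The search works because, for a fixed pattern of counts, the statistic grows in proportion to the sample size.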
OCR Further Statistics 2020 November Q7
10 marks Standard +0.3
7 A biased spinner has five sides, numbered 1 to 5. Elmer spins the spinner repeatedly and counts the number of spins, \(X\), up to and including the first time that the number 2 appears. He carries out this experiment 100 times and records the frequency \(f\) with which each value of \(X\) is obtained. His results are shown in Table 1, together with the values of \(x f\).
| \(x\) | 1 | 2 | 3 | 4 | 5 | 6 | \(\geqslant 7\) | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Frequency \(f\) | 20 | 15 | 9 | 13 | 10 | 10 | 23 | 100 |
| \(x f\) | 20 | 30 | 27 | 52 | 50 | 60 | 161 | 400 |
Table 1
  1. State an appropriate distribution with which to model \(X\), determining the value(s) of any parameter(s).
    Elmer carries out a goodness-of-fit test, at the \(5 \%\) level, for the distribution in part (a). Table 2 gives some of his calculations, in which numbers that are not exact have been rounded to 3 decimal places.
    | \(x\) | 1 | 2 | 3 | 4 | 5 | 6 | \(\geqslant 7\) |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Observed frequency \(O\) | 20 | 15 | 9 | 13 | 10 | 10 | 23 |
    | Expected frequency \(E\) | 25 | 18.75 | 14.063 | 10.547 | 7.910 | 5.933 | 17.798 |
    | \(( O - E ) ^ { 2 } / E\) | 1 | 0.75 | 1.823 | 0.571 | 0.552 | 2.789 | 1.520 |
    Table 2
  2. Show how the expected frequency corresponding to \(x \geqslant 7\) was obtained.
  3. Carry out the test.
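
The figures in Table 2 can be reproduced with the sketch below. It assumes a geometric model whose parameter is estimated from the sample mean (400/100 = 4, so \(p = 0.25\)), and it assumes that estimating \(p\) costs one degree of freedom, giving 5. The 5% critical value for 5 degrees of freedom (11.070) is hard-coded from standard tables.

```python
observed = [20, 15, 9, 13, 10, 10, 23]
n = 100
sum_xf = 400

# A geometric distribution has mean 1/p, so estimate p as n / sum(xf).
p = n / sum_xf
q = 1 - p

# Cells x = 1..6 use p*q**(x-1); the final cell is P(X >= 7) = q**6.
probs = [p * q ** (x - 1) for x in range(1, 7)] + [q ** 6]
expected = [n * pr for pr in probs]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Assumption: one parameter estimated from the data, so df = 7 - 1 - 1 = 5.
CRITICAL_5_PERCENT_5_DF = 11.070
reject_h0 = statistic > CRITICAL_5_PERCENT_5_DF

print([round(e, 3) for e in expected], round(statistic, 3), reject_h0)
```

The last expected frequency comes from \(100 \times 0.75^{6}\), since \(X \geqslant 7\) means the first six spins all fail to show a 2.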
Edexcel S3 2018 June Q4
9 marks Standard +0.3
4. A company selects a random sample of five of its warehouses. The table below summarises the number of employees, in thousands, at each warehouse and the number of reported first aid incidents at each warehouse during 2017.
| Warehouse | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) |
| --- | --- | --- | --- | --- | --- |
| Number of employees (in thousands) | 2 | 1 | 3.8 | 3 | 2.2 |
| Number of reported first aid incidents | 15 | 10 | 40 | 26 | 23 |
The personnel manager claims that the mean number of reported first aid incidents per 1000 employees is the same at each of the company's warehouses.
  1. Stating your hypotheses clearly, use a \(5 \%\) level of significance to test the manager's claim.
    Jean, the safety officer at warehouse \(C\), kept a record of each reported first aid incident at warehouse \(C\) in 2017. Jean wishes to select a systematic sample of 10 records from warehouse \(C\).
  2. Explain, in detail, how Jean should obtain such a sample.
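
For the first part, the key step is that equal incident rates per 1000 employees imply expected incidents proportional to workforce size. The sketch below illustrates this, taking the employee numbers as 2, 1, 3.8, 3 and 2.2 thousand for warehouses A to E (our reading of the table) and using the 5% critical value for 4 degrees of freedom (9.488) from standard tables.

```python
employees_thousands = [2, 1, 3.8, 3, 2.2]   # warehouses A to E (assumed reading)
incidents = [15, 10, 40, 26, 23]

total_incidents = sum(incidents)
total_employees = sum(employees_thousands)

# Under H0 the incident rate per 1000 employees is the same everywhere,
# so expected incidents are proportional to the number of employees.
expected = [total_incidents * e / total_employees for e in employees_thousands]
statistic = sum((o - x) ** 2 / x for o, x in zip(incidents, expected))

CRITICAL_5_PERCENT_4_DF = 9.488
reject_h0 = statistic > CRITICAL_5_PERCENT_4_DF

print([round(x, 1) for x in expected], round(statistic, 3), reject_h0)
```

Note that the expected frequencies here are not equal, even though the claim is about equal rates.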
Edexcel S3 2024 June Q4
11 marks Standard +0.3
4. The manager of a company making ice cream believes that the proportions of people in the population who prefer vanilla, chocolate, strawberry and other are in the ratio \(10 : 5 : 2 : 3\).
The manager takes a random sample of 400 customers and records their age and favourite ice cream flavour. The results are shown in the table below.
| Age / Ice cream flavour | Vanilla | Chocolate | Strawberry | Other | Total |
| --- | --- | --- | --- | --- | --- |
| Child | 95 | 25 | 13 | 25 | 158 |
| Teenager | 57 | 20 | 17 | 36 | 130 |
| Adult | 36 | 50 | 10 | 16 | 112 |
| Total | 188 | 95 | 40 | 77 | 400 |
  1. Use the data in the table to test, at the \(5 \%\) level of significance, the manager's belief. You should state your hypotheses, test statistic, critical value and conclusion clearly.
    A researcher wants to investigate whether or not there is a relationship between the age of a customer and their favourite ice cream flavour. In order to test whether favourite ice cream flavour and age are related, the researcher plans to carry out a \(\chi ^ { 2 }\) test.
  2. Use the table to calculate expected frequencies for the group
    1. teenagers whose favourite ice cream flavour is vanilla,
    2. adults whose favourite ice cream flavour is chocolate.
  3. Write down the number of degrees of freedom for this \(\chi ^ { 2 }\) test.
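
This question mixes a goodness of fit test (on the column totals) with a test of independence (on the full table). The sketch below illustrates both calculations; the helper `expected_cell` is our own name, and the 5% critical value for 3 degrees of freedom (7.815) is hard-coded from standard tables.

```python
table = {
    "Child":    [95, 25, 13, 25],
    "Teenager": [57, 20, 17, 36],
    "Adult":    [36, 50, 10, 16],
}
flavours = ["Vanilla", "Chocolate", "Strawberry", "Other"]

column_totals = [sum(row[j] for row in table.values()) for j in range(len(flavours))]
row_totals = {age: sum(row) for age, row in table.items()}
grand_total = sum(column_totals)

# Goodness of fit of the flavour totals to the ratio 10:5:2:3.
ratio = [10, 5, 2, 3]
gof_expected = [grand_total * r / sum(ratio) for r in ratio]
gof_statistic = sum((o - e) ** 2 / e for o, e in zip(column_totals, gof_expected))
CRITICAL_5_PERCENT_3_DF = 7.815
reject_ratio = gof_statistic > CRITICAL_5_PERCENT_3_DF


def expected_cell(age, flavour):
    """Independence test: row total * column total / grand total."""
    return row_totals[age] * column_totals[flavours.index(flavour)] / grand_total


teen_vanilla = expected_cell("Teenager", "Vanilla")
adult_chocolate = expected_cell("Adult", "Chocolate")
degrees_of_freedom = (len(table) - 1) * (len(flavours) - 1)

print(round(gof_statistic, 3), reject_ratio)
print(teen_vanilla, adult_chocolate, degrees_of_freedom)
```

The two tests use different expected frequencies: the first comes from the manager's ratio, the second from the table's own margins.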
Edexcel S3 2021 October Q2
8 marks Standard +0.3
2. Andy has some apple trees. Over many years she has graded each apple from her trees as \(A , B , C , D\) or \(E\) according to the quality of the apple, with \(A\) being the highest quality and \(E\) being the lowest quality. She knows that the proportion of apples in each grade produced by her trees is as follows.
| Grade | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) |
| --- | --- | --- | --- | --- | --- |
| Proportion | \(4 \%\) | \(28 \%\) | \(52 \%\) | \(10 \%\) | \(6 \%\) |
Raj advises Andy to add potassium to the soil around her apple trees. Andy believes that adding potassium will not affect the distribution of grades for the quality of the apples. To test her belief Andy adds potassium to the soil around her apple trees. The following year she counts the number of apples in each grade. The number of apples in each grade is shown in the table below.
| Grade | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) |
| --- | --- | --- | --- | --- | --- |
| Frequency | 9 | 71 | 136 | 21 | 3 |
Test Andy's belief using a \(5 \%\) level of significance. Show your working clearly, stating your hypotheses, expected frequencies and degrees of freedom.
Edexcel FS1 AS 2019 June Q2
7 marks Standard +0.3
2. A spinner used for a game is designed to give scores with the following probabilities.
| Score | 1 | 2 | 3 | 4 | 6 |
| --- | --- | --- | --- | --- | --- |
| Probability | \(\frac { 3 } { 10 }\) | \(\frac { 1 } { 10 }\) | \(\frac { 1 } { 10 }\) | \(\frac { 2 } { 5 }\) | \(\frac { 1 } { 10 }\) |
The spinner is spun 80 times and the results are as follows.
| Score | 1 | 2 | 3 | 4 | 6 |
| --- | --- | --- | --- | --- | --- |
| Frequency | 15 | 4 | 12 | 41 | 8 |
Test, at the \(10 \%\) level of significance, whether or not the spinner is giving scores as it is designed to do. Show your working and state your hypotheses clearly.
OCR Further Statistics 2018 September Q5
8 marks Standard +0.3
5 Hal designs a 4-edged spinner with edges labelled 1, 2, 3 and 4. He intends that the probability that the spinner will land on any edge should be proportional to the number on that edge. He spins the spinner 20 times and on each spin he records the number of the edge on which it lands. The results are shown in the table.
| Edge number | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Frequency | 3 | 7 | 4 | 6 |
Test at the \(10 \%\) significance level whether the results are consistent with the intended probabilities.
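
With only 20 spins some expected frequencies are small, so this question is a natural place to illustrate combining cells. The sketch below assumes the usual convention that expected frequencies should be at least 5 and merges edges 1 and 2; other groupings are possible. The 10% critical value for 2 degrees of freedom (4.605) is hard-coded from standard tables.

```python
observed = [3, 7, 4, 6]
edges = [1, 2, 3, 4]

n = sum(observed)
# Probabilities proportional to the edge number: 1/10, 2/10, 3/10, 4/10.
expected = [n * e / sum(edges) for e in edges]

# Assumed convention: merge cells so that every expected value is >= 5.
# Here edges 1 and 2 are combined (expected 2 + 4 = 6).
obs_merged = [observed[0] + observed[1], observed[2], observed[3]]
exp_merged = [expected[0] + expected[1], expected[2], expected[3]]

statistic = sum((o - e) ** 2 / e for o, e in zip(obs_merged, exp_merged))

CRITICAL_10_PERCENT_2_DF = 4.605  # 3 cells after merging: 3 - 1 = 2 df
reject_h0 = statistic > CRITICAL_10_PERCENT_2_DF

print(expected, round(statistic, 3), reject_h0)
```

Merging reduces the number of cells from 4 to 3, which is why the degrees of freedom drop to 2.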
OCR Further Statistics 2018 December Q7
12 marks Standard +0.8
7 Sasha tends to forget his passwords. He investigates whether the number of attempts he needs to log on to a system with a password can be modelled by a geometric distribution. On 60 occasions he records the number of attempts he needs to log on, and the results are shown in the table.
Number of attempts1234 or more
Frequency2019133
  1. Test at the \(1 \%\) significance level whether the results are consistent with the distribution Geo(0.4).
  2. Suggest which two probabilities should be changed, and in what way, to produce an improved model. (Numerical values are not required.) You should give a reason for your suggestion. [3]