Chi-squared goodness of fit: Binomial

A question belongs to this type if and only if it tests whether observed frequency data fit a binomial distribution, possibly with a parameter estimated from the data.

41 questions · Standard +0.4

5.06b Fit prescribed distribution: chi-squared test
WJEC Further Unit 2 2022 June Q5
11 marks Standard +0.3
5. John has a game that involves throwing a set of three identical, cubical dice with faces numbered 1 to 6. He wishes to investigate whether these dice are fair in terms of the number of sixes obtained when they are thrown. John throws the set of three dice 1100 times and records the number of sixes obtained for each throw. The results are shown in the table below.
Number of sixes | 0 | 1 | 2 | 3
Frequency | 625 | 384 | 81 | 10
Using these results, conduct a goodness of fit test and draw an appropriate conclusion.
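The calculation behind a test like this can be sketched in a few lines of Python. This is a minimal illustration, not a model answer: it assumes the null model is B(3, 1/6) for fair dice, reads the frequencies as 625, 384, 81 and 10 (which sum to 1100), and uses the hypothetical helper names `binomial_expected` and `chi_squared_statistic`.

```python
from math import comb

def binomial_expected(n, p, total):
    """Expected frequencies total * P(X = k) for X ~ B(n, p), k = 0..n."""
    return [total * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed numbers of throws showing 0, 1, 2, 3 sixes (assumed reading of the table).
observed = [625, 384, 81, 10]

# Fair dice: each die shows a six with probability 1/6, so X ~ B(3, 1/6).
expected = binomial_expected(3, 1 / 6, sum(observed))
statistic = chi_squared_statistic(observed, expected)

print([round(e, 2) for e in expected])
print(round(statistic, 2))
```

Every expected frequency here is above 5, so no cells need combining, and with no parameter estimated the degrees of freedom would be 4 - 1 = 3. The statistic comes out at about 5.23; if a 5% level is assumed, it would be compared with the tabulated critical value 7.815.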
Edexcel FS1 AS 2021 June Q1
10 marks Standard +0.3
  1. Flobee sells tomato seeds in packets, each containing 40 seeds. Flobee advertises that only 4\% of its tomato seeds do not germinate.
Amodita is investigating the germination of Flobee's tomato seeds. She plants 125 packets of Flobee's tomato seeds and records the number of seeds that do not germinate in each packet.
Number of seeds that do not germinate | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more
Frequency | 15 | 35 | 38 | 22 | 10 | 5 | 0
Amodita wants to test whether the binomial distribution \(\mathrm { B } ( 40,0.04 )\) is a suitable model for these data. The table below shows the expected frequencies, to 2 decimal places, using this model.
Number of seeds that do not germinate | 0 | 1 | 2 | 3 | 4 | 5 or more
Expected Frequency | 24.42 | 40.70 | \(r\) | 17.45 | 6.73 | \(s\)
  1. Calculate the value of \(r\) and the value of \(s\)
  2. Stating your hypotheses clearly, carry out the test at the \(5 \%\) level of significance. You should state the number of degrees of freedom, critical value and conclusion clearly.
Amodita believes that Flobee should use a more realistic value for the percentage of their tomato seeds that do not germinate. She decides to test the data using a new model \(\mathrm { B } ( 40 , p )\).
  3. Showing your working, suggest a more realistic value for \(p\)
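One common way to obtain a data-based value of \(p\) for a \(\mathrm{B}(n, p)\) model is to divide the total number of "successes" by the total number of trials. The sketch below assumes that estimator is what is wanted, reads the observed frequencies as 15, 35, 38, 22, 10, 5 and 0, and uses the hypothetical helper name `estimate_p`.

```python
def estimate_p(frequencies, n):
    """Estimate p as (total successes) / (total trials) for counts 0, 1, 2, ..."""
    successes = sum(k * f for k, f in enumerate(frequencies))
    trials = n * sum(frequencies)
    return successes / trials

# Frequencies for 0, 1, ..., 5 non-germinating seeds per packet.
# The "6 or more" cell has frequency 0, so it contributes nothing (assumed reading).
frequencies = [15, 35, 38, 22, 10, 5]

p_hat = estimate_p(frequencies, n=40)   # 40 seeds in each of the 125 packets
print(round(p_hat, 4))
```

Estimating \(p\) from the data in this way costs one further degree of freedom in any subsequent goodness of fit test.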
Edexcel FS1 AS 2022 June Q3
9 marks Standard +0.8
  1. In a game, a coin is spun 5 times and the number of heads obtained is recorded. Tao suggests playing the game 20 times and carrying out a chi-squared test to investigate whether the coin might be biased.
    1. Explain why playing the game only 20 times may cause problems when carrying out the test.
    Chris decides to play the game 500 times. The results are as follows
    Number of heads | 0 | 1 | 2 | 3 | 4 | 5
    Observed frequency | 2 | 27 | 93 | 181 | 146 | 51
    Chris decides to test whether or not the data can be modelled by a binomial distribution, with the probability of a head on each spin being 0.6. She calculates the expected frequencies, to 2 decimal places, as follows.
    Number of heads | 0 | 1 | 2 | 3 | 4 | 5
    Expected frequency | 5.12 | 38.40 | 115.20 | 172.80 | 129.60 | 38.88
  2. State the number of degrees of freedom in Chris' test, giving a reason for your answer.
  3. Carry out the test at the \(5 \%\) level of significance. You should state your hypotheses, test statistic, critical value and conclusion clearly.
  4. Showing your working, find an alternative model which would better fit Chris' data.
Edexcel FS1 AS 2024 June Q4
15 marks Standard +0.3
  1. Robin shoots 8 arrows at a target each day for 100 days.
The number of times he hits the target each day is summarised in the table below.
Number of hits | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Frequency | 1 | 10 | 30 | 34 | 17 | 4 | 2 | 0 | 2
Misha believes that these data can be modelled by a binomial distribution.
  1. State, in context, two assumptions that are implied by the use of this model.
  2. Find an estimate for the proportion of arrows Robin shoots that hit the target.
Misha calculates expected frequencies, to 2 decimal places, as follows.
    Number of hits | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
    Expected frequency | 2.81 | 12.67 | \(r\) | 28.05 | 19.73 | \(s\) | 2.50 | 0.40 | 0.03
  3. Find the value of \(r\) and the value of \(s\).
Misha correctly used a suitable test to assess her belief.
    1. Explain why she used a test with 3 degrees of freedom.
    2. Complete the test using a \(5 \%\) level of significance. You should clearly state your hypotheses, test statistic, critical value and conclusion.
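The link between small expected frequencies and the degrees of freedom can be illustrated with a short sketch. It assumes the usual convention of combining end cells until every expected frequency is at least 5, reads the observed frequencies as 1, 10, 30, 34, 17, 4, 2, 0 and 2, and uses the hypothetical helper name `pool_tails`. It only pools the two tails, which is enough for these data but not a general-purpose routine.

```python
from math import comb

def pool_tails(observed, expected, minimum=5.0):
    """Merge cells inward from each end until both end cells have E >= minimum."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[0] < minimum:      # pool the lower tail
        obs[1] += obs[0]
        exp[1] += exp[0]
        del obs[0], exp[0]
    while len(exp) > 1 and exp[-1] < minimum:     # pool the upper tail
        obs[-2] += obs[-1]
        exp[-2] += exp[-1]
        del obs[-1], exp[-1]
    return obs, exp

observed = [1, 10, 30, 34, 17, 4, 2, 0, 2]        # hits 0..8 over 100 days
n, days = 8, sum(observed)

# Estimate p as total hits / total arrows, then build B(8, p_hat) expected frequencies.
p_hat = sum(k * f for k, f in enumerate(observed)) / (n * days)
expected = [days * comb(n, k) * p_hat**k * (1 - p_hat)**(n - k) for k in range(n + 1)]

pooled_obs, pooled_exp = pool_tails(observed, expected)

# One constraint for the fixed total and one for the estimated parameter p.
df = len(pooled_exp) - 1 - 1
print(pooled_obs, [round(e, 2) for e in pooled_exp], df)
```

Under these assumptions the nine original cells collapse to five (0 or 1, 2, 3, 4, 5 or more), which is consistent with the 3 degrees of freedom mentioned in the question.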
Edexcel FS1 2019 June Q4
19 marks Standard +0.3
  1. Liam and Simone are studying the distribution of oak trees in some woodland. They divided the woodland into 80 equal squares and recorded the number of oak trees in each square. The results are summarised in Table 1 below.
Number of oak trees in a square | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more
Frequency | 1 | 4 | 21 | 23 | 13 | 11 | 7 | 0
Table 1
Liam believes that the oak trees were deliberately planted, with 6 oak trees per square, and that a constant proportion \(p\) of the oak trees survived.
  1. Suggest the model Liam should use to describe the number of oak trees per square.
Liam decides to test whether or not his model is suitable and calculates the expected frequencies given in Table 2.
    Number of oak trees in a square | 0 or 1 | 2 | 3 | 4 | 5 | 6
    Expected frequency | 5.53 | 14.89 | 24.26 | 22.24 | 10.87 | 2.21
    Table 2
  2. Showing your working clearly, complete the test using a \(5 \%\) level of significance. You should state your critical value and conclusion clearly.
Simone believes that a Poisson distribution could be used to model the number of oak trees per square. She calculates the expected frequencies given in Table 3.
    Number of oak trees in a square | 0 or 1 | 2 | 3 | 4 | 5 | 6 or more
    Expected frequency | 12.69 | 16.07 | \(s\) | 14.58 | \(t\) | 9.37
    Table 3
  3. Find the value of \(s\) and the value of \(t\), giving your answers to 2 decimal places.
  4. Write down hypotheses to test the suitability of Simone's model.
The test statistic for this test is 8.749.
  5. Complete the test. Use a \(5 \%\) level of significance and state your critical value and conclusion clearly.
  6. Using the results of these tests, explain whether the origin of this woodland is likely to be cultivated or wild.
Edexcel FS1 2020 June Q5
13 marks Standard +0.3
  1. A factory produces pins.
An engineer selects 40 independent random samples of 6 pins produced at the factory and records the number of defective pins in each sample.
Number of defective pins | 0 | 1 | 2 | 3 | 4 | 5 | 6
Observed frequency | 19 | 11 | 7 | 2 | 0 | 1 | 0
  1. Show that the proportion of defective pins in the 40 samples is 0.15.
The engineer suggests that the number of defective pins in a sample of 6 can be modelled using a binomial distribution. Using the information from the sample above, a test is to be carried out at the \(10 \%\) significance level, to see whether the data are consistent with the engineer's suggested model. The value of the test statistic for this test is 2.689.
  2. Justifying the degrees of freedom used, carry out the test, at the \(10 \%\) significance level, to see whether the data are consistent with the engineer's suggested model. State your hypotheses clearly.
The engineer later discovers that the previously recorded information was incorrect. The data should have been as follows.
    Number of defective pins | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Observed frequency | 19 | 11 | 6 | 3 | 1 | 0 | 0
  3. Describe the effect this would have on the value of the test statistic that should be used for the hypothesis test.
    Give reasons for your answer.
Edexcel FS1 2022 June Q1
9 marks Standard +0.3
  1. A researcher is investigating the number of female cubs present in litters of size 4. He believes that the number of female cubs in a litter can be modelled by \(\mathrm { B } ( 4,0.5 )\). He randomly selects 100 litters each of size 4 and records the number of female cubs. The results are recorded in the table below.
Number of female cubs | 0 | 1 | 2 | 3 | 4
Observed number of litters | 10 | 33 | 33 | 15 | 9
He calculated the expected frequencies as follows.
Number of female cubs | 0 | 1 | 2 | 3 | 4
Expected number of litters | 6.25 | \(r\) | \(s\) | \(r\) | 6.25
  1. Find the value of \(r\) and the value of \(s\).
  2. Carry out a suitable test, at the \(5 \%\) level of significance, to determine whether or not the number of female cubs in a litter can be modelled by \(\mathrm { B } ( 4,0.5 )\). You should clearly state your hypotheses and the critical value used.
Edexcel FS1 2023 June Q3
15 marks Standard +0.8
  1. In a class experiment, each day for 170 days, a child is chosen at random and spins a large cardboard coin 5 times and the number of heads is recorded.
    The results are summarised in the following table.
Number of heads | 0 | 1 | 2 | 3 | 4 | 5
Frequency | 3 | 10 | 45 | 62 | 38 | 12
Marcus believes that a \(\mathrm { B } ( 5,0.5 )\) distribution can be used to model these data and he calculates expected frequencies, to 2 decimal places, as follows
Number of heads | 0 | 1 | 2 | 3 | 4 | 5
Expected frequency | \(r\) | 26.56 | \(s\) | \(s\) | 26.56 | \(r\)
  1. Find the value of \(r\) and the value of \(s\).
  2. Carry out a suitable test, at the \(5 \%\) level of significance, to determine whether or not the \(\mathrm { B } ( 5,0.5 )\) distribution is a good model for these data.
    You should state clearly your hypotheses, the test statistic and the critical value used.
Nima believes that a better model for these data would be \(\mathrm { B } ( 5 , p )\).
  3. Find a suitable estimate for \(p\).
To test her model, Nima uses this value of \(p\) to calculate expected frequencies as follows.
    Number of heads | 0 | 1 | 2 | 3 | 4 | 5
    Expected frequency | 2.07 | 14.65 | 41.44 | 58.63 | 41.47 | 11.74
    The test statistic for Nima's test is 1.62 (to 3 significant figures)
  4. State,
    1. giving your reasons, the degrees of freedom
    2. the critical value
      that Nima should use for a test at the 5\% significance level.
  5. With reference to Marcus' and Nima's test results, comment on
    1. the probability of the coin landing on heads,
    2. the independence of the spins of the coin. Give reasons for your answers.
Edexcel FS1 Specimen Q3
14 marks Standard +0.8
  1. Bags of \(\pounds 1\) coins are paid into a bank. Each bag contains 20 coins.
The bank manager believes that \(5 \%\) of the \(\pounds 1\) coins paid into the bank are fakes. He decides to use the distribution \(X \sim B ( 20,0.05 )\) to model the random variable \(X\), the number of fake \(\pounds 1\) coins in each bag. The bank manager checks a random sample of 150 bags of \(\pounds 1\) coins and records the number of fake coins found in each bag. His results are summarised in Table 1. He then calculates some of the expected frequencies, correct to 1 decimal place.
Number of fake coins in each bag | 0 | 1 | 2 | 3 | 4 or more
Observed frequency | 43 | 62 | 26 | 13 | 6
Expected frequency | 53.8 | 56.6 | | 8.9 |
Table 1
  1. Carry out a hypothesis test, at the \(5 \%\) significance level, to see if the data supports the bank manager's statistical model. State your hypotheses clearly.
The assistant manager thinks that a binomial distribution is a good model but suggests that the proportion of fake coins is higher than \(5 \%\). She calculates the actual proportion of fake coins in the sample and uses this value to carry out a new hypothesis test on the data. Her expected frequencies are shown in Table 2.
    Number of fake coins in each bag | 0 | 1 | 2 | 3 | 4 or more
    Observed frequency | 43 | 62 | 26 | 13 | 6
    Expected frequency | 44.5 | 55.7 | 33.2 | 12.5 | 4.1
    Table 2
  2. Explain why there are 2 degrees of freedom in this case.
  3. Given that she obtains a \(\chi ^ { 2 }\) test statistic of 2.67, test the assistant manager's hypothesis that the binomial distribution is a good model for the number of fake coins in each bag. Use a \(5 \%\) level of significance and state your hypotheses clearly.
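Exam solutions normally compare the test statistic with a tabulated critical value. An equivalent check is to compute the upper-tail probability of the \(\chi^2\) distribution directly. The sketch below is one way to do that with the Python standard library only, using the standard finite series for integer degrees of freedom; the function name `chi2_sf` is hypothetical and the approach is offered as an alternative to tables, not as the expected exam method.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf(x, df):
    """P(chi-squared with df degrees of freedom > x), for positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of terms x^(r - 1/2) / (1*3*...*(2r-1))
    total, term = 0.0, sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(sqrt(x / 2)) + sqrt(2 / pi) * exp(-x / 2) * total

# Statistic 2.67 on 2 degrees of freedom, as quoted in the question.
p_value = chi2_sf(2.67, 2)
print(round(p_value, 3))
```

A tail probability above 0.05 corresponds to a statistic below the 5% critical value (5.991 for 2 degrees of freedom), so the two approaches lead to the same decision.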
CAIE FP2 2009 November Q9
10 marks Standard +0.3
It has been found that 60\% of the computer chips produced in a factory are faulty. As part of quality control, 100 samples of 4 chips are selected at random, and each chip is tested. The number of faulty chips in each sample is recorded, with the results given in the following table.
Number of faulty chips | 0 | 1 | 2 | 3 | 4
Number of samples | 2 | 12 | 27 | 49 | 10
The expected values for a binomial distribution with parameters \(n = 4\) and \(p = 0.6\) are given in the following table.
Number of faulty chips | 0 | 1 | 2 | 3 | 4
Expected value | 2.56 | 15.36 | 34.56 | 34.56 | 12.96
Show how the expected value 34.56 corresponding to 2 faulty chips is obtained. [2]
Carry out a goodness of fit test at the 5\% significance level, and state what can be deduced from the outcome of the test. [8]
Edexcel S3 2002 June Q6
12 marks Standard +0.3
Data were collected on the number of female puppies born in 200 litters of size 8. It was decided to test whether or not a binomial model with parameters \(n = 8\) and \(p = 0.5\) is a suitable model for these data. The following table shows the observed frequencies and the expected frequencies, to 2 decimal places, obtained in order to carry out this test.
Number of females | Observed number of litters | Expected number of litters
0 | 1 | 0.78
1 | 9 | 6.25
2 | 27 | 21.88
3 | 46 | \(R\)
4 | 49 | \(S\)
5 | 35 | \(T\)
6 | 26 | 21.88
7 | 5 | 6.25
8 | 2 | 0.78
  1. Find the values of \(R\), \(S\) and \(T\). [4]
  2. Carry out the test to determine whether or not this binomial model is a suitable one. State your hypotheses clearly and use a 5\% level of significance. [7]
An alternative test might have involved estimating \(p\) rather than assuming \(p = 0.5\).
  1. Explain how this would have affected the test. [1]
Edexcel S3 2006 June Q8
13 marks Standard +0.3
Five coins were tossed 100 times and the number of heads recorded. The results are shown in the table below.
Number of heads | 0 | 1 | 2 | 3 | 4 | 5
Frequency | 6 | 18 | 29 | 34 | 10 | 3
  1. Suggest a suitable distribution to model the number of heads when five unbiased coins are tossed. [2]
  2. Test, at the 10\% level of significance, whether or not the five coins are unbiased. State your hypotheses clearly. [11]
Edexcel S3 2016 June Q6
Standard +0.3
An airport manager carries out a survey of families and their luggage. Each family is allowed to check in a maximum of 4 suitcases. She observes 50 families at the check-in desk and counts the total number of suitcases each family checks in. The data are summarised in the table below.
Number of suitcases | 0 | 1 | 2 | 3 | 4
Frequency | 6 | 25 | 12 | 6 | 1
The manager claims that the data can be modelled by a binomial distribution with \(p = 0.3\)
  1. Test the manager's claim at the 5\% level of significance. State your hypotheses clearly. Show your working clearly and give your expected frequencies to 2 decimal places. (8)
The manager also carries out a survey of the time taken by passengers to check in. She records the number of passengers that check in during each of 100 five-minute intervals. The manager makes a new claim that these data can be modelled by a Poisson distribution. She calculates the expected frequencies given in the table below.
    Number of passengers | 0 | 1 | 2 | 3 | 4 | 5 or more
    Observed frequency | 5 | 40 | 31 | 18 | 6 | 0
    Expected frequency | 16.53 | 29.75 | \(r\) | \(s\) | 7.23 | 3.64
  2. Find the value of \(r\) and the value of \(s\) giving your answers to 2 decimal places. (3)
  3. Stating your hypotheses clearly, use a 1\% level of significance to test the manager's new claim. (6)
WJEC Further Unit 2 2018 June Q5
12 marks Standard +0.3
A life insurance saleswoman investigates the number of policies she sells per day. The results for a random sample of 50 days are shown in the table below.
Number of policies sold | 0 | 1 | 2 | 3 | 4 | 5 | 6
Number of days | 2 | 2 | 9 | 12 | 15 | 9 | 1
She sees the same fixed number of clients each day. She would like to know whether the binomial distribution with parameters 6 and 0·6 is a suitable model for the number of policies she sells per day.
  1. State suitable hypotheses for a goodness of fit test. [1]
  2. Here is part of the table for a \(\chi^2\) goodness of fit test on the data.
    Number of policies sold | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Observed | 2 | 2 | 9 | 12 | 15 | 9 | 1
    Expected | 0·205 | 1·843 | 6·912 | \(d\) | \(e\) | 9·331 | 2·333
    1. Calculate the values of \(d\) and \(e\).
    2. Carry out the test using a 10% level of significance and draw a conclusion in context. [10]
  3. What do the parameters 6 and 0·6 mean in this context? [1]
OCR Further Statistics 2021 June Q4
10 marks Standard +0.3
A biased spinner has five sides, numbered 1 to 5. Elmer spins the spinner repeatedly and counts the number of spins, \(X\), up to and including the first time that the number 2 appears. He carries out this experiment 100 times and records the frequency \(f\) with which each value of \(X\) is obtained. His results are shown in Table 1, together with the values of \(xf\).
\(x\) | 1 | 2 | 3 | 4 | 5 | 6 | \(\geq 7\) | Total
Frequency \(f\) | 20 | 15 | 9 | 13 | 10 | 10 | 23 | 100
\(xf\) | 20 | 30 | 27 | 52 | 50 | 60 | 161 | 400
Table 1
  1. State an appropriate distribution with which to model \(X\), determining the value(s) of any parameter(s). [3]
Elmer carries out a goodness-of-fit test, at the 5\% level, for the distribution in part (a). Table 2 gives some of his calculations, in which numbers that are not exact have been rounded to 3 decimal places.
\(x\) | 1 | 2 | 3 | 4 | 5 | 6 | \(\geq 7\)
Observed frequency \(O\) | 20 | 15 | 9 | 13 | 10 | 10 | 23
Expected frequency \(E\) | 25 | 18.75 | 14.063 | 10.547 | 7.910 | 5.933 | 17.798
\((O-E)^2/E\) | 1 | 0.75 | 1.823 | 0.571 | 0.552 | 2.789 | 1.520
Table 2
  1. Show how the expected frequency corresponding to \(x \geq 7\) was obtained. [2]
  2. Carry out the test. [5]
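The expected frequencies in Table 2 can be reproduced with a short sketch. It assumes a geometric model with \(p\) estimated as the reciprocal of the sample mean (total of \(xf\) divided by total frequency, read as 400/100), and uses the hypothetical helper name `geometric_expected`. The contributions are copied from Table 2 as 1, 0.75, 1.823, 0.571, 0.552, 2.789 and 1.520.

```python
def geometric_expected(p, total, last):
    """Expected frequencies for X = 1..last-1 and the pooled cell X >= last,
    where X ~ Geo(p) counts trials up to and including the first success."""
    cells = [total * (1 - p) ** (x - 1) * p for x in range(1, last)]
    cells.append(total * (1 - p) ** (last - 1))   # P(X >= last) = (1 - p)^(last - 1)
    return cells

# Assumed estimator: p = 1 / (sample mean) = 100 / 400.
p_hat = 100 / 400
expected = geometric_expected(p_hat, total=100, last=7)

# Contributions (O - E)^2 / E as given in Table 2; their sum is the test statistic.
contributions = [1, 0.75, 1.823, 0.571, 0.552, 2.789, 1.520]
statistic = sum(contributions)

print([round(e, 3) for e in expected])
print(round(statistic, 3))
```

The final cell uses the tail probability \((1-p)^6\) rather than a single-value probability, which is why it is larger than the cell before it.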