Chi-squared goodness of fit: Binomial

A question is this type if and only if it tests whether observed frequency data fits a binomial distribution, possibly with parameter estimated from data.

44 questions · Standard +0.4

Edexcel S3 2014 June Q5
13 marks Standard +0.3
5. A research station is doing some work on the germination of a new variety of genetically modified wheat. They planted 120 rows containing 7 seeds in each row.
The number of seeds germinating in each row was recorded. The results are as follows
Number of seeds germinating in each row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Observed number of rows | 2 | 6 | 11 | 19 | 25 | 32 | 16 | 9
  1. Write down two reasons why a binomial distribution may be a suitable model.
  2. Show that the probability of a randomly selected seed from this sample germinating is 0.6.
    The research station used a binomial distribution with probability 0.6 of a seed germinating. The expected frequencies were calculated to 2 decimal places. The results are as follows.
    Number of seeds germinating in each row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
    Expected number of rows | 0.20 | 2.06 | \(s\) | 23.22 | \(t\) | 31.35 | 15.68 | 3.36
  3. Find the value of \(s\) and the value of \(t\).
  4. Stating your hypotheses clearly, test, at the \(1 \%\) level of significance, whether or not the data can be modelled by a binomial distribution.
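As a rough check on parts 2 and 3, the sketch below estimates the germination probability from the observed table and recomputes the expected frequencies under B(7, 0.6). It is a minimal illustration using only the Python standard library; the names `observed`, `p_hat` and `expected` are illustrative choices, not anything given in the question.

```python
from math import comb

# Observed number of rows with 0..7 seeds germinating (from the question).
observed = [2, 6, 11, 19, 25, 32, 16, 9]
n_seeds = 7                 # seeds planted in each row
n_rows = sum(observed)      # 120 rows in total

# Estimate p as (total seeds germinating) / (total seeds planted).
total_germinated = sum(x * f for x, f in enumerate(observed))
p_hat = total_germinated / (n_seeds * n_rows)

# Expected frequencies under B(7, 0.6), the model used by the research station.
p = 0.6
expected = [
    n_rows * comb(n_seeds, x) * p**x * (1 - p) ** (n_seeds - x)
    for x in range(n_seeds + 1)
]

s, t = expected[2], expected[4]
print(f"p_hat = {p_hat:.3f}, s = {s:.2f}, t = {t:.2f}")
```

Under these assumptions the script gives an estimate of 0.6 for the probability, with \(s \approx 9.29\) and \(t \approx 34.84\); the remaining entries agree with the tabulated values to 2 decimal places.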
Edexcel S3 2016 June Q6
17 marks Standard +0.3
6. An airport manager carries out a survey of families and their luggage. Each family is allowed to check in a maximum of 4 suitcases. She observes 50 families at the check-in desk and counts the total number of suitcases each family checks in. The data are summarised in the table below.
Number of suitcases | 0 | 1 | 2 | 3 | 4
Frequency | 6 | 25 | 12 | 6 | 1
The manager claims that the data can be modelled by a binomial distribution with \(p = 0.3\)
  1. Test the manager's claim at the \(5 \%\) level of significance. State your hypotheses clearly.
    Show your working clearly and give your expected frequencies to 2 decimal places.
    (8)
    The manager also carries out a survey of the time taken by passengers to check in. She records the number of passengers that check in during each of 100 five-minute intervals. The manager makes a new claim that these data can be modelled by a Poisson distribution. She calculates the expected frequencies given in the table below.
    Number of passengers | 0 | 1 | 2 | 3 | 4 | 5 or more
    Observed frequency | 5 | 40 | 31 | 18 | 6 | 0
    Expected frequency | 16.53 | 29.75 | \(r\) | \(s\) | 7.23 | 3.64
  2. Find the value of \(r\) and the value of \(s\) giving your answers to 2 decimal places.
  3. Stating your hypotheses clearly, use a \(1 \%\) level of significance to test the manager's new claim.
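The Poisson expected frequencies in part 2 can be sketched as follows. The question does not state the rate, so this assumes the manager used the sample mean of the observed data as the Poisson parameter; that assumption reproduces the tabulated 16.53 and 29.75. The helper `poisson_pmf` and the variable names are hypothetical.

```python
from math import exp, factorial

# Observed frequencies for 0, 1, 2, 3, 4 and "5 or more" passengers.
observed = [5, 40, 31, 18, 6, 0]
n_intervals = sum(observed)   # 100 five-minute intervals

# Sample mean; the "5 or more" class has frequency 0, so it contributes nothing.
mean = sum(x * f for x, f in enumerate(observed)) / n_intervals


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Assumption: the rate is estimated by the sample mean.
r = n_intervals * poisson_pmf(2, mean)
s = n_intervals * poisson_pmf(3, mean)
print(f"mean = {mean}, r = {r:.2f}, s = {s:.2f}")
```

Because the mean is estimated from the data, the test in part 3 loses one further degree of freedom beyond the usual "number of classes minus 1", after any pooling of classes with small expected frequencies.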
Edexcel S3 2018 June Q6
18 marks Standard +0.3
  1. David carries out an experiment with 4 identical dice, each with faces numbered 1 to 6. He rolls the 4 dice and counts the number of dice showing an even number on the uppermost face. He repeats this 150 times. The results are summarised in the table below.
No. of dice showing an even number | 0 | 1 | 2 | 3 | 4
Frequency | 12 | 45 | 36 | 39 | 18
David defines the random variable \(C\) as the number of dice showing an even number on the uppermost face when the four dice are thrown. David claims that \(C \sim \mathrm { B } ( 4,0.5 )\).
  1. Stating your hypotheses clearly and using a \(1 \%\) level of significance, test David's claim. Show your working clearly.
    John claims that \(C \sim \mathrm { B } ( 4 , p )\).
  2. Calculate an estimate of the value of \(p\) from the summary of the results of David's experiment. Show your working clearly.
    John decides to test his claim. He calculates expected frequencies using the results of David's experiment and obtains the following table.
    No. of dice showing an even number | 0 | 1 | 2 | 3 | 4
    Expected frequency | 8.65 | 36.00 | \(d\) | 39.00 | \(e\)
  3. Calculate, to 2 decimal places, the value of \(d\) and the value of \(e\).
  4. State suitable hypotheses to test John’s claim.
    John obtained a test statistic of 16.9 and carries out a test at the \(1 \%\) level of significance.
  5. State what conclusion John should make about his claim.
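For David's fully specified model B(4, 0.5), the test statistic \(\sum (O - E)^2 / E\) can be computed along the lines below. This is a sketch only: the 1% critical value for 4 degrees of freedom is copied from standard chi-squared tables, not computed, and the variable names are illustrative.

```python
from math import comb

# Observed frequencies for 0..4 dice showing an even number.
observed = [12, 45, 36, 39, 18]
n_dice, p = 4, 0.5
n_throws = sum(observed)   # 150 repetitions

# Expected frequencies under B(4, 0.5).
expected = [
    n_throws * comb(n_dice, x) * p**x * (1 - p) ** (n_dice - x)
    for x in range(n_dice + 1)
]

# Every expected frequency is at least 5 here, so no classes are pooled.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# p is specified, not estimated, so the only constraint is the total.
df = len(observed) - 1

CRITICAL_1PCT_DF4 = 13.277   # from standard tables (assumed, not computed)
reject = chi_sq > CRITICAL_1PCT_DF4
print(f"chi_sq = {chi_sq:.2f}, df = {df}, reject H0: {reject}")
```

With these figures the statistic comes to 17.52, which exceeds 13.277, so the data do not support \(\mathrm { B } ( 4,0.5 )\) at the 1% level. For John's model one parameter is estimated, so the degrees of freedom drop to 3.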
Edexcel S3 Q7
16 marks Standard +0.3
7. A student collects data on whether competitors in local tennis tournaments are right- or left-handed. The table below shows the number of left-handed players who reached the last 16 in each of fifty tournaments.
No. of Left-handed Players | 0 | 1 | 2 | 3 | 4 | \(\geq 5\)
No. of Tournaments | 4 | 12 | 18 | 11 | 5 | 0
The student believes that a binomial distribution with \(n = 16\) and \(p = 0.1\) could be a suitable model for these data.
  1. Stating your hypotheses clearly, test the student's model at the \(5 \%\) level of significance.
    (13 marks)
    To improve the model the student decides to estimate \(p\) using the data in the table. Using this value of \(p\) to calculate expected frequencies the student had 5 classes after combining and calculated that \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 2.127\)
  2. Test at the \(5 \%\) level of significance whether or not the binomial distribution is a suitable model for the number of left-handed players who reach the last 16 in local tennis tournaments.
Edexcel S3 Q4
12 marks Standard +0.3
4. A paranormal investigator invites couples who believe they have a telepathic connection to participate in a trial. With each couple one person looks at a card with one of five shapes on it and the other person says which of the shapes they think it is. This is repeated six times and the number of correct answers recorded. The results from 120 couples are given below.
Number Correct | 0 | 1 | 2 | 3 | 4 | 5 | 6
Number of Couples | 26 | 56 | 28 | 8 | 2 | 0 | 0
The investigator wishes to see if this data fits a binomial distribution with parameters \(n = 6\) and \(p = \frac { 1 } { 5 }\) and calculates to 2 decimal places the expected frequencies given below.
Number Correct | 0 | 1 | 2 | 3 | 4 | 5 | 6
Expected Frequency | | | | 9.83 | 1.84 | 0.18 | 0.01
  1. Find the other expected frequencies.
  2. Stating your hypotheses clearly, test at the \(5 \%\) level of significance whether or not the distribution is an appropriate model.
  3. Comment on your findings.
OCR MEI Further Statistics A AS 2019 June Q5
13 marks Standard +0.3
5 A researcher is investigating births of females and males in a particular species of animal which very often produces litters of 7 offspring.
The table shows some data about the number of females per litter in 200 litters of 7 offspring. The researcher thinks that a binomial distribution \(\mathrm { B } ( 7 , p )\) may be an appropriate model for these data. (c) Complete the test at the \(5 \%\) significance level. Fig. 5 shows the probability distribution \(\mathrm { B } ( 7,0.35 )\) together with the relative frequencies of the observed data (the numbers of litters each divided by 200). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{fd496303-10f1-450e-bbeb-421ab6f4de21-5_659_1285_342_319} \captionsetup{labelformat=empty} \caption{Fig. 5}
\end{figure} (d) Comment on the result of the test completed in part (c) by considering Fig. 5.
OCR MEI Further Statistics Major 2020 November Q9
16 marks Standard +0.3
9 A supermarket sells trays of peaches. Each tray contains 10 peaches. Often some of the peaches in a tray are rotten. The numbers of rotten peaches in a random sample of 150 trays are shown in Table 9.1. \begin{table}[h]
Number of rotten peaches | 0 | 1 | 2 | 3 | 4 | 5 | 6 | \(\geqslant 7\)
Frequency | 39 | 39 | 33 | 19 | 8 | 8 | 4 | 0
\captionsetup{labelformat=empty} \caption{Table 9.1}
\end{table} A manager at the supermarket thinks that the number of rotten peaches in a tray may be modelled by a binomial distribution.
  1. Use these data to estimate the value of the parameter \(p\) for the binomial model \(\mathrm { B } ( 10 , p )\). The manager decides to carry out a goodness of fit test to investigate further. The screenshot in Fig. 9.2 shows part of a spreadsheet to assess the goodness of fit of the distribution \(\mathrm { B } ( 10 , p )\), using the value of \(p\) estimated from the data. \begin{table}[h]
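    - | A | B | C | D | E
    1 | Number of rotten peaches | Observed frequency | Binomial probability | Expected frequency | Chi-squared contribution
    2 | 0 | 39 | | |
    3 | 1 | 39 | | | 1.4229
    4 | 2 | 33 | 0.2941 | 44.1167 | 2.8012
    5 | 3 | 19 | 0.1629 | 24.4383 | 1.2102
    6 | \(\geqslant 4\) | 20 | 0.0769 | 11.5311 | 6.2199
    7 | | | | |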
    \captionsetup{labelformat=empty} \caption{Fig. 9.2}
    \end{table}
  2. Calculate the missing values in each of the following cells.
    • C2
    • D2
    • E2
    • Explain why the numbers for 4, 5, 6 and at least 7 rotten peaches have been combined into the single category of at least 4 rotten peaches, as shown in the spreadsheet.
    • Carry out the test at the \(1 \%\) significance level.
    • Using the values of the contributions, comment on the results of the test.
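The spreadsheet calculation can be mirrored with a short script. The sketch below estimates \(p\) from the data, pools the classes for 4 or more rotten peaches as the spreadsheet does, and recomputes each column. The helper `binom_pmf` and all variable names are hypothetical, and the script only reproduces the arithmetic.

```python
from math import comb

# Observed frequencies for 0..6 and ">= 7" rotten peaches (Table 9.1).
observed = [39, 39, 33, 19, 8, 8, 4, 0]
n_peaches = 10
n_trays = sum(observed)   # 150 trays

# Estimate p = sample mean / 10; the ">= 7" class has frequency 0,
# so treating it as exactly 7 does not change the estimate.
p_hat = sum(x * f for x, f in enumerate(observed)) / (n_peaches * n_trays)


def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Column C: probabilities for 0, 1, 2, 3 and the pooled class ">= 4".
probs = [binom_pmf(x, n_peaches, p_hat) for x in range(4)]
probs.append(1 - sum(probs))

# Column B after pooling, column D, and column E.
pooled_observed = observed[:4] + [sum(observed[4:])]
expected = [n_trays * q for q in probs]
contributions = [(o - e) ** 2 / e for o, e in zip(pooled_observed, expected)]

chi_sq = sum(contributions)
df = len(pooled_observed) - 1 - 1   # one constraint for the total, one for p
print(f"p_hat = {p_hat}, chi_sq = {chi_sq:.4f}, df = {df}")
```

On this reading the missing cells C2, D2 and E2 come out at roughly 0.1515, 22.72 and 11.67, and the rows for 2, 3 and at least 4 rotten peaches match the spreadsheet values.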
WJEC Further Unit 2 2018 June Q5
11 marks Standard +0.3
5. A life insurance saleswoman investigates the number of policies she sells per day. The results for a random sample of 50 days are shown in the table below.
Number of policies sold | 0 | 1 | 2 | 3 | 4 | 5 | 6
Number of days | 2 | 2 | 9 | 12 | 15 | 9 | 1
She sees the same fixed number of clients each day. She would like to know whether the binomial distribution with parameters 6 and 0.6 is a suitable model for the number of policies she sells per day.
  1. State suitable hypotheses for a goodness of fit test.
  2. Here is part of the table for a \(\chi ^ { 2 }\) goodness of fit test on the data.
    Number of policies sold | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Observed | 2 | 2 | 9 | 12 | 15 | 9 | 1
    Expected | 0.205 | 1.843 | 6.912 | \(d\) | \(e\) | 9.331 | 2.333
    1. Calculate the values of \(d\) and \(e\).
    2. Carry out the test using a 10\% level of significance and draw a conclusion in context.
  3. What do the parameters 6 and 0.6 mean in this context?
WJEC Further Unit 2 2022 June Q5
11 marks Standard +0.3
5. John has a game that involves throwing a set of three identical, cubical dice with faces numbered 1 to 6. He wishes to investigate whether these dice are fair in terms of the number of sixes obtained when they are thrown. John throws the set of three dice 1100 times and records the number of sixes obtained for each throw. The results are shown in the table below.
Number of sixes | 0 | 1 | 2 | 3
Frequency | 625 | 384 | 81 | 10
Using these results, conduct a goodness of fit test and draw an appropriate conclusion.
WJEC Further Unit 2 2023 June Q6
20 marks Standard +0.3
6. A company has 20 boats to hire out. Payment is always taken in advance and all 20 boats are hired out each day. A manager at the company notices that \(10 \%\) of groups do not turn up to take the boats, despite having already paid to hire them. The manager wishes to investigate whether the numbers of boats that do not get taken each day can be modelled by the binomial distribution \(B ( 20,0 \cdot 1 )\). The numbers of boats that were not taken for 110 randomly selected days are given below.
Number of boats not taken | 0 | 1 | 2 | 3 | 4 | 5 or more
Frequency | 10 | 35 | 29 | 25 | 8 | 3
  1. State suitable hypotheses to carry out a goodness of fit test.
  2. Here is part of the table for a \(\chi ^ { 2 }\) goodness of fit test on the data.
    Number of boats not taken | 0 | 1 | 2 | 3 | 4 | 5 or more
    Observed | 10 | 35 | 29 | 25 | 8 | 3
    Expected | \(f\) | 29.72 | \(g\) | 20.91 | 9.88 | \(4 \cdot 75\)
    1. Calculate the values of \(f\) and \(g\).
    2. By completing the test, give the conclusion the manager should reach.
    The cost of hiring a boat is \(\pounds 15\). Since demand is high and the proportion of groups that do not turn up is also relatively high, the manager decides to take payment for 22 boats each day. She would give \(\pounds 20\) (a full refund and some compensation) to any group that has paid and turned up, but cannot take a boat out due to the overselling. Assume that the proportion of groups not turning up stays the same.
    1. Suggest a binomial model that the manager could use for the number of groups arriving expecting to hire a boat.
    2. Hence calculate the expected daily net income for the company following the manager's decision.
  3. Is the manager justified in her decision? Give a reason for your answer.
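One way to set up the overselling calculation is sketched below. It assumes that the number of groups arriving follows B(22, 0.9), that each group turned away receives \(\pounds 20\) in total, and that groups who do not turn up receive nothing back. These are interpretations of the wording, not statements from a mark scheme, and the names used are hypothetical.

```python
from math import comb

price = 15       # pounds paid in advance per booking
payout = 20      # pounds paid to each group turned away (assumed total)
boats = 20       # boats available each day
bookings = 22    # payments taken each day under the new policy
p_show = 0.9     # 1 - 0.1, proportion of groups that do turn up


def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# X = number of groups arriving, modelled as B(22, 0.9).
# Only X = 21 or X = 22 leaves groups without a boat.
expected_turned_away = sum(
    (x - boats) * binom_pmf(x, bookings, p_show)
    for x in range(boats + 1, bookings + 1)
)

income = bookings * price                       # 330 pounds taken in advance
expected_net = income - payout * expected_turned_away
baseline = boats * price                        # 300 pounds without overselling
print(f"expected net income = {expected_net:.2f} (baseline {baseline})")
```

Under these assumptions the expected daily net income is about \(\pounds 321.25\), compared with \(\pounds 300\) when only 20 bookings are taken. Whether that justifies the decision also depends on factors outside the model, such as customer goodwill.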
Edexcel FS1 AS 2021 June Q1
10 marks Standard +0.3
  1. Flobee sells tomato seeds in packets, each containing 40 seeds. Flobee advertises that only 4\% of its tomato seeds do not germinate.
Amodita is investigating the germination of Flobee's tomato seeds. She plants 125 packets of Flobee's tomato seeds and records the number of seeds that do not germinate in each packet.
Number of seeds that do not germinate | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more
Frequency | 15 | 35 | 38 | 22 | 10 | 5 | 0
Amodita wants to test whether the binomial distribution \(\mathrm { B } ( 40,0.04 )\) is a suitable model for these data. The table below shows the expected frequencies, to 2 decimal places, using this model.
Number of seeds that do not germinate | 0 | 1 | 2 | 3 | 4 | 5 or more
Expected Frequency | 24.42 | 40.70 | \(r\) | 17.45 | 6.73 | \(s\)
  1. Calculate the value of \(r\) and the value of \(s\)
  2. Stating your hypotheses clearly, carry out the test at the \(5 \%\) level of significance. You should state the number of degrees of freedom, critical value and conclusion clearly.
    Amodita believes that Flobee should use a more realistic value for the percentage of their tomato seeds that do not germinate. She decides to test the data using a new model \(\mathrm { B } ( 40 , p )\).
  3. Showing your working, suggest a more realistic value for \(p\)
Edexcel FS1 AS 2022 June Q3
9 marks Standard +0.8
  1. In a game, a coin is spun 5 times and the number of heads obtained is recorded. Tao suggests playing the game 20 times and carrying out a chi-squared test to investigate whether the coin might be biased.
    1. Explain why playing the game only 20 times may cause problems when carrying out the test.
    Chris decides to play the game 500 times. The results are as follows.
    Number of heads | 0 | 1 | 2 | 3 | 4 | 5
    Observed frequency | 2 | 27 | 93 | 181 | 146 | 51
    Chris decides to test whether or not the data can be modelled by a binomial distribution, with the probability of a head on each spin being 0.6. She calculates the expected frequencies, to 2 decimal places, as follows.
    Number of heads | 0 | 1 | 2 | 3 | 4 | 5
    Expected frequency | 5.12 | 38.40 | 115.20 | 172.80 | 129.60 | 38.88
  2. State the number of degrees of freedom in Chris' test, giving a reason for your answer.
  3. Carry out the test at the \(5 \%\) level of significance. You should state your hypotheses, test statistic, critical value and conclusion clearly.
  4. Showing your working, find an alternative model which would better fit Chris’ data.
Edexcel FS1 AS 2024 June Q4
15 marks Standard +0.3
  1. Robin shoots 8 arrows at a target each day for 100 days.
The number of times he hits the target each day is summarised in the table below.
Number of hits | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Frequency | 1 | 10 | 30 | 34 | 17 | 4 | 2 | 0 | 2
Misha believes that these data can be modelled by a binomial distribution.
  1. State, in context, two assumptions that are implied by the use of this model.
  2. Find an estimate for the proportion of arrows Robin shoots that hit the target.
    Misha calculates expected frequencies, to 2 decimal places, as follows.
    Number of hits | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
    Expected frequency | 2.81 | 12.67 | \(r\) | 28.05 | 19.73 | \(s\) | 2.50 | 0.40 | 0.03
  3. Find the value of \(r\) and the value of \(s\).
    Misha correctly used a suitable test to assess her belief.
    1. Explain why she used a test with 3 degrees of freedom.
    2. Complete the test using a \(5 \%\) level of significance. You should clearly state your hypotheses, test statistic, critical value and conclusion.
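The 3 degrees of freedom can be traced by pooling classes whose expected frequency is below 5 and then subtracting one for the fixed total and one for the estimated \(p\). The sketch below follows that reasoning. The helper `pool_tails` is a hypothetical illustration that merges only from the two ends inwards, which is the usual convention for a table of this shape but not the only possible one.

```python
from math import comb

# Observed frequencies for 0..8 hits out of 8 arrows.
observed = [1, 10, 30, 34, 17, 4, 2, 0, 2]
n_arrows = 8
n_days = sum(observed)   # 100 days

# Estimated proportion of arrows that hit the target.
p_hat = sum(x * f for x, f in enumerate(observed)) / (n_arrows * n_days)

expected = [
    n_days * comb(n_arrows, x) * p_hat**x * (1 - p_hat) ** (n_arrows - x)
    for x in range(n_arrows + 1)
]


def pool_tails(obs, exp, minimum=5.0):
    """Merge end classes inwards until both end expected values reach `minimum`."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 1 and exp[0] < minimum:
        obs[1] += obs[0]
        exp[1] += exp[0]
        del obs[0], exp[0]
    while len(exp) > 1 and exp[-1] < minimum:
        obs[-2] += obs[-1]
        exp[-2] += exp[-1]
        obs.pop()
        exp.pop()
    return obs, exp


pooled_obs, pooled_exp = pool_tails(observed, expected)
chi_sq = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))

# classes - 1 (fixed total) - 1 (p estimated from the data)
df = len(pooled_obs) - 1 - 1
print(f"p_hat = {p_hat}, classes = {len(pooled_obs)}, df = {df}, chi_sq = {chi_sq:.3f}")
```

With this pooling the classes become "0 or 1", 2, 3, 4 and "5 or more", giving 5 classes and hence 5 - 1 - 1 = 3 degrees of freedom, in line with the question.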
Edexcel FS1 2019 June Q4
19 marks Standard +0.3
  1. Liam and Simone are studying the distribution of oak trees in some woodland. They divided the woodland into 80 equal squares and recorded the number of oak trees in each square. The results are summarised in Table 1 below.
\begin{table}[h]
Number of oak trees in a square | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more
Frequency | 1 | 4 | 21 | 23 | 13 | 11 | 7 | 0
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table} Liam believes that the oak trees were deliberately planted, with 6 oak trees per square and that a constant proportion \(p\) of the oak trees survived.
  1. Suggest the model Liam should use to describe the number of oak trees per square. Liam decides to test whether or not his model is suitable and calculates the expected frequencies given in Table 2. \begin{table}[h]
    Number of oak trees in a square | 0 or 1 | 2 | 3 | 4 | 5 | 6
    Expected frequency | 5.53 | 14.89 | 24.26 | 22.24 | 10.87 | 2.21
    \captionsetup{labelformat=empty} \caption{Table 2}
    \end{table}
  2. Showing your working clearly, complete the test using a \(5 \%\) level of significance. You should state your critical value and conclusion clearly. Simone believes that a Poisson distribution could be used to model the number of oak trees per square. She calculates the expected frequencies given in Table 3. \begin{table}[h]
    Number of oak trees in a square | 0 or 1 | 2 | 3 | 4 | 5 | 6 or more
    Expected frequency | 12.69 | 16.07 | \(s\) | 14.58 | \(t\) | 9.37
    \captionsetup{labelformat=empty} \caption{Table 3}
    \end{table}
  3. Find the value of \(s\) and the value of \(t\), giving your answers to 2 decimal places.
  4. Write down hypotheses to test the suitability of Simone's model.
    The test statistic for this test is 8.749.
  5. Complete the test. Use a \(5 \%\) level of significance and state your critical value and conclusion clearly.
  6. Using the results of these tests, explain whether the origin of this woodland is likely to be cultivated or wild.
Edexcel FS1 2020 June Q5
13 marks Standard +0.3
  1. A factory produces pins.
An engineer selects 40 independent random samples of 6 pins produced at the factory and records the number of defective pins in each sample.
Number of defective pins | 0 | 1 | 2 | 3 | 4 | 5 | 6
Observed frequency | 19 | 11 | 7 | 2 | 0 | 1 | 0
  1. Show that the proportion of defective pins in the 40 samples is 0.15.
    The engineer suggests that the number of defective pins in a sample of 6 can be modelled using a binomial distribution. Using the information from the sample above, a test is to be carried out at the \(10 \%\) significance level, to see whether the data are consistent with the engineer's suggested model. The value of the test statistic for this test is 2.689.
  2. Justifying the degrees of freedom used, carry out the test, at the \(10 \%\) significance level, to see whether the data are consistent with the engineer's suggested model. State your hypotheses clearly.
    The engineer later discovers that the previously recorded information was incorrect. The data should have been as follows.
    Number of defective pins | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Observed frequency | 19 | 11 | 6 | 3 | 1 | 0 | 0
  3. Describe the effect this would have on the value of the test statistic that should be used for the hypothesis test.
    Give reasons for your answer.
Edexcel FS1 2022 June Q1
9 marks Standard +0.3
  1. A researcher is investigating the number of female cubs present in litters of size 4. He believes that the number of female cubs in a litter can be modelled by \(\mathrm { B } ( 4,0.5 )\). He randomly selects 100 litters each of size 4 and records the number of female cubs. The results are recorded in the table below.
Number of female cubs | 0 | 1 | 2 | 3 | 4
Observed number of litters | 10 | 33 | 33 | 15 | 9
He calculated the expected frequencies as follows.
Number of female cubs | 0 | 1 | 2 | 3 | 4
Expected number of litters | 6.25 | \(r\) | \(s\) | \(r\) | 6.25
  1. Find the value of \(r\) and the value of \(s\).
  2. Carry out a suitable test, at the \(5 \%\) level of significance, to determine whether or not the number of female cubs in a litter can be modelled by \(\mathrm { B } ( 4,0.5 )\). You should clearly state your hypotheses and the critical value used.
Edexcel FS1 2023 June Q3
15 marks Standard +0.8
  1. In a class experiment, each day for 170 days, a child is chosen at random and spins a large cardboard coin 5 times and the number of heads is recorded.
    The results are summarised in the following table.
Number of heads | 0 | 1 | 2 | 3 | 4 | 5
Frequency | 3 | 10 | 45 | 62 | 38 | 12
Marcus believes that a \(\mathrm { B } ( 5,0.5 )\) distribution can be used to model these data and he calculates expected frequencies, to 2 decimal places, as follows
Number of heads | 0 | 1 | 2 | 3 | 4 | 5
Expected frequency | \(r\) | 26.56 | \(s\) | \(s\) | 26.56 | \(r\)
  1. Find the value of \(r\) and the value of \(s\)
  2. Carry out a suitable test, at the \(5 \%\) level of significance, to determine whether or not the \(\mathrm { B } ( 5,0.5 )\) distribution is a good model for these data.
    You should state clearly your hypotheses, the test statistic and the critical value used.
    Nima believes that a better model for these data would be \(\mathrm { B } ( 5 , p )\).
  3. Find a suitable estimate for \(p\).
    To test her model, Nima uses this value of \(p\) to calculate expected frequencies as follows.
    Number of heads | 0 | 1 | 2 | 3 | 4 | 5
    Expected frequency | 2.07 | 14.65 | 41.44 | 58.63 | 41.47 | 11.74
    The test statistic for Nima’s test is 1.62 (to 3 significant figures)
  4. State,
    1. giving your reasons, the degrees of freedom
    2. the critical value
      that Nima should use for a test at the 5\% significance level.
  5. With reference to Marcus' and Nima's test results, comment on
    1. the probability of the coin landing on heads,
    2. the independence of the spins of the coin.
    Give reasons for your answers.
Edexcel FS1 Specimen Q3
14 marks Standard +0.8
  1. Bags of \(\pounds 1\) coins are paid into a bank. Each bag contains 20 coins.
The bank manager believes that \(5 \%\) of the \(\pounds 1\) coins paid into the bank are fakes. He decides to use the distribution \(X \sim B ( 20,0.05 )\) to model the random variable \(X\), the number of fake \(\pounds 1\) coins in each bag. The bank manager checks a random sample of 150 bags of \(\pounds 1\) coins and records the number of fake coins found in each bag. His results are summarised in Table 1. He then calculates some of the expected frequencies, correct to 1 decimal place. \begin{table}[h]
Number of fake coins in each bag | 0 | 1 | 2 | 3 | 4 or more
Observed frequency | 43 | 62 | 26 | 13 | 6
Expected frequency | 53.8 | 56.6 | | 8.9 |
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
  1. Carry out a hypothesis test, at the \(5 \%\) significance level, to see if the data supports the bank manager's statistical model. State your hypotheses clearly. The assistant manager thinks that a binomial distribution is a good model but suggests that the proportion of fake coins is higher than \(5 \%\). She calculates the actual proportion of fake coins in the sample and uses this value to carry out a new hypothesis test on the data. Her expected frequencies are shown in Table 2. \begin{table}[h]
    Number of fake coins in each bag | 0 | 1 | 2 | 3 | 4 or more
    Observed frequency | 43 | 62 | 26 | 13 | 6
    Expected frequency | 44.5 | 55.7 | 33.2 | 12.5 | 4.1
    \captionsetup{labelformat=empty} \caption{Table 2}
    \end{table}
  2. Explain why there are 2 degrees of freedom in this case.
  3. Given that she obtains a \(\chi ^ { 2 }\) test statistic of 2.67, test the assistant manager's hypothesis that the binomial distribution is a good model for the number of fake coins in each bag. Use a \(5 \%\) level of significance and state your hypotheses clearly.
Edexcel S3 Q6
13 marks Standard +0.3
6. Data were collected on the number of female puppies born in 200 litters of size 8. It was decided to test whether or not a binomial model with parameters \(n = 8\) and \(p = 0.5\) is a suitable model for these data. The following table shows the observed frequencies and the expected frequencies, to 2 decimal places, obtained in order to carry out this test.
Number of females | Observed number of litters | Expected number of litters
0 | 1 | 0.78
1 | 9 | 6.25
2 | 27 | 21.88
3 | 46 | \(R\)
4 | 49 | \(S\)
5 | 35 | \(T\)
6 | 26 | 21.88
7 | 5 | 6.25
8 | 2 | 0.78
  1. Find the values of \(R , S\) and \(T\).
  2. Carry out the test to determine whether or not this binomial model is a suitable one. State your hypotheses clearly and use a \(5 \%\) level of significance.
    An alternative test might have involved estimating \(p\) rather than assuming \(p = 0.5\).
  3. Explain how this would have affected the test.
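The expected frequencies and the effect of estimating \(p\) on the degrees of freedom can be illustrated with the sketch below. It assumes the two end classes at each tail are pooled because 0.78 is below 5, and that the same pooling would still apply if \(p\) were estimated. In practice the expected frequencies would change with the estimate, so the pooling could differ. Variable names are illustrative.

```python
from math import comb

n_litters, litter_size = 200, 8

# Expected frequencies under B(8, 0.5): each outcome has probability C(8, x) / 2^8.
expected = [
    n_litters * comb(litter_size, x) / 2**litter_size
    for x in range(litter_size + 1)
]
R, S, T = expected[3], expected[4], expected[5]

# Pool {0, 1} and {7, 8} so that every expected frequency is at least 5.
pooled_expected = (
    [expected[0] + expected[1]] + expected[2:7] + [expected[7] + expected[8]]
)

n_classes = len(pooled_expected)
df_fixed_p = n_classes - 1           # p = 0.5 is specified in advance
df_estimated_p = n_classes - 1 - 1   # one more constraint if p is estimated
print(f"R = {R}, S = {S}, T = {T}")
print(f"df with p fixed = {df_fixed_p}, df with p estimated = {df_estimated_p}")
```

By the symmetry of \(\mathrm { B } ( 8,0.5 )\), \(R\) and \(T\) are equal (43.75), with \(S = 54.6875\), which is 54.69 to 2 decimal places. Estimating \(p\) costs one further degree of freedom, which lowers the critical value used in the test.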