Chi-squared goodness of fit: Binomial

A question is of this type if and only if it tests whether observed frequency data fits a binomial distribution, possibly with a parameter estimated from the data.

41 questions · Standard +0.4

5.06b Fit prescribed distribution: chi-squared test
CAIE Further Paper 4 2020 November Q3
7 marks Standard +0.3
3 Apples are sold in bags of 5. Based on her previous experience, Freya claims that the probability of any apple weighing more than 100 grams is 0.35, independently of other apples in the bag. The apples in a random sample of 150 bags are checked and the number, \(x\), in each bag weighing more than 100 grams is recorded. The results are shown in the following table.
\(x\) | 0 | 1 | 2 | 3 | 4 | 5
Frequency | 12 | 39 | 46 | 37 | 12 | 4
Carry out a goodness of fit test at the \(5 \%\) significance level and hence comment on Freya's claim.
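As a rough illustration of the calculation this kind of question asks for, the sketch below computes expected frequencies under \(\mathrm { B } ( 5,0.35 )\) for the 150 bags, pools the last two classes so that no expected frequency is below 5, and compares the statistic with the tabulated \(5 \%\) critical value. The function names and the explicit grouping are illustrative assumptions, not part of the question or its mark scheme.

```python
from math import comb

def binomial_expected(n, p, total):
    """Expected frequencies total * P(X = x) for X ~ B(n, p), x = 0..n."""
    return [total * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def chi_squared(observed, expected, groups):
    """Sum of (O - E)^2 / E over classes; each group lists the x-values pooled together."""
    stat = 0.0
    for g in groups:
        o = sum(observed[x] for x in g)
        e = sum(expected[x] for x in g)
        stat += (o - e) ** 2 / e
    return stat

observed = [12, 39, 46, 37, 12, 4]           # frequencies from the table, x = 0..5
expected = binomial_expected(5, 0.35, 150)   # claimed model B(5, 0.35)

# Assumption: pool x = 4 and x = 5, since the expected frequency for x = 5 is below 5.
groups = [[0], [1], [2], [3], [4, 5]]
stat = chi_squared(observed, expected, groups)

df = len(groups) - 1      # no parameter is estimated from the data
critical_5pct = 9.488     # tabulated chi-squared value for 4 degrees of freedom
print(round(stat, 2), df, stat > critical_5pct)
```

Under these assumptions the statistic comes out at roughly 14.6 on 4 degrees of freedom, which exceeds 9.488, so the sketch points towards rejecting the claimed model at the \(5 \%\) level.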
CAIE Further Paper 4 2021 November Q3
8 marks Standard +0.3
3 A supermarket sells pears in packs of 8. Some of the pears in a pack may not be ripe, and the supermarket manager claims that the number of unripe pears in a pack can be modelled by the distribution \(\mathrm { B } ( 8,0.15 )\). A random sample of 150 packs was selected and the number of unripe pears in each pack was recorded. The following table shows the observed frequencies together with some of the expected frequencies using the manager's binomial distribution.
Number of unripe pears per pack | 0 | 1 | 2 | 3 | 4 | 5 | \(\geqslant 6\)
Observed frequency | 35 | 48 | 43 | 15 | 6 | 3 | 0
Expected frequency | 40.874 | \(p\) | 35.641 | 12.579 | 2.775 | 0.392 | \(q\)
  1. Find the values of \(p\) and \(q\).
  2. Carry out a goodness of fit test, at the \(5 \%\) significance level, to test whether the manager's claim is justified.
CAIE Further Paper 4 2022 November Q2
8 marks Standard +0.8
2 An organisation runs courses to train students to become engineers. These students are taught in groups of 8. The director of the organisation claims that on average \(60 \%\) of the students in a group achieve a pass. A random sample of 150 groups of 8 students is chosen. The following table shows the observed frequencies together with some of the expected frequencies using the appropriate binomial distribution.
Number of passes per group | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Observed frequency | 0 | 0 | 8 | 24 | 45 | 36 | 26 | 10 | 1
Expected frequency | \(p\) | 1.180 | 6.193 | 18.579 | 34.836 | \(q\) | \(r\) | 13.437 | 2.519
  1. Find the values of \(p , q\) and \(r\) giving your answers correct to 3 decimal places.
  2. Carry out a goodness of fit test, at the \(10 \%\) significance level, to test whether there is evidence to reject the director's claim.
CAIE Further Paper 4 2024 November Q3
10 marks Standard +0.3
3 Rosie sows 5 seeds in each of 150 plant pots. The number of seeds that germinate is recorded for each pot. The results are summarised in the following table.
Number of seeds that germinate | 0 | 1 | 2 | 3 | 4 | 5
Number of pots | 12 | 40 | 43 | 35 | 16 | 4
Rosie suggests that the number of seeds that germinate follows the binomial distribution \(\mathrm { B } ( 5 , p )\).
  1. Use Rosie's results to show that \(p = 0.42\).
  2. Carry out a goodness of fit test, at the \(10 \%\) significance level, to test whether the distribution \(\mathrm { B } ( 5,0.42 )\) is a good fit for the data.
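When \(p\) is not given, it is usually estimated from the sample mean, and each estimated parameter costs one degree of freedom. A minimal sketch for Rosie's data, assuming the hypothetical helper name `estimate_p` and the standard rule that degrees of freedom equal classes minus 1 minus the number of estimated parameters:

```python
def estimate_p(frequencies, n):
    """Estimate p for B(n, p) as (sample mean) / n, with frequencies indexed by x = 0..n."""
    total = sum(frequencies)
    mean = sum(x * f for x, f in enumerate(frequencies)) / total
    return mean / n

pots = [12, 40, 43, 35, 16, 4]    # number of pots with 0..5 seeds germinating
p_hat = estimate_p(pots, 5)
print(round(p_hat, 2))            # 0.42, matching part 1

# Degrees of freedom = classes after pooling - 1 - parameters estimated from the data.
# Assumption: x = 4 and x = 5 are pooled (expected frequency for x = 5 is below 5),
# leaving 5 classes.
classes_after_pooling = 5
df = classes_after_pooling - 1 - 1
print(df)                         # 3
```

The sample mean here is 315/150 = 2.1, so the estimate is 2.1/5 = 0.42.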
OCR S3 2015 June Q6
13 marks Standard +0.3
6 In each of 38 randomly selected weeks of the English Premier Football League there were 10 matches. Table 1 summarises the number of home wins in 10 matches, \(X\), and the corresponding number of weeks.
Number of home wins | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Number of weeks | 0 | 1 | 2 | 8 | 8 | 9 | 7 | 1 | 2 | 0 | 0
Table 1
A researcher investigates whether \(X\) can be modelled by the distribution \(\mathrm { B } ( 10 , p )\). He calculates the expected frequencies using a value of \(p\) obtained from the sample mean.
  1. Show that \(p = 0.45\).
    Table 2 shows the observed and expected number of weeks.
    Number of home wins | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Totals
    Observed number of weeks | 0 | 1 | 2 | 8 | 8 | 9 | 7 | 1 | 2 | 0 | 0 | 38
    Expected number of weeks | 0.096 | 0.788 | 2.899 | 6.326 | 9.058 | 8.893 | 6.064 | 2.835 | 0.870 | 0.158 | 0.013 | 38
    Table 2
  2. Show how the value of 2.835 for 7 home wins is obtained.
    The researcher carries out a test, at the \(5 \%\) significance level, of whether the distribution \(\mathrm { B } ( 10 , p )\) fits the data.
  • Explain why it is necessary to combine classes.
  • Carry out the test.
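One way to see why classes must be combined is to pool from each tail until every end class has an expected frequency of at least 5. The sketch below does this for the Table 2 values. `pool_tails` is a hypothetical helper, the threshold of 5 is the usual rule of thumb, and only tail classes are merged (an assumption that suits a single-peaked binomial table).

```python
def pool_tails(observed, expected, minimum=5.0):
    """Merge classes inward from each end until both end classes have expected >= minimum."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:   # upper tail
        e, o = exp.pop(), obs.pop()
        exp[-1] += e
        obs[-1] += o
    while len(exp) > 1 and exp[0] < minimum:    # lower tail
        e, o = exp.pop(0), obs.pop(0)
        exp[0] += e
        obs[0] += o
    return obs, exp

observed = [0, 1, 2, 8, 8, 9, 7, 1, 2, 0, 0]
expected = [0.096, 0.788, 2.899, 6.326, 9.058, 8.893, 6.064, 2.835, 0.870, 0.158, 0.013]

obs, exp = pool_tails(observed, expected)
stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))

# One parameter (p) was estimated from the sample mean, so subtract 2 from the class count.
df = len(exp) - 1 - 1
print(obs, [round(e, 3) for e in exp], round(stat, 3), df)
```

With these inputs the pooling leaves four classes (at most 3, 4, 5, at least 6 home wins) with observed counts 11, 8, 9 and 10, a statistic of roughly 0.20, and 2 degrees of freedom, to be compared with the tabulated \(5 \%\) value 5.991.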
OCR S3 2009 January Q8
14 marks Standard +0.3
    8 A soft drinks factory produces lemonade which is sold in packs of 6 bottles. As part of the factory's quality control, random samples of 75 packs are examined at regular intervals. The number of underfilled bottles in a pack of 6 bottles is denoted by the random variable \(X\). The results of one quality control check are shown in the following table.
    Number of underfilled bottles | 0 | 1 | 2 | 3
    Number of packs | 44 | 20 | 8 | 3
    A researcher assumes that \(X \sim \mathrm { B } ( 3 , p )\).
    1. By finding the sample mean, show that an estimate of \(p\) is 0.2.
    2. Show that, at the \(5 \%\) significance level, there is evidence that this binomial distribution does not fit the data.
    3. Another researcher suggests that the goodness of fit test should be for \(\mathrm { B } ( 6 , p )\). She finds that the corresponding value of \(\chi ^ { 2 }\) is 2.74, correct to 3 significant figures. Given that the number of degrees of freedom is the same as in part (ii), state the conclusion of the test at the same significance level.
    OCR MEI S3 2010 January Q1
    17 marks Standard +0.3
    1 Coastal wildlife wardens are monitoring populations of herring gulls. Herring gulls usually lay 3 eggs per nest and the wardens wish to model the number of eggs per nest that hatch. They assume that the situation can be modelled by the binomial distribution \(\mathrm { B } ( 3 , p )\) where \(p\) is the probability that an egg hatches. A random sample of 80 nests each containing 3 eggs has been observed with the following results.
    Number of eggs hatched | 0 | 1 | 2 | 3
    Number of nests | 7 | 23 | 29 | 21
    1. Initially it is assumed that the value of \(p\) is \(\frac { 1 } { 2 }\). Test at the \(5 \%\) level of significance whether it is reasonable to suppose that the model applies with \(p = \frac { 1 } { 2 }\).
    2. The model is refined by estimating \(p\) from the data. Find the mean of the observed data and hence an estimate of \(p\).
    3. Using the estimated value of \(p\), the value of the test statistic \(X ^ { 2 }\) turns out to be 2.3857. Is it reasonable to suppose, at the \(5 \%\) level of significance, that this refined model applies?
    4. Discuss the reasons for the different outcomes of the tests in parts (i) and (iii).
    CAIE FP2 2016 June Q9
    10 marks Standard +0.3
    9 Applicants for a national teacher training course are required to pass a mathematics test. Each year, the applicants are tested in groups of 6 and the number of successful applicants in each group is recorded. The overall proportion of successful applicants has remained constant over the years and is equal to \(60 \%\) of the applicants. The results from 150 randomly chosen groups are shown in the following table.
    Number of successful applicants | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Number of groups | 1 | 3 | 25 | 51 | 38 | 30 | 2
    Test, at the \(5 \%\) significance level, the goodness of fit of the distribution \(\mathbf { B } ( 6,0.6 )\) for the number of successful applicants in a group.
    CAIE FP2 2018 June Q11 OR
    Standard +0.8
    A scientist carries out an experiment to investigate the quantity \(X\), which takes the values \(0, 1, 2, 3, 4, 5\) or \(6\). He believes that the values taken by \(X\) follow a binomial distribution. He conducts 250 trials. His results are summarised in the following table.
    \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Observed frequency | 22 | 83 | 72 | 53 | 17 | 3 | 0
    1. Show that unbiased estimates of the mean and variance for these results are 1.876 and 1.266 respectively, correct to 3 decimal places. By evaluating the mean and variance of the distribution B(6, 0.313), explain why \(X\) could have this distribution.
      The expected frequencies corresponding to the distribution \(\mathrm { B } ( 6,0.313 )\) are shown in the following table.
      \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6
      Observed frequency | 22 | 83 | 72 | 53 | 17 | 3 | 0
      Expected frequency | 26.3 | 71.9 | 81.8 | 49.7 | 17.0 | 3.1 | 0.2
    2. Show how the expected frequency for \(x = 4\) is calculated.
    3. Test at the \(5 \%\) significance level whether the scientist's belief is correct.
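    A single expected frequency of this kind is the number of trials multiplied by the corresponding binomial probability. As a sketch for \(x = 4\), using the stated distribution \(\mathrm { B } ( 6,0.313 )\) and 250 trials (the symbol \(E_4\) is introduced here for illustration only):
    \[ E_4 = 250 \times \binom{6}{4} (0.313)^{4} (0.687)^{2} \approx 250 \times 0.0679 \approx 17.0 \]
    This agrees with the tabulated value of 17.0.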
    CAIE FP2 2012 November Q8
    9 marks Challenging +1.2
    8 Drinking glasses are sold in packs of 4. The manufacturer conducts a survey to assess the quality of the glasses. The results from a sample of 50 randomly chosen packs are summarised in the following table.
    Number of perfect glasses | 0 | 1 | 2 | 3 | 4
    Number of packs | 1 | 3 | 10 | 17 | 19
    Fit a binomial distribution to the data and carry out a goodness of fit test at the \(10 \%\) significance level.
    CAIE FP2 2013 November Q8
    10 marks Standard +0.8
    8 A factory produces china mugs. Random samples of size 6 are selected at regular intervals, and the mugs are inspected for defects. During one week, 100 samples are selected and the numbers of defective mugs found are summarised in the following table.
    Number of defective mugs | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Number of samples | 11 | 43 | 35 | 8 | 2 | 1 | 0
    Fit a binomial distribution to the data and carry out a goodness of fit test at the 5\% significance level.
    Edexcel S3 2023 January Q4
    14 marks Standard +0.3
    4 A research student is investigating the number of children who are girls in families with 4 children. The table below shows her results for 200 such families.
    Number of girls | 0 | 1 | 2 | 3 | 4
    Frequency | 15 | 68 | 69 | 38 | 10
    The research student suggests that a binomial distribution with \(p = \frac { 1 } { 2 }\) could be a suitable model for the number of children who are girls in a family of 4 children.
    1. Using her results and a \(5 \%\) significance level, test the research student's claim. You should state your hypotheses, expected frequencies, test statistic and the critical value used.
      The research student decides to refine the model and retains the idea of using a binomial distribution but does not specify the probability that the child is a girl.
    2. Use the data in the table to show that the probability that a child is a girl is 0.45.
      The research student uses the probability from part (b) to calculate a new set of expected frequencies, none of which are less than 5.
      The statistic \(\sum \frac { ( O - E ) ^ { 2 } } { E }\) is evaluated and found to be 2.47.
    3. Test, at the \(5 \%\) significance level, whether using a binomial distribution is suitable to model the number of children who are girls in a family of 4 children. You should state your hypotheses and the critical value used.
    Edexcel S3 2014 June Q6
    14 marks Standard +0.3
    6. Eight tasks were given to each of 125 randomly selected job applicants. The number of tasks failed by each applicant is recorded. The results are as follows
    Number of tasks failed by an applicant | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more
    Frequency | 2 | 21 | 45 | 42 | 12 | 3 | 0
    1. Show that the probability of a randomly selected task, from this sample, being failed is 0.3.
      An employer believes that a binomial distribution might provide a good model for the number of tasks, out of 8, that an applicant fails. He uses a binomial distribution, with the estimated probability 0.3 of a task being failed. The calculated expected frequencies are as follows.
      Number of tasks failed by an applicant | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more
      Expected frequency | 7.21 | 24.71 | 37.06 | \(r\) | 17.02 | 5.83 | \(s\)
    2. Find the value of \(r\) and the value of \(s\) giving your answers to 2 decimal places.
    3. Test, at the \(5 \%\) level of significance, whether or not a binomial distribution is a suitable model for these data. State your hypotheses and show your working clearly.
      The employer believes that all applicants have the same probability of failing each task.
    4. Use your result from part (c) to comment on this belief.
    Edexcel S3 2018 June Q2
    12 marks Standard +0.3
    2. A random sample of 75 packets of seeds is selected from a production line. Each packet contains 12 seeds. The seeds are planted and the number of seeds that germinate from each packet is recorded. The results are as follows.
    Number of seeds that germinate from each packet | 6 or fewer | 7 | 8 | 9 | 10 | 11 | 12
    Number of packets | 0 | 3 | 5 | 18 | 28 | 17 | 4
    1. Show that the probability of a randomly selected seed from this sample germinating is 0.82.
      A gardener suggests that a binomial distribution can be used to model the number of seeds that germinate from a packet of 12 seeds. She uses a binomial distribution with the estimated probability 0.82 of a seed germinating. Some of the calculated expected frequencies are shown in the table below.
      Number of seeds that germinate from each packet | 6 or fewer | 7 | 8 | 9 | 10 | 11 | 12
      Expected frequency | \(s\) | 2.80 | 7.97 | \(r\) | 22.04 | 18.26 | 6.93
    2. Calculate the value of \(r\) and the value of \(s\), giving your answers correct to 2 decimal places.
    3. Test, at the \(10 \%\) level of significance, whether or not these data suggest that the binomial distribution is a suitable model for the number of seeds that germinate from a packet of 12 seeds. State your hypotheses clearly and show your working.
    Edexcel S3 2021 June Q5
    16 marks Standard +0.3
    5. A researcher is looking into the effectiveness of a new medicine for the relief of symptoms. He collects random samples of 8 people who are taking the medicine from each of 50 different medical practices. The number of people who say that the medicine is a success, in each sample, is recorded. The results are summarised in the table below.
    Number of successes | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
    Number of practices | 4 | 6 | 3 | 12 | 10 | 7 | 4 | 2 | 2
    The researcher decides to model this data using a binomial distribution.
    1. State two necessary assumptions that the researcher made in order to use this model.
    2. Show that the mean number of successes per sample is 3.54.
      He decides to use this mean to calculate expected frequencies. The results are shown in the table below.
      Number of successes | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
      Expected frequency | 0.47 | 2.96 | 8.23 | 13.07 | \(f\) | 8.23 | 3.27 | 0.74 | \(g\)
    3. Calculate the value of \(f\) and the value of \(g\). Give your answers to 2 decimal places.
    4. Stating your hypotheses clearly, test at the \(10 \%\) level of significance, whether or not the binomial distribution is a suitable model for the number of successes in samples of 8 people.
    Edexcel S3 2004 June Q6
    15 marks Standard +0.3
    6 Three six-sided dice, which were assumed to be fair, were rolled 250 times. On each occasion the number \(X\) of sixes was recorded. The results were as follows.
    Number of sixes | 0 | 1 | 2 | 3
    Frequency | 125 | 109 | 13 | 3
    1. Write down a suitable model for \(X\).
    2. Test, at the \(1 \%\) level of significance, the suitability of your model for these data.
    3. Explain how the test would have been modified if it had not been assumed that the dice were fair.
    Edexcel S3 2007 June Q4
    13 marks Standard +0.3
    4. A quality control manager regularly samples 20 items from a production line and records the number of defective items \(x\). The results of 100 such samples are given in Table 1 below.
    \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more
    Frequency | 17 | 31 | 19 | 14 | 9 | 7 | 3 | 0
    Table 1
    1. Estimate the proportion of defective items from the production line.
      The manager claimed that the number of defective items in a sample of 20 can be modelled by a binomial distribution. He used the answer in part (a) to calculate the expected frequencies given in Table 2.
      \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more
      Expected frequency | 12.2 | 27.0 | \(r\) | 19.0 | \(s\) | 3.2 | 0.9 | 0.2
      Table 2
    2. Find the value of \(r\) and the value of \(s\) giving your answers to 1 decimal place.
    3. Stating your hypotheses clearly, use a \(5 \%\) level of significance to test the manager's claim.
    4. Explain what the analysis in part (c) tells the manager about the occurrence of defective items from this production line.
    Edexcel S3 2008 June Q6
    13 marks Standard +0.3
    6. Ten cuttings were taken from each of 100 randomly selected garden plants. The numbers of cuttings that did not grow were recorded.
    The results are as follows.
    No. of cuttings which did not grow | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8, 9 or 10
    Frequency | 11 | 21 | 30 | 20 | 12 | 3 | 2 | 1 | 0
    1. Show that the probability of a randomly selected cutting, from this sample, not growing is 0.223.
      A gardener believes that a binomial distribution might provide a good model for the number of cuttings, out of 10, that do not grow. He uses a binomial distribution, with the probability 0.2 of a cutting not growing. The calculated expected frequencies are as follows.
      No. of cuttings which did not grow | 0 | 1 | 2 | 3 | 4 | 5 or more
      Expected frequency | \(r\) | 26.84 | \(s\) | 20.13 | 8.81 | \(t\)
    2. Find the values of \(r , s\) and \(t\).
    3. State clearly the hypotheses required to test whether or not this binomial distribution is a suitable model for these data.
      The test statistic for the test is 4.17 and the number of degrees of freedom used is 4.
    4. Explain fully why there are 4 degrees of freedom.
    5. Stating clearly the critical value used, carry out the test using a \(5 \%\) level of significance.
    Edexcel S3 2012 June Q6
    14 marks Standard +0.3
    6. A total of 100 random samples of 6 items are selected from a production line in a factory and the number of defective items in each sample is recorded. The results are summarised in the table below.
    Number of defective items | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Number of samples | 6 | 16 | 20 | 23 | 17 | 10 | 8
    1. Show that the mean number of defective items per sample is 2.91.
      A factory manager suggests that the data can be modelled by a binomial distribution with \(n = 6\). He uses the mean from the sample above and calculates expected frequencies as shown in the table below.
      Number of defective items | 0 | 1 | 2 | 3 | 4 | 5 | 6
      Expected frequency | 1.87 | 10.54 | 24.82 | \(a\) | 22.01 | 8.29 | \(b\)
    2. Calculate the value of \(a\) and the value of \(b\) giving your answers to 2 decimal places.
    3. Test, at the \(5 \%\) level, whether or not the binomial distribution is a suitable model for the number of defective items in samples of 6 items. State your hypotheses clearly.
    Edexcel S3 2014 June Q6
    17 marks Standard +0.3
    6. Bags of \(\pounds 1\) coins are paid into a bank. Each bag contains 20 coins. The bank manager believes that \(5 \%\) of the \(\pounds 1\) coins paid into the bank are fakes. He decides to use the distribution \(X \sim \mathrm { B } ( 20,0.05 )\) to model the random variable \(X\), the number of fake \(\pounds 1\) coins in each bag.
    1. State the assumptions necessary for the binomial distribution to be an appropriate model in this case.
      The bank manager checks a random sample of 150 bags of \(\pounds 1\) coins and records the number of fake coins found in each bag. His results are summarised in Table 1.
      Number of fake coins in each bag | 0 | 1 | 2 | 3 | 4 or more
      Observed frequency | 43 | 62 | 26 | 13 | 6
      Expected frequency | 53.8 | 56.6 | \(r\) | 8.9 | \(s\)
      Table 1
    2. Calculate the values of \(r\) and \(s\), giving your answers to 1 decimal place.
    3. Carry out a hypothesis test, at the \(5 \%\) significance level, to see if the data supports the bank manager's statistical model. State your hypotheses clearly.
      The assistant manager thinks that a binomial distribution is a good model but suggests that the proportion of fake coins is higher than \(5 \%\). She calculates the actual proportion of fake coins in the sample and uses this value to carry out a new hypothesis test on the data. Her expected frequencies are shown in Table 2.
      Number of fake coins in each bag | 0 | 1 | 2 | 3 | 4 or more
      Observed frequency | 43 | 62 | 26 | 13 | 6
      Expected frequency | 44.5 | 55.7 | 33.2 | 12.5 | 4.1
      Table 2
    4. Explain why there are 2 degrees of freedom in this case.
    5. Given that she obtains a \(\chi ^ { 2 }\) test statistic of 2.67, test the assistant manager's hypothesis that the binomial distribution is a good model for the number of fake coins in each bag. Use a \(5 \%\) level of significance and state your hypotheses clearly.
    Edexcel S3 2014 June Q5
    13 marks Standard +0.3
    5. A research station is doing some work on the germination of a new variety of genetically modified wheat. They planted 120 rows containing 7 seeds in each row.
    The number of seeds germinating in each row was recorded. The results are as follows
    Number of seeds germinating in each row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
    Observed number of rows | 2 | 6 | 11 | 19 | 25 | 32 | 16 | 9
    1. Write down two reasons why a binomial distribution may be a suitable model.
    2. Show that the probability of a randomly selected seed from this sample germinating is 0.6.
      The research station used a binomial distribution with probability 0.6 of a seed germinating. The expected frequencies were calculated to 2 decimal places. The results are as follows.
      Number of seeds germinating in each row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
      Expected number of rows | 0.20 | 2.06 | \(s\) | 23.22 | \(t\) | 31.35 | 15.68 | 3.36
    3. Find the value of \(s\) and the value of \(t\).
    4. Stating your hypotheses clearly, test, at the \(1 \%\) level of significance, whether or not the data can be modelled by a binomial distribution.
    Edexcel S3 2018 June Q6
    18 marks Standard +0.3
    6. David carries out an experiment with 4 identical dice, each with faces numbered 1 to 6. He rolls the 4 dice and counts the number of dice showing an even number on the uppermost face. He repeats this 150 times. The results are summarised in the table below.
    No. of dice showing an even number | 0 | 1 | 2 | 3 | 4
    Frequency | 12 | 45 | 36 | 39 | 18
    David defines the random variable \(C\) as the number of dice showing an even number on the uppermost face when the four dice are thrown. David claims that \(C \sim \mathrm { B } ( 4,0.5 )\).
    1. Stating your hypotheses clearly and using a \(1 \%\) level of significance, test David's claim. Show your working clearly.
      John claims that \(C \sim \mathrm { B } ( 4 , p )\).
    2. Calculate an estimate of the value of \(p\) from the summary of the results of David's experiment. Show your working clearly.
      John decides to test his claim. He calculates expected frequencies using the results of David's experiment and obtains the following table.
      No. of dice showing an even number | 0 | 1 | 2 | 3 | 4
      Expected frequency | 8.65 | 36.00 | \(d\) | 39.00 | \(e\)
    3. Calculate, to 2 decimal places, the value of \(d\) and the value of \(e\).
    4. State suitable hypotheses to test John's claim.
      John obtained a test statistic of 16.9 and carries out a test at the \(1 \%\) level of significance.
    5. State what conclusion John should make about his claim.
    Edexcel S3 Q7
    16 marks Standard +0.3
    7. A student collects data on whether competitors in local tennis tournaments are right, or left-handed. The table below shows the number of left-handed players who reached the last 16 for fifty tournaments.
    No. of Left-handed Players | 0 | 1 | 2 | 3 | 4 | \(\geq 5\)
    No. of Tournaments | 4 | 12 | 18 | 11 | 5 | 0
    The student believes that a binomial distribution with \(n = 16\) and \(p = 0.1\) could be a suitable model for these data.
    1. Stating your hypotheses clearly, test the student's model at the \(5 \%\) level of significance.
      (13 marks)
      To improve the model the student decides to estimate \(p\) using the data in the table. Using this value of \(p\) to calculate expected frequencies, the student had 5 classes after combining and calculated that \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 2.127\).
    2. Test at the \(5 \%\) level of significance whether or not the binomial distribution is a suitable model for the number of left-handed players who reach the last 16 in local tennis tournaments.
    Edexcel S3 Q4
    12 marks Standard +0.3
    4. A paranormal investigator invites couples who believe they have a telepathic connection to participate in a trial. With each couple one person looks at a card with one of five shapes on it and the other person says which of the shapes they think it is. This is repeated six times and the number of correct answers recorded. The results from 120 couples are given below.
    Number Correct | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Number of Couples | 26 | 56 | 28 | 8 | 2 | 0 | 0
    The investigator wishes to see if this data fits a binomial distribution with parameters \(n = 6\) and \(p = \frac { 1 } { 5 }\) and calculates to 2 decimal places the expected frequencies given below.
    Number Correct | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Expected Frequency |  |  |  | 9.83 | 1.84 | 0.18 | 0.01
    1. Find the other expected frequencies.
    2. Stating your hypotheses clearly, test at the \(5 \%\) level of significance whether or not the distribution is an appropriate model.
    3. Comment on your findings.
    OCR MEI Further Statistics A AS 2019 June Q5
    13 marks Standard +0.3
    5 A researcher is investigating births of females and males in a particular species of animal which very often produces litters of 7 offspring.
    The table shows some data about the number of females per litter in 200 litters of 7 offspring. The researcher thinks that a binomial distribution \(\mathrm { B } ( 7 , p )\) may be an appropriate model for these data.
    (c) Complete the test at the \(5 \%\) significance level.
    Fig. 5 shows the probability distribution \(\mathrm { B } ( 7,0.35 )\) together with the relative frequencies of the observed data (the numbers of litters each divided by 200).
    [Fig. 5]
    (d) Comment on the result of the test completed in part (c) by considering Fig. 5.