5.06b Fit prescribed distribution: chi-squared test

136 questions

Edexcel FS1 AS 2023 June Q4
12 marks Standard +0.3
  1. Table 1 below shows the number of car breakdowns in the Snoreap district in each of 60 months.
\begin{table}[h]
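\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
Number of car breakdowns & 0 & 1 & 2 & 3 & 4 & 5 \\
\hline
Frequency & 12 & 11 & 19 & 14 & 3 & 1 \\
\hline
\end{tabular}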
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
Anja believes that the number of car breakdowns per month in Snoreap can be modelled by a Poisson distribution. Table 2 below shows the results of some of her calculations.
\begin{table}[h]
\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
Number of car breakdowns & 0 & 1 & 2 & 3 & 4 & \(\geqslant 5\) \\
\hline
Observed frequency (\(O_i\)) & 12 & 11 & 19 & 14 & 3 & 1 \\
\hline
Expected frequency (\(E_i\)) & 9.92 &  &  & 9.64 & 4.34 &  \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 2}
\end{table}
  1. State suitable hypotheses for a test to investigate Anja's belief.
  2. Explain why Anja has changed the label of the final column to \(\geqslant 5\)
  3. Showing your working clearly, complete Table 2
  4. Find the value of \(\frac { \left( O _ { i } - E _ { i } \right) ^ { 2 } } { E _ { i } }\) when the number of car breakdowns is
    1. 1
    2. 3
  5. Explain why Anja used 3 degrees of freedom for her test.
The test statistic for Anja's test is 6.54 to 2 decimal places.
  6. Stating the critical value and using a \(5 \%\) level of significance, complete Anja's test.
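As an illustrative sketch (not part of the original question), the expected frequencies in Table 2 can be reproduced in Python. It assumes the Poisson mean is estimated from Table 1 as the sample mean, and that the last two classes are pooled so that no expected frequency falls below 5, which is the usual convention rather than something the question states. The names `poisson_pmf`, `obs_pooled` and `exp_pooled` are hypothetical.

```python
import math

# Observed frequencies from Table 1 (0..5 breakdowns over 60 months).
observed = [12, 11, 19, 14, 3, 1]
n = sum(observed)                                        # 60 months
mean = sum(k * f for k, f in enumerate(observed)) / n    # sample mean

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Expected frequencies for 0..4, with the final class taken as ">= 5".
expected = [n * poisson_pmf(k, mean) for k in range(5)]
expected.append(n - sum(expected))

# Pool the last two classes so that no expected frequency is below 5.
obs_pooled = observed[:4] + [observed[4] + observed[5]]
exp_pooled = expected[:4] + [expected[4] + expected[5]]

# Chi-squared statistic: sum of (O - E)^2 / E over the pooled classes.
statistic = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))

# One parameter (the mean) was estimated, so df = classes - 1 - 1.
df = len(obs_pooled) - 1 - 1

print([round(e, 2) for e in expected], round(statistic, 2), df)
```

Because the sketch keeps unrounded expected frequencies, its statistic may differ in the second decimal place from a value worked from the rounded table entries.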
Edexcel FS1 AS 2024 June Q4
15 marks Standard +0.3
  1. Robin shoots 8 arrows at a target each day for 100 days.
The number of times he hits the target each day is summarised in the table below.
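\begin{tabular}{|l|c|c|c|c|c|c|c|c|c|}
\hline
Number of hits & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\
\hline
Frequency & 1 & 10 & 30 & 34 & 17 & 4 & 2 & 0 & 2 \\
\hline
\end{tabular}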
Misha believes that these data can be modelled by a binomial distribution.
  1. State, in context, two assumptions that are implied by the use of this model.
  2. Find an estimate for the proportion of arrows Robin shoots that hit the target.
    Misha calculates expected frequencies, to 2 decimal places, as follows.
    \begin{tabular}{|l|c|c|c|c|c|c|c|c|c|}
    \hline
    Number of hits & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\
    \hline
    Expected frequency & 2.81 & 12.67 & \(r\) & 28.05 & 19.73 & \(s\) & 2.50 & 0.40 & 0.03 \\
    \hline
    \end{tabular}
  3. Find the value of \(r\) and the value of \(s\)
    Misha correctly used a suitable test to assess her belief.
    1. Explain why she used a test with 3 degrees of freedom.
    2. Complete the test using a \(5 \%\) level of significance. You should clearly state your hypotheses, test statistic, critical value and conclusion.
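The estimate of the proportion and the expected frequencies can be checked with a short Python sketch. It is an illustration only, assuming the proportion is estimated as total hits divided by total arrows shot; `binomial_pmf` and `p_hat` are hypothetical names.

```python
from math import comb

# Observed frequencies for 0..8 hits on each of 100 days.
observed = [1, 10, 30, 34, 17, 4, 2, 0, 2]
days = sum(observed)
arrows_per_day = 8

# Estimate p as total hits divided by total arrows shot.
total_hits = sum(k * f for k, f in enumerate(observed))
p_hat = total_hits / (days * arrows_per_day)

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Expected frequency for each number of hits under B(8, p_hat).
expected = [days * binomial_pmf(k, arrows_per_day, p_hat)
            for k in range(arrows_per_day + 1)]

# r and s are the expected frequencies for 2 hits and 5 hits.
r, s = expected[2], expected[5]
print(p_hat, round(r, 2), round(s, 2))
```

Since \(p\) is estimated from the data, two constraints apply when counting degrees of freedom: the fixed total and the estimated parameter.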
Edexcel FS1 AS Specimen Q4
11 marks Standard +0.3
  1. The discrete random variable \(X\) follows a Poisson distribution with mean 1.4
    1. Write down the value of
      1. \(\mathrm { P } ( \mathrm { X } = 1 )\)
      2. \(\mathrm { P } ( \mathrm { X } \leqslant 4 )\)
    The manager of a bank recorded the number of mortgages approved each week over a 40 week period.
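    \begin{tabular}{|l|c|c|c|c|c|c|c|}
    \hline
    Number of mortgages approved & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\
    \hline
    Frequency & 10 & 16 & 7 & 4 & 2 & 0 & 1 \\
    \hline
    \end{tabular}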
  2. Show that the mean number of mortgages approved over the 40 week period is 1.4
    The bank manager believes that the Poisson distribution may be a good model for the number of mortgages approved each week. She uses a Poisson distribution with a mean of 1.4 to calculate expected frequencies as follows.
    \begin{tabular}{|l|c|c|c|c|c|c|}
    \hline
    Number of mortgages approved & 0 & 1 & 2 & 3 & 4 & 5 or more \\
    \hline
    Expected frequency & 9.86 & \(r\) & 9.67 & 4.51 & 1.58 & \(s\) \\
    \hline
    \end{tabular}
  3. Find the value of \(r\) and the value of \(s\), giving your answers to 2 decimal places.
    The bank manager will test, at the \(5 \%\) level of significance, whether or not the data can be modelled by a Poisson distribution.
  4. Calculate the test statistic and state the conclusion for this test. State clearly the degrees of freedom and the hypotheses used in the test.
Edexcel FS1 2019 June Q4
19 marks Standard +0.3
  1. Liam and Simone are studying the distribution of oak trees in some woodland. They divided the woodland into 80 equal squares and recorded the number of oak trees in each square. The results are summarised in Table 1 below.
\begin{table}[h]
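\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Number of oak trees in a square & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 or more \\
\hline
Frequency & 1 & 4 & 21 & 23 & 13 & 11 & 7 & 0 \\
\hline
\end{tabular}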
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table} Liam believes that the oak trees were deliberately planted, with 6 oak trees per square and that a constant proportion \(p\) of the oak trees survived.
  1. Suggest the model Liam should use to describe the number of oak trees per square.
Liam decides to test whether or not his model is suitable and calculates the expected frequencies given in Table 2.
\begin{table}[h]
    \begin{tabular}{|l|c|c|c|c|c|c|}
    \hline
    Number of oak trees in a square & 0 or 1 & 2 & 3 & 4 & 5 & 6 \\
    \hline
    Expected frequency & 5.53 & 14.89 & 24.26 & 22.24 & 10.87 & 2.21 \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 2}
    \end{table}
  2. Showing your working clearly, complete the test using a \(5 \%\) level of significance. You should state your critical value and conclusion clearly.
Simone believes that a Poisson distribution could be used to model the number of oak trees per square. She calculates the expected frequencies given in Table 3.
\begin{table}[h]
    \begin{tabular}{|l|c|c|c|c|c|c|}
    \hline
    Number of oak trees in a square & 0 or 1 & 2 & 3 & 4 & 5 & 6 or more \\
    \hline
    Expected frequency & 12.69 & 16.07 & \(s\) & 14.58 & \(t\) & 9.37 \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 3}
    \end{table}
  3. Find the value of \(s\) and the value of \(t\), giving your answers to 2 decimal places.
  4. Write down hypotheses to test the suitability of Simone's model. The test statistic for this test is 8.749
  5. Complete the test. Use a \(5 \%\) level of significance and state your critical value and conclusion clearly.
  6. Using the results of these tests, explain whether the origin of this woodland is likely to be cultivated or wild.
Edexcel FS1 2020 June Q5
13 marks Standard +0.3
  1. A factory produces pins.
An engineer selects 40 independent random samples of 6 pins produced at the factory and records the number of defective pins in each sample.
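\begin{tabular}{|l|c|c|c|c|c|c|c|}
\hline
Number of defective pins & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\
\hline
Observed frequency & 19 & 11 & 7 & 2 & 0 & 1 & 0 \\
\hline
\end{tabular}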
  1. Show that the proportion of defective pins in the 40 samples is 0.15
The engineer suggests that the number of defective pins in a sample of 6 can be modelled using a binomial distribution. Using the information from the sample above, a test is to be carried out at the \(10 \%\) significance level, to see whether the data are consistent with the engineer's suggested model. The value of the test statistic for this test is 2.689
  2. Justifying the degrees of freedom used, carry out the test, at the \(10 \%\) significance level, to see whether the data are consistent with the engineer's suggested model. State your hypotheses clearly.
The engineer later discovers that the previously recorded information was incorrect. The data should have been as follows.
    \begin{tabular}{|l|c|c|c|c|c|c|c|}
    \hline
    Number of defective pins & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\
    \hline
    Observed frequency & 19 & 11 & 6 & 3 & 1 & 0 & 0 \\
    \hline
    \end{tabular}
  3. Describe the effect this would have on the value of the test statistic that should be used for the hypothesis test.
    Give reasons for your answer.
Edexcel FS1 2021 June Q1
7 marks Moderate -0.3
  1. Kelly throws a tetrahedral die \(n\) times and records the number on which it lands for each throw.
She calculates the expected frequency for each number to be 43 if the die was unbiased.
The table below shows three of the frequencies Kelly records but the fourth one is missing.
\begin{tabular}{|l|c|c|c|c|}
\hline
Number & 1 & 2 & 3 & 4 \\
\hline
Frequency & 47 & 34 & 36 & \(x\) \\
\hline
\end{tabular}
  1. Show that \(x = 55\)
Kelly wishes to test, at the \(5 \%\) level of significance, whether or not there is evidence that the tetrahedral die is unbiased.
  2. Explain why there are 3 degrees of freedom for this test.
  3. Stating your hypotheses clearly and the critical value used, carry out the test.
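A minimal Python sketch of the arithmetic behind this question, offered as an illustration rather than a model answer. It assumes a discrete uniform model with expected frequency 43 per face, and takes the critical value 7.815 (upper 5% point of \(\chi^2\) with 3 degrees of freedom) from standard tables.

```python
# Expected frequency per face if the die is unbiased.
expected_each = 43
n = 4 * expected_each                    # total number of throws

# Observed frequencies for faces 1..3; the fourth follows from the total.
observed = [47, 34, 36]
observed.append(n - sum(observed))       # the missing frequency x

# Chi-squared statistic against the discrete uniform model.
statistic = sum((o - expected_each) ** 2 / expected_each for o in observed)

# No parameter is estimated; the only constraint is the fixed total.
df = len(observed) - 1

# Upper 5% point of chi-squared with 3 df, taken from standard tables.
critical_value = 7.815
reject_h0 = statistic > critical_value

print(observed[-1], round(statistic, 3), df, reject_h0)
```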
Edexcel FS1 2022 June Q1
9 marks Standard +0.3
  1. A researcher is investigating the number of female cubs present in litters of size 4 He believes that the number of female cubs in a litter can be modelled by \(\mathrm { B } ( 4,0.5 )\) He randomly selects 100 litters each of size 4 and records the number of female cubs. The results are recorded in the table below.
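\begin{tabular}{|l|c|c|c|c|c|}
\hline
Number of female cubs & 0 & 1 & 2 & 3 & 4 \\
\hline
Observed number of litters & 10 & 33 & 33 & 15 & 9 \\
\hline
\end{tabular}
He calculated the expected frequencies as follows
\begin{tabular}{|l|c|c|c|c|c|}
\hline
Number of female cubs & 0 & 1 & 2 & 3 & 4 \\
\hline
Expected number of litters & 6.25 & \(r\) & \(s\) & \(r\) & 6.25 \\
\hline
\end{tabular}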
  1. Find the value of \(r\) and the value of \(s\)
  2. Carry out a suitable test, at the \(5 \%\) level of significance, to determine whether or not the number of female cubs in a litter can be modelled by \(\mathrm { B } ( 4,0.5 )\) You should clearly state your hypotheses and the critical value used.
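When both binomial parameters are fixed by the hypothesis, no parameter is estimated and the degrees of freedom are the number of classes minus one. The following Python sketch illustrates this for the data above; it assumes no pooling is needed because every expected frequency is at least 5, and takes the critical value 9.488 from standard tables.

```python
from math import comb

observed = [10, 33, 33, 15, 9]           # litters with 0..4 female cubs
n_litters = sum(observed)
size, p = 4, 0.5                         # parameters fixed by the hypothesis

# Expected number of litters for each count under B(4, 0.5).
expected = [n_litters * comb(size, k) * p ** k * (1 - p) ** (size - k)
            for k in range(size + 1)]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                   # p is given, not estimated

critical_value = 9.488                   # upper 5% point, 4 df, from tables
reject_h0 = statistic > critical_value

print(expected, round(statistic, 2), df, reject_h0)
```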
Edexcel FS1 2023 June Q3
15 marks Standard +0.8
  1. In a class experiment, each day for 170 days, a child is chosen at random and spins a large cardboard coin 5 times and the number of heads is recorded.
    The results are summarised in the following table.
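\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
Number of heads & 0 & 1 & 2 & 3 & 4 & 5 \\
\hline
Frequency & 3 & 10 & 45 & 62 & 38 & 12 \\
\hline
\end{tabular}
Marcus believes that a \(\mathrm { B } ( 5,0.5 )\) distribution can be used to model these data and he calculates expected frequencies, to 2 decimal places, as follows
\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
Number of heads & 0 & 1 & 2 & 3 & 4 & 5 \\
\hline
Expected frequency & \(r\) & 26.56 & \(s\) & \(s\) & 26.56 & \(r\) \\
\hline
\end{tabular}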
  1. Find the value of \(r\) and the value of \(s\)
  2. Carry out a suitable test, at the \(5 \%\) level of significance, to determine whether or not the \(\mathrm { B } ( 5,0.5 )\) distribution is a good model for these data.
    You should state clearly your hypotheses, the test statistic and the critical value used.
Nima believes that a better model for these data would be \(\mathrm { B } ( 5 , p )\)
  3. Find a suitable estimate for \(p\)
To test her model, Nima uses this value of \(p\) to calculate expected frequencies as follows
    \begin{tabular}{|l|c|c|c|c|c|c|}
    \hline
    Number of heads & 0 & 1 & 2 & 3 & 4 & 5 \\
    \hline
    Expected frequency & 2.07 & 14.65 & 41.44 & 58.63 & 41.47 & 11.74 \\
    \hline
    \end{tabular}
    The test statistic for Nima's test is 1.62 (to 3 significant figures)
  4. State,
    1. giving your reasons, the degrees of freedom
    2. the critical value
      that Nima should use for a test at the 5\% significance level.
  5. With reference to Marcus' and Nima's test results, comment on
    1. the probability of the coin landing on heads,
    2. the independence of the spins of the coin. Give reasons for your answers.
Edexcel FS1 Specimen Q3
14 marks Standard +0.8
  1. Bags of \(\pounds 1\) coins are paid into a bank. Each bag contains 20 coins.
The bank manager believes that \(5 \%\) of the \(\pounds 1\) coins paid into the bank are fakes. He decides to use the distribution \(X \sim B ( 20,0.05 )\) to model the random variable \(X\), the number of fake \(\pounds 1\) coins in each bag. The bank manager checks a random sample of 150 bags of \(\pounds 1\) coins and records the number of fake coins found in each bag. His results are summarised in Table 1. He then calculates some of the expected frequencies, correct to 1 decimal place. \begin{table}[h]
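\begin{tabular}{|l|c|c|c|c|c|}
\hline
Number of fake coins in each bag & 0 & 1 & 2 & 3 & 4 or more \\
\hline
Observed frequency & 43 & 62 & 26 & 13 & 6 \\
\hline
Expected frequency & 53.8 & 56.6 &  & 8.9 &  \\
\hline
\end{tabular}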
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
  1. Carry out a hypothesis test, at the \(5 \%\) significance level, to see if the data supports the bank manager's statistical model. State your hypotheses clearly. The assistant manager thinks that a binomial distribution is a good model but suggests that the proportion of fake coins is higher than \(5 \%\). She calculates the actual proportion of fake coins in the sample and uses this value to carry out a new hypothesis test on the data. Her expected frequencies are shown in Table 2. \begin{table}[h]
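    \begin{tabular}{|l|c|c|c|c|c|}
    \hline
    Number of fake coins in each bag & 0 & 1 & 2 & 3 & 4 or more \\
    \hline
    Observed frequency & 43 & 62 & 26 & 13 & 6 \\
    \hline
    Expected frequency & 44.5 & 55.7 & 33.2 & 12.5 & 4.1 \\
    \hline
    \end{tabular}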
    \captionsetup{labelformat=empty} \caption{Table 2}
    \end{table}
  2. Explain why there are 2 degrees of freedom in this case.
  3. Given that she obtains a \(\chi ^ { 2 }\) test statistic of 2.67 , test the assistant manager's hypothesis that the binomial distribution is a good model for the number of fake coins in each bag. Use a \(5 \%\) level of significance and state your hypotheses clearly.
OCR Further Statistics 2018 September Q5
8 marks Standard +0.3
5 Hal designs a 4-edged spinner with edges labelled 1, 2, 3 and 4. He intends that the probability that the spinner will land on any edge should be proportional to the number on that edge. He spins the spinner 20 times and on each spin he records the number of the edge on which it lands. The results are shown in the table.
\begin{tabular}{|l|c|c|c|c|}
\hline
Edge number & 1 & 2 & 3 & 4 \\
\hline
Frequency & 3 & 7 & 4 & 6 \\
\hline
\end{tabular}
Test at the \(10 \%\) significance level whether the results are consistent with the intended probabilities.
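A hedged Python sketch of the calculation follows. It assumes that edges 1 and 2 are pooled because their expected frequencies are below 5, which is one reasonable choice and not something the question prescribes. The critical value 4.605 is the upper 10% point of \(\chi^2\) with 2 degrees of freedom from standard tables.

```python
from fractions import Fraction

observed = [3, 7, 4, 6]                  # edges 1..4, 20 spins
n = sum(observed)

# Intended model: P(edge k) proportional to k, so P(k) = k / 10.
weights = [1, 2, 3, 4]
probs = [Fraction(w, sum(weights)) for w in weights]
expected = [n * p for p in probs]

# Edges 1 and 2 have expected frequencies below 5, so pool them.
obs_pooled = [observed[0] + observed[1], observed[2], observed[3]]
exp_pooled = [expected[0] + expected[1], expected[2], expected[3]]

statistic = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1                 # no parameters estimated

# Upper 10% point of chi-squared with 2 df, from standard tables.
critical_value = 4.605
reject_h0 = float(statistic) > critical_value

print(float(statistic), df, reject_h0)
```

Exact fractions are used so that the pooled expected frequencies stay whole numbers with no rounding error.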
CAIE FP2 2017 June Q10
12 marks Standard +0.3
Roberto owns a small hotel and offers accommodation to guests. Over a period of \(100\) nights, the numbers of rooms, \(x\), that are occupied each night at Roberto's hotel and the corresponding frequencies are shown in the following table.
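\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Number of rooms occupied \((x)\) & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \(\geqslant 7\) \\
\hline
Number of nights & 4 & 9 & 18 & 26 & 20 & 16 & 7 & 0 \\
\hline
\end{tabular}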
  1. Show that the mean number of rooms that are occupied each night is \(3.25\). [1]
The following table shows most of the corresponding expected frequencies, correct to \(2\) decimal places, using a Poisson distribution with mean \(3.25\).
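\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Number of rooms occupied \((x)\) & 0 & 1 & 2 & 3 & 4 & 5 & 6 & \(\geqslant 7\) \\
\hline
Observed frequency & 4 & 9 & 18 & 26 & 20 & 16 & 7 & 0 \\
\hline
Expected frequency & 3.88 & 12.60 & 20.48 & 22.18 & 18.02 & 11.72 &  &  \\
\hline
\end{tabular}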
  1. Show how the expected value of \(22.18\), for \(x = 3\), is obtained and find the expected values for \(x = 6\) and for \(x \geqslant 7\). [4]
  2. Use a goodness-of-fit test at the \(5\%\) significance level to determine whether the Poisson distribution is a suitable model for the number of rooms occupied each night at Roberto's hotel. [7]
CAIE FP2 2009 November Q9
10 marks Standard +0.3
It has been found that 60\% of the computer chips produced in a factory are faulty. As part of quality control, 100 samples of 4 chips are selected at random, and each chip is tested. The number of faulty chips in each sample is recorded, with the results given in the following table.
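\begin{tabular}{|l|c|c|c|c|c|}
\hline
Number of faulty chips & 0 & 1 & 2 & 3 & 4 \\
\hline
Number of samples & 2 & 12 & 27 & 49 & 10 \\
\hline
\end{tabular}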
The expected values for a binomial distribution with parameters \(n = 4\) and \(p = 0.6\) are given in the following table.
\begin{tabular}{|l|c|c|c|c|c|}
\hline
Number of faulty chips & 0 & 1 & 2 & 3 & 4 \\
\hline
Expected value & 2.56 & 15.36 & 34.56 & 34.56 & 12.96 \\
\hline
\end{tabular}
Show how the expected value 34.56 corresponding to 2 faulty chips is obtained. [2]
Carry out a goodness of fit test at the 5\% significance level, and state what can be deduced from the outcome of the test. [8]
CAIE FP2 2014 November Q8
9 marks Challenging +1.2
The numbers of a particular type of laptop computer sold by a store on each of 100 consecutive Saturdays are summarised in the following table.
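\begin{tabular}{|l|c|c|c|c|c|c|c|c|c|}
\hline
Number sold & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & \(\geq 8\) \\
\hline
Number of Saturdays & 7 & 20 & 39 & 16 & 14 & 2 & 1 & 1 & 0 \\
\hline
\end{tabular}
Fit a Poisson distribution to the data and carry out a goodness of fit test at the 2.5\% significance level. [9]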
CAIE FP2 2015 November Q8
10 marks Standard +0.8
The number of goals scored by a certain football team was recorded for each of 100 matches, and the results are summarised in the following table.
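\begin{tabular}{|l|c|c|c|c|c|c|c|}
\hline
Number of goals & 0 & 1 & 2 & 3 & 4 & 5 & 6 or more \\
\hline
Frequency & 12 & 16 & 31 & 25 & 13 & 3 & 0 \\
\hline
\end{tabular}
Fit a Poisson distribution to the data, and test its goodness of fit at the 5\% significance level. [10]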
CAIE Further Paper 4 2021 June Q5
10 marks Standard +0.3
Chai packs china mugs into cardboard boxes. Chai's manager suspects that breakages occur at random times and that the number of breakages may follow a Poisson distribution. He takes a small sample of observations and finds that the number of breakages in a one-hour period has a mean of 2.4 and a standard deviation of 1.5.
  1. Explain how this information tends to support the manager's suspicion. [2]
The manager now takes a larger sample and claims that the numbers of breakages in a one-hour period follow a Poisson distribution. The numbers of breakages in a random sample of 180 one-hour periods are summarised in the following table.
\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Number of breakages & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 or more \\
\hline
Frequency & 21 & 33 & 46 & 31 & 23 & 16 & 10 & 0 \\
\hline
\end{tabular}
The mean number of breakages calculated from this sample is 2.5.
  1. Use the data from this larger sample to carry out a goodness of fit test, at the 10\% significance level, to test the claim. [8]
Edexcel S3 2015 June Q3
11 marks Standard +0.3
The number of accidents on a particular stretch of motorway was recorded each day for 200 consecutive days. The results are summarised in the following table.
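\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
Number of accidents & 0 & 1 & 2 & 3 & 4 & 5 \\
\hline
Frequency & 47 & 57 & 46 & 35 & 9 & 6 \\
\hline
\end{tabular}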
  1. Show that the mean number of accidents per day for these data is 1.6 [1]
A motorway supervisor believes that the number of accidents per day on this stretch of motorway can be modelled by a Poisson distribution. She uses the mean found in part (a) to calculate the expected frequencies for this model. Her results are given in the following table.
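\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
Number of accidents & 0 & 1 & 2 & 3 & 4 & 5 or more \\
\hline
Frequency & 40.38 & 64.61 & \(r\) & 27.57 & 11.03 & \(s\) \\
\hline
\end{tabular}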
  1. Find the value of \(r\) and the value of \(s\), giving your answers to 2 decimal places. [3]
  2. Stating your hypotheses clearly, use a 10\% level of significance to test the motorway supervisor's belief. Show your working clearly. [7]
Edexcel S3 2002 June Q6
12 marks Standard +0.3
Data were collected on the number of female puppies born in 200 litters of size 8. It was decided to test whether or not a binomial model with parameters \(n = 8\) and \(p = 0.5\) is a suitable model for these data. The following table shows the observed frequencies and the expected frequencies, to 2 decimal places, obtained in order to carry out this test.
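\begin{tabular}{|c|c|c|}
\hline
Number of females & Observed number of litters & Expected number of litters \\
\hline
0 & 1 & 0.78 \\
\hline
1 & 9 & 6.25 \\
\hline
2 & 27 & 21.88 \\
\hline
3 & 46 & \(R\) \\
\hline
4 & 49 & \(S\) \\
\hline
5 & 35 & \(T\) \\
\hline
6 & 26 & 21.88 \\
\hline
7 & 5 & 6.25 \\
\hline
8 & 2 & 0.78 \\
\hline
\end{tabular}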
  1. Find the values of \(R\), \(S\) and \(T\). [4]
  2. Carry out the test to determine whether or not this binomial model is a suitable one. State your hypotheses clearly and use a 5\% level of significance. [7]
An alternative test might have involved estimating \(p\) rather than assuming \(p = 0.5\).
  1. Explain how this would have affected the test. [1]
Edexcel S3 2006 June Q8
13 marks Standard +0.3
Five coins were tossed 100 times and the number of heads recorded. The results are shown in the table below.
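\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
Number of heads & 0 & 1 & 2 & 3 & 4 & 5 \\
\hline
Frequency & 6 & 18 & 29 & 34 & 10 & 3 \\
\hline
\end{tabular}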
  1. Suggest a suitable distribution to model the number of heads when five unbiased coins are tossed. [2]
  2. Test, at the 10\% level of significance, whether or not the five coins are unbiased. State your hypotheses clearly. [11]
Edexcel S3 2011 June Q5
13 marks Standard +0.3
The number of hurricanes per year in a particular region was recorded over 80 years. The results are summarised in Table 1 below.
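\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Number of hurricanes, \(h\) & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\
\hline
Frequency & 0 & 2 & 5 & 17 & 20 & 12 & 12 & 12 \\
\hline
\end{tabular}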
Table 1
  1. Write down two assumptions that will support modelling the number of hurricanes per year by a Poisson distribution. [2]
  2. Show that the mean number of hurricanes per year from Table 1 is 4.4875 [2]
  3. Use the answer in part (b) to calculate the expected frequencies \(r\) and \(s\) given in Table 2 below to 2 decimal places. [3]
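\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
\(h\) & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 or more \\
\hline
Expected frequency & 0.90 & 4.04 & \(r\) & 13.55 & \(s\) & 13.65 & 10.21 & 13.39 \\
\hline
\end{tabular}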
Table 2
  1. Test, at the 5\% level of significance, whether or not the data can be modelled by a Poisson distribution. State your hypotheses clearly. [6]
Edexcel S3 2016 June Q6
Standard +0.3
An airport manager carries out a survey of families and their luggage. Each family is allowed to check in a maximum of 4 suitcases. She observes 50 families at the check-in desk and counts the total number of suitcases each family checks in. The data are summarised in the table below.
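\begin{tabular}{|l|c|c|c|c|c|}
\hline
Number of suitcases & 0 & 1 & 2 & 3 & 4 \\
\hline
Frequency & 6 & 25 & 12 & 6 & 1 \\
\hline
\end{tabular}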
The manager claims that the data can be modelled by a binomial distribution with \(p = 0.3\)
  1. Test the manager's claim at the 5\% level of significance. State your hypotheses clearly. Show your working clearly and give your expected frequencies to 2 decimal places. (8)
The manager also carries out a survey of the time taken by passengers to check in. She records the number of passengers that check in during each of 100 five-minute intervals. The manager makes a new claim that these data can be modelled by a Poisson distribution. She calculates the expected frequencies given in the table below.
    \begin{tabular}{|l|c|c|c|c|c|c|}
    \hline
    Number of passengers & 0 & 1 & 2 & 3 & 4 & 5 or more \\
    \hline
    Observed frequency & 5 & 40 & 31 & 18 & 6 & 0 \\
    \hline
    Expected frequency & 16.53 & 29.75 & \(r\) & \(s\) & 7.23 & 3.64 \\
    \hline
    \end{tabular}
  2. Find the value of \(r\) and the value of \(s\) giving your answers to 2 decimal places. (3)
  3. Stating your hypotheses clearly, use a 1\% level of significance to test the manager's new claim. (6)
Edexcel S3 Q4
13 marks Standard +0.8
Breakdowns on a certain stretch of motorway were recorded each day for 80 consecutive days. The results are summarised in the table below.
\begin{tabular}{|l|c|c|c|c|}
\hline
Number of breakdowns & 0 & 1 & 2 & \(>2\) \\
\hline
Frequency & 38 & 32 & 10 & 0 \\
\hline
\end{tabular}
It is suggested that the number of breakdowns per day can be modelled by a Poisson distribution. Using a 5\% level of significance, test whether or not the Poisson distribution is a suitable model for these data. State your hypotheses clearly. [13]
OCR MEI S2 2007 January Q3
18 marks Standard +0.3
An electrical retailer gives customers extended guarantees on washing machines. Under this guarantee all repairs in the first 3 years are free. The retailer records the numbers of free repairs made to 80 machines.
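\begin{tabular}{|l|c|c|c|c|c|}
\hline
Number of repairs & 0 & 1 & 2 & 3 & \(>3\) \\
\hline
Frequency & 53 & 20 & 6 & 1 & 0 \\
\hline
\end{tabular}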
  1. Show that the sample mean is 0.4375. [1]
  2. The sample standard deviation \(s\) is 0.6907. Explain why this supports a suggestion that a Poisson distribution may be a suitable model for the distribution of the number of free repairs required by a randomly chosen washing machine. [2]
The random variable \(X\) denotes the number of free repairs required by a randomly chosen washing machine. For the remainder of this question you should assume that \(X\) may be modelled by a Poisson distribution with mean 0.4375.
  1. Find P\((X = 1)\). Comment on your answer in relation to the data in the table. [4]
  2. The manager decides to monitor 8 washing machines sold on one day. Find the probability that there are at least 12 free repairs in total on these 8 machines. You may assume that the 8 machines form an independent random sample. [3]
  3. A launderette with 8 washing machines has needed 12 free repairs. Why does your answer to part (iv) suggest that the Poisson model with mean 0.4375 is unlikely to be a suitable model for free repairs on the machines in the launderette? Give a reason why the model may not be appropriate for the launderette. [3]
The retailer also sells tumble driers with the same guarantee. The number of free repairs on a tumble drier in three years can be modelled by a Poisson distribution with mean 0.15. A customer buys a tumble drier and a washing machine.
  1. Assuming that free repairs are required independently, find the probability that
    1. the two appliances need a total of 3 free repairs between them,
    2. each appliance needs exactly one free repair.
    [5]
OCR MEI S2 2007 January Q4
18 marks Standard +0.3
Two educational researchers are investigating the relationship between personal ambitions and home location of students. The researchers classify students into those whose main personal ambition is good academic results and those who have some other ambition. A random sample of 480 students is selected.
  1. One researcher summarises the data as follows.
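    \begin{tabular}{|l|l|c|c|}
    \hline
    \multicolumn{2}{|l|}{Observed} & \multicolumn{2}{c|}{Home location} \\
    \cline{3-4}
    \multicolumn{2}{|l|}{} & City & Non-city \\
    \hline
    Ambition & Good results & 102 & 147 \\
    \cline{2-4}
     & Other & 75 & 156 \\
    \hline
    \end{tabular}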
    Carry out a test at the 5\% significance level to examine whether there is any association between home location and ambition. State carefully your null and alternative hypotheses. Your working should include a table showing the contributions of each cell to the test statistic. [9]
  2. The other researcher summarises the same data in a different way as follows.
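    \begin{tabular}{|l|l|c|c|c|}
    \hline
    \multicolumn{2}{|l|}{Observed} & \multicolumn{3}{c|}{Home location} \\
    \cline{3-5}
    \multicolumn{2}{|l|}{} & City & Town & Country \\
    \hline
    Ambition & Good results & 102 & 83 & 64 \\
    \cline{2-5}
     & Other & 75 & 64 & 92 \\
    \hline
    \end{tabular}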
    1. Calculate the expected frequencies for both 'Country' cells. [2]
    2. The test statistic for these data is 10.94. Carry out a test at the 5\% level based on this table, using the same hypotheses as in part (i). [3]
    3. The table below gives the contribution of each cell to the test statistic. Discuss briefly how personal ambitions are related to home location. [2]
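      \begin{tabular}{|l|l|c|c|c|}
      \hline
      \multicolumn{2}{|l|}{Contribution to the test statistic} & \multicolumn{3}{c|}{Home location} \\
      \cline{3-5}
      \multicolumn{2}{|l|}{} & City & Town & Country \\
      \hline
      Ambition & Good results & 1.129 & 0.596 & 3.540 \\
      \cline{2-5}
       & Other & 1.217 & 0.643 & 3.816 \\
      \hline
      \end{tabular}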
  3. Comment briefly on whether the analysis in part (ii) means that the conclusion in part (i) is invalid. [2]
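The expected frequencies and cell contributions for the three-column table can be reproduced with a short Python sketch. It is an illustration only and assumes the usual independence formula, expected = row total \(\times\) column total / grand total; the variable names are hypothetical.

```python
# Observed counts: rows are ambition (good results, other),
# columns are home location (city, town, country).
observed = [[102, 83, 64],
            [75, 64, 92]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)            # students in the sample

# Under independence, expected = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell to the test statistic: (O - E)^2 / E.
contributions = [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
                 for o_row, e_row in zip(observed, expected)]

statistic = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print([round(row[2], 3) for row in expected], round(statistic, 2), df)
```

The largest contributions come from the 'Country' column, which is where the observed counts depart most from independence.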
OCR S3 2012 January Q5
10 marks Standard +0.3
A statistician suggested that the weekly sales \(X\) thousand litres at a petrol station could be modelled by the following probability density function. $$\text{f}(x) = \begin{cases} \frac{1}{40}(2x + 3) & 0 \leqslant x < 5, \\ 0 & \text{otherwise.} \end{cases}$$
  1. Show that, using this model, P\((a < X < a + 1) = \frac{a + 2}{20}\) for \(0 \leqslant a < 4\). [3]
Sales in 100 randomly chosen weeks gave the following grouped frequency table.
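\begin{tabular}{|l|c|c|c|c|c|}
\hline
\(x\) & \(0 \leqslant x < 1\) & \(1 \leqslant x < 2\) & \(2 \leqslant x < 3\) & \(3 \leqslant x < 4\) & \(4 \leqslant x < 5\) \\
\hline
Frequency & 16 & 12 & 18 & 30 & 24 \\
\hline
\end{tabular}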
  1. Carry out a goodness of fit test at the \(10\%\) significance level of whether f\((x)\) fits the data. [7]
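For a continuous model, the expected frequency of each class is the sample size multiplied by the probability of that interval. The Python sketch below illustrates this, assuming the cumulative distribution function \(F(x) = \frac{x^2 + 3x}{40}\) obtained by integrating f\((x)\); the helper `cdf` is a hypothetical name, and the critical value 7.779 is the upper 10% point of \(\chi^2\) with 4 degrees of freedom from standard tables.

```python
from fractions import Fraction

def cdf(x):
    """F(x) for f(x) = (2x + 3)/40 on 0 <= x < 5, i.e. (x^2 + 3x)/40."""
    return Fraction(x * x + 3 * x, 40)

observed = [16, 12, 18, 30, 24]          # weeks in [a, a + 1), a = 0..4
n = sum(observed)

# Expected frequency for each unit interval: n * (F(a + 1) - F(a)).
expected = [n * (cdf(a + 1) - cdf(a)) for a in range(5)]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                   # the model has no fitted parameters

# Upper 10% point of chi-squared with 4 df, from standard tables.
critical_value = 7.779
reject_h0 = float(statistic) > critical_value

print([int(e) for e in expected], float(statistic), df, reject_h0)
```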