5.06b Fit prescribed distribution: chi-squared test

136 questions

OCR MEI S3 2009 June Q4
18 marks Standard +0.3
4 A random variable \(X\) has probability density function \(\mathrm { f } ( x ) = \frac { 2 x } { \lambda ^ { 2 } }\) for \(0 < x < \lambda\), where \(\lambda\) is a positive constant.
  1. Show that, for any value of \(\lambda , \mathrm { f } ( x )\) is a valid probability density function.
  2. Find \(\mu\), the mean value of \(X\), in terms of \(\lambda\) and show that \(\mathrm { P } ( X < \mu )\) does not depend on \(\lambda\).
  3. Given that \(\mathrm { E } \left( X ^ { 2 } \right) = \frac { \lambda ^ { 2 } } { 2 }\), find \(\sigma ^ { 2 }\), the variance of \(X\), in terms of \(\lambda\).
    The random variable \(X\) is used to model the depth of the space left by the filling machine at the top of a jar of jam. The model gives the following probabilities for \(X\) (whatever the value of \(\lambda\)).
    Interval | \(0 < X \leqslant \mu - \sigma\) | \(\mu - \sigma < X \leqslant \mu\) | \(\mu < X \leqslant \mu + \sigma\) | \(\mu + \sigma < X < \lambda\)
    Probability | 0.18573 | 0.25871 | 0.36983 | 0.18573
    A sample of 50 random observations of \(X\), classified in the same way, is summarised by the following frequencies.
    Frequency | 4 | 11 | 20 | 15
  4. Carry out a suitable test at the \(5 \%\) level of significance to assess the goodness of fit of \(X\) to these data. Explain briefly how your conclusion may be affected by the choice of significance level.
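The arithmetic behind part 4 can be sketched as follows. This is a minimal illustration, not a model answer: `chi_squared_statistic` is a hypothetical helper name, the degrees of freedom assume that no parameter is estimated from the data, and the critical values for 3 degrees of freedom are copied from standard chi-squared tables.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n = 50
probabilities = [0.18573, 0.25871, 0.36983, 0.18573]  # from the question
observed = [4, 11, 20, 15]                            # from the question
expected = [n * p for p in probabilities]

statistic = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # assumption: no estimated parameters

# Upper-tail critical values for 3 degrees of freedom (standard tables).
critical_values = {"5%": 7.815, "10%": 6.251}
for level, critical in critical_values.items():
    verdict = "reject" if statistic > critical else "do not reject"
    print(f"{level}: X^2 = {statistic:.3f} vs {critical} -> {verdict} the model")
```

Under these assumptions the statistic lies between the two tabulated values, which is one way to see why the choice of significance level can change the conclusion.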
OCR MEI S3 2012 June Q4
18 marks Standard +0.3
4 The numbers of call-outs per day received by a fire station for a random sample of 255 weekdays were recorded as follows.
Number of call-outs | 0 | 1 | 2 | 3 | 4 | 5 or more
Frequency (days) | 145 | 79 | 22 | 6 | 3 | 0
The mean number of call-outs per day for these data is 0.6. A Poisson model, using this sample mean of 0.6, is fitted to the data, and gives the following expected frequencies (correct to 3 decimal places).
Number of call-outs | 0 | 1 | 2 | 3 | 4 | 5 or more
Expected frequency | 139.947 | 83.968 | 25.190 | 5.038 | 0.756 | 0.101
  1. Using a \(5 \%\) significance level, carry out a test to examine the goodness of fit of the model to the data.
    The time \(T\), measured in days, that elapses between successive call-outs can be modelled using the exponential distribution for which \(\mathrm { f } ( t )\), the probability density function, is $$\mathrm { f } ( t ) = \begin{cases} 0 & t < 0 , \\ \lambda \mathrm { e } ^ { - \lambda t } & t \geqslant 0 , \end{cases}$$ where \(\lambda\) is a positive constant.
  2. For the distribution above, it can be shown that \(\mathrm { E } ( T ) = \frac { 1 } { \lambda }\). Given that the mean time between successive call-outs is \(\frac { 5 } { 3 }\) days, write down the value of \(\lambda\).
  3. Find \(\mathrm { F } ( t )\), the cumulative distribution function.
  4. Find the probability that the time between successive call-outs is more than 1 day.
  5. Find the median time that elapses between successive call-outs.
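Several expected frequencies in part 1 are below 5, so classes are usually combined before the statistic is calculated. The sketch below shows one way to do that, merging classes from the upper tail only. The helper `pool_small_tail` and the threshold of 5 are illustrative assumptions, not part of the question.

```python
def pool_small_tail(observed, expected, minimum=5.0):
    """Merge classes from the right until the last expected frequency >= minimum."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_o, last_e = obs.pop(), exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp


observed = [145, 79, 22, 6, 3, 0]                          # from the question
expected = [139.947, 83.968, 25.190, 5.038, 0.756, 0.101]  # from the question

pooled_obs, pooled_exp = pool_small_tail(observed, expected)
statistic = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))

# One constraint for the total and one for the estimated mean.
degrees_of_freedom = len(pooled_obs) - 1 - 1

print(pooled_obs, [round(e, 3) for e in pooled_exp])
print(f"X^2 = {statistic:.3f} on {degrees_of_freedom} degrees of freedom")
```

The statistic would then be compared with the tabulated 5% critical value for 2 degrees of freedom, 5.991.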
OCR MEI S3 2013 June Q3
19 marks Challenging +1.2
3 The random variable \(X\) has the following probability density function, \(\mathrm { f } ( x )\). $$f ( x ) = \begin{cases} k x ( x - 5 ) ^ { 2 } & 0 \leqslant x < 5 \\ 0 & \text { elsewhere } \end{cases}$$
  1. Sketch \(\mathrm { f } ( x )\).
  2. Find, in terms of \(k\), the cumulative distribution function, \(\mathrm { F } ( x )\).
  3. Hence show that \(k = \frac { 12 } { 625 }\).
    The random variable \(X\) is proposed as a model for the amount of time, in minutes, lost due to stoppages during a football match. The times lost in a random sample of 60 matches are summarised in the table. The table also shows some of the corresponding expected frequencies given by the model.
    Time (minutes) | \(0 \leqslant x < 1\) | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\)
    Observed frequency | 5 | 15 | 23 | 11 | 6
    Expected frequency | | | 17.76 | 9.12 | 1.632
  4. Find the remaining expected frequencies.
  5. Carry out a goodness of fit test, using a significance level of \(2.5 \%\), to see if the model might be suitable in this context.
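For a continuous model, each expected frequency is the sample size multiplied by a difference of cumulative distribution function values. The sketch below illustrates this for the density above, assuming \(k = \frac{12}{625}\) and the antiderivative \(\mathrm{F}(x) = k\left(\frac{x^4}{4} - \frac{10x^3}{3} + \frac{25x^2}{2}\right)\) on \(0 \leqslant x \leqslant 5\). The function name `cdf` is a hypothetical choice, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

K = Fraction(12, 625)  # value of k quoted in part 3


def cdf(x):
    """F(x) = k (x^4/4 - 10x^3/3 + 25x^2/2) for 0 <= x <= 5."""
    x = Fraction(x)
    return K * (x**4 / 4 - Fraction(10, 3) * x**3 + Fraction(25, 2) * x**2)


n = 60
edges = [0, 1, 2, 3, 4, 5]

# Expected frequency for [a, b) is n * (F(b) - F(a)).
expected = [float(n * (cdf(b) - cdf(a))) for a, b in zip(edges, edges[1:])]
print(expected)
```

The last three values can be checked against the expected frequencies printed in the question.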
CAIE FP2 2010 June Q7
8 marks Challenging +1.2
7 Benford's Law states that, in many tables containing large numbers of numerical values, the probability distribution of the leading non-zero digit \(D\) is given by $$\mathrm { P } ( D = d ) = \log _ { 10 } \left( \frac { d + 1 } { d } \right) , \quad d = 1,2 , \ldots , 9 .$$ The following table shows a summary of a random sample of 100 non-zero leading digits taken from a table of cumulative probabilities for the Poisson distribution.
Leading digit | 1 | 2 | 3 | 4 | 5 | \(\geqslant 6\)
Frequency | 22 | 21 | 13 | 11 | 11 | 22
Carry out a suitable goodness of fit test at the 10\% significance level.
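The expected frequencies here come straight from the stated formula, with the digits 6 to 9 combined into one class. A minimal sketch follows. It assumes no parameters are estimated, so the degrees of freedom are the number of classes minus one, and the variable names are illustrative.

```python
import math

n = 100
observed = [22, 21, 13, 11, 11, 22]       # digits 1 to 5, then ">= 6"

# P(D = d) = log10((d + 1) / d) for d = 1, ..., 5.
probabilities = [math.log10((d + 1) / d) for d in range(1, 6)]
# The sum over d = 6, ..., 9 telescopes to log10(10 / 6).
probabilities.append(math.log10(10 / 6))

expected = [n * p for p in probabilities]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1    # assumption: nothing estimated

print([round(e, 3) for e in expected])
print(f"X^2 = {statistic:.3f} on {degrees_of_freedom} degrees of freedom")
```

The tabulated 10% critical value for 5 degrees of freedom is 9.236, which is the figure the statistic would be compared with.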
CAIE FP2 2011 June Q10 OR
Standard +0.3
A family was asked to record the number of letters delivered to their house on each of 200 randomly chosen weekdays. The results are summarised in the following table.
Number of letters | 0 | 1 | 2 | 3 | 4 | 5 | \(\geqslant 6\)
Number of days | 57 | 60 | 53 | 25 | 4 | 1 | 0
It is suggested that the number of letters delivered each weekday has a Poisson distribution. By finding the mean and variance for this sample, comment on the appropriateness of this suggestion. The following table includes some of the expected values, correct to 3 decimal places, using a Poisson distribution with mean equal to the sample mean for the above data.
Number of letters | 0 | 1 | 2 | 3 | 4 | 5 | \(\geqslant 6\)
Expected number of days | 53.964 | 70.693 | \(p\) | \(q\) | 6.622 | 1.735 | 0.463
  1. Show that \(p = 46.304\), correct to 3 decimal places, and find \(q\).
  2. Carry out a goodness of fit test at the \(10 \%\) significance level.
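The preliminary steps of this question (sample mean, sample variance and Poisson expected frequencies) can be sketched as below. The divisor-\(n\) form of the variance is an assumption (an unbiased estimate would use \(n - 1\)), and `poisson_pmf` is a hypothetical helper.

```python
import math

counts = [0, 1, 2, 3, 4, 5]            # the ">= 6" class has frequency 0
days = [57, 60, 53, 25, 4, 1]          # from the question
n = sum(days)

mean = sum(x * f for x, f in zip(counts, days)) / n
variance = sum(x * x * f for x, f in zip(counts, days)) / n - mean**2


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Classes 0..5 from the Poisson formula; the open class takes the remainder.
expected = [n * poisson_pmf(k, mean) for k in range(6)]
expected.append(n - sum(expected))

print(f"mean = {mean:.3f}, variance = {variance:.4f}")
print([round(e, 3) for e in expected])
```

A mean and variance that are close to each other are consistent with a Poisson model, which is the comparison the question asks for.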
CAIE FP2 2013 June Q7
9 marks Standard +0.8
7 A random sample of 80 observations of the continuous random variable \(X\) was taken and the values are summarised in the following table.
Interval | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\)
Observed frequency | 36 | 29 | 9 | 6
It is required to test the goodness of fit of the distribution having probability density function f given by $$f ( x ) = \begin{cases} \frac { 3 } { x ^ { 2 } } & 2 \leqslant x < 6 \\ 0 & \text { otherwise. } \end{cases}$$ Show that the expected frequency for the interval \(2 \leqslant x < 3\) is 40 and calculate the remaining expected frequencies. Carry out a goodness of fit test, at the \(10 \%\) significance level.
CAIE FP2 2014 June Q9
10 marks Standard +0.8
9 A random sample of 200 observations of the continuous random variable \(X\) was taken and the values are summarised in the following table.
Interval | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\) | \(6 \leqslant x < 7\) | \(7 \leqslant x < 8\)
Observed frequency | 63 | 45 | 32 | 25 | 22 | 7 | 6
It is required to test the goodness of fit of the distribution with probability density function \(f\) given by $$f ( x ) = \begin{cases} \frac { 1 } { x \ln 8 } & 1 \leqslant x < 8 \\ 0 & \text { otherwise } \end{cases}$$ The relevant expected frequencies, correct to 2 decimal places, are given in the following table.
Interval | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\) | \(6 \leqslant x < 7\) | \(7 \leqslant x < 8\)
Expected frequency | 66.67 | \(p\) | 27.67 | \(q\) | 17.54 | 14.83 | 12.84
Show that \(p = 39.00\), correct to 2 decimal places, and find the value of \(q\). Carry out a goodness of fit test at the 5\% significance level.
CAIE FP2 2015 June Q11 OR
Challenging +1.2
Each of 200 identically biased dice is thrown repeatedly until an even number is obtained. The number of throws, \(x\), needed is recorded and the results are summarised in the following table.
\(x\) | 1 | 2 | 3 | 4 | 5 | 6 | \(\geqslant 7\)
Frequency | 126 | 43 | 22 | 3 | 5 | 1 | 0
State a type of distribution that could be used to fit the data given in the table above. Fit a distribution of this type in which the probability of throwing an even number for each die is 0.6 and carry out a goodness of fit test at the 5\% significance level. For each of these dice, it is known that the probability of obtaining a 6 when it is thrown is 0.25. Ten of these dice are each thrown 5 times. Find the probability that at least one 6 is obtained on exactly 4 of the 10 dice.
CAIE FP2 2016 June Q9
10 marks Standard +0.3
9 Applicants for a national teacher training course are required to pass a mathematics test. Each year, the applicants are tested in groups of 6 and the number of successful applicants in each group is recorded. The overall proportion of successful applicants has remained constant over the years and is equal to \(60 \%\) of the applicants. The results from 150 randomly chosen groups are shown in the following table.
Number of successful applicants | 0 | 1 | 2 | 3 | 4 | 5 | 6
Number of groups | 1 | 3 | 25 | 51 | 38 | 30 | 2
Test, at the \(5 \%\) significance level, the goodness of fit of the distribution \(\mathrm { B } ( 6,0.6 )\) for the number of successful applicants in a group.
CAIE FP2 2018 June Q11 OR
Standard +0.8
A scientist carries out an experiment to investigate the quantity \(X\), which takes the values \(0, 1, 2, 3, 4, 5\) or \(6\). He believes that the values taken by \(X\) follow a binomial distribution. He conducts 250 trials. His results are summarised in the following table.
\(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6
Observed frequency | 22 | 83 | 72 | 53 | 17 | 3 | 0
  1. Show that unbiased estimates of the mean and variance for these results are 1.876 and 1.266 respectively, correct to 3 decimal places. By evaluating the mean and variance of the distribution \(\mathrm { B } ( 6,0.313 )\), explain why \(X\) could have this distribution.
    The expected frequencies corresponding to the distribution \(\mathrm { B } ( 6,0.313 )\) are shown in the following table.
    \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Observed frequency | 22 | 83 | 72 | 53 | 17 | 3 | 0
    Expected frequency | 26.3 | 71.9 | 81.8 | 49.7 | 17.0 | 3.1 | 0.2
  2. Show how the expected frequency for \(x = 4\) is calculated.
  3. Test at the \(5 \%\) significance level whether the scientist's belief is correct.
CAIE FP2 2019 June Q9
10 marks Standard +0.8
9 A random sample of 50 observations of the continuous random variable \(X\) was taken and the values are summarised in the following table.
Interval | \(0 \leqslant x < 0.8\) | \(0.8 \leqslant x < 1.6\) | \(1.6 \leqslant x < 2.4\) | \(2.4 \leqslant x < 3.2\) | \(3.2 \leqslant x < 4\)
Observed frequency | 18 | 16 | 8 | 6 | 2
It is required to test the goodness of fit of the distribution with probability density function \(f\) given by $$f ( x ) = \begin{cases} \frac { 3 } { 16 } ( 4 - x ) ^ { \frac { 1 } { 2 } } & 0 \leqslant x < 4 \\ 0 & \text { otherwise. } \end{cases}$$ The relevant expected frequencies, correct to 2 decimal places, are given in the following table.
Interval | \(0 \leqslant x < 0.8\) | \(0.8 \leqslant x < 1.6\) | \(1.6 \leqslant x < 2.4\) | \(2.4 \leqslant x < 3.2\) | \(3.2 \leqslant x < 4\)
Expected frequency | 14.22 | 12.54 | 10.59 | 8.18 | 4.47
  1. Show how the expected frequency for \(1.6 \leqslant x < 2.4\) is obtained.
  2. Carry out a goodness of fit test at the \(5 \%\) significance level.
CAIE FP2 2008 November Q9
10 marks Standard +0.3
9 A sample of 100 observations of the continuous random variable \(T\) was obtained and the values are summarised in the following table.
Interval | \(1 \leqslant t < 1.5\) | \(1.5 \leqslant t < 2\) | \(2 \leqslant t < 2.5\) | \(2.5 \leqslant t < 3\)
Frequency | 64 | 17 | 16 | 3
It is required to test the goodness of fit of the distribution with probability density function given by $$f ( t ) = \begin{cases} \frac { 9 } { 4 t ^ { 3 } } & 1 \leqslant t < 3 \\ 0 & \text { otherwise } \end{cases}$$ The relevant expected values are as follows.
Interval | \(1 \leqslant t < 1.5\) | \(1.5 \leqslant t < 2\) | \(2 \leqslant t < 2.5\) | \(2.5 \leqslant t < 3\)
Expected frequency | 62.5 | 21.875 | 10.125 | 5.5
Show how the expected value 10.125 is obtained. Carry out the test, at the \(10 \%\) significance level.
CAIE FP2 2011 November Q8
11 marks Standard +0.8
8 A sample of 216 observations of the continuous random variable \(X\) was obtained and the results are summarised in the following table.
Interval | \(0 \leqslant x < 1\) | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\)
Observed frequency | 1 | 3 | 15 | 31 | 59 | 107
It is suggested that these results are consistent with a distribution having probability density function f given by $$f ( x ) = \begin{cases} k x ^ { 2 } & 0 \leqslant x < 6 \\ 0 & \text { otherwise } \end{cases}$$ where \(k\) is a positive constant. The relevant expected frequencies are given in the following table.
Interval | \(0 \leqslant x < 1\) | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\)
Expected frequency | 1 | 7 | \(a\) | \(b\) | \(c\) | 91
  1. Show that \(a = 19\) and find the values of \(b\) and \(c\).
  2. Carry out a goodness of fit test at the \(10 \%\) significance level.
CAIE FP2 2012 November Q10 OR
Standard +0.8
A continuous random variable \(X\) is believed to have the probability density function f given by $$f ( x ) = \begin{cases} \frac { 3 } { 10 } \left( 5 x - x ^ { 2 } - 4 \right) & 2 \leqslant x < 4 \\ 0 & \text { otherwise } \end{cases}$$ A random sample of 60 observations was taken and these values are summarised in the following grouped frequency table.
Interval | \(2 \leqslant x < 2.4\) | \(2.4 \leqslant x < 2.8\) | \(2.8 \leqslant x < 3.2\) | \(3.2 \leqslant x < 3.6\) | \(3.6 \leqslant x < 4\)
Observed frequency | 19 | 17 | 16 | 8 | 0
The estimated mean, based on the grouped data in the table above, is 2.69, correct to 2 decimal places. It is decided that a goodness of fit test will only be conducted if the mean predicted from the probability density function is within \(10 \%\) of the estimated mean. Show that this condition is satisfied. The relevant expected frequencies are as follows.
Interval | \(2 \leqslant x < 2.4\) | \(2.4 \leqslant x < 2.8\) | \(2.8 \leqslant x < 3.2\) | \(3.2 \leqslant x < 3.6\) | \(3.6 \leqslant x < 4\)
Expected frequency | 15.456 | 16.032 | 14.304 | 10.272 | 3.936
Show how the expected frequency for the interval \(3.2 \leqslant x < 3.6\) is obtained. Carry out the goodness of fit test at the 10\% significance level.
CAIE FP2 2012 November Q8
9 marks Challenging +1.2
8 Drinking glasses are sold in packs of 4. The manufacturer conducts a survey to assess the quality of the glasses. The results from a sample of 50 randomly chosen packs are summarised in the following table.
Number of perfect glasses | 0 | 1 | 2 | 3 | 4
Number of packs | 1 | 3 | 10 | 17 | 19
Fit a binomial distribution to the data and carry out a goodness of fit test at the \(10 \%\) significance level.
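Fitting a binomial distribution here means estimating \(p\) from the data before computing expected frequencies. The sketch below assumes \(p\) is estimated by matching the sample mean to \(4p\), and it pools small classes inwards from both ends. The helper `pool_ends` and the threshold of 5 are illustrative choices, not prescribed by the question.

```python
from math import comb

packs = [1, 3, 10, 17, 19]             # packs with 0..4 perfect glasses
size = 4
n = sum(packs)

mean = sum(k * f for k, f in enumerate(packs)) / n
p_hat = mean / size                    # assumption: match the sample mean

expected = [n * comb(size, k) * p_hat**k * (1 - p_hat) ** (size - k)
            for k in range(size + 1)]


def pool_ends(obs, exp, minimum=5.0):
    """Merge classes inwards from either end until both end classes reach minimum."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 1 and exp[0] < minimum:
        first_o, first_e = obs.pop(0), exp.pop(0)
        obs[0] += first_o
        exp[0] += first_e
    while len(exp) > 1 and exp[-1] < minimum:
        last_o, last_e = obs.pop(), exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp


pooled_obs, pooled_exp = pool_ends(packs, expected)
statistic = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))
degrees_of_freedom = len(pooled_obs) - 1 - 1   # one parameter (p) estimated

print(p_hat, pooled_obs, [round(e, 3) for e in pooled_exp])
print(f"X^2 = {statistic:.3f} on {degrees_of_freedom} degree(s) of freedom")
```

Because \(p\) is estimated from the sample, one further degree of freedom is lost compared with a test against a fully specified distribution. The tabulated 10% critical value for 1 degree of freedom is 2.706.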
CAIE FP2 2013 November Q8
10 marks Standard +0.8
8 A factory produces china mugs. Random samples of size 6 are selected at regular intervals, and the mugs are inspected for defects. During one week, 100 samples are selected and the numbers of defective mugs found are summarised in the following table.
Number of defective mugs | 0 | 1 | 2 | 3 | 4 | 5 | 6
Number of samples | 11 | 43 | 35 | 8 | 2 | 1 | 0
Fit a binomial distribution to the data and carry out a goodness of fit test at the 5\% significance level.
CAIE FP2 2014 November Q8
9 marks Standard +0.8
8 The numbers of a particular type of laptop computer sold by a store on each of 100 consecutive Saturdays are summarised in the following table.
Number sold | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | \(\geqslant 8\)
Number of Saturdays | 7 | 20 | 39 | 16 | 14 | 2 | 1 | 1 | 0
Fit a Poisson distribution to the data and carry out a goodness of fit test at the \(2.5 \%\) significance level.
CAIE FP2 2015 November Q8
10 marks Standard +0.8
8 The number of goals scored by a certain football team was recorded for each of 100 matches, and the results are summarised in the following table.
Number of goals | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more
Frequency | 12 | 16 | 31 | 25 | 13 | 3 | 0
Fit a Poisson distribution to the data, and test its goodness of fit at the 5\% significance level.
CAIE FP2 2016 November Q9
13 marks Standard +0.3
9 The number of visitors arriving at an art exhibition is recorded for each 10-minute period of time during the ten hours that it is open on a particular day. The results are as follows.
Number of visitors in a 10-minute period | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | \(\geqslant 9\)
Number of 10-minute periods | 2 | 2 | 12 | 8 | 11 | 13 | 4 | 7 | 1 | 0
  1. Calculate the mean and variance for this sample and explain whether your answers support a suggestion that a Poisson distribution might be a suitable model for the number of visitors in a 10-minute period.
  2. Use an appropriate Poisson distribution to find the two expected frequencies missing from the following table.
    Number of visitors in a 10-minute period | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | \(\geqslant 9\)
    Expected number of 10-minute periods (in order, with the two missing entries omitted): 1.10, 8.79, 11.72, 9.38, 6.25, 3.57, 1.79, 1.28
  3. Test, at the \(10 \%\) significance level, the goodness of fit of this Poisson distribution to the data.
CAIE FP2 2017 Specimen Q8
10 marks Challenging +1.2
8 The number of goals scored by a certain football team was recorded for each of 100 matches, and the results are summarised in the following table.
Number of goals | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more
Frequency | 12 | 16 | 31 | 25 | 13 | 3 | 0
Fit a Poisson distribution to the data, and test its goodness of fit at the 5\% significance level.
OCR MEI S3 2008 January Q4
18 marks Standard +0.3
4
  1. In Germany, towards the end of the nineteenth century, a study was undertaken into the distribution of the sexes in families of various sizes. The table shows some data about the numbers of girls in 500 families, each with 5 children. It is thought that the binomial distribution \(\mathrm { B } ( 5 , p )\) should model these data.
    Number of girls | Number of families
    0 | 32
    1 | 110
    2 | 154
    3 | 125
    4 | 63
    5 | 16
    1. Use this information to calculate an estimate for the mean number of girls per family of 5 children. Hence show that 0.45 can be taken as an estimate of \(p\).
    2. Investigate at a \(5 \%\) significance level whether the binomial model with \(p\) estimated as 0.45 fits the data. Comment on your findings and also on the extent to which the conditions for a binomial model are likely to be met.
  2. A researcher wishes to select 50 families from the 500 in part (a) for further study. Suggest what sort of sample she might choose and describe how she should go about choosing it.
OCR MEI Paper 2 2021 November Q13
7 marks Moderate -0.3
13 At a certain factory Christmas tree decorations are packed in boxes of 10. The quality control manager collects a random sample of 100 boxes of decorations and records the number of decorations in each box which are damaged. His results are displayed in Fig. 13.1.
Number of damaged decorations | 0 | 1 | 2 | 3 | 4 | 5 or more
Number of boxes | 19 | 35 | 28 | 13 | 5 | 0
Fig. 13.1
  1. Calculate
    It is believed that the number of damaged decorations in a box of 10, \(X\), may be modelled by a binomial distribution such that \(X \sim \mathrm { B } ( n , p )\).
  2. State suitable values for \(n\) and \(p\).
  3. Use the binomial model to complete the copy of Fig. 13.2 in the Printed Answer Booklet, giving your answers correct to \(\mathbf { 1 }\) decimal place.
    Number of damaged decorations | 0 | 1 | 2 | 3 | 4 | 5 or more
    Observed number of boxes | 19 | 35 | 28 | 13 | 5 | 0
    Expected number of boxes | | | | | |
    Fig. 13.2
  4. Explain whether the model is a good fit for these data.
OCR Further Statistics AS 2018 June Q8
9 marks Challenging +1.2
8 The table shows the results of a random sample drawn from a population which is thought to have the distribution \(\mathrm { U } ( 20 )\).
Range | \(1 \leqslant x \leqslant 8\) | \(9 \leqslant x \leqslant 12\) | \(13 \leqslant x \leqslant 20\)
Observed frequency | 12 | \(y\) | \(28 - y\)
Find the range of values of \(y\) for which the data are not consistent with the distribution at the \(5 \%\) significance level.
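One way to explore this question numerically is to evaluate the test statistic for every admissible integer \(y\). The sketch below assumes \(\mathrm{U}(20)\) denotes the discrete uniform distribution on the integers 1 to 20, uses 2 degrees of freedom (three classes, nothing estimated), and takes the 5% critical value 5.991 from standard tables. It supports an algebraic solution rather than replacing it.

```python
CRITICAL_5_PERCENT_DF2 = 5.991   # from standard chi-squared tables

total = 12 + 28                  # 12 + y + (28 - y) observations
class_sizes = [8, 4, 8]          # integers in 1-8, 9-12 and 13-20
expected = [total * size / 20 for size in class_sizes]


def statistic(y):
    """Chi-squared statistic for observed frequencies 12, y, 28 - y."""
    observed = [12, y, 28 - y]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Values of y for which the uniform model would be rejected at the 5% level.
rejected = [y for y in range(0, 29) if statistic(y) > CRITICAL_5_PERCENT_DF2]
print(rejected)
```

The rejected values form two separate runs, one at each end of the admissible range for \(y\).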
OCR Further Statistics AS 2021 November Q6
9 marks Moderate -0.3
6 A student believes that if you ask people to choose an integer between 1 and 10, not all integers are equally likely to be chosen. The student asks a random sample of 100 people to choose an integer between 1 and 10 inclusive. The observed frequencies \(O\), together with the values of \(\frac { ( O - E ) ^ { 2 } } { E }\) where \(E\) is the corresponding expected frequency, are shown in the table.
Integer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
\(O\) | 7 | 8 | 20 | 8 | 7 | 6 | 19 | 7 | 8 | 10
\(\frac { ( O - E ) ^ { 2 } } { E }\) | 0.9 | 0.4 | 10.0 | 0.4 | 0.9 | 1.6 | 8.1 | 0.9 | 0.4 | 0
  1. Show how the value of 8.1 for integer 7 is obtained.
  2. Show that there is evidence at the \(1 \%\) significance level that the student's belief is correct.
    The student wishes to suggest an alternative model for the probabilities associated with each integer. In this model, two of the integers have the same probability \(p _ { 1 }\) of being chosen and the other eight integers each have probability \(p _ { 2 }\) of being chosen.
  3. Suggest which two integers should have probability \(p _ { 1 }\) and suggest a possible value of \(p _ { 1 }\).
OCR Further Statistics 2022 June Q9
10 marks Challenging +1.2
9 The head teacher of a school believes that, on average, pupil absences on the days Monday, Tuesday, Wednesday, Thursday and Friday are in the ratio \(3 : 2 : 2 : 2 : 3\). The head teacher takes a random sample of 120 pupil absences. The results are as follows.
Day of week | Monday | Tuesday | Wednesday | Thursday | Friday
Number of absences | 28 | 16 | 24 | 16 | 36
  1. Test at the \(5 \%\) significance level whether these results are consistent with the head teacher's belief.
    A significance test at the \(5 \%\) level is also carried out on a second, independent, random sample of \(n\) pupil absences. All the numbers of absences are integers. The ratio of the numbers of absences for each day in this sample is identical to the ratio of the numbers of absences for each day in the original sample of size 120.
  2. Determine the smallest value of \(n\) for which the conclusion of this significance test is that the data are not consistent with the head teacher's belief.
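Because the observed ratio is fixed, the statistic in part 2 grows in proportion to \(n\), and \(n\) must keep every daily count an integer. The sketch below searches over admissible sample sizes. It assumes 4 degrees of freedom with the tabulated 5% critical value 9.488, and the variable names are illustrative.

```python
from functools import reduce
from math import gcd

CRITICAL_5_PERCENT_DF4 = 9.488            # from standard chi-squared tables

sample = [28, 16, 24, 16, 36]             # original observed absences
belief = [3, 2, 2, 2, 3]                  # hypothesised ratio, Monday to Friday

# Smallest integer sample with the same observed ratio: divide by the gcd.
g = reduce(gcd, sample)
unit = [x // g for x in sample]
step = sum(unit)                          # n must be a multiple of this


def statistic(multiple):
    """Chi-squared statistic for a sample of size multiple * step."""
    observed = [multiple * u for u in unit]
    n = sum(observed)
    expected = [n * b / sum(belief) for b in belief]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


multiple = 1
while statistic(multiple) <= CRITICAL_5_PERCENT_DF4:
    multiple += 1
smallest_n = multiple * step
print(smallest_n, round(statistic(multiple), 3))
```

Setting `multiple = 4` reproduces the original sample of 120, so the same function also gives the statistic needed in part 1.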