Chi-squared goodness of fit: Other continuous

A question is this type if and only if it tests whether data fits a specified continuous probability density function other than normal or uniform.

19 questions · Standard +0.6

CAIE Further Paper 4 2023 June Q3
9 marks Standard +0.8
3 A random sample of 50 values of the continuous random variable \(X\) was taken. These values are summarised in the following table.
Interval | \(1 \leqslant x < 1.5\) | \(1.5 \leqslant x < 2\) | \(2 \leqslant x < 2.5\) | \(2.5 \leqslant x < 3\) | \(3 \leqslant x < 3.5\) | \(3.5 \leqslant x \leqslant 4\)
Observed frequency | 3 | 3 | 8 | 11 | 13 | 12
It is required to test the goodness of fit of the distribution with probability density function \(f\) given by $$f ( x ) = \begin{cases} \frac { 1 } { 24 } \left( \frac { 4 } { x ^ { 2 } } + x ^ { 2 } \right) & 1 \leqslant x \leqslant 4 \\ 0 & \text { otherwise } \end{cases}$$ The expected frequencies, correct to 4 decimal places, are given in the following table.
Interval | \(1 \leqslant x < 1.5\) | \(1.5 \leqslant x < 2\) | \(2 \leqslant x < 2.5\) | \(2.5 \leqslant x < 3\) | \(3 \leqslant x < 3.5\) | \(3.5 \leqslant x \leqslant 4\)
Expected frequency | 4.4271 | \(a\) | 6.1285 | 8.4549 | \(b\) | 14.9678
  1. Show that \(a = 4.6007\) and find the value of \(b\).
  2. Carry out a goodness of fit test, at the \(10 \%\) significance level, to test whether f is a satisfactory model for the data.
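The expected frequencies come from integrating \(f\) over each interval and scaling by \(n = 50\). The sketch below does this in closed form. The antiderivative `G` is derived here and is not part of the question.

```python
# Expected frequencies for f(x) = (1/24)(4/x^2 + x^2) on [1, 4], n = 50.


def G(x: float) -> float:
    """An antiderivative of f: (1/24)(-4/x + x^3/3)."""
    return (-4 / x + x ** 3 / 3) / 24


n = 50
edges = [1, 1.5, 2, 2.5, 3, 3.5, 4]
# Cell probability = G(upper) - G(lower); scale by the sample size.
expected = [n * (G(hi) - G(lo)) for lo, hi in zip(edges, edges[1:])]

for (lo, hi), e in zip(zip(edges, edges[1:]), expected):
    print(f"{lo} <= x < {hi}: {e:.4f}")
```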
CAIE Further Paper 4 2020 November Q3
7 marks Standard +0.8
3 A random sample of 200 observations of the continuous random variable \(X\) was taken and the values are summarised in the following table.
Interval | \(0 \leqslant x < 0.5\) | \(0.5 \leqslant x < 1\) | \(1 \leqslant x < 1.5\) | \(1.5 \leqslant x < 2\) | \(2 \leqslant x < 2.5\) | \(2.5 \leqslant x < 3\)
Observed frequency | 5 | 23 | 40 | 41 | 46 | 45
It is required to test the goodness of fit of the distribution with probability density function f given by $$f ( x ) = \begin{cases} \frac { 1 } { 9 } x ( 4 - x ) & 0 \leqslant x \leqslant 3 \\ 0 & \text { otherwise } \end{cases}$$ Most of the relevant expected frequencies, correct to 2 decimal places, are given in the following table.
Interval | \(0 \leqslant x < 0.5\) | \(0.5 \leqslant x < 1\) | \(1 \leqslant x < 1.5\) | \(1.5 \leqslant x < 2\) | \(2 \leqslant x < 2.5\) | \(2.5 \leqslant x < 3\)
Expected frequency | \(p\) | \(q\) | 37.96 | 43.52 | 43.52 | 37.96
  1. Show that \(p = 10.19\) and find the value of \(q\).
  2. Carry out a goodness of fit test, at the \(5 \%\) significance level, to test whether f is a satisfactory model for the data.
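This is a minimal sketch of both parts. It uses the usual statistic \(\sum (O - E)^2 / E\) and merges no cells, since every expected frequency here is at least 10. The CDF `F` is obtained by integrating \(f\), and the function names are illustrative.

```python
# Expected frequencies for f(x) = x(4 - x)/9 on [0, 3] with n = 200,
# then the goodness-of-fit statistic over all six cells.


def F(x: float) -> float:
    """CDF on [0, 3], from integrating f: (2x^2 - x^3/3) / 9."""
    return (2 * x ** 2 - x ** 3 / 3) / 9


def chi_squared(obs, exp):
    """Sum of (O - E)^2 / E over matching cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


n = 200
edges = [0, 0.5, 1, 1.5, 2, 2.5, 3]
observed = [5, 23, 40, 41, 46, 45]
expected = [n * (F(hi) - F(lo)) for lo, hi in zip(edges, edges[1:])]
stat = chi_squared(observed, expected)
print([round(e, 2) for e in expected], round(stat, 3))
```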
OCR S3 2012 January Q5
10 marks Standard +0.3
5 A statistician suggested that the weekly sales \(X\) thousand litres at a petrol station could be modelled by the following probability density function. $$f ( x ) = \begin{cases} \frac { 1 } { 40 } ( 2 x + 3 ) & 0 \leqslant x < 5 \\ 0 & \text { otherwise } \end{cases}$$
  1. Show that, using this model, \(\mathrm { P } ( a \leqslant X < a + 1 ) = \frac { a + 2 } { 20 }\) for \(0 \leqslant a \leqslant 4\).
    Sales in 100 randomly chosen weeks gave the following grouped frequency table.
    \(x\) | \(0 \leqslant x < 1\) | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\)
    Frequency | 16 | 12 | 18 | 30 | 24
  2. Carry out a goodness of fit test at the \(10 \%\) significance level of whether \(\mathrm { f } ( x )\) fits the data.
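The identity in part 1 can be checked exactly with rational arithmetic. The sketch below assumes the antiderivative \((x^2 + 3x)/40\) of \(f\). It then forms the expected frequencies for 100 weeks and the \(\sum (O - E)^2 / E\) statistic.

```python
from fractions import Fraction

# Exact check that P(a <= X < a + 1) = (a + 2)/20 when f(x) = (2x + 3)/40,
# using the antiderivative (x^2 + 3x)/40.


def antiderivative(x: int) -> Fraction:
    return Fraction(x * x + 3 * x, 40)


probs = [antiderivative(a + 1) - antiderivative(a) for a in range(5)]
assert all(p == Fraction(a + 2, 20) for a, p in enumerate(probs))

observed = [16, 12, 18, 30, 24]
expected = [100 * p for p in probs]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([int(e) for e in expected], float(stat))
```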
OCR S3 Specimen Q4
10 marks Standard +0.3
4 The lengths of time, in seconds, between vehicles passing a fixed observation point on a road were recorded at a time when traffic was flowing freely. The frequency distribution in Table 1 is a summary of the data from 100 observations.
Table 1
Time interval (\(x\) seconds) | \(0 < x \leqslant 5\) | \(5 < x \leqslant 10\) | \(10 < x \leqslant 20\) | \(20 < x \leqslant 40\) | \(40 < x\)
Observed frequency | 49 | 22 | 20 | 7 | 2
It is thought that the distribution of times might be modelled by the continuous random variable \(X\) with probability density function given by $$f ( x ) = \begin{cases} 0.1 e ^ { - 0.1 x } & x > 0 \\ 0 & \text { otherwise } \end{cases}$$ Using this model, the expected frequencies (correct to 2 decimal places) for the given time intervals are shown in Table 2.
Table 2
Time interval (\(x\) seconds) | \(0 < x \leqslant 5\) | \(5 < x \leqslant 10\) | \(10 < x \leqslant 20\) | \(20 < x \leqslant 40\) | \(40 < x\)
Expected frequency | 39.35 | 23.87 | 23.25 | 11.70 | 1.83
  1. Show how the expected frequency of 23.87, corresponding to the interval \(5 < x \leqslant 10\), is obtained.
  2. Test, at the 10\% significance level, the goodness of fit of the model to the data.
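Table 2 can be reproduced from the CDF \(F(x) = 1 - e^{-0.1x}\), where each entry is \(100\,[F(\text{upper}) - F(\text{lower})]\). In the sketch below, the open-ended cell \(40 < x\) is handled by using `math.inf` as its upper limit.

```python
import math

# Expected frequencies for f(x) = 0.1 e^{-0.1x}, x > 0, with n = 100.
# For the open cell "40 < x", F(inf) = 1 because math.exp(-inf) is 0.0.


def F(x: float, rate: float = 0.1) -> float:
    return 1 - math.exp(-rate * x)


n = 100
edges = [0, 5, 10, 20, 40, math.inf]
expected = [n * (F(hi) - F(lo)) for lo, hi in zip(edges, edges[1:])]
print([round(e, 2) for e in expected])
```

The last expected frequency is below 5. A common convention is to pool it with the neighbouring cell before forming the test statistic.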
OCR MEI S3 2006 June Q1
18 marks Standard +0.3
1 Design engineers are simulating the load on a particular part of a complex structure. They intend that the simulated load, measured in a convenient unit, should be given by the random variable \(X\) having probability density function $$f ( x ) = 12 x ^ { 3 } - 24 x ^ { 2 } + 12 x , \quad 0 \leqslant x \leqslant 1 .$$
  1. Find the mean and the mode of \(X\).
  2. Find the cumulative distribution function \(\mathrm { F } ( x )\) of \(X\). Verify that \(\mathrm { F } \left( \frac { 1 } { 4 } \right) = \frac { 67 } { 256 }\), \(\mathrm { F } \left( \frac { 1 } { 2 } \right) = \frac { 11 } { 16 }\) and \(\mathrm { F } \left( \frac { 3 } { 4 } \right) = \frac { 243 } { 256 }\).
    The engineers suspect that the process for generating simulated loads might not be working as intended. To investigate this, they generate a random sample of 512 loads. These are recorded in a frequency distribution as follows.
    Load \(x\) | \(0 \leqslant x \leqslant \frac { 1 } { 4 }\) | \(\frac { 1 } { 4 } < x \leqslant \frac { 1 } { 2 }\) | \(\frac { 1 } { 2 } < x \leqslant \frac { 3 } { 4 }\) | \(\frac { 3 } { 4 } < x \leqslant 1\)
    Frequency | 126 | 209 | 131 | 46
  3. Use a suitable statistical procedure to assess the goodness of fit of \(X\) to these data. Discuss your conclusions briefly.
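The verification in part 2 and the expected frequencies for part 3 both follow from \(\mathrm{F}(x) = 3x^4 - 8x^3 + 6x^2\). That expression is obtained by integrating \(f\) from 0 and is stated here as a working assumption. The sketch uses exact fractions, so the three stated values are checked without rounding.

```python
from fractions import Fraction as Fr


def F(x):
    """CDF of f(x) = 12x^3 - 24x^2 + 12x on [0, 1]: 3x^4 - 8x^3 + 6x^2."""
    return 3 * x ** 4 - 8 * x ** 3 + 6 * x ** 2


# The three values the question asks to verify, in exact arithmetic.
assert F(Fr(1, 4)) == Fr(67, 256)
assert F(Fr(1, 2)) == Fr(11, 16)
assert F(Fr(3, 4)) == Fr(243, 256)

n = 512
cuts = [Fr(0), Fr(1, 4), Fr(1, 2), Fr(3, 4), Fr(1)]
observed = [126, 209, 131, 46]
expected = [n * (F(hi) - F(lo)) for lo, hi in zip(cuts, cuts[1:])]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([int(e) for e in expected], float(stat))
```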
OCR MEI S3 2007 June Q4
18 marks Standard +0.3
4 A machine produces plastic strip in a continuous process. Occasionally there is a flaw at some point along the strip. The length of strip (in hundreds of metres) between successive flaws is modelled by a continuous random variable \(X\) with probability density function \(\mathrm { f } ( x ) = \frac { 18 } { ( 3 + x ) ^ { 3 } }\) for \(x > 0\). The table below gives the frequencies for 100 randomly chosen observations of \(X\). It also gives the probabilities for the class intervals using the model.
Length \(x\) (hundreds of metres) | Observed frequency | Probability
\(0 < x \leqslant 0.5\) | 21 | 0.2653
\(0.5 < x \leqslant 1\) | 24 | 0.1722
\(1 < x \leqslant 2\) | 12 | 0.2025
\(2 < x \leqslant 3\) | 15 | 0.1100
\(3 < x \leqslant 5\) | 13 | 0.1094
\(5 < x \leqslant 10\) | 9 | 0.0874
\(x > 10\) | 6 | 0.0532
  1. Examine the fit of this model to the data at the \(5 \%\) level of significance.
    You are given that the median length between successive flaws is 124 metres. At a later date the following random sample of ten lengths (in metres) between flaws is obtained. $$\begin{array} { l l l l l l l l l l } 239 & 77 & 179 & 221 & 100 & 312 & 52 & 129 & 236 & 42 \end{array}$$
  2. Test at the \(10 \%\) level of significance whether the median length may still be assumed to be 124 metres.
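The sketch below recomputes the model probabilities from \(F(x) = 1 - 9/(3 + x)^2\), which is derived here by integrating \(f\). It also checks that the stated median of 124 metres is consistent with the model. For the open-ended last cell it gives \(9/169 \approx 0.05325\), which the table lists as 0.0532.

```python
import math


def F(x: float) -> float:
    """CDF of f(x) = 18 / (3 + x)^3, x > 0 (x in hundreds of metres)."""
    return 1 - 9 / (3 + x) ** 2


edges = [0, 0.5, 1, 2, 3, 5, 10, math.inf]
probs = [F(hi) - F(lo) for lo, hi in zip(edges, edges[1:])]
table = [0.2653, 0.1722, 0.2025, 0.1100, 0.1094, 0.0874, 0.0532]
assert all(abs(p - t) < 1e-4 for p, t in zip(probs, table))

# Median m solves F(m) = 1/2, i.e. (3 + m)^2 = 18, so m = 3*sqrt(2) - 3.
median_hundreds = 3 * math.sqrt(2) - 3
print([round(p, 4) for p in probs], round(100 * median_hundreds, 1))
```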
OCR S3 2010 June Q5
10 marks Standard +0.3
5 A random variable \(X\) is believed to have (cumulative) distribution function given by $$\mathrm { F } ( x ) = \begin{cases} 0 & x < 0 , \\ 1 - \mathrm { e } ^ { - x ^ { 2 } } & x \geqslant 0 . \end{cases}$$ In order to test this, a random sample of 150 observations of \(X\) was taken, and their values are summarised in the following grouped frequency table.
Values | \(0 \leqslant x < 0.5\) | \(0.5 \leqslant x < 1\) | \(1 \leqslant x < 1.5\) | \(1.5 \leqslant x < 2\) | \(x \geqslant 2\)
Frequency | 41 | 50 | 32 | 23 | 4
The expected frequencies, correct to 1 decimal place, corresponding to the above distribution, are 33.2, 61.6 and 39.4 respectively for the first 3 cells.
  1. Find the expected frequencies for the last 2 cells.
  2. Carry out a goodness of fit test at the \(2 \frac { 1 } { 2 } \%\) significance level.
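Because the distribution function is given directly, each expected frequency is \(150\,[\mathrm{F}(\text{upper}) - \mathrm{F}(\text{lower})]\). The sketch below reproduces the three stated values and uses `math.inf` as the upper limit of the last cell.

```python
import math


def F(x: float) -> float:
    """Given distribution function: 1 - exp(-x^2) for x >= 0, else 0."""
    return 1 - math.exp(-x * x) if x >= 0 else 0.0


n = 150
edges = [0, 0.5, 1, 1.5, 2, math.inf]
expected = [n * (F(hi) - F(lo)) for lo, hi in zip(edges, edges[1:])]
print([round(e, 1) for e in expected])
```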
OCR S3 2016 June Q7
12 marks Standard +0.8
7 A continuous random variable \(X\) has probability density function $$f ( x ) = \begin{cases} a x ^ { 3 } & 0 \leqslant x \leqslant 1 \\ a x ^ { 2 } & 1 < x \leqslant 2 \\ 0 & \text { otherwise } \end{cases}$$ where \(a\) is a constant.
  1. Show that \(a = \frac { 12 } { 31 }\).
  2. Find \(\mathrm { E } ( X )\).
    It is thought that the time taken by a student to complete a task can be well modelled by \(X\). The times taken by 992 randomly chosen students are summarised in the table, together with some of the expected frequencies.
    Time | \(0 \leqslant x < 0.5\) | \(0.5 \leqslant x < 1\) | \(1 \leqslant x < 1.5\) | \(1.5 \leqslant x \leqslant 2\)
    Observed frequency | 8 | 92 | 279 | 613
    Expected frequency | 6 | 90 | (not given) | (not given)
  3. Find the other expected frequencies and test, at the \(5 \%\) level of significance, whether the data can be well modelled by \(X\).
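For a piecewise density, \(a\), \(\mathrm{E}(X)\) and the cell probabilities all come from integrating each piece separately. The sketch below does this in exact arithmetic. The helper `F` is a CDF written here for illustration and is only valid on \([0, 2]\).

```python
from fractions import Fraction as Fr

# Total area = a/4 (from a x^3 on [0, 1]) + a(2^3 - 1)/3 (from a x^2 on (1, 2]).
a = 1 / (Fr(1, 4) + Fr(7, 3))


def F(x):
    """CDF of X for 0 <= x <= 2 (exact arithmetic)."""
    x = Fr(x)
    if x <= 1:
        return a * x ** 4 / 4
    return a / 4 + a * (x ** 3 - 1) / 3


# E(X): integral of x * a x^3 on [0, 1] plus x * a x^2 on (1, 2].
mean = a * (Fr(1, 5) + Fr(2 ** 4 - 1, 4))
edges = [0, Fr(1, 2), 1, Fr(3, 2), 2]
expected = [992 * (F(hi) - F(lo)) for lo, hi in zip(edges, edges[1:])]
print(a, mean, [int(e) for e in expected])
```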
OCR MEI S3 2009 June Q4
18 marks Standard +0.3
4 A random variable \(X\) has probability density function \(\mathrm { f } ( x ) = \frac { 2 x } { \lambda ^ { 2 } }\) for \(0 < x < \lambda\), where \(\lambda\) is a positive constant.
  1. Show that, for any value of \(\lambda\), \(\mathrm { f } ( x )\) is a valid probability density function.
  2. Find \(\mu\), the mean value of \(X\), in terms of \(\lambda\) and show that \(\mathrm { P } ( X < \mu )\) does not depend on \(\lambda\).
  3. Given that \(\mathrm { E } \left( X ^ { 2 } \right) = \frac { \lambda ^ { 2 } } { 2 }\), find \(\sigma ^ { 2 }\), the variance of \(X\), in terms of \(\lambda\).
    The random variable \(X\) is used to model the depth of the space left by the filling machine at the top of a jar of jam. The model gives the following probabilities for \(X\) (whatever the value of \(\lambda\)).
    Interval | \(0 < X \leqslant \mu - \sigma\) | \(\mu - \sigma < X \leqslant \mu\) | \(\mu < X \leqslant \mu + \sigma\) | \(\mu + \sigma < X < \lambda\)
    Probability | 0.18573 | 0.25871 | 0.36983 | 0.18573
    A sample of 50 random observations of \(X\), classified in the same way, is summarised by the following frequencies.
    Frequency | 4 | 11 | 20 | 15
  4. Carry out a suitable test at the \(5 \%\) level of significance to assess the goodness of fit of \(X\) to these data. Explain briefly how your conclusion may be affected by the choice of significance level.
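The claim that the four class probabilities do not depend on \(\lambda\) can be checked numerically. The sketch below uses \(F(x) = (x/\lambda)^2\), \(\mu = 2\lambda/3\) and \(\sigma^2 = \lambda^2/2 - \mu^2 = \lambda^2/18\), all derived from the question. The two values of \(\lambda\) compared are arbitrary choices.

```python
import math


def class_probs(lam: float) -> list[float]:
    """Model probabilities for the four classes split at mu - sigma, mu, mu + sigma."""
    mu = 2 * lam / 3                 # mean of X
    sigma = lam / math.sqrt(18)      # variance is lam^2 / 18

    def F(x: float) -> float:        # CDF on [0, lam]
        return (x / lam) ** 2

    cuts = [0, mu - sigma, mu, mu + sigma, lam]
    return [F(hi) - F(lo) for lo, hi in zip(cuts, cuts[1:])]


p1, p2 = class_probs(1.0), class_probs(7.5)
assert all(math.isclose(x, y) for x, y in zip(p1, p2))
expected = [50 * p for p in p1]
print([round(p, 5) for p in p1], [round(e, 3) for e in expected])
```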
OCR MEI S3 2013 June Q3
19 marks Challenging +1.2
3 The random variable \(X\) has the following probability density function, \(\mathrm { f } ( x )\). $$f ( x ) = \begin{cases} k x ( x - 5 ) ^ { 2 } & 0 \leqslant x < 5 \\ 0 & \text { elsewhere } \end{cases}$$
  1. Sketch \(\mathrm { f } ( x )\).
  2. Find, in terms of \(k\), the cumulative distribution function, \(\mathrm { F } ( x )\).
  3. Hence show that \(k = \frac { 12 } { 625 }\).
    The random variable \(X\) is proposed as a model for the amount of time, in minutes, lost due to stoppages during a football match. The times lost in a random sample of 60 matches are summarised in the table. The table also shows some of the corresponding expected frequencies given by the model.
    Time (minutes) | \(0 \leqslant x < 1\) | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\)
    Observed frequency | 5 | 15 | 23 | 11 | 6
    Expected frequency | (not given) | (not given) | 17.76 | 9.12 | 1.632
  4. Find the remaining expected frequencies.
  5. Carry out a goodness of fit test, using a significance level of \(2.5 \%\), to see if the model might be suitable in this context.
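With \(k = 12/625\), the CDF from part 2 gives every expected frequency exactly. The sketch below expands \(x(x-5)^2\) to write \(\mathrm{F}\) as a polynomial, which is an assumption of this illustration, and confirms that \(\mathrm{F}(5) = 1\).

```python
from fractions import Fraction as Fr

k = Fr(12, 625)


def F(x):
    """CDF of f(x) = k x (x - 5)^2 on [0, 5): k(x^4/4 - 10x^3/3 + 25x^2/2)."""
    x = Fr(x)
    return k * (x ** 4 / 4 - 10 * x ** 3 / 3 + 25 * x ** 2 / 2)


assert F(5) == 1  # F(5) = 625k/12, so this pins down k = 12/625
expected = [60 * (F(b) - F(b - 1)) for b in range(1, 6)]
print([float(e) for e in expected])
```

Several of these expected frequencies are small, so adjacent cells would usually be pooled before the test statistic is computed.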
CAIE FP2 2013 June Q7
9 marks Standard +0.8
7 A random sample of 80 observations of the continuous random variable \(X\) was taken and the values are summarised in the following table.
Interval | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\)
Observed frequency | 36 | 29 | 9 | 6
It is required to test the goodness of fit of the distribution having probability density function f given by $$f ( x ) = \begin{cases} \frac { 3 } { x ^ { 2 } } & 2 \leqslant x < 6 \\ 0 & \text { otherwise. } \end{cases}$$ Show that the expected frequency for the interval \(2 \leqslant x < 3\) is 40 and calculate the remaining expected frequencies. Carry out a goodness of fit test, at the \(10 \%\) significance level.
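Integrating \(3/x^2\) gives \(\mathrm{P}(a \leqslant X < b) = 3(1/a - 1/b)\). The sketch below applies this to each unit interval and forms \(\sum (O - E)^2 / E\) in exact arithmetic. Every expected frequency is at least 5, so no cells are pooled.

```python
from fractions import Fraction as Fr


def cell_prob(a: int, b: int) -> Fr:
    """P(a <= X < b) for f(x) = 3 / x^2 on [2, 6)."""
    return 3 * (Fr(1, a) - Fr(1, b))


n = 80
observed = [36, 29, 9, 6]
expected = [n * cell_prob(a, a + 1) for a in range(2, 6)]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([int(e) for e in expected], float(stat))
```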
CAIE FP2 2014 June Q9
10 marks Standard +0.8
9 A random sample of 200 observations of the continuous random variable \(X\) was taken and the values are summarised in the following table.
Interval | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\) | \(6 \leqslant x < 7\) | \(7 \leqslant x < 8\)
Observed frequency | 63 | 45 | 32 | 25 | 22 | 7 | 6
It is required to test the goodness of fit of the distribution with probability density function \(f\) given by $$f ( x ) = \begin{cases} \frac { 1 } { x \ln 8 } & 1 \leqslant x < 8 \\ 0 & \text { otherwise } \end{cases}$$ The relevant expected frequencies, correct to 2 decimal places, are given in the following table.
Interval | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\) | \(6 \leqslant x < 7\) | \(7 \leqslant x < 8\)
Expected frequency | 66.67 | \(p\) | 27.67 | \(q\) | 17.54 | 14.83 | 12.84
Show that \(p = 39.00\), correct to 2 decimal places, and find the value of \(q\). Carry out a goodness of fit test at the 5\% significance level.
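For \(f(x) = 1/(x \ln 8)\), each cell probability reduces to \(\ln(b/a)/\ln 8\). The sketch below reproduces the full row of expected frequencies for \(n = 200\), including \(p\) and \(q\).

```python
import math

# f(x) = 1 / (x ln 8) on [1, 8) gives P(a <= X < b) = ln(b / a) / ln 8.
n = 200
edges = range(1, 9)
expected = [n * math.log(b / a) / math.log(8) for a, b in zip(edges, edges[1:])]
print([round(e, 2) for e in expected])
```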
CAIE FP2 2019 June Q9
10 marks Standard +0.8
9 A random sample of 50 observations of the continuous random variable \(X\) was taken and the values are summarised in the following table.
Interval | \(0 \leqslant x < 0.8\) | \(0.8 \leqslant x < 1.6\) | \(1.6 \leqslant x < 2.4\) | \(2.4 \leqslant x < 3.2\) | \(3.2 \leqslant x < 4\)
Observed frequency | 18 | 16 | 8 | 6 | 2
It is required to test the goodness of fit of the distribution with probability density function \(f\) given by $$f ( x ) = \begin{cases} \frac { 3 } { 16 } ( 4 - x ) ^ { \frac { 1 } { 2 } } & 0 \leqslant x < 4 \\ 0 & \text { otherwise. } \end{cases}$$ The relevant expected frequencies, correct to 2 decimal places, are given in the following table.
Interval | \(0 \leqslant x < 0.8\) | \(0.8 \leqslant x < 1.6\) | \(1.6 \leqslant x < 2.4\) | \(2.4 \leqslant x < 3.2\) | \(3.2 \leqslant x < 4\)
Expected frequency | 14.22 | 12.54 | 10.59 | 8.18 | 4.47
  1. Show how the expected frequency for \(1.6 \leqslant x < 2.4\) is obtained.
  2. Carry out a goodness of fit test at the \(5 \%\) significance level.
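Integrating \(f\) gives \(F(x) = 1 - (4 - x)^{3/2}/8\) on \([0, 4]\). This CDF is derived here rather than given in the question. The sketch below uses it to reproduce the expected-frequency table, including the \(1.6 \leqslant x < 2.4\) entry asked about in part 1.

```python
def F(x: float) -> float:
    """CDF of f(x) = (3/16)(4 - x)^(1/2) on [0, 4]."""
    return 1 - (4 - x) ** 1.5 / 8


n = 50
edges = [0, 0.8, 1.6, 2.4, 3.2, 4]
expected = [n * (F(hi) - F(lo)) for lo, hi in zip(edges, edges[1:])]
print([round(e, 2) for e in expected])
```

The last expected frequency is below 5, so it would typically be pooled with the previous cell before the test.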
CAIE FP2 2008 November Q9
10 marks Standard +0.3
9 A sample of 100 observations of the continuous random variable \(T\) was obtained and the values are summarised in the following table.
Interval | \(1 \leqslant t < 1.5\) | \(1.5 \leqslant t < 2\) | \(2 \leqslant t < 2.5\) | \(2.5 \leqslant t < 3\)
Frequency | 64 | 17 | 16 | 3
It is required to test the goodness of fit of the distribution with probability density function given by $$f ( t ) = \begin{cases} \frac { 9 } { 4 t ^ { 3 } } & 1 \leqslant t < 3 \\ 0 & \text { otherwise } \end{cases}$$ The relevant expected values are as follows.
Interval | \(1 \leqslant t < 1.5\) | \(1.5 \leqslant t < 2\) | \(2 \leqslant t < 2.5\) | \(2.5 \leqslant t < 3\)
Expected frequency | 62.5 | 21.875 | 10.125 | 5.5
Show how the expected value 10.125 is obtained. Carry out the test, at the \(10 \%\) significance level.
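From \(f(t) = 9/(4t^3)\), the CDF on \([1, 3)\) is \(F(t) = \tfrac{9}{8}(1 - 1/t^2)\), derived here by integration. The exact-fraction sketch below reproduces all four expected values, including 10.125.

```python
from fractions import Fraction as Fr


def F(t):
    """CDF of f(t) = 9 / (4 t^3) on [1, 3): (9/8)(1 - 1/t^2)."""
    return Fr(9, 8) * (1 - 1 / Fr(t) ** 2)


n = 100
edges = [Fr(1), Fr(3, 2), Fr(2), Fr(5, 2), Fr(3)]
expected = [n * (F(hi) - F(lo)) for lo, hi in zip(edges, edges[1:])]
print([float(e) for e in expected])
```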
CAIE FP2 2010 November Q11 OR
Standard +0.3
The continuous random variable \(T\) has a negative exponential distribution with probability density function given by $$\mathrm { f } ( t ) = \begin{cases} \lambda \mathrm { e } ^ { - \lambda t } & t \geqslant 0 \\ 0 & \text { otherwise } \end{cases}$$ Show that for \(t \geqslant 0\) the distribution function is given by \(\mathrm { F } ( t ) = 1 - \mathrm { e } ^ { - \lambda t }\). The table below shows some values of \(\mathrm { F } ( t )\) for the case when the mean is 20. Find the missing value.
\(t\) | 0 | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40
\(\mathrm {~F} ( t )\) | 0 | 0.2212 | 0.3935 | (missing) | 0.6321 | 0.7135 | 0.7769 | 0.8262 | 0.8647
It is thought that the lifetime of a species of insect under laboratory conditions has a negative exponential distribution with mean 20 hours. When observation starts there are 100 insects, which have been randomly selected. The lifetimes of the insects, in hours, are summarised in the table below.
Lifetime (hours)\(0 - 5\)\(5 - 10\)\(10 - 15\)\(15 - 20\)\(20 - 25\)\(25 - 30\)\(30 - 35\)\(35 - 40\)\(\geqslant 40\)
Frequency2020119985117
Calculate the expected values for each interval, assuming a negative exponential model with a mean of 20 hours, giving your values correct to 2 decimal places. Perform a \(\chi ^ { 2 }\)-test of goodness of fit, at the \(5 \%\) level of significance, in order to test whether a negative exponential distribution, with a mean of 20 hours, is a suitable model for the lifetime of this species of insect under laboratory conditions.
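With mean 20, \(\lambda = 1/20\) and \(\mathrm{F}(t) = 1 - e^{-t/20}\). The sketch below computes the missing table value and the expected frequencies for the nine lifetime intervals. It treats the final interval \(\geqslant 40\) as open-ended, using `math.inf` as its upper limit. The observed frequencies are not reproduced here.

```python
import math


def F(t: float, mean: float = 20.0) -> float:
    """Negative exponential CDF with the given mean (lambda = 1/mean)."""
    return 1 - math.exp(-t / mean)


missing = F(15)  # the blank entry in the F(t) table
edges = [0, 5, 10, 15, 20, 25, 30, 35, 40, math.inf]
expected = [100 * (F(hi) - F(lo)) for lo, hi in zip(edges, edges[1:])]
print(round(missing, 4), [round(e, 2) for e in expected])
```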
CAIE FP2 2011 November Q8
11 marks Standard +0.8
8 A sample of 216 observations of the continuous random variable \(X\) was obtained and the results are summarised in the following table.
Interval | \(0 \leqslant x < 1\) | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\)
Observed frequency | 1 | 3 | 15 | 31 | 59 | 107
It is suggested that these results are consistent with a distribution having probability density function f given by $$f ( x ) = \begin{cases} k x ^ { 2 } & 0 \leqslant x < 6 \\ 0 & \text { otherwise } \end{cases}$$ where \(k\) is a positive constant. The relevant expected frequencies are given in the following table.
Interval | \(0 \leqslant x < 1\) | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\)
Expected frequency | 1 | 7 | \(a\) | \(b\) | \(c\) | 91
  1. Show that \(a = 19\) and find the values of \(b\) and \(c\).
  2. Carry out a goodness of fit test at the \(10 \%\) significance level.
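Normalising gives \(k = 1/72\), so \(\mathrm{P}(a \leqslant X < b) = (b^3 - a^3)/216\). With 216 observations, each expected frequency is simply \(b^3 - a^3\). The first cell's expected frequency is below 5, so the sketch pools cells with a simple left-to-right rule. That rule is an assumption of this illustration, and other pooling choices may also be acceptable.

```python
from fractions import Fraction as Fr

n = 216
observed = [1, 3, 15, 31, 59, 107]
expected = [Fr(n * ((a + 1) ** 3 - a ** 3), 216) for a in range(6)]


def merge_small(obs, exp, minimum=5):
    """Pool cells left to right until each pooled expected count is >= minimum.

    A leftover tail that never reaches the minimum is folded into the last
    pooled cell. This is one simple convention, not the only acceptable one.
    """
    merged_o, merged_e = [], []
    acc_o, acc_e = 0, 0
    for o, e in zip(obs, exp):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o, acc_e = 0, 0
    if acc_e:
        merged_o[-1] += acc_o
        merged_e[-1] += acc_e
    return merged_o, merged_e


obs_m, exp_m = merge_small(observed, expected)
stat = sum((o - e) ** 2 / e for o, e in zip(obs_m, exp_m))
print([int(e) for e in expected], obs_m, [int(e) for e in exp_m], float(stat))
```

After pooling, five cells remain, which determines the degrees of freedom used in the test.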
CAIE FP2 2012 November Q10 OR
Standard +0.8
A continuous random variable \(X\) is believed to have the probability density function f given by $$f ( x ) = \begin{cases} \frac { 3 } { 10 } \left( 5 x - x ^ { 2 } - 4 \right) & 2 \leqslant x < 4 \\ 0 & \text { otherwise } \end{cases}$$ A random sample of 60 observations was taken and these values are summarised in the following grouped frequency table.
Interval | \(2 \leqslant x < 2.4\) | \(2.4 \leqslant x < 2.8\) | \(2.8 \leqslant x < 3.2\) | \(3.2 \leqslant x < 3.6\) | \(3.6 \leqslant x < 4\)
Observed frequency | 19 | 17 | 16 | 8 | 0
The estimated mean, based on the grouped data in the table above, is 2.69, correct to 2 decimal places. It is decided that a goodness of fit test will only be conducted if the mean predicted from the probability density function is within \(10 \%\) of the estimated mean. Show that this condition is satisfied. The relevant expected frequencies are as follows.
Interval | \(2 \leqslant x < 2.4\) | \(2.4 \leqslant x < 2.8\) | \(2.8 \leqslant x < 3.2\) | \(3.2 \leqslant x < 3.6\) | \(3.6 \leqslant x < 4\)
Expected frequency | 15.456 | 16.032 | 14.304 | 10.272 | 3.936
Show how the expected frequency for the interval \(3.2 \leqslant x < 3.6\) is obtained. Carry out the goodness of fit test at the 10\% significance level.
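Both preliminary checks reduce to polynomial integrals, so the sketch below works in exact fractions. It computes the model mean \(\mathrm{E}(X)\), the grouped-data estimate from class midpoints, the 10% condition, and the expected frequency for \(3.2 \leqslant x < 3.6\). The antiderivatives `G` and `H` are derived here and are not given in the question.

```python
from fractions import Fraction as Fr

C = Fr(3, 10)  # constant in f(x) = (3/10)(5x - x^2 - 4) on [2, 4)


def G(x):
    """Antiderivative of 5x - x^2 - 4."""
    x = Fr(x)
    return Fr(5, 2) * x ** 2 - x ** 3 / 3 - 4 * x


def H(x):
    """Antiderivative of x(5x - x^2 - 4), used for E(X)."""
    x = Fr(x)
    return Fr(5, 3) * x ** 3 - x ** 4 / 4 - 2 * x ** 2


pdf_mean = C * (H(4) - H(2))
midpoints = [Fr(11, 5), Fr(13, 5), Fr(3), Fr(17, 5), Fr(19, 5)]
observed = [19, 17, 16, 8, 0]
grouped_mean = sum(m * o for m, o in zip(midpoints, observed)) / 60
within_10pc = abs(pdf_mean - Fr("2.69")) <= Fr(1, 10) * Fr("2.69")
e_32_36 = 60 * C * (G(Fr(18, 5)) - G(Fr(16, 5)))
print(float(pdf_mean), float(grouped_mean), within_10pc, float(e_32_36))
```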
OCR Further Statistics Specimen Q8
15 marks Standard +0.3
8 A continuous random variable \(X\) has probability density function given by $$\mathrm { f } ( x ) = \left\{ \begin{array} { c c } 0.8 \mathrm { e } ^ { - 0.8 x } & x \geq 0 \\ 0 & x < 0 \end{array} \right.$$
  1. Find the mean and variance of \(X\).
    The lifetime of a certain organism is thought to have the same distribution as \(X\). The lifetimes in days of a random sample of 60 specimens of the organism were found. The observed frequencies, together with the expected frequencies correct to 3 decimal places, are given in the table.
    Range | \(0 \leq x < 1\) | \(1 \leq x < 2\) | \(2 \leq x < 3\) | \(3 \leq x < 4\) | \(x \geq 4\)
    Observed | 24 | 22 | 10 | 3 | 1
    Expected | 33.040 | 14.846 | 6.671 | 2.997 | 2.446
  2. Show how the expected frequency for \(1 \leq x < 2\) is obtained.
  3. Carry out a goodness of fit test at the \(5 \%\) significance level.
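For an exponential distribution with rate 0.8, the mean is \(1/0.8\) and the variance is \(1/0.8^2\). The expected frequencies are \(60\,[F(\text{upper}) - F(\text{lower})]\) with \(F(x) = 1 - e^{-0.8x}\). The sketch below reproduces the table and uses `math.inf` for the open cell \(x \geq 4\).

```python
import math

rate = 0.8
mean, variance = 1 / rate, 1 / rate ** 2


def F(x: float) -> float:
    """Exponential CDF with the given rate."""
    return 1 - math.exp(-rate * x)


n = 60
edges = [0, 1, 2, 3, 4, math.inf]
expected = [n * (F(hi) - F(lo)) for lo, hi in zip(edges, edges[1:])]
print(mean, variance, [round(e, 3) for e in expected])
```

The last two expected frequencies are below 5, so they would usually be pooled, together with a neighbouring cell if needed, before the statistic is formed.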
Edexcel S3 2023 June Q4
11 marks Standard +0.3
4 It is suggested that the delay, in hours, of certain flights from a particular country may be modelled by the continuous random variable \(T\), with probability density function
$$f ( t ) = \left\{ \begin{array} { c l } \frac { 2 } { 25 } t & 0 \leqslant t < 5 \\ 0 & \text { otherwise } \end{array} \right.$$
  1. Show that for \(0 \leqslant a \leqslant 4\) $$P ( a \leqslant T < a + 1 ) = \frac { 1 } { 25 } ( 2 a + 1 )$$
    A random sample of 150 of these flights is taken. The delays are summarised in the table below.
    Delay (\(\boldsymbol { t }\) hours) | Frequency
    \(0 \leqslant t < 1\) | 10
    \(1 \leqslant t < 2\) | 13
    \(2 \leqslant t < 3\) | 24
    \(3 \leqslant t < 4\) | 35
    \(4 \leqslant t < 5\) | 68
  2. Test, at the \(5 \%\) significance level, whether the given probability density function is a suitable model for these delays.
    You should state your hypotheses, expected frequencies, test statistic and the critical value used.
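Using the result of part 1, each expected frequency is \(150(2a + 1)/25\). The exact-arithmetic sketch below computes these and the statistic \(\sum (O - E)^2 / E\). All expected frequencies are at least 5, so no pooling is assumed. The critical value is not computed here.

```python
from fractions import Fraction as Fr

# P(a <= T < a + 1) = (2a + 1)/25 for f(t) = 2t/25 on [0, 5).
n = 150
observed = [10, 13, 24, 35, 68]
expected = [n * Fr(2 * a + 1, 25) for a in range(5)]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([int(e) for e in expected], float(stat))
```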