Chi-squared goodness of fit

157 questions · 17 question types identified

Chi-squared goodness of fit: Binomial

A question is this type if and only if it tests whether observed frequency data fits a binomial distribution, possibly with parameter estimated from data.

41 questions · Difficulty: Standard +0.4
26.1% of questions
Example question:
8 Drinking glasses are sold in packs of 4. The manufacturer conducts a survey to assess the quality of the glasses. The results from a sample of 50 randomly chosen packs are summarised in the following table.
Number of perfect glasses | 0 | 1 | 2 | 3 | 4
Number of packs | 1 | 3 | 10 | 17 | 19
Fit a binomial distribution to the data and carry out a goodness of fit test at the \(10 \%\) significance level.
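A minimal Python sketch of the fitting step for the glasses question. It assumes the usual conventions: \(p\) is estimated from the sample mean, and classes with expected frequency below 5 are pooled. Variable names are illustrative, and the critical value is copied from standard \(\chi^2\) tables rather than computed.

```python
from math import comb

# Observed data: number of perfect glasses in each of 50 packs of 4.
counts = {0: 1, 1: 3, 2: 10, 3: 17, 4: 19}
n_trials = 4
n_packs = sum(counts.values())  # 50

# Estimate p from the sample mean, since E(X) = n * p for B(n, p).
mean = sum(k * f for k, f in counts.items()) / n_packs
p_hat = mean / n_trials  # 3.0 / 4 = 0.75

# Expected frequencies under B(4, p_hat).
expected = {
    k: n_packs * comb(n_trials, k) * p_hat**k * (1 - p_hat) ** (n_trials - k)
    for k in counts
}

# Classes 0 and 1 stay below 5 even when combined, so pool 0, 1 and 2.
obs_pooled = [counts[0] + counts[1] + counts[2], counts[3], counts[4]]
exp_pooled = [expected[0] + expected[1] + expected[2], expected[3], expected[4]]

statistic = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))

# 3 classes, minus 1 for the fixed total, minus 1 for the estimated p.
dof = len(obs_pooled) - 1 - 1
critical_10pc = 2.706  # upper 10% point of chi-squared(1), from tables

print(f"p = {p_hat}, X^2 = {statistic:.3f}, dof = {dof}")
print("reject H0" if statistic > critical_10pc else "do not reject H0")
```

With these data the statistic is about 1.50, below 2.706, so the binomial model would not be rejected at the 10% level.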
Easiest question: Standard +0.3
3 Apples are sold in bags of 5. Based on her previous experience, Freya claims that the probability of any apple weighing more than 100 grams is 0.35, independently of other apples in the bag. The apples in a random sample of 150 bags are checked and the number, \(x\), in each bag weighing more than 100 grams is recorded. The results are shown in the following table.
\(x\) | 0 | 1 | 2 | 3 | 4 | 5
Frequency | 12 | 39 | 46 | 37 | 12 | 4
Carry out a goodness of fit test at the \(5 \%\) significance level and hence comment on Freya's claim.
Hardest question: Challenging +1.2 (same as the example question above)
Chi-squared goodness of fit: Poisson

A question is this type if and only if it tests whether observed frequency data fits a Poisson distribution, possibly with parameter estimated from data.

21 questions · Difficulty: Standard +0.4
13.4% of questions
Example question:
8 The number of goals scored by a certain football team was recorded for each of 100 matches, and the results are summarised in the following table.
Number of goals | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more
Frequency | 12 | 16 | 31 | 25 | 13 | 3 | 0
Fit a Poisson distribution to the data, and test its goodness of fit at the 5% significance level.
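The same steps for the Poisson example, as a hedged Python sketch: \(\lambda\) is estimated by the sample mean, the sparse upper classes are pooled, and one extra degree of freedom is lost for the estimated parameter. Treating "6 or more" as exactly 6 when computing the mean is an assumption; it makes no difference here because that class has frequency 0. Names are illustrative.

```python
from math import exp, factorial

# Goals per match for 100 matches; the last class is "6 or more".
observed = [12, 16, 31, 25, 13, 3, 0]
n = sum(observed)  # 100

# Sample mean as the estimate of lambda ("6 or more" counted as 6; its frequency is 0).
lam = sum(k * f for k, f in enumerate(observed)) / n  # 2.2

# P(X = k) for k = 0..5, then the tail P(X >= 6) so the probabilities sum to 1.
probs = [exp(-lam) * lam**k / factorial(k) for k in range(6)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]

# Expected frequencies for 5 and "6 or more" are below 5, so pool them.
obs_pooled = observed[:5] + [observed[5] + observed[6]]
exp_pooled = expected[:5] + [expected[5] + expected[6]]

statistic = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
dof = len(obs_pooled) - 1 - 1  # 6 classes, fixed total, estimated lambda
critical_5pc = 9.488  # upper 5% point of chi-squared(4), from tables

print(f"lambda = {lam}, X^2 = {statistic:.2f}, dof = {dof}")
```

Here the statistic is about 7.99, below the tabulated 9.488, so the Poisson model would not be rejected at the 5% level.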
Easiest question: Moderate -0.8
2 The random variable \(T\) has an exponential distribution with mean 2. Find \(\mathrm { P } ( T \leq 1.4 )\). Circle your answer: \(\mathrm { e } ^ { - 2.8 }\), \(\mathrm { e } ^ { - 0.7 }\), \(1 - \mathrm { e } ^ { - 0.7 }\), \(1 - \mathrm { e } ^ { - 2.8 }\).
The continuous random variable \(Y\) has cumulative distribution function $$\mathrm { F } ( y ) = \left\{ \begin{array} { l r } 0 & y < 2 \\ - \frac { 1 } { 9 } y ^ { 2 } + \frac { 10 } { 9 } y - \frac { 16 } { 9 } & 2 \leq y < 5 \\ 1 & y \geq 5 \end{array} \right.$$ Find the median of \(Y\). Circle your answer: 2, \(\frac { 10 - 3 \sqrt { 2 } } { 2 }\), \(\frac { 7 } { 2 }\), \(\frac { 10 + 3 \sqrt { 2 } } { 2 }\).
4 Research has shown that the mean number of volcanic eruptions on Earth each day is 20. Sandra records 162 volcanic eruptions during a period of one week. Sandra claims that there has been an increase in the mean number of volcanic eruptions per week. Test Sandra's claim at the \(5 \%\) level of significance.
5 The continuous random variable \(X\) has probability density function $$f ( x ) = \begin{cases} \frac { 1 } { 6 } e ^ { \frac { x } { 3 } } & 0 \leq x \leq \ln 27 \\ 0 & \text { otherwise } \end{cases}$$ Show that the mean of \(X\) is \(\frac { 3 } { 2 } ( \ln 27 - 2 )\).
6 Over time it has been accepted that the mean retirement age for professional baseball players is 29.5 years old. Imran claims that the mean retirement age is no longer 29.5 years old.
He takes a random sample of 5 recently retired professional baseball players and records their retirement ages, \(x\). The results are $$\sum x = 152.1 \quad \text { and } \quad \sum ( x - \bar { x } ) ^ { 2 } = 7.81$$
  1. State an assumption that you should make about the distribution of the retirement ages to investigate Imran's claim.
  2. Investigate Imran's claim, using the 10% level of significance.
Hardest question: Challenging +1.2 (same as the example question above)
Chi-squared goodness of fit: Other continuous

A question is this type if and only if it tests whether data fits a specified continuous probability density function other than normal or uniform.

21 questions · Difficulty: Standard +0.6
13.4% of questions
Example question:
7 A random sample of 80 observations of the continuous random variable \(X\) was taken and the values are summarised in the following table.
Interval | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\)
Observed frequency | 36 | 29 | 9 | 6
It is required to test the goodness of fit of the distribution having probability density function f given by $$f ( x ) = \begin{cases} \frac { 3 } { x ^ { 2 } } & 2 \leqslant x < 6 \\ 0 & \text { otherwise. } \end{cases}$$ Show that the expected frequency for the interval \(2 \leqslant x < 3\) is 40 and calculate the remaining expected frequencies. Carry out a goodness of fit test, at the \(10 \%\) significance level.
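For a continuous model, the expected frequency of each class is \(n\) times the probability over that interval, taken from the cumulative distribution function. A minimal Python sketch for the \(3/x^2\) example, using exact fractions so that the expected frequency of 40 for \(2 \leqslant x < 3\) comes out exactly; the helper name `cdf` is illustrative.

```python
from fractions import Fraction

# Class boundaries and observed frequencies for the 80 observations.
edges = [2, 3, 4, 5, 6]
observed = [36, 29, 9, 6]
n = sum(observed)  # 80

def cdf(x):
    # F(x) = integral from 2 to x of 3/t^2 dt = 3(1/2 - 1/x), for 2 <= x <= 6.
    return 3 * (Fraction(1, 2) - Fraction(1, x))

# Expected frequency for [a, b) is n * (F(b) - F(a)).
expected = [n * (cdf(b) - cdf(a)) for a, b in zip(edges, edges[1:])]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1  # the model has no estimated parameters
critical_10pc = 6.251  # upper 10% point of chi-squared(3), from tables

print([int(e) for e in expected], float(statistic), dof)
```

The expected frequencies are 40, 20, 12 and 8, all at least 5, and the statistic is 5.7 on 3 degrees of freedom, below the tabulated 10% point of 6.251.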
Easiest question: Standard +0.3
4 The lengths of time, in seconds, between vehicles passing a fixed observation point on a road were recorded at a time when traffic was flowing freely. The frequency distribution in Table 1 is a summary of the data from 100 observations.
Time interval (\(x\) seconds) | \(0 < x \leqslant 5\) | \(5 < x \leqslant 10\) | \(10 < x \leqslant 20\) | \(20 < x \leqslant 40\) | \(40 < x\)
Observed frequency | 49 | 22 | 20 | 7 | 2
Table 1
It is thought that the distribution of times might be modelled by the continuous random variable \(X\) with probability density function given by $$f ( x ) = \begin{cases} 0.1 e ^ { - 0.1 x } & x > 0 \\ 0 & \text { otherwise } \end{cases}$$ Using this model, the expected frequencies (correct to 2 decimal places) for the given time intervals are shown in Table 2.
Time interval (\(x\) seconds) | \(0 < x \leqslant 5\) | \(5 < x \leqslant 10\) | \(10 < x \leqslant 20\) | \(20 < x \leqslant 40\) | \(40 < x\)
Expected frequency | 39.35 | 23.87 | 23.25 | 11.70 | 1.83
Table 2
  1. Show how the expected frequency of 23.87, corresponding to the interval \(5 < x \leqslant 10\), is obtained.
  2. Test, at the 10% significance level, the goodness of fit of the model to the data.
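A hedged Python sketch for the traffic example. Each expected frequency is \(100\left(\mathrm{e}^{-0.1a} - \mathrm{e}^{-0.1b}\right)\) for the interval \(a < x \leqslant b\); for \(5 < x \leqslant 10\) this gives \(100(\mathrm{e}^{-0.5} - \mathrm{e}^{-1}) \approx 23.87\). Pooling the final class (expected 1.83) into its neighbour follows the usual below-5 rule and is an assumption about how the test would be set out. Names are illustrative.

```python
from math import exp, inf

# Interval boundaries (seconds) and observed frequencies for 100 gaps.
edges = [0, 5, 10, 20, 40, inf]
observed = [49, 22, 20, 7, 2]
n = sum(observed)  # 100
rate = 0.1  # from f(x) = 0.1 e^{-0.1x}

def survival(x):
    # P(X > x) = e^{-0.1x}; exp(-inf) is 0.0, which covers the open last class.
    return exp(-rate * x)

# Expected frequency for a < x <= b is n * (P(X > a) - P(X > b)).
expected = [n * (survival(a) - survival(b)) for a, b in zip(edges, edges[1:])]

# The last class has expected frequency 1.83 < 5, so pool it with 20 < x <= 40.
obs_pooled = observed[:3] + [observed[3] + observed[4]]
exp_pooled = expected[:3] + [expected[3] + expected[4]]

statistic = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
dof = len(obs_pooled) - 1  # the rate 0.1 is given, not estimated
critical_10pc = 6.251  # upper 10% point of chi-squared(3), from tables

print([round(e, 2) for e in expected])
print(f"X^2 = {statistic:.2f}, dof = {dof}")
```

The rounded expected frequencies reproduce Table 2, and the statistic is about 4.49 on 3 degrees of freedom, below the tabulated 10% point of 6.251.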
Hardest question: Challenging +1.2
3 The random variable \(X\) has the following probability density function, \(\mathrm { f } ( x )\). $$f ( x ) = \begin{cases} k x ( x - 5 ) ^ { 2 } & 0 \leqslant x < 5 \\ 0 & \text { elsewhere } \end{cases}$$
  1. Sketch \(\mathrm { f } ( x )\).
  2. Find, in terms of \(k\), the cumulative distribution function, \(\mathrm { F } ( x )\).
  3. Hence show that \(k = \frac { 12 } { 625 }\).
The random variable \(X\) is proposed as a model for the amount of time, in minutes, lost due to stoppages during a football match. The times lost in a random sample of 60 matches are summarised in the table. The table also shows some of the corresponding expected frequencies given by the model.
Time (minutes) | \(0 \leqslant x < 1\) | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\)
Observed frequency | 5 | 15 | 23 | 11 | 6
Expected frequency |  |  | 17.76 | 9.12 | 1.632
  4. Find the remaining expected frequencies.
  5. Carry out a goodness of fit test, using a significance level of \(2.5 \%\), to see if the model might be suitable in this context.
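The remaining expected frequencies follow from differences of the cumulative distribution function \(\mathrm{F}(x) = k\left(\frac{x^4}{4} - \frac{10x^3}{3} + \frac{25x^2}{2}\right)\) for \(0 \leqslant x \leqslant 5\). A minimal Python sketch using exact fractions, assuming the last two classes are pooled because 1.632 is below 5; names are illustrative.

```python
from fractions import Fraction

k = Fraction(12, 625)

def cdf(x):
    # F(x) = k(x^4/4 - 10x^3/3 + 25x^2/2) for 0 <= x <= 5, from integrating kx(x - 5)^2.
    x = Fraction(x)
    return k * (x**4 / 4 - 10 * x**3 / 3 + 25 * x**2 / 2)

n = 60
edges = [0, 1, 2, 3, 4, 5]
observed = [5, 15, 23, 11, 6]

# Expected frequency for [a, b) is n * (F(b) - F(a)).
expected = [n * (cdf(b) - cdf(a)) for a, b in zip(edges, edges[1:])]

# 1.632 is below 5, so pool the last two classes.
obs_pooled = observed[:3] + [observed[3] + observed[4]]
exp_pooled = expected[:3] + [expected[3] + expected[4]]

statistic = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
dof = len(obs_pooled) - 1  # k is fixed by normalisation, not estimated from the data
critical_2_5pc = 9.348  # upper 2.5% point of chi-squared(3), from tables

print([float(e) for e in expected])
print(f"X^2 = {float(statistic):.2f}, dof = {dof}")
```

This gives 10.848 and 20.64 for the first two classes and a statistic of about 9.87 on 3 degrees of freedom, above the tabulated 2.5% point of 9.348, so on these figures the model would be rejected.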
Chi-squared goodness of fit: Uniform

A question is this type if and only if it tests whether data fits a uniform (discrete or continuous) distribution, including equal proportions.

14 questions · Difficulty: Standard +0.2
8.9% of questions
Example question:
2 The discrete random variable \(Y\) is uniformly distributed over the values \(\{ 12,13 , \ldots , 20 \}\).
  1. Write down \(\mathrm { P } ( Y < 15 )\).
  2. Two independent observations of \(Y\) are taken. Find the probability that one of these values is less than 15 and the other is greater than 15 .
  3. Find \(\mathrm { P } ( Y > \mathrm { E } ( Y ) )\).
Easiest question: Moderate -0.8 (same as the example question above)
Hardest question: Standard +0.8
6. A total of 228 items are collected from an archaeological site. The distance from the centre of the site is recorded for each item. The results are summarised in the table below.
Distance from the centre of the site (m) | \(0 - 1\) | \(1 - 2\) | \(2 - 4\) | \(4 - 6\) | \(6 - 9\) | \(9 - 12\)
Number of items | 22 | 15 | 44 | 37 | 52 | 58
Test, at the \(5 \%\) level of significance, whether or not the data can be modelled by a continuous uniform distribution. State your hypotheses clearly.
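Under a continuous uniform model the expected frequency of each class is proportional to its width. A minimal Python sketch, assuming the distribution is uniform over the whole recorded range 0 to 12 m (the question does not state the limits explicitly); names are illustrative.

```python
# Distance classes (m) and observed counts from the table.
edges = [0, 1, 2, 4, 6, 9, 12]
observed = [22, 15, 44, 37, 52, 58]
n = sum(observed)  # 228

# Assumed support: the full recorded range, 0 to 12 m.
support_width = edges[-1] - edges[0]

# Uniform model: expected count is proportional to class width.
expected = [n * (b - a) / support_width for a, b in zip(edges, edges[1:])]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1  # no parameters estimated from the data
critical_5pc = 11.070  # upper 5% point of chi-squared(5), from tables

print(expected)
print(f"X^2 = {statistic:.3f}, dof = {dof}")
```

The expected frequencies are 19, 19, 38, 38, 57 and 57, and the statistic is \(313/114 \approx 2.75\) on 5 degrees of freedom, well below the tabulated 5% point of 11.070.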
Chi-squared goodness of fit: Given ratios

A question is this type if and only if it tests whether observed frequencies match specified theoretical ratios or proportions.

14 questions · Difficulty: Standard +0.5
8.9% of questions
Example question:
2 In a study of the inheritance of skin colouration in corn snakes, a researcher found 865 snakes with black and orange bodies, 320 snakes with black bodies, 335 snakes with orange bodies and 112 snakes with bodies of other colours. Theory predicts that snakes of these colours should occur in the ratios \(9 : 3 : 3 : 1\). Test, at the \(5 \%\) significance level, whether these experimental results are compatible with theory.
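For given ratios, each expected frequency is the total multiplied by that category's share of the ratio sum. A minimal Python sketch for the corn snake data; names are illustrative and the critical value is taken from tables.

```python
observed = [865, 320, 335, 112]  # black & orange, black, orange, other
ratios = [9, 3, 3, 1]
n = sum(observed)  # 1632

# Each expected count is n times that category's share of the ratio total.
expected = [n * r / sum(ratios) for r in ratios]  # 918, 306, 306, 102

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1
critical_5pc = 7.815  # upper 5% point of chi-squared(3), from tables

print(f"X^2 = {statistic:.3f}, dof = {dof}")
print("reject H0" if statistic > critical_5pc else "do not reject H0")
```

The statistic is about 7.43 on 3 degrees of freedom, just below the tabulated 5% point of 7.815, so these results would be judged compatible with the 9 : 3 : 3 : 1 theory.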
Easiest question: Standard +0.3 (same as the example question above)
Hardest question: Challenging +1.2
7 Benford's Law states that, in many tables containing large numbers of numerical values, the probability distribution of the leading non-zero digit \(D\) is given by $$\mathrm { P } ( D = d ) = \log _ { 10 } \left( \frac { d + 1 } { d } \right) , \quad d = 1,2 , \ldots , 9 .$$ The following table shows a summary of a random sample of 100 non-zero leading digits taken from a table of cumulative probabilities for the Poisson distribution.
Leading digit | 1 | 2 | 3 | 4 | 5 | \(\geqslant 6\)
Frequency | 22 | 21 | 13 | 11 | 11 | 22
Carry out a suitable goodness of fit test at the 10% significance level.
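The Benford probabilities for digits 6 to 9 telescope, \(\sum_{d=6}^{9} \log_{10}\frac{d+1}{d} = \log_{10}\frac{10}{6}\), which gives the probability for the pooled class \(\geqslant 6\). A minimal Python sketch; names are illustrative and the critical value is taken from tables.

```python
from math import log10

# Leading digits 1-5 and the pooled class ">= 6".
observed = [22, 21, 13, 11, 11, 22]
n = sum(observed)  # 100

# Benford's Law: P(D = d) = log10((d + 1) / d) for d = 1, ..., 9.
probs = [log10((d + 1) / d) for d in range(1, 6)]
# The terms for d = 6..9 telescope to log10(10 / 6).
probs.append(log10(10 / 6))

expected = [n * p for p in probs]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1  # the model has no estimated parameters
critical_10pc = 9.236  # upper 10% point of chi-squared(5), from tables

print([round(e, 2) for e in expected])
print(f"X^2 = {statistic:.2f}, dof = {dof}")
```

All expected frequencies exceed 5, so no further pooling is needed; the statistic is about 4.23 on 5 degrees of freedom, below the tabulated 10% point of 9.236.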
Chi-squared goodness of fit: Normal

A question is this type if and only if it tests whether continuous data fits a normal distribution, using grouped frequency data and possibly estimated parameters.

10 questions · Difficulty: Standard +0.3
6.4% of questions
Example question:
8 The continuous random variable \(Y\) has a distribution with mean \(\mu\) and variance 20. A random sample of 50 observations of \(Y\) is selected and these observations are summarised in the following grouped frequency table.
Values | \(y < 20\) | \(20 \leqslant y < 25\) | \(25 \leqslant y < 30\) | \(y \geqslant 30\)
Frequency | 3 | 27 | 12 | 8
  1. Assuming that \(Y \sim \mathrm {~N} ( 25,20 )\), show that the expected frequency for the interval \(20 \leqslant y < 25\) is 18.41, correct to 2 decimal places, and obtain the remaining expected frequencies.
  2. Test, at the \(5 \%\) significance level, whether the distribution \(\mathrm { N } ( 25,20 )\) fits the data.
  3. Given that the sample mean is 24.91, find a \(98 \%\) confidence interval for \(\mu\).
  4. Does the outcome of the test in part 2 affect the validity of the confidence interval found in part 3? Justify your answer.
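The normal expected frequencies can be computed with the identity \(\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)\), which Python's `math.erf` supports directly. A minimal sketch for \(\mathrm{N}(25, 20)\), where 20 is the variance; the helper `normal_cdf` is illustrative, not a library function.

```python
from math import erf, inf, sqrt

def normal_cdf(x, mu, var):
    # Phi((x - mu) / sigma) written with the error function; erf(+/-inf) is +/-1.
    return 0.5 * (1 + erf((x - mu) / sqrt(2 * var)))

# Class boundaries and observed frequencies for the 50 observations of Y.
edges = [-inf, 20, 25, 30, inf]
observed = [3, 27, 12, 8]
n = sum(observed)  # 50
mu, var = 25, 20  # hypothesised N(25, 20); 20 is the variance

expected = [n * (normal_cdf(b, mu, var) - normal_cdf(a, mu, var))
            for a, b in zip(edges, edges[1:])]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1  # mean and variance are specified, not estimated
critical_5pc = 7.815  # upper 5% point of chi-squared(3), from tables

print([round(e, 2) for e in expected])
print(f"X^2 = {statistic:.2f}, dof = {dof}")
```

The two middle classes come out at 18.41 each and the outer classes at about 6.59; the statistic is about 8.50 on 3 degrees of freedom, above the tabulated 5% point of 7.815.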
Easiest question: Standard +0.3
2 It is claimed that the heights of a particular age group of boys follow a normal distribution with mean 125 cm and standard deviation 12 cm. Observations for a randomly chosen group of 60 boys in this age group are summarised in the following table. The table also gives the expected frequencies, correct to 2 decimal places, based on the normal distribution with mean 125 cm and standard deviation 12 cm.
Height, \(x \mathrm {~cm}\) | \(x < 100\) | \(100 \leqslant x < 110\) | \(110 \leqslant x < 120\) | \(120 \leqslant x < 130\) | \(130 \leqslant x < 140\) | \(x \geqslant 140\)
Observed frequency | 0 | 3 | 15 | 23 | 11 | 8
Expected frequency | 1.12 | 5.22 | 13.97 | 19.38 | 13.97 | 6.34
  1. Show how the expected frequency for \(130 \leqslant x < 140\) is obtained.
  2. Carry out a goodness of fit test, at the \(5 \%\) significance level, to determine whether the claim is supported by the data.
Hardest question: Standard +0.3 (same as the easiest question above)
Spreadsheet-based chi-squared test

A question is this type if and only if it presents chi-squared test data in a spreadsheet format with some values deliberately omitted to be calculated.

9 questions · Difficulty: Standard +0.3
5.7% of questions
Example question:
At a bird feeding station, birds are captured and ringed. If a bird is recaptured, the ring enables it to be identified. The table below shows the number of recaptures, \(x\), during a period of a month, for each bird of a particular species in a random sample of \(40\) birds.
Number of recaptures, \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Frequency | 2 | 5 | 5 | 9 | 10 | 4 | 3 | 1 | 0 | 1 | 0
  1. The sample mean of \(x\) is \(3.4\). Calculate the sample variance of \(x\). [2]
  2. Briefly comment on whether the results of part (i) support a suggestion that a Poisson model might be a good fit to the data. [1]
The screenshot below shows part of a spreadsheet for a \(\chi^2\) test to assess the goodness of fit of a Poisson model. The sample mean of \(3.4\) has been used as an estimate of the Poisson parameter. Some values in the spreadsheet have been deliberately omitted.
[Spreadsheet screenshot not reproduced]
  1. State the null and alternative hypotheses for the test. [1]
  2. Calculate the missing values in cells
  3. Complete the test at the \(10\%\) significance level. [5]
  4. The screenshot below shows part of a spreadsheet for a \(\chi^2\) test for a different species of bird. Find the value of the Poisson parameter used. [Spreadsheet screenshot not reproduced] [3]
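A minimal Python sketch of the sample mean and variance asked for in the first part of the bird question, using exact fractions and assuming the divisor \(n - 1\) for the sample variance; names are illustrative.

```python
from fractions import Fraction

# Frequencies of x = 0, 1, ..., 10 recaptures for 40 birds.
frequency = [2, 5, 5, 9, 10, 4, 3, 1, 0, 1, 0]
n = sum(frequency)  # 40

sum_x = sum(x * f for x, f in enumerate(frequency))       # 136
sum_x2 = sum(x * x * f for x, f in enumerate(frequency))  # 604

mean = Fraction(sum_x, n)
# S_xx = sum(x^2) - (sum x)^2 / n, then divide by n - 1 for the sample variance.
s_xx = sum_x2 - Fraction(sum_x**2, n)
variance = s_xx / (n - 1)

print(float(mean), round(float(variance), 4))
```

The variance, \(236/65 \approx 3.63\), is close to the mean of 3.4, which is consistent with a Poisson model, whose mean and variance are equal.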
Easiest question: Standard +0.3
5 A random sample of workers for a large company were asked whether they are smokers, ex-smokers or have never smoked. The responses were classified by the type of worker: Managerial, Production line or Administrative. Fig. 5 is a screenshot showing part of the spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted.
Row | A | B | C | D | E
1 | Observed frequencies
2 | | Smoker | Ex-smoker | Never smoked | Totals
3 | Managerial | 2 | 10 | 5 | 17
4 | Production line | 18 | 15 | 21 | 54
5 | Administrative | 13 | 6 | 14 | 33
6 | Totals | 33 | 31 | 40 | 104
7 |
8 | Expected frequencies
9 | | 5.3942 | 5.0673 | 6.5385
10 | | 17.1346 | (omitted) | 20.7692
11 | | 10.4712 | 9.8365 | 12.6923
12 |
13 | Contributions to the test statistic
14 | | 2.1358 | 4.8017 | 0.3620
15 | | 0.0437 | (omitted) | 0.0026
16 | | (omitted) | 1.4964 | 0.1347
17 | Test statistic | 9.66
18 |
Fig. 5
  1. (A) State the sample size.
    (B) State the null and alternative hypotheses for a test to investigate whether there is any association between type of worker and smoking status.
  2. Showing your calculations, find the missing values in each of the following cells.
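Each expected frequency in a contingency table is (row total × column total) / grand total, and each contribution is \((O - E)^2 / E\). A minimal Python sketch that rebuilds the spreadsheet's calculations from the observed counts in Fig. 5; names are illustrative.

```python
# Observed counts: rows are worker types, columns are smoker, ex-smoker, never smoked.
observed = [
    [2, 10, 5],    # Managerial
    [18, 15, 21],  # Production line
    [13, 6, 14],   # Administrative
]
row_totals = [sum(row) for row in observed]        # 17, 54, 33
col_totals = [sum(col) for col in zip(*observed)]  # 33, 31, 40
grand_total = sum(row_totals)                      # 104

# Expected frequency for each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell to the test statistic: (O - E)^2 / E.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
statistic = sum(sum(row) for row in contributions)
dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1)(3 - 1) = 4

# The three cells left blank in Fig. 5.
print(round(expected[1][1], 4), round(contributions[1][1], 4), round(contributions[2][0], 4))
print(f"X^2 = {statistic:.2f}, dof = {dof}")
```

This gives 16.0962 for the blank expected frequency and 0.0746 and 0.6107 for the blank contributions, and reproduces the test statistic of 9.66 on 4 degrees of freedom.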
Hardest question: Standard +0.3 (same as the easiest question above)
Chi-squared test of independence

A question is this type if and only if it involves testing whether two categorical variables are independent using a contingency table and chi-squared test.

8 questions · Difficulty: Standard +0.1
5.1% of questions
Example question:
The manager of a leisure centre collected data on the usage of the facilities in the centre by its members. A random sample from her records is summarised below.
Facility | Male | Female
Pool | 40 | 68
Jacuzzi | 26 | 33
Gym | 52 | 31
Making your method clear, test whether or not there is any evidence of an association between gender and use of the club facilities. State your hypotheses clearly and use a 5% level of significance. [11]
Chi-squared with algebraic frequencies

A question is this type if and only if observed or expected frequencies are given algebraically (in terms of variables) and require manipulation or finding constraints.

5 questions · Difficulty: Standard +0.8
3.2% of questions
Example question:
The table shows the results of a random sample drawn from a population which is thought to have the distribution \(U(20)\).
Range | \(1 \leq x \leq 8\) | \(9 \leq x \leq 12\) | \(13 \leq x \leq 20\)
Observed frequency | 12 | \(y\) | \(28 - y\)
Find the range of values of \(y\) for which the data are not consistent with the distribution at the \(5\%\) significance level. [9]
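A minimal Python sketch that finds the values of \(y\) by brute force. It assumes \(U(20)\) denotes the discrete uniform distribution on \(\{1, 2, \ldots, 20\}\) and that \(y\) is an integer with \(0 \leqslant y \leqslant 28\). With 3 classes and no estimated parameters there are 2 degrees of freedom, and 5.991 is the tabulated upper 5% point; names are illustrative.

```python
# Assumption: U(20) is uniform on the integers 1, 2, ..., 20, and y is an integer.
n = 40  # 12 + y + (28 - y)
class_sizes = [8, 4, 8]  # 1-8, 9-12, 13-20
expected = [n * size / 20 for size in class_sizes]  # 16, 8, 16

critical_5pc = 5.991  # upper 5% point of chi-squared(2), from tables

def chi_squared_for(y):
    observed = [12, y, 28 - y]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Frequencies cannot be negative, so 0 <= y <= 28.
inconsistent = [y for y in range(29) if chi_squared_for(y) > critical_5pc]
print(inconsistent)
```

Algebraically the condition is \(1 + \frac{(y-8)^2}{8} + \frac{(12-y)^2}{16} > 5.991\), and the search returns \(y \leqslant 4\) or \(y \geqslant 15\).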
Assess model suitability before testing

A question is this type if and only if it asks to comment on whether a distribution is suitable by comparing sample mean and variance or other preliminary checks before formal testing.

5 questions · Difficulty: Standard +0.3
3.2% of questions
Example question:
Chai packs china mugs into cardboard boxes. Chai's manager suspects that breakages occur at random times and that the number of breakages may follow a Poisson distribution. He takes a small sample of observations and finds that the number of breakages in a one-hour period has a mean of 2.4 and a standard deviation of 1.5.
  1. Explain how this information tends to support the manager's suspicion. [2]
The manager now takes a larger sample and claims that the numbers of breakages in a one-hour period follow a Poisson distribution. The numbers of breakages in a random sample of 180 one-hour periods are summarised in the following table.
Number of breakages | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more
Frequency | 21 | 33 | 46 | 31 | 23 | 16 | 10 | 0
The mean number of breakages calculated from this sample is 2.5.
  2. Use the data from this larger sample to carry out a goodness of fit test, at the 10% significance level, to test the claim. [8]
F-test for equality of variances

A question is this type if and only if it involves testing whether two population variances are equal using the F-distribution, typically comparing two independent samples from normal populations.

4 questions · Difficulty: Standard +0.5
2.5% of questions
Example question:
  1. The random variable \(X\) has an \(F\)-distribution with 8 and 12 degrees of freedom.
Find \(\mathrm { P } \left( \frac { 1 } { 5.67 } < X < 2.85 \right)\).
(4)
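A sketch of the standard approach, assuming the tabulated values 5.67 (upper 1% point of \(F_{12,8}\)) and 2.85 (upper 5% point of \(F_{8,12}\)), and using the fact that \(1/X \sim F_{12,8}\) when \(X \sim F_{8,12}\):

$$\mathrm{P}\left(X < \tfrac{1}{5.67}\right) = \mathrm{P}\left(\tfrac{1}{X} > 5.67\right) = 0.01, \qquad \mathrm{P}(X > 2.85) = 0.05$$
$$\mathrm{P}\left(\tfrac{1}{5.67} < X < 2.85\right) = 1 - 0.01 - 0.05 = 0.94$$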
Chi-squared distribution theory and properties

A question is this type if and only if it involves theoretical properties of the chi-squared distribution such as moment generating functions, deriving expected values and variances, or verifying probability density functions.

3 questions · Difficulty: Standard +0.8
1.9% of questions
Example question:
4 The random variable \(X\) has a \(\chi ^ { 2 }\) distribution with \(v\) degrees of freedom. The moment generating function of \(X\) is $$\mathrm { M } _ { X } ( t ) = ( 1 - 2 t ) ^ { - \frac { 1 } { 2 } v }$$
  1. Show that \(\mathrm { E } ( X ) = v\).
  2. Find \(\operatorname { Var } ( X )\).
  3. Obtain the moment generating function of the sum \(Y\) of two independent \(\chi ^ { 2 }\) random variables, one with 6 degrees of freedom and the other with 8 degrees of freedom.
  4. Identify the distribution of \(Y\).
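A sketch of the derivation: the moments come from differentiating \(\mathrm{M}_X(t)\) at \(t = 0\), and for independent variables the moment generating function of the sum is the product of the individual moment generating functions.

$$\mathrm{M}_X'(t) = v(1 - 2t)^{-\frac{1}{2}v - 1}, \qquad \mathrm{E}(X) = \mathrm{M}_X'(0) = v$$
$$\mathrm{M}_X''(t) = v(v + 2)(1 - 2t)^{-\frac{1}{2}v - 2}, \qquad \mathrm{E}(X^2) = \mathrm{M}_X''(0) = v^2 + 2v$$
$$\operatorname{Var}(X) = \mathrm{E}(X^2) - [\mathrm{E}(X)]^2 = v^2 + 2v - v^2 = 2v$$
$$\mathrm{M}_Y(t) = (1 - 2t)^{-3}(1 - 2t)^{-4} = (1 - 2t)^{-7} = (1 - 2t)^{-\frac{1}{2}(14)}$$

This is the moment generating function of a \(\chi^2\) distribution with 14 degrees of freedom, so, by uniqueness of moment generating functions, \(Y \sim \chi^2_{14}\).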
Interpret chi-squared test results

A question is this type if and only if it asks for interpretation or comment on chi-squared test results in context, including which cells contribute most.

1 question · Difficulty: Standard +0.3
0.6% of questions
Example question:
  1. State the null hypothesis that Emily used.
  2. Find the value of the test statistic, \(X ^ { 2 }\), giving your answer to one decimal place.
  3. State, in context, the conclusion that Emily should reach based on the results of her \(\chi ^ { 2 }\) test.
  4. Make one comment on the GCSE performances of 16-year-old students attending Bailey Language School.
  5. Emily's friend, Joanna, used the same data to correctly conduct a \(\chi ^ { 2 }\) test using the \(10 \%\) level of significance. State, with justification, the conclusion that Joanna should reach.
Degrees of freedom determination

A question is this type if and only if it specifically asks to explain or justify the number of degrees of freedom in a chi-squared test.

1 question · Difficulty: Moderate -0.3
0.6% of questions
Example question:
  1. Kelly throws a tetrahedral die \(n\) times and records the number on which it lands for each throw.
She calculates the expected frequency for each number to be 43 if the die was unbiased.
The table below shows three of the frequencies Kelly records but the fourth one is missing.
Number | 1 | 2 | 3 | 4
Frequency | 47 | 34 | 36 | \(x\)
  1. Show that \(x = 55\).
Kelly wishes to test, at the \(5 \%\) level of significance, whether or not there is evidence that the tetrahedral die is unbiased.
  2. Explain why there are 3 degrees of freedom for this test.
  3. Stating your hypotheses clearly and the critical value used, carry out the test.
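A minimal Python sketch for Kelly's die. The total number of throws is \(4 \times 43 = 172\), which fixes the missing frequency, and the only constraint on the four frequencies is that they sum to 172, leaving \(4 - 1 = 3\) degrees of freedom. Names are illustrative and the critical value is taken from tables.

```python
from fractions import Fraction

faces = 4
expected_each = 43         # expected frequency per face if the die is unbiased
n = faces * expected_each  # 172 throws in total

recorded = [47, 34, 36]
x = n - sum(recorded)      # the missing frequency
observed = recorded + [x]

statistic = sum(Fraction((o - expected_each) ** 2, expected_each) for o in observed)
# Only constraint: the four frequencies must sum to n, so 4 - 1 = 3 degrees of freedom.
dof = faces - 1
critical_5pc = 7.815  # upper 5% point of chi-squared(3), from tables

print(x, float(statistic), dof)
```

The statistic is \(290/43 \approx 6.74\), below the tabulated 5% point of 7.815 for 3 degrees of freedom.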
Calculate expected frequencies

A question is this type if and only if it requires calculating expected frequencies for a chi-squared test from probabilities or from marginal totals in a contingency table.

0 questions
0.0% of questions
Example question:
3 A mobile phone company offers an insurance policy to its customers when they purchase a mobile phone. The company conducted a survey on the age of the customers and whether or not claims were made. A random sample of 1200 customers from this company was investigated for 2020 and the results are shown in the table below.
Age | Claim made in 2020 | No claim made in 2020 | Total
17-20 years | 24 | 176 | 200
21-50 years | 48 | 652 | 700
51 years and over | 14 | 286 | 300
Total | 86 | 1114 | 1200
The data are to be used to determine whether or not making a claim is independent of age.
  1. Calculate the expected frequencies for the age group 51 years and over that
    1. made a claim in 2020
    2. did not make a claim in 2020.
The 4 classes of customers aged between 17 and 50 give a value of \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 7.123\) correct to 3 decimal places.
  2. Test, at the \(1 \%\) level of significance, whether or not making a claim is independent of age. Show your working clearly, stating your hypotheses, the degrees of freedom, the test statistic and the critical value used.
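A minimal Python sketch for the insurance table: it computes the two expected frequencies for the 51-and-over row from the marginal totals and adds their contributions to the given 7.123. Names are illustrative and the critical value is taken from tables.

```python
# Observed (claim, no claim) counts by age group.
observed = {
    "17-20": (24, 176),
    "21-50": (48, 652),
    "51+": (14, 286),
}
claim_total = sum(c for c, _ in observed.values())       # 86
no_claim_total = sum(nc for _, nc in observed.values())  # 1114
grand_total = claim_total + no_claim_total               # 1200

# Expected frequencies for the 51-and-over row: row total * column total / grand total.
row_total = sum(observed["51+"])                       # 300
e_claim = row_total * claim_total / grand_total        # 21.5
e_no_claim = row_total * no_claim_total / grand_total  # 278.5

# Add the two new contributions to the 7.123 given for the other four cells.
o_claim, o_no_claim = observed["51+"]
statistic = (7.123 + (o_claim - e_claim) ** 2 / e_claim
             + (o_no_claim - e_no_claim) ** 2 / e_no_claim)

dof = (3 - 1) * (2 - 1)  # (rows - 1)(columns - 1)
critical_1pc = 9.210     # upper 1% point of chi-squared(2), from tables

print(e_claim, e_no_claim, round(statistic, 3), dof)
```

The statistic is about 9.94 on 2 degrees of freedom, above the tabulated 1% point of 9.210.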
Determine minimum sample size

A question is this type if and only if it asks for the minimum or maximum sample size needed to reach a particular conclusion in a chi-squared test.

0 questions
0.0% of questions
Confidence intervals for variance using chi-squared

A question is this type if and only if it involves constructing confidence intervals for a single population variance or standard deviation using the chi-squared distribution.

0 questions
0.0% of questions