Chi-squared goodness of fit: Normal

A question is of this type if and only if it tests whether continuous data fit a normal distribution, using grouped frequency data and possibly estimated parameters.

10 questions · Standard +0.3

CAIE Further Paper 4 2021 November Q2
8 marks Standard +0.3
2 It is claimed that the heights of a particular age group of boys follow a normal distribution with mean 125 cm and standard deviation 12 cm. Observations for a randomly chosen group of 60 boys in this age group are summarised in the following table. The table also gives the expected frequencies, correct to 2 decimal places, based on the normal distribution with mean 125 cm and standard deviation 12 cm.
| Height, \(x \mathrm {~cm}\) | \(x < 100\) | \(100 \leqslant x < 110\) | \(110 \leqslant x < 120\) | \(120 \leqslant x < 130\) | \(130 \leqslant x < 140\) | \(x \geqslant 140\) |
| --- | --- | --- | --- | --- | --- | --- |
| Observed frequency | 0 | 3 | 15 | 23 | 11 | 8 |
| Expected frequency | 1.12 | 5.22 | 13.97 | 19.38 | 13.97 | 6.34 |
  1. Show how the expected frequency for \(130 \leqslant x < 140\) is obtained.
  2. Carry out a goodness of fit test, at the \(5 \%\) significance level, to determine whether the claim is supported by the data.
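As a hedged sketch of the calculation behind part 1: the expected frequency for a class is the sample size multiplied by the probability that the claimed model assigns to that class. The Python below assumes the claimed distribution N(125, 12²) and the class boundaries from the table; the function name `expected_frequencies` is illustrative and not part of the question.

```python
from statistics import NormalDist

def expected_frequencies(mean, sd, boundaries, n):
    """Expected class counts for a normal model with open-ended end classes."""
    dist = NormalDist(mean, sd)
    # Cumulative probabilities at the boundaries, padded with 0 and 1 for the two tails.
    cdf = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]
    # Each class probability is the difference of consecutive cumulative values.
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

expected = expected_frequencies(125, 12, [100, 110, 120, 130, 140], 60)
print([round(e, 2) for e in expected])
```

The fifth entry of `expected` corresponds to the class \(130 \leqslant x < 140\).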
OCR S3 2007 June Q8
14 marks Standard +0.3
8 The continuous random variable \(Y\) has a distribution with mean \(\mu\) and variance 20. A random sample of 50 observations of \(Y\) is selected and these observations are summarised in the following grouped frequency table.
| Values | \(y < 20\) | \(20 \leqslant y < 25\) | \(25 \leqslant y < 30\) | \(y \geqslant 30\) |
| --- | --- | --- | --- | --- |
| Frequency | 3 | 27 | 12 | 8 |
  1. Assuming that \(Y \sim \mathrm {~N} ( 25,20 )\), show that the expected frequency for the interval \(20 \leqslant y < 25\) is 18.41, correct to 2 decimal places, and obtain the remaining expected frequencies.
  2. Test, at the \(5 \%\) significance level, whether the distribution \(\mathrm { N } ( 25,20 )\) fits the data.
  3. Given that the sample mean is 24.91, find a \(98 \%\) confidence interval for \(\mu\).
  4. Does the outcome of the test in part (ii) affect the validity of the confidence interval found in part (iii)? Justify your answer.
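The interval in part 3 can be sketched as a two-sided z-interval, assuming the stated variance of 20 is the known population variance and that the mean of 50 observations is approximately normally distributed. The function name `mean_confidence_interval` is an illustrative choice, not something given in the question.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, variance, n, level):
    """Two-sided z-interval for a mean when the population variance is known."""
    # For level 0.98 this is the upper 1% point of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(variance / n)
    return sample_mean - half_width, sample_mean + half_width

low, high = mean_confidence_interval(24.91, 20, 50, 0.98)
print(f"{low:.2f} to {high:.2f}")
```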
OCR MEI S3 2007 January Q4
18 marks Standard +0.3
4
  1. An amateur weather forecaster has been keeping records of air pressure, measured in atmospheres. She takes the measurement at the same time every day using a barometer situated in her garden. A random sample of 100 of her observations is summarised in the table below. The corresponding expected frequencies for a Normal distribution, with its two parameters estimated by sample statistics, are also shown in the table.
    | Pressure (\(a\) atmospheres) | Observed frequency | Frequency as given by Normal model |
    | --- | --- | --- |
    | \(a \leqslant 0.98\) | 4 | 1.45 |
    | \(0.98 < a \leqslant 0.99\) | 6 | 5.23 |
    | \(0.99 < a \leqslant 1.00\) | 9 | 13.98 |
    | \(1.00 < a \leqslant 1.01\) | 15 | 23.91 |
    | \(1.01 < a \leqslant 1.02\) | 37 | 26.15 |
    | \(1.02 < a \leqslant 1.03\) | 21 | 18.29 |
    | \(1.03 < a\) | 8 | 10.99 |
    Carry out a test at the \(5 \%\) level of significance of the goodness of fit of the Normal model. State your conclusion carefully and comment on your findings.
  2. The forecaster buys a new digital barometer that can be linked to her computer for easier recording of observations. She decides that she wishes to compare the readings of the new barometer with those of the old one. For a random sample of 10 days, the readings (in atmospheres) of the two barometers are shown below.
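    | Day | A | B | C | D | E | F | G | H | I | J |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Old | 0.992 | 1.005 | 1.001 | 1.011 | 1.026 | 0.980 | 1.020 | 1.025 | 1.042 | 1.009 |
    | New | 0.985 | 1.003 | 1.002 | 1.014 | 1.022 | 0.988 | 1.030 | 1.016 | 1.047 | 1.025 |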
    Use an appropriate Wilcoxon test to examine at the \(10 \%\) level of significance whether there is any reason to suppose that, on the whole, readings on the old and new barometers do not agree.
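For part 1 of this question, the test statistic can be sketched as follows. The code assumes the common convention (not stated in the question) that end classes are pooled inward until every expected frequency is at least 5; the helper name `pool_small_classes` is hypothetical, and it only handles small classes in the tails.

```python
def pool_small_classes(observed, expected, minimum=5.0):
    """Merge end classes inward until every tail expected frequency is at least `minimum`."""
    obs, exp = list(observed), list(expected)
    # Pool from the lower tail into the next class up.
    while len(exp) > 1 and exp[0] < minimum:
        obs[1] += obs[0]
        exp[1] += exp[0]
        del obs[0], exp[0]
    # Pool from the upper tail into the next class down.
    while len(exp) > 1 and exp[-1] < minimum:
        obs[-2] += obs[-1]
        exp[-2] += exp[-1]
        del obs[-1], exp[-1]
    return obs, exp

observed = [4, 6, 9, 15, 37, 21, 8]
expected = [1.45, 5.23, 13.98, 23.91, 26.15, 18.29, 10.99]

obs, exp = pool_small_classes(observed, expected)
statistic = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
# Two parameters were estimated from the sample, so under the usual rule
# the degrees of freedom are (number of pooled classes) - 1 - 2.
degrees_of_freedom = len(obs) - 1 - 2
print(obs, round(statistic, 2), degrees_of_freedom)
```

Listing the individual contributions \((O - E)^2 / E\) alongside the total helps with the "comment on your findings" part, since it shows which classes drive the statistic.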
OCR S3 2009 June Q7
14 marks Standard +0.3
7 In 1761, James Short took measurements of the parallax of the sun based on the transit of Venus. The mean and standard deviation of a random sample of 50 of these measurements are 8.592 and 0.7534 respectively, in suitable units.
  1. Show that if \(X \sim \mathrm {~N} \left( 8.592,0.7534 ^ { 2 } \right)\), then $$\mathrm { P } ( X \leqslant 8.084 ) = \mathrm { P } ( 8.084 < X \leqslant 8.592 ) = \mathrm { P } ( 8.592 < X \leqslant 9.100 ) = \mathrm { P } ( X > 9.100 ) = 0.25 \text {. }$$
    The following table summarises the 50 measurements using these intervals.
    | Measurement \(( x )\) | \(x \leqslant 8.084\) | \(8.084 < x \leqslant 8.592\) | \(8.592 < x \leqslant 9.100\) | \(x > 9.100\) |
    | --- | --- | --- | --- | --- |
    | Frequency | 8 | 22 | 11 | 9 |
  2. Carry out a test, at the \(\frac { 1 } { 2 } \%\) significance level, of whether a normal distribution fits the data.
  3. Obtain a 99\% confidence interval for the mean of all similar parallax measurements.
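A minimal sketch of parts 1 and 2, assuming the fitted model N(8.592, 0.7534²): the class boundaries are the quartiles of the model, so each class has probability 0.25 and the same expected frequency. The variable names are illustrative only.

```python
from statistics import NormalDist

model = NormalDist(8.592, 0.7534)
# Quartile boundaries of the fitted model: each interval then has probability 0.25.
boundaries = [model.inv_cdf(p) for p in (0.25, 0.5, 0.75)]

observed = [8, 22, 11, 9]
# Equal class probabilities give the same expected frequency, 50 / 4, in every class.
expected = [0.25 * sum(observed)] * 4
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([round(b, 3) for b in boundaries], round(statistic, 3))
```

Because the mean and standard deviation come from the sample, the usual rule would reduce the degrees of freedom by two in addition to the one lost for the fixed total.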
OCR MEI S3 2011 January Q3
18 marks Standard +0.3
3 The masses, in kilograms, of a random sample of 100 chickens on sale in a large supermarket were recorded as follows.
| Mass \(( m \mathrm {~kg} )\) | \(m < 1.6\) | \(1.6 \leqslant m < 1.8\) | \(1.8 \leqslant m < 2.0\) | \(2.0 \leqslant m < 2.2\) | \(2.2 \leqslant m < 2.4\) | \(2.4 \leqslant m < 2.6\) | \(2.6 \leqslant m\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Frequency | 2 | 8 | 30 | 42 | 11 | 5 | 2 |
  1. Assuming that the first and last classes are the same width as the other classes, calculate an estimate of the sample mean and show that the corresponding estimate of the sample standard deviation is 0.2227 kg.
    A Normal distribution using the mean and standard deviation found in part (i) is to be fitted to these data. The expected frequencies for the classes are as follows.
    | Mass \(( m \mathrm {~kg} )\) | \(m < 1.6\) | \(1.6 \leqslant m < 1.8\) | \(1.8 \leqslant m < 2.0\) | \(2.0 \leqslant m < 2.2\) | \(2.2 \leqslant m < 2.4\) | \(2.4 \leqslant m < 2.6\) | \(2.6 \leqslant m\) |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Expected frequency | 2.17 | 10.92 | \(f\) | 33.85 | 19.22 | 5.13 | 0.68 |
  2. Use the Normal distribution to find \(f\).
  3. Carry out a goodness of fit test of this Normal model using a significance level of 5\%.
  4. Discuss the outcome of the test with reference to the contributions to the test statistic and to the possibility of other significance levels.
OCR Further Statistics 2021 November Q6
11 marks Standard +0.3
6 A practice examination paper is taken by 500 candidates, and the organiser wishes to know what continuous distribution could be used to model the actual time, \(X\) minutes, taken by candidates to complete the paper. The organiser starts by carrying out a goodness-of-fit test for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) at the \(5 \%\) significance level. The grouped data and the results of some of the calculations are shown in the following table.
| Time | \(0 \leqslant X < 80\) | \(80 \leqslant X < 90\) | \(90 \leqslant X < 100\) | \(100 \leqslant X < 110\) | \(X \geqslant 110\) |
| --- | --- | --- | --- | --- | --- |
| Observed frequency \(O\) | 36 | 95 | 137 | 129 | 103 |
| Expected frequency \(E\) | 45.606 | 80.641 | 123.754 | 123.754 | 126.246 |
| \(\frac { ( O - E ) ^ { 2 } } { E }\) | 2.023 | 2.557 | 1.418 | 0.222 | 4.280 |
  1. State suitable hypotheses for the test.
  2. Show how the figures 123.754 and 0.222 in the column for \(100 \leqslant X < 110\) were obtained. [3]
  3. Carry out the test.
  4. The organiser now wants to suggest an improved model for the data.
    1. Suggest an aspect of the data that the organiser should take into account in considering an improved model.
    2. The graph of the probability density function for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) is shown in the diagram in the Printed Answer Booklet. On the same diagram sketch the probability density function of an improved model that takes into account the aspect of the data in part (d)(i).
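The two figures asked about in part 2 can be reproduced with a short sketch, assuming the model N(100, 15²) and the observed frequency of 129 from the table; the variable names are illustrative.

```python
from statistics import NormalDist

model = NormalDist(100, 15)
n = 500

# Expected frequency: sample size times P(100 <= X < 110) under the model.
expected = n * (model.cdf(110) - model.cdf(100))
# Contribution of this class to the chi-squared statistic, with O = 129.
contribution = (129 - expected) ** 2 / expected
print(round(expected, 3), round(contribution, 3))
```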
Edexcel S3 2021 January Q5
18 marks Standard +0.3
5. Chrystal is studying the lengths of pine cones that have fallen from a tree. She believes that the length, \(X \mathrm {~cm}\), of the pine cones can be modelled by a normal distribution with mean 6 cm and standard deviation 0.75 cm. She collects a random sample of 80 pine cones and their lengths are recorded in the table below.
| Length, \(x\) cm | \(x < 5\) | \(5 \leqslant x < 5.5\) | \(5.5 \leqslant x < 6\) | \(6 \leqslant x < 6.5\) | \(x \geqslant 6.5\) |
| --- | --- | --- | --- | --- | --- |
| Frequency | 6 | 14 | 24 | 26 | 10 |
  1. Stating your hypotheses clearly and using a \(10 \%\) level of significance, test Chrystal's belief. Show your working clearly and state the expected frequencies, the test statistic and the critical value used. (10)
    Chrystal's friend David asked for more information about the lengths of the 80 pine cones. Chrystal told him that $$\sum x = 464 \quad \text { and } \quad \sum x ^ { 2 } = 2722.59$$
  2. Calculate unbiased estimates of the mean and variance of the lengths of the pine cones.
    David used the calculations from part (b) to test whether or not the lengths of the pine cones are normally distributed using Chrystal's sample. His test statistic was 3.50 (to 3 significant figures) and he did not pool any classes.
  3. Using a \(10 \%\) level of significance, complete David's test stating the critical value and the degrees of freedom used.
  4. Estimate, to 2 significant figures, the proportion of pine cones from the tree that are longer than 7 cm.
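The unbiased estimates in part 2 follow from the summary totals. The sketch below assumes the standard formulas \(\bar{x} = \sum x / n\) and \(s^2 = \left(\sum x^2 - n\bar{x}^2\right) / (n - 1)\); the variable names are illustrative.

```python
n = 80
sum_x = 464
sum_x2 = 2722.59

mean = sum_x / n
# Unbiased variance estimate: divide the corrected sum of squares by n - 1, not n.
variance = (sum_x2 - n * mean ** 2) / (n - 1)
print(mean, round(variance, 4))
```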
Edexcel S3 2022 January Q6
14 marks Standard +0.3
  1. A farmer sells strawberries in baskets. The contents of each of 100 randomly selected baskets were weighed and the results, given to the nearest gram, are shown below.
| Weight of strawberries (grams) | Number of baskets |
| --- | --- |
| 302-303 | 5 |
| 304-305 | 13 |
| 306-307 | 10 |
| 308-309 | 18 |
| 310-311 | 25 |
| 312-313 | 20 |
| 314-315 | 5 |
| 316-317 | 4 |
The farmer proposes that the weight of strawberries per basket, in grams, should be modelled by a normal distribution with a mean of 310 g and standard deviation 4 g. Using his model, the farmer obtains the following expected frequencies.
| Weight of strawberries (\(s\), grams) | Expected frequency |
| --- | --- |
| \(s \leqslant 303.5\) | \(a\) |
| \(303.5 < s \leqslant 305.5\) | 7.8 |
| \(305.5 < s \leqslant 307.5\) | 13.6 |
| \(307.5 < s \leqslant 309.5\) | 18.4 |
| \(309.5 < s \leqslant 311.5\) | 19.6 |
| \(311.5 < s \leqslant 313.5\) | 16.3 |
| \(313.5 < s \leqslant 315.5\) | 10.6 |
| \(s > 315.5\) | \(b\) |
  1. Find the value of \(a\) and the value of \(b\). Give your answers correct to one decimal place.
    Before \(s \leqslant 303.5\) and \(s > 315.5\) are included, for the remaining cells, $$\sum \frac { ( O - E ) ^ { 2 } } { E } = 9.71$$
  2. Using a 5\% significance level, test whether the data are consistent with the model. You should state your hypotheses, the test statistic and the critical value used.
    An alternative model uses estimates for the population mean and standard deviation from the data given. Using these estimated values, no expected frequency is below 5.
    Another test is to be carried out, using a \(5 \%\) significance level, to assess whether the data are consistent with this alternative model.
  3. State the effect, if any, on the critical value for this test. Give a reason for your answer.
Edexcel S3 2013 June Q4
14 marks Standard +0.3
4. Customers at a post office are timed to see how long they wait until being served at the counter. A random sample of 50 customers is chosen and their waiting times, \(x\) minutes, are summarised in Table 1.
| Waiting time in minutes \(( x )\) | Frequency |
| --- | --- |
| \(0 - 3\) | 8 |
| \(3 - 5\) | 12 |
| \(5 - 6\) | 13 |
| \(6 - 8\) | 9 |
| \(8 - 12\) | 8 |
Table 1
  1. Show that an estimate of \(\bar { x } = 5.49\) and an estimate of \(s _ { x } ^ { 2 } = 6.88\).
    The post office manager believes that the customers' waiting times can be modelled by a normal distribution.
    Assuming the data is normally distributed, she calculates the expected frequencies for these data and some of these frequencies are shown in Table 2.
    | Waiting Time | \(x < 3\) | \(3 - 5\) | \(5 - 6\) | \(6 - 8\) | \(x > 8\) |
    | --- | --- | --- | --- | --- | --- |
    | Expected Frequency | 8.56 | 12.73 | 7.56 | \(a\) | \(b\) |
    Table 2
  2. Find the value of \(a\) and the value of \(b\).
  3. Test, at the \(5 \%\) level of significance, the manager's belief. State your hypotheses clearly.
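The estimates in part 1 can be checked with a short sketch that assumes each observation sits at its class midpoint and that the variance uses the \(n - 1\) divisor. The tuple layout and variable names are illustrative choices.

```python
# (lower bound, upper bound, frequency) for each class in Table 1
classes = [(0, 3, 8), (3, 5, 12), (5, 6, 13), (6, 8, 9), (8, 12, 8)]

n = sum(f for _, _, f in classes)
# Represent every observation in a class by the class midpoint.
mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
sum_fx = sum(f * m for m, f in mids)
sum_fx2 = sum(f * m ** 2 for m, f in mids)

mean = sum_fx / n
variance = (sum_fx2 - n * mean ** 2) / (n - 1)
print(round(mean, 2), round(variance, 2))
```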
Edexcel S3 Q7
17 marks Standard +0.3
A shoe manufacturer sees a report from another country stating that the length of adult male feet is normally distributed with a mean of 22.4 cm and a standard deviation of 2.8 cm. The manufacturer wishes to see if this model is appropriate for his customers and collects data on the length, correct to the nearest cm, of the right foot of a random sample of 200 males giving the following results:
| Length (cm) | \(\leq 18\) | \(19 - 21\) | \(22 - 24\) | \(25 - 27\) | \(\geq 28\) |
| --- | --- | --- | --- | --- | --- |
| No. of Men | 24 | 48 | 69 | 41 | 18 |
The expected frequencies for the \(\leq 18\) and \(19 - 21\) groups are calculated as 16.46 and 58.44 respectively, correct to 2 decimal places.
  1. Calculate expected frequencies for the other three classes. [7]
  2. Stating your hypotheses clearly, test at the 10\% level of significance whether or not this data can be modelled by the distribution N(22.4, 2.8²). [7]
The manufacturer wishes to refine the model by not assuming a mean and standard deviation.
  3. Explain briefly how the manufacturer should proceed. [3]