Chi-squared goodness of fit: Normal

A question is of this type if and only if it tests whether continuous data fit a normal distribution, using grouped frequency data and possibly estimated parameters.

13 questions · Standard +0.3

CAIE Further Paper 4 2021 November Q2
8 marks Standard +0.3
2 It is claimed that the heights of a particular age group of boys follow a normal distribution with mean 125 cm and standard deviation 12 cm. Observations for a randomly chosen group of 60 boys in this age group are summarised in the following table. The table also gives the expected frequencies, correct to 2 decimal places, based on the normal distribution with mean 125 cm and standard deviation 12 cm.
| Height, \(x \mathrm {~cm}\) | \(x < 100\) | \(100 \leqslant x < 110\) | \(110 \leqslant x < 120\) | \(120 \leqslant x < 130\) | \(130 \leqslant x < 140\) | \(x \geqslant 140\) |
| --- | --- | --- | --- | --- | --- | --- |
| Observed frequency | 0 | 3 | 15 | 23 | 11 | 8 |
| Expected frequency | 1.12 | 5.22 | 13.97 | 19.38 | 13.97 | 6.34 |
  1. Show how the expected frequency for \(130 \leqslant x < 140\) is obtained.
  2. Carry out a goodness of fit test, at the \(5 \%\) significance level, to determine whether the claim is supported by the data.
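Both parts of this question can be checked numerically. The sketch below is one possible working, not the mark scheme. It uses `statistics.NormalDist` from the Python standard library, assumes the usual convention of pooling classes until every expected frequency is at least 5, and takes the 5% critical value for 4 degrees of freedom (9.488) from standard chi-squared tables. All variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=125, sigma=12)
n = 60

# Part (a): expected frequency for 130 <= x < 140
expected_130_140 = n * (heights.cdf(140) - heights.cdf(130))

# Part (b): pool the first two classes (assumed convention: expected >= 5)
observed = [0 + 3, 15, 23, 11, 8]
expected = [1.12 + 5.22, 13.97, 19.38, 13.97, 6.34]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1      # no parameters were estimated from the data
critical_5pct = 9.488        # chi-squared table value, 4 degrees of freedom
reject_h0 = chi2 > critical_5pct

print(round(expected_130_140, 2), round(chi2, 2), reject_h0)
```

Under these assumptions the statistic is well below the critical value, so the claimed distribution would not be rejected.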
OCR S3 2007 June Q8
14 marks Standard +0.3
8 The continuous random variable \(Y\) has a distribution with mean \(\mu\) and variance 20. A random sample of 50 observations of \(Y\) is selected and these observations are summarised in the following grouped frequency table.
| Values | \(y < 20\) | \(20 \leqslant y < 25\) | \(25 \leqslant y < 30\) | \(y \geqslant 30\) |
| --- | --- | --- | --- | --- |
| Frequency | 3 | 27 | 12 | 8 |
  1. Assuming that \(Y \sim \mathrm {~N} ( 25,20 )\), show that the expected frequency for the interval \(20 \leqslant y < 25\) is 18.41, correct to 2 decimal places, and obtain the remaining expected frequencies.
  2. Test, at the \(5 \%\) significance level, whether the distribution \(\mathrm { N } ( 25,20 )\) fits the data.
  3. Given that the sample mean is 24.91 , find a \(98 \%\) confidence interval for \(\mu\).
  4. Does the outcome of the test in part (ii) affect the validity of the confidence interval found in part (iii)? Justify your answer.
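A short numerical check of parts (i) to (iii) might look like the following. It is a sketch under stated assumptions: the second parameter of \(\mathrm{N}(25, 20)\) is read as the variance, the 5% critical value for 3 degrees of freedom (7.815) comes from standard tables, and the confidence interval uses the known variance 20 with a normal quantile. The names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

model = NormalDist(mu=25, sigma=sqrt(20))   # N(25, 20): variance 20
n = 50
boundaries = [20, 25, 30]

# Cumulative probabilities at the class boundaries, padded with 0 and 1
cum = [0.0] + [model.cdf(b) for b in boundaries] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cum, cum[1:])]

observed = [3, 27, 12, 8]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_5pct = 7.815        # chi-squared table value, 3 degrees of freedom
reject_h0 = chi2 > critical_5pct

# Part (iii): 98% interval for mu, using the known variance 20
z = NormalDist().inv_cdf(0.99)
half_width = z * sqrt(20 / n)
interval = (24.91 - half_width, 24.91 + half_width)

print([round(e, 2) for e in expected], round(chi2, 2), reject_h0)
```

Here the statistic exceeds the critical value, so the fit of \(\mathrm{N}(25, 20)\) would be rejected at the 5% level.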
OCR MEI S3 2006 January Q4
18 marks Standard +0.3
4 Quality control inspectors in a factory are investigating the lengths of glass tubes that will be used to make laboratory equipment.
  1. Data on the observed lengths of a random sample of 200 glass tubes from one batch are available in the form of a frequency distribution as follows.
    | Length \(x ( \mathrm {~mm} )\) | Observed frequency |
    | --- | --- |
    | \(x \leqslant 298\) | 1 |
    | \(298 < x \leqslant 300\) | 30 |
    | \(300 < x \leqslant 301\) | 62 |
    | \(301 < x \leqslant 302\) | 70 |
    | \(302 < x \leqslant 304\) | 34 |
    | \(x > 304\) | 3 |
    The sample mean and standard deviation are 301.08 and 1.2655 respectively.
    The corresponding expected frequencies for the Normal distribution with parameters estimated by the sample statistics are
    | Length \(x ( \mathrm {~mm} )\) | Expected frequency |
    | --- | --- |
    | \(x \leqslant 298\) | 1.49 |
    | \(298 < x \leqslant 300\) | 37.85 |
    | \(300 < x \leqslant 301\) | 55.62 |
    | \(301 < x \leqslant 302\) | 58.32 |
    | \(302 < x \leqslant 304\) | 44.62 |
    | \(x > 304\) | 2.10 |
    Examine the goodness of fit of a Normal distribution, using a 5\% significance level.
  2. It is thought that the lengths of tubes in another batch have an underlying distribution similar to that for the batch in part (i) but possibly with different location and dispersion parameters. A random sample of 10 tubes from this batch gives the following lengths (in mm). $$\begin{array} { l l l l l l l l l l } 301.3 & 301.4 & 299.6 & 302.2 & 300.3 & 303.2 & 302.6 & 301.8 & 300.9 & 300.8 \end{array}$$
    (A) Discuss briefly whether it would be appropriate to use a \(t\) test to examine a hypothesis about the population mean length for this batch.
    (B) Use a Wilcoxon test to examine at the \(10 \%\) significance level whether the population median length for this batch is 301 mm .
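For part (i) of this question, both parameters are estimated from the sample, which costs two extra degrees of freedom. The sketch below illustrates that bookkeeping. The helper `pool_small_ends` is hypothetical, the "expected frequency at least 5" rule is an assumed convention, and 3.841 is the tabulated 5% critical value for 1 degree of freedom.

```python
observed = [1, 30, 62, 70, 34, 3]
expected = [1.49, 37.85, 55.62, 58.32, 44.62, 2.10]

def pool_small_ends(obs, exp, minimum=5.0):
    """Merge end classes inward until both end expected frequencies reach `minimum`."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 2 and exp[0] < minimum:
        obs[:2] = [obs[0] + obs[1]]
        exp[:2] = [exp[0] + exp[1]]
    while len(exp) > 2 and exp[-1] < minimum:
        obs[-2:] = [obs[-2] + obs[-1]]
        exp[-2:] = [exp[-2] + exp[-1]]
    return obs, exp

pooled_obs, pooled_exp = pool_small_ends(observed, expected)
chi2 = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))

# classes - 1 - (number of estimated parameters: mean and standard deviation)
dof = len(pooled_obs) - 1 - 2
critical_5pct = 3.841
reject_h0 = chi2 > critical_5pct

print(pooled_obs, round(chi2, 2), dof, reject_h0)
```

With four pooled classes and two estimated parameters only one degree of freedom remains, and the statistic exceeds 3.841.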
OCR MEI S3 2007 January Q4
18 marks Standard +0.3
4
  1. An amateur weather forecaster has been keeping records of air pressure, measured in atmospheres. She takes the measurement at the same time every day using a barometer situated in her garden. A random sample of 100 of her observations is summarised in the table below. The corresponding expected frequencies for a Normal distribution, with its two parameters estimated by sample statistics, are also shown in the table.
    | Pressure (\(a\) atmospheres) | Observed frequency | Frequency as given by Normal model |
    | --- | --- | --- |
    | \(a \leqslant 0.98\) | 4 | 1.45 |
    | \(0.98 < a \leqslant 0.99\) | 6 | 5.23 |
    | \(0.99 < a \leqslant 1.00\) | 9 | 13.98 |
    | \(1.00 < a \leqslant 1.01\) | 15 | 23.91 |
    | \(1.01 < a \leqslant 1.02\) | 37 | 26.15 |
    | \(1.02 < a \leqslant 1.03\) | 21 | 18.29 |
    | \(1.03 < a\) | 8 | 10.99 |
    Carry out a test at the \(5 \%\) level of significance of the goodness of fit of the Normal model. State your conclusion carefully and comment on your findings.
  2. The forecaster buys a new digital barometer that can be linked to her computer for easier recording of observations. She decides that she wishes to compare the readings of the new barometer with those of the old one. For a random sample of 10 days, the readings (in atmospheres) of the two barometers are shown below.
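    | Day | A | B | C | D | E | F | G | H | I | J |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Old | 0.992 | 1.005 | 1.001 | 1.011 | 1.026 | 0.980 | 1.020 | 1.025 | 1.042 | 1.009 |
    | New | 0.985 | 1.003 | 1.002 | 1.014 | 1.022 | 0.988 | 1.030 | 1.016 | 1.047 | 1.025 |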
    Use an appropriate Wilcoxon test to examine at the \(10 \%\) level of significance whether there is any reason to suppose that, on the whole, readings on the old and new barometers do not agree.
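The paired comparison in part (ii) can be sketched as a Wilcoxon signed-rank calculation. This is an illustration rather than a model answer: it assumes a two-tailed test, works in thousandths of an atmosphere to avoid floating-point noise, relies on there being no tied absolute differences in these data, and takes the critical value 10 (for \(n = 10\), two-tailed 10%) from standard tables.

```python
old = [0.992, 1.005, 1.001, 1.011, 1.026, 0.980, 1.020, 1.025, 1.042, 1.009]
new = [0.985, 1.003, 1.002, 1.014, 1.022, 0.988, 1.030, 1.016, 1.047, 1.025]

# Differences (old - new) in thousandths of an atmosphere, as integers
diffs = [round((o - nw) * 1000) for o, nw in zip(old, new)]
diffs = [d for d in diffs if d != 0]     # zero differences would be discarded

# Rank by absolute size; this simple ranking is only valid without ties
ordered = sorted(diffs, key=abs)
assert len({abs(d) for d in ordered}) == len(ordered), "ties need average ranks"

w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
t = min(w_plus, w_minus)

critical_10pct_two_tailed = 10           # table value for n = 10
reject_h0 = t <= critical_10pct_two_tailed

print(w_plus, w_minus, t, reject_h0)
```

The smaller rank sum is above the tabulated value, so under these assumptions there would be no significant evidence that the two barometers disagree.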
OCR S3 2009 June Q7
14 marks Standard +0.3
7 In 1761, James Short took measurements of the parallax of the sun based on the transit of Venus. The mean and standard deviation of a random sample of 50 of these measurements are 8.592 and 0.7534 respectively, in suitable units.
  1. Show that if \(X \sim \mathrm {~N} \left( 8.592,0.7534 ^ { 2 } \right)\), then $$\mathrm { P } ( X \leqslant 8.084 ) = \mathrm { P } ( 8.084 < X \leqslant 8.592 ) = \mathrm { P } ( 8.592 < X \leqslant 9.100 ) = \mathrm { P } ( X > 9.100 ) = 0.25 \text {. }$$ The following table summarises the 50 measurements using these intervals.
    | Measurement \(( x )\) | \(x \leqslant 8.084\) | \(8.084 < x \leqslant 8.592\) | \(8.592 < x \leqslant 9.100\) | \(x > 9.100\) |
    | --- | --- | --- | --- | --- |
    | Frequency | 8 | 22 | 11 | 9 |
  2. Carry out a test, at the \(\frac { 1 } { 2 } \%\) significance level, of whether a normal distribution fits the data.
  3. Obtain a 99\% confidence interval for the mean of all similar parallax measurements.
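Parts (i) and (ii) use equal-probability classes, which makes every expected frequency the same. A minimal sketch follows, assuming the mean and standard deviation count as two estimated parameters and taking the 0.5% critical value for 1 degree of freedom (7.879) from standard tables.

```python
from statistics import NormalDist

mean, sd, n = 8.592, 0.7534, 50
fitted = NormalDist(mean, sd)

# Part (i): the quartiles of the fitted distribution are the class boundaries
boundaries = [fitted.inv_cdf(p) for p in (0.25, 0.5, 0.75)]

# Part (ii): each class has probability 0.25, so each expected frequency is 12.5
observed = [8, 22, 11, 9]
expected = [n * 0.25] * 4
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

dof = len(observed) - 1 - 2       # two parameters estimated from the sample
critical_half_pct = 7.879         # chi-squared table value, 1 degree of freedom
reject_h0 = chi2 > critical_half_pct

print([round(b, 3) for b in boundaries], round(chi2, 2), reject_h0)
```

The statistic works out to 10, which exceeds 7.879, so the normal model would be rejected even at the 0.5% level.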
OCR MEI S3 2011 January Q3
18 marks Standard +0.3
3 The masses, in kilograms, of a random sample of 100 chickens on sale in a large supermarket were recorded as follows.
| Mass \(( m \mathrm {~kg} )\) | \(m < 1.6\) | \(1.6 \leqslant m < 1.8\) | \(1.8 \leqslant m < 2.0\) | \(2.0 \leqslant m < 2.2\) | \(2.2 \leqslant m < 2.4\) | \(2.4 \leqslant m < 2.6\) | \(2.6 \leqslant m\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Frequency | 2 | 8 | 30 | 42 | 11 | 5 | 2 |
  1. Assuming that the first and last classes are the same width as the other classes, calculate an estimate of the sample mean and show that the corresponding estimate of the sample standard deviation is 0.2227 kg.
    A Normal distribution using the mean and standard deviation found in part (i) is to be fitted to these data. The expected frequencies for the classes are as follows.
    | Mass \(( m \mathrm {~kg} )\) | \(m < 1.6\) | \(1.6 \leqslant m < 1.8\) | \(1.8 \leqslant m < 2.0\) | \(2.0 \leqslant m < 2.2\) | \(2.2 \leqslant m < 2.4\) | \(2.4 \leqslant m < 2.6\) | \(2.6 \leqslant m\) |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Expected frequency | 2.17 | 10.92 | \(f\) | 33.85 | 19.22 | 5.13 | 0.68 |
  2. Use the Normal distribution to find \(f\).
  3. Carry out a goodness of fit test of this Normal model using a significance level of 5\%.
  4. Discuss the outcome of the test with reference to the contributions to the test statistic and to the possibility of other significance levels.
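Parts (i) and (ii) of this question amount to grouped-data estimation followed by one normal probability. The sketch below assumes class midpoints 1.5, 1.7, ..., 2.7 (end classes taken as 0.2 kg wide, as the question instructs) and the divisor \(n - 1\) for the standard deviation. The names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

midpoints = [1.5, 1.7, 1.9, 2.1, 2.3, 2.5, 2.7]   # end classes assumed 0.2 kg wide
freq = [2, 8, 30, 42, 11, 5, 2]
n = sum(freq)

# Part (i): grouped estimates of the mean and standard deviation
mean = sum(f * x for f, x in zip(freq, midpoints)) / n
sd = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freq, midpoints)) / (n - 1))

# Part (ii): expected frequency f for 1.8 <= m < 2.0 under N(mean, sd^2)
fitted = NormalDist(mean, sd)
f_expected = n * (fitted.cdf(2.0) - fitted.cdf(1.8))

print(round(mean, 2), round(sd, 4), round(f_expected, 2))
```

The value of \(f\) obtained this way is close to 28.0. It can differ in the second decimal place from a tables-based answer, or from 100 minus the other tabulated frequencies, because of rounding.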
CAIE FP2 2018 November Q11 OR
Standard +0.3
A machine is used to produce metal rods. When the machine is working efficiently, the lengths, \(x \mathrm {~cm}\), of the rods have a normal distribution with mean 150 cm and standard deviation 1.2 cm. The machine is checked regularly by taking random samples of 200 rods. The latest results are shown in the following table.
| Interval | Observed frequency |
| --- | --- |
| \(146 \leqslant x < 147\) | 1 |
| \(147 \leqslant x < 148\) | 2 |
| \(148 \leqslant x < 149\) | 23 |
| \(149 \leqslant x < 150\) | 52 |
| \(150 \leqslant x < 151\) | 69 |
| \(151 \leqslant x < 152\) | 36 |
| \(152 \leqslant x < 153\) | 15 |
| \(153 \leqslant x < 154\) | 2 |
As a first check, the sample is used to calculate an estimate for the mean.
  1. Show that an estimate for the mean from this sample is close to 150 cm .
    As a second check, the results are tested for goodness of fit of the normal distribution with mean 150 cm and standard deviation 1.2 cm . The relevant expected frequencies, found using the normal distribution function given in the List of Formulae (MF10), are shown in the following table.
    | Interval | Observed frequency | Expected frequency |
    | --- | --- | --- |
    | \(x < 147\) | 1 | 1.24 |
    | \(147 \leqslant x < 148\) | 2 | 8.32 |
    | \(148 \leqslant x < 149\) | 23 | 30.94 |
    | \(149 \leqslant x < 150\) | 52 | 59.50 |
    | \(150 \leqslant x < 151\) | 69 | 59.50 |
    | \(151 \leqslant x < 152\) | 36 | 30.94 |
    | \(152 \leqslant x < 153\) | 15 | 8.32 |
    | \(153 \leqslant x\) | 2 | 1.24 |
  2. Show how the expected frequency for \(151 \leqslant x < 152\) is obtained.
  3. Test, at the \(5 \%\) significance level, the goodness of fit of the normal distribution to the results.
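The mean check and the goodness of fit test for the rods can be sketched as follows. The expected frequencies are taken directly from the question's table, since they were produced from printed tables and an exact normal CDF gives slightly different values (roughly 30.9 rather than 30.94 for \(151 \leqslant x < 152\)). Pooling each end class with its neighbour is an assumed convention, and 11.070 is the tabulated 5% value for 5 degrees of freedom.

```python
midpoints = [146.5, 147.5, 148.5, 149.5, 150.5, 151.5, 152.5, 153.5]
observed = [1, 2, 23, 52, 69, 36, 15, 2]
expected = [1.24, 8.32, 30.94, 59.50, 59.50, 30.94, 8.32, 1.24]  # from the question

n = sum(observed)
mean_estimate = sum(f * x for f, x in zip(observed, midpoints)) / n

# Pool each end class with its neighbour because 1.24 < 5 (assumed convention)
pooled_obs = [observed[0] + observed[1], *observed[2:-2], observed[-2] + observed[-1]]
pooled_exp = [expected[0] + expected[1], *expected[2:-2], expected[-2] + expected[-1]]

chi2 = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))
dof = len(pooled_obs) - 1     # mean and standard deviation are specified, not estimated
critical_5pct = 11.070
reject_h0 = chi2 > critical_5pct

print(mean_estimate, round(chi2, 2), reject_h0)
```

The sample mean estimate is 150.32, close to 150, yet the pooled statistic exceeds the critical value. The two checks answer different questions, one about location and one about the overall shape.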
OCR Further Statistics 2021 November Q6
11 marks Standard +0.3
6 A practice examination paper is taken by 500 candidates, and the organiser wishes to know what continuous distribution could be used to model the actual time, \(X\) minutes, taken by candidates to complete the paper. The organiser starts by carrying out a goodness-of-fit test for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) at the \(5 \%\) significance level. The grouped data and the results of some of the calculations are shown in the following table.
| Time | \(0 \leqslant X < 80\) | \(80 \leqslant X < 90\) | \(90 \leqslant X < 100\) | \(100 \leqslant X < 110\) | \(X \geqslant 110\) |
| --- | --- | --- | --- | --- | --- |
| Observed frequency \(O\) | 36 | 95 | 137 | 129 | 103 |
| Expected frequency \(E\) | 45.606 | 80.641 | 123.754 | 123.754 | 126.246 |
| \(\frac { ( O - E ) ^ { 2 } } { E }\) | 2.023 | 2.557 | 1.418 | 0.222 | 4.280 |
  1. State suitable hypotheses for the test.
  2. Show how the figures 123.754 and 0.222 in the column for \(100 \leqslant X < 110\) were obtained. [3]
  3. Carry out the test. The organiser now wants to suggest an improved model for the data.
    1. Suggest an aspect of the data that the organiser should take into account in considering an improved model.
    2. The graph of the probability density function for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) is shown in the diagram in the Printed Answer Booklet. On the same diagram sketch the probability density function of an improved model that takes into account the aspect of the data in part (d)(i).
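The two figures asked for in part (b) and the test in part (c) can be reproduced with a few lines. This is a sketch only: it sums the tabulated contributions and uses the 5% critical value for 4 degrees of freedom (9.488) from standard tables, on the assumption that no classes need pooling because every expected frequency exceeds 5.

```python
from statistics import NormalDist

model = NormalDist(mu=100, sigma=15)
n = 500

# Part (b): expected frequency and contribution for 100 <= X < 110
e_100_110 = n * (model.cdf(110) - model.cdf(100))
contribution = (129 - e_100_110) ** 2 / e_100_110

# Part (c): sum the contributions given in the table
contributions = [2.023, 2.557, 1.418, 0.222, 4.280]
chi2 = sum(contributions)
dof = len(contributions) - 1      # parameters are specified, not estimated
critical_5pct = 9.488
reject_h0 = chi2 > critical_5pct

print(round(e_100_110, 3), round(contribution, 3), round(chi2, 3), reject_h0)
```

The largest contribution comes from the \(X \geqslant 110\) class, where far fewer candidates were observed than the model predicts. That is one aspect an improved model might address.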
Edexcel S3 2021 January Q5
18 marks Standard +0.3
5. Chrystal is studying the lengths of pine cones that have fallen from a tree. She believes that the length, \(X \mathrm {~cm}\), of the pine cones can be modelled by a normal distribution with mean 6 cm and standard deviation 0.75 cm. She collects a random sample of 80 pine cones and their lengths are recorded in the table below.
| Length, \(x\) cm | \(x < 5\) | \(5 \leqslant x < 5.5\) | \(5.5 \leqslant x < 6\) | \(6 \leqslant x < 6.5\) | \(x \geqslant 6.5\) |
| --- | --- | --- | --- | --- | --- |
| Frequency | 6 | 14 | 24 | 26 | 10 |
  1. Stating your hypotheses clearly and using a \(10 \%\) level of significance, test Chrystal's belief. Show your working clearly and state the expected frequencies, the test statistic and the critical value used.
    (10) Chrystal's friend David asked for more information about the lengths of the 80 pine cones. Chrystal told him that $$\sum x = 464 \quad \text { and } \quad \sum x ^ { 2 } = 2722.59$$
  2. Calculate unbiased estimates of the mean and variance of the lengths of the pine cones. David used the calculations from part (b) to test whether or not the lengths of the pine cones are normally distributed using Chrystal's sample. His test statistic was 3.50 (to 3 significant figures) and he did not pool any classes.
  3. Using a \(10 \%\) level of significance, complete David's test stating the critical value and the degrees of freedom used.
  4. Estimate, to 2 significant figures, the proportion of pine cones from the tree that are longer than 7 cm.
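Parts (b) to (d) of the pine cone question can be sketched as below. Two assumptions are made and should be checked against the mark scheme: part (d) is answered with the normal model fitted from the unbiased estimates (David's model) rather than Chrystal's original one, and 4.605 is the tabulated 10% critical value for 2 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist

n, sum_x, sum_x2 = 80, 464, 2722.59

# Part (b): unbiased estimates of the mean and variance
mean = sum_x / n
variance = (sum_x2 - n * mean ** 2) / (n - 1)

# Part (c): 5 classes, no pooling, 2 parameters estimated from the data
dof = 5 - 1 - 2
critical_10pct = 4.605
reject_h0 = 3.50 > critical_10pct

# Part (d): proportion longer than 7 cm under the fitted model (assumption)
fitted = NormalDist(mean, sqrt(variance))
p_longer_than_7 = 1 - fitted.cdf(7)

print(round(mean, 2), round(variance, 4), dof, reject_h0, round(p_longer_than_7, 3))
```

Estimating both parameters reduces the degrees of freedom from 4 to 2, which lowers the critical value. David's statistic of 3.50 still falls below it.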
Edexcel S3 2022 January Q6
14 marks Standard +0.3
  1. A farmer sells strawberries in baskets. The contents of each of 100 randomly selected baskets were weighed and the results, given to the nearest gram, are shown below.
| Weight of strawberries (grams) | Number of baskets |
| --- | --- |
| 302-303 | 5 |
| 304-305 | 13 |
| 306-307 | 10 |
| 308-309 | 18 |
| 310-311 | 25 |
| 312-313 | 20 |
| 314-315 | 5 |
| 316-317 | 4 |
The farmer proposes that the weight of strawberries per basket, in grams, should be modelled by a normal distribution with a mean of 310 g and standard deviation 4 g. Using his model, the farmer obtains the following expected frequencies.
| Weight of strawberries (\(s\), grams) | Expected frequency |
| --- | --- |
| \(s \leqslant 303.5\) | \(a\) |
| \(303.5 < s \leqslant 305.5\) | 7.8 |
| \(305.5 < s \leqslant 307.5\) | 13.6 |
| \(307.5 < s \leqslant 309.5\) | 18.4 |
| \(309.5 < s \leqslant 311.5\) | 19.6 |
| \(311.5 < s \leqslant 313.5\) | 16.3 |
| \(313.5 < s \leqslant 315.5\) | 10.6 |
| \(s > 315.5\) | \(b\) |
  1. Find the value of \(a\) and the value of \(b\). Give your answers correct to one decimal place. Before \(s \leqslant 303.5\) and \(s > 315.5\) are included, for the remaining cells, $$\sum \frac { ( O - E ) ^ { 2 } } { E } = 9.71$$
  2. Using a 5\% significance level, test whether the data are consistent with the model. You should state your hypotheses, the test statistic and the critical value used. An alternative model uses estimates for the population mean and standard deviation from the data given. Using these estimated values no expected frequency is below 5
    Another test is to be carried out, using a \(5 \%\) significance level, to assess whether the data are consistent with this alternative model.
  3. State the effect, if any, on the critical value for this test. Give a reason for your answer.
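The tail frequencies \(a\) and \(b\) and the effect of estimating parameters on the critical value can be illustrated as follows. The sketch assumes the end-class contributions are computed from \(a\) and \(b\) rounded to one decimal place, and takes the 5% critical values 14.067 (7 degrees of freedom) and 11.070 (5 degrees of freedom) from standard tables.

```python
from statistics import NormalDist

model = NormalDist(mu=310, sigma=4)
n = 100

# Part (a): tail classes, using the class boundaries 303.5 and 315.5
a = n * model.cdf(303.5)             # s <= 303.5
b = n * (1 - model.cdf(315.5))       # s > 315.5

# Part (b): add the two end-class contributions to the given 9.71
a1, b1 = round(a, 1), round(b, 1)
observed_low, observed_high = 5, 4   # baskets in 302-303 and 316-317
chi2 = 9.71 + (observed_low - a1) ** 2 / a1 + (observed_high - b1) ** 2 / b1
dof_given_model = 8 - 1
critical_given_model = 14.067
reject_h0 = chi2 > critical_given_model

# Part (c): estimating mean and standard deviation removes two more degrees of freedom
dof_estimated_model = 8 - 1 - 2
critical_estimated_model = 11.070

print(a1, b1, round(chi2, 2), reject_h0)
```

With both parameters estimated and no pooling needed, the degrees of freedom fall from 7 to 5, so the critical value decreases.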
Edexcel S3 2013 June Q4
14 marks Standard +0.3
4. Customers at a post office are timed to see how long they wait until being served at the counter. A random sample of 50 customers is chosen and their waiting times, \(x\) minutes, are summarised in Table 1.
| Waiting time in minutes \(( x )\) | Frequency |
| --- | --- |
| \(0 - 3\) | 8 |
| \(3 - 5\) | 12 |
| \(5 - 6\) | 13 |
| \(6 - 8\) | 9 |
| \(8 - 12\) | 8 |
Table 1
  1. Show that an estimate of \(\bar { x } = 5.49\) and an estimate of \(s _ { x } ^ { 2 } = 6.88\) The post office manager believes that the customers' waiting times can be modelled by a normal distribution.
    Assuming the data is normally distributed, she calculates the expected frequencies for these data and some of these frequencies are shown in Table 2.
    | Waiting Time | \(x < 3\) | \(3 - 5\) | \(5 - 6\) | \(6 - 8\) | \(x > 8\) |
    | --- | --- | --- | --- | --- | --- |
    | Expected Frequency | 8.56 | 12.73 | 7.56 | \(a\) | \(b\) |
    Table 2
  2. Find the value of \(a\) and the value of \(b\).
  3. Test, at the \(5 \%\) level of significance, the manager's belief. State your hypotheses clearly.
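Parts (a) and (b) of the post office question can be checked with the sketch below. It assumes class midpoints 1.5, 4, 5.5, 7 and 10, the divisor \(n - 1\) for the variance, and the rounded estimates 5.49 and 6.88 when fitting the normal model. The values of \(a\) and \(b\) found this way may differ slightly from tables-based answers.

```python
from math import sqrt
from statistics import NormalDist

midpoints = [1.5, 4.0, 5.5, 7.0, 10.0]
freq = [8, 12, 13, 9, 8]
n = sum(freq)

# Part (a): grouped estimates of the mean and variance
mean = sum(f * x for f, x in zip(freq, midpoints)) / n
variance = (sum(f * x * x for f, x in zip(freq, midpoints)) - n * mean ** 2) / (n - 1)

# Part (b): expected frequencies from N(5.49, 6.88), using the rounded estimates
fitted = NormalDist(mu=5.49, sigma=sqrt(6.88))
a = n * (fitted.cdf(8) - fitted.cdf(6))    # 6 - 8
b = n * (1 - fitted.cdf(8))                # x > 8

print(round(mean, 2), round(variance, 2), round(a, 2), round(b, 2))
```

As a consistency check, \(a + b\) should equal 50 minus the three tabulated frequencies, that is about 21.15. For part (c), two estimated parameters would leave \(5 - 1 - 2 = 2\) degrees of freedom if no classes are pooled.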
Edexcel S3 Q7
17 marks Standard +0.3
7. A shoe manufacturer sees a report from another country stating that the length of adult male feet is normally distributed with a mean of 22.4 cm and a standard deviation of 2.8 cm. The manufacturer wishes to see if this model is appropriate for his customers and collects data on the length, correct to the nearest cm, of the right foot of a random sample of 200 males giving the following results:
| Length (cm) | \(\leq 18\) | \(19 - 21\) | \(22 - 24\) | \(25 - 27\) | \(\geq 28\) |
| --- | --- | --- | --- | --- | --- |
| No. of Men | 24 | 48 | 69 | 41 | 18 |
The expected frequencies for the \(\leq 18\) and \(19 - 21\) groups are calculated as 16.46 and 58.44 respectively, correct to 2 decimal places.
  1. Calculate expected frequencies for the other three classes.
  2. Stating your hypotheses clearly, test at the \(10 \%\) level of significance whether or not this data can be modelled by the distribution \(\mathrm { N } \left( 22.4,2.8 ^ { 2 } \right)\).
    (7 marks)
    The manufacturer wishes to refine the model by not assuming a mean and standard deviation.
  3. Explain briefly how the manufacturer should proceed.
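Because the lengths are recorded to the nearest centimetre, the class boundaries fall at 18.5, 21.5, 24.5 and 27.5. The sketch below reproduces the two quoted expected frequencies under an explicit assumption: that they were obtained from four-figure normal tables, emulated here by rounding \(z\) to 2 decimal places and \(\Phi(z)\) to 4. The helper `table_phi` is hypothetical, and 7.779 is the tabulated 10% critical value for 4 degrees of freedom.

```python
from statistics import NormalDist

mu, sigma, n = 22.4, 2.8, 200
upper_boundaries = [18.5, 21.5, 24.5, 27.5]   # lengths recorded to the nearest cm

def table_phi(x):
    """Phi as read from 4-figure tables: z to 2 d.p., probability to 4 d.p."""
    z = round((x - mu) / sigma, 2)
    return round(NormalDist().cdf(z), 4)

# Cumulative probabilities at the boundaries, padded with 0 and 1
cum = [0.0] + [table_phi(x) for x in upper_boundaries] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cum, cum[1:])]

observed = [24, 48, 69, 41, 18]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_10pct = 7.779                        # 4 degrees of freedom
reject_h0 = chi2 > critical_10pct

print([round(e, 2) for e in expected], round(chi2, 1), reject_h0)
```

An exact CDF without the table rounding gives slightly different expected frequencies (about 16.4 rather than 16.46 for the first class), which is why the rounding is emulated here. For part (c), the refinement would estimate the mean and standard deviation from the sample and reduce the degrees of freedom by 2.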
OCR MEI S3 Q4
20 marks Standard +0.3
4 Quality control inspectors in a factory are investigating the lengths of glass tubes that will be used to make laboratory equipment.
  1. Data on the observed lengths of a random sample of 200 glass tubes from one batch are available in the form of a frequency distribution as follows.
  2. Use a suitable statistical procedure to assess the goodness of fit of \(X\) to these data. Discuss your conclusions briefly. 2 A bus route runs from the centre of town A through the town's urban area to a point B on its boundary and then through the country to a small town C . Because of traffic congestion and general road conditions, delays occur on both the urban and the country sections. All delays may be considered independent. The scheduled time for the journey from A to B is 24 minutes. In fact, journey times over this section are given by the Normally distributed random variable \(X\) with mean 26 minutes and standard deviation 3 minutes. The scheduled time for the journey from B to C is 18 minutes. In fact, journey times over this section are given by the Normally distributed random variable \(Y\) with mean 15 minutes and standard deviation 2 minutes. Journey times on the two sections of route may be considered independent. The timetable published to the public does not show details of times at intermediate points; thus, if a bus is running early, it merely continues on its journey and is not required to wait.
  3. Find the probability that a journey from A to B is completed in less than the scheduled time of 24 minutes.
  4. Find the probability that a journey from A to C is completed in less than the scheduled time of 42 minutes.
  5. It is proposed to introduce a system of bus lanes in the urban area. It is believed that this would mean that the journey time from A to B would be given by the random variable \(0.85 X\). Assuming this to be the case, find the probability that a journey from A to B would be completed in less than the currently scheduled time of 24 minutes.
  6. An alternative proposal is to introduce an express service. This would leave out some bus stops on both sections of the route and its overall journey time from A to C would be given by the random variable \(0.9 X + 0.8 Y\). The scheduled time from A to C is to be given as a whole number of minutes. Find the least possible scheduled time such that, with probability 0.75 , buses would complete the journey on time or early.
  7. A programme of minor road improvements is undertaken on the country section. After their completion, it is thought that the random variable giving the journey time from B to C is still Normally distributed with standard deviation 2 minutes. A random sample of 15 journeys is found to have a sample mean journey time from B to C of 13.4 minutes. Provide a two-sided \(95 \%\) confidence interval for the population mean journey time from B to C . 3 An employer has commissioned an opinion polling organisation to undertake a survey of the attitudes of staff to proposed changes in the pension scheme. The staff are categorised as management, professional and administrative, and it is thought that there might be considerable differences of opinion between the categories. There are 60,140 and 300 staff respectively in the categories. The budget for the survey allows for a sample of 40 members of staff to be selected for in-depth interviews.
  8. Explain why it would be unwise to select a simple random sample from all the staff.
  9. Discuss whether it would be sensible to consider systematic sampling.
  10. What are the advantages of stratified sampling in this situation?
  11. State the sample sizes in each category if stratified sampling with as nearly as possible proportional allocation is used. The opinion polling organisation needs to estimate the average wealth of staff in the categories, in terms of property, savings, investments and so on. In a random sample of 11 professional staff, the sample mean is \(\pounds 345818\) and the sample standard deviation is \(\pounds 69241\).
  12. Assuming the underlying population is Normally distributed, test at the \(5 \%\) level of significance the null hypothesis that the population mean is \(\pounds 300000\) against the alternative hypothesis that it is greater than \(\pounds 300000\). Provide also a two-sided \(95 \%\) confidence interval for the population mean.
    [10] 4 A company has many factories. It is concerned about incidents of trespassing and, in the hope of reducing if not eliminating these, has embarked on a programme of installing new fencing.
  13. Records for a random sample of 9 factories of the numbers of trespass incidents in typical weeks before and after installation of the new fencing are as follows.
  14. Find the probability that, on a randomly chosen visit, it takes less than 50 minutes to mow the lawns.
  15. Find the probability that, on a randomly chosen visit, the total time for hoeing and pruning is less than 50 minutes.
  16. If Bill mows the lawns while Ben does the hoeing and pruning, find the probability that, on a randomly chosen visit, Ben finishes first. Bill and Ben do my gardening twice a month and send me an invoice at the end of the month.
  17. Write down the mean and variance of the total time (in minutes) they spend on mowing, hoeing and pruning per month.
  18. The company charges for the total time spent at 15 pence per minute. There is also a fixed charge of \(\pounds 10\) per month. Find the probability that the total charge for a month does not exceed \(\pounds 40\). 4 (a) An amateur weather forecaster has been keeping records of air pressure, measured in atmospheres. She takes the measurement at the same time every day using a barometer situated in her garden. A random sample of 100 of her observations is summarised in the table below. The corresponding expected frequencies for a Normal distribution, with its two parameters estimated by sample statistics, are also shown in the table.
  19. Find the probability that the weekly takings for coaches are less than \(\pounds 40000\).
  20. Find the probability that the weekly takings for lorries exceed the weekly takings for cars.
  21. Find the probability that over a 4 -week period the total takings for cars exceed \(\pounds 225000\). What assumption must be made about the four weeks?
  22. Each week the operator allocates part of the takings for repairs. This is determined for each type of vehicle according to estimates of the long-term damage caused. It is calculated as follows: \(5 \%\) of takings for cars, \(10 \%\) for coaches and \(20 \%\) for lorries. Find the probability that in any given week the total amount allocated for repairs will exceed \(\pounds 20000\). 3 The management of a large chain of shops aims to reduce the level of absenteeism among its workforce by means of an incentive bonus scheme. In order to evaluate the effectiveness of the scheme, the management measures the percentage of working days lost before and after its introduction for each of a random sample of 11 shops. The results are shown below.
  23. Give three reasons why a \(t\) test would be appropriate.
  24. Carry out the test using a \(5 \%\) significance level. State your hypotheses and conclusion carefully.
  25. Find a 95\% confidence interval for the true mean temperature in the reaction chamber.
  26. Describe briefly one advantage and one disadvantage of having a 99\% confidence interval instead of a 95\% confidence interval. 4 (a) In Germany, towards the end of the nineteenth century, a study was undertaken into the distribution of the sexes in families of various sizes. The table shows some data about the numbers of girls in 500 families, each with 5 children. It is thought that the binomial distribution \(\mathrm { B } ( 5 , p )\) should model these data.
  27. The grower intends to perform a \(t\) test to examine whether there is any difference in the mean yield of the two types of plant. State the hypotheses he should use and also any necessary assumption.
  28. Carry out the test using a \(5 \%\) significance level.
    (b) The tea grower deals with many types of tea and employs tasters to rate them. The tasters do this by giving each tea a score out of 100. The tea grower wishes to compare the scores given by two of the tasters. Their scores for a random selection of 10 teas are as follows. A Wilcoxon signed rank test is to be used to decide whether there is any evidence of a preference for one of the uniforms.
  29. Explain why this test is appropriate in these circumstances and state the hypotheses that should be used.
  30. Carry out the test at the \(5 \%\) significance level. 4 A random variable \(X\) has probability density function \(\mathrm { f } ( x ) = \frac { 2 x } { \lambda ^ { 2 } }\) for \(0 < x < \lambda\), where \(\lambda\) is a positive constant.
  31. Show that, for any value of \(\lambda , \mathrm { f } ( x )\) is a valid probability density function.
  32. Find \(\mu\), the mean value of \(X\), in terms of \(\lambda\) and show that \(\mathrm { P } ( X < \mu )\) does not depend on \(\lambda\).
  33. Given that \(\mathrm { E } \left( X ^ { 2 } \right) = \frac { \lambda ^ { 2 } } { 2 }\), find \(\sigma ^ { 2 }\), the variance of \(X\), in terms of \(\lambda\). The random variable \(X\) is used to model the depth of the space left by the filling machine at the top of a jar of jam. The model gives the following probabilities for \(X\) (whatever the value of \(\lambda\) ).
  34. Initially it is assumed that the value of \(p\) is \(\frac { 1 } { 2 }\). Test at the \(5 \%\) level of significance whether it is reasonable to suppose that the model applies with \(p = \frac { 1 } { 2 }\).
  35. The model is refined by estimating \(p\) from the data. Find the mean of the observed data and hence an estimate of \(p\).
  36. Using the estimated value of \(p\), the value of the test statistic \(X ^ { 2 }\) turns out to be 2.3857 . Is it reasonable to suppose, at the \(5 \%\) level of significance, that this refined model applies?
  37. Discuss the reasons for the different outcomes of the tests in parts (i) and (iii). 2 (a) A continuous random variable, \(X\), has probability density function $$f ( x ) = \begin{cases} \frac { 1 } { 72 } \left( 8 x - x ^ { 2 } \right) & 2 \leqslant x \leqslant 8 \\ 0 & \text { otherwise } \end{cases}$$
  38. Find \(\mathrm { F } ( x )\), the cumulative distribution function of \(X\).
  39. Sketch \(\mathrm { F } ( x )\).
  40. The median of \(X\) is \(m\). Show that \(m\) satisfies the equation \(m ^ { 3 } - 12 m ^ { 2 } + 148 = 0\). Verify that \(m \approx 4.42\).
    (b) The random variable in part (a) is thought to model the weights, in kilograms, of lambs at birth. The birth weights, in kilograms, of a random sample of 12 lambs, given in ascending order, are as follows. $$\begin{array} { l l l l l l l l l l l l } 3.16 & 3.62 & 3.80 & 3.90 & 4.02 & 4.72 & 5.14 & 6.36 & 6.50 & 6.58 & 6.68 & 6.78 \end{array}$$ Test at the 5\% level of significance whether a median of 4.42 is consistent with these data. 3 Cholesterol is a lipid (fat) which is manufactured by the liver from the fatty foods that we eat. It plays a vital part in allowing the body to function normally. However, when high levels of cholesterol are present in the blood there is a risk of arterial disease. Among the factors believed to assist with achieving and maintaining low cholesterol levels are weight loss and exercise. A doctor wishes to test the effectiveness of exercise in lowering cholesterol levels. For a random sample of 12 of her patients, she measures their cholesterol levels before and after they have followed a programme of exercise. The measurements obtained are as follows. This sample is to be tested to see whether the campaign appears to have been successful in raising the percentage receiving the booster.
  41. Explain why the use of paired data is appropriate in this context.
  42. Carry out an appropriate Wilcoxon signed rank test using these data, at the \(5 \%\) significance level.
    (b) Benford's Law predicts the following probability distribution for the first significant digit in some large data sets.
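    | Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Probability | 0.301 | 0.176 | 0.125 | 0.097 | 0.079 | 0.067 | 0.058 | 0.051 | 0.046 |
    On one particular day, the first significant digits of the stock market prices of the shares of a random sample of 200 companies gave the following results.
    | Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Frequency | 55 | 34 | 27 | 16 | 15 | 17 | 12 | 15 | 9 |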
    Test at the \(10 \%\) level of significance whether Benford's Law provides a reasonable model in the context of share prices. 4 A random variable \(X\) has an exponential distribution with probability density function \(\mathrm { f } ( x ) = \lambda \mathrm { e } ^ { - \lambda x }\) for \(x \geqslant 0\), where \(\lambda\) is a positive constant.
  43. Verify that \(\int _ { 0 } ^ { \infty } \mathrm { f } ( x ) \mathrm { d } x = 1\) and sketch \(\mathrm { f } ( x )\).
  44. In this part of the question you may use the following result. $$\int _ { 0 } ^ { \infty } x ^ { r } \mathrm { e } ^ { - \lambda x } \mathrm {~d} x = \frac { r ! } { \lambda ^ { r + 1 } } \quad \text { for } r = 0,1,2 , \ldots$$ Derive the mean and variance of \(X\) in terms of \(\lambda\). The random variable \(X\) is used to model the lifetime, in years, of a particular type of domestic appliance. The manufacturer of the appliance states that, based on past experience, the mean lifetime is 6 years.
  45. Let \(\bar { X }\) denote the mean lifetime, in years, of a random sample of 50 appliances. Write down an approximate distribution for \(\bar { X }\).
  46. A random sample of 50 appliances is found to have a mean lifetime of 7.8 years. Does this cast any doubt on the model?