OCR MEI S3: Question 4 (10 marks)

Exam Board: OCR MEI
Module: S3 (Statistics 3)
Marks: 10
Topic: Chi-squared distribution

4 Quality control inspectors in a factory are investigating the lengths of glass tubes that will be used to make laboratory equipment.
  1. Data on the observed lengths of a random sample of 200 glass tubes from one batch are available in the form of a frequency distribution as follows.
  2. Use a suitable statistical procedure to assess the goodness of fit of \(X\) to these data. Discuss your conclusions briefly.

2 A bus route runs from the centre of town A through the town's urban area to a point B on its boundary and then through the country to a small town C. Because of traffic congestion and general road conditions, delays occur on both the urban and the country sections. All delays may be considered independent. The scheduled time for the journey from A to B is 24 minutes. In fact, journey times over this section are given by the Normally distributed random variable \(X\) with mean 26 minutes and standard deviation 3 minutes. The scheduled time for the journey from B to C is 18 minutes. In fact, journey times over this section are given by the Normally distributed random variable \(Y\) with mean 15 minutes and standard deviation 2 minutes. Journey times on the two sections of route may be considered independent. The timetable published to the public does not show details of times at intermediate points; thus, if a bus is running early, it merely continues on its journey and is not required to wait.
  3. Find the probability that a journey from A to B is completed in less than the scheduled time of 24 minutes.
  4. Find the probability that a journey from A to C is completed in less than the scheduled time of 42 minutes.
  5. It is proposed to introduce a system of bus lanes in the urban area. It is believed that this would mean that the journey time from A to B would be given by the random variable \(0.85 X\). Assuming this to be the case, find the probability that a journey from A to B would be completed in less than the currently scheduled time of 24 minutes.
  6. An alternative proposal is to introduce an express service. This would leave out some bus stops on both sections of the route and its overall journey time from A to C would be given by the random variable \(0.9 X + 0.8 Y\). The scheduled time from A to C is to be given as a whole number of minutes. Find the least possible scheduled time such that, with probability 0.75, buses would complete the journey on time or early.
  7. A programme of minor road improvements is undertaken on the country section. After their completion, it is thought that the random variable giving the journey time from B to C is still Normally distributed with standard deviation 2 minutes. A random sample of 15 journeys is found to have a sample mean journey time from B to C of 13.4 minutes. Provide a two-sided \(95 \%\) confidence interval for the population mean journey time from B to C.

3 An employer has commissioned an opinion polling organisation to undertake a survey of the attitudes of staff to proposed changes in the pension scheme. The staff are categorised as management, professional and administrative, and it is thought that there might be considerable differences of opinion between the categories. There are 60, 140 and 300 staff respectively in the categories. The budget for the survey allows for a sample of 40 members of staff to be selected for in-depth interviews.
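As a way of checking working for the bus route question above, the following is a minimal sketch using only Python's standard library. It assumes the models stated in the question (\(X \sim \mathrm{N}(26, 3^2)\), \(Y \sim \mathrm{N}(15, 2^2)\), independent) and that the standard deviation of 2 minutes still applies after the road improvements. The helper name `linear_combo` is illustrative only and does not come from the paper.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Journey-time models stated in the question (minutes).
MU_X, SD_X = 26.0, 3.0  # A to B
MU_Y, SD_Y = 15.0, 2.0  # B to C


def linear_combo(a: float, b: float) -> NormalDist:
    """Distribution of a*X + b*Y for independent Normal X and Y (hypothetical helper)."""
    mean = a * MU_X + b * MU_Y
    variance = (a * SD_X) ** 2 + (b * SD_Y) ** 2
    return NormalDist(mean, sqrt(variance))


# P(X < 24): A to B inside the scheduled time.
p_ab = linear_combo(1, 0).cdf(24)

# P(X + Y < 42): A to C inside the scheduled time.
p_ac = linear_combo(1, 1).cdf(42)

# P(0.85 X < 24): A to B with the proposed bus lanes.
p_lanes = linear_combo(0.85, 0).cdf(24)

# Express service 0.9 X + 0.8 Y: smallest whole number of minutes t
# with P(journey <= t) at least 0.75.
express = linear_combo(0.9, 0.8)
least_time = ceil(express.inv_cdf(0.75))

# Two-sided 95% confidence interval for the B to C mean, known sd = 2, n = 15.
z = NormalDist().inv_cdf(0.975)
half_width = z * SD_Y / sqrt(15)
ci = (13.4 - half_width, 13.4 + half_width)

print(round(p_ab, 4), round(p_ac, 4), round(p_lanes, 4))
print(least_time, tuple(round(v, 2) for v in ci))
```

The variance of a linear combination uses \(\operatorname{Var}(aX + bY) = a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y)\), which relies on the independence stated in the question.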
  8. Explain why it would be unwise to select a simple random sample from all the staff.
  9. Discuss whether it would be sensible to consider systematic sampling.
  10. What are the advantages of stratified sampling in this situation?
  11. State the sample sizes in each category if stratified sampling with as nearly as possible proportional allocation is used.

The opinion polling organisation needs to estimate the average wealth of staff in the categories, in terms of property, savings, investments and so on. In a random sample of 11 professional staff, the sample mean is \(\pounds 345818\) and the sample standard deviation is \(\pounds 69241\).
  12. Assuming the underlying population is Normally distributed, test at the \(5 \%\) level of significance the null hypothesis that the population mean is \(\pounds 300000\) against the alternative hypothesis that it is greater than \(\pounds 300000\). Provide also a two-sided \(95 \%\) confidence interval for the population mean.
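The arithmetic behind the proportional allocation and the \(t\) test in the staff survey question can be sketched as follows. This is an illustrative check, not a model answer. The critical values 1.812 and 2.228 are the standard tabulated points of the \(t\) distribution with 10 degrees of freedom (upper 5% and upper 2.5%), hard-coded here because the Python standard library has no \(t\) distribution. All variable names are assumptions made for this sketch.

```python
from math import sqrt

# Proportional allocation of 40 interviews across the three staff categories.
staff = {"management": 60, "professional": 140, "administrative": 300}
sample_size = 40
total_staff = sum(staff.values())
exact_shares = {k: sample_size * n / total_staff for k, n in staff.items()}
# Simple rounding happens to add up to 40 for these figures;
# that is not guaranteed in general.
allocation = {k: round(v) for k, v in exact_shares.items()}

# One-sample t test: H0 mu = 300000 against H1 mu > 300000.
n = 11
sample_mean = 345818.0
sample_sd = 69241.0
mu_0 = 300000.0

standard_error = sample_sd / sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error

T_CRIT_ONE_SIDED = 1.812  # t tables, 10 df, upper 5% point
T_CRIT_TWO_SIDED = 2.228  # t tables, 10 df, upper 2.5% point

reject_h0 = t_stat > T_CRIT_ONE_SIDED
ci = (
    sample_mean - T_CRIT_TWO_SIDED * standard_error,
    sample_mean + T_CRIT_TWO_SIDED * standard_error,
)

print(allocation)
print(round(t_stat, 3), reject_h0, tuple(round(v) for v in ci))
```

With these figures the one-sided test and the two-sided interval lead to different impressions: the test statistic exceeds the one-sided critical value, while the two-sided interval still contains \(\pounds 300000\), because the two procedures use different tail areas.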
    [10]

4 A company has many factories. It is concerned about incidents of trespassing and, in the hope of reducing if not eliminating these, has embarked on a programme of installing new fencing.
  13. Records for a random sample of 9 factories of the numbers of trespass incidents in typical weeks before and after installation of the new fencing are as follows.
  14. Find the probability that, on a randomly chosen visit, it takes less than 50 minutes to mow the lawns.
  15. Find the probability that, on a randomly chosen visit, the total time for hoeing and pruning is less than 50 minutes.
  16. If Bill mows the lawns while Ben does the hoeing and pruning, find the probability that, on a randomly chosen visit, Ben finishes first.

Bill and Ben do my gardening twice a month and send me an invoice at the end of the month.
  17. Write down the mean and variance of the total time (in minutes) they spend on mowing, hoeing and pruning per month.
  18. The company charges for the total time spent at 15 pence per minute. There is also a fixed charge of \(\pounds 10\) per month. Find the probability that the total charge for a month does not exceed \(\pounds 40\).

4 (a) An amateur weather forecaster has been keeping records of air pressure, measured in atmospheres. She takes the measurement at the same time every day using a barometer situated in her garden. A random sample of 100 of her observations is summarised in the table below. The corresponding expected frequencies for a Normal distribution, with its two parameters estimated by sample statistics, are also shown in the table.
  19. Find the probability that the weekly takings for coaches are less than \(\pounds 40000\).
  20. Find the probability that the weekly takings for lorries exceed the weekly takings for cars.
  21. Find the probability that over a 4-week period the total takings for cars exceed \(\pounds 225000\). What assumption must be made about the four weeks?
  22. Each week the operator allocates part of the takings for repairs. This is determined for each type of vehicle according to estimates of the long-term damage caused. It is calculated as follows: \(5 \%\) of takings for cars, \(10 \%\) for coaches and \(20 \%\) for lorries. Find the probability that in any given week the total amount allocated for repairs will exceed \(\pounds 20000\).

3 The management of a large chain of shops aims to reduce the level of absenteeism among its workforce by means of an incentive bonus scheme. In order to evaluate the effectiveness of the scheme, the management measures the percentage of working days lost before and after its introduction for each of a random sample of 11 shops. The results are shown below.
  23. Give three reasons why a \(t\) test would be appropriate.
  24. Carry out the test using a \(5 \%\) significance level. State your hypotheses and conclusion carefully.
  25. Find a 95\% confidence interval for the true mean temperature in the reaction chamber.
  26. Describe briefly one advantage and one disadvantage of having a 99\% confidence interval instead of a 95\% confidence interval.

4 (a) In Germany, towards the end of the nineteenth century, a study was undertaken into the distribution of the sexes in families of various sizes. The table shows some data about the numbers of girls in 500 families, each with 5 children. It is thought that the binomial distribution \(\mathrm { B } ( 5 , p )\) should model these data.
  27. The grower intends to perform a \(t\) test to examine whether there is any difference in the mean yield of the two types of plant. State the hypotheses he should use and also any necessary assumption.
  28. Carry out the test using a \(5 \%\) significance level.
    (b) The tea grower deals with many types of tea and employs tasters to rate them. The tasters do this by giving each tea a score out of 100. The tea grower wishes to compare the scores given by two of the tasters. Their scores for a random selection of 10 teas are as follows. A Wilcoxon signed rank test is to be used to decide whether there is any evidence of a preference for one of the uniforms.
  29. Explain why this test is appropriate in these circumstances and state the hypotheses that should be used.
  30. Carry out the test at the \(5 \%\) significance level.

4 A random variable \(X\) has probability density function \(\mathrm { f } ( x ) = \frac { 2 x } { \lambda ^ { 2 } }\) for \(0 < x < \lambda\), where \(\lambda\) is a positive constant.
  31. Show that, for any value of \(\lambda\), \(\mathrm { f } ( x )\) is a valid probability density function.
  32. Find \(\mu\), the mean value of \(X\), in terms of \(\lambda\) and show that \(\mathrm { P } ( X < \mu )\) does not depend on \(\lambda\).
  33. Given that \(\mathrm { E } \left( X ^ { 2 } \right) = \frac { \lambda ^ { 2 } } { 2 }\), find \(\sigma ^ { 2 }\), the variance of \(X\), in terms of \(\lambda\).

The random variable \(X\) is used to model the depth of the space left by the filling machine at the top of a jar of jam. The model gives the following probabilities for \(X\) (whatever the value of \(\lambda\)).
  34. Initially it is assumed that the value of \(p\) is \(\frac { 1 } { 2 }\). Test at the \(5 \%\) level of significance whether it is reasonable to suppose that the model applies with \(p = \frac { 1 } { 2 }\).
  35. The model is refined by estimating \(p\) from the data. Find the mean of the observed data and hence an estimate of \(p\).
  36. Using the estimated value of \(p\), the value of the test statistic \(X ^ { 2 }\) turns out to be 2.3857. Is it reasonable to suppose, at the \(5 \%\) level of significance, that this refined model applies?
  37. Discuss the reasons for the different outcomes of the tests in parts (i) and (iii).

2 (a) A continuous random variable, \(X\), has probability density function
$$f ( x ) = \begin{cases} \frac { 1 } { 72 } \left( 8 x - x ^ { 2 } \right) & 2 \leqslant x \leqslant 8 \\ 0 & \text { otherwise } \end{cases}$$
  38. Find \(\mathrm { F } ( x )\), the cumulative distribution function of \(X\).
  39. Sketch \(\mathrm { F } ( x )\).
  40. The median of \(X\) is \(m\). Show that \(m\) satisfies the equation \(m ^ { 3 } - 12 m ^ { 2 } + 148 = 0\). Verify that \(m \approx 4.42\).
    (b) The random variable in part (a) is thought to model the weights, in kilograms, of lambs at birth. The birth weights, in kilograms, of a random sample of 12 lambs, given in ascending order, are as follows.
    $$\begin{array} { l l l l l l l l l l l l } 3.16 & 3.62 & 3.80 & 3.90 & 4.02 & 4.72 & 5.14 & 6.36 & 6.50 & 6.58 & 6.68 & 6.78 \end{array}$$
    Test at the 5\% level of significance whether a median of 4.42 is consistent with these data.

3 Cholesterol is a lipid (fat) which is manufactured by the liver from the fatty foods that we eat. It plays a vital part in allowing the body to function normally. However, when high levels of cholesterol are present in the blood there is a risk of arterial disease. Among the factors believed to assist with achieving and maintaining low cholesterol levels are weight loss and exercise. A doctor wishes to test the effectiveness of exercise in lowering cholesterol levels. For a random sample of 12 of her patients, she measures their cholesterol levels before and after they have followed a programme of exercise. The measurements obtained are as follows. This sample is to be tested to see whether the campaign appears to have been successful in raising the percentage receiving the booster.
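For the lamb birth weights given earlier, one way to test the hypothesised median of 4.42 is a single-sample Wilcoxon signed rank test. The paper does not name the procedure, so treating Wilcoxon as the intended test is an assumption of this sketch. The critical value 13 is the standard tabulated two-tailed 5% value for \(n = 12\), and the variable names are illustrative.

```python
# Birth weights (kg) of the 12 lambs, as listed in the question.
weights = [3.16, 3.62, 3.80, 3.90, 4.02, 4.72,
           5.14, 6.36, 6.50, 6.58, 6.68, 6.78]
hypothesised_median = 4.42

# Signed differences from the hypothesised median.
differences = [w - hypothesised_median for w in weights]

# Rank the differences by absolute size (rank 1 = smallest).
# This data set has no zero differences and no tied absolute values,
# so no tie handling is needed here.
ordered = sorted(differences, key=abs)
w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)

# Two-tailed test: compare the smaller rank sum with the tabulated value.
statistic = min(w_plus, w_minus)
CRITICAL_VALUE = 13  # Wilcoxon tables, n = 12, two-tailed 5%
reject_h0 = statistic <= CRITICAL_VALUE

print(w_plus, w_minus, statistic, reject_h0)
```

The two rank sums must add up to \(\tfrac{1}{2} n (n + 1) = 78\), which is a quick check on the ranking.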
  41. Explain why the use of paired data is appropriate in this context.
  42. Carry out an appropriate Wilcoxon signed rank test using these data, at the \(5 \%\) significance level.
    (b) Benford's Law predicts the following probability distribution for the first significant digit in some large data sets.
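    $$\begin{array} { l c c c c c c c c c } \text { Digit } & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \text { Probability } & 0.301 & 0.176 & 0.125 & 0.097 & 0.079 & 0.067 & 0.058 & 0.051 & 0.046 \end{array}$$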
    On one particular day, the first significant digits of the stock market prices of the shares of a random sample of 200 companies gave the following results.
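    $$\begin{array} { l c c c c c c c c c } \text { Digit } & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \text { Frequency } & 55 & 34 & 27 & 16 & 15 & 17 & 12 & 15 & 9 \end{array}$$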
    Test at the \(10 \%\) level of significance whether Benford's Law provides a reasonable model in the context of share prices.

4 A random variable \(X\) has an exponential distribution with probability density function \(\mathrm { f } ( x ) = \lambda \mathrm { e } ^ { - \lambda x }\) for \(x \geqslant 0\), where \(\lambda\) is a positive constant.
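Returning to the Benford's Law data in part (b) above, the goodness of fit statistic can be computed as in the sketch below. It assumes the observed frequencies for digits 1 to 9 are 55, 34, 27, 16, 15, 17, 12, 15 and 9, which sum to the stated sample size of 200. The critical value 13.36 is the standard tabulated upper 10% point of the chi-squared distribution with 8 degrees of freedom. Variable names are illustrative.

```python
# Benford probabilities and observed first-digit counts for digits 1..9.
probabilities = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
observed = [55, 34, 27, 16, 15, 17, 12, 15, 9]

n = sum(observed)
expected = [n * p for p in probabilities]

# Pearson statistic: sum of (O - E)^2 / E over the nine cells.
x_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# No parameters are estimated from the data, so df = cells - 1.
degrees_of_freedom = len(observed) - 1
CRITICAL_VALUE = 13.36  # chi-squared tables, 8 df, upper 10% point
reject_h0 = x_squared > CRITICAL_VALUE

print(n, round(min(expected), 1))
print(round(x_squared, 3), degrees_of_freedom, reject_h0)
```

Every expected frequency is above 5 (the smallest is for digit 9), so no cells need to be merged before the statistic is calculated.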
  43. Verify that \(\int _ { 0 } ^ { \infty } \mathrm { f } ( x ) \mathrm { d } x = 1\) and sketch \(\mathrm { f } ( x )\).
  44. In this part of the question you may use the following result. $$\int _ { 0 } ^ { \infty } x ^ { r } \mathrm { e } ^ { - \lambda x } \mathrm {~d} x = \frac { r ! } { \lambda ^ { r + 1 } } \quad \text { for } r = 0,1,2 , \ldots$$ Derive the mean and variance of \(X\) in terms of \(\lambda\).

The random variable \(X\) is used to model the lifetime, in years, of a particular type of domestic appliance. The manufacturer of the appliance states that, based on past experience, the mean lifetime is 6 years.
  45. Let \(\bar { X }\) denote the mean lifetime, in years, of a random sample of 50 appliances. Write down an approximate distribution for \(\bar { X }\).
  46. A random sample of 50 appliances is found to have a mean lifetime of 7.8 years. Does this cast any doubt on the model?
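The last two parts of the exponential question can be checked numerically with the sketch below. It assumes the standard results for the exponential distribution, mean \(1/\lambda\) and variance \(1/\lambda^{2}\), so that a mean lifetime of 6 years gives variance 36, and it uses the Central Limit Theorem to approximate \(\bar{X}\) by \(\mathrm{N}(6, 36/50)\). Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Manufacturer's claim: mean lifetime 6 years, so lambda = 1/6.
mean_lifetime = 6.0
rate = 1 / mean_lifetime
variance = 1 / rate ** 2  # exponential variance, 1 / lambda^2 = 36

# Approximate distribution of the sample mean for n = 50 (Central Limit Theorem).
n = 50
xbar_dist = NormalDist(mean_lifetime, sqrt(variance / n))

# How unusual is an observed sample mean of 7.8 years under the model?
observed_mean = 7.8
z = (observed_mean - mean_lifetime) / xbar_dist.stdev
upper_tail = 1 - xbar_dist.cdf(observed_mean)

print(round(z, 3), round(upper_tail, 4))
```

A standardised value a little above 2 corresponds to a small upper-tail probability, which is the kind of evidence the final part asks the reader to weigh against the model.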