5.05d - OCR Spec

OCR MEI S3 2007 January Q2

18 marks Standard +0.3

2 The manager of a large country estate is preparing to plant an area of woodland. He orders a large number of saplings (young trees) from a nursery. He selects a random sample of 12 of the saplings and measures their heights, which are as follows (in metres). $$\begin{array} { l l l l l l l l l l l l } 0.63 & 0.62 & 0.58 & 0.56 & 0.59 & 0.62 & 0.64 & 0.58 & 0.55 & 0.61 & 0.56 & 0.52 \end{array}$$

The manager requires that the mean height of saplings at planting is at least 0.6 metres. Carry out the usual $t$ test to examine this, using a $5 \%$ significance level. State your hypotheses and conclusion carefully. What assumption is needed for the test to be valid?
Find a $95 \%$ confidence interval for the true mean height of saplings. Explain carefully what is meant by a $95 \%$ confidence interval.
Suppose the assumption needed in part (i) cannot be justified. Identify an alternative test that the manager could carry out in order to check that the saplings meet his requirements, and state the null hypothesis for this test.
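The arithmetic for parts (i) and (ii) can be checked with a short script. This is only a sketch: the variable names are illustrative, and the critical values for $t$ with 11 degrees of freedom are hard-coded from standard tables rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

heights = [0.63, 0.62, 0.58, 0.56, 0.59, 0.62,
           0.64, 0.58, 0.55, 0.61, 0.56, 0.52]
n = len(heights)
xbar = mean(heights)
se = stdev(heights) / sqrt(n)   # stdev uses the n - 1 divisor

# Critical values for t with n - 1 = 11 degrees of freedom (standard tables).
T11_UPPER_5 = 1.796     # used for the one-tailed 5% test
T11_UPPER_2_5 = 2.201   # used for the two-sided 95% interval

# H0: mu = 0.6 against H1: mu < 0.6 (lower-tailed test).
t_stat = (xbar - 0.6) / se
reject_h0 = t_stat < -T11_UPPER_5

ci_95 = (xbar - T11_UPPER_2_5 * se, xbar + T11_UPPER_2_5 * se)

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"95% CI: ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
```

The test statistic is compared with the lower-tail critical value because the manager's concern is that the mean may fall below 0.6 metres.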

OCR MEI S3 2006 June Q3

18 marks Moderate -0.3

3 An employer has commissioned an opinion polling organisation to undertake a survey of the attitudes of staff to proposed changes in the pension scheme. The staff are categorised as management, professional and administrative, and it is thought that there might be considerable differences of opinion between the categories. There are 60, 140 and 300 staff respectively in the categories. The budget for the survey allows for a sample of 40 members of staff to be selected for in-depth interviews.

Explain why it would be unwise to select a simple random sample from all the staff.
Discuss whether it would be sensible to consider systematic sampling.
What are the advantages of stratified sampling in this situation?
State the sample sizes in each category if stratified sampling with as nearly as possible proportional allocation is used.
The opinion polling organisation needs to estimate the average wealth of staff in the categories, in terms of property, savings, investments and so on. In a random sample of 11 professional staff, the sample mean is $\pounds 345818$ and the sample standard deviation is $\pounds 69241$.
Assuming the underlying population is Normally distributed, test at the $5 \%$ level of significance the null hypothesis that the population mean is $\pounds 300000$ against the alternative hypothesis that it is greater than $\pounds 300000$. Provide also a two-sided $95 \%$ confidence interval for the population mean.
[10]

OCR MEI S3 2006 June Q4

18 marks Moderate -0.3

4 A company has many factories. It is concerned about incidents of trespassing and, in the hope of reducing if not eliminating these, has embarked on a programme of installing new fencing.

Records for a random sample of 9 factories of the numbers of trespass incidents in typical weeks before and after installation of the new fencing are as follows.
Factory	A	B	C	D	E	F	G	H	I
Number before installation	8	12	6	4	14	22	4	13	14
Number after installation	6	11	0	1	18	10	11	5	4
Use a Wilcoxon test to examine at the $5 \%$ level of significance whether it appears that, on the whole, the number of trespass incidents per week is lower after the installation of the new fencing than before.
Records are also available of the costs of damage from typical trespass incidents before and after the introduction of the new fencing for a random sample of 7 factories, as follows (in £).
Factory	T	U	V	W	X	Y	Z
Cost before installation	1215	95	546	467	2356	236	550
Cost after installation	1268	110	578	480	2417	318	620
Stating carefully the required distributional assumption, provide a two-sided $99 \%$ confidence interval based on a $t$ distribution for the population mean difference between costs of damage before and after installation of the new fencing. Explain why this confidence interval should not be based on the Normal distribution.

OCR MEI S3 2007 June Q1

18 marks Standard +0.3

1 A manufacturer of fireworks is investigating the lengths of time for which the fireworks burn. For a particular type of firework this length of time, in minutes, is modelled by the random variable $T$ with probability density function $$\mathrm { f } ( t ) = k t ^ { 3 } ( 2 - t ) \quad \text { for } 0 < t \leqslant 2$$ where $k$ is a constant.

Show that $k = \frac { 5 } { 8 }$.
Find the modal time.
Find $\mathrm { E } ( T )$ and show that $\operatorname { Var } ( T ) = \frac { 8 } { 63 }$.
A large random sample of $n$ fireworks of this type is tested. Write down in terms of $n$ the approximate distribution of $\bar { T }$, the sample mean time.
For a random sample of 100 such fireworks the times are summarised as follows. $$\Sigma t = 145.2 \quad \Sigma t ^ { 2 } = 223.41$$ Find a 95\% confidence interval for the mean time for this type of firework and hence comment on the appropriateness of the model.
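As a check on the final part, the interval can be computed from the summary statistics and compared with the model mean. This is a sketch with illustrative names; the model mean $\mathrm{E}(T) = \frac{4}{3}$ follows from integrating $t \cdot \frac{5}{8} t^3 (2 - t)$ over $(0, 2)$, and the large sample is taken to justify a Normal-based interval with the sample variance.

```python
from math import sqrt
from statistics import NormalDist

n, sum_t, sum_t2 = 100, 145.2, 223.41

tbar = sum_t / n
s2 = (sum_t2 - sum_t ** 2 / n) / (n - 1)   # unbiased sample variance
se = sqrt(s2 / n)

z = NormalDist().inv_cdf(0.975)            # about 1.96
ci_95 = (tbar - z * se, tbar + z * se)

# Mean under the model f(t) = (5/8) t^3 (2 - t) on (0, 2).
model_mean = 4 / 3
model_mean_in_ci = ci_95[0] <= model_mean <= ci_95[1]

print(f"95% CI: ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
print(f"model mean {model_mean:.4f} inside interval: {model_mean_in_ci}")
```

Because the model mean lies below the lower limit, the data cast some doubt on the model.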

OCR MEI S3 2007 June Q3

18 marks Standard +0.3

3 The management of a large chain of shops aims to reduce the level of absenteeism among its workforce by means of an incentive bonus scheme. In order to evaluate the effectiveness of the scheme, the management measures the percentage of working days lost before and after its introduction for each of a random sample of 11 shops. The results are shown below.

Shop	A	B	C	D	E	F	G	H	I	J	K
\% days lost before	3.5	5.0	3.5	3.2	4.5	4.9	4.1	6.0	6.8	8.1	6.0
\% days lost after	1.8	4.3	2.9	4.5	4.4	5.8	3.5	6.7	6.4	5.4	5.1

The management decides to carry out a $t$ test to investigate whether there has been a reduction in absenteeism.
1. State clearly the hypotheses that should be used together with any necessary assumptions.
2. Carry out the test using a $5 \%$ significance level.
Find a 95\% confidence interval for the true mean percentage of days lost after the introduction of the incentive scheme and state any assumption needed. The management has set a target that the mean percentage should be 3.5. Do you think this has been achieved? Explain your answer.

OCR MEI S4 2007 June Q3

24 marks Challenging +1.2

3 An engineering company buys a certain type of component from two suppliers, A and B. It is important that, on the whole, the strengths of these components are the same from both suppliers. The company can measure the strengths in its laboratory. Random samples of seven components from supplier A and five from supplier B give the following strengths, in a convenient unit.

Supplier A	25.8	27.4	26.2	23.5	28.3	26.4	27.2
Supplier B	25.6	24.9	23.7	25.8	26.9

The underlying distributions of strengths are assumed to be Normal for both suppliers, with variances 2.45 for supplier A and 1.40 for supplier B.

Test at the $5 \%$ level of significance whether it is reasonable to assume that the mean strengths from the two suppliers are equal.
Provide a two-sided 90\% confidence interval for the true mean difference.
Show that the test procedure used in part (i), with samples of sizes 7 and 5 and a $5 \%$ significance level, leads to acceptance of the null hypothesis of equal means if $- 1.556 < \bar { x } - \bar { y } < 1.556$, where $\bar { x }$ and $\bar { y }$ are the observed sample means from suppliers A and B. Hence find the probability of a Type II error for this test procedure if in fact the true mean strength from supplier A is 2.0 units more than that from supplier B.
A manager suggests that the Wilcoxon rank sum test should be used instead, comparing the median strengths for the samples of sizes 7 and 5. Give one reason why this suggestion might be sensible and two why it might not.
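The Normal-theory calculations in parts (i) to (iii) can be sketched as follows. The names are illustrative, and the script simply applies the known variances given in the question.

```python
from math import sqrt
from statistics import NormalDist, mean

strengths_a = [25.8, 27.4, 26.2, 23.5, 28.3, 26.4, 27.2]
strengths_b = [25.6, 24.9, 23.7, 25.8, 26.9]
var_a, var_b = 2.45, 1.40                    # known population variances

std_normal = NormalDist()
sd_diff = sqrt(var_a / len(strengths_a) + var_b / len(strengths_b))
diff = mean(strengths_a) - mean(strengths_b)

# (i) Two-sided 5% test of equal means.
z_stat = diff / sd_diff
bound = std_normal.inv_cdf(0.975) * sd_diff  # accept H0 if |diff| < bound
accept_h0 = abs(diff) < bound

# (ii) Two-sided 90% confidence interval for the mean difference.
z90 = std_normal.inv_cdf(0.95)
ci_90 = (diff - z90 * sd_diff, diff + z90 * sd_diff)

# (iii) Type II error when the true difference is 2.0:
# the probability that the observed difference still lands in the acceptance region.
under_h1 = NormalDist(mu=2.0, sigma=sd_diff)
type_2_error = under_h1.cdf(bound) - under_h1.cdf(-bound)

print(f"z = {z_stat:.3f}, acceptance bound = {bound:.3f}, accept H0: {accept_h0}")
print(f"90% CI: ({ci_90[0]:.3f}, {ci_90[1]:.3f})")
print(f"P(Type II error) = {type_2_error:.4f}")
```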

OCR MEI S4 2008 June Q3

24 marks Standard +0.3

3

Explain the meaning of the following terms in the context of hypothesis testing: Type I error, Type II error, operating characteristic.
A machine fills salt containers that will be sold in shops. The containers are supposed to contain 750 g of salt. The machine operates in such a way that the amount of salt delivered to each container is a Normally distributed random variable with standard deviation 20 g. The machine should be calibrated in such a way that the mean amount delivered, $\mu$, is 750 g. Each hour, a random sample of 9 containers is taken from the previous hour's output and the sample mean amount of salt is determined. If this is between 735 g and 765 g, the previous hour's output is accepted. If not, the previous hour's output is rejected and the machine is recalibrated.
Find the probability of rejecting the previous hour's output if the machine is properly calibrated. Comment on your result.
Find the probability of accepting the previous hour's output if $\mu = 725 \mathrm {~g}$. Comment on your result.
Obtain an expression for the operating characteristic of this testing procedure in terms of the cumulative distribution function $\Phi ( z )$ of the standard Normal distribution. Evaluate the operating characteristic for the following values (in g) of $\mu$: 720, 730, 740, 750, 760, 770, 780.
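One way to tabulate the operating characteristic is sketched below. It assumes the form $\Phi\left(\frac{765 - \mu}{20/3}\right) - \Phi\left(\frac{735 - \mu}{20/3}\right)$, which is the probability that the sample mean of 9 containers falls in the acceptance interval; the function and constant names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, SAMPLE_SIZE = 20.0, 9
LOWER, UPPER = 735.0, 765.0
SE = SIGMA / sqrt(SAMPLE_SIZE)      # standard error of the sample mean, 20/3
phi = NormalDist().cdf              # standard Normal cdf


def operating_characteristic(mu: float) -> float:
    """Probability of accepting the hour's output when the true mean is mu."""
    return phi((UPPER - mu) / SE) - phi((LOWER - mu) / SE)


p_reject_when_calibrated = 1 - operating_characteristic(750)
p_accept_when_725 = operating_characteristic(725)

print(f"P(reject | mu = 750) = {p_reject_when_calibrated:.4f}")
print(f"P(accept | mu = 725) = {p_accept_when_725:.4f}")
for mu in (720, 730, 740, 750, 760, 770, 780):
    print(mu, round(operating_characteristic(mu), 4))
```

The table is symmetric about $\mu = 750$ because the acceptance interval is centred there.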

OCR MEI S4 2010 June Q3

24 marks Standard +0.3

3 At a factory, two production lines are in use for making steel rods. A critical dimension is the diameter of a rod. For the first production line, it is assumed from experience that the diameters are Normally distributed with standard deviation 1.2 mm. For the second production line, it is assumed from experience that the diameters are Normally distributed with standard deviation 1.4 mm. It is desired to test whether the mean diameters for the two production lines, $\mu _ { 1 }$ and $\mu _ { 2 }$, are equal. A random sample of 8 rods is taken from the first production line and, independently, a random sample of 10 rods is taken from the second production line.

Find the acceptance region for the customary test based on the Normal distribution for the null hypothesis $\mu _ { 1 } = \mu _ { 2 }$, against the alternative hypothesis $\mu _ { 1 } \neq \mu _ { 2 }$, at the $5 \%$ level of significance.
The sample means are found to be 25.8 mm and 24.4 mm respectively. What is the result of the test? Provide a two-sided $99 \%$ confidence interval for $\mu _ { 1 } - \mu _ { 2 }$.
The production lines are modified so that the diameters may be assumed to be of equal (but unknown) variance. However, they may no longer be Normally distributed. A two-sided test of the equality of the population medians is required, at the $5 \%$ significance level.
The diameters in independent random samples of sizes 6 and 8 are as follows, in mm.
First production line	25.9	25.8	25.3	24.7	24.4	25.4
Second production line	23.8	25.6	24.0	23.5	24.1	24.5	24.3	25.1
Use an appropriate procedure to carry out the test.

OCR MEI S4 2014 June Q2

24 marks Challenging +1.2

2

The probability density function of the random variable $X$ is $$\mathrm { f } ( x ) = \frac { x ^ { k - 1 } \mathrm { e } ^ { - x / \phi } } { \phi ^ { k } ( k - 1 ) ! } , x > 0$$ where $k$ is a known positive integer and $\phi$ is an unknown parameter ( $\phi > 0$ ). Show that the moment generating function (mgf) of $X$ is $$\mathrm { M } _ { X } ( \theta ) = ( 1 - \phi \theta ) ^ { - k }$$ for $\theta < \frac { 1 } { \phi }$.
Write down the mgf of the random variable $W = \sum _ { i = 1 } ^ { n } X _ { i }$ where $X _ { 1 } , X _ { 2 } , \ldots , X _ { n }$ are independent random variables each with the same distribution as $X$.
Write down the mgf of the random variable $Y = \frac { 2 W } { \phi }$. Given that the mgf of the random variable $V$ having the $\chi _ { m } ^ { 2 }$ distribution is $\mathrm { M } _ { V } ( \theta ) = ( 1 - 2 \theta ) ^ { - m / 2 }$ (for $\theta < \frac { 1 } { 2 }$ ), deduce the distribution of $Y$.
Deduce that $\mathrm { P } \left( l < \frac { 2 W } { \phi } < u \right) = 0.95$ where $l$ and $u$ are the lower and upper $2 \frac { 1 } { 2 } \%$ points of the $\chi _ { 2 n k } ^ { 2 }$ distribution. Hence deduce that a $95 \%$ confidence interval for $\phi$ is given by $\left( \frac { 2 w } { u } , \frac { 2 w } { l } \right)$ where $w$ is an observation on the random variable $W$.
For the case $k = 2$ and $n = 10$, use percentage points of the $\chi ^ { 2 }$ distribution to write down, in terms of $w$, an expression for a $95 \%$ confidence interval for $\phi$. By considering the $\operatorname { mgf }$ of $W$, find in terms of $\phi$ the expected length of this interval.

OCR MEI S4 2015 June Q1

24 marks Standard +0.3

1 The random variable $X$ has the following probability density function, in which $a$ is a (positive) parameter. $$\mathrm { f } ( x ) = \frac { 2 } { a } x \mathrm { e } ^ { - x ^ { 2 } / a } , \quad x \geqslant 0 .$$

Verify that $\int _ { 0 } ^ { \infty } \mathrm { f } ( x ) \mathrm { d } x = 1$.
Show that $\mathrm { E } \left( X ^ { 2 } \right) = a$ and $\mathrm { E } \left( X ^ { 4 } \right) = 2 a ^ { 2 }$.
The parameter $a$ is to be estimated by maximum likelihood based on an independent random sample from the distribution, $X _ { 1 } , X _ { 2 } , \ldots , X _ { n }$.
Show that the logarithm of the likelihood function is $$n \ln 2 - n \ln a + \sum _ { i = 1 } ^ { n } \ln X _ { i } - \frac { 1 } { a } \sum _ { i = 1 } ^ { n } X _ { i } ^ { 2 }$$ Hence obtain the maximum likelihood estimator, $\hat { a }$, for $a$.
[You are not required to verify that any turning point you find is a maximum.]
Using the results from part (ii), show that $\hat { a }$ is unbiased for $a$ and find the variance of $\hat { a }$.
In a particular random sample from this distribution, $n = 100$ and $\sum x _ { i } ^ { 2 } = 147.1$. Obtain an approximate 95\% confidence interval for $a$. (You may assume that the Central Limit Theorem holds in this case.)
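A possible numerical check of the last part is sketched below. It assumes the estimator $\hat{a} = \frac{1}{n} \sum X_i^2$ with variance $\frac{a^2}{n}$ from the earlier parts, and it estimates the standard error by plugging $\hat{a}$ in for $a$; that plug-in step is an assumption of this sketch, and other approximate intervals are possible.

```python
from math import sqrt
from statistics import NormalDist

n, sum_x2 = 100, 147.1

a_hat = sum_x2 / n                 # maximum likelihood estimate of a
# Var(a_hat) = (E(X^4) - E(X^2)^2) / n = a^2 / n; a is replaced by a_hat here.
se = a_hat / sqrt(n)

z = NormalDist().inv_cdf(0.975)
ci_95 = (a_hat - z * se, a_hat + z * se)

print(f"a_hat = {a_hat:.3f}")
print(f"approximate 95% CI: ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
```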

OCR S3 2014 June Q5

9 marks Standard +0.3

5 The day before the 1992 General Election, an opinion poll showed that $37.6 \%$ of a random sample of 1731 voters intended to vote for the Conservative party.

Calculate an approximate $99.9 \%$ confidence interval for the proportion of voters intending to vote Conservative.
The actual proportion voting Conservative was above the upper limit of the confidence interval. Give two possible reasons for this occurrence.
What sample size would be required to produce a $99.9 \%$ confidence interval of width 0.05?
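The interval and the sample-size calculation can be sketched as follows. The names are illustrative, and the sample size uses the observed proportion 0.376 as the estimate of $p$, which is an assumption of this sketch. Because the critical value is computed rather than read from tables, the required sample size may differ by one or two from a hand calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist

p_hat, n = 0.376, 1731
z = NormalDist().inv_cdf(1 - 0.001 / 2)      # two-sided 99.9% critical value

se = sqrt(p_hat * (1 - p_hat) / n)
ci_999 = (p_hat - z * se, p_hat + z * se)

# Width of the interval is 2 * z * sqrt(p(1 - p) / n); solve for n at width 0.05.
target_width = 0.05
n_required = ceil((2 * z / target_width) ** 2 * p_hat * (1 - p_hat))

print(f"99.9% CI: ({ci_999[0]:.4f}, {ci_999[1]:.4f})")
print(f"sample size for width {target_width}: {n_required}")
```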

OCR S3 2015 June Q4

9 marks Standard +0.3

4 A set of bathroom scales is known to operate with an error which is normally distributed. One morning a man weighs himself 4 times. The 4 values for his mass, in kg, which can be considered to be a random sample, are as follows. $$\begin{array} { l l l l } 62.6 & 62.8 & 62.1 & 62.5 \end{array}$$

Find a $95 \%$ confidence interval for his mass. Give the end-points of the interval correct to 3 decimal places.
Based on these results, a $y \%$ confidence interval has width 0.482. Find $y$.
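Both parts reduce to working with the standard error of the mean and the $t$ distribution with 3 degrees of freedom. The sketch below hard-codes the table value 3.182 (upper $2\frac{1}{2}\%$ point of $t_3$); for the second part it only computes the critical value implied by the stated width, which is then looked up in tables.

```python
from math import sqrt
from statistics import mean, stdev

masses = [62.6, 62.8, 62.1, 62.5]
n = len(masses)
xbar = mean(masses)
se = stdev(masses) / sqrt(n)

T3_UPPER_2_5 = 3.182                     # t with 3 d.f., from standard tables
ci_95 = (xbar - T3_UPPER_2_5 * se, xbar + T3_UPPER_2_5 * se)

# A y% interval of width 0.482 has half-width 0.241 = t * se.
implied_t = (0.482 / 2) / se

print(f"95% CI: ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
print(f"implied critical value: {implied_t:.3f}")
```

The implied value is close to 1.638, the upper 10% point of $t_3$ in standard tables, which corresponds to $y = 80$.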

OCR S2 2013 June Q6

11 marks Standard +0.3

6 The random variable $X$ denotes the yield, in kilograms per acre, of a certain crop. Under the standard treatment it is known that $\mathrm { E } ( X ) = 38.4$. Under a new treatment, the yields of 50 randomly chosen regions can be summarised as $$n = 50 , \quad \sum x = 1834.0 , \quad \sum x ^ { 2 } = 70027.37 .$$ Test at the $1 \%$ level whether there has been a change in the mean crop yield.
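A sketch of the test arithmetic, assuming the sample of 50 is large enough for the sample mean to be treated as approximately Normal with the variance estimated from the data; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sum_x, sum_x2 = 50, 1834.0, 70027.37
mu_0 = 38.4                                  # mean under the standard treatment

xbar = sum_x / n
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)     # unbiased variance estimate
z_stat = (xbar - mu_0) / sqrt(s2 / n)

# Two-tailed test at the 1% level: H0: mu = 38.4 against H1: mu != 38.4.
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)
reject_h0 = abs(z_stat) > z_crit

print(f"mean = {xbar:.2f}, variance estimate = {s2:.2f}")
print(f"z = {z_stat:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```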

OCR S3 2009 January Q3

8 marks Moderate -0.3

3 In a random sample of credit card holders, it was found that $28 \%$ of them used their card for internet purchases.

Given that the sample size is 1200, find a $98 \%$ confidence interval for the percentage of all credit card holders who use their card for internet purchases.
Estimate the smallest sample size for which a $98 \%$ confidence interval would have a width of at most $5 \%$, and state why the value found is only an estimate.

OCR S3 2010 January Q5

11 marks Standard +0.3

5 Each of a random sample of 200 steel bars taken from a production line was examined and 27 were found to be faulty.

Find an approximate $90 \%$ confidence interval for the proportion of faulty bars produced.
A change in the production method was introduced which, it was claimed, would reduce the proportion of faulty bars. After the change, each of a further random sample of 100 bars was examined and 8 were found to be faulty. Test the claim, at the $10 \%$ significance level.

OCR S3 2010 January Q6

12 marks Standard +0.3

6 The deterioration of a certain drug over time was investigated as follows. The drug strength was measured in each of a random sample of 8 bottles containing the drug. These were stored for two years and the strengths were then re-measured. The original and final strengths, in suitable units, are shown in the following table.

Bottle	1	2	3	4	5	6	7	8
Original strength	8.7	9.4	9.2	8.9	9.6	8.2	9.9	8.8
Final strength	8.1	9.0	9.0	8.8	9.3	8.0	9.5	8.5

Stating any required assumption, test at the $5 \%$ significance level whether the mean strength has decreased by more than 0.2 over the two years.
Calculate a 95\% confidence interval for the mean reduction in strength over the two years.
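Since each bottle is measured twice, the analysis works on the paired differences. The following sketch assumes those differences are Normally distributed and hard-codes the critical values of $t$ with 7 degrees of freedom from standard tables; the names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

original = [8.7, 9.4, 9.2, 8.9, 9.6, 8.2, 9.9, 8.8]
final = [8.1, 9.0, 9.0, 8.8, 9.3, 8.0, 9.5, 8.5]

reductions = [o - f for o, f in zip(original, final)]
n = len(reductions)
dbar = mean(reductions)
se = stdev(reductions) / sqrt(n)

T7_UPPER_5 = 1.895      # one-tailed 5% point, 7 d.f. (standard tables)
T7_UPPER_2_5 = 2.365    # used for the two-sided 95% interval

# H0: mean reduction = 0.2 against H1: mean reduction > 0.2.
t_stat = (dbar - 0.2) / se
reject_h0 = t_stat > T7_UPPER_5

ci_95 = (dbar - T7_UPPER_2_5 * se, dbar + T7_UPPER_2_5 * se)

print(f"mean reduction = {dbar:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"95% CI: ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
```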

OCR S3 2013 January Q3

7 marks Standard +0.3

3 Two reading schemes, $A$ and $B$, are compared by using them with a random sample of 9 five-year-old children. The children are divided into two groups, 5 allotted to scheme $A$ and 4 to scheme $B$, and the schemes are taught under similar conditions.
After one year the children are given the same test and their scores, $x _ { A }$ and $x _ { B }$, are summarised below. With the usual notation, $$\begin{aligned} & n _ { A } = 5 , \bar { x } _ { A } = 52.0 , \sum \left( x _ { A } - \bar { x } _ { A } \right) ^ { 2 } = 248 , \\ & n _ { B } = 4 , \bar { x } _ { B } = 56.5 , \sum \left( x _ { B } - \bar { x } _ { B } \right) ^ { 2 } = 381 . \end{aligned}$$ It may be assumed that scores have normal distributions.

Calculate an $80 \%$ confidence interval for the difference in population mean scores for the two methods.
State a further assumption required for the validity of the interval.

OCR S3 2013 January Q5

9 marks Standard +0.3

5 A constitutional change was proposed for a Golf Club with a large membership. This was to be voted on at the Annual General Meeting. A month before this meeting the secretary asked a random sample of 50 members for their opinions. Out of the 50 members $70 \%$ said they approved.

Calculate an approximate $90 \%$ confidence interval for the proportion $p$ of all members who would approve the proposal.
Explain what is meant by a $90 \%$ confidence interval in this context.
Nearer the date of the meeting the secretary asked a random sample of $n$ members, and, as before, $70 \%$ said they approved. This time the secretary calculated an approximate $99 \%$ confidence interval for $p$. It is given that the confidence interval does not include 0.85 . Find the smallest possible value of $n$.

OCR S3 2013 January Q7

11 marks Challenging +1.2

7 The random variable $X$ has distribution $\mathrm { N } ( \mu , 1 )$. A random sample of 4 observations of $X$ is taken. The sample mean is denoted by $\bar { X }$.

Find the value of the constant $a$ for which ( $\bar { X } - a , \bar { X } + a$ ) is a $98 \%$ confidence interval for $\mu$.
The independent random variable $Y$ has distribution $\mathrm { N } ( \mu , 9 )$. A random sample of 16 observations of $Y$ is taken. The sample mean is denoted by $\bar { Y }$.
Write down the distribution of $\bar { X } - \bar { Y }$.
A $90 \%$ confidence interval for $\mu$ based on $\bar { Y }$ is given by ( $\bar { Y } - 1.234 , \bar { Y } + 1.234$ ). Find the probability that this interval does not overlap with the interval in part (i).
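The reasoning for the last part can be sketched numerically. Two intervals centred at $\bar{X}$ and $\bar{Y}$ with half-widths $a$ and 1.234 fail to overlap exactly when $|\bar{X} - \bar{Y}| > a + 1.234$, and $\bar{X} - \bar{Y} \sim \mathrm{N}\left(0, \frac{1}{4} + \frac{9}{16}\right)$. The variable names below are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# (i) Half-width of the 98% interval: z_{0.99} times sd of X-bar = sqrt(1/4).
a = std_normal.inv_cdf(0.99) * sqrt(1 / 4)

# (ii) X-bar - Y-bar has mean 0 and variance 1/4 + 9/16.
sd_diff = sqrt(1 / 4 + 9 / 16)

# (iii) No overlap when the centres are further apart than the two half-widths.
half_width_y = 1.234
threshold = a + half_width_y
p_no_overlap = 2 * (1 - std_normal.cdf(threshold / sd_diff))

print(f"a = {a:.4f}")
print(f"P(no overlap) = {p_no_overlap:.4f}")
```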

OCR S3 2009 June Q3

7 marks Standard +0.3

3 A machine produces circular metal discs whose radii have a normal distribution with mean $\mu \mathrm { cm }$. A random sample of five discs is selected and their radii, in cm, are as follows. $$\begin{array} { l l l l l } 6.47 & 6.52 & 6.46 & 6.47 & 6.51 \end{array}$$

Calculate a $95 \%$ confidence interval for $\mu$.
Hence state a 95\% confidence interval for the mean circumference of a disc.

OCR S3 2009 June Q7

14 marks Standard +0.3

7 In 1761, James Short took measurements of the parallax of the sun based on the transit of Venus. The mean and standard deviation of a random sample of 50 of these measurements are 8.592 and 0.7534 respectively, in suitable units.

Show that if $X \sim \mathrm {~N} \left( 8.592,0.7534 ^ { 2 } \right)$, then $$\mathrm { P } ( X \leqslant 8.084 ) = \mathrm { P } ( 8.084 < X \leqslant 8.592 ) = \mathrm { P } ( 8.592 < X \leqslant 9.100 ) = \mathrm { P } ( X > 9.100 ) = 0.25 \text {. }$$
The following table summarises the 50 measurements using these intervals.
Measurement $( x )$	$x \leqslant 8.084$	$8.084 < x \leqslant 8.592$	$8.592 < x \leqslant 9.100$	$x > 9.100$
Frequency	8	22	11	9
Carry out a test, at the $\frac { 1 } { 2 } \%$ significance level, of whether a normal distribution fits the data.
Obtain a 99\% confidence interval for the mean of all similar parallax measurements.

OCR S3 2010 June Q4

8 marks Standard +0.3

4 Part of an ecological study involved measuring the heights of trees in a young forest. In order to obtain an estimate of the mean height of all the trees in the forest, a random sample of 70 trees was selected and their heights measured. These heights, $x$ metres, are summarised by $\Sigma x = 246.6$ and $\Sigma x ^ { 2 } = 1183.65$. The mean height of all trees in the forest is denoted by $\mu$ metres.

Calculate a symmetric $90 \%$ confidence interval for $\mu$.
A student was asked to interpret the interval and said, "If 100 independent $90 \%$ confidence intervals were calculated then 90 of them would contain $\mu$." Explain briefly what is wrong with this statement.
Four independent $90 \%$ confidence intervals for $\mu$ are obtained. Calculate the probability that at least three of the intervals contain $\mu$.
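The interval in the first part and the binomial probability in the last part can be checked with the sketch below. It assumes the sample of 70 is large enough for a Normal-based interval with the variance estimated from the data, and it treats each interval as containing $\mu$ independently with probability 0.9; the names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, sum_x, sum_x2 = 70, 246.6, 1183.65

xbar = sum_x / n
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
se = sqrt(s2 / n)

z = NormalDist().inv_cdf(0.95)               # two-sided 90% critical value
ci_90 = (xbar - z * se, xbar + z * se)

# Number of intervals (out of 4) containing mu is Binomial(4, 0.9).
p = 0.9
p_at_least_3 = sum(comb(4, k) * p ** k * (1 - p) ** (4 - k) for k in (3, 4))

print(f"90% CI: ({ci_90[0]:.3f}, {ci_90[1]:.3f})")
print(f"P(at least 3 of 4 contain mu) = {p_at_least_3:.4f}")
```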

OCR S3 2012 June Q1

7 marks Moderate -0.3

1 A machine fills packets of flour whose nominal weights are 500 g. Each of a random sample of 100 packets was weighed and 14 packets weighed less than 500 g. The population proportion of packets that weigh less than 500 g is denoted by $p$.

Calculate an approximate $95 \%$ confidence interval for $p$.
The weights of the packets, in grams, are normally distributed with mean $\mu$ and variance 50. Assuming that $p = 0.14$, calculate the value of $\mu$.

OCR S3 2012 June Q4

11 marks Standard +0.3

4 The time interval, $T$ minutes, between consecutive stoppages of a particular grinding machine is regularly measured. $T$ is normally distributed with mean $\mu$.
24 randomly chosen values of $T$ are summarised by $$\sum _ { i = 1 } ^ { 24 } t _ { i } = 348.0 \text { and } \sum _ { i = 1 } ^ { 24 } t _ { i } ^ { 2 } = 5195.5 .$$

Calculate a symmetric $95 \%$ confidence interval for $\mu$.
For the machine to be working acceptably, $\mu$ should be at least 15.0. Using a test at the 10\% significance level, decide whether the machine is working acceptably.

OCR S3 2013 June Q2

9 marks Standard +0.3

2 In order to estimate the total number of rabbits in a certain region, a random sample of 500 rabbits is captured, marked and released. After two days a random sample of 250 rabbits is captured and 24 are found to be marked. It may be assumed that there is no change in the population during the two days.

Estimate the total number of rabbits in the region.
Calculate an approximate $95 \%$ confidence interval for the population proportion of marked rabbits.
Using your answer to part (ii), estimate a 95\% confidence interval for the total number of rabbits in the region.
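The three parts chain together, as the sketch below shows. It assumes the proportion of marked rabbits in the second sample estimates $\frac{500}{N}$, where $N$ is the population size, so the limits for $N$ come from inverting the limits for the proportion; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

marked_total = 500                 # rabbits marked and released
sample_size, marked_in_sample = 250, 24

# (i) Point estimate: marked proportion in the sample estimates 500 / N.
p_hat = marked_in_sample / sample_size
n_estimate = marked_total / p_hat

# (ii) Approximate 95% confidence interval for the marked proportion.
z = NormalDist().inv_cdf(0.975)
se = sqrt(p_hat * (1 - p_hat) / sample_size)
ci_p = (p_hat - z * se, p_hat + z * se)

# (iii) Invert the limits: a larger proportion means a smaller population.
ci_n = (marked_total / ci_p[1], marked_total / ci_p[0])

print(f"estimated population: {n_estimate:.0f}")
print(f"95% CI for proportion: ({ci_p[0]:.4f}, {ci_p[1]:.4f})")
print(f"95% CI for population: ({ci_n[0]:.0f}, {ci_n[1]:.0f})")
```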


5.05d Confidence intervals: using normal distribution
