OCR S3 (Statistics 3) 2008 January

Question 1
1 A blueberry farmer increased the amount of water sprayed over his berries to see what effect this had on their weight. The farmer weighed each of a random sample of 80 berries of the previous season's crop and each of a random sample of 100 berries of the new crop. The results are summarised in the following table, in which \(\bar { x }\) denotes the sample mean weight in grams, and \(s ^ { 2 }\) denotes an unbiased estimate of the relevant population variance.
| | Sample size | \(\bar { x }\) | \(s ^ { 2 }\) |
| --- | --- | --- | --- |
| Previous season's crop \(( P )\) | 80 | 1.24 | 0.00356 |
| New crop \(( N )\) | 100 | 1.36 | 0.00340 |
  1. Calculate an estimate of \(\operatorname { Var } \left( \bar { X } _ { N } - \bar { X } _ { P } \right)\).
  2. Calculate a \(95 \%\) confidence interval for the difference in population mean weights.
  3. Give a reason why it is unnecessary to use a \(t\)-distribution in calculating the confidence interval.
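The arithmetic for parts 1 and 2 can be sketched in Python as follows. This is an illustrative sketch rather than a model answer: the variable names are hypothetical, and it assumes the large samples justify a normal approximation for the difference in sample means.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table (weights in grams).
n_p, mean_p, var_p = 80, 1.24, 0.00356    # previous season's crop (P)
n_n, mean_n, var_n = 100, 1.36, 0.00340   # new crop (N)

# Part 1: estimated variance of the difference in sample means.
var_diff = var_p / n_p + var_n / n_n

# Part 2: two-sided 95% interval using the normal critical value (about 1.96).
z = NormalDist().inv_cdf(0.975)
diff = mean_n - mean_p
half_width = z * sqrt(var_diff)
ci = (diff - half_width, diff + half_width)

print(f"Estimated variance of the difference: {var_diff:.7f}")
print(f"95% CI for the difference in means: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The variance estimate is the sum of the two per-sample terms \(s ^ { 2 } / n\), because the samples are independent.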
Question 2
2 The times taken for customers' phone complaints to be handled were monitored regularly by a company. During a particular week a researcher checked a random sample of 20 complaints and the times, \(x\) minutes, taken to handle the complaints are summarised by \(\Sigma x = 337.5\). Handling times may be assumed to have a normal distribution with mean \(\mu\) minutes and standard deviation 3.8 minutes.
  1. Calculate a \(98 \%\) confidence interval for \(\mu\).

During the same week two other researchers each calculated a \(98 \%\) confidence interval for \(\mu\) based on independent samples.

  2. Calculate the probability that at least one of the three intervals does not contain \(\mu\).
  3. State two ways in which the calculation in part (i) would differ if the standard deviation were unknown.
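A short Python sketch of the first two calculations is given here. It is only a sketch: the names are hypothetical, and it assumes the three intervals are independent, each containing \(\mu\) with probability 0.98.

```python
from math import sqrt
from statistics import NormalDist

n, total, sigma = 20, 337.5, 3.8   # sample size, sum of times, known s.d.
mean = total / n                   # sample mean handling time in minutes

# 98% two-sided interval: 1% in each tail, so use the 0.99 quantile.
z = NormalDist().inv_cdf(0.99)
half_width = z * sigma / sqrt(n)
ci = (mean - half_width, mean + half_width)

# Probability that at least one of three independent 98% intervals misses mu.
p_at_least_one_miss = 1 - 0.98 ** 3

print(f"98% CI for mu: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"P(at least one interval misses mu): {p_at_least_one_miss:.4f}")
```

The normal quantile is appropriate here only because the standard deviation is given as known.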
Question 3
3 A transport authority wished to compare the performance of two rail companies, Western and Northern. They noted that the number of 'on-time' arrivals for a random sample of 80 Western trains over a particular route was 71. The corresponding number for a random sample of 90 Northern trains over a similar route was 73.
  1. Test, at the \(5 \%\) significance level, whether the population proportion of on-time Western trains exceeds the population proportion of on-time Northern trains.
  2. Ranjit wishes to test whether the population proportion of on-time Western trains exceeds the population proportion of on-time Northern trains by more than 0.01. What variance estimate should she use?
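One way to check the working is the Python sketch below. It assumes the usual large-sample normal approximation, with a pooled proportion for part 1 (where the null hypothesis is equal proportions) and an unpooled variance for part 2 (where the null hypothesis no longer makes the proportions equal). All names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

x_w, n_w = 71, 80   # on-time Western trains, sample size
x_n, n_n = 73, 90   # on-time Northern trains, sample size

p_w, p_n = x_w / n_w, x_n / n_n

# Part 1: pooled estimate of the common proportion under H0: p_W = p_N.
p_pool = (x_w + x_n) / (n_w + n_n)
var_pooled = p_pool * (1 - p_pool) * (1 / n_w + 1 / n_n)
z_stat = (p_w - p_n) / sqrt(var_pooled)

# One-tailed 5% critical value (about 1.645).
z_crit = NormalDist().inv_cdf(0.95)
reject_h0 = z_stat > z_crit

# Part 2: the proportions differ under H0, so the variance is not pooled.
var_unpooled = p_w * (1 - p_w) / n_w + p_n * (1 - p_n) / n_n

print(f"z = {z_stat:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
print(f"Unpooled variance estimate: {var_unpooled:.5f}")
```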
Question 4
4 Eezimix flour is sold in small bags of weight \(S\) grams, where \(S \sim \mathrm { N } \left( 502.1,0.31 ^ { 2 } \right)\). It is also sold in large bags of weight \(L\) grams, where \(L \sim \mathrm { N } \left( 1004.9,0.58 ^ { 2 } \right)\).
  1. Find the probability that a randomly chosen large bag weighs at least 1 gram more than two randomly chosen small bags.
  2. Find the probability that a randomly chosen large bag weighs less than twice the weight of a randomly chosen small bag.
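The two parts differ only in the variance of the combination: \(S _ { 1 } + S _ { 2 }\) has variance \(2 \times 0.31 ^ { 2 }\), whereas \(2 S\) has variance \(4 \times 0.31 ^ { 2 }\). A Python sketch of both calculations follows, assuming all bag weights are independent. The names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu_s, sd_s = 502.1, 0.31     # small bag: mean and s.d. in grams
mu_l, sd_l = 1004.9, 0.58    # large bag: mean and s.d. in grams

# Part 1: D1 = L - (S1 + S2), two independent small bags.
d1 = NormalDist(mu_l - 2 * mu_s, sqrt(sd_l**2 + 2 * sd_s**2))
p_part1 = 1 - d1.cdf(1)      # P(L - (S1 + S2) >= 1)

# Part 2: D2 = L - 2S, a single small bag doubled.
d2 = NormalDist(mu_l - 2 * mu_s, sqrt(sd_l**2 + 4 * sd_s**2))
p_part2 = d2.cdf(0)          # P(L - 2S < 0)

print(f"Part 1 probability: {p_part1:.3f}")
print(f"Part 2 probability: {p_part2:.3f}")
```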
Question 5
5 Of two brands of lawnmower, \(A\) and \(B\), brand \(A\) was claimed to take less time, on average, than brand \(B\) to mow similar stretches of lawn. In order to test this claim, 9 randomly selected gardeners were each given the task of mowing two regions of lawn, one with each brand of mower. All the regions had the same size and shape and had grass of the same height. The times taken, in seconds, are given in the table.
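| Gardener | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Brand \(A\) | 412 | 386 | 389 | 401 | 396 | 394 | 397 | 411 | 391 |
| Brand \(B\) | 422 | 394 | 385 | 408 | 394 | 399 | 397 | 410 | 397 |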
  1. Test the claim using a paired-sample \(t\)-test at the \(5 \%\) significance level. State a distributional assumption required for the test to be valid.
  2. Give a reason why a paired-sample \(t\)-test should be used, rather than a 2-sample \(t\)-test, in this case.
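The test statistic for part 1 can be computed with the Python sketch below. It assumes the differences \(B - A\) are normally distributed and takes the one-tailed \(5 \%\) critical value for 8 degrees of freedom, 1.860, from standard tables rather than computing it. The names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

brand_a = [412, 386, 389, 401, 396, 394, 397, 411, 391]
brand_b = [422, 394, 385, 408, 394, 399, 397, 410, 397]

# Differences B - A for each gardener; the claim is that the mean is positive.
diffs = [b - a for a, b in zip(brand_a, brand_b)]
n = len(diffs)

d_bar = mean(diffs)
s_d = stdev(diffs)                  # unbiased (n - 1) sample standard deviation
t_stat = d_bar / (s_d / sqrt(n))

T_CRIT = 1.860                      # tabulated value: t, 8 d.f., one-tailed 5%
reject_h0 = t_stat > T_CRIT

print(f"mean difference = {d_bar:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```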
Question 6
6 The Research and Development department of a paint manufacturer has produced paint of three different shades of grey, \(G _ { 1 } , G _ { 2 }\) and \(G _ { 3 }\). In order to find the reaction of the public to these shades, each of a random sample of 120 people was asked to state which shade they preferred. The results, classified by gender, are shown in Table 1.

Table 1

| Gender | Shade \(G _ { 1 }\) | Shade \(G _ { 2 }\) | Shade \(G _ { 3 }\) |
| --- | --- | --- | --- |
| Male | 11 | 24 | 23 |
| Female | 18 | 13 | 31 |

Table 2 shows the corresponding expected values, correct to 2 decimal places, for a test of independence.

Table 2

| Gender | Shade \(G _ { 1 }\) | Shade \(G _ { 2 }\) | Shade \(G _ { 3 }\) |
| --- | --- | --- | --- |
| Male | 14.02 | 17.88 | 26.10 |
| Female | 14.98 | 19.12 | 27.90 |

  1. Show how the value 17.88 for Male, \(G _ { 2 }\) was obtained.
  2. Test, at the \(5 \%\) significance level, whether gender and preferred shade are independent.
  3. Determine the smallest significance level obtained from tables or calculator for which there is evidence that not all shades are equally preferred by people in general, irrespective of gender.
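The sketch below reproduces the expected values and both chi-squared statistics in Python. It relies on the fact that for 2 degrees of freedom the upper-tail probability is \(e ^ { - x / 2 }\), so no statistics library is needed, and it takes the \(5 \%\) critical value 5.991 from standard tables. The names are hypothetical, and unrounded expected values are used, so the statistic may differ slightly from one based on Table 2.

```python
from math import exp

observed = {"Male": [11, 24, 23], "Female": [18, 13, 31]}

row_totals = {g: sum(counts) for g, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# Part 1: expected count = row total * column total / grand total.
expected = {
    g: [row_totals[g] * c / grand_total for c in col_totals] for g in observed
}

# Part 2: test of independence with (2 - 1)(3 - 1) = 2 degrees of freedom.
chi2_indep = sum(
    (o - e) ** 2 / e
    for g in observed
    for o, e in zip(observed[g], expected[g])
)
p_indep = exp(-chi2_indep / 2)      # exact upper tail for 2 d.f.
CHI2_CRIT_5 = 5.991                 # tabulated 5% value, 2 d.f.
reject_indep = chi2_indep > CHI2_CRIT_5

# Part 3: goodness of fit to equal preference, ignoring gender (2 d.f.).
equal_expected = grand_total / len(col_totals)
chi2_equal = sum((o - equal_expected) ** 2 / equal_expected for o in col_totals)
p_equal = exp(-chi2_equal / 2)

print(f"Expected Male, G2: {expected['Male'][1]:.2f}")
print(f"Independence: chi2 = {chi2_indep:.3f}, p = {p_indep:.4f}")
print(f"Equal preference: chi2 = {chi2_equal:.2f}, p = {p_equal:.4f}")
```

For part 3, tables give the answer as the smallest tabulated level whose critical value is below the statistic, while a calculator gives the p-value directly.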
Question 7
7 The continuous random variable \(T\) has probability density function given by
$$f ( t ) = \begin{cases} 4 t ^ { 3 } & 0 < t \leqslant 1 , \\ 0 & \text { otherwise. } \end{cases}$$
  1. Obtain the cumulative distribution function of \(T\).
  2. Find the cumulative distribution function of \(H\), where \(H = \frac { 1 } { T ^ { 4 } }\), and hence show that the probability density function of \(H\) is given by \(\mathrm { g } ( h ) = \frac { 1 } { h ^ { 2 } }\) over an interval to be stated.
  3. Find \(\mathrm { E } \left( 1 + 2 H ^ { - 1 } \right)\).
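One possible line of working for the three parts is sketched below. It is an outline under the definitions above, not a model answer, and \(F\) and \(G\) are names introduced here for the cumulative distribution functions of \(T\) and \(H\).

$$F ( t ) = \int _ { 0 } ^ { t } 4 u ^ { 3 } \, \mathrm { d } u = t ^ { 4 } \quad ( 0 < t \leqslant 1 ) , \qquad F ( t ) = 0 \ ( t \leqslant 0 ) , \qquad F ( t ) = 1 \ ( t > 1 )$$

$$G ( h ) = \mathrm { P } ( H \leqslant h ) = \mathrm { P } \left( T ^ { 4 } \geqslant \frac { 1 } { h } \right) = 1 - F \left( h ^ { - 1 / 4 } \right) = 1 - \frac { 1 } { h } \quad ( h \geqslant 1 )$$

$$\mathrm { g } ( h ) = \frac { \mathrm { d } } { \mathrm { d } h } \left( 1 - \frac { 1 } { h } \right) = \frac { 1 } { h ^ { 2 } } \quad ( h \geqslant 1 )$$

$$\mathrm { E } \left( 1 + 2 H ^ { - 1 } \right) = 1 + 2 \mathrm { E } \left( T ^ { 4 } \right) = 1 + 2 \int _ { 0 } ^ { 1 } t ^ { 4 } \cdot 4 t ^ { 3 } \, \mathrm { d } t = 1 + 2 \times \frac { 1 } { 2 } = 2$$

The last step uses \(H ^ { - 1 } = T ^ { 4 }\), which avoids integrating against \(\mathrm { g } ( h )\) directly.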