Edexcel S4 (Statistics 4)

Question 1 13 marks
A random sample \(X_1, X_2, ..., X_{10}\) is taken from a population with mean \(\mu\) and variance \(\sigma^2\).
  1. Determine the bias, if any, of each of the following estimators of \(\mu\). $$\theta_1 = \frac{X_1 + X_4 + X_5}{3}$$ $$\theta_2 = \frac{X_{10} - X_1}{3}$$ $$\theta_3 = \frac{3X_1 + 2X_5 + X_{10}}{6}$$ [4]
  2. Find the variance of each of these estimators. [5]
  3. State, giving reasons, which of these three estimators for \(\mu\) is
    1. the best estimator,
    2. the worst estimator.
    [4]
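As a rough check on this kind of question, the expectation and variance of a linear estimator \(\sum c_i X_i\) are \((\sum c_i)\mu\) and \((\sum c_i^2)\sigma^2\), assuming the sample values are independent. The sketch below applies that rule with exact fractions; the helper name `moment_coefficients` is made up for illustration.

```python
from fractions import Fraction as F

def moment_coefficients(coeffs):
    """Return (a, b) such that E[estimator] = a*mu and Var[estimator] = b*sigma^2.

    Assumes the X_i are independent with common mean mu and variance sigma^2.
    """
    return sum(coeffs), sum(c * c for c in coeffs)

# Coefficients attached to the sample values in each estimator.
estimators = {
    "theta_1": [F(1, 3), F(1, 3), F(1, 3)],   # (X1 + X4 + X5) / 3
    "theta_2": [F(-1, 3), F(1, 3)],           # (X10 - X1) / 3
    "theta_3": [F(3, 6), F(2, 6), F(1, 6)],   # (3X1 + 2X5 + X10) / 6
}

results = {name: moment_coefficients(c) for name, c in estimators.items()}
for name, (a, b) in results.items():
    # Bias is E[estimator] - mu, i.e. (a - 1) * mu.
    print(f"{name}: E = {a} mu, bias = {a - 1} mu, Var = {b} sigma^2")
```

An unbiased estimator has coefficient sum 1, and among unbiased estimators the one with the smaller variance coefficient is usually preferred.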
Question 1 8 marks
A company manufactures bolts with a mean diameter of 5 mm. The company wishes to check that the diameter of the bolts has not decreased. A random sample of 10 bolts is taken and the diameters, \(x\) mm, of the bolts are measured. The results are summarised below. $$\sum x = 49.1 \quad \sum x^2 = 241.2$$ Using a 1\% level of significance, test whether or not the mean diameter of the bolts is less than 5 mm. (You may assume that the diameter of the bolts follows a normal distribution.) [8]
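A minimal sketch of how the test statistic could be obtained from the summary sums, assuming the usual one-sample \(t\) statistic \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\). The function name `one_sample_t` is illustrative, and the critical value is taken from statistical tables rather than computed.

```python
import math

def one_sample_t(n, sum_x, sum_x2, mu0):
    """Sample mean, unbiased variance estimate and t statistic from summary sums."""
    mean = sum_x / n
    s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    t = (mean - mu0) / math.sqrt(s2 / n)
    return mean, s2, t

mean, s2, t = one_sample_t(n=10, sum_x=49.1, sum_x2=241.2, mu0=5.0)

# Assumption: one-tailed 1% point of t with 9 degrees of freedom, read from tables.
T_CRIT = 2.821
reject_h0 = t < -T_CRIT

print(f"mean = {mean:.3f}, s^2 = {s2:.5f}, t = {t:.3f}, reject H0: {reject_h0}")
```

The test is one-tailed (H\(_0\): \(\mu = 5\), H\(_1\): \(\mu < 5\)), so only the lower tail is compared with the critical value.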
Question 1 13 marks
A teacher wishes to test whether playing background music enables students to complete a task more quickly. The same task was completed by 15 students, divided at random into two groups. The first group had background music playing during the task and the second group had no background music playing. The times taken, in minutes, to complete the task are summarised below.

| | Sample size \(n\) | Standard deviation \(s\) | Mean \(\bar{x}\) |
| --- | --- | --- | --- |
| With background music | 8 | 4.1 | 15.9 |
| Without background music | 7 | 5.2 | 17.9 |

You may assume that the times taken to complete the task by the students are two independent random samples from normal distributions.
  1. Stating your hypotheses clearly, test, at the 10\% level of significance, whether or not the variances of the times taken to complete the task with and without background music are equal. [5]
  2. Find a 99\% confidence interval for the difference in the mean times taken to complete the task with and without background music. [7]
Experiments like this are often performed using the same people in each group.
  3. Explain why this would not be appropriate in this case. [1]
Question 1 2 marks
Find the value of the constant \(a\) such that $$\text{P}(a < F_{8,10} < 3.07) = 0.94$$ [2]
Question 2 17 marks
A large number of students are split into two groups \(A\) and \(B\). The students sit the same test but under different conditions. Group \(A\) has music playing in the room during the test, and group \(B\) has no music playing during the test. Small samples are then taken from each group and their marks recorded. The marks are normally distributed. The marks are as follows:

Sample from Group \(A\): 42, 40, 35, 37, 34, 43, 42, 44, 49

Sample from Group \(B\): 40, 44, 38, 47, 38, 37, 33

  1. Stating your hypotheses clearly, and using a 10\% level of significance, test whether or not there is evidence of a difference between the variances of the marks of the two groups. [8]
  2. State clearly an assumption you have made to enable you to carry out the test in part (a). [1]
  3. Use a two-tailed test, with a 5\% level of significance, to determine if the playing of music during the test has made any difference in the mean marks of the two groups. State your hypotheses clearly. [7]
  4. Write down what you can conclude about the effect of music on a student's performance during the test. [1]
Question 2 12 marks
An emission-control device is tested to see if it reduces CO\(_2\) emissions from cars. The emissions from 6 randomly selected cars are measured with and without the device. The results are as follows.
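
| Car | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) |
| --- | --- | --- | --- | --- | --- | --- |
| Emissions without device | 151.4 | 164.3 | 168.5 | 148.2 | 139.4 | 151.2 |
| Emissions with device | 148.9 | 162.7 | 166.9 | 150.1 | 140.0 | 146.7 |
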
  1. State an assumption that needs to be made in order to carry out a \(t\)-test in this case. [1]
  2. State why a paired \(t\)-test is suitable for use with these data. [1]
  3. Using a 5\% level of significance, test whether or not there is evidence that the device reduces CO\(_2\) emissions from cars. [8]
  4. Explain, in context, what a type II error would be in this case. [2]
Question 2 9 marks
As part of an investigation, a random sample of 10 people had their heart rate, in beats per minute, measured whilst standing up and whilst lying down. The results are summarised below.
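
| Person | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Heart rate lying down | 66 | 70 | 59 | 65 | 72 | 66 | 62 | 69 | 56 | 68 |
| Heart rate standing up | 75 | 76 | 63 | 67 | 80 | 75 | 65 | 74 | 63 | 75 |
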
  1. State one assumption that needs to be made in order to carry out a paired \(t\)-test. [1]
  2. Test, at the 5\% level of significance, whether or not there is any evidence that standing up increases people's mean heart rate by more than 5 beats per minute. State your hypotheses clearly. [8]
Question 2 5 marks
Two independent random samples \(X_1, X_2, ..., X_n\) and \(Y_1, Y_2, Y_3, Y_4\) were taken from different normal populations with a common standard deviation \(\sigma\). The following sample statistics were calculated. $$s_x = 14.67 \quad s_y = 12.07$$ Find the 99\% confidence interval for \(\sigma^2\) based on these two samples. [5]
Question 3 8 marks
The weights, in grams, of mice are normally distributed. A biologist takes a random sample of 10 mice. She weighs each mouse and records its weight. The ten mice are then fed on a special diet. They are weighed again after two weeks. Their weights in grams are as follows:
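
| Mouse | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) | \(I\) | \(J\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Weight before diet | 50.0 | 48.3 | 47.5 | 54.0 | 38.9 | 42.7 | 50.1 | 46.8 | 40.3 | 41.2 |
| Weight after diet | 52.1 | 47.6 | 50.1 | 52.3 | 42.2 | 44.3 | 51.8 | 48.0 | 41.9 | 43.6 |
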
Stating your hypotheses clearly, and using a 1\% level of significance, test whether or not the diet causes an increase in the mean weight of the mice. [8]
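One way to sketch the calculation is as a paired \(t\)-test on the differences (after minus before), assuming those differences are normally distributed. The helper `paired_t` and its `mu_d0` parameter are illustrative names, and the critical value is a table value rather than something computed here.

```python
import math
import statistics

before = [50.0, 48.3, 47.5, 54.0, 38.9, 42.7, 50.1, 46.8, 40.3, 41.2]
after = [52.1, 47.6, 50.1, 52.3, 42.2, 44.3, 51.8, 48.0, 41.9, 43.6]

def paired_t(first, second, mu_d0=0.0):
    """Mean difference, its sample standard deviation and the paired t statistic."""
    diffs = [b - a for a, b in zip(first, second)]
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # uses the n - 1 divisor
    t = (d_bar - mu_d0) / (s_d / math.sqrt(len(diffs)))
    return d_bar, s_d, t

d_bar, s_d, t = paired_t(before, after)

# Assumption: one-tailed 1% point of t with 9 degrees of freedom, read from tables.
T_CRIT = 2.821
reject_h0 = t > T_CRIT

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t:.3f}, reject H0: {reject_h0}")
```

The `mu_d0` argument allows a non-zero hypothesised mean difference, which is what a question asking about an increase of "more than 5" would need.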
Question 3 12 marks
Define, in terms of H\(_0\) and/or H\(_1\),
  1. the size of a hypothesis test, [1]
  2. the power of a hypothesis test. [1]
The probability of getting a head when a coin is tossed is denoted by \(p\). This coin is tossed 12 times in order to test the hypotheses H\(_0\): \(p = 0.5\) against H\(_1\): \(p \neq 0.5\), using a 5\% level of significance.
  3. Find the largest critical region for this test, such that the probability in each tail is less than 2.5\%. [4]
  4. Given that \(p = 0.4\)
    1. find the probability of a type II error when using this test,
    2. find the power of this test.
    [4]
  5. Suggest two ways in which the power of the test can be increased. [2]
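The critical region and error probabilities for the coin test can be explored numerically. The sketch below assumes \(X \sim \text{B}(12, p)\) and searches for the widest tails whose probability under H\(_0\) stays below 2.5\% each; the names `c_low`, `c_high` and `power` are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p); returns 0 when k < 0."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

N, P0, TAIL = 12, 0.5, 0.025

# Largest lower cut-off with P(X <= c_low) < 2.5% under H0.
c_low = max(c for c in range(N + 1) if binom_cdf(c, N, P0) < TAIL)
# Smallest upper cut-off with P(X >= c_high) < 2.5% under H0.
c_high = min(c for c in range(N + 1) if 1 - binom_cdf(c - 1, N, P0) < TAIL)

def power(p):
    """P(X falls in the critical region) when the true probability of a head is p."""
    return binom_cdf(c_low, N, p) + 1 - binom_cdf(c_high - 1, N, p)

size = power(P0)            # actual significance level of the test
type_ii = 1 - power(0.4)    # P(accept H0 | p = 0.4)

print(f"critical region: X <= {c_low} or X >= {c_high}")
print(f"size = {size:.4f}, P(type II | p = 0.4) = {type_ii:.4f}, power = {1 - type_ii:.4f}")
```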
Question 3 12 marks
A manager in a sweet factory believes that the machines are working incorrectly and the proportion \(p\) of underweight bags of sweets is more than 5\%. He decides to test this by randomly selecting a sample of 5 bags and recording the number \(X\) that are underweight. The manager sets up the hypotheses H\(_0\): \(p = 0.05\) and H\(_1\): \(p > 0.05\) and rejects the null hypothesis if \(x > 1\).
  1. Find the size of the test. [2]
  2. Show that the power function of the test is $$1 - (1-p)^4(1+4p)$$ [3]
The manager goes on holiday and his deputy checks the production by randomly selecting a sample of 10 bags of sweets. He rejects the hypothesis that \(p = 0.05\) if more than 2 underweight bags are found in the sample.
  3. Find the probability of a Type I error using the deputy's test. [2]
The table below gives some values, to 2 decimal places, of the power function for the deputy's test.

| \(p\) | 0.10 | 0.15 | 0.20 | 0.25 |
| --- | --- | --- | --- | --- |
| Power | 0.07 | \(s\) | 0.32 | 0.47 |

  4. Find the value of \(s\). [1]
The graph of the power function for the manager's test is shown in Figure 1.
[Figure 1]
  5. On the same axes, draw the graph of the power function for the deputy's test. [1]
  6. (i) State the value of \(p\) where these graphs intersect. (ii) Compare the effectiveness of the two tests if \(p\) is greater than this value. [2]
The deputy suggests that they should use his sampling method rather than the manager's.
  7. Give a reason why the manager might not agree to this change. [1]
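Both tests in this question reject H\(_0\) when a binomial count exceeds a cut-off, so their size and power can be sketched with one helper. The code assumes \(X \sim \text{B}(n, p)\); the function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def power(n, c, p):
    """P(reject H0) when H0 is rejected for X > c and X ~ B(n, p)."""
    return 1 - binom_cdf(c, n, p)

def manager_closed_form(p):
    """Power function quoted in the question for the manager's test."""
    return 1 - (1 - p)**4 * (1 + 4 * p)

manager_size = power(5, 1, 0.05)    # manager: n = 5, reject if X > 1
deputy_size = power(10, 2, 0.05)    # deputy: n = 10, reject if X > 2

# Deputy's power function at the tabulated values of p, to 2 decimal places.
deputy_table = {p: round(power(10, 2, p), 2) for p in (0.10, 0.15, 0.20, 0.25)}

print(f"manager size = {manager_size:.4f}, deputy size = {deputy_size:.4f}")
print(deputy_table)
```

Evaluating `power(5, 1, p)` and `power(10, 2, p)` over a grid of \(p\) values gives points for sketching the two power curves on the same axes.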
Question 3 8 marks
Manuel is planning to buy a new machine to squeeze oranges in his cafe and he has two models, at the same price, on trial. The manufacturers of machine B claim that their machine produces more juice from an orange than machine A. To test this claim Manuel takes a random sample of 8 oranges, cuts them in half and puts one half in machine A and the other half in machine B. The amount of juice, in ml, produced by each machine is given in the table below.
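
| Orange | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Machine A | 60 | 58 | 55 | 53 | 52 | 51 | 54 | 56 |
| Machine B | 61 | 60 | 58 | 52 | 55 | 50 | 52 | 58 |
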
Stating your hypotheses clearly, test, at the 10\% level of significance, whether or not the mean amount of juice produced by machine B is more than the mean amount produced by machine A. [8]
Question 4 9 marks
A town council is concerned that the mean price of renting two bedroom flats in the town has exceeded £650 per month. A random sample of eight two bedroom flats gave the following results, £\(x\), per month. 705, 640, 560, 680, 800, 620, 580, 760 [You may assume \(\sum x = 5345\) and \(\sum x^2 = 3621025\)]
  1. Find a 90\% confidence interval for the mean price of renting a two bedroom flat. [6]
  2. State an assumption that is required for the validity of your interval in part (a). [1]
  3. Comment on whether or not the town council is justified in being concerned. Give a reason for your answer. [2]
Question 4 14 marks
A farmer set up a trial to assess whether adding water to dry feed increases the milk yield of his cows. He randomly selected 22 cows. Thirteen of the cows were given dry feed and the other 9 cows were given the feed with water added. The milk yields, in litres per day, were recorded with the following results.

| | Sample size | Mean | \(s^2\) |
| --- | --- | --- | --- |
| Dry feed | 13 | 25.54 | 2.45 |
| Feed with water added | 9 | 27.94 | 1.02 |

You may assume that the milk yield from cows given the dry feed and the milk yield from cows given the feed with water added are from independent normal distributions.
  1. Test, at the 10\% level of significance, whether or not the variances of the populations from which the samples are drawn are the same. State your hypotheses clearly. [5]
  2. Calculate a 95\% confidence interval for the difference between the two mean milk yields. [7]
  3. Explain the importance of the test in part (a) to the calculation in part (b). [2]
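A sketch of the arithmetic behind the variance-ratio test and the pooled confidence interval, assuming equal population variances for the interval. The critical values are table values entered by hand, not computed, and the variable names are illustrative.

```python
import math

n1, mean1, var1 = 13, 25.54, 2.45   # dry feed
n2, mean2, var2 = 9, 27.94, 1.02    # feed with water added

# Variance ratio, larger estimate over smaller, compared with F(12, 8).
# Assumption: the upper 5% point of F(12, 8) is about 3.28 (from tables).
f_ratio = var1 / var2

# Pooled estimate of the common variance, on n1 + n2 - 2 = 20 degrees of freedom.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Assumption: two-tailed 5% point of t with 20 degrees of freedom, from tables.
T_CRIT = 2.086
diff = mean2 - mean1
ci = (diff - T_CRIT * std_err, diff + T_CRIT * std_err)

print(f"F = {f_ratio:.3f}, pooled s^2 = {pooled_var:.3f}")
print(f"95% CI for the difference in means: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The pooled estimate is only justified if the variance test gives no evidence against equal variances, which is why the two parts are linked.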
Question 4 16 marks
A random sample of 15 strawberries is taken from a large field and the weight \(x\) grams of each strawberry is recorded. The results are summarised below. $$\sum x = 291 \quad \sum x^2 = 5968$$ Assume that the weights of strawberries are normally distributed. Calculate a 95\% confidence interval for
  1. (i) the mean of the weights of the strawberries in the field, (ii) the variance of the weights of the strawberries in the field. [12]
Strawberries weighing more than 23g are considered to be less tasty.
  2. Use appropriate confidence limits from part (a) to find the highest estimate of the proportion of strawberries that are considered to be less tasty. [4]
Question 4 12 marks
A proportion \(p\) of letters sent by a company are incorrectly addressed and if \(p\) is thought to be greater than 0.05 then action is taken. Using H\(_0\): \(p = 0.05\) and H\(_1\): \(p > 0.05\), a manager from the company takes a random sample of 40 letters and rejects H\(_0\) if the number of incorrectly addressed letters is more than 3.
  1. Find the size of this test. [2]
  2. Find the probability of a Type II error in the case where \(p\) is in fact 0.10 [2]
Table 1 below gives some values, to 2 decimal places, of the power function of this test.
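
| \(p\) | 0.075 | 0.100 | 0.125 | 0.150 | 0.175 | 0.200 | 0.225 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Power | 0.35 | \(s\) | 0.75 | 0.87 | 0.94 | 0.97 | 0.99 |

Table 1
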
  3. Write down the value of \(s\). [1]
A visiting consultant uses an alternative system to test the same hypotheses. A sample of 15 letters is taken. If these are all correctly addressed then H\(_0\) is accepted. If 2 or more are found to have been incorrectly addressed then H\(_0\) is rejected. If only one is found to be incorrectly addressed then a further random sample of 15 is taken and H\(_0\) is rejected if 2 or more are found to have been incorrectly addressed in this second sample, otherwise H\(_0\) is accepted.
  4. Find the size of the test used by the consultant. [3]
[Figure 1]
  5. On Figure 1 draw the graph of the power function of the manager's test. [2]
  6. State, giving your reasons, which test you would recommend. [2]
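The two procedures in this question can be compared numerically. The sketch below assumes the number of incorrectly addressed letters in a sample is binomial and that the consultant's second sample is independent of the first; the function names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def manager_power(p):
    """Manager: one sample of 40, reject H0 if more than 3 are incorrect."""
    return 1 - binom_cdf(3, 40, p)

def consultant_power(p):
    """Consultant: sample of 15; reject on 2 or more, resample once on exactly 1."""
    reject_on_sample = 1 - binom_cdf(1, 15, p)   # P(2 or more in a sample of 15)
    exactly_one = binom_pmf(1, 15, p)            # triggers the second sample
    return reject_on_sample + exactly_one * reject_on_sample

manager_size = manager_power(0.05)
manager_type_ii = 1 - manager_power(0.10)
consultant_size = consultant_power(0.05)

print(f"manager: size = {manager_size:.4f}, P(type II | p = 0.10) = {manager_type_ii:.4f}")
print(f"consultant: size = {consultant_size:.4f}")
```

Evaluating `manager_power` over the \(p\) values in Table 1 gives the points needed to plot its power function.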
Question 5 8 marks
A machine is filling bottles of milk. A random sample of 16 bottles was taken and the volume of milk in each bottle was measured and recorded. The volume of milk in a bottle is normally distributed and the unbiased estimate of the variance, \(s^2\), of the volume of milk in a bottle is 0.003.
  1. Find a 95\% confidence interval for the variance of the population of volumes of milk from which the sample was taken. [5]
The machine should fill bottles so that the standard deviation of the volumes is equal to 0.07.
  2. Comment on this with reference to your 95\% confidence interval. [3]
Question 5 14 marks
A machine fills jars with jam. The weight of jam in each jar is normally distributed. To check the machine is working properly the contents of a random sample of 15 jars are weighed in grams. Unbiased estimates of the mean and variance are obtained as $$\hat{\mu} = 560 \quad s^2 = 25.2$$ Calculate a 95\% confidence interval for
  1. the mean weight of jam, [4]
  2. the variance of the weight of jam. [5]
A weight of more than 565g is regarded as too high and suggests the machine is not working properly.
  3. Use appropriate confidence limits from parts (a) and (b) to find the highest estimate of the proportion of jars that weigh too much. [5]
Question 5 11 marks
A car manufacturer claims that, on a motorway, the mean number of miles per gallon for the Panther car is more than 70. To test this claim a car magazine measures the number of miles per gallon, \(x\), of each of a random sample of 20 Panther cars and obtained the following statistics. $$\bar{x} = 71.2 \quad s = 3.4$$ The number of miles per gallon may be assumed to be normally distributed.
  1. Stating your hypotheses clearly and using a 5\% level of significance, test the manufacturer's claim. [5]
The standard deviation of the number of miles per gallon for the Tiger car is 4.
  2. Stating your hypotheses clearly, test, at the 5\% level of significance, whether or not there is evidence that the variance of the number of miles per gallon for the Panther car is different from that of the Tiger car. [6]
Question 5 14 marks
The weights of the contents of breakfast cereal boxes are normally distributed. A manufacturer changes the style of the boxes but claims that the weight of the contents remains the same. A random sample of 6 old style boxes had contents with the following weights (in grams). 512, 503, 514, 506, 509, 515 The weights, \(y\) grams, of the contents of an independent random sample of 5 new style boxes gave $$\bar{y} = 504.8 \text{ and } s_y = 3.420$$
  1. Use a two-tail test to show, at the 10\% level of significance, that the variances of the weights of the contents of the old and new style boxes can be assumed to be equal. State your hypotheses clearly. [5]
  2. Showing your working clearly, find a 90\% confidence interval for \(\mu_x - \mu_y\) where \(\mu_x\) and \(\mu_y\) are the mean weights of the contents of old and new style boxes respectively. [7]
  3. With reference to your confidence interval comment on the manufacturer's claim. [2]
Question 6 12 marks
A drug is claimed to produce a cure to a certain disease in 35\% of people who have the disease. To test this claim a sample of 20 people having this disease is chosen at random and given the drug. If the number of people cured is between 4 and 10 inclusive the claim will be accepted. Otherwise the claim will not be accepted.
  1. Write down suitable hypotheses to carry out this test. [2]
  2. Find the probability of making a Type I error. [3]
The table below gives the value of the probability of the Type II error, to 4 decimal places, for different values of \(p\), where \(p\) is the probability of the drug curing a person with the disease.

| P(cure) | 0.2 | 0.3 | 0.4 | 0.5 |
| --- | --- | --- | --- | --- |
| P(Type II error) | 0.5880 | \(r\) | 0.8565 | \(s\) |

  3. Calculate the value of \(r\) and the value of \(s\). [3]
  4. Calculate the power of the test for \(p = 0.2\) and \(p = 0.4\) [2]
  5. Comment, giving your reasons, on the suitability of this test procedure. [2]
Question 6 15 marks
A continuous uniform distribution on the interval \([0, k]\) has mean \(\frac{k}{2}\) and variance \(\frac{k^2}{12}\). A random sample of three independent variables \(X_1\), \(X_2\) and \(X_3\) is taken from this distribution.
  1. Show that \(\frac{2}{3}X_1 + \frac{1}{2}X_2 + \frac{5}{6}X_3\) is an unbiased estimator for \(k\). [3]
An unbiased estimator for \(k\) is given by \(\hat{k} = aX_1 + bX_2\) where \(a\) and \(b\) are constants.
  2. Show that Var(\(\hat{k}\)) = \((a^2 - 2a + 2) \frac{k^2}{6}\) [6]
  3. Hence determine the value of \(a\) and the value of \(b\) for which \(\hat{k}\) has minimum variance, and calculate this minimum variance. [6]
Question 6 14 marks
Faults occur in a roll of material at a rate of \(\lambda\) per m\(^2\). To estimate \(\lambda\), three pieces of material of sizes 3 m\(^2\), 7 m\(^2\) and 10 m\(^2\) are selected and the number of faults \(X_1\), \(X_2\) and \(X_3\) respectively are recorded. The estimator \(\hat{\lambda}\), where $$\hat{\lambda} = k(X_1 + X_2 + X_3)$$ is an unbiased estimator of \(\lambda\).
  1. Write down the distributions of \(X_1\), \(X_2\) and \(X_3\) and find the value of \(k\). [4]
  2. Find Var(\(\hat{\lambda}\)). [3]
A random sample of \(n\) pieces of this material, each of size 4 m\(^2\), was taken. The number of faults on each piece, \(Y\), was recorded.
  3. Show that \(\frac{1}{4}\bar{Y}\) is an unbiased estimator of \(\lambda\). [2]
  4. Find Var(\(\frac{1}{4}\bar{Y}\)). [3]
  5. Find the minimum value of \(n\) for which \(\frac{1}{4}\bar{Y}\) becomes a better estimator of \(\lambda\) than \(\hat{\lambda}\). [2]
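The comparison of the two estimators can be checked with exact fractions. The sketch assumes the fault counts are independent Poisson variables with mean (area \(\times \lambda\)), so that each variance equals its mean, and it takes "better" to mean smaller variance. All names in the code are illustrative.

```python
from fractions import Fraction

areas = [3, 7, 10]   # sizes of the three pieces, in square metres

# Unbiasedness: E[k * (X1 + X2 + X3)] = k * 20 * lambda = lambda, so k = 1/20.
k = Fraction(1, sum(areas))

# Var(lambda_hat) = k^2 * (3 + 7 + 10) * lambda; store the coefficient of lambda.
var_hat_coeff = k**2 * sum(areas)

def var_ybar_coeff(n, area=4):
    """Coefficient of lambda in Var(Ybar / area) for n pieces of the given area.

    Var(Ybar) = area * lambda / n, so Var(Ybar / area) = lambda / (area * n).
    """
    return Fraction(1, area * n)

# Smallest n for which Ybar / 4 has strictly smaller variance than lambda_hat.
n_min = next(n for n in range(1, 100) if var_ybar_coeff(n) < var_hat_coeff)

print(f"k = {k}, Var(lambda_hat) = {var_hat_coeff} * lambda, minimum n = {n_min}")
```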
Question 7 8 marks
An engineering firm buys steel rods. The steel rods from its present supplier are known to have a mean tensile strength of 230 N/mm\(^2\). A new supplier of steel rods offers to supply rods at a cheaper price than the present supplier. A random sample of ten rods from this new supplier gave tensile strengths, \(x\) N/mm\(^2\), which are summarised below.

| Sample size | \(\Sigma x\) | \(\Sigma x^2\) |
| --- | --- | --- |
| 10 | 2283 | 524079 |

  1. Stating your hypotheses clearly, and using a 5\% level of significance, test whether or not the rods from the new supplier have a tensile strength lower than the present supplier. (You may assume that the tensile strength is normally distributed.) [7]
  2. In the light of your conclusion to part (a) write down what you would recommend the engineering firm to do. [1]