Edexcel S4 (Statistics 4)

Question 1 6 marks
A beach is divided into two areas \(A\) and \(B\). A random sample of pebbles is taken from each of the two areas and the length of each pebble is measured. A sample of size 26 is taken from area \(A\) and the unbiased estimate for the population variance is \(s_A^2 = 0.495 \text{ mm}^2\). A sample of size 25 is taken from area \(B\) and the unbiased estimate for the population variance is \(s_B^2 = 1.04 \text{ mm}^2\).
  1. Stating your hypotheses clearly test, at the 10\% significance level, whether or not there is a difference in variability of pebble length between area \(A\) and area \(B\). [5]
  2. State the assumption you have made about the populations of pebble lengths in order to carry out this test. [1]
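A minimal Python sketch of the test in part 1. It follows the usual convention of putting the larger variance estimate in the numerator, so only the upper tail of \(F\) is needed. The critical value is a constant copied from standard \(F\) tables (assumed), not computed, and the helper name `f_statistic` is purely illustrative.

```python
# Two-tailed F-test for equal variances: put the larger unbiased variance
# estimate in the numerator so only the upper tail of F is needed.
def f_statistic(s2_a, n_a, s2_b, n_b):
    """Return (F, numerator df, denominator df), larger variance on top."""
    if s2_a >= s2_b:
        return s2_a / s2_b, n_a - 1, n_b - 1
    return s2_b / s2_a, n_b - 1, n_a - 1


F, v1, v2 = f_statistic(0.495, 26, 1.04, 25)

# Upper 5% point of F(24, 25), i.e. the two-tailed 10% critical value,
# as read from standard F tables (assumed value, not computed here).
F_CRIT = 1.96

print(f"F = {F:.3f} on ({v1}, {v2}) degrees of freedom")
print("Reject H0" if F > F_CRIT else "Do not reject H0")
```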
Question 1 4 marks
The random variable \(X\) has an \(F\)-distribution with 8 and 12 degrees of freedom. Find P\(\left(\frac{1}{5.67} < X < 2.85\right)\). [4]
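The probability can be assembled from table values using the reciprocal property of the \(F\)-distribution. The outline below assumes that 5.67 is the tabulated upper 1\% point of \(F_{12,8}\) and 2.85 is the tabulated upper 5\% point of \(F_{8,12}\).

```latex
X \sim F_{8,12} \;\Rightarrow\; \frac{1}{X} \sim F_{12,8}
P\left(X < \tfrac{1}{5.67}\right) = P\left(\tfrac{1}{X} > 5.67\right) = 0.01
P(X < 2.85) = 0.95
P\left(\tfrac{1}{5.67} < X < 2.85\right) = 0.95 - 0.01 = 0.94
```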
Question 1 6 marks
The random variable \(X\) has a \(\chi^2\)-distribution with 9 degrees of freedom.
  1. Find P(2.088 < \(X\) < 19.023). [3]
The random variable \(Y\) follows an \(F\)-distribution with 12 and 5 degrees of freedom.
  2. Find the upper and lower 5\% critical values for \(Y\). [3]
(Total 6 marks)
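Both parts reduce to reading percentage points from tables. The outline below assumes the standard tabulated values: 2.088 and 19.023 are the 1\% and 97.5\% points of \(\chi^2_9\), and 4.68 and 3.11 are the upper 5\% points of \(F_{12,5}\) and \(F_{5,12}\).

```latex
P(X < 2.088) = 0.01, \qquad P(X < 19.023) = 0.975
P(2.088 < X < 19.023) = 0.975 - 0.01 = 0.965
\text{Upper 5\% value: } F_{12,5}(0.05) = 4.68
\text{Lower 5\% value: } \frac{1}{F_{5,12}(0.05)} = \frac{1}{3.11} \approx 0.322
```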
Question 1 7 marks
Historical records from a large colony of squirrels show that the weight of squirrels is normally distributed with a mean of 101.2 g. Following a change in the diet of squirrels, a biologist is interested in whether or not the mean weight has changed. A random sample of 14 squirrels is weighed and their weights \(x\), in grams, recorded. The results are summarised as follows: \(\sum x = 1370\), \(\sum x^2 = 134487.50\). Stating your hypotheses clearly test, at the 5\% level of significance, whether or not there has been a change in the mean weight of the squirrels. [7]
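The summary statistics feed straight into a one-sample \(t\)-test with 13 degrees of freedom. The sketch below computes the unbiased variance estimate and the test statistic. The two-tailed 5\% critical value is an assumed constant taken from \(t\) tables.

```python
import math

n, sum_x, sum_x2 = 14, 1370, 134487.50
mu0 = 101.2          # H0: mu = 101.2, H1: mu != 101.2
T_CRIT = 2.160       # two-tailed 5% point of t with 13 df (tables, assumed)

xbar = sum_x / n
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)   # unbiased variance estimate
t = (xbar - mu0) / math.sqrt(s2 / n)

print(f"xbar = {xbar:.3f}, s^2 = {s2:.3f}, t = {t:.3f}")
print("Reject H0" if abs(t) > T_CRIT else "Do not reject H0")
```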
Question 1 9 marks
A medical student is investigating two methods of taking a person's blood pressure. He takes a random sample of 10 people and measures their blood pressure using an arm cuff and a finger monitor. The table below shows the blood pressure for each person, measured by each method. \includegraphics{figure_1}
  1. Use a paired \(t\)-test to determine, at the 10\% level of significance, whether or not there is a difference in the mean blood pressure measured using the two methods. State your hypotheses clearly. [8]
  2. State an assumption about the underlying distribution of measured blood pressure required for this test. [1]
Question 2 9 marks
A random sample of 10 mustard plants had the following heights, in mm, after 4 days growth. 5.0, 4.5, 4.8, 5.2, 4.3, 5.1, 5.2, 4.9, 5.1, 5.0 Those grown previously had a mean height of 5.1 mm after 4 days. Using a 2.5\% significance level, test whether or not the mean height of these plants is less than that of those grown previously. (You may assume that the height of mustard plants after 4 days follows a normal distribution.) [9]
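A minimal sketch of the one-tailed \(t\)-test on the raw heights, using `statistics.stdev`, which applies the \(n - 1\) divisor. The critical value is an assumed constant from \(t\) tables (9 degrees of freedom, 2.5\% in one tail).

```python
import math
import statistics

heights = [5.0, 4.5, 4.8, 5.2, 4.3, 5.1, 5.2, 4.9, 5.1, 5.0]
mu0 = 5.1            # H0: mu = 5.1, H1: mu < 5.1
T_CRIT = 2.262       # one-tailed 2.5% point of t with 9 df (tables, assumed)

n = len(heights)
xbar = statistics.mean(heights)
s = statistics.stdev(heights)        # sample standard deviation, n - 1 divisor
t = (xbar - mu0) / (s / math.sqrt(n))

print(f"xbar = {xbar:.3f}, s = {s:.4f}, t = {t:.3f}")
print("Reject H0" if t < -T_CRIT else "Do not reject H0")
```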
Question 2 6 marks
A mechanic is required to change car tyres. An inspector timed a random sample of 20 tyre changes and calculated the unbiased estimate of the population variance to be 6.25 minutes². Test, at the 5\% significance level, whether or not the standard deviation of the population of times taken by the mechanic is greater than 2 minutes. State your hypotheses clearly. [6]
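A test about a standard deviation is carried out on the variance, using \((n-1)S^2/\sigma^2 \sim \chi^2_{n-1}\) under \(H_0\). The sketch below assumes the tabulated upper 5\% point of \(\chi^2_{19}\) as a constant.

```python
# One-tailed chi-squared test for a single variance:
# H0: sigma = 2, H1: sigma > 2 (equivalently sigma^2 > 4).
n = 20        # number of timed tyre changes
s2 = 6.25     # unbiased estimate of the variance, minutes^2
sigma0 = 2    # standard deviation under H0, minutes

chi2 = (n - 1) * s2 / sigma0**2

# Upper 5% point of chi-squared with 19 df, read from tables (assumed value).
CHI2_CRIT = 30.144

print(f"test statistic = {chi2:.4f}")
print("Reject H0" if chi2 > CHI2_CRIT else "Do not reject H0")
```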
Question 2 6 marks
The standard deviation of the length of a random sample of 8 fence posts produced by a timber yard was 8 mm. A second timber yard produced a random sample of 13 fence posts with a standard deviation of 14 mm.
  1. Test, at the 10\% significance level, whether or not there is evidence that the lengths of fence posts produced by these timber yards differ in variability. State your hypotheses clearly. [5]
  2. State an assumption you have made in order to carry out the test in part (a). [1]
(Total 6 marks)
Question 2 12 marks
The weights, in grams, of apples are assumed to follow a normal distribution. The weights of apples sold by a supermarket have variance \(\sigma_1^2\). A random sample of 4 apples from the supermarket had weights 114, 100, 119, 123.
  1. Find a 95\% confidence interval for \(\sigma_1^2\). [7]
The weights of apples sold on a market stall have variance \(\sigma_M^2\). A second random sample of 7 apples was taken from the market stall. The sample variance \(s_M^2\) of the apples was 318.8.
  2. Stating your hypotheses clearly, test, at the 1\% level of significance, whether or not there is evidence that \(\sigma_M^2 > \sigma_1^2\). [5]
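Both parts can be sketched in a few lines. The confidence interval uses \((n-1)S^2/\sigma^2 \sim \chi^2_{n-1}\), and the comparison uses an \(F\)-ratio with the market-stall estimate on top. The table values are assumed constants, and the sketch treats \(s_M^2 = 318.8\) as an unbiased estimate, which is itself an assumption about the question's notation.

```python
# Part 1: 95% confidence interval for a variance from a small sample,
# using (n - 1)s^2 / sigma^2 ~ chi-squared with n - 1 df.
weights = [114, 100, 119, 123]
n = len(weights)
xbar = sum(weights) / n
sxx = sum((w - xbar) ** 2 for w in weights)   # equals (n - 1) s^2

# 2.5% and 97.5% points of chi-squared with 3 df, from tables (assumed values).
CHI2_LOW, CHI2_HIGH = 0.216, 9.348
ci = (sxx / CHI2_HIGH, sxx / CHI2_LOW)

# Part 2: one-tailed F-test, market-stall variance in the numerator.
s2_supermarket = sxx / (n - 1)
s2_market = 318.8            # treated here as an unbiased estimate (assumption)
F = s2_market / s2_supermarket
F_CRIT = 27.91               # upper 1% point of F(6, 3), from tables (assumed)

print(f"95% CI for variance: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"F = {F:.3f}:", "reject H0" if F > F_CRIT else "do not reject H0")
```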
Question 2 11 marks
The value of orders, in £, made to a firm over the internet has distribution N(\(\mu, \sigma^2\)). A random sample of \(n\) orders is taken and \(\bar{X}\) denotes the sample mean.
  1. Write down the mean and variance of \(\bar{X}\) in terms of \(\mu\) and \(\sigma^2\). [2]
A second sample of \(m\) orders is taken and \(\bar{Y}\) denotes the mean of this sample. An estimator of the population mean is given by $$U = \frac{n\bar{X} + m\bar{Y}}{n + m}$$
  2. Show that \(U\) is an unbiased estimator for \(\mu\). [3]
  3. Show that the variance of \(U\) is \(\frac{\sigma^2}{n + m}\). [4]
  4. State which of \(\bar{X}\) or \(U\) is the better estimator for \(\mu\). Give a reason for your answer. [2]
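The two "show that" parts follow from linearity of expectation and, for the variance, from assuming the two samples are independent, with \(E(\bar{X}) = E(\bar{Y}) = \mu\), \(\operatorname{Var}(\bar{X}) = \sigma^2/n\) and \(\operatorname{Var}(\bar{Y}) = \sigma^2/m\). One way to set out the working:

```latex
E(U) = \frac{nE(\bar{X}) + mE(\bar{Y})}{n + m} = \frac{n\mu + m\mu}{n + m} = \mu
\operatorname{Var}(U) = \frac{n^2\operatorname{Var}(\bar{X}) + m^2\operatorname{Var}(\bar{Y})}{(n + m)^2}
  = \frac{n\sigma^2 + m\sigma^2}{(n + m)^2} = \frac{\sigma^2}{n + m}
```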
Question 3 9 marks
A train company claims that the probability \(p\) of one of its trains arriving late is 10\%. A regular traveller suspects that the probability is greater than 10\% and decides to test this by randomly selecting 12 trains and recording the number \(X\) of trains that were late. The traveller sets up the hypotheses \(H_0: p = 0.1\) and \(H_1: p > 0.1\) and decides to reject \(H_0\) if \(x > 2\).
  1. Find the size of the test. [1]
  2. Show that the power function of the test is $$1 - (1 - p)^{10}(1 + 10p + 55p^2).$$ [4]
  3. Calculate the power of the test when
    1. \(p = 0.2\),
    2. \(p = 0.6\). [3]
  4. Comment on your results from part (c). [1]
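A quick numerical check of the power function. The sketch reads the rejection rule as \(x > 2\) (that is, \(x \ge 3\)), because that is the rule under which the quoted closed form holds: \(P(X \le 2) = (1-p)^{10}(1 + 10p + 55p^2)\) for \(X \sim B(12, p)\). It compares a term-by-term binomial sum with the closed form and evaluates it at \(p = 0.1\) (the size), \(0.2\) and \(0.6\).

```python
from math import comb

N = 12           # trains observed
REJECT_FROM = 3  # reject H0 when x > 2, i.e. x >= 3


def power_binomial(p):
    """P(X >= 3) for X ~ B(12, p), summed term by term."""
    return sum(comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(REJECT_FROM, N + 1))


def power_formula(p):
    """The closed form quoted in part 2."""
    return 1 - (1 - p) ** 10 * (1 + 10 * p + 55 * p**2)


for p in (0.1, 0.2, 0.6):   # p = 0.1 gives the size of the test
    print(f"p = {p}: power = {power_formula(p):.4f}")
```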
Question 3 9 marks
It is suggested that a Poisson distribution with parameter \(\lambda\) can model the number of currants in a currant bun. A random bun is selected in order to test the hypotheses H₀: \(\lambda = 8\) against H₁: \(\lambda \neq 8\), using a 10\% level of significance.
  1. Find the critical region for this test, such that the probability in each tail is as close as possible to 5\%. [5]
  2. Given that \(\lambda = 10\), find
    1. the probability of a type II error,
    2. the power of the test. [4]
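A minimal sketch of how the two-tailed critical region and the type II error can be computed from Poisson probabilities, choosing each tail so that its probability under \(H_0\) is as close as possible to 5\%. The helper `poisson_cdf` is a hypothetical name written out from the Poisson formula.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam); returns 0 for k < 0."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


LAM0, TAIL = 8, 0.05

# Lower tail X <= c and upper tail X >= d, each chosen so that its
# probability under H0 is as close as possible to 5%.
lower = min(range(30), key=lambda c: abs(poisson_cdf(c, LAM0) - TAIL))
upper = min(range(1, 40), key=lambda d: abs(1 - poisson_cdf(d - 1, LAM0) - TAIL))

# Type II error when lambda = 10: X lands in the acceptance region.
type2 = poisson_cdf(upper - 1, 10) - poisson_cdf(lower, 10)
power = 1 - type2

print(f"critical region: X <= {lower} or X >= {upper}")
print(f"P(type II) = {type2:.4f}, power = {power:.4f}")
```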
Question 3 8 marks
A machine is set to fill bags with flour such that the mean weight is 1010 grams. To check that the machine is working properly, a random sample of 8 bags is selected. The weight of flour, in grams, in each bag is as follows. 1010 1015 1005 1000 998 1008 1012 1007 Carry out a suitable test, at the 5\% significance level, to test whether or not the mean weight of flour in the bags is less than 1010 grams. (You may assume that the weight of flour delivered by the machine is normally distributed.) (Total 8 marks)
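The same one-sample \(t\)-test pattern applies to the flour weights, here with 7 degrees of freedom and a lower-tail alternative. The critical value is an assumed constant from \(t\) tables.

```python
import math
import statistics

weights = [1010, 1015, 1005, 1000, 998, 1008, 1012, 1007]
mu0 = 1010                       # H0: mu = 1010, H1: mu < 1010
T_CRIT = 1.895                   # one-tailed 5% point of t with 7 df (tables, assumed)

n = len(weights)
xbar = statistics.mean(weights)
s = statistics.stdev(weights)    # uses the n - 1 divisor
t = (xbar - mu0) / (s / math.sqrt(n))

print(f"xbar = {xbar:.3f}, s = {s:.3f}, t = {t:.3f}")
print("Reject H0" if t < -T_CRIT else "Do not reject H0")
```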
Question 3 9 marks
As part of an investigation into the effectiveness of solar heating, a pair of houses was identified where the mean weekly fuel consumption was the same. One of the houses was then fitted with solar heating and the other was not. Following the fitting of the solar heating, a random sample of 9 weeks was taken and the table below shows the weekly fuel consumption for each house. \includegraphics{figure_3}
  1. Stating your hypotheses clearly, test, at the 5\% level of significance, whether or not there is evidence that the solar heating reduces the mean weekly fuel consumption. [8]
  2. State an assumption about weekly fuel consumption that is required to carry out this test. [1]
Question 3 13 marks
The lengths, \(x\) mm, of the forewings of a random sample of male and female adult butterflies are measured. The following statistics are obtained from the data. \includegraphics{figure_3}
  1. Assuming the lengths of the forewings are normally distributed test, at the 10\% level of significance, whether or not the variances of the two distributions are the same. State your hypotheses clearly. [7]
  2. Stating your hypotheses clearly test, at the 5\% level of significance, whether the mean length of the forewings of the female butterflies is less than the mean length of the forewings of the male butterflies. [6]
Question 4 9 marks
A random sample of 15 tomatoes is taken and the weight \(x\) grams of each tomato is found. The results are summarised by \(\sum x = 208\) and \(\sum x^2 = 2962\).
  1. Assuming that the weights of the tomatoes are normally distributed, calculate the 90\% confidence interval for the variance \(\sigma^2\) of the weights of the tomatoes. [7]
  2. State with a reason whether or not the confidence interval supports the assertion \(\sigma^2 = 3\). [2]
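A sketch of the 90\% confidence interval for \(\sigma^2\) from the summary statistics, using \(S_{xx} = \sum x^2 - (\sum x)^2/n\) and the assumed 5\% and 95\% points of \(\chi^2_{14}\) from tables. It then checks whether \(\sigma^2 = 3\) lies inside the interval.

```python
n, sum_x, sum_x2 = 15, 208, 2962
sxx = sum_x2 - sum_x**2 / n          # equals (n - 1) s^2

# 5% and 95% points of chi-squared with 14 df, from tables (assumed values).
CHI2_LOW, CHI2_HIGH = 6.571, 23.685
ci = (sxx / CHI2_HIGH, sxx / CHI2_LOW)

print(f"90% CI for sigma^2: ({ci[0]:.3f}, {ci[1]:.3f})")
print("sigma^2 = 3 inside interval:", ci[0] <= 3 <= ci[1])
```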
Question 4 9 marks
A doctor believes that the span of a person's dominant hand is greater than that of the weaker hand. To test his theory, the doctor measures the spans of the dominant and weaker hands of a random sample of 8 people. He subtracts the span of the weaker hand from that of the dominant hand. The spans, in cm, are summarised in the table below. \includegraphics{figure_4} Test, at the 5\% significance level, the doctor's belief. [9]
Question 4 13 marks
A farmer set up a trial to assess the effect of two different diets on the increase in the weight of his lambs. He randomly selected 20 lambs. Ten of the lambs were given diet \(A\) and the other 10 lambs were given diet \(B\). The gain in weight, in kg, of each lamb over the period of the trial was recorded.
  1. State why a paired \(t\)-test is not suitable for use with these data. [1]
  2. Suggest an alternative method for selecting the sample which would make the use of a paired \(t\)-test suitable. [1]
  3. Suggest two other factors that the farmer might consider when selecting the sample. [2]
The following paired data were collected. \includegraphics{figure_4}
  4. Using a paired \(t\)-test, at the 5\% significance level, test whether or not there is evidence of a difference in the weight gained by the lambs using diet \(A\) compared with those using diet \(B\). [8]
  5. State, giving a reason, which diet you would recommend the farmer to use for his lambs. [1]
(Total 13 marks)
Question 4 13 marks
Two machines \(A\) and \(B\) produce the same type of component in a factory. The factory manager wishes to know whether the lengths, \(x\) cm, of the components produced by the two machines have the same mean. The manager took a random sample of components from each machine and the results are summarised in the table below. \includegraphics{figure_4} The lengths of components produced by the machines can be assumed to follow normal distributions.
  1. Use a two tail test to show, at the 10\% significance level, that the variances of the lengths of components produced by each machine can be assumed to be equal. [4]
  2. Showing your working clearly, find a 95\% confidence interval for \(\mu_A - \mu_B\), where \(\mu_A\) and \(\mu_B\) are the mean lengths of the populations of components produced by machine \(A\) and machine \(B\) respectively. [7]
There are serious consequences for the production at the factory if the difference in mean lengths of the components produced by the two machines is more than 0.7 cm.
  3. State, giving your reason, whether or not the factory manager should be concerned. [2]
Question 4 12 marks
The length \(X\) mm of a spring made by a machine is normally distributed N(\(\mu, \sigma^2\)). A random sample of 20 springs is selected and their lengths measured in mm. Using this sample the unbiased estimates of \(\mu\) and \(\sigma^2\) are \(\bar{x} = 100.6\), \(s^2 = 1.5\). Stating your hypotheses clearly test, at the 10\% level of significance,
  1. whether or not the variance of the lengths of springs is different from 0.9, [6]
  2. whether or not the mean length of the springs is greater than 100 mm. [6]
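Both tests use only \(n\), \(\bar{x}\) and \(s^2\). The sketch assumes the tabulated 5\% and 95\% points of \(\chi^2_{19}\) for the two-tailed variance test and the one-tailed 10\% point of \(t_{19}\) for the mean.

```python
import math

n, xbar, s2 = 20, 100.6, 1.5

# (a) H0: sigma^2 = 0.9 vs H1: sigma^2 != 0.9, two-tailed at 10%.
chi2 = (n - 1) * s2 / 0.9
CHI2_LOW, CHI2_HIGH = 10.117, 30.144   # 5% and 95% points, 19 df (tables, assumed)
reject_var = chi2 < CHI2_LOW or chi2 > CHI2_HIGH

# (b) H0: mu = 100 vs H1: mu > 100, one-tailed at 10%, sigma unknown.
t = (xbar - 100) / math.sqrt(s2 / n)
T_CRIT = 1.328                         # 10% one-tailed point, 19 df (tables, assumed)
reject_mean = t > T_CRIT

print(f"(a) chi^2 = {chi2:.3f}, reject H0: {reject_var}")
print(f"(b) t = {t:.3f}, reject H0: {reject_mean}")
```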
Question 5 11 marks
  1. Define
    1. a Type I error,
    2. a Type II error. [2]
A small aviary, that leaves the eggs with the parent birds, rears chicks at an average rate of 5 per year. In order to increase the number of chicks reared per year it is decided to remove the eggs from the aviary as soon as they are laid and put them in an incubator. At the end of the first year of using an incubator 7 chicks had been successfully reared.
  2. Assuming that the number of chicks reared per year follows a Poisson distribution, test, at the 5\% significance level, whether or not there is evidence of an increase in the number of chicks reared per year. State your hypotheses clearly. [4]
  3. Calculate the probability of the Type I error for this test. [3]
  4. Given that the true average number of chicks reared per year when the eggs are hatched in an incubator is 8, calculate the probability of a Type II error. [2]
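A sketch of the one-tailed Poisson test, assuming \(H_0: \lambda = 5\) against \(H_1: \lambda > 5\). It finds the \(p\)-value of the observed 7, the smallest upper critical region whose probability under \(H_0\) is at most 5\%, and the two error probabilities. The helper name `poisson_cdf` is illustrative.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


LAM0, ALPHA, OBSERVED = 5, 0.05, 7

p_value = 1 - poisson_cdf(OBSERVED - 1, LAM0)    # P(X >= 7 | lambda = 5)

# Critical region X >= c with the smallest c whose tail probability is <= 5%.
c = next(k for k in range(1, 50) if 1 - poisson_cdf(k - 1, LAM0) <= ALPHA)
type1 = 1 - poisson_cdf(c - 1, LAM0)
type2 = poisson_cdf(c - 1, 8)                     # accept H0 when lambda = 8

print(f"p-value = {p_value:.4f}, critical region X >= {c}")
print(f"P(type I) = {type1:.4f}, P(type II) = {type2:.4f}")
```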
Question 5 15 marks
  1. Explain briefly what you understand by
    1. an unbiased estimator,
    2. a consistent estimator
  of an unknown population parameter \(\theta\). [3]
From a binomial population, in which the proportion of successes is \(p\), 3 samples of size \(n\) are taken. The numbers of successes \(X_1\), \(X_2\) and \(X_3\) are recorded and used to estimate \(p\).
  2. Determine the bias, if any, of each of the following estimators of \(p\). \(\hat{p}_1 = \frac{X_1 + X_2 + X_3}{3n}\), \(\hat{p}_2 = \frac{X_1 + 3X_2 + X_3}{6n}\), \(\hat{p}_3 = \frac{2X_1 + 3X_2 + X_3}{6n}\) [4]
  3. Find the variance of each of these estimators. [4]
  4. State, giving a reason, which of the three estimators for \(p\) is
    1. the best estimator,
    2. the worst estimator. [4]
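Each estimator has the form \((c_1X_1 + c_2X_2 + c_3X_3)/(dn)\) with independent \(X_i \sim B(n, p)\). So the expectation is \(\frac{c_1+c_2+c_3}{d}p\) and the variance is \(\frac{c_1^2+c_2^2+c_3^2}{d^2}\cdot\frac{p(1-p)}{n}\). The sketch below uses exact fractions to tabulate the bias (as a multiple of \(p\)) and the variance (as a multiple of \(p(1-p)/n\)).

```python
from fractions import Fraction

# Each estimator has the form (c1*X1 + c2*X2 + c3*X3) / (d*n), X_i ~ B(n, p).
# E(estimator) = (c1 + c2 + c3)/d * p and
# Var(estimator) = (c1^2 + c2^2 + c3^2)/d^2 * p(1 - p)/n.
estimators = {
    "p1": ((1, 1, 1), 3),
    "p2": ((1, 3, 1), 6),
    "p3": ((2, 3, 1), 6),
}

results = {}
for name, (coeffs, d) in estimators.items():
    bias = Fraction(sum(coeffs), d) - 1                 # multiple of p
    var = Fraction(sum(c * c for c in coeffs), d * d)   # multiple of p(1-p)/n
    results[name] = (bias, var)
    print(f"{name}: bias = ({bias})p, variance = ({var}) p(1-p)/n")
```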
Question 5 13 marks
Define
  1. a Type I error, [1]
  2. the size of a test. [1]
Jane claims that she can read Alan's mind. To test this claim Alan randomly chooses a card with one of 4 symbols on it. He then concentrates on the symbol. Jane then attempts to read Alan's mind by stating what symbol she thinks is on the card. The experiment is carried out 8 times and the number of times, \(X\), that Jane states the correct symbol is recorded. The probability of Jane stating the correct symbol is denoted by \(p\). To test the hypothesis H₀: \(p = 0.25\) against H₁: \(p > 0.25\), a critical region of \(X > 6\) is used.
  3. Find the size of this test. [3]
  4. Show that the power function of this test is \(8p^7 - 7p^8\). [3]
Given that \(p = 0.3\), calculate
  5. the power of this test, [1]
  6. the probability of a Type II error. [2]
  7. Suggest two ways in which you might reduce the probability of a Type II error. [2]
(Total 13 marks)
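Since \(X \sim B(8, p)\) and the critical region is \(X > 6\), the power is \(P(X = 7) + P(X = 8)\). The sketch below evaluates that sum with exact fractions, so the size at \(p = \tfrac14\) and the power at \(p = 0.3\) come out exactly. The function name `power` is illustrative.

```python
from fractions import Fraction
from math import comb


def power(p):
    """P(X > 6) = P(X = 7) + P(X = 8) for X ~ B(8, p)."""
    return sum(comb(8, k) * p**k * (1 - p) ** (8 - k) for k in (7, 8))


size = power(Fraction(1, 4))          # probability of a type I error
p = Fraction(3, 10)
print(f"size = {size} = {float(size):.6f}")
print(f"power at p = 0.3: {float(power(p)):.6f}, type II: {float(1 - power(p)):.6f}")
```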
Question 5 17 marks
Rolls of cloth delivered to a factory contain defects at an average rate of \(\lambda\) per metre. A quality assurance manager selects a random sample of 15 metres of cloth from each delivery to test whether or not there is evidence that \(\lambda > 0.3\). The criterion that the manager uses for rejecting the hypothesis that \(\lambda = 0.3\) is that there are 9 or more defects in the sample.
  1. Find the size of the test. [2]
Table 1 gives some values, to 2 decimal places, of the power function of this test. \includegraphics{figure_5}
  2. Find the value of \(r\). [2]
The manager would like to design a test, of whether or not \(\lambda > 0.3\), that uses a smaller length of cloth. He chooses a length of 10 m and requires the probability of a type I error to be less than 10\%.
  3. Find the criterion to reject the hypothesis that \(\lambda = 0.3\) which makes the test as powerful as possible. [2]
  4. Hence state the size of this second test. [1]
Table 2 gives some values, to 2 decimal places, of the power function for the test in part (c). \includegraphics{figure_5_table2}
  5. Find the value of \(s\). [2]
  6. Using the same axes, on graph paper draw the graphs of the power functions of these two tests. [4]
  7. State the value of \(\lambda\) where the graphs cross, and explain the significance of this value. [2]
Deliveries of cloth with \(\lambda = 0.3\) are unusable.
  8. Suggest, giving your reasons, which test the manager should adopt. [2]
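Both tests count defects in a fixed length of cloth, so the count is Poisson with mean \(15\lambda\) (first test) or \(10\lambda\) (second test). The sketch computes the size of the first test, searches for the smallest cut-off for the 10 m test whose size stays below 10\%, and prints a few points on each power curve. The helper name `poisson_tail` and the chosen \(\lambda\) values are illustrative only.

```python
from math import exp, factorial


def poisson_tail(c, mean):
    """P(X >= c) for X ~ Po(mean)."""
    return 1 - sum(exp(-mean) * mean**i / factorial(i) for i in range(c))


LAM0 = 0.3

# Test 1: 15 m of cloth, reject H0 when there are 9 or more defects.
size1 = poisson_tail(9, 15 * LAM0)

# Test 2: 10 m of cloth; the smallest cut-off keeps the test as powerful
# as possible while the type I error stays below 10%.
c2 = next(c for c in range(1, 50) if poisson_tail(c, 10 * LAM0) < 0.10)
size2 = poisson_tail(c2, 10 * LAM0)

for lam in (0.3, 0.5, 0.7, 0.9):   # a few points on each power curve
    print(f"lambda = {lam}: test 1 {poisson_tail(9, 15 * lam):.2f}, "
          f"test 2 {poisson_tail(c2, 10 * lam):.2f}")
```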
Question 6 14 marks
A random sample of three independent variables \(X_1, X_2\) and \(X_3\) is taken from a distribution with mean \(\mu\) and variance \(\sigma^2\).
  1. Show that \(\frac{1}{3}X_1 + \frac{1}{3}X_2 + \frac{1}{3}X_3\) is an unbiased estimator for \(\mu\). [3]
An unbiased estimator for \(\mu\) is given by \(\hat{\mu} = aX_1 + bX_2\) where \(a\) and \(b\) are constants.
  2. Show that Var\((\hat{\mu}) = (2a^2 - 2a + 1)\sigma^2\). [6]
  3. Hence determine the value of \(a\) and the value of \(b\) for which \(\hat{\mu}\) has minimum variance. [5]
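For the minimisation, unbiasedness gives \(a + b = 1\), so the variance depends on \(a\) alone. Differentiating the quadratic factor is one way to locate the minimum:

```latex
b = 1 - a \quad\text{(unbiasedness: } a + b = 1\text{)}
\frac{d}{da}\left(2a^2 - 2a + 1\right) = 4a - 2 = 0 \;\Rightarrow\; a = \tfrac{1}{2},\; b = \tfrac{1}{2}
\frac{d^2}{da^2}\left(2a^2 - 2a + 1\right) = 4 > 0 \quad\text{(a minimum)}
```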
Question 6 16 marks
A supervisor wishes to check the typing speed of a new typist. On 10 randomly selected occasions, the supervisor records the time taken for the new typist to type 100 words. The results, in seconds, are given below. 110, 125, 130, 126, 128, 127, 118, 120, 122, 125 The supervisor assumes that the time taken to type 100 words is normally distributed.
  1. Calculate a 95\% confidence interval for
    1. the mean,
    2. the variance
    of the population of times taken by this typist to type 100 words. [13]
The supervisor requires the average time needed to type 100 words to be no more than 130 seconds and the standard deviation to be no more than 4 seconds.
  2. Comment on whether or not the supervisor should be concerned about the speed of the new typist. [3]
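A sketch of both 95\% intervals from the raw times. The mean interval uses \(\bar{x} \pm t_9 \, s/\sqrt{n}\), and the variance interval uses \(S_{xx}/\chi^2_9\) at the 97.5\% and 2.5\% points. All three table values are assumed constants, not computed.

```python
import math

times = [110, 125, 130, 126, 128, 127, 118, 120, 122, 125]
n = len(times)
xbar = sum(times) / n
sxx = sum((x - xbar) ** 2 for x in times)   # (n - 1) s^2
s = math.sqrt(sxx / (n - 1))

# Table values for 9 df (assumed): t 2.5% point, chi-squared 2.5% and 97.5% points.
T_CRIT = 2.262
CHI2_LOW, CHI2_HIGH = 2.700, 19.023

half_width = T_CRIT * s / math.sqrt(n)
mean_ci = (xbar - half_width, xbar + half_width)
var_ci = (sxx / CHI2_HIGH, sxx / CHI2_LOW)

print(f"95% CI for mean: ({mean_ci[0]:.2f}, {mean_ci[1]:.2f})")
print(f"95% CI for variance: ({var_ci[0]:.2f}, {var_ci[1]:.2f})")
```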
Question 6 12 marks
Brickland and Goodbrick are two manufacturers of bricks. The lengths of the bricks produced by each manufacturer can be assumed to be normally distributed. A random sample of 20 bricks is taken from Brickland and the length, \(x\) mm, of each brick is recorded. The mean of this sample is 207.1 mm and the variance is 3.2 mm².
  1. Calculate the 98\% confidence interval for the mean length of brick from Brickland. [4]
A random sample of 10 bricks is selected from those manufactured by Goodbrick. The length of each brick, \(y\) mm, is recorded. The results are summarised as follows. \(\sum y = 2046.2\) \(\sum y^2 = 418785.4\) The variances of the length of brick for each manufacturer are assumed to be the same.
  2. Find a 90\% confidence interval for the value by which the mean length of brick made by Brickland exceeds the mean length of brick made by Goodbrick. [8]
(Total 12 marks)
Question 6 17 marks
\includegraphics{figure_6} Figure 1 shows a square of side \(l\) and area \(l^2\) which lies in the first quadrant with one vertex at the origin. A point \(P\) with coordinates \((X, Y)\) is selected at random inside the square and the coordinates are used to estimate \(l^2\). It is assumed that \(X\) and \(Y\) are independent random variables, each having a continuous uniform distribution over the interval \([0, l]\). [You may assume that E\((X^n Y^m) = \) E\((X^n)\)E\((Y^m)\), where \(n\) and \(m\) are positive integers.]
  1. Use integration to show that E\((X^n) = \frac{l^{n}}{n+1}\). [3]
The random variable \(S = kXY\), where \(k\) is a constant, is an unbiased estimator for \(l^2\).
  2. Find the value of \(k\). [3]
  3. Show that Var \(S = \frac{7l^4}{9}\). [3]
The random variable \(U = q(X^2 + Y^2)\), where \(q\) is a constant, is also an unbiased estimator for \(l^2\).
  4. Show that the value of \(q\) is \(\frac{3}{2}\). [3]
  5. Find Var \(U\). [3]
  6. State, giving a reason, which of \(S\) and \(U\) is the better estimator of \(l^2\). [1]
The point (2, 3) is selected from inside the square.
  7. Use the estimator chosen in part (f) to find an estimate for the area of the square. [1]
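The moments of \(X \sim U[0, l]\) are \(E(X^n) = l^n/(n+1)\), which is the standard result for this distribution. Because every quantity here is that moment times a power of \(l\), the sketch works with the coefficients only, using exact fractions. It recovers \(k\), Var \(S\), \(q\) and Var \(U\) as multiples of the appropriate power of \(l\), and then evaluates both estimators at the point \((2, 3)\).

```python
from fractions import Fraction


def moment(n):
    """E(X^n) / l^n for X ~ U[0, l]: the integral gives l^n / (n + 1)."""
    return Fraction(1, n + 1)


# S = kXY with E(XY) = E(X)E(Y); choose k so that E(S) = l^2.
k = 1 / moment(1) ** 2
var_S = k**2 * (moment(2) ** 2 - moment(1) ** 4)   # coefficient of l^4

# U = q(X^2 + Y^2) with E(X^2 + Y^2) = 2E(X^2); Var(X^2 + Y^2) = 2Var(X^2).
q = 1 / (2 * moment(2))
var_U = q**2 * 2 * (moment(4) - moment(2) ** 2)    # coefficient of l^4

x, y = 2, 3
estimate_S = k * x * y
estimate_U = q * (x**2 + y**2)
print(f"k = {k}, Var S = {var_S} l^4, q = {q}, Var U = {var_U} l^4")
print(f"estimates from (2, 3): S = {estimate_S}, U = {estimate_U}")
```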
TOTAL FOR PAPER: 75 MARKS
Question 7 17 marks
Two methods of extracting juice from an orange are to be compared. Eight oranges are halved. One half of each orange is chosen at random and allocated to Method \(A\) and the other half is allocated to Method \(B\). The amounts of juice extracted, in ml, are given in the table. \includegraphics{figure_7} One statistician suggests performing a two-sample \(t\)-test to investigate whether or not there is a difference between the mean amounts of juice extracted by the two methods.
  1. Stating your hypotheses clearly and using a 5\% significance level, carry out this test. (You may assume \(\bar{x}_A = 26.125\), \(s_A^2 = 7.84\), \(\bar{x}_B = 25\), \(s_B^2 = 4\) and \(\sigma_A^2 = \sigma_B^2\).) [7]
Another statistician suggests analysing these data using a paired \(t\)-test.
  2. Using a 5\% significance level, carry out this test. [9]
  3. State which of these two tests you consider to be more appropriate. Give a reason for your choice. [1]
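Part 1 can be sketched from the quoted summary statistics alone. The pooled variance relies on the stated assumption \(\sigma_A^2 = \sigma_B^2\), and the two-tailed 5\% critical value for 14 degrees of freedom is an assumed table constant. The paired test in part 2 needs the individual differences from the table, so it is not reproduced here.

```python
import math

n_a = n_b = 8
xbar_a, s2_a = 26.125, 7.84
xbar_b, s2_b = 25, 4

# Pooled variance estimate, valid under the stated assumption sigma_A^2 = sigma_B^2.
sp2 = ((n_a - 1) * s2_a + (n_b - 1) * s2_b) / (n_a + n_b - 2)
t = (xbar_a - xbar_b) / math.sqrt(sp2 * (1 / n_a + 1 / n_b))

T_CRIT = 2.145   # two-tailed 5% point of t with 14 df (tables, assumed)

print(f"pooled s^2 = {sp2:.3f}, t = {t:.3f}")
print("Reject H0" if abs(t) > T_CRIT else "Do not reject H0")
```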
Question 7 16 marks
A grocer receives deliveries of cauliflowers from two different growers, \(A\) and \(B\). The grocer takes random samples of cauliflowers from those supplied by each grower. He measures the weight \(x\), in grams, of each cauliflower. The results are summarised in the table below. \includegraphics{figure_7}
  1. Show, at the 10\% significance level, that the variances of the populations from which the samples are drawn can be assumed to be equal by testing the hypothesis H₀: \(\sigma_A^2 = \sigma_B^2\) against hypothesis H₁: \(\sigma_A^2 \neq \sigma_B^2\). (You may assume that the two samples come from normal populations.) [6]
The grocer believes that the mean weight of cauliflowers provided by \(B\) is at least 150 g more than the mean weight of cauliflowers provided by \(A\).
  2. Use a 5\% significance level to test the grocer's belief. [8]
  3. Justify your choice of test. [2]
Question 7 17 marks
A bag contains marbles of which an unknown proportion \(p\) is red. A random sample of \(n\) marbles is drawn, with replacement, from the bag. The number \(X\) of red marbles drawn is noted. A second random sample of \(m\) marbles is drawn, with replacement. The number \(Y\) of red marbles drawn is noted. Given that \(p_1 = \frac{aX}{n} + \frac{bY}{m}\) is an unbiased estimator of \(p\),
  1. show that \(a + b = 1\). [4]
Given that \(p_2 = \frac{(X + Y)}{n + m}\)
  2. Show that \(p_2\) is an unbiased estimator for \(p\). [3]
  3. Show that the variance of \(p_1\) is \(p(1 - p)\left(\frac{a^2}{n} + \frac{b^2}{m}\right)\). [3]
  4. Find the variance of \(p_2\). [3]
  5. Given that \(a = 0.4\), \(m = 10\) and \(n = 20\), explain which estimator, \(p_1\) or \(p_2\), you should use. [4]
(Total 17 marks)
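Both variances are multiples of \(p(1-p)\). For \(p_1\) the multiplier is \(a^2/n + b^2/m\), and for \(p_2\) it is \(1/(n+m)\), because \(X + Y \sim B(n + m, p)\). So the final part reduces to comparing two numbers. A minimal sketch with exact fractions, taking \(b = 1 - a\) from unbiasedness:

```python
from fractions import Fraction

a = Fraction(4, 10)
b = 1 - a                 # unbiasedness forces a + b = 1
n, m = 20, 10

# Both variances are multiples of p(1 - p); compare the multipliers.
var_p1 = a**2 / n + b**2 / m
var_p2 = Fraction(1, n + m)

print(f"Var(p1) = {var_p1} p(1-p) = {float(var_p1):.4f} p(1-p)")
print(f"Var(p2) = {var_p2} p(1-p) = {float(var_p2):.4f} p(1-p)")
```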