Questions S4 (270 questions)

Edexcel S4 Q3
3. A certain vaccine is known to be only \(70 \%\) effective against a particular virus; thus \(30 \%\) of those vaccinated will actually catch the virus. In order to test whether or not a new and more expensive vaccine provides better protection against the same virus, a random sample of 30 people were chosen and given the new vaccine. If fewer than 6 people contracted the virus the new vaccine would be considered more effective than the current one.
  1. Write down suitable hypotheses for this test.
  2. Find the probability of making a Type I error.
  3. Find the power of this test if the new vaccine is
    1. \(80 \%\) effective,
    2. \(90 \%\) effective.

  An independent research organisation decided to test the new vaccine on a random sample of 50 people to see if it could be considered more than \(70 \%\) effective. They required the probability of a Type I error to be as close as possible to 0.05.
  4. Find the critical region for this test.
  5. State the size of this critical region.
  6. Find the power of this test if the new vaccine is
    1. \(80 \%\) effective,
    2. \(90 \%\) effective.
  7. Give one advantage and one disadvantage of the second test.
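The error probabilities in this question can be checked numerically. The sketch below assumes the number of vaccinated people who catch the virus is \(X \sim \mathrm { B } ( n , p )\), with \(\mathrm { H } _ { 0 } : p = 0.3\) tested against \(\mathrm { H } _ { 1 } : p < 0.3\). The helper `binom_cdf` and all variable names are illustrative and not part of the question.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# First test: n = 30, H0 is rejected when fewer than 6 (at most 5) catch the virus.
size_30 = binom_cdf(5, 30, 0.3)       # P(Type I error), computed under p = 0.3
power_30_80 = binom_cdf(5, 30, 0.2)   # vaccine 80% effective, so p = 0.2
power_30_90 = binom_cdf(5, 30, 0.1)   # vaccine 90% effective, so p = 0.1

# Second test: n = 50, critical region X <= c with size as close as possible to 0.05.
c_50 = min(range(51), key=lambda c: abs(binom_cdf(c, 50, 0.3) - 0.05))
size_50 = binom_cdf(c_50, 50, 0.3)
power_50_80 = binom_cdf(c_50, 50, 0.2)
power_50_90 = binom_cdf(c_50, 50, 0.1)

print(f"n=30: size={size_30:.4f}, power={power_30_80:.4f}, {power_30_90:.4f}")
print(f"n=50: X <= {c_50}, size={size_50:.4f}, power={power_50_80:.4f}, {power_50_90:.4f}")
```

Comparing the two printed lines gives the raw material for the final part: the size and power of each test sit side by side, while the cost of the larger sample is not captured by the code.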
Edexcel S4 Q4
4. Gill, a member of the accounts department in a large company, is studying the expenses claims of company employees. She assumes that the claims, in \(\pounds\), follow a normal distribution with mean \(\mu\) and variance \(\sigma ^ { 2 }\). As a first stage in her investigation she took the following random sample of 10 claims. $$30.85,99.75,142.73,223.16,75.43,28.57,53.90,81.43,68.62,43.45 .$$
  1. Find a \(95 \%\) confidence interval for \(\mu\).

  The chief accountant would like a \(95 \%\) confidence interval where the difference between the upper confidence limit and the lower confidence limit is less than 20.

  2. Show that \(\frac { \sigma ^ { 2 } } { n } < 26.03\) (to 2 decimal places), where \(n\) is the size of the sample required to achieve this.

  Gill decides to use her original sample of 10 to obtain a value for \(\sigma ^ { 2 }\) so that the chance of her value being an underestimate is 0.01.

  3. Find such a value for \(\sigma ^ { 2 }\).
  4. Use this value for \(\sigma ^ { 2 }\) to estimate the size of sample the chief accountant requires.
Edexcel S4 Q5
5. An educational researcher is testing the effectiveness of a new method of teaching a topic in mathematics. A random sample of 10 children were taught by the new method and a second random sample of 9 children, of similar age and ability, were taught by the conventional method. At the end of the teaching, the same test was given to both groups of children. The marks obtained by the two groups are summarised in the table below.
| | New method | Conventional method |
|---|---|---|
| Mean \(( \bar { x } )\) | 82.3 | 78.2 |
| Standard deviation \(( s )\) | 3.5 | 5.7 |
| Number of students \(( n )\) | 10 | 9 |
  1. Stating your hypotheses clearly and using a \(5 \%\) level of significance, investigate whether or not
    1. the variance of the marks of children taught by the conventional method is greater than that of children taught by the new method,
    2. the mean score of children taught by the conventional method is lower than the mean score of those taught by the new method.
      [In each case you should give full details of the calculation of the test statistics.]
  2. State any assumptions you made in order to carry out these tests.
  3. Find a 95\% confidence interval for the common variance of the marks of the two groups.
Edexcel S4 Q6
6. A statistics student is trying to estimate the probability, \(p\), of rolling a 6 with a particular die. The die is rolled 10 times and the random variable \(X _ { 1 }\) represents the number of sixes obtained. The random variable \(R _ { 1 } = \frac { X _ { 1 } } { 10 }\) is proposed as an estimator of \(p\).
  1. Show that \(R _ { 1 }\) is an unbiased estimator of \(p\).

  The student decided to roll the die again \(n\) times (\(n > 10\)) and the random variable \(X _ { 2 }\) represents the number of sixes in these \(n\) rolls. The random variable \(R _ { 2 } = \frac { X _ { 2 } } { n }\) and the random variable \(Y = \frac { 1 } { 2 } \left( R _ { 1 } + R _ { 2 } \right)\).

  2. Show that both \(R _ { 2 }\) and \(Y\) are unbiased estimators of \(p\).
  3. Find \(\operatorname { Var } \left( R _ { 2 } \right)\) and \(\operatorname { Var } ( Y )\).
  4. State, giving a reason, which of the 3 estimators \(R _ { 1 } , R _ { 2 }\) and \(Y\) are consistent estimators of \(p\).
  5. For the case \(n = 20\) state, giving a reason, which of the 3 estimators \(R _ { 1 } , R _ { 2 }\) and \(Y\) you would recommend.

  The student's teacher pointed out that a better estimator could be found based on the random variable \(X _ { 1 } + X _ { 2 }\).

  6. Find a suitable estimator and explain why it is better than \(R _ { 1 } , R _ { 2 }\) and \(Y\).
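The comparison of estimators in this question comes down to comparing variances. The sketch below assumes \(X _ { 1 } \sim \mathrm { B } ( 10 , p )\) and \(X _ { 2 } \sim \mathrm { B } ( n , p )\) are independent, and takes the pooled estimator \(\frac { X _ { 1 } + X _ { 2 } } { n + 10 }\) as one candidate for the final part. The function name `estimator_variances` and the trial value \(p = \frac { 1 } { 6 }\) are illustrative assumptions.

```python
from fractions import Fraction


def estimator_variances(n, p):
    """Exact variances of R1, R2, Y and the pooled estimator (X1 + X2) / (n + 10)."""
    q = 1 - p
    var_r1 = p * q / 10            # Var(X1 / 10) with X1 ~ B(10, p)
    var_r2 = p * q / n             # Var(X2 / n) with X2 ~ B(n, p)
    var_y = (var_r1 + var_r2) / 4  # Y = (R1 + R2) / 2, independent samples assumed
    var_pooled = p * q / (n + 10)  # all n + 10 rolls treated as one sample
    return {"R1": var_r1, "R2": var_r2, "Y": var_y, "pooled": var_pooled}


# Illustrative check for n = 20 with a fair die (p = 1/6 is an assumed value).
v = estimator_variances(20, Fraction(1, 6))
for name, value in sorted(v.items(), key=lambda item: item[1]):
    print(name, value, float(value))
```

Exact fractions are used so that the ordering of the variances does not depend on rounding.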
OCR S4 2010 June Q3
  1. Assuming that all rankings are equally likely, show that \(\mathrm { P } ( R \leqslant 17 ) = \frac { 2 } { 231 }\).

  The marks of 5 randomly chosen students from School \(A\) and 6 randomly chosen students from School \(B\), who took the same examination, achieving different marks, were ranked. The rankings are shown in the table.

    | Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | School | \(A\) | \(A\) | \(A\) | \(B\) | \(A\) | \(A\) | \(B\) | \(B\) | \(B\) | \(B\) | \(B\) |

  2. For a Wilcoxon rank-sum test, obtain the exact smallest significance level for which there is evidence of a difference in performance at the two schools.
Edexcel S4 Q1
  1. A beach is divided into two areas \(A\) and \(B\). A random sample of pebbles is taken from each of the two areas and the length of each pebble is measured. A sample of size 26 is taken from area \(A\) and the unbiased estimate for the population variance is \(s _ { A } ^ { 2 } = 0.495 \mathrm {~mm} ^ { 2 }\). A sample of size 25 is taken from area \(B\) and the unbiased estimate for the population variance is \(s _ { B } ^ { 2 } = 1.04 \mathrm {~mm} ^ { 2 }\).
    1. Stating your hypotheses clearly, test, at the \(10 \%\) significance level, whether or not there is a difference in variability of pebble length between area \(A\) and area \(B\).
    2. State the assumption you have made about the populations of pebble lengths in order to carry out the test.
      (1)
    2. A random sample of 10 mustard plants had the following heights, in mm, after 4 days' growth.
    $$5.0,4.5,4.8,5.2,4.3,5.1,5.2,4.9,5.1,5.0$$
    Those grown previously had a mean height of 5.1 mm after 4 days. Using a \(2.5 \%\) significance level, test whether or not the mean height of these plants is less than that of those grown previously.
    (You may assume that the height of mustard plants after 4 days follows a normal distribution.)
    (9)
    3. A train company claims that the probability \(p\) of one of its trains arriving late is \(10 \%\). A regular traveller on the company's trains believes that the probability is greater than \(10 \%\) and decides to test this by randomly selecting 12 trains and recording the number \(X\) of trains that were late. The traveller sets up the hypotheses \(\mathrm { H } _ { 0 } : p = 0.1\) and \(\mathrm { H } _ { 1 } : p > 0.1\) and accepts the null hypothesis if \(x \leq 2\).
  2. Find the size of the test.
  3. Show that the power function of the test is $$1 - ( 1 - p ) ^ { 10 } \left( 1 + 10 p + 55 p ^ { 2 } \right)$$
  4. Calculate the power of the test when
    1. \(p = 0.2\),
    2. \(p = 0.6\).
  5. Comment on your results from part (c).
    4. A random sample of 15 tomatoes is taken and the weight \(x\) grams of each tomato is found. The results are summarised by \(\sum x = 208\) and \(\sum x ^ { 2 } = 2962\).
  6. Assuming that the weights of the tomatoes are normally distributed, calculate the \(90 \%\) confidence interval for the variance \(\sigma ^ { 2 }\) of the weights of the tomatoes.
  7. State with a reason whether or not the confidence interval supports the assertion \(\sigma ^ { 2 } = 3\).
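The arithmetic for the tomato-weight interval can be sketched as follows. The interval used is the standard one for a normal variance, \(\left( \frac { ( n - 1 ) s ^ { 2 } } { \chi ^ { 2 } _ { \text{upper} } } , \frac { ( n - 1 ) s ^ { 2 } } { \chi ^ { 2 } _ { \text{lower} } } \right)\). The two \(\chi ^ { 2 } _ { 14 }\) percentage points are copied from standard statistical tables rather than computed, and the variable names are illustrative.

```python
# Summary statistics given in the question.
n, sum_x, sum_x2 = 15, 208, 2962

# Unbiased estimate of the variance.
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)

# Chi-squared(14) percentage points for a 90% interval (values from standard tables).
chi2_upper_5pc = 23.685   # exceeded with probability 0.05
chi2_lower_5pc = 6.571    # exceeded with probability 0.95

lower = (n - 1) * s2 / chi2_upper_5pc
upper = (n - 1) * s2 / chi2_lower_5pc

print(f"s^2 = {s2:.4f}")
print(f"90% CI for sigma^2: ({lower:.3f}, {upper:.3f})")
print("sigma^2 = 3 inside interval:", lower < 3 < upper)
```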
    5. (a) Define
    1. a Type I error,
    2. a Type II error.

    A small aviary, that leaves the eggs with the parent birds, rears chicks at an average rate of 5 per year. In order to increase the number of chicks reared per year it is decided to remove the eggs from the aviary as soon as they are laid and put them in an incubator. At the end of the first year of using an incubator 7 chicks had been successfully reared.
  8. Assuming that the number of chicks reared per year follows a Poisson distribution, test, at the \(5 \%\) significance level, whether or not there is evidence of an increase in the number of chicks reared per year. State your hypotheses clearly.
  9. Calculate the probability of the Type I error for this test.
  10. Given that the true average number of chicks reared per year when the eggs are hatched in an incubator is 8, calculate the probability of a Type II error.
    6. A random sample of three independent variables \(X _ { 1 } , X _ { 2 }\) and \(X _ { 3 }\) is taken from a distribution with mean \(\mu\) and variance \(\sigma ^ { 2 }\).
  11. Show that \(\frac { 2 } { 3 } X _ { 1 } - \frac { 1 } { 2 } X _ { 2 } + \frac { 5 } { 6 } X _ { 3 }\) is an unbiased estimator for \(\mu\).

    An unbiased estimator for \(\mu\) is given by \(\hat { \mu } = a X _ { 1 } + b X _ { 2 }\) where \(a\) and \(b\) are constants.
  12. Show that \(\operatorname { Var } ( \hat { \mu } ) = \left( 2 a ^ { 2 } - 2 a + 1 \right) \sigma ^ { 2 }\).
  13. Hence determine the value of \(a\) and the value of \(b\) for which \(\hat { \mu }\) has minimum variance.
    7. Two methods of extracting juice from an orange are to be compared. Eight oranges are halved. One half of each orange is chosen at random and allocated to Method \(A\) and the other half is allocated to Method \(B\). The amounts of juice extracted, in ml, are given in the table.

    | Orange | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
    |---|---|---|---|---|---|---|---|---|
    | Method \(A\) | 29 | 30 | 26 | 25 | 26 | 22 | 23 | 28 |
    | Method \(B\) | 27 | 25 | 28 | 24 | 23 | 26 | 22 | 25 |

    One statistician suggests performing a two-sample \(t\)-test to investigate whether or not there is a difference between the mean amounts of juice extracted by the two methods.
  14. Stating your hypotheses clearly and using a \(5 \%\) significance level, carry out this test.
    (You may assume \(\bar { x } _ { A } = 26.125 , s _ { A } ^ { 2 } = 7.84 , \bar { x } _ { B } = 25 , s _ { B } ^ { 2 } = 4\) and \(\sigma _ { A } ^ { 2 } = \sigma _ { B } ^ { 2 }\).)

    Another statistician suggests analysing these data using a paired \(t\)-test.
  15. Using a \(5 \%\) significance level, carry out this test.
  16. State which of these two tests you consider to be more appropriate. Give a reason for your choice.
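The two test statistics for the orange-juice data can be computed side by side. The sketch below uses the juice amounts from the table, the pooled-variance two-sample statistic and the paired statistic on the differences \(d = A - B\). The two-tailed \(5 \%\) critical values for \(t _ { 14 }\) and \(t _ { 7 }\) are copied from standard tables, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

method_a = [29, 30, 26, 25, 26, 22, 23, 28]
method_b = [27, 25, 28, 24, 23, 26, 22, 25]
n = len(method_a)

# Two-sample t-test with pooled variance (equal population variances assumed).
pooled_var = ((n - 1) * variance(method_a) + (n - 1) * variance(method_b)) / (2 * n - 2)
t_two_sample = (mean(method_a) - mean(method_b)) / sqrt(pooled_var * (1 / n + 1 / n))

# Paired t-test on the within-orange differences.
diffs = [a - b for a, b in zip(method_a, method_b)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))

# Two-tailed 5% critical values from standard t tables.
T_CRIT_14, T_CRIT_7 = 2.145, 2.365

print(f"two-sample t = {t_two_sample:.3f}, reject H0: {abs(t_two_sample) > T_CRIT_14}")
print(f"paired t     = {t_paired:.3f}, reject H0: {abs(t_paired) > T_CRIT_7}")
```

The paired version works with 7 degrees of freedom instead of 14, but it removes orange-to-orange variation from the comparison, which is the point at issue in the final part.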
    (1)

    END

    Advanced/Advanced Subsidiary
    Wednesday 16 June 2004 - Afternoon
    Time: 1 hour 30 minutes

    Answer Book (AB16)
    Nil
    Graph Paper (ASG2)
    Mathematical Formulae (Lilac)

    Candidates may use any calculator EXCEPT those with the facility for symbolic algebra, differentiation and/or integration. Thus candidates may NOT use calculators such as the Texas Instruments TI 89, TI 92, Casio CFX 9970G, Hewlett Packard HP 48G.
    In the boxes on the answer book, write the name of the examining body (Edexcel), your centre number, candidate number, the unit title (Statistics S4), the paper reference (6686), your surname, other name and signature.
    Values from the statistical tables should be quoted in full. When a calculator is used, the answer should be given to an appropriate degree of accuracy. A booklet 'Mathematical Formulae and Statistical Tables' is provided.
    Full marks may be obtained for answers to ALL questions.
    This paper has seven questions. You must ensure that your answers to parts of questions are clearly labelled.
    You must show sufficient working to make your methods clear to the Examiner. Answers without working may gain no credit.
    1. The random variable \(X\) has an \(F\)-distribution with 8 and 12 degrees of freedom.
    Find \(\mathrm { P } \left( \frac { 1 } { 5.67 } < X < 2.85 \right)\).
Edexcel S4 Q3
3. A train company claims that the probability \(p\) of one of its trains arriving late is \(10 \%\). A regular traveller on the company's trains believes that the probability is greater than \(10 \%\) and decides to test this by randomly selecting 12 trains and recording the number \(X\) of trains that were late. The traveller sets up the hypotheses \(\mathrm { H } _ { 0 } : p = 0.1\) and \(\mathrm { H } _ { 1 } : p > 0.1\) and accepts the null hypothesis if \(x \leq 2\).
  1. Find the size of the test.
  2. Show that the power function of the test is $$1 - ( 1 - p ) ^ { 10 } \left( 1 + 10 p + 55 p ^ { 2 } \right)$$
  3. Calculate the power of the test when
    1. \(p = 0.2\),
    2. \(p = 0.6\).
  4. Comment on your results from part (c).
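The closed form in the second part can be checked against a direct binomial sum. The sketch below assumes \(X \sim \mathrm { B } ( 12 , p )\) and that \(\mathrm { H } _ { 0 }\) is rejected when \(x \geq 3\); the function names are illustrative.

```python
from math import comb


def power_direct(p, n=12, accept_up_to=2):
    """P(reject H0) = 1 - P(X <= accept_up_to) for X ~ B(n, p)."""
    accept = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(accept_up_to + 1))
    return 1 - accept


def power_closed_form(p):
    """Power function quoted in the question."""
    return 1 - (1 - p) ** 10 * (1 + 10 * p + 55 * p**2)


size = power_direct(0.1)  # the size is the power function evaluated at p = 0.1
print(f"size = {size:.4f}")
for p in (0.2, 0.6):
    print(f"p = {p}: direct = {power_direct(p):.4f}, closed form = {power_closed_form(p):.4f}")
```

The two functions agree because \(( 1 - p ) ^ { 2 } + 12 p ( 1 - p ) + 66 p ^ { 2 } = 1 + 10 p + 55 p ^ { 2 }\), which is the factor left after taking \(( 1 - p ) ^ { 10 }\) out of \(\mathrm { P } ( X \leq 2 )\).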
Edexcel S4 Q5
5. (a) Define
  1. a Type I error,
  2. a Type II error.

    A small aviary, that leaves the eggs with the parent birds, rears chicks at an average rate of 5 per year. In order to increase the number of chicks reared per year it is decided to remove the eggs from the aviary as soon as they are laid and put them in an incubator. At the end of the first year of using an incubator 7 chicks had been successfully reared.
    (b) Assuming that the number of chicks reared per year follows a Poisson distribution, test, at the \(5 \%\) significance level, whether or not there is evidence of an increase in the number of chicks reared per year. State your hypotheses clearly.
    (c) Calculate the probability of the Type I error for this test.
    (d) Given that the true average number of chicks reared per year when the eggs are hatched in an incubator is 8, calculate the probability of a Type II error.
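Parts (b) to (d) can be checked numerically. The sketch below assumes the yearly count is \(X \sim \mathrm { Po } ( \lambda )\) with \(\mathrm { H } _ { 0 } : \lambda = 5\) against \(\mathrm { H } _ { 1 } : \lambda > 5\), and takes the critical region to be \(X \geq c\) for the smallest \(c\) whose tail probability does not exceed 0.05. The helper `poisson_cdf` and the variable names are illustrative.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


LAMBDA_H0, LAMBDA_TRUE, OBSERVED = 5, 8, 7

# Smallest c with P(X >= c | lambda = 5) <= 0.05 defines the critical region X >= c.
critical_value = next(c for c in range(1, 100) if 1 - poisson_cdf(c - 1, LAMBDA_H0) <= 0.05)

p_type_i = 1 - poisson_cdf(critical_value - 1, LAMBDA_H0)   # reject H0 when it is true
p_type_ii = poisson_cdf(critical_value - 1, LAMBDA_TRUE)    # fail to reject when lambda = 8

print(f"critical region: X >= {critical_value}")
print(f"observed {OBSERVED} in critical region: {OBSERVED >= critical_value}")
print(f"P(Type I) = {p_type_i:.4f}, P(Type II) = {p_type_ii:.4f}")
```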
Edexcel S4 Q7
7. Two methods of extracting juice from an orange are to be compared. Eight oranges are halved. One half of each orange is chosen at random and allocated to Method \(A\) and the other half is allocated to Method \(B\). The amounts of juice extracted, in ml , are given in the table. The lengths of components produced by the machines can be assumed to follow normal distributions.
  1. Use a two tail test to show, at the \(10 \%\) significance level, that the variances of the lengths of components produced by each machine can be assumed to be equal.
  2. Showing your working clearly, find a \(95 \%\) confidence interval for \(\mu _ { B } - \mu _ { A }\), where \(\mu _ { A }\) and \(\mu _ { B }\) are the mean lengths of the populations of components produced by machine \(A\) and machine \(B\) respectively.

    There are serious consequences for the production at the factory if the difference in mean lengths of the components produced by the two machines is more than 0.7 cm.
  3. State, giving your reason, whether or not the factory manager should be concerned.
    5. Rolls of cloth delivered to a factory contain defects at an average rate of \(\lambda\) per metre. A quality assurance manager selects a random sample of 15 metres of cloth from each delivery to test whether or not there is evidence that \(\lambda > 0.3\). The criterion that the manager uses for rejecting the hypothesis that \(\lambda = 0.3\) is that there are 9 or more defects in the sample.
  4. Find the size of the test.

    Table 1 gives some values, to 2 decimal places, of the power function of this test.
  5. Use a paired \(t\)-test to determine, at the \(10 \%\) level of significance, whether or not there is a difference in the mean blood pressure measured using the two methods. State your hypotheses clearly.
  6. State an assumption about the underlying distribution of measured blood pressure required for this test.
    2. The value of orders, in \(\pounds\), made to a firm over the internet has distribution \(\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)\). A random sample of \(n\) orders is taken and \(\bar { X }\) denotes the sample mean.
  7. Write down the mean and variance of \(\bar { X }\) in terms of \(\mu\) and \(\sigma ^ { 2 }\).

    A second sample of \(m\) orders is taken and \(\bar { Y }\) denotes the mean of this sample.
    An estimator of the population mean is given by $$U = \frac { n \bar { X } + m \bar { Y } } { n + m }$$
  8. Show that \(U\) is an unbiased estimator for \(\mu\).
  9. Show that the variance of \(U\) is \(\frac { \sigma ^ { 2 } } { n + m }\).
  10. State which of \(\bar { X }\) or \(U\) is a better estimator for \(\mu\). Give a reason for your answer.
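The variance result for \(U\) can be checked with exact arithmetic. The sketch below assumes the two samples are independent, so that \(\operatorname { Var } ( U ) = \left( \frac { n } { n + m } \right) ^ { 2 } \frac { \sigma ^ { 2 } } { n } + \left( \frac { m } { n + m } \right) ^ { 2 } \frac { \sigma ^ { 2 } } { m }\). The function name `var_u` and the trial values of \(n\), \(m\) and \(\sigma ^ { 2 }\) are illustrative.

```python
from fractions import Fraction


def var_u(n, m, sigma2):
    """Variance of U = (n*Xbar + m*Ybar) / (n + m) for independent samples."""
    var_xbar = Fraction(sigma2) / n
    var_ybar = Fraction(sigma2) / m
    weight_x = Fraction(n, n + m)
    weight_y = Fraction(m, n + m)
    return weight_x**2 * var_xbar + weight_y**2 * var_ybar


# Illustrative values: n = 8, m = 12, sigma^2 = 5.
n, m, sigma2 = 8, 12, 5
print("Var(U)    =", var_u(n, m, sigma2))
print("Var(Xbar) =", Fraction(sigma2, n))
```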
    3. The lengths, \(x \mathrm {~mm}\), of the forewings of a random sample of male and female adult butterflies are measured. The following statistics are obtained from the data.
  11. Stating your hypotheses clearly, and using a \(10 \%\) level of significance, test whether or not there is evidence of a difference between the variances of the marks of the two groups.
  12. State clearly an assumption you have made to enable you to carry out the test in part (a).
  13. Use a two tailed test, with a \(5 \%\) level of significance, to determine if the playing of music during the test has made any difference in the mean marks of the two groups. State your hypotheses clearly.
  14. Write down what you can conclude about the effect of music on a student's performance during the test.
    3. The weights, in grams, of mice are normally distributed. A biologist takes a random sample of 10 mice. She weighs each mouse and records its weight. The ten mice are then fed on a special diet. They are weighed again after two weeks.
    Their weights in grams are as follows:
  15. State an assumption that needs to be made in order to carry out a \(t\)-test in this case.
  16. State why a paired \(t\)-test is suitable for use with these data.
  17. Using a \(5 \%\) level of significance, test whether or not there is evidence that the device reduces \(\mathrm { CO } _ { 2 }\) emissions from cars.
  18. Explain, in context, what a type II error would be in this case.
    3. Define, in terms of \(\mathrm { H } _ { 0 }\) and/or \(\mathrm { H } _ { 1 }\),
  19. the size of a hypothesis test,
  20. the power of a hypothesis test.

    The probability of getting a head when a coin is tossed is denoted by \(p\). This coin is tossed 12 times in order to test the hypotheses \(\mathrm { H } _ { 0 } : p = 0.5\) against \(\mathrm { H } _ { 1 } : p \neq 0.5\), using a \(5 \%\) level of significance.
  21. Find the largest critical region for this test, such that the probability in each tail is less than \(2.5 \%\).
  22. Given that \(p = 0.4\)
    1. find the probability of a type II error when using this test,
    2. find the power of this test.
  23. Suggest two ways in which the power of the test can be increased.
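The critical region and the error probabilities for the coin test can be found by direct enumeration. The sketch below assumes the number of heads is \(X \sim \mathrm { B } ( 12 , p )\) and searches each tail separately for the largest region whose probability under \(p = 0.5\) stays below 0.025. The helper names are illustrative.

```python
from math import comb

N = 12


def pmf(k, p):
    """P(X = k) for X ~ B(12, p)."""
    return comb(N, k) * p**k * (1 - p) ** (N - k)


def lower_tail(c, p):
    return sum(pmf(k, p) for k in range(c + 1))


def upper_tail(c, p):
    return sum(pmf(k, p) for k in range(c, N + 1))


# Largest lower region X <= lower_c and upper region X >= upper_c, each below 2.5% under H0.
lower_c = max(c for c in range(N + 1) if lower_tail(c, 0.5) < 0.025)
upper_c = min(c for c in range(N + 1) if upper_tail(c, 0.5) < 0.025)

# Power and Type II error probability when p = 0.4.
power = lower_tail(lower_c, 0.4) + upper_tail(upper_c, 0.4)
p_type_ii = 1 - power

print(f"critical region: X <= {lower_c} or X >= {upper_c}")
print(f"power at p = 0.4: {power:.4f}, P(Type II) = {p_type_ii:.4f}")
```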
    4. A farmer set up a trial to assess whether adding water to dry feed increases the milk yield of his cows. He randomly selected 22 cows. Thirteen of the cows were given dry feed and the other 9 cows were given the feed with water added. The milk yields, in litres per day, were recorded with the following results. You may assume that the times taken to complete the task by the students are two independent random samples from normal distributions.
  24. Stating your hypotheses clearly, test, at the \(10 \%\) level of significance, whether or not the variances of the times taken to complete the task with and without background music are equal.
  25. Find a \(99 \%\) confidence interval for the difference in the mean times taken to complete the task with and without background music.

    Experiments like this are often performed using the same people in each group.
  26. Explain why this would not be appropriate in this case.
    2. As part of an investigation, a random sample of 10 people had their heart rate, in beats per minute, measured whilst standing up and whilst lying down. The results are summarized below. Stating your hypotheses clearly, test, at the \(10 \%\) level of significance, whether or not the mean amount of juice produced by machine \(B\) is more than the mean amount produced by machine \(A\).
    4. A proportion \(p\) of letters sent by a company are incorrectly addressed and if \(p\) is thought to be greater than 0.05 then action is taken. Using \(\mathrm { H } _ { 0 } : p = 0.05\) and \(\mathrm { H } _ { 1 } : p > 0.05\), a manager from the company takes a random sample of 40 letters and rejects \(\mathrm { H } _ { 0 }\) if the number of incorrectly addressed letters is more than 3.
  27. Find the size of this test.
  28. Find the probability of a Type II error in the case where \(p\) is in fact 0.10.

    Table 1 below gives some values, to 2 decimal places, of the power function of this test.

    The student decides to carry out a paired \(t\)-test to investigate whether, on average, the blood pressure of a person when sitting down is more than their blood pressure after standing up.
  29. State clearly the hypotheses that should be used and any necessary assumption that needs to be made.
  30. Carry out the test at the \(1 \%\) level of significance.
    2. A biologist investigating the shell size of turtles takes random samples of adult female and adult male turtles and records the length, \(x \mathrm {~cm}\), of the shell. The results are summarised below.

    Assuming that the scores are normally distributed and stating your hypotheses clearly, test at the \(5 \%\) level of significance whether or not there is evidence to support the teacher's belief.
    (8)
    6. A machine fills bottles with water. The amount of water in each bottle is normally distributed. To check the machine is working properly, a random sample of 12 bottles is selected and the amount of water, in ml, in each bottle is recorded. Unbiased estimates for the mean and variance are $$\mu = 502 \quad s ^ { 2 } = 5.6$$
    Stating your hypotheses clearly, test at the \(1 \%\) level of significance
  31. whether or not the mean amount of water in a bottle is more than 500 ml ,
  32. whether or not the standard deviation of the amount of water in a bottle is less than 3 ml .
    7. A machine produces bricks. The lengths, \(x \mathrm {~mm}\), of the bricks are distributed \(\mathrm { N } \left( \mu , 2 ^ { 2 } \right)\). At the start of each week a random sample of \(n\) bricks is taken to check the machine is working correctly.
    A test is then carried out at the \(1 \%\) level of significance with $$\mathrm { H } _ { 0 } : \mu = 202 \quad \text { and } \quad \mathrm { H } _ { 1 } : \mu < 202$$
  33. Find, in terms of \(n\), the critical region of the test.

    The probability of a type II error, when \(\mu = 200\), is less than 0.05.
  34. Find the minimum value of \(n\).
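The minimum sample size can be found by a direct search. The sketch below assumes \(\bar { X } \sim \mathrm { N } \left( \mu , \frac { 2 ^ { 2 } } { n } \right)\) and a critical region of the form \(\bar { X } < 202 - z \frac { 2 } { \sqrt { n } }\), where \(z\) is the upper \(1 \%\) point of the standard normal distribution. The function names and the search limit of 1000 are illustrative.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, MU_H0, MU_TRUE = 2, 202, 200
Z_ALPHA = NormalDist().inv_cdf(0.99)  # upper 1% point of N(0, 1)


def critical_value(n):
    """H0 is rejected when the sample mean falls below this value."""
    return MU_H0 - Z_ALPHA * SIGMA / sqrt(n)


def p_type_ii(n):
    """P(sample mean >= critical value) when the true mean is 200."""
    sampling_dist = NormalDist(MU_TRUE, SIGMA / sqrt(n))
    return 1 - sampling_dist.cdf(critical_value(n))


min_n = next(n for n in range(1, 1000) if p_type_ii(n) < 0.05)
print(f"minimum n = {min_n}, P(Type II) = {p_type_ii(min_n):.4f}")
```

Algebraically the condition reduces to \(\sqrt { n } - z > 1.6449\), so the search confirms the result obtained by hand.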
Edexcel S4 Q8
8. A random sample \(W _ { 1 } , W _ { 2 } \ldots , W _ { n }\) is taken from a distribution with mean \(\mu\) and variance \(\sigma ^ { 2 }\).
  1. Write down \(\mathrm { E } \left( \sum _ { i = 1 } ^ { n } W _ { i } \right)\) and show that \(\mathrm { E } \left( \sum _ { i = 1 } ^ { n } W _ { i } ^ { 2 } \right) = n \left( \sigma ^ { 2 } + \mu ^ { 2 } \right)\).

  An estimator for \(\mu\) is $$\bar { X } = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i }$$

  2. Show that \(\bar { X }\) is a consistent estimator for \(\mu\).

  An estimator of \(\sigma ^ { 2 }\) is $$U = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i } ^ { 2 } - \left( \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i } \right) ^ { 2 }$$

  3. Find the bias of \(U\).
  4. Write down an unbiased estimator of \(\sigma ^ { 2 }\) in the form \(k U\), where \(k\) is in terms of \(n\).

    Advanced/Advanced Subsidiary
    Friday 21 June 2013 - Morning
    Mathematical Formulae (Pink)
    Nil
    Candidates may use any calculator allowed by the regulations of the Joint Council for Qualifications. Calculators must not have the facility for symbolic algebra manipulation or symbolic differentiation/integration, or have retrievable mathematical formulae stored in them.
    In the boxes above, write your centre number, candidate number, your surname, initials and signature. Check that you have the correct question paper.
    Answer ALL the questions.
    You must write your answer for each question in the space following the question.
    Values from the statistical tables should be quoted in full. When a calculator is used, the answer should be given to an appropriate degree of accuracy. A booklet 'Mathematical Formulae and Statistical Tables' is provided.
    Full marks may be obtained for answers to ALL questions.
    The marks for the parts of questions are shown in round brackets, e.g. (2).
    There are 6 questions in this question paper. The total mark for this paper is 75.
    There are 20 pages in this question paper. Any blank pages are indicated. You must ensure that your answers to parts of questions are clearly labelled.
    You must show sufficient working to make your methods clear to the Examiner.
    Answers without working may not gain full credit.
    1. George owns a garage and he records the mileage of cars, \(x\) thousands of miles, between services. The results from a random sample of 10 cars are summarised below.
    $$\sum x = 113.4 \quad \sum x ^ { 2 } = 1414.08$$
    The mileage of cars between services is normally distributed and George believes that the standard deviation is 2.4 thousand miles. Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether or not these data support George's belief.
    2. Every 6 months some engineers are tested to see if their times, in minutes, to assemble a particular component have changed. The times taken to assemble the component are normally distributed. A random sample of 8 engineers was chosen and their times to assemble the component were recorded in January and in July. The data are given in the table below.
  5. Using a suitable test, at the \(5 \%\) level of significance, state whether or not, on the basis of this trial, you would recommend using the new medicine. State your hypotheses clearly.
  6. State an assumption needed to carry out this test.
    2. The cloth produced by a certain manufacturer has defects that occur randomly at a constant rate of \(\lambda\) per square metre. If \(\lambda\) is thought to be greater than 1.5 then action has to be taken. Using \(\mathrm { H } _ { 0 } : \lambda = 1.5\) and \(\mathrm { H } _ { 1 } : \lambda > 1.5\) a quality control officer takes a \(4 \mathrm {~m} ^ { 2 }\) sample of cloth and rejects \(\mathrm { H } _ { 0 }\) if there are 11 or more defects. If there are 8 or fewer defects she accepts \(\mathrm { H } _ { 0 }\). If there are 9 or 10 defects a second sample of \(4 \mathrm {~m} ^ { 2 }\) is taken and \(\mathrm { H } _ { 0 }\) is rejected if there are 11 or more defects in this second sample, otherwise it is accepted.
  7. Find the size of this test.
  8. Find the power of this test when \(\lambda = 2\).
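The rejection probability of this two-stage test can be sketched as follows. The code assumes the number of defects in a \(4 \mathrm {~m} ^ { 2 }\) sample is \(\mathrm { Po } ( 4 \lambda )\) and that the second sample is independent of the first, so that \(\mathrm { P } ( \text{reject} ) = \mathrm { P } ( X \geq 11 ) + \mathrm { P } ( 9 \leq X \leq 10 ) \, \mathrm { P } ( X \geq 11 )\). The function names are illustrative.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def reject_probability(rate_per_m2, area=4):
    """P(H0 is rejected) for the two-stage rule described in the question."""
    lam = rate_per_m2 * area
    p_reject_one_sample = 1 - poisson_cdf(10, lam)            # 11 or more defects
    p_second_sample = poisson_cdf(10, lam) - poisson_cdf(8, lam)  # 9 or 10 defects
    return p_reject_one_sample + p_second_sample * p_reject_one_sample


size = reject_probability(1.5)   # rejection probability under H0
power = reject_probability(2)    # rejection probability when lambda = 2

print(f"size = {size:.4f}, power at lambda = 2: {power:.4f}")
```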
    3. A farmer is investigating the milk yields of two breeds of cow. He takes a random sample of 9 cows of breed \(A\) and an independent random sample of 12 cows of breed \(B\). For a 5 day period he measures the amount of milk, \(x\) gallons, produced by each cow. The results are summarised in the table below.
  9. State one assumption that needs to be made in order to carry out a paired \(t\)-test.
  10. Stating your hypotheses clearly, test, at the \(1 \%\) level of significance, whether or not the drug increases the mean number of hours of sleep per night by more than 10 minutes. State the critical value for this test.
    5. A statistician believes a coin is biased and the probability, \(p\), of getting a head when the coin is tossed is less than 0.5. The statistician decides to test this by tossing the coin 10 times and recording the number, \(X\), of heads. He sets up the hypotheses \(\mathrm { H } _ { 0 } : p = 0.5\) and \(\mathrm { H } _ { 1 } : p < 0.5\) and rejects the null hypothesis if \(x < 3\).
  11. Find the size of the test.
  12. Show that the power function of this test is $$( 1 - p ) ^ { 8 } \left( 36 p ^ { 2 } + 8 p + 1 \right)$$

    Table 1 gives values, to 2 decimal places, of the power function for the statistician's test.
  13. On the axes below draw the graph of the power function for the statistician's test.
  14. Find the range of values of \(p\) for which the probability of accepting the coin as unbiased, when in fact it is biased, is less than or equal to 0.4 .
    (3)
    [Figure: axes provided for the graph of the power function]
    6. (a) Explain what is meant by the sampling distribution of an estimator \(T\) of the population parameter \(\theta\).
  15. Explain what you understand by the statement that \(T\) is a biased estimator of \(\theta\).

    A population has mean \(\mu\) and variance \(\sigma ^ { 2 }\).
    A random sample \(X _ { 1 } , X _ { 2 } , \ldots , X _ { 10 }\) is taken from this population.
  16. Calculate the bias of each of the following estimators of \(\mu\).
    $$\begin{aligned} & \hat { \mu } _ { 1 } = \frac { X _ { 3 } + X _ { 5 } + X _ { 7 } } { 3 } \\
    & \hat { \mu } _ { 2 } = \frac { 5 X _ { 1 } + 2 X _ { 2 } + X _ { 9 } } { 6 } \\
    & \hat { \mu } _ { 3 } = \frac { 3 X _ { 10 } - X _ { 1 } } { 3 } \end{aligned}$$
  17. Find the variance of each of these three estimators.
  18. State, giving a reason, which of these three estimators for \(\mu\) is
    1. the best estimator,
    2. the worst estimator.
    7. Two groups of students take the same examination. A random sample of students is taken from each of the groups.
    The marks of the 9 students from Group 1 are as follows
    $$\begin{array} { l l l l l l l l l } 30 & 29 & 35 & 27 & 23 & 33 & 33 & 35 & 28 \end{array}$$
    The marks, \(x\), of the 7 students from Group 2 gave the following statistics
    $$\bar { x } = 31.29 \quad s ^ { 2 } = 12.9$$
    A test is to be carried out to see whether or not there is a difference between the mean marks of the two groups of students. You may assume that the samples are taken from normally distributed populations and that they are independent.
  19. State one other assumption that must be made in order to apply this test and show that this assumption is reasonable by testing it at a \(10 \%\) level of significance. State your hypotheses clearly.
  20. Stating your hypotheses clearly, test, using a significance level of \(5 \%\), whether or not there is a difference between the mean marks of the two groups of students.
    \section*{TOTAL FOR PAPER: 75 MARKS} \section*{END}
    Paper reference: 6686
    Materials required for examination: Answer Book (AB16), Graph Paper (ASG2), Mathematical Formulae (Lilac)
    Items included with question papers: Nil
    Candidates may use any calculator EXCEPT those with the facility for symbolic algebra, differentiation and/or integration. Thus candidates may NOT use calculators such as the Texas Instruments TI 89, TI 92, Casio CFX 9970G, Hewlett Packard HP 48G.
    In the boxes on the answer book, write the name of the examining body (Edexcel), your centre number, candidate number, the unit title (Statistics S4), the paper reference (6686), your surname, other name and signature.
    Values from the statistical tables should be quoted in full. When a calculator is used, the answer should be given to an appropriate degree of accuracy. A booklet 'Mathematical Formulae and Statistical Tables' is provided.
    Full marks may be obtained for answers to ALL questions.
    This paper has seven questions. Pages 6, 7 and 8 are blank. You must ensure that your answers to parts of questions are clearly labelled.
    You must show sufficient working to make your methods clear to the Examiner. Answers without working may gain no credit.
    1. The random variable \(X\) has an \(F\) distribution with 10 and 12 degrees of freedom. Find \(a\) and \(b\) such that \(\mathrm { P } ( a < X < b ) = 0.90\).
    2. A chemist has developed a fuel additive and claims that it reduces the fuel consumption of cars. To test this claim, 8 randomly selected cars were each filled with 20 litres of fuel and driven around a race circuit. Each car was tested twice, once with the additive and once without. The distances, in miles, that each car travelled before running out of fuel are given in the table below.
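    $$\begin{array} { l c c c c c c c c } \text { Car } & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \text { Distance without additive } & 163 & 172 & 195 & 170 & 183 & 185 & 161 & 176 \\ \text { Distance with additive } & 168 & 185 & 187 & 172 & 180 & 189 & 172 & 175 \end{array}$$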
    Assuming that the distances travelled follow a normal distribution and stating your hypotheses clearly test, at the \(10 \%\) level of significance, whether or not there is evidence to support the chemist's claim.
    3. A technician is trying to estimate the area \(\mu ^ { 2 }\) of a metal square. The independent random variables \(X _ { 1 }\) and \(X _ { 2 }\) are each distributed \(\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)\) and represent two measurements of the sides of the square. Two estimators of the area, \(A _ { 1 }\) and \(A _ { 2 }\), are proposed where $$A _ { 1 } = X _ { 1 } X _ { 2 } \quad \text { and } \quad A _ { 2 } = \left( \frac { X _ { 1 } + X _ { 2 } } { 2 } \right) ^ { 2 } .$$ [You may assume that if \(X _ { 1 }\) and \(X _ { 2 }\) are independent random variables then $$\left. \mathrm { E } \left( X _ { 1 } X _ { 2 } \right) = \mathrm { E } \left( X _ { 1 } \right) \mathrm { E } \left( X _ { 2 } \right) \right]$$
  21. Find \(\mathrm { E } \left( A _ { 1 } \right)\) and show that \(\mathrm { E } \left( A _ { 2 } \right) = \mu ^ { 2 } + \frac { \sigma ^ { 2 } } { 2 }\).
  22. Find the bias of each of these estimators. The technician is told that \(\operatorname { Var } \left( A _ { 1 } \right) = \sigma ^ { 4 } + 2 \mu ^ { 2 } \sigma ^ { 2 }\) and \(\operatorname { Var } \left( A _ { 2 } \right) = \frac { 1 } { 2 } \sigma ^ { 4 } + 2 \mu ^ { 2 } \sigma ^ { 2 }\). The technician decided to use \(A _ { 1 }\) as the estimator for \(\mu ^ { 2 }\).
  23. Suggest a possible reason for this decision. A statistician suggests taking a random sample of \(n\) measurements of sides of the square and finding the mean \(\bar { X }\). He knows that \(\mathrm { E } \left( \bar { X } ^ { 2 } \right) = \mu ^ { 2 } + \frac { \sigma ^ { 2 } } { n }\) and \(\operatorname { Var } \left( \bar { X } ^ { 2 } \right) = \frac { 2 \sigma ^ { 4 } } { n ^ { 2 } } + \frac { 4 \sigma ^ { 2 } \mu ^ { 2 } } { n }\).
  24. Explain whether or not \(\bar { X } ^ { 2 }\) is a consistent estimator of \(\mu ^ { 2 }\).
    4. A recent census in the U.K. revealed that the heights of females in the U.K. have a mean of 160.9 cm . A doctor is studying the heights of female Indians in a remote region of South America. The doctor measured the height, \(x \mathrm {~cm}\), of each of a random sample of 30 female Indians and obtained the following statistics. $$\Sigma x = 4400.7 , \quad \Sigma \mathrm { x } ^ { 2 } = 646904.41 .$$ The heights of female Indians may be assumed to follow a normal distribution.
    The doctor presented the results of the study in a medical journal and wrote 'the female Indians in this region are more than 10 cm shorter than females in the U.K.'
  25. Stating your hypotheses clearly and using a \(5 \%\) level of significance, test the doctor's statement. The census also revealed that the standard deviation of the heights of U.K. females was 6.0 cm .
  26. Stating your hypotheses clearly test, at the \(5 \%\) level of significance, whether or not there is evidence that the variance of the heights of female Indians is different from that of females in the U.K.
    5. The times, \(x\) seconds, taken by the competitors in the 100 m freestyle events at a school swimming gala are recorded. The following statistics are obtained from the data.
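    $$\begin{array} { l c c c } & \text { No. of competitors } & \text { Sample mean } \bar { x } & \sum x ^ { 2 } \\ \text { Girls } & 8 & 83.10 & 55746 \\ \text { Boys } & 7 & 88.90 & 56130 \end{array}$$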
    Following the gala a proud parent claims that girls are faster swimmers than boys. Assuming that the times taken by the competitors are two independent random samples from normal distributions,
  27. test, at the \(10 \%\) level of significance, whether or not the variances of the two distributions are the same. State your hypotheses clearly.
  28. Stating your hypotheses clearly, test the parent's claim. Use a \(5 \%\) level of significance.
    6. A nutritionist studied the levels of cholesterol, \(X \mathrm { mg } / \mathrm { cm } ^ { 3 }\), of male students at a large college. She assumed that \(X\) was distributed \(\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)\) and examined a random sample of 25 male students. Using this sample she obtained unbiased estimates of \(\mu\) and \(\sigma ^ { 2 }\) as $$\hat { \mu } = 1.68 , \quad \hat { \sigma } ^ { 2 } = 1.79 .$$
  29. Find a 95\% confidence interval for \(\mu\).
  30. Obtain a \(95 \%\) confidence interval for \(\sigma ^ { 2 }\). A cholesterol reading of more than \(2.5 \mathrm { mg } / \mathrm { cm } ^ { 3 }\) is regarded as high.
  31. Use appropriate confidence limits from parts (a) and (b) to find the lowest estimate of the proportion of male students in the college with high cholesterol.
Edexcel S4 Q3
  1. The weights, in grams, of mice are normally distributed. A biologist takes a random sample of 10 mice. She weighs each mouse and records its weight.
The ten mice are then fed on a special diet. They are weighed again after two weeks.
Their weights in grams are as follows:
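$$\begin{array} { l c c c c c c c c c c } \text { Mouse } & A & B & C & D & E & F & G & H & I & J \\ \text { Weight before diet } & 50.0 & 48.3 & 47.5 & 54.0 & 38.9 & 42.7 & 50.1 & 46.8 & 40.3 & 41.2 \\ \text { Weight after diet } & 52.1 & 47.6 & 50.1 & 52.3 & 42.2 & 44.3 & 51.8 & 48.0 & 41.9 & 43.6 \end{array}$$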
Stating your hypotheses clearly, and using a \(1 \%\) level of significance, test whether or not the diet causes an increase in the mean weight of the mice.
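The mice question is a paired design, so one possible approach is a paired \(t\)-test on the differences (after minus before). The sketch below computes the test statistic from the data in the question. The critical value `T_CRIT` is an assumption taken from standard \(t\) tables (9 degrees of freedom, 1% one-tail) and is not stated in the question.

```python
from math import sqrt
from statistics import mean, stdev

before = [50.0, 48.3, 47.5, 54.0, 38.9, 42.7, 50.1, 46.8, 40.3, 41.2]
after = [52.1, 47.6, 50.1, 52.3, 42.2, 44.3, 51.8, 48.0, 41.9, 43.6]

# H0: mean difference = 0, H1: mean difference > 0 (diet increases weight).
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (n - 1 divisor)
t_stat = d_bar / (s_d / sqrt(n))

T_CRIT = 2.821  # assumed table value: t with 9 d.f., 1% one-tail
reject_h0 = t_stat > T_CRIT

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```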
Edexcel S4 Q5
5. A machine is filling bottles of milk. A random sample of 16 bottles was taken and the volume of milk in each bottle was measured and recorded. The volume of milk in a bottle is normally distributed and the unbiased estimate of the variance, \(s ^ { 2 }\), of the volume of milk in a bottle is 0.003
  1. Find a 95\% confidence interval for the variance of the population of volumes of milk from which the sample was taken. The machine should fill bottles so that the standard deviation of the volumes is equal to 0.07
  2. Comment on this with reference to your 95\% confidence interval.
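A confidence interval for a normal variance is usually built from \((n-1)s^2/\sigma^2 \sim \chi^2_{n-1}\). The sketch below applies that to the milk-bottle figures. The two \(\chi^2_{15}\) percentage points are assumptions taken from standard tables, not values given in the question.

```python
n = 16
s2 = 0.003  # unbiased estimate of the variance from the question

# Assumed table values for chi-squared with n - 1 = 15 degrees of freedom.
CHI2_UPPER = 27.488  # upper 2.5% point
CHI2_LOWER = 6.262   # lower 2.5% point

lower = (n - 1) * s2 / CHI2_UPPER
upper = (n - 1) * s2 / CHI2_LOWER

# The machine's target standard deviation of 0.07 corresponds to this variance.
target_variance = 0.07 ** 2
target_inside = lower < target_variance < upper

print(f"95% CI for variance: ({lower:.5f}, {upper:.5f})")
print(f"target variance {target_variance:.4f} inside interval: {target_inside}")
```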
Edexcel S4 Q7
  1. An engineering firm buys steel rods. The steel rods from its present supplier are known to have a mean tensile strength of \(230 \mathrm {~N} / \mathrm { mm } ^ { 2 }\).
A new supplier of steel rods offers to supply rods at a cheaper price than the present supplier. A random sample of ten rods from this new supplier gave tensile strengths, \(x \mathrm { N } / \mathrm { mm } ^ { 2 }\), which are summarised below.
  1. A company manufactures bolts with a mean diameter of 5 mm . The company wishes to check that the diameter of the bolts has not decreased. A random sample of 10 bolts is taken and the diameters, \(x \mathrm {~mm}\), of the bolts are measured. The results are summarised below.
$$\sum x = 49.1 \quad \sum x ^ { 2 } = 241.2$$ Using a \(1 \%\) level of significance, test whether or not the mean diameter of the bolts is less than 5 mm .
(You may assume that the diameter of the bolts follows a normal distribution.)
2. An emission-control device is tested to see if it reduces \(\mathrm { CO } _ { 2 }\) emissions from cars. The emissions from 6 randomly selected cars are measured with and without the device. The results are as follows.
  1. A teacher wishes to test whether playing background music enables students to complete a task more quickly. The same task was completed by 15 students, divided at random into two groups. The first group had background music playing during the task and the second group had no background music playing.
    The times taken, in minutes, to complete the task are summarised below.
(d) Find the value of \(s\).
The graph of the power function for the manager's test is shown in Figure 1.
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{a1841cf5-93f3-4043-b6ed-651168b13b87-34_1157_1436_847_260} \captionsetup{labelformat=empty} \caption{Figure 1}
\end{figure}
(e) On the same axes, draw the graph of the power function for the deputy's test.
(f) (i) State the value of \(p\) where these graphs intersect.
(ii) Compare the effectiveness of the two tests if \(p\) is greater than this value.
The deputy suggests that they should use his sampling method rather than the manager's.
(g) Give a reason why the manager might not agree to this change.
  1. A random sample of 15 strawberries is taken from a large field and the weight \(x\) grams of each strawberry is recorded. The results are summarised below.
$$\sum x = 291 \quad \sum x ^ { 2 } = 5968$$
Assume that the weights of strawberries are normally distributed. Calculate a 95\% confidence interval for
(a) (i) the mean of the weights of the strawberries in the field,
(ii) the variance of the weights of the strawberries in the field.
Strawberries weighing more than 23 g are considered to be less tasty.
(b) Use appropriate confidence limits from part (a) to find the highest estimate of the proportion of strawberries that are considered to be less tasty.
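The strawberry question chains three steps: a \(t\)-interval for the mean, a \(\chi^2\)-interval for the variance, and a normal tail probability using the upper limits of both. A minimal sketch follows. The table values `T_CRIT`, `CHI2_UPPER` and `CHI2_LOWER` (14 degrees of freedom) are assumptions from standard tables, and reading "highest estimate" as "largest mean with largest variance" is one interpretation of part (b).

```python
from math import sqrt
from statistics import NormalDist

n, sum_x, sum_x2 = 15, 291, 5968

# Assumed table values for 14 degrees of freedom.
T_CRIT = 2.145       # t, 2.5% one-tail
CHI2_UPPER = 26.119  # chi-squared, upper 2.5% point
CHI2_LOWER = 5.629   # chi-squared, lower 2.5% point

x_bar = sum_x / n
s_xx = sum_x2 - sum_x ** 2 / n  # corrected sum of squares
s2 = s_xx / (n - 1)             # unbiased variance estimate

half_width = T_CRIT * sqrt(s2 / n)
mean_ci = (x_bar - half_width, x_bar + half_width)
var_ci = (s_xx / CHI2_UPPER, s_xx / CHI2_LOWER)

# Upper confidence limits for both mean and variance give the largest P(X > 23).
highest_prop = 1 - NormalDist(mean_ci[1], sqrt(var_ci[1])).cdf(23)

print(f"mean CI: ({mean_ci[0]:.2f}, {mean_ci[1]:.2f})")
print(f"variance CI: ({var_ci[0]:.2f}, {var_ci[1]:.2f})")
print(f"highest estimated proportion above 23 g: {highest_prop:.3f}")
```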
  1. A car manufacturer claims that, on a motorway, the mean number of miles per gallon for the Panther car is more than 70 . To test this claim a car magazine measures the number of miles per gallon, \(x\), of each of a random sample of 20 Panther cars and obtained the following statistics.
$$\bar { x } = 71.2 \quad s = 3.4$$
The number of miles per gallon may be assumed to be normally distributed.
(a) Stating your hypotheses clearly and using a \(5 \%\) level of significance, test the manufacturer's claim.
The standard deviation of the number of miles per gallon for the Tiger car is 4.
(b) Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether or not there is evidence that the variance of the number of miles per gallon for the Panther car is different from that of the Tiger car.
  1. Faults occur in a roll of material at a rate of \(\lambda\) per \(\mathrm { m } ^ { 2 }\). To estimate \(\lambda\), three pieces of material of sizes \(3 \mathrm {~m} ^ { 2 } , 7 \mathrm {~m} ^ { 2 }\) and \(10 \mathrm {~m} ^ { 2 }\) are selected and the number of faults \(X _ { 1 } , X _ { 2 }\) and \(X _ { 3 }\) respectively are recorded.
The estimator \(\hat { \lambda }\), where $$\hat { \lambda } = k \left( X _ { 1 } + X _ { 2 } + X _ { 3 } \right)$$ is an unbiased estimator of \(\lambda\).
(a) Write down the distributions of \(X _ { 1 } , X _ { 2 }\) and \(X _ { 3 }\) and find the value of \(k\).
(b) Find \(\operatorname { Var } ( \hat { \lambda } )\).
A random sample of \(n\) pieces of this material, each of size \(4 \mathrm {~m} ^ { 2 }\), was taken. The number of faults on each piece, \(Y\), was recorded.
(c) Show that \(\frac { 1 } { 4 } \bar { Y }\) is an unbiased estimator of \(\lambda\).
(d) Find \(\operatorname { Var } \left( \frac { 1 } { 4 } \bar { Y } \right)\).
(e) Find the minimum value of \(n\) for which \(\frac { 1 } { 4 } \bar { Y }\) becomes a better estimator of \(\lambda\) than \(\hat { \lambda }\).
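For the faults question, a sketch of the comparison between the two unbiased estimators, assuming the fault counts are independent Poisson variables with means \(3\lambda\), \(7\lambda\), \(10\lambda\) (and \(4\lambda\) for each \(Y\)). Variances are stored as coefficients of \(\lambda\), and "better" is taken to mean "smaller variance".

```python
from fractions import Fraction

areas = [3, 7, 10]  # sizes in m^2 of the three pieces

# E(X1 + X2 + X3) = 20 * lambda, so unbiasedness needs k = 1/20.
k = Fraction(1, sum(areas))

# Var(X1 + X2 + X3) = 20 * lambda, so Var(lambda_hat) = k^2 * 20 * lambda.
var_lambda_hat = k ** 2 * sum(areas)  # coefficient of lambda


def var_quarter_ybar(n):
    """Coefficient of lambda in Var(Ybar / 4), where Var(Ybar) = 4 * lambda / n."""
    return Fraction(1, 16) * Fraction(4, n)


# Smallest n for which Ybar / 4 has strictly smaller variance than lambda_hat.
n_min = next(n for n in range(1, 100) if var_quarter_ybar(n) < var_lambda_hat)

print(f"k = {k}, Var(lambda_hat) = {var_lambda_hat} * lambda, minimum n = {n_min}")
```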
  1. Find the value of the constant \(a\) such that
$$\mathrm { P } \left( a < F _ { 8,10 } < 3.07 \right) = 0.94$$
2. Two independent random samples \(X _ { 1 } , X _ { 2 } , \ldots , X _ { 7 }\) and \(Y _ { 1 } , Y _ { 2 } , Y _ { 3 } , Y _ { 4 }\) were taken from different normal populations with a common standard deviation \(\sigma\). The following sample statistics were calculated. $$s _ { x } = 14.67 \quad s _ { y } = 12.07$$ Find the \(99 \%\) confidence interval for \(\sigma ^ { 2 }\) based on these two samples.
3. Manuel is planning to buy a new machine to squeeze oranges in his cafe and he has two models, at the same price, on trial. The manufacturers of machine \(B\) claim that their machine produces more juice from an orange than machine \(A\). To test this claim Manuel takes a random sample of 8 oranges, cuts them in half and puts one half in machine \(A\) and the other half in machine \(B\). The amount of juice, in ml, produced by each machine is given in the table below.
\section*{Table 1}
Figure 1 shows the graph of the power function of the test used by the consultant.
\includegraphics[max width=\textwidth, alt={}, center]{a1841cf5-93f3-4043-b6ed-651168b13b87-48_1722_1671_657_132}
\section*{Figure 1}
(e) On Figure 1 draw the graph of the power function of the manager's test.
(2)
(f) State, giving your reasons, which test you would recommend.
(2)
  1. The weights of the contents of breakfast cereal boxes are normally distributed.
A manufacturer changes the style of the boxes but claims that the weight of the contents remains the same.
A random sample of 6 old style boxes had contents with the following weights (in grams). $$\begin{array} { l l l l l l } 512 & 503 & 514 & 506 & 509 & 515 \end{array}$$ The weights, \(y\) grams, of the contents of an independent random sample of 5 new style boxes gave $$\bar { y } = 504.8 \text { and } s _ { y } = 3.420$$
(a) Use a two-tail test to show, at the \(10 \%\) level of significance, that the variances of the weights of the contents of the old and new style boxes can be assumed to be equal. State your hypotheses clearly.
(b) Showing your working clearly, find a \(90 \%\) confidence interval for \(\mu _ { x } - \mu _ { y }\), where \(\mu _ { x }\) and \(\mu _ { y }\) are the mean weights of the contents of old and new style boxes respectively.
(c) With reference to your confidence interval, comment on the manufacturer's claim.
6. A random sample \(X _ { 1 } , X _ { 2 } , \ldots , X _ { n }\) is taken from a population where each of the \(X _ { i }\) has a continuous uniform distribution over the interval \([ 0 , \beta ]\).
The random variable \(Y = \max \left\{ X _ { 1 } , X _ { 2 } , \ldots , X _ { n } \right\}\).
The probability density function of \(Y\) is given by $$\mathrm { f } ( y ) = \left\{ \begin{array} { c c } \frac { n } { \beta ^ { n } } y ^ { n - 1 } & 0 \leqslant y \leqslant \beta \\ 0 & \text { otherwise } \end{array} \right.$$
(a) Show that \(\mathrm { E } \left( Y ^ { m } \right) = \frac { n } { n + m } \beta ^ { m }\).
(b) Write down \(\mathrm { E } ( Y )\).
(c) Using your answers to parts (a) and (b), or otherwise, show that $$\operatorname { Var } ( Y ) = \frac { n } { ( n + 1 ) ^ { 2 } ( n + 2 ) } \beta ^ { 2 }$$
(d) State, giving your reasons, whether or not \(Y\) is a consistent estimator of \(\beta\).
The random variables \(M = 2 \bar { X }\), where \(\bar { X } = \frac { 1 } { n } \left( X _ { 1 } + X _ { 2 } + \ldots + X _ { n } \right)\), and \(S = k Y\), where \(k\) is a constant, are both unbiased estimators of \(\beta\).
(e) Find the value of \(k\) in terms of \(n\).
(f) State, giving your reasons, which of \(M\) and \(S\) is the better estimator of \(\beta\) in this case.
Five observations of \(X\) are: \(\quad \begin{array} { l l l l l } 8.5 & 6.3 & 5.4 & 9.1 & 7.6 \end{array}\)
(g) Calculate the better estimate of \(\beta\).
7. A machine produces components whose lengths are normally distributed with mean 102.3 mm and standard deviation 2.8 mm. After the machine had been serviced, a random sample of 20 components were tested to see if the mean and standard deviation had changed. The lengths, \(x \mathrm {~mm}\), of each of these 20 components are summarised as $$\sum x = 2072 \quad \sum x ^ { 2 } = 214856$$
(a) Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether or not there is evidence of a change in standard deviation.
(b) Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether or not the mean length of the components has changed from the original value of 102.3 mm using
(i) a normal distribution,
(ii) a \(t\) distribution.
(c) Comment on the mean length of components produced after the service in the light of the tests from part (a) and part (b). Give a reason for your answer.
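Question 7 above contrasts three statistics computed from the same summary data: a \(\chi^2\) statistic for the standard deviation, a \(z\) statistic that keeps \(\sigma = 2.8\), and a \(t\) statistic that uses the sample estimate. The sketch below computes all three. The critical values are assumptions taken from standard tables for two-tailed 5% tests with 19 degrees of freedom.

```python
from math import sqrt

n, sum_x, sum_x2 = 20, 2072, 214856
mu0, sigma0 = 102.3, 2.8  # values before the service

x_bar = sum_x / n
s_xx = sum_x2 - sum_x ** 2 / n  # corrected sum of squares
s2 = s_xx / (n - 1)             # unbiased variance estimate

chi2_stat = s_xx / sigma0 ** 2                  # (n - 1) s^2 / sigma0^2
z_stat = (x_bar - mu0) / (sigma0 / sqrt(n))     # sigma assumed unchanged
t_stat = (x_bar - mu0) / sqrt(s2 / n)           # sigma estimated from the sample

# Assumed two-tailed 5% table values.
CHI2_LOWER, CHI2_UPPER = 8.907, 32.852  # chi-squared, 19 d.f.
Z_CRIT = 1.96
T_CRIT = 2.093                          # t, 19 d.f.

sd_changed = not (CHI2_LOWER < chi2_stat < CHI2_UPPER)
mean_changed_z = abs(z_stat) > Z_CRIT
mean_changed_t = abs(t_stat) > T_CRIT

print(f"chi2 = {chi2_stat:.2f}, z = {z_stat:.3f}, t = {t_stat:.3f}")
print(sd_changed, mean_changed_z, mean_changed_t)
```

Under these assumed table values the \(z\) and \(t\) tests reach different conclusions, which is the tension part (c) asks about.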
  1. A medical student is investigating whether there is a difference in a person's blood pressure when sitting down and after standing up. She takes a random sample of 12 people and measures their blood pressure, in mmHg , when sitting down and after standing up.
The results are shown below.
Edexcel S4 Q8
  1. A random sample \(W _ { 1 } , W _ { 2 } \ldots , W _ { n }\) is taken from a distribution with mean \(\mu\) and variance \(\sigma ^ { 2 }\)
    1. Write down \(\mathrm { E } \left( \sum _ { i = 1 } ^ { n } W _ { i } \right)\) and show that \(\mathrm { E } \left( \sum _ { i = 1 } ^ { n } W _ { i } ^ { 2 } \right) = n \left( \sigma ^ { 2 } + \mu ^ { 2 } \right)\)
    An estimator for \(\mu\) is $$\bar { X } = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i }$$
  2. Show that \(\bar { X }\) is a consistent estimator for \(\mu\). An estimator of \(\sigma ^ { 2 }\) is $$U = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i } ^ { 2 } - \left( \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i } \right) ^ { 2 }$$
  3. Find the bias of \(U\).
  4. Write down an unbiased estimator of \(\sigma ^ { 2 }\) in the form \(k U\), where \(k\) is in terms of \(n\).
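The bias of \(U\) can be checked numerically without simulation by enumerating every possible sample from a small population. The sketch below does this exactly with fractions. The population `values` and the sample size `n = 3` are hypothetical choices made only for illustration; the algebraic argument in the question holds for any distribution with finite variance.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 2]  # hypothetical population, each value equally likely
n = 3               # hypothetical sample size

mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)


def u_stat(sample):
    """U = (1/n) * sum(W_i^2) - ((1/n) * sum(W_i))^2."""
    m = len(sample)
    return Fraction(sum(w * w for w in sample), m) - Fraction(sum(sample), m) ** 2


# All samples of size n are equally likely, so E(U) is a plain average.
samples = list(product(values, repeat=n))
expected_u = sum(u_stat(s) for s in samples) / len(samples)

bias = expected_u - sigma2
k = sigma2 / expected_u  # multiplier that makes k * U unbiased

print(f"sigma^2 = {sigma2}, E(U) = {expected_u}, bias = {bias}, k = {k}")
```

In this example the bias equals \(-\sigma^2/n\) and \(k = n/(n-1)\), matching the general result \(\mathrm{E}(U) = \frac{n-1}{n}\sigma^2\).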
    1. George owns a garage and he records the mileage of cars, \(x\) thousands of miles, between services. The results from a random sample of 10 cars are summarised below.
    $$\sum x = 113.4 \quad \sum x ^ { 2 } = 1414.08$$ The mileage of cars between services is normally distributed and George believes that the standard deviation is 2.4 thousand miles. Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether or not these data support George’s belief.
    2. Every 6 months some engineers are tested to see if their times, in minutes, to assemble a particular component have changed. The times taken to assemble the component are normally distributed. A random sample of 8 engineers was chosen and their times to assemble the component were recorded in January and in July. The data are given in the table below. \end{table} Table 1 Figure 1 shows a graph of the power function for the scientist's test.
  5. On the same axes draw the graph of the power function for the statistician's test. Given that it takes 20 minutes to collect and test a 20 ml sample and 15 minutes to collect and test a 10 ml sample
  6. show that the expected time of the statistician's test is slower than the scientist's test for \(\lambda \mathrm { e } ^ { - \lambda } > \frac { 1 } { 3 }\)
  7. By considering the times when \(\lambda = 1\) and \(\lambda = 2\) together with the power curves in part (e) suggest, giving a reason, which test you would use.
    (2) \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{a1841cf5-93f3-4043-b6ed-651168b13b87-93_1179_1152_1455_395} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure}
    1. The carbon content, measured in suitable units, of steel is normally distributed. Two independent random samples of steel were taken from a refining plant at different times and their carbon content recorded. The results are given below.
    Sample \(A : \quad 1.5 \quad 0.9 \quad 1.3 \quad 1.2\)
    \(\begin{array} { l l l l l l l } \text { Sample } B : & 0.4 & 0.6 & 0.8 & 0.3 & 0.5 & 0.4 \end{array}\)
  8. Stating your hypotheses clearly, carry out a suitable test, at the \(10 \%\) level of significance, to show that both samples can be assumed to have come from populations with a common variance \(\sigma ^ { 2 }\).
  9. Showing your working clearly, find the \(99 \%\) confidence interval for \(\sigma ^ { 2 }\) based on both samples.
Edexcel S4 2002 June Q1
  1. The random variable \(X\) has an \(F\) distribution with 10 and 12 degrees of freedom. Find \(a\) and \(b\) such that \(\mathrm { P } ( a < X < b ) = 0.90\).
    (3)
  2. A chemist has developed a fuel additive and claims that it reduces the fuel consumption of cars. To test this claim, 8 randomly selected cars were each filled with 20 litres of fuel and driven around a race circuit. Each car was tested twice, once with the additive and once without. The distances, in miles, that each car travelled before running out of fuel are given in the table below.
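$$\begin{array} { l c c c c c c c c } \text { Car } & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \text { Distance without additive } & 163 & 172 & 195 & 170 & 183 & 185 & 161 & 176 \\ \text { Distance with additive } & 168 & 185 & 187 & 172 & 180 & 189 & 172 & 175 \end{array}$$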
Assuming that the distances travelled follow a normal distribution and stating your hypotheses clearly test, at the \(10 \%\) level of significance, whether or not there is evidence to support the chemist's claim.
(8)
Edexcel S4 2002 June Q3
3. A technician is trying to estimate the area \(\mu ^ { 2 }\) of a metal square. The independent random variables \(X _ { 1 }\) and \(X _ { 2 }\) are each distributed \(\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)\) and represent two measurements of the sides of the square. Two estimators of the area, \(A _ { 1 }\) and \(A _ { 2 }\), are proposed where $$A _ { 1 } = X _ { 1 } X _ { 2 } \quad \text { and } \quad A _ { 2 } = \left( \frac { X _ { 1 } + X _ { 2 } } { 2 } \right) ^ { 2 } .$$ [You may assume that if \(X _ { 1 }\) and \(X _ { 2 }\) are independent random variables then $$\left. \mathrm { E } \left( X _ { 1 } X _ { 2 } \right) = \mathrm { E } \left( X _ { 1 } \right) \mathrm { E } \left( X _ { 2 } \right) \right]$$
  1. Find \(\mathrm { E } \left( A _ { 1 } \right)\) and show that \(\mathrm { E } \left( A _ { 2 } \right) = \mu ^ { 2 } + \frac { \sigma ^ { 2 } } { 2 }\).
  2. Find the bias of each of these estimators. The technician is told that \(\operatorname { Var } \left( A _ { 1 } \right) = \sigma ^ { 4 } + 2 \mu ^ { 2 } \sigma ^ { 2 }\) and \(\operatorname { Var } \left( A _ { 2 } \right) = \frac { 1 } { 2 } \sigma ^ { 4 } + 2 \mu ^ { 2 } \sigma ^ { 2 }\). The technician decided to use \(A _ { 1 }\) as the estimator for \(\mu ^ { 2 }\).
  3. Suggest a possible reason for this decision. A statistician suggests taking a random sample of \(n\) measurements of sides of the square and finding the mean \(\bar { X }\). He knows that \(\mathrm { E } \left( \bar { X } ^ { 2 } \right) = \mu ^ { 2 } + \frac { \sigma ^ { 2 } } { n }\) and \(\operatorname { Var } \left( \bar { X } ^ { 2 } \right) = \frac { 2 \sigma ^ { 4 } } { n ^ { 2 } } + \frac { 4 \sigma ^ { 2 } \mu ^ { 2 } } { n }\).
  4. Explain whether or not \(\bar { X } ^ { 2 }\) is a consistent estimator of \(\mu ^ { 2 }\).
Edexcel S4 2002 June Q4
4. A recent census in the U.K. revealed that the heights of females in the U.K. have a mean of 160.9 cm . A doctor is studying the heights of female Indians in a remote region of South America. The doctor measured the height, \(x \mathrm {~cm}\), of each of a random sample of 30 female Indians and obtained the following statistics. $$\Sigma x = 4400.7 , \quad \Sigma \mathrm { x } ^ { 2 } = 646904.41 .$$ The heights of female Indians may be assumed to follow a normal distribution.
The doctor presented the results of the study in a medical journal and wrote 'the female Indians in this region are more than 10 cm shorter than females in the U.K.'
  1. Stating your hypotheses clearly and using a \(5 \%\) level of significance, test the doctor's statement.
    (6) The census also revealed that the standard deviation of the heights of U.K. females was 6.0 cm .
  2. Stating your hypotheses clearly test, at the \(5 \%\) level of significance, whether or not there is evidence that the variance of the heights of female Indians is different from that of females in the U.K.
    (6)
Edexcel S4 2002 June Q5
5. The times, \(x\) seconds, taken by the competitors in the 100 m freestyle events at a school swimming gala are recorded. The following statistics are obtained from the data.
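$$\begin{array} { l c c c } & \text { No. of competitors } & \text { Sample mean } \bar { x } & \sum x ^ { 2 } \\ \text { Girls } & 8 & 83.10 & 55746 \\ \text { Boys } & 7 & 88.90 & 56130 \end{array}$$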
Following the gala a proud parent claims that girls are faster swimmers than boys. Assuming that the times taken by the competitors are two independent random samples from normal distributions,
  1. test, at the \(10 \%\) level of significance, whether or not the variances of the two distributions are the same. State your hypotheses clearly.
  2. Stating your hypotheses clearly, test the parent's claim. Use a \(5 \%\) level of significance.
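For the swimming data, a sketch of the two statistics the question asks for: the variance-ratio \(F\) statistic and the pooled two-sample \(t\) statistic, both computed from the summary table. The critical values are assumptions from standard tables (\(F_{6,7}\) upper 5% point and \(t_{13}\) 5% one-tail), and the pooled test presumes the equal-variance conclusion from part 1.

```python
from math import sqrt

n_g, mean_g, sum_x2_g = 8, 83.10, 55746  # girls
n_b, mean_b, sum_x2_b = 7, 88.90, 56130  # boys


def corrected_ss(n, mean, sum_x2):
    """Sum of squared deviations from the mean, from summary statistics."""
    return sum_x2 - n * mean ** 2


ss_g = corrected_ss(n_g, mean_g, sum_x2_g)
ss_b = corrected_ss(n_b, mean_b, sum_x2_b)
s2_g = ss_g / (n_g - 1)
s2_b = ss_b / (n_b - 1)

# Larger variance on top; here that is the boys' sample (6 and 7 d.f.).
f_stat = max(s2_g, s2_b) / min(s2_g, s2_b)

# Pooled variance and t statistic for H1: girls' mean time < boys' mean time.
pooled_s2 = (ss_g + ss_b) / (n_g + n_b - 2)
t_stat = (mean_g - mean_b) / sqrt(pooled_s2 * (1 / n_g + 1 / n_b))

F_CRIT = 3.87   # assumed table value: F(6, 7), upper 5% point
T_CRIT = 1.771  # assumed table value: t with 13 d.f., 5% one-tail

variances_differ = f_stat > F_CRIT
girls_faster = t_stat < -T_CRIT

print(f"F = {f_stat:.3f}, pooled s^2 = {pooled_s2:.2f}, t = {t_stat:.3f}")
print(variances_differ, girls_faster)
```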
Edexcel S4 2002 June Q6
6. A nutritionist studied the levels of cholesterol, \(X \mathrm { mg } / \mathrm { cm } ^ { 3 }\), of male students at a large college. She assumed that \(X\) was distributed \(\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)\) and examined a random sample of 25 male students. Using this sample she obtained unbiased estimates of \(\mu\) and \(\sigma ^ { 2 }\) as $$\hat { \mu } = 1.68 , \quad \hat { \sigma } ^ { 2 } = 1.79 .$$
  1. Find a 95\% confidence interval for \(\mu\).
  2. Obtain a \(95 \%\) confidence interval for \(\sigma ^ { 2 }\). A cholesterol reading of more than \(2.5 \mathrm { mg } / \mathrm { cm } ^ { 3 }\) is regarded as high.
  3. Use appropriate confidence limits from parts (a) and (b) to find the lowest estimate of the proportion of male students in the college with high cholesterol.
Edexcel S4 2002 June Q7
  1. A proportion \(p\) of the items produced by a factory is defective. A quality assurance manager selects a random sample of 5 items from each batch produced to check whether or not there is evidence that \(p\) is greater than 0.10 . The criterion that the manager uses for rejecting the hypothesis that \(p\) is 0.10 is that there are more than 2 defective items in the sample.
    1. Find the size of the test.
      (2)
    Table 1 gives some values, to 2 decimal places, of the power function of this test. \begin{table}[h]
    \captionsetup{labelformat=empty} \caption{Table 1}
    $$\begin{array} { l c c c c c c } p & 0.15 & 0.20 & 0.25 & 0.30 & 0.35 & 0.40 \\ \text { Power } & 0.03 & r & 0.10 & 0.16 & 0.24 & 0.32 \end{array}$$
    \end{table}
  2. Find the value of \(r\). One day the manager is away and an assistant checks the production by random sample of 10 items from each batch produced. The hypothesis that \(p = 0.10\) is rejected if more than 4 defectives are found in the sample.
  3. Find P (Type I error) using the assistant's test. Table 2 gives some values, to 2 decimal places, of the power function for this test. \begin{table}[h]
    \captionsetup{labelformat=empty} \caption{Table 2}
    $$\begin{array} { l c c c c c c } p & 0.15 & 0.20 & 0.25 & 0.30 & 0.35 & 0.40 \\ \text { Power } & 0.01 & 0.03 & 0.08 & 0.15 & 0.25 & s \end{array}$$
    \end{table}
  4. Find the value of \(s\).
  5. Using the same axes, draw the graphs of the power functions of these two tests.
    1. State the value of \(p\) where these graphs cross.
    2. Explain the significance if \(p\) is greater than this value. The manager studies the graphs in part ( \(e\) ) but decides to carry on using the test based on a sample of size 5 .
  6. Suggest 2 reasons why the manager might have made this decision.
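The size and power values in this question are binomial tail probabilities, so they can be reproduced directly. The sketch below assumes the number of defectives in a sample is \(\mathrm{B}(n, p)\) and that the test rejects when the count exceeds a cut-off `c`. The function name `power` is illustrative.

```python
from math import comb


def power(n, c, p):
    """P(X > c) for X ~ B(n, p): the probability of rejecting H0: p = 0.10."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(c + 1, n + 1))


# Manager: sample of 5, reject if more than 2 defectives.
size_manager = power(5, 2, 0.10)
r = round(power(5, 2, 0.20), 2)

# Assistant: sample of 10, reject if more than 4 defectives.
size_assistant = power(10, 4, 0.10)
s = round(power(10, 4, 0.40), 2)

print(f"manager size = {size_manager:.5f}, r = {r}")
print(f"assistant size = {size_assistant:.5f}, s = {s}")

# Tabulate both power functions at the p values used in Tables 1 and 2.
for p in (0.15, 0.20, 0.25, 0.30, 0.35, 0.40):
    print(f"p = {p:.2f}: manager {power(5, 2, p):.2f}, assistant {power(10, 4, p):.2f}")
```

Evaluating `power` on a finer grid of `p` values is one way to locate where the two curves cross for part 5.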