Question 8 - A-Level Maths

Edexcel S4 — Question 8

Exam Board	Edexcel
Module	S4 (Statistics 4)
Topic	Hypothesis test of a normal distribution

8. A random sample $W _ { 1 } , W _ { 2 } , \ldots , W _ { n }$ is taken from a distribution with mean $\mu$ and variance $\sigma ^ { 2 }$.

Write down $\mathrm { E } \left( \sum _ { i = 1 } ^ { n } W _ { i } \right)$ and show that $\mathrm { E } \left( \sum _ { i = 1 } ^ { n } W _ { i } ^ { 2 } \right) = n \left( \sigma ^ { 2 } + \mu ^ { 2 } \right)$.
An estimator for $\mu$ is
$$\bar { X } = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i }$$
Show that $\bar { X }$ is a consistent estimator for $\mu$.
An estimator of $\sigma ^ { 2 }$ is
$$U = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i } ^ { 2 } - \left( \frac { 1 } { n } \sum _ { i = 1 } ^ { n } W _ { i } \right) ^ { 2 }$$
Find the bias of $U$.
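The bias of $U$ can be checked exactly on a small example. The sketch below is not part of the exam paper: it assumes a hypothetical toy population (a fair six-sided die) and a sample size of $n = 4$, enumerates every possible sample, and compares $\mathrm{E}(U)$ with $\sigma^2$. Under these assumptions the result is consistent with $\mathrm{E}(U) = \frac{n-1}{n}\sigma^2$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population (an assumption for illustration): a fair die.
values = [Fraction(v) for v in range(1, 7)]
mu = sum(values) / 6
sigma2 = sum((v - mu) ** 2 for v in values) / 6


def expected_U(n):
    """Exact E(U), averaging U over all 6**n equally likely samples."""
    total = Fraction(0)
    for sample in product(values, repeat=n):
        mean_of_squares = sum(w * w for w in sample) / n
        square_of_mean = (sum(sample) / n) ** 2
        total += mean_of_squares - square_of_mean
    return total / 6 ** n


n = 4  # illustrative sample size
e_u = expected_U(n)
bias = e_u - sigma2  # bias = E(U) - sigma^2
print("E(U) =", e_u, " sigma^2 =", sigma2, " bias =", bias)
```

Exact fractions are used so that the comparison does not depend on floating-point rounding.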
Write down an unbiased estimator of $\sigma ^ { 2 }$ in the form $k U$, where $k$ is in terms of $n$.
\section*{Advanced/Advanced Subsidiary}
\section*{Friday 21 June 2013 - Morning}
Mathematical Formulae (Pink) Nil
Candidates may use any calculator allowed by the regulations of the Joint Council for Qualifications. Calculators must not have the facility for symbolic algebra manipulation or symbolic differentiation/integration, or have retrievable mathematical formulae stored in them.
In the boxes above, write your centre number, candidate number, your surname, initials and signature. Check that you have the correct question paper.
Answer ALL the questions.
You must write your answer for each question in the space following the question.
Values from the statistical tables should be quoted in full. When a calculator is used, the answer should be given to an appropriate degree of accuracy. A booklet 'Mathematical Formulae and Statistical Tables' is provided.
Full marks may be obtained for answers to ALL questions.
The marks for the parts of questions are shown in round brackets, e.g. (2).
There are 6 questions in this question paper. The total mark for this paper is 75.
There are 20 pages in this question paper. Any blank pages are indicated. You must ensure that your answers to parts of questions are clearly labelled.
You must show sufficient working to make your methods clear to the Examiner.
Answers without working may not gain full credit.
1. George owns a garage and he records the mileage of cars, $x$ thousands of miles, between services. The results from a random sample of 10 cars are summarised below.
$$\sum x = 113.4 \quad \sum x ^ { 2 } = 1414.08$$ The mileage of cars between services is normally distributed and George believes that the standard deviation is 2.4 thousand miles. Stating your hypotheses clearly, test, at the $5 \%$ level of significance, whether or not these data support George's belief.
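The test statistic for question 1 can be computed directly from the summary values. The sketch below assumes a two-tailed test of $\mathrm{H}_0 : \sigma = 2.4$ against $\mathrm{H}_1 : \sigma \neq 2.4$ and uses chi-squared critical values for 9 degrees of freedom as read from statistical tables; both are assumptions, not statements from the paper.

```python
n = 10
sum_x = 113.4
sum_x2 = 1414.08
sigma0 = 2.4  # George's claimed standard deviation (thousand miles)

# Sum of squared deviations, equal to (n - 1) * s^2
sxx = sum_x2 - sum_x ** 2 / n
s2 = sxx / (n - 1)

# Under H0, (n - 1) s^2 / sigma0^2 follows chi-squared with n - 1 d.f.
chi2_stat = sxx / sigma0 ** 2

# Assumed table values: chi-squared, 9 d.f., 2.5% in each tail.
lower_crit, upper_crit = 2.700, 19.023
reject_h0 = chi2_stat < lower_crit or chi2_stat > upper_crit
print(round(s2, 3), round(chi2_stat, 3), reject_h0)
```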
2. Every 6 months some engineers are tested to see if their times, in minutes, to assemble a particular component have changed. The times taken to assemble the component are normally distributed. A random sample of 8 engineers was chosen and their times to assemble the component were recorded in January and in July. The data are given in the table below.
Using a suitable test, at the $5 \%$ level of significance, state whether or not, on the basis of this trial, you would recommend using the new medicine. State your hypotheses clearly.
State an assumption needed to carry out this test.
2. The cloth produced by a certain manufacturer has defects that occur randomly at a constant rate of $\lambda$ per square metre. If $\lambda$ is thought to be greater than 1.5 then action has to be taken. Using $\mathrm { H } _ { 0 } : \lambda = 1.5$ and $\mathrm { H } _ { 1 } : \lambda > 1.5$ a quality control officer takes a $4 \mathrm {~m} ^ { 2 }$ sample of cloth and rejects $\mathrm { H } _ { 0 }$ if there are 11 or more defects. If there are 8 or fewer defects she accepts $\mathrm { H } _ { 0 }$. If there are 9 or 10 defects a second sample of $4 \mathrm {~m} ^ { 2 }$ is taken and $\mathrm { H } _ { 0 }$ is rejected if there are 11 or more defects in this second sample, otherwise it is accepted.
Find the size of this test.
Find the power of this test when $\lambda = 2$.
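The two-stage rule in the cloth question can be sketched numerically. The code below assumes the number of defects in $4 \mathrm{~m}^2$ follows a Poisson distribution with mean $4\lambda$ and that the two samples are independent; the function names are hypothetical. The probability of rejecting $\mathrm{H}_0$ is P(11 or more in the first sample) plus P(9 or 10 in the first sample) times P(11 or more in the second).

```python
from math import exp, factorial


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(exp(-mean) * mean ** i / factorial(i) for i in range(k + 1))


def prob_reject(lam, area=4.0):
    """Probability that the two-stage rule rejects H0 at defect rate lam."""
    mean = lam * area
    p_11_or_more = 1 - poisson_cdf(10, mean)
    p_9_or_10 = poisson_cdf(10, mean) - poisson_cdf(8, mean)
    # Reject at the first stage, or go to a second sample and reject there.
    return p_11_or_more + p_9_or_10 * p_11_or_more


size = prob_reject(1.5)   # probability of rejecting H0 when lambda = 1.5
power = prob_reject(2.0)  # probability of rejecting H0 when lambda = 2
print(round(size, 4), round(power, 4))
```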
3. A farmer is investigating the milk yields of two breeds of cow. He takes a random sample of 9 cows of breed $A$ and an independent random sample of 12 cows of breed $B$. For a 5 day period he measures the amount of milk, $x$ gallons, produced by each cow. The results are summarised in the table below.
State one assumption that needs to be made in order to carry out a paired $t$-test.
Stating your hypotheses clearly, test, at the $1 \%$ level of significance, whether or not the drug increases the mean number of hours of sleep per night by more than 10 minutes. State the critical value for this test.
5. A statistician believes a coin is biased and the probability, $p$, of getting a head when the coin is tossed is less than 0.5. The statistician decides to test this by tossing the coin 10 times and recording the number, $X$, of heads. He sets up the hypotheses $\mathrm { H } _ { 0 } : p = 0.5$ and $\mathrm { H } _ { 1 } : p < 0.5$ and rejects the null hypothesis if $x < 3$.
Find the size of the test.
Show that the power function of this test is
$$( 1 - p ) ^ { 8 } \left( 36 p ^ { 2 } + 8 p + 1 \right)$$
Table 1 gives values, to 2 decimal places, of the power function for the statistician's test.
\section*{Table 1}
On the axes below draw the graph of the power function for the statistician's test.
Find the range of values of $p$ for which the probability of accepting the coin as unbiased, when in fact it is biased, is less than or equal to 0.4.
(3)
[Figure: axes for the graph of the power function]
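For question 5, the quoted power function can be checked against the binomial probability of the rejection region. The sketch below assumes $X \sim \mathrm{B}(10, p)$ and the rejection region $X < 3$ stated in the question; the function names are illustrative only.

```python
from math import comb


def power(p, n=10, reject_below=3):
    """P(X < reject_below) for X ~ B(n, p): the probability of rejecting H0."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(reject_below)
    )


def closed_form(p):
    """The power function as quoted in the question."""
    return (1 - p) ** 8 * (36 * p ** 2 + 8 * p + 1)


# Size of the test: the power function evaluated at p = 0.5.
size = power(0.5)

# Largest difference between the two expressions over a grid of p values.
max_gap = max(abs(power(i / 100) - closed_form(i / 100)) for i in range(101))
print(size, max_gap)
```

The probability of accepting the coin as unbiased when it is biased is `1 - power(p)`, so the last part of the question corresponds to the values of $p$ where `power(p)` is at least 0.6.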
6. (a) Explain what is meant by the sampling distribution of an estimator $T$ of the population parameter $\theta$.
Explain what you understand by the statement that $T$ is a biased estimator of $\theta$. A population has mean $\mu$ and variance $\sigma ^ { 2 }$.
A random sample $X _ { 1 } , X _ { 2 } , \ldots , X _ { 10 }$ is taken from this population.
Calculate the bias of each of the following estimators of $\mu$.
$$\begin{aligned} & \hat { \mu } _ { 1 } = \frac { X _ { 3 } + X _ { 5 } + X _ { 7 } } { 3 } \\
& \hat { \mu } _ { 2 } = \frac { 5 X _ { 1 } + 2 X _ { 2 } + X _ { 9 } } { 6 } \\
& \hat { \mu } _ { 3 } = \frac { 3 X _ { 10 } - X _ { 1 } } { 3 } \end{aligned}$$
Find the variance of each of these three estimators.
State, giving a reason, which of these three estimators for $\mu$ is
1. the best estimator,
2. the worst estimator.
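For the three estimators in question 6, each is a linear combination $\sum a_i X_i$ of independent observations, so its expectation is $\left(\sum a_i\right)\mu$ and its variance is $\left(\sum a_i^2\right)\sigma^2$. The sketch below tabulates those two coefficients with exact fractions; the dictionary keys are hypothetical labels for the estimators.

```python
from fractions import Fraction as F

# Coefficients a_i of each estimator, written as sum(a_i * X_i).
estimators = {
    "mu1": [F(1, 3), F(1, 3), F(1, 3)],  # (X3 + X5 + X7) / 3
    "mu2": [F(5, 6), F(2, 6), F(1, 6)],  # (5 X1 + 2 X2 + X9) / 6
    "mu3": [F(1), F(-1, 3)],             # (3 X10 - X1) / 3
}

summary = {}
for name, coeffs in estimators.items():
    bias_coeff = sum(coeffs) - 1           # bias = bias_coeff * mu
    var_coeff = sum(a * a for a in coeffs)  # variance = var_coeff * sigma^2
    summary[name] = (bias_coeff, var_coeff)
    print(name, "bias =", bias_coeff, "* mu, variance =", var_coeff, "* sigma^2")
```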
7. Two groups of students take the same examination. A random sample of students is taken from each of the groups.
The marks of the 9 students from Group 1 are as follows $$\begin{array} { l l l l l l l l l } 30 & 29 & 35 & 27 & 23 & 33 & 33 & 35 & 28 \end{array}$$ The marks, $x$, of the 7 students from Group 2 gave the following statistics $$\bar { x } = 31.29 \quad s ^ { 2 } = 12.9$$ A test is to be carried out to see whether or not there is a difference between the mean marks of the two groups of students. You may assume that the samples are taken from normally distributed populations and that they are independent.
State one other assumption that must be made in order to apply this test and show that this assumption is reasonable by testing it at a $10 \%$ level of significance. State your hypotheses clearly.
Stating your hypotheses clearly, test, using a significance level of $5 \%$, whether or not there is a difference between the mean marks of the two groups of students.
\section*{TOTAL FOR PAPER: 75 MARKS}
\section*{END}
Materials required for examination
Answer Book (AB16)
Graph Paper (ASG2)
Mathematical Formulae (Lilac)
Items included with question papers
Nil
6686
Candidates may use any calculator EXCEPT those with the facility for symbolic algebra, differentiation and/or integration. Thus candidates may NOT use calculators such as the Texas Instruments TI 89, TI 92, Casio CFX 9970G, Hewlett Packard HP 48G.
In the boxes on the answer book, write the name of the examining body (Edexcel), your centre number, candidate number, the unit title (Statistics S4), the paper reference (6686), your surname, other name and signature.
This paper has seven questions. Pages 6, 7 and 8 are blank. You must ensure that your answers to parts of questions are clearly labelled.
You must show sufficient working to make your methods clear to the Examiner. Answers without working may gain no credit.
1. The random variable $X$ has an $F$ distribution with 10 and 12 degrees of freedom. Find $a$ and $b$ such that $\mathrm { P } ( a < X < b ) = 0.90$.
2. A chemist has developed a fuel additive and claims that it reduces the fuel consumption of cars. To test this claim, 8 randomly selected cars were each filled with 20 litres of fuel and driven around a race circuit. Each car was tested twice, once with the additive and once without. The distances, in miles, that each car travelled before running out of fuel are given in the table below.
Car	1	2	3	4	5	6	7	8
Distance without additive	163	172	195	170	183	185	161	176
Distance with additive	168	185	187	172	180	189	172	175
Assuming that the distances travelled follow a normal distribution and stating your hypotheses clearly test, at the $10 \%$ level of significance, whether or not there is evidence to support the chemist's claim.
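The paired test in the fuel additive question works on the differences between the two distances for each car. The sketch below assumes a one-tailed test of $\mathrm{H}_0 : \mu_d = 0$ against $\mathrm{H}_1 : \mu_d > 0$, where $d$ is distance with the additive minus distance without, and takes the critical value for 7 degrees of freedom from statistical tables; both are assumptions rather than statements from the paper.

```python
from math import sqrt
from statistics import mean, stdev

without = [163, 172, 195, 170, 183, 185, 161, 176]
with_additive = [168, 185, 187, 172, 180, 189, 172, 175]

# Paired differences: with additive minus without, one per car.
d = [w - wo for w, wo in zip(with_additive, without)]

d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation (divisor n - 1)
t_stat = d_bar / (s_d / sqrt(len(d)))

# Assumed table value: t distribution, 7 d.f., 10% in the upper tail.
t_crit = 1.415
print(d, d_bar, round(s_d, 3), round(t_stat, 3), t_stat > t_crit)
```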
3. A technician is trying to estimate the area $\mu ^ { 2 }$ of a metal square. The independent random variables $X _ { 1 }$ and $X _ { 2 }$ are each distributed $\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)$ and represent two measurements of the sides of the square. Two estimators of the area, $A _ { 1 }$ and $A _ { 2 }$, are proposed where $$A _ { 1 } = X _ { 1 } X _ { 2 } \quad \text { and } \quad A _ { 2 } = \left( \frac { X _ { 1 } + X _ { 2 } } { 2 } \right) ^ { 2 } .$$ [You may assume that if $X _ { 1 }$ and $X _ { 2 }$ are independent random variables then $$\left. \mathrm { E } \left( X _ { 1 } X _ { 2 } \right) = \mathrm { E } \left( X _ { 1 } \right) \mathrm { E } \left( X _ { 2 } \right) \right]$$
Find $\mathrm { E } \left( A _ { 1 } \right)$ and show that $\mathrm { E } \left( A _ { 2 } \right) = \mu ^ { 2 } + \frac { \sigma ^ { 2 } } { 2 }$.
Find the bias of each of these estimators. The technician is told that $\operatorname { Var } \left( A _ { 1 } \right) = \sigma ^ { 4 } + 2 \mu ^ { 2 } \sigma ^ { 2 }$ and $\operatorname { Var } \left( A _ { 2 } \right) = \frac { 1 } { 2 } \sigma ^ { 4 } + 2 \mu ^ { 2 } \sigma ^ { 2 }$. The technician decided to use $A _ { 1 }$ as the estimator for $\mu ^ { 2 }$.
Suggest a possible reason for this decision. A statistician suggests taking a random sample of $n$ measurements of sides of the square and finding the mean $\bar { X }$. He knows that $\mathrm { E } \left( \bar { X } ^ { 2 } \right) = \mu ^ { 2 } + \frac { \sigma ^ { 2 } } { n }$ and $\operatorname { Var } \left( \bar { X } ^ { 2 } \right) = \frac { 2 \sigma ^ { 4 } } { n ^ { 2 } } + \frac { 4 \sigma ^ { 2 } \mu ^ { 2 } } { n }$.
Explain whether or not $\bar { X } ^ { 2 }$ is a consistent estimator of $\mu ^ { 2 }$.
4. A recent census in the U.K. revealed that the heights of females in the U.K. have a mean of 160.9 cm. A doctor is studying the heights of female Indians in a remote region of South America. The doctor measured the height, $x \mathrm {~cm}$, of each of a random sample of 30 female Indians and obtained the following statistics. $$\Sigma x = 4400.7 , \quad \Sigma x ^ { 2 } = 646904.41 .$$ The heights of female Indians may be assumed to follow a normal distribution.
The doctor presented the results of the study in a medical journal and wrote 'the female Indians in this region are more than 10 cm shorter than females in the U.K.'
Stating your hypotheses clearly and using a $5 \%$ level of significance, test the doctor's statement. The census also revealed that the standard deviation of the heights of U.K. females was 6.0 cm.
Stating your hypotheses clearly test, at the $5 \%$ level of significance, whether or not there is evidence that the variance of the heights of female Indians is different from that of females in the U.K.
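Both test statistics for the heights question follow from the two summary totals. The sketch below assumes the doctor's statement is tested as $\mathrm{H}_0 : \mu = 150.9$ against $\mathrm{H}_1 : \mu < 150.9$ (that is, 160.9 minus 10), and the variance as $\mathrm{H}_0 : \sigma^2 = 36$ against $\mathrm{H}_1 : \sigma^2 \neq 36$; the critical values are assumed readings from statistical tables for 29 degrees of freedom.

```python
from math import sqrt

n = 30
sum_x = 4400.7
sum_x2 = 646904.41

x_bar = sum_x / n
sxx = sum_x2 - sum_x ** 2 / n  # equals (n - 1) * s^2
s2 = sxx / (n - 1)

# One-sample t statistic for the mean against 160.9 - 10 = 150.9.
mu0 = 160.9 - 10
t_stat = (x_bar - mu0) / sqrt(s2 / n)
t_crit = -1.699  # assumed table value: t, 29 d.f., 5% lower tail

# Chi-squared statistic for the variance against 6.0^2.
sigma0_sq = 6.0 ** 2
chi2_stat = sxx / sigma0_sq
chi2_lower, chi2_upper = 16.047, 45.722  # assumed table values, 29 d.f.

print(round(x_bar, 2), round(s2, 3), round(t_stat, 3), round(chi2_stat, 3))
```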
5. The times, $x$ seconds, taken by the competitors in the 100 m freestyle events at a school swimming gala are recorded. The following statistics are obtained from the data.
	No. of competitors	Sample Mean $\bar { x }$	$\sum x ^ { 2 }$
Girls	8	83.10	55746
Boys	7	88.90	56130
Following the gala a proud parent claims that girls are faster swimmers than boys.
Assuming that the times taken by the competitors are two independent random samples from normal distributions, test, at the $10 \%$ level of significance, whether or not the variances of the two distributions are the same. State your hypotheses clearly.
Stating your hypotheses clearly, test the parent's claim. Use a $5 \%$ level of significance.
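The statistics needed for the swimming question can be recovered from the table of sample sizes, means and $\sum x^2$. The sketch below computes the two unbiased variance estimates, the variance ratio (larger over smaller), and a pooled two-sample $t$ statistic for boys' mean time minus girls' mean time. Using the pooled estimate is an assumption that relies on the equal-variance test not being rejected; the critical values are left to the tables.

```python
from math import sqrt

# (sample size, sample mean, sum of x^2) from the table in the question
girls = (8, 83.10, 55746)
boys = (7, 88.90, 56130)


def sum_sq_dev(n, mean, sum_x2):
    """Sum of squared deviations about the mean, equal to (n - 1) * s^2."""
    return sum_x2 - n * mean ** 2


sxx_g = sum_sq_dev(*girls)
sxx_b = sum_sq_dev(*boys)
s2_g = sxx_g / (girls[0] - 1)
s2_b = sxx_b / (boys[0] - 1)

# Variance ratio with the larger estimate on top.
f_stat = max(s2_g, s2_b) / min(s2_g, s2_b)

# Pooled variance and two-sample t statistic (boys minus girls).
df = girls[0] + boys[0] - 2
s2_pooled = (sxx_g + sxx_b) / df
t_stat = (boys[1] - girls[1]) / sqrt(s2_pooled * (1 / girls[0] + 1 / boys[0]))

print(round(s2_g, 2), round(s2_b, 2), round(f_stat, 3), round(t_stat, 3))
```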
6. A nutritionist studied the levels of cholesterol, $X \mathrm { mg } / \mathrm { cm } ^ { 3 }$, of male students at a large college. She assumed that $X$ was distributed $\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)$ and examined a random sample of 25 male students. Using this sample she obtained unbiased estimates of $\mu$ and $\sigma ^ { 2 }$ as $$\hat { \mu } = 1.68 , \quad \hat { \sigma } ^ { 2 } = 1.79 .$$
Find a $95 \%$ confidence interval for $\mu$.
Obtain a $95 \%$ confidence interval for $\sigma ^ { 2 }$. A cholesterol reading of more than $2.5 \mathrm { mg } / \mathrm { cm } ^ { 3 }$ is regarded as high.
Use appropriate confidence limits from parts (a) and (b) to find the lowest estimate of the proportion of male students in the college with high cholesterol.
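The two intervals in the cholesterol question can be sketched from the unbiased estimates. The code below assumes critical values read from statistical tables for 24 degrees of freedom (the $t$ value and the two chi-squared values are assumptions, not given in the paper). For the final part it takes the lower limit for $\mu$ together with the lower limit for $\sigma^2$, on the reasoning that both choices make the tail above 2.5 as small as possible; that interpretation is also an assumption.

```python
from math import erfc, sqrt

n = 25
mu_hat = 1.68
sigma2_hat = 1.79

# Assumed table values for 24 degrees of freedom.
t_crit = 2.064                         # t, 2.5% in the upper tail
chi2_lower, chi2_upper = 12.401, 39.364  # chi-squared, 2.5% in each tail

# 95% confidence interval for the mean.
half_width = t_crit * sqrt(sigma2_hat / n)
mu_ci = (mu_hat - half_width, mu_hat + half_width)

# 95% confidence interval for the variance.
var_ci = ((n - 1) * sigma2_hat / chi2_upper, (n - 1) * sigma2_hat / chi2_lower)

# Lowest estimate of P(X > 2.5): smallest mean with smallest variance.
z = (2.5 - mu_ci[0]) / sqrt(var_ci[0])
p_high_lowest = 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal

print(mu_ci, var_ci, p_high_lowest)
```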

