Pooled variance estimation - Linear combinations of normal random variables

CAIE FP2 2015 June Q6

6 The independent random variables $X$ and $Y$ have distributions with the same variance $\sigma ^ { 2 }$. Random samples of $N$ observations of $X$ and 10 observations of $Y$ are taken, and the results are summarised by $$\Sigma x = 5 , \quad \Sigma x ^ { 2 } = 11 , \quad \Sigma y = 10 , \quad \Sigma y ^ { 2 } = 160 .$$ These data give a pooled estimate of 12 for $\sigma ^ { 2 }$. Find $N$.

CAIE FP2 2008 November Q6

6 The independent random variables $X$ and $Y$ have normal distributions with the same variance $\sigma ^ { 2 }$. Samples of 5 observations of $X$ and 10 observations of $Y$ are made, and the results are summarised by $\Sigma x = 15 , \Sigma x ^ { 2 } = 128 , \Sigma y = 36$ and $\Sigma y ^ { 2 } = 980$. Find a pooled estimate of $\sigma ^ { 2 }$.
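
One way to check a pooled estimate from summary sums is sketched below. It assumes the usual definition, $(S_{xx} + S_{yy}) / (n_x + n_y - 2)$ with $S_{xx} = \Sigma x^2 - (\Sigma x)^2 / n_x$; the function name `pooled_variance` is a hypothetical helper introduced for illustration, not part of the question.

```python
def pooled_variance(n_x, sum_x, sum_x2, n_y, sum_y, sum_y2):
    """Pooled unbiased estimate of a common variance from summary sums."""
    # Corrected sums of squares about each sample mean
    s_xx = sum_x2 - sum_x ** 2 / n_x
    s_yy = sum_y2 - sum_y ** 2 / n_y
    # Divide by the combined degrees of freedom
    return (s_xx + s_yy) / (n_x + n_y - 2)


# Summary data from the 2008 November Q6 question above
estimate = pooled_variance(5, 15, 128, 10, 36, 980)
print(round(estimate, 3))
```

With these figures $S_{xx} = 83$ and $S_{yy} = 850.4$, so the estimate is $933.4 / 13$.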

CAIE FP2 2013 November Q7

7 Two independent random variables $X$ and $Y$ have distributions with the same variance $\sigma ^ { 2 }$. Random samples of $n$ observations of $X$ and $2 n$ observations of $Y$ are taken and the results are summarised by $$\Sigma x = 10.0 , \quad \Sigma x ^ { 2 } = 25.0 , \quad \Sigma y = 15.0 , \quad \Sigma y ^ { 2 } = 43.5 .$$ Given that the pooled estimate of $\sigma ^ { 2 }$ is 2, find the value of $n$.
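
Questions that give the pooled estimate and ask for a sample size lead to a quadratic in the unknown, but the answer can also be checked numerically. The sketch below simply tries integer sample sizes; `find_sample_size` and its parameters are hypothetical names, and the search bound of 100 is an arbitrary assumption.

```python
def pooled_variance(n_x, sum_x, sum_x2, n_y, sum_y, sum_y2):
    """Pooled unbiased estimate of a common variance from summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n_x
    s_yy = sum_y2 - sum_y ** 2 / n_y
    return (s_xx + s_yy) / (n_x + n_y - 2)


def find_sample_size(sum_x, sum_x2, sum_y, sum_y2, target, n_y_from_n,
                     max_n=100, tol=1e-9):
    """Return every integer n (1..max_n) whose pooled estimate equals target.

    n_y_from_n maps the X sample size n to the Y sample size.
    """
    matches = []
    for n in range(1, max_n + 1):
        n_y = n_y_from_n(n)
        if n + n_y - 2 <= 0:
            continue  # no degrees of freedom available
        estimate = pooled_variance(n, sum_x, sum_x2, n_y, sum_y, sum_y2)
        if abs(estimate - target) < tol:
            matches.append(n)
    return matches


# 2013 November Q7: n observations of X, 2n observations of Y, estimate 2
n_2013 = find_sample_size(10.0, 25.0, 15.0, 43.5, 2, lambda n: 2 * n)

# 2015 June Q6: N observations of X, 10 observations of Y, estimate 12
n_2015 = find_sample_size(5, 11, 10, 160, 12, lambda n: 10)

print(n_2013, n_2015)
```

Algebraically, the 2013 data reduce to $12n^2 - 145n + 425 = 0$ and the 2015 data to $12N^2 - 65N + 25 = 0$; each has exactly one integer root.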

CAIE FP2 2016 November Q8

8 The amounts spent on the weekly food shopping by families in the big city $P$ and the small town $Q$ are to be compared. The amounts spent, in dollars, in $P$ and $Q$ are denoted by $x$ and $y$ respectively. For a random sample of 60 families in $P$ and a random sample of 50 families in $Q$, the amounts are summarised as follows. $$\Sigma x = 9600 \quad \Sigma x ^ { 2 } = 1560000 \quad \Sigma y = 7200 \quad \Sigma y ^ { 2 } = 1052500$$ Assuming a common population variance, find

a pooled estimate for the population variance,
a $95 \%$ confidence interval for the difference in the population means in $P$ and $Q$.
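
A possible numerical check for both parts is sketched below. It assumes that, with samples this large, a normal critical value is acceptable for the interval; a $t$ value on 108 degrees of freedom would give a slightly wider interval. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from the question
n_x, sum_x, sum_x2 = 60, 9600, 1560000
n_y, sum_y, sum_y2 = 50, 7200, 1052500

# Pooled estimate of the common variance
s_xx = sum_x2 - sum_x ** 2 / n_x
s_yy = sum_y2 - sum_y ** 2 / n_y
pooled = (s_xx + s_yy) / (n_x + n_y - 2)

# Difference in sample means and its standard error
diff = sum_x / n_x - sum_y / n_y
se = sqrt(pooled * (1 / n_x + 1 / n_y))

# Assumption: two-sided 95% interval using the normal distribution
z = NormalDist().inv_cdf(0.975)
lower, upper = diff - z * se, diff + z * se

print(round(pooled, 1), round(lower, 2), round(upper, 2))
```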

Edexcel S3 2016 June Q4

4. A random sample of 60 children and a random sample of 50 adults were taken and each person was given the same task to complete. The table below summarises the times taken, $t$ seconds, to complete the task.

	Mean, $\overline { \boldsymbol { t } }$	Standard deviation, $\boldsymbol { s }$	$\boldsymbol { n }$
Children	61.2	5.9	60
Adults	59.1	5.2	50

Stating your hypotheses clearly, test, at the $5 \%$ level of significance, whether or not there is evidence that the mean time taken to complete the task by children is greater than the mean time taken by adults.
(6)
Explain the relevance of the Central Limit Theorem to your calculation in part (a).
State an assumption you have made to carry out the test in part (a).
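
The test statistic for part (a) can be sketched as follows, assuming the tabulated standard deviations are treated as estimates of the population values and that the Central Limit Theorem justifies a normal test statistic for samples of this size. The hypotheses assumed here are $H_0: \mu_C = \mu_A$ against $H_1: \mu_C > \mu_A$.

```python
from math import sqrt
from statistics import NormalDist

# Values from the table in the question
mean_c, sd_c, n_c = 61.2, 5.9, 60   # children
mean_a, sd_a, n_a = 59.1, 5.2, 50   # adults

# Standard error of the difference in sample means
se = sqrt(sd_c ** 2 / n_c + sd_a ** 2 / n_a)
z_stat = (mean_c - mean_a) / se

# One-tailed critical value at the 5% level
critical = NormalDist().inv_cdf(0.95)

print(round(z_stat, 3), round(critical, 4), z_stat > critical)
```

Under these assumptions the statistic exceeds the critical value, so the null hypothesis would be rejected at the 5% level.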

Edexcel S3 2021 June Q4

A college runs academic and vocational courses. The college has 1680 academic students and 2520 vocational students.
1. Describe how a stratified sample of 70 students at the college could be taken.
All students at the college take a basic skills test. A random sample of 50 academic students has a mean score of 57 and a variance of 60. An independent random sample of 80 vocational students has a mean score of 62 with a variance of 70.
Stating your hypotheses clearly, test at the $5 \%$ level of significance, whether or not the mean basic skills score for vocational students is greater than the mean basic skills score for academic students.
Explain the importance of the Central Limit Theorem to the test in part (b).
State an assumption that is required to carry out the test in part (b).
All the academic students at the college take a basic skills course. Another random sample of 50 academic students and another independent random sample of 80 vocational students retake the basic skills test. The hypotheses used in part (b) are then tested again at the same level of significance. The value of the test statistic $z$ is now 1.54.
Comment on the mean basic skills scores of academic and vocational students after taking this course.
Considering the outcomes of the tests in part (b) and part (e), comment on the effectiveness of the basic skills course.
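
The arithmetic behind the stratified allocation and the two tests can be sketched as below. This assumes proportional allocation across the two strata and a one-tailed normal test with the sample variances used in place of the population variances; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Proportional allocation of a sample of 70 across the two strata
academic, vocational, sample_size = 1680, 2520, 70
total = academic + vocational
n_academic = sample_size * academic // total
n_vocational = sample_size * vocational // total

# Test statistic for the first test (vocational mean minus academic mean)
z_first = (62 - 57) / sqrt(60 / 50 + 70 / 80)

# Statistic quoted in the question for the retest, and the 5% critical value
z_retest = 1.54
critical = NormalDist().inv_cdf(0.95)

print(n_academic, n_vocational)
print(round(z_first, 3), z_first > critical, z_retest > critical)
```

On these assumptions the first statistic is significant at the 5% level and the retest statistic is not.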

AQA S3 2008 June Q3

3 Pitted black olives in brine are sold in jars labelled "340 grams net weight". Two machines, A and B, independently fill these jars with olives before the brine is added. The weight, $X$ grams, of olives delivered by machine A may be modelled by a normal distribution with mean $\mu _ { X }$ and standard deviation 4.5. The weight, $Y$ grams, of olives delivered by machine B may be modelled by a normal distribution with mean $\mu _ { Y }$ and standard deviation 5.7. The mean weight of olives from a random sample of 10 jars filled by machine A is found to be 157 grams, whereas that from a random sample of 15 jars filled by machine B is found to be 162 grams. Test, at the $1 \%$ level of significance, the hypothesis that $\mu _ { X } = \mu _ { Y }$.
(6 marks)
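
Because both population standard deviations are given, the difference in sample means is exactly normal under the model. A minimal sketch of the two-tailed test follows; `two_sample_z` is a hypothetical helper, and the alternative hypothesis is assumed to be $\mu_X \neq \mu_Y$.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """z statistic for a difference in means with known standard deviations."""
    se = sqrt(sd_x ** 2 / n_x + sd_y ** 2 / n_y)
    return (mean_x - mean_y) / se


# Data from the question: machine A (X) and machine B (Y)
z_stat = two_sample_z(157, 4.5, 10, 162, 5.7, 15)

# Two-tailed test at the 1% level
critical = NormalDist().inv_cdf(0.995)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(round(z_stat, 3), round(critical, 4), abs(z_stat) > critical)
```

Here $|z|$ falls just short of the critical value, so on these assumptions the hypothesis $\mu_X = \mu_Y$ would not be rejected at the 1% level.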

AQA S3 2013 June Q3

3 A builders' merchant's depot has two machines, X and Y, each of which can be used for filling bags with sand or gravel. The weight, in kilograms, delivered by machine X may be modelled by a normal distribution with mean $\mu _ { \mathrm { X } }$ and standard deviation 25. The weight, in kilograms, delivered by machine Y may be modelled by a normal distribution with mean $\mu _ { \mathrm { Y } }$ and standard deviation 30. Fred, the depot's yardman, records the weights, in kilograms, of a random sample of 10 bags of sand delivered by machine X as
$\begin{array} { l l l l l l l l l l } 1055 & 1045 & 1000 & 985 & 1040 & 1025 & 1005 & 1030 & 1015 & 1060 \end{array}$
He also records the weights, in kilograms, of a random sample of 8 bags of gravel delivered by machine Y as $$\begin{array} { l l l l l l l l } 1085 & 1055 & 1055 & 1000 & 1035 & 1050 & 1005 & 1075 \end{array}$$

Construct a $95 \%$ confidence interval for $\mu _ { \mathrm { Y } } - \mu _ { \mathrm { X } }$, giving the limits to the nearest 5 kg.
Dot, the depot's manager, commented that Fred's data collection may have been biased. Justify her comment and explain how the possible bias could have been eliminated.
(2 marks)
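
The interval in the first part can be checked with a short sketch like the one below. It uses the known standard deviations 25 and 30 from the question, so a normal critical value applies; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# Recorded weights from the question
x = [1055, 1045, 1000, 985, 1040, 1025, 1005, 1030, 1015, 1060]
y = [1085, 1055, 1055, 1000, 1035, 1050, 1005, 1075]
sd_x, sd_y = 25, 30  # known population standard deviations

# Point estimate of mu_Y - mu_X and its standard error
diff = mean(y) - mean(x)
se = sqrt(sd_y ** 2 / len(y) + sd_x ** 2 / len(x))

# Two-sided 95% interval
z = NormalDist().inv_cdf(0.975)
lower, upper = diff - z * se, diff + z * se

# Limits to the nearest 5 kg, as the question asks
lower_5 = round(lower / 5) * 5
upper_5 = round(upper / 5) * 5

print(diff, round(lower, 2), round(upper, 2), lower_5, upper_5)
```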

AQA S3 2014 June Q2

2 Each household within a district council's area has two types of wheelie-bin: a black one for general refuse and a green one for garden refuse. Each type of bin is emptied by the council fortnightly. The weight, in kilograms, of refuse emptied from a black bin can be modelled by the random variable $B \sim \mathrm {~N} \left( \mu _ { B } , 0.5625 \right)$. The weight, in kilograms, of refuse emptied from a green bin can be modelled by the random variable $G \sim \mathrm {~N} \left( \mu _ { G } , 0.9025 \right)$. The mean weight of refuse emptied from a random sample of 20 black bins was 21.35 kg. The mean weight of refuse emptied from an independent random sample of 15 green bins was 21.90 kg. Test, at the $5 \%$ level of significance, the hypothesis that $\mu _ { B } = \mu _ { G }$.
[6 marks]
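
A possible outline of the calculation, assuming the second parameter of each normal distribution is the variance and that the alternative hypothesis is $\mu_B \neq \mu_G$:

```latex
z = \frac{\bar{b} - \bar{g}}{\sqrt{\dfrac{0.5625}{20} + \dfrac{0.9025}{15}}}
  = \frac{21.35 - 21.90}{\sqrt{0.028125 + 0.060167}}
  \approx \frac{-0.55}{0.2971}
  \approx -1.85
```

Since $|z| \approx 1.85 < 1.96$, the two-tailed 5% critical value, this outline would not reject $\mu_B = \mu_G$.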

Edexcel S4 2006 January Q3

3. A population has mean $\mu$ and variance $\sigma ^ { 2 }$. A random sample of size 3 is to be taken from this population and $\bar { X }$ denotes its sample mean. A second random sample of size 4 is to be taken from this population and $\bar { Y }$ denotes its sample mean.

Show that unbiased estimators for $\mu$ are given by
1. $\hat { \mu } _ { 1 } = \frac { 1 } { 3 } \bar { X } + \frac { 2 } { 3 } \bar { Y }$,
2. $\hat { \mu } _ { 2 } = \frac { 5 \bar { X } + 4 \bar { Y } } { 9 }$.
Calculate $\operatorname { Var } \left( \hat { \mu } _ { 1 } \right)$.
Given that $\operatorname { Var } \left( \hat { \mu } _ { 2 } \right) = \frac { 37 } { 243 } \sigma ^ { 2 }$, state, giving a reason, which of these two estimators should be used.
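
The variance comparison can be checked with exact fractions. The sketch below relies on $\operatorname{Var}(a \bar{X} + b \bar{Y}) = a^2 \sigma^2 / 3 + b^2 \sigma^2 / 4$ for independent samples of sizes 3 and 4, and works with the coefficient of $\sigma^2$ only; `variance_coefficient` is a hypothetical helper.

```python
from fractions import Fraction


def variance_coefficient(terms):
    """Coefficient of sigma^2 in Var(sum of a_i * mean_i).

    terms is a list of (weight, sample_size) pairs for independent
    sample means, each with variance sigma^2 / sample_size.
    """
    return sum(weight ** 2 / size for weight, size in terms)


# mu_1 = (1/3) X-bar + (2/3) Y-bar, with sample sizes 3 and 4
var_mu1 = variance_coefficient([(Fraction(1, 3), 3), (Fraction(2, 3), 4)])

# mu_2 = (5 X-bar + 4 Y-bar) / 9
var_mu2 = variance_coefficient([(Fraction(5, 9), 3), (Fraction(4, 9), 4)])

print(var_mu1, var_mu2, var_mu1 < var_mu2)
```

Both estimators have weights summing to 1, which is what makes them unbiased; the one with the smaller variance coefficient is the natural choice.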

Edexcel S4 2015 June Q6

6. A random sample $X _ { 1 } , X _ { 2 } , X _ { 3 } , \ldots , X _ { 2 n }$ is taken from a population with mean $\frac { \mu } { 3 }$ and variance $3 \sigma ^ { 2 }$. A second random sample $Y _ { 1 } , Y _ { 2 } , Y _ { 3 } , \ldots , Y _ { n }$ is taken from a population with mean $\frac { \mu } { 2 }$ and variance $\frac { \sigma ^ { 2 } } { 2 }$, where the $X$ and $Y$ variables are all independent.
$A$, $B$ and $C$ are possible estimators of $\mu$, where $$\begin{aligned} & A = \frac { X _ { 1 } + X _ { 2 } + X _ { 3 } + Y _ { 1 } + Y _ { 2 } } { 2 } \\
& B = \frac { 3 X _ { 1 } } { 2 } + \frac { 2 Y _ { 1 } } { 3 } \\
& C = \frac { 3 X _ { 1 } + 4 Y _ { 1 } } { 3 } \end{aligned}$$

Show that two of $A , B$ and $C$ are unbiased estimators of $\mu$ and find the bias of the third estimator of $\mu$.
Showing your working clearly, find which of $A$, $B$ and $C$ is the best estimator of $\mu$.
The estimator $$D = \frac { 1 } { k } \left( \sum _ { i = 1 } ^ { 2 n } X _ { i } + \sum _ { i = 1 } ^ { n } Y _ { i } \right)$$ is an unbiased estimator of $\mu$.
Find $k$ in terms of $n$.
Show that $D$ is also a consistent estimator of $\mu$.
Find the least value of $n$ for which $D$ is a better estimator of $\mu$ than any of $A$, $B$ or $C$.
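
The expectations and variances in this question are linear in the coefficients, so they can be checked with exact fractions. The sketch below tracks only the coefficient of $\mu$ in the mean and of $\sigma^2$ in the variance; `moments` and `d_moments` are hypothetical helpers, and the value $k = 7n/6$ is what unbiasedness of $D$ requires under this working.

```python
from fractions import Fraction as F

# Each X has mean mu/3 and variance 3*sigma^2; each Y has mean mu/2
# and variance sigma^2/2. Only the coefficients are stored.
MEAN_X, VAR_X = F(1, 3), F(3)
MEAN_Y, VAR_Y = F(1, 2), F(1, 2)


def moments(coeffs_x, coeffs_y):
    """Return (coefficient of mu in E, coefficient of sigma^2 in Var)."""
    mean = sum(coeffs_x) * MEAN_X + sum(coeffs_y) * MEAN_Y
    var = (sum(c ** 2 for c in coeffs_x) * VAR_X
           + sum(c ** 2 for c in coeffs_y) * VAR_Y)
    return mean, var


A = moments([F(1, 2)] * 3, [F(1, 2)] * 2)
B = moments([F(3, 2)], [F(2, 3)])
C = moments([F(1)], [F(4, 3)])


def d_moments(n):
    """Moments of D with k = 7n/6, the value that makes D unbiased."""
    k = F(7 * n, 6)
    return moments([1 / k] * (2 * n), [1 / k] * n)


# Smallest n for which Var(D) is below the variance of each of A, B and C
smallest_var = min(A[1], B[1], C[1])
least_n = 1
while d_moments(least_n)[1] >= smallest_var:
    least_n += 1

print(A, B, C, least_n)
```

In this working the variance coefficient of $D$ is $234 / (49 n)$, which tends to 0 as $n$ grows; together with unbiasedness, that is the usual argument for consistency.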
