Two-sample t-test (unknown variances)

Questions requiring a hypothesis test comparing two population means where population variances are unknown and must be estimated from sample data, typically using a two-sample t-test or pooled variance approach.

21 questions · Standard +0.4

5.05c Hypothesis test: normal distribution for population mean
CAIE S2 2021 November Q6
8 marks Standard +0.3
6 The random variable \(T\) denotes the time, in seconds, for 100 m races run by Tania. \(T\) is normally distributed with mean \(\mu\) and variance \(\sigma ^ { 2 }\). A random sample of 40 races run by Tania gave the following results. $$n = 40 \quad \Sigma t = 560 \quad \Sigma t ^ { 2 } = 7850$$
  1. Calculate unbiased estimates of \(\mu\) and \(\sigma ^ { 2 }\).
    The random variable \(S\) denotes the time, in seconds, for 100 m races run by Suki. \(S\) has the independent distribution \(\mathrm { N } ( 14.2,0.3 )\).
  2. Using your answers to part (a), find the probability that, in a randomly chosen 100 m race, Suki's time will be at least 0.1 s more than Tania's time.
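The arithmetic in this question can be checked with a short sketch. It is not a mark scheme: the variable names are illustrative, and part (b) is read as treating the estimates from part (a) as the true parameters of \(T\), with \(S\) and \(T\) independent.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics given in the question.
n, sum_t, sum_t2 = 40, 560, 7850

# Unbiased estimates of the mean and variance of T.
mean_t = sum_t / n
var_t = (sum_t2 - sum_t**2 / n) / (n - 1)

# S ~ N(14.2, 0.3). For independent normals, S - T is normal with
# mean = difference of means and variance = sum of variances.
diff = NormalDist(mu=14.2 - mean_t, sigma=sqrt(0.3 + var_t))

# P(S - T >= 0.1)
prob = 1 - diff.cdf(0.1)

print(mean_t, round(var_t, 4), round(prob, 3))
```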
OCR S3 2008 January Q1
6 marks Standard +0.3
1 A blueberry farmer increased the amount of water sprayed over his berries to see what effect this had on their weight. The farmer weighed each of a random sample of 80 berries of the previous season's crop and each of a random sample of 100 berries of the new crop. The results are summarised in the following table, in which \(\bar { x }\) denotes the sample mean weight in grams, and \(s ^ { 2 }\) denotes an unbiased estimate of the relevant population variance.

| | Sample size | \(\bar { x }\) | \(s ^ { 2 }\) |
| --- | --- | --- | --- |
| Previous season's crop \(( P )\) | 80 | 1.24 | 0.00356 |
| New crop \(( N )\) | 100 | 1.36 | 0.00340 |

  1. Calculate an estimate of \(\operatorname { Var } \left( \bar { X } _ { N } - \bar { X } _ { P } \right)\).
  2. Calculate a \(95 \%\) confidence interval for the difference in population mean weights.
  3. Give a reason why it is unnecessary to use a \(t\)-distribution in calculating the confidence interval.
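A minimal sketch of the calculation for parts 1 and 2, assuming the large samples justify a normal approximation (which is what part 3 asks about). The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Table values: sample size, sample mean, unbiased variance estimate.
n_p, mean_p, var_p = 80, 1.24, 0.00356    # previous season's crop
n_n, mean_n, var_n = 100, 1.36, 0.00340   # new crop

# Estimated variance of the difference in sample means.
var_diff = var_p / n_p + var_n / n_n

# Two-sided 95% interval, so use the upper 2.5% point of N(0, 1).
z = NormalDist().inv_cdf(0.975)
half_width = z * sqrt(var_diff)

diff = mean_n - mean_p
ci = (diff - half_width, diff + half_width)

print(var_diff, ci)
```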
OCR S3 2013 January Q3
7 marks Standard +0.3
3 Two reading schemes, \(A\) and \(B\), are compared by using them with a random sample of 9 five-year-old children. The children are divided into two groups, 5 allotted to scheme \(A\) and 4 to scheme \(B\), and the schemes are taught under similar conditions.
After one year the children are given the same test and their scores, \(x _ { A }\) and \(x _ { B }\), are summarised below. With the usual notation, $$\begin{aligned} & n _ { A } = 5 , \bar { x } _ { A } = 52.0 , \sum \left( x _ { A } - \bar { x } _ { A } \right) ^ { 2 } = 248 , \\ & n _ { B } = 4 , \bar { x } _ { B } = 56.5 , \sum \left( x _ { B } - \bar { x } _ { B } \right) ^ { 2 } = 381 . \end{aligned}$$ It may be assumed that scores have normal distributions.
  1. Calculate an \(80 \%\) confidence interval for the difference in population mean scores for the two methods.
  2. State a further assumption required for the validity of the interval.
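With samples this small, a pooled-variance \(t\) interval is the usual approach. The sketch below assumes equal population variances and takes the critical value 1.415 (upper 10% point of \(t\) with 7 degrees of freedom) from standard tables rather than computing it. The variable names are illustrative.

```python
from math import sqrt

# Summary statistics: sample size, mean, sum of squared deviations.
n_a, mean_a, ss_a = 5, 52.0, 248
n_b, mean_b, ss_b = 4, 56.5, 381

# Pooled unbiased estimate of the assumed common variance.
df = n_a + n_b - 2
pooled_var = (ss_a + ss_b) / df

# Standard error of the difference in sample means.
se = sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Assumed table value: 80% two-sided interval, t distribution, 7 d.f.
t_crit = 1.415

diff = mean_a - mean_b
ci = (diff - t_crit * se, diff + t_crit * se)

print(pooled_var, ci)
```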
CAIE FP2 2017 November Q10
13 marks Standard +0.8
10 A factory produces bottles of an energy juice. Two different machines are used to fill empty bottles with the juice. The manager chooses a random sample of 50 bottles filled by machine \(X\) and a random sample of 60 bottles filled by machine \(Y\). The volumes of juice, \(x\) and \(y\) respectively, measured in appropriate units, are summarised by $$\Sigma x = 45.5 , \quad \Sigma ( x - \bar { x } ) ^ { 2 } = 19.56 , \quad \Sigma y = 72.3 , \quad \Sigma ( y - \bar { y } ) ^ { 2 } = 30.25$$ where \(\bar { x }\) and \(\bar { y }\) are the sample means of the volume of juice in the bottles filled by \(X\) and \(Y\) respectively.
  1. Find a 90\% confidence interval for the difference between the mean volume of juice in bottles filled by machine \(X\) and the mean volume of juice in bottles filled by machine \(Y\).
    A test at the \(\alpha \%\) significance level does not provide evidence that there is any difference in the means of the volume of juice in bottles filled by machine \(X\) and the volume of juice in bottles filled by machine \(Y\).
  2. Find the set of possible values of \(\alpha\).
Edexcel S3 2022 January Q5
15 marks Standard +0.3
  1. A dog breeder claims that the mean weight of male Great Dane dogs is 20 kg more than the mean weight of female Great Dane dogs.
Tammy believes that the mean weight of male Great Dane dogs is more than 20 kg more than the mean weight of female Great Dane dogs. She takes random samples of 50 male and 50 female Great Dane dogs and records their weights. The results are summarised below, where \(x\) denotes the weight, in kg, of a male Great Dane dog and \(y\) denotes the weight, in kg, of a female Great Dane dog. $$\sum x = 3610 \quad \sum x ^ { 2 } = 260955.6 \quad \sum y = 2585 \quad \sum y ^ { 2 } = 133757.2$$
  1. Find unbiased estimates for the mean and variance of the weights of
    1. the male Great Dane dogs,
    2. the female Great Dane dogs.
  2. Stating your hypotheses clearly, carry out a suitable test to assess Tammy's belief. Use a \(5 \%\) level of significance and state your critical value.
  3. For the test in part (b), state whether or not it is necessary to assume that the weights of the Great Dane dogs are normally distributed. Give a reason for your answer.
  4. State an assumption you have made in carrying out the test in part (b).
Edexcel S3 2022 January Q2
8 marks Standard +0.3
  1. Secondary schools in a region conduct ability testing at the start of Year 7 and the start of Year 8. Each year a regional education officer randomly selects 240 Year 7 students and 240 Year 8 students from across the region. The results for last year are summarised in the table below.

| | Mean score | Variance of scores |
| --- | --- | --- |
| Year 7 | 101 | 38 |
| Year 8 | 103 | 42 |

The regional education officer claims that there is no difference between the mean scores of these two year groups.
  1. Test the regional education officer's claim at the \(1 \%\) significance level. You should state your hypotheses, test statistic and critical value clearly.
  2. Explain the significance of the Central Limit Theorem in part (a).
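The large-sample test in this question can be sketched as follows. The tabulated variances are assumed to be the estimates of the population variances, and the samples are assumed independent. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Table values: sample size, mean score, variance of scores.
n7, mean7, var7 = 240, 101, 38   # Year 7
n8, mean8, var8 = 240, 103, 42   # Year 8

# H0: mu7 = mu8 against H1: mu7 != mu8 (two-tailed).
se = sqrt(var7 / n7 + var8 / n8)
z = (mean8 - mean7) / se

# Two-tailed critical value at the 1% significance level.
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)

reject_h0 = abs(z) > z_crit
print(round(z, 3), round(z_crit, 3), reject_h0)
```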
Edexcel S3 2024 January Q5
9 marks Standard +0.3
  1. A professor claims that undergraduates studying History have a typing speed of more than 15 words per minute faster than undergraduates studying Maths.
A sample is taken of 38 undergraduates studying History and 45 undergraduates studying Maths. The typing speed, \(x\) words per minute, of each undergraduate is recorded. The results are summarised in the table below.

| | \(n\) | \(\bar { x }\) | \(s ^ { 2 }\) |
| --- | --- | --- | --- |
| Undergraduates studying History | 38 | 56.3 | 27.2 |
| Undergraduates studying Maths | 45 | 39.8 | 18.5 |

  1. Use a suitable test, at the \(5 \%\) level of significance, to investigate the professor's claim.
    State clearly your hypotheses, test statistic and critical value.
  2. State two assumptions you have made in carrying out the test in part (a).
Edexcel S3 2022 June Q2
11 marks Standard +0.3
  1. An experiment is conducted to compare the heat retention of two brands of flasks, brand \(A\) and brand \(B\). Both brands of flask have a capacity of 750 ml.
In the experiment 750 ml of boiling water is poured into the flask, which is then sealed. Four hours later the temperature, in \({ } ^ { \circ } \mathrm { C }\), of the water in the flask is recorded. A random sample of 100 flasks from brand \(A\) gives the following summary statistics, where \(x\) is the temperature of the water in the flask after four hours. $$\sum x = 7690 \quad \sum ( x - \bar { x } ) ^ { 2 } = 669.24$$
  1. Find unbiased estimates for the mean and variance of the temperature of the water, after four hours, for brand \(A\).
    A random sample of 80 flasks from brand \(B\) gives the following results, where \(y\) is the temperature of the water in the flask after four hours. $$\bar { y } = 75.9 \quad s _ { y } = 2.2$$
  2. Test, at the \(1 \%\) significance level, whether there is a difference in the mean water temperature after four hours between brand \(A\) and brand \(B\). You should state your hypotheses, test statistic and critical value clearly.
  3. Explain why it is reasonable to assume that \(\sigma ^ { 2 } = s ^ { 2 }\) in this situation.
Edexcel S3 2024 June Q5
14 marks Standard +0.3
  1. A manager of a large company is investigating the time it takes the company's employees to complete a task.
The manager believes that the mean time for full-time employees to complete the task is more than a minute quicker than the mean time for part-time employees to complete the task. The manager collects a random sample of 605 full-time employees and 45 part-time employees and records the times, \(t\) minutes, it takes each employee to complete the task. The results are summarised in the table below.

| | \(n\) | \(\bar { t }\) | \(s ^ { 2 }\) |
| --- | --- | --- | --- |
| Full-time employees | 605 | 5.6 | 9 |
| Part-time employees | 45 | 7.0 | 4 |

  1. Test, at the \(5 \%\) level of significance, the manager's claim. You should state your hypotheses, test statistic, critical value and conclusion clearly.
  2. State two assumptions you have made in carrying out the test in part (a).
    The company increases the size of the sample of part-time employees to 46. The time taken to complete the task by the extra employee is 8 minutes.
  3. Find an unbiased estimate of the variance for the sample of 46 part-time employees.
Edexcel S3 2020 October Q5
12 marks Standard +0.3
5. A greengrocer is investigating the weights of two types of orange, type \(A\) and type \(B\). She believes that on average type \(A\) oranges weigh greater than 5 grams more than type \(B\) oranges. She collects a random sample of 40 type \(A\) oranges and 32 type \(B\) oranges and records the weight, \(x\) grams, of each orange. The table shows a summary of her data.

| | \(n\) | \(\bar { x }\) | \(\sum x ^ { 2 }\) |
| --- | --- | --- | --- |
| Type \(A\) oranges | 40 | 140.4 | 790258 |
| Type \(B\) oranges | 32 | 134.7 | 581430 |

  1. Calculate unbiased estimates for the variance of the weights of the population of type \(A\) oranges and the variance of the weights of the population of type \(B\) oranges.
  2. Test, at the \(5 \%\) level of significance, the greengrocer's belief. You should state the hypotheses and the critical value used for this test.
  3. Explain how you have used the fact that the sample sizes are large in your answer to part (b).
Edexcel S3 Specimen Q7
17 marks Moderate -0.3
  1. A large company surveyed its staff to investigate the awareness of company policy. The company employs 6000 full-time staff and 4000 part-time staff.
    1. Describe how a stratified sample of 200 staff could be taken.
    2. Explain an advantage of using a stratified sample rather than a simple random sample.
    A random sample of 80 full-time staff and an independent random sample of 80 part-time staff were given a test of policy awareness. The results are summarised in the table below.

    | | Mean score \(( \bar { x } )\) | Variance of scores \(\left( s ^ { 2 } \right)\) |
    | --- | --- | --- |
    | Full-time staff | 52 | 21 |
    | Part-time staff | 50 | 19 |

  2. Stating your hypotheses clearly, test, at the \(1 \%\) level of significance, whether or not the mean policy awareness scores for full-time and part-time staff are different.
  3. Explain the significance of the Central Limit Theorem to the test in part (c).
  4. State an assumption you have made in carrying out the test in part (c).
    After all the staff had completed a training course the 80 full-time staff and the 80 part-time staff were given another test of policy awareness. The value of the test statistic \(z\) was 2.53.
  5. Comment on the awareness of company policy for the full-time and part-time staff in light of this result. Use a \(1 \%\) level of significance.
  6. Interpret your answers to part (c) and part (f).
Edexcel S3 2006 January Q5
13 marks Standard +0.3
5. Upon entering a school, a random sample of eight girls and an independent random sample of eighty boys were given the same examination in mathematics. The girls and boys were then taught in separate classes. After one year, they were all given another common examination in mathematics. The means and standard deviations of the boys' and the girls' marks are shown in the table.
Examination marks

| | Upon entry: mean | Upon entry: standard deviation | After 1 year: mean | After 1 year: standard deviation |
| --- | --- | --- | --- | --- |
| Boys | 50 | 12 | 59 | 6 |
| Girls | 53 | 12 | 62 | 6 |

You may assume that the test results are normally distributed.
  1. Test, at the \(5 \%\) level of significance, whether or not the difference between the means of the boys' and girls' results was significant when they entered school.
  2. Test, at the \(5 \%\) level of significance, whether or not the mean mark of the boys is significantly less than the mean mark of the girls in the 'After 1 year' examination.
  3. Interpret the results found in part (a) and part (b).
Edexcel S3 2010 June Q7
17 marks Moderate -0.3
  1. A large company surveyed its staff to investigate the awareness of company policy. The company employs 6000 full time staff and 4000 part time staff.
    1. Describe how a stratified sample of 200 staff could be taken.
    2. Explain an advantage of using a stratified sample rather than a simple random sample.
    A random sample of 80 full time staff and an independent random sample of 80 part time staff were given a test of policy awareness. The results are summarised in the table below.

    | | Mean score \(( \bar { x } )\) | Variance of scores \(\left( s ^ { 2 } \right)\) |
    | --- | --- | --- |
    | Full time staff | 52 | 21 |
    | Part time staff | 50 | 19 |

  2. Stating your hypotheses clearly, test, at the \(1 \%\) level of significance, whether or not the mean policy awareness scores for full time and part time staff are different.
  3. Explain the significance of the Central Limit Theorem to the test in part (c).
  4. State an assumption you have made in carrying out the test in part (c).
    After all the staff had completed a training course the 80 full time staff and the 80 part time staff were given another test of policy awareness. The value of the test statistic \(z\) was 2.53.
  5. Comment on the awareness of company policy for the full time and part time staff in light of this result. Use a \(1 \%\) level of significance.
  6. Interpret your answers to part (c) and part (f).
Edexcel S3 2013 June Q7
9 marks Standard +0.3
7. A farmer monitored the amount of lead in soil in a field next to a factory. He took 100 samples of soil, randomly selected from different parts of the field, and found the mean weight of lead to be \(67 \mathrm { mg } / \mathrm { kg }\) with standard deviation \(25 \mathrm { mg } / \mathrm { kg }\).
After the factory closed, the farmer took 150 samples of soil, randomly selected from different parts of the field, and found the mean weight of lead to be \(60 \mathrm { mg } / \mathrm { kg }\) with standard deviation \(10 \mathrm { mg } / \mathrm { kg }\).
  1. Test at the \(5 \%\) level of significance whether or not the mean weight of lead in the soil decreased after the factory closed. State your hypotheses clearly.
  2. Explain the significance of the Central Limit Theorem to the test in part (a).
  3. State an assumption you have made to carry out this test.
Edexcel S3 2015 June Q2
10 marks Standard +0.3
2. A researcher believes that the mean weight loss of those people using a slimming plan as part of a group is more than 1.5 kg a year greater than the mean weight loss of those using the plan on their own. The mean weight loss of a random sample of 80 people using the plan as part of a group is 8.7 kg with a standard deviation of 2.1 kg. The mean weight loss of a random sample of 65 people using the plan on their own is 6.6 kg with a standard deviation of 1.4 kg.
  1. Stating your hypotheses clearly, test the researcher's claim. Use a \(1 \%\) level of significance.
  2. For the test in part (a), state whether or not it is necessary to assume that the weight loss of a person using this plan has a normal distribution. Give a reason for your answer.
Edexcel S3 2017 June Q6
9 marks Standard +0.3
6. An engineer has developed a new battery. She claims that the new battery will last more than 8 hours longer, on average, than the old battery. To test the claim, the engineer randomly selects a sample of 50 new batteries and 40 old batteries. She records how long each battery lasts, \(x\) hours for the new batteries and \(y\) hours for the old batteries. The results are summarised in the table below.

| | \(n\) | Sample mean | \(s ^ { 2 }\) |
| --- | --- | --- | --- |
| New battery | 50 | \(\bar { x } = 83\) | 7 |
| Old battery | 40 | \(\bar { y } = 74\) | 6 |

  1. Test, at the \(5 \%\) level of significance, whether or not there is evidence to support the engineer's claim. State your hypotheses and show your working clearly.
  2. Explain the relevance of the Central Limit Theorem to the test in part (a).
AQA S3 2010 June Q2
8 marks Standard +0.3
2 Rodney and Derrick, two independent fruit and vegetable market stallholders, sell punnets of locally-grown raspberries from their stalls during June and July. The following information, based on independent random samples, was collected as part of an investigation by Trading Standards Officers.
Weight of raspberries in a punnet (grams)

| Stallholder | Sample size | Sample mean | Sample standard deviation, \(\boldsymbol { s }\) |
| --- | --- | --- | --- |
| Rodney | 50 | 225 | 5 |
| Derrick | 75 | 219 | 8 |

  1. Construct a \(99 \%\) confidence interval for the difference between the mean weight of raspberries in a punnet sold by Rodney and the mean weight of raspberries in a punnet sold by Derrick.
  2. What can be concluded from your confidence interval?
  3. In addition to weight, state one other factor that may influence whether customers buy raspberries from Rodney or from Derrick.
Edexcel S4 2006 June Q4
13 marks Standard +0.3
4. Two machines \(A\) and \(B\) produce the same type of component in a factory. The factory manager wishes to know whether the lengths, \(x \mathrm {~cm}\), of the components produced by the two machines have the same mean. The manager took a random sample of components from each machine and the results are summarised in the table below.

| | Sample size | Mean \(\bar { x }\) | Standard deviation \(s\) |
| --- | --- | --- | --- |
| Machine \(A\) | 9 | 4.83 | 0.721 |
| Machine \(B\) | 10 | 4.85 | 0.572 |

The lengths of components produced by the machines can be assumed to follow normal distributions.
  1. Use a two tail test to show, at the \(10 \%\) significance level, that the variances of the lengths of components produced by each machine can be assumed to be equal.
  2. Showing your working clearly, find a \(95 \%\) confidence interval for \(\mu _ { B } - \mu _ { A }\), where \(\mu _ { A }\) and \(\mu _ { B }\) are the mean lengths of the populations of components produced by machine \(A\) and machine \(B\) respectively.
    There are serious consequences for the production at the factory if the difference in mean lengths of the components produced by the two machines is more than 0.7 cm.
  3. State, giving your reason, whether or not the factory manager should be concerned.
Edexcel S4 2009 June Q4
14 marks Standard +0.3
  1. A farmer set up a trial to assess whether adding water to dry feed increases the milk yield of his cows. He randomly selected 22 cows. Thirteen of the cows were given dry feed and the other 9 cows were given the feed with water added. The milk yields, in litres per day, were recorded with the following results.

| | Sample size | Mean | \(s ^ { 2 }\) |
| --- | --- | --- | --- |
| Dry feed | 13 | 25.54 | 2.45 |
| Feed with water added | 9 | 27.94 | 1.02 |

You may assume that the milk yield from cows given the dry feed and the milk yield from cows given the feed with water added are from independent normal distributions.
  1. Test, at the \(10 \%\) level of significance, whether or not the variances of the populations from which the samples are drawn are the same. State your hypotheses clearly.
  2. Calculate a \(95 \%\) confidence interval for the difference between the two mean milk yields.
  3. Explain the importance of the test in part (a) to the calculation in part (b).
Edexcel S4 2013 June Q6
13 marks Challenging +1.2
6. The carbon content, measured in suitable units, of steel is normally distributed. Two independent random samples of steel were taken from a refining plant at different times and their carbon content recorded. The results are given below. $$\begin{array} { l l l l l l l } \text { Sample } A : & 1.5 & 0.9 & 1.3 & 1.2 & & \\ \text { Sample } B : & 0.4 & 0.6 & 0.8 & 0.3 & 0.5 & 0.4 \end{array}$$
  1. Stating your hypotheses clearly, carry out a suitable test, at the \(10 \%\) level of significance, to show that both samples can be assumed to have come from populations with a common variance \(\sigma ^ { 2 }\).
  2. Showing your working clearly, find the \(99 \%\) confidence interval for \(\sigma ^ { 2 }\) based on both samples.
Edexcel S4 2017 June Q1
14 marks Challenging +1.2
  1. The times taken by children to run 150 m are normally distributed. The times taken, \(x\) seconds, by a random sample of 9 boys and an independent random sample of 6 girls are recorded. The following statistics are obtained.

| | Number of children | Sample mean \(\bar { x }\) | \(\sum x ^ { 2 }\) |
| --- | --- | --- | --- |
| Boys | 9 | 22.8 | 4693.60 |
| Girls | 6 | 29.5 | 5236.12 |

  1. Test, at the \(10 \%\) level of significance, whether or not the variances of the two distributions are equal. State your hypotheses clearly.
    The Headteacher claims that the mean time taken for the girls is more than 5 seconds greater than the mean time taken for the boys.
  2. Stating your hypotheses clearly, test the Headteacher's claim. Use a \(1 \%\) level of significance and show your working clearly.
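As a rough check on the two test statistics in this question, the sketch below computes the variance-ratio statistic and then a pooled-variance \(t\) statistic, assuming the first test supports equal variances. The helper `unbiased_var` and the variable names are illustrative. Critical values are not computed here and would be read from \(F\) and \(t\) tables.

```python
from math import sqrt

def unbiased_var(n, mean, sum_sq):
    """Unbiased variance estimate from n, the sample mean and sum of x^2."""
    return (sum_sq - n * mean**2) / (n - 1)

# Table values for boys and girls.
n_b, mean_b = 9, 22.8
n_g, mean_g = 6, 29.5
var_b = unbiased_var(n_b, mean_b, 4693.60)
var_g = unbiased_var(n_g, mean_g, 5236.12)

# Variance-ratio statistic, larger estimate on top.
f_stat = max(var_b, var_g) / min(var_b, var_g)

# Pooled variance, assuming the populations share a common variance.
df = n_b + n_g - 2
pooled_var = ((n_b - 1) * var_b + (n_g - 1) * var_g) / df

# H0: mu_g - mu_b = 5 against H1: mu_g - mu_b > 5.
t_stat = (mean_g - mean_b - 5) / sqrt(pooled_var * (1 / n_b + 1 / n_g))

print(round(f_stat, 3), df, round(t_stat, 3))
```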