Edexcel S3 (Statistics 3) 2023 January

Question 1
1 A machine fills bottles with mineral water.
The machine is checked every day to ensure that it is working correctly. On a particular day a random sample of 100 bottles is taken. The volume of water, \(x\) millilitres, for each bottle is measured and each measurement is coded using $$y = x - 1000$$ The results are summarised below $$\sum y = 847 \quad \sum y ^ { 2 } = 13510.09$$
  (a) (i) Show that the value of the unbiased estimate of the mean of \(x\) is 1008.47
      (ii) Calculate the unbiased estimate of the variance of \(x\)
The machine was initially set so that the volume of water in a bottle had a mean value of 1010 millilitres. Later, a test at the \(5 \%\) significance level is used to determine whether or not the mean volume of water in a bottle has changed. If it has changed then the machine is stopped and reset.
  (b) Write down suitable null and alternative hypotheses for a 2-tailed test.
  (c) Find the critical region for \(\bar { X }\) in the above test.
  (d) Using your answer to part (a) and your critical region found in part (c), comment on whether or not the machine needs to be stopped and reset.
    Give a reason for your answer.
  (e) Explain why the use of \(\sigma ^ { 2 } = s ^ { 2 }\) is reasonable in this situation.
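The coded summaries in Question 1 can be checked numerically. The following is a minimal Python sketch, not a mark scheme. The variable names are illustrative, and the critical region assumes that the sample variance can stand in for \(\sigma ^ { 2 }\) and that \(\bar { X }\) is approximately normal for a sample of 100.

```python
from math import sqrt
from statistics import NormalDist

n = 100
sum_y = 847          # sum of coded values y = x - 1000
sum_y2 = 13510.09    # sum of squared coded values

# Subtracting a constant shifts the mean but leaves the variance unchanged.
mean_x = 1000 + sum_y / n
var_x = (sum_y2 - sum_y ** 2 / n) / (n - 1)

# Two-tailed 5% critical region for the sample mean under H0: mu = 1010.
# Assumption: sigma^2 is replaced by the unbiased estimate var_x.
z = NormalDist().inv_cdf(0.975)
half_width = z * sqrt(var_x / n)
lower, upper = 1010 - half_width, 1010 + half_width

print(f"mean of x       = {mean_x:.2f}")
print(f"variance of x   = {var_x:.2f}")
print(f"critical region: X-bar < {lower:.3f} or X-bar > {upper:.3f}")
```

The script uses the exact normal quantile; statistical tables give 1.96, so hand-calculated limits may differ in the third decimal place.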
Question 2
2 The table shows the season's best times, \(x\) seconds, for the 8 athletes who took part in the 200 m final in the 2021 Tokyo Olympics. It also shows their finishing position in the race.
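| Athlete | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
|---|---|---|---|---|---|---|---|---|
| Season's best time | 19.89 | 19.83 | 19.74 | 19.84 | 19.91 | 19.99 | 20.13 | 20.10 |
| Finishing position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |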
Given that the fastest season's best time is ranked number 1
  (a) calculate the value of the Spearman's rank correlation coefficient for these data.
  (b) Stating your hypotheses clearly, test, at the \(1 \%\) level of significance, whether or not there is evidence of a positive correlation between the rank of the season's best time and the finishing position for these athletes.
Chris suggests that it would be better to use the actual finishing time, \(y\) seconds, of these athletes rather than their finishing position. Given that $$S _ { x x } = 0.1286875 \quad S _ { y y } = 0.55275 \quad S _ { x y } = 0.225175$$
  (c) calculate the product moment correlation coefficient between the season's best time and the finishing time for these athletes.
    Give your answer correct to 3 decimal places.
  (d) Use your value of the product moment correlation coefficient to test, at the \(1 \%\) level of significance, whether or not there is evidence of a positive correlation between the season's best time and the finishing time for these athletes.
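Both coefficients in Question 2 follow from standard formulae. The sketch below is one possible way to compute them in Python from the table and the given \(S _ { x x }\), \(S _ { y y }\), \(S _ { x y }\). The names are illustrative, and the shortcut formula \(1 - \frac { 6 \sum d ^ { 2 } } { n ( n ^ { 2 } - 1 ) }\) is used on the assumption that there are no tied ranks, which holds for these data.

```python
from math import sqrt

best = {"A": 19.89, "B": 19.83, "C": 19.74, "D": 19.84,
        "E": 19.91, "F": 19.99, "G": 20.13, "H": 20.10}
finish = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8}

# Rank the season's best times, fastest time = rank 1.
ordered = sorted(best, key=best.get)
rank = {athlete: i + 1 for i, athlete in enumerate(ordered)}

# Spearman's coefficient from the squared rank differences (no ties).
n = len(best)
sum_d2 = sum((rank[a] - finish[a]) ** 2 for a in best)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Product moment correlation coefficient from the given summary statistics.
s_xx, s_yy, s_xy = 0.1286875, 0.55275, 0.225175
r = s_xy / sqrt(s_xx * s_yy)

print(f"sum of d^2 = {sum_d2}")
print(f"r_s = {r_s:.4f}")
print(f"r   = {r:.3f}")
```

The critical values for the two tests are not computed here; they would be read from the appropriate tables for \(n = 8\).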
Question 3
3 A mobile phone company offers an insurance policy to its customers when they purchase a mobile phone. The company conducted a survey on the age of the customers and whether or not claims were made. A random sample of 1200 customers from this company was investigated for 2020 and the results are shown in the table below.
| Age | Claim made in 2020 | No claim made in 2020 | Total |
|---|---|---|---|
| 17-20 years | 24 | 176 | 200 |
| 21-50 years | 48 | 652 | 700 |
| 51 years and over | 14 | 286 | 300 |
| Total | 86 | 1114 | 1200 |
The data are to be used to determine whether or not making a claim is independent of age.
  (a) Calculate the expected frequencies for the age group 51 years and over that
    (i) made a claim in 2020
    (ii) did not make a claim in 2020
The 4 classes of customers aged between 17 and 50 give a value of \(\sum \frac { ( O - E ) ^ { 2 } } { E } = 7.123\) correct to 3 decimal places.
  (b) Test, at the \(1 \%\) level of significance, whether or not making a claim is independent of age. Show your working clearly, stating your hypotheses, the degrees of freedom, the test statistic and the critical value used.
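The expected frequencies and the \(\sum \frac { ( O - E ) ^ { 2 } } { E }\) contributions in Question 3 can be reproduced from the contingency table. The following Python sketch is illustrative only (the dictionary layout and names are assumptions); it uses the usual rule that each expected frequency is row total times column total divided by the grand total.

```python
observed = {
    "17-20 years": (24, 176),
    "21-50 years": (48, 652),
    "51 years and over": (14, 286),
}

# Column totals (claim, no claim) and the grand total.
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

# Expected frequency = row total * column total / grand total.
expected = {
    age: tuple(sum(row) * c / grand_total for c in col_totals)
    for age, row in observed.items()
}

# Contribution of each age group to the chi-squared statistic.
contrib = {
    age: sum((o - e) ** 2 / e for o, e in zip(observed[age], expected[age]))
    for age in observed
}
chi2 = sum(contrib.values())
dof = (len(observed) - 1) * (len(col_totals) - 1)

print("expected, 51 years and over:", expected["51 years and over"])
print(f"contribution from ages 17 to 50: "
      f"{contrib['17-20 years'] + contrib['21-50 years']:.3f}")
print(f"test statistic = {chi2:.3f} with {dof} degrees of freedom")
```

The contribution from the two younger age groups agrees with the value 7.123 quoted in the question; the critical value would be taken from \(\chi ^ { 2 }\) tables.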
Question 4
4 A research student is investigating the number of children who are girls in families with 4 children. The table below shows her results for 200 such families.
| Number of girls | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Frequency | 15 | 68 | 69 | 38 | 10 |
The research student suggests that a binomial distribution with \(p = \frac { 1 } { 2 }\) could be a suitable model for the number of children who are girls in a family of 4 children.
  (a) Using her results and a \(5 \%\) significance level, test the research student's claim. You should state your hypotheses, expected frequencies, test statistic and the critical value used.
The research student decides to refine the model and retains the idea of using a binomial distribution but does not specify the probability that the child is a girl.
  (b) Use the data in the table to show that the probability that a child is a girl is 0.45
The research student uses the probability from part (b) to calculate a new set of expected frequencies, none of which are less than 5
The statistic \(\sum \frac { ( O - E ) ^ { 2 } } { E }\) is evaluated and found to be 2.47
  (c) Test, at the \(5 \%\) significance level, whether using a binomial distribution is suitable to model the number of children who are girls in a family of 4 children. You should state your hypotheses and the critical value used.
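The figures quoted in Question 4 can be checked with a short calculation. The sketch below is a hedged illustration in Python; the helper `expected_frequencies` is a hypothetical name introduced here. It computes the expected frequencies under \(\mathrm { B } ( 4 , p )\), the statistic for \(p = \frac { 1 } { 2 }\), and the estimate of \(p\) as total girls divided by total children.

```python
from math import comb

freq = [15, 68, 69, 38, 10]   # families with 0, 1, 2, 3, 4 girls
n_children = 4
families = sum(freq)

def expected_frequencies(p):
    """Expected number of families with k girls under B(4, p)."""
    return [families * comb(n_children, k) * p ** k * (1 - p) ** (n_children - k)
            for k in range(n_children + 1)]

def chi_squared(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Model with p fixed at 1/2.
exp_half = expected_frequencies(0.5)
stat_half = chi_squared(freq, exp_half)

# Estimate p from the data: total girls / total children.
p_hat = sum(k * f for k, f in enumerate(freq)) / (n_children * families)
exp_hat = expected_frequencies(p_hat)
stat_hat = chi_squared(freq, exp_hat)

print("expected (p = 0.5):", exp_half, f"statistic = {stat_half:.2f}")
print(f"p_hat = {p_hat}")
print("expected (p = p_hat):", [round(e, 2) for e in exp_hat],
      f"statistic = {stat_hat:.2f}")
```

Estimating \(p\) from the data costs one further degree of freedom, which affects the critical value looked up for the refined model.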
Question 5
5 Claire grows strawberries on her farm. She wants to compare two brands of fertiliser, brand \(A\) and brand \(B\). She grows two sets of plants of the same variety of strawberries under the same conditions, fertilising one set with brand \(A\) and the other with brand \(B\). The yields per plant, in grams, from each set of plants are summarised below.
| | Mean | Standard deviation | Number of plants |
|---|---|---|---|
| Fertiliser A | 1377 | 17.8 | 50 |
| Fertiliser B | 1368 | 18.4 | 40 |
  (a) Stating your hypotheses clearly, carry out a suitable test to assess whether the mean yield from plants using fertiliser \(A\) is greater than the mean yield from plants using fertiliser \(B\).
    Use a \(1 \%\) level of significance and state your test statistic and critical value.
The total cost of fertiliser \(A\) for Claire's 50 plants was \(\pounds 75\)
The total cost of fertiliser \(B\) for Claire's 40 plants was \(\pounds 50\)
Claire sells all her strawberries at \(\pounds 3\) per kilogram.
  (b) Use this information, together with your answer in part (a), to advise Claire on which of the two brands of fertiliser she should use next year in order to maximise her expected profit per plant, giving a reason for your answer.
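The arithmetic behind Question 5 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a model answer: the tabulated standard deviations are treated as estimates of the population values, the samples are taken to be large enough for a normal approximation, and the profit figures simply use the observed sample means. All names are illustrative.

```python
from math import sqrt

mean_a, sd_a, n_a = 1377, 17.8, 50   # fertiliser A, yields in grams
mean_b, sd_b, n_b = 1368, 18.4, 40   # fertiliser B, yields in grams

# Two-sample z statistic for H0: mu_A = mu_B against H1: mu_A > mu_B.
# Assumption: the sample variances stand in for the population variances.
standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
z = (mean_a - mean_b) / standard_error

# Profit per plant in pounds: revenue at 3 pounds per kilogram
# minus the fertiliser cost per plant.
price_per_kg = 3
profit_a = price_per_kg * mean_a / 1000 - 75 / n_a
profit_b = price_per_kg * mean_b / 1000 - 50 / n_b

print(f"z = {z:.4f}")
print(f"profit per plant, A = {profit_a:.3f}")
print(f"profit per plant, B = {profit_b:.3f}")
```

The test statistic would then be compared with the one-tailed \(1 \%\) critical value from normal tables.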
Question 6
6 A garden centre sells bags of stones and large bags of gravel.
The weight, \(X\) kilograms, of stones in a bag can be modelled by a normal distribution with unknown mean \(\mu\) and known standard deviation 0.4 The stones in each of a random sample of 36 bags from a large batch is weighed. The total weight of stones in these 36 bags is found to be 806.4 kg
  (a) Find a \(98 \%\) confidence interval for the mean weight of stones in the batch.
  (b) Explain why the use of the Central Limit theorem is not required to answer part (a)
The manufacturer of these bags of stones claims that bags in this batch have a mean weight of 22.5 kg
  (c) Using your answer to part (a), comment on the claim made by the manufacturer.
The weight, \(Y\) kilograms, of gravel in a large bag can be modelled by a normal distribution with mean 850 kg and standard deviation 5 kg. A builder purchases 10 large bags of gravel.
  (d) Find the probability that the mean weight of gravel in the 10 large bags is less than 848 kg
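The interval and the probability in Question 6 both rest on the distribution of a sample mean, \(\bar { X } \sim \mathrm { N } \left( \mu , \frac { \sigma ^ { 2 } } { n } \right)\). A minimal Python sketch using the standard library is given below. The names are illustrative, and the gravel calculation assumes the 10 bags have independent weights.

```python
from math import sqrt
from statistics import NormalDist

# Stones: known sigma, so the interval uses the normal distribution directly.
n, total_weight, sigma = 36, 806.4, 0.4
sample_mean = total_weight / n
z = NormalDist().inv_cdf(0.99)          # 98% two-sided interval
half_width = z * sigma / sqrt(n)
ci = (sample_mean - half_width, sample_mean + half_width)

# Gravel: mean of 10 bags, each N(850, 5^2).
# Assumption: the 10 weights are independent.
mean_of_ten = NormalDist(850, 5 / sqrt(10))
p_below_848 = mean_of_ten.cdf(848)

print(f"sample mean = {sample_mean:.1f}")
print(f"98% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"P(mean of 10 bags < 848) = {p_below_848:.4f}")
```

Because the exact quantile is used rather than the tabulated 2.3263, the limits may differ slightly from a hand calculation in the last decimal place.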
Question 7
7 At a particular supermarket, the times taken to serve each customer in a queue at a standard checkout may be modelled by a normal distribution with mean 240 seconds and standard deviation 20 seconds. There is a queue of 3 customers at a standard checkout.
Making a reasonable assumption about the times taken to serve these customers,
  (a) find the probability that the total time taken to serve the 3 customers will be less than 11 minutes.
  (b) State the assumption you have made in part (a)
In the supermarket there is also an express checkout, which is reserved for customers buying 10 or fewer items. The time taken to serve a customer at this express checkout may be modelled by a normal distribution with mean 100 seconds and standard deviation 8 seconds. On a particular day Jiang has 8 items to pay for and has to choose whether to join a queue of 3 customers waiting at a standard checkout or a queue of 7 customers waiting at the express checkout. Using a similar assumption to that made in part (a),
  (c) find the probability that the total time taken to serve the 3 customers at the standard checkout will exceed the total time taken to serve the 7 customers at the express checkout.
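Question 7 combines normal variables by adding means and variances. The Python sketch below illustrates that calculation; it is not an official solution. It assumes that all service times are independent of one another, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Standard checkout: total of 3 independent N(240, 20^2) service times.
standard_total = NormalDist(3 * 240, sqrt(3 * 20 ** 2))
p_under_11_min = standard_total.cdf(11 * 60)

# Difference D = (standard total) - (express total), where the express total
# is the sum of 7 independent N(100, 8^2) service times.
# Means subtract; variances add because of the independence assumption.
difference = NormalDist(3 * 240 - 7 * 100, sqrt(3 * 20 ** 2 + 7 * 8 ** 2))
p_standard_longer = 1 - difference.cdf(0)

print(f"P(total at standard checkout < 11 minutes) = {p_under_11_min:.4f}")
print(f"P(standard total > express total) = {p_standard_longer:.4f}")
```

Working in seconds throughout avoids mixing units, since the 11 minute limit corresponds to 660 seconds.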