Edexcel FS2 (Further Statistics 2) 2022 June

Question 1
  1. Kwame is investigating a possible relationship between average March temperature, \(t ^ { \circ } \mathrm { C }\), and tea yield, \(y \mathrm {~kg} /\) hectare, for tea grown in a particular location. He uses 30 years of past data to produce the following summary statistics for a linear regression model, with tea yield as the dependent variable.
$$\begin{aligned} & \text { Residual Sum of Squares } ( \mathrm { RSS } ) = 1666567 \quad \mathrm { S } _ { t t } = 52.0 \quad \mathrm { S } _ { y y } = 1774155 \\
& \text { least squares regression line: } \quad \text { gradient } = 45.5 \quad y \text {-intercept } = 2080 \end{aligned}$$
  1. Use the regression model to predict the tea yield for an average March temperature of \(20 ^ { \circ } \mathrm { C }\)
He also produces the following residual plot for the data.
[Figure: residual plot]
  2. Explain what you understand by the term residual.
  3. Calculate the product moment correlation coefficient between \(t\) and \(y\)
  4. Explain why the linear model may not be a good fit for the data
    1. with reference to your answer to part (c)
    2. with reference to the residual plot.
Kwame also collects data on total March rainfall, \(w \mathrm {~mm}\), for each of these 30 years. For a linear regression model of \(w\) on \(t\) the following summary statistic is found.
$$\text { Residual Sum of Squares (RSS) } = 86754$$
Kwame concludes that since this model has a smaller RSS, there must be a stronger linear relationship between \(w\) and \(t\) than between \(y\) and \(t\) (where RSS \(= 1666567\)).
  5. State, giving a reason, whether or not you agree with the reasoning that led to Kwame's conclusion.
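As a numerical check on parts (a) and (c), here is a minimal sketch. It assumes the standard simple linear regression identity \(\mathrm{RSS} = \mathrm{S}_{yy}(1 - r^2)\) and takes the sign of \(r\) from the positive gradient. The variable names are illustrative and not part of the question.

```python
import math

# Summary statistics quoted in the question
rss = 1666567
s_yy = 1774155
gradient = 45.5
intercept = 2080

# Part (a): predicted yield at an average March temperature of 20 degrees C
predicted_yield = intercept + gradient * 20

# Part (c): RSS = S_yy * (1 - r^2), so r^2 = 1 - RSS / S_yy.
# Assumption: r has the same sign as the gradient (positive here).
r_squared = 1 - rss / s_yy
r = math.copysign(math.sqrt(r_squared), gradient)

print(f"predicted yield = {predicted_yield:.0f} kg/hectare")
print(f"r = {r:.3f}")
```

The same identity shows why RSS alone cannot be compared across models: the RSS for \(w\) on \(t\) depends on \(\mathrm{S}_{ww}\), which is not given.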
Question 2
  1. A factory produces yellow tennis balls and white tennis balls. Independent samples, one of yellow tennis balls and one of white tennis balls, are taken. The table shows information about the weights of the yellow tennis balls, \(Y\) grams, and the weights of the white tennis balls, \(W\) grams.
| | Sample size | Mean weight of random sample (grams) | Known population standard deviation of weights (grams) |
|---|---|---|---|
| Yellow tennis balls | 120 | 57.2 | 1.2 |
| White tennis balls | 140 | 56.9 | 0.9 |
  1. Find a 95\% confidence interval for the mean weight of yellow tennis balls.
Jamie claims that the mean weight of the population of yellow tennis balls is greater than the mean weight of the population of white tennis balls. A test of Jamie's claim is carried out.
    1. Specify the approximate distribution of \(\bar { Y } - \bar { W }\) under the null hypothesis of the test.
    2. Explain the relevance of the large sample sizes to your answer to part (i).
  2. Complete the hypothesis test using a \(5 \%\) level of significance. You should state your hypotheses and the value of your test statistic clearly.
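The calculations in this question can be sketched as follows. This is a rough check, assuming a normal approximation for both sample means with the known population standard deviations, and a one-tailed test of \(\mu_Y = \mu_W\) against \(\mu_Y > \mu_W\). It uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Data from the table: sample size, sample mean, known population sd
n_y, mean_y, sd_y = 120, 57.2, 1.2
n_w, mean_w, sd_w = 140, 56.9, 0.9

std_normal = NormalDist()

# 95% confidence interval for the mean weight of yellow tennis balls
z_975 = std_normal.inv_cdf(0.975)
half_width = z_975 * sd_y / math.sqrt(n_y)
ci_yellow = (mean_y - half_width, mean_y + half_width)

# Under H0 the difference of sample means is approximately
# N(0, sd_y^2 / n_y + sd_w^2 / n_w)
se_diff = math.sqrt(sd_y**2 / n_y + sd_w**2 / n_w)
z_stat = (mean_y - mean_w) / se_diff

# One-tailed critical value at the 5% level
z_crit = std_normal.inv_cdf(0.95)
reject_h0 = z_stat > z_crit

print(ci_yellow, z_stat, z_crit, reject_h0)
```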
Question 3
  1. The random variable \(X \sim \mathrm {~N} \left( 5,0.4 ^ { 2 } \right)\) and the random variable \(Y \sim \mathrm {~N} \left( 8,0.1 ^ { 2 } \right)\)
    \(X\) and \(Y\) are independent random variables.
    A random sample of \(a\) independent observations is taken from the distribution of \(X\) and one observation is taken from the distribution of \(Y\)
The random variable \(W = X _ { 1 } + X _ { 2 } + X _ { 3 } + \ldots + X _ { a } + b Y\) and has the distribution \(\mathrm { N } \left( 169,2 ^ { 2 } \right)\)
Find the value of \(a\) and the value of \(b\)
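One way to check an answer is to write down the two conditions on \(W\) and search over sample sizes. The sketch below assumes \(a\) is a positive integer (it is a sample size) and uses \(\mathrm{E}(W) = 5a + 8b = 169\) and \(\operatorname{Var}(W) = 0.4^2 a + 0.1^2 b^2 = 4\). The search loop is illustrative, not a required method.

```python
import math

solutions = []
# Var(W) = 0.16a + 0.01b^2 = 4 forces 0.16a <= 4, so a <= 25
for a in range(1, 26):
    b = (169 - 5 * a) / 8                  # from E(W) = 5a + 8b = 169
    variance = 0.16 * a + 0.01 * b ** 2    # a * 0.4^2 + b^2 * 0.1^2
    if math.isclose(variance, 4.0, rel_tol=1e-9):
        solutions.append((a, b))

print(solutions)
```

Solving the two equations algebraically gives a quadratic in \(b\); the search simply discards the root that does not correspond to an integer \(a\).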
Question 4
  1. A doctor believes that a four-week exercise programme can reduce the resting heart rate of her patients. She takes a random sample of 7 patients and records their resting heart rate before the exercise programme and again after the exercise programme.
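| Patient | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) |
|---|---|---|---|---|---|---|---|
| Resting heart rate before | 65 | 68 | 77 | 79 | 80 | 88 | 92 |
| Resting heart rate after | 63 | 65 | 73 | 76 | 80 | 84 | 80 |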
  1. Using a \(5 \%\) level of significance, carry out an appropriate test of the doctor's belief. You should state your hypotheses, test statistic and critical value.
  2. State the assumption made about the resting heart rates that was required to carry out the test.
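A minimal sketch of the arithmetic for a paired \(t\)-test on these data, assuming the differences (before minus after) are normally distributed and testing \(\mu_d = 0\) against \(\mu_d > 0\). The critical value 1.943 is the one-tailed 5\% point of the \(t\)-distribution with 6 degrees of freedom, taken from standard tables rather than computed here.

```python
import math
from statistics import mean, stdev

before = [65, 68, 77, 79, 80, 88, 92]
after = [63, 65, 73, 76, 80, 84, 80]

# Paired design: work with the differences for each patient
differences = [b - a for b, a in zip(before, after)]
n = len(differences)

d_bar = mean(differences)
s_d = stdev(differences)          # sample standard deviation (divisor n - 1)

# Test statistic for H0: mu_d = 0 against H1: mu_d > 0
t_stat = d_bar / (s_d / math.sqrt(n))

# One-tailed 5% critical value for t with n - 1 = 6 degrees of freedom (tables)
t_crit = 1.943
reject_h0 = t_stat > t_crit

print(differences, d_bar, round(t_stat, 3), reject_h0)
```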
Question 5
  1. The concentration of an air pollutant is measured in micrograms \(/ \mathrm { m } ^ { 3 }\)
Samples of air were taken at two different sites and the concentration of this particular air pollutant was recorded. For Site \(A\) the summary statistics are shown below.
| | number of samples | \(S _ { A } ^ { 2 }\) |
|---|---|---|
| Site \(A\) | 13 | 6.39 |
For Site \(B\) there were 9 samples of air taken.
A test of the hypothesis \(\mathrm { H } _ { 0 } : \sigma _ { A } ^ { 2 } = \sigma _ { B } ^ { 2 }\) against the hypothesis \(\mathrm { H } _ { 1 } : \sigma _ { A } ^ { 2 } \neq \sigma _ { B } ^ { 2 }\) is carried out using a \(2 \%\) level of significance.
  1. State a necessary assumption required to carry out the test.
Given that the assumption in part (a) holds,
  2. find the set of values of \(s _ { B } ^ { 2 }\) that would lead to the null hypothesis being rejected,
  3. find a 99\% confidence interval for the variance of the concentration of the air pollutant at Site A.
Question 6
  1. Korhan and Louise challenge each other to find an estimator for the mean, \(\mu\), of the continuous random variable \(X\) which has variance \(\sigma ^ { 2 }\)
    \(X _ { 1 } , X _ { 2 } , X _ { 3 } , \ldots , X _ { n }\) are \(n\) independent observations taken from \(X\)
    Korhan's estimator is given by
$$K = \frac { 2 } { n ( n + 1 ) } \sum _ { r = 1 } ^ { n } r X _ { r }$$
Louise's estimator is given by
$$L = \frac { X _ { 1 } + X _ { 2 } } { 3 } + \frac { X _ { 3 } + X _ { 4 } + \ldots + X _ { n } } { 3 ( n - 2 ) }$$
  1. Show that \(K\) and \(L\) are both unbiased estimators of \(\mu\)
    1. Find \(\operatorname { Var } ( K )\)
    2. Find \(\operatorname { Var } ( L )\)
The winner of the challenge is the person who finds the better estimator.
  2. Determine the winner of the challenge for large values of \(n\). Give reasons for your answer.
Question 7
  1. A rectangle is to have an area of \(40 \mathrm {~cm} ^ { 2 }\)
The length of the rectangle, \(L \mathrm {~cm}\), follows a continuous uniform distribution over the interval [4, 10]
Find the expected value of the perimeter of the rectangle.
Use algebraic integration, rather than your calculator, to evaluate any definite integrals.
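The question asks for algebraic integration, so the sketch below is only a cross-check. It assumes the width is \(40 / L\), so the perimeter is \(2L + 80 / L\), and that \(L\) has density \(\tfrac{1}{6}\) on [4, 10]. Integrating term by term gives \(14 + \tfrac{40}{3} \ln \tfrac{5}{2}\), which is compared with a midpoint-rule estimate. The function and variable names are illustrative.

```python
import math

def perimeter(length):
    # Area is fixed at 40 cm^2, so width = 40 / length
    return 2 * length + 2 * 40 / length

# Closed form: (1/6) * integral of (2l + 80/l) dl over [4, 10]
#            = (100 - 16)/6 + (80/6) * ln(10/4)
closed_form = 14 + (40 / 3) * math.log(10 / 4)

# Numerical cross-check using the midpoint rule with n strips
n = 100_000
h = (10 - 4) / n
numeric = sum(perimeter(4 + (i + 0.5) * h) for i in range(n)) * h / 6

print(closed_form, numeric)
```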
Question 8
  1. The continuous random variable \(X\) has cumulative distribution function given by
$$\mathrm { F } ( x ) = \left\{ \begin{array} { c r } 0 & x < 1 \\
1.5 x - 0.25 x ^ { 2 } - 1.25 & 1 \leqslant x \leqslant 3 \\
1 & x > 3 \end{array} \right.$$
  1. Find the exact value of the median of \(X\)
  2. Find \(\mathrm { P } ( X < 1.6 \mid X > 1.2 )\)
The random variable \(Y = \frac { 1 } { X }\)
  3. Specify fully the cumulative distribution function of \(Y\)
  4. Hence or otherwise find the mode of \(Y\)