OCR Further Statistics (Further Statistics) 2024 June

Question 1
1 A discrete random variable \(X\) has the following distribution, where \(a , b\) and \(c\) are constants.

| \(x\) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| \(\mathrm{P}(X = x)\) | \(a\) | \(b\) | \(c\) | 0.1 |

It is given that \(\mathrm { E } ( X ) = 1.25\) and \(\operatorname { Var } ( X ) = 0.8875\).
  1. Determine the values of \(a\), \(b\) and \(c\).
  2. The random variable \(Y\) is defined by \(Y = 7 - 2 X\). Write down the value of \(\operatorname { Var } ( Y )\).
  3. Twenty independent observations of \(X\) are obtained. The number of those observations for which \(X = 3\) is denoted by \(T\). Find the value of \(\operatorname { Var } ( T )\).
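
A numerical check for Question 1 can be sketched as follows. This is not a model answer: it assumes the usual identities \(\mathrm{E}(X^2) = \operatorname{Var}(X) + \mathrm{E}(X)^2\), \(\operatorname{Var}(pX + q) = p^2 \operatorname{Var}(X)\) and \(T \sim \mathrm{B}(20, 0.1)\), and the variable names are illustrative only.

```python
from fractions import Fraction as F

mean = F(5, 4)          # E(X) = 1.25, as given
variance = F(71, 80)    # Var(X) = 0.8875, as given
p3 = F(1, 10)           # P(X = 3) = 0.1, as given

second_moment = variance + mean**2   # E(X^2) = Var(X) + E(X)^2

# Equations left after moving the x = 3 terms to the right-hand side:
#   a + b +  c = 1 - p3
#       b + 2c = E(X)   - 3 * p3
#       b + 4c = E(X^2) - 9 * p3
rhs1 = mean - 3 * p3
rhs2 = second_moment - 9 * p3
c = (rhs2 - rhs1) / 2
b = rhs1 - 2 * c
a = 1 - p3 - b - c

var_y = (-2) ** 2 * variance    # Var(7 - 2X) = 4 Var(X)
var_t = 20 * p3 * (1 - p3)      # assumes T ~ B(20, 0.1), so Var(T) = npq

print(a, b, c, var_y, var_t)
```

Exact fractions are used so that the solved probabilities can be compared with the given decimals without rounding error.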
Question 2
2 A newspaper article claimed that "taller dog owners have taller dogs as pets". Alex investigated this claim and obtained data from a random sample of 16 fellow students who owned exactly one dog. The results are summarised as follows, where the height of the student, in cm, is denoted by \(h\) and the height, in cm, of their dog is denoted by \(d\).
\(n = 16 \quad \sum h = 2880 \quad \sum d = 660 \quad \sum h^2 = 519276 \quad \sum d^2 = 30000 \quad \sum hd = 119425\)
  1. Calculate the value of Pearson's product moment correlation coefficient for the data.
  2. State what your answer tells you about a scatter diagram illustrating the data.
  3. Use the data to test, at the \(5 \%\) significance level, the claim of the newspaper article.
  4. Explain whether the answer to part (a) would be likely to be different if the dogs' weights had been used instead of their heights.
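
The correlation coefficient in Question 2 can be checked from the summary statistics alone. The sketch below assumes the standard formulae \(S_{hh} = \sum h^2 - (\sum h)^2 / n\) (and similarly for \(S_{dd}\), \(S_{hd}\)) with \(r = S_{hd} / \sqrt{S_{hh} S_{dd}}\); the variable names are illustrative.

```python
import math

# Summary statistics as given in the question.
n = 16
sum_h, sum_d = 2880, 660
sum_h2, sum_d2, sum_hd = 519276, 30000, 119425

# Corrected sums of squares and products.
s_hh = sum_h2 - sum_h**2 / n
s_dd = sum_d2 - sum_d**2 / n
s_hd = sum_hd - sum_h * sum_d / n

# Pearson's product moment correlation coefficient.
r = s_hd / math.sqrt(s_hh * s_dd)
print(s_hh, s_dd, s_hd, round(r, 3))
```

The hypothesis test in the third part still needs a critical value from the appropriate table, which this sketch does not supply.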
Question 3
3 Research suggests that the mean reading age of a child about to start secondary school is 10.75. The reading ages, \(X\) years, of a random sample of 80 children who were about to start secondary school in a particular district were measured, and the results are summarised as follows. $$n = 80 \quad \sum x = 893 \quad \sum x^2 = 10267$$
  1. Test at the \(5 \%\) significance level whether the mean reading age of children about to start secondary school in this district is not 10.75.
  2. A student wrote: "Although we do not know that the distribution of \(X\) is normal, the central limit theorem allows us to assume that it is, as the sample size is large." This statement is incorrect. Give a corrected version of the student's statement.
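
The arithmetic behind the test in Question 3 can be sketched as below. It assumes a two-tailed large-sample \(z\)-test in which the unbiased variance estimate stands in for the population variance; `NormalDist` is from Python's standard `statistics` module, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics and hypothesised mean, as given in the question.
n, sum_x, sum_x2 = 80, 893, 10267
mu0 = 10.75

sample_mean = sum_x / n
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)   # unbiased estimate of the variance

# Test statistic under H0: mu = 10.75, using the normal approximation
# to the distribution of the sample mean.
z = (sample_mean - mu0) / sqrt(s2 / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed
reject_h0 = p_value < 0.05

print(round(sample_mean, 4), round(s2, 4), round(z, 3), reject_h0)
```

Under these assumptions the \(p\)-value comes out slightly above 0.05, so the sketch does not reject the null hypothesis at the 5% level.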
Question 4
4
  1. Write down the number of ways of choosing 5 objects from 12 distinct objects.
  2. Each possible set of 5 different integers selected from the integers \(1,2 , \ldots , 12\) is obtained, and for each set, the sum of the 5 integers is found. The sum \(S\) can take values between 15 and 50 inclusive. Part of the frequency distribution of \(S\) is shown in the following table, together with the cumulative frequencies.
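
    | \(S\) | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
    |---|---|---|---|---|---|---|---|---|---|
    | Frequency | 1 | 1 | 2 | 3 | 5 | 7 | 10 | 13 | 17 |
    | Cumulative frequency | 1 | 2 | 4 | 7 | 12 | 19 | 29 | 42 | 59 |
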
    Use these numbers to determine the critical region for a 1-tail Wilcoxon rank-sum test at the \(2 \%\) significance level when \(m = 5\) and \(n = 7\).
  3. A student says that, for a Wilcoxon rank-sum test on samples of size \(m\) and \(n\), where \(m\) and \(n\) are large, the mean and variance of the test statistic \(R _ { m }\) are 200 and \(616 \frac { 2 } { 3 }\) respectively. Show that at least one of these values must be incorrect.
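
The table in Question 4 can be reproduced by direct enumeration, and the claim in the last part can be checked with exact arithmetic. The sketch below assumes a lower-tail critical region and the standard results \(\mathrm{E}(R_m) = \tfrac{1}{2} m(m+n+1)\) and \(\operatorname{Var}(R_m) = \tfrac{1}{12} mn(m+n+1)\), so that the ratio of variance to mean is \(n/6\); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

# Frequency of each possible sum S over all 5-subsets of {1, ..., 12}.
total = comb(12, 5)
freq = Counter(sum(c) for c in combinations(range(1, 13), 5))

# Largest s with P(S <= s) <= 2%, giving the lower-tail critical region S <= s.
cumulative = 0
critical_upper = None
for s in sorted(freq):
    cumulative += freq[s]
    if Fraction(cumulative, total) <= Fraction(2, 100):
        critical_upper = s
    else:
        break

# Last part: Var / mean = n / 6, so the stated values imply this n.
n_implied = 6 * Fraction(1850, 3) / 200   # 616 2/3 = 1850/3

print(total, [freq[s] for s in range(15, 24)], critical_upper, n_implied)
```

A non-integer value of `n_implied` means the stated mean and variance cannot both hold for any sample size.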
Question 5
5 Some bird-watchers study the song of chaffinches in a particular wood. They investigate whether the number, \(N\), of separate bursts of song in a 5 minute period can be modelled by a Poisson distribution. They assume that a burst of song can be considered as a single event, and that bursts of song occur randomly.
(a) State two further assumptions needed for \(N\) to be well modelled by a Poisson distribution.
The bird-watchers record the value of \(N\) in each of 60 periods of 5 minutes. The mean and variance of the results are 3.55 and 5.6475 respectively.
(b) Explain what this suggests about the validity of a Poisson distribution as a model in this context.
The complete results are shown in the table.
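
| \(n\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | \(\geqslant 9\) |
|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 10 | 3 | 7 | 8 | 13 | 6 | 6 | 2 | 5 | 0 |
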
The bird-watchers carry out a \(\chi ^ { 2 }\) goodness of fit test at the \(5 \%\) significance level.
(c) State suitable hypotheses for the test.
(d) Determine the contribution to the test statistic for \(n = 3\).
(e) The total value of the test statistic, obtained by combining the cells for \(n \leqslant 1\) and also for \(n \geqslant 6\), is 9.202, correct to 4 significant figures. Complete the goodness of fit test.
(f) It is known that chaffinches are more likely to sing in the presence of other chaffinches. Explain whether this fact affects the validity of a Poisson model for \(N\).
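
The quoted mean and variance, and the contribution asked for in part (d), can be checked from the frequency table. The sketch below assumes the Poisson parameter is estimated by the sample mean and that the variance uses divisor \(n\), which reproduces the quoted 5.6475; the variable names are illustrative.

```python
import math

# Observed frequencies for n = 0, 1, ..., 8 and n >= 9 (the last cell is empty).
observed = [10, 3, 7, 8, 13, 6, 6, 2, 5, 0]
total = sum(observed)

mean = sum(k * f for k, f in enumerate(observed)) / total
variance = sum(k * k * f for k, f in enumerate(observed)) / total - mean**2

# Expected frequency for n = 3 under Po(mean), and its chi-squared contribution.
expected_3 = total * math.exp(-mean) * mean**3 / math.factorial(3)
contribution_3 = (observed[3] - expected_3) ** 2 / expected_3

print(total, round(mean, 4), round(variance, 4),
      round(expected_3, 3), round(contribution_3, 3))
```

For part (e), combining cells as described leaves six cells. If the mean is estimated from the data, that gives 4 degrees of freedom, for which the 5% critical value of \(\chi^2\) is 9.488.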
Question 6
6 A bag contains 6 identical blue counters and 5 identical yellow counters.
  1. Three counters are selected at random, without replacement. Find the probability that at least two of the counters are blue.

  All 11 counters are now arranged in a row in a random order.
  2. Find the probability that all the yellow counters are next to each other.
  3. Find the probability that no yellow counter is next to another yellow counter.
  4. Find the probability that the counters are arranged in such a way that both of the following conditions hold.
    • Exactly three of the yellow counters are next to one another.
    • Neither of the other two yellow counters is next to a yellow counter.
  5. Explain whether the answer to part (d) would be different if the yellow counters were numbered 1, 2, 3, 4 and 5, so that they are not identical.
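
The arrangement probabilities in Question 6 can be checked by brute force. The sketch below assumes that, with identical counters, each of the \(\binom{11}{5}\) sets of positions for the yellow counters is equally likely; `run_lengths` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def run_lengths(positions):
    """Sorted lengths of the maximal runs of consecutive positions."""
    runs, length = [], 1
    for prev, cur in zip(positions, positions[1:]):
        if cur == prev + 1:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return sorted(runs)

# Every way of choosing which 5 of the 11 places hold yellow counters.
layouts = [run_lengths(p) for p in combinations(range(11), 5)]
total = len(layouts)

p_all_together = Fraction(sum(r == [5] for r in layouts), total)
p_none_adjacent = Fraction(sum(r == [1] * 5 for r in layouts), total)
p_three_and_two_singles = Fraction(sum(r == [1, 1, 3] for r in layouts), total)

# First part: at least two blue among three counters drawn without replacement.
p_at_least_two_blue = Fraction(comb(6, 2) * comb(5, 1) + comb(6, 3), comb(11, 3))

print(total, p_at_least_two_blue, p_all_together,
      p_none_adjacent, p_three_and_two_singles)
```

Describing each layout by its run lengths lets the same enumeration answer all three arrangement questions.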
Question 7
7 The coordinates of a set of 10 points are denoted by \((x_i, y_i)\) for \(i = 1, 2, \ldots, 10\). For a particular set of values of \((x_i, y_i)\) and any constants \(a\) and \(b\) it can be shown that
\(\sum \left( y_i - a - b x_i \right)^2 = 10 (11 - a - 6b)^2 + 126 \left( b - \frac{83}{42} \right)^2 + \frac{139}{14}\).
(a)(i) Explain why \(\sum \left( y_i - a - b x_i \right)^2\) is minimised by taking \(b = \frac{83}{42}\) and \(a = 11 - 6b\).
(a)(ii) Hence explain why the equation of the regression line of \(y\) on \(x\) for these points is given by the corresponding values of \(a\) and \(b\) (so that the equation is \(y = \frac{83}{42} x - \frac{6}{7}\)).
(b) State which of the following terms cannot apply to the variable \(X\) if the regression line of \(y\) on \(x\) can be used for estimating values of \(Y\): Dependent, Independent, Controlled, Response.
(c) Use the regression line to estimate the value of \(y\) corresponding to \(x = 8\).
(d) State what must be true of the value \(x = 8\) if the estimate in part (c) is to be reliable.
(e) Variables \(u\) and \(v\) are related to \(x\) and \(y\) by the following relationships.
\(u = 2 + 4x \quad v = 8 - 2y\)
Show that the gradient of the regression line of \(v\) on \(u\) is very close to \(-1\).
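
The numerical parts of Question 7 can be checked with exact fractions. The sketch below assumes the standard scaling result that if \(u = p + qx\) and \(v = r + sy\), the gradient of the regression line of \(v\) on \(u\) is \(\frac{s}{q}\) times the gradient of \(y\) on \(x\); the variable names are illustrative.

```python
from fractions import Fraction as F

# Values that make both squared terms in the given identity zero.
b = F(83, 42)
a = 11 - 6 * b

# Estimate of y at x = 8 from the regression line y = a + b x.
y_at_8 = a + b * 8

# u = 2 + 4x and v = 8 - 2y scale the gradient by (-2) / 4.
gradient_v_on_u = F(-2, 4) * b

print(a, y_at_8, float(y_at_8), gradient_v_on_u, float(gradient_v_on_u))
```

The gradient comes out as a fraction just short of \(-1\), which matches the wording of the final part.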
Question 8
8 A random sample of 100 students were given a task and the time taken by each student to complete the task was recorded. The maximum time allowed to complete the task was one minute and all students completed the task within the maximum time. The times, \(T\) minutes, for the random sample of students are summarised as follows.
\(n = 100 \quad \sum t = 61.88\)
A researcher proposes that \(T\) can be modelled by the continuous random variable with probability density function
\(f(t) = \begin{cases} \alpha t^{\alpha - 1} & 0 \leqslant t \leqslant 1, \\ 0 & \text{otherwise,} \end{cases}\)
where \(\alpha\) is a positive constant.
(a) In this question you must show detailed reasoning.
By finding \(\mathrm{E}(T)\) according to the researcher's model, determine an approximation for the value of \(\alpha\). Give your answer correct to 3 significant figures.
Further information about the times taken for the sample of 100 students to complete the task is given in the table.

| Time \(t\) | \(0 \leqslant t < \frac{1}{3}\) | \(\frac{1}{3} \leqslant t < \frac{2}{3}\) | \(\frac{2}{3} \leqslant t \leqslant 1\) |
|---|---|---|---|
| Frequency | 18 | 37 | 45 |

(b) Using the value of \(\alpha\) found in part (a), determine the extent to which the proposed model is a good model. (Do not carry out a goodness of fit test.)
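
The numbers needed in Question 8 can be sketched as follows. The sketch assumes that \(\alpha\) is estimated by equating the model mean \(\mathrm{E}(T) = \int_0^1 t \cdot \alpha t^{\alpha - 1} \, \mathrm{d}t = \frac{\alpha}{\alpha + 1}\) to the sample mean, and that the model is judged by comparing observed frequencies with expected frequencies from \(\mathrm{P}(T \leqslant t) = t^{\alpha}\). The variable names are illustrative, and part (a) itself still requires the integration to be shown.

```python
n = 100
sample_mean = 61.88 / n

# Solve alpha / (alpha + 1) = sample mean for alpha.
alpha = sample_mean / (1 - sample_mean)
alpha_3sf = round(alpha, 2)   # 3 significant figures for a value between 1 and 10

# Expected frequencies from the model CDF F(t) = t**alpha on [0, 1].
edges = [0, 1 / 3, 2 / 3, 1]
observed = [18, 37, 45]
expected = [n * (hi**alpha_3sf - lo**alpha_3sf)
            for lo, hi in zip(edges, edges[1:])]

print(alpha_3sf, [round(e, 1) for e in expected], observed)
```

Placing the expected and observed frequencies side by side gives a basis for commenting on the fit without carrying out a formal test.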