OCR Further Statistics (Further Statistics) 2023 June

Question 1
1 A certain section of a library contains several thousand books. A lecturer is looking for a book that refers to a particular topic. The lecturer believes that one-twentieth of the books in that section of the library contain a reference to that topic. However, the lecturer does not know which books they might be, so the lecturer looks in each book in turn for a reference to the topic. The first book the lecturer finds that refers to the topic is the \(X\)th book in which the lecturer looks.
  1. A student says, "There is a maximum value of \(X\) as there is only a finite number of books. So a geometric distribution cannot be a good model for \(X\)." Explain whether you agree with the student.
    1. State one modelling assumption (not involving the total number of books) needed for \(X\) to be modelled by a geometric distribution in this context.
    2. Suggest a reason why this assumption may not be valid in this context.

Assume now that \(X\) can be well modelled by the distribution \(\operatorname{Geo}(0.05)\).

  2. The probability that the lecturer needs to look in no more than \(n\) books is greater than 0.9. Find the smallest possible value of \(n\).
  3. The lecturer needs to find four different books that refer to the topic. Find the probability that the lecturer needs to look in exactly 40 books.
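
The two numerical parts of Question 1 can be checked with a short sketch. It assumes \(X \sim \operatorname{Geo}(0.05)\), so that \(P(X \leqslant n) = 1 - 0.95^{n}\), and it assumes that "four different books" means the fourth success occurs on the 40th trial. The function names are illustrative only and do not come from the paper.

```python
from math import comb


def smallest_n(p: float, target: float) -> int:
    """Smallest n for which P(X <= n) = 1 - (1 - p)**n strictly exceeds target."""
    n = 1
    while 1 - (1 - p) ** n <= target:
        n += 1
    return n


def prob_kth_success_on_trial(k: int, n: int, p: float) -> float:
    """P(k-th success falls on trial n): k - 1 successes in the first n - 1 trials, then a success."""
    return comb(n - 1, k - 1) * p ** k * (1 - p) ** (n - k)


n_min = smallest_n(0.05, 0.9)                   # assumed model Geo(0.05)
p_40 = prob_kth_success_on_trial(4, 40, 0.05)   # fourth find on the 40th book
print(n_min, round(p_40, 5))
```

The loop is used instead of logarithms so that the strict inequality in the question is applied directly.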
Question 2
2 The director of a concert hall wishes to investigate if the price of the most expensive concert tickets affects attendance. The director collects data about the price, \(\pounds P\), of the most expensive tickets and the number of people in the audience, \(H\) hundred (rounded to the nearest hundred), for 20 concerts. For each price there are several different concerts. The results are shown in the table.
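| \(P\) (£) | 75 | 65 | 55 | 45 | 35 |
|---|---|---|---|---|---|
| \(H\) (hundred) | 27 | 27 | 27 | 26 | 15 |
| | 27 | 27 | 20 | 21 | 12 |
| | | 22 | 18 | 16 | 9 |
| | | 19 | 18 | 13 | |
| | | 12 | 16 | 9 | |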
\(n = 20 \quad \sum p = 1080 \quad \sum h = 381 \quad \sum p^{2} = 61300 \quad \sum h^{2} = 8011 \quad \sum ph = 21535\)
  1. Calculate the equation of the regression line of \(h\) on \(p\).
  2. State what change, if any, there would be to your answer to part (a) if \(H\) had been measured in thousands (to 1 decimal place) rather than in hundreds.

For a special charity concert, the most expensive tickets cost \(\pounds 50\).

  3. Use your answer to part (b) to estimate the expected size of the audience for this concert. Give your answer correct to \(\mathbf { 1 }\) decimal place.
  4. Comment on the reliability of your answer to part (c). You should refer to
    • the value of the product-moment correlation coefficient for the data, which is 0.642
    • the value of \(\pounds 50\)
    • any one other relevant factor that should be taken into account.
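
As a check on the summary statistics given in Question 2, the following sketch computes the least-squares line of \(h\) on \(p\) and the product-moment correlation coefficient from the six totals alone. The variable names are illustrative, and the formulas are the standard \(S_{pp}\), \(S_{hh}\), \(S_{ph}\) forms rather than anything prescribed by the paper.

```python
from math import sqrt

# Summary statistics quoted in the question
n, sum_p, sum_h = 20, 1080, 381
sum_p2, sum_h2, sum_ph = 61300, 8011, 21535

# Corrected sums of squares and products
s_pp = sum_p2 - sum_p ** 2 / n
s_hh = sum_h2 - sum_h ** 2 / n
s_ph = sum_ph - sum_p * sum_h / n

# Regression line h = intercept + slope * p
slope = s_ph / s_pp
intercept = sum_h / n - slope * sum_p / n

# Product-moment correlation coefficient
r = s_ph / sqrt(s_pp * s_hh)

print(round(intercept, 3), round(slope, 4), round(r, 3))
```

The value of `r` agrees with the 0.642 quoted in part (d), which suggests the totals have been read correctly.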
Question 3
3 The discrete random variable \(W\) has the distribution \(\mathrm { U } ( 11 )\). The independent discrete random variable \(V\) has the distribution \(\mathrm { U } ( 5 )\).
  1. It is given that, for constants \(m\) and \(n\), with \(m > 0\), \(\mathrm{E}(mW + nV) = 0\) and \(\operatorname{Var}(mW + nV) = 1\). Determine the exact values of \(m\) and \(n\).

The random variable \(T\) is the mean of three independent observations of \(W\).

  2. Explain whether the Central Limit Theorem can be used to say that the distribution of \(T\) is approximately normal.
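
A numerical sketch for part (a) of Question 3 follows. It assumes that \(\mathrm{U}(k)\) denotes the discrete uniform distribution on \(1, 2, \ldots, k\); that reading of the notation is an assumption here, not something stated in the question. The helper name is illustrative.

```python
from fractions import Fraction
from math import isclose, sqrt


def uniform_mean_var(k: int):
    """Exact mean and variance of a discrete uniform variable on 1..k (assumed meaning of U(k))."""
    values = range(1, k + 1)
    mean = Fraction(sum(values), k)
    var = Fraction(sum(v * v for v in values), k) - mean ** 2
    return mean, var


mean_w, var_w = uniform_mean_var(11)
mean_v, var_v = uniform_mean_var(5)

# E(mW + nV) = 0 gives n = ratio * m
ratio = -mean_w / mean_v

# Independence gives Var(mW + nV) = m^2 * (var_w + ratio^2 * var_v) = 1, with m > 0
m = 1 / sqrt(var_w + ratio ** 2 * var_v)
n_coef = float(ratio) * m

print(m, n_coef)
```

The decimals printed are only a check; the question asks for exact values, which come from the same two equations solved by hand.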
Question 4
4 Two magazines give numerical ratings to hi-fi systems. Li wishes to test whether there is agreement between the opinions of the magazines. Li chooses a random sample of 5 hi-fi systems and looks up the ratings given by the two magazines. The results are shown in the table.
| System | A | B | C | D | E |
|---|---|---|---|---|---|
| Magazine I | 68 | 75 | 77 | 83 | 92 |
| Magazine II | 30 | 25 | 40 | 35 | 45 |
  1. Give a reason why Li might choose to use a test based on Spearman's rank correlation coefficient rather than on Pearson’s product-moment correlation coefficient.
  2. Calculate the value of Spearman's rank correlation coefficient for the data.
  3. Use your answer to part (b) to carry out a hypothesis test at the \(5 \%\) significance level.
  4. The value of Spearman's rank correlation coefficient between the ratings given by magazine I and by a third magazine, magazine III, has the same numerical value as the answer to part (b) but with the sign changed. In the Printed Answer Booklet, complete the table showing the rankings given by magazine III.
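
The calculation in part (b) of Question 4 can be sketched as below. The sketch assumes there are no tied ratings (true for this data) and uses the formula \(r_s = 1 - \dfrac{6 \sum d^{2}}{n(n^{2} - 1)}\). The `ranks` helper is illustrative only.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


magazine_1 = [68, 75, 77, 83, 92]
magazine_2 = [30, 25, 40, 35, 45]

rank_1 = ranks(magazine_1)
rank_2 = ranks(magazine_2)

n = len(magazine_1)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_1, rank_2))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rank_1, rank_2, sum_d2, r_s)
```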
Question 5
5 An historian has reason to believe that the average age at which men got married in the seventeenth century was higher in urban areas compared to rural areas. The historian collected data from a random sample of 8 men in an urban area and a random sample of 6 men in a rural area, all of whom were married in the seventeenth century. The results were as follows, given in the form years/months.
Urban: \(18/3\), \(18/5\), \(19/9\), \(20/7\), \(25/6\), \(34/6\), \(41/8\), \(46/3\)
Rural: \(18/0\), \(18/1\), \(18/4\), \(19/11\), \(22/2\), \(28/11\)
  1. Use an appropriate non-parametric method to test at the \(5 \%\) significance level whether the average age at marriage of men is higher in urban areas than in rural areas.
  2. When checking the data, the historian found that the age of one of the men, Mr X, which had been recorded as 28/11, had been wrongly recorded. When corrected, the result of the test in part (a) was unchanged. Determine the youngest age that Mr X could have been, given that it was not the same, in years and months, as that of any of the other men in the sample.
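
For Question 5, the ranking step of a rank-sum test can be sketched as follows. The sketch assumes a Wilcoxon rank-sum test is the intended non-parametric method and that the statistic is \(W = \min(R_m, m(m + n + 1) - R_m)\), where \(R_m\) is the rank sum of the smaller sample. Both are assumptions, and the critical value still has to be read from tables. Ages are converted to months so that they can be compared directly.

```python
def to_months(age: str) -> int:
    """Convert an age written as 'years/months' into a whole number of months."""
    years, months = age.split("/")
    return 12 * int(years) + int(months)


urban = [to_months(a) for a in ["18/3", "18/5", "19/9", "20/7", "25/6", "34/6", "41/8", "46/3"]]
rural = [to_months(a) for a in ["18/0", "18/1", "18/4", "19/11", "22/2", "28/11"]]

# Rank the pooled sample from 1 (youngest); the data contain no ties
pooled = sorted(urban + rural)
rank_of = {value: i + 1 for i, value in enumerate(pooled)}

rural_rank_sum = sum(rank_of[v] for v in rural)
urban_rank_sum = sum(rank_of[v] for v in urban)

m, n = len(rural), len(urban)   # m is the smaller sample size
w_stat = min(rural_rank_sum, m * (m + n + 1) - rural_rank_sum)

print(rural_rank_sum, urban_rank_sum, w_stat)
```

Changing the `"28/11"` entry and rerunning shows how the rank sum responds, which is one way to explore part (b).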
Question 6
6 The continuous random variable \(X\) has a uniform distribution on the interval \([ - \pi , \pi ]\).
The random variable \(Y\) is defined by \(Y = \sin X\).
Determine the cumulative distribution function of \(Y\).
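
A candidate answer to Question 6 can be tested numerically. The sketch below compares a proposed cumulative distribution function, \(F(y) = \tfrac{1}{2} + \tfrac{1}{\pi} \arcsin y\) for \(-1 \leqslant y \leqslant 1\), with the proportion of an evenly spaced grid on \([-\pi, \pi]\) for which \(\sin x \leqslant y\). The candidate formula is a suggestion to be verified, not a result quoted from the paper, and both function names are illustrative.

```python
from math import asin, pi, sin


def cdf_candidate(y: float) -> float:
    """Proposed CDF of Y = sin X for X uniform on [-pi, pi]."""
    if y < -1:
        return 0.0
    if y > 1:
        return 1.0
    return 0.5 + asin(y) / pi


def cdf_numeric(y: float, steps: int = 100_000) -> float:
    """Fraction of midpoints of an even grid on [-pi, pi] with sin(x) <= y."""
    width = 2 * pi / steps
    hits = sum(1 for i in range(steps) if sin(-pi + (i + 0.5) * width) <= y)
    return hits / steps


for y in (-0.9, -0.5, 0.0, 0.3, 0.8):
    print(y, round(cdf_candidate(y), 4), round(cdf_numeric(y), 4))
```

A deterministic grid is used instead of random sampling so that repeated runs give the same figures.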
Question 7
7 A club secretary collects data about the time, \(T\) minutes, needed to process the details of a new member. The mean of \(T\) is denoted by \(\mu\). The variance of \(T\) is denoted by \(\sigma ^ { 2 }\). The results of a random sample of 40 observations of \(T\) are summarised as follows.
\(n = 40 \quad \sum t = 396.0 \quad \sum t^{2} = 4271.40\)
  1. Determine a 99% confidence interval for \(\mu\).
  2. The secretary discovers that over a long period the value of \(\sigma ^ { 2 }\) is in fact 10.0. The secretary collects an independent random sample of 50 observations of \(T\) and constructs a new 99% confidence interval for \(\mu\) based on this sample of size 50, but using \(\sigma ^ { 2 } = 10.0\). Find the probability that this new confidence interval contains the value \(\mu + 1.6\).
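
Both parts of Question 7 can be sketched with the standard library. The sketch assumes the sample of 40 is large enough for a normal-based interval using the unbiased variance estimate, and for part (b) it assumes the sample mean is distributed as \(\mathrm{N}(\mu, 10/50)\). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
z = std_normal.inv_cdf(0.995)   # two-sided 99% point of N(0, 1)

# Part (a): interval from the summary statistics
n, sum_t, sum_t2 = 40, 396.0, 4271.40
mean_t = sum_t / n
var_unbiased = (sum_t2 - sum_t ** 2 / n) / (n - 1)
half_width = z * sqrt(var_unbiased / n)
ci = (mean_t - half_width, mean_t + half_width)

# Part (b): the new interval contains mu + 1.6 exactly when
# shift/se - z < Z < shift/se + z, where Z = (sample mean - mu) / se
sigma2, n_new, shift = 10.0, 50, 1.6
se = sqrt(sigma2 / n_new)
p_contains = std_normal.cdf(shift / se + z) - std_normal.cdf(shift / se - z)

print([round(x, 3) for x in ci], round(p_contains, 4))
```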
Question 8
8 A team of researchers have reason to believe that the number of calls received in randomly chosen 10-minute intervals to a call centre can be well modelled by a Poisson distribution. To test this belief the researchers record the number of telephone calls received in 60 randomly chosen 10-minute intervals. The results, together with relevant calculations, are shown in the following table.
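| Number of calls, \(r\) | 0 | 1 | 2 | 3 | 4 | \(\geqslant 5\) | Total |
|---|---|---|---|---|---|---|---|
| Observed frequency, \(f\) | 18 | 13 | 12 | 9 | 8 | 0 | 60 |
| \(rf\) | 0 | 13 | 24 | 27 | 32 | 0 | 96 |
| \(r^{2} f\) | 0 | 13 | 48 | 81 | 128 | 0 | 270 |
| Expected frequency | 12.114 | 19.382 | 15.506 | 8.270 | 3.308 | 1.421 | 60 |
| Contribution to test statistic | 2.860 | 2.101 | 0.793 | 1.232 | | | 6.99 |

The entry 1.232 is the single contribution from the combined cells \(r = 3\), \(4\) and \(\geqslant 5\).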
  1. Calculate the mean of the observed number of calls received.
  2. Calculate the variance of the observed number of calls received.
  3. Comment on what your answers to parts (a) and (b) suggest about the proposed model.
  4. Explain why it is necessary to combine some cells in the table.
  5. Show how the values 15.506 and 0.793 in the table were obtained.
  6. Carry out the test, at the \(5 \%\) significance level.

In the light of the result of the test, the team consider that a different model is appropriate. They propose the following improved model:
$$P(R = r) = \begin{cases} \frac{1}{60}(a + (2 - r)b) & r = 0, 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$$
where \(a\) and \(b\) are integers.

  7. Use at least three of the observed frequencies to suggest appropriate values for \(a\) and \(b\). You should consider more than one possible pair of values, and explain which pair of values you consider best. (Do not carry out a goodness-of-fit test.)
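
The figures in the Question 8 table can be reproduced with the sketch below. It assumes the Poisson mean is estimated by the sample mean, that the variance in part (b) uses divisor \(n\), that the cells for \(r = 3\), \(4\) and \(\geqslant 5\) are combined, and that the test therefore has \(4 - 1 - 1 = 2\) degrees of freedom. For 2 degrees of freedom the upper 5% point of \(\chi^{2}\) equals \(-2 \ln 0.05\), which avoids the need for tables.

```python
from math import exp, factorial, log

observed = [18, 13, 12, 9, 8, 0]          # r = 0, 1, 2, 3, 4 and r >= 5
total = sum(observed)

mean_calls = sum(r * f for r, f in enumerate(observed)) / total
var_calls = sum(r * r * f for r, f in enumerate(observed)) / total - mean_calls ** 2

# Expected frequencies under Po(mean_calls); the last cell is the remainder
expected = [total * exp(-mean_calls) * mean_calls ** r / factorial(r) for r in range(5)]
expected.append(total - sum(expected))

# Combine the cells for r >= 3 (assumed grouping)
observed_c = observed[:3] + [sum(observed[3:])]
expected_c = expected[:3] + [sum(expected[3:])]

contributions = [(o - e) ** 2 / e for o, e in zip(observed_c, expected_c)]
chi_squared = sum(contributions)
critical_value = -2 * log(0.05)           # upper 5% point of chi-squared with 2 df

print(mean_calls, round(var_calls, 2))
print([round(e, 3) for e in expected])
print([round(c, 3) for c in contributions], round(chi_squared, 2), round(critical_value, 3))
```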