OCR Further Statistics AS (Further Statistics AS) 2020 November

Question 1
1 Five observations of bivariate data \(( x , y )\) are given in the table.
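\begin{tabular}{|c|c|c|c|c|c|}
\hline
\(x\) & 7 & 8 & 12 & 6 & 4 \\
\hline
\(y\) & 20 & 16 & 7 & 17 & 23 \\
\hline
\end{tabular}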
  1. Find the value of Pearson's product-moment correlation coefficient.
  2. State what your answer to part (a) tells you about a scatter diagram representing the data.
  3. A new variable \(a\) is defined by \(a = 3x + 4\). Dee says "The value of Pearson's product-moment correlation coefficient between \(a\) and \(y\) will not be the same as the answer to part (a)." State, with a reason, whether you agree with Dee.
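The calculation in part 1 and Dee's claim in part 3 can be cross-checked numerically. The Python sketch below computes the coefficient from its definition. It assumes the table reads \(x = 7, 8, 12, 6, 4\) and \(y = 20, 16, 7, 17, 23\); the function name `pearson_r` is illustrative and not part of the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient from raw data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Assumed reading of the five observations in the table.
x = [7, 8, 12, 6, 4]
y = [20, 16, 7, 17, 23]

r_xy = pearson_r(x, y)

# Linear coding a = 3x + 4 (positive scale factor).
a = [3 * xi + 4 for xi in x]
r_ay = pearson_r(a, y)

print(round(r_xy, 3), round(r_ay, 3))
```

The two values agree because the coefficient is unchanged by linear coding with a positive scale factor.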
Question 2
2 Every time a spinner is spun, the probability that it shows the number 4 is 0.2, independently of all other spins.
  1. A pupil spins the spinner repeatedly until it shows the number 4. Find the mean of the number of spins required.
  2. Calculate the probability that the number of spins required is between 3 and 10 inclusive.
  3. Each pupil in a class of 30 spins the spinner until it shows the number 4. Out of the 30 pupils, the number of pupils who require at least 10 spins is denoted by \(X\). Determine the variance of \(X\).
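The three parts can be checked with a short Python sketch. It assumes the number of spins needed is modelled by a geometric distribution with \(p = 0.2\), and that \(X\) is then binomial because the 30 pupils spin independently. The variable names are illustrative.

```python
p = 0.2      # P(spinner shows 4) on any one spin
q = 1 - p    # P(spinner does not show 4)

# Number of spins N until the first 4: P(N = n) = q**(n - 1) * p.
mean_spins = 1 / p

# P(3 <= N <= 10) = P(N >= 3) - P(N >= 11) = q**2 - q**10.
prob_3_to_10 = q ** 2 - q ** 10

# A pupil needs at least 10 spins exactly when the first 9 spins all fail.
prob_at_least_10 = q ** 9

# X counts such pupils among 30 independent pupils: X ~ B(30, q**9).
n_pupils = 30
var_x = n_pupils * prob_at_least_10 * (1 - prob_at_least_10)

print(mean_spins, round(prob_3_to_10, 4), round(var_x, 3))
```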
Question 3
3 An investor obtains data about the profits of 8 randomly chosen investment accounts over two one-year periods. The profit in the first year for each account is \(p \%\) and the profit in the second year for each account is \(q \%\). The results are shown in the table and in the scatter diagram.
\begin{tabular}{|c|c|c|c|c|c|c|c|c|}
\hline
Account & A & B & C & D & E & F & G & H \\
\hline
\(p\) & 1.6 & 2.1 & 2.4 & 2.7 & 2.8 & 3.3 & 5.2 & 8.4 \\
\hline
\(q\) & 1.6 & 2.3 & 2.2 & 2.2 & 3.1 & 2.9 & 7.6 & 4.8 \\
\hline
\end{tabular}
\(n = 8 \quad \sum p = 28.5 \quad \sum q = 26.7 \quad \sum p^{2} = 136.35 \quad \sum q^{2} = 116.35 \quad \sum pq = 116.70\)
\includegraphics[max width=\textwidth, alt={}, center]{bf1468d1-e02e-47d2-bf41-5bc8f5b4d7c4-3_782_1280_998_242}
  1. State which, if either, of the variables \(p\) and \(q\) is independent.
  2. Calculate the equation of the regression line of \(q\) on \(p\).
    1. Use the regression line to estimate the value of \(q\) for an investment account for which \(p = 2.5\).
    2. Give two reasons why this estimate could be considered reliable.
  3. Comment on the reliability of using the regression line to predict the value of \(q\) when \(p = 7.0\).
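The regression line of \(q\) on \(p\) can be obtained from the summary statistics alone. The Python sketch below uses the standard least-squares formulas \(b = S_{pq} / S_{pp}\) and \(a = \bar{q} - b\bar{p}\); the helper `predict_q` is a hypothetical name introduced for illustration.

```python
# Summary statistics as given in the question.
n = 8
sum_p, sum_q = 28.5, 26.7
sum_pp, sum_pq = 136.35, 116.70

# Corrected sums of squares and products.
s_pp = sum_pp - sum_p ** 2 / n
s_pq = sum_pq - sum_p * sum_q / n

# Least-squares line q = intercept + gradient * p.
gradient = s_pq / s_pp
intercept = sum_q / n - gradient * sum_p / n


def predict_q(p):
    """Estimate q from the fitted line (illustrative helper)."""
    return intercept + gradient * p


q_at_2_5 = predict_q(2.5)
print(round(intercept, 3), round(gradient, 3), round(q_at_2_5, 2))
```

The same function evaluates the line at \(p = 7.0\), but the code says nothing about reliability there: that judgement depends on where 7.0 lies relative to the bulk of the data.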
Question 4
4 After a holiday organised for a group, the company organising the holiday obtained scores out of 10 for six different aspects of the holiday. The company obtained responses from 100 couples and 100 single travellers. The total scores for each of the aspects are given in the following table.
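\begin{tabular}{|l|c|c|}
\hline
Aspect & Couples & Single travellers \\
\hline
Organisation & 884 & 867 \\
\hline
Travel & 710 & 633 \\
\hline
Food & 692 & 675 \\
\hline
Leader & 898 & 898 \\
\hline
Included visits & 561 & 736 \\
\hline
Optional visits & 683 & 712 \\
\hline
\end{tabular}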
Fred wishes to test whether there is significant positive correlation between the scores given by the two categories.
  1. Explain why it is probably not appropriate to use Pearson's product-moment correlation coefficient.
  2. Carry out an appropriate test at the \(1 \%\) level.
  3. Explain what is meant by the statement that the test carried out in part (b) is a non-parametric test.
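A rank-based statistic for part 2 can be sketched in Python as follows. This assumes that Spearman's rank correlation coefficient is the intended statistic, which the question does not name explicitly, and that there are no tied scores within either column. The helper `ranks` is illustrative.

```python
# Total scores in the order: Organisation, Travel, Food, Leader,
# Included visits, Optional visits.
couples = [884, 710, 692, 898, 561, 683]
singles = [867, 633, 675, 898, 736, 712]


def ranks(values):
    """Rank 1 = largest value; assumes no ties within the list."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


rank_couples = ranks(couples)
rank_singles = ranks(singles)

# Spearman's coefficient: r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
n = len(couples)
sum_d2 = sum((rc - rs) ** 2 for rc, rs in zip(rank_couples, rank_singles))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2, round(r_s, 4))
```

The computed value would then be compared with the one-tailed critical value for \(n = 6\) at the \(1\%\) level from statistical tables.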
Question 5
5 At a cinema there are three film sessions each Saturday, "early", "middle" and "late". The numbers of the audience, in different age groups, at the three showings on a randomly chosen Saturday are given in Table 1.
\begin{table}[h]
\begin{tabular}{|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Observed frequencies}} & \multicolumn{3}{c|}{Session} \\
\cline{3-5}
\multicolumn{2}{|c|}{} & Early & Middle & Late \\
\hline
\multirow{3}{*}{Age group} & \(< 25\) & 24 & 20 & 40 \\
\cline{2-5}
 & 25 to 60 & 4 & 2 & 10 \\
\cline{2-5}
 & \(> 60\) & 28 & 22 & 10 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
The cinema manager carries out a test of whether there is any association between age group and session attended.
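  1. Show that it is necessary to combine cells in order to carry out the test.
    It is decided to combine the second and third rows of the table. Some of the expected frequencies for the table with rows combined, and the corresponding contributions to the \(\chi^{2}\) test statistic, are shown in the following incomplete tables.
    \begin{table}[h]
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \multicolumn{2}{|c|}{\multirow{2}{*}{Expected frequencies}} & \multicolumn{3}{c|}{Session} \\
    \cline{3-5}
    \multicolumn{2}{|c|}{} & Early & Middle & Late \\
    \hline
    \multirow{2}{*}{Age group} & \(< 25\) & 29.4 & 23.1 & \\
    \cline{2-5}
     & \(\geqslant 25\) & 26.6 & 20.9 & \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 2}
    \end{table}
    \begin{table}[h]
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \multicolumn{2}{|c|}{\multirow{2}{*}{Contribution to \(\chi^{2}\)}} & \multicolumn{3}{c|}{Session} \\
    \cline{3-5}
    \multicolumn{2}{|c|}{} & Early & Middle & Late \\
    \hline
    \multirow{2}{*}{Age group} & \(< 25\) & 0.9918 & 0.4160 & \\
    \cline{2-5}
     & \(\geqslant 25\) & 1.0962 & 0.4598 & \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 3}
    \end{table}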
  2. In the Printed Answer Booklet, complete both tables.
  3. Carry out the test at the \(5 \%\) significance level.
  4. Use the figures in your completed Table 3 to comment on the numbers of the audience in different age groups.
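The expected frequencies and \(\chi^{2}\) contributions can be reproduced with a short Python sketch. It assumes the observed frequencies are 24, 20, 40 (under 25), 4, 2, 10 (25 to 60) and 28, 22, 10 (over 60), and uses the usual rule that each expected frequency is (row total × column total) / grand total. The function name is illustrative.

```python
def expected_frequencies(table):
    """Expected counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


# Observed frequencies (rows: < 25, 25 to 60, > 60; columns: Early, Middle, Late).
original = [[24, 20, 40], [4, 2, 10], [28, 22, 10]]

# Smallest expected frequency before combining rows.
smallest_expected = min(min(row) for row in expected_frequencies(original))

# Combine the second and third rows into a single ">= 25" row.
combined = [original[0], [a + b for a, b in zip(original[1], original[2])]]
expected = expected_frequencies(combined)

# Contribution of each cell: (O - E)^2 / E.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(combined, expected)
]
chi_squared = sum(sum(row) for row in contributions)
degrees_of_freedom = (len(combined) - 1) * (len(combined[0]) - 1)

print(smallest_expected, round(chi_squared, 3), degrees_of_freedom)
```

The first two columns of `expected` and `contributions` match the entries already printed in Tables 2 and 3, which gives some confidence in the assumed reading of Table 1.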
Question 6
6 A statistician investigates the number, \(F\), of signal failures per week on a railway network.
  1. The statistician assumes that signal failures occur randomly. Explain what this statement means.
  2. State two further assumptions needed for \(F\) to be well modelled by a Poisson distribution.
    In a random sample of 50 weeks, the statistician finds that the mean number of failures per week is 1.61, with standard deviation 1.28.
  3. Explain whether this suggests that \(F\) is likely to be well modelled by a Poisson distribution.
    Assume first that \(F \sim \operatorname{Po}(1.61)\).
  4. Write down an exact expression for \(\mathrm { P } ( F = 0 )\).
  5. Complete the table in the Printed Answer Booklet to show the probabilities of different values of \(F\), correct to three significant figures.
    \begin{tabular}{|c|c|c|c|}
    \hline
    Value of \(F\) & 0 & 1 & \(\geqslant 2\) \\
    \hline
    Probability & 0.200 & & \\
    \hline
    \end{tabular}
    After further investigation, the statistician decides to use a different model for the distribution of \(F\). In this model it is now assumed that \(\mathrm{P}(F = 0)\) is still 0.200, but that if one failure occurs, there is an increased probability that further failures occur.
  6. Explain the effect of this assumption on the value of \(\mathrm { P } ( F = 1 )\).
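The Poisson probabilities in parts 4 and 5, and the mean and variance comparison behind part 3, can be checked with the Python sketch below. It assumes \(F \sim \operatorname{Po}(1.61)\) as stated in the question; the variable names are illustrative.

```python
from math import exp

lam = 1.61  # assumed Poisson mean, taken from the sample mean

# P(F = k) = exp(-lam) * lam**k / k!  for k = 0 and k = 1.
p0 = exp(-lam)
p1 = lam * exp(-lam)
p_ge2 = 1 - p0 - p1

# For a Poisson distribution the mean and variance are equal,
# so compare the sample variance with the sample mean.
sample_variance = 1.28 ** 2

print(round(p0, 3), round(p1, 3), round(p_ge2, 3), round(sample_variance, 4))
```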
Question 7
7 A bag contains \(2 m\) yellow and \(m\) green counters. Three counters are chosen at random, without replacement. The probability that exactly two of the three counters are yellow is \(\frac { 28 } { 55 }\). Determine the value of \(m\).
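One way to check a candidate value of \(m\) is a direct search using exact fractions. The Python sketch below is a numerical check rather than the algebraic method the question expects; the search limit of 100 and the function name `prob_two_yellow` are arbitrary choices made for illustration.

```python
from fractions import Fraction
from math import comb

target = Fraction(28, 55)


def prob_two_yellow(m):
    """P(exactly 2 yellow) when drawing 3 from 2m yellow and m green."""
    yellow, green = 2 * m, m
    favourable = comb(yellow, 2) * comb(green, 1)
    return Fraction(favourable, comb(yellow + green, 3))


# Search positive integers up to an arbitrary limit.
solutions = [m for m in range(1, 101) if prob_two_yellow(m) == target]
print(solutions)
```

Using `Fraction` avoids floating-point rounding when comparing against \(\frac{28}{55}\).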