OCR S1 (Statistics 1) 2005 January

Question 1
1 The scatter diagrams below illustrate three sets of bivariate data, \(A\), \(B\) and \(C\).
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{f0c0a4ca-da0a-4c74-b8b1-bac4fd3f2487-2_440_428_360_317} \captionsetup{labelformat=empty} \caption{Set \(A\)}
\end{figure}
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{f0c0a4ca-da0a-4c74-b8b1-bac4fd3f2487-2_440_426_360_858} \captionsetup{labelformat=empty} \caption{Set \(B\)}
\end{figure}
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{f0c0a4ca-da0a-4c74-b8b1-bac4fd3f2487-2_435_424_365_1402} \captionsetup{labelformat=empty} \caption{Set \(C\)}
\end{figure}
State, with an explanation in each case, which of the three sets of data has
  1. the largest,
  2. the smallest,
    value of the product moment correlation coefficient.
Question 2
2 The back-to-back stem-and-leaf diagram below shows the number of hours of television watched per week by each of 15 boys and 15 girls.
[Back-to-back stem-and-leaf diagram: Boys | stem | Girls]
Key: 4 | 2 | 2 means a boy who watched 24 hours and a girl who watched 22 hours of television per week.
  1. Find the median and the quartiles of the results for the boys.
  2. Give a reason why the median might be preferred to the mean in using an average to compare the two data sets.
  3. State one advantage, and one disadvantage, of using stem-and-leaf diagrams rather than box-and-whisker plots to represent the data.
Question 3
3 Two commentators gave ratings out of 100 for seven sports personalities. The ratings are shown in the table below.
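$$\begin{array}{l|ccccccc}
\text{Personality} & A & B & C & D & E & F & G \\
\hline
\text{Commentator I} & 73 & 76 & 78 & 65 & 86 & 82 & 91 \\
\text{Commentator II} & 77 & 78 & 79 & 80 & 86 & 89 & 95
\end{array}$$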
  1. Calculate Spearman's rank correlation coefficient for these ratings.
  2. State what your answer tells you about the ratings given by the two commentators.
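As a hedged check on the arithmetic in part 1, the sketch below ranks each commentator's ratings from 1 (lowest) upward and applies the no-ties formula \(r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}\). The function names are illustrative only, and the ranking helper assumes no tied ratings within a commentator's list (true for this table); ranking from the highest instead gives the same coefficient.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's coefficient via 1 - 6*sum(d^2) / (n(n^2 - 1)), valid without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Ratings for personalities A to G, in table order.
commentator_1 = [73, 76, 78, 65, 86, 82, 91]
commentator_2 = [77, 78, 79, 80, 86, 89, 95]

print(spearman(commentator_1, commentator_2))  # 0.75
```

Here \(\sum d^2 = 14\), so \(r_s = 1 - 84/336 = 0.75\).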
Question 4
4 The table below shows the probability distribution of the random variable \(X\).
$$\begin{array}{c|ccccc}
x & -2 & -1 & 0 & 1 & 2 \\
\hline
\mathrm{P}(X = x) & \frac{1}{4} & \frac{1}{5} & k & \frac{2}{5} & \frac{1}{10}
\end{array}$$
  1. Find the value of the constant \(k\).
  2. Calculate the values of \(\mathrm { E } ( X )\) and \(\operatorname { Var } ( X )\).
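A minimal sketch of both parts, using Python's `fractions` module so the arithmetic stays exact; the variable names are illustrative. It relies only on the probabilities summing to 1 and on \(\operatorname{Var}(X) = \mathrm{E}(X^2) - [\mathrm{E}(X)]^2\).

```python
from fractions import Fraction as F

xs = [-2, -1, 0, 1, 2]
known = [F(1, 4), F(1, 5), None, F(2, 5), F(1, 10)]  # None marks the unknown k

# The probabilities must sum to 1, so k is whatever is left over.
k = 1 - sum(p for p in known if p is not None)
probs = [k if p is None else p for p in known]

mean = sum(x * p for x, p in zip(xs, probs))         # E(X)
mean_sq = sum(x * x * p for x, p in zip(xs, probs))  # E(X^2)
variance = mean_sq - mean ** 2                       # Var(X) = E(X^2) - [E(X)]^2

print(k, mean, variance)  # 1/20 -1/10 199/100
```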
Question 5
5 On average 1 in 20 members of the population of this country has a particular DNA feature. Members of the population are selected at random until one is found who has this feature.
  1. Find the probability that the first person to have this feature is
    (a) the sixth person selected,
    (b) not among the first 10 people selected.
  2. Find the expected number of people selected.
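Assuming the selections are independent, the number of people selected up to and including the first with the feature follows a geometric distribution with \(p = \frac{1}{20}\). The sketch below is one way to check the three answers; the helper names are hypothetical.

```python
from fractions import Fraction

p = Fraction(1, 20)  # probability that a randomly selected person has the feature
q = 1 - p

def geometric_pmf(r):
    """P(X = r): the first r - 1 people lack the feature and person r has it."""
    return q ** (r - 1) * p

def geometric_tail(r):
    """P(X > r): none of the first r people has the feature."""
    return q ** r

print(float(geometric_pmf(6)))    # about 0.0387
print(float(geometric_tail(10)))  # about 0.599
print(1 / p)                      # expected number selected: 20
```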
Question 6
6 Louise and Marie play a series of tennis matches. It is given that, in any match, the probability that Louise wins the first two sets is \(\frac { 3 } { 8 }\).
  1. Find the probability that, in 5 randomly chosen matches, Louise wins the first two sets in exactly 2 of the matches.
It is also given that Louise and Marie are equally likely to win the first set.
  2. Show that P(Louise wins the second set, given that she won the first set) \(= \frac{3}{4}\).
  3. The probability that Marie wins the first two sets is \(\frac { 1 } { 3 }\). Find P(Marie wins the second set, given that she won the first set).
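The sketch below checks the three parts, assuming that the 5 matches are independent (so part 1 is binomial, \(B(5, \frac{3}{8})\)) and using \(\mathrm{P}(A \mid B) = \mathrm{P}(A \cap B)/\mathrm{P}(B)\) for parts 2 and 3. Variable names are illustrative.

```python
from fractions import Fraction
from math import comb

p_first_two = Fraction(3, 8)  # P(Louise wins the first two sets) in one match

# Part 1: number of such matches out of 5 is B(5, 3/8); find P(exactly 2).
n, r = 5, 2
p_exactly_two = comb(n, r) * p_first_two ** r * (1 - p_first_two) ** (n - r)

# Part 2: P(L2 | L1) = P(L1 and L2) / P(L1), with P(L1) = 1/2.
p_louise_first = Fraction(1, 2)
p_l2_given_l1 = p_first_two / p_louise_first

# Part 3: the same idea for Marie, with P(M1) = 1 - P(L1) = 1/2.
p_marie_first_two = Fraction(1, 3)
p_m2_given_m1 = p_marie_first_two / (1 - p_louise_first)

print(p_exactly_two)                # 5625/16384, about 0.3433
print(p_l2_given_l1, p_m2_given_m1) # 3/4 2/3
```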
Question 7
7 It is known that, on average, one match box in 10 contains fewer than 42 matches. Eight boxes are selected, and the number of boxes that contain fewer than 42 matches is denoted by \(Y\).
  1. State two conditions needed to model \(Y\) by a binomial distribution.
Assume now that a binomial model is valid.
  2. Find
    (a) \(\mathrm { P } ( Y = 0 )\),
    (b) \(\mathrm { P } ( Y \geqslant 2 )\).
  3. On Wednesday 8 boxes are selected, and on Thursday another 8 boxes are selected. Find the probability that on one of these days the number of boxes containing fewer than 42 matches is 0, and that on the other day the number is 2 or more.
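Under the binomial model \(Y \sim B(8, 0.1)\), with the two days assumed independent, the sketch below evaluates parts 2 and 3. The factor of 2 in part 3 counts the two ways the days can be assigned (0 on Wednesday and 2 or more on Thursday, or the reverse). The helper name is illustrative.

```python
from math import comb

n, p = 8, 0.1  # boxes selected per day, P(a box has fewer than 42 matches)

def binom_pmf(r):
    """P(Y = r) for Y ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

p0 = binom_pmf(0)
p_at_least_2 = 1 - binom_pmf(0) - binom_pmf(1)

# Either day can be the one with 0 such boxes; the other has 2 or more.
p_one_each = 2 * p0 * p_at_least_2

print(round(p0, 4), round(p_at_least_2, 4), round(p_one_each, 4))  # 0.4305 0.1869 0.1609
```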
Question 8
8 An examination paper consists of 8 questions, of which one is on geometric distributions and one is on binomial distributions.
  1. If the 8 questions are arranged in a random order, find the probability that the question on geometric distributions is next to the question on binomial distributions.
Four of the questions, including the one on geometric distributions, are worth 7 marks each, and the remaining four questions, including the one on binomial distributions, are worth 9 marks each. The 7-mark questions are the first four questions on the paper, but are arranged in random order. The 9-mark questions are the last four questions, but are arranged in random order. Find the probability that
  2. the questions on geometric distributions and on binomial distributions are next to one another,
  3. the questions on geometric distributions and on binomial distributions are separated by at least 2 other questions.
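A brute-force sketch that counts equally likely arrangements, useful as a check on a counting argument. For part 1 it enumerates all \(8!\) orderings. For parts 2 and 3 it assumes the two halves are shuffled independently, so the geometric question is equally likely to be in any of positions 1 to 4 and the binomial question in any of positions 5 to 8. The labels `GEOM` and `BINOM` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

GEOM, BINOM = 0, 1  # hypothetical labels for the two questions of interest

def prob_adjacent_any_order(n=8):
    """Part 1: every ordering of the n questions is equally likely."""
    hits = total = 0
    for order in permutations(range(n)):
        total += 1
        if abs(order.index(GEOM) - order.index(BINOM)) == 1:
            hits += 1
    return Fraction(hits, total)

def restricted_gap_probs():
    """Parts 2 and 3: GEOM uniform on positions 1-4, BINOM uniform on 5-8, independently."""
    pairs = [(g, b) for g in range(1, 5) for b in range(5, 9)]
    adjacent = sum(1 for g, b in pairs if b - g == 1)
    # Exactly b - g - 1 questions lie strictly between the two.
    separated = sum(1 for g, b in pairs if b - g - 1 >= 2)
    return Fraction(adjacent, len(pairs)), Fraction(separated, len(pairs))

print(prob_adjacent_any_order())  # 1/4
print(restricted_gap_probs())     # (Fraction(1, 16), Fraction(13, 16))
```

The part 1 count agrees with the direct argument: 7 adjacent pairs of positions, 2 orders and \(6!\) arrangements of the rest give \(\frac{14 \times 6!}{8!} = \frac{1}{4}\).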
Question 9
9 Five observations of bivariate data produce the following results, denoted as \((x_i, y_i)\) for \(i = 1, 2, 3, 4, 5\).
$$\begin{aligned} & (13, 2.7) \\
& \left[ \Sigma x = 90, \ \Sigma y = 15.0, \ \Sigma x^2 = 1720, \ \Sigma y^2 = 46.86, \ \Sigma xy = 264.0 \right] \end{aligned}$$
  1. Show that the regression line of \(y\) on \(x\) has gradient \(-0.06\), and find its equation in the form \(y = a + bx\).
  2. The regression line is used to estimate the value of \(y\) corresponding to \(x = 20\), but the value \(x = 20\) is accurate only to the nearest whole number. Calculate the difference between the largest and the smallest values that the estimated value of \(y\) could take.
The numbers \(e_1, e_2, e_3, e_4, e_5\) are defined by $$e_i = a + b x_i - y_i \quad \text{for } i = 1, 2, 3, 4, 5.$$
  3. The values of \(e_1\), \(e_2\) and \(e_3\) are \(0.6\), \(-0.7\) and \(0.2\) respectively. Calculate the values of \(e_4\) and \(e_5\).
  4. Calculate the value of \(e _ { 1 } ^ { 2 } + e _ { 2 } ^ { 2 } + e _ { 3 } ^ { 2 } + e _ { 4 } ^ { 2 } + e _ { 5 } ^ { 2 }\) and explain the relevance of this quantity to the regression line found in part (i).
  5. Find the mean and the variance of \(e _ { 1 } , e _ { 2 } , e _ { 3 } , e _ { 4 } , e _ { 5 }\).
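The sketch below works from the summary sums alone, using exact fractions. It uses \(b = S_{xy}/S_{xx}\), \(a = \bar{y} - b\bar{x}\), and two standard facts about the least-squares line: \(\sum e_i = 0\) and \(\sum e_i^2 = S_{yy} - S_{xy}^2/S_{xx}\) (the sign convention \(e_i = a + bx_i - y_i\) does not affect either). Without the individual observations these identities fix \(e_4\) and \(e_5\) only as an unordered pair; matching them to \(x_4\) and \(x_5\) needs the data. The variance in part 5 is assumed to use divisor \(n\). Variable names are illustrative.

```python
from fractions import Fraction as F
from math import sqrt

n = 5
sum_x, sum_y = F(90), F("15.0")
sum_x2, sum_y2, sum_xy = F(1720), F("46.86"), F("264.0")

# Centred sums of squares and products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Part 1: least-squares line of y on x, passing through (x-bar, y-bar).
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Part 2: x lies in [19.5, 20.5), an interval of width 1, so the estimate spans |b| * 1.
spread = abs(b) * 1

# Residual identities for the least-squares line.
sum_e = n * a + b * sum_x - sum_y       # always 0
sum_e2 = s_yy - s_xy ** 2 / s_xx        # minimised sum of squared residuals

# Part 3: e4 + e5 = s and e4 * e5 = p, so both are roots of t^2 - s*t + p = 0.
known = [F("0.6"), F("-0.7"), F("0.2")]
s = sum_e - sum(known)
p = (s ** 2 - (sum_e2 - sum(e * e for e in known))) / 2
disc = float(s ** 2 - 4 * p)
roots = sorted([(float(s) - sqrt(disc)) / 2, (float(s) + sqrt(disc)) / 2])

# Part 5: mean and variance (divisor n) of the five residuals.
mean_e = sum_e / n
var_e = sum_e2 / n - mean_e ** 2

print(b, a, spread)           # -3/50 102/25 3/50
print(roots)                  # approximately [-0.6, 0.5]
print(sum_e2, mean_e, var_e)  # 3/2 0 3/10
```

So the line is \(y = 4.08 - 0.06x\), and \(\sum e_i^2 = 1.5\) is the quantity that the least-squares regression line minimises.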