OCR MEI Further Statistics A AS (Further Statistics A AS) 2019 June

Question 1
1 The discrete random variable \(X\) has probability distribution defined by $$\mathrm { P } ( X = r ) = k \left( r ^ { 2 } + 3 r \right) \text { for } r = 1,2,3,4,5 \text {, where } k \text { is a constant. }$$
  1. Complete the table below, using the copy in the Printed Answer Booklet, giving the probabilities in terms of \(k\).
    \begin{tabular}{|c|c|c|c|c|c|}
    \hline
    \(r\) & 1 & 2 & 3 & 4 & 5 \\
    \hline
    \(\mathrm { P } ( X = r )\) & \(4 k\) & \(10 k\) & & & \\
    \hline
    \end{tabular}
  2. Show that the value of \(k\) is 0.01.
  3. Draw a graph to illustrate the distribution.
  4. Describe the shape of the distribution.
  5. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
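As a check on the arithmetic in Question 1, the following is a minimal sketch using exact fractions. The variable names (`weights`, `probs`, `mean`, `variance`) are illustrative and not part of the question; the only input is the stated rule \(\mathrm{P}(X = r) = k(r^2 + 3r)\) for \(r = 1, \dots, 5\).

```python
from fractions import Fraction

# Weights r^2 + 3r for r = 1..5; P(X = r) = k * weight.
weights = {r: r * r + 3 * r for r in range(1, 6)}

# The probabilities must sum to 1, which fixes k.
k = Fraction(1, sum(weights.values()))
probs = {r: k * w for r, w in weights.items()}

# E(X) and Var(X) = E(X^2) - E(X)^2, kept as exact fractions.
mean = sum(r * p for r, p in probs.items())
mean_sq = sum(r * r * p for r, p in probs.items())
variance = mean_sq - mean ** 2

print("k =", float(k))
print("E(X) =", float(mean))
print("Var(X) =", float(variance))
```

Working in `Fraction` avoids rounding, so the value of \(k\) can be compared directly with 0.01.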
Question 2
2 Almost all plants of a particular species have red flowers. However, on average 1 in every 1500 plants of this species have white flowers. A random sample of 2000 plants of this species is selected. The random variable \(X\) represents the number of plants in the sample that have white flowers.
  1. Name two distributions which could be used to model the distribution of \(X\), stating the parameters of each of these distributions. You may use either of the distributions you have named in the rest of this question.
  2. Calculate each of the following.
    • \(\mathrm { P } ( X = 2 )\)
    • \(\mathrm { P } ( X > 2 )\)
  3. A random sample of 20000 plants of this species is selected. Calculate the probability that there are at least 10 plants in the sample that have white flowers.
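The probabilities in Question 2 can be sketched numerically as below. This assumes the two candidate models are a binomial distribution with \(n = 2000\), \(p = 1/1500\) and a Poisson distribution with the same mean; the helper names `binomial_pmf` and `poisson_pmf` are illustrative, and only the Python standard library is used.

```python
from math import comb, exp, factorial

def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_pmf(lam, r):
    """P(X = r) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**r / factorial(r)

p = 1 / 1500
n_small, n_large = 2000, 20000
lam_small = n_small * p  # mean number of white-flowered plants in 2000
lam_large = n_large * p  # mean number of white-flowered plants in 20000

# P(X = 2) and P(X > 2) for the sample of 2000, under each model.
p_eq_2_binom = binomial_pmf(n_small, p, 2)
p_eq_2_pois = poisson_pmf(lam_small, 2)
p_gt_2_binom = 1 - sum(binomial_pmf(n_small, p, r) for r in range(3))
p_gt_2_pois = 1 - sum(poisson_pmf(lam_small, r) for r in range(3))

# P(X >= 10) for the sample of 20000, under each model.
p_ge_10_binom = 1 - sum(binomial_pmf(n_large, p, r) for r in range(10))
p_ge_10_pois = 1 - sum(poisson_pmf(lam_large, r) for r in range(10))

print(p_eq_2_binom, p_eq_2_pois)
print(p_gt_2_binom, p_gt_2_pois)
print(p_ge_10_binom, p_ge_10_pois)
```

Printing the two models side by side shows how closely they agree when \(n\) is large and \(p\) is small.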
Question 3
3 A fair 8-sided dice has faces labelled 10, 20, 30, ..., 80.
  1. State the distribution of the score when the dice is rolled once.
  2. Write down the probability that, when the dice is rolled once, the score is at least 40.
  3. The dice is rolled three times.
    1. Find the variance of the total score obtained.
    2. Find the probability that on one of the rolls the score is less than 30, on another it is between 30 and 50 inclusive and on the other it is greater than 50.
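The quantities in Question 3 can be checked by brute-force enumeration. This is a sketch that assumes the three rolls are independent; the function `band` and the labels `"low"`, `"mid"`, `"high"` are illustrative names for the three score ranges in the question.

```python
from fractions import Fraction
from itertools import product

faces = [10 * i for i in range(1, 9)]  # 10, 20, ..., 80
p_face = Fraction(1, len(faces))       # fair dice: each face equally likely

# Single roll: P(score >= 40), mean and variance.
p_at_least_40 = sum(p_face for f in faces if f >= 40)
mean_one = sum(f * p_face for f in faces)
var_one = sum(f * f * p_face for f in faces) - mean_one**2

# Independent rolls: the variance of the total is the sum of the variances.
var_total = 3 * var_one

def band(score):
    """Classify a score into the three ranges used in the question."""
    if score < 30:
        return "low"
    if score <= 50:
        return "mid"
    return "high"

# Count ordered triples of rolls with exactly one score in each range.
favourable = sum(
    1
    for rolls in product(faces, repeat=3)
    if {band(s) for s in rolls} == {"low", "mid", "high"}
)
prob_one_each = Fraction(favourable, len(faces) ** 3)

print(p_at_least_40, var_total, prob_one_each)
```

Enumerating all \(8^3\) ordered outcomes accounts for the different orders in which the three ranges can occur.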
Question 4
4 A student is investigating correlations between various personality traits, two of which are conscientiousness and openness to new experiences.
She selects a random sample of 10 students at her university and uses standard tests to measure their conscientiousness and their openness. The product moment correlation coefficient between these two variables for the 10 students is 0.476.
  1. Assuming that the underlying population has a bivariate Normal distribution, carry out a hypothesis test at the \(10 \%\) significance level to investigate whether there is any correlation between openness and conscientiousness in students.
    Table 4.1 below shows the values of the product moment correlation coefficients between 5 different personality traits for a much larger sample of students. Those correlations that are significant at the \(5 \%\) level are denoted by a * after the value of the correlation.
    \begin{table}[h]
    \begin{tabular}{|l|c|c|c|c|c|}
    \hline
     & Neuroticism & Extroversion & Openness & Agreeableness & Conscientiousness \\
    \hline
    Neuroticism & 1 & & & & \\
    Extroversion & \(-0.296\)* & 1 & & & \\
    Openness & \(-0.044\) & 0.405* & 1 & & \\
    Agreeableness & \(-0.190\)* & 0.061 & 0.042 & 1 & \\
    Conscientiousness & \(-0.485\)* & 0.145 & 0.235* & 0.112 & 1 \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 4.1}
    \end{table}
    The student analyses these factors for effect size.
    Guidelines often used when considering effect size are given in Table 4.2 below.
    \begin{table}[h]
    \begin{tabular}{|c|c|}
    \hline
    Product moment correlation coefficient & Effect size \\
    \hline
    0.1 & Small \\
    0.3 & Medium \\
    0.5 & Large \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 4.2}
    \end{table}
  2. The student notes that, despite the result of the test in part (a), the correlation between openness and conscientiousness is significant at the \(5 \%\) level with this second sample. Comment briefly on why this may be the case.
  3. The student intends to summarise her findings about relationships between these factors, including effect sizes, in a report.
    Use the information in Tables 4.1 and 4.2 to identify two summary points the student could make.
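For the hypothesis test on the sample of 10 students, one way to sketch the calculation is through the equivalent \(t\)-statistic for a product moment correlation coefficient. This is an illustration rather than the method the paper prescribes, which would normally use a table of critical values for \(r\). The constant `T_CRIT` is an assumption taken from standard \(t\) tables (two-tailed \(10 \%\), 8 degrees of freedom), not from the question.

```python
from math import sqrt

r = 0.476   # sample product moment correlation coefficient
n = 10      # sample size
df = n - 2  # degrees of freedom for the correlation test

# Under H0 (population correlation is zero), this statistic follows a
# t distribution with n - 2 degrees of freedom.
t_stat = r * sqrt(df) / sqrt(1 - r**2)

# Assumed table value: upper 5% point of t with 8 degrees of freedom,
# giving a two-tailed test at the 10% level.
T_CRIT = 1.860

# The same threshold expressed as a critical value for |r|.
r_crit = T_CRIT / sqrt(T_CRIT**2 + df)

reject_h0 = abs(t_stat) > T_CRIT
print(t_stat, r_crit, reject_h0)
```

Comparing `abs(r)` with `r_crit` gives the same decision as comparing `abs(t_stat)` with `T_CRIT`, because the transformation between \(r\) and \(t\) is increasing.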
Question 5
5 A researcher is investigating births of females and males in a particular species of animal which very often produces litters of 7 offspring.
The table shows some data about the number of females per litter in 200 litters of 7 offspring. The researcher thinks that a binomial distribution \(\mathrm { B } ( 7 , p )\) may be an appropriate model for these data.
(c) Complete the test at the \(5 \%\) significance level.
Fig. 5 shows the probability distribution \(\mathrm { B } ( 7,0.35 )\) together with the relative frequencies of the observed data (the numbers of litters each divided by 200).
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{fd496303-10f1-450e-bbeb-421ab6f4de21-5_659_1285_342_319} \captionsetup{labelformat=empty} \caption{Fig. 5}
\end{figure}
(d) Comment on the result of the test completed in part (c) by considering Fig. 5.
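The observed frequencies for Question 5 are not reproduced here, so the test itself cannot be carried out from this text. As a partial sketch, the probabilities of \(\mathrm{B}(7, 0.35)\) plotted in Fig. 5 and the corresponding expected frequencies for 200 litters can be computed as follows. The variable names are illustrative.

```python
from math import comb

n_offspring = 7   # litter size
p_female = 0.35   # parameter of the model shown in Fig. 5
n_litters = 200   # number of litters in the data

# P(R = r) for R ~ B(7, 0.35), r = 0..7.
probs = [
    comb(n_offspring, r) * p_female**r * (1 - p_female) ** (n_offspring - r)
    for r in range(n_offspring + 1)
]

# Expected number of litters with r females under the model.
expected = [n_litters * q for q in probs]

for r, (q, e) in enumerate(zip(probs, expected)):
    print(r, round(q, 4), round(e, 2))
```

In a goodness of fit test, classes with small expected frequencies are usually combined before the test statistic is calculated.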
Question 6
6 A meteorologist is investigating the relationship between altitude \(x\) metres and mean annual temperature \(y ^ { \circ } \mathrm { C }\) in an American state.
She selects 12 locations at various altitudes and then stations a remote monitoring device at each of them to measure the temperature over the course of a year. Fig. 6 illustrates the data which she obtains. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{fd496303-10f1-450e-bbeb-421ab6f4de21-6_686_1477_486_292} \captionsetup{labelformat=empty} \caption{Fig. 6}
\end{figure}
  1. Explain why it would not be appropriate to carry out a hypothesis test for correlation based on the product moment correlation coefficient.
  2. Explain why altitude has been plotted on the horizontal axis in Fig. 6.
    Summary statistics for \(x\) and \(y\) are as follows. $$\sum x = 21200 \quad \sum y = 105.4 \quad \sum x ^ { 2 } = 39100000 \quad \sum y ^ { 2 } = 1004 \quad \sum x y = 176090$$
  3. Calculate the equation of the regression line of \(y\) on \(x\).
  4. Use the equation of the regression line to predict the values of the mean annual temperature at each of the following altitudes.
    • 2000 metres
    • 3000 metres
  5. Comment on the reliability of your predictions in part (d).
  6. Calculate the value of the residual for the data point \(( 1600,8.1 )\).
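The regression calculations in Question 6 can be sketched directly from the summary statistics, using the standard least squares formulae \(b = S_{xy} / S_{xx}\) and \(a = \bar{y} - b \bar{x}\) with \(n = 12\) locations. The names `s_xy`, `s_xx` and `predict` are illustrative.

```python
n = 12  # number of locations

# Summary statistics given in the question.
sum_x = 21200
sum_y = 105.4
sum_x2 = 39100000
sum_xy = 176090

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares regression line of y on x: y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

def predict(x):
    """Predicted mean annual temperature (degrees C) at altitude x metres."""
    return a + b * x

pred_2000 = predict(2000)
pred_3000 = predict(3000)

# Residual = observed y minus predicted y at the data point (1600, 8.1).
residual = 8.1 - predict(1600)

print(a, b)
print(pred_2000, pred_3000, residual)
```

Whether each prediction is an interpolation or an extrapolation depends on the range of altitudes shown in Fig. 6, which this sketch does not use.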