OCR MEI Further Statistics Minor (Further Statistics Minor) 2021 November

Question 1
1 The probability distribution of a discrete random variable \(X\) is given by the formula \(\mathrm{P}(X = r) = k\left((r - 1)^{2} + 1\right)\) for \(r = 1, 2, 3, 4, 5\).
  1. Show that \(k = \frac{1}{35}\).
    The distribution of \(X\) is shown in the table.
    \begin{tabular}{|c|c|c|c|c|c|}
    \hline
    \(r\) & 1 & 2 & 3 & 4 & 5 \\
    \hline
    \(\mathrm{P}(X = r)\) & \(\frac{1}{35}\) & \(\frac{2}{35}\) & \(\frac{1}{7}\) & \(\frac{2}{7}\) & \(\frac{17}{35}\) \\
    \hline
    \end{tabular}
  2. Comment briefly on the shape of the distribution.
  3. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
    The random variable \(Y\) is given by \(Y = 5 X - 10\).
  4. Find each of the following.
    • \(\mathrm { E } ( Y )\)
    • \(\operatorname { Var } ( Y )\)
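A minimal Python sketch for checking work on Question 1, using exact fractions. It rebuilds the table from the formula with \(k = \frac{1}{35}\) and applies the standard results \(\mathrm{E}(aX + b) = a\mathrm{E}(X) + b\) and \(\operatorname{Var}(aX + b) = a^{2}\operatorname{Var}(X)\). The names `pmf`, `mean` and `variance` are illustrative and do not come from the paper.

```python
from fractions import Fraction


def pmf(k=Fraction(1, 35)):
    """Return {r: P(X = r)} for r = 1..5, with P(X = r) = k((r - 1)^2 + 1)."""
    return {r: k * ((r - 1) ** 2 + 1) for r in range(1, 6)}


def mean(dist):
    """E(X) = sum of r * P(X = r)."""
    return sum(r * p for r, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - E(X)^2."""
    m = mean(dist)
    return sum(r * r * p for r, p in dist.items()) - m * m


dist = pmf()
total = sum(dist.values())  # the probabilities sum to 1 only when k = 1/35

# Linear transformation Y = 5X - 10
a, b = 5, -10
mean_y = a * mean(dist) + b
var_y = a ** 2 * variance(dist)

print(dist, total, mean(dist), variance(dist), mean_y, var_y)
```

Working in `Fraction` rather than floats keeps every value exact, so the printed probabilities can be compared directly with the fractions in the table.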
Question 2
2 A road transport researcher is investigating the link between the age of a person, \(a\) years, and the distance, \(d\) metres, at which the person can read a large road sign. The researcher selects 13 individuals of different ages between 20 and 80 and measures the value of \(d\) for each of them. The spreadsheet below shows the data which the researcher obtained, together with a scatter diagram which illustrates the data.
\includegraphics[max width=\textwidth, alt={}, center]{691e8b55-e9a1-4fff-b9ee-a71ff1f73ead-3_725_1566_495_251}
  1. Explain which of the two variables \(a\) and \(d\) is the independent variable.
  2. Find the equation of the regression line of \(d\) on \(a\).
  3. Use the regression line to predict the average distance at which a 60-year-old person can read the road sign.
  4. Explain why it might not be sensible to use the regression line to predict the average distance at which a 5-year-old child can read the road sign.
  5. Determine the value of the residual for \(a = 40\).
  6. Explain why it would not be useful to find the equation of the regression line of \(a\) on \(d\).
Question 3
3 A student wants to know whether there is any association between age and whether or not people smoke. The student takes a sample of 120 adults and asks each of them whether or not they smoke. Below is a screenshot showing part of a spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted.
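\begin{tabular}{|c|l|l|c|c|c|}
\hline
 & A & B & C & D & E \\
\hline
1 & & & \multicolumn{3}{c|}{Observed frequency} \\
\hline
2 & & & \multicolumn{3}{c|}{Age} \\
\hline
3 & & & 16-34 & 35-59 & 60 and over \\
\hline
4 & \multirow{2}{*}{Smoking status} & Smoker & 13 & 7 & 3 \\
\cline{1-1} \cline{3-6}
5 & & Non-smoker & 28 & 43 & 26 \\
\hline
6 & & & & & \\
\hline
7 & & & \multicolumn{3}{c|}{Expected frequency} \\
\hline
8 & & & 7.8583 & & \\
\hline
9 & & & 33.1417 & & \\
\hline
10 & & & & & \\
\hline
11 & & & \multicolumn{3}{c|}{Contributions to the test statistic} \\
\hline
12 & & & 3.3642 & 0.6964 & 1.1775 \\
\hline
13 & & & & 0.1651 & 0.2792 \\
\hline
\end{tabular}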
  1. The student wants to carry out a chi-squared test to analyse the data. State a requirement of the sample if the test is to be valid. For the rest of this question, you should assume that this requirement is met.
  2. Determine the missing values in each of the following cells.
    • E8
    • C13
  3. In this question you must show detailed reasoning.
    Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is any association between age and smoking status.
  4. Discuss what the data suggest about the smoking status for each different age group.
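The spreadsheet entries in Question 3 can be reproduced with a short Python sketch, assuming the usual formulas: expected frequency = (row total × column total) / grand total, and contribution = (observed − expected)² / expected. The variable names are illustrative. The critical value is not computed here; it would be read from chi-squared tables with \((2 - 1)(3 - 1) = 2\) degrees of freedom.

```python
# Observed frequencies from rows 4 and 5 of the spreadsheet.
observed = [
    [13, 7, 3],    # Smoker: 16-34, 35-59, 60 and over
    [28, 43, 26],  # Non-smoker: 16-34, 35-59, 60 and over
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell under the hypothesis of no association.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Contribution of each cell to the chi-squared test statistic.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

test_statistic = sum(sum(row) for row in contributions)

for row in expected:
    print([round(e, 4) for e in row])
for row in contributions:
    print([round(c, 4) for c in row])
print(round(test_statistic, 4))
```

The first column of `expected` and the filled cells of rows 12 and 13 should agree with the spreadsheet to four decimal places.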
Question 4
4 A scientist is investigating sea salinity (the level of salt in the sea) in a particular area. She wishes to check whether satellite measurements, \(y\), of salinity are similar to those directly measured, \(x\). Both variables are measured in parts per thousand in suitable units. The scientist obtains a random sample of 10 values of \(x\) and the related values of \(y\). Below is a screenshot of a scatter diagram to illustrate the data. She decides to carry out a hypothesis test to check if there is any correlation between direct measurement, \(x\), and satellite measurement, \(y\).
\includegraphics[max width=\textwidth, alt={}, center]{691e8b55-e9a1-4fff-b9ee-a71ff1f73ead-5_830_837_589_246}
  1. Explain why the scientist might decide to carry out a test based on the product moment correlation coefficient.
    Summary statistics for \(x\) and \(y\) are as follows.
    \(n = 10 \quad \sum x = 351.9 \quad \sum y = 350.0 \quad \sum x^{2} = 12384.5 \quad \sum y^{2} = 12251.2 \quad \sum xy = 12317.2\)
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient.
  3. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is positive correlation between directly measured and satellite measured salinity levels.
  4. Explain why it would be preferable to use a larger sample.
    The scientist is also interested in whether there is any correlation between salinity and numbers of a particular species of shrimp in the water. She takes a large sample and finds that the product moment correlation coefficient for this sample is 0.165. The result of a test based on this sample is to reject the null hypothesis and conclude that there is correlation between salinity and numbers of shrimp.
  5. Comment on the outcome of the hypothesis test with reference to the effect size of 0.165.
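For Question 4, the product moment correlation coefficient can be checked from the summary statistics with a sketch like the one below. It assumes the standard definitions \(S_{xx} = \sum x^{2} - \frac{(\sum x)^{2}}{n}\), \(S_{yy} = \sum y^{2} - \frac{(\sum y)^{2}}{n}\), \(S_{xy} = \sum xy - \frac{\sum x \sum y}{n}\) and \(r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}\). The variable names are illustrative.

```python
import math

# Summary statistics as given in the question.
n = 10
sum_x, sum_y = 351.9, 350.0
sum_x2, sum_y2, sum_xy = 12384.5, 12251.2, 12317.2

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Product moment correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(s_xx, 4), round(s_yy, 4), round(s_xy, 4), round(r, 4))
```

Each of \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\) is a small difference between two large numbers, so the result is sensitive to any rounding in the summary statistics.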
Question 5
5 Biological cell membranes have receptor molecules which perform various functions. It is known that the number of receptor molecules of a particular type can be modelled by a Poisson distribution with mean 6 per area of 1 square unit.
  1.
    1. Determine the probability that there are at least 10 of these receptor molecules in an area of 1 square unit.
    2. Determine the probability that there are fewer than 50 of these receptor molecules in an area of 10 square units.
  2. A scientist is looking at areas of 1 square unit of cell membrane in order to find one which has at least 10 receptor molecules. Find the probability that she has to look at more than 20 to find such an area.
    It is known that the number of receptor molecules of another type in an area of 1 square unit can be modelled by the random variable \(X\) which has a Poisson distribution with mean \(\mu\). It is given that \(\mathrm{E}\left(X^{2}\right) = 12\).
  3. Determine \(\mathrm{P}(X < 5)\).
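The Poisson calculations in Question 5 can be checked with a sketch using only the Python standard library. It rests on three assumptions that should be justified in a written answer: the count in 10 square units is Poisson with mean 60; needing more than 20 areas means the first 20 independent areas all have fewer than 10 molecules; and \(\mathrm{E}(X^{2}) = \operatorname{Var}(X) + \mathrm{E}(X)^{2} = \mu + \mu^{2}\). The helper names `poisson_pmf` and `poisson_cdf` are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))


# At least 10 molecules in 1 square unit (mean 6).
p_at_least_10 = 1 - poisson_cdf(9, 6)

# Fewer than 50 molecules in 10 square units (mean 60).
p_fewer_than_50 = poisson_cdf(49, 60)

# More than 20 areas needed: the first 20 areas all have fewer than 10.
p_more_than_20_looks = (1 - p_at_least_10) ** 20

# Solve mu + mu^2 = 12 for the positive root, then find P(X < 5) = P(X <= 4).
mu = (-1 + math.sqrt(1 + 4 * 12)) / 2
p_x_less_than_5 = poisson_cdf(4, mu)

print(p_at_least_10, p_fewer_than_50, p_more_than_20_looks, mu, p_x_less_than_5)
```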
Question 6
6 A lottery has tickets numbered 1 to \(n\) inclusive, where \(n\) is a positive integer. The random variable \(X\) denotes the number on a ticket drawn at random.
  1. Determine \(\mathrm{P}\left(X \leqslant \frac{1}{4} n\right)\) in each of the following cases.
    1. \(n\) is a multiple of 4.
    2. \(n\) is of the form \(4 k + 1\), where \(k\) is a positive integer. Give your answer as a single fraction in terms of \(n\).
  2. Given that \(n = 101\), find the probability that \(X\) is within one standard deviation of the mean.
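Answers to Question 6 can be sanity-checked by direct counting, as in the sketch below. It assumes \(X\) is uniform on \(1, 2, \ldots, n\), with the standard results \(\mathrm{E}(X) = \frac{n + 1}{2}\) and \(\operatorname{Var}(X) = \frac{n^{2} - 1}{12}\). The function names are illustrative.

```python
from fractions import Fraction


def p_at_most_quarter(n):
    """P(X <= n/4): the tickets r with 4r <= n are 1..(n // 4)."""
    return Fraction(n // 4, n)


def p_within_one_sd(n):
    """P(|X - mean| <= sd), compared via squares to stay exact."""
    mean = Fraction(n + 1, 2)
    var = Fraction(n * n - 1, 12)
    count = sum(1 for r in range(1, n + 1) if (r - mean) ** 2 <= var)
    return Fraction(count, n)


print(p_at_most_quarter(8), p_at_most_quarter(9), p_within_one_sd(101))
```

Comparing \((r - \text{mean})^{2}\) with the variance avoids taking a square root, so the count is exact for any \(n\).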