8 A continuous random variable \(X\) has probability density function given by
$$f ( x ) = \begin{cases} k x ^ { n } & 0 \leqslant x \leqslant 1 \\
0 & \text { otherwise } \end{cases}$$
where \(n\) and \(k\) are positive constants.
- Find \(k\) in terms of \(n\).
- Show that \(\mathrm { E } ( X ) = \frac { n + 1 } { n + 2 }\).
It is given that \(n = 3\).
- Find the variance of \(X\).
- One hundred observations of \(X\) are taken, and the mean of the observations is denoted by \(\bar { X }\). Write down the approximate distribution of \(\bar { X }\), giving the values of any parameters.
- Write down the mean and the variance of the random variable \(Y\) with probability density function given by
$$g ( y ) = \begin{cases} 4 \left( y + \frac { 4 } { 5 } \right) ^ { 3 } & - \frac { 4 } { 5 } \leqslant y \leqslant \frac { 1 } { 5 } \\
0 & \text { otherwise } \end{cases}$$
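For the first two parts of this question, a sketch of the working that uses only the density \(f ( x ) = k x ^ { n }\) defined above: requiring the density to integrate to 1 fixes \(k\), and one further integration gives the stated mean.
$$\int _ { 0 } ^ { 1 } k x ^ { n } \, \mathrm { d } x = \frac { k } { n + 1 } = 1 \quad \Rightarrow \quad k = n + 1 ,$$
$$\mathrm { E } ( X ) = \int _ { 0 } ^ { 1 } ( n + 1 ) x ^ { n + 1 } \, \mathrm { d } x = \frac { n + 1 } { n + 2 } .$$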
\section*{June 2006}
1 Calculate the variance of the continuous random variable with probability density function given by
$$f ( x ) = \begin{cases} \frac { 3 } { 37 } x ^ { 2 } & 3 \leqslant x \leqslant 4 \\
0 & \text { otherwise } \end{cases}$$
2
- The random variable \(R\) has the distribution \(\mathrm { B } ( 6 , p )\). A random observation of \(R\) is found to be 6. Carry out a \(5 \%\) significance test of the null hypothesis \(\mathrm { H } _ { 0 } : p = 0.45\) against the alternative hypothesis \(\mathrm { H } _ { 1 } : p \neq 0.45\), showing all necessary details of your calculation.
- The random variable \(S\) has the distribution \(\mathrm { B } ( n , p )\). \(\mathrm { H } _ { 0 }\) and \(\mathrm { H } _ { 1 }\) are as in part (i). A random observation of \(S\) is found to be 1. Use tables to find the largest value of \(n\) for which \(\mathrm { H } _ { 0 }\) is not rejected. Show the values of any relevant probabilities.
3 The continuous random variable \(T\) has mean \(\mu\) and standard deviation \(\sigma\). It is known that \(\mathrm { P } ( T < 140 ) = 0.01\) and \(\mathrm { P } ( T < 300 ) = 0.8\).
- Assuming that \(T\) is normally distributed, calculate the values of \(\mu\) and \(\sigma\).
In fact, \(T\) represents the time, in minutes, taken by a randomly chosen runner in a public marathon, in which about \(10 \%\) of runners took longer than 400 minutes.
- State with a reason whether the mean of \(T\) would be higher than, equal to, or lower than the value calculated in part (i).
4
- Explain briefly what is meant by a random sample.
Random numbers are used to select, with replacement, a sample of size \(n\) from a population numbered 000, 001, 002, ..., 799.
- If \(n = 6\), find the probability that exactly 4 of the selected sample have numbers less than 500.
- If \(n = 60\), use a suitable approximation to calculate the probability that at least 40 of the selected sample have numbers less than 500.
5 An airline has 300 seats available on a flight to Australia. It is known from experience that on average only \(99 \%\) of those who have booked seats actually arrive to take the flight, the remaining \(1 \%\) being called 'no-shows'. The airline therefore sells more than 300 seats. If more than 300 passengers then arrive, the flight is over-booked. Assume that the number of no-show passengers can be modelled by a binomial distribution.
- If the airline sells 303 seats, state a suitable distribution for the number of no-show passengers, and state a suitable approximation to this distribution, giving the values of any parameters.
Using the distribution and approximation in part (i),
- show that the probability that the flight is over-booked is 0.4165, correct to 4 decimal places,
- find the largest number of seats that can be sold for the probability that the flight is over-booked to be less than 0.2.
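A sketch of the working for the over-booking probability, assuming the Poisson approximation \(\mathrm { Po } ( 3.03 )\) to \(\mathrm { B } ( 303 , 0.01 )\) for the number of no-shows, written here as \(N\) (a label introduced only for this sketch). With 303 seats sold, the flight is over-booked when at most 2 booked passengers fail to arrive:
$$\mathrm { P } ( N \leqslant 2 ) = e ^ { - 3.03 } \left( 1 + 3.03 + \frac { 3.03 ^ { 2 } } { 2 } \right) = e ^ { - 3.03 } \times 8.62045 \approx 0.4165 .$$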
6 Customers arrive at a post office at a constant average rate of 0.4 per minute.
- State an assumption needed to model the number of customers arriving in a given time interval by a Poisson distribution.
Assuming that the use of a Poisson distribution is justified,
- find the probability that more than 2 customers arrive in a randomly chosen 1-minute interval,
- use a suitable approximation to calculate the probability that more than 55 customers arrive in a given two-hour interval,
- calculate the smallest time for which the probability that no customers arrive in that time is less than 0.02, giving your answer to the nearest second.
7 Three independent researchers, \(A , B\) and \(C\), carry out significance tests on the power consumption of a manufacturer's domestic heaters. The power consumption, \(X\) watts, is a normally distributed random variable with mean \(\mu\) and standard deviation 60. Each researcher tests the null hypothesis \(\mathrm { H } _ { 0 } : \mu = 4000\) against the alternative hypothesis \(\mathrm { H } _ { 1 } : \mu > 4000\).
Researcher \(A\) uses a sample of size 50 and a significance level of \(5 \%\).
- Find the critical region for this test, giving your answer correct to 4 significant figures.
In fact the value of \(\mu\) is 4020.
- Calculate the probability that Researcher \(A\) makes a Type II error.
- Researcher \(B\) uses a sample bigger than 50 and a significance level of \(5 \%\). Explain whether the probability that Researcher \(B\) makes a Type II error is less than, equal to, or greater than your answer to part (ii).
- Researcher \(C\) uses a sample of size 50 and a significance level bigger than \(5 \%\). Explain whether the probability that Researcher \(C\) makes a Type II error is less than, equal to, or greater than your answer to part (ii).
- State with a reason whether it is necessary to use the Central Limit Theorem at any point in this question.
1 The random variable \(H\) has the distribution \(\mathrm { N } \left( \mu , 5 ^ { 2 } \right)\). It is given that \(\mathrm { P } ( H < 22 ) = 0.242\). Find the value of \(\mu\).
2 A school has 900 pupils. For a survey, Jan obtains a list of all the pupils, numbered 1 to 900 in alphabetical order. She then selects a sample by the following method. Two fair dice, one red and one green, are thrown, and the number in the list of the first pupil in the sample is determined by the following table.
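\begin{tabular}{|c|c|c|c|c|c|c|c|}
\cline{3-8}
\multicolumn{2}{c|}{} & \multicolumn{6}{c|}{Score on green dice} \\
\cline{3-8}
\multicolumn{2}{c|}{} & 1 & 2 & 3 & 4 & 5 & 6 \\
\hline
Score on red dice & 1, 2 or 3 & 1 & 2 & 3 & 4 & 5 & 6 \\
\cline{2-8}
 & 4, 5 or 6 & 7 & 8 & 9 & 10 & 11 & 12 \\
\hline
\end{tabular}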
For example, if the scores on the red and green dice are 5 and 2 respectively, then the first member of the sample is the pupil numbered 8 in the list.
Starting with this first number, every 12th number on the list is then used, so that if the first pupil selected is numbered 8, the others will be numbered \(20 , 32 , 44 , \ldots\).
- State the size of the sample.
- Explain briefly whether the following statements are true.
(a) Each pupil in the school has an equal probability of being in the sample.
(b) The pupils in the sample are selected independently of one another.
- Give a reason why the number of the first pupil in the sample should not be obtained simply by adding together the scores on the two dice. Justify your answer.
3 A fair dice is thrown 90 times. Use an appropriate approximation to find the probability that the number 1 is obtained 14 or more times.
4 A set of observations of a random variable \(W\) can be summarised as follows:
$$n = 14 , \quad \Sigma w = 100.8 , \quad \Sigma w ^ { 2 } = 938.70 .$$
- Calculate an unbiased estimate of the variance of \(W\).
- The mean of 70 observations of \(W\) is denoted by \(\bar { W }\). State the approximate distribution of \(\bar { W }\), including unbiased estimate(s) of any parameter(s).
\section*{Jan 2007}
5 On a particular night, the number of shooting stars seen per minute can be modelled by the distribution \(\operatorname { Po } ( 0.2 )\).
- Find the probability that, in a given 6-minute period, fewer than 2 shooting stars are seen.
- Find the probability that, in 20 periods of 6 minutes each, the number of periods in which fewer than 2 shooting stars are seen is exactly 13 .
- Use a suitable approximation to find the probability that, in a given 2-hour period, fewer than 30 shooting stars are seen.
6 The continuous random variable \(X\) has the following probability density function:
$$f ( x ) = \begin{cases} a + b x & 0 \leqslant x \leqslant 2 \\
0 & \text { otherwise } \end{cases}$$
where \(a\) and \(b\) are constants.
- Show that \(2 a + 2 b = 1\).
- It is given that \(\mathrm { E } ( X ) = \frac { 11 } { 9 }\). Use this information to find a second equation connecting \(a\) and \(b\), and hence find the values of \(a\) and \(b\).
- Determine whether the median of \(X\) is greater than, less than, or equal to \(\mathrm { E } ( X )\).
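For the first part of this question, a sketch of the working: the condition that the density \(a + b x\) integrates to 1 over \(0 \leqslant x \leqslant 2\) gives the stated equation directly.
$$\int _ { 0 } ^ { 2 } ( a + b x ) \, \mathrm { d } x = \left[ a x + \tfrac { 1 } { 2 } b x ^ { 2 } \right] _ { 0 } ^ { 2 } = 2 a + 2 b = 1 .$$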
7 A television company believes that the proportion of households that can receive Channel C is 0.35.
- In a random sample of 14 households it is found that 2 can receive Channel C. Test, at the \(2.5 \%\) significance level, whether there is evidence that the proportion of households that can receive Channel C is less than 0.35 .
- On another occasion the test is carried out again, with the same hypotheses and significance level as in part (i), but using a new sample, of size \(n\). It is found that no members of the sample can receive Channel C. Find the largest value of \(n\) for which the null hypothesis is not rejected. Show all relevant working.
8 The quantity, \(X\) milligrams per litre, of silicon dioxide in a certain brand of mineral water is a random variable with distribution \(\mathrm { N } \left( \mu , 5.6 ^ { 2 } \right)\).
- A random sample of 80 observations of \(X\) has sample mean 100.7. Test, at the \(1 \%\) significance level, the null hypothesis \(\mathrm { H } _ { 0 } : \mu = 102\) against the alternative hypothesis \(\mathrm { H } _ { 1 } : \mu \neq 102\).
- The test is redesigned so as to meet the following conditions.
- The hypotheses are \(\mathrm { H } _ { 0 } : \mu = 102\) and \(\mathrm { H } _ { 1 } : \mu < 102\).
- The significance level is \(1 \%\).
- The probability of making a Type II error when \(\mu = 100\) is to be (approximately) 0.05 .
The sample size is \(n\), and the critical region is \(\bar { X } < c\), where \(\bar { X }\) denotes the sample mean.
(a) Show that \(n\) and \(c\) satisfy (approximately) the equation \(102 - c = \frac { 13.0256 } { \sqrt { n } }\).
(b) Find another equation satisfied by \(n\) and \(c\).
(c) Hence find the values of \(n\) and \(c\).
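A sketch of the working for part (a), assuming the tabulated one-tailed \(1 \%\) critical value \(z = 2.326\) (a standard table value, not quoted in the question). Under \(\mathrm { H } _ { 0 }\), \(\bar { X } \sim \mathrm { N } \left( 102 , 5.6 ^ { 2 } / n \right)\), so the boundary \(c\) of the critical region satisfies
$$\frac { 102 - c } { 5.6 / \sqrt { n } } = 2.326 \quad \Rightarrow \quad 102 - c = \frac { 2.326 \times 5.6 } { \sqrt { n } } = \frac { 13.0256 } { \sqrt { n } } .$$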
\section*{June 2007}
1 A random sample of observations of a random variable \(X\) is summarised by
$$n = 100 , \quad \Sigma x = 4830.0 , \quad \Sigma x ^ { 2 } = 249\,509.16 .$$
- Obtain unbiased estimates of the mean and variance of \(X\).
- The sample mean of 100 observations of \(X\) is denoted by \(\bar { X }\). Explain whether you would need any further information about the distribution of \(X\) in order to estimate \(\mathrm { P } ( \bar { X } > 60 )\). [You should not attempt to carry out the calculation.]
2 It is given that on average one car in forty is yellow. Using a suitable approximation, find the probability that, in a random sample of 130 cars, exactly 4 are yellow.
3 The proportion of adults in a large village who support a proposal to build a bypass is denoted by \(p\). A random sample of size 20 is selected from the adults in the village, and the members of the sample are asked whether or not they support the proposal.
- Name the probability distribution that would be used in a hypothesis test for the value of \(p\).
- State the properties of a random sample that explain why the distribution in part (i) is likely to be a good model.
4 \(X\) is a continuous random variable.
- State two conditions needed for \(X\) to be well modelled by a normal distribution.
- It is given that \(X \sim \mathrm { N } \left( 50.0 , 8 ^ { 2 } \right)\). The mean of 20 random observations of \(X\) is denoted by \(\bar { X }\). Find \(\mathrm { P } ( \bar { X } > 47.0 )\).
5 The number of system failures per month in a large network is a random variable with the distribution \(\operatorname { Po } ( \lambda )\). A significance test of the null hypothesis \(\mathrm { H } _ { 0 } : \lambda = 2.5\) is carried out by counting \(R\), the number of system failures in a period of 6 months. The result of the test is that \(\mathrm { H } _ { 0 }\) is rejected if \(R > 23\) but is not rejected if \(R \leqslant 23\).
- State the alternative hypothesis.
- Find the significance level of the test.
- Given that \(\mathrm { P } ( R > 23 ) < 0.1\), use tables to find the largest possible actual value of \(\lambda\). You should show the values of any relevant probabilities.
6 In a rearrangement code, the letters of a message are rearranged so that the frequency with which any particular letter appears is the same as in the original message. In ordinary German the letter \(e\) appears \(19 \%\) of the time. A certain encoded message of 20 letters contains one letter \(e\).
- Using an exact binomial distribution, test at the \(10 \%\) significance level whether there is evidence that the proportion of the letter \(e\) in the language from which this message is a sample is less than in German, i.e., less than \(19 \%\).
- Give a reason why a binomial distribution might not be an appropriate model in this context.
7 Two continuous random variables \(S\) and \(T\) have probability density functions as follows.
$$\begin{array} { l l }
S : & f ( x ) = \begin{cases} \frac { 1 } { 2 } & - 1 \leqslant x \leqslant 1 \\
0 & \text { otherwise } \end{cases} \\
T : & g ( x ) = \begin{cases} \frac { 3 } { 2 } x ^ { 2 } & - 1 \leqslant x \leqslant 1 \\
0 & \text { otherwise } \end{cases}
\end{array}$$
- Sketch on the same axes the graphs of \(y = \mathrm { f } ( x )\) and \(y = \mathrm { g } ( x )\). [You should not use graph paper or attempt to plot points exactly.]
- Explain in everyday terms the difference between the two random variables.
- Find the value of \(t\) such that \(\mathrm { P } ( T > t ) = 0.2\).
8 A random variable \(Y\) is normally distributed with mean \(\mu\) and variance 12.25. Two statisticians carry out significance tests of the hypotheses \(\mathrm { H } _ { 0 } : \mu = 63.0 , \mathrm { H } _ { 1 } : \mu > 63.0\).
- Statistician \(A\) uses the mean \(\bar { Y }\) of a sample of size 23, and the critical region for his test is \(\bar { Y } > 64.20\). Find the significance level for \(A\)'s test.
- Statistician \(B\) uses the mean of a sample of size 50 and a significance level of \(5 \%\).
(a) Find the critical region for \(B\)'s test.
(b) Given that \(\mu = 65.0\), find the probability that \(B\)'s test results in a Type II error.
- Given that, when \(\mu = 65.0\), the probability that \(A\)'s test results in a Type II error is 0.1365, state with a reason which test is better.