8. Five coins were tossed 100 times and the number of heads recorded. The results are shown in the table below.
- Calculate Spearman's rank correlation coefficient for the marks awarded by the two judges.
After the show, one competitor complained about the judges. She claimed that there was no positive correlation between their marks.
- Stating your hypotheses clearly, test whether or not this sample provides support for the competitor's claim. Use a \(5 \%\) level of significance.
(4)
2. The Director of Studies at a large college believed that students' grades in Mathematics were independent of their grades in English. She examined the results of a random group of candidates who had studied both subjects and she recorded the number of candidates in each of the 6 categories shown.
Showing your working clearly, test, at the \(1 \%\) level of significance, whether or not there is an association between gender and the type of course taken. State your hypotheses clearly.
3. The product moment correlation coefficient is denoted by \(r\) and Spearman's rank correlation coefficient is denoted by \(r _ { s }\).
- Sketch separate scatter diagrams, with five points on each diagram, to show
- \(r = 1\),
- \(r _ { s } = - 1\) but \(r > - 1\).
Two judges rank seven collie dogs in a competition. The collie dogs are labelled \(A\) to \(G\) and the rankings are as follows.
- Calculate Spearman's rank correlation coefficient for these data.
- Stating your hypotheses clearly and using a one tailed test with a \(5 \%\) level of significance, interpret your rank correlation coefficient.
- Give a reason to support the use of the rank correlation coefficient rather than the product moment correlation coefficient with these data.
(1)
4. A sample of size 8 is to be taken from a population that is normally distributed with mean 55 and standard deviation 3. Find the probability that the sample mean will be greater than 57.
(5)
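One way to check an answer to this question is sketched below. It relies on the standard result that the mean of \(n\) independent observations from \(\mathrm{N}(\mu, \sigma^2)\) is distributed \(\mathrm{N}(\mu, \sigma^2 / n)\); the variable names are illustrative and not part of the question.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 55, 3, 8

# Sampling distribution of the mean: N(mu, sigma^2 / n)
sample_mean_dist = NormalDist(mu, sigma / sqrt(n))

# P(sample mean > 57)
p_greater = 1 - sample_mean_dist.cdf(57)
print(round(p_greater, 4))
```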
5. The number of goals scored by a football team is recorded for 100 games. The results are summarised in Table 1 below.
- Calculate Spearman's rank correlation coefficient between \(b\) and \(s\).
- Stating your hypotheses clearly, test whether or not the data provides support for the researcher's claim. Use a \(1 \%\) level of significance.
(4)
5. A random sample of 100 people were asked if their finances were worse, the same or better than this time last year. The sample was split according to their annual income and the results are shown in the table below.
- Calculate Spearman's rank correlation coefficient between \(h\) and \(c\).
After collecting the data, the councillor thinks there is no correlation between hardship and the number of calls to the emergency services.
- Test, at the \(5 \%\) level of significance, the councillor's claim. State your hypotheses clearly.
3. A factory manufactures batches of an electronic component. Each component is manufactured in one of three shifts. A component may have one of two types of defect, \(D _ { 1 }\) or \(D _ { 2 }\), at the end of the manufacturing process. A production manager believes that the type of defect is dependent upon the shift that manufactured the component. He examines 200 randomly selected defective components and classifies them by defect type and shift.
The results are shown in the table below.
- Calculate Spearman's rank correlation coefficient for these data.
- Test, at the \(5 \%\) level of significance, whether there is agreement between the rankings awarded by each manager. State your hypotheses clearly.
Manager \(Y\) later discovered he had miscopied his score for candidate \(D\) and it should be 54.
- Without carrying out any further calculations, explain how you would calculate Spearman's rank correlation coefficient in this case.
(2)
2. A lake contains 3 species of fish. There are estimated to be 1400 trout, 600 bass and 450 pike in the lake. A survey of the health of the fish in the lake is carried out and a sample of 30 fish is chosen.
- Give a reason why stratified random sampling cannot be used.
- State an appropriate sampling method for the survey.
- Give one advantage and one disadvantage of this sampling method.
- Explain how this sampling method could be used to select the sample of 30 fish. You must show your working.
(4)
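As a rough illustration of the working asked for in the last part, the sketch below assumes the method chosen is quota sampling, with quotas set in proportion to the estimated number of each species. Both the choice of method and the variable names are assumptions made for illustration.

```python
# Estimated numbers of each species in the lake (from the question).
estimated = {"trout": 1400, "bass": 600, "pike": 450}
sample_size = 30

total = sum(estimated.values())  # 2450 fish in all

# Assumption: quotas are proportional to the estimated populations,
# rounded to the nearest whole fish.
quotas = {species: round(sample_size * count / total)
          for species, count in estimated.items()}

print(quotas)                # {'trout': 17, 'bass': 7, 'pike': 6}
print(sum(quotas.values()))  # 30
```

Fish would then be caught and examined until each quota is filled, with any further fish of a completed species ignored.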
3. (a) Explain what you understand by the Central Limit Theorem.
A garage services hire cars on behalf of a hire company. The garage knows that the lifetime of the brake pads has a standard deviation of 5000 miles. The garage records the lifetimes, \(x\) miles, of the brake pads it has replaced. The garage takes a random sample of 100 brake pads and finds that \(\sum x = 1740000\).
- Find a 95\% confidence interval for the mean lifetime of a brake pad.
- Explain the relevance of the Central Limit Theorem in part (b).
Brake pads are made to be changed every 20000 miles on average. The hire car company complain that the garage is changing the brake pads too soon.
- Comment on the hire company's complaint. Give a reason for your answer.
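The confidence interval in this question can be checked with a short sketch such as the one below. It assumes that the sample mean is approximately normal (which is where the Central Limit Theorem comes in) and that the known standard deviation of 5000 miles applies. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 100
sum_x = 1_740_000
sigma = 5000  # known population standard deviation (miles)

x_bar = sum_x / n             # sample mean lifetime
std_error = sigma / sqrt(n)   # standard error of the mean

# Two-sided 95% interval uses the 97.5th percentile of N(0, 1)
z = NormalDist().inv_cdf(0.975)

lower = x_bar - z * std_error
upper = x_bar + z * std_error
print(round(lower), round(upper))
```

The resulting interval can then be compared with the 20000-mile figure quoted in the complaint.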
4. Two breeds of chicken are surveyed to measure their egg yield. The results are shown in the table below.
- Find, to 3 decimal places, Spearman's rank correlation coefficient between the population and the number of council employees.
- Use your value of Spearman's rank correlation coefficient to test for evidence of a positive correlation between the population and the number of council employees. Use a \(2.5 \%\) significance level. State your hypotheses clearly.
It is suggested that a product moment correlation coefficient would be a more suitable calculation in this case. The product moment correlation coefficient for these data is 0.627 to 3 decimal places.
- Use the value of the product moment correlation coefficient to test for evidence of a positive correlation between the population and the number of council employees. Use a \(2.5 \%\) significance level.
- Interpret and comment on your results from part (b) and part (c).
4. John thinks that a person's eye colour is related to their hair colour. He takes a random sample of 600 people and records their eye and hair colours. The results are shown in Table 1.
Using a \(5 \%\) level of significance, test whether or not there is an association between cholesterol level and intake of saturated fats. State your hypotheses and show your working clearly.
2. The table below shows the number of students per member of staff and the student satisfaction scores for 7 universities.
- Calculate Spearman's rank correlation coefficient for these data.
The journalist believes that car models with higher fuel efficiency will achieve higher sales.
- Stating your hypotheses clearly, test whether or not the data support the journalist's belief. Use a \(5 \%\) level of significance.
- State the assumption necessary for a product moment correlation coefficient to be valid in this case.
(1)
- The mean and median fuel efficiencies of the car models in the random sample are \(14.5 \mathrm {~km} /\) litre and \(15.65 \mathrm {~km} /\) litre respectively. Considering these statistics, as well as the distribution of the fuel efficiency data, state whether or not the data suggest that the assumption in part (c) might be true in this case. Give a reason for your answer.
(No further calculations are required.)
2. A survey asked a random sample of 200 people their age and the main use of their mobile phone.
The results are shown in Table 1 below.
Stating your hypotheses, test at the \(5 \%\) level of significance, whether or not there is evidence of an association between happiness and gender. Show your working clearly.
4. The random variable \(A\) is defined as
$$A = B + 4 C - 3 D$$
where \(B\), \(C\) and \(D\) are independent random variables with
$$B \sim \mathrm {~N} \left( 6,2 ^ { 2 } \right) \quad C \sim \mathrm {~N} \left( 7,3 ^ { 2 } \right) \quad D \sim \mathrm {~N} \left( 4,1.5 ^ { 2 } \right)$$
Find \(\mathrm { P } ( A < 45 )\).
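A minimal sketch of the calculation is given below. It uses the standard result that a linear combination of independent normal variables is normal, with the means combined linearly and the variances scaled by the squared coefficients. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# (mean, standard deviation) of B, C and D, with their coefficients in A
mean_b, sd_b = 6, 2
mean_c, sd_c = 7, 3
mean_d, sd_d = 4, 1.5

# A = B + 4C - 3D
mean_a = mean_b + 4 * mean_c - 3 * mean_d
# Variances add, each scaled by the square of its coefficient
var_a = sd_b ** 2 + (4 ** 2) * sd_c ** 2 + (3 ** 2) * sd_d ** 2

p_less = NormalDist(mean_a, sqrt(var_a)).cdf(45)
print(mean_a, var_a, round(p_less, 4))
```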
5. A research station is doing some work on the germination of a new variety of genetically modified wheat.
They planted 120 rows containing 7 seeds in each row.
The number of seeds germinating in each row was recorded. The results are as follows
Starting with the top left-hand corner (319) and working across, the committee selects 50 random numbers. The first 2 suitable numbers are 241 and 278. Numbers greater than 300 are ignored.
- Find the next two suitable numbers.
When the club's committee looks at the members corresponding to their random numbers they find that only 1 female has been selected.
The committee does not want to be accused of being biased towards males so considers using a systematic sample instead.
- Explain clearly how the committee could take a systematic sample.
- Explain why a systematic sample may not give a sample that represents the proportion of males and females in the club.
The committee decides to use a stratified sample instead.
- Describe how to choose members for the stratified sample.
- Explain an advantage of using a stratified sample rather than a quota sample.
2. The random variable \(X\) follows a continuous uniform distribution over the interval \([ \alpha - 3,2 \alpha + 3 ]\) where \(\alpha\) is a constant.
The mean of a random sample of size \(n\) is denoted by \(\bar { X }\).
- Show that \(\bar { X }\) is a biased estimator of \(\alpha\), and state the bias.
Given that \(Y = k \bar { X }\) is an unbiased estimator for \(\alpha\),
- find the value of \(k\).
A random sample of 10 values of \(X\) is taken and the results are as follows
$$\begin{array} { l l l l l l l l l l }
3 & 5 & 8 & 12 & 4 & 13 & 10 & 8 & 5 & 12
\end{array}$$
- Hence estimate the maximum value of \(X\).
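The sketch below follows from the mean of a continuous uniform distribution on \([a, b]\) being \((a + b)/2\), which here gives \(\mathrm{E}(X) = \tfrac{3}{2}\alpha\). It is one possible route through the numerical part, and the variable names are illustrative.

```python
sample = [3, 5, 8, 12, 4, 13, 10, 8, 5, 12]
x_bar = sum(sample) / len(sample)

# E(X) = ((alpha - 3) + (2*alpha + 3)) / 2 = 1.5 * alpha,
# so E(X-bar) = 1.5 * alpha and the bias of X-bar is 0.5 * alpha.
# k * X-bar is therefore unbiased for alpha when k = 2/3.
k = 2 / 3
alpha_hat = k * x_bar

# The upper end of the interval is 2*alpha + 3
max_estimate = 2 * alpha_hat + 3
print(x_bar, round(alpha_hat, 3), round(max_estimate, 3))
```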
3. A grocer believes that the average weight of a grapefruit from farm \(A\) is greater than the average weight of a grapefruit from farm \(B\). The weights, in grams, of 80 grapefruit selected at random from farm \(A\) have a mean value of 532 g and a standard deviation, \(s _ { A }\), of 35 g. A random sample of 100 grapefruit from farm \(B\) have a mean weight of 520 g and a standard deviation, \(s _ { B }\), of 28 g.
Stating your hypotheses clearly and using a \(1 \%\) level of significance, test whether or not the grocer's belief is supported by the data.
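A sketch of the test statistic for this question follows. It assumes a one-tailed, two-sample \(z\)-test with \(H_0: \mu_A = \mu_B\) against \(H_1: \mu_A > \mu_B\), and that both samples are large enough for the sample standard deviations to stand in for the population values. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n_a, mean_a, s_a = 80, 532, 35
n_b, mean_b, s_b = 100, 520, 28

# Standard error of the difference between the two sample means
std_error = sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
z = (mean_a - mean_b) / std_error

# One-tailed critical value at the 1% level
critical = NormalDist().inv_cdf(0.99)

print(round(z, 3), round(critical, 4))
print("reject H0" if z > critical else "do not reject H0")
```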
4. In a survey 10 randomly selected men had their systolic blood pressure, \(x\), and weight, \(w\), measured. Their results are as follows:
| Man | \(\boldsymbol { A }\) | \(\boldsymbol { B }\) | \(\boldsymbol { C }\) | \(\boldsymbol { D }\) | \(\boldsymbol { E }\) | \(\boldsymbol { F }\) | \(\boldsymbol { G }\) | \(\boldsymbol { H }\) | \(\boldsymbol { I }\) | \(\boldsymbol { J }\) |
| \(x\) | 123 | 128 | 137 | 143 | 149 | 153 | 154 | 159 | 162 | 168 |
| \(w\) | 78 | 93 | 85 | 83 | 75 | 98 | 88 | 87 | 95 | 99 |
- Calculate the value of Spearman's rank correlation coefficient between \(x\) and \(w\).
- Stating your hypotheses clearly, test at the \(5 \%\) level of significance, whether or not there is evidence of a positive correlation between systolic blood pressure and weight.
The product moment correlation coefficient for these data is 0.5114.
- Use the value of the product moment correlation coefficient to test, at the \(5 \%\) level of significance, whether or not there is evidence of a positive correlation between systolic blood pressure and weight.
- Using your conclusions to part (b) and part (c), describe the relationship between systolic blood pressure and weight.
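The rank correlation for the data in the table can be sketched as below, using \(r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}\). The helper `ranks` is a hypothetical convenience function that assumes there are no tied values, which holds for these data.

```python
x = [123, 128, 137, 143, 149, 153, 154, 159, 162, 168]
w = [78, 93, 85, 83, 75, 98, 88, 87, 95, 99]

def ranks(values):
    """Rank from 1 (smallest) upwards; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

rank_x, rank_w = ranks(x), ranks(w)
n = len(x)

# Sum of squared differences between paired ranks
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_w))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rank_w, sum_d2, round(r_s, 4))
```

The value of `r_s` would then be compared with the tabulated critical value for \(n = 10\) in a one-tailed test.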
5. A random sample of 200 people were asked which hot drink they preferred from tea, coffee and hot chocolate. The results are given below.
| | Type of drink preferred | | | |
| Gender | Tea | Coffee | Hot Chocolate | Total |
| Males | 57 | 26 | 11 | 94 |
| Females | 42 | 47 | 17 | 106 |
| Total | 99 | 73 | 28 | 200 |
- Test, at the \(5 \%\) significance level, whether or not there is an association between type of drink preferred and gender. State your hypotheses and show your working clearly. You should state your expected frequencies to 2 decimal places.
- State what difference using a \(0.5 \%\) significance level would make to your conclusion. Give a reason for your answer.
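The expected frequencies and test statistic for the table above can be sketched as follows, with each expected frequency taken as (row total × column total) / grand total. The critical values are quoted from standard \(\chi^2\) tables for 2 degrees of freedom rather than computed, and the variable names are illustrative.

```python
observed = [[57, 26, 11],   # males: tea, coffee, hot chocolate
            [42, 47, 17]]   # females

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical values for 2 degrees of freedom, from standard tables
critical_5pct = 5.991
critical_half_pct = 10.597

print([[round(e, 2) for e in row] for row in expected])
print(round(chi_sq, 2), dof)
```

Comparing `chi_sq` with each critical value in turn shows how the conclusion depends on the significance level.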
6. Eight tasks were given to each of 125 randomly selected job applicants. The number of tasks failed by each applicant is recorded.
The results are as follows:
| Number of tasks failed by an applicant | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more |
| Frequency | 2 | 21 | 45 | 42 | 12 | 3 | 0 |
- Show that the probability of a randomly selected task, from this sample, being failed is 0.3 .
An employer believes that a binomial distribution might provide a good model for the number of tasks, out of 8, that an applicant fails.
He uses a binomial distribution, with the estimated probability 0.3 of a task being failed. The calculated expected frequencies are as follows
| Number of tasks failed by an applicant | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more |
| Expected frequency | 7.21 | 24.71 | 37.06 | \(r\) | 17.02 | 5.83 | \(s\) |
- Find the value of \(r\) and the value of \(s\) giving your answers to 2 decimal places.
- Test, at the \(5 \%\) level of significance, whether or not a binomial distribution is a suitable model for these data. State your hypotheses and show your working clearly.
The employer believes that all applicants have the same probability of failing each task.
- Use your result from part (c) to comment on this belief.
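The estimated probability and the two missing expected frequencies can be sketched as below. The sketch takes the final column of the table to be "6 or more" failures, so that the categories cover every outcome; that reading, and the variable names, are assumptions.

```python
from math import comb

n_tasks, n_applicants = 8, 125
# Observed frequencies for 0 to 5 failures; the final column has frequency 0
observed = {0: 2, 1: 21, 2: 45, 3: 42, 4: 12, 5: 3}

# Proportion of all tasks attempted that were failed
total_failed = sum(k * f for k, f in observed.items())
p = total_failed / (n_tasks * n_applicants)

def binom_pmf(k):
    """P(exactly k failures) under B(8, p)."""
    return comb(n_tasks, k) * p ** k * (1 - p) ** (n_tasks - k)

r = n_applicants * binom_pmf(3)
# Final category pools 6, 7 and 8 failures
s = n_applicants * (1 - sum(binom_pmf(k) for k in range(6)))

print(p, round(r, 2), round(s, 2))
```

Finding \(s\) by subtracting the rounded table entries from 125 gives a value that differs in the second decimal place, so either approach should be stated clearly in the working.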
7. The random variable \(X\) is defined as
$$X = 4 Y - 3 W$$
where \(Y \sim \mathrm {~N} \left( 40,3 ^ { 2 } \right) , W \sim \mathrm {~N} \left( 50,2 ^ { 2 } \right)\) and \(Y\) and \(W\) are independent.
- Find \(\mathrm { P } ( X > 25 )\).
The random variables \(Y _ { 1 } , Y _ { 2 }\) and \(Y _ { 3 }\) are independent and each has the same distribution as \(Y\). The random variable \(A\) is defined as
$$A = \sum _ { i = 1 } ^ { 3 } Y _ { i }$$
The random variable \(C\) is such that \(C \sim \mathrm {~N} \left( 115 , \sigma ^ { 2 } \right)\).
Given that \(\mathrm { P } ( A - C < 0 ) = 0.2\) and that \(A\) and \(C\) are independent,
- find the variance of \(C\).
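Both parts of this question can be checked with the sketch below, which uses the rules for linear combinations of independent normal variables. For the second part, \(A - C \sim \mathrm{N}(120 - 115,\ 27 + \sigma^2)\), and the condition \(\mathrm{P}(A - C < 0) = 0.2\) is solved for \(\sigma^2\). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mean_y, sd_y = 40, 3
mean_w, sd_w = 50, 2

# X = 4Y - 3W
mean_x = 4 * mean_y - 3 * mean_w
var_x = 16 * sd_y ** 2 + 9 * sd_w ** 2
p_x_gt_25 = 1 - NormalDist(mean_x, sqrt(var_x)).cdf(25)

# A = Y1 + Y2 + Y3, a sum of three independent copies of Y
mean_a = 3 * mean_y
var_a = 3 * sd_y ** 2

# P(A - C < 0) = 0.2  =>  (0 - mean_diff) / sd_diff = z_0.2
mean_diff = mean_a - 115
z = NormalDist().inv_cdf(0.2)      # negative, about -0.8416
sd_diff = (0 - mean_diff) / z
var_c = sd_diff ** 2 - var_a

print(round(p_x_gt_25, 4), round(var_c, 2))
```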