Questions S3 - A-Level Maths

Edexcel S3 2006 June Q2

Write down the approximate distribution of the sample mean height. Give a reason for your answer.
Hence find the probability that the sample mean height is at least 91 cm.
A biologist investigated whether or not the diet of chickens influenced the amount of cholesterol in their eggs. The cholesterol content of 70 eggs selected at random from chickens fed diet $A$ had a mean value of 198 mg and a standard deviation of 47 mg. A random sample of 90 eggs from chickens fed diet $B$ had a mean cholesterol content of 201 mg and a standard deviation of 23 mg.
Stating your hypotheses clearly and using a $5 \%$ level of significance, test whether or not there is a difference between the mean cholesterol content of eggs laid by chickens fed on these two diets.
State, in the context of this question, an assumption you have made in carrying out the test in part (a).
The table below shows the price of an ice cream and the distance of the shop where it was purchased from a particular tourist attraction.
Shop	Distance from tourist attraction (m)	Price (£)
A	50	1.75
B	175	1.20
C	270	2.00
D	375	1.05
E	425	0.95
F	580	1.25
G	710	0.80
H	790	0.75
I	890	1.00
J	980	0.85
Find, to 3 decimal places, the Spearman rank correlation coefficient between the distance of the shop from the tourist attraction and the price of an ice cream.
Stating your hypotheses clearly and using a $5 \%$ one-tailed test, interpret your rank correlation coefficient.
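
A minimal sketch of how the rank correlation for the ice cream data could be checked, assuming no tied values (so the $1 - 6 \sum d^2 / (n(n^2 - 1))$ form applies), assuming the one-tailed alternative is a negative correlation, and taking the critical value 0.5636 for $n = 10$ from standard statistical tables. The helper names are illustrative only.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


distance = [50, 175, 270, 375, 425, 580, 710, 790, 890, 980]
price = [1.75, 1.20, 2.00, 1.05, 0.95, 1.25, 0.80, 0.75, 1.00, 0.85]

r_s = spearman(distance, price)

# Assumed table value: 5% one-tailed critical value for n = 10.
CRITICAL_5PCT_ONE_TAIL_N10 = 0.5636
# Assumed hypotheses: H0: rho = 0 against H1: rho < 0.
reject_h0 = r_s < -CRITICAL_5PCT_ONE_TAIL_N10

print(round(r_s, 3), reject_h0)
```

Under these assumptions the coefficient lies in the critical region, which would be read as evidence that price tends to fall as distance from the attraction increases.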

Edexcel S3 Q1

1. A hotel has 160 rooms of which 20 are classified as De-luxe, 40 Premier and 100 as Standard. The manager wants to obtain information about room usage in the hotel by taking a $10 \%$ sample of the rooms.
(a) Suggest a suitable sampling method.
(b) Explain in detail how the manager should obtain the sample.
2. A random sample of 100 classical CDs produced by a record company had a mean playing time of 70.6 minutes and a standard deviation of 9.1 minutes. An independent random sample of 120 CDs produced by a different company had a mean playing time of 67.2 minutes with a standard deviation of 8.4 minutes.
(a) Using a $1 \%$ level of significance, test whether or not there is a difference in the mean playing times of the CDs produced by these two companies. State your hypotheses clearly.
(b) State an assumption you made in carrying out the test in part (a).
3. The weights of a group of males are normally distributed with mean 80 kg and standard deviation 2.6 kg. A random sample of 10 of these males is selected.
(a) Write down the distribution of $\bar { M }$, the mean weight, in kg, of this sample.
(b) Find $\mathrm { P } ( \bar { M } < 78.5 )$.
The weights of a group of females are normally distributed with mean 59 kg and standard deviation 1.9 kg. A random sample of 6 of the males and 4 of the females enters a lift that can carry a maximum load of 730 kg.
(c) Find the probability that the maximum load will be exceeded when these 10 people enter the lift.
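
The two normal-distribution calculations in the weights question can be sketched as follows, assuming all ten weights are independent. The sample mean of 10 males has variance $2.6^2 / 10$, while the lift total adds one variance per person. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sample mean of 10 male weights: N(80, 2.6^2 / 10).
mean_dist = NormalDist(mu=80, sigma=2.6 / sqrt(10))
p_mean_below = mean_dist.cdf(78.5)

# Total weight of 6 males and 4 females (independence assumed):
# means add, and each person contributes one variance term.
total_mu = 6 * 80 + 4 * 59
total_var = 6 * 2.6 ** 2 + 4 * 1.9 ** 2
total_dist = NormalDist(mu=total_mu, sigma=sqrt(total_var))
p_overload = 1 - total_dist.cdf(730)

print(round(p_mean_below, 4), total_mu, round(total_var, 2), round(p_overload, 4))
```

Note that the total uses $6 \times 2.6^2$, not $6^2 \times 2.6^2$, because it is a sum of six separate independent weights rather than six times one weight.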
4. At the end of a season an athletics coach graded a random sample of ten athletes according to their performances throughout the season and their dedication to training. The results, expressed as percentages, are shown in the table below.
Athlete	Performance	Dedication
A	86	72
B	60	69
C	78	59
D	56	68
E	80	80
F	66	84
G	31	65
H	59	55
I	73	79
J	49	53
Calculate the Spearman rank correlation coefficient between performance and dedication.
Stating clearly your hypotheses and using a $10 \%$ level of significance, interpret your rank correlation coefficient.
Give a reason to support the use of the rank correlation coefficient rather than the product moment correlation coefficient with these data.

Edexcel S3 Q5

5. The manager of a leisure centre collected data on the usage of the facilities in the centre by its members. A random sample from her records is summarised below.

Facility	Male	Female
Pool	40	68
Jacuzzi	26	33
Gym	52	31

Making your method clear, test whether or not there is any evidence of an association between gender and use of the club facilities. State your hypotheses clearly and use a $5 \%$ level of significance.
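
One way to check the arithmetic of the test of association is sketched below. It computes expected frequencies as (row total × column total) / grand total and sums $(O - E)^2 / E$. The 5% critical value 5.991 for 2 degrees of freedom is an assumed value read from standard tables.

```python
observed = {"Pool": (40, 68), "Jacuzzi": (26, 33), "Gym": (52, 31)}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
grand_total = sum(row_totals)

chi2 = 0.0
for row, row_total in zip(rows, row_totals):
    for obs, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        chi2 += (obs - expected) ** 2 / expected

# Degrees of freedom for an r x c table: (r - 1)(c - 1).
df = (len(rows) - 1) * (len(col_totals) - 1)

CRITICAL_5PCT_DF2 = 5.991  # assumed table value
reject_h0 = chi2 > CRITICAL_5PCT_DF2

print(round(chi2, 2), df, reject_h0)
```

Here H0 is taken to be "no association between gender and facility used"; a statistic above the critical value would lead to rejecting H0.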

Edexcel S3 Q6

6. Data were collected on the number of female puppies born in 200 litters of size 8. It was decided to test whether or not a binomial model with parameters $n = 8$ and $p = 0.5$ is a suitable model for these data. The following table shows the observed frequencies and the expected frequencies, to 2 decimal places, obtained in order to carry out this test.

Number of females	Observed number of litters	Expected number of litters
0	1	0.78
1	9	6.25
2	27	21.88
3	46	$R$
4	49	$S$
5	35	$T$
6	26	21.88
7	5	6.25
8	2	0.78

Find the values of $R , S$ and $T$.
Carry out the test to determine whether or not this binomial model is a suitable one. State your hypotheses clearly and use a $5 \%$ level of significance.
An alternative test might have involved estimating $p$ rather than assuming $p = 0.5$.
Explain how this would have affected the test.
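
A hedged sketch of the goodness-of-fit calculation for the puppy data follows. It assumes the usual convention of pooling classes until every expected frequency is at least 5 (here the two classes at each end), and takes the 5% critical value 12.592 for 6 degrees of freedom from standard tables.

```python
from math import comb

N_LITTERS, n, p = 200, 8, 0.5
observed = [1, 9, 27, 46, 49, 35, 26, 5, 2]

# Expected frequencies under B(8, 0.5).
expected = [N_LITTERS * comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
R, S, T = expected[3], expected[4], expected[5]

# Pool 0 with 1 and 7 with 8 so that every expected frequency is at least 5.
obs_pooled = [observed[0] + observed[1]] + observed[2:7] + [observed[7] + observed[8]]
exp_pooled = [expected[0] + expected[1]] + expected[2:7] + [expected[7] + expected[8]]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))

# No parameter is estimated from the data, so df = classes - 1.
df = len(obs_pooled) - 1
CRITICAL_5PCT_DF6 = 12.592  # assumed table value
reject_h0 = chi2 > CRITICAL_5PCT_DF6

print(R, S, T, round(chi2, 2), df, reject_h0)
```

If $p$ were estimated from the data instead, one further degree of freedom would be lost (5 with this pooling), and the expected frequencies would be recalculated with the estimate.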

Edexcel S3 Q7

7. The weights of tubs of margarine are known to be normally distributed. A random sample of 10 tubs of margarine were weighed, to the nearest gram, and the results were as follows. $$\begin{array} { l l l l l l l l l l } 498 & 502 & 500 & 496 & 509 & 504 & 511 & 497 & 506 & 499 \end{array}$$

Find unbiased estimates of the mean and the variance of the population from which this sample was taken.
Given that the population standard deviation is 5.0 g,
estimate limits, to 2 decimal places, between which $90 \%$ of the weights of the tubs lie,
find a $95 \%$ confidence interval for the mean weight of the tubs.
A second random sample of 15 tubs was found to have a mean weight of 501.9 g.
Stating your hypotheses clearly and using a $1 \%$ level of significance, test whether or not the mean weight of these tubs is greater than 500 g.

END

Edexcel GCE Statistics S3, Advanced/Advanced Subsidiary
Paper Reference(s): 6685
Thursday 5 June 2003 - Morning
Time: 1 hour 30 minutes
Materials required for examination: Answer Book (AB16), Graph Paper (ASG2), Mathematical Formulae (Lilac)
Items included with question papers: Nil

Candidates may use any calculator EXCEPT those with the facility for symbolic algebra, differentiation and/or integration. Thus candidates may NOT use calculators such as the Texas Instruments TI 89, TI 92, Casio CFX 9970G, Hewlett Packard HP 48G.
In the boxes on the answer book, write the name of the examining body (Edexcel), your centre number, candidate number, the unit title (Statistics S3), the paper reference (6685), your surname, other name and signature.
Values from the statistical tables should be quoted in full. When a calculator is used, the answer should be given to an appropriate degree of accuracy. A booklet 'Mathematical Formulae and Statistical Tables' is provided.
Full marks may be obtained for answers to ALL questions.
This paper has seven questions. You must ensure that your answers to parts of questions are clearly labelled.
You must show sufficient working to make your methods clear to the Examiner. Answers without working may gain no credit.
1. Explain how to obtain a sample from a population using
(a) stratified sampling,
(b) quota sampling.
Give one advantage and one disadvantage of each sampling method.
2. A random sample of 30 apples was taken from a batch. The mean weight of the sample was 124 g with standard deviation 20 g.
Find a $99 \%$ confidence interval for the mean weight $\mu$ grams of the population of apples.
Write down any assumptions you made in your calculations.
Given that the actual value of $\mu$ is 140,
state, with a reason, what you can conclude about the sample of 30 apples.
3. Given the random variables $X \sim \mathrm {~N} ( 20,5 )$ and $Y \sim \mathrm {~N} ( 10,4 )$ where $X$ and $Y$ are independent, find
$\mathrm { E } ( X - Y )$,
$\operatorname { Var } ( X - Y )$,
$\mathrm { P } ( 13 < X - Y < 16 )$.
4. A new drug to treat the common cold was used with a randomly selected group of 100 volunteers. Each was given the drug and their health was monitored to see if they caught a cold. A randomly selected control group of 100 volunteers was treated with a dummy pill. The results are shown in the table below.
Write down a suitable model for $X$.
Test, at the $1 \%$ level of significance, the suitability of your model for these data.
Explain how the test would have been modified if it had not been assumed that the dice were fair.
7. The random variable $D$ is defined as $$D = A - 3 B + 4 C$$ where $A \sim \mathrm {~N} \left( 5,2 ^ { 2 } \right) , B \sim \mathrm {~N} \left( 7,3 ^ { 2 } \right)$ and $C \sim \mathrm {~N} \left( 9,4 ^ { 2 } \right)$, and $A , B$ and $C$ are independent.
Find $\mathrm { P } ( D < 44 )$.
The random variables $B _ { 1 } , B _ { 2 }$ and $B _ { 3 }$ are independent and each has the same distribution as $B$. The random variable $X$ is defined as $$X = A - \sum _ { i = 1 } ^ { 3 } B _ { i } + 4 C$$
Find $\mathrm { P } ( X > 0 )$.

END

Edexcel GCE Statistics S3
Paper Reference(s): 6685/01, 6691/01
Thursday 9 June 2005 - Morning
Materials required for examination: Mathematical Formulae (Lilac), Graph Paper (ASG2)
Items included with question papers: Nil
This paper has seven questions. The total mark for this paper is 75.
1. A researcher carried out a survey of three treatments for a fruit tree disease. The contingency table below shows the results of a survey of a random sample of 60 diseased trees.
Using a $5 \%$ significance level, test whether or not there is an association between gender and acceptance or rejection of an annual flu injection. State your hypotheses clearly.
5. Upon entering a school, a random sample of eight girls and an independent random sample of eighty boys were given the same examination in mathematics. The girls and boys were then taught in separate classes. After one year, they were all given another common examination in mathematics. The means and standard deviations of the boys' and the girls' marks are shown in the table.
5. The workers in a large office block use a lift that can carry a maximum load of 1090 kg. The weights of the male workers are normally distributed with mean 78.5 kg and standard deviation 12.6 kg. The weights of the female workers are normally distributed with mean 62.0 kg and standard deviation 9.8 kg. Random samples of 7 males and 8 females can enter the lift.
(a) Find the mean and variance of the total weight of the 15 people that enter the lift.
(b) Comment on any relationship you have assumed in part (a) between the two samples.
(c) Find the probability that the maximum load of the lift will be exceeded by the total weight of the 15 people.
6. A research worker studying colour preference and the age of a random sample of 50 children obtained the results shown below.
Age in years	Red	Blue	Totals
4	12	6	18
8	10	7	17
12	6	9	15
Totals	28	22	50
Using a $5 \%$ significance level, carry out a test to decide whether or not there is an association between age and colour preference. State your hypotheses clearly.
7. A machine produces metal containers. The weights of the containers are normally distributed. A random sample of 10 containers from the production line was weighed, to the nearest 0.1 kg, and gave the following results $$\begin{array} { l l l l l } 49.7 , & 50.3 , & 51.0 , & 49.5 , & 49.9 , \\ 50.1 , & 50.2 , & 50.0 , & 49.6 , & 49.7 . \end{array}$$
Find unbiased estimates of the mean and variance of the weights of the population of metal containers.
The machine is set to produce metal containers whose weights have a population standard deviation of 0.5 kg.
Estimate the limits between which $95 \%$ of the weights of metal containers lie.
Determine the $99 \%$ confidence interval for the mean weight of metal containers.
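
The three calculations for the metal containers can be sketched as below. This is a minimal illustration, assuming the sample mean is used as the centre of the 95% limits and that the stated population standard deviation of 0.5 kg (not the sample estimate) is used for both intervals.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

weights = [49.7, 50.3, 51.0, 49.5, 49.9, 50.1, 50.2, 50.0, 49.6, 49.7]

x_bar = mean(weights)     # unbiased estimate of the population mean
s2 = variance(weights)    # unbiased estimate of the variance (divisor n - 1)

SIGMA = 0.5               # population standard deviation given in the question
z95 = NormalDist().inv_cdf(0.975)
z99 = NormalDist().inv_cdf(0.995)

# Limits containing 95% of individual weights: no division by sqrt(n).
limits_95 = (x_bar - z95 * SIGMA, x_bar + z95 * SIGMA)

# 99% confidence interval for the mean: standard error is sigma / sqrt(n).
half_width = z99 * SIGMA / sqrt(len(weights))
ci_99 = (x_bar - half_width, x_bar + half_width)

print(round(x_bar, 2), round(s2, 4), limits_95, ci_99)
```

The contrast between the two intervals is the point of the question: limits for individual weights use $\sigma$, whereas the interval for the mean uses $\sigma / \sqrt{n}$.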

Edexcel S3 Q8

8. Five coins were tossed 100 times and the number of heads recorded. The results are shown in the table below.

Calculate Spearman's rank correlation coefficient for the marks awarded by the two judges.
After the show, one competitor complained about the judges. She claimed that there was no positive correlation between their marks.
Stating your hypotheses clearly, test whether or not this sample provides support for the competitor's claim. Use a $5 \%$ level of significance.
(4)
2. The Director of Studies at a large college believed that students' grades in Mathematics were independent of their grades in English. She examined the results of a random group of candidates who had studied both subjects and she recorded the number of candidates in each of the 6 categories shown. Showing your working clearly, test, at the $1 \%$ level of significance, whether or not there is an association between gender and the type of course taken. State your hypotheses clearly.
3. The product moment correlation coefficient is denoted by $r$ and Spearman's rank correlation coefficient is denoted by $r _ { s }$.
Sketch separate scatter diagrams, with five points on each diagram, to show
1. $r = 1$,
2. $r _ { s } = - 1$ but $r > - 1$.
Two judges rank seven collie dogs in a competition. The collie dogs are labelled $A$ to $G$ and the rankings are as follows.
Calculate Spearman's rank correlation coefficient for these data.
Stating your hypotheses clearly and using a one tailed test with a $5 \%$ level of significance, interpret your rank correlation coefficient.
Give a reason to support the use of the rank correlation coefficient rather than the product moment correlation coefficient with these data.
(1)
4. A sample of size 8 is to be taken from a population that is normally distributed with mean 55 and standard deviation 3. Find the probability that the sample mean will be greater than 57.
(5)
5. The number of goals scored by a football team is recorded for 100 games. The results are summarised in Table 1 below.
Calculate Spearman's rank correlation coefficient between $b$ and $s$.
Stating your hypotheses clearly, test whether or not the data provides support for the researcher's claim. Use a $1 \%$ level of significance.
(4)
5. A random sample of 100 people were asked if their finances were worse, the same or better than this time last year. The sample was split according to their annual income and the results are shown in the table below.
Calculate Spearman's rank correlation coefficient between $h$ and $c$.
After collecting the data, the councillor thinks there is no correlation between hardship and the number of calls to the emergency services.
Test, at the $5 \%$ level of significance, the councillor's claim. State your hypotheses clearly.
3. A factory manufactures batches of an electronic component. Each component is manufactured in one of three shifts. A component may have one of two types of defect, $D _ { 1 }$ or $D _ { 2 }$, at the end of the manufacturing process. A production manager believes that the type of defect is dependent upon the shift that manufactured the component. He examines 200 randomly selected defective components and classifies them by defect type and shift. The results are shown in the table below.
Calculate Spearman's rank correlation coefficient for these data.
Test, at the $5 \%$ level of significance, whether there is agreement between the rankings awarded by each manager. State your hypotheses clearly.
Manager $Y$ later discovered he had miscopied his score for candidate $D$ and it should be 54.
Without carrying out any further calculations, explain how you would calculate Spearman rank correlation in this case.
(2)
2. A lake contains 3 species of fish. There are estimated to be 1400 trout, 600 bass and 450 pike in the lake. A survey of the health of the fish in the lake is carried out and a sample of 30 fish is chosen.
Give a reason why stratified random sampling cannot be used.
State an appropriate sampling method for the survey.
Give one advantage and one disadvantage of this sampling method.
Explain how this sampling method could be used to select the sample of 30 fish. You must show your working.
(4)
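
As an illustration of the working asked for, the sketch below assumes the intended method is quota sampling with quotas proportional to the estimated number of each species; the question itself does not state this, so treat it as an assumption.

```python
population = {"trout": 1400, "bass": 600, "pike": 450}
SAMPLE_SIZE = 30

total = sum(population.values())

# Quota for each species: its share of the estimated population, rounded.
# Rounding does not always sum to the sample size; it does for these figures.
quotas = {species: round(SAMPLE_SIZE * count / total)
          for species, count in population.items()}

print(total, quotas, sum(quotas.values()))
```

Fish would then be caught and examined until each quota is filled, with any further fish of a species whose quota is already full being ignored.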
3. (a) Explain what you understand by the Central Limit Theorem.
A garage services hire cars on behalf of a hire company. The garage knows that the lifetime of the brake pads has a standard deviation of 5000 miles. The garage records the lifetimes, $x$ miles, of the brake pads it has replaced. The garage takes a random sample of 100 brake pads and finds that $\sum x = 1740000$.
(b) Find a $95 \%$ confidence interval for the mean lifetime of a brake pad.
(c) Explain the relevance of the Central Limit Theorem in part (b).
Brake pads are made to be changed every 20000 miles on average. The hire car company complain that the garage is changing the brake pads too soon.
(d) Comment on the hire company's complaint. Give a reason for your answer.
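
A minimal sketch of the brake pad interval, assuming the sample of 100 is large enough for the Central Limit Theorem to make the sample mean approximately normal whatever the distribution of lifetimes:

```python
from math import sqrt
from statistics import NormalDist

n = 100
sum_x = 1_740_000
SIGMA = 5000              # known standard deviation of lifetimes, in miles

x_bar = sum_x / n
z = NormalDist().inv_cdf(0.975)
half_width = z * SIGMA / sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)

# Design figure quoted in the question: a change every 20000 miles on average.
DESIGN_MEAN = 20_000
design_inside = ci[0] <= DESIGN_MEAN <= ci[1]

print(x_bar, ci, design_inside)
```

Because 20000 lies above the upper limit of the interval, the data would appear to support the hire company's complaint.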
4. Two breeds of chicken are surveyed to measure their egg yield. The results are shown in the table below.
Find, to 3 decimal places, Spearman's rank correlation coefficient between the population and the number of council employees.
Use your value of Spearman's rank correlation coefficient to test for evidence of a positive correlation between the population and the number of council employees. Use a $2.5 \%$ significance level. State your hypotheses clearly.
It is suggested that a product moment correlation coefficient would be a more suitable calculation in this case. The product moment correlation coefficient for these data is 0.627 to 3 decimal places.
Use the value of the product moment correlation coefficient to test for evidence of a positive correlation between the population and the number of council employees. Use a $2.5 \%$ significance level.
Interpret and comment on your results from part (b) and part (c).
4. John thinks that a person's eye colour is related to their hair colour. He takes a random sample of 600 people and records their eye and hair colours. The results are shown in Table 1.
Using a $5 \%$ level of significance, test whether or not there is an association between cholesterol level and intake of saturated fats. State your hypotheses and show your working clearly.
2. The table below shows the number of students per member of staff and the student satisfaction scores for 7 universities.
Calculate Spearman's rank correlation coefficient for these data.
The journalist believes that car models with higher fuel efficiency will achieve higher sales.
Stating your hypotheses clearly, test whether or not the data support the journalist's belief. Use a $5 \%$ level of significance.
State the assumption necessary for a product moment correlation coefficient to be valid in this case.
(1)
The mean and median fuel efficiencies of the car models in the random sample are 14.5 km/litre and 15.65 km/litre respectively. Considering these statistics, as well as the distribution of the fuel efficiency data, state whether or not the data suggest that the assumption in part (c) might be true in this case. Give a reason for your answer.
(No further calculations are required.)
2. A survey asked a random sample of 200 people their age and the main use of their mobile phone. The results are shown in Table 1 below.
Stating your hypotheses, test at the $5 \%$ level of significance, whether or not there is evidence of an association between happiness and gender. Show your working clearly.
4. The random variable $A$ is defined as $$A = B + 4 C - 3 D$$ where $B$, $C$ and $D$ are independent random variables with $$B \sim \mathrm {~N} \left( 6,2 ^ { 2 } \right) \quad C \sim \mathrm {~N} \left( 7,3 ^ { 2 } \right) \quad D \sim \mathrm {~N} \left( 4,1.5 ^ { 2 } \right)$$ Find $\mathrm { P } ( A < 45 )$.
5. A research station is doing some work on the germination of a new variety of genetically modified wheat. They planted 120 rows containing 7 seeds in each row.
The number of seeds germinating in each row was recorded. The results are as follows
Starting with the top left-hand corner (319) and working across, the committee selects 50 random numbers. The first 2 suitable numbers are 241 and 278. Numbers greater than 300 are ignored.
Find the next two suitable numbers.
When the club's committee looks at the members corresponding to their random numbers they find that only 1 female has been selected.
The committee does not want to be accused of being biased towards males so considers using a systematic sample instead.
1. Explain clearly how the committee could take a systematic sample.
2. Explain why a systematic sample may not give a sample that represents the proportion of males and females in the club.
The committee decides to use a stratified sample instead.
Describe how to choose members for the stratified sample.
Explain an advantage of using a stratified sample rather than a quota sample.
2. The random variable $X$ follows a continuous uniform distribution over the interval $[ \alpha - 3,2 \alpha + 3 ]$ where $\alpha$ is a constant.
The mean of a random sample of size $n$ is denoted by $\bar { X }$.
Show that $\bar { X }$ is a biased estimator of $\alpha$, and state the bias.
Given that $Y = k \bar { X }$ is an unbiased estimator for $\alpha$,
find the value of $k$.
A random sample of 10 values of $X$ is taken and the results are as follows $$\begin{array} { l l l l l l l l l l } 3 & 5 & 8 & 12 & 4 & 13 & 10 & 8 & 5 & 12 \end{array}$$

Hence estimate the maximum value of $X$.
3. A grocer believes that the average weight of a grapefruit from farm $A$ is greater than the average weight of a grapefruit from farm $B$. The weights, in grams, of 80 grapefruit selected at random from farm $A$ have a mean value of 532 g and a standard deviation, $s _ { A }$, of 35 g. A random sample of 100 grapefruit from farm $B$ have a mean weight of 520 g and a standard deviation, $s _ { B }$, of 28 g.
Stating your hypotheses clearly and using a $1 \%$ level of significance, test whether or not the grocer's belief is supported by the data.
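
The grapefruit comparison can be sketched as a large-sample test of the difference between two means. The sketch assumes the samples are large enough for the sample standard deviations to stand in for the population values and for the Central Limit Theorem to apply; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Test statistic for H0: mu_A = mu_B using the large-sample normal approximation."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


z = two_sample_z(532, 35, 80, 520, 28, 100)

# One-tailed test at the 1% level: H0: mu_A = mu_B against H1: mu_A > mu_B.
critical = NormalDist().inv_cdf(0.99)
reject_h0 = z > critical

print(round(z, 3), round(critical, 4), reject_h0)
```

With these figures the statistic exceeds the critical value, so the data would be taken as supporting the grocer's belief at the 1% level.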
4. In a survey 10 randomly selected men had their systolic blood pressure, $x$, and weight, $w$, measured. Their results are as follows:

Man	$\boldsymbol { A }$	$\boldsymbol { B }$	$\boldsymbol { C }$	$\boldsymbol { D }$	$\boldsymbol { E }$	$\boldsymbol { F }$	$\boldsymbol { G }$	$\boldsymbol { H }$	$\boldsymbol { I }$	$\boldsymbol { J }$
$x$	123	128	137	143	149	153	154	159	162	168
$w$	78	93	85	83	75	98	88	87	95	99

Calculate the value of Spearman's rank correlation coefficient between $x$ and $w$.
Stating your hypotheses clearly, test at the $5 \%$ level of significance, whether or not there is evidence of a positive correlation between systolic blood pressure and weight.
The product moment correlation coefficient for these data is 0.5114.
Use the value of the product moment correlation coefficient to test, at the $5 \%$ level of significance, whether or not there is evidence of a positive correlation between systolic blood pressure and weight.
Using your conclusions to part (b) and part (c), describe the relationship between systolic blood pressure and weight.
5. A random sample of 200 people were asked which hot drink they preferred from tea, coffee and hot chocolate. The results are given below.

Gender (rows) / Type of drink preferred (columns)	Tea	Coffee	Hot Chocolate	Total
Males	57	26	11	94
Females	42	47	17	106
Total	99	73	28	200

Test, at the $5 \%$ significance level, whether or not there is an association between type of drink preferred and gender. State your hypotheses and show your working clearly. You should state your expected frequencies to 2 decimal places.
State what difference using a $0.5 \%$ significance level would make to your conclusion. Give a reason for your answer.
6. Eight tasks were given to each of 125 randomly selected job applicants. The number of tasks failed by each applicant is recorded. The results are as follows:

Number of tasks failed by an applicant	0	1	2	3	4	5	6 or more
Frequency	2	21	45	42	12	3	0

Show that the probability of a randomly selected task, from this sample, being failed is 0.3.
An employer believes that a binomial distribution might provide a good model for the number of tasks, out of 8, that an applicant fails. He uses a binomial distribution, with the estimated probability 0.3 of a task being failed. The calculated expected frequencies are as follows

Number of tasks failed by an applicant	0	1	2	3	4	5	6 or more
Frequency	7.21	24.71	37.06	$r$	17.02	5.83	$s$

Find the value of $r$ and the value of $s$ giving your answers to 2 decimal places.
Test, at the $5 \%$ level of significance, whether or not a binomial distribution is a suitable model for these data. State your hypotheses and show your working clearly.
The employer believes that all applicants have the same probability of failing each task.
Use your result from part (c) to comment on this belief.
7. The random variable $X$ is defined as $$X = 4 Y - 3 W$$ where $Y \sim \mathrm {~N} \left( 40,3 ^ { 2 } \right) , W \sim \mathrm {~N} \left( 50,2 ^ { 2 } \right)$ and $Y$ and $W$ are independent.
Find $\mathrm { P } ( X > 25 )$.
The random variables $Y _ { 1 } , Y _ { 2 }$ and $Y _ { 3 }$ are independent and each has the same distribution as $Y$. The random variable $A$ is defined as $$A = \sum _ { i = 1 } ^ { 3 } Y _ { i }$$
The random variable $C$ is such that $C \sim \mathrm {~N} \left( 115 , \sigma ^ { 2 } \right)$.
Given that $\mathrm { P } ( A - C < 0 ) = 0.2$ and that $A$ and $C$ are independent,
find the variance of $C$.
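
A sketch of the working for $X = 4Y - 3W$ and for the variance of $C$ follows. The key distinction it illustrates is that a multiple such as $4Y$ scales the variance by $4^2$, whereas a sum of three independent copies $Y_1 + Y_2 + Y_3$ only adds three variances. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# X = 4Y - 3W with Y ~ N(40, 3^2), W ~ N(50, 2^2), independent.
x_mu = 4 * 40 - 3 * 50
x_var = 4 ** 2 * 3 ** 2 + 3 ** 2 * 2 ** 2   # coefficients are squared
p_x_gt_25 = 1 - NormalDist(x_mu, sqrt(x_var)).cdf(25)

# A = Y1 + Y2 + Y3: three separate variances are added.
a_mu = 3 * 40
a_var = 3 * 3 ** 2

# A - C ~ N(a_mu - 115, a_var + var_c) and P(A - C < 0) = 0.2,
# so (0 - diff_mu) / sd = z where z is the 20th percentile of N(0, 1).
diff_mu = a_mu - 115
z = std_normal.inv_cdf(0.2)
total_var = (-diff_mu / z) ** 2
var_c = total_var - a_var

print(x_mu, x_var, round(p_x_gt_25, 4), round(var_c, 2))
```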

AQA S3 2006 June Q1

1 A council claims that 80 per cent of households are generally satisfied with the services it provides. A random sample of 250 households shows that 209 are generally satisfied with the council's provision of services.

Construct an approximate $95 \%$ confidence interval for the proportion of households that are generally satisfied with the council's provision of services.
Hence comment on the council's claim.
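
A minimal sketch of the approximate interval for the proportion, assuming the usual normal approximation with the sample proportion used in the standard error:

```python
from math import sqrt
from statistics import NormalDist

n, satisfied = 250, 209
p_hat = satisfied / n

standard_error = sqrt(p_hat * (1 - p_hat) / n)
z = NormalDist().inv_cdf(0.975)
ci = (p_hat - z * standard_error, p_hat + z * standard_error)

CLAIMED = 0.80
claim_inside = ci[0] <= CLAIMED <= ci[1]

print(p_hat, ci, claim_inside)
```

Since 0.80 falls inside the interval, these data give no reason to doubt the council's claim.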

AQA S3 2006 June Q2

2 The table below shows the heart rates, $x$ beats per minute, and the systolic blood pressures, $y$ millimetres of mercury, of a random sample of 10 patients undergoing kidney dialysis.

Patient	$\mathbf { 1 }$	$\mathbf { 2 }$	$\mathbf { 3 }$	$\mathbf { 4 }$	$\mathbf { 5 }$	$\mathbf { 6 }$	$\mathbf { 7 }$	$\mathbf { 8 }$	$\mathbf { 9 }$	$\mathbf { 1 0 }$
$\boldsymbol { x }$	83	86	88	92	94	98	101	111	115	121
$\boldsymbol { y }$	157	172	161	154	171	169	179	180	192	182

Calculate the value of the product moment correlation coefficient for these data.
Assuming that these data come from a bivariate normal distribution, investigate, at the $1 \%$ level of significance, the claim that, for patients undergoing kidney dialysis, there is a positive correlation between heart rate and systolic blood pressure.
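
The product moment correlation coefficient for the dialysis data can be checked with the summary-statistic form $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, as sketched below. The critical value 0.7155 (1% one-tailed, $n = 10$) is an assumed value read from standard tables.

```python
from math import sqrt

x = [83, 86, 88, 92, 94, 98, 101, 111, 115, 121]
y = [157, 172, 161, 154, 171, 169, 179, 180, 192, 182]
n = len(x)

s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

r = s_xy / sqrt(s_xx * s_yy)

# H0: rho = 0 against H1: rho > 0.
CRITICAL_1PCT_ONE_TAIL_N10 = 0.7155  # assumed table value
reject_h0 = r > CRITICAL_1PCT_ONE_TAIL_N10

print(round(s_xx, 1), round(s_yy, 1), round(s_xy, 1), round(r, 3), reject_h0)
```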

AQA S3 2006 June Q3

3 Each enquiry received by a business support unit is dealt with by Ewan, Fay or Gaby. The probabilities of them dealing with an enquiry are 0.2, 0.3 and 0.5 respectively. Of enquiries dealt with by Ewan, 60% are answered immediately, 25% are answered later the same day and the remainder are answered at a later date. Of enquiries dealt with by Fay, 75% are answered immediately, 15% are answered later the same day and the remainder are answered at a later date. Of enquiries dealt with by Gaby, 90% are answered immediately and the remainder are answered at a later date.

Determine the probability that an enquiry:
1. is dealt with by Gaby and answered immediately;
2. is answered immediately;
3. is dealt with by Gaby, given that it is answered immediately.
Determine the probability that an enquiry is dealt with by Ewan, given that it is answered later the same day.
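
The probabilities above follow from the multiplication rule, the law of total probability and Bayes' theorem. A minimal sketch using exact fractions is given below. The dictionary layout and the helper names `joint`, `total` and `posterior` are illustrative choices, and Gaby's "later the same day" probability is taken as 0 because the question gives only two outcomes for her.

```python
from fractions import Fraction as F

# P(enquiry is dealt with by each person).
handler = {"Ewan": F(2, 10), "Fay": F(3, 10), "Gaby": F(5, 10)}

# P(response time | person), read from the question.
response = {
    "Ewan": {"immediately": F(60, 100), "same day": F(25, 100), "later date": F(15, 100)},
    "Fay": {"immediately": F(75, 100), "same day": F(15, 100), "later date": F(10, 100)},
    "Gaby": {"immediately": F(90, 100), "same day": F(0), "later date": F(10, 100)},
}


def joint(person, outcome):
    """P(person and outcome) by the multiplication rule."""
    return handler[person] * response[person][outcome]


def total(outcome):
    """P(outcome) by the law of total probability."""
    return sum(joint(person, outcome) for person in handler)


def posterior(person, outcome):
    """P(person | outcome) by Bayes' theorem."""
    return joint(person, outcome) / total(outcome)


p_gaby_and_immediate = joint("Gaby", "immediately")
p_immediate = total("immediately")
p_gaby_given_immediate = posterior("Gaby", "immediately")
p_ewan_given_same_day = posterior("Ewan", "same day")
```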

AQA S3 2006 June Q4

4 The table below shows the probability distribution for the number of students, $R$, attending classes for a particular mathematics module.

$r$	6	7	8
$\mathrm{P}(R = r)$	0.1	0.6	0.3

Find values for $\mathrm { E } ( R )$ and $\operatorname { Var } ( R )$.
The number of students, $S$, attending classes for a different mathematics module is such that $$\mathrm { E } ( S ) = 10.9 , \quad \operatorname { Var } ( S ) = 1.69 \quad \text { and } \quad \rho _ { R S } = \frac { 2 } { 3 }$$ Find values for the mean and variance of:
1. $T = R + S$;
2. $D = S - R$.
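
The results needed here are $\operatorname{Var}(R \pm S) = \operatorname{Var}(R) + \operatorname{Var}(S) \pm 2\operatorname{Cov}(R, S)$ with $\operatorname{Cov}(R, S) = \rho_{RS}\,\sigma_R \sigma_S$. The sketch below applies them to the figures in the question. It assumes $\rho_{RS}$ denotes the correlation coefficient between $R$ and $S$, and the variable names are illustrative.

```python
from math import sqrt

# Probability distribution of R from the table.
dist_r = {6: 0.1, 7: 0.6, 8: 0.3}

mean_r = sum(r * p for r, p in dist_r.items())
var_r = sum(r * r * p for r, p in dist_r.items()) - mean_r ** 2

# Given values for S and the correlation between R and S.
mean_s, var_s, rho = 10.9, 1.69, 2 / 3

# Covariance recovered from the correlation coefficient.
cov_rs = rho * sqrt(var_r * var_s)

# T = R + S
mean_t = mean_r + mean_s
var_t = var_r + var_s + 2 * cov_rs

# D = S - R
mean_d = mean_s - mean_r
var_d = var_r + var_s - 2 * cov_rs
```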

AQA S3 2006 June Q5

5 The number of letters per week received at home by Rosa may be modelled by a Poisson distribution with parameter 12.25.

Using a normal approximation, estimate the probability that, during a 4-week period, Rosa receives at home at least 42 letters but at most 54 letters.
Rosa also receives letters at work. During a 16-week period, she receives at work a total of 248 letters.
1. Assuming that the number of letters received at work by Rosa may also be modelled by a Poisson distribution, calculate a $98 \%$ confidence interval for the average number of letters per week received at work by Rosa.
2. Hence comment on Rosa's belief that she receives, on average, fewer letters at home than at work.

AQA S3 2006 June Q6

6 The random variable $X$ has a Poisson distribution with parameter $\lambda$.

Prove that $\mathrm { E } ( X ) = \lambda$.
By first proving that $\mathrm { E } ( X ( X - 1 ) ) = \lambda ^ { 2 }$, or otherwise, prove that $\operatorname { Var } ( X ) = \lambda$.

AQA S3 2006 June Q7

7 A shop sells cooked chickens in two sizes: medium and large.
The weights, $X$ grams, of medium chickens may be assumed to be normally distributed with mean $\mu _ { X }$ and standard deviation 45. The weights, $Y$ grams, of large chickens may be assumed to be normally distributed with mean $\mu _ { Y }$ and standard deviation 65. A random sample of 20 medium chickens had a mean weight, $\bar { x }$ grams, of 936.
A random sample of 10 large chickens had the following weights in grams: $$\begin{array} { l l l l l l l l l l } 1165 & 1202 & 1077 & 1144 & 1195 & 1275 & 1136 & 1215 & 1233 & 1288 \end{array}$$

Calculate the mean weight, $\bar { y }$ grams, of this sample of large chickens.
Hence investigate, at the $1 \%$ level of significance, the claim that the mean weight of large chickens exceeds that of medium chickens by more than 200 grams.
1. Deduce that, for your test in part (b), the critical value of $( \bar { y } - \bar { x } )$ is 253.24, correct to two decimal places.
2. Hence determine the power of your test in part (b), given that $\mu _ { Y } - \mu _ { X } = 275$.
3. Interpret, in the context of this question, the value that you obtained in part (c)(ii).
  (3 marks)
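
The critical value quoted in part (c)(i) can be checked numerically. Under $\mathrm{H}_0: \mu_Y - \mu_X = 200$, the statistic $\bar{y} - \bar{x}$ is normal with variance $65^2/10 + 45^2/20$. The sketch below assumes a one-tailed test at the 1% level, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

large = [1165, 1202, 1077, 1144, 1195, 1275, 1136, 1215, 1233, 1288]
y_bar = sum(large) / len(large)
x_bar = 936

# Standard deviation of (y_bar - x_bar) with known population sds.
sd_diff = sqrt(65 ** 2 / 10 + 45 ** 2 / 20)

# Upper 1% point of the standard normal distribution.
z = NormalDist().inv_cdf(0.99)
critical = 200 + z * sd_diff

# H0 is rejected when the observed difference exceeds the critical value.
reject_h0 = (y_bar - x_bar) > critical

# Power: probability of rejecting H0 when the true difference is 275.
power = 1 - NormalDist(mu=275, sigma=sd_diff).cdf(critical)
```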

AQA S3 2007 June Q1

1 As part of an investigation into the starting salaries of graduates in a European country, the following information was collected.

Graduate group	Sample size	Sample mean starting salary (€)	Sample standard deviation (€)
Science graduates	175	19268	7321
Arts graduates	225	17896	8205

Stating a necessary assumption about the samples, construct a $98 \%$ confidence interval for the difference between the mean starting salary of science graduates and that of arts graduates.
What can be concluded from your confidence interval?

AQA S3 2007 June Q2

2 A hill-top monument can be visited by one of three routes: road, funicular railway or cable car. The percentages of visitors using these routes are 25, 35 and 40 respectively. The age distribution, in percentages, of visitors using each route is shown in the table. For example, 15 per cent of visitors using the road were under 18.

Age (years)	Road (%)	Funicular railway (%)	Cable car (%)
Under 18	15	25	10
18 to 64	80	60	55
Over 64	5	15	35

Calculate the probability that a randomly selected visitor:

who used the road is aged 18 or over;
is aged between 18 and 64;
used the funicular railway and is aged over 64;
used the funicular railway, given that the visitor is aged over 64.

AQA S3 2007 June Q3

3 Kutz and Styler are two unisex hair salons. An analysis of a random sample of 150 customers at Kutz shows that 28 per cent are male. An analysis of an independent random sample of 250 customers at Styler shows that 34 per cent are male.

Test, at the $5 \%$ level of significance, the hypothesis that there is no difference between the proportion of male customers at Kutz and that at Styler.
State, with a reason, the probability of making a Type I error in the test in part (a) if, in fact, the actual difference between the two proportions is 0.05.

AQA S3 2007 June Q4

4 A machine is used to fill 5-litre plastic containers with vinegar. The volume, in litres, of vinegar in a container filled by the machine may be assumed to be normally distributed with mean $\mu$ and standard deviation 0.08. A quality control inspector requires a $99 \%$ confidence interval for $\mu$ to be constructed such that it has a width of at most 0.05 litres. Calculate, to the nearest 5, the sample size necessary in order to achieve the inspector's requirement.
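
With a known standard deviation, the interval has width $2z\sigma/\sqrt{n}$, so the requirement becomes $n \geqslant (2z\sigma/w)^2$. A minimal sketch follows. The function name `min_sample_size` is hypothetical, and rounding up to the next multiple of 5 (rather than to the nearest) is a choice made here so that the width requirement is still met.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(sigma, max_width, level):
    """Smallest n whose known-sigma confidence interval fits max_width."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ceil((2 * z * sigma / max_width) ** 2)


n_min = min_sample_size(sigma=0.08, max_width=0.05, level=0.99)
# Round up to a multiple of 5 so the width condition still holds.
n_reported = 5 * ceil(n_min / 5)
```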

AQA S3 2007 June Q5

5 The duration, $X$ minutes, of a timetabled 1-hour lesson may be assumed to be normally distributed with mean 54 and standard deviation 2. The duration, $Y$ minutes, of a timetabled $1 \frac { 1 } { 2 }$-hour lesson may be assumed to be normally distributed with mean 83 and standard deviation 3. Assuming the durations of lessons to be independent, determine the probability that the total duration of a random sample of three 1-hour lessons is less than the total duration of a random sample of two $1 \frac { 1 } { 2 }$-hour lessons.
(7 marks)
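
One way to approach this is through the difference $D = X_1 + X_2 + X_3 - (Y_1 + Y_2)$, which is normal because it is a linear combination of independent normal variables. The required probability is then $\mathrm{P}(D < 0)$. The sketch below uses illustrative variable names.

```python
from math import sqrt
from statistics import NormalDist

# Mean of D: three 1-hour lessons minus two 1.5-hour lessons.
mean_diff = 3 * 54 - 2 * 83

# Variances add for independent variables, whatever the sign.
var_diff = 3 * 2 ** 2 + 2 * 3 ** 2

prob = NormalDist(mu=mean_diff, sigma=sqrt(var_diff)).cdf(0)
```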

AQA S3 2007 June Q6

6 The random variable $X$ has a binomial distribution with parameters $n$ and $p$.
1. Prove that $\mathrm { E } ( X ) = n p$.
2. Given that $\mathrm { E } \left( X ^ { 2 } \right) - \mathrm { E } ( X ) = n ( n - 1 ) p ^ { 2 }$, show that $\operatorname { Var } ( X ) = n p ( 1 - p )$.
3. Given that $X$ is found to have a mean of 3 and a variance of 2.97, find values for $n$ and $p$.
4. Hence use a distributional approximation to estimate $\mathrm { P } ( X > 2 )$.
Dressher is a nationwide chain of stores selling women's clothes. It claims that the probability that a customer who buys clothes from its stores uses a Dressher store card is 0.45. Assuming this claim to be correct, use a distributional approximation to estimate the probability that, in a random sample of 500 customers who buy clothes from Dressher stores, at least half of them use a Dressher store card.

AQA S3 2007 June Q7

7 In a town, the total number, $R$, of houses sold during a week by estate agents may be modelled by a Poisson distribution with a mean of 13. A new housing development is completed in the town. During the first week in which houses on this development are offered for sale by the developer, the estate agents sell a total of 10 houses.

Using the $10 \%$ level of significance, investigate whether the offer for sale of houses by the developer has resulted in a reduction in the mean value of $R$.
Determine, for your test in part (a), the critical region for $R$.
Assuming that the offer for sale of houses on the new housing development has reduced the mean value of $R$ to 6.5, determine, for a test at the 10\% level of significance, the probability of a Type II error.
(4 marks)

OCR MEI S3 Q2

2 Geoffrey is a university lecturer. He has to prepare five questions for an examination. He knows by experience that it takes about 3 hours to prepare a question, and he models the time (in minutes) taken to prepare one by the Normally distributed random variable $X$ with mean 180 and standard deviation 12, independently for all questions.

One morning, Geoffrey has a gap of 2 hours 50 minutes ( 170 minutes) between other activities. Find the probability that he can prepare a question in this time.
One weekend, Geoffrey can devote 14 hours to preparing the complete examination paper. Find the probability that he can prepare all five questions in this time.
A colleague, Helen, has to check the questions. She models the time (in minutes) to check a question by the Normally distributed random variable $Y$ with mean 50 and standard deviation 6, independently for all questions and independently of $X$. Find the probability that the total time for Geoffrey to prepare a question and Helen to check it exceeds 4 hours.
When working under pressure of deadlines, Helen models the time to check a question in a different way. She uses the Normally distributed random variable $\frac { 1 } { 4 } X$, where $X$ is as above. Find the length of time, as given by this model, which Helen needs to ensure that, with probability 0.9, she has time to check a question.
Ian, an educational researcher, suggests that a better model for the time taken to prepare a question would be a constant $k$ representing "thinking time" plus a random variable $T$ representing the time required to write the question itself, independently for all questions. Taking $k$ as 45 and $T$ as Normally distributed with mean 120 and standard deviation 10 (all units are minutes), find the probability according to Ian's model that a question can be prepared in less than 2 hours 30 minutes.
Juliet, an administrator, proposes that the examination should be reduced in time and shorter questions should be used. Juliet suggests that Ian's model should be used for the time taken to prepare such shorter questions but with $k = 30$ and $T$ replaced by $\frac { 3 } { 5 } T$. Find the probability as given by this model that a question can be prepared in less than $1 \frac { 3 } { 4 }$ hours.

OCR MEI S3 Q4

10 marks

4 Quality control inspectors in a factory are investigating the lengths of glass tubes that will be used to make laboratory equipment.

Data on the observed lengths of a random sample of 200 glass tubes from one batch are available in the form of a frequency distribution as follows.
Use a suitable statistical procedure to assess the goodness of fit of $X$ to these data. Discuss your conclusions briefly.

2 A bus route runs from the centre of town A through the town's urban area to a point B on its boundary and then through the country to a small town C. Because of traffic congestion and general road conditions, delays occur on both the urban and the country sections. All delays may be considered independent. The scheduled time for the journey from A to B is 24 minutes. In fact, journey times over this section are given by the Normally distributed random variable $X$ with mean 26 minutes and standard deviation 3 minutes. The scheduled time for the journey from B to C is 18 minutes. In fact, journey times over this section are given by the Normally distributed random variable $Y$ with mean 15 minutes and standard deviation 2 minutes. Journey times on the two sections of route may be considered independent. The timetable published to the public does not show details of times at intermediate points; thus, if a bus is running early, it merely continues on its journey and is not required to wait.
Find the probability that a journey from A to B is completed in less than the scheduled time of 24 minutes.
Find the probability that a journey from A to C is completed in less than the scheduled time of 42 minutes.
It is proposed to introduce a system of bus lanes in the urban area. It is believed that this would mean that the journey time from A to B would be given by the random variable $0.85 X$. Assuming this to be the case, find the probability that a journey from A to B would be completed in less than the currently scheduled time of 24 minutes.
An alternative proposal is to introduce an express service. This would leave out some bus stops on both sections of the route and its overall journey time from A to C would be given by the random variable $0.9 X + 0.8 Y$. The scheduled time from A to C is to be given as a whole number of minutes. Find the least possible scheduled time such that, with probability 0.75, buses would complete the journey on time or early.
A programme of minor road improvements is undertaken on the country section. After their completion, it is thought that the random variable giving the journey time from B to C is still Normally distributed with standard deviation 2 minutes. A random sample of 15 journeys is found to have a sample mean journey time from B to C of 13.4 minutes. Provide a two-sided $95 \%$ confidence interval for the population mean journey time from B to C.

3 An employer has commissioned an opinion polling organisation to undertake a survey of the attitudes of staff to proposed changes in the pension scheme. The staff are categorised as management, professional and administrative, and it is thought that there might be considerable differences of opinion between the categories. There are 60, 140 and 300 staff respectively in the categories. The budget for the survey allows for a sample of 40 members of staff to be selected for in-depth interviews.
Explain why it would be unwise to select a simple random sample from all the staff.
Discuss whether it would be sensible to consider systematic sampling.
What are the advantages of stratified sampling in this situation?
State the sample sizes in each category if stratified sampling with as nearly as possible proportional allocation is used.
The opinion polling organisation needs to estimate the average wealth of staff in the categories, in terms of property, savings, investments and so on. In a random sample of 11 professional staff, the sample mean is $\pounds 345818$ and the sample standard deviation is $\pounds 69241$.
Assuming the underlying population is Normally distributed, test at the $5 \%$ level of significance the null hypothesis that the population mean is $\pounds 300000$ against the alternative hypothesis that it is greater than $\pounds 300000$. Provide also a two-sided $95 \%$ confidence interval for the population mean.
[10]

4 A company has many factories. It is concerned about incidents of trespassing and, in the hope of reducing if not eliminating these, has embarked on a programme of installing new fencing.
Records for a random sample of 9 factories of the numbers of trespass incidents in typical weeks before and after installation of the new fencing are as follows.
Find the probability that, on a randomly chosen visit, it takes less than 50 minutes to mow the lawns.
Find the probability that, on a randomly chosen visit, the total time for hoeing and pruning is less than 50 minutes.
If Bill mows the lawns while Ben does the hoeing and pruning, find the probability that, on a randomly chosen visit, Ben finishes first. Bill and Ben do my gardening twice a month and send me an invoice at the end of the month.
Write down the mean and variance of the total time (in minutes) they spend on mowing, hoeing and pruning per month.
The company charges for the total time spent at 15 pence per minute. There is also a fixed charge of $\pounds 10$ per month. Find the probability that the total charge for a month does not exceed $\pounds 40$.

4 (a) An amateur weather forecaster has been keeping records of air pressure, measured in atmospheres. She takes the measurement at the same time every day using a barometer situated in her garden. A random sample of 100 of her observations is summarised in the table below. The corresponding expected frequencies for a Normal distribution, with its two parameters estimated by sample statistics, are also shown in the table.
Find the probability that the weekly takings for coaches are less than $\pounds 40000$.
Find the probability that the weekly takings for lorries exceed the weekly takings for cars.
Find the probability that over a 4-week period the total takings for cars exceed $\pounds 225000$. What assumption must be made about the four weeks?
Each week the operator allocates part of the takings for repairs. This is determined for each type of vehicle according to estimates of the long-term damage caused. It is calculated as follows: $5 \%$ of takings for cars, $10 \%$ for coaches and $20 \%$ for lorries. Find the probability that in any given week the total amount allocated for repairs will exceed $\pounds 20000$.

3 The management of a large chain of shops aims to reduce the level of absenteeism among its workforce by means of an incentive bonus scheme. In order to evaluate the effectiveness of the scheme, the management measures the percentage of working days lost before and after its introduction for each of a random sample of 11 shops. The results are shown below.
Give three reasons why a $t$ test would be appropriate.
Carry out the test using a $5 \%$ significance level. State your hypotheses and conclusion carefully.
Find a 95\% confidence interval for the true mean temperature in the reaction chamber.
Describe briefly one advantage and one disadvantage of having a 99\% confidence interval instead of a 95\% confidence interval.

4 (a) In Germany, towards the end of the nineteenth century, a study was undertaken into the distribution of the sexes in families of various sizes. The table shows some data about the numbers of girls in 500 families, each with 5 children. It is thought that the binomial distribution $\mathrm { B } ( 5 , p )$ should model these data.
The grower intends to perform a $t$ test to examine whether there is any difference in the mean yield of the two types of plant. State the hypotheses he should use and also any necessary assumption.
Carry out the test using a $5 \%$ significance level.
(b) The tea grower deals with many types of tea and employs tasters to rate them. The tasters do this by giving each tea a score out of 100. The tea grower wishes to compare the scores given by two of the tasters. Their scores for a random selection of 10 teas are as follows. A Wilcoxon signed rank test is to be used to decide whether there is any evidence of a preference for one of the uniforms.
Explain why this test is appropriate in these circumstances and state the hypotheses that should be used.
Carry out the test at the $5 \%$ significance level.

4 A random variable $X$ has probability density function $\mathrm { f } ( x ) = \frac { 2 x } { \lambda ^ { 2 } }$ for $0 < x < \lambda$, where $\lambda$ is a positive constant.
Show that, for any value of $\lambda$, $\mathrm { f } ( x )$ is a valid probability density function.
Find $\mu$, the mean value of $X$, in terms of $\lambda$ and show that $\mathrm { P } ( X < \mu )$ does not depend on $\lambda$.
Given that $\mathrm { E } \left( X ^ { 2 } \right) = \frac { \lambda ^ { 2 } } { 2 }$, find $\sigma ^ { 2 }$, the variance of $X$, in terms of $\lambda$.
The random variable $X$ is used to model the depth of the space left by the filling machine at the top of a jar of jam. The model gives the following probabilities for $X$ (whatever the value of $\lambda$).
Initially it is assumed that the value of $p$ is $\frac { 1 } { 2 }$. Test at the $5 \%$ level of significance whether it is reasonable to suppose that the model applies with $p = \frac { 1 } { 2 }$.
The model is refined by estimating $p$ from the data. Find the mean of the observed data and hence an estimate of $p$.
Using the estimated value of $p$, the value of the test statistic $X ^ { 2 }$ turns out to be 2.3857. Is it reasonable to suppose, at the $5 \%$ level of significance, that this refined model applies?
Discuss the reasons for the different outcomes of the tests in parts (i) and (iii).

2 (a) A continuous random variable, $X$, has probability density function $$f ( x ) = \begin{cases} \frac { 1 } { 72 } \left( 8 x - x ^ { 2 } \right) & 2 \leqslant x \leqslant 8 \\ 0 & \text { otherwise } \end{cases}$$
Find $\mathrm { F } ( x )$, the cumulative distribution function of $X$.
Sketch $\mathrm { F } ( x )$.
The median of $X$ is $m$. Show that $m$ satisfies the equation $m ^ { 3 } - 12 m ^ { 2 } + 148 = 0$. Verify that $m \approx 4.42$.
(b) The random variable in part (a) is thought to model the weights, in kilograms, of lambs at birth. The birth weights, in kilograms, of a random sample of 12 lambs, given in ascending order, are as follows. $$\begin{array} { l l l l l l l l l l l l } 3.16 & 3.62 & 3.80 & 3.90 & 4.02 & 4.72 & 5.14 & 6.36 & 6.50 & 6.58 & 6.68 & 6.78 \end{array}$$ Test at the 5\% level of significance whether a median of 4.42 is consistent with these data.

3 Cholesterol is a lipid (fat) which is manufactured by the liver from the fatty foods that we eat. It plays a vital part in allowing the body to function normally. However, when high levels of cholesterol are present in the blood there is a risk of arterial disease. Among the factors believed to assist with achieving and maintaining low cholesterol levels are weight loss and exercise. A doctor wishes to test the effectiveness of exercise in lowering cholesterol levels. For a random sample of 12 of her patients, she measures their cholesterol levels before and after they have followed a programme of exercise. The measurements obtained are as follows. This sample is to be tested to see whether the campaign appears to have been successful in raising the percentage receiving the booster.
Explain why the use of paired data is appropriate in this context.
Carry out an appropriate Wilcoxon signed rank test using these data, at the $5 \%$ significance level.
(b) Benford's Law predicts the following probability distribution for the first significant digit in some large data sets.
Digit 1 2 3 4 5 6 7 8 9
Probability 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046
On one particular day, the first significant digits of the stock market prices of the shares of a random sample of 200 companies gave the following results.
Digit 1 2 3 4 5 6 7 8 9
Frequency 55 34 27 16 15 17 12 15 9
Test at the $10 \%$ level of significance whether Benford's Law provides a reasonable model in the context of share prices.

4 A random variable $X$ has an exponential distribution with probability density function $\mathrm { f } ( x ) = \lambda \mathrm { e } ^ { - \lambda x }$ for $x \geqslant 0$, where $\lambda$ is a positive constant.
Verify that $\int _ { 0 } ^ { \infty } \mathrm { f } ( x ) \mathrm { d } x = 1$ and sketch $\mathrm { f } ( x )$.
In this part of the question you may use the following result. $$\int _ { 0 } ^ { \infty } x ^ { r } \mathrm { e } ^ { - \lambda x } \mathrm {~d} x = \frac { r ! } { \lambda ^ { r + 1 } } \quad \text { for } r = 0,1,2 , \ldots$$ Derive the mean and variance of $X$ in terms of $\lambda$.
The random variable $X$ is used to model the lifetime, in years, of a particular type of domestic appliance. The manufacturer of the appliance states that, based on past experience, the mean lifetime is 6 years.
Let $\bar { X }$ denote the mean lifetime, in years, of a random sample of 50 appliances. Write down an approximate distribution for $\bar { X }$.
A random sample of 50 appliances is found to have a mean lifetime of 7.8 years. Does this cast any doubt on the model?

Questions S3 (597 questions)
