2.01c Sampling techniques: simple random, opportunity, etc

167 questions

Edexcel S3 2018 June Q2
13 marks Standard +0.3
  1. Merchandise is sold at concerts. The manager of a concert claims that the mean value of merchandise sold to premium ticket holders is more than \(\pounds 6\) greater than the mean value of merchandise sold to standard ticket holders.
    1. Given that all the tickets for the next concert have been sold, describe how a stratified sample should be taken at the concert.
    The mean value of merchandise sold to a random sample of 60 standard ticket holders at the concert is \(\pounds 15\) with a standard deviation of \(\pounds 10\). The mean value of merchandise sold to a random sample of 55 premium ticket holders at the concert is \(\pounds 23\) with a standard deviation of \(\pounds 8\).
  2. Test the manager's claim at the \(5 \%\) level of significance. State your hypotheses clearly.
  3. For the test in part (b), state whether or not it is necessary to assume that values of merchandise sold have normal distributions. Give a reason for your answer.
AQA S1 2007 June Q3
5 marks Easy -1.2
3
  1. A sample of 50 washed baking potatoes was selected at random from a large batch.
    The weights of the 50 potatoes were found to have a mean of 234 grams and a standard deviation of 25.1 grams. Construct a \(95 \%\) confidence interval for the mean weight of potatoes in the batch.
    (4 marks)
  2. The batch of potatoes is purchased by a market stallholder. He sells them to his customers by allowing them to choose any 5 potatoes for \(\pounds 1\). Give a reason why such chosen potatoes are unlikely to represent a random sample from the batch.
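The interval in part 1 can be checked numerically with the large-sample formula \(\bar{x} \pm z \, s / \sqrt{n}\). The sketch below assumes that the normal approximation is acceptable for a sample of 50 and that the sample standard deviation can stand in for the unknown population value; the function name is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, level=0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


lower, upper = mean_confidence_interval(234, 25.1, 50)
print(f"95% CI: ({lower:.1f}, {upper:.1f}) grams")
```

The interval only describes the batch if the 50 potatoes really were a random sample, which is the point part 2 probes.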
AQA S1 2008 June Q7
14 marks Moderate -0.3
7 Vernon, a service engineer, is expected to carry out a boiler service in one hour.
One hour is subtracted from each of his actual times, and the resulting differences, \(x\) minutes, for a random sample of 100 boiler services are summarised in the table.
| Difference | Frequency |
|---|---|
| \(- 6 \leqslant x < - 4\) | 4 |
| \(- 4 \leqslant x < - 2\) | 9 |
| \(- 2 \leqslant x < 0\) | 13 |
| \(0 \leqslant x < 2\) | 27 |
| \(2 \leqslant x < 4\) | 21 |
| \(4 \leqslant x < 6\) | 15 |
| \(6 \leqslant x < 8\) | 7 |
| \(8 \leqslant x \leqslant 10\) | 4 |
| Total | 100 |
    1. Calculate estimates of the mean and the standard deviation of these differences.
      (4 marks)
    2. Hence deduce, in minutes, estimates of the mean and the standard deviation of Vernon's actual service times for this sample.
    1. Construct an approximate \(98 \%\) confidence interval for the mean time taken by Vernon to carry out a boiler service.
    2. Give a reason why this confidence interval is approximate rather than exact.
  1. Vernon claims that, more often than not, a boiler service takes more than an hour and that, on average, a boiler service takes much longer than an hour. Comment, with a justification, on each of these claims.
Edexcel S2 Q1
4 marks Easy -1.8
  1. (a) Briefly describe the difference between a census and a sample survey.
    (b) Illustrate the difference by considering the case of a village council which has to decide whether or not to build a new village hall.
Given that the council decides to use a sample survey,
(c) suggest suitable sampling units.
Edexcel S2 Q2
6 marks Easy -1.8
2. A video rental shop needs to find out whether or not videos have been rewound when they are returned; it will do this by taking a sample of returned videos.
  1. State one advantage and one disadvantage of taking a sample.
  2. Suggest a suitable sampling frame.
  3. Describe the sampling units.
  4. Criticise the sampling method of looking at just one particular shelf of videos.
Edexcel S2 Q6
12 marks Standard +0.3
6. A teacher is monitoring attendance at lessons in her department. She believes that the number of students absent from each lesson follows a Poisson distribution and wishes to test the null hypothesis that the mean is 2.5 against the alternative hypothesis that it is greater than 2.5. She visits one lesson and decides on a critical region of 6 or more students absent.
  1. Find the significance level of this test.
  2. State any assumptions made in carrying out this test and comment on their validity.
    The teacher decides to undertake a wider study by looking at a sample of all the lessons that have taken place in the department during the previous four weeks.
  3. Suggest a suitable sampling frame.
    She finds that there have been 96 pupils absent from the 30 lessons in her sample.
  4. Using a suitable approximation, test at the \(5 \%\) level of significance the null hypothesis that the mean is 2.5 students absent per lesson against the alternative hypothesis that it is greater than 2.5. You may assume that the number of absences follows a Poisson distribution.
    (6 marks)
Edexcel S2 Q2
9 marks Moderate -0.8
2. A driving instructor keeps records of all the learners she has taught. In order to analyse her success rate she wishes to take a random sample of 120 of these learners.
  1. Suggest a suitable sampling frame and identify the sampling units.
    She believes that only 1 in 20 of the people she teaches fail to pass their test in their first two attempts. She decides to use her sample to test whether or not the proportion is different from this.
  2. Using a suitable approximation and stating clearly the hypotheses she should use, find the largest critical region for this test such that the probability in each "tail" is less than \(2.5 \%\).
  3. State the significance level of this test.
AQA S3 2013 June Q3
9 marks Standard +0.3
3 A builders' merchant's depot has two machines, X and Y, each of which can be used for filling bags with sand or gravel. The weight, in kilograms, delivered by machine X may be modelled by a normal distribution with mean \(\mu _ { \mathrm { X } }\) and standard deviation 25. The weight, in kilograms, delivered by machine Y may be modelled by a normal distribution with mean \(\mu _ { \mathrm { Y } }\) and standard deviation 30.
Fred, the depot's yardman, records the weights, in kilograms, of a random sample of 10 bags of sand delivered by machine X as
$$\begin{array} { l l l l l l l l l l } 1055 & 1045 & 1000 & 985 & 1040 & 1025 & 1005 & 1030 & 1015 & 1060 \end{array}$$
He also records the weights, in kilograms, of a random sample of 8 bags of gravel delivered by machine Y as
$$\begin{array} { l l l l l l l l } 1085 & 1055 & 1055 & 1000 & 1035 & 1050 & 1005 & 1075 \end{array}$$
  1. Construct a \(95 \%\) confidence interval for \(\mu _ { \mathrm { Y } } - \mu _ { \mathrm { X } }\), giving the limits to the nearest 5 kg.
  2. Dot, the depot's manager, commented that Fred's data collection may have been biased. Justify her comment and explain how the possible bias could have been eliminated.
    (2 marks)
Edexcel S3 Q2
7 marks Easy -1.3
2. (a) Explain what is meant by a simple random sample.
(b) Explain briefly how you could use a table of random numbers to select a simple random sample of size 12 from a list of the 70 junior members of a tennis club.
(c) Give an example of a situation in which you might choose to take a stratified sample and explain why.
Edexcel S3 Q1
4 marks Easy -1.2
  1. A Veterinary Surgeon wishes to survey a stratified sample of size 100 from those people who have pets registered at her surgery. The list below shows the strata to be used and the number in each group.
  • people who own just dogs - 165,
  • people who own just cats - 140,
  • people who own just small mammals - 105,
  • others, including those who own more than one type of pet - 90.
    1. Find how many members of each group should be included in the sample.
    2. Give two advantages of using stratified sampling.
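Proportional allocation for a stratified sample gives each stratum a share of the sample equal to its share of the population. A minimal sketch using the four group sizes listed above follows; the function name and stratum labels are illustrative, and simple rounding is assumed to be good enough (it happens to give whole numbers here, but in general the rounded counts may need adjusting so that they sum to the required sample size).

```python
def proportional_allocation(strata_sizes, sample_size):
    """Share a sample across strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    # Rounding can leave the total slightly off sample_size for awkward sizes.
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


strata = {"dogs only": 165, "cats only": 140, "small mammals only": 105, "others": 90}
allocation = proportional_allocation(strata, 100)
print(allocation)
# {'dogs only': 33, 'cats only': 28, 'small mammals only': 21, 'others': 18}
```

Within each stratum the allocated number of people would then be chosen by simple random sampling.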
Edexcel S3 Q1
5 marks Easy -1.8
  1. A personnel manager has details on all company employees and wishes to consult a sample of them on a possible change to the company's hours of business. She decides to take a stratified sample based on different age groups.
    1. Give one advantage of using stratified sampling in this situation.
    The manager needs to select a sample of size 10, without replacement, from a list of 65 employees aged 16 to 25. She numbers these employees from 01 to 65 in alphabetical order and uses the table of random numbers given in the formula book. She starts with the top of the sixth two-digit column and works down. The first two numbers she writes down are 30 and 47.
  2. Find the other eight numbers in the sample.
  3. Suggest another factor that might be useful to consider in deciding on the strata.
    (1 mark)
OCR MEI Further Statistics Minor 2019 June Q3
4 marks Easy -1.8
3 A company has been commissioned to make 50 very expensive titanium components.
A sample of the components needs to be tested to ensure that they are sufficiently strong. However, this is a test to destruction, so the components which are tested can no longer be used.
  1. Explain why it would not be appropriate to use a census in these circumstances.
    A manager suggests that the first 5 components to be manufactured should be tested.
  2. Explain why this would not be a sensible method of selecting the sample.
    A statistician advises the manager that the sample selected should be a random sample.
  3. Give two desirable features (other than randomness) that the sample should have.
OCR MEI Further Statistics Minor 2023 June Q2
5 marks Easy -1.8
2 A company manufactures batches of twenty thousand tins which are subsequently filled with fruit. The company tests tins from each batch to make sure that they are strong enough. The test is easy and cheap to carry out, but when a tin has been tested it is no longer suitable for filling with fruit.
    1. Explain why a sample size of 5 tins per batch may not be appropriate in this case.
    2. Explain why a sample size of 1000 tins per batch may not be appropriate in this case.
    The company tests a sample of 30 tins from each batch.
  1. Explain why it would not be sensible for the sample to consist of the final 30 tins produced in a batch.
  2. Give two features that the sample should have.
OCR H240/02 2018 March Q9
10 marks Standard +0.3
9 A bag contains 100 black discs and 200 white discs. Paula takes five discs at random, without replacement. She notes the number \(X\) of these discs that are black.
  1. Find \(\mathrm { P } ( X = 3 )\).
    Paula decides to use the binomial distribution as a model for the distribution of \(X\).
  2. Explain why this model will give probabilities that are approximately, but not exactly, correct.
  3. Paula uses the binomial model to find an approximate value for \(\mathrm { P } ( X = 3 )\). Calculate the percentage by which her answer will differ from the answer in part (ii).
    Paula now assumes that the binomial distribution is a good model for \(X\). She uses a computer simulation to generate 1000 values of \(X\). The number of times that \(X = 3\) occurs is denoted by \(Y\).
  4. Calculate estimates of the limits between which two thirds of the values of \(Y\) will lie.
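The gap between sampling without replacement and the binomial model can be seen by computing both probabilities for \(X = 3\). This is a sketch rather than a model answer: it assumes the exact distribution is hypergeometric (5 discs drawn from 100 black and 200 white) and that the binomial model uses \(n = 5\) and \(p = 100/300\).

```python
from math import comb

# Exact model: 5 discs drawn without replacement from 100 black and 200 white.
p_exact = comb(100, 3) * comb(200, 2) / comb(300, 5)

# Binomial approximation: treats each draw as independent with P(black) = 100/300.
p = 100 / 300
p_binomial = comb(5, 3) * p**3 * (1 - p)**2

percent_difference = 100 * (p_binomial - p_exact) / p_exact
print(f"exact = {p_exact:.4f}, binomial = {p_binomial:.4f}, "
      f"difference = {percent_difference:.2f}%")
```

The two values are close because the sample of 5 is small relative to the 300 discs, so removing a disc barely changes the proportion of black discs left in the bag.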
OCR H240/02 2018 March Q10
12 marks Moderate -0.8
10 A researcher is investigating the actual lengths of time that patients spend at their appointments with the doctors at a certain clinic. There are 12 doctors at the clinic, and each doctor has 24 appointments per day. The researcher plans to choose a sample of 24 appointments on a particular day.
  1. The researcher considers the following two methods for choosing the sample.
    Method A: Choose a random sample of 24 appointments from the 288 on that day.
    Method B: Choose one doctor's 1st and 2nd appointments. Choose another doctor's 3rd and 4th appointments and so on until the last doctor's 23rd and 24th appointments.
    For each of A and B state a disadvantage of using this method.
    Appointments are scheduled to last 10 minutes. The researcher suspects that the actual times that patients spend are more than 10 minutes on average. To test this suspicion, he uses method A, and takes a random sample of 24 appointments. He notes the actual time spent for each appointment and carries out a hypothesis test at the \(1 \%\) significance level.
  2. Explain why a 1-tail test is appropriate.
    The population mean of the actual times that patients spend at their appointments is denoted by \(\mu\) minutes.
  3. Assuming that \(\mu = 10\), state the probability that the conclusion of the test will be that \(\mu\) is not greater than 10.
    The actual lengths of time, in minutes, that patients spend for their appointments may be assumed to have a normal distribution with standard deviation 3.4.
  4. Given that the total length of time spent for the 24 appointments is 285 minutes, carry out the test. [7]
  5. In part (iv) it was necessary to use the fact that the sample mean is normally distributed. Give a reason why you know that this is true in this case.
OCR AS Pure 2017 Specimen Q8
3 marks Easy -1.8
8 A club secretary wishes to survey a sample of members of his club. He uses all members present at a particular meeting as his sample.
  1. Explain why this sample is likely to be biased.
    Later the secretary decides to choose a random sample of members.
    The club has 253 members and the secretary numbers the members from 1 to 253. He then generates random 3-digit numbers on his calculator. The first six random numbers generated are 156, 965, 248, 156, 073 and 181. The secretary uses each number, where possible, as the number of a member in the sample.
  2. Find possible numbers for the first four members in the sample.
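One way to turn a stream of random numbers into a sample is to keep each number that identifies a member and has not already been used. The sketch below assumes that simplest rule: numbers outside 1 to 253 are rejected and repeats are skipped. Other valid schemes exist, which is why the question asks for "possible" numbers. The function name is illustrative.

```python
def select_members(random_numbers, population_size, sample_size):
    """Keep usable random numbers in order, skipping out-of-range values and repeats."""
    chosen = []
    for number in random_numbers:
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


generated = [156, 965, 248, 156, 73, 181]  # 073 is read as member 73
print(select_members(generated, 253, 4))  # [156, 248, 73, 181]
```

Here 965 is rejected because no member has that number, and the second 156 is skipped because a member cannot be sampled twice.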
AQA AS Paper 2 2019 June Q11
1 mark Easy -2.0
11 A survey is undertaken to find out the most popular political party in London.
The first 1100 available people from London are surveyed.
Identify the name of this type of sampling.
Circle your answer.
simple random
opportunity
stratified
quota
AQA AS Paper 2 2021 June Q12
1 mark Easy -1.8
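12 The table below shows the total monthly rainfall (in mm) in England and Wales in a sample of six years. The sample of six years was taken from a data set covering every year from 1768 to 2018.

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1768 | 109.2 | 129.1 | 12.8 | 85.6 | 46.1 | 148.7 | 121.9 | 91.6 | 136.8 | 119.4 | 142.5 | 103.6 |
| 1818 | 98.0 | 65.8 | 134.7 | 135.6 | 55.9 | 31.2 | 50.4 | 21.0 | 115.6 | 75.8 | 112.0 | 46.8 |
| 1868 | 99.9 | 62.2 | 71.1 | 61.4 | 36.7 | 16.5 | 20.0 | 106.7 | 90.2 | 95.6 | 61.4 | 185.6 |
| 1918 | 91.2 | 61.6 | 36.7 | 63.3 | 58.5 | 30.9 | 110.0 | 62.9 | 189.5 | 69.1 | 66.3 | 122.5 |
| 1968 | 85.8 | 47.6 | 59.5 | 68.8 | 78.7 | 94.0 | 107.8 | 72.2 | 148.1 | 99.0 | 69.6 | 84.2 |
| 2018 | 104.5 | 52.8 | 115.1 | 91.4 | 51.9 | 16.5 | 39.6 | 76.7 | 67.0 | 75.8 | 104.9 | 116.0 |
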
Deduce the sampling method most likely to have been used to collect this sample. Circle your answer.
[1 mark]
Opportunity
Simple Random
Stratified
Systematic
AQA AS Paper 2 2022 June Q12
1 mark Easy -1.8
12 Shelly organised an activity weekend for 15 groups of 10 people.
She decided to collect a sample to obtain feedback about the weekend.
To collect the sample Shelly selected two groups at random and then interviewed each member of these two groups. State the name of this sampling method.
Circle your answer.
[1 mark]
Cluster
Opportunity
Stratified
Systematic
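The procedure Shelly followed (choose whole groups at random, then include every member of the chosen groups) can be sketched as below. The group and person labels are hypothetical placeholders, since the question gives no names, and the seed is fixed only to make the run repeatable.

```python
import random


def cluster_sample(groups, clusters_to_pick, rng=random):
    """Pick whole groups at random and return every member of the chosen groups."""
    chosen_groups = rng.sample(sorted(groups), clusters_to_pick)
    return {name: groups[name] for name in chosen_groups}


# Hypothetical labels: 15 groups of 10 people each.
groups = {f"group {g}": [f"person {g}-{i}" for i in range(1, 11)]
          for g in range(1, 16)}
sample = cluster_sample(groups, 2, random.Random(0))
print(sum(len(members) for members in sample.values()))  # 20
```

The randomness acts on groups, not on individuals, which is what separates this method from simple random or stratified sampling.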
Edexcel AS Paper 2 Specimen Q1
4 marks Easy -1.8
  1. Sara is investigating the variation in daily maximum gust, \(t \mathrm { kn }\), for Camborne in June and July 1987.
She used the large data set to select a sample of size 20 from the June and July data for 1987. Sara selected the first value using a random number from 1 to 4 and then selected every third value after that.
  1. State the sampling technique Sara used.
  2. From your knowledge of the large data set explain why this process may not generate a sample of size 20.
    The data Sara collected are summarised as follows
    $$n = 20 \quad \sum t = 374 \quad \sum t ^ { 2 } = 7600$$
  3. Calculate the standard deviation.
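A standard deviation can be recovered from the summary statistics alone, using \(S_{tt} = \sum t^2 - (\sum t)^2 / n\). The sketch below shows both common divisors because the source does not say which convention the mark scheme expects; the function name and the `sample` flag are illustrative.

```python
from math import sqrt


def sd_from_sums(n, sum_x, sum_x2, sample=False):
    """Standard deviation from n, the sum of x and the sum of x squared."""
    sxx = sum_x2 - sum_x**2 / n  # sum of squared deviations from the mean
    divisor = n - 1 if sample else n
    return sqrt(sxx / divisor)


print(round(sd_from_sums(20, 374, 7600), 2))               # divisor n
print(round(sd_from_sums(20, 374, 7600, sample=True), 2))  # divisor n - 1
```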
Edexcel Paper 3 2018 June Q4
13 marks Easy -1.3
  1. Charlie is studying the time it takes members of his company to travel to the office. He stands by the door to the office from 0840 to 0850 one morning and asks workers, as they arrive, how long their journey was.
    1. State the sampling method Charlie used.
    2. State and briefly describe an alternative method of non-random sampling Charlie could have used to obtain a sample of 40 workers.
    Taruni decided to ask every member of the company the time, \(x\) minutes, it takes them to travel to the office.
  2. State the data selection process Taruni used.
    Taruni's results are summarised by the box plot and summary statistics below.
    [Figure: box plot of journey times, not reproduced]
    $$n = 95 \quad \sum x = 4133 \quad \sum x ^ { 2 } = 202294$$
  3. Write down the interquartile range for these data.
  4. Calculate the mean and the standard deviation for these data.
  5. State, giving a reason, whether you would recommend using the mean and standard deviation or the median and interquartile range to describe these data.
    Rana and David both work for the company and have both moved house since Taruni collected her data. Rana's journey to work has changed from 75 minutes to 35 minutes and David's journey to work has changed from 60 minutes to 33 minutes. Taruni drew her box plot again and only had to change two values.
  6. Explain which two values Taruni must have changed and whether each of these values has increased or decreased.
CAIE S2 2011 November Q3
7 marks Easy -1.2
Jack has to choose a random sample of 8 people from the 750 members of a sports club.
  1. Explain fully how he can use random numbers to choose the sample. [3]
Jack asks each person in the sample how much they spent last week in the club café. The results, in dollars, were as follows.
$$15 \quad 25 \quad 30 \quad 8 \quad 12 \quad 18 \quad 27 \quad 25$$
  2. Find unbiased estimates of the population mean and variance. [3]
  3. Explain briefly what is meant by 'population' in this question. [1]
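The unbiased estimates can be checked with Python's standard library, where `statistics.variance` uses the divisor \(n - 1\). This is a quick numerical check on the eight amounts given in the question, not a substitute for showing the working.

```python
from statistics import mean, variance

spend = [15, 25, 30, 8, 12, 18, 27, 25]  # dollars, from the question

print(mean(spend))      # sample mean, an unbiased estimate of the population mean
print(variance(spend))  # divisor n - 1, an unbiased estimate of the population variance
```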
CAIE S2 2020 Specimen Q2
3 marks Easy -1.8
Describe briefly how to use a random number generator to obtain a sample of 10 students from a group of 50 students. [3]
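The usual description (number the students, generate random numbers in that range, ignore repeats, stop at 10) corresponds to drawing without replacement. A minimal sketch follows, assuming the students are numbered 1 to 50; the function name and the seed are illustrative, and the seed is there only so that a run can be repeated.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Number the population 1..N and draw distinct numbers, each equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


students = simple_random_sample(50, 10, seed=1)
print(students)
```

`random.sample` never returns the same number twice, so it plays the role of the "ignore repeats" step in the written method.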
Edexcel S2 Q1
4 marks Easy -2.0
The manager of a leisure club is considering a change to the club rules. The club has a large membership and the manager wants to take the views of the members into consideration before deciding whether or not to make the change.
  1. Explain briefly why the manager might prefer to use a sample survey rather than a census to obtain the views. [2]
  2. Suggest a suitable sampling frame. [1]
  3. Identify the sampling units. [1]
Edexcel S2 Q6
20 marks Moderate -0.3
A magazine has a large number of subscribers who each pay a membership fee that is due on January 1st each year. Not all subscribers pay their fee by the due date. Based on correspondence from the subscribers, the editor of the magazine believes that 40% of subscribers wish to change the name of the magazine. Before making this change the editor decides to carry out a sample survey to obtain the opinions of the subscribers. He uses only those members who have paid their fee on time.
  1. Define the population associated with the magazine. [1]
  2. Suggest a suitable sampling frame for the survey. [1]
  3. Identify the sampling units. [1]
  4. Give one advantage and one disadvantage that would have resulted from the editor using a census rather than a sample survey. [2]
As a pilot study the editor took a random sample of 25 subscribers.
  5. Assuming that the editor's belief is correct, find the probability that exactly 10 of these subscribers agreed with changing the name. [3]
In fact only 6 subscribers agreed to the name being changed.
  6. Stating your hypotheses clearly, test, at the 5% level of significance, whether or not the percentage agreeing to the change is less than the editor believes. [5]
The full survey is to be carried out using 200 randomly chosen subscribers.
  7. Again assuming the editor's belief to be correct and using a suitable approximation, find the probability that in this sample there will be at least 71 but fewer than 83 subscribers who agree to the name being changed. [7]