2.01a Population and sample: terminology

105 questions

OCR MEI Paper 2 2019 June Q14
9 marks Moderate -0.8
14 The pre-release material includes data concerning crude death rates in different countries of the world. Fig. 14.1 shows some information concerning crude death rates in countries in Europe and in Africa. \begin{table}[h]
\begin{tabular}{|l|c|c|}
\hline
 & Europe & Africa \\
\hline
\(n\) & 48 & 56 \\
\hline
minimum & 6.28 & 3.58 \\
\hline
lower quartile & 8.50 & 7.31 \\
\hline
median & 9.53 & 8.71 \\
\hline
upper quartile & 11.41 & 11.93 \\
\hline
maximum & 14.46 & 14.89 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 14.1}
\end{table}
  1. Use your knowledge of the large data set to suggest a reason why the statistics in Fig. 14.1 refer to only 48 of the 51 European countries.
  2. Use the information in Fig. 14.1 to show that there are no outliers in either data set.
    The crude death rate in Libya is recorded as 3.58 and the population of Libya is recorded as 6411776.
  3. Calculate an estimate of the number of deaths in Libya in a year.
    The median age in Germany is 46.5 and the crude death rate is 11.42. The median age in Cyprus is 36.1 and the crude death rate is 6.62.
  4. Explain why a country like Germany, with a higher median age than Cyprus, might also be expected to have a higher crude death rate than Cyprus.
    Fig. 14.2 shows a scatter diagram of median age against crude death rate for countries in Africa and Fig. 14.3 shows a scatter diagram of median age against crude death rate for countries in Europe. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{95eb3bcc-6d3c-4f7e-9b27-5e046ab57ec5-10_678_1221_1975_248} \captionsetup{labelformat=empty} \caption{Fig. 14.2}
    \end{figure} \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{95eb3bcc-6d3c-4f7e-9b27-5e046ab57ec5-11_588_1248_223_228} \captionsetup{labelformat=empty} \caption{Fig. 14.3}
    \end{figure} The rank correlation coefficient for the data shown in Fig. 14.2 is -0.281206.
    The rank correlation coefficient for the data shown in Fig. 14.3 is 0.335215.
  5. Compare and contrast what may be inferred about the relationship between median age and crude death rate in countries in Africa and in countries in Europe.
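A numerical check of parts 2 and 3 can be sketched as below. It rests on two assumptions that the question does not state: that an outlier is a value more than 1.5 × IQR beyond a quartile, and that a crude death rate is quoted as deaths per 1000 population per year. The function names are illustrative only.

```python
# Summary statistics copied from Fig. 14.1.
FIG_14_1 = {
    "Europe": {"min": 6.28, "q1": 8.50, "q3": 11.41, "max": 14.46},
    "Africa": {"min": 3.58, "q1": 7.31, "q3": 11.93, "max": 14.89},
}


def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences under the assumed k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def has_outliers(stats):
    """True if the minimum or the maximum lies outside the fences."""
    lower, upper = outlier_fences(stats["q1"], stats["q3"])
    return stats["min"] < lower or stats["max"] > upper


def estimated_deaths(rate_per_1000, population):
    """Assumes the crude death rate is deaths per 1000 people per year."""
    return rate_per_1000 / 1000 * population


for region, stats in FIG_14_1.items():
    lower, upper = outlier_fences(stats["q1"], stats["q3"])
    print(region, round(lower, 3), round(upper, 3), has_outliers(stats))

libya_deaths = estimated_deaths(3.58, 6411776)
print(round(libya_deaths))
```

Only the minimum and maximum need checking: if the extreme values lie inside the fences, every other value does too.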
OCR MEI Paper 2 2023 June Q9
5 marks Easy -1.2
9 The pre-release material contains information concerning the median income of taxpayers in different areas of London. Some of the data for Camden is shown in the table below. The years quoted in this question refer to the end of the financial years used in the pre-release material. For example, the year 2004 in the table refers to the year 2003/04 in the pre-release material.
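\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Year & 2004 & 2005 & 2006 & 2007 & 2008 & 2009 & 2010 & 2011 \\
\hline
Median income in \(\pounds\) & 21300 & 23200 & 24200 & 25900 & 26900 & \#N/A & 28400 & 29400 \\
\hline
\end{tabular}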
  1. Explain whether these data are a sample or a population of Camden taxpayers.
    A time series for the data is shown below. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Median income of taxpayers in Camden 2004-2011} \includegraphics[alt={},max width=\textwidth]{11788aaf-98fb-4a78-8a40-a40743b1fe15-07_624_1469_950_242}
    \end{figure} The LINEST function on a spreadsheet is used to formulate the following model for the data: \(I = 1115 Y - 2212950\), where \(I =\) median income of taxpayers in \(\pounds\) and \(Y =\) year.
  2. Use this model to find an estimate of the median income of taxpayers in Camden in 2009.
  3. Give two reasons why this estimate is likely to be close to the true value.
    The median income of taxpayers in Croydon in 2009 is also not available.
  4. Use your knowledge of the pre-release material to explain whether the model used in part (b) would give a reasonable estimate of the missing value for Croydon.
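Evaluating the quoted model at the missing year is a one-line calculation; the sketch below also compares the result with the tabulated values for the neighbouring years. The function name is an assumption made for illustration.

```python
def median_income(year):
    """Model quoted in the question: I = 1115 Y - 2212950."""
    return 1115 * year - 2212950


# Tabulated Camden values for the years either side of the gap.
known = {2008: 26900, 2010: 28400}

estimate_2009 = median_income(2009)
print(estimate_2009)  # 27085
print(known[2008] < estimate_2009 < known[2010])  # True
```

Because 2009 lies inside the range of years used to fit the model (2004 to 2011), this is interpolation rather than extrapolation.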
OCR MEI Paper 2 2024 June Q9
4 marks Easy -1.8
9 A teacher is investigating how pupils travel to and from school each day. Pupils can either travel by bus, train, car, bicycle or walk. The teacher decides to collect a sample of size 60 for the investigation.
  1. The teacher lives in a village 10 miles away from the school. Explain how collecting a sample which just consists of pupils who live in the same village as the teacher might introduce bias.
    The table below shows how many students there are in each year.
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    Year 7 & Year 8 & Year 9 & Year 10 & Year 11 \\
    \hline
    86 & 105 & 107 & 101 & 101 \\
    \hline
    \end{tabular}
  2. The teacher decides to use the method of proportional stratified sampling. Calculate the number of pupils in the sample who are in Year 9.
    The teacher generates a sample of 10 pupils from the 86 in Year 7 by listing them in alphabetical order and selecting the first name on the list and every ninth name thereafter.
  3. Explain whether this method will generate a simple random sample of the pupils who travel in Year 7.
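The allocation in part 2 and the list positions used in the Year 7 selection can be sketched as follows. Rounding each stratum to the nearest whole pupil is an assumption; in general the rounded allocations need not add up to the target, although with these figures they do.

```python
# Year-group sizes from the table in the question.
year_sizes = {"Year 7": 86, "Year 8": 105, "Year 9": 107, "Year 10": 101, "Year 11": 101}
SAMPLE_SIZE = 60

total = sum(year_sizes.values())

# Proportional allocation: each stratum's share of the sample matches
# its share of the school, rounded to the nearest whole pupil.
allocation = {
    year: round(SAMPLE_SIZE * size / total) for year, size in year_sizes.items()
}
print(total, allocation)

# Positions in the alphabetical Year 7 list: the first name, then every
# ninth name thereafter.
selected_positions = list(range(1, year_sizes["Year 7"] + 1, 9))
print(selected_positions)
```

The second printout shows that the starting point is fixed, so the same positions are chosen every time and pupils in other positions cannot be selected.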
Edexcel S1 2007 January Q6
5 marks Easy -2.0
  1. (a) Give two reasons to justify the use of statistical models.
It has been suggested that there are 7 stages involved in creating a statistical model. They are summarised below, with stages 3, 4 and 7 missing.
Stage 1. The recognition of a real-world problem.
Stage 2. A statistical model is devised.
Stage 3.
Stage 4.
Stage 5. Comparisons are made against the devised model.
Stage 6. Statistical concepts are used to test how well the model describes the real-world problem.
Stage 7.
(b) Write down the missing stages.
Edexcel S1 2002 June Q2
4 marks Easy -1.2
2. Statistical models can be used to describe real world problems. Explain the process involved in the formulation of a statistical model.
(4)
Edexcel S2 2014 January Q2
10 marks Moderate -0.3
2. Bill owns a restaurant. Over the next four weeks Bill decides to carry out a sample survey to obtain the customers' opinions.
  1. Suggest a suitable sampling frame for the sample survey.
  2. Identify the sampling units.
  3. Give one advantage and one disadvantage of taking a census rather than a sample survey.
    Bill believes that only \(30 \%\) of customers would like a greater choice on the menu. He takes a random sample of 50 customers and finds that 20 of them would like a greater choice on the menu.
  4. Test, at the \(5 \%\) significance level, whether or not the percentage of customers who would like a greater choice on the menu is more than Bill believes. State your hypotheses clearly.
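For part 4, one way to carry out the test is to compute the exact upper-tail probability under \(H_0: p = 0.3\) against \(H_1: p > 0.3\), with \(X \sim \mathrm{B}(50, 0.3)\). The sketch below sums the binomial probabilities directly rather than reading them from tables; the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ B(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# One-tailed test at the 5% level: 20 of the 50 sampled customers
# wanted a greater choice.
p_value = binomial_upper_tail(50, 0.30, 20)
reject_h0 = p_value < 0.05
print(round(p_value, 4), reject_h0)
```

If `reject_h0` is `False`, the sample does not give sufficient evidence at the 5% level that the percentage is more than Bill believes.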
Edexcel S2 2015 January Q3
11 marks Moderate -0.8
3. Explain what you understand by
  1. a statistic,
  2. a sampling distribution.
    A factory stores screws in packets. A small packet contains 100 screws and a large packet contains 200 screws. The factory keeps small and large packets in the ratio 4:3 respectively.
  3. Find the mean and the variance of the number of screws in the packets stored at the factory.
    A random sample of 3 packets is taken from the factory and \(Y _ { 1 } , Y _ { 2 }\) and \(Y _ { 3 }\) denote the number of screws in each of these packets.
  4. List all the possible samples.
  5. Find the sampling distribution of \(\bar { Y }\)
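Parts 3 to 5 can be checked by enumerating every ordered sample. The sketch below assumes the three packets are drawn independently, each being small with probability 4/7 and large with probability 3/7 (that is, the stock is large enough for the ratio to be unaffected by sampling). Exact fractions are used to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Number of screws per packet and its probability (ratio 4:3).
packet = {100: Fraction(4, 7), 200: Fraction(3, 7)}

mean = sum(x * p for x, p in packet.items())
variance = sum(x**2 * p for x, p in packet.items()) - mean**2
print(mean, variance)

# Enumerate all 2^3 ordered samples (Y1, Y2, Y3) and accumulate the
# probability attached to each value of the sample mean.
sampling_dist = defaultdict(Fraction)
for sample in product(packet, repeat=3):
    prob = Fraction(1)
    for y in sample:
        prob *= packet[y]
    sampling_dist[Fraction(sum(sample), 3)] += prob

for ybar in sorted(sampling_dist):
    print(ybar, sampling_dist[ybar])
```

The loop over `product(packet, repeat=3)` is also the list of possible samples asked for in part 4.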
Edexcel S2 2021 January Q6
10 marks Moderate -0.8
6. The owner of a very large youth club has designed a new method for allocating people to teams. Before introducing the method he decided to find out how the members of the youth club might react.
  1. Explain why the owner decided to take a random sample of the youth club members rather than ask all the youth club members.
  2. Suggest a suitable sampling frame.
  3. Identify the sampling units.
    The new method uses a bag containing a large number of balls. Each ball is numbered either 20, 50 or 70.
    When a ball is selected at random, the random variable \(X\) represents the number on the ball where $$\mathrm { P } ( X = 20 ) = p \quad \mathrm { P } ( X = 50 ) = q \quad \mathrm { P } ( X = 70 ) = r$$
    A youth club member takes a ball from the bag, records its number and replaces it in the bag. He then takes a second ball from the bag, records its number and replaces it in the bag. The random variable \(M\) is the mean of the 2 numbers recorded.
    Given that $$\mathrm { P } ( M = 20 ) = \frac { 25 } { 64 } \quad \mathrm { P } ( M = 60 ) = \frac { 1 } { 16 } \quad \text { and } \quad q > r$$
  4. show that \(\mathrm { P } ( M = 50 ) = \frac { 1 } { 16 }\)
Edexcel S2 2022 January Q6
10 marks Standard +0.3
6
  1. Explain what you understand by the sampling distribution of a statistic.
    At Sam's cafe a standard breakfast consists of 6 breakfast items. Customers can then choose to upgrade to a medium breakfast by adding 1 extra breakfast item or they can upgrade to a large breakfast by adding 2 extra breakfast items. Standard, medium and large breakfasts are sold in the ratio \(6 : 3 : 2\) respectively. A random sample of 2 customers is taken from customers who have bought a breakfast from Sam's cafe on a particular day.
  2. Find the sampling distribution for the total number, \(T\), of breakfast items bought by these 2 customers. Show your working clearly.
  3. Find \(\mathrm { E } ( T )\)
Edexcel S2 2023 January Q2
11 marks Moderate -0.8
  1. A bag contains a large number of coins. It only contains 20p and 50p coins. A random sample of 3 coins is taken from the bag.
    1. List all the possible combinations of 3 coins that might be taken.
    Let \(\bar { X }\) represent the mean value of the 3 coins taken.
    Part of the sampling distribution of \(\bar { X }\) is given below.
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \(\bar { x }\) & 20 & \(a\) & \(b\) & 50 \\
    \hline
    \(\mathrm { P } ( \bar { X } = \bar { x } )\) & \(\frac { 4913 } { 8000 }\) & \(c\) & \(d\) & \(\frac { 27 } { 8000 }\) \\
    \hline
    \end{tabular}
  2. Write down the value of \(a\) and the value of \(b\).
    The probability of taking a 20p coin at random from the bag is \(p\).
    The probability of taking a 50p coin at random from the bag is \(q\).
  3. Find the value of \(p\) and the value of \(q\).
  4. Hence, find the value of \(c\) and the value of \(d\).
    Let \(M\) represent the mode of the 3 coins taken at random from the bag.
  5. Find the sampling distribution of \(M\)
Edexcel S2 2018 June Q4
6 marks Moderate -0.8
4. The volume of milk, \(M\) litres, in cartons produced by a dairy, has distribution \(\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)\), where \(\mu\) and \(\sigma\) are unknown. A random sample of 12 cartons is taken and the volume of milk in each carton is measured ( \(M _ { 1 } , M _ { 2 } , \ldots , M _ { 12 }\) ). A statistic \(X\) is based on this sample.
  1. Explain what is meant by "a random sample" in this case.
  2. State the population in this case.
  3. Write down the distribution of \(\frac { M _ { 12 } - \mu } { \sigma }\)
  4. Explain what you understand by the sampling distribution of \(X\).
  5. State, giving a reason, which of the following is not a statistic based on this sample.
    (I) \(3 M _ { 1 } + \frac { 2 M _ { 11 } } { 6 }\) (II) \(\sum _ { i = 1 } ^ { 12 } \left( \frac { M _ { i } - \mu } { \sigma } \right) ^ { 2 }\) (III) \(\sum _ { i = 1 } ^ { 12 } \left( 2 M _ { i } - 3 \right)\)
Edexcel S2 2023 June Q2
4 marks Easy -1.8
  1. (a) State one characteristic of a population that would make a census a practical alternative to sampling.
A leisure centre has 2500 members.
It asks a sample of 300 members for their opinions on the fees it charges for using the centre. For the sample,
(b) (i) identify a suitable sampling frame,
(ii) identify a sampling unit.
The leisure centre has the following pieces of information.
\(A\) is the list of the different types of membership that can be paid for by members.
\(B\) is the mean of the membership fees paid by all 2500 members.
\(C\) is the number in the sample of 300 members who are satisfied with the fees they pay.
(c) State the piece of information that is a statistic. Give a reason for your answer.
Edexcel S2 2018 Specimen Q3
11 marks Moderate -0.3
3. Explain what you understand by
  1. a statistic,
  2. a sampling distribution.
    A factory stores screws in packets. A small packet contains 100 screws and a large packet contains 200 screws. The factory keeps small and large packets in the ratio 4:3 respectively.
  3. Find the mean and the variance of the number of screws in the packets stored at the factory.
    A random sample of 3 packets is taken from the factory and \(Y _ { 1 } , Y _ { 2 }\) and \(Y _ { 3 }\) denote the number of screws in each of these packets.
  4. List all the possible samples.
  5. Find the sampling distribution of \(\bar { Y }\)
Edexcel S2 Specimen Q1
5 marks Easy -1.8
  1. Explain what you understand by
    1. a population,
    2. a statistic.
    A researcher took a sample of 100 voters from a certain town and asked them who they would vote for in an election. The proportion who said they would vote for Dr Smith was \(35 \%\).
  2. State the population and the statistic in this case.
  3. Explain what you understand by the sampling distribution of this statistic.
Edexcel S2 2002 January Q1
7 marks Easy -1.8
  1. Explain what you understand by
    1. a population,
    2. a statistic.
    A questionnaire concerning attitudes to classes in a college was completed by a random sample of 50 students. The students gave the college a mean approval rating of 75\%.
  2. Identify the population and the statistic in this situation.
  3. Explain what you understand by the sampling distribution of this statistic.
Edexcel S2 2003 January Q6
20 marks Moderate -0.8
6. A magazine has a large number of subscribers who each pay a membership fee that is due on January 1st each year. Not all subscribers pay their fee by the due date. Based on correspondence from the subscribers, the editor of the magazine believes that \(40 \%\) of subscribers wish to change the name of the magazine. Before making this change the editor decides to carry out a sample survey to obtain the opinions of the subscribers. He uses only those members who have paid their fee on time.
  1. Define the population associated with the magazine.
  2. Suggest a suitable sampling frame for the survey.
  3. Identify the sampling units.
  4. Give one advantage and one disadvantage that would have resulted from the editor using a census rather than a sample survey.
    As a pilot study the editor took a random sample of 25 subscribers.
  5. Assuming that the editor's belief is correct, find the probability that exactly 10 of these subscribers agreed with changing the name.
    In fact only 6 subscribers agreed to the name being changed.
  6. Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether or not the percentage agreeing to the change is less than the editor believes.
    The full survey is to be carried out using 200 randomly chosen subscribers.
  7. Again assuming the editor's belief to be correct and using a suitable approximation, find the probability that in this sample there will be at least 71 but fewer than 83 subscribers who agree to the name being changed.
Edexcel S2 2005 January Q2
7 marks Easy -1.8
2. (a) Explain what you understand by (i) a population and (ii) a sampling frame. The population and the sampling frame may not be the same.
(b) Explain why this might be the case.
(c) Give an example, justifying your choices, to illustrate when you might use
  1. a census,
  2. a sample.
Edexcel S2 2008 January Q1
4 marks Easy -2.0
  1. (a) Explain what you understand by a census.
Each cooker produced at GT Engineering is stamped with a unique serial number. GT Engineering produces cookers in batches of 2000. Before selling them, they test a random sample of 5 to see what electric current overload they will take before breaking down.
(b) Give one reason, other than to save time and cost, why a sample is taken rather than a census.
(c) Suggest a suitable sampling frame from which to obtain this sample.
(d) Identify the sampling units.
Edexcel S2 2001 June Q1
6 marks Easy -1.8
  1. The small village of Tornep has a preservation society which is campaigning for a new by-pass to be built. The society needs to measure
    1. the strength of opinion amongst the residents of Tornep for the scheme and
    2. the flow of traffic through the village on weekdays.
    The society wants to know whether to use a census or a sample survey for each of these measures.
    (a) In each case suggest which they should use and specify a suitable sampling frame. For the measurement of traffic flow through Tornep,
    (b) suggest a suitable statistic and a possible statistical model for this statistic.
Edexcel S2 2005 June Q4
4 marks Easy -1.8
4. Explain what you understand by
  1. a sampling unit,
  2. a sampling frame,
  3. a sampling distribution.
Edexcel S2 2011 June Q1
3 marks Easy -1.8
  1. A factory produces components. Each component has a unique identity number and it is assumed that \(2 \%\) of the components are faulty. On a particular day, a quality control manager wishes to take a random sample of 50 components.
    1. Identify a sampling frame.
    The statistic \(F\) represents the number of faulty components in the random sample of size 50.
  2. Specify the sampling distribution of \(F\).
Edexcel S2 Q3
11 marks Moderate -0.8
3. An athletics teacher has kept careful records over the past 20 years of results from school sports days. There are always 10 competitors in the javelin competition. Each competitor is allowed 3 attempts and the teacher has a record of the distances thrown by each competitor at each attempt. The random variable \(D\) represents the greatest distance thrown by each competitor and the random variable \(A\) represents the number of the attempt in which the competitor achieved their greatest distance.
  1. State which of the two random variables \(D\) or \(A\) is continuous.
    A new athletics coach wishes to take a random sample of the records of 36 javelin competitors.
  2. Specify a suitable sampling frame and explain how such a sample could be taken.
    (2 marks)
    The coach assumes that \(\mathrm { P } ( A = 2 ) = \frac { 1 } { 3 }\), and is therefore surprised to find that 20 of the 36 competitors in the sample achieved their greatest distance on their second attempt. Using a suitable approximation, and assuming that \(\mathrm { P } ( A = 2 ) = \frac { 1 } { 3 }\),
  3. find the probability that at least 20 of the competitors achieved their greatest distance on their second attempt.
    (6 marks)
  4. Comment on the assumption that \(\mathrm { P } ( A = 2 ) = \frac { 1 } { 3 }\).
Edexcel S3 2021 October Q6
12 marks Standard +0.3
6. Amala believes that the resting heart rate is lower in men who exercise regularly compared to men who do not exercise regularly. She measures the resting heart rate, \(h\), of a random sample of 50 men who exercise regularly and a random sample of 40 men who do not exercise regularly. Her results are summarised in the table below.
\begin{tabular}{|l|c|c|c|c|c|}
\hline
 & Sample size & \(\sum h\) & \(\sum h ^ { 2 }\) & Unbiased estimate of the mean & Unbiased estimate of the variance \\
\hline
Exercise regularly & 50 & 3270 & 214676 & \(\alpha\) & \(\beta\) \\
\hline
Do not exercise regularly & 40 & 2832 & 201660 & 70.8 & 29.6 \\
\hline
\end{tabular}
  1. Calculate the value of \(\alpha\) and the value of \(\beta\)
  2. Test, at the \(5 \%\) level of significance, whether there is evidence to support Amala's belief. State your hypotheses clearly.
  3. Explain the significance of the central limit theorem to the test in part (b).
  4. State two assumptions you have made in carrying out the test in part (b).
Edexcel S3 2018 Specimen Q6
13 marks Standard +0.3
6. As part of an investigation, a random sample was taken of 50 footballers who had completed an obstacle course in the early morning. The time taken by each of these footballers to complete the obstacle course, \(x\) minutes, was recorded and the results are summarised by $$\sum x = 1570 \quad \text { and } \quad \sum x ^ { 2 } = 49467.58$$
  1. Find unbiased estimates for the mean and variance of the time taken by footballers to complete the obstacle course in the early morning.
    An independent random sample was taken of 50 footballers who had completed the same obstacle course in the late afternoon. The time taken by each of these footballers to complete the obstacle course, \(y\) minutes, was recorded and the results are summarised as $$\bar { y } = 30.9 \quad \text { and } \quad s _ { y } ^ { 2 } = 3.03$$
  2. Test, at the \(5 \%\) level of significance, whether or not the mean time taken by footballers to complete the obstacle course in the early morning, is greater than the mean time taken by footballers to complete the obstacle course in the late afternoon. State your hypotheses clearly.
  3. Explain the relevance of the Central Limit Theorem to the test in part (b).
  4. State an assumption you have made in carrying out the test in part (b).
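The calculations behind parts 1 and 2 can be sketched as below. The sketch assumes that \(s_y^2 = 3.03\) is already an unbiased estimate, that both sample variances may stand in for the population variances (samples of size 50), and that 1.6449 is the one-tailed 5% critical value of the standard normal distribution. The function name is illustrative.

```python
from math import sqrt


def unbiased_estimates(n, sum_x, sum_x2):
    """Unbiased estimates of the mean and variance from summary totals."""
    mean = sum_x / n
    variance = (sum_x2 - n * mean**2) / (n - 1)
    return mean, variance


# Early-morning sample, summarised by the totals given in the question.
n_x, n_y = 50, 50
mean_x, var_x = unbiased_estimates(n_x, 1570, 49467.58)

# Late-afternoon sample, summarised directly in the question.
mean_y, var_y = 30.9, 3.03

# Two-sample z statistic for H0: mu_x = mu_y against H1: mu_x > mu_y.
z = (mean_x - mean_y) / sqrt(var_x / n_x + var_y / n_y)

CRITICAL_5_PERCENT_ONE_TAILED = 1.6449
print(round(mean_x, 2), round(var_x, 4), round(z, 3))
print(z > CRITICAL_5_PERCENT_ONE_TAILED)
```

The statistic is treated as approximately normal because each sample mean is based on 50 observations, which is where the Central Limit Theorem in part 3 comes in.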
Edexcel S3 Specimen Q7
17 marks Moderate -0.3
  1. A large company surveyed its staff to investigate the awareness of company policy. The company employs 6000 full-time staff and 4000 part-time staff.
    1. Describe how a stratified sample of 200 staff could be taken.
    2. Explain an advantage of using a stratified sample rather than a simple random sample.
    A random sample of 80 full-time staff and an independent random sample of 80 part-time staff were given a test of policy awareness. The results are summarised in the table below.
    \begin{tabular}{|l|c|c|}
    \hline
     & Mean score \(( \bar { x } )\) & Variance of scores \(\left( s ^ { 2 } \right)\) \\
    \hline
    Full-time staff & 52 & 21 \\
    \hline
    Part-time staff & 50 & 19 \\
    \hline
    \end{tabular}
  2. Stating your hypotheses clearly, test, at the \(1 \%\) level of significance, whether or not the mean policy awareness scores for full-time and part-time staff are different.
  3. Explain the significance of the Central Limit Theorem to the test in part (c).
  4. State an assumption you have made in carrying out the test in part (c).
    After all the staff had completed a training course the 80 full-time staff and the 80 part-time staff were given another test of policy awareness. The value of the test statistic \(z\) was 2.53.
  5. Comment on the awareness of company policy for the full-time and part-time staff in light of this result. Use a \(1 \%\) level of significance.
  6. Interpret your answers to part (c) and part (f).