2.01a Population and sample: terminology

105 questions

Edexcel S3 2006 January Q5
13 marks Standard +0.3
5. Upon entering a school, a random sample of eight girls and an independent random sample of eighty boys were given the same examination in mathematics. The girls and boys were then taught in separate classes. After one year, they were all given another common examination in mathematics. The means and standard deviations of the boys' and the girls' marks are shown in the table.
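Examination marks
\begin{tabular}{lcccc}
 & \multicolumn{2}{c}{Upon entry} & \multicolumn{2}{c}{After 1 year} \\
 & Mean & Standard deviation & Mean & Standard deviation \\
Boys & 50 & 12 & 59 & 6 \\
Girls & 53 & 12 & 62 & 6 \\
\end{tabular}
You may assume that the test results are normally distributed.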
You may assume that the test results are normally distributed.
  1. Test, at the \(5 \%\) level of significance, whether or not the difference between the means of the boys' and girls' results was significant when they entered school.
  2. Test, at the \(5 \%\) level of significance, whether or not the mean mark of the boys is significantly less than the mean mark of the girls in the 'After 1 year' examination.
  3. Interpret the results found in part (a) and part (b).
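A minimal sketch of the test statistics for parts 1 and 2. It assumes, as is usual for this question, that the quoted standard deviations are treated as known population values. Because the marks are normally distributed, \(z = (\bar{x}_G - \bar{x}_B)/\sqrt{\sigma_B^2/n_B + \sigma_G^2/n_G}\) is then standard normal under \(H_0\), even with only 8 girls. The function name `two_sample_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """z statistic for mean_2 - mean_1, treating sd_1 and sd_2 as known population SDs."""
    standard_error = sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
    return (mean_2 - mean_1) / standard_error


# Boys: n = 80, girls: n = 8 (values from the table above)
z_entry = two_sample_z(50, 12, 80, 53, 12, 8)  # part 1: two-tailed test
z_after = two_sample_z(59, 6, 80, 62, 6, 8)    # part 2: one-tailed test (boys < girls)

z_two_tail = NormalDist().inv_cdf(0.975)  # upper 2.5% point, about 1.96
z_one_tail = NormalDist().inv_cdf(0.95)   # upper 5% point, about 1.645

print(f"upon entry:   z = {z_entry:.3f}, critical value = {z_two_tail:.3f}")
print(f"after 1 year: z = {z_after:.3f}, critical value = {z_one_tail:.3f}")
```

Under these assumptions neither statistic (about 0.674 and 1.348) exceeds its critical value, so neither difference is significant at the 5% level.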
Edexcel S3 2014 June Q1
5 marks Easy -1.8
  1. (a) Explain what you understand by a random sample from a finite population.
    (b) Give an example of a situation when it is not possible to take a random sample.
A college lecturer specialising in shoe design wants to change the way in which she organises practical work. She decides to gather ideas from her 75 students. She plans to give a questionnaire to a random sample of 8 of these students.
(c) (i) Describe the sampling frame that she should use.
(ii) Explain in detail how she should use a table of random numbers to obtain her sample.
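The procedure in part (c)(ii) can be simulated. Here is a minimal sketch, assuming the students are numbered 01 to 75 on the sampling frame and that two-digit numbers are read from the table in order. Numbers 00 and 76 to 99 are ignored, as are repeats, until 8 distinct numbers have been found. Python's `random` module stands in for the printed table, and `sample_from_random_digits` is a hypothetical name.

```python
import random


def sample_from_random_digits(population_size, sample_size, rng):
    """Select distinct labels 1..population_size by reading two-digit random numbers.

    Assumes population_size <= 99 so every label fits in two digits.
    """
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, 99)  # next two-digit number "from the table"
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)    # keep it; otherwise ignore it and read on
    return chosen


rng = random.Random(2014)  # fixed seed so the run is repeatable
sample = sample_from_random_digits(75, 8, rng)
print(sorted(sample))
```

Calling `sample_from_random_digits(70, 12, rng)` would cover the tennis-club question (Edexcel S3 Q2) further down in the same way.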
Edexcel S3 2015 June Q3
11 marks Moderate -0.8
3. A nursery has 16 staff and 40 children on its records. In preparation for an outing the manager needs an estimate of the mean weight of the people on its records and decides to take a stratified sample of size 14.
  1. Describe how this stratified sample should be taken.
The weights, \(x\) kg, of each of the 14 people selected are summarised as
$$\sum x = 437 \quad \text{and} \quad \sum x^2 = 26983$$
  2. Find unbiased estimates of the mean and the variance of the weights of all the people on the nursery's records.
  3. Estimate the standard error of the mean.
The estimates of the standard error of the mean for the staff and for the children are 5.11 and 1.10 respectively.
  4. Comment on these values with reference to your answer to part (c) and give a reason for any differences.
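The arithmetic behind parts 1 to 3 can be checked with a short sketch. It assumes proportional allocation to the two strata, the usual unbiased estimator \(s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\), and the estimated standard error \(s/\sqrt{n}\).

```python
from math import sqrt

# Proportional allocation of a sample of 14 across the two strata
staff, children, sample_size = 16, 40, 14
total = staff + children
staff_sample = sample_size * staff / total        # 4.0
children_sample = sample_size * children / total  # 10.0

# Summary statistics for the 14 weights (kg)
n, sum_x, sum_x2 = 14, 437, 26983
mean = sum_x / n
variance = (sum_x2 - sum_x**2 / n) / (n - 1)  # unbiased estimate (divisor n - 1)
standard_error = sqrt(variance / n)           # estimated standard error of the mean

print(staff_sample, children_sample)
print(round(mean, 2), round(variance, 1), round(standard_error, 2))
```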
Edexcel S1 Q1
6 marks Moderate -0.8
  1. (a) Explain briefly what you understand by a statistical model.
    (2 marks)
    A zoologist is analysing data on the weights of adult female otters.
    (b) Name a distribution that you think might be suitable for modelling such data.
    (1 mark)
    (c) Describe two features that you would expect to find in the distribution of the weights of adult female otters and that led to your choice in part (b).
    (2 marks)
    (d) Why might your choice in part (b) not be suitable for modelling the weights of all adult otters?
    (1 mark)
  2. For a geography project a student studied weather records kept by her school since 1993. To see if there was any evidence of global warming she worked out the mean temperature in degrees Celsius at noon for the month of June in each year.
Her results are shown in the table below.
\begin{tabular}{lcccccccc}
Year & 1993 & 1994 & 1995 & 1996 & 1997 & 1998 & 1999 & 2000 \\
Mean temperature (\({}^{\circ}\mathrm{C}\)) & 21.9 & 24.1 & 20.7 & 23.0 & 24.2 & 22.1 & 22.6 & 23.9 \\
\end{tabular}
AQA S2 2013 June Q6
13 marks Standard +0.3
6 A supermarket buys pears from a local supplier. The supermarket requires the mean weight of the pears to be at least 175 grams. William, the fresh-produce manager at the supermarket, suspects that the latest batch of pears delivered does not meet this requirement.
  1. William weighs a random sample of 6 pears, obtaining the following weights, in grams. $$\begin{array} { l l l l l l } 160.6 & 155.4 & 181.3 & 176.2 & 162.3 & 172.8 \end{array}$$ Previous batches of pears have had weights that could be modelled by a normal distribution with standard deviation 9.4 grams. Assuming that this still applies, show that a hypothesis test at the \(5 \%\) level of significance supports William's suspicion.
    (7 marks)
  2. William then weighs a random sample of 20 pears. The mean of this sample is 169.4 grams and \(s = 11.2\) grams, where \(s ^ { 2 }\) is an unbiased estimate of the population variance. Assuming that the population from which this sample is taken has a normal distribution but with unknown standard deviation, test William's suspicion at the \(\mathbf { 1 \% }\) level of significance.
  3. Give a reason why the probability of a Type I error occurring was smaller when conducting the test in part (b) than when conducting the test in part (a).
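Here is a minimal sketch of both test statistics, assuming a one-tailed test of \(H_0: \mu = 175\) against \(H_1: \mu < 175\) in each part. Python's standard library has no t-distribution, so the \(t_{19}\) critical value is typed in from standard tables.

```python
from math import sqrt
from statistics import NormalDist

# Part 1: known sigma = 9.4 g
weights = [160.6, 155.4, 181.3, 176.2, 162.3, 172.8]
sample_mean = sum(weights) / len(weights)
z = (sample_mean - 175) / (9.4 / sqrt(len(weights)))
z_critical = NormalDist().inv_cdf(0.05)  # lower-tail 5% point, about -1.645

# Part 2: sigma unknown, so use t with n - 1 = 19 degrees of freedom
n, x_bar, s = 20, 169.4, 11.2
t = (x_bar - 175) / (s / sqrt(n))
t_critical = -2.539  # lower 1% point of t(19), taken from standard tables

print(f"z = {z:.3f} (critical {z_critical:.3f}): reject H0? {z < z_critical}")
print(f"t = {t:.3f} (critical {t_critical}): reject H0? {t < t_critical}")
```

Under these assumptions part 1 rejects \(H_0\), which supports William's suspicion. Part 2 does not reject it at the stricter 1% level.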
Edexcel S2 Q1
4 marks Easy -1.8
  1. (a) Briefly describe the difference between a census and a sample survey.
    (b) Illustrate the difference by considering the case of a village council which has to decide whether or not to build a new village hall.
Given that the council decides to use a sample survey,
(c) suggest suitable sampling units.
Edexcel S2 Q1
4 marks Easy -1.8
  1. Explain what is meant by
    1. a population,
    2. a sampling unit.
  2. Suggest suitable sampling frames for surveys of
    1. families who have holidays in Greece,
    2. mothers with children under two years old.
Edexcel S2 Q1
4 marks Easy -1.8
  1. A random sample is to be taken from the A-level results obtained by the final-year students in a Sixth Form College. Suggest
    1. suitable sampling units,
    2. a suitable sampling frame.
    3. Would it be advisable simply to use the results of all those doing A-level Maths?
    Explain your answer.
Edexcel S2 Q2
6 marks Easy -1.8
2. A video rental shop needs to find out whether or not videos have been rewound when they are returned. It will do this by taking a sample of returned videos.
  1. State one advantage and one disadvantage of taking a sample.
  2. Suggest a suitable sampling frame.
  3. Describe the sampling units.
  4. Criticise the sampling method of looking at just one particular shelf of videos.
AQA S3 2014 June Q4
8 marks Moderate -0.3
4 A sample of 50 male Eastern Grey kangaroos had a mean weight of 42.6 kg and a standard deviation of 6.2 kg. A sample of 50 male Western Grey kangaroos had a mean weight of 39.7 kg and a standard deviation of 5.3 kg.
  1. Construct a 98\% confidence interval for the difference between the mean weight of male Eastern Grey kangaroos and that of male Western Grey kangaroos.
    [5 marks]
  2. (i) What assumption about the selection of each of the two samples was it necessary to make in order that the confidence interval constructed in part (a) was valid?
    [1 mark]
    (ii) Why was it not necessary to assume anything about the distributions of the weights of male kangaroos in order that the confidence interval constructed in part (a) was valid?
    [2 marks]
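Here is a minimal sketch of part 1. It assumes both samples are large enough for the Central Limit Theorem to apply and for the sample standard deviations to stand in for the population values.

```python
from math import sqrt
from statistics import NormalDist

# Large samples (n = 50 each): sample SDs used in place of population SDs
n_east, mean_east, sd_east = 50, 42.6, 6.2
n_west, mean_west, sd_west = 50, 39.7, 5.3

difference = mean_east - mean_west
standard_error = sqrt(sd_east**2 / n_east + sd_west**2 / n_west)
z = NormalDist().inv_cdf(0.99)  # 98% two-sided interval, about 2.326

lower = difference - z * standard_error
upper = difference + z * standard_error
print(f"98% CI for the difference: ({lower:.2f}, {upper:.2f}) kg")
```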
Edexcel S3 Q2
7 marks Easy -1.3
2. (a) Explain what is meant by a simple random sample.
(b) Explain briefly how you could use a table of random numbers to select a simple random sample of size 12 from a list of the 70 junior members of a tennis club.
(c) Give an example of a situation in which you might choose to take a stratified sample and explain why.
OCR MEI Further Statistics Minor 2019 June Q3
4 marks Easy -1.8
3 A company has been commissioned to make 50 very expensive titanium components.
A sample of the components needs to be tested to ensure that they are sufficiently strong. However, this is a test to destruction, so the components which are tested can no longer be used.
  1. Explain why it would not be appropriate to use a census in these circumstances.
A manager suggests that the first 5 components to be manufactured should be tested.
  2. Explain why this would not be a sensible method of selecting the sample.
A statistician advises the manager that the sample selected should be a random sample.
  3. Give two desirable features (other than randomness) that the sample should have.
OCR MEI Further Statistics Minor 2022 June Q5
14 marks Standard +0.3
5 A medical researcher is investigating whether there is any relationship between the age of a person and the level of a particular protein in the person's blood. She measures the levels of the protein (measured in suitable units) in a random sample of 12 hospital patients of various ages (in years). The spreadsheet shows the values obtained, together with a scatter diagram which illustrates the data.
[Figure: spreadsheet of age and protein level for the 12 patients, with a scatter diagram of the data]
  1. The researcher decides that a test based on Pearson's product moment correlation coefficient may not be valid. Explain why she comes to this conclusion.
  2. Calculate the value of Spearman's rank correlation coefficient.
  3. Carry out a test based on this coefficient at the \(5 \%\) significance level to investigate whether there is any association between age and protein level.
  4. Explain why the researcher chose a sample that was random.
  5. The researcher had originally intended to use a sample size of 6 rather than the 12 that she actually used. Explain what advantage there is in using the larger sample size.
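The patients' ages and protein levels appear only in the spreadsheet image, so no values are computed here. As a sketch of part 2, the functions below rank two lists and apply \(r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}\). They assume there are no tied values. With ties, ranks would be averaged and the product moment correlation coefficient of the ranks used instead. The function names are illustrative.

```python
def ranks(values):
    """Rank 1 = smallest value; assumes all values are distinct (no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's rank correlation coefficient via the sum of squared rank differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))
```

For instance, `spearman_rs([1, 2, 3], [10, 30, 20])` gives 0.5, because the rank differences are 0, 1 and 1.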
OCR MEI Further Statistics Minor 2023 June Q2
5 marks Easy -1.8
2 A company manufactures batches of twenty thousand tins which are subsequently filled with fruit. The company tests tins from each batch to make sure that they are strong enough. The test is easy and cheap to carry out, but when a tin has been tested it is no longer suitable for filling with fruit.
  1. (i) Explain why a sample size of 5 tins per batch may not be appropriate in this case.
    (ii) Explain why a sample size of 1000 tins per batch may not be appropriate in this case.
The company tests a sample of 30 tins from each batch.
  2. Explain why it would not be sensible for the sample to consist of the final 30 tins produced in a batch.
  3. Give two features that the sample should have.
OCR MEI Further Statistics Minor 2024 June Q4
12 marks Moderate -0.3
4 A genetics researcher is investigating whether there is any association between natural hair colour and natural eye colour. A random sample of 800 adults is selected. Each adult can categorise their natural hair colour as blonde, brown, black or red and their natural eye colour as brown, blue or green.
  1. Explain the benefit of using a random sample in this investigation.
The data collected from the sample are summarised in Table 4.1.
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Table 4.1}
\begin{tabular}{llccccc}
\multicolumn{2}{l}{Observed frequency} & \multicolumn{4}{c}{Hair colour} & \\
 & & Blonde & Brown & Black & Red & Total \\
\multirow{3}{*}{Eye colour} & Brown & 47 & 153 & 196 & 36 & 432 \\
 & Blue & 61 & 78 & 115 & 26 & 280 \\
 & Green & 19 & 22 & 31 & 16 & 88 \\
\multicolumn{2}{l}{Total} & 127 & 253 & 342 & 78 & 800 \\
\end{tabular}
\end{table}
The researcher decides to carry out a chi-squared test.
  2. Determine the expected frequencies for each eye colour in the blonde hair category.
You are given that the test statistic is 28.62 to 2 decimal places.
  3. Carry out the chi-squared test at the 10\% significance level.
Table 4.2 shows the chi-squared contributions for some of the categories. The contributions for the categories relating to green eye colour have been deliberately omitted.
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Table 4.2}
\begin{tabular}{llcccc}
 & & \multicolumn{4}{c}{Hair colour} \\
\cline{3-6}
 & & Blonde & Brown & Black & Red \\
\cline{2-6}
\multirow{3}{*}{Eye colour} & Brown & 6.791 & 1.964 & 0.694 & 0.889 \\
\cline{2-6}
 & Blue & 6.162 & 1.257 & 0.185 & 0.062 \\
\cline{2-6}
 & Green & & & & \\
\end{tabular}
\end{table}
  4. Calculate the chi-squared contribution for the green eye and blonde hair category.
  5. With reference to the values in Table 4.2, discuss what the data suggest about brown eye colour and blue eye colour for people with blonde hair.
  6. A different researcher, carrying out the same investigation, independently takes a different random sample of size 800 and performs the same hypothesis test, but at the 1\% significance level, reaching the same conclusion as the original test. By comparing only the significance level of the two tests, specify which test, the one at the 10\% significance level or the one at the 1\% significance level, provides stronger evidence for the conclusion. Justify your answer.
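Here is a sketch of the arithmetic in parts 2 to 4, using the observed frequencies from Table 4.1. It reproduces the expected frequencies, the cell contributions \((O - E)^2/E\) (including the green-eye ones omitted from Table 4.2) and the test statistic quoted in the question. The critical value for the test itself would still come from \(\chi^2\) tables with \((3-1)(4-1) = 6\) degrees of freedom.

```python
# Observed frequencies from Table 4.1: rows = eye colour, columns = hair colour
hair = ["Blonde", "Brown", "Black", "Red"]
observed = {
    "Brown": [47, 153, 196, 36],
    "Blue": [61, 78, 115, 26],
    "Green": [19, 22, 31, 16],
}

row_totals = {eye: sum(counts) for eye, counts in observed.items()}
column_totals = [sum(counts[j] for counts in observed.values()) for j in range(len(hair))]
grand_total = sum(row_totals.values())

# Expected frequency = row total * column total / grand total
expected = {
    eye: [row_totals[eye] * column_totals[j] / grand_total for j in range(len(hair))]
    for eye in observed
}

# Contribution (O - E)^2 / E for every cell, then the test statistic
contributions = {
    eye: [(o - e) ** 2 / e for o, e in zip(observed[eye], expected[eye])]
    for eye in observed
}
statistic = sum(sum(cells) for cells in contributions.values())
degrees_of_freedom = (len(observed) - 1) * (len(hair) - 1)  # (3 - 1)(4 - 1) = 6

print("Expected (blonde):", [round(expected[eye][0], 2) for eye in observed])
print("Green/blonde contribution:", round(contributions["Green"][0], 3))
print("Test statistic:", round(statistic, 2), "with", degrees_of_freedom, "degrees of freedom")
```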
Edexcel FS2 2019 June Q3
8 marks Standard +0.8
3 Yin grows two varieties of potato, plant \(A\) and plant \(B\). A random sample of each variety of potato is taken and the yield, \(x\) kg, produced by each plant is measured. The following statistics are obtained from the data.
\begin{tabular}{lccc}
 & Number of plants & \(\sum x\) & \(\sum x^2\) \\
\(A\) & 25 & 194.7 & 1637.37 \\
\(B\) & 26 & 227.5 & 2031.19 \\
\end{tabular}
  1. Stating your hypotheses clearly, test, at the \(10 \%\) significance level, whether or not the variances of the yields of the two varieties of potato are the same.
  2. State an assumption you have made in order to carry out the test in part (a).
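Here is a sketch of the calculation for part 1. It assumes the usual two-tailed F-test with the larger unbiased sample variance in the numerator. The 10% two-tailed test then compares \(F\) with the upper 5% point of \(F_{24,25}\) from tables, which is not reproduced here.

```python
def unbiased_variance(n, sum_x, sum_x2):
    """s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x**2 / n) / (n - 1)


s2_a = unbiased_variance(25, 194.7, 1637.37)  # 24 degrees of freedom
s2_b = unbiased_variance(26, 227.5, 2031.19)  # 25 degrees of freedom

# Larger variance on top, so only the upper tail of F is needed
f_statistic = max(s2_a, s2_b) / min(s2_a, s2_b)
print(f"s_A^2 = {s2_a:.4f}, s_B^2 = {s2_b:.4f}, F = {f_statistic:.3f}")
```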
OCR MEI Further Statistics Major 2022 June Q6
11 marks Standard +0.3
  1. Determine a 95\% confidence interval for the mean weight of liquid paraffin in a tub.
  2. Explain whether the confidence interval supports the researcher's belief.
  3. Explain why the sample has to be random in order to construct the confidence interval.
  4. A 95\% confidence interval for the mean weight in grams of another ingredient in the skin cream is [1.202, 1.398]. This confidence interval is based on a large sample and the unbiased estimate of the population variance calculated from the sample is 0.25. Find each of the following.
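The list of quantities that part 4 asks for is missing from this extract. As a sketch of what the interval itself encodes, assume it was built as \(\bar{x} \pm 1.96\,s/\sqrt{n}\) with \(s^2 = 0.25\). The sample mean is then the midpoint, and the sample size follows from the half-width.

```python
from statistics import NormalDist

lower, upper = 1.202, 1.398
s = 0.25 ** 0.5                  # sample standard deviation from s^2 = 0.25
z = NormalDist().inv_cdf(0.975)  # about 1.96

sample_mean = (lower + upper) / 2
half_width = (upper - lower) / 2
n = (z * s / half_width) ** 2    # rearranged from half_width = z * s / sqrt(n)
print(round(sample_mean, 3), round(n))
```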
OCR H240/02 2018 September Q10
6 marks Easy -1.8
10 The table shows information, derived from the 2011 UK census, about the percentage of employees who used various methods of travel to work in four Local Authorities.
\begin{tabular}{lccccc}
Local Authority & Underground, metro, light rail or tram & Train & Bus & Drive & Walk or cycle \\
A & 0.3\% & 4.5\% & 17\% & 52.8\% & 11\% \\
B & 0.2\% & 1.7\% & 1.7\% & 63.4\% & 11\% \\
C & 35.2\% & 3.0\% & 12\% & 11.7\% & 16\% \\
D & 8.9\% & 1.4\% & 9\% & 54.7\% & 10\% \\
\end{tabular}
One of the Local Authorities is a London borough and two are metropolitan boroughs, not in London.
  1. Which one of the Local Authorities is a London borough? Give a reason for your answer.
  2. Which two of the Local Authorities are metropolitan boroughs outside London? In each case give a reason for your answer.
  3. Describe one difference between the public transport available in the two metropolitan boroughs, as suggested by the table.
  4. Comment on the availability of public transport in Local Authority B as suggested by the table.
Edexcel S1 2022 January Q3
10 marks Moderate -0.8
1. The stem and leaf diagram shows the number of deliveries made by Pat each day for 24 days.
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Key: 10 \(\mid\) 8 represents 108 deliveries}
\begin{tabular}{r|lr}
10 & 8 9 & (2) \\
11 & 0 3 6 6 6 8 8 9 9 9 9 & (11) \\
12 & 4 5 5 5 5 5 5 8 & (8) \\
13 & \(a\) \(b\) \(c\) & (3) \\
\end{tabular}
\end{table}
where \(a\), \(b\) and \(c\) are positive integers with \(a < b < c\). An outlier is defined as any value greater than \(1.5 \times\) interquartile range above the upper quartile. Given that there is only one outlier for these data,
  1. show that \(c = 9\).
The number of deliveries made by Pat each day is represented by \(d\). The data in the stem and leaf diagram are coded using
$$x = d - 125$$
and the following summary statistics are obtained
$$\sum x = -96 \quad \text{and} \quad \sum (x - \bar{x})^2 = 1306$$
  2. Find the mean number of deliveries.
  3. Find the standard deviation of the number of deliveries.
One of these 24 days is selected at random. The random variable \(D\) represents the number of deliveries made by Pat on this day. The random variable \(X = D - 125\).
  4. Find \(\mathrm{P}(D > 118 \mid X < 0)\).
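Here is a sketch of parts 2 to 4. It assumes the divisor \(n\) for the standard deviation, and it treats the event \(X < 0\) as \(D < 125\). The 14 values below 125 are read directly from the diagram and do not involve \(a\), \(b\) or \(c\).

```python
from fractions import Fraction
from math import sqrt

n = 24
sum_x = -96   # x = d - 125
s_xx = 1306   # sum of (x - x_bar)^2

mean_d = 125 + sum_x / n  # undo the coding: the mean shifts back by 125
sd_d = sqrt(s_xx / n)     # a shift leaves the spread unchanged; divisor n

# Deliveries below 125 (i.e. X < 0), read from the stem and leaf diagram
below_125 = [108, 109, 110, 113, 116, 116, 116, 118, 118, 119, 119, 119, 119, 124]
p = Fraction(sum(1 for d in below_125 if d > 118), len(below_125))  # P(D > 118 | X < 0)

print(mean_d, round(sd_d, 2), p)
```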
Edexcel AS Paper 2 2018 June Q4
8 marks Moderate -0.8
1. Helen is studying the daily mean wind speed for Camborne using the large data set from 1987. The data for one month are summarised in Table 1 below.
\begin{table}[h]
Windspeed\(\mathrm { n } / \mathrm { a }\)67891112131416
Frequency13232231212
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
  1. Calculate the mean for these data.
  2. Calculate the standard deviation for these data and state the units.
The means and standard deviations of the daily mean wind speed for the other months from the large data set for Camborne in 1987 are given in Table 2 below. The data are not in month order.
\begin{table}[h]
\begin{tabular}{lccccc}
Month & \(A\) & \(B\) & \(C\) & \(D\) & \(E\) \\
Mean & 7.58 & 8.26 & 8.57 & 8.57 & 11.57 \\
Standard deviation & 2.93 & 3.89 & 3.46 & 3.87 & 4.64 \\
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 2}
\end{table}
  3. Using your knowledge of the large data set, suggest, giving a reason, which month had a mean of 11.57.
The data for these months are summarised in the box plots on the opposite page. They are not in month order or the same order as in Table 2.
  4. (i) State the meaning of the * symbol on some of the box plots.
    (ii) Suggest, giving your reasons, which of the months in Table 2 is most likely to be summarised in the box plot marked \(Y\).
[Figure: box plots of the daily mean wind speed for these months, one of which is marked \(Y\)]
Edexcel Paper 3 2018 June Q4
13 marks Easy -1.3
1. Charlie is studying the time it takes members of his company to travel to the office. He stands by the door to the office from 0840 to 0850 one morning and asks workers, as they arrive, how long their journey was.
  1. (i) State the sampling method Charlie used.
    (ii) State and briefly describe an alternative method of non-random sampling Charlie could have used to obtain a sample of 40 workers.
Taruni decided to ask every member of the company the time, \(x\) minutes, it takes them to travel to the office.
  2. State the data selection process Taruni used.
Taruni's results are summarised by the box plot and summary statistics below.
[Figure: box plot of Taruni's journey times, \(x\) minutes]
$$n = 95 \quad \sum x = 4133 \quad \sum x^2 = 202294$$
  3. Write down the interquartile range for these data.
  4. Calculate the mean and the standard deviation for these data.
  5. State, giving a reason, whether you would recommend using the mean and standard deviation or the median and interquartile range to describe these data.
Rana and David both work for the company and have both moved house since Taruni collected her data. Rana's journey to work has changed from 75 minutes to 35 minutes and David's journey to work has changed from 60 minutes to 33 minutes. Taruni drew her box plot again and only had to change two values.
  6. Explain which two values Taruni must have changed and whether each of these values has increased or decreased.
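Here is a minimal sketch of part 4 from the summary statistics. It assumes the standard deviation is calculated with divisor \(n\). Using \(n - 1\) instead would give a slightly larger value.

```python
from math import sqrt

n, sum_x, sum_x2 = 95, 4133, 202294

mean = sum_x / n
# Divisor n: sd = sqrt(sum x^2 / n - mean^2)
standard_deviation = sqrt(sum_x2 / n - mean**2)
print(f"mean = {mean:.2f} minutes, sd = {standard_deviation:.2f} minutes")
```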
Edexcel Paper 3 Specimen Q2
6 marks Standard +0.3
1. A meteorologist believes that there is a relationship between the daily mean windspeed, \(w\) kn, and the daily mean temperature, \(t\) \({}^{\circ}\mathrm{C}\). A random sample of 9 consecutive days is taken from past records from a town in the UK in July and the relevant data is given in the table below.
\begin{tabular}{lccccccccc}
\(t\) & 13.3 & 16.2 & 15.7 & 16.6 & 16.3 & 16.4 & 19.3 & 17.1 & 13.2 \\
\(w\) & 7 & 11 & 8 & 11 & 13 & 8 & 15 & 10 & 11 \\
\end{tabular}
The meteorologist calculated the product moment correlation coefficient for the 9 days and obtained \(r = 0.609\).
  1. Explain why a linear regression model based on these data is unreliable on a day when the mean temperature is \(24^{\circ}\mathrm{C}\).
  2. State what is measured by the product moment correlation coefficient.
  3. Stating your hypotheses clearly, test, at the \(5 \%\) significance level, whether or not the product moment correlation coefficient for the population is greater than zero.
Using the same 9 days, a location from the large data set gave \(\bar{t} = 27.2\) and \(\bar{w} = 3.5\).
  4. Using your knowledge of the large data set, suggest, giving your reason, the location that gave rise to these statistics.
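Here is a sketch that recomputes \(r\) from the table and carries out part 3. It assumes \(H_0: \rho = 0\) against \(H_1: \rho > 0\). The critical value is typed in from standard tables rather than computed.

```python
from math import sqrt

t = [13.3, 16.2, 15.7, 16.6, 16.3, 16.4, 19.3, 17.1, 13.2]
w = [7, 11, 8, 11, 13, 8, 15, 10, 11]
n = len(t)

# Summary quantities S_tt, S_ww and S_tw
s_tt = sum(x * x for x in t) - sum(t) ** 2 / n
s_ww = sum(y * y for y in w) - sum(w) ** 2 / n
s_tw = sum(x * y for x, y in zip(t, w)) - sum(t) * sum(w) / n
r = s_tw / sqrt(s_tt * s_ww)

critical_value = 0.5822  # one-tailed 5% point for n = 9, from standard PMCC tables
print(f"r = {r:.3f}; reject H0 in favour of rho > 0? {r > critical_value}")
```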
Edexcel Paper 3 Specimen Q2
7 marks Moderate -0.3
2. A researcher believes that there is a linear relationship between daily mean temperature and daily total rainfall. The 7 places in the northern hemisphere from the large data set are used. The mean of the daily mean temperatures, \(t\) \({}^{\circ}\mathrm{C}\), and the mean of the daily total rainfall, \(s\) mm, for the month of July in 2015 are shown on the scatter diagram below.
[Figure: scatter diagram of the 7 places, two of which are labelled \(G\) and \(H\)]
  1. With reference to the scatter diagram, explain why a linear regression model may not be suitable for the relationship between \(t\) and \(s\).
    (1)
The researcher calculated the product moment correlation coefficient for the 7 places and obtained \(r = 0.658\).
  2. Stating your hypotheses clearly, test at the \(10 \%\) level of significance, whether or not the product moment correlation coefficient for the population is greater than zero.
    (3)
  3. Using your knowledge of the large data set, suggest the names of the 2 places labelled \(G\) and \(H\).
    (1)
  4. Using your knowledge from the large data set, and with reference to the locations of the two places labelled \(G\) and \(H\), give a reason why these places have the highest temperatures in July.
    (2)
  5. Suggest how you could make better use of the large data set to investigate the relationship between daily mean temperature and daily total rainfall.
    (1)
(Total 7 marks)
Edexcel S1 2002 January Q1
4 marks Easy -1.8
1. Explain briefly what you understand by
  1. a statistical experiment, [1]
  2. an event. [1]
2. State one advantage and one disadvantage of a statistical model. [2]
Edexcel S2 Q1
6 marks Easy -1.8
The small village of Tornep has a preservation society which is campaigning for a new by-pass to be built. The society needs to measure
  1. the strength of opinion amongst the residents of Tornep for the scheme and
  2. the flow of traffic through the village on weekdays.
The society wants to know whether to use a census or a sample survey for each of these measures.
  1. In each case suggest which they should use and specify a suitable sampling frame. [4]
For the measurement of traffic flow through Tornep,
  2. suggest a suitable statistic and a possible statistical model for this statistic. [2]