2.01d Select/critique sampling: in context

73 questions

CAIE S2 2014 June Q7
10 marks Standard +0.3
7 A researcher is investigating the actual lengths of time that patients spend with the doctor at their appointments. He plans to choose a sample of 12 appointments on a particular day.
  1. Which of the following methods is preferable, and why?
    • Choose the first 12 appointments of the day.
    • Choose 12 appointments evenly spaced throughout the day.
    Appointments are scheduled to last 10 minutes. The actual lengths of time, in minutes, that patients spend with the doctor may be assumed to have a normal distribution with mean \(\mu\) and standard deviation 3.4. The researcher suspects that the actual time spent is more than 10 minutes on average. To test this suspicion, he recorded the actual times spent for a random sample of 12 appointments and carried out a hypothesis test at the 1\% significance level.
  2. State the probability of making a Type I error and explain what is meant by a Type I error in this context.
  3. Given that the total length of time spent for the 12 appointments was 147 minutes, carry out the test.
  4. Give a reason why the Central Limit theorem was not needed in part (iii).
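The test in part 3 can be sketched numerically. This is a minimal illustration, assuming an upper-tail z-test with known standard deviation 3.4 and null mean 10; the function name `one_sided_z_test` is hypothetical and not part of the question.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(total, n, mu0, sigma, alpha):
    """Upper-tail z-test for a mean when the population sd is known."""
    sample_mean = total / n
    # Standardise the sample mean using the standard error sigma / sqrt(n).
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Critical value that cuts off probability alpha in the upper tail.
    critical = NormalDist().inv_cdf(1 - alpha)
    return sample_mean, z, critical, z > critical


sample_mean, z, critical, reject = one_sided_z_test(147, 12, 10, 3.4, 0.01)
print(f"mean={sample_mean:.2f}, z={z:.3f}, critical={critical:.3f}, reject H0: {reject}")
```

Because the population is stated to be normal, the sample mean is exactly normal for any sample size, which is why the sketch needs no large-sample approximation.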
CAIE S2 2015 June Q1
5 marks Easy -1.8
1 Jyothi wishes to choose a representative sample of 5 students from the 82 members of her school year.
  1. She considers going into the canteen and choosing a table with five students from her year sitting at it, and using these five people as her sample. Give two reasons why this method is unsatisfactory.
  2. Jyothi decides to use another method. She numbers all the students in her year from 1 to 82. Then she uses her calculator and generates the following random numbers. $$231492 \quad 762305 \quad 346280$$ From these numbers, she obtains the student numbers \(23, 14, 76, 5, 34\) and 62. Explain how Jyothi obtained these student numbers from the list of random numbers.
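One plausible reading of Jyothi's method is to split the digits into consecutive pairs, discard any pair outside 1 to 82, and discard repeats. The sketch below assumes exactly that rule; the helper `sample_from_digits` and its parameters are illustrative names, not taken from the question.

```python
def sample_from_digits(blocks, population_size, width, needed):
    """Read fixed-width numbers from a digit stream, skipping invalid ones."""
    digits = "".join(blocks)
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        # Keep the number only if it labels a student and is not a repeat.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == needed:
            break
    return chosen


# Two-digit groups for a year group numbered 1 to 82.
print(sample_from_digits(["231492", "762305", "346280"], 82, 2, 6))
# -> [23, 14, 76, 5, 34, 62]
```

Here 92 is rejected as out of range and the second 23 is rejected as a repeat. Setting `width=3` adapts the same sketch to a population numbered with three digits.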
CAIE S2 2021 November Q2
3 marks Easy -1.8
2 Andy and Jessica are doing a survey about musical preferences. They plan to choose a representative sample of six students from the 256 students at their college.
  1. Andy suggests that they go to the music building during the lunch hour and choose six students at random from the students who are there. Give a reason why this method is unsatisfactory.
  2. Jessica decides to use another method. She numbers all the students in the college from 1 to 256. Then she uses her calculator and generates the following random numbers. $$\begin{array} { l l l l l } 204393 & 162007 & 204028 & 587119 & 207395 \end{array}$$ From these numbers, she obtains six student numbers. The first three of her student numbers are 204, 162 and 7. Continue Jessica's method to obtain the next three student numbers.
CAIE S2 2008 June Q1
5 marks Easy -1.8
1 A magazine conducted a survey about the sleeping time of adults. A random sample of 12 adults was chosen from the adults travelling to work on a train.
  1. Give a reason why this is an unsatisfactory sample for the purposes of the survey.
  2. State a population for which this sample would be satisfactory. A satisfactory sample of 12 adults gave numbers of hours of sleep as shown below. $$\begin{array} { l l l l l l l l l l l l } 4.6 & 6.8 & 5.2 & 6.2 & 5.7 & 7.1 & 6.3 & 5.6 & 7.0 & 5.8 & 6.5 & 7.2 \end{array}$$
  3. Calculate unbiased estimates of the mean and variance of the sleeping times of adults.
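As a check on part 3, the unbiased estimates can be computed directly from the twelve values. This sketch relies on Python's `statistics.variance`, which uses the divisor \(n - 1\); the variable names are illustrative.

```python
from statistics import mean, variance

hours = [4.6, 6.8, 5.2, 6.2, 5.7, 7.1, 6.3, 5.6, 7.0, 5.8, 6.5, 7.2]

mean_hat = mean(hours)     # unbiased estimate of the population mean
var_hat = variance(hours)  # sample variance with divisor n - 1

print(f"mean estimate = {mean_hat:.3f}, variance estimate = {var_hat:.3f}")
```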
CAIE S2 2019 June Q6
10 marks Moderate -0.8
6 Ramesh plans to carry out a survey in order to find out what adults in his town think about local sports facilities. He chooses a random sample from the adult members of a tennis club and gives each of them a questionnaire.
  1. Give a reason why this will not result in Ramesh having a random sample of adults who live in the town.
  2. Describe briefly a valid method that Ramesh could use to choose a random sample of adults in the town.
    Ramesh now uses a valid method to choose a random sample of 350 adults from the town. He finds that 47 adults think that the local sports facilities are good.
  3. Calculate an approximate \(90 \%\) confidence interval for the proportion of all adults in the town who think that the local sports facilities are good.
  4. Ramesh calculates a confidence interval whose width is 1.25 times the width of this \(90 \%\) confidence interval. Ramesh's new interval is an \(x \%\) confidence interval. Find the value of \(x\).
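Parts 3 and 4 can be sketched as follows, assuming the usual normal approximation for a sample proportion. The function `proportion_ci` is a hypothetical helper. The step for part 4 uses the fact that the width is proportional to the z-value, so a width 1.25 times larger corresponds to a z-value 1.25 times larger.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level):
    """Approximate confidence interval for a proportion (normal approximation)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width, z


lower, upper, z90 = proportion_ci(47, 350, 0.90)

# Width scales with z, so the wider interval uses z_new = 1.25 * z90.
z_new = 1.25 * z90
x = 100 * (2 * NormalDist().cdf(z_new) - 1)

print(f"90% CI: ({lower:.4f}, {upper:.4f}); new confidence level = {x:.1f}%")
```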
CAIE S2 2016 November Q2
3 marks Easy -1.8
2 Dominic wishes to choose a random sample of five students from the 150 students in his year. He numbers the students from 1 to 150. Then he uses his calculator to generate five random numbers between 0 and 1. He multiplies each random number by 150 and rounds up to the next whole number to give a student number.
  1. Dominic's first random number is 0.392. Find the student number that is produced by this random number.
  2. Dominic's second student number is 104. Find a possible random number that would produce this student number.
  3. Explain briefly why five random numbers may not be enough to produce a sample of five student numbers.
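Dominic's rule (multiply by 150, then round up) can be sketched with `math.ceil`. The function names are illustrative, and the inverse helper simply returns the range of random numbers that map to a given student.

```python
from math import ceil


def student_number(u, population=150):
    """Map a random number u in (0, 1) to a student number by rounding up."""
    return ceil(u * population)


def interval_for(student, population=150):
    """Random numbers u with lower < u <= upper produce this student number."""
    return (student - 1) / population, student / population


first = student_number(0.392)
low, high = interval_for(104)
print(first, (round(low, 4), round(high, 4)))
```

Two different random numbers can fall in the same interval and so give the same student, which is one reason five random numbers may not yield five distinct students.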
OCR S2 2005 June Q1
4 marks Easy -1.8
1 It is desired to obtain a random sample of 15 pupils from a large school. One pupil suggests listing all the pupils in the school in alphabetical order and choosing the first 15 names on the list.
  1. Explain why this method is unsatisfactory.
  2. Suggest a better method.
OCR MEI S3 2006 June Q3
18 marks Moderate -0.3
3 An employer has commissioned an opinion polling organisation to undertake a survey of the attitudes of staff to proposed changes in the pension scheme. The staff are categorised as management, professional and administrative, and it is thought that there might be considerable differences of opinion between the categories. There are 60,140 and 300 staff respectively in the categories. The budget for the survey allows for a sample of 40 members of staff to be selected for in-depth interviews.
  1. Explain why it would be unwise to select a simple random sample from all the staff.
  2. Discuss whether it would be sensible to consider systematic sampling.
  3. What are the advantages of stratified sampling in this situation?
  4. State the sample sizes in each category if stratified sampling with as nearly as possible proportional allocation is used. The opinion polling organisation needs to estimate the average wealth of staff in the categories, in terms of property, savings, investments and so on. In a random sample of 11 professional staff, the sample mean is \(\pounds 345818\) and the sample standard deviation is \(\pounds 69241\).
  5. Assuming the underlying population is Normally distributed, test at the \(5 \%\) level of significance the null hypothesis that the population mean is \(\pounds 300000\) against the alternative hypothesis that it is greater than \(\pounds 300000\). Provide also a two-sided \(95 \%\) confidence interval for the population mean.
    [10]
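The proportional allocation in part 4 can be sketched as below. Exact proportions rarely give whole numbers, so this sketch assumes a largest-remainder rule to settle the rounding; that rule is a choice made here for illustration, not something the question prescribes.

```python
def proportional_allocation(strata, sample_size):
    """Allocate a sample across strata in proportion to stratum size."""
    total = sum(strata.values())
    # Integer part of each exact quota, kept in integer arithmetic.
    quotas = {name: sample_size * size // total for name, size in strata.items()}
    remainders = {name: sample_size * size % total for name, size in strata.items()}
    # Hand any places still unfilled to the strata with the largest remainders.
    shortfall = sample_size - sum(quotas.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        quotas[name] += 1
    return quotas


staff = {"management": 60, "professional": 140, "administrative": 300}
print(proportional_allocation(staff, 40))
# -> {'management': 5, 'professional': 11, 'administrative': 24}
```

The exact quotas are 4.8, 11.2 and 24, so the single leftover place goes to the management stratum.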
OCR MEI S4 2006 June Q4
24 marks Standard +0.3
4 An experiment is carried out to compare five industrial paints, A, B, C, D, E, that are intended to be used to protect exterior surfaces in polluted urban environments. Five different types of surface (I, II, III, IV, V) are to be used in the experiment, and five specimens of each type of surface are available. Five different external locations ( \(1,2,3,4,5\) ) are used in the experiment. The paints are applied to the specimens of the surfaces which are then left in the locations for a period of six months. At the end of this period, a "score" is given to indicate how effective the paint has been in protecting the surface.
  1. Name a suitable experimental design for this trial and give an example of an experimental layout. Initial analysis of the data indicates that any differences between the types of surface are negligible, as also are any differences between the locations. It is therefore decided to analyse the data by one-way analysis of variance.
  2. State the usual model, including the accompanying distributional assumptions, for the one-way analysis of variance. Interpret the terms in the model.
  3. The data for analysis are as follows. Higher scores indicate better performance.
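    \begin{tabular}{ccccc}
    Paint A & Paint B & Paint C & Paint D & Paint E \\
    64 & 66 & 59 & 65 & 64 \\
    58 & 68 & 56 & 78 & 52 \\
    73 & 76 & 69 & 69 & 56 \\
    60 & 70 & 60 & 72 & 61 \\
    67 & 71 & 63 & 71 & 58 \\
    \end{tabular}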
    [The sum of these data items is 1626 and the sum of their squares is 106838 .]
    Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a 5\% significance level. Report briefly on your conclusions.
    [12]
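The arithmetic behind the one-way analysis of variance table in part 3 can be sketched in plain Python. The function `one_way_anova` is an illustrative helper. The 5% critical value for \(F_{4,20}\) is taken from standard tables (about 2.87) rather than computed.

```python
def one_way_anova(groups):
    """Return sums of squares, mean squares and the F ratio for one-way ANOVA."""
    values = [x for g in groups.values() for x in g]
    n, k = len(values), len(groups)
    correction = sum(values) ** 2 / n
    ss_total = sum(x * x for x in values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
    ss_residual = ss_total - ss_between
    ms_between = ss_between / (k - 1)
    ms_residual = ss_residual / (n - k)
    return {
        "ss_between": ss_between, "ss_residual": ss_residual, "ss_total": ss_total,
        "df_between": k - 1, "df_residual": n - k,
        "ms_between": ms_between, "ms_residual": ms_residual,
        "F": ms_between / ms_residual,
    }


paints = {
    "A": [64, 58, 73, 60, 67],
    "B": [66, 68, 76, 70, 71],
    "C": [59, 56, 69, 60, 63],
    "D": [65, 78, 69, 72, 71],
    "E": [64, 52, 56, 61, 58],
}
table = one_way_anova(paints)
F_CRITICAL_5PC = 2.87  # upper 5% point of F(4, 20), quoted from tables
print({key: round(val, 2) for key, val in table.items()}, table["F"] > F_CRITICAL_5PC)
```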
OCR MEI S4 2007 June Q4
24 marks Standard +0.8
4 An agricultural company conducts a trial of five fertilisers (A, B, C, D, E) in an experimental field at its research station. The fertilisers are applied to plots of the field according to a completely randomised design. The yields of the crop from the plots, measured in a standard unit, are analysed by the one-way analysis of variance, from which it appears that there are no real differences among the effects of the fertilisers. A statistician notes that the residual mean square in the analysis of variance is considerably larger than had been anticipated from knowledge of the general behaviour of the crop, and therefore suspects that there is some inadequacy in the design of the trial.
  1. Explain briefly why the statistician should be suspicious of the design.
  2. Explain briefly why an inflated residual leads to difficulty in interpreting the results of the analysis of variance, in particular that the null hypothesis is more likely to be accepted erroneously. Further investigation indicates that the soil at the west side of the experimental field is naturally more fertile than that at the east side, with a consistent 'fertility gradient' from west to east.
  3. What experimental design can accommodate this feature? Provide a simple diagram of the experimental field indicating a suitable layout. The company decides to conduct a new trial in its glasshouse, where experimental conditions can be controlled so that a completely randomised design is appropriate. The yields are as follows.
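    \begin{tabular}{ccccc}
    Fertiliser A & Fertiliser B & Fertiliser C & Fertiliser D & Fertiliser E \\
    23.6 & 26.0 & 18.8 & 29.0 & 17.7 \\
    18.2 & 35.3 & 16.7 & 37.2 & 16.5 \\
    32.4 & 30.5 & 23.0 & 32.6 & 12.8 \\
    20.8 & 31.4 & 28.3 & 31.4 & 20.4 \\
    \end{tabular}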
    [The sum of these data items is 502.6 and the sum of their squares is 13610.22 .]
  4. Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a \(5 \%\) significance level. Report briefly on your conclusions.
  5. State the assumptions about the distribution of the experimental error that underlie your analysis in part (iv).
OCR S2 2009 June Q4
7 marks Moderate -0.8
4 A survey is to be carried out to draw conclusions about the proportion \(p\) of residents of a town who support the building of a new supermarket. It is proposed to carry out the survey by interviewing a large number of people in the high street of the town, which attracts a large number of tourists.
  1. Give two different reasons why this proposed method is inappropriate.
  2. Suggest a good method of carrying out the survey.
  3. State two statistical properties of your survey method that would enable reliable conclusions about \(p\) to be drawn.
OCR MEI S3 2012 June Q2
18 marks Easy -1.8
2
    1. Give two reasons why an investigator might need to take a sample in order to obtain information about a population.
    2. State two requirements of a sample.
    3. Discuss briefly the advantage of the sampling being random.
    1. Under what circumstances might one use a Wilcoxon single sample test in order to test a hypothesis about the median of a population? What distributional assumption is needed for the test?
    2. On a stretch of road leading out of the centre of a town, highways officials have been monitoring the speed of the traffic in case it has increased. Previously it was known that the median speed on this stretch was 28.7 miles per hour. For a random sample of 12 vehicles on the stretch, the following speeds were recorded. $$\begin{array} { l l l l l l l l l l l l } 32.0 & 29.1 & 26.1 & 35.2 & 34.4 & 28.6 & 32.3 & 28.5 & 27.0 & 33.3 & 28.2 & 31.9 \end{array}$$ Carry out a test, with a \(5 \%\) significance level, to see whether the speed of the traffic on this stretch of road seems to have increased on the whole.
      [10]
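The rank sums for the Wilcoxon single sample test on the speed data can be sketched as below. The sketch assumes no tied absolute differences, which holds for these twelve values, and the helper name `signed_rank_sums` is illustrative. The one-tailed 5% critical value for \(n = 12\) is taken from standard tables as 17.

```python
def signed_rank_sums(values, median0):
    """Sum the ranks of positive and negative differences from median0."""
    diffs = [v - median0 for v in values if v != median0]
    # Rank by absolute difference; ties are not handled in this sketch.
    ordered = sorted(diffs, key=abs)
    w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return w_plus, w_minus


speeds = [32.0, 29.1, 26.1, 35.2, 34.4, 28.6, 32.3, 28.5, 27.0, 33.3, 28.2, 31.9]
w_plus, w_minus = signed_rank_sums(speeds, 28.7)

CRITICAL_5PC_ONE_TAIL = 17  # n = 12, quoted from tables
print(w_plus, w_minus, "reject H0" if w_minus <= CRITICAL_5PC_ONE_TAIL else "do not reject H0")
```

For an upper-tail alternative the test statistic is the smaller rank sum, here the sum of ranks of the negative differences.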
OCR MEI S4 Q4
12 marks Standard +0.8
4 An experiment is carried out to compare five industrial paints, A, B, C, D, E, that are intended to be used to protect exterior surfaces in polluted urban environments. Five different types of surface (I, II, III, IV, V) are to be used in the experiment, and five specimens of each type of surface are available. Five different external locations ( \(1,2,3,4,5\) ) are used in the experiment. The paints are applied to the specimens of the surfaces which are then left in the locations for a period of six months. At the end of this period, a "score" is given to indicate how effective the paint has been in protecting the surface.
  1. Name a suitable experimental design for this trial and give an example of an experimental layout. Initial analysis of the data indicates that any differences between the types of surface are negligible, as also are any differences between the locations. It is therefore decided to analyse the data by one-way analysis of variance.
  2. State the usual model, including the accompanying distributional assumptions, for the one-way analysis of variance. Interpret the terms in the model.
  3. The data for analysis are as follows. Higher scores indicate better performance. The underlying distributions of strengths are assumed to be Normal for both suppliers, with variances 2.45 for supplier A and 1.40 for supplier B.
  4. Test at the \(5 \%\) level of significance whether it is reasonable to assume that the mean strengths from the two suppliers are equal.
  5. Provide a two-sided 90\% confidence interval for the true mean difference.
  6. Show that the test procedure used in part (i), with samples of sizes 7 and 5 and a \(5 \%\) significance level, leads to acceptance of the null hypothesis of equal means if \(- 1.556 < \bar { x } - \bar { y } < 1.556\), where \(\bar { x }\) and \(\bar { y }\) are the observed sample means from suppliers A and B . Hence find the probability of a Type II error for this test procedure if in fact the true mean strength from supplier A is 2.0 units more than that from supplier B.
  7. A manager suggests that the Wilcoxon rank sum test should be used instead, comparing the median strengths for the samples of sizes 7 and 5 . Give one reason why this suggestion might be sensible and two why it might not.
OCR MEI S4 2009 June Q4
24 marks Standard +0.3
4
  1. Describe, with the aid of a specific example, an experimental situation for which a Latin square design is appropriate, indicating carefully the features which show that a completely randomised or randomised blocks design would be inappropriate.
  2. The model for the one-way analysis of variance may be written, in a customary notation, as $$x _ { i j } = \mu + \alpha _ { i } + e _ { i j }$$ State the distributional assumptions underlying \(e _ { i j }\) in this model. What is the interpretation of the term \(\alpha _ { i }\) ?
  3. An experiment for comparing 5 treatments is carried out, with a total of 20 observations. A partial one-way analysis of variance table for the analysis of the results is as follows.
    \begin{tabular}{lcccc}
    Source of variation & Sums of squares & Degrees of freedom & Mean squares & Mean square ratio \\
    Between treatments & & & & \\
    Residual & 68.76 & & & \\
    Total & 161.06 & & & \\
    \end{tabular}
    Copy and complete the table, and carry out the appropriate test using a \(1 \%\) significance level.
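Completing the partial table in part 3 is a matter of subtraction and division. The sketch below assumes that 68.76 and 161.06 are the residual and total sums of squares, as their relative sizes imply. The 1% critical value for \(F_{4,15}\) is taken from standard tables (about 4.89).

```python
def complete_anova_table(ss_residual, ss_total, treatments, observations):
    """Fill in a one-way ANOVA table from the residual and total sums of squares."""
    ss_between = ss_total - ss_residual
    df_between = treatments - 1
    df_residual = observations - treatments
    ms_between = ss_between / df_between
    ms_residual = ss_residual / df_residual
    return {
        "ss_between": ss_between,
        "df_between": df_between,
        "df_residual": df_residual,
        "df_total": observations - 1,
        "ms_between": ms_between,
        "ms_residual": ms_residual,
        "F": ms_between / ms_residual,
    }


completed = complete_anova_table(68.76, 161.06, treatments=5, observations=20)
F_CRITICAL_1PC = 4.89  # upper 1% point of F(4, 15), quoted from tables
print({key: round(val, 3) for key, val in completed.items()}, completed["F"] > F_CRITICAL_1PC)
```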
OCR H240/02 2021 November Q11
2 marks Moderate -0.8
11 Zac is planning to write a report on the music preferences of the students at his college. There is a large number of students at the college.
  1. State one reason why Zac might wish to obtain information from a sample of students, rather than from all the students.
  2. Amaya suggests that Zac should use a sample that is stratified by school year. Give one advantage of this method as compared with random sampling, in this context. Zac decides to take a random sample of 60 students from his college. He asks each student how many hours per week, on average, they spend listening to music during term. From his results he calculates the following statistics.
    \begin{tabular}{ccccc}
    Mean & Standard deviation & Median & Lower quartile & Upper quartile \\
    21.0 & 4.20 & 20.5 & 18.0 & 22.9 \\
    \end{tabular}
  3. Sundip tells Zac that, during term, she spends on average 30 hours per week listening to music. Discuss briefly whether this value should be considered an outlier.
  4. Layla claims that, during term, each student spends on average 20 hours per week listening to music. Zac believes that the true figure is higher than 20 hours. He uses his results to carry out a hypothesis test at the 5\% significance level. Assume that the time spent listening to music is normally distributed with standard deviation 4.20 hours. Carry out the test.
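For part 3, two common outlier rules can be compared using Zac's summary statistics. Both rules are assumptions about which definition is intended: the first flags values beyond the upper quartile plus 1.5 times the interquartile range, and the second flags values beyond the mean plus 2 standard deviations.

```python
mean_hours, sd_hours = 21.0, 4.20
lower_q, upper_q = 18.0, 22.9
sundip = 30

# Rule 1: upper fence at Q3 + 1.5 * IQR.
iqr_fence = upper_q + 1.5 * (upper_q - lower_q)
# Rule 2: upper fence at mean + 2 standard deviations.
sd_fence = mean_hours + 2 * sd_hours

print(f"IQR fence = {iqr_fence:.2f}: outlier? {sundip > iqr_fence}")
print(f"SD fence  = {sd_fence:.2f}: outlier? {sundip > sd_fence}")
```

The two rules disagree for a value of 30, which is why a brief discussion rather than a bare yes or no fits the question.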
Edexcel AS Paper 2 2019 June Q4
8 marks Moderate -0.8
  1. Joshua is investigating the daily total rainfall in Hurn for May to October 2015
Using the information from the large data set, Joshua wishes to calculate the mean of the daily total rainfall in Hurn for May to October 2015
  1. Using your knowledge of the large data set, explain why Joshua needs to clean the data before calculating the mean. Using the information from the large data set, he produces the grouped frequency table below.
    \begin{tabular}{ccc}
    Daily total rainfall (\(r\) mm) & Frequency & Midpoint (\(x\) mm) \\
    \(0 \leqslant r < 0.5\) & 121 & 0.25 \\
    \(0.5 \leqslant r < 1.0\) & 10 & 0.75 \\
    \(1.0 \leqslant r < 5.0\) & 24 & 3.0 \\
    \(5.0 \leqslant r < 10.0\) & 12 & 7.5 \\
    \(10.0 \leqslant r < 30.0\) & 17 & 20.0 \\
    \end{tabular}
    $$\text { You may use } \sum \mathrm { f } x = 539.75 \text { and } \sum \mathrm { f } x ^ { 2 } = 7704.1875$$
  2. Use linear interpolation to calculate an estimate for the upper quartile of the daily total rainfall.
  3. Calculate an estimate for the standard deviation of the daily total rainfall in Hurn for May to October 2015
    1. State the assumption involved with using class midpoints to calculate an estimate of a mean from a grouped frequency table.
    2. Using your knowledge of the large data set, explain why this assumption does not hold in this case.
    3. State, giving a reason, whether you would expect the actual mean daily total rainfall in Hurn for May to October 2015 to be larger than, smaller than or the same as an estimate based on the grouped frequency table.
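The linear interpolation asked for in part 2 can be sketched as below. The sketch assumes the quartile position is taken as \(0.75n\) (some texts use a slightly different convention) and that values are spread evenly within each class; the helper name is illustrative.

```python
def interpolated_quantile(classes, fraction):
    """Estimate a quantile from (lower, upper, frequency) classes in ascending order."""
    n = sum(freq for _, _, freq in classes)
    target = fraction * n
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= target:
            # Move through the class in proportion to the frequency still needed.
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("fraction must lie between 0 and 1")


rainfall = [(0, 0.5, 121), (0.5, 1.0, 10), (1.0, 5.0, 24), (5.0, 10.0, 12), (10.0, 30.0, 17)]
upper_quartile = interpolated_quantile(rainfall, 0.75)
print(f"estimated upper quartile = {upper_quartile:.2f} mm")
```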
Edexcel AS Paper 2 2020 June Q4
7 marks Easy -1.2
  1. A lake contains three different types of carp.
There are an estimated 450 mirror carp, 300 leather carp and 850 common carp.
Tim wishes to investigate the health of the fish in the lake.
He decides to take a sample of 160 fish.
  1. Give a reason why stratified random sampling cannot be used.
  2. Explain how a sample of size 160 could be taken to ensure that the estimated populations of each type of carp are fairly represented. You should state the name of the sampling method used. As part of the health check, Tim weighed the fish.
    His results are given in the table below.
    \begin{tabular}{ccc}
    Weight (w kg) & Frequency (f) & Midpoint (m kg) \\
    \(2 \leqslant w < 3.5\) & 8 & 2.75 \\
    \(3.5 \leqslant w < 4\) & 32 & 3.75 \\
    \(4 \leqslant w < 4.5\) & 64 & 4.25 \\
    \(4.5 \leqslant w < 5\) & 40 & 4.75 \\
    \(5 \leqslant w < 6\) & 16 & 5.5 \\
    \end{tabular}
    $$\left( \text { You may use } \sum \mathrm { f } m = 692 \quad \text { and } \quad \sum \mathrm { f } m ^ { 2 } = 3053 \right)$$
  3. Calculate an estimate for the standard deviation of the weight of the carp. Tim realised that he had transposed the figures for 2 of the weights of the fish.
    He had recorded in the table 2.3 instead of 3.2 and 4.6 instead of 6.4
  4. Without calculating a new estimate for the standard deviation, state what effect
    1. using the correct figure of 3.2 instead of 2.3
    2. using the correct figure of 6.4 instead of 4.6
      would have on your estimated standard deviation.
      Give a reason for each of your answers.
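The standard deviation estimate in part 3 follows from the two summary sums. This sketch assumes the divisor \(n\) form, \(\sqrt{\sum fm^2 / n - \bar{m}^2}\); using \(n - 1\) instead would give a slightly larger value.

```python
from math import sqrt


def sd_from_sums(n, sum_fx, sum_fx2):
    """Standard deviation (divisor n) from the frequency-weighted sums."""
    mean = sum_fx / n
    return sqrt(sum_fx2 / n - mean ** 2)


carp_sd = sd_from_sums(160, 692, 3053)
print(f"estimated standard deviation = {carp_sd:.3f} kg")
```

Both transposition corrections stay inside their original classes (2.3 and 3.2 lie in the first class, 4.6 and 6.4 do not), so whether the estimate changes depends on whether the corrected value moves to a different class midpoint.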
Edexcel AS Paper 2 2024 June Q2
3 marks Easy -1.8
  1. Keith is studying the variable Daily Mean Wind Direction, in degrees, from the large data set.
Keith summarised the data for Camborne from 1987 into 4 directions \(A , B , C\) and \(D\) representing North, South, East and West in some order.
\begin{tabular}{lcccc}
Direction & \(A\) & \(B\) & \(C\) & \(D\) \\
Frequency & 22 & 48 & 56 & 58 \\
\end{tabular}
  1. Using your knowledge of the large data set state, giving a reason, which direction \(A\) represents. The entry for Hurn on 27th September 1987 was 999
  2. State, giving a reason, what Keith should do with this value.
Edexcel Paper 3 2020 October Q2
7 marks Moderate -0.8
  1. A random sample of 15 days is taken from the large data set for Perth in June and July 1987. The scatter diagram in Figure 1 displays the values of two of the variables for these 15 days.
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{2b63aa7f-bc50-4422-8dc0-e661b521c221-04_722_709_376_677} \captionsetup{labelformat=empty} \caption{Figure 1}
\end{figure}
  1. Describe the correlation. The variable on the \(x\)-axis is Daily Mean Temperature measured in \({ } ^ { \circ } \mathrm { C }\).
  2. Using your knowledge of the large data set,
    1. suggest which variable is on the \(y\)-axis,
    2. state the units that are used in the large data set for this variable. Stav believes that there is a correlation between Daily Total Sunshine and Daily Maximum Relative Humidity at Heathrow. He calculates the product moment correlation coefficient between these two variables for a random sample of 30 days and obtains \(r = - 0.377\)
  3. Carry out a suitable test to investigate Stav's belief at a \(5 \%\) level of significance. State clearly
    • your hypotheses
    • your critical value
    On a random day at Heathrow the Daily Maximum Relative Humidity was 97\%
  4. Comment on the number of hours of sunshine you would expect on that day, giving a reason for your answer.
Edexcel Paper 3 2021 October Q3
8 marks Easy -1.2
  1. Stav is studying the large data set for September 2015
He codes the variable Daily Mean Pressure, \(x\), using the formula \(y = x - 1010\) The data for all 30 days from Hurn are summarised by $$\sum y = 214 \quad \sum y ^ { 2 } = 5912$$
  1. State the units of the variable \(x\)
  2. Find the mean Daily Mean Pressure for these 30 days.
  3. Find the standard deviation of Daily Mean Pressure for these 30 days. Stav knows that, in the UK, winds circulate
    • in a clockwise direction around a region of high pressure
    • in an anticlockwise direction around a region of low pressure
    The table gives the Daily Mean Pressure for 3 locations from the large data set on 26/09/2015
    \begin{tabular}{lccc}
    Location & Heathrow & Hurn & Leuchars \\
    Daily Mean Pressure & 1029 & 1028 & 1028 \\
    Cardinal Wind Direction & & & \\
    \end{tabular}
    The Cardinal Wind Directions for these 3 locations on 26/09/2015 were, in random order, $$\begin{array} { l l l } W & N E & E \end{array}$$ You may assume that these 3 locations were under a single region of pressure.
  4. Using your knowledge of the large data set, place each of these Cardinal Wind Directions in the correct location in the table.
    Give a reason for your answer.
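Parts 2 and 3 rely on undoing the coding \(y = x - 1010\). A minimal sketch, assuming the divisor \(n\) form of the standard deviation: subtracting a constant shifts the mean by that constant and leaves the standard deviation unchanged.

```python
from math import sqrt

n, sum_y, sum_y2 = 30, 214, 5912
shift = 1010  # coding used in the question: y = x - 1010

mean_y = sum_y / n
sd_y = sqrt(sum_y2 / n - mean_y ** 2)

mean_x = mean_y + shift  # the mean is shifted back by the coding constant
sd_x = sd_y              # spread is unaffected by subtracting a constant

print(f"mean pressure = {mean_x:.2f}, standard deviation = {sd_x:.2f}")
```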
OCR MEI AS Paper 2 2024 June Q3
4 marks Easy -1.3
3 A student conducts an investigation into the number of hours spent cooking per week by people who live in village A. The student represents the data in the cumulative frequency diagram below. \section*{Hours spent cooking per week by people who live in village A} \includegraphics[max width=\textwidth, alt={}, center]{ce94c1ea-ffe5-42d0-8f8a-43c47105d6bf-3_796_1494_918_233}
  1. How many people were involved in the investigation?
  2. Use the copy of the diagram in the Printed Answer Booklet to determine an estimate for the interquartile range. The student conducts a similar investigation into the number of hours spent cooking per week by 200 people who live in village B. The interquartile range is found to be 3.9 hours.
  3. Explain whether the evidence suggests that the number of hours spent cooking by people who live in village B is more variable, equally variable or less variable than the number of hours spent cooking by people who live in village A .
OCR MEI AS Paper 2 2024 June Q5
3 marks Easy -1.8
5 The pre-release material contains information for countries in the world concerning real GDP per capita in US\$ and mobile phone subscribers per 100 population. In an investigation into the relationship between these two variables, a student takes a sample of 20 countries in Africa. The student draws a scatter diagram for the data, which is shown in Fig. 5.1. \section*{Fig. 5.1} \section*{Africa 1st sample} \includegraphics[max width=\textwidth, alt={}, center]{ce94c1ea-ffe5-42d0-8f8a-43c47105d6bf-4_433_1043_842_244}
  1. What does Fig. 5.1 suggest about the relationship between real GDP per capita and the number of mobile phone subscribers per 100 population? Another student collects a different sample of 20 countries from Africa, and draws a scatter diagram for the data, which is shown in Fig. 5.2. \section*{Fig. 5.2} \section*{Africa 2nd sample}
    \includegraphics[max width=\textwidth, alt={}]{ce94c1ea-ffe5-42d0-8f8a-43c47105d6bf-4_273_1084_1818_244}
    Mobile phone subscribers per 100 population
  2. What does Fig. 5.2 suggest about the relationship between real GDP per capita and the number of mobile phone subscribers per 100 population?
  3. Explain whether either of the two scatter diagrams is likely to be representative of the true relationship between real GDP per capita and the number of mobile phone subscribers per 100 population, for countries in Africa.
OCR MEI AS Paper 2 2021 November Q9
5 marks Moderate -0.5
9 Arun, Beth and Charlie are investigating whether there is any association between death rate per 1000 and physician density per 1000. They each collect a random sample of size 10. Arun's sample is shown in Fig.9.1. \begin{table}[h]
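\begin{tabular}{lcc}
 & death rate per 1000 & physician density per 1000 \\
Canberra & 7.2 & 3.62 \\
Dhaka & 5.3 & 0.49 \\
Brasilia & 6.8 & 2.23 \\
Yaounde & 9.3 & 0.08 \\
Zagreb & 12.5 & 3.08 \\
Tehran & 5.4 & 1.16 \\
Rome & 10.7 & 4.14 \\
Tripoli & 3.8 & 2.09 \\
Oslo & 7.9 & 4.51 \\
Abuja & 9.7 & 0.35 \\
\end{tabular}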
\captionsetup{labelformat=empty} \caption{Fig. 9.1}
\end{table}
  1. Explain whether or not Arun collected his data from the pre-release material, or whether it is not possible to say. Beth and Charlie collected their samples from the pre-release material. Each of them drew a scatter diagram for their samples. The samples and scatter diagrams are shown in Figs. 9.2 and 9.3.
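    \begin{tabular}{lcc}
    Beth's sample & death rate per 1000 & physician density per 1000 \\
    Sudan & 6.7 & 0.41 \\
    Cambodia & 7.4 & 0.17 \\
    Gabon & 6.2 & 0.36 \\
    Seychelles & 7 & 0.95 \\
    Mexico & 5.4 & 2.25 \\
    Kuwait & 2.3 & 2.58 \\
    Haiti & 7.5 & 0.23 \\
    Maldives & 4 & 1.04 \\
    Nauru & 5.9 & 1.24 \\
    Jordan & 3.4 & 2.34 \\
    \end{tabular}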
    \includegraphics[max width=\textwidth, alt={}]{2b9ce212-84e2-4817-be94-98e2adff12a3-08_545_1024_340_918}
    \begin{table}[h]
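    \begin{tabular}{lcc}
    Charlie's sample & death rate per 1000 & physician density per 1000 \\
    Vanuata & 4 & 0.17 \\
    Solomon Islands & 3.8 & 0.2 \\
    N. Mariana Islands & 4.9 & 0.36 \\
    Nauru & 5.9 & 1.24 \\
    United Kingdom & 9.4 & 2.81 \\
    Portugal & 10.6 & 3.34 \\
    North Macedonia & 9.6 & 2.87 \\
    Faroe Islands & 8.8 & 2.62 \\
    Bulgaria & 14.5 & 3.99 \\
    St. Kitts and Nevis & 7.2 & 2.52 \\
    \end{tabular}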
    \captionsetup{labelformat=empty} \caption{Fig. 9.3}
    \end{table} \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Fig. 9.2} \includegraphics[alt={},max width=\textwidth]{2b9ce212-84e2-4817-be94-98e2adff12a3-08_572_899_1400_1041}
    \end{figure} Arun states that Charlie's sample and Beth's sample cannot both be random for the following reasons.
    Kofi collects a sample of 10 African countries and 10 European countries. The scatter diagram for his results is shown in Fig. 9.4. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{2b9ce212-84e2-4817-be94-98e2adff12a3-09_485_903_902_260} \captionsetup{labelformat=empty} \caption{Fig. 9.4}
    \end{figure}
  2. On the copy of Fig. 9.4 in the Printed Answer Booklet, use your knowledge of the pre-release material to identify the points representing the 10 European countries, justifying your choice.
OCR MEI Paper 2 2018 June Q14
9 marks Moderate -0.8
14 The pre-release material includes data on unemployment rates in different countries. A sample from this material has been taken. All the countries in the sample are in Europe. The data have been grouped and are shown in Fig 14.1. \begin{table}[h]
\begin{tabular}{lcccccc}
Unemployment rate & \(0 -\) & \(5 -\) & \(10 -\) & \(15 -\) & \(20 -\) & \(35 - 50\) \\
Frequency & 15 & 21 & 5 & 5 & 2 & 2 \\
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 14.1}
\end{table} A cumulative frequency curve has been generated for the sample data using a spreadsheet. This is shown in Fig. 14.2. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{d8ff9511-aff7-45ea-ba55-e6667e8ba760-08_639_1081_808_466} \captionsetup{labelformat=empty} \caption{Fig. 14.2}
\end{figure} Hodge used Fig. 14.2 to estimate the median unemployment rate in Europe. He obtained the answer 5.0. The correct value for this sample is 6.9.
  1. (A) There is a systematic error in the diagram.
    The scatter diagram shown in Fig. 14.3 shows the unemployment rate and life expectancy at birth for the 47 countries in the sample for which this information is available. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Scatter diagram to show life expectancy at birth against unemployment rate} \includegraphics[alt={},max width=\textwidth]{d8ff9511-aff7-45ea-ba55-e6667e8ba760-09_627_1281_456_367}
    \end{figure} Fig. 14.3 The product moment correlation coefficient for the 47 items in the sample is \(-0.2607\).
    The \(p\)-value associated with \(r = - 0.2607\) and \(n = 47\) is 0.0383 .
  2. Does this information suggest that there is an association between unemployment rate and life expectancy at birth in countries in Europe? Hodge uses the spreadsheet tools to obtain the equation of a line of best fit for this data.
  3. The unemployment rate in Kosovo is 35.3 , but there is no data available on life expectancy. Is it reasonable to use Hodge's line of best fit to estimate life expectancy at birth in Kosovo?
OCR MEI Paper 2 2023 June Q8
6 marks Easy -1.2
8 A garden centre stocks coniferous hedging plants. These are displayed in 10 rows, each of 120 plants. An employee collects a sample of the heights of these plants by recording the height of each plant on the front row of the display.
  1. Explain whether the data collected by the employee is a simple random sample. The data are shown in the cumulative frequency curve below. \includegraphics[max width=\textwidth, alt={}, center]{11788aaf-98fb-4a78-8a40-a40743b1fe15-06_1376_1344_680_233} The owner states that at least \(75 \%\) of the plants are between 40 cm and 80 cm tall.
  2. Show that the data collected by the employee supports this statement.
  3. Explain whether all samples of 120 plants would necessarily support the owner's statement.