2.02h Recognize outliers

154 questions

Edexcel S1 2010 January Q2
9 marks Easy -1.2
The 19 employees of a company take an aptitude test. The scores out of 40 are illustrated in the stem and leaf diagram below.
\(2|6\) means a score of 26
\begin{align}
0 & | 7 & (1) \\
1 & | 88 & (2) \\
2 & | 4468 & (4) \\
3 & | 2333459 & (7) \\
4 & | 00000 & (5)
\end{align}
Find
  1. the median score, [1]
  2. the interquartile range. [3]
The company director decides that any employees whose scores are so low that they are outliers will undergo retraining. An outlier is an observation whose value is less than the lower quartile minus 1.0 times the interquartile range.
  1. Explain why there is only one employee who will undergo retraining. [2]
  2. On the graph paper on page 5, draw a box plot to illustrate the employees' scores. [3]
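The quartile and outlier working in this question can be checked numerically. The following is a minimal sketch, assuming a common S1 convention for quartiles of listed data (take \(n/4\), \(n/2\) or \(3n/4\); round up to the next position if it is not a whole number, otherwise average that value and the next). The helper name `quartile` and this convention are assumptions for illustration and are not stated in the question.

```python
import math

# Scores read off the stem and leaf diagram (19 employees).
scores = [7, 18, 18, 24, 24, 26, 28,
          32, 33, 33, 33, 34, 35, 39,
          40, 40, 40, 40, 40]

def quartile(sorted_data, fraction):
    """Assumed convention: position = fraction * n; round up if it is
    fractional, otherwise average that value and the next one."""
    n = len(sorted_data)
    pos = fraction * n
    if pos != int(pos):
        return sorted_data[math.ceil(pos) - 1]
    k = int(pos)
    return (sorted_data[k - 1] + sorted_data[k]) / 2

data = sorted(scores)
q1 = quartile(data, 0.25)
median = quartile(data, 0.5)
q3 = quartile(data, 0.75)
iqr = q3 - q1

# Rule stated in the question: below Q1 - 1.0 * IQR.
lower_limit = q1 - 1.0 * iqr
low_outliers = [x for x in data if x < lower_limit]
print(median, q1, q3, iqr, lower_limit, low_outliers)
```

Under this convention only the score of 7 falls below the limit, which agrees with the claim in the question that exactly one employee is retrained. A different quartile convention may give slightly different quartiles.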
Edexcel S1 2002 November Q7
18 marks Moderate -0.8
The following stem and leaf diagram shows the aptitude scores \(x\) obtained by all the applicants for a particular job.
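Aptitude score: \(3|1\) means 31
\begin{align}
3 & | 1\ 2\ 9 & (3) \\
4 & | 2\ 4\ 6\ 8\ 9 & (5) \\
5 & | 1\ 3\ 3\ 5\ 6\ 7\ 9 & (7) \\
6 & | 0\ 1\ 3\ 3\ 3\ 5\ 6\ 8\ 8\ 9 & (10) \\
7 & | 1\ 2\ 2\ 2\ 4\ 5\ 5\ 5\ 6\ 8\ 8\ 8\ 8\ 9 & (14) \\
8 & | 0\ 1\ 2\ 3\ 5\ 8\ 8\ 9 & (8) \\
9 & | 0\ 1\ 2 & (3)
\end{align}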
  1. Write down the modal aptitude score. [1]
  2. Find the three quartiles for these data. [3]
Outliers can be defined to be outside the limits \(Q_1 - 1.0(Q_3 - Q_1)\) and \(Q_3 + 1.0(Q_3 - Q_1)\).
  1. On a graph paper, draw a box plot to represent these data. [7]
For these data, \(\Sigma x = 3363\) and \(\Sigma x^2 = 238305\).
  1. Calculate, to 2 decimal places, the mean and the standard deviation for these data. [3]
  2. Use two different methods to show that these data are negatively skewed. [4]
Edexcel S1 Q7
21 marks Standard +0.3
The following table gives the weights, in grams, of 60 items delivered to a company in a day.

| Weight (g) | 0 - 10 | 10 - 20 | 20 - 30 | 30 - 40 | 40 - 50 | 50 - 60 | 60 - 80 |
|---|---|---|---|---|---|---|---|
| No. of items | 2 | 11 | 18 | 12 | 9 | 6 | 2 |

  1. Use interpolation to calculate estimated values of
    1. the median weight,
    2. the interquartile range,
    3. the thirty-third percentile.
    [7 marks]
Outliers are defined to be outside the range from \(2.5Q_1 - 1.5Q_2\) to \(2.5Q_2 - 1.5Q_1\).
  1. Given that the lightest item weighed 3 g and the two heaviest weighed 65 g and 79 g, draw on graph paper an accurate box-and-whisker plot of the data. Indicate any outliers clearly. [5 marks]
  2. Describe the skewness of the distribution. [1 mark]
The mean weight was 32.0 g and the standard deviation of the weights was 14.9 g.
  1. State, with a reason, whether you would choose to summarise the data by using the mean and standard deviation or the median and interquartile range. [2 marks]
On another day, items were delivered whose weights ranged from 14 g to 58 g; the median was 32 g, the lower quartile was 24 g and the interquartile range was 26 g.
  1. Draw a further box plot for these data on the same diagram. Briefly compare the two sets of data using your plots. [6 marks]
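The interpolation asked for in the first part of this question can be sketched in code. This is an illustrative check only: the function name `interpolate` is hypothetical, and it assumes the usual convention of locating the quartiles at cumulative frequencies \(n/4\), \(n/2\) and \(3n/4\), with the thirty-third percentile at \(0.33n\).

```python
# Class boundaries (g) and frequencies from the table of 60 items.
bounds = [0, 10, 20, 30, 40, 50, 60, 80]
freqs = [2, 11, 18, 12, 9, 6, 2]

def interpolate(bounds, freqs, target):
    """Value at cumulative frequency `target`, found by linear
    interpolation inside the class that contains it."""
    cumulative = 0
    for lower, upper, f in zip(bounds, bounds[1:], freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("target exceeds total frequency")

n = sum(freqs)
q1 = interpolate(bounds, freqs, n / 4)
median = interpolate(bounds, freqs, n / 2)
q3 = interpolate(bounds, freqs, 3 * n / 4)
p33 = interpolate(bounds, freqs, 0.33 * n)
iqr = q3 - q1
print(round(median, 2), round(iqr, 2), round(p33, 2))
```

Linear interpolation treats the items as evenly spread within each class, so the results are estimates rather than exact values.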
OCR S1 2009 June Q5
5 marks Moderate -0.8
The diameters of 100 pebbles were measured. The measurements rounded to the nearest millimetre, \(x\), are summarised in the table.

| \(x\) | \(10 \leqslant x \leqslant 19\) | \(20 \leqslant x \leqslant 24\) | \(25 \leqslant x \leqslant 29\) | \(30 \leqslant x \leqslant 49\) |
|---|---|---|---|---|
| Number of stones | 25 | 22 | 29 | 24 |

These data are to be presented on a statistical diagram.
  1. For a histogram, find the frequency density of the \(10 \leqslant x \leqslant 19\) class. [2]
  2. For a cumulative frequency graph, state the coordinates of the first two points that should be plotted. [2]
  3. Why is it not possible to draw an exact box-and-whisker plot to illustrate the data? [1]
OCR S1 2010 June Q1
9 marks Easy -1.2
The marks of some students in a French examination were summarised in a grouped frequency distribution and a cumulative frequency diagram was drawn, as shown below. \includegraphics{figure_1}
  1. Estimate how many students took the examination. [1]
  2. How can you tell that no student scored more than 55 marks? [1]
  3. Find the greatest possible range of the marks. [1]
  4. The minimum mark for Grade C was 27. The number of students who gained exactly Grade C was the same as the number of students who gained a grade lower than C. Estimate the maximum mark for Grade C. [3]
  5. In a German examination the marks of the same students had an interquartile range of 16 marks. What does this result indicate about the performance of the students in the German examination as compared with the French examination? [3]
OCR MEI S1 2010 January Q7
19 marks Moderate -0.8
A pear grower collects a random sample of 120 pears from his orchard. The histogram below shows the lengths, in mm, of these pears. \includegraphics{figure_7}
  1. Calculate the number of pears which are between 90 and 100 mm long. [2]
  2. Calculate an estimate of the mean length of the pears. Explain why your answer is only an estimate. [4]
  3. Calculate an estimate of the standard deviation. [3]
  4. Use your answers to parts (ii) and (iii) to investigate whether there are any outliers. [4]
  5. Name the type of skewness of the distribution. [1]
  6. Illustrate the data using a cumulative frequency diagram. [5]
OCR MEI S1 2011 June Q8
18 marks Moderate -0.3
The heating quality of the coal in a sample of 50 sacks is measured in suitable units. The data are summarised below.

| Heating quality (\(x\)) | \(9.1 \leqslant x < 9.3\) | \(9.3 < x \leqslant 9.5\) | \(9.5 < x \leqslant 9.7\) | \(9.7 < x \leqslant 9.9\) | \(9.9 < x \leqslant 10.1\) |
|---|---|---|---|---|---|
| Frequency | 5 | 7 | 15 | 16 | 7 |

  1. Draw a cumulative frequency diagram to illustrate these data. [5]
  2. Use the diagram to estimate the median and interquartile range of the data. [3]
  3. Show that there are no outliers in the sample. [3]
  4. Three of these 50 sacks are selected at random. Find the probability that
    1. in all three, the heating quality \(x\) is more than 9.5, [3]
    2. in at least two, the heating quality \(x\) is more than 9.5. [4]
OCR MEI S1 2014 June Q6
17 marks Moderate -0.8
The weights, \(w\) grams, of a random sample of 60 carrots of variety A are summarised in the table below.

| Weight | \(30 \leqslant w < 50\) | \(50 \leqslant w < 60\) | \(60 \leqslant w < 70\) | \(70 \leqslant w < 80\) | \(80 \leqslant w < 90\) |
|---|---|---|---|---|---|
| Frequency | 11 | 10 | 18 | 14 | 7 |

  1. Draw a histogram to illustrate these data. [5]
  2. Calculate estimates of the mean and standard deviation of \(w\). [4]
  3. Use your answers to part (ii) to investigate whether there are any outliers. [3]
The weights, \(x\) grams, of a random sample of 50 carrots of variety B are summarised as follows. $$n = 50 \quad \sum x = 3624.5 \quad \sum x^2 = 265416$$
  1. Calculate the mean and standard deviation of \(x\). [3]
  2. Compare the central tendency and variation of the weights of varieties A and B. [2]
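The estimates for variety A and the summary-value calculation for variety B can be sketched as follows. This is a hedged illustration: it assumes class midpoints represent each class, uses the sample standard deviation (divisor \(n - 1\)), and applies a "more than 2 standard deviations from the mean" outlier rule. Neither the divisor nor the rule is stated in the question, so both are assumptions.

```python
import math

# Variety A: class boundaries (g) and frequencies from the table.
bounds = [30, 50, 60, 70, 80, 90]
freqs = [11, 10, 18, 14, 7]

mids = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]
n = sum(freqs)
sum_fx = sum(f * m for f, m in zip(freqs, mids))
sum_fx2 = sum(f * m * m for f, m in zip(freqs, mids))

mean_a = sum_fx / n
# Divisor n - 1 is an assumed convention; use n for the population form.
sd_a = math.sqrt((sum_fx2 - n * mean_a ** 2) / (n - 1))

# Assumed rule: outliers lie more than 2 sd from the mean.
low, high = mean_a - 2 * sd_a, mean_a + 2 * sd_a
# With grouped data we can only say whether a class extends past a limit.
possible_low = bounds[0] < low
possible_high = bounds[-1] > high

# Variety B from n = 50, sum x = 3624.5, sum x^2 = 265416.
n_b, sx, sxx = 50, 3624.5, 265416
mean_b = sx / n_b
sd_b = math.sqrt((sxx - n_b * mean_b ** 2) / (n_b - 1))

print(round(mean_a, 2), round(sd_a, 2), possible_low, possible_high)
print(round(mean_b, 2), round(sd_b, 2))
```

Because the raw weights are not available, the check can only show that outliers are possible at the low end of variety A (the first class extends below the lower limit), not how many there are.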
OCR MEI S1 Q4
7 marks Moderate -0.8
A sprinter runs many 100-metre trials, and the time, \(x\) seconds, for each is recorded. A sample of eight of these times is taken, as follows. \(10.53 \quad 10.61 \quad 10.04 \quad 10.49 \quad 10.63 \quad 10.55 \quad 10.47 \quad 10.63\)
  1. Calculate the sample mean, \(\bar{x}\), and sample standard deviation, \(s\), of these times. [3]
  2. Show that the time of 10.04 seconds may be regarded as an outlier. [2]
  3. Discuss briefly whether or not the time of 10.04 seconds should be discarded. [2]
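A short numerical check of the sprinter question is sketched below. The outlier rule is not stated in the question, so the code assumes the commonly used test of "more than 2 sample standard deviations from the mean".

```python
import statistics

times = [10.53, 10.61, 10.04, 10.49, 10.63, 10.55, 10.47, 10.63]

mean = statistics.mean(times)
s = statistics.stdev(times)  # sample standard deviation (divisor n - 1)

# Assumed rule: flag values more than 2 standard deviations from the mean.
outliers = [t for t in times if abs(t - mean) > 2 * s]
print(round(mean, 3), round(s, 3), outliers)
```

Under that assumed rule only the time of 10.04 seconds is flagged. Whether it should be discarded is a judgement about context that the calculation does not settle.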
OCR MEI S1 Q6
17 marks Moderate -0.8
The weights, \(w\) grams, of a random sample of 60 carrots of variety A are summarised in the table below.

| Weight | \(30 \leqslant w < 50\) | \(50 \leqslant w < 60\) | \(60 \leqslant w < 70\) | \(70 \leqslant w < 80\) | \(80 \leqslant w < 90\) |
|---|---|---|---|---|---|
| Frequency | 11 | 10 | 18 | 14 | 7 |

  1. Draw a histogram to illustrate these data. [5]
  2. Calculate estimates of the mean and standard deviation of \(w\). [4]
  3. Use your answers to part (ii) to investigate whether there are any outliers. [3]
The weights, \(x\) grams, of a random sample of 50 carrots of variety B are summarised as follows. $$n = 50 \quad \sum x = 3624.5 \quad \sum x^2 = 265416$$
  1. Calculate the mean and standard deviation of \(x\). [3]
  2. Compare the central tendency and variation of the weights of varieties A and B. [2]
OCR H240/02 2023 June Q7
5 marks Standard +0.8
A student wishes to prove that, for all positive integers \(a\) and \(b\), \(a^2 - 4b \neq 2\).
  1. Prove that \(a^2 - 4b = 2 \Rightarrow a\) is even. [2]
  2. Hence or otherwise prove that, for all positive integers \(a\) and \(b\), \(a^2 - 4b \neq 2\). [3]
AQA AS Paper 2 2018 June Q18
6 marks Easy -1.2
Jennie is a piano teacher who teaches nine pupils. She records how many hours per week they practice the piano along with their most recent practical exam score.
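
| Student | Practice (hours per week) | Practical exam score (out of 100) |
|---|---|---|
| Donovan | 50 | 64 |
| Vazquez | 6 | 71 |
| Higgins | 3 | 55 |
| Begum | 2.5 | 47 |
| Collins | 1 | 80 |
| Coldbridge | 4 | 61 |
| Nedbalek | 4.5 | 65 |
| Carter | 8 | 83 |
| White | 11 | 92 |
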
[diagram]
  1. Identify two possible outliers by name, giving a possible explanation for the position on the scatter diagram of each outlier. [4 marks]
  2. Jennie discards the two outliers.
    1. Describe the correlation shown by the scatter diagram for the remaining points. [1 mark]
    2. Interpret this correlation in the context of the question. [1 mark]
AQA AS Paper 2 2024 June Q11
1 mark Easy -1.8
The table below shows the daily salt intake, \(x\) grams, and the daily Vitamin C intake, \(y\) milligrams, for a group of 10 adults.
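
| Adult | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| \(x\) | 5.3 | 6.2 | 3.6 | 10.4 | 2.4 | 9.4 | 6 | 5 | 7.1 | 11.2 |
| \(y\) | 90 | 145 | 88 | 48 | 114 | 44 | 80 | 95 | 55 | 41 |
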
A scatter diagram of the data is shown below. \includegraphics{figure_3} One of the adults is an outlier. Identify the letter of the adult that is the outlier. Circle your answer below. [1 mark] A \(\qquad\) B \(\qquad\) E \(\qquad\) J
AQA AS Paper 2 Specimen Q17
6 marks Moderate -0.8
The table below is an extract from the Large Data Set.
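
| Make | Region | Engine size | Mass | CO2 | CO |
|---|---|---|---|---|---|
| VAUXHALL | South West | 1398 | 1163 | 118 | 0.463 |
| VOLKSWAGEN | London | 999 | 1055 | 106 | 0.407 |
| VAUXHALL | South West | 1248 | 1225 | 85 | 0.141 |
| BMW | South West | 2979 | 1635 | 194 | 0.139 |
| TOYOTA | South West | 1995 | 1650 | 123 | 0.274 |
| BMW | South West | 2979 | 0 | 244 | 0.447 |
| FORD | South West | 1596 | 0 | 165 | 0.518 |
| TOYOTA | South West | 1299 | 1050 | 144 | |
| VAUXHALL | London | 1398 | 1361 | 140 | 0.695 |
| FORD | North West | 4951 | 1799 | 299 | 0.621 |
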
    1. Calculate the standard deviation of the engine sizes in the table. [1 mark]
    2. The mean of the engine sizes is 2084. Any value more than 2 standard deviations from the mean can be identified as an outlier. Using this definition of an outlier, show that the sample of engine sizes has exactly one outlier. Fully justify your answer. [3 marks]
  1. Rajan calculates the mean of the masses of the cars in this extract and states that it is 1094 kg. Use your knowledge of the Large Data Set to suggest what error Rajan is likely to have made in his calculation. [1 mark]
  2. Rajan claims there is an error in the data recorded in the table for one of the Toyotas from the South West, because there is no value for its carbon monoxide emissions. Use your knowledge of the Large Data Set to comment on Rajan's claim. [1 mark]
AQA Paper 3 2019 June Q12
6 marks Moderate -0.3
Amelia decides to analyse the heights of members of her school rowing club. The heights of a random sample of 10 rowers are shown in the table below.
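
| Rower | Jess | Nell | Liv | Neve | Ann | Tori | Maya | Kath | Darcy | Jen |
|---|---|---|---|---|---|---|---|---|---|---|
| Height (cm) | 162 | 169 | 172 | 156 | 146 | 161 | 159 | 164 | 157 | 160 |
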
  1. Any value more than 2 standard deviations from the mean may be regarded as an outlier. Verify that Ann's height is an outlier. Fully justify your answer. [4 marks]
  2. Amelia thinks she may have written down Ann's height incorrectly. If Ann's height were discarded, state with a reason what, if any, difference this would make to the mean and standard deviation. [2 marks]
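The 2-standard-deviation rule in this question, and the effect of discarding Ann's height, can be explored with a short sketch. It assumes the population form of the standard deviation (divisor \(n\)). The question does not say which form to use, and the sample form leads to the same conclusion for these data.

```python
import statistics

heights = {"Jess": 162, "Nell": 169, "Liv": 172, "Neve": 156, "Ann": 146,
           "Tori": 161, "Maya": 159, "Kath": 164, "Darcy": 157, "Jen": 160}

values = list(heights.values())
mean = statistics.mean(values)
sd = statistics.pstdev(values)  # divisor n; an assumed convention

# Rule from the question: more than 2 standard deviations from the mean.
flagged = [name for name, h in heights.items() if abs(h - mean) > 2 * sd]

# Recompute both statistics with Ann's height discarded.
without_ann = [h for name, h in heights.items() if name != "Ann"]
mean_without = statistics.mean(without_ann)
sd_without = statistics.pstdev(without_ann)

print(flagged, round(mean, 1), round(sd, 2))
print(round(mean_without, 1), round(sd_without, 2))
```

Removing a value far below the mean raises the mean and reduces the spread, which is the qualitative answer the second part asks for.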
AQA Paper 3 2021 June Q13
6 marks Moderate -0.8
The table below is an extract from the Large Data Set.
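
| Propulsion Type | Region | Engine Size | Mass | CO₂ | Particulate Emissions |
|---|---|---|---|---|---|
| 2 | London | 1896 | 1533 | 154 | 0.04 |
| 2 | North West | 1896 | 1423 | 146 | 0.029 |
| 2 | North West | 1896 | 1353 | 138 | 0.025 |
| 2 | South West | 1998 | 1547 | 159 | 0.026 |
| 2 | London | 1896 | 1388 | 138 | 0.025 |
| 2 | South West | 1896 | 1214 | 130 | 0.011 |
| 2 | South West | 1896 | 1480 | 146 | 0.029 |
| 2 | South West | 1896 | 1413 | 146 | 0.024 |
| 2 | South West | 2496 | 1695 | 192 | 0.034 |
| 2 | South West | 1422 | 1251 | 122 | 0.025 |
| 2 | South West | 1995 | 2075 | 175 | 0.034 |
| 2 | London | 1896 | 1285 | 140 | 0.036 |
| 2 | North West | 1896 | 0 | 146 | |
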
    1. Calculate the mean and standard deviation of CO₂ emissions in the table. [2 marks]
    2. Any value more than 2 standard deviations from the mean can be identified as an outlier. Determine, using this definition of an outlier, if there are any outliers in this sample of CO₂ emissions. Fully justify your answer. [2 marks]
  1. Maria claims that the last line in the table must contain two errors. Use your knowledge of the Large Data Set to comment on Maria's claim. [2 marks]
AQA Paper 3 2023 June Q15
11 marks Moderate -0.8
  1. A random sample of eight cars was selected from the Large Data Set. The masses of these cars, in kilograms, were as follows.
     950, 989, 1247, 1415, 1506, 1680, 1833, 2040
     It is given that, for the population of cars in the Large Data Set: lower quartile = 1167, median = 1393, upper quartile = 1570.
    1. It was decided to remove any of the masses which fall outside the following interval.
       median \(- 1.5 \times\) interquartile range \(\leq\) mass \(\leq\) median \(+ 1.5 \times\) interquartile range
       Show that only one of the eight masses in the sample should be removed. [3 marks]
    2. Write down the statistical name for the mass that should be removed in part (a)(i). [1 mark]
  2. The table shows the probability distribution of the number of previous owners, \(N\), for a sample of cars taken from the Large Data Set.

    | \(n\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more |
    |---|---|---|---|---|---|---|---|
    | \(P(N = n)\) | 0.14 | 0.37 | 0.9k | 0.25 | 0.4k | 1.7k | 0 |

    Find the value of \(P(1 \leq N < 5)\) [4 marks]
  3. An expert team is investigating whether there have been any changes in CO₂ emissions from all cars taken from the Large Data Set. The team decided to collect a quota sample of 200 cars to reflect the different years and the different makes of cars in the Large Data Set.
    1. Using your knowledge of the Large Data Set, explain how the team can collect this sample. [2 marks]
    2. Describe one disadvantage of quota sampling. [1 mark]
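The interval in part (a)(i) of this question is centred on the median rather than on the quartiles, and a brief sketch makes the arithmetic explicit. All values are taken from the question and the variable names are illustrative.

```python
masses = [950, 989, 1247, 1415, 1506, 1680, 1833, 2040]
lower_quartile, median, upper_quartile = 1167, 1393, 1570

iqr = upper_quartile - lower_quartile
low = median - 1.5 * iqr
high = median + 1.5 * iqr

# Keep only masses inside the closed interval [low, high].
removed = [m for m in masses if not (low <= m <= high)]
print(iqr, low, high, removed)
```

Note that the quartiles used are those of the whole Large Data Set population, as given in the question, not quartiles computed from the eight sampled masses.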
OCR PURE Q10
9 marks Easy -1.2
The masses of a random sample of 120 boulders in a certain area were recorded. The results are summarized in the histogram. \includegraphics{figure_5}
  1. Calculate the number of boulders with masses between 60 and 65 kg. [2]
    1. Use midpoints to find estimates of the mean and standard deviation of the masses of the boulders in the sample. [3]
    2. Explain why your answers are only estimates. [1]
  2. Use your answers to part (b)(i) to determine an estimate of the number of outliers, if any, in the distribution. [2]
  3. Give one advantage of using a histogram rather than a pie chart in this context. [1]
OCR MEI Paper 2 Specimen Q16
20 marks Easy -1.8
Fig. 16.1, Fig. 16.2 and Fig. 16.3 show some data about life expectancy, including some from the pre-release data set. \includegraphics{figure_16_1} \includegraphics{figure_16_2} \includegraphics{figure_16_3}
  1. Comment on the shapes of the distributions of life expectancy at birth in 2014 and 1974. [2]
    1. The minimum value shown in the box plot is negative. What does a negative value indicate? [1]
    2. What feature of Fig 16.3 suggests that a Normal distribution would not be an appropriate model for increase in life expectancy from one year to another year? [1]
    3. Software has been used to obtain the values in the table in Fig. 16.3. Decide whether the level of accuracy is appropriate. Justify your answer. [1]
    4. John claims that for half the people in the world their life expectancy has improved by 10 years or more. Explain why Fig. 16.3 does not provide conclusive evidence for John's claim. [1]
  2. Decide whether the maximum increase in life expectancy from 1974 to 2014 is an outlier. Justify your answer. [3]
Here is some further information from the pre-release data set.

| Country | Life expectancy at birth in 2014 |
|---|---|
| Ethiopia | 60.8 |
| Sweden | 81.9 |

    1. Estimate the change in life expectancy at birth for Ethiopia between 1974 and 2014.
    2. Estimate the change in life expectancy at birth for Sweden between 1974 and 2014.
    3. Give one possible reason why the answers to parts (i) and (ii) are so different. [4]
Fig. 16.4 shows the relationship between life expectancy at birth in 2014 and 1974. \includegraphics{figure_16_4} A spreadsheet gives the following linear model for all the data in Fig 16.4. (Life expectancy at birth 2014) = 30.98 + 0.67 × (Life expectancy at birth 1974) The life expectancy at birth in 1974 for the region that now constitutes the country of South Sudan was 37.4 years. The value for this country in 2014 is not available.
    1. Use the linear model to estimate the life expectancy at birth in 2014 for South Sudan. [2]
    2. Give two reasons why your answer to part (i) is not likely to be an accurate estimate for the life expectancy at birth in 2014 for South Sudan. You should refer to both information from Fig 16.4 and your knowledge of the large data set. [2]
  1. In how many of the countries represented in Fig. 16.4 did life expectancy drop between 1974 and 2014? Justify your answer. [3]
WJEC Unit 2 2018 June Q06
10 marks Moderate -0.8
Basel is a keen learner of languages. He finds a website on which a large number of language tutors offer their services. Basel records the cost, in dollars, of a one hour lesson from a random sample of tutors. He puts the data into a computer program which gives the following summary statistics.

Cost per 1 hour lesson

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 10.0 | 16.0 | 17.2 | 19.8 | 21.0 | 40.0 |

  1. Showing all calculations, comment on any outliers for the cost of a one hour lesson with a language tutor. [4]
  2. Describe the skewness of the data and explain what it means in this context. [2]
Dafydd is also a keen learner of languages. He takes his own random sample of the cost, in dollars, for a one hour lesson. He produces the following box plot. \includegraphics{figure_6}
    1. What will happen to the mean if the outlier is removed?
    2. What will happen to the median if the outlier is removed? [2]
  1. Compare and contrast the distributions of the cost of one hour language lessons for Dafydd's sample and Basel's sample. [2]
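For Basel's summary statistics, an outlier check can be sketched as below. The question does not state a rule, so the code assumes the conventional fences at \(Q_1 - 1.5 \times \text{IQR}\) and \(Q_3 + 1.5 \times \text{IQR}\). With only summary values available, the check can show whether the minimum or maximum lies beyond a fence, not how many outliers there are.

```python
# Basel's summary statistics (cost in dollars per one hour lesson).
summary = {"min": 10.0, "q1": 16.0, "median": 17.2,
           "mean": 19.8, "q3": 21.0, "max": 40.0}

iqr = summary["q3"] - summary["q1"]
# Assumed convention: fences at 1.5 * IQR beyond the quartiles.
lower_fence = summary["q1"] - 1.5 * iqr
upper_fence = summary["q3"] + 1.5 * iqr

has_low_outlier = summary["min"] < lower_fence
has_high_outlier = summary["max"] > upper_fence
print(lower_fence, upper_fence, has_low_outlier, has_high_outlier)
```

The mean exceeding the median is consistent with the high-end outlier pulling the mean upwards, which links to the skewness part of the question.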
WJEC Unit 2 Specimen Q5
12 marks Easy -1.2
Gareth has a keen interest in pop music. He recently read the following claim in a music magazine: "In the pop industry most songs on the radio are not longer than three minutes."
  1. He decided to investigate this claim by recording the lengths of the top 50 singles in the UK Official Singles Chart for the week beginning 17 June 2016. (A 'single' in this context is one digital audio track.) Comment on the suitability of this sample to investigate the magazine's claim. [1]
  2. Gareth recorded the data in the table below.
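    Length of singles for top 50 UK Official Chart singles, 17 June 2016

    | 2.5-(3.0) | 3.0-(3.5) | 3.5-(4.0) | 4.0-(4.5) | 4.5-(5.0) | 5.0-(5.5) | 5.5-(6.0) | 6.0-(6.5) | 6.5-(7.0) | 7.0-(7.5) |
    |---|---|---|---|---|---|---|---|---|---|
    | 3 | 17 | 22 | 7 | 0 | 0 | 0 | 0 | 0 | 1 |
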
    He used these data to produce a graph of the distribution of the lengths of singles. \includegraphics{figure_2} State two corrections that Gareth needs to make to the histogram so that it accurately represents the data in the table. [2]
  3. Gareth also produced a box plot of the lengths of singles. \includegraphics{figure_3} He sees that there is one obvious outlier.
    1. What will happen to the mean if the outlier is removed?
    2. What will happen to the standard deviation if the outlier is removed? [2]
  4. Gareth decided to remove the outlier. He then produced a table of summary statistics.
    1. Use the appropriate statistics from the table to show, by calculation, that the maximum value for the length of a single is not an outlier.
      Summary statistics
      Length of single for top 50 UK Official Singles Chart (minutes)

      | | N | Mean | Standard deviation | Minimum | Lower quartile | Median | Upper quartile | Maximum |
      |---|---|---|---|---|---|---|---|---|
      | Length of single | 49 | 3.57 | 0.393 | 2.77 | 3.26 | 3.60 | 3.89 | 4.38 |

    2. State, with a reason, whether these statistics support the magazine's claim. [4]
  5. Gareth also calculated summary statistics for the lengths of 30 singles selected at random from his personal collection.
    Summary statistics
    Length of single for Gareth's random sample of 30 singles (minutes)

    | | N | Mean | Standard deviation | Minimum | Lower quartile | Median | Upper quartile | Maximum |
    |---|---|---|---|---|---|---|---|---|
    | Length of single | 30 | 3.13 | 0.364 | 2.58 | 2.73 | 2.92 | 3.22 | 3.95 |

    Compare and contrast the distribution of lengths of singles in Gareth's personal collection with the distribution in the top 50 UK Official Singles Chart. [3]
SPS SPS FM Statistics 2021 June Q5
7 marks Moderate -0.3
Eleven students in a class sit a Mathematics exam and their average score is 67% with a standard deviation of 12%. One student from the class is absent and sits the paper later, achieving a score of 85%.
  1. Find the mean score for the whole class and the standard deviation for the whole class. [5]
  2. Comment, with justification, on whether the score for the paper sat later should be considered as an outlier. [2]
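Updating a mean and standard deviation when one more score is added can be done by recovering \(\sum x\) and \(\sum x^2\) first. The sketch below assumes the quoted standard deviation uses divisor \(n\), and assumes a "more than 2 standard deviations from the mean" rule for the outlier comment. Neither is stated in the question.

```python
import math

n, mean, sd = 11, 67.0, 12.0   # original eleven students (percent)
late = 85.0                    # score of the student who sat the paper later

# Recover the sums, treating the quoted sd as the divisor-n form (assumed).
sum_x = n * mean
sum_x2 = n * (sd ** 2 + mean ** 2)

n_all = n + 1
mean_all = (sum_x + late) / n_all
sd_all = math.sqrt((sum_x2 + late ** 2) / n_all - mean_all ** 2)

# Assumed rule: outlier if more than 2 standard deviations from the mean.
is_outlier = abs(late - mean_all) > 2 * sd_all
print(round(mean_all, 2), round(sd_all, 2), is_outlier)
```

Under these assumptions the late score lies well within 2 standard deviations of the whole-class mean, so it would not be flagged.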
SPS SPS SM 2021 February Q4
10 marks Easy -1.3
Each member of a group of 27 people was timed when completing a puzzle. The time taken, \(x\) minutes, for each member of the group was recorded. These times are summarised in the following box and whisker plot. \includegraphics{figure_4}
  1. Find the range of the times. [1]
  2. Find the interquartile range of the times. [1]
  3. For these 27 people, \(\sum x = 607.5\) and \(\sum x^2 = 17623.25\). Calculate the mean time taken to complete the puzzle. [1]
  4. Calculate the standard deviation of the times taken to complete the puzzle. [2]
  5. Taruni defines an outlier as a value more than 3 standard deviations above the mean. State how many outliers Taruni would say there are in these data, giving a reason for your answer. [1]
  6. Adam and Beth also completed the puzzle in \(a\) minutes and \(b\) minutes respectively, where \(a > b\). When their times are included with the data of the other 27 people
    Suggest a possible value for \(a\) and a possible value for \(b\), explaining how your values satisfy the above conditions. [3]
  7. Without carrying out any further calculations, explain why the standard deviation of all 29 times will be lower than your answer to part (d). [1]
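The mean and standard deviation in this question come directly from the summary sums, and Taruni's rule then gives a single threshold. This is a minimal sketch assuming the divisor-\(n\) form of the standard deviation. Counting outliers still requires reading the maximum off the box plot, which is not reproduced here.

```python
import math

n, sum_x, sum_x2 = 27, 607.5, 17623.25

mean = sum_x / n
sd = math.sqrt(sum_x2 / n - mean ** 2)   # divisor-n form, assumed here

# Taruni's rule: a value more than 3 standard deviations above the mean.
threshold = mean + 3 * sd
print(round(mean, 2), round(sd, 2), round(threshold, 2))
```

Any time on the box plot above the printed threshold would count as an outlier under Taruni's definition.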
SPS SPS SM Statistics 2024 January Q3
12 marks Moderate -0.8
Zac is planning to write a report on the music preferences of the students at his college. There is a large number of students at the college.
  1. State one reason why Zac might wish to obtain information from a sample of students, rather than from all the students. [1]
  2. Amaya suggests that Zac should use a sample that is stratified by school year. Give one advantage of this method as compared with random sampling, in this context. [1]
Zac decides to take a random sample of 60 students from his college. He asks each student how many hours per week, on average, they spend listening to music during term. From his results he calculates the following statistics.

| Mean | Standard deviation | Median | Lower quartile | Upper quartile |
|---|---|---|---|---|
| 21.0 | 4.20 | 20.5 | 18.0 | 22.9 |

  1. Sundip tells Zac that, during term, she spends on average 30 hours per week listening to music. Discuss briefly whether this value should be considered an outlier. [3]
  2. Layla claims that, during term, each student spends on average 20 hours per week listening to music. Zac believes that the true figure is higher than 20 hours. He uses his results to carry out a hypothesis test at the 5\% significance level. Assume that the time spent listening to music is normally distributed with standard deviation 4.20 hours. Carry out the test. [7]
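Both the outlier discussion and the hypothesis test in this question reduce to short calculations, sketched below. The two outlier rules shown (1.5 × IQR above the upper quartile, and 2 standard deviations above the mean) are assumed conventions, since the question names neither. The test is a one-tailed z-test using the stated standard deviation of 4.20 hours.

```python
import math
from statistics import NormalDist

mean, sd, q1, q3 = 21.0, 4.20, 18.0, 22.9
sundip = 30.0

# Two common outlier rules (both are assumed conventions here).
beyond_iqr_fence = sundip > q3 + 1.5 * (q3 - q1)   # fence at 30.25
beyond_2sd = sundip > mean + 2 * sd                # limit at 29.4

# One-tailed z-test of H0: mu = 20 against H1: mu > 20, sigma = 4.20.
n, mu0, alpha = 60, 20.0, 0.05
z = (mean - mu0) / (sd / math.sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > critical
print(beyond_iqr_fence, beyond_2sd, round(z, 3), round(critical, 3), reject_h0)
```

The two rules disagree about the value 30, which is why the question asks for a discussion rather than a single verdict.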
OCR H240/02 2017 Specimen Q13
5 marks Moderate -0.8
The table and the four scatter diagrams below show data taken from the 2011 UK census for four regions. On the scatter diagrams the names have been replaced by letters. The table shows, for each region, the mean and standard deviation of the proportion of workers in each Local Authority who travel to work by driving a car or van and the proportion of workers in each Local Authority who travel to work as a passenger in a car or van. Each scatter diagram shows, for each of the Local Authorities in a particular region, the proportion of workers who travel to work by driving a car or van and the proportion of workers who travel to work as a passenger in a car or van. \includegraphics{figure_13}
  1. Using the values given in the table, match each region to its corresponding scatter diagram, explaining your reasoning. [3]
  2. Steven claims that the outlier in the scatter diagram for Region C consists of a group of small islands. Explain whether or not the data given above support his claim. [1]
  3. One of the Local Authorities in Region B consists of a single large island. Explain whether or not you would expect this Local Authority to appear as an outlier in the scatter diagram for Region B. [1]