2.02h Recognize outliers

154 questions

CAIE S1 2013 June Q5
9 marks Easy -1.8
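5 The following are the annual amounts of money spent on clothes, to the nearest \(\$10\), by 27 people.
$$\begin{array} { l l l l l l l l l } 10 & 40 & 60 & 80 & 100 & 130 & 140 & 140 & 140 \\ 150 & 150 & 150 & 160 & 160 & 160 & 160 & 170 & 180 \\ 180 & 200 & 210 & 250 & 270 & 280 & 310 & 450 & 570 \end{array}$$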
  1. Construct a stem-and-leaf diagram for the data.
  2. Find the median and the interquartile range of the data. An 'outlier' is defined as any data value which is more than 1.5 times the interquartile range above the upper quartile, or more than 1.5 times the interquartile range below the lower quartile.
  3. List the outliers.
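The 1.5 × IQR definition above can be applied mechanically. The following is a minimal Python sketch. It assumes quartiles are taken at positions \((n+1)/4\) and \(3(n+1)/4\), which is what the standard-library `statistics.quantiles` does with `method="exclusive"`. Other quartile conventions can move the fences for other data sets. The helper name `iqr_outliers` is illustrative and does not come from the source.

```python
from statistics import quantiles

# Annual clothing spend (to the nearest $10) for the 27 people in the question.
spend = [10, 40, 60, 80, 100, 130, 140, 140, 140,
         150, 150, 150, 160, 160, 160, 160, 170, 180,
         180, 200, 210, 250, 270, 280, 310, 450, 570]

def iqr_outliers(data):
    """Return (q1, median, q3, lower fence, upper fence, outliers) for the 1.5 x IQR rule."""
    # 'exclusive' places the quartiles at positions (n+1)/4, (n+1)/2 and 3(n+1)/4.
    q1, med, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # "More than 1.5 x IQR" beyond a quartile, so use strict inequalities.
    flagged = [x for x in sorted(data) if x < lower or x > upper]
    return q1, med, q3, lower, upper, flagged

q1, med, q3, lower, upper, flagged = iqr_outliers(spend)
print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {q3 - q1}")
print(f"fences: {lower} and {upper}; outliers: {flagged}")
```

With this convention the fences are 35 and 315, so 10, 450 and 570 are flagged.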
CAIE S1 2019 June Q6
10 marks Easy -1.8
6
  1. Give one advantage and one disadvantage of using a box-and-whisker plot to represent a set of data.
  2. The times in minutes taken to run a marathon were recorded for a group of 13 marathon runners and were found to be as follows. $$\begin{array} { l l l l l l l l l l l l l } 180 & 275 & 235 & 242 & 311 & 194 & 246 & 229 & 238 & 768 & 332 & 227 & 228 \end{array}$$ State which of the mean, mode or median is most suitable as a measure of central tendency for these times. Explain why the other measures are less suitable.
  3. Another group of 33 people ran the same marathon and their times in minutes were as follows.
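    $$\begin{array} { l l l l l l l l l l l } 190 & 203 & 215 & 246 & 249 & 253 & 255 & 254 & 258 & 260 & 261 \\ 263 & 267 & 269 & 274 & 276 & 280 & 288 & 283 & 287 & 294 & 300 \\ 307 & 318 & 327 & 331 & 336 & 345 & 351 & 353 & 360 & 368 & 375 \end{array}$$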
    1. On the grid below, draw a box-and-whisker plot to illustrate the times for these 33 people. \includegraphics[max width=\textwidth, alt={}, center]{f4d040a2-6a04-49ce-98ac-8ba5c515f905-09_611_1202_1270_555}
    2. Find the interquartile range of these times.
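For the marathon question, here is a short Python sketch. It compares the mean and median of the 13 times, where the single 768 drags the mean upwards, and computes the quartiles of the 33 times. The sketch uses the \((n+1)/4\) quartile convention of Python's `statistics.quantiles(..., method="exclusive")`. A mark scheme may locate quartiles slightly differently.

```python
from statistics import mean, median, multimode, quantiles

# Part 2: marathon times (minutes) for the group of 13 runners.
times_13 = [180, 275, 235, 242, 311, 194, 246, 229, 238, 768, 332, 227, 228]

print("mean:", mean(times_13))              # pulled upwards by the single 768
print("median:", median(times_13))          # unaffected by how extreme 768 is
print("modes:", len(multimode(times_13)))   # every value occurs once, so no useful mode

# Part 3: times for the second group of 33 runners (quantiles sorts internally).
times_33 = [190, 203, 215, 246, 249, 253, 255, 254, 258, 260, 261,
            263, 267, 269, 274, 276, 280, 288, 283, 287, 294, 300,
            307, 318, 327, 331, 336, 345, 351, 353, 360, 368, 375]

q1, q2, q3 = quantiles(times_33, n=4, method="exclusive")
print("five-number summary:", min(times_33), q1, q2, q3, max(times_33))
print("IQR:", q3 - q1)
```

Here the mean of the 13 times is 285 and the median is 238. No value repeats, so the mode is of no use.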
OCR S1 2016 June Q3
13 marks Moderate -0.8
3 The masses, \(m\) grams, of 52 apples of a certain variety were found and summarised as follows. $$n = 52 \quad \Sigma ( m - 150 ) = - 182 \quad \Sigma ( m - 150 ) ^ { 2 } = 1768$$
  1. Find the mean and variance of the masses of these 52 apples.
  2. Use your answers from part (i) to find the exact value of \(\Sigma m ^ { 2 }\). The masses of the apples are illustrated in the box-and-whisker plot below. \includegraphics[max width=\textwidth, alt={}, center]{b5ce3230-7528-439c-9e85-ef159a49cba3-3_250_1310_662_383}
  3. How many apples have masses in the interval \(130 \leqslant m < 140\) ?
  4. An 'outlier' is a data item that lies more than 1.5 times the interquartile range above the upper quartile, or more than 1.5 times the interquartile range below the lower quartile. Explain whether any of the masses of these apples are outliers.
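The coded sums in the apples question can be decoded with plain arithmetic. This sketch assumes the variance is defined with divisor \(n\), which is the OCR S1 convention. The variable names are illustrative.

```python
# Coded summary for the 52 apples: n, sum of (m - 150), sum of (m - 150)^2.
n, sum_d, sum_d2 = 52, -182, 1768
a = 150  # coding constant, d = m - a

# Subtracting a constant shifts the mean but leaves the variance unchanged.
mean_d = sum_d / n
var = sum_d2 / n - mean_d ** 2   # divisor n
mean_m = a + mean_d

# var = (sum of m^2)/n - mean^2, so rearrange to recover the sum of m^2.
sum_m2 = n * (var + mean_m ** 2)

# Cross-check by expanding sum of ((m - a) + a)^2 directly.
print(mean_m, var, sum_m2, sum_d2 + 2 * a * sum_d + n * a * a)
```

This gives a mean of 146.5 g, a variance of 21.75 and \(\Sigma m^2 = 1117168\).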
OCR MEI S1 2005 January Q2
7 marks Moderate -0.8
2 A sprinter runs many 100-metre trials, and the time, \(x\) seconds, for each is recorded. A sample of eight of these times is taken, as follows. $$\begin{array} { l l l l l l l l } 10.53 & 10.61 & 10.04 & 10.49 & 10.63 & 10.55 & 10.47 & 10.63 \end{array}$$
  1. Calculate the sample mean, \(\bar { x }\), and sample standard deviation, \(s\), of these times.
  2. Show that the time of 10.04 seconds may be regarded as an outlier.
  3. Discuss briefly whether or not the time of 10.04 seconds should be discarded.
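Here is a minimal sketch of the sprinter calculation. It assumes the rule of thumb commonly used in MEI S1: a value more than two sample standard deviations from the mean may be regarded as an outlier.

```python
from statistics import mean, stdev

# 100 m times (seconds) in the sample of eight trials.
times = [10.53, 10.61, 10.04, 10.49, 10.63, 10.55, 10.47, 10.63]

x_bar = mean(times)
s = stdev(times)          # sample standard deviation (divisor n - 1)
lower, upper = x_bar - 2 * s, x_bar + 2 * s

# Flag anything more than two standard deviations from the mean.
outliers = [t for t in times if t < lower or t > upper]
print(f"mean = {x_bar:.4f}, s = {s:.4f}")
print(f"mean - 2s = {lower:.3f}, mean + 2s = {upper:.3f}, outliers: {outliers}")
```

The lower limit is about 10.11 s, so 10.04 s falls below it.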
OCR MEI S1 2005 January Q7
12 marks Easy -1.2
7 The cumulative frequency graph below illustrates the distances that 176 children live from their primary school. \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{Distance from school} \includegraphics[alt={},max width=\textwidth]{b35b2b3b-0d26-4a35-b4d2-110bf270d5dc-4_1073_1571_580_340}
\end{figure}
  1. Use the graph to estimate, to the nearest 10 metres,
    (A) the median distance from school,
    (B) the lower quartile, upper quartile and interquartile range.
  2. Draw a box and whisker plot to illustrate the data. The graph above was drawn from the following grouped data.
    Distance (metres) | 200 | 400 | 600 | 800 | 1000 | 1200
    Cumulative frequency | 20 | 64 | 118 | 150 | 169 | 176
  3. Copy and complete the grouped frequency table below describing the same data.
    Distance (\(d\) metres) | Frequency
    \(0 < d \leqslant 200\) | 20
    \(200 < d \leqslant 400\) |
  4. Hence estimate the mean distance these children live from school. It is subsequently found that none of the 176 children lives within 100 metres of the school.
  5. Calculate the revised estimate of the mean distance.
  6. Describe what change needs to be made to the cumulative frequency graph.
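The table-completion and mean-estimation steps can be sketched in Python. The sketch assumes each child lies at the midpoint of their class, which is why the result is only an estimate. The function name `grouped_mean` is hypothetical.

```python
# Class boundaries (metres) and cumulative frequencies from the grouped data.
boundaries = [0, 200, 400, 600, 800, 1000, 1200]
cum_freq = [20, 64, 118, 150, 169, 176]

# Class frequencies are the differences between successive cumulative frequencies.
freqs = [cum_freq[0]] + [b - a for a, b in zip(cum_freq, cum_freq[1:])]
mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

def grouped_mean(freqs, mids):
    """Estimate the mean by treating every value as its class midpoint."""
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)

estimate = grouped_mean(freqs, mids)

# Nobody lives within 100 m, so the first class becomes 100 < d <= 200 (midpoint 150).
revised_mids = [150.0] + mids[1:]
revised = grouped_mean(freqs, revised_mids)

print(freqs)
print(f"mean ≈ {estimate:.1f} m, revised mean ≈ {revised:.1f} m")
```

Moving the first midpoint from 100 to 150 adds 20 × 50 = 1000 to \(\Sigma f x\). This raises the estimate from about 508 m to about 514 m.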
OCR MEI S1 2006 January Q1
6 marks Easy -1.8
1 The times taken, in minutes, by 80 people to complete a crossword puzzle are summarised by the box and whisker plot below. \includegraphics[max width=\textwidth, alt={}, center]{acb05873-e441-4b95-9732-6ebd5ae79fa6-2_147_848_507_612}
  1. Write down the range and the interquartile range of the times.
  2. Determine whether any of the times can be regarded as outliers.
  3. Describe the shape of the distribution of the times.
OCR MEI S1 2007 January Q6
18 marks Moderate -0.3
6 The birth weights in grams of a random sample of 1000 babies are displayed in the cumulative frequency diagram below. \includegraphics[max width=\textwidth, alt={}, center]{05b96db3-93c7-4921-a1c6-c7b2f8952a8f-4_1264_1553_486_296}
  1. Use the diagram to estimate the median and interquartile range of the data.
  2. Use your answers to part (i) to estimate the number of outliers in the sample.
  3. Should these outliers be excluded from any further analysis? Briefly explain your answer.
  4. Any baby whose weight is below the 10th percentile is selected for careful monitoring. Use the diagram to determine the range of weights of the babies who are selected. \(12 \%\) of new-born babies require some form of special care. A maternity unit has 17 new-born babies. You may assume that these 17 babies form an independent random sample.
  5. Find the probability that
    (A) exactly 2 of these 17 babies require special care,
    (B) more than 2 of the 17 babies require special care.
  6. On 100 independent occasions the unit has 17 babies. Find the expected number of occasions on which there would be more than 2 babies who require special care.
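Parts 5 and 6 use the binomial model \(X \sim \mathrm{B}(17, 0.12)\). The following is a short sketch using `math.comb` (Python 3.8+). The name `binom_pmf` is an illustrative helper.

```python
from math import comb

n, p = 17, 0.12  # 17 babies, each needing special care with probability 0.12

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_exactly_2 = binom_pmf(2, n, p)
# "More than 2" is the complement of "at most 2".
p_more_than_2 = 1 - sum(binom_pmf(k, n, p) for k in range(3))

# Expected count over 100 independent occasions is 100 * P(X > 2).
expected = 100 * p_more_than_2
print(f"P(X = 2) = {p_exactly_2:.4f}, P(X > 2) = {p_more_than_2:.4f}, expected = {expected:.1f}")
```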
OCR MEI S1 2008 January Q6
18 marks Easy -1.2
6 The maximum temperatures \(x\) degrees Celsius recorded during each month of 2005 in Cambridge are given in the table below.
Month | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
\(x\) (°C) | 9.2 | 7.1 | 10.7 | 14.2 | 16.6 | 21.8 | 22.0 | 22.6 | 21.1 | 17.4 | 10.1 | 7.8
These data are summarised by \(n = 12 , \Sigma x = 180.6 , \Sigma x ^ { 2 } = 3107.56\).
  1. Calculate the mean and standard deviation of the data.
  2. Determine whether there are any outliers.
  3. The formula \(y = 1.8 x + 32\) is used to convert degrees Celsius to degrees Fahrenheit. Find the mean and standard deviation of the 2005 maximum temperatures in degrees Fahrenheit.
  4. In New York, the monthly maximum temperatures are recorded in degrees Fahrenheit. In 2005 the mean was 63.7 and the standard deviation was 16.0. Briefly compare the maximum monthly temperatures in Cambridge and New York in 2005. The total numbers of hours of sunshine recorded in Cambridge during the month of January for each of the last 48 years are summarised below.
    Hours \(h\) | \(70 \leqslant h < 100\) | \(100 \leqslant h < 110\) | \(110 \leqslant h < 120\) | \(120 \leqslant h < 150\) | \(150 \leqslant h < 170\) | \(170 \leqslant h < 190\)
    Number of years | 6 | 8 | 10 | 11 | 10 | 3
  5. Draw a cumulative frequency graph for these data.
  6. Use your graph to estimate the 90th percentile.
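The Cambridge calculations can be reproduced from the summary statistics. The sketch makes two assumptions. It uses the sample standard deviation (divisor \(n-1\)). It also applies the "more than two standard deviations from the mean" outlier rule commonly used in MEI S1.

```python
from math import sqrt

# Summary statistics for the 12 monthly maximum temperatures (degrees Celsius).
n, sum_x, sum_x2 = 12, 180.6, 3107.56
temps = [9.2, 7.1, 10.7, 14.2, 16.6, 21.8, 22.0, 22.6, 21.1, 17.4, 10.1, 7.8]

mean_c = sum_x / n
s_xx = sum_x2 - sum_x**2 / n          # S_xx = sum of x^2 - (sum of x)^2 / n
sd_c = sqrt(s_xx / (n - 1))           # sample standard deviation

# Outlier check: more than 2 standard deviations from the mean.
outliers = [t for t in temps if abs(t - mean_c) > 2 * sd_c]

# y = 1.8x + 32 maps the mean through the same formula and scales the sd by 1.8;
# adding 32 shifts every value equally, so it does not change the spread.
mean_f = 1.8 * mean_c + 32
sd_f = 1.8 * sd_c
print(f"{mean_c:.2f} {sd_c:.3f} {outliers} {mean_f:.2f} {sd_f:.2f}")
```

With these conventions the limits are about 3.1 °C and 27.0 °C, so none of the twelve months is an outlier.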
OCR MEI S1 2005 June Q1
5 marks Moderate -0.8
1 At a certain stage of a football league season, the numbers of goals scored by a sample of 20 teams in the league were as follows. \(\begin{array} { l l l l l l l l l l l l l l l l l l l l l } 22 & 23 & 23 & 23 & 26 & 28 & 28 & 30 & 31 & 33 & 33 & 34 & 35 & 35 & 36 & 36 & 37 & 46 & 49 & 49 \end{array}\)
  1. Calculate the sample mean and sample variance, \(s ^ { 2 }\), of these data.
  2. The three teams with the most goals appear to be well ahead of the other teams. Determine whether or not any of these three pieces of data may be considered outliers.
OCR MEI S1 2007 June Q3
8 marks Moderate -0.8
3 The marks \(x\) scored by a sample of 56 students in an examination are summarised by $$n = 56 , \quad \Sigma x = 3026 , \quad \Sigma x ^ { 2 } = 178890 .$$
  1. Calculate the mean and standard deviation of the marks.
  2. The highest mark scored by any of the 56 students in the examination was 93. Show that this result may be considered to be an outlier.
  3. The formula \(y = 1.2 x - 10\) is used to scale the marks. Find the mean and standard deviation of the scaled marks.
OCR MEI S1 2008 June Q1
6 marks Moderate -0.8
1 In a survey, a sample of 44 fields is selected. Their areas ( \(x\) hectares) are summarised in the grouped frequency table.
Area \(( x )\) | \(0 < x \leqslant 3\) | \(3 < x \leqslant 5\) | \(5 < x \leqslant 7\) | \(7 < x \leqslant 10\) | \(10 < x \leqslant 20\)
Frequency | 3 | 8 | 13 | 14 | 6
  1. Calculate an estimate of the sample mean and the sample standard deviation.
  2. Determine whether there could be any outliers at the upper end of the distribution.
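With grouped data, the individual field areas are unknown. The sketch below therefore estimates the mean and standard deviation from class midpoints. It then checks whether the top class extends beyond mean + 2s. It assumes the two-standard-deviation rule and divisor \(n - 1\).

```python
from math import sqrt

# Class boundaries (hectares) and frequencies for the 44 fields.
classes = [(0, 3), (3, 5), (5, 7), (7, 10), (10, 20)]
freqs = [3, 8, 13, 14, 6]

# Treat every field as lying at its class midpoint.
mids = [(lo + hi) / 2 for lo, hi in classes]
n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))

mean = sum_fx / n
s = sqrt((sum_fx2 - sum_fx**2 / n) / (n - 1))   # estimated sample sd, divisor n - 1
upper_fence = mean + 2 * s

# Raw values are unknown, so only ask whether the top class reaches past the fence.
top_class_max = classes[-1][1]
print(f"mean ≈ {mean:.2f}, s ≈ {s:.2f}, mean + 2s ≈ {upper_fence:.2f}")
print("possible upper outliers:", top_class_max > upper_fence)
```

The upper limit comes out at about 14.7 hectares, inside the \(10 < x \leqslant 20\) class. Outliers are therefore possible but cannot be confirmed from the grouped data.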
OCR MEI S1 Q2
19 marks Moderate -0.5
2 The box and whisker plot below summarises the weights in grams of the 20 chocolates in a box. \includegraphics[max width=\textwidth, alt={}, center]{452a52c9-b1fa-4b98-a85d-a34ba0f84a9d-1_290_1186_1099_452}
  1. Find the interquartile range of the data and hence determine whether there are any outliers at either end of the distribution. Ben buys a box of these chocolates each weekend. The chocolates all look the same on the outside, but 7 of them have orange centres, 6 have cherry centres, 4 have coffee centres and 3 have lemon centres. One weekend, each of Ben's 3 children eats one of the chocolates, chosen at random.
  2. Calculate the probabilities of the following events.
    \(A\): all 3 chocolates have orange centres
    \(B\): all 3 chocolates have the same centres
  3. Find \(\mathrm { P } ( A \mid B )\) and \(\mathrm { P } ( B \mid A )\). The following weekend, Ben buys an identical box of chocolates and again each of his 3 children eats one of the chocolates, chosen at random.
  4. Find the probability that, on both weekends, the 3 chocolates that they eat all have orange centres.
  5. Ben likes all of the chocolates except those with cherry centres. On another weekend he is the first of his family to eat some of the chocolates. Find the probability that he has to select more than 2 chocolates before he finds one that he likes.
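The chocolate probabilities involve drawing without replacement. The following sketch uses exact fractions from Python's `fractions` module and covers parts 2 to 4 only. The name `all_three` is a hypothetical helper.

```python
from fractions import Fraction as F

# 20 chocolates: 7 orange, 6 cherry, 4 coffee, 3 lemon; 3 are eaten without replacement.
counts = {"orange": 7, "cherry": 6, "coffee": 4, "lemon": 3}
total = sum(counts.values())

def all_three(c, total):
    """P(all three chosen have a given centre), when c of the total have that centre."""
    return F(c, total) * F(c - 1, total - 1) * F(c - 2, total - 2)

p_a = all_three(counts["orange"], total)                  # A: all orange
p_b = sum(all_three(c, total) for c in counts.values())   # B: all the same centre

# A is contained in B, so P(A and B) = P(A).
p_a_given_b = p_a / p_b
p_b_given_a = p_a / p_a

# The two weekends use separate boxes, so they are independent.
p_both_weekends = p_a ** 2
print(p_a, p_b, p_a_given_b, p_b_given_a, p_both_weekends)
```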
OCR MEI S1 Q4
17 marks Moderate -0.8
4 The weights, \(w\) grams, of a random sample of 60 carrots of variety A are summarised in the table below.
Weight | \(30 \leqslant w < 50\) | \(50 \leqslant w < 60\) | \(60 \leqslant w < 70\) | \(70 \leqslant w < 80\) | \(80 \leqslant w < 90\)
Frequency | 11 | 10 | 18 | 14 | 7
  1. Draw a histogram to illustrate these data.
  2. Calculate estimates of the mean and standard deviation of \(w\).
  3. Use your answers to part (ii) to investigate whether there are any outliers. The weights, \(x\) grams, of a random sample of 50 carrots of variety B are summarised as follows. $$n = 50 \quad \sum x = 3624.5 \quad \sum x ^ { 2 } = 265416$$
  4. Calculate the mean and standard deviation of \(x\).
  5. Compare the central tendency and variation of the weights of varieties A and B .
OCR MEI S1 Q2
18 marks Moderate -0.8
2 The engine sizes \(x \mathrm {~cm} ^ { 3 }\) of a sample of 80 cars are summarised in the table below.
Engine size \(x\) | \(500 \leqslant x \leqslant 1000\) | \(1000 < x \leqslant 1500\) | \(1500 < x \leqslant 2000\) | \(2000 < x \leqslant 3000\) | \(3000 < x \leqslant 5000\)
Frequency | 7 | 22 | 26 | 18 | 7
  1. Draw a histogram to illustrate the distribution.
  2. A student claims that the midrange is \(2750 \mathrm {~cm} ^ { 3 }\). Discuss briefly whether he is likely to be correct.
  3. Calculate estimates of the mean and standard deviation of the engine sizes. Explain why your answers are only estimates.
  4. Hence investigate whether there are any outliers in the sample.
  5. A vehicle duty of \(\pounds 1000\) is proposed for all new cars with engine size greater than \(2000 \mathrm {~cm} ^ { 3 }\). Assuming that this sample of cars is representative of all new cars in Britain and that there are 2.5 million new cars registered in Britain each year, calculate an estimate of the total amount of money that this vehicle duty would raise in one year.
  6. Why in practice might your estimate in part (v) turn out to be too high?
OCR MEI S1 Q3
19 marks Moderate -0.3
3 The birth weights of 200 lambs from crossbred sheep are illustrated by the cumulative frequency diagram below. \includegraphics[max width=\textwidth, alt={}, center]{ab4d5ab1-e3b7-495f-9142-d37df7e712de-3_919_1144_430_476}
  1. Estimate the percentage of lambs with birth weight over 6 kg.
  2. Estimate the median and interquartile range of the data.
  3. Use your answers to part (ii) to show that there are very few, if any, outliers. Comment briefly on whether any outliers should be disregarded in analysing these data. The box and whisker plot shows the birth weights of 100 lambs from Welsh Mountain sheep. \includegraphics[max width=\textwidth, alt={}, center]{ab4d5ab1-e3b7-495f-9142-d37df7e712de-3_321_1610_1818_293}
  4. Use appropriate measures to compare briefly the central tendencies and variations of the weights of the two types of lamb.
  5. The weight of the largest Welsh Mountain lamb was originally recorded as 6.5 kg, but then corrected. If this error had not been corrected, how would this have affected your answers to part (iv)? Briefly explain your answer.
  6. One lamb of each type is selected at random. Estimate the probability that the birth weight of both lambs is at least 3.9 kg.
OCR MEI S1 Q3
18 marks Moderate -0.8
3 The heating quality of the coal in a sample of 50 sacks is measured in suitable units. The data are summarised below.
Heating quality \(( x )\) | \(9.1 \leqslant x \leqslant 9.3\) | \(9.3 < x \leqslant 9.5\) | \(9.5 < x \leqslant 9.7\) | \(9.7 < x \leqslant 9.9\) | \(9.9 < x \leqslant 10.1\)
Frequency | 5 | 7 | 15 | 16 | 7
  1. Draw a cumulative frequency diagram to illustrate these data.
  2. Use the diagram to estimate the median and interquartile range of the data.
  3. Show that there are no outliers in the sample.
  4. Three of these 50 sacks are selected at random. Find the probability that
    (A) in all three, the heating quality \(x\) is more than 9.5,
    (B) in at least two, the heating quality \(x\) is more than 9.5.
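Reading quartiles from a cumulative frequency diagram can be approximated by linear interpolation between plotted points. This sketch makes two assumptions. The diagram is drawn as straight segments through the class boundaries, and quartiles are read at \(n/4\), \(n/2\) and \(3n/4\). A smooth hand-drawn curve would give slightly different values. The helper name `read_cf` is illustrative.

```python
# Class boundaries and cumulative frequencies for the 50 sacks.
bounds = [9.1, 9.3, 9.5, 9.7, 9.9, 10.1]
cum = [0, 5, 12, 27, 43, 50]

def read_cf(target):
    """Estimate the value below which `target` sacks lie, by linear
    interpolation on the cumulative frequency polygon."""
    for i in range(1, len(cum)):
        if cum[i] >= target:
            frac = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
            return bounds[i - 1] + frac * (bounds[i] - bounds[i - 1])
    raise ValueError("target exceeds total frequency")

n = cum[-1]
q1, med, q3 = read_cf(n / 4), read_cf(n / 2), read_cf(3 * n / 4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(f"median ≈ {med:.3f}, IQR ≈ {iqr:.3f}, fences ≈ {lower:.3f} and {upper:.3f}")
```

Both fences (about 9.02 and 10.32) lie outside the data range 9.1 to 10.1, so no sack can be an outlier.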
OCR MEI S1 Q4
19 marks Moderate -0.3
4 The incomes of a sample of 918 households on an island are given in the table below.
Income (\(x\) thousand pounds) | \(0 \leqslant x \leqslant 20\) | \(20 < x \leqslant 40\) | \(40 < x \leqslant 60\) | \(60 < x \leqslant 100\) | \(100 < x \leqslant 200\)
Frequency | 238 | 365 | 142 | 128 | 45
  1. Draw a histogram to illustrate the data.
  2. Calculate an estimate of the mean income.
  3. Calculate an estimate of the standard deviation of the incomes.
  4. Use your answers to parts (ii) and (iii) to show there are almost certainly some outliers in the sample. Explain whether or not it would be appropriate to exclude the outliers from the calculation of the mean and the standard deviation.
  5. The incomes were converted into another currency using the formula \(y = 1.15 x\). Calculate estimates of the mean and variance of the incomes in the new currency.
OCR MEI S1 Q1
8 marks Easy -1.2
1 A business analyst collects data about the distribution of hourly wages, in \(\pounds\), of shop-floor workers at a factory. These data are illustrated in the box and whisker plot. \includegraphics[max width=\textwidth, alt={}, center]{56f1bd5c-4b45-4e36-a324-e7e0edbb5bdd-1_206_1420_505_397}
  1. Name the type of skewness of the distribution.
  2. Find the interquartile range and hence show that there are no outliers at the lower end of the distribution, but there is at least one outlier at the upper end.
  3. Suggest possible reasons why this may be the case.
OCR MEI S1 Q3
19 marks Moderate -0.3
3 A pear grower collects a random sample of 120 pears from his orchard. The histogram below shows the lengths, in mm, of these pears. \includegraphics[max width=\textwidth, alt={}, center]{56f1bd5c-4b45-4e36-a324-e7e0edbb5bdd-2_825_1634_467_295}
  1. Calculate the number of pears which are between 90 and 100 mm long.
  2. Calculate an estimate of the mean length of the pears. Explain why your answer is only an estimate.
  3. Calculate an estimate of the standard deviation.
  4. Use your answers to parts (ii) and (iii) to investigate whether there are any outliers.
  5. Name the type of skewness of the distribution.
  6. Illustrate the data using a cumulative frequency diagram.
OCR MEI S1 Q1
18 marks Moderate -0.3
1 The birth weights in grams of a random sample of 1000 babies are displayed in the cumulative frequency diagram below. \includegraphics[max width=\textwidth, alt={}, center]{088972e9-bfcd-429c-9145-af274a4c0a58-1_1268_1548_472_335}
  1. Use the diagram to estimate the median and interquartile range of the data.
  2. Use your answers to part (i) to estimate the number of outliers in the sample.
  3. Should these outliers be excluded from any further analysis? Briefly explain your answer.
  4. Any baby whose weight is below the 10th percentile is selected for careful monitoring. Use the diagram to determine the range of weights of the babies who are selected. \(12 \%\) of new-born babies require some form of special care. A maternity unit has 17 new-born babies. You may assume that these 17 babies form an independent random sample.
  5. Find the probability that
    (A) exactly 2 of these 17 babies require special care,
    (B) more than 2 of the 17 babies require special care.
  6. On 100 independent occasions the unit has 17 babies. Find the expected number of occasions on which there would be more than 2 babies who require special care.
OCR MEI S1 Q2
6 marks Easy -1.3
2 The times taken, in minutes, by 80 people to complete a crossword puzzle are summarised by the box and whisker plot below. \includegraphics[max width=\textwidth, alt={}, center]{088972e9-bfcd-429c-9145-af274a4c0a58-2_163_857_436_642}
  1. Write down the range and the interquartile range of the times.
  2. Determine whether any of the times can be regarded as outliers.
  3. Describe the shape of the distribution of the times.
OCR MEI S1 Q4
5 marks Moderate -0.8
4 At a certain stage of a football league season, the numbers of goals scored by a sample of 20 teams in the league were as follows. \(\begin{array} { l l l l l l l l l l l l l l l l l l l l } 22 & 23 & 23 & 23 & 26 & 28 & 28 & 30 & 31 & 33 & 33 & 34 & 35 & 35 & 36 & 36 & 37 & 46 & 49 & 49 \end{array}\)
  1. Calculate the sample mean and sample variance, \(s ^ { 2 }\), of these data.
  2. The three teams with the most goals appear to be well ahead of the other teams. Determine whether or not any of these three pieces of data may be considered outliers.
OCR MEI S1 Q3
8 marks Moderate -0.8
3 The stem and leaf diagram illustrates the heights in metres of 25 young oak trees.
3 | 4 6 7 8 9 9
4 | 0 2 2 3 4 6 8 9
5 | 0 1 3 5 8
6 | 2 4 5
7 | 4 6
8 | 1
Key: 4 |2 represents 4.2
  1. State the type of skewness of the distribution.
  2. Use your calculator to find the mean and standard deviation of these data.
  3. Determine whether there are any outliers.
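The stem-and-leaf diagram can be turned back into raw heights and then checked with the mean ± 2s rule. This sketch assumes the sample standard deviation and the two-standard-deviation convention. The dictionary layout `rows` is an illustrative choice.

```python
from statistics import mean, stdev

# Stem-and-leaf rows: stem (whole metres) -> leaves (tenths). Key: 4 | 2 means 4.2 m.
rows = {3: "467899", 4: "02234689", 5: "01358", 6: "245", 7: "46", 8: "1"}

# Rebuild the raw heights; (10*stem + leaf) / 10 gives the correctly rounded decimal.
heights = [(10 * stem + int(leaf)) / 10 for stem, leaves in rows.items() for leaf in leaves]

x_bar, s = mean(heights), stdev(heights)
outliers = [h for h in heights if abs(h - x_bar) > 2 * s]
print(f"n = {len(heights)}, mean = {x_bar:.3f}, s = {s:.3f}, outliers: {outliers}")
```

The upper limit is about 7.71 m, so only the 8.1 m tree is flagged.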
OCR MEI S1 Q1
18 marks Easy -1.2
1 The maximum temperatures \(x\) degrees Celsius recorded during each month of 2005 in Cambridge are given in the table below.
Month | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
\(x\) (°C) | 9.2 | 7.1 | 10.7 | 14.2 | 16.6 | 21.8 | 22.0 | 22.6 | 21.1 | 17.4 | 10.1 | 7.8
These data are summarised by \(n = 12 , \Sigma x = 180.6 , \Sigma x ^ { 2 } = 3107.56\).
  1. Calculate the mean and standard deviation of the data.
  2. Determine whether there are any outliers.
  3. The formula \(y = 1.8 x + 32\) is used to convert degrees Celsius to degrees Fahrenheit. Find the mean and standard deviation of the 2005 maximum temperatures in degrees Fahrenheit.
  4. In New York, the monthly maximum temperatures are recorded in degrees Fahrenheit. In 2005 the mean was 63.7 and the standard deviation was 16.0. Briefly compare the maximum monthly temperatures in Cambridge and New York in 2005. The total numbers of hours of sunshine recorded in Cambridge during the month of January for each of the last 48 years are summarised below.
    Hours \(h\) | \(70 \leqslant h < 100\) | \(100 \leqslant h < 110\) | \(110 \leqslant h < 120\) | \(120 \leqslant h < 150\) | \(150 \leqslant h < 170\) | \(170 \leqslant h < 190\)
    Number of years | 6 | 8 | 10 | 11 | 10 | 3
  5. Draw a cumulative frequency graph for these data.
  6. Use your graph to estimate the 90th percentile.
OCR MEI S1 Q6
6 marks Moderate -0.8
6 In a survey, a sample of 44 fields is selected. Their areas ( \(x\) hectares) are summarised in the grouped frequency table.
Area \(( x )\) | \(0 < x \leqslant 3\) | \(3 < x \leqslant 5\) | \(5 < x \leqslant 7\) | \(7 < x \leqslant 10\) | \(10 < x \leqslant 20\)
Frequency | 3 | 8 | 13 | 14 | 6
  1. Calculate an estimate of the sample mean and the sample standard deviation.
  2. Determine whether there could be any outliers at the upper end of the distribution.