2.02h Recognize outliers

154 questions

OCR MEI S1 Q2
8 marks Moderate -0.8
2 The marks \(x\) scored by a sample of 56 students in an examination are summarised by $$n = 56 , \quad \Sigma x = 3026 , \quad \Sigma x ^ { 2 } = 178890 .$$
  1. Calculate the mean and standard deviation of the marks.
  2. The highest mark scored by any of the 56 students in the examination was 93. Show that this result may be considered to be an outlier.
  3. The formula \(y = 1.2 x - 10\) is used to scale the marks. Find the mean and standard deviation of the scaled marks.
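A minimal Python sketch of the calculations in this question. It assumes the usual MEI S1 conventions, which the question itself does not state: the standard deviation uses divisor n - 1, and a value more than 2 standard deviations from the mean counts as an outlier.

```python
import math

# Summary statistics given in the question.
n, sum_x, sum_x2 = 56, 3026, 178890

mean = sum_x / n
sxx = sum_x2 - sum_x ** 2 / n          # sum of squared deviations from the mean
sd = math.sqrt(sxx / (n - 1))          # assumption: divisor n - 1

# Assumption: outlier means more than 2 standard deviations from the mean.
upper_limit = mean + 2 * sd
is_outlier = 93 > upper_limit

# Linear scaling y = 1.2x - 10: the mean follows the formula,
# the standard deviation is only multiplied by 1.2.
scaled_mean = 1.2 * mean - 10
scaled_sd = 1.2 * sd

print(round(mean, 2), round(sd, 2), round(upper_limit, 2), is_outlier)
print(round(scaled_mean, 2), round(scaled_sd, 2))
```

The subtraction of 10 shifts every mark equally, so it changes the mean but not the spread.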
OCR MEI S1 Q3
16 marks Moderate -0.3
3 The birth weights in grams of a random sample of 1000 babies are displayed in the cumulative frequency diagram below. \includegraphics[max width=\textwidth, alt={}, center]{dfb0acd8-d84b-4291-a811-a68f4942794b-2_1266_1546_487_335}
  1. Use the diagram to estimate the median and interquartile range of the data.
  2. Use your answers to part (i) to estimate the number of outliers in the sample.
  3. Should these outliers be excluded from any further analysis? Briefly explain your answer.
  4. Any baby whose weight is below the 10th percentile is selected for careful monitoring. Use the diagram to determine the range of weights of the babies who are selected.
\(12 \%\) of new-born babies require some form of special care. A maternity unit has 17 new-born babies. You may assume that these 17 babies form an independent random sample.
  5. Find the probability that
    (A) exactly 2 of these 17 babies require special care,
    (B) more than 2 of the 17 babies require special care.
  6. On 100 independent occasions the unit has 17 babies. Find the expected number of occasions on which there would be more than 2 babies who require special care.
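The binomial parts of this question can be checked with a short Python sketch. It models the number of babies needing special care as B(17, 0.12), as the question's independence assumption suggests; the helper name `binom_pmf` is illustrative.

```python
from math import comb

n, p = 17, 0.12

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_exactly_2 = binom_pmf(2, n, p)
# P(X > 2) = 1 - P(X <= 2)
p_more_than_2 = 1 - sum(binom_pmf(k, n, p) for k in range(3))
# Expected number of occasions out of 100 with more than 2 such babies.
expected_occasions = 100 * p_more_than_2

print(round(p_exactly_2, 4), round(p_more_than_2, 4), round(expected_occasions, 1))
```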
OCR MEI S1 Q3
18 marks Moderate -0.3
3 The birth weights in grams of a random sample of 1000 babies are displayed in the cumulative frequency diagram below. \includegraphics[max width=\textwidth, alt={}, center]{79f1015b-7c3d-4576-8d5b-e9fc89d8a49e-2_1266_1546_487_335}
  1. Use the diagram to estimate the median and interquartile range of the data.
  2. Use your answers to part (i) to estimate the number of outliers in the sample.
  3. Should these outliers be excluded from any further analysis? Briefly explain your answer.
  4. Any baby whose weight is below the 10th percentile is selected for careful monitoring. Use the diagram to determine the range of weights of the babies who are selected.
\(12 \%\) of new-born babies require some form of special care. A maternity unit has 17 new-born babies. You may assume that these 17 babies form an independent random sample.
  5. Find the probability that
    (A) exactly 2 of these 17 babies require special care,
    (B) more than 2 of the 17 babies require special care.
  6. On 100 independent occasions the unit has 17 babies. Find the expected number of occasions on which there would be more than 2 babies who require special care.
OCR MEI S1 Q1
8 marks Easy -1.2
1 The stem and leaf diagram illustrates the heights in metres of 25 young oak trees.
3 | 4 6 7 8 9 9
4 | 0 2 2 3 4 6 8 9
5 | 0 1 3 5 8
6 | 2 4 5
7 | 4 6
8 | 1
Key: 4 | 2 represents 4.2
  1. State the type of skewness of the distribution.
  2. Use your calculator to find the mean and standard deviation of these data.
  3. Determine whether there are any outliers.
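A hedged Python sketch of parts 2 and 3. The heights are read from the stem and leaf diagram (25 values), and the outlier test assumes the MEI S1 convention of flagging values more than 2 sample standard deviations from the mean, which the question does not spell out.

```python
from statistics import mean, stdev

# Heights in metres, read from the stem and leaf diagram (assumed reading).
heights = [3.4, 3.6, 3.7, 3.8, 3.9, 3.9,
           4.0, 4.2, 4.2, 4.3, 4.4, 4.6, 4.8, 4.9,
           5.0, 5.1, 5.3, 5.5, 5.8,
           6.2, 6.4, 6.5,
           7.4, 7.6,
           8.1]

m = mean(heights)
s = stdev(heights)                      # sample standard deviation, divisor n - 1

# Assumption: outliers lie more than 2 standard deviations from the mean.
lower_limit, upper_limit = m - 2 * s, m + 2 * s
outliers = [h for h in heights if h < lower_limit or h > upper_limit]

print(round(m, 3), round(s, 3), outliers)
```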
OCR MEI S1 Q5
20 marks Moderate -0.3
5 A pear grower collects a random sample of 120 pears from his orchard. The histogram below shows the lengths, in mm , of these pears. \includegraphics[max width=\textwidth, alt={}, center]{056d3e9a-088d-4c97-9546-7cecb59b8727-3_815_1628_505_304}
  1. Calculate the number of pears which are between 90 and 100 mm long.
  2. Calculate an estimate of the mean length of the pears. Explain why your answer is only an estimate.
  3. Calculate an estimate of the standard deviation.
  4. Use your answers to parts (ii) and (iii) to investigate whether there are any outliers.
  5. Name the type of skewness of the distribution.
  6. Illustrate the data using a cumulative frequency diagram.
OCR MEI S1 Q5
22 marks Moderate -0.3
5 A pear grower collects a random sample of 120 pears from his orchard. The histogram below shows the lengths, in mm , of these pears. \includegraphics[max width=\textwidth, alt={}, center]{99c502aa-2c9f-461d-9dc0-ed55e3df32a2-3_815_1628_505_304}
  1. Calculate the number of pears which are between 90 and 100 mm long.
  2. Calculate an estimate of the mean length of the pears. Explain why your answer is only an estimate.
  3. Calculate an estimate of the standard deviation.
  4. Use your answers to parts (ii) and (iii) to investigate whether there are any outliers.
  5. Name the type of skewness of the distribution.
  6. Illustrate the data using a cumulative frequency diagram.
Edexcel S1 2016 January Q2
12 marks Easy -1.2
2. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{70137e9a-0a6b-48b5-8dd4-c436cb063351-04_284_1244_260_388} \captionsetup{labelformat=empty} \caption{Figure 1}
\end{figure} Figure 1 shows part of a box and whisker plot for the marks in an examination with a large number of candidates. Part of the lower whisker has been torn off.
  1. Given that \(75 \%\) of the candidates passed the examination, state the lowest mark for the award of a pass.
  2. Given that the top \(25 \%\) of the candidates achieved a merit grade, state the lowest mark for the award of a merit grade. An outlier is defined as any value greater than \(c\) or any value less than \(d\) where $$\begin{aligned} & c = Q _ { 3 } + 1.5 \left( Q _ { 3 } - Q _ { 1 } \right) \\ & d = Q _ { 1 } - 1.5 \left( Q _ { 3 } - Q _ { 1 } \right) \end{aligned}$$
  3. Find the value of \(c\) and the value of \(d\).
  4. Write down the 3 highest marks scored in the examination. The 3 lowest marks in the examination were 5, 10 and 15
  5. On the diagram on page 7, complete the box and whisker plot. Three candidates are selected at random from those who took this examination.
  6. Find the probability that all 3 of these candidates passed the examination but only 2 achieved a merit grade.
    \includegraphics[max width=\textwidth, alt={}, center]{70137e9a-0a6b-48b5-8dd4-c436cb063351-05_285_1628_2343_166}
Edexcel S1 2016 January Q6
9 marks Moderate -0.8
6. Yujie is investigating the weights of 10 young rabbits. She records the weight, \(x\) grams, of each rabbit and the results are summarised below. $$\sum x = 8360 \quad \text { and } \quad \sum ( x - \bar { x } ) ^ { 2 } = 63840$$
  1. Calculate the mean and the standard deviation of the weights of these rabbits. Given that the median weight of these rabbits is 815 grams,
  2. describe, giving a reason, the skewness of these data. Two more rabbits weighing 776 grams and 896 grams are added to make a group of 12 rabbits.
  3. State, giving a reason, how the inclusion of these two rabbits would affect the mean.
  4. By considering the change in \(\sum ( x - \bar { x } ) ^ { 2 }\), state what effect the inclusion of these two rabbits would have on the standard deviation.
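The effect of the two extra rabbits can be sketched in Python. This assumes the Edexcel S1 convention of dividing by n for the standard deviation. Updating the sum of squared deviations by simply adding the two new squared deviations is valid here only because the mean turns out to be unchanged.

```python
import math

n, sum_x, sxx = 10, 8360, 63840        # sxx is the given sum of (x - mean)^2

mean = sum_x / n
sd = math.sqrt(sxx / n)                 # assumption: divisor n

new_weights = [776, 896]
new_n = n + len(new_weights)
new_mean = (sum_x + sum(new_weights)) / new_n

# The mean is unchanged, so the new sum of squared deviations is the
# old one plus the squared deviations of the two new rabbits.
assert new_mean == mean
new_sxx = sxx + sum((w - new_mean) ** 2 for w in new_weights)
new_sd = math.sqrt(new_sxx / new_n)

print(mean, round(sd, 1), new_mean, new_sxx, round(new_sd, 1))
```

The sum of squares grows by 7200 (about 11%) while n grows by 20%, so the standard deviation falls.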
Edexcel S1 2018 January Q1
12 marks Moderate -0.8
  1. Two classes of students, class \(A\) and class \(B\), sat a test.
Class \(A\) has 10 students. Class \(B\) has 15 students. Each student achieved a score, \(x\), on the test and their scores are summarised in the table below.
$$\begin{array}{|c|c|c|c|} \hline & n & \sum x & \sum x ^ { 2 } \\ \hline \text{Class } A & 10 & 770 & 59610 \\ \hline \text{Class } B & 15 & t & 58035 \\ \hline \end{array}$$
The mean score for Class \(A\) is 77 and the mean score for Class \(B\) is 61
  1. Find the value of \(t\)
  2. Calculate the variance of the test scores for each class. The highest score on the test was 95 and the lowest score was 45 These were each scored by students from the same class.
  3. State, with a reason, which class you believe they were from. The two classes are combined into one group of 25 students.
    1. Find the mean test score for all 25 students.
    2. Find the variance of the test scores for all 25 students. The teacher of class \(A\) later realises that he added up the test scores for his class incorrectly. Each student's test score in class \(A\) should be increased by 3
  4. Without further calculations, state, with a reason, the effect this will have on
    1. the variance of the test scores for class \(A\)
    2. the mean test score for all 25 students
    3. the variance of the test scores for all 25 students.
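A short Python sketch of the numerical parts, using the summary values from the table (Class A: n = 10, Σx = 770, Σx² = 59610; Class B: n = 15, Σx² = 58035, mean 61). It assumes the variance formula Σx²/n - (mean)², and the helper name `variance` is illustrative.

```python
def variance(n, sum_x, sum_x2):
    """Variance from summary statistics, using divisor n."""
    return sum_x2 / n - (sum_x / n) ** 2

n_a, sum_a, sumsq_a = 10, 770, 59610
n_b, mean_b, sumsq_b = 15, 61, 58035

t = n_b * mean_b                         # sum of the Class B scores

var_a = variance(n_a, sum_a, sumsq_a)
var_b = variance(n_b, t, sumsq_b)

# Combined group of 25 students: add the sums, then apply the same formula.
n_all = n_a + n_b
mean_all = (sum_a + t) / n_all
var_all = variance(n_all, sum_a + t, sumsq_a + sumsq_b)

print(t, var_a, var_b, mean_all, round(var_all, 2))
```

Class B has the much larger variance, which is consistent with both the highest and lowest scores coming from that class.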
Edexcel S1 2021 January Q2
9 marks Easy -1.3
2. The stem and leaf diagram below shows the ages (in years) of the residents in a care home.
Age
4 | 3 (1)
5 | 4 (1)
6 | 2 3 5 6 8 8 8 9 9 (9)
7 | 1 1 3 4 4 6 6 6 8 8 9 (11)
8 | 0 0 2 7 8 8 9 (7)
9 | 3 7 (2)
Key: \(4 \mid 3\) is an age of 43
  1. Find the median age of the residents.
  2. Find the interquartile range (IQR) of the ages of the residents. An outlier is defined as a value that is either
    more than \(1.5 \times ( \mathrm { IQR } )\) below the lower quartile or more than \(1.5 \times ( \mathrm { IQR } )\) above the upper quartile.
  3. Determine any outliers in these data. Show clearly any calculations that you use.
  4. On the grid on page 5, draw a box plot to summarise these data.
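A Python sketch of the median, quartile and outlier calculations. The ages are read from the stem and leaf diagram (31 values). The quartile rule is an assumption: it follows the convention commonly taught for Edexcel S1, where a non-integer position n × fraction is rounded up and an integer position is averaged with the next value. The helper name `position_quantile` is illustrative.

```python
import math

# Ages read from the stem and leaf diagram (assumed reading of the rows).
ages = [43, 54,
        62, 63, 65, 66, 68, 68, 68, 69, 69,
        71, 71, 73, 74, 74, 76, 76, 76, 78, 78, 79,
        80, 80, 82, 87, 88, 88, 89,
        93, 97]

def position_quantile(sorted_data, fraction):
    """Quantile by position: round up, or average two values if the position is whole."""
    position = len(sorted_data) * fraction
    if position == int(position):
        k = int(position)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(position) - 1]

data = sorted(ages)
median = position_quantile(data, 0.5)
q1 = position_quantile(data, 0.25)
q3 = position_quantile(data, 0.75)
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [a for a in data if a < lower_limit or a > upper_limit]

print(median, q1, q3, iqr, lower_limit, upper_limit, outliers)
```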
Edexcel S1 2023 January Q5
17 marks Moderate -0.3
  1. The lengths, \(L \mathrm {~mm}\), of housefly wings are normally distributed with \(L \sim \mathrm {~N} \left( 4.5,0.4 ^ { 2 } \right)\)
    1. Find the probability that a randomly selected housefly has a wing length of less than 3.86 mm .
    2. Find
      1. the upper quartile ( \(Q _ { 3 }\) ) of \(L\)
      2. the lower quartile ( \(Q _ { 1 }\) ) of \(L\)
    A value that is greater than \(Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\) or smaller than \(Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\) is defined as an outlier.
  2. Find these two outlier limits. A housefly is selected at random.
  3. Using standardisation, show that the probability that this housefly is not an outlier is 0.993 to 3 decimal places. Given that this housefly is not an outlier,
  4. showing your working, find the probability that the wing length of this housefly is greater than 5 mm .
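The normal-distribution parts can be checked numerically with Python's standard-library `statistics.NormalDist`. This is a sketch, not a model answer: an exam solution using tables would work with rounded z-values, so its intermediate figures may differ slightly.

```python
from statistics import NormalDist

wing = NormalDist(mu=4.5, sigma=0.4)

p_below_386 = wing.cdf(3.86)

q1 = wing.inv_cdf(0.25)
q3 = wing.inv_cdf(0.75)
iqr = q3 - q1

# Outlier limits as defined in the question.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

p_not_outlier = wing.cdf(upper_limit) - wing.cdf(lower_limit)

# Conditional probability: P(L > 5 | not an outlier)
p_gt5_given_not_outlier = (wing.cdf(upper_limit) - wing.cdf(5)) / p_not_outlier

print(round(p_below_386, 4), round(q1, 3), round(q3, 3))
print(round(lower_limit, 3), round(upper_limit, 3))
print(round(p_not_outlier, 3), round(p_gt5_given_not_outlier, 3))
```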
Edexcel S1 2024 January Q4
12 marks Moderate -0.8
  1. A French test and a Spanish test were sat by 11 students.
The table below shows their marks.
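$$\begin{array}{|l|c|c|c|c|c|c|c|c|c|c|c|} \hline \text{Student} & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} & \text{F} & \text{G} & \text{H} & \text{I} & \text{J} & \text{K} \\ \hline \text{French mark } (f) & 24 & 30 & 32 & 32 & 36 & 36 & 40 & 44 & 50 & 60 & 68 \\ \hline \text{Spanish mark } (s) & 16 & 90 & 24 & 28 & 32 & 36 & 38 & 44 & 48 & 48 & 68 \\ \hline \end{array}$$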
Greg says that if these points were plotted on a scatter diagram, then the point \(( 30,90 )\) would be an outlier because 90 is an outlier for the Spanish marks. An outlier is defined as a value that is $$\text { greater than } Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \text { or smaller than } Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$
  1. Show that 90 is an outlier for the Spanish marks. Ignoring the point (30, 90), Greg calculated the following summary statistics. $$\sum f = 422 \quad \sum s = 382 \quad S _ { f f } = 1667.6 \quad S _ { f s } = 1735.6$$
  2. Use these summary statistics to show that the equation of the least squares regression line of \(s\) on \(f\) for the remaining 10 students is $$s = - 5.72 + 1.04 f$$ where the values of the intercept and gradient are given to 3 significant figures. You must show your working.
  3. Give an interpretation of the gradient of the regression line. Two further students sat the French test but missed the Spanish test.
  4. Using the equation given in part (b), estimate
    1. a Spanish mark for the student who scored 55 marks in their French test,
    2. a Spanish mark for the student who scored 18 marks in their French test.
  5. State, giving a reason, which of the two estimates found in part (d) would be the more reliable estimate.
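A Python sketch of the outlier check and the regression line. The Spanish marks are taken from the table. The quartile rule is an assumption (non-integer position rounded up, integer position averaged with the next value), and the helper name `position_quantile` is illustrative.

```python
import math

def position_quantile(sorted_data, fraction):
    """Quantile by position: round up, or average two values if the position is whole."""
    position = len(sorted_data) * fraction
    if position == int(position):
        k = int(position)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(position) - 1]

spanish = sorted([16, 90, 24, 28, 32, 36, 38, 44, 48, 48, 68])
q1 = position_quantile(spanish, 0.25)
q3 = position_quantile(spanish, 0.75)
upper_limit = q3 + 1.5 * (q3 - q1)
is_90_outlier = 90 > upper_limit

# Regression of s on f for the remaining 10 students (summary statistics given).
n, sum_f, sum_s, s_ff, s_fs = 10, 422, 382, 1667.6, 1735.6
gradient = s_fs / s_ff
intercept = sum_s / n - gradient * sum_f / n

# Estimates from the stated equation s = -5.72 + 1.04f.
estimate_55 = -5.72 + 1.04 * 55
estimate_18 = -5.72 + 1.04 * 18         # 18 lies below the lowest French mark used (24)

print(q1, q3, upper_limit, is_90_outlier)
print(round(intercept, 2), round(gradient, 2), round(estimate_55, 2), round(estimate_18, 2))
```

The French marks used to fit the line run from 24 to 68, so the estimate for 55 is an interpolation and the estimate for 18 is an extrapolation.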
OCR S1 2009 January Q5
8 marks Easy -1.3
5 The stem-and-leaf diagram shows the masses, in grams, of 23 plums, measured correct to the nearest gram.
5 | 5 6 7 8 8 9
6 | 1 2 3 5 6 8 9
7 | 0 0 2 4 5 6 7 8
8 | 0
9 | 7
Key: \(6 \mid 2\) means 62
  1. Find the median and interquartile range of these masses.
  2. State one advantage of using the interquartile range rather than the standard deviation as a measure of the variation in these masses.
  3. State one advantage and one disadvantage of using a stem-and-leaf diagram rather than a box-and-whisker plot to represent data.
  4. James wished to calculate the mean and standard deviation of the given data. He first subtracted 5 from each of the digits to the left of the line in the stem-and-leaf diagram, giving the following.
    0 | 5 6 7 8 8 9
    1 | 1 2 3 5 6 8 9
    2 | 0 0 2 4 5 6 7 8
    3 | 0
    4 | 7
    The mean and standard deviation of the data in this diagram are 18.1 and 9.7 respectively, correct to 1 decimal place. Write down the mean and standard deviation of the data in the original diagram.
OCR S1 2015 June Q2
10 marks Easy -1.3
2 The masses, in grams, of 400 plums were recorded. The masses were then collected into class intervals of width 5 g and a cumulative frequency graph was drawn, as shown below. \includegraphics[max width=\textwidth, alt={}, center]{e5957185-5fe3-45d9-9ab3-c2aab9cbd8dd-3_1045_1401_358_333}
  1. Find the number of plums with masses in the interval 40 g to 45 g .
  2. Find the percentage of plums with masses greater than 70 g .
  3. Give estimates of the highest and lowest masses in the sample, explaining why their exact values cannot be read from the graph.
  4. On the graph paper in the answer book, draw a box-and-whisker plot to illustrate the masses of the plums in the sample.
  5. Comment briefly on the shape of the distribution of masses.
OCR MEI S1 2012 January Q7
19 marks Moderate -0.3
7 The birth weights of 200 lambs from crossbred sheep are illustrated by the cumulative frequency diagram below. \includegraphics[max width=\textwidth, alt={}, center]{4b259fe3-73ef-419f-85ad-1a3b1e6ea56e-4_917_1146_367_447}
  1. Estimate the percentage of lambs with birth weight over 6 kg .
  2. Estimate the median and interquartile range of the data.
  3. Use your answers to part (ii) to show that there are very few, if any, outliers. Comment briefly on whether any outliers should be disregarded in analysing these data. The box and whisker plot shows the birth weights of 100 lambs from Welsh Mountain sheep. \includegraphics[max width=\textwidth, alt={}, center]{4b259fe3-73ef-419f-85ad-1a3b1e6ea56e-4_328_1616_1749_260}
  4. Use appropriate measures to compare briefly the central tendencies and variations of the weights of the two types of lamb.
  5. The weight of the largest Welsh Mountain lamb was originally recorded as 6.5 kg , but then corrected. If this error had not been corrected, how would this have affected your answers to part (iv)? Briefly explain your answer.
  6. One lamb of each type is selected at random. Estimate the probability that the birth weight of both lambs is at least 3.9 kg .
OCR MEI S1 2010 June Q1
8 marks Moderate -0.8
1 A business analyst collects data about the distribution of hourly wages, in \(\pounds\), of shop-floor workers at a factory. These data are illustrated in the box and whisker plot. \includegraphics[max width=\textwidth, alt={}, center]{091d6f43-ad01-4849-9f3c-3e58349aa169-2_204_1422_484_363}
  1. Name the type of skewness of the distribution.
  2. Find the interquartile range and hence show that there are no outliers at the lower end of the distribution, but there is at least one outlier at the upper end.
  3. Suggest possible reasons why this may be the case.
OCR MEI S1 2012 June Q6
18 marks Moderate -0.3
6 The engine sizes \(x \mathrm {~cm} ^ { 3 }\) of a sample of 80 cars are summarised in the table below.
$$\begin{array}{|l|c|c|c|c|c|} \hline \text{Engine size } x & 500 \leqslant x \leqslant 1000 & 1000 < x \leqslant 1500 & 1500 < x \leqslant 2000 & 2000 < x \leqslant 3000 & 3000 < x \leqslant 5000 \\ \hline \text{Frequency} & 7 & 22 & 26 & 18 & 7 \\ \hline \end{array}$$
  1. Draw a histogram to illustrate the distribution.
  2. A student claims that the midrange is \(2750 \mathrm {~cm} ^ { 3 }\). Discuss briefly whether he is likely to be correct.
  3. Calculate estimates of the mean and standard deviation of the engine sizes. Explain why your answers are only estimates.
  4. Hence investigate whether there are any outliers in the sample.
  5. A vehicle duty of \(\pounds 1000\) is proposed for all new cars with engine size greater than \(2000 \mathrm {~cm} ^ { 3 }\). Assuming that this sample of cars is representative of all new cars in Britain and that there are 2.5 million new cars registered in Britain each year, calculate an estimate of the total amount of money that this vehicle duty would raise in one year.
  6. Why in practice might your estimate in part (v) turn out to be too high?
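A Python sketch of the grouped-data estimates, the outlier check and the duty calculation. The frequencies are read from the table as 7, 22, 26, 18, 7 (total 80). Two conventions are assumed rather than stated in the question: each class is represented by its midpoint, and outliers are values more than 2 sample standard deviations from the mean.

```python
import math

# (lower bound, upper bound, frequency) for each class.
classes = [(500, 1000, 7), (1000, 1500, 22), (1500, 2000, 26),
           (2000, 3000, 18), (3000, 5000, 7)]

n = sum(f for _, _, f in classes)
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)

mean = sum_fx / n
sd = math.sqrt((sum_fx2 - n * mean ** 2) / (n - 1))   # assumption: divisor n - 1

# Assumption: outliers lie more than 2 standard deviations from the mean.
lower_limit = mean - 2 * sd
upper_limit = mean + 2 * sd

# Vehicle duty: proportion of the sample above 2000 cm^3, scaled to 2.5 million cars.
share_over_2000 = (18 + 7) / n
revenue = share_over_2000 * 2_500_000 * 1000

print(round(mean, 1), round(sd, 1), round(lower_limit, 1), round(upper_limit, 1))
print(revenue)
```

The upper limit falls inside the 3000 to 5000 class, so some of those cars may be outliers, while the lower limit is below 500, so there are none at the low end.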
OCR MEI S1 2013 June Q6
18 marks Easy -1.2
6 The birth weights in kilograms of 25 female babies are shown below, in ascending order.
1.39 2.50 2.68 2.76 2.82 2.82 2.84 3.03 3.06 3.16 3.16 3.24 3.32
3.36 3.40 3.54 3.56 3.56 3.70 3.72 3.72 3.84 4.02 4.24 4.34
  1. Find the median and interquartile range of these data.
  2. Draw a box and whisker plot to illustrate the data.
  3. Show that there is exactly one outlier. Discuss whether this outlier should be removed from the data. The cumulative frequency curve below illustrates the birth weights of 200 male babies. \includegraphics[max width=\textwidth, alt={}, center]{6b886da6-3fb8-4b4c-b572-f4b770ae5a4c-3_929_1569_1450_248}
  4. Find the median and interquartile range of the birth weights of the male babies.
  5. Compare the weights of the female and male babies.
  6. Two of these male babies are chosen at random. Calculate an estimate of the probability that both of these babies weigh more than any of the female babies.
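A Python sketch for the female babies' data (parts 1 and 3). It assumes the quartile convention often used for MEI S1, where the quartiles are the medians of the lower and upper halves with the overall median excluded when n is odd, and the 1.5 × IQR rule for outliers. The helper name `half_quartiles` is illustrative.

```python
from statistics import median

weights = [1.39, 2.50, 2.68, 2.76, 2.82, 2.82, 2.84, 3.03, 3.06, 3.16,
           3.16, 3.24, 3.32, 3.36, 3.40, 3.54, 3.56, 3.56, 3.70, 3.72,
           3.72, 3.84, 4.02, 4.24, 4.34]

def half_quartiles(sorted_data):
    """Quartiles as medians of the lower and upper halves (middle value excluded if n is odd)."""
    n = len(sorted_data)
    half = n // 2
    lower_half = sorted_data[:half]
    upper_half = sorted_data[half + n % 2:]
    return median(lower_half), median(sorted_data), median(upper_half)

q1, med, q3 = half_quartiles(sorted(weights))
iqr = q3 - q1

# Assumption: outliers lie more than 1.5 x IQR beyond the quartiles.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [w for w in weights if w < lower_limit or w > upper_limit]

print(round(q1, 2), med, round(q3, 2), round(iqr, 2), outliers)
```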
OCR MEI S1 2015 June Q8
19 marks Standard +0.3
8 The box and whisker plot below summarises the weights in grams of the 20 chocolates in a box. \includegraphics[max width=\textwidth, alt={}, center]{6015ae6c-bf76-4a0c-af0f-5c53f9c5ed2a-4_287_1177_319_427}
  1. Find the interquartile range of the data and hence determine whether there are any outliers at either end of the distribution. Ben buys a box of these chocolates each weekend. The chocolates all look the same on the outside, but 7 of them have orange centres, 6 have cherry centres, 4 have coffee centres and 3 have lemon centres. One weekend, each of Ben's 3 children eats one of the chocolates, chosen at random.
  2. Calculate the probabilities of the following events.
    \(A\): all 3 chocolates have orange centres
    \(B\): all 3 chocolates have the same centres
  3. Find \(\mathrm { P } ( A \mid B )\) and \(\mathrm { P } ( B \mid A )\). The following weekend, Ben buys an identical box of chocolates and again each of his 3 children eats one of the chocolates, chosen at random.
  4. Find the probability that, on both weekends, the 3 chocolates that they eat all have orange centres.
  5. Ben likes all of the chocolates except those with cherry centres. On another weekend he is the first of his family to eat some of the chocolates. Find the probability that he has to select more than 2 chocolates before he finds one that he likes.
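The probability parts can be checked exactly with fractions. This Python sketch treats the three chocolates as chosen without replacement from the 20 in the box. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

centres = {"orange": 7, "cherry": 6, "coffee": 4, "lemon": 3}
total = sum(centres.values())
ways = comb(total, 3)                    # ways to choose 3 chocolates from 20

p_a = Fraction(comb(centres["orange"], 3), ways)
p_b = Fraction(sum(comb(c, 3) for c in centres.values()), ways)

# A is a subset of B, so P(A and B) = P(A).
p_a_given_b = p_a / p_b
p_b_given_a = p_a / p_a

# Two independent weekends with identical boxes.
p_both_weekends = p_a ** 2

# Ben needs more than 2 selections exactly when the first two are both cherry.
p_more_than_2 = Fraction(6, 20) * Fraction(5, 19)

print(p_a, p_b, p_a_given_b, p_b_given_a, p_both_weekends, p_more_than_2)
```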
OCR MEI S1 2009 January Q6
17 marks Easy -1.2
6 The temperature of a supermarket fridge is regularly checked to ensure that it is working correctly. Over a period of three months the temperature (measured in degrees Celsius) is checked 600 times. These temperatures are displayed in the cumulative frequency diagram below. \includegraphics[max width=\textwidth, alt={}, center]{7b92607f-1bf9-45f0-997b-fe76c88b5fcd-4_1054_1649_539_248}
  1. Use the diagram to estimate the median and interquartile range of the data.
  2. Use your answers to part (i) to show that there are very few, if any, outliers in the sample.
  3. Suppose that an outlier is identified in these data. Discuss whether it should be excluded from any further analysis.
  4. Copy and complete the frequency table below for these data.
    Temperature (\(t\) degrees Celsius): \(3.0 \leqslant t \leqslant 3.4\), \(3.4 < t \leqslant 3.8\), \(3.8 < t \leqslant 4.2\), \(4.2 < t \leqslant 4.6\), \(4.6 < t \leqslant 5.0\)
    Frequency243157
  5. Use your table to calculate an estimate of the mean.
  6. The standard deviation of the temperatures in degrees Celsius is 0.379 . The temperatures are converted from degrees Celsius into degrees Fahrenheit using the formula \(F = 1.8 C + 32\). Hence estimate the mean and find the standard deviation of the temperatures in degrees Fahrenheit.
OCR MEI S1 2016 June Q1
7 marks Moderate -0.8
1 The stem and leaf diagram illustrates the weights in grams of 20 house sparrows.
25 | 0
26 | 0 5 8
27 | 7 9
28 | 1 4 5
29 | 0 0 2
30 | 7 7
31 | 6
32 | 0 4 7
33 | 3 3
Key: \(27 \mid 7\) represents 27.7 grams
  1. Find the median and interquartile range of the data.
  2. Determine whether there are any outliers.
OCR H240/02 2019 June Q8
6 marks Easy -1.3
8 The stem-and-leaf diagram shows the heights, in centimetres, of 17 plants, measured correct to the nearest centimetre.
5 | 5 7 9 9
6 | 3 4 5 5 5 9 9
7 | 4 5 7 9 9
8 |
9 | 9
Key: 5 | 6 means 56
  1. Find the median and inter-quartile range of these heights.
  2. Calculate the mean and standard deviation of these heights.
  3. State one advantage of using the median rather than the mean as a measure of average for these heights.
OCR H240/02 2021 November Q11
2 marks Moderate -0.8
11 Zac is planning to write a report on the music preferences of the students at his college. There is a large number of students at the college.
  1. State one reason why Zac might wish to obtain information from a sample of students, rather than from all the students.
  2. Amaya suggests that Zac should use a sample that is stratified by school year. Give one advantage of this method as compared with random sampling, in this context. Zac decides to take a random sample of 60 students from his college. He asks each student how many hours per week, on average, they spend listening to music during term. From his results he calculates the following statistics.
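    $$\begin{array}{|c|c|c|c|c|} \hline \text{Mean} & \text{Standard deviation} & \text{Median} & \text{Lower quartile} & \text{Upper quartile} \\ \hline 21.0 & 4.20 & 20.5 & 18.0 & 22.9 \\ \hline \end{array}$$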
  3. Sundip tells Zac that, during term, she spends on average 30 hours per week listening to music. Discuss briefly whether this value should be considered an outlier.
  4. Layla claims that, during term, each student spends on average 20 hours per week listening to music. Zac believes that the true figure is higher than 20 hours. He uses his results to carry out a hypothesis test at the 5\% significance level. Assume that the time spent listening to music is normally distributed with standard deviation 4.20 hours. Carry out the test.
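A Python sketch of parts 3 and 4. The summary statistics are read as mean 21.0, standard deviation 4.20, median 20.5, lower quartile 18.0 and upper quartile 22.9. The question does not say which outlier definition to use, so the sketch applies two common ones (1.5 × IQR beyond the quartiles, and 2 standard deviations from the mean). The test in part 4 is taken to be a one-tailed z-test on the sample mean.

```python
import math
from statistics import NormalDist

mean, sd, median, q1, q3 = 21.0, 4.20, 20.5, 18.0, 22.9
n = 60

# Part 3: is 30 hours an outlier? Two common rules, both assumptions here.
iqr_limit = q3 + 1.5 * (q3 - q1)
sd_limit = mean + 2 * sd
outlier_by_iqr = 30 > iqr_limit
outlier_by_sd = 30 > sd_limit

# Part 4: H0: mu = 20, H1: mu > 20, known sigma = 4.20, 5% significance level.
standard_error = sd / math.sqrt(n)
z = (mean - 20) / standard_error
critical_value = NormalDist().inv_cdf(0.95)
p_value = 1 - NormalDist().cdf(z)
reject_h0 = z > critical_value

print(round(iqr_limit, 2), outlier_by_iqr, round(sd_limit, 2), outlier_by_sd)
print(round(z, 3), round(critical_value, 3), round(p_value, 4), reject_h0)
```

The two rules disagree about the value 30, which is why the question asks for a discussion rather than a single verdict.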
Edexcel AS Paper 2 2020 June Q2
5 marks Moderate -0.8
  1. Jerry is studying visibility for Camborne using the large data set June 1987.
The table below contains two extracts from the large data set.
It shows the daily maximum relative humidity and the daily mean visibility.
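$$\begin{array}{|c|c|c|} \hline \text{Date} & \text{Daily Maximum Relative Humidity} & \text{Daily Mean Visibility} \\ \hline \text{Units} & \% & \\ \hline 10/06/1987 & 90 & 5300 \\ \hline 28/06/1987 & 100 & 0 \\ \hline \end{array}$$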
(The units for Daily Mean Visibility are deliberately omitted.)
Given that daily mean visibility is given to the nearest 100,
  1. write down the range of distances in metres that corresponds to the recorded value 0 for the daily mean visibility. Jerry drew the following scatter diagram, Figure 2, and calculated some statistics using the June 1987 data for Camborne from the large data set. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{d62e5a00-cd23-417f-b244-8b3e24da4aa2-04_823_1764_1281_137} \captionsetup{labelformat=empty} \caption{Figure 2}
    \end{figure} Jerry defines an outlier as a value that is more than 1.5 times the interquartile range above \(Q _ { 3 }\) or more than 1.5 times the interquartile range below \(Q _ { 1 }\).
  2. Show that the point circled on the scatter diagram is an outlier for visibility.
  3. Interpret the correlation between the daily mean visibility and the daily maximum relative humidity. Jerry drew the following scatter diagram, Figure 3, using the June 1987 data for Camborne from the large data set, but forgot to label the \(x\)-axis.
    \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{d62e5a00-cd23-417f-b244-8b3e24da4aa2-05_730_1056_342_386} \captionsetup{labelformat=empty} \caption{Figure 3}
    \end{figure}
  4. Using your knowledge of the large data set, suggest which variable the \(x\)-axis on this scatter diagram represents.
Edexcel AS Paper 2 2024 June Q1
4 marks Easy -1.8
  1. A coach recorded the heights of some adult rugby players and found the following summary statistics.
$$\begin{array} { r } \text { Median } = 1.85 \mathrm {~m} \\ \text { Range } = 0.28 \mathrm {~m} \\ \text { Interquartile range } = 0.11 \mathrm {~m} \end{array}$$ The coach also noticed that
  • the height of the shortest player is 1.72 m
  • \(25 \%\) of the players' heights are below the height of a player whose height is 1.81 m
Draw a box and whisker plot to represent this information on the grid below. \includegraphics[max width=\textwidth, alt={}, center]{6a0b46f8-7a6a-4ed8-8c7a-9772787f155a-02_342_1096_1027_488}
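The five values needed for the box and whisker plot follow from the given statistics. A small Python sketch, reading the second bullet point as giving the lower quartile (1.81 m):

```python
median, data_range, iqr = 1.85, 0.28, 0.11
minimum = 1.72        # height of the shortest player
q1 = 1.81             # 25% of heights lie below this value

maximum = minimum + data_range   # range = maximum - minimum
q3 = q1 + iqr                    # IQR = Q3 - Q1

five_number_summary = (minimum, q1, median, q3, maximum)
print([round(v, 2) for v in five_number_summary])
```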