Outliers and box plots

Use normal distribution properties to identify outliers, find quartiles for box plots, or interpret box plot features in the context of a normal model.

5 questions

CAIE S1 2009 November Q1
1
\includegraphics[max width=\textwidth, alt={}, center]{6f677fc6-3ca2-4a0d-82a2-69a7cbb8574d-2_211_1169_267_488} Measurements of wind speed on a certain island were taken over a period of one year. A box-and-whisker plot of the data obtained is displayed above, and the values of the quartiles are as shown. It is suggested that wind speed can be modelled approximately by a normal distribution with mean \(\mu \mathrm { km } \mathrm { h } ^ { - 1 }\) and standard deviation \(\sigma \mathrm { km } \mathrm { h } ^ { - 1 }\).
  1. Estimate the value of \(\mu\).
  2. Estimate the value of \(\sigma\).
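One possible approach to both parts, sketched under the assumption that the normal model holds exactly: a normal distribution is symmetric, so \(\mu\) lies midway between the quartiles, and the quartiles sit about \(0.6745\sigma\) either side of \(\mu\). The function name `estimate_normal_from_quartiles` and the quartile values in the usage line are hypothetical placeholders, not the values printed on the figure.

```python
from statistics import NormalDist


def estimate_normal_from_quartiles(q1, q3):
    """Estimate (mu, sigma) of a normal model from its lower and upper quartiles."""
    # z-value with 75% of the standard normal distribution below it (about 0.6745)
    z_upper = NormalDist().inv_cdf(0.75)
    # By symmetry, mu lies midway between the two quartiles
    mu = (q1 + q3) / 2
    # Q3 - Q1 = 2 * z_upper * sigma, rearranged for sigma
    sigma = (q3 - q1) / (2 * z_upper)
    return mu, sigma


# Placeholder quartiles (hypothetical, NOT read from the figure)
mu, sigma = estimate_normal_from_quartiles(10.0, 20.0)
print(round(mu, 2), round(sigma, 2))
```

Replace the placeholder arguments with the quartiles shown on the box-and-whisker plot. A table-based answer using \(z = 0.674\) may differ slightly in the last decimal place.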
Edexcel S1 2023 January Q5
  1. The lengths, \(L \mathrm {~mm}\), of housefly wings are normally distributed with \(L \sim \mathrm {~N} \left( 4.5,0.4 ^ { 2 } \right)\)
    1. Find the probability that a randomly selected housefly has a wing length of less than 3.86 mm.
    2. Find
      1. the upper quartile ( \(Q _ { 3 }\) ) of \(L\)
      2. the lower quartile ( \(Q _ { 1 }\) ) of \(L\)
    A value that is greater than \(Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\) or smaller than \(Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\) is defined as an outlier.
  2. Find these two outlier limits.
    A housefly is selected at random.
  3. Using standardisation, show that the probability that this housefly is not an outlier is 0.993 to 3 decimal places.
    Given that this housefly is not an outlier,
  4. showing your working, find the probability that the wing length of this housefly is greater than 5 mm.
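The parts of this question can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`, not a model answer: the variable names are illustrative, and exact inverse-normal values are used, so results may differ in the last digit from answers based on printed tables.

```python
from statistics import NormalDist

# Model given in the question: L ~ N(4.5, 0.4^2)
wing = NormalDist(mu=4.5, sigma=0.4)

# P(L < 3.86)
p_below = wing.cdf(3.86)

# Quartiles and the 1.5 x IQR outlier limits
q1 = wing.inv_cdf(0.25)
q3 = wing.inv_cdf(0.75)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# P(not an outlier) = P(lower_limit < L < upper_limit)
p_not_outlier = wing.cdf(upper_limit) - wing.cdf(lower_limit)

# P(L > 5 | not an outlier) = P(5 < L < upper_limit) / P(not an outlier)
p_gt5_given_not_outlier = (wing.cdf(upper_limit) - wing.cdf(5)) / p_not_outlier

print(round(p_below, 4), round(q1, 3), round(q3, 3))
print(round(lower_limit, 3), round(upper_limit, 3))
print(round(p_not_outlier, 3), round(p_gt5_given_not_outlier, 3))
```

The conditional probability restricts the event \(L > 5\) to the non-outlier range, which is why the numerator stops at the upper outlier limit rather than extending to infinity.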
Edexcel Paper 3 2019 June Q2
2. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{d1eaaae7-c1dc-4aee-ab54-59f35519a7a4-06_321_1822_294_127} \captionsetup{labelformat=empty} \caption{Figure 1}
\end{figure} The partially completed box plot in Figure 1 shows the distribution of daily mean air temperatures using the data from the large data set for Beijing in 2015
An outlier is defined as a value
more than \(1.5 \times\) IQR below \(Q _ { 1 }\) or
more than \(1.5 \times\) IQR above \(Q _ { 3 }\)
The three lowest air temperatures in the data set are \(7.6 ^ { \circ } \mathrm { C } , 8.1 ^ { \circ } \mathrm { C }\) and \(9.1 ^ { \circ } \mathrm { C }\)
The highest air temperature in the data set is \(32.5 ^ { \circ } \mathrm { C }\)
  1. Complete the box plot in Figure 1 showing clearly any outliers.
  2. Using your knowledge of the large data set, suggest from which month the two outliers are likely to have come.
    Using the data from the large data set, Simon produced the following summary statistics for the daily mean air temperature, \(x ^ { \circ } \mathrm { C }\), for Beijing in 2015 $$n = 184 \quad \sum x = 4153.6 \quad \mathrm {~S} _ { x x } = 4952.906$$
  3. Show that, to 3 significant figures, the standard deviation is \(5.19 ^ { \circ } \mathrm { C }\)
    Simon decides to model the air temperatures with the random variable $$T \sim \mathrm {~N} \left( 22.6,5.19 ^ { 2 } \right)$$
  4. Using Simon's model, calculate the 10th to 90th interpercentile range.
    Simon wants to model another variable from the large data set for Beijing using a normal distribution.
  5. State two variables from the large data set for Beijing that are not suitable to be modelled by a normal distribution. Give a reason for each answer.
    \includegraphics[max width=\textwidth, alt={}, center]{d1eaaae7-c1dc-4aee-ab54-59f35519a7a4-09_473_1813_2161_127}
    (Total for Question 2 is 11 marks)
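The arithmetic behind the standard deviation and the interpercentile range can be sketched as follows. This assumes the standard deviation is taken as \(\sqrt{S_{xx}/n}\) (divisor \(n\)), which reproduces the stated 5.19; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics quoted in the question
n = 184
sum_x = 4153.6
s_xx = 4952.906

# Mean and standard deviation (divisor n, an assumption about the convention)
mean = sum_x / n
sd = sqrt(s_xx / n)

# Simon's model: T ~ N(22.6, 5.19^2)
model = NormalDist(mu=22.6, sigma=5.19)

# 10th and 90th percentiles, and the range between them
p10 = model.inv_cdf(0.10)
p90 = model.inv_cdf(0.90)
interpercentile_range = p90 - p10

print(round(mean, 1), round(sd, 2))
print(round(p10, 2), round(p90, 2), round(interpercentile_range, 1))
```

By symmetry the interpercentile range equals \(2 \times z_{0.9} \times 5.19\) with \(z_{0.9} \approx 1.2816\), so only one percentile needs to be found by hand.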
Edexcel S1 Specimen Q7
  1. The distances travelled to work, \(D \mathrm {~km}\), by the employees at a large company are normally distributed with \(D \sim \mathrm {~N} \left( 30,8 ^ { 2 } \right)\).
    1. Find the probability that a randomly selected employee has a journey to work of more than 20 km.
    2. Find the upper quartile, \(Q _ { 3 }\), of \(D\).
    3. Write down the lower quartile, \(Q _ { 1 }\), of \(D\).
    An outlier is defined as any value of \(D\) such that \(D < h\) or \(D > k\) where $$h = Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \quad \text { and } \quad k = Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$
  2. Find the value of \(h\) and the value of \(k\).
    An employee is selected at random.
  3. Find the probability that the distance travelled to work by this employee is an outlier.
Edexcel S1 2010 June Q7
7. The distances travelled to work, \(D \mathrm {~km}\), by the employees at a large company are normally distributed with \(D \sim \mathrm {~N} \left( 30,8 ^ { 2 } \right)\).
  1. Find the probability that a randomly selected employee has a journey to work of more than 20 km.
  2. Find the upper quartile, \(Q _ { 3 }\), of \(D\).
  3. Write down the lower quartile, \(Q _ { 1 }\), of \(D\).
    An outlier is defined as any value of \(D\) such that \(D < h\) or \(D > k\) where $$h = Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \quad \text { and } \quad k = Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$
  4. Find the value of \(h\) and the value of \(k\).
    An employee is selected at random.
  5. Find the probability that the distance travelled to work by this employee is an outlier.
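The definitions of \(h\) and \(k\) translate directly into a short calculation. The sketch below is illustrative only: the helper name `outlier_limits` and its `multiplier` parameter are assumptions introduced here, and exact inverse-normal values are used, so table-based answers (for example with \(z = 0.67\)) will differ slightly.

```python
from statistics import NormalDist


def outlier_limits(dist, multiplier=1.5):
    """Return (h, k): the lower and upper outlier limits for a normal model."""
    q1 = dist.inv_cdf(0.25)
    q3 = dist.inv_cdf(0.75)
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


# Model given in the question: D ~ N(30, 8^2)
distance = NormalDist(mu=30, sigma=8)

# P(D > 20)
p_more_than_20 = 1 - distance.cdf(20)

# Outlier limits h and k
h, k = outlier_limits(distance)

# P(outlier) = P(D < h) + P(D > k)
p_outlier = distance.cdf(h) + (1 - distance.cdf(k))

print(round(p_more_than_20, 4))
print(round(h, 2), round(k, 2))
print(round(p_outlier, 4))
```

Because \(h\) and \(k\) are symmetric about the mean, the outlier probability is twice the probability of one tail, and it takes the same value for any normal distribution.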