2.02h Recognize outliers

154 questions

AQA S1 2012 June Q2
10 marks Moderate -0.8
2 Katy works as a clerical assistant for a small company. Each morning, she collects the company's post from a secure box in the nearby Royal Mail sorting office. Katy's supervisor asks her to keep a daily record of the number of letters that she collects. Her records for a period of 175 days are summarised in the table.
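\begin{tabular}{|c|c|}
\hline
Daily number of letters (\(x\)) & Number of days (\(f\)) \\
\hline
0-9 & 5 \\
10-19 & 16 \\
20 & 23 \\
21 & 27 \\
22 & 31 \\
23 & 34 \\
24 & 16 \\
25-29 & 10 \\
30-34 & 5 \\
35-39 & 3 \\
40-49 & 4 \\
50 or more & 1 \\
\hline
Total & 175 \\
\hline
\end{tabular}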
  1. For these data:
    1. state the modal value;
    2. determine values for the median and the interquartile range.
  2. The most letters that Katy collected on any of the 175 days was 54. Calculate estimates of the mean and the standard deviation of the daily number of letters collected by Katy.
  3. During the same period, a total of 280 letters was also delivered to the company by private courier firms. Calculate an estimate of the mean daily number of all letters received by the company during the 175 days.
AQA S1 2014 June Q1
6 marks Easy -1.8
1 The weights, in kilograms, of a random sample of 15 items of cabin luggage on an aeroplane were as follows. $$\begin{array} { l l l l l l l l l l l l l l l } 4.6 & 3.8 & 3.9 & 4.5 & 4.9 & 3.6 & 3.7 & 5.2 & 4.0 & 5.1 & 4.1 & 3.3 & 4.7 & 5.0 & 4.8 \end{array}$$ For these data:
  1. find values for the median and the interquartile range;
  2. find the value for the range;
  3. state why the mode is not an appropriate measure of average.
AQA S1 2014 June Q1
11 marks Easy -1.3
1 Henrietta lives on a small farm where she keeps some hens.
For a period of 35 weeks during the hens' first laying season, she records, each week, the total number of eggs laid by the hens. Her records are shown in the table.
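\begin{tabular}{|c|c|}
\hline
Total number of eggs laid in a week (\(\boldsymbol { x }\)) & Number of weeks (\(f\)) \\
\hline
66 & 1 \\
67 & 2 \\
68 & 3 \\
69 & 5 \\
70 & 7 \\
71 & 8 \\
72 & 4 \\
73 & 2 \\
74 & 2 \\
75 & 1 \\
\hline
Total & 35 \\
\hline
\end{tabular}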
  1. For these data:
    1. state values for the mode and the range;
    2. find values for the median and the interquartile range;
    3. calculate values for the mean and the standard deviation.
  2. Each week, for the 35 weeks, Henrietta sells 60 eggs to a local shop, keeping the remainder for her own use. State values for the mean and the standard deviation of the number of eggs that she keeps.
    [2 marks]
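The effect of selling a fixed 60 eggs each week can be checked numerically. The sketch below is only an illustration, not a mark-scheme method: it transcribes the frequency table, assumes the divisor-\(n\) form of the standard deviation, and treats "eggs kept" as the coding \(y = x - 60\). The names `eggs_freq` and `mean_and_sd` are made up for this example.

```python
from math import sqrt

# Frequency table transcribed from the question: eggs per week -> number of weeks.
eggs_freq = {66: 1, 67: 2, 68: 3, 69: 5, 70: 7, 71: 8, 72: 4, 73: 2, 74: 2, 75: 1}


def mean_and_sd(freq):
    """Return (mean, standard deviation) of a discrete frequency table.

    Assumption: the divisor is n. An n - 1 divisor gives a slightly larger value.
    """
    n = sum(freq.values())
    total = sum(x * f for x, f in freq.items())
    total_sq = sum(x * x * f for x, f in freq.items())
    mean = total / n
    variance = total_sq / n - mean ** 2
    return mean, sqrt(variance)


mean_laid, sd_laid = mean_and_sd(eggs_freq)

# Selling 60 eggs every week is the coding y = x - 60.
kept_freq = {x - 60: f for x, f in eggs_freq.items()}
mean_kept, sd_kept = mean_and_sd(kept_freq)

print(f"laid: mean {mean_laid:.1f}, sd {sd_laid:.2f}")
print(f"kept: mean {mean_kept:.1f}, sd {sd_kept:.2f}")
```

Subtracting a constant shifts the mean by that constant and leaves the standard deviation unchanged, which is why the last part can be answered by stating values rather than recalculating.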
Edexcel S1 Q5
12 marks Easy -1.3
5. For a project, a student asked 40 people to draw two straight lines with what they thought was an angle of \(75 ^ { \circ }\) between them, using just a ruler and a pencil. She then measured the size of the angles that had been drawn and her data are summarised in this stem and leaf diagram.
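\begin{tabular}{r|l|l}
Angle & (\(6 \mid 4\) means \(64 ^ { \circ }\)) & Totals \\
\hline
4 & 1 & (1) \\
4 & & (0) \\
5 & 0 2 4 & (3) \\
5 & 5 8 9 & (3) \\
6 & 1 1 3 3 4 & (5) \\
6 & 5 5 7 8 9 & (5) \\
7 & 0 1 1 2 3 3 4 4 4 & (9) \\
7 & 5 6 6 7 7 9 9 & (7) \\
8 & 0 1 1 3 4 & (5) \\
8 & 5 6 & (2) \\
\end{tabular}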
  1. Find the median and quartiles of these data.
    Given that any values outside of the limits \(\mathrm { Q } _ { 1 } - 1.5 \left( \mathrm { Q } _ { 3 } - \mathrm { Q } _ { 1 } \right)\) and \(\mathrm { Q } _ { 3 } + 1.5 \left( \mathrm { Q } _ { 3 } - \mathrm { Q } _ { 1 } \right)\) are to be regarded as outliers,
  2. determine if there are any outliers in these data,
  3. draw a box plot representing these data on graph paper,
  4. describe the skewness of the distribution and suggest a reason for it.
Edexcel S1 Q5
16 marks Easy -1.3
5. Each child in class 3A was given a packet of seeds to plant. The stem and leaf diagram below shows how many seedlings were visible in each child's tray one week after planting.
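\begin{tabular}{r|l|l}
Number of seedlings & (\(2 \mid 1\) means 21) & Totals \\
\hline
0 & 0 2 & (2) \\
0 & & (0) \\
1 & 1 & (1) \\
1 & 5 7 & (2) \\
2 & 0 1 3 3 4 & (5) \\
2 & 5 7 7 7 8 9 9 & (7) \\
3 & 0 0 0 1 2 2 4 & (7) \\
3 & 5 6 8 8 & (4) \\
4 & 1 3 4 & (3) \\
\end{tabular}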
  1. Find the median and interquartile range for these data.
  2. Use the quartiles to describe the skewness of the data. Show your method clearly.
    The mean and standard deviation for these data were 27.2 and 10.3 respectively.
  3. Explaining your answer, state whether you would recommend using these values or your answers to part (a) to summarise these data.
    Outliers are defined to be values outside of the limits \(\mathrm { Q } _ { 1 } - 2 s\) and \(\mathrm { Q } _ { 3 } + 2 s\) where \(s\) is the standard deviation given above.
  4. Represent these data with a boxplot identifying clearly any outliers.
Edexcel S1 Q5
14 marks Easy -1.2
5. In a survey unemployed people were asked how many months it had been, to the nearest month, since they were last employed on a full-time basis. The data collected is summarised in this stem and leaf diagram.
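\begin{tabular}{r|l|l}
Number of months & (\(2 \mid 1\) means 21 months) & Totals \\
\hline
0 & 1 1 2 2 4 4 4 6 7 7 9 & (11) \\
1 & 0 2 3 5 5 6 8 9 & ( ) \\
2 & 1 5 6 8 & ( ) \\
3 & 0 7 9 & ( ) \\
4 & 5 & ( ) \\
5 & 2 7 & (2) \\
6 & 3 & (1) \\
7 & 0 & (1) \\
\end{tabular}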
  1. Write down the values needed to complete the totals column on the stem and leaf diagram.
  2. State the mode of these data.
  3. Find the median and quartiles of these data.
    Given that any values outside of the limits \(\mathrm { Q } _ { 1 } - 1.5 \left( \mathrm { Q } _ { 3 } - \mathrm { Q } _ { 1 } \right)\) and \(\mathrm { Q } _ { 3 } + 1.5 \left( \mathrm { Q } _ { 3 } - \mathrm { Q } _ { 1 } \right)\) are to be regarded as outliers,
  4. determine if there are any outliers in these data,
  5. draw a box plot representing these data on graph paper,
  6. describe the skewness of these data and suggest a reason for it.
Edexcel S1 2022 January Q3
10 marks Moderate -0.8
  1. The stem and leaf diagram shows the number of deliveries made by Pat each day for 24 days
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Key: \(10 \mid 8\) represents 108 deliveries}
\begin{tabular}{r|l|l}
10 & 8 9 & (2) \\
11 & 0 3 6 6 6 8 8 9 9 9 9 & (11) \\
12 & 4 5 5 5 5 5 5 8 & (8) \\
13 & \(a\) \(b\) \(c\) & (3) \\
\end{tabular}
\end{table}
where \(a\), \(b\) and \(c\) are positive integers with \(a < b < c\).
An outlier is defined as any value greater than \(1.5 \times\) interquartile range above the upper quartile.
Given that there is only one outlier for these data,
  1. show that \(c = 9\)
    The number of deliveries made by Pat each day is represented by \(d\).
    The data in the stem and leaf diagram are coded using $$x = d - 125$$ and the following summary statistics are obtained $$\sum x = - 96 \quad \text { and } \quad \sum ( x - \bar { x } ) ^ { 2 } = 1306$$
  2. Find the mean number of deliveries.
  3. Find the standard deviation of the number of deliveries.
    One of these 24 days is selected at random. The random variable \(D\) represents the number of deliveries made by Pat on this day. The random variable \(X = D - 125\)
  4. Find \(\mathrm { P } ( D > 118 \mid X < 0 )\)
Edexcel S1 2017 October Q1
14 marks Easy -1.2
  1. At the start of a course, an instructor asked a group of 80 apprentices to estimate the length of a piece of pipe. The error (true length - estimated length) was recorded in centimetres. The results are summarised in the box plot below. \includegraphics[max width=\textwidth, alt={}, center]{77ae01cd-2b58-48ab-889f-272e27ecf99d-02_291_1445_397_246}
    1. Find the range for these data.
    2. Find the interquartile range for these data.
    One month later, the instructor asked the 80 apprentices to estimate the length of a different piece of pipe and recorded their errors. The results are summarised in the table below.
    \begin{tabular}{|c|c|}
    \hline
    Error (\(\boldsymbol { e }\) cm) & Number of apprentices \\
    \hline
    \(- 40 < e \leqslant - 16\) & 2 \\
    \(- 16 < e \leqslant - 8\) & 18 \\
    \(- 8 < e \leqslant 0\) & 33 \\
    \(0 < e \leqslant 8\) & 14 \\
    \(8 < e \leqslant 16\) & 10 \\
    \(16 < e \leqslant 40\) & 3 \\
    \hline
    \end{tabular}
  2. Use linear interpolation to estimate the median error for these data.
  3. Show that the upper quartile for these data, to the nearest centimetre, is 4.
    For these data, the lower quartile is \(- 8\) and the five worst errors were \(- 25 , - 21 , 18 , 23 , 28\).
    An outlier is a value that falls either more than \(1.5 \times\) (interquartile range) above the upper quartile or more than \(1.5 \times\) (interquartile range) below the lower quartile.
    1. Show that there are only 2 outliers for these data.
    2. Draw a box plot for these data on the grid on page 3.
  4. State, giving reasons, whether or not the apprentices' ability to estimate the length of a piece of pipe has improved over the first month of the course. \includegraphics[max width=\textwidth, alt={}, center]{77ae01cd-2b58-48ab-889f-272e27ecf99d-03_412_1520_2222_173}
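The linear interpolation and the outlier test in this question can be sketched in a few lines of Python. This is an illustrative check rather than a model answer: the class boundaries and frequencies are transcribed from the table, the quartiles \(-8\) and 4 are the values quoted in the question, and the helper name `interpolate` is invented here.

```python
# Grouped data from the second table: (lower bound, upper bound, frequency).
classes = [(-40, -16, 2), (-16, -8, 18), (-8, 0, 33),
           (0, 8, 14), (8, 16, 10), (16, 40, 3)]


def interpolate(classes, position):
    """Estimate the value at a cumulative position by linear interpolation."""
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            # Fraction of the way through this class, scaled by the class width.
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("position exceeds total frequency")


n = sum(freq for _, _, freq in classes)            # 80 apprentices
median = interpolate(classes, n / 2)               # 40th value
upper_quartile = interpolate(classes, 3 * n / 4)   # 60th value

# Outlier fences, using the quartiles quoted in the question.
q1, q3 = -8, 4
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
worst_errors = [-25, -21, 18, 23, 28]
outliers = [e for e in worst_errors if e < low_fence or e > high_fence]

print(f"median about {median:.2f}, upper quartile {upper_quartile:.1f}")
print(f"fences {low_fence} and {high_fence}, outliers {outliers}")
```

Only the five most extreme errors need testing, because every other value lies closer to the quartiles than these do.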
Edexcel S1 2017 October Q5
13 marks Moderate -0.8
  1. A company wants to pay its employees according to their performance at work. Last year's performance score \(x\) and annual salary \(y\), in thousands of dollars, were recorded for a random sample of 10 employees of the company.
The performance scores were $$\begin{array} { l l l l l l l l l l } 15 & 24 & 32 & 39 & 41 & 18 & 16 & 22 & 34 & 42 \end{array}$$ (You may use \(\sum x ^ { 2 } = 9011\) )
  1. Find the mean and the variance of these performance scores. The corresponding \(y\) values for these 10 employees are summarised by $$\sum y = 306.1 \quad \text { and } \quad \mathrm { S } _ { y y } = 546.3$$
  2. Find the mean and the variance of these \(y\) values. The regression line of \(y\) on \(x\) based on this sample is $$y = 12.0 + 0.659 x$$
  3. Find the product moment correlation coefficient for these data.
  4. State, giving a reason, whether or not the value of the product moment correlation coefficient supports the use of a regression line to model the relationship between performance score and annual salary. The company decides to use this regression model to determine future salaries.
  5. Find the proposed annual salary, in dollars, for an employee who has a performance score of 35
Edexcel S1 2021 October Q2
12 marks Moderate -0.5
2. A large company is analysing how much money it spends on paper in its offices each year. The number of employees in the office, \(x\), and the amount spent on paper in a year, \(p\) (\$ hundreds), in each of 12 randomly selected offices were recorded. The results are summarised in the following statistics. $$\sum x = 93 \quad \mathrm {~S} _ { x x } = 148.25 \quad \sum p = 273 \quad \sum p ^ { 2 } = 6602.72 \quad \sum x p = 2347$$
  1. Show that \(\mathrm { S } _ { x p } = 231.25\)
  2. Find the product moment correlation coefficient for these data.
  3. Find the equation of the regression line of \(p\) on \(x\) in the form \(p = a + b x\)
  4. Give an interpretation of the gradient of your regression line. The director of the company wants to reduce the amount spent on paper each year. He wants each office to aim for a model of the form \(p = \frac { 4 } { 5 } a + \frac { 1 } { 2 } b x\), where \(a\) and \(b\) are the values found in part (c). Using the data for the 93 employees from the 12 offices,
  5. estimate the percentage saving in the amount spent on paper each year by the company using the director's model.
Edexcel S1 2021 October Q3
14 marks Moderate -0.8
  1. The stem and leaf diagram shows the ages of the 35 male passengers on a cruise.
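\begin{tabular}{r|l|l}
\multicolumn{3}{l}{Age} \\
\hline
1 & 3 & (1) \\
2 & 7 9 & (2) \\
3 & 1 2 8 8 & (4) \\
4 & 5 5 6 7 8 8 9 & (7) \\
5 & 2 2 3 3 4 4 5 6 6 8 & (10) \\
6 & 0 1 1 4 4 4 7 & (7) \\
7 & 3 6 & (2) \\
8 & 7 8 & (2) \\
\end{tabular}
Key: \(1 \mid 3\) represents an age of 13 years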
  1. Find the median age of the male passengers.
  2. Show that the interquartile range (IQR) of these ages is 16
    An outlier is defined as a value that is more than \(1.5 \times\) IQR above the upper quartile or \(1.5 \times\) IQR below the lower quartile
  3. Show that there are 3 outliers amongst these ages.
  4. On the grid in Figure 1 on page 9, draw a box plot for the ages of the male passengers on the cruise. Figure 1 on page 9 also shows a box plot for the ages of the female passengers on the cruise.
  5. Comment on any difference in the distributions of ages of male and female passengers on the cruise.
    State the values of any statistics you have used to support your comment.
    (1) Anja, along with her 2 daughters and a granddaughter, now join the cruise.
    Anja's granddaughter is younger than both of Anja's daughters.
    Anja had her 23rd birthday on the day her eldest daughter was born.
    When their 4 ages are included with the other female passengers on the cruise, the box plot does not change.
  6. State, giving reasons, what you can say about
    1. the granddaughter's age
    2. Anja's age.
      (3)
      \begin{figure}[h]
      \includegraphics[alt={},max width=\textwidth]{29ac0c0b-f963-40a1-beba-7146bbb2d021-09_1025_1593_1541_182} \captionsetup{labelformat=empty} \caption{Figure 1}
      \end{figure}
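The quartile and outlier claims for the male passengers can be checked from the raw ages. The sketch below assumes one common S1 convention for quartile positions (take \(kn/4\); if it is a whole number, average that value and the next, otherwise round the position up). Other conventions exist and can give different quartiles, and the function name `quartile` is only illustrative.

```python
# Ages read off the stem and leaf diagram (35 male passengers), in order.
ages = [13, 27, 29, 31, 32, 38, 38,
        45, 45, 46, 47, 48, 48, 49,
        52, 52, 53, 53, 54, 54, 55, 56, 56, 58,
        60, 61, 61, 64, 64, 64, 67,
        73, 76, 87, 88]


def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) using the position rule k * n / 4.

    Assumed convention: a whole-number position means averaging that value
    with the next one; otherwise the position is rounded up.
    """
    whole, remainder = divmod(k * len(sorted_values), 4)
    if remainder == 0:
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    return sorted_values[whole]  # zero-based index of the (whole + 1)-th value


q1, median, q3 = (quartile(ages, k) for k in (1, 2, 3))
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in ages if a < low_fence or a > high_fence]

print(f"Q1 {q1}, median {median}, Q3 {q3}, IQR {iqr}")
print(f"fences {low_fence} and {high_fence}, outliers {outliers}")
```

Under this convention the output agrees with the values the question asks to be shown: an interquartile range of 16 and three outliers.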
Edexcel S1 Q1
Easy -1.2
  1. The students in a class were each asked to write down how many CDs they owned. The student with the least number of CDs had 14 and all but one of the others owned 60 or fewer. The remaining student owned 65. The quartiles for the class were 30, 34 and 42 respectively.
Outliers are defined to be any values outside the limits of \(1.5 \left( Q _ { 3 } - Q _ { 1 } \right)\) below the lower quartile or above the upper quartile. On graph paper draw a box plot to represent these data, indicating clearly any outliers.
(7 marks)
Edexcel S1 Q4
Easy -1.2
4. Aeroplanes fly from City \(A\) to City \(B\). Over a long period of time the number of minutes delay in take-off from City \(A\) was recorded. The minimum delay was 5 minutes and the maximum delay was 63 minutes. A quarter of all delays were at most 12 minutes, half were at most 17 minutes and \(75 \%\) were at most 28 minutes. Only one of the delays was longer than 45 minutes. An outlier is an observation that falls either \(1.5 \times\) (interquartile range) above the upper quartile or \(1.5 \times\) (interquartile range) below the lower quartile.
  1. On the graph paper opposite draw a box plot to represent these data.
  2. Comment on the distribution of delays. Justify your answer.
  3. Suggest how the distribution might be interpreted by a passenger who frequently flies from City \(A\) to City \(B\). \includegraphics[max width=\textwidth, alt={}, center]{3d4f7bfb-b235-418a-9411-a4d0b3188254-008_1190_1487_278_223}
Edexcel S1 2003 June Q3
10 marks Moderate -0.8
3. A company owns two petrol stations \(P\) and \(Q\) along a main road. Total daily sales in the same week for \(P ( \pounds p )\) and for \(Q ( \pounds q )\) are summarised in the table below.
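\begin{tabular}{|l|c|c|}
\hline
 & \(p\) & \(q\) \\
\hline
Monday & 4760 & 5380 \\
Tuesday & 5395 & 4460 \\
Wednesday & 5840 & 4640 \\
Thursday & 4650 & 5450 \\
Friday & 5365 & 4340 \\
Saturday & 4990 & 5550 \\
Sunday & 4365 & 5840 \\
\hline
\end{tabular}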
When these data are coded using \(x = \frac { p - 4365 } { 100 }\) and \(y = \frac { q - 4340 } { 100 }\), $$\Sigma x = 48.1 , \Sigma y = 52.8 , \Sigma x ^ { 2 } = 486.44 , \Sigma y ^ { 2 } = 613.22 \text { and } \Sigma x y = 204.95 .$$
  1. Calculate \(S _ { x y } , S _ { x x }\) and \(S _ { y y }\).
  2. Calculate, to 3 significant figures, the value of the product moment correlation coefficient between \(x\) and \(y\).
    1. Write down the value of the product moment correlation coefficient between \(p\) and \(q\).
    2. Give an interpretation of this value.
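The summary-statistic formulas used in this question, \(S_{xy} = \Sigma xy - \frac{\Sigma x \Sigma y}{n}\) and similarly for \(S_{xx}\) and \(S_{yy}\), can be evaluated directly. The Python sketch below is an arithmetic check under the assumption that \(n = 7\) (one pair of values per day of the week); the variable names are illustrative.

```python
from math import sqrt

# Coded summary statistics quoted in the question; n = 7 days.
n = 7
sum_x, sum_y = 48.1, 52.8
sum_x2, sum_y2, sum_xy = 486.44, 613.22, 204.95

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

# Product moment correlation coefficient for the coded data.
r = s_xy / sqrt(s_xx * s_yy)

print(f"Sxy = {s_xy:.3f}, Sxx = {s_xx:.3f}, Syy = {s_yy:.3f}")
print(f"r = {r:.3f}")
```

Because the coding subtracts a constant and divides by a positive constant, the correlation coefficient between \(p\) and \(q\) is the same as the one between \(x\) and \(y\).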
Edexcel S1 2003 June Q6
16 marks Moderate -0.8
6. The number of bags of potato crisps sold per day in a bar was recorded over a two-week period. The results are shown below. $$20,15,10,30,33,40,5,11,13,20,25,42,31,17$$
  1. Calculate the mean of these data.
  2. Draw a stem and leaf diagram to represent these data.
  3. Find the median and the quartiles of these data.
    An outlier is an observation that falls either \(1.5 \times\) (interquartile range) above the upper quartile or \(1.5 \times\) (interquartile range) below the lower quartile.
  4. Determine whether or not any items of data are outliers.
  5. On graph paper draw a box plot to represent these data. Show your scale clearly.
  6. Comment on the skewness of the distribution of bags of crisps sold per day. Justify your answer.
AQA AS Paper 2 2019 June Q13
6 marks Easy -1.2
13 Denzel wants to buy a car with a propulsion type other than petrol or diesel.
He takes a sample, from the Large Data Set, of the CO2 emissions, in \(\mathrm { g } / \mathrm { km }\), of cars with one particular propulsion type. The sample is as follows $$\begin{array} { l l l l l l l l } 82 & 13 & 96 & 49 & 96 & 92 & 70 & 81 \end{array}$$
  1. Using your knowledge of the Large Data Set, state which propulsion type this sample is for, giving a reason for your answer.
  2. Calculate the mean of the sample.
  3. Calculate the standard deviation of the sample.
  4. Denzel claims that the value 13 is an outlier.
    1. Any value more than 2 standard deviations from the mean can be regarded as an outlier. Verify that Denzel's claim is correct.
    2. State what effect, if any, removing the value 13 from the sample would have on the standard deviation.
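The "more than 2 standard deviations from the mean" rule can be checked with Python's standard `statistics` module. This is a sketch under one assumption: the population form of the standard deviation (divisor \(n\), `pstdev`) is used. The sample form (`stdev`) would give a larger value, so the limits would move slightly.

```python
from statistics import mean, pstdev

# CO2 emissions sample (g/km) from the question.
emissions = [82, 13, 96, 49, 96, 92, 70, 81]

mean_value = mean(emissions)
sd = pstdev(emissions)  # assumption: divisor n

# Limits two standard deviations either side of the mean.
lower, upper = mean_value - 2 * sd, mean_value + 2 * sd
is_outlier = 13 < lower or 13 > upper

# Standard deviation once the value 13 is removed from the sample.
sd_without = pstdev([x for x in emissions if x != 13])

print(f"mean {mean_value:.3f}, sd {sd:.2f}, limits {lower:.2f} to {upper:.2f}")
print(f"13 is an outlier: {is_outlier}; sd without 13: {sd_without:.2f}")
```

Removing a value that lies far from the rest of the sample reduces the spread, so the standard deviation falls.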
Edexcel AS Paper 2 2018 June Q4
8 marks Moderate -0.8
  1. Helen is studying the daily mean wind speed for Camborne using the large data set from 1987. The data for one month are summarised in Table 1 below.
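\begin{table}[h]
\begin{tabular}{|l|c|c|c|c|c|c|c|c|c|c|}
\hline
Windspeed & \(\mathrm { n } / \mathrm { a }\) & 6 & 7 & 8 & 9 & 11 & 12 & 13 & 14 & 16 \\
\hline
Frequency & 13 & 2 & 3 & 2 & 2 & 3 & 1 & 2 & 1 & 2 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}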
  1. Calculate the mean for these data.
  2. Calculate the standard deviation for these data and state the units.
    The means and standard deviations of the daily mean wind speed for the other months from the large data set for Camborne in 1987 are given in Table 2 below. The data are not in month order.
    \begin{table}[h]
    \begin{tabular}{|l|c|c|c|c|c|}
    \hline
    Month & \(A\) & \(B\) & \(C\) & \(D\) & \(E\) \\
    \hline
    Mean & 7.58 & 8.26 & 8.57 & 8.57 & 11.57 \\
    \hline
    Standard Deviation & 2.93 & 3.89 & 3.46 & 3.87 & 4.64 \\
    \hline
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Table 2}
    \end{table}
  3. Using your knowledge of the large data set, suggest, giving a reason, which month had a mean of 11.57 The data for these months are summarised in the box plots on the opposite page. They are not in month order or the same order as in Table 2.
    1. State the meaning of the * symbol on some of the box plots.
    2. Suggest, giving your reasons, which of the months in Table 2 is most likely to be summarised in the box plot marked \(Y\). \includegraphics[max width=\textwidth, alt={}, center]{2edcf965-9c93-4a9b-9395-2d3c023801af-11_1177_1216_324_427}
Edexcel AS Paper 2 Specimen Q4
7 marks Moderate -0.8
  1. Sara was studying the relationship between rainfall, \(r \mathrm {~mm}\), and humidity, \(h \%\), in the UK. She takes a random sample of 11 days from May 1987 for Leuchars from the large data set.
She obtained the following results.
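\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
\(h\) & 93 & 86 & 95 & 97 & 86 & 94 & 97 & 97 & 87 & 97 & 86 \\
\hline
\(r\) & 1.1 & 0.3 & 3.7 & 20.6 & 0 & 0 & 2.4 & 1.1 & 0.1 & 0.9 & 0.1 \\
\hline
\end{tabular}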
Sara examined the rainfall figures and found $$Q _ { 1 } = 0.1 \quad Q _ { 2 } = 0.9 \quad Q _ { 3 } = 2.4$$ A value that is more than 1.5 times the interquartile range (IQR) above \(Q _ { 3 }\) is called an outlier.
  1. Show that \(r = 20.6\) is an outlier.
  2. Give a reason why Sara might:
    1. include
    2. exclude
      this day's reading. Sara decided to exclude this day's reading and drew the following scatter diagram for the remaining 10 days' values of \(r\) and \(h\). \includegraphics[max width=\textwidth, alt={}, center]{8f3dbcb4-3260-4493-a230-12577b4ed691-08_988_1081_1555_420}
  3. Give an interpretation of the correlation between rainfall and humidity. The equation of the regression line of \(r\) on \(h\) for these 10 days is \(r = - 12.8 + 0.15 h\)
  4. Give an interpretation of the gradient of this regression line.
    1. Comment on the suitability of Sara's sampling method for this study.
    2. Suggest how Sara could make better use of the large data set for her study.
Edexcel AS Paper 2 Specimen Q3
6 marks Standard +0.3
  1. Pete is investigating the relationship between daily rainfall, \(w \mathrm {~mm}\), and daily mean pressure, \(p\) hPa , in Perth during 2015. He used the large data set to take a sample of size 12.
He obtained the following results.
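\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
\(p\) & 1007 & 1012 & 1013 & 1009 & 1019 & 1010 & 1010 & 1010 & 1013 & 1011 & 1014 & 1022 \\
\hline
\(w\) & 102.0 & 63.0 & 63.0 & 38.4 & 38.0 & 35.0 & 34.2 & 32.0 & 30.4 & 28.0 & 28.0 & 15 \\
\hline
\end{tabular}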
Pete drew the following scatter diagram for the values of \(w\) and \(p\) and calculated the quartiles.
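\begin{tabular}{|c|c|c|c|}
\hline
 & \(Q _ { 1 }\) & \(Q _ { 2 }\) & \(Q _ { 3 }\) \\
\hline
\(p\) & 1010 & 1011.5 & 1013.5 \\
\hline
\(w\) & 29.2 & 34.6 & 50.7 \\
\hline
\end{tabular}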
\includegraphics[max width=\textwidth, alt={}]{b29b0411-8401-420b-9227-befe25c245d8-04_818_1081_989_477}
An outlier is a value which is more than 1.5 times the interquartile range above Q3 or more than 1.5 times the interquartile range below Q1.
  1. Show that the 3 points circled on the scatter diagram above are outliers.
    (2)
  2. Describe the effect of removing the 3 outliers on the correlation between daily rainfall and daily mean pressure in this sample.
    (1) John has also been studying the large data set and believes that the sample Pete has taken is not random.
  3. From your knowledge of the large data set, explain why Pete's sample is unlikely to be a random sample. John finds that the equation of the regression line of \(w\) on \(p\), using all the data in the large data set, is $$w = 1023 - 0.223 p$$
  4. Give an interpretation of the figure - 0.223 in this regression line. John decided to use the regression line to estimate the daily rainfall for a day in December when the daily mean pressure is 1011 hPa .
  5. Using your knowledge of the large data set, comment on the reliability of John's estimate.
    (Total for Question 3 is 6 marks)
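For part 1, the \(1.5 \times\) IQR rule has to be applied to each variable separately, since a point on the scatter diagram is unusual if either coordinate is an outlier. The Python sketch below illustrates this with the sample values and quartiles given in the question. The scatter diagram is not reproduced here, so it is an assumption that the three flagged points are the circled ones, and the helper name `fences` is made up for this example.

```python
# Sample transcribed from the question: pressure p (hPa) and rainfall w (mm).
pressure = [1007, 1012, 1013, 1009, 1019, 1010, 1010, 1010, 1013, 1011, 1014, 1022]
rainfall = [102.0, 63.0, 63.0, 38.4, 38.0, 35.0, 34.2, 32.0, 30.4, 28.0, 28.0, 15]


def fences(q1, q3):
    """Return the lower and upper outlier limits for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Quartiles as quoted in the question's table.
p_low, p_high = fences(1010, 1013.5)
w_low, w_high = fences(29.2, 50.7)

# A point is flagged if either coordinate lies outside its limits.
flagged = [(p, w) for p, w in zip(pressure, rainfall)
           if not (p_low <= p <= p_high) or not (w_low <= w <= w_high)]

print(f"pressure limits {p_low} to {p_high}")
print(f"rainfall limits {w_low:.2f} to {w_high:.2f}")
print(f"flagged points: {flagged}")
```

Exactly three points are flagged: one for its rainfall and two for their pressure.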
Edexcel Paper 3 2018 June Q4
13 marks Easy -1.3
  1. Charlie is studying the time it takes members of his company to travel to the office. He stands by the door to the office from 0840 to 0850 one morning and asks workers, as they arrive, how long their journey was.
    1. State the sampling method Charlie used.
    2. State and briefly describe an alternative method of non-random sampling Charlie could have used to obtain a sample of 40 workers.
    Taruni decided to ask every member of the company the time, \(x\) minutes, it takes them to travel to the office.
  2. State the data selection process Taruni used. Taruni's results are summarised by the box plot and summary statistics below. \includegraphics[max width=\textwidth, alt={}, center]{65e4b254-fb7b-45c2-9702-32f034018193-10_378_1349_1050_367} $$n = 95 \quad \sum x = 4133 \quad \sum x ^ { 2 } = 202294$$
  3. Write down the interquartile range for these data.
  4. Calculate the mean and the standard deviation for these data.
  5. State, giving a reason, whether you would recommend using the mean and standard deviation or the median and interquartile range to describe these data. Rana and David both work for the company and have both moved house since Taruni collected her data. Rana's journey to work has changed from 75 minutes to 35 minutes and David's journey to work has changed from 60 minutes to 33 minutes. Taruni drew her box plot again and only had to change two values.
  6. Explain which two values Taruni must have changed and whether each of these values has increased or decreased.
Edexcel S1 2024 October Q1
Easy -1.2
  1. The back-to-back stem and leaf diagram on page 3 shows information about the running times of 31 Action films and 31 Comedy films.
    The running times are given to the nearest minute.
    1. Write down the modal running time for these Action films.
    Some of the quartiles for these two distributions are shown in the table below.
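    \begin{tabular}{|l|c|c|}
    \hline
     & Action films & Comedy films \\
    \hline
    Lower quartile & 121 & \(a\) \\
    \hline
    Median & \(b\) & 117 \\
    \hline
    Upper quartile & 138 & \(c\) \\
    \hline
    \end{tabular}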
  2. Find the value of \(a\), the value of \(b\) and the value of \(c\)
  3. For these Action films find, to one decimal place,
    1. the mean running time,
    2. the standard deviation of the running times.
      (You may use \(\sum x = 4016\) and \(\sum x ^ { 2 } = 525056\) where \(x\) is the running time, in minutes, of an Action film.) One measure of skewness is found using $$\frac { \text { mean - mode } } { \text { standard deviation } }$$
  4. Evaluate this measure and describe the skewness for the running times of these Action films.
  5. Comment on one difference between the distribution of the running times of these Action films and the distribution of the running times of these Comedy films. State the values of any statistics you have used to support your comment.
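    \begin{tabular}{l|r|c|l|l}
    Totals & Action films & & Comedy films & Totals \\
    \hline
    (1) & 0 & 9 & 2 2 3 5 & (5) \\
    (0) & & 10 & 3 5 6 6 8 9 & (6) \\
    (5) & 9 8 6 4 2 & 11 & 0 2 4 6 7 9 9 9 & (8) \\
    (10) & 9 9 8 7 6 5 4 3 1 0 & 12 & 1 2 4 6 6 7 7 7 7 8 9 & (11) \\
    (8) & 8 7 7 7 5 4 2 1 & 13 & 1 & (1) \\
    (7) & 7 7 6 6 4 3 1 & 14 & & (0) \\
    \end{tabular}
    Key: \(0 | 9 | 2\) means 90 minutes for an Action film and 92 minutes for a Comedy film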
Pre-U Pre-U 9794/3 2017 June Q1
5 marks Moderate -0.8
1 Levels of nitrogen dioxide in the atmosphere are being monitored at the side of a road in a busy city centre. A sample of 18 measurements taken (in suitable units) is as follows. $$\begin{array} { l l l l l l l l l l l l l l l l l l } 83 & 44 & 95 & 92 & 98 & 63 & 69 & 76 & 19 & 91 & 70 & 91 & 74 & 65 & 62 & 70 & 95 & 108 \end{array}$$
  1. Find the mean and standard deviation of the sample.
  2. Hence identify, with justification, any possible outliers.
Pre-U Pre-U 9794/1 Specimen Q12
6 marks Moderate -0.8
12 A set of data is shown in the table below.
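\begin{tabular}{|l|c|c|c|c|c|c|c|c|c|}
\hline
\(x\) & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\
\hline
frequency & 3 & 10 & 4 & 3 & 2 & 0 & 0 & 0 & 1 \\
\hline
\end{tabular}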
  1. Calculate the mean and standard deviation of the data. The value 8 may be regarded as an outlier.
  2. Explain how you would treat this outlier if the data represents
    1. the difference of the scores obtained when throwing a pair of ordinary dice,
    2. the number of thunderstorms per year in Cambridgeshire over a 23-year period.
    3. Without doing any further calculations state what effect, if any, removing the outlier would have on the mean and standard deviation.
CAIE S1 2015 June Q3
6 marks Easy -1.2
\includegraphics{figure_3} In an open-plan office there are 88 computers. The times taken by these 88 computers to access a particular web page are represented in the cumulative frequency diagram.
  1. On graph paper draw a box-and-whisker plot to summarise this information. [4]
An 'outlier' is defined as any data value which is more than 1.5 times the interquartile range above the upper quartile, or more than 1.5 times the interquartile range below the lower quartile.
  2. Show that there are no outliers. [2]
Edexcel S1 2023 June Q3
9 marks Moderate -0.8
Jim records the length, \(l\) mm, of 81 salmon. The data are coded using \(x = l - 600\) and the following summary statistics are obtained. $$n = 81 \quad \sum x = 3711 \quad \sum x^2 = 475181$$
  1. Find the mean length of these salmon. [3]
  2. Find the variance of the lengths of these salmon. [2]
The weight, \(w\) grams, of each of the 81 salmon is recorded to the nearest gram. The recorded results for the 81 salmon are summarised in the box plot below. \includegraphics{figure_2}
  3. Find the maximum number of salmon that have weights in the interval $$4600 < w \leqslant 7700$$ [1]
Raj says that the box plot is incorrect as Jim has not included outliers. For these data an outlier is defined as a value that is more than \(1.5 \times\) IQR above the upper quartile or \(1.5 \times\) IQR below the lower quartile.
  4. Show that there are no outliers. [3]