2.02h Recognize outliers

154 questions

Edexcel AS Paper 2 2021 November Q2
9 marks Moderate -0.3
  1. The partially completed table and partially completed histogram give information about the ages of passengers on an airline.
There were no passengers aged 90 or over.
Age ( \(x\) years)\(0 \leqslant x < 5\)\(5 \leqslant x < 20\)\(20 \leqslant x < 40\)\(40 \leqslant x < 65\)\(65 \leqslant x < 80\)\(80 \leqslant x < 90\)
Frequency545901
\includegraphics[max width=\textwidth, alt={}, center]{6dfefd72-338f-40be-ac37-aef56bfaccaa-04_1173_1792_721_139}
  1. Complete the histogram.
  2. Use linear interpolation to estimate the median age.
    An outlier is defined as a value greater than \(Q _ { 3 } + 1.5 \times\) interquartile range.
    Given that \(Q _ { 1 } = 27.3\) and \(Q _ { 3 } = 58.9\)
  3. determine, giving a reason, whether or not the oldest passenger could be considered as an outlier.
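A minimal Python sketch of the outlier check in the last part of this question. It uses only the quartiles given in the question and the fact that no passenger was aged 90 or over; the variable names are illustrative.

```python
# Quartiles given in the question.
q1, q3 = 27.3, 58.9

iqr = q3 - q1                    # interquartile range
upper_fence = q3 + 1.5 * iqr     # outlier boundary as defined in the question

# "No passengers aged 90 or over" means every age is below 90,
# so 90 is a strict upper bound on the oldest passenger's age.
age_bound = 90
could_be_outlier = age_bound > upper_fence

print(f"upper fence = {upper_fence:.1f}, could be an outlier: {could_be_outlier}")
```

Because the boundary is above 90, the comparison is False for any age the oldest passenger could have.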
Edexcel Paper 3 2020 October Q3
10 marks Moderate -0.3
  1. Each member of a group of 27 people was timed when completing a puzzle.
The time taken, \(x\) minutes, for each member of the group was recorded.
These times are summarised in the following box and whisker plot. \includegraphics[max width=\textwidth, alt={}, center]{2b63aa7f-bc50-4422-8dc0-e661b521c221-08_353_1436_458_319}
  1. Find the range of the times.
  2. Find the interquartile range of the times. For these 27 people \(\sum x = 607.5\) and \(\sum x ^ { 2 } = 17623.25\)
  3. calculate the mean time taken to complete the puzzle,
  4. calculate the standard deviation of the times taken to complete the puzzle. Taruni defines an outlier as a value more than 3 standard deviations above the mean.
  5. State how many outliers Taruni would say there are in these data, giving a reason for your answer.
    Adam and Beth also completed the puzzle in \(a\) minutes and \(b\) minutes respectively, where \(a > b\).
    When their times are included with the data of the other 27 people
    • the median time increases
    • the mean time does not change
  6. Suggest a possible value for \(a\) and a possible value for \(b\), explaining how your values satisfy the above conditions.
  7. Without carrying out any further calculations, explain why the standard deviation of all 29 times will be lower than your answer to part (d).
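The mean, standard deviation and Taruni's outlier boundary can be sketched in Python from the summary statistics alone. This assumes the variance is computed with divisor \(n\) (the usual convention at this level, but an assumption here). Counting the outliers still requires reading the largest time from the box plot, which is not reproduced in the text.

```python
import math

# Summary statistics given in the question.
n = 27
sum_x = 607.5
sum_x2 = 17623.25

mean = sum_x / n
variance = sum_x2 / n - mean ** 2   # assumed convention: divisor n
sd = math.sqrt(variance)

# Taruni's rule: an outlier is more than 3 standard deviations above the mean.
threshold = mean + 3 * sd

print(f"mean = {mean:.1f}, sd = {sd:.2f}, outlier threshold = {threshold:.1f}")
```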
OCR MEI AS Paper 2 2023 June Q6
6 marks Easy -1.8
6 An app on my new smartphone records the number of times in a day I use the phone. The data for each day since I bought the phone are shown in the stem and leaf diagram.
\begin{tabular}{r|l}
1 & 9 \\
2 & 6 \\
3 & 8 9 \\
4 & 0 1 2 2 3 5 6 7 9 9 \\
5 & 1 2 2 2 3 4 5 5 7 8 9 9 \\
6 & 0 1 1 3 9 \\
\end{tabular}
Key: 3|1 means 31
  1. Explain whether these data are a sample or a population.
  2. Describe the shape of the distribution.
  3. Determine the interquartile range.
  4. Use your answer to part (c) to determine whether there are any outliers in the lower tail.
OCR MEI AS Paper 2 2024 June Q10
6 marks Easy -1.2
10 The pre-release material contains information about the birth rate per 1000 people in different countries of the world. These countries have been classified into different regions. The table shows some data for three of these regions: the mean and standard deviation (sd) of the birth rate per 1000, and the number of countries for which data was used, n. \section*{Birth rate per 1000 by region}
\begin{tabular}{lccc}
 & Africa & Europe & Oceania \\
\(n\) & 55 & 49 & 21 \\
mean & 29.3 & 10.0 & 17.8 \\
sd & 8.43 & 1.94 & 4.50 \\
\end{tabular}
  1. Use the information in the table to compare and contrast the birth rate per 1000 in Africa with the birth rate per 1000 in Europe.
  2. The birth rate per 1000 in Mauritius, which is in Africa, is recorded as 9.86. Use the information in the table to show that this value is an outlier.
  3. Use your knowledge of the pre-release material to explain whether the value for Mauritius should be discarded.
  4. The pre-release material identifies 27 countries in Oceania. Suggest a reason why only 21 values were used to calculate the mean and standard deviation.
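A short Python sketch of the check for the Mauritius value. The question does not state an outlier rule, so this assumes the common "more than 2 standard deviations from the mean" rule; treat that rule as an assumption.

```python
# Summary statistics for Africa, taken from the table in the question.
mean_africa = 29.3
sd_africa = 8.43
mauritius = 9.86

# Assumed rule (not stated in the question): outliers lie more than
# 2 standard deviations from the mean.
lower_limit = mean_africa - 2 * sd_africa
upper_limit = mean_africa + 2 * sd_africa
is_outlier = mauritius < lower_limit or mauritius > upper_limit

print(f"limits: {lower_limit:.2f} to {upper_limit:.2f}; outlier: {is_outlier}")
```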
OCR MEI AS Paper 2 2021 November Q7
7 marks Easy -1.2
7 The pre-release material contains information about health expenditure. Fig. 7.1 shows an extract from the data. \begin{table}[h]
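\begin{tabular}{lc}
Country & Health expenditure (\% of GDP) \\
Algeria & 7.2 \\
Egypt & 5.6 \\
Libya & 5 \\
Morocco & 5.9 \\
Sudan & 8.4 \\
Tunisia & 7 \\
Western Sahara & \#N/A \\
Angola & 3.3 \\
Benin & 4.6 \\
Botswana & 5.4 \\
Burkina Faso & 5 \\
\end{tabular}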
\captionsetup{labelformat=empty} \caption{Fig. 7.1}
\end{table}
  1. Explain how the data should be cleaned before any analysis takes place. Kareem uses all the available data to conduct an investigation into health expenditure as a percentage of GDP in different countries. He calculates the mean to be 6.79 and the standard deviation to be 2.78 . Fig. 7.2 shows the smallest values and the largest values of health expenditure as a percentage of GDP. \begin{table}[h]
    \begin{tabular}{cc}
    Smallest values of Health expenditure (\% of GDP) & Largest values of Health expenditure (\% of GDP) \\
    1.5 & 11.7 \\
    1.9 & 11.9 \\
    2.1 & 13.7 \\
     & 13.7 \\
     & 16.5 \\
     & 17.1 \\
     & 17.1 \\
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Fig. 7.2}
    \end{table}
  2. Determine which of these values are outliers. Kareem removes the outliers from the data and finds that there are 187 values left. He decides to collect a sample of size 30 . He uses the following sampling procedure.
    Assign each value a number from 1 to 187. Generate a random number, \(n\), between 1 and 13 . Starting with the \(n\)th value, choose every 6th value after that until 30 values have been chosen.
  3. Explain whether Kareem is using simple random sampling.
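The outlier check for Kareem's data can be sketched in Python as below. As the question does not state a rule, the sketch assumes outliers are values more than 2 standard deviations from the mean. The lists hold the extreme values from Fig. 7.2.

```python
# Mean and standard deviation calculated by Kareem.
mean, sd = 6.79, 2.78

# Extreme values from Fig. 7.2.
smallest = [1.5, 1.9, 2.1]
largest = [11.7, 11.9, 13.7, 13.7, 16.5, 17.1, 17.1]

# Assumed rule (not stated in the question): outliers lie more than
# 2 standard deviations from the mean.
lower_limit = mean - 2 * sd
upper_limit = mean + 2 * sd

outliers = [v for v in smallest + largest if v < lower_limit or v > upper_limit]
print(f"limits: {lower_limit:.2f} to {upper_limit:.2f}; outliers: {outliers}")
```

Only the extreme values need testing, since any value between the smallest and largest non-outliers lies inside the limits.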
OCR MEI AS Paper 2 Specimen Q7
7 marks Easy -1.2
7 A farmer has 200 apple trees. She is investigating the masses of the crops of apples from individual trees. She decides to select a sample of these trees and find the mass of the crop for each tree.
  1. Explain how she can select a random sample of 10 different trees from the 200 trees. The masses of the crops from the 10 trees, measured in kg, are recorded as follows. \(\begin{array} { l l l l l l l l l l } 23.5 & 27.4 & 26.2 & 29.0 & 25.1 & 27.4 & 26.2 & 28.3 & 38.1 & 24.9 \end{array}\)
  2. For these data find
OCR MEI Paper 2 2018 June Q9
5 marks Easy -1.8
9 At the end of each school term at North End College all the science classes in year 10 are given a test. The marks out of 100 achieved by members of set 1 are shown in Fig. 9. \begin{table}[h]
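\begin{tabular}{r|l}
3 & 5 \\
4 & 0 9 \\
5 & 2 3 6 \\
6 & 0 1 3 5 6 \\
7 & 0 1 2 5 6 8 9 9 \\
8 & 3 4 6 6 8 8 9 \\
9 & 5 5 5 6 7 \\
\end{tabular}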
\captionsetup{labelformat=empty} \caption{Fig. 9}
\end{table} Key: \(5 \mid 2\) represents a mark of 52
  1. Describe the shape of the distribution.
  2. The teacher for set 1 claimed that a typical student in his class achieved a mark of 95. How did he justify this statement?
  3. Another teacher said that the average mark in set 1 is 76 . How did she justify this statement? Benson's mark in the test is 35 . If the mark achieved by any student is an outlier in the lower tail of the distribution, the student is moved down to set 2 .
  4. Determine whether Benson is moved down to set 2 .
OCR MEI Paper 2 2024 June Q14
8 marks Moderate -0.8
14 The pre-release material contains medical data for 103 women and 97 men.
The boxplot represents the weights in kg of 101 of the women from the pre-release material. \includegraphics[max width=\textwidth, alt={}, center]{8e48bbd3-2166-49e7-8906-833261f331ca-09_421_1232_735_244}
  1. Use your knowledge of the pre-release material to give a reason why the weights of all 103 women were not included in the diagram.
  2. Determine the range of values in which any outliers lie.
  3. Use your knowledge of the pre-release material to explain whether these outliers should be removed from any further analysis of the data.
  4. The median weight of men in the sample was found to be 79.9 kg . Explain what may be inferred by comparing the median weight of men with the median weight of women. Further analysis of the weights of both men and women is carried out. The table shows some of the results.
    \begin{tabular}{lcc}
     & mean & standard deviation \\
    men & 82.69 kg & 19.98 kg \\
    women & 72.5 kg & 19.95 kg \\
    \end{tabular}
  5. Use the information in the table to make two inferences about the distribution of the weights of men compared with the distribution of the weights of women.
OCR MEI Paper 2 2021 November Q10
9 marks Moderate -0.8
10 Ben has an interest in birdwatching. For many years he has identified, at the start of the year, 32 days on which he will spend an hour counting the number of birds he sees in his garden. He divides the year into four using the Meteorological Office definition of seasons. Each year he uses stratified sampling to identify the 32 days on which he will count the birds in his garden, drawn equally from the four seasons. Ben's data for 2019 are shown in the stem and leaf diagram in Fig. 10.1. \begin{table}[h]
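\begin{tabular}{r|l}
0 & 3 5 9 9 9 \\
1 & 0 0 1 1 2 4 5 6 7 8 9 \\
2 & 0 1 4 6 7 8 9 \\
3 & 0 0 2 3 \\
4 & 0 3 6 \\
5 & 1 \\
6 & 0 \\
\end{tabular}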
\captionsetup{labelformat=empty} \caption{Fig. 10.1}
\end{table}
  1. Suggest a reason why Ben chose to use stratified sampling instead of simple random sampling.
  2. Describe the shape of the distribution.
  3. Explain why the mode is not a useful measure of central tendency in this case.
  4. For Ben's sample, determine
    Ben found a boxplot for the sample of size 32 he collected using stratified sampling in 2015. The boxplot is shown in Fig. 10.2. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{c9d14a4d-a1c8-42ad-9c0b-42cef6b3612f-06_483_1163_1982_242} \captionsetup{labelformat=empty} \caption{Fig. 10.2}
    \end{figure} In 2016 Ben replaced his hedge with a garden fence.
    Ben now believes that
    Jane says she can tell that the data for 2015 is definitely uniformly distributed by looking at the boxplot.
  5. Explain why Jane is wrong.
Edexcel S1 2018 June Q2
11 marks Easy -1.3
2. Two youth clubs, Eastyou and Westyou, decided to raise money for charity by running a 5 km race. All the members of the youth clubs took part and the time, in minutes, taken for each member to run the 5 km was recorded. The times for the Westyou members are summarised in Figure 1. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{b115bffa-1190-4a2b-b6f2-b006580e8dbd-06_349_1378_497_274} \captionsetup{labelformat=empty} \caption{Figure 1}
\end{figure}
  1. Write down the time that is exceeded by \(75 \%\) of Westyou members. The times for the Eastyou members are summarised by the stem and leaf diagram below.
    \begin{tabular}{r|l r}
    Stem & Leaf & \\
    2 & 0 2 3 4 & (4) \\
    2 & 5 6 8 8 8 9 9 & \\
    3 & 0 0 0 0 0 1 1 1 2 2 2 2 3 4 & (14) \\
    3 & 5 5 5 7 9 & (5) \\
    \end{tabular}
    Key: 2|0 means 20 minutes
  2. Find the value of the median and interquartile range for the Eastyou members.
    An outlier is a value that falls either
    $$\begin{aligned} & \text { more than } 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \text { above } Q _ { 3 } \\ & \text { or more than } 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \text { below } Q _ { 1 } \end{aligned}$$
  3. On the grid on page 7, draw a box plot to represent the times of the Eastyou members.
  4. State the skewness of each distribution. Give reasons for your answers.
    \includegraphics[max width=\textwidth, alt={}]{b115bffa-1190-4a2b-b6f2-b006580e8dbd-06_2255_50_314_1976}
    \includegraphics[max width=\textwidth, alt={}, center]{b115bffa-1190-4a2b-b6f2-b006580e8dbd-07_406_1390_2224_262} Turn over for a spare grid if you need to redraw your box plot. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Only use this grid if you need to redraw your box plot.} \includegraphics[alt={},max width=\textwidth]{b115bffa-1190-4a2b-b6f2-b006580e8dbd-09_401_1399_2261_258}
    \end{figure}
Edexcel S1 2019 June Q2
13 marks Easy -1.2
2. Chi wanted to summarise the scores of the 39 competitors in a village quiz. He started to produce the following stem and leaf diagram Key: 2|5 is a score of 25 \begin{table}[h]
\captionsetup{labelformat=empty} \caption{Score}
\begin{tabular}{r|l}
1 & 1 5 8 9 \\
2 & 0 2 5 8 9 \\
3 & 3 5 5 7 8 9 \(\ldots\) \\
\end{tabular}
\end{table} He did not complete the stem and leaf diagram but instead produced the following box plot. \includegraphics[max width=\textwidth, alt={}, center]{9ac7647f-b291-4a64-9518-fa6438a0cc7d-04_357_1237_772_356} Chi defined an outlier as a value that is $$\text { greater than } Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$ or
less than \(Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\)
  1. Find
    1. the interquartile range
    2. the range.
  2. Describe, giving a reason, the skewness of the distribution of scores. Albert and Beth asked for their scores to be checked.
    Albert's score was changed from 25 to 37
    Beth's score was changed from 54 to 60
  3. On the grid on page 5, draw an updated box plot. Show clearly any calculations that you used. Some of the competitors complained that the questions were biased towards the younger generation. The product moment correlation coefficient between the age of the competitors and their score in the quiz is - 0.187
  4. State, giving a reason, whether or not the complaint is supported by this statistic. \includegraphics[max width=\textwidth, alt={}, center]{9ac7647f-b291-4a64-9518-fa6438a0cc7d-05_360_1242_2238_351} Turn over for a spare grid if you need to redraw your box plot. \includegraphics[max width=\textwidth, alt={}, center]{9ac7647f-b291-4a64-9518-fa6438a0cc7d-07_367_1246_2261_351}
Edexcel S1 2020 June Q4
14 marks Moderate -0.3
4. A group of students took some tests. A teacher is analysing the average mark for each student. Each student obtained a different average mark. For these average marks, the lower quartile is 24 , the median is 30 and the interquartile range (IQR) is 10
The three lowest average marks are 8, 10 and 15.5 and the three highest average marks are 45, 52.5 and 56 The teacher defines an outlier to be a value that is either
more than \(1.5 \times\) IQR below the lower quartile or
more than \(1.5 \times\) IQR above the upper quartile
  1. Determine any outliers in these data.
  2. On the grid below draw a box plot for these data, indicating clearly any outliers. \includegraphics[max width=\textwidth, alt={}, center]{81d5e460-9559-4d25-aa08-6440559aec83-12_350_1223_1128_370}
  3. Use the quartiles to describe the skewness of these data. Give a reason for your answer. Two more students also took the tests. Their average marks, which were both less than 45, are added to the data and the box plot redrawn. The median and the upper quartile are the same but the lower quartile is now 26
  4. Redraw the box plot on the grid below.
    (3) \includegraphics[max width=\textwidth, alt={}, center]{81d5e460-9559-4d25-aa08-6440559aec83-12_350_1221_2106_367}
  5. Give ranges of values within which each of these students' average marks must lie. Turn over for spare grids if you need to redraw your answer for part (b) or part (d).
    \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Copy of grid for part (b)} \includegraphics[alt={},max width=\textwidth]{81d5e460-9559-4d25-aa08-6440559aec83-15_356_1226_1726_367}
    \end{figure} \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Copy of grid for part (d)} \includegraphics[alt={},max width=\textwidth]{81d5e460-9559-4d25-aa08-6440559aec83-15_353_1226_2240_367}
    \end{figure}
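For part (a) of this question, the outlier limits follow directly from the lower quartile and IQR given. A minimal Python sketch, with illustrative variable names:

```python
# Values given in the question.
q1 = 24
iqr = 10
q3 = q1 + iqr                    # upper quartile = lower quartile + IQR

lower_fence = q1 - 1.5 * iqr     # more than 1.5 x IQR below the lower quartile
upper_fence = q3 + 1.5 * iqr     # more than 1.5 x IQR above the upper quartile

# Only the three lowest and three highest marks can be outliers.
extremes = [8, 10, 15.5, 45, 52.5, 56]
outliers = [v for v in extremes if v < lower_fence or v > upper_fence]

print(f"fences: {lower_fence} and {upper_fence}; outliers: {outliers}")
```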
Edexcel S1 2022 June Q1
11 marks Easy -1.2
  1. The company Seafield requires contractors to record the number of hours they work each week. A random sample of 38 weeks is taken and the number of hours worked per week by contractor Kiana is summarised in the stem and leaf diagram below.
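\begin{tabular}{r|l r}
Stem & Leaf & \\
1 & 4 4 4 5 5 5 6 6 9 9 9 & (11) \\
2 & 1 2 2 3 3 4 4 4 \(w\) 9 & (10) \\
3 & 2 3 4 4 5 6 7 7 7 9 & (10) \\
4 & 1 1 2 3 & (4) \\
5 & 1 9 & (2) \\
6 & 4 & (1) \\
\end{tabular}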
Key : 3|2 means 32 The quartiles for this distribution are summarised in the table below.
\begin{tabular}{ccc}
\(Q _ { 1 }\) & \(Q _ { 2 }\) & \(Q _ { 3 }\) \\
\(x\) & 26 & \(y\) \\
\end{tabular}
  1. Find the values of \(w , x\) and \(y\) Kiana is looking for outliers in the data. She decides to classify as outliers any observations greater than $$Q _ { 3 } + 1.0 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$
  2. Showing your working clearly, identify any outliers that Kiana finds.
  3. Draw a box plot for these data in the space provided on the grid opposite.
  4. Use the formula $$\text { skewness } = \frac { \left( Q _ { 3 } - Q _ { 2 } \right) - \left( Q _ { 2 } - Q _ { 1 } \right) } { \left( Q _ { 3 } - Q _ { 1 } \right) }$$ to find the skewness of these data. Give your answer to 2 significant figures. Kiana's new employer, Landacre, wishes to know the average number of hours per week she worked during her employment at Seafield to help calculate the cost of employing her.
  5. Explain why Landacre might prefer to know Kiana's mean, rather than median, number of hours worked per week. Turn over for a spare grid if you need to redraw your box plot.
Edexcel S1 2024 June Q1
13 marks Easy -1.2
  1. A researcher is investigating the growth of two types of tree, Birch and Maple. The height, to the nearest cm, a seedling grows in one year is recorded for 35 Birch trees and 32 Maple trees. The results are summarised in the back-to-back stem and leaf diagram below.
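\begin{tabular}{r r|c|l l}
Totals & Birch & & Maple & Totals \\
(2) & 9 8 & 2 & 5 7 7 8 9 & (5) \\
(8) & 9 9 9 6 5 3 1 1 & 3 & 0 2 6 6 8 9 9 & (7) \\
(9) & 9 8 8 7 6 3 1 1 1 & 4 & 1 1 1 \(\boldsymbol { k }\) 7 8 & (6) \\
(9) & 7 7 7 5 4 3 2 1 0 & 5 & 0 1 2 3 4 4 4 & (7) \\
(3) & 7 6 5 & 6 & 3 4 6 & (3) \\
(3) & 6 5 4 & 7 & 0 7 & (2) \\
(1) & 5 & 8 & 0 0 & (2) \\
\end{tabular}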
Key: 5 | 6 | 3 means 65 cm for a Birch tree and 63 cm for a Maple tree
The median height that these Maple trees grow in one year is 45 cm .
  1. Find the value of \(\boldsymbol { k }\), used in the stem and leaf diagram.
  2. Find the lower quartile and the upper quartile of the height grown in one year for these Birch trees. The researcher defines an outlier as an observation that is $$\text { greater than } Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \text { or less than } Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$
  3. Show that there is only one outlier amongst the Birch trees. The grid on page 3 shows a box plot for the heights that the Maple trees grow in one year.
  4. On the same grid draw a box plot for the heights that the Birch trees grow in one year.
  5. Comment on any difference in the distributions of the growth of these Birch trees and the growth of these Maple trees.
    State the values of any statistics you have used to support your comment. The researcher realises he has missed out 4 pieces of data for the Maple trees. The heights each seedling grows in one year, to the nearest cm, in ascending order, for these 4 Maple trees are \(27 \mathrm {~cm} , a \mathrm {~cm} , 48 \mathrm {~cm} , 2 a \mathrm {~cm}\). Given that there is no change to the box plot for the Maple trees given on page 3
  6. find the range of possible values for \(a\) Show your working clearly.
    \includegraphics[max width=\textwidth, alt={}]{ee0c7c12-84f3-479c-b36a-3357f8529a1c-03_1243_1659_1464_210}
    Only use this grid if you need to redraw your answer for part (d) \includegraphics[max width=\textwidth, alt={}, center]{ee0c7c12-84f3-479c-b36a-3357f8529a1c-05_1154_1643_1503_217}
    (Total for Question 1 is 13 marks)
Edexcel S1 2016 October Q6
17 marks Easy -1.2
  1. The stem and leaf diagram gives the blood pressure, \(x \mathrm { mmHg }\), for a random sample of 19 female patients.
\begin{tabular}{r|l}
10 & 1 2 \\
11 & 2 7 7 8 8 \\
12 & 0 2 2 3 4 4 5 5 7 \\
13 & 1 2 9 \\
\end{tabular}
Key: 10 | 1 means blood pressure of 101 mmHg
  1. Find the median and the quartiles for these data.
  2. Find the interquartile range ( \(Q _ { 3 } - Q _ { 1 }\) ) An outlier is a value that is greater than \(Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\) or less than \(Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\)
  3. Showing your working clearly, identify any outliers for these data.
  4. On the grid on page 21 draw a box and whisker plot to represent these data. Show any outliers clearly. The above data can be summarised by $$\sum x = 2299 \text { and } \sum x ^ { 2 } = 279709$$
  5. Calculate the mean and the standard deviation for these data. For a random sample taken from a normal distribution, a rule for determining outliers is: an outlier is more than \(2.7 \times\) standard deviation above or below the mean.
  6. Find the limits to determine outliers using this rule.
  7. State, giving a reason based on some of the above calculations, whether or not a normal distribution is a suitable model for these data. \includegraphics[max width=\textwidth, alt={}, center]{8ff7539e-fa44-4388-af8c-80656f081528-21_2281_73_308_15}
    Turn over for a spare diagram if you need to redraw your plot.
    \includegraphics[max width=\textwidth, alt={}]{8ff7539e-fa44-4388-af8c-80656f081528-24_2639_1830_121_121}
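Parts (e) and (f) of this question can be sketched in Python from the given sums. The sketch assumes the standard deviation uses divisor \(n\), which is an assumption about convention rather than something stated in the question.

```python
import math

# Summary statistics given in the question (19 patients).
n = 19
sum_x = 2299
sum_x2 = 279709

mean = sum_x / n
sd = math.sqrt(sum_x2 / n - mean ** 2)   # assumed convention: divisor n

# Rule for a normal sample: outliers lie more than 2.7 standard deviations
# above or below the mean.
lower_limit = mean - 2.7 * sd
upper_limit = mean + 2.7 * sd

print(f"mean = {mean:.0f}, sd = {sd:.2f}")
print(f"limits: {lower_limit:.1f} to {upper_limit:.1f}")
```

Comparing these limits with the limits from the quartile rule is one way to approach part (g).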
Edexcel S1 2018 October Q2
11 marks Easy -1.3
  1. The weights, to the nearest kilogram, of a sample of 33 female spotted hyenas living in the Serengeti are summarised in the stem and leaf diagram below.
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Weight (kg)}
\begin{tabular}{r|l}
3 & 2 3 7 \\
4 & 1 3 3 4 5 5 6 9 \\
5 & 1 2 2 3 4 4 5 5 5 7 8 8 9 9 9 \\
6 & 2 3 3 \\
7 & 1 4 7 \\
8 & 4 \\
\end{tabular}
\end{table} Totals
  1. Find the median and quartiles for the weights of the female spotted hyenas. An outlier is defined as any value greater than \(c\) or any value less than \(d\) where $$\begin{aligned} & c = Q _ { 3 } + 1.5 \left( Q _ { 3 } - Q _ { 1 } \right) \\ & d = Q _ { 1 } - 1.5 \left( Q _ { 3 } - Q _ { 1 } \right) \end{aligned}$$
  2. Showing your working clearly, identify any outliers for these data.
    (3) The weights, to the nearest kilogram, of a sample of male spotted hyenas living in the Serengeti are summarised below. \includegraphics[max width=\textwidth, alt={}, center]{0377c6e9-ab4f-477d-9236-0732fe81f25e-06_755_1568_1537_185}
  3. In the space provided in the grid above, draw a box and whisker plot to represent the weights of female spotted hyenas living in the Serengeti. Indicate clearly any outliers. (A copy of this grid is on page 9 if you need to redraw your box and whisker plot.)
  4. Compare the weights of male and female spotted hyenas living in the Serengeti. Key: 3|2 means 32
    \includegraphics[max width=\textwidth, alt={}, center]{0377c6e9-ab4f-477d-9236-0732fe81f25e-09_2658_101_107_9}
Edexcel S1 2023 October Q2
13 marks Easy -1.2
  1. The weights, to the nearest kilogram, of a sample of 33 red kangaroos taken in December are summarised in the stem and leaf diagram below.
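\begin{tabular}{r|l r}
\multicolumn{2}{l}{Weight (kg)} & Totals \\
1 & 6 & (1) \\
2 & 3 6 & (2) \\
3 & 2 4 6 & (3) \\
4 & 2 5 5 6 6 7 8 & (7) \\
5 & 3 4 7 7 7 8 9 9 & (8) \\
6 & 0 2 2 3 3 8 & (7) \\
7 & 2 8 & (2) \\
8 & 2 6 & (2) \\
9 & 4 & (1) \\
\end{tabular}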
Key: 3 | 2 represents 32 kg
  1. Find
    1. the value of the median
    2. the value of \(Q _ { 1 }\) and the value of \(Q _ { 3 }\) for the weights of these red kangaroos. For these data an outlier is defined as a value that is
      greater than \(Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\) or smaller than \(Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\)
  2. Show that there are 2 outliers for these data. Figure 1 on page 7 shows a box plot for the weights of the same 33 red kangaroos taken in February, earlier in the year.
  3. In the space on Figure 1, draw a box plot to represent the weights of these red kangaroos in December.
  4. Compare the distribution of the weights of red kangaroos taken in February with the distribution of the weights of red kangaroos taken in December of the same year. You should interpret your comparisons in the context of the question.
    \includegraphics[max width=\textwidth, alt={}]{f94b29e0-081f-45e8-99a7-ac835eec91e5-07_2267_51_307_36}
    \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{f94b29e0-081f-45e8-99a7-ac835eec91e5-07_766_1803_1777_132} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure} Turn over for a spare grid if you need to redraw your box plot. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{f94b29e0-081f-45e8-99a7-ac835eec91e5-09_901_1833_1653_114} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure} (Total for Question 2 is 13 marks)
Edexcel S1 2018 Specimen Q4
12 marks Moderate -0.8
  1. A researcher recorded the time, \(t\) minutes, spent using a mobile phone during a particular afternoon, for each child in a club.
The researcher coded the data using \(v = \frac { t - 5 } { 10 }\) and the results are summarised in the table below.
\begin{tabular}{ccc}
Coded Time (\(v\)) & Frequency (\(\boldsymbol { f }\)) & Coded Time Midpoint (\(m\)) \\
\(0 \leqslant v < 5\) & 20 & 2.5 \\
\(5 \leqslant v < 10\) & 24 & \(a\) \\
\(10 \leqslant v < 15\) & 16 & 12.5 \\
\(15 \leqslant v < 20\) & 14 & 17.5 \\
\(20 \leqslant v < 30\) & 6 & \(b\) \\
\end{tabular}
$$\text { (You may use } \sum f m = 825 \text { and } \sum f m ^ { 2 } = 12012.5 \text { ) }$$
  1. Write down the value of \(a\) and the value of \(b\).
  2. Calculate an estimate of the mean of \(v\).
  3. Calculate an estimate of the standard deviation of \(v\).
  4. Use linear interpolation to estimate the median of \(v\).
  5. Hence describe the skewness of the distribution. Give a reason for your answer.
  6. Calculate estimates of the mean and the standard deviation of the time spent using a mobile phone during the afternoon by the children in this club.
Edexcel S1 Specimen Q5
14 marks Moderate -0.3
  1. A teacher selects a random sample of 56 students and records, to the nearest hour, the time spent watching television in a particular week.
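\begin{tabular}{l*{6}{c}}
Hours & \(1 - 10\) & \(11 - 20\) & \(21 - 25\) & \(26 - 30\) & \(31 - 40\) & \(41 - 59\) \\
Frequency & 6 & 15 & 11 & 13 & 8 & 3 \\
Mid-point & 5.5 & 15.5 & & 28 & & 50 \\
\end{tabular}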
  1. Find the mid-points of the 21-25 hour and 31-40 hour groups. A histogram was drawn to represent these data. The 11-20 group was represented by a bar of width 4 cm and height 6 cm .
  2. Find the width and height of the 26-30 group.
  3. Estimate the mean and standard deviation of the time spent watching television by these students.
  4. Use linear interpolation to estimate the median length of time spent watching television by these students. The teacher estimated the lower quartile and the upper quartile of the time spent watching television to be 15.8 and 29.3 respectively.
  5. State, giving a reason, the skewness of these data.
Edexcel S1 Specimen Q7
12 marks Moderate -0.3
  1. The distances travelled to work, \(D \mathrm {~km}\), by the employees at a large company are normally distributed with \(D \sim \mathrm {~N} \left( 30,8 ^ { 2 } \right)\).
    1. Find the probability that a randomly selected employee has a journey to work of more than 20 km .
    2. Find the upper quartile, \(Q _ { 3 }\), of \(D\).
    3. Write down the lower quartile, \(Q _ { 1 }\), of \(D\).
    An outlier is defined as any value of \(D\) such that \(D < h\) or \(D > k\) where $$h = Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \quad \text { and } \quad k = Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$
  2. Find the value of \(h\) and the value of \(k\). An employee is selected at random.
  3. Find the probability that the distance travelled to work by this employee is an outlier.
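A Python sketch of this normal-distribution question using the standard library's `statistics.NormalDist`. It works with exact quantiles, so answers obtained from printed tables (for example with \(z = 0.67\) for the quartiles) may differ slightly in the later decimal places.

```python
from statistics import NormalDist

# D ~ N(30, 8^2): mean 30 km, standard deviation 8 km.
d = NormalDist(mu=30, sigma=8)

p_more_than_20 = 1 - d.cdf(20)   # P(D > 20)

q3 = d.inv_cdf(0.75)             # upper quartile
q1 = d.inv_cdf(0.25)             # lower quartile (symmetric about the mean)
iqr = q3 - q1

h = q1 - 1.5 * iqr               # lower outlier limit
k = q3 + 1.5 * iqr               # upper outlier limit

# An outlier is any value with D < h or D > k.
p_outlier = d.cdf(h) + (1 - d.cdf(k))

print(f"P(D > 20) = {p_more_than_20:.4f}")
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, h = {h:.2f}, k = {k:.2f}")
print(f"P(outlier) = {p_outlier:.4f}")
```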
Edexcel S1 2001 January Q1
7 marks Easy -1.3
  1. The students in a class were each asked to write down how many CDs they owned. The student with the least number of CDs had 14 and all but one of the others owned 60 or fewer. The remaining student owned 65 . The quartiles for the class were 30,34 and 42 respectively.
Outliers are defined to be any values outside the limits of \(1.5 \left( \mathrm { Q } _ { 3 } - \mathrm { Q } _ { 1 } \right)\) below the lower quartile or above the upper quartile. On graph paper draw a box plot to represent these data, indicating clearly any outliers.
Edexcel S1 2001 January Q5
17 marks Moderate -0.3
5. The following grouped frequency distribution summarises the number of minutes, to the nearest minute, that a random sample of 200 motorists were delayed by roadworks on a stretch of motorway.
\begin{tabular}{cc}
Delay (mins) & Number of motorists \\
\(4 - 6\) & 15 \\
\(7 - 8\) & 28 \\
9 & 49 \\
10 & 53 \\
\(11 - 12\) & 30 \\
\(13 - 15\) & 15 \\
\(16 - 20\) & 10 \\
\end{tabular}
  1. Using graph paper represent these data by a histogram.
  2. Give a reason to justify the use of a histogram to represent these data.
  3. Use interpolation to estimate the median of this distribution.
  4. Calculate an estimate of the mean and an estimate of the standard deviation of these data. One coefficient of skewness is given by $$\frac { 3 ( \text { mean - median } ) } { \text { standard deviation } } .$$
  5. Evaluate this coefficient for the above data.
  6. Explain why the normal distribution may not be suitable to model the number of minutes that motorists are delayed by these roadworks.
Edexcel S1 2003 January Q4
16 marks Easy -1.2
4. A restaurant owner is concerned about the amount of time customers have to wait before being served. He collects data on the waiting times, to the nearest minute, of 20 customers. These data are listed below.
15,14,16,15,17,16,15,14,15,16,
17,16,15,14,16,17,15,25,18,16
  1. Find the median and inter-quartile range of the waiting times. An outlier is an observation that falls either \(1.5 \times\) (inter-quartile range) above the upper quartile or \(1.5 \times\) (inter-quartile range) below the lower quartile.
  2. Draw a boxplot to represent these data, clearly indicating any outliers.
  3. Find the mean of these data.
  4. Comment on the skewness of these data. Justify your answer.
Edexcel S1 2008 January Q2
14 marks Easy -1.3
2. Cotinine is a chemical that is made by the body from nicotine which is found in cigarette smoke. A doctor tested the blood of 12 patients, who claimed to smoke a packet of cigarettes a day, for cotinine. The results, in appropriate units, are shown below.
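\begin{tabular}{l*{12}{c}}
Patient & \(A\) & \(B\) & \(C\) & \(D\) & \(E\) & \(F\) & \(G\) & \(H\) & \(I\) & \(J\) & \(K\) & \(L\) \\
Cotinine level, \(X\) & 160 & 390 & 169 & 175 & 125 & 420 & 171 & 250 & 210 & 258 & 186 & 243 \\
\end{tabular}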
$$\text { [You may use } \sum x ^ { 2 } = 724 \text { 961] }$$
  1. Find the mean and standard deviation of the level of cotinine in a patient's blood.
  2. Find the median, upper and lower quartiles of these data. A doctor suspects that some of his patients have been smoking more than a packet of cigarettes per day. He decides to use \(\mathrm { Q } _ { 3 } + 1.5 \left( \mathrm { Q } _ { 3 } - \mathrm { Q } _ { 1 } \right)\) to determine if any of the cotinine results are far enough away from the upper quartile to be outliers.
  3. Identify which patient(s) may have been smoking more than a packet of cigarettes a day. Show your working clearly. Research suggests that cotinine levels in the blood form a skewed distribution.
    One measure of skewness is found using \(\frac { \left( Q _ { 1 } - 2 Q _ { 2 } + Q _ { 3 } \right) } { \left( Q _ { 3 } - Q _ { 1 } \right) }\).
  4. Evaluate this measure and describe the skewness of these data.
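The calculations in this question can be sketched in Python as follows. The helper `quartile` is hypothetical and encodes one common convention: if \(np\) is a whole number, average that value and the next; otherwise round up. Other conventions give slightly different quartiles. The standard deviation is assumed to use divisor \(n\).

```python
import math

patients = "ABCDEFGHIJKL"
levels = [160, 390, 169, 175, 125, 420, 171, 250, 210, 258, 186, 243]

n = len(levels)
mean = sum(levels) / n
sd = math.sqrt(724961 / n - mean ** 2)   # uses the given sum of squares, divisor n


def quartile(sorted_values, p):
    """Quartile by the n*p rule (an assumed convention, see lead-in)."""
    pos = len(sorted_values) * p
    if pos == int(pos):
        i = int(pos)
        # Whole number: average the i-th and (i+1)-th ordered values.
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    # Otherwise round up to the next ordered value.
    return sorted_values[math.ceil(pos) - 1]


ordered = sorted(levels)
q1, q2, q3 = (quartile(ordered, p) for p in (0.25, 0.5, 0.75))

upper_limit = q3 + 1.5 * (q3 - q1)
flagged = [name for name, x in zip(patients, levels) if x > upper_limit]

skewness = (q1 - 2 * q2 + q3) / (q3 - q1)

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, upper limit = {upper_limit}")
print(f"patients above the limit: {flagged}, skewness = {skewness:.3f}")
```

A positive value of the skewness measure corresponds to positive skew.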
Edexcel S1 2009 January Q4
14 marks Moderate -0.8
4. In a study of how students use their mobile telephones, the phone usage of a random sample of 11 students was examined for a particular week. The total length of calls, \(y\) minutes, for the 11 students were $$17,23,35,36,51,53,54,55,60,77,110$$
  1. Find the median and quartiles for these data. A value that is greater than \(Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\) or smaller than \(Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\) is defined as an outlier.
  2. Show that 110 is the only outlier.
  3. Using the graph paper on page 15 draw a box plot for these data indicating clearly the position of the outlier. The value of 110 is omitted.
  4. Show that \(S _ { y y }\) for the remaining 10 students is 2966.9 These 10 students were each asked how many text messages, \(x\), they sent in the same week. The values of \(S _ { x x }\) and \(S _ { x y }\) for these 10 students are \(S _ { x x } = 3463.6\) and \(S _ { x y } = - 18.3\).
  5. Calculate the product moment correlation coefficient between the number of text messages sent and the total length of calls for these 10 students. A parent believes that a student who sends a large number of text messages will spend fewer minutes on calls.
  6. Comment on this belief in the light of your calculation in part (e). \includegraphics[max width=\textwidth, alt={}, center]{d5d000c7-de42-461a-ba05-6c8b2c333780-09_611_1593_297_178}
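Parts (d) and (e) of this question can be checked with a short Python sketch. It recomputes \(S_{yy}\) from the ten remaining call lengths and then evaluates the product moment correlation coefficient \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\) using the values of \(S_{xx}\) and \(S_{xy}\) given in the question.

```python
import math

# Call lengths for the 10 students once the outlier 110 is omitted.
y = [17, 23, 35, 36, 51, 53, 54, 55, 60, 77]
n = len(y)

# S_yy = sum(y^2) - (sum(y))^2 / n
s_yy = sum(v * v for v in y) - sum(y) ** 2 / n

# Values given in the question.
s_xx = 3463.6
s_xy = -18.3

# Product moment correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"S_yy = {s_yy:.1f}, r = {r:.5f}")
```

A value of \(r\) this close to zero indicates almost no linear association between the two variables.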