2.02f Measures of average and spread

447 questions

OCR MEI Paper 2 2023 June Q8
6 marks Easy -1.2
8 A garden centre stocks coniferous hedging plants. These are displayed in 10 rows, each of 120 plants. An employee collects a sample of the heights of these plants by recording the height of each plant on the front row of the display.
  1. Explain whether the data collected by the employee is a simple random sample. The data are shown in the cumulative frequency curve below. \includegraphics[max width=\textwidth, alt={}, center]{11788aaf-98fb-4a78-8a40-a40743b1fe15-06_1376_1344_680_233} The owner states that at least \(75 \%\) of the plants are between 40 cm and 80 cm tall.
  2. Show that the data collected by the employee supports this statement.
  3. Explain whether all samples of 120 plants would necessarily support the owner's statement.
OCR MEI Paper 2 2023 June Q18
11 marks Standard +0.3
18 Riley is investigating the daily water consumption, in litres, of his household.
He records the amount used for a random sample of 120 days from the previous twelve-month period. The daily water consumption, in litres, is denoted by \(x\). Summary statistics for Riley's sample are given below. \(\sum x = 31164.7 \quad \sum x ^ { 2 } = 8101050.91 \quad n = 120\)
  1. Calculate the sample mean giving your answer correct to \(\mathbf { 3 }\) significant figures. Riley displays the data in a histogram. \includegraphics[max width=\textwidth, alt={}, center]{11788aaf-98fb-4a78-8a40-a40743b1fe15-13_832_1383_934_242}
  2. Find the number of days on which between 255 and 260 litres were used.
  3. Give two reasons why a Normal distribution may be an appropriate model for the daily consumption of water. Riley uses the sample mean and the sample variance, both correct to \(\mathbf { 3 }\) significant figures, as parameters of a Normal distribution to model the daily consumption of water.
  4. Use Riley's model to calculate the probability that on a randomly chosen day the household uses less than 255 litres of water.
  5. Calculate the probability that the household uses less than 255 litres of water on at least 5 days out of a random sample of 28 days. The company which supplies the water makes charges relating to water consumption which are shown in the table below.
    \begin{tabular}{lr}
    Standing charge per day in pence & 7.8 \\
    Charge per litre in pence & 0.18 \\
    \end{tabular}
  6. Adapt Riley's model for daily water consumption to model the daily charges for water consumption.
OCR MEI Paper 2 2024 June Q14
8 marks Moderate -0.8
14 The pre-release material contains medical data for 103 women and 97 men.
The boxplot represents the weights in kg of 101 of the women from the pre-release material. \includegraphics[max width=\textwidth, alt={}, center]{8e48bbd3-2166-49e7-8906-833261f331ca-09_421_1232_735_244}
  1. Use your knowledge of the pre-release material to give a reason why the weights of all 103 women were not included in the diagram.
  2. Determine the range of values in which any outliers lie.
  3. Use your knowledge of the pre-release material to explain whether these outliers should be removed from any further analysis of the data.
  4. The median weight of men in the sample was found to be 79.9 kg . Explain what may be inferred by comparing the median weight of men with the median weight of women. Further analysis of the weights of both men and women is carried out. The table shows some of the results.
    \begin{tabular}{lcc}
     & mean & standard deviation \\
    men & 82.69 kg & 19.98 kg \\
    women & 72.5 kg & 19.95 kg \\
    \end{tabular}
  5. Use the information in the table to make two inferences about the distribution of the weights of men compared with the distribution of the weights of women.
OCR MEI Paper 2 2021 November Q10
9 marks Moderate -0.8
10 Ben has an interest in birdwatching. For many years he has identified, at the start of the year, 32 days on which he will spend an hour counting the number of birds he sees in his garden. He divides the year into four using the Meteorological Office definition of seasons. Each year he uses stratified sampling to identify the 32 days on which he will count the birds in his garden, drawn equally from the four seasons. Ben's data for 2019 are shown in the stem and leaf diagram in Fig. 10.1. \begin{table}[h]
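\begin{tabular}{r|l}
0 & 3 5 9 9 9 \\
1 & 0 0 1 1 2 4 5 6 7 8 9 \\
2 & 0 1 4 6 7 8 9 \\
3 & 0 0 2 3 \\
4 & 0 3 6 \\
5 & 1 \\
6 & 0 \\
\end{tabular}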
\captionsetup{labelformat=empty} \caption{Fig. 10.1}
\end{table}
  1. Suggest a reason why Ben chose to use stratified sampling instead of simple random sampling.
  2. Describe the shape of the distribution.
  3. Explain why the mode is not a useful measure of central tendency in this case.
  4. For Ben's sample, determine
    Ben found a boxplot for the sample of size 32 he collected using stratified sampling in 2015. The boxplot is shown in Fig. 10.2. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{c9d14a4d-a1c8-42ad-9c0b-42cef6b3612f-06_483_1163_1982_242} \captionsetup{labelformat=empty} \caption{Fig. 10.2}
    \end{figure} In 2016 Ben replaced his hedge with a garden fence.
    Ben now believes that
    Jane says she can tell that the data for 2015 is definitely uniformly distributed by looking at the boxplot.
  5. Explain why Jane is wrong.
AQA Further AS Paper 2 Statistics 2021 June Q5
5 marks Easy -1.2
5 In a game it is known that:
  • 25\% of players score 0
  • 30\% of players score 5
  • 35\% of players score 10
  • 10\% of players score 20
Players receive prize money, in pounds, equal to 100 times their score.
  1. State the modal score. [1 mark]
  2. Find the median score.
  3. Find the mean prize money received by a player.
Edexcel S1 2016 June Q4
12 marks Moderate -0.8
4. A researcher recorded the time, \(t\) minutes, spent using a mobile phone during a particular afternoon, for each child in a club. The researcher coded the data using \(v = \frac { t - 5 } { 10 }\) and the results are summarised in the table below.
\begin{tabular}{ccc}
Coded Time (\(v\)) & Frequency (\(\boldsymbol { f }\)) & Coded Time Midpoint (\(m\)) \\
\(0 \leqslant v < 5\) & 20 & 2.5 \\
\(5 \leqslant v < 10\) & 24 & \(a\) \\
\(10 \leqslant v < 15\) & 16 & 12.5 \\
\(15 \leqslant v < 20\) & 14 & 17.5 \\
\(20 \leqslant v < 30\) & 6 & \(b\) \\
\end{tabular}
$$\text { (You may use } \sum f m = 825 \text { and } \sum f m ^ { 2 } = 12012.5 \text { ) }$$
  1. Write down the value of \(a\) and the value of \(b\).
  2. Calculate an estimate of the mean of \(v\).
  3. Calculate an estimate of the standard deviation of \(v\).
  4. Use linear interpolation to estimate the median of \(v\).
  5. Hence describe the skewness of the distribution. Give a reason for your answer.
  6. Calculate estimates of the mean and the standard deviation of the time spent using a mobile phone during the afternoon by the children in this club.
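Part 6 relies on how a linear code behaves: since \(v = \frac { t - 5 } { 10 }\) gives \(t = 10 v + 5\), the mean is scaled and shifted while the standard deviation is only scaled. The following is a minimal sketch of that idea, not a model answer. It assumes the quoted sums, a total frequency of 80 read from the table, and a variance with divisor \(n\); the function and variable names are illustrative only.

```python
import math

def grouped_mean_sd(n, sum_fm, sum_fm2):
    """Mean and standard deviation from grouped summary sums (divisor n)."""
    mean = sum_fm / n
    variance = sum_fm2 / n - mean ** 2
    return mean, math.sqrt(variance)

# Summary values quoted in the question; n = 20 + 24 + 16 + 14 + 6.
n = 80
mean_v, sd_v = grouped_mean_sd(n, 825, 12012.5)

# Undo the coding v = (t - 5) / 10, i.e. t = 10 * v + 5.
mean_t = 10 * mean_v + 5   # both the scale and the shift affect the mean
sd_t = 10 * sd_v           # only the scale affects the spread

print(round(mean_v, 3), round(sd_v, 3))
print(round(mean_t, 2), round(sd_t, 2))
```

The same pattern applies to any code of the form \(v = \frac { t - c } { k }\): multiply the coded standard deviation by \(k\), and multiply the coded mean by \(k\) before adding \(c\).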
Edexcel S1 2018 June Q2
11 marks Easy -1.3
2. Two youth clubs, Eastyou and Westyou, decided to raise money for charity by running a 5 km race. All the members of the youth clubs took part and the time, in minutes, taken for each member to run the 5 km was recorded. The times for the Westyou members are summarised in Figure 1. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{b115bffa-1190-4a2b-b6f2-b006580e8dbd-06_349_1378_497_274} \captionsetup{labelformat=empty} \caption{Figure 1}
\end{figure}
  1. Write down the time that is exceeded by \(75 \%\) of Westyou members. The times for the Eastyou members are summarised by the stem and leaf diagram below.
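    \begin{tabular}{r|l|r}
    Stem & Leaf & \\
    2 & 0 2 3 4 & \(( 4 )\) \\
    2 & 5 6 8 8 8 9 9 & \\
    3 & 0 0 0 0 0 1 1 1 2 2 2 2 3 4 & \(( 14 )\) \\
    3 & 5 5 5 7 9 & \(( 5 )\) \\
    \end{tabular}
    Key: 2|0 means 20 minutes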
  2. Find the value of the median and interquartile range for the Eastyou members. An outlier is a value that falls either $$\begin{aligned} & \text { more than } 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \text { above } Q _ { 3 } \\ & \text { or more than } 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \text { below } Q _ { 1 } \end{aligned}$$
  3. On the grid on page 7, draw a box plot to represent the times of the Eastyou members.
  4. State the skewness of each distribution. Give reasons for your answers.
    \includegraphics[max width=\textwidth, alt={}, center]{b115bffa-1190-4a2b-b6f2-b006580e8dbd-07_406_1390_2224_262}
Edexcel S1 2018 June Q5
13 marks Moderate -0.8
5. The weights, in grams, of a random sample of 48 broad beans are summarised in the table.
\begin{tabular}{ccc}
Weight in grams (\(\boldsymbol { x }\)) & Frequency (\(f\)) & Class midpoint (\(y\)) \\
\(0.9 < x \leqslant 1.1\) & 9 & 1.0 \\
\(1.1 < x \leqslant 1.3\) & 12 & 1.2 \\
\(1.3 < x \leqslant 1.5\) & 11 & 1.4 \\
\(1.5 < x \leqslant 1.7\) & 8 & 1.6 \\
\(1.7 < x \leqslant 1.9\) & 3 & 1.8 \\
\(1.9 < x \leqslant 2.1\) & 3 & 2.0 \\
\(2.1 < x \leqslant 2.7\) & 2 & 2.4 \\
\end{tabular}
(You may assume \(\sum f y ^ { 2 } = 101.56\) ) A histogram was drawn to represent these data. The \(2.1 < x \leqslant 2.7\) class was represented by a bar of width 1.5 cm and height 1 cm .
  1. Find the width and height of the \(0.9 < x \leqslant 1.1\) class.
  2. Give a reason to justify the use of a histogram to represent these data.
  3. Estimate the mean and the standard deviation of the weights of these broad beans.
  4. Use linear interpolation to estimate the median of the weights of these broad beans. One of these broad beans is selected at random.
  5. Estimate the probability that its weight lies between 1.1 grams and 1.6 grams. One of these broad beans having a recorded weight of 0.95 grams was incorrectly weighed. The correct weight is 1.4 grams.
  6. State, giving a reason, the effect this would have on your answers to part 3. Do not carry out any further calculations.
Edexcel S1 2019 June Q1
9 marks Easy -1.2
1. The heights, \(x\) metres, of 40 children were recorded by a teacher. The results are summarised as follows
$$\sum x = 58 \quad \sum x ^ { 2 } = 84.829$$
  1. Find the mean and the variance of the heights of these 40 children. The teacher decided that these statistics would be more useful in centimetres.
  2. Find
    1. the mean of these heights in centimetres,
    2. the standard deviation of these heights in centimetres.
  3. Two more children join the group. Their heights are 130 cm and 160 cm .
    1. State, giving a reason, the mean height of the 42 children.
    2. Without recalculating the standard deviation, state, giving a reason, whether the standard deviation of the heights of the 42 children will be greater than, less than or the same as the standard deviation of the heights of the group of 40 children.
Edexcel S1 2019 June Q2
13 marks Easy -1.2
2. Chi wanted to summarise the scores of the 39 competitors in a village quiz. He started to produce the following stem and leaf diagram Key: 2|5 is a score of 25 \begin{table}[h]
\captionsetup{labelformat=empty} \caption{Score}
\begin{tabular}{r|l}
1 & 1 5 8 9 \\
2 & 0 2 5 8 9 \\
3 & 3 5 5 7 8 9 \(\ldots\) \\
\end{tabular}
\end{table} He did not complete the stem and leaf diagram but instead produced the following box plot. \includegraphics[max width=\textwidth, alt={}, center]{9ac7647f-b291-4a64-9518-fa6438a0cc7d-04_357_1237_772_356} Chi defined an outlier as a value that is $$\text { greater than } Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$ or
less than \(Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\)
  1. Find
    1. the interquartile range
    2. the range.
  2. Describe, giving a reason, the skewness of the distribution of scores. Albert and Beth asked for their scores to be checked.
    Albert's score was changed from 25 to 37
    Beth's score was changed from 54 to 60
  3. On the grid on page 5, draw an updated box plot. Show clearly any calculations that you used. Some of the competitors complained that the questions were biased towards the younger generation. The product moment correlation coefficient between the age of the competitors and their score in the quiz is - 0.187
  4. State, giving a reason, whether or not the complaint is supported by this statistic. \includegraphics[max width=\textwidth, alt={}, center]{9ac7647f-b291-4a64-9518-fa6438a0cc7d-05_360_1242_2238_351}
Edexcel S1 2021 June Q3
14 marks Moderate -0.8
3. A random sample of 100 carrots is taken from a farm and their lengths, \(L \mathrm {~cm}\), recorded. The data are summarised in the following table.
\begin{tabular}{ccc}
Length, \(L\) cm & Frequency, \(f\) & Class mid point, \(\boldsymbol { x } \mathbf { ~cm }\) \\
\(5 \leqslant L < 8\) & 5 & 6.5 \\
\(8 \leqslant L < 10\) & 13 & 9 \\
\(10 \leqslant L < 12\) & 16 & 11 \\
\(12 \leqslant L < 15\) & 25 & 13.5 \\
\(15 \leqslant L < 20\) & 30 & 17.5 \\
\(20 \leqslant L < 28\) & 11 & 24 \\
\end{tabular}
A histogram is drawn to represent these data.
The bar representing the class \(5 \leqslant L < 8\) is 1.5 cm wide and 1 cm high.
  1. Find the width and height of the bar representing the class \(15 \leqslant L < 20\)
  2. Use linear interpolation to estimate the median length of these carrots.
  3. Estimate
    1. the mean length of these carrots,
    2. the standard deviation of the lengths of these carrots. A supermarket will only buy carrots with length between 9 cm and 22 cm .
  4. Estimate the proportion of carrots from the farm that the supermarket will buy. Any carrots that the supermarket does not buy are sold as animal feed. The farm makes a profit of 2.2 pence on each carrot sold to the supermarket, a profit of 0.8 pence on each carrot longer than 22 cm and a loss of 1.2 pence on each carrot shorter than 9 cm .
  5. Find an estimate of the mean profit per carrot made by the farm.
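Part 1 uses the fact that, in a histogram, bar width is proportional to class width and bar height is proportional to frequency density (frequency divided by class width). The sketch below illustrates that scaling from the reference bar given in the question. The helper name `scale_bar` and its parameters are hypothetical, introduced only for this illustration.

```python
def scale_bar(ref_width_cm, ref_height_cm, ref_class, ref_freq,
              target_class, target_freq):
    """Scale a histogram bar from a reference bar.

    Bar width is proportional to class width and bar height is
    proportional to frequency density (frequency / class width).
    """
    ref_class_width = ref_class[1] - ref_class[0]
    target_class_width = target_class[1] - target_class[0]

    # Centimetres drawn per unit of class width and per unit of density.
    cm_per_unit = ref_width_cm / ref_class_width
    cm_per_density = ref_height_cm / (ref_freq / ref_class_width)

    width_cm = cm_per_unit * target_class_width
    height_cm = cm_per_density * (target_freq / target_class_width)
    return width_cm, height_cm

# Reference bar: class 5 <= L < 8, frequency 5, drawn 1.5 cm wide, 1 cm high.
# Target bar: class 15 <= L < 20, frequency 30.
width, height = scale_bar(1.5, 1.0, (5, 8), 5, (15, 20), 30)
print(width, height)
```

Working through frequency density rather than raw frequency is what keeps the area of each bar proportional to the frequency when the classes have unequal widths.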
Edexcel S1 2022 June Q1
11 marks Easy -1.2
1. The company Seafield requires contractors to record the number of hours they work each week. A random sample of 38 weeks is taken and the number of hours worked per week by contractor Kiana is summarised in the stem and leaf diagram below.
\begin{tabular}{r|l|r}
Stem & Leaf & \\
1 & 4 4 4 5 5 5 6 6 9 9 9 & (11) \\
2 & 1 2 2 3 3 4 4 4 \(w\) 9 & (10) \\
3 & 2 3 4 4 5 6 7 7 7 9 & (10) \\
4 & 1 1 2 3 & (4) \\
5 & 1 9 & (2) \\
6 & 4 & (1) \\
\end{tabular}
Key : 3|2 means 32
The quartiles for this distribution are summarised in the table below.
\begin{tabular}{ccc}
\(Q _ { 1 }\) & \(Q _ { 2 }\) & \(Q _ { 3 }\) \\
\(x\) & 26 & \(y\) \\
\end{tabular}
  1. Find the values of \(w , x\) and \(y\) Kiana is looking for outliers in the data. She decides to classify as outliers any observations greater than $$Q _ { 3 } + 1.0 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$
  2. Showing your working clearly, identify any outliers that Kiana finds.
  3. Draw a box plot for these data in the space provided on the grid opposite.
  4. Use the formula $$\text { skewness } = \frac { \left( Q _ { 3 } - Q _ { 2 } \right) - \left( Q _ { 2 } - Q _ { 1 } \right) } { \left( Q _ { 3 } - Q _ { 1 } \right) }$$ to find the skewness of these data. Give your answer to 2 significant figures. Kiana's new employer, Landacre, wishes to know the average number of hours per week she worked during her employment at Seafield to help calculate the cost of employing her.
  5. Explain why Landacre might prefer to know Kiana's mean, rather than median, number of hours worked per week.
Edexcel S1 2022 June Q3
14 marks Moderate -0.3
3. Gill buys a bag of logs to use in her stove. The lengths, \(l \mathrm {~cm}\), of the 88 logs in the bag are summarised in the table below.
\begin{tabular}{cc}
Length \(( \boldsymbol { l } )\) & Frequency \(( \boldsymbol { f } )\) \\
\(15 < l \leqslant 20\) & 19 \\
\(20 < l \leqslant 25\) & 35 \\
\(25 < l \leqslant 27\) & 16 \\
\(27 < l \leqslant 30\) & 15 \\
\(30 < l \leqslant 40\) & 3 \\
\end{tabular}
A histogram is drawn to represent these data.
The bar representing logs with length \(27 < l \leqslant 30\) has a width of 1.5 cm and a height of 4 cm .
  1. Calculate the width and height of the bar representing log lengths of \(20 < l \leqslant 25\)
  2. Use linear interpolation to estimate the median of \(l\) The maximum length of log Gill can use in her stove is 26 cm .
    Gill estimates, using linear interpolation, that \(x\) logs from the bag will fit into her stove.
  3. Show that \(x = 62\) Gill randomly selects 4 logs from the bag.
  4. Using \(x = 62\), find the probability that all 4 logs will fit into her stove. The weights, \(W\) grams, of the logs in the bag are coded using \(y = 0.5 W - 255\) and summarised by $$n = 88 \quad \sum y = 924 \quad \sum y ^ { 2 } = 12862$$
  5. Calculate
    1. the mean of \(W\)
    2. the variance of \(W\)
Edexcel S1 2024 June Q1
13 marks Easy -1.2
1. A researcher is investigating the growth of two types of tree, Birch and Maple. The height, to the nearest cm, a seedling grows in one year is recorded for 35 Birch trees and 32 Maple trees. The results are summarised in the back-to-back stem and leaf diagram below.
\begin{tabular}{r|r|c|l|l}
Totals & Birch & & Maple & Totals \\
(2) & 9 8 & 2 & 5 7 7 8 9 & (5) \\
(8) & 9 9 9 6 5 3 1 1 & 3 & 0 2 6 6 8 9 9 & (7) \\
(9) & 9 8 8 7 6 3 1 1 1 & 4 & 1 1 1 \(\boldsymbol { k }\) 7 8 & (6) \\
(9) & 7 7 7 5 4 3 2 1 0 & 5 & 0 1 2 3 4 4 4 & (7) \\
(3) & 7 6 5 & 6 & 3 4 6 & (3) \\
(3) & 6 5 4 & 7 & 0 7 & (2) \\
(1) & 5 & 8 & 0 0 & (2) \\
\end{tabular}
Key: 5 | 6 | 3 means 65 cm for a Birch tree and 63 cm for a Maple tree
The median height that these Maple trees grow in one year is 45 cm .
  1. Find the value of \(\boldsymbol { k }\), used in the stem and leaf diagram.
  2. Find the lower quartile and the upper quartile of the height grown in one year for these Birch trees. The researcher defines an outlier as an observation that is $$\text { greater than } Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right) \text { or less than } Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)$$
  3. Show that there is only one outlier amongst the Birch trees. The grid on page 3 shows a box plot for the heights that the Maple trees grow in one year.
  4. On the same grid draw a box plot for the heights that the Birch trees grow in one year.
  5. Comment on any difference in the distributions of the growth of these Birch trees and the growth of these Maple trees.
    State the values of any statistics you have used to support your comment. The researcher realises he has missed out 4 pieces of data for the Maple trees. The heights each seedling grows in one year, to the nearest cm, in ascending order, for these 4 Maple trees are \(27 \mathrm {~cm} , a \mathrm {~cm} , 48 \mathrm {~cm} , 2 a \mathrm {~cm}\). Given that there is no change to the box plot for the Maple trees given on page 3
  6. find the range of possible values for \(a\) Show your working clearly.
    \includegraphics[max width=\textwidth, alt={}]{ee0c7c12-84f3-479c-b36a-3357f8529a1c-03_1243_1659_1464_210}
Edexcel S1 2024 June Q3
14 marks Moderate -0.8
3. The lengths, \(x \mathrm {~mm}\), of 50 pebbles are summarised in the table below.
\begin{tabular}{cc}
Length & Frequency \\
\(20 \leqslant x < 30\) & 2 \\
\(30 \leqslant x < 32\) & 16 \\
\(32 \leqslant x < 36\) & 20 \\
\(36 \leqslant x < 40\) & 8 \\
\(40 \leqslant x < 45\) & 3 \\
\(45 \leqslant x < 50\) & 1 \\
\end{tabular}
A histogram is drawn to represent these data.
The bar representing the class \(32 \leqslant x < 36\) is 2.5 cm wide and 7.5 cm tall.
  1. Calculate the width and the height of the bar representing the class \(30 \leqslant x < 32\)
  2. Using linear interpolation, estimate the median of \(x\) The weight, \(w\) grams, of each of the 50 pebbles is coded using \(10 y = w - 20\) These coded data are summarised by $$\sum y = 104 \quad \sum y ^ { 2 } = 233.54$$
  3. Show that the mean of \(w\) is 40.8
  4. Calculate the standard deviation of \(w\) The weight of a pebble recorded as 40.8 grams is added to the sample.
  5. Without carrying out any further calculations, state, giving a reason, what effect this would have on the value of
    1. the mean of \(w\)
    2. the standard deviation of \(w\)
Edexcel S1 2016 October Q6
17 marks Easy -1.2
6. The stem and leaf diagram gives the blood pressure, \(x \mathrm { ~mmHg }\), for a random sample of 19 female patients.
\begin{tabular}{r|l}
10 & 1 2 \\
11 & 2 7 7 8 8 \\
12 & 0 2 2 3 4 4 5 5 7 \\
13 & 1 2 9 \\
\end{tabular}
Key: 10 | 1 means blood pressure of 101 mmHg
  1. Find the median and the quartiles for these data.
  2. Find the interquartile range ( \(Q _ { 3 } - Q _ { 1 }\) ) An outlier is a value that is greater than \(Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\) or less than \(Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\)
  3. Showing your working clearly, identify any outliers for these data.
  4. On the grid on page 21 draw a box and whisker plot to represent these data. Show any outliers clearly. The above data can be summarised by $$\sum x = 2299 \text { and } \sum x ^ { 2 } = 279709$$
  5. Calculate the mean and the standard deviation for these data. For a random sample taken from a normal distribution, a rule for determining outliers is: an outlier is more than \(2.7 \times\) standard deviation above or below the mean.
  6. Find the limits to determine outliers using this rule.
  7. State, giving a reason based on some of the above calculations, whether or not a normal distribution is a suitable model for these data.
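The two outlier rules in this question can be compared side by side. The sketch below is illustrative only: the data are read from the stem and leaf diagram, the sum of squares is the one quoted in the question, and the quartiles are assumed to be the 5th and 15th ordered values, which is one common convention for 19 observations and not necessarily the one a mark scheme would use. The function names are hypothetical.

```python
import math

# Blood pressures read from the stem and leaf diagram (19 values).
data = [101, 102, 112, 117, 117, 118, 118, 120, 122, 122,
        123, 124, 124, 125, 125, 127, 131, 132, 139]

def iqr_fences(q1, q3, k=1.5):
    """Lower and upper limits beyond which a value counts as an outlier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def sd_limits(n, sum_x, sum_x2, k=2.7):
    """Limits at mean -/+ k standard deviations, using divisor n."""
    mean = sum_x / n
    sd = math.sqrt(sum_x2 / n - mean ** 2)
    return mean - k * sd, mean + k * sd

# Assumed quartiles: the 5th and 15th ordered values (an assumption).
q1, q3 = data[4], data[14]
low, high = iqr_fences(q1, q3)
iqr_outliers = [x for x in data if x < low or x > high]

# Sum of squares as quoted in the question.
sd_low, sd_high = sd_limits(len(data), sum(data), 279709)
sd_outliers = [x for x in data if x < sd_low or x > sd_high]

print(low, high, iqr_outliers)
print(round(sd_low, 1), round(sd_high, 1), sd_outliers)
```

Comparing the two lists of flagged values is one way to approach the final part, since a rule designed for normal data and a quartile-based rule need not agree on the same sample.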
Edexcel S1 2018 October Q2
11 marks Easy -1.3
2. The weights, to the nearest kilogram, of a sample of 33 female spotted hyenas living in the Serengeti are summarised in the stem and leaf diagram below.
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Weight (kg)}
\begin{tabular}{r|l}
3 & 2 3 7 \\
4 & 1 3 3 4 5 5 6 9 \\
5 & 1 2 2 3 4 4 5 5 5 7 8 8 9 9 9 \\
6 & 2 3 3 \\
7 & 1 4 7 \\
8 & 4 \\
\end{tabular}
\end{table} Key: 3|2 means 32
  1. Find the median and quartiles for the weights of the female spotted hyenas. An outlier is defined as any value greater than \(c\) or any value less than \(d\) where $$\begin{aligned} & c = Q _ { 3 } + 1.5 \left( Q _ { 3 } - Q _ { 1 } \right) \\ & d = Q _ { 1 } - 1.5 \left( Q _ { 3 } - Q _ { 1 } \right) \end{aligned}$$
  2. Showing your working clearly, identify any outliers for these data. The weights, to the nearest kilogram, of a sample of male spotted hyenas living in the Serengeti are summarised below. \includegraphics[max width=\textwidth, alt={}, center]{0377c6e9-ab4f-477d-9236-0732fe81f25e-06_755_1568_1537_185}
  3. In the space provided in the grid above, draw a box and whisker plot to represent the weights of female spotted hyenas living in the Serengeti. Indicate clearly any outliers.
  4. Compare the weights of male and female spotted hyenas living in the Serengeti.
Edexcel S1 2018 October Q3
13 marks Moderate -0.8
3. The parking times, \(t\) hours, for cars in a car park are summarised below.
\begin{tabular}{ccc}
Time (\(t\) hours) & Frequency (\(f\)) & Time midpoint (\(m\)) \\
\(0 \leqslant t < 1\) & 10 & 0.5 \\
\(1 \leqslant t < 2\) & 18 & 1.5 \\
\(2 \leqslant t < 4\) & 15 & 3 \\
\(4 \leqslant t < 6\) & 12 & 5 \\
\(6 \leqslant t < 12\) & 5 & 9 \\
\end{tabular}
$$\text { (You may use } \sum f m = 182 \text { and } \sum f m ^ { 2 } = 883 \text { ) }$$ A histogram is drawn to represent these data.
The bar representing the time \(1 \leqslant t < 2\) has a width of 1.5 cm and a height of 6 cm .
  1. Calculate the width and the height of the bar representing the time \(4 \leqslant t < 6\)
  2. Use linear interpolation to estimate the median parking time for the cars in the car park.
  3. Estimate the mean and the standard deviation of the parking time for the cars in the car park.
  4. Describe, giving a reason, the skewness of the data. One of these cars is selected at random.
  5. Estimate the probability that this car is parked for more than 75 minutes.
Edexcel S1 2022 October Q1
11 marks Moderate -0.8
1. The stem lengths of a sample of 120 tulips are recorded in the grouped frequency table below.
\begin{tabular}{cc}
Stem length (cm) & Frequency \\
\(40 \leqslant x < 42\) & 12 \\
\(42 \leqslant x < 45\) & 18 \\
\(45 \leqslant x < 50\) & 23 \\
\(50 \leqslant x < 55\) & 35 \\
\(55 \leqslant x < 58\) & 24 \\
\(58 \leqslant x < 60\) & 8 \\
\end{tabular}
A histogram is drawn to represent these data.
The area of the bar representing the \(40 \leqslant x < 42\) class is \(16.5 \mathrm {~cm} ^ { 2 }\)
  1. Calculate the exact area of the bar representing the \(42 \leqslant x < 45\) class. The height of the tallest bar in the histogram is 10 cm .
  2. Find the exact height of the second tallest bar. \(Q _ { 1 }\) for these data is 45 cm .
  3. Use linear interpolation to find an estimate for
    1. \(Q _ { 2 }\)
    2. the interquartile range. One measure of skewness is given by $$\frac { Q _ { 3 } - 2 Q _ { 2 } + Q _ { 1 } } { Q _ { 3 } - Q _ { 1 } }$$
  4. By calculating this measure, describe the skewness of these data.
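As an aid to interpreting the quartile measure of skewness quoted in this question (this rearrangement is added for illustration and is not part of the original paper), the numerator can be written as the difference between the two halves of the interquartile range:

$$\frac { Q _ { 3 } - 2 Q _ { 2 } + Q _ { 1 } } { Q _ { 3 } - Q _ { 1 } } = \frac { \left( Q _ { 3 } - Q _ { 2 } \right) - \left( Q _ { 2 } - Q _ { 1 } \right) } { Q _ { 3 } - Q _ { 1 } }$$

The measure is therefore positive when the median lies closer to \(Q _ { 1 }\) than to \(Q _ { 3 }\), negative when it lies closer to \(Q _ { 3 }\), and zero when the quartiles are symmetric about the median.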
Edexcel S1 2022 October Q3
10 marks Moderate -0.5
3. Morgan is investigating the body length, \(b\) centimetres, of squirrels.
A random sample of 8 squirrels is taken and the data for each squirrel is coded using $$x = \frac { b - 21 } { 2 }$$ The results for the coded data are summarised below $$\sum x = - 1.2 \quad \sum x ^ { 2 } = 5.1$$
  1. Find the mean of \(b\)
  2. Find the standard deviation of \(b\) A 9th squirrel is added to the sample. Given that for all 9 squirrels \(\sum x = 0\)
  3. find
    1. the body length of the 9th squirrel,
    2. the standard deviation of \(x\) for all 9 squirrels.
Edexcel S1 2023 October Q2
13 marks Easy -1.2
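2. The weights, to the nearest kilogram, of a sample of 33 red kangaroos taken in December are summarised in the stem and leaf diagram below.
\begin{tabular}{r|l|r}
 & Weight (kg) & Totals \\
1 & 6 & (1) \\
2 & 3 6 & (2) \\
3 & 2 4 6 & (3) \\
4 & 2 5 5 6 6 7 8 & (7) \\
5 & 3 4 7 7 7 8 9 9 & (8) \\
6 & 0 2 2 3 3 8 & (7) \\
7 & 2 8 & (2) \\
8 & 2 6 & (2) \\
9 & 4 & (1) \\
\end{tabular}
Key: 3 | 2 represents 32 kg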
  1. Find
    1. the value of the median
    2. the value of \(Q _ { 1 }\) and the value of \(Q _ { 3 }\) for the weights of these red kangaroos. For these data an outlier is defined as a value that is
      greater than \(Q _ { 3 } + 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\) or smaller than \(Q _ { 1 } - 1.5 \times \left( Q _ { 3 } - Q _ { 1 } \right)\)
  2. Show that there are 2 outliers for these data. Figure 1 on page 7 shows a box plot for the weights of the same 33 red kangaroos taken in February, earlier in the year.
  3. In the space on Figure 1, draw a box plot to represent the weights of these red kangaroos in December.
  4. Compare the distribution of the weights of red kangaroos taken in February with the distribution of the weights of red kangaroos taken in December of the same year. You should interpret your comparisons in the context of the question.
    \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{f94b29e0-081f-45e8-99a7-ac835eec91e5-07_766_1803_1777_132} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure}
Edexcel S1 Specimen Q5
14 marks Moderate -0.3
5. A teacher selects a random sample of 56 students and records, to the nearest hour, the time spent watching television in a particular week.
\begin{tabular}{l|cccccc}
Hours & \(1 - 10\) & \(11 - 20\) & \(21 - 25\) & \(26 - 30\) & \(31 - 40\) & \(41 - 59\) \\
Frequency & 6 & 15 & 11 & 13 & 8 & 3 \\
Mid-point & 5.5 & 15.5 & & 28 & & 50 \\
\end{tabular}
  1. Find the mid-points of the 21-25 hour and 31-40 hour groups. A histogram was drawn to represent these data. The 11-20 group was represented by a bar of width 4 cm and height 6 cm .
  2. Find the width and height of the 26-30 group.
  3. Estimate the mean and standard deviation of the time spent watching television by these students.
  4. Use linear interpolation to estimate the median length of time spent watching television by these students. The teacher estimated the lower quartile and the upper quartile of the time spent watching television to be 15.8 and 29.3 respectively.
  5. State, giving a reason, the skewness of these data.
Edexcel S1 2001 January Q5
17 marks Moderate -0.3
5. The following grouped frequency distribution summarises the number of minutes, to the nearest minute, that a random sample of 200 motorists were delayed by roadworks on a stretch of motorway.
\begin{tabular}{cc}
Delay (mins) & Number of motorists \\
\(4 - 6\) & 15 \\
\(7 - 8\) & 28 \\
9 & 49 \\
10 & 53 \\
\(11 - 12\) & 30 \\
\(13 - 15\) & 15 \\
\(16 - 20\) & 10 \\
\end{tabular}
  1. Using graph paper represent these data by a histogram.
  2. Give a reason to justify the use of a histogram to represent these data.
  3. Use interpolation to estimate the median of this distribution.
  4. Calculate an estimate of the mean and an estimate of the standard deviation of these data. One coefficient of skewness is given by $$\frac { 3 ( \text { mean - median } ) } { \text { standard deviation } } .$$
  5. Evaluate this coefficient for the above data.
  6. Explain why the normal distribution may not be suitable to model the number of minutes that motorists are delayed by these roadworks.
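The chain of estimates in parts 3 to 5 (interpolated median, grouped mean and standard deviation, then the skewness coefficient) can be sketched as follows. This is an illustration under stated assumptions rather than a model answer: because delays are recorded to the nearest minute, class boundaries are assumed to sit at the half-minutes (for example 3.5 to 6.5 for the class 4 to 6), the standard deviation uses divisor \(n\), and the helper name `interpolated_median` is hypothetical.

```python
import math

# (lower boundary, upper boundary, frequency); boundaries are assumed to
# sit at the half-minutes because delays are given to the nearest minute.
classes = [
    (3.5, 6.5, 15),
    (6.5, 8.5, 28),
    (8.5, 9.5, 49),
    (9.5, 10.5, 53),
    (10.5, 12.5, 30),
    (12.5, 15.5, 15),
    (15.5, 20.5, 10),
]

n = sum(f for _, _, f in classes)

# Mean and standard deviation estimated from class midpoints (divisor n).
sum_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fm2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
mean = sum_fm / n
sd = math.sqrt(sum_fm2 / n - mean ** 2)

def interpolated_median(classes, n):
    """Linear interpolation for the (n/2)th value within its class."""
    target = n / 2
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("no class contains the median")

median = interpolated_median(classes, n)

# Coefficient of skewness as defined in the question.
skew = 3 * (mean - median) / sd

print(round(mean, 3), round(sd, 3), round(median, 3), round(skew, 3))
```

A different choice of class boundaries or divisor would change the numerical values slightly, so the figures printed here should be read alongside the assumptions listed above.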
Edexcel S1 2003 January Q4
16 marks Easy -1.2
4. A restaurant owner is concerned about the amount of time customers have to wait before being served. He collects data on the waiting times, to the nearest minute, of 20 customers. These data are listed below.
15,14,16,15,17,16,15,14,15,16,
17,16,15,14,16,17,15,25,18,16
  1. Find the median and inter-quartile range of the waiting times. An outlier is an observation that falls either \(1.5 \times\) (inter-quartile range) above the upper quartile or \(1.5 \times\) (inter-quartile range) below the lower quartile.
  2. Draw a boxplot to represent these data, clearly indicating any outliers.
  3. Find the mean of these data.
  4. Comment on the skewness of these data. Justify your answer.
Edexcel S1 2005 January Q2
14 marks Easy -1.8
2. The number of caravans on Seaview caravan site on each night in August last year is summarised in the following stem and leaf diagram.
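\begin{tabular}{r|l|r}
Caravans & 1|0 means 10 & Totals \\
1 & 0 & (2) \\
2 & 1 8 & (4) \\
3 & 0 3 4 7 & (8) \\
4 & 1 5 8 8 & (9) \\
5 & 2 6 7 & (5) \\
6 & 2 & (3) \\
\end{tabular}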
  1. Find the three quartiles of these data. During the same month, the least number of caravans on Northcliffe caravan site was 31. The maximum number of caravans on this site on any night that month was 72 . The three quartiles for this site were 38,45 and 52 respectively.
  2. On graph paper and using the same scale, draw box plots to represent the data for both caravan sites. You may assume that there are no outliers.
  3. Compare and contrast these two box plots.
  4. Give an interpretation to the upper quartiles of these two distributions.