2.02f Measures of average and spread

Edexcel S1 Q5
12 marks Easy -1.3
5. For a project, a student asked 40 people to draw two straight lines with what they thought was an angle of \(75 ^ { \circ }\) between them, using just a ruler and a pencil. She then measured the size of the angles that had been drawn and her data are summarised in this stem and leaf diagram.
Angle (\(6 \mid 4\) means \(64 ^ { \circ }\))    Totals
4 | 1                    (1)
4 |                      (0)
5 | 0 2 4                (3)
5 | 5 8 9                (3)
6 | 1 1 3 3 4            (5)
6 | 5 5 7 8 9            (5)
7 | 0 1 1 2 3 3 4 4 4    (9)
7 | 5 6 6 7 7 9 9        (7)
8 | 0 1 1 3 4            (5)
8 | 5 6                  (2)
  1. Find the median and quartiles of these data. Given that any values outside of the limits \(\mathrm { Q } _ { 1 } - 1.5 \left( \mathrm { Q } _ { 3 } - \mathrm { Q } _ { 1 } \right)\) and \(\mathrm { Q } _ { 3 } + 1.5 \left( \mathrm { Q } _ { 3 } - \mathrm { Q } _ { 1 } \right)\) are to be regarded as outliers,
  2. determine if there are any outliers in these data,
  3. draw a box plot representing these data on graph paper,
  4. describe the skewness of the distribution and suggest a reason for it.
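You can check parts 1 and 2 by reading the 40 angles off the diagram and computing the quartiles directly. The Python sketch below assumes the common S1 rule for ungrouped data: find \(np\); if it is a whole number, average that observation and the next, otherwise round up. The helper name `quantile` is illustrative and not part of the question. Other textbooks use slightly different quartile conventions, which can shift the answers.

```python
import math

def quantile(sorted_data, p):
    """Quantile of already-sorted data using the n*p rule (assumed S1 convention).

    If n*p is a whole number k, average the k-th and (k+1)-th values;
    otherwise take the value in position ceil(n*p). Positions are 1-based.
    """
    n = len(sorted_data)
    pos = n * p
    if pos == int(pos):
        k = int(pos)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(pos) - 1]

# The 40 angles (degrees) read from the stem and leaf diagram, in order.
angles = [41, 50, 52, 54, 55, 58, 59, 61, 61, 63, 63, 64, 65, 65, 67, 68, 69,
          70, 71, 71, 72, 73, 73, 74, 74, 74, 75, 76, 76, 77, 77, 79, 79,
          80, 81, 81, 83, 84, 85, 86]

q1, q2, q3 = (quantile(angles, p) for p in (0.25, 0.5, 0.75))
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in angles if a < lower_fence or a > upper_fence]
print(q1, q2, q3, outliers)  # 63.0 71.5 77.0 [41]
```

With these quartiles the fences are \(63 - 21 = 42\) and \(77 + 21 = 98\), so only the 41° angle falls outside them.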
Edexcel S1 Q6
13 marks Moderate -0.8
6. The number of people visiting a new art gallery each day is recorded over a three-month period and the results are summarised in the table below.
Number of visitors | Number of days
400-459 | 3
460-479 | 8
480-499 | 13
500-519 | 12
520-539 | 18
540-559 | 11
560-599 | 9
600-699 | 5
  1. Draw a histogram on graph paper to illustrate these data. In order to calculate summary statistics for the data it is coded using \(y = \frac { x - 509.5 } { 10 }\), where \(x\) is the mid-point of each class.
  2. Find \(\sum f y\). You may assume that \(\sum f y ^ { 2 } = 2041\).
  3. Using these values for \(\sum f y\) and \(\sum f y ^ { 2 }\), calculate estimates of the mean and standard deviation of the number of visitors per day.
    (6 marks)
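The coding in parts 2 and 3 can be checked numerically. This minimal sketch assumes visitor counts are whole numbers. A class such as 400-459 then has boundaries 399.5 and 459.5 and mid-point 429.5, which is consistent with the 509.5 used in the coding. The sketch recovers \(\sum f y\), confirms the given \(\sum f y^2\), and then undoes the coding. It assumes the divisor \(n\) for the variance, as is usual in S1.

```python
# (lowest value, highest value, number of days) from the table.
classes = [(400, 459, 3), (460, 479, 8), (480, 499, 13), (500, 519, 12),
           (520, 539, 18), (540, 559, 11), (560, 599, 9), (600, 699, 5)]

a, b = 509.5, 10  # coding y = (x - a) / b from the question

# Mid-points use the class boundaries, e.g. 399.5 and 459.5 for "400-459".
mids = [((lo - 0.5) + (hi + 0.5)) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]
ys = [(x - a) / b for x in mids]

n = sum(freqs)
sum_fy = sum(f * y for f, y in zip(freqs, ys))
sum_fy2 = sum(f * y * y for f, y in zip(freqs, ys))

# Statistics of y, then undo the coding: mean_x = a + b*mean_y, sd_x = b*sd_y.
mean_y = sum_fy / n
var_y = sum_fy2 / n - mean_y ** 2   # divisor n
mean_x = a + b * mean_y
sd_x = b * var_y ** 0.5
print(n, sum_fy, sum_fy2, round(mean_x, 2), round(sd_x, 2))
```

Undoing the coding only needs \(\bar{x} = 509.5 + 10\bar{y}\) and \(s_x = 10 s_y\). The shift by 509.5 does not affect the spread.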
Edexcel S1 Q4
11 marks Standard +0.3
4. The ages of 300 houses in a village are recorded giving the following table of results.
Age (years) | Number of houses
0- | 36
20- | 92
40- | 74
60- | 39
100- | 14
200- | 27
300-500 | 18
Use linear interpolation to estimate for these data
  1. the median,
  2. the limits between which the middle \(80 \%\) of the ages lie. An estimate of the mean of these data is calculated to be 86.6 years.
  3. Explain why the mean and median are so different and hence say which you consider best represents the data.
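Linear interpolation treats the values in each class as evenly spread. The minimal sketch below makes the following assumptions:

- the \(p\)th quantile is at position \(np\) (some courses use \((n+1)p\) instead);
- the class boundaries are 0, 20, 40, 60, 100, 200, 300 and 500, read from the table;
- the function name `grouped_percentile` is hypothetical.

```python
def grouped_percentile(bounds, freqs, p):
    """Estimate the p-th quantile (0 < p < 1) of grouped data by linear
    interpolation, assuming values are spread evenly within each class.

    bounds: class boundaries [b0, b1, ..., bk] for k classes
    freqs:  class frequencies [f1, ..., fk]
    The target position is n*p (the convention assumed here).
    """
    target = sum(freqs) * p
    cum = 0
    for lo, hi, f in zip(bounds, bounds[1:], freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    return bounds[-1]

# Ages of houses: classes 0-, 20-, 40-, 60-, 100-, 200-, 300-500.
bounds = [0, 20, 40, 60, 100, 200, 300, 500]
freqs = [36, 92, 74, 39, 14, 27, 18]

median = grouped_percentile(bounds, freqs, 0.5)
p10 = grouped_percentile(bounds, freqs, 0.1)
p90 = grouped_percentile(bounds, freqs, 0.9)
print(round(median, 2), round(p10, 2), round(p90, 2))
```

The 10th and 90th percentiles give the limits of the middle 80%. The long upper tail (45 houses aged 200 years or more) pulls the given mean of 86.6 years well above the median.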
Edexcel S1 Q1
9 marks Moderate -0.8
  1. The weight in kilograms, \(w\), of the 15 players in a rugby team was recorded and the results summarised as follows.
$$\Sigma w = 1145.3 , \quad \Sigma w ^ { 2 } = 88042.14$$
  1. Calculate the mean and variance of the weight of the players. Due to injury, one of the players who weighed 79.2 kg was replaced with another player who weighed 63.5 kg.
  2. Without further calculation state the effect of this change on the mean and variance of the weight of the players in the team. Explain your answers.
    (4 marks)
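The formulas \(\bar{w} = \frac{\sum w}{n}\) and \(\operatorname{Var}(w) = \frac{\sum w^2}{n} - \bar{w}^2\) can be evaluated directly. The sketch assumes the divisor \(n\) (not \(n-1\)), as is usual in S1. Its last lines are only a numerical check on part 2, which itself asks for reasoning without calculation.

```python
n = 15
sum_w = 1145.3
sum_w2 = 88042.14

mean = sum_w / n
variance = sum_w2 / n - mean ** 2  # divisor n
print(round(mean, 3), round(variance, 2))

# Numerical check of part 2: swap a 79.2 kg player for a 63.5 kg player.
new_sum_w = sum_w - 79.2 + 63.5
new_sum_w2 = sum_w2 - 79.2 ** 2 + 63.5 ** 2
new_mean = new_sum_w / n
new_variance = new_sum_w2 / n - new_mean ** 2
print(new_mean < mean, new_variance > variance)  # True True
```

The mean falls because the total weight falls. The variance rises because 63.5 kg is much further from the mean than 79.2 kg was.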
Edexcel S1 Q5
16 marks Easy -1.3
5. Each child in class 3A was given a packet of seeds to plant. The stem and leaf diagram below shows how many seedlings were visible in each child's tray one week after planting.
Number of seedlings (\(2 \mid 1\) means 21)    Totals
0 | 0 2              (2)
0 |                  (0)
1 | 1                (1)
1 | 5 7              (2)
2 | 0 1 3 3 4        (5)
2 | 5 7 7 7 8 9 9    (7)
3 | 0 0 0 1 2 2 4    (7)
3 | 5 6 8 8          (4)
4 | 1 3 4            (3)
  1. Find the median and interquartile range for these data.
  2. Use the quartiles to describe the skewness of the data. Show your method clearly. The mean and standard deviation for these data were 27.2 and 10.3 respectively.
  3. Explaining your answer, state whether you would recommend using these values or your answers to part (a) to summarise these data. Outliers are defined to be values outside of the limits \(\mathrm { Q } _ { 1 } - 2 s\) and \(\mathrm { Q } _ { 3 } + 2 s\) where \(s\) is the standard deviation given above.
  4. Represent these data with a boxplot identifying clearly any outliers.
Edexcel S1 Q3
9 marks Moderate -0.8
3. A magazine collected data on the total cost of the reception at each of a random sample of 80 weddings. The data is grouped and coded using \(y = \frac { C - 3250 } { 250 }\), where \(C\) is the mid-point in pounds of each class, giving \(\sum f y = 37\) and \(\sum f y ^ { 2 } = 2317\).
  1. Using these values, calculate estimates of the mean and standard deviation of the cost of the receptions in the sample.
  2. Explain why your answers to part (a) are only estimates. The median of the data was \(\pounds 3050\).
  3. Comment on the skewness of the data and suggest a reason for it.
Edexcel S1 Q5
11 marks Standard +0.3
5. A group of children were each asked to try and complete a task to test hand-eye coordination. Each child repeated the task until he or she had been successful or had made four attempts. The number of attempts made by the children in the group are summarised in the table below.
Number of attempts | 1 | 2 | 3 | 4
Number of children | 43 | 26 | 13 | 3
  1. Calculate the mean and standard deviation of the number of attempts made by each child. It is suggested that the number of attempts made by each child could be modelled by a discrete random variable \(X\) with the probability function $$P ( X = x ) = \left\{ \begin{array} { c c } k \left( 20 - x ^ { 2 } \right) , & x = 1,2,3,4 \\ 0 , & \text { otherwise } \end{array} \right.$$
  2. Show that \(k = \frac { 1 } { 50 }\).
  3. Find \(\mathrm { E } ( X )\).
  4. Comment on the suitability of this model.
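Parts 2 and 3 reduce to exact fraction arithmetic. This minimal sketch uses Python's `fractions` module so that \(k\) and \(\mathrm{E}(X)\) come out exactly.

```python
from fractions import Fraction

xs = [1, 2, 3, 4]
weights = [20 - x * x for x in xs]        # 19, 16, 11, 4

# The probabilities must sum to 1, so k * sum(weights) = 1.
k = Fraction(1, sum(weights))
probs = [k * w for w in weights]
expected = sum(x * p for x, p in zip(xs, probs))
print(k, expected)  # 1/50 2
```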
Edexcel S1 Q6
13 marks Moderate -0.3
6. A cinema recorded the number of people at each showing of each film during a one-week period. The results are summarised in the table below.
Number of people | Number of showings
1-40 | 36
41-60 | 20
61-80 | 33
81-100 | 24
101-150 | 36
151-200 | 39
201-300 | 52
  1. Draw a histogram on graph paper to illustrate these data.
  2. Calculate estimates of the median and quartiles of these data.
  3. Use your answers to part (b) to show that the data is positively skewed.
Edexcel S1 Q1
11 marks Moderate -0.8
  1. A net was used to catch swallows so that they could be ringed and examined. The weights of 55 adult birds were recorded and the results are summarised in the table below.
Weight (g) | 14-19 | 20-21 | 22-23 | 24-25 | 26-29 | 30-35
Frequency | 3 | 6 | 15 | 20 | 9 | 2
  1. For these data calculate estimates of
    1. the median,
    2. the \(33 ^ { \text {rd } }\) percentile. These data are represented by a histogram and the bar representing the 24-25 group is 1 cm wide and 20 cm high.
  2. Calculate the dimensions of the bars representing the groups
    1. 20-21
    2. 26-29
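Histogram bar areas are proportional to frequency, so the reference bar fixes both scales. The sketch below assumes weights were recorded to the nearest gram. Each class therefore runs from half a gram below its lower value to half a gram above its upper value, so 24-25 has width 2 g. The function names are illustrative.

```python
def class_width(lo, hi):
    """Width of a class of whole-number values, using boundaries lo-0.5, hi+0.5."""
    return (hi + 0.5) - (lo - 0.5)

# Reference bar from the question: class 24-25 g (frequency 20), 1 cm wide, 20 cm high.
ref_width_g = class_width(24, 25)
ref_density = 20 / ref_width_g          # frequency density = frequency / class width
cm_per_gram = 1 / ref_width_g           # horizontal scale
cm_per_density = 20 / ref_density       # vertical scale

def bar_dimensions(lo, hi, freq):
    """Return (width in cm, height in cm) for the bar of class lo-hi."""
    width = class_width(lo, hi)
    return width * cm_per_gram, (freq / width) * cm_per_density

print(bar_dimensions(20, 21, 6))   # (1.0, 6.0)
print(bar_dimensions(26, 29, 9))   # (2.0, 4.5)
```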
Edexcel S1 Q3
11 marks Moderate -0.3
3. A soccer fan collected data on the number of minutes of league football, \(m\), played by each team in the four main divisions before first scoring a goal at the start of a new season. Her results are shown in the table below.
\(m\) (minutes) | Number of teams
\(0 \leq m < 40\) | 36
\(40 \leq m < 80\) | 28
\(80 \leq m < 120\) | 10
\(120 \leq m < 160\) | 4
\(160 \leq m < 200\) | 5
\(200 \leq m < 300\) | 4
\(300 \leq m < 400\) | 2
\(400 \leq m < 600\) | 3
  1. Calculate estimates of the mean and standard deviation of these data.
  2. Explain why the mean and standard deviation might not be the best summary statistics to use with these data.
  3. Suggest alternative summary statistics that would better represent these data.
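Estimates of the mean and standard deviation use the class mid-points. This minimal sketch assumes the divisor \(n\) for the variance.

```python
import math

# (lower bound, upper bound, number of teams) from the table, m in minutes.
table = [(0, 40, 36), (40, 80, 28), (80, 120, 10), (120, 160, 4),
         (160, 200, 5), (200, 300, 4), (300, 400, 2), (400, 600, 3)]

n = sum(f for _, _, f in table)
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in table)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in table)

mean = sum_fx / n
sd = math.sqrt(sum_fx2 / n - mean ** 2)  # divisor n
print(n, round(mean, 1), round(sd, 1))
```

More than two thirds of the teams (64 of 92) scored within 80 minutes. A few very large values in the long upper tail inflate both the mean and the standard deviation.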
Edexcel S1 Q5
14 marks Easy -1.2
5. In a survey unemployed people were asked how many months it had been, to the nearest month, since they were last employed on a full-time basis. The data collected is summarised in this stem and leaf diagram.
Number of months (\(2 \mid 1\) means 21 months)    Totals
0 | 1 1 2 2 4 4 4 6 7 7 9    (11)
1 | 0 2 3 5 5 6 8 9          ( )
2 | 1 5 6 8                  ( )
3 | 0 7 9                    ( )
4 | 5                        ( )
5 | 2 7                      (2)
6 | 3                        (1)
7 | 0                        (1)
  1. Write down the values needed to complete the totals column on the stem and leaf diagram.
  2. State the mode of these data.
  3. Find the median and quartiles of these data. Given that any values outside of the limits \(\mathrm { Q } _ { 1 } - 1.5 \left( \mathrm { Q } _ { 3 } - \mathrm { Q } _ { 1 } \right)\) and \(\mathrm { Q } _ { 3 } + 1.5 \left( \mathrm { Q } _ { 3 } - \mathrm { Q } _ { 1 } \right)\) are to be regarded as outliers,
  4. determine if there are any outliers in these data,
  5. draw a box plot representing these data on graph paper,
  6. describe the skewness of these data and suggest a reason for it.
Edexcel S1 Q5
12 marks Moderate -0.3
5. An antiques shop recorded the value of items stolen to the nearest pound during each week for a year giving the data in the table below.
Value of goods stolen (£) | Number of weeks
0-199 | 31
200-399 | 6
400-599 | 3
600-799 | 4
800-999 | 5
1000-1999 | 2
2000-2999 | 1
Letting \(x\) represent the mid-point of each group and using the coding \(y = \frac { x - 699.5 } { 200 }\),
  1. find \(\sum f y\).
  2. estimate to the nearest pound the mean and standard deviation of the value of the goods stolen each week using your value for \(\sum f y\) and \(\sum f y ^ { 2 } = 424\).
    (6 marks)
    The median for these data is \(\pounds 82\).
  3. Explain why the manager of the shop might be reluctant to use either the mean or the median in summarising these data.
    (3 marks)
Edexcel S1 Q7
15 marks Moderate -0.8
7. A cyber-cafe recorded how long each user stayed during one day giving the following results.
Length of stay (minutes) | 0- | 30- | 60- | 90- | 120- | 240- | 360-
Number of users | 15 | 31 | 32 | 23 | 17 | 2 | 0
  1. Use linear interpolation to estimate the median and quartiles of these data. The results of a previous study had led to the suggestion that the length of time each user stays can be modelled by a normal distribution with a mean of 72 minutes and a standard deviation of 48 minutes.
  2. Find the median and quartiles that this model would predict.
  3. Comment on the suitability of the suggested model in the light of the new results.
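For a normal distribution the median equals the mean and the quartiles sit at \(\mu \pm 0.6745\sigma\). This minimal sketch uses `statistics.NormalDist` from Python's standard library (Python 3.8 or later) to find the values the model predicts.

```python
from statistics import NormalDist

model = NormalDist(mu=72, sigma=48)   # model suggested by the previous study
q1, q2, q3 = (model.inv_cdf(p) for p in (0.25, 0.5, 0.75))
print(round(q1, 1), round(q2, 1), round(q3, 1))  # 39.6 72.0 104.4
```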
Edexcel S2 Q6
15 marks Standard +0.3
A sample of radioactive material decays randomly, with an approximate mean of 1.5 counts per minute.
  1. Name a distribution that would be suitable for modelling the number of counts per minute. Give any parameters required for the model.
  2. Find the probability of at least 4 counts in a randomly chosen minute.
  3. Find the probability of 3 counts or fewer in a random interval lasting 5 minutes. More careful measurements, over 50 one-minute intervals, give the following data for \(x\), the number of counts per minute: $$\sum x = 84 , \quad \sum x ^ { 2 } = 226$$
  4. Decide whether these data support your answer to part (a).
  5. Use the improved data to find the probability of exactly two counts in a given one-minute interval.
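The Poisson calculations in parts 2 to 5 can be checked with a few lines of Python. The sketch below:

- writes its own probability function rather than relying on a statistics package;
- scales the rate to \(5 \times 1.5 = 7.5\) for a 5-minute interval;
- for part 5, takes the sample mean 1.68 as the improved rate (an assumption about what "improved data" means).

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 1.5
p_at_least_4 = 1 - poisson_cdf(3, lam)          # part 2
p_at_most_3_in_5min = poisson_cdf(3, 5 * lam)   # part 3: rate 7.5 over 5 minutes

# Part 4: for a Poisson distribution the mean equals the variance.
n, sum_x, sum_x2 = 50, 84, 226
mean = sum_x / n
variance = sum_x2 / n - mean ** 2

# Part 5: refit using the sample mean as the new rate (assumed approach).
p_exactly_2 = poisson_pmf(2, mean)
print(round(p_at_least_4, 4), round(p_at_most_3_in_5min, 4),
      mean, round(variance, 4), round(p_exactly_2, 4))
```

The sample mean (1.68) and variance (about 1.70) are close, which is what a Poisson model requires.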
OCR H240/02 2018 September Q9
12 marks Moderate -0.3
9 The finance department of a retail firm recorded the daily income each day for 300 days. The results are summarised in the histogram. \includegraphics[max width=\textwidth, alt={}, center]{85de9a39-f8be-40ee-b0c8-e2e632be93d8-6_689_1575_488_246}
  1. Find the number of days on which the daily income was between \(\pounds 4000\) and \(\pounds 6000\).
  2. Calculate an estimate of the number of days on which the daily income was between \(\pounds 2700\) and \(\pounds 3600\).
  3. Use the midpoints of the classes to show that an estimate of the mean daily income is \(\pounds 3275\). An estimate of the standard deviation of the daily income is \(\pounds 1060\). The finance department uses the distribution \(\mathrm { N } \left( 3275,1060 ^ { 2 } \right)\) to model the daily income, in pounds.
  4. Calculate the number of days on which, according to this model, the daily income would be between \(\pounds 4000\) and \(\pounds 6000\).
  5. It is given that approximately \(95 \%\) of values of the distribution \(\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)\) lie within the range \(\mu \pm 2 \sigma\). Without further calculation, use this fact to comment briefly on whether the proposed model is a good fit to the data illustrated in the histogram.
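Part 4 and the \(\mu \pm 2\sigma\) range used in part 5 can be checked with `statistics.NormalDist` (Python 3.8 or later). The following is a minimal sketch.

```python
from statistics import NormalDist

mu, sigma = 3275, 1060
income = NormalDist(mu=mu, sigma=sigma)

# Part 4: expected number of the 300 days with income between £4000 and £6000.
p = income.cdf(6000) - income.cdf(4000)
expected_days = 300 * p

# Part 5: about 95% of values lie within mu +/- 2 sigma.
low, high = mu - 2 * sigma, mu + 2 * sigma
print(round(p, 4), round(expected_days, 1), low, high)
```

The model puts about 95% of days between £1155 and £5395.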
Edexcel S1 2022 January Q3
10 marks Moderate -0.8
  1. The stem and leaf diagram shows the number of deliveries made by Pat each day for 24 days
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Key: 10 \(\mid\) 8 represents 108 deliveries}
10 | 8 9                       (2)
11 | 0 3 6 6 6 8 8 9 9 9 9     (11)
12 | 4 5 5 5 5 5 5 8           (8)
13 | \(a\) \(b\) \(c\)         (3)
\end{table}
where \(a\), \(b\) and \(c\) are positive integers with \(a < b < c\). An outlier is defined as any value greater than \(1.5 \times\) interquartile range above the upper quartile. Given that there is only one outlier for these data,
  1. show that \(c = 9\). The number of deliveries made by Pat each day is represented by \(d\). The data in the stem and leaf diagram are coded using $$x = d - 125$$ and the following summary statistics are obtained $$\sum x = - 96 \quad \text { and } \quad \sum ( x - \bar { x } ) ^ { 2 } = 1306$$
  2. Find the mean number of deliveries.
  3. Find the standard deviation of the number of deliveries. One of these 24 days is selected at random. The random variable \(D\) represents the number of deliveries made by Pat on this day. The random variable \(X = D - 125\)
  4. Find \(\mathrm { P } ( D > 118 \mid X < 0 )\)
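Coding by \(x = d - 125\) shifts every value by the same amount. The mean therefore shifts by 125, while the standard deviation is unchanged. This minimal sketch assumes the S1 convention of dividing \(\sum(x - \bar{x})^2\) by \(n\).

```python
import math

n = 24
sum_x = -96          # x = d - 125
sxx = 1306           # sum of (x - x_bar)^2

mean_x = sum_x / n
mean_d = mean_x + 125              # adding a constant shifts the mean
sd_d = math.sqrt(sxx / n)          # ...but leaves the spread unchanged
print(mean_d, round(sd_d, 2))
```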
Edexcel S1 2017 June Q1
8 marks Easy -1.2
  1. Nina weighed a random sample of 50 carrots from her shop and recorded the weight, in grams to the nearest gram, for each carrot. The results are summarised below.
Weight of carrot (grams) | Frequency (\(f\)) | Weight midpoint (\(x\) grams)
45-54 | 5 | 49.5
55-59 | 10 | 57
60-64 | 22 | 62
65-74 | 13 | 69.5
$$\text { (You may use } \sum \mathrm { f } x ^ { 2 } = 192102.5 \text { ) }$$
  1. Use linear interpolation to estimate the median weight of these carrots.
  2. Find an estimate for the mean weight of these carrots.
  3. Find an estimate for the standard deviation of the weights of these carrots. A carrot is selected at random from Nina's shop.
  4. Estimate the probability that the weight of this carrot is more than 70 grams.
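One way to estimate part 4 is to assume the 13 carrots in the 65-74 class are spread evenly between the boundaries 64.5 g and 74.5 g, as in linear interpolation. The following is a minimal sketch of that calculation.

```python
# Class 65-74 g (boundaries 64.5 and 74.5) holds 13 of the 50 carrots.
# Assumption: weights are spread evenly across the class.
lower, upper, freq, total = 64.5, 74.5, 13, 50

count_above_70 = freq * (upper - 70) / (upper - lower)
p_above_70 = count_above_70 / total
print(round(count_above_70, 2), round(p_above_70, 3))
```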
Edexcel S1 2017 June Q2
11 marks Easy -1.2
2. The box plot shows the times, \(t\) minutes, it takes a group of office workers to travel to work. \includegraphics[max width=\textwidth, alt={}, center]{7d45bacd-20ac-49b4-8f3f-613edf3739f9-04_365_1237_351_356}
  1. Find the range of the times.
  2. Find the interquartile range of the times.
  3. Using the quartiles, describe the skewness of these data. Give a reason for your answer. Chetna believes that house prices will be higher if the time to travel to work is shorter. She asks a random sample of these office workers for their house prices \(\pounds x\), where \(x\) is measured in thousands, and obtains the following statistics $$\mathrm { S } _ { x x } = 5514 \quad \mathrm {~S} _ { x t } = 10 \quad \mathrm {~S} _ { t t } = 1145.6$$
  4. Calculate the product moment correlation coefficient between \(x\) and \(t\).
  5. State, giving a reason, whether or not your correlation coefficient supports Chetna's belief. Adam and Betty are part of the group of office workers and they have both moved house. Adam's time to travel to work changes from 32 minutes to 36 minutes. Betty's time to travel to work changes from 38 minutes to 58 minutes. Outliers are defined as values that are more than 1.5 times the interquartile range above the upper quartile.
  6. Showing all necessary calculations, determine how the box plot of times to travel to work will change and draw a new box plot on the grid on page 5. \includegraphics[max width=\textwidth, alt={}, center]{7d45bacd-20ac-49b4-8f3f-613edf3739f9-05_499_1413_2122_180}
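The product moment correlation coefficient in part 4 is \(r = \frac{S_{xt}}{\sqrt{S_{xx} S_{tt}}}\), which can be evaluated directly:

```python
import math

s_xx, s_xt, s_tt = 5514, 10, 1145.6
r = s_xt / math.sqrt(s_xx * s_tt)
print(round(r, 5))  # 0.00398
```

A value this close to 0 indicates almost no linear correlation between house price and travel time.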
Edexcel S1 2017 October Q1
14 marks Easy -1.2
At the start of a course, an instructor asked a group of 80 apprentices to estimate the length of a piece of pipe. The error (true length - estimated length) was recorded in centimetres. The results are summarised in the box plot below. \includegraphics[max width=\textwidth, alt={}, center]{77ae01cd-2b58-48ab-889f-272e27ecf99d-02_291_1445_397_246}
  1. Find the range for these data.
  2. Find the interquartile range for these data. One month later, the instructor asked the 80 apprentices to estimate the length of a different piece of pipe and recorded their errors. The results are summarised in the table below.
    Error (\(e\) cm) | Number of apprentices
    \(-40 < e \leqslant -16\) | 2
    \(-16 < e \leqslant -8\) | 18
    \(-8 < e \leqslant 0\) | 33
    \(0 < e \leqslant 8\) | 14
    \(8 < e \leqslant 16\) | 10
    \(16 < e \leqslant 40\) | 3
  3. Use linear interpolation to estimate the median error for these data.
  4. Show that the upper quartile for these data, to the nearest centimetre, is 4. For these data, the lower quartile is \(-8\) and the five worst errors were \(-25, -21, 18, 23, 28\). An outlier is a value that falls either more than \(1.5 \times\) (interquartile range) above the upper quartile or more than \(1.5 \times\) (interquartile range) below the lower quartile.
    1. Show that there are only 2 outliers for these data.
    2. Draw a box plot for these data on the grid on page 3.
  5. State, giving reasons, whether or not the apprentices' ability to estimate the length of a piece of pipe has improved over the first month of the course. \includegraphics[max width=\textwidth, alt={}, center]{77ae01cd-2b58-48ab-889f-272e27ecf99d-03_412_1520_2222_173}
Edexcel S1 2021 October Q3
14 marks Moderate -0.8
  1. The stem and leaf diagram shows the ages of the 35 male passengers on a cruise.
Age
1 | 3                       (1)
2 | 7 9                     (2)
3 | 1 2 8 8                 (4)
4 | 5 5 6 7 8 8 9           (7)
5 | 2 2 3 3 4 4 5 6 6 8     (10)
6 | 0 1 1 4 4 4 7           (7)
7 | 3 6                     (2)
8 | 7 8                     (2)
Key: 1 | 3 represents an age of 13 years
  1. Find the median age of the male passengers.
  2. Show that the interquartile range (IQR) of these ages is 16. An outlier is defined as a value that is more than \(1.5 \times\) IQR above the upper quartile or \(1.5 \times\) IQR below the lower quartile.
  3. Show that there are 3 outliers amongst these ages.
  4. On the grid in Figure 1 on page 9, draw a box plot for the ages of the male passengers on the cruise. Figure 1 on page 9 also shows a box plot for the ages of the female passengers on the cruise.
  5. Comment on any difference in the distributions of ages of male and female passengers on the cruise.
    State the values of any statistics you have used to support your comment. (1)
    Anja, along with her 2 daughters and a granddaughter, now join the cruise.
    Anja's granddaughter is younger than both of Anja's daughters.
    Anja had her 23rd birthday on the day her eldest daughter was born.
    When their 4 ages are included with the other female passengers on the cruise, the box plot does not change.
  6. State, giving reasons, what you can say about
    1. the granddaughter's age
    2. Anja's age.
      (3)
      \begin{figure}[h]
      \includegraphics[alt={},max width=\textwidth]{29ac0c0b-f963-40a1-beba-7146bbb2d021-09_1025_1593_1541_182} \captionsetup{labelformat=empty} \caption{Figure 1}
      \end{figure}
Edexcel S1 Q1
Easy -1.2
  1. The students in a class were each asked to write down how many CDs they owned. The student with the least number of CDs had 14 and all but one of the others owned 60 or fewer. The remaining student owned 65. The quartiles for the class were 30, 34 and 42 respectively.
Outliers are defined to be any values outside the limits of \(1.5 \left( Q _ { 3 } - Q _ { 1 } \right)\) below the lower quartile or above the upper quartile. On graph paper draw a box plot to represent these data, indicating clearly any outliers.
(7 marks)
Edexcel S1 Q5
Moderate -0.3
5. The following grouped frequency distribution summarises the number of minutes, to the nearest minute, that a random sample of 200 motorists were delayed by roadworks on a stretch of motorway.
Delay (mins) | Number of motorists
4-6 | 15
7-8 | 28
9 | 49
10 | 53
11-12 | 30
13-15 | 15
16-20 | 10
  1. Using graph paper represent these data by a histogram.
  2. Give a reason to justify the use of a histogram to represent these data.
  3. Use interpolation to estimate the median of this distribution.
  4. Calculate an estimate of the mean and an estimate of the standard deviation of these data. One coefficient of skewness is given by $$\frac { 3 ( \text { mean } - \text { median } ) } { \text { standard deviation } } .$$
  5. Evaluate this coefficient for the above data.
  6. Explain why the normal distribution may not be suitable to model the number of minutes that motorists are delayed by these roadworks.
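Parts 3 to 5 can be checked together in one script. The sketch makes these assumptions:

- delays were recorded to the nearest minute, so a class such as 7-8 has boundaries 6.5 and 8.5 and mid-point 7.5;
- the median is at position \(n/2\);
- the variance uses the divisor \(n\).

```python
import math

# (lowest value, highest value, number of motorists); delays to the nearest minute.
table = [(4, 6, 15), (7, 8, 28), (9, 9, 49), (10, 10, 53),
         (11, 12, 30), (13, 15, 15), (16, 20, 10)]

n = sum(f for _, _, f in table)
mids = [(lo + hi) / 2 for lo, hi, _ in table]
freqs = [f for _, _, f in table]

mean = sum(f * x for f, x in zip(freqs, mids)) / n
sd = math.sqrt(sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2)

# Median: interpolate at position n/2 inside the class whose boundaries
# are lo - 0.5 and hi + 0.5 (width hi - lo + 1).
median = None
target, cum = n / 2, 0
for lo, hi, f in table:
    if cum + f >= target:
        median = (lo - 0.5) + (target - cum) / f * (hi - lo + 1)
        break
    cum += f

skew = 3 * (mean - median) / sd   # the coefficient given in the question
print(round(mean, 3), round(median, 2), round(sd, 2), round(skew, 2))
```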
Edexcel S1 Q4
Easy -1.2
4. Aeroplanes fly from City \(A\) to City \(B\). Over a long period of time the number of minutes delay in take-off from City \(A\) was recorded. The minimum delay was 5 minutes and the maximum delay was 63 minutes. A quarter of all delays were at most 12 minutes, half were at most 17 minutes and \(75 \%\) were at most 28 minutes. Only one of the delays was longer than 45 minutes. An outlier is an observation that falls either \(1.5 \times\) (interquartile range) above the upper quartile or \(1.5 \times\) (interquartile range) below the lower quartile.
  1. On the graph paper opposite draw a box plot to represent these data.
  2. Comment on the distribution of delays. Justify your answer.
  3. Suggest how the distribution might be interpreted by a passenger who frequently flies from City \(A\) to City \(B\). \includegraphics[max width=\textwidth, alt={}, center]{3d4f7bfb-b235-418a-9411-a4d0b3188254-008_1190_1487_278_223}
Edexcel S1 2003 June Q6
16 marks Moderate -0.8
6. The number of bags of potato crisps sold per day in a bar was recorded over a two-week period. The results are shown below. $$20,15,10,30,33,40,5,11,13,20,25,42,31,17$$
  1. Calculate the mean of these data.
  2. Draw a stem and leaf diagram to represent these data.
  3. Find the median and the quartiles of these data. An outlier is an observation that falls either \(1.5 \times\) (interquartile range) above the upper quartile or \(1.5 \times\) (interquartile range) below the lower quartile.
  4. Determine whether or not any items of data are outliers.
  5. On graph paper draw a box plot to represent these data. Show your scale clearly.
  6. Comment on the skewness of the distribution of bags of crisps sold per day. Justify your answer.
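A stem and leaf diagram with stems of 10 can be built mechanically: split each value into its tens digit (stem) and units digit (leaf). In this minimal sketch the function name `stem_and_leaf` is illustrative, and the output omits the key and totals column an exam answer would include.

```python
from collections import defaultdict

sales = [20, 15, 10, 30, 33, 40, 5, 11, 13, 20, 25, 42, 31, 17]

def stem_and_leaf(data):
    """Return stem and leaf lines with stems of 10 (e.g. 1 | 5 means 15)."""
    leaves = defaultdict(list)
    for value in sorted(data):
        leaves[value // 10].append(value % 10)
    stems = range(min(leaves), max(leaves) + 1)   # keep any empty stems visible
    return [f"{s} | {' '.join(str(leaf) for leaf in leaves[s])}".rstrip()
            for s in stems]

for line in stem_and_leaf(sales):
    print(line)

mean = sum(sales) / len(sales)
print(round(mean, 2))  # 22.29
```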
AQA S1 2007 January Q1
9 marks Easy -1.2
1 The times, in seconds, taken by 20 people to solve a simple numerical puzzle were
17 19 22 26 28 31 34 36 38 39
41 42 43 47 50 51 53 55 57 58
  1. Calculate the mean and the standard deviation of these times.
  2. In fact, 23 people solved the puzzle. However, 3 of them failed to solve it within the allotted time of 60 seconds. Calculate the median and the interquartile range of the times taken by all 23 people.
    (4 marks)
  3. For the times taken by all 23 people, explain why:
    1. the mode is not an appropriate numerical measure;
    2. the range is not an appropriate numerical measure.
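Part 2 can be answered even though three times are unknown, because the median and quartiles depend only on the order of the values. In this sketch `math.inf` stands in for "more than 60 seconds". `statistics.quantiles` uses its default 'exclusive' method, which places the quartiles at positions \((n+1)/4\) and \(3(n+1)/4\); other conventions may give slightly different quartiles. Part 1 shows both standard deviation conventions because the question does not say which to use.

```python
import math
import statistics

# Times (seconds) of the 20 people who solved the puzzle within 60 seconds.
times = [17, 19, 22, 26, 28, 31, 34, 36, 38, 39,
         41, 42, 43, 47, 50, 51, 53, 55, 57, 58]

# Part 1: mean and both common standard deviation conventions.
mean = statistics.mean(times)
sd_n = statistics.pstdev(times)    # divisor n
sd_n1 = statistics.stdev(times)    # divisor n - 1

# Part 2: the 3 unsuccessful people took more than 60 s. Their exact times are
# unknown, but math.inf is enough to place them at the top of the ordering.
all_times = sorted(times + [math.inf] * 3)
median = statistics.median(all_times)
q1, _, q3 = statistics.quantiles(all_times, n=4)  # default 'exclusive' method
print(mean, round(sd_n, 2), round(sd_n1, 2), median, q3 - q1)
```

Replacing the three unknown times by any placeholder above 60 leaves the median and quartiles unchanged. The same substitution leaves the range undefined.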