Estimate mean and standard deviation from frequency table

Questions that provide a grouped frequency table directly (with or without midpoints pre-calculated) and ask the student to calculate estimates of mean and/or standard deviation.

20 questions · Moderate -0.7

2.02g Calculate mean and standard deviation
OCR S1 2006 June Q7
13 marks Moderate -0.8
7 In a UK government survey in 2000, smokers were asked to estimate the time between their waking and their having the first cigarette of the day. For heavy smokers, the results were as follows.
Time between waking and first cigarette | 1 to 4 minutes | 5 to 14 minutes | 15 to 29 minutes | 30 to 59 minutes | At least 60 minutes
Percentage of smokers | 31 | 27 | 19 | 14 | 9
Times are given correct to the nearest minute.
  1. Assuming that 'At least 60 minutes' means 'At least 60 minutes but less than 240 minutes', calculate estimates for the mean and standard deviation of the time between waking and first cigarette for these smokers.
  2. Find an estimate for the interquartile range of the time between waking and first cigarette for these smokers. Give your answer correct to the nearest minute.
  3. The meaning of 'At least 60 minutes' is now changed to 'At least 60 minutes but less than 480 minutes'. Without further calculation, state whether this would cause an increase, a decrease or no change in the estimated value of
    1. the mean,
    2. the standard deviation,
    3. the interquartile range.
Edexcel S1 2019 January Q4
13 marks Moderate -0.8
4. A group of 100 adults recorded the amount of time, \(t\) minutes, they spent exercising each day. Their results are summarised in the table below.
Time (t minutes) | Frequency (f) | Time midpoint (x)
\(0 \leqslant t < 15\) | 25 | 7.5
\(15 \leqslant t < 30\) | 17 | 22.5
\(30 \leqslant t < 60\) | 28 | 45
\(60 \leqslant t < 120\) | 24 | 90
\(120 \leqslant t \leqslant 240\) | 6 | 180
[You may use \(\sum \mathrm{f}x^{2} = 455\,512.5\)]
A histogram is drawn to represent these data.
The bar representing the time \(0 \leqslant t < 15\) has width 0.5 cm and height 6 cm .
  1. Calculate the width and height of the bar representing a time of \(60 \leqslant t < 120\)
  2. Use linear interpolation to estimate the median time spent exercising by these adults each day.
  3. Find an estimate of the mean time spent exercising by these adults each day.
  4. Calculate an estimate for the standard deviation of these times.
  5. Describe, giving a reason, the skewness of these data.
Further analysis of the above data revealed that 18 of the 25 adults in the \(0 \leqslant t < 15\) group took no exercise each day.
  6. State, giving a reason, what effect, if any, this new information would have on your answers to
    1. the estimate of the median in part (b),
    2. the estimate of the mean in part (c),
    3. the estimate of the standard deviation in part (d).
Edexcel S1 2014 June Q2
14 marks Moderate -0.8
2. The table below shows the distances (to the nearest km) travelled to work by the 50 employees in an office.
Distance (km) | Frequency (f) | Distance midpoint (x)
0-2 | 16 | 1.25
3-5 | 12 | 4
6-10 | 10 | 8
11-20 | 8 | 15.5
21-40 | 4 | 30.5
[You may use \(\sum \mathrm{f}x = 394\), \(\sum \mathrm{f}x^{2} = 6500\)]
A histogram has been drawn to represent these data.
The bar representing the distance of \(3 - 5\) has a width of 1.5 cm and a height of 6 cm .
  1. Calculate the width and height of the bar representing the distance of 6-10
  2. Use linear interpolation to estimate the median distance travelled to work.
    1. Show that an estimate of the mean distance travelled to work is 7.88 km .
    2. Estimate the standard deviation of the distances travelled to work.
  3. Describe, giving a reason, the skewness of these data.
Peng starts to work in this office as the \(51^{\text{st}}\) employee. She travels a distance of 7.88 km to work.
  4. Without carrying out any further calculations, state, giving a reason, what effect Peng's addition to the workforce would have on your estimates of the
    1. mean,
    2. median,
    3. standard deviation
      of the distances travelled to work.
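The mean and standard deviation estimates asked for in this question can be sketched in Python as below. This is an illustrative check rather than a model answer: the function name `grouped_mean_sd` is hypothetical, and the sketch assumes the divisor-n form of the variance, \(\sum \mathrm{f}x^{2}/n - \bar{x}^{2}\), applied to the midpoints and frequencies in the table.

```python
from math import sqrt


def grouped_mean_sd(midpoints, freqs):
    """Estimate mean and standard deviation from class midpoints and frequencies.

    Assumption: the variance uses divisor n (not n - 1).
    """
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    return mean, sqrt(variance)


# Midpoints and frequencies from the distance-to-work table.
midpoints = [1.25, 4, 8, 15.5, 30.5]
freqs = [16, 12, 10, 8, 4]

mean, sd = grouped_mean_sd(midpoints, freqs)
print(round(mean, 2), round(sd, 2))  # 7.88 8.24
```

The mean agrees with the value 7.88 km quoted in the question, and the sums inside the function reproduce the given \(\sum \mathrm{f}x = 394\) and \(\sum \mathrm{f}x^{2} = 6500\).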
OCR MEI AS Paper 2 2020 November Q2
3 marks Moderate -0.8
2 A student measures the upper arm lengths of a sample of 97 women. The results are summarised in the frequency table in Fig. 2.1.
Arm length in cm | \(30 -\) | \(31 -\) | \(32 -\) | \(33 -\) | \(34 -\) | \(35 -\) | \(36 -\) | \(37 -\) | \(38 -\) | \(39 -\) | \(40 - 41\)
Frequency | 1 | 4 | 5 | 9 | 13 | 19 | 17 | 17 | 4 | 3 | 5
Fig. 2.1
The student constructs two cumulative frequency diagrams to represent the data using different class intervals. These are shown in Fig. 2.2. One of these diagrams is correct and the other is incorrect.
  1. State which diagram is incorrect, justifying your answer.
  2. Use the correct diagram in Fig. 2.2 to find an estimate of the median.
    [Fig. 2.2: two cumulative frequency diagrams; image not reproduced]
OCR MEI Paper 2 2018 June Q14
9 marks Moderate -0.8
14 The pre-release material includes data on unemployment rates in different countries. A sample from this material has been taken. All the countries in the sample are in Europe. The data have been grouped and are shown in Fig. 14.1.
Unemployment rate | \(0 -\) | \(5 -\) | \(10 -\) | \(15 -\) | \(20 -\) | \(35 - 50\)
Frequency | 15 | 21 | 5 | 5 | 2 | 2
Fig. 14.1
A cumulative frequency curve has been generated for the sample data using a spreadsheet. This is shown in Fig. 14.2.
[Fig. 14.2: cumulative frequency curve; image not reproduced]
Hodge used Fig. 14.2 to estimate the median unemployment rate in Europe. He obtained the answer 5.0. The correct value for this sample is 6.9.
  1. (A) There is a systematic error in the diagram.
    The scatter diagram shown in Fig. 14.3 shows the unemployment rate and life expectancy at birth for the 47 countries in the sample for which this information is available.
    [Fig. 14.3: scatter diagram to show life expectancy at birth against unemployment rate; image not reproduced]
    The product moment correlation coefficient for the 47 items in the sample is \(-0.2607\).
    The \(p\)-value associated with \(r = -0.2607\) and \(n = 47\) is 0.0383.
  2. Does this information suggest that there is an association between unemployment rate and life expectancy at birth in countries in Europe?
Hodge uses the spreadsheet tools to obtain the equation of a line of best fit for this data.
  3. The unemployment rate in Kosovo is 35.3 , but there is no data available on life expectancy. Is it reasonable to use Hodge's line of best fit to estimate life expectancy at birth in Kosovo?
Edexcel S1 2018 June Q5
13 marks Moderate -0.8
5. The weights, in grams, of a random sample of 48 broad beans are summarised in the table.
Weight in grams (\(x\)) | Frequency (f) | Class midpoint (y)
\(0.9 < x \leqslant 1.1\) | 9 | 1.0
\(1.1 < x \leqslant 1.3\) | 12 | 1.2
\(1.3 < x \leqslant 1.5\) | 11 | 1.4
\(1.5 < x \leqslant 1.7\) | 8 | 1.6
\(1.7 < x \leqslant 1.9\) | 3 | 1.8
\(1.9 < x \leqslant 2.1\) | 3 | 2.0
\(2.1 < x \leqslant 2.7\) | 2 | 2.4
(You may assume \(\sum \mathrm{f}y^{2} = 101.56\))
A histogram was drawn to represent these data. The \(2.1 < x \leqslant 2.7\) class was represented by a bar of width 1.5 cm and height 1 cm.
  1. Find the width and height of the \(0.9 < x \leqslant 1.1\) class.
  2. Give a reason to justify the use of a histogram to represent these data.
  3. Estimate the mean and the standard deviation of the weights of these broad beans.
  4. Use linear interpolation to estimate the median of the weights of these broad beans.
One of these broad beans is selected at random.
  5. Estimate the probability that its weight lies between 1.1 grams and 1.6 grams.
One of these broad beans having a recorded weight of 0.95 grams was incorrectly weighed. The correct weight is 1.4 grams.
  6. State, giving a reason, the effect this would have on your answers to part (c). Do not carry out any further calculations.
Edexcel S1 2021 June Q3
14 marks Moderate -0.8
3. A random sample of 100 carrots is taken from a farm and their lengths, \(L\) cm, recorded. The data are summarised in the following table.
Length, \(L\) cm | Frequency, f | Class midpoint, \(x\) cm
\(5 \leqslant L < 8\) | 5 | 6.5
\(8 \leqslant L < 10\) | 13 | 9
\(10 \leqslant L < 12\) | 16 | 11
\(12 \leqslant L < 15\) | 25 | 13.5
\(15 \leqslant L < 20\) | 30 | 17.5
\(20 \leqslant L < 28\) | 11 | 24
A histogram is drawn to represent these data.
The bar representing the class \(5 \leqslant L < 8\) is 1.5 cm wide and 1 cm high.
  1. Find the width and height of the bar representing the class \(15 \leqslant L < 20\)
  2. Use linear interpolation to estimate the median length of these carrots.
  3. Estimate
    1. the mean length of these carrots,
    2. the standard deviation of the lengths of these carrots.
A supermarket will only buy carrots with length between 9 cm and 22 cm.
  4. Estimate the proportion of carrots from the farm that the supermarket will buy.
Any carrots that the supermarket does not buy are sold as animal feed. The farm makes a profit of 2.2 pence on each carrot sold to the supermarket, a profit of 0.8 pence on each carrot longer than 22 cm and a loss of 1.2 pence on each carrot shorter than 9 cm.
  5. Find an estimate of the mean profit per carrot made by the farm.
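The histogram part of this question rests on two proportionalities: bar width scales with class width, and bar height scales with frequency density (frequency divided by class width). A minimal Python sketch of that scaling follows. The function name `bar_dimensions` and its parameter names are hypothetical, and the sketch assumes the drawn histogram uses one consistent scale on each axis.

```python
def bar_dimensions(ref_class_width, ref_freq, ref_bar_width_cm, ref_bar_height_cm,
                   class_width, freq):
    """Scale a histogram bar from a reference bar drawn for another class.

    Assumption: widths are proportional to class width and heights are
    proportional to frequency density (frequency / class width).
    """
    cm_per_unit = ref_bar_width_cm / ref_class_width
    ref_density = ref_freq / ref_class_width
    cm_per_density = ref_bar_height_cm / ref_density
    width_cm = class_width * cm_per_unit
    height_cm = (freq / class_width) * cm_per_density
    return width_cm, height_cm


# Reference bar: class 5 <= L < 8 (width 3, frequency 5) drawn 1.5 cm wide and 1 cm high.
# Target bar: class 15 <= L < 20 (width 5, frequency 30).
width_cm, height_cm = bar_dimensions(3, 5, 1.5, 1.0, 5, 30)
print(round(width_cm, 2), round(height_cm, 2))  # 2.5 3.6
```

The same function applies to the other histogram parts in this list once the reference bar and target class are swapped in.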
Edexcel S1 2018 October Q3
13 marks Moderate -0.8
3. The parking times, \(t\) hours, for cars in a car park are summarised below.
Time (t hours) | Frequency (f) | Time midpoint (m)
\(0 \leqslant t < 1\) | 10 | 0.5
\(1 \leqslant t < 2\) | 18 | 1.5
\(2 \leqslant t < 4\) | 15 | 3
\(4 \leqslant t < 6\) | 12 | 5
\(6 \leqslant t < 12\) | 5 | 9
(You may use \(\sum \mathrm{f}m = 182\) and \(\sum \mathrm{f}m^{2} = 883\))
A histogram is drawn to represent these data.
The bar representing the time \(1 \leqslant t < 2\) has a width of 1.5 cm and a height of 6 cm .
  1. Calculate the width and the height of the bar representing the time \(4 \leqslant t < 6\)
  2. Use linear interpolation to estimate the median parking time for the cars in the car park.
  3. Estimate the mean and the standard deviation of the parking time for the cars in the car park.
  4. Describe, giving a reason, the skewness of the data.
One of these cars is selected at random.
  5. Estimate the probability that this car is parked for more than 75 minutes.
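Linear interpolation for the median can be sketched as follows. The helper `interpolated_quantile` is a hypothetical name, and the sketch assumes the median sits at the \(n/2\)-th value and that values are spread evenly within each class.

```python
def interpolated_quantile(boundaries, freqs, p=0.5):
    """Estimate a quantile from grouped data by linear interpolation.

    boundaries: class boundaries, one more entry than freqs.
    Assumptions: the target position is p * n, and values are uniformly
    spread within each class.
    """
    n = sum(freqs)
    target = p * n
    cumulative = 0
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    return boundaries[-1]


# Class boundaries and frequencies from the parking-time table.
boundaries = [0, 1, 2, 4, 6, 12]
freqs = [10, 18, 15, 12, 5]

median = interpolated_quantile(boundaries, freqs)
print(round(median, 2))  # 2.27
```

With 60 cars the 30th value falls in the class \(2 \leqslant t < 4\), two values past the cumulative total of 28, which gives \(2 + \frac{2}{15} \times 2\).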
Edexcel S1 Specimen Q5
14 marks Moderate -0.3
5. A teacher selects a random sample of 56 students and records, to the nearest hour, the time spent watching television in a particular week.
Hours | \(1 - 10\) | \(11 - 20\) | \(21 - 25\) | \(26 - 30\) | \(31 - 40\) | \(41 - 59\)
Frequency | 6 | 15 | 11 | 13 | 8 | 3
Mid-point | 5.5 | 15.5 | | 28 | | 50
  1. Find the mid-points of the 21-25 hour and 31-40 hour groups.
A histogram was drawn to represent these data. The 11-20 group was represented by a bar of width 4 cm and height 6 cm.
  2. Find the width and height of the 26-30 group.
  3. Estimate the mean and standard deviation of the time spent watching television by these students.
  4. Use linear interpolation to estimate the median length of time spent watching television by these students.
The teacher estimated the lower quartile and the upper quartile of the time spent watching television to be 15.8 and 29.3 respectively.
  5. State, giving a reason, the skewness of these data.
Edexcel S1 2013 January Q5
15 marks Moderate -0.8
5. A survey of 100 households gave the following results for weekly income \(\pounds y\).
Income \(y\) (£) | Mid-point | Frequency \(f\)
\(0 \leqslant y < 200\) | 100 | 12
\(200 \leqslant y < 240\) | 220 | 28
\(240 \leqslant y < 320\) | 280 | 22
\(320 \leqslant y < 400\) | 360 | 18
\(400 \leqslant y < 600\) | 500 | 12
\(600 \leqslant y < 800\) | 700 | 8
(You may use \(\sum f y^{2} = 12\,452\,800\))
A histogram was drawn and the class \(200 \leqslant y < 240\) was represented by a rectangle of width 2 cm and height 7 cm .
  1. Calculate the width and the height of the rectangle representing the class \(320 \leqslant y < 400\)
  2. Use linear interpolation to estimate the median weekly income to the nearest pound.
  3. Estimate the mean and the standard deviation of the weekly income for these data.
One measure of skewness is \(\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}\).
  4. Use this measure to calculate the skewness for these data and describe its value.
Katie suggests using the random variable \(X\) which has a normal distribution with mean 320 and standard deviation 150 to model the weekly income for these data.
  5. Find \(\mathrm { P } ( 240 < X < 400 )\).
  6. With reference to your calculations in parts (d) and (e) and the data in the table, comment on Katie's suggestion.
Edexcel S1 2013 June Q3
13 marks Moderate -0.8
3. An agriculturalist is studying the yields, \(y \mathrm {~kg}\), from tomato plants. The data from a random sample of 70 tomato plants are summarised below.
Yield (\(y\) kg) | Frequency (f) | Yield midpoint (\(x\) kg)
\(0 \leqslant y < 5\) | 16 | 2.5
\(5 \leqslant y < 10\) | 24 | 7.5
\(10 \leqslant y < 15\) | 14 | 12.5
\(15 \leqslant y < 25\) | 12 | 20
\(25 \leqslant y < 35\) | 4 | 30
(You may use \(\sum \mathrm{f}x = 755\) and \(\sum \mathrm{f}x^{2} = 12037.5\))
A histogram has been drawn to represent these data. The bar representing the yield \(5 \leqslant y < 10\) has a width of 1.5 cm and a height of 8 cm.
  1. Calculate the width and the height of the bar representing the yield \(15 \leqslant y < 25\)
  2. Use linear interpolation to estimate the median yield of the tomato plants.
  3. Estimate the mean and the standard deviation of the yields of the tomato plants.
  4. Describe, giving a reason, the skewness of the data.
  5. Estimate the number of tomato plants in the sample that have a yield of more than 1 standard deviation above the mean.
Edexcel S1 2013 June Q4
14 marks Moderate -0.8
4. The following table summarises the times, \(t\) minutes to the nearest minute, recorded for a group of students to complete an exam.
Time (minutes) \(t\) | \(11 - 20\) | \(21 - 25\) | \(26 - 30\) | \(31 - 35\) | \(36 - 45\) | \(46 - 60\)
Number of students f | 62 | 88 | 16 | 13 | 11 | 10
[You may use \(\sum \mathrm{f}t^{2} = 134281.25\)]
  1. Estimate the mean and standard deviation of these data.
  2. Use linear interpolation to estimate the value of the median.
  3. Show that the estimated value of the lower quartile is 18.6 to 3 significant figures.
  4. Estimate the interquartile range of this distribution.
  5. Give a reason why the mean and standard deviation are not the most appropriate summary statistics to use with these data.
The person timing the exam made an error and each student actually took 5 minutes less than the times recorded above. The table below summarises the actual times.
Time (minutes) \(t\) | \(6 - 15\) | \(16 - 20\) | \(21 - 25\) | \(26 - 30\) | \(31 - 40\) | \(41 - 55\)
Number of students f | 62 | 88 | 16 | 13 | 11 | 10
  6. Without further calculations, explain the effect this would have on each of the estimates found in parts (a), (b), (c) and (d).
Edexcel S1 2016 June Q5
17 marks Moderate -0.8
5. A midwife records the weights, in kg , of a sample of 50 babies born at a hospital. Her results are given in the table below.
Weight (\(w\) kg) | Frequency (f) | Weight midpoint (x)
\(0 \leqslant w < 2\) | 1 | 1
\(2 \leqslant w < 3\) | 8 | 2.5
\(3 \leqslant w < 3.5\) | 17 | 3.25
\(3.5 \leqslant w < 4\) | 17 | 3.75
\(4 \leqslant w < 5\) | 7 | 4.5
[You may use \(\sum \mathrm{f}x^{2} = 611.375\)]
A histogram has been drawn to represent these data. The bar representing the weight \(2 \leqslant w < 3\) has a width of 1 cm and a height of 4 cm.
  1. Calculate the width and height of the bar representing a weight of \(3 \leqslant w < 3.5\)
  2. Use linear interpolation to estimate the median weight of these babies.
    1. Show that an estimate of the mean weight of these babies is 3.43 kg .
    2. Find an estimate of the standard deviation of the weights of these babies.
Shyam decides to model the weights of babies born at the hospital, by the random variable \(W\), where \(W \sim \mathrm{N}(3.43, 0.65^{2})\)
  3. Find \(\mathrm{P}(W < 3)\)
  4. With reference to your answers to (b), (c)(i) and (d) comment on Shyam's decision.
A newborn baby weighing 3.43 kg is born at the hospital.
  5. Without carrying out any further calculations, state, giving a reason, what effect the addition of this newborn baby to the sample would have on your estimate of the
    1. mean,
    2. standard deviation.
Edexcel S1 2017 June Q2
14 marks Moderate -0.8
2. An estate agent is studying the cost of office space in London. He takes a random sample of 90 offices and calculates the cost, \(\pounds x\) per square foot. His results are given in the table below.
Cost (£\(x\)) | Frequency (f) | Midpoint (£y)
\(20 \leqslant x < 40\) | 12 | 30
\(40 \leqslant x < 45\) | 13 | 42.5
\(45 \leqslant x < 50\) | 25 | 47.5
\(50 \leqslant x < 60\) | 32 | 55
\(60 \leqslant x < 80\) | 8 | 70
A histogram is drawn for these data and the bar representing \(50 \leqslant x < 60\) is 2 cm wide and 8 cm high.
  1. Calculate the width and height of the bar representing \(20 \leqslant x < 40\)
  2. Use linear interpolation to estimate the median cost.
  3. Estimate the mean cost of office space for these data.
  4. Estimate the standard deviation for these data.
  5. Describe, giving a reason, the skewness.
Rika suggests that the cost of office space in London can be modelled by a normal distribution with mean \(\pounds 50\) and standard deviation \(\pounds 10\)
  6. With reference to your answer to part (e), comment on Rika's suggestion.
  7. Use Rika's model to estimate the 80th percentile of the cost of office space in London.
Edexcel S1 2018 June Q2
12 marks Moderate -0.8
2. The following grouped frequency distribution summarises the number of minutes, to the nearest minute, that a random sample of 100 motorists were delayed by roadworks on a stretch of motorway one Monday.
Delay (minutes) | Number of motorists (f) | Delay midpoint (x)
3-6 | 38 | 4.5
7-8 | 25 | 7.5
9-10 | 18 | 9.5
11-15 | 12 | 13
16-20 | 7 | 18
(You may use \(\sum \mathrm{f}x^{2} = 8096.25\))
A histogram has been drawn to represent these data. The bar representing a delay of (3-6) minutes has a width of 2 cm and a height of 9.5 cm.
  1. Calculate the width and the height of the bar representing a delay of (11-15) minutes.
  2. Use linear interpolation to estimate the median delay.
  3. Calculate an estimate of the mean delay.
  4. Calculate an estimate of the standard deviation of the delays.
One coefficient of skewness is given by \(\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}\)
  5. Evaluate this coefficient for the above data, giving your answer to 2 significant figures.
On the following Friday, the coefficient of skewness for the delays on this stretch of motorway was \(-0.22\)
  6. State, giving a reason, how the delays on this stretch of motorway on Friday are different from the delays on Monday.
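The coefficient \(\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}\) for the Monday data can be checked with the sketch below. It is an illustration under stated assumptions, not a mark-scheme solution: the function name `skew_coefficient` is hypothetical, the class boundaries are taken as 2.5, 6.5, 8.5, 10.5, 15.5, 20.5 because delays are recorded to the nearest minute, the median is placed at the \(n/2\)-th value, and the variance uses divisor n.

```python
from math import sqrt


def skew_coefficient(boundaries, midpoints, freqs):
    """Return (mean, median, sd, 3 * (mean - median) / sd) for grouped data.

    Assumptions: divisor-n variance; median at the n/2-th value, located by
    linear interpolation between class boundaries.
    """
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    sd = sqrt(sum(f * x * x for f, x in zip(freqs, midpoints)) / n - mean ** 2)

    target = n / 2
    cumulative = 0
    median = boundaries[-1]
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            width = boundaries[i + 1] - boundaries[i]
            median = boundaries[i] + (target - cumulative) / f * width
            break
        cumulative += f

    return mean, median, sd, 3 * (mean - median) / sd


# Delays to the nearest minute, so the class 3-6 runs from 2.5 to 6.5, and so on.
boundaries = [2.5, 6.5, 8.5, 10.5, 15.5, 20.5]
midpoints = [4.5, 7.5, 9.5, 13, 18]
freqs = [38, 25, 18, 12, 7]

mean, median, sd, skew = skew_coefficient(boundaries, midpoints, freqs)
print(round(mean, 3), round(median, 2), round(sd, 2), round(skew, 2))
```

Under these assumptions the coefficient comes out positive, in contrast to the negative value quoted for Friday.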
Edexcel S1 Q1
11 marks Moderate -0.8
  1. A net was used to catch swallows so that they could be ringed and examined. The weights of 55 adult birds were recorded and the results are summarised in the table below.
Weight (g) | \(14 - 19\) | \(20 - 21\) | \(22 - 23\) | \(24 - 25\) | \(26 - 29\) | \(30 - 35\)
Frequency | 3 | 6 | 15 | 20 | 9 | 2
  1. For these data calculate estimates of
    1. the median,
    2. the \(33^{\text{rd}}\) percentile.
These data are represented by a histogram and the bar representing the 24-25 group is 1 cm wide and 20 cm high.
  2. Calculate the dimensions of the bars representing the groups
    1. 20-21
    2. 26-29
AQA S1 2015 June Q2
6 marks Easy -1.2
2 The table summarises the diameters, \(d\) millimetres, of a random sample of 60 new cricket balls to be used in junior cricket.
Edexcel AS Paper 2 Specimen Q1
9 marks Moderate -0.8
  1. A company manager is investigating the time taken, \(t\) minutes, to complete an aptitude test. The human resources manager produced the table below of coded times, \(x\) minutes, for a random sample of 30 applicants.
Coded time (\(x\) minutes) | Frequency (f) | Coded time midpoint (y minutes)
\(0 \leq x < 5\) | 3 | 2.5
\(5 \leq x < 10\) | 15 | 7.5
\(10 \leq x < 15\) | 2 | 12.5
\(15 \leq x < 25\) | 9 | 20
\(25 \leq x < 35\) | 1 | 30
(You may use \(\sum f y = 355\) and \(\sum f y ^ { 2 } = 5675\) )
  1. Use linear interpolation to estimate the median of the coded times.
  2. Estimate the standard deviation of the coded times.
The company manager is told by the human resources manager that he subtracted 15 from each of the times and then divided by 2, to calculate the coded times.
  3. Calculate an estimate for the median and the standard deviation of \(t\). (3)
The following year, the company has 25 positions available. The company manager decides not to offer a position to any applicant who takes 35 minutes or more to complete the aptitude test. The company has 60 applicants.
  4. Comment on whether or not the company manager's decision will result in the company being able to fill the 25 positions available from these 60 applicants. Give a reason for your answer.
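Undoing the coding in this question can be sketched as follows. Since the coded time is \(x = (t - 15)/2\), the original time is \(t = 2x + 15\): a measure of location is scaled and then shifted, while a measure of spread is only scaled. The helper names `decode_location` and `decode_spread` are hypothetical, and the sketch assumes the divisor-n standard deviation and a median at the \(n/2\)-th value.

```python
from math import sqrt


def decode_location(coded_value, scale=2, shift=15):
    """Undo the coding x = (t - shift) / scale for a mean, median or quantile."""
    return scale * coded_value + shift


def decode_spread(coded_spread, scale=2):
    """Undo the coding for a standard deviation or range (the shift has no effect)."""
    return abs(scale) * coded_spread


# Summary values given in the question for the coded midpoints y.
n, sum_fy, sum_fy2 = 30, 355, 5675
coded_sd = sqrt(sum_fy2 / n - (sum_fy / n) ** 2)

# Coded median by linear interpolation: the 15th value lies in 5 <= x < 10,
# which holds 15 values after a cumulative total of 3.
coded_median = 5 + (n / 2 - 3) / 15 * 5

median_t = decode_location(coded_median)
sd_t = decode_spread(coded_sd)
print(round(median_t, 2), round(sd_t, 2))  # 33.0 14.02
```

The same two helpers cover any linear coding once `scale` and `shift` are changed to match it.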
OCR MEI S1 2011 January Q7
19 marks Moderate -0.3
The incomes of a sample of 918 households on an island are given in the table below.
Income (x thousand pounds) | \(0 \leqslant x \leqslant 20\) | \(20 < x \leqslant 40\) | \(40 < x \leqslant 60\) | \(60 < x \leqslant 100\) | \(100 < x \leqslant 200\)
Frequency | 238 | 365 | 142 | 128 | 45
  1. Draw a histogram to illustrate the data. [5]
  2. Calculate an estimate of the mean income. [3]
  3. Calculate an estimate of the standard deviation of the incomes. [4]
  4. Use your answers to parts (ii) and (iii) to show there are almost certainly some outliers in the sample. Explain whether or not it would be appropriate to exclude the outliers from the calculation of the mean and the standard deviation. [4]
  5. The incomes were converted into another currency using the formula \(y = 1.15x\). Calculate estimates of the mean and variance of the incomes in the new currency. [3]
OCR MEI Paper 2 Specimen Q15
15 marks Standard +0.3
A quality control department checks the lifetimes of batteries produced by a company. The lifetimes, \(x\) minutes, for a random sample of 80 'Superstrength' batteries are shown in the table below.
Lifetime | \(160 \leq x < 165\) | \(165 \leq x < 168\) | \(168 \leq x < 170\) | \(170 \leq x < 172\) | \(172 \leq x < 175\) | \(175 \leq x < 180\)
Frequency | 5 | 14 | 20 | 21 | 16 | 4
  1. Estimate the proportion of these batteries which have a lifetime of at least 174.0 minutes. [2]
  2. Use the data in the table to estimate
    [3]
The data in the table above are represented in the following histogram, Fig. 15.
[Fig. 15: histogram of battery lifetimes; image not reproduced]
A quality control manager models the data by a Normal distribution with the mean and standard deviation you calculated in part (b).
  1. Comment briefly on whether the histogram supports this choice of model. [2]
    1. Use this model to estimate the probability that a randomly selected battery will have a lifetime of more than 174.0 minutes.
    2. Compare your answer with your answer to part (a). [3]
The company also manufactures 'Ultrapower' batteries, which are stated to have a mean lifetime of 210 minutes.
  1. A random sample of 8 Ultrapower batteries is selected. The mean lifetime of these batteries is 207.3 minutes. Carry out a hypothesis test at the 5% level to investigate whether the mean lifetime is as high as stated. You should use the following hypotheses \(\text{H}_0 : \mu = 210\), \(\text{H}_1 : \mu < 210\), where \(\mu\) represents the population mean for Ultrapower batteries. You should assume that the population is Normally distributed with standard deviation 3.4. [5]