Estimate mean and standard deviation from frequency table

Edexcel S1 2019 January Q4

4. A group of 100 adults recorded the amount of time, $t$ minutes, they spent exercising each day. Their results are summarised in the table below.

Time (t minutes)	Frequency (f)	Time midpoint (x)
$0 \leqslant t < 15$	25	7.5
$15 \leqslant t < 30$	17	22.5
$30 \leqslant t < 60$	28	45
$60 \leqslant t < 120$	24	90
$120 \leqslant t \leqslant 240$	6	180

[You may use $\sum \mathrm { f } x ^ { 2 } = 455\,512.5$]

A histogram is drawn to represent these data.
The bar representing the time $0 \leqslant t < 15$ has width 0.5 cm and height 6 cm.

Calculate the width and height of the bar representing a time of $60 \leqslant t < 120$
Use linear interpolation to estimate the median time spent exercising by these adults each day.
Find an estimate of the mean time spent exercising by these adults each day.
Calculate an estimate for the standard deviation of these times.
Describe, giving a reason, the skewness of these data.

Further analysis of the above data revealed that 18 of the 25 adults in the $0 \leqslant t < 15$ group took no exercise each day.
State, giving a reason, what effect, if any, this new information would have on your answers to
1. the estimate of the median in part (b),
2. the estimate of the mean in part (c),
3. the estimate of the standard deviation in part (d).
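
A minimal sketch of how the mean and standard deviation estimates in this question could be computed from the midpoints and frequencies in the table. The function name `grouped_mean_sd` is hypothetical, and the variance is taken with divisor $n$, which is an assumed convention rather than something stated in the question.

```python
from math import sqrt

# Midpoints x and frequencies f copied from the table above.
midpoints = [7.5, 22.5, 45, 90, 180]
freqs = [25, 17, 28, 24, 6]

def grouped_mean_sd(xs, fs):
    """Estimate mean and sd from class midpoints (hypothetical helper)."""
    n = sum(fs)
    sum_fx = sum(f * x for f, x in zip(fs, xs))
    sum_fx2 = sum(f * x * x for f, x in zip(fs, xs))
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2  # divisor n (assumed convention)
    return mean, sqrt(variance), sum_fx2

mean, sd, sum_fx2 = grouped_mean_sd(midpoints, freqs)
print(mean, round(sd, 2), sum_fx2)
```

The computed $\sum \mathrm{f}x^2$ agrees with the value 455 512.5 quoted in the question, which is a useful check that the midpoints have been entered correctly.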

Edexcel S1 2014 June Q2

The table below shows the distances (to the nearest km) travelled to work by the 50 employees in an office.

Distance (km)	Frequency (f)	Distance midpoint (x)
0-2	16	1.25
3-5	12	4
6-10	10	8
11-20	8	15.5
21-40	4	30.5

[You may use $\sum \mathrm { f } x = 394$, $\sum \mathrm { f } x ^ { 2 } = 6500$]

A histogram has been drawn to represent these data.
The bar representing the distance of 3-5 has a width of 1.5 cm and a height of 6 cm.

Calculate the width and height of the bar representing the distance of 6-10
Use linear interpolation to estimate the median distance travelled to work.
1. Show that an estimate of the mean distance travelled to work is 7.88 km.
2. Estimate the standard deviation of the distances travelled to work.
Describe, giving a reason, the skewness of these data.

Peng starts to work in this office as the $51 ^ { \text {st } }$ employee.
She travels a distance of 7.88 km to work.
Without carrying out any further calculations, state, giving a reason, what effect Peng's addition to the workforce would have on your estimates of the
1. mean,
2. median,
3. standard deviation
  of the distances travelled to work.
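
A hedged sketch of linear interpolation for the median of this table. The helper `interpolated_median` is hypothetical; it assumes the median is the $n/2$-th value and that distances recorded to the nearest km give class boundaries 0, 2.5, 5.5, 10.5, 20.5, 40.5 (consistent with the midpoints in the table).

```python
def interpolated_median(boundaries, freqs):
    """Median by linear interpolation; boundaries has one more entry than freqs."""
    target = sum(freqs) / 2  # n/2 convention (assumed)
    cumulative = 0
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if cumulative + f >= target:
            # Move into the class in proportion to the count still needed.
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequency table is empty")

# Class boundaries assumed from rounding to the nearest km.
bounds = [0, 2.5, 5.5, 10.5, 20.5, 40.5]
freqs = [16, 12, 10, 8, 4]
median = interpolated_median(bounds, freqs)
print(median)
```

Using $(n+1)/2$ instead of $n/2$ would give a slightly different estimate; which convention is expected depends on the mark scheme.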

OCR MEI AS Paper 2 2020 November Q2

2 A student measures the upper arm lengths of a sample of 97 women. The results are summarised in the frequency table in Fig. 2.1.
\begin{table}[h]

Arm length in cm	$30 -$	$31 -$	$32 -$	$33 -$	$34 -$	$35 -$	$36 -$	$37 -$	$38 -$	$39 -$	$40 - 41$
Frequency	1	4	5	9	13	19	17	17	4	3	5

\captionsetup{labelformat=empty} \caption{Fig. 2.1}

\end{table}
The student constructs two cumulative frequency diagrams to represent the data using different class intervals. These are shown in Fig. 2.2 opposite. One of these diagrams is correct and the other is incorrect.

State which diagram is incorrect, justifying your answer.
Use the correct diagram in Fig. 2.2 to find an estimate of the median.
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{c08a2212-3104-425e-8aee-7f2d46f23924-05_2256_1230_191_148} \captionsetup{labelformat=empty} \caption{Fig. 2.2}
\end{figure}

OCR MEI Paper 2 2018 June Q14

14 The pre-release material includes data on unemployment rates in different countries. A sample from this material has been taken. All the countries in the sample are in Europe. The data have been grouped and are shown in Fig. 14.1.
\begin{table}[h]

Unemployment rate	$0 -$	$5 -$	$10 -$	$15 -$	$20 -$	$35 - 50$
Frequency	15	21	5	5	2	2

\captionsetup{labelformat=empty} \caption{Fig. 14.1}

\end{table}
A cumulative frequency curve has been generated for the sample data using a spreadsheet. This is shown in Fig. 14.2.
\begin{figure}[h]

\includegraphics[alt={},max width=\textwidth]{d8ff9511-aff7-45ea-ba55-e6667e8ba760-08_639_1081_808_466} \captionsetup{labelformat=empty} \caption{Fig. 14.2}

\end{figure}
Hodge used Fig. 14.2 to estimate the median unemployment rate in Europe. He obtained the answer 5.0. The correct value for this sample is 6.9.

(A) There is a systematic error in the diagram.
- Identify this error.
- State how this error affects Hodge’s estimate.

(B) There is another factor which has affected Hodge’s estimate.
- Identify this factor.
- State how this factor affects Hodge’s estimate.

Use your knowledge of the pre-release material to give another reason why any estimation of the median unemployment rate in Europe may be unreliable.
Use your knowledge of the pre-release material to explain why it is very unlikely that the sample has been randomly selected from the pre-release material.
The scatter diagram shown in Fig. 14.3 shows the unemployment rate and life expectancy at birth for the 47 countries in the sample for which this information is available. \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{Scatter diagram to show life expectancy at birth against unemployment rate} \includegraphics[alt={},max width=\textwidth]{d8ff9511-aff7-45ea-ba55-e6667e8ba760-09_627_1281_456_367}
\end{figure}
Fig. 14.3

The product moment correlation coefficient for the 47 items in the sample is $-0.2607$.
The $p$-value associated with $r = -0.2607$ and $n = 47$ is 0.0383.
Does this information suggest that there is an association between unemployment rate and life expectancy at birth in countries in Europe?

Hodge uses the spreadsheet tools to obtain the equation of a line of best fit for this data.
The unemployment rate in Kosovo is 35.3, but there is no data available on life expectancy. Is it reasonable to use Hodge’s line of best fit to estimate life expectancy at birth in Kosovo?

Edexcel S1 2018 June Q5

5. The weights, in grams, of a random sample of 48 broad beans are summarised in the table.

Weight in grams ( $\boldsymbol { x }$ )	Frequency (f)	Class midpoint (y)
$0.9 < x \leqslant 1.1$	9	1.0
$1.1 < x \leqslant 1.3$	12	1.2
$1.3 < x \leqslant 1.5$	11	1.4
$1.5 < x \leqslant 1.7$	8	1.6
$1.7 < x \leqslant 1.9$	3	1.8
$1.9 < x \leqslant 2.1$	3	2.0
$2.1 < x \leqslant 2.7$	2	2.4

(You may assume $\sum \mathrm { f } y ^ { 2 } = 101.56$)

A histogram was drawn to represent these data. The $2.1 < x \leqslant 2.7$ class was represented by a bar of width 1.5 cm and height 1 cm.

Find the width and height of the $0.9 < x \leqslant 1.1$ class.
Give a reason to justify the use of a histogram to represent these data.
Estimate the mean and the standard deviation of the weights of these broad beans.
Use linear interpolation to estimate the median of the weights of these broad beans.

One of these broad beans is selected at random.
Estimate the probability that its weight lies between 1.1 grams and 1.6 grams.

One of these broad beans having a recorded weight of 0.95 grams was incorrectly weighed. The correct weight is 1.4 grams.
State, giving a reason, the effect this would have on your answers to part (c). Do not carry out any further calculations.

Edexcel S1 2021 June Q3

A random sample of 100 carrots is taken from a farm and their lengths, $L \mathrm {~cm}$, recorded. The data are summarised in the following table.

Length, $L$ cm	Frequency, f	Class mid point, $\boldsymbol { x } \mathbf { c m }$
$5 \leqslant L < 8$	5	6.5
$8 \leqslant L < 10$	13	9
$10 \leqslant L < 12$	16	11
$12 \leqslant L < 15$	25	13.5
$15 \leqslant L < 20$	30	17.5
$20 \leqslant L < 28$	11	24

A histogram is drawn to represent these data.
The bar representing the class $5 \leqslant L < 8$ is 1.5 cm wide and 1 cm high.

Find the width and height of the bar representing the class $15 \leqslant L < 20$
Use linear interpolation to estimate the median length of these carrots.
Estimate
1. the mean length of these carrots,
2. the standard deviation of the lengths of these carrots.

A supermarket will only buy carrots with length between 9 cm and 22 cm.
Estimate the proportion of carrots from the farm that the supermarket will buy.

Any carrots that the supermarket does not buy are sold as animal feed. The farm makes a profit of 2.2 pence on each carrot sold to the supermarket, a profit of 0.8 pence on each carrot longer than 22 cm and a loss of 1.2 pence on each carrot shorter than 9 cm.
Find an estimate of the mean profit per carrot made by the farm.
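
A small sketch of the histogram scaling used in the first part of this question, assuming bar width is proportional to class width and bar area is proportional to frequency. The function `bar_dimensions` and its parameter names are hypothetical.

```python
def bar_dimensions(ref_class_width, ref_freq, ref_bar_width, ref_bar_height,
                   class_width, freq):
    """Scale a histogram bar from a reference bar (hypothetical helper).

    Assumes width is proportional to class width and area to frequency.
    """
    cm_per_unit = ref_bar_width / ref_class_width            # cm of paper per cm of length
    area_per_item = ref_bar_width * ref_bar_height / ref_freq  # cm^2 per carrot
    width = class_width * cm_per_unit
    height = freq * area_per_item / width
    return width, height

# Reference bar: class 5 <= L < 8 (class width 3, frequency 5), 1.5 cm wide, 1 cm high.
# Target bar: class 15 <= L < 20 (class width 5, frequency 30).
width, height = bar_dimensions(3, 5, 1.5, 1, 5, 30)
print(round(width, 2), round(height, 2))
```

Under these assumptions the target bar comes out at about 2.5 cm wide and 3.6 cm high.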

Edexcel S1 2018 October Q3

3. The parking times, $t$ hours, for cars in a car park are summarised below.

Time (t hours)	Frequency (f)	Time midpoint (m)
$0 \leqslant t < 1$	10	0.5
$1 \leqslant t < 2$	18	1.5
$2 \leqslant t < 4$	15	3
$4 \leqslant t < 6$	12	5
$6 \leqslant t < 12$	5	9

(You may use $\sum \mathrm { f } m = 182$ and $\sum \mathrm { f } m ^ { 2 } = 883$)

A histogram is drawn to represent these data.
The bar representing the time $1 \leqslant t < 2$ has a width of 1.5 cm and a height of 6 cm .

Calculate the width and the height of the bar representing the time $4 \leqslant t < 6$
Use linear interpolation to estimate the median parking time for the cars in the car park.
Estimate the mean and the standard deviation of the parking time for the cars in the car park.
Describe, giving a reason, the skewness of the data.

One of these cars is selected at random.
Estimate the probability that this car is parked for more than 75 minutes.

Edexcel S1 Specimen Q5

A teacher selects a random sample of 56 students and records, to the nearest hour, the time spent watching television in a particular week.

Hours	$1 - 10$	$11 - 20$	$21 - 25$	$26 - 30$	$31 - 40$	$41 - 59$
Frequency	6	15	11	13	8	3
Mid-point	5.5	15.5		28		50

Find the mid-points of the 21-25 hour and 31-40 hour groups.

A histogram was drawn to represent these data. The 11-20 group was represented by a bar of width 4 cm and height 6 cm.
Find the width and height of the 26-30 group.
Estimate the mean and standard deviation of the time spent watching television by these students.
Use linear interpolation to estimate the median length of time spent watching television by these students.

The teacher estimated the lower quartile and the upper quartile of the time spent watching television to be 15.8 and 29.3 respectively.
State, giving a reason, the skewness of these data.

Edexcel S1 2013 January Q5

A survey of 100 households gave the following results for weekly income $\pounds y$.

Income $y$ (£)	Mid-point	Frequency $f$
$0 \leqslant y < 200$	100	12
$200 \leqslant y < 240$	220	28
$240 \leqslant y < 320$	280	22
$320 \leqslant y < 400$	360	18
$400 \leqslant y < 600$	500	12
$600 \leqslant y < 800$	700	8

(You may use $\sum f y ^ { 2 } = 12\,452\,800$)

A histogram was drawn and the class $200 \leqslant y < 240$ was represented by a rectangle of width 2 cm and height 7 cm.

Calculate the width and the height of the rectangle representing the class $320 \leqslant y < 400$
Use linear interpolation to estimate the median weekly income to the nearest pound.
Estimate the mean and the standard deviation of the weekly income for these data.

One measure of skewness is $\frac { 3 ( \text { mean } - \text { median } ) } { \text { standard deviation } }$.
Use this measure to calculate the skewness for these data and describe its value.

Katie suggests using the random variable $X$ which has a normal distribution with mean 320 and standard deviation 150 to model the weekly income for these data.
Find $\mathrm { P } ( 240 < X < 400 )$.
With reference to your calculations in parts (d) and (e) and the data in the table, comment on Katie's suggestion.
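
As an illustration of the normal-model part of this question, the sketch below evaluates $\mathrm{P}(240 < X < 400)$ under Katie's suggested model and sets it beside the proportion of households the table places in the same range. It uses Python's standard `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Katie's suggested model: mean 320, standard deviation 150.
model = NormalDist(mu=320, sigma=150)
p_model = model.cdf(400) - model.cdf(240)

# Proportion of the 100 households in 240 <= y < 400 according to the table
# (classes 240-320 and 320-400, frequencies 22 and 18).
p_table = (22 + 18) / 100

print(round(p_model, 3), p_table)
```

This comparison covers only one interval; judging the model also requires the skewness found earlier in the question.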

Edexcel S1 2013 June Q3

3. An agriculturalist is studying the yields, $y \mathrm {~kg}$, from tomato plants. The data from a random sample of 70 tomato plants are summarised below.

Yield ( $y \mathrm {~kg}$ )	Frequency (f)	Yield midpoint ( $x \mathrm {~kg}$ )
$0 \leqslant y < 5$	16	2.5
$5 \leqslant y < 10$	24	7.5
$10 \leqslant y < 15$	14	12.5
$15 \leqslant y < 25$	12	20
$25 \leqslant y < 35$	4	30

(You may use $\sum \mathrm { f } x = 755$ and $\sum \mathrm { f } x ^ { 2 } = 12037.5$)

A histogram has been drawn to represent these data. The bar representing the yield $5 \leqslant y < 10$ has a width of 1.5 cm and a height of 8 cm.

Calculate the width and the height of the bar representing the yield $15 \leqslant y < 25$
Use linear interpolation to estimate the median yield of the tomato plants.
Estimate the mean and the standard deviation of the yields of the tomato plants.
Describe, giving a reason, the skewness of the data.
Estimate the number of tomato plants in the sample that have a yield of more than 1 standard deviation above the mean.

Edexcel S1 2013 June Q4

4. The following table summarises the times, $t$ minutes to the nearest minute, recorded for a group of students to complete an exam.

Time (minutes) $t$	$11 - 20$	$21 - 25$	$26 - 30$	$31 - 35$	$36 - 45$	$46 - 60$
Number of students f	62	88	16	13	11	10

$$\text { [You may use } \sum \mathrm { f } t ^ { 2 } = 134281.25 \text { ] }$$

Estimate the mean and standard deviation of these data.
Use linear interpolation to estimate the value of the median.
Show that the estimated value of the lower quartile is 18.6 to 3 significant figures.
Estimate the interquartile range of this distribution.
Give a reason why the mean and standard deviation are not the most appropriate summary statistics to use with these data.

The person timing the exam made an error and each student actually took 5 minutes less than the times recorded above. The table below summarises the actual times.

Time (minutes) $t$	$6 - 15$	$16 - 20$	$21 - 25$	$26 - 30$	$31 - 40$	$41 - 55$
Number of students f	62	88	16	13	11	10

Without further calculations, explain the effect this would have on each of the estimates found in parts (a), (b), (c) and (d).

Edexcel S1 2016 June Q5

5. A midwife records the weights, in kg, of a sample of 50 babies born at a hospital. Her results are given in the table below.

Weight ( $\boldsymbol { w } \mathbf { ~ k g }$ )	Frequency (f)	Weight midpoint (x)
$0 \leqslant w < 2$	1	1
$2 \leqslant w < 3$	8	2.5
$3 \leqslant w < 3.5$	17	3.25
$3.5 \leqslant w < 4$	17	3.75
$4 \leqslant w < 5$	7	4.5

[You may use $\sum \mathrm { f } x ^ { 2 } = 611.375$]

A histogram has been drawn to represent these data. The bar representing the weight $2 \leqslant w < 3$ has a width of 1 cm and a height of 4 cm.

Calculate the width and height of the bar representing a weight of $3 \leqslant w < 3.5$
Use linear interpolation to estimate the median weight of these babies.
1. Show that an estimate of the mean weight of these babies is 3.43 kg.
2. Find an estimate of the standard deviation of the weights of these babies.

Shyam decides to model the weights of babies born at the hospital by the random variable $W$, where $W \sim \mathrm { N } \left( 3.43, 0.65 ^ { 2 } \right)$
Find $\mathrm { P } ( W < 3 )$
With reference to your answers to (b), (c)(i) and (d) comment on Shyam's decision.

A newborn baby weighing 3.43 kg is born at the hospital.
Without carrying out any further calculations, state, giving a reason, what effect the addition of this newborn baby to the sample would have on your estimate of the
1. mean,
2. standard deviation.

Edexcel S1 2017 June Q2

2. An estate agent is studying the cost of office space in London. He takes a random sample of 90 offices and calculates the cost, $\pounds x$ per square foot. His results are given in the table below.

Cost (£ $\boldsymbol { x }$ )	Frequency (f)	Midpoint (£y)
$20 \leqslant x < 40$	12	30
$40 \leqslant x < 45$	13	42.5
$45 \leqslant x < 50$	25	47.5
$50 \leqslant x < 60$	32	55
$60 \leqslant x < 80$	8	70

A histogram is drawn for these data and the bar representing $50 \leqslant x < 60$ is 2 cm wide and 8 cm high.

Calculate the width and height of the bar representing $20 \leqslant x < 40$
Use linear interpolation to estimate the median cost.
Estimate the mean cost of office space for these data.
Estimate the standard deviation for these data.
Describe, giving a reason, the skewness.

Rika suggests that the cost of office space in London can be modelled by a normal distribution with mean $\pounds 50$ and standard deviation $\pounds 10$.
With reference to your answer to part (e), comment on Rika's suggestion.
Use Rika's model to estimate the 80th percentile of the cost of office space in London.

Edexcel S1 2018 June Q2

2. The following grouped frequency distribution summarises the number of minutes, to the nearest minute, that a random sample of 100 motorists were delayed by roadworks on a stretch of motorway one Monday.

Delay (minutes)	Number of motorists (f)	Delay midpoint (x)
3-6	38	4.5
7-8	25	7.5
9-10	18	9.5
11-15	12	13
16-20	7	18

(You may use $\sum \mathrm { f } x ^ { 2 } = 8096.25$)

A histogram has been drawn to represent these data. The bar representing a delay of (3-6) minutes has a width of 2 cm and a height of 9.5 cm.

Calculate the width and the height of the bar representing a delay of (11-15) minutes.
Use linear interpolation to estimate the median delay.
Calculate an estimate of the mean delay.
Calculate an estimate of the standard deviation of the delays.

One coefficient of skewness is given by $\frac { 3 ( \text { mean } - \text { median } ) } { \text { standard deviation } }$
Evaluate this coefficient for the above data, giving your answer to 2 significant figures.

On the following Friday, the coefficient of skewness for the delays on this stretch of motorway was $-0.22$.
State, giving a reason, how the delays on this stretch of motorway on Friday are different from the delays on Monday.

Edexcel S1 Q1

A net was used to catch swallows so that they could be ringed and examined. The weights of 55 adult birds were recorded and the results are summarised in the table below.

Weight (g)	$14 - 19$	$20 - 21$	$22 - 23$	$24 - 25$	$26 - 29$	$30 - 35$
Frequency	3	6	15	20	9	2

For these data calculate estimates of
1. the median,
2. the $33 ^ { \text {rd } }$ percentile.

These data are represented by a histogram and the bar representing the 24-25 group is 1 cm wide and 20 cm high.
Calculate the dimensions of the bars representing the groups
1. 20-21
2. 26-29

AQA S1 2015 June Q2

6 marks

2 The table summarises the diameters, $d$ millimetres, of a random sample of 60 new cricket balls to be used in junior cricket.

Edexcel AS Paper 2 Specimen Q1

A company manager is investigating the time taken, $t$ minutes, to complete an aptitude test. The human resources manager produced the table below of coded times, $x$ minutes, for a random sample of 30 applicants.

Coded time ( $x$ minutes)	Frequency (f)	Coded time midpoint (y minutes)
$0 \leq x < 5$	3	2.5
$5 \leq x < 10$	15	7.5
$10 \leq x < 15$	2	12.5
$15 \leq x < 25$	9	20
$25 \leq x < 35$	1	30

(You may use $\sum f y = 355$ and $\sum f y ^ { 2 } = 5675$ )

Use linear interpolation to estimate the median of the coded times.
Estimate the standard deviation of the coded times.

The company manager is told by the human resources manager that he subtracted 15 from each of the times and then divided by 2, to calculate the coded times.
Calculate an estimate for the median and the standard deviation of $t$.

The following year, the company has 25 positions available. The company manager decides not to offer a position to any applicant who takes 35 minutes or more to complete the aptitude test. The company has 60 applicants.
Comment on whether or not the company manager's decision will result in the company being able to fill the 25 positions available from these 60 applicants. Give a reason for your answer.
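
A minimal sketch of undoing the coding in this question. Since the coded time is $x = (t - 15)/2$, the original time is $t = 2x + 15$, so a median transforms the same way while a standard deviation is only scaled by 2. The variable names are illustrative, and the $n/2$ convention for the median and divisor $n$ for the variance are assumptions.

```python
from math import sqrt

# Summary values given in the question for the coded midpoints y.
n, sum_fy, sum_fy2 = 30, 355, 5675

# Standard deviation of the coded times (divisor n, assumed convention).
sd_coded = sqrt(sum_fy2 / n - (sum_fy / n) ** 2)

# Median of the coded times: the 15th value lies in 5 <= x < 10,
# which holds 15 values after the 3 in the first class.
median_coded = 5 + (n / 2 - 3) / 15 * 5

def decode(x):
    """Invert the coding x = (t - 15) / 2."""
    return 2 * x + 15

median_t = decode(median_coded)  # location and scale both apply
sd_t = 2 * sd_coded              # only the scale factor affects spread

print(round(median_t, 2), round(sd_t, 2))
```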
