2.02f Measures of average and spread

447 questions

AQA S1 2010 January Q2
8 marks Moderate -0.8
2 Lizzie, the receptionist at a dental practice, was asked to keep a weekly record of the number of patients who failed to turn up for an appointment. Her records for the first 15 weeks were as follows. $$\begin{array} { l l l l l l l l l l l l l l l } 20 & 26 & 32 & a & 37 & 14 & 27 & 34 & 15 & 18 & b & 25 & 37 & 29 & 25 \end{array}$$ Unfortunately, Lizzie forgot to record the actual values for two of the 15 weeks, so she recorded them as \(a\) and \(b\). However, she did remember that \(a < 10\) and that \(b > 40\).
  1. Calculate the median and the interquartile range of these 15 values.
  2. Give a reason why, for these data:
    1. the mode is not an appropriate measure of average;
    2. the standard deviation cannot be used as a measure of spread.
  3. Subsequent investigations revealed that the missing values were 8 and 43. Calculate the mean and the standard deviation of the 15 values.
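As a rough check on the last part, the sketch below substitutes \(a = 8\) and \(b = 43\) into the 15 recorded values and computes the mean and standard deviation with Python's standard `statistics` module. The variable names are illustrative only, and both divisors (\(n\) and \(n - 1\)) are shown because the question does not say which is intended.

```python
from statistics import mean, pstdev, stdev

# Weekly counts from the question, with a = 8 and b = 43 substituted in place.
values = [20, 26, 32, 8, 37, 14, 27, 34, 15, 18, 43, 25, 37, 29, 25]

n = len(values)
mean_value = mean(values)

# Standard deviation with divisor n (the 15 weeks treated as the whole data set).
sd_divisor_n = pstdev(values)

# Standard deviation with divisor n - 1 (the 15 weeks treated as a sample).
sd_divisor_n_minus_1 = stdev(values)

print(f"n = {n}, mean = {mean_value}")
print(f"sd (divisor n) = {sd_divisor_n:.2f}")
print(f"sd (divisor n - 1) = {sd_divisor_n_minus_1:.2f}")
```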
AQA S1 2005 June Q6
12 marks Standard +0.3
6 On arrival at a business centre, all visitors are required to register at the reception desk. An analysis of the register, for a random sample of 100 days, results in the following information on the number, \(X\), of visitors per day.
| Number of visitors per day | Number of days |
|---|---|
| 1-10 | 13 |
| 11-20 | 33 |
| 21-25 | 17 |
| 26-30 | 12 |
| 31-35 | 8 |
| 36-40 | 5 |
| 41-50 | 5 |
| 51-100 | 7 |
| Total | 100 |
  1. Calculate an estimate of:
    1. \(\mu\), the mean number of visitors per day;
    2. \(\sigma\), the standard deviation of the number of visitors per day.
  2. Give a reason, based upon the data provided, why \(X\) is unlikely to be normally distributed.
    1. Give a reason why \(\bar { X }\), the mean of a random sample of 100 observations on \(X\), may be assumed to be normally distributed.
    2. State, in terms of \(\mu\) and \(\sigma\), the mean and variance of \(\bar { X }\).
  3. Hence construct a \(99 \%\) confidence interval for \(\mu\).
  4. The receptionist claims that she registers on average more than 30 visitors per day, and frequently registers more than 50 visitors on any one day. Comment on each of these two claims.
AQA S1 2015 June Q1
6 marks Easy -1.2
1 The number of passengers getting off the 11.45 am train at a railway station on each of 35 days is summarised as follows.
AQA S1 2015 June Q2
6 marks Easy -1.2
2 The table summarises the diameters, \(d\) millimetres, of a random sample of 60 new cricket balls to be used in junior cricket.
OCR S1 Q5
13 marks Moderate -0.8
5 The examination marks obtained by 1200 candidates are illustrated on the cumulative frequency graph, where the data points are joined by a smooth curve. \includegraphics[max width=\textwidth, alt={}, center]{11316ea6-3999-4003-b77d-bee8b547c1da-04_1335_1319_404_413} Use the curve to estimate
  1. the interquartile range of the marks,
  2. \(x\), if \(40 \%\) of the candidates scored more than \(x\) marks,
  3. the number of candidates who scored more than 68 marks.
    Five of the candidates are selected at random, with replacement.
  4. Estimate the probability that all five scored more than 68 marks.
    It is subsequently discovered that the candidates' marks in the range 35 to 55 were evenly distributed - that is, roughly equal numbers of candidates scored \(35,36,37 , \ldots , 55\).
  5. What does this information suggest about the estimate of the interquartile range found in part (i)?
AQA S2 2009 June Q2
14 marks Moderate -0.3
2 John works from home. The number of business letters, \(X\), that he receives on a weekday may be modelled by a Poisson distribution with mean 5.0. The number of private letters, \(Y\), that he receives on a weekday may be modelled by a Poisson distribution with mean 1.5.
  1. Find, for a given weekday:
    1. \(\mathrm { P } ( X < 4 )\);
    2. \(\quad \mathrm { P } ( Y = 4 )\).
    1. Assuming that \(X\) and \(Y\) are independent random variables, determine the probability that, on a given weekday, John receives a total of more than 5 business and private letters.
    2. Hence calculate the probability that John receives a total of more than 5 business and private letters on at least 7 out of 8 given weekdays.
  2. The numbers of letters received by John's neighbour, Brenda, on 10 consecutive weekdays are $$\begin{array} { l l l l l l l l l l } 15 & 8 & 14 & 7 & 6 & 8 & 2 & 8 & 9 & 3 \end{array}$$
    1. Calculate the mean and the variance of these data.
    2. State, giving a reason based on your answers to part (c)(i), whether or not a Poisson distribution might provide a suitable model for the number of letters received by Brenda on a weekday.
OCR H240/02 2022 June Q9
14 marks Standard +0.3
9 The heights, in centimetres, of a random sample of 150 plants of a certain variety were measured. The results are summarised in the histogram. \includegraphics[max width=\textwidth, alt={}, center]{cb83836f-753f-4b3a-99e8-a18aff0f49ff-08_842_1651_495_207} One of the 150 plants is chosen at random, and its height, \(X \mathrm {~cm}\), is noted.
  1. Show that \(\mathrm { P } ( 20 < X < 30 ) = 0.147\), correct to 3 significant figures.
    Sam suggests that the distribution of \(X\) can be well modelled by the distribution \(\mathrm { N } ( 40,100 )\).
    1. Give a brief justification for the use of the normal distribution in this context.
    2. Give a brief justification for the choice of the parameter values 40 and 100.
  2. Use Sam's model to find \(\mathrm { P } ( 20 < X < 30 )\).
    Nina suggests a different model. She uses the midpoints of the classes to calculate estimates, \(m\) and \(s\), for the mean and standard deviation respectively, in centimetres, of the 150 heights. She then uses the distribution \(\mathrm { N } \left( m , s ^ { 2 } \right)\) as her model.
  3. Use Nina's model to find \(\mathrm { P } ( 20 < X < 30 )\).
    1. Complete the table in the Printed Answer Booklet to show the probabilities obtained from Sam's model and Nina's model.
    2. By considering the different ranges of values of \(X\) given in the table, discuss how well the two models fit the original distribution.
AQA AS Paper 2 2021 June Q16
5 marks Easy -1.2
16 An analysis was carried out using the Large Data Set to compare the \(\mathrm { CO } _ { 2 }\) emissions (in g/km) from 2002 and 2016. The summary statistics for the \(\mathrm { CO } _ { 2 }\) emissions, \(X\), for all cars registered as owned by either females or males is given in the table below.
| | 2002 | 2016 |
|---|---|---|
| \(\sum x\) | 207901 | 142103 |
| Sample size | 1215 | 1144 |
  1. Find the reduction in the mean of the \(\mathrm { CO } _ { 2 }\) emissions in 2016 compared to the mean of the \(\mathrm { CO } _ { 2 }\) emissions in 2002.
  2. It is claimed that the move to more electric and gas/petrol powered cars has caused the reduction in the mean \(\mathrm { CO } _ { 2 }\) emissions found in part (a). Using your knowledge of the Large Data Set, state whether you agree with this claim.
    Give a reason for your answer.
  3. There are 3827 data values in the Large Data Set. It is claimed that the data in the table above must have been summarised incorrectly.
    1. Explain why this claim is being made.
    2. State whether this claim is correct.
      Give a reason for your answer.
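The arithmetic behind parts (a) and (c) can be sketched as follows. The code assumes the table entries read as \(\sum x = 207901\) and \(142103\) with sample sizes 1215 and 1144; the dictionary names are hypothetical.

```python
# Summary statistics read from the table: sum of x and sample size for each year.
sum_x = {2002: 207901, 2016: 142103}
sample_size = {2002: 1215, 2016: 1144}

# Mean emissions per year (g/km) = sum of x divided by the sample size.
mean_by_year = {year: sum_x[year] / sample_size[year] for year in sum_x}

# Part (a): reduction in the mean from 2002 to 2016.
reduction = mean_by_year[2002] - mean_by_year[2016]

# Part (c): the two sample sizes together, for comparison with the 3827 values quoted.
total_in_table = sum(sample_size.values())

print(f"mean 2002 = {mean_by_year[2002]:.1f}, mean 2016 = {mean_by_year[2016]:.1f}")
print(f"reduction = {reduction:.1f} g/km")
print(f"cars in table = {total_in_table} (out of 3827 data values)")
```

The code only reproduces the numbers. Whether the shortfall against 3827 indicates an error is the judgement the question asks for.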
AQA AS Paper 2 2022 June Q11
1 marks Easy -1.8
11 Which of the terms below best describes the distribution represented by the boxplot shown in Figure 1? \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{Figure 1} \includegraphics[alt={},max width=\textwidth]{11168e8f-5ba5-4d27-83ab-0327cc23d08c-14_154_831_927_584}
\end{figure} \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{Figure 1} \includegraphics[alt={},max width=\textwidth]{11168e8f-5ba5-4d27-83ab-0327cc23d08c-14_76_1143_1151_450}
\end{figure} Circle your answer.
even
negatively skewed
positively skewed
symmetric
AQA AS Paper 2 2022 June Q13
6 marks Moderate -0.8
13 Two random samples of 12 NOX emissions (in \(\mathrm { g } / \mathrm { km }\) ) were taken from the Large Data Set. One sample was taken from the 2002 data and the other sample from the 2016 data.
The sample data are shown below:
| Year | NOX emissions (g/km) |
|---|---|
| 2002 | 0.031, 0.019, 0.091, 0.025, 0.030, 0.061, 0.047, 0.029, 0.059, 0.363, 0.330, 0.376 |
| 2016 | 0.005, 0.047, 0.053, 0.063, 0.026, 0.013, 0.058, 0.012, 0.010, 0.010, 0.008, 0.008 |
The mean and standard deviation of the 2002 sample data are 0.122 and 0.137 respectively.
  1. Find the mean and standard deviation of the 2016 sample data, giving your answers correct to three decimal places.
  2. Siti claims these samples show that, on average, the NOX emissions across all makes of car in all areas of the UK have fallen by over 75\% between 2002 and 2016.
    1. Show how Siti's claim of 'over 75\%' has been obtained.
    2. Using your knowledge of the Large Data Set, make two comments on the validity of Siti's claim.
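A minimal sketch of the calculations in parts (a) and (b)(i), using Python's standard `statistics` module, is given here. It assumes the twelve values per year are as listed in the sample data. The 2002 list is included only to check that the quoted 0.137 corresponds to the divisor-\(n\) standard deviation, so the same divisor is used for 2016.

```python
from statistics import mean, pstdev

# Sample data (g/km) as listed in the question.
sample_2002 = [0.031, 0.019, 0.091, 0.025, 0.030, 0.061,
               0.047, 0.029, 0.059, 0.363, 0.330, 0.376]
sample_2016 = [0.005, 0.047, 0.053, 0.063, 0.026, 0.013,
               0.058, 0.012, 0.010, 0.010, 0.008, 0.008]

# Check: the quoted 2002 figures (0.122 and 0.137) match the divisor-n formula.
mean_2002 = mean(sample_2002)
sd_2002 = pstdev(sample_2002)

# Part (a): the same statistics for the 2016 sample.
mean_2016 = mean(sample_2016)
sd_2016 = pstdev(sample_2016)

# Part (b)(i): percentage fall in the sample mean between 2002 and 2016.
percentage_fall = 100 * (mean_2002 - mean_2016) / mean_2002

print(f"2002: mean = {mean_2002:.3f}, sd = {sd_2002:.3f}")
print(f"2016: mean = {mean_2016:.3f}, sd = {sd_2016:.3f}")
print(f"fall in mean = {percentage_fall:.1f}%")
```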
Edexcel AS Paper 2 2018 June Q4
8 marks Moderate -0.8
  1. Helen is studying the daily mean wind speed for Camborne using the large data set from 1987. The data for one month are summarised in Table 1 below.
| Windspeed | n/a | 6 | 7 | 8 | 9 | 11 | 12 | 13 | 14 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 13 | 2 | 3 | 2 | 2 | 3 | 1 | 2 | 1 | 2 |

Table 1
  1. Calculate the mean for these data.
  2. Calculate the standard deviation for these data and state the units.
    The means and standard deviations of the daily mean wind speed for the other months from the large data set for Camborne in 1987 are given in Table 2 below. The data are not in month order.

    | Month | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) |
    |---|---|---|---|---|---|
    | Mean | 7.58 | 8.26 | 8.57 | 8.57 | 11.57 |
    | Standard Deviation | 2.93 | 3.89 | 3.46 | 3.87 | 4.64 |

    Table 2
  3. Using your knowledge of the large data set, suggest, giving a reason, which month had a mean of 11.57.
    The data for these months are summarised in the box plots on the opposite page. They are not in month order or the same order as in Table 2.
    1. State the meaning of the * symbol on some of the box plots.
    2. Suggest, giving your reasons, which of the months in Table 2 is most likely to be summarised in the box plot marked \(Y\). \includegraphics[max width=\textwidth, alt={}, center]{2edcf965-9c93-4a9b-9395-2d3c023801af-11_1177_1216_324_427}
Edexcel AS Paper 2 Specimen Q1
9 marks Moderate -0.8
  1. A company manager is investigating the time taken, \(t\) minutes, to complete an aptitude test. The human resources manager produced the table below of coded times, \(x\) minutes, for a random sample of 30 applicants.
| Coded time (\(x\) minutes) | Frequency (f) | Coded time midpoint (y minutes) |
|---|---|---|
| \(0 \leq x < 5\) | 3 | 2.5 |
| \(5 \leq x < 10\) | 15 | 7.5 |
| \(10 \leq x < 15\) | 2 | 12.5 |
| \(15 \leq x < 25\) | 9 | 20 |
| \(25 \leq x < 35\) | 1 | 30 |
(You may use \(\sum f y = 355\) and \(\sum f y ^ { 2 } = 5675\) )
  1. Use linear interpolation to estimate the median of the coded times.
  2. Estimate the standard deviation of the coded times.
    The company manager is told by the human resources manager that he subtracted 15 from each of the times and then divided by 2, to calculate the coded times.
  3. Calculate an estimate for the median and the standard deviation of \(t\).
    The following year, the company has 25 positions available. The company manager decides not to offer a position to any applicant who takes 35 minutes or more to complete the aptitude test. The company has 60 applicants.
  4. Comment on whether or not the company manager's decision will result in the company being able to fill the 25 positions available from these 60 applicants. Give a reason for your answer.
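The interpolation and decoding steps in parts (a) to (c) can be sketched as below. This is an illustration under two stated assumptions: the median is taken at the \(n/2\)-th value, and the coding is \(x = (t - 15)/2\), so \(t = 2x + 15\). The helper `interpolated_median` is a hypothetical name, not part of any library.

```python
from math import sqrt

# (lower bound, upper bound, frequency) for each coded-time class in the table.
classes = [(0, 5, 3), (5, 10, 15), (10, 15, 2), (15, 25, 9), (25, 35, 1)]
n = sum(f for _, _, f in classes)

# Summations given in the question.
sum_fy, sum_fy2 = 355, 5675


def interpolated_median(classes, n):
    """Linear interpolation for the n/2-th value within its class."""
    target = n / 2
    cumulative = 0
    for lower, upper, f in classes:
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("target position not reached")


# Parts (a) and (b): median and standard deviation of the coded times x.
median_x = interpolated_median(classes, n)
sd_x = sqrt(sum_fy2 / n - (sum_fy / n) ** 2)

# Part (c): undo the coding. The median is shifted and scaled; the sd is only scaled.
median_t = 2 * median_x + 15
sd_t = 2 * sd_x

print(f"coded: median = {median_x:.1f}, sd = {sd_x:.2f}")
print(f"actual: median = {median_t:.1f}, sd = {sd_t:.2f}")
```

Subtracting 15 leaves the spread unchanged, which is why only the factor 2 appears in `sd_t`.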
Edexcel Paper 3 2018 June Q4
13 marks Easy -1.3
  1. Charlie is studying the time it takes members of his company to travel to the office. He stands by the door to the office from 0840 to 0850 one morning and asks workers, as they arrive, how long their journey was.
    1. State the sampling method Charlie used.
    2. State and briefly describe an alternative method of non-random sampling Charlie could have used to obtain a sample of 40 workers.
    Taruni decided to ask every member of the company the time, \(x\) minutes, it takes them to travel to the office.
  2. State the data selection process Taruni used.
    Taruni's results are summarised by the box plot and summary statistics below. \includegraphics[max width=\textwidth, alt={}, center]{65e4b254-fb7b-45c2-9702-32f034018193-10_378_1349_1050_367} $$n = 95 \quad \sum x = 4133 \quad \sum x ^ { 2 } = 202294$$
  3. Write down the interquartile range for these data.
  4. Calculate the mean and the standard deviation for these data.
  5. State, giving a reason, whether you would recommend using the mean and standard deviation or the median and interquartile range to describe these data.
    Rana and David both work for the company and have both moved house since Taruni collected her data. Rana's journey to work has changed from 75 minutes to 35 minutes and David's journey to work has changed from 60 minutes to 33 minutes. Taruni drew her box plot again and only had to change two values.
  6. Explain which two values Taruni must have changed and whether each of these values has increased or decreased.
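For part (d), the mean and standard deviation follow directly from the summary statistics via \(\bar{x} = \sum x / n\) and \(\sigma^2 = \sum x^2 / n - \bar{x}^2\). The short sketch below assumes the divisor-\(n\) form of the variance; the variable names are illustrative.

```python
from math import sqrt

# Summary statistics given in the question.
n = 95
sum_x = 4133
sum_x2 = 202294

# Mean, then variance as "mean of the squares minus square of the mean".
mean_x = sum_x / n
variance_x = sum_x2 / n - mean_x ** 2
sd_x = sqrt(variance_x)

print(f"mean = {mean_x:.1f} minutes, sd = {sd_x:.1f} minutes")
```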
Edexcel Paper 3 Specimen Q1
13 marks Easy -1.3
  1. The number of hours of sunshine each day, \(y\), for the month of July at Heathrow are summarised in the table below.
| Hours | \(0 \leqslant y < 5\) | \(5 \leqslant y < 8\) | \(8 \leqslant y < 11\) | \(11 \leqslant y < 12\) | \(12 \leqslant y < 14\) |
|---|---|---|---|---|---|
| Frequency | 12 | 6 | 8 | 3 | 2 |
A histogram was drawn to represent these data. The \(8 \leqslant y < 11\) group was represented by a bar of width 1.5 cm and height 8 cm.
  1. Find the width and the height of the \(0 \leqslant y < 5\) group.
  2. Use your calculator to estimate the mean and the standard deviation of the number of hours of sunshine each day, for the month of July at Heathrow.
    Give your answers to 3 significant figures.
    The mean and standard deviation for the number of hours of daily sunshine for the same month in Hurn are 5.98 hours and 4.12 hours respectively.
    Thomas believes that the further south you are the more consistent should be the number of hours of daily sunshine.
  3. State, giving a reason, whether or not the calculations in part (b) support Thomas' belief.
  4. Estimate the number of days in July at Heathrow where the number of hours of sunshine is more than 1 standard deviation above the mean.
    Helen models the number of hours of sunshine each day, for the month of July at Heathrow by \(\mathrm { N } \left( 6.6,3.7 ^ { 2 } \right)\).
  5. Use Helen's model to predict the number of days in July at Heathrow when the number of hours of sunshine is more than 1 standard deviation above the mean.
  6. Use your answers to part (d) and part (e) to comment on the suitability of Helen's model.
Edexcel Paper 3 Specimen Q1
14 marks Standard +0.3
  1. Kaff coffee is sold in packets. A seller measures the masses of the contents of a random sample of 90 packets of Kaff coffee from her stock. The results are shown in the table below.
| Mass \(w\) (g) | Midpoint \(y\) (g) | Frequency f |
|---|---|---|
| \(240 \leq w < 245\) | 242.5 | 8 |
| \(245 \leq w < 248\) | 246.5 | 15 |
| \(248 \leq w < 252\) | 250.0 | 35 |
| \(252 \leq w < 255\) | 253.5 | 23 |
| \(255 \leq w < 260\) | 257.5 | 9 |
(You may use \(\sum f y ^ { 2 } = 5\,644\,171.75\))
A histogram is drawn and the class \(245 \leq w < 248\) is represented by a rectangle of width 1.2 cm and height 10 cm.
  1. Calculate the width and the height of the rectangle representing the class \(255 \leq w < 260\).
  2. Use linear interpolation to estimate the median mass of the contents of a packet of Kaff coffee to 1 decimal place.
  3. Estimate the mean and the standard deviation of the mass of the contents of a packet of Kaff coffee to 1 decimal place.
    The seller claims that the mean mass of the contents of the packets is more than the stated mass. Given that the stated mass of the contents of a packet of Kaff coffee is 250 g and the actual standard deviation of the contents of a packet of Kaff coffee is 4 g,
  4. test, using a 5\% level of significance, whether or not the seller's claim is justified. State your hypotheses clearly.
    (You may assume that the mass of the contents of a packet is normally distributed.)
  5. Using your answers to parts (b) and (c), comment on the assumption that the mass of the contents of a packet is normally distributed.
OCR MEI S1 Q2
Easy -1.2
2 Every day, George attempts the quiz in a national newspaper. The quiz always consists of 7 questions. In the first 25 days of January, the numbers of questions George answers correctly each day are summarised in the table below.
Number correct123
Frequency123
  1. Draw a vertical line chart to illustrate the data.
  2. State the type of skewness shown by your diagram.
  3. Calculate the mean and the mean squared deviation of the data.
  4. How many correct answers would George need to average over the next 6 days if he is to achieve an average of 5 correct answers for all 31 days of January?
Edexcel S1 2024 October Q1
Easy -1.2
  1. The back-to-back stem and leaf diagram on page 3 shows information about the running times of 31 Action films and 31 Comedy films.
    The running times are given to the nearest minute.
    1. Write down the modal running time for these Action films.
    Some of the quartiles for these two distributions are shown in the table below.
    | | Action films | Comedy films |
    |---|---|---|
    | Lower quartile | 121 | \(a\) |
    | Median | \(b\) | 117 |
    | Upper quartile | 138 | \(c\) |
  2. Find the value of \(a\), the value of \(b\) and the value of \(c\)
  3. For these Action films find, to one decimal place,
    1. the mean running time,
    2. the standard deviation of the running times.
      (You may use \(\sum x = 4016\) and \(\sum x ^ { 2 } = 525056\) where \(x\) is the running time, in minutes, of an Action film.)
    One measure of skewness is found using $$\frac { \text { mean } - \text { mode } } { \text { standard deviation } }$$
  4. Evaluate this measure and describe the skewness for the running times of these Action films.
  5. Comment on one difference between the distribution of the running times of these Action films and the distribution of the running times of these Comedy films. State the values of any statistics you have used to support your comment.
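    | Totals | Action films | | Comedy films | Totals |
    |---|---|---|---|---|
    | (1) | 0 | 9 | 2 2 3 5 | (5) |
    | (0) | | 10 | 3 5 6 6 8 9 | (6) |
    | (5) | 9 8 6 4 2 | 11 | 0 2 4 6 7 9 9 9 | (8) |
    | (10) | 9 9 8 7 6 5 4 3 1 0 | 12 | 1 2 4 6 6 7 7 7 7 8 9 | (11) |
    | (8) | 8 7 7 7 5 4 2 1 | 13 | 1 | (1) |
    | (7) | 7 7 6 6 4 3 1 | 14 | | (0) |

    Key: \(0 | 9 | 2\) means 90 minutes for an Action film and 92 minutes for a Comedy film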
Edexcel S1 2024 October Q5
Moderate -0.3
5.
\includegraphics[max width=\textwidth, alt={}]{fe416f2e-bc81-444b-a0ca-f0eae9a8b149-16_990_1473_246_296}
The histogram shows the number of hours worked in a given week by a group of 64 freelance photographers.
  1. Give a reason to justify the use of a histogram to represent these data.
    Given that 16 of these freelance photographers spent between 10 and 20 hours working in this week,
  2. estimate the number that spent between 12 and 24 hours working in this week.
  3. Find an estimate for the median time spent working in this week by these 64 freelance photographers.
    Charlie decides to model these data using a normal distribution. Charlie calculates an estimate of the mean to be 23.9 hours to one decimal place.
  4. Comment on Charlie's decision to use a normal distribution. Give a justification for your answer.
Pre-U Pre-U 9794/3 2013 June Q1
4 marks Easy -1.3
1 Pupils at a certain school carried out a survey of traffic passing the school during a two-hour period one morning. One pupil recorded the number of people in each of the first 100 cars. Her results were as follows.
| Number of people | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Number of cars | 48 | 26 | 14 | 10 | 2 |
Find the mean and the standard deviation of the number of people per car in her sample.
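A short sketch of the frequency-table calculation follows, assuming the table reads 48, 26, 14, 10 and 2 cars for 1 to 5 people (which totals the stated 100 cars). It uses the divisor-\(n\) form \(\sqrt{\sum f x^2 / n - \bar{x}^2}\); the names are illustrative.

```python
from math import sqrt

# Number of people per car and the number of cars with that many people.
people = [1, 2, 3, 4, 5]
cars = [48, 26, 14, 10, 2]

n = sum(cars)
sum_fx = sum(x * f for x, f in zip(people, cars))
sum_fx2 = sum(x * x * f for x, f in zip(people, cars))

mean_people = sum_fx / n
sd_people = sqrt(sum_fx2 / n - mean_people ** 2)

print(f"n = {n}, sum fx = {sum_fx}, sum fx^2 = {sum_fx2}")
print(f"mean = {mean_people:.2f}, sd = {sd_people:.2f}")
```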
Pre-U Pre-U 9794/3 2016 Specimen Q3
5 marks Easy -1.2
3 The table shows fuel economy figures in miles per gallon (mpg) for some new cars.
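| Car | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mpg | 57 | 40 | 34 | 33 | 11 | 17 | 30 | 27 | 31 | 20 | 35 | 24 | 26 | 23 | 32 |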
  1. Find the median and quartiles for the mpg of these fifteen cars.
  2. Use the values in part (i) to identify any cars for which the mpg is an outlier.
Pre-U Pre-U 9794/1 Specimen Q12
6 marks Moderate -0.8
12 A set of data is shown in the table below.
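| \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| frequency | 3 | 10 | 4 | 3 | 2 | 0 | 0 | 0 | 1 |

  1. Calculate the mean and standard deviation of the data.
    The value 8 may be regarded as an outlier.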
  2. Explain how you would treat this outlier if the data represents
    1. the difference of the scores obtained when throwing a pair of ordinary dice,
    2. the number of thunderstorms per year in Cambridgeshire over a 23-year period.
    3. Without doing any further calculations state what effect, if any, removing the outlier would have on the mean and standard deviation.
CAIE P1 2024 November Q7
10 marks Standard +0.3
\includegraphics{figure_7} The diagram shows a metal plate \(ABCDEF\) consisting of five parts. The parts \(BCD\) and \(DEF\) are semicircles. The part \(BAFO\) is a sector of a circle with centre \(O\) and radius 20 cm, and \(D\) lies on this circle. The parts \(OBD\) and \(ODF\) are triangles. Angles \(BOD\) and \(DOF\) are both \(\theta\) radians.
  1. Given that \(\theta = 1.2\), find the area of the metal plate. Give your answer correct to 3 significant figures. [5]
  2. Given instead that the area of each semicircle is \(50\pi \text{ cm}^2\), find the exact perimeter of the metal plate. [5]
CAIE S1 2023 March Q1
8 marks Moderate -0.8
Each year the total number of hours, \(x\), of sunshine in Kintoo is recorded during the month of June. The results for the last 60 years are summarised in the table.
| \(x\) | \(30 \leqslant x < 60\) | \(60 \leqslant x < 90\) | \(90 \leqslant x < 110\) | \(110 \leqslant x < 140\) | \(140 \leqslant x < 180\) | \(180 \leqslant x \leqslant 240\) |
|---|---|---|---|---|---|---|
| Number of years | 4 | 8 | 14 | 25 | 7 | 2 |
  1. Draw a cumulative frequency graph to illustrate the data. [3]
  2. Use your graph to estimate the 70th percentile of the data. [2]
  3. Calculate an estimate for the mean number of hours of sunshine in Kintoo during June over the last 60 years. [3]
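For part (iii), one common approach is to represent each class by its midpoint. The sketch below does this, assuming the frequencies read 4, 8, 14, 25, 7 and 2 (which total the stated 60 years). The result is only an estimate because the individual values within each class are unknown.

```python
# (lower bound, upper bound, number of years) for each class of sunshine hours.
classes = [(30, 60, 4), (60, 90, 8), (90, 110, 14),
           (110, 140, 25), (140, 180, 7), (180, 240, 2)]

n = sum(f for _, _, f in classes)

# Use the class midpoint as the representative value for every year in the class.
sum_fx = sum((lower + upper) / 2 * f for lower, upper, f in classes)
estimated_mean = sum_fx / n

print(f"n = {n}, sum f*midpoint = {sum_fx}, estimated mean = {estimated_mean:.1f} hours")
```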
CAIE S1 2002 June Q2
6 marks Easy -1.2
The manager of a company noted the times spent in 80 meetings. The results were as follows.
| Time (\(t\) minutes) | \(0 < t \leq 15\) | \(15 < t \leq 30\) | \(30 < t \leq 60\) | \(60 < t \leq 90\) | \(90 < t \leq 120\) |
|---|---|---|---|---|---|
| Number of meetings | 4 | 7 | 24 | 38 | 7 |
Draw a cumulative frequency graph and use this to estimate the median time and the interquartile range. [6]
CAIE S1 2010 June Q1
5 marks Moderate -0.8
The times in minutes for seven students to become proficient at a new computer game were measured. The results are shown below. $$15 \quad 10 \quad 48 \quad 10 \quad 19 \quad 14 \quad 16$$
  1. Find the mean and standard deviation of these times. [2]
  2. State which of the mean, median or mode you consider would be most appropriate to use as a measure of central tendency to represent the data in this case. [1]
  3. For each of the two measures of average you did not choose in part (ii), give a reason why you consider it inappropriate. [2]
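The three averages and the standard deviation for the seven times can be compared with a few lines of Python using the standard `statistics` module. This is only a numerical aid with illustrative names; choosing the most appropriate measure is left to the reader, as the question intends. The divisor-\(n\) standard deviation is assumed.

```python
from statistics import mean, median, mode, pstdev

# Times in minutes for the seven students.
times = [15, 10, 48, 10, 19, 14, 16]

mean_time = mean(times)
median_time = median(times)
mode_time = mode(times)
sd_time = pstdev(times)  # divisor n

print(f"mean = {mean_time:.2f}, median = {median_time}, mode = {mode_time}")
print(f"sd (divisor n) = {sd_time:.2f}")
```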