2.02g Calculate mean and standard deviation

382 questions

AQA S1 2010 January Q2
8 marks Moderate -0.8
2 Lizzie, the receptionist at a dental practice, was asked to keep a weekly record of the number of patients who failed to turn up for an appointment. Her records for the first 15 weeks were as follows. $$\begin{array} { l l l l l l l l l l l l l l l } 20 & 26 & 32 & a & 37 & 14 & 27 & 34 & 15 & 18 & b & 25 & 37 & 29 & 25 \end{array}$$ Unfortunately, Lizzie forgot to record the actual values for two of the 15 weeks, so she recorded them as \(a\) and \(b\). However, she did remember that \(a < 10\) and that \(b > 40\).
  1. Calculate the median and the interquartile range of these 15 values.
  2. Give a reason why, for these data:
    1. the mode is not an appropriate measure of average;
    2. the standard deviation cannot be used as a measure of spread.
  3. Subsequent investigations revealed that the missing values were 8 and 43. Calculate the mean and the standard deviation of the 15 values.
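The arithmetic in part 3 can be checked with a minimal Python sketch that uses only the standard `statistics` module. It substitutes \(a = 8\) and \(b = 43\) as stated and reports both the divisor-\(n\) standard deviation (`pstdev`) and the divisor-\((n-1)\) version (`stdev`). Which one a mark scheme expects is an assumption here.

```python
import statistics

# Weekly no-show counts with a = 8 and b = 43 substituted
weeks = [20, 26, 32, 8, 37, 14, 27, 34, 15, 18, 43, 25, 37, 29, 25]

mean = statistics.mean(weeks)          # sum / n
sd_pop = statistics.pstdev(weeks)      # divisor n
sd_sample = statistics.stdev(weeks)    # divisor n - 1

print(mean, round(sd_pop, 2), round(sd_sample, 2))
```

Here \(\sum x = 390\), so the mean is exactly 26, and \(S_{xx} = \sum x^2 - n\bar{x}^2 = 11472 - 10140 = 1332\).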
AQA S1 2005 June Q6
12 marks Standard +0.3
6 On arrival at a business centre, all visitors are required to register at the reception desk. An analysis of the register, for a random sample of 100 days, results in the following information on the number, \(X\), of visitors per day.
| Number of visitors per day | Number of days |
|---|---|
| 1-10 | 13 |
| 11-20 | 33 |
| 21-25 | 17 |
| 26-30 | 12 |
| 31-35 | 8 |
| 36-40 | 5 |
| 41-50 | 5 |
| 51-100 | 7 |
| Total | 100 |
  1. Calculate an estimate of:
    1. \(\mu\), the mean number of visitors per day;
    2. \(\sigma\), the standard deviation of the number of visitors per day.
  2. Give a reason, based upon the data provided, why \(X\) is unlikely to be normally distributed.
    1. Give a reason why \(\bar { X }\), the mean of a random sample of 100 observations on \(X\), may be assumed to be normally distributed.
    2. State, in terms of \(\mu\) and \(\sigma\), the mean and variance of \(\bar { X }\).
  3. Hence construct a \(99 \%\) confidence interval for \(\mu\).
  4. The receptionist claims that she registers on average more than 30 visitors per day, and frequently registers more than 50 visitors on any one day. Comment on each of these two claims.
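A sketch of the part 1 estimates in Python. It assumes each class is represented by the midpoint of its stated limits (5.5 for 1-10, 75.5 for 51-100, and so on); other midpoint conventions for discrete counts would shift the answers slightly.

```python
import math

# (lower limit, upper limit, frequency); limits are whole numbers of visitors
classes = [(1, 10, 13), (11, 20, 33), (21, 25, 17), (26, 30, 12),
           (31, 35, 8), (36, 40, 5), (41, 50, 5), (51, 100, 7)]

n = sum(f for _, _, f in classes)
# The midpoint of each class stands in for every value in that class
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)

mu_hat = sum_fx / n
sigma_hat = math.sqrt(sum_fx2 / n - mu_hat ** 2)           # divisor n
s_hat = math.sqrt((sum_fx2 - n * mu_hat ** 2) / (n - 1))   # divisor n - 1

print(round(mu_hat, 2), round(sigma_hat, 2), round(s_hat, 2))
```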
AQA S1 2015 June Q1
6 marks Easy -1.2
1 The number of passengers getting off the 11.45 am train at a railway station on each of 35 days is summarised as follows.
AQA S1 2015 June Q2
6 marks Easy -1.2
2 The table summarises the diameters, \(d\) millimetres, of a random sample of 60 new cricket balls to be used in junior cricket.
AQA S2 2009 June Q2
14 marks Moderate -0.3
2 John works from home. The number of business letters, \(X\), that he receives on a weekday may be modelled by a Poisson distribution with mean 5.0. The number of private letters, \(Y\), that he receives on a weekday may be modelled by a Poisson distribution with mean 1.5.
  1. Find, for a given weekday:
    1. \(\mathrm { P } ( X < 4 )\);
    2. \(\mathrm { P } ( Y = 4 )\).
    1. Assuming that \(X\) and \(Y\) are independent random variables, determine the probability that, on a given weekday, John receives a total of more than 5 business and private letters.
    2. Hence calculate the probability that John receives a total of more than 5 business and private letters on at least 7 out of 8 given weekdays.
  2. The numbers of letters received by John's neighbour, Brenda, on 10 consecutive weekdays are $$\begin{array} { l l l l l l l l l l } 15 & 8 & 14 & 7 & 6 & 8 & 2 & 8 & 9 & 3 \end{array}$$
    1. Calculate the mean and the variance of these data.
    2. State, giving a reason based on your answers to part (c)(i), whether or not a Poisson distribution might provide a suitable model for the number of letters received by Brenda on a weekday.
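For part (c), a minimal check in Python using the standard `statistics` module. The mean and both forms of the variance are computed so they can be compared, since a Poisson distribution has its mean equal to its variance.

```python
import statistics

brenda = [15, 8, 14, 7, 6, 8, 2, 8, 9, 3]

mean = statistics.mean(brenda)
var_pop = statistics.pvariance(brenda)    # divisor n
var_sample = statistics.variance(brenda)  # divisor n - 1

print(mean, var_pop, round(var_sample, 2))
```

The mean is 8 but the variance is about 15 to 17 depending on the divisor, so the data look over-dispersed relative to a Poisson model.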
OCR H240/02 2022 June Q9
14 marks Standard +0.3
9 The heights, in centimetres, of a random sample of 150 plants of a certain variety were measured. The results are summarised in the histogram. [Figure: histogram of the heights of the 150 plants] One of the 150 plants is chosen at random, and its height, \(X \mathrm {~cm}\), is noted.
  1. Show that \(\mathrm { P } ( 20 < X < 30 ) = 0.147\), correct to 3 significant figures. Sam suggests that the distribution of \(X\) can be well modelled by the distribution \(\mathrm { N } ( 40,100 )\).
    1. Give a brief justification for the use of the normal distribution in this context.
    2. Give a brief justification for the choice of the parameter values 40 and 100.
  2. Use Sam's model to find \(\mathrm { P } ( 20 < X < 30 )\). Nina suggests a different model. She uses the midpoints of the classes to calculate estimates, \(m\) and \(s\), for the mean and standard deviation respectively, in centimetres, of the 150 heights. She then uses the distribution \(\mathrm { N } \left( m , s ^ { 2 } \right)\) as her model.
  3. Use Nina's model to find \(\mathrm { P } ( 20 < X < 30 )\).
    1. Complete the table in the Printed Answer Booklet to show the probabilities obtained from Sam's model and Nina's model.
    2. By considering the different ranges of values of \(X\) given in the table, discuss how well the two models fit the original distribution.
AQA AS Paper 2 2019 June Q13
6 marks Easy -1.2
13 Denzel wants to buy a car with a propulsion type other than petrol or diesel.
He takes a sample, from the Large Data Set, of the \(\mathrm { CO } _ { 2 }\) emissions, in \(\mathrm { g } / \mathrm { km }\), of cars with one particular propulsion type. The sample is as follows. $$\begin{array} { l l l l l l l l } 82 & 13 & 96 & 49 & 96 & 92 & 70 & 81 \end{array}$$
  1. Using your knowledge of the Large Data Set, state which propulsion type this sample is for, giving a reason for your answer.
  2. Calculate the mean of the sample.
  3. Calculate the standard deviation of the sample.
  4. Denzel claims that the value 13 is an outlier.
    1. Any value more than 2 standard deviations from the mean can be regarded as an outlier. Verify that Denzel's claim is correct.
    2. State what effect, if any, removing the value 13 from the sample would have on the standard deviation.
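Parts 2 to 4 can be sketched in Python as below, assuming the divisor-\(n\) standard deviation. With `statistics.stdev` instead, the value 13 still lies more than 2 standard deviations below the mean, so the conclusion does not depend on that choice.

```python
import statistics

co2 = [82, 13, 96, 49, 96, 92, 70, 81]

mean = statistics.mean(co2)
sd = statistics.pstdev(co2)  # divisor n; statistics.stdev gives the n - 1 version

# Rule stated in the question: more than 2 SDs from the mean is an outlier
lower, upper = mean - 2 * sd, mean + 2 * sd
is_outlier = not (lower <= 13 <= upper)

# Effect of removing 13 on the standard deviation
sd_without = statistics.pstdev([x for x in co2 if x != 13])
print(mean, round(sd, 2), is_outlier, round(sd_without, 2))
```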
AQA AS Paper 2 2021 June Q16
5 marks Easy -1.2
16 An analysis was carried out using the Large Data Set to compare the \(\mathrm { CO } _ { 2 }\) emissions (in g/km) from 2002 and 2016. The summary statistics for the \(\mathrm { CO } _ { 2 }\) emissions, \(X\), for all cars registered as owned by either females or males are given in the table below.
| | 2002 | 2016 |
|---|---|---|
| \(\sum x\) | 207901 | 142103 |
| Sample size | 1215 | 1144 |
  1. Find the reduction in the mean of the \(\mathrm { CO } _ { 2 }\) emissions in 2016 compared to the mean of the CO2 emissions in 2002.
  2. It is claimed that the move to more electric and gas/petrol powered cars has caused the reduction in the mean \(\mathrm { CO } _ { 2 }\) emissions found in part (a). Using your knowledge of the Large Data Set, state whether you agree with this claim.
    Give a reason for your answer.
  3. There are 3827 data values in the Large Data Set. It is claimed that the data in the table above must have been summarised incorrectly.
    1. Explain why this claim is being made.
    2. State whether this claim is correct.
    Give a reason for your answer.
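The part (a) reduction follows directly from \(\bar{x} = \sum x / n\) for each year. A short Python sketch (the variable names are illustrative):

```python
# Summary statistics from the table: (sum of x, sample size)
sum_2002, n_2002 = 207901, 1215
sum_2016, n_2016 = 142103, 1144

mean_2002 = sum_2002 / n_2002
mean_2016 = sum_2016 / n_2016
reduction = mean_2002 - mean_2016

# Combined size of the two samples, for comparison with the 3827 values
total_n = n_2002 + n_2016
print(round(mean_2002, 1), round(mean_2016, 1), round(reduction, 1), total_n)
```

The two sample sizes add up to 2359, well short of the 3827 values in the Large Data Set, which is what prompts the claim in part (c).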
AQA AS Paper 2 2022 June Q13
6 marks Moderate -0.8
13 Two random samples of 12 NOX emissions (in \(\mathrm { g } / \mathrm { km }\) ) were taken from the Large Data Set. One sample was taken from the 2002 data and the other sample from the 2016 data.
The sample data are shown below:
| Year | Sample data (g/km) |
|---|---|
| 2002 | 0.031, 0.019, 0.091, 0.025, 0.030, 0.061, 0.047, 0.029, 0.059, 0.363, 0.330, 0.376 |
| 2016 | 0.005, 0.047, 0.053, 0.063, 0.026, 0.013, 0.058, 0.012, 0.010, 0.010, 0.008, 0.008 |
The mean and standard deviation of the 2002 sample data are 0.122 and 0.137 respectively.
  1. Find the mean and standard deviation of the 2016 sample data giving your answers correct to three decimal places.
  2. Siti claims these samples show that, on average, the NOX emissions across all makes of car in all areas of the UK have fallen by over 75\% between 2002 and 2016.
    1. Show how Siti's claim of 'over 75\%' has been obtained.
    2. Using your knowledge of the Large Data Set, make two comments on the validity of Siti's claim.
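A minimal Python sketch for parts 1 and 2(i), assuming the divisor-\(n\) standard deviation. Recomputing the 2002 sample this way reproduces the stated 0.137, which suggests that is the convention in use.

```python
import statistics

nox_2016 = [0.005, 0.047, 0.053, 0.063, 0.026, 0.013,
            0.058, 0.012, 0.010, 0.010, 0.008, 0.008]

mean_2016 = statistics.fmean(nox_2016)
sd_2016 = statistics.pstdev(nox_2016)   # divisor n

# Percentage fall relative to the given 2002 mean of 0.122
fall = (0.122 - mean_2016) / 0.122 * 100
print(round(mean_2016, 3), round(sd_2016, 3), round(fall, 1))
```

The fall of roughly 79% uses the rounded 2002 mean of 0.122 given in the question.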
Edexcel AS Paper 2 2018 June Q4
8 marks Moderate -0.8
  1. Helen is studying the daily mean wind speed for Camborne using the large data set from 1987. The data for one month are summarised in Table 1 below.
Table 1
| Windspeed | n/a | 6 | 7 | 8 | 9 | 11 | 12 | 13 | 14 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 13 | 2 | 3 | 2 | 2 | 3 | 1 | 2 | 1 | 2 |
  1. Calculate the mean for these data.
  2. Calculate the standard deviation for these data and state the units. The means and standard deviations of the daily mean wind speed for the other months from the large data set for Camborne in 1987 are given in Table 2 below. The data are not in month order.
    Table 2
    | Month | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) |
    |---|---|---|---|---|---|
    | Mean | 7.58 | 8.26 | 8.57 | 8.57 | 11.57 |
    | Standard deviation | 2.93 | 3.89 | 3.46 | 3.87 | 4.64 |
  3. Using your knowledge of the large data set, suggest, giving a reason, which month had a mean of 11.57. The data for these months are summarised in the box plots below. They are not in month order or the same order as in Table 2.
    1. State the meaning of the * symbol on some of the box plots.
    2. Suggest, giving your reasons, which of the months in Table 2 is most likely to be summarised in the box plot marked \(Y\).
[Figure: box plots of the daily mean wind speed for the months in Table 2; one is marked \(Y\)]
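Parts 1 and 2 can be sketched in Python as follows, assuming the 13 days recorded as n/a are excluded (leaving 18 days) and using the divisor-\(n\) standard deviation. Wind speeds in the large data set are recorded in knots.

```python
import math

# Table 1 frequencies, excluding the 13 "n/a" days
freq = {6: 2, 7: 3, 8: 2, 9: 2, 11: 3, 12: 1, 13: 2, 14: 1, 16: 2}

n = sum(freq.values())
sum_fx = sum(x * f for x, f in freq.items())
sum_fx2 = sum(x * x * f for x, f in freq.items())

mean = sum_fx / n
sd = math.sqrt(sum_fx2 / n - mean ** 2)   # divisor n; units are knots
print(n, round(mean, 2), round(sd, 2))
```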
Edexcel AS Paper 2 Specimen Q1
4 marks Easy -1.8
  1. Sara is investigating the variation in daily maximum gust, \(t \mathrm { kn }\), for Camborne in June and July 1987.
She used the large data set to select a sample of size 20 from the June and July data for 1987. Sara selected the first value using a random number from 1 to 4 and then selected every third value after that.
  1. State the sampling technique Sara used.
  2. From your knowledge of the large data set explain why this process may not generate a sample of size 20. The data Sara collected are summarised as follows $$n = 20 \quad \sum t = 374 \quad \sum t ^ { 2 } = 7600$$
  3. Calculate the standard deviation.
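With only summary statistics available, the standard deviation comes from \(\sigma^2 = \frac{\sum t^2}{n} - \left(\frac{\sum t}{n}\right)^2\). A minimal Python sketch using this divisor-\(n\) form:

```python
import math

n, sum_t, sum_t2 = 20, 374, 7600

mean = sum_t / n
var = sum_t2 / n - mean ** 2      # mean of squares minus square of mean
sd = math.sqrt(var)
print(mean, round(sd, 2))
```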
Edexcel AS Paper 2 Specimen Q1
9 marks Moderate -0.8
  1. A company manager is investigating the time taken, \(t\) minutes, to complete an aptitude test. The human resources manager produced the table below of coded times, \(x\) minutes, for a random sample of 30 applicants.
| Coded time (\(x\) minutes) | Frequency (\(f\)) | Coded time midpoint (\(y\) minutes) |
|---|---|---|
| \(0 \leq x < 5\) | 3 | 2.5 |
| \(5 \leq x < 10\) | 15 | 7.5 |
| \(10 \leq x < 15\) | 2 | 12.5 |
| \(15 \leq x < 25\) | 9 | 20 |
| \(25 \leq x < 35\) | 1 | 30 |
(You may use \(\sum f y = 355\) and \(\sum f y ^ { 2 } = 5675\) )
  1. Use linear interpolation to estimate the median of the coded times.
  2. Estimate the standard deviation of the coded times. The company manager is told by the human resources manager that he subtracted 15 from each of the times and then divided by 2 to calculate the coded times.
  3. Calculate an estimate for the median and the standard deviation of \(t\).
    The following year, the company has 25 positions available. The company manager decides not to offer a position to any applicant who takes 35 minutes or more to complete the aptitude test. The company has 60 applicants.
  4. Comment on whether or not the company manager's decision will result in the company being able to fill the 25 positions available from these 60 applicants. Give a reason for your answer.
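A Python sketch of parts 1 to 3 and the reasoning in part 4. It assumes the median is located at position \(n/2 = 15\) (one common convention for grouped data) and decodes with \(t = 2x + 15\), which follows from \(x = (t - 15)/2\). The helper `interpolated_median` is a hypothetical name introduced here.

```python
import math

# (lower, upper, frequency) for the coded times x
classes = [(0, 5, 3), (5, 10, 15), (10, 15, 2), (15, 25, 9), (25, 35, 1)]
n = sum(f for _, _, f in classes)

def interpolated_median(classes, n):
    """Linear interpolation at position n/2 within the median class."""
    target, cum = n / 2, 0
    for lo, hi, f in classes:
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f

median_x = interpolated_median(classes, n)
sd_x = math.sqrt(5675 / n - (355 / n) ** 2)   # uses the given sums of fy and fy^2

# Coding was x = (t - 15) / 2, so t = 2x + 15
median_t = 2 * median_x + 15
sd_t = 2 * sd_x          # adding 15 does not change the spread
# t >= 35 corresponds to x >= 10
slow_share = sum(f for lo, _, f in classes if lo >= 10) / n
print(median_x, round(sd_x, 2), median_t, round(sd_t, 2), slow_share)
```

Since \(t \geq 35\) corresponds to \(x \geq 10\), about 40% of this sample would be rejected. If the 60 new applicants behaved similarly, roughly 36 would qualify, which is enough for 25 positions; that inference assumes the earlier sample is representative of the new applicants.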
Edexcel Paper 3 2018 June Q4
13 marks Easy -1.3
  1. Charlie is studying the time it takes members of his company to travel to the office. He stands by the door to the office from 0840 to 0850 one morning and asks workers, as they arrive, how long their journey was.
    1. State the sampling method Charlie used.
    2. State and briefly describe an alternative method of non-random sampling Charlie could have used to obtain a sample of 40 workers.
    Taruni decided to ask every member of the company the time, \(x\) minutes, it takes them to travel to the office.
  2. State the data selection process Taruni used. Taruni's results are summarised by the box plot and summary statistics below. [Figure: box plot of Taruni's travel times, \(x\) minutes] $$n = 95 \quad \sum x = 4133 \quad \sum x ^ { 2 } = 202294$$
  3. Write down the interquartile range for these data.
  4. Calculate the mean and the standard deviation for these data.
  5. State, giving a reason, whether you would recommend using the mean and standard deviation or the median and interquartile range to describe these data. Rana and David both work for the company and have both moved house since Taruni collected her data. Rana's journey to work has changed from 75 minutes to 35 minutes and David's journey to work has changed from 60 minutes to 33 minutes. Taruni drew her box plot again and only had to change two values.
  6. Explain which two values Taruni must have changed and whether each of these values has increased or decreased.
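Part 4 uses \(\bar{x} = \sum x / n\) and \(\sigma = \sqrt{\sum x^2 / n - \bar{x}^2}\). A minimal Python sketch, assuming the divisor-\(n\) form:

```python
import math

n, sum_x, sum_x2 = 95, 4133, 202294

mean = sum_x / n
sd = math.sqrt(sum_x2 / n - mean ** 2)   # divisor n
print(round(mean, 2), round(sd, 2))
```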
Edexcel Paper 3 Specimen Q1
13 marks Easy -1.3
  1. The number of hours of sunshine each day, \(y\), for the month of July at Heathrow are summarised in the table below.
| Hours | \(0 \leqslant y < 5\) | \(5 \leqslant y < 8\) | \(8 \leqslant y < 11\) | \(11 \leqslant y < 12\) | \(12 \leqslant y < 14\) |
|---|---|---|---|---|---|
| Frequency | 12 | 6 | 8 | 3 | 2 |
A histogram was drawn to represent these data. The \(8 \leqslant y < 11\) group was represented by a bar of width 1.5 cm and height 8 cm .
  1. Find the width and the height of the bar representing the \(0 \leqslant y < 5\) group.
  2. Use your calculator to estimate the mean and the standard deviation of the number of hours of sunshine each day, for the month of July at Heathrow.
    Give your answers to 3 significant figures. The mean and standard deviation for the number of hours of daily sunshine for the same month in Hurn are 5.98 hours and 4.12 hours respectively.
    Thomas believes that the further south you are the more consistent should be the number of hours of daily sunshine.
  3. State, giving a reason, whether or not the calculations in part (b) support Thomas' belief.
  4. Estimate the number of days in July at Heathrow where the number of hours of sunshine is more than 1 standard deviation above the mean. Helen models the number of hours of sunshine each day, for the month of July at Heathrow by \(\mathrm { N } \left( 6.6,3.7 ^ { 2 } \right)\).
  5. Use Helen's model to predict the number of days in July at Heathrow when the number of hours of sunshine is more than 1 standard deviation above the mean.
  6. Use your answers to part (d) and part (e) to comment on the suitability of Helen's model.
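A Python sketch of parts 2, 4 and 5. It assumes class midpoints for the grouped estimates, linear interpolation within the class containing \(\bar{y} + \sigma\) for part 4, and `statistics.NormalDist` for Helen's model.

```python
import math
from statistics import NormalDist

# (lower, upper, frequency) for hours of sunshine y
classes = [(0, 5, 12), (5, 8, 6), (8, 11, 8), (11, 12, 3), (12, 14, 2)]
n = sum(f for _, _, f in classes)

mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
mean = sum(m * f for m, f in mids) / n
sd = math.sqrt(sum(m * m * f for m, f in mids) / n - mean ** 2)

# Days above mean + 1 SD, interpolating within the class that contains the cut-off
cut = mean + sd
above = sum(f * (hi - max(lo, cut)) / (hi - lo) for lo, hi, f in classes if hi > cut)

# Helen's model N(6.6, 3.7^2): expected days more than 1 SD above the mean
model = NormalDist(6.6, 3.7)
expected = n * (1 - model.cdf(6.6 + 3.7))
print(round(mean, 3), round(sd, 3), round(above, 2), round(expected, 2))
```

The data give roughly 6.8 days above \(\bar{y} + \sigma\), against about 4.9 predicted by \(\mathrm{N}(6.6, 3.7^2)\).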
Edexcel Paper 3 Specimen Q1
14 marks Standard +0.3
  1. Kaff coffee is sold in packets. A seller measures the masses of the contents of a random sample of 90 packets of Kaff coffee from her stock. The results are shown in the table below.
| Mass \(w\) (g) | Midpoint \(y\) (g) | Frequency \(f\) |
|---|---|---|
| \(240 \leq w < 245\) | 242.5 | 8 |
| \(245 \leq w < 248\) | 246.5 | 15 |
| \(248 \leq w < 252\) | 250.0 | 35 |
| \(252 \leq w < 255\) | 253.5 | 23 |
| \(255 \leq w < 260\) | 257.5 | 9 |
(You may use \(\sum f y ^ { 2 } = 5\,644\,171.75\).) A histogram is drawn and the class \(245 \leq w < 248\) is represented by a rectangle of width 1.2 cm and height 10 cm.
  1. Calculate the width and the height of the rectangle representing the class \(255 \leq w < 260\).
  2. Use linear interpolation to estimate the median mass of the contents of a packet of Kaff coffee to 1 decimal place.
  3. Estimate the mean and the standard deviation of the mass of the contents of a packet of Kaff coffee to 1 decimal place. The seller claims that the mean mass of the contents of the packets is more than the stated mass. Given that the stated mass of the contents of a packet of Kaff coffee is 250 g and the actual standard deviation of the contents of a packet of Kaff coffee is 4 g,
  4. test, using a 5\% level of significance, whether or not the seller's claim is justified. State your hypotheses clearly.
    (You may assume that the mass of the contents of a packet is normally distributed.)
  5. Using your answers to parts (b) and (c), comment on the assumption that the mass of the contents of a packet is normally distributed.
    (Total 14 marks)
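A Python sketch of parts 2 to 4. It assumes the median sits at position \(n/2 = 45\), uses class midpoints for the grouped estimates, and runs a one-tailed test of \(\mathrm{H}_0: \mu = 250\) against \(\mathrm{H}_1: \mu > 250\) with the known \(\sigma = 4\).

```python
import math
from statistics import NormalDist

# (lower, upper, frequency) for mass w in grams
classes = [(240, 245, 8), (245, 248, 15), (248, 252, 35), (252, 255, 23), (255, 260, 9)]
n = sum(f for _, _, f in classes)

# Median by linear interpolation at position n/2
target, cum = n / 2, 0
for lo, hi, f in classes:
    if cum + f >= target:
        median = lo + (target - cum) / f * (hi - lo)
        break
    cum += f

# Midpoint estimates; sum_fy2 should match the 5644171.75 given
sum_fy = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fy2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
mean = sum_fy / n
sd = math.sqrt(sum_fy2 / n - mean ** 2)

# One-tailed z-test at the 5% level with known sigma = 4
z = (mean - 250) / (4 / math.sqrt(n))
critical = NormalDist().inv_cdf(0.95)
print(round(median, 1), round(mean, 1), round(sd, 1), round(z, 3), z > critical)
```

The test statistic is about 0.94, below the 5% critical value of about 1.645, so on these assumptions the seller's claim is not supported.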
Edexcel S1 2024 October Q1
Easy -1.2
  1. The back-to-back stem and leaf diagram below shows information about the running times of 31 Action films and 31 Comedy films.
    The running times are given to the nearest minute.
    1. Write down the modal running time for these Action films.
    Some of the quartiles for these two distributions are shown in the table below.
    | | Action films | Comedy films |
    |---|---|---|
    | Lower quartile | 121 | \(a\) |
    | Median | \(b\) | 117 |
    | Upper quartile | 138 | \(c\) |
  2. Find the value of \(a\), the value of \(b\) and the value of \(c\)
  3. For these Action films find, to one decimal place,
    1. the mean running time,
    2. the standard deviation of the running times.
      (You may use \(\sum x = 4016\) and \(\sum x ^ { 2 } = 525056\) where \(x\) is the running time, in minutes, of an Action film.) One measure of skewness is found using $$\frac { \text { mean - mode } } { \text { standard deviation } }$$
  4. Evaluate this measure and describe the skewness for the running times of these Action films.
  5. Comment on one difference between the distribution of the running times of these Action films and the distribution of the running times of these Comedy films. State the values of any statistics you have used to support your comment.
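    | Totals | Action films | Stem | Comedy films | Totals |
    |---|---|---|---|---|
    | (1) | 0 | 9 | 2 2 3 5 | (5) |
    | (0) | | 10 | 3 5 6 6 8 9 | (6) |
    | (5) | 9 8 6 4 2 | 11 | 0 2 4 6 7 9 9 9 | (8) |
    | (10) | 9 9 8 7 6 5 4 3 1 0 | 12 | 1 2 4 6 6 7 7 7 7 8 9 | (11) |
    | (8) | 8 7 7 7 5 4 2 1 | 13 | 1 | (1) |
    | (7) | 7 7 6 6 4 3 1 | 14 | | (0) |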
    Key: \(0 | 9 | 2\) means 90 minutes for an Action film and 92 minutes for a Comedy film
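Parts 3 and 4 can be sketched in Python from the given sums. The mode of 137 is read off the diagram (137 is the only Action running time occurring three times), and the divisor-\(n\) standard deviation is assumed.

```python
import math

n, sum_x, sum_x2 = 31, 4016, 525056
mode = 137   # read from the stem and leaf diagram

mean = sum_x / n
sd = math.sqrt(sum_x2 / n - mean ** 2)   # divisor n
skew = (mean - mode) / sd
print(round(mean, 1), round(sd, 1), round(skew, 2))
```

The measure comes out at about \(-0.60\), indicating negative skew.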
Pre-U Pre-U 9794/3 2012 June Q1
4 marks Easy -1.8
1 The heights in centimetres of 10 young women were measured and are given below. $$\begin{array} { l l l l l l l l l l } 140 & 145 & 162 & 174 & 153 & 167 & 147 & 151 & 148 & 156 \end{array}$$ Calculate the mean height of these women and show that the standard deviation is approximately 10 cm.
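A minimal Python check, assuming the divisor-\(n\) standard deviation (the \(n-1\) version is about 10.6):

```python
import statistics

heights = [140, 145, 162, 174, 153, 167, 147, 151, 148, 156]

mean = statistics.mean(heights)
sd = statistics.pstdev(heights)   # divisor n
print(mean, round(sd, 2))
```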
Pre-U Pre-U 9794/3 2013 June Q1
4 marks Easy -1.3
1 Pupils at a certain school carried out a survey of traffic passing the school during a two-hour period one morning. One pupil recorded the number of people in each of the first 100 cars. Her results were as follows.
| Number of people | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Number of cars | 48 | 26 | 14 | 10 | 2 |
Find the mean and the standard deviation of the number of people per car in her sample.
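Using the frequencies as weights, a short Python sketch with the divisor-\(n\) standard deviation:

```python
import math

people = [1, 2, 3, 4, 5]
cars = [48, 26, 14, 10, 2]

n = sum(cars)
mean = sum(x * f for x, f in zip(people, cars)) / n
sd = math.sqrt(sum(x * x * f for x, f in zip(people, cars)) / n - mean ** 2)
print(mean, round(sd, 3))
```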
Pre-U Pre-U 9794/3 2017 June Q1
5 marks Moderate -0.8
1 Levels of nitrogen dioxide in the atmosphere are being monitored at the side of a road in a busy city centre. A sample of 18 measurements taken (in suitable units) is as follows. $$\begin{array} { l l l l l l l l l l l l l l l l l l } 83 & 44 & 95 & 92 & 98 & 63 & 69 & 76 & 19 & 91 & 70 & 91 & 74 & 65 & 62 & 70 & 95 & 108 \end{array}$$
  1. Find the mean and standard deviation of the sample.
  2. Hence identify, with justification, any possible outliers.
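A Python sketch of both parts. It assumes the common rule of thumb that a value more than 2 standard deviations from the mean is a possible outlier; the question itself does not fix a rule.

```python
import statistics

no2 = [83, 44, 95, 92, 98, 63, 69, 76, 19, 91, 70, 91, 74, 65, 62, 70, 95, 108]

mean = statistics.mean(no2)
sd = statistics.pstdev(no2)   # divisor n

# Assumed rule: flag values more than 2 standard deviations from the mean
outliers = [x for x in no2 if abs(x - mean) > 2 * sd]
print(round(mean, 2), round(sd, 2), outliers)
```

Only 19 is flagged; the same happens with the divisor-\((n-1)\) standard deviation.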
Pre-U Pre-U 9794/1 Specimen Q12
6 marks Moderate -0.8
12 A set of data is shown in the table below.
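| \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Frequency | 3 | 10 | 4 | 3 | 2 | 0 | 0 | 0 | 1 |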
  1. Calculate the mean and standard deviation of the data. The value 8 may be regarded as an outlier.
  2. Explain how you would treat this outlier if the data represents
    1. the difference of the scores obtained when throwing a pair of ordinary dice,
    2. the number of thunderstorms per year in Cambridgeshire over a 23-year period.
    3. Without doing any further calculations state what effect, if any, removing the outlier would have on the mean and standard deviation.
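A Python sketch for the first and last parts, with a hypothetical helper `mean_sd` that uses the divisor-\(n\) standard deviation. It confirms that dropping the single 8 lowers both the mean and the standard deviation.

```python
import math

def mean_sd(freq):
    """Mean and divisor-n standard deviation from a {value: frequency} table."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    var = sum(x * x * f for x, f in freq.items()) / n - mean ** 2
    return mean, math.sqrt(var)

freq = {0: 3, 1: 10, 2: 4, 3: 3, 4: 2, 5: 0, 6: 0, 7: 0, 8: 1}
with_outlier = mean_sd(freq)
without_outlier = mean_sd({x: f for x, f in freq.items() if x != 8})
print([round(v, 3) for v in with_outlier], [round(v, 3) for v in without_outlier])
```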
CAIE S1 2002 June Q4
7 marks Moderate -0.8
  1. In a spot check of the speeds \(x \text{ km h}^{-1}\) of 30 cars on a motorway, the data were summarised by \(\Sigma(x - 110) = -47.2\) and \(\Sigma(x - 110)^2 = 5460\). Calculate the mean and standard deviation of these speeds. [4]
  2. On another day the mean speed of cars on the motorway was found to be \(107.6 \text{ km h}^{-1}\) and the standard deviation was \(13.8 \text{ km h}^{-1}\). Assuming these speeds follow a normal distribution and that the speed limit is \(110 \text{ km h}^{-1}\), find what proportion of cars exceed the speed limit. [3]
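Coding by subtracting 110 shifts the mean but not the spread, so \(\bar{x} = 110 + \frac{\sum(x-110)}{n}\) and \(\sigma^2 = \frac{\sum(x-110)^2}{n} - \left(\frac{\sum(x-110)}{n}\right)^2\). A Python sketch of both parts, using `statistics.NormalDist` for part 2:

```python
import math
from statistics import NormalDist

n, sum_d, sum_d2 = 30, -47.2, 5460   # d = x - 110

mean = 110 + sum_d / n
sd = math.sqrt(sum_d2 / n - (sum_d / n) ** 2)   # coding by subtraction leaves SD unchanged

# Proportion above the 110 km/h limit under N(107.6, 13.8^2)
p_over = 1 - NormalDist(107.6, 13.8).cdf(110)
print(round(mean, 2), round(sd, 2), round(p_over, 3))
```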
CAIE S1 2010 June Q1
5 marks Moderate -0.8
The times in minutes for seven students to become proficient at a new computer game were measured. The results are shown below. $$15 \quad 10 \quad 48 \quad 10 \quad 19 \quad 14 \quad 16$$
  1. Find the mean and standard deviation of these times. [2]
  2. State which of the mean, median or mode you consider would be most appropriate to use as a measure of central tendency to represent the data in this case. [1]
  3. For each of the two measures of average you did not choose in part (ii), give a reason why you consider it inappropriate. [2]
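A minimal Python sketch of part 1, plus the median and mode for comparison in parts 2 and 3 (divisor-\(n\) standard deviation assumed):

```python
import statistics

times = [15, 10, 48, 10, 19, 14, 16]

mean = statistics.mean(times)
sd = statistics.pstdev(times)      # divisor n
median = statistics.median(times)
mode = statistics.mode(times)
print(round(mean, 2), round(sd, 2), median, mode)
```

The single value 48 pulls the mean up to about 18.9, above five of the seven times, while the mode (10) is also the smallest value; the median of 15 is not affected by the extreme value.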
CAIE S1 2015 June Q2
5 marks Moderate -0.8
120 people were asked to read an article in a newspaper. The times taken, to the nearest second, by the people to read the article are summarised in the following table.
| Time (seconds) | 1–25 | 26–35 | 36–45 | 46–55 | 56–90 |
|---|---|---|---|---|---|
| Number of people | 4 | 24 | 38 | 34 | 20 |
Calculate estimates of the mean and standard deviation of the reading times. [5]
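Because times are recorded to the nearest second, the class boundaries are 0.5, 25.5, 35.5, 45.5, 55.5 and 90.5, giving midpoints 13, 30.5, 40.5, 50.5 and 73. A Python sketch with the divisor-\(n\) standard deviation:

```python
import math

# (lower boundary, upper boundary, frequency)
classes = [(0.5, 25.5, 4), (25.5, 35.5, 24), (35.5, 45.5, 38),
           (45.5, 55.5, 34), (55.5, 90.5, 20)]
n = sum(f for _, _, f in classes)
mids = [((lo + hi) / 2, f) for lo, hi, f in classes]

mean = sum(m * f for m, f in mids) / n
sd = math.sqrt(sum(m * m * f for m, f in mids) / n - mean ** 2)
print(n, round(mean, 1), round(sd, 1))
```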
CAIE S1 2014 November Q6
9 marks Easy -1.2
On a certain day in spring, the heights of 200 daffodils are measured, correct to the nearest centimetre. The frequency distribution is given below.
| Height (cm) | \(4 - 10\) | \(11 - 15\) | \(16 - 20\) | \(21 - 25\) | \(26 - 30\) |
|---|---|---|---|---|---|
| Frequency | 22 | 32 | 78 | 40 | 28 |
  1. Draw a cumulative frequency graph to illustrate the data. [4]
  2. 28\% of these daffodils are of height \(h\) cm or more. Estimate \(h\). [2]
  3. You are given that the estimate of the mean height of these daffodils, calculated from the table, is 18.39 cm. Calculate an estimate of the standard deviation of the heights of these daffodils. [3]
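Using the given mean of 18.39 cm and midpoints based on the boundaries 3.5, 10.5, 15.5, 20.5, 25.5 and 30.5 (heights are to the nearest centimetre), a Python sketch of part 3:

```python
import math

# Class midpoints from the boundaries, and frequencies
mids = [7, 13, 18, 23, 28]
freq = [22, 32, 78, 40, 28]
mean = 18.39   # given in the question

n = sum(freq)
sum_fx2 = sum(f * m * m for m, f in zip(mids, freq))
sd = math.sqrt(sum_fx2 / n - mean ** 2)   # divisor n
print(round(sd, 2))
```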
Edexcel S1 2023 June Q3
9 marks Moderate -0.8
Jim records the length, \(l\) mm, of 81 salmon. The data are coded using \(x = l - 600\) and the following summary statistics are obtained. $$n = 81 \quad \sum x = 3711 \quad \sum x^2 = 475181$$
  1. Find the mean length of these salmon. [3]
  2. Find the variance of the lengths of these salmon. [2]
The weight, \(w\) grams, of each of the 81 salmon is recorded to the nearest gram. The recorded results for the 81 salmon are summarised in the box plot below. [Figure: box plot of the weights of the 81 salmon]
  1. Find the maximum number of salmon that have weights in the interval $$4600 < w \leqslant 7700$$ [1]
Raj says that the box plot is incorrect as Jim has not included outliers. For these data an outlier is defined as a value that is more than \(1.5 \times\) IQR above the upper quartile or more than \(1.5 \times\) IQR below the lower quartile.
  1. Show that there are no outliers. [3]
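For the first two parts, the coding \(x = l - 600\) shifts the mean by 600 and leaves the variance unchanged. A minimal Python sketch:

```python
n, sum_x, sum_x2 = 81, 3711, 475181   # x = l - 600

mean_l = 600 + sum_x / n
var_l = sum_x2 / n - (sum_x / n) ** 2   # variance is unchanged by subtracting 600
print(round(mean_l, 2), round(var_l, 1))
```

This gives a mean of about 645.8 mm and a variance of about 3767 mm\(^2\).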