2.02g Calculate mean and standard deviation

382 questions

Edexcel S1 Q1
9 marks Moderate -0.8
  1. The weight in kilograms, \(w\), of the 15 players in a rugby team was recorded and the results summarised as follows.
$$\Sigma w = 1145.3 , \quad \Sigma w ^ { 2 } = 88042.14$$
  1. Calculate the mean and variance of the weight of the players.
    Due to injury, one of the players who weighed 79.2 kg was replaced with another player who weighed 63.5 kg.
  2. Without further calculation state the effect of this change on the mean and variance of the weight of the players in the team. Explain your answers.
    (4 marks)
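The first part can be sketched in Python as below. This is a minimal illustration, assuming the variance is taken with divisor \(n\) (the question does not state which divisor to use); the variable names are illustrative only.

```python
# Summary statistics quoted in the question
n = 15
sum_w = 1145.3
sum_w2 = 88042.14

# Mean: total of the values divided by the number of values
mean_w = sum_w / n

# Variance with divisor n (an assumption): mean of the squares minus the square of the mean
var_w = sum_w2 / n - mean_w ** 2

print(f"mean = {mean_w:.2f} kg, variance = {var_w:.2f} kg^2")
```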
Edexcel S1 Q5
16 marks Easy -1.3
5. Each child in class 3A was given a packet of seeds to plant. The stem and leaf diagram below shows how many seedlings were visible in each child's tray one week after planting.
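$$\begin{array}{r|lr}
\text{Number of seedlings} & (2 \mid 1 \text{ means } 21) & \text{Totals} \\ \hline
0 & 0 \; 2 & (2) \\
0 & & (0) \\
1 & 1 & (1) \\
1 & 5 \; 7 & (2) \\
2 & 0 \; 1 \; 3 \; 3 \; 4 & (5) \\
2 & 5 \; 7 \; 7 \; 7 \; 8 \; 9 \; 9 & (7) \\
3 & 0 \; 0 \; 0 \; 1 \; 2 \; 2 \; 4 & (7) \\
3 & 5 \; 6 \; 8 \; 8 & (4) \\
4 & 1 \; 3 \; 4 & (3)
\end{array}$$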
  1. Find the median and interquartile range for these data.
  2. Use the quartiles to describe the skewness of the data. Show your method clearly. The mean and standard deviation for these data were 27.2 and 10.3 respectively.
  3. Explaining your answer, state whether you would recommend using these values or your answers to part (a) to summarise these data. Outliers are defined to be values outside of the limits \(\mathrm { Q } _ { 1 } - 2 s\) and \(\mathrm { Q } _ { 3 } + 2 s\) where \(s\) is the standard deviation given above.
  4. Represent these data with a boxplot identifying clearly any outliers.
Edexcel S1 Q3
9 marks Moderate -0.8
3. A magazine collected data on the total cost of the reception at each of a random sample of 80 weddings. The data is grouped and coded using \(y = \frac { C - 3250 } { 250 }\), where \(C\) is the mid-point in pounds of each class, giving \(\sum f y = 37\) and \(\sum f y ^ { 2 } = 2317\).
  1. Using these values, calculate estimates of the mean and standard deviation of the cost of the receptions in the sample.
  2. Explain why your answers to part (a) are only estimates. The median of the data was \(\pounds 3050\).
  3. Comment on the skewness of the data and suggest a reason for it.
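Decoding the coded summary statistics can be sketched as follows. This is a minimal Python illustration, assuming the standard deviation uses divisor \(n\); the names `mean_c` and `sd_c` are illustrative.

```python
from math import sqrt

# Coded summary statistics from the question, with y = (C - 3250) / 250
n = 80
sum_fy = 37
sum_fy2 = 2317

# Mean and standard deviation of the coded values (divisor n, assumed)
mean_y = sum_fy / n
sd_y = sqrt(sum_fy2 / n - mean_y ** 2)

# Undo the coding: the mean is scaled and shifted, the spread is only scaled
mean_c = 3250 + 250 * mean_y
sd_c = 250 * sd_y

print(f"estimated mean = {mean_c:.2f}, estimated sd = {sd_c:.2f} (pounds)")
```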
Edexcel S1 Q5
11 marks Standard +0.3
5. A group of children were each asked to try and complete a task to test hand-eye coordination. Each child repeated the task until he or she had been successful or had made four attempts. The number of attempts made by the children in the group are summarised in the table below.
$$\begin{array}{l|cccc}
\text{Number of attempts} & 1 & 2 & 3 & 4 \\ \hline
\text{Number of children} & 43 & 26 & 13 & 3
\end{array}$$
  1. Calculate the mean and standard deviation of the number of attempts made by each child.
    It is suggested that the number of attempts made by each child could be modelled by a discrete random variable \(X\) with the probability function $$P ( X = x ) = \left\{ \begin{array} { c c } k \left( 20 - x ^ { 2 } \right) , & x = 1,2,3,4 \\ 0 , & \text { otherwise } \end{array} \right.$$
  2. Show that \(k = \frac { 1 } { 50 }\).
  3. Find \(\mathrm { E } ( X )\).
  4. Comment on the suitability of this model.
Edexcel S1 Q4
13 marks Moderate -0.8
4. A company offering a bicycle courier service within London collected data on the delivery times for a sample of jobs completed by staff at each of its two offices. The times, \(t\) minutes, for 20 deliveries handled by the company's Hammersmith office were summarised by $$\Sigma t = 427 , \text { and } \Sigma t ^ { 2 } = 11077$$
  1. Find the mean and variance of the delivery times in this sample.
    The company's Holborn office handles more business, so a sample of the delivery times for 30 jobs handled by this office was taken. The mean and standard deviation of this sample were 18.5 minutes and 8.2 minutes respectively.
  2. Find the mean and variance of the delivery times of the combined sample of 50 deliveries.
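One way to combine the two samples is to rebuild the totals \(\Sigma t\) and \(\Sigma t^2\) for the second office from its mean and standard deviation, then pool. The Python sketch below assumes divisor \(n\) for every variance; variable names are illustrative.

```python
# Hammersmith sample (summary statistics from the question)
n1, sum_t_1, sum_t2_1 = 20, 427, 11077
mean1 = sum_t_1 / n1
var1 = sum_t2_1 / n1 - mean1 ** 2

# Holborn sample: recover the totals from the mean and standard deviation
n2, mean2, sd2 = 30, 18.5, 8.2
sum_t_2 = n2 * mean2
sum_t2_2 = n2 * (sd2 ** 2 + mean2 ** 2)  # since variance = sum_t2 / n - mean^2

# Pool the totals and recompute for the combined sample of 50
n = n1 + n2
mean_all = (sum_t_1 + sum_t_2) / n
var_all = (sum_t2_1 + sum_t2_2) / n - mean_all ** 2

print(f"Hammersmith: mean = {mean1:.2f}, variance = {var1:.2f}")
print(f"Combined:    mean = {mean_all:.2f}, variance = {var_all:.2f}")
```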
Edexcel S1 Q3
11 marks Moderate -0.3
3. A soccer fan collected data on the number of minutes of league football, \(m\), played by each team in the four main divisions before first scoring a goal at the start of a new season. Her results are shown in the table below.
$$\begin{array}{l|r}
m \text{ (minutes)} & \text{Number of teams} \\ \hline
0 \leq m < 40 & 36 \\
40 \leq m < 80 & 28 \\
80 \leq m < 120 & 10 \\
120 \leq m < 160 & 4 \\
160 \leq m < 200 & 5 \\
200 \leq m < 300 & 4 \\
300 \leq m < 400 & 2 \\
400 \leq m < 600 & 3
\end{array}$$
  1. Calculate estimates of the mean and standard deviation of these data.
  2. Explain why the mean and standard deviation might not be the best summary statistics to use with these data.
  3. Suggest alternative summary statistics that would better represent these data.
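The grouped-data estimates in the first part can be sketched in Python as below. This is a minimal illustration, assuming each class is represented by its mid-point and the standard deviation uses divisor \(n\); the list layout is illustrative.

```python
from math import sqrt

# (lower bound, upper bound, number of teams) for each class in the table
classes = [
    (0, 40, 36), (40, 80, 28), (80, 120, 10), (120, 160, 4),
    (160, 200, 5), (200, 300, 4), (300, 400, 2), (400, 600, 3),
]

# Represent every class by its mid-point
n = sum(f for _, _, f in classes)
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)

mean_m = sum_fx / n
sd_m = sqrt(sum_fx2 / n - mean_m ** 2)  # divisor n, assumed

print(f"n = {n}, estimated mean = {mean_m:.1f}, estimated sd = {sd_m:.1f} (minutes)")
```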
Edexcel S1 Q5
12 marks Moderate -0.3
5. An antiques shop recorded the value of items stolen to the nearest pound during each week for a year giving the data in the table below.
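$$\begin{array}{l|r}
\text{Value of goods stolen (£)} & \text{Number of weeks} \\ \hline
0 - 199 & 31 \\
200 - 399 & 6 \\
400 - 599 & 3 \\
600 - 799 & 4 \\
800 - 999 & 5 \\
1000 - 1999 & 2 \\
2000 - 2999 & 1
\end{array}$$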
Letting \(x\) represent the mid-point of each group and using the coding \(y = \frac { x - 699.5 } { 200 }\),
  1. find \(\sum f y\).
  2. estimate to the nearest pound the mean and standard deviation of the value of the goods stolen each week using your value for \(\sum f y\) and \(\sum f y ^ { 2 } = 424\).
    (6 marks)
    The median for these data is \(\pounds 82\).
  3. Explain why the manager of the shop might be reluctant to use either the mean or the median in summarising these data.
    (3 marks)
Edexcel S1 Q7
15 marks Moderate -0.8
7. A cyber-cafe recorded how long each user stayed during one day giving the following results.
$$\begin{array}{l|ccccccc}
\text{Length of stay (minutes)} & 0 - & 30 - & 60 - & 90 - & 120 - & 240 - & 360 - \\ \hline
\text{Number of users} & 15 & 31 & 32 & 23 & 17 & 2 & 0
\end{array}$$
  1. Use linear interpolation to estimate the median and quartiles of these data. The results of a previous study had led to the suggestion that the length of time each user stays can be modelled by a normal distribution with a mean of 72 minutes and a standard deviation of 48 minutes.
  2. Find the median and quartiles that this model would predict.
  3. Comment on the suitability of the suggested model in the light of the new results.
Edexcel S2 Q6
15 marks Standard +0.3
A sample of radioactive material decays randomly, with an approximate mean of 1.5 counts per minute.
  1. Name a distribution that would be suitable for modelling the number of counts per minute. Give any parameters required for the model.
  2. Find the probability of at least 4 counts in a randomly chosen minute.
  3. Find the probability of 3 counts or fewer in a random interval lasting 5 minutes.
    More careful measurements, over 50 one-minute intervals, give the following data for \(x\), the number of counts per minute: $$\sum x = 84 , \quad \sum x ^ { 2 } = 226$$
  4. Decide whether these data support your answer to part (a).
  5. Use the improved data to find the probability of exactly two counts in a given one-minute interval.
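The last two parts can be explored with a short Python sketch. It assumes a Poisson model is the intended answer to the first part and uses divisor \(n\) for the variance; the helper `poisson_pmf` is a hypothetical name written out here rather than taken from a library.

```python
from math import exp, factorial

# Summary statistics for the 50 one-minute intervals
n = 50
sum_x = 84
sum_x2 = 226

# For a Poisson distribution the mean and variance are equal,
# so comparing the two is a quick check on the model
mean_x = sum_x / n
var_x = sum_x2 / n - mean_x ** 2  # divisor n, assumed

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

# Probability of exactly two counts, using the sample mean as the rate
p_two = poisson_pmf(2, mean_x)

print(f"mean = {mean_x:.2f}, variance = {var_x:.4f}, P(X = 2) = {p_two:.4f}")
```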
Edexcel CP AS 2019 June Q6
9 marks Moderate -0.8
  1. An art display consists of an arrangement of \(n\) marbles.
When arranged in ascending order of mass, the mass of the first marble is 10 grams. The mass of each subsequent marble is 3 grams more than the mass of the previous one, so that the \(r\)th marble has mass \(( 7 + 3 r )\) grams.
  1. Show that the mean mass, in grams, of the marbles in the display is given by $$\frac { 1 } { 2 } ( 3 n + 17 )$$ Given that there are 85 marbles in the display,
  2. use the standard summation formulae to find the standard deviation of the mass of the marbles in the display, giving your answer, in grams, to one decimal place.
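A numerical check of both parts for \(n = 85\) can be sketched in Python. This is not the summation-formula method the question asks for; it simply lists the masses and lets the standard library compute the statistics, assuming the standard deviation uses divisor \(n\).

```python
from statistics import mean, pstdev

n = 85

# Mass of the r-th marble is 7 + 3r grams, for r = 1, ..., n
masses = [7 + 3 * r for r in range(1, n + 1)]

mean_mass = mean(masses)          # direct calculation from the list
formula_mean = (3 * n + 17) / 2   # the expression to be shown in the first part
sd_mass = pstdev(masses)          # population standard deviation (divisor n)

print(f"mean = {mean_mass}, formula gives {formula_mean}, sd = {sd_mass:.1f} g")
```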
Edexcel FS2 2019 June Q3
8 marks Standard +0.8
3 Yin grows two varieties of potato, plant \(A\) and plant \(B\). A random sample of each variety of potato is taken and the yield, \(x \mathrm {~kg}\), produced by each plant is measured. The following statistics are obtained from the data.
$$\begin{array}{c|ccc}
 & \text{Number of plants} & \sum x & \sum x^{2} \\ \hline
A & 25 & 194.7 & 1637.37 \\
B & 26 & 227.5 & 2031.19
\end{array}$$
  1. Stating your hypotheses clearly, test, at the \(10 \%\) significance level, whether or not the variances of the yields of the two varieties of potato are the same.
  2. State an assumption you have made in order to carry out the test in part (a).
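The test statistic for the first part can be sketched in Python. This is a minimal illustration, assuming the usual variance-ratio (F) test with unbiased sample variances (divisor \(n - 1\)) and the larger variance in the numerator; `sample_variance` is a hypothetical helper. The critical value would still need to be read from F tables.

```python
def sample_variance(n, sum_x, sum_x2):
    """Unbiased sample variance from summary statistics (divisor n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Summary statistics from the table
var_a = sample_variance(25, 194.7, 1637.37)
var_b = sample_variance(26, 227.5, 2031.19)

# Variance ratio with the larger estimate on top
f_ratio = max(var_a, var_b) / min(var_a, var_b)

print(f"s_A^2 = {var_a:.4f}, s_B^2 = {var_b:.4f}, F = {f_ratio:.3f}")
```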
Edexcel FS2 2019 June Q6
9 marks Standard +0.3
6 A company manufactures bolts. The diameter of the bolts follows a normal distribution with a mean diameter of 5 mm. Stan believes that the mean diameter of the bolts is less than 5 mm. He takes a random sample of 10 bolts and measures their diameters. He calculates some statistics but spills ink on his work before completing them. The only information he has left is as follows.
[Figure: Stan's ink-stained summary statistics]
Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether or not Stan's belief is supported.
Edexcel FS2 2020 June Q1
6 marks Standard +0.3
1 Gina receives a large number of packages from two companies, \(A\) and \(B\). She believes that the variance of the weights of packages from company \(A\) is greater than the variance of the weights of packages from company \(B\). Gina takes a random sample of 7 packages from company \(A\) and an independent random sample of 10 packages from company \(B\). Her results are summarised below $$\bar { a } = 300 \quad \mathrm {~S} _ { a a } = 145496 \quad \bar { b } = 233.4 \quad \mathrm {~S} _ { b b } = 56364.4$$ [You may assume that the weights of packages from the two companies are normally distributed.]
Test Gina's belief. Use a \(5 \%\) level of significance and state your hypotheses clearly.
OCR H240/02 2018 September Q9
12 marks Moderate -0.3
9 The finance department of a retail firm recorded the daily income each day for 300 days. The results are summarised in the histogram.
[Figure: histogram of daily income]
  1. Find the number of days on which the daily income was between \(\pounds 4000\) and \(\pounds 6000\).
  2. Calculate an estimate of the number of days on which the daily income was between \(\pounds 2700\) and \(\pounds 3600\).
  3. Use the midpoints of the classes to show that an estimate of the mean daily income is \(\pounds 3275\). An estimate of the standard deviation of the daily income is \(\pounds 1060\). The finance department uses the distribution \(\mathrm { N } \left( 3275,1060 ^ { 2 } \right)\) to model the daily income, in pounds.
  4. Calculate the number of days on which, according to this model, the daily income would be between \(\pounds 4000\) and \(\pounds 6000\).
  5. It is given that approximately \(95 \%\) of values of the distribution \(\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)\) lie within the range \(\mu \pm 2 \sigma\). Without further calculation, use this fact to comment briefly on whether the proposed model is a good fit to the data illustrated in the histogram.
Edexcel S1 2022 January Q2
6 marks Moderate -0.8
2. Tom's car holds 50 litres of petrol when the fuel tank is full. For each of 10 journeys, each starting with 50 litres of petrol in the fuel tank, Tom records the distance travelled, \(d\) kilometres, and the amount of petrol used, \(p\) litres. The summary statistics for the 10 journeys are given below. $$\sum d = 1029 \quad \sum p = 50.8 \quad \sum d p = 5240.8 \quad \mathrm {~S} _ { d d } = 344.9 \quad \mathrm {~S} _ { p p } = 0.576$$
  1. Calculate the product moment correlation coefficient between \(d\) and \(p\).
    The amount of petrol remaining in the fuel tank for each journey, \(w\) litres, is recorded.
    1. Write down an equation for \(w\) in terms of \(p\).
    2. Hence, write down the value of the product moment correlation coefficient between \(w\) and \(p\).
  2. Write down the value of the product moment correlation coefficient between \(d\) and \(w\).
Edexcel S1 2022 January Q3
10 marks Moderate -0.8
  1. The stem and leaf diagram shows the number of deliveries made by Pat each day for 24 days
Key: \(10 \mid 8\) represents 108 deliveries
$$\begin{array}{r|lr}
10 & 8 \; 9 & (2) \\
11 & 0 \; 3 \; 6 \; 6 \; 6 \; 8 \; 8 \; 9 \; 9 \; 9 \; 9 & (11) \\
12 & 4 \; 5 \; 5 \; 5 \; 5 \; 5 \; 5 \; 8 & (8) \\
13 & a \; b \; c & (3)
\end{array}$$
where \(a\), \(b\) and \(c\) are positive integers with \(a < b < c\).
An outlier is defined as any value greater than \(1.5 \times\) interquartile range above the upper quartile. Given that there is only one outlier for these data,
  1. show that \(c = 9\).
    The number of deliveries made by Pat each day is represented by \(d\). The data in the stem and leaf diagram are coded using $$x = d - 125$$ and the following summary statistics are obtained $$\sum x = - 96 \quad \text { and } \quad \sum ( x - \bar { x } ) ^ { 2 } = 1306$$
  2. Find the mean number of deliveries.
  3. Find the standard deviation of the number of deliveries.
    One of these 24 days is selected at random. The random variable \(D\) represents the number of deliveries made by Pat on this day. The random variable \(X = D - 125\).
  4. Find \(\mathrm { P } ( D > 118 \mid X < 0 )\).
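The mean and standard deviation parts can be sketched in Python. This is a minimal illustration, assuming the standard deviation uses divisor \(n\); the key point is that subtracting 125 shifts the mean but leaves the spread unchanged.

```python
from math import sqrt

# Coded summary statistics, with x = d - 125
n = 24
sum_x = -96
sxx = 1306  # sum of (x - x_bar)^2

# Undo the coding for the mean; a shift does not change the standard deviation
mean_d = 125 + sum_x / n
sd_d = sqrt(sxx / n)  # divisor n, assumed

print(f"mean = {mean_d:.0f} deliveries, sd = {sd_d:.2f}")
```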
Edexcel S1 2017 June Q1
8 marks Easy -1.2
  1. Nina weighed a random sample of 50 carrots from her shop and recorded the weight, in grams to the nearest gram, for each carrot. The results are summarised below.
$$\begin{array}{l|rr}
\text{Weight of carrot} & \text{Frequency (f)} & \text{Weight midpoint } (x \text{ grams}) \\ \hline
45 - 54 & 5 & 49.5 \\
55 - 59 & 10 & 57 \\
60 - 64 & 22 & 62 \\
65 - 74 & 13 & 69.5
\end{array}$$
$$\text { (You may use } \sum \mathrm { f } x ^ { 2 } = 192102.5 \text { ) }$$
  1. Use linear interpolation to estimate the median weight of these carrots.
  2. Find an estimate for the mean weight of these carrots.
  3. Find an estimate for the standard deviation of the weights of these carrots. A carrot is selected at random from Nina's shop.
  4. Estimate the probability that the weight of this carrot is more than 70 grams.
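The first three parts can be sketched in Python. The sketch assumes class boundaries at the half-gram (44.5 to 54.5 and so on, because weights are recorded to the nearest gram), the \(n/2\)th value as the median position, and divisor \(n\) for the standard deviation; none of these conventions is stated in the question.

```python
from math import sqrt

# (lower boundary, upper boundary, frequency), boundaries assumed at the half-gram
classes = [(44.5, 54.5, 5), (54.5, 59.5, 10), (59.5, 64.5, 22), (64.5, 74.5, 13)]
sum_fx2 = 192102.5  # given in the question

n = sum(f for _, _, f in classes)
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)

mean_x = sum_fx / n
sd_x = sqrt(sum_fx2 / n - mean_x ** 2)  # divisor n, assumed

# Linear interpolation for the median within the class that holds the n/2-th value
target = n / 2
cumulative = 0
for lo, hi, f in classes:
    if cumulative + f >= target:
        median_x = lo + (target - cumulative) / f * (hi - lo)
        break
    cumulative += f

print(f"median = {median_x:.1f}, mean = {mean_x:.1f}, sd = {sd_x:.2f} (grams)")
```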
Edexcel S1 2017 June Q2
11 marks Easy -1.2
2. The box plot shows the times, \(t\) minutes, it takes a group of office workers to travel to work.
[Figure: box plot of travel times, \(t\) minutes]
  1. Find the range of the times.
  2. Find the interquartile range of the times.
  3. Using the quartiles, describe the skewness of these data. Give a reason for your answer. Chetna believes that house prices will be higher if the time to travel to work is shorter. She asks a random sample of these office workers for their house prices \(\pounds x\), where \(x\) is measured in thousands, and obtains the following statistics $$\mathrm { S } _ { x x } = 5514 \quad \mathrm {~S} _ { x t } = 10 \quad \mathrm {~S} _ { t t } = 1145.6$$
  4. Calculate the product moment correlation coefficient between \(x\) and \(t\).
  5. State, giving a reason, whether or not your correlation coefficient supports Chetna's belief. Adam and Betty are part of the group of office workers and they have both moved house. Adam's time to travel to work changes from 32 minutes to 36 minutes. Betty's time to travel to work changes from 38 minutes to 58 minutes. Outliers are defined as values that are more than 1.5 times the interquartile range above the upper quartile.
  6. Showing all necessary calculations, determine how the box plot of times to travel to work will change and draw a new box plot on the grid on page 5.
[Figure: blank grid for the new box plot]
Edexcel S1 2017 June Q5
15 marks Moderate -0.3
  1. Tomas is studying the relationship between temperature and hours of sunshine in Seapron. He records the midday temperature, \(t ^ { \circ } \mathrm { C }\), and the hours of sunshine, \(s\) hours, for a random sample of 9 days in October. He calculated the following statistics
$$\sum s = 15 \quad \sum s ^ { 2 } = 44.22 \quad \sum t = 127 \quad \mathrm {~S} _ { t t } = 10.89$$
  1. Calculate \(\mathrm { S } _ { s s }\).
    Tomas calculated the product moment correlation coefficient between \(s\) and \(t\) to be 0.832 correct to 3 decimal places.
  2. State, giving a reason, whether or not this correlation coefficient supports the use of a linear regression model to describe the relationship between midday temperature and hours of sunshine.
  3. State, giving a reason, why the hours of sunshine would be the explanatory variable in a linear regression model between midday temperature and hours of sunshine.
  4. Find \(\mathrm { S } _ { s t }\).
  5. Calculate a suitable linear regression equation to model the relationship between midday temperature and hours of sunshine.
  6. Calculate the standard deviation of \(s\).
    Tomas uses this model to estimate the midday temperature in Seapron for a day in October with 5 hours of sunshine.
  7. State the value of Tomas' estimate.
    Given that the values of \(s\) are all within 2 standard deviations of the mean,
  8. comment, giving your reason, on the reliability of this estimate.
Edexcel S1 2017 October Q5
13 marks Moderate -0.8
  1. A company wants to pay its employees according to their performance at work. Last year's performance score \(x\) and annual salary \(y\), in thousands of dollars, were recorded for a random sample of 10 employees of the company.
The performance scores were $$\begin{array} { l l l l l l l l l l } 15 & 24 & 32 & 39 & 41 & 18 & 16 & 22 & 34 & 42 \end{array}$$ (You may use \(\sum x ^ { 2 } = 9011\) )
  1. Find the mean and the variance of these performance scores. The corresponding \(y\) values for these 10 employees are summarised by $$\sum y = 306.1 \quad \text { and } \quad \mathrm { S } _ { y y } = 546.3$$
  2. Find the mean and the variance of these \(y\) values. The regression line of \(y\) on \(x\) based on this sample is $$y = 12.0 + 0.659 x$$
  3. Find the product moment correlation coefficient for these data.
  4. State, giving a reason, whether or not the value of the product moment correlation coefficient supports the use of a regression line to model the relationship between performance score and annual salary. The company decides to use this regression model to determine future salaries.
  5. Find the proposed annual salary, in dollars, for an employee who has a performance score of 35
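The first three parts can be sketched in Python. This is a minimal illustration, assuming variances use divisor \(n\), and using the identity \(r = b \sqrt{ S_{xx} / S_{yy} }\), which follows from \(b = S_{xy} / S_{xx}\) and \(r = S_{xy} / \sqrt{ S_{xx} S_{yy} }\).

```python
from math import sqrt
from statistics import mean, pvariance

# Performance scores listed in the question
scores = [15, 24, 32, 39, 41, 18, 16, 22, 34, 42]
n = len(scores)

mean_x = mean(scores)
var_x = pvariance(scores)  # population variance (divisor n), assumed

# Salary summary statistics
sum_y, syy = 306.1, 546.3
mean_y = sum_y / n
var_y = syy / n

# Correlation from the regression gradient b = 0.659
sxx = n * var_x
b = 0.659
r = b * sqrt(sxx / syy)

print(f"x: mean = {mean_x:.1f}, variance = {var_x:.2f}")
print(f"y: mean = {mean_y:.2f}, variance = {var_y:.2f}, r = {r:.3f}")
```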
Edexcel S1 2021 October Q2
12 marks Moderate -0.5
2. A large company is analysing how much money it spends on paper in its offices each year. The number of employees in the office, \(x\), and the amount spent on paper in a year, \(p\) (\$ hundreds), in each of 12 randomly selected offices were recorded. The results are summarised in the following statistics. $$\sum x = 93 \quad \mathrm {~S} _ { x x } = 148.25 \quad \sum p = 273 \quad \sum p ^ { 2 } = 6602.72 \quad \sum x p = 2347$$
  1. Show that \(\mathrm { S } _ { x p } = 231.25\)
  2. Find the product moment correlation coefficient for these data.
  3. Find the equation of the regression line of \(p\) on \(x\) in the form \(p = a + b x\)
  4. Give an interpretation of the gradient of your regression line. The director of the company wants to reduce the amount spent on paper each year. He wants each office to aim for a model of the form \(p = \frac { 4 } { 5 } a + \frac { 1 } { 2 } b x\), where \(a\) and \(b\) are the values found in part (c). Using the data for the 93 employees from the 12 offices,
  5. estimate the percentage saving in the amount spent on paper each year by the company using the director's model.
Edexcel S1 Q5
Moderate -0.3
5. The following grouped frequency distribution summarises the number of minutes, to the nearest minute, that a random sample of 200 motorists were delayed by roadworks on a stretch of motorway.
$$\begin{array}{l|r}
\text{Delay (mins)} & \text{Number of motorists} \\ \hline
4 - 6 & 15 \\
7 - 8 & 28 \\
9 & 49 \\
10 & 53 \\
11 - 12 & 30 \\
13 - 15 & 15 \\
16 - 20 & 10
\end{array}$$
  1. Using graph paper represent these data by a histogram.
  2. Give a reason to justify the use of a histogram to represent these data.
  3. Use interpolation to estimate the median of this distribution.
  4. Calculate an estimate of the mean and an estimate of the standard deviation of these data. One coefficient of skewness is given by $$\frac { 3 ( \text { mean } - \text { median } ) } { \text { standard deviation } } .$$
  5. Evaluate this coefficient for the above data.
  6. Explain why the normal distribution may not be suitable to model the number of minutes that motorists are delayed by these roadworks.
Edexcel S1 2003 June Q3
10 marks Moderate -0.8
3. A company owns two petrol stations \(P\) and \(Q\) along a main road. Total daily sales in the same week for \(P ( \pounds p )\) and for \(Q ( \pounds q )\) are summarised in the table below.
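$$\begin{array}{l|rr}
 & p & q \\ \hline
\text{Monday} & 4760 & 5380 \\
\text{Tuesday} & 5395 & 4460 \\
\text{Wednesday} & 5840 & 4640 \\
\text{Thursday} & 4650 & 5450 \\
\text{Friday} & 5365 & 4340 \\
\text{Saturday} & 4990 & 5550 \\
\text{Sunday} & 4365 & 5840
\end{array}$$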
When these data are coded using \(x = \frac { p - 4365 } { 100 }\) and \(y = \frac { q - 4340 } { 100 }\), $$\Sigma x = 48.1 , \Sigma y = 52.8 , \Sigma x ^ { 2 } = 486.44 , \Sigma y ^ { 2 } = 613.22 \text { and } \Sigma x y = 204.95 .$$
  1. Calculate \(S _ { x y } , S _ { x x }\) and \(S _ { y y }\).
  2. Calculate, to 3 significant figures, the value of the product moment correlation coefficient between \(x\) and \(y\).
    1. Write down the value of the product moment correlation coefficient between \(p\) and \(q\).
    2. Give an interpretation of this value.
Edexcel S1 2003 June Q6
16 marks Moderate -0.8
6. The number of bags of potato crisps sold per day in a bar was recorded over a two-week period. The results are shown below. $$20,15,10,30,33,40,5,11,13,20,25,42,31,17$$
  1. Calculate the mean of these data.
  2. Draw a stem and leaf diagram to represent these data.
  3. Find the median and the quartiles of these data. An outlier is an observation that falls either \(1.5 \times\) (interquartile range) above the upper quartile or \(1.5 \times\) (interquartile range) below the lower quartile.
  4. Determine whether or not any items of data are outliers.
  5. On graph paper draw a box plot to represent these data. Show your scale clearly.
  6. Comment on the skewness of the distribution of bags of crisps sold per day. Justify your answer.
AQA S1 2007 January Q1
9 marks Easy -1.2
1 The times, in seconds, taken by 20 people to solve a simple numerical puzzle were
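$$\begin{array} { l l l l l l l l l l } 17 & 19 & 22 & 26 & 28 & 31 & 34 & 36 & 38 & 39 \\ 41 & 42 & 43 & 47 & 50 & 51 & 53 & 55 & 57 & 58 \end{array}$$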
  1. Calculate the mean and the standard deviation of these times.
  2. In fact, 23 people solved the puzzle. However, 3 of them failed to solve it within the allotted time of 60 seconds. Calculate the median and the interquartile range of the times taken by all 23 people.
    (4 marks)
  3. For the times taken by all 23 people, explain why:
    1. the mode is not an appropriate numerical measure;
    2. the range is not an appropriate numerical measure.
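The first part, for the 20 recorded times, can be sketched in Python with the standard library. The question does not say which divisor to use for the standard deviation, so the sketch reports both the divisor-\(n\) and divisor-\((n - 1)\) versions.

```python
from statistics import mean, pstdev, stdev

# The 20 recorded times in seconds
times = [17, 19, 22, 26, 28, 31, 34, 36, 38, 39,
         41, 42, 43, 47, 50, 51, 53, 55, 57, 58]

mean_t = mean(times)
sd_n = pstdev(times)   # divisor n
sd_n1 = stdev(times)   # divisor n - 1

print(f"mean = {mean_t:.2f} s, sd (n) = {sd_n:.2f} s, sd (n - 1) = {sd_n1:.2f} s")
```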