Calculate statistics from large data set

Questions that require students to compute summary statistics (mean, standard deviation, frequencies) from given data extracted from the large data set.

5 questions · Easy -1.1

OCR MEI AS Paper 2 2019 June Q6
13 marks Moderate -0.8
6 The large data set gives information about life expectancy at birth for males and females in different London boroughs. Fig. 6.1 shows summary statistics for female life expectancy at birth for the years 2012-2014. Fig. 6.2 shows summary statistics for male life expectancy at birth for the years 2012-2014.
\section*{Female Life Expectancy at Birth}
\begin{table}[h]
\begin{tabular}{|l|l|}
\hline
n & 32 \\
\hline
Mean & 84.2313 \\
\hline
s & 1.1563 \\
\hline
\(\sum x\) & 2695.4 \\
\hline
\(\sum x ^ { 2 }\) & 227078.36 \\
\hline
Min & 82.1 \\
\hline
Q1 & 83.45 \\
\hline
Median & 84 \\
\hline
Q3 & 84.9 \\
\hline
Max & 86.7 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 6.1}
\end{table}
\section*{Male Life Expectancy at Birth}
\begin{table}[h]
\begin{tabular}{|l|l|}
\hline
n & 32 \\
\hline
Mean & 80.2844 \\
\hline
s & 1.4294 \\
\hline
\(\sum x\) & 2569.1 \\
\hline
\(\sum x ^ { 2 }\) & 206321.93 \\
\hline
Min & 77.6 \\
\hline
Q1 & 79 \\
\hline
Median & 80.25 \\
\hline
Q3 & 81.15 \\
\hline
Max & 83.3 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 6.2}
\end{table}
  1. Use the information in Fig. 6.1 and Fig. 6.2 to draw two box plots. Draw one box plot for female life expectancy at birth in London boroughs and one box plot for male life expectancy at birth in London boroughs.
  2. Compare and contrast the distribution of male life expectancy at birth with the distribution of female life expectancy at birth in London boroughs in 2012-2014. Lorraine, who lives in Lancashire, says she wishes her daughter (who was born in 2013) had been born in the London borough of Barnet, because her daughter would have had a higher life expectancy.
  3. Give two reasons why there is no evidence in the large data set to support Lorraine's comment.
  4. Use the mean and standard deviation for the summary statistics given in Fig. 6.1 and Fig. 6.2 to show that there is at least one outlier in each set. The scatter diagram in Fig. 6.3 shows male life expectancy at birth plotted against female life expectancy at birth for London boroughs in 2012-14. The outliers have been removed. Male life expectancy at birth against female life expectancy at birth \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{11e5167f-9f95-4494-9b66-b59fdce8b1ef-5_593_1054_1260_246} \captionsetup{labelformat=empty} \caption{Fig. 6.3}
    \end{figure}
  5. Describe the association between male life expectancy at birth and female life expectancy at birth in London boroughs in 2012-14.
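The outlier check in part 4 can be sketched numerically. This is a minimal illustration, assuming the common convention that an outlier lies more than 2 standard deviations from the mean (the question itself does not state the rule); the dictionary layout and the function name `outlier_limits` are illustrative choices, not part of the question.

```python
# Summary statistics copied from Fig. 6.1 and Fig. 6.2.
summaries = {
    "female": {"mean": 84.2313, "s": 1.1563, "min": 82.1, "max": 86.7},
    "male": {"mean": 80.2844, "s": 1.4294, "min": 77.6, "max": 83.3},
}


def outlier_limits(mean, s, k=2):
    """Return (lower, upper) limits lying k standard deviations from the mean.

    k=2 is an assumed convention, not something stated in the question.
    """
    return mean - k * s, mean + k * s


for label, stats in summaries.items():
    lower, upper = outlier_limits(stats["mean"], stats["s"])
    # Only the extremes are known, so test the minimum and maximum.
    low_outlier = stats["min"] < lower
    high_outlier = stats["max"] > upper
    print(
        f"{label}: limits ({lower:.4f}, {upper:.4f}); "
        f"min is an outlier: {low_outlier}; max is an outlier: {high_outlier}"
    )
```

Under this assumption the maximum in each table lies above the upper limit, which is enough to show at least one outlier in each set.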
AQA AS Paper 2 2019 June Q13
6 marks Easy -1.2
13 Denzel wants to buy a car with a propulsion type other than petrol or diesel.
He takes a sample, from the Large Data Set, of the CO₂ emissions, in \(\mathrm { g } / \mathrm { km }\), of cars with one particular propulsion type. The sample is as follows $$\begin{array} { l l l l l l l l } 82 & 13 & 96 & 49 & 96 & 92 & 70 & 81 \end{array}$$
  1. Using your knowledge of the Large Data Set, state which propulsion type this sample is for, giving a reason for your answer.
  2. Calculate the mean of the sample.
  3. Calculate the standard deviation of the sample.
  4. Denzel claims that the value 13 is an outlier.
    (i) Any value more than 2 standard deviations from the mean can be regarded as an outlier. Verify that Denzel's claim is correct.
    (ii) State what effect, if any, removing the value 13 from the sample would have on the standard deviation.
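The calculations in parts 2 to 4 can be checked with a short sketch using Python's standard `statistics` module. The question does not say whether the standard deviation should use divisor n or n-1, so both are computed here; the variable names are illustrative.

```python
import statistics

# CO2 emissions in g/km, copied from the question.
sample = [82, 13, 96, 49, 96, 92, 70, 81]

mean = statistics.mean(sample)
sd_n = statistics.pstdev(sample)   # divisor n
sd_n1 = statistics.stdev(sample)   # divisor n - 1

for name, sd in [("divisor n", sd_n), ("divisor n-1", sd_n1)]:
    lower = mean - 2 * sd
    # 13 is an outlier under the stated rule if it lies below mean - 2 sd.
    print(f"{name}: sd = {sd:.2f}, lower limit = {lower:.2f}, "
          f"13 is an outlier: {13 < lower}")

# Effect of removing the value 13 on the standard deviation.
reduced = [x for x in sample if x != 13]
print(f"sd without 13 (divisor n): {statistics.pstdev(reduced):.2f}")
print(f"sd without 13 (divisor n-1): {statistics.stdev(reduced):.2f}")
```

With either divisor, 13 falls below the lower limit, and the standard deviation decreases once it is removed.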
AQA AS Paper 2 2020 June Q15
3 marks Moderate -0.8
A random sample of ten CO₂ emissions was selected from the Large Data Set. The emissions in grams per kilometre were: $$\begin{array} { l l l l l l l l l l } 13 & 45 & 45 & 0 & 49 & 77 & 49 & 49 & 49 & 78 \end{array}$$
  1. Find the standard deviation of the sample. [1 mark]
  2. An environmentalist calculated the average CO₂ emissions for cars in the Large Data Set registered in 2002 and in 2016. The averages are listed below.
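    $$\begin{array} { | l | c | c | } \hline \text{Year of registration} & 2002 & 2016 \\ \hline \text{Average CO}_2 \text{ emission} & 171.2 & 120.4 \\ \hline \end{array}$$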
    The environmentalist claims that the average CO₂ emissions for 2002 and 2016 combined is 145.8. Determine whether this claim is correct. Fully justify your answer. [2 marks]
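Both parts can be explored with a short sketch. The sample values and the two yearly averages come from the question; the car counts passed to `combined_mean` are hypothetical, chosen only to show how a combined mean depends on group sizes, and the function name is illustrative.

```python
import statistics

# Part 1: the ten CO2 emissions from the question.
sample = [13, 45, 45, 0, 49, 77, 49, 49, 49, 78]
print(f"sd (divisor n): {statistics.pstdev(sample):.2f}")
print(f"sd (divisor n-1): {statistics.stdev(sample):.2f}")


def combined_mean(mean_a, n_a, mean_b, n_b):
    """Mean of two groups combined, weighted by the group sizes."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


# Part 2: equal group sizes reproduce the claimed value of 145.8.
equal = combined_mean(171.2, 1, 120.4, 1)

# HYPOTHETICAL counts (not taken from the Large Data Set): unequal
# group sizes give a different combined mean.
unequal = combined_mean(171.2, 100, 120.4, 300)

print(f"equal counts: {equal:.1f}, hypothetical unequal counts: {unequal:.1f}")
```

The simple average of the two yearly figures matches the combined mean only when both years contain the same number of cars, so the claim depends on information the two averages alone do not supply.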
AQA AS Paper 2 2024 June Q16
5 marks Easy -1.8
An investigation into the hydrocarbon emissions, \(X\) g/km, from cars in the Large Data Set was carried out. The results are summarised below. $$\sum x = 128.657 \qquad \sum x^2 = 8.701 \, 707 \qquad n = 2405$$ where \(n\) is the total number of cars which had a measured hydrocarbon emission in the Large Data Set.
    (a)(i) Find the mean of \(X\) [1 mark]
    (a)(ii) Find the standard deviation of \(X\) [2 marks]
    (b)(i) The Large Data Set is a sample taken from the entire UK Department for Transport Stock Vehicle Database. It is claimed that the values in part (a)(i) and part (a)(ii) obtained from the Large Data Set should be reliable estimates for the mean and standard deviation of the hydrocarbon emissions for the entire UK Department for Transport Stock Vehicle Database. State, with a reason, whether this claim is likely to be correct. [1 mark]
    (b)(ii) State one type of emission where more than 80% of the data is known for cars in the entire UK Department for Transport Stock Vehicle Database. [1 mark]
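The mean and standard deviation can be computed directly from the summary totals. This is a minimal sketch; the question does not specify the divisor, so the standard deviation is shown with both n and n-1, which differ very little for a sample this large. Variable names are illustrative.

```python
import math

# Summary totals copied from the question.
n = 2405
sum_x = 128.657
sum_x2 = 8.701707

mean = sum_x / n

# Sum of squared deviations: S_xx = sum(x^2) - n * mean^2.
s_xx = sum_x2 - n * mean ** 2

sd_n = math.sqrt(s_xx / n)         # divisor n
sd_n1 = math.sqrt(s_xx / (n - 1))  # divisor n - 1

print(f"mean = {mean:.4f} g/km")
print(f"sd (divisor n) = {sd_n:.4f} g/km")
print(f"sd (divisor n-1) = {sd_n1:.4f} g/km")
```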
AQA Paper 3 2021 June Q13
6 marks Moderate -0.8
The table below is an extract from the Large Data Set.
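\begin{tabular}{|c|l|c|c|c|c|}
\hline
Propulsion Type & Region & Engine Size & Mass & CO₂ & Particulate Emissions \\
\hline
2 & London & 1896 & 1533 & 154 & 0.04 \\
\hline
2 & North West & 1896 & 1423 & 146 & 0.029 \\
\hline
2 & North West & 1896 & 1353 & 138 & 0.025 \\
\hline
2 & South West & 1998 & 1547 & 159 & 0.026 \\
\hline
2 & London & 1896 & 1388 & 138 & 0.025 \\
\hline
2 & South West & 1896 & 1214 & 130 & 0.011 \\
\hline
2 & South West & 1896 & 1480 & 146 & 0.029 \\
\hline
2 & South West & 1896 & 1413 & 146 & 0.024 \\
\hline
2 & South West & 2496 & 1695 & 192 & 0.034 \\
\hline
2 & South West & 1422 & 1251 & 122 & 0.025 \\
\hline
2 & South West & 1995 & 2075 & 175 & 0.034 \\
\hline
2 & London & 1896 & 1285 & 140 & 0.036 \\
\hline
2 & North West & 1896 & 0 & 146 & \\
\hline
\end{tabular}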
    (a)(i) Calculate the mean and standard deviation of CO₂ emissions in the table. [2 marks]
    (a)(ii) Any value more than 2 standard deviations from the mean can be identified as an outlier. Determine, using this definition of an outlier, if there are any outliers in this sample of CO₂ emissions. Fully justify your answer. [2 marks]
    (b) Maria claims that the last line in the table must contain two errors. Use your knowledge of the Large Data Set to comment on Maria's claim. [2 marks]
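The mean, standard deviation and outlier test for the CO₂ column can be sketched as follows. The values are read from the CO₂ column of the table above; because the question does not state the divisor, the check is run with both n and n-1. The function name `find_outliers` is illustrative.

```python
import statistics

# CO2 column of the extract, in table order.
co2 = [154, 146, 138, 159, 138, 130, 146, 146, 192, 122, 175, 140, 146]

mean = statistics.mean(co2)
sd_n = statistics.pstdev(co2)   # divisor n
sd_n1 = statistics.stdev(co2)   # divisor n - 1


def find_outliers(values, centre, sd, k=2):
    """Return the values lying more than k standard deviations from centre."""
    return [x for x in values if abs(x - centre) > k * sd]


for name, sd in [("divisor n", sd_n), ("divisor n-1", sd_n1)]:
    lower, upper = mean - 2 * sd, mean + 2 * sd
    print(f"{name}: mean = {mean:.1f}, sd = {sd:.1f}, "
          f"limits = ({lower:.1f}, {upper:.1f}), "
          f"outliers = {find_outliers(co2, mean, sd)}")
```

With either divisor the only value outside the limits is 192.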