350 questions · 45 question types identified
Questions that provide a pre-constructed stem-and-leaf diagram (single or back-to-back) and ask to find median, quartiles, or IQR directly from it.
Questions that provide a discrete frequency distribution (where x takes specific values like 0, 1, 2, 3, etc.) and ask to calculate mean, variance, or standard deviation.
A question is this type if and only if it provides Σ(x - c) and Σx (or similar coded sums) and asks to find n or the mean.
Questions that ask to calculate unbiased estimates of population mean and/or variance from given sample data using standard formulas, without additional constraints or reverse-engineering.
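As a rough illustration of the standard formulas this type relies on, the sketch below computes the unbiased estimates from n, Σx and Σx². The function name `unbiased_estimates` and the numbers are made up for illustration and do not come from any question.

```python
def unbiased_estimates(n, sum_x, sum_x2):
    """Return unbiased estimates (mean, variance) from n, Σx and Σx²."""
    mean = sum_x / n
    # Divide by n - 1 (not n) for the unbiased variance estimate.
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance

# Illustrative values only: the sample 2, 4, 6, 8 has Σx = 20, Σx² = 120.
mean, var = unbiased_estimates(n=4, sum_x=20, sum_x2=120)
print(mean, var)
```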
Questions where both groups are given with either (mean, SD, n) or (Σx, Σx², n) directly, and you combine them using standard formulas for pooled mean and variance.
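One possible way to sketch the combination step: convert each group back to Σx and Σx², add the sums, then recompute. This assumes each SD uses divisor n (not n - 1); the helper names and the two example groups are hypothetical.

```python
import math

def to_sums(mean, sd, n):
    """Recover (Σx, Σx²) from a group's mean, SD (divisor n) and size."""
    sum_x = n * mean
    sum_x2 = n * (sd ** 2 + mean ** 2)
    return sum_x, sum_x2

def combine(groups):
    """Combine groups given as (mean, sd, n) into one (mean, sd, n)."""
    n = sum(g[2] for g in groups)
    sums = [to_sums(*g) for g in groups]
    sum_x = sum(s[0] for s in sums)
    sum_x2 = sum(s[1] for s in sums)
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2
    return mean, math.sqrt(variance), n

# Illustrative groups only: (mean, sd, n).
mean, sd, n = combine([(10, 2, 5), (20, 4, 15)])
print(mean, sd, n)
```

If a group is already given as (Σx, Σx², n), the `to_sums` step is simply skipped for that group.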
Questions that provide raw ungrouped data (a list of individual values) and ask to calculate mean, variance, or standard deviation directly from those values.
Questions that provide summary statistics like Σx, Σx², n, or mean and ask to calculate variance, standard deviation, or Σ(x - x̄)² using algebraic formulas.
Questions that provide two datasets as raw lists of numbers and ask students to construct or draw a back-to-back stem-and-leaf diagram from scratch.
Questions that provide an already-constructed back-to-back stem-and-leaf diagram and ask students to interpret, analyse, or extract information from it without constructing one.
| \(A\) | | \(B\) | |
| 310 | 15 | 1335 | (4) |
| 41 | 16 | 2234457778 | |
| 833 | 17 | 01333466799 | (11) |
| 988655432110 | 18 | 247 | (3) |
| 99886542 | 19 | 15 | (2) |
| 98710 | 20 | 4 | (1) |
Given original mean and/or standard deviation (or raw data to calculate them), and a linear transformation y = ax + b, find the mean and standard deviation of y.
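The rule behind this type can be sketched in a few lines: under y = ax + b the mean is scaled and shifted, while the standard deviation is only scaled by |a|. The function name and the numbers below are illustrative assumptions.

```python
def transform_stats(mean_x, sd_x, a, b):
    """Mean and SD of y = a*x + b given the mean and SD of x."""
    mean_y = a * mean_x + b
    # Adding b does not change spread; a negative a still gives a positive SD.
    sd_y = abs(a) * sd_x
    return mean_y, sd_y

# Illustrative values only.
mean_y, sd_y = transform_stats(mean_x=12.0, sd_x=3.0, a=-2, b=5)
print(mean_y, sd_y)
```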
Questions that provide raw data and explicitly ask students to first construct a stem-and-leaf diagram, then find median, quartiles, or IQR from their diagram.
| 115 | 120 | 158 | 132 | 125 |
| 104 | 142 | 160 | 145 | 104 |
| 162 | 117 | 109 | 124 | 134 |
Given Σ(x - c) and Σ(x - c)², calculate variance or standard deviation directly using the standard formula Var(x) = Σ(x - c)²/n - [Σ(x - c)/n]².
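A minimal sketch of the formula quoted above, writing d = x - c. Subtracting a constant leaves the variance unchanged, so only the mean needs shifting back by c. The function name and data are hypothetical.

```python
import math

def coded_stats(n, sum_d, sum_d2, c):
    """Mean, variance and SD of x from Σ(x - c) and Σ(x - c)²."""
    mean_d = sum_d / n
    variance = sum_d2 / n - mean_d ** 2  # same as Var(x)
    mean_x = c + mean_d                  # undo the coding for the mean
    return mean_x, variance, math.sqrt(variance)

# Illustrative data only: x = 101, 103, 106 with c = 100,
# so Σ(x - c) = 10 and Σ(x - c)² = 46.
mean_x, var_x, sd_x = coded_stats(n=3, sum_d=10, sum_d2=46, c=100)
print(mean_x, var_x, sd_x)
```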
A question is this type if and only if it asks to construct a vertical line chart (bar chart for discrete data) from a frequency table.
| Number of people | 1 | 2 | 3 | 4 |
| Frequency | 50 | 31 | 16 | 5 |
Questions that provide a grouped frequency distribution with class intervals (continuous data grouped into ranges) and ask to calculate mean, variance, or standard deviation using midpoints.
| Area \(( x )\) | \(0 < x \leqslant 3\) | \(3 < x \leqslant 5\) | \(5 < x \leqslant 7\) | \(7 < x \leqslant 10\) | \(10 < x \leqslant 20\) |
| Frequency | 3 | 8 | 13 | 14 | 6 |
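Using the area table above, the midpoint method might be sketched as follows. The results are estimates only, since every value in a class is treated as sitting at its midpoint; the variance uses divisor n, which is an assumption about what the question wants.

```python
# Class boundaries and frequencies copied from the area table above.
classes = [(0, 3), (3, 5), (5, 7), (7, 10), (10, 20)]
freqs = [3, 8, 13, 14, 6]

midpoints = [(lo + hi) / 2 for lo, hi in classes]
n = sum(freqs)
sum_fx = sum(f * m for f, m in zip(freqs, midpoints))
sum_fx2 = sum(f * m ** 2 for f, m in zip(freqs, midpoints))

mean = sum_fx / n                   # estimated mean
variance = sum_fx2 / n - mean ** 2  # estimated variance (divisor n)
sd = variance ** 0.5
print(n, sum_fx, sum_fx2)
print(round(mean, 2), round(sd, 2))
```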
Questions that ask students to explain data cleaning needs, identify variable types, state units, or describe structural features of the large data set without performing calculations.
| where \(n\) is the total number of cars which had a measured hydrocarbon emission in the Large Data Set. |
| Find the mean of \(X\) [1 mark] |
Use a linear transformation to simplify calculations with awkward numbers, then transform back to find statistics of the original variable.
| 1761.6 | 1758.5 | 1762.3 | 1761.4 | 1759.4 | 1759.1 |
| 1762.5 | 1761.9 | 1762.4 | 1761.9 | 1762.8 | 1761.0 |
Questions where one or more data values are added to an existing dataset and the effect on mean and/or standard deviation must be calculated.
Questions that ask students to determine whether specific given values are outliers using the IQR rule (a value is an outlier if it lies below Q₁ - 1.5×IQR or above Q₃ + 1.5×IQR), where the quartiles must be calculated from raw data or are provided.
| 25 | 0 | | |
| 26 | 0 | 5 | 8 |
| 27 | 7 | 9 | |
| 28 | 1 | 4 | 5 |
| 29 | 0 | 0 | 2 |
| 30 | 7 | 7 | |
| 31 | 6 | | |
| 32 | 0 | 4 | 7 |
| 33 | 3 | 3 | |
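For the IQR outlier criterion described for this question type, a small sketch of the check is given below. The function names are hypothetical and the quartiles are invented for illustration; they are not read from the diagram above.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3, k=1.5):
    """True if value lies strictly outside the fences."""
    low, high = iqr_fences(q1, q3, k)
    return value < low or value > high

# Illustrative quartiles only: Q1 = 20, Q3 = 30 gives fences 5 and 45.
low, high = iqr_fences(20, 30)
print(low, high, is_outlier(50, 20, 30), is_outlier(45, 20, 30))
```

Whether a value exactly on a fence counts as an outlier depends on the mark scheme; the sketch treats it as not an outlier.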
A cumulative frequency graph is provided, and the question asks to read off values such as median, quartiles, percentiles, or frequencies at specific points directly from the graph.
A frequency or cumulative frequency table is provided; the question requires drawing a cumulative frequency graph first, then using it to estimate the median, quartiles, or other measures.
Questions that require calculating two or more measures of central tendency (mean, median, mode, or midrange) and commenting on their relative usefulness or appropriateness for the given context.
A question is this type if and only if it asks to calculate a confidence interval for the population mean at a specified confidence level.
Questions that provide data as a simple list or table of numbers and ask to find median, quartiles, or IQR without requiring construction of a stem-and-leaf diagram.
Given raw data values or summary statistics (Σx, Σx², mean, SD), calculate Σ(x - c) and/or Σ(x - c)² for a specified constant c.
Questions that ask to identify, explain, or compare different sampling methods (stratified, systematic, simple random) or discuss advantages/disadvantages of sampling approaches.
Questions where the grouped frequency table uses continuous class intervals (e.g., 0 ≤ t < 20, 20 ≤ t < 30) and the histogram is drawn directly from these boundaries.
| Time taken \(( t\) minutes \()\) | \(0 \leqslant t < 20\) | \(20 \leqslant t < 40\) | \(40 \leqslant t < 50\) | \(50 \leqslant t < 60\) | \(60 \leqslant t < 100\) |
| Frequency | 32 | 46 | 96 | 52 | 24 |
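For the time-taken table above, the bar heights of the histogram are frequency densities (frequency divided by class width). A short sketch of that calculation, with variable names chosen here for illustration:

```python
# Class boundaries and frequencies copied from the time-taken table above.
classes = [(0, 20), (20, 40), (40, 50), (50, 60), (60, 100)]
freqs = [32, 46, 96, 52, 24]

# Frequency density = frequency / class width.
densities = [f / (hi - lo) for f, (lo, hi) in zip(freqs, classes)]
for (lo, hi), d in zip(classes, densities):
    print(f"{lo} <= t < {hi}: {d}")
```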
Questions where data is recorded to the nearest unit (e.g., 10-19, 20-29 to nearest cm) requiring conversion to continuous boundaries (9.5-19.5, 19.5-29.5) before calculating frequency densities.
| Mass (kg) | \(10 - 14\) | \(15 - 19\) | \(20 - 24\) | \(25 - 34\) | \(35 - 59\) |
| Frequency | 6 | 12 | 14 | 10 | 8 |
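For the mass table above, the stated limits have to be widened to continuous boundaries before the widths are found. The sketch below assumes the masses were recorded to the nearest kg, which the table itself does not state.

```python
# Stated class limits and frequencies copied from the mass table above.
stated = [(10, 14), (15, 19), (20, 24), (25, 34), (35, 59)]
freqs = [6, 12, 14, 10, 8]

# Assumption: data recorded to the nearest kg, so extend each limit by 0.5.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in stated]
widths = [hi - lo for lo, hi in boundaries]
densities = [f / w for f, w in zip(freqs, widths)]
print(boundaries)
print(widths)
print(densities)
```

Using the stated limits directly (widths 4, 4, 4, 9, 24) would give the wrong densities, which is the usual error this type is testing for.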
A question is this type if and only if it asks to identify the type of skewness from a diagram or to relate skewness to the positions of mean, median, and mode.
Questions that ask how to use random numbers (from calculators, tables, or generators) to select a sample, including converting random decimals to sample numbers.
Questions that present a dataset with one or more outliers and ask students to choose or explain which measure of central tendency is most appropriate (typically median over mean due to outlier influence).
Given the mean and standard deviation of transformed data y = ax + b, find the mean and standard deviation of the original variable x.
Questions where the unbiased estimate of variance is given and students must work backwards to find an unknown sample value or parameter.
Questions that provide raw data or summary statistics (mean, standard deviation, median, IQR) and ask students to calculate and compare measures of location and spread between two datasets.
| Team \(A\) | 150 | 220 | 77 | 30 | 298 | 118 | 160 | 57 |
| Team \(B\) | 166 | 142 | 170 | 93 | 111 | 130 | 148 | 86 |
Questions that provide only graphical representations (cumulative frequency graphs, box plots, stem-and-leaf diagrams) and ask students to extract and compare features of distributions without calculating statistics from raw data.
Questions that require students to compute summary statistics (mean, standard deviation, frequencies) from given data extracted from the large data set.
Questions where one or more data values are removed from an existing dataset and the effect on mean and/or standard deviation must be calculated.
Questions focused on defining populations, sampling frames, sampling units, or explaining why certain samples might be biased or unsatisfactory.
Questions that ask students to show a value is an outlier using a criterion based on mean ± k×standard deviation (typically k = 2 or k = 3) rather than the IQR rule.
A question is this type if and only if it asks to draw a box-and-whisker plot from summary statistics or to interpret features from a given box plot.
Questions that ask students to critique or validate claims made about the large data set using their knowledge of its limitations, scope, or structure.
Questions requiring calculation of pooled estimates of variance from multiple samples, typically involving combining information from different groups.
Given coded sums with one constant (e.g., Σ(x - c₁)) and asked to find coded sums with a different constant (e.g., Σ(x - c₂)), or convert between linear transformations like y = (x - a)/b.
| \(p\) | 1840 | 1848 | 1830 | 1824 | 1819 | 1834 | 1850 |
| \(q\) | 4.0 | 4.8 | 3.0 | 2.4 | 1.9 | 3.4 | 5.0 |
| House | Price \(( \pounds )\) | Size \(\left( \mathrm { m } ^ { 2 } \right)\) |
| \(H\) | 156400 | 85 |
| \(J\) | 172900 | 95 |
Questions where at least one group's data is given in coded form like Σ(x-a) and Σ(x-a)², requiring decoding before combining groups.
Questions where one or more data values are removed and simultaneously replaced with different values, requiring calculation of the effect on mean and/or standard deviation.
| 51 | 57 | 58 | 59 | 61 | 64 | 64 | 65 | 67 | 68 |
Questions that ask students to give reasons why a particular measure (usually mean or mode) is not suitable for a given dataset without necessarily calculating all alternatives.
Questions not yet assigned to a type.
| \(l\) | \(10 - 12\) | \(13 - 15\) | \(16 - 20\) | \(21 - 30\) |
| Frequency | 1 | 13 | 20 | 6 |