| Exam Board | Edexcel |
|---|---|
| Module | S1 (Statistics 1) |
| Year | 2013 |
| Session | June |
| Marks | 13 |
| Topic | Data representation |
| Type | Estimate mean and standard deviation from frequency table |
| Difficulty | Moderate (-0.8). This is a routine S1 statistics question testing standard procedures: histogram bar calculations using frequency density, linear interpolation for the median, mean and standard deviation formulas for grouped data (with summations provided), skewness interpretation, and a simple counting application. All techniques are textbook exercises that need only methodical application of formulas, with no problem-solving insight required. |
| Spec | 2.02b Histogram: area represents frequency; 2.02g Calculate mean and standard deviation; 2.02h Recognize outliers |

| Answer | Marks | Total | Guidance |
|---|---|---|---|
| **(a)** Width $= 2 \times 1.5 = \mathbf{3}$ (cm). Area $= 8 \times 1.5 = 12 \text{ cm}^2$ and frequency $= 24$, so $1 \text{ cm}^2 = 2$ plants (o.e.). Frequency of 12 corresponds to an area of $6 \text{ cm}^2$, so height $= \mathbf{2}$ (cm) | B1, M1, A1 | (3) | B1 for width = 3 (cm); M1 for forming a relationship between area and number of plants, or their width $\times$ their height = 6; A1 for height of 2 (cm). Make sure the 2 refers to height and not plants! |
| **(b)** $[Q_2] = (5+) \frac{19}{24} \times 5 = 8.9583... = \text{awrt } \mathbf{8.96}$, or (use of $(n+1)$) $(5+) \frac{19.5}{24} \times 5 = 9.0625... = \text{awrt } \mathbf{9.06}$ | M1, A1 | (2) | M1 for suitable fraction $\times 5$ (ignore end points); A1 for awrt 8.96 (or $\frac{215}{24}$ or $8\frac{23}{24}$), or 9.06 (or $\frac{145}{16}$ or $9\frac{1}{16}$) if using $(n+1)$ |
| **(c)** $[\bar{x}] = \frac{755}{70}$ or awrt 10.8 | B1 | | B1 for correct mean. Accept exact fraction or awrt 10.8 |
| $[\sigma_x] = \sqrt{\frac{12037.5}{70} - \bar{x}^2} = \sqrt{55.6326...} = \text{awrt } \mathbf{7.46}$ (Accept $s = \text{awrt } 7.51$) | M1, A1ft, A1 | (4) | M1 for correct expression for $\sigma$ or $\sigma^2$. Condone mixed-up labelling; ft their mean. A1ft for correct expression, ft their mean, but must have square root. A1 for awrt 7.46 (or awrt 7.51 if using $s$). Condone correct working and answer called variance. |
| **(d)** $\bar{x} > Q_2$ | B1ft | | 1st B1ft for correct comparison of their $\bar{x}$ and their $Q_2$ |
| So positive skew | dB1 | (2) | 2nd dB1 dependent on a suitable reason for concluding "positive skew". "Correlation" is B0 |
| | | | **ALT** Allow use of a formula for skewness that involves $(\bar{x} - Q_2)$, or use of quartiles, but must have correct values. NB $Q_1 = 5.31$, $Q_3 = 14.46$ (awrt 14.5), $Q_3 - Q_2 \approx 5.5$, $Q_2 - Q_1 \approx 3.6$ or $3.7$ |
| **(e)** $\bar{x} + \sigma \approx 18.3$, so number of plants is e.g. $\frac{(25 - "18.3")}{10} \times 12 \; (+4)$ (o.e.) $= 12.04$, so $\mathbf{12}$ plants | M1, A1 | (2) | M1 for suitable expression involving some interpolation (condone missing 4, so accept awrt 8); A1 for 12 (condone awrt 12). Answer only 2/2 |
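
The mark scheme arithmetic can be checked numerically. The following is a minimal Python sketch, not part of the original mark scheme; the helper name `interpolate_quantile` and all variable names are illustrative assumptions. It uses the $n/2$ convention for the median and the population standard deviation $\sigma$, matching the primary answers above.

```python
import math

# Class boundaries (kg) and frequencies taken from the question's table.
bounds = [(0, 5), (5, 10), (10, 15), (15, 25), (25, 35)]
freqs = [16, 24, 14, 12, 4]
n = sum(freqs)  # 70 plants


def interpolate_quantile(position):
    """Hypothetical helper: linear interpolation within the class that
    contains the given cumulative-frequency position."""
    cumulative = 0
    for (lo, hi), f in zip(bounds, freqs):
        if cumulative + f >= position:
            return lo + (position - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("position exceeds total frequency")


# (a) Scale from the given bar: 5 kg wide -> 1.5 cm, frequency 24 -> 8 cm tall.
cm_per_kg = 1.5 / 5
plants_per_cm2 = 24 / (1.5 * 8)
width_cm = (25 - 15) * cm_per_kg
height_cm = 12 / plants_per_cm2 / width_cm

# (b) Median at cumulative position n/2 (the mark scheme also allows (n+1)/2).
median = interpolate_quantile(n / 2)

# (c) Mean and population standard deviation from class midpoints.
midpoints = [(lo + hi) / 2 for lo, hi in bounds]
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
mean = sum_fx / n
sigma = math.sqrt(sum_fx2 / n - mean ** 2)

# (e) Plants above mean + sigma: part of the 15-25 class plus all 4 in 25-35.
threshold = mean + sigma
plants_above = (25 - threshold) / 10 * 12 + 4

print(width_cm, height_cm, median, mean, sigma, plants_above)
```

With unrounded values the estimate in part (e) differs slightly from the mark scheme's 12.04, which uses the rounded threshold 18.3, but both round to 12 plants.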
---
3. An agriculturalist is studying the yields, $y \mathrm {~kg}$, from tomato plants. The data from a random sample of 70 tomato plants are summarised below.
\begin{center}
\begin{tabular}{|l|l|l|}
\hline
Yield ( $y \mathrm {~kg}$ ) & Frequency (f) & Yield midpoint ( $x \mathrm {~kg}$ ) \\
\hline
$0 \leqslant y < 5$ & 16 & 2.5 \\
\hline
$5 \leqslant y < 10$ & 24 & 7.5 \\
\hline
$10 \leqslant y < 15$ & 14 & 12.5 \\
\hline
$15 \leqslant y < 25$ & 12 & 20 \\
\hline
$25 \leqslant y < 35$ & 4 & 30 \\
\hline
\end{tabular}
\end{center}
$$\text{(You may use } \sum \mathrm{f}x = 755 \text{ and } \sum \mathrm{f}x^{2} = 12037.5\text{)}$$
A histogram has been drawn to represent these data.
The bar representing the yield $5 \leqslant y < 10$ has a width of 1.5 cm and a height of 8 cm.
\begin{enumerate}[label=(\alph*)]
\item Calculate the width and the height of the bar representing the yield $15 \leqslant y < 25$.
\item Use linear interpolation to estimate the median yield of the tomato plants.
\item Estimate the mean and the standard deviation of the yields of the tomato plants.
\item Describe, giving a reason, the skewness of the data.
\item Estimate the number of tomato plants in the sample that have a yield of more than 1 standard deviation above the mean.
\end{enumerate}
\hfill \mbox{\textit{Edexcel S1 2013 Q3 [13]}}