| Exam Board | Edexcel |
|---|---|
| Module | S1 (Statistics 1) |
| Year | 2007 |
| Session | January |
| Marks | 14 |
| Topic | Measures of Location and Spread |
| Type | Histogram from discrete rounded data |
| Difficulty | Moderate (-0.8). This is a routine S1 statistics question testing standard procedures: describing distribution shape, linear interpolation for the median, calculating the mean and standard deviation from summary statistics, and applying a given skewness formula. All parts follow textbook methods with no problem-solving or novel insight required, making it easier than the average A-level question. |
| Spec | 2.02a Interpret single variable data: tables and diagrams; 2.02f Measures of average and spread; 2.02g Calculate mean and standard deviation |

| Distance (to the nearest mile) | Number of commuters |
|---|---|
| 0-9 | 10 |
| 10-19 | 19 |
| 20-29 | 43 |
| 30-39 | 25 |
| 40-49 | 8 |
| 50-59 | 6 |
| 60-69 | 5 |
| 70-79 | 3 |
| 80-89 | 1 |

| Answer | Marks | Guidance |
|---|---|---|
| **(a)** Positive skew (both bits) | B1 | (1 mark) |
| **(b)** $19.5 + \frac{(60 - 29)}{43} \times 10 = 26.7093\ldots$ | M1, A1 | awrt 26.7 |
| (N.B. Use of 60.5 gives 26.825... so allow awrt 26.8) | | (2 marks) |
| **(c)** $\mu = \frac{3550}{120} = 29.5833\ldots$ or $29\frac{7}{12}$ | B1 | awrt **29.6** |
| $\sigma^2 = \frac{138020}{120} - \mu^2$ or $\sigma = \sqrt{\frac{138020}{120} - \mu^2}$ | M1 | |
| $\sigma = 16.5829\ldots$ (or $s = 16.652\ldots$) | A1 | awrt **16.6** (or $s = 16.7$) (3 marks) |
| **(d)** $\frac{3(29.6 - 26.7)}{16.6} = 0.52\ldots$ | M1, A1 f.t., A1 | awrt **0.520** (or with $s$ awrt **0.518**) (3 marks) |
| (N.B. 60.5 in (b) ...awrt 0.499 [or with $s$ awrt 0.497]) | | |
| **(e)** $0.520 > 0$ | B1 f.t. | |
| So it is consistent with their (d) being >0 or <0 | dB1 f.t. | ft their (d) (2 marks) |
| **(f)** Use median | B1 | |
| Since the data is skewed or less affected by outliers/extreme values | dB1 | (2 marks) |
| **(g)** If the data are symmetrical or skewness is zero or normal/uniform distribution ("mean = median" or "no outliers" or "evenly distributed" all score B0) | B1 | (1 mark) |

**Total: 14 marks**
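
The numerical working for parts (b) to (d) can be checked with a short script. The Python sketch below is not part of the mark scheme. The helper name `interpolated_median` and the list of class boundaries are illustrative assumptions: boundaries are taken at -0.5, 9.5, 19.5, and so on, because distances are recorded to the nearest mile, which matches the 19.5 used in part (b).

```python
import math

# Class boundaries and frequencies. The boundaries are an assumption based on
# "to the nearest mile": the class 20-29 is treated as spanning 19.5 to 29.5.
classes = [
    (-0.5, 9.5, 10), (9.5, 19.5, 19), (19.5, 29.5, 43),
    (29.5, 39.5, 25), (39.5, 49.5, 8), (49.5, 59.5, 6),
    (59.5, 69.5, 5), (69.5, 79.5, 3), (79.5, 89.5, 1),
]


def interpolated_median(classes, position):
    """Linear interpolation inside the class that contains `position`."""
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("position exceeds the total frequency")


n = 120
sum_fx = 3550        # given in the question
sum_fx2 = 138020     # given in the question

median = interpolated_median(classes, n / 2)            # (b) uses the 60th value
mean = sum_fx / n                                       # (c)
sigma = math.sqrt(sum_fx2 / n - mean ** 2)              # (c) divisor n
s = math.sqrt((sum_fx2 - n * mean ** 2) / (n - 1))      # (c) divisor n - 1
skewness = 3 * (mean - median) / sigma                  # (d)

print(f"median   = {median:.4f}")    # mark scheme: awrt 26.7
print(f"mean     = {mean:.4f}")      # mark scheme: awrt 29.6
print(f"sigma    = {sigma:.4f}")     # mark scheme: awrt 16.6
print(f"s        = {s:.4f}")         # mark scheme: awrt 16.7
print(f"skewness = {skewness:.3f}")  # mark scheme: awrt 0.520
```

Calling `interpolated_median(classes, 60.5)` instead gives the alternative median that the N.B. under part (b) allows.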
---
\begin{enumerate}
\item Summarised below are the distances, to the nearest mile, travelled to work by a random sample of 120 commuters.
\end{enumerate}
\begin{center}
\begin{tabular}{|l|l|}
\hline
Distance (to the nearest mile) & Number of commuters \\
\hline
0-9 & 10 \\
\hline
10-19 & 19 \\
\hline
20-29 & 43 \\
\hline
30-39 & 25 \\
\hline
40-49 & 8 \\
\hline
50-59 & 6 \\
\hline
60-69 & 5 \\
\hline
70-79 & 3 \\
\hline
80-89 & 1 \\
\hline
\end{tabular}
\end{center}
For this distribution,\\
(a) describe its shape,\\
(b) use linear interpolation to estimate its median.

The mid-point of each class was represented by $x$ and its corresponding frequency by $f$, giving
$$\sum fx = 3550 \text{ and } \sum fx^{2} = 138020.$$
(c) Estimate the mean and the standard deviation of this distribution.

One coefficient of skewness is given by
$$\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}.$$
(d) Evaluate this coefficient for this distribution.\\
(e) State whether or not the value of your coefficient is consistent with your description in part (a). Justify your answer.\\
(f) State, with a reason, whether you should use the mean or the median to represent the data in this distribution.\\
(g) State the circumstance under which it would not matter whether you used the mean or the median to represent a set of data.\\
\hfill \mbox{\textit{Edexcel S1 2007 Q4 [14]}}
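
The totals quoted in the question can be reproduced from the frequency table. The Python sketch below assumes class mid-points of 4.5, 14.5, ..., 84.5, that is, each class such as 0-9 is treated as running from -0.5 to 9.5 because distances are given to the nearest mile. The variable names are illustrative only.

```python
# Frequencies from the table, in class order 0-9, 10-19, ..., 80-89.
frequencies = [10, 19, 43, 25, 8, 6, 5, 3, 1]

# Assumed mid-points: 4.5 for 0-9, then increasing in steps of 10.
midpoints = [4.5 + 10 * i for i in range(len(frequencies))]

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, midpoints))

print(sum_f)    # sample size stated in the question: 120
print(sum_fx)   # question gives 3550
print(sum_fx2)  # question gives 138020
```

Under this choice of mid-points the computed sums agree with the values stated in the question, which supports the class boundaries used when interpolating for the median.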