Edexcel S1 2007 January — Question 4 14 marks

Exam Board: Edexcel
Module: S1 (Statistics 1)
Year: 2007
Session: January
Marks: 14
Topic: Measures of Location and Spread
Type: Histogram from discrete rounded data
Difficulty: Moderate (-0.8). This is a routine S1 statistics question testing standard procedures: describing distribution shape, linear interpolation for median, calculating mean/SD from summary statistics, and applying a given skewness formula. All parts follow textbook methods with no problem-solving or novel insight required, making it easier than average A-level questions.
Spec: 2.02a Interpret single variable data: tables and diagrams; 2.02f Measures of average and spread; 2.02g Calculate mean and standard deviation

  1. Summarised below are the distances, to the nearest mile, travelled to work by a random sample of 120 commuters.

| Distance (to the nearest mile) | Number of commuters |
|---|---|
| 0-9 | 10 |
| 10-19 | 19 |
| 20-29 | 43 |
| 30-39 | 25 |
| 40-49 | 8 |
| 50-59 | 6 |
| 60-69 | 5 |
| 70-79 | 3 |
| 80-89 | 1 |

For this distribution,

  (a) describe its shape,
  (b) use linear interpolation to estimate its median.

The mid-point of each class was represented by \(x\) and its corresponding frequency by \(f\), giving $$\Sigma f x = 3550 \text{ and } \Sigma f x^{2} = 138020$$

  (c) Estimate the mean and the standard deviation of this distribution.

One coefficient of skewness is given by $$\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}.$$

  (d) Evaluate this coefficient for this distribution.
  (e) State whether or not the value of your coefficient is consistent with your description in part (a). Justify your answer.
  (f) State, with a reason, whether you should use the mean or the median to represent the data in this distribution.
  (g) State the circumstance under which it would not matter whether you used the mean or the median to represent a set of data.
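
The summary statistics quoted in the question can be checked from the frequency table. The sketch below assumes the class mid-points are 4.5, 14.5, ..., 84.5 (the mid-points of 0-9, 10-19, and so on); the variable names are illustrative and not part of the original question.

```python
# Frequencies for the classes 0-9, 10-19, ..., 80-89 (from the table).
freqs = [10, 19, 43, 25, 8, 6, 5, 3, 1]

# Assumed class mid-points: 4.5, 14.5, ..., 84.5.
midpoints = [4.5 + 10 * i for i in range(len(freqs))]

n = sum(freqs)                                               # sample size
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # sum of f*x
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))   # sum of f*x^2

print(n, sum_fx, sum_fx2)  # 120 3550.0 138020.0
```

Under that assumption the totals agree with the values given in the question.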

Answer | Marks | Guidance
**(a)** Positive skew | (both bits) | B1 | (1 mark)

**(b)** $19.5 + \frac{(60 - 29)}{43} \times 10 = 26.7093....$ | M1, A1 | awrt 26.7 |
(N.B. Use of 60.5 gives 26.825... so allow awrt 26.8) | (2 marks)
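
The interpolation in (b) can be reproduced with a short sketch. It assumes class boundaries at -0.5, 9.5, 19.5, ..., 89.5 (distances are rounded to the nearest mile; the -0.5 boundary is used only mechanically and does not affect the result). The function name `interpolated_quantile` is hypothetical, chosen for illustration.

```python
def interpolated_quantile(boundaries, freqs, position):
    """Linearly interpolate within the class that contains `position`.

    boundaries: class boundaries (one more entry than freqs).
    freqs:      class frequencies.
    position:   cumulative-frequency position to locate (e.g. n/2).
    """
    cumulative = 0
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= position:
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("position exceeds the total frequency")


freqs = [10, 19, 43, 25, 8, 6, 5, 3, 1]
boundaries = [-0.5 + 10 * i for i in range(len(freqs) + 1)]
n = sum(freqs)

median = interpolated_quantile(boundaries, freqs, n / 2)            # uses 60
median_alt = interpolated_quantile(boundaries, freqs, (n + 1) / 2)  # uses 60.5

print(round(median, 1), round(median_alt, 1))  # 26.7 26.8
```

The two printed values correspond to the 60 and 60.5 variants that the mark scheme accepts.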

**(c)** $\mu = \frac{3550}{120} = 29.5833...$ or $29\frac{7}{12}$ | B1 | awrt **29.6** |
$\sigma^2 = \frac{138020}{120} - \mu^2$ or $\sigma = \sqrt{\frac{138020}{120} - \mu^2}$ | M1 |
$\sigma = 16.5829...$ or ($s = 16.652...$) | A1 | awrt **16.6** (or $s = 16.7$) | (3 marks)
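
As a check on (c), the following sketch evaluates both the population form ($\sigma$, divisor $n$) and the sample form ($s$, divisor $n - 1$) from the summary statistics given in the question. The variable names are illustrative.

```python
import math

n = 120          # number of commuters
sum_fx = 3550    # given: sum of f*x
sum_fx2 = 138020 # given: sum of f*x^2

mean = sum_fx / n

# Population standard deviation: sqrt(sum_fx2/n - mean^2).
sigma = math.sqrt(sum_fx2 / n - mean ** 2)

# Sample standard deviation: divisor n - 1 instead of n.
s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))

print(round(mean, 1), round(sigma, 1), round(s, 1))  # 29.6 16.6 16.7
```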

**(d)** $\frac{3(29.6 - 26.7)}{16.6} = 0.52....$ | M1 A1 f.t. | awrt **0.520** (or with $s$ awrt **0.518**) | A1 | (3 marks)
(N.B. 60.5 in (b) ...awrt 0.499 [or with $s$ awrt 0.497])
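
The coefficient in (d) can be evaluated without intermediate rounding, as sketched below. The sketch takes the median from the interpolation in (b) and the mean and standard deviation from the summary statistics in (c); carrying full precision is an assumption about method, since the mark scheme line shows rounded inputs.

```python
import math

n, sum_fx, sum_fx2 = 120, 3550, 138020

mean = sum_fx / n
sigma = math.sqrt(sum_fx2 / n - mean ** 2)            # divisor n
s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))  # divisor n - 1

# Median by linear interpolation in the 20-29 class (boundaries 19.5 to 29.5).
median = 19.5 + (n / 2 - 29) / 43 * 10

skew_sigma = 3 * (mean - median) / sigma
skew_s = 3 * (mean - median) / s

print(f"{skew_sigma:.3f} {skew_s:.3f}")  # 0.520 0.518
```

Both values are positive, which is the sign the mark scheme relies on in (e).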

**(e)** $0.520 > 0$ | B1 f.t |
So it is consistent with their (d) being >0 or <0 | dB1 f.t | ft their (d) | (2 marks)

**(f)** Use Median | B1 |
Since the data is skewed or less affected by outliers/extreme values | dB1 | (2 marks)

**(g)** If the data are symmetrical or skewness is zero or normal/uniform distribution ("mean = median" or "no outliers" or "evenly distributed" all score B0) | B1 | (1 mark)

**Total: 14 marks**

---
\begin{enumerate}
  \item Summarised below are the distances, to the nearest mile, travelled to work by a random sample of 120 commuters.
\end{enumerate}

\begin{center}
\begin{tabular}{|l|l|}
\hline
Distance (to the nearest mile) & Number of commuters \\
\hline
0-9 & 10 \\
\hline
10-19 & 19 \\
\hline
20-29 & 43 \\
\hline
30-39 & 25 \\
\hline
40-49 & 8 \\
\hline
50-59 & 6 \\
\hline
60-69 & 5 \\
\hline
70-79 & 3 \\
\hline
80-89 & 1 \\
\hline
\end{tabular}
\end{center}

For this distribution,\\
(a) describe its shape,\\
(b) use linear interpolation to estimate its median.

The mid-point of each class was represented by $x$ and its corresponding frequency by $f$ giving

$$\Sigma f x = 3550 \text { and } \Sigma f x ^ { 2 } = 138020$$

(c) Estimate the mean and the standard deviation of this distribution.

One coefficient of skewness is given by

$$\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}.$$

(d) Evaluate this coefficient for this distribution.\\
(e) State whether or not the value of your coefficient is consistent with your description in part (a). Justify your answer.\\
(f) State, with a reason, whether you should use the mean or the median to represent the data in this distribution.\\
(g) State the circumstance under which it would not matter whether you used the mean or the median to represent a set of data.

\hfill \mbox{\textit{Edexcel S1 2007 Q4 [14]}}