| Exam Board | Edexcel |
|---|---|
| Module | S1 (Statistics 1) |
| Year | 2009 |
| Session | January |
| Marks | 16 |
| Topic | Data representation |
| Type | Calculate using histogram bar dimensions |
| Difficulty | Standard +0.3. This is a standard S1 histogram question testing routine procedures: calculating frequency density for bar dimensions, linear interpolation for the median and IQR, and the mean and SD from grouped data. All techniques are textbook exercises requiring careful arithmetic but no problem-solving insight. It is slightly easier than average because it is entirely procedural. |
| Spec | 2.02b Histogram: area represents frequency; 2.02f Measures of average and spread; 2.02g Calculate mean and standard deviation; 2.02h Recognize outliers |
## Question 5:
**(a)**

| Answer | Marks | Guidance |
|---|---|---|
| 8–10 hours: width $= 10.5 - 7.5 = 3$, represented by 1.5 cm | | |
| 16–25 hours: width $= 25.5 - 15.5 = 10$, represented by 5 cm | B1, M1 | B1 for width 5 cm; M1 for attempting both frequency densities $\frac{18}{3}(=6)$ and $\frac{15}{10}$, and $\frac{15}{10} \times \text{SF}$ where $\text{SF} \neq 1$ |
| 8–10 hours: height $= \text{fd} = 18/3 = 6$, represented by 3 cm | | |
| 16–25 hours: height $= \text{fd} = 15/10 = 1.5$, represented by 0.75 cm | A1 (3) | |

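
As a cross-check on part (a), the scale-factor reasoning can be sketched in Python. This is a minimal sketch that assumes class boundaries of 7.5 to 10.5 and 15.5 to 25.5, because hours were recorded to the nearest hour. The function name `bar_dimensions` is hypothetical. The given 8–10 bar fixes both scales: 1.5 cm spans 3 hours, and 3 cm represents a frequency density of 6.

```python
def bar_dimensions(lower, upper, freq, cm_per_hour, cm_per_fd):
    """Return (width_cm, height_cm) for a histogram bar whose area is
    proportional to frequency. Hypothetical helper for illustration."""
    class_width = upper - lower
    freq_density = freq / class_width
    return class_width * cm_per_hour, freq_density * cm_per_fd


# Scales fixed by the given bar: 8-10 hours spans 7.5 to 10.5 (width 3,
# frequency density 18/3 = 6) and is drawn 1.5 cm wide and 3 cm high.
cm_per_hour = 1.5 / (10.5 - 7.5)      # 0.5 cm per hour
cm_per_fd = 3 / (18 / (10.5 - 7.5))   # 0.5 cm per unit of frequency density

# 16-25 hours is assumed to span 15.5 to 25.5, with frequency 15.
width, height = bar_dimensions(15.5, 25.5, 15, cm_per_hour, cm_per_fd)
print(width, height)  # 5.0 0.75
```

Because both scales happen to be 0.5, the bar for 16–25 hours is half its class width (10 hours) wide and half its frequency density (1.5) high.
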
**(b)**

| Answer | Marks | Guidance |
|---|---|---|
| $Q_2 = 7.5 + \frac{(52-36)}{18} \times 3 = 10.2$ | M1, A1 | M1 for identifying the correct interval and the correct fraction; 1st A1 for 10.2 (using $n+1$, allow AWRT 10.3) |
| $Q_1 = 5.5 + \frac{(26-20)}{16} \times 2 = 6.25$ or $6.3$, or $5.5 + \frac{(26.25-20)}{16} \times 2 [=6.3]$ | A1 | 2nd A1 for a correct expression for either $Q_1$ or $Q_3$ |
| $Q_3 = 10.5 + \frac{(78-54)}{25} \times 5 [=15.3]$ or $10.5 + \frac{(78.75-54)}{25} \times 5 [=15.45 \approx 15.5]$ | A1, A1ft | 3rd A1 for correct expressions for both $Q_1$ and $Q_3$; 4th A1ft for the IQR, ft their quartiles |
| $\text{IQR} = 15.3 - 6.3 = 9$ | (5) | |

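
The linear interpolation in part (b) can be sketched as follows. This sketch makes two assumptions. First, the class boundaries are 0, 5.5, 7.5, 10.5, 15.5, 25.5 and 50.5, which agree with the tabulated mid-points. Second, it uses the $n/4$, $n/2$ and $3n/4$ positions from the main method. The helper name `interpolate` is hypothetical.

```python
# Assumed class boundaries (hours recorded to the nearest hour); the outer
# bounds 0 and 50.5 are chosen to match the mid-points 2.75 and 38.
BOUNDARIES = [0, 5.5, 7.5, 10.5, 15.5, 25.5, 50.5]
FREQS = [20, 16, 18, 25, 15, 10]


def interpolate(position, boundaries, freqs):
    """Estimate the value at a cumulative-frequency position by linear
    interpolation within the class that contains it (hypothetical helper)."""
    cum = 0
    for i, f in enumerate(freqs):
        if cum + f >= position:
            lower, upper = boundaries[i], boundaries[i + 1]
            # lower boundary + (fraction of the class passed) * class width
            return lower + (position - cum) / f * (upper - lower)
        cum += f
    raise ValueError("position exceeds total frequency")


n = sum(FREQS)  # 104
q1 = interpolate(n / 4, BOUNDARIES, FREQS)      # 26th value
q2 = interpolate(n / 2, BOUNDARIES, FREQS)      # 52nd value
q3 = interpolate(3 * n / 4, BOUNDARIES, FREQS)  # 78th value

print(f"Q1={q1:.2f} Q2={q2:.2f} Q3={q3:.2f} IQR={q3 - q1:.2f}")
# Q1=6.25 Q2=10.17 Q3=15.30 IQR=9.05
```

Without rounding, the IQR is $15.3 - 6.25 = 9.05$. The mark scheme's 9 comes from rounding $Q_1$ to 6.3 first. Passing the $(n+1)$-based positions 26.25, 52.5 and 78.75 instead gives the alternative values the mark scheme also allows.
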
**(c)**

| Answer | Marks | Guidance |
|---|---|---|
| $\sum fx = 1333.5 \Rightarrow \bar{x} = \frac{1333.5}{104}$, AWRT 12.8 | M1 A1 | 1st M1 for attempting $\sum fx$ and $\bar{x}$ |
| $\sum fx^2 = 27254 \Rightarrow \sigma_x = \sqrt{\frac{27254}{104} - \bar{x}^2} = \sqrt{262.057\ldots - \bar{x}^2}$, AWRT 9.88 | M1 A1 (4) | 2nd M1 for attempting $\sum fx^2$ and $\sigma_x$; the $\sqrt{\phantom{x}}$ is needed for M1. Allow $s =$ AWRT 9.93 |

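
The grouped-data calculations in part (c) can be checked with a short script. Like the mark scheme, it treats every observation in a class as lying at the tabulated mid-point. Here `sigma` is the divisor-$n$ value the mark scheme uses, and `s` is the divisor-$(n-1)$ alternative it also accepts.

```python
from math import sqrt

# Mid-points and frequencies copied from the question's table.
MIDPOINTS = [2.75, 6.5, 9, 13, 20.5, 38]
FREQS = [20, 16, 18, 25, 15, 10]

n = sum(FREQS)
sum_fx = sum(f * x for f, x in zip(FREQS, MIDPOINTS))
sum_fx2 = sum(f * x * x for f, x in zip(FREQS, MIDPOINTS))

mean = sum_fx / n
sigma = sqrt(sum_fx2 / n - mean ** 2)             # divisor n
s = sqrt((sum_fx2 - n * mean ** 2) / (n - 1))     # divisor n - 1

print(sum_fx, sum_fx2)                      # 1333.5 27254.0
print(f"{mean:.2f} {sigma:.2f} {s:.2f}")    # 12.82 9.88 9.93
```
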
**(d)**

| Answer | Marks | Guidance |
|---|---|---|
| $Q_3 - Q_2 [=5.1] > Q_2 - Q_1 [=3.9]$ or $Q_2 < \bar{x}$ | B1ft | 1st B1ft for a suitable test; values need not be seen, but the statement must be compatible with the values used |
| So the data are positively skewed | dB1 (2) | 2nd dB1, dependent on a test showing positive skew, for stating positive skew. If the test shows negative skew, score the 1st B1 but lose the 2nd |

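
The two skewness tests in part (d) are simple comparisons. The sketch below hard-codes the values found earlier, rounded to 2 d.p.: $Q_1 = 6.25$, $Q_2 \approx 10.17$, $Q_3 = 15.3$ and $\bar{x} \approx 12.82$. The function names are hypothetical.

```python
def quartile_skew(q1, q2, q3):
    """Classify skew by comparing the upper and lower quartile spreads."""
    upper, lower = q3 - q2, q2 - q1
    if upper > lower:
        return "positive"
    if upper < lower:
        return "negative"
    return "symmetric"


def mean_median_skew(mean, median):
    """Classify skew by where the mean lies relative to the median."""
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "symmetric"


# Values carried over from parts (b) and (c), rounded to 2 d.p.
q1, q2, q3 = 6.25, 10.17, 15.3
mean = 12.82

print(quartile_skew(q1, q2, q3))    # positive  (5.13 > 3.92)
print(mean_median_skew(mean, q2))   # positive  (12.82 > 10.17)
```

Either comparison is enough for the first mark. The dB1 is then earned only if the test you quote actually shows positive skew.
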
**(e)**

| Answer | Marks | Guidance |
|---|---|---|
| Use the median and IQR, since the data are skewed, or since these measures are not affected by extreme values or outliers | B1, B1 (2) | 1st B1 for choosing the median and IQR (both must be mentioned); 2nd B1 for a suitable reason. "Use median because data is skewed" scores B0B1, since the IQR is not mentioned |

5. In a shopping survey a random sample of 104 teenagers were asked how many hours, to the nearest hour, they spent shopping in the last month. The results are summarised in the table below.
\begin{center}
\begin{tabular}{|l|l|l|}
\hline
Number of hours & Mid-point & Frequency \\
\hline
0-5 & 2.75 & 20 \\
\hline
6-7 & 6.5 & 16 \\
\hline
8-10 & 9 & 18 \\
\hline
11-15 & 13 & 25 \\
\hline
16-25 & 20.5 & 15 \\
\hline
26-50 & 38 & 10 \\
\hline
\end{tabular}
\end{center}
A histogram was drawn and the group (8-10) hours was represented by a rectangle that was 1.5 cm wide and 3 cm high.
\begin{enumerate}[label=(\alph*)]
\item Calculate the width and height of the rectangle representing the group (16-25) hours.
\item Use linear interpolation to estimate the median and interquartile range.
\item Estimate the mean and standard deviation of the number of hours spent shopping.
\item State, giving a reason, the skewness of these data.
\item State, giving a reason, which average and measure of dispersion you would recommend to use to summarise these data.
\end{enumerate}
\hfill \mbox{\textit{Edexcel S1 2009 Q5 [16]}}