| Exam Board | OCR |
|---|---|
| Module | Further Statistics (Further Statistics) |
| Year | 2021 |
| Session | November |
| Marks | 11 |
| Topic | Chi-squared goodness of fit |
| Type | Chi-squared goodness of fit: Normal |
| Difficulty | Standard +0.3. This is a standard chi-squared goodness of fit test with all calculations provided. Parts (a) and (c) are routine bookwork, part (b) requires basic normal distribution probability calculations and the chi-squared formula (both straightforward), and part (d) asks for interpretation of residuals and sketching an adjusted normal curve. All techniques are standard for Further Statistics with no novel problem-solving required. |
| Spec | 5.06b Fit prescribed distribution: chi-squared test; 5.06c Fit other distributions: discrete and continuous |

| Time | \(0 \leqslant X < 80\) | \(80 \leqslant X < 90\) | \(90 \leqslant X < 100\) | \(100 \leqslant X < 110\) | \(X \geqslant 110\) |
|---|---|---|---|---|---|
| Observed frequency \(O\) | 36 | 95 | 137 | 129 | 103 |
| Expected frequency \(E\) | 45.606 | 80.641 | 123.754 | 123.754 | 126.246 |
| \(\frac { ( O - E ) ^ { 2 } } { E }\) | 2.023 | 2.557 | 1.418 | 0.222 | 4.280 |

Question 6:
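
| Part | Answer | Marks | AO |
|---|---|---|---|
| 6 (a) | \(\mathrm{H}_0\): Data consistent with \(\mathrm{N}(100, 15^2)\); \(\mathrm{H}_1\): Data not consistent with \(\mathrm{N}(100, 15^2)\) | B1 [1] | 1.1 |

Guidance: Allow “follows \(\mathrm{N}(100, 15^2)\)” or “can be modelled by”. Parameters not needed. No other alternatives seen!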
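
| Part | Answer | Marks | AO |
|---|---|---|---|
| 6 (b) | \(P(100 \leqslant X < 110) = 0.2475\) BC | B1 | 3.4 |
| | Expected frequency \(= 500 \times 0.2475\) \([= 123.754]\) | M1 | 2.1 |
| | \(\frac{(129 - 123.754)^2}{123.754}\) \([= 0.222\ldots\), AG\(]\) | A1 [3] | 2.2a |

Guidance: Probability needs to be seen. Sufficient working to justify AG, needs 123.754 at least.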
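
The working in part (b) can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as the calculator; the helper names `expected_frequency` and `contribution` are illustrative and are not part of the mark scheme.

```python
from statistics import NormalDist

# Model under H0, as stated in the question: N(100, 15^2), 500 candidates.
model = NormalDist(mu=100, sigma=15)
N_CANDIDATES = 500


def expected_frequency(lower, upper):
    """Expected count in lower <= X < upper under the H0 model."""
    probability = model.cdf(upper) - model.cdf(lower)
    return N_CANDIDATES * probability


def contribution(observed, expected):
    """One cell's contribution (O - E)^2 / E to the test statistic."""
    return (observed - expected) ** 2 / expected


# The cell asked about in part (b): 100 <= X < 110, observed frequency 129.
e_100_110 = expected_frequency(100, 110)
c_100_110 = contribution(129, e_100_110)

print(round(e_100_110, 3))
print(round(c_100_110, 3))
```

The two printed values should agree with the tabulated 123.754 and 0.222 to three decimal places. An exam answer still needs the probability 0.2475 written down explicitly, as the guidance requires.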

| Part | Answer | Marks | AO |
|---|---|---|---|
| 6 (c) | \(\Sigma X^2 = 10.5\) | B1 | 1.1 |
| | \(\chi^2(4) = 9.488\) and \(10.5 > 9.488\) | B1 | 1.1 |
| | Reject \(\mathrm{H}_0\). | M1ft | 1.1 |
| | Significant evidence that data is not consistent with \(\mathrm{N}(100, 15^2)\). | A1ft [4] | 2.2b |

Guidance: Like-with-like comparison needed. FT on TS or CV here. Needn’t be stated if next line right. FT on TS (but not CV) if method correct. Wrong CV, e.g. 5.991: B1B0M1A0. No ft on \(\mathrm{H}_0\)/\(\mathrm{H}_1\).
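
The test in part (c) can be reproduced from the tabulated contributions. This is a sketch only: the critical value 9.488 is taken from the mark scheme rather than computed, and the p-value uses the closed-form upper tail of the chi-squared distribution, which in this form is valid only for 4 degrees of freedom. The function name `chi2_sf_4df` is illustrative.

```python
import math

# Contributions (O - E)^2 / E copied from the table in the question.
contributions = [2.023, 2.557, 1.418, 0.222, 4.280]
test_statistic = sum(contributions)

# 5 cells, no parameters estimated from the data, so 5 - 1 = 4 degrees of freedom.
degrees_of_freedom = len(contributions) - 1

# 5% upper-tail critical value for chi-squared(4), as quoted in the mark scheme.
CRITICAL_VALUE = 9.488


def chi2_sf_4df(x):
    """Upper-tail probability P(chi-squared(4) > x) = exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)


p_value = chi2_sf_4df(test_statistic)
reject_h0 = test_statistic > CRITICAL_VALUE

print(f"test statistic = {test_statistic:.3f}, df = {degrees_of_freedom}")
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Comparing the statistic with the critical value and comparing the p-value with 0.05 are both like-with-like comparisons, and they lead to the same decision.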

| Part | Answer | Marks | AO |
|---|---|---|---|
| 6 (d)(i) | E.g. too few in \(X \geqslant 110\) or in \(X \leqslant 80\), or too many in others, or data truncated, etc. | B1 [1] | 3.5b |

Guidance: Any relevant point; needn’t refer to values of \(X^2\). “Divide into 5 minute groups”: B1. “Data discrete”: B0. “The variance” (uncalculated): B0.
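
The "too few" and "too many" observations for part (d)(i) can be read off the signed differences \(O - E\). The sketch below uses only the figures from the table; the variable names are illustrative.

```python
# Class labels and frequencies copied from the table in the question.
classes = ["0 <= X < 80", "80 <= X < 90", "90 <= X < 100", "100 <= X < 110", "X >= 110"]
observed = [36, 95, 137, 129, 103]
expected = [45.606, 80.641, 123.754, 123.754, 126.246]

# Signed residuals: negative means fewer candidates than the model predicts.
residuals = [o - e for o, e in zip(observed, expected)]

too_few = [c for c, r in zip(classes, residuals) if r < 0]
too_many = [c for c, r in zip(classes, residuals) if r > 0]

for label, r in zip(classes, residuals):
    print(f"{label:>15}: O - E = {r:+.3f}")
```

Both tail classes have negative residuals and the three central classes have positive ones, which is consistent with the example answer in the mark scheme.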

| Part | Answer | Marks | AO |
|---|---|---|---|
| 6 (d)(ii) | [Diagram: black = PAB version, red = candidate’s version] | B1 | 3.3 |
| | | B1 [2] | 3.5c |

Guidance: Deal with aspect identified in (i). Basically correct, areas roughly same.

| Example | Marks |
|---|---|
| Uses “data discrete” in (i) | B0 |
| More below 100, so translate to left | B2 |
| More above 110 so translate to right | B2 |
| Divide into 5-minute groups | B0 |
| Variance changed, areas not equal | B1 |
| Data truncated but worse truncation shown | B0 |

6 A practice examination paper is taken by 500 candidates, and the organiser wishes to know what continuous distribution could be used to model the actual time, $X$ minutes, taken by candidates to complete the paper.
The organiser starts by carrying out a goodness-of-fit test for the distribution $\mathrm { N } \left( 100,15 ^ { 2 } \right)$ at the $5 \%$ significance level. The grouped data and the results of some of the calculations are shown in the following table.
\begin{center}
\begin{tabular}{|l|l|l|l|l|l|}
\hline
Time & $0 \leqslant X < 80$ & $80 \leqslant X < 90$ & $90 \leqslant X < 100$ & $100 \leqslant X < 110$ & $X \geqslant 110$ \\
\hline
Observed frequency $O$ & 36 & 95 & 137 & 129 & 103 \\
\hline
Expected frequency $E$ & 45.606 & 80.641 & 123.754 & 123.754 & 126.246 \\
\hline
$\frac { ( O - E ) ^ { 2 } } { E }$ & 2.023 & 2.557 & 1.418 & 0.222 & 4.280 \\
\hline
\end{tabular}
\end{center}
\begin{enumerate}[label=(\alph*)]
\item State suitable hypotheses for the test. [1]
\item Show how the figures 123.754 and 0.222 in the column for $100 \leqslant X < 110$ were obtained. [3]
\item Carry out the test. [4]
The organiser now wants to suggest an improved model for the data.
\item \begin{enumerate}[label=(\roman*)]
\item Suggest an aspect of the data that the organiser should take into account in considering an improved model. [1]
\item The graph of the probability density function for the distribution $\mathrm { N } \left( 100,15 ^ { 2 } \right)$ is shown in the diagram in the Printed Answer Booklet.
On the same diagram sketch the probability density function of an improved model that takes into account the aspect of the data in part (d)(i). [2]
\end{enumerate}\end{enumerate}
\hfill \mbox{\textit{OCR Further Statistics 2021 Q6 [11]}}