OCR Further Statistics 2021 November — Question 6 11 marks

Exam Board: OCR
Module: Further Statistics
Year: 2021
Session: November
Marks: 11
Topic: Chi-squared goodness of fit
Type: Chi-squared goodness of fit: Normal
Difficulty: Standard +0.3. This is a standard chi-squared goodness-of-fit test with all calculations provided. Parts (a) and (c) are routine bookwork, part (b) requires basic normal distribution probability calculations and the chi-squared formula (both straightforward), and part (d) asks for interpretation of residuals and sketching an adjusted normal curve. All techniques are standard for Further Statistics with no novel problem-solving required.
Spec: 5.06b Fit prescribed distribution: chi-squared test; 5.06c Fit other distributions: discrete and continuous

6 A practice examination paper is taken by 500 candidates, and the organiser wishes to know what continuous distribution could be used to model the actual time, \(X\) minutes, taken by candidates to complete the paper. The organiser starts by carrying out a goodness-of-fit test for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) at the \(5 \%\) significance level. The grouped data and the results of some of the calculations are shown in the following table.
Time | \(0 \leqslant X < 80\) | \(80 \leqslant X < 90\) | \(90 \leqslant X < 100\) | \(100 \leqslant X < 110\) | \(X \geqslant 110\)
Observed frequency \(O\) | 36 | 95 | 137 | 129 | 103
Expected frequency \(E\) | 45.606 | 80.641 | 123.754 | 123.754 | 126.246
\(\frac { ( O - E ) ^ { 2 } } { E }\) | 2.023 | 2.557 | 1.418 | 0.222 | 4.280
  (a) State suitable hypotheses for the test.
  (b) Show how the figures 123.754 and 0.222 in the column for \(100 \leqslant X < 110\) were obtained. [3]
  (c) Carry out the test.
  The organiser now wants to suggest an improved model for the data.
  (d) (i) Suggest an aspect of the data that the organiser should take into account in considering an improved model.
      (ii) The graph of the probability density function for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) is shown in the diagram in the Printed Answer Booklet. On the same diagram sketch the probability density function of an improved model that takes into account the aspect of the data in part (d)(i).
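
The expected frequencies and the \((O-E)^2/E\) row of the table can be checked numerically. The following is a minimal Python sketch and not part of the original paper. It assumes the first class is treated as the whole lower tail \(X < 80\) (the probability below 0 is negligible under \(\mathrm{N}(100, 15^2)\)), and the helper name `normal_cdf` is introduced here purely for illustration.

```python
import math


def normal_cdf(x, mu=100.0, sigma=15.0):
    """CDF of N(mu, sigma^2), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


n = 500  # number of candidates

# Upper class boundaries from the table; the last class is open-ended.
upper_bounds = [80, 90, 100, 110, math.inf]
observed = [36, 95, 137, 129, 103]

expected = []
lower_cdf = 0.0  # assumption: first class covers the whole lower tail
for ub in upper_bounds:
    upper_cdf = 1.0 if math.isinf(ub) else normal_cdf(ub)
    expected.append(n * (upper_cdf - lower_cdf))
    lower_cdf = upper_cdf

# Contribution of each class to the test statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

for e, c in zip(expected, contributions):
    print(f"E = {e:.3f}, (O-E)^2/E = {c:.3f}")
```

Comparing the printed values with the table is a quick way to confirm the two figures that part (b) asks for.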

Question 6:
Question | Part | Answer | Marks | AO | Guidance
6 | (a) | \(\mathrm{H}_0\): Data consistent with \(\mathrm{N}(100, 15^2)\). \(\mathrm{H}_1\): Data not consistent with \(\mathrm{N}(100, 15^2)\). | B1 [1] | 1.1 | Allow: “follows \(\mathrm{N}(100, 15^2)\)” or “can be modelled by”. Parameters not needed. No other alternatives seen!
6 | (b) | \(\mathrm{P}(100 \leqslant X < 110) = 0.2475\) BC | B1 | 3.4 | Probability needs to be seen.
  |     | Expected frequency \(= 500 \times 0.2475\) [\(= 123.754\)] | M1 | 2.1 |
  |     | \(\frac{(129 - 123.754)^2}{123.754}\) [\(= 0.222\ldots\), AG] | A1 [3] | 2.2a | Sufficient working to justify AG, needs 123.754 at least.
6 | (c) | \(\Sigma X^2 = 10.5\) | B1 | 1.1 |
  |     | \(\chi^2(4) = 9.488\) and \(10.5 > 9.488\) | B1 | 1.1 | Like-with-like comparison needed.
  |     | Reject \(\mathrm{H}_0\). | M1ft | 1.1 | FT on TS or CV here. Needn’t be stated if next line right.
  |     | Significant evidence that data is not consistent with \(\mathrm{N}(100, 15^2)\). | A1ft [4] | 2.2b | FT on TS (but not CV) if method correct. Wrong CV, e.g. 5.991: B1B0M1A0. No ft on \(\mathrm{H}_0\)/\(\mathrm{H}_1\).
6 | (d) | (i) E.g. Too few in \(X \geqslant 110\) or in \(X \leqslant 80\), or too many in others, or data truncated, etc. | B1 [1] | 3.5b | Any relevant point, needn’t refer to values of \(X^2\). “Divide into 5 minute groups”: B1. “Data discrete”: B0. “The variance” (uncalculated): B0.
  |     | (ii) [Sketch: black = PAB version, red = candidate’s version] | B1 | 3.3 | Deal with aspect identified in (i).
  |     |  | B1 [2] | 3.5c | Basically correct, areas roughly same.
Examples for (d)(ii):
Uses “data discrete” in (i) | B0
More below 100, so translate to left | B2
More above 110 so translate to right | B2
Divide into 5-minute groups | B0
Variance changed, areas not equal | B1
Data truncated but worse truncation shown | B0
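
The test in part (c) can be reproduced in a few lines. This is a hedged Python sketch rather than the mark scheme's method: it takes the critical value 9.488 as quoted above, and additionally computes a p-value, which the mark scheme does not ask for. The function name `chi2_upper_tail_even_df` is hypothetical; it uses the closed form of the chi-squared upper tail that holds when the number of degrees of freedom is even.

```python
import math

# Observed and expected frequencies copied from the question's table.
observed = [36, 95, 137, 129, 103]
expected = [45.606, 80.641, 123.754, 123.754, 126.246]

# Test statistic: sum of (O - E)^2 / E over the five classes.
test_statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 5 classes and no parameters estimated from the data, so 4 degrees of freedom.
df = len(observed) - 1
critical_value = 9.488  # 5% critical value quoted in the mark scheme


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of chi-squared for even df, using the closed
    form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    half = x / 2.0
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


p_value = chi2_upper_tail_even_df(test_statistic, df)  # extra, not in the mark scheme
reject_h0 = test_statistic > critical_value

# Signed residuals O - E, relevant to the aspect asked for in part (d)(i).
residuals = [o - e for o, e in zip(observed, expected)]

print(f"X^2 = {test_statistic:.3f}, reject H0: {reject_h0}, p = {p_value:.4f}")
print([f"{r:+.1f}" for r in residuals])
```

The signs of the residuals show the pattern that (d)(i) refers to: fewer candidates than expected in the two end classes and more than expected in the three middle classes.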
6 A practice examination paper is taken by 500 candidates, and the organiser wishes to know what continuous distribution could be used to model the actual time, $X$ minutes, taken by candidates to complete the paper.

The organiser starts by carrying out a goodness-of-fit test for the distribution $\mathrm { N } \left( 100,15 ^ { 2 } \right)$ at the $5 \%$ significance level. The grouped data and the results of some of the calculations are shown in the following table.

\begin{center}
\begin{tabular}{|l|l|l|l|l|l|}
\hline
Time & $0 \leqslant X < 80$ & $80 \leqslant X < 90$ & $90 \leqslant X < 100$ & $100 \leqslant X < 110$ & $X \geqslant 110$ \\
\hline
Observed frequency $O$ & 36 & 95 & 137 & 129 & 103 \\
\hline
Expected frequency $E$ & 45.606 & 80.641 & 123.754 & 123.754 & 126.246 \\
\hline
$\frac { ( O - E ) ^ { 2 } } { E }$ & 2.023 & 2.557 & 1.418 & 0.222 & 4.280 \\
\hline
\end{tabular}
\end{center}
\begin{enumerate}[label=(\alph*)]
\item State suitable hypotheses for the test.
\item Show how the figures 123.754 and 0.222 in the column for $100 \leqslant X < 110$ were obtained. [3]
\item Carry out the test.

The organiser now wants to suggest an improved model for the data.
\item \begin{enumerate}[label=(\roman*)]
\item Suggest an aspect of the data that the organiser should take into account in considering an improved model.
\item The graph of the probability density function for the distribution $\mathrm { N } \left( 100,15 ^ { 2 } \right)$ is shown in the diagram in the Printed Answer Booklet.

On the same diagram sketch the probability density function of an improved model that takes into account the aspect of the data in part (d)(i).
\end{enumerate}\end{enumerate}

\hfill \mbox{\textit{OCR Further Statistics 2021 Q6 [11]}}
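
A worked version of part (b) is sketched below, consistent with the figures in the table. It is an illustration rather than part of the original paper: it assumes the `amsmath` package for `align*`, and it uses $\Phi$ for the standard normal distribution function, a symbol the question itself does not introduce.

```latex
\begin{align*}
\mathrm{P}(100 \leqslant X < 110)
  &= \Phi\!\left(\frac{110 - 100}{15}\right) - \Phi(0)
   \approx 0.7475 - 0.5 = 0.2475, \\
E &= 500 \times 0.2475\ldots = 123.754, \\
\frac{(O - E)^2}{E}
  &= \frac{(129 - 123.754)^2}{123.754}
   = \frac{27.52\ldots}{123.754} \approx 0.222 .
\end{align*}
```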