OCR MEI S3 2013 June: Question 3 (19 marks)

Exam Board: OCR MEI
Module: S3 (Statistics 3)
Year: 2013
Session: June
Marks: 19
Topic: Chi-squared goodness of fit
Type: Chi-squared goodness of fit: Other continuous
Difficulty: Challenging +1.2. This multi-part question combines a continuous probability distribution with chi-squared testing. Parts (i) to (iii) use standard A-level techniques (sketching, integration, normalisation), while parts (iv) and (v) require calculating expected frequencies and performing a routine goodness-of-fit test. The integration is straightforward polynomial work, and the hypothesis test follows a standard template. The question is more demanding than average because it combines pure and applied statistics over several steps, but all techniques are standard S3 material and no novel insight is required.
Spec: 5.03a Continuous random variables: pdf and cdf; 5.03b Solve problems: using pdf; 5.03c Calculate mean/variance: by integration; 5.03e Find cdf: by integration; 5.06b Fit prescribed distribution: chi-squared test

3 The random variable \(X\) has the following probability density function, \(\mathrm{f}(x)\).

$$f(x) = \begin{cases} k x (x - 5)^{2} & 0 \leqslant x < 5 \\ 0 & \text{elsewhere} \end{cases}$$

  1. Sketch \(\mathrm{f}(x)\).
  2. Find, in terms of \(k\), the cumulative distribution function, \(\mathrm{F}(x)\).
  3. Hence show that \(k = \frac{12}{625}\).

The random variable \(X\) is proposed as a model for the amount of time, in minutes, lost due to stoppages during a football match. The times lost in a random sample of 60 matches are summarised in the table. The table also shows some of the corresponding expected frequencies given by the model.

| Time (minutes) | \(0 \leqslant x < 1\) | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) |
|----------------|------|------|-------|------|-------|
| Observed frequency | 5 | 15 | 23 | 11 | 6 |
| Expected frequency | | | 17.76 | 9.12 | 1.632 |

  4. Find the remaining expected frequencies.
  5. Carry out a goodness of fit test, using a significance level of \(2.5\%\), to see if the model might be suitable in this context.

# Question 3:

## Part (i)

| Answer | Marks | Guidance |
|--------|-------|----------|
| Curve through the origin and in the first quadrant only | G1 | |
| A single maximum; curve returns to $y=0$; nothing to the right of $x=5$ | G1 | |
| No turning point at $x=0$; turning point at $x=5$; $(5,0)$ labelled (p.i. by an indicated scale) | G1 | |

**[3]**

## Part (ii)

| Answer | Marks | Guidance |
|--------|-------|----------|
| $F(x) = k\int_0^x t(t-5)^2\, dt$ | M1 | Correct integral for $F(x)$ with limits (which may appear later) |
| $= k\left[\dfrac{t^4}{4} - \dfrac{10t^3}{3} + \dfrac{25t^2}{2}\right]_0^x$ | M1 | Correctly integrated |
| $= k\left(\dfrac{x^4}{4} - \dfrac{10x^3}{3} + \dfrac{25x^2}{2}\right)$ | A1 | Limits used correctly to obtain expression. Condone absence of "$-0$". Do not require complete definition of $F(x)$. Dependent on both M1s |

**[3]**
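The term-by-term integration in part (ii) can be checked with exact arithmetic. The following is a minimal sketch, not part of the mark scheme: the helper name `integrate` and the coefficient-list representation are illustrative assumptions. It expands $t(t-5)^2 = t^3 - 10t^2 + 25t$ and integrates each power of $t$ from 0.

```python
from fractions import Fraction

# Integrand t(t-5)^2 = t^3 - 10t^2 + 25t, stored as coefficients
# indexed by the power of t: [t^0, t^1, t^2, t^3].
integrand = [Fraction(0), Fraction(25), Fraction(-10), Fraction(1)]

def integrate(coeffs):
    """Antiderivative with zero constant term: c*t^n -> c/(n+1) * t^(n+1)."""
    return [Fraction(0)] + [c / (n + 1) for n, c in enumerate(coeffs)]

# Coefficients of F(x)/k, indexed by the power of x.
cdf_over_k = integrate(integrand)
print(cdf_over_k)
```

The non-zero coefficients are $\frac{25}{2}$, $-\frac{10}{3}$ and $\frac{1}{4}$ for $x^2$, $x^3$ and $x^4$, which agrees with the expression in the table.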

## Part (iii)

| Answer | Mark | Guidance |
|--------|------|----------|
| $F(5) = 1$ | | |
| $\therefore k\left(\frac{5^4}{4} - \frac{10 \times 5^3}{3} + \frac{25 \times 5^2}{2}\right) = 1$ | M1 | Substitute $x = 5$ and equate to 1. |
| $\therefore k\left(\frac{1875 - 5000 + 3750}{12}\right) = 1$ | | Expect to see evidence of at least this line of working (oe) for A1. |
| $\therefore k \times \frac{625}{12} = 1$ | | |
| $\therefore k = \frac{12}{625}$ | A1 | Convincingly shown. Beware printed answer. |
| **[2]** | | |
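As a check on the arithmetic in part (iii), the condition $F(5) = 1$ can be solved for $k$ exactly. This is a sketch only; the function name `cdf_over_k` is an illustrative assumption and stands for $F(x)/k$.

```python
from fractions import Fraction

def cdf_over_k(x):
    """F(x)/k = x^4/4 - 10x^3/3 + 25x^2/2, evaluated exactly."""
    x = Fraction(x)
    return x**4 / 4 - 10 * x**3 / 3 + 25 * x**2 / 2

# F(5) = 1 means k * cdf_over_k(5) = 1.
bracket = cdf_over_k(5)
k = 1 / bracket
print(bracket, k)
```

The bracket evaluates to $\frac{625}{12}$, so $k = \frac{12}{625}$ as required.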

## Part (iv)

| Answer | Mark | Guidance |
|--------|------|----------|
| For $0 \leq x < 1$, Expected $f = 60 \times F(1)$ | M1 | Use of $60 \times F(x)$ with correct $k$. |
| $= 60 \times \frac{12}{625}\left(\frac{1^4}{4} - \frac{10 \times 1^3}{3} + \frac{25 \times 1^2}{2}\right) = 10.848$ | A1 | Allow also $31.488 - (\text{frequency for } 1 \leq x < 2)$, provided that frequency was found using $F(x)$. Allow either frequency found by integration. |
| For $1 \leq x < 2$, Expected $f = 60 - \Sigma(\text{the rest}) = 20.64$ | B1 | FT $31.488 - (\text{previous answer})$. Or allow $60 \times (F(2) - F(1))$ |
| **[3]** | | |
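All five expected frequencies can be reproduced as $60 \times (F(b) - F(a))$ for each class $a \leq x < b$. The sketch below assumes the value of $k$ from part (iii); the names `K`, `N` and `cdf` are illustrative, not taken from the mark scheme.

```python
from fractions import Fraction

K = Fraction(12, 625)  # value of k shown in part (iii)
N = 60                 # number of matches in the sample

def cdf(x):
    """F(x) = k(x^4/4 - 10x^3/3 + 25x^2/2) for 0 <= x <= 5."""
    x = Fraction(x)
    return K * (x**4 / 4 - 10 * x**3 / 3 + 25 * x**2 / 2)

# Class boundaries 0, 1, ..., 5 give five one-minute classes.
expected = [N * (cdf(b) - cdf(b - 1)) for b in range(1, 6)]

for b, e in zip(range(1, 6), expected):
    print(f"{b - 1} <= x < {b}: {float(e):.3f}")
```

The first two values are the missing entries, 10.848 and 20.64; the last three match the frequencies given in the question. Note that $60 \times F(2) = 31.488$, the cumulative figure referred to in the guidance.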

## Part (v)

| Answer | Mark | Guidance |
|--------|------|----------|
| $H_0$: The model is suitable / fits the data. $H_1$: The model is not suitable / does not fit the data. | B1 | Both hypotheses. Must be the right way round. Do not accept "data fit model" oe. |
| Merge last 2 cells: Obs $f = 17$, Exp $f = 10.752$ | M1 | |
| $X^2 = 3.1525 + 1.5411 + 1.5460 + 3.6307 = 9.870$ | M1 | Calculation of $X^2$. |
| | A1 | c.a.o. |
| Refer to $\chi^2_3$. | M1 | Allow correct df (= cells $- 1$) from wrongly grouped table and ft. Otherwise, no ft if wrong. |
| Upper $2.5\%$ point is $9.348$. | A1 | No ft from here if wrong. $P(X^2 > 9.870) = 0.0197$. |
| Significant. | A1 | ft only c's test statistic. |
| Sufficient evidence to suggest that the model is not suitable in this context. | A1 | ft only c's test statistic. Conclusion in context. Do not accept "data do not fit model" oe. |
| **[8]** | | |
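The test in part (v) can be sketched numerically as follows. Two points are assumptions rather than mark-scheme content: the last two classes are merged on the usual convention that expected frequencies should be at least 5, and the helper `chi2_sf_3df` is a hypothetical name for the closed-form upper-tail probability of a chi-squared distribution with 3 degrees of freedom. The critical value 9.348 is quoted from the mark scheme, not computed.

```python
import math

observed = [5, 15, 23, 11, 6]
expected = [10.848, 20.64, 17.76, 9.12, 1.632]

# Merge the last two classes (expected frequency 1.632 is below 5).
obs = observed[:3] + [sum(observed[3:])]
exp = expected[:3] + [sum(expected[3:])]

# Test statistic: sum of (O - E)^2 / E over the merged classes.
x2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))

# No parameters are estimated from the data, so df = cells - 1.
df = len(obs) - 1

critical = 9.348  # upper 2.5% point of chi-squared(3), from the mark scheme

def chi2_sf_3df(x):
    """P(X^2 > x) for 3 degrees of freedom (closed form, valid for df = 3 only)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_3df(x2)
reject = x2 > critical
print(f"X^2 = {x2:.3f}, df = {df}, p = {p_value:.4f}, reject H0: {reject}")
```

The statistic comes out at about 9.870 on 3 degrees of freedom, which exceeds 9.348, so the result is significant at the 2.5% level, in line with the conclusion in the table.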

---
3 The random variable $X$ has the following probability density function, $\mathrm { f } ( x )$.

$$f ( x ) = \begin{cases} k x ( x - 5 ) ^ { 2 } & 0 \leqslant x < 5 \\ 0 & \text { elsewhere } \end{cases}$$

(i) Sketch $\mathrm { f } ( x )$.\\
(ii) Find, in terms of $k$, the cumulative distribution function, $\mathrm { F } ( x )$.\\
(iii) Hence show that $k = \frac { 12 } { 625 }$.

The random variable $X$ is proposed as a model for the amount of time, in minutes, lost due to stoppages during a football match. The times lost in a random sample of 60 matches are summarised in the table. The table also shows some of the corresponding expected frequencies given by the model.

\begin{center}
\begin{tabular}{ | l | c | c | c | c | c | }
\hline
Time (minutes) & $0 \leqslant x < 1$ & $1 \leqslant x < 2$ & $2 \leqslant x < 3$ & $3 \leqslant x < 4$ & $4 \leqslant x < 5$ \\
\hline
Observed frequency & 5 & 15 & 23 & 11 & 6 \\
\hline
Expected frequency &  &  & 17.76 & 9.12 & 1.632 \\
\hline
\end{tabular}
\end{center}

(iv) Find the remaining expected frequencies.\\
(v) Carry out a goodness of fit test, using a significance level of $2.5 \%$, to see if the model might be suitable in this context.

\hfill \mbox{\textit{OCR MEI S3 2013 Q3 [19]}}