OCR Further Statistics AS 2023 June, Question 6 (12 marks)

Exam Board: OCR
Module: Further Statistics AS
Year: 2023
Session: June
Marks: 12
Topic: Chi-squared goodness of fit
Type: Assess model suitability before testing
Difficulty: Standard +0.3. This is a straightforward chi-squared goodness of fit test with given ratios. Students must calculate expected frequencies from ratios (standard procedure), complete a partially filled table, perform the test at a given significance level, and make a basic interpretation. The only slight elevation above average is part (a), which requires comparing the mean and variance to judge binomial validity; overall the question follows a standard template with no novel problem-solving required.
Spec: 5.06c Fit other distributions: discrete and continuous; 5.06d Goodness of fit: chi-squared test

6 A machine is used to toss a coin repeatedly. Rosa believes that the outcome of each toss made by the machine is not independent of the previous toss. Rosa gets the machine to toss a coin 6 times and record the number of heads, \(X\), obtained. After recording the number of heads obtained, Rosa resets the machine and gets it to toss the coin 6 more times. Rosa again records the number of heads obtained and she repeats this procedure until she has recorded 88 independent values of \(X\).
  1. The sample mean and sample variance of \(X\) are 3.35 and 3.392 respectively. Explain what these results suggest about the validity of a binomial model \(\mathrm{B}(6, p)\) for the data.

     Rosa uses a computer spreadsheet to work out the probabilities for a more sophisticated model in which the outcome of each toss is dependent on the outcome of the previous toss. Her model suggests that the probabilities \(\mathrm{P}(X = x)\), for \(x = 0, 1, 2, 3, 4, 5, 6\), are approximately in the ratio \(5 : 6 : 7 : 8 : 7 : 6 : 5\). She carries out a \(\chi^2\) test to investigate whether this model is a good fit for the data.

     The following table shows the full results of the experiments, together with some of the calculations needed for the test.

     | \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | Total |
     |-------|---|---|---|---|---|---|---|-------|
     | Observed frequency | 7 | 10 | 16 | 15 | 15 | 11 | 14 | 88 |
     | Expected frequency | | | | | | | | |
     | Contribution to \(\chi^2\) statistic | 0.9 | 0.3333 | 0.2857 | 0.0625 | 0.0714 | | | |

  2. In the Printed Answer Booklet, complete the table.
  3. Carry out the test, using a 10\% significance level.
  4. Rosa says that the results definitely show that one of the two proposed models is correct. Comment on this statement.

# Question 6:

## Part (a):

| Answer | Marks | Guidance |
|--------|-------|----------|
| *Either* $6p = 3.35 \Rightarrow p = 0.558(33)$ | **M1** | Use $np$ and $npq$. Attempt to use Poisson: M0 |
| $\Rightarrow$ variance should be 1.48 (1.47958) | **A1** | Correct relevant calculation, e.g. $q = 1.0125...$, or $p = -0.0125$ or solve $6p^2 - 6p + 3.392 = 0$ to get both $p \approx 1.4$ or $-0.4$, but *not* from $p = 0.5$ |
| Not close to 3.392 so $B(6,p)$ not a good model | **A1 [3]** | Validly deduce that $B(6,p)$ not valid, e.g. $0 < p < 1$, and state conclusion. SC: 0.5 used: M1A0A1 |
| *Or* $npq > np$; so $q > 1$ which is impossible. Hence $B(6,p)$ not a good model | **M1A1 A1** | (qualitative argument) |

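The two routes in the table above can be checked numerically. The following is a minimal sketch rather than part of the mark scheme: the variable names are illustrative assumptions, and the only inputs are the sample mean 3.35 and sample variance 3.392 given in the question.

```python
# Check whether B(6, p) is plausible from the sample mean and variance.
n = 6
sample_mean = 3.35
sample_var = 3.392

# "Either" route: estimate p from np = mean, then see what variance npq predicts.
p_hat = sample_mean / n
binomial_var = n * p_hat * (1 - p_hat)

# "Or" route: variance / mean = npq / np = q, which must lie between 0 and 1.
implied_q = sample_var / sample_mean

print(f"p_hat = {p_hat:.5f}")
print(f"variance predicted by B(6, p_hat) = {binomial_var:.5f}")
print(f"implied q = {implied_q:.4f}, valid probability: {0 < implied_q < 1}")
```

The predicted variance is well below the observed 3.392, and the implied $q$ exceeds 1, so both routes point to the same conclusion as the mark scheme.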
## Part (b):

| Answer | Marks | Guidance |
|--------|-------|----------|
| Expected frequencies 10, 12, 14, 16, 14, 12, 10 | **B1** | |
| Use $\frac{(O-E)^2}{E}$ | **M1** | Allow from at least one of $0.083(...)$ and 1.6 correct |
| $0.083(3...)$, 1.6 and total 3.3362 or 3.3363 | **A1 [3]** | Allow 3.34, 3.336 or better. If total omitted, or "0", in **(b)**, can be recovered from **(c)** ("0" probably comes from misunderstanding "Total") |

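As a sketch of how the table in part (b) is completed, the expected frequencies follow from scaling the ratio $5 : 6 : 7 : 8 : 7 : 6 : 5$ to the total of 88, and each contribution is $\frac{(O-E)^2}{E}$. The variable names below are illustrative assumptions, not taken from the question.

```python
# Complete the goodness-of-fit table from the given ratio and observed counts.
ratios = [5, 6, 7, 8, 7, 6, 5]
observed = [7, 10, 16, 15, 15, 11, 14]

total = sum(observed)  # 88 recorded values of X
# Scale each ratio part so that the expected frequencies sum to the total.
expected = [total * r / sum(ratios) for r in ratios]

# Contribution of each cell to the chi-squared statistic: (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)

for x, (o, e, c) in enumerate(zip(observed, expected, contributions)):
    print(f"x={x}: O={o}, E={e:.0f}, contribution={c:.4f}")
print(f"test statistic = {statistic:.4f}")
```

The last two contributions and the total are the entries missing from the printed table.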
## Part (c):

| Answer | Marks | Guidance |
|--------|-------|----------|
| $H_0$: data consistent with proposed model, $H_1$: not so | **B1** | Allow "data follows …" but *not* "data is in ratio …" nor "evidence that …" |
| $3.336(2) < 10.64$ | **B1ft** | Compare *their* 3.336 with correct CV (3.336 may be from calculator) |
| Do not reject $H_0$ | **M1ft** | Correct first conclusion, FT on their TS and on CV 9.236 or 12.59 |
| Insufficient evidence that proposed model does not fit data | **A1ft [4]** | Contextualised, not over-assertive. Needs 'double negative', *not* "significant evidence that data is consistent", etc. A0 if hypotheses wrong way round |

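The comparison in part (c) can be sketched as follows. There are 7 cells and no parameters are estimated from the data, so the test uses 6 degrees of freedom; the critical value 10.64 is the tabulated 10% value quoted in the mark scheme. The p-value calculation is an addition for illustration only: `chi2_sf_even_dof` is a hypothetical helper that uses the closed-form upper tail available when the degrees of freedom are even, which is not the method the mark scheme expects.

```python
import math

def chi2_sf_even_dof(x, dof):
    """Upper-tail probability P(X > x) for chi-squared with even dof.

    Uses the closed form exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2
    terms = sum(half ** j / math.factorial(j) for j in range(dof // 2))
    return math.exp(-half) * terms

observed = [7, 10, 16, 15, 15, 11, 14]
expected = [10, 12, 14, 16, 14, 12, 10]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

dof = len(observed) - 1      # 7 cells, no estimated parameters
critical_value = 10.64       # tabulated 10% critical value, 6 dof
p_value = chi2_sf_even_dof(statistic, dof)
reject_h0 = statistic > critical_value

print(f"statistic = {statistic:.4f}, critical value = {critical_value}")
print(f"p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Because the statistic is below the critical value, $H_0$ is not rejected, which matches the mark scheme's conclusion that there is insufficient evidence that the proposed model does not fit the data.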
## Part (d):

| Answer | Marks | Guidance |
|--------|-------|----------|
| Inferences from a hypothesis test are not "definite" | **B1** | "Definite" stated to be too strong, oe (not *just* "Rosa is wrong") |
| All we have is evidence / Sample size is small / other experiments might produce different results | **B1 [2]** | Relevant valid comment, e.g. "data might be misleading", "second model likely to be correct", "either could be correct", and no wrong extras. "Neither/both good" etc, from wrong conclusion to **(a)** or **(c)**: max B1B0 |

---
6 A machine is used to toss a coin repeatedly. Rosa believes that the outcome of each toss made by the machine is not independent of the previous toss. Rosa gets the machine to toss a coin 6 times and record the number of heads, $X$, obtained. After recording the number of heads obtained, Rosa resets the machine and gets it to toss the coin 6 more times. Rosa again records the number of heads obtained and she repeats this procedure until she has recorded 88 independent values of $X$.
\begin{enumerate}[label=(\alph*)]
\item The sample mean and sample variance of $X$ are 3.35 and 3.392 respectively.

Explain what these results suggest about the validity of a binomial model $\mathrm { B } ( 6 , p )$ for the data.

Rosa uses a computer spreadsheet to work out the probabilities for a more sophisticated model in which the outcome of each toss is dependent on the outcome of the previous toss. Her model suggests that the probabilities $\mathrm { P } ( X = x )$, for $x = 0,1,2,3,4,5,6$, are approximately in the ratio $5 : 6 : 7 : 8 : 7 : 6 : 5$. She carries out a $\chi ^ { 2 }$ test to investigate whether this model is a good fit for the data.

The following table shows the full results of the experiments, together with some of the calculations needed for the test.

\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|}
\hline
$x$ & 0 & 1 & 2 & 3 & 4 & 5 & 6 & Total \\
\hline
Observed frequency & 7 & 10 & 16 & 15 & 15 & 11 & 14 & 88 \\
\hline
Expected frequency &  &  &  &  &  &  &  &  \\
\hline
Contribution to $\chi ^ { 2 }$ statistic & 0.9 & 0.3333 & 0.2857 & 0.0625 & 0.0714 &  &  &  \\
\hline
\end{tabular}
\end{center}
\item In the Printed Answer Booklet, complete the table.
\item Carry out the test, using a 10\% significance level.
\item Rosa says that the results definitely show that one of the two proposed models is correct.

Comment on this statement.
\end{enumerate}

\hfill \mbox{\textit{OCR Further Statistics AS 2023 Q6 [12]}}