OCR Further Statistics AS 2022 June, Question 7 (10 marks)

Exam Board: OCR
Module: Further Statistics AS
Year: 2022
Session: June
Marks: 10
Topic: Geometric Distribution
Type: Non-geometric distribution identification
Difficulty: Standard +0.8. This question requires understanding three different sampling models (20 draws with replacement, which is binomial; 20 draws without replacement, which is not binomial; and a geometric waiting time), calculating a geometric probability, and critically analyzing which experimental results match which distribution based on mean, variance, and shape. Part (b) demands statistical reasoning without formal tests: students must justify answers using multiple properties, which goes beyond routine application and requires genuine understanding of distributional characteristics.
Spec: 5.02f Geometric distribution: conditions; 5.02g Geometric probabilities: P(X=r) = p(1-p)^(r-1)


# Question 7:

## Part (a):
$\text{Geo}(0.25)$ | **M1** | Stated or implied. NB: $12/36 = \frac{1}{3}$ may be MR

$P(>8) = 0.75^8$ | **M1** | Or $1 - 0.25(1 + 0.75 + \ldots + 0.75^7)$. $0.75^9 = 0.075$ or equivalent addition: M1M1A0

$= 0.1(00113)$ or $\frac{6561}{65536}$ | **A1 [3]** | Exact or awrt $0.100$, allow "$0.1$"
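
As a quick check on the figures above, the tail probability can be computed exactly. This is a minimal sketch, assuming Z ~ Geo(p) with p = 12/48 as in the scheme; the function names are illustrative and not part of the mark scheme.

```python
from fractions import Fraction

p = Fraction(12, 48)  # chance of a red card on each draw (with replacement)

def prob_more_than(k, p):
    """P(Z > k) for Z ~ Geo(p): the first k draws must all be blue."""
    return (1 - p) ** k

def prob_more_than_by_sum(k, p):
    """Same probability via the complement, 1 - P(Z <= k)."""
    return 1 - sum(p * (1 - p) ** (r - 1) for r in range(1, k + 1))

tail = prob_more_than(8, p)
print(tail)                   # exact fraction
print(round(float(tail), 6))  # decimal approximation
```

Both routes in the scheme (the direct power and the complement of the summed series) give the same exact fraction, 6561/65536.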

## Part (b)(i):
**DR.** *Either*: Geometric stated or implied | **M1** | Identify geometric for $Z$. SR if M0: B1 for one clear correct comparison, B1 a second one; all correct B1; conclusion "agree/good match" B1

$E(Z) = 4$ which is close to $4.03$ | **A1** | One correct calculation interpreted

$\text{Var}(Z) = 12$ which is close to $11.57$ | **A1** | Another correct calculation interpreted

So second row is a good match for expected results for student $Z$ | **A1 [4]** | Needs two pieces of correct evidence, and a conclusion. Accept "The statement is true"

**Or:** Geometric stated or implied | **M1** |

Frequencies are (generally) decreasing | **A1** |

$4.03 \approx 4$ | **M1** |

So second row is a good match for expected results for student $Z$ | **A1** | Needs adequate correct evidence – conclusion needed! Or equivalent with variance
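
The comparisons above can be reproduced numerically. The sketch below assumes Z ~ Geo(1/4) and 30 repetitions, and compares the expected frequencies with the Student 2 row of the question's table; the helper name `expected_frequencies` is hypothetical.

```python
from fractions import Fraction

p = Fraction(12, 48)   # chance of red on each draw (with replacement)
runs = 30              # each student repeats the experiment 30 times

mean_z = 1 / p             # E(Z) = 1/p
var_z = (1 - p) / p ** 2   # Var(Z) = (1 - p)/p^2

def expected_frequencies(p, runs, last=8):
    """Expected counts for Z = 1..last, plus one pooled cell for Z > last."""
    cells = [float(runs * p * (1 - p) ** (r - 1)) for r in range(1, last + 1)]
    cells.append(float(runs * (1 - p) ** last))  # tail cell, runs * P(Z > last)
    return cells

# Student 2 row from the question: values 1..8, then the ">= 9" cell.
observed_row2 = [8, 5, 4, 2, 3, 3, 2, 1, 2]
expected = expected_frequencies(p, runs)

print("E(Z) =", mean_z, " Var(Z) =", var_z)
for obs, exp in zip(observed_row2, expected):
    print(obs, round(exp, 2))
```

The expected counts for 1 to 8 fall steadily, which is the "frequencies are (generally) decreasing" evidence accepted in the alternative scheme.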

## Part (b)(ii):
**DR.** $X \sim B(20, 0.25)$ | **M1** | Stated or implied. ($B(30, 0.25)$: M1A0A0). SR if M0: B1 for one clear correct comparison, B1 a second one; all correct and conclusion "very likely but not definite" B1 [e.g. sampling without replacement reduces probabilities of higher results]

$E(X) = 5$ which is close to $4.97$, and $\text{Var}(X) = 3.75$ which is close to $3.7$ | **A1** | Two correct comparisons

Therefore quite strong indication that the third row is $X$, but not definite | **A1 [3]** | Consistent conclusion, indicate uncertainty, must deny "definitely". Needs both previous marks. Needn't actually say "very likely" as denying "definitely" is enough!
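
To see why "definitely" has to be denied, it can help to put student Y's model next to student X's. The sketch below assumes X ~ B(20, 1/4) as in the scheme. It also assumes a hypergeometric model for Y (20 draws without replacement from 48 cards, 12 of them red), which the scheme does not name and which is used here only for illustration.

```python
from fractions import Fraction

N, K, n = 48, 12, 20   # pack size, red cards, draws per experiment
p = Fraction(K, N)

# Student X: binomial, draws with replacement.
binom_mean = n * p
binom_var = n * p * (1 - p)

# Student Y (assumed hypergeometric): same mean, variance shrunk by the
# finite population correction (N - n)/(N - 1).
hyper_mean = n * p
hyper_var = n * p * (1 - p) * Fraction(N - n, N - 1)

print("X:", binom_mean, float(binom_var))
print("Y:", hyper_mean, round(float(hyper_var), 2))
```

Under these assumptions X and Y share the mean 5 and differ only in variance, so the mean alone cannot separate rows 1 and 3; the variance comparison carries the weight, and even that is only strong evidence from 30 repetitions, not proof.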

# Mark Scheme Extracts - Y532/01 June 2022

---

## Q3 Conclusions

*(In general, allow "Accept $H_0$" as a synonym for "Do not reject $H_0$")*

| Label | Answer/Working | Mark | Guidance |
|-------|---------------|------|----------|
| α | Wrong but validly obtained TS leading consistently to "Reject $H_0$. There is significant evidence that runners who do better in one race tend to do better in the other" | M1A1 | standard FT |
| β | Do not reject $H_0$. Runners who do better in the first race do not tend to do better in the second | M1A0 | too definite |
| γ | Do not reject $H_0$. There is insufficient evidence of association | M1A0 | no context |
| δ | Insufficient evidence to reject $H_0$. Runners who do better in the first race do not tend to do better in the second | M1A1 | "Evidence" used, albeit in wrong sentence, but BOD |
| ε | Do not reject $H_0$. There is evidence that runners who do well in the first race do not do any better than others in the second race | M1A0 | Non-rejection doesn't give positive evidence that $H_0$ is correct |
| ζ | (from correct TS and CV) Reject $H_0$. (+ anything) | M0A0 | inconsistent |
| η | If $H_0$, $H_1$ the wrong way round, but calculations right: Do not reject $H_0$. (+ anything) | M1A0 | |

---

## Q4 Conclusions

*(Similar general principles apply here as in Q2)*

| Label | Answer/Working | Mark | Guidance |
|-------|---------------|------|----------|
| α | (from miscalculated comparison) Reject $H_0$. There is evidence of an association between direction of journey and timing | M1A0 | |
| β | Significant evidence to reject $H_0$. There is association between direction of journey and delays | M1A1 | as in Q2 ex δ |
| γ | Reject $H_0$. There is association between direction of journey and delays | M1A0 | too definite; as in Q2 ex β |
| δ | Reject $H_0$. There is association between them | M1A0 | no context |
| ε | (from correct comparison) Do not reject $H_0$. (+ anything) | M0A0 | |
| ζ | If $H_0$, $H_1$ the wrong way round but calculations right: Reject $H_0$. (+ anything) | M1A0 | |
| η | Reject $H_0$. There is significant evidence of association between direction of journey and delays | M1A1 | best answer |

---

## Q5(c)

| Label | Answer/Working | Mark | Guidance |
|-------|---------------|------|----------|
| α | Not valid as it is based on a small sample | B0 | |
| β | 14 is not equal to 10 so it is unlikely to be valid | B0 | don't expect them to be exactly equal |
| γ | 14 is not close to 10 so it may not be valid | B1 | dubious about this as "may not be" is always true, but BOD |
| δ | Not too far from part (iii), so valid enough but not completely reliable | B1 | allow this judgement |
| ε | It is not very valid as the values should be closer | B1 | BOD. Don't want to penalise the idea of something being "partly valid" |
| ζ | Probably valid as 14 is not far from 10 | B1 | |
| η | Less valid as 14 is not close to 10 | B1 | |

---

## Q7(b)(i)

| Label | Answer/Working | Mark | Guidance |
|-------|---------------|------|----------|
| α | Second row is a good match. Average number of cards chosen before red is 4, as only $\frac{1}{4}$ of the pack is red and student continuously picks up cards; 10% chance of picking up 8 cards before a red; 25% chance of all reds being close together | M1A0, A1A0 | enough to imply $\text{Geo}(\frac{1}{4})$, even though wrong, but no more |
| β | I agree. The expected number for $\text{Geo}(\frac{1}{4})$ is 4 which is very similar to the mean (4.03). The variance is much higher which is accurate for geometric distributions as it is not averaged over a certain number of trials | M1A1, A0A0 | I suspect candidate has forgotten variance formula! |
| γ | $4 \approx 4.03$ and $12 \approx 11.57$ so the student is correct | M1A1, A1A1 | "it is a good match". Allow this |
| δ | In geometric, prob of getting red is same on each attempt and since replacement is occurring the number of reds is likely to be less than number of reds for student Y since there is no replacement | M0, SR B1 | "geometric" confused, so use SR |

---

## Q7(b)(ii)

| Label | Answer/Working | Mark | Guidance |
|-------|---------------|------|----------|
| α | Some differences, but very similar, and more similar than other rows, so highly likely that row 3 is student X | M0 | |
| β | I agree. $E(X) = 5$ which is close to 4.97. $\text{Var}(X) = 3.75$ which is close to 5 | M1A0 | 5 and 3.75 imply M1 but wrong number (5) then used |
| γ | Very likely that student X's results are the third row since the number of reds in the pack remains constant; $\frac{12}{48}$ chance that the student would pick a red. Therefore 4.97 is close to 5 (which is $\frac{1}{4}$ of 20) | M1A0 | |
| δ | I agree. $E(X) = 5$ which is close to 4.97; $\text{Var}(X) = 3.75$ which is close to 3.7 | M1A1A0 | correct apart from no rejection of "definite" |
| ε | It cannot be said that row 3 is definitely Student X's results, since the observed values of mean and variance may be unlikely results from Y or Z's distributions. It's however likely that row 3 is student X since the observed mean and variance are very similar to the expected mean and variance (5 and 3.75) of X's distribution | M1A1A1 | |
| ζ | I agree that it could be student X's results as the expected number of reds is 5 which is similar to the observed mean of 4.97. However, I disagree that it is definitely student X's results as student 1 has an observed mean of 5.03 which is just as close | M1A1A1 | I think this is a good answer and I am happy to treat the two means as two different facts |
| η | There are more 8s recorded for student 3 than for student 1, as expected as the cards are replaced, but it cannot be said for certain that X is student 3. Additionally student 3 has a lower mean than student 1, but this would not be expected for student X | B1B0B0 | SC: no dist implied |
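
The observed means and variances quoted in these exemplar answers can be re-derived from the frequency table in the question. The sketch below does this for rows 1 and 3 only, since their "≥ 9" cells are zero (row 2 cannot be reconstructed exactly because its two values of 9 or more are not given). It assumes the variance uses divisor n rather than n − 1, which is the form that reproduces the quoted figures; the function name is illustrative.

```python
values = list(range(9))  # recorded numbers 0..8

# Frequencies copied from the question's table (">= 9" cell is 0 for both).
rows = {
    "Student 1": [0, 0, 1, 3, 7, 8, 6, 4, 1],
    "Student 3": [0, 1, 2, 5, 4, 6, 5, 3, 4],
}

def mean_and_variance(values, freqs):
    """Mean and divisor-n variance of a frequency distribution."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    variance = sum(x * x * f for x, f in zip(values, freqs)) / n - mean ** 2
    return mean, variance

for name, freqs in rows.items():
    mean, variance = mean_and_variance(values, freqs)
    print(name, round(mean, 2), round(variance, 2))
```

Row 1 has a mean as close to 5 as row 3 does, which is the point made in exemplar ζ; it is the variance (about 1.97 against about 3.70) that distinguishes the two rows.
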
7 Each of three students, X, Y and Z, was given an identical pack of 48 cards, of which 12 cards were red and 36 were blue. They were each told to carry out a different experiment, as follows:

Student X: Choose a card from the pack, at random, 20 times altogether, with replacement. Record how many times you obtain a red card.

Student Y: Choose a card from the pack, at random, 20 times altogether, without replacement. Record how many times you obtain a red card.

Student Z: Choose single cards from the pack at random, with replacement, until you obtain the first red card. Record how many cards you have chosen, including the first red card.\\
(a) Find the probability that student Z has to choose more than 8 cards in order to obtain the first red card.

Each student carries out their experiment 30 times.

The frequencies of the results recorded by each student are shown in the following table, but not necessarily with the rows in the order X, Y, Z:

\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|}
\hline
 & Number recorded & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & $\geqslant 9$ & Observed Mean & Observed Variance \\
\hline
\multirow{3}{*}{Observed Frequencies} & Student 1 & 0 & 0 & 1 & 3 & 7 & 8 & 6 & 4 & 1 & 0 & 5.03 & 1.97 \\
\cline{2-14}
 & Student 2 & 0 & 8 & 5 & 4 & 2 & 3 & 3 & 2 & 1 & 2 & 4.03 & 11.57 \\
\cline{2-14}
 & Student 3 & 0 & 1 & 2 & 5 & 4 & 6 & 5 & 3 & 4 & 0 & 4.97 & 3.70 \\
\hline
\end{tabular}
\end{center}

(b) \textbf{In this question you must show detailed reasoning.}

Two other students make the following statements about the results. For each of the statements, explain whether you agree with the statement. Do not carry out any hypothesis tests, but in each case you should give two justifications for your answer.
\begin{enumerate}[label=(\roman*)]
\item ``The second row is a good match with the expected results for student Z.''
\item ``The third row is definitely student X's results.''
\end{enumerate}

\hfill \mbox{\textit{OCR Further Statistics AS 2022 Q7 [10]}}