| Exam Board | OCR |
|---|---|
| Module | Further Statistics AS |
| Year | 2022 |
| Session | June |
| Marks | 10 |
| Topic | Geometric Distribution |
| Type | Non-geometric distribution identification |
| Difficulty | Standard +0.8. This question requires understanding three different probability distributions (binomial for sampling with replacement, the distribution for sampling without replacement, and geometric), calculating a geometric probability, and critically analysing which experimental results match which distribution based on mean, variance, and shape. Part (b) demands statistical reasoning without formal tests, requiring students to justify answers using multiple properties; this goes beyond routine application and requires genuine understanding of distributional characteristics. |
| Spec | 5.02f Geometric distribution: conditions; 5.02g Geometric probabilities: P(X=r) = p(1-p)^(r-1) |
# Question 7:
## Part (a):
| Answer | Marks | Guidance |
|---|---|---|
| $\text{Geo}(0.25)$ | **M1** | Stated or implied. NB: $12/36 = \frac{1}{3}$ may be MR |
| $P(>8) = 0.75^8$ | **M1** | Or $1 - 0.25(1 + 0.75 + \ldots + 0.75^7)$. $0.75^9 = 0.075$ or equivalent addition: M1M1A0 |
| $= 0.1(00113)$ or $\frac{6561}{65536}$ | **A1 [3]** | Exact or awrt $0.100$, allow "$0.1$" |
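
As a quick check of the arithmetic in Part (a), the following minimal sketch evaluates both routes given in the scheme: the direct form $0.75^8$ and the complement $1 - P(Z \leqslant 8)$. The variable names are illustrative only and exact fractions are used so the result can be compared with $\frac{6561}{65536}$.

```python
from fractions import Fraction

# Probability of a red card on any single draw: 12 red out of 48, with replacement.
p = Fraction(12, 48)

# Z > 8 means that the first 8 cards chosen are all blue.
prob_more_than_8 = (1 - p) ** 8

# Alternative route from the guidance column: 1 - P(Z <= 8).
prob_via_complement = 1 - sum(p * (1 - p) ** (r - 1) for r in range(1, 9))

print(prob_more_than_8, float(prob_more_than_8))
```

Both routes give the same exact fraction, which is the value the A1 mark requires to 3 significant figures.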
## Part (b)(i):
| Answer | Marks | Guidance |
|---|---|---|
| **DR.** *Either*: Geometric stated or implied | **M1** | Identify geometric for $Z$. SR if M0: B1 for one clear correct comparison, B1 a second one; all correct B1; conclusion "agree/good match" B1 |
| $E(Z) = 4$ which is close to $4.03$ | **A1** | One correct calculation interpreted |
| $\text{Var}(Z) = 12$ which is close to $11.57$ | **A1** | Another correct calculation interpreted |
| So second row is a good match for expected results for student $Z$ | **A1 [4]** | Needs two pieces of correct evidence, and a conclusion. Accept "The statement is true" |
| *Or*: Geometric stated or implied | **M1** | |
| Frequencies are (generally) decreasing | **A1** | |
| $4.03 \approx 4$ | **M1** | |
| So second row is a good match for expected results for student $Z$ | **A1** | Needs adequate correct evidence – conclusion needed! Or equivalent with variance |
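
The two numerical comparisons and the "generally decreasing" shape argument can be illustrated with a short sketch. It assumes $Z \sim \text{Geo}(0.25)$ as in the scheme and uses the standard results $E(Z) = 1/p$ and $\text{Var}(Z) = (1-p)/p^2$; the expected frequencies are simply 30 times each probability, and the names are illustrative.

```python
p = 0.25    # probability of red on each draw
runs = 30   # each student repeats the experiment 30 times

mean_z = 1 / p              # E(Z) = 1/p
var_z = (1 - p) / p ** 2    # Var(Z) = (1 - p)/p^2

# Expected frequencies for Z = 1, ..., 8 and for the pooled ">= 9" column.
expected = [runs * p * (1 - p) ** (r - 1) for r in range(1, 9)]
expected_tail = runs * (1 - p) ** 8

# Second row of the table, values 1 to 8 (the ">= 9" column holds 2).
observed_row_2 = [8, 5, 4, 2, 3, 3, 2, 1]

print(mean_z, var_z)
for r, (e, o) in enumerate(zip(expected, observed_row_2), start=1):
    print(r, round(e, 2), o)
print(">=9", round(expected_tail, 2), 2)
```

The expected frequencies fall steadily from 7.5 at $Z = 1$, which is the pattern the alternative method refers to when it describes the observed frequencies as generally decreasing.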
## Part (b)(ii):
| Answer | Marks | Guidance |
|---|---|---|
| **DR.** $X \sim B(20, 0.25)$ | **M1** | Stated or implied. ($B(30, 0.25)$: M1A0A0). SR if M0: B1 for one clear correct comparison, B1 a second one; all correct and conclusion "very likely but not definite" B1 [e.g. sampling without replacement reduces probabilities of higher results] |
| $E(X) = 5$ which is close to $4.97$; $\text{Var}(X) = 3.75$ which is close to $3.7$ | **A1** | Two correct comparisons |
| Therefore quite strong indication that the third row is $X$, but not definite | **A1 [3]** | Consistent conclusion, indicate uncertainty, must deny "definitely". Needs both previous marks. Needn't actually say "very likely" as denying "definitely" is enough! |
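
The comparison in Part (b)(ii) can be reproduced with a minimal sketch, assuming $X \sim B(20, 0.25)$ as in the scheme. It also recomputes the observed mean and variance of the third row from the tabulated frequencies (possible here because that row has no results in the $\geqslant 9$ column), using the divisor-$n$ form of the variance. The names are illustrative.

```python
n, p = 20, 0.25

mean_x = n * p              # E(X) = np
var_x = n * p * (1 - p)     # Var(X) = np(1 - p)

# Third row of the table: frequencies for recorded values 0, 1, ..., 8.
row_3 = [0, 1, 2, 5, 4, 6, 5, 3, 4]
total = sum(row_3)

obs_mean = sum(x * f for x, f in enumerate(row_3)) / total
# Variance with divisor n (mean of squares minus square of mean).
obs_var = sum(x * x * f for x, f in enumerate(row_3)) / total - obs_mean ** 2

print(mean_x, var_x)
print(round(obs_mean, 2), round(obs_var, 2))
```

The recomputed values round to the 4.97 and 3.70 quoted in the table, so the two comparisons in the scheme are against theoretical values of 5 and 3.75 respectively.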
# Mark Scheme Extracts - Y532/01 June 2022
---
## Q3 Conclusions
*(In general, allow "Accept $H_0$" as a synonym for "Do not reject $H_0$")*
| Label | Answer/Working | Mark | Guidance |
|-------|---------------|------|----------|
| α | Wrong but validly obtained TS leading consistently to "Reject $H_0$. There is significant evidence that runners who do better in one race tend to do better in the other" | M1A1 | standard FT |
| β | Do not reject $H_0$. Runners who do better in the first race do not tend to do better in the second | M1A0 | too definite |
| γ | Do not reject $H_0$. There is insufficient evidence of association | M1A0 | no context |
| δ | Insufficient evidence to reject $H_0$. Runners who do better in the first race do not tend to do better in the second | M1A1 | "Evidence" used, albeit in wrong sentence, but BOD |
| ε | Do not reject $H_0$. There is evidence that runners who do well in the first race do not do any better than others in the second race | M1A0 | Non-rejection doesn't give positive evidence that $H_0$ is correct |
| ζ | (from correct TS and CV) Reject $H_0$. (+ anything) | M0A0 | inconsistent |
| η | If $H_0$, $H_1$ the wrong way round, but calculations right: Do not reject $H_0$. (+ anything) | M1A0 | |
---
## Q4 Conclusions
*(Similar general principles apply here as in Q2)*
| Label | Answer/Working | Mark | Guidance |
|-------|---------------|------|----------|
| α | (from miscalculated comparison) Reject $H_0$. There is evidence of an association between direction of journey and timing | M1A0 | |
| β | Significant evidence to reject $H_0$. There is association between direction of journey and delays | M1A1 | as in Q2 ex δ |
| γ | Reject $H_0$. There is association between direction of journey and delays | M1A0 | too definite; as in Q2 ex β |
| δ | Reject $H_0$. There is association between them | M1A0 | no context |
| ε | (from correct comparison) Do not reject $H_0$. (+ anything) | M0A0 | |
| ζ | If $H_0$, $H_1$ the wrong way round but calculations right: Reject $H_0$. (+ anything) | M1A0 | |
| η | Reject $H_0$. There is significant evidence of association between direction of journey and delays | M1A1 | best answer |
---
## Q5(c)
| Label | Answer/Working | Mark | Guidance |
|-------|---------------|------|----------|
| α | Not valid as it is based on a small sample | B0 | |
| β | 14 is not equal to 10 so it is unlikely to be valid | B0 | don't expect them to be exactly equal |
| γ | 14 is not close to 10 so it may not be valid | B1 | dubious about this as "may not be" is always true, but BOD |
| δ | Not too far from part (iii), so valid enough but not completely reliable | B1 | allow this judgement |
| ε | It is not very valid as the values should be closer | B1 | BOD. Don't want to penalise the idea of something being "partly valid" |
| ζ | Probably valid as 14 is not far from 10 | B1 | |
| η | Less valid as 14 is not close to 10 | B1 | |
---
## Q7(b)(i)
| Label | Answer/Working | Mark | Guidance |
|-------|---------------|------|----------|
| α | Second row is a good match. Average number of cards chosen before red is 4, as only $\frac{1}{4}$ of the pack is red and student continuously picks up cards; 10% chance of picking up 8 cards before a red; 25% chance of all reds being close together | M1A0, A1A0 | enough to imply $\text{Geo}(\frac{1}{4})$, even though wrong, but no more |
| β | I agree. The expected number for $\text{Geo}(\frac{1}{4})$ is 4 which is very similar to the mean (4.03). The variance is much higher which is accurate for geometric distributions as it is not averaged over a certain number of trials | M1A1, A0A0 | I suspect candidate has forgotten variance formula! |
| γ | $4 \approx 4.03$ and $12 \approx 11.57$ so the student is correct | M1A1, A1A1 | "it is a good match". Allow this |
| δ | In geometric, prob of getting red is same on each attempt and since replacement is occurring the number of reds is likely to be less than number of reds for student Y since there is no replacement | M0, SR B1 | "geometric" confused, so use SR |
---
## Q7(b)(ii)
| Label | Answer/Working | Mark | Guidance |
|-------|---------------|------|----------|
| α | Some differences, but very similar, and more similar than other rows, so highly likely that row 3 is student X | M0 | |
| β | I agree. $E(X) = 5$ which is close to 4.97. $\text{Var}(X) = 3.75$ which is close to 5 | M1A0 | 5 and 3.75 imply M1 but wrong number (5) then used |
| γ | Very likely that student X's results are the third row since the number of reds in the pack remains constant; $\frac{12}{48}$ chance that the student would pick a red. Therefore 4.97 is close to 5 (which is $\frac{1}{4}$ of 20) | M1A0 | |
| δ | I agree. $E(X) = 5$ which is close to 4.97; $\text{Var}(X) = 3.75$ which is close to 3.7 | M1A1A0 | correct apart from no rejection of "definite" |
| ε | It cannot be said that row 3 is definitely Student X's results, since the observed values of mean and variance may be unlikely results from Y or Z's distributions. It's however likely that row 3 is student X since the observed mean and variance are very similar to the expected mean and variance (5 and 3.75) of X's distribution | M1A1A1 | |
| ζ | I agree that it could be student X's results as the expected number of reds is 5 which is similar to the observed mean of 4.97. However, I disagree that it is definitely student X's results as student 1 has an observed mean of 5.03 which is just as close | M1A1A1 | I think this is a good answer and I am happy to treat the two means as two different facts |
| η | There are more 8s recorded for student 3 than for student 1, as expected as the cards are replaced, but it cannot be said for certain that X is student 3. Additionally student 3 has a lower mean than student 1, but this would not be expected for student X | B1B0B0 | SC: no dist implied |
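
Example ζ observes that the mean alone cannot separate rows 1 and 3. The sketch below goes slightly beyond the mark scheme to show why: it assumes student Y's count follows a hypergeometric distribution (20 cards drawn without replacement from 48, of which 12 are red), which has the same mean as $B(20, 0.25)$ but a variance reduced by the factor $\frac{N - n}{N - 1}$. This model for Y and the variable names are assumptions introduced for illustration, not part of the published scheme.

```python
from fractions import Fraction

N, K, n = 48, 12, 20    # pack size, red cards, cards drawn per experiment
p = Fraction(K, N)

# Student X: binomial B(20, 1/4).
mean_x = n * p
var_x = n * p * (1 - p)

# Student Y: hypergeometric (assumed model), same mean, smaller variance.
mean_y = n * p
var_y = n * p * (1 - p) * Fraction(N - n, N - 1)

# Observed (mean, variance) pairs quoted in the table for rows 1 and 3.
observed = {"Student 1": (5.03, 1.97), "Student 3": (4.97, 3.70)}

# For each row, record which theoretical variance is nearer.
closest = {}
for name, (mean, var) in observed.items():
    closest[name] = "X" if abs(var - var_x) < abs(var - var_y) else "Y"

print(float(var_x), float(var_y), closest)
```

Under these assumptions both distributions have mean 5, so only the variance (3.75 against roughly 2.23) distinguishes them. This supports "very likely" for row 3 being student X, but with only 30 repetitions it does not justify "definitely".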
7 Each of three students, X, Y and Z, was given an identical pack of 48 cards, of which 12 cards were red and 36 were blue. They were each told to carry out a different experiment, as follows:
Student X: Choose a card from the pack, at random, 20 times altogether, with replacement. Record how many times you obtain a red card.
Student Y: Choose a card from the pack, at random, 20 times altogether, without replacement. Record how many times you obtain a red card.
Student Z: Choose single cards from the pack at random, with replacement, until you obtain the first red card. Record how many cards you have chosen, including the first red card.
(a) Find the probability that student Z has to choose more than 8 cards in order to obtain the first red card.
Each student carries out their experiment 30 times.
The frequencies of the results recorded by each student are shown in the following table, but not necessarily with the rows in the order X, Y, Z:
\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|}
\hline
& Number recorded & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & $\geqslant 9$ & Observed Mean & Observed Variance \\
\hline
\multirow{3}{*}{Observed Frequencies} & Student 1 & 0 & 0 & 1 & 3 & 7 & 8 & 6 & 4 & 1 & 0 & 5.03 & 1.97 \\
\hline
& Student 2 & 0 & 8 & 5 & 4 & 2 & 3 & 3 & 2 & 1 & 2 & 4.03 & 11.57 \\
\hline
& Student 3 & 0 & 1 & 2 & 5 & 4 & 6 & 5 & 3 & 4 & 0 & 4.97 & 3.70 \\
\hline
\end{tabular}
\end{center}
\section*{(b) In this question you must show detailed reasoning.}
Two other students make the following statements about the results. For each of the statements, explain whether you agree with the statement. Do not carry out any hypothesis tests, but in each case you should give two justifications for your answer.
\begin{enumerate}[label=(\roman*)]
\item "The second row is a good match with the expected results for student Z."
\item "The third row is definitely student X's results."
\end{enumerate}
\hfill \mbox{\textit{OCR Further Statistics AS 2022 Q7 [10]}}