Questions Further Statistics

OCR Further Statistics 2018 September Q9

9 The continuous random variable $C$ has the distribution $\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)$. The sum of a random sample of 16 observations of $C$ is 224.0 .

Find an unbiased estimate of $\mu$.
It is given that an unbiased estimate of $\sigma ^ { 2 }$ is 0.24. Find the value of $\Sigma c ^ { 2 }$.
$D$ is the sum of 10 independent observations of $C$.
Explain whether $D$ has a normal distribution.
The continuous random variable $F$ is normally distributed with mean 15.0, and it is known that $\mathrm { P } ( F < 13.2 ) = 0.115$.
Use the unbiased estimates of $\mu$ and $\sigma ^ { 2 }$ to find $\mathrm { P } ( D + F > 157.0 )$.

OCR: Oxford Cambridge and RSA

OCR Further Statistics 2018 December Q1

1 The performance of a piece of music is being recorded. The piece consists of three sections, $A , B$ and $C$. The times, in seconds, taken to perform the three sections are normally distributed random variables with the following means and standard deviations.

Section	Mean	Standard deviation
$A$	264	13
$B$	173	9
$C$	264	13

Assume first that the times for the three sections are independent. Find the probability that the total length of the performance is greater than 720.0 seconds.
In fact sections $A$ and $C$ are musically identical, and the recording is made by using a single performance of section $A$ twice, together with a performance of section $B$. In this case find the probability that the total length of the performance is greater than 720.0 seconds.

OCR Further Statistics 2018 December Q2

2 In a fairground game a competitor scores $0,1,2$ or 3 with probabilities given in the following table, where $a$ and $b$ are constants.

Score	0	1	2	3
Probability	$a$	$b$	$b$	$b$

The competitor's expected score is 0.9 .

Show that $b = 0.15$.
Find the variance of the score.
The competitor has to pay $\pounds 2.50$ to take part, and wins a prize of $\pounds 2 X$, where $X$ is the score achieved. Find the expectation of the competitor's loss.

OCR Further Statistics 2018 December Q3

3

Alex places 20 black counters and 8 white counters into a bag. She removes 8 counters at random without replacement. Find the probability that the bag now contains exactly 5 white counters.
Bill arranges 8 blue counters and 4 green counters in a random order in a straight line. Find the probability that exactly three of the green counters are next to one another.

OCR Further Statistics 2018 December Q4

4 Leyla investigates the number of shoppers who visit a shop between 10.30 am and 11 am on Saturday mornings. She makes the following assumptions.

Shoppers visit the shop independently of one another.
The average rate at which shoppers visit the shop between these times is constant.
1. State an appropriate distribution with which Leyla could model the number of shoppers who visit the shop between these times.

Leyla uses this distribution, with mean 14, as her model.

Calculate the probability that, between 10.35 am and 10.50 am on a randomly chosen Saturday, at least 10 shoppers visit the shop. Leyla chooses 25 Saturdays at random.

Find the expected number of Saturdays, out of 25, on which there are no visitors to the shop between 10.35 am and 10.50 am .

In fact on 5 of these Saturdays there were no visitors to the shop between 10.35 am and 10.50 am . Use this fact to comment briefly on the validity of the model that Leyla has used.

OCR Further Statistics 2018 December Q5

5 The birth rate, $x$ per thousand members of the population, and the life expectancy at birth, $y$ years, in 14 randomly selected African countries are given in the table.

Country	$x$	$y$	Country	$x$	$y$
Benin	4.8	59.2	Mozambique	5.4	54.63
Cameroon	4.7	54.87	Nigeria	5.7	52.29
Congo	4.9	61.42	Senegal	5.1	65.81
Gambia	5.7	59.83	Somalia	6.5	54.88
Liberia	4.7	60.25	Sudan	4.4	63.08
Malawi	5.1	60.97	Uganda	5.8	57.25
Mauretania	4.6	62.77	Zambia	5.4	58.75

$n = 14 , \sum x = 72.8 , \sum y = 826 , \sum x ^ { 2 } = 392.96 , \sum y ^ { 2 } = 48924.54 , \sum x y = 4279.16$

Calculate Pearson's product-moment correlation coefficient $r$ for the data.
State what would be the effect on the value of $r$ if the birth rate were given per hundred and not per thousand.
Explain what the sign of $r$ tells you about the relationship between life expectancy and birth rate for these countries.
Test at the $5 \%$ significance level whether there is correlation between birth rate and life expectancy at birth in African countries.
A researcher wants to estimate the life expectancy at birth in Zimbabwe, where the birth rate is 3.9 per thousand. Explain whether a reliable estimate could be obtained using the regression line of $y$ on $x$ for the given data.

OCR Further Statistics 2018 December Q6

6 The reaction times, in milliseconds, of all adult males in a standard experiment have a symmetrical distribution with mean and median both equal to 700 and standard deviation 125. The reaction times of a random sample of 6 international athletes are measured and the results are as follows:
$\begin{array} { l l l l l l } 702 & 631 & 540 & 714 & 575 & 480 \end{array}$
It is required to test whether international athletes have a mean reaction time which is less than 700.

Assume first that the reaction times of international athletes have the distribution $\mathrm { N } \left( \mu , 125 ^ { 2 } \right)$. Test at the $5 \%$ significance level whether $\mu < 700$.
Now assume only that the distribution of the data is symmetrical, but not necessarily normal.
1. State with a reason why a Wilcoxon test is preferable to a sign test.
2. Use an appropriate Wilcoxon test at the $5 \%$ significance level to test whether the median reaction time of international athletes is less than 700 .
Explain why the significance tests in part (a) and part (b)(ii) could produce different results.

OCR Further Statistics 2018 December Q7

3 marks

7 Sasha tends to forget his passwords. He investigates whether the number of attempts he needs to log on to a system with a password can be modelled by a geometric distribution. On 60 occasions he records the number of attempts he needs to log on, and the results are shown in the table.

Number of attempts	1	2	3	4 or more
Frequency	20	19	13	3

Test at the $1 \%$ significance level whether the results are consistent with the distribution Geo(0.4).
Suggest which two probabilities should be changed, and in what way, to produce an improved model. (Numerical values are not required.) You should give a reason for your suggestion. [3]

OCR Further Statistics 2018 December Q8

8 A continuous random variable $X$ has probability density function given by the following function, where $a$ is a constant.
$\mathrm { f } ( x ) = \left\{ \begin{array} { l l } \frac { 2 x } { a ^ { 2 } } & 0 \leqslant x \leqslant a , \\ 0 & \text { otherwise. } \end{array} \right.$
The expected value of $X$ is 4 .

Show that $a = 6$. Five independent observations of $X$ are obtained, and the largest of them is denoted by $M$.
Find the cumulative distribution function of $M$.

OCR Further Statistics 2017 Specimen Q2

2 The mass $J \mathrm {~kg}$ of a bag of randomly chosen Jersey potatoes is a normally distributed random variable with mean 1.00 and standard deviation 0.06. The mass $K \mathrm {~kg}$ of a bag of randomly chosen King Edward potatoes is an independent normally distributed random variable with mean 0.80 and standard deviation 0.04.

Find the probability that the total mass of 6 bags of Jersey potatoes and 8 bags of King Edward potatoes is greater than 12.70 kg .
Find the probability that the mass of one bag of King Edward potatoes is more than $75 \%$ of the mass of one bag of Jersey potatoes.

OCR Further Statistics 2017 Specimen Q5

5 The number of goals scored by the home team in a randomly chosen hockey match is denoted by $X$.

In order for $X$ to be modelled by a Poisson distribution it is assumed that goals scored are random events. State two other conditions needed for $X$ to be modelled by a Poisson distribution in this context. Assume now that $X$ can be modelled by the distribution $\operatorname { Po } ( 1.9 )$.
(a) Write down an expression for $\mathrm { P } ( X = r )$.
(b) Hence find $\mathrm { P } ( X = 3 )$.
Assume also that the number of goals scored by the away team in a randomly chosen hockey match has an independent Poisson distribution with mean $\lambda$ between 1.31 and 1.32 . Find an estimate for the probability that more than 3 goals are scored altogether in a randomly chosen match.

OCR Further Statistics 2017 Specimen Q6

6 A bag contains 3 green counters, 3 blue counters and $w$ white counters. Counters are selected at random, one at a time, with replacement, until a white counter is drawn. The total number of counters selected, including the white counter, is denoted by $X$.

In the case when $w = 2$,
(a) write down the distribution of $X$,
(b) find $P ( 3 < X \leq 7 )$.
In the case when $\mathrm { E } ( X ) = 2$, determine the value of $w$.
In the case when $w = 2$ and $X = 6$, find the probability that the first five counters drawn alternate in colour.

OCR Further Statistics 2017 Specimen Q7

7 Sweet pea plants grown using a standard plant food have a mean height of 1.6 m. A new plant food is used for a random sample of 49 randomly chosen plants and the heights, $x$ metres, of this sample can be summarised by the following.
$$\begin{aligned} n & = 49 \\ \Sigma x & = 74.48 \\ \Sigma x ^ { 2 } & = 120.8896 \end{aligned}$$
Test, at the 5\% significance level, whether, when the new plant food is used, the mean height of sweet pea plants is less than 1.6 m.

OCR Further Statistics 2021 June Q1

28 marks

1 The performance of a piece of music is being recorded. The piece consists of three sections, $A , B$ and $C$. The times, in seconds, taken to perform the three sections are normally distributed random variables with the following means and standard deviations.

Question	Answer	Mark	AO	Guidance
1 (a)	$A + B + C \sim \mathrm { N } ( 701 , 419 )$	M1	1.1a	Normal, mean $\mu _ { A } + \mu _ { B } + \mu _ { C }$
		A1	1.1	Variance 419
	$\mathrm { P } ( > 720 ) = 0.176649$	A1	1.1	Answer, 0.177 or better, www
1 (b)	$2 A + B \sim \mathrm { N } ( 701 , 757 )$	M1	1.1a	Normal, same mean, $4 \sigma _ { A } ^ { 2 } + \sigma _ { B } ^ { 2 }$
	$\mathrm { P } ( > 720 ) = 0.244919$	A1 [2]	1.1	Answer, art 0.245
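
The two probabilities in this answer can be checked numerically. What follows is a minimal sketch, assuming Python 3.8 or later and only the standard library; the helper `upper_tail` and all variable names are illustrative and are not part of the mark scheme.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail(mean, variance, threshold):
    """Return P(X > threshold) for X ~ N(mean, variance)."""
    return 1 - NormalDist(mean, sqrt(variance)).cdf(threshold)


mean_total = 264 + 173 + 264            # 701 in both parts

# (a) Independent sections: the three variances add.
var_independent = 13**2 + 9**2 + 13**2  # 419
p_independent = upper_tail(mean_total, var_independent, 720)

# (b) Section A is used twice: Var(2A + B) = 4 Var(A) + Var(B).
var_repeated = 4 * 13**2 + 9**2         # 757
p_repeated = upper_tail(mean_total, var_repeated, 720)

print(round(p_independent, 3), round(p_repeated, 3))
```

The mean is unchanged between the two parts; only the variance grows, because repeating one performance doubles every deviation of $A$ instead of averaging two independent ones.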

2 (a)	$\frac { { } ^ { 8 } C _ { 3 } \times { } ^ { 20 } C _ { 5 } } { { } ^ { 28 } C _ { 8 } }$	M1 A1	3.1b 1.1	(Product of two ${ } ^ { n } C _ { r }$) ÷ ${ } ^ { n } C _ { r }$. At least two ${ } ^ { n } C _ { r }$ correct. Or $\frac { 8 } { 28 } \times \frac { 7 } { 27 } \times \frac { 6 } { 26 } \times \frac { 20 } { 25 } \times \ldots \times \frac { 16 } { 21 } \times { } ^ { 8 } C _ { 3 } = 0.27934 \ldots$
	$\frac { 56 \times 15504 } { 3108105 } = 0.27934 \ldots$	A1 [3]	1.1	Any exact form or awrt 0.279
2 (b)	× B × B × B × B × B × B × B × B ×. GGG in one ×, G in another: $9 \times 8$	M1 A1	3.1b 2.1	Or e.g. $\frac { 10 ! } { 8 ! } - 2 \times 9$
	$\div \frac { 12 ! } { 8 ! \times 4 ! }$	M1	1.1	Divide by ${ } _ { 12 } \mathrm { C } _ { 4 }$ oe
	$= \frac { 72 } { 495 } = \frac { 8 } { 55 } \text { or } 0.145 \ldots$	A1 [4]	1.1	Or, e.g. find ${ } _ { 12 } \mathrm { C } _ { 4 }$ − (\#(all separate) + \#(all together) + \#(2,1,1) × 3 + \#(2,2))
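
Both counting arguments above can be verified by direct computation. The sketch below is an illustration only, assuming Python 3.8 or later; the function `run_lengths` and the variable names are hypothetical helpers, and the brute-force enumeration is a check on the gap argument, not the method the mark scheme expects.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# (a) Three white and five black counters must be removed.
p_five_white_left = Fraction(comb(8, 3) * comb(20, 5), comb(28, 8))


def run_lengths(positions):
    """Sorted lengths of the runs of consecutive integers in a sorted tuple."""
    runs, current = [], 1
    for prev, nxt in zip(positions, positions[1:]):
        if nxt == prev + 1:
            current += 1
        else:
            runs.append(current)
            current = 1
    runs.append(current)
    return sorted(runs)


# (b) Choose the 4 green positions out of 12; keep layouts with one run of
# exactly three greens and one separate single green.
favourable = sum(
    1 for greens in combinations(range(12), 4) if run_lengths(greens) == [1, 3]
)
p_exactly_three_adjacent = Fraction(favourable, comb(12, 4))

print(float(p_five_white_left), favourable, p_exactly_three_adjacent)
```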

Question

Answer

Mark

AO

Guidance

\multirow{7}{*}{3}

\multirow{7}{*}{(a)}

$\mathrm { H } _ { 0 } : \mu = 700$

B2

1.1

One error, e.g. no or wrong

Ignore failure to define $\mu$

$\mathrm { H } _ { 1 } : \mu < 700$ where $\mu$ is the mean reaction

1.1

letter, $\neq$, etc : B1

here

$\bar { x } = 607$

M1

3.3

Find sample mean

$z = - 1.822$ or $p = 0.0342$ or $\mathrm { CV } = 616.05 \ldots$

A1

3.4

Correct $z , p$ or CV

$z < - 1.645$ or $p < 0.05$ or $607 < \mathrm { CV }$

A1

1.1

Correct comparison

Reject $\mathrm { H } _ { 0 }$

M1ft

1.1

Correct first conclusion

Needs correct method, like-

Significant evidence that mean reaction times

A1ft

2.2b

Context, not too definite (e.g. not "international athletes' reaction times are shorter")

ft on their $z , p$ or CV

3

(b)

(i)

Uses more information (e.g. magnitudes of differences)

B1 [1]

2.4

\multirow{5}{*}{3}

\multirow{5}{*}{(b)}

\multirow{5}{*}{(ii)}

$\mathrm { H } _ { 0 } : m = 700 , \mathrm { H } _ { 1 } : m < 700$ where $m$ is the median reaction time for all international athletes

B1

2.5

Same as in (i) but different letter or "median" stated

$W _ { - } = 18$

$W _ { + } = 3$ so $T = 3$

For both, and $T$ correct

$n = 6 , \mathrm { CV } = 2$

A1

1.1

Correct CV

Do not reject $\mathrm { H } _ { 0 }$. Insufficient evidence that median reaction times of international athletes are shorter

A1ft [6]

2.2b

In context, not too definite

FT on their $T$

3

(c)

They use different assumptions

B1 [1]

2.3

Not "one is more accurate"
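
The test statistics quoted for Question 3 can be reproduced from the six observations. This is a minimal sketch, assuming Python 3.8 or later; the variable names are illustrative, and the simple ranking used here relies on the sample having no tied or zero differences.

```python
from math import sqrt
from statistics import NormalDist, mean

times = [702, 631, 540, 714, 575, 480]
hypothesised = 700
sigma = 125

# Part (a): one-tailed z-test with known standard deviation.
z = (mean(times) - hypothesised) / (sigma / sqrt(len(times)))
p_value = NormalDist().cdf(z)

# Part (b)(ii): Wilcoxon signed-rank sums.
# Rank the differences by absolute size (no ties or zeros in this sample).
diffs = [t - hypothesised for t in times]
ordered = sorted(diffs, key=abs)
w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
t_stat = min(w_plus, w_minus)

print(round(z, 3), round(p_value, 4), w_plus, w_minus, t_stat)
```

The z-test rejects $\mathrm{H}_0$ while $T = 3$ exceeds the critical value 2, which is the difference in outcome that part (c) asks about.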

Question

Answer

Mark

AO

Guidance

4

(a)

\(\begin{aligned}
\int _ { 0 } ^ { a } x \cdot \frac { 2 x } { a ^ { 2 } } \, d x & = 4 \\
\left[ \frac { 2 x ^ { 3 } } { 3 a ^ { 2 } } \right] _ { 0 } ^ { a } & = 4 \\
\frac { 2 } { 3 } a = 4 & \Rightarrow a = 6
\end{aligned}\)

M1

B1

A1 [3]

3.1a

1.1

2.2a

4

(b)

$\mathrm { F } ( x ) = \frac { x ^ { 2 } } { 36 }$
Let the CDF of $M$ be $\mathrm { H } ( m )$. Then $\mathrm { H } ( m ) = \mathrm { P } ( \text { all observations less than } m ) = [ \mathrm { P } ( X \leqslant m ) ] ^ { 5 } = \left[ \frac { m ^ { 2 } } { 36 } \right] ^ { 5 }$
\(\mathrm { H } ( m ) = \begin{cases} 0 & m < 0 , \\ \frac { m ^ { 10 } } { 60466176 } & 0 \leq m \leq 6 , \\ 1 & m > 6 . \end{cases}\)

M1 A1ft

M1

A1

[8]

1.1
1.1
2.1
3.1a
2.2a
2.1
2.1
1.2

Find $\mathrm { F } ( x ) ; = \frac { x ^ { 2 } } { a ^ { 2 } }$

Correct basis for CDF of $m$

Correct function, any letter Range $0 \leq m \leq 6$

Letter not $x$, and 0, 1 present

ft on their $a$

Allow
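
The cumulative distribution function of the maximum can be written as a short exact computation. This is a sketch under the stated result $a = 6$, assuming Python 3; the names `cdf_x` and `cdf_max` are illustrative and do not appear in the mark scheme.

```python
from fractions import Fraction

A = 6  # upper limit of the density, from E(X) = 2a/3 = 4


def cdf_x(x):
    """F(x) = x^2 / a^2 on [0, a], 0 below and 1 above."""
    if x < 0:
        return Fraction(0)
    if x > A:
        return Fraction(1)
    return Fraction(x) ** 2 / A**2


def cdf_max(m, n=5):
    """CDF of the largest of n independent observations: F(m)^n."""
    return cdf_x(m) ** n


print(cdf_max(1), cdf_max(3), cdf_max(6))
```

Evaluating at $m = 1$ recovers the denominator $6^{10} = 60466176$ that appears in the final answer.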

OCR Further Statistics 2021 June Q1

1 A set of bivariate data ( $X , Y$ ) is summarised as follows.
$n = 25 , \Sigma x = 9.975 , \Sigma y = 11.175 , \Sigma x ^ { 2 } = 5.725 , \Sigma y ^ { 2 } = 46.200 , \Sigma x y = 11.575$

Calculate the value of Pearson's product-moment correlation coefficient.
Calculate the equation of the regression line of $y$ on $x$. It is desired to know whether the regression line of $y$ on $x$ will provide a reliable estimate of $y$ when $x = 0.75$.
State one reason for believing that the estimate will be reliable.
State what further information is needed in order to determine whether the estimate is reliable.

OCR Further Statistics 2021 June Q2

2 The average numbers of cars, lorries and buses passing a point on a busy road in a period of 30 minutes are 400,80 and 17 respectively.

Assuming that the numbers of each type of vehicle passing the point in a period of 30 minutes have independent Poisson distributions, calculate the probability that the total number of vehicles passing the point in a randomly chosen period of 30 minutes is at least 520 .
Buses are known to run in approximate accordance with a fixed timetable. Explain why this casts doubt on the use of a Poisson distribution to model the number of buses passing the point in a fixed time interval.

The greatest weight $W \mathrm {~N}$ that can be supported by a shelving bracket of traditional design is a normally distributed random variable with mean 500 and standard deviation 80. A sample of 40 shelving brackets of a new design are tested and it is found that the mean of the greatest weights that the brackets in the sample can support is 473.0 N.
Test at the $1 \%$ significance level whether the mean of the greatest weight that a bracket of the new design can support is less than the mean of the greatest weight that a bracket of the traditional design can support.
State an assumption needed in carrying out the test in part (a).
Explain whether it is necessary to use the central limit theorem in carrying out the test.

OCR Further Statistics 2021 June Q4

4
The random variable $D$ has the distribution $\operatorname { Geo } ( p )$. It is given that $\operatorname { Var } ( D ) = \frac { 40 } { 9 }$.
Determine

$\operatorname { Var } ( 3 D + 5 )$,
$\mathrm { E } ( 3 D + 5 )$,
$\mathrm { P } ( D > \mathrm { E } ( D ) )$.

OCR Further Statistics 2021 June Q5

48 marks

5 A university course was taught by two different professors. Students could choose whether to attend the lectures given by Professor $Q$ or the lectures given by Professor $R$. At the end of the course all the students took the same examination. The examination marks of a random sample of 30 students taught by Professor $Q$ and a random sample of 24 students taught by Professor $R$ were ranked. The sum of the ranks of the students taught by Professor $Q$ was 726 . Test at the $5 \%$ significance level whether there is a difference in the ranks of the students taught by the two professors.
[10]

Total Marks for Question Set 3: 38

\section*{Mark scheme}

\section*{Marking Instructions}

a An element of professional judgement is required in the marking of any written paper. Remember that the mark scheme is designed to assist in marking incorrect solutions. Correct solutions leading to correct answers are awarded full marks but work must not always be judged on the answer alone, and answers that are given in the question, especially, must be validly obtained; key steps in the working must always be looked at and anything unfamiliar must be investigated thoroughly. Correct but unfamiliar or unexpected methods are often signalled by a correct result following an apparently incorrect method. Such work must be carefully assessed.
b The following types of marks are available.

M: A suitable method has been selected and applied in a manner which shows that the method is essentially understood. Method marks are not usually lost for numerical errors, algebraic slips or errors in units. However, it is not usually sufficient for a candidate just to indicate an intention of using some method or just to quote a formula; the formula or idea must be applied to the specific problem in hand, e.g. by substituting the relevant quantities into the formula. In some cases the nature of the errors allowed for the award of an M mark may be specified. A method mark may usually be implied by a correct answer unless the question includes the DR statement, the command words "Determine" or "Show that", or some other indication that the method must be given explicitly.

A: Accuracy mark, awarded for a correct answer or intermediate step correctly obtained. Accuracy marks cannot be given unless the associated Method mark is earned (or implied). Therefore M0 A1 cannot ever be awarded.

B: Mark for a correct result or statement independent of Method marks.

E: A given result is to be established or a result has to be explained. This usually requires more working or explanation than the establishment of an unknown result.

Unless otherwise indicated, marks once gained cannot subsequently be lost, e.g. wrong working following a correct form of answer is ignored. Sometimes this is reinforced in the mark scheme by the abbreviation isw. However, this would not apply to a case where a candidate passes through the correct answer as part of a wrong argument.
c When a part of a question has two or more 'method' steps, the M marks are in principle independent unless the scheme specifically says otherwise; and similarly where there are several B marks allocated. (The notation 'dep*' is used to indicate that a particular mark is dependent on an earlier, asterisked, mark in the scheme.) Of course, in practice it may happen that when a candidate has once gone wrong in a part of a question, the work from there on is worthless so that no more marks can sensibly be given. On the other hand, when two or more steps are successfully run together by the candidate, the earlier marks are implied and full credit must be given.
d The abbreviation FT implies that the A or B mark indicated is allowed for work correctly following on from previously incorrect results. Otherwise, A and B marks are given for correct work only - differences in notation are of course permitted. A (accuracy) marks are not given for answers obtained from incorrect working. When A or B marks are awarded for work at an intermediate stage of a solution, there may be various alternatives that are equally acceptable. In such cases, what is acceptable will be detailed in the mark scheme. Sometimes the answer to one part of a question is used in a later part of the same question. In this case, A marks will often be 'follow through'.
e We are usually quite flexible about the accuracy to which the final answer is expressed; over-specification is usually only penalised where the scheme explicitly says so.

When a value is given in the paper only accept an answer correct to at least as many significant figures as the given value.
When a value is not given in the paper accept any answer that agrees with the correct value to 3 s.f. unless a different level of accuracy has been asked for in the question, or the mark scheme specifies an acceptable range.

Follow through should be used so that only one mark in any question is lost for each distinct accuracy error.
Candidates using a value of $9.80,9.81$ or 10 for $g$ should usually be penalised for any final accuracy marks which do not agree to the value found with 9.8 which is given in the rubric.
f Rules for replaced work and multiple attempts:

If one attempt is clearly indicated as the one to mark, or only one is left uncrossed out, then mark that attempt and ignore the others.
If more than one attempt is left not crossed out, then mark the last attempt unless it only repeats part of the first attempt or is substantially less complete.
If a candidate crosses out all of their attempts, the assessor should attempt to mark the crossed out answer(s) as above and award marks appropriately.

g For a genuine misreading (of numbers or symbols) which is such that the object and the difficulty of the question remain unaltered, mark according to the scheme but following through from the candidate's data. A penalty is then applied; 1 mark is generally appropriate, though this may differ for some units. This is achieved by withholding one A or B mark in the question. Marks designated as cao may be awarded as long as there are no other errors.
If a candidate corrects the misread in a later part, do not continue to follow through. Note that a miscopy of the candidate's own working is not a misread but an accuracy error.
h If a calculator is used, some answers may be obtained with little or no working visible. Allow full marks for correct answers, provided that there is nothing in the wording of the question specifying that analytical methods are required such as the bold "In this question you must show detailed reasoning", or the command words "Show" or "Determine". Where an answer is wrong but there is some evidence of method, allow appropriate method marks. Wrong answers with no supporting method score zero. \begin{table}[h]

\captionsetup{labelformat=empty} \caption{Abbreviations}

Abbreviations used in the mark scheme	Meaning
dep*	Mark dependent on a previous mark, indicated by *. The * may be omitted if only one previous M mark
cao	Correct answer only
oe	Or equivalent
rot	Rounded or truncated
soi	Seen or implied
www	Without wrong working
AG	Answer given
awrt	Anything which rounds to
BC	By Calculator
DR	This question included the instruction: In this question you must show detailed reasoning.

\end{table}

Question

Answer

Mark

AO

Guidance

1

(a)

0.8392...

B1 [1]

1.1

Awrt 0.839

$S _ { x x } = 1.7449 \ldots , \quad S _ { y y } = 41.2 \ldots , \quad S _ { x y } = 7.116 \ldots$

1

(b)

$y = - 1.180 + 4.0781 x$

B1

[2]

1.1

Both coeffs, awrt -1.18 and 4.08

Letters correct, needs 1 correct coefficient

1

(c)

Value of PMCC suggests that there is strong correlation, or 0.75 shown close to mean 0.399

B1

[1]

3.5a

E.g. " $r$ high so points lie close to line". " $r$ is high" alone is enough.

No wrong extras

Not "0.75 is close to mean", unless properly justified, e.g. SD (= 0.264) calculated

1

(d)

Whether $x = 0.75$ is within the data range

B1

[1]

3.5b

E.g. "maximum and minimum values of $x$ "; not "all data points".

No wrong extras

Or clear reference to interpolation. NB: 95\% CI for $x$ is ( $- 0.156,0.954$ )
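
The values in parts (a) and (b) follow from the summary statistics given in the question. The sketch below is one way to reproduce them, assuming plain Python 3; the variable names are illustrative only.

```python
from math import sqrt

n = 25
sum_x, sum_y = 9.975, 11.175
sum_x2, sum_y2, sum_xy = 5.725, 46.200, 11.575

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# (a) Pearson's product-moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# (b) Regression line of y on x: y = intercept + slope * x.
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

print(round(r, 4), round(intercept, 3), round(slope, 4))
```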

2

(a)

Po(497)

$\mathrm { P } ( \geq 520 ) = 1 - \mathrm { P } ( \leq 519 )$ used correctly

$= 0.1564 \ldots$

B1

M1

A1 [3]

1.1

1.1a

1.1

Stated or implied

Allow 0.146(08) from $1 - \mathrm { P } ( \leq 520 )$

In range [0.156,0.157]

SC: Normal approx.:

N(497, 497) B1

In range [0.156, 0.157]: B2

2

(b)

Occurrence of a bus is not a random event if it runs on or close to a schedule.

B1

[1]

2.4

Needs context (not just "events").

Allow just "buses not random", or "buses not independent because time between buses is regulated"

Not "not independent" without such justification. Not "not constant rate". No extras.
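
The Poisson tail in part (a) is normally read from a calculator, but it can also be summed directly. This is a minimal sketch, assuming Python 3 and only the standard library; `poisson_pmf` and `poisson_cdf` are hypothetical helper names, and the probabilities are computed in log space so that the large factorials do not overflow.

```python
from math import exp, fsum, lgamma, log


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam), evaluated via logarithms."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return fsum(poisson_pmf(i, lam) for i in range(k + 1))


# Independent Poisson counts add: cars + lorries + buses.
lam_total = 400 + 80 + 17

p_at_least_520 = 1 - poisson_cdf(519, lam_total)
p_wrong_boundary = 1 - poisson_cdf(520, lam_total)  # the common off-by-one slip

print(round(p_at_least_520, 4), round(p_wrong_boundary, 4))
```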

Question

Answer

Mark

AO

Guidance

\multirow[t]{4}{*}{3}

(a)

$\mathrm { H } _ { 0 } : \mu = 500 , \mathrm { H } _ { 1 } : \mu < 500$

B1

1.1

One error, e.g. $\mathrm { H } _ { 1 } : \mu \neq 500$, or $\mu$ not defined, or all in words: B1

$x$ or $\bar { x } : 0$ unless defined as population mean (then B1)

\(\begin{aligned}
& \bar { X } \sim \mathrm {~N} \left( 500 , \frac { 80 ^ { 2 } } { 40 } \right) = \mathrm { N } ( 500,160 ) \text { and } \bar { X } \\
& \mathrm { P } ( \bar { X } < 473 ) = 0.01640 \text { or } z = - 2.13 ( 45 ) \text { or } \mathrm { CV } = 470.6
\end{aligned}\)

M1

3.3

$p$ or $z$ correct to 3 sf .

Can be implied by 0.0164, 0.9836, 0.433, 0.198, 0.000 but not 0.3679 or 0.00127

$p > 0.01$ or $z > - 2.326 \quad$ or $473 > 470.6$

A1

1.1

Compare $p$ with 0.01 or $z$ with -2.326, or 2.326 used in CV

Must be like-with-like, Not e.g. 0.9836 > 0.01 or $p < 2.326$

Do not reject $\mathrm { H } _ { 0 }$. Insufficient evidence that greatest weight that new design can support is less than the greatest weight that the traditional design can support.

M1ft

A1ft [7]

1.1

Correct first conclusion, needs correct method and like-with-like, ft on test statistic if method correct Contextualised, not too definite

But BOD if no explicit comparison of $p$ with 0.01 Not "the new design does not have a smaller greatest weight . . ."

3

(b)

Standard deviation/variance remains unchanged, or sample must be random

B1 [1]

1.2

No extras. Not "same distribution".

Not "assume normal"; this is not needed

3

(c)

Either: Yes as we do not know that the distribution of weights for the new design is normal

Or: $\quad$ No as the population distribution known to be normal

B1 [1]

2.1

Allow "population distribution assumed to be normal". No extras, e.g. "and sample size is large".

Allow "yes as we do not know that the distribution for the new design is normal" only if clearly refers to the new design only
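
The three equivalent comparisons in part (a) (p-value, z-value and critical value) can be computed from the sampling distribution of the mean. The following is a sketch only, assuming Python 3.8 or later; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sampling distribution of the mean of 40 brackets under H0: N(500, 80^2 / 40).
sampling_dist = NormalDist(500, 80 / sqrt(40))

sample_mean = 473.0
p_value = sampling_dist.cdf(sample_mean)
z = (sample_mean - sampling_dist.mean) / sampling_dist.stdev
critical_value = sampling_dist.inv_cdf(0.01)  # lower 1% point

# Reject H0 only if the sample mean falls below the critical value.
reject_h0 = sample_mean < critical_value

print(round(p_value, 4), round(z, 3), round(critical_value, 1), reject_h0)
```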

Question

Answer

Mark

AO

Guidance

4

(a)

$9 \times \frac { 40 } { 9 } = 40$

B1 [1]

1.1

40 or awrt 40.0 only

4

(b)

$\frac { 1 - p } { p ^ { 2 } } = \frac { 40 } { 9 }$

M1

3.1b

Use correct formula for variance

SC: insufficient working, $\frac { 3 } { 8 }$ only: M0B1 for $\frac { 3 } { 8 }$, then B0

\(\begin{aligned}
& \mathrm { E } ( D ) = 1 / p \quad \left[ = \frac { 8 } { 3 } \right] \\
& \mathrm { E } ( 3 D + 5 ) = 3 \times \frac { 8 } { 3 } + 5 \quad [ = 13 ]
\end{aligned}\)

B1ft

2.3

- formula for $\mathrm { E } ( D )$

Allow for explicit rejection of a solution even if both are wrong

$p$ doesn't need to be between 0 and 1 for either of these marks

A1ft [6]

1.1

$3 \times ($ their $\mathrm { E } ( D ) ) + 5$

SC: $\frac { 1 - p } { p ^ { 2 } } = 40$ (their 40), $p = \frac { - 1 \pm \sqrt { 161 } } { 80 }$, reject negative solution, $\mathrm { E } ( D ) = \frac { 1 + \sqrt { 161 } } { 2 } = 6.844 , \mathrm { E } ( 3 D + 5 ) = 25.53 : \quad \mathrm { M } 1 , \mathrm { M } 1 \mathrm {~A} 0 , \mathrm {~B} 1 , \mathrm {~B} 2$ total $5 / 6$

4

(c)

\(\begin{aligned}
\mathrm { P } ( D > \mathrm { E } ( D ) ) & = \mathrm { P } ( D \geq 3 ) \\
& = ( 1 - p ) ^ { 2 } \\
& = \frac { 25 } { 64 } \text { or } 0.390625
\end{aligned}\)

M1ft M1

A1 [3]

3.1a 1.1a

1.1

Convert inequality to integer, their $[ 1 / p ] + 1$, allow >

$( 1 - p ) ^ { r } , \mathrm { ft }$ on their $p , r$, e.g. 8/3 or 13

Allow $( 1 - p ) ^ { 3 } = 125 / 512$ or 0.244

Answer, exact or art 0.391, www

Not their 13

$( 1 - p ) ^ { 8 / 3 } [ 0.286 ]$ : M0M1A0

Need $0 < p < 1$ here

Allow $( 1 - p ) ^ { 6 } = 0.3876$ from SC above
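
The chain of results in Question 4 can be followed exactly with rational arithmetic. This is a minimal sketch, assuming Python 3.8 or later; rearranging $\frac{1-p}{p^2} = \frac{40}{9}$ into $40p^2 + 9p - 9 = 0$ is a step supplied here for illustration, and the variable names are not from the mark scheme.

```python
from fractions import Fraction
from math import floor, isqrt

variance = Fraction(40, 9)

# (1 - p) / p^2 = 40/9  rearranges to  40 p^2 + 9 p - 9 = 0.
a, b, c = 40, 9, -9
discriminant = b * b - 4 * a * c   # 1521, a perfect square
root = isqrt(discriminant)         # 39
p = Fraction(-b + root, 2 * a)     # keep the root lying between 0 and 1

mean_d = 1 / p                     # E(D) for a geometric distribution
var_linear = 3**2 * variance       # (a) Var(3D + 5)
mean_linear = 3 * mean_d + 5       # (b) E(3D + 5)

# (c) D is an integer, so D > E(D) means D >= floor(E(D)) + 1,
# and P(D >= r + 1) = (1 - p)^r.
r = floor(mean_d)
p_exceeds_mean = (1 - p) ** r

print(p, var_linear, mean_linear, p_exceeds_mean)
```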

Question

Answer

Mark

AO

Guidance

\multirow[t]{10}{*}{5}

\multirow{10}{*}{}

$\mathrm { H } _ { 0 } : m _ { Q } = m _ { R } , \mathrm { H } _ { 1 } : m _ { Q } \neq m _ { R }$, where $m _ { Q }$ and $m _ { R }$ are the medians of the rankings given to $Q$ and

B1

1.1

Allow $m$ undefined. If verbal, must mention medians, $m$ or distribution. Allow $m _ { d } = 0$ as opposed to $m _ { Q } = m _ { R }$

Not anything that might be $\mu$ unless symbol clearly defined as median. Not "there is no difference in the ranks ..."

Sum of ranks $= 1 / 2 \times 54 \times 55 = 1485$

M1

1.1

Find sum of ranks

$R _ { m } = 1485 - 726 = 759 \quad$ [or 561]

A1

1.1

Correct value of $R _ { m }$ seen

Allow even if 726 used later

$R _ { m } \sim \mathrm {~N} ( 660 , 3300 )$

M1

A1

3.1b

3.3

normal, mean their $\frac { 1 } { 2 } \times 24 \times 55$

Allow SD/Var muddle

$\mathrm { P } \left( R _ { m } \geq 759 \right) = 0.0432$ (3 s.f.) [or $z = 1.715$]

M1

A1

3.4

1.1

Both parameters correct Standardise, their $R _ { m }$

Correct test statistic (0.0432) 0.0424 or 0.0416 (no/wrong cc): M1A0

(Same for $\mathrm { P } \left( R _ { m } \leq 561 \right)$.) Allow $z \in [ 1.71,1.715 ]$, allow $z = 1.72$ only if cc demonstrated correct

Alternatively: $\mathrm { CV } = 660 + 1.96 \sqrt { 3300 } \ [ = 772.6 ]$

758.5 < 772.6

M1 A1

Not 759 - or 726 - ...; not wrong tail for comparison, but allow ± Needs correct cc

Or 561.5 > 547.4 Wrong $z$-value: M1A1ft B0

$p > 0.025,2 p > 0.05 , z < 1.96$, or 1.96 used in CV

B1

1.1

Explicit correct comparison

Needs like-with-like (e.g. $p$ must be < 0.5)

Do not reject $\mathrm { H } _ { 0 }$. Insufficient evidence of a difference between the ranks.

M1ft

A1ft [10]

1.1

2.2b

Correct first conclusion, needs correct method and like-with-like Contextualised, not too definite

ft on wrong ts, or 1-tail/2-tail confusions, e.g. $p$ compared with 0.05 or not explicit, or $z \geq 1.645$
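
The normal approximation used in this rank-sum test can be laid out step by step. The sketch below assumes Python 3.8 or later and uses the standard mean $\tfrac12 m(m+n+1)$ and variance $\tfrac1{12} mn(m+n+1)$ for the rank sum of the smaller sample; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

m, n = 24, 30        # smaller sample (Professor R), larger sample (Professor Q)
rank_sum_q = 726

total_ranks = (m + n) * (m + n + 1) // 2   # 1485
r_m = total_ranks - rank_sum_q             # rank sum for the smaller sample

mean_rm = m * (m + n + 1) / 2              # 660
var_rm = m * n * (m + n + 1) / 12          # 3300

# Continuity correction: P(R_m >= 759) is approximated by P(N > 758.5).
z = (r_m - 0.5 - mean_rm) / sqrt(var_rm)
p_one_tail = 1 - NormalDist().cdf(z)

# Two-tailed test at the 5% level.
reject_h0 = 2 * p_one_tail < 0.05

print(r_m, round(z, 3), round(p_one_tail, 4), reject_h0)
```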


OCR Further Statistics 2021 June Q1

1
The continuous random variable $X$ has the distribution $\mathrm { N } ( \mu , 30 )$. The mean of a random sample of 8 observations of $X$ is 53.1 . Determine a $95 \%$ confidence interval for $\mu$. You should give the end points of the interval correct to 4 significant figures.

OCR Further Statistics 2021 June Q2

2 A book collector compared the prices of some books, $\pounds x$, when new in 1972 and the prices of copies of the same books, $\pounds y$, on a second-hand website in 2018.
The results are shown in Table 1 and are summarised below the table.

Table 1

Book	A	B	C	D	E	F	G	H	I	J	K	L
$x$	0.95	0.65	0.70	0.90	0.55	1.40	1.50	0.50	1.15	0.35	0.20	0.35
$y$	6.06	7.00	2.00	5.87	4.00	5.36	7.19	2.50	3.00	8.29	1.37	2.00

$$n = 12 , \Sigma x = 9.20 , \Sigma y = 54.64 , \Sigma x ^ { 2 } = 8.9950 , \Sigma y ^ { 2 } = 310.4572 , \Sigma x y = 46.0545$$

It is given that the value of Pearson's product-moment correlation coefficient for the data is 0.381 , correct to 3 significant figures.
1. State what this information tells you about a scatter diagram illustrating the data.
2. Test at the $5 \%$ significance level whether there is evidence of positive correlation between prices in 1972 and prices in 2018.
The collector noticed that the second-hand copy of book J was unusually expensive and he decided to ignore the data for book J. Calculate the value of Pearson's product-moment correlation coefficient for the other 11 books.

OCR Further Statistics 2021 June Q3

3 The numbers of CD players sold in a shop on three consecutive weekends were 7,6 and 2 . It may be assumed that sales of CD players occur randomly and that nobody buys more than one CD player at a time. The number of CD players sold on a randomly chosen weekend is denoted by $X$.

How appropriate is the Poisson distribution as a model for $X$?

Now assume that a Poisson distribution with mean 5 is an appropriate model for $X$.
Find
1. $\mathrm { P } ( X = 6 )$,
2. $\mathrm { P } ( X \geqslant 8 )$.

The number of integrated sound systems sold in a weekend at the same shop can be assumed to have the distribution $\operatorname { Po } ( 7.2 )$.
Find the probability that on a randomly chosen weekend the total number of CD players and integrated sound systems sold is between 10 and 15 inclusive.
State an assumption needed for your answer to part (c) to be valid.
Give a reason why the assumption in part (d) may not be valid in practice.

OCR Further Statistics 2021 June Q4

38 marks

4 The continuous random variable $X$ has probability density function $$\mathrm { f } ( x ) = \begin{cases} \frac { k } { x ^ { n } } & x \geqslant 1 \\ 0 & \text { otherwise } \end{cases}$$ where $n$ and $k$ are constants and $n$ is an integer greater than 1.

Find $k$ in terms of $n$.
1. When $n = 4$, find the cumulative distribution function of $X$.
2. Hence determine $\mathrm { P } ( X > 7 \mid X > 5 )$ when $n = 4$.

Determine the values of $n$ for which $\operatorname { Var } ( X )$ is not defined.

\section*{Total Marks for Question Set 5: 38}

\section*{Mark scheme}

\section*{Marking Instructions}

a An element of professional judgement is required in the marking of any written paper. Remember that the mark scheme is designed to assist in marking incorrect solutions. Correct solutions leading to correct answers are awarded full marks but work must not always be judged on the answer alone, and answers that are given in the question, especially, must be validly obtained; key steps in the working must always be looked at and anything unfamiliar must be investigated thoroughly. Correct but unfamiliar or unexpected methods are often signalled by a correct result following an apparently incorrect method. Such work must be carefully assessed.
b The following types of marks are available.

\section*{M}
A suitable method has been selected and applied in a manner which shows that the method is essentially understood. Method marks are not usually lost for numerical errors, algebraic slips or errors in units. However, it is not usually sufficient for a candidate just to indicate an intention of using some method or just to quote a formula; the formula or idea must be applied to the specific problem in hand, e.g. by substituting the relevant quantities into the formula. In some cases the nature of the errors allowed for the award of an M mark may be specified.
A method mark may usually be implied by a correct answer unless the question includes the DR statement, the command words "Determine" or "Show that", or some other indication that the method must be given explicitly.

\section*{A}
Accuracy mark, awarded for a correct answer or intermediate step correctly obtained. Accuracy marks cannot be given unless the associated Method mark is earned (or implied). Therefore M0 A1 cannot ever be awarded.

\section*{B}
Mark for a correct result or statement independent of Method marks.

\section*{E}
A given result is to be established or a result has to be explained. This usually requires more working or explanation than the establishment of an unknown result.

Unless otherwise indicated, marks once gained cannot subsequently be lost, e.g. wrong working following a correct form of answer is ignored. Sometimes this is reinforced in the mark scheme by the abbreviation isw. However, this would not apply to a case where a candidate passes through the correct answer as part of a wrong argument.
c When a part of a question has two or more 'method' steps, the M marks are in principle independent unless the scheme specifically says otherwise; and similarly where there are several B marks allocated. (The notation 'dep*' is used to indicate that a particular mark is dependent on an earlier, asterisked, mark in the scheme.) Of course, in practice it may happen that when a candidate has once gone wrong in a part of a question, the work from there on is worthless so that no more marks can sensibly be given. On the other hand, when two or more steps are successfully run together by the candidate, the earlier marks are implied and full credit must be given.
d The abbreviation FT implies that the A or B mark indicated is allowed for work correctly following on from previously incorrect results. Otherwise, A and B marks are given for correct work only - differences in notation are of course permitted. A (accuracy) marks are not given for answers obtained from incorrect working. When A or B marks are awarded for work at an intermediate stage of a solution, there may be various alternatives that are equally acceptable. In such cases, what is acceptable will be detailed in the mark scheme. Sometimes the answer to one part of a question is used in a later part of the same question. In this case, A marks will often be 'follow through'.
e We are usually quite flexible about the accuracy to which the final answer is expressed; over-specification is usually only penalised where the scheme explicitly says so.

When a value is given in the paper only accept an answer correct to at least as many significant figures as the given value.
When a value is not given in the paper accept any answer that agrees with the correct value to $\mathbf { 3 ~ s } . \mathbf { f }$. unless a different level of accuracy has been asked for in the question, or the mark scheme specifies an acceptable range.

Follow through should be used so that only one mark in any question is lost for each distinct accuracy error.
Candidates using a value of $9.80,9.81$ or 10 for $g$ should usually be penalised for any final accuracy marks which do not agree to the value found with 9.8 which is given in the rubric.
f Rules for replaced work and multiple attempts:

If one attempt is clearly indicated as the one to mark, or only one is left uncrossed out, then mark that attempt and ignore the others.
If more than one attempt is left not crossed out, then mark the last attempt unless it only repeats part of the first attempt or is substantially less complete.
If a candidate crosses out all of their attempts, the assessor should attempt to mark the crossed out answer(s) as above and award marks appropriately.

g For a genuine misreading (of numbers or symbols) which is such that the object and the difficulty of the question remain unaltered, mark according to the scheme but following through from the candidate's data. A penalty is then applied; 1 mark is generally appropriate, though this may differ for some units. This is achieved by withholding one A or B mark in the question. Marks designated as cao may be awarded as long as there are no other errors.
If a candidate corrects the misread in a later part, do not continue to follow through. Note that a miscopy of the candidate's own working is not a misread but an accuracy error.
h If a calculator is used, some answers may be obtained with little or no working visible. Allow full marks for correct answers, provided that there is nothing in the wording of the question specifying that analytical methods are required such as the bold "In this question you must show detailed reasoning", or the command words "Show" or "Determine". Where an answer is wrong but there is some evidence of method, allow appropriate method marks. Wrong answers with no supporting method score zero. \begin{table}[h]

\captionsetup{labelformat=empty} \caption{Abbreviations}

Abbreviations used in the mark scheme	Meaning
dep*	Mark dependent on a previous mark, indicated by *. The * may be omitted if only one previous M mark
cao	Correct answer only
oe	Or equivalent
rot	Rounded or truncated
soi	Seen or implied
www	Without wrong working
AG	Answer given
awrt	Anything which rounds to
BC	By Calculator
DR	This question included the instruction: In this question you must show detailed reasoning.

\end{table}

Answer

Mark

AO

Guidance

$53.1 \pm 1.96 \sqrt { \frac { 30 } { 8 } }$

(49.30, 56.90)

M1

3.3

Square root correct Awrt 1.96 used, can be implied

Allow e.g. (49.30, 56.9)

A1 [4]

3.4

Both, only these numbers (4 sf needed at least once)
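The interval above can be checked numerically. The following is a minimal sketch, assuming a known population variance of 30 and a sample of 8 as stated in the question; the function name `normal_mean_ci` is illustrative and not part of the mark scheme.

```python
from math import sqrt
from statistics import NormalDist


def normal_mean_ci(sample_mean, variance, n, level=0.95):
    """Two-sided confidence interval for a normal mean with known variance."""
    # Critical z value for the chosen level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(variance / n)
    return sample_mean - half_width, sample_mean + half_width


lower, upper = normal_mean_ci(53.1, 30, 8)
print(f"({lower:.2f}, {upper:.2f})")
```

Using the exact critical value rather than the tabulated 1.96 does not change the end points at 4 significant figures.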

2

(a)

(i)

The points do not lie very close to a straight line

B1 [1]

1.1

Or equivalent. Must refer to diagram, not just to "correlation"

Ignore extras unless wrong

(ii)

$\mathrm { H } _ { 0 } : \rho = 0 , \mathrm { H } _ { 1 } : \rho > 0$, where $\rho$ is the population pmcc between prices in 1972 and prices in 2018

B2

1.1 2.5

One error, e.g. $\rho$ not defined, B1 (but allow "population" not stated)

$\mathrm { H } _ { 0 } : r = 0 , \mathrm { H } _ { 1 } : r > 0$ : same scheme, but B2 needs "population" pmcc

Compare with 0.497(3)

$\mathrm { H } _ { 0 }$ : no correlation, $\mathrm { H } _ { 1 }$ : positive correlation: B1

1.1

2.2b

FT on CV 0.5760 only

$0.381 < 0.4973$

Do not reject $\mathrm { H } _ { 0 }$.

There is insufficient evidence of (positive) correlation between prices in the two years.

Exx:

$\alpha$ : Insufficient evidence to reject $\mathrm { H } _ { 0 }$. No correlation between ...

$\beta$ : Wrong first conclusion, correct interpretation:

$\gamma$ : Hypotheses wrong way round:

M1

M1ft

A1ft [5]

Correct first conclusion, needs like-with-like In context, not too definite

M1A1 (bod)

M0A0

Maximum M1M1

2

(b)

0.650

B2

[2]

3.1a

1.1

Full marks for correct answer by any method

SC: if B0 allow B1 for any 3 of 8.85, 46.35, 8.8725, 241.7331, 43.153
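Both correlation coefficients can be reproduced from the raw data in Table 1. This is a sketch only: the helper `pmcc` is a hypothetical name, and it uses the usual $S_{xy} / \sqrt{S_{xx} S_{yy}}$ form rather than any particular calculator routine.

```python
from math import sqrt

# Data from Table 1 (prices in pounds, books A to L).
books = "ABCDEFGHIJKL"
x = [0.95, 0.65, 0.70, 0.90, 0.55, 1.40, 1.50, 0.50, 1.15, 0.35, 0.20, 0.35]
y = [6.06, 7.00, 2.00, 5.87, 4.00, 5.36, 7.19, 2.50, 3.00, 8.29, 1.37, 2.00]


def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient from raw data."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs) - sx * sx / n
    syy = sum(v * v for v in ys) - sy * sy / n
    sxy = sum(a * b for a, b in zip(xs, ys)) - sx * sy / n
    return sxy / sqrt(sxx * syy)


r_all = pmcc(x, y)  # all 12 books, about 0.381

# Drop book J and recompute for the remaining 11 books.
keep = [i for i, book in enumerate(books) if book != "J"]
r_without_j = pmcc([x[i] for i in keep], [y[i] for i in keep])  # about 0.650

print(round(r_all, 3), round(r_without_j, 3))
```

Removing book J subtracts its contribution from each summary total, which is where the values 8.85, 46.35, 8.8725, 241.7331 and 43.153 in the special case come from.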

3

(a)

B1

B1 [2]

3.5b

"Events occur independently and at constant average rate": B0

Any reason for independence (or not)

... and for constant average rate (or not), in each case without misunderstanding of what they mean

SC: Mere assertion of both, properly contextualised: B1

SC: Variance $= 4.67$ which is closer to 5: B1

SC: Considers only assumptions given in the question: B0

3

(b)

(i)

0.146(223) BC

M1

A1

[2]

3.4

1.1

Correct method stated or implied

Correct answer only, awrt 0.146

3

(ii)

0.133(372) BC

M1

A1

[2]

1.1

0.068: M1A0

(0.1337 give M1A1 BOD)

3

(c)

Po(12.2) $\mathrm { P } ( \leq 15 ) - \mathrm { P } ( \leq 9 ) \quad [ = 0.8296 - 0.2253 ]$

$= 0.604 ( 224 ) \quad \mathbf { B C }$

M1

A1 [3]

3.3

1.1

3.4

Stated or implied

Allow $\mathrm { P } ( \leq 16 )$ or $\mathrm { P } ( \leq 10 )$, e.g. 0.503 or 0.662

(M1M1A0)

Correct answer only, awrt 0.604

Allow this M1 also from $\lambda = 7.2 ( 0.187,0.110,0.189 )$
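The three Poisson probabilities for Question 3 parts (b) and (c) can be checked without a calculator's distribution menu. A minimal sketch, assuming independence of the two sales counts so that the total is Po(5 + 7.2); the function names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


p_b_i = poisson_pmf(6, 5)           # P(X = 6), about 0.146
p_b_ii = 1 - poisson_cdf(7, 5)      # P(X >= 8), about 0.133
# Total sales ~ Po(12.2); "between 10 and 15 inclusive" is P(<=15) - P(<=9).
p_c = poisson_cdf(15, 12.2) - poisson_cdf(9, 12.2)  # about 0.604

print(round(p_b_i, 3), round(p_b_ii, 3), round(p_c, 3))
```

The off-by-one errors penalised in the guidance correspond to using `poisson_cdf(8, 5)` in part (b)(ii) or the wrong end points in part (c).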

3

(d)

Sales of CD players and integrated systems need to be independent

B1

[1]

1.1

Need "independent" or "not related" clearly referred to the two types of machine.

Not just "purchases independent" or "distributions independent"

3

(e)

B1 [1]

3.5b

Any reason for nonindependence of sales of CD players and integrated sound systems

Can get B0B1 provided they are focussing on independence

If a customer buys a CD player they probably won't (or will) buy an integrated system as well

Exx:

α: May buy both so not independent: B0

$\beta$ : Often bought together: B1

$\gamma$ : $\quad$ Context misunderstood: can get B1

e.g. CDs/CD players, or assuming that integrated systems don't include CD players

4

(a)

\(\begin{aligned} \int _ { 1 } ^ { \infty } k x ^ { - n } \mathrm {~d} x & = \left[ \frac { k } { ( 1 - n ) x ^ { n - 1 } } \right] _ { 1 } ^ { \infty } \\ & = \frac { k } { n - 1 } = 1 \text { so } k = n - 1 \end{aligned}\)

M1

B1

A1

[3]

1.1

Integral attempted, correct limits

Correct indefinite integral Correctly obtain $k = n - 1$, www

Don't need full details of $\lim ( a \rightarrow \infty )$

4

(b)

(i)

\(\begin{aligned} & \int 3 x ^ { - 4 } \mathrm {~d} x = - \frac { 1 } { x ^ { 3 } } + c \\ & x = 1 , \mathrm {~F} ( x ) = 0 \text { so } c = 1 . \text { Hence } 1 - x ^ { - 3 } \\ & \mathrm {~F} ( x ) = \begin{cases} 0 & x < 1 \\ 1 - \frac { 1 } { x ^ { 3 } } & x \geq 1 \end{cases} \end{aligned}\)

M1

A1

B1

[3]

1.1

Needs $+ c$ or definite integral between 1 and $x$, oe

Fully correct active part of CDF

" 0 for $x < 1$ " stated and no wrong ranges (doesn't need M1 or A1)

Allow $\leq$ for $<$, and/or $>$ for $\geq$

Wrong $k$ : can get M1A0B1

Ignore ranges here

Or "0 otherwise" if " $x \geq 1$ " stated in active part

4

(ii)

\(\begin{aligned} \frac { \mathrm { P } [ ( X > 7 ) \cap ( X > 5 ) ] } { \mathrm { P } ( X > 5 ) } & = \frac { \mathrm { P } ( X > 7 ) } { \mathrm { P } ( X > 5 ) } \\ & = \frac { 1 - \mathrm { F } ( 7 ) } { 1 - \mathrm { F } ( 5 ) } \\ & = \frac { 125 } { 343 } \text { or } 0.364 ( 431 \ldots ) \end{aligned}\)

M1* A1

*dep M1

A1ft [4]

3.1a 3.1a

3.3

1.1

Use conditional probability method. $\mathrm { P } [ ( X > 7 ) \cap ( X > 5 ) ] = \mathrm { P } ( X > 7 )$

Convert probabilities into $\mathrm { F } ( X )$, not using $\mathrm { P } ( X > 7 ) \times \mathrm { P } ( X > 5 )$

Any exact fraction or awrt 0.364 , ft on $1 - a / x ^ { 3 } , a \neq 0,1$

$\frac { [ 1 - \mathrm { F } ( 7 ) ] [ 1 - \mathrm { F } ( 5 ) ] } { 1 - \mathrm { F } ( 5 ) }$ : can get M1A0M0A0

Allow from $\mathrm { F } ( x ) = 1 - a / x ^ { 3 }$, otherwise www
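The conditional probability in part (b)(ii) can be confirmed with exact arithmetic. The sketch below assumes $k = n - 1$ from part (a), so that $\mathrm{F}(x) = 1 - x^{-(n-1)}$ for $x \geq 1$; the function name `cdf` is illustrative and it expects integer arguments.

```python
from fractions import Fraction


def cdf(x, n=4):
    """F(x) = 1 - x^-(n-1) for x >= 1 and 0 otherwise (integer x only)."""
    if x < 1:
        return Fraction(0)
    return 1 - Fraction(1, x ** (n - 1))


# P(X > 7 | X > 5) = P(X > 7) / P(X > 5), since {X > 7} is inside {X > 5}.
answer = (1 - cdf(7)) / (1 - cdf(5))
print(answer, float(answer))
```

Because the survival function is $x^{-3}$, the ratio reduces to $(5/7)^3$, which is the exact fraction quoted in the scheme.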

4

(c)

$\mathrm { E } \left( X ^ { 2 } \right) = \int _ { 1 } ^ { \infty } k x ^ { 2 - n } \mathrm {~d} x = \left[ \frac { k x ^ { 3 - n } } { ( 3 - n ) } \right] _ { 1 } ^ { \infty } ( n \neq 3 )$

If $n = 3 , \mathrm { E } \left( X ^ { 2 } \right) = \lim _ { x \rightarrow \infty } [ 2 \ln ( x ) ]$, not defined

M1* B1

2.1 1.1

Correct limits needed somewhere

Correct indefinite integral or $\frac { n - 1 } { n - 3 }$

SC: $\mathrm { E } \left( X ^ { 2 } \right) = \frac { n - 1 } { n - 3 }$, M1B1 $\mathrm { E } ( X ) = \frac { n - 1 } { n - 2 } \Rightarrow n \neq 2$ or 3 : (not valid, must consider ln if $n = 2$ or 3 ): B0

No marks just for this unless last 3 marks all zero, then if this (or for $n = 2$ ) is shown, award SC B1 Make deduction based on convergence, ft

Infinite integral does not converge if $3 - n \geq 0$

*dep M1

2.2a

No limits used: M0B1M0B0

If $n \geq 4$ then $\mathrm { E } ( X ) = \left[ \frac { k x ^ { 2 - n } } { ( 2 - n ) } \right] _ { 1 } ^ { \infty }$ converges

B1

2.3

Consider convergence of $\mathrm { E } ( X )$

SC: $\operatorname { Var } ( X ) < 0$ when $n < 3$ : M1B1M1 (B0) A0

Therefore $\operatorname { Var } ( X )$ is not defined if and only if $n = 2$ or 3 .

A1 [5]

2.2a

Shown not defined for $n = 2$ or 3 and only for those

But no need to state "if and only if"
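As a supplementary worked calculation (not part of the mark scheme), the moments can be written out explicitly for integer $n \geq 4$, using $k = n - 1$ from part (a). Both integrals converge in that case:

$$\mathrm{E}(X) = \int_{1}^{\infty} (n-1)\, x^{1-n} \,\mathrm{d}x = \frac{n-1}{n-2}, \qquad \mathrm{E}\left(X^{2}\right) = \int_{1}^{\infty} (n-1)\, x^{2-n} \,\mathrm{d}x = \frac{n-1}{n-3},$$

$$\operatorname{Var}(X) = \frac{n-1}{n-3} - \left(\frac{n-1}{n-2}\right)^{2} = \frac{n-1}{(n-2)^{2}(n-3)}.$$

For example, $n = 4$ gives $\operatorname{Var}(X) = 3 - \frac{9}{4} = \frac{3}{4}$. When $n = 3$ the integrand of $\mathrm{E}(X^{2})$ is $2/x$, and when $n = 2$ the integrand of $\mathrm{E}(X)$ is $1/x$; each integrates to a logarithm that diverges, which is why the closed forms above cannot simply be substituted for those two values.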

OCR Further Statistics 2021 June Q1

8 marks

1 Jo can use either of two different routes, A or B, for her journey to school. She believes that route A has shorter journey times. She measures how long her journey takes for 17 journeys by route A and 12 journeys by route B. She ranks the 29 journeys in increasing order of time taken, and she finds that the sum of the ranks of the journeys by route B is 219 .

Test at the $10 \%$ significance level whether route A has shorter journey times than route B . [8]
State an assumption about the 29 journeys which is necessary for the conclusion of the test to be valid.

OCR Further Statistics 2021 June Q2

27 marks

2 The random variable $X$ is equally likely to take any of the $n$ integer values from $m + 1$ to $m + n$ inclusive. It is given that $\mathrm { E } ( 3 X ) = 30$ and $\operatorname { Var } ( 3 X ) = 36$. Determine the value of $m$ and the value of $n$.

3 26 cards are each labelled with a different letter of the alphabet, A to Z. The letters A, E, I, O and U are vowels.

Five cards are selected at random without replacement. Determine the probability that the letters on at least three of the cards are vowels.

All 26 cards are arranged in a line, in random order.

Show that the probability that all the vowels are next to one another is $\frac { 1 } { 2990 }$.

Determine the probability that three of the vowels are next to each other, and the other two vowels are next to each other, but the five vowels are not all next to each other.

4 A biased spinner has five sides, numbered 1 to 5. Elmer spins the spinner repeatedly and counts the number of spins, $X$, up to and including the first time that the number 2 appears. He carries out this experiment 100 times and records the frequency $f$ with which each value of $X$ is obtained. His results are shown in Table 1, together with the values of $x f$.

Question

Answer

Mark

AO

Guidance

1

(a)

$\mathrm { H } _ { 0 } : m _ { A } = m _ { B } , \mathrm { H } _ { 1 } : m _ { A } < m _ { B }$ where $m _ { A }$ and $m _ { B }$ are the population median journey times by routes A and B

B1

1.1

OR: Median journey times equal, oe. Allow if $m$ s used but not defined

Allow "mean" or "average" only if "population" stated

M1

1.1

Find either $\mathrm { P } ( \geq 219 )$ (218.5) or $\mathrm { P } ( \leq 141 )$ (141.5)

Use of 0.9559 is M0 here. For CV method see below

Consider correct tail, either 219 or 141 ( $R _ { m } = 219 , m ( m + n + 1 ) - R _ { m } = 141$ ) $p = \Phi \left( \frac { 141.5 - 180 } { \sqrt { 510 } } \right) = 0.0441 \ldots$

BC

M1

1.1

0.0421, 0.0401, 0.470 (no/wrong cc, $\sqrt { }$ ): M1

$0.9559 > 0.9 :$ A1A1 (M1A1) $0.9559 > 0.1 :$ A1A0 M0A0

0.0441 < 0.1

A1ft

1.1

Explicit comparison. FT on

Alternative:

CV $180 - z \times \sqrt { 510 }$, 141 (141.5) used, $z = 1.282$ (CV $= 151.05$, $151.058 \ldots$); $141.5 < 151.05 ( 85 )$ or $218.5 > 208.95$

M1 M1 A1 A1

Allow $\sqrt { }$ errors Stated or implied CV and cc correct e.g. 141 < 150.55

$180 + 1.282 \sqrt { 510 }$ etc is M0 unless 219 (218.5) used, in which case give M2(A1A1) E.g. $219 > 209.45$

Reject $\mathrm { H } _ { 0 }$. Significant evidence that route B takes longer

M1ft A1ft [8]

1.1 2.2b

Correct first conclusion Contextualised, not too definite

Needs like-with-like, e.g. 0.9559 with 0.9

SC Sum of A's ranks $= 435 - 219 = 216$ used: B1B0 M0M1A0A1 M1A1 max 5/8

Exx:

$\alpha$ : $\quad \mathrm { H } _ { 0 }$ : Journey times are the same, $\mathrm { H } _ { 1 }$ : journey times for $B$ are higher:

$\beta$ : $\quad \mathrm { H } _ { 0 }$ : No evidence that median journey times are different, etc:

B0

1

(b)

Must be a random sample (of all journeys) Or distributions must be same shape (necessary assumption for Wilcoxon ranksum test!)

B1 [1]

3.5b

Or equivalent. Allow "(journeys) independent"

Not "representative".
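The normal approximation used in part (a) can be reproduced as follows. This is a sketch under the scheme's own setup: $m = 12$ is the smaller sample (route B), $n = 17$, the rank sum is 219, and a continuity correction of 0.5 is applied to the lower-tail value 141. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

m, n = 12, 17            # route B (smaller sample) and route A
r_m = 219                # sum of ranks for route B

mean = m * (m + n + 1) / 2           # 180
variance = m * n * (m + n + 1) / 12  # 510

# Reflect the rank sum into the lower tail: m(m + n + 1) - R_m = 141.
lower_tail_rank = m * (m + n + 1) - r_m

# Continuity-corrected z value and one-tailed p-value.
z = (lower_tail_rank + 0.5 - mean) / sqrt(variance)
p_value = NormalDist().cdf(z)

reject_h0 = p_value < 0.10
print(round(z, 3), round(p_value, 4), reject_h0)
```

The p-value is about 0.0441, below the 10% level, which matches the scheme's conclusion to reject $\mathrm{H}_0$.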

2

\(\begin{aligned} & 3 \mathrm { E } ( X ) = 30 \text { or } \mathrm { E } ( X ) = 10 \\ & 9 \times \operatorname { Var } ( X ) = 36 \text { or } \operatorname { Var } ( X ) = 4 \end{aligned}\)

B1 B1

2.2a 2.2a

Used, stated or implied One of these, used, stated or implied

Question			Answer	Mark	AO	Guidance
			$\frac { 1 } { 12 } \left( n ^ { 2 } - 1 \right) = 4$	M1	2.2a	$n = 7$ only, no need for "reject -7"	Allow if $\mathrm { E } ( 3 X + m )$ used rather than $\mathrm { E } [ 3 ( X + m ) ]$
			$\mathrm { E } ( X - m ) = \frac { 1 } { 2 } ( n + 1 )$	M1	3.1b	Use expectation of uniform, e.g. $2 m + n + 1 = 20$.
			Alternative: $\operatorname { Var } ( Y + m ) = \frac { 1 } { 12 } \left( n ^ { 2 } - 1 \right)$	M1 A1 M1		$n = 7$ only Use expectation of uniform, e.g. $2 m + n + 1 = 20$.	No need for "reject -7"
			$10 - m = 4$	M1	2.1	Validly derive single equation for $m$
			$m = 6$	A1 [7]	2.2a	$m = 6$ only	NB: $\operatorname { Var } = ( n - 1 ) ^ { 2 } / 12$ is from continuous uniform!
3	(a)		${ } ^ { 5 } C _ { 3 } \times { } ^ { 21 } C _ { 2 } + { } ^ { 5 } C _ { 4 } \times { } ^ { 21 } C _ { 1 } + 1 \quad [ = 2100 + 105 + 1 ]$	M1dep	3.1b	Any correct pair of ${ } ^ { n } C _ { r }$ s multiplied	Or $1 - \mathrm { P } ( 0,1,2 ) = 1 - .9665$
				A1	1.1	All terms correct
			$\div { } ^ { 26 } C _ { 5 } [ = 65780 ]$	*M1	1.1
			$\frac { 1103 } { 32890 }$ or $0.0335 \ldots$	A1 [4]	3.2a	Awrt 0.0335 or any exact fraction	e.g. $\frac { 2206 } { 65780 }$ or $\frac { 264720 } { 7893600 }$
			Alternative: $\frac { 5 } { 26 } \times \frac { 4 } { 25 } \times \frac { 3 } { 24 } \times \frac { 2 } { 23 } \times \frac { 1 } { 22 }$	B1		Must have 5 oe, e.g. ${ } ^ { 5 } C _ { 1 }$
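For Question 2, the values $m = 6$ and $n = 7$ can be confirmed by direct enumeration rather than by quoting the discrete uniform formulae. This is a sketch only: the search ranges are arbitrary assumptions chosen to be comfortably wide, and `moments` is a hypothetical helper.

```python
from fractions import Fraction


def moments(m, n):
    """Exact mean and variance of X uniform on the integers m+1, ..., m+n."""
    values = range(m + 1, m + n + 1)
    mean = Fraction(sum(values), n)
    variance = Fraction(sum(v * v for v in values), n) - mean ** 2
    return mean, variance


# E(3X) = 30 and Var(3X) = 36 are equivalent to E(X) = 10 and Var(X) = 4.
target = (Fraction(10), Fraction(4))
solutions = [
    (m, n)
    for n in range(1, 50)       # assumed search range for n
    for m in range(-50, 50)     # assumed search range for m
    if moments(m, n) == target
]
print(solutions)
```

With $m = 6$ and $n = 7$ the values are 7 to 13, whose mean is 10 and whose variance is $(7^2 - 1)/12 = 4$.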

3

(b)

(i)

$\frac { 22 ! \times 5 ! } { 26 ! } \left( = \frac { 1 \times 2 \times 3 \times 4 \times 5 } { 23 \times 24 \times 25 \times 26 } = \frac { 120 } { 358800 } \right)$

M1 A1

1.1 2.1

Oe. Allow M1 for 21! instead of 22! Fully correct

$\frac { 1 \times 2 \times 3 \times 4 \times 5 } { 22 \times 23 \times 24 \times 25 \times 26 } :$ M1

$= \frac { 1 } { 2990 } \quad$ AG

A1 [3]

2.2a

Correctly obtain AG using exact method

Allow even if no working after $22 ! \times 5 ! \div 26 !$

3

(b)

(ii)

22 fences: 22 for [VVV] $\times 21$ for [VV]

M1

3.1b

Correct strategy, allow ${ } ^ { 22 } C _ { 2 }$ for ${ } ^ { 22 } P _ { 2 }$

Consonants arranged in 21! ways

M1

1.1

At least one of these, no subtraction

$21 ! \times 3 ! \times 2 ! \div 26 !$ M0M1

$21 ! \times 3 ! \times 2 ! \times 22 \times 21$ : M2A0

Vowels arranged in 5! ways ( $= { } ^ { 5 } P _ { 3 } \times { } ^ { 2 } P _ { 2 }$ )

A1

2.1

Both correct

NB: ${ } ^ { 5 } C _ { 3 } \times 3 ! \times 2 ! = 5 !$

\(\begin{aligned} \text { Product } \div 26 ! & = \frac { 21 } { 2990 } \\ & \left( = 2.832 \times 10 ^ { 24 } \div 4.0329 \times 10 ^ { 26 } \right) \end{aligned}\)

A1

[4]

3.2a

Allow from calculator but must be exact fraction

Alternative:

Treat 21 consonants, [VVV] and [VV] as 23

M1

3.1b

Correct strategy, allow $23 ! \times 2 ! \times 3 !$

(Must subtract $2 \times 1 / 2990$ as 23! method counts

A1

2.1

Correct $\left( 5 ! = { } ^ { 5 } P _ { 3 } \times { } ^ { 2 } P _ { 2 } = \right. \left. { } ^ { 5 } C _ { 3 } \times 2 ! \times 3 ! \right)$

M1 also for subtracting $1 \times$

[VVVVV] twice, once as [VVV][VV] and once as [VV][VVV])

(11/1495 is M1A1M1A0)

Answer is $\frac { 21 } { 2990 }$

A1

1.1

1/2990

Final answer, exact fraction
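The counting arguments for Question 3 can be checked with exact fractions. The sketch below follows the scheme's own methods; the variable names are illustrative, and the alternative for part (b)(ii) treats the 21 consonants, [VVV] and [VV] as 23 objects before subtracting the double-counted [VVVVV] cases.

```python
from fractions import Fraction
from math import comb, factorial

# (a) At least three vowels among five cards drawn from 26 (5 vowels, 21 consonants).
favourable = sum(comb(5, v) * comb(21, 5 - v) for v in (3, 4, 5))  # 2100 + 105 + 1
p_at_least_three = Fraction(favourable, comb(26, 5))

# (b)(i) All five vowels adjacent: 22 objects, vowels ordered in 5! ways.
p_all_together = Fraction(factorial(22) * factorial(5), factorial(26))

# (b)(ii) "Fences" method: 21! consonant orders, 22 gaps for [VVV],
# 21 remaining gaps for [VV], and 5! orders for the vowels.
p_three_and_two = Fraction(factorial(21) * 22 * 21 * factorial(5), factorial(26))

# (b)(ii) Alternative: 23 objects, then remove [VVVVV], which is counted twice.
p_alternative = (
    Fraction(factorial(23) * factorial(5), factorial(26)) - 2 * p_all_together
)

print(p_at_least_three, p_all_together, p_three_and_two, p_alternative)
```

Subtracting $1/2990$ only once in the alternative gives $22/2990 = 11/1495$, the partial-credit value noted in the guidance.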

OCR Further Statistics 2021 June Q4

7 marks

4

(a)

Geometric

M1

1.1

Stated explicitly

Mean $= 400 \div 100 ( = 4 )$ and $p = 1 /$ mean

M1

2.4

Use mean (or P(1) etc) to deduce $p$ ("Determine", so justification needed for 0.25)

Needs to deduce $p$ in part (a), not defer it to (b)

4

(b)

Probability is $0.75 ^ { 6 } ( = 0.1779785 \ldots )$

M1

3.3

SC Geo(0.2): $0.8 ^ { 6 }$ M1A0

Or: 0.177978 or 0.177979 or better seen, or $1 - [ \mathrm { P } ( 1 ) + \ldots + \mathrm { P } ( 6 ) ]$ with evidence, e.g. formula

M1

Allow ± 1 term

Expected frequency $=$ probability $\times 100 = 17.798$

A1 [2]

2.1

17.798 correctly obtained, with sufficient evidence, www

$100 - \Sigma$ (other frequencies): SC B1
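The expected frequency in part (b) follows directly from the fitted geometric model. A minimal sketch, assuming $\Sigma x f = 400$ over 100 experiments as used in part (a) and that the final cell covers $X \geq 7$; variable names are illustrative.

```python
total_spins = 400     # sum of x * f from the table, as quoted in part (a)
experiments = 100

mean = total_spins / experiments   # 4
p = 1 / mean                       # geometric parameter, 0.25

# P(X >= 7) means the first six spins all fail to show a 2.
prob_at_least_seven = (1 - p) ** 6
expected_frequency = experiments * prob_at_least_seven

print(round(prob_at_least_seven, 6), round(expected_frequency, 3))
```

The same value results from $1 - [\mathrm{P}(1) + \ldots + \mathrm{P}(6)]$, which is the alternative route credited in the scheme.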

4

(c)

$\mathrm { H } _ { 0 }$: data consistent with (geometric)

B1

1.1

Both, allow equivalents, but not "evidence that ...". 9.005 or 9.01

Compare their $\Sigma X ^ { 2 }$ with 11.07

E.g. $\mathrm { H } _ { 0 } : X \sim \operatorname { Geo } ( p )$ Allow Geo(0.25)

$\Sigma X ^ { 2 } = 9.005$

B1

1.1

$9.005 < 11.07 ( v = 5 )$

B1

1.1

Do not reject $\mathrm { H } _ { 0 }$.

M1ft

1.1

Correct first conclusion, ft on their 9.005 or on 12.59, needs like-with-like

Allow from comparison with 12.59 but nothing else

Insufficient evidence that a geometric distribution is not a good fit.

A1ft [5]

2.2b

Contextualised, not too definite (needs double negative) Don't penalise "Geo(0.25)"

Allow addition slip in $\Sigma X ^ { 2 }$ SC Geo(0.2): can get full marks if given data used, $\Sigma X ^ { 2 } = 4.54$ used gets B1B1B0M1A1


Country	\(x\)	\(y\)	Country	\(x\)	\(y\)
Benin	4.8	59.2	Mozambique	5.4	54.63
Cameroon	4.7	54.87	Nigeria	5.7	52.29
Congo	4.9	61.42	Senegal	5.1	65.81
Gambia	5.7	59.83	Somalia	6.5	54.88
Liberia	4.7	60.25	Sudan	4.4	63.08
Malawi	5.1	60.97	Uganda	5.8	57.25
Mauretania	4.6	62.77	Zambia	5.4	58.75