2.05a Hypothesis testing language: null, alternative, p-value, significance - OCR Spec

OCR MEI S4 2013 June Q3

24 marks Standard +0.3

(i) Explain the meaning of the following terms in the context of hypothesis testing: Type I error, Type II error, operating characteristic, power.
(ii) A test is to be carried out concerning a parameter $\theta$. The null hypothesis is that $\theta$ has the particular value $\theta _ { 0 }$. The alternative hypothesis is $\theta \neq \theta _ { 0 }$. Draw a sketch of the operating characteristic for a perfect test that never makes an error.
(iii) The random variable $X$ is distributed as $\mathrm { N } ( \mu , 9 )$. A random sample of size 25 is available. The null hypothesis $\mu = 0$ is to be tested against the alternative hypothesis $\mu \neq 0$. The null hypothesis will be accepted if $- 1 < \bar { x } < 1$ where $\bar { x }$ is the value of the sample mean, otherwise it will be rejected. Calculate the probability of a Type I error. Calculate the probability of a Type II error if in fact $\mu = 0.5$; comment on the value of this probability.
(iv) Without carrying out any further calculations, draw a sketch of the operating characteristic for the test in part (iii).
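
As a rough numerical check on the two error probabilities asked for above, the following sketch uses Python's standard-library `statistics.NormalDist`. It assumes the sample mean is distributed as $\mathrm{N}(\mu, 9/25)$, so that its standard error is 0.6; the names `accept_prob`, `type_1` and `type_2` are illustrative only and do not come from the question.

```python
from statistics import NormalDist

sigma, n = 3.0, 25             # X ~ N(mu, 9), random sample of size 25
se = sigma / n ** 0.5          # standard error of the sample mean (0.6)
lower, upper = -1.0, 1.0       # H0 is accepted when lower < xbar < upper


def accept_prob(mu):
    """P(lower < Xbar < upper) when the true mean is mu."""
    dist = NormalDist(mu, se)
    return dist.cdf(upper) - dist.cdf(lower)


type_1 = 1 - accept_prob(0.0)  # H0 rejected although mu = 0
type_2 = accept_prob(0.5)      # H0 accepted although mu = 0.5

print(f"P(Type I error)  = {type_1:.4f}")
print(f"P(Type II error) = {type_2:.4f}")
```

Evaluating `accept_prob` over a grid of values of $\mu$ traces out the operating characteristic of this test, which may help when checking a hand-drawn sketch.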

OCR MEI S4 2013 June Q4

24 marks Moderate -0.3

Explain the advantages of randomisation and replication in a statistically designed experiment.
The usual statistical model underlying the one-way analysis of variance is given, in the usual notation, by $$x _ { i j } = \mu + \alpha _ { i } + e _ { i j }$$ where $x _ { i j }$ denotes the $j$th observation on the $i$th treatment. Define carefully all the terms in this model and state the properties of the term that represents experimental error.
A trial of five fertilisers is carried out at an agricultural research station according to a completely randomised design in which each fertiliser is applied to four experimental plots of a crop (so that there are 20 experimental units altogether). The sums of squares in a one-way analysis of variance of the resulting data on yields of the crop are as follows.

Source of variation	Sum of squares
Between fertilisers	219.2
Residual	304.5
Total	523.7

State the customary null and alternative hypotheses that are tested. Provide the degrees of freedom for each sum of squares. Hence copy and complete the analysis of variance table and carry out the test at the 5\% level.
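
The arithmetic behind the completed analysis of variance table can be sketched as follows. This is only an illustration under the design stated in the question (five fertilisers, four plots each); the variable names are hypothetical, and the critical value must still be read from statistical tables.

```python
k, n_per = 5, 4                       # five fertilisers, four plots each
ss_between, ss_residual = 219.2, 304.5

df_between = k - 1                    # degrees of freedom between fertilisers
df_residual = k * n_per - k           # residual degrees of freedom
df_total = k * n_per - 1              # total degrees of freedom

ms_between = ss_between / df_between      # mean square between fertilisers
ms_residual = ss_residual / df_residual   # residual mean square
f_ratio = ms_between / ms_residual        # test statistic

print(f"Between fertilisers: df = {df_between}, MS = {ms_between:.2f}")
print(f"Residual:            df = {df_residual}, MS = {ms_residual:.2f}")
print(f"Total:               df = {df_total}")
print(f"F ratio = {f_ratio:.3f}")
# Compare f_ratio with the upper 5% point of F(4, 15) from tables
# (roughly 3.06 in standard tables).
```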

CAIE FP2 2014 June Q8

9 marks Standard +0.3

8 Weekly expenses claimed by employees at two different branches, $A$ and $B$, of a large company are being compared. Expenses claimed by an employee at branch $A$ and by an employee at branch $B$ are denoted by $\$ x$ and $\$ y$ respectively. A random sample of 60 employees from branch $A$ and a random sample of 50 employees from branch $B$ give the following summarised data. $$\Sigma x = 6060 \quad \Sigma x ^ { 2 } = 626220 \quad \Sigma y = 4750 \quad \Sigma y ^ { 2 } = 464500$$ Using a $2 \%$ significance level, test whether, on average, employees from branch $A$ claim the same as employees from branch $B$.
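
One possible way to organise the calculation is sketched below. It assumes that, with samples of this size, the Central Limit Theorem justifies a two-sample $z$-test using unbiased variance estimates without pooling; a pooled estimate would give a slightly different statistic. All variable names are illustrative.

```python
from math import sqrt

n_a, sum_x, sum_x2 = 60, 6060, 626220   # branch A summary data
n_b, sum_y, sum_y2 = 50, 4750, 464500   # branch B summary data

mean_a = sum_x / n_a
mean_b = sum_y / n_b

# Unbiased estimates of the two population variances
var_a = (sum_x2 - sum_x ** 2 / n_a) / (n_a - 1)
var_b = (sum_y2 - sum_y ** 2 / n_b) / (n_b - 1)

# Test statistic for H0: mu_A = mu_B against H1: mu_A != mu_B
z = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)

print(f"means: {mean_a:.1f}, {mean_b:.1f}")
print(f"variance estimates: {var_a:.1f}, {var_b:.1f}")
print(f"z = {z:.3f}")
# For a two-tailed test at the 2% level, compare |z| with 2.326.
```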

CAIE FP2 2014 June Q11 OR

Challenging +1.2

The time taken for a randomly chosen student at College $P$ to complete a particular puzzle has a normal distribution with mean $\mu$ minutes. The times, $x$ minutes, are recorded for a random sample of 8 students chosen from the college. The results are summarised as follows. $$\Sigma x = 42.8 \quad \Sigma x ^ { 2 } = 236.0$$
Find a 95\% confidence interval for $\mu$.
A test is carried out on this sample data, at the $10 \%$ significance level. The test supports the claim that $\mu > k$. Find the greatest possible value of $k$.
A random sample, of size 12, is taken from the students at College $Q$. Their times to complete the puzzle give a sample mean of 4.60 minutes and an unbiased variance estimate of 1.962 minutes$^{2}$. Use a 2-sample test at the $10 \%$ significance level to test whether the mean time for students at College $Q$ to complete the puzzle is less than the mean time for students at College $P$ to complete the puzzle. You should state any assumptions necessary for the test to be valid.

CAIE FP2 2015 June Q8

8 marks Standard +0.3

For a random sample of ten pairs of values of $x$ and $y$ taken from a bivariate distribution, the equations of the regression lines of $y$ on $x$ and of $x$ on $y$ are, respectively, $$y = 0.38 x + 1.41 \quad \text { and } \quad x = 0.96 y + 7.47$$
1. Find the value of the product moment correlation coefficient for this sample.
2. Using a $5 \%$ significance level, test whether there is positive correlation between the variables.
For a random sample of $n$ pairs of values of $u$ and $v$ taken from another bivariate distribution, the value of the product moment correlation coefficient is 0.507. Using a test at the $5 \%$ significance level, there is evidence of non-zero correlation between the variables. Find the least possible value of $n$.
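
For the first sample, the correlation coefficient can be recovered from the two regression lines using the standard identity $r^2 = b_{yx} \, b_{xy}$, with $r$ taking the sign of the gradients. A minimal sketch, with hypothetical variable names:

```python
from math import copysign, sqrt

b_yx = 0.38   # gradient of the regression line of y on x
b_xy = 0.96   # gradient of the regression line of x on y

# r^2 is the product of the two gradients; r shares their common sign
r = copysign(sqrt(b_yx * b_xy), b_yx)

print(f"r = {r:.3f}")
```

The significance tests themselves still need critical values of the product moment correlation coefficient from tables, which this sketch does not attempt to reproduce.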

CAIE FP2 2015 June Q10

13 marks Standard +0.3

10 Young children at a primary school are learning to throw a ball as far as they can. The distance thrown at the beginning of the school year and the distance thrown at the end of the same school year are recorded for each child. The distance thrown, in metres, at the beginning of the year is denoted by $x$; the distance thrown, in metres, at the end of the year is denoted by $y$. For a random sample of 10 children, the results are shown in the following table.

Child	$A$	$B$	$C$	$D$	$E$	$F$	$G$	$H$	$I$	$J$
$x$	5.2	4.1	3.7	5.4	7.6	6.1	3.2	4.0	3.5	8.0
$y$	6.2	4.8	5.0	5.6	7.7	7.0	4.0	4.5	3.6	8.5

$$\left[ \Sigma x = 50.8 , \quad \Sigma x ^ { 2 } = 284.16 , \quad \Sigma y = 56.9 , \quad \Sigma y ^ { 2 } = 347.59 , \quad \Sigma x y = 313.28 . \right]$$
A particular child threw the ball a distance of 7.0 metres at the beginning of the year, but he could not throw at the end of the year because he had broken his arm. By finding the equation of an appropriate regression line, estimate the distance this child would have thrown at the end of the year.
The teacher suspects that, on average, the distance thrown by a child increases between the two throws by more than 0.4 metres. Stating suitable hypotheses and assuming a normal distribution, test the teacher's suspicion at the $5 \%$ significance level.
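
The two calculations in this question can be sketched directly from the tabulated data. The sketch assumes the regression line of $y$ on $x$ is the appropriate one (since $y$ is being estimated from a given $x$) and that a paired-sample $t$-test on the differences $y - x$ is intended; the variable names are illustrative.

```python
from math import sqrt

x = [5.2, 4.1, 3.7, 5.4, 7.6, 6.1, 3.2, 4.0, 3.5, 8.0]
y = [6.2, 4.8, 5.0, 5.6, 7.7, 7.0, 4.0, 4.5, 3.6, 8.5]
n = len(x)

# Regression line of y on x: y = intercept + slope * x
mean_x, mean_y = sum(x) / n, sum(y) / n
s_xy = sum(a * b for a, b in zip(x, y)) - n * mean_x * mean_y
s_xx = sum(a * a for a in x) - n * mean_x ** 2
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
y_at_7 = intercept + slope * 7.0      # estimate for the child with x = 7.0

# Paired t-test on d = y - x, H0: mu_d = 0.4 against H1: mu_d > 0.4
d = [b - a for a, b in zip(x, y)]
mean_d = sum(d) / n
var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)   # unbiased estimate
t_stat = (mean_d - 0.4) / sqrt(var_d / n)

print(f"y = {intercept:.3f} + {slope:.3f} x, estimate at x = 7.0: {y_at_7:.2f}")
print(f"t = {t_stat:.3f} on {n - 1} degrees of freedom")
# Compare t_stat with the one-tailed 5% point of t(9) from tables (1.833).
```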

CAIE FP2 2017 November Q11 OR

Moderate -0.3

A large number of people attended a course to improve the speed of their logical thinking. The times taken to complete a particular type of logic puzzle at the beginning of the course and at the end of the course are recorded for each person. The time taken, in minutes, at the beginning of the course is denoted by $x$ and the time taken, in minutes, at the end of the course is denoted by $y$. For a random sample of 9 people, the results are summarised as follows. $$\Sigma x = 45.3 \quad \Sigma x ^ { 2 } = 245.59 \quad \Sigma y = 40.5 \quad \Sigma y ^ { 2 } = 195.11 \quad \Sigma x y = 218.72$$ Ken attended the course, but his time to complete the puzzle at the beginning of the course was not recorded. His time to complete the puzzle at the end of the course was 4.2 minutes.

By finding, showing all necessary working, the equation of a suitable regression line, find an estimate for the time that Ken would have taken to complete the puzzle at the beginning of the course.
The values of $x - y$ for the sample of 9 people are as follows. $$\begin{array} { l l l l l l l l l } 0.2 & 0.8 & 0.5 & 1.0 & 0.2 & 0.6 & 0.2 & 0.5 & 0.8 \end{array}$$ The organiser of the course believes that, on average, the time taken to complete the puzzle decreases between the beginning and the end of the course by more than 0.3 minutes.
Stating suitable hypotheses and assuming a normal distribution, test the organiser's belief at the $2 \frac { 1 } { 2 } \%$ significance level.

OCR MEI S1 2016 June Q7

18 marks Moderate -0.3

7 To withdraw money from a cash machine, the user has to enter a 4-digit PIN (personal identification number). There are several thousand possible 4-digit PINs, but a survey found that $10 \%$ of cash machine users use the PIN '1234'.

(i) 16 cash machine users are selected at random.
(A) Find the probability that exactly 3 of them use 1234 as their PIN.
(B) Find the probability that at least 3 of them use 1234 as their PIN.
(C) Find the expected number of them who use 1234 as their PIN.
An advertising campaign aims to reduce the number of people who use 1234 as their PIN. A hypothesis test is to be carried out to investigate whether the advertising campaign has been successful.
(ii) Write down suitable null and alternative hypotheses for the test. Give a reason for your choice of alternative hypothesis.
(iii) A random sample of 20 cash machine users is selected.
(A) Explain why the test could not be carried out at the $10 \%$ significance level.
(B) The test is to be carried out at the $k \%$ significance level. State the lowest integer value of $k$ for which the test could result in the rejection of the null hypothesis.
(iv) A new random sample of 60 cash machine users is selected. It is found that 2 of them use 1234 as their PIN. You are given that, if $X \sim \mathrm {~B} ( 60,0.1 )$, then (to 4 decimal places) $$\mathrm { P } ( X = 2 ) = 0.0393 , \quad \mathrm { P } ( X < 2 ) = 0.0138 , \quad \mathrm { P } ( X \leqslant 2 ) = 0.0530 .$$ Using the same hypotheses as in part (ii), carry out the test at the $5 \%$ significance level.
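
The binomial probabilities used throughout this question can be checked with a short sketch based on `math.comb`. The helper names `binom_pmf` and `binom_cdf` are hypothetical; the last part reproduces the values quoted for $\mathrm{B}(60, 0.1)$ and assumes a lower-tail test, since the campaign aims to reduce the proportion.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# Sample of 16 users, each using '1234' with probability 0.1
p_exactly_3 = binom_pmf(3, 16, 0.1)
p_at_least_3 = 1 - binom_cdf(2, 16, 0.1)
expected = 16 * 0.1

# Sample of 60 users with 2 observed: lower-tail probability P(X <= 2)
p_value = binom_cdf(2, 60, 0.1)
reject_h0 = p_value < 0.05

print(f"P(X = 3) = {p_exactly_3:.4f}, P(X >= 3) = {p_at_least_3:.4f}")
print(f"E(X) = {expected:.1f}")
print(f"P(X <= 2 | n = 60) = {p_value:.4f}, reject H0: {reject_h0}")
```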

Edexcel AS Paper 2 2019 June Q5

6 marks Standard +0.3

Past records show that $15 \%$ of customers at a shop buy chocolate. The shopkeeper believes that moving the chocolate closer to the till will increase the proportion of customers buying chocolate.

After moving the chocolate closer to the till, a random sample of 30 customers is taken and 8 of them are found to have bought chocolate. Julie carries out a hypothesis test, at the 5\% level of significance, to test the shopkeeper's belief.
Julie's hypothesis test is shown below.
$\mathrm { H } _ { 0 } : p = 0.15$
$\mathrm { H } _ { 1 } : p \geqslant 0.15$
Let $X =$ the number of customers who buy chocolate.
$X \sim \mathrm {~B} ( 30,0.15 )$
$\mathrm { P } ( X = 8 ) = 0.0420$
$0.0420 < 0.05$ so reject $\mathrm { H } _ { 0 }$
There is sufficient evidence to suggest that the proportion of customers buying chocolate has increased.

Identify the first two errors that Julie has made in her hypothesis test.
Explain whether or not these errors will affect the conclusion of her hypothesis test. Give a reason for your answer.
Find, using a 5\% level of significance, the critical region for a one-tailed test of the shopkeeper's belief. The probability in the tail should be less than 0.05
Find the actual level of significance of this test.
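
A search for the upper-tail critical region can be sketched as below. It assumes the model $X \sim \mathrm{B}(30, 0.15)$ under the null hypothesis and looks for the smallest value $c$ with $\mathrm{P}(X \geqslant c) < 0.05$; the function name `upper_tail` is illustrative.

```python
from math import comb


def upper_tail(c, n, p):
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c, n + 1))


n, p0, alpha = 30, 0.15, 0.05

# Smallest c whose upper-tail probability is below alpha
critical_value = next(c for c in range(n + 1) if upper_tail(c, n, p0) < alpha)
actual_level = upper_tail(critical_value, n, p0)

print(f"Critical region: X >= {critical_value}")
print(f"Actual significance level: {actual_level:.4f}")
print(f"P(X >= 8) = {upper_tail(8, n, p0):.4f}")
```

Comparing `upper_tail(8, n, p0)` with 0.05 shows why using $\mathrm{P}(X = 8)$ in place of a tail probability matters for the observed value of 8.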

Edexcel AS Paper 2 2023 June Q4

7 marks Standard +0.3

Past information shows that $25 \%$ of adults in a large population have a particular allergy.

Rylan believes that the proportion that has the allergy differs from 25\%
He takes a random sample of 50 adults from the population.
Rylan carries out a test of the null hypothesis $\mathrm { H } _ { 0 } : p = 0.25$ using a $5 \%$ level of significance.

Write down the alternative hypothesis for Rylan's test.
Find the critical region for this test. You should state the probability associated with each tail, which should be as close to $2.5 \%$ as possible.
State the actual probability of incorrectly rejecting $\mathrm { H } _ { 0 }$ for this test.
Rylan finds that 10 of the adults in his sample have the allergy.
State the conclusion of Rylan's hypothesis test.

OCR MEI AS Paper 2 2022 June Q8

11 marks Moderate -0.3

8 In 2018 research showed that 81\% of young adults in England had never donated blood.
Following an advertising campaign in 2021, it is believed that the percentage of young adults in England who had never donated blood in 2021 is less than $81 \%$. Ling decides to carry out a hypothesis test at the 5\% level.
Ling collects data from a random sample of 400 young adults in England.

State the null and alternative hypotheses for the test, defining the parameter used.
Write down the probability that the null hypothesis is rejected when it should in fact be accepted.
Assuming the null hypothesis is correct, calculate the expected number of young adults in the sample who had never donated blood.
Calculate the probability that there were no more than 308 young adults who had never donated blood in the sample.
Determine the critical region for the test.
In fact, the sample contained 314 young adults who had never donated blood.
Carry out the test, giving the conclusion in the context of the question.

OCR MEI AS Paper 2 2023 June Q13

6 marks Moderate -0.3

13 In a report published in October 2021 it is stated that $37 \%$ of adults in the United Kingdom never exercise or play sport. A researcher believes that the true percentage is less than this. They decide to carry out a hypothesis test at the $5 \%$ level to investigate the claim.

State the null and alternative hypotheses for their test.
Define the parameter for their test.
In a random sample of 118 adults, they find that 35 of them never exercise or play sport.
Carry out the test.

OCR MEI AS Paper 2 2024 June Q12

6 marks Moderate -0.8

12 Data collected in the twentieth century showed that the probability of a randomly selected person having blue eyes was 0.08. A medical researcher believes that the probability in 2024 is less than this so they decide to carry out a hypothesis test at the $5 \%$ significance level.

(a) Write down suitable hypotheses for the test, defining the parameter used.
(b) Assuming that the probability that a person selected at random has blue eyes is still 0.08, calculate the probability that 3 or fewer people in a random sample of 92 people have blue eyes.
The researcher collects a random sample of 92 people and finds that 3 of them have blue eyes.
(c) Use your answer to part (b) to carry out the test, giving your conclusion in context.

OCR MEI Paper 2 2023 June Q13

9 marks Standard +0.3

13 A large supermarket chain advertises that the mean mass of apples of a certain variety on sale in their stores is 0.14 kg. Following a poor growing season, the head of quality control believes that the mean mass of these apples is less than 0.14 kg and she decides to carry out a hypothesis test at the $5 \%$ level of significance. She collects a random sample of this variety of apple from the supermarket chain and records the mass, in kg, of each apple. She uses software to analyse the data. The results are summarised in the output below.

$n$	80
Mean	0.1316
$\sigma$	0.0198
$s$	0.0199
$\Sigma x$	10.525
$\Sigma x ^ { 2 }$	1.4161
Min	0.1
Q1	0.12
Median	0.132
Q3	0.1435
Max	0.19

State the null hypothesis and the alternative hypothesis for the test, defining the parameter used.
Write down the distribution of the sample mean for this hypothesis test.
Determine the critical region for the test.
Carry out the test, giving your conclusion in context.
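
The critical region for the sample mean can be sketched with `statistics.NormalDist`. The sketch assumes that, with $n = 80$, the sample mean is approximately normal and that the sample standard deviation $s = 0.0199$ is an adequate estimate of the population standard deviation; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, s, n = 0.14, 0.0199, 80      # advertised mean, sample sd, sample size
alpha = 0.05                      # one-tailed (lower) test

se = s / sqrt(n)                  # standard error of the sample mean
# Reject H0 when the sample mean falls below this boundary
critical_mean = NormalDist(mu0, se).inv_cdf(alpha)

sample_mean = 10.525 / n          # from the software output: sum of x over n
reject_h0 = sample_mean < critical_mean

print(f"Critical region: sample mean < {critical_mean:.4f}")
print(f"Observed sample mean = {sample_mean:.4f}, reject H0: {reject_h0}")
```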

OCR MEI Paper 2 2021 November Q11

8 marks Standard +0.3

11 In 2010 the heights of adult women in the UK were found to have mean $\mu = 161.6 \mathrm {~cm}$ and variance $\sigma ^ { 2 } = 1.96 \mathrm {~cm} ^ { 2 }$. It is believed that the mean height of adult women in 2020 in the UK is greater than in 2010. In 2020 a researcher collected a random sample of the heights of 200 adult women in the UK. The researcher calculated the sample mean height and carried out a hypothesis test at the $5 \%$ level to investigate whether there was any evidence to suggest that the mean height of adult women in the UK had increased. The researcher assumed that the variance was unaltered.

- State suitable hypotheses for the test, defining any variables you use.
The researcher found that the sample mean was 161.9 cm and made the following statements.

AQA Further AS Paper 2 Statistics 2021 June Q4

7 marks Standard +0.3

4 The distance a particular football player runs in a match is modelled by a normal distribution with standard deviation 0.3 kilometres. A random sample of $n$ matches is taken.
The distance the player runs in this sample of matches has mean 10.8 kilometres.
The sample is used to construct a $93 \%$ confidence interval for the mean, of width 0.0543 kilometres, correct to four decimal places.

(a) Find the value of $n$
(b) Find the $93 \%$ confidence interval for the mean, giving the limits to three decimal places.
(c) Alison claims that the population mean distance the player runs is 10.7 kilometres. She carries out a hypothesis test at the 7\% level of significance using the random sample and the hypotheses $$\begin{aligned} & \mathrm { H } _ { 0 } : \mu = 10.7 \\ & \mathrm { H } _ { 1 } : \mu \neq 10.7 \end{aligned}$$
(c) (i) State, with a reason, whether the null hypothesis will be accepted or rejected.
(c) (ii) Describe, in the context of the hypothesis test in part (c)(i), what is meant by a Type II error.

AQA Further AS Paper 2 Statistics Specimen Q8

9 marks Standard +0.3

8 In a small town, the number of properties sold during a week in spring by a local estate agent, Keith, can be regarded as occurring independently and with constant mean $\mu$. Data from several years have shown the value of $\mu$ to be 3.5. A new housing development was built on the outskirts of the town and the properties on this development were offered for sale by the builder of the development, not by the local estate agents. During the first four weeks in spring, when properties on the new development were offered for sale by the builder, Keith sold a total of 8 properties. Keith claims that the sale of new properties by the builder reduced his mean number of properties sold during a week in spring.

(a) Investigate Keith's claim, using the $5 \%$ level of significance. [6 marks]
(b) For your test carried out in part (a) state, in context, the meaning of a Type II error. [1 mark]
(c) State one advantage and one disadvantage of using a 1\% significance level rather than a 5\% level of significance in a hypothesis test. [2 marks]

Edexcel S1 2016 June Q4

13 marks Standard +0.3

4. The Venn diagram shows the probabilities of customer bookings at Harry's hotel.
$R$ is the event that a customer books a room
$B$ is the event that a customer books breakfast
$D$ is the event that a customer books dinner
$u$ and $t$ are probabilities.
\includegraphics[max width=\textwidth, alt={}, center]{e3b92a5b-c0ad-4176-9b05-cb07a44aa265-08_604_1047_696_450}

Write down the probability that a customer books breakfast but does not book a room.
Given that the events $B$ and $D$ are independent
find the value of $t$
hence find the value of $u$
Find
1. $\mathrm { P } ( D \mid R \cap B )$
2. $\mathrm { P } \left( D \mid R \cap B ^ { \prime } \right)$
A coach load of 77 customers arrive at Harry's hotel. Of these 77 customers
40 have booked a room and breakfast
37 have booked a room without breakfast
Estimate how many of these 77 customers will book dinner.

Edexcel S2 2014 January Q2

10 marks Moderate -0.3

2. Bill owns a restaurant. Over the next four weeks Bill decides to carry out a sample survey to obtain the customers' opinions.

Suggest a suitable sampling frame for the sample survey.
Identify the sampling units.
Give one advantage and one disadvantage of taking a census rather than a sample survey.
Bill believes that only $30 \%$ of customers would like a greater choice on the menu. He takes a random sample of 50 customers and finds that 20 of them would like a greater choice on the menu.
Test, at the $5 \%$ significance level, whether or not the percentage of customers who would like a greater choice on the menu is more than Bill believes. State your hypotheses clearly.

Edexcel S2 2014 January Q4

7 marks Standard +0.3

The number of telephone calls per hour received by a business is a random variable with distribution $\operatorname { Po } ( \lambda )$.

Charlotte records the number of calls, $C$, received in 4 hours. A test of the null hypothesis $\mathrm { H } _ { 0 } : \lambda = 1.5$ is carried out. $\mathrm { H } _ { 0 }$ is rejected if $C > 10$

Write down the alternative hypothesis.
Find the significance level of the test.
Given that $\mathrm { P } ( C > 10 ) < 0.1$, find the largest possible value of $\lambda$ that can be found by using the tables.

Edexcel S2 2015 January Q4

7 marks Standard +0.3

4. Accidents occur randomly at a crossroads at a rate of 0.5 per month. A researcher records the number of accidents, $X$, which occur at the crossroads in a year.

Find $\mathrm { P } ( 5 \leqslant X < 7 )$
A new system is introduced at the crossroads. In the first 18 months, 4 accidents occur at the crossroads.
Test, at the $5 \%$ level of significance, whether or not there is reason to believe that the new system has led to a reduction in the mean number of accidents per month. State your hypotheses clearly.
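
Both parts reduce to Poisson tail probabilities, which can be sketched with the standard library alone. The sketch assumes $X \sim \mathrm{Po}(6)$ for one year (12 months at 0.5 per month) and, under the null hypothesis, $\mathrm{Po}(9)$ for the 18-month period; `pois_cdf` is a hypothetical helper name.

```python
from math import exp, factorial


def pois_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam ** i / factorial(i) for i in range(k + 1))


# One year at 0.5 accidents per month: X ~ Po(6)
p_5_to_6 = pois_cdf(6, 6.0) - pois_cdf(4, 6.0)     # P(5 <= X < 7)

# 18 months under H0 (rate still 0.5 per month): Y ~ Po(9), 4 observed
p_value = pois_cdf(4, 9.0)                          # lower tail P(Y <= 4)
reject_h0 = p_value < 0.05

print(f"P(5 <= X < 7) = {p_5_to_6:.4f}")
print(f"P(Y <= 4) = {p_value:.4f}, reject H0: {reject_h0}")
```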

Edexcel S2 2015 January Q6

13 marks Standard +0.8

6. The Headteacher of a school claims that $30 \%$ of parents do not support a new curriculum. In a survey of 20 randomly selected parents, the number, $X$, who do not support the new curriculum is recorded. Assuming that the Headteacher's claim is correct, find

the probability that $X = 5$
the mean and the standard deviation of $X$
The Director of Studies believes that the proportion of parents who do not support the new curriculum is greater than $30 \%$. Given that in the survey of 20 parents 8 do not support the new curriculum, test, at the $5 \%$ level of significance, the Director of Studies' belief. State your hypotheses clearly.
The teachers believe that the sample in the original survey was biased and claim that only $25 \%$ of the parents are in support of the new curriculum. A second random sample, of size $2 n$, is taken and exactly half of this sample supports the new curriculum. A test is carried out at a 10\% level of significance of the teachers' belief using this sample of size $2 n$. Using the hypotheses $\mathrm { H } _ { 0 } : p = 0.25$ and $\mathrm { H } _ { 1 } : p > 0.25$, find the minimum value of $n$ for which the outcome of the test is that the teachers' belief is rejected.

Edexcel S2 2017 January Q5

14 marks Standard +0.8

In the manufacture of cloth in a factory, defects occur randomly in the production process at a rate of 2 per $5 \mathrm {~m} ^ { 2 }$

The quality control manager randomly selects 12 pieces of cloth each of area $15 \mathrm {~m} ^ { 2 }$.

Find the probability that exactly half of these 12 pieces of cloth will contain at most 7 defects.
The factory introduces a new procedure to manufacture the cloth. After the introduction of this new procedure, the manager takes a random sample of $25 \mathrm {~m} ^ { 2 }$ of cloth from the next batch produced to test if there has been any change in the rate of defects.
1. Write down suitable hypotheses for this test.
2. Describe a suitable test statistic that the manager should use.
3. Explain what is meant by the critical region for this test.
Using a 5\% level of significance, find the critical region for this test. You should choose the largest critical region for which the probability in each tail is less than 2.5\%
Find the actual significance level for this test.

Edexcel S2 2022 January Q3

9 marks Standard +0.3

3 A photocopier in a school is known to break down at random at a mean rate of 8 times per week.

(a) Give a reason why a Poisson distribution could be used to model the number of breakdowns.
The headteacher of the school replaces the photocopier with a refurbished one and wants to find out if the rate of breakdowns has increased or decreased.
(b) Write down suitable null and alternative hypotheses that the headteacher should use.
The refurbished photocopier was monitored for the first week after it was installed.
(c) Using a $5 \%$ level of significance, find the critical region to test whether the rate of breakdowns has now changed.
(d) Find the actual significance level of a test based on the critical region from part (c).
During the first week after it was installed there were 4 breakdowns.
(e) Comment on this finding in the light of the critical region found in part (c).

Edexcel S2 2015 June Q4

5 marks Challenging +1.2

A single observation $x$ is to be taken from a Poisson distribution with parameter $\lambda$. This observation is to be used to test, at a $5 \%$ level of significance,

$$\mathrm { H } _ { 0 } : \lambda = k \quad \mathrm { H } _ { 1 } : \lambda \neq k$$ where $k$ is a positive integer.
Given that the critical region for this test is $( X = 0 ) \cup ( X \geqslant 9 )$

find the value of $k$, justifying your answer.
Find the actual significance level of this test.
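
One way to explore this question numerically is to compute, for each candidate integer $k$, the two-tailed critical region and see which $k$ reproduces the stated one. The sketch assumes the usual convention that each tail is the largest region with probability below 2.5%; the helper names `pois_cdf` and `critical_region` are hypothetical.

```python
from math import exp, factorial


def pois_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam ** i / factorial(i) for i in range(x + 1))


def critical_region(lam, tail=0.025, x_max=60):
    """Return (a, b) for the region X <= a or X >= b, each tail below `tail`.

    a is None when no lower tail is small enough.
    """
    lower = [x for x in range(x_max) if pois_cdf(x, lam) < tail]
    a = max(lower) if lower else None
    b = next(x for x in range(1, x_max) if 1 - pois_cdf(x - 1, lam) < tail)
    return a, b


# Candidate integer means whose critical region is (X = 0) or (X >= 9)
matches = [k for k in range(1, 21) if critical_region(k) == (0, 9)]
k = matches[0]
actual_level = pois_cdf(0, k) + (1 - pois_cdf(8, k))

print(f"k values giving the stated region: {matches}")
print(f"Actual significance level for k = {k}: {actual_level:.4f}")
```

A written answer would still need the justification from tables that the question asks for; the search only confirms which value is consistent with the region.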
