Questions Further Statistics (100 questions)

OCR Further Statistics 2019 June Q1
1 A set of bivariate data \(( X , Y )\) is summarised as follows.
\(n = 25 , \sum x = 9.975 , \sum y = 11.175 , \sum x ^ { 2 } = 5.725 , \sum y ^ { 2 } = 46.200 , \sum x y = 11.575\)
  1. Calculate the value of Pearson's product-moment correlation coefficient.
  2. Calculate the equation of the regression line of \(y\) on \(x\).
It is desired to know whether the regression line of \(y\) on \(x\) will provide a reliable estimate of \(y\) when \(x = 0.75\).
  3. State one reason for believing that the estimate will be reliable.
  4. State what further information is needed in order to determine whether the estimate is reliable.
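As a rough numerical check for parts (a) and (b), which is not part of the original paper, the summary statistics can be fed into the standard formulas \(S_{xx} = \sum x^2 - (\sum x)^2 / n\), \(S_{xy} = \sum xy - \sum x \sum y / n\), \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\) and \(b = S_{xy} / S_{xx}\). The variable names below are illustrative only.

```python
import math

# Summary statistics quoted in the question
n = 25
sum_x, sum_y = 9.975, 11.175
sum_x2, sum_y2, sum_xy = 5.725, 46.200, 11.575

# Corrected sums of squares and products
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / math.sqrt(s_xx * s_yy)   # product-moment correlation coefficient
b = s_xy / s_xx                     # gradient of the regression line of y on x
a = sum_y / n - b * sum_x / n       # intercept, since the line passes through the means

print(f"r = {r:.3f}, y = {a:.3f} + {b:.3f}x")
```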
OCR Further Statistics 2019 June Q2
2 The average numbers of cars, lorries and buses passing a point on a busy road in a period of 30 minutes are 400, 80 and 17 respectively.
  1. Assuming that the numbers of each type of vehicle passing the point in a period of 30 minutes have independent Poisson distributions, calculate the probability that the total number of vehicles passing the point in a randomly chosen period of 30 minutes is at least 520.
  2. Buses are known to run in approximate accordance with a fixed timetable. Explain why this casts doubt on the use of a Poisson distribution to model the number of buses passing the point in a fixed time interval.
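For part (a), the sum of independent Poisson variables is itself Poisson, with mean equal to the sum of the means. The sketch below, which is an illustration rather than the mark-scheme method, evaluates the tail probability directly using only the standard library; `poisson_cdf` is a helper written for this example.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam); terms are built in log space to avoid overflow."""
    return sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
               for i in range(k + 1))

lam_total = 400 + 80 + 17                 # mean of the total number of vehicles
p_at_least_520 = 1 - poisson_cdf(519, lam_total)
print(round(p_at_least_520, 4))
```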
OCR Further Statistics 2019 June Q3
3 Six red counters and four blue counters are arranged in a straight line in a random order.
Find the probability that
  1. no blue counter has fewer than two red counters between it and the nearest other blue counter,
  2. no two blue counters are next to one another.
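Because there are only \(\binom{10}{4} = 210\) ways to choose the positions of the blue counters, both probabilities can be checked by brute-force enumeration. This sketch assumes that counters of the same colour are indistinguishable, so each set of blue positions is equally likely; the helper `min_gap` is introduced only for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Each layout is the sorted tuple of places (0..9) occupied by the four blue counters
layouts = list(combinations(range(10), 4))

def min_gap(blue):
    """Smallest number of red counters between neighbouring blue counters."""
    return min(b - a - 1 for a, b in zip(blue, blue[1:]))

p_two_reds = Fraction(sum(min_gap(b) >= 2 for b in layouts), len(layouts))
p_not_adjacent = Fraction(sum(min_gap(b) >= 1 for b in layouts), len(layouts))
print(p_two_reds, p_not_adjacent)
```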
OCR Further Statistics 2019 June Q4
4 The greatest weight \(W\) N that can be supported by a shelving bracket of traditional design is a normally distributed random variable with mean 500 and standard deviation 80. A sample of 40 shelving brackets of a new design are tested and it is found that the mean of the greatest weights that the brackets in the sample can support is 473.0 N.
  1. Test at the \(1 \%\) significance level whether the mean of the greatest weight that a bracket of the new design can support is less than the mean of the greatest weight that a bracket of the traditional design can support.
  2. State an assumption needed in carrying out the test in part (a).
  3. Explain whether it is necessary to use the central limit theorem in carrying out the test.
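A minimal sketch of the calculation behind part (a), assuming (as the test itself must) that brackets of the new design have the same standard deviation of 80 N as the traditional design. It uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu0, sigma, n, sample_mean = 500, 80, 40, 473.0

z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # test statistic
p_value = NormalDist().cdf(z)                      # lower-tail p-value for H1: mu < 500
reject_h0 = p_value < 0.01                         # compare with the 1% level
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```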
OCR Further Statistics 2019 June Q5
5 Five runners, \(A , B , C , D\) and \(E\), take part in two different races.
Spearman’s rank correlation coefficient for the orders in which the runners finish is calculated and a test for positive agreement is carried out at the \(5 \%\) significance level.
  1. State suitable hypotheses for the test.
  2. Find the largest possible value of \(\sum d ^ { 2 }\) for which the result of the test is to reject the null hypothesis.
  3. In the first race, the order in which the five runners finished was: \(A , B , C , D , E\). In the second race, three of the runners finished in the same positions as in the first race. The result of the test is to reject the null hypothesis. Find a possible order for the runners to finish in the second race.
OCR Further Statistics 2019 June Q6
6 Yusha investigates the proportion of left-handed people living in two cities, \(A\) and \(B\). He obtains data from random samples from the two cities. His results are shown in the table, in which \(L\) denotes "left-handed".
| | \(L\) | \(L ^ { \prime }\) |
|---|---|---|
| \(A\) | 14 | 9 |
| \(B\) | 26 | 51 |
  1. Test at the \(10 \%\) significance level whether there is association between being left-handed and living in a particular city.
A person is chosen at random from one of the cities \(A\) and \(B\). Let \(A\) denote "the person lives in city \(A\)".
  2. State the relationship between \(\mathrm { P } ( L )\) and \(\mathrm { P } ( L \mid A )\) according to the model implied by the null hypothesis of your test.
  3. Use the data in the table to suggest a value for \(\mathrm { P } ( L \mid A )\) given by an improved model.
OCR Further Statistics 2019 June Q7
7 The random variable \(D\) has the distribution \(\operatorname { Geo } ( p )\). It is given that \(\operatorname { Var } ( D ) = \frac { 40 } { 9 }\).
Determine
  1. \(\operatorname { Var } ( 3 D + 5 )\),
  2. \(\mathrm { E } ( 3 D + 5 )\),
  3. \(\mathrm { P } ( D > \mathrm { E } ( D ) )\).
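One way to check the three answers with exact arithmetic is sketched below. It assumes the convention that \(\operatorname{Geo}(p)\) counts trials up to and including the first success, so that \(\mathrm{E}(D) = 1/p\), \(\operatorname{Var}(D) = (1-p)/p^2\) and \(\mathrm{P}(D > k) = (1-p)^k\).

```python
from fractions import Fraction
import math

var_d = Fraction(40, 9)

# Var(D) = (1 - p) / p^2 = 40/9 rearranges to 40p^2 + 9p - 9 = 0; take the positive root
disc = 9**2 + 4 * 40 * 9
p = Fraction(-9 + math.isqrt(disc), 2 * 40)
assert (1 - p) / p**2 == var_d          # confirm the root satisfies the given variance

var_3d5 = 9 * var_d                     # Var(aD + b) = a^2 Var(D)
e_3d5 = 3 * (1 / p) + 5                 # E(aD + b) = a E(D) + b
k = math.floor(1 / p)                   # D is an integer, so D > E(D) means D > k
p_d_gt_mean = (1 - p) ** k
print(p, var_3d5, e_3d5, p_d_gt_mean)
```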
OCR Further Statistics 2019 June Q8
8 A university course was taught by two different professors. Students could choose whether to attend the lectures given by Professor \(Q\) or the lectures given by Professor \(R\). At the end of the course all the students took the same examination. The examination marks of a random sample of 30 students taught by Professor \(Q\) and a random sample of 24 students taught by Professor \(R\) were ranked. The sum of the ranks of the students taught by Professor \(Q\) was 726. Test at the \(5 \%\) significance level whether there is a difference in the ranks of the students taught by the two professors.
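With samples this large, a Wilcoxon rank-sum test is usually carried out through a normal approximation. The sketch below is one such calculation, assuming no tied ranks and using a continuity correction; it is an illustration, not the published mark scheme.

```python
import math
from statistics import NormalDist

n_q, n_r, rank_sum_q = 30, 24, 726
total = n_q + n_r

# All ranks 1..N sum to N(N + 1)/2, which gives the rank sum of the smaller sample
rank_sum_r = total * (total + 1) // 2 - rank_sum_q
mean_r = n_r * (total + 1) / 2            # mean of that rank sum under H0
var_r = n_q * n_r * (total + 1) / 12      # variance under H0

z = (abs(rank_sum_r - mean_r) - 0.5) / math.sqrt(var_r)   # continuity-corrected
p_value = 2 * (1 - NormalDist().cdf(z))                   # two-tailed
print(f"z = {z:.3f}, p = {p_value:.3f}")
```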
OCR Further Statistics 2019 June Q9
9 The continuous random variable \(T\) has cumulative distribution function
\(F ( t ) = \begin{cases} 0 & t < 0 , \\ 1 - \mathrm { e } ^ { - 0.25 t } & t \geqslant 0 . \end{cases}\)
  1. Find the cumulative distribution function of \(2 T\).
  2. Show that, for constant \(k\), \(\mathrm { E } \left( \mathrm { e } ^ { k T } \right) = \frac { 1 } { 1 - 4 k }\). You should state, with a reason, the range of values of \(k\) for which this result is valid.
  3. \(T\) is the time before a certain event occurs. Show that the probability that no event occurs between time \(T = 0\) and time \(T = \theta\) is the same as the probability that the value of a random variable with the distribution \(\operatorname { Po } ( \lambda )\) is 0, for a certain value of \(\lambda\). You should state this value of \(\lambda\) in terms of \(\theta\).
OCR Further Statistics 2022 June Q1
1 A researcher wishes to find people who say that they support a specific plan. Each day the researcher interviews people at random, one after the other, until they find one person who says that they support this plan. The researcher does not then interview any more people that day. The total number of people interviewed on any one day is denoted by \(R\).
  1. Assume that in fact \(1 \%\) of the population would say that they support the plan.
    1. State an appropriate distribution with which to model \(R\), giving the value(s) of any parameter(s).
    2. Find \(\mathrm { P } ( 50 < R \leqslant 150 )\).
The researcher incorrectly believes that the variance of a random variable \(X\) with any discrete probability distribution is given by the formula \([ \mathrm { E } ( X ) ] ^ { 2 } - \mathrm { E } ( X )\).
  2. Show that, for the type of distribution stated in part (a), they will obtain the correct value of the variance, regardless of the value(s) of the parameter(s).
OCR Further Statistics 2022 June Q2
2 The directors of a large company believe that there are more computer failures in the Head Office when temperatures are higher. They obtain data for the Head Office for the maximum temperature, \(T ^ { \circ } \mathrm { C }\), and the number of computer failures, \(X\), on each of 12 randomly chosen days.
  1. State which of the following words can be applied to \(T\).
    Dependent, Independent, Controlled, Response
The data is summarised as follows.
\(n = 12 \quad \sum t = 261 \quad \sum x = 41 \quad \sum t ^ { 2 } = 5869 \quad \sum x ^ { 2 } = 311 \quad \sum t x = 1021\)
  2. Calculate the value of the product moment correlation coefficient \(r\).
  3. The directors wish to investigate their belief using a significance test at the \(1 \%\) level.
    1. Explain why a 1-tail test is appropriate in this situation.
    2. Carry out the test.
  4. One of the directors prefers the temperatures to be given in Fahrenheit (\({ } ^ { \circ } \mathrm { F }\)), rather than Centigrade (\({ } ^ { \circ } \mathrm { C }\)). The relationship between F and C is \(\mathrm { F } = \frac { 9 } { 5 } \mathrm { C } + 32\).
    State the value of \(r\) that would result from using temperatures in Fahrenheit in the calculation.
OCR Further Statistics 2022 June Q4
4 The manager of a car breakdown service uses the distribution \(\operatorname { Po } ( 2.7 )\) to model the number of punctures, \(R\), in a 24-hour period in a given rural area. The manager knows that, for this model to be valid, punctures must occur randomly and independently of one another.
  1. State a further assumption needed for the Poisson model to be valid.
  2. State the value of the standard deviation of \(R\).
  3. Use the model to calculate the probability that, in a randomly chosen period of 168 hours, at least 22 punctures occur.
The manager uses the distribution \(\operatorname { Po } ( 0.8 )\) to model the number of flat batteries in a 24-hour period in the same rural area, and he assumes that instances of flat batteries are independent of punctures. A day begins and ends at midnight, and a "bad" day is a day on which there are more than 6 instances, in total, of punctures and flat batteries.
  4. Assume first that both the manager's models are correct. Calculate the probability that a randomly chosen day is a "bad" day.
  5. It is found that 12 of the next 100 days are "bad" days. Comment on whether this casts doubt on the validity of the manager's models.
OCR Further Statistics 2022 June Q5
5 A company uses two drivers for deliveries.
Driver \(A\) charges a fixed rate of \(\pounds 80\) per day plus \(\pounds 2\) per mile travelled on that day. Driver \(B\) charges a fixed rate of \(\pounds 120\) per day plus \(\pounds 1.50\) per mile travelled on that day.
On each working day the total distance, in miles, travelled by each driver is a random variable with the distribution \(\mathrm { N } ( 83 , 360 )\).
  1. Find the probability that driver \(A\) charges the company less than \(\pounds 235.00\) for a randomly chosen day’s deliveries.
  2. Find the probability that the total charge to the company of three randomly chosen days' deliveries by driver \(A\) is at least \(\pounds 300\) more than the total charge of two randomly chosen days' deliveries by driver \(B\).
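Both parts reduce to linear combinations of independent normal variables. The following sketch assumes that the daily mileages on different days, and for the two drivers, are independent, so that means add and variances scale by the square of the per-mile rate.

```python
import math
from statistics import NormalDist

mu, var = 83, 360                     # daily mileage X ~ N(83, 360)

# (a) Driver A charges 80 + 2X, which is below 235 exactly when X < 77.5
p_a = NormalDist(mu, math.sqrt(var)).cdf((235 - 80) / 2)

# (b) D = (three days of A) - (two days of B)
mean_d = 3 * (80 + 2 * mu) - 2 * (120 + 1.5 * mu)
var_d = 3 * 2**2 * var + 2 * 1.5**2 * var      # variances add for independent terms
p_b = 1 - NormalDist(mean_d, math.sqrt(var_d)).cdf(300)
print(round(p_a, 3), round(p_b, 3))
```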
OCR Further Statistics 2022 June Q6
6 The random variable \(X\) was assumed to have a normal distribution with mean \(\mu\). Using a random sample of size 128, a significance test was carried out using the following hypotheses.
\(\mathrm { H } _ { 0 } : \mu = 30\)
\(\mathrm { H } _ { 1 } : \mu > 30\)
It was found that \(\sum x = 3929.6\) and \(\sum x ^ { 2 } = 123483.52\). The conclusion of the test was to reject the null hypothesis.
  1. Determine the range of possible values of the significance level of the test.
  2. It was subsequently found that \(X\) was not normally distributed. Explain whether this invalidates the conclusion of the test.
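For part (a), the null hypothesis is rejected exactly when the significance level exceeds the p-value of the observed sample. A sketch of that p-value calculation follows; it assumes the unbiased variance estimate with divisor \(n - 1\) and a normal approximation for the sample mean, which is reasonable for \(n = 128\).

```python
import math
from statistics import NormalDist

n, sum_x, sum_x2, mu0 = 128, 3929.6, 123483.52, 30

mean = sum_x / n
s2 = (sum_x2 - n * mean**2) / (n - 1)      # unbiased estimate of the variance
z = (mean - mu0) / math.sqrt(s2 / n)       # test statistic
p_value = 1 - NormalDist().cdf(z)          # upper tail, since H1 is mu > 30
print(f"mean = {mean:.2f}, s^2 = {s2:.2f}, z = {z:.3f}, p = {p_value:.4f}")
```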
OCR Further Statistics 2022 June Q7
7 The continuous random variable \(X\) has probability density function
\(f ( x ) = \begin{cases} k x ^ { n } & 0 \leqslant x \leqslant 1 , \\ 0 & \text { otherwise, } \end{cases}\)
where \(k\) is a constant and \(n\) is a parameter whose value is positive. It is given that the median of \(X\) is 0.8816 correct to 4 decimal places. Ten independent observations of \(X\) are obtained. Find the expected number of observations that are less than 0.8.
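A possible route through this question is sketched below. Integrating the density over \([0, 1]\) gives \(k = n + 1\), so \(F(x) = x^{n+1}\) on that interval. Because the median is quoted to only 4 decimal places, the sketch rounds the recovered exponent to 1 decimal place; that rounding is an assumption of this illustration.

```python
import math

median = 0.8816

# F(median) = median^(n + 1) = 0.5, so n + 1 = log(0.5) / log(median)
n_plus_1 = math.log(0.5) / math.log(median)

# Assumption: the exact exponent is the nearby one-decimal value
p_below = 0.8 ** round(n_plus_1, 1)        # P(X < 0.8) = F(0.8)
expected_count = 10 * p_below              # mean of B(10, p_below)
print(round(n_plus_1, 3), round(expected_count, 2))
```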
OCR Further Statistics 2022 June Q8
8 The critical region for an \(r \%\) two-tailed Wilcoxon signed-rank test, based on a large sample of size \(n\), is \(\left\{ W _ { + } \leqslant 113 \right\} \cup \left\{ W _ { + } \geqslant 415 \right\}\).
  1. Show that \(n = 32\).
  2. Using a suitable approximation, determine the value of \(r\).
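The two parts can be checked numerically as follows. The sketch relies on the symmetry of the signed-rank statistic (the two critical values add up to \(n(n+1)/2\)) and on the usual normal approximation with mean \(n(n+1)/4\) and variance \(n(n+1)(2n+1)/24\); the continuity correction is an assumption about the intended method.

```python
import math
from statistics import NormalDist

lower, upper = 113, 415

# Symmetric critical region: lower + upper = n(n + 1)/2, solved for n
total = lower + upper
n = (math.isqrt(8 * total + 1) - 1) // 2
assert n * (n + 1) // 2 == total

mean = n * (n + 1) / 4
var = n * (n + 1) * (2 * n + 1) / 24
z = (lower + 0.5 - mean) / math.sqrt(var)     # continuity correction on the lower tail
r_percent = 100 * 2 * NormalDist().cdf(z)     # both tails, as a percentage
print(n, round(z, 3), round(r_percent, 2))
```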
OCR Further Statistics 2022 June Q9
9 The head teacher of a school believes that, on average, pupil absences on the days Monday, Tuesday, Wednesday, Thursday and Friday are in the ratio \(3 : 2 : 2 : 2 : 3\). The head teacher takes a random sample of 120 pupil absences. The results are as follows.
| Day of week | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| Number of absences | 28 | 16 | 24 | 16 | 36 |
  1. Test at the \(5 \%\) significance level whether these results are consistent with the head teacher's belief.
A significance test at the \(5 \%\) level is also carried out on a second, independent, random sample of \(n\) pupil absences. All the numbers of absences are integers. The ratio of the numbers of absences for each day in this sample is identical to the ratio of the numbers of absences for each day in the original sample of size 120.
  2. Determine the smallest value of \(n\) for which the conclusion of this significance test is that the data are not consistent with the head teacher's belief.
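The sketch below computes the goodness-of-fit statistic for the sample of 120 and then searches over samples with the same proportions. The critical value 9.488 is the tabulated \(5\%\) point of \(\chi^2\) with 4 degrees of freedom and is hard-coded here as an assumption, since the Python standard library has no chi-squared quantile function.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

observed = [28, 16, 24, 16, 36]
ratio = [3, 2, 2, 2, 3]
CRITICAL = 9.488        # assumed from tables: 5% point of chi-squared, 4 d.f.

def chi_squared(obs):
    """Goodness-of-fit statistic against the ratio 3:2:2:2:3."""
    n = sum(obs)
    expected = [Fraction(n * r, sum(ratio)) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(obs, expected))

stat = chi_squared(observed)

# Integer counts in the same proportions are multiples of the reduced ratio
g = reduce(gcd, observed)
base = [o // g for o in observed]
k = 1
while chi_squared([k * b for b in base]) <= CRITICAL:
    k += 1
smallest_n = k * sum(base)
print(float(stat), smallest_n)
```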
OCR Further Statistics 2023 June Q1
1 A certain section of a library contains several thousand books. A lecturer is looking for a book that refers to a particular topic. The lecturer believes that one-twentieth of the books in that section of the library contain a reference to that topic. However, the lecturer does not know which books they might be, so the lecturer looks in each book in turn for a reference to the topic. The first book the lecturer finds that refers to the topic is the \(X\) th book in which the lecturer looks.
  1. A student says, "There is a maximum value of \(X\) as there is only a finite number of books. So a geometric distribution cannot be a good model for \(X\)." Explain whether you agree with the student.
    1. State one modelling assumption (not involving the total number of books) needed for \(X\) to be modelled by a geometric distribution in this context.
    2. Suggest a reason why this assumption may not be valid in this context.
Assume now that \(X\) can be well modelled by the distribution \(\operatorname { Geo } ( 0.05 )\).
  2. The probability that the lecturer needs to look in no more than \(n\) books is greater than 0.9. Find the smallest possible value of \(n\).
  3. The lecturer needs to find four different books that refer to the topic. Find the probability that the lecturer needs to look in exactly 40 books.
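The last two parts can be checked with a short calculation. It assumes \(X \sim \operatorname{Geo}(0.05)\) with \(\mathrm{P}(X \leqslant n) = 1 - 0.95^n\), and treats the final part as a negative binomial event: three successes among the first 39 books followed by a success on the 40th.

```python
import math

p = 0.05

# P(X <= n) = 1 - (1 - p)^n > 0.9 rearranges to n > log(0.1) / log(1 - p)
n_min = math.floor(math.log(0.1) / math.log(1 - p)) + 1

# Fourth success on the 40th book: C(39, 3) p^3 (1 - p)^36, then one more success
p_fourth_on_40 = math.comb(39, 3) * p**4 * (1 - p) ** 36
print(n_min, round(p_fourth_on_40, 5))
```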
OCR Further Statistics 2023 June Q2
2 The director of a concert hall wishes to investigate if the price of the most expensive concert tickets affects attendance. The director collects data about the price, \(\pounds P\), of the most expensive tickets and the number of people in the audience, \(H\) hundred (rounded to the nearest hundred), for 20 concerts. For each price there are several different concerts. The results are shown in the table.
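| \(P\) (£) | 75 | 65 | 55 | 45 | 35 |
|---|---|---|---|---|---|
| \(H\) (hundred) | 27 | 27 | 27 | 26 | 15 |
| | 27 | 27 | 20 | 21 | 12 |
| | | 22 | 18 | 16 | 9 |
| | | 19 | 18 | 13 | |
| | | 12 | 16 | 9 | |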
\(n = 20 \quad \sum p = 1080 \quad \sum h = 381 \quad \sum p ^ { 2 } = 61300 \quad \sum h ^ { 2 } = 8011 \quad \sum p h = 21535\)
  1. Calculate the equation of the regression line of \(h\) on \(p\).
  2. State what change, if any, there would be to your answer to part (a) if \(H\) had been measured in thousands (to 1 decimal place) rather than in hundreds.
For a special charity concert, the most expensive tickets cost \(\pounds 50\).
  3. Use your answer to part (b) to estimate the expected size of the audience for this concert. Give your answer correct to \(\mathbf { 1 }\) decimal place.
  4. Comment on the reliability of your answer to part (c). You should refer to
    • the value of the product-moment correlation coefficient for the data, which is 0.642
    • the value of \(\pounds 50\)
    • any one other relevant factor that should be taken into account.
OCR Further Statistics 2023 June Q3
3 The discrete random variable \(W\) has the distribution \(\mathrm { U } ( 11 )\). The independent discrete random variable \(V\) has the distribution \(\mathrm { U } ( 5 )\).
  1. It is given that, for constants \(m\) and \(n\), with \(m > 0\), \(\mathrm { E } ( m W + n V ) = 0\) and \(\operatorname { Var } ( m W + n V ) = 1\). Determine the exact values of \(m\) and \(n\).
The random variable \(T\) is the mean of three independent observations of \(W\).
  2. Explain whether the Central Limit Theorem can be used to say that the distribution of \(T\) is approximately normal.
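A numerical sketch for the first part, assuming that \(\mathrm{U}(n)\) denotes the discrete uniform distribution on \(1, 2, \ldots, n\), with mean \((n+1)/2\) and variance \((n^2 - 1)/12\). The function name `uniform_moments` is introduced only for this example.

```python
import math

def uniform_moments(n):
    """Mean and variance of the discrete uniform distribution on 1, ..., n."""
    return (n + 1) / 2, (n * n - 1) / 12

e_w, var_w = uniform_moments(11)
e_v, var_v = uniform_moments(5)

# E(mW + nV) = 0 gives n = slope * m, where slope = -E(W)/E(V)
slope = -e_w / e_v
# Var(mW + nV) = m^2 Var(W) + n^2 Var(V) = 1, using independence of W and V
m = 1 / math.sqrt(var_w + slope**2 * var_v)    # positive root, since m > 0
n = slope * m
print(m, n)
```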
OCR Further Statistics 2023 June Q4
4 Two magazines give numerical ratings to hi-fi systems. Li wishes to test whether there is agreement between the opinions of the magazines. Li chooses a random sample of 5 hi-fi systems and looks up the ratings given by the two magazines. The results are shown in the table.
| System | A | B | C | D | E |
|---|---|---|---|---|---|
| Magazine I | 68 | 75 | 77 | 83 | 92 |
| Magazine II | 30 | 25 | 40 | 35 | 45 |
  1. Give a reason why Li might choose to use a test based on Spearman's rank correlation coefficient rather than on Pearson’s product-moment correlation coefficient.
  2. Calculate the value of Spearman's rank correlation coefficient for the data.
  3. Use your answer to part (b) to carry out a hypothesis test at the \(5 \%\) significance level.
  4. The value of Spearman's rank correlation coefficient between the ratings given by magazine I and by a third magazine, magazine III, has the same numerical value as the answer to part (b) but with the sign changed. In the Printed Answer Booklet, complete the table showing the rankings given by magazine III.
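For part (b), Spearman's coefficient can be computed from the ranks with \(r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}\). The sketch below assumes no tied ratings, which holds for this data; the `ranks` helper is written for this illustration.

```python
from fractions import Fraction

magazine_1 = [68, 75, 77, 83, 92]
magazine_2 = [30, 25, 40, 35, 45]

def ranks(values):
    """Rank 1 for the smallest value; valid only when there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

d2 = sum((a - b) ** 2 for a, b in zip(ranks(magazine_1), ranks(magazine_2)))
n = len(magazine_1)
r_s = 1 - Fraction(6 * d2, n * (n * n - 1))
print(d2, r_s)
```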
OCR Further Statistics 2023 June Q5
5 An historian has reason to believe that the average age at which men got married in the seventeenth century was higher in urban areas compared to rural areas. The historian collected data from a random sample of 8 men in an urban area and a random sample of 6 men in a rural area, all of whom were married in the seventeenth century. The results were as follows, given in the form years/months.
Urban: \(18 / 3\), \(18 / 5\), \(19 / 9\), \(20 / 7\), \(25 / 6\), \(34 / 6\), \(41 / 8\), \(46 / 3\)
Rural: \(18 / 0\), \(18 / 1\), \(18 / 4\), \(19 / 11\), \(22 / 2\), \(28 / 11\)
  1. Use an appropriate non-parametric method to test at the \(5 \%\) significance level whether the average age at marriage of men is higher in urban areas than in rural areas.
  2. When checking the data, the historian found that the age of one of the men, Mr X, which had been recorded as 28/11, had been wrongly recorded. When corrected, the result of the test in part (a) was unchanged. Determine the youngest age that Mr X could have been, given that it was not the same, in years and months, as that of any of the other men in the sample.
OCR Further Statistics 2023 June Q6
6 The continuous random variable \(X\) has a uniform distribution on the interval \([ - \pi , \pi ]\).
The random variable \(Y\) is defined by \(Y = \sin X\).
Determine the cumulative distribution function of \(Y\).
OCR Further Statistics 2023 June Q7
7 A club secretary collects data about the time, \(T\) minutes, needed to process the details of a new member. The mean of \(T\) is denoted by \(\mu\). The variance of \(T\) is denoted by \(\sigma ^ { 2 }\). The results of a random sample of 40 observations of \(T\) are summarised as follows.
\(n = 40 \quad \sum t = 396.0 \quad \sum t ^ { 2 } = 4271.40\)
  1. Determine a \(99 \%\) confidence interval for \(\mu\).
  2. The secretary discovers that over a long period the value of \(\sigma ^ { 2 }\) is in fact 10.0. The secretary collects an independent random sample of 50 observations of \(T\) and constructs a new \(99 \%\) confidence interval for \(\mu\) based on this sample of size 50, but using \(\sigma ^ { 2 } = 10.0\). Find the probability that this new confidence interval contains the value \(\mu + 1.6\).
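A sketch of both calculations, assuming that the sample means are approximately normal (both samples are large) and that a normal critical value is used in each interval. For the second part, the interval \(\bar{T} \pm h\) contains \(\mu + 1.6\) exactly when \(1.6 - h < \bar{T} - \mu < 1.6 + h\).

```python
import math
from statistics import NormalDist

std = NormalDist()
z99 = std.inv_cdf(0.995)                    # two-sided 99% critical value

# (a) Interval from the sample of 40, with the variance estimated from the data
n, sum_t, sum_t2 = 40, 396.0, 4271.40
mean = sum_t / n
s2 = (sum_t2 - n * mean**2) / (n - 1)       # unbiased variance estimate
half = z99 * math.sqrt(s2 / n)
ci = (mean - half, mean + half)

# (b) Known variance 10.0 and n = 50: T-bar - mu ~ N(0, 10/50)
se = math.sqrt(10.0 / 50)
h = z99 * se                                # half-width of the new interval
p_contains = std.cdf((1.6 + h) / se) - std.cdf((1.6 - h) / se)
print(ci, round(p_contains, 4))
```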
OCR Further Statistics 2023 June Q8
8 A team of researchers have reason to believe that the number of calls received in randomly chosen 10-minute intervals to a call centre can be well modelled by a Poisson distribution. To test this belief the researchers record the number of telephone calls received in 60 randomly chosen 10-minute intervals. The results, together with relevant calculations, are shown in the following table.
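| Number of calls, \(r\) | 0 | 1 | 2 | 3 | 4 | \(\geqslant 5\) | Total |
|---|---|---|---|---|---|---|---|
| Observed frequency, \(f\) | 18 | 13 | 12 | 9 | 8 | 0 | 60 |
| \(rf\) | 0 | 13 | 24 | 27 | 32 | 0 | 96 |
| \(r ^ { 2 } f\) | 0 | 13 | 48 | 81 | 128 | 0 | 270 |
| Expected frequency | 12.114 | 19.382 | 15.506 | 8.270 | 3.308 | 1.421 | 60 |
| Contribution to test statistic | 2.860 | 2.101 | 0.793 | 1.232 (cells \(r \geqslant 3\) combined) | | | 6.99 |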
  1. Calculate the mean of the observed number of calls received.
  2. Calculate the variance of the observed number of calls received.
  3. Comment on what your answers to parts (a) and (b) suggest about the proposed model.
  4. Explain why it is necessary to combine some cells in the table.
  5. Show how the values 15.506 and 0.793 in the table were obtained.
  6. Carry out the test, at the \(5 \%\) significance level.
In the light of the result of the test, the team consider that a different model is appropriate. They propose the following improved model:
$$P ( R = r ) = \begin{cases} \frac { 1 } { 60 } ( a + ( 2 - r ) b ) & r = 0,1,2,3,4 \\ 0 & \text { otherwise } \end{cases}$$
where \(a\) and \(b\) are integers.
  7. Use at least three of the observed frequencies to suggest appropriate values for \(a\) and \(b\). You should consider more than one possible pair of values, and explain which pair of values you consider best. (Do not carry out a goodness-of-fit test.)
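The tabulated values referred to in parts (a), (b) and (e) can be reproduced from the observed frequencies. This sketch uses the divisor \(n\) for the variance, which is an assumption (the unbiased version would multiply by \(60/59\)), and takes the Poisson mean to be the observed mean.

```python
import math

observed = [18, 13, 12, 9, 8, 0]          # frequencies for r = 0, 1, 2, 3, 4, >= 5
total = sum(observed)

# The >= 5 class has frequency 0, so treating it as r = 5 does not affect the sums
mean = sum(r * f for r, f in enumerate(observed)) / total
variance = sum(r * r * f for r, f in enumerate(observed)) / total - mean**2

# Expected frequency and chi-squared contribution for r = 2 under Po(mean)
expected_2 = total * math.exp(-mean) * mean**2 / math.factorial(2)
contribution_2 = (observed[2] - expected_2) ** 2 / expected_2
print(mean, round(variance, 2), round(expected_2, 3), round(contribution_2, 3))
```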