Questions — OCR Further Statistics (100 questions)

OCR Further Statistics 2024 June Q1
1 A discrete random variable \(X\) has the following distribution, where \(a , b\) and \(c\) are constants.
\(\begin{array} { | c | c | c | c | c | } \hline x & 0 & 1 & 2 & 3 \\ \hline \mathrm { P } ( X = x ) & a & b & c & 0.1 \\ \hline \end{array}\)
It is given that \(\mathrm { E } ( X ) = 1.25\) and \(\operatorname { Var } ( X ) = 0.8875\).
  1. Determine the values of \(a\), \(b\) and \(c\).
  2. The random variable \(Y\) is defined by \(Y = 7 - 2 X\). Write down the value of \(\operatorname { Var } ( Y )\).
  3. Twenty independent observations of \(X\) are obtained. The number of those observations for which \(X = 3\) is denoted by \(T\). Find the value of \(\operatorname { Var } ( T )\).
OCR Further Statistics 2024 June Q2
2 A newspaper article claimed that "taller dog owners have taller dogs as pets". Alex investigated this claim and obtained data from a random sample of 16 fellow students who owned exactly one dog. The results are summarised as follows, where the height of the student, in cm, is denoted by \(h\) and the height, in cm, of their dog is denoted by \(d\).
\(n = 16 \quad \sum h = 2880 \quad \sum d = 660 \quad \sum h ^ { 2 } = 519276 \quad \sum d ^ { 2 } = 30000 \quad \sum h d = 119425\)
  1. Calculate the value of Pearson's product moment correlation coefficient for the data.
  2. State what your answer tells you about a scatter diagram illustrating the data.
  3. Use the data to test, at the \(5 \%\) significance level, the claim of the newspaper article.
  4. Explain whether the answer to part (a) would be likely to be different if the dogs' weights had been used instead of their heights.
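A minimal sketch of the arithmetic behind part (a), assuming only the summary statistics quoted in the question. It is an illustration, not an official mark scheme; the variable names (`S_hh`, `S_dd`, `S_hd`) are chosen here for readability.

```python
from math import sqrt

# Summary statistics quoted in the question
# (h = owner height in cm, d = dog height in cm).
n = 16
sum_h, sum_d = 2880, 660
sum_hh, sum_dd, sum_hd = 519276, 30000, 119425

# Corrected sums of squares and products.
S_hh = sum_hh - sum_h ** 2 / n
S_dd = sum_dd - sum_d ** 2 / n
S_hd = sum_hd - sum_h * sum_d / n

# Pearson's product moment correlation coefficient.
r = S_hd / sqrt(S_hh * S_dd)
print(f"S_hh = {S_hh}, S_dd = {S_dd}, S_hd = {S_hd}, r = {r:.4f}")
```

For part (c), this value of `r` would be compared with the tabulated one-tailed 5% critical value for \(n = 16\).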
OCR Further Statistics 2024 June Q3
3 Research suggests that the mean reading age of a child about to start secondary school is 10.75. The reading ages, \(X\) years, of a random sample of 80 children who were about to start secondary school in a particular district were measured, and the results are summarised as follows. $$n = 80 \quad \sum x = 893 \quad \sum x ^ { 2 } = 10267$$
  1. Test at the \(5 \%\) significance level whether the mean reading age of children about to start secondary school in this district is not 10.75 .
  2. A student wrote: "Although we do not know that the distribution of \(X\) is normal, the central limit theorem allows us to assume that it is, as the sample size is large." This statement is incorrect. Give a corrected version of the student's statement.
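The test in part (a) can be sketched as follows, assuming a two-tailed test with the population variance estimated from the sample and the sample mean treated as approximately normal (justified by the central limit theorem for \(n = 80\)). This is a sketch under those assumptions, not a model answer.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the question.
n, sum_x, sum_xx = 80, 893, 10267
mu_0 = 10.75  # hypothesised mean reading age

x_bar = sum_x / n                             # sample mean
s2 = (sum_xx - sum_x ** 2 / n) / (n - 1)      # unbiased variance estimate
z = (x_bar - mu_0) / sqrt(s2 / n)             # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value

print(f"x_bar = {x_bar}, s^2 = {s2:.4f}, z = {z:.3f}, p = {p_value:.4f}")
```

The statistic `z` is compared with the two-tailed 5% critical values \(\pm 1.96\), or equivalently `p_value` with 0.05.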
OCR Further Statistics 2024 June Q4
4
  1. Write down the number of ways of choosing 5 objects from 12 distinct objects.
  2. Each possible set of 5 different integers selected from the integers \(1,2 , \ldots , 12\) is obtained, and for each set, the sum of the 5 integers is found. The sum \(S\) can take values between 15 and 50 inclusive. Part of the frequency distribution of \(S\) is shown in the following table, together with the cumulative frequencies.
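    \(\begin{array} { | l | c | c | c | c | c | c | c | c | c | } \hline S & 15 & 16 & 17 & 18 & 19 & 20 & 21 & 22 & 23 \\ \hline \text { Frequency } & 1 & 1 & 2 & 3 & 5 & 7 & 10 & 13 & 17 \\ \hline \text { Cumulative frequency } & 1 & 2 & 4 & 7 & 12 & 19 & 29 & 42 & 59 \\ \hline \end{array}\)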
    Use these numbers to determine the critical region for a 1-tail Wilcoxon rank-sum test at the \(2 \%\) significance level when \(m = 5\) and \(n = 7\).
  3. A student says that, for a Wilcoxon rank-sum test on samples of size \(m\) and \(n\), where \(m\) and \(n\) are large, the mean and variance of the test statistic \(R _ { m }\) are 200 and \(616 \frac { 2 } { 3 }\) respectively. Show that at least one of these values must be incorrect.
OCR Further Statistics 2024 June Q5
5 Some bird-watchers study the song of chaffinches in a particular wood. They investigate whether the number, \(N\), of separate bursts of song in a 5 minute period can be modelled by a Poisson distribution. They assume that a burst of song can be considered as a single event, and that bursts of song occur randomly.
(a) State two further assumptions needed for \(N\) to be well modelled by a Poisson distribution.
The bird-watchers record the value of \(N\) in each of 60 periods of 5 minutes. The mean and variance of the results are 3.55 and 5.6475 respectively.
(b) Explain what this suggests about the validity of a Poisson distribution as a model in this context.
The complete results are shown in the table.
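\(\begin{array} { | l | c | c | c | c | c | c | c | c | c | c | } \hline n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & \geqslant 9 \\ \hline \text { Frequency } & 10 & 3 & 7 & 8 & 13 & 6 & 6 & 2 & 5 & 0 \\ \hline \end{array}\)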
The bird-watchers carry out a \(\chi ^ { 2 }\) goodness of fit test at the \(5 \%\) significance level.
(c) State suitable hypotheses for the test.
(d) Determine the contribution to the test statistic for \(n = 3\).
(e) The total value of the test statistic, obtained by combining the cells for \(n \leqslant 1\) and also for \(n \geqslant 6\), is 9.202 , correct to 4 significant figures. Complete the goodness of fit test.
(f) It is known that chaffinches are more likely to sing in the presence of other chaffinches. Explain whether this fact affects the validity of a Poisson model for \(N\).
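A hedged sketch of the calculation for part (d). It assumes the frequency row reads 10, 3, 7, 8, 13, 6, 6, 2, 5, 0 for \(n = 0, 1, \ldots, 8, \geqslant 9\) (the only reading that totals 60 and reproduces the quoted mean and variance) and that the Poisson mean is estimated by the sample mean.

```python
from math import exp, factorial

# Observed frequencies for n = 0..8 (the class n >= 9 has frequency 0).
observed = {0: 10, 1: 3, 2: 7, 3: 8, 4: 13, 5: 6, 6: 6, 7: 2, 8: 5}

total = sum(observed.values())
mean = sum(k * f for k, f in observed.items()) / total
variance = sum(k * k * f for k, f in observed.items()) / total - mean ** 2


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


# Expected frequency and chi-squared contribution for the cell n = 3.
expected_3 = total * poisson_pmf(3, mean)
contribution_3 = (observed[3] - expected_3) ** 2 / expected_3

print(f"mean = {mean:.2f}, variance = {variance:.4f}")
print(f"E_3 = {expected_3:.3f}, contribution = {contribution_3:.3f}")
```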
OCR Further Statistics 2024 June Q6
6 A bag contains 6 identical blue counters and 5 identical yellow counters.
  1. Three counters are selected at random, without replacement. Find the probability that at least two of the counters are blue. All 11 counters are now arranged in a row in a random order.
  2. Find the probability that all the yellow counters are next to each other.
  3. Find the probability that no yellow counter is next to another yellow counter.
  4. Find the probability that the counters are arranged in such a way that both of the following conditions hold.
    • Exactly three of the yellow counters are next to one another.
    • Neither of the other two yellow counters is next to a yellow counter.
  5. Explain whether the answer to part (d) would be different if the yellow counters were numbered \(1, 2, 3, 4\) and \(5\), so that they are not identical.
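The arrangement probabilities in parts (b) to (d) can be checked by brute force. The sketch below assumes that each of the \(\binom{11}{5} = 462\) choices of positions for the yellow counters is equally likely; the helper `yellow_runs` is a name introduced here for illustration only.

```python
from fractions import Fraction
from itertools import combinations

N_TOTAL, N_YELLOW = 11, 5


def yellow_runs(positions):
    """Sorted lengths of the maximal runs of adjacent yellow positions."""
    runs, length = [], 1
    for prev, cur in zip(positions, positions[1:]):
        if cur == prev + 1:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return sorted(runs)


# One run-length pattern per equally likely colour arrangement.
layouts = [yellow_runs(c) for c in combinations(range(N_TOTAL), N_YELLOW)]
total = len(layouts)

p_all_together = Fraction(sum(r == [5] for r in layouts), total)
p_none_adjacent = Fraction(sum(r == [1] * 5 for r in layouts), total)
p_three_plus_two_singles = Fraction(sum(r == [1, 1, 3] for r in layouts), total)

print(total, p_all_together, p_none_adjacent, p_three_plus_two_singles)
```

Numbering the yellow counters would multiply every count, including the total, by \(5!\), so the ratios computed here would not change.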
OCR Further Statistics 2024 June Q7
7 The coordinates of a set of 10 points are denoted by \(( x _ { i } , y _ { i } )\) for \(i = 1,2 , \ldots , 10\). For a particular set of values of \(( x _ { i } , y _ { i } )\) and any constants \(a\) and \(b\) it can be shown that
\(\sum \left( y _ { i } - a - b x _ { i } \right) ^ { 2 } = 10 ( 11 - a - 6 b ) ^ { 2 } + 126 \left( b - \frac { 83 } { 42 } \right) ^ { 2 } + \frac { 139 } { 14 }\).
    1. Explain why \(\sum \left( y _ { i } - a - b x _ { i } \right) ^ { 2 }\) is minimised by taking \(b = \frac { 83 } { 42 }\) and \(a = 11 - 6 b\).
    2. Hence explain why the equation of the regression line of \(y\) on \(x\) for these points is given by the corresponding values of \(a\) and \(b\) (so that the equation is \(y = \frac { 83 } { 42 } x - \frac { 6 } { 7 }\)).
  2. State which of the following terms cannot apply to the variable \(X\) if the regression line of \(y\) on \(x\) can be used for estimating values of \(Y\): Dependent, Independent, Controlled, Response.
  3. Use the regression line to estimate the value of \(y\) corresponding to \(x = 8\).
  4. State what must be true of the value \(x = 8\) if the estimate in part (c) is to be reliable.
  5. Variables \(u\) and \(v\) are related to \(x\) and \(y\) by the following relationships.
    \(u = 2 + 4 x \quad v = 8 - 2 y\)
    Show that the gradient of the regression line of \(v\) on \(u\) is very close to \(-1\).
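A short sketch of the arithmetic in parts (c) and (e), using exact fractions. It relies on the standard scaling results \(S _ { uv } = ( 4 ) ( - 2 ) S _ { xy }\) and \(S _ { uu } = 4 ^ { 2 } S _ { xx }\) for linear coding, so the gradient of \(v\) on \(u\) is \(- \frac { 2 } { 4 }\) times the gradient of \(y\) on \(x\). It is offered as an illustration rather than a full written argument.

```python
from fractions import Fraction

# Regression coefficients of y on x, as given in the question.
b = Fraction(83, 42)
a = 11 - 6 * b

# Part (c): estimate of y at x = 8.
y_at_8 = a + b * 8

# Part (e): with u = 2 + 4x and v = 8 - 2y,
# gradient of v on u = S_uv / S_uu = (4 * -2 * S_xy) / (16 * S_xx).
gradient_v_on_u = Fraction(-2, 4) * b

print(a, y_at_8, float(y_at_8), gradient_v_on_u, float(gradient_v_on_u))
```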
OCR Further Statistics 2024 June Q8
8 A random sample of 100 students were given a task and the time taken by each student to complete the task was recorded. The maximum time allowed to complete the task was one minute and all students completed the task within the maximum time. The times, \(T\) minutes, for the random sample of students are summarised as follows.
\(n = 100 \quad \sum t = 61.88\)
A researcher proposes that \(T\) can be modelled by the continuous random variable with probability density function
\(f ( t ) = \begin{cases} \alpha t ^ { \alpha - 1 } & 0 \leqslant t \leqslant 1 , \\ 0 & \text { otherwise, } \end{cases}\)
where \(\alpha\) is a positive constant.
(a) In this question you must show detailed reasoning. By finding \(\mathrm { E } ( T )\) according to the researcher's model, determine an approximation for the value of \(\alpha\). Give your answer correct to 3 significant figures.
Further information about the times taken for the sample of 100 students to complete the task is given in the table.
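\(\begin{array} { | l | c | c | c | } \hline \text { Time } t & 0 \leqslant t < \frac { 1 } { 3 } & \frac { 1 } { 3 } \leqslant t < \frac { 2 } { 3 } & \frac { 2 } { 3 } \leqslant t \leqslant 1 \\ \hline \text { Frequency } & 18 & 37 & 45 \\ \hline \end{array}\)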
(b) Using the value of \(\alpha\) found in part (a), determine the extent to which the proposed model is a good model. (Do not carry out a goodness of fit test.)
OCR Further Statistics 2020 November Q1
1 The continuous random variable \(X\) has the distribution \(\mathrm { N } ( \mu , 30 )\). The mean of a random sample of 8 observations of \(X\) is 53.1. Determine a \(95 \%\) confidence interval for \(\mu\). You should give the end points of the interval correct to 4 significant figures.
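A minimal sketch of the interval calculation, assuming that \(\mathrm { N } ( \mu , 30 )\) denotes a known variance of 30, so that a normal (rather than \(t\)) interval is appropriate.

```python
from math import sqrt
from statistics import NormalDist

sigma2 = 30    # known population variance
n = 8          # sample size
x_bar = 53.1   # sample mean

z = NormalDist().inv_cdf(0.975)       # two-sided 95% multiplier
half_width = z * sqrt(sigma2 / n)
lower, upper = x_bar - half_width, x_bar + half_width

print(f"({lower:.2f}, {upper:.2f})")
```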
OCR Further Statistics 2020 November Q2
2 A book collector compared the prices of some books, \(\pounds x\), when new in 1972 and the prices of copies of the same books, \(\pounds y\), on a second-hand website in 2018.
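The results are shown in Table 1 and are summarised below the table.
Table 1
\(\begin{array} { | l | c | c | c | c | c | c | c | c | c | c | c | c | } \hline \text { Book } & \mathrm { A } & \mathrm { B } & \mathrm { C } & \mathrm { D } & \mathrm { E } & \mathrm { F } & \mathrm { G } & \mathrm { H } & \mathrm { I } & \mathrm { J } & \mathrm { K } & \mathrm { L } \\ \hline x & 0.95 & 0.65 & 0.70 & 0.90 & 0.55 & 1.40 & 1.50 & 0.50 & 1.15 & 0.35 & 0.20 & 0.35 \\ \hline y & 6.06 & 7.00 & 2.00 & 5.87 & 4.00 & 5.36 & 7.19 & 2.50 & 3.00 & 8.29 & 1.37 & 2.00 \\ \hline \end{array}\)
$$n = 12 , \quad \sum x = 9.20 , \quad \sum y = 54.64 , \quad \sum x ^ { 2 } = 8.9950 , \quad \sum y ^ { 2 } = 310.4572 , \quad \sum x y = 46.0545$$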
  1. It is given that the value of Pearson’s product-moment correlation coefficient for the data is 0.381, correct to 3 significant figures.
    1. State what this information tells you about a scatter diagram illustrating the data.
    2. Test at the \(5 \%\) significance level whether there is evidence of positive correlation between prices in 1972 and prices in 2018.
  2. The collector noticed that the second-hand copy of book J was unusually expensive and he decided to ignore the data for book J. Calculate the value of Pearson's product-moment correlation coefficient for the other 11 books.
OCR Further Statistics 2020 November Q3
3 Jo can use either of two different routes, A or B, for her journey to school. She believes that route A has shorter journey times. She measures how long her journey takes for 17 journeys by route A and 12 journeys by route B . She ranks the 29 journeys in increasing order of time taken, and she finds that the sum of the ranks of the journeys by route B is 219 .
  1. Test at the \(10 \%\) significance level whether route A has shorter journey times than route B .
  2. State an assumption about the 29 journeys which is necessary for the conclusion of the test to be valid.
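Part (a) can be sketched with the normal approximation to the Wilcoxon rank-sum statistic. The sketch assumes that \(m = 12\) (route B, the smaller sample) and \(n = 17\), that \(R _ { m } \sim \mathrm { N } \left( \frac { 1 } { 2 } m ( m + n + 1 ) , \frac { 1 } { 12 } m n ( m + n + 1 ) \right)\) approximately, and that a continuity correction is applied; large values of \(R _ { m }\) support the belief that route A is quicker.

```python
from math import sqrt
from statistics import NormalDist

m, n = 12, 17   # m: journeys by route B (smaller sample), n: journeys by route A
R_m = 219       # sum of the ranks of the route B journeys

mean = m * (m + n + 1) / 2
variance = m * n * (m + n + 1) / 12

# Upper-tail test with a continuity correction.
z = (R_m - 0.5 - mean) / sqrt(variance)
z_crit = NormalDist().inv_cdf(0.90)   # one-tailed 10% critical value
reject_h0 = z > z_crit

print(f"mean = {mean}, variance = {variance}, z = {z:.3f}, reject H0: {reject_h0}")
```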
OCR Further Statistics 2020 November Q4
4 The random variable \(X\) is equally likely to take any of the \(n\) integer values from \(m + 1\) to \(m + n\) inclusive. It is given that \(\mathrm { E } ( 3 X ) = 30\) and \(\operatorname { Var } ( 3 X ) = 36\). Determine the value of \(m\) and the value of \(n\).
OCR Further Statistics 2020 November Q5
5 26 cards are each labelled with a different letter of the alphabet, A to Z. The letters A, E, I, O and U are vowels.
  1. Five cards are selected at random without replacement. Determine the probability that the letters on at least three of the cards are vowels.
  2. All 26 cards are arranged in a line, in random order.
    1. Show that the probability that all the vowels are next to one another is \(\frac { 1 } { 2990 }\).
    2. Determine the probability that three of the vowels are next to each other, and the other two vowels are next to each other, but the five vowels are not all next to each other.
OCR Further Statistics 2020 November Q6
6 The numbers of CD players sold in a shop on three consecutive weekends were 7,6 and 2 . It may be assumed that sales of CD players occur randomly and that nobody buys more than one CD player at a time. The number of CD players sold on a randomly chosen weekend is denoted by \(X\).
  1. How appropriate is the Poisson distribution as a model for \(X\) ? Now assume that a Poisson distribution with mean 5 is an appropriate model for \(X\).
  2. Find
    1. \(\mathrm { P } ( X = 6 )\),
    2. \(\mathrm { P } ( X \geqslant 8 )\). The number of integrated sound systems sold in a weekend at the same shop can be assumed to have the distribution \(\operatorname { Po } ( 7.2 )\).
  3. Find the probability that on a randomly chosen weekend the total number of CD players and integrated sound systems sold is between 10 and 15 inclusive.
  4. State an assumption needed for your answer to part (c) to be valid.
  5. Give a reason why the assumption in part (d) may not be valid in practice.
OCR Further Statistics 2020 November Q7
7 A biased spinner has five sides, numbered 1 to 5. Elmer spins the spinner repeatedly and counts the number of spins, \(X\), up to and including the first time that the number 2 appears. He carries out this experiment 100 times and records the frequency \(f\) with which each value of \(X\) is obtained. His results are shown in Table 1, together with the values of \(x f\).
Table 1
\(\begin{array} { | l | c | c | c | c | c | c | c | c | } \hline x & 1 & 2 & 3 & 4 & 5 & 6 & \geqslant 7 & \text { Total } \\ \hline \text { Frequency } f & 20 & 15 & 9 & 13 & 10 & 10 & 23 & 100 \\ \hline x f & 20 & 30 & 27 & 52 & 50 & 60 & 161 & 400 \\ \hline \end{array}\)
  1. State an appropriate distribution with which to model \(X\), determining the value(s) of any parameter(s).
Elmer carries out a goodness-of-fit test, at the \(5 \%\) level, for the distribution in part (a). Table 2 gives some of his calculations, in which numbers that are not exact have been rounded to 3 decimal places.
Table 2
\(\begin{array} { | l | c | c | c | c | c | c | c | } \hline x & 1 & 2 & 3 & 4 & 5 & 6 & \geqslant 7 \\ \hline \text { Observed frequency } O & 20 & 15 & 9 & 13 & 10 & 10 & 23 \\ \hline \text { Expected frequency } E & 25 & 18.75 & 14.063 & 10.547 & 7.910 & 5.933 & 17.798 \\ \hline ( O - E ) ^ { 2 } / E & 1 & 0.75 & 1.823 & 0.571 & 0.552 & 2.789 & 1.520 \\ \hline \end{array}\)
  2. Show how the expected frequency corresponding to \(x \geqslant 7\) was obtained.
  3. Carry out the test.
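The figures in Table 2 can be reproduced with a short sketch. It assumes a geometric model with \(p\) estimated as the reciprocal of the sample mean (\(\sum x f / \sum f = 400 / 100\)) and uses \(\mathrm { P } ( X \geqslant 7 ) = ( 1 - p ) ^ { 6 }\) for the final cell; the variable names are illustrative.

```python
# Observed frequencies for x = 1..6 and the pooled class x >= 7.
observed = [20, 15, 9, 13, 10, 10, 23]
n_trials = sum(observed)   # 100 experiments
sum_xf = 400               # total quoted in Table 1

p = n_trials / sum_xf      # estimate of p = 1 / (sample mean)

# Expected frequencies: 100 * p * (1 - p)^(k - 1) for k = 1..6,
# then 100 * (1 - p)^6 for the tail x >= 7.
expected = [n_trials * p * (1 - p) ** (k - 1) for k in range(1, 7)]
expected.append(n_trials * (1 - p) ** 6)

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 3) for e in expected])
print(f"chi-squared statistic = {statistic:.3f}")
```

The statistic would then be compared with a \(\chi ^ { 2 }\) critical value; with 7 cells and one estimated parameter, 5 degrees of freedom is the usual choice.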
OCR Further Statistics 2020 November Q8
8 The continuous random variable \(X\) has probability density function $$f ( x ) = \begin{cases} \frac { k } { x ^ { n } } & x \geqslant 1 , \\ 0 & \text { otherwise, } \end{cases}$$ where \(n\) and \(k\) are constants and \(n\) is an integer greater than 1.
  1. Find \(k\) in terms of \(n\).
  2.
    1. When \(n = 4\), find the cumulative distribution function of \(X\).
    2. Hence determine \(\mathrm { P } ( X > 7 \mid X > 5 )\) when \(n = 4\).
  3. Determine the values of \(n\) for which \(\operatorname { Var } ( X )\) is not defined.
OCR Further Statistics 2021 November Q1
1 At a seaside resort the number \(X\) of ice-creams sold and the temperature \(Y ^ { \circ } \mathrm { F }\) were recorded on 20 randomly chosen summer days. The data can be summarised as follows.
\(\sum x = 1506 \quad \sum x ^ { 2 } = 127542 \quad \sum y = 1431 \quad \sum y ^ { 2 } = 104451 \quad \sum x y = 111297\)
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
  2. Explain the significance for the regression line of the quantity \(\sum \left[ y _ { i } - \left( a x _ { i } + b \right) \right] ^ { 2 }\).
  3. It is decided to measure the temperature in degrees Centigrade instead of degrees Fahrenheit. If the same temperature is measured both as \(f ^ { \circ }\) Fahrenheit and \(c ^ { \circ }\) Centigrade, the relationship between \(f\) and \(c\) is \(c = \frac { 5 } { 9 } ( f - 32 )\). Find the equation of the new regression line.
OCR Further Statistics 2021 November Q2
2 A discrete random variable \(D\) has the following probability distribution, where \(a\) is a constant.
\(\begin{array} { | c | c | c | c | c | } \hline d & 0 & 2 & 4 & 6 \\ \hline \mathrm { P } ( D = d ) & a & 0.1 & 0.3 & 0.2 \\ \hline \end{array}\)
Determine the value of \(\operatorname { Var } ( 3 D + 4 )\).
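A worked sketch using exact fractions, assuming the distribution reads \(\mathrm { P } ( D = 0 ) = a\), \(\mathrm { P } ( D = 2 ) = 0.1\), \(\mathrm { P } ( D = 4 ) = 0.3\), \(\mathrm { P } ( D = 6 ) = 0.2\), and using \(\operatorname { Var } ( 3 D + 4 ) = 3 ^ { 2 } \operatorname { Var } ( D )\).

```python
from fractions import Fraction

# Known probabilities; a = P(D = 0) follows from the total being 1.
probs = {2: Fraction(1, 10), 4: Fraction(3, 10), 6: Fraction(2, 10)}
probs[0] = 1 - sum(probs.values())

mean_D = sum(d * p for d, p in probs.items())
var_D = sum(d * d * p for d, p in probs.items()) - mean_D ** 2

# Adding a constant leaves the variance unchanged; scaling by 3 multiplies it by 9.
var_3D_plus_4 = 3 ** 2 * var_D

print(probs[0], mean_D, var_D, float(var_3D_plus_4))
```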
OCR Further Statistics 2021 November Q3
3 In a large collection of coloured marbles of identical size, the proportion of green marbles is \(p\). One marble is chosen randomly, its colour is noted, and it is then replaced. This process is repeated until a green marble is chosen. The first green marble chosen is the \(X\) th marble chosen.
  1. You are given that \(p = 0.3\).
    1. Find \(\mathrm { P } ( 5 \leqslant X \leqslant 10 )\).
    2. Determine the smallest value of \(n\) for which \(\mathrm { P } ( X = n ) < 0.1\).
  2. You are given instead that \(\operatorname { Var } ( X ) = 42\). Determine the value of \(\mathrm { E } ( X )\).
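The three parts can be checked numerically. This sketch assumes \(X \sim \operatorname { Geo } ( p )\), so that \(\mathrm { P } ( X \geqslant k ) = ( 1 - p ) ^ { k - 1 }\), \(\mathrm { E } ( X ) = 1 / p\) and \(\operatorname { Var } ( X ) = ( 1 - p ) / p ^ { 2 }\).

```python
from math import sqrt

# Part (a): p = 0.3.
p = 0.3
q = 1 - p

# P(5 <= X <= 10) = P(X >= 5) - P(X >= 11) = q^4 - q^10.
prob_5_to_10 = q ** 4 - q ** 10

# Smallest n with P(X = n) = p * q^(n - 1) below 0.1.
n = 1
while p * q ** (n - 1) >= 0.1:
    n += 1

# Part (b): (1 - p) / p^2 = 42 rearranges to 42 p^2 + p - 1 = 0;
# take the positive root of the quadratic.
p_b = (-1 + sqrt(1 + 4 * 42)) / (2 * 42)
mean_X = 1 / p_b

print(f"P(5 <= X <= 10) = {prob_5_to_10:.4f}, n = {n}, E(X) = {mean_X:.4f}")
```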
OCR Further Statistics 2021 November Q4
4 A random sample of 160 observations of a random variable \(X\) is selected. The sample can be summarised as follows.
\(n = 160 \quad \sum x = 2688 \quad \sum x ^ { 2 } = 48398\)
  1. Calculate unbiased estimates of the following.
    1. \(\mathrm { E } ( X )\)
    2. \(\operatorname { Var } ( X )\)
  2. Find a 99\% confidence interval for \(\mathrm { E } ( X )\), giving the end-points of the interval correct to 4 significant figures.
  3. Explain whether it was necessary to use the Central Limit Theorem in answering
    1. part (a),
    2. part (b).
OCR Further Statistics 2021 November Q5
5 The numbers of each of 9 items sold in two different supermarkets in a week are given in the following table.
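\(\begin{array} { | l | c | c | c | c | c | c | c | c | c | } \hline \text { Item } & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline \text { Supermarket } A & 17 & 28 & 41 & 43 & 62 & 69 & 75 & 93 & 115 \\ \hline \text { Supermarket } B & 24 & 7 & 18 & 12 & 47 & 29 & 58 & 42 & 37 \\ \hline \end{array}\)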
A researcher wants to test whether there is association between the numbers of these items sold in the two supermarkets. However, it is known that the collection of data in Supermarket \(B\) was done inaccurately and each of the numbers in the corresponding row of the table could have been in error by as much as 2 items greater or 2 items fewer.
  1. Explain why Spearman's rank correlation coefficient might be preferred to the use of Pearson's product-moment correlation coefficient in this context.
  2. Carry out the test at the \(5 \%\) significance level using Spearman's rank correlation coefficient.
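A hedged sketch of the Spearman calculation. It assumes the Supermarket B row reads 24, 7, 18, 12, 47, 29, 58, 42, 37 (a reading of the flattened table, chosen because it gives nine distinct values); if the row is read differently the result changes. The helper names `ranks` and `spearman` are illustrative.

```python
sales_a = [17, 28, 41, 43, 62, 69, 75, 93, 115]
sales_b = [24, 7, 18, 12, 47, 29, 58, 42, 37]   # assumed reading of the table


def ranks(values):
    """Rank from 1 (smallest) upwards; assumes there are no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation coefficient via the sum of squared rank differences."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


r_s = spearman(sales_a, sales_b)

# Smallest gap between neighbouring B values: if it exceeds 4, errors of up to
# 2 in either direction cannot change the order of the B values.
sorted_b = sorted(sales_b)
min_gap_b = min(hi - lo for lo, hi in zip(sorted_b, sorted_b[1:]))

print(f"r_s = {r_s:.4f}, smallest gap in B = {min_gap_b}")
```

The value of `r_s` would be compared with the tabulated two-tailed 5% critical value for \(n = 9\).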
OCR Further Statistics 2021 November Q6
6 A practice examination paper is taken by 500 candidates, and the organiser wishes to know what continuous distribution could be used to model the actual time, \(X\) minutes, taken by candidates to complete the paper. The organiser starts by carrying out a goodness-of-fit test for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) at the \(5 \%\) significance level. The grouped data and the results of some of the calculations are shown in the following table.
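\(\begin{array} { | l | c | c | c | c | c | } \hline \text { Time } & 0 \leqslant X < 80 & 80 \leqslant X < 90 & 90 \leqslant X < 100 & 100 \leqslant X < 110 & X \geqslant 110 \\ \hline \text { Observed frequency } O & 36 & 95 & 137 & 129 & 103 \\ \hline \text { Expected frequency } E & 45.606 & 80.641 & 123.754 & 123.754 & 126.246 \\ \hline \frac { ( O - E ) ^ { 2 } } { E } & 2.023 & 2.557 & 1.418 & 0.222 & 4.280 \\ \hline \end{array}\)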
  1. State suitable hypotheses for the test.
  2. Show how the figures 123.754 and 0.222 in the column for \(100 \leqslant X < 110\) were obtained. [3]
  3. Carry out the test.
  4. The organiser now wants to suggest an improved model for the data.
    1. Suggest an aspect of the data that the organiser should take into account in considering an improved model.
    2. The graph of the probability density function for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) is shown in the diagram in the Printed Answer Booklet. On the same diagram sketch the probability density function of an improved model that takes into account the aspect of the data in part (d)(i).
OCR Further Statistics 2021 November Q7
7 In a school opinion poll a random sample of 8 pupils were asked to rate school lunches on a scale of 0 to 20 . The results were as follows.
\(\begin{array} { l l l l l l l l } 0 & 1 & 2 & 3 & 4 & 10 & 11 & 13 \end{array}\) After a new menu was introduced, the test was repeated with a different random sample of 8 pupils. The results were as follows.
\(\begin{array} { l l l l l l l l } 7 & 8 & 9 & 14 & 15 & 17 & 19 & 20 \end{array}\)
  1. Carry out an appropriate Wilcoxon test at the \(5 \%\) significance level to test whether pupils' opinions of school lunches have changed. A statistics student tells the organisers of the opinion poll that it would have been better to have asked the same 8 pupils both times.
  2. Explain why the statistics student's suggestion would produce a better test.
  3. State which test should be used if the student's suggestion is followed.
  4. You are given that there are 12870 ways in which 8 different integers can be chosen from the integers 1 to 16 inclusive. Estimate the number of ways of selecting 8 different digits between 1 and 16 inclusive that have a sum less than or equal to the critical value used in the test in part (a).
OCR Further Statistics 2021 November Q8
8 The continuous random variable \(Y\) has a uniform distribution on [0,2].
  1. It is given that \(\mathrm { E } [ a \cos ( a Y ) ] = 0.3\), where \(a\) is a constant between 0 and 1 , and \(a Y\) is measured in radians. Determine the value of the constant \(a\).
  2. Determine the \(60 ^ { \text {th } }\) percentile of \(Y ^ { 2 }\).
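A sketch of both parts, assuming \(Y \sim \mathrm { U } [ 0,2 ]\) with density \(\frac { 1 } { 2 }\). Then \(\mathrm { E } [ a \cos ( a Y ) ] = \frac { 1 } { 2 } \int _ { 0 } ^ { 2 } a \cos ( a y ) \, \mathrm { d } y = \frac { 1 } { 2 } \sin ( 2 a )\), and \(\mathrm { P } ( Y ^ { 2 } \leqslant q ) = \frac { 1 } { 2 } \sqrt { q }\) for \(0 \leqslant q \leqslant 4\). The midpoint-rule integral is included only as a numerical cross-check.

```python
from math import asin, cos

# Part (a): sin(2a) / 2 = 0.3, taking the solution with 0 < a < 1.
a = asin(2 * 0.3) / 2

# Cross-check E[a cos(aY)] numerically with the midpoint rule on [0, 2].
steps = 100_000
h = 2 / steps
numeric_expectation = sum(a * cos(a * (i + 0.5) * h) for i in range(steps)) * h / 2

# Part (b): sqrt(q) / 2 = 0.6, so q = (2 * 0.6)^2.
percentile_60 = (2 * 0.6) ** 2

print(f"a = {a:.4f}, check = {numeric_expectation:.6f}, 60th percentile = {percentile_60:.2f}")
```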
OCR Further Statistics Specimen Q1
1 The table below shows the typical stopping distances \(d\) metres for a particular car travelling at \(v\) miles per hour.
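\(\begin{array} { | c | c | c | c | c | c | c | } \hline v & 20 & 30 & 40 & 50 & 60 & 70 \\ \hline d & 13 & 24 & 36 & 52 & 72 & 94 \\ \hline \end{array}\)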
  1. State each of the following words that describe the variable \(v\): Independent, Dependent, Controlled, Response.
  2. Calculate the equation of the regression line of \(d\) on \(v\).
  3. Use the equation found in part (ii) to estimate the typical stopping distance when this car is travelling at 45 miles per hour. It is given that the product moment correlation coefficient for the data is 0.990 correct to three significant figures.
  4. Explain whether your estimate found in part (iii) is reliable.
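Parts (ii) and (iii) can be sketched directly from the data, assuming the table reads \(v = 20, 30, \ldots , 70\) and \(d = 13, 24, 36, 52, 72, 94\). The correlation coefficient is computed as a check against the quoted value of 0.990.

```python
from math import sqrt

v = [20, 30, 40, 50, 60, 70]   # speed in miles per hour
d = [13, 24, 36, 52, 72, 94]   # stopping distance in metres

n = len(v)
v_bar, d_bar = sum(v) / n, sum(d) / n

S_vv = sum((vi - v_bar) ** 2 for vi in v)
S_dd = sum((di - d_bar) ** 2 for di in d)
S_vd = sum((vi - v_bar) * (di - d_bar) for vi, di in zip(v, d))

# Least squares regression line of d on v: d = a + b v.
b = S_vd / S_vv
a = d_bar - b * v_bar

estimate_45 = a + b * 45
r = S_vd / sqrt(S_vv * S_dd)

print(f"d = {a:.3f} + {b:.4f} v, estimate at 45 mph = {estimate_45:.1f}, r = {r:.3f}")
```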
OCR Further Statistics Specimen Q2
2 The mass \(J \mathrm {~kg}\) of a bag of randomly chosen Jersey potatoes is a normally distributed random variable with mean 1.00 and standard deviation 0.06. The mass \(K \mathrm {~kg}\) of a bag of randomly chosen King Edward potatoes is an independent normally distributed random variable with mean 0.80 and standard deviation 0.04.
  1. Find the probability that the total mass of 6 bags of Jersey potatoes and 8 bags of King Edward potatoes is greater than 12.70 kg .
  2. Find the probability that the mass of one bag of King Edward potatoes is more than \(75 \%\) of the mass of one bag of Jersey potatoes.
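Both parts use linear combinations of independent normal variables. The sketch assumes that all bags are independent, so the total has mean \(6 ( 1.00 ) + 8 ( 0.80 )\) and variance \(6 ( 0.06 ) ^ { 2 } + 8 ( 0.04 ) ^ { 2 }\), and that part (ii) asks for \(\mathrm { P } ( K - 0.75 J > 0 )\).

```python
from math import sqrt
from statistics import NormalDist

mu_j, sd_j = 1.00, 0.06   # Jersey bag mass, kg
mu_k, sd_k = 0.80, 0.04   # King Edward bag mass, kg

# Part (i): total mass of 6 Jersey bags and 8 King Edward bags.
total = NormalDist(6 * mu_j + 8 * mu_k, sqrt(6 * sd_j ** 2 + 8 * sd_k ** 2))
p_total_over_12_70 = 1 - total.cdf(12.70)

# Part (ii): K > 0.75 J is the same event as K - 0.75 J > 0.
diff = NormalDist(mu_k - 0.75 * mu_j, sqrt(sd_k ** 2 + 0.75 ** 2 * sd_j ** 2))
p_k_over_75_percent_of_j = 1 - diff.cdf(0)

print(f"(i) {p_total_over_12_70:.4f}  (ii) {p_k_over_75_percent_of_j:.4f}")
```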