Questions — OCR MEI Further Statistics Major (78 questions)

OCR MEI Further Statistics Major 2023 June Q5
5 Amari is investigating how accurately people can estimate a short time period. He asks each of a random sample of 40 people to estimate a period of 20 seconds. For each person, he starts a stopwatch and then stops it when they tell him that they think that 20 s has elapsed. The times which he records are denoted by \(x \mathrm {~s}\). You are given that
\(\sum x = 765 , \quad \sum x ^ { 2 } = 15065\).
  1. Determine a 95\% confidence interval for the mean estimated time.
  2. Amari says that the confidence interval supports the suggestion that people can estimate 20 s accurately. Make two comments about Amari's statement.
  3. Discuss whether you could have constructed the confidence interval if there had only been 10 people involved in the experiment.
Amari thinks that people would be able to estimate more accurately if he gave them a second attempt. He repeats the experiment with each person and again records the times. Software is used to produce a \(95 \%\) confidence interval for the mean estimated time. The output from the software is shown below.
\begin{tabular}{|l|l|}
\hline
\multicolumn{2}{|l|}{Z Estimate of a Mean} \\
\hline
Confidence level & 0.95 \\
\hline
\multicolumn{2}{|l|}{Sample} \\
\hline
Mean & 19.68 \\
s & 1.38 \\
N & 40 \\
\hline
\multicolumn{2}{|l|}{Result} \\
\hline
\multicolumn{2}{|l|}{Z Estimate of a Mean} \\
\hline
Mean & 19.68 \\
s & 1.38 \\
SE & 0.2182 \\
N & 40 \\
Interval & \(19.68 \pm 0.4277\) \\
\hline
\end{tabular}
  4. State the confidence interval in the form \(a < \mu < b\).
  5. Make two comments based on this confidence interval about Amari's opinion that second attempts result in more accurate estimates.
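As a way of checking the arithmetic in part 1, the sketch below builds a Normal-based interval from the two sums given in the question. It is not the mark scheme's method; the function name `z_interval_from_sums` is hypothetical, and it assumes the sample of 40 is large enough to use a \(z\) value with the sample variance.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_from_sums(n, sum_x, sum_x2, level=0.95):
    """Return (mean, half_width) of a Normal-based interval for the mean."""
    mean = sum_x / n
    # Unbiased sample variance computed from the raw sums.
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean, z * sqrt(variance / n)


mean, half = z_interval_from_sums(40, 765, 15065)
print(f"{mean - half:.2f} < mu < {mean + half:.2f}")
```

Whether 20 lies inside the printed interval is the starting point for the comments asked for in part 2.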
OCR MEI Further Statistics Major 2023 June Q6
6 A student wonders if there is any correlation between download and upload speeds of data to and from the internet. The student decides to carry out a hypothesis test to investigate this and so measures the download speed \(x\) and upload speed \(y\) in suitable units on 20 randomly chosen occasions. The scatter diagram below illustrates the data which the student collected.
\includegraphics[max width=\textwidth, alt={}, center]{c692fb20-436f-4bc1-89bd-10fdba41ceba-07_824_1411_440_246}
  1. Explain why the student decides to carry out a test based on the product moment correlation coefficient.
Summary statistics for the 20 occasions are as follows. $$\sum x = 342.10 \quad \sum y = 273.65 \quad \sum x ^ { 2 } = 5989.53 \quad \sum y ^ { 2 } = 3919.53 \quad \sum x y = 4713.62$$
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient.
  3. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is any correlation between download speed and upload speed.
  4. Both of the variables, download speed and upload speed, are random. Explain why, if download speed had been a non-random variable, the student could not have carried out the hypothesis test to investigate whether there was any correlation between download speed and upload speed.
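For part 2, a minimal sketch of the product moment correlation coefficient computed from the summary statistics above. The helper name `pmcc_from_sums` is an assumption made for illustration; a written answer would still need to show each of \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\).

```python
from math import sqrt


def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product moment correlation coefficient from raw sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)


r = pmcc_from_sums(20, 342.10, 273.65, 5989.53, 3919.53, 4713.62)
print(round(r, 4))
```

The value of `r` would then be compared with the tabulated two-tailed critical value for \(n = 20\) at the \(5 \%\) level.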
OCR MEI Further Statistics Major 2023 June Q7
7 An analyst routinely examines bottles of hair shampoo in order to check that the average percentage of a particular chemical which the shampoo contains does not exceed the value of \(1.0 \%\) specified by the manufacturer. The percentages of the chemical in a random sample of 12 bottles of the shampoo are as follows.
\(\begin{array} { l l l l l l l l l l l l } 1.087 & 1.171 & 1.047 & 0.846 & 0.909 & 1.052 & 1.042 & 0.893 & 1.021 & 1.085 & 1.096 & 0.931 \end{array}\)
The analyst uses software to draw a Normal probability plot for these data, and to carry out a Normality test as shown below.
\includegraphics[max width=\textwidth, alt={}, center]{c692fb20-436f-4bc1-89bd-10fdba41ceba-08_524_1539_694_264}
  1. The analyst is going to carry out a hypothesis test to check whether the average percentage exceeds 1.0\%. Explain which test the analyst should use, referring to each of the following.
    • The Normal probability plot
    • The \(p\)-value of the Kolmogorov-Smirnov test
  2. In this question you must show detailed reasoning. Carry out the test at the 5\% significance level.
OCR MEI Further Statistics Major 2023 June Q8
8 The random variable \(X\) has a continuous uniform distribution over [0,10].
  1. Find the probability that, if two independent values of \(X\) are taken, one is less than 3 and the other is greater than 3.
The random variable \(T\) denotes the sum of 5 independent values of \(X\).
  2. State the value of \(\mathrm { P } ( T \leqslant 25 )\).
The spreadsheet below shows the heading row and the first 20 data rows from a total of 100 data rows of a simulation of the distribution of \(X\). Each of the 100 rows shows a simulation of 5 independent values of \(X\), together with \(T\), the sum of the 5 values. All of the values have been rounded to 2 decimal places. In column I the spreadsheet shows the number of values of \(T\) that are less than or equal to the corresponding values in column H. For example, there are 75 simulated values of \(T\) that are less than or equal to 30.
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|}
\hline
 & A & B & C & D & E & F & G & H & I \\
\hline
1 & \(X _ { 1 }\) & \(X _ { 2 }\) & \(X _ { 3 }\) & \(X _ { 4 }\) & \(X _ { 5 }\) & \(T\) & & \(t\) & Number \(\leqslant t\) \\
\hline
2 & 3.73 & 6.65 & 4.93 & 0.41 & 9.33 & 25.06 & & 0 & 0 \\
3 & 4.95 & 6.58 & 4.48 & 2.51 & 7.26 & 25.79 & & 5 & 0 \\
4 & 8.10 & 4.87 & 4.26 & 3.83 & 0.79 & 21.85 & & 10 & 1 \\
5 & 6.70 & 4.10 & 5.10 & 1.82 & 6.76 & 24.48 & & 15 & 4 \\
6 & 3.73 & 8.38 & 8.49 & 9.87 & 1.31 & 31.79 & & 20 & 23 \\
7 & 3.22 & 4.36 & 0.12 & 1.34 & 9.49 & 18.53 & & 25 & 48 \\
8 & 9.17 & 7.13 & 5.47 & 4.35 & 2.44 & 28.55 & & 30 & 75 \\
9 & 3.42 & 1.93 & 6.04 & 2.99 & 8.85 & 23.24 & & 35 & 93 \\
10 & 0.98 & 0.68 & 9.82 & 9.83 & 7.28 & 28.58 & & 40 & 99 \\
11 & 5.86 & 1.67 & 7.77 & 4.08 & 7.14 & 26.52 & & 45 & 100 \\
12 & 9.20 & 0.31 & 5.82 & 5.31 & 6.45 & 27.10 & & 50 & 100 \\
13 & 7.04 & 4.30 & 2.06 & 0.06 & 4.16 & 17.62 & & & \\
14 & 0.31 & 5.02 & 1.48 & 5.37 & 1.77 & 13.94 & & & \\
15 & 3.77 & 6.04 & 1.21 & 7.67 & 5.01 & 23.69 & & & \\
16 & 1.21 & 5.54 & 1.90 & 1.43 & 6.91 & 17.00 & & & \\
17 & 9.27 & 1.98 & 5.80 & 9.37 & 9.34 & 35.76 & & & \\
18 & 4.30 & 5.66 & 2.80 & 1.56 & 1.19 & 15.51 & & & \\
19 & 7.15 & 3.19 & 6.89 & 5.41 & 2.18 & 24.82 & & & \\
20 & 6.18 & 6.32 & 3.01 & 6.49 & 9.12 & 31.13 & & & \\
21 & 5.03 & 5.99 & 5.19 & 6.97 & 3.55 & 26.73 & & & \\
\hline
\end{tabular}
  3. Use the spreadsheet output to estimate each of the following.
    • \(\mathrm { P } ( T \leqslant 25 )\)
    • \(\mathrm { P } ( T > 35 )\)
  4. In this question you must show detailed reasoning. The random variable \(Y\) is the mean of 100 independent values of \(T\). Determine an estimate of \(\mathrm { P } ( Y > 26 )\).
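The final part relies on the Central Limit Theorem. The sketch below is one way to check the estimate numerically, assuming that \(Y\) is approximately Normal with mean \(\mathrm{E}(T)\) and variance \(\operatorname{Var}(T)/100\); the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

a, b = 0, 10        # X is uniform on [a, b]
k, n = 5, 100       # T sums k values of X; Y averages n values of T

mean_t = k * (a + b) / 2          # E(T) = 5 * 5
var_t = k * (b - a) ** 2 / 12     # Var(T) = 5 * 100/12

# Normal approximation to the distribution of the sample mean Y.
y = NormalDist(mean_t, sqrt(var_t / n))
p = 1 - y.cdf(26)
print(round(p, 4))
```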
OCR MEI Further Statistics Major 2023 June Q9
9 A cyclist who lives on an island suspects that car drivers with locally registered number plates allow more space when passing her than those with non-locally registered number plates. She decides to carry out a hypothesis test and so over a period of time selects a random sample of 250 cars which pass her. For each car she estimates whether the car driver allows at least the recommended 1.5 metres when passing her. The table shows the data which she collected.
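\begin{tabular}{|l|l|c|c|}
\cline { 3 - 4 } \multicolumn{2}{c|}{} & \multicolumn{2}{c|}{Where registered} \\
\cline { 3 - 4 } \multicolumn{2}{c|}{} & Local & Non-local \\
\hline
\multirow{2}{*}{Passing distance} & Under 1.5 m & 12 & 11 \\
\cline { 2 - 4 } & At least 1.5 m & 157 & 70 \\
\hline
\end{tabular}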
  1. In this question you must show detailed reasoning. Carry out the test at the \(5 \%\) significance level to examine whether there is any association between where the car is registered and passing distance.
  2. A friend of the cyclist suggests that there may be a problem with the data, since the cyclist may have introduced some bias in estimating whether cars were allowing the recommended distance. Explain how any bias might have arisen.
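The chi-squared statistic for part 1 can be checked with a short sketch like the one below. It assumes no continuity correction is applied, and it takes 3.841 as the tabulated \(5 \%\) critical value for 1 degree of freedom; the function name is hypothetical.

```python
def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E over a two-way table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Rows: under 1.5 m, at least 1.5 m. Columns: local, non-local.
observed = [[12, 11], [157, 70]]
statistic = chi_squared_statistic(observed)
CRITICAL_5_PERCENT_1_DF = 3.841  # from standard tables
print(round(statistic, 3), statistic > CRITICAL_5_PERCENT_1_DF)
```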
OCR MEI Further Statistics Major 2023 June Q10
10 The continuous random variable \(X\) has probability density function given by
\(f ( x ) = \begin{cases} \frac { 4 } { 15 } \left( \frac { a } { x ^ { 2 } } + 3 x ^ { 2 } - \frac { 7 } { 2 } \right) & 1 \leqslant x \leqslant 2 , \\
0 & \text { otherwise, } \end{cases}\)
where \(a\) is a positive constant.
  1. Find the cumulative distribution function of \(X\) in terms of \(a\).
  2. Hence or otherwise determine the value of \(a\).
  3. Show that the median value \(m\) of \(X\) satisfies the equation $$8 m ^ { 4 } - 28 m ^ { 2 } + 9 m - 4 = 0 .$$
  4. Verify that the median value of \(X\) is 1.74, correct to \(\mathbf { 2 }\) decimal places.
  5. Find \(\mathrm { E } ( X )\).
  6. Determine the mode of \(X\).
OCR MEI Further Statistics Major 2023 June Q11
11 The random variable \(X\) takes the value 1 with probability \(p\) and the value 0 with probability \(1 - p\).
  1. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
  2. The random variable \(Y \sim \mathrm { B } ( 50,0.2 )\) has mean \(\mu\) and variance \(\sigma ^ { 2 }\). Use the results of part (a) to prove that
    • \(\mu = 10\)
    • \(\sigma ^ { 2 } = 8\).
OCR MEI Further Statistics Major 2024 June Q1
1 The number of insurance policy sales made per month by a salesperson is modelled by the random variable \(X\), with probability distribution shown in the table.
\begin{tabular}{|c|c|c|c|c|c|c|c|}
\hline
\(r\) & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\
\hline
\(\mathrm { P } ( X = r )\) & 0.05 & 0.1 & 0.25 & 0.3 & 0.15 & 0.1 & 0.05 \\
\hline
\end{tabular}
  1. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
    The salesperson is paid a basic salary of \(\pounds 1000\) per month plus \(\pounds 500\) for each policy that is sold.
  2. Find the mean and standard deviation of the salesperson's monthly salary.
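A minimal sketch for checking both parts: it computes \(\mathrm{E}(X)\) and \(\operatorname{Var}(X)\) from the table, then applies the linear transformation salary \(= 1000 + 500X\). The variable names are illustrative.

```python
from math import sqrt

# Probability distribution of X taken from the table in the question.
distribution = {0: 0.05, 1: 0.1, 2: 0.25, 3: 0.3, 4: 0.15, 5: 0.1, 6: 0.05}

mean_x = sum(r * p for r, p in distribution.items())
var_x = sum(r * r * p for r, p in distribution.items()) - mean_x ** 2

# For a linear function aX + b: mean scales and shifts, sd scales by |a|.
salary_mean = 1000 + 500 * mean_x
salary_sd = 500 * sqrt(var_x)
print(round(mean_x, 2), round(var_x, 2), round(salary_mean, 2), round(salary_sd, 2))
```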
OCR MEI Further Statistics Major 2024 June Q2
2 The number of cars arriving per minute to queue at a drive-through fast-food restaurant is modelled by the random variable \(X\). The standard deviation of \(X\) is 0.6. You should assume that arrivals are random and independent and occur at a constant average rate.
  1. Find the mean of \(X\).
  2.
    1. Calculate \(\mathrm { P } ( X = 1 )\).
    2. Calculate \(\mathrm { P } ( X > 1 )\).
  3. Find the probability that fewer than 5 cars arrive in a randomly chosen 20-minute period.
OCR MEI Further Statistics Major 2024 June Q3
3 At a launderette the process of cleaning a load of clothes consists of three stages: washing, drying and folding. The times in minutes for each process are modelled by independent Normal distributions with means and standard deviations as shown in the table.
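\begin{tabular}{|l|c|c|}
\cline { 2 - 3 } \multicolumn{1}{c|}{} & Mean & Standard deviation \\
\hline
Washing & 35 & 2.4 \\
\hline
Drying & 46 & 3.1 \\
\hline
Folding & 12 & 2.2 \\
\hline
\end{tabular}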
  1. Find the probability that drying a randomly chosen load of clothes takes more than 50 minutes.
  2. It is given that for \(99 \%\) of loads of clothes the washing time is less than \(k\) minutes. Find the value of \(k\).
  3. Determine the probability that the drying time for a randomly chosen load of clothes is less than the total of the washing and folding times.
  4. Determine the probability that the mean time for cleaning 5 randomly chosen loads of clothes is less than 90 minutes. You should assume that the time for cleaning any load is independent of the time for cleaning any other load.
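The four parts all reduce to linear combinations of independent Normal variables. The sketch below uses Python's `statistics.NormalDist`, which supports adding and subtracting independent Normal distributions, as a way to check answers; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

washing = NormalDist(35, 2.4)
drying = NormalDist(46, 3.1)
folding = NormalDist(12, 2.2)

# Part 1: P(drying > 50).
p_drying_over_50 = 1 - drying.cdf(50)

# Part 2: k such that P(washing < k) = 0.99.
k = washing.inv_cdf(0.99)

# Part 3: P(drying < washing + folding), i.e. P(D - (W + F) < 0).
difference = drying - (washing + folding)
p_drying_less = difference.cdf(0)

# Part 4: mean total time of 5 independent loads is below 90 minutes.
total = washing + drying + folding
mean_of_5 = NormalDist(total.mean, total.stdev / sqrt(5))
p_mean_under_90 = mean_of_5.cdf(90)

print(round(p_drying_over_50, 4), round(k, 2),
      round(p_drying_less, 4), round(p_mean_under_90, 4))
```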
OCR MEI Further Statistics Major 2024 June Q4
4 An archer fires arrows at a circular target of radius 50 cm. The distance in cm that an arrow lands from the centre of the target is modelled by the random variable \(X\), with probability density function given by
\(f ( x ) = \begin{cases} a x & 0 \leqslant x \leqslant 50 , \\
0 & \text { otherwise, } \end{cases}\)
where \(a\) is a constant.
  1. Determine the value of \(a\).
  2. Determine the probability that an arrow will land within 5 cm of the centre of the target.
  3. Determine the median distance from the centre of the target that an arrow will land.
OCR MEI Further Statistics Major 2024 June Q5
5 A researcher is investigating whether doing yoga has any effect on quality of sleep in older people. The researcher selects a random sample of 40 older people, who then complete a yoga course. Before they start the course and again at the end, the 40 people fill in a questionnaire which measures their perceived sleep quality. The higher the score, the better is the perceived quality of sleep. The researcher uses software to produce a 90\% confidence interval for the difference in mean sleep quality (sleep quality after the course minus sleep quality before the course). The output from the software is shown below.
\begin{tabular}{|l|l|}
\hline
\multicolumn{2}{|l|}{Z Estimate of a Mean} \\
\hline
Confidence level & 0.9 \\
\hline
\multicolumn{2}{|l|}{Sample} \\
\hline
Mean & 0.586 \\
\(s\) & 2.14 \\
N & 40 \\
\hline
\multicolumn{2}{|l|}{Result} \\
\hline
\multicolumn{2}{|l|}{Z Estimate of a Mean} \\
\hline
Mean & 0.586 \\
\(s\) & 2.14 \\
SE & 0.3384 \\
N & 40 \\
Lower limit & 0.029 \\
Upper limit & 1.143 \\
Interval & \(0.586 \pm 0.557\) \\
\hline
\end{tabular}
  1. Explain why the confidence interval is based on the Normal distribution even though the distribution of the population of differences is not known.
  2. Explain whether the confidence interval suggests that the mean sleep qualities before and after completing a yoga course are different.
  3. In the output from the software, SE stands for 'standard error'.
    1. Explain what standard error is.
    2. Show how the standard error was calculated in this case.
  4. A colleague of the researcher suggests that the confidence level should have been \(95 \%\) rather than \(90 \%\). Determine whether this would have made a difference to your answer to part (b).
OCR MEI Further Statistics Major 2024 June Q6
6 A student is investigating the relationship between age and grip strength in adults. The student selects 10 people and records their ages in years and the grip strengths of their dominant hand, measured in kg. The data are shown in the table below, together with a scatter diagram to illustrate the data.
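\begin{tabular}{|l|c|c|c|c|c|c|c|c|c|c|}
\hline
Age & 22 & 29 & 36 & 39 & 53 & 57 & 60 & 71 & 76 & 82 \\
\hline
Grip strength & 38 & 46 & 42 & 49 & 37 & 47 & 36 & 33 & 34 & 24 \\
\hline
\end{tabular}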
\includegraphics[max width=\textwidth, alt={}]{bab116b3-6e5f-44db-ac86-670e4040d649-05_634_1107_641_239}
The student decides to carry out a hypothesis test to investigate whether there is negative association between age and grip strength.
  1. Explain why the student decides to carry out a test based on Spearman's rank correlation coefficient.
  2. State what property of the sample is required in order for it to be valid to carry out a hypothesis test.
  3. In this question you must show detailed reasoning. Assuming that the property in part (b) holds, carry out the test at the \(5 \%\) significance level.
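For part 3, Spearman's rank correlation coefficient can be checked with the sketch below. It uses the formula \(r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}\), which assumes there are no tied values (true for these data); the helper names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman's rank correlation coefficient via the sum of d^2."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


ages = [22, 29, 36, 39, 53, 57, 60, 71, 76, 82]
grip = [38, 46, 42, 49, 37, 47, 36, 33, 34, 24]
r_s = spearman(ages, grip)
print(round(r_s, 4))
```

The result would then be compared with the tabulated one-tailed critical value for \(n = 10\) at the \(5 \%\) level.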
OCR MEI Further Statistics Major 2024 June Q7
7 An environmental investigator wants to check whether the level of selenium in carrots in fields near a mine is different from the usual level in the country, which is \(9.4 \mathrm { ng } / \mathrm { g }\) (nanograms per gram). She takes a random sample of 10 carrots from fields near the mine and measures the selenium level of each of them in \(\mathrm { ng } / \mathrm { g }\), with results as follows.
\(\begin{array} { l l l l l l l l l l } 6.20 & 10.72 & 11.42 & 16.32 & 15.33 & 10.56 & 8.83 & 9.21 & 7.78 & 14.32 \end{array}\)
  1. Find estimates of each of the following.
    • The population mean
    • The population standard deviation
    The investigator produces a Normal probability plot and carries out a Kolmogorov-Smirnov test for these data as shown in the diagram.
    \includegraphics[max width=\textwidth, alt={}, center]{bab116b3-6e5f-44db-ac86-670e4040d649-06_583_1499_959_242}
  2. Comment on what the Normal probability plot and the \(p\)-value of the test suggest about the data.
  3. State the null hypothesis for the Kolmogorov-Smirnov test for Normality.
  4. In this question you must show detailed reasoning. Carry out a test at the \(5 \%\) significance level to investigate whether the mean selenium level in carrots from fields near the mine is different from \(9.4 \mathrm { ng } / \mathrm { g }\).
  5. If the \(p\)-value of the Kolmogorov-Smirnov test for Normality had been 0.007, explain what procedure you could have used to investigate the selenium level in carrots from fields near the mine.
OCR MEI Further Statistics Major 2024 June Q8
8 An estate agent collects data for a random selection of 13 flats in order to investigate the link between the floor areas of flats and their price. The scatter diagram shows the floor areas, \(x \mathrm {~m} ^ { 2 }\), and prices, \(\pounds y\) thousand, of the 13 flats.
\includegraphics[max width=\textwidth, alt={}, center]{bab116b3-6e5f-44db-ac86-670e4040d649-07_613_1246_386_242}
  1. The estate agent notes that two of the data points are outliers. One is Flat A which has a large floor area but is in poor condition. The other is Flat B which has a balcony with a desirable view overlooking the sea. Label these two data points on the copy of the scatter diagram in the Printed Answer Booklet.
The estate agent decides to remove these two data points from the analysis. Summary statistics for the remaining 11 flats are as follows. $$\sum x = 652.5 \quad \sum y = 5067 \quad \sum x ^ { 2 } = 41987.35 \quad \sum y ^ { 2 } = 2456813 \quad \sum x y = 315928.2$$
  2. In this question you must show detailed reasoning. Calculate the equation of a regression line which is suitable for estimating the price of a flat from its floor area.
  3. Use the regression line to estimate the price for the following floor areas.
    • \(40 \mathrm {~m} ^ { 2 }\)
    • \(110 \mathrm {~m} ^ { 2 }\)
  4. Given that the value of the product moment correlation coefficient for these 11 data items is 0.765, comment on the reliability of your estimates.
  5. The estate agent thinks that he can predict the floor area of a flat from its price, using the equation of the regression line found in part (b). Comment briefly on the estate agent's idea.
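The regression line of \(y\) on \(x\) can be checked from the summary statistics with a sketch like the one below. The function name `regression_y_on_x` is an assumption for illustration, and the printed estimates are in thousands of pounds.

```python
def regression_y_on_x(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (intercept, gradient) of the least squares line y = a + b x."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    gradient = s_xy / s_xx
    # The line passes through the point of means.
    intercept = sum_y / n - gradient * sum_x / n
    return intercept, gradient


a, b = regression_y_on_x(11, 652.5, 5067, 41987.35, 315928.2)
for area in (40, 110):
    print(area, round(a + b * area, 1))
```

Because this line minimises vertical residuals in \(y\), it is the line for predicting price from area; predicting area from price would call for the regression of \(x\) on \(y\).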
OCR MEI Further Statistics Major 2024 June Q9
9 A cyclist has 3 bicycles, a road bike, a gravel bike and an electric bike. She wishes to know if the bicycle which she is riding makes any difference to whether she reaches a speed of 25 mph or greater on a journey. She selects a random sample of 120 journeys and notes the bicycle and whether or not her maximum speed was 25 mph or greater. She decides to carry out a chi-squared test to investigate whether there is any association between bicycle type and whether her maximum speed is 25 mph or greater. Tables 9.1 and 9.2 show the data and some of the expected frequencies for the test.
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Table 9.1}
\begin{tabular}{|l|l|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{} & \multicolumn{4}{c|}{Bicycle} \\
\cline { 3 - 6 } \multicolumn{2}{|c|}{} & Road & Gravel & Electric & Total \\
\hline
\multirow{2}{*}{Maximum speed} & Less than 25 mph & 2 & 21 & 19 & 42 \\
\cline { 2 - 6 } & 25 mph or greater & 13 & 47 & 18 & 78 \\
\hline
\multicolumn{2}{|l|}{Total} & 15 & 68 & 37 & 120 \\
\hline
\end{tabular}
\end{table}
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Table 9.2}
\begin{tabular}{|l|l|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Expected frequency}} & \multicolumn{3}{c|}{Bicycle} \\
\cline { 3 - 5 } \multicolumn{2}{|c|}{} & Road & Gravel & Electric \\
\hline
\multirow{2}{*}{Maximum speed} & Less than 25 mph & & & 12.95 \\
\cline { 2 - 5 } & 25 mph or greater & & & 24.05 \\
\hline
\end{tabular}
\end{table}
  1. Complete the table of expected frequencies in the Printed Answer Booklet.
  2. Determine the contribution to the chi-squared test statistic for the Electric bicycle and maximum speed 25 mph or greater. Give your answer correct to 4 decimal places.
The contributions to the chi-squared test statistic for the remaining categories are shown in Table 9.3.
\begin{table}[h]
\captionsetup{labelformat=empty} \caption{Table 9.3}
\begin{tabular}{|l|l|c|c|c|}
\hline
\multicolumn{2}{|c|}{\multirow{2}{*}{Contribution to the test statistic}} & \multicolumn{3}{c|}{Bicycle} \\
\cline { 3 - 5 } \multicolumn{2}{|c|}{} & Road & Gravel & Electric \\
\hline
\multirow{2}{*}{Maximum speed} & Less than 25 mph & 2.0119 & 0.3294 & 2.8264 \\
\cline { 2 - 5 } & 25 mph or greater & 1.0833 & 0.1774 & \\
\hline
\end{tabular}
\end{table}
  3. In this question you must show detailed reasoning. Carry out the test at the 5\% significance level.
  4. For each type of bicycle, give a brief interpretation of what the data suggest about maximum speed.
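The expected frequencies, the individual contributions and the test statistic can all be checked with the sketch below. It assumes 5.991 as the tabulated \(5 \%\) critical value for 2 degrees of freedom; the variable names are illustrative.

```python
# Rows: less than 25 mph, 25 mph or greater. Columns: road, gravel, electric.
observed = [[2, 21, 19], [13, 47, 18]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected frequency = row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Contribution of each cell = (O - E)^2 / E.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
statistic = sum(sum(row) for row in contributions)
CRITICAL_5_PERCENT_2_DF = 5.991  # from standard tables

print([[round(e, 2) for e in row] for row in expected])
print(round(contributions[1][2], 4), round(statistic, 4))
```

Cells with the largest contributions, together with the sign of observed minus expected, indicate which bicycles are most out of line with independence, which is the basis for part 4.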
OCR MEI Further Statistics Major 2024 June Q10
10 Ben takes an underground train to work and back home each day. The waiting time is defined as the time from when he reaches the station platform until he boards the train. On his way to work the waiting time is \(X\) minutes, where \(X\) is modelled by a continuous uniform distribution on \([ 0,6 ]\). On his way back from work, the waiting time is \(Y\) minutes, where \(Y\) is modelled by a continuous uniform distribution on [0,4]. Ben's total waiting time for both journeys is \(Z\) minutes, where \(Z = X + Y\). You should assume that \(X\) and \(Y\) are independent.
  1. Find \(\mathrm { E } ( Z )\).
  2. Ben thinks that \(Z\) will be well modelled by a continuous uniform distribution on \([ 0,10 ]\). By considering variances, show that he is not correct.
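  3. Ben's friend Jamila constructs the spreadsheet below, which shows a simulation of 20 values of \(X , Y\) and \(Z\). All of the values have been rounded to 2 decimal places.
    \begin{tabular}{|c|c|c|c|}
    \hline
     & A & B & C \\
    \hline
    1 & \(X\) & \(Y\) & \(Z\) \\
    \hline
    2 & 1.17 & 3.83 & 5.01 \\
    3 & 2.01 & 0.81 & 2.82 \\
    4 & 1.27 & 1.52 & 2.78 \\
    5 & 1.41 & 3.94 & 5.35 \\
    6 & 4.11 & 2.94 & 7.05 \\
    7 & 1.76 & 0.96 & 2.72 \\
    8 & 3.29 & 0.98 & 4.27 \\
    9 & 0.77 & 0.22 & 0.99 \\
    10 & 0.99 & 1.44 & 2.43 \\
    11 & 4.79 & 2.43 & 7.22 \\
    12 & 3.82 & 3.93 & 7.75 \\
    13 & 5.25 & 2.74 & 7.99 \\
    14 & 2.64 & 0.48 & 3.12 \\
    15 & 1.54 & 2.18 & 3.72 \\
    16 & 2.71 & 1.66 & 4.36 \\
    17 & 0.04 & 3.24 & 3.28 \\
    18 & 5.95 & 3.12 & 9.07 \\
    19 & 5.22 & 1.21 & 6.42 \\
    20 & 4.16 & 0.11 & 4.27 \\
    21 & 1.02 & 0.99 & 2.01 \\
    22 & & & \\
    \hline
    \end{tabular}
    Write down an estimate of \(\mathrm { P } ( Z > 6 )\).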
  4. Use a Normal approximation to determine the probability that Ben's total waiting time when travelling to and from work on 40 days is more than 210 minutes.
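Parts 1, 2 and 4 rest on the mean and variance of a continuous uniform distribution, \(\frac{a+b}{2}\) and \(\frac{(b-a)^2}{12}\). The sketch below checks the variance comparison and the Normal approximation for 40 days, assuming the daily totals are independent; the helper name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def uniform_mean_var(a, b):
    """Mean and variance of a continuous uniform distribution on [a, b]."""
    return (a + b) / 2, (b - a) ** 2 / 12


mean_x, var_x = uniform_mean_var(0, 6)
mean_y, var_y = uniform_mean_var(0, 4)

# X and Y are independent, so means and variances both add.
mean_z, var_z = mean_x + mean_y, var_x + var_y
var_if_uniform = uniform_mean_var(0, 10)[1]   # variance under Ben's model

# Total waiting time over 40 days, approximated by a Normal distribution.
days = 40
total = NormalDist(days * mean_z, sqrt(days * var_z))
p_over_210 = 1 - total.cdf(210)

print(mean_z, round(var_z, 4), round(var_if_uniform, 4), round(p_over_210, 4))
```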
OCR MEI Further Statistics Major 2024 June Q11
11 The discrete random variable \(X\) has a uniform distribution over the set of all integers between 25 and \(n\) inclusive, where \(n\) is a positive integer with \(n > 25\).
  1. Determine \(\mathrm { P } \left( X < \frac { n + 25 } { 2 } \right)\) in each of the following cases.
    • \(n\) is even
    • \(n\) is odd
  2. Determine an expression in terms of \(n\) for the variance of the mean of 100 independent values of \(X\).
  3. Given that \(n = 75\), calculate an estimate of the probability that the mean of 100 independent values of \(X\) is less than 48.
OCR MEI Further Statistics Major 2024 June Q12
12 The cumulative distribution function of the continuous random variable \(X\) is given by
\(F ( x ) = \begin{cases} 0 & x < 20 , \\
a \left( x ^ { 2 } + b x + c \right) & 20 \leqslant x \leqslant 30 , \\
1 & x > 30 , \end{cases}\)
where \(a\), \(b\) and \(c\) are constants.
You are given that \(\mathrm { P } ( X < 25 ) = \frac { 11 } { 24 }\).
  1. Find \(\mathrm { P } ( X > 27 )\).
  2. Find the 90th percentile of \(X\).
OCR MEI Further Statistics Major 2020 November Q1
1 In a game at a fair, players choose 4 countries from a list of 10 countries. The names of all 10 countries are then put in a box and the player selects 4 of them at random. The random variable \(X\) represents the number of countries that match those which the player originally chose.
  1. Show that the probability that a randomly selected player matches all 4 countries is \(\frac { 1 } { 210 }\).
Table 1 shows the probability distribution of \(X\).
\begin{table}[h]
\begin{tabular}{|c|c|c|c|c|c|}
\hline
\(r\) & 0 & 1 & 2 & 3 & 4 \\
\hline
\(\mathrm { P } ( X = r )\) & \(\frac { 1 } { 14 }\) & \(\frac { 8 } { 21 }\) & \(\frac { 3 } { 7 }\) & \(\frac { 4 } { 35 }\) & \(\frac { 1 } { 210 }\) \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table}
  2. Find each of the following.
    • \(\mathrm { E } ( X )\)
    • \(\operatorname { Var } ( X )\)
  3. A player has to pay \(\pounds 1\) to play the game. The player gets 40 pence back for every country which is matched. Find the mean and standard deviation of the player's loss per game.
  4. In order to try to attract more customers, the rules will be changed as follows. The game will still cost \(\pounds 1\) to play. The player will get 25 pence back for every country which is matched, plus an additional bonus of \(\pounds 100\) if all four countries are matched. Find the player's mean gain or loss per game with these new rules.
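The probabilities in Table 1 follow from counting: there are \(\binom{4}{r}\) ways to pick \(r\) of the player's 4 countries and \(\binom{6}{4-r}\) ways to pick the rest, out of \(\binom{10}{4}\) equally likely selections. The sketch below reproduces the table with exact fractions; the function name `match_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb


def match_probability(r, chosen=4, listed=10, drawn=4):
    """P(exactly r of the drawn countries are among the player's chosen ones)."""
    ways = comb(chosen, r) * comb(listed - chosen, drawn - r)
    return Fraction(ways, comb(listed, drawn))


distribution = {r: match_probability(r) for r in range(5)}
mean_x = sum(r * p for r, p in distribution.items())
var_x = sum(r * r * p for r, p in distribution.items()) - mean_x ** 2

for r, p in distribution.items():
    print(r, p)
print(mean_x, var_x)
```

The loss per game under the original rules is a linear function of \(X\), so its mean and standard deviation follow directly from `mean_x` and `var_x`.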
OCR MEI Further Statistics Major 2020 November Q2
2 On average 1 in 4000 people have a particular antigen in their blood (an antigen is a molecule which may cause an adverse reaction).
  1.
    1. A random sample of 1200 people is selected. The random variable \(X\) represents the number of people in the sample who have this antigen in their blood. Explain why you could use either a binomial distribution or a Poisson distribution to model the distribution of \(X\).
    2. Use either a binomial or a Poisson distribution to calculate each of the following probabilities.
      • \(\mathrm { P } ( X = 3 )\)
      • \(\mathrm { P } ( X > 3 )\)
  2. A researcher needs to find 2 people with the antigen. Find the probability that at most 5000 people have to be tested in order to achieve this.
OCR MEI Further Statistics Major 2020 November Q3
3 A supermarket sells cashew nuts in three different sizes of bag: small, medium and large. The weights in grams of the nuts in each type of bag are modelled by independent Normal distributions as shown in Table 3.
\begin{table}[h]
\begin{tabular}{|l|c|c|}
\hline
Bag size & Mean & Standard deviation \\
\hline
Small & 51.5 & 1.1 \\
\hline
Medium & 100.7 & 1.6 \\
\hline
Large & 201.3 & 1.7 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 3}
\end{table}
  1. Find the probability that the mean weight of two randomly selected large bags is at least 200 g.
  2. Find the probability that the total weight of eight randomly selected small bags is greater than the total weight of two randomly selected medium bags and one randomly selected large bag.
OCR MEI Further Statistics Major 2020 November Q4
4 An amateur meteorologist records the total rainfall at her home each day using a traditional rain gauge. This means that she has to go out each day at 9 am to read the rain gauge and then to empty it. She wants to save time by using a digital rain gauge, but she also wants to ensure that the readings from the digital gauge are similar to those of her traditional gauge. Over a period of 100 days, she uses both gauges to measure the rainfall. The meteorologist uses software to produce a 95\% confidence interval for the difference between the two readings (the traditional gauge reading minus the digital gauge reading). The output from the software is shown in Fig. 4. Although rainfall was measured over a period of 100 days, there was no rain on 40 of those days and so the sample size in the software output is 60 rather than 100.
\begin{table}[h]
\begin{tabular}{|l|l|}
\hline
\multicolumn{2}{|l|}{Z Estimate of a Mean} \\
\hline
Confidence Level & 0.95 \\
\hline
\multicolumn{2}{|l|}{Sample} \\
\hline
Mean & 0.1173 \\
\hline
\multicolumn{2}{|l|}{Result} \\
\hline
\multicolumn{2}{|l|}{Z Estimate of a Mean} \\
\hline
Mean & 0.1173 \\
\(\sigma\) & 0.5766 \\
SE & 0.07444 \\
N & 60 \\
Lower Limit & -0.0286 \\
Upper Limit & 0.2632 \\
Interval & \(0.1173 \pm 0.1459\) \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 4}
\end{table}
  1. Explain why this confidence interval can be calculated even though nothing is known about the distribution of the population of differences.
  2. State the confidence interval which the software gives in the form \(a < \mu < b\).
  3. Show how the value 0.07444 (labelled SE) was calculated.
  4. Comment on whether you think that the confidence interval suggests that the two different methods of measurement are broadly in agreement.
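The figures in the software output can be reproduced with a short sketch, which may help with part 3. It assumes the standard error is \(\sigma / \sqrt{N}\) and that the interval uses the two-sided \(95 \%\) Normal critical value; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mean, sigma, n = 0.1173, 0.5766, 60   # values read from Fig. 4

standard_error = sigma / sqrt(n)
z = NormalDist().inv_cdf(0.975)       # about 1.96
half_width = z * standard_error

print(round(standard_error, 5), round(half_width, 4))
print(round(mean - half_width, 4), round(mean + half_width, 4))
```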
OCR MEI Further Statistics Major 2020 November Q5
5 A hearing expert is investigating whether web-based hearing tests can be used instead of hearing tests in a hearing laboratory. The expert selects a random sample of 16 people with normal hearing. Each of them is given two hearing tests, one in the laboratory and one web-based. The scores in the laboratory-based test, \(x\), and the web-based test, \(y\), are both measured in the same suitable units.
  1. Half of the participants do the laboratory-based test first and the other half do the web-based test first. Explain why the expert adopts this approach.
The scatter diagram in Fig. 5 shows the data that the expert collected.
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{8d36bc92-07ac-40c3-9e75-26f2bc9d2fcc-05_785_1360_1009_242} \captionsetup{labelformat=empty} \caption{Fig. 5}
\end{figure}
Summary statistics for these data are as follows. $$\Sigma x = 198.0 \quad \Sigma x ^ { 2 } = 2936.92 \quad \Sigma y = 188.7 \quad \Sigma y ^ { 2 } = 2605.35 \quad \Sigma x y = 2554.87$$
  2. Calculate the equation of the regression line suitable for estimating web-based scores from laboratory-based scores.
  3. Estimate the web-based scores of people whose laboratory-based scores were as follows.
    • 12
    • 25
  4. Comment on the reliability of each of your estimates.
  5. A colleague of the expert suggests that the regression line is not valid because one of the data values is an outlier. Stating the approximate coordinates of the outlier, suggest what the expert should do.
OCR MEI Further Statistics Major 2020 November Q6
6 A pollution control officer is investigating a possible link between the levels of various pollutants in the air and the speed of the wind at various sites. A random sample of 60 values of the wind-speed together with the levels of a variety of pollutants is taken at a particular site. The product moment correlation coefficient between wind-speed and nitrogen dioxide level is 0.3231.
  1. Carry out a hypothesis test at the \(10 \%\) significance level to investigate whether there is any correlation between wind-speed and nitrogen dioxide level.
  2. State the condition required for the test carried out in part (a) to be valid.
Table 6.1 shows the values of the product moment correlation coefficient between 5 different measures of pollution and also wind-speed for a very large random sample of values at another site. Those correlations that are significant at the \(10 \%\) level are denoted by a * after the value of the correlation.
\begin{table}[h]
\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
Correlations & PM10 & SPEED & \(\mathrm { NO } _ { 2 }\) & \(\mathrm { O } _ { 3 }\) & PM25 & \(\mathrm { SO } _ { 2 }\) \\
\hline
PM10 & 1.00 & & & & & \\
\hline
SPEED & 0.08* & 1.00 & & & & \\
\hline
\(\mathrm { NO } _ { 2 }\) & 0.59* & 0.25* & 1.00 & & & \\
\hline
\(\mathrm { O } _ { 3 }\) & -0.05* & -0.04* & -0.30* & 1.00 & & \\
\hline
PM25 & 0.85* & -0.01 & 0.56* & -0.02 & 1.00 & \\
\hline
\(\mathrm { SO } _ { 2 }\) & 0.42* & 0.15* & 0.73* & -0.63* & 0.40* & 1.00 \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 6.1}
\end{table}
Table 6.2 shows standard guidelines for effect sizes.
\begin{table}[h]
\begin{tabular}{|c|c|}
\hline
Product moment correlation coefficient & Effect size \\
\hline
0.1 & Small \\
\hline
0.3 & Medium \\
\hline
0.5 & Large \\
\hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 6.2}
\end{table}
The officer analyses these data for effect size.
  3. Explain how the very large sample size relates to the interpretation of the correlation coefficients shown in Table 6.1.
  4. Comment briefly on what the pollution control officer might conclude from these tables, relevant to her investigation into wind-speed and pollutant levels.