5.06c Fit other distributions: discrete and continuous

72 questions

CAIE Further Paper 4 2020 November Q3
7 marks Standard +0.8
3 A random sample of 200 observations of the continuous random variable \(X\) was taken and the values are summarised in the following table.
Interval | \(0 \leqslant x < 0.5\) | \(0.5 \leqslant x < 1\) | \(1 \leqslant x < 1.5\) | \(1.5 \leqslant x < 2\) | \(2 \leqslant x < 2.5\) | \(2.5 \leqslant x < 3\)
Observed frequency | 5 | 23 | 40 | 41 | 46 | 45
It is required to test the goodness of fit of the distribution with probability density function f given by $$f ( x ) = \begin{cases} \frac { 1 } { 9 } x ( 4 - x ) & 0 \leqslant x \leqslant 3 \\ 0 & \text { otherwise } \end{cases}$$ Most of the relevant expected frequencies, correct to 2 decimal places, are given in the following table.
Interval | \(0 \leqslant x < 0.5\) | \(0.5 \leqslant x < 1\) | \(1 \leqslant x < 1.5\) | \(1.5 \leqslant x < 2\) | \(2 \leqslant x < 2.5\) | \(2.5 \leqslant x < 3\)
Expected frequency | \(p\) | \(q\) | 37.96 | 43.52 | 43.52 | 37.96
  1. Show that \(p = 10.19\) and find the value of \(q\).
  2. Carry out a goodness of fit test, at the \(5 \%\) significance level, to test whether f is a satisfactory model for the data.
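As a hedged illustration of how expected frequencies of this kind arise, the sketch below integrates the stated density to get \(F(x) = \frac{1}{9}\left(2x^2 - \frac{x^3}{3}\right)\) and multiplies each interval probability by the sample size of 200. The function names `cdf` and `expected_frequencies` are illustrative only and do not come from the question.

```python
def cdf(x: float) -> float:
    """F(x) for f(x) = x(4 - x)/9 on [0, 3], obtained by integrating the pdf."""
    return (2 * x**2 - x**3 / 3) / 9


def expected_frequencies(edges, n):
    """Expected count in each interval [a, b) for a sample of size n."""
    return [n * (cdf(b) - cdf(a)) for a, b in zip(edges, edges[1:])]


edges = [0, 0.5, 1, 1.5, 2, 2.5, 3]
expected = expected_frequencies(edges, 200)
print([round(e, 2) for e in expected])
```

The first two entries correspond to \(p\) and \(q\); the remaining four can be checked against the values printed in the table.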
CAIE Further Paper 4 2021 November Q2
8 marks Standard +0.3
2 It is claimed that the heights of a particular age group of boys follow a normal distribution with mean 125 cm and standard deviation 12 cm. Observations for a randomly chosen group of 60 boys in this age group are summarised in the following table. The table also gives the expected frequencies, correct to 2 decimal places, based on the normal distribution with mean 125 cm and standard deviation 12 cm.
Height, \(x \mathrm {~cm}\) | \(x < 100\) | \(100 \leqslant x < 110\) | \(110 \leqslant x < 120\) | \(120 \leqslant x < 130\) | \(130 \leqslant x < 140\) | \(x \geqslant 140\)
Observed frequency | 0 | 3 | 15 | 23 | 11 | 8
Expected frequency | 1.12 | 5.22 | 13.97 | 19.38 | 13.97 | 6.34
  1. Show how the expected frequency for \(130 \leqslant x < 140\) is obtained.
  2. Carry out a goodness of fit test, at the \(5 \%\) significance level, to determine whether the claim is supported by the data.
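A minimal sketch of the calculation behind the normal expected frequencies, assuming Python's standard `statistics.NormalDist` for the cumulative probabilities. The variable names are illustrative; the class boundaries, mean, standard deviation and sample size are those given in the question.

```python
from statistics import NormalDist

model = NormalDist(mu=125, sigma=12)
n = 60

# Cumulative probabilities at the class boundaries, with open-ended tails.
cuts = [100, 110, 120, 130, 140]
cumulative = [0.0] + [model.cdf(c) for c in cuts] + [1.0]

# Expected frequency = n * P(lower <= X < upper) for each class.
expected = [n * (hi - lo) for lo, hi in zip(cumulative, cumulative[1:])]
print([round(e, 2) for e in expected])
```

The fifth entry is the value for \(130 \leqslant x < 140\), that is \(60 \times \left( \Phi(1.25) - \Phi(0.4167) \right)\).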
CAIE Further Paper 4 2023 November Q2
8 marks Standard +0.3
2 The number of breakdowns on a particular section of road is recorded each day over a period of 90 days. It is suggested that the number of breakdowns follows a Poisson distribution with mean 3.5. The data is summarised in the table, together with some of the expected frequencies resulting from the suggested Poisson distribution.
Number of breakdowns per day | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 or more
Observed frequency | 0 | 5 | 13 | 17 | 21 | 16 | 9 | 5 | 4
Expected frequency | 2.718 | 9.512 | 16.646 | | 16.993 | 11.895 | | 3.469 | 2.407
  1. Complete the table.
  2. Carry out a goodness of fit test, at the 10\% significance level, to determine whether or not \(\operatorname { Po } ( 3.5 )\) is a good fit to the data.
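The sketch below shows one way the Poisson expected frequencies could be generated, with the final cell taken as the remaining probability so that the total matches the 90 days. The helper name `poisson_expected` is hypothetical.

```python
from math import exp, factorial


def poisson_expected(mean, n, last):
    """Expected frequencies for 0, 1, ..., last - 1 plus a final 'last or more' cell."""
    probs = [exp(-mean) * mean**k / factorial(k) for k in range(last)]
    probs.append(1 - sum(probs))  # tail probability P(X >= last)
    return [n * p for p in probs]


expected = poisson_expected(3.5, 90, 8)
print([round(e, 3) for e in expected])
```

The entries at positions 3 and 6 are the two values needed to complete the table.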
OCR S3 2007 June Q8
14 marks Standard +0.3
8 The continuous random variable \(Y\) has a distribution with mean \(\mu\) and variance 20. A random sample of 50 observations of \(Y\) is selected and these observations are summarised in the following grouped frequency table.
Values | \(y < 20\) | \(20 \leqslant y < 25\) | \(25 \leqslant y < 30\) | \(y \geqslant 30\)
Frequency | 3 | 27 | 12 | 8
  1. Assuming that \(Y \sim \mathrm {~N} ( 25,20 )\), show that the expected frequency for the interval \(20 \leqslant y < 25\) is 18.41, correct to 2 decimal places, and obtain the remaining expected frequencies.
  2. Test, at the \(5 \%\) significance level, whether the distribution \(\mathrm { N } ( 25,20 )\) fits the data.
  3. Given that the sample mean is 24.91, find a \(98 \%\) confidence interval for \(\mu\).
  4. Does the outcome of the test in part (ii) affect the validity of the confidence interval found in part (iii)? Justify your answer.
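For the confidence interval in part 3, a sketch under the assumption that the sample mean of 50 observations is treated as approximately normal with known variance 20, so the interval is \(\bar{y} \pm z \sqrt{20/50}\). The function name is illustrative, and `NormalDist().inv_cdf` supplies the two-sided 98% point.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, variance, n, level):
    """Two-sided interval for the mean when the population variance is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. the 0.99 quantile for 98%
    half_width = z * sqrt(variance / n)
    return sample_mean - half_width, sample_mean + half_width


low, high = mean_confidence_interval(24.91, 20, 50, 0.98)
print(round(low, 2), round(high, 2))
```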
OCR S3 Specimen Q4
10 marks Standard +0.3
4 The lengths of time, in seconds, between vehicles passing a fixed observation point on a road were recorded at a time when traffic was flowing freely. The frequency distribution in Table 1 is a summary of the data from 100 observations.
Table 1
Time interval (\(x\) seconds) | \(0 < x \leqslant 5\) | \(5 < x \leqslant 10\) | \(10 < x \leqslant 20\) | \(20 < x \leqslant 40\) | \(40 < x\)
Observed frequency | 49 | 22 | 20 | 7 | 2
It is thought that the distribution of times might be modelled by the continuous random variable \(X\) with probability density function given by $$f ( x ) = \begin{cases} 0.1 e ^ { - 0.1 x } & x > 0 \\ 0 & \text { otherwise } \end{cases}$$ Using this model, the expected frequencies (correct to 2 decimal places) for the given time intervals are shown in Table 2.
Table 2
Time interval (\(x\) seconds) | \(0 < x \leqslant 5\) | \(5 < x \leqslant 10\) | \(10 < x \leqslant 20\) | \(20 < x \leqslant 40\) | \(40 < x\)
Expected frequency | 39.35 | 23.87 | 23.25 | 11.70 | 1.83
  1. Show how the expected frequency of 23.87, corresponding to the interval \(5 < x \leqslant 10\), is obtained.
  2. Test, at the 10\% significance level, the goodness of fit of the model to the data.
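As a rough check of Table 2, the sketch below uses the exponential distribution function \(F(x) = 1 - e^{-0.1x}\) and multiplies each interval probability by 100. The names `exponential_cdf` and `edges` are illustrative, not part of the question.

```python
from math import exp, inf


def exponential_cdf(x, rate):
    """F(x) = 1 - exp(-rate * x) for x >= 0, with F(inf) = 1."""
    return 1.0 if x == inf else 1 - exp(-rate * x)


edges = [0, 5, 10, 20, 40, inf]
expected = [
    100 * (exponential_cdf(b, 0.1) - exponential_cdf(a, 0.1))
    for a, b in zip(edges, edges[1:])
]
print([round(e, 2) for e in expected])
```

The second entry is \(100 \left( e^{-0.5} - e^{-1} \right)\), the value asked for in part 1.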
OCR MEI S3 2007 January Q4
18 marks Standard +0.3
4
  1. An amateur weather forecaster has been keeping records of air pressure, measured in atmospheres. She takes the measurement at the same time every day using a barometer situated in her garden. A random sample of 100 of her observations is summarised in the table below. The corresponding expected frequencies for a Normal distribution, with its two parameters estimated by sample statistics, are also shown in the table.
    Pressure (\(a\) atmospheres) | Observed frequency | Frequency as given by Normal model
    \(a \leqslant 0.98\) | 4 | 1.45
    \(0.98 < a \leqslant 0.99\) | 6 | 5.23
    \(0.99 < a \leqslant 1.00\) | 9 | 13.98
    \(1.00 < a \leqslant 1.01\) | 15 | 23.91
    \(1.01 < a \leqslant 1.02\) | 37 | 26.15
    \(1.02 < a \leqslant 1.03\) | 21 | 18.29
    \(1.03 < a\) | 8 | 10.99
    Carry out a test at the \(5 \%\) level of significance of the goodness of fit of the Normal model. State your conclusion carefully and comment on your findings.
  2. The forecaster buys a new digital barometer that can be linked to her computer for easier recording of observations. She decides that she wishes to compare the readings of the new barometer with those of the old one. For a random sample of 10 days, the readings (in atmospheres) of the two barometers are shown below.
    Day | A | B | C | D | E | F | G | H | I | J
    Old | 0.992 | 1.005 | 1.001 | 1.011 | 1.026 | 0.980 | 1.020 | 1.025 | 1.042 | 1.009
    New | 0.985 | 1.003 | 1.002 | 1.014 | 1.022 | 0.988 | 1.030 | 1.016 | 1.047 | 1.025
    Use an appropriate Wilcoxon test to examine at the \(10 \%\) level of significance whether there is any reason to suppose that, on the whole, readings on the old and new barometers do not agree.
OCR MEI S3 2007 June Q4
18 marks Standard +0.3
4 A machine produces plastic strip in a continuous process. Occasionally there is a flaw at some point along the strip. The length of strip (in hundreds of metres) between successive flaws is modelled by a continuous random variable \(X\) with probability density function \(\mathrm { f } ( x ) = \frac { 18 } { ( 3 + x ) ^ { 3 } }\) for \(x > 0\). The table below gives the frequencies for 100 randomly chosen observations of \(X\). It also gives the probabilities for the class intervals using the model.
Length \(x\) (hundreds of metres) | Observed frequency | Probability
\(0 < x \leqslant 0.5\) | 21 | 0.2653
\(0.5 < x \leqslant 1\) | 24 | 0.1722
\(1 < x \leqslant 2\) | 12 | 0.2025
\(2 < x \leqslant 3\) | 15 | 0.1100
\(3 < x \leqslant 5\) | 13 | 0.1094
\(5 < x \leqslant 10\) | 9 | 0.0874
\(x > 10\) | 6 | 0.0532
  1. Examine the fit of this model to the data at the \(5 \%\) level of significance.
You are given that the median length between successive flaws is 124 metres. At a later date the following random sample of ten lengths (in metres) between flaws is obtained. $$\begin{array} { l l l l l l l l l l } 239 & 77 & 179 & 221 & 100 & 312 & 52 & 129 & 236 & 42 \end{array}$$
  2. Test at the \(10 \%\) level of significance whether the median length may still be assumed to be 124 metres.
OCR MEI S4 2008 June Q4
24 marks Standard +0.3
4
  1. State the usual model, including the accompanying distributional assumptions, for the one-way analysis of variance. Interpret the terms in the model.
  2. An examinations authority is considering using an external contractor for the typesetting and printing of its examination papers. Four contractors are being investigated. A random sample of 20 examination papers over the entire range covered by the authority is selected and 5 are allocated at random to each contractor for preparation. The authority carefully checks the printed papers for errors and assigns a score to each to indicate the overall quality (higher scores represent better quality). The scores are as follows.
    Contractor A | Contractor B | Contractor C | Contractor D
    41 | 54 | 56 | 41
    49 | 45 | 45 | 36
    50 | 50 | 54 | 46
    44 | 50 | 50 | 38
    56 | 47 | 49 | 35
    [The sum of these data items is 936 and the sum of their squares is 44544.]
    Construct the usual one-way analysis of variance table. Carry out the appropriate test, using a \(5 \%\) significance level. Report briefly on your conclusions.
  3. The authority thinks that there might be differences in the ways the contractors cope with the preparation of examination papers in different subject areas. For this purpose, the subject areas are broadly divided into mathematics, sciences, languages, humanities, and others. The authority wishes to design a further investigation, ensuring that each of these subject areas is covered by each contractor. Name the experimental design that should be used and describe briefly the layout of the investigation.
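The sums of squares for the one-way analysis of variance table in part 2 can be sketched as follows. This is an illustrative calculation only; `one_way_anova` is a hypothetical helper, and the resulting F ratio would still need to be compared with the tabulated critical value for 3 and 16 degrees of freedom.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    values = [y for g in groups for y in g]
    n = len(values)
    correction = sum(values) ** 2 / n  # (grand total)^2 / n
    ss_total = sum(y * y for y in values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, ss_total, df_between, df_within, f_ratio


scores = [
    [41, 49, 50, 44, 56],  # Contractor A
    [54, 45, 50, 50, 47],  # Contractor B
    [56, 45, 54, 50, 49],  # Contractor C
    [41, 36, 46, 38, 35],  # Contractor D
]
ss_between, ss_within, ss_total, df_between, df_within, f_ratio = one_way_anova(scores)
print(ss_between, ss_within, ss_total, round(f_ratio, 3))
```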
OCR MEI S4 2010 June Q4
24 marks Standard +0.3
4 At an agricultural research station, a trial is made of four varieties (A, B, C, D) of a certain crop in an experimental field. The varieties are grown on plots in the field and their yields are measured in a standard unit.
  1. It is at first thought that there may be a consistent trend in the natural fertility of the soil in the field from the west side to the east, though no other trends are known. Name an experimental design that should be used in these circumstances and give an example of an experimental layout. Initial analysis suggests that any natural fertility trend may in fact be ignored, so the data from the trial are analysed by one-way analysis of variance.
  2. The usual model for one-way analysis of variance of the yields \(y _ { i j }\) may be written as $$y _ { i j } = \mu + \alpha _ { i } + e _ { i j }$$ where the \(e _ { i j }\) represent the experimental errors. Interpret the other terms in the model. State the usual distributional assumptions for the \(e _ { i j }\).
  3. The data for the yields are as follows, each variety having been used on 5 plots.
    Variety
    A | B | C | D
    12.3 | 14.2 | 14.1 | 13.6
    11.9 | 13.1 | 13.2 | 12.8
    12.8 | 13.1 | 14.6 | 13.3
    12.2 | 12.5 | 13.7 | 14.3
    13.5 | 12.7 | 13.4 | 13.8
    $$\left[ \Sigma \Sigma y _ { i j } = 265.1 , \quad \Sigma \Sigma y _ { i j } ^ { 2 } = 3524.31 . \right]$$ Construct the usual one-way analysis of variance table and carry out the usual test, at the 5\% significance level. Report briefly on your conclusions.
OCR S3 2015 June Q6
13 marks Standard +0.3
6 In each of 38 randomly selected weeks of the English Premier Football League there were 10 matches. Table 1 summarises the number of home wins in 10 matches, \(X\), and the corresponding number of weeks.
Table 1
Number of home wins | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Number of weeks | 0 | 1 | 2 | 8 | 8 | 9 | 7 | 1 | 2 | 0 | 0
A researcher investigates whether \(X\) can be modelled by the distribution \(\mathrm { B } ( 10 , p )\). He calculates the expected frequencies using a value of \(p\) obtained from the sample mean.
  1. Show that \(p = 0.45\).
Table 2 shows the observed and expected number of weeks.
Table 2
Number of home wins | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Totals
Observed number of weeks | 0 | 1 | 2 | 8 | 8 | 9 | 7 | 1 | 2 | 0 | 0 | 38
Expected number of weeks | 0.096 | 0.788 | 2.899 | 6.326 | 9.058 | 8.893 | 6.064 | 2.835 | 0.870 | 0.158 | 0.013 | 38
  2. Show how the value of 2.835 for 7 home wins is obtained.
The researcher carries out a test, at the \(5 \%\) significance level, of whether the distribution \(\mathrm { B } ( 10 , p )\) fits the data.
  • Explain why it is necessary to combine classes.
  • Carry out the test.
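A short sketch of how \(p\) and the binomial expected frequencies could be reproduced from Table 1, assuming \(p\) is estimated as the sample mean divided by the 10 matches per week. Variable names such as `p_hat` are illustrative.

```python
from math import comb

weeks = [0, 1, 2, 8, 8, 9, 7, 1, 2, 0, 0]  # frequencies for 0..10 home wins
trials = 10
n_weeks = sum(weeks)

# Estimate p from the sample mean: mean = trials * p.
sample_mean = sum(x * f for x, f in enumerate(weeks)) / n_weeks
p_hat = sample_mean / trials

# Expected number of weeks with x home wins under B(10, p_hat).
expected = [
    n_weeks * comb(trials, x) * p_hat**x * (1 - p_hat) ** (trials - x)
    for x in range(trials + 1)
]
print(p_hat, [round(e, 3) for e in expected])
```

Several of the resulting expected frequencies fall below 5, which is the usual reason for combining neighbouring classes before computing the \(\chi^2\) statistic.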
OCR S3 2009 January Q8
    14 marks Standard +0.3
    8 A soft drinks factory produces lemonade which is sold in packs of 6 bottles. As part of the factory's quality control, random samples of 75 packs are examined at regular intervals. The number of underfilled bottles in a pack of 6 bottles is denoted by the random variable \(X\). The results of one quality control check are shown in the following table.
    Number of underfilled bottles | 0 | 1 | 2 | 3
    Number of packs | 44 | 20 | 8 | 3
    A researcher assumes that \(X \sim \mathrm {~B} ( 3 , p )\).
    1. By finding the sample mean, show that an estimate of \(p\) is 0.2.
    2. Show that, at the \(5 \%\) significance level, there is evidence that this binomial distribution does not fit the data.
    3. Another researcher suggests that the goodness of fit test should be for \(\mathrm { B } ( 6 , p )\). She finds that the corresponding value of \(\chi ^ { 2 }\) is 2.74, correct to 3 significant figures. Given that the number of degrees of freedom is the same as in part (ii), state the conclusion of the test at the same significance level.
    OCR S3 2016 June Q7
    12 marks Standard +0.8
    7 A continuous random variable \(X\) has probability density function $$f ( x ) = \begin{cases} a x ^ { 3 } & 0 \leqslant x \leqslant 1 \\ a x ^ { 2 } & 1 < x \leqslant 2 \\ 0 & \text { otherwise } \end{cases}$$ where \(a\) is a constant.
    1. Show that \(a = \frac { 12 } { 31 }\).
    2. Find \(\mathrm { E } ( X )\).
    It is thought that the time taken by a student to complete a task can be well modelled by \(X\). The times taken by 992 randomly chosen students are summarised in the table, together with some of the expected frequencies.
      Time | \(0 \leqslant x < 0.5\) | \(0.5 \leqslant x < 1\) | \(1 \leqslant x < 1.5\) | \(1.5 \leqslant x \leqslant 2\)
      Observed frequency | 8 | 92 | 279 | 613
      Expected frequency | 6 | 90 | |
    3. Find the other expected frequencies and test, at the \(5 \%\) level of significance, whether the data can be well modelled by \(X\).
    OCR MEI S3 2009 January Q4
    18 marks Standard +0.3
    4
    1. Explain the meaning of 'opportunity sampling'. Give one reason why it might be used and state one disadvantage of using it.
    A market researcher is conducting an 'on-street' survey in a busy city centre, for which he needs to stop and interview 100 people. For each interview the researcher counts the number of people he has to ask until one agrees to be interviewed. The data collected are as follows.
      No. of people asked | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more
      Frequency | 26 | 19 | 17 | 13 | 11 | 8 | 6
      A model for these data is proposed as follows, where \(p\) (assumed constant throughout) is the probability that a person asked agrees to be interviewed, and \(q = 1 - p\).
      No. of people asked | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more
      Probability | \(p\) | \(p q\) | \(p q ^ { 2 }\) | \(p q ^ { 3 }\) | \(p q ^ { 4 }\) | \(p q ^ { 5 }\) | \(q ^ { 6 }\)
    2. Verify that these probabilities add to 1 whatever the value of \(p\).
    3. Initially it is thought that on average 1 in 4 people asked agree to be interviewed. Test at the \(10 \%\) level of significance whether it is reasonable to suppose that the model applies with \(p = 0.25\).
    4. Later an estimate of \(p\) obtained from the data is used in the analysis. The value of the test statistic (with no combining of cells) is found to be 9.124. What is the outcome of this new test? Comment on your answer in relation to the outcome of the test in part (iii).
    OCR MEI S3 2011 June Q2
    18 marks Standard +0.3
    2 Scientists researching into the chemical composition of dust in space collect specimens using a specially designed spacecraft. The craft collects the particles of dust in trays that are made up of a large array of cells containing aerogel. The aerogel traps the particles that penetrate into the cells.
    1. For a random sample of 100 cells, the number of particles of dust in each cell was counted, giving the following results.
      Number of particles | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | \(10 +\)
      Frequency | 4 | 7 | 10 | 20 | 17 | 15 | 10 | 9 | 5 | 3 | 0
      It is thought that the number of particles collected in each cell can be modelled using the distribution Poisson(4.2) since 4.2 is the sample mean for these data. Some of the calculations for a \(\chi ^ { 2 }\) test are shown below. The cells for 8, 9 and \(10 +\) particles have been combined.
      Number of particles | 5 | 6 | 7 | \(8 +\)
      Observed frequency | 15 | 10 | 9 | 8
      Expected frequency | 16.33 | 11.44 | 6.86 | 6.39
      Contribution to \(X ^ { 2 }\) | 0.1083 | 0.1813 | 0.6676 | 0.4056
      Complete the calculations and carry out the test using a \(10 \%\) significance level to see whether the number of particles per cell may be modelled in this way.
    2. The diameters of the dust particles are believed to be distributed symmetrically about a median of 15 micrometres \(( \mu \mathrm { m } )\). For a random sample of 20 particles, the sum of the signed ranks of the diameters of the particles smaller than \(15 \mu \mathrm {~m} \left( W _ { - } \right)\) is found to be 53. Test at the \(5 \%\) level of significance whether the median diameter appears to be more than \(15 \mu \mathrm {~m}\).
    CAIE FP2 2010 June Q7
    8 marks Challenging +1.2
    7 Benford's Law states that, in many tables containing large numbers of numerical values, the probability distribution of the leading non-zero digit \(D\) is given by $$\mathrm { P } ( D = d ) = \log _ { 10 } \left( \frac { d + 1 } { d } \right) , \quad d = 1,2 , \ldots , 9 .$$ The following table shows a summary of a random sample of 100 non-zero leading digits taken from a table of cumulative probabilities for the Poisson distribution.
    Leading digit | 1 | 2 | 3 | 4 | 5 | \(\geqslant 6\)
    Frequency | 22 | 21 | 13 | 11 | 11 | 22
    Carry out a suitable goodness of fit test at the 10\% significance level.
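A hedged sketch of the arithmetic for this test: the Benford probabilities for digits 1 to 5 come straight from the formula, the final cell takes the remaining probability \(\log_{10}(10/6)\), and the statistic is \(\sum (O - E)^2 / E\). The variable names are illustrative, and the statistic still has to be compared with the appropriate \(\chi^2\) critical value from tables.

```python
from math import log10

observed = [22, 21, 13, 11, 11, 22]  # digits 1, 2, 3, 4, 5 and ">= 6"

# Benford probabilities for d = 1..5, then the pooled tail P(D >= 6).
probs = [log10((d + 1) / d) for d in range(1, 6)]
probs.append(1 - sum(probs))

expected = [sum(observed) * p for p in probs]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([round(e, 2) for e in expected], round(statistic, 2))
```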
    CAIE FP2 2014 June Q9
    10 marks Standard +0.8
    9 A random sample of 200 observations of the continuous random variable \(X\) was taken and the values are summarised in the following table.
    Interval | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\) | \(6 \leqslant x < 7\) | \(7 \leqslant x < 8\)
    Observed frequency | 63 | 45 | 32 | 25 | 22 | 7 | 6
    It is required to test the goodness of fit of the distribution with probability density function \(f\) given by $$f ( x ) = \begin{cases} \frac { 1 } { x \ln 8 } & 1 \leqslant x < 8 \\ 0 & \text { otherwise } \end{cases}$$ The relevant expected frequencies, correct to 2 decimal places, are given in the following table.
    Interval | \(1 \leqslant x < 2\) | \(2 \leqslant x < 3\) | \(3 \leqslant x < 4\) | \(4 \leqslant x < 5\) | \(5 \leqslant x < 6\) | \(6 \leqslant x < 7\) | \(7 \leqslant x < 8\)
    Expected frequency | 66.67 | \(p\) | 27.67 | \(q\) | 17.54 | 14.83 | 12.84
    Show that \(p = 39.00\), correct to 2 decimal places, and find the value of \(q\). Carry out a goodness of fit test at the 5\% significance level.
    CAIE FP2 2016 June Q9
    10 marks Standard +0.3
    9 Applicants for a national teacher training course are required to pass a mathematics test. Each year, the applicants are tested in groups of 6 and the number of successful applicants in each group is recorded. The overall proportion of successful applicants has remained constant over the years and is equal to \(60 \%\) of the applicants. The results from 150 randomly chosen groups are shown in the following table.
    Number of successful applicants | 0 | 1 | 2 | 3 | 4 | 5 | 6
    Number of groups | 1 | 3 | 25 | 51 | 38 | 30 | 2
    Test, at the \(5 \%\) significance level, the goodness of fit of the distribution \(\mathbf { B } ( 6,0.6 )\) for the number of successful applicants in a group.
    CAIE FP2 2016 November Q9
    13 marks Standard +0.3
    9 The number of visitors arriving at an art exhibition is recorded for each 10-minute period of time during the ten hours that it is open on a particular day. The results are as follows.
    Number of visitors in a 10-minute period | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | \(\geqslant 9\)
    Number of 10-minute periods | 2 | 2 | 12 | 8 | 11 | 13 | 4 | 7 | 1 | 0
    1. Calculate the mean and variance for this sample and explain whether your answers support a suggestion that a Poisson distribution might be a suitable model for the number of visitors in a 10-minute period.
    2. Use an appropriate Poisson distribution to find the two expected frequencies missing from the following table.
      Number of visitors in a 10-minute period | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | \(\geqslant 9\)
      Expected number of 10-minute periods (two values missing) | 1.10, 8.79, 11.72, 9.38, 6.25, 3.57, 1.79, 1.28
    3. Test, at the \(10 \%\) significance level, the goodness of fit of this Poisson distribution to the data.
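For part 1, the mean and variance of a frequency table can be sketched as below. The final class (\(\geqslant 9\)) has frequency 0, so it does not affect the sums. Both the divisor-\(n\) variance and the unbiased divisor-\((n - 1)\) estimate are shown, since the question does not say which is intended.

```python
periods = [2, 2, 12, 8, 11, 13, 4, 7, 1, 0]  # frequencies for 0, 1, ..., 8, >= 9

n = sum(periods)
sum_x = sum(x * f for x, f in enumerate(periods))
sum_x2 = sum(x * x * f for x, f in enumerate(periods))

mean = sum_x / n
variance_n = sum_x2 / n - mean**2                      # divisor n
variance_unbiased = (sum_x2 - n * mean**2) / (n - 1)   # divisor n - 1
print(n, mean, round(variance_n, 3), round(variance_unbiased, 3))
```

A Poisson model has equal mean and variance, so the closeness of these two figures is the point of comparison.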
    OCR Further Statistics AS 2023 June Q6
    12 marks Standard +0.3
    6 A machine is used to toss a coin repeatedly. Rosa believes that the outcome of each toss made by the machine is not independent of the previous toss. Rosa gets the machine to toss a coin 6 times and record the number of heads, \(X\), obtained. After recording the number of heads obtained, Rosa resets the machine and gets it to toss the coin 6 more times. Rosa again records the number of heads obtained and she repeats this procedure until she has recorded 88 independent values of \(X\).
    1. The sample mean and sample variance of \(X\) are 3.35 and 3.392 respectively. Explain what these results suggest about the validity of a binomial model \(\mathrm { B } ( 6 , p )\) for the data.
    Rosa uses a computer spreadsheet to work out the probabilities for a more sophisticated model in which the outcome of each toss is dependent on the outcome of the previous toss. Her model suggests that the probabilities \(\mathrm { P } ( X = x )\), for \(x = 0,1,2,3,4,5,6\), are approximately in the ratio \(5 : 6 : 7 : 8 : 7 : 6 : 5\). She carries out a \(\chi ^ { 2 }\) test to investigate whether this model is a good fit for the data. The following table shows the full results of the experiments, together with some of the calculations needed for the test.
      \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | Total
      Observed frequency | 7 | 10 | 16 | 15 | 15 | 11 | 14 | 88
      Expected frequency | | | | | | | |
      Contribution to \(\chi ^ { 2 }\) statistic | 0.9 | 0.3333 | 0.2857 | 0.0625 | 0.0714 | | |
    2. In the Printed Answer Booklet, complete the table.
    3. Carry out the test, using a 10\% significance level.
    4. Rosa says that the results definitely show that one of the two proposed models is correct. Comment on this statement.
    OCR Further Statistics 2021 November Q6
    11 marks Standard +0.3
    6 A practice examination paper is taken by 500 candidates, and the organiser wishes to know what continuous distribution could be used to model the actual time, \(X\) minutes, taken by candidates to complete the paper. The organiser starts by carrying out a goodness-of-fit test for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) at the \(5 \%\) significance level. The grouped data and the results of some of the calculations are shown in the following table.
    Time | \(0 \leqslant X < 80\) | \(80 \leqslant X < 90\) | \(90 \leqslant X < 100\) | \(100 \leqslant X < 110\) | \(X \geqslant 110\)
    Observed frequency \(O\) | 36 | 95 | 137 | 129 | 103
    Expected frequency \(E\) | 45.606 | 80.641 | 123.754 | 123.754 | 126.246
    \(\frac { ( O - E ) ^ { 2 } } { E }\) | 2.023 | 2.557 | 1.418 | 0.222 | 4.280
    1. State suitable hypotheses for the test.
    2. Show how the figures 123.754 and 0.222 in the column for \(100 \leqslant X < 110\) were obtained. [3]
    3. Carry out the test.
    The organiser now wants to suggest an improved model for the data.
      1. Suggest an aspect of the data that the organiser should take into account in considering an improved model.
      2. The graph of the probability density function for the distribution \(\mathrm { N } \left( 100,15 ^ { 2 } \right)\) is shown in the diagram in the Printed Answer Booklet. On the same diagram sketch the probability density function of an improved model that takes into account the aspect of the data in part (d)(i).
    OCR Further Statistics Specimen Q8
    15 marks Standard +0.3
    8 A continuous random variable \(X\) has probability density function given by $$\mathrm { f } ( x ) = \left\{ \begin{array} { c c } 0.8 \mathrm { e } ^ { - 0.8 x } & x \geq 0 \\ 0 & x < 0 \end{array} \right.$$
    1. Find the mean and variance of \(X\).
    The lifetime of a certain organism is thought to have the same distribution as \(X\). The lifetimes in days of a random sample of 60 specimens of the organism were found. The observed frequencies, together with the expected frequencies correct to 3 decimal places, are given in the table.
      Range | \(0 \leq x < 1\) | \(1 \leq x < 2\) | \(2 \leq x < 3\) | \(3 \leq x < 4\) | \(x \geq 4\)
      Observed | 24 | 22 | 10 | 3 | 1
      Expected | 33.040 | 14.846 | 6.671 | 2.997 | 2.446
    2. Show how the expected frequency for \(1 \leq x < 2\) is obtained.
    3. Carry out a goodness of fit test at the \(5 \%\) significance level.
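Two of the expected frequencies in the table are below 5, so cells are normally pooled before the statistic is computed. The sketch below uses a simple hypothetical helper, `pool_tails`, which merges cells inwards from either end until the end cells reach the minimum; it is an assumption for illustration and does not handle small expected values in the middle of a table.

```python
def pool_tails(observed, expected, minimum=5.0):
    """Merge end cells into their neighbours until each end cell has expected >= minimum."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:   # pool from the upper tail
        obs[-2:] = [obs[-2] + obs[-1]]
        exp[-2:] = [exp[-2] + exp[-1]]
    while len(exp) > 1 and exp[0] < minimum:    # pool from the lower tail
        obs[:2] = [obs[0] + obs[1]]
        exp[:2] = [exp[0] + exp[1]]
    return obs, exp


observed = [24, 22, 10, 3, 1]
expected = [33.040, 14.846, 6.671, 2.997, 2.446]
pooled_obs, pooled_exp = pool_tails(observed, expected)
statistic = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))
degrees_of_freedom = len(pooled_exp) - 1  # no parameters estimated from the data
print(pooled_obs, pooled_exp, round(statistic, 2), degrees_of_freedom)
```

With the last two cells combined there are four cells, and the statistic is compared with the \(\chi^2\) critical value for the resulting degrees of freedom.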
    Edexcel S3 2022 January Q6
    14 marks Standard +0.3
    1. A farmer sells strawberries in baskets. The contents of each of 100 randomly selected baskets were weighed and the results, given to the nearest gram, are shown below.
    Weight of strawberries (grams) | Number of baskets
    302-303 | 5
    304-305 | 13
    306-307 | 10
    308-309 | 18
    310-311 | 25
    312-313 | 20
    314-315 | 5
    316-317 | 4
    The farmer proposes that the weight of strawberries per basket, in grams, should be modelled by a normal distribution with a mean of 310 g and standard deviation 4 g . Using his model, the farmer obtains the following expected frequencies.
    Weight of strawberries (\(s\), grams) | Expected frequency
    \(s \leqslant 303.5\) | \(a\)
    \(303.5 < s \leqslant 305.5\) | 7.8
    \(305.5 < s \leqslant 307.5\) | 13.6
    \(307.5 < s \leqslant 309.5\) | 18.4
    \(309.5 < s \leqslant 311.5\) | 19.6
    \(311.5 < s \leqslant 313.5\) | 16.3
    \(313.5 < s \leqslant 315.5\) | 10.6
    \(s > 315.5\) | \(b\)
    1. Find the value of \(a\) and the value of \(b\). Give your answers correct to one decimal place.
    Before \(s \leqslant 303.5\) and \(s > 315.5\) are included, for the remaining cells, $$\sum \frac { ( O - E ) ^ { 2 } } { E } = 9.71$$
    2. Using a 5\% significance level, test whether the data are consistent with the model. You should state your hypotheses, the test statistic and the critical value used.
    An alternative model uses estimates for the population mean and standard deviation from the data given. Using these estimated values no expected frequency is below 5. Another test is to be carried out, using a \(5 \%\) significance level, to assess whether the data are consistent with this alternative model.
    3. State the effect, if any, on the critical value for this test. Give a reason for your answer.
    Edexcel S3 2023 January Q4
    14 marks Standard +0.3
    4 A research student is investigating the number of children who are girls in families with 4 children. The table below shows her results for 200 such families.
    Number of girls | 0 | 1 | 2 | 3 | 4
    Frequency | 15 | 68 | 69 | 38 | 10
    The research student suggests that a binomial distribution with \(p = \frac { 1 } { 2 }\) could be a suitable model for the number of children who are girls in a family of 4 children.
    1. Using her results and a \(5 \%\) significance level, test the research student's claim. You should state your hypotheses, expected frequencies, test statistic and the critical value used.
    The research student decides to refine the model and retains the idea of using a binomial distribution but does not specify the probability that the child is a girl.
    2. Use the data in the table to show that the probability that a child is a girl is 0.45.
    The research student uses the probability from part (b) to calculate a new set of expected frequencies, none of which are less than 5. The statistic \(\sum \frac { ( O - E ) ^ { 2 } } { E }\) is evaluated and found to be 2.47.
    3. Test, at the \(5 \%\) significance level, whether using a binomial distribution is suitable to model the number of children who are girls in a family of 4 children. You should state your hypotheses and the critical value used.
    Edexcel S3 2024 January Q4
    10 marks Standard +0.3
    1. The number of jobs sent to a printer per hour in a small office is recorded for 120 hours. The results are summarised in the following table.
    Number of jobs | 0 | 1 | 2 | 3 | 4 | 5
    Frequency | 24 | 34 | 28 | 21 | 8 | 5
    1. Show that the mean number of jobs sent to the printer per hour for these data is 1.75.
    The office manager believes that the number of jobs sent to the printer per hour can be modelled using a Poisson distribution. The office manager uses the mean given in part (a) to calculate the expected frequencies for this model. Some of the results are given in the following table.
      Number of jobs | 0 | 1 | 2 | 3 | 4 | 5 or more
      Expected frequency | 20.85 | 36.49 | 31.93 | \(r\) | \(s\) | 3.95
    2. Show that the value of \(s\) is 8.15 to 2 decimal places.
    3. Find the value of \(r\) to 2 decimal places.
    The value of \(\sum \frac { \left( O _ { i } - E _ { i } \right) ^ { 2 } } { E _ { i } }\) for the first four frequencies in the table is 1.43.
    4. Test, at the \(5 \%\) level of significance, whether or not the number of jobs sent to the printer per hour can be modelled using a Poisson distribution. Show your working clearly, stating your hypotheses, test statistic and critical value.
    Edexcel S3 2014 June Q6
    14 marks Standard +0.3
    6. Eight tasks were given to each of 125 randomly selected job applicants. The number of tasks failed by each applicant is recorded. The results are as follows.
    Number of tasks failed by an applicant | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more
    Frequency | 2 | 21 | 45 | 42 | 12 | 3 | 0
    1. Show that the probability of a randomly selected task, from this sample, being failed is 0.3.
    An employer believes that a binomial distribution might provide a good model for the number of tasks, out of 8, that an applicant fails. He uses a binomial distribution, with the estimated probability 0.3 of a task being failed. The calculated expected frequencies are as follows.
      Number of tasks failed by an applicant | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more
      Expected frequency | 7.21 | 24.71 | 37.06 | \(r\) | 17.02 | 5.83 | \(s\)
    2. Find the value of \(r\) and the value of \(s\) giving your answers to 2 decimal places.
    3. Test, at the \(5 \%\) level of significance, whether or not a binomial distribution is a suitable model for these data. State your hypotheses and show your working clearly.
    The employer believes that all applicants have the same probability of failing each task.
    4. Use your result from part(c) to comment on this belief.