Edexcel S1 (Statistics 1)

Question 4
4. Aeroplanes fly from City \(A\) to City \(B\). Over a long period of time the number of minutes delay in take-off from City \(A\) was recorded. The minimum delay was 5 minutes and the maximum delay was 63 minutes. A quarter of all delays were at most 12 minutes, half were at most 17 minutes and \(75 \%\) were at most 28 minutes. Only one of the delays was longer than 45 minutes. An outlier is an observation that falls either \(1.5 \times\) (interquartile range) above the upper quartile or \(1.5 \times\) (interquartile range) below the lower quartile.
  1. On the graph paper opposite draw a box plot to represent these data.
  2. Comment on the distribution of delays. Justify your answer.
  3. Suggest how the distribution might be interpreted by a passenger who frequently flies from City \(A\) to City \(B\).
    \includegraphics[max width=\textwidth, alt={}, center]{3d4f7bfb-b235-418a-9411-a4d0b3188254-008_1190_1487_278_223}
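The outlier rule quoted in Question 4 can be checked numerically before drawing the box plot. The following is a minimal sketch, not a model answer; the variable names are illustrative, and only the quartiles, minimum and maximum given in the question are used.

```python
# Summary statistics quoted in the question (minutes of delay).
q1, median, q3 = 12, 17, 28
minimum, maximum = 5, 63

iqr = q3 - q1                     # interquartile range
lower_fence = q1 - 1.5 * iqr      # observations below this are outliers
upper_fence = q3 + 1.5 * iqr      # observations above this are outliers

# Only one delay exceeded 45 minutes, so the maximum is the only
# candidate for an upper outlier; the minimum is the only lower candidate.
max_is_outlier = maximum > upper_fence
min_is_outlier = minimum < lower_fence

# Comparing the gaps either side of the median hints at the direction of skew.
upper_gap = q3 - median
lower_gap = median - q1
```

Under these figures the upper gap is larger than the lower gap, which is the kind of evidence part 2 asks for when commenting on the shape of the distribution.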
Question 7
7. In a school there are 148 students in Years 12 and 13 studying Science, Humanities or Arts subjects. Of these students, 89 wear glasses and the others do not. There are 30 Science students of whom 18 wear glasses. The corresponding figures for the Humanities students are 68 and 44 respectively. A student is chosen at random. Find the probability that this student
  1. is studying Arts subjects,
  2. does not wear glasses, given that the student is studying Arts subjects.
    Amongst the Science students, \(80 \%\) are right-handed. Corresponding percentages for Humanities and Arts students are \(75 \%\) and \(70 \%\) respectively. A student is again chosen at random.
  3. Find the probability that this student is right-handed.
  4. Given that this student is right-handed, find the probability that the student is studying Science subjects.
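One way to organise the counts in Question 7 is to fill in the missing Arts figures by subtraction and then apply the conditional probability formula. This is a sketch using exact fractions; the dictionary names are illustrative and not part of the question.

```python
from fractions import Fraction

total, glasses_total = 148, 89
students = {"Science": 30, "Humanities": 68}
glasses = {"Science": 18, "Humanities": 44}

# Arts figures are whatever remains after Science and Humanities.
students["Arts"] = total - sum(students.values())
glasses["Arts"] = glasses_total - sum(glasses.values())

p_arts = Fraction(students["Arts"], total)
p_no_glasses_given_arts = Fraction(students["Arts"] - glasses["Arts"],
                                   students["Arts"])

right_handed_rate = {"Science": Fraction(80, 100),
                     "Humanities": Fraction(75, 100),
                     "Arts": Fraction(70, 100)}

# Total probability of being right-handed, then Bayes for Science.
p_right = sum(Fraction(students[s], total) * right_handed_rate[s]
              for s in students)
p_science_given_right = (Fraction(students["Science"], total)
                         * right_handed_rate["Science"]) / p_right
```

Working in `Fraction` keeps every answer exact, so the results can be compared directly with fractions obtained by hand.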
    1. (a) Describe the main features and uses of a box plot.
    Children from schools \(A\) and \(B\) took part in a fun run for charity. The times, to the nearest minute, taken by the children from school \(A\) are summarised in Figure 1. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Figure 1} \includegraphics[alt={},max width=\textwidth]{3d4f7bfb-b235-418a-9411-a4d0b3188254-015_398_1045_946_461}
    \end{figure}
    1. Write down the time by which \(75 \%\) of the children in school \(A\) had completed the run.
    2. State the name given to this value.
  5. Explain what you understand by the two crosses ( X ) on Figure 1.
    For school \(B\) the least time taken by any of the children was 25 minutes and the longest time was 55 minutes. The three quartiles were 30, 37 and 50 respectively.
  6. Draw a box plot to represent the data from school \(B\).
    \includegraphics[max width=\textwidth, alt={}, center]{3d4f7bfb-b235-418a-9411-a4d0b3188254-016_798_1196_580_372}
  7. Compare and contrast these two box plots.
    2. Sunita and Shelley talk to one another once a week on the telephone. Over many weeks they recorded, to the nearest minute, the number of minutes spent in conversation on each occasion. The following table summarises their results.
    1. As part of a statistics project, Gill collected data relating to the length of time, to the nearest minute, spent by shoppers in a supermarket and the amount of money they spent. Her data for a random sample of 10 shoppers are summarised in the table below, where \(t\) represents time and \(\pounds m\) the amount spent over \(\pounds 20\).
    1. A young family were looking for a new 3 bedroom semi-detached house. A local survey recorded the price \(x\), in \(\pounds 1000\), and the distance \(y\), in miles, from the station of such houses. The following summary statistics were provided
    $$S _ { x x } = 113573 , \quad S _ { y y } = 8.657 , \quad S _ { x y } = - 808.917$$
  8. Use these values to calculate the product moment correlation coefficient.
  9. Give an interpretation of your answer to part (a).
    Another family asked for the distances to be measured in km rather than miles.
  10. State the value of the product moment correlation coefficient in this case.
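The house-price question turns on the fact that the product moment correlation coefficient does not change when one variable is rescaled by a positive constant. The sketch below illustrates this with the quoted summary statistics; the helper `pmcc` and the conversion factor are assumptions introduced for illustration only.

```python
from math import sqrt

def pmcc(s_xx, s_yy, s_xy):
    """Product moment correlation coefficient from summary statistics."""
    return s_xy / sqrt(s_xx * s_yy)

s_xx, s_yy, s_xy = 113573, 8.657, -808.917
r_miles = pmcc(s_xx, s_yy, s_xy)

# Rescaling y by a positive constant k multiplies S_yy by k^2 and S_xy by k,
# so the factor cancels in r. The value of k here (km per mile) is an
# assumption for illustration; any positive constant behaves the same way.
k = 1.609344
r_km = pmcc(s_xx, k ** 2 * s_yy, k * s_xy)
```

The two values agree to within floating-point rounding, which is why the final part of the question can be answered without further calculation.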
    2. The box plot in Figure 1 shows a summary of the weights of the luggage, in kg, for each musician in an orchestra on an overseas tour. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{3d4f7bfb-b235-418a-9411-a4d0b3188254-045_346_1452_324_228} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure}
    The airline's recommended weight limit for each musician's luggage was 45 kg. Given that none of the musicians' luggage weighed exactly 45 kg,
  11. state the proportion of the musicians whose luggage was below the recommended weight limit.
    A quarter of the musicians had to pay a charge for taking heavy luggage.
  12. State the smallest weight for which the charge was made.
  13. Explain what you understand by the + on the box plot in Figure 1, and suggest an instrument that the owner of this luggage might play.
  14. Describe the skewness of this distribution. Give a reason for your answer.
    One musician of the orchestra suggests that the weights of luggage, in kg, can be modelled by a normal distribution with quartiles as given in Figure 1.
  15. Find the standard deviation of this normal distribution.
    3. A student is investigating the relationship between the price ( \(y\) pence) of 100 g of chocolate and the percentage ( \(x \%\) ) of cocoa solids in the chocolate.
    The following data is obtained
    advancing learning, changing lives
    1. A personnel manager wants to find out if a test carried out during an employee's interview and a skills assessment at the end of basic training is a guide to performance after working for the company for one year.
    The table below shows the results of the interview test of 10 employees and their performance after one year.
    1. A disease is known to be present in \(2 \%\) of a population. A test is developed to help determine whether or not someone has the disease.
    Given that a person has the disease, the test is positive with probability 0.95
    Given that a person does not have the disease, the test is positive with probability 0.03
  16. Draw a tree diagram to represent this information.
    A person is selected at random from the population and tested for this disease.
  17. Find the probability that the test is positive.
    A doctor randomly selects a person from the population and tests him for the disease. Given that the test is positive,
  18. find the probability that he does not have the disease.
  19. Comment on the usefulness of this test.
    2. The ages, in years, of the residents of two hotels are shown in the back-to-back stem and leaf diagram below.
    Abbey Hotel \(8 | 5 | 0\) Balmoral Hotel means 58 years in Abbey Hotel and 50 years in Balmoral Hotel.
    1. A teacher is monitoring the progress of students using a computer based revision course. The improvement in performance, \(y\) marks, is recorded for each student along with the time, \(x\) hours, that the student spent using the revision course. The results for a random sample of 10 students are recorded below.
    1. The volume of a sample of gas is kept constant. The gas is heated and the pressure, \(p\), is measured at 10 different temperatures, \(t\). The results are summarised below.
      \(\sum p = 445 \quad \sum p ^ { 2 } = 38125 \quad \sum t = 240 \quad \sum t ^ { 2 } = 27520 \quad \sum p t = 26830\)
    2. Find \(\mathrm { S } _ { p p }\) and \(\mathrm { S } _ { p t }\).
    Given that \(\mathrm { S } _ { t t } = 21760\),
  20. calculate the product moment correlation coefficient.
  21. Give an interpretation of your answer to part (b).
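The gas-pressure question asks for \(\mathrm{S}_{pp}\) and \(\mathrm{S}_{pt}\) from raw sums. A short sketch of that arithmetic follows, using the standard identities \(S_{xx} = \Sigma x^2 - (\Sigma x)^2 / n\) and \(S_{xy} = \Sigma xy - (\Sigma x)(\Sigma y) / n\); the variable names are illustrative.

```python
from math import sqrt

n = 10                              # number of temperatures
sum_p, sum_p2 = 445, 38125
sum_t, sum_pt = 240, 26830
s_tt = 21760                        # given in the question

s_pp = sum_p2 - sum_p ** 2 / n      # sum of squares about the mean of p
s_pt = sum_pt - sum_p * sum_t / n   # sum of products about the means

r = s_pt / sqrt(s_pp * s_tt)        # product moment correlation coefficient
```

The same three lines work for any of the correlation questions in this collection once the appropriate sums are substituted.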
    2. On a randomly chosen day the probability that Bill travels to school by car, by bicycle or on foot is \(\frac { 1 } { 2 } , \frac { 1 } { 6 }\) and \(\frac { 1 } { 3 }\) respectively. The probability of being late when using these methods of travel is \(\frac { 1 } { 5 } , \frac { 2 } { 5 }\) and \(\frac { 1 } { 10 }\) respectively.
  22. Draw a tree diagram to represent this information.
  23. Find the probability that on a randomly chosen day
    1. Bill travels by foot and is late,
    2. Bill is not late.
  24. Given that Bill is late, find the probability that he did not travel on foot.
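The tree diagram for Bill's journey can be mirrored in code by multiplying along each branch and adding across branches. This is a sketch with exact fractions; the dictionary names are illustrative.

```python
from fractions import Fraction as F

travel = {"car": F(1, 2), "bicycle": F(1, 6), "foot": F(1, 3)}
late_given = {"car": F(1, 5), "bicycle": F(2, 5), "foot": F(1, 10)}

# Multiply along each branch to get P(method and late).
joint_late = {m: travel[m] * late_given[m] for m in travel}

p_foot_and_late = joint_late["foot"]
p_late = sum(joint_late.values())        # add across the "late" branches
p_not_late = 1 - p_late

# Conditional probability: restrict attention to the "late" outcomes.
p_not_foot_given_late = (p_late - p_foot_and_late) / p_late
```

The same pattern applies to the other tree-diagram questions here, with the branch probabilities replaced accordingly.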
    3. The variable \(x\) was measured to the nearest whole number. Forty observations are given in the table below.
    \begin{tabular}{|c|c|c|c|}
    \hline
    \(x\) & \(10 - 15\) & \(16 - 18\) & \(19 -\) \\
    \hline
    Frequency & 15 & 9 & 16 \\
    \hline
    \end{tabular}
    A histogram was drawn and the bar representing the \(10 - 15\) class has a width of 2 cm and a height of 5 cm. For the \(16 - 18\) class find
  25. the width,
  26. the height
    of the bar representing this class.
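In a histogram the bar width is proportional to class width and the bar area is proportional to frequency. The sketch below applies both ideas to this question, assuming the usual class boundaries for data recorded to the nearest whole number (9.5 to 15.5 and 15.5 to 18.5); the variable names are illustrative.

```python
from fractions import Fraction

# Class boundaries (assumed, for whole-number data) and frequencies.
first_lo, first_hi, first_freq = Fraction("9.5"), Fraction("15.5"), 15
second_lo, second_hi, second_freq = Fraction("15.5"), Fraction("18.5"), 9

first_width_cm, first_height_cm = 2, 5

# Scale factors fixed by the bar that is already drawn.
cm_per_unit = first_width_cm / (first_hi - first_lo)
area_per_observation = Fraction(first_width_cm * first_height_cm, first_freq)

second_width_cm = cm_per_unit * (second_hi - second_lo)
second_height_cm = area_per_observation * second_freq / second_width_cm
```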
    4. A researcher measured the foot lengths of a random sample of 120 ten-year-old children. The lengths are summarised in the table below.
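    \begin{tabular}{|c|c|}
    \hline
    Foot length, \(l\), (cm) & Number of children \\
    \hline
    \(10 \leqslant l < 12\) & 5 \\
    \hline
    \(12 \leqslant l < 17\) & 53 \\
    \hline
    \(17 \leqslant l < 19\) & 29 \\
    \hline
    \(19 \leqslant l < 21\) & 15 \\
    \hline
    \(21 \leqslant l < 23\) & 11 \\
    \hline
    \(23 \leqslant l < 25\) & 7 \\
    \hline
    \end{tabular}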
  27. Use interpolation to estimate the median of this distribution.
  28. Calculate estimates for the mean and the standard deviation of these data.
    One measure of skewness is given by
    $$\text { Coefficient of skewness } = \frac { 3 ( \text { mean } - \text { median } ) } { \text { standard deviation } }$$
  29. Evaluate this coefficient and comment on the skewness of these data.
    Greg suggests that a normal distribution is a suitable model for the foot lengths of ten-year-old children.
  30. Using the value found in part (c), comment on Greg's suggestion, giving a reason for your answer.
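The foot-length question chains together interpolation, grouped-data estimates and the skewness coefficient. The following sketch shows one way to carry out that chain; it assumes class midpoints represent each class and that the standard deviation divides by \(n\), and the function name is hypothetical.

```python
from math import sqrt

# (lower bound, upper bound, frequency) for each foot-length class, in cm.
classes = [(10, 12, 5), (12, 17, 53), (17, 19, 29),
           (19, 21, 15), (21, 23, 11), (23, 25, 7)]
n = sum(f for _, _, f in classes)

def interpolated_median(groups):
    """Linear interpolation for the (n/2)th value of grouped data."""
    total = sum(f for _, _, f in groups)
    target = total / 2
    cumulative = 0
    for lower, upper, f in groups:
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no classes supplied")

median = interpolated_median(classes)

# Estimates from class midpoints, dividing by n (an assumed convention).
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
mean = sum_fx / n
sd = sqrt(sum_fx2 / n - mean ** 2)

skewness = 3 * (mean - median) / sd
```

With these figures the coefficient comes out very close to zero, which is relevant to the final part about Greg's normal model.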
    5. The weight, \(w\) grams, and the length, \(l \mathrm {~mm}\), of 10 randomly selected newborn turtles are given in the table below.
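    \begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|}
    \hline
    \(l\) & 49.0 & 52.0 & 53.0 & 54.5 & 54.1 & 53.4 & 50.0 & 51.6 & 49.5 & 51.2 \\
    \hline
    \(w\) & 29 & 32 & 34 & 39 & 38 & 35 & 30 & 31 & 29 & 30 \\
    \hline
    \end{tabular}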
    $$\text { (You may use } \mathrm { S } _ { l l } = 33.381 \quad \mathrm {~S} _ { w l } = 59.99 \quad \mathrm {~S} _ { w w } = 120.1 \text { ) }$$
  31. Find the equation of the regression line of \(w\) on \(l\) in the form \(w = a + b l\).
  32. Use your regression line to estimate the weight of a newborn turtle of length 60 mm.
  33. Comment on the reliability of your estimate giving a reason for your answer.
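For the turtle data, the regression line of \(w\) on \(l\) follows from \(b = S_{wl} / S_{ll}\) and \(a = \bar{w} - b \bar{l}\). The sketch below uses the table values and the quoted summary statistics; `predict_weight` is a hypothetical helper, and the final flag simply records whether 60 mm lies outside the observed lengths.

```python
lengths = [49.0, 52.0, 53.0, 54.5, 54.1, 53.4, 50.0, 51.6, 49.5, 51.2]  # mm
weights = [29, 32, 34, 39, 38, 35, 30, 31, 29, 30]                      # g
s_ll, s_wl = 33.381, 59.99          # given in the question

l_bar = sum(lengths) / len(lengths)
w_bar = sum(weights) / len(weights)

b = s_wl / s_ll                     # gradient: grams per extra mm of length
a = w_bar - b * l_bar               # intercept

def predict_weight(length_mm):
    """Estimated weight in grams from the fitted line w = a + b*l."""
    return a + b * length_mm

estimate = predict_weight(60)

# True when the requested length lies outside the range of the data.
is_extrapolation = not (min(lengths) <= 60 <= max(lengths))
```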
    6. The discrete random variable \(X\) has probability function
    $$\mathrm { P } ( X = x ) = \left\{ \begin{array} { c l } a ( 3 - x ) & x = 0,1,2 \\ b & x = 3 \end{array} \right.$$
  34. Find \(\mathrm { P } ( X = 2 )\) and complete the table below.
    \begin{tabular}{|c|c|c|c|c|}
    \hline
    \(x\) & 0 & 1 & 2 & 3 \\
    \hline
    \(\mathrm { P } ( X = x )\) & \(3 a\) & \(2 a\) &  & \(b\) \\
    \hline
    \end{tabular}
    Given that \(\mathrm { E } ( X ) = 1.6\)
  35. Find the value of \(a\) and the value of \(b\).
    Find
  36. \(\mathrm { P } ( 0.5 < X < 3 )\),
  37. \(\mathrm { E } ( 3 X - 2 )\).
  38. Show that the \(\operatorname { Var } ( X ) = 1.64\)
  39. Calculate \(\operatorname { Var } ( 3 X - 2 )\).
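The two conditions in this question (probabilities sum to 1, and \(\mathrm{E}(X) = 1.6\)) give a pair of linear equations in \(a\) and \(b\). The sketch below solves them with Cramer's rule and then evaluates the remaining parts; the choice of method and the variable names are illustrative only.

```python
from fractions import Fraction

# P(X = x) = a(3 - x) for x = 0, 1, 2 and b for x = 3, so:
#   sum of probabilities:  6a + 1b = 1
#   expectation:           4a + 3b = 8/5
det = 6 * 3 - 1 * 4
a = (1 * 3 - 1 * Fraction(8, 5)) / det
b = (6 * Fraction(8, 5) - 4 * 1) / det

pmf = {0: 3 * a, 1: 2 * a, 2: a, 3: b}

mean = sum(x * p for x, p in pmf.items())
mean_sq = sum(x * x * p for x, p in pmf.items())
variance = mean_sq - mean ** 2

p_between = sum(p for x, p in pmf.items() if 0.5 < x < 3)
mean_linear = 3 * mean - 2            # E(3X - 2) = 3E(X) - 2
variance_linear = 3 ** 2 * variance   # Var(3X - 2) = 9 Var(X)
```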
    7. (a) Given that \(\mathrm { P } ( A ) = a\) and \(\mathrm { P } ( B ) = b\) express \(\mathrm { P } ( A \cup B )\) in terms of \(a\) and \(b\) when
    1. \(A\) and \(B\) are mutually exclusive,
    2. \(A\) and \(B\) are independent.
    Two events \(R\) and \(Q\) are such that
      \(\mathrm { P } \left( R \cap Q ^ { \prime } \right) = 0.15 , \quad \mathrm { P } ( Q ) = 0.35\) and \(\mathrm { P } ( R \mid Q ) = 0.1\)
      Find the value of
  40. \(\mathrm { P } ( R \cup Q )\),
  41. \(\mathrm { P } ( R \cap Q )\),
  42. \(\mathrm { P } ( R )\).
Question 8
8. The lifetimes of bulbs used in a lamp are normally distributed. A company \(X\) sells bulbs with a mean lifetime of 850 hours and a standard deviation of 50 hours.
  1. Find the probability of a bulb, from company \(X\), having a lifetime of less than 830 hours.
  2. In a box of 500 bulbs, from company \(X\), find the expected number having a lifetime of less than 830 hours.
    A rival company \(Y\) sells bulbs with a mean lifetime of 860 hours and \(20 \%\) of these bulbs have a lifetime of less than 818 hours.
  3. Find the standard deviation of the lifetimes of bulbs from company \(Y\).
    Both companies sell the bulbs for the same price.
  4. State which company you would recommend. Give reasons for your answer.
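The bulb calculations in Question 8 can be cross-checked with Python's standard library instead of printed tables. This is a sketch using `statistics.NormalDist` (available from Python 3.8); values found this way may differ slightly in the last digit from answers based on the rounded table entries.

```python
from statistics import NormalDist

# Company X: lifetimes ~ N(850, 50^2).
bulbs_x = NormalDist(mu=850, sigma=50)
p_short = bulbs_x.cdf(830)            # P(lifetime < 830)
expected_short = 500 * p_short        # expected count in a box of 500

# Company Y: mean 860 and P(lifetime < 818) = 0.20.
# Standardising gives (818 - 860) / sigma = z, where Phi(z) = 0.20.
z_20 = NormalDist().inv_cdf(0.20)
sigma_y = (818 - 860) / z_20
```

Comparing `sigma_y` with company \(X\)'s standard deviation of 50, alongside the two means, gives the figures needed for the recommendation in part 4.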
    Paper Reference(s): 6683/01 (Edexcel GCE)
    1. Gary compared the total attendance, \(x\), at home matches and the total number of goals, \(y\), scored at home during a season for each of 12 football teams playing in a league. He correctly calculated:
    $$S _ { x x } = 1022500 \quad S _ { y y } = 130.9 \quad S _ { x y } = 8825$$
  5. Calculate the product moment correlation coefficient for these data.
  6. Interpret the value of the correlation coefficient.
    Helen was given the same data to analyse. In view of the large numbers involved she decided to divide the attendance figures by 100. She then calculated the product moment correlation coefficient between \(\frac { x } { 100 }\) and \(y\).
  7. Write down the value Helen should have obtained.
    2. An experiment consists of selecting a ball from a bag and spinning a coin. The bag contains 5 red balls and 7 blue balls. A ball is selected at random from the bag, its colour is noted and then the ball is returned to the bag. When a red ball is selected, a biased coin with probability \(\frac { 2 } { 3 }\) of landing heads is spun.
    When a blue ball is selected a fair coin is spun.
  8. Complete the tree diagram below to show the possible outcomes and associated probabilities.
    \includegraphics[max width=\textwidth, alt={}, center]{3d4f7bfb-b235-418a-9411-a4d0b3188254-129_787_395_734_548} \section*{Coin}
    \includegraphics[max width=\textwidth, alt={}]{3d4f7bfb-b235-418a-9411-a4d0b3188254-129_1007_488_808_950}
    Shivani selects a ball and spins the appropriate coin.
  9. Find the probability that she obtains a head.
    Given that Tom selected a ball at random and obtained a head when he spun the appropriate coin,
  10. find the probability that Tom selected a red ball.
    Shivani and Tom each repeat this experiment.
  11. Find the probability that the colour of the ball Shivani selects is the same as the colour of the ball Tom selects.
    3. The discrete random variable \(X\) has probability distribution given by
    1. A random sample of 50 salmon was caught by a scientist. He recorded the length \(l \mathrm {~cm}\) and weight \(w \mathrm {~kg}\) of each salmon.
    The following summary statistics were calculated from these data.
    \(\sum l = 4027 \quad \sum l ^ { 2 } = 327754.5 \quad \sum w = 357.1 \quad \sum l w = 29330.5 \quad S _ { w w } = 289.6\)
  12. Find \(S _ { l l }\) and \(S _ { l w }\)
  13. Calculate, to 3 significant figures, the product moment correlation coefficient between \(l\) and \(w\).
  14. Give an interpretation of your coefficient.
    1. Keith records the amount of rainfall, in mm, at his school, each day for a week. The results are given below.
      0.0
      0.5
      1.8
      2.8
      2.3
      5.6
      9.4
    Jenny then records the amount of rainfall, \(x \mathrm {~mm}\), at the school each day for the following 21 days. The results for the 21 days are summarised below. $$\sum x = 84.6$$
  15. Calculate the mean amount of rainfall during the whole 28 days.
    Keith realises that he has transposed two of his figures. The number 9.4 should have been 4.9 and the number 0.5 should have been 5.0. Keith corrects these figures.
  16. State, giving your reason, the effect this will have on the mean.
    3. Over a long period of time a small company recorded the amount it received in sales per month. The results are summarised below.
    1. On a particular day the height above sea level, \(x\) metres, and the mid-day temperature, \(y ^ { \circ } \mathrm { C }\), were recorded in 8 north European towns. These data are summarised below
    $$\mathrm { S } _ { x x } = 3535237.5 \quad \sum y = 181 \quad \sum y ^ { 2 } = 4305 \quad \mathrm {~S} _ { x y } = - 23726.25$$
  17. Find \(\mathrm { S } _ { y y }\)
  18. Calculate, to 3 significant figures, the product moment correlation coefficient for these data.
  19. Give an interpretation of your coefficient.
    A student thought that the calculations would be simpler if the height above sea level, \(h\), was measured in kilometres and used the variable \(h = \frac { x } { 1000 }\) instead of \(x\).
  20. Write down the value of \(\mathrm { S } _ { h h }\)
  21. Write down the value of the correlation coefficient between \(h\) and \(y\).
    1. The random variable \(X \sim \mathrm {~N} \left( \mu , 5 ^ { 2 } \right)\) and \(\mathrm { P } ( X < 23 ) = 0.9192\)
    2. Find the value of \(\mu\).
    3. Write down the value of \(\mathrm { P } ( \mu < X < 23 )\).
    4. The discrete random variable \(Y\) has probability distribution
    1. The histogram in Figure 1 shows the time, to the nearest minute, that a random sample of 100 motorists were delayed by roadworks on a stretch of motorway.
    \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{3d4f7bfb-b235-418a-9411-a4d0b3188254-171_1312_673_349_639} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure}
  22. Complete the table.
    1. A discrete random variable \(X\) has the probability function
    $$\mathrm { P } ( X = x ) = \begin{cases} k ( 1 - x ) ^ { 2 } & x = - 1,0,1 \text { and } 2 \\ 0 & \text { otherwise } \end{cases}$$
  23. Show that \(k = \frac { 1 } { 6 }\)
  24. Find \(\mathrm { E } ( X )\)
  25. Show that \(\mathrm { E } \left( X ^ { 2 } \right) = \frac { 4 } { 3 }\)
  26. Find \(\operatorname { Var } ( 1 - 3 X )\)
    2. A bank reviews its customer records at the end of each month to find out how many customers have become unemployed, \(u\), and how many have had their house repossessed, \(h\), during that month. The bank codes the data using variables \(x = \frac { u - 100 } { 3 }\) and \(y = \frac { h - 20 } { 7 }\) The results for the 12 months of 2009 are summarised below. $$\sum x = 477 \quad S _ { x x } = 5606.25 \quad \sum y = 480 \quad S _ { y y } = 4244 \quad \sum x y = 23070$$
  27. Calculate the value of the product moment correlation coefficient for \(x\) and \(y\).
  28. Write down the product moment correlation coefficient for \(u\) and \(h\).
    The bank claims that an increase in unemployment among its customers is associated with an increase in house repossessions.
  29. State, with a reason, whether or not the bank's claim is supported by these data.
    3. A scientist is researching whether or not birds of prey exposed to pollutants lay eggs with thinner shells. He collects a random sample of egg shells from each of 6 different nests and tests for pollutant level, \(p\), and measures the thinning of the shell, \(t\). The results are shown in the table below.
    1. A teacher asked a random sample of 10 students to record the number of hours of television, \(t\), they watched in the week before their mock exam. She then calculated their grade, \(g\), in their mock exam. The results are summarised as follows.
    $$\sum t = 258 \quad \sum t ^ { 2 } = 8702 \quad \sum g = 63.6 \quad \mathrm {~S} _ { g g } = 7.864 \quad \sum g t = 1550.2$$
  30. Find \(\mathrm { S } _ { t t }\) and \(\mathrm { S } _ { g t }\)
  31. Calculate, to 3 significant figures, the product moment correlation coefficient between \(t\) and \(g\).
    The teacher also recorded the number of hours of revision, \(v\), these 10 students completed during the week before their mock exam. The correlation coefficient between \(t\) and \(v\) was \(-0.753\).
  32. Describe, giving a reason, the nature of the correlation you would expect to find between \(v\) and \(g\).
    2. The discrete random variable \(X\) can take only the values 1, 2 and 3. For these values the cumulative distribution function is defined by $$\mathrm { F } ( x ) = \frac { x ^ { 3 } + k } { 40 } \quad x = 1,2,3$$
  33. Show that \(k = 13\)
  34. Find the probability distribution of \(X\).
    Given that \(\operatorname { Var } ( X ) = \frac { 259 } { 320 }\)
  35. find the exact value of \(\operatorname { Var } ( 4 X - 5 )\).
    3. A biologist is comparing the intervals ( \(m\) seconds) between the mating calls of a certain species of tree frog and the surrounding temperature ( \(t { } ^ { \circ } \mathrm { C }\) ). The following results were obtained.
    1. Sammy is studying the number of units of gas, \(g\), and the number of units of electricity, \(e\), used in her house each week. A random sample of 10 weeks use was recorded and the data for each week were coded so that \(x = \frac { g - 60 } { 4 }\) and \(y = \frac { e } { 10 }\). The results for the coded data are summarised below
    $$\sum x = 48.0 \quad \sum y = 58.0 \quad \mathrm {~S} _ { x x } = 312.1 \quad \mathrm {~S} _ { y y } = 2.10 \quad \mathrm {~S} _ { x y } = 18.35$$
  36. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + b x\). Give the values of \(a\) and \(b\) correct to 3 significant figures.
  37. Hence find the equation of the regression line of \(e\) on \(g\) in the form \(e = c + d g\). Give the values of \(c\) and \(d\) correct to 2 significant figures.
  38. Use your regression equation to estimate the number of units of electricity used in a week when 100 units of gas were used.
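Sammy's question involves fitting a line to coded data and then undoing the coding. The sketch below follows that route, substituting \(x = (g - 60) / 4\) and \(y = e / 10\) into \(y = a + b x\) to obtain \(e = c + d g\); the variable names are illustrative and the values are left unrounded.

```python
n = 10
sum_x, sum_y = 48.0, 58.0
s_xx, s_xy = 312.1, 18.35

# Regression line of y on x for the coded data.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Decode: e / 10 = a + b * (g - 60) / 4
#   =>  e = (10a - 150b) + (10b / 4) * g
d = 10 * b / 4
c = 10 * a - 10 * b * 60 / 4

e_at_100 = c + d * 100        # estimated electricity units when g = 100
```

As a consistency check, coding \(g = 100\) gives \(x = 10\), and ten times the coded prediction \(a + 10b\) should agree with `e_at_100`.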
    2. The discrete random variable \(X\) takes the values 1, 2 and 3 and has cumulative distribution function \(\mathrm { F } ( x )\) given by
    (a) Find the probability distribution of \(X\).
    (b) Write down the value of \(\mathrm { F } ( 1.8 )\).
    1. A meteorologist believes that there is a relationship between the height above sea level, \(h \mathrm {~m}\), and the air temperature, \(t ^ { \circ } \mathrm { C }\). Data is collected at the same time from 9 different places on the same mountain. The data is summarised in the table below.
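    \begin{tabular}{|c|c|c|c|c|c|c|c|c|c|}
    \hline
    \(h\) & 1400 & 1100 & 260 & 840 & 900 & 550 & 1230 & 100 & 770 \\
    \hline
    \(t\) & 3 & 10 & 20 & 9 & 10 & 13 & 5 & 24 & 16 \\
    \hline
    \end{tabular}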
    [You may assume that \(\sum h = 7150 , \sum t = 110 , \sum h ^ { 2 } = 7171500 , \sum t ^ { 2 } = 1716\), \(\sum t h = 64980\) and \(\mathrm { S } _ { t t } = 371.56\) ]
  39. Calculate \(\mathrm { S } _ { t h }\) and \(\mathrm { S } _ { h h }\). Give your answers to 3 significant figures.
  40. Calculate the product moment correlation coefficient for this data.
  41. State whether or not your value supports the use of a regression equation to predict the air temperature at different heights on this mountain. Give a reason for your answer.
  42. Find the equation of the regression line of \(t\) on \(h\) giving your answer in the form \(t = a + b h\).
  43. Interpret the value of \(b\).
  44. Estimate the difference in air temperature between a height of 500 m and a height of 1000 m .
    1. The marks of a group of female students in a statistics test are summarised in Figure 1
    \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{3d4f7bfb-b235-418a-9411-a4d0b3188254-227_629_1102_342_429} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure}
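  45. Write down the mark which is exceeded by \(75 \%\) of the female students.
    The marks of a group of male students in the same statistics test are summarised by the stem and leaf diagram below.
    \begin{tabular}{|c|l|c|}
    \hline
    Mark & ( \(2 | 6\) means 26) & Totals \\
    \hline
    1 & 4 & (1) \\
    2 & 6 & (1) \\
    3 & 4 4 7 & (3) \\
    4 & 0 6 6 7 7 8 & (6) \\
    5 & 0 0 1 1 1 3 6 7 7 & (9) \\
    6 & 2 2 3 3 3 8 & (6) \\
    7 & 0 0 8 & (3) \\
    8 & 5 & (1) \\
    9 & 0 & (1) \\
    \hline
    \end{tabular}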
  46. Find the median and interquartile range of the marks of the male students.
    An outlier is a mark that is either more than \(1.5 \times\) interquartile range above the upper quartile or more than \(1.5 \times\) interquartile range below the lower quartile.
  47. In the space provided on Figure 1 draw a box plot to represent the marks of the male students, indicating clearly any outliers.
  48. Compare and contrast the marks of the male and the female students.
    3. In a company the 200 employees are classified as full-time workers, part-time workers or contractors.
    The table below shows the number of employees in each category and whether they walk to work or use some form of transport.
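    \begin{tabular}{|l|c|c|}
    \cline { 2 - 3 }
    \multicolumn{1}{c|}{} & Walk & Transport \\
    \hline
    Full-time worker & 2 & 8 \\
    \hline
    Part-time worker & 35 & 75 \\
    \hline
    Contractor & 30 & 50 \\
    \hline
    \end{tabular}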
    The events \(F , H\) and \(C\) are that an employee is a full-time worker, part-time worker or contractor respectively. Let \(W\) be the event that an employee walks to work. An employee is selected at random.
    Find
  49. \(\mathrm { P } ( H )\)
  50. \(\mathrm { P } \left( [ F \cap W ] ^ { \prime } \right)\)
  51. \(\mathrm { P } ( W \mid C )\)
    Let \(B\) be the event that an employee uses the bus.
    Given that \(10 \%\) of full-time workers use the bus, \(30 \%\) of part-time workers use the bus and \(20 \%\) of contractors use the bus,
  52. draw a Venn diagram to represent the events \(F , H , C\) and \(B\),
  53. find the probability that a randomly selected employee uses the bus to travel to work.
    4. The following table summarises the times, \(t\) minutes to the nearest minute, recorded for a group of students to complete an exam.
    \begin{tabular}{|l|c|c|c|c|c|c|}
    \hline
    Time (minutes) \(t\) & \(11 - 20\) & \(21 - 25\) & \(26 - 30\) & \(31 - 35\) & \(36 - 45\) & \(46 - 60\) \\
    \hline
    Number of students f & 62 & 88 & 16 & 13 & 11 & 10 \\
    \hline
    \end{tabular}
    $$\text { [You may use } \sum \mathrm { f } t ^ { 2 } = 134281.25 \text { ] }$$
  54. Estimate the mean and standard deviation of these data.
  55. Use linear interpolation to estimate the value of the median.
  56. Show that the estimated value of the lower quartile is 18.6 to 3 significant figures.
  57. Estimate the interquartile range of this distribution.
  58. Give a reason why the mean and standard deviation are not the most appropriate summary statistics to use with these data.
    The person timing the exam made an error and each student actually took 5 minutes less than the times recorded above. The table below summarises the actual times.
    \begin{tabular}{|l|c|c|c|c|c|c|}
    \hline
    Time (minutes) \(t\) & \(6 - 15\) & \(16 - 20\) & \(21 - 25\) & \(26 - 30\) & \(31 - 40\) & \(41 - 55\) \\
    \hline
    Number of students f & 62 & 88 & 16 & 13 & 11 & 10 \\
    \hline
    \end{tabular}
  59. Without further calculations, explain the effect this would have on each of the estimates found in parts (a), (b), (c) and (d).
    1. A biased die with six faces is rolled. The discrete random variable \(X\) represents the score on the uppermost face. The probability distribution of \(X\) is shown in the table below.
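    \begin{tabular}{|c|c|c|c|c|c|c|}
    \hline
    \(x\) & 1 & 2 & 3 & 4 & 5 & 6 \\
    \hline
    \(\mathrm { P } ( X = x )\) & \(a\) & \(a\) & \(a\) & \(b\) & \(b\) & 0.3 \\
    \hline
    \end{tabular}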
  60. Given that \(\mathrm { E } ( X ) = 4.2\) find the value of \(a\) and the value of \(b\).
  61. Show that \(\mathrm { E } \left( X ^ { 2 } \right) = 20.4\)
  62. Find \(\operatorname { Var } ( 5 - 3 X )\).
    A biased die with five faces is rolled. The discrete random variable \(Y\) represents the score which is uppermost. The cumulative distribution function of \(Y\) is shown in the table below.
    \begin{tabular}{|c|c|c|c|c|c|}
    \hline
    \(y\) & 1 & 2 & 3 & 4 & 5 \\
    \hline
    \(\mathrm {~F} ( y )\) & \(\frac { 1 } { 10 }\) & \(\frac { 2 } { 10 }\) & \(3 k\) & \(4 k\) & \(5 k\) \\
    \hline
    \end{tabular}
  63. Find the value of \(k\).
  64. Find the probability distribution of \(Y\).
    Each die is rolled once. The scores on the two dice are independent.
  65. Find the probability that the sum of the two scores equals 2
    1. The weight, in grams, of beans in a tin is normally distributed with mean \(\mu\) and standard deviation 7.8
    Given that \(10 \%\) of tins contain less than 200 g , find
  66. the value of \(\mu\)
  67. the percentage of tins that contain more than 225 g of beans.
    The machine settings are adjusted so that the weight, in grams, of beans in a tin is normally distributed with mean 205 and standard deviation \(\sigma\).
  68. Given that \(98 \%\) of tins contain between 200 g and 210 g find the value of \(\sigma\).
    \section*{Probability}
    $$\begin{aligned} & \mathrm { P } ( A \cup B ) = \mathrm { P } ( A ) + \mathrm { P } ( B ) - \mathrm { P } ( A \cap B ) \\
    & \mathrm { P } ( A \cap B ) = \mathrm { P } ( A ) \mathrm { P } ( B \mid A ) \\
    & \mathrm { P } ( A \mid B ) = \frac { \mathrm { P } ( B \mid A ) \mathrm { P } ( A ) } { \mathrm { P } ( B \mid A ) \mathrm { P } ( A ) + \mathrm { P } \left( B \mid A ^ { \prime } \right) \mathrm { P } \left( A ^ { \prime } \right) } \end{aligned}$$
    \section*{Discrete distributions}
    For a discrete random variable \(X\) taking values \(x _ { i }\) with probabilities \(\mathrm { P } \left( X = x _ { i } \right)\)
    Expectation (mean): \(\mathrm { E } ( X ) = \mu = \Sigma x _ { i } \mathrm { P } \left( X = x _ { i } \right)\)
    Variance: \(\operatorname { Var } ( X ) = \sigma ^ { 2 } = \Sigma \left( x _ { i } - \mu \right) ^ { 2 } \mathrm { P } \left( X = x _ { i } \right) = \Sigma x _ { i } ^ { 2 } \mathrm { P } \left( X = x _ { i } \right) - \mu ^ { 2 }\)
    For a function \(\mathrm { g } ( X ) : \mathrm { E } ( \mathrm { g } ( X ) ) = \Sigma \mathrm { g } \left( x _ { i } \right) \mathrm { P } \left( X = x _ { i } \right)\)
    \section*{Continuous distributions}
    Standard continuous distribution:
    \begin{tabular}{|c|c|c|c|}
    \hline
    Distribution of \(X\) & P.D.F. & Mean & Variance \\
    \hline
    Normal \(\mathrm { N } \left( \mu , \sigma ^ { 2 } \right)\) & \(\frac { 1 } { \sigma \sqrt { 2 \pi } } \mathrm { e } ^ { - \frac { 1 } { 2 } \left( \frac { x - \mu } { \sigma } \right) ^ { 2 } }\) & \(\mu\) & \(\sigma ^ { 2 }\) \\
    \hline
    \end{tabular}
    \section*{Correlation and regression}
    For a set of \(n\) pairs of values ( \(x _ { i } , y _ { i }\) )
    $$\begin{aligned} & S _ { x x } = \Sigma \left( x _ { i } - \bar { x } \right) ^ { 2 } = \Sigma x _ { i } ^ { 2 } - \frac { \left( \Sigma x _ { i } \right) ^ { 2 } } { n } \\
    & S _ { y y } = \Sigma \left( y _ { i } - \bar { y } \right) ^ { 2 } = \Sigma y _ { i } ^ { 2 } - \frac { \left( \Sigma y _ { i } \right) ^ { 2 } } { n } \\
    & S _ { x y } = \Sigma \left( x _ { i } - \bar { x } \right) \left( y _ { i } - \bar { y } \right) = \Sigma x _ { i } y _ { i } - \frac { \left( \Sigma x _ { i } \right) \left( \Sigma y _ { i } \right) } { n } \end{aligned}$$
    The product moment correlation coefficient is
    $$r = \frac { S _ { x y } } { \sqrt { S _ { x x } S _ { y y } } } = \frac { \Sigma \left( x _ { i } - \bar { x } \right) \left( y _ { i } - \bar { y } \right) } { \sqrt { \left\{ \Sigma \left( x _ { i } - \bar { x } \right) ^ { 2 } \right\} \left\{ \Sigma \left( y _ { i } - \bar { y } \right) ^ { 2 } \right\} } } = \frac { \Sigma x _ { i } y _ { i } - \frac { \left( \Sigma x _ { i } \right) \left( \Sigma y _ { i } \right) } { n } } { \sqrt { \left( \Sigma x _ { i } ^ { 2 } - \frac { \left( \Sigma x _ { i } \right) ^ { 2 } } { n } \right) \left( \Sigma y _ { i } ^ { 2 } - \frac { \left( \Sigma y _ { i } \right) ^ { 2 } } { n } \right) } }$$
    The regression coefficient of \(y\) on \(x\) is \(b = \frac { S _ { x y } } { S _ { x x } } = \frac { \Sigma \left( x _ { i } - \bar { x } \right) \left( y _ { i } - \bar { y } \right) } { \Sigma \left( x _ { i } - \bar { x } \right) ^ { 2 } }\)
    Least squares regression line of \(y\) on \(x\) is \(y = a + b x\) where \(a = \bar { y } - b \bar { x }\)
    \section*{THE NORMAL DISTRIBUTION FUNCTION}
    The function tabulated below is \(\Phi ( z )\), defined as \(\Phi ( z ) = \frac { 1 } { \sqrt { 2 \pi } } \int _ { - \infty } ^ { z } e ^ { - \frac { 1 } { 2 } t ^ { 2 } } \mathrm {~d} t\).
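    \begin{tabular}{|c|c|c|c|c|c|c|c|c|c|}
    \hline
    \(z\) & \(\Phi ( z )\) & \(z\) & \(\Phi ( z )\) & \(z\) & \(\Phi ( z )\) & \(z\) & \(\Phi ( z )\) & \(z\) & \(\Phi ( z )\) \\
    \hline
    0.00 & 0.5000 & 0.50 & 0.6915 & 1.00 & 0.8413 & 1.50 & 0.9332 & 2.00 & 0.9772 \\
    0.01 & 0.5040 & 0.51 & 0.6950 & 1.01 & 0.8438 & 1.51 & 0.9345 & 2.02 & 0.9783 \\
    0.02 & 0.5080 & 0.52 & 0.6985 & 1.02 & 0.8461 & 1.52 & 0.9357 & 2.04 & 0.9793 \\
    0.03 & 0.5120 & 0.53 & 0.7019 & 1.03 & 0.8485 & 1.53 & 0.9370 & 2.06 & 0.9803 \\
    0.04 & 0.5160 & 0.54 & 0.7054 & 1.04 & 0.8508 & 1.54 & 0.9382 & 2.08 & 0.9812 \\
    0.05 & 0.5199 & 0.55 & 0.7088 & 1.05 & 0.8531 & 1.55 & 0.9394 & 2.10 & 0.9821 \\
    0.06 & 0.5239 & 0.56 & 0.7123 & 1.06 & 0.8554 & 1.56 & 0.9406 & 2.12 & 0.9830 \\
    0.07 & 0.5279 & 0.57 & 0.7157 & 1.07 & 0.8577 & 1.57 & 0.9418 & 2.14 & 0.9838 \\
    0.08 & 0.5319 & 0.58 & 0.7190 & 1.08 & 0.8599 & 1.58 & 0.9429 & 2.16 & 0.9846 \\
    0.09 & 0.5359 & 0.59 & 0.7224 & 1.09 & 0.8621 & 1.59 & 0.9441 & 2.18 & 0.9854 \\
    0.10 & 0.5398 & 0.60 & 0.7257 & 1.10 & 0.8643 & 1.60 & 0.9452 & 2.20 & 0.9861 \\
    0.11 & 0.5438 & 0.61 & 0.7291 & 1.11 & 0.8665 & 1.61 & 0.9463 & 2.22 & 0.9868 \\
    0.12 & 0.5478 & 0.62 & 0.7324 & 1.12 & 0.8686 & 1.62 & 0.9474 & 2.24 & 0.9875 \\
    0.13 & 0.5517 & 0.63 & 0.7357 & 1.13 & 0.8708 & 1.63 & 0.9484 & 2.26 & 0.9881 \\
    0.14 & 0.5557 & 0.64 & 0.7389 & 1.14 & 0.8729 & 1.64 & 0.9495 & 2.28 & 0.9887 \\
    0.15 & 0.5596 & 0.65 & 0.7422 & 1.15 & 0.8749 & 1.65 & 0.9505 & 2.30 & 0.9893 \\
    0.16 & 0.5636 & 0.66 & 0.7454 & 1.16 & 0.8770 & 1.66 & 0.9515 & 2.32 & 0.9898 \\
    0.17 & 0.5675 & 0.67 & 0.7486 & 1.17 & 0.8790 & 1.67 & 0.9525 & 2.34 & 0.9904 \\
    0.18 & 0.5714 & 0.68 & 0.7517 & 1.18 & 0.8810 & 1.68 & 0.9535 & 2.36 & 0.9909 \\
    0.19 & 0.5753 & 0.69 & 0.7549 & 1.19 & 0.8830 & 1.69 & 0.9545 & 2.38 & 0.9913 \\
    0.20 & 0.5793 & 0.70 & 0.7580 & 1.20 & 0.8849 & 1.70 & 0.9554 & 2.40 & 0.9918 \\
    0.21 & 0.5832 & 0.71 & 0.7611 & 1.21 & 0.8869 & 1.71 & 0.9564 & 2.42 & 0.9922 \\
    0.22 & 0.5871 & 0.72 & 0.7642 & 1.22 & 0.8888 & 1.72 & 0.9573 & 2.44 & 0.9927 \\
    0.23 & 0.5910 & 0.73 & 0.7673 & 1.23 & 0.8907 & 1.73 & 0.9582 & 2.46 & 0.9931 \\
    0.24 & 0.5948 & 0.74 & 0.7704 & 1.24 & 0.8925 & 1.74 & 0.9591 & 2.48 & 0.9934 \\
    0.25 & 0.5987 & 0.75 & 0.7734 & 1.25 & 0.8944 & 1.75 & 0.9599 & 2.50 & 0.9938 \\
    0.26 & 0.6026 & 0.76 & 0.7764 & 1.26 & 0.8962 & 1.76 & 0.9608 & 2.55 & 0.9946 \\
    0.27 & 0.6064 & 0.77 & 0.7794 & 1.27 & 0.8980 & 1.77 & 0.9616 & 2.60 & 0.9953 \\
    0.28 & 0.6103 & 0.78 & 0.7823 & 1.28 & 0.8997 & 1.78 & 0.9625 & 2.65 & 0.9960 \\
    0.29 & 0.6141 & 0.79 & 0.7852 & 1.29 & 0.9015 & 1.79 & 0.9633 & 2.70 & 0.9965 \\
    0.30 & 0.6179 & 0.80 & 0.7881 & 1.30 & 0.9032 & 1.80 & 0.9641 & 2.75 & 0.9970 \\
    0.31 & 0.6217 & 0.81 & 0.7910 & 1.31 & 0.9049 & 1.81 & 0.9649 & 2.80 & 0.9974 \\
    0.32 & 0.6255 & 0.82 & 0.7939 & 1.32 & 0.9066 & 1.82 & 0.9656 & 2.85 & 0.9978 \\
    0.33 & 0.6293 & 0.83 & 0.7967 & 1.33 & 0.9082 & 1.83 & 0.9664 & 2.90 & 0.9981 \\
    0.34 & 0.6331 & 0.84 & 0.7995 & 1.34 & 0.9099 & 1.84 & 0.9671 & 2.95 & 0.9984 \\
    0.35 & 0.6368 & 0.85 & 0.8023 & 1.35 & 0.9115 & 1.85 & 0.9678 & 3.00 & 0.9987 \\
    0.36 & 0.6406 & 0.86 & 0.8051 & 1.36 & 0.9131 & 1.86 & 0.9686 & 3.05 & 0.9989 \\
    0.37 & 0.6443 & 0.87 & 0.8078 & 1.37 & 0.9147 & 1.87 & 0.9693 & 3.10 & 0.9990 \\
    0.38 & 0.6480 & 0.88 & 0.8106 & 1.38 & 0.9162 & 1.88 & 0.9699 & 3.15 & 0.9992 \\
    0.39 & 0.6517 & 0.89 & 0.8133 & 1.39 & 0.9177 & 1.89 & 0.9706 & 3.20 & 0.9993 \\
    0.40 & 0.6554 & 0.90 & 0.8159 & 1.40 & 0.9192 & 1.90 & 0.9713 & 3.25 & 0.9994 \\
    0.41 & 0.6591 & 0.91 & 0.8186 & 1.41 & 0.9207 & 1.91 & 0.9719 & 3.30 & 0.9995 \\
    0.42 & 0.6628 & 0.92 & 0.8212 & 1.42 & 0.9222 & 1.92 & 0.9726 & 3.35 & 0.9996 \\
    0.43 & 0.6664 & 0.93 & 0.8238 & 1.43 & 0.9236 & 1.93 & 0.9732 & 3.40 & 0.9997 \\
    0.44 & 0.6700 & 0.94 & 0.8264 & 1.44 & 0.9251 & 1.94 & 0.9738 & 3.50 & 0.9998 \\
    0.45 & 0.6736 & 0.95 & 0.8289 & 1.45 & 0.9265 & 1.95 & 0.9744 & 3.60 & 0.9998 \\
    0.46 & 0.6772 & 0.96 & 0.8315 & 1.46 & 0.9279 & 1.96 & 0.9750 & 3.70 & 0.9999 \\
    0.47 & 0.6808 & 0.97 & 0.8340 & 1.47 & 0.9292 & 1.97 & 0.9756 & 3.80 & 0.9999 \\
    0.48 & 0.6844 & 0.98 & 0.8365 & 1.48 & 0.9306 & 1.98 & 0.9761 & 3.90 & 1.0000 \\
    0.49 & 0.6879 & 0.99 & 0.8389 & 1.49 & 0.9319 & 1.99 & 0.9767 & 4.00 & 1.0000 \\
    0.50 & 0.6915 & 1.00 & 0.8413 & 1.50 & 0.9332 & 2.00 & 0.9772 & & \\
    \hline
    \end{tabular}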
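The tabulated values of \(\Phi ( z )\) can be reproduced numerically. The following is a minimal Python sketch, not part of the formula booklet; the function name `phi` is an illustrative choice, and it relies on the standard identity \(\Phi ( z ) = \frac { 1 } { 2 } \left( 1 + \operatorname{erf} \left( z / \sqrt { 2 } \right) \right)\).

```python
import math


def phi(z: float) -> float:
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Spot-check a few arguments that appear in the table, printed to 4 d.p.
for z in (0.00, 0.50, 1.00, 1.96, 2.50):
    print(f"{z:.2f}  {phi(z):.4f}")

# The table lists only z >= 0; negative arguments follow from symmetry,
# Phi(-z) = 1 - Phi(z).
print(f"{phi(-1.0):.4f}  {1.0 - phi(1.0):.4f}")
```

Rounded to four decimal places, these results should agree with the corresponding table entries, for example \(\Phi ( 1.96 ) = 0.9750\).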
    \section*{PERCENTAGE POINTS OF THE NORMAL DISTRIBUTION}
    The values \(z\) in the table are those which a random variable \(Z \sim \mathrm { N } ( 0,1 )\) exceeds with probability \(p\); that is, \(\mathrm { P } ( Z > z ) = 1 - \Phi ( z ) = p\).
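    \begin{tabular}{|c|c|c|c|}
    \hline
    \(p\) & \(z\) & \(p\) & \(z\) \\
    \hline
    0.5000 & 0.0000 & 0.0500 & 1.6449 \\
    0.4000 & 0.2533 & 0.0250 & 1.9600 \\
    0.3000 & 0.5244 & 0.0100 & 2.3263 \\
    0.2000 & 0.8416 & 0.0050 & 2.5758 \\
    0.1500 & 1.0364 & 0.0010 & 3.0902 \\
    0.1000 & 1.2816 & 0.0005 & 3.2905 \\
    \hline
    \end{tabular}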
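The percentage points can be checked by inverting \(\Phi\): since \(\mathrm { P } ( Z > z ) = p\), the required value is \(z = \Phi ^ { - 1 } ( 1 - p )\). The following is a minimal Python sketch, not part of the formula booklet; the helper name `percentage_point` is hypothetical, and it assumes Python 3.8 or later for `statistics.NormalDist`.

```python
from statistics import NormalDist


def percentage_point(p: float) -> float:
    """Return z such that P(Z > z) = p for Z ~ N(0, 1), with 0 < p < 1."""
    # P(Z > z) = p is equivalent to Phi(z) = 1 - p, so invert the CDF there.
    return NormalDist().inv_cdf(1.0 - p)


# Print a selection of the tabulated probabilities with their z values.
for p in (0.5000, 0.1000, 0.0500, 0.0250, 0.0100, 0.0005):
    print(f"{p:.4f}  {percentage_point(p):.4f}")
```

Rounded to four decimal places, the output should match the table, for example \(z = 1.6449\) for \(p = 0.0500\) and \(z = 1.9600\) for \(p = 0.0250\).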