SPS SM Statistics 2024 January

Question 1
1. At the beginning of the academic year, all the pupils in year 12 at a college take part in an assessment. Summary statistics for the marks obtained by the 2021 cohort are given below.
\(n = 205 \quad \sum x = 23042 \quad \sum x ^ { 2 } = 2591716\)
Marks may only be whole numbers, but the Head of Mathematics believes that the distribution of marks may be modelled by a Normal distribution.
  (a) Calculate
    • the mean mark,
    • the variance of the marks.
  (b) Use your answers to part (a) to write down a possible Normal model for the distribution of marks.
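As a quick numerical check for Question 1(a), the summary statistics can be turned into a mean and a variance. The sketch below is illustrative only: the variable names are assumptions, and it reports both the divisor-n and divisor-(n - 1) variances because the question does not state which convention is intended.

```python
# Summary statistics quoted in Question 1.
n = 205
sum_x = 23042
sum_x2 = 2591716

mean = sum_x / n

# Corrected sum of squares: S_xx = sum(x^2) - (sum x)^2 / n
s_xx = sum_x2 - sum_x**2 / n

variance_n = s_xx / n               # divisor n
variance_n_minus_1 = s_xx / (n - 1)  # divisor n - 1 (sample variance)

print(f"mean = {mean:.4g}")
print(f"variance (divisor n) = {variance_n:.4g}")
print(f"variance (divisor n - 1) = {variance_n_minus_1:.4g}")
```

Either variance, paired with the mean, gives a candidate Normal model for part (b); which one is expected depends on the specification being followed.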
Question 2
2. The heights, in centimetres, of a random sample of 150 plants of a certain variety were measured. The results are summarised in the histogram.
\includegraphics[max width=\textwidth, alt={}, center]{0e73f1d0-5532-4995-b39e-759d82c2bd92-04_860_1684_367_130}
One of the 150 plants is chosen at random, and its height, \(X \mathrm {~cm}\), is noted.
  (a) Show that \(\mathrm { P } ( 20 < X < 30 ) = 0.147\), correct to 3 significant figures.
Sam suggests that the distribution of \(X\) can be well modelled by the distribution \(\mathrm { N } ( 40,100 )\).
  (b) (i) Give a brief justification for the use of the normal distribution in this context.
      (ii) Give a brief justification for the choice of the parameter values 40 and 100.
  (c) Use Sam's model to find \(\mathrm { P } ( 20 < X < 30 )\).
Nina suggests a different model. She uses the midpoints of the classes to calculate estimates, \(m\) and \(s\), for the mean and standard deviation respectively, in centimetres, of the 150 heights. She then uses the distribution \(\mathrm { N } \left( m , s ^ { 2 } \right)\) as her model.
  (d) Use Nina's model to find \(\mathrm { P } ( 20 < X < 30 )\).
  (e) (i) Complete the table in the Printed Answer Booklet to show the probabilities obtained from Sam's model and Nina's model.
      (ii) By considering the different ranges of values of \(X\) given in the table, discuss how well the two models fit the original distribution.
Table for (e)(i):
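| \(x\) | Below 20 | 20 to 30 | 30 to 35 | 35 to 40 | 40 to 45 | 45 to 50 | 50 to 60 | Above 60 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Probability obtained from histogram | 0.027 | 0.147 | 0.153 | 0.187 | 0.193 | 0.147 | 0.133 | 0.013 |
| Probability obtained from Sam's model, \(\mathrm { N } ( 40,100 )\) | 0.023 | | 0.150 | 0.191 | | | 0.136 | 0.023 |
| Probability obtained from Nina's model, \(\mathrm { N } \left( m , s ^ { 2 } \right)\) | 0.030 | | 0.153 | 0.188 | | | 0.130 | 0.023 |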
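For the part of Question 2 that asks for \(\mathrm { P } ( 20 < X < 30 )\) under Sam's model, one way to check a hand calculation is with Python's standard-library `statistics.NormalDist`. This is a minimal sketch: the helper name `interval_probability` is an assumption made for illustration, and Nina's parameters \(m\) and \(s\) are not computed because they depend on the histogram, which is not reproduced here.

```python
from statistics import NormalDist


def interval_probability(mean, sd, low, high):
    """Return P(low < X < high) for X ~ N(mean, sd^2)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


# Sam's model is N(40, 100), so the standard deviation is 10.
sam_p = interval_probability(40, 10, 20, 30)
print(f"Sam's model: P(20 < X < 30) = {sam_p:.3f}")
```

The same helper could be reused for Nina's model once \(m\) and \(s\) have been estimated from the class midpoints.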
Question 3
3. Zac is planning to write a report on the music preferences of the students at his college. There is a large number of students at the college.
  (a) State one reason why Zac might wish to obtain information from a sample of students, rather than from all the students.
  (b) Amaya suggests that Zac should use a sample that is stratified by school year. Give one advantage of this method as compared with random sampling, in this context.
Zac decides to take a random sample of 60 students from his college. He asks each student how many hours per week, on average, they spend listening to music during term. From his results he calculates the following statistics.
| Mean | Standard deviation | Median | Lower quartile | Upper quartile |
| --- | --- | --- | --- | --- |
| 21.0 | 4.20 | 20.5 | 18.0 | 22.9 |
  (c) Sundip tells Zac that, during term, she spends on average 30 hours per week listening to music. Discuss briefly whether this value should be considered an outlier.
  (d) Layla claims that, during term, each student spends on average 20 hours per week listening to music. Zac believes that the true figure is higher than 20 hours. He uses his results to carry out a hypothesis test at the \(5 \%\) significance level. Assume that the time spent listening to music is normally distributed with standard deviation 4.20 hours. Carry out the test.
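The last two parts of Question 3 both reduce to short calculations. The sketch below is one possible check, not a model answer: it assumes the two common outlier conventions (more than 1.5 interquartile ranges above the upper quartile, or more than 2 standard deviations from the mean) and a one-tailed z-test of \(\mathrm{H}_0: \mu = 20\) against \(\mathrm{H}_1: \mu > 20\). All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Statistics quoted in Question 3.
n = 60
sample_mean = 21.0
sigma = 4.20
lower_q, upper_q = 18.0, 22.9

# Two common outlier thresholds (assumed conventions).
upper_fence = upper_q + 1.5 * (upper_q - lower_q)
two_sd_limit = sample_mean + 2 * sigma
print(f"Q3 + 1.5 * IQR = {upper_fence:.2f}")
print(f"mean + 2 sd = {two_sd_limit:.2f}")

# One-tailed z-test: H0: mu = 20, H1: mu > 20, known sigma.
mu0 = 20.0
z = (sample_mean - mu0) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z)
reject_h0 = p_value < 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The two outlier rules place 30 hours on opposite sides of their thresholds, which is why the question asks for a discussion rather than a single verdict.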
Question 4
4. The table shows the increases, between 2001 and 2011, in the percentages of employees travelling to work by various methods, in the Local Authorities (LAs) in the North East region of the UK.
Increase in percentage of employees travelling to work by various methods

| Geography code | Local authority | Work mainly at or from home | Underground, metro, light rail or tram | Bus, minibus or coach | Driving a car or van | Passenger in a car or van | On foot |
| --- | --- | --- | --- | --- | --- | --- | --- |
| E06000047 | County Durham | 0.74% | 0.05% | -1.50% | 4.58% | -2.99% | -0.97% |
| E06000005 | Darlington | 0.26% | -0.01% | -3.25% | 3.06% | -1.28% | 0.99% |
| E08000020 | Gateshead | -0.01% | -0.01% | -2.28% | 4.62% | -2.35% | -0.18% |
| E06000001 | Hartlepool | 0.03% | -0.04% | -1.62% | 4.80% | -2.38% | -0.26% |
| E06000002 | Middlesbrough | -0.34% | -0.01% | -2.32% | 2.19% | -1.33% | 0.67% |
| E08000021 | Newcastle upon Tyne | 0.10% | -0.23% | -0.67% | -0.48% | -1.51% | 1.75% |
| E08000022 | North Tyneside | 0.05% | 0.54% | -1.18% | 3.30% | -2.21% | -0.60% |
| E06000048 | Northumberland | 1.39% | -0.08% | -0.95% | 3.50% | -2.37% | -1.44% |
| E06000003 | Redcar and Cleveland | -0.02% | -0.01% | -2.09% | 4.20% | -2.06% | -0.49% |
| E08000023 | South Tyneside | -0.36% | 2.03% | -3.05% | 4.50% | -2.41% | -0.51% |
| E06000004 | Stockton-on-Tees | 0.14% | 0.03% | -2.02% | 3.52% | -2.01% | -0.15% |
| E08000024 | Sunderland | 0.17% | 1.48% | -3.11% | 4.89% | -2.21% | -0.52% |

The first two digits of the Geography code give the type of each of the LAs:
06: Unitary authority
07: Non-metropolitan district
08: Metropolitan borough
  (a) In what type of LA are the largest increases in percentages of people travelling by underground, metro, light rail or tram?
  (b) Identify two main changes in the pattern of travel to work in the North East region between 2001 and 2011.
Now assume the following.
    • The data refer to residents in the given LAs who are in the age range 20 to 65 at the time of each census.
    • The number of people in the age range 20 to 65 who move into or out of each given LA, or who die, between 2001 and 2011 is negligible.
  (c) Estimate the percentage of the people in the age range 20 to 65 in 2011 whose data appears in both 2001 and 2011.
  (d) In the light of your answer to part (c), suggest a reason for the changes in the pattern of travel to work in the North East region between 2001 and 2011.
Question 5
5. Labrador puppies may be black, yellow or chocolate in colour. Some information about a litter of 9 puppies is given in the table.
| | male | female |
| --- | --- | --- |
| black | 1 | 3 |
| yellow | 2 | 1 |
| chocolate | 1 | 1 |
Four puppies are chosen at random to train as guide dogs.
(b) Determine the probability that at least 3 black puppies are chosen.
(c) Determine the probability that exactly 3 females are chosen given that at least 3 black puppies are chosen.
(d) Explain whether the 2 events 'choosing exactly 3 females' and 'choosing at least 3 black puppies' are independent events.
Question 6
6. A firm claims that no more than \(2 \%\) of their packets of sugar are underweight. A market researcher believes that the actual proportion is greater than \(2 \%\). In order to test the firm's claim, the researcher weighs a random sample of 600 packets and carries out a hypothesis test, at the \(5 \%\) significance level, using the null hypothesis \(p = 0.02\).
(a) Given that the researcher's null hypothesis is correct, determine the probability that the researcher will conclude that the firm's claim is incorrect.
(b) The researcher finds that 18 out of the 600 packets are underweight. A colleague says
"18 out of 600 is \(3 \%\), so there is evidence that the actual proportion of underweight bags is greater than \(2 \%\)."
Criticise this statement.
Question 7
7. The probability distribution of a random variable \(X\) is modelled as follows.
\(\mathrm { P } ( X = x ) = \begin{cases} \frac { k } { x } & x = 1,2,3,4 , \\ 0 & \text { otherwise, } \end{cases}\)
where \(k\) is a constant.
  (a) Show that \(k = \frac { 12 } { 25 }\).
  (b) Show in a table the values of \(X\) and their probabilities.
  (c) The values of three independent observations of \(X\) are denoted by \(X _ { 1 } , X _ { 2 }\) and \(X _ { 3 }\). Find \(\mathrm { P } \left( X _ { 1 } > X _ { 2 } + X _ { 3 } \right)\).
In a game, a player notes the values of successive independent observations of \(X\) and keeps a running total. The aim of the game is to reach a total of exactly 7.
  (d) Determine the probability that a total of exactly 7 is first reached on the 5th observation.
END OF TEST
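For Question 7, the probabilities can be checked by brute-force enumeration with exact fractions. This is a sketch rather than the intended written method, and the names `pmf` and `sequence_probability` are illustrative. It relies on the observation that every value of \(X\) is at least 1, so the running total strictly increases and a total of 7 on the 5th observation cannot have been reached earlier.

```python
from fractions import Fraction
from itertools import product

values = (1, 2, 3, 4)

# P(X = x) = k / x must sum to 1, which fixes k.
k = 1 / sum(Fraction(1, x) for x in values)
pmf = {x: k / x for x in values}


def sequence_probability(seq):
    """Probability of observing the independent values in seq, in order."""
    prob = Fraction(1)
    for x in seq:
        prob *= pmf[x]
    return prob


# P(X1 > X2 + X3) over all ordered triples.
p_greater = sum(
    sequence_probability(t)
    for t in product(values, repeat=3)
    if t[0] > t[1] + t[2]
)

# P(total of exactly 7 is first reached on the 5th observation).
p_seven_on_fifth = sum(
    sequence_probability(s)
    for s in product(values, repeat=5)
    if sum(s) == 7
)

print("k =", k)
print("P(X1 > X2 + X3) =", p_greater)
print("P(total 7 first reached on 5th) =", p_seven_on_fifth)
```

Using `Fraction` keeps every probability exact, so the results can be compared directly with a hand-derived fraction.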