OCR H240/02 2021 November — Question 11 2 marks

- Exam Board: OCR
- Module: H240/02 (Pure Mathematics and Statistics)
- Year: 2021
- Session: November
- Marks: 2
- Topic: Z-tests (known variance)
- Type: Known variance (z-distribution)
- Difficulty: Moderate (-0.8). This is a straightforward hypothesis-test question with standard bookwork parts (a) to (c) requiring minimal calculation. Part (d) is a routine one-sample z-test with a given population standard deviation: a textbook exercise requiring only substitution into standard formulas. The outlier discussion in part (c) compares the basic mean ± 2 sd and 1.5 × IQR rules. No problem-solving or novel insight is required.
- Spec:
  - 2.01c Sampling techniques: simple random, opportunity, etc.
  - 2.01d Select/critique sampling: in context
  - 2.02h Recognize outliers
  - 2.05e Hypothesis test for normal mean: known variance

## Question 11(a):

| Answer | Marks | Guidance |
|---|---|---|
| Population large | B1 [1] | Or, e.g., would take too long to contact all students. NOT "Easier" |

---

## Question 11(b):

| Answer | Marks | Guidance |
|---|---|---|
| Includes students from all years (or ages); Numbers in years in correct proportions; Different years might like different music | B1 [1] | Or: Different years may have different numbers of students. NOT "It's more representative" or "Takes all students into account" or "You get a range of people" or "It avoids bias" |

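Stratifying by school year, as Amaya suggests, is usually done with proportional allocation: each year group contributes to the sample in proportion to its size, which is what "numbers in years in correct proportions" refers to. The sketch below is a minimal illustration under stated assumptions: the year-group names and sizes are hypothetical (the question gives none), the function names are illustrative, and the largest-remainder rounding rule is one convention among several for making the quotas sum exactly to the sample size.

```python
import random
from math import floor


def proportional_allocation(strata_sizes: dict[str, int], sample_size: int) -> dict[str, int]:
    """Split sample_size across strata in proportion to stratum size.

    Largest-remainder rounding: floor every quota, then give the leftover
    places to the strata with the largest fractional parts.
    """
    population = sum(strata_sizes.values())
    quotas = {s: sample_size * size / population for s, size in strata_sizes.items()}
    allocation = {s: floor(q) for s, q in quotas.items()}
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for s in by_remainder[:leftover]:
        allocation[s] += 1
    return allocation


def stratified_sample(students_by_year, sample_size, seed=None):
    """Simple random sample within each year group, sized by proportional allocation."""
    rng = random.Random(seed)
    sizes = {year: len(ids) for year, ids in students_by_year.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {year: rng.sample(ids, allocation[year]) for year, ids in students_by_year.items()}


if __name__ == "__main__":
    # HYPOTHETICAL year-group sizes: the question does not state them
    sizes = {"Year 1": 500, "Year 2": 420, "Year 3": 330}
    print(proportional_allocation(sizes, 60))
    students = {year: [f"{year}-{i}" for i in range(n)] for year, n in sizes.items()}
    sample = stratified_sample(students, 60, seed=1)
    print({year: len(chosen) for year, chosen in sample.items()})
```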
---

## Question 11(c):

| Answer | Marks | Guidance |
|---|---|---|
| $21 + 2 \times 4.2 = 29.4$ | B1 | Allow "30 is more than 2 sds away from the mean" |
| $22.9 + 1.5(22.9 - 18.0) = 30.25$ | B1 | Allow "30 is less than $1.5 \times$ IQR from UQ" |
| Unclear whether 30 is an outlier | B1 [3] | Or e.g. "It depends which definition you use." Any comment implying uncertainty. Ignore comments about mean $\pm 3$ sds or mean $\pm 1$ sd |

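The two outlier rules credited above can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the inputs are the statistics from Zac's table (mean 21.0, sd 4.20, LQ 18.0, UQ 22.9). It shows that 30 hours exceeds the mean + 2 sd limit (29.4) but not the UQ + 1.5 × IQR limit (30.25), which is why the mark scheme rewards a comment that the answer depends on the definition used.

```python
# Summary statistics from Zac's sample of 60 students (hours per week)
MEAN, SD = 21.0, 4.20
LQ, UQ = 18.0, 22.9


def two_sd_upper_limit(mean: float, sd: float) -> float:
    """Upper limit under the 'more than 2 sd above the mean' rule."""
    return mean + 2 * sd


def iqr_upper_limit(lq: float, uq: float) -> float:
    """Upper limit under the 'more than 1.5 x IQR above the UQ' rule."""
    return uq + 1.5 * (uq - lq)


def outlier_verdicts(value: float) -> dict[str, bool]:
    """True means `value` lies beyond that rule's upper limit."""
    return {
        "2 sd rule": value > two_sd_upper_limit(MEAN, SD),
        "1.5 x IQR rule": value > iqr_upper_limit(LQ, UQ),
    }


if __name__ == "__main__":
    print(f"mean + 2 sd    = {two_sd_upper_limit(MEAN, SD):.2f}")
    print(f"UQ + 1.5 x IQR = {iqr_upper_limit(LQ, UQ):.2f}")
    print(outlier_verdicts(30))  # the two rules disagree for Sundip's 30 hours
```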
---

## Question 11(d):

| Answer | Marks | Guidance |
|---|---|---|
| $H_0: \mu = 20$; $H_1: \mu > 20$, where $\mu$ = population mean time spent | B1, B1 | Allow other letters, but not $X$ unless defined. Not $\bar{X}$. B1B0 for one error, e.g. 2-tail; $\mu$ = sample mean implied B1B0; undefined $\mu$ B1B0; value 20 not included B0B0; not in terms of a parameter B1B0 |
| $\bar{X} \sim N(20, \frac{4.2^2}{60})$ and $\bar{X} = 21$ | M1 | Correct distribution and value of $\bar{X}$; stated or implied, e.g. by 0.0326 or 0.967 or 20.9 or 1.84. Condone $\frac{4.2^2}{\sqrt{60}}$ or $\frac{4.2^2}{60^2}$ or $\frac{4.2}{60}$ |
| $P(\bar{X} > 21) = 0.0326$; compare with 0.05 | A1, A1 | BC. Allow 2 sf, i.e. 0.033. Dep on 0.0326 or 1.84 or 0.9674 or $P(X>21)$ or $P(X \geq 21)$; must compare like with like |

**Alternative methods** (each replaces the M1, A1, A1 above):

| Answer | Marks | Guidance |
|---|---|---|
| $\frac{a-20}{4.2 \div \sqrt{60}} = 1.645 \Rightarrow a = 20.9$; CV = 20.9; $21 > 20.9$ or 21 not in acceptance region | M1, A1, A1 | Condone $\frac{4.2^2}{\sqrt{60}}$ or $\frac{4.2^2}{60^2}$ or $\frac{4.2}{60}$ |
| $\frac{21-20}{4.2 \div \sqrt{60}} = 1.84$; $z_{calc} = 1.84$; compare with 1.645 | M1, A1, A1 | |
| $P(\bar{X} < 21) = 0.9674$; compare with 0.95 | M1, A1, A1 | BC |

**All methods:**

| Answer | Marks | Guidance |
|---|---|---|
| Reject $H_0$, accept $H_1$ | M1 | Dependent on clearly valid comparison of like with like. May be implied by conclusion |
| There is evidence that (mean) time spent is $> 20$ hours | A1f [7] | In context, not definite, e.g. "Mean time is $> 20$ hours": A0 |

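The three equivalent routes in the mark scheme (p-value, standardised $z$, critical value of $\bar{X}$) can be checked numerically. This is a minimal sketch using only the Python standard library (`statistics.NormalDist`, Python 3.8+); the function name is hypothetical, and the inputs are the values from the question: $\mu_0 = 20$, $\sigma = 4.20$, $n = 60$, $\bar{x} = 21.0$, upper-tail test at the 5% level. The printed values should agree with the mark scheme's 0.0326, 1.84 and 20.9 to the precision quoted there, and all three comparisons lead to the same decision.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(xbar: float, mu0: float, sigma: float, n: int, alpha: float = 0.05) -> dict:
    """One-sample upper-tail z-test for a normal mean with known sigma."""
    se = sigma / sqrt(n)                            # standard error of Xbar
    null_dist = NormalDist(mu0, se)                 # Xbar ~ N(mu0, sigma^2 / n) under H0
    p_value = 1 - null_dist.cdf(xbar)               # P(Xbar > xbar) under H0
    z_calc = (xbar - mu0) / se                      # standardised test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha)        # upper-tail critical z (about 1.645)
    critical_value = null_dist.inv_cdf(1 - alpha)   # critical value on the Xbar scale
    return {
        "p_value": p_value,
        "z_calc": z_calc,
        "z_crit": z_crit,
        "critical_value": critical_value,
        "reject_h0": p_value < alpha,
    }


if __name__ == "__main__":
    r = upper_tail_z_test(xbar=21.0, mu0=20.0, sigma=4.20, n=60)
    print(f"p-value        = {r['p_value']:.4f}  (compare with 0.05)")
    print(f"z_calc         = {r['z_calc']:.2f}    (compare with {r['z_crit']:.3f})")
    print(f"critical value = {r['critical_value']:.1f}    (compare with 21.0)")
    print("Reject H0" if r["reject_h0"] else "Do not reject H0")
```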
---
11 Zac is planning to write a report on the music preferences of the students at his college. There is a large number of students at the college.
\begin{enumerate}[label=(\alph*)]
\item State one reason why Zac might wish to obtain information from a sample of students, rather than from all the students.
\item Amaya suggests that Zac should use a sample that is stratified by school year.

Give one advantage of this method as compared with random sampling, in this context.

Zac decides to take a random sample of 60 students from his college. He asks each student how many hours per week, on average, they spend listening to music during term. From his results he calculates the following statistics.

\begin{center}
\begin{tabular}{ | c | c | c | c | c | }
\hline
Mean & \begin{tabular}{ c }
Standard \\
deviation \\
\end{tabular} & Median & \begin{tabular}{ c }
Lower \\
quartile \\
\end{tabular} & \begin{tabular}{ c }
Upper \\
quartile \\
\end{tabular} \\
\hline
21.0 & 4.20 & 20.5 & 18.0 & 22.9 \\
\hline
\end{tabular}
\end{center}
\item Sundip tells Zac that, during term, she spends on average 30 hours per week listening to music.

Discuss briefly whether this value should be considered an outlier.
\item Layla claims that, during term, each student spends on average 20 hours per week listening to music. Zac believes that the true figure is higher than 20 hours. He uses his results to carry out a hypothesis test at the 5\% significance level.

Assume that the time spent listening to music is normally distributed with standard deviation 4.20 hours.

Carry out the test.
\end{enumerate}

\hfill \mbox{\textit{OCR H240/02 2021 Q11 [2]}}