OCR MEI S4 2016 June — Question 3 24 marks

Exam Board: OCR MEI
Module: S4 (Statistics 4)
Year: 2016
Session: June
Marks: 24
Topic: Wilcoxon tests
Type: Two-sample t-test
Difficulty: Standard +0.3. This is a straightforward application of a two-sample t-test with standard bookwork components. Part (i) requires routine calculation with given summary statistics, part (ii) tests recall of assumptions and knowledge of the Wilcoxon rank-sum test, and part (iii) requires explanation of a paired design. All of this is standard S4 material with no novel problem-solving required. Slightly easier than average due to computational simplicity and predictable structure.
Spec: 5.05c Hypothesis test: normal distribution for population mean; 5.07a Non-parametric tests: when to use; 5.07d Paired vs two-sample: selection

3 A large department in a university wished to compare the standards of literacy and numeracy of its students. A random sample of 24 students was taken and sub-divided, randomly, into two groups of 12. The students in one group took a literacy assessment (scores denoted by \(x\)); the students in the other group took a numeracy assessment (scores denoted by \(y\)). The two assessments were designed to give the same distributions of scores when taken by random samples from the general population.

The scores obtained by the students on the two assessments are shown in the table.

| \(x\) | 23 | 42 | 43 | 46 | 48 | 48 | 50 | 54 | 58 | 59 | 62 | 65 |
|-------|----|----|----|----|----|----|----|----|----|----|----|----|
| \(y\) | 44 | 36 | 63 | 55 | 53 | 58 | 63 | 80 | 61 | 57 | 83 | 54 |

$$\sum x = 598 \quad \sum x ^ { 2 } = 31196 \quad \sum y = 707 \quad \sum y ^ { 2 } = 43543$$
  1. Carry out an appropriate \(t\) test, at the \(5\%\) level of significance, to compare the standards of literacy and numeracy.
  2. State the distributional assumptions required for the \(t\) test to be valid.

     Name the test that you would use if the assumptions required for the \(t\) test are thought not to hold. State the hypotheses for this new test.

     Explain, in general terms, which of the two tests is more powerful, and why.

     A statistician at the university looked at the data and commented that a paired sample design would have been better.
  3. Explain how a paired sample design would be applied in this context, and how the data would be analysed. Explain also why it would be better than the design used.

## Part (i)
| Answer | Marks | Guidance |
|--------|-------|----------|
| $H_0: \mu_1=\mu_2$ | B1 | Zero if sample means used |
| $H_1: \mu_1\neq\mu_2$ where $\mu_1$ and $\mu_2$ are the means in the underlying population | B1 | B1 if not clearly population means |
| $\bar{x}=\frac{598}{12}=49.8333$, $\bar{y}=\frac{707}{12}=58.9167$ | B1 | |
| $\sum(x-\bar{x})^2=31196-\frac{598^2}{12}=1395.66667$; $[s_x^2=126.87..., s_x=11.264...]$ | M1 | Accept alternative forms if correctly used later |
| $\sum(y-\bar{y})^2=43543-\frac{707^2}{12}=1888.91667$; $[s_y^2=171.719..., s_y=13.104...]$ | A1 | |
| Pooled variance estimate $=\frac{(1395.666...+1888.916...)}{(11+11)}=149.299$ | M1A1 | $\frac{11s_x^2+11s_y^2}{22}$; correct construction, their $s$, $\bar{x}$, $\bar{y}$ |
| Test statistic: $\frac{58.9167-49.8333}{\sqrt{149.299}\sqrt{\frac{1}{12}+\frac{1}{12}}}=1.8209$ | M1A1 | |
| 5% two-tailed critical value for $t_{22}$ is 2.0739 | B1 | 2.0772 by interpolation from tables |
| Hence no reason to reject $H_0$, no reason to suppose that standards of literacy and numeracy are different in the underlying population, on average | M1A1 | no reason to reject $H_0$; context |
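
The arithmetic in the table above can be checked numerically. The following is a minimal sketch in plain Python and is not part of the mark scheme: the function name `pooled_t` and the variable names are illustrative assumptions, and the critical value 2.0739 is copied from the table rather than computed.

```python
import math

# Scores from the question: x = literacy, y = numeracy.
x = [23, 42, 43, 46, 48, 48, 50, 54, 58, 59, 62, 65]
y = [44, 36, 63, 55, 53, 58, 63, 80, 61, 57, 83, 54]


def pooled_t(a, b):
    """Return (t statistic, pooled variance, degrees of freedom)
    for a two-sample t test that assumes a common variance."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    # Corrected sums of squares: sum(v^2) - (sum v)^2 / n
    ss_a = sum(v * v for v in a) - sum(a) ** 2 / n_a
    ss_b = sum(v * v for v in b) - sum(b) ** 2 / n_b
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_b - mean_a) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, pooled_var, df


t_stat, pooled_var, df = pooled_t(x, y)

# Two-tailed 5% critical value for t with 22 d.f., taken from the mark scheme.
CRITICAL = 2.0739
reject_h0 = abs(t_stat) > CRITICAL

print(f"pooled variance = {pooled_var:.3f}, t = {t_stat:.4f}, df = {df}")
print("reject H0" if reject_h0 else "no reason to reject H0")
```

The statistic is computed as $\bar{y}-\bar{x}$ over the pooled standard error, matching the sign convention used in the table.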

## Part (ii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| Scores in the underlying population distributed Normally | B1 | |
| With common variance | B1 | |
| Wilcoxon rank sum test (or Mann-Whitney 2 sample test) | B1 | |
| $H_0$: literacy scores and numeracy scores have the same distribution | B1 | Accept hypotheses stated as same median ($H_0$) and different medians ($H_1$) |
| $H_1$: literacy scores and numeracy scores have the same distribution but for a shift in location | B1 | |
| The $t$ test will be more powerful because | B1 | |
| it uses the magnitudes of the data rather than just their ranks | B1 | |
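
To illustrate what "using just the ranks" means, the rank sums for the Wilcoxon rank sum test can be computed from the data in the question. This is a hedged sketch only: the question does not ask for this calculation, the helper name `average_ranks` is an assumption, and tied scores are given the average of the ranks they occupy. No critical value is quoted here, since it would have to be read from the appropriate tables.

```python
# Scores from the question: x = literacy, y = numeracy.
x = [23, 42, 43, 46, 48, 48, 50, 54, 58, 59, 62, 65]
y = [44, 36, 63, 55, 53, 58, 63, 80, 61, 57, 83, 54]


def average_ranks(values):
    """Rank values from 1 (smallest) upwards, giving tied values
    the mean of the ranks they jointly occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


combined = x + y
ranks = average_ranks(combined)
w_x = sum(ranks[: len(x)])   # rank sum for the literacy group
w_y = sum(ranks[len(x):])    # rank sum for the numeracy group

n = len(combined)
print(f"W_x = {w_x}, W_y = {w_y}, check total = {n * (n + 1) / 2}")
```

The two rank sums always add up to $n(n+1)/2$ for $n$ pooled observations, which gives a quick check on the ranking. The actual score values play no further part once the ranks are assigned, which is the information the $t$ test retains and the rank test discards.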

## Part (iii)
| Answer | Marks | Guidance |
|--------|-------|----------|
| In a paired sample design, all the students in the sample would do both assessments | B1 | This part is entirely descriptive; marks should be awarded accordingly |
| The order in which the students do the assessments should be randomised and/or blocked for balance | B1 | |
| The data used in the test would be the differences in their scores | B1 | |
| A single sample $t$ test (or Wilcoxon if Normality cannot be assumed) would be used | B1 | |
| This would be better than the two sample design used because the variation between students would be factored out | B1 | |
| The design would therefore be more sensitive to differences between literacy and numeracy | B1 | |
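
As a sketch of the analysis described above (the symbols $d_i$, $\bar{d}$, $s_d$ and $\mu_d$ are notation introduced here and do not appear in the mark scheme): if each of the $n$ sampled students takes both assessments, giving scores $x_i$ and $y_i$, the single sample $t$ test is applied to the differences.

$$d_i = y_i - x_i, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(d_i-\bar{d}\right)^2, \qquad t = \frac{\bar{d}}{s_d/\sqrt{n}}$$

Under $H_0: \mu_d = 0$, and assuming the differences are Normally distributed, $t$ is referred to the $t_{n-1}$ distribution. Since $\operatorname{Var}(d) = \operatorname{Var}(x) + \operatorname{Var}(y) - 2\operatorname{Cov}(x, y)$, a positive correlation between a student's two scores makes the differences less variable than the individual scores, which is one way of seeing why between-student variation is factored out.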

---
3 A large department in a university wished to compare the standards of literacy and numeracy of its students. A random sample of 24 students was taken and sub-divided, randomly, into two groups of 12. The students in one group took a literacy assessment (scores denoted by $x$); the students in the other group took a numeracy assessment (scores denoted by $y$). The two assessments were designed to give the same distributions of scores when taken by random samples from the general population.

The scores obtained by the students on the two assessments are shown in the table.

\begin{center}
\begin{tabular}{ | l | l | l | l | l | l | l | l | l | l | l | l | l | }
\hline
$x$ & 23 & 42 & 43 & 46 & 48 & 48 & 50 & 54 & 58 & 59 & 62 & 65 \\
\hline
$y$ & 44 & 36 & 63 & 55 & 53 & 58 & 63 & 80 & 61 & 57 & 83 & 54 \\
\hline
\end{tabular}
\end{center}

$$\sum x = 598 \quad \sum x ^ { 2 } = 31196 \quad \sum y = 707 \quad \sum y ^ { 2 } = 43543$$

\begin{enumerate}[label=(\roman*)]
\item Carry out an appropriate $t$ test, at the $5 \%$ level of significance, to compare the standards of literacy and numeracy.
\item State the distributional assumptions required for the $t$ test to be valid.

Name the test that you would use if the assumptions required for the $t$ test are thought not to hold. State the hypotheses for this new test.

Explain, in general terms, which of the two tests is more powerful, and why.

A statistician at the university looked at the data and commented that a paired sample design would have been better.
\item Explain how a paired sample design would be applied in this context, and how the data would be analysed. Explain also why it would be better than the design used.
\end{enumerate}

\hfill \mbox{\textit{OCR MEI S4 2016 Q3 [24]}}