OCR H240/02 2019 June — Question 11 8 marks

Exam Board: OCR
Module: H240/02 (Pure Mathematics and Statistics)
Year: 2019
Session: June
Marks: 8
Topic: Hypothesis test of Pearson’s product-moment correlation coefficient
Type: Describe correlation from scatter diagram
Difficulty: Moderate -0.8. This question tests basic interpretation of scatter diagrams and understanding of correlation hypothesis testing. Part (a) requires simple ratio estimation from a graph; part (b)(i) is a straightforward table lookup; part (b)(ii) tests understanding that biased sampling affects validity (a standard critique); and parts (c) and (d) require simple contextual interpretation. Although it is a multi-part question, each component involves routine application of statistical concepts without requiring novel insight or complex reasoning.
Spec: 5.08a Pearson correlation: calculate pmcc; 5.08d Hypothesis test: Pearson correlation


# Question 11(a):
| Answer | Marks | Guidance |
| --- | --- | --- |
| $k > 1.4$ (allow $k > 1.1$ to $1.6$); $k < 0.25$ (allow $k < 0.2$ to $0.3$) | B1, B1 [2] | Allow $\geq$ and $\leq$; SC: $0.25 < k < 1.4$: B1B0 (ranges as on left); Allow "$x$" |

# Question 11(b)(i):
| Answer | Marks | Guidance |
| --- | --- | --- |
| $0.797 > 0.5577$ or $-0.797 < -0.5577$ or $|-0.797| > 0.5577$ | B2 [2] | $0.797 > 0.6055$ or $-0.797 < -0.6055$ B1; $\pm 0.5577$ B1; Allow $\geq$ or $\leq$ |
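
The comparison above can be laid out as a short worked test. This is a sketch of the reasoning rather than an official model answer: because $\mathrm{H}_1$ specifies negative correlation, the 1-tail column of the critical-value table is used, with $n = 17$.

```latex
\begin{align*}
n &= 17, \quad \text{1-tail test at } 1\% \;\Rightarrow\; \text{critical value } 0.5577 \\
r &= -0.797 < -0.5577 \quad \bigl(\text{equivalently } |r| = 0.797 > 0.5577\bigr) \\
&\Rightarrow\ \text{reject } \mathrm{H}_0 \text{ at the } 1\% \text{ level}
\end{align*}
```

Reading the 2-tail $1\%$ column instead gives $0.6055$, which the guidance credits with B1 only.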

# Question 11(b)(ii):
| Answer | Marks | Guidance |
| --- | --- | --- |
| There are clusters (or groups etc.) | B1* | NOT: Too scattered; Not represent whole pop; Small sample |
| Apparent good correlation caused by clusters or two clusters with no $-$ve corr'n within them or a comment similar to one of the above. AND Conclusion: unreliable or Value of $r$ is misleading oe | B1 dep B1* [2] | or Not bivariate normal distribution B1; so use of tables for $r$ not valid B1; Clusters not on reg line B1B0 |
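
To see why two clusters alone can produce a large $|r|$, the following minimal sketch may help. The coordinates are invented for illustration and are not the 2011 census values. It also assumes the horizontal axis is the senior proportion and the vertical axis the young proportion, which the question text does not state outright. The helper `pearson_r` is likewise a hypothetical name, not part of the original material.

```python
from math import sqrt


def pearson_r(points):
    """Return Pearson's product-moment correlation coefficient for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


# Hypothetical percentages (senior %, young %), NOT the 2011 census values.
# Each cluster is a small square of points, so r within it is zero.
top_left = [(10, 20), (10, 22), (12, 20), (12, 22)]
bottom_right = [(26, 5), (26, 7), (28, 5), (28, 7)]

r_top_left = pearson_r(top_left)
r_bottom_right = pearson_r(bottom_right)
r_overall = pearson_r(top_left + bottom_right)

print(r_top_left, r_bottom_right)  # no correlation inside either cluster
print(round(r_overall, 2))         # strongly negative once the clusters are pooled
```

With these made-up points, neither cluster shows any correlation on its own, yet the pooled value is close to $-1$. This is the sense in which the mark scheme calls the value of $r$ misleading.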

# Question 11(c):
| Answer | Marks | Guidance |
| --- | --- | --- |
| High prop of 65+ or Low prop of 18–24; Prop of young very similar, or $\approx 0.06$; Proportion of senior to young is high | B1 [1] | If consider only one age-group, must be proportion not number; If consider both age-groups, allow e.g. Higher number of seniors than young or Many seniors, few young; NOT: Similar proportions of 65+; Population is elderly |

# Question 11(d):
| Answer | Marks | Guidance |
| --- | --- | --- |
| Top left points contain high prop of 18–24s. (So these LAs may be areas where there are universities or where they can recruit) | B1 [1] | Shows places where large nos of 18–24s; Shows where to focus recruiting; So universities can recruit; 18–24s are their target group; No need to specify "Top left group"; Allow "students" or "young" instead of "18–24s"; Any implication that diagram enables you to see information about location of young people |

---
11 A trainer was asked to give a lecture on population profiles in different Local Authorities (LAs) in the UK. Using data from the 2011 census, he created the following scatter diagram for 17 selected LAs.

\begin{figure}[h]
\begin{center}
\captionsetup{labelformat=empty}
\caption{17 Selected Local Authorities}
  \includegraphics[alt={},max width=\textwidth]{1a0e0afb-81be-45d1-8c86-f98e508e9a49-08_560_897_466_246}
\end{center}
\end{figure}

He selected the 17 LAs using the following method. The proportions of people aged 18 to 24 and aged 65+ in any Local Authority are denoted by $P _ { \text {young } }$ and $P _ { \text {senior } }$ respectively. The trainer used a spreadsheet to calculate the value of $k = \frac { P _ { \text {young } } } { P _ { \text {senior } } }$ for each of the 348 LAs in the UK. He then used specific ranges of values of $k$ to select the 17 LAs.
\begin{enumerate}[label=(\alph*)]
\item Estimate the ranges of values of $k$ that he used to select these 17 LAs.
\item Using the 17 LAs the trainer carried out a hypothesis test with the following hypotheses.\\
$\mathrm { H } _ { 0 }$ : There is no linear correlation in the population between $P _ { \text {young } }$ and $P _ { \text {senior } }$.\\
$\mathrm { H } _ { 1 }$ : There is negative linear correlation in the population between $P _ { \text {young } }$ and $P _ { \text {senior } }$.\\
He found that the value of Pearson's product-moment correlation coefficient for the 17 LAs is $-0.797$, correct to 3 significant figures.
\begin{enumerate}[label=(\roman*)]
\item Use the table on page 9 to show that this value is significant at the $1 \%$ level.

The trainer concluded that there is evidence of negative linear correlation between $P _ { \text {young } }$ and $P _ { \text {senior } }$ in the population.
\item Use the diagram to comment on the reliability of this conclusion.
\end{enumerate}
\item Describe one outstanding feature of the population in the areas represented by the points in the bottom right hand corner of the diagram.
\item The trainer's audience included representatives from several universities.

Suggest a reason why the diagram might be of particular interest to these people.

\begin{table}[h]
\begin{center}
\captionsetup{labelformat=empty}
\caption{Critical values of Pearson's product-moment correlation coefficient}
\begin{tabular}{|l|l|l|l|l|}
\hline
1-tail test & 5\% & 2.5\% & 1\% & 0.5\% \\
\hline
2-tail test & 10\% & 5\% & 2\% & 1\% \\
\hline
$n$ & \multicolumn{4}{|c|}{} \\
\hline
1 & - & - & - & - \\
\hline
2 & - & - & - & - \\
\hline
3 & 0.9877 & 0.9969 & 0.9995 & 0.9999 \\
\hline
4 & 0.9000 & 0.9500 & 0.9800 & 0.9900 \\
\hline
5 & 0.8054 & 0.8783 & 0.9343 & 0.9587 \\
\hline
6 & 0.7293 & 0.8114 & 0.8822 & 0.9172 \\
\hline
7 & 0.6694 & 0.7545 & 0.8329 & 0.8745 \\
\hline
8 & 0.6215 & 0.7067 & 0.7887 & 0.8343 \\
\hline
9 & 0.5822 & 0.6664 & 0.7498 & 0.7977 \\
\hline
10 & 0.5494 & 0.6319 & 0.7155 & 0.7646 \\
\hline
11 & 0.5214 & 0.6021 & 0.6851 & 0.7348 \\
\hline
12 & 0.4973 & 0.5760 & 0.6581 & 0.7079 \\
\hline
13 & 0.4762 & 0.5529 & 0.6339 & 0.6835 \\
\hline
14 & 0.4575 & 0.5324 & 0.6120 & 0.6614 \\
\hline
15 & 0.4409 & 0.5140 & 0.5923 & 0.6411 \\
\hline
16 & 0.4259 & 0.4973 & 0.5742 & 0.6226 \\
\hline
17 & 0.4124 & 0.4821 & 0.5577 & 0.6055 \\
\hline
18 & 0.4000 & 0.4683 & 0.5425 & 0.5897 \\
\hline
19 & 0.3887 & 0.4555 & 0.5285 & 0.5751 \\
\hline
20 & 0.3783 & 0.4438 & 0.5155 & 0.5614 \\
\hline
21 & 0.3687 & 0.4329 & 0.5034 & 0.5487 \\
\hline
22 & 0.3598 & 0.4227 & 0.4921 & 0.5368 \\
\hline
23 & 0.3515 & 0.4132 & 0.4815 & 0.5256 \\
\hline
24 & 0.3438 & 0.4044 & 0.4716 & 0.5151 \\
\hline
25 & 0.3365 & 0.3961 & 0.4622 & 0.5052 \\
\hline
26 & 0.3297 & 0.3882 & 0.4534 & 0.4958 \\
\hline
27 & 0.3233 & 0.3809 & 0.4451 & 0.4869 \\
\hline
28 & 0.3172 & 0.3739 & 0.4372 & 0.4785 \\
\hline
29 & 0.3115 & 0.3673 & 0.4297 & 0.4705 \\
\hline
30 & 0.3061 & 0.3610 & 0.4226 & 0.4629 \\
\hline
\end{tabular}
\end{center}
\end{table}

\end{enumerate}

\hfill \mbox{\textit{OCR H240/02 2019 Q11 [8]}}