OCR MEI Further Statistics Major 2019 June — Question 6 18 marks

Exam Board: OCR MEI
Module: Further Statistics Major
Year: 2019
Session: June
Marks: 18
Topic: Linear regression
Type: Interpret features of scatter diagram
Difficulty: Moderate -0.8. This is a straightforward linear regression calculation using provided summary statistics (sums of x, y, x², y², xy). Students need to apply standard formulas for regression coefficients with minimal algebraic manipulation. The context is clear and the computational work is routine for Further Statistics students, making it easier than average even for A-level.
Spec: 5.08a Pearson correlation: calculate pmcc; 5.09a Dependent/independent variables; 5.09b Least squares regression: concepts; 5.09c Calculate regression line; 5.09e Use regression: for estimation in context


Question 6:
6 | (a) | (i) | $b = \dfrac{S_{xy}}{S_{yy}} = \dfrac{116724 - (1131 \times 1227 / 12)}{126725 - (1227^{2} / 12)} = 0.8537$
Correct regression line is $x$ on $y$, so the equation is $x - \bar{x} = b(y - \bar{y})$
$x - 94.25 = 0.8537(y - 102.25)$
$\Rightarrow x = 0.8537y + 6.962$ | M1
A1
B1
DM1
A1
[5] | For attempt at gradient ($b$)
For 0.8537 cao
May be implied by correct form of equation
FT provided first M1 earned
CAO. Accept either form
Allow M1M1 for $y$ on $x$ regression line with $b = 0.9098$ | $S_{xy} = 1079.25$
$S_{yy} = 1264.25$
$b = 0.853668\ldots$
Allow $b = \dfrac{4317}{5057}$
Allow constant between 6.9 and 7.0
6 | (a) | (ii) | Prediction for 95 is 88
Prediction for 60 is 58 | B1FT
B1FT
[2] | If answers given to more than 1 decimal place then MAX B1B0 as these are estimates | FT only if any reasonable $x$ on $y$ line
6 | (a) | (iii) | Because the points do not lie very close to the line, the first prediction is only moderately reliable.
The second prediction is rather less reliable because in addition it is extrapolation | E1
E1
[2] | Allow 1 mark for either ‘not very close to line and so not very reliable’ or for ‘second value is extrapolation so unreliable’.
6 | (b) | (i) | The shape of the scatter diagram is very approximately elliptical, so bivariate Normality is possible | B1
B1
[2] | For identifying ‘elliptical’ shape
For conclusion about ‘bivariate Normal’ | Do NOT allow ‘data is bivariate Normal’ but can get first mark
6 | (b) | (ii) | PMCC $= -0.5638$ | B1
B1
[2] | NB B1 for $+0.5638$ or for answer given to less than 3 dp
6 | (b) | (iii) | $H_0: \rho = 0$, $H_1: \rho \neq 0$ (two-tailed test)
where $\rho$ is the population correlation coefficient between temperature and rainfall
For $n = 10$, 5% critical value = 0.6319
Since $|-0.5640| < 0.6319$ the result is not significant
There is insufficient evidence to reject $H_0$
There is insufficient evidence at the 5% level to suggest that there is correlation between temperature and rainfall | B1
B1
B1
M1
A1
[5] | For both hypotheses
Allow $r$ if defined
For defining $\rho$
For critical value
For comparison leading to a conclusion
NB M0 for $-0.5640 < 0.6319$
Do NOT allow M1 for incorrect conclusion
FT for conclusion in context provided critical value is correct
Do NOT allow ‘there is evidence to suggest that there is no correlation between temperature and rainfall’ | Only penalise lack of context once.
For hypotheses in words allow both marks if population and context mentioned, but zero if no mention of population
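
The arithmetic behind parts (a)(i) and (a)(ii) can be checked numerically. The following is a minimal Python sketch, not part of the official mark scheme: it uses only the totals from the "Sum" row of Fig. 6.2, and the variable names and the helper `predict_land_date` are illustrative assumptions.

```python
# Totals copied from the "Sum" row of Fig. 6.2 (n = 12 locations).
n = 12
sum_x, sum_y = 1131, 1227          # land-based (x) and satellite (y) totals
sum_yy, sum_xy = 126725, 116724    # sum of y^2 and sum of x*y

# Corrected sums of products and squares.
s_xy = sum_xy - sum_x * sum_y / n
s_yy = sum_yy - sum_y ** 2 / n

# x-on-y least squares line x = a + b*y, since x is estimated from y.
b = s_xy / s_yy
a = sum_x / n - b * sum_y / n


def predict_land_date(satellite_days):
    """Estimate the land-based date x (days) from a satellite value y (days)."""
    return a + b * satellite_days


print(f"S_xy = {s_xy}, S_yy = {s_yy}")
print(f"x = {b:.4f} y + {a:.3f}")
for y in (95, 60):
    print(f"y = {y}: estimated x is about {round(predict_land_date(y))} days")
```

Rounding the two predictions to whole days matches the mark scheme's remark that the answers are estimates. The value y = 60 lies well below the satellite values visible in Fig. 6.2 (97 to 102), so that prediction relies on extrapolation.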
6
\begin{enumerate}[label=(\alph*)]
\item A researcher is investigating the date of the 'start of spring' at different locations around the country.\\
A suitable date (measured in days from the start of the year) can be identified by checking, for example, when buds first appear for certain species of trees and plants, but this is time-consuming and expensive. Satellite data, measuring microwave emissions, can alternatively be used to estimate the date that land-based measurements would give.

The researcher chooses a random sample of 12 locations, and obtains land-based measurements for the start of spring date at each location, together with relevant satellite measurements. The scatter diagram in Fig. 6.1 shows the results; the land-based measurements are denoted by $x$ days and the corresponding values derived from satellite measurements by $y$ days.

\begin{figure}[h]
\begin{center}
  \includegraphics[alt={},max width=\textwidth]{3a89edc4-ac93-4691-ade8-4d4665b55202-06_732_1342_781_333}
\captionsetup{labelformat=empty}
\caption{Fig. 6.1}
\end{center}
\end{figure}

Fig. 6.2 shows part of a spreadsheet used to analyse the data. Some rows of the spreadsheet have been deliberately omitted.

\begin{table}[h]
\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|}
\hline
 & A & B & C & D & E & F \\
\hline
1 &  & $\boldsymbol{x}$ & $\boldsymbol{y}$ & $\boldsymbol{x}^{\mathbf{2}}$ & $\boldsymbol{y}^{\mathbf{2}}$ & $\boldsymbol{xy}$ \\
\hline
2 &  & 90 & 102 & 8100 & 10404 & 9180 \\
\hline
3 & \multicolumn{6}{|c|}{} \\
\hline
10 &  &  &  &  &  &  \\
\hline
11 &  &  &  &  &  &  \\
\hline
12 &  & 94 & 97 & 8836 & 9409 & 9118 \\
\hline
13 &  & 99 & 101 & 9801 & 10201 & 9999 \\
\hline
14 & Sum & 1131 & 1227 & 107783 & 126725 & 116724 \\
\hline
15 &  &  &  &  &  &  \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Fig. 6.2}
\end{center}
\end{table}
\begin{enumerate}[label=(\roman*)]
\item Calculate the equation of a regression line suitable for estimating the land-based date of the start of spring from satellite measurements.
\item Using this equation, estimate the land-based date of the start of spring for the following dates from satellite measurements.

\begin{itemize}
  \item 95 days
  \item 60 days
\end{itemize}
\item Comment on the reliability of each of your estimates.
\end{enumerate}
\item The researcher is also investigating whether there is any correlation between the average temperature during a month in spring and the total rainfall during that month at a particular location. The average temperatures in degrees Celsius and total rainfall in mm for a random selection, over several years, of 10 spring months at this location are as follows.

\begin{center}
\begin{tabular}{ | l | c | c | c | c | c | c | c | c | c | c | }
\hline
Temperature & 4.2 & 7.1 & 5.6 & 3.5 & 8.6 & 6.5 & 2.7 & 5.9 & 6.7 & 4.1 \\
\hline
Rainfall & 18 & 26 & 42 & 76 & 15 & 43 & 84 & 53 & 66 & 36 \\
\hline
\end{tabular}
\end{center}

The researcher plots the scatter diagram shown in Fig. 6.3 to check which type of test to carry out.

\begin{figure}[h]
\begin{center}
  \includegraphics[alt={},max width=\textwidth]{3a89edc4-ac93-4691-ade8-4d4665b55202-07_693_880_1174_338}
\captionsetup{labelformat=empty}
\caption{Fig. 6.3}
\end{center}
\end{figure}
\begin{enumerate}[label=(\roman*)]
\item Explain why the researcher might come to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.
\item Find the value of Pearson's product moment correlation coefficient.
\item Carry out a test at the $5 \%$ significance level to investigate whether there is any correlation between temperature and rainfall.
\end{enumerate}\end{enumerate}

\hfill \mbox{\textit{OCR MEI Further Statistics Major 2019 Q6 [18]}}
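
For part (b), the sample correlation coefficient can be computed directly from the ten data pairs in the question. The Python sketch below is one way to do this under stated assumptions: the helper name `pmcc` is illustrative, and the 5% two-tailed critical value for n = 10 is taken as 0.6319, the figure quoted in the mark scheme, rather than derived here.

```python
from math import sqrt

# Data from the table in part (b): 10 spring months at one location.
temperature = [4.2, 7.1, 5.6, 3.5, 8.6, 6.5, 2.7, 5.9, 6.7, 4.1]  # degrees Celsius
rainfall = [18, 26, 42, 76, 15, 43, 84, 53, 66, 36]               # mm


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pmcc(temperature, rainfall)

# Assumed critical value: 5% two-tailed, n = 10, as quoted in the mark scheme.
CRITICAL_VALUE = 0.6319

# Two-tailed test of H0: rho = 0 against H1: rho != 0, so compare |r|.
reject_h0 = abs(r) > CRITICAL_VALUE

print(f"r = {r:.4f}")
print("Reject H0" if reject_h0 else "Insufficient evidence to reject H0")
```

The comparison uses the absolute value of r because the test is two-tailed. Comparing a negative r directly with the positive critical value would give a true inequality without being a valid test.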