| Exam Board | OCR MEI |
|---|---|
| Module | Further Statistics Major |
| Year | 2019 |
| Session | June |
| Marks | 18 |
| Topic | Linear regression |
| Type | Interpret features of scatter diagram |
| Difficulty | Moderate (-0.8). This is a straightforward linear regression calculation using provided summary statistics (sums of x, y, x², y², xy). Students apply standard formulas for the regression coefficients with minimal algebraic manipulation. The context is clear and the computational work is routine for Further Statistics students, making the question easier than average even by A-level standards. |
| Spec | 5.08a Pearson correlation: calculate PMCC; 5.09a Dependent/independent variables; 5.09b Least squares regression: concepts; 5.09c Calculate regression line; 5.09e Use regression: for estimation in context |
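Question 6:

| Part | Answer | Marks | Guidance |
|---|---|---|---|
| 6 (a) (i) | \( b = \dfrac{S_{xy}}{S_{yy}} = \dfrac{116724 - (1131 \times 1227 / 12)}{126725 - (1227^{2} / 12)} \) | M1 | For attempt at gradient (\(b\)). \( S_{xy} = 1079.25 \), \( S_{yy} = 1264.25 \) |
| | \( b = 0.8537 \) | A1 | For 0.8537 cao. \( b = 0.853668\ldots \); allow \( b = \frac{4317}{5057} \) |
| | Correct regression line is \(x\) on \(y\) | B1 | May be implied by correct form of equation |
| | so equation is \( x - \bar{x} = b(y - \bar{y}) \), i.e. \( x - 94.25 = 0.8537(y - 102.25) \) | DM1 | FT provided first M1 earned |
| | \( x = 0.8537y + 6.962 \) | A1 | CAO. Accept either form. Allow constant between 6.9 and 7.0 |
| | | [5] | Allow M1M1 for \(y\) on \(x\) regression line with \( b = 0.9098 \) |
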
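
The gradient and intercept in part (a)(i) can be checked directly from the column sums in Fig. 6.2. The following is a minimal sketch, assuming the sums in row 14 of the spreadsheet are correct; the variable names are illustrative and not part of the mark scheme.

```python
from fractions import Fraction

# Column sums from row 14 of the spreadsheet in Fig. 6.2 (n = 12 locations).
n = 12
sum_x, sum_y = 1131, 1227
sum_yy, sum_xy = 126725, 116724

# Corrected sums of products, kept exact by using Fraction.
s_xy = Fraction(sum_xy) - Fraction(sum_x * sum_y, n)
s_yy = Fraction(sum_yy) - Fraction(sum_y ** 2, n)

# x-on-y regression line x = a + b*y: gradient b = S_xy / S_yy,
# and the line passes through the point of means (y-bar, x-bar).
b = s_xy / s_yy
a = Fraction(sum_x, n) - b * Fraction(sum_y, n)

print(float(s_xy), float(s_yy))  # 1079.25 1264.25
print(b, round(float(b), 4))     # 4317/5057 0.8537
print(round(float(a), 3))        # 6.962
```

Dividing by \(S_{yy}\) rather than \(S_{xx}\) is what makes this the \(x\) on \(y\) line, which is the appropriate choice when \(x\) is to be estimated from \(y\).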
| Part | Answer | Marks | Guidance |
|---|---|---|---|
| 6 (a) (ii) | Prediction for 95 is 88 | B1FT | If answers given to more than 1 decimal place then MAX B1B0 as these are estimates. FT only if any reasonable \(x\) on \(y\) line |
| | Prediction for 60 is 58 | B1FT | |
| | | [2] | |

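
The two predictions in part (a)(ii) follow from substituting the satellite values into the fitted line. A short sketch, assuming the rounded coefficients 0.8537 and 6.962 quoted in part (a)(i); the function name `estimate_land_date` is hypothetical.

```python
# Rounded coefficients of the x-on-y line quoted in part (a)(i).
GRADIENT = 0.8537
INTERCEPT = 6.962


def estimate_land_date(satellite_days):
    """Estimate the land-based date x (days) from a satellite value y (days)."""
    return GRADIENT * satellite_days + INTERCEPT


for y in (95, 60):
    x_hat = estimate_land_date(y)
    # Report to the nearest day, since these are only estimates.
    print(y, round(x_hat, 2), round(x_hat))
# 95 88.06 88
# 60 58.18 58
```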
| Part | Answer | Marks | Guidance |
|---|---|---|---|
| 6 (a) (iii) | Because the points do not lie very close to the line, the first prediction is only moderately reliable. | E1 | Allow 1 mark for either "not very close to line and so not very reliable" or "second value is extrapolation so unreliable" |
| | The second prediction is rather less reliable because, in addition, it is an extrapolation. | E1 | |
| | | [2] | |

| Part | Answer | Marks | Guidance |
|---|---|---|---|
| 6 (b) (i) | The shape of the scatter diagram is very approximately elliptical, | B1 | For identifying ‘elliptical’ shape |
| | so bivariate Normality is possible. | B1 | For conclusion about ‘bivariate Normal’. Do NOT allow ‘data is bivariate Normal’, but this can still get the first mark |
| | | [2] | |

| Part | Answer | Marks | Guidance |
|---|---|---|---|
| 6 (b) (ii) | PMCC \( = -0.5638 \) | B1 B1 | NB B1 for \(+0.5638\) or for answer given to less than 3 dp |
| | | [2] | |

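
The coefficient in part (b)(ii) can be reproduced from the temperature and rainfall table in the question. The sketch below is one way to do it, assuming the ten data pairs are transcribed correctly; the helper `pmcc` is illustrative. The calculation gives a negative value, which is consistent with the guidance that \(+0.5638\) earns only B1.

```python
from math import sqrt

# Data from the table in part (b): ten spring months at one location.
temperature = [4.2, 7.1, 5.6, 3.5, 8.6, 6.5, 2.7, 5.9, 6.7, 4.1]  # degrees Celsius
rainfall = [18, 26, 42, 76, 15, 43, 84, 53, 66, 36]               # mm


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pmcc(temperature, rainfall)
print(round(r, 4))  # -0.5638
```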
| Part | Answer | Marks | Guidance |
|---|---|---|---|
| 6 (b) (iii) | \( H_0: \rho = 0 \), \( H_1: \rho \neq 0 \) (two-tailed test) | B1 | For both hypotheses. Allow \(r\) if defined |
| | where \( \rho \) is the population correlation coefficient between temperature and rainfall | B1 | For defining \( \rho \). For hypotheses in words allow both marks if population and context mentioned, but zero if no mention of population |
| | For \( n = 10 \), 5% critical value \( = 0.6319 \) | B1 | For critical value |
| | Since \( \lvert -0.5638 \rvert < 0.6319 \) the result is not significant. There is insufficient evidence to reject \( H_0 \). | M1 | For comparison leading to a conclusion. NB M0 for \( -0.5638 < 0.6319 \). Do NOT allow M1 for incorrect conclusion |
| | There is insufficient evidence at the 5% level to suggest that there is correlation between temperature and rainfall. | A1 | FT for conclusion in context provided critical value is correct. Do NOT allow ‘there is evidence to suggest that there is no correlation between temperature and rainfall’. Only penalise lack of context once |
| | | [5] | |

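
The decision step in part (b)(iii) compares the magnitude of the sample coefficient with the tabulated critical value. A minimal sketch, assuming the sample coefficient is \(r = -0.5638\) (negative, as direct calculation from the data table suggests) and taking the critical value 0.6319 as quoted in the mark scheme rather than computing it; the function name `reject_h0` is hypothetical.

```python
R_OBSERVED = -0.5638     # sample PMCC from part (b)(ii), assumed negative
CRITICAL_VALUE = 0.6319  # 5% two-tailed critical value for n = 10, as quoted


def reject_h0(r, critical_value):
    """Two-tailed test of H0: rho = 0. Reject only if |r| exceeds the critical value."""
    return abs(r) > critical_value


significant = reject_h0(R_OBSERVED, CRITICAL_VALUE)
print("Reject H0" if significant else "Insufficient evidence to reject H0")
# Insufficient evidence to reject H0
```

Taking the absolute value matters here: for a two-tailed test, comparing the signed value \(-0.5638\) with \(0.6319\) directly is not a valid comparison, which is why the guidance awards M0 for it.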
6
\begin{enumerate}[label=(\alph*)]
\item A researcher is investigating the date of the `start of spring' at different locations around the country.\\
A suitable date (measured in days from the start of the year) can be identified by checking, for example, when buds first appear for certain species of trees and plants, but this is time-consuming and expensive. Satellite data, measuring microwave emissions, can alternatively be used to estimate the date that land-based measurements would give.
The researcher chooses a random sample of 12 locations, and obtains land-based measurements for the start of spring date at each location, together with relevant satellite measurements. The scatter diagram in Fig. 6.1 shows the results; the land-based measurements are denoted by $x$ days and the corresponding values derived from satellite measurements by $y$ days.
\begin{figure}[h]
\begin{center}
\includegraphics[alt={},max width=\textwidth]{3a89edc4-ac93-4691-ade8-4d4665b55202-06_732_1342_781_333}
\captionsetup{labelformat=empty}
\caption{Fig. 6.1}
\end{center}
\end{figure}
Fig. 6.2 shows part of a spreadsheet used to analyse the data. Some rows of the spreadsheet have been deliberately omitted.
\begin{table}[h]
\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|}
\hline
 & A & B & C & D & E & F \\
\hline
1 & & $\boldsymbol{x}$ & $\boldsymbol{y}$ & $\boldsymbol{x}^{\mathbf{2}}$ & $\boldsymbol{y}^{\mathbf{2}}$ & $\boldsymbol{xy}$ \\
\hline
2 & & 90 & 102 & 8100 & 10404 & 9180 \\
\hline
3 & \multicolumn{6}{|c|}{} \\
\hline
10 & & & & & & \\
\hline
11 & & & & & & \\
\hline
12 & & 94 & 97 & 8836 & 9409 & 9118 \\
\hline
13 & & 99 & 101 & 9801 & 10201 & 9999 \\
\hline
14 & Sum & 1131 & 1227 & 107783 & 126725 & 116724 \\
\hline
15 & & & & & & \\
\hline
\end{tabular}
\captionsetup{labelformat=empty}
\caption{Fig. 6.2}
\end{center}
\end{table}
\begin{enumerate}[label=(\roman*)]
\item Calculate the equation of a regression line suitable for estimating the land-based date of the start of spring from satellite measurements.
\item Using this equation, estimate the land-based date of the start of spring for the following dates from satellite measurements.
\begin{itemize}
\item 95 days
\item 60 days
\end{itemize}
\item Comment on the reliability of each of your estimates.
\end{enumerate}
\item The researcher is also investigating whether there is any correlation between the average temperature during a month in spring and the total rainfall during that month at a particular location. The average temperatures in degrees Celsius and total rainfall in mm for a random selection, over several years, of 10 spring months at this location are as follows.
\begin{center}
\begin{tabular}{ | l | c | c | c | c | c | c | c | c | c | c | }
\hline
Temperature & 4.2 & 7.1 & 5.6 & 3.5 & 8.6 & 6.5 & 2.7 & 5.9 & 6.7 & 4.1 \\
\hline
Rainfall & 18 & 26 & 42 & 76 & 15 & 43 & 84 & 53 & 66 & 36 \\
\hline
\end{tabular}
\end{center}
The researcher plots the scatter diagram shown in Fig. 6.3 to check which type of test to carry out.
\begin{figure}[h]
\begin{center}
\includegraphics[alt={},max width=\textwidth]{3a89edc4-ac93-4691-ade8-4d4665b55202-07_693_880_1174_338}
\captionsetup{labelformat=empty}
\caption{Fig. 6.3}
\end{center}
\end{figure}
\begin{enumerate}[label=(\roman*)]
\item Explain why the researcher might come to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.
\item Find the value of Pearson's product moment correlation coefficient.
\item Carry out a test at the $5 \%$ significance level to investigate whether there is any correlation between temperature and rainfall.
\end{enumerate}\end{enumerate}
\hfill \mbox{\textit{OCR MEI Further Statistics Major 2019 Q6 [18]}}