| Exam Board | Edexcel |
|---|---|
| Module | AS Paper 2 |
| Session | Specimen |
| Marks | 7 |
| Topic | Bivariate data |
| Type | Analyze large data set correlations |
| Difficulty | Moderate -0.8. This is a straightforward AS-level statistics question testing standard procedures: identifying outliers using the IQR (simple arithmetic), interpreting correlation and gradient (bookwork), and commenting on sampling methods. All parts are routine applications of taught techniques with no problem-solving or novel insight required, making it easier than average. |
| Spec | 2.02c Scatter diagrams and regression lines; 2.02d Informal interpretation of correlation; 2.02h Recognize outliers |
## Question 4:
### Part (a):
| Answer | Marks | Guidance |
|---|---|---|
| $\text{IQR} = 2.3$ and $20.6 \gg 2.4 + 1.5 \times 2.3\ (= 5.85)$ (Compare correct values) | B1 | For sight of the correct calculation and suitable comparison with $20.6$ |
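
The arithmetic behind part (a) can be checked with a short sketch. It takes $Q_1 = 0.1$ and $Q_3 = 2.4$ as given in the question rather than recomputing them, since software quartile conventions may differ from the one used in the paper; the variable names are illustrative only.

```python
# Quartiles as stated in the question (assumed, not recomputed here).
q1, q3 = 0.1, 2.4

# Rainfall readings r (mm) for the 11 sampled days.
rainfall = [1.1, 0.3, 3.7, 20.6, 0, 0, 2.4, 1.1, 0.1, 0.9, 0.1]

iqr = q3 - q1                    # interquartile range
upper_fence = q3 + 1.5 * iqr     # outlier boundary above Q3

# Any reading above the fence counts as an outlier under the question's rule.
outliers = [r for r in rainfall if r > upper_fence]

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}, outliers = {outliers}")
```

Only the reading of 20.6 mm lies above the boundary, which is the comparison the mark scheme asks to see.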
### Part (b)(i):
| Answer | Marks | Guidance |
|---|---|---|
| e.g. It is a piece of data and we should consider all the data | B1 | For a suitable reason for including the data point |
### Part (b)(ii):
| Answer | Marks | Guidance |
|---|---|---|
| e.g. It is an extreme value and could unduly influence the analysis **or** It could be a mistake | B1 | For a suitable reason for excluding the data point |
### Part (c):
| Answer | Marks | Guidance |
|---|---|---|
| e.g. "as humidity increases rainfall increases" | B1 | For a suitable interpretation of positive correlation mentioning humidity and rainfall |
### Part (d):
| Answer | Marks | Guidance |
|---|---|---|
| e.g. a $10\%$ increase in humidity gives rise to a $1.5$ mm increase in rainfall **or** represents $0.15$ mm of rainfall per percentage of humidity | B1 | For a suitable description of the rate: rainfall per percentage of humidity including reference to values |
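
As a cross-check on the gradient in part (d), the sketch below fits a least-squares line of $r$ on $h$ to the 10 remaining days (the day with $h = 97$, $r = 20.6$ removed). It is a plain hand-rolled calculation, not necessarily how the original figures were produced, and the variable names are illustrative.

```python
# Data from the question's table with the outlier day (h = 97, r = 20.6) removed.
h = [93, 86, 95, 86, 94, 97, 97, 87, 97, 86]
r = [1.1, 0.3, 3.7, 0, 0, 2.4, 1.1, 0.1, 0.9, 0.1]

n = len(h)
h_mean = sum(h) / n
r_mean = sum(r) / n

# Sums of squares and cross-products about the means.
s_hh = sum((x - h_mean) ** 2 for x in h)
s_hr = sum((x - h_mean) * (y - r_mean) for x, y in zip(h, r))

gradient = s_hr / s_hh                      # mm of rainfall per 1% humidity
intercept = r_mean - gradient * h_mean

# The mark scheme's alternative wording: change in rainfall for a 10% rise in humidity.
rise_per_10_percent = 10 * gradient

print(f"r = {intercept:.1f} + {gradient:.2f} h")
print(f"10% more humidity -> about {rise_per_10_percent:.1f} mm more rainfall")
```

The fitted gradient is the rate the mark scheme wants described in context: rainfall in mm per percentage point of humidity.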
### Part (e)(i):
| Answer | Marks | Guidance |
|---|---|---|
| Not a good method since only uses 11 days from one location in one month | B1 | For a comment that supports the idea that her sampling method was not a good one |
### Part (e)(ii):
| Answer | Marks | Guidance |
|---|---|---|
| e.g. She should use data from more of the UK locations and more of the months **or** using a spreadsheet or computer package she could use all of the available UK data | B1 | Must show awareness of LDS having different UK locations and more months; must be clear NOT using overseas locations. **NB: B0** for comment saying use more than one location without specifying only UK locations are required |
---
\begin{enumerate}
\item Sara was studying the relationship between rainfall, $r \mathrm {~mm}$, and humidity, $h \%$, in the UK. She takes a random sample of 11 days from May 1987 for Leuchars from the large data set.
\end{enumerate}
She obtained the following results.
\begin{center}
\begin{tabular}{ | c | c | c | c | c | c | c | c | c | c | c | c | }
\hline
$h$ & 93 & 86 & 95 & 97 & 86 & 94 & 97 & 97 & 87 & 97 & 86 \\
\hline
$r$ & 1.1 & 0.3 & 3.7 & 20.6 & 0 & 0 & 2.4 & 1.1 & 0.1 & 0.9 & 0.1 \\
\hline
\end{tabular}
\end{center}
Sara examined the rainfall figures and found
$$Q_1 = 0.1 \quad Q_2 = 0.9 \quad Q_3 = 2.4$$
A value that is more than 1.5 times the interquartile range (IQR) above $Q_3$ is called an outlier.\\
(a) Show that $r = 20.6$ is an outlier.\\
(b) Give a reason why Sara might:\\
(i) include\\
(ii) exclude\\
this day's reading.
Sara decided to exclude this day's reading and drew the following scatter diagram for the remaining 10 days' values of $r$ and $h$.\\
\includegraphics[max width=\textwidth, alt={}, center]{8f3dbcb4-3260-4493-a230-12577b4ed691-08_988_1081_1555_420}
(c) Give an interpretation of the correlation between rainfall and humidity.
The equation of the regression line of $r$ on $h$ for these 10 days is $r = -12.8 + 0.15h$\\
(d) Give an interpretation of the gradient of this regression line.\\
(e) (i) Comment on the suitability of Sara's sampling method for this study.\\
(ii) Suggest how Sara could make better use of the large data set for her study.\\
\hfill \mbox{\textit{Edexcel AS Paper 2 Q4 [7]}}