| Exam Board | Edexcel |
|---|---|
| Module | S1 (Statistics 1) |
| Year | 2015 |
| Session | June |
| Marks | 6 |
| Topic | Bivariate data |
| Type | Draw scatter diagram from data |
| Difficulty | Easy (-1.8). A routine S1 question testing basic scatter diagram plotting and a standard correlation calculation from given summary statistics. Every part is a direct application of a formula or a simple interpretation, with no problem-solving required, making it significantly easier than the average A-level question. |
| Spec | 2.02c Scatter diagrams and regression lines; 2.02d Informal interpretation of correlation; 5.08a Pearson correlation: calculate pmcc; 5.08d Hypothesis test: Pearson correlation |
## Question 7:
### Part (a):
| Answer | Marks | Guidance |
|---|---|---|
| All 8 points correctly plotted (½ small square tolerance) | B1B1 (-1ee) | 7 points correct scores B1B0 |
### Part (b):
| Answer | Marks | Guidance |
|---|---|---|
| Comment that H is far away from other points | B1 | e.g. "H is an outlier/anomaly", "blood protein/$p$/70 for H is much higher than other patients", "H does not follow the linear pattern", "Data collected for H may be incorrect"; do not allow "H is not in range" on its own |
### Part (c):
| Answer | Marks | Guidance |
|---|---|---|
| $r = \frac{369}{\sqrt{423\frac{5}{7} \times 490}} = 0.809826...$ | M1A1 | M1 for correct expression; A1 awrt 0.810 (accept 0.81 if fully correct expression seen) |
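
The value above can be checked numerically. The following is a minimal sketch, not part of the mark scheme; the function name `pmcc` is an illustrative choice, and the three inputs are the summary statistics quoted in the question for patients $A$ to $G$.

```python
import math

# Summary statistics given in the question for patients A to G.
S_bp = 369
S_pp = 490
S_bb = 423 + 5 / 7  # the mixed number 423 5/7

def pmcc(s_xy, s_xx, s_yy):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    return s_xy / math.sqrt(s_xx * s_yy)

r = pmcc(S_bp, S_bb, S_pp)
print(f"r = {r:.4f}")  # agrees with the mark scheme value to the accuracy shown
```

Note that $423\frac{5}{7} \times 490 = 207620$ exactly, so the denominator is $\sqrt{207620}$.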
### Part (d):
| Answer | Marks | Guidance |
|---|---|---|
| $r$ would be closer to 0 | B1 | Allow "$r$ would be smaller/weaker correlation" |
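
Part (d) asks for no further calculation, so the following is only a check on the expected answer, not something a candidate would write. It is a sketch that recomputes $r$ from the raw data in the question table, once without patient $H$ and once with; the helper name `pmcc_from_data` is an illustrative choice.

```python
import math

# Raw data from the question table, patients A to H in order.
b = [32, 36, 40, 44, 42, 21, 27, 37]  # body mass index
p = [18, 21, 31, 39, 21, 12, 19, 70]  # blood protein

def pmcc_from_data(x, y):
    """Compute r via S_xy, S_xx and S_yy from paired observations."""
    n = len(x)
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum(y) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

r_without_H = pmcc_from_data(b[:-1], p[:-1])  # patients A to G only
r_with_H = pmcc_from_data(b, p)               # all 8 patients

print(f"without H: {r_without_H:.3f}, with H: {r_with_H:.3f}")
```

Including $H$ gives a noticeably smaller positive value of $r$, consistent with the answer "closer to 0".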
---
7. A doctor is investigating the correlation between blood protein, $p$, and body mass index, $b$.
He takes a random sample of 8 patients and the data are shown in the table below.
\begin{center}
\begin{tabular}{ | c | c | c | c | c | c | c | c | c | }
\hline
Patient & $A$ & $B$ & $C$ & $D$ & $E$ & $F$ & $G$ & $H$ \\
\hline
$b$ & 32 & 36 & 40 & 44 & 42 & 21 & 27 & 37 \\
\hline
$p$ & 18 & 21 & 31 & 39 & 21 & 12 & 19 & 70 \\
\hline
\end{tabular}
\end{center}
\begin{enumerate}[label=(\alph*)]
\item Draw a scatter diagram of these data on the axes provided.\\
\includegraphics[max width=\textwidth, alt={}, center]{36cf6341-1957-45b9-9f7d-0914506f5919-13_938_673_785_614}
The doctor decides to leave out patient $H$ from his calculations.
\item Give a reason for the doctor's decision.
For the 7 patients $A, B, C, D, E, F$ and $G$,
$$S_{bp} = 369, \quad S_{pp} = 490 \text{ and } S_{bb} = 423\frac{5}{7}$$
\item Find the product moment correlation coefficient, $r$, for these 7 patients.
\item Without any further calculations, state how $r$ would differ from your answer in part (c) if it was calculated for all 8 patients.
\begin{figure}[h]
\begin{center}
\includegraphics[alt={},max width=\textwidth]{36cf6341-1957-45b9-9f7d-0914506f5919-15_1322_1593_207_173}
\captionsetup{labelformat=empty}
\caption{Figure 1}
\end{center}
\end{figure}
The histogram in Figure 1 summarises the times, in minutes, that 200 people spent shopping in a supermarket.\\
(a) Give a reason to justify the use of a histogram to represent these data.
Given that 40 people spent between 11 and 21 minutes shopping in the supermarket, estimate\\
(b) the number of people that spent between 18 and 25 minutes shopping in the supermarket,\\
(c) the median time spent shopping in the supermarket by these 200 people.
The mid-point of each bar is represented by $x$ and the corresponding frequency by $\mathrm{f}$.\\
(d) Show that $\sum \mathrm{f}x = 6390$
Given that $\sum \mathrm{f}x^{2} = 238430$
\item for the data shown in the histogram, calculate estimates of
\begin{enumerate}[label=(\roman*)]
\item the mean,
\item the standard deviation.
\end{enumerate}
A coefficient of skewness is given by $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
\item Calculate this coefficient of skewness for these data.
The manager of the supermarket decides to model these data with a normal distribution.
\item Comment on the manager's decision. Give a justification for your answer.
\end{enumerate}
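
For the histogram parts above, the estimates of the mean and standard deviation follow directly from the totals quoted in the question. The sketch below assumes the usual S1 convention of dividing by the total frequency $n = 200$ (not $n - 1$); the variable names are illustrative. The median, and hence the skewness coefficient, needs the histogram itself and is not attempted here.

```python
import math

# Totals quoted in the question for the 200 shoppers.
n = 200
sum_fx = 6390
sum_fx2 = 238430

mean = sum_fx / n                   # estimate of the mean time
variance = sum_fx2 / n - mean ** 2  # assumes divisor n, not n - 1
sd = math.sqrt(variance)            # estimate of the standard deviation

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
```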
\hfill \mbox{\textit{Edexcel S1 2015 Q7 [6]}}