Identify response/explanatory variables

Edexcel S1 2015 January Q5

The resting heart rate, $h$ beats per minute (bpm), and average length of daily exercise, $t$ minutes, of a random sample of 8 teachers are shown in the table below.

$t$	20	35	40	25	45	70	75	90
$h$	88	85	77	75	71	66	60	54

State, with a reason, which variable is the response variable.
The equation of the least squares regression line of $h$ on $t$ is $$h = 93.5 - 0.43 t$$
Give an interpretation of the gradient of this regression line.
Find the value of $\bar { t }$ and the value of $\bar { h }$.
Show that the point $( \bar { t } , \bar { h } )$ lies on the regression line.
Estimate the resting heart rate of a teacher with an average length of daily exercise of 1 hour.
Comment, giving a reason, on the reliability of the estimate in part (e).
The resting heart rate of teachers is assumed to be normally distributed with mean 73 bpm and standard deviation 8 bpm. The middle $95 \%$ of resting heart rates of teachers lies between $a$ and $b$.
Find the value of $a$ and the value of $b$.
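The numerical parts of this question can be checked with a short script. The following is a minimal Python sketch, not a model answer: it assumes the table values and the quoted line $h = 93.5 - 0.43t$, takes 1 hour of exercise as $t = 60$, and uses the standard library's `NormalDist` for the middle 95% interval. The names `predict`, `t_bar`, `h_bar`, `estimate`, `a` and `b` are illustrative choices.

```python
from statistics import NormalDist, mean

# Data from the table: exercise time t (minutes), resting heart rate h (bpm)
t = [20, 35, 40, 25, 45, 70, 75, 90]
h = [88, 85, 77, 75, 71, 66, 60, 54]


def predict(t_minutes):
    """Regression line of h on t, as quoted in the question."""
    return 93.5 - 0.43 * t_minutes


t_bar = mean(t)
h_bar = mean(h)

# The mean point should satisfy the regression equation (up to rounding error)
on_line = abs(predict(t_bar) - h_bar) < 1e-9

# 1 hour of daily exercise is t = 60, which lies inside the observed range 20 to 90
estimate = predict(60)

# Middle 95% of N(73, 8^2): cut off 2.5% in each tail
dist = NormalDist(mu=73, sigma=8)
a = dist.inv_cdf(0.025)
b = dist.inv_cdf(0.975)

print(t_bar, h_bar, on_line)
print(round(estimate, 1))
print(round(a, 2), round(b, 2))
```

Because $t = 60$ is within the range of the sampled values, the estimate is an interpolation, which is the usual basis for commenting on its reliability.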

CAIE FP2 2012 November Q8

The yield of a particular crop on a farm is thought to depend principally on the amount of sunshine during the growing season. For a random sample of 8 years, the average yield, $y$ kilograms per square metre, and the average amount of sunshine per day, $x$ hours, are recorded. The results are given in the following table.

$x$	12.2	10.4	5.2	6.3	11.8	10.0	14.2	2.3
$y$	15	9	10	7	8	11	12	6

$$\left[ \Sigma x = 72.4 , \Sigma x ^ { 2 } = 769.9 , \Sigma y = 78 , \Sigma y ^ { 2 } = 820 , \Sigma x y = 761.3 . \right]$$

Find the equation of the regression line of $y$ on $x$.
Find the product moment correlation coefficient.
Test, at the $5 \%$ significance level, whether there is positive correlation between the average yield and the average amount of sunshine per day.
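The regression line and the product moment correlation coefficient both follow from the quoted summary totals. The following is a hedged Python sketch of that arithmetic, assuming the usual definitions $S_{xx} = \Sigma x^2 - (\Sigma x)^2/n$, $S_{yy} = \Sigma y^2 - (\Sigma y)^2/n$ and $S_{xy} = \Sigma xy - \Sigma x \Sigma y / n$. The variable names are illustrative.

```python
from math import sqrt

# Summary totals quoted in the question (n = 8 years)
n = 8
sum_x, sum_x2 = 72.4, 769.9
sum_y, sum_y2 = 78, 820
sum_xy = 761.3

# Corrected sums of squares and products
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Regression line of y on x: y = a + b x, passing through the mean point
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Product moment correlation coefficient
r = s_xy / sqrt(s_xx * s_yy)

print(f"y = {a:.3f} + {b:.3f} x")
print(f"r = {r:.3f}")
```

For the hypothesis test, the computed `r` would be compared with the one-tailed 5% critical value for $n = 8$ read from the supplied statistical tables; that value is deliberately not hard-coded here.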

OCR Further Statistics AS 2018 June Q7

An environmentalist measures the mean concentration, $c$ milligrams per litre, of a particular chemical in a group of rivers, and the mean mass, $m$ pounds, of fish of a certain species found in those rivers. The results are given in the table.

$c$	1.94	1.78	1.62	1.51	1.52	1.4
$m$	6.5	7.2	7.4	7.6	8.3	9.7

State which, if either, of $m$ and $c$ is an independent variable.
Calculate the equation of the least squares regression line of $c$ on $m$.
State what effect, if any, there would be on your answer to part (ii) if the masses of the fish had been recorded in kilograms rather than pounds. ($1 \mathrm {~kg} \approx 2.2$ pounds.)
The data is illustrated in the scatter diagram. Explain what is meant by 'least squares', illustrating your answer using the copy of this diagram in the Printed Answer Booklet.
\includegraphics[max width=\textwidth, alt={}, center]{708e125e-43a8-40d8-94db-0ed80337d273-4_719_1043_961_513}
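The effect of a change of units on the regression line of $c$ on $m$ can be explored numerically. The following is a minimal Python sketch under the question's approximation $1 \mathrm{~kg} \approx 2.2$ pounds. The helper `least_squares` is a hypothetical function written for this illustration; it fits the line by minimising the sum of squared vertical distances from the points to the line.

```python
def least_squares(x, y):
    """Return (intercept, gradient) of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    gradient = s_xy / s_xx
    return y_bar - gradient * x_bar, gradient


# Data from the table: concentration c (mg per litre), fish mass m (pounds)
c = [1.94, 1.78, 1.62, 1.51, 1.52, 1.4]
m_lb = [6.5, 7.2, 7.4, 7.6, 8.3, 9.7]

# Same masses converted to kilograms with the question's approximation
m_kg = [m / 2.2 for m in m_lb]

a_lb, b_lb = least_squares(m_lb, c)
a_kg, b_kg = least_squares(m_kg, c)

print(f"pounds:    c = {a_lb:.3f} + ({b_lb:.4f}) m")
print(f"kilograms: c = {a_kg:.3f} + ({b_kg:.4f}) m")
print(f"gradient ratio = {b_kg / b_lb:.3f}")
```

In this sketch the intercept is unchanged and the gradient is multiplied by 2.2, since each mass value is divided by 2.2 while the $c$ values stay the same.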

OCR Further Statistics Specimen Q1

The table below shows the typical stopping distances $d$ metres for a particular car travelling at $v$ miles per hour.

$v$	20	30	40	50	60	70
$d$	13	24	36	52	72	94

State each of the following words that describe the variable $v$: Independent, Dependent, Controlled, Response.
Calculate the equation of the regression line of $d$ on $v$.
Use the equation found in part (ii) to estimate the typical stopping distance when this car is travelling at 45 miles per hour.
It is given that the product moment correlation coefficient for the data is 0.990 correct to three significant figures.
Explain whether your estimate found in part (iii) is reliable.

Edexcel S1 2008 June Q4

Crickets make a noise. The pitch, $v \mathrm { kHz }$, of the noise made by a cricket was recorded at 15 different temperatures, $t ^ { \circ } \mathrm { C }$. These data are summarised below.
$$\sum t ^ { 2 } = 10922.81 , \sum v ^ { 2 } = 42.3356 , \sum t v = 677.971 , \sum t = 401.3 , \sum v = 25.08$$

Find $S _ { t t } , S _ { v v }$ and $S _ { t v }$ for these data.
Find the product moment correlation coefficient between $t$ and $v$.
State, with a reason, which variable is the explanatory variable.
Give a reason to support fitting a regression model of the form $v = a + b t$ to these data.
Find the value of $a$ and the value of $b$. Give your answers to 3 significant figures.
Using this model, predict the pitch of the noise at $19 ^ { \circ } \mathrm { C }$.

Edexcel S1 2015 June Q4

Statistical models can provide a cheap and quick way to describe a real world situation.
Give two other reasons why statistical models are used.
A scientist wants to develop a model to describe the relationship between the average daily temperature, $x ^ { \circ } \mathrm { C }$, and her household's daily energy consumption, $y \mathrm { kWh }$, in winter. A random sample of the average daily temperature and her household's daily energy consumption are taken from 10 winter days and shown in the table.
$x$	-0.4	-0.2	0.3	0.8	1.1	1.4	1.8	2.1	2.5	2.6
$y$	28	30	26	25	26	27	26	24	22	21
$$\text { [You may use } \sum x ^ { 2 } = 24.76 \quad \sum y = 255 \quad \sum x y = 283.8 \quad \mathrm {~S} _ { x x } = 10.36 \text { ] }$$
Find $\mathrm { S } _ { x y }$ for these data.
Find the equation of the regression line of $y$ on $x$ in the form $y = a + b x$. Give the value of $a$ and the value of $b$ to 3 significant figures.
Give an interpretation of the value of $a$.
Estimate her household's daily energy consumption when the average daily temperature is $2 ^ { \circ } \mathrm { C }$.
The scientist wants to use the linear regression model to predict her household's energy consumption in the summer.
Discuss the reliability of using this model to predict her household's energy consumption in the summer.

OCR MEI Further Statistics Minor 2022 June Q2

A forester is investigating the relationship between the diameter and the height of young beech trees. She selects a random sample of 15 young beech trees in a forest and records their diameters, $d \mathrm {~cm}$, and their heights, $h \mathrm {~m}$. The data are illustrated in the scatter diagram.
\includegraphics[max width=\textwidth, alt={}, center]{e8624e9b-5143-49d2-9683-cc3a1082694e-3_649_1116_386_230}

State whether either or both of the variables $d$ and $h$ are random variables.
Summary data for the diameters and heights are as follows. $$n = 15 \quad \sum d = 84.9 \quad \sum h = 124.7 \quad \sum d ^ { 2 } = 624.55 \quad \sum h ^ { 2 } = 1230.57 \quad \sum d h = 866.63$$
Find the equation of the regression line of $h$ on $d$. Give your answer in the form $h = a d + b$, giving the values of $a$ and $b$ correct to $\mathbf { 2 }$ decimal places.
Use the regression line to predict the heights of beech trees with the following diameters.
- 7.5 cm
- 20.0 cm
Comment on the reliability of your predictions.
There are many mature beech trees with a diameter of 60 cm or greater. However, there are no beech trees with a height of more than 50 m.
Comment on this in relation to your regression line.
State the coordinates of the point at which the regression line of $d$ on $h$ meets the line which you calculated in part (b).
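Both regression lines pass through the mean point $(\bar{d}, \bar{h})$, so that is where they meet. The following is a hedged Python sketch that checks this from the quoted summary data by fitting both lines and solving for their intersection. The names `grad_hd`, `int_hd`, `grad_dh`, `int_dh`, `d_star` and `h_star` are illustrative.

```python
# Summary data quoted in the question (n = 15 trees)
n = 15
sum_d, sum_h = 84.9, 124.7
sum_d2, sum_h2, sum_dh = 624.55, 1230.57, 866.63

d_bar, h_bar = sum_d / n, sum_h / n

# Corrected sums of squares and products
s_dd = sum_d2 - sum_d**2 / n
s_hh = sum_h2 - sum_h**2 / n
s_dh = sum_dh - sum_d * sum_h / n

# Regression line of h on d: h = grad_hd * d + int_hd
grad_hd = s_dh / s_dd
int_hd = h_bar - grad_hd * d_bar

# Regression line of d on h: d = grad_dh * h + int_dh
grad_dh = s_dh / s_hh
int_dh = d_bar - grad_dh * h_bar

# Substitute the first line into the second and solve for d, then recover h
d_star = (int_dh + grad_dh * int_hd) / (1 - grad_hd * grad_dh)
h_star = grad_hd * d_star + int_hd

print(f"h = {grad_hd:.2f} d + {int_hd:.2f}")
print(f"intersection: ({d_star:.2f}, {h_star:.2f})")
print(f"mean point:   ({d_bar:.2f}, {h_bar:.2f})")
```

The denominator `1 - grad_hd * grad_dh` equals $1 - r^2$, so the intersection is well defined whenever the points are not perfectly collinear.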

OCR MEI Further Statistics Major 2020 November Q5

A hearing expert is investigating whether web-based hearing tests can be used instead of hearing tests in a hearing laboratory. The expert selects a random sample of 16 people with normal hearing. Each of them is given two hearing tests, one in the laboratory and one web-based. The scores in the laboratory-based test, $x$, and the web-based test, $y$, are both measured in the same suitable units.

Half of the participants do the laboratory-based test first and the other half do the web-based test first. Explain why the expert adopts this approach.
The scatter diagram in Fig. 5 shows the data that the expert collected.
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{8d36bc92-07ac-40c3-9e75-26f2bc9d2fcc-05_785_1360_1009_242}
\captionsetup{labelformat=empty}
\caption{Fig. 5}
\end{figure}
Summary statistics for these data are as follows. $$\Sigma x = 198.0 \quad \Sigma x ^ { 2 } = 2936.92 \quad \Sigma y = 188.7 \quad \Sigma y ^ { 2 } = 2605.35 \quad \Sigma x y = 2554.87$$
Calculate the equation of the regression line suitable for estimating web-based scores from laboratory-based scores.
Estimate the web-based scores of people whose laboratory-based scores were as follows.
- 12
- 25
Comment on the reliability of each of your estimates.
A colleague of the expert suggests that the regression line is not valid because one of the data values is an outlier.
Stating the approximate coordinates of the outlier, suggest what the expert should do.

WJEC Unit 2 Specimen Q4

A researcher wishes to investigate the relationship between the amount of carbohydrate and the number of calories in different fruits. He compiles a list of 90 different fruits, e.g. apricots, kiwi fruits, raspberries. As he does not have enough time to collect data for each of the 90 different fruits, he decides to select a simple random sample of 14 different fruits from the list. For each fruit selected, he then uses a dieting website to find the number of calories (kcal) and the amount of carbohydrate (g) in a typical adult portion (e.g. a whole apple, a bunch of 10 grapes, half a cup of strawberries). He enters these data into a spreadsheet for analysis.

Explain how the random number function on a calculator could be used to select this sample of 14 different fruits.
The scatter graph represents 'Number of calories' against 'Carbohydrate' for the sample of 14 different fruits.
1. Describe the correlation between 'Number of calories' and 'Carbohydrate'.
2. Interpret the correlation between 'Number of calories' and 'Carbohydrate' in this context.
  \includegraphics[max width=\textwidth, alt={}, center]{dfe44f43-5e4d-4b8b-a581-f7889abc5cda-3_810_1154_1315_539}
The equation of the regression line for this dataset is: $$\text { 'Number of calories' } = 12.4 + 2.9 \times \text { 'Carbohydrate' }$$
1. Interpret the gradient of the regression line in this context.
2. Explain why it is reasonable for the regression line to have a non-zero intercept in this context.
