Comment on reliability/validity of prediction

Questions asking whether a regression line gives a reliable estimate for a given value, whether extrapolation is appropriate, or how valid or reliable the model is for a specific prediction.

5 questions · Moderate -0.4

OCR MEI AS Paper 2 2022 June Q6
6 marks Moderate -0.3
6 The pre-release material contains information about employment rates in London boroughs. The graph shows employment rates for Westminster between 2006 and 2019. \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{Employment rate in Westminster} \includegraphics[alt={},max width=\textwidth]{e0b502a8-c742-4d78-993c-8c0c7329ec9c-05_641_1465_406_242}
\end{figure} A local politician stated that the diagram shows that more than \(60 \%\) of seventy-year-olds were in employment throughout the period from 2006 to 2019.
  1. Use your knowledge of the pre-release material to explain whether there is any evidence to support this statement.
In order to estimate the employment rate in 2020, two different models were proposed using the LINEST function in a spreadsheet:
Model 1 (using all the data from 2006 onwards): \(Y = 0.549x - 1040\),
Model 2 (using data from 2017 onwards): \(Y = 2.65x - 5280\),
where \(Y\) is the employment rate and \(x\) is the calendar year. It was subsequently found that the employment rate in Westminster in 2020 was 68.4\%.
  2. Determine which of the two models provided the better estimate for the employment rate in Westminster in 2020.
  3. Use your knowledge of the pre-release material to explain whether it would be appropriate to use either model to estimate the employment rate in 2020 in other London boroughs.
  4. What does Model 2 predict for employment rates in Westminster in the long term?
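The comparison between the two models comes down to evaluating each line at \(x = 2020\) and measuring the result against the reported 68.4%. The following is a minimal Python sketch using only the rounded coefficients quoted in the question; the names `MODELS`, `predict` and `year_over_100` are illustrative and not part of the original paper.

```python
# Coefficients as quoted in the question (rounded LINEST output).
MODELS = {
    "Model 1": (0.549, -1040.0),  # fitted to all data from 2006 onwards
    "Model 2": (2.65, -5280.0),   # fitted to data from 2017 onwards
}
ACTUAL_2020 = 68.4  # reported employment rate (%) in Westminster, 2020


def predict(slope: float, intercept: float, year: float) -> float:
    """Evaluate the straight-line model Y = slope * x + intercept."""
    return slope * year + intercept


# Absolute error of each model's 2020 estimate.
errors = {
    name: abs(predict(m, c, 2020) - ACTUAL_2020)
    for name, (m, c) in MODELS.items()
}
better = min(errors, key=errors.get)

for name, (m, c) in MODELS.items():
    print(f"{name}: estimate {predict(m, c, 2020):.2f}%, error {errors[name]:.2f}")
print("Closer to the actual value:", better)

# Long-term behaviour of Model 2: the year at which the line reaches 100%.
m2, c2 = MODELS["Model 2"]
year_over_100 = (100 - c2) / m2
```

With these rounded coefficients, Model 1 gives about 68.98% and Model 2 gives 73.0%, so Model 1 is the closer of the two. Model 2's line reaches 100% shortly after 2030, which no employment rate can exceed, so its steep gradient cannot hold in the long term.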
OCR Further Statistics 2019 June Q1
5 marks Standard +0.3
1 A set of bivariate data \((X, Y)\) is summarised as follows. \(n = 25, \sum x = 9.975, \sum y = 11.175, \sum x^2 = 5.725, \sum y^2 = 46.200, \sum xy = 11.575\)
  1. Calculate the value of Pearson's product-moment correlation coefficient.
  2. Calculate the equation of the regression line of \(y\) on \(x\).
It is desired to know whether the regression line of \(y\) on \(x\) will provide a reliable estimate of \(y\) when \(x = 0.75\).
  3. State one reason for believing that the estimate will be reliable.
  4. State what further information is needed in order to determine whether the estimate is reliable.
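The correlation coefficient and the regression line can both be obtained from the summary statistics through the corrected sums \(S_{xx} = \sum x^2 - (\sum x)^2/n\), \(S_{yy} = \sum y^2 - (\sum y)^2/n\) and \(S_{xy} = \sum xy - \sum x \sum y / n\). The sketch below is one way to carry out that arithmetic in Python; the variable names are illustrative, and it is not taken from the mark scheme.

```python
import math

# Summary statistics as given in the question.
n = 25
sum_x, sum_y = 9.975, 11.175
sum_x2, sum_y2 = 5.725, 46.200
sum_xy = 11.575

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Pearson's product-moment correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Regression line of y on x, written as y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Estimate of y at the value of interest.
estimate = a + b * 0.75
print(f"r = {r:.3f}, y = {a:.2f} + {b:.2f}x, estimate at x = 0.75: {estimate:.2f}")
```

The sums give the mean of \(x\) (0.399) but not the smallest and largest observed values, so this calculation alone cannot show whether \(x = 0.75\) lies inside the range of the data.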
OCR MEI Further Statistics Major Specimen Q3
11 marks Standard +0.3
3 A researcher is investigating factors that might affect how many hours per day different species of mammals spend asleep. First she investigates human beings. She collects data on body mass index, \(x\), and hours of sleep, \(y\), for a random sample of people. A scatter diagram of the data is shown in Fig. 3.1 together with the regression line of \(y\) on \(x\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-04_885_1584_598_274} \captionsetup{labelformat=empty} \caption{Fig. 3.1}
\end{figure}
  1. Calculate the residual for the data point which has the residual with the greatest magnitude.
  2. Use the equation of the regression line to estimate the mean number of hours spent asleep by a person with body mass index
    (A) 26,
    (B) 16,
    commenting briefly on each of your predictions.
The researcher then collects additional data for a large number of species of mammals and analyses different factors for effect size. Definitions of the variables measured for a typical animal of the species, the correlations between these variables, and guidelines often used when considering effect size are given in Fig. 3.2.
    \begin{tabular}{lp{0.7\textwidth}}
    Variable & Definition \\
    \hline
    Body mass & Mass of animal in kg \\
    Brain mass & Mass of brain in g \\
    Hours of sleep/day & Number of hours per day spent asleep \\
    Life span & How many years the animal lives \\
    Danger & A measure of how dangerous the animal's situation is when asleep, taking into account predators and how protected the animal's den is: higher value indicates greater danger. \\
    \end{tabular}
    \begin{tabular}{lrrrrr}
    Correlations (pmcc) & Body Mass & Brain Mass & Hours of sleep/day & Life span & Danger \\
    \hline
    Body Mass & 1.00 & & & & \\
    Brain Mass & 0.93 & 1.00 & & & \\
    Hours of sleep/day & -0.31 & -0.36 & 1.00 & & \\
    Life span & 0.30 & 0.51 & -0.41 & 1.00 & \\
    Danger & 0.13 & 0.15 & -0.59 & 0.06 & 1.00 \\
    \end{tabular}
    \begin{table}[h]
    \begin{tabular}{ll}
    Product moment correlation coefficient & Effect size \\
    \hline
    0.1 & Small \\
    0.3 & Medium \\
    0.5 & Large \\
    \end{tabular}
    \captionsetup{labelformat=empty} \caption{Fig. 3.2}
    \end{table}
  3. State two conclusions the researcher might draw from these tables, relevant to her investigation into how many hours mammals spend asleep.
One of the researcher's students notices the high correlation between body mass and brain mass and produces a scatter diagram for these two variables, shown in Fig. 3.3 below. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-05_675_698_1802_735} \captionsetup{labelformat=empty} \caption{Fig. 3.3}
    \end{figure}
  4. Comment on the suitability of a linear model for these two variables.
OCR S1 2013 January Q3
12 marks Moderate -0.3
The Gross Domestic Product per Capita (GDP), \(x\) dollars, and the Infant Mortality Rate per thousand (IMR), \(y\), of 6 African countries were recorded and summarised as follows. \(n = 6, \quad \sum x = 7000, \quad \sum x^2 = 8700000, \quad \sum y = 456, \quad \sum y^2 = 36262, \quad \sum xy = 509900\)
  1. Calculate the equation of the regression line of \(y\) on \(x\) for these 6 countries. [4]
The original data were plotted on a scatter diagram and the regression line of \(y\) on \(x\) was drawn, as shown below. \includegraphics{figure_3}
  2. The GDP for another country, Tanzania, is 1300 dollars. Use the regression line in the diagram to estimate the IMR of Tanzania. [1]
  3. The GDP for Nigeria is 2400 dollars. Give two reasons why the regression line is unlikely to give a reliable estimate for the IMR for Nigeria. [2]
  4. The actual value of the IMR for Tanzania is 96. The data for Tanzania (\(x = 1300, y = 96\)) are now included with the original 6 countries. Calculate the value of the product moment correlation coefficient, \(r\), for all 7 countries. [4]
  5. The IMR is now redefined as the infant mortality rate per hundred instead of per thousand, and the value of \(r\) is recalculated for all 7 countries. Without calculation, state what effect, if any, this would have on the value of \(r\) found in part 4. [1]
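The calculations in this question can all be run from the raw sums: the regression line for the 6 countries, the correlation after Tanzania is added (each sum is updated with the new point), and the effect of rescaling \(y\). The sketch below is one possible Python layout under those assumptions; `corrected_sums` and the other names are hypothetical helpers, not part of the paper or its mark scheme.

```python
import math


def corrected_sums(n, sx, sy, sxx, syy, sxy):
    """Return (S_xx, S_yy, S_xy) computed from raw sums."""
    return sxx - sx**2 / n, syy - sy**2 / n, sxy - sx * sy / n


# Raw sums for the original 6 countries, as given in the question.
n, sx, sy, sxx, syy, sxy = 6, 7000, 456, 8_700_000, 36_262, 509_900

# Regression line of y on x for the 6 countries: y = a + b x.
S_xx, S_yy, S_xy = corrected_sums(n, sx, sy, sxx, syy, sxy)
b = S_xy / S_xx
a = sy / n - b * sx / n

# Add Tanzania (x = 1300, y = 96) by updating each raw sum.
x_t, y_t = 1300, 96
n7 = n + 1
sums7 = (sx + x_t, sy + y_t, sxx + x_t**2, syy + y_t**2, sxy + x_t * y_t)
S_xx7, S_yy7, S_xy7 = corrected_sums(n7, *sums7)
r7 = S_xy7 / math.sqrt(S_xx7 * S_yy7)

# Changing "per thousand" to "per hundred" divides every y by 10.
# S_xy scales by k and S_yy by k**2, so the factor cancels in r.
k = 0.1
r7_rescaled = (k * S_xy7) / math.sqrt(S_xx7 * (k**2 * S_yy7))

print(f"y = {a:.1f} + ({b:.4f})x, r for 7 countries = {r7:.3f}")
```

The last two lines of the calculation show why a positive linear rescaling of one variable leaves \(r\) unchanged: the scale factor appears once in the numerator and once, after the square root, in the denominator.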
OCR MEI Paper 2 2022 June Q15
9 marks Easy -2.0
The pre-release material includes information on life expectancy at birth in countries of the world. Fig. 15.1 shows the data for Liberia, which is in Africa, together with a time series graph. \includegraphics{figure_15_1} Sundip uses the LINEST function on a spreadsheet to model life expectancy as a function of calendar year by a straight line. The equation of this line is \(L = 0.473y - 892\), where \(L\) is life expectancy at birth and \(y\) is calendar year.
  1. Use this model to find an estimate of the life expectancy at birth in Liberia in 1995. [1]
According to the model, the life expectancy at birth in Liberia in 2025 is estimated to be 65.83 years.
  2. Explain whether each of these two estimates is likely to be reliable. [2]
  3. Use your knowledge of the pre-release material to explain whether this model could be used to obtain a reliable estimate of the life expectancy at birth in other countries in 1995. [1]
Fig. 15.2 shows the life expectancy at birth between 1960 and 2010 for Italy and South Africa. \includegraphics{figure_15_2}
  4. Use your knowledge of the pre-release material to
    [2]
Sundip is investigating whether there is an association between the wealth of a country and life expectancy at birth in that country. As part of her analysis she draws a scatter diagram of GDP per capita in US\$ and life expectancy at birth in 2010 for all the countries in Europe for which data is available. She accidentally includes the data for the Central African Republic. The diagram is shown in Fig. 15.3. \includegraphics{figure_15_3}
  5. On the copy of Fig. 15.3 in the Printed Answer Booklet, use your knowledge of the pre-release material to circle the point representing the data for the Central African Republic. [1]
Sundip states that as GDP per capita increases, life expectancy at birth increases.
  6. Explain to what extent the information in Fig. 15.3 supports Sundip's statement. [2]
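For the Liberia model \(L = 0.473y - 892\) quoted earlier in this question, the two estimates discussed (1995 and 2025) are direct substitutions. A minimal Python sketch, assuming only the rounded coefficients given in the question (the function name `life_expectancy` is illustrative):

```python
def life_expectancy(year: float) -> float:
    """Straight-line model quoted in the question: L = 0.473 y - 892."""
    return 0.473 * year - 892


estimate_1995 = life_expectancy(1995)
estimate_2025 = life_expectancy(2025)
print(f"1995: {estimate_1995:.2f} years, 2025: {estimate_2025:.2f} years")
```

The 2025 value agrees with the 65.83 years stated in the question to within rounding of the coefficients. Whether either estimate is reliable depends on how the year compares with the range of years plotted in Fig. 15.1, and on how closely the data follow the line there, which the equation alone does not reveal.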