2.02c Scatter diagrams and regression lines

115 questions

OCR MEI C2 2016 June Q11
12 marks Moderate -0.3
There are many different flu viruses. The numbers of flu viruses detected in the first few weeks of the 2012–2013 flu epidemic in the UK were as follows.
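| Week | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of flu viruses | 7 | 10 | 24 | 32 | 40 | 38 | 63 | 96 | 234 | 480 |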
These data may be modelled by an equation of the form \(y = a \times 10^{bt}\), where \(y\) is the number of flu viruses detected in week \(t\) of the epidemic, and \(a\) and \(b\) are constants to be determined.
  1. Explain why this model leads to a straight-line graph of \(\log_{10} y\) against \(t\). State the gradient and intercept of this graph in terms of \(a\) and \(b\). [3]
  2. Complete the values of \(\log_{10} y\) in the table, draw the graph of \(\log_{10} y\) against \(t\), and draw by eye a line of best fit for the data. Hence determine the values of \(a\) and \(b\) and the equation for \(y\) in terms of \(t\) for this model. [8]
During the decline of the epidemic, an appropriate model was $$y = 921 \times 10^{-0.137w},$$ where \(y\) is the number of flu viruses detected in week \(w\) of the decline.
  3. Use this to find the number of viruses detected in week 4 of the decline. [1]
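A minimal Python sketch of the log-linear model in this question, assuming the table values above. It swaps the by-eye line of best fit for an ordinary least-squares fit of \(\log_{10} y\) on \(t\), so its values of \(a\) and \(b\) are only a rough check on a hand-drawn line, not the expected answer. The helper name `fit_line` is illustrative.

```python
import math

def fit_line(xs, ys):
    """Least-squares gradient and intercept of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    gradient = s_xy / s_xx
    return gradient, y_bar - gradient * x_bar

weeks = list(range(1, 11))
viruses = [7, 10, 24, 32, 40, 38, 63, 96, 234, 480]

# log10(y) = log10(a) + b*t, so the gradient is b and the intercept is log10(a).
b, log_a = fit_line(weeks, [math.log10(y) for y in viruses])
a = 10 ** log_a
print(f"growth model: y = {a:.2f} * 10^({b:.3f} t)")

# Decline model quoted in the question, evaluated at week 4.
decline_week_4 = 921 * 10 ** (-0.137 * 4)
print(f"week 4 of the decline: about {decline_week_4:.0f} viruses")
```

A line drawn by eye will usually give slightly different constants, which is acceptable for the question as set.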
Edexcel S1 Q6
21 marks Standard +0.3
A missile was fired vertically upwards and its height above ground level, \(h\) metres, was found at various times \(t\) seconds after it was released. The results are given in the following table:
| \(t\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| \(h\) | 68 | 126 | 174 | 216 | 240 | 252 | 266 |
It is thought that this data can be fitted to the formula \(h = pt - qt^2\).
  1. Show that this equation can be written as \(\frac{h}{t} = p - qt\). [1 mark]
  2. Plot a scatter diagram of \(\frac{h}{t}\) against \(t\). [5 marks]
Given that \(\sum h = 1342\), \(\sum \frac{h}{t} = 371\) and \(\sum \frac{h^2}{t^2} = 20385\),
  3. find the equation of the regression line of \(\frac{h}{t}\) on \(t\) and hence write down the values of \(p\) and \(q\). [8 marks]
  4. Use your equation to find the value of \(h\) when \(t = 10\). Comment on the implication of your answer. [3 marks]
  5. Find the product-moment correlation coefficient between \(\frac{h}{t}\) and \(t\) and state the significance of its value. [4 marks]
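As a hedged numerical check on the linearisation \(\frac{h}{t} = p - qt\), the sketch below recomputes the regression line and the product-moment correlation coefficient directly from the table, rather than from the quoted sums. The variable names are illustrative.

```python
import math

t = [1, 2, 3, 4, 5, 6, 7]
h = [68, 126, 174, 216, 240, 252, 266]
y = [hi / ti for hi, ti in zip(h, t)]          # h/t, the linearised response

n = len(t)
s_tt = sum(ti ** 2 for ti in t) - sum(t) ** 2 / n
s_ty = sum(ti * yi for ti, yi in zip(t, y)) - sum(t) * sum(y) / n
s_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

gradient = s_ty / s_tt                          # gradient of h/t on t, equal to -q
intercept = sum(y) / n - gradient * sum(t) / n  # intercept, equal to p
p, q = intercept, -gradient

r = s_ty / math.sqrt(s_tt * s_yy)               # product-moment correlation
h_at_10 = p * 10 - q * 10 ** 2                  # extrapolation beyond t = 7

print(f"p = {p:.2f}, q = {q:.3f}, r = {r:.4f}, h(10) = {h_at_10:.1f}")
```

Note that \(t = 10\) lies outside the observed range of 1 to 7 seconds, which is relevant to the comment the question asks for.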
Edexcel S1 Q6
17 marks Moderate -0.3
Penshop have stores selling stationery in each of 6 towns. The population, \(P\), in tens of thousands and the monthly turnover, \(T\), in thousands of pounds for each of the shops are as recorded below.
| Town | Abberton | Bember | Claster | Deller | Edgeton | Figland |
|---|---|---|---|---|---|---|
| \(P\) (10 000s) | 3.2 | 7.6 | 5.2 | 9.0 | 8.1 | 4.8 |
| \(T\) (£000s) | 11.1 | 12.4 | 13.3 | 19.3 | 17.9 | 11.8 |
  1. Represent these data on a scatter diagram with \(T\) on the vertical axis. [4]
    1. Which town's shop might appear to be underachieving given the populations of the towns?
    2. Suggest two other factors that might affect each shop's turnover. [3]
You may assume that $$\Sigma P = 37.9, \quad \Sigma T = 85.8, \quad \Sigma P^2 = 264.69, \quad \Sigma T^2 = 1286, \quad \Sigma PT = 574.25.$$
  1. Find the equation of the regression line of \(T\) on \(P\). [7]
  2. Estimate the monthly turnover that might be expected if a shop were opened in Gratton, a town with a population of 68 000. [2]
  3. Why might the management of Penshop be reluctant to use the regression line to estimate the monthly turnover they could expect if a shop were opened in Haggin, a town with a population of 172 000? [1]
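The regression line of \(T\) on \(P\) can be sketched from the quoted summary statistics alone. The function name `regression_from_sums` is a hypothetical helper, and the extrapolation flag simply compares a population with the observed range of \(P\) in the table (3.2 to 9.0).

```python
def regression_from_sums(n, sum_x, sum_y, sum_xx, sum_xy):
    """Return (intercept, gradient) of the least-squares line of y on x."""
    s_xx = sum_xx - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    gradient = s_xy / s_xx
    intercept = sum_y / n - gradient * sum_x / n
    return intercept, gradient

# Summary statistics quoted in the question (P in tens of thousands, T in thousands of pounds).
a, b = regression_from_sums(n=6, sum_x=37.9, sum_y=85.8, sum_xx=264.69, sum_xy=574.25)

gratton = a + b * 6.8                  # 68 000 people corresponds to P = 6.8
haggin_p = 17.2                        # 172 000 people corresponds to P = 17.2
is_extrapolation = not (3.2 <= haggin_p <= 9.0)

print(f"T = {a:.2f} + {b:.3f} P")
print(f"Gratton estimate: {gratton:.1f} thousand pounds")
print(f"Haggin outside observed range: {is_extrapolation}")
```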
Edexcel S1 Q6
16 marks Moderate -0.3
The Principal of a school believes that more students are absent on days when the temperature is lower. Over a two-week period in December she records the percentage of students who are absent, \(A\%\), and the temperature, \(T\) °C, at 9 am each morning, giving these results.
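| \(T\) (°C) | 4 | \(-3\) | \(-2\) | \(-6\) | 0 | 3 | 7 | \(-1\) | 3 | 2 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(A\) (%) | 8.5 | 14.1 | 17.0 | 20.3 | 17.9 | 15.5 | 12.4 | 12.8 | 13.7 | 11.6 |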
  1. Represent these data on a scatter diagram. [4 marks]
You may use $$\Sigma T = 7, \quad \Sigma A = 143.8, \quad \Sigma T^2 = 137, \quad \Sigma A^2 = 2172.66, \quad \Sigma TA = 20.7$$
  2. Calculate the product moment correlation coefficient for these data and comment on the Principal's hypothesis. [6 marks]
  3. Find an equation of the regression line of \(A\) on \(T\) in the form \(A = p + qT\). [4 marks]
  4. Draw the regression line on your scatter diagram. [2 marks]
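A short sketch of the product moment correlation coefficient and the regression line \(A = p + qT\), using only the sums quoted in the question. The intermediate names `s_tt`, `s_aa` and `s_ta` are illustrative labels for the usual corrected sums.

```python
import math

n = 10
sum_t, sum_a = 7, 143.8
sum_tt, sum_aa, sum_ta = 137, 2172.66, 20.7

# Corrected sums of squares and products.
s_tt = sum_tt - sum_t ** 2 / n
s_aa = sum_aa - sum_a ** 2 / n
s_ta = sum_ta - sum_t * sum_a / n

r = s_ta / math.sqrt(s_tt * s_aa)     # product moment correlation coefficient
q = s_ta / s_tt                       # gradient of A on T
p = sum_a / n - q * sum_t / n         # intercept of A on T

print(f"r = {r:.3f}")
print(f"A = {p:.2f} + ({q:.3f}) T")
```

A negative value of \(r\) is in line with the Principal's belief that absence rises as temperature falls, though how strongly it supports the hypothesis is left to the comment the question asks for.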
OCR MEI S2 2007 January Q1
18 marks Moderate -0.8
In a science investigation into energy conservation in the home, a student is collecting data on the time taken for an electric kettle to boil as the volume of water in the kettle is varied. The student's data are shown in the table below, where \(v\) litres is the volume of water in the kettle and \(t\) seconds is the time taken for the kettle to boil (starting with the water at room temperature in each case). Also shown are summary statistics and a scatter diagram on which the regression line of \(t\) on \(v\) is drawn.
| \(v\) | 0.2 | 0.4 | 0.6 | 0.8 | 1.0 |
|---|---|---|---|---|---|
| \(t\) | 44 | 78 | 114 | 156 | 172 |
\(n = 5\), \(\Sigma v = 3.0\), \(\Sigma t = 564\), \(\Sigma v^2 = 2.20\), \(\Sigma vt = 405.2\). \includegraphics{figure_1}
  1. Calculate the equation of the regression line of \(t\) on \(v\), giving your answer in the form \(t = a + bv\). [5]
  2. Use this equation to predict the time taken for the kettle to boil when the amount of water which it contains is
    1. 0.5 litres,
    2. 1.5 litres.
    Comment on the reliability of each of these predictions. [4]
  3. In the equation of the regression line found in part (i), explain the role of the coefficient of \(v\) in the relationship between time taken and volume of water. [2]
  4. Calculate the values of the residuals for \(v = 0.8\) and \(v = 1.0\). [4]
  5. Explain how, on a scatter diagram with the regression line drawn accurately on it, a residual could be measured and its sign determined. [3]
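The sketch below illustrates the regression line, the two predictions and the residuals for the kettle data, taking a residual to be observed minus predicted time. It works from the table values, and the function name `predict` is illustrative.

```python
v = [0.2, 0.4, 0.6, 0.8, 1.0]          # volume in litres
t = [44, 78, 114, 156, 172]            # time to boil in seconds

n = len(v)
s_vv = sum(x * x for x in v) - sum(v) ** 2 / n
s_vt = sum(x * y for x, y in zip(v, t)) - sum(v) * sum(t) / n

b = s_vt / s_vv                        # extra seconds per extra litre
a = sum(t) / n - b * sum(v) / n

def predict(volume):
    """Predicted boiling time from the fitted line t = a + b*v."""
    return a + b * volume

# Residual = observed - predicted; positive means the point lies above the line.
residuals = {x: y - predict(x) for x, y in zip(v, t)}

print(f"t = {a:.1f} + {b:.0f} v")
print(f"0.5 litres: {predict(0.5):.1f} s (interpolation)")
print(f"1.5 litres: {predict(1.5):.1f} s (extrapolation)")
print(f"residuals at 0.8 and 1.0: {residuals[0.8]:.1f}, {residuals[1.0]:.1f}")
```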
OCR H240/02 2023 June Q12
4 marks Standard +0.3
A student has an ordinary six-sided dice. The student suspects that it is biased against six, so that when it is thrown, it is less likely to show a six than if it were fair. In order to test this suspicion, the student plans to carry out a hypothesis test at the 5% significance level. The student throws the dice 100 times and notes the number of times, \(X\), that it shows a six.
  1. Determine the largest value of \(X\) that would provide evidence at the 5% significance level that the dice is biased against six. [3]
Later another student carries out a similar test, at the 5% significance level. This student also throws the dice 100 times.
  2. It is given that the dice is fair. Find the probability that the conclusion of the test is that there is significant evidence that the dice is biased against six. [1]
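A minimal sketch of the lower-tail binomial test, assuming \(X \sim B(100, \frac{1}{6})\) under the null hypothesis that the dice is fair. The helper `binom_cdf` is written out with `math.comb` rather than taken from a statistics library.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

n, p, alpha = 100, 1 / 6, 0.05

# Largest k whose lower-tail probability is still at most the significance level.
critical = max(k for k in range(n + 1) if binom_cdf(k, n, p) <= alpha)

# If the dice really is fair, this is the chance of wrongly finding evidence of bias.
prob_false_rejection = binom_cdf(critical, n, p)

print(f"critical value: X <= {critical}")
print(f"P(X <= {critical}) = {prob_false_rejection:.4f}")
print(f"P(X <= {critical + 1}) = {binom_cdf(critical + 1, n, p):.4f}")
```

The probability in the second part is the actual size of the critical region, which is below the nominal 5% because \(X\) is discrete.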
AQA AS Paper 1 2020 June Q10
12 marks Moderate -0.8
Raj is investigating how the price, \(P\) pounds, of a brilliant-cut diamond ring is related to the weight, \(C\) carats, of the diamond. He believes that they are connected by a formula $$P = aC^n$$ where \(a\) and \(n\) are constants.
  1. Express \(\ln P\) in terms of \(\ln C\). [2 marks]
  2. Raj researches the price of three brilliant-cut diamond rings on a website with the following results.
    | \(C\) | 0.60 | 1.15 | 1.50 |
    |---|---|---|---|
    | \(P\) | 495 | 1200 | 1720 |
    1. Plot \(\ln P\) against \(\ln C\) for the three rings on the grid below. [2 marks] \includegraphics{figure_10b}
    2. Explain which feature of the plot suggests that Raj's belief may be correct. [1 mark]
    3. Using the graph on page 15, estimate the value of \(a\) and the value of \(n\). [4 marks]
  3. Explain the significance of \(a\) in this context. [1 mark]
  4. Raj wants to buy a ring with a brilliant-cut diamond of weight 2 carats. Estimate the price of such a ring. [2 marks]
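Taking logarithms of \(P = aC^n\) gives \(\ln P = \ln a + n \ln C\), so \(n\) is the gradient and \(\ln a\) the intercept of the plot. The sketch below is a rough check that uses a least-squares line through the three points in place of a line read from the graph, so its estimates may differ slightly from graphical ones.

```python
import math

carats = [0.60, 1.15, 1.50]
prices = [495, 1200, 1720]

xs = [math.log(c) for c in carats]     # ln C
ys = [math.log(p) for p in prices]     # ln P

m = len(xs)
x_bar, y_bar = sum(xs) / m, sum(ys) / m
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

n = s_xy / s_xx                        # gradient of ln P against ln C
a = math.exp(y_bar - n * x_bar)        # price when C = 1, since ln 1 = 0

price_2_carat = a * 2 ** n             # 2 carats is outside the observed weights
print(f"n = {n:.2f}, a = {a:.0f}, 2-carat estimate = {price_2_carat:.0f} pounds")
```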
AQA AS Paper 1 Specimen Q10
7 marks Standard +0.3
A student conducts an experiment and records the following data for two variables, \(x\) and \(y\).
| \(x\) | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| \(y\) | 14 | 45 | 130 | 1100 | 1300 | 3400 |
| \(\log_{10} y\) | | | | | | |
The student is told that the relationship between \(x\) and \(y\) can be modelled by an equation of the form \(y = kb^x\).
  1. Plot values of \(\log_{10} y\) against \(x\) on the grid below. [2 marks] \includegraphics{figure_10}
  2. State, with a reason, which value of \(y\) is likely to have been recorded incorrectly. [1 mark]
  3. By drawing an appropriate straight line, find the values of \(k\) and \(b\). [4 marks]
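One hedged way to check this question numerically is to fit \(\log_{10} y\) against \(x\), flag the point with the largest residual as the suspect reading, and refit without it. This residual heuristic and the helper `fit_line` are assumptions for illustration; the question itself expects a judgement from the plotted points and a line drawn by hand.

```python
import math

def fit_line(xs, ys):
    """Least-squares gradient and intercept of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    gradient = s_xy / s_xx
    return gradient, y_bar - gradient * x_bar

xs = [1, 2, 3, 4, 5, 6]
ys = [14, 45, 130, 1100, 1300, 3400]
logs = [math.log10(y) for y in ys]

# First pass: fit every point and find the one furthest from the line.
gradient, intercept = fit_line(xs, logs)
residuals = [ly - (intercept + gradient * x) for x, ly in zip(xs, logs)]
suspect_x = max(zip(xs, residuals), key=lambda pair: abs(pair[1]))[0]

# Second pass: refit without the suspect point.
kept = [(x, ly) for x, ly in zip(xs, logs) if x != suspect_x]
gradient, intercept = fit_line([x for x, _ in kept], [ly for _, ly in kept])

# log10(y) = log10(k) + x*log10(b)
k, b = 10 ** intercept, 10 ** gradient
print(f"suspect reading at x = {suspect_x}; k = {k:.1f}, b = {b:.2f}")
```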
AQA AS Paper 2 2018 June Q18
6 marks Easy -1.2
Jennie is a piano teacher who teaches nine pupils. She records how many hours per week they practise the piano along with their most recent practical exam score.
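| Student | Practice (hours per week) | Practical exam score (out of 100) |
|---|---|---|
| Donovan | 50 | 64 |
| Vazquez | 6 | 71 |
| Higgins | 3 | 55 |
| Begum | 2.5 | 47 |
| Collins | 1 | 80 |
| Coldbridge | 4 | 61 |
| Nedbalek | 4.5 | 65 |
| Carter | 8 | 83 |
| White | 11 | 92 |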
[diagram]
  1. Identify two possible outliers by name, giving a possible explanation for the position on the scatter diagram of each outlier. [4 marks]
  2. Jennie discards the two outliers.
    1. Describe the correlation shown by the scatter diagram for the remaining points. [1 mark]
    2. Interpret this correlation in the context of the question. [1 mark]
OCR MEI AS Paper 2 2018 June Q11
9 marks Easy -1.8
The pre-release material contains data concerning the death rate per thousand people and the birth rate per thousand people in all the countries of the world. The diagram in Fig. 11.1 was generated using a spreadsheet and summarises the birth rates for all the countries in Africa. \includegraphics{figure_11_1} Fig. 11.1
  1. Identify two respects in which the presentation of the data is incorrect. [2]
Fig. 11.2 shows a scatter diagram of death rate, \(y\), against birth rate, \(x\), for a sample of 55 countries, all of which are in Africa. A line of best fit has also been drawn. \includegraphics{figure_11_2} Fig. 11.2 The equation of the line of best fit is \(y = 0.15x + 4.72\).
    1. What does the diagram suggest about the relationship between death rate and birth rate? [1]
    2. The birth rate in Togo is recorded as 34.13 per thousand, but the data on death rate has been lost. Use the equation of the line of best fit to estimate the death rate in Togo. [1]
    3. Explain why it would not be sensible to use the equation of the line of best fit to estimate the death rate in a country where the birth rate is 5.5 per thousand. [1]
    4. Explain why it would not be sensible to use the equation of the line of best fit to estimate the death rate in a Caribbean country where the birth rate is known. [1]
    5. Explain why it is unlikely that the sample is random. [1]
Including Togo there were 56 items available for selection.
  1. Describe how a sample of size 14 from this data could be generated for further analysis using systematic sampling. [2]
OCR MEI Paper 2 2022 June Q15
9 marks Easy -2.0
The pre-release material includes information on life expectancy at birth in countries of the world. Fig. 15.1 shows the data for Liberia, which is in Africa, together with a time series graph. \includegraphics{figure_15_1} Sundip uses the LINEST function on a spreadsheet to model life expectancy as a function of calendar year by a straight line. The equation of this line is \(L = 0.473y - 892\), where \(L\) is life expectancy at birth and \(y\) is calendar year.
  1. Use this model to find an estimate of the life expectancy at birth in Liberia in 1995. [1]
According to the model, the life expectancy at birth in Liberia in 2025 is estimated to be 65.83 years.
  1. Explain whether each of these two estimates is likely to be reliable. [2]
  2. Use your knowledge of the pre-release material to explain whether this model could be used to obtain a reliable estimate of the life expectancy at birth in other countries in 1995. [1]
Fig. 15.2 shows the life expectancy at birth between 1960 and 2010 for Italy and South Africa. \includegraphics{figure_15_2}
  1. Use your knowledge of the pre-release material to
    [2]
Sundip is investigating whether there is an association between the wealth of a country and life expectancy at birth in that country. As part of her analysis she draws a scatter diagram of GDP per capita in US\$ and life expectancy at birth in 2010 for all the countries in Europe for which data is available. She accidentally includes the data for the Central African Republic. The diagram is shown in Fig. 15.3. \includegraphics{figure_15_3}
  1. On the copy of Fig. 15.3 in the Printed Answer Booklet, use your knowledge of the pre-release material to circle the point representing the data for the Central African Republic. [1]
Sundip states that as GDP per capita increases, life expectancy at birth increases.
  1. Explain to what extent the information in Fig. 15.3 supports Sundip's statement. [2]
SPS SPS SM 2022 October Q8
7 marks Standard +0.3
\includegraphics{figure_2} The resting heart rate, \(h\), of a mammal, measured in beats per minute, is modelled by the equation $$h = pm^q$$ where \(p\) and \(q\) are constants and \(m\) is the mass of the mammal measured in kg. Figure 2 illustrates the linear relationship between \(\log_{10} h\) and \(\log_{10} m\). The line meets the vertical \(\log_{10} h\) axis at 2.25 and has a gradient of \(-0.235\).
  1. Find, to 3 significant figures, the value of \(p\) and the value of \(q\). [3]
A particular mammal has a mass of 5 kg and a resting heart rate of 119 beats per minute.
  2. Comment on the suitability of the model for this mammal. [3]
  3. With reference to the model, interpret the value of the constant \(p\). [1]
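Since \(\log_{10} h = \log_{10} p + q \log_{10} m\), the intercept of the line is \(\log_{10} p\) and the gradient is \(q\). The sketch below evaluates the model for the 5 kg mammal and compares it with the observed rate. The percentage-error measure is one possible way to judge suitability, not something the question prescribes.

```python
intercept, gradient = 2.25, -0.235     # read from the line described in the question

p = 10 ** intercept                    # heart rate predicted for a 1 kg mammal
q = gradient

mass_kg, observed = 5, 119
predicted = p * mass_kg ** q
percent_error = 100 * (predicted - observed) / observed

print(f"p = {p:.3g}, q = {q}")
print(f"predicted {predicted:.1f} bpm vs observed {observed} bpm "
      f"({percent_error:.1f}% difference)")
```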
SPS SPS SM Pure 2023 June Q15
6 marks Moderate -0.5
The resting metabolic rate, \(R\) ml of oxygen consumed per hour, of a particular species of mammal is modelled by the formula, $$R = aM^b$$ where
• \(M\) grams is the mass of the mammal
• \(a\) and \(b\) are constants
  1. Show that this relationship can be written in the form $$\log_{10} R = b \log_{10} M + \log_{10} a$$ [2]
\includegraphics{figure_3} A student gathers data for \(R\) and \(M\) and plots a graph of \(\log_{10} R\) against \(\log_{10} M\). The graph is a straight line passing through points \((0.7, 1.2)\) and \((1.8, 1.9)\) as shown in Figure 3.
  2. Using this information, find a complete equation for the model. Write your answer in the form $$R = aM^b$$ giving the value of each of \(a\) and \(b\) to 3 significant figures. [3]
  3. With reference to the model, interpret the value of the constant \(a\). [1]
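A small sketch of how the two points on the line determine the constants, using \(\log_{10} R = b \log_{10} M + \log_{10} a\): the gradient gives \(b\) and the intercept gives \(\log_{10} a\). The variable names are illustrative.

```python
# Two points (log10 M, log10 R) on the straight line in Figure 3.
x1, y1 = 0.7, 1.2
x2, y2 = 1.8, 1.9

b = (y2 - y1) / (x2 - x1)              # gradient of the line
log_a = y1 - b * x1                    # intercept, equal to log10(a)
a = 10 ** log_a

print(f"R = {a:.3g} M^{b:.3g}")
```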
OCR H240/02 2017 Specimen Q13
5 marks Moderate -0.8
The table and the four scatter diagrams below show data taken from the 2011 UK census for four regions. On the scatter diagrams the names have been replaced by letters. The table shows, for each region, the mean and standard deviation of the proportion of workers in each Local Authority who travel to work by driving a car or van and the proportion of workers in each Local Authority who travel to work as a passenger in a car or van. Each scatter diagram shows, for each of the Local Authorities in a particular region, the proportion of workers who travel to work by driving a car or van and the proportion of workers who travel to work as a passenger in a car or van. \includegraphics{figure_13}
  1. Using the values given in the table, match each region to its corresponding scatter diagram, explaining your reasoning. [3]
  2. Steven claims that the outlier in the scatter diagram for Region C consists of a group of small islands. Explain whether or not the data given above support his claim. [1]
  3. One of the Local Authorities in Region B consists of a single large island. Explain whether or not you would expect this Local Authority to appear as an outlier in the scatter diagram for Region B. [1]
OCR AS Pure 2017 Specimen Q5
7 marks Moderate -0.8
A doctors' surgery starts a campaign to reduce missed appointments. The number of missed appointments for each of the first five weeks after the start of the campaign is shown below.
| Number of weeks after the start (\(x\)) | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Number of missed appointments (\(y\)) | 235 | 149 | 99 | 59 | 38 |
This data could be modelled by an equation of the form \(y = pq^x\) where \(p\) and \(q\) are constants.
  1. Show that this relationship may be expressed in the form \(\log_{10} y = mx + c\), expressing \(m\) and \(c\) in terms of \(p\) and/or \(q\). [2]
The diagram below shows \(\log_{10} y\) plotted against \(x\), for the given data. \includegraphics{figure_5}
  2. Estimate the values of \(p\) and \(q\). [3]
  3. Use the model to predict when the number of missed appointments will fall below 20. Explain why this answer may not be reliable. [2]
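The model \(y = pq^x\) linearises to \(\log_{10} y = x \log_{10} q + \log_{10} p\). As a rough check, the sketch below fits that line by least squares to the table values, which is an assumption standing in for reading the line from the diagram. It then solves \(pq^x < 20\) for the first whole week.

```python
import math

weeks = [1, 2, 3, 4, 5]
missed = [235, 149, 99, 59, 38]
logs = [math.log10(y) for y in missed]

n = len(weeks)
x_bar, y_bar = sum(weeks) / n, sum(logs) / n
s_xx = sum((x - x_bar) ** 2 for x in weeks)
s_xy = sum((x - x_bar) * (ly - y_bar) for x, ly in zip(weeks, logs))

m = s_xy / s_xx                        # gradient, equal to log10(q)
c = y_bar - m * x_bar                  # intercept, equal to log10(p)
p, q = 10 ** c, 10 ** m

# p*q^x < 20  <=>  x > (log10(20) - c) / m, with the inequality flipped because m < 0.
threshold_x = (math.log10(20) - c) / m
first_week = math.floor(threshold_x) + 1

print(f"p = {p:.0f}, q = {q:.2f}, below 20 from week {first_week}")
```

The predicted week lies beyond the five weeks of data, so the result is an extrapolation.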