2.02c Scatter diagrams and regression lines

115 questions

OCR MEI C2 2009 January Q12
12 marks Moderate -0.3
12 Answer part (ii) of this question on the insert provided. The proposal for a major building project was accepted, but actual construction was delayed. Each year a new estimate of the cost was made. The table shows the estimated cost, \(\pounds y\) million, of the project \(t\) years after the project was first accepted.
| Years after proposal accepted (\(t\)) | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| Cost (\(\pounds y\) million) | 250 | 300 | 360 | 440 | 530 |
The relationship between \(y\) and \(t\) is modelled by \(y = a b ^ { t }\), where \(a\) and \(b\) are constants.
  1. Show that \(y = a b ^ { t }\) may be written as $$\log _ { 10 } y = \log _ { 10 } a + t \log _ { 10 } b$$
  2. On the insert, complete the table and plot \(\log _ { 10 } y\) against \(t\), drawing by eye a line of best fit.
  3. Use your graph and the results of part (i) to find the values of \(\log _ { 10 } a\) and \(\log _ { 10 } b\) and hence \(a\) and \(b\).
  4. According to this model, what was the estimated cost of the project when it was first accepted?
  5. Find the value of \(t\) given by this model when the estimated cost is \(\pounds 1000\) million. Give your answer rounded to 1 decimal place.
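A line of best fit drawn by eye can be cross-checked numerically. The sketch below is illustrative only: it fits \(\log_{10} y\) against \(t\) by least squares using the five table values, and the helper name `fit_line` is our own. A line drawn by eye will give slightly different constants, so treat the printed values as a rough guide rather than the expected answer.

```python
import math

# Table values from the question: t years after acceptance, cost y in £ million.
t_values = [1, 2, 3, 4, 5]
y_values = [250, 300, 360, 440, 530]

def fit_line(xs, ys):
    """Least-squares gradient and intercept of ys against xs (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    gradient = s_xy / s_xx
    return gradient, y_bar - gradient * x_bar

# Linearised model: log10(y) = log10(a) + t * log10(b).
log_y = [math.log10(y) for y in y_values]
log_b, log_a = fit_line(t_values, log_y)
a = 10 ** log_a
b = 10 ** log_b

# Solve a * b**t = 1000 for t using the linear form.
t_at_1000 = (math.log10(1000) - log_a) / log_b

print(f"a = {a:.1f}, b = {b:.3f}, t when y = 1000: {t_at_1000:.1f}")
```

Here `a` estimates the cost at \(t = 0\), which is what part (iv) asks for under the model.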
OCR MEI C3 Q9
18 marks Moderate -0.3
9 Answer parts (ii) and (iii) of this question on the Insert provided. The bat population of a colony is being investigated and data are collected of the estimated number of bats in the colony at the beginning of each year. It is thought that the population may be modelled by the formula $$P = P _ { 0 } \mathrm { e } ^ { k t }$$ where \(P _ { 0 }\) and \(k\) are constants, \(P\) is the number of bats and \(t\) is the number of years after the start of the collection of data.
  1. Explain why a graph of \(\ln P\) against \(t\) should give a straight line. State the gradient and intercept of this line.
  2. The data collected are as follows.
    | Time (\(t\) years) | 0 | 1 | 2 | 3 | 4 |
    | --- | --- | --- | --- | --- | --- |
    | Number of bats, \(P\) | 100 | 170 | 300 | 340 | 360 |
    Using the first three pairs of data in the table, plot \(\ln P\) against \(t\) on the axes given on the Insert, and hence estimate values for \(P _ { 0 }\) and \(k\).
    (Work to three significant figures.) This model assumes exponential growth, and assumes that once born a bat does not die, continuing to reproduce. This is unrealistic and so a second model is proposed with formula $$P = 150 \arctan ( t - 1 ) + 170$$ (You are reminded that arctan values should be given in radians.)
  3. Plot on a single graph on the Insert the curves \(P = P _ { 0 } \mathrm { e } ^ { k t }\) for your values of \(P _ { 0 }\) and \(k\) and \(P = 150 \arctan ( t - 1 ) + 170\). The data pairs in the table above have been plotted for you.
  4. Using the second model calculate an estimate of the number of years it is before the bat population exceeds 375.

\section*{Insert for question 3.}
  1. Sketch the graph of \(y = 2 \mathrm { f } ( x )\) \includegraphics[max width=\textwidth, alt={}, center]{3853d1e7-ae1f-4eca-93c7-96f03b6d31c3-6_641_1431_541_354}
  2. Sketch the graph of \(y = \mathrm { f } ( 2 x )\). \includegraphics[max width=\textwidth, alt={}, center]{3853d1e7-ae1f-4eca-93c7-96f03b6d31c3-6_691_1539_1468_374}

\section*{Insert for question 9.}
  2. Plot \(\ln P\) against \(t\). \includegraphics[max width=\textwidth, alt={}, center]{3853d1e7-ae1f-4eca-93c7-96f03b6d31c3-7_704_1442_443_338}
  3. Plot the curves \(P = P _ { 0 } \mathrm { e } ^ { k t }\) and \(P = 150 \arctan ( t - 1 ) + 170\) for your values of \(P _ { 0 }\) and \(k\). The data pairs are plotted on the graph. \includegraphics[max width=\textwidth, alt={}, center]{3853d1e7-ae1f-4eca-93c7-96f03b6d31c3-7_780_1399_1546_333}
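For the last part of question 9, the second model can be inverted directly. The following is a minimal sketch, assuming the rearrangement \(t = 1 + \tan\left(\frac{375 - 170}{150}\right)\), which is valid because the ratio is less than \(\pi/2\). The function name `second_model` is our own.

```python
import math

def second_model(t):
    """Second model from the question: P = 150 arctan(t - 1) + 170 (radians)."""
    return 150 * math.atan(t - 1) + 170

# Rearranging P = 375 gives arctan(t - 1) = (375 - 170) / 150, so
# t = 1 + tan((375 - 170) / 150).
target = 375
t_threshold = 1 + math.tan((target - 170) / 150)

print(f"P exceeds {target} once t is greater than about {t_threshold:.2f} years")
```

Since arctan is increasing, the population stays above 375 for every later \(t\) under this model.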
OCR AS Pure 2017 Specimen Q11
4 marks Easy -1.2
11 The scatter diagram below shows data taken from the 2011 UK census for each of the Local Authorities in the North East and North West regions.
The scatter diagram shows the total population of the Local Authority and the proportion of its workforce that travel to work by bus, minibus or coach. \includegraphics[max width=\textwidth, alt={}, center]{35d8bb6d-ff0f-4590-b13d-46e4869e2587-07_938_1136_664_260}
  1. Samuel suggests that, with a few exceptions, the data points in the diagram show that Local Authorities with larger populations generally have higher proportions of workers travelling by bus, minibus or coach. On the diagram in the Printed Answer Booklet draw a ring around each of the data points that Samuel might regard as an exception.
  2. Jasper suggests that it is possible to separate these Local Authorities into more than one group with different relationships between population and proportion travelling to work by bus, minibus or coach. Discuss Jasper's suggestion, referring to the data and to how differences between the Local Authorities could explain the patterns seen in the diagram.
    [3]
Edexcel S1 2003 June Q7
16 marks Moderate -0.8
7. Eight students took tests in mathematics and physics. The marks for each student are given in the table below where \(m\) represents the mathematics mark and \(p\) the physics mark.
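| Student | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mark \(m\) | 9 | 14 | 13 | 10 | 7 | 8 | 20 | 17 |
| Mark \(p\) | 11 | 23 | 21 | 15 | 19 | 10 | 31 | 26 |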
A science teacher believes that students' marks in physics depend upon their mathematical ability. The teacher decides to investigate this relationship using the test marks.
  1. Write down which is the explanatory variable in this investigation.
  2. Draw a scatter diagram to illustrate these data.
  3. Showing your working, find the equation of the regression line of \(p\) on \(m\).
  4. Draw the regression line on your scatter diagram. A ninth student was absent for the physics test, but she sat the mathematics test and scored 15 .
  5. Using this model, estimate the mark she would have scored in the physics test.
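The working for the regression line of \(p\) on \(m\) can be checked with a short script. This is a sketch under the usual S1 formulae \(b = S_{mp} / S_{mm}\) and \(a = \bar{p} - b\bar{m}\), using the eight pairs of marks from the table. Variable names are our own.

```python
# Marks from the table: m = mathematics, p = physics, students A to H.
m = [9, 14, 13, 10, 7, 8, 20, 17]
p = [11, 23, 21, 15, 19, 10, 31, 26]
n = len(m)

# Summary statistics used in the S1 formulae.
sum_m, sum_p = sum(m), sum(p)
s_mm = sum(x * x for x in m) - sum_m ** 2 / n
s_mp = sum(x * y for x, y in zip(m, p)) - sum_m * sum_p / n

# Regression line of p on m: p = a + b * m.
b = s_mp / s_mm
a = sum_p / n - b * sum_m / n

# Estimate for the ninth student, who scored 15 in mathematics.
estimate = a + b * 15
print(f"p = {a:.2f} + {b:.2f} m; estimate at m = 15: {estimate:.1f}")
```

The mark of 15 lies inside the range of the observed mathematics marks (7 to 20), so the estimate is an interpolation.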
AQA S1 2005 January Q1
7 marks Moderate -0.3
1 Each Monday, Azher has a stall at a town's outdoor market. The table below shows, for each of a random sample of 10 Mondays during 2003, the air temperature, \(x ^ { \circ } \mathrm { C }\), at 9 am and Azher's takings, £y.
Monday\(\mathbf { 1 }\)\(\mathbf { 2 }\)\(\mathbf { 3 }\)\(\mathbf { 4 }\)\(\mathbf { 5 }\)\(\mathbf { 6 }\)\(\mathbf { 7 }\)\(\mathbf { 8 }\)\(\mathbf { 9 }\)\(\mathbf { 1 0 }\)
\(\boldsymbol { x }\)2691813712134
\(\boldsymbol { y }\)9710313624512178145128141312
  1. A scatter diagram of these data is shown below. \includegraphics[max width=\textwidth, alt={}, center]{7faa4a2d-f5cc-4cc3-a3a9-5d8290ceabdc-2_901_1068_1078_447} Give two distinct comments, in context, on what this diagram reveals.
  2. One of the Mondays is found to be Easter Monday, the busiest Monday market of the year. Identify which Monday this is most likely to be.
  3. Removing the data for the Monday you identified in part (b), calculate the value of the product moment correlation coefficient for the remaining 9 pairs of values of \(x\) and \(y\).
  4. Name one other variable that would have been likely to affect Azher's takings at this town's outdoor market.
    (1 mark)
AQA S1 2010 January Q7
13 marks Standard +0.3
7 [Figure 1, printed on the insert, is provided for use in this question.]
Harold considers himself to be an expert in assessing the auction value of antiques. He regularly visits car boot sales to buy items that he then sells at his local auction rooms. Harold's father, Albert, who is not convinced of his son's expertise, collects the following data from a random sample of 12 items bought by Harold.
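| Item | Purchase price (£ \(x\)) | Auction price (£ \(y\)) |
| --- | --- | --- |
| A | 20 | 30 |
| B | 35 | 45 |
| C | 18 | 25 |
| D | 50 | 50 |
| E | 45 | 38 |
| F | 55 | 45 |
| G | 43 | 50 |
| H | 81 | 90 |
| I | 90 | 85 |
| J | 30 | 190 |
| K | 57 | 65 |
| L | 112 | 25 |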
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
  2. Interpret your value in the context of this question.
    1. On Figure 1, complete the scatter diagram for these data.
    2. Comment on what this reveals.
  3. When items J and L are omitted from the data, it is found that $$S _ { x x } = 4854.4 \quad S _ { y y } = 4216.1 \quad S _ { x y } = 4268.8$$
    1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\) for the remaining 10 items.
    2. Hence revise as necessary your interpretation in part (b).
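With the summary statistics quoted for the 10 remaining items, the product moment correlation coefficient follows from \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\). A minimal sketch of that arithmetic, using only the three values given in the question:

```python
import math

# Summary statistics quoted in the question, with items J and L omitted.
s_xx = 4854.4
s_yy = 4216.1
s_xy = 4268.8

# Product moment correlation coefficient: r = S_xy / sqrt(S_xx * S_yy).
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.3f}")
```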
AQA S1 2015 June Q5
11 marks Moderate -0.8
5 The table shows the number of customers, \(x\), and the takings, \(\pounds y\), recorded to the nearest \(\pounds 10\), at a local butcher's shop on each of 10 randomly selected weekdays.
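| \(x\) | 86 | 60 | 65 | 46 | 71 | 93 | 56 | 81 | 75 | 57 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(y\) | 940 | 790 | 620 | 530 | 770 | 1050 | 690 | 780 | 860 | 550 |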
  1. The first 6 pairs of data values in this table are plotted on the scatter diagram shown on the opposite page. Plot the final 4 pairs of data values on the scatter diagram.
    1. Calculate the equation of the least squares regression line in the form \(y = a + b x\) and draw your line on the scatter diagram.
    2. Interpret your value for \(b\) in the context of the question.
    3. State why your value for \(a\) has no practical interpretation.
  2. Estimate, to the nearest \(\pounds 10\), the shop's takings when the number of customers is 50 .
    [1 mark]
    \includegraphics[max width=\textwidth, alt={}]{4c679380-894f-4d36-aec8-296b662058e2-14_1255_1705_1448_155}
AQA AS Paper 2 2021 June Q15
3 marks Easy -1.8
15 The number of hours of sunshine and the daily maximum temperature were recorded over a 9-day period in June at an English seaside town. A scatter diagram representing the recorded data is shown below. \includegraphics[max width=\textwidth, alt={}, center]{f87d1b36-26db-4a0b-b9ec-d7d82a396aba-20_872_1511_488_264} One of the points on the scatter diagram is an error.
  1. (i) Write down the letter that identifies this point.
    (ii) Suggest one possible action that could be taken to deal with this error.
  2. It is claimed that the scatter diagram proves that longer hours of sunshine cause higher maximum daily temperatures. Comment on the validity of this claim.
    [1 mark]
AQA Paper 1 2021 June Q9
15 marks Moderate -0.3
9 The table below shows the annual global production of plastics, \(P\), measured in millions of tonnes per year, for six selected years.
| Year | 1980 | 1985 | 1990 | 1995 | 2000 | 2005 |
| --- | --- | --- | --- | --- | --- | --- |
| \(P\) | 75 | 94 | 120 | 156 | 206 | 260 |
It is thought that \(P\) can be modelled by $$P = A \times 10 ^ { k t }$$ where \(t\) is the number of years after 1980 and \(A\) and \(k\) are constants.
  1. Show algebraically that the graph of \(\log _ { 10 } P\) against \(t\) should be linear.
  2. (i) Complete the table below.

    | \(t\) | 0 | 5 | 10 | 15 | 20 | 25 |
    | --- | --- | --- | --- | --- | --- | --- |
    | \(\log _ { 10 } P\) | 1.88 | 1.97 | 2.08 | | 2.31 | |

    (ii) Plot \(\log _ { 10 } P\) against \(t\), and draw a line of best fit for the data. \includegraphics[max width=\textwidth, alt={}, center]{042e248a-9efa-4844-957d-f05715900ffc-13_1203_1308_360_367}
  3. (i) Hence, show that \(k\) is approximately 0.02
    (ii) Find the value of \(A\).
  4. Using the model with \(k = 0.02\) predict the number of tonnes of annual global production of plastics in 2030.
  5. Using the model with \(k = 0.02\) predict the year in which \(P\) first exceeds 8000
  6. Give a reason why it may be inappropriate to use the model to make predictions about future annual global production of plastics. \includegraphics[max width=\textwidth, alt={}, center]{042e248a-9efa-4844-957d-f05715900ffc-15_2488_1716_219_153}
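The two prediction parts use the model \(P = A \times 10^{kt}\) in both directions. The sketch below is hedged: \(k = 0.02\) is stated in the question, but \(A = 75\) is an assumption taken from the 1980 data value (\(t = 0\)), and a value of \(A\) read from a hand-drawn line of best fit may differ. The function name `production` is our own.

```python
import math

# k = 0.02 is stated in the question. A = 75 is an ASSUMPTION taken from the
# 1980 data point (t = 0), not from a fitted line.
A = 75.0
k = 0.02

def production(year):
    """Modelled annual production in millions of tonnes, P = A * 10**(k*t)."""
    t = year - 1980
    return A * 10 ** (k * t)

# Forward use of the model: prediction for 2030 (t = 50).
p_2030 = production(2030)

# Inverse use of the model: solve A * 10**(k*t) = 8000 for t.
t_8000 = math.log10(8000 / A) / k

print(f"P in 2030: about {p_2030:.0f} million tonnes")
print(f"P reaches 8000 about {t_8000:.1f} years after 1980")
```

Note that \(P\) is measured in millions of tonnes, so the 2030 figure needs converting if an answer in tonnes is required.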
Edexcel AS Paper 2 Specimen Q4
7 marks Moderate -0.8
4. Sara was studying the relationship between rainfall, \(r \mathrm {~mm}\), and humidity, \(h \%\), in the UK. She takes a random sample of 11 days from May 1987 for Leuchars from the large data set.
She obtained the following results.

| \(h\) | 93 | 86 | 95 | 97 | 86 | 94 | 97 | 97 | 87 | 97 | 86 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(r\) | 1.1 | 0.3 | 3.7 | 20.6 | 0 | 0 | 2.4 | 1.1 | 0.1 | 0.9 | 0.1 |
Sara examined the rainfall figures and found $$Q _ { 1 } = 0.1 \quad Q _ { 2 } = 0.9 \quad Q _ { 3 } = 2.4$$ A value that is more than 1.5 times the interquartile range (IQR) above \(Q _ { 3 }\) is called an outlier.
  1. Show that \(r = 20.6\) is an outlier.
  2. Give a reason why Sara might:
    1. include
    2. exclude
      this day's reading. Sara decided to exclude this day's reading and drew the following scatter diagram for the remaining 10 days' values of \(r\) and \(h\). \includegraphics[max width=\textwidth, alt={}, center]{8f3dbcb4-3260-4493-a230-12577b4ed691-08_988_1081_1555_420}
  3. Give an interpretation of the correlation between rainfall and humidity. The equation of the regression line of \(r\) on \(h\) for these 10 days is \(r = - 12.8 + 0.15 h\)
  4. Give an interpretation of the gradient of this regression line.
    1. Comment on the suitability of Sara's sampling method for this study.
    2. Suggest how Sara could make better use of the large data set for her study.
Edexcel AS Paper 2 Specimen Q3
6 marks Standard +0.3
3. Pete is investigating the relationship between daily rainfall, \(w \mathrm {~mm}\), and daily mean pressure, \(p\) hPa, in Perth during 2015. He used the large data set to take a sample of size 12.
He obtained the following results.

| \(p\) | 1007 | 1012 | 1013 | 1009 | 1019 | 1010 | 1010 | 1010 | 1013 | 1011 | 1014 | 1022 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(w\) | 102.0 | 63.0 | 63.0 | 38.4 | 38.0 | 35.0 | 34.2 | 32.0 | 30.4 | 28.0 | 28.0 | 15 |

Pete drew the following scatter diagram for the values of \(w\) and \(p\) and calculated the quartiles.

| | \(Q_1\) | \(Q_2\) | \(Q_3\) |
| --- | --- | --- | --- |
| \(p\) | 1010 | 1011.5 | 1013.5 |
| \(w\) | 29.2 | 34.6 | 50.7 |
\includegraphics[max width=\textwidth, alt={}]{b29b0411-8401-420b-9227-befe25c245d8-04_818_1081_989_477}
An outlier is a value which is more than 1.5 times the interquartile range above Q3 or more than 1.5 times the interquartile range below Q1.
  1. Show that the 3 points circled on the scatter diagram above are outliers.
    (2)
  2. Describe the effect of removing the 3 outliers on the correlation between daily rainfall and daily mean pressure in this sample.
    (1) John has also been studying the large data set and believes that the sample Pete has taken is not random.
  3. From your knowledge of the large data set, explain why Pete's sample is unlikely to be a random sample. John finds that the equation of the regression line of \(w\) on \(p\), using all the data in the large data set, is $$w = 1023 - 0.223 p$$
  4. Give an interpretation of the figure - 0.223 in this regression line. John decided to use the regression line to estimate the daily rainfall for a day in December when the daily mean pressure is 1011 hPa .
  5. Using your knowledge of the large data set, comment on the reliability of John's estimate.
    (Total for Question 3 is 6 marks)
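The outlier definition in this question (more than 1.5 times the interquartile range above \(Q_3\) or below \(Q_1\)) can be applied mechanically. The following sketch uses the sample values and quartiles given in the question. The helper name `fences` is our own, and a day is flagged if either its pressure or its rainfall lies outside the limits.

```python
# Sample values and quartiles as given in the question.
pressure = [1007, 1012, 1013, 1009, 1019, 1010, 1010, 1010, 1013, 1011, 1014, 1022]
rainfall = [102.0, 63.0, 63.0, 38.4, 38.0, 35.0, 34.2, 32.0, 30.4, 28.0, 28.0, 15.0]
quartiles = {"p": (1010.0, 1013.5), "w": (29.2, 50.7)}  # (Q1, Q3)

def fences(q1, q3):
    """Lower and upper outlier limits using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

p_low, p_high = fences(*quartiles["p"])
w_low, w_high = fences(*quartiles["w"])

# A day is flagged if either of its coordinates lies outside the limits.
outliers = [
    (p, w)
    for p, w in zip(pressure, rainfall)
    if not (p_low <= p <= p_high) or not (w_low <= w <= w_high)
]
print("pressure limits:", p_low, p_high)
print("rainfall limits:", w_low, w_high)
print("flagged days (p, w):", outliers)
```

Two of the flagged days are outliers in pressure and one is an outlier in rainfall, which matches the three circled points mentioned in the question.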
Edexcel Paper 3 2018 June Q2
7 marks Standard +0.3
2. Tessa owns a small clothes shop in a seaside town. She records the weekly sales figures, \(\pounds w\), and the average weekly temperature, \(t ^ { \circ } \mathrm { C }\), for 8 weeks during the summer.
The product moment correlation coefficient for these data is \(-0.915\)
  1. Stating your hypotheses clearly and using a \(5 \%\) level of significance, test whether or not the correlation between sales figures and average weekly temperature is negative.
  2. Suggest a possible reason for this correlation.

Tessa suggests that a linear regression model could be used to model these data.
  3. State, giving a reason, whether or not the correlation coefficient is consistent with Tessa's suggestion.
  4. State, giving a reason, which variable would be the explanatory variable.

Tessa calculated the linear regression equation as \(w = 10755 - 171 t\)
  5. Give an interpretation of the gradient of this regression equation.
Edexcel Paper 3 Specimen Q2
7 marks Moderate -0.3
2. A researcher believes that there is a linear relationship between daily mean temperature and daily total rainfall. The 7 places in the northern hemisphere from the large data set are used. The mean of the daily mean temperatures, \(t ^ { \circ } \mathrm { C }\), and the mean of the daily total rainfall, \(s \mathrm {~mm}\), for the month of July in 2015 are shown on the scatter diagram below. \includegraphics[max width=\textwidth, alt={}, center]{565bfa73-8095-4242-80b6-cd47aaff6a31-03_844_1339_497_372}
  1. With reference to the scatter diagram, explain why a linear regression model may not be suitable for the relationship between \(t\) and s .
    (1) The researcher calculated the product moment correlation coefficient for the 7 places and obtained \(r = 0.658\).
  2. Stating your hypotheses clearly, test at the \(10 \%\) level of significance, whether or not the product moment correlation coefficient for the population is greater than zero.
    (3)
  3. Using your knowledge of the large data set, suggest the names of the 2 places labelled \(G\) and \(H\).
    (1)
  4. Using your knowledge from the large data set, and with reference to the locations of the two places labelled \(G\) and \(H\), give a reason why these places have the highest temperatures in July.
    (2)
  5. Suggest how you could make better use of the large data set to investigate the relationship between daily mean temperature and daily total rainfall.
    (1)
    (Total 7 marks)
WJEC Unit 4 Specimen Q5
7 marks Moderate -0.3
5. A hotel owner in Cardiff is interested in what factors hotel guests think are important when staying at a hotel. From a hotel booking website he collects the ratings for 'Cleanliness', 'Location', 'Comfort' and 'Value for money' for a random sample of 17 Cardiff hotels.
(Each rating is the average of all scores awarded by guests who have contributed reviews using a scale from 1 to 10 , where 10 is 'Excellent'.) The scatter graph shows the relationship between 'Value for money' and 'Cleanliness' for the sample of Cardiff hotels. \includegraphics[max width=\textwidth, alt={}, center]{b35e94ab-a426-4fca-9ecb-c659e0143ed7-4_693_1033_749_516}
  1. The product moment correlation coefficient for 'Value for money' and 'Cleanliness' for the sample of 17 Cardiff hotels is 0.895 . Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether this correlation is significant. State your conclusion in context.
  2. The hotel owner also wishes to investigate whether 'Value for money' has a significant correlation with 'Cost per night'. He used a statistical analysis package which provided the following output which includes the Pearson correlation coefficient of interest and the corresponding \(p\)-value.
    | | Value for money | Cost per night |
    | --- | --- | --- |
    | Value for money | 1 | |
    | Cost per night | 0.047 \(( 0.859 )\) | 1 |
    Comment on the correlation between 'Value for money' and 'Cost per night'.
OCR FS1 AS 2021 June Q1
5 marks Moderate -0.3
1 Five observations of bivariate data \(( x , y )\) are given in the table.
| \(x\) | 7 | 8 | 12 | 6 | 4 |
| --- | --- | --- | --- | --- | --- |
| \(y\) | 20 | 16 | 7 | 17 | 23 |
  1. Find the value of Pearson's product-moment correlation coefficient.
  2. State what your answer to part (a) tells you about a scatter diagram representing the data.
  3. A new variable \(a\) is defined by \(a = 3 x + 4\). Dee says "The value of Pearson's product-moment correlation coefficient between \(a\) and \(y\) will not be the same as the answer to part (a)." State with a reason whether you agree with Dee.

An investor obtains data about the profits of 8 randomly chosen investment accounts over two one-year periods. The profit in the first year for each account is \(p \%\) and the profit in the second year for each account is \(q \%\). The results are shown in the table and in the scatter diagram.

| Account | A | B | C | D | E | F | G | H |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(p\) | 1.6 | 2.1 | 2.4 | 2.7 | 2.8 | 3.3 | 5.2 | 8.4 |
| \(q\) | 1.6 | 2.3 | 2.2 | 2.2 | 3.1 | 2.9 | 7.6 | 4.8 |

\(n = 8 \quad \Sigma p = 28.5 \quad \Sigma q = 26.7 \quad \Sigma p ^ { 2 } = 136.35 \quad \Sigma q ^ { 2 } = 116.35 \quad \Sigma p q = 116.70\) \includegraphics[max width=\textwidth, alt={}, center]{4c7546b9-03ee-47a1-915f-41e2b4ca19c0-03_762_1248_906_260}
  1. State which, if either, of the variables \(p\) and \(q\) is independent.
  2. Calculate the equation of the regression line of \(q\) on \(p\).
    1. Use the regression line to estimate the value of \(q\) for an investment account for which \(p = 2.5\).
    2. Give two reasons why this estimate could be considered reliable.
  3. Comment on the reliability of using the regression line to predict the value of \(q\) when \(p = 7.0\).
CAIE P2 2016 November Q2
5 marks Moderate -0.8
\includegraphics{figure_2} The variables \(x\) and \(y\) satisfy the equation \(y = Ae^{px}\), where \(A\) and \(p\) are constants. The graph of \(\ln y\) against \(x\) is a straight line passing through the points \((5, 3.17)\) and \((10, 4.77)\), as shown in the diagram. Find the values of \(A\) and \(p\) correct to 2 decimal places. [5]
CAIE P2 2016 November Q2
5 marks Moderate -0.8
\includegraphics{figure_2} The variables \(x\) and \(y\) satisfy the equation \(y = Kx^p\), where \(K\) and \(p\) are constants. The graph of \(\ln y\) against \(\ln x\) is a straight line passing through the points \((1.28, 3.69)\) and \((2.11, 4.81)\), as shown in the diagram. Find the values of \(K\) and \(p\) correct to 2 decimal places. [5]
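Both CAIE questions above reduce to finding the straight line through two given points and then undoing the logarithm. A minimal sketch, assuming the linear forms \(\ln y = \ln A + px\) and \(\ln y = \ln K + p \ln x\). The helper name `line_through` is our own.

```python
import math

def line_through(point_1, point_2):
    """Gradient and intercept of the straight line through two points."""
    (x1, y1), (x2, y2) = point_1, point_2
    gradient = (y2 - y1) / (x2 - x1)
    return gradient, y1 - gradient * x1

# y = A e^(px): ln y = ln A + p x, so the graph is ln y against x.
p_exp, ln_A = line_through((5, 3.17), (10, 4.77))
A = math.exp(ln_A)

# y = K x^p: ln y = ln K + p ln x, so the graph is ln y against ln x.
p_pow, ln_K = line_through((1.28, 3.69), (2.11, 4.81))
K = math.exp(ln_K)

print(f"Exponential model: A = {A:.2f}, p = {p_exp:.2f}")
print(f"Power model:       K = {K:.2f}, p = {p_pow:.2f}")
```

In both cases the gradient gives \(p\) directly, while the intercept gives the logarithm of the multiplying constant.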
Edexcel S1 2002 January Q7
19 marks Moderate -0.3
A number of people were asked to guess the calorific content of 10 foods. The mean \(s\) of the guesses for each food and the true calorific content \(t\) are given in the table below.
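| Food | \(t\) | \(s\) |
| --- | --- | --- |
| Packet of biscuits | 170 | 420 |
| 1 potato | 90 | 160 |
| 1 apple | 80 | 110 |
| Crisp breads | 10 | 70 |
| Chocolate bar | 260 | 360 |
| 1 slice white bread | 75 | 135 |
| 1 slice brown bread | 60 | 115 |
| Portion of beef curry | 270 | 350 |
| Portion of rice pudding | 165 | 390 |
| Half a pint of milk | 160 | 200 |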
[You may assume that \(\Sigma t = 1340\), \(\Sigma s = 2310\), \(\Sigma ts = 396775\), \(\Sigma t^2 = 246050\), \(\Sigma s^2 = 694650\).]
  1. Draw a scatter diagram, indicating clearly which is the explanatory (independent) and which is the response (dependent) variable. [3]
  2. Calculate, to 3 significant figures, the product moment correlation coefficient for the above data. [7]
  3. State, with a reason, whether or not the value of the product moment correlation coefficient changes if all the guesses are 50 calories higher than the values in the table. [2]
The mean of the guesses for the portion of rice pudding and for the packet of biscuits are outside the linear relation of the other eight foods.
  1. Find the equation of the regression line of \(s\) on \(t\) excluding the values for rice pudding and biscuits. [3]
[You may now assume that \(S_{tt} = 72587\), \(S_{st} = 63671.875\), \(\bar{t} = 125.625\), \(\bar{s} = 187.5\).]
  1. Draw the regression line on your scatter diagram. [2]
  2. State, with a reason, what the effect would be on the regression line of including the values for a portion of rice pudding and a packet of biscuits. [2]
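The correlation coefficient for all 10 foods, and the effect of adding 50 calories to every guess, can both be checked from the raw data. The sketch below is illustrative and uses the standard formula \(r = S_{ts} / \sqrt{S_{tt} S_{ss}}\). The function name `pmcc` is our own.

```python
import math

# True calorific content t and mean guess s for the 10 foods in the table.
t = [170, 90, 80, 10, 260, 75, 60, 270, 165, 160]
s = [420, 160, 110, 70, 360, 135, 115, 350, 390, 200]

def pmcc(xs, ys):
    """Product moment correlation coefficient from raw data."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)

r = pmcc(t, s)

# Add 50 calories to every guess: a change of origin only.
r_shifted = pmcc(t, [guess + 50 for guess in s])

print(f"r = {r:.3f}, r after shifting guesses = {r_shifted:.3f}")
```

The two values agree because adding a constant to every guess changes neither \(S_{ss}\) nor \(S_{ts}\).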
Edexcel S1 2010 January Q6
18 marks Moderate -0.8
The blood pressures, \(p\) mmHg, and the ages, \(t\) years, of 7 hospital patients are shown in the table below.
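| Patient | A | B | C | D | E | F | G |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(t\) | 42 | 74 | 48 | 35 | 56 | 26 | 60 |
| \(p\) | 98 | 130 | 120 | 88 | 182 | 80 | 135 |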
[\(\sum t = 341\), \(\sum p = 833\), \(\sum t^2 = 18181\), \(\sum p^2 = 106397\), \(\sum tp = 42948\)]
  1. Find \(S_{tt}\), \(S_{pp}\) and \(S_{tp}\) for these data. [4]
  2. Calculate the product moment correlation coefficient for these data. [3]
  3. Interpret the correlation coefficient. [1]
  4. On the graph paper on page 17, draw the scatter diagram of blood pressure against age for these 7 patients. [2]
  5. Find the equation of the regression line of \(p\) on \(t\). [4]
  6. Plot your regression line on your scatter diagram. [2]
  7. Use your regression line to estimate the blood pressure of a 40 year old patient. [2]
Edexcel S1 2011 June Q7
12 marks Moderate -0.8
A teacher took a random sample of 8 children from a class. For each child the teacher recorded the length of their left foot, \(f\) cm, and their height, \(h\) cm. The results are given in the table below.
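| \(f\) | 23 | 26 | 23 | 22 | 27 | 24 | 20 | 21 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(h\) | 135 | 144 | 134 | 136 | 140 | 134 | 130 | 132 |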
(You may use \(\sum f = 186 \quad \sum h = 1085 \quad S_{ff} = 39.5 \quad S_{hh} = 139.875 \quad \sum fh = 25291\))
  1. Calculate \(S_{fh}\) [2]
  2. Find the equation of the regression line of \(h\) on \(f\) in the form \(h = a + bf\). Give the value of \(a\) and the value of \(b\) correct to 3 significant figures. [5]
  3. Use your equation to estimate the height of a child with a left foot length of 25 cm. [2]
  4. Comment on the reliability of your estimate in (c), giving a reason for your answer. [2]
The left foot length of the teacher is 25 cm.
  1. Give a reason why the equation in (b) should not be used to estimate the teacher's height. [1]
Edexcel S3 2005 June Q4
13 marks Standard +0.3
Over a period of time, researchers took 10 blood samples from one patient with a blood disease. For each sample, they measured the levels of serum magnesium, \(s\) mg/dl, in the blood and the corresponding level of the disease protein, \(d\) mg/dl. The results are shown in the table.
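| \(s\) | 1.2 | 1.9 | 3.2 | 3.9 | 2.5 | 4.5 | 5.7 | 4.0 | 1.1 | 5.9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(d\) | 3.8 | 7.0 | 11.0 | 12.0 | 9.0 | 12.0 | 13.5 | 12.2 | 2.0 | 13.9 |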
[Use \(\sum s^2 = 141.51\), \(\sum d^2 = 1081.74\) and \(\sum sd = 386.32\)]
  1. Draw a scatter diagram to represent these data. [3]
  2. State what is measured by the product moment correlation coefficient. [1]
  3. Calculate \(S_{ss}\), \(S_{dd}\) and \(S_{sd}\). [3]
  4. Calculate the value of the product moment correlation coefficient \(r\) between \(s\) and \(d\). [2]
  5. Stating your hypotheses clearly, test, at the 1\% significance level, whether or not the correlation coefficient is greater than zero. [3]
  6. With reference to your scatter diagram, comment on your result in part (e). [1]
(Total 13 marks)
OCR MEI C2 2013 January Q12
13 marks Moderate -0.3
The table shows population data for a country.
| Year | 1969 | 1979 | 1989 | 1999 | 2009 |
| --- | --- | --- | --- | --- | --- |
| Population in millions (\(p\)) | 58.81 | 80.35 | 105.27 | 134.79 | 169.71 |
The data may be represented by an exponential model of growth. Using \(t\) as the number of years after 1960, a suitable model is \(p = a \times 10^{kt}\).
  1. Derive an equation for \(\log_{10} p\) in terms of \(a\), \(k\) and \(t\). [2]
  2. Complete the table and draw the graph of \(\log_{10} p\) against \(t\), drawing a line of best fit by eye. [3]
  3. Use your line of best fit to express \(\log_{10} p\) in terms of \(t\) and hence find \(p\) in terms of \(t\). [4]
  4. According to the model, what was the population in 1960? [1]
  5. According to the model, when will the population reach 200 million? [3]
OCR MEI C2 2006 June Q12
12 marks Moderate -0.8
Answer the whole of this question on the insert provided. A colony of bats is increasing. The population, \(P\), is modelled by \(P = a \times 10^{bt}\), where \(t\) is the time in years after 2000.
  1. Show that, according to this model, the graph of \(\log_{10} P\) against \(t\) should be a straight line of gradient \(b\). State, in terms of \(a\), the intercept on the vertical axis. [3]
  2. The table gives the data for the population from 2001 to 2005.
    | Year | 2001 | 2002 | 2003 | 2004 | 2005 |
    | --- | --- | --- | --- | --- | --- |
    | \(t\) | 1 | 2 | 3 | 4 | 5 |
    | \(P\) | 7900 | 8800 | 10000 | 11300 | 12800 |
    Complete the table of values on the insert, and plot \(\log_{10} P\) against \(t\). Draw a line of best fit for the data. [3]
  3. Use your graph to find the equation for \(P\) in terms of \(t\). [4]
  4. Predict the population in 2008 according to this model. [2]
OCR MEI C2 2008 June Q13
12 marks Moderate -0.3
The percentage of the adult population visiting the cinema in Great Britain has tended to increase since the 1980s. The table shows the results of surveys in various years.
| Year | 1986/87 | 1991/92 | 1996/97 | 1999/00 | 2000/01 | 2001/02 |
| --- | --- | --- | --- | --- | --- | --- |
| Percentage of the adult population visiting the cinema | 31 | 44 | 54 | 56 | 55 | 57 |

Source: Department of National Statistics, www.statistics.gov.uk

This growth may be modelled by an equation of the form $$P = at^b,$$ where \(P\) is the percentage of the adult population visiting the cinema, \(t\) is the number of years after the year 1985/86 and \(a\) and \(b\) are constants to be determined.
  1. Show that, according to this model, the graph of \(\log_{10} P\) against \(\log_{10} t\) should be a straight line of gradient \(b\). State, in terms of \(a\), the intercept on the vertical axis. [3]
  2. Complete the table of values on the insert, and plot \(\log_{10} P\) against \(\log_{10} t\). Draw by eye a line of best fit for the data. [4]
  3. Use your graph to find the equation for \(P\) in terms of \(t\). [4]
  4. Predict the percentage of the adult population visiting the cinema in the year 2007/2008 (i.e. when \(t = 22\)), according to this model. [1]
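For a power model \(P = at^b\), taking logs of both variables gives \(\log_{10} P = \log_{10} a + b \log_{10} t\). The sketch below fits that line by least squares instead of by eye, so its constants will differ a little from a hand-drawn graph. The \(t\) values are derived on the assumption that 1986/87 corresponds to \(t = 1\), consistent with the question's statement that 2007/2008 is \(t = 22\). The helper name `fit_line` is our own.

```python
import math

# Years after 1985/86 (assumed mapping: 1986/87 -> t = 1) and percentages P.
t_values = [1, 6, 11, 14, 15, 16]
p_values = [31, 44, 54, 56, 55, 57]

def fit_line(xs, ys):
    """Least-squares gradient and intercept of ys against xs (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    gradient = s_xy / s_xx
    return gradient, y_bar - gradient * x_bar

# Linearised model: log10(P) = log10(a) + b * log10(t).
log_t = [math.log10(t) for t in t_values]
log_p = [math.log10(p) for p in p_values]
b, log_a = fit_line(log_t, log_p)
a = 10 ** log_a

# Prediction for 2007/2008, i.e. t = 22.
p_at_22 = a * 22 ** b
print(f"P = {a:.1f} * t^{b:.3f}; predicted percentage at t = 22: {p_at_22:.0f}")
```

Because \(\log_{10} 1 = 0\), the first data point sits on the vertical axis, so the intercept is close to \(\log_{10} 31\).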
OCR MEI C2 2014 June Q13
13 marks Moderate -0.3
The thickness of a glacier has been measured every five years from 1960 to 2010. The table shows the reduction in thickness from its measurement in 1960.
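| Year | 1965 | 1970 | 1975 | 1980 | 1985 | 1990 | 1995 | 2000 | 2005 | 2010 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of years since 1960 (\(t\)) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 |
| Reduction in thickness since 1960 (\(h\) m) | 0.7 | 1.0 | 1.7 | 2.3 | 3.6 | 4.7 | 6.0 | 8.2 | 12 | 15.9 |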
An exponential model may be used for these data, assuming that the relationship between \(h\) and \(t\) is of the form \(h = a \times 10^{bt}\), where \(a\) and \(b\) are constants to be determined.
  1. Show that this relationship may be expressed in the form \(\log_{10} h = mt + c\), stating the values of \(m\) and \(c\) in terms of \(a\) and \(b\). [2]
  2. Complete the table of values in the answer book, giving your answers correct to 2 decimal places, and plot the graph of \(\log_{10} h\) against \(t\), drawing by eye a line of best fit. [4]
  3. Use your graph to find \(h\) in terms of \(t\) for this model. [4]
  4. Calculate by how much the glacier will reduce in thickness between 2010 and 2020, according to the model. [2]
  5. Give one reason why this model will not be suitable in the long term. [1]