2.02d Informal interpretation of correlation

51 questions

OCR MEI Further Statistics A AS 2020 November Q2
12 marks Standard +0.3
2 A researcher is investigating the concentration of bacteria and fungi in the air in buildings. The researcher selects a random sample of 12 buildings and measures the concentrations of bacteria, \(x\), and fungi, \(y\), in the air in each building. Both concentrations are measured in the same standard units. Fig. 2 illustrates the data collected. The researcher wishes to test for a relationship between \(x\) and \(y\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{ba3fcd3c-6834-4116-be0e-d5b27aed0a7e-3_595_844_513_255} \captionsetup{labelformat=empty} \caption{Fig. 2}
\end{figure}
  1. Explain why a test based on the product moment correlation coefficient is likely to be appropriate for these data. Summary statistics for the data are as follows. \(n = 12 \quad \sum x = 18030 \quad \sum y = 15550 \quad \sum x ^ { 2 } = 31458700 \quad \sum y ^ { 2 } = 21980500 \quad \sum x y = 25626800\)
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient between \(x\) and \(y\).
  3. Carry out a test at the \(5 \%\) significance level based on the product moment correlation coefficient to investigate whether there is any correlation between concentrations of bacteria and fungi.
  4. Explain why, in order for proper inference to be undertaken, the sample should be chosen randomly.
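The calculation in part 2 of the question above follows directly from the summary statistics. The sketch below is a minimal illustration in Python rather than a model answer; the function name `pmcc_from_sums` and its argument names are illustrative assumptions, not part of the question.

```python
from math import sqrt

def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return Pearson's r from n, Σx, Σy, Σx², Σy² and Σxy."""
    s_xx = sum_x2 - sum_x * sum_x / n   # corrected sum of squares for x
    s_yy = sum_y2 - sum_y * sum_y / n   # corrected sum of squares for y
    s_xy = sum_xy - sum_x * sum_y / n   # corrected sum of products
    return s_xy / sqrt(s_xx * s_yy)

# Summary statistics quoted in the question (n = 12 buildings)
r = pmcc_from_sums(12, 18030, 15550, 31458700, 21980500, 25626800)
print(round(r, 3))
```

The resulting value of r would then be compared with the tabulated critical value for n = 12 in a two-tailed test at the 5% level.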
OCR MEI Further Statistics A AS 2021 November Q3
9 marks Standard +0.3
3 A student is investigating the link between temperature (in degrees Celsius) and electricity consumption (in Gigawatt-hours) in the country in which he lives. The student has read that there is strong negative correlation between daily mean temperature over the whole country and daily electricity consumption during a year. He wonders if this applies to an individual season. He therefore obtains data on the mean temperature and electricity consumption on ten randomly selected days in the summer. The spreadsheet output below shows the data, together with a scatter diagram to illustrate the data. \includegraphics[max width=\textwidth, alt={}, center]{5be067ff-4668-48d6-8ed2-b8dfa3e678f7-3_798_1593_639_251}
  1. Calculate Pearson's product moment correlation coefficient between daily mean temperature and daily electricity consumption. The student decides to carry out a hypothesis test to investigate whether there is negative correlation between daily mean temperature and daily electricity consumption during the summer.
  2. Explain why the student decides to carry out a test based on Pearson's product moment correlation coefficient.
  3. Show that the test at the \(5 \%\) significance level does not result in the null hypothesis being rejected.
  4. The student concludes that there is no correlation between the variables in the summer months. Comment on the student's conclusion.
OCR MEI Further Statistics Minor 2019 June Q5
16 marks Standard +0.3
5 A student wants to know if there is a positive correlation between the amounts of two pollutants, sulphur dioxide and PM10 particulates, on different days in the area of London in which he lives; these amounts, measured in suitable units, are denoted by \(s\) and \(p\) respectively.
He uses a government website to obtain data for a random sample of 15 days on which the amounts of these pollutants were measured simultaneously. Fig. 5.1 is a scatter diagram showing the data. Summary statistics for these 15 values of \(s\) and \(p\) are as follows. \(\sum s = 155.4 \quad \sum p = 518.9 \quad \sum s ^ { 2 } = 2322.7 \quad \sum p ^ { 2 } = 21270.5 \quad \sum s p = 6009.1\) \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{4a4d5816-5b53-49a1-b72f-f8bcf3b4e8bc-4_935_1134_683_260} \captionsetup{labelformat=empty} \caption{Fig. 5.1}
\end{figure}
  1. Explain why the student might come to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.
  2. Find the value of Pearson's product moment correlation coefficient.
  3. Carry out a test at the \(5 \%\) significance level to investigate whether there is positive correlation between the amounts of sulphur dioxide and PM10 particulates.
  4. Explain why the student made sure that the sample chosen was a random sample. The student also wishes to model the relationship between the amounts of nitrogen dioxide \(n\) and PM10 particulates \(p\).
    He takes a random sample of 54 values of the two variables, both measured at the same times. Fig. 5.2 is a scatter diagram which shows the data, together with the regression line of \(n\) on \(p\), the equation of the regression line and the value of \(r ^ { 2 }\). \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{4a4d5816-5b53-49a1-b72f-f8bcf3b4e8bc-5_824_1230_495_258} \captionsetup{labelformat=empty} \caption{Fig. 5.2}
    \end{figure}
  5. Predict the value of \(n\) for \(p = 150\).
  6. Discuss the reliability of your prediction in part (e).
OCR MEI Further Statistics Minor 2021 November Q4
14 marks Standard +0.3
4 A scientist is investigating sea salinity (the level of salt in the sea) in a particular area. She wishes to check whether satellite measurements, \(y\), of salinity are similar to those directly measured, \(x\). Both variables are measured in parts per thousand in suitable units. The scientist obtains a random sample of 10 values of \(x\) and the related values of \(y\). Below is a screenshot of a scatter diagram to illustrate the data. She decides to carry out a hypothesis test to check if there is any correlation between direct measurement, \(x\), and satellite measurement, \(y\). \includegraphics[max width=\textwidth, alt={}, center]{691e8b55-e9a1-4fff-b9ee-a71ff1f73ead-5_830_837_589_246}
  1. Explain why the scientist might decide to carry out a test based on the product moment correlation coefficient. Summary statistics for \(x\) and \(y\) are as follows. \(n = 10 \quad \sum x = 351.9 \quad \sum y = 350.0 \quad \sum x ^ { 2 } = 12384.5 \quad \sum y ^ { 2 } = 12251.2 \quad \sum x y = 12317.2\)
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient.
  3. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is positive correlation between directly measured and satellite measured salinity levels.
  4. Explain why it would be preferable to use a larger sample. The scientist is also interested in whether there is any correlation between salinity and numbers of a particular species of shrimp in the water. She takes a large sample and finds that the product moment correlation coefficient for this sample is 0.165. The result of a test based on this sample is to reject the null hypothesis and conclude that there is correlation between salinity and numbers of shrimp.
  5. Comment on the outcome of the hypothesis test with reference to the effect size of 0.165.
OCR MEI Further Statistics Major 2022 June Q8
14 marks Standard +0.3
8 A swimming coach is investigating whether there is correlation between the times taken by teenage swimmers to swim 50 m Butterfly and 50 m Freestyle. The coach selects a random sample of 11 teenage swimmers and records the times that each of them take for each event. The spreadsheet shows the data, together with a scatter diagram to illustrate the data. \includegraphics[max width=\textwidth, alt={}, center]{77eabbd6-a058-457f-9601-d66f3c2db005-06_712_1465_456_274}
  1. In the scatter diagram, Butterfly times have been plotted on the horizontal axis and Freestyle times on the vertical axis. A student states that the variables should have been plotted the other way around. Explain whether the student is correct. The student decides to carry out a hypothesis test to investigate whether there is any correlation between the times taken for the two events.
  2. Explain why the student decides to carry out a test based on Spearman's rank correlation coefficient.
  3. In this question you must show detailed reasoning. Carry out the test at the 5\% significance level.
  4. The student concludes that there is definitely no correlation between the times. Comment on the student's conclusion.
OCR H240/02 2018 March Q11
7 marks Easy -1.2
11 The scatter diagram shows data, taken from the pre-release data set, for several Local Authorities in one region of the UK in 2011. The diagram shows, for each Local Authority, the number of workers who drove to work, and the number of workers who walked to work. \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{2011} \includegraphics[alt={},max width=\textwidth]{6a6316e4-7b2d-4533-988a-4863d79ce668-08_483_956_479_557}
\end{figure}
  1. Four students calculated the value of Pearson's product-moment correlation coefficient for the data in the diagram. Their answers were \(0.913\), \(0.124\), \(-0.913\) and \(-0.124\). One of these values is correct. Without calculation state, with a reason, which is the correct value.
  2. Sanjay makes the following statement.
    "The diagram shows that, in any Local Authority, if there are a large number of people who drive to work there will be a large number who walk to work." Give a reason why this statement is incorrect.
  3. Rosie makes the following statement.
    "The diagram must be wrong because it shows good positive correlation. If there are more people driving to work, there will be fewer people walking to work, so there would be negative correlation." Explain briefly why Rosie's statement is incorrect.
  4. The diagram shows a fairly close relationship between the two variables. One point on the diagram represents a Local Authority where this relationship is less strong than for the others. On the diagram in the Printed Answer Booklet, label this point A.
  5. Given that the point A represents a metropolitan borough, suggest a reason why the relationship is less strong for this Local Authority than for the others in the region. The scatter diagram below shows the corresponding data for the same region in 2001. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{2001} \includegraphics[alt={},max width=\textwidth]{6a6316e4-7b2d-4533-988a-4863d79ce668-09_481_885_388_591}
    \end{figure}
  6. (a) State a change that has taken place in the metropolitan borough represented by the point A between 2001 and 2011.
    (b) Suggest a possible reason for this change.
OCR AS Pure 2017 Specimen Q11
4 marks Easy -1.2
11 The scatter diagram below shows data taken from the 2011 UK census for each of the Local Authorities in the North East and North West regions.
The scatter diagram shows the total population of the Local Authority and the proportion of its workforce that travel to work by bus, minibus or coach. \includegraphics[max width=\textwidth, alt={}, center]{35d8bb6d-ff0f-4590-b13d-46e4869e2587-07_938_1136_664_260}
  1. Samuel suggests that, with a few exceptions, the data points in the diagram show that Local Authorities with larger populations generally have higher proportions of workers travelling by bus, minibus or coach. On the diagram in the Printed Answer Booklet draw a ring around each of the data points that Samuel might regard as an exception.
  2. Jasper suggests that it is possible to separate these Local Authorities into more than one group with different relationships between population and proportion travelling to work by bus, minibus or coach. Discuss Jasper's suggestion, referring to the data and to how differences between the Local Authorities could explain the patterns seen in the diagram.
    [3]
AQA S1 2005 January Q1
7 marks Moderate -0.3
1 Each Monday, Azher has a stall at a town's outdoor market. The table below shows, for each of a random sample of 10 Mondays during 2003, the air temperature, \(x ^ { \circ } \mathrm { C }\), at 9 am and Azher's takings, £y.
Monday\(\mathbf { 1 }\)\(\mathbf { 2 }\)\(\mathbf { 3 }\)\(\mathbf { 4 }\)\(\mathbf { 5 }\)\(\mathbf { 6 }\)\(\mathbf { 7 }\)\(\mathbf { 8 }\)\(\mathbf { 9 }\)\(\mathbf { 1 0 }\)
\(\boldsymbol { x }\)2691813712134
\(\boldsymbol { y }\)9710313624512178145128141312
  1. A scatter diagram of these data is shown below. \includegraphics[max width=\textwidth, alt={}, center]{7faa4a2d-f5cc-4cc3-a3a9-5d8290ceabdc-2_901_1068_1078_447} Give two distinct comments, in context, on what this diagram reveals.
  2. One of the Mondays is found to be Easter Monday, the busiest Monday market of the year. Identify which Monday this is most likely to be.
  3. Removing the data for the Monday you identified in part (b), calculate the value of the product moment correlation coefficient for the remaining 9 pairs of values of \(x\) and \(y\).
  4. Name one other variable that would have been likely to affect Azher's takings at this town's outdoor market.
    (1 mark)
AQA S1 2007 January Q3
5 marks Easy -1.3
3 Estimate, without undertaking any calculations, the value of the product moment correlation coefficient between the variables \(x\) and \(y\) in each of the three scatter diagrams.
  1. \includegraphics[max width=\textwidth, alt={}, center]{868dc38b-3f24-4218-a300-c3cc2d9ff5d1-03_631_659_516_301}
  2. \includegraphics[max width=\textwidth, alt={}, center]{868dc38b-3f24-4218-a300-c3cc2d9ff5d1-03_620_647_525_1119}
  3. \includegraphics[max width=\textwidth, alt={}, center]{868dc38b-3f24-4218-a300-c3cc2d9ff5d1-03_624_655_1279_303}
    (5 marks)
AQA S1 2010 January Q7
13 marks Standard +0.3
7 [Figure 1, printed on the insert, is provided for use in this question.]
Harold considers himself to be an expert in assessing the auction value of antiques. He regularly visits car boot sales to buy items that he then sells at his local auction rooms. Harold's father, Albert, who is not convinced of his son's expertise, collects the following data from a random sample of 12 items bought by Harold.
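| Item | Purchase price (£ \(\boldsymbol { x }\)) | Auction price (£ \(\boldsymbol { y }\)) |
|---|---|---|
| A | 20 | 30 |
| B | 35 | 45 |
| C | 18 | 25 |
| D | 50 | 50 |
| E | 45 | 38 |
| F | 55 | 45 |
| G | 43 | 50 |
| H | 81 | 90 |
| I | 90 | 85 |
| J | 30 | 190 |
| K | 57 | 65 |
| L | 112 | 25 |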
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
  2. Interpret your value in the context of this question.
    1. On Figure 1, complete the scatter diagram for these data.
    2. Comment on what this reveals.
  3. When items J and L are omitted from the data, it is found that $$S _ { x x } = 4854.4 \quad S _ { y y } = 4216.1 \quad S _ { x y } = 4268.8$$
    1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\) for the remaining 10 items.
    2. Hence revise as necessary your interpretation in part (b).
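For the 10 items that remain after J and L are omitted, the coefficient is a direct substitution of the quoted values into the definition. A worked sketch, rounded here to three decimal places:

$$r = \frac{S _ { x y }}{\sqrt{S _ { x x } \, S _ { y y }}} = \frac{4268.8}{\sqrt{4854.4 \times 4216.1}} \approx 0.944$$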
AQA AS Paper 2 2021 June Q15
3 marks Easy -1.8
15 The number of hours of sunshine and the daily maximum temperature were recorded over a 9-day period in June at an English seaside town. A scatter diagram representing the recorded data is shown below. \includegraphics[max width=\textwidth, alt={}, center]{f87d1b36-26db-4a0b-b9ec-d7d82a396aba-20_872_1511_488_264} One of the points on the scatter diagram is an error.
  1. (i) Write down the letter that identifies this point.
    (ii) Suggest one possible action that could be taken to deal with this error.
  2. It is claimed that the scatter diagram proves that longer hours of sunshine cause higher maximum daily temperatures. Comment on the validity of this claim.
    [1 mark]
AQA Paper 1 2021 June Q9
15 marks Moderate -0.3
9 The table below shows the annual global production of plastics, \(P\), measured in millions of tonnes per year, for six selected years.
| Year | 1980 | 1985 | 1990 | 1995 | 2000 | 2005 |
|---|---|---|---|---|---|---|
| \(\boldsymbol { P }\) | 75 | 94 | 120 | 156 | 206 | 260 |
It is thought that \(P\) can be modelled by $$P = A \times 10 ^ { k t }$$ where \(t\) is the number of years after 1980 and \(A\) and \(k\) are constants.
  1. Show algebraically that the graph of \(\log _ { 10 } P\) against \(t\) should be linear.
  2. (i) Complete the table below.

    | \(\boldsymbol { t }\) | 0 | 5 | 10 | 15 | 20 | 25 |
    |---|---|---|---|---|---|---|
    | \(\log _ { 10 } P\) | 1.88 | 1.97 | 2.08 | | 2.31 | |

    (ii) Plot \(\log _ { 10 } P\) against \(t\), and draw a line of best fit for the data. \includegraphics[max width=\textwidth, alt={}, center]{042e248a-9efa-4844-957d-f05715900ffc-13_1203_1308_360_367}
  3. (i) Hence, show that \(k\) is approximately 0.02
    (ii) Find the value of \(A\).
  4. Using the model with \(k = 0.02\) predict the number of tonnes of annual global production of plastics in 2030.
  5. Using the model with \(k = 0.02\) predict the year in which \(P\) first exceeds 8000
  6. Give a reason why it may be inappropriate to use the model to make predictions about future annual global production of plastics. \includegraphics[max width=\textwidth, alt={}, center]{042e248a-9efa-4844-957d-f05715900ffc-15_2488_1716_219_153}
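Taking logarithms of the model \(P = A \times 10 ^ { k t }\) gives \(\log _ { 10 } P = \log _ { 10 } A + k t\), so \(k\) is the gradient and \(\log _ { 10 } A\) the intercept of the straight-line graph. The Python sketch below illustrates this with a least-squares fit. Note the assumption: the question asks for a line of best fit drawn by eye, so the fitted values here will differ slightly from a hand-drawn line, and the variable names are illustrative only.

```python
from math import log10

years = [1980, 1985, 1990, 1995, 2000, 2005]
production = [75, 94, 120, 156, 206, 260]   # millions of tonnes per year

t = [year - 1980 for year in years]          # years after 1980
log_p = [log10(p) for p in production]       # values for the log table

# Least-squares line: log10(P) = log10(A) + k * t
n = len(t)
t_bar = sum(t) / n
y_bar = sum(log_p) / n
s_tt = sum((ti - t_bar) ** 2 for ti in t)
s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, log_p))

k = s_ty / s_tt                   # gradient of the line
A = 10 ** (y_bar - k * t_bar)     # intercept converted back from log10(A)

print([round(v, 2) for v in log_p])
print(round(k, 3), round(A, 1))
```

A prediction for a given year then follows by substituting the corresponding \(t\) into \(A \times 10 ^ { k t }\), using whichever values of \(A\) and \(k\) were read from the graph.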
Edexcel AS Paper 2 Specimen Q4
7 marks Moderate -0.8
  1. Sara was studying the relationship between rainfall, \(r \mathrm {~mm}\), and humidity, \(h \%\), in the UK. She takes a random sample of 11 days from May 1987 for Leuchars from the large data set.
She obtained the following results.
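| \(h\) | 93 | 86 | 95 | 97 | 86 | 94 | 97 | 97 | 87 | 97 | 86 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| \(r\) | 1.1 | 0.3 | 3.7 | 20.6 | 0 | 0 | 2.4 | 1.1 | 0.1 | 0.9 | 0.1 |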
Sara examined the rainfall figures and found $$Q _ { 1 } = 0.1 \quad Q _ { 2 } = 0.9 \quad Q _ { 3 } = 2.4$$ A value that is more than 1.5 times the interquartile range (IQR) above \(Q _ { 3 }\) is called an outlier.
  1. Show that \(r = 20.6\) is an outlier.
  2. Give a reason why Sara might:
    1. include
    2. exclude
      this day's reading. Sara decided to exclude this day's reading and drew the following scatter diagram for the remaining 10 days' values of \(r\) and \(h\). \includegraphics[max width=\textwidth, alt={}, center]{8f3dbcb4-3260-4493-a230-12577b4ed691-08_988_1081_1555_420}
  3. Give an interpretation of the correlation between rainfall and humidity. The equation of the regression line of \(r\) on \(h\) for these 10 days is \(r = - 12.8 + 0.15 h\)
  4. Give an interpretation of the gradient of this regression line.
    1. Comment on the suitability of Sara's sampling method for this study.
    2. Suggest how Sara could make better use of the large data set for her study.
Edexcel AS Paper 2 Specimen Q3
6 marks Standard +0.3
  1. Pete is investigating the relationship between daily rainfall, \(w \mathrm {~mm}\), and daily mean pressure, \(p\) hPa , in Perth during 2015. He used the large data set to take a sample of size 12.
He obtained the following results.
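| \(p\) | 1007 | 1012 | 1013 | 1009 | 1019 | 1010 | 1010 | 1010 | 1013 | 1011 | 1014 | 1022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \(w\) | 102.0 | 63.0 | 63.0 | 38.4 | 38.0 | 35.0 | 34.2 | 32.0 | 30.4 | 28.0 | 28.0 | 15 |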
Pete drew the following scatter diagram for the values of \(w\) and \(p\) and calculated the quartiles.
| | Q1 | Q2 | Q3 |
|---|---|---|---|
| \(p\) | 1010 | 1011.5 | 1013.5 |
| \(w\) | 29.2 | 34.6 | 50.7 |
\includegraphics[max width=\textwidth, alt={}]{b29b0411-8401-420b-9227-befe25c245d8-04_818_1081_989_477}
An outlier is a value which is more than 1.5 times the interquartile range above Q3 or more than 1.5 times the interquartile range below Q1.
  1. Show that the 3 points circled on the scatter diagram above are outliers.
    (2)
  2. Describe the effect of removing the 3 outliers on the correlation between daily rainfall and daily mean pressure in this sample.
    (1) John has also been studying the large data set and believes that the sample Pete has taken is not random.
  3. From your knowledge of the large data set, explain why Pete's sample is unlikely to be a random sample. John finds that the equation of the regression line of \(w\) on \(p\), using all the data in the large data set, is $$w = 1023 - 0.223 p$$
  4. Give an interpretation of the figure \(-0.223\) in this regression line. John decided to use the regression line to estimate the daily rainfall for a day in December when the daily mean pressure is 1011 hPa.
  5. Using your knowledge of the large data set, comment on the reliability of John's estimate.
    (Total for Question 3 is 6 marks)
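The outlier rule stated in the question above can be checked mechanically. The Python sketch below applies it to Pete's 12 pairs, using the quartiles given in the question; the helper name `fences` is an illustrative assumption.

```python
def fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Data as paired in the question's table
p = [1007, 1012, 1013, 1009, 1019, 1010, 1010, 1010, 1013, 1011, 1014, 1022]
w = [102.0, 63.0, 63.0, 38.4, 38.0, 35.0, 34.2, 32.0, 30.4, 28.0, 28.0, 15]

p_low, p_high = fences(1010, 1013.5)   # quartiles of p quoted in the question
w_low, w_high = fences(29.2, 50.7)     # quartiles of w quoted in the question

# A day is flagged if either coordinate lies outside its limits
outliers = [
    (pi, wi)
    for pi, wi in zip(p, w)
    if not (p_low <= pi <= p_high) or not (w_low <= wi <= w_high)
]
print(outliers)
```

Each flagged point is an outlier in one variable only: one for rainfall and two for pressure.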
Edexcel Paper 3 Specimen Q2
7 marks Moderate -0.3
2. A researcher believes that there is a linear relationship between daily mean temperature and daily total rainfall. The 7 places in the northern hemisphere from the large data set are used. The mean of the daily mean temperatures, \(t ^ { \circ } \mathrm { C }\), and the mean of the daily total rainfall, \(s \mathrm {~mm}\), for the month of July in 2015 are shown on the scatter diagram below. \includegraphics[max width=\textwidth, alt={}, center]{565bfa73-8095-4242-80b6-cd47aaff6a31-03_844_1339_497_372}
  1. With reference to the scatter diagram, explain why a linear regression model may not be suitable for the relationship between \(t\) and \(s\).
    (1) The researcher calculated the product moment correlation coefficient for the 7 places and obtained \(r = 0.658\).
  2. Stating your hypotheses clearly, test at the \(10 \%\) level of significance, whether or not the product moment correlation coefficient for the population is greater than zero.
    (3)
  3. Using your knowledge of the large data set, suggest the names of the 2 places labelled \(G\) and \(H\).
    (1)
  4. Using your knowledge from the large data set, and with reference to the locations of the two places labelled \(G\) and \(H\), give a reason why these places have the highest temperatures in July.
    (2)
  5. Suggest how you could make better use of the large data set to investigate the relationship between daily mean temperature and daily total rainfall.
    (1)
    (Total 7 marks)
WJEC Unit 4 Specimen Q5
7 marks Moderate -0.3
5. A hotel owner in Cardiff is interested in what factors hotel guests think are important when staying at a hotel. From a hotel booking website he collects the ratings for 'Cleanliness', 'Location', 'Comfort' and 'Value for money' for a random sample of 17 Cardiff hotels.
(Each rating is the average of all scores awarded by guests who have contributed reviews using a scale from 1 to 10, where 10 is 'Excellent'.) The scatter graph shows the relationship between 'Value for money' and 'Cleanliness' for the sample of Cardiff hotels. \includegraphics[max width=\textwidth, alt={}, center]{b35e94ab-a426-4fca-9ecb-c659e0143ed7-4_693_1033_749_516}
  1. The product moment correlation coefficient for 'Value for money' and 'Cleanliness' for the sample of 17 Cardiff hotels is 0.895. Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether this correlation is significant. State your conclusion in context.
  2. The hotel owner also wishes to investigate whether 'Value for money' has a significant correlation with 'Cost per night'. He used a statistical analysis package which provided the following output which includes the Pearson correlation coefficient of interest and the corresponding \(p\)-value.
    | | Value for money | Cost per night |
    |---|---|---|
    | Value for money | 1 | |
    | Cost per night | 0.047 \(( 0.859 )\) | 1 |

    Comment on the correlation between 'Value for money' and 'Cost per night'.
Edexcel S1 Q6
16 marks Moderate -0.3
The Principal of a school believes that more students are absent on days when the temperature is lower. Over a two-week period in December she records the percentage of students who are absent, \(A\%\), and the temperature, \(T°\)C, at 9 am each morning giving these results.
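| \(T\) (°C) | 4 | \(-3\) | \(-2\) | \(-6\) | 0 | 3 | 7 | \(-1\) | 3 | 2 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(A\) (\%) | 8.5 | 14.1 | 17.0 | 20.3 | 17.9 | 15.5 | 12.4 | 12.8 | 13.7 | 11.6 |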
  1. Represent these data on a scatter diagram. [4 marks]
You may use $$\Sigma T = 7, \quad \Sigma A = 143.8, \quad \Sigma T^2 = 137, \quad \Sigma A^2 = 2172.66, \quad \Sigma TA = 20.7$$
  2. Calculate the product moment correlation coefficient for these data and comment on the Principal's hypothesis. [6 marks]
  3. Find an equation of the regression line of \(A\) on \(T\) in the form \(A = p + qT\). [4 marks]
  4. Draw the regression line on your scatter diagram. [2 marks]
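Both the correlation coefficient and the regression line of \(A\) on \(T\) in the question above come from the same three corrected sums. The Python sketch below is a hedged illustration using the summary values quoted in the question; the variable names are illustrative, and the rounding is a presentation choice.

```python
from math import sqrt

n = 10
sum_t, sum_a = 7, 143.8
sum_tt, sum_aa, sum_ta = 137, 2172.66, 20.7

s_tt = sum_tt - sum_t ** 2 / n        # corrected sum of squares for T
s_aa = sum_aa - sum_a ** 2 / n        # corrected sum of squares for A
s_ta = sum_ta - sum_t * sum_a / n     # corrected sum of products

r = s_ta / sqrt(s_tt * s_aa)          # product moment correlation coefficient
q = s_ta / s_tt                       # gradient of the regression line of A on T
p = sum_a / n - q * sum_t / n         # intercept: the line passes through the means

print(round(r, 3), round(p, 2), round(q, 3))
```

A negative gradient here corresponds to absence falling as temperature rises, which is the direction the Principal's belief predicts.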
OCR H240/02 2023 June Q12
4 marks Standard +0.3
A student has an ordinary six-sided dice. The student suspects that it is biased against six, so that when it is thrown, it is less likely to show a six than if it were fair. In order to test this suspicion, the student plans to carry out a hypothesis test at the 5% significance level. The student throws the dice 100 times and notes the number of times, \(X\), that it shows a six.
  1. Determine the largest value of \(X\) that would provide evidence at the 5% significance level that the dice is biased against six. [3]
Later another student carries out a similar test, at the 5% significance level. This student also throws the dice 100 times.
  2. It is given that the dice is fair. Find the probability that the conclusion of the test is that there is significant evidence that the dice is biased against six. [1]
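Under the null hypothesis the number of sixes in the question above is modelled as \(X \sim B(100, \tfrac{1}{6})\), and the test is one-tailed in the lower tail. The Python sketch below searches for the largest value whose cumulative probability does not exceed the significance level; the helper name `binom_cdf` is an illustrative assumption, and the exact sum stands in for a calculator's binomial function.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p, alpha = 100, 1 / 6, 0.05

# Largest c with P(X <= c) <= alpha under H0: p = 1/6 (lower-tail test)
c = -1
while binom_cdf(c + 1, n, p) <= alpha:
    c += 1

print(c, round(binom_cdf(c, n, p), 4))
```

The second printed value is the probability of rejecting the null hypothesis when the dice is in fact fair, which is what the later part of the question asks for.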
AQA AS Paper 2 2018 June Q18
6 marks Easy -1.2
Jennie is a piano teacher who teaches nine pupils. She records how many hours per week they practice the piano along with their most recent practical exam score.
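| Student | Practice (hours per week) | Practical exam score (out of 100) |
|---|---|---|
| Donovan | 50 | 64 |
| Vazquez | 6 | 71 |
| Higgins | 3 | 55 |
| Begum | 2.5 | 47 |
| Collins | 1 | 80 |
| Coldbridge | 4 | 61 |
| Nedbalek | 4.5 | 65 |
| Carter | 8 | 83 |
| White | 11 | 92 |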
[diagram]
  1. Identify two possible outliers by name, giving a possible explanation for the position on the scatter diagram of each outlier. [4 marks]
  2. Jennie discards the two outliers.
    1. Describe the correlation shown by the scatter diagram for the remaining points. [1 mark]
    2. Interpret this correlation in the context of the question. [1 mark]
AQA Paper 3 2019 June Q10
1 mark Easy -2.5
Which of the options below best describes the correlation shown in the diagram below? \includegraphics{figure_10} Tick \((\checkmark)\) one box. [1 mark] moderate positive \(\square\) strong positive \(\square\) moderate negative \(\square\) strong negative \(\square\)
AQA Paper 3 2021 June Q10
1 mark Easy -2.5
Anke has collected data from 30 similar-sized cars to investigate any correlation between the age of the car and the current market value. She calculates the correlation coefficient. Which of the following statements best describes her answer of \(-1.2\)? Tick (\(\checkmark\)) one box. [1 mark] Definitely incorrect \(\square\) Probably incorrect \(\square\) Probably correct \(\square\) Definitely correct \(\square\)
OCR PURE Q12
4 marks Easy -2.3
This question deals with information about the populations of Local Authorities (LAs) in the North of England, taken from the 2011 census. \includegraphics{figure_6} Fig. 1 and Fig. 2 both show strong correlation, but of two different kinds.
  1. For each diagram, use a single word to describe the kind of correlation shown. [1]
  2. For each diagram, suggest a reason, in context, why the correlation is of the particular kind described in part (a). [2]
Fig. 3 is the same as Fig. 2 but with the point \(A\) marked. Fig. 4 shows information about the same LAs as Fig. 2 and Fig. 3. \includegraphics{figure_7}
  3. Point \(A\) in Fig. 3 and point \(B\) in Fig. 4 represent the same LA. Explain how you can tell that this LA has a large population. [1]
OCR MEI AS Paper 2 2018 June Q11
9 marks Easy -1.8
The pre-release material contains data concerning the death rate per thousand people and the birth rate per thousand people in all the countries of the world. The diagram in Fig. 11.1 was generated using a spreadsheet and summarises the birth rates for all the countries in Africa. \includegraphics{figure_11_1} Fig. 11.1
  1. Identify two respects in which the presentation of the data is incorrect. [2]
Fig. 11.2 shows a scatter diagram of death rate, \(y\), against birth rate, \(x\), for a sample of 55 countries, all of which are in Africa. A line of best fit has also been drawn. \includegraphics{figure_11_2} Fig. 11.2 The equation of the line of best fit is \(y = 0.15x + 4.72\).
    1. What does the diagram suggest about the relationship between death rate and birth rate? [1]
    2. The birth rate in Togo is recorded as 34.13 per thousand, but the data on death rate has been lost. Use the equation of the line of best fit to estimate the death rate in Togo. [1]
    3. Explain why it would not be sensible to use the equation of the line of best fit to estimate the death rate in a country where the birth rate is 5.5 per thousand. [1]
    4. Explain why it would not be sensible to use the equation of the line of best fit to estimate the death rate in a Caribbean country where the birth rate is known. [1]
    5. Explain why it is unlikely that the sample is random. [1]
Including Togo there were 56 items available for selection.
  3. Describe how a sample of size 14 from this data could be generated for further analysis using systematic sampling. [2]
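Systematic sampling of 14 items from 56 means taking every fourth item from a random starting point among the first four. The Python sketch below illustrates the idea under the assumption that the 56 countries have been listed and numbered 1 to 56; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, rng=random):
    """Pick every k-th position, k = population_size // sample_size, from a random start."""
    step = population_size // sample_size
    start = rng.randint(1, step)   # random start among the first `step` positions (inclusive)
    return [start + i * step for i in range(sample_size)]

positions = systematic_sample(56, 14)
print(positions)
```

The printed list gives positions in the numbered list, not the countries themselves, so the ordering of the list should be decided before the start point is chosen.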
WJEC Unit 2 Specimen Q4
7 marks Easy -1.3
A researcher wishes to investigate the relationship between the amount of carbohydrate and the number of calories in different fruits. He compiles a list of 90 different fruits, e.g. apricots, kiwi fruits, raspberries. As he does not have enough time to collect data for each of the 90 different fruits, he decides to select a simple random sample of 14 different fruits from the list. For each fruit selected, he then uses a dieting website to find the number of calories (kcal) and the amount of carbohydrate (g) in a typical adult portion (e.g. a whole apple, a bunch of 10 grapes, half a cup of strawberries). He enters these data into a spreadsheet for analysis.
  1. Explain how the random number function on a calculator could be used to select this sample of 14 different fruits. [3]
  2. The scatter graph represents 'Number of calories' against 'Carbohydrate' for the sample of 14 different fruits.
    1. Describe the correlation between 'Number of calories' and 'Carbohydrate'. [1]
    2. Interpret the correlation between 'Number of calories' and 'Carbohydrate' in this context. [1]
    \includegraphics{figure_1}
  3. The equation of the regression line for this dataset is: 'Number of calories' = 12.4 + 2.9 × 'Carbohydrate'
    1. Interpret the gradient of the regression line in this context. [1]
    2. Explain why it is reasonable for the regression line to have a non-zero intercept in this context. [1]
SPS SPS SM 2021 February Q1
1 mark Easy -2.5
Which of the options below best describes the correlation shown in the diagram below? \includegraphics{figure_1} Tick \((\checkmark)\) one box. [1 mark] moderate positive \(\square\) strong positive \(\square\) moderate negative \(\square\) strong negative \(\square\)