2.02d Informal interpretation of correlation

51 questions

OCR S1 2005 January Q1
4 marks Easy -1.3
1 The scatter diagrams below illustrate three sets of bivariate data, \(A , B\) and \(C\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{f0c0a4ca-da0a-4c74-b8b1-bac4fd3f2487-2_440_428_360_317} \captionsetup{labelformat=empty} \caption{Set \(A\)}
\end{figure} \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{f0c0a4ca-da0a-4c74-b8b1-bac4fd3f2487-2_440_426_360_858} \captionsetup{labelformat=empty} \caption{Set \(B\)}
\end{figure} \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{f0c0a4ca-da0a-4c74-b8b1-bac4fd3f2487-2_435_424_365_1402} \captionsetup{labelformat=empty} \caption{Set \(C\)}
\end{figure} State, with an explanation in each case, which of the three sets of data has
  1. the largest,
  2. the smallest,
    value of the product moment correlation coefficient.
OCR MEI S2 2008 June Q1
18 marks Standard +0.3
1 A researcher believes that there is a negative correlation between money spent by the government on education and population growth in various countries. A random sample of 48 countries is selected to investigate this belief. The level of government spending on education \(x\), measured in suitable units, and the annual percentage population growth rate \(y\), are recorded for these countries. Summary statistics for these data are as follows. $$\Sigma x = 781.3 \quad \Sigma y = 57.8 \quad \Sigma x ^ { 2 } = 14055 \quad \Sigma y ^ { 2 } = 106.3 \quad \Sigma x y = 880.1 \quad n = 48$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate the researcher's belief. State your hypotheses clearly, defining any symbols which you use.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly how a scatter diagram may be used to check whether this assumption is likely to be valid.
  4. A student suggests that if the variables are negatively correlated then population growth rates can be reduced by increasing spending on education. Explain why the student may be wrong. Discuss an alternative explanation for the correlation.
  5. State briefly one advantage and one disadvantage of using a smaller sample size in this investigation.
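As a rough illustration of part 1, the product moment correlation coefficient can be computed from the quoted sums via \(S_{xy}\), \(S_{xx}\) and \(S_{yy}\). The sketch below is not part of the original question; the function name `pmcc_from_sums` is hypothetical, and a written answer may differ slightly through rounding.

```python
import math

def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Hypothetical helper: r = S_xy / sqrt(S_xx * S_yy) from summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Summary statistics quoted in the question (n = 48 countries)
r = pmcc_from_sums(48, 781.3, 57.8, 14055, 106.3, 880.1)
print(round(r, 3))  # roughly -0.274
```

The value is negative but fairly close to zero, so whether it is significant depends on the tabulated critical value for \(n = 48\), which is not reproduced here.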
Edexcel S1 2014 January Q3
11 marks Moderate -0.8
3. Jean works for an insurance company. She randomly selects 8 people and records the price of their car insurance, \(\pounds p\), and the time, \(t\) years, since they passed their driving test. The data is shown in the table below.
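$$\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline t & 10 & 13 & 17 & 18 & 22 & 24 & 25 & 27 \\ \hline p & 720 & 650 & 430 & 490 & 500 & 390 & 280 & 300 \\ \hline \end{array}$$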
$$\text { (You may use } \bar { t } = 19.5 , \bar { p } = 470 , S _ { t p } = - 6080 , S _ { t t } = 254 , S _ { p p } = 169200 \text { ) }$$
  1. On the graph below draw a scatter diagram for these data.
  2. Comment on the relationship between \(p\) and \(t\).
  3. Find the equation of the regression line of \(p\) on \(t\).
  4. Use your regression equation to estimate the price of car insurance for someone who passed their driving test 20 years ago. Jack passed his test 39 years ago and decides to use Jean's data to predict the price of his car insurance.
  5. Comment on Jack's decision. Give a reason for your answer. \includegraphics[max width=\textwidth, alt={}, center]{a839a89a-17f0-473b-ac10-bcec3dbe97f7-06_951_1365_1603_294}
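A minimal sketch of parts 3 and 4, assuming the usual least squares formulae \(b = S_{tp} / S_{tt}\) and \(a = \bar{p} - b\bar{t}\). The helper name `predict_price` is hypothetical and only the summary values quoted in the question are used.

```python
# Summary values given in the question
t_bar, p_bar = 19.5, 470
s_tp, s_tt = -6080, 254

b = s_tp / s_tt          # gradient of the regression line of p on t
a = p_bar - b * t_bar    # intercept: the line passes through (t_bar, p_bar)

def predict_price(t):
    """Hypothetical helper: estimated price in pounds, t years after passing the test."""
    return a + b * t

print(round(a, 1), round(b, 2))   # about 936.8 and -23.94
print(round(predict_price(20)))   # about 458
```

The recorded values of \(t\) run from 10 to 27, so \(t = 20\) is interpolation. Evaluating the same line at \(t = 39\) lies well outside that range and gives a price of only a few pounds, which illustrates why extrapolation is unreliable.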
Edexcel S1 2015 January Q3
9 marks Moderate -0.8
3. The table shows the price of a bottle of milk, \(m\) pence, and the price of a loaf of bread, \(b\) pence, for 8 different years.
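$$\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline m & 29 & 29 & 35 & 39 & 41 & 43 & 44 & 46 \\ \hline b & 75 & 83 & 91 & 121 & 120 & 126 & 119 & 126 \\ \hline \end{array}$$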
(You may use \(\mathrm { S } _ { b b } = 3083.875\) and \(\mathrm { S } _ { m m } = 305.5\) )
  1. Find the exact value of \(\sum b m\)
  2. Find \(\mathrm { S } _ { b m }\)
  3. Calculate the product moment correlation coefficient between \(b\) and \(m\)
  4. Interpret the value of the correlation coefficient. A ninth year is added to the data set. In this year the price of the bottle of milk is 46 pence and the price of a loaf of bread is 175 pence.
  5. Without further calculation, state whether the value of the product moment correlation coefficient will increase, decrease or stay the same when all nine years are used. Give a reason for your answer.
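The first three parts can be sketched in a few lines, assuming the eight \((m, b)\) pairs are read from the table as listed in the code and taking \(S_{bb}\) and \(S_{mm}\) as quoted. This is an illustrative check, not a model answer.

```python
import math

# Prices in pence, read from the table in the question
m = [29, 29, 35, 39, 41, 43, 44, 46]
b = [75, 83, 91, 121, 120, 126, 119, 126]
n = len(m)

sum_bm = sum(bi * mi for bi, mi in zip(b, m))
s_bm = sum_bm - sum(b) * sum(m) / n

# S_bb and S_mm are the values quoted in the question
s_bb, s_mm = 3083.875, 305.5
r = s_bm / math.sqrt(s_bb * s_mm)

print(sum_bm, s_bm, round(r, 3))  # 33856 922.75 0.951
```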
Edexcel S1 2019 January Q6
18 marks Moderate -0.3
6. Following some school examinations, Chetna is studying the results of the 16 students in her class. The mark for paper 1, \(x\), and the mark for paper 2, \(y\), for each student are summarised in the following statistics.
$$\bar { x } = 35.75 \quad \bar { y } = 25.75 \quad \sigma _ { x } = 7.79 \quad \sigma _ { y } = 11.91 \quad \sum x y = 15837$$
  1. Comment on the differences between the marks of the students on paper 1 and paper 2. Chetna decides to examine these data in more detail and plots the marks for each of the 16 students on the scatter diagram opposite.
    1. Explain why the circled point \(( 38,0 )\) is possibly an outlier.
    2. Suggest a possible reason for this result. Chetna decides to omit the data point \(( 38,0 )\) and examine the other 15 students' marks.
  2. Find the value of \(\bar { x }\) and the value of \(\bar { y }\) for these 15 students. For these 15 students
    1. explain why \(\sum x y\) is still 15837
    2. show that \(\mathrm { S } _ { x y } = 1169.8\). For these 15 students, Chetna calculates \(\mathrm { S } _ { x x } = 965.6\) and \(\mathrm { S } _ { y y } = 1561.7\) correct to 1 decimal place.
  3. Calculate the product moment correlation coefficient for these 15 students.
  4. Calculate the equation of the line of regression of \(y\) on \(x\) for these 15 students, giving your answer in the form \(y = a + b x\). The product moment correlation coefficient between \(x\) and \(y\) for all 16 students is 0.746.
  5. Explain how your calculation in part (e) supports Chetna's decision to omit the point \(( 38,0 )\) before calculating the equation of the linear regression line.
    (1)
  6. Estimate the mark in the second paper for a student who scored 38 marks in the first paper.
    \includegraphics[max width=\textwidth, alt={}]{d3f4450d-60eb-49b6-be1b-d2fcfad0451f-17_1127_1146_301_406}
    \includegraphics[max width=\textwidth, alt={}]{d3f4450d-60eb-49b6-be1b-d2fcfad0451f-20_2630_1828_121_121}
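For reference, a sketch of the arithmetic behind the 15-student values, using only figures quoted in the question. Removing the point \((38, 0)\) subtracts 38 from \(\sum x\), nothing from \(\sum y\), and \(38 \times 0 = 0\) from \(\sum xy\):

$$\sum x = 16 \times 35.75 - 38 = 534 , \quad \sum y = 16 \times 25.75 - 0 = 412$$
$$\bar{x} = \frac{534}{15} = 35.6 , \quad \bar{y} = \frac{412}{15} \approx 27.47$$
$$S_{xy} = 15837 - \frac{534 \times 412}{15} = 15837 - 14667.2 = 1169.8$$
$$r = \frac{1169.8}{\sqrt{965.6 \times 1561.7}} \approx 0.953$$

This is noticeably closer to 1 than the value 0.746 quoted for all 16 students.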
Edexcel S1 2015 June Q7
6 marks Easy -1.8
7. A doctor is investigating the correlation between blood protein, \(p\), and body mass index, \(b\). He takes a random sample of 8 patients and the data are shown in the table below.
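$$\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \text{Patient} & A & B & C & D & E & F & G & H \\ \hline b & 32 & 36 & 40 & 44 & 42 & 21 & 27 & 37 \\ \hline p & 18 & 21 & 31 & 39 & 21 & 12 & 19 & 70 \\ \hline \end{array}$$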
  1. Draw a scatter diagram of these data on the axes provided. \includegraphics[max width=\textwidth, alt={}, center]{36cf6341-1957-45b9-9f7d-0914506f5919-13_938_673_785_614} The doctor decides to leave out patient \(H\) from his calculations.
  2. Give a reason for the doctor's decision. For the 7 patients \(A , B , C , D , E , F\) and \(G\), $$S _ { b p } = 369 , \quad S _ { p p } = 490 \text { and } S _ { b b } = 423 \frac { 5 } { 7 }$$
  3. Find the product moment correlation coefficient, \(r\), for these 7 patients.
  4. Without any further calculations, state how \(r\) would differ from your answer in part (c) if it was calculated for all 8 patients. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{36cf6341-1957-45b9-9f7d-0914506f5919-15_1322_1593_207_173} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure} The histogram in Figure 1 summarises the times, in minutes, that 200 people spent shopping in a supermarket.
    1. Give a reason to justify the use of a histogram to represent these data. Given that 40 people spent between 11 and 21 minutes shopping in the supermarket, estimate
    2. the number of people that spent between 18 and 25 minutes shopping in the supermarket,
    3. the median time spent shopping in the supermarket by these 200 people. The mid-point of each bar is represented by \(x\) and the corresponding frequency by f .
    4. Show that \(\sum \mathrm { f } x = 6390\) Given that \(\sum \mathrm { f } x ^ { 2 } = 238430\)
    5. for the data shown in the histogram, calculate estimates of
      1. the mean,
      2. the standard deviation. A coefficient of skewness is given by \(\frac { 3 ( \text { mean } - \text { median } ) } { \text { standard deviation } }\)
    6. Calculate this coefficient of skewness for these data. The manager of the supermarket decides to model these data with a normal distribution.
    7. Comment on the manager's decision. Give a justification for your answer.
Edexcel AS Paper 2 2020 June Q2
5 marks Moderate -0.8
2. Jerry is studying visibility for Camborne using the large data set June 1987.
The table below contains two extracts from the large data set.
It shows the daily maximum relative humidity and the daily mean visibility.
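$$\begin{array}{|c|c|c|} \hline \text{Date} & \text{Daily Maximum Relative Humidity} & \text{Daily Mean Visibility} \\ \hline \text{Units} & \% & \\ \hline 10/06/1987 & 90 & 5300 \\ \hline 28/06/1987 & 100 & 0 \\ \hline \end{array}$$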
(The units for Daily Mean Visibility are deliberately omitted.)
Given that daily mean visibility is given to the nearest 100,
  1. write down the range of distances in metres that corresponds to the recorded value 0 for the daily mean visibility. Jerry drew the following scatter diagram, Figure 2, and calculated some statistics using the June 1987 data for Camborne from the large data set. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{d62e5a00-cd23-417f-b244-8b3e24da4aa2-04_823_1764_1281_137} \captionsetup{labelformat=empty} \caption{Figure 2}
    \end{figure} Jerry defines an outlier as a value that is more than 1.5 times the interquartile range above \(Q _ { 3 }\) or more than 1.5 times the interquartile range below \(Q _ { 1 }\).
  2. Show that the point circled on the scatter diagram is an outlier for visibility.
  3. Interpret the correlation between the daily mean visibility and the daily maximum relative humidity. Jerry drew the following scatter diagram, Figure 3, using the June 1987 data for Camborne from the large data set, but forgot to label the \(x\)-axis.
    \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{d62e5a00-cd23-417f-b244-8b3e24da4aa2-05_730_1056_342_386} \captionsetup{labelformat=empty} \caption{Figure 3}
    \end{figure}
  4. Using your knowledge of the large data set, suggest which variable the \(x\)-axis on this scatter diagram represents.
Edexcel AS Paper 2 2023 June Q2
4 marks Moderate -0.8
2. Fred and Nadine are investigating whether there is a linear relationship between Daily Mean Pressure, \(p \mathrm { hPa }\), and Daily Mean Air Temperature, \(t ^ { \circ } \mathrm { C }\), in Beijing using the 2015 data from the large data set.
Fred randomly selects one month from the data set and draws the scatter diagram in Figure 1 using the data from that month. The scale has been left off the horizontal axis. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{854568d2-b32d-44de-8a9c-26372e509c20-04_794_1539_589_264} \captionsetup{labelformat=empty} \caption{Figure 1}
\end{figure}
  1. Describe the correlation shown in Figure 1. Nadine chooses to use all of the data for Beijing from 2015 and draws the scatter diagram in Figure 2. She uses the same scales as Fred. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{854568d2-b32d-44de-8a9c-26372e509c20-04_777_1509_1841_278} \captionsetup{labelformat=empty} \caption{Figure 2}
    \end{figure}
  2. Explain, in context, what Nadine can infer about the relationship between \(p\) and \(t\) using the information shown in Figure 2.
  3. Using your knowledge of the large data set, state a value of \(p\) for which interpolation can be used with Figure 2 to predict a value of \(t\).
  4. Using your knowledge of the large data set, explain why it is not meaningful to look for a linear relationship between Daily Mean Wind Speed (Beaufort Conversion) and Daily Mean Air Temperature in Beijing in 2015.
Edexcel Paper 3 2022 June Q6
9 marks Standard +0.3
6. Anna is investigating the relationship between exercise and resting heart rate. She takes a random sample of 19 people in her year at school and records for each person
  • their resting heart rate, \(h\) beats per minute
  • the number of minutes, \(m\), spent exercising each week
Her results are shown on the scatter diagram. \includegraphics[max width=\textwidth, alt={}, center]{3a09f809-fa28-4b3d-bb69-ea074433bd8f-16_531_551_653_740}
  1. Interpret the nature of the relationship between \(h\) and \(m\). Anna codes the data using the formulae $$\begin{aligned} & x = \log _ { 10 } m \\ & y = \log _ { 10 } h \end{aligned}$$ The product moment correlation coefficient between \(x\) and \(y\) is \(-0.897\).
  2. Test whether or not there is significant evidence of a negative correlation between \(x\) and \(y\) You should
    The equation of the line of best fit of \(y\) on \(x\) is $$y = - 0.05 x + 1.92$$
  3. Use the equation of the line of best fit of \(y\) on \(x\) to find a model for \(h\) on \(m\) in the form $$h = a m ^ { k }$$ where \(a\) and \(k\) are constants to be found.
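A sketch of how the coded line converts back to the original variables, assuming only the coding \(x = \log_{10} m\), \(y = \log_{10} h\) and the line of best fit given in the question:

$$\log_{10} h = -0.05 \log_{10} m + 1.92$$
$$h = 10^{1.92} \times 10^{-0.05 \log_{10} m} = 10^{1.92} \, m^{-0.05}$$
$$h \approx 83.2 \, m^{-0.05}$$

so in the form \(h = a m^{k}\) this gives \(a \approx 83.2\) and \(k = -0.05\).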
Edexcel Paper 3 2024 June Q2
6 marks Moderate -0.3
2. Amar is studying the flight of a bird from its nest.
He measures the bird's height above the ground, \(h\) metres, at time \(t\) seconds for 10 values of \(t\). Amar finds the equation of the regression line for the data to be \(h = 38.6 - 1.28 t\).
  1. Interpret the gradient of this line. The product moment correlation coefficient between \(h\) and \(t\) is \(-0.510\).
  2. Test whether or not there is evidence of a negative correlation between the height above the ground and the time during the flight.
    You should
    • state your hypotheses clearly
    • use a \(5 \%\) level of significance
    • state the critical value used
    Jane draws the following scatter diagram for Amar's data. \includegraphics[max width=\textwidth, alt={}, center]{ab7f7951-e6fe-4853-bb69-8016cf3e796c-06_1024_1033_1135_516}
  3. With reference to the scatter diagram, state, giving a reason, whether or not the regression line \(h = 38.6 - 1.28 t\) is an appropriate model for these data. Jane suggests an improved model using the variable \(u = ( t - k ) ^ { 2 }\) where \(k\) is a constant.
    She obtains the equation \(h = 38.1 - 0.78 u\)
  4. Choose a suitable value for \(k\) to write Jane's improved model for \(h\) in terms of \(t\) only.
Edexcel Paper 3 2020 October Q2
7 marks Moderate -0.8
2. A random sample of 15 days is taken from the large data set for Perth in June and July 1987. The scatter diagram in Figure 1 displays the values of two of the variables for these 15 days.
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{2b63aa7f-bc50-4422-8dc0-e661b521c221-04_722_709_376_677} \captionsetup{labelformat=empty} \caption{Figure 1}
\end{figure}
  1. Describe the correlation. The variable on the \(x\)-axis is Daily Mean Temperature measured in \({ } ^ { \circ } \mathrm { C }\).
  2. Using your knowledge of the large data set,
    1. suggest which variable is on the \(y\)-axis,
    2. state the units that are used in the large data set for this variable. Stav believes that there is a correlation between Daily Total Sunshine and Daily Maximum Relative Humidity at Heathrow. He calculates the product moment correlation coefficient between these two variables for a random sample of 30 days and obtains \(r = - 0.377\)
  3. Carry out a suitable test to investigate Stav's belief at a \(5 \%\) level of significance. State clearly
    • your hypotheses
    • your critical value
    On a random day at Heathrow the Daily Maximum Relative Humidity was 97\%
  4. Comment on the number of hours of sunshine you would expect on that day, giving a reason for your answer.
Edexcel Paper 3 2021 October Q2
6 marks Standard +0.3
2. Marc took a random sample of 16 students from a school and for each student recorded
  • the number of letters, \(x\), in their last name
  • the number of letters, \(y\), in their first name
His results are shown in the scatter diagram on the next page.
  1. Describe the correlation between \(x\) and \(y\). Marc suggests that parents with long last names tend to give their children shorter first names.
  2. Using the scatter diagram comment on Marc's suggestion, giving a reason for your answer. The results from Marc's random sample of 16 observations are given in the table below.
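    $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline x & 3 & 6 & 8 & 7 & 5 & 3 & 11 & 3 & 4 & 5 & 4 & 9 & 7 & 10 & 6 & 6 \\ \hline y & 7 & 7 & 4 & 4 & 6 & 8 & 5 & 5 & 8 & 4 & 7 & 4 & 5 & 5 & 6 & 3 \\ \hline \end{array}$$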
  3. Use your calculator to find the product moment correlation coefficient between \(x\) and \(y\) for these data.
  4. Test whether or not there is evidence of a negative correlation between the number of letters in the last name and the number of letters in the first name. You should
    • state your hypotheses clearly
    • use a \(5 \%\) level of significance
    \includegraphics[max width=\textwidth, alt={}]{10736735-3050-43eb-9e76-011ca6fa48b8-05_1125_1337_294_372}
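The question expects a calculator, but the same coefficient can be checked by hand-style arithmetic. The sketch below assumes the 16 pairs are read from the table as 3, 6, 8, 7, 5, 3, 11, 3, 4, 5, 4, 9, 7, 10, 6, 6 for \(x\) and 7, 7, 4, 4, 6, 8, 5, 5, 8, 4, 7, 4, 5, 5, 6, 3 for \(y\); that reading is an interpretation of the table, not something stated elsewhere in the question.

```python
import math

# Values read from the table in the question (16 students)
x = [3, 6, 8, 7, 5, 3, 11, 3, 4, 5, 4, 9, 7, 10, 6, 6]  # letters in last name
y = [7, 7, 4, 4, 6, 8, 5, 5, 8, 4, 7, 4, 5, 5, 6, 3]    # letters in first name
n = len(x)

s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 4))  # about -0.5446
```

For the test in part 4, this value would be compared with the tabulated one-tailed 5% critical value for a sample of size 16.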
OCR PURE Q8
4 marks Easy -1.8
8 A random sample of 10 students from a college was chosen. They were asked how much time, \(x\) hours, they spent studying, and how much money, \(\pounds y\), they earned, in a typical week during term time. The results are shown in the scatter diagram. \includegraphics[max width=\textwidth, alt={}, center]{c4cc2cd8-46bf-448f-b223-92378984bfde-5_544_741_555_242}
  1. Comment on the relationship shown by the diagram between hours spent studying and money earned, during term time, by these 10 students. The coordinates of the points in the diagram are \(( 18,23 ) , ( 20,21 ) , ( 23,20 ) , ( 25,19 ) , ( 25,21 )\), \(( 27,18 ) , ( 32,16 ) , ( 38,17 ) , ( 40,16 )\) and \(( 41,23 )\).
  2. Find the mean and standard deviation of the number of hours spent per week studying during term time by these 10 students.
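A short sketch of part 2 using Python's standard `statistics` module. It assumes the population form of the standard deviation (divisor \(n\)); if the sample form (divisor \(n - 1\)) is intended, `statistics.stdev` would be used instead.

```python
import statistics

# x-coordinates (hours spent studying) of the ten points listed in the question
hours = [18, 20, 23, 25, 25, 27, 32, 38, 40, 41]

mean_hours = statistics.mean(hours)
sd_hours = statistics.pstdev(hours)  # divisor n; statistics.stdev uses n - 1

print(mean_hours, round(sd_hours, 2))  # 28.9 7.93
```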
OCR MEI AS Paper 2 2024 June Q5
3 marks Easy -1.8
5 The pre-release material contains information for countries in the world concerning real GDP per capita in US\$ and mobile phone subscribers per 100 population. In an investigation into the relationship between these two variables, a student takes a sample of 20 countries in Africa. The student draws a scatter diagram for the data, which is shown in Fig. 5.1. \section*{Fig. 5.1} \section*{Africa 1st sample} \includegraphics[max width=\textwidth, alt={}, center]{ce94c1ea-ffe5-42d0-8f8a-43c47105d6bf-4_433_1043_842_244}
  1. What does Fig. 5.1 suggest about the relationship between real GDP per capita and the number of mobile phone subscribers per 100 population? Another student collects a different sample of 20 countries from Africa, and draws a scatter diagram for the data, which is shown in Fig. 5.2. \section*{Fig. 5.2} \section*{Africa 2nd sample}
    \includegraphics[max width=\textwidth, alt={}]{ce94c1ea-ffe5-42d0-8f8a-43c47105d6bf-4_273_1084_1818_244}
    Mobile phone subscribers per 100 population
  2. What does Fig. 5.2 suggest about the relationship between real GDP per capita and the number of mobile phone subscribers per 100 population?
  3. Explain whether either of the two scatter diagrams is likely to be representative of the true relationship between real GDP per capita and the number of mobile phone subscribers per 100 population, for countries in Africa.
OCR MEI AS Paper 2 2020 November Q10
9 marks Moderate -0.8
10 Fig. 10.1 shows a sample collected from the large data set. BMI is defined as \(\frac { \text { mass of person in kilograms } } { \text { square of person's height in metres } }\). \begin{table}[h]
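\begin{tabular}{|l|c|c|c|c|}
\hline
Sex & Age in years & Mass in kg & Height in cm & BMI \\
\hline
Male & 38 & 77.6 & 164.8 & 28.57 \\
Male & 17 & 63.5 & 170.3 & 21.89 \\
Male & 18 & 68.0 & 172.3 & 22.91 \\
Male & 18 & 57.2 & 172.2 & 19.29 \\
Male & 19 & 77.6 & 191.2 & 21.23 \\
Male & 24 & 72.7 & 177.0 & 23.21 \\
Male & 25 & 92.5 & 177.9 & 29.23 \\
Male & 26 & 70.4 & 159.4 & 27.71 \\
Male & 31 & 77.5 & 174.0 & 25.60 \\
Male & 34 & 132.4 & 182.2 & 39.88 \\
Male & 38 & 115.0 & 186.4 & 33.10 \\
Male & 40 & 112.1 & 171.7 & 38.02 \\
\hline
\end{tabular}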
\captionsetup{labelformat=empty} \caption{Fig. 10.1}
\end{table}
  1. Calculate the mass in kg of a person with a BMI of 23.56 and a height of 181.6 cm , giving your answer correct to 1 decimal place. Fig. 10.2 shows a scatter diagram of BMI against age for the data in the table. A line of best fit has also been drawn. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{c08a2212-3104-425e-8aee-7f2d46f23924-09_682_1212_351_248} \captionsetup{labelformat=empty} \caption{Fig. 10.2}
    \end{figure}
  2. Describe the correlation between age and BMI.
  3. Use the line of best fit to estimate the BMI of a 30-year-old man.
  4. Explain why it would not be sensible to use the line of best fit to estimate the BMI of a 60-year-old man.
  5. Use your knowledge of the large data set to suggest two reasons why the sample data in the table may not be representative of the population.
  6. Once the data in the large data set had been cleaned there were 196 values available for selection. Describe how a sample of size 12 could be generated using systematic sampling so that each of the 196 values could be selected in the sample.
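Two parts of this question are mechanical enough to sketch in code. The first rearranges the BMI definition to recover mass; the second shows one possible systematic sampling scheme, assuming an interval of \(196 \div 12\) rounded down to 16, a random start anywhere in the list, and wrapping round past item 196. Both helper names are hypothetical, and other valid schemes exist.

```python
import random

def mass_from_bmi(bmi, height_cm):
    """Rearranged BMI definition: mass = BMI x (height in metres)^2."""
    height_m = height_cm / 100
    return bmi * height_m ** 2

def systematic_sample(population_size=196, sample_size=12, seed=None):
    """Hypothetical helper: every 16th item from a random start, wrapping past the end."""
    rng = random.Random(seed)
    interval = population_size // sample_size  # 196 // 12 = 16
    start = rng.randint(1, population_size)    # any of the 196 positions can start
    return [(start - 1 + i * interval) % population_size + 1
            for i in range(sample_size)]

mass_kg = mass_from_bmi(23.56, 181.6)
sample = systematic_sample(seed=1)
print(round(mass_kg, 1))  # 77.7
print(sorted(sample))
```

Because the start can be any position from 1 to 196, every value has a chance of selection, and twelve steps of 16 span less than 196 so no item is picked twice.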
OCR MEI Paper 2 2024 June Q11
5 marks Moderate -0.8
11 A householder is investigating whether there is any relationship between his monthly cost of gas and his monthly cost of electricity, both measured in pounds ( \(\pounds\) ). The householder collects a random sample of monthly costs and presents them in the scatter diagram below. \includegraphics[max width=\textwidth, alt={}, center]{8e48bbd3-2166-49e7-8906-833261f331ca-08_604_1452_392_244} One of the points on the diagram represents the energy costs in a month when the householder was away on holiday for three weeks. The other points represent the energy costs in months when the householder did not go away on holiday.
  1. On the copy of the diagram in the Printed Answer Booklet, circle the point which represents the month when the householder was most likely to have been away on holiday for three weeks.
  2. With reference to the diagram, describe the relationship between the cost of gas and the cost of electricity. The householder decides to test whether there is evidence to suggest that there is any association between the monthly cost of gas and the monthly cost of electricity. The value of Spearman's rank correlation coefficient for this sample is 0.4359 and the associated \(p\)-value is 0.09195 .
  3. Determine whether there is any evidence to suggest, at the \(5 \%\) level, that there is any association between the monthly cost of gas and the monthly cost of electricity.
Edexcel S1 2005 January Q3
15 marks Easy -1.3
3. The following table shows the height \(x\), to the nearest cm , and the weight \(y\), to the nearest kg , of a random sample of 12 students.
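$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline x & 148 & 164 & 156 & 172 & 147 & 184 & 162 & 155 & 182 & 165 & 175 & 152 \\ \hline y & 39 & 59 & 56 & 77 & 44 & 77 & 65 & 49 & 80 & 72 & 70 & 52 \\ \hline \end{array}$$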
  1. On graph paper, draw a scatter diagram to represent these data.
  2. Write down, with a reason, whether the correlation coefficient between \(x\) and \(y\) is positive or negative. The data in the table can be summarised as follows. $$\Sigma x = 1962 , \quad \Sigma y = 740 , \quad \Sigma y ^ { 2 } = 47746 , \quad \Sigma x y = 122783 , \quad S _ { x x } = 1745 .$$
  3. Find \(S _ { x y }\). The equation of the regression line of \(y\) on \(x\) is \(y = - 106.331 + b x\).
  4. Find, to 3 decimal places, the value of \(b\).
  5. Find, to 3 significant figures, the mean \(\bar { y }\) and the standard deviation \(s\) of the weights of this sample of students.
  6. Find the values of \(\bar { y } \pm 1.96 s\).
  7. Comment on whether or not you think that the weights of these students could be modelled by a normal distribution.
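Parts 3 and 4 follow directly from the quoted sums. The sketch below also recomputes the intercept as a consistency check against the value \(-106.331\) given in the question; variable names are illustrative only.

```python
n = 12
sum_x, sum_y, sum_xy, s_xx = 1962, 740, 122783, 1745

s_xy = sum_xy - sum_x * sum_y / n  # 122783 - 120990
b = s_xy / s_xx                    # gradient of the regression line of y on x
a = sum_y / n - b * sum_x / n      # intercept, for comparison with -106.331

print(s_xy, round(b, 3), round(a, 3))  # 1793.0 1.028 -106.331
```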
Edexcel S1 2012 January Q5
15 marks Moderate -0.8
5. The age, \(t\) years, and weight, \(w\) grams, of each of 10 coins were recorded. These data are summarised below.
$$\sum t ^ { 2 } = 2688 \quad \sum t w = 1760.62 \quad \sum t = 158 \quad \sum w = 111.75 \quad S _ { w w } = 0.16$$
  1. Find \(S _ { t t }\) and \(S _ { t w }\) for these data.
  2. Calculate, to 3 significant figures, the product moment correlation coefficient between \(t\) and \(w\).
  3. Find the equation of the regression line of \(w\) on \(t\) in the form \(w = a + b t\)
  4. State, with a reason, which variable is the explanatory variable.
  5. Using this model, estimate
    1. the weight of a coin which is 5 years old,
    2. the effect of an increase of 4 years in age on the weight of a coin. It was discovered that a coin in the original sample, which was 5 years old and weighed 20 grams, was a fake.
  6. State, without any further calculations, whether the exclusion of this coin would increase or decrease the value of the product moment correlation coefficient. Give a reason for your answer.
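A sketch of the arithmetic for the first two parts, using only the summary values quoted in the question with \(n = 10\):

$$S_{tt} = 2688 - \frac{158^{2}}{10} = 2688 - 2496.4 = 191.6$$
$$S_{tw} = 1760.62 - \frac{158 \times 111.75}{10} = 1760.62 - 1765.65 = -5.03$$
$$r = \frac{-5.03}{\sqrt{191.6 \times 0.16}} \approx -0.908$$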
Edexcel S1 2013 January Q1
7 marks Easy -1.2
  1. A teacher asked a random sample of 10 students to record the number of hours of television, \(t\), they watched in the week before their mock exam. She then calculated their grade, \(g\), in their mock exam. The results are summarised as follows.
$$\sum t = 258 \quad \sum t ^ { 2 } = 8702 \quad \sum g = 63.6 \quad \mathrm {~S} _ { g g } = 7.864 \quad \sum g t = 1550.2$$
  1. Find \(\mathrm { S } _ { t t }\) and \(\mathrm { S } _ { g t }\)
  2. Calculate, to 3 significant figures, the product moment correlation coefficient between \(t\) and \(g\). The teacher also recorded the number of hours of revision, \(v\), these 10 students completed during the week before their mock exam. The correlation coefficient between \(t\) and \(v\) was -0.753
  3. Describe, giving a reason, the nature of the correlation you would expect to find between \(v\) and \(g\).
Edexcel S1 2013 January Q3
10 marks Moderate -0.8
3. A biologist is comparing the intervals ( \(m\) seconds) between the mating calls of a certain species of tree frog and the surrounding temperature ( \(t { } ^ { \circ } \mathrm { C }\) ). The following results were obtained.
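$$\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline t \; { } ^ { \circ } \mathrm { C } & 8 & 13 & 14 & 15 & 15 & 20 & 25 & 30 \\ \hline m \text{ secs} & 6.5 & 4.5 & 6 & 5 & 4 & 3 & 2 & 1 \\ \hline \end{array}$$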
$$\text { (You may use } \sum t m = 469.5 , \quad \mathrm {~S} _ { t t } = 354 , \quad \mathrm {~S} _ { m m } = 25.5 \text { ) }$$
  1. Show that \(\mathrm { S } _ { t m } = - 90.5\)
  2. Find the equation of the regression line of \(m\) on \(t\) giving your answer in the form \(m = a + b t\).
  3. Use your regression line to estimate the time interval between mating calls when the surrounding temperature is \(10 ^ { \circ } \mathrm { C }\).
  4. Comment on the reliability of this estimate, giving a reason for your answer.
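The "show that" in part 1 and the regression in parts 2 and 3 can be checked numerically. This sketch assumes the eight \((t, m)\) pairs are read from the table as listed in the code and takes \(S_{tt} = 354\) as quoted.

```python
t = [8, 13, 14, 15, 15, 20, 25, 30]  # temperature, degrees C
m = [6.5, 4.5, 6, 5, 4, 3, 2, 1]     # interval between calls, seconds
n = len(t)

sum_tm = sum(ti * mi for ti, mi in zip(t, m))  # quoted in the question as 469.5
s_tm = sum_tm - sum(t) * sum(m) / n
s_tt = 354                                     # quoted in the question

b = s_tm / s_tt                  # gradient of the regression line of m on t
a = sum(m) / n - b * sum(t) / n  # intercept
estimate_at_10 = a + b * 10

print(s_tm, round(a, 2), round(b, 3), round(estimate_at_10, 1))  # -90.5 8.47 -0.256 5.9
```

Since \(10\,^{\circ}\mathrm{C}\) lies inside the recorded range of 8 to 30, the estimate is an interpolation.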
Edexcel S1 2001 June Q2
5 marks Easy -1.2
2. On a particular day in summer 1993 at 0800 hours the height above sea level, \(x\) metres, and the temperature, \(y ^ { \circ } \mathrm { C }\), were recorded in 10 Mediterranean towns. The following summary statistics were calculated from the results. $$\Sigma x = 7300 , \Sigma x ^ { 2 } = 6599600 , S _ { x y } = - 13060 , S _ { y y } = 140.9 .$$
  1. Find \(S _ { x x }\).
  2. Calculate, to 3 significant figures, the product moment correlation coefficient between \(x\) and \(y\).
  3. Give an interpretation of your coefficient.
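A sketch of the arithmetic for parts 1 and 2, using the quoted summary statistics with \(n = 10\):

$$S_{xx} = 6599600 - \frac{7300^{2}}{10} = 6599600 - 5329000 = 1270600$$
$$r = \frac{-13060}{\sqrt{1270600 \times 140.9}} \approx -0.976$$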
Edexcel S1 2002 June Q7
16 marks Moderate -0.8
7. An ice cream seller believes that there is a relationship between the temperature on a summer day and the number of ice creams sold. Over a period of 10 days he records the temperature at 1 p.m., \(t ^ { \circ } \mathrm { C }\), and the number of ice creams sold, \(c\), in the next hour. The data he collects is summarised in the table below.
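$$\begin{array}{|c|c|} \hline t & c \\ \hline 13 & 24 \\ \hline 22 & 55 \\ \hline 17 & 35 \\ \hline 20 & 45 \\ \hline 10 & 20 \\ \hline 15 & 30 \\ \hline 19 & 39 \\ \hline 12 & 19 \\ \hline 18 & 36 \\ \hline 23 & 54 \\ \hline \end{array}$$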
[Use \(\left. \Sigma t ^ { 2 } = 3025 , \Sigma c ^ { 2 } = 14245 , \Sigma c t = 6526 .\right]\)
  1. Calculate the value of the product moment correlation coefficient between \(t\) and \(c\).
  2. State whether or not your value supports the use of a regression equation to predict the number of ice creams sold. Give a reason for your answer.
  3. Find the equation of the least squares regression line of \(c\) on \(t\) in the form \(c = a + b t\).
  4. Interpret the value of \(b\).
  5. Estimate the number of ice creams sold between 1 p.m. and 2 p.m. when the temperature at 1 p.m. is \(16 ^ { \circ } \mathrm { C }\).
    (3)
  6. At 1 p.m. on a particular day, the highest temperature for 50 years was recorded. Give a reason why you should not use the regression equation to predict ice cream sales on that day.
    (1)
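A sketch of the main calculations, assuming the ten \((t, c)\) pairs are read from the table as \((13, 24), (22, 55), (17, 35), (20, 45), (10, 20), (15, 30), (19, 39), (12, 19), (18, 36), (23, 54)\), which gives \(\Sigma t = 169\) and \(\Sigma c = 357\):

$$S_{tt} = 3025 - \frac{169^{2}}{10} = 168.9 , \quad S_{cc} = 14245 - \frac{357^{2}}{10} = 1500.1 , \quad S_{ct} = 6526 - \frac{169 \times 357}{10} = 492.7$$
$$r = \frac{492.7}{\sqrt{168.9 \times 1500.1}} \approx 0.979$$
$$b = \frac{492.7}{168.9} \approx 2.917 , \quad a = 35.7 - b \times 16.9 \approx -13.60$$
$$c \approx -13.60 + 2.917 \times 16 \approx 33 \quad \text{when } t = 16$$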
Edexcel S1 2005 June Q1
6 marks Easy -1.2
  1. The scatter diagrams below were drawn by a student.
[Three scatter diagrams, not reproduced here.] The student calculated the value of the product moment correlation coefficient for each of the sets of data. The values were $$\begin{array} { l l l } 0.68 & - 0.79 & 0.08 \end{array}$$ Write down, with a reason, which value corresponds to which scatter diagram.
(6)
AQA S1 2006 January Q5
11 marks Easy -1.2
5 [Figure 1, printed on the insert, is provided for use in this question.]
The table shows the times, in seconds, taken by a random sample of 10 boys from a junior swimming club to swim 50 metres freestyle and 50 metres backstroke.
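$$\begin{array}{|l|c|c|c|c|c|c|c|c|c|c|} \hline \text{Boy} & A & B & C & D & E & F & G & H & I & J \\ \hline \text{Freestyle (} x \text{ seconds)} & 30.2 & 32.8 & 25.1 & 31.8 & 31.2 & 35.6 & 32.4 & 38.0 & 36.1 & 34.1 \\ \hline \text{Backstroke (} y \text{ seconds)} & 33.5 & 35.4 & 37.4 & 27.2 & 34.7 & 38.2 & 37.7 & 41.4 & 42.3 & 38.4 \\ \hline \end{array}$$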
  1. On Figure 1, complete the scatter diagram for these data.
  2. Hence:
    1. give two distinct comments on what your scatter diagram reveals;
    2. state, without calculation, which of the following 3 values is most likely to be the value of the product moment correlation coefficient for the data in your scatter diagram. $$0.912 \quad 0.088 \quad 0.462$$
  3. In the sample of 10 boys, one boy is a junior-champion freestyle swimmer and one boy is a junior-champion backstroke swimmer. Identify the two most likely boys.
  4. Removing the data for the two boys whom you identified in part (c):
    1. calculate the value of the product moment correlation coefficient for the remaining 8 pairs of values of \(x\) and \(y\);
    2. comment, in context, on the value that you obtain.
Edexcel S1 Q6
17 marks Moderate -0.8
6. A school introduced a new programme of support lessons in 1994 with a view to improving grades in GCSE English. The table below shows the number of years since 1994, n, and the corresponding percentage of students achieving A to C grades in GCSE English, \(p\), for each year.
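$$\begin{array}{|c|c|c|c|c|c|c|} \hline n & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline p \; ( \% ) & 35.2 & 37.1 & 40.6 & 39.0 & 43.4 & 44.8 \\ \hline \end{array}$$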
  1. Represent these data on a scatter diagram. You may use the following values. $$\Sigma n = 21 , \quad \Sigma p = 240.1 , \quad \Sigma n ^ { 2 } = 91 , \quad \Sigma p ^ { 2 } = 9675.41 , \quad \Sigma n p = 873 .$$
  2. Find an equation of the regression line of \(p\) on \(n\) and draw it on your graph.
  3. Calculate the product moment correlation coefficient for these data and comment on the suitability of a linear model for the relationship between \(n\) and \(p\) during this period.