5.08a Pearson correlation: calculate pmcc

OCR S1 2007 January Q2
6 marks Moderate -0.8
2 The table contains data concerning five households selected at random from a certain town.
| Number of people in the household | 2 | 3 | 3 | 5 | 7 |
|---|---|---|---|---|---|
| Number of cars belonging to people in the household | 1 | 1 | 3 | 2 | 4 |
  1. Calculate the product moment correlation coefficient, \(r\), for the data in the table.
  2. Give a reason why it would not be sensible to use your answer to draw a conclusion about all the households in the town.
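A minimal Python sketch of part 1, assuming the table rows read 2, 3, 3, 5, 7 (people) and 1, 1, 3, 2, 4 (cars). The helper name `pmcc` is illustrative and not taken from the paper; it applies \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\).

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient from raw paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Corrected sums of squares and products.
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)

# Assumed reading of the table in this question.
people = [2, 3, 3, 5, 7]
cars = [1, 1, 3, 2, 4]

r = pmcc(people, cars)
print(round(r, 3))  # 0.767
```

With only five households the sample is very small, which is the kind of point part 2 is probing.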
OCR S1 2007 January Q5
8 marks Moderate -0.8
5 A chemical solution was gradually heated. At five-minute intervals the time, \(x\) minutes, and the temperature, \(y ^ { \circ } \mathrm { C }\), were noted.
| \(x\) | 0 | 5 | 10 | 15 | 20 | 25 | 30 | 35 |
|---|---|---|---|---|---|---|---|---|
| \(y\) | 0.8 | 3.0 | 6.8 | 10.9 | 15.6 | 19.6 | 23.4 | 26.7 |
$$\left[ n = 8 , \Sigma x = 140 , \Sigma y = 106.8 , \Sigma x ^ { 2 } = 3500 , \Sigma y ^ { 2 } = 2062.66 , \Sigma x y = 2685.0 . \right]$$
  1. Calculate the equation of the regression line of \(y\) on \(x\).
  2. Use your equation to estimate the temperature after 12 minutes.
  3. It is given that the value of the product moment correlation coefficient is close to \(+1\). Comment on the reliability of using your equation to estimate \(y\) when
    1. \(x = 17\),
    2. \(x = 57\).
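The regression calculation in this question can be sketched in Python from the bracketed summary statistics alone. The variable names are illustrative; the sketch uses \(b = S_{xy} / S_{xx}\) and \(a = \bar{y} - b\bar{x}\).

```python
# Summary statistics quoted in the question.
n = 8
sum_x, sum_y = 140, 106.8
sum_x2, sum_xy = 3500, 2685.0

s_xx = sum_x2 - sum_x ** 2 / n        # 1050
s_xy = sum_xy - sum_x * sum_y / n     # 816

b = s_xy / s_xx                       # gradient of y on x
a = sum_y / n - b * sum_x / n         # intercept, since the line passes through the means

estimate_12 = a + b * 12              # temperature estimate after 12 minutes
print(round(a, 2), round(b, 3), round(estimate_12, 2))
```

For part 3, note that \(x = 17\) lies inside the observed range of 0 to 35 minutes, whereas \(x = 57\) lies well outside it.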
OCR S1 2008 January Q3
6 marks Standard +0.3
3 A sample of bivariate data was taken and the results were summarised as follows. $$n = 5 \quad \Sigma x = 24 \quad \Sigma x ^ { 2 } = 130 \quad \Sigma y = 39 \quad \Sigma y ^ { 2 } = 361 \quad \Sigma x y = 212$$
  1. Show that the value of the product moment correlation coefficient \(r\) is 0.855 , correct to 3 significant figures.
  2. The ranks of the data were found. One student calculated Spearman's rank correlation coefficient \(r _ { s }\), and found that \(r _ { s } = 0.7\). Another student calculated the product moment coefficient, \(R\), of these ranks. State which one of the following statements is true, and explain your answer briefly.
    (A) \(R = 0.855\) (B) \(R = 0.7\) (C) It is impossible to give the value of \(R\) without carrying out a calculation using the original data.
  3. All the values of \(x\) are now multiplied by a scaling factor of 2 . State the new values of \(r\) and \(r _ { s }\).
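A hedged sketch of parts 1 and 3 for \(r\), working only from the summary statistics given. The function name `pmcc_from_sums` is hypothetical. Doubling every \(x\) doubles \(\Sigma x\) and \(\Sigma xy\) and quadruples \(\Sigma x^2\), which the second call reproduces.

```python
from math import sqrt

def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """r = Sxy / sqrt(Sxx * Syy), computed from summary statistics."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)

# Values quoted in the question.
r = pmcc_from_sums(5, 24, 39, 130, 361, 212)

# Every x multiplied by 2: sums involving x rescale accordingly.
r_scaled = pmcc_from_sums(5, 2 * 24, 39, 4 * 130, 361, 2 * 212)

print(round(r, 3), round(r_scaled, 3))  # 0.855 0.855
```

The unchanged value illustrates that \(r\) is unaffected by a positive linear scaling of either variable; ranks are also unchanged by such a scaling.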
OCR S1 2008 January Q9
11 marks Moderate -0.8
9 It is thought that the pH value of sand (a measure of the sand's acidity) may affect the extent to which a particular species of plant will grow in that sand. A botanist wished to determine whether there was any correlation between the pH value of the sand on certain sand dunes, and the amount of each of two plant species growing there. She chose random sections of equal area on each of eight sand dunes and measured the pH values. She then measured the area within each section that was covered by each of the two species. The results were as follows.
| Dune | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
|---|---|---|---|---|---|---|---|---|
| pH value, \(x\) | 8.5 | 8.5 | 9.5 | 8.5 | 6.5 | 7.5 | 8.5 | 9.0 |
| Area, \(y \mathrm {~cm} ^ { 2 }\), covered by species \(P\) | 150 | 150 | 575 | 330 | 45 | 15 | 340 | 330 |
| Area, \(y \mathrm {~cm} ^ { 2 }\), covered by species \(Q\) | 170 | 15 | 80 | 230 | 75 | 25 | 0 | 0 |
The results for species \(P\) can be summarised by $$n = 8 , \quad \Sigma x = 66.5 , \quad \Sigma x ^ { 2 } = 558.75 , \quad \Sigma y = 1935 , \quad \Sigma y ^ { 2 } = 711275 , \quad \Sigma x y = 17082.5 .$$
  1. Give a reason why it might be appropriate to calculate the equation of the regression line of \(y\) on \(x\) rather than \(x\) on \(y\) in this situation.
  2. Calculate the equation of the regression line of \(y\) on \(x\) for species \(P\), in the form \(y = a + b x\), giving the values of \(a\) and \(b\) correct to 3 significant figures.
  3. Estimate the value of \(y\) for species \(P\) on sand where the pH value is 7.0.
The values of the product moment correlation coefficient between \(x\) and \(y\) for species \(P\) and \(Q\) are \(r _ { P } = 0.828\) and \(r _ { Q } = 0.0302\).
  4. Describe the relationship between the area covered by species \(Q\) and the pH value.
  5. State, with a reason, whether the regression line of \(y\) on \(x\) for species \(P\) will provide a reliable estimate of the value of \(y\) when the pH value is
    1. 8,
    2. 4.
  6. Assume that the equation of the regression line of \(y\) on \(x\) for species \(Q\) is also known. State, with a reason, whether this line will provide a reliable estimate of the value of \(y\) when the pH value is 8.
OCR S1 2005 June Q4
9 marks Moderate -0.3
4 The table shows the latitude, \(x\) (in degrees correct to 3 significant figures), and the average rainfall \(y\) (in cm correct to 3 significant figures) of five European cities.
| City | \(x\) | \(y\) |
|---|---|---|
| Berlin | 52.5 | 58.2 |
| Bucharest | 44.4 | 58.7 |
| Moscow | 55.8 | 53.3 |
| St Petersburg | 60.0 | 47.8 |
| Warsaw | 52.3 | 56.6 |
$$\left[ n = 5 , \Sigma x = 265.0 , \Sigma y = 274.6 , \Sigma x ^ { 2 } = 14176.54 , \Sigma y ^ { 2 } = 15162.22 , \Sigma x y = 14464.10 . \right]$$
  1. Calculate the product moment correlation coefficient.
  2. The values of \(y\) in the table were in fact obtained from measurements in inches and converted into centimetres by multiplying by 2.54 . State what effect it would have had on the value of the product moment correlation coefficient if it had been calculated using inches instead of centimetres.
  3. It is required to estimate the annual rainfall at Bergen, where \(x = 60.4\). Calculate the equation of an appropriate line of regression, giving your answer in simplified form, and use it to find the required estimate.
OCR S1 2006 June Q1
6 marks Moderate -0.8
1 Some observations of bivariate data were made and the equations of the two regression lines were found to be as follows. $$\begin{array} { c c } y \text { on } x : & y = - 0.6 x + 13.0 \\ x \text { on } y : & x = - 1.6 y + 21.0 \end{array}$$
  1. State, with a reason, whether the correlation between \(x\) and \(y\) is negative or positive.
  2. Neither variable is controlled. Calculate an estimate of the value of \(x\) when \(y = 7.0\).
  3. Find the values of \(\bar { x }\) and \(\bar { y }\).
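A short Python sketch of how the two regression lines in this question determine the means and the size of the correlation. It relies on two standard facts: both lines pass through \((\bar{x}, \bar{y})\), and the product of the two gradients equals \(r^2\). The variable names are illustrative.

```python
from math import sqrt

# y on x: y = a + b x      x on y: x = c + d y
a, b = 13.0, -0.6
c, d = 21.0, -1.6

# Both lines pass through the point of means; substitute y = a + b x into x = c + d y.
x_bar = (c + d * a) / (1 - d * b)
y_bar = a + b * x_bar

# Neither variable is controlled, so x is estimated from the x-on-y line.
x_when_y_is_7 = c + d * 7.0

# Both gradients are negative, so r takes the negative square root.
r = -sqrt(b * d)

print(round(x_bar, 2), round(y_bar, 2), round(x_when_y_is_7, 1), round(r, 3))
```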
OCR S1 2007 June Q6
12 marks Moderate -0.3
6 A machine with artificial intelligence is designed to improve its efficiency rating with practice. The table shows the values of the efficiency rating, \(y\), after the machine has carried out its task various numbers of times, \(x\).
| \(x\) | 0 | 1 | 2 | 3 | 4 | 7 | 13 | 30 |
|---|---|---|---|---|---|---|---|---|
| \(y\) | 0 | 4 | 8 | 10 | 11 | 12 | 13 | 14 |
$$\left[ n = 8 , \Sigma x = 60 , \Sigma y = 72 , \Sigma x ^ { 2 } = 1148 , \Sigma y ^ { 2 } = 810 , \Sigma x y = 767 . \right]$$ These data are illustrated in the scatter diagram. \includegraphics[max width=\textwidth, alt={}, center]{dfad6626-75ca-4dbd-9c45-42f809c163f3-4_769_1328_760_411}
  1. (a) Calculate the value of \(r\), the product moment correlation coefficient.
    (b) Without calculation, state with a reason the value of \(r _ { s }\), Spearman's rank correlation coefficient.
  2. A researcher suggests that the data for \(x = 0\) and \(x = 1\) should be ignored. Without calculation, state with a reason what effect this would have on the value of
    (a) \(r\),
    (b) \(r _ { s }\).
  3. Use the diagram to estimate the value of \(y\) when \(x = 29\).
  4. Jack finds the equation of the regression line of \(y\) on \(x\) for all the data, and uses it to estimate the value of \(y\) when \(x = 29\). Without calculation, state with a reason whether this estimate or the one found in part (iii) will be the more reliable.
OCR S1 2016 June Q2
10 marks Moderate -0.3
2
  1. The table shows the amount, \(x\), in hundreds of pounds, spent on heating and the number of absences, \(y\), at a factory during each month in 2014.
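    | Amount, \(x\), spent on heating (£ hundreds) | 21 | 23 | 19 | 15 | 14 | 5 | 2 | 10 | 9 | 20 | 18 | 23 |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|
    | Number of absences, \(y\) | 23 | 25 | 18 | 18 | 12 | 10 | 4 | 9 | 11 | 15 | 20 | 26 |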
    \(n = 12 \quad \Sigma x = 179 \quad \Sigma x ^ { 2 } = 3215 \quad \Sigma y = 191 \quad \Sigma y ^ { 2 } = 3565 \quad \Sigma x y = 3343\)
    1. Calculate \(r\), the product moment correlation coefficient, showing that \(r > 0.92\).
    2. A manager says, 'The value of \(r\) shows that spending more money on heating causes more absences, so we should spend less on heating.' Comment on this claim.
  2. The months in 2014 were numbered \(1,2,3 , \ldots , 12\). The output, \(z\), in suitable units was recorded along with the month number, \(n\), for each month in 2014. The equation of the regression line of \(z\) on \(n\) was found to be \(z = 0.6 n + 17\).
    (a) Use this equation to explain whether output generally increased or decreased over these months.
    (b) Find the mean of \(n\) and use the equation of the regression line to calculate the mean of \(z\).
    (c) Hence calculate the total output in 2014.
OCR S1 Specimen Q8
13 marks Moderate -0.8
8 An experiment was conducted to see whether there was any relationship between the maximum tidal current, \(y \mathrm {~cm} \mathrm {~s} ^ { - 1 }\), and the tidal range, \(x\) metres, at a particular marine location. [The tidal range is the difference between the height of high tide and the height of low tide.] Readings were taken over a period of 12 days, and the results are shown in the following table.
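| \(x\) | 2.0 | 2.4 | 3.0 | 3.1 | 3.4 | 3.7 | 3.8 | 3.9 | 4.0 | 4.5 | 4.6 | 4.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 15.2 | 22.0 | 25.2 | 33.0 | 33.1 | 34.2 | 51.0 | 42.3 | 45.0 | 50.7 | 61.0 | 59.2 |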
$$\left[ \Sigma x = 43.3 , \Sigma y = 471.9 , \Sigma x ^ { 2 } = 164.69 , \Sigma y ^ { 2 } = 20915.75 , \Sigma x y = 1837.78 . \right]$$ The scatter diagram below illustrates the data. \includegraphics[max width=\textwidth, alt={}, center]{2fb25fc5-0445-44fa-a23e-647d14b1a376-4_462_793_1464_644}
  1. Calculate the product moment correlation coefficient for the data, and comment briefly on your answer with reference to the appearance of the scatter diagram.
  2. Calculate the equation of the regression line of maximum tidal current on tidal range.
  3. Estimate the maximum tidal current on a day when the tidal range is 4.2 m , and comment briefly on how reliable you consider your estimate is likely to be.
  4. It is suggested that the equation found in part (ii) could be used to predict the maximum tidal current on a day when the tidal range is 15 m . Comment briefly on the validity of this suggestion.
OCR MEI S2 2006 June Q3
18 marks Standard +0.3
3 A student is investigating the relationship between the length \(x \mathrm {~mm}\) and circumference \(y \mathrm {~mm}\) of plums from a large crop. The student measures the dimensions of a random sample of 10 plums from this crop. Summary statistics for these dimensions are as follows. $$\begin{aligned} & \sum x = 4715 \quad \sum y = 13175 \quad \sum x ^ { 2 } = 2237725 \\ & \sum y ^ { 2 } = 17455825 \quad \sum x y = 6235575 \quad n = 10 \end{aligned}$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to determine whether there is any correlation between length and circumference of plums from this crop. State your hypotheses clearly, defining any symbols which you use.
  3. (A) Explain the meaning of a 5\% significance level.
    (B) State one advantage and one disadvantage of using a \(1 \%\) significance level rather than a \(5 \%\) significance level in a hypothesis test.
The student decides to take another random sample of 10 plums. Using the same hypotheses as in part (ii), the correlation coefficient for this second sample is significant at the \(5 \%\) level. The student decides to ignore the first result and concludes that there is correlation between the length and circumference of plums in the crop.
  4. Comment on the student's decision to ignore the first result. Suggest a better way in which the student could proceed.
OCR MEI S2 2007 June Q2
19 marks Standard +0.3
2 A medical student is trying to estimate the birth weight of babies using pre-natal scan images. The actual weights, \(x \mathrm {~kg}\), and the estimated weights, \(y \mathrm {~kg}\), of ten randomly selected babies are given in the table below.
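| \(x\) | 2.61 | 2.73 | 2.87 | 2.96 | 3.05 | 3.14 | 3.17 | 3.24 | 3.76 | 4.10 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 3.2 | 2.6 | 3.5 | 3.1 | 2.8 | 2.7 | 3.4 | 3.3 | 4.4 | 4.1 |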
  1. Calculate the value of Spearman's rank correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) level to determine whether there is positive association between the student's estimates and the actual birth weights of babies in the underlying population.
  3. Calculate the value of the product moment correlation coefficient of the sample. You may use the following summary statistics in your calculations: $$\Sigma x = 31.63 , \quad \Sigma y = 33.1 , \quad \Sigma x ^ { 2 } = 101.92 , \quad \Sigma y ^ { 2 } = 112.61 , \quad \Sigma x y = 106.51 .$$
  4. Explain why, if the underlying population has a bivariate Normal distribution, it would be preferable to carry out a hypothesis test based on the product moment correlation coefficient. Comment briefly on the significance of the product moment correlation coefficient in relation to that of Spearman's rank correlation coefficient.
OCR MEI S2 2008 June Q1
18 marks Standard +0.3
1 A researcher believes that there is a negative correlation between money spent by the government on education and population growth in various countries. A random sample of 48 countries is selected to investigate this belief. The level of government spending on education \(x\), measured in suitable units, and the annual percentage population growth rate \(y\), are recorded for these countries. Summary statistics for these data are as follows. $$\Sigma x = 781.3 \quad \Sigma y = 57.8 \quad \Sigma x ^ { 2 } = 14055 \quad \Sigma y ^ { 2 } = 106.3 \quad \Sigma x y = 880.1 \quad n = 48$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate the researcher's belief. State your hypotheses clearly, defining any symbols which you use.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly how a scatter diagram may be used to check whether this assumption is likely to be valid.
  4. A student suggests that if the variables are negatively correlated then population growth rates can be reduced by increasing spending on education. Explain why the student may be wrong. Discuss an alternative explanation for the correlation.
  5. State briefly one advantage and one disadvantage of using a smaller sample size in this investigation.
Edexcel S1 2016 January Q3
15 marks Moderate -0.3
3. A publisher collects information about the amount spent on advertising, \(\pounds x\), and the sales, \(y\) books, for some of her publications. She collects information for a random sample of 8 textbooks and codes the data using \(v = \frac { x + 50 } { 200 }\) and \(s = \frac { y } { 1000 }\) to give
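| \(v\) | 0.60 | 8.10 | 4.30 | 0.40 | 1.60 | 6.40 | 2.50 | 5.10 |
|---|---|---|---|---|---|---|---|---|
| \(s\) | 1.84 | 6.73 | 5.95 | 1.30 | 2.45 | 7.46 | 4.82 | 6.25 |
[You may use: \(\sum v = 29 \quad \sum s = 36.8 \quad \sum s ^ { 2 } = 209.72 \quad \sum v s = 177.311 \quad \mathrm {~S} _ { v v } = 55.275\) ]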
  1. Find \(\mathrm { S } _ { v s }\) and \(\mathrm { S } _ { s s }\)
  2. Calculate the product moment correlation coefficient for these data.
The publisher believes that a linear regression model may be appropriate to describe these data.
  3. State, giving a reason, whether or not your answer to part (b) supports the publisher's belief.
  4. Find the equation of the regression line of \(s\) on \(v\), giving your answer in the form \(s = a + b v\)
  5. Hence find the equation of the regression line of \(y\) on \(x\) for the sample of textbooks, giving your answer in the form \(y = c + d x\)
The publisher calculated the regression line for a sample of novels and obtained the equation $$y = 3100 + 1.2 x$$ She wants to increase the sales of books by spending more money on advertising.
  6. State, giving your reasons, whether the publisher should spend more money on advertising textbooks or novels.
Edexcel S1 2017 January Q3
17 marks Moderate -0.3
  1. A scientist measured the salinity of water, \(x \mathrm {~g} / \mathrm { kg }\), and recorded the temperature at which the water froze, \(y ^ { \circ } \mathrm { C }\), for 12 different water samples. The summary statistics are listed below.
$$\begin{gathered} \sum x = 504 \quad \sum y = - 27 \quad \sum x ^ { 2 } = 22842 \quad \sum y ^ { 2 } = 62.98 \\ \sum x y = - 1190.7 \quad \mathrm {~S} _ { x x } = 1674 \quad \mathrm {~S} _ { y y } = 2.23 \end{gathered}$$
  1. Find the mean and variance of the recorded temperatures.
Priya believes that the higher the salinity of water, the higher the temperature at which the water freezes.
    1. Calculate the product moment correlation coefficient between \(x\) and \(y\)
    2. State, with a reason, whether or not this value supports Priya's belief.
  2. Find the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\) Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  3. Estimate the temperature at which water freezes when the salinity is \(32 \mathrm {~g} / \mathrm { kg }\)
The coding \(w = 1.8 y + 32\) is used to convert the recorded temperatures from \({ } ^ { \circ } \mathrm { C }\) to \({ } ^ { \circ } \mathrm { F }\)
  4. Find an equation of the least squares regression line of \(w\) on \(x\) in the form \(w = c + d x\)
  5. Find
    1. the variance of the recorded temperatures when converted to \({ } ^ { \circ } \mathrm { F }\)
    2. the product moment correlation coefficient between \(w\) and \(x\)
Edexcel S1 2018 January Q3
8 marks Moderate -0.8
3. Martin is investigating the relationship between a person's daily caffeine consumption, \(c\) milligrams, and the amount of sleep they get, \(h\) hours, per night. He collected this information from 20 people and the results are summarised below. $$\begin{array} { c c } \sum c = 3660 \quad \sum h = 126 \quad \sum c ^ { 2 } = 973228 \\ \sum c h = 20023.4 \quad S _ { c c } = 303448 \quad S _ { c h } = - 3034.6 \end{array}$$ Martin calculates the product moment correlation coefficient for these data and obtains \(-0.833\)
  1. Give a reason why this value supports a linear relationship between \(c\) and \(h\)
The amount of sleep per night is the response variable.
  2. Explain what you understand by the term 'response variable'.
Martin says that for each additional 100 mg of caffeine consumed, the expected number of hours of sleep decreases by 1
  3. Determine, by calculation, whether or not the data support this statement.
  4. Use the data to calculate an estimate for the expected number of hours of sleep per night when no caffeine is consumed.
Edexcel S1 2018 January Q5
12 marks Moderate -0.3
5. Franca is the manager of an accountancy firm. She is investigating the relationship between the salary, \(\pounds x\), and the length of commute, \(y\) minutes, for employees at the firm. She collected this information from 9 randomly selected employees. The salary of each employee was then coded using \(w = \frac { x - 20000 } { 1000 }\) The table shows the values of \(w\) and \(y\) for the 9 employees.
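| \(w\) | 6 | 8 | 8 | \(-1\) | 25 | 15 | 3 | \(-2\) | 19 |
|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 45 | 50 | 35 | 65 | 25 | 40 | 50 | 75 | 20 |
(You may use \(\sum w = 81 \quad \sum y = 405 \quad \sum w y = 2490 \quad S _ { w w } = 660 \quad S _ { y y } = 2500\) )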
  1. Calculate the salary of the employee with \(w = - 2\)
  2. Show that, to 3 significant figures, the value of the product moment correlation coefficient between \(w\) and \(y\) is \(-0.899\)
  3. State, giving a reason, the value of the product moment correlation coefficient between \(x\) and \(y\)
The least squares regression line of \(y\) on \(w\) is \(y = 60.75 - 1.75 w\)
  4. Find the equation of the least squares regression line of \(y\) on \(x\) giving your answer in the form \(y = a + b x\)
  5. Estimate the length of commute for an employee with a salary of \(\pounds 21000\)
Franca uses the regression line to estimate the length of commute for employees with salaries between \(\pounds 25000\) and \(\pounds 40000\)
  6. State, giving a reason, whether or not these estimates are reliable.
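The effect of the coding in this question can be checked numerically. The sketch below assumes the table reads \(w\) = 6, 8, 8, −1, 25, 15, 3, −2, 19 and \(y\) = 45, 50, 35, 65, 25, 40, 50, 75, 20, which agrees with the quoted sums. The helper `pmcc` is illustrative, not part of the paper.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient from raw paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)

# Assumed reading of the table.
w = [6, 8, 8, -1, 25, 15, 3, -2, 19]
y = [45, 50, 35, 65, 25, 40, 50, 75, 20]

# Undo the coding w = (x - 20000) / 1000 to recover salaries in pounds.
x = [1000 * wi + 20000 for wi in w]

r_wy = pmcc(w, y)
r_xy = pmcc(x, y)   # linear coding with a positive scale factor leaves r unchanged

# Substitute w = (x - 20000) / 1000 into y = 60.75 - 1.75 w to get y = a + b x.
b = -1.75 / 1000
a = 60.75 + 1.75 * 20000 / 1000
commute_at_21000 = a + b * 21000

print(round(r_wy, 3), round(r_xy, 3), a, b, commute_at_21000)
```

The recovered salaries run from £18000 to £45000, which is the range to compare against when judging the estimates in part 6.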
Edexcel S1 2019 January Q6
18 marks Moderate -0.3
  1. Following some school examinations, Chetna is studying the results of the 16 students in her class. The mark for paper \(1 , x\), and the mark for paper \(2 , y\), for each student are summarised in the following statistics.
$$\bar { x } = 35.75 \quad \bar { y } = 25.75 \quad \sigma _ { x } = 7.79 \quad \sigma _ { y } = 11.91 \quad \sum x y = 15837$$
  1. Comment on the differences between the marks of the students on paper 1 and paper 2
Chetna decides to examine these data in more detail and plots the marks for each of the 16 students on the scatter diagram opposite.
    1. Explain why the circled point \(( 38,0 )\) is possibly an outlier.
    2. Suggest a possible reason for this result.
Chetna decides to omit the data point \(( 38,0 )\) and examine the other 15 students' marks.
  2. Find the value of \(\bar { x }\) and the value of \(\bar { y }\) for these 15 students.
For these 15 students
    1. explain why \(\sum x y\) is still 15837
    2. show that \(\mathrm { S } _ { x y } = 1169.8\)
For these 15 students, Chetna calculates \(\mathrm { S } _ { x x } = 965.6\) and \(\mathrm { S } _ { y y } = 1561.7\) correct to 1 decimal place.
  3. Calculate the product moment correlation coefficient for these 15 students.
  4. Calculate the equation of the line of regression of \(y\) on \(x\) for these 15 students, giving your answer in the form \(y = a + b x\)
The product moment correlation coefficient between \(x\) and \(y\) for all 16 students is 0.746
  5. Explain how your calculation in part (e) supports Chetna's decision to omit the point \(( 38,0 )\) before calculating the equation of the linear regression line.
  6. Estimate the mark in the second paper for a student who scored 38 marks in the first paper.
    \includegraphics[max width=\textwidth, alt={}]{d3f4450d-60eb-49b6-be1b-d2fcfad0451f-17_1127_1146_301_406}
    \includegraphics[max width=\textwidth, alt={}]{d3f4450d-60eb-49b6-be1b-d2fcfad0451f-20_2630_1828_121_121}
Edexcel S1 2021 January Q5
17 marks Moderate -0.8
  1. A company director wants to introduce a performance-related pay structure for her managers. A random sample of 15 managers is taken and the annual salary, \(y\) in \(\pounds 1000\), was recorded for each manager. The director then calculated a performance score, \(x\), for each of these managers.
    The results are shown on the scatter diagram in Figure 1 on the next page.
    1. Describe the correlation between performance score and annual salary.
    The results are also summarised in the following statistics. $$\sum x = 465 \quad \sum y = 562 \quad \mathrm {~S} _ { x x } = 2492 \quad \sum y ^ { 2 } = 23140 \quad \sum x y = 19428$$
    1. Show that \(\mathrm { S } _ { x y } = 2006\)
    2. Find \(\mathrm { S } _ { y y }\)
  2. Find the product moment correlation coefficient between performance score and annual salary.
The director believes that there is a linear relationship between performance score and annual salary.
  3. State, giving a reason, whether or not these data are consistent with the director's belief.
  4. Calculate the equation of the regression line of \(y\) on \(x\), in the form \(y = a + b x\) Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  5. Give an interpretation of the value of \(b\).
  6. Plot your regression line on the scatter diagram in Figure 1
The director hears that one of the managers in the sample seems to be underperforming.
  7. On the scatter diagram, circle the point that best identifies this manager.
The director decides to use this regression line for the new performance related pay structure.
    1. Estimate, to 3 significant figures, the new salary of a manager with a performance score of 30
[Figure 1: scatter diagram of annual salary (£1000) for the 15 managers. A second copy is provided with the note "Only use this scatter diagram if you need to redraw your line."]
Edexcel S1 2023 January Q6
14 marks Moderate -0.3
  1. A research student is investigating the maximum weight, \(y\) grams, of sugar that will dissolve in 100 grams of water at various temperatures, \(x ^ { \circ } \mathrm { C }\), where \(10 \leqslant x \leqslant 80\)
The research student calculated the regression line of \(y\) on \(x\) and found it to be $$y = 151.2 + 2.72 x$$
  1. Give an interpretation of the gradient of the regression line.
  2. Use the regression line to estimate the maximum weight of sugar that will dissolve in 100 grams of water when the temperature is \(90 ^ { \circ } \mathrm { C }\).
  3. Comment on the reliability of your estimate, giving a reason for your answer.
Using the regression line of \(y\) on \(x\) and the following summary statistics $$\sum y = 3119 \quad \sum y ^ { 2 } = 851093 \quad \sum x ^ { 2 } = 24500 \quad n = 12$$
  4. show that the product moment correlation coefficient for these data is 0.988 to 3 decimal places.
The research student's supervisor plotted the original data on a scatter diagram, shown below. With reference to both the scatter diagram and the correlation coefficient,
  5. discuss the suitability of a linear regression model to describe the relationship between \(x\) and \(y\).
    \includegraphics[max width=\textwidth, alt={}]{c316fa29-dedc-4890-bd82-31eb0bb819f9-23_990_1138_205_356}
Edexcel S1 2024 January Q2
12 marks Moderate -0.3
  1. The average minimum monthly temperature, \(x\) degrees Fahrenheit ( \({ } ^ { \circ } \mathrm { F }\) ), and the average maximum monthly temperature, \(y\) degrees Fahrenheit ( \({ } ^ { \circ } \mathrm { F }\) ), in Kolkata were recorded for 12 months.
Some of the summary statistics are given below. $$\sum x = 862 \quad \sum x ^ { 2 } = 62802 \quad \mathrm {~S} _ { y y } = 413.67 \quad S _ { x y } = 512.67 \quad n = 12$$
    1. Calculate the mean of the 12 values of the average minimum monthly temperature.
    2. Show that the standard deviation of the 12 values of the average minimum monthly temperature is \(8.57 ^ { \circ } \mathrm { F }\) to 3 significant figures.
  1. Calculate the product moment correlation coefficient between \(x\) and \(y\)
For comparative purposes with a UK city, it was necessary to convert the temperatures from degrees Fahrenheit ( \({ } ^ { \circ } \mathrm { F }\) ) to degrees Celsius ( \({ } ^ { \circ } \mathrm { C }\) ). The formula used was $$c = \frac { 5 } { 9 } ( f - 32 )$$ where \(f\) is the temperature in \({ } ^ { \circ } \mathrm { F }\) and \(c\) is the temperature in \({ } ^ { \circ } \mathrm { C }\)
  2. Use this formula and the values from part (a) to calculate, in \({ } ^ { \circ } \mathrm { C }\), the mean and the standard deviation of the 12 values of the average minimum monthly temperature in Kolkata. Give your answers to 3 significant figures.
Given that
    • \(u\) is the equivalent temperature in \({ } ^ { \circ } \mathrm { C }\) of \(x\)
    • \(v\) is the equivalent temperature in \({ } ^ { \circ } \mathrm { C }\) of \(y\)
  3. state, giving a reason, the product moment correlation coefficient between \(u\) and \(v\)
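A hedged Python sketch of the calculations in this question, using only the quoted summary statistics. It assumes the standard deviation is the divide-by-\(n\) form, which reproduces the stated \(8.57 ^ { \circ } \mathrm { F }\). Variable names are illustrative.

```python
from math import sqrt

# Summary statistics quoted in the question.
n = 12
sum_x, sum_x2 = 862, 62802
s_yy, s_xy = 413.67, 512.67

mean_f = sum_x / n
s_xx = sum_x2 - sum_x ** 2 / n
sd_f = sqrt(s_xx / n)                 # divide-by-n standard deviation (assumption)

r = s_xy / sqrt(s_xx * s_yy)

# c = 5/9 (f - 32): the shift changes the mean only, the scale factor changes both.
mean_c = 5 / 9 * (mean_f - 32)
sd_c = 5 / 9 * sd_f

print(round(mean_f, 1), round(sd_f, 2), round(r, 3), round(mean_c, 1), round(sd_c, 2))
```

Because the same increasing linear conversion is applied to both variables, the correlation between \(u\) and \(v\) equals the value of \(r\) computed above.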
Edexcel S1 2014 June Q1
12 marks Moderate -0.8
  1. A medical researcher is studying the relationship between age ( \(x\) years) and volume of blood ( \(y \mathrm { ml }\) ) pumped by each contraction of the heart. The researcher obtained the following data from a random sample of 8 patients.
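| Age (\(x\)) | 20 | 25 | 30 | 45 | 55 | 60 | 65 | 70 |
|---|---|---|---|---|---|---|---|---|
| Volume (\(y\)) | 74 | 76 | 77 | 72 | 68 | 67 | 64 | 62 |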
[You may use \(\sum x = 370 , \mathrm {~S} _ { x x } = 2587.5 , \sum y = 560 , \sum y ^ { 2 } = 39418 , \mathrm {~S} _ { x y } = - 710\) ]
  1. Calculate \(\mathrm { S } _ { y y }\)
  2. Calculate the product moment correlation coefficient for these data.
  3. Interpret your value of the correlation coefficient.
The researcher believes that a linear regression model may be appropriate to describe these data.
  4. State, giving a reason, whether or not your value of the correlation coefficient supports the researcher's belief.
  5. Find the equation of the regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\)
Jack is a 40-year-old patient.
    1. Use your regression line to estimate the volume of blood pumped by each contraction of Jack's heart.
    2. Comment, giving a reason, on the reliability of your estimate.
Edexcel S1 2015 June Q7
6 marks Easy -1.8
7. A doctor is investigating the correlation between blood protein, \(p\), and body mass index, \(b\). He takes a random sample of 8 patients and the data are shown in the table below.
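| Patient | \(A\) | \(B\) | \(C\) | \(D\) | \(E\) | \(F\) | \(G\) | \(H\) |
|---|---|---|---|---|---|---|---|---|
| \(b\) | 32 | 36 | 40 | 44 | 42 | 21 | 27 | 37 |
| \(p\) | 18 | 21 | 31 | 39 | 21 | 12 | 19 | 70 |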
  1. Draw a scatter diagram of these data on the axes provided. \includegraphics[max width=\textwidth, alt={}, center]{36cf6341-1957-45b9-9f7d-0914506f5919-13_938_673_785_614}
The doctor decides to leave out patient \(H\) from his calculations.
  2. Give a reason for the doctor's decision.
For the 7 patients \(A , B , C , D , E , F\) and \(G\), $$S _ { b p } = 369 , \quad S _ { p p } = 490 \text { and } S _ { b b } = 423 \frac { 5 } { 7 }$$
  3. Find the product moment correlation coefficient, \(r\), for these 7 patients.
  4. Without any further calculations, state how \(r\) would differ from your answer in part (c) if it was calculated for all 8 patients.
\begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{36cf6341-1957-45b9-9f7d-0914506f5919-15_1322_1593_207_173} \captionsetup{labelformat=empty} \caption{Figure 1}
\end{figure}
The histogram in Figure 1 summarises the times, in minutes, that 200 people spent shopping in a supermarket.
    1. Give a reason to justify the use of a histogram to represent these data.
Given that 40 people spent between 11 and 21 minutes shopping in the supermarket, estimate
    2. the number of people that spent between 18 and 25 minutes shopping in the supermarket,
    3. the median time spent shopping in the supermarket by these 200 people.
The mid-point of each bar is represented by \(x\) and the corresponding frequency by \(\mathrm { f }\).
    4. Show that \(\sum \mathrm { f } x = 6390\)
Given that \(\sum \mathrm { f } x ^ { 2 } = 238430\)
    5. for the data shown in the histogram, calculate estimates of
      1. the mean,
      2. the standard deviation.
A coefficient of skewness is given by \(\frac { 3 ( \text { mean } - \text { median } ) } { \text { standard deviation } }\)
    6. Calculate this coefficient of skewness for these data.
The manager of the supermarket decides to model these data with a normal distribution.
    7. Comment on the manager's decision. Give a justification for your answer.
OCR S1 2009 January Q2
8 marks Moderate -0.8
2 The table shows the age, \(x\) years, and the mean diameter, \(y \mathrm {~cm}\), of the trunk of each of seven randomly selected trees of a certain species.
| Age (\(x\) years) | 11 | 12 | 20 | 28 | 35 | 45 | 51 |
|---|---|---|---|---|---|---|---|
| Mean trunk diameter (\(y \mathrm {~cm}\)) | 12.2 | 16.0 | 26.4 | 39.2 | 39.6 | 51.3 | 60.6 |
$$\left[ n = 7 , \Sigma x = 202 , \Sigma y = 245.3 , \Sigma x ^ { 2 } = 7300 , \Sigma y ^ { 2 } = 10510.65 , \Sigma x y = 8736.9 . \right]$$
  1. (a) Use an appropriate formula to show that the gradient of the regression line of \(y\) on \(x\) is 1.13 , correct to 2 decimal places.
    (b) Find the equation of the regression line of \(y\) on \(x\).
  2. Use your equation to estimate the mean trunk diameter of a tree of this species with age
    (a) 30 years,
    (b) 100 years.
It is given that the value of the product moment correlation coefficient for the data in the table is 0.988, correct to 3 decimal places.
  3. Comment on the reliability of each of your two estimates.
OCR S1 2011 January Q3
12 marks Moderate -0.8
3 A firm wishes to assess whether there is a linear relationship between the annual amount spent on advertising, \(\pounds x\) thousand, and the annual profit, \(\pounds y\) thousand. A summary of the figures for 12 years is as follows. $$n = 12 \quad \Sigma x = 86.6 \quad \Sigma y = 943.8 \quad \Sigma x ^ { 2 } = 658.76 \quad \Sigma y ^ { 2 } = 83663.00 \quad \Sigma x y = 7351.12$$
  1. Calculate the product moment correlation coefficient, showing that it is greater than 0.9 .
  2. Comment briefly on this value in this context.
  3. A manager claims that this result shows that spending more money on advertising in the future will result in greater profits. Make two criticisms of this claim.
  4. Calculate the equation of the regression line of \(y\) on \(x\).
  5. Estimate the annual profit during a year when \(\pounds 7400\) was spent on advertising.
OCR S1 2012 January Q2
10 marks Easy -1.8
2 In an experiment, the percentage sand content, \(y\), of soil in a given region was measured at nine different depths, \(x \mathrm {~cm}\), taken at intervals of 6 cm from 0 cm to 48 cm . The results are summarised below. $$n = 9 \quad \Sigma x = 216 \quad \Sigma x ^ { 2 } = 7344 \quad \Sigma y = 512.4 \quad \Sigma y ^ { 2 } = 30595 \quad \Sigma x y = 10674$$
  1. State, with a reason, which variable is the independent variable.
  2. Calculate the product moment correlation coefficient between \(x\) and \(y\).
  3. (a) Calculate the equation of the appropriate regression line.
    (b) This regression line is used to estimate the percentage sand content at depths of 25 cm and 100 cm . Comment on the reliability of each of these estimates. You are not asked to find the estimates.