5.08c Pearson: measure of straight-line fit

26 questions

Edexcel S1 2016 January Q3
15 marks Moderate -0.3
3. A publisher collects information about the amount spent on advertising, \(\pounds x\), and the sales, \(y\) books, for some of her publications. She collects information for a random sample of 8 textbooks and codes the data using \(v = \frac { x + 50 } { 200 }\) and \(s = \frac { y } { 1000 }\) to give
$$\begin{array}{c|cccccccc} v & 0.60 & 8.10 & 4.30 & 0.40 & 1.60 & 6.40 & 2.50 & 5.10 \\ \hline s & 1.84 & 6.73 & 5.95 & 1.30 & 2.45 & 7.46 & 4.82 & 6.25 \end{array}$$
[You may use: \(\sum v = 29 \quad \sum s = 36.8 \quad \sum s^2 = 209.72 \quad \sum vs = 177.311 \quad \mathrm{S}_{vv} = 55.275\)]
  1. Find \(\mathrm { S } _ { v s }\) and \(\mathrm { S } _ { s s }\)
  2. Calculate the product moment correlation coefficient for these data. The publisher believes that a linear regression model may be appropriate to describe these data.
  3. State, giving a reason, whether or not your answer to part (b) supports the publisher's belief.
  4. Find the equation of the regression line of \(s\) on \(v\), giving your answer in the form \(s = a + b v\)
  5. Hence find the equation of the regression line of \(y\) on \(x\) for the sample of textbooks, giving your answer in the form \(y = c + d x\). The publisher calculated the regression line for a sample of novels and obtained the equation $$y = 3100 + 1.2 x$$ She wants to increase the sales of books by spending more money on advertising.
  6. State, giving your reasons, whether the publisher should spend more money on advertising textbooks or novels.
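The arithmetic for parts 1, 2, 4 and 5 can be checked with a short Python sketch. It works only from the summary statistics quoted in the question, and the variable names are illustrative. The last step undoes the coding by substituting \(s = y/1000\) and \(v = (x + 50)/200\) into the fitted line.

```python
# Summary statistics quoted in the question (n = 8 textbooks)
n = 8
sum_v, sum_s = 29, 36.8
sum_s2, sum_vs = 209.72, 177.311
S_vv = 55.275

# Corrected sums: S_pq = sum(pq) - sum(p) * sum(q) / n
S_vs = sum_vs - sum_v * sum_s / n
S_ss = sum_s2 - sum_s ** 2 / n
r = S_vs / (S_vv * S_ss) ** 0.5

# Least squares line of s on v: b = S_vs / S_vv, a = s_bar - b * v_bar
b = S_vs / S_vv
a = sum_s / n - b * sum_v / n

# Undo the coding: y = 1000s and v = (x + 50) / 200, so
# y = 1000a + 1000b(x + 50)/200 = (1000a + 250b) + 5b x
c = 1000 * a + 250 * b
d = 5 * b

print(f"S_vs = {S_vs:.3f}, S_ss = {S_ss:.2f}, r = {r:.3f}")
print(f"s = {a:.3f} + {b:.3f}v,  y = {c:.0f} + {d:.3f}x")
```

With these values the textbook gradient is roughly 3.97 extra books per extra £1 of advertising, against 1.2 for novels, which is the comparison part 6 relies on.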
Edexcel S1 2017 January Q3
17 marks Moderate -0.3
  1. A scientist measured the salinity of water, \(x \mathrm {~g} / \mathrm { kg }\), and recorded the temperature at which the water froze, \(y ^ { \circ } \mathrm { C }\), for 12 different water samples. The summary statistics are listed below.
$$\begin{gathered} \sum x = 504 \quad \sum y = - 27 \quad \sum x ^ { 2 } = 22842 \quad \sum y ^ { 2 } = 62.98 \\ \sum x y = - 1190.7 \quad \mathrm {~S} _ { x x } = 1674 \quad \mathrm {~S} _ { y y } = 2.23 \end{gathered}$$
  1. Find the mean and variance of the recorded temperatures. (3)
    Priya believes that the higher the salinity of water, the higher the temperature at which the water freezes.
    1. Calculate the product moment correlation coefficient between \(x\) and \(y\)
    2. State, with a reason, whether or not this value supports Priya's belief.
  2. Find the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\) Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  3. Estimate the temperature at which water freezes when the salinity is \(32 \mathrm {~g} / \mathrm { kg }\). The coding \(w = 1.8 y + 32\) is used to convert the recorded temperatures from \({ } ^ { \circ } \mathrm { C }\) to \({ } ^ { \circ } \mathrm { F }\).
  4. Find an equation of the least squares regression line of \(w\) on \(x\) in the form \(w = c + d x\)
  5. Find
    1. the variance of the recorded temperatures when converted to \({ } ^ { \circ } \mathrm { F }\)
    2. the product moment correlation coefficient between \(w\) and \(x\).
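A Python sketch of the calculations in this question, again using only the quoted summary statistics. It assumes the divisor \(n\) for the variance, the usual Edexcel S1 convention.

```python
# Summary statistics for the 12 water samples
n = 12
sum_x, sum_y = 504, -27
sum_xy = -1190.7
S_xx, S_yy = 1674, 2.23

y_bar = sum_y / n
var_y = S_yy / n                      # variance with divisor n

S_xy = sum_xy - sum_x * sum_y / n
r = S_xy / (S_xx * S_yy) ** 0.5

b = S_xy / S_xx                       # regression of y on x
a = y_bar - b * sum_x / n
y_at_32 = a + b * 32                  # salinity 32 g/kg

# w = 1.8y + 32 turns y = a + bx into w = (1.8a + 32) + 1.8b x
c, d = 1.8 * a + 32, 1.8 * b
var_w = 1.8 ** 2 * var_y              # adding 32 does not change the spread
r_wx = r                              # a positive rescaling leaves r unchanged

print(f"mean {y_bar}, variance {var_y:.4f}, r = {r:.3f}")
print(f"y = {a:.3f} + ({b:.4f})x, estimate at 32 g/kg: {y_at_32:.2f}")
print(f"w = {c:.2f} + ({d:.4f})x, var(w) = {var_w:.3f}")
```

The linear coding multiplies the variance by \(1.8^2\), while the correlation between \(w\) and \(x\) equals the one between \(y\) and \(x\) because the scale factor is positive.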
Edexcel S1 2018 January Q3
8 marks Moderate -0.8
3. Martin is investigating the relationship between a person's daily caffeine consumption, \(c\) milligrams, and the amount of sleep they get, \(h\) hours, per night. He collected this information from 20 people and the results are summarised below. $$\begin{array}{c} \sum c = 3660 \quad \sum h = 126 \quad \sum c^2 = 973228 \\ \sum ch = 20023.4 \quad S_{cc} = 303448 \quad S_{ch} = -3034.6 \end{array}$$ Martin calculates the product moment correlation coefficient for these data and obtains \(-0.833\).
  1. Give a reason why this value supports a linear relationship between \(c\) and \(h\). The amount of sleep per night is the response variable.
  2. Explain what you understand by the term 'response variable'. Martin says that for each additional 100 mg of caffeine consumed, the expected number of hours of sleep decreases by 1.
  3. Determine, by calculation, whether or not the data support this statement.
  4. Use the data to calculate an estimate for the expected number of hours of sleep per night when no caffeine is consumed.
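Parts 3 and 4 reduce to the gradient \(b = S_{ch}/S_{cc}\) and the intercept \(a = \bar{h} - b\bar{c}\) of the regression line of \(h\) on \(c\). A minimal Python check:

```python
n = 20
sum_c, sum_h = 3660, 126
S_cc, S_ch = 303448, -3034.6

b = S_ch / S_cc                   # hours of sleep per extra mg of caffeine
change_per_100mg = 100 * b
a = sum_h / n - b * sum_c / n     # estimated hours of sleep when c = 0

print(f"change per 100 mg: {change_per_100mg:.3f} h")
print(f"estimate with no caffeine: {a:.2f} h")
```

The gradient is about \(-0.0100\) hours per mg, a decrease of 1.00 hour per 100 mg, and the intercept is about 8.13 hours.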
Edexcel S1 2018 January Q5
12 marks Moderate -0.3
5. Franca is the manager of an accountancy firm. She is investigating the relationship between the salary, \(\pounds x\), and the length of commute, \(y\) minutes, for employees at the firm. She collected this information from 9 randomly selected employees. The salary of each employee was then coded using \(w = \frac { x - 20000 } { 1000 }\) The table shows the values of \(w\) and \(y\) for the 9 employees.
$$\begin{array}{c|ccccccccc} w & 6 & 8 & 8 & -1 & 25 & 15 & 3 & -2 & 19 \\ \hline y & 45 & 50 & 35 & 65 & 25 & 40 & 50 & 75 & 20 \end{array}$$
(You may use \(\sum w = 81 \quad \sum y = 405 \quad \sum w y = 2490 \quad S _ { w w } = 660 \quad S _ { y y } = 2500\) )
  1. Calculate the salary of the employee with \(w = - 2\)
  2. Show that, to 3 significant figures, the value of the product moment correlation coefficient between \(w\) and \(y\) is - 0.899
  3. State, giving a reason, the value of the product moment correlation coefficient between \(x\) and \(y\). The least squares regression line of \(y\) on \(w\) is \(y = 60.75 - 1.75 w\).
  4. Find the equation of the least squares regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\).
  5. Estimate the length of commute for an employee with a salary of \(\pounds 21000\). Franca uses the regression line to estimate the length of commute for employees with salaries between \(\pounds 25000\) and \(\pounds 40000\).
  6. State, giving a reason, whether or not these estimates are reliable.
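Reading the table back in, a short Python sketch reproduces the given summary statistics and carries out parts 1, 2, 4 and 5. The helper `s_pq` is a hypothetical name used for illustration.

```python
w = [6, 8, 8, -1, 25, 15, 3, -2, 19]
y = [45, 50, 35, 65, 25, 40, 50, 75, 20]

def s_pq(p, q):
    """Corrected sum of products: sum(pq) - sum(p) * sum(q) / n."""
    return sum(a * b for a, b in zip(p, q)) - sum(p) * sum(q) / len(p)

S_ww, S_yy, S_wy = s_pq(w, w), s_pq(y, y), s_pq(w, y)
r = S_wy / (S_ww * S_yy) ** 0.5

# Regression of y on w, then substitute w = (x - 20000) / 1000
b_w = S_wy / S_ww
a_w = sum(y) / len(y) - b_w * sum(w) / len(w)
b = b_w / 1000
a = a_w - b_w * 20

salary_at_w_minus_2 = 20000 + 1000 * (-2)   # invert the coding
commute_at_21000 = a + b * 21000

print(f"S_ww = {S_ww:.0f}, S_yy = {S_yy:.0f}, r = {r:.3f}")
print(f"y = {a:.2f} + ({b:.5f})x, commute at 21000: {commute_at_21000:.0f} min")
```

This gives \(y = 95.75 - 0.00175x\) and an estimated commute of 59 minutes for a salary of £21000.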
CAIE FP2 2012 June Q11 OR
Challenging +1.2
For a random sample of 5 pairs of values of \(x\) and \(y\), the equations of the regression lines of \(y\) on \(x\) and \(x\) on \(y\) are respectively $$y = - 0.5 x + 5 \quad \text { and } \quad x = - 1.2 y + 7.6$$ Find the value of the product moment correlation coefficient for this sample. Test, at the \(5 \%\) significance level, whether the population product moment correlation coefficient differs from zero. The following table shows the sample data.
$$\begin{array}{c|ccccc} x & 1 & 2 & 5 & 5 & p \\ \hline y & 5 & 3 & 4 & 2 & q \end{array}$$
Find the values of \(p\) and \(q\).
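Since \(b_{yx} b_{xy} = S_{xy}^2/(S_{xx}S_{yy}) = r^2\), and both regression lines pass through \((\bar{x}, \bar{y})\), the whole question can be checked in a few lines of Python. The critical value 0.8783 (two-tailed 5%, \(n = 5\)) is quoted from standard PMCC tables and is an assumption of this sketch; use the tables supplied with the paper.

```python
import math

b_yx = -0.5     # gradient of the regression line of y on x
b_xy = -1.2     # gradient of the regression line of x on y

# b_yx * b_xy = S_xy**2 / (S_xx * S_yy) = r**2; r shares the gradients' sign
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

CRITICAL = 0.8783   # assumed tabulated two-tailed 5% value for n = 5
significant = abs(r) > CRITICAL

# Both lines pass through (x_bar, y_bar). Substituting y = -0.5x + 5 into
# x = -1.2y + 7.6 gives x = 0.6x + 1.6.
x_bar = 1.6 / (1 - 0.6)
y_bar = -0.5 * x_bar + 5

n = 5
p = n * x_bar - (1 + 2 + 5 + 5)
q = n * y_bar - (5 + 3 + 4 + 2)

print(f"r = {r:.4f}, significant: {significant}, p = {p:.0f}, q = {q:.0f}")
```

This gives \(r \approx -0.775\), which is not significant at the 5% level, and \(p = 7\), \(q = 1\).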
OCR H240/02 2018 June Q11
6 marks Moderate -0.8
11 Christa used Pearson's product-moment correlation coefficient, \(r\), to compare the use of public transport with the use of private vehicles for travel to work in the UK.
  1. Using the pre-release data set for all 348 UK Local Authorities, she considered the following four variables: the number of employees using public transport, \(x\); the number of employees using private vehicles, \(y\); the proportion of employees using public transport, \(a\); and the proportion of employees using private vehicles, \(b\).
    1. Explain, in context, why you would expect strong, positive correlation between \(x\) and \(y\).
    2. Explain, in context, what kind of correlation you would expect between \(a\) and \(b\).
    3. Christa also considered the data for the 33 London boroughs alone and she generated the following scatter diagram. \begin{figure}[h]
      \captionsetup{labelformat=empty} \caption{London} \includegraphics[alt={},max width=\textwidth]{65d9d34c-8c78-45fe-b9f0-dab071ae56bb-07_467_707_1366_653}
      \end{figure} One London Borough is represented by an outlier in the diagram.
      (a) Suggest what effect this outlier is likely to have on the value of \(r\) for the 32 London Boroughs.
      (b) Suggest what effect this outlier is likely to have on the value of \(r\) for the whole country.
    4. What can you deduce about the area of the London Borough represented by the outlier? Explain your answer.
Edexcel S1 Specimen Q1
4 marks Easy -1.2
  1. Gary compared the total attendance, \(x\), at home matches and the total number of goals, \(y\), scored at home during a season for each of 12 football teams playing in a league. He correctly calculated:
$$S _ { x x } = 1022500 \quad S _ { y y } = 130.9 \quad S _ { x y } = 8825$$
  1. Calculate the product moment correlation coefficient for these data.
  2. Interpret the value of the correlation coefficient. Helen was given the same data to analyse. In view of the large numbers involved she decided to divide the attendance figures by 100. She then calculated the product moment correlation coefficient between \(\frac { x } { 100 }\) and \(y\).
  3. Write down the value Helen should have obtained.
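A Python check of part 1 and of Helen's coding. Dividing \(x\) by 100 divides \(S_{xx}\) by \(100^2\) and \(S_{xy}\) by 100, so the ratio that defines \(r\) is unchanged.

```python
import math

S_xx, S_yy, S_xy = 1022500, 130.9, 8825
r = S_xy / math.sqrt(S_xx * S_yy)

# Helen's coding u = x / 100: S_uu = S_xx / 100**2 and S_uy = S_xy / 100
S_uu, S_uy = S_xx / 100 ** 2, S_xy / 100
r_helen = S_uy / math.sqrt(S_uu * S_yy)

print(f"r = {r:.3f}, Helen's r = {r_helen:.3f}")
```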
Edexcel S1 2007 January Q1
15 marks Moderate -0.8
  1. As part of a statistics project, Gill collected data relating to the length of time, to the nearest minute, spent by shoppers in a supermarket and the amount of money they spent. Her data for a random sample of 10 shoppers are summarised in the table below, where \(t\) represents time and \(\pounds m\) the amount spent over \(\pounds 20\).
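$$\begin{array}{c|c} t \text{ (minutes)} & \pounds m \\ \hline 15 & -3 \\ 23 & 17 \\ 5 & -19 \\ 16 & 4 \\ 30 & 12 \\ 6 & -9 \\ 32 & 27 \\ 23 & 6 \\ 35 & 20 \\ 27 & 6 \end{array}$$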
  1. Write down the actual amount spent by the shopper who was in the supermarket for 15 minutes.
  2. Calculate \(S _ { t t } , S _ { m m }\) and \(S _ { t m }\). $$\text{(You may use } \Sigma t^2 = 5478 \quad \Sigma m^2 = 2101 \quad \Sigma tm = 2485 \text{)}$$
  3. Calculate the value of the product moment correlation coefficient between \(t\) and \(m\).
  4. Write down the value of the product moment correlation coefficient between \(t\) and the actual amount spent. Give a reason to justify your value. On another day Gill collected similar data. For these data the product moment correlation coefficient was 0.178
  5. Give an interpretation to both of these coefficients.
  6. Suggest a practical reason why these two values are so different.
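A Python sketch of parts 2 to 4 using the table. Adding £20 to every value of \(m\) shifts the data without rescaling it, so the coefficient for the actual amount spent equals the one for \(m\); the code confirms this numerically.

```python
import math

t = [15, 23, 5, 16, 30, 6, 32, 23, 35, 27]      # minutes
m = [-3, 17, -19, 4, 12, -9, 27, 6, 20, 6]      # pounds spent over 20

def s_pq(p, q):
    """Corrected sum of products: sum(pq) - sum(p) * sum(q) / n."""
    return sum(a * b for a, b in zip(p, q)) - sum(p) * sum(q) / len(p)

S_tt, S_mm, S_tm = s_pq(t, t), s_pq(m, m), s_pq(t, m)
r = S_tm / math.sqrt(S_tt * S_mm)

# Actual amount spent is m + 20: a shift, so S_mm and S_tm are unchanged
actual = [mi + 20 for mi in m]
r_actual = s_pq(t, actual) / math.sqrt(S_tt * s_pq(actual, actual))

print(f"S_tt = {S_tt:.1f}, S_mm = {S_mm:.1f}, S_tm = {S_tm:.1f}")
print(f"r = {r:.3f}, r with actual amounts = {r_actual:.3f}")
```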
Edexcel S1 2007 June Q1
4 marks Easy -1.2
  1. A young family were looking for a new 3 bedroom semi-detached house. A local survey recorded the price \(x\), in \(\pounds 1000\), and the distance \(y\), in miles, from the station of such houses. The following summary statistics were provided
$$S _ { x x } = 113573 , \quad S _ { y y } = 8.657 , \quad S _ { x y } = - 808.917$$
  1. Use these values to calculate the product moment correlation coefficient.
  2. Give an interpretation of your answer to part (a). Another family asked for the distances to be measured in km rather than miles.
  3. State the value of the product moment correlation coefficient in this case.
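Parts 1 and 3 in Python. Converting miles to kilometres multiplies every \(y\) by the same positive constant (1.609344 km per international mile), which cancels in \(r\).

```python
import math

S_xx, S_yy, S_xy = 113573, 8.657, -808.917
r = S_xy / math.sqrt(S_xx * S_yy)

# Distances in km: z = k * y, which scales S_yy by k**2 and S_xy by k
k = 1.609344
S_zz, S_xz = k ** 2 * S_yy, k * S_xy
r_km = S_xz / math.sqrt(S_xx * S_zz)

print(f"r (miles) = {r:.3f}, r (km) = {r_km:.3f}")
```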
Edexcel S1 2013 June Q5
11 marks Moderate -0.3
5. A researcher believes that parents with a short family name tended to give their children a long first name. A random sample of 10 children was selected and the number of letters in their family name, \(x\), and the number of letters in their first name, \(y\), were recorded. The data are summarised as: $$\sum x = 60 , \quad \sum y = 61 , \quad \sum y ^ { 2 } = 393 , \quad \sum x y = 382 , \quad \mathrm {~S} _ { x x } = 28$$
  1. Find \(\mathrm { S } _ { y y }\) and \(\mathrm { S } _ { x y }\)
  2. Calculate the product moment correlation coefficient, \(r\), between \(x\) and \(y\).
  3. State, giving a reason, whether or not these data support the researcher's belief. The researcher decides to add a child with family name "Turner" to the sample.
  4. Using the definition \(\mathrm { S } _ { x x } = \sum ( x - \bar { x } ) ^ { 2 }\), state the new value of \(\mathrm { S } _ { x x }\) giving a reason for your answer. Given that the addition of the child with family name "Turner" to the sample leads to an increase in \(\mathrm { S } _ { y y }\)
  5. use the definition \(\mathrm { S } _ { x y } = \sum ( x - \bar { x } ) ( y - \bar { y } )\) to determine whether or not the value of \(r\) will increase, decrease or stay the same. Give a reason for your answer.
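The effect of adding "Turner" can be explored numerically. The Python sketch below computes parts 1 and 2, then updates the sums for an added point with \(x = 6\) (six letters, equal to \(\bar{x}\)) and a hypothetical first-name length `y_new`, since the question does not give the child's first name.

```python
import math

n, sum_x, sum_y = 10, 60, 61
sum_y2, sum_xy, S_xx = 393, 382, 28

S_yy = sum_y2 - sum_y ** 2 / n
S_xy = sum_xy - sum_x * sum_y / n
r = S_xy / math.sqrt(S_xx * S_yy)

sum_x2 = S_xx + sum_x ** 2 / n          # recover sum of x^2 from S_xx

def add_turner(y_new):
    """Corrected sums after adding x = 6 ("Turner") with first-name length y_new.

    y_new is hypothetical: the question does not give the child's first name.
    """
    n2, sx, sy = n + 1, sum_x + 6, sum_y + y_new
    S_xx2 = (sum_x2 + 6 ** 2) - sx ** 2 / n2
    S_yy2 = (sum_y2 + y_new ** 2) - sy ** 2 / n2
    S_xy2 = (sum_xy + 6 * y_new) - sx * sy / n2
    return S_xx2, S_yy2, S_xy2

print(f"S_yy = {S_yy:.1f}, S_xy = {S_xy:.0f}, r = {r:.3f}")
for y_new in (3, 8, 12):
    S_xx2, S_yy2, S_xy2 = add_turner(y_new)
    print(y_new, round(S_xx2, 6), round(S_yy2, 3), round(S_xy2, 6))
```

For every trial value, \(S_{xx}\) stays at 28 and \(S_{xy}\) stays at 16, so an increase in \(S_{yy}\) can only reduce \(r = 16/\sqrt{28\,S_{yy}}\).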
Edexcel S1 2017 June Q1
14 marks Moderate -0.5
  1. A clothes shop manager records the weekly sales figures, \(\pounds s\), and the average weekly temperature, \(t ^ { \circ } \mathrm { C }\), for 6 weeks during the summer. The sales figures were coded so that \(w = \frac { s } { 1000 }\)
The data are summarised as follows $$\mathrm { S } _ { w w } = 50 \quad \sum w t = 784 \quad \sum t ^ { 2 } = 2435 \quad \sum t = 119 \quad \sum w = 42$$
  1. Find \(\mathrm { S } _ { w t }\) and \(\mathrm { S } _ { t t }\)
  2. Write down the value of \(\mathrm { S } _ { s s }\) and the value of \(\mathrm { S } _ { s t }\)
  3. Find the product moment correlation coefficient between \(s\) and \(t\). The manager of the clothes shop believes that a linear regression model may be appropriate to describe these data.
  4. State, giving a reason, whether or not your value of the correlation coefficient supports the manager's belief.
  5. Find the equation of the regression line of \(w\) on \(t\), giving your answer in the form \(w = a + b t\)
  6. Hence find the equation of the regression line of \(s\) on \(t\), giving your answer in the form \(s = c + d t\), where \(c\) and \(d\) are correct to 3 significant figures.
  7. Using your equation in part (f), interpret the effect of a \(1 ^ { \circ } \mathrm { C }\) increase in average weekly temperature on weekly sales during the summer.
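A Python sketch of parts 1, 2, 3, 5 and 6. Since \(s = 1000w\), \(S_{ss} = 1000^2 S_{ww}\) and \(S_{st} = 1000 S_{wt}\), and the fitted line for \(s\) is 1000 times the line for \(w\).

```python
n = 6
S_ww, sum_wt, sum_t2, sum_t, sum_w = 50, 784, 2435, 119, 42

S_wt = sum_wt - sum_w * sum_t / n
S_tt = sum_t2 - sum_t ** 2 / n

# s = 1000w, so the s-statistics are rescaled w-statistics
S_ss, S_st = 1000 ** 2 * S_ww, 1000 * S_wt
r = S_st / (S_ss * S_tt) ** 0.5

b = S_wt / S_tt                      # regression of w on t
a = sum_w / n - b * sum_t / n
c, d = 1000 * a, 1000 * b            # s = 1000(a + bt)

print(f"S_wt = {S_wt:.1f}, S_tt = {S_tt:.2f}, r = {r:.3f}")
print(f"w = {a:.2f} + ({b:.4f})t,  s = {c:.0f} + ({d:.1f})t")
```

This gives \(s \approx 20000 - 655t\) to 3 significant figures: each extra 1°C of average weekly temperature is associated with weekly sales about £655 lower.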
Edexcel S1 2004 November Q6
18 marks Easy -1.2
6. Students in Mr Brawn's exercise class have to do press-ups and sit-ups. The number of press-ups \(x\) and the number of sit-ups \(y\) done by a random sample of 8 students are summarised below. $$\begin{array} { l l } \Sigma x = 272 , & \Sigma x ^ { 2 } = 10164 , \quad \Sigma x y = 11222 , \\ \Sigma y = 320 , & \Sigma y ^ { 2 } = 13464 . \end{array}$$
  1. Evaluate \(S _ { x x } , S _ { y y }\) and \(S _ { x y }\).
  2. Calculate, to 3 decimal places, the product moment correlation coefficient between \(x\) and \(y\).
  3. Give an interpretation of your coefficient.
  4. Calculate the mean and the standard deviation of the number of press-ups done by these students. Mr Brawn assumes that the number of press-ups that can be done by any student can be modelled by a normal distribution with mean \(\mu\) and standard deviation \(\sigma\). Assuming that \(\mu\) and \(\sigma\) take the same values as those calculated in part (d),
  5. find the value of \(a\) such that \(\mathrm { P } ( \mu - a < X < \mu + a ) = 0.95\).
  6. Comment on Mr Brawn's assumption of normality.
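Parts 1, 2, 4 and 5 in Python. The standard deviation uses divisor \(n\), and the 97.5% point of the standard normal distribution comes from `statistics.NormalDist` in the standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist

n = 8
sum_x, sum_x2, sum_xy = 272, 10164, 11222
sum_y, sum_y2 = 320, 13464

S_xx = sum_x2 - sum_x ** 2 / n
S_yy = sum_y2 - sum_y ** 2 / n
S_xy = sum_xy - sum_x * sum_y / n
r = S_xy / math.sqrt(S_xx * S_yy)

mu = sum_x / n
sigma = math.sqrt(S_xx / n)        # divisor n
z = NormalDist().inv_cdf(0.975)    # about 1.96
a = z * sigma                      # P(mu - a < X < mu + a) = 0.95

print(f"S_xx = {S_xx}, S_yy = {S_yy}, S_xy = {S_xy}, r = {r:.3f}")
print(f"mean = {mu}, sd = {sigma:.2f}, a = {a:.2f}")
```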
AQA S1 2013 January Q4
12 marks Moderate -0.3
4 Ashok is a work-experience student with an organisation that offers two separate professional examination papers, I and II. For each of a random sample of 12 students, A to L, he records the mark, \(x\) per cent, achieved on Paper I, and the mark, \(y\) per cent, achieved on Paper II.
$$\begin{array}{c|cccccccccccc} & \mathbf{A} & \mathbf{B} & \mathbf{C} & \mathbf{D} & \mathbf{E} & \mathbf{F} & \mathbf{G} & \mathbf{H} & \mathbf{I} & \mathbf{J} & \mathbf{K} & \mathbf{L} \\ \hline \boldsymbol{x} & 34 & 46 & 53 & 62 & 67 & 72 & 60 & 54 & 70 & 71 & 82 & 85 \\ \boldsymbol{y} & 61 & 66 & 72 & 78 & 88 & 81 & 49 & 60 & 54 & 44 & 49 & 36 \end{array}$$
    1. Calculate the value of the product moment correlation coefficient, \(r\), between \(x\) and \(y\).
    2. Interpret your value of \(r\) in the context of this question.
    1. Give two possible advantages of plotting data on a graph before calculating the value of a product moment correlation coefficient.
    2. Complete the plotting of Ashok's data on the scatter diagram on page 5.
    3. State what is now revealed by the scatter diagram.
  1. Ashok subsequently discovers that students A to F have a more scientific background than students G to L. With reference to your scatter diagram, estimate the value of the product moment correlation coefficient for each of the two groups of students. You are not expected to calculate the two values.
    $$\begin{array}{c|cccccc} & \mathbf{G} & \mathbf{H} & \mathbf{I} & \mathbf{J} & \mathbf{K} & \mathbf{L} \\ \hline \boldsymbol{x} & 60 & 54 & 70 & 71 & 82 & 85 \\ \boldsymbol{y} & 49 & 60 & 54 & 44 & 49 & 36 \end{array}$$
    \section*{Examination Marks}
    \includegraphics[max width=\textwidth, alt={}]{68830a6a-5479-4e5c-a845-a6536ab51cee-5_1616_1634_836_189}
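After estimating the two group values from the scatter diagram, they can be compared with the exact figures. A Python sketch; the `pmcc` helper is written out here for illustration, and the first six entries are students A to F.

```python
import math

x = [34, 46, 53, 62, 67, 72, 60, 54, 70, 71, 82, 85]   # Paper I, students A..L
y = [61, 66, 72, 78, 88, 81, 49, 60, 54, 44, 49, 36]   # Paper II

def pmcc(p, q):
    """Product moment correlation coefficient via corrected sums."""
    n = len(p)
    s_pq = sum(a * b for a, b in zip(p, q)) - sum(p) * sum(q) / n
    s_pp = sum(a * a for a in p) - sum(p) ** 2 / n
    s_qq = sum(b * b for b in q) - sum(q) ** 2 / n
    return s_pq / math.sqrt(s_pp * s_qq)

r_all = pmcc(x, y)
r_AF = pmcc(x[:6], y[:6])
r_GL = pmcc(x[6:], y[6:])
print(f"all: {r_all:.3f}, A to F: {r_AF:.3f}, G to L: {r_GL:.3f}")
```

The overall value is about \(-0.326\), while students A to F give a strong positive coefficient and G to L a fairly strong negative one.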
AQA S1 2008 June Q3
10 marks Easy -1.3
3 [Figure 1, printed on the insert, is provided for use in this question.]
The table shows, for each of a sample of 12 handmade decorative ceramic plaques, the length, \(x\) millimetres, and the width, \(y\) millimetres.
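$$\begin{array}{c|cc} \text{Plaque} & \boldsymbol{x} & \boldsymbol{y} \\ \hline \text{A} & 232 & 109 \\ \text{B} & 235 & 112 \\ \text{C} & 236 & 114 \\ \text{D} & 234 & 118 \\ \text{E} & 230 & 117 \\ \text{F} & 230 & 113 \\ \text{G} & 246 & 121 \\ \text{H} & 240 & 125 \\ \text{I} & 244 & 128 \\ \text{J} & 241 & 122 \\ \text{K} & 246 & 126 \\ \text{L} & 245 & 123 \end{array}$$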
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
  2. Interpret your value in the context of this question.
  3. On Figure 1, complete the scatter diagram for these data.
  4. In fact, the 6 plaques \(\mathrm { A } , \mathrm { B } , \ldots , \mathrm { F }\) are from a different source to the 6 plaques \(\mathrm { G } , \mathrm { H } , \ldots , \mathrm { L }\). With reference to your scatter diagram, but without further calculations, estimate the value of the product moment correlation coefficient between \(x\) and \(y\) for each source of plaque.
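A Python check of part 1, together with the within-source coefficients for part 4 (which the question asks you to estimate, not calculate).

```python
import math

# Plaques A..L: length x and width y in millimetres
x = [232, 235, 236, 234, 230, 230, 246, 240, 244, 241, 246, 245]
y = [109, 112, 114, 118, 117, 113, 121, 125, 128, 122, 126, 123]

def pmcc(p, q):
    """Product moment correlation coefficient via corrected sums."""
    n = len(p)
    s_pq = sum(a * b for a, b in zip(p, q)) - sum(p) * sum(q) / n
    s_pp = sum(a * a for a in p) - sum(p) ** 2 / n
    s_qq = sum(b * b for b in q) - sum(q) ** 2 / n
    return s_pq / math.sqrt(s_pp * s_qq)

r_all = pmcc(x, y)
r_AF = pmcc(x[:6], y[:6])
r_GL = pmcc(x[6:], y[6:])
print(f"all: {r_all:.3f}, A to F: {r_AF:.3f}, G to L: {r_GL:.3f}")
```

The combined value is about 0.807, but within each source the coefficient is close to zero: the overall correlation comes mainly from the gap between the two groups.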
AQA S1 2013 June Q1
7 marks Moderate -0.8
1 The average maximum monthly temperatures, \(u\) degrees Fahrenheit, and the average minimum monthly temperatures, \(v\) degrees Fahrenheit, in New York City are as follows.
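$$\begin{array}{l|cccccccccccc} & \text{Jan} & \text{Feb} & \text{Mar} & \text{Apr} & \text{May} & \text{Jun} & \text{Jul} & \text{Aug} & \text{Sep} & \text{Oct} & \text{Nov} & \text{Dec} \\ \hline \text{Maximum } (u) & 39 & 40 & 48 & 61 & 71 & 81 & 85 & 83 & 77 & 67 & 54 & 41 \\ \text{Minimum } (v) & 26 & 27 & 34 & 44 & 53 & 63 & 68 & 66 & 60 & 51 & 41 & 30 \end{array}$$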
    1. Calculate, to one decimal place, the mean and the standard deviation of the 12 values of the average maximum monthly temperature.
    2. For comparative purposes with a UK city, it was necessary to convert the temperatures from degrees Fahrenheit ( \({ } ^ { \circ } \mathrm { F }\) ) to degrees Celsius ( \({ } ^ { \circ } \mathrm { C }\) ). The formula used to convert \(f ^ { \circ } \mathrm { F }\) to \(c ^ { \circ } \mathrm { C }\) is: $$c = \frac { 5 } { 9 } ( f - 32 )$$ Use this formula and your answers in part (a)(i) to calculate, in \({ } ^ { \circ } \mathbf { C }\), the mean and the standard deviation of the 12 values of the average maximum monthly temperature.
      (3 marks)
  1. The value of the product moment correlation coefficient, \(r _ { u v }\), between the above 12 values of \(u\) and \(v\) is 0.997, correct to three decimal places. State, giving a reason, the corresponding value of \(r _ { x y }\), where \(x\) and \(y\) are the exact equivalent temperatures in \({ } ^ { \circ } \mathrm { C }\) of \(u\) and \(v\) respectively.
    (2 marks)
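Part (a) and the final part in Python. The sketch uses divisor \(n\) for the standard deviation (divisor \(n - 1\) gives about 17.5 °F instead). Under \(c = \tfrac{5}{9}(f - 32)\) the mean follows the full formula, the standard deviation is only multiplied by \(\tfrac{5}{9}\), and \(r\) is unchanged.

```python
import math

u = [39, 40, 48, 61, 71, 81, 85, 83, 77, 67, 54, 41]   # max, degrees F
v = [26, 27, 34, 44, 53, 63, 68, 66, 60, 51, 41, 30]   # min, degrees F
n = len(u)

mean_u = sum(u) / n
sd_u = math.sqrt(sum(t * t for t in u) / n - mean_u ** 2)   # divisor n

# c = 5/9 (f - 32): the mean follows the formula, the sd is only scaled
mean_c = 5 / 9 * (mean_u - 32)
sd_c = 5 / 9 * sd_u

def pmcc(p, q):
    """Product moment correlation coefficient via corrected sums."""
    k = len(p)
    s_pq = sum(a * b for a, b in zip(p, q)) - sum(p) * sum(q) / k
    s_pp = sum(a * a for a in p) - sum(p) ** 2 / k
    s_qq = sum(b * b for b in q) - sum(q) ** 2 / k
    return s_pq / math.sqrt(s_pp * s_qq)

r_uv = pmcc(u, v)
r_xy = pmcc([5 / 9 * (t - 32) for t in u], [5 / 9 * (t - 32) for t in v])

print(f"F: mean {mean_u:.2f}, sd {sd_u:.2f}; C: mean {mean_c:.2f}, sd {sd_c:.2f}")
print(f"r_uv = {r_uv:.3f}, r_xy = {r_xy:.3f}")
```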
Edexcel S1 Q2
7 marks Moderate -0.8
2. A tennis coach believes that taller players are generally capable of hitting faster serves. To investigate this hypothesis he collects data on the 20 adult male players he coaches. The height, \(h\), in metres and the speed of each player's fastest serve, \(v\), in miles per hour were recorded and summarised as follows: $$\Sigma h = 36.22 , \quad \Sigma v = 2275 , \quad \Sigma h ^ { 2 } = 65.7396 , \quad \Sigma v ^ { 2 } = 259853 , \quad \Sigma h v = 4128.03 .$$
  1. Calculate the product moment correlation coefficient for these data.
  2. Comment on the coach's hypothesis.
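A Python sketch of part 1 from the quoted sums; `pmcc_from_sums` is a hypothetical helper name.

```python
import math

def pmcc_from_sums(n, sx, sy, sxx, syy, sxy):
    """r from raw sums, using S_pq = sum(pq) - sum(p) * sum(q) / n."""
    S_xx = sxx - sx ** 2 / n
    S_yy = syy - sy ** 2 / n
    S_xy = sxy - sx * sy / n
    return S_xy / math.sqrt(S_xx * S_yy)

# n = 20 players: height h in metres, serve speed v in mph
r = pmcc_from_sums(20, 36.22, 2275, 65.7396, 259853, 4128.03)
print(f"r = {r:.3f}")
```

The result, about 0.642, indicates moderate positive correlation between height and serve speed for these players.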
Edexcel S1 Q2
10 marks Moderate -0.3
2. A statistics student gave a questionnaire to a random sample of 50 pupils at his school. The sample included pupils aged from 11 to 18 years old. The student summarised the data on age in completed years, \(A\), and the number of hours spent doing homework in the previous week, \(H\), giving the following: $$\Sigma A = 703 , \quad \Sigma H = 217 , \quad \Sigma A ^ { 2 } = 10131 , \quad \Sigma H ^ { 2 } = 1338.5 , \quad \Sigma A H = 3253.5$$
  1. Calculate the product moment correlation coefficient for these data and explain what is shown by your result.
    (6 marks)
    The student also asked each pupil how many hours of paid work they had done in the previous week. He then calculated the product moment correlation coefficient for the data on hours doing homework and hours doing paid work, giving a value of \(r = 0.5213\) The student concluded that paid work did not interfere with homework as pupils doing more paid work also tended to do more homework.
  2. Explain why this conclusion may not be valid.
  3. Explain briefly how the student could more effectively investigate the effect of paid work on homework.
    (2 marks)
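Part 1 can be checked in Python from the given sums with \(n = 50\); the helper name is illustrative.

```python
import math

def pmcc_from_sums(n, sx, sy, sxx, syy, sxy):
    """r from raw sums, using S_pq = sum(pq) - sum(p) * sum(q) / n."""
    S_xx = sxx - sx ** 2 / n
    S_yy = syy - sy ** 2 / n
    S_xy = sxy - sx * sy / n
    return S_xy / math.sqrt(S_xx * S_yy)

# Age A in completed years and homework hours H for 50 pupils
r = pmcc_from_sums(50, 703, 217, 10131, 1338.5, 3253.5)
print(f"r = {r:.3f}")
```

A value of about 0.647 suggests that older pupils tend to spend more time on homework.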
OCR MEI Further Statistics Minor 2019 June Q5
16 marks Standard +0.3
5 A student wants to know if there is a positive correlation between the amounts of two pollutants, sulphur dioxide and PM10 particulates, on different days in the area of London in which he lives; these amounts, measured in suitable units, are denoted by \(s\) and \(p\) respectively.
He uses a government website to obtain data for a random sample of 15 days on which the amounts of these pollutants were measured simultaneously. Fig. 5.1 is a scatter diagram showing the data. Summary statistics for these 15 values of \(s\) and \(p\) are as follows. \(\sum s = 155.4 \quad \sum p = 518.9 \quad \sum s^2 = 2322.7 \quad \sum p^2 = 21270.5 \quad \sum sp = 6009.1\) \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{4a4d5816-5b53-49a1-b72f-f8bcf3b4e8bc-4_935_1134_683_260} \captionsetup{labelformat=empty} \caption{Fig. 5.1}
\end{figure}
  1. Explain why the student might come to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.
  2. Find the value of Pearson's product moment correlation coefficient.
  3. Carry out a test at the \(5 \%\) significance level to investigate whether there is positive correlation between the amounts of sulphur dioxide and PM10 particulates.
  4. Explain why the student made sure that the sample chosen was a random sample. The student also wishes to model the relationship between the amounts of nitrogen dioxide \(n\) and PM10 particulates \(p\).
    He takes a random sample of 54 values of the two variables, both measured at the same times. Fig. 5.2 is a scatter diagram which shows the data, together with the regression line of \(n\) on \(p\), the equation of the regression line and the value of \(r ^ { 2 }\). \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{4a4d5816-5b53-49a1-b72f-f8bcf3b4e8bc-5_824_1230_495_258} \captionsetup{labelformat=empty} \caption{Fig. 5.2}
    \end{figure}
  5. Predict the value of \(n\) for \(p = 150\).
  6. Discuss the reliability of your prediction in part (e).
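Parts 2 and 3 in Python. The one-tailed 5% critical value 0.4409 for \(n = 15\) is quoted from standard PMCC tables and is an assumption of this sketch; the tables supplied with the paper should be used.

```python
import math

n = 15
sum_s, sum_p = 155.4, 518.9
sum_s2, sum_p2, sum_sp = 2322.7, 21270.5, 6009.1

S_ss = sum_s2 - sum_s ** 2 / n
S_pp = sum_p2 - sum_p ** 2 / n
S_sp = sum_sp - sum_s * sum_p / n
r = S_sp / math.sqrt(S_ss * S_pp)

# H0: rho = 0, H1: rho > 0 (one-tailed test at the 5% level)
CRITICAL = 0.4409      # assumed table value for n = 15
reject_h0 = r > CRITICAL
print(f"r = {r:.3f}, reject H0: {reject_h0}")
```

Since \(r \approx 0.412 < 0.4409\), there is insufficient evidence of positive correlation at the 5% level.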
OCR MEI Further Statistics Minor 2024 June Q3
13 marks Standard +0.3
3 The scatter diagram below illustrates data concerning average annual income per person, \(\$x\), and average life expectancy, \(y\) years, for 45 randomly selected cities. \includegraphics[max width=\textwidth, alt={}, center]{464c80be-007b-4d5a-9fe5-2f35100bdea6-3_860_1465_354_244}
  1. State whether neither variable, one variable or both variables can be considered to be random in this situation. A student is researching possible positive association between average annual income and average life expectancy. The student decides that the data point labelled A on the scatter diagram is an outlier.
  2. Describe the apparent relationship between average annual income and average life expectancy for this data point relative to the rest of the data. The data for point A is removed. The student now wishes to carry out a hypothesis test using the product moment correlation coefficient for the remaining 44 data points to investigate whether there is positive correlation between average annual income and average life expectancy.
  3. Explain why this type of hypothesis test is appropriate in this situation. Justify your answer. The summary statistics for these 44 data points are as follows. \(\sum x = 751120 \quad \sum y = 2397.1 \quad \sum x^2 = 14363849200 \quad \sum y^2 = 133014.63 \quad \sum xy = 42465962\)
  4. Determine the value of the product moment correlation coefficient.
  5. Carry out the test at the 1\% significance level.
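Part 4 in Python. Because \(\sum x^2\) is large, the sketch keeps the corrected sums exact with the standard library's `fractions.Fraction` and converts to floating point only at the end, which guards against cancellation error.

```python
import math
from fractions import Fraction as F

n = 44
sum_x, sum_y = F(751120), F("2397.1")
sum_x2, sum_y2, sum_xy = F(14363849200), F("133014.63"), F(42465962)

# Exact corrected sums; convert to float only for the square root
S_xx = sum_x2 - sum_x ** 2 / n
S_yy = sum_y2 - sum_y ** 2 / n
S_xy = sum_xy - sum_x * sum_y / n
r = float(S_xy) / math.sqrt(float(S_xx) * float(S_yy))
print(f"r = {r:.3f}")
```

The result is about 0.80, to be compared with the 1% one-tailed critical value for \(n = 44\) from the tables.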
Edexcel FS2 Specimen Q7
8 marks Standard +0.8
  1. Over a period of time, researchers took 10 blood samples from one patient with a blood disease. For each sample, they measured the levels of serum magnesium, \(s \mathrm { mg } / \mathrm { dl }\), in the blood and the corresponding level of the disease protein, \(d \mathrm { mg } / \mathrm { dl }\). One of the researchers coded the data for each sample using \(x = 10 s\) and \(y = 10 ( d - 9 )\) but spilt ink over his work.
The following summary statistics and unfinished scatter diagram are the only remaining information. $$\sum d ^ { 2 } = 1081.74 \quad \mathrm {~S} _ { d s } = 59.524$$ and $$\sum y = 64 \quad \mathrm {~S} _ { x x } = 2658.9$$ Unfinished scatter diagram (axis label: \(d\) mg/dl): \includegraphics[max width=\textwidth, alt={}, center]{e777c787-0d39-4d84-a0f9-fc4a6712184f-22_983_1534_840_303}
  1. Use the formula for \(\mathrm { S } _ { x x }\) to show that \(\mathrm { S } _ { s s } = 26.589\)
  2. Find the value of the product moment correlation coefficient between \(s\) and \(d\).
  3. With reference to the unfinished scatter diagram, comment on your result in part (b).
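Parts 1 and 2 in Python, working back from the coded statistics: \(x = 10s\) gives \(S_{xx} = 100\,S_{ss}\), and \(y = 10(d - 9)\) gives \(\sum d = \sum y/10 + 9n\).

```python
import math

n = 10
S_xx = 2658.9
S_ss = S_xx / 10 ** 2          # x = 10s, so S_xx = 100 S_ss

sum_y = 64
sum_d = sum_y / 10 + 9 * n     # y = 10(d - 9)  =>  sum d = sum y / 10 + 9n
sum_d2 = 1081.74
S_dd = sum_d2 - sum_d ** 2 / n

S_ds = 59.524
r = S_ds / math.sqrt(S_ss * S_dd)
print(f"S_ss = {S_ss:.3f}, S_dd = {S_dd:.3f}, r = {r:.3f}")
```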
AQA S1 2006 June Q1
8 marks Moderate -0.3
1 The table shows, for each of a random sample of 8 paperback fiction books, the number of pages, \(x\), and the recommended retail price, \(\pounds y\), to the nearest 10 p.
$$\begin{array}{c|cccccccc} \boldsymbol{x} & 223 & 276 & 374 & 433 & 564 & 612 & 704 & 766 \\ \hline \boldsymbol{y} & 6.50 & 4.00 & 5.50 & 8.00 & 4.50 & 5.00 & 8.00 & 5.50 \end{array}$$
    1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
    2. Interpret your value in the context of this question.
    3. Suggest one other variable, in addition to the number of pages, which may affect the recommended retail price of a paperback fiction book.
  1. The same 8 books were later included in a book sale. The value of the product moment correlation coefficient between the number of pages and the sale price was 0.959, correct to three decimal places. What can be concluded from this value?
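Part (a)(i) in Python, computed directly from the table.

```python
import math

x = [223, 276, 374, 433, 564, 612, 704, 766]           # number of pages
y = [6.50, 4.00, 5.50, 8.00, 4.50, 5.00, 8.00, 5.50]   # price in pounds

n = len(x)
S_xx = sum(a * a for a in x) - sum(x) ** 2 / n
S_yy = sum(b * b for b in y) - sum(y) ** 2 / n
S_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
r = S_xy / math.sqrt(S_xx * S_yy)
print(f"r = {r:.3f}")
```

A value of about 0.143 indicates little or no linear relationship between length and recommended price.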
OCR MEI Further Statistics Major Specimen Q3
11 marks Standard +0.3
3 A researcher is investigating factors that might affect how many hours per day different species of mammals spend asleep. First she investigates human beings. She collects data on body mass index, \(x\), and hours of sleep, \(y\), for a random sample of people. A scatter diagram of the data is shown in Fig. 3.1 together with the regression line of \(y\) on \(x\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-04_885_1584_598_274} \captionsetup{labelformat=empty} \caption{Fig. 3.1}
\end{figure}
  1. Calculate the residual for the data point which has the residual with the greatest magnitude.
  2. Use the equation of the regression line to estimate the mean number of hours spent asleep by a person with body mass index
    (A) 26,
    (B) 16,
    commenting briefly on each of your predictions. The researcher then collects additional data for a large number of species of mammals and analyses different factors for effect size. Definitions of the variables measured for a typical animal of the species, the correlations between these variables, and guidelines often used when considering effect size are given in Fig. 3.2.
    $$\begin{array}{l|l} \text{Variable} & \text{Definition} \\ \hline \text{Body mass} & \text{Mass of animal in kg} \\ \text{Brain mass} & \text{Mass of brain in g} \\ \text{Hours of sleep/day} & \text{Number of hours per day spent asleep} \\ \text{Life span} & \text{How many years the animal lives} \\ \text{Danger} & \text{A measure of how dangerous the animal's situation is when asleep, taking into account predators} \\ & \text{and how protected the animal's den is: higher value indicates greater danger.} \end{array}$$
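    $$\begin{array}{l|ccccc} \text{Correlations (pmcc)} & \text{Body Mass} & \text{Brain Mass} & \text{Hours of sleep/day} & \text{Life span} & \text{Danger} \\ \hline \text{Body Mass} & 1.00 & & & & \\ \text{Brain Mass} & 0.93 & 1.00 & & & \\ \text{Hours of sleep/day} & -0.31 & -0.36 & 1.00 & & \\ \text{Life span} & 0.30 & 0.51 & -0.41 & 1.00 & \\ \text{Danger} & 0.13 & 0.15 & -0.59 & 0.06 & 1.00 \end{array}$$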
    \begin{table}[h]
    $$\begin{array}{c|c} \text{Product moment correlation coefficient} & \text{Effect size} \\ \hline 0.1 & \text{Small} \\ 0.3 & \text{Medium} \\ 0.5 & \text{Large} \end{array}$$
    \captionsetup{labelformat=empty} \caption{Fig. 3.2}
    \end{table}
  3. State two conclusions the researcher might draw from these tables, relevant to her investigation into how many hours mammals spend asleep. One of the researcher's students notices the high correlation between body mass and brain mass and produces a scatter diagram for these two variables, shown in Fig. 3.3 below. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-05_675_698_1802_735} \captionsetup{labelformat=empty} \caption{Fig. 3.3}
    \end{figure}
  4. Comment on the suitability of a linear model for these two variables.
Edexcel S3 Q7
16 marks Standard +0.3
For one of the activities at a gymnastics competition, 8 gymnasts were awarded marks out of 10 for each of artistic performance and technical ability. The results were as follows.
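$$\begin{array}{l|cccccccc} \text{Gymnast} & A & B & C & D & E & F & G & H \\ \hline \text{Technical ability} & 8.5 & 8.6 & 9.5 & 7.5 & 6.8 & 9.1 & 9.4 & 9.2 \\ \text{Artistic performance} & 6.2 & 7.5 & 8.2 & 6.7 & 6.0 & 7.2 & 8.0 & 9.1 \end{array}$$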
The value of the product moment correlation coefficient for these data is 0.774.
  1. Stating your hypotheses clearly and using a 1% level of significance, interpret this value. [5]
  2. Calculate the value of the rank correlation coefficient for these data. [6]
  3. Stating your hypotheses clearly and using a 1% level of significance, interpret this coefficient. [3]
  4. Explain why the rank correlation coefficient might be the better one to use with these data. [2]
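Both coefficients in Python. The `ranks` helper is illustrative and assumes no tied values, which holds for these data; with ties, average ranks would be needed and the \(1 - 6\sum d^2/(n(n^2 - 1))\) formula would no longer be exact.

```python
import math

technical = [8.5, 8.6, 9.5, 7.5, 6.8, 9.1, 9.4, 9.2]   # gymnasts A..H
artistic = [6.2, 7.5, 8.2, 6.7, 6.0, 7.2, 8.0, 9.1]

def ranks(values):
    """Rank 1 = highest mark; assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def pmcc(p, q):
    """Product moment correlation coefficient via corrected sums."""
    n = len(p)
    s_pq = sum(a * b for a, b in zip(p, q)) - sum(p) * sum(q) / n
    s_pp = sum(a * a for a in p) - sum(p) ** 2 / n
    s_qq = sum(b * b for b in q) - sum(q) ** 2 / n
    return s_pq / math.sqrt(s_pp * s_qq)

r = pmcc(technical, artistic)

rt, ra = ranks(technical), ranks(artistic)
n = len(rt)
d2 = sum((a - b) ** 2 for a, b in zip(rt, ra))
r_s = 1 - 6 * d2 / (n * (n ** 2 - 1))     # Spearman's coefficient

print(f"r = {r:.3f}, sum d^2 = {d2}, r_s = {r_s:.3f}")
```

The sketch reproduces \(r \approx 0.774\) and gives \(\sum d^2 = 10\), \(r_s \approx 0.881\).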
OCR S1 2009 June Q3
8 marks Moderate -0.3
In an agricultural experiment, the relationship between the amount of water supplied, \(x\) units, and the yield, \(y\) units, was investigated. Six values of \(x\) were chosen and for each value of \(x\) the corresponding value of \(y\) was measured. The results are shown in the table.
$$\begin{array}{c|cccccc} x & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline y & 3 & 6 & 8 & 8 & 11 & 10 \end{array}$$
These results, together with the regression line of \(y\) on \(x\), are plotted on the graph. \includegraphics{figure_1}
  1. Give a reason why the regression line of \(x\) on \(y\) is not suitable in this context. [1]
  2. Explain the significance, for the regression line of \(y\) on \(x\), of the distances shown by the vertical dotted lines in the diagram. [2]
  3. Calculate the value of the product moment correlation coefficient, \(r\). [3]
  4. Comment on your value of \(r\) in relation to the diagram. [2]
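Part 3 in Python, together with the regression line of \(y\) on \(x\) and its residuals: the signed vertical distances whose sum of squares the least squares line minimises.

```python
import math

x = [1, 2, 3, 4, 5, 6]        # water supplied (units)
y = [3, 6, 8, 8, 11, 10]      # yield (units)
n = len(x)

S_xx = sum(a * a for a in x) - sum(x) ** 2 / n
S_yy = sum(b * b for b in y) - sum(y) ** 2 / n
S_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
r = S_xy / math.sqrt(S_xx * S_yy)

# Regression of y on x; residuals are the signed vertical distances
b = S_xy / S_xx
a = sum(y) / n - b * sum(x) / n
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"r = {r:.3f}, line y = {a:.3f} + {b:.3f}x")
print([round(e, 3) for e in residuals])
```

The residuals sum to zero, and \(r \approx 0.930\) reflects points lying close to the line.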
Edexcel S1 Q1
6 marks Moderate -0.8
  1. Draw two separate scatter diagrams, each with eight points, to illustrate the relationship between \(x\) and \(y\) in the cases where they have a product moment correlation coefficient equal to
    1. exactly \(+1\),
    2. about \(-0.4\). [4 marks]
  2. Explain briefly how the conclusion you would draw from a product moment correlation coefficient of \(+0.3\) would vary according to the number of pairs of data used in its calculation. [2 marks]