5.08b Linear coding: effect on pmcc

37 questions

OCR S1 2016 June Q2
10 marks Moderate -0.3
  1. The table shows the amount, \(x\), in hundreds of pounds, spent on heating and the number of absences, \(y\), at a factory during each month in 2014.
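    $$\begin{array}{l|cccccccccccc}
    \text{Amount, } x \text{, spent on heating (£ hundreds)} & 21 & 23 & 19 & 15 & 14 & 5 & 2 & 10 & 9 & 20 & 18 & 23 \\
    \text{Number of absences, } y & 23 & 25 & 18 & 18 & 12 & 10 & 4 & 9 & 11 & 15 & 20 & 26
    \end{array}$$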
    \(n = 12 \quad \Sigma x = 179 \quad \Sigma x ^ { 2 } = 3215 \quad \Sigma y = 191 \quad \Sigma y ^ { 2 } = 3565 \quad \Sigma x y = 3343\)
    1. Calculate \(r\), the product moment correlation coefficient, showing that \(r > 0.92\).
    2. A manager says, 'The value of \(r\) shows that spending more money on heating causes more absences, so we should spend less on heating.' Comment on this claim.
    3. The months in 2014 were numbered \(1,2,3 , \ldots , 12\). The output, \(z\), in suitable units was recorded along with the month number, \(n\), for each month in 2014. The equation of the regression line of \(z\) on \(n\) was found to be \(z = 0.6 n + 17\).
      (a) Use this equation to explain whether output generally increased or decreased over these months.
      (b) Find the mean of \(n\) and use the equation of the regression line to calculate the mean of \(z\).
    4. Hence calculate the total output in 2014.
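The arithmetic in parts 1 and 3(b) can be checked with a short Python sketch. It assumes only the summary statistics quoted above; `s_stat` is a hypothetical helper name for \(S_{ab} = \Sigma ab - \Sigma a \Sigma b / n\). The final lines rely on the fact that a least squares regression line passes through the point of means \((\bar{n}, \bar{z})\).

```python
from math import sqrt

def s_stat(sum_ab, sum_a, sum_b, n):
    """S_ab = sum(ab) - sum(a) * sum(b) / n, from summary statistics."""
    return sum_ab - sum_a * sum_b / n

n = 12
sum_x, sum_x2 = 179, 3215
sum_y, sum_y2 = 191, 3565
sum_xy = 3343

s_xx = s_stat(sum_x2, sum_x, sum_x, n)
s_yy = s_stat(sum_y2, sum_y, sum_y, n)
s_xy = s_stat(sum_xy, sum_x, sum_y, n)
r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")  # r = 0.9235, so r > 0.92

# Part 3(b): months are numbered 1..12, so the mean month number is 6.5.
mean_month = sum(range(1, 13)) / 12
# The regression line passes through the point of means, so substitute the mean.
mean_z = 0.6 * mean_month + 17
total_output = 12 * mean_z  # total = number of months times mean monthly output
print(mean_month, round(mean_z, 2), round(total_output, 1))
```

The same substitution idea answers the "Hence" part: the total is simply 12 times the mean output.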
Edexcel S1 2016 January Q3
15 marks Moderate -0.3
3. A publisher collects information about the amount spent on advertising, \(\pounds x\), and the sales, \(y\) books, for some of her publications. She collects information for a random sample of 8 textbooks and codes the data using \(v = \frac { x + 50 } { 200 }\) and \(s = \frac { y } { 1000 }\) to give
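$$\begin{array}{c|cccccccc}
v & 0.60 & 8.10 & 4.30 & 0.40 & 1.60 & 6.40 & 2.50 & 5.10 \\
s & 1.84 & 6.73 & 5.95 & 1.30 & 2.45 & 7.46 & 4.82 & 6.25
\end{array}$$
[You may use: \(\sum v = 29 \quad \sum s = 36.8 \quad \sum s ^ { 2 } = 209.72 \quad \sum v s = 177.311 \quad \mathrm {~S} _ { v v } = 55.275\) ]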
  1. Find \(\mathrm { S } _ { v s }\) and \(\mathrm { S } _ { s s }\)
  2. Calculate the product moment correlation coefficient for these data. The publisher believes that a linear regression model may be appropriate to describe these data.
  3. State, giving a reason, whether or not your answer to part (b) supports the publisher's belief.
  4. Find the equation of the regression line of \(s\) on \(v\), giving your answer in the form \(s = a + b v\)
  5. Hence find the equation of the regression line of \(y\) on \(x\) for the sample of textbooks, giving your answer in the form \(y = c + d x\). The publisher calculated the regression line for a sample of novels and obtained the equation $$y = 3100 + 1.2 x$$ She wants to increase the sales of books by spending more money on advertising.
  6. State, giving your reasons, whether the publisher should spend more money on advertising textbooks or novels.
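A minimal sketch of parts 1 to 5, assuming only the coded summary statistics given in the question. It fits the line of \(s\) on \(v\) and then undoes the coding \(v = (x + 50)/200\), \(s = y/1000\) by substitution; the variable names are illustrative.

```python
from math import sqrt

# Coded summary statistics (n = 8) from the question.
n = 8
sum_v, sum_s = 29, 36.8
sum_s2, sum_vs = 209.72, 177.311
s_vv = 55.275

s_vs = sum_vs - sum_v * sum_s / n   # 43.911
s_ss = sum_s2 - sum_s ** 2 / n      # 40.44
r = s_vs / sqrt(s_vv * s_ss)        # coding with positive scales leaves r unchanged

# Least squares line of s on v: s = a + b v.
b = s_vs / s_vv
a = sum_s / n - b * sum_v / n

# Substitute v = (x + 50) / 200 and s = y / 1000:
# y = 1000 * (a + b * (x + 50) / 200) = c + d x.
d = 1000 * b / 200
c = 1000 * (a + 50 * b / 200)
print(f"r = {r:.3f}; s = {a:.3f} + {b:.3f}v; y = {c:.0f} + {d:.2f}x")
```

With these inputs \(d\) comes out at about 3.97 extra books per pound spent, which can be compared directly with the gradient 1.2 of the novels' line.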
Edexcel S1 2017 January Q3
17 marks Moderate -0.3
  1. A scientist measured the salinity of water, \(x \mathrm {~g} / \mathrm { kg }\), and recorded the temperature at which the water froze, \(y ^ { \circ } \mathrm { C }\), for 12 different water samples. The summary statistics are listed below.
$$\begin{gathered} \sum x = 504 \quad \sum y = - 27 \quad \sum x ^ { 2 } = 22842 \quad \sum y ^ { 2 } = 62.98 \\ \sum x y = - 1190.7 \quad \mathrm {~S} _ { x x } = 1674 \quad \mathrm {~S} _ { y y } = 2.23 \end{gathered}$$
  1. Find the mean and variance of the recorded temperatures.
    Priya believes that the higher the salinity of water, the higher the temperature at which the water freezes.
    1. Calculate the product moment correlation coefficient between \(x\) and \(y\)
    2. State, with a reason, whether or not this value supports Priya's belief.
  2. Find the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\) Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  3. Estimate the temperature at which water freezes when the salinity is \(32 \mathrm {~g} / \mathrm { kg }\). The coding \(w = 1.8 y + 32\) is used to convert the recorded temperatures from \({ } ^ { \circ } \mathrm { C }\) to \({ } ^ { \circ } \mathrm { F }\).
  4. Find an equation of the least squares regression line of \(w\) on \(x\) in the form \(w = c + d x\)
  5. Find
    1. the variance of the recorded temperatures when converted to \({ } ^ { \circ } \mathrm { F }\)
    2. the product moment correlation coefficient between \(w\) and \(x\)
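The effect of the coding \(w = 1.8y + 32\) can be traced in a short Python sketch. It assumes the summary statistics above and the S1 convention of a divisor \(n\) for the variance (an assumption worth checking against your specification). The sketch shows that the gradient and intercept scale by 1.8, the intercept also gains 32, the variance scales by \(1.8^2\), and the PMCC is unchanged.

```python
from math import sqrt

n = 12
sum_x, sum_y, sum_xy = 504, -27, -1190.7
s_xx, s_yy = 1674, 2.23

s_xy = sum_xy - sum_x * sum_y / n        # -56.7
mean_x, mean_y = sum_x / n, sum_y / n    # 42 and -2.25
var_y = s_yy / n                         # variance with divisor n (assumed convention)

r = s_xy / sqrt(s_xx * s_yy)
b = s_xy / s_xx
a = mean_y - b * mean_x
y_at_32 = a + b * 32                     # predicted freezing point at 32 g/kg

# Coding w = 1.8y + 32: gradient and intercept are multiplied by 1.8,
# and the intercept also gains the constant 32.
d = 1.8 * b
c = 1.8 * a + 32
var_w = 1.8 ** 2 * var_y                 # adding 32 does not change the spread
s_ww, s_xw = 1.8 ** 2 * s_yy, 1.8 * s_xy
r_wx = s_xw / sqrt(s_xx * s_ww)          # equals r because 1.8 > 0
print(f"y = {a:.3f} + ({b:.4f})x, y(32) = {y_at_32:.2f}")
print(f"w = {c:.2f} + ({d:.4f})x, var(w) = {var_w:.3f}, r = {r:.3f}, r_wx = {r_wx:.3f}")
```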
Edexcel S1 2018 January Q3
8 marks Moderate -0.8
3. Martin is investigating the relationship between a person's daily caffeine consumption, \(c\) milligrams, and the amount of sleep they get, \(h\) hours, per night. He collected this information from 20 people and the results are summarised below. $$\begin{array} { c c } \sum c = 3660 \quad \sum h = 126 \quad \sum c ^ { 2 } = 973228 \\ \sum c h = 20023.4 \quad S _ { c c } = 303448 \quad S _ { c h } = - 3034.6 \end{array}$$ Martin calculates the product moment correlation coefficient for these data and obtains \(-0.833\).
  1. Give a reason why this value supports a linear relationship between \(c\) and \(h\). The amount of sleep per night is the response variable.
  2. Explain what you understand by the term 'response variable'. Martin says that for each additional 100 mg of caffeine consumed, the expected number of hours of sleep decreases by 1.
  3. Determine, by calculation, whether or not the data support this statement.
  4. Use the data to calculate an estimate for the expected number of hours of sleep per night when no caffeine is consumed.
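Parts 3 and 4 reduce to the gradient and intercept of the least squares line of \(h\) on \(c\). A minimal sketch from the given summaries; note that the estimate at \(c = 0\) assumes the linear model still holds at that end of the data.

```python
n = 20
sum_c, sum_h = 3660, 126
s_cc, s_ch = 303448, -3034.6

b = s_ch / s_cc                   # change in expected sleep (hours) per mg
change_per_100mg = 100 * b        # about -1.000, in line with Martin's statement
a = sum_h / n - b * sum_c / n     # intercept: predicted sleep when c = 0
print(f"h = {a:.2f} + ({b:.5f})c; per 100 mg: {change_per_100mg:.3f} h")
```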
Edexcel S1 2018 January Q5
12 marks Moderate -0.3
5. Franca is the manager of an accountancy firm. She is investigating the relationship between the salary, \(\pounds x\), and the length of commute, \(y\) minutes, for employees at the firm. She collected this information from 9 randomly selected employees. The salary of each employee was then coded using \(w = \frac { x - 20000 } { 1000 }\). The table shows the values of \(w\) and \(y\) for the 9 employees.
$$\begin{array}{c|ccccccccc}
w & 6 & 8 & 8 & -1 & 25 & 15 & 3 & -2 & 19 \\
y & 45 & 50 & 35 & 65 & 25 & 40 & 50 & 75 & 20
\end{array}$$
(You may use \(\sum w = 81 \quad \sum y = 405 \quad \sum w y = 2490 \quad S _ { w w } = 660 \quad S _ { y y } = 2500\) )
  1. Calculate the salary of the employee with \(w = - 2\)
  2. Show that, to 3 significant figures, the value of the product moment correlation coefficient between \(w\) and \(y\) is - 0.899
  3. State, giving a reason, the value of the product moment correlation coefficient between \(x\) and \(y\). The least squares regression line of \(y\) on \(w\) is \(y = 60.75 - 1.75 w\).
  4. Find the equation of the least squares regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\)
  5. Estimate the length of commute for an employee with a salary of \(\pounds 21000\). Franca uses the regression line to estimate the length of commute for employees with salaries between \(\pounds 25000\) and \(\pounds 40000\).
  6. State, giving a reason, whether or not these estimates are reliable.
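A Python sketch of the whole question, assuming the lists below are the table above read column by column (they reproduce the quoted \(\Sigma w = 81\), \(\Sigma y = 405\), \(\Sigma wy = 2490\), \(S_{ww} = 660\) and \(S_{yy} = 2500\)). It decodes \(w\) back to salaries to confirm the PMCC is unchanged, rewrites the regression line in terms of \(x\), and prints the observed salary range for judging interpolation. `s_stat` is a hypothetical helper name.

```python
from math import sqrt

# The table above, read column by column.
w = [6, 8, 8, -1, 25, 15, 3, -2, 19]      # coded salary, w = (x - 20000) / 1000
y = [45, 50, 35, 65, 25, 40, 50, 75, 20]  # commute, minutes

def s_stat(a, b):
    """S_ab = sum(ab) - sum(a) * sum(b) / n for paired lists."""
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / len(a)

s_ww, s_yy, s_wy = s_stat(w, w), s_stat(y, y), s_stat(w, y)
r_wy = s_wy / sqrt(s_ww * s_yy)

# Decode to salaries in pounds: x = 20000 + 1000 w.
x = [20000 + 1000 * wi for wi in w]
r_xy = s_stat(x, y) / sqrt(s_stat(x, x) * s_stat(y, y))

# Substitute w = (x - 20000) / 1000 into y = 60.75 - 1.75 w.
b = -1.75 / 1000
a = 60.75 + 1.75 * 20000 / 1000
commute_21000 = a + b * 21000
print(f"r_wy = {r_wy:.3f}, r_xy = {r_xy:.3f}, y = {a} + ({b})x")
print(f"commute at 21000: {commute_21000:.1f} min; salary range {min(x)} to {max(x)}")
```

Because salaries of £25000 to £40000 lie inside the observed range, those estimates are interpolations rather than extrapolations.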
OCR H240/02 2018 June Q11
6 marks Moderate -0.8
11 Christa used Pearson's product-moment correlation coefficient, \(r\), to compare the use of public transport with the use of private vehicles for travel to work in the UK.
  1. Using the pre-release data set for all 348 UK Local Authorities, she considered the following four variables.
    • \(x\): number of employees using public transport
    • \(y\): number of employees using private vehicles
    • \(a\): proportion of employees using public transport
    • \(b\): proportion of employees using private vehicles
    1. Explain, in context, why you would expect strong, positive correlation between \(x\) and \(y\).
    2. Explain, in context, what kind of correlation you would expect between \(a\) and \(b\).
    3. Christa also considered the data for the 33 London boroughs alone and she generated the following scatter diagram. \begin{figure}[h]
      \captionsetup{labelformat=empty} \caption{London} \includegraphics[alt={},max width=\textwidth]{65d9d34c-8c78-45fe-b9f0-dab071ae56bb-07_467_707_1366_653}
      \end{figure} One London Borough is represented by an outlier in the diagram.
      (a) Suggest what effect this outlier is likely to have on the value of \(r\) for the 32 London Boroughs.
      (b) Suggest what effect this outlier is likely to have on the value of \(r\) for the whole country.
    4. What can you deduce about the area of the London Borough represented by the outlier? Explain your answer.
Edexcel S1 2019 June Q6
13 marks Moderate -0.8
  1. Ranpose hospital offers services to a large number of clinics that refer patients to a range of hospitals.
    The manager at Ranpose hospital took a random sample of 16 clinics and recorded
  • the distance, \(x \mathrm {~km}\), of the clinic from Ranpose hospital
  • the percentage, \(y \%\), of the referrals from the clinic who attend Ranpose hospital.
The data are summarised as $$\bar { x } = 8.1 \quad \bar { y } = 20.5 \quad \sum y ^ { 2 } = 8266 \quad \mathrm {~S} _ { x x } = 368.16 \quad \mathrm {~S} _ { x y } = - 630.9$$
  1. Find the product moment correlation coefficient for these data.
  2. Give an interpretation of your correlation coefficient. The manager at Ranpose hospital believes that there may be a linear relationship between the distance of a clinic from the hospital and the percentage of the referrals who attend the hospital. She drew the following scatter diagram for these data. \includegraphics[max width=\textwidth, alt={}, center]{9ac7647f-b291-4a64-9518-fa6438a0cc7d-20_1106_926_1133_511}
  3. State, giving a reason, whether or not these data support the manager's belief.
  4. Find the equation of the regression line of \(y\) on \(x\), giving your answer in the form $$y = a + b x$$
  5. Give an interpretation of the gradient of your regression line.
  6. Draw your regression line on the scatter diagram. The manager believes that Ranpose hospital should be attracting an "above average" percentage of referrals from clinics that are less than 5 km from the hospital. She proposes to target one clinic with some extra publicity about the services Ranpose offers.
  7. On the scatter diagram circle the point representing the clinic she should target.
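For parts 1 and 4, \(S_{yy}\) is not quoted directly but follows from \(S_{yy} = \Sigma y^2 - n\bar{y}^2\). A minimal sketch using only the summaries given (variable names are illustrative):

```python
from math import sqrt

n = 16
mean_x, mean_y = 8.1, 20.5
sum_y2 = 8266
s_xx, s_xy = 368.16, -630.9

s_yy = sum_y2 - n * mean_y ** 2     # 8266 - 16 * 20.5^2 = 1542
r = s_xy / sqrt(s_xx * s_yy)
b = s_xy / s_xx                     # change in % attending per extra km
a = mean_y - b * mean_x
print(f"S_yy = {s_yy:.0f}, r = {r:.3f}, y = {a:.1f} + ({b:.2f})x")
```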
Edexcel S1 2007 January Q1
15 marks Moderate -0.8
  1. As part of a statistics project, Gill collected data relating to the length of time, to the nearest minute, spent by shoppers in a supermarket and the amount of money they spent. Her data for a random sample of 10 shoppers are summarised in the table below, where \(t\) represents time and \(\pounds m\) the amount spent over \(\pounds 20\).
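$$\begin{array}{l|cccccccccc}
t \text{ (minutes)} & 15 & 23 & 5 & 16 & 30 & 6 & 32 & 23 & 35 & 27 \\
\pounds m & -3 & 17 & -19 & 4 & 12 & -9 & 27 & 6 & 20 & 6
\end{array}$$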
  1. Write down the actual amount spent by the shopper who was in the supermarket for 15 minutes.
  2. Calculate \(S _ { t t } , S _ { m m }\) and \(S _ { t m }\). $$\text { (You may use } \Sigma t ^ { 2 } = 5478, \quad \Sigma m ^ { 2 } = 2101, \quad \Sigma t m = 2485 \text { ) }$$
  3. Calculate the value of the product moment correlation coefficient between \(t\) and \(m\).
  4. Write down the value of the product moment correlation coefficient between \(t\) and the actual amount spent. Give a reason to justify your value. On another day Gill collected similar data. For these data the product moment correlation coefficient was 0.178.
  5. Give an interpretation of both of these coefficients.
  6. Suggest a practical reason why these two values are so different.
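A short Python check of parts 1 to 4, assuming the lists below are the table above read column by column (they reproduce the quoted \(\Sigma t^2\), \(\Sigma m^2\) and \(\Sigma tm\)). Adding back the £20 is a pure shift, so the PMCC with the actual amount spent is identical. `s_stat` and `pmcc` are hypothetical helper names.

```python
from math import sqrt

t = [15, 23, 5, 16, 30, 6, 32, 23, 35, 27]   # minutes in the supermarket
m = [-3, 17, -19, 4, 12, -9, 27, 6, 20, 6]    # amount spent minus 20 pounds

def s_stat(a, b):
    """S_ab = sum(ab) - sum(a) * sum(b) / n for paired lists."""
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / len(a)

def pmcc(a, b):
    return s_stat(a, b) / sqrt(s_stat(a, a) * s_stat(b, b))

actual = [mi + 20 for mi in m]  # undo the coding: actual spend in pounds
print(round(s_stat(t, t), 1), round(s_stat(m, m), 1), round(s_stat(t, m), 1))
print(round(pmcc(t, m), 3), round(pmcc(t, actual), 3))
```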
Edexcel S1 2007 June Q1
4 marks Easy -1.2
  1. A young family were looking for a new 3 bedroom semi-detached house. A local survey recorded the price \(x\), in \(\pounds 1000\), and the distance \(y\), in miles, from the station of such houses. The following summary statistics were provided
$$S _ { x x } = 113573 , \quad S _ { y y } = 8.657 , \quad S _ { x y } = - 808.917$$
  1. Use these values to calculate the product moment correlation coefficient.
  2. Give an interpretation of your answer to part (a). Another family asked for the distances to be measured in km rather than miles.
  3. State the value of the product moment correlation coefficient in this case.
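Why the answer to part 3 needs no new data: rescaling \(y\) by a positive factor \(k\) multiplies \(S_{yy}\) by \(k^2\) and \(S_{xy}\) by \(k\), and the factors cancel in \(r\). A minimal sketch using 1 mile = 1.609344 km:

```python
from math import sqrt

s_xx, s_yy, s_xy = 113573, 8.657, -808.917
r_miles = s_xy / sqrt(s_xx * s_yy)

k = 1.609344  # kilometres per mile
# Rescaled S values: S_yy picks up k**2 and S_xy picks up k.
r_km = (k * s_xy) / sqrt(s_xx * k ** 2 * s_yy)
print(round(r_miles, 3), round(r_km, 3))
```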
Edexcel S1 2008 June Q4
15 marks Moderate -0.8
4. Crickets make a noise. The pitch, \(v \mathrm { kHz }\), of the noise made by a cricket was recorded at 15 different temperatures, \(t ^ { \circ } \mathrm { C }\). These data are summarised below. $$\sum t ^ { 2 } = 10922.81 , \sum v ^ { 2 } = 42.3356 , \sum t v = 677.971 , \sum t = 401.3 , \sum v = 25.08$$
  1. Find \(S _ { t t } , S _ { v v }\) and \(S _ { t v }\) for these data.
  2. Find the product moment correlation coefficient between \(t\) and \(v\).
  3. State, with a reason, which variable is the explanatory variable.
  4. Give a reason to support fitting a regression model of the form \(v = a + b t\) to these data.
  5. Find the value of \(a\) and the value of \(b\). Give your answers to 3 significant figures.
  6. Using this model, predict the pitch of the noise at \(19 ^ { \circ } \mathrm { C }\).
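A compact sketch of parts 1, 2, 5 and 6 from the summary statistics, with temperature \(t\) as the explanatory variable. Names are illustrative.

```python
from math import sqrt

n = 15
sum_t, sum_v = 401.3, 25.08
sum_t2, sum_v2, sum_tv = 10922.81, 42.3356, 677.971

s_tt = sum_t2 - sum_t ** 2 / n
s_vv = sum_v2 - sum_v ** 2 / n
s_tv = sum_tv - sum_t * sum_v / n
r = s_tv / sqrt(s_tt * s_vv)

# Least squares line v = a + b t.
b = s_tv / s_tt
a = sum_v / n - b * sum_t / n
pitch_at_19 = a + b * 19
print(f"S_tt = {s_tt:.2f}, S_vv = {s_vv:.4f}, S_tv = {s_tv:.4f}, r = {r:.3f}")
print(f"v = {a:.3g} + {b:.3g}t; v(19) = {pitch_at_19:.2f} kHz")
```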
Edexcel S1 2009 June Q1
6 marks Easy -1.2
  1. The volume of a sample of gas is kept constant. The gas is heated and the pressure, \(p\), is measured at 10 different temperatures, \(t\). The results are summarised below. \(\sum p = 445 \quad \sum p ^ { 2 } = 38125 \quad \sum t = 240 \quad \sum t ^ { 2 } = 27520 \quad \sum p t = 26830\)
    1. Find \(\mathrm { S } _ { p p }\) and \(\mathrm { S } _ { p t }\).
    Given that \(\mathrm { S } _ { t t } = 21760\),
  2. calculate the product moment correlation coefficient.
  3. Give an interpretation of your answer to part (b).
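The same summary-statistic formulas give parts 1 and 2; recomputing \(S_{tt}\) also confirms the value quoted in the question. A minimal sketch:

```python
from math import sqrt

n = 10
sum_p, sum_p2 = 445, 38125
sum_t, sum_t2 = 240, 27520
sum_pt = 26830

s_pp = sum_p2 - sum_p ** 2 / n      # 18322.5
s_pt = sum_pt - sum_p * sum_t / n   # 16150
s_tt = sum_t2 - sum_t ** 2 / n      # 21760, matching the given value
r = s_pt / sqrt(s_pp * s_tt)
print(s_pp, s_pt, s_tt, round(r, 3))
```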
Edexcel S1 2012 June Q2
6 marks Moderate -0.8
2. A bank reviews its customer records at the end of each month to find out how many customers have become unemployed, \(u\), and how many have had their house repossessed, \(h\), during that month. The bank codes the data using variables \(x = \frac { u - 100 } { 3 }\) and \(y = \frac { h - 20 } { 7 }\). The results for the 12 months of 2009 are summarised below. $$\sum x = 477 \quad S _ { x x } = 5606.25 \quad \sum y = 480 \quad S _ { y y } = 4244 \quad \sum x y = 23070$$
  1. Calculate the value of the product moment correlation coefficient for \(x\) and \(y\).
  2. Write down the product moment correlation coefficient for \(u\) and \(h\). The bank claims that an increase in unemployment among its customers is associated with an increase in house repossessions.
  3. State, with a reason, whether or not the bank's claim is supported by these data.
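A sketch of why part 2 is a "write down": decoding \(u = 100 + 3x\), \(h = 20 + 7y\) removes the shifts from every \(S\) value and multiplies \(S_{uu}\), \(S_{hh}\) and \(S_{uh}\) by \(9\), \(49\) and \(21\), which cancel in \(r\).

```python
from math import sqrt

n = 12
sum_x, sum_y, sum_xy = 477, 480, 23070
s_xx, s_yy = 5606.25, 4244

s_xy = sum_xy - sum_x * sum_y / n     # 3990
r_xy = s_xy / sqrt(s_xx * s_yy)

# Decode: u = 100 + 3x and h = 20 + 7y.
s_uu, s_hh, s_uh = 9 * s_xx, 49 * s_yy, 21 * s_xy
r_uh = s_uh / sqrt(s_uu * s_hh)
print(round(r_xy, 3), round(r_uh, 3))
```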
Edexcel S1 2013 June Q5
11 marks Moderate -0.3
5. A researcher believes that parents with a short family name tended to give their children a long first name. A random sample of 10 children was selected and the number of letters in their family name, \(x\), and the number of letters in their first name, \(y\), were recorded. The data are summarised as: $$\sum x = 60 , \quad \sum y = 61 , \quad \sum y ^ { 2 } = 393 , \quad \sum x y = 382 , \quad \mathrm {~S} _ { x x } = 28$$
  1. Find \(\mathrm { S } _ { y y }\) and \(\mathrm { S } _ { x y }\)
  2. Calculate the product moment correlation coefficient, \(r\), between \(x\) and \(y\).
  3. State, giving a reason, whether or not these data support the researcher's belief. The researcher decides to add a child with family name "Turner" to the sample.
  4. Using the definition \(\mathrm { S } _ { x x } = \sum ( x - \bar { x } ) ^ { 2 }\), state the new value of \(\mathrm { S } _ { x x }\) giving a reason for your answer. Given that the addition of the child with family name "Turner" to the sample leads to an increase in \(\mathrm { S } _ { y y }\)
  5. use the definition \(\mathrm { S } _ { x y } = \sum ( x - \bar { x } ) ( y - \bar { y } )\) to determine whether or not the value of \(r\) will increase, decrease or stay the same. Give a reason for your answer.
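The reasoning in parts 4 and 5 can be checked numerically. "Turner" has 6 letters, which equals \(\bar{x} = 60/10\), so the new point contributes nothing to \(S_{xx}\) or \(S_{xy}\) whatever the child's first-name length. The helper `updated_s` (a hypothetical name) recomputes both from running totals after appending one point; the first-name lengths tried are hypothetical.

```python
from math import sqrt

# Given summaries for the original 10 children.
n, sum_x, sum_y = 10, 60, 61
sum_y2, sum_xy, s_xx = 393, 382, 28

s_yy = sum_y2 - sum_y ** 2 / n      # 20.9
s_xy = sum_xy - sum_x * sum_y / n   # 16
r = s_xy / sqrt(s_xx * s_yy)        # about 0.661: positive, not negative
sum_x2 = s_xx + sum_x ** 2 / n      # recover sum of x^2 = 388

def updated_s(x_new, y_new):
    """Return (S_xx, S_xy) after appending the point (x_new, y_new)."""
    m = n + 1
    sx, sy = sum_x + x_new, sum_y + y_new
    sxx = sum_x2 + x_new ** 2 - sx ** 2 / m
    sxy = sum_xy + x_new * y_new - sx * sy / m
    return sxx, sxy

for y_new in (3, 6, 11):  # hypothetical first-name lengths
    print(y_new, updated_s(6, y_new))
```

Since \(S_{xx}\) and \(S_{xy}\) stay at 28 and 16 while \(S_{yy}\) increases, \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\) must decrease.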
Edexcel S1 2013 June Q1
13 marks Moderate -0.8
  1. A meteorologist believes that there is a relationship between the height above sea level, \(h \mathrm {~m}\), and the air temperature, \(t ^ { \circ } \mathrm { C }\). Data is collected at the same time from 9 different places on the same mountain. The data is summarised in the table below.
$$\begin{array}{c|ccccccccc}
h & 1400 & 1100 & 260 & 840 & 900 & 550 & 1230 & 100 & 770 \\
t & 3 & 10 & 20 & 9 & 10 & 13 & 5 & 24 & 16
\end{array}$$
[You may assume that \(\sum h = 7150 , \sum t = 110 , \sum h ^ { 2 } = 7171500 , \sum t ^ { 2 } = 1716\), \(\sum t h = 64980\) and \(\mathrm { S } _ { t t } = 371.56\) ]
  1. Calculate \(\mathrm { S } _ { t h }\) and \(\mathrm { S } _ { h h }\). Give your answers to 3 significant figures.
  2. Calculate the product moment correlation coefficient for this data.
  3. State whether or not your value supports the use of a regression equation to predict the air temperature at different heights on this mountain. Give a reason for your answer.
  4. Find the equation of the regression line of \(t\) on \(h\) giving your answer in the form \(t = a + b h\).
  5. Interpret the value of \(b\).
  6. Estimate the difference in air temperature between a height of 500 m and a height of 1000 m.
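A Python sketch of the full question, assuming the lists below are the table above read column by column (they reproduce all the quoted sums). Part 6 needs only the gradient: the predicted difference over 500 m is \(500|b|\).

```python
from math import sqrt

h = [1400, 1100, 260, 840, 900, 550, 1230, 100, 770]  # height, m
t = [3, 10, 20, 9, 10, 13, 5, 24, 16]                 # temperature, degrees C

def s_stat(a, b):
    """S_ab = sum(ab) - sum(a) * sum(b) / n for paired lists."""
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / len(a)

s_th, s_hh, s_tt = s_stat(t, h), s_stat(h, h), s_stat(t, t)
r = s_th / sqrt(s_hh * s_tt)
b = s_th / s_hh                              # change in t per metre of height
a = sum(t) / len(t) - b * sum(h) / len(h)
print(f"S_th = {s_th:.1f}, S_hh = {s_hh:.1f}, r = {r:.3f}")
print(f"t = {a:.2f} + ({b:.5f})h; drop over 500 m = {-500 * b:.2f} C")
```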
Edexcel S1 2015 June Q2
8 marks Moderate -0.8
2. An estate agent recorded the price per square metre, \(p \pounds / \mathrm { m } ^ { 2 }\), for 7 two-bedroom houses. He then coded the data using the coding \(q = \frac { p - a } { b }\), where \(a\) and \(b\) are positive constants. His results are shown in the table below.
$$\begin{array}{c|ccccccc}
p & 1840 & 1848 & 1830 & 1824 & 1819 & 1834 & 1850 \\
q & 4.0 & 4.8 & 3.0 & 2.4 & 1.9 & 3.4 & 5.0
\end{array}$$
  1. Find the value of \(a\) and the value of \(b\). The estate agent also recorded the distance, \(d \mathrm {~km}\), of each house from the nearest train station. The results are summarised below. $$\mathrm { S } _ { d d } = 1.02 \quad \mathrm {~S} _ { q q } = 8.22 \quad \mathrm {~S} _ { d q } = - 2.17$$
  2. Calculate the product moment correlation coefficient between \(d\) and \(q\)
  3. Write down the value of the product moment correlation coefficient between \(d\) and \(p\). The estate agent records the price and size of 2 additional two-bedroom houses, \(H\) and \(J\).
    $$\begin{array}{c|c|c}
    \text{House} & \text{Price } (\pounds) & \text{Size } (\mathrm{m}^2) \\ \hline
    H & 156400 & 85 \\
    J & 172900 & 95
    \end{array}$$
  4. Suggest which house is most likely to be closer to a train station. Justify your answer.
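A minimal sketch of the whole question. Because \(q = (p - a)/b\) is linear in \(p\), any two rows of the table fix \(a\) and \(b\); the PMCC between \(d\) and \(q\) is then also the PMCC between \(d\) and \(p\), since \(b > 0\). Part 4 compares the houses on price per square metre, the variable that is negatively correlated with distance.

```python
from math import sqrt

p = [1840, 1848, 1830, 1824, 1819, 1834, 1850]
q = [4.0, 4.8, 3.0, 2.4, 1.9, 3.4, 5.0]

# Two rows fix the coding: b = change in p per unit change in q.
b = (p[6] - p[0]) / (q[6] - q[0])    # (1850 - 1840) / (5.0 - 4.0) = 10
a = p[0] - b * q[0]                  # 1840 - 40 = 1800
# Check the coding against every row of the table.
assert all(abs((pi - a) / b - qi) < 1e-9 for pi, qi in zip(p, q))

s_dd, s_qq, s_dq = 1.02, 8.22, -2.17
r_dq = s_dq / sqrt(s_dd * s_qq)      # also r between d and p, since b > 0

# Price per square metre of the two additional houses.
per_m2 = {"H": 156400 / 85, "J": 172900 / 95}
print(a, b, round(r_dq, 3), per_m2)
```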
Edexcel S1 2018 June Q6
14 marks Moderate -0.8
6. A group of climbers collected information about the height above sea level, \(h\) metres, and the air temperature, \(t ^ { \circ } \mathrm { C }\), at the same time at 8 different points on the same mountain. The data are summarised by $$\sum h = 6370 \quad \sum t = 61 \quad \sum t h = 31070 \quad \sum t ^ { 2 } = 693$$
  1. Show that \(\mathrm { S } _ { \text {th } } = - 17501.25\) and \(\mathrm { S } _ { \text {tt } } = 227.875\). The product moment correlation coefficient for these data is \(-0.985\).
  2. State, giving a reason, whether or not this value supports the use of a regression equation to predict the air temperature at different heights on this mountain.
  3. Find the equation of the regression line of \(t\) on \(h\), giving your answer in the form \(t = a + b h\). Give the value of your coefficients to 3 significant figures.
  4. Give an interpretation of your value of \(a\). One of the climbers has just stopped for a short break before climbing the next 150 metres.
  5. Estimate the drop in temperature over this 150 metre climb.
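The "show that" in part 1 is a direct application of the summary-statistic formulas; a minimal check:

```python
n = 8
sum_h, sum_t = 6370, 61
sum_th, sum_t2 = 31070, 693

s_th = sum_th - sum_h * sum_t / n   # 31070 - 48571.25
s_tt = sum_t2 - sum_t ** 2 / n      # 693 - 465.125
print(s_th, s_tt)  # -17501.25 227.875
```

Fitting the line in part 3 additionally needs \(S_{hh}\) (or \(\Sigma h^2\)), which is not among the totals listed above.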
Edexcel S1 2004 November Q6
18 marks Easy -1.2
6. Students in Mr Brawn's exercise class have to do press-ups and sit-ups. The number of press-ups \(x\) and the number of sit-ups \(y\) done by a random sample of 8 students are summarised below. $$\begin{array} { l l } \Sigma x = 272 , & \Sigma x ^ { 2 } = 10164 , \quad \Sigma x y = 11222 , \\ \Sigma y = 320 , & \Sigma y ^ { 2 } = 13464 . \end{array}$$
  1. Evaluate \(S _ { x x } , S _ { y y }\) and \(S _ { x y }\).
  2. Calculate, to 3 decimal places, the product moment correlation coefficient between \(x\) and \(y\).
  3. Give an interpretation of your coefficient.
  4. Calculate the mean and the standard deviation of the number of press-ups done by these students. Mr Brawn assumes that the number of press-ups that can be done by any student can be modelled by a normal distribution with mean \(\mu\) and standard deviation \(\sigma\). Assuming that \(\mu\) and \(\sigma\) take the same values as those calculated in part (d),
  5. find the value of \(a\) such that \(\mathrm { P } ( \mu - a < X < \mu + a ) = 0.95\).
  6. Comment on Mr Brawn's assumption of normality.
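A sketch of parts 1 to 5. It assumes the S1 convention of divisor \(n\) for the standard deviation, and uses Python's `statistics.NormalDist` to get the 97.5% point of the standard normal, since \(\mathrm{P}(\mu - a < X < \mu + a) = 0.95\) means \(a = z\sigma\) with \(\Phi(z) = 0.975\).

```python
from math import sqrt
from statistics import NormalDist

n = 8
sum_x, sum_x2, sum_xy = 272, 10164, 11222
sum_y, sum_y2 = 320, 13464

s_xx = sum_x2 - sum_x ** 2 / n      # 916
s_yy = sum_y2 - sum_y ** 2 / n      # 664
s_xy = sum_xy - sum_x * sum_y / n   # 342
r = s_xy / sqrt(s_xx * s_yy)

mean_x = sum_x / n                  # 34
sd_x = sqrt(s_xx / n)               # divisor n (assumed convention)
z = NormalDist().inv_cdf(0.975)     # about 1.96
a = z * sd_x
print(round(r, 3), mean_x, round(sd_x, 2), round(a, 2))
```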
AQA S1 2008 January Q2
7 marks Moderate -0.8
2 The head and body length, \(x\) millimetres, and tail length, \(y\) millimetres, of each of a sample of 20 adult dormice were measured. The following statistics are derived from the results. $$S _ { x x } = 1280.55 \quad S _ { y y } = 281.8 \quad S _ { x y } = 416.3$$
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
  2. Interpret your value in the context of this question.
  3. Write down the value of the product moment correlation coefficient if the measurements had been recorded in centimetres.
  4. Give a reason why it is not generally advisable to calculate the value of the product moment correlation coefficient without first viewing a scatter diagram of the data. Illustrate your answer with a sketch.
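For part 3, converting millimetres to centimetres multiplies both variables by 0.1, so every \(S\) value picks up the same factor \(0.1 \times 0.1\) and the factors cancel in \(r\). A minimal sketch:

```python
from math import sqrt

s_xx, s_yy, s_xy = 1280.55, 281.8, 416.3
r_mm = s_xy / sqrt(s_xx * s_yy)

k = 0.1  # centimetres per millimetre
r_cm = (k * k * s_xy) / sqrt((k * k * s_xx) * (k * k * s_yy))
print(round(r_mm, 3), round(r_cm, 3))
```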
AQA S1 2009 January Q2
7 marks Moderate -0.3
2 A greengrocer sells bunches of 9 carrots at his Saturday market stall. Tom and Geri are two Statistics students who work on the stall. Each selects a bunch of carrots at random.
  1. At home, Tom measures the length, \(x\) centimetres, and the maximum diameter, \(y\) centimetres, of each carrot in his selected bunch with the following results.
    $$\begin{array}{c|ccccccccc}
    \boldsymbol{x} & 16.2 & 13.1 & 10.4 & 12.1 & 14.6 & 9.7 & 11.8 & 13.6 & 17.3 \\
    \boldsymbol{y} & 4.2 & 3.9 & 4.7 & 3.3 & 3.7 & 2.4 & 3.1 & 3.5 & 2.7
    \end{array}$$
    1. Calculate the value of the product moment correlation coefficient.
    2. Interpret your value in context.
  2. At her home, Geri measures the length, in centimetres, and the weight, in grams, of each carrot in her selected bunch and then obtains a value of \(-0.986\) for the product moment correlation coefficient. Comment, with a reason, on the likely validity of Geri's value.
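Tom's value can be computed directly from deviations about the means, assuming the lists below are the table above read column by column. `pmcc` is a hypothetical helper name.

```python
from math import sqrt

x = [16.2, 13.1, 10.4, 12.1, 14.6, 9.7, 11.8, 13.6, 17.3]  # length, cm
y = [4.2, 3.9, 4.7, 3.3, 3.7, 2.4, 3.1, 3.5, 2.7]         # maximum diameter, cm

def pmcc(a, b):
    """Pearson's r computed from deviations about the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    s_aa = sum((p - ma) ** 2 for p in a)
    s_bb = sum((q - mb) ** 2 for q in b)
    return s_ab / sqrt(s_aa * s_bb)

print(round(pmcc(x, y), 3))
```

A value this close to zero indicates essentially no linear correlation between length and diameter in Tom's bunch.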
AQA S1 2011 January Q1
7 marks Easy -1.3
  1. Estimate, without undertaking any calculations, the value of the product moment correlation coefficient between the variables \(x\) and \(y\) for each of the two scatter diagrams. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{(i)} \includegraphics[alt={},max width=\textwidth]{156f9453-ebc6-4406-b5bc-08d1918ebc62-02_487_652_733_356}
    \end{figure} \includegraphics[max width=\textwidth, alt={}, center]{156f9453-ebc6-4406-b5bc-08d1918ebc62-02_576_714_733_1153}
  2. The table gives the circumference, \(x\) centimetres, and the weight, \(y\) grams, of each of 12 new cricket balls.
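    $$\begin{array}{c|cccccccccccc}
    \boldsymbol{x} & 22.5 & 22.7 & 22.6 & 22.4 & 22.5 & 22.8 & 22.6 & 22.7 & 22.8 & 22.4 & 22.9 & 22.6 \\
    \boldsymbol{y} & 160.3 & 159.4 & 157.8 & 158.0 & 157.3 & 159.8 & 158.3 & 159.6 & 161.3 & 156.4 & 162.5 & 161.2
    \end{array}$$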
    1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
    2. Assuming that the 12 balls may be considered to be a random sample, interpret your value in context.
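This data set shows why linear coding helps by hand: \(u = 10(x - 22.6)\) and \(v = y - 159\) turn the values into small numbers, and because both scale factors are positive the PMCC is unchanged. The sketch assumes the lists are the table above read column by column; `pmcc` is a hypothetical helper name.

```python
from math import sqrt

x = [22.5, 22.7, 22.6, 22.4, 22.5, 22.8, 22.6, 22.7, 22.8, 22.4, 22.9, 22.6]
y = [160.3, 159.4, 157.8, 158.0, 157.3, 159.8, 158.3, 159.6, 161.3, 156.4, 162.5, 161.2]

def pmcc(a, b):
    n = len(a)
    s_ab = sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / n
    s_aa = sum(p * p for p in a) - sum(a) ** 2 / n
    s_bb = sum(q * q for q in b) - sum(b) ** 2 / n
    return s_ab / sqrt(s_aa * s_bb)

# Coded values: rounding removes floating-point noise from the subtraction.
u = [round(10 * (xi - 22.6)) for xi in x]
v = [round(yi - 159, 1) for yi in y]
print(u)
print(round(pmcc(x, y), 3), round(pmcc(u, v), 3))
```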
AQA S1 2011 June Q7
9 marks Moderate -0.3
  1. Three airport management trainees, Ryan, Sunil and Tim, were each instructed to select a random sample of 12 suitcases from those waiting to be loaded onto aircraft. Each trainee also had to measure the volume, \(x\), and the weight, \(y\), of each of the 12 suitcases in his sample, and then calculate the value of the product moment correlation coefficient, \(r\), between \(x\) and \(y\).
    • Ryan obtained a value of \(-0.843\).
    • Sunil obtained a value of \(+0.007\).
    Explain why neither of these two values is likely to be correct.
  2. Peggy, a supervisor with many years' experience, measured the volume, \(x\) cubic feet, and the weight, \(y\) pounds, of each suitcase in a random sample of 6 suitcases, and then obtained a value of 0.612 for \(r\).
    • Ryan and Sunil each claimed that Peggy's value was different from their values because she had measured the volumes in cubic feet and the weights in pounds, whereas they had measured the volumes in cubic metres and the weights in kilograms.
    • Tim claimed that Peggy's value was almost exactly half his calculated value because she had used a sample of size 6 whereas he had used one of size 12.
    Explain why neither of these two claims is valid.
  3. Quentin, a manager, recorded the volumes, \(v\), and the weights, \(w\), of a random sample of 8 suitcases as follows.
    $$\begin{array}{c|cccccccc}
    \boldsymbol{v} & 28.1 & 19.7 & 46.4 & 23.6 & 31.1 & 17.5 & 35.8 & 13.8 \\
    \boldsymbol{w} & 14.9 & 12.1 & 21.1 & 18.0 & 19.8 & 19.2 & 16.2 & 14.7
    \end{array}$$
    1. Calculate the value of \(r\) between \(v\) and \(w\).
    2. Interpret your value in the context of this question.
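A sketch of part 3, plus a numerical illustration of why Ryan's and Sunil's unit argument in part 2 fails. The units of Quentin's data are not stated; the factors 35.3147 (cubic feet per cubic metre) and 2.20462 (pounds per kilogram) are used here purely to show that multiplying each variable by a positive constant leaves \(r\) unchanged.

```python
from math import sqrt

v = [28.1, 19.7, 46.4, 23.6, 31.1, 17.5, 35.8, 13.8]
w = [14.9, 12.1, 21.1, 18.0, 19.8, 19.2, 16.2, 14.7]

def pmcc(a, b):
    """Pearson's r computed from deviations about the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return s_ab / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

r = pmcc(v, w)
# Illustrative unit change: positive scale factors on both variables.
r_converted = pmcc([35.3147 * vi for vi in v], [2.20462 * wi for wi in w])
print(round(r, 3), round(r_converted, 3))
```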
AQA S1 2013 June Q4
17 marks Standard +0.3
4 The girth, \(g\) metres, the length, \(l\) metres, and the weight, \(y\) kilograms, of each of a sample of 20 pigs were measured. The data collected is summarised as follows. $$S _ { g g } = 0.1196 \quad S _ { l l } = 0.0436 \quad S _ { y y } = 5880 \quad S _ { g y } = 24.15 \quad S _ { l y } = 10.25$$
  1. Calculate the value of the product moment correlation coefficient between:
    1. girth and weight;
    2. length and weight.
  2. Interpret, in context, each of the values that you obtained in part (a).
  3. Weighing pigs requires expensive equipment, whereas measuring their girths and lengths simply requires a tape measure. With this in mind, the following formula is proposed to make an estimate of a pig's weight, \(x\) kilograms, from its girth and length. $$x = 69.3 \times g ^ { 2 } \times l$$ Applying this formula to the relevant data on the 20 pigs resulted in $$S _ { x x } = 5656.15 \quad S _ { x y } = 5662.97$$
    1. By calculating a third value of the product moment correlation coefficient, state which of \(g , l\) or \(x\) is the most strongly correlated with \(y\), the weight.
    2. Estimate the weight of a pig that has a girth of 1.25 metres and a length of 1.15 metres.
    3. Given the additional information that \(\bar { x } = 115.4\) and \(\bar { y } = 116.0\), calculate the equation of the least squares regression line of \(y\) on \(x\), in the form \(y = a + b x\).
    4. Comment on the likely accuracy of the estimated weight found in part (c)(ii). Your answer should make reference to the value of the product moment correlation coefficient found in part (c)(i) and to the values of \(b\) and \(a\) found in part (c)(iii).
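A sketch of parts (a) and (c), using only the \(S\) values and means given in the question. `pmcc_from_s` is a hypothetical helper name.

```python
from math import sqrt

s_gg, s_ll, s_yy = 0.1196, 0.0436, 5880
s_gy, s_ly = 24.15, 10.25
s_xx, s_xy = 5656.15, 5662.97
mean_x, mean_y = 115.4, 116.0

def pmcc_from_s(s_ab, s_aa, s_bb):
    return s_ab / sqrt(s_aa * s_bb)

r_gy = pmcc_from_s(s_gy, s_gg, s_yy)
r_ly = pmcc_from_s(s_ly, s_ll, s_yy)
r_xy = pmcc_from_s(s_xy, s_xx, s_yy)

x_new = 69.3 * 1.25 ** 2 * 1.15     # formula estimate for g = 1.25 m, l = 1.15 m
b = s_xy / s_xx
a = mean_y - b * mean_x
print(f"r: girth {r_gy:.3f}, length {r_ly:.3f}, formula {r_xy:.3f}")
print(f"x = {x_new:.1f} kg; y = {a:.2f} + {b:.4f}x")
```

With \(b\) very close to 1 and \(a\) close to 0, the fitted line is almost \(y = x\), which is the point part (c)(iv) asks you to connect with the high value of \(r\).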
AQA S1 2014 June Q5
13 marks Moderate -0.5
5 As part of a study of charity shops in a small market town, two such shops, \(X\) and \(Y\), were each asked to provide details of its takings on 12 randomly selected days. The table shows, for each of the 12 days, the day's takings, \(\pounds x\), of charity shop \(X\) and the day's takings, \(\pounds y\), of charity shop \(Y\).
$$\begin{array}{c|cccccccccccc}
\text{Day} & \mathbf{A} & \mathbf{B} & \mathbf{C} & \mathbf{D} & \mathbf{E} & \mathbf{F} & \mathbf{G} & \mathbf{H} & \mathbf{I} & \mathbf{J} & \mathbf{K} & \mathbf{L} \\ \hline
\boldsymbol{x} & 46 & 57 & 39 & 116 & 62 & 77 & 41 & 61 & 15 & 53 & 68 & 61 \\
\boldsymbol{y} & 78 & 102 & 66 & 214 & 98 & 72 & 98 & 134 & 21 & 67 & 95 & 83
\end{array}$$
    1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
    2. Interpret your value in the context of this question.
  1. Complete the scatter diagram shown on the opposite page.
  2. The investigator realised subsequently that one of the 12 selected days was a particularly popular town market day and another was a day on which the weather was extremely severe. Identify each of these days giving a reason for each choice.
  3. Removing the two days described in part (c) from the data gives the following information. $$S _ { x x } = 1292.5 \quad S _ { y y } = 3850.1 \quad S _ { x y } = 407.5$$
    1. Use this information to recalculate the value of the product moment correlation coefficient between \(x\) and \(y\).
    2. Hence revise, as necessary, your interpretation in part (a)(ii).
      Scatter diagram (Charity Shops; axis label: Shop \(X\) takings (£))
      \includegraphics[max width=\textwidth, alt={}]{ddf7f158-b6ae-42c6-98f1-d59c205646ad-17_1304_415_406_1391}
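A sketch of how two extreme days can inflate the PMCC. The lists are the table above read day by day; with days D and I removed they reproduce the quoted \(S_{xx} = 1292.5\), \(S_{yy} = 3850.1\) and \(S_{xy} = 407.5\). Picking the days with the highest and lowest combined takings is a heuristic assumed here for illustration, not necessarily the reasoning the question expects.

```python
from math import sqrt

days = "ABCDEFGHIJKL"
x = [46, 57, 39, 116, 62, 77, 41, 61, 15, 53, 68, 61]
y = [78, 102, 66, 214, 98, 72, 98, 134, 21, 67, 95, 83]

def pmcc(a, b):
    n = len(a)
    s_ab = sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / n
    s_aa = sum(p * p for p in a) - sum(a) ** 2 / n
    s_bb = sum(q * q for q in b) - sum(b) ** 2 / n
    return s_ab / sqrt(s_aa * s_bb)

# Highest and lowest combined takings: candidates for market day and severe weather.
totals = {d: xi + yi for d, xi, yi in zip(days, x, y)}
busiest, quietest = max(totals, key=totals.get), min(totals, key=totals.get)

keep = [i for i, d in enumerate(days) if d not in (busiest, quietest)]
r_all = pmcc(x, y)
r_reduced = pmcc([x[i] for i in keep], [y[i] for i in keep])
print(busiest, quietest, round(r_all, 3), round(r_reduced, 3))
```

The drop from about 0.85 to about 0.18 shows that the apparent strong correlation came mainly from the two unusual days.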
AQA S1 2014 June Q4
7 marks Moderate -0.3
4 Every year, usually during early June, the Isle of Man hosts motorbike races. Each race consists of three consecutive laps of the island's course. To compete in a race, a rider must first complete at least one qualifying lap. The data refer to the lightweight motorbike class in 2012 and show, for each of a random sample of 10 riders, values of $$u = x - 100 \quad \text { and } \quad v = y - 100$$ where \(x\) denotes the average speed, in mph, for the rider's fastest qualifying lap and \(y\) denotes the average speed, in mph, for the rider's three laps of the race.
$$\begin{array}{c|cccccccccc}
\text{Rider} & \mathbf{A} & \mathbf{B} & \mathbf{C} & \mathbf{D} & \mathbf{E} & \mathbf{F} & \mathbf{G} & \mathbf{H} & \mathbf{I} & \mathbf{J} \\ \hline
\boldsymbol{u} & 7.88 & 13.02 & 4.29 & 2.88 & 6.26 & 7.03 & 3.60 & 11.78 & 13.15 & 11.69 \\
\boldsymbol{v} & 6.63 & 10.16 & 3.63 & 0.47 & 5.70 & 8.01 & 3.30 & 7.31 & 13.08 & 11.82
\end{array}$$
    1. Calculate the value of \(r _ { u v }\), the product moment correlation coefficient between \(u\) and \(v\).
    2. Hence state the value of \(r _ { x y }\), giving a reason for your answer.
  1. Interpret your value of \(r _ { x y }\) in the context of this question.
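A sketch of parts (a)(i) and (a)(ii), assuming the lists are the table above read rider by rider. Undoing the coding \(x = u + 100\), \(y = v + 100\) is a pure shift, so \(r_{xy} = r_{uv}\).

```python
from math import sqrt

u = [7.88, 13.02, 4.29, 2.88, 6.26, 7.03, 3.60, 11.78, 13.15, 11.69]
v = [6.63, 10.16, 3.63, 0.47, 5.70, 8.01, 3.30, 7.31, 13.08, 11.82]

def pmcc(a, b):
    """Pearson's r computed from deviations about the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return s_ab / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

r_uv = pmcc(u, v)
r_xy = pmcc([ui + 100 for ui in u], [vi + 100 for vi in v])  # decoded speeds, mph
print(round(r_uv, 3), round(r_xy, 3))
```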
OCR MEI Further Statistics Major 2023 June Q6
12 marks Standard +0.3
6 A student wonders if there is any correlation between download and upload speeds of data to and from the internet. The student decides to carry out a hypothesis test to investigate this and so measures the download speed \(x\) and upload speed \(y\) in suitable units on 20 randomly chosen occasions. The scatter diagram below illustrates the data which the student collected. \includegraphics[max width=\textwidth, alt={}, center]{c692fb20-436f-4bc1-89bd-10fdba41ceba-07_824_1411_440_246}
  1. Explain why the student decides to carry out a test based on the product moment correlation coefficient. Summary statistics for the 20 occasions are as follows. $$\sum x = 342.10 \quad \sum y = 273.65 \quad \sum x ^ { 2 } = 5989.53 \quad \sum y ^ { 2 } = 3919.53 \quad \sum x y = 4713.62$$
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient.
  3. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is any correlation between download speed and upload speed.
  4. Both of the variables, download speed and upload speed, are random. Explain why, if download speed had been a non-random variable, the student could not have carried out the hypothesis test to investigate whether there was any correlation between download speed and upload speed.
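A sketch of parts 2 and 3. It uses the standard result that, under bivariate normality and \(H_0: \rho = 0\), \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) follows a \(t\) distribution with \(n - 2\) degrees of freedom. The critical value 2.101 for \(t_{18}\) is hard-coded from standard tables (Python's standard library has no \(t\) distribution); converting it back gives the tabulated critical \(r\) of about 0.4438 for \(n = 20\).

```python
from math import sqrt

n = 20
sum_x, sum_y = 342.10, 273.65
sum_x2, sum_y2, sum_xy = 5989.53, 3919.53, 4713.62

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / sqrt(s_xx * s_yy)

# Two-tailed test at the 5% level.
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)
t_crit = 2.101                                # upper 2.5% point of t_18, from tables
r_crit = t_crit / sqrt(t_crit ** 2 + n - 2)   # equivalent critical value of r
print(f"r = {r:.3f}, t = {t_stat:.3f}, critical r = {r_crit:.3f}")
print("reject H0" if abs(r) > r_crit else "do not reject H0")
```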