5.08b Linear coding: effect on pmcc

37 questions

Edexcel FS2 2019 June Q2
10 marks Standard +0.3
2 A large field of wheat is split into 8 plots of equal area. Each plot is treated with a different amount of fertiliser, \(f\) grams \(/ \mathrm { m } ^ { 2 }\). The yield of wheat, \(w\) tonnes, from each plot is recorded. The results are summarised below. $$\sum f = 28 \quad \sum w = 303 \quad \sum w ^ { 2 } = 13447 \quad \mathrm {~S} _ { f f } = 42 \quad \mathrm {~S} _ { f w } = 269.5$$
  1. Calculate the product moment correlation coefficient between \(f\) and \(w\)
  2. Interpret the value of your product moment correlation coefficient.
  3. Find the equation of the regression line of \(w\) on \(f\) in the form \(w = a + b f\)
  4. Using your equation, estimate the decrease in yield when the amount of fertiliser decreases by 0.5 grams \(/ \mathrm { m } ^ { 2 }\)
The residuals of the data recorded are calculated and plotted on the graph below.
\includegraphics[max width=\textwidth, alt={}, center]{67df73d4-6ce4-45f7-8a69-aa94292ea814-04_1232_1294_1169_301}
  5. With reference to this graph, comment on the suitability of the model you found in part (c).
  6. Suggest how you might be able to refine your model.
Edexcel S1 2022 January Q2
6 marks Moderate -0.8
2. Tom's car holds 50 litres of petrol when the fuel tank is full. For each of 10 journeys, each starting with 50 litres of petrol in the fuel tank, Tom records the distance travelled, \(d\) kilometres, and the amount of petrol used, \(p\) litres. The summary statistics for the 10 journeys are given below. $$\sum d = 1029 \quad \sum p = 50.8 \quad \sum d p = 5240.8 \quad \mathrm {~S} _ { d d } = 344.9 \quad \mathrm {~S} _ { p p } = 0.576$$
  1. Calculate the product moment correlation coefficient between \(d\) and \(p\)
The amount of petrol remaining in the fuel tank for each journey, \(w\) litres, is recorded.
    1. Write down an equation for \(w\) in terms of \(p\)
    2. Hence, write down the value of the product moment correlation coefficient between \(w\) and \(p\)
  2. Write down the value of the product moment correlation coefficient between \(d\) and \(w\)
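The effect of the coding in this question can be checked numerically from the summary statistics alone. The sketch below assumes the coding \(w = 50 - p\) (a full 50-litre tank less the petrol used); the variable names are illustrative and not part of the question.

```python
import math

# Summary statistics quoted in the question (10 journeys)
n = 10
sum_d, sum_p, sum_dp = 1029, 50.8, 5240.8
s_dd, s_pp = 344.9, 0.576

# S_dp = sum(dp) - sum(d) * sum(p) / n
s_dp = sum_dp - sum_d * sum_p / n
r_dp = s_dp / math.sqrt(s_dd * s_pp)

# Assumed coding: w = 50 - p, a linear coding with coefficient -1.
# Shifting by 50 changes nothing; multiplying by -1 flips the sign of
# any S value that contains w once and leaves S_ww unchanged.
s_ww = s_pp
s_pw = -s_pp
s_dw = -s_dp

r_wp = s_pw / math.sqrt(s_ww * s_pp)
r_dw = s_dw / math.sqrt(s_dd * s_ww)

print(round(r_dp, 3), round(r_wp, 3), round(r_dw, 3))
```

Under that assumption, the correlation between \(w\) and \(p\) is exactly \(-1\), and the correlation between \(d\) and \(w\) is the negative of the one between \(d\) and \(p\).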
AQA S1 2005 June Q1
6 marks Easy -1.2
1 For each of a random sample of 10 customers, a store records the time, \(x\) minutes, spent shopping and the value, \(\pounds y\), to the nearest 10 p, of items purchased. The results are tabulated below.
Time (x)1345109172316216
Value (y)12.55.72.318.47.917.117.918.68.321.3
    1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
    2. Interpret your value in context.
  1. Write down the value of the product moment correlation coefficient if the time had been recorded in seconds and the value in pence to the nearest 10p.
OCR S1 Q4
8 marks Moderate -0.3
4 The table shows the latitude, \(x\) (in degrees correct to 3 significant figures), and the average rainfall \(y\) (in cm correct to 3 significant figures) of five European cities.
| City | \(x\) | \(y\) |
| --- | --- | --- |
| Berlin | 52.5 | 58.2 |
| Bucharest | 44.4 | 58.7 |
| Moscow | 55.8 | 53.3 |
| St Petersburg | 60.0 | 47.8 |
| Warsaw | 52.3 | 56.6 |
$$\left[ n = 5 , \Sigma x = 265.0 , \Sigma y = 274.6 , \Sigma x ^ { 2 } = 14176.54 , \Sigma y ^ { 2 } = 15162.22 , \Sigma x y = 14464.10 . \right]$$
  1. Calculate the product moment correlation coefficient.
  2. The values of \(y\) in the table were in fact obtained from measurements in inches and converted into centimetres by multiplying by 2.54. State what effect it would have had on the value of the product moment correlation coefficient if it had been calculated using inches instead of centimetres.
  3. It is required to estimate the annual rainfall at Bergen, where \(x = 60.4\). Calculate the equation of an appropriate line of regression, giving your answer in simplified form, and use it to find the required estimate.
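The unit-conversion part of this question can be illustrated with a short numerical check. The sketch below uses the five city values from the question; the helper `pmcc` and the variable names are illustrative assumptions, not part of the original.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Latitude (degrees) and average rainfall (cm) for the five cities
latitude = [52.5, 44.4, 55.8, 60.0, 52.3]
rainfall_cm = [58.2, 58.7, 53.3, 47.8, 56.6]

# Convert back to inches by dividing by 2.54 (a positive scale factor)
rainfall_in = [y / 2.54 for y in rainfall_cm]

r_cm = pmcc(latitude, rainfall_cm)
r_in = pmcc(latitude, rainfall_in)
print(round(r_cm, 3), round(r_in, 3))
```

The two values agree up to floating-point rounding, since dividing every \(y\) by the same positive constant scales \(S_{xy}\) and \(\sqrt{S_{yy}}\) by the same factor.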
OCR FS1 AS 2021 June Q1
5 marks Moderate -0.3
1 Five observations of bivariate data \(( x , y )\) are given in the table.
| \(x\) | 7 | 8 | 12 | 6 | 4 |
| --- | --- | --- | --- | --- | --- |
| \(y\) | 20 | 16 | 7 | 17 | 23 |
  1. Find the value of Pearson's product-moment correlation coefficient.
  2. State what your answer to part (a) tells you about a scatter diagram representing the data.
  3. A new variable \(a\) is defined by \(a = 3 x + 4\). Dee says "The value of Pearson's product-moment correlation coefficient between \(a\) and \(y\) will not be the same as the answer to part (a)." State with a reason whether you agree with Dee.
An investor obtains data about the profits of 8 randomly chosen investment accounts over two one-year periods. The profit in the first year for each account is \(p \%\) and the profit in the second year for each account is \(q \%\). The results are shown in the table and in the scatter diagram.
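    | Account | A | B | C | D | E | F | G | H |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | \(p\) | 1.6 | 2.1 | 2.4 | 2.7 | 2.8 | 3.3 | 5.2 | 8.4 |
    | \(q\) | 1.6 | 2.3 | 2.2 | 2.2 | 3.1 | 2.9 | 7.6 | 4.8 |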
    \(n = 8 \quad \Sigma p = 28.5 \quad \Sigma q = 26.7 \quad \Sigma p ^ { 2 } = 136.35 \quad \Sigma q ^ { 2 } = 116.35 \quad \Sigma p q = 116.70\) \includegraphics[max width=\textwidth, alt={}, center]{4c7546b9-03ee-47a1-915f-41e2b4ca19c0-03_762_1248_906_260}
    1. State which, if either, of the variables \(p\) and \(q\) is independent.
    2. Calculate the equation of the regression line of \(q\) on \(p\).
      1. Use the regression line to estimate the value of \(q\) for an investment account for which \(p = 2.5\).
      2. Give two reasons why this estimate could be considered reliable.
    3. Comment on the reliability of using the regression line to predict the value of \(q\) when \(p = 7.0\).
OCR Further Statistics 2021 June Q2
12 marks Standard +0.3
2 A book collector compared the prices of some books, \(\pounds x\), when new in 1972 and the prices of copies of the same books, \(\pounds y\), on a second-hand website in 2018.
The results are shown in Table 1 and are summarised below the table.
| Book | A | B | C | D | E | F | G | H | I | J | K | L |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(x\) | 0.95 | 0.65 | 0.70 | 0.90 | 0.55 | 1.40 | 1.50 | 0.50 | 1.15 | 0.35 | 0.20 | 0.35 |
| \(y\) | 6.06 | 7.00 | 2.00 | 5.87 | 4.00 | 5.36 | 7.19 | 2.50 | 3.00 | 8.29 | 1.37 | 2.00 |

Table 1

$$n = 12 , \Sigma x = 9.20 , \Sigma y = 54.64 , \Sigma x ^ { 2 } = 8.9950 , \Sigma y ^ { 2 } = 310.4572 , \Sigma x y = 46.0545$$
  1. It is given that the value of Pearson's product-moment correlation coefficient for the data is 0.381, correct to 3 significant figures.
    1. State what this information tells you about a scatter diagram illustrating the data.
    2. Test at the \(5 \%\) significance level whether there is evidence of positive correlation between prices in 1972 and prices in 2018.
  2. The collector noticed that the second-hand copy of book J was unusually expensive and he decided to ignore the data for book J. Calculate the value of Pearson's product-moment correlation coefficient for the other 11 books.
Edexcel S1 2023 June Q2
13 marks Moderate -0.3
Two students, Olive and Shan, collect data on the weight, \(w\) grams, and the tail length, \(t\) cm, of 15 mice. Olive summarised the data as follows \(S_{tt} = 5.3173\) \quad \(\sum w^2 = 6089.12\) \quad \(\sum tw = 2304.53\) \quad \(\sum w = 297.8\) \quad \(\sum t = 114.8\)
  1. Calculate the value of \(S_{ww}\) and the value of \(S_{tw}\) [3]
  2. Calculate the value of the product moment correlation coefficient between \(w\) and \(t\) [2]
  3. Show that the equation of the regression line of \(w\) on \(t\) can be written as $$w = -16.7 + 4.77t$$ [3]
  4. Give an interpretation of the gradient of the regression line. [1]
  5. Explain why it would not be appropriate to use the regression line in part (c) to estimate the weight of a mouse with a tail length of 2cm. [2]
Shan decided to code the data using \(x = t - 6\) and \(y = \frac{w}{2} - 5\)
  1. Write down the value of the product moment correlation coefficient between \(x\) and \(y\) [1]
  2. Write down an equation of the regression line of \(y\) on \(x\) You do not need to simplify your equation. [1]
Edexcel S1 2011 June Q1
7 marks Moderate -0.8
On a particular day the height above sea level, \(x\) metres, and the mid-day temperature, \(y\)°C, were recorded in 8 north European towns. These data are summarised below \(S_{xx} = 3\,535\,237.5 \quad \sum y = 181 \quad \sum y^2 = 4305 \quad S_{xy} = -23\,726.25\)
  1. Find \(S_{yy}\). [2]
  2. Calculate, to 3 significant figures, the product moment correlation coefficient for these data. [2]
  3. Give an interpretation of your coefficient. [1]
A student thought that the calculations would be simpler if the height above sea level, \(h\), was measured in kilometres and used the variable \(h = \frac{x}{1000}\) instead of \(x\).
  1. Write down the value of \(S_{hh}\) [1]
  2. Write down the value of the correlation coefficient between \(h\) and \(y\). [1]
Edexcel S1 2002 November Q5
12 marks Standard +0.3
An agricultural researcher collected data, in appropriate units, on the annual rainfall \(x\) and the annual yield of wheat \(y\) at 8 randomly selected places. The data were coded using \(s = x - 6\) and \(t = y - 20\) and the following summations were obtained. \(\Sigma s = 48.5\), \(\Sigma t = 65.0\), \(\Sigma s^2 = 402.11\), \(\Sigma t^2 = 701.80\), \(\Sigma st = 523.23\)
  1. Find the equation of the regression line of \(t\) on \(s\) in the form \(t = p + qs\). [7]
  2. Find the equation of the regression line of \(y\) on \(x\) in the form \(y = a + bx\), giving \(a\) and \(b\) to 3 decimal places. [3]
The value of the product moment correlation coefficient between \(s\) and \(t\) is 0.943, to 3 decimal places.
  1. Write down the value of the product moment correlation coefficient between \(x\) and \(y\). Give a justification for your answer. [2]
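One possible justification for the final part is the general linear-coding argument sketched below. The symbols \(u\), \(v\), \(a\), \(b\), \(c\), \(d\) are notation introduced here for illustration and do not appear in the question; the coding in the question corresponds to \(b = d = 1\).

```latex
\begin{align*}
u &= a + b x, \qquad v = c + d y \qquad (b \neq 0,\ d \neq 0) \\
S_{uv} &= b d \, S_{xy}, \qquad S_{uu} = b^{2} S_{xx}, \qquad S_{vv} = d^{2} S_{yy} \\
r_{uv} &= \frac{S_{uv}}{\sqrt{S_{uu} \, S_{vv}}}
        = \frac{b d \, S_{xy}}{|b d| \sqrt{S_{xx} \, S_{yy}}}
        = \frac{b d}{|b d|} \, r_{xy}
\end{align*}
```

So the coefficient is unchanged when \(b\) and \(d\) have the same sign and changes sign when they have opposite signs. Adding or subtracting a constant, as in \(s = x - 6\) and \(t = y - 20\), has no effect at all.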
Edexcel S3 Q7
16 marks Standard +0.3
For one of the activities at a gymnastics competition, 8 gymnasts were awarded marks out of 10 for each of artistic performance and technical ability. The results were as follows.
Gymnast\(A\)\(B\)\(C\)\(D\)\(E\)\(F\)\(G\)\(H\)
Technical ability8.58.69.57.56.89.19.49.2
Artistic performance6.27.58.26.76.07.28.09.1
The value of the product moment correlation coefficient for these data is 0.774.
  1. Stating your hypotheses clearly and using a 1% level of significance, interpret this value. [5]
  2. Calculate the value of the rank correlation coefficient for these data. [6]
  3. Stating your hypotheses clearly and using a 1% level of significance, interpret this coefficient. [3]
  4. Explain why the rank correlation coefficient might be the better one to use with these data. [2]
OCR S1 2010 January Q3
7 marks Moderate -0.8
The heights, \(h\) m, and weights, \(m\) kg, of five men were measured. The results are plotted on the diagram. \includegraphics{figure_3} The results are summarised as follows. \(n = 5\) \(\Sigma h = 9.02\) \(\Sigma m = 377.7\) \(\Sigma h^2 = 16.382\) \(\Sigma m^2 = 28558.67\) \(\Sigma hm = 681.612\)
  1. Use the summarised data to calculate the value of the product moment correlation coefficient, \(r\). [3]
  2. Comment on your value of \(r\) in relation to the diagram. [2]
  3. It was decided to re-calculate the value of \(r\) after converting the heights to feet and the masses to pounds. State what effect, if any, this will have on the value of \(r\). [1]
  4. One of the men had height 1.63 m and mass 78.4 kg. The data for this man were removed and the value of \(r\) was re-calculated using the original data for the remaining four men. State in general terms what effect, if any, this will have on the value of \(r\). [1]
OCR S1 2013 January Q3
12 marks Moderate -0.3
The Gross Domestic Product per Capita (GDP), \(x\) dollars, and the Infant Mortality Rate per thousand (IMR), \(y\), of 6 African countries were recorded and summarised as follows. \(n = 6\) \quad \(\sum x = 7000\) \quad \(\sum x^2 = 8700000\) \quad \(\sum y = 456\) \quad \(\sum y^2 = 36262\) \quad \(\sum xy = 509900\)
  1. Calculate the equation of the regression line of \(y\) on \(x\) for these 6 countries. [4]
The original data were plotted on a scatter diagram and the regression line of \(y\) on \(x\) was drawn, as shown below. \includegraphics{figure_3}
  1. The GDP for another country, Tanzania, is 1300 dollars. Use the regression line in the diagram to estimate the IMR of Tanzania. [1]
  2. The GDP for Nigeria is 2400 dollars. Give two reasons why the regression line is unlikely to give a reliable estimate for the IMR for Nigeria. [2]
  3. The actual value of the IMR for Tanzania is 96. The data for Tanzania (\(x = 1300, y = 96\)) is now included with the original 6 countries. Calculate the value of the product moment correlation coefficient, \(r\), for all 7 countries. [4]
  4. The IMR is now redefined as the infant mortality rate per hundred instead of per thousand, and the value of \(r\) is recalculated for all 7 countries. Without calculation state what effect, if any, this would have on the value of \(r\) found in part (iv). [1]