5.08d Hypothesis test: Pearson correlation

109 questions

OCR S1 Specimen Q8
13 marks Moderate -0.8
8 An experiment was conducted to see whether there was any relationship between the maximum tidal current, \(y \mathrm {~cm} \mathrm {~s} ^ { - 1 }\), and the tidal range, \(x\) metres, at a particular marine location. [The tidal range is the difference between the height of high tide and the height of low tide.] Readings were taken over a period of 12 days, and the results are shown in the following table.
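$$\begin{array} { c | c c c c c c c c c c c c }
x & 2.0 & 2.4 & 3.0 & 3.1 & 3.4 & 3.7 & 3.8 & 3.9 & 4.0 & 4.5 & 4.6 & 4.9 \\ \hline
y & 15.2 & 22.0 & 25.2 & 33.0 & 33.1 & 34.2 & 51.0 & 42.3 & 45.0 & 50.7 & 61.0 & 59.2
\end{array}$$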
$$\left[ \Sigma x = 43.3 , \Sigma y = 471.9 , \Sigma x ^ { 2 } = 164.69 , \Sigma y ^ { 2 } = 20915.75 , \Sigma x y = 1837.78 . \right]$$ The scatter diagram below illustrates the data. \includegraphics[max width=\textwidth, alt={}, center]{2fb25fc5-0445-44fa-a23e-647d14b1a376-4_462_793_1464_644}
  1. Calculate the product moment correlation coefficient for the data, and comment briefly on your answer with reference to the appearance of the scatter diagram.
  2. Calculate the equation of the regression line of maximum tidal current on tidal range.
  3. Estimate the maximum tidal current on a day when the tidal range is 4.2 m, and comment briefly on how reliable you consider your estimate is likely to be.
  4. It is suggested that the equation found in part (ii) could be used to predict the maximum tidal current on a day when the tidal range is 15 m. Comment briefly on the validity of this suggestion.
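The correlation and regression parts of this question rest on the usual summary-statistic identities, \(S_{xx} = \Sigma x^2 - (\Sigma x)^2 / n\) and so on. The following is a minimal Python sketch of that arithmetic using the sums quoted in the question; the function and variable names are illustrative assumptions, not part of the source.

```python
from math import sqrt

def s_values(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (Sxx, Syy, Sxy) computed from raw summary sums."""
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return sxx, syy, sxy

def pmcc(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    sxx, syy, sxy = s_values(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
    return sxy / sqrt(sxx * syy)

def regression_y_on_x(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (a, b) for the least squares line y = a + b x."""
    sxx, _, sxy = s_values(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
    b = sxy / sxx
    a = sum_y / n - b * sum_x / n
    return a, b

# Summary sums quoted in the tidal-current question (12 days of readings).
tidal = dict(n=12, sum_x=43.3, sum_y=471.9,
             sum_x2=164.69, sum_y2=20915.75, sum_xy=1837.78)

r = pmcc(**tidal)
a, b = regression_y_on_x(**tidal)
print(f"r = {r:.4f}")
print(f"y = {a:.2f} + {b:.2f} x")
```

Working from the sums rather than the raw table mirrors how the question supplies the data; the same helpers apply to any of the summary-statistic questions in this list.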
OCR MEI S2 2006 June Q3
18 marks Standard +0.3
3 A student is investigating the relationship between the length \(x \mathrm {~mm}\) and circumference \(y \mathrm {~mm}\) of plums from a large crop. The student measures the dimensions of a random sample of 10 plums from this crop. Summary statistics for these dimensions are as follows. $$\begin{aligned} & \sum x = 4715 \quad \sum y = 13175 \quad \sum x ^ { 2 } = 2237725 \\ & \sum y ^ { 2 } = 17455825 \quad \sum x y = 6235575 \quad n = 10 \end{aligned}$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to determine whether there is any correlation between length and circumference of plums from this crop. State your hypotheses clearly, defining any symbols which you use.
  3. (A) Explain the meaning of a 5\% significance level.
    (B) State one advantage and one disadvantage of using a \(1 \%\) significance level rather than a \(5 \%\) significance level in a hypothesis test.
  The student decides to take another random sample of 10 plums. Using the same hypotheses as in part (ii), the correlation coefficient for this second sample is significant at the \(5 \%\) level. The student decides to ignore the first result and concludes that there is correlation between the length and circumference of plums in the crop.
  4. Comment on the student's decision to ignore the first result. Suggest a better way in which the student could proceed.
OCR MEI S2 2007 June Q2
19 marks Standard +0.3
2 A medical student is trying to estimate the birth weight of babies using pre-natal scan images. The actual weights, \(x \mathrm {~kg}\), and the estimated weights, \(y \mathrm {~kg}\), of ten randomly selected babies are given in the table below.
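$$\begin{array} { c | c c c c c c c c c c }
x & 2.61 & 2.73 & 2.87 & 2.96 & 3.05 & 3.14 & 3.17 & 3.24 & 3.76 & 4.10 \\ \hline
y & 3.2 & 2.6 & 3.5 & 3.1 & 2.8 & 2.7 & 3.4 & 3.3 & 4.4 & 4.1
\end{array}$$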
  1. Calculate the value of Spearman's rank correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) level to determine whether there is positive association between the student's estimates and the actual birth weights of babies in the underlying population.
  3. Calculate the value of the product moment correlation coefficient of the sample. You may use the following summary statistics in your calculations: $$\Sigma x = 31.63 , \quad \Sigma y = 33.1 , \quad \Sigma x ^ { 2 } = 101.92 , \quad \Sigma y ^ { 2 } = 112.61 , \quad \Sigma x y = 106.51 .$$
  4. Explain why, if the underlying population has a bivariate Normal distribution, it would be preferable to carry out a hypothesis test based on the product moment correlation coefficient. Comment briefly on the significance of the product moment correlation coefficient in relation to that of Spearman's rank correlation coefficient.
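For the Spearman part of this question, the coefficient can be computed from rank differences as \(r_s = 1 - \frac{6 \Sigma d^2}{n(n^2 - 1)}\). The sketch below applies that formula to the ten birth-weight pairs in the table; it assumes no tied values (true for these data), and the helper names are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation via the sum of squared rank differences."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Actual weights x and estimated weights y from the table (kg).
actual = [2.61, 2.73, 2.87, 2.96, 3.05, 3.14, 3.17, 3.24, 3.76, 4.10]
estimated = [3.2, 2.6, 3.5, 3.1, 2.8, 2.7, 3.4, 3.3, 4.4, 4.1]

r_s = spearman(actual, estimated)
print(f"r_s = {r_s:.4f}")
```

If a data set did contain ties, the shortcut formula would no longer be exact and averaged ranks would be needed instead.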
OCR MEI S2 2008 June Q1
18 marks Standard +0.3
1 A researcher believes that there is a negative correlation between money spent by the government on education and population growth in various countries. A random sample of 48 countries is selected to investigate this belief. The level of government spending on education \(x\), measured in suitable units, and the annual percentage population growth rate \(y\), are recorded for these countries. Summary statistics for these data are as follows. $$\Sigma x = 781.3 \quad \Sigma y = 57.8 \quad \Sigma x ^ { 2 } = 14055 \quad \Sigma y ^ { 2 } = 106.3 \quad \Sigma x y = 880.1 \quad n = 48$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate the researcher's belief. State your hypotheses clearly, defining any symbols which you use.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly how a scatter diagram may be used to check whether this assumption is likely to be valid.
  4. A student suggests that if the variables are negatively correlated then population growth rates can be reduced by increasing spending on education. Explain why the student may be wrong. Discuss an alternative explanation for the correlation.
  5. State briefly one advantage and one disadvantage of using a smaller sample size in this investigation.
Edexcel S1 2016 January Q3
15 marks Moderate -0.3
3. A publisher collects information about the amount spent on advertising, \(\pounds x\), and the sales, \(y\) books, for some of her publications. She collects information for a random sample of 8 textbooks and codes the data using \(v = \frac { x + 50 } { 200 }\) and \(s = \frac { y } { 1000 }\) to give
$$\begin{array} { c | c c c c c c c c }
v & 0.60 & 8.10 & 4.30 & 0.40 & 1.60 & 6.40 & 2.50 & 5.10 \\ \hline
s & 1.84 & 6.73 & 5.95 & 1.30 & 2.45 & 7.46 & 4.82 & 6.25
\end{array}$$
[You may use: \(\sum v = 29 \quad \sum s = 36.8 \quad \sum s ^ { 2 } = 209.72 \quad \sum v s = 177.311 \quad \mathrm {~S} _ { v v } = 55.275\) ]
  1. Find \(\mathrm { S } _ { v s }\) and \(\mathrm { S } _ { s s }\).
  2. Calculate the product moment correlation coefficient for these data.
  The publisher believes that a linear regression model may be appropriate to describe these data.
  3. State, giving a reason, whether or not your answer to part (b) supports the publisher's belief.
  4. Find the equation of the regression line of \(s\) on \(v\), giving your answer in the form \(s = a + b v\).
  5. Hence find the equation of the regression line of \(y\) on \(x\) for the sample of textbooks, giving your answer in the form \(y = c + d x\).
  The publisher calculated the regression line for a sample of novels and obtained the equation $$y = 3100 + 1.2 x$$ She wants to increase the sales of books by spending more money on advertising.
  6. State, giving your reasons, whether the publisher should spend more money on advertising textbooks or novels.
Edexcel S1 2017 January Q3
17 marks Moderate -0.3
3. A scientist measured the salinity of water, \(x \mathrm {~g} / \mathrm { kg }\), and recorded the temperature at which the water froze, \(y ^ { \circ } \mathrm { C }\), for 12 different water samples. The summary statistics are listed below.
$$\begin{gathered} \sum x = 504 \quad \sum y = - 27 \quad \sum x ^ { 2 } = 22842 \quad \sum y ^ { 2 } = 62.98 \\ \sum x y = - 1190.7 \quad \mathrm {~S} _ { x x } = 1674 \quad \mathrm {~S} _ { y y } = 2.23 \end{gathered}$$
  1. Find the mean and variance of the recorded temperatures.
  Priya believes that the higher the salinity of water, the higher the temperature at which the water freezes.
  2. (i) Calculate the product moment correlation coefficient between \(x\) and \(y\).
    (ii) State, with a reason, whether or not this value supports Priya's belief.
  3. Find the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\). Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  4. Estimate the temperature at which water freezes when the salinity is \(32 \mathrm {~g} / \mathrm { kg }\).
  The coding \(w = 1.8 y + 32\) is used to convert the recorded temperatures from \({ } ^ { \circ } \mathrm { C }\) to \({ } ^ { \circ } \mathrm { F }\).
  5. Find an equation of the least squares regression line of \(w\) on \(x\) in the form \(w = c + d x\).
  6. Find
    1. the variance of the recorded temperatures when converted to \({ } ^ { \circ } \mathrm { F }\),
    2. the product moment correlation coefficient between \(w\) and \(x\).
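The later parts of this question turn on how a linear coding \(w = 1.8 y + 32\) of the response propagates: the regression slope and intercept transform as \(d = 1.8 b\) and \(c = 1.8 a + 32\), the variance scales by \(1.8^2\), and the correlation is unchanged because the scale factor is positive. A minimal sketch using the quoted summary statistics follows; it assumes the variance is taken with divisor \(n\), which should be checked against the convention of the relevant specification.

```python
from math import sqrt

# Summary statistics quoted in the salinity question.
n = 12
sum_x, sum_y = 504, -27
sum_xy = -1190.7
s_xx, s_yy = 1674, 2.23

# Regression of y on x: y = a + b x.
s_xy = sum_xy - sum_x * sum_y / n
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Variance of y, assuming divisor n (an assumption, see lead-in).
var_y = s_yy / n

# Effect of the coding w = scale * y + shift.
scale, shift = 1.8, 32
d = scale * b            # slope of w on x
c = scale * a + shift    # intercept of w on x
var_w = scale ** 2 * var_y

# Correlation is unaffected by a coding with positive scale.
r_xy = s_xy / sqrt(s_xx * s_yy)
r_xw = r_xy

print(f"y = {a:.3f} + {b:.4f} x")
print(f"w = {c:.2f} + {d:.4f} x")
print(f"var(y) = {var_y:.4f}, var(w) = {var_w:.4f}, r = {r_xy:.3f}")
```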
Edexcel S1 2018 January Q3
8 marks Moderate -0.8
3. Martin is investigating the relationship between a person's daily caffeine consumption, \(c\) milligrams, and the amount of sleep they get, \(h\) hours, per night. He collected this information from 20 people and the results are summarised below. $$\begin{array} { c c } \sum c = 3660 \quad \sum h = 126 \quad \sum c ^ { 2 } = 973228 \\ \sum c h = 20023.4 \quad S _ { c c } = 303448 \quad S _ { c h } = - 3034.6 \end{array}$$ Martin calculates the product moment correlation coefficient for these data and obtains \(-0.833\).
  1. Give a reason why this value supports a linear relationship between \(c\) and \(h\).
  The amount of sleep per night is the response variable.
  2. Explain what you understand by the term 'response variable'.
  Martin says that for each additional 100 mg of caffeine consumed, the expected number of hours of sleep decreases by 1.
  3. Determine, by calculation, whether or not the data support this statement.
  4. Use the data to calculate an estimate for the expected number of hours of sleep per night when no caffeine is consumed.
Edexcel S1 2018 January Q5
12 marks Moderate -0.3
5. Franca is the manager of an accountancy firm. She is investigating the relationship between the salary, \(\pounds x\), and the length of commute, \(y\) minutes, for employees at the firm. She collected this information from 9 randomly selected employees. The salary of each employee was then coded using \(w = \frac { x - 20000 } { 1000 }\) The table shows the values of \(w\) and \(y\) for the 9 employees.
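$$\begin{array} { c | c c c c c c c c c }
w & 6 & 8 & 8 & -1 & 25 & 15 & 3 & -2 & 19 \\ \hline
y & 45 & 50 & 35 & 65 & 25 & 40 & 50 & 75 & 20
\end{array}$$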
(You may use \(\sum w = 81 \quad \sum y = 405 \quad \sum w y = 2490 \quad S _ { w w } = 660 \quad S _ { y y } = 2500\) )
  1. Calculate the salary of the employee with \(w = - 2\)
  2. Show that, to 3 significant figures, the value of the product moment correlation coefficient between \(w\) and \(y\) is \(-0.899\).
  3. State, giving a reason, the value of the product moment correlation coefficient between \(x\) and \(y\).
  The least squares regression line of \(y\) on \(w\) is \(y = 60.75 - 1.75 w\).
  4. Find the equation of the least squares regression line of \(y\) on \(x\) giving your answer in the form \(y = a + b x\).
  5. Estimate the length of commute for an employee with a salary of \(\pounds 21000\).
  Franca uses the regression line to estimate the length of commute for employees with salaries between \(\pounds 25000\) and \(\pounds 40000\).
  6. State, giving a reason, whether or not these estimates are reliable.
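The question relies on the fact that the product moment correlation coefficient is unchanged by a linear coding with positive scale, here \(w = \frac{x - 20000}{1000}\). The sketch below checks this numerically on the nine coded pairs from the table by decoding \(w\) back to salaries; the `pearson` helper is an illustrative implementation, not something given in the source.

```python
from math import sqrt

def pearson(xs, ys):
    """Product moment correlation coefficient from raw paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Coded salaries w and commute times y (minutes) from the table.
w = [6, 8, 8, -1, 25, 15, 3, -2, 19]
y = [45, 50, 35, 65, 25, 40, 50, 75, 20]

# Undo the coding w = (x - 20000) / 1000 to recover salaries in pounds.
x = [1000 * wi + 20000 for wi in w]

r_wy = pearson(w, y)
r_xy = pearson(x, y)
print(f"r(w, y) = {r_wy:.3f}")
print(f"r(x, y) = {r_xy:.3f}")
```

The two printed values agree because subtracting a constant and dividing by a positive constant rescales \(S_{wy}\) and \(\sqrt{S_{ww}}\) by the same factor.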
Edexcel S1 2015 June Q7
6 marks Easy -1.8
7. A doctor is investigating the correlation between blood protein, \(p\), and body mass index, \(b\). He takes a random sample of 8 patients and the data are shown in the table below.
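$$\begin{array} { c | c c c c c c c c }
\text{Patient} & A & B & C & D & E & F & G & H \\ \hline
b & 32 & 36 & 40 & 44 & 42 & 21 & 27 & 37 \\
p & 18 & 21 & 31 & 39 & 21 & 12 & 19 & 70
\end{array}$$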
  1. Draw a scatter diagram of these data on the axes provided. \includegraphics[max width=\textwidth, alt={}, center]{36cf6341-1957-45b9-9f7d-0914506f5919-13_938_673_785_614}
  The doctor decides to leave out patient \(H\) from his calculations.
  2. Give a reason for the doctor's decision.
  For the 7 patients \(A , B , C , D , E , F\) and \(G\), $$S _ { b p } = 369 , \quad S _ { p p } = 490 \text { and } S _ { b b } = 423 \frac { 5 } { 7 }$$
  3. Find the product moment correlation coefficient, \(r\), for these 7 patients.
  4. Without any further calculations, state how \(r\) would differ from your answer in part (c) if it was calculated for all 8 patients.
    \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{36cf6341-1957-45b9-9f7d-0914506f5919-15_1322_1593_207_173}
    \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure}
    The histogram in Figure 1 summarises the times, in minutes, that 200 people spent shopping in a supermarket.
    1. Give a reason to justify the use of a histogram to represent these data. Given that 40 people spent between 11 and 21 minutes shopping in the supermarket, estimate
    2. the number of people that spent between 18 and 25 minutes shopping in the supermarket,
    3. the median time spent shopping in the supermarket by these 200 people. The mid-point of each bar is represented by \(x\) and the corresponding frequency by \(\mathrm { f }\).
    4. Show that \(\sum \mathrm { f } x = 6390\) Given that \(\sum \mathrm { f } x ^ { 2 } = 238430\)
    5. for the data shown in the histogram, calculate estimates of
      1. the mean,
      2. the standard deviation. A coefficient of skewness is given by \(\frac { 3 ( \text { mean } - \text { median } ) } { \text { standard deviation } }\)
    6. Calculate this coefficient of skewness for these data. The manager of the supermarket decides to model these data with a normal distribution.
    7. Comment on the manager's decision. Give a justification for your answer.
OCR S1 2014 June Q5
9 marks Moderate -0.8
5 Tariq collected information about typical prices, \(\pounds y\) million, of four-bedroomed houses at varying distances, \(x\) miles, from a large city. He chose houses at 10-mile intervals from the city. His results are shown below.
$$\begin{array} { c | c c c c c c c c }
x & 10 & 20 & 30 & 40 & 50 & 60 & 70 & 80 \\ \hline
y & 1.2 & 1.4 & 1.2 & 0.9 & 0.8 & 0.5 & 0.5 & 0.3
\end{array}$$
$$n = 8 \quad \Sigma x = 360 \quad \Sigma x ^ { 2 } = 20400 \quad \Sigma y = 6.8 \quad \Sigma y ^ { 2 } = 6.88 \quad \Sigma x y = 241$$
  1. Use an appropriate formula to calculate the product moment correlation coefficient, \(r\), showing that \(- 1.0 < r < - 0.9\).
  2. State what this value of \(r\) shows in this context.
  3. Tariq decides to recalculate the value of \(r\) with the house prices measured in hundreds of thousands of pounds, instead of millions of pounds. State what effect, if any, this will have on the value of \(r\).
  4. Calculate the equation of the regression line of \(y\) on \(x\).
  5. Explain why the regression line of \(y\) on \(x\), rather than \(x\) on \(y\), should be used for estimating a value of \(x\) from a given value of \(y\).
OCR MEI S2 2011 January Q1
17 marks Standard +0.3
1 The scatter diagram below shows the birth rates \(x\), and death rates \(y\), measured in standard units, in a random sample of 14 countries in a particular year. Summary statistics for the data are as follows. $$\Sigma x = 139.8 \quad \Sigma y = 140.4 \quad \Sigma x ^ { 2 } = 1411.66 \quad \Sigma y ^ { 2 } = 1417.88 \quad \Sigma x y = 1398.56 \quad n = 14$$ \includegraphics[max width=\textwidth, alt={}, center]{cd1a8f39-dd3c-44c9-90b0-6a919361d593-2_643_1047_488_550}
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to determine whether there is any correlation between birth rates and death rates.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly in the light of the scatter diagram why it appears that the assumption may be valid.
  4. The values of \(x\) and \(y\) for another country in that year are 14.4 and 7.8 respectively. If these values are included, the value of the sample product moment correlation coefficient is \(-0.5694\). Explain why this one observation causes such a large change to the value of the sample product moment correlation coefficient. Discuss whether this brings the validity of the test into question.
OCR MEI S2 2009 June Q1
16 marks Standard +0.3
1 An investment analyst thinks that there may be correlation between the cost of oil, \(x\) dollars per barrel, and the price of a particular share, \(y\) pence. The analyst selects 50 days at random and records the values of \(x\) and \(y\). Summary statistics for these data are shown below, together with a scatter diagram. $$\Sigma x = 2331.3 \quad \Sigma y = 6724.3 \quad \Sigma x ^ { 2 } = 111984 \quad \Sigma y ^ { 2 } = 921361 \quad \Sigma x y = 316345 \quad n = 50$$ \includegraphics[max width=\textwidth, alt={}, center]{ae79cdd9-a57c-490e-a9f3-f47c7c8a1aa6-2_857_905_516_621}
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate the analyst's belief. State your hypotheses clearly, defining any symbols which you use.
  3. An assumption that there is a bivariate Normal distribution is required for this test to be valid. State whether it is the sample or the population which is required to have such a distribution. State, with a reason, whether in this case the assumption appears to be justified.
  4. Explain why a 2-tail test is appropriate even though it is clear from the scatter diagram that the sample has a positive correlation coefficient.
OCR MEI S2 2012 June Q1
19 marks Standard +0.3
1 The times, in seconds, taken by ten randomly selected competitors for the first and last sections of an Olympic bobsleigh run are denoted by \(x\) and \(y\) respectively. Summary statistics for these data are as follows. $$\Sigma x = 113.69 \quad \Sigma y = 52.81 \quad \Sigma x ^ { 2 } = 1292.56 \quad \Sigma y ^ { 2 } = 278.91 \quad \Sigma x y = 600.41 \quad n = 10$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(10 \%\) significance level to investigate whether there is any correlation between times taken for the first and last sections of the bobsleigh run.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly how a scatter diagram may be used to check whether this assumption is likely to be valid.
  4. A commentator says that in order to have a fast time on the last section, you must have a fast time on the first section. Comment briefly on this suggestion.
  5. (A) Would your conclusion in part (ii) have been different if you had carried out the hypothesis test at the \(1 \%\) level rather than the \(10 \%\) level? Explain your answer.
    (B) State one advantage and one disadvantage of using a \(1 \%\) significance level rather than a \(10 \%\) significance level in a hypothesis test.
OCR MEI S2 2013 June Q1
18 marks Standard +0.3
1 Salbutamol is a drug used to improve lung function. In a medical trial, a random sample of 60 people with impaired lung function was selected. The forced expiratory volume in one second (FEV1) was measured for each person, both before being given salbutamol and again after a two-week course of the drug. The variables \(x\) and \(y\), measured in suitable units, represent FEV1 before and after the two-week course respectively. The data are illustrated in the scatter diagram below, together with the summary statistics for these data. \includegraphics[max width=\textwidth, alt={}, center]{f3690bc0-3392-4f29-86f7-797d33fab4f1-2_682_1024_502_516} Summary statistics: $$n = 60 , \quad \sum x = 43.62 , \quad \sum y = 55.15 , \quad \sum x ^ { 2 } = 32.68 , \quad \sum y ^ { 2 } = 51.44 , \quad \sum x y = 40.66$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is positive correlation between FEV1 before and after the course.
  3. State the distributional assumption which is necessary for this test to be valid. State, with a reason, whether the assumption appears to be valid.
  4. Explain the meaning of the term 'significance level'.
  5. Calculate the values of the summary statistics if the data point \(x = 0.55 , y = 1.00\) had been incorrectly recorded as \(x = 1.00 , y = 0.55\).
OCR MEI S2 2014 June Q1
18 marks Standard +0.3
1 A medical student is investigating the claim that young adults with high diastolic blood pressure tend to have high systolic blood pressure. The student measures the diastolic and systolic blood pressures of a random sample of ten young adults. The data are shown in the table and illustrated in the scatter diagram.
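$$\begin{array} { l | c c c c c c c c c c }
\text{Diastolic blood pressure} & 60 & 61 & 62 & 63 & 73 & 76 & 84 & 87 & 90 & 95 \\ \hline
\text{Systolic blood pressure} & 98 & 121 & 118 & 114 & 108 & 112 & 132 & 130 & 134 & 139
\end{array}$$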
\includegraphics[max width=\textwidth, alt={}, center]{17e474c4-f5be-4ca1-b7c3-e444b46c3bec-2_865_809_593_628}
  1. Calculate the value of Spearman's rank correlation coefficient for these data.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to examine whether there is positive association between diastolic blood pressure and systolic blood pressure in the population of young adults.
  3. Explain why, in the light of the scatter diagram, it might not be valid to carry out a test based on the product moment correlation coefficient.
  The product moment correlation coefficient between the diastolic and systolic blood pressures of a random sample of 10 athletes is 0.707.
  4. Carry out a hypothesis test at the \(1 \%\) significance level to investigate whether there appears to be positive correlation between these two variables in the population of athletes. You may assume that in this case such a test is valid.
CAIE FP2 2010 June Q9
10 marks Moderate -0.3
9
  1. The following are values of the product moment correlation coefficient between the \(x\) and \(y\) values of three different large samples of bivariate data. State what each indicates about the appearance of a scatter diagram illustrating the data.
    1. \(-1\),
    2. 0.02,
    3. 0.92.
  2. In 1852 Dr William Farr published data on deaths due to cholera during an outbreak of the disease in London. The table shows the altitude (in feet, above the level of the river Thames) at which people lived and the corresponding number of deaths from cholera per 10000 people.
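    $$\begin{array} { l | c c c c c c c }
    \text{Altitude, } x & 10 & 30 & 50 & 70 & 90 & 100 & 350 \\ \hline
    \text{Number of deaths, } y & 102 & 65 & 34 & 27 & 22 & 17 & 8
    \end{array}$$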
    $$\left[ \Sigma x = 700 , \Sigma x ^ { 2 } = 149000 , \Sigma y = 275 , \Sigma y ^ { 2 } = 17351 , \Sigma x y = 13040 . \right]$$
    1. Calculate the product moment correlation coefficient.
    2. Test, at the \(5 \%\) significance level, whether there is evidence of negative correlation.
CAIE FP2 2011 June Q10
10 marks Standard +0.3
10 The mid-day temperature, \(x ^ { \circ } \mathrm { C }\), and the amount of sunshine, \(y\) hours, were recorded at a winter holiday resort on each of 12 days, chosen at random during the winter season. The results are summarised as follows. $$\Sigma x = 18.7 \quad \Sigma x ^ { 2 } = 106.43 \quad \Sigma y = 34.7 \quad \Sigma y ^ { 2 } = 133.43 \quad \Sigma x y = 92.01$$
  1. Find the product moment correlation coefficient for the data.
  2. Stating your hypotheses, test at the \(1 \%\) significance level whether there is a non-zero correlation between mid-day temperature and amount of sunshine.
  3. Use the equation of a suitable regression line to estimate the number of hours of sunshine on a day when the mid-day temperature is \(2 ^ { \circ } \mathrm { C }\).
CAIE FP2 2011 June Q9
11 marks Standard +0.3
9 The marks achieved by a random sample of 15 college students in a Physics examination (\(x\)) and in a General Studies examination (\(y\)) are summarised as follows. $$\Sigma x = 752 \quad \Sigma x ^ { 2 } = 38814 \quad \Sigma y = 773 \quad \Sigma y ^ { 2 } = 45351 \quad \Sigma x y = 40236$$
  1. Find the mean values, \(\bar { x }\) and \(\bar { y }\).
  2. Another college student achieved a mark of 56 in the General Studies examination, but was unable to take the Physics examination. Use the equation of a suitable regression line to estimate the mark that the student would have obtained in the Physics examination.
  3. Find the product moment correlation coefficient for the given data.
  4. Stating your hypotheses, test at the \(5 \%\) level of significance whether there is a non-zero product moment correlation coefficient between examination marks in Physics and in General Studies achieved by college students.
CAIE FP2 2012 June Q11 OR
Challenging +1.2
For a random sample of 5 pairs of values of \(x\) and \(y\), the equations of the regression lines of \(y\) on \(x\) and \(x\) on \(y\) are respectively $$y = - 0.5 x + 5 \quad \text { and } \quad x = - 1.2 y + 7.6$$ Find the value of the product moment correlation coefficient for this sample. Test, at the \(5 \%\) significance level, whether the population product moment correlation coefficient differs from zero. The following table shows the sample data.
$$\begin{array} { c | c c c c c }
x & 1 & 2 & 5 & 5 & p \\ \hline
y & 5 & 3 & 4 & 2 & q
\end{array}$$
Find the values of \(p\) and \(q\).
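Two facts drive this question: the product of the two regression slopes equals \(r^2\) (with \(r\) taking the common sign of the slopes), and both regression lines pass through \((\bar{x}, \bar{y})\). The Python sketch below applies them to the two lines given; the function names are illustrative assumptions, and the recovered \(p\) and \(q\) follow only from the sample totals implied by the means.

```python
from math import sqrt, copysign

def r_from_slopes(b_yx, b_xy):
    """Correlation from the slopes of y on x and x on y (r^2 = b_yx * b_xy)."""
    if b_yx * b_xy < 0:
        raise ValueError("regression slopes must share a sign")
    return copysign(sqrt(b_yx * b_xy), b_yx)

def means_from_lines(a_yx, b_yx, a_xy, b_xy):
    """Intersection of y = a_yx + b_yx x and x = a_xy + b_xy y."""
    x_bar = (a_xy + b_xy * a_yx) / (1 - b_xy * b_yx)
    y_bar = a_yx + b_yx * x_bar
    return x_bar, y_bar

# Lines from the question: y = -0.5 x + 5 and x = -1.2 y + 7.6.
r = r_from_slopes(-0.5, -1.2)
x_bar, y_bar = means_from_lines(5, -0.5, 7.6, -1.2)

# Known table entries; the fifth pair (p, q) is fixed by the means.
n = 5
p = n * x_bar - sum([1, 2, 5, 5])
q = n * y_bar - sum([5, 3, 4, 2])

print(f"r = {r:.4f}")
print(f"means = ({x_bar:.2f}, {y_bar:.2f}), p = {p:.2f}, q = {q:.2f}")
```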
CAIE FP2 2013 June Q5
4 marks Standard +0.3
5 For a random sample of 12 observations of pairs of values \(( x , y )\), the product moment correlation coefficient is \(-0.456\). Test, at the \(5 \%\) significance level, whether there is evidence of negative correlation between the variables.
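Exam solutions would normally compare \(r\) with a tabulated critical value of the product moment correlation coefficient. An equivalent formulation, under the bivariate Normal assumption, converts \(r\) to \(t = r \sqrt{\frac{n - 2}{1 - r^2}}\) with \(n - 2\) degrees of freedom. The sketch below takes that route; the critical value 1.812 (one-tailed 5%, 10 degrees of freedom) is copied from standard \(t\) tables and is an assumption brought in from outside the source.

```python
from math import sqrt

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, distributed as t with n - 2 d.f."""
    return r * sqrt((n - 2) / (1 - r * r))

# Values from the question: n = 12 pairs, sample r = -0.456.
r, n = -0.456, 12
t = t_statistic(r, n)

# One-tailed 5% critical value for 10 degrees of freedom,
# taken from standard t tables (assumption, not from the source).
T_CRIT = 1.812

# H1: rho < 0, so reject H0 only if t falls below -T_CRIT.
reject_h0 = t < -T_CRIT

print(f"t = {t:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Because the \(t\) form and the tabulated critical values of \(r\) are algebraically equivalent, both routes should lead to the same decision.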
CAIE FP2 2014 June Q10
11 marks Standard +0.3
10 Samples of rock from a number of geological sites were analysed for the quantities of two types, \(X\) and \(Y\), of rare minerals. The results, in milligrams, for 10 randomly chosen samples, each of 10 kg, are summarised as follows. $$\Sigma x = 866 \quad \Sigma x ^ { 2 } = 121276 \quad \Sigma y = 639 \quad \Sigma y ^ { 2 } = 55991 \quad \Sigma x y = 73527$$ Find the product moment correlation coefficient. Stating your hypotheses, test at the \(5 \%\) significance level whether there is non-zero correlation between quantities of the two rare minerals. Find the equation of the regression line of \(x\) on \(y\) in the form \(x = p y + q\), where \(p\) and \(q\) are constants to be determined.
CAIE FP2 2015 June Q8
8 marks Standard +0.3
8
  1. For a random sample of ten pairs of values of \(x\) and \(y\) taken from a bivariate distribution, the equations of the regression lines of \(y\) on \(x\) and of \(x\) on \(y\) are, respectively, $$y = 0.38 x + 1.41 \quad \text { and } \quad x = 0.96 y + 7.47$$
    1. Find the value of the product moment correlation coefficient for this sample.
    2. Using a \(5 \%\) significance level, test whether there is positive correlation between the variables.
  2. For a random sample of \(n\) pairs of values of \(u\) and \(v\) taken from another bivariate distribution, the value of the product moment correlation coefficient is 0.507. Using a test at the \(5 \%\) significance level, there is evidence of non-zero correlation between the variables. Find the least possible value of \(n\).
CAIE FP2 2015 June Q7
11 marks Standard +0.8
7 For a random sample of 10 observations of pairs of values \(( x , y )\), the equation of the regression line of \(y\) on \(x\) is \(y = 3.25 x - 4.27\). The sum of the ten \(x\) values is 15.6 and the product moment correlation coefficient for the sample is 0.56. Find the equation of the regression line of \(x\) on \(y\). Test, at the \(5 \%\) significance level, whether there is evidence of non-zero correlation between the variables.
CAIE FP2 2018 June Q11 OR
Standard +0.8
The regression line of \(y\) on \(x\), obtained from a random sample of 6 pairs of values of \(x\) and \(y\), has equation $$y = 0.25 x + k$$ where \(k\) is a constant. The values from the sample are shown in the following table.
$$\begin{array} { c | c c c c c c }
x & 4 & 5 & 7 & 8 & 10 & 14 \\ \hline
y & 5 & 8 & p & 7 & p & 9
\end{array}$$
  1. Find the value of \(p\) and the value of \(k\).
  2. Find the product moment correlation coefficient for the data.
  3. Test, at the \(5 \%\) significance level, whether there is evidence of positive correlation between the variables.
CAIE FP2 2018 June Q6
6 marks Standard +0.3
6 A random sample of 15 observations of pairs of values of two variables gives a product moment correlation coefficient of 0.430.
  1. Test at the \(10 \%\) significance level whether there is evidence of non-zero correlation between the variables.
    A second random sample of \(N\) observations gives a product moment correlation coefficient of 0.615. Using a \(5 \%\) significance level, there is evidence of positive correlation between the variables.
  2. Find the least possible value of \(N\), justifying your answer.