5.08a Pearson correlation: calculate pmcc

246 questions

OCR FS1 AS 2018 March Q5
4 marks Easy -1.8
5 The speed \(v \mathrm {~ms} ^ { - 1 }\) of a car at time \(t\) seconds after it starts to accelerate was measured at 1-second intervals. The results are shown in the following diagram. \includegraphics[max width=\textwidth, alt={}, center]{d5843350-52f9-4fed-adf4-86ceb958033f-3_661_1186_1078_443}
  1. State whether \(t\) or \(v\) or neither is a controlled variable. The value of the product moment correlation coefficient \(r\) for the data is 0.987 correct to 3 significant figures.
  2. The speed of the car is converted to miles per hour and the time to minutes. State the value of \(r\) for the converted data.
  3. State the value of Spearman's rank correlation coefficient \(r _ { s }\) for the data.
  4. What information does \(r\) give about the data that is not given by \(r _ { s }\) ?
OCR Further Statistics 2018 September Q1
4 marks Moderate -0.8
1 An experiment involves releasing a coin on a sloping plane so that it slides down the slope and then slides along a horizontal plane at the bottom of the slope before coming to rest. The angle \(\theta ^ { \circ }\) of the sloping plane is varied, and for each value of \(\theta\), the distance \(d \mathrm {~cm}\) the coin slides on the horizontal plane is recorded. A scatter diagram to illustrate the results of the experiment is shown below, together with the least squares regression line of \(d\) on \(\theta\). \includegraphics[max width=\textwidth, alt={}, center]{28c6a0d9-09a6-4743-af0e-fe2e43e256c9-2_639_972_561_548}
  1. State which two of the following correctly describe the variable \(\theta\).
    Controlled variable | Correlation coefficient
    Dependent variable | Independent variable
    Response variable | Regression coefficient
    The least squares regression line of \(d\) on \(\theta\) has equation \(d = 1.96 + 0.11 \theta\).
  2. Use the diagram in the Printed Answer Booklet to explain the term "least squares".
  3. State what difference, if any, it would make to the equation of the regression line if \(d\) were measured in inches rather than centimetres. ( 1 inch \(\approx 2.54 \mathrm {~cm}\) ).
OCR Further Statistics 2018 September Q7
11 marks Standard +0.3
7 The table shows the values of 5 observations of bivariate data \(( x , y )\).
\(x\) | 4.6 | 5.9 | 6.5 | 7.8 | 8.3
\(y\) | 15.6 | 10.8 | 10.4 | 10.1 | 9.7
$$n = 5 , \Sigma x = 33.1 , \Sigma y = 56.6 , \Sigma x ^ { 2 } = 227.95 , \Sigma y ^ { 2 } = 664.26 , \Sigma x y = 362.37$$
  1. Calculate Pearson's product-moment correlation coefficient \(r\) for the data.
  2. State what this value of \(r\) tells you about a scatter diagram illustrating the data.
  3. Test at the \(5 \%\) significance level whether there is association between \(x\) and \(y\).
  4. State the value of Spearman's rank correlation coefficient \(r _ { s }\) for the data.
  5. State whether \(r\), \(r _ { s }\), both or neither is changed when the values of \(x\) are replaced by
    1. \(3 x - 2\),
    2. \(\sqrt { x }\).
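Parts 1, 4 and 5 can be checked numerically. The sketch below uses plain Python only. It computes \(r\) from the raw table values using deviations from the means. It gets \(r_s\) by applying the same formula to the ranks, which is valid here because there are no ties. It then recomputes both after the transformations \(3x - 2\) and \(\sqrt{x}\). The function names are illustrative and do not come from any exam board. Python 3.10+ also provides `statistics.correlation` for Pearson's \(r\).

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient from paired raw data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(xs, ys):
    """Spearman's r_s computed as Pearson's r on the ranks (no ties assumed)."""
    return pearson_r(ranks(xs), ranks(ys))


x = [4.6, 5.9, 6.5, 7.8, 8.3]
y = [15.6, 10.8, 10.4, 10.1, 9.7]

r = pearson_r(x, y)
rs = spearman_rs(x, y)

# A linear transformation with positive gradient leaves r unchanged ...
r_linear = pearson_r([3 * v - 2 for v in x], y)
# ... but a non-linear one such as sqrt(x) changes r.
r_sqrt = pearson_r([math.sqrt(v) for v in x], y)
# Any strictly increasing transformation keeps the ranks, so r_s is unchanged.
rs_linear = spearman_rs([3 * v - 2 for v in x], y)
rs_sqrt = spearman_rs([math.sqrt(v) for v in x], y)

print(f"r = {r:.3f}, r_s = {rs:.3f}")
print(f"3x - 2: r = {r_linear:.3f}, r_s = {rs_linear:.3f}")
print(f"sqrt(x): r = {r_sqrt:.3f}, r_s = {rs_sqrt:.3f}")
```

With these data \(r \approx -0.855\). The value \(r_s = -1\) arises because \(y\) decreases every time \(x\) increases. Of the two coefficients, only \(r\) changes under \(\sqrt{x}\).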
OCR Further Statistics 2018 December Q5
10 marks Moderate -0.3
5 The birth rate, \(x\) per thousand members of the population, and the life expectancy at birth, \(y\) years, in 14 randomly selected African countries are given in the table.
Country | \(x\) | \(y\) | Country | \(x\) | \(y\)
Benin | 4.8 | 59.2 | Mozambique | 5.4 | 54.63
Cameroon | 4.7 | 54.87 | Nigeria | 5.7 | 52.29
Congo | 4.9 | 61.42 | Senegal | 5.1 | 65.81
Gambia | 5.7 | 59.83 | Somalia | 6.5 | 54.88
Liberia | 4.7 | 60.25 | Sudan | 4.4 | 63.08
Malawi | 5.1 | 60.97 | Uganda | 5.8 | 57.25
Mauretania | 4.6 | 62.77 | Zambia | 5.4 | 58.75
\(n = 14 , \sum x = 72.8 , \sum y = 826 , \sum x ^ { 2 } = 392.96 , \sum y ^ { 2 } = 48924.54 , \sum x y = 4279.16\)
  1. Calculate Pearson's product-moment correlation coefficient \(r\) for the data.
  2. State what would be the effect on the value of \(r\) if the birth rate were given per hundred and not per thousand.
  3. Explain what the sign of \(r\) tells you about the relationship between life expectancy and birth rate for these countries.
  4. Test at the \(5 \%\) significance level whether there is correlation between birth rate and life expectancy at birth in African countries.
  5. A researcher wants to estimate the life expectancy at birth in Zimbabwe, where the birth rate is 3.9 per thousand. Explain whether a reliable estimate could be obtained using the regression line of \(y\) on \(x\) for the given data.
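Parts 1, 2 and 4 reduce to arithmetic on the summary statistics. The sketch below uses standard Python only and does three things:

- It forms \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\) from the given sums.
- It repeats the calculation with the birth rate per hundred. Every \(x\) is divided by 10, so \(\Sigma x\), \(\Sigma x^2\) and \(\Sigma xy\) are divided by 10, 100 and 10 respectively.
- It compares \(|r|\) with a two-tailed 5% critical value. The value 0.5324 for \(n = 14\) is taken from standard PMCC tables as an assumption; check it against the tables you use.

```python
import math


def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from n, Σx, Σy, Σx², Σy² and Σxy."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)


r = r_from_sums(14, 72.8, 826, 392.96, 48924.54, 4279.16)

# Birth rate per hundred: each x is divided by 10.
r_per_hundred = r_from_sums(14, 72.8 / 10, 826, 392.96 / 100, 48924.54, 4279.16 / 10)

# H0: rho = 0, H1: rho != 0 (two-tailed, 5% level).
# Assumed critical value for n = 14 from PMCC tables (2.5% in each tail).
CRITICAL_VALUE = 0.5324
reject_h0 = abs(r) > CRITICAL_VALUE

print(f"r = {r:.3f}, r (per hundred) = {r_per_hundred:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Here \(r \approx -0.306\). Since \(0.306 < 0.5324\), the sketch does not reject \(\mathrm{H}_0\). The per-hundred value is the same, because rescaling \(x\) by a positive constant does not change \(r\).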
Edexcel S1 2022 January Q2
6 marks Moderate -0.8
2. Tom's car holds 50 litres of petrol when the fuel tank is full. For each of 10 journeys, each starting with 50 litres of petrol in the fuel tank, Tom records the distance travelled, \(d\) kilometres, and the amount of petrol used, \(p\) litres. The summary statistics for the 10 journeys are given below. $$\sum d = 1029 \quad \sum p = 50.8 \quad \sum d p = 5240.8 \quad \mathrm {~S} _ { d d } = 344.9 \quad \mathrm {~S} _ { p p } = 0.576$$
  1. Calculate the product moment correlation coefficient between \(d\) and \(p\). The amount of petrol remaining in the fuel tank for each journey, \(w\) litres, is recorded.
    1. Write down an equation for \(w\) in terms of \(p\)
    2. Hence, write down the value of the product moment correlation coefficient between \(w\) and \(p\)
  2. Write down the value of the product moment correlation coefficient between \(d\) and \(w\)
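The following sketch covers part 1 and the follow-on parts, using standard Python only. It recovers \(S_{dp}\) from \(\Sigma dp\). It then handles \(w = 50 - p\) directly on the S values. Subtracting \(p\) from a constant reverses the sign of every deviation, so \(S_{ww} = S_{pp}\), \(S_{wp} = -S_{pp}\) and \(S_{dw} = -S_{dp}\).

```python
import math

n = 10
sum_d, sum_p, sum_dp = 1029, 50.8, 5240.8
S_dd, S_pp = 344.9, 0.576

# S_dp from the raw sums, then r between d and p
S_dp = sum_dp - sum_d * sum_p / n
r_dp = S_dp / math.sqrt(S_dd * S_pp)

# w = 50 - p: every deviation of w is minus the corresponding deviation of p
S_ww = S_pp
S_wp = -S_pp
S_dw = -S_dp

r_wp = S_wp / math.sqrt(S_ww * S_pp)   # perfect negative linear relationship
r_dw = S_dw / math.sqrt(S_dd * S_ww)   # same size as r_dp, opposite sign

print(f"S_dp = {S_dp:.2f}")
print(f"r(d, p) = {r_dp:.3f}, r(w, p) = {r_wp:.3f}, r(d, w) = {r_dw:.3f}")
```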
Edexcel S1 2017 June Q2
11 marks Easy -1.2
2. The box plot shows the times, \(t\) minutes, it takes a group of office workers to travel to work. \includegraphics[max width=\textwidth, alt={}, center]{7d45bacd-20ac-49b4-8f3f-613edf3739f9-04_365_1237_351_356}
  1. Find the range of the times.
  2. Find the interquartile range of the times.
  3. Using the quartiles, describe the skewness of these data. Give a reason for your answer. Chetna believes that house prices will be higher if the time to travel to work is shorter. She asks a random sample of these office workers for their house prices \(\pounds x\), where \(x\) is measured in thousands, and obtains the following statistics $$\mathrm { S } _ { x x } = 5514 \quad \mathrm {~S} _ { x t } = 10 \quad \mathrm {~S} _ { t t } = 1145.6$$
  4. Calculate the product moment correlation coefficient between \(x\) and \(t\).
  5. State, giving a reason, whether or not your correlation coefficient supports Chetna's belief. Adam and Betty are part of the group of office workers and they have both moved house. Adam's time to travel to work changes from 32 minutes to 36 minutes. Betty's time to travel to work changes from 38 minutes to 58 minutes. Outliers are defined as values that are more than 1.5 times the interquartile range above the upper quartile.
  6. Showing all necessary calculations, determine how the box plot of times to travel to work will change and draw a new box plot on the grid on page 5. \includegraphics[max width=\textwidth, alt={}, center]{7d45bacd-20ac-49b4-8f3f-613edf3739f9-05_499_1413_2122_180}
Edexcel S1 2017 June Q5
15 marks Moderate -0.3
  1. Tomas is studying the relationship between temperature and hours of sunshine in Seapron. He records the midday temperature, \(t ^ { \circ } \mathrm { C }\), and the hours of sunshine, \(s\) hours, for a random sample of 9 days in October. He calculated the following statistics
$$\sum s = 15 \quad \sum s ^ { 2 } = 44.22 \quad \sum t = 127 \quad \mathrm {~S} _ { t t } = 10.89$$
  1. Calculate \(\mathrm { S } _ { s s }\). Tomas calculated the product moment correlation coefficient between \(s\) and \(t\) to be 0.832, correct to 3 decimal places.
  2. State, giving a reason, whether or not this correlation coefficient supports the use of a linear regression model to describe the relationship between midday temperature and hours of sunshine.
  3. State, giving a reason, why the hours of sunshine would be the explanatory variable in a linear regression model between midday temperature and hours of sunshine.
  4. Find \(\mathrm { S } _ { s t }\)
  5. Calculate a suitable linear regression equation to model the relationship between midday temperature and hours of sunshine.
  6. Calculate the standard deviation of \(s\). Tomas uses this model to estimate the midday temperature in Seapron for a day in October with 5 hours of sunshine.
  7. State the value of Tomas' estimate. Given that the values of \(s\) are all within 2 standard deviations of the mean,
  8. comment, giving your reason, on the reliability of this estimate.
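The whole chain of parts can be sketched in standard Python, as follows:

- Find \(S_{ss}\).
- Rearrange \(r = S_{st}/\sqrt{S_{ss}S_{tt}}\) to get \(S_{st}\).
- Regress \(t\) on \(s\), since sunshine is the explanatory variable.
- Find the standard deviation of \(s\).
- Estimate \(t\) at \(s = 5\).

Using divisor \(n\) for the standard deviation is an assumption about the course convention. Some courses use \(n - 1\).

```python
import math

n = 9
sum_s, sum_s2, sum_t = 15, 44.22, 127
S_tt = 10.89
r = 0.832  # value given in the question, to 3 d.p.

# Part 1: S_ss from the summary statistics
S_ss = sum_s2 - sum_s ** 2 / n

# Part 4: rearrange r = S_st / sqrt(S_ss * S_tt)
S_st = r * math.sqrt(S_ss * S_tt)

# Part 5: s is explanatory, so regress t on s: t = a + b*s
b = S_st / S_ss
mean_s, mean_t = sum_s / n, sum_t / n
a = mean_t - b * mean_s

# Part 6: standard deviation of s, using divisor n (assumed convention)
sd_s = math.sqrt(S_ss / n)

# Part 7: estimate for a day with 5 hours of sunshine
estimate = a + b * 5

print(f"S_ss = {S_ss:.2f}, S_st = {S_st:.2f}")
print(f"t = {a:.2f} + {b:.3f}s, sd of s = {sd_s:.3f}")
print(f"estimate at s = 5: {estimate:.1f}")
# Part 8: all observed s lie within mean +/- 2 sd, so s = 5 beyond that is extrapolation
print(f"mean + 2 sd = {mean_s + 2 * sd_s:.2f}, extrapolating: {5 > mean_s + 2 * sd_s}")
```

The model comes out at roughly \(t = 13.07 + 0.626s\), giving an estimate of about 16.2 °C. Because 5 hours lies above \(\bar{s} + 2\sigma \approx 4.59\), the estimate involves extrapolation.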
AQA S1 2005 January Q1
7 marks Moderate -0.3
1 Each Monday, Azher has a stall at a town's outdoor market. The table below shows, for each of a random sample of 10 Mondays during 2003, the air temperature, \(x ^ { \circ } \mathrm { C }\), at 9 am and Azher's takings, £y.
Monday\(\mathbf { 1 }\)\(\mathbf { 2 }\)\(\mathbf { 3 }\)\(\mathbf { 4 }\)\(\mathbf { 5 }\)\(\mathbf { 6 }\)\(\mathbf { 7 }\)\(\mathbf { 8 }\)\(\mathbf { 9 }\)\(\mathbf { 1 0 }\)
\(\boldsymbol { x }\)2691813712134
\(\boldsymbol { y }\)9710313624512178145128141312
  1. A scatter diagram of these data is shown below. \includegraphics[max width=\textwidth, alt={}, center]{7faa4a2d-f5cc-4cc3-a3a9-5d8290ceabdc-2_901_1068_1078_447} Give two distinct comments, in context, on what this diagram reveals.
  2. One of the Mondays is found to be Easter Monday, the busiest Monday market of the year. Identify which Monday this is most likely to be.
  3. Removing the data for the Monday you identified in part (b), calculate the value of the product moment correlation coefficient for the remaining 9 pairs of values of \(x\) and \(y\).
  4. Name one other variable that would have been likely to affect Azher's takings at this town's outdoor market.
    (1 mark)
AQA S1 2010 January Q7
13 marks Standard +0.3
7 [Figure 1, printed on the insert, is provided for use in this question.]
Harold considers himself to be an expert in assessing the auction value of antiques. He regularly visits car boot sales to buy items that he then sells at his local auction rooms. Harold's father, Albert, who is not convinced of his son's expertise, collects the following data from a random sample of 12 items bought by Harold.
Item | Purchase price (£\(x\)) | Auction price (£\(y\))
A | 20 | 30
B | 35 | 45
C | 18 | 25
D | 50 | 50
E | 45 | 38
F | 55 | 45
G | 43 | 50
H | 81 | 90
I | 90 | 85
J | 30 | 190
K | 57 | 65
L | 112 | 25
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
  2. Interpret your value in the context of this question.
    1. On Figure 1, complete the scatter diagram for these data.
    2. Comment on what this reveals.
  3. When items J and L are omitted from the data, it is found that $$S _ { x x } = 4854.4 \quad S _ { y y } = 4216.1 \quad S _ { x y } = 4268.8$$
    1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\) for the remaining 10 items.
    2. Hence revise as necessary your interpretation in part (b).
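This sketch covers parts 1 and 3 in standard Python. It assumes the table is read as item letter, purchase price \(x\) and auction price \(y\). It computes \(r\) for all 12 items and again without J and L. As a cross-check, the reduced calculation should agree with the \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\) given in the question.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient from paired raw data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


# (purchase price x, auction price y) in pounds for items A to L
items = {
    "A": (20, 30), "B": (35, 45), "C": (18, 25), "D": (50, 50),
    "E": (45, 38), "F": (55, 45), "G": (43, 50), "H": (81, 90),
    "I": (90, 85), "J": (30, 190), "K": (57, 65), "L": (112, 25),
}


def r_for(labels):
    """Correlation between x and y for the chosen item labels."""
    xs = [items[label][0] for label in labels]
    ys = [items[label][1] for label in labels]
    return pearson_r(xs, ys)


r_all = r_for(items)
r_without_j_l = r_for([label for label in items if label not in ("J", "L")])

# Cross-check against the S values quoted in part 3
r_from_given = 4268.8 / math.sqrt(4854.4 * 4216.1)

print(f"all 12 items: r = {r_all:.3f}")
print(f"without J and L: r = {r_without_j_l:.3f} (from given S values: {r_from_given:.3f})")
```

The sketch gives \(r \approx -0.035\) for all 12 items and about 0.944 without J and L. Two unusual items are enough to hide an otherwise strong positive relationship.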
AQA S1 2005 June Q1
6 marks Easy -1.2
1 For each of a random sample of 10 customers, a store records the time, \(x\) minutes, spent shopping and the value, \(\pounds y\), to the nearest 10 p, of items purchased. The results are tabulated below.
Time (x)1345109172316216
Value (y)12.55.72.318.47.917.117.918.68.321.3
    1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
    2. Interpret your value in context.
  1. Write down the value of the product moment correlation coefficient if the time had been recorded in seconds and the value in pence to the nearest 10p.
AQA S1 2006 June Q1
8 marks Moderate -0.3
1 The table shows, for each of a random sample of 8 paperback fiction books, the number of pages, \(x\), and the recommended retail price, \(\pounds y\), to the nearest 10 p.
\(\boldsymbol { x }\) | 223 | 276 | 374 | 433 | 564 | 612 | 704 | 766
\(\boldsymbol { y }\) | 6.50 | 4.00 | 5.50 | 8.00 | 4.50 | 5.00 | 8.00 | 5.50
    1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
    2. Interpret your value in the context of this question.
    3. Suggest one other variable, in addition to the number of pages, which may affect the recommended retail price of a paperback fiction book.
  1. The same 8 books were later included in a book sale. The value of the product moment correlation coefficient between the number of pages and the sale price was 0.959, correct to three decimal places. What can be concluded from this value?
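Part 1 can be sketched by following the usual exam route. First accumulate \(\Sigma x\), \(\Sigma y\), \(\Sigma x^2\), \(\Sigma y^2\) and \(\Sigma xy\) from the table. Then form \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\), and divide. The sketch assumes the table is read as pages 223, 276, …, 766 and prices £6.50, …, £5.50.

```python
import math

pages = [223, 276, 374, 433, 564, 612, 704, 766]            # x
price = [6.50, 4.00, 5.50, 8.00, 4.50, 5.00, 8.00, 5.50]    # y, in pounds

n = len(pages)
sum_x, sum_y = sum(pages), sum(price)
sum_x2 = sum(x * x for x in pages)
sum_y2 = sum(y * y for y in price)
sum_xy = sum(x * y for x, y in zip(pages, price))

# Corrected sums of squares and products
S_xx = sum_x2 - sum_x ** 2 / n
S_yy = sum_y2 - sum_y ** 2 / n
S_xy = sum_xy - sum_x * sum_y / n

r = S_xy / math.sqrt(S_xx * S_yy)
print(f"S_xx = {S_xx}, S_yy = {S_yy}, S_xy = {S_xy}")
print(f"r = {r:.3f}")
```

The result \(r \approx 0.143\) points to very weak positive correlation between length and price.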
AQA S1 2015 June Q3
11 marks Moderate -0.5
3 Fourteen candidates each sat two test papers, Paper 1 and Paper 2, on the same day. The marks, out of a total of 50, achieved by the students on each paper are shown in the table.
AQA S1 2015 June Q1
4 marks Moderate -0.8
1
The table shows the annual gas consumption, \(x \mathrm { kWh }\), and the annual electricity consumption, \(y \mathrm { kWh }\), for a sample of 10 bungalows of similar size and occupancy.
\(\boldsymbol { x }\) | 21371 | 18521 | 15222 | 17312 | 19854 | 23561 | 20738 | 22111 | 17897 | 24523
\(\boldsymbol { y }\) | 2281 | 2327 | 2221 | 2378 | 2787 | 2856 | 3078 | 2647 | 2566 | 2559
$$S _ { x x } = 76581640 \quad S _ { y y } = 694250 \quad S _ { x y } = 3629670$$
  1. Calculate the value of \(r _ { x y }\), the product moment correlation coefficient between \(x\) and \(y\).
  2. Interpret your value of \(r _ { x y }\) in the context of this question.
AQA S1 2015 June Q4
15 marks Moderate -0.3
4 Stephan is a roofing contractor who is often required to replace loose ridge tiles on house roofs. In order to help him to quote more accurately the prices for such jobs in the future, he records, for each of 11 recently repaired roofs, the number of ridge tiles replaced, \(x _ { i }\), and the time taken, \(y _ { i }\) hours. His results are shown in the table.
Roof \(( \boldsymbol { i } )\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
\(\boldsymbol { x } _ { \boldsymbol { i } }\) | 8 | 11 | 14 | 14 | 16 | 20 | 22 | 22 | 25 | 27 | 30
\(\boldsymbol { y } _ { \boldsymbol { i } }\) | 5.0 | 5.2 | 6.3 | 7.2 | 8.0 | 8.8 | 10.6 | 11.0 | 11.8 | 12.1 | 13.0
  1. The pairs of data values for roofs 1 to 7 are plotted on the scatter diagram shown on the opposite page. Plot the 4 pairs of data values for roofs 8 to 11 on the scatter diagram.
    1. Calculate the equation of the least squares regression line of \(y _ { i }\) on \(x _ { i }\), and draw your line on the scatter diagram.
    2. Interpret your values for the gradient and for the intercept of this regression line.
  2. Estimate the time that it would take Stephan to replace 15 loose ridge tiles on a house roof.
  3. Given that \(r _ { i }\) denotes the residual for the point representing roof \(i\) :
    1. calculate the value of \(r _ { 6 }\);
    2. state why the value of \(\sum _ { i = 1 } ^ { 11 } r _ { i }\) gives no useful information about the connection between the number of ridge tiles replaced and the time taken.
      [1 mark]
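The regression, estimate and residual parts can be sketched in plain Python. The line is the least squares regression of \(y\) on \(x\), \(y = a + bx\). Each residual is the observed \(y\) minus the fitted value. For any least squares line with an intercept, the residuals sum to zero (up to rounding), which is why \(\sum r_i\) carries no information about the relationship.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


tiles = [8, 11, 14, 14, 16, 20, 22, 22, 25, 27, 30]                   # x_i
hours = [5.0, 5.2, 6.3, 7.2, 8.0, 8.8, 10.6, 11.0, 11.8, 12.1, 13.0]  # y_i

a, b = least_squares(tiles, hours)
estimate_15 = a + b * 15

# Residual = observed y minus fitted y
residuals = [y - (a + b * x) for x, y in zip(tiles, hours)]
r_6 = residuals[5]  # roof 6 is the sixth entry

print(f"y = {a:.3f} + {b:.4f}x")
print(f"estimate for 15 tiles: {estimate_15:.2f} hours")
print(f"r_6 = {r_6:.3f}, sum of residuals = {sum(residuals):.1e}")
```

The sketch gives approximately \(y = 1.302 + 0.405x\), an estimate of about 7.38 hours for 15 tiles, and \(r_6 \approx -0.605\).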
OCR S1 Q4
8 marks Moderate -0.3
4 The table shows the latitude, \(x\) (in degrees correct to 3 significant figures), and the average rainfall \(y\) (in cm correct to 3 significant figures) of five European cities.
City | \(x\) | \(y\)
Berlin | 52.5 | 58.2
Bucharest | 44.4 | 58.7
Moscow | 55.8 | 53.3
St Petersburg | 60.0 | 47.8
Warsaw | 52.3 | 56.6
$$\left[ n = 5 , \Sigma x = 265.0 , \Sigma y = 274.6 , \Sigma x ^ { 2 } = 14176.54 , \Sigma y ^ { 2 } = 15162.22 , \Sigma x y = 14464.10 . \right]$$
  1. Calculate the product moment correlation coefficient.
  2. The values of \(y\) in the table were in fact obtained from measurements in inches and converted into centimetres by multiplying by 2.54. State what effect it would have had on the value of the product moment correlation coefficient if it had been calculated using inches instead of centimetres.
  3. It is required to estimate the annual rainfall at Bergen, where \(x = 60.4\). Calculate the equation of an appropriate line of regression, giving your answer in simplified form, and use it to find the required estimate. \section*{June 2005}
OCR S1 Q8
13 marks Moderate -0.3
8 The table shows the population, \(x\) million, of each of nine countries in Western Europe together with the population, \(y\) million, of its capital city.
Country | Germany | United Kingdom | France | Italy | Spain | The Netherlands | Portugal | Austria | Switzerland
\(x\) | 82.1 | 59.2 | 59.1 | 56.7 | 39.2 | 15.9 | 9.9 | 8.1 | 7.3
\(y\) | 3.5 | 7.0 | 9.0 | 2.7 | 2.9 | 0.8 | 0.7 | 1.6 | 0.1
$$\left[ n = 9 , \Sigma x = 337.5 , \Sigma x ^ { 2 } = 18959.11 , \Sigma y = 28.3 , \Sigma y ^ { 2 } = 161.65 , \Sigma x y = 1533.76 . \right]$$
  1. (a) Calculate Spearman's rank correlation coefficient, \(r _ { s }\).
    (b) Explain what your answer indicates about the populations of these countries and their capital cities.
  2. Calculate the product moment correlation coefficient, \(r\). The data are illustrated in the scatter diagram. \includegraphics[max width=\textwidth, alt={}, center]{11316ea6-3999-4003-b77d-bee8b547c1da-09_936_881_1162_632}
  3. By considering the diagram, state the effect on the value of the product moment correlation coefficient, \(r\), if the data for France and the United Kingdom were removed from the calculation.
  4. In a certain country in Africa, most people live in remote areas and hence the population of the country is unknown. However, the population of the capital city is known to be approximately 1 million. An official suggests that the population of this country could be estimated by using a regression line drawn on the above scatter diagram.
    (a) State, with a reason, whether the regression line of \(y\) on \(x\) or the regression line of \(x\) on \(y\) would need to be used.
    (b) Comment on the reliability of such an estimate in this situation.
1 Some observations of bivariate data were made and the equations of the two regression lines were found to be as follows. $$\begin{array} { c c } y \text { on } x : & y = - 0.6 x + 13.0 \\ x \text { on } y : & x = - 1.6 y + 21.0 \end{array}$$
  5. State, with a reason, whether the correlation between \(x\) and \(y\) is negative or positive.
  6. Neither variable is controlled. Calculate an estimate of the value of \(x\) when \(y = 7.0\).
  7. Find the values of \(\bar { x }\) and \(\bar { y }\).
2 A bag contains 5 black discs and 3 red discs. A disc is selected at random from the bag. If it is red it is replaced in the bag. If it is black, it is not replaced. A second disc is now selected at random from the bag. Find the probability that
  8. the second disc is black, given that the first disc was black,
  9. the second disc is black,
  10. the two discs are of different colours.
3 Each of the 7 letters in the word DIVIDED is printed on a separate card. The cards are arranged in a row.
  11. How many different arrangements of the letters are possible?
  12. In how many of these arrangements are all three Ds together? The 7 cards are now shuffled and 2 cards are selected at random, without replacement.
  13. Find the probability that at least one of these 2 cards has D printed on it.
4
  14. The random variable \(X\) has the distribution \(\mathrm { B } ( 25,0.2 )\). Using the tables of cumulative binomial probabilities, or otherwise, find \(\mathrm { P } ( X \geqslant 5 )\).
  15. The random variable \(Y\) has the distribution \(\mathrm { B } ( 10,0.27 )\). Find \(\mathrm { P } ( Y = 3 )\).
  16. The random variable \(Z\) has the distribution \(B ( n , 0.27 )\). Find the smallest value of \(n\) such that \(\mathrm { P } ( Z \geqslant 1 ) > 0.95\).
5 The probability distribution of a discrete random variable, \(X\), is given in the table.
    \(x\) | 0 | 1 | 2 | 3
    \(\mathrm { P } ( X = x )\) | \(\frac { 1 } { 3 }\) | \(\frac { 1 } { 4 }\) | \(p\) | \(q\)
    It is given that the expectation, \(\mathrm { E } ( X )\), is \(1 \frac { 1 } { 4 }\).
  17. Calculate the values of \(p\) and \(q\).
  18. Calculate the standard deviation of \(X\).
AQA S3 2006 June Q2
7 marks Standard +0.3
2 The table below shows the heart rates, \(x\) beats per minute, and the systolic blood pressures, \(y\) millimetres of mercury, of a random sample of 10 patients undergoing kidney dialysis.
Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
\(\boldsymbol { x }\) | 83 | 86 | 88 | 92 | 94 | 98 | 101 | 111 | 115 | 121
\(\boldsymbol { y }\) | 157 | 172 | 161 | 154 | 171 | 169 | 179 | 180 | 192 | 182
  1. Calculate the value of the product moment correlation coefficient for these data.
  2. Assuming that these data come from a bivariate normal distribution, investigate, at the \(1 \%\) level of significance, the claim that, for patients undergoing kidney dialysis, there is a positive correlation between heart rate and systolic blood pressure.
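This sketch covers both parts in standard Python. It computes \(r\) from the raw table and then runs a one-tailed test at the 1% level, with \(\mathrm{H}_0: \rho = 0\) and \(\mathrm{H}_1: \rho > 0\). The critical value 0.7155 for \(n = 10\) is taken from standard PMCC tables as an assumption; check it against your own tables.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient from paired raw data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


heart_rate = [83, 86, 88, 92, 94, 98, 101, 111, 115, 121]      # x
systolic = [157, 172, 161, 154, 171, 169, 179, 180, 192, 182]  # y

r = pearson_r(heart_rate, systolic)

# H0: rho = 0, H1: rho > 0 (one-tailed, 1% level), assuming bivariate normality.
# Assumed critical value for n = 10 from PMCC tables.
CRITICAL_VALUE = 0.7155
reject_h0 = r > CRITICAL_VALUE

print(f"r = {r:.3f}")
print("Reject H0: evidence of positive correlation" if reject_h0 else "Do not reject H0")
```

With \(r \approx 0.820\), the sketch rejects \(\mathrm{H}_0\) at the 1% level.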
AQA AS Paper 2 2019 June Q12
1 marks Easy -1.8
12 Manny is studying the price and number of pages of a random sample of books.
He calculates the value of the product moment correlation coefficient between the price and number of pages in each book as 1.05. Which of the following best describes the value 1.05?
Tick ( \(\checkmark\) ) one box.
definitely correct □
probably correct □
probably incorrect □
definitely incorrect □ \includegraphics[max width=\textwidth, alt={}, center]{b45dc98e-1699-47c9-9228-5abe0e5c9195-15_2488_1716_219_153}
OCR MEI Further Statistics Major Specimen Q3
11 marks Standard +0.3
3 A researcher is investigating factors that might affect how many hours per day different species of mammals spend asleep. First she investigates human beings. She collects data on body mass index, \(x\), and hours of sleep, \(y\), for a random sample of people. A scatter diagram of the data is shown in Fig. 3.1 together with the regression line of \(y\) on \(x\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-04_885_1584_598_274} \captionsetup{labelformat=empty} \caption{Fig. 3.1}
\end{figure}
  1. Calculate the residual for the data point which has the residual with the greatest magnitude.
  2. Use the equation of the regression line to estimate the mean number of hours spent asleep by a person with body mass index
    (A) 26,
    (B) 16,
    commenting briefly on each of your predictions. The researcher then collects additional data for a large number of species of mammals and analyses different factors for effect size. Definitions of the variables measured for a typical animal of the species, the correlations between these variables, and guidelines often used when considering effect size are given in Fig. 3.2.
    Variable | Definition
    Body mass | Mass of animal in kg
    Brain mass | Mass of brain in g
    Hours of sleep/day | Number of hours per day spent asleep
    Life span | How many years the animal lives
    Danger | A measure of how dangerous the animal's situation is when asleep, taking into account predators and how protected the animal's den is: higher value indicates greater danger.

    Correlations (pmcc) | Body Mass | Brain Mass | Hours of sleep/day | Life span | Danger
    Body Mass | 1.00
    Brain Mass | 0.93 | 1.00
    Hours of sleep/day | -0.31 | -0.36 | 1.00
    Life span | 0.30 | 0.51 | -0.41 | 1.00
    Danger | 0.13 | 0.15 | -0.59 | 0.06 | 1.00
    \begin{table}[h]
    Product moment correlation coefficient | Effect size
    0.1 | Small
    0.3 | Medium
    0.5 | Large
    \captionsetup{labelformat=empty} \caption{Fig. 3.2}
    \end{table}
  3. State two conclusions the researcher might draw from these tables, relevant to her investigation into how many hours mammals spend asleep. One of the researcher's students notices the high correlation between body mass and brain mass and produces a scatter diagram for these two variables, shown in Fig. 3.3 below. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{e6ee3a4a-3e76-4422-9a78-17b64b458f83-05_675_698_1802_735} \captionsetup{labelformat=empty} \caption{Fig. 3.3}
    \end{figure}
  4. Comment on the suitability of a linear model for these two variables.
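To make the comparison in part 3 concrete, the sketch below labels each correlation with hours of sleep using the Fig. 3.2 guideline. It reads 0.1, 0.3 and 0.5 as lower thresholds on \(|r|\) for small, medium and large effects. That reading is an assumption about how the guideline is meant to be applied. The dictionary simply copies the 'Hours of sleep/day' entries from the correlation table.

```python
# Correlations with hours of sleep/day, copied from the Fig. 3.2 table
sleep_correlations = {
    "Body mass": -0.31,
    "Brain mass": -0.36,
    "Life span": -0.41,
    "Danger": -0.59,
}


def effect_size(r):
    """Label a correlation using the 0.1 / 0.3 / 0.5 guideline (read as lower bounds on |r|)."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"


# List the variables from strongest to weakest association with sleep
for variable, r in sorted(sleep_correlations.items(), key=lambda item: abs(item[1]), reverse=True):
    direction = "negative" if r < 0 else "positive"
    print(f"{variable}: r = {r:+.2f} ({direction}, {effect_size(r)} effect)")
```

Under this reading, danger has a large negative association with sleep, and the other three variables have medium negative associations. On their own, correlations like these do not establish cause.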
WJEC Unit 4 Specimen Q5
7 marks Moderate -0.3
5. A hotel owner in Cardiff is interested in what factors hotel guests think are important when staying at a hotel. From a hotel booking website he collects the ratings for 'Cleanliness', 'Location', 'Comfort' and 'Value for money' for a random sample of 17 Cardiff hotels.
(Each rating is the average of all scores awarded by guests who have contributed reviews using a scale from 1 to 10, where 10 is 'Excellent'.) The scatter graph shows the relationship between 'Value for money' and 'Cleanliness' for the sample of Cardiff hotels. \includegraphics[max width=\textwidth, alt={}, center]{b35e94ab-a426-4fca-9ecb-c659e0143ed7-4_693_1033_749_516}
  1. The product moment correlation coefficient for 'Value for money' and 'Cleanliness' for the sample of 17 Cardiff hotels is 0.895. Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether this correlation is significant. State your conclusion in context.
  2. The hotel owner also wishes to investigate whether 'Value for money' has a significant correlation with 'Cost per night'. He used a statistical analysis package which provided the following output which includes the Pearson correlation coefficient of interest and the corresponding \(p\)-value.
    | Value for money | Cost per night
    Value for money | 1 |
    Cost per night | 0.047 \(( 0.859 )\) | 1
    Comment on the correlation between 'Value for money' and 'Cost per night'.
OCR FS1 AS 2021 June Q2
6 marks Moderate -0.8
2 In the manufacture of fibre optical cable (FOC), flaws occur randomly. Whether any point on a cable is flawed is independent of whether any other point is flawed. The number of flaws in 100 m of FOC of standard diameter is denoted by \(X\).
  1. State a further assumption needed for \(X\) to be well modelled by a Poisson distribution. Assume now that \(X\) can be well modelled by the distribution \(\operatorname { Po } ( 0.7 )\).
  2. Find the probability that in 300 m of FOC of standard diameter there are exactly 3 flaws. The number of flaws in 100 m of FOC of a larger diameter has the distribution \(\mathrm { Po } ( 1.6 )\).
  3. Find the probability that in 200 m of FOC of standard diameter and 100 m of FOC of the larger diameter the total number of flaws is at least 4.
Judith believes that mathematical ability and chess-playing ability are related. She asks 20 randomly chosen chess players, with known British Chess Federation (BCF) ratings \(X\), to take a mathematics aptitude test, with scores \(Y\). The results are summarised as follows. $$n = 20 , \Sigma x = 3600 , \Sigma x ^ { 2 } = 660500 , \Sigma y = 1440 , \Sigma y ^ { 2 } = 105280 , \Sigma x y = 260990$$
    1. Calculate the value of Pearson's product-moment correlation coefficient \(r\).
    2. State an assumption needed to be able to carry out a significance test on the value of \(r\).
    3. Assume now that the assumption in part (b) is valid. Test at the \(5 \%\) significance level whether there is evidence that chess players with higher BCF ratings are better at mathematics.
    4. There are two different grading systems for chess players, the BCF system and the international ELO system. The two sets of ratings are related by $$\text { ELO rating } = 8 \times \text { BCF rating } + 650$$ Magnus says that the experiment should have used ELO ratings instead of BCF ratings. Comment on Magnus's suggestion.
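Parts 1, 3 and 4 can be sketched in standard Python. The sketch computes \(r\) from the sums and then rebuilds the sums for the ELO rating \(U = 8X + 650\) directly:

- \(\Sigma u = 8\Sigma x + 650n\)
- \(\Sigma u^2 = 64\Sigma x^2 + 2 \cdot 8 \cdot 650\,\Sigma x + 650^2 n\)
- \(\Sigma uy = 8\Sigma xy + 650\Sigma y\)

This shows that \(r\) is unchanged. The one-tailed 5% critical value 0.3783 is the one quoted in the mark scheme for \(n = 20\).

```python
import math


def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from n, Σx, Σy, Σx², Σy² and Σxy."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)


n = 20
sum_x, sum_x2 = 3600, 660500    # BCF ratings X
sum_y, sum_y2 = 1440, 105280    # maths scores Y
sum_xy = 260990

r_bcf = r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)

# ELO = k * BCF + c: rebuild the sums for U = kX + c from those for X.
k, c = 8, 650
sum_u = k * sum_x + c * n
sum_u2 = k ** 2 * sum_x2 + 2 * k * c * sum_x + c ** 2 * n
sum_uy = k * sum_xy + c * sum_y
r_elo = r_from_sums(n, sum_u, sum_y, sum_u2, sum_y2, sum_uy)

# One-tailed 5% test, H1: rho > 0, critical value as quoted in the mark scheme
reject_h0 = r_bcf > 0.3783

print(f"r (BCF) = {r_bcf:.3f}, r (ELO) = {r_elo:.3f}, reject H0: {reject_h0}")
```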
An environmentalist measures the mean concentration, \(c\) milligrams per litre, of a particular chemical in a group of rivers, and the mean mass, \(m\) pounds, of fish of a certain species found in those rivers. The results are given in the table.
      QuestionAnswerMarksAOGuidance
      1(a)\(\begin{aligned} 0.25 + 0.36 + x + x ^ { 2 } &= 1 \\
      x ^ { 2 } + x - 0.39 &= 0 \\
      x &= 0.3 \text { (or } - 1.3 \text { ) } \\
      & x \text { cannot be negative } \\
      \mathrm { E } ( W ) &= 2.23 \\
      \mathrm { E } \left( W ^ { 2 } \right) &= \Sigma w ^ { 2 } \mathrm { p } ( w ) \quad [ = 5.83 ] \\
      & \text { Subtract } [ \mathrm { E } ( W ) ] ^ { 2 } \text { to get } \mathbf { 0.8571 } \end{aligned}\)\(\begin{gathered} \text { M1 } \\
      \text { A1 } \\
      \text { A1 } \\
      \text { B1ft } \\
      \text { B1 } \\
      \text { M1 } \\
      \text { A1 } \\
      [ 7 ] \end{gathered}\)
      3.1a
      1.1b
      1.1b
      2.3
      1.1b
      1.1
      2.1
      Equation using \(\Sigma p = 1\)
      Correct simplified quadratic
      Correctly obtain \(x = 0.3\)
      Explicitly reject other solution
      2.23 or exact equivalent only
      Use \(\Sigma w ^ { 2 } \mathrm { p } ( w )\)
      Correctly obtain given answer, www
      Can be implied
      Method needed
      ft on their quadratic
      Allow for \(\mathrm { E } ( W ) ^ { 2 } = 4.9729\)
      Need 2.23 or 4.9729 and 5.83 or full numerical \(\Sigma w ^ { 2 } \mathrm { p } ( w )\)
      1(b)\(9 \times 0.8571 = 7.7139\)
      B1
      [1]
      1.1b
      Allow 7.71 or 7.714
      2(a)Flaws must occur at constant average rate (uniform rate)
      B1
      [1]
      1.2
      Context (e.g. "flaws") needed
      Extra answers, e.g. "singly": B0
      Not "constant rate" or "average constant rate".
      2(b)\(\mathrm { Po } ( 2.1 )\) or \(e ^ { - \lambda } \frac { \lambda ^ { 3 } } { 3 ! }\)
      M1
      A1
      [2]
      1.1
      1.1b
      Po(2.1) stated or implied, or formula with \(\lambda = 2.1\) stated
      Awrt 0.189
      2(c)
      Po(3)
      \(1 - \mathrm { P } ( \leq 3 )\)
      M1
      M1
      A1
      [3]
      1.1
      1.1
      1.1b
      \(\operatorname { Po } ( 2 \times 0.7 + 1.6 )\) stated or implied
      Allow \(1 - \mathrm { P } ( \leq 4 ) = 0.1847\), or from wrong \(\lambda\)
      Awrt 0.353
      Or all combinations \(\leq 3\)
      \(1 -\) above, not just \(= 3\)
      QuestionAnswerMarksAOGuidance
      3(a)0.4(00)
      B2
      [2]
      1.1
      1.1b
      SC: if B0, give SC B1 for two of \(S _ { x x } = 12500 , S _ { y y } = 1600 , S _ { x y } = 1790\) and \(S _ { x y } / \sqrt { S _ { x x } S _ { y y } }\)
      Also allow SC B1 for equivalent methods using covariance and SDs
      3(b)Data needs to have a bivariate normal distribution
      B1
      [1]
      1.2
      Needs "bivariate normal" or clear equivalent. Not just "both normally distributed"
      Allow "scatter diagram forms ellipse"
      3(c)
      \(\mathrm { H } _ { 0 }\) : higher maths scores are not associated with higher BCF grading; \(\mathrm { H } _ { 1 }\) : positively associated
      CV 0.3783
      \(0.400 > 0.3783\) so reject \(\mathrm { H } _ { 0 }\)
      Significant evidence that higher maths scores are associated with higher BCF grading
      B1
      B1
      M1ft
      A1ft
      [4]
      2.5
      1.1b
      2.2b
      3.5a
      Needs context and clearly one-tailed, OR \(\rho\) used and defined
      Not "evidence that ..."
      Allow 0.378
      Reject/do not reject \(\mathrm { H } _ { 0 }\)
      Contextualised, not too definite
      Needn't say "positive" if \(\mathrm { H } _ { 1 }\) OK
      SC 2-tail: B0; 0.4438, or 0.3783 B1; then M1A0
      \(\mathrm { H } _ { 0 } : \rho = 0 , \mathrm { H } _ { 1 } : \rho > 0\) where \(\rho\) is population pmcc (not \(r\) )
      FT on their \(r\), but not CV
      Not "scores are associated ...". FT on their \(r\) only
      3(d)It makes no difference as this is a linear transformation
      B1
      [1]
      2.2a
      Need both "unchanged" oe and reason; need "linear" or exact equivalent
      "oe" includes "their 0.4"
      4(a) Neither
      B1
      [1]
      2.5 OE. Not "neither is independent of the other"
      4(b) \(c = 2.848 - 0.1567m\)
      B1
      B1
      B1
      [3]
      1.1
      1.1
      1.1
      Correct \(a\), awrt 2.85
      Correct \(b\), awrt 0.157
      Letters correct from correct method
      (If both wrongly rounded, e.g. \(c = 2.84 - 0.156 m\), give B2)
      SC: \(m\) on \(c\):
      \(m = 15.65 - 4.832 c\) : B2
      \(y = 15.65 - 4.832 x\) : B1
      \(c = 15.65 - 4.832m\): B1
      If B0B0, give B1 for correct letters from valid working
      4(c) \(a\) unchanged, \(b\) multiplied by 2.2 (allow "\(a\) unchanged, \(b\) increases", etc.)
      B1
      [1]
      2.2a
      oe, e.g. \(c = 2.848 - 0.345m\); \(m = 7.114 - 2.196c\)
      SC: \(m\) on \(c\) in (b): both divided by 2.2: B1
      4(d)
      Draw approximate line of best fit
      Draw at least one vertical from line to point
      Say that "Best fit" line minimises the sum of squares of these distances
      M1
      M1
      A1
      [3]
      1.1
      2.4
      2.4
      Needs M2 and "minimises" and "sums of squares" oe
      SC: Horizontal(s): full marks (indept of (b))
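As an arithmetic check on rows 3(a), 3(c) and 3(d) of the scheme above, the Python sketch below does three things. It recomputes \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\) from the summary values quoted in the special-case guidance. It compares the result with the critical value 0.3783. It also shows why a linear transformation leaves \(r\) unchanged. The helper name `pmcc_from_s` and the scale factors are illustrative assumptions, not part of the mark scheme.

```python
import math

def pmcc_from_s(s_xx, s_yy, s_xy):
    """Pearson's r from the summary quantities S_xx, S_yy and S_xy."""
    return s_xy / math.sqrt(s_xx * s_yy)

# 3(a): summary values quoted in the special-case guidance
s_xx, s_yy, s_xy = 12500, 1600, 1790
r = pmcc_from_s(s_xx, s_yy, s_xy)
print(round(r, 3))  # 0.4

# 3(c): one-tailed comparison with the critical value quoted in the scheme
critical_value = 0.3783
print("reject H0" if r > critical_value else "do not reject H0")  # reject H0

# 3(d): coding x -> k*x + c and y -> m*y + d with k, m > 0 scales
# S_xx by k**2, S_yy by m**2 and S_xy by k*m, so r is unchanged.
k, m = 2.5, 0.1  # hypothetical positive scale factors
r_coded = pmcc_from_s(k ** 2 * s_xx, m ** 2 * s_yy, k * m * s_xy)
print(math.isclose(r, r_coded))  # True
```

The invariance holds only for positive multipliers. A negative multiplier on one variable flips the sign of \(r\) but leaves its magnitude unchanged.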
OCR FS1 AS 2021 June Q3
5 marks Moderate -0.3
3 Sixteen candidates took an examination paper in mechanics and an examination paper in statistics.
  1. For all sixteen candidates, the value of the product moment correlation coefficient \(r\) for the marks on the two papers was 0.701 correct to 3 significant figures. Test whether there is evidence, at the \(5 \%\) significance level, of association between the marks on the two papers.
  2. A teacher decided to omit the marks of the candidates who were in the top three places in mechanics and the candidates who were in the bottom three places in mechanics. The marks for the remaining 10 candidates can be summarised by \(n = 10 , \Sigma x = 750 , \Sigma y = 690 , \Sigma x ^ { 2 } = 57690 , \Sigma y ^ { 2 } = 49676 , \Sigma x y = 50829\).
    1. Calculate the value of \(r\) for these 10 candidates.
    2. What do the two values of \(r\), in parts (a) and (b)(i), tell you about the scores of the sixteen candidates?

A bag contains a mixture of blue and green beads, in unknown proportions. The proportion of green beads in the bag is denoted by \(p\).
  1. Sasha selects 10 beads at random, with replacement. Write down an expression, in terms of \(p\), for the variance of the number of green beads Sasha selects.
  2. Freda selects one bead at random from the bag, notes its colour, and replaces it in the bag. She continues to select beads in this way until a green bead is selected. The first green bead is the \(X\)th bead that Freda selects. Assume that \(p = 0.3\). Find
    1. \(\mathrm{P}(X \geqslant 5)\),
    2. \(\operatorname{Var}(X)\).
  3. In fact, on the basis of a large number of observations of \(X\), it is found that \(\mathrm{P}(X = 3) = \frac{4}{25} \times \mathrm{P}(X = 1)\). Estimate the value of \(p\).
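For part (b)(i) of the sixteen-candidate question in this entry, \(r\) can be found from the raw sums. The summary quantities are \(S_{xx} = \Sigma x^2 - (\Sigma x)^2/n\), \(S_{yy} = \Sigma y^2 - (\Sigma y)^2/n\) and \(S_{xy} = \Sigma xy - \Sigma x \Sigma y/n\). Below is a minimal Python sketch of that arithmetic. The helper name is hypothetical.

```python
import math

def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from n and the raw sums of x, y, x^2, y^2 and xy."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Summary for the 10 remaining candidates in part (b)
r10 = pmcc_from_sums(10, 750, 690, 57690, 49676, 50829)
print(round(r10, 3))  # -0.534
```

The intermediate values are \(S_{xx} = 1440\), \(S_{yy} = 2066\) and \(S_{xy} = -921\). A negative \(r\) for the middle ten, against 0.701 for all sixteen, is the contrast that part (b)(ii) refers to.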
OCR FS1 AS 2021 June Q1
5 marks Moderate -0.3
1 Five observations of bivariate data \(( x , y )\) are given in the table.
| \(x\) | 7 | 8 | 12 | 6 | 4 |
|---|---|---|---|---|---|
| \(y\) | 20 | 16 | 7 | 17 | 23 |
  1. Find the value of Pearson's product-moment correlation coefficient.
  2. State what your answer to part (a) tells you about a scatter diagram representing the data.
  3. A new variable \(a\) is defined by \(a = 3x + 4\). Dee says "The value of Pearson's product-moment correlation coefficient between \(a\) and \(y\) will not be the same as the answer to part (a)." State with a reason whether you agree with Dee.
    An investor obtains data about the profits of 8 randomly chosen investment accounts over two one-year periods. The profit in the first year for each account is \(p\%\) and the profit in the second year for each account is \(q\%\). The results are shown in the table and in the scatter diagram.
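    | Account | A | B | C | D | E | F | G | H |
    |---|---|---|---|---|---|---|---|---|
    | \(p\) | 1.6 | 2.1 | 2.4 | 2.7 | 2.8 | 3.3 | 5.2 | 8.4 |
    | \(q\) | 1.6 | 2.3 | 2.2 | 2.2 | 3.1 | 2.9 | 7.6 | 4.8 |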
    \(n = 8 \quad \Sigma p = 28.5 \quad \Sigma q = 26.7 \quad \Sigma p ^ { 2 } = 136.35 \quad \Sigma q ^ { 2 } = 116.35 \quad \Sigma p q = 116.70\) \includegraphics[max width=\textwidth, alt={}, center]{4c7546b9-03ee-47a1-915f-41e2b4ca19c0-03_762_1248_906_260}
    1. State which, if either, of the variables \(p\) and \(q\) is independent.
    2. Calculate the equation of the regression line of \(q\) on \(p\).
      1. Use the regression line to estimate the value of \(q\) for an investment account for which \(p = 2.5\).
      2. Give two reasons why this estimate could be considered reliable.
    3. Comment on the reliability of using the regression line to predict the value of \(q\) when \(p = 7.0\).
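The calculations in this entry can be checked with a short Python sketch. It assumes the five-observation table reads \(x = 7, 8, 12, 6, 4\) and \(y = 20, 16, 7, 17, 23\). The extracted table ran the digits together, so this split is a reconstruction. The account data are used exactly as tabulated. The helper names are hypothetical. The deviation-from-mean form used here is algebraically the same as the \(\Sigma\)-based formulas.

```python
import math

def s_stats(xs, ys):
    """Return (S_xx, S_yy, S_xy) using deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy

def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    s_xx, s_yy, s_xy = s_stats(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def regression_y_on_x(xs, ys):
    """Least squares line y = intercept + slope * x."""
    s_xx, _, s_xy = s_stats(xs, ys)
    slope = s_xy / s_xx
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return intercept, slope

# Five observations (assumed reading of the flattened table)
x = [7, 8, 12, 6, 4]
y = [20, 16, 7, 17, 23]
r_xy = pmcc(x, y)
r_ay = pmcc([3 * xi + 4 for xi in x], y)  # a = 3x + 4
print(round(r_xy, 3), math.isclose(r_xy, r_ay))  # -0.954 True

# Investor accounts: regression of q on p
p = [1.6, 2.1, 2.4, 2.7, 2.8, 3.3, 5.2, 8.4]
q = [1.6, 2.3, 2.2, 2.2, 3.1, 2.9, 7.6, 4.8]
intercept, slope = regression_y_on_x(p, q)
q_at_2_5 = intercept + slope * 2.5
print(round(intercept, 3), round(slope, 3), round(q_at_2_5, 2))  # 1.129 0.62 2.68
```

The transformation \(a = 3x + 4\) is linear with a positive multiplier. So \(r\) between \(a\) and \(y\) equals \(r\) between \(x\) and \(y\), which bears on Dee's claim. The fitted line is roughly \(q = 1.13 + 0.620p\).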
OCR Further Statistics 2021 June Q2
12 marks Standard +0.3
2 A book collector compared the prices of some books, \(\pounds x\), when new in 1972 and the prices of copies of the same books, \(\pounds y\), on a second-hand website in 2018.
The results are shown in Table 1 and are summarised below the table. \begin{table}[h]
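\begin{tabular}{|l|cccccccccccc|}
\hline
Book & A & B & C & D & E & F & G & H & I & J & K & L \\
\hline
\(x\) & 0.95 & 0.65 & 0.70 & 0.90 & 0.55 & 1.40 & 1.50 & 0.50 & 1.15 & 0.35 & 0.20 & 0.35 \\
\(y\) & 6.06 & 7.00 & 2.00 & 5.87 & 4.00 & 5.36 & 7.19 & 2.50 & 3.00 & 8.29 & 1.37 & 2.00 \\
\hline
\end{tabular}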
\captionsetup{labelformat=empty} \caption{Table 1}
\end{table} $$n = 12 , \Sigma x = 9.20 , \Sigma y = 54.64 , \Sigma x ^ { 2 } = 8.9950 , \Sigma y ^ { 2 } = 310.4572 , \Sigma x y = 46.0545$$
  1. It is given that the value of Pearson's product-moment correlation coefficient for the data is 0.381, correct to 3 significant figures.
    1. State what this information tells you about a scatter diagram illustrating the data.
    2. Test at the \(5 \%\) significance level whether there is evidence of positive correlation between prices in 1972 and prices in 2018.
  2. The collector noticed that the second-hand copy of book J was unusually expensive and he decided to ignore the data for book J. Calculate the value of Pearson's product-moment correlation coefficient for the other 11 books.
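For part 2, one approach avoids re-entering the data. Subtract book J's contributions (\(x = 0.35\), \(y = 8.29\)) from each summary sum and recompute \(r\) with \(n = 11\). Below is a Python sketch of this approach. It assumes the sums given below Table 1 are exact. The helper name is hypothetical.

```python
import math

def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from n and the raw sums of x, y, x^2, y^2 and xy."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)

# All 12 books, from the summary below Table 1
r12 = pmcc_from_sums(12, 9.20, 54.64, 8.9950, 310.4572, 46.0545)

# Drop book J (x = 0.35, y = 8.29) by removing its contribution to each sum
xj, yj = 0.35, 8.29
r11 = pmcc_from_sums(11, 9.20 - xj, 54.64 - yj,
                     8.9950 - xj ** 2, 310.4572 - yj ** 2, 46.0545 - xj * yj)

print(round(r12, 3), round(r11, 3))  # 0.381 0.65
```

The sketch computes the full-data value first and reproduces the 0.381 quoted in part 1. That is a quick check that the sums are being used correctly before book J is removed.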
Edexcel S1 2024 October Q2
Moderate -0.8
2 A biologist records the length, \(y\) cm, and the weight, \(w\) kg, of 50 rabbits. The following summary statistics are calculated from these data.
$$\sum y = 2015 \quad \sum y^2 = 81938.5 \quad \sum w = 125 \quad \mathrm{S}_{ww} = 72.25 \quad \mathrm{S}_{yw} = 219.55$$
  1. Show that \(\mathrm{S}_{yy} = 734\).
  2. Calculate the product moment correlation coefficient for these data. Give your answer to 3 decimal places.
  3. Interpret your value of the product moment correlation coefficient.
  4. The biologist believes that a linear regression model may be appropriate to describe these data. State, with a reason, whether or not your value of the product moment correlation coefficient is consistent with the biologist's belief.
  5. Find the equation of the regression line of \(w\) on \(y\), giving your answer in the form \(w = a + by\).
  6. Jeff has a pet rabbit of length 45 cm. Use your regression equation to estimate the weight of Jeff's rabbit.
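The arithmetic for this question can be sketched in Python from the summary statistics alone. It uses \(\mathrm{S}_{yy} = \sum y^2 - (\sum y)^2/n\), \(r = \mathrm{S}_{yw}/\sqrt{\mathrm{S}_{yy}\mathrm{S}_{ww}}\), \(b = \mathrm{S}_{yw}/\mathrm{S}_{yy}\) and \(a = \bar{w} - b\bar{y}\). Variable names are illustrative. The rounding shown is for display only.

```python
import math

# Summary statistics given in the question
n = 50
sum_y, sum_y2, sum_w = 2015, 81938.5, 125
s_ww, s_yw = 72.25, 219.55

# S_yy from the raw sums
s_yy = sum_y2 - sum_y ** 2 / n

# Product moment correlation coefficient
r = s_yw / math.sqrt(s_yy * s_ww)

# Least squares regression of w on y: w = a + b*y
b = s_yw / s_yy
a = sum_w / n - b * sum_y / n

# Estimated weight (kg) for a rabbit of length 45 cm
w_45 = a + b * 45

print(s_yy, round(r, 3))                          # 734.0 0.953
print(round(a, 2), round(b, 3), round(w_45, 2))   # -9.55 0.299 3.91
```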