2.05f Pearson correlation coefficient

27 questions

OCR S1 2005 June Q4
9 marks Moderate -0.3
4 The table shows the latitude, \(x\) (in degrees correct to 3 significant figures), and the average rainfall \(y\) (in cm correct to 3 significant figures) of five European cities.
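$$\begin{array}{l|c|c}
\text{City} & x & y \\
\hline
\text{Berlin} & 52.5 & 58.2 \\
\text{Bucharest} & 44.4 & 58.7 \\
\text{Moscow} & 55.8 & 53.3 \\
\text{St Petersburg} & 60.0 & 47.8 \\
\text{Warsaw} & 52.3 & 56.6
\end{array}$$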
$$\left[ n = 5 , \Sigma x = 265.0 , \Sigma y = 274.6 , \Sigma x ^ { 2 } = 14176.54 , \Sigma y ^ { 2 } = 15162.22 , \Sigma x y = 14464.10 . \right]$$
  1. Calculate the product moment correlation coefficient.
  2. The values of \(y\) in the table were in fact obtained from measurements in inches and converted into centimetres by multiplying by 2.54. State what effect it would have had on the value of the product moment correlation coefficient if it had been calculated using inches instead of centimetres.
  3. It is required to estimate the annual rainfall at Bergen, where \(x = 60.4\). Calculate the equation of an appropriate line of regression, giving your answer in simplified form, and use it to find the required estimate.
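Parts 1 and 2 can be checked numerically. The following is a minimal Python sketch, assuming the usual textbook identities \(S_{xx} = \Sigma x^2 - (\Sigma x)^2/n\), \(S_{yy} = \Sigma y^2 - (\Sigma y)^2/n\), \(S_{xy} = \Sigma xy - \Sigma x \Sigma y/n\) and \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\). The function name `pmcc_from_sums` is illustrative only and does not come from the question.

```python
from math import sqrt

def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product moment correlation coefficient from summary statistics."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)

# Summary statistics quoted in the question (y in centimetres).
n, sum_x, sum_y = 5, 265.0, 274.6
sum_x2, sum_y2, sum_xy = 14176.54, 15162.22, 14464.10

r_cm = pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)

# Working in inches divides every y by 2.54, so the y-sums rescale accordingly.
k = 2.54
r_inches = pmcc_from_sums(n, sum_x, sum_y / k, sum_x2, sum_y2 / k ** 2, sum_xy / k)

print(round(r_cm, 3))      # -0.868
print(round(r_inches, 3))  # -0.868
```

The two values agree because a positive linear rescaling of one variable scales \(S_{xy}\) and \(\sqrt{S_{yy}}\) by the same factor.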
Edexcel S1 2015 January Q3
9 marks Moderate -0.8
  1. The table shows the price of a bottle of milk, \(m\) pence, and the price of a loaf of bread, \(b\) pence, for 8 different years.
$$\begin{array}{c|cccccccc}
m & 29 & 29 & 35 & 39 & 41 & 43 & 44 & 46 \\
\hline
b & 75 & 83 & 91 & 121 & 120 & 126 & 119 & 126
\end{array}$$
(You may use \(\mathrm { S } _ { b b } = 3083.875\) and \(\mathrm { S } _ { m m } = 305.5\) )
  1. Find the exact value of \(\sum b m\)
  2. Find \(\mathrm { S } _ { b m }\)
  3. Calculate the product moment correlation coefficient between \(b\) and \(m\)
  4. Interpret the value of the correlation coefficient.
A ninth year is added to the data set. In this year the price of the bottle of milk is 46 pence and the price of a loaf of bread is 175 pence.
  5. Without further calculation, state whether the value of the product moment correlation coefficient will increase, decrease or stay the same when all nine years are used. Give a reason for your answer.
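As a rough check of parts 1 to 3, the sketch below computes \(\sum bm\) and \(\mathrm{S}_{bm}\) from the eight tabulated pairs and then uses the quoted values of \(\mathrm{S}_{bb}\) and \(\mathrm{S}_{mm}\). It assumes the standard definition \(\mathrm{S}_{bm} = \sum bm - \sum b \sum m / n\); the variable names are illustrative.

```python
from math import sqrt

m = [29, 29, 35, 39, 41, 43, 44, 46]        # milk price (pence)
b = [75, 83, 91, 121, 120, 126, 119, 126]   # bread price (pence)
n = len(m)

sum_bm = sum(bi * mi for bi, mi in zip(b, m))   # exact integer arithmetic
s_bm = sum_bm - sum(b) * sum(m) / n

# S_bb and S_mm are the values the question says may be used.
s_bb, s_mm = 3083.875, 305.5
r = s_bm / sqrt(s_bb * s_mm)

print(sum_bm)       # 33856
print(s_bm)         # 922.75
print(round(r, 3))  # 0.951
```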
Edexcel Paper 3 2022 June Q6
9 marks Standard +0.3
6. Anna is investigating the relationship between exercise and resting heart rate. She takes a random sample of 19 people in her year at school and records for each person
  • their resting heart rate, \(h\) beats per minute
  • the number of minutes, \(m\), spent exercising each week
Her results are shown on the scatter diagram.
[Scatter diagram]
  1. Interpret the nature of the relationship between \(h\) and \(m\).
Anna codes the data using the formulae $$\begin{aligned} & x = \log _ { 10 } m \\ & y = \log _ { 10 } h \end{aligned}$$ The product moment correlation coefficient between \(x\) and \(y\) is \(-0.897\)
  2. Test whether or not there is significant evidence of a negative correlation between \(x\) and \(y\) You should
    The equation of the line of best fit of \(y\) on \(x\) is $$y = - 0.05 x + 1.92$$
  3. Use the equation of the line of best fit of \(y\) on \(x\) to find a model for \(h\) on \(m\) in the form $$h = a m ^ { k }$$ where \(a\) and \(k\) are constants to be found.
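The change of variables in part 3 can be sketched in Python. Since \(y = \log_{10} h\) and \(x = \log_{10} m\), the line \(y = -0.05x + 1.92\) rearranges to \(h = 10^{1.92} m^{-0.05}\). The helper `h_model` and the test value of \(m\) are hypothetical and only serve to check the algebra.

```python
from math import log10, isclose

# Line of best fit quoted in the question: y = -0.05 x + 1.92,
# with x = log10(m) and y = log10(h).
gradient, intercept = -0.05, 1.92

# log10(h) = intercept + gradient * log10(m)  =>  h = 10**intercept * m**gradient
a = 10 ** intercept
k = gradient

def h_model(m):
    """Model for h in the form a * m**k."""
    return a * m ** k

# Sanity check at an arbitrary (hypothetical) value of m.
m_test = 120.0
assert isclose(log10(h_model(m_test)), gradient * log10(m_test) + intercept)

print(round(a, 1), k)  # 83.2 -0.05
```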
Edexcel Paper 3 2024 June Q2
6 marks Moderate -0.3
  1. Amar is studying the flight of a bird from its nest.
He measures the bird's height above the ground, \(h\) metres, at time \(t\) seconds for 10 values of \(t\) Amar finds the equation of the regression line for the data to be \(h = 38.6 - 1.28 t\)
  1. Interpret the gradient of this line.
The product moment correlation coefficient between \(h\) and \(t\) is \(-0.510\)
  2. Test whether or not there is evidence of a negative correlation between the height above the ground and the time during the flight.
    You should
    • state your hypotheses clearly
    • use a \(5 \%\) level of significance
    • state the critical value used
    Jane draws the following scatter diagram for Amar's data.
    [Scatter diagram]
  3. With reference to the scatter diagram, state, giving a reason, whether or not the regression line \(h = 38.6 - 1.28 t\) is an appropriate model for these data.
    Jane suggests an improved model using the variable \(u = ( t - k ) ^ { 2 }\) where \(k\) is a constant.
    She obtains the equation \(h = 38.1 - 0.78 u\)
  4. Choose a suitable value for \(k\) to write Jane's improved model for \(h\) in terms of \(t\) only.
OCR MEI AS Paper 2 2022 June Q8
11 marks Moderate -0.3
8 In 2018 research showed that 81\% of young adults in England had never donated blood.
Following an advertising campaign in 2021, it is believed that the percentage of young adults in England who had never donated blood in 2021 is less than \(81 \%\). Ling decides to carry out a hypothesis test at the 5\% level.
Ling collects data from a random sample of 400 young adults in England.
  1. State the null and alternative hypotheses for the test, defining the parameter used.
  2. Write down the probability that the null hypothesis is rejected when it should in fact be accepted.
  3. Assuming the null hypothesis is correct, calculate the expected number of young adults in the sample who had never donated blood.
  4. Calculate the probability that there were no more than 308 young adults who had never donated blood in the sample.
  5. Determine the critical region for the test.
In fact, the sample contained 314 young adults who had never donated blood.
  6. Carry out the test, giving the conclusion in the context of the question.
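The calculations in parts 3 to 6 can be sketched with exact binomial probabilities. This is a minimal illustration, assuming the count \(X\) of young adults who had never donated blood is modelled as \(\mathrm{B}(400, 0.81)\) under the null hypothesis and that the test is lower-tailed. The helper names `binom_pmf` and `binom_cdf` are not from the question.

```python
from math import comb

n, p0 = 400, 0.81   # sample size and the proportion under the null hypothesis

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), by direct summation."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

expected = n * p0                  # expected count under H0
p_le_308 = binom_cdf(308, n, p0)   # P(X <= 308)

# Lower-tail critical region at the 5% level: largest c with P(X <= c) <= 0.05.
cumulative, c = 0.0, -1
for k in range(n + 1):
    cumulative += binom_pmf(k, n, p0)
    if cumulative > 0.05:
        break
    c = k

reject_h0 = 314 <= c   # is the observed count inside the critical region?
print(expected, p_le_308, c, reject_h0)
```

The critical region is then \(X \leqslant c\), and the observed value 314 is compared with it.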
OCR MEI AS Paper 2 2023 June Q13
6 marks Moderate -0.3
13 In a report published in October 2021 it is stated that \(37 \%\) of adults in the United Kingdom never exercise or play sport. A researcher believes that the true percentage is less than this. They decide to carry out a hypothesis test at the \(5 \%\) level to investigate the claim.
  1. State the null and alternative hypotheses for their test.
  2. Define the parameter for their test.
In a random sample of 118 adults, they find that 35 of them never exercise or play sport.
  3. Carry out the test.
Edexcel S3 2017 June Q7
12 marks Standard +0.3
7. The independent random variables \(X\) and \(Y\) are such that $$X \sim \mathrm {~N} \left( 30,4.5 ^ { 2 } \right) \text { and } Y \sim \mathrm {~N} \left( 20,3.5 ^ { 2 } \right)$$ The random variables \(X _ { 1 } , X _ { 2 }\) and \(X _ { 3 }\) are independent and each has the same distribution as \(X\). The random variables \(Y _ { 1 }\) and \(Y _ { 2 }\) are independent and each has the same distribution as \(Y\). Given that the random variable \(A\) is defined as $$A = \frac { X _ { 1 } + X _ { 2 } + X _ { 3 } + Y _ { 1 } + Y _ { 2 } } { 5 }$$
  1. find \(\mathrm { P } ( A < 24 )\)
The random variable \(W\) is such that \(W \sim \mathrm {~N} \left( \mu , 2.8 ^ { 2 } \right)\) Given that \(\mathrm { P } ( W - X < 4 ) = 0.1\) and that \(W\) and \(X\) are independent,
  2. find the value of \(\mu\), giving your answer to 3 significant figures.
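Both parts rest on the fact that a linear combination of independent Normal variables is Normal, with means adding and variances adding after squaring the coefficients. A short sketch using the standard library's `statistics.NormalDist` (Python 3.8 or later) follows; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# A is the mean of three copies of X ~ N(30, 4.5^2) and two of Y ~ N(20, 3.5^2).
mean_a = (3 * 30 + 2 * 20) / 5
var_a = (3 * 4.5 ** 2 + 2 * 3.5 ** 2) / 5 ** 2
p_a = NormalDist(mean_a, sqrt(var_a)).cdf(24)   # P(A < 24)

# W - X ~ N(mu - 30, 2.8^2 + 4.5^2), and P(W - X < 4) = 0.1 fixes mu:
# (4 - (mu - 30)) / sd_diff = z, where z is the lower 10% point of N(0, 1).
sd_diff = sqrt(2.8 ** 2 + 4.5 ** 2)
z = NormalDist().inv_cdf(0.1)
mu = 34 - z * sd_diff

print(round(p_a, 3))  # 0.139
print(round(mu, 1))   # 40.8
```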
AQA S1 2012 January Q2
3 marks Moderate -0.8
2 Dr Hanna has a special clinic for her older patients. She asked a medical student, Lenny, to select a random sample of 25 of her male patients, aged between 55 and 65 years, and, from their clinical records, to list their heights, weights and waist measurements. Lenny was then asked to calculate three values of the product moment correlation coefficient based upon his collected data. His results were:
  1. 0.365 between height and waist measurement;
  2. 1.16 between height and weight;
  3. \(-0.583\) between weight and waist measurement.
For each of Lenny's three calculated values, state whether the value is definitely correct, probably correct, probably incorrect or definitely incorrect.
AQA S1 2013 January Q4
12 marks Moderate -0.3
4 Ashok is a work-experience student with an organisation that offers two separate professional examination papers, I and II. For each of a random sample of 12 students, A to L, he records the mark, \(x\) per cent, achieved on Paper I, and the mark, \(y\) per cent, achieved on Paper II.
$$\begin{array}{c|cccccccccccc}
 & \mathbf{A} & \mathbf{B} & \mathbf{C} & \mathbf{D} & \mathbf{E} & \mathbf{F} & \mathbf{G} & \mathbf{H} & \mathbf{I} & \mathbf{J} & \mathbf{K} & \mathbf{L} \\
\hline
\boldsymbol{x} & 34 & 46 & 53 & 62 & 67 & 72 & 60 & 54 & 70 & 71 & 82 & 85 \\
\boldsymbol{y} & 61 & 66 & 72 & 78 & 88 & 81 & 49 & 60 & 54 & 44 & 49 & 36
\end{array}$$
    1. Calculate the value of the product moment correlation coefficient, \(r\), between \(x\) and \(y\).
    2. Interpret your value of \(r\) in the context of this question.
    1. Give two possible advantages of plotting data on a graph before calculating the value of a product moment correlation coefficient.
    2. Complete the plotting of Ashok's data on the scatter diagram on page 5.
    3. State what is now revealed by the scatter diagram.
  1. Ashok subsequently discovers that students A to F have a more scientific background than students G to L. With reference to your scatter diagram, estimate the value of the product moment correlation coefficient for each of the two groups of students. You are not expected to calculate the two values.
    $$\begin{array}{c|cccccc}
     & \mathbf{G} & \mathbf{H} & \mathbf{I} & \mathbf{J} & \mathbf{K} & \mathbf{L} \\
    \hline
    \boldsymbol{x} & 60 & 54 & 70 & 71 & 82 & 85 \\
    \boldsymbol{y} & 49 & 60 & 54 & 44 & 49 & 36
    \end{array}$$
    [Scatter diagram: Examination Marks]
AQA S1 2007 June Q1
5 marks Moderate -0.8
1 The table shows the length, in centimetres, and maximum diameter, in centimetres, of each of 10 honeydew melons selected at random from those on display at a market stall.
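$$\begin{array}{l|cccccccccc}
\text{Length} & 24 & 25 & 19 & 28 & 27 & 21 & 35 & 23 & 32 & 26 \\
\hline
\text{Maximum diameter} & 18 & 14 & 16 & 11 & 13 & 14 & 12 & 16 & 15 & 14
\end{array}$$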
  1. Calculate the value of the product moment correlation coefficient.
  2. Interpret your value in the context of this question.
AQA S1 2008 June Q3
10 marks Easy -1.3
3 [Figure 1, printed on the insert, is provided for use in this question.]
The table shows, for each of a sample of 12 handmade decorative ceramic plaques, the length, \(x\) millimetres, and the width, \(y\) millimetres.
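$$\begin{array}{c|c|c}
\text{Plaque} & \boldsymbol{x} & \boldsymbol{y} \\
\hline
\text{A} & 232 & 109 \\
\text{B} & 235 & 112 \\
\text{C} & 236 & 114 \\
\text{D} & 234 & 118 \\
\text{E} & 230 & 117 \\
\text{F} & 230 & 113 \\
\text{G} & 246 & 121 \\
\text{H} & 240 & 125 \\
\text{I} & 244 & 128 \\
\text{J} & 241 & 122 \\
\text{K} & 246 & 126 \\
\text{L} & 245 & 123
\end{array}$$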
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
  2. Interpret your value in the context of this question.
  3. On Figure 1, complete the scatter diagram for these data.
  4. In fact, the 6 plaques \(\mathrm { A } , \mathrm { B } , \ldots , \mathrm { F }\) are from a different source to the 6 plaques \(\mathrm { G } , \mathrm { H } , \ldots , \mathrm { L }\). With reference to your scatter diagram, but without further calculations, estimate the value of the product moment correlation coefficient between \(x\) and \(y\) for each source of plaque.
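Parts 1 and 4 can be explored numerically. The sketch below computes the coefficient for all 12 plaques and then separately for plaques A to F and G to L, as a check on estimates read from the scatter diagram (the question itself asks for estimates without further calculation). The function `pmcc` is an illustrative helper using the deviations-from-the-mean form of the formula.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient from paired raw data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Lengths x and widths y (mm) for plaques A to L, in table order.
x = [232, 235, 236, 234, 230, 230, 246, 240, 244, 241, 246, 245]
y = [109, 112, 114, 118, 117, 113, 121, 125, 128, 122, 126, 123]

r_all = pmcc(x, y)
r_a_to_f = pmcc(x[:6], y[:6])   # first source
r_g_to_l = pmcc(x[6:], y[6:])   # second source

print(round(r_all, 3))  # 0.807
print(r_a_to_f, r_g_to_l)
```

Within each source the coefficient comes out close to zero, so the strong overall value reflects the difference between the two groups rather than a relationship within either one.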
OCR MEI Further Statistics A AS 2020 November Q2
12 marks Standard +0.3
2 A researcher is investigating the concentration of bacteria and fungi in the air in buildings. The researcher selects a random sample of 12 buildings and measures the concentrations of bacteria, \(x\), and fungi, \(y\), in the air in each building. Both concentrations are measured in the same standard units. Fig. 2 illustrates the data collected. The researcher wishes to test for a relationship between \(x\) and \(y\).
[Fig. 2]
  1. Explain why a test based on the product moment correlation coefficient is likely to be appropriate for these data.
Summary statistics for the data are as follows. $$n = 12 \quad \sum x = 18030 \quad \sum y = 15550 \quad \sum x ^ { 2 } = 31458700 \quad \sum y ^ { 2 } = 21980500 \quad \sum x y = 25626800$$
  2. In this question you must show detailed reasoning. Calculate the product moment correlation coefficient between \(x\) and \(y\).
  3. Carry out a test at the \(5 \%\) significance level based on the product moment correlation coefficient to investigate whether there is any correlation between concentrations of bacteria and fungi.
  4. Explain why, in order for proper inference to be undertaken, the sample should be chosen randomly.
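Parts 2 and 3 can be sketched as follows. The critical value here is not read from a correlation table. Instead it is derived under the assumption that, when the population correlation is zero and the data are bivariate Normal, \(r\sqrt{(n-2)/(1-r^2)}\) follows a \(t\)-distribution with \(n-2\) degrees of freedom. The figure 2.228 is the tabulated two-tailed 5% point of \(t\) with 10 degrees of freedom.

```python
from math import sqrt

n = 12
sum_x, sum_y = 18030, 15550
sum_x2, sum_y2, sum_xy = 31458700, 21980500, 25626800

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / sqrt(s_xx * s_yy)

# Assumption: t-distribution result for r under H0 (see the note above).
# Rearranging t = r * sqrt((n - 2) / (1 - r**2)) gives the critical |r|.
t_crit = 2.228
r_crit = t_crit / sqrt(t_crit ** 2 + (n - 2))

print(round(r, 3))       # 0.8
print(round(r_crit, 3))  # 0.576
print(abs(r) > r_crit)   # True
```

In an examination the critical value would normally be taken directly from the supplied table; the derivation above is only one way to reproduce it.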
OCR MEI Further Statistics A AS 2021 November Q3
9 marks Standard +0.3
3 A student is investigating the link between temperature (in degrees Celsius) and electricity consumption (in Gigawatt-hours) in the country in which he lives. The student has read that there is strong negative correlation between daily mean temperature over the whole country and daily electricity consumption during a year. He wonders if this applies to an individual season. He therefore obtains data on the mean temperature and electricity consumption on ten randomly selected days in the summer. The spreadsheet output below shows the data, together with a scatter diagram to illustrate the data.
[Spreadsheet output and scatter diagram]
  1. Calculate Pearson's product moment correlation coefficient between daily mean temperature and daily electricity consumption.
The student decides to carry out a hypothesis test to investigate whether there is negative correlation between daily mean temperature and daily electricity consumption during the summer.
  2. Explain why the student decides to carry out a test based on Pearson's product moment correlation coefficient.
  3. Show that the test at the \(5 \%\) significance level does not result in the null hypothesis being rejected.
  4. The student concludes that there is no correlation between the variables in the summer months. Comment on the student's conclusion.
Edexcel S1 Q2
9 marks Moderate -0.8
  1. Plot a scatter diagram showing these data.
The student wanted to investigate further whether or not her data provided evidence of an increase in temperature in June each year. Using \(Y\) for the number of years since 1993 and \(T\) for the mean temperature, she calculated the following summary statistics. $$\Sigma Y = 28 , \quad \Sigma T = 182.5 , \quad \Sigma Y ^ { 2 } = 140 , \quad \Sigma T ^ { 2 } = 4173.93 , \quad \Sigma Y T = 644.7 .$$
  2. Calculate the product moment correlation coefficient for these data.
  3. Comment on your result in relation to the student's enquiry.
OCR H240/02 2018 June Q13
12 marks Standard +0.8
13 In this question you must show detailed reasoning. The probability that Paul's train to work is late on any day is 0.15, independently of other days.
  1. The number of days on which Paul's train to work is late during a 450-day period is denoted by the random variable \(Y\). Find a value of \(a\) such that \(\mathrm { P } ( Y > a ) \approx \frac { 1 } { 6 }\).
In the expansion of \(( 0.15 + 0.85 ) ^ { 50 }\), the terms involving \(0.15 ^ { r }\) and \(0.15 ^ { r + 1 }\) are denoted by \(T _ { r }\) and \(T _ { r + 1 }\) respectively.
  2. Show that \(\frac { T _ { r } } { T _ { r + 1 } } = \frac { 17 ( r + 1 ) } { 3 ( 50 - r ) }\).
  3. The number of days on which Paul's train to work is late during a 50-day period is modelled by the random variable \(X\).
    1. Find the values of \(r\) for which \(\mathrm { P } ( X = r ) \leqslant \mathrm { P } ( X = r + 1 )\).
    2. Hence find the most likely number of days on which the train will be late during a 50-day period.
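Parts 2 and 3 can be verified with exact rational arithmetic. This sketch assumes \(X \sim \mathrm{B}(50, 0.15)\), so that \(T_r = \mathrm{P}(X = r)\). The helper `term` is illustrative and is not a substitute for the algebraic argument the question requires.

```python
from fractions import Fraction
from math import comb

n = 50
p, q = Fraction(15, 100), Fraction(85, 100)

def term(r):
    """T_r: the term of (0.15 + 0.85)^50 involving 0.15^r, i.e. P(X = r)."""
    return comb(n, r) * p ** r * q ** (n - r)

# Check the stated ratio exactly for every valid r.
assert all(term(r) / term(r + 1) == Fraction(17 * (r + 1), 3 * (n - r))
           for r in range(n))

# P(X = r) <= P(X = r + 1) exactly when T_r / T_{r+1} <= 1.
increasing_r = [r for r in range(n) if term(r) <= term(r + 1)]
mode = max(range(n + 1), key=term)

print(increasing_r)  # [0, 1, 2, 3, 4, 5, 6]
print(mode)          # 7
```

The inequality \(17(r+1) \leqslant 3(50-r)\) simplifies to \(20r \leqslant 133\), which matches the list printed above.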
Edexcel S1 2017 October Q5
13 marks Moderate -0.8
  1. A company wants to pay its employees according to their performance at work. Last year's performance score \(x\) and annual salary \(y\), in thousands of dollars, were recorded for a random sample of 10 employees of the company.
The performance scores were $$\begin{array} { l l l l l l l l l l } 15 & 24 & 32 & 39 & 41 & 18 & 16 & 22 & 34 & 42 \end{array}$$ (You may use \(\sum x ^ { 2 } = 9011\))
  1. Find the mean and the variance of these performance scores.
The corresponding \(y\) values for these 10 employees are summarised by $$\sum y = 306.1 \quad \text { and } \quad \mathrm { S } _ { y y } = 546.3$$
  2. Find the mean and the variance of these \(y\) values.
The regression line of \(y\) on \(x\) based on this sample is $$y = 12.0 + 0.659 x$$
  3. Find the product moment correlation coefficient for these data.
  4. State, giving a reason, whether or not the value of the product moment correlation coefficient supports the use of a regression line to model the relationship between performance score and annual salary.
The company decides to use this regression model to determine future salaries.
  5. Find the proposed annual salary, in dollars, for an employee who has a performance score of 35
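The chain of calculations in this question can be sketched in Python. It assumes the variance is taken with divisor \(n\) (so variance \(= \mathrm{S}_{xx}/n\)) and recovers \(\mathrm{S}_{xy}\) from the regression gradient via \(b = \mathrm{S}_{xy}/\mathrm{S}_{xx}\). Because the gradient 0.659 is quoted to 3 significant figures, the resulting \(r\) is approximate.

```python
from math import sqrt

x = [15, 24, 32, 39, 41, 18, 16, 22, 34, 42]   # performance scores
n = len(x)

mean_x = sum(x) / n
var_x = sum(v ** 2 for v in x) / n - mean_x ** 2   # sum of x^2 is 9011
s_xx = n * var_x

sum_y, s_yy = 306.1, 546.3                          # summaries given in the question
mean_y = sum_y / n
var_y = s_yy / n

# Regression gradient b = S_xy / S_xx, so S_xy = b * S_xx.
b = 0.659
s_xy = b * s_xx
r = s_xy / sqrt(s_xx * s_yy)

salary_dollars = (12.0 + b * 35) * 1000             # y is in thousands of dollars

print(mean_x, round(var_x, 2))   # 28.3 100.21
print(round(r, 2))               # 0.89
print(round(salary_dollars))     # 35065
```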
Edexcel S1 2021 October Q2
12 marks Moderate -0.5
2. A large company is analysing how much money it spends on paper in its offices each year. The number of employees in the office, \(x\), and the amount spent on paper in a year, \(p\) (\$ hundreds), in each of 12 randomly selected offices were recorded. The results are summarised in the following statistics. $$\sum x = 93 \quad \mathrm {~S} _ { x x } = 148.25 \quad \sum p = 273 \quad \sum p ^ { 2 } = 6602.72 \quad \sum x p = 2347$$
  1. Show that \(\mathrm { S } _ { x p } = 231.25\)
  2. Find the product moment correlation coefficient for these data.
  3. Find the equation of the regression line of \(p\) on \(x\) in the form \(p = a + b x\)
  4. Give an interpretation of the gradient of your regression line.
The director of the company wants to reduce the amount spent on paper each year. He wants each office to aim for a model of the form \(p = \frac { 4 } { 5 } a + \frac { 1 } { 2 } b x\), where \(a\) and \(b\) are the values found in part (c). Using the data for the 93 employees from the 12 offices,
  5. estimate the percentage saving in the amount spent on paper each year by the company using the director's model.
Edexcel S1 2003 June Q3
10 marks Moderate -0.8
3. A company owns two petrol stations \(P\) and \(Q\) along a main road. Total daily sales in the same week for \(P ( \pounds p )\) and for \(Q ( \pounds q )\) are summarised in the table below.
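$$\begin{array}{l|c|c}
 & p & q \\
\hline
\text{Monday} & 4760 & 5380 \\
\text{Tuesday} & 5395 & 4460 \\
\text{Wednesday} & 5840 & 4640 \\
\text{Thursday} & 4650 & 5450 \\
\text{Friday} & 5365 & 4340 \\
\text{Saturday} & 4990 & 5550 \\
\text{Sunday} & 4365 & 5840
\end{array}$$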
When these data are coded using \(x = \frac { p - 4365 } { 100 }\) and \(y = \frac { q - 4340 } { 100 }\), $$\Sigma x = 48.1 , \Sigma y = 52.8 , \Sigma x ^ { 2 } = 486.44 , \Sigma y ^ { 2 } = 613.22 \text { and } \Sigma x y = 204.95 .$$
  1. Calculate \(S _ { x y } , S _ { x x }\) and \(S _ { y y }\).
  2. Calculate, to 3 significant figures, the value of the product moment correlation coefficient between \(x\) and \(y\).
    1. Write down the value of the product moment correlation coefficient between \(p\) and \(q\).
    2. Give an interpretation of this value.
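The effect of the coding can be checked directly from the table. This sketch computes the coefficient once on the coded values \(x, y\) and once on the raw sales \(p, q\); the helper `pmcc` is illustrative and uses the summary-statistic form of the formula.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient from paired raw data."""
    n = len(xs)
    s_xy = sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(a * a for a in xs) - sum(xs) ** 2 / n
    s_yy = sum(b * b for b in ys) - sum(ys) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)

# Daily sales (pounds), Monday to Sunday.
p = [4760, 5395, 5840, 4650, 5365, 4990, 4365]
q = [5380, 4460, 4640, 5450, 4340, 5550, 5840]

# Coding used in the question.
x = [(v - 4365) / 100 for v in p]
y = [(v - 4340) / 100 for v in q]

r_xy = pmcc(x, y)
r_pq = pmcc(p, q)

print(round(r_xy, 3))  # -0.862
print(round(r_pq, 3))  # -0.862
```

Subtracting a constant and dividing by a positive constant leaves the coefficient unchanged, which is why the two printed values coincide.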
Edexcel Paper 3 2018 June Q2
7 marks Standard +0.3
  1. Tessa owns a small clothes shop in a seaside town. She records the weekly sales figures, \(\pounds w\), and the average weekly temperature, \(t ^ { \circ } \mathrm { C }\), for 8 weeks during the summer.
    The product moment correlation coefficient for these data is \(-0.915\)
    1. Stating your hypotheses clearly and using a \(5 \%\) level of significance, test whether or not the correlation between sales figures and average weekly temperature is negative.
    2. Suggest a possible reason for this correlation.
    Tessa suggests that a linear regression model could be used to model these data.
  2. State, giving a reason, whether or not the correlation coefficient is consistent with Tessa's suggestion.
  3. State, giving a reason, which variable would be the explanatory variable.
Tessa calculated the linear regression equation as \(w = 10755 - 171 t\)
  4. Give an interpretation of the gradient of this regression equation.
Edexcel Paper 3 Specimen Q2
6 marks Standard +0.3
  1. A meteorologist believes that there is a relationship between the daily mean windspeed, \(w \mathrm { kn }\), and the daily mean temperature, \(t ^ { \circ } \mathrm { C }\). A random sample of 9 consecutive days is taken from past records from a town in the UK in July and the relevant data is given in the table below.
$$\begin{array}{c|ccccccccc}
\boldsymbol{t} & 13.3 & 16.2 & 15.7 & 16.6 & 16.3 & 16.4 & 19.3 & 17.1 & 13.2 \\
\hline
\boldsymbol{w} & 7 & 11 & 8 & 11 & 13 & 8 & 15 & 10 & 11
\end{array}$$
The meteorologist calculated the product moment correlation coefficient for the 9 days and obtained \(r = 0.609\)
  1. Explain why a linear regression model based on these data is unreliable on a day when the mean temperature is \(24 ^ { \circ } \mathrm { C }\)
  2. State what is measured by the product moment correlation coefficient.
  3. Stating your hypotheses clearly, test, at the \(5 \%\) significance level, whether or not the product moment correlation coefficient for the population is greater than zero.
Using the same 9 days, a location from the large data set gave \(\bar { t } = 27.2\) and \(\bar { w } = 3.5\)
  4. Using your knowledge of the large data set, suggest, giving your reason, the location that gave rise to these statistics.
Edexcel Paper 3 Specimen Q2
7 marks Moderate -0.3
2. A researcher believes that there is a linear relationship between daily mean temperature and daily total rainfall. The 7 places in the northern hemisphere from the large data set are used. The mean of the daily mean temperatures, \(t ^ { \circ } \mathrm { C }\), and the mean of the daily total rainfall, \(s \mathrm {~mm}\), for the month of July in 2015 are shown on the scatter diagram below.
[Scatter diagram]
  1. With reference to the scatter diagram, explain why a linear regression model may not be suitable for the relationship between \(t\) and \(s\).
    (1)
The researcher calculated the product moment correlation coefficient for the 7 places and obtained \(r = 0.658\).
  2. Stating your hypotheses clearly, test at the \(10 \%\) level of significance, whether or not the product moment correlation coefficient for the population is greater than zero.
    (3)
  3. Using your knowledge of the large data set, suggest the names of the 2 places labelled \(G\) and \(H\).
    (1)
  4. Using your knowledge from the large data set, and with reference to the locations of the two places labelled \(G\) and \(H\), give a reason why these places have the highest temperatures in July.
    (2)
  5. Suggest how you could make better use of the large data set to investigate the relationship between daily mean temperature and daily total rainfall.
    (1)
    (Total 7 marks)
WJEC Unit 4 Specimen Q5
7 marks Moderate -0.3
5. A hotel owner in Cardiff is interested in what factors hotel guests think are important when staying at a hotel. From a hotel booking website he collects the ratings for 'Cleanliness', 'Location', 'Comfort' and 'Value for money' for a random sample of 17 Cardiff hotels.
(Each rating is the average of all scores awarded by guests who have contributed reviews using a scale from 1 to 10, where 10 is 'Excellent'.) The scatter graph shows the relationship between 'Value for money' and 'Cleanliness' for the sample of Cardiff hotels.
[Scatter graph]
  1. The product moment correlation coefficient for 'Value for money' and 'Cleanliness' for the sample of 17 Cardiff hotels is 0.895. Stating your hypotheses clearly, test, at the \(5 \%\) level of significance, whether this correlation is significant. State your conclusion in context.
  2. The hotel owner also wishes to investigate whether 'Value for money' has a significant correlation with 'Cost per night'. He used a statistical analysis package which provided the following output which includes the Pearson correlation coefficient of interest and the corresponding \(p\)-value.
    $$\begin{array}{l|c|c}
     & \text{Value for money} & \text{Cost per night} \\
    \hline
    \text{Value for money} & 1 & \\
    \text{Cost per night} & 0.047 \; ( 0.859 ) & 1
    \end{array}$$
    Comment on the correlation between 'Value for money' and 'Cost per night'.
OCR FS1 AS 2021 June Q3
5 marks Moderate -0.3
3 Sixteen candidates took an examination paper in mechanics and an examination paper in statistics.
  1. For all sixteen candidates, the value of the product moment correlation coefficient \(r\) for the marks on the two papers was 0.701 correct to 3 significant figures. Test whether there is evidence, at the \(5 \%\) significance level, of association between the marks on the two papers.
  2. A teacher decided to omit the marks of the candidates who were in the top three places in mechanics and the candidates who were in the bottom three places in mechanics. The marks for the remaining 10 candidates can be summarised by \(n = 10 , \Sigma x = 750 , \Sigma y = 690 , \Sigma x ^ { 2 } = 57690 , \Sigma y ^ { 2 } = 49676 , \Sigma x y = 50829\).
    1. Calculate the value of \(r\) for these 10 candidates.
    2. What do the two values of \(r\), in parts (a) and (b)(i), tell you about the scores of the sixteen candidates?
A bag contains a mixture of blue and green beads, in unknown proportions. The proportion of green beads in the bag is denoted by \(p\).
      1. Sasha selects 10 beads at random, with replacement. Write down an expression, in terms of \(p\), for the variance of the number of green beads Sasha selects.
Freda selects one bead at random from the bag, notes its colour, and replaces it in the bag. She continues to select beads in this way until a green bead is selected. The first green bead is the \(X\) th bead that Freda selects.
      2. Assume that \(p = 0.3\). Find
        1. \(\mathrm { P } ( X \geqslant 5 )\),
        2. \(\operatorname { Var } ( X )\).
    3. In fact, on the basis of a large number of observations of \(X\), it is found that \(\mathrm { P } ( X = 3 ) = \frac { 4 } { 25 } \times \mathrm { P } ( X = 1 )\). Estimate the value of \(p\).
Pre-U Pre-U 9794/3 2015 June Q1
5 marks Moderate -0.8
1 The information below summarises the percentages of males unemployed ( \(x\) ) and the percentages of females unemployed ( \(y\) ) in 10 different locations in the UK. $$n = 10 \quad \Sigma x = 87.6 \quad \Sigma x ^ { 2 } = 804.34 \quad \Sigma y = 76.4 \quad \Sigma y ^ { 2 } = 596 \quad \Sigma x y = 684.02$$ Find the product-moment correlation coefficient for these data.
Edexcel S3 2016 June Q3
Moderate -0.3
  1. Describe when you would use Spearman's rank correlation coefficient rather than the product moment correlation coefficient to measure the strength of the relationship between two variables. (1)
A shop sells sunglasses and ice cream. For one week in the summer the shopkeeper ranked the daily sales of ice cream and sunglasses. The ranks are shown in the table below.
    $$\begin{array}{l|ccccccc}
     & \text{Sun} & \text{Mon} & \text{Tues} & \text{Weds} & \text{Thurs} & \text{Fri} & \text{Sat} \\
    \hline
    \text{Ice cream} & 6 & 4 & 7 & 5 & 3 & 2 & 1 \\
    \text{Sunglasses} & 6 & 5 & 7 & 2 & 3 & 4 & 1
    \end{array}$$
  2. Calculate Spearman's rank correlation coefficient for these data. (3)
  3. Test, at the 5\% level of significance, whether or not there is a positive correlation between sales of ice cream and sales of sunglasses. State your hypotheses clearly. (4)
The shopkeeper calculates the product moment correlation coefficient from his raw data and finds \(r = 0.65\)
  4. Using this new coefficient, test, at the 5\% level of significance, whether or not there is a positive correlation between sales of ice cream and sales of sunglasses. (2)
  5. Using your answers to part (c) and part (d), comment on the nature of the relationship between sales of sunglasses and sales of ice cream. (1)
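The rank calculation in part 2 can be sketched as follows, assuming the usual formula for untied ranks, \(r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}\), where \(d\) is the difference between the two ranks for each day. The variable names are illustrative.

```python
ice_cream = [6, 4, 7, 5, 3, 2, 1]    # ranks, Sunday to Saturday
sunglasses = [6, 5, 7, 2, 3, 4, 1]
n = len(ice_cream)

# With no tied ranks, r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)).
sum_d2 = sum((a - b) ** 2 for a, b in zip(ice_cream, sunglasses))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2, r_s)  # 14 0.75
```

The value obtained would then be compared with the tabulated critical value of Spearman's coefficient for \(n = 7\) in part 3.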