Bivariate data

140 questions · 21 question types identified

Calculate r from summary statistics

Questions that provide pre-calculated summary statistics (such as Σx, Σy, Σx², Σy², Σxy, or Sxx, Syy, Sxy) and ask to calculate r using these given values.

31 Moderate -0.5
22.1% of questions
Show example »
1 The information below summarises the percentages of males unemployed ( \(x\) ) and the percentages of females unemployed ( \(y\) ) in 10 different locations in the UK. $$n = 10 \quad \Sigma x = 87.6 \quad \Sigma x ^ { 2 } = 804.34 \quad \Sigma y = 76.4 \quad \Sigma y ^ { 2 } = 596 \quad \Sigma x y = 684.02$$ Find the product-moment correlation coefficient for these data.
View full question →
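A minimal Python sketch of this type of calculation, using the totals quoted in the example above. The function name `pmcc_from_sums` is illustrative only; it applies the standard identities \(S_{xx} = \Sigma x^2 - (\Sigma x)^2 / n\), \(S_{yy} = \Sigma y^2 - (\Sigma y)^2 / n\) and \(S_{xy} = \Sigma xy - \Sigma x \Sigma y / n\).

```python
from math import sqrt


def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product-moment correlation coefficient from raw totals."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)


# Totals quoted in the unemployment example (n = 10 locations).
r = pmcc_from_sums(10, 87.6, 76.4, 804.34, 596, 684.02)
print(round(r, 3))
```

With these totals the coefficient comes out at roughly 0.69, a moderate positive correlation. When a question supplies \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\) directly, only the final line of the function is needed.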
Easiest question Easy -1.2 »
  1. Gary compared the total attendance, \(x\), at home matches and the total number of goals, \(y\), scored at home during a season for each of 12 football teams playing in a league. He correctly calculated:
$$S _ { x x } = 1022500 \quad S _ { y y } = 130.9 \quad S _ { x y } = 8825$$
  1. Calculate the product moment correlation coefficient for these data.
  2. Interpret the value of the correlation coefficient.
Helen was given the same data to analyse. In view of the large numbers involved she decided to divide the attendance figures by 100. She then calculated the product moment correlation coefficient between \(\frac { x } { 100 }\) and \(y\).
  3. Write down the value Helen should have obtained.
View full question →
Hardest question Standard +0.8 »
6 The discrete random variable \(X\) has a uniform distribution over \(\{ n , n + 1 , \ldots , 2 n \}\).
  1. Given that \(n\) is odd, find \(\mathrm { P } \left( X < \frac { 3 } { 2 } n \right)\).
  2. Given instead that \(n\) is even, find \(\mathrm { P } \left( X < \frac { 3 } { 2 } n \right)\), giving your answer as a single algebraic fraction.
  3. The sum of 6 independent values of \(X\) is denoted by \(Y\). Find \(\operatorname { Var } ( Y )\).
View full question →
Calculate r from raw bivariate data

Questions that provide raw paired data values (x, y) in a table and ask to calculate the product moment correlation coefficient r, requiring computation of all summary statistics from scratch.

21 Moderate -0.5
15.0% of questions
Show example »
3 Fourteen candidates each sat two test papers, Paper 1 and Paper 2, on the same day. The marks, out of a total of 50, achieved by the students on each paper are shown in the table.
View full question →
Easiest question Easy -1.2 »
1 For each of a random sample of 10 customers, a store records the time, \(x\) minutes, spent shopping and the value, \(\pounds y\), to the nearest 10 p, of items purchased. The results are tabulated below.
Time (x)1345109172316216
Value (y)12.55.72.318.47.917.117.918.68.321.3
    1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
    2. Interpret your value in context.
  1. Write down the value of the product moment correlation coefficient if the time had been recorded in seconds and the value in pence to the nearest 10p.
View full question →
Hardest question Standard +0.3 »
3 Samples of water are taken from 10 randomly chosen wells in an area of a country. A researcher is investigating whether there is any relationship between the levels of dissolved oxygen, \(x\), and the amounts of radium, \(y\), in the water from the wells. Both quantities are measured in suitable units. The table and the scatter diagram in Fig. 3 show the values of \(x\) and \(y\) for the ten wells.
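$$\begin{array} { c | c c c c c c c c c c } x & 45.9 & 48.3 & 52.2 & 64.6 & 66.6 & 67.6 & 69.3 & 75.0 & 77.4 & 82.8 \\ y & 25.4 & 23.9 & 26.6 & 18.8 & 18.9 & 19.0 & 16.8 & 16.3 & 17.8 & 17.2 \end{array}$$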
\begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{e3ac0ba0-9692-4018-894e-2b04b07eaf32-3_865_786_657_635} \captionsetup{labelformat=empty} \caption{Fig. 3}
\end{figure}
  1. Explain why it may not be appropriate to carry out a hypothesis test based on the product moment correlation coefficient.
  2. Calculate Spearman's rank correlation coefficient for these data.
  3. Using this value of Spearman's rank correlation coefficient, carry out a hypothesis test at the 1\% significance level to investigate whether there is any association between \(x\) and \(y\).
  4. Explain the meaning of the term 'significance level' in the context of the test carried out in part (iii).
View full question →
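For the Spearman part of the well-water question above, a short Python sketch may help. It assumes there are no tied values (true for the ten wells listed), so the shortcut \(r_s = 1 - \frac{6 \Sigma d^2}{n(n^2 - 1)}\) applies. The helper names `ranks` and `spearman` are illustrative.

```python
def ranks(values):
    """Rank from 1 (smallest) upwards; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's coefficient via 1 - 6*sum(d^2) / (n(n^2 - 1)); no ties."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Dissolved oxygen (x) and radium (y) for the ten wells in the question.
x = [45.9, 48.3, 52.2, 64.6, 66.6, 67.6, 69.3, 75.0, 77.4, 82.8]
y = [25.4, 23.9, 26.6, 18.8, 18.9, 19.0, 16.8, 16.3, 17.8, 17.2]
r_s = spearman(x, y)
print(round(r_s, 3))
```

Here \(\Sigma d^2 = 300\), so the coefficient is about \(-0.82\). The hypothesis test then compares its magnitude with the tabulated critical value for \(n = 10\) at the 1% level.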
Interpret census or real-world data

A question is this type if and only if it asks to analyze or interpret bivariate relationships in census data, population data, or other real-world datasets with multiple Local Authorities or regions.

10 Easy -1.1
7.1% of questions
Show example »
This question deals with information about the populations of Local Authorities (LAs) in the North of England, taken from the 2011 census. \includegraphics{figure_6} Fig. 1 and Fig. 2 both show strong correlation, but of two different kinds.
  1. For each diagram, use a single word to describe the kind of correlation shown. [1]
  2. For each diagram, suggest a reason, in context, why the correlation is of the particular kind described in part (a). [2]
Fig. 3 is the same as Fig. 2 but with the point \(A\) marked. Fig. 4 shows information about the same LAs as Fig. 2 and Fig. 3. \includegraphics{figure_7}
  1. Point \(A\) in Fig. 3 and point \(B\) in Fig. 4 represent the same LA. Explain how you can tell that this LA has a large population. [1]
View full question →
Easiest question Easy -2.3 »
This question deals with information about the populations of Local Authorities (LAs) in the North of England, taken from the 2011 census. \includegraphics{figure_6} Fig. 1 and Fig. 2 both show strong correlation, but of two different kinds.
  1. For each diagram, use a single word to describe the kind of correlation shown. [1]
  2. For each diagram, suggest a reason, in context, why the correlation is of the particular kind described in part (a). [2]
Fig. 3 is the same as Fig. 2 but with the point \(A\) marked. Fig. 4 shows information about the same LAs as Fig. 2 and Fig. 3. \includegraphics{figure_7}
  1. Point \(A\) in Fig. 3 and point \(B\) in Fig. 4 represent the same LA. Explain how you can tell that this LA has a large population. [1]
View full question →
Hardest question Moderate -0.5 »
13 The table and the four scatter diagrams below show data taken from the 2011 UK census for four regions. On the scatter diagrams the names have been replaced by letters.
The table shows, for each region, the mean and standard deviation of the proportion of workers in each Local Authority who travel to work by driving a car or van and the proportion of workers in each Local Authority who travel to work as a passenger in a car or van.
Each scatter diagram shows, for each of the Local Authorities in a particular region, the proportion of workers who travel to work by driving a car or van and the proportion of workers who travel to work as a passenger in a car or van.
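$$\begin{array} { l | c c | c c } \text{Region} & \text{Driving: mean} & \text{Driving: standard deviation} & \text{Passenger: mean} & \text{Passenger: standard deviation} \\ \text{London} & 0.257 & 0.133 & 0.017 & 0.008 \\ \text{South East} & 0.578 & 0.064 & 0.045 & 0.010 \\ \text{South West} & 0.580 & 0.084 & 0.049 & 0.007 \\ \text{Wales} & 0.644 & 0.045 & 0.068 & 0.015 \end{array}$$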
Region A \includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-14_634_1116_1308_299} Region B \includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-14_636_1109_2049_301} \includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-15_737_1183_237_240} \includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-15_723_1169_1046_246}
  1. Using the values given in the table, match each region to its corresponding scatter diagram, explaining your reasoning.
  2. Steven claims that the outlier in the scatter diagram for Region C consists of a group of small islands. Explain whether or not the data given above support his claim.
  3. One of the Local Authorities in Region B consists of a single large island. Explain whether or not you would expect this Local Authority to appear as an outlier in the scatter diagram for Region B.
View full question →
Hypothesis test for correlation

A question is this type if and only if it asks to perform a formal hypothesis test to determine if correlation is significant (positive, negative, or non-zero).

8 Moderate -0.1
5.7% of questions
Show example »
2. A shopper estimates the cost, \(\pounds X\) per item, of each of 12 items in a supermarket. The shopper's estimates are compared with the actual cost, \(\pounds Y\) per item, of each item. The results are summarised as follows. \(n = 12\) \(\sum x ^ { 2 } = 28127\) \(\sum x = 399\) \(\Sigma y ^ { 2 } = 116509.0212\) \(\Sigma y = 623.88\) \(\sum x y = 45006.01\) Test at the \(1 \%\) significance level whether the shopper's estimates are positively correlated with the actual cost of the items.
View full question →
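A hedged Python sketch of how a test like the one in this example might be carried out. The critical value is an assumption: it is the one-tailed 1% figure for \(n = 12\) as read from a standard table of product moment correlation coefficient critical values, and should be checked against the table supplied with the exam.

```python
from math import sqrt

# Totals quoted in the example (n = 12 items).
n = 12
sum_x, sum_x2 = 399, 28127
sum_y, sum_y2 = 623.88, 116509.0212
sum_xy = 45006.01

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / sqrt(s_xx * s_yy)

# ASSUMPTION: one-tailed 1% critical value for n = 12, taken from a
# standard PMCC table; verify against your own formula booklet.
CRITICAL_VALUE = 0.6581

# H0: rho = 0, H1: rho > 0. Reject H0 when r exceeds the critical value.
reject_h0 = r > CRITICAL_VALUE
print(round(r, 3), reject_h0)
```

Under that assumed critical value, \(r \approx 0.686\) lies in the critical region, so there would be evidence at the 1% level of positive correlation between the estimates and the actual costs.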
Calculate summary statistics (Sxx, Syy, Sxy)

A question is this type if and only if it asks to calculate the summary statistics Sxx, Syy, or Sxy from raw data or other given statistics.

8 Moderate -1.0
5.7% of questions
Show example »
  1. The volume of a sample of gas is kept constant. The gas is heated and the pressure, \(p\), is measured at 10 different temperatures, \(t\). The results are summarised below. \(\sum p = 445 \quad \sum p ^ { 2 } = 38125 \quad \sum t = 240 \quad \sum t ^ { 2 } = 27520 \quad \sum p t = 26830\)
    1. Find \(\mathrm { S } _ { p p }\) and \(\mathrm { S } _ { p t }\).
    Given that \(\mathrm { S } _ { t t } = 21760\),
  2. calculate the product moment correlation coefficient.
  3. Give an interpretation of your answer to part (b).
View full question →
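As a worked sketch of the identities this question type relies on, using the totals in the example with \(n = 10\):

$$S_{pp} = \sum p^2 - \frac{\left(\sum p\right)^2}{n} = 38125 - \frac{445^2}{10} = 18322.5$$

$$S_{pt} = \sum pt - \frac{\sum p \sum t}{n} = 26830 - \frac{445 \times 240}{10} = 16150$$

$$r = \frac{S_{pt}}{\sqrt{S_{pp} S_{tt}}} = \frac{16150}{\sqrt{18322.5 \times 21760}} \approx 0.809$$

The given value of \(S_{tt}\) is consistent with the same identity, since \(27520 - 240^2 / 10 = 21760\).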
Calculate regression line equation

A question is this type if and only if it asks to find the equation of a regression line (y on x or x on y) from summary statistics.

7 Moderate -0.5
5.0% of questions
Show example »
Twenty pairs of observations are made of two variables \(x\) and \(y\), which are believed to be related. It is found that $$\sum x = 200, \quad \sum y = 174, \quad \sum x^2 = 6201, \quad \sum y^2 = 5102, \quad \sum xy = 5200.$$ Find
  1. the product-moment correlation coefficient between \(x\) and \(y\), [3 marks]
  2. the equation of the regression line of \(y\) on \(x\). [4 marks]
Given that \(p = x + 30\) and \(q = y + 50\),
  1. find the equation of the regression line of \(q\) on \(p\), in the form \(q = mp + c\). [3 marks]
  2. Estimate the value of \(q\) when \(p = 46\), stating any assumptions you make. [3 marks]
View full question →
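A minimal Python sketch of the regression calculation in this example, including the coded variables \(p\) and \(q\). The variable names are illustrative. It uses the least squares results \(b = S_{xy} / S_{xx}\) and \(a = \bar{y} - b \bar{x}\).

```python
# Totals quoted in the example (n = 20 pairs).
n = 20
sum_x, sum_y = 200, 174
sum_x2, sum_xy = 6201, 5200

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Regression line of y on x: y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Coding p = x + 30, q = y + 50 shifts both variables, so the gradient is
# unchanged and only the intercept moves: q = (50 + a - 30 b) + b p.
m = b
c = 50 + a - 30 * b

print(round(a, 3), round(b, 3))   # intercept and gradient of y on x
print(round(c, 2), round(m, 3))   # intercept and gradient of q on p
print(round(c + m * 46, 1))       # estimate of q at p = 46
```

The estimate at \(p = 46\) corresponds to \(x = 16\). Whether it is reliable depends on that value lying within the range of the observed data, which the question does not state.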
Analyze large data set correlations

A question is this type if and only if it specifically uses the large data set to investigate correlations between variables like temperature, rainfall, pressure, etc.

6 Moderate -0.4
4.3% of questions
Show example »
  1. A meteorologist believes that there is a relationship between the daily mean windspeed, \(w \mathrm { kn }\), and the daily mean temperature, \(t ^ { \circ } \mathrm { C }\). A random sample of 9 consecutive days is taken from past records from a town in the UK in July and the relevant data is given in the table below.
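$$\begin{array} { c | c c c c c c c c c } \boldsymbol { t } & 13.3 & 16.2 & 15.7 & 16.6 & 16.3 & 16.4 & 19.3 & 17.1 & 13.2 \\ \boldsymbol { w } & 7 & 11 & 8 & 11 & 13 & 8 & 15 & 10 & 11 \end{array}$$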
The meteorologist calculated the product moment correlation coefficient for the 9 days and obtained \(r = 0.609\)
  1. Explain why a linear regression model based on these data is unreliable on a day when the mean temperature is \(24 ^ { \circ } \mathrm { C }\)
  2. State what is measured by the product moment correlation coefficient.
  3. Stating your hypotheses clearly, test, at the \(5 \%\) significance level, whether or not the product moment correlation coefficient for the population is greater than zero.
Using the same 9 days a location from the large data set gave \(\bar { t } = 27.2\) and \(\bar { w } = 3.5\)
  4. Using your knowledge of the large data set, suggest, giving your reason, the location that gave rise to these statistics.
View full question →
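The quoted coefficient can be checked from the raw data with a short Python sketch. The windspeed row is an assumption: it is read from the flattened table as nine values (7, 11, 8, 11, 13, 8, 15, 10, 11). The function name `pmcc` is illustrative.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product-moment correlation coefficient from paired raw data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


# Daily mean temperature (t) and daily mean windspeed (w) for the 9 days.
t = [13.3, 16.2, 15.7, 16.6, 16.3, 16.4, 19.3, 17.1, 13.2]
w = [7, 11, 8, 11, 13, 8, 15, 10, 11]
r = pmcc(t, w)
print(f"{r:.2f}")
```

The result is about 0.61, in line with the meteorologist's \(r = 0.609\), which supports that reading of the table.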
Interpret correlation coefficient value

A question is this type if and only if it asks to interpret the meaning or context of a given or calculated correlation coefficient value.

5 Moderate -0.9
3.6% of questions
Show example »
8. The lifetimes of bulbs used in a lamp are normally distributed. A company \(X\) sells bulbs with a mean lifetime of 850 hours and a standard deviation of 50 hours.
  1. Find the probability of a bulb, from company \(X\), having a lifetime of less than 830 hours.
  2. In a box of 500 bulbs, from company \(X\), find the expected number having a lifetime of less than 830 hours.
A rival company \(Y\) sells bulbs with a mean lifetime of 860 hours and \(20 \%\) of these bulbs have a lifetime of less than 818 hours.
  3. Find the standard deviation of the lifetimes of bulbs from company \(Y\).
Both companies sell the bulbs for the same price.
  4. State which company you would recommend. Give reasons for your answer.
View full question →
Estimate correlation from scatter diagram

A question is this type if and only if it asks to estimate or identify the approximate value of a correlation coefficient by visual inspection of a scatter diagram.

5 Easy -1.3
3.6% of questions
Show example »
  1. The scatter diagrams below were drawn by a student.
[Scatter diagrams of \(y\) against \(x\) not reproduced.] The student calculated the value of the product moment correlation coefficient for each of the sets of data. The values were $$\begin{array} { l l l } 0.68 & - 0.79 & 0.08 \end{array}$$ Write down, with a reason, which value corresponds to which scatter diagram.
(6)
View full question →
Identify outliers or unusual points

A question is this type if and only if it asks to identify outliers, errors, or unusual data points in bivariate data or scatter diagrams.

5 Easy -1.4
3.6% of questions
Show example »
The table below shows the daily salt intake, \(x\) grams, and the daily Vitamin C intake, \(y\) milligrams, for a group of 10 adults.
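$$\begin{array} { l | c c c c c c c c c c } \text{Adult} & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} & \text{F} & \text{G} & \text{H} & \text{I} & \text{J} \\ x & 5.3 & 6.2 & 3.6 & 10.4 & 2.4 & 9.4 & 6 & 5 & 7.1 & 11.2 \\ y & 90 & 145 & 88 & 48 & 114 & 44 & 80 & 95 & 55 & 41 \end{array}$$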
A scatter diagram of the data is shown below. \includegraphics{figure_3} One of the adults is an outlier. Identify the letter of the adult that is the outlier. Circle your answer below. [1 mark] A \(\qquad\) B \(\qquad\) E \(\qquad\) J
View full question →
Assess appropriateness of correlation analysis

A question is this type if and only if it asks whether correlation analysis is appropriate, sensible, or reliable for a given dataset or context.

5 Moderate -0.7
3.6% of questions
Show example »
  1. (a) Explain briefly what you understand by a statistical model.
    (2 marks)
    A zoologist is analysing data on the weights of adult female otters.
    (b) Name a distribution that you think might be suitable for modelling such data.
    (1 mark)
    (c) Describe two features that you would expect to find in the distribution of the weights of adult female otters and that led to your choice in part (b).
    (2 marks)
    (d) Why might your choice in part (b) not be suitable for modelling the weights of all adult otters?
    (1 mark)
  2. For a geography project a student studied weather records kept by her school since 1993. To see if there was any evidence of global warming she worked out the mean temperature in degrees Celsius at noon for the month of June in each year.
Her results are shown in the table below.
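$$\begin{array} { l | c c c c c c c c } \text{Year} & 1993 & 1994 & 1995 & 1996 & 1997 & 1998 & 1999 & 2000 \\ \text{Mean temperature } \left( { } ^ { \circ } \mathrm { C } \right) & 21.9 & 24.1 & 20.7 & 23.0 & 24.2 & 22.1 & 22.6 & 23.9 \end{array}$$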
View full question →
Draw scatter diagram from data

Question provides numerical data in a table and asks the student to draw or plot a scatter diagram.

5 Easy -1.2
3.6% of questions
Show example »
The Principal of a school believes that more students are absent on days when the temperature is lower. Over a two-week period in December she records the percentage of students who are absent, \(A\%\), and the temperature, \(T°\)C, at 9 am each morning giving these results.
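$$\begin{array} { l | c c c c c c c c c c } T \ ( ^ { \circ } \mathrm { C } ) & 4 & -3 & -2 & -6 & 0 & 3 & 7 & -1 & 3 & 2 \\ A \ ( \% ) & 8.5 & 14.1 & 17.0 & 20.3 & 17.9 & 15.5 & 12.4 & 12.8 & 13.7 & 11.6 \end{array}$$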
  1. Represent these data on a scatter diagram. [4 marks]
You may use $$\Sigma T = 7, \quad \Sigma A = 143.8, \quad \Sigma T^2 = 137, \quad \Sigma A^2 = 2172.66, \quad \Sigma TA = 20.7$$
  1. Calculate the product moment correlation coefficient for these data and comment on the Principal's hypothesis. [6 marks]
  2. Find an equation of the regression line of \(A\) on \(T\) in the form \(A = p + qT\). [4 marks]
  3. Draw the regression line on your scatter diagram. [2 marks]
View full question →
Interpret or describe given scatter diagram

Question provides a scatter diagram (already drawn) and asks the student to describe the relationship, correlation strength, or other features shown.

5 Easy -1.9
3.6% of questions
Show example »
Which of the following options best describes the correlation shown in the diagram below? \includegraphics{figure_10} Tick \((\checkmark)\) one box. [1 mark] moderate positive \(\qquad\) strong positive \(\qquad\) moderate negative \(\qquad\) strong negative
View full question →
Find missing data values

A question is this type if and only if it asks to find missing or unknown data values given regression equations or correlation information.

4 Standard +0.5
2.9% of questions
Show example »
  1. The table below relates the values of two variables \(x\) and \(y\).
    $$\begin{array} { c | c c c c } x & 1 & A & A + 3 & 10 \\ y & 2 & A - 1 & A & 5 \end{array}$$
    \(A\) is a positive integer and \(\sum xy = 92\).
    1. Calculate the value of \(A\). [3]
    2. Explain how you can tell that the product-moment correlation coefficient is 1. [1]
  2. A music society has 300 members. 240 like Puccini, 100 like Wagner and 50 like neither.
    1. Calculate the probability that a member chosen at random likes Puccini but not Wagner. [3]
    2. Calculate the probability that a member chosen at random likes Puccini given that this member likes Wagner. [2]
View full question →
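One way to set up the bivariate part of this example, offered as a sketch rather than a mark-scheme solution, is to write \(\sum xy\) in terms of \(A\):

$$\sum xy = (1)(2) + A(A - 1) + (A + 3)A + (10)(5) = 2A^2 + 2A + 52 = 92$$

$$A^2 + A - 20 = 0 \implies (A + 5)(A - 4) = 0 \implies A = 4 \quad (A > 0)$$

With \(A = 4\) the pairs are \((1, 2)\), \((4, 3)\), \((7, 4)\) and \((10, 5)\). All four lie on the line \(y = \frac{x + 5}{3}\), which has positive gradient, so the product-moment correlation coefficient is exactly 1.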
Identify errors in correlation analysis

A question is this type if and only if it asks to identify impossible, incorrect, or problematic correlation coefficient values or calculations.

3 Easy -1.7
2.1% of questions
Show example »
Which of the following is not a possible value for a product moment correlation coefficient? Circle your answer. [1 mark] \(-\frac{6}{5}\) \(-\frac{3}{5}\) \(0\) \(1\)
View full question →
Distinguish dependent and independent variables

A question is this type if and only if it asks to identify which variable is dependent, independent, controlled, or response in a bivariate context.

3 Easy -1.5
2.1% of questions
Show example »
5 The speed \(v \mathrm {~ms} ^ { - 1 }\) of a car at time \(t\) seconds after it starts to accelerate was measured at 1-second intervals. The results are shown in the following diagram. \includegraphics[max width=\textwidth, alt={}, center]{d5843350-52f9-4fed-adf4-86ceb958033f-3_661_1186_1078_443}
  1. State whether \(t\) or \(v\) or neither is a controlled variable.
The value of the product moment correlation coefficient \(r\) for the data is 0.987 correct to 3 significant figures.
  2. The speed of the car is converted to miles per hour and the time to minutes. State the value of \(r\) for the converted data.
  3. State the value of Spearman's rank correlation coefficient \(r _ { s }\) for the data.
  4. What information does \(r\) give about the data that is not given by \(r _ { s }\) ?
View full question →
Use regression line for prediction

A question is this type if and only if it asks to estimate or predict a value using a regression equation, or assess reliability of such predictions.

2 Moderate -0.8
1.4% of questions
Show example »
1 An experiment involves releasing a coin on a sloping plane so that it slides down the slope and then slides along a horizontal plane at the bottom of the slope before coming to rest. The angle \(\theta ^ { \circ }\) of the sloping plane is varied, and for each value of \(\theta\), the distance \(d \mathrm {~cm}\) the coin slides on the horizontal plane is recorded. A scatter diagram to illustrate the results of the experiment is shown below, together with the least squares regression line of \(d\) on \(\theta\). \includegraphics[max width=\textwidth, alt={}, center]{28c6a0d9-09a6-4743-af0e-fe2e43e256c9-2_639_972_561_548}
  1. State which two of the following correctly describe the variable \(\theta\).
    $$\begin{array} { l l } \text{Controlled variable} & \text{Correlation coefficient} \\ \text{Dependent variable} & \text{Independent variable} \\ \text{Response variable} & \text{Regression coefficient} \end{array}$$
    The least squares regression line of \(d\) on \(\theta\) has equation \(d = 1.96 + 0.11 \theta\).
  2. Use the diagram in the Printed Answer Booklet to explain the term "least squares".
  3. State what difference, if any, it would make to the equation of the regression line if \(d\) were measured in inches rather than centimetres (1 inch \(\approx 2.54 \mathrm {~cm}\)).
View full question →
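As a sketch of the unit change in the last part of this example, write \(d_{\text{in}}\) (a symbol introduced here for illustration) for the distance in inches. Dividing the whole equation by 2.54 rescales both coefficients:

$$d_{\text{in}} = \frac{d}{2.54} = \frac{1.96}{2.54} + \frac{0.11}{2.54} \theta \approx 0.772 + 0.0433 \theta$$

Both the intercept and the gradient are divided by the same factor, because only the response variable has changed units.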
Compare correlation coefficients

A question is this type if and only if it asks to compare two or more correlation coefficients (e.g., Pearson vs Spearman, or different datasets) or explain differences.

2 Standard +0.3
1.4% of questions
Show example »
3 A sample of bivariate data was taken and the results were summarised as follows. $$n = 5 \quad \Sigma x = 24 \quad \Sigma x ^ { 2 } = 130 \quad \Sigma y = 39 \quad \Sigma y ^ { 2 } = 361 \quad \Sigma x y = 212$$
  1. Show that the value of the product moment correlation coefficient \(r\) is 0.855, correct to 3 significant figures.
  2. The ranks of the data were found. One student calculated Spearman's rank correlation coefficient \(r _ { s }\), and found that \(r _ { s } = 0.7\). Another student calculated the product moment coefficient, \(R\), of these ranks. State which one of the following statements is true, and explain your answer briefly.
    (A) \(R = 0.855\) (B) \(R = 0.7\) (C) It is impossible to give the value of \(R\) without carrying out a calculation using the original data.
  3. All the values of \(x\) are now multiplied by a scaling factor of 2. State the new values of \(r\) and \(r _ { s }\).
View full question →
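The relationship between \(r _ { s }\) and the product moment coefficient of the ranks can be illustrated with a small Python sketch. The ranks below are hypothetical, chosen only so that \(r _ { s } = 0.7\) with \(n = 5\); the question does not give the original data. The sketch assumes no tied ranks.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product-moment correlation coefficient from paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient via 1 - 6*sum(d^2) / (n(n^2 - 1)); no ties."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


# HYPOTHETICAL ranks for five pairs (not the data behind the question).
rank_x = [1, 2, 3, 4, 5]
rank_y = [1, 2, 5, 3, 4]

r_s = spearman_from_ranks(rank_x, rank_y)
R = pmcc(rank_x, rank_y)
print(round(r_s, 3), round(R, 3))
```

Both values agree: when there are no ties, Spearman's coefficient is the product moment coefficient applied to the ranks. Multiplying every \(x\) by 2 changes neither the ranks nor the strength of the linear relationship, so neither coefficient changes.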
Effect of data transformation on correlation

A question is this type if and only if it asks about how linear transformations or coding of variables affects correlation or regression.

2 Moderate -0.6
1.4% of questions
Show example »
Two students, Olive and Shan, collect data on the weight, \(w\) grams, and the tail length, \(t\) cm, of 15 mice. Olive summarised the data as follows: $$S_{tt} = 5.3173 \quad \sum w^2 = 6089.12 \quad \sum tw = 2304.53 \quad \sum w = 297.8 \quad \sum t = 114.8$$
  1. Calculate the value of \(S_{ww}\) and the value of \(S_{tw}\) [3]
  2. Calculate the value of the product moment correlation coefficient between \(w\) and \(t\) [2]
  3. Show that the equation of the regression line of \(w\) on \(t\) can be written as $$w = -16.7 + 4.77t$$ [3]
  4. Give an interpretation of the gradient of the regression line. [1]
  5. Explain why it would not be appropriate to use the regression line in part (c) to estimate the weight of a mouse with a tail length of 2cm. [2]
Shan decided to code the data using \(x = t - 6\) and \(y = \frac{w}{2} - 5\)
  1. Write down the value of the product moment correlation coefficient between \(x\) and \(y\) [1]
  2. Write down an equation of the regression line of \(y\) on \(x\) You do not need to simplify your equation. [1]
View full question →
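A short Python sketch of why coding leaves the correlation coefficient unchanged, using the totals in this example. The variable names are illustrative. The coded statistics follow from the fact that adding a constant does not change any of \(S_{xx}\), \(S_{yy}\) or \(S_{xy}\), while multiplying one variable by \(k\) multiplies its own sum of squares by \(k^2\) and the cross term by \(k\).

```python
from math import sqrt

# Totals quoted in the example (n = 15 mice).
n = 15
s_tt = 5.3173
sum_w, sum_w2 = 297.8, 6089.12
sum_t, sum_tw = 114.8, 2304.53

s_ww = sum_w2 - sum_w ** 2 / n
s_tw = sum_tw - sum_t * sum_w / n
r_tw = s_tw / sqrt(s_tt * s_ww)

# Coding x = t - 6, y = w/2 - 5: the shifts drop out of every S-statistic,
# and the factor 1/2 on w scales S_yy by 1/4 and S_xy by 1/2.
s_xx = s_tt
s_yy = s_ww / 4
s_xy = s_tw / 2
r_xy = s_xy / sqrt(s_xx * s_yy)

print(round(r_tw, 3), round(r_xy, 3))
```

Both coefficients come out at about 0.827. The regression gradient does change under this coding: it is halved, because only \(w\) is rescaled.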
Sketch theoretical scatter diagram

Question asks the student to sketch a scatter diagram that would illustrate a specified correlation coefficient or theoretical relationship, without providing actual data.

2 Moderate -0.8
1.4% of questions
Show example »
  1. Draw two separate scatter diagrams, each with eight points, to illustrate the relationship between \(x\) and \(y\) in the cases where they have a product moment correlation coefficient equal to
    1. exactly \(+1\),
    2. about \(-0.4\). [4 marks]
  2. Explain briefly how the conclusion you would draw from a product moment correlation coefficient of \(+0.3\) would vary according to the number of pairs of data used in its calculation. [2 marks]
View full question →
Calculate Spearman's rank correlation

A question is this type if and only if it asks to calculate or use Spearman's rank correlation coefficient rather than Pearson's.

1 Moderate -0.3
0.7% of questions
Show example »
  1. A personnel manager wants to find out if a test carried out during an employee's interview and a skills assessment at the end of basic training is a guide to performance after working for the company for one year.
The table below shows the results of the interview test of 10 employees and their performance after one year.
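$$\begin{array} { l | c c c c c c c c c c } \text{Employee} & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} & \text{F} & \text{G} & \text{H} & \text{I} & \text{J} \\ \text{Interview test, } x \ \% & 65 & 71 & 79 & 77 & 85 & 78 & 85 & 90 & 81 & 62 \\ \text{Performance after one year, } y \ \% & 65 & 74 & 82 & 64 & 87 & 78 & 61 & 65 & 79 & 69 \end{array}$$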
$$\text { [You may use } \sum x ^ { 2 } = 60475 , \sum y ^ { 2 } = 53122 , \sum x y = 56076 \text { ] }$$
  1. Showing your working clearly, calculate the product moment correlation coefficient between the interview test and the performance after one year.
The product moment correlation coefficient between the skills assessment and the performance after one year is \(-0.156\) to 3 significant figures.
  2. Use your answer to part (a) to comment on whether or not the interview test and skills assessment are a guide to the performance after one year. Give clear reasons for your answers.
View full question →