Bivariate data

135 questions · 21 question types identified

Calculate r from summary statistics

Questions that provide pre-calculated summary statistics (such as Σx, Σy, Σx², Σy², Σxy, or Sxx, Syy, Sxy) and ask to calculate r using these given values.
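
For reference, the standard identities that link the two kinds of summary statistics named above are sketched here. The symbol \(n\) for the number of data pairs is notation assumed for this sketch; the description itself does not name it.

```latex
S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \qquad
S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}, \qquad
S_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}, \qquad
r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
```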

27
20.0% of questions
6 The discrete random variable \(X\) has a uniform distribution over \(\{ n , n + 1 , \ldots , 2 n \}\).
  1. Given that \(n\) is odd, find \(\mathrm { P } \left( X < \frac { 3 } { 2 } n \right)\).
  2. Given instead that \(n\) is even, find \(\mathrm { P } \left( X < \frac { 3 } { 2 } n \right)\), giving your answer as a single algebraic fraction.
  3. The sum of 6 independent values of \(X\) is denoted by \(Y\). Find \(\operatorname { Var } ( Y )\).
Calculate r from raw bivariate data

Questions that provide raw paired data values (x, y) in a table and ask to calculate the product moment correlation coefficient r, requiring computation of all summary statistics from scratch.

22
16.3% of questions
3 Fourteen candidates each sat two test papers, Paper 1 and Paper 2, on the same day. The marks, out of a total of 50, achieved by the students on each paper are shown in the table.
Interpret census or real-world data

A question is this type if and only if it asks to analyze or interpret bivariate relationships in census data, population data, or other real-world datasets with multiple Local Authorities or regions.

11
8.1% of questions
9 The diagram below shows some "Cycle to work" data taken from the 2001 and 2011 UK censuses. The diagram shows the percentages, by age group, of male and female workers in England and Wales, excluding London, who cycled to work in 2001 and 2011.
[Diagram not reproduced: percentages, by age group, of male and female workers who cycled to work in 2001 and 2011.]
The following questions refer to the workers represented by the graphs in the diagram.
  1. A researcher is going to take a sample of men and a sample of women and ask them whether or not they cycle to work. Why would it be more important to stratify the sample of men?
A research project followed a randomly chosen large sample of the group of male workers who were aged 30-34 in 2001.
  2. Does the diagram suggest that the proportion of this group who cycled to work has increased or decreased from 2001 to 2011?
    Justify your answer.
  3. Write down one assumption that you have to make about these workers in order to draw this conclusion.
Calculate summary statistics (Sxx, Syy, Sxy)

A question is this type if and only if it asks to calculate the summary statistics Sxx, Syy, or Sxy from raw data or other given statistics.

8
5.9% of questions
2. On a particular day in summer 1993 at 0800 hours the height above sea level, \(x\) metres, and the temperature, \(y ^ { \circ } \mathrm { C }\), were recorded in 10 Mediterranean towns. The following summary statistics were calculated from the results. $$\Sigma x = 7300 , \Sigma x ^ { 2 } = 6599600 , S _ { x y } = - 13060 , S _ { y y } = 140.9 .$$
  1. Find \(S _ { x x }\).
  2. Calculate, to 3 significant figures, the product moment correlation coefficient between \(x\) and \(y\).
  3. Give an interpretation of your coefficient.
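
Parts 1 and 2 of this example can be checked numerically. The sketch below is only an illustration: it assumes \(n = 10\) (the ten towns) and the usual identities \(S_{xx} = \Sigma x^2 - (\Sigma x)^2 / n\) and \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\). The variable names are chosen here and are not part of the question.

```python
import math

# Summary statistics quoted in the question (10 towns).
n = 10
sum_x = 7300
sum_x2 = 6599600
s_xy = -13060
s_yy = 140.9

# Sxx = sum(x^2) - (sum(x))^2 / n
s_xx = sum_x2 - sum_x**2 / n

# r = Sxy / sqrt(Sxx * Syy)
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"Sxx = {s_xx:.0f}")
print(f"r   = {r:.3f}")
```

With these inputs the sketch gives \(S_{xx} = 1270600\) and \(r \approx -0.976\), a strong negative correlation: in this sample, towns at greater height tended to record lower temperatures.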
Calculate regression line equation

A question is this type if and only if it asks to find the equation of a regression line (y on x or x on y) from summary statistics.

6
4.4% of questions
  1. The table shows the price of a bottle of milk, \(m\) pence, and the price of a loaf of bread, \(b\) pence, for 8 different years.
\(m\)  29  29  35  39  41  43  44  46
\(b\)  75  83  91  121  120  126  119  126
(You may use \(\mathrm { S } _ { b b } = 3083.875\) and \(\mathrm { S } _ { m m } = 305.5\))
  1. Find the exact value of \(\sum b m\)
  2. Find \(\mathrm { S } _ { b m }\)
  3. Calculate the product moment correlation coefficient between \(b\) and \(m\)
  4. Interpret the value of the correlation coefficient.
A ninth year is added to the data set. In this year the price of the bottle of milk is 46 pence and the price of a loaf of bread is 175 pence.
  5. Without further calculation, state whether the value of the product moment correlation coefficient will increase, decrease or stay the same when all nine years are used. Give a reason for your answer.
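
The first three parts of this example can be sketched in a few lines. The eight values per row are an interpretation of the table as it appears here (they reproduce the quoted \(S_{bb}\) and \(S_{mm}\)); the variable names are illustrative only.

```python
import math

# Prices for the 8 years, as read from the table in the question.
m = [29, 29, 35, 39, 41, 43, 44, 46]        # milk, pence
b = [75, 83, 91, 121, 120, 126, 119, 126]   # bread, pence
n = len(m)

# Values the question says may be used.
s_bb = 3083.875
s_mm = 305.5

# Part 1: sum of products (exact, because every price is an integer).
sum_bm = sum(bi * mi for bi, mi in zip(b, m))

# Part 2: Sbm = sum(bm) - sum(b) * sum(m) / n
s_bm = sum_bm - sum(b) * sum(m) / n

# Part 3: r = Sbm / sqrt(Sbb * Smm)
r = s_bm / math.sqrt(s_bb * s_mm)

print(sum_bm, s_bm, round(r, 3))
```

Under that reading of the table, \(\sum bm = 33856\), \(S_{bm} = 922.75\) and \(r \approx 0.951\).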
Hypothesis test for correlation

A question is this type if and only if it asks to perform a formal hypothesis test to determine if correlation is significant (positive, negative, or non-zero).

6
4.4% of questions
2. A shopper estimates the cost, \(\pounds X\) per item, of each of 12 items in a supermarket. The shopper's estimates are compared with the actual cost, \(\pounds Y\) per item, of each item. The results are summarised as follows.
\(n = 12\)
\(\sum x ^ { 2 } = 28127\)
\(\sum x = 399\)
\(\Sigma y ^ { 2 } = 116509.0212\)
\(\Sigma y = 623.88\)
\(\sum x y = 45006.01\)
Test at the \(1 \%\) significance level whether the shopper's estimates are positively correlated with the actual cost of the items.
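
A minimal sketch of how this test might be carried out is given below. It assumes hypotheses \(H_0: \rho = 0\) against \(H_1: \rho > 0\), and it hard-codes a one-tailed 1% critical value for \(n = 12\) taken from standard product moment correlation tables; that value is an assumption of this sketch and should be checked against the tables supplied for the exam.

```python
import math

# Summary statistics quoted in the question.
n = 12
sum_x, sum_x2 = 399, 28127
sum_y, sum_y2 = 623.88, 116509.0212
sum_xy = 45006.01

s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / math.sqrt(s_xx * s_yy)

# Assumption: one-tailed 1% critical value for n = 12 from standard tables.
CRITICAL_VALUE = 0.6581

# Reject H0 (no correlation) when r exceeds the critical value.
reject_h0 = r > CRITICAL_VALUE

print(f"r = {r:.3f}, reject H0: {reject_h0}")
```

Here \(r \approx 0.686\), which exceeds the assumed critical value, so the sketch concludes that there is evidence at the 1% level of positive correlation between the estimates and the actual costs.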
Analyze large data set correlations

A question is this type if and only if it specifically uses the large data set to investigate correlations between variables like temperature, rainfall, pressure, etc.

6
4.4% of questions
  1. A meteorologist believes that there is a relationship between the daily mean windspeed, \(w \mathrm { kn }\), and the daily mean temperature, \(t ^ { \circ } \mathrm { C }\). A random sample of 9 consecutive days is taken from past records from a town in the UK in July and the relevant data is given in the table below.
\(\boldsymbol { t }\)  13.3  16.2  15.7  16.6  16.3  16.4  19.3  17.1  13.2
\(\boldsymbol { w }\)  7  11  8  11  13  8  15  10  11
The meteorologist calculated the product moment correlation coefficient for the 9 days and obtained \(r = 0.609\)
  1. Explain why a linear regression model based on these data is unreliable on a day when the mean temperature is \(24 ^ { \circ } \mathrm { C }\)
  2. State what is measured by the product moment correlation coefficient.
  3. Stating your hypotheses clearly test, at the \(5 \%\) significance level, whether or not the product moment correlation coefficient for the population is greater than zero.
Using the same 9 days a location from the large data set gave \(\bar { t } = 27.2\) and \(\bar { w } = 3.5\)
  4. Using your knowledge of the large data set, suggest, giving your reason, the location that gave rise to these statistics.
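
The numerical parts of this example can be sketched as follows. The nine values per row are an interpretation of the table as it appears here, and the one-tailed 5% critical value for \(n = 9\) is an assumption taken from standard tables; `pmcc` is a helper name invented for this sketch.

```python
import math

t = [13.3, 16.2, 15.7, 16.6, 16.3, 16.4, 19.3, 17.1, 13.2]  # temperature, deg C
w = [7, 11, 8, 11, 13, 8, 15, 10, 11]                       # windspeed, kn

def pmcc(xs, ys):
    """Product moment correlation coefficient of two paired lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)

r = pmcc(t, w)

# Part 1: 24 deg C lies outside the observed temperatures, so using a
# regression model there would be extrapolation.
is_extrapolation = not (min(t) <= 24 <= max(t))

# Part 3, using the quoted r = 0.609.
# Assumption: one-tailed 5% critical value for n = 9 from standard tables.
CRITICAL_VALUE = 0.5822
reject_h0 = 0.609 > CRITICAL_VALUE

print(f"r = {r:.4f}, extrapolation: {is_extrapolation}, reject H0: {reject_h0}")
```

The computed coefficient agrees with the quoted 0.609 to within rounding, which supports this reading of the table.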
Interpret or describe given scatter diagram

Question provides a scatter diagram (already drawn) and asks the student to describe the relationship, correlation strength, or other features shown.

6
4.4% of questions
10 Which of the options below best describes the correlation shown in the diagram below?
[Scatter diagram not reproduced.]
Tick \(( \checkmark )\) one box.
moderate positive □
strong positive □
moderate negative □
strong negative □
Interpret correlation coefficient value

A question is this type if and only if it asks to interpret the meaning or context of a given or calculated correlation coefficient value.

5
3.7% of questions
7
  1. Three airport management trainees, Ryan, Sunil and Tim, were each instructed to select a random sample of 12 suitcases from those waiting to be loaded onto aircraft. Each trainee also had to measure the volume, \(x\), and the weight, \(y\), of each of the 12 suitcases in his sample, and then calculate the value of the product moment correlation coefficient, \(r\), between \(x\) and \(y\).
    • Ryan obtained a value of -0.843.
    • Sunil obtained a value of +0.007.
    Explain why neither of these two values is likely to be correct.
  2. Peggy, a supervisor with many years' experience, measured the volume, \(x\) cubic feet, and the weight, \(y\) pounds, of each suitcase in a random sample of 6 suitcases, and then obtained a value of 0.612 for \(r\).
    • Ryan and Sunil each claimed that Peggy's value was different from their values because she had measured the volumes in cubic feet and the weights in pounds, whereas they had measured the volumes in cubic metres and the weights in kilograms.
    • Tim claimed that Peggy's value was almost exactly half his calculated value because she had used a sample of size 6 whereas he had used one of size 12.
    Explain why neither of these two claims is valid.
  3. Quentin, a manager, recorded the volumes, \(v\), and the weights, \(w\), of a random sample of 8 suitcases as follows.
    \(\boldsymbol { v }\)  28.1  19.7  46.4  23.6  31.1  17.5  35.8  13.8
    \(\boldsymbol { w }\)  14.9  12.1  21.1  18.0  19.8  19.2  16.2  14.7
    1. Calculate the value of \(r\) between \(v\) and \(w\).
    2. Interpret your value in the context of this question.
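
Part 3 can be sketched directly from Quentin's raw data. The eight values per row are an interpretation of the table as it appears here, and `pmcc` is a helper name invented for this illustration.

```python
import math

v = [28.1, 19.7, 46.4, 23.6, 31.1, 17.5, 35.8, 13.8]  # volumes
w = [14.9, 12.1, 21.1, 18.0, 19.8, 19.2, 16.2, 14.7]  # weights

def pmcc(xs, ys):
    """Product moment correlation coefficient of two paired lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)

r = pmcc(v, w)
print(f"r = {r:.3f}")
```

Under that reading, \(r \approx 0.542\), which could reasonably be described as a moderate positive correlation: suitcases with larger volumes tend to be heavier, but not strongly so.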
Estimate correlation from scatter diagram

A question is this type if and only if it asks to estimate or identify the approximate value of a correlation coefficient by visual inspection of a scatter diagram.

5
3.7% of questions
  1. The scatter diagrams below were drawn by a student.
[Scatter diagrams not reproduced.]
The student calculated the value of the product moment correlation coefficient for each of the sets of data. The values were $$\begin{array} { l l l } 0.68 & - 0.79 & 0.08 \end{array}$$ Write down, with a reason, which value corresponds to which scatter diagram.
(6)
Identify errors in correlation analysis

A question is this type if and only if it asks to identify impossible, incorrect, or problematic correlation coefficient values or calculations.

5
3.7% of questions
10 Which of the following is not a possible value for a product moment correlation coefficient? Circle your answer. $$- \frac { 6 } { 5 } \quad - \frac { 3 } { 5 } \quad 0$$
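
The check behind this question is that a product moment correlation coefficient always satisfies \(-1 \le r \le 1\). A small sketch, using only the three options visible in the text above (the list name is illustrative):

```python
from fractions import Fraction

# The options as they appear in the text of the question.
candidates = [Fraction(-6, 5), Fraction(-3, 5), Fraction(0)]

# A correlation coefficient must lie in the closed interval [-1, 1].
impossible = [c for c in candidates if not -1 <= c <= 1]

print(impossible)
```

Only \(-\frac{6}{5}\) falls outside the interval, so it cannot be a correlation coefficient.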
Assess appropriateness of correlation analysis

A question is this type if and only if it asks whether correlation analysis is appropriate, sensible, or reliable for a given dataset or context.

5
3.7% of questions
  1. (a) Explain briefly what you understand by a statistical model.
    (2 marks)
    A zoologist is analysing data on the weights of adult female otters.
    (b) Name a distribution that you think might be suitable for modelling such data.
    (1 mark)
    (c) Describe two features that you would expect to find in the distribution of the weights of adult female otters and that led to your choice in part (b).
    (2 marks)
    (d) Why might your choice in part (b) not be suitable for modelling the weights of all adult otters?
    (1 mark)
  2. For a geography project a student studied weather records kept by her school since 1993. To see if there was any evidence of global warming she worked out the mean temperature in degrees Celsius at noon for the month of June in each year.
Her results are shown in the table below.
Year  1993  1994  1995  1996  1997  1998  1999  2000
Mean temperature \(\left( { } ^ { \circ } \mathrm { C } \right)\)  21.9  24.1  20.7  23.0  24.2  22.1  22.6  23.9
Identify outliers or unusual points

A question is this type if and only if it asks to identify outliers, errors, or unusual data points in bivariate data or scatter diagrams.

4
3.0% of questions
15 The number of hours of sunshine and the daily maximum temperature were recorded over a 9-day period in June at an English seaside town. A scatter diagram representing the recorded data is shown below.
[Scatter diagram not reproduced.]
One of the points on the scatter diagram is an error.
  1. (i) Write down the letter that identifies this point.
  1. (ii) Suggest one possible action that could be taken to deal with this error.
  2. It is claimed that the scatter diagram proves that longer hours of sunshine cause higher maximum daily temperatures. Comment on the validity of this claim.
    [1 mark]
Draw scatter diagram from data

Question provides numerical data in a table and asks the student to draw or plot a scatter diagram.

4
3.0% of questions
3 [Figure 1, printed on the insert, is provided for use in this question.]
The table shows, for each of a sample of 12 handmade decorative ceramic plaques, the length, \(x\) millimetres, and the width, \(y\) millimetres.
Plaque  \(\boldsymbol { x }\)  \(\boldsymbol { y }\)
A  232  109
B  235  112
C  236  114
D  234  118
E  230  117
F  230  113
G  246  121
H  240  125
I  244  128
J  241  122
K  246  126
L  245  123
  1. Calculate the value of the product moment correlation coefficient between \(x\) and \(y\).
  2. Interpret your value in the context of this question.
  3. On Figure 1, complete the scatter diagram for these data.
  4. In fact, the 6 plaques \(\mathrm { A } , \mathrm { B } , \ldots , \mathrm { F }\) are from a different source to the 6 plaques \(\mathrm { G } , \mathrm { H } , \ldots , \mathrm { L }\). With reference to your scatter diagram, but without further calculations, estimate the value of the product moment correlation coefficient between \(x\) and \(y\) for each source of plaque.
Use regression line for prediction

A question is this type if and only if it asks to estimate or predict a value using a regression equation, or assess reliability of such predictions.

3
2.2% of questions
1 An experiment involves releasing a coin on a sloping plane so that it slides down the slope and then slides along a horizontal plane at the bottom of the slope before coming to rest. The angle \(\theta ^ { \circ }\) of the sloping plane is varied, and for each value of \(\theta\), the distance \(d \mathrm {~cm}\) the coin slides on the horizontal plane is recorded. A scatter diagram to illustrate the results of the experiment is shown below, together with the least squares regression line of \(d\) on \(\theta\).
[Scatter diagram with the least squares regression line of \(d\) on \(\theta\), not reproduced.]
  1. State which two of the following correctly describe the variable \(\theta\).
    Controlled variable    Correlation coefficient
    Dependent variable    Independent variable
    Response variable    Regression coefficient
The least squares regression line of \(d\) on \(\theta\) has equation \(d = 1.96 + 0.11 \theta\).
  2. Use the diagram in the Printed Answer Booklet to explain the term "least squares".
  3. State what difference, if any, it would make to the equation of the regression line if \(d\) were measured in inches rather than centimetres. (1 inch \(\approx 2.54 \mathrm {~cm}\)).
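
The effect of the unit change in part 3 can be illustrated numerically. This is a sketch of one way to reason about it, not a model answer: since \(d\) in inches is \(d\) in centimetres divided by 2.54, both the intercept and the gradient of the line are divided by 2.54. The variable names and the test angle of 30 degrees are chosen here for illustration.

```python
CM_PER_INCH = 2.54  # approximate conversion quoted in the question

# Regression line given in the question: d = a + b * theta, with d in cm.
a_cm, b_cm = 1.96, 0.11

# Rescaling the response variable rescales both coefficients.
a_in = a_cm / CM_PER_INCH
b_in = b_cm / CM_PER_INCH

# Consistency check at an arbitrary angle (30 degrees is illustrative).
theta = 30
d_cm = a_cm + b_cm * theta
d_in = a_in + b_in * theta

print(f"d = {a_in:.3f} + {b_in:.4f} * theta  (d in inches)")
```

The line in inches is approximately \(d = 0.772 + 0.0433 \theta\), and a prediction made with it equals the centimetre prediction divided by 2.54.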
Distinguish dependent and independent variables

A question is this type if and only if it asks to identify which variable is dependent, independent, controlled, or response in a bivariate context.

3
2.2% of questions
2 In an experiment, the percentage sand content, \(y\), of soil in a given region was measured at nine different depths, \(x \mathrm {~cm}\), taken at intervals of 6 cm from 0 cm to 48 cm. The results are summarised below. $$n = 9 \quad \Sigma x = 216 \quad \Sigma x ^ { 2 } = 7344 \quad \Sigma y = 512.4 \quad \Sigma y ^ { 2 } = 30595 \quad \Sigma x y = 10674$$
  1. State, with a reason, which variable is the independent variable.
  2. Calculate the product moment correlation coefficient between \(x\) and \(y\).
  3. (a) Calculate the equation of the appropriate regression line.
    (b) This regression line is used to estimate the percentage sand content at depths of 25 cm and 100 cm. Comment on the reliability of each of these estimates. You are not asked to find the estimates.
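
Parts 2 and 3 can be sketched from the summary statistics. The sketch assumes that the "appropriate" line is the regression line of \(y\) on \(x\), on the grounds that depth is the controlled variable; the helper `is_interpolation` is a name invented here.

```python
import math

# Summary statistics quoted in the question.
n = 9
sum_x, sum_x2 = 216, 7344
sum_y, sum_y2 = 512.4, 30595
sum_xy = 10674

s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Part 2: product moment correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Part 3(a): regression line of y on x, y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Part 3(b): depths were sampled from 0 cm to 48 cm.
def is_interpolation(depth_cm, lowest=0, highest=48):
    """True when the depth lies inside the range of the observed data."""
    return lowest <= depth_cm <= highest

print(f"r = {r:.3f}, y = {a:.2f} + ({b:.4f}) x")
print(is_interpolation(25), is_interpolation(100))
```

This gives \(r \approx -0.926\) and a line of approximately \(y = 74.97 - 0.752 x\). An estimate at 25 cm is an interpolation and, with \(r\) close to \(-1\), is likely to be reliable; an estimate at 100 cm is an extrapolation well beyond the data and is not.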
Find missing data values

A question is this type if and only if it asks to find missing or unknown data values given regression equations or correlation information.

3
2.2% of questions
For a random sample, \(A\), of 5 pairs of values of \(x\) and \(y\), the equations of the regression lines of \(y\) on \(x\) and \(x\) on \(y\) are respectively \(y = 4.5 + 0.3 x\) and \(x = 3 y - 13\). Four of the five pairs of data are given in the following table.
\(x\)  1  5  7  9
\(y\)  5  6  7  7
Find
  1. the fifth pair of values of \(x\) and \(y\),
  2. the value of the product moment correlation coefficient.
A second random sample, \(B\), of 5 pairs of values of \(x\) and \(y\) is summarised as follows. $$\Sigma x = 20 \quad \Sigma x ^ { 2 } = 100 \quad \Sigma y = 17 \quad \Sigma y ^ { 2 } = 69 \quad \Sigma x y = 75$$ The two samples, \(A\) and \(B\), are combined to form a single random sample of size 10.
  3. Use this combined sample to test, at the \(5 \%\) significance level, whether the population product moment correlation coefficient is different from zero.
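
Parts 1 and 2 rest on two standard facts: both regression lines pass through \((\bar{x}, \bar{y})\), and \(r^2\) equals the product of the two regression gradients. A sketch under those assumptions, with the four known pairs read from the table as it appears here and variable names chosen for illustration:

```python
from fractions import Fraction
import math

# y on x:  y = a + b x        x on y:  x = c + d y
a, b = Fraction(9, 2), Fraction(3, 10)
c, d = Fraction(-13), Fraction(3)

# Both lines pass through the mean point, so solve them simultaneously.
y_bar = (a + b * c) / (1 - b * d)
x_bar = c + d * y_bar

known_x = [1, 5, 7, 9]
known_y = [5, 6, 7, 7]
n = 5

# The five values must sum to n times the mean.
fifth_x = n * x_bar - sum(known_x)
fifth_y = n * y_bar - sum(known_y)

# r^2 = b * d; both gradients are positive, so r is positive.
r = math.sqrt(float(b * d))

print(fifth_x, fifth_y, round(r, 3))
```

The mean point is \((5, 6)\), the fifth pair is \((3, 5)\), and \(r = \sqrt{0.9} \approx 0.949\).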
Compare correlation coefficients

A question is this type if and only if it asks to compare two or more correlation coefficients (e.g., Pearson vs Spearman, or different datasets) or explain differences.

2
1.5% of questions
3 A sample of bivariate data was taken and the results were summarised as follows. $$n = 5 \quad \Sigma x = 24 \quad \Sigma x ^ { 2 } = 130 \quad \Sigma y = 39 \quad \Sigma y ^ { 2 } = 361 \quad \Sigma x y = 212$$
  1. Show that the value of the product moment correlation coefficient \(r\) is 0.855, correct to 3 significant figures.
  2. The ranks of the data were found. One student calculated Spearman's rank correlation coefficient \(r _ { s }\), and found that \(r _ { s } = 0.7\). Another student calculated the product moment coefficient, \(R\), of these ranks. State which one of the following statements is true, and explain your answer briefly.
    (A) \(R = 0.855\)
    (B) \(R = 0.7\)
    (C) It is impossible to give the value of \(R\) without carrying out a calculation using the original data.
  3. All the values of \(x\) are now multiplied by a scaling factor of 2. State the new values of \(r\) and \(r _ { s }\).
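
Two ideas in this example can be illustrated in code. The first part checks the value in part 1 from the quoted summary statistics. The second part uses hypothetical ranks, invented here because the original data are not given, to show that when there are no tied ranks the usual formula \(r_s = 1 - \frac{6 \Sigma d^2}{n(n^2 - 1)}\) gives the same value as the product moment coefficient of the ranks.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of two paired lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman(rank_x, rank_y):
    """Spearman's coefficient from untied ranks, using the d-squared formula."""
    n = len(rank_x)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))

# Part 1: r from the summary statistics in the question.
n = 5
s_xx = 130 - 24**2 / n
s_yy = 361 - 39**2 / n
s_xy = 212 - 24 * 39 / n
r = s_xy / math.sqrt(s_xx * s_yy)

# Part 2: HYPOTHETICAL untied ranks chosen so that r_s = 0.7.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 3, 1, 4, 5]
r_s = spearman(rank_x, rank_y)
R = pmcc(rank_x, rank_y)

print(round(r, 3), round(r_s, 3), round(R, 3))
```

For these made-up ranks both calculations give 0.7, which is consistent with option (B) when no ranks are tied.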
Sketch theoretical scatter diagram

Question asks the student to sketch a scatter diagram that would illustrate a specified correlation coefficient or theoretical relationship, without providing actual data.

2
1.5% of questions
6 Six pairs of values of variables \(x\) and \(y\) are measured. Draw a sketch of a possible scatter diagram of the data for each of the following cases:
  1. the product moment correlation coefficient is approximately zero;
  2. the product moment correlation coefficient is exactly -1.
On your diagram for part (i), sketch the regression line of \(y\) on \(x\) and the regression line of \(x\) on \(y\), labelling each line. On your diagram for part (ii), sketch the regression line of \(y\) on \(x\) and state its relationship to the regression line of \(x\) on \(y\).
Calculate Spearman's rank correlation

A question is this type if and only if it asks to calculate or use Spearman's rank correlation coefficient rather than Pearson's.

1
0.7% of questions
  1. A personnel manager wants to find out if a test carried out during an employee's interview and a skills assessment at the end of basic training is a guide to performance after working for the company for one year.
The table below shows the results of the interview test of 10 employees and their performance after one year.
Employee  A  B  C  D  E  F  G  H  I  J
Interview test, \(x\) \%  65  71  79  77  85  78  85  90  81  62
Performance after one year, \(y\) \%  65  74  82  64  87  78  61  65  79  69
$$\text { [You may use } \sum x ^ { 2 } = 60475 , \sum y ^ { 2 } = 53122 , \sum x y = 56076 \text { ] }$$
  1. Showing your working clearly, calculate the product moment correlation coefficient between the interview test and the performance after one year.
The product moment correlation coefficient between the skills assessment and the performance after one year is -0.156 to 3 significant figures.
  2. Use your answer to part (a) to comment on whether or not the interview test and skills assessment are a guide to the performance after one year. Give clear reasons for your answers.
Effect of data transformation on correlation

A question is this type if and only if it asks about how linear transformations or coding of variables affects correlation or regression.

1
0.7% of questions
1 The average maximum monthly temperatures, \(u\) degrees Fahrenheit, and the average minimum monthly temperatures, \(v\) degrees Fahrenheit, in New York City are as follows.
Month  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Maximum (u)  39  40  48  61  71  81  85  83  77  67  54  41
Minimum (v)  26  27  34  44  53  63  68  66  60  51  41  30
    1. Calculate, to one decimal place, the mean and the standard deviation of the 12 values of the average maximum monthly temperature.
    2. For comparative purposes with a UK city, it was necessary to convert the temperatures from degrees Fahrenheit ( \({ } ^ { \circ } \mathrm { F }\) ) to degrees Celsius ( \({ } ^ { \circ } \mathrm { C }\) ). The formula used to convert \(f ^ { \circ } \mathrm { F }\) to \(c ^ { \circ } \mathrm { C }\) is: $$c = \frac { 5 } { 9 } ( f - 32 )$$ Use this formula and your answers in part (a)(i) to calculate, in \({ } ^ { \circ } \mathbf { C }\), the mean and the standard deviation of the 12 values of the average maximum monthly temperature.
      (3 marks)
  1. The value of the product moment correlation coefficient, \(r _ { u v }\), between the above 12 values of \(u\) and \(v\) is 0.997, correct to three decimal places. State, giving a reason, the corresponding value of \(r _ { x y }\), where \(x\) and \(y\) are the exact equivalent temperatures in \({ } ^ { \circ } \mathrm { C }\) of \(u\) and \(v\) respectively.
    (2 marks)
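
The effect of the linear conversion \(c = \frac{5}{9}(f - 32)\) can be illustrated numerically. The sketch below reads twelve values per row from the table as it appears here, and it assumes the standard deviation with divisor \(n\); a mark scheme may equally accept divisor \(n - 1\). The helper names are invented for this illustration.

```python
import math

u = [39, 40, 48, 61, 71, 81, 85, 83, 77, 67, 54, 41]  # maximum, deg F
v = [26, 27, 34, 44, 53, 63, 68, 66, 60, 51, 41, 30]  # minimum, deg F

def to_celsius(f):
    return 5 / 9 * (f - 32)

def mean(xs):
    return sum(xs) / len(xs)

def pop_sd(xs):
    """Standard deviation with divisor n (an assumption of this sketch)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def pmcc(xs, ys):
    """Product moment correlation coefficient of two paired lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)

# The mean is shifted and scaled; the standard deviation is only scaled.
mean_f, sd_f = mean(u), pop_sd(u)
mean_c = to_celsius(mean_f)
sd_c = 5 / 9 * sd_f

# Correlation is unchanged by a linear conversion with positive scale factor.
x = [to_celsius(f) for f in u]
y = [to_celsius(f) for f in v]
r_uv, r_xy = pmcc(u, v), pmcc(x, y)

print(f"{mean_f:.1f} F, {sd_f:.1f} F -> {mean_c:.1f} C, {sd_c:.1f} C")
print(f"r_uv = {r_uv:.3f}, r_xy = {r_xy:.3f}")
```

The converted mean and standard deviation match those computed directly from the Celsius values, and \(r_{xy}\) equals \(r_{uv}\), in line with the quoted 0.997.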