Assess appropriateness of correlation analysis

A question is this type if and only if it asks whether correlation analysis is appropriate, sensible, or reliable for a given dataset or context.

5 questions

OCR MEI AS Paper 2 2022 June Q11
11 The pre-release material contains information about the Median Income of Taxpayers and the Percentage of Pupils Achieving at Least 5 A*--C grades, including English and Maths, at the end of KS4 in different areas of London. Alex is investigating whether there is a relationship between median income and the percentage of pupils achieving at least 5 A*--C grades, including English and Maths, at the end of KS4. Alex decides to use the first 12 rows of data for 2014-5 from the pre-release data as a sample. The sample is shown in Fig. 11.1. \begin{table}[h]
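\begin{tabular}{|l|r|p{7cm}|}
\hline
Area & Median Income of Taxpayers & Percentage of Pupils Achieving at Least 5 A*--C grades including English and Maths \\
\hline
City of London & 61100 & \#N/A \\
Barking and Dagenham & 21800 & 54.0 \\
Barnet & 27100 & 70.1 \\
Bexley & 24400 & 55.0 \\
Brent & 22700 & 60.0 \\
Bromley & 28100 & 68.0 \\
Camden & 33100 & 56.4 \\
Croydon & 25100 & 59.6 \\
Ealing & 24600 & 62.1 \\
Enfield & 25300 & 54.5 \\
Greenwich & 24600 & 57.7 \\
Hackney & 26000 & 60.4 \\
\hline
\end{tabular}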
\captionsetup{labelformat=empty} \caption{Fig. 11.1}
\end{table}
  1. Explain whether the data in Fig. 11.1 is a simple random sample of the data for 2014-5.
  2. The City of London is included in Alex's sample. Explain why Alex is not able to use the data for the City of London in this investigation. Fig. 11.2 shows a scatter diagram of Percentage of Pupils against Median Income for all of the areas of London for which data is available. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Fig. 11.2} \includegraphics[alt={},max width=\textwidth]{e0b502a8-c742-4d78-993c-8c0c7329ec9c-09_716_1378_356_244}
    \end{figure} Alex identifies some outliers.
  3. On the copy of Fig. 11.2 in the Printed Answer Booklet, ring three of these outliers. Alex then discards all the outliers and uses the LINEST function on a spreadsheet to obtain the following model.
    \(\mathrm { P } = 0.0009049 \mathrm { M } + 37.38\),
    where \(P =\) percentage of pupils and \(M =\) median income.
  4. Show that the model is a good fit for the data for Hackney.
  5. Use the model to find an estimate of the value of \(P\) for City of London.
  6. Give two reasons why this estimate may not be reliable. Alex states that more than 50\% of the pupils in London achieved at least a grade C at the end of KS4 in English and Maths in 2014-5.
  7. Use the information in Fig. 11.2 together with your knowledge of the pre-release material to explain whether there is evidence to support this statement.
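The arithmetic behind parts 4 and 5 can be checked with a short sketch. It assumes only the Fig. 11.1 values and the LINEST coefficients quoted above; representing \#N/A as `None`, and the names `predict_p`, `cleaned`, `hackney_residual` and `city_estimate`, are illustrative choices rather than anything specified in the question.

```python
# Fig. 11.1 sample: area -> (median income M, percentage P); None stands for #N/A.
SAMPLE = {
    "City of London": (61100, None),
    "Barking and Dagenham": (21800, 54.0),
    "Barnet": (27100, 70.1),
    "Bexley": (24400, 55.0),
    "Brent": (22700, 60.0),
    "Bromley": (28100, 68.0),
    "Camden": (33100, 56.4),
    "Croydon": (25100, 59.6),
    "Ealing": (24600, 62.1),
    "Enfield": (25300, 54.5),
    "Greenwich": (24600, 57.7),
    "Hackney": (26000, 60.4),
}


def predict_p(m: float) -> float:
    """Alex's LINEST model: P = 0.0009049 M + 37.38."""
    return 0.0009049 * m + 37.38


# Data cleaning: drop any area whose percentage is missing (#N/A).
cleaned = {area: mp for area, mp in SAMPLE.items() if mp[1] is not None}

# Residual (observed minus predicted) for Hackney; a small value indicates a good fit.
m_hackney, p_hackney = cleaned["Hackney"]
hackney_residual = p_hackney - predict_p(m_hackney)

# Estimate for City of London, whose income is far above every other income in Fig. 11.1.
city_estimate = predict_p(SAMPLE["City of London"][0])

if __name__ == "__main__":
    print(f"Hackney: predicted {predict_p(m_hackney):.1f}, observed {p_hackney}, "
          f"residual {hackney_residual:.2f}")
    print(f"City of London estimate: {city_estimate:.1f}")
```

For Hackney the model predicts about 60.9 against an observed 60.4, a residual of roughly $-0.5$. The City of London estimate of about 92.7 comes from extrapolating to an income of 61100, well beyond the other incomes in the sample, which is one possible reason to treat it with caution.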
OCR MEI AS Paper 2 2023 June Q8
8 The pre-release material contains information on Pulse Rate and Body Mass Index (BMI). A student is investigating whether there is a relationship between pulse rate and BMI. A section of the available data is shown in the table.
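\begin{tabular}{|l|r|r|r|}
\hline
Sex & Age & BMI & Pulse \\
\hline
Male & 62 & 29.54 & 60 \\
Female & 20 & 23.68 & \#N/A \\
Male & 17 & 26.97 & 72 \\
Male & 35 & 24.7 & 64 \\
Male & 17 & 20.09 & 54 \\
Male & 85 & 23.86 & 54 \\
Female & 81 & 24.04 & \#N/A \\
\hline
\end{tabular}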
The student decides to draw a scatter diagram.
  1. With reference to the table, explain which data should be cleaned before any analysis takes place. The student cleans the data for BMI and Pulse Rate in the pre-release material and draws a scatter diagram. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Scatter diagram of Pulse Rate against BMI} \includegraphics[alt={},max width=\textwidth]{82438df0-6550-4ffd-92d8-3c67bec59a6b-06_869_1575_1585_246}
    \end{figure} The student identifies one outlier.
  2. On the copy of the scatter diagram in the Printed Answer Booklet, circle this outlier. The student decides to remove this outlier from the data. They then use the LINEST function in the spreadsheet to obtain the following formula for the line of best fit.
    \(\mathrm { P } = 0.29 \mathrm { Q } + 64.2\),
    where \(P =\) pulse rate and \(Q =\) BMI. They use this to estimate the pulse rate of a person with BMI 23.68.
    They obtain a value of 71 correct to the nearest whole number.
  3. With reference to the scatter diagram, explain whether it is appropriate to use the formula for the line of best fit. It is suggested that all pairs of values where the pulse rate is above 100 should also be cleaned from the data, as they must be incorrect.
  4. Use your knowledge of the pre-release material to explain whether or not all pairs of values with a pulse rate of more than 100 should be cleaned from the data.
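A minimal sketch of the cleaning step and the estimate in part 3, using only the seven rows shown in the table (the full pre-release data is not reproduced here). Representing \#N/A as `None`, and the helper names `clean` and `predict_pulse`, are assumptions made for illustration. The BMI of 23.68 being evaluated belongs to the row whose pulse is \#N/A.

```python
# Rows from the table: (sex, age, BMI, pulse); None stands for #N/A.
ROWS = [
    ("Male", 62, 29.54, 60),
    ("Female", 20, 23.68, None),
    ("Male", 17, 26.97, 72),
    ("Male", 35, 24.7, 64),
    ("Male", 17, 20.09, 54),
    ("Male", 85, 23.86, 54),
    ("Female", 81, 24.04, None),
]


def clean(rows):
    """Keep only rows where both BMI and pulse are present."""
    return [r for r in rows if r[2] is not None and r[3] is not None]


def predict_pulse(bmi: float) -> float:
    """Line of best fit from LINEST: P = 0.29 Q + 64.2."""
    return 0.29 * bmi + 64.2


cleaned = clean(ROWS)
estimate = predict_pulse(23.68)

if __name__ == "__main__":
    print(f"{len(cleaned)} of {len(ROWS)} rows usable for the scatter diagram")
    print(f"Estimated pulse rate at BMI 23.68: {estimate:.0f}")
```

The estimate is about 71.07, which rounds to the student's value of 71. The arithmetic says nothing about whether the line is worth using; that depends on how closely the points in the scatter diagram cluster around it.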
OCR MEI AS Paper 2 2020 November Q10
10 Fig. 10.1 shows a sample collected from the large data set. BMI is defined as \(\frac { \text { mass of person in kilograms } } { \text { square of person's height in metres } }\). \begin{table}[h]
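\begin{tabular}{|l|r|r|r|r|}
\hline
Sex & Age in years & Mass in kg & Height in cm & BMI \\
\hline
Male & 38 & 77.6 & 164.8 & 28.57 \\
Male & 17 & 63.5 & 170.3 & 21.89 \\
Male & 18 & 68.0 & 172.3 & 22.91 \\
Male & 18 & 57.2 & 172.2 & 19.29 \\
Male & 19 & 77.6 & 191.2 & 21.23 \\
Male & 24 & 72.7 & 177.0 & 23.21 \\
Male & 25 & 92.5 & 177.9 & 29.23 \\
Male & 26 & 70.4 & 159.4 & 27.71 \\
Male & 31 & 77.5 & 174.0 & 25.60 \\
Male & 34 & 132.4 & 182.2 & 39.88 \\
Male & 38 & 115.0 & 186.4 & 33.10 \\
Male & 40 & 112.1 & 171.7 & 38.02 \\
\hline
\end{tabular}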
\captionsetup{labelformat=empty} \caption{Fig. 10.1}
\end{table}
  1. Calculate the mass in kg of a person with a BMI of 23.56 and a height of 181.6 cm, giving your answer correct to 1 decimal place. Fig. 10.2 shows a scatter diagram of BMI against age for the data in the table. A line of best fit has also been drawn. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{c08a2212-3104-425e-8aee-7f2d46f23924-09_682_1212_351_248} \captionsetup{labelformat=empty} \caption{Fig. 10.2}
    \end{figure}
  2. Describe the correlation between age and BMI.
  3. Use the line of best fit to estimate the BMI of a 30-year-old man.
  4. Explain why it would not be sensible to use the line of best fit to estimate the BMI of a 60-year-old man.
  5. Use your knowledge of the large data set to suggest two reasons why the sample data in the table may not be representative of the population.
  6. Once the data in the large data set had been cleaned there were 196 values available for selection. Describe how a sample of size 12 could be generated using systematic sampling so that each of the 196 values could be selected in the sample.
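Part 1 rearranges the BMI definition to mass $= \text{BMI} \times h^2$, with $h$ the height in metres. For part 6, one common approach (assumed here, not taken from a mark scheme) is to number the values 1 to 196, use the interval $k = \lfloor 196/12 \rfloor = 16$, choose a random start anywhere from 1 to 196, and wrap round to the beginning of the list when the end is reached. A start restricted to 1 to 16 would never reach values 193 to 196, since $16 + 11 \times 16 = 192$. The function names below are hypothetical.

```python
import random


def mass_from_bmi(bmi: float, height_cm: float) -> float:
    """Rearrange BMI = mass / height^2 (height in metres) to give mass in kg."""
    height_m = height_cm / 100
    return bmi * height_m ** 2


def circular_systematic_sample(population_size: int, sample_size: int,
                               rng: random.Random) -> list[int]:
    """Return 1-based positions taken every k-th item from a random start,
    wrapping round to the start of the list so every position can be chosen."""
    k = population_size // sample_size        # 196 // 12 = 16
    start = rng.randint(1, population_size)   # any of the values can be the start
    return [(start - 1 + i * k) % population_size + 1 for i in range(sample_size)]


if __name__ == "__main__":
    print(f"Mass: {mass_from_bmi(23.56, 181.6):.1f} kg")
    print(circular_systematic_sample(196, 12, random.Random(2020)))
```

The mass works out at about 77.7 kg. Because $11 \times 16 = 176 < 196$, the twelve positions in any circular sample are always distinct.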
OCR MEI Paper 2 2021 November Q12
12 Fig. 12.1 shows an excerpt from the pre-release material. \begin{table}[h]
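\begin{tabular}{|c|l|r|l|r|r|r|r|r|}
\hline
 & A & B & C & D & E & F & G & H \\
\hline
1 & Sex & Age & Marital & Weight & Height & BMI & Waist & Pulse \\
2 & Female & 34 & Married & 60.3 & 173.4 & 20.05 & 82.5 & 74 \\
3 & Female & 85 & Widowed & 64.7 & 161.2 & 24.9 & \#N/A & \#N/A \\
4 & Female & 48 & Divorced & 100.6 & 171.4 & 34.24 & 105.6 & 92 \\
5 & Male & 61 & Married & 70.9 & 169.5 & 24.68 & 92.2 & 70 \\
6 & Male & 68 & Divorced & 96.8 & 181.6 & 29.35 & 112.9 & 68 \\
\hline
\end{tabular}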
\captionsetup{labelformat=empty} \caption{Fig. 12.1}
\end{table} There was no data available for cell H3.
  1. Explain why \#N/A is used when no data is available. Fig. 12.2 shows a scatter diagram of pulse rate against BMI (Body Mass Index) for females. All the available data was used. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{c9d14a4d-a1c8-42ad-9c0b-42cef6b3612f-08_659_1552_1363_233} \captionsetup{labelformat=empty} \caption{Fig. 12.2}
    \end{figure} There are two outliers on the diagram.
  2. On the copy of Fig. 12.2 in the Printed Answer Booklet, ring these outliers.
  3. Use your knowledge of the pre-release material to explain whether either of these outliers should be removed.
  4. State whether the diagram suggests there is any correlation between pulse rate and BMI. The product moment correlation coefficient between waist measurement, \(w\), in cm and BMI, \(b\), for females was found to be 0.912. All the available data was used.
  5. Explain why a model of the form \(w = mb + c\) for the relationship between waist measurement and BMI is likely to be appropriate. The LINEST function on a spreadsheet gives \(m = 2.16\) and \(c = 33.0\).
  6. Calculate an estimate of the value for cell G3 in Fig. 12.1.
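A short sketch of part 6, assuming only the cells of row 3 in Fig. 12.1 and the LINEST coefficients above. It first recomputes BMI from D3 and E3 as a sanity check on F3, which is given to only 1 decimal place, and then substitutes F3 into $w = mb + c$. The names `bmi` and `estimate_waist` are illustrative.

```python
# Cells D3, E3, F3 from Fig. 12.1 (female, aged 85); G3 and H3 are #N/A.
WEIGHT_D3, HEIGHT_E3, BMI_F3 = 64.7, 161.2, 24.9

# LINEST coefficients for females: w = m b + c.
M, C = 2.16, 33.0


def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = mass in kg / (height in metres)^2."""
    return weight_kg / (height_cm / 100) ** 2


def estimate_waist(b: float) -> float:
    """Predicted waist measurement w in cm from BMI b."""
    return M * b + C


if __name__ == "__main__":
    print(f"F3 given as {BMI_F3}, recomputed as {bmi(WEIGHT_D3, HEIGHT_E3):.2f}")
    print(f"Estimate for G3: {estimate_waist(BMI_F3):.1f} cm")
```

This gives $2.16 \times 24.9 + 33.0 \approx 86.8$ cm. A BMI of 24.9 lies between the female values 20.05 and 34.24 shown in Fig. 12.1, so the estimate is an interpolation rather than an extrapolation.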
Edexcel S1 Q1
  1. (a) Explain briefly what you understand by a statistical model.
    (2 marks)
    A zoologist is analysing data on the weights of adult female otters.
    (b) Name a distribution that you think might be suitable for modelling such data.
    (1 mark)
    (c) Describe two features that you would expect to find in the distribution of the weights of adult female otters and that led to your choice in part (b).
    (2 marks)
    (d) Why might your choice in part (b) not be suitable for modelling the weights of all adult otters?
    (1 mark)
  2. For a geography project a student studied weather records kept by her school since 1993. To see if there was any evidence of global warming she worked out the mean temperature in degrees Celsius at noon for the month of June in each year.
Her results are shown in the table below.
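\begin{tabular}{|l|r|r|r|r|r|r|r|r|}
\hline
Year & 1993 & 1994 & 1995 & 1996 & 1997 & 1998 & 1999 & 2000 \\
\hline
Mean temperature (\({}^{\circ}\mathrm{C}\)) & 21.9 & 24.1 & 20.7 & 23.0 & 24.2 & 22.1 & 22.6 & 23.9 \\
\hline
\end{tabular}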
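The extract ends before the parts of this question, so what is actually asked is not shown. As a hedged illustration relevant to this question type, the sketch computes the product moment correlation coefficient between year and mean June noon temperature from the table, using $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$. The function name `pmcc` is hypothetical.

```python
YEARS = list(range(1993, 2001))
MEAN_TEMP_C = [21.9, 24.1, 20.7, 23.0, 24.2, 22.1, 22.6, 23.9]


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5


if __name__ == "__main__":
    print(f"r = {pmcc(YEARS, MEAN_TEMP_C):.3f}")
```

For these eight years $r$ comes out at about 0.28, a weak positive correlation. With only eight points and large year-to-year variation, this gives little basis on its own for claiming evidence of warming.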