Interpret census or real-world data

A question is of this type if and only if it asks for analysis or interpretation of bivariate relationships in census data, population data, or other real-world datasets covering multiple Local Authorities or regions.

11 questions

OCR H240/02 2018 June Q11
11 Christa used Pearson's product-moment correlation coefficient, \(r\), to compare the use of public transport with the use of private vehicles for travel to work in the UK.
  1. Using the pre-release data set for all 348 UK Local Authorities, she considered the following four variables.
    \begin{tabular}{|l|c|}
    \hline
    Number of employees using public transport & \(x\) \\
    \hline
    Number of employees using private vehicles & \(y\) \\
    \hline
    Proportion of employees using public transport & \(a\) \\
    \hline
    Proportion of employees using private vehicles & \(b\) \\
    \hline
    \end{tabular}
    (a) Explain, in context, why you would expect strong, positive correlation between \(x\) and \(y\).
    (b) Explain, in context, what kind of correlation you would expect between \(a\) and \(b\).
  2. Christa also considered the data for the 33 London boroughs alone and she generated the following scatter diagram. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{London} \includegraphics[alt={},max width=\textwidth]{65d9d34c-8c78-45fe-b9f0-dab071ae56bb-07_467_707_1366_653}
    \end{figure} One London Borough is represented by an outlier in the diagram.
    (a) Suggest what effect this outlier is likely to have on the value of \(r\) for the 32 London Boroughs.
    (b) Suggest what effect this outlier is likely to have on the value of \(r\) for the whole country.
    (c) What can you deduce about the area of the London Borough represented by the outlier? Explain your answer.
OCR H240/02 Q9
9 The diagram below shows some "Cycle to work" data taken from the 2001 and 2011 UK censuses. The diagram shows the percentages, by age group, of male and female workers in England and Wales, excluding London, who cycled to work in 2001 and 2011.
\includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-10_951_1635_559_207} The following questions refer to the workers represented by the graphs in the diagram.
  1. A researcher is going to take a sample of men and a sample of women and ask them whether or not they cycle to work. Why would it be more important to stratify the sample of men?
    A research project followed a randomly chosen large sample of the group of male workers who were aged 30-34 in 2001.
  2. Does the diagram suggest that the proportion of this group who cycled to work has increased or decreased from 2001 to 2011?
    Justify your answer.
  3. Write down one assumption that you have to make about these workers in order to draw this conclusion.
OCR H240/02 Q13
13 The table and the four scatter diagrams below show data taken from the 2011 UK census for four regions. On the scatter diagrams the names have been replaced by letters.
The table shows, for each region, the mean and standard deviation of the proportion of workers in each Local Authority who travel to work by driving a car or van and the proportion of workers in each Local Authority who travel to work as a passenger in a car or van.
Each scatter diagram shows, for each of the Local Authorities in a particular region, the proportion of workers who travel to work by driving a car or van and the proportion of workers who travel to work as a passenger in a car or van.
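\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{2}{c|}{Driving a car or van} & \multicolumn{2}{c|}{Passenger in a car or van} \\
\hline
 & Mean & Standard deviation & Mean & Standard deviation \\
\hline
London & 0.257 & 0.133 & 0.017 & 0.008 \\
\hline
South East & 0.578 & 0.064 & 0.045 & 0.010 \\
\hline
South West & 0.580 & 0.084 & 0.049 & 0.007 \\
\hline
Wales & 0.644 & 0.045 & 0.068 & 0.015 \\
\hline
\end{tabular}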
Region A
\includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-14_634_1116_1308_299}
Region B
\includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-14_636_1109_2049_301}
Region C
\includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-15_737_1183_237_240}
Region D
\includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-15_723_1169_1046_246}
  1. Using the values given in the table, match each region to its corresponding scatter diagram, explaining your reasoning.
  2. Steven claims that the outlier in the scatter diagram for Region C consists of a group of small islands. Explain whether or not the data given above support his claim.
  3. One of the Local Authorities in Region B consists of a single large island. Explain whether or not you would expect this Local Authority to appear as an outlier in the scatter diagram for Region B.
OCR PURE Q9
9 A researcher is studying changes in behaviour in travelling to work by people who live outside London, between 2001 and 2011. He chooses the 15 Local Authorities (LAs) outside London with the largest decreases in the percentage of people driving to work, and arranges these in descending order. The table shows the changes in percentages from 2001 to 2011 in various travel categories, for these Local Authorities.
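\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Local Authority & Work mainly at or from home & Underground, metro, light rail, tram & Train & Bus, minibus or coach & Driving a car or van & Passenger in a car or van & Bicycle & On foot \\
\hline
Brighton and Hove & 3.2 & 0.1 & 1.5 & 0.8 & -8.2 & -1.5 & 2.1 & 2.3 \\
\hline
Cambridge & 2.2 & 0.0 & 1.6 & 1.2 & -7.4 & -1.0 & 3.1 & 0.6 \\
\hline
Elmbridge & 2.9 & 0.4 & 4.1 & 0.2 & -6.6 & -0.7 & 0.3 & -0.3 \\
\hline
Oxford & 2.0 & 0.0 & 0.6 & -0.4 & -5.2 & -1.1 & 2.2 & 2.1 \\
\hline
Epsom and Ewell & 1.6 & 0.4 & 3.9 & 1.1 & -5.2 & -0.9 & 0.0 & -0.6 \\
\hline
Watford & 0.7 & 2.0 & 3.1 & 0.4 & -4.5 & -1.2 & 0.0 & -0.1 \\
\hline
Tandridge & 3.3 & 0.2 & 4.0 & -0.1 & -4.5 & -1.1 & 0.0 & -1.3 \\
\hline
Mole Valley & 3.3 & 0.1 & 1.9 & 0.3 & -4.4 & -0.7 & 0.2 & -0.3 \\
\hline
St Albans & 2.3 & 0.3 & 3.4 & -0.3 & -4.3 & -1.2 & 0.3 & -0.2 \\
\hline
Chiltern & 2.9 & 1.4 & 1.4 & 0.1 & -4.2 & -0.6 & -0.2 & -0.8 \\
\hline
Exeter & 0.7 & 0.0 & 1.0 & -0.6 & -4.2 & -1.5 & 1.7 & 3.4 \\
\hline
Woking & 2.1 & 0.1 & 3.7 & 0.0 & -4.2 & -1.3 & -0.1 & 0.0 \\
\hline
Reigate and Banstead & 1.8 & 0.1 & 3.2 & 0.6 & -4.1 & -1.0 & 0.1 & -0.2 \\
\hline
Waverley & 4.3 & 0.1 & 2.5 & -0.5 & -3.9 & -0.9 & -0.3 & -0.9 \\
\hline
Guildford & 2.7 & 0.1 & 2.4 & 0.2 & -3.6 & -1.2 & 0.0 & -0.3 \\
\hline
\end{tabular}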
  1. Explain why these LAs are not necessarily the 15 LAs with the largest decreases in the percentage of people driving to work.
  2. The researcher wants to talk to those LAs outside London which have been most successful in encouraging people to change to cycling or walking to work.
    Suggest four LAs that he should talk to and why.
  3. The researcher claims that Waverley is the LA outside London which has had the largest increase in the number of people working mainly at or from home.
    Does the data support his claim? Explain your answer.
  4. Which two categories have replaced driving to work for the highest percentages of workers in these LAs? Support your answer with evidence from the table.
  5. The researcher suggested that there would be strong correlation between the decrease in the percentage driving to work and the increase in percentage working mainly at or from home. Without calculation, use data from the table to comment briefly on this suggestion.
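As a rough aid for the cycling and walking part of this question, the Bicycle and On foot columns of the table can be added for each LA and ranked. The sketch below is only one way of reading the table: the names `changes`, `combined` and `top_four` are illustrative, and using the sum of the two changes as the measure of "success" is an assumption, not something the question prescribes.

```python
# (Bicycle, On foot) changes in percentage points, 2001 to 2011,
# copied from the table in the question.
changes = {
    "Brighton and Hove": (2.1, 2.3),
    "Cambridge": (3.1, 0.6),
    "Elmbridge": (0.3, -0.3),
    "Oxford": (2.2, 2.1),
    "Epsom and Ewell": (0.0, -0.6),
    "Watford": (0.0, -0.1),
    "Tandridge": (0.0, -1.3),
    "Mole Valley": (0.2, -0.3),
    "St Albans": (0.3, -0.2),
    "Chiltern": (-0.2, -0.8),
    "Exeter": (1.7, 3.4),
    "Woking": (-0.1, 0.0),
    "Reigate and Banstead": (0.1, -0.2),
    "Waverley": (-0.3, -0.9),
    "Guildford": (0.0, -0.3),
}

# Assumed measure of success: combined change in cycling and walking.
combined = {la: round(bike + foot, 1) for la, (bike, foot) in changes.items()}

# Rank the LAs from largest to smallest combined change and keep the first four.
top_four = sorted(combined, key=combined.get, reverse=True)[:4]

for la in top_four:
    print(la, combined[la])
```

A different measure (for example, cycling alone) could give a different ranking, so any answer should state which columns were used.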
OCR PURE Q11
11 A student is investigating changes in the number of residents in Local Authorities in the South East Region between 2001 and 2011. The scatter diagram shows the number \(x\) of residents in these Local Authorities in the age group 8 to 9 in 2001 and the number \(y\) of residents in the same Local Authorities in the age group 18 to 19 in 2011.
\includegraphics[max width=\textwidth, alt={}, center]{d44919ed-806d-48c0-9726-c5fd67764504-07_1662_1406_431_210}
  1. Suggest a reason why the student is comparing these two age groups in 2001 and 2011.
    The student notices that most of the data points are close to the line \(y = x\).
    1. Explain what this suggests about the residents in these Local Authorities.
    2. The student says that correlation does not imply causation, so there is no causal link between the values of \(x\) and the values of \(y\). Explain whether or not they are correct.
  2. Some of these Local Authorities contain universities.
    1. On the diagram in the Printed Answer Booklet, circle three points that are likely to represent Local Authorities containing universities.
    2. Give a reason for your choice of points in part (c)(i).
    Assume that the proportion of residents in age group 8 to 9 in 2001 was roughly the same in each Local Authority in the South East. The Local Authority in this region with the largest population is Medway.
  3. On the diagram in the Printed Answer Booklet, label clearly with the letter \(M\) the point that corresponds to Medway.
OCR PURE Q12
12 This question deals with information about the populations of Local Authorities (LAs) in the North of England, taken from the 2011 census. \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{Fig. 1} \includegraphics[alt={},max width=\textwidth]{e42b1a99-c3ca-4ce1-becd-cd346aec757e-10_437_903_450_109}
\end{figure} \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{Fig. 2} \includegraphics[alt={},max width=\textwidth]{e42b1a99-c3ca-4ce1-becd-cd346aec757e-10_423_905_466_1046}
\end{figure} Fig. 1 and Fig. 2 both show strong correlation, but of two different kinds.
  1. For each diagram, use a single word to describe the kind of correlation shown.
  2. For each diagram, suggest a reason, in context, why the correlation is of the particular kind described in part (a).
    Fig. 3 is the same as Fig. 2 but with the point \(A\) marked.
    Fig. 4 shows information about the same LAs as Fig. 2 and Fig. 3. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Fig. 3} \includegraphics[alt={},max width=\textwidth]{e42b1a99-c3ca-4ce1-becd-cd346aec757e-10_417_904_1674_95}
    \end{figure} \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Fig. 4} \includegraphics[alt={},max width=\textwidth]{e42b1a99-c3ca-4ce1-becd-cd346aec757e-10_449_887_1644_1080}
    \end{figure}
  3. Point \(A\) in Fig. 3 and point \(B\) in Fig. 4 represent the same LA. Explain how you can tell that this LA has a large population.
OCR MEI AS Paper 2 2018 June Q11
11 The pre-release material contains data concerning the death rate per thousand people and the birth rate per thousand people in all the countries of the world. The diagram in Fig. 11.1 was generated using a spreadsheet and summarises the birth rates for all the countries in Africa. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{ea65a2ad-f066-4075-a237-b799a8fb6f50-6_526_896_386_589} \captionsetup{labelformat=empty} \caption{Fig. 11.1}
\end{figure}
  1. Identify two respects in which the presentation of the data is incorrect.
    Fig. 11.2 shows a scatter diagram of death rate, \(y\), against birth rate, \(x\), for a sample of 55 countries, all of which are in Africa. A line of best fit has also been drawn. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{ea65a2ad-f066-4075-a237-b799a8fb6f50-6_609_1073_1283_497} \captionsetup{labelformat=empty} \caption{Fig. 11.2}
    \end{figure} The equation of the line of best fit is \(y = 0.15 x + 4.72\).
  2. (A) What does the diagram suggest about the relationship between death rate and birth rate?
    (B) The birth rate in Togo is recorded as 34.13 per thousand, but the data on death rate has been lost. Use the equation of the line of best fit to estimate the death rate in Togo.
    (C) Explain why it would not be sensible to use the equation of the line of best fit to estimate the death rate in a country where the birth rate is 5.5 per thousand.
    (D) Explain why it would not be sensible to use the equation of the line of best fit to estimate the death rate in a Caribbean country where the birth rate is known.
    (E) Explain why it is unlikely that the sample is random.
    Including Togo there were 56 items available for selection.
  3. Describe how a sample of size 14 from this data could be generated for further analysis using systematic sampling.
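The two calculations in this question (using the line of best fit for Togo, and drawing a systematic sample of 14 from 56 items) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the sampling routine assumes the items are numbered 1 to 56 in a list, with a random start chosen within the first sampling interval.

```python
import random


def predicted_death_rate(birth_rate):
    """Death rate per thousand from the given line of best fit y = 0.15x + 4.72."""
    return 0.15 * birth_rate + 4.72


# Togo's recorded birth rate is 34.13 per thousand.
togo_estimate = predicted_death_rate(34.13)
print(round(togo_estimate, 1))


def systematic_sample(population_size, sample_size, rng=random):
    """Return 1-based positions chosen by systematic sampling.

    Assumes population_size is a multiple of sample_size, as it is for 56 and 14.
    """
    step = population_size // sample_size   # sampling interval, 4 for 56 and 14
    start = rng.randint(1, step)            # random start among the first `step` items
    return [start + k * step for k in range(sample_size)]


positions = systematic_sample(56, 14)
print(positions)
```

The estimate is an interpolation, since 34.13 is a birth rate for an African country and the line was fitted to African data; the same function says nothing reliable about values or regions outside that data.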
OCR MEI Paper 2 2019 June Q14
14 The pre-release material includes data concerning crude death rates in different countries of the world. Fig. 14.1 shows some information concerning crude death rates in countries in Europe and in Africa. \begin{table}[h]
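\begin{tabular}{|l|c|c|}
\hline
 & Europe & Africa \\
\hline
\(n\) & 48 & 56 \\
\hline
minimum & 6.28 & 3.58 \\
\hline
lower quartile & 8.50 & 7.31 \\
\hline
median & 9.53 & 8.71 \\
\hline
upper quartile & 11.41 & 11.93 \\
\hline
maximum & 14.46 & 14.89 \\
\hline
\end{tabular}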
\captionsetup{labelformat=empty} \caption{Fig. 14.1}
\end{table}
  1. Use your knowledge of the large data set to suggest a reason why the statistics in Fig. 14.1 refer to only 48 of the 51 European countries.
  2. Use the information in Fig. 14.1 to show that there are no outliers in either data set.
    The crude death rate in Libya is recorded as 3.58 and the population of Libya is recorded as 6411776.
  3. Calculate an estimate of the number of deaths in Libya in a year.
    The median age in Germany is 46.5 and the crude death rate is 11.42. The median age in Cyprus is 36.1 and the crude death rate is 6.62.
  4. Explain why a country like Germany, with a higher median age than Cyprus, might also be expected to have a higher crude death rate than Cyprus.
    Fig. 14.2 shows a scatter diagram of median age against crude death rate for countries in Africa and Fig. 14.3 shows a scatter diagram of median age against crude death rate for countries in Europe. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{95eb3bcc-6d3c-4f7e-9b27-5e046ab57ec5-10_678_1221_1975_248} \captionsetup{labelformat=empty} \caption{Fig. 14.2}
    \end{figure} \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{95eb3bcc-6d3c-4f7e-9b27-5e046ab57ec5-11_588_1248_223_228} \captionsetup{labelformat=empty} \caption{Fig. 14.3}
    \end{figure} The rank correlation coefficient for the data shown in Fig. 14.2 is \(-0.281206\).
    The rank correlation coefficient for the data shown in Fig. 14.3 is 0.335215.
  5. Compare and contrast what may be inferred about the relationship between median age and crude death rate in countries in Africa and in countries in Europe.
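The outlier check in part 2 and the estimate in part 3 can be sketched numerically. Two assumptions are made here and are not stated in the question: outliers are taken to be values more than 1.5 interquartile ranges beyond the quartiles, and the crude death rate is taken to be deaths per thousand of population per year. The names `summary`, `fences` and `libya_deaths` are illustrative.

```python
# Summary statistics copied from Fig. 14.1.
summary = {
    "Europe": {"min": 6.28, "lq": 8.50, "uq": 11.41, "max": 14.46},
    "Africa": {"min": 3.58, "lq": 7.31, "uq": 11.93, "max": 14.89},
}


def fences(lq, uq):
    """Lower and upper outlier limits under the assumed 1.5 x IQR rule."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr


no_outliers = {}
for region, s in summary.items():
    low, high = fences(s["lq"], s["uq"])
    # No outliers if both extremes lie inside the limits.
    no_outliers[region] = low <= s["min"] and s["max"] <= high
    print(region, round(low, 3), round(high, 3), no_outliers[region])

# Assumed definition: crude death rate = deaths per 1000 people per year.
libya_deaths = 3.58 / 1000 * 6411776
print(round(libya_deaths))
```

Checking only the minimum and maximum is enough, because every other value lies between them.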
OCR MEI Paper 2 2023 June Q14
14 The pre-release material contains information concerning the median income of taxpayers in \(\pounds\) and the percentage of all pupils at the end of KS4 achieving 5 or more GCSEs at grade A*-C, including English and Maths, for different areas of London. Some of the data for 2014/15 is shown in Fig. 14.1. \begin{table}[h]
\captionsetup{labelformat=empty} \caption{Fig. 14.1}
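\begin{tabular}{|l|c|c|}
\hline
 & Median Income of Taxpayers in £ & Percentage of Pupils Achieving 5 or more A*-C, including English and Maths \\
\hline
City of London & 61100 & \#N/A \\
\hline
Barking and Dagenham & 21800 & 54.0 \\
\hline
Barnet & 27100 & 70.1 \\
\hline
Bexley & 24400 & 55.0 \\
\hline
Brent & 22700 & 60.0 \\
\hline
Bromley & 28100 & 68.0 \\
\hline
\end{tabular}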
\end{table} A student investigated whether there is any relationship between median income of taxpayers and percentage of pupils achieving 5 or more GCSEs at grade A*-C, including English and Maths.
  1. With reference to Fig. 14.1, explain how the data should be cleaned before any analysis can take place.
    After the data was cleaned, the student used software to draw the scatter diagram shown in Fig. 14.2.
    Scatter diagram to show percentage of pupils achieving 5 A*-C grades against median income of taxpayers \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Fig. 14.2} \includegraphics[alt={},max width=\textwidth]{11788aaf-98fb-4a78-8a40-a40743b1fe15-10_574_1481_1900_241}
    \end{figure} The student calculated that the product moment correlation coefficient for these data is 0.3743.
  2. Give two reasons why it may not be appropriate to use a linear model for the relationship between median income of taxpayers in \(\pounds\) and the percentage of all pupils at the end of KS4 achieving 5 or more GCSEs at grade A*-C.
    The student carried out some further analysis. The results are shown in Fig. 14.3. \begin{table}[h]
    \captionsetup{labelformat=empty} \caption{Fig. 14.3}
    \begin{tabular}{|l|c|c|}
    \hline
     & median income of taxpayers in \(\pounds\) & percentage of pupils achieving 5+ A*-C \\
    \hline
    mean & 27216 & 61.0 \\
    \hline
    standard deviation & 4177.5 & 5.32 \\
    \hline
    \end{tabular}
    \end{table} The student identified three outliers in total.
  3. Use the information in Fig. 14.3 to determine the range of values of the median income of taxpayers in \(\pounds\) which are outliers.
  4. Use the information in Fig. 14.3 to determine the range of values of the percentage of all pupils at the end of KS4 achieving 5 or more GCSEs at grade A*-C which are outliers.
  5. On the copy of Fig. 14.2 in the Printed Answer Booklet, circle the three outliers identified by the student.
    The student decided to remove these outliers and recalculate the product moment correlation coefficient.
  6. Explain whether the new value of the product moment correlation coefficient would be between 0.3743 and 1 or between 0 and 0.3743.
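The two outlier ranges asked for in this question can be sketched from the mean and standard deviation in Fig. 14.3. The rule used below, that a value is an outlier if it lies more than 2 standard deviations from the mean, is an assumption about the convention intended; the question itself does not state it. The function name `two_sd_limits` is illustrative.

```python
def two_sd_limits(mean, sd):
    """Limits beyond which a value counts as an outlier under the assumed 2-SD rule."""
    return mean - 2 * sd, mean + 2 * sd


# Values copied from Fig. 14.3.
income_low, income_high = two_sd_limits(27216, 4177.5)
pupils_low, pupils_high = two_sd_limits(61.0, 5.32)

print("Income outliers: below", income_low, "or above", income_high)
print("Pupil percentage outliers: below", round(pupils_low, 2),
      "or above", round(pupils_high, 2))
```

Points on the scatter diagram falling outside either pair of limits would be the candidates to circle.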
SPS SPS SM Statistics 2024 April Q3
3. The table shows the increases, between 2001 and 2011, in the percentages of employees travelling to work by various methods, in the Local Authorities (LAs) in the North East region of the UK.
\begin{tabular}{|l|l|c|c|c|c|c|c|}
\hline
Geography code & Local authority & Work mainly at or from home & Underground, metro, light rail or tram & Bus, minibus or coach & Driving a car or van & Passenger in a car or van & On foot \\
\hline
E06000047 & County Durham & 0.74\% & 0.05\% & -1.50\% & 4.58\% & -2.99\% & -0.97\% \\
\hline
E06000005 & Darlington & 0.26\% & -0.01\% & -3.25\% & 3.06\% & -1.28\% & 0.99\% \\
\hline
E08000020 & Gateshead & -0.01\% & -0.01\% & -2.28\% & 4.62\% & -2.35\% & -0.18\% \\
\hline
E06000001 & Hartlepool & 0.03\% & -0.04\% & -1.62\% & 4.80\% & -2.38\% & -0.26\% \\
\hline
E06000002 & Middlesbrough & -0.34\% & -0.01\% & -2.32\% & 2.19\% & -1.33\% & 0.67\% \\
\hline
E08000021 & Newcastle upon Tyne & 0.10\% & -0.23\% & -0.67\% & -0.48\% & -1.51\% & 1.75\% \\
\hline
E08000022 & North Tyneside & 0.05\% & 0.54\% & -1.18\% & 3.30\% & -2.21\% & -0.60\% \\
\hline
E06000048 & Northumberland & 1.39\% & -0.08\% & -0.95\% & 3.50\% & -2.37\% & -1.44\% \\
\hline
E06000003 & Redcar and Cleveland & -0.02\% & -0.01\% & -2.09\% & 4.20\% & -2.06\% & -0.49\% \\
\hline
E08000023 & South Tyneside & -0.36\% & 2.03\% & -3.05\% & 4.50\% & -2.41\% & -0.51\% \\
\hline
E06000004 & Stockton-on-Tees & 0.14\% & 0.03\% & -2.02\% & 3.52\% & -2.01\% & -0.15\% \\
\hline
E08000024 & Sunderland & 0.17\% & 1.48\% & -3.11\% & 4.89\% & -2.21\% & -0.52\% \\
\hline
\end{tabular}
\section*{Increase in percentage of employees travelling to work by various methods}
The first two digits of the Geography code give the type of each of the LAs:
06: Unitary authority
07: Non-metropolitan district
08: Metropolitan borough
  1. In what type of LA are the largest increases in percentages of people travelling by underground, metro, light rail or tram?
  2. Identify two main changes in the pattern of travel to work in the North East region between 2001 and 2011.
    Now assume the following.
    • The data refer to residents in the given LAs who are in the age range 20 to 65 at the time of each census.
    • The number of people in the age range 20 to 65 who move into or out of each given LA, or who die, between 2001 and 2011 is negligible.
  3. Estimate the percentage of the people in the age range 20 to 65 in 2011 whose data appears in both 2001 and 2011.
  4. In the light of your answer to part (c), suggest a reason for the changes in the pattern of travel to work in the North East region between 2001 and 2011.
OCR Stats 1 2017 Specimen Q13
13 The table and the four scatter diagrams below show data taken from the 2011 UK census for four regions. On the scatter diagrams the names have been replaced by letters.
The table shows, for each region, the mean and standard deviation of the proportion of workers in each Local Authority who travel to work by driving a car or van and the proportion of workers in each Local Authority who travel to work as a passenger in a car or van.
Each scatter diagram shows, for each of the Local Authorities in a particular region, the proportion of workers who travel to work by driving a car or van and the proportion of workers who travel to work as a passenger in a car or van.
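\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{2}{c|}{Driving a car or van} & \multicolumn{2}{c|}{Passenger in a car or van} \\
\hline
 & Mean & Standard deviation & Mean & Standard deviation \\
\hline
London & 0.257 & 0.133 & 0.017 & 0.008 \\
\hline
South East & 0.578 & 0.064 & 0.045 & 0.010 \\
\hline
South West & 0.580 & 0.084 & 0.049 & 0.007 \\
\hline
Wales & 0.644 & 0.045 & 0.068 & 0.015 \\
\hline
\end{tabular}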
Region A
\includegraphics[max width=\textwidth, alt={}, center]{cac31da4-f1ad-4c34-a47f-2bc68c2304f1-14_607_1047_1228_369}
Region B
\includegraphics[max width=\textwidth, alt={}, center]{cac31da4-f1ad-4c34-a47f-2bc68c2304f1-14_598_1043_1927_371}
Region C
\includegraphics[max width=\textwidth, alt={}, center]{cac31da4-f1ad-4c34-a47f-2bc68c2304f1-15_678_1104_227_317}
Region D
\includegraphics[max width=\textwidth, alt={}, center]{cac31da4-f1ad-4c34-a47f-2bc68c2304f1-15_615_1058_1048_367}
  1. Using the values given in the table, match each region to its corresponding scatter diagram, explaining your reasoning.
  2. Steven claims that the outlier in the scatter diagram for Region C consists of a group of small islands. Explain whether or not the data given above support his claim.
  3. One of the Local Authorities in Region B consists of a single large island. Explain whether or not you would expect this Local Authority to appear as an outlier in the scatter diagram for Region B.