2.02c Scatter diagrams and regression lines

115 questions

OCR S1 2005 January Q1
4 marks Easy -1.3
1 The scatter diagrams below illustrate three sets of bivariate data, \(A , B\) and \(C\). \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{f0c0a4ca-da0a-4c74-b8b1-bac4fd3f2487-2_440_428_360_317} \captionsetup{labelformat=empty} \caption{Set \(A\)}
\end{figure} \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{f0c0a4ca-da0a-4c74-b8b1-bac4fd3f2487-2_440_426_360_858} \captionsetup{labelformat=empty} \caption{Set \(B\)}
\end{figure} \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{f0c0a4ca-da0a-4c74-b8b1-bac4fd3f2487-2_435_424_365_1402} \captionsetup{labelformat=empty} \caption{Set \(C\)}
\end{figure} State, with an explanation in each case, which of the three sets of data has
  1. the largest,
  2. the smallest,
    value of the product moment correlation coefficient.
OCR MEI S2 2008 June Q1
18 marks Standard +0.3
1 A researcher believes that there is a negative correlation between money spent by the government on education and population growth in various countries. A random sample of 48 countries is selected to investigate this belief. The level of government spending on education \(x\), measured in suitable units, and the annual percentage population growth rate \(y\), are recorded for these countries. Summary statistics for these data are as follows. $$\Sigma x = 781.3 \quad \Sigma y = 57.8 \quad \Sigma x ^ { 2 } = 14055 \quad \Sigma y ^ { 2 } = 106.3 \quad \Sigma x y = 880.1 \quad n = 48$$
  1. Calculate the sample product moment correlation coefficient.
  2. Carry out a hypothesis test at the \(5 \%\) significance level to investigate the researcher's belief. State your hypotheses clearly, defining any symbols which you use.
  3. State the distributional assumption which is necessary for this test to be valid. Explain briefly how a scatter diagram may be used to check whether this assumption is likely to be valid.
  4. A student suggests that if the variables are negatively correlated then population growth rates can be reduced by increasing spending on education. Explain why the student may be wrong. Discuss an alternative explanation for the correlation.
  5. State briefly one advantage and one disadvantage of using a smaller sample size in this investigation.
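As a rough check on part 1, the sketch below computes the product moment correlation coefficient directly from the summary statistics quoted in the question. The variable names are illustrative choices, not part of the question.

```python
from math import sqrt

# Summary statistics quoted in the question (n = 48 countries).
n = 48
sum_x, sum_y = 781.3, 57.8
sum_x2, sum_y2, sum_xy = 14055, 106.3, 880.1

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy).
r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")
```

The value comes out at about -0.274, which is the statistic that would then be compared with a one-tailed critical value for n = 48 in part 2.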
Edexcel S1 2014 January Q3
11 marks Moderate -0.8
3. Jean works for an insurance company. She randomly selects 8 people and records the price of their car insurance, \(\pounds p\), and the time, \(t\) years, since they passed their driving test. The data is shown in the table below.
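\begin{tabular}{|c|c|c|c|c|c|c|c|c|}
\hline
\(t\) & 10 & 13 & 17 & 18 & 22 & 24 & 25 & 27 \\
\hline
\(p\) & 720 & 650 & 430 & 490 & 500 & 390 & 280 & 300 \\
\hline
\end{tabular}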
$$\text { (You may use } \bar { t } = 19.5 , \bar { p } = 470 , S _ { t p } = - 6080 , S _ { t t } = 254 , S _ { p p } = 169200 \text { ) }$$
  1. On the graph below draw a scatter diagram for these data.
  2. Comment on the relationship between \(p\) and \(t\).
  3. Find the equation of the regression line of \(p\) on \(t\).
  4. Use your regression equation to estimate the price of car insurance for someone who passed their driving test 20 years ago. Jack passed his test 39 years ago and decides to use Jean's data to predict the price of his car insurance.
  5. Comment on Jack's decision. Give a reason for your answer. \includegraphics[max width=\textwidth, alt={}, center]{a839a89a-17f0-473b-ac10-bcec3dbe97f7-06_951_1365_1603_294}
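The regression calculation in parts 3 to 5 can be sketched as follows, assuming only the summary values the question allows. The helper `predict_price` and the range check are illustrative names, not part of the question.

```python
# Values the question says may be used.
t_bar, p_bar = 19.5, 470
s_tp, s_tt = -6080, 254

b = s_tp / s_tt          # gradient of the regression line of p on t
a = p_bar - b * t_bar    # intercept: the line passes through (t_bar, p_bar)

def predict_price(t_years):
    """Estimated insurance price in pounds from p = a + b t."""
    return a + b * t_years

# Range of t in the sample, read from the table.
t_min, t_max = 10, 27
for t in (20, 39):
    kind = "interpolation" if t_min <= t <= t_max else "extrapolation"
    print(t, round(predict_price(t), 2), kind)
```

The estimate for 20 years sits inside the data range, whereas Jack's 39 years lies well outside it, which is the point part 5 is probing.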
Edexcel S1 2015 January Q5
13 marks Easy -1.3
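5. The resting heart rate, \(h\) beats per minute (bpm), and average length of daily exercise, \(t\) minutes, of a random sample of 8 teachers are shown in the table below.
\begin{tabular}{|c|c|c|c|c|c|c|c|c|}
\hline
\(t\) & 20 & 35 & 40 & 25 & 45 & 70 & 75 & 90 \\
\hline
\(h\) & 88 & 85 & 77 & 75 & 71 & 66 & 60 & 54 \\
\hline
\end{tabular}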
  1. State, with a reason, which variable is the response variable. The equation of the least squares regression line of \(h\) on \(t\) is $$h = 93.5 - 0.43 t$$
  2. Give an interpretation of the gradient of this regression line.
  3. Find the value of \(\bar { t }\) and the value of \(\bar { h }\)
  4. Show that the point \(( \bar { t } , \bar { h } )\) lies on the regression line.
  5. Estimate the resting heart rate of a teacher with an average length of daily exercise of 1 hour.
  6. Comment, giving a reason, on the reliability of the estimate in part (e). The resting heart rate of teachers is assumed to be normally distributed with mean 73 bpm and standard deviation 8 bpm. The middle \(95 \%\) of resting heart rates of teachers lies between \(a\) and \(b\)
  7. Find the value of \(a\) and the value of \(b\).
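A short numerical check of the arithmetic parts of this question, assuming the table is read as eight \((t, h)\) pairs. It uses Python's standard `statistics.NormalDist` for the middle 95% interval; the variable names are illustrative.

```python
from statistics import NormalDist, mean

# Data read from the table in the question.
t = [20, 35, 40, 25, 45, 70, 75, 90]
h = [88, 85, 77, 75, 71, 66, 60, 54]

t_bar, h_bar = mean(t), mean(h)

# The mean point should satisfy h = 93.5 - 0.43 t.
on_line = abs((93.5 - 0.43 * t_bar) - h_bar) < 1e-9

# Estimate for 1 hour (60 minutes) of daily exercise.
estimate = 93.5 - 0.43 * 60

# Middle 95% of a normal distribution with mean 73 and standard deviation 8.
dist = NormalDist(mu=73, sigma=8)
a, b = dist.inv_cdf(0.025), dist.inv_cdf(0.975)
print(t_bar, h_bar, on_line, estimate, round(a, 2), round(b, 2))
```

Since 60 minutes lies within the observed range of \(t\) (20 to 90), the estimate is an interpolation.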
Edexcel S1 2019 January Q6
18 marks Moderate -0.3
6. Following some school examinations, Chetna is studying the results of the 16 students in her class. The mark for paper \(1 , x\), and the mark for paper \(2 , y\), for each student are summarised in the following statistics.
$$\bar { x } = 35.75 \quad \bar { y } = 25.75 \quad \sigma _ { x } = 7.79 \quad \sigma _ { y } = 11.91 \quad \sum x y = 15837$$
  1. Comment on the differences between the marks of the students on paper 1 and paper 2 Chetna decides to examine these data in more detail and plots the marks for each of the 16 students on the scatter diagram opposite.
    1. Explain why the circled point \(( 38,0 )\) is possibly an outlier.
    2. Suggest a possible reason for this result. Chetna decides to omit the data point \(( 38,0 )\) and examine the other 15 students' marks.
  2. Find the value of \(\bar { x }\) and the value of \(\bar { y }\) for these 15 students. For these 15 students
    1. explain why \(\sum x y\) is still 15837
    2. show that \(\mathrm { S } _ { x y } = 1169.8\) For these 15 students, Chetna calculates \(\mathrm { S } _ { x x } = 965.6\) and \(\mathrm { S } _ { y y } = 1561.7\) correct to 1 decimal place.
  3. Calculate the product moment correlation coefficient for these 15 students.
  4. Calculate the equation of the line of regression of \(y\) on \(x\) for these 15 students, giving your answer in the form \(y = a + b x\) The product moment correlation coefficient between \(x\) and \(y\) for all 16 students is 0.746
  5. Explain how your calculation in part (e) supports Chetna's decision to omit the point \(( 38,0 )\) before calculating the equation of the linear regression line.
  6. Estimate the mark in the second paper for a student who scored 38 marks in the first paper.
    \includegraphics[max width=\textwidth, alt={}]{d3f4450d-60eb-49b6-be1b-d2fcfad0451f-17_1127_1146_301_406}
    \includegraphics[max width=\textwidth, alt={}]{d3f4450d-60eb-49b6-be1b-d2fcfad0451f-20_2630_1828_121_121}
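The effect of removing the point \((38, 0)\) can be traced numerically. This is a sketch using only the statistics quoted in the question; the variable names are illustrative.

```python
from math import sqrt

# Statistics for all 16 students, as quoted.
n_all = 16
x_bar_all, y_bar_all = 35.75, 25.75
sum_xy = 15837
outlier = (38, 0)

# Remove the outlier to get totals and means for the other 15 students.
n = n_all - 1
sum_x = n_all * x_bar_all - outlier[0]
sum_y = n_all * y_bar_all - outlier[1]
x_bar, y_bar = sum_x / n, sum_y / n

# The outlier contributes 38 * 0 = 0 to sum(xy), so that total is unchanged.
s_xy = (sum_xy - outlier[0] * outlier[1]) - sum_x * sum_y / n

# S_xx and S_yy for the 15 students, as given in the question.
s_xx, s_yy = 965.6, 1561.7

r = s_xy / sqrt(s_xx * s_yy)
b = s_xy / s_xx
a = y_bar - b * x_bar
estimate_38 = a + b * 38
print(round(x_bar, 2), round(y_bar, 2), round(s_xy, 1), round(r, 3))
```

The correlation for the 15 students is about 0.953, noticeably stronger than the 0.746 quoted for all 16.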
Edexcel S1 2021 January Q5
17 marks Moderate -0.8
5. A company director wants to introduce a performance-related pay structure for her managers. A random sample of 15 managers is taken and the annual salary, \(y\) in \(\pounds 1000\), was recorded for each manager. The director then calculated a performance score, \(x\), for each of these managers.
    The results are shown on the scatter diagram in Figure 1 on the next page.
    1. Describe the correlation between performance score and annual salary.
    The results are also summarised in the following statistics. $$\sum x = 465 \quad \sum y = 562 \quad \mathrm {~S} _ { x x } = 2492 \quad \sum y ^ { 2 } = 23140 \quad \sum x y = 19428$$
    1. Show that \(\mathrm { S } _ { x y } = 2006\)
    2. Find \(\mathrm { S } _ { y y }\)
  2. Find the product moment correlation coefficient between performance score and annual salary. The director believes that there is a linear relationship between performance score and annual salary.
  3. State, giving a reason, whether or not these data are consistent with the director's belief.
  4. Calculate the equation of the regression line of \(y\) on \(x\), in the form \(y = a + b x\) Give the value of \(a\) and the value of \(b\) to 3 significant figures.
  5. Give an interpretation of the value of \(b\).
  6. Plot your regression line on the scatter diagram in Figure 1 The director hears that one of the managers in the sample seems to be underperforming.
  7. On the scatter diagram, circle the point that best identifies this manager. The director decides to use this regression line for the new performance related pay structure.
    1. Estimate, to 3 significant figures, the new salary of a manager with a performance score of 30 \begin{figure}[h]
      \includegraphics[alt={},max width=\textwidth]{4f034b9a-94c8-42f2-bd77-9adec277aba6-15_1390_1408_299_187} \captionsetup{labelformat=empty} \caption{Figure 1}
      \end{figure} \includegraphics[max width=\textwidth, alt={}, center]{4f034b9a-94c8-42f2-bd77-9adec277aba6-17_2654_99_115_9} Annual salary (£1000) \begin{figure}[h]
      \captionsetup{labelformat=empty} \caption{Only use this scatter diagram if you need to redraw your line.} \includegraphics[alt={},max width=\textwidth]{4f034b9a-94c8-42f2-bd77-9adec277aba6-17_1378_1143_402_468}
      \end{figure}
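The numerical parts of this question follow from the quoted summary statistics. The sketch below is one way to check them; the variable names are illustrative.

```python
from math import sqrt

# Summary statistics quoted in the question (n = 15 managers).
n = 15
sum_x, sum_y = 465, 562
s_xx = 2492
sum_y2, sum_xy = 23140, 19428

s_xy = sum_xy - sum_x * sum_y / n
s_yy = sum_y2 - sum_y ** 2 / n

# Product moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Regression line of y on x: y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Estimated salary (in thousands of pounds) for a performance score of 30.
salary_30 = a + b * 30
print(s_xy, round(s_yy, 1), round(r, 3), round(a, 1), round(b, 3), round(salary_30, 1))
```

The estimate for a score of 30 is about 36.7, that is, roughly £36 700.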
Edexcel S1 2023 January Q6
14 marks Moderate -0.3
6. A research student is investigating the maximum weight, \(y\) grams, of sugar that will dissolve in 100 grams of water at various temperatures, \(x ^ { \circ } \mathrm { C }\), where \(10 \leqslant x \leqslant 80\)
The research student calculated the regression line of \(y\) on \(x\) and found it to be $$y = 151.2 + 2.72 x$$
  1. Give an interpretation of the gradient of the regression line.
  2. Use the regression line to estimate the maximum weight of sugar that will dissolve in 100 grams of water when the temperature is \(90 ^ { \circ } \mathrm { C }\).
  3. Comment on the reliability of your estimate, giving a reason for your answer. Using the regression line of \(y\) on \(x\) and the following summary statistics $$\sum y = 3119 \quad \sum y ^ { 2 } = 851093 \quad \sum x ^ { 2 } = 24500 \quad n = 12$$
  4. show that the product moment correlation coefficient for these data is 0.988 to 3 decimal places. The research student's supervisor plotted the original data on a scatter diagram, shown on page 23 With reference to both the scatter diagram and the correlation coefficient,
  5. discuss the suitability of a linear regression model to describe the relationship between \(x\) and \(y\).
    \includegraphics[max width=\textwidth, alt={}]{c316fa29-dedc-4890-bd82-31eb0bb819f9-23_990_1138_205_356}
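Part 4 relies on two facts: the regression line passes through \(( \bar { x } , \bar { y } )\), and its gradient is \(b = \mathrm { S } _ { x y } / \mathrm { S } _ { x x }\), so that \(r = b \sqrt { \mathrm { S } _ { x x } / \mathrm { S } _ { y y } }\). The sketch below follows that route, under the assumption that \(\bar { x }\) is kept unrounded; the variable names are illustrative.

```python
from math import sqrt

# Quantities quoted in the question.
n = 12
sum_y, sum_y2, sum_x2 = 3119, 851093, 24500
a, b = 151.2, 2.72  # regression line y = a + b x

# The regression line passes through (x_bar, y_bar), which recovers x_bar.
y_bar = sum_y / n
x_bar = (y_bar - a) / b

s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - sum_y ** 2 / n

# b = S_xy / S_xx, so r = S_xy / sqrt(S_xx * S_yy) = b * sqrt(S_xx / S_yy).
r = b * sqrt(s_xx / s_yy)

# Part 2: estimate at 90 degrees, which is outside the range 10 <= x <= 80.
y_90 = a + b * 90
print(round(x_bar, 2), round(r, 3), y_90)
```

Rounding \(\bar { x }\) to 40 before computing \(\mathrm { S } _ { x x }\) shifts the third decimal place of \(r\), so the unrounded value is used here.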
Edexcel S1 2014 June Q1
12 marks Moderate -0.8
1. A medical researcher is studying the relationship between age ( \(x\) years) and volume of blood ( \(y \mathrm { ml }\) ) pumped by each contraction of the heart. The researcher obtained the following data from a random sample of 8 patients.
\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Age (\(x\)) & 20 & 25 & 30 & 45 & 55 & 60 & 65 & 70 \\
\hline
Volume (\(y\)) & 74 & 76 & 77 & 72 & 68 & 67 & 64 & 62 \\
\hline
\end{tabular}
[You may use \(\sum x = 370 , \mathrm {~S} _ { x x } = 2587.5 , \sum y = 560 , \sum y ^ { 2 } = 39418 , \mathrm {~S} _ { x y } = - 710\) ]
  1. Calculate \(\mathrm { S } _ { y y }\)
  2. Calculate the product moment correlation coefficient for these data.
  3. Interpret your value of the correlation coefficient. The researcher believes that a linear regression model may be appropriate to describe these data.
  4. State, giving a reason, whether or not your value of the correlation coefficient supports the researcher's belief.
  5. Find the equation of the regression line of \(y\) on \(x\), giving your answer in the form \(y = a + b x\) Jack is a 40-year-old patient.
    1. Use your regression line to estimate the volume of blood pumped by each contraction of Jack's heart.
    2. Comment, giving a reason, on the reliability of your estimate.
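The calculations in parts 1, 2 and 5 can be checked from the summary values the question provides. This is a sketch with illustrative variable names.

```python
from math import sqrt

# Values the question says may be used (n = 8 patients).
n = 8
sum_x, s_xx = 370, 2587.5
sum_y, sum_y2 = 560, 39418
s_xy = -710

s_yy = sum_y2 - sum_y ** 2 / n

# Product moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Regression line of y on x: y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Estimated volume for Jack, aged 40 (inside the age range 20 to 70).
volume_40 = a + b * 40
print(s_yy, round(r, 3), round(a, 2), round(b, 4), round(volume_40, 1))
```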
Edexcel S1 2015 June Q2
13 marks Moderate -0.3
2. Paul believes there is a relationship between the value and the floor size of a house. He takes a random sample of 20 houses and records the value, \(\pounds v\), and the floor size, \(s \mathrm {~m} ^ { 2 }\) The data were coded using \(x = \frac { s - 50 } { 10 }\) and \(y = \frac { v } { 100000 }\) and the following statistics obtained. $$\sum x = 441.5 , \quad \sum y = 59.8 , \quad \sum x ^ { 2 } = 11261.25 , \quad \sum y ^ { 2 } = 196.66 , \quad \sum x y = 1474.1$$
  1. Find the value of \(S _ { x y }\) and the value of \(S _ { x x }\)
  2. Find the equation of the least squares regression line of \(y\) on \(x\) in the form \(y = a + b x\) The least squares regression line of \(v\) on \(s\) is \(v = c + d s\)
  3. Show that \(d = 1020\) to 3 significant figures and find the value of \(c\)
  4. Estimate the value of a house of floor size \(130 \mathrm {~m} ^ { 2 }\)
  5. Interpret the value \(d\) Paul wants to increase the value of his house. He decides to add an extension to increase the floor size by \(31 \mathrm {~m} ^ { 2 }\)
  6. Estimate the increase in the value of Paul's house after adding the extension.
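Undoing the coding is the step most easily mishandled here. Substituting \(x = \frac { s - 50 } { 10 }\) and \(y = \frac { v } { 100000 }\) into \(y = a + b x\) gives \(v = ( 100000 a - 500000 b ) + 10000 b s\). The sketch below applies that substitution; the function `value` and the variable names are illustrative.

```python
# Summary statistics for the coded data (n = 20 houses).
n = 20
sum_x, sum_y = 441.5, 59.8
sum_x2, sum_xy = 11261.25, 1474.1

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n

# Regression line of y on x in coded units.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Decode: v = 100000 * (a + b * (s - 50) / 10) = c + d * s.
d = 10000 * b
c = 100000 * a - 500000 * b

def value(s):
    """Estimated house value in pounds for floor size s square metres."""
    return c + d * s

# An extension of 31 square metres changes the estimate by 31 * d.
increase = 31 * d
print(round(s_xy, 3), round(s_xx, 4), round(d), round(c), round(value(130)), round(increase))
```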
Edexcel S1 2015 June Q7
6 marks Easy -1.8
7. A doctor is investigating the correlation between blood protein, \(p\), and body mass index, \(b\). He takes a random sample of 8 patients and the data are shown in the table below.
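\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Patient & \(A\) & \(B\) & \(C\) & \(D\) & \(E\) & \(F\) & \(G\) & \(H\) \\
\hline
\(b\) & 32 & 36 & 40 & 44 & 42 & 21 & 27 & 37 \\
\hline
\(p\) & 18 & 21 & 31 & 39 & 21 & 12 & 19 & 70 \\
\hline
\end{tabular}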
  1. Draw a scatter diagram of these data on the axes provided. \includegraphics[max width=\textwidth, alt={}, center]{36cf6341-1957-45b9-9f7d-0914506f5919-13_938_673_785_614} The doctor decides to leave out patient \(H\) from his calculations.
  2. Give a reason for the doctor's decision. For the 7 patients \(A , B , C , D , E , F\) and \(G\), $$S _ { b p } = 369 , \quad S _ { p p } = 490 \text { and } S _ { b b } = 423 \frac { 5 } { 7 }$$
  3. Find the product moment correlation coefficient, \(r\), for these 7 patients.
  4. Without any further calculations, state how \(r\) would differ from your answer in part (c) if it was calculated for all 8 patients. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{36cf6341-1957-45b9-9f7d-0914506f5919-15_1322_1593_207_173} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure} The histogram in Figure 1 summarises the times, in minutes, that 200 people spent shopping in a supermarket.
    1. Give a reason to justify the use of a histogram to represent these data. Given that 40 people spent between 11 and 21 minutes shopping in the supermarket, estimate
    2. the number of people that spent between 18 and 25 minutes shopping in the supermarket,
    3. the median time spent shopping in the supermarket by these 200 people. The mid-point of each bar is represented by \(x\) and the corresponding frequency by f.
    4. Show that \(\sum \mathrm { f } x = 6390\) Given that \(\sum \mathrm { f } x ^ { 2 } = 238430\)
    5. for the data shown in the histogram, calculate estimates of
      1. the mean,
      2. the standard deviation. A coefficient of skewness is given by \(\frac { 3 ( \text { mean } - \text { median } ) } { \text { standard deviation } }\)
    6. Calculate this coefficient of skewness for these data. The manager of the supermarket decides to model these data with a normal distribution.
    7. Comment on the manager's decision. Give a justification for your answer.
OCR MEI S2 2015 June Q1
17 marks Moderate -0.5
1 A random sample of wheat seedlings is planted and their growth is measured. The table shows their average growth, \(y \mathrm {~mm}\), at half-day intervals.
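\begin{tabular}{|l|c|c|c|c|c|c|c|}
\hline
Time \(t\) days & 0 & 0.5 & 1 & 1.5 & 2 & 2.5 & 3 \\
\hline
Average growth \(y \mathrm {~mm}\) & 0 & 7 & 21 & 33 & 45 & 56 & 62 \\
\hline
\end{tabular}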
  1. Draw a scatter diagram to illustrate these data.
  2. Calculate the equation of the regression line of \(y\) on \(t\).
  3. Calculate the value of the residual for the data point at which \(t = 2\).
  4. Use the equation of the regression line to calculate an estimate of the average growth after 5 days for wheat seedlings. Comment on the reliability of this estimate. It is suggested that it would be better to replace the regression line by a line which passes through the origin. You are given that the equation of such a line is \(y = a t\), where \(a = \frac { \sum y t } { \sum t ^ { 2 } }\).
  5. Find the equation of this line and plot the line on your scatter diagram.
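Parts 2 to 5 can be worked through from the raw data. The sketch below assumes the table is read as seven \((t, y)\) pairs at half-day intervals; the variable names are illustrative.

```python
# Data read from the table in the question.
t = [0, 0.5, 1, 1.5, 2, 2.5, 3]
y = [0, 7, 21, 33, 45, 56, 62]
n = len(t)

sum_t, sum_y = sum(t), sum(y)
sum_t2 = sum(ti * ti for ti in t)
sum_ty = sum(ti * yi for ti, yi in zip(t, y))

# Least squares regression line of y on t: y = a + b t.
s_tt = sum_t2 - sum_t ** 2 / n
s_ty = sum_ty - sum_t * sum_y / n
b = s_ty / s_tt
a = sum_y / n - b * sum_t / n

# Residual = observed value minus fitted value, at t = 2.
residual_at_2 = y[t.index(2)] - (a + b * 2)

# Estimate at t = 5, outside the observed range 0 to 3 days.
estimate_5 = a + b * 5

# Line through the origin, y = a_origin * t, using the formula given.
a_origin = sum_ty / sum_t2
print(a, b, residual_at_2, estimate_5, round(a_origin, 2))
```

With this reading of the table the fitted line is \(y = -1 + 22 t\), the residual at \(t = 2\) is 2, and the estimate at 5 days is 109 mm, which is an extrapolation.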
OCR MEI S2 2016 June Q1
18 marks Standard +0.3
1 A researcher believes that there may be negative association between the quantity of fertiliser used and the percentage of the population who live in rural areas in different countries. The data below show the percentage of the population who live in rural areas and the fertiliser use measured in kg per hectare, for a random sample of 11 countries.
Percentage of population33658358169617747117
Fertiliser use764466831071765137157
  1. Draw a scatter diagram to illustrate the data.
  2. Explain why it might not be valid to carry out a test based on the product moment correlation coefficient in this case.
  3. Calculate the value of Spearman's rank correlation coefficient.
  4. Carry out a hypothesis test at the \(1 \%\) significance level to investigate the researcher's belief.
  5. Explain the meaning of ' \(1 \%\) significance level'.
  6. In order to carry out a test based on Spearman's rank correlation coefficient, what modelling assumptions, if any, are required about the underlying distribution?
OCR H240/02 Q13
5 marks Moderate -0.5
13 The table and the four scatter diagrams below show data taken from the 2011 UK census for four regions. On the scatter diagrams the names have been replaced by letters.
The table shows, for each region, the mean and standard deviation of the proportion of workers in each Local Authority who travel to work by driving a car or van and the proportion of workers in each Local Authority who travel to work as a passenger in a car or van.
Each scatter diagram shows, for each of the Local Authorities in a particular region, the proportion of workers who travel to work by driving a car or van and the proportion of workers who travel to work as a passenger in a car or van.
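\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{2}{c|}{Driving a car or van} & \multicolumn{2}{c|}{Passenger in a car or van} \\
\hline
 & Mean & Standard deviation & Mean & Standard deviation \\
\hline
London & 0.257 & 0.133 & 0.017 & 0.008 \\
\hline
South East & 0.578 & 0.064 & 0.045 & 0.010 \\
\hline
South West & 0.580 & 0.084 & 0.049 & 0.007 \\
\hline
Wales & 0.644 & 0.045 & 0.068 & 0.015 \\
\hline
\end{tabular}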
Region A \includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-14_634_1116_1308_299} Region B \includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-14_636_1109_2049_301} \includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-15_737_1183_237_240} \includegraphics[max width=\textwidth, alt={}, center]{f2f45d6c-cfdc-455b-ab08-597b06a69f36-15_723_1169_1046_246}
  1. Using the values given in the table, match each region to its corresponding scatter diagram, explaining your reasoning.
  2. Steven claims that the outlier in the scatter diagram for Region C consists of a group of small islands. Explain whether or not the data given above support his claim.
  3. One of the Local Authorities in Region B consists of a single large island. Explain whether or not you would expect this Local Authority to appear as an outlier in the scatter diagram for Region B.
Edexcel AS Paper 2 2022 June Q1
5 marks Easy -1.2
1. The relationship between two variables \(p\) and \(t\) is modelled by the regression line with equation
$$p = 22 - 1.1 t$$ The model is based on observations of the independent variable, \(t\), between 1 and 10
  1. Describe the correlation between \(p\) and \(t\) implied by this model. Given that \(p\) is measured in centimetres and \(t\) is measured in days,
  2. state the units of the gradient of the regression line. Using the model,
  3. calculate the change in \(p\) over a 3-day period. Tisam uses this model to estimate the value of \(p\) when \(t = 19\)
  4. Comment, giving a reason, on the reliability of this estimate.
Edexcel AS Paper 2 2023 June Q2
4 marks Moderate -0.8
2. Fred and Nadine are investigating whether there is a linear relationship between Daily Mean Pressure, \(p \mathrm { hPa }\), and Daily Mean Air Temperature, \(t ^ { \circ } \mathrm { C }\), in Beijing using the 2015 data from the large data set.
Fred randomly selects one month from the data set and draws the scatter diagram in Figure 1 using the data from that month. The scale has been left off the horizontal axis. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{854568d2-b32d-44de-8a9c-26372e509c20-04_794_1539_589_264} \captionsetup{labelformat=empty} \caption{Figure 1}
\end{figure}
  1. Describe the correlation shown in Figure 1. Nadine chooses to use all of the data for Beijing from 2015 and draws the scatter diagram in Figure 2. She uses the same scales as Fred. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{854568d2-b32d-44de-8a9c-26372e509c20-04_777_1509_1841_278} \captionsetup{labelformat=empty} \caption{Figure 2}
    \end{figure}
  2. Explain, in context, what Nadine can infer about the relationship between \(p\) and \(t\) using the information shown in Figure 2.
  3. Using your knowledge of the large data set, state a value of \(p\) for which interpolation can be used with Figure 2 to predict a value of \(t\).
  4. Using your knowledge of the large data set, explain why it is not meaningful to look for a linear relationship between Daily Mean Wind Speed (Beaufort Conversion) and Daily Mean Air Temperature in Beijing in 2015.
Edexcel Paper 3 2022 June Q6
9 marks Standard +0.3
6. Anna is investigating the relationship between exercise and resting heart rate. She takes a random sample of 19 people in her year at school and records for each person
  • their resting heart rate, \(h\) beats per minute
  • the number of minutes, \(m\), spent exercising each week
Her results are shown on the scatter diagram. \includegraphics[max width=\textwidth, alt={}, center]{3a09f809-fa28-4b3d-bb69-ea074433bd8f-16_531_551_653_740}
  1. Interpret the nature of the relationship between \(h\) and \(m\) Anna codes the data using the formulae $$\begin{aligned} & x = \log _ { 10 } m \\ & y = \log _ { 10 } h \end{aligned}$$ The product moment correlation coefficient between \(x\) and \(y\) is - 0.897
  2. Test whether or not there is significant evidence of a negative correlation between \(x\) and \(y\) You should
    The equation of the line of best fit of \(y\) on \(x\) is $$y = - 0.05 x + 1.92$$
  3. Use the equation of the line of best fit of \(y\) on \(x\) to find a model for \(h\) on \(m\) in the form $$h = a m ^ { k }$$ where \(a\) and \(k\) are constants to be found.
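Part 3 amounts to undoing the logarithmic coding: \(\log _ { 10 } h = 1.92 - 0.05 \log _ { 10 } m\) rearranges to \(h = 10 ^ { 1.92 } m ^ { - 0.05 }\). A minimal sketch of that conversion follows; `heart_rate` is an illustrative helper name.

```python
# Line of best fit for the coded data: y = gradient * x + intercept,
# where x = log10(m) and y = log10(h).
gradient, intercept = -0.05, 1.92

# log10(h) = intercept + gradient * log10(m)  =>  h = 10**intercept * m**gradient
a = 10 ** intercept
k = gradient

def heart_rate(m):
    """Modelled resting heart rate h = a * m**k for m minutes of exercise per week."""
    return a * m ** k

print(round(a, 1), k)
```

This gives \(a \approx 83.2\) and \(k = -0.05\).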
Edexcel Paper 3 2024 June Q2
6 marks Moderate -0.3
2. Amar is studying the flight of a bird from its nest.
He measures the bird's height above the ground, \(h\) metres, at time \(t\) seconds for 10 values of \(t\) Amar finds the equation of the regression line for the data to be \(h = 38.6 - 1.28 t\)
  1. Interpret the gradient of this line. The product moment correlation coefficient between \(h\) and \(t\) is - 0.510
  2. Test whether or not there is evidence of a negative correlation between the height above the ground and the time during the flight.
    You should
    • state your hypotheses clearly
    • use a \(5 \%\) level of significance
    • state the critical value used
    Jane draws the following scatter diagram for Amar's data. \includegraphics[max width=\textwidth, alt={}, center]{ab7f7951-e6fe-4853-bb69-8016cf3e796c-06_1024_1033_1135_516}
  3. With reference to the scatter diagram, state, giving a reason, whether or not the regression line \(h = 38.6 - 1.28 t\) is an appropriate model for these data. Jane suggests an improved model using the variable \(u = ( t - k ) ^ { 2 }\) where \(k\) is a constant.
    She obtains the equation \(h = 38.1 - 0.78 u\)
  4. Choose a suitable value for \(k\) to write Jane's improved model for \(h\) in terms of \(t\) only.
OCR PURE Q11
6 marks Moderate -0.5
11 A student is investigating changes in the number of residents in Local Authorities in the South-East Region between 2001 and 2011. The scatter diagram shows the number \(x\) of residents in these Local Authorities in the age group 8 to 9 in 2001 and the number \(y\) of residents in the same Local Authorities in the age group 18 to 19 in 2011.
[diagram]
  1. Suggest a reason why the student is comparing these two age groups in 2001 and 2011. The student notices that most of the data points are close to the line \(y = x\).
    1. Explain what this suggests about the residents in these Local Authorities.
    2. The student says that correlation does not imply causation, so there is no causal link between the values of \(x\) and the values of \(y\). Explain whether or not they are correct.
  2. Some of these Local Authorities contain universities.
    1. On the diagram in the Printed Answer Booklet, circle three points that are likely to represent Local Authorities containing universities.
    2. Give a reason for your choice of points in part (c)(i). Assume that the proportion of residents in age group 8 to 9 in 2001 was roughly the same in each Local Authority in the South-East. The Local Authority in this region with the largest population is Medway.
  3. On the diagram in the Printed Answer Booklet, label clearly with the letter \(M\) the point that corresponds to Medway.
OCR PURE Q8
4 marks Easy -1.8
8 A random sample of 10 students from a college was chosen. They were asked how much time, \(x\) hours, they spent studying, and how much money, \(\pounds y\), they earned, in a typical week during term time. The results are shown in the scatter diagram. \includegraphics[max width=\textwidth, alt={}, center]{c4cc2cd8-46bf-448f-b223-92378984bfde-5_544_741_555_242}
  1. Comment on the relationship shown by the diagram between hours spent studying and money earned, during term time, by these 10 students. The coordinates of the points in the diagram are \(( 18,23 ) , ( 20,21 ) , ( 23,20 ) , ( 25,19 ) , ( 25,21 )\), \(( 27,18 ) , ( 32,16 ) , ( 38,17 ) , ( 40,16 )\) and \(( 41,23 )\).
  2. Find the mean and standard deviation of the number of hours spent per week studying during term time by these 10 students.
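For part 2, the \(x\)-coordinates of the ten points give the hours studied. The sketch below uses Python's standard `statistics` module and reports both the divisor-\(n\) and divisor-\((n - 1)\) standard deviations, since which one is expected depends on the specification's convention (an assumption to check against the mark scheme).

```python
from statistics import mean, pstdev, stdev

# x-coordinates (hours spent studying) of the ten points listed in the question.
hours = [18, 20, 23, 25, 25, 27, 32, 38, 40, 41]

mean_hours = mean(hours)
sd_population = pstdev(hours)  # divisor n
sd_sample = stdev(hours)       # divisor n - 1
print(mean_hours, round(sd_population, 2), round(sd_sample, 2))
```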
OCR MEI AS Paper 2 2019 June Q9
10 marks Moderate -0.3
9 In 2012 Adam bought a second hand car for \(\pounds 8500\). Each year Adam has his car valued. He believes that there is a non-linear relationship between \(t\), the time in years since he bought the car, and \(V\), the value of the car in pounds. Fig. 9.1 shows successive values of \(V\) and \(\log _ { 10 } V\). \begin{table}[h]
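\begin{tabular}{|c|c|c|c|c|c|}
\hline
\(t\) & 0 & 1 & 2 & 3 & 4 \\
\hline
\(V\) & 8500 & 6970 & 5720 & 4690 & 3840 \\
\hline
\(\log _ { 10 } V\) & 3.93 & 3.84 & 3.76 & 3.67 & 3.58 \\
\hline
\end{tabular}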
\captionsetup{labelformat=empty} \caption{Fig. 9.1}
\end{table} Adam uses a spreadsheet to plot the points ( \(t , \log _ { 10 } V\) ) shown in Fig. 9.1, and then generates a line of best fit for these points. The line passes through the points \(( 0,3.93 )\) and \(( 4,3.58 )\). A copy of his graph is shown in Fig. 9.2. \begin{figure}[h]
\includegraphics[alt={},max width=\textwidth]{11e5167f-9f95-4494-9b66-b59fdce8b1ef-6_776_682_1886_246} \captionsetup{labelformat=empty} \caption{Fig. 9.2}
\end{figure}
  1. Find an expression for \(\log _ { 10 } V\) in terms of \(t\).
  2. Find a model for \(V\) in the form \(V = A \times b ^ { t }\), where \(A\) and \(b\) are constants to be determined. Give the values of \(A\) and \(b\) correct to 2 significant figures. In 2017 Adam's car was valued at \(\pounds 3150\).
  3. Determine whether the model is a good fit for this data. A company called Webuyoldcars pays \(\pounds 500\) for any second hand car. Adam decides that he will sell his car to this company when the annual valuation of his car is less than \(\pounds 500\).
  4. According to the model, after how many years will Adam sell his car to Webuyoldcars?
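The chain from the straight line in \(( t , \log _ { 10 } V )\) to the exponential model can be sketched as follows. The helper `round_sf` and the other names are illustrative, and the model is evaluated with \(A\) and \(b\) rounded to 2 significant figures, as part 2 requests.

```python
from math import ceil, floor, log10

def round_sf(value, sig_figs):
    """Round a non-zero value to the given number of significant figures."""
    return round(value, sig_figs - 1 - floor(log10(abs(value))))

# The line of best fit passes through (0, 3.93) and (4, 3.58).
(t0, y0), (t1, y1) = (0, 3.93), (4, 3.58)
gradient = (y1 - y0) / (t1 - t0)
intercept = y0 - gradient * t0

# log10(V) = intercept + gradient * t  =>  V = 10**intercept * (10**gradient)**t
A = round_sf(10 ** intercept, 2)
b = round_sf(10 ** gradient, 2)

def model_value(t):
    """Modelled value of the car in pounds, t years after 2012."""
    return A * b ** t

# 2017 is 5 years after purchase; compare with the quoted valuation of 3150.
value_2017 = model_value(5)

# Smallest whole number of years for which the modelled value is below 500.
years_to_sell = ceil(log10(500 / A) / log10(b))
print(A, b, round(value_2017), years_to_sell)
```

The modelled 2017 value is close to the quoted valuation of £3150, and the model first drops below £500 at the 15th annual valuation.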
OCR MEI AS Paper 2 2023 June Q8
4 marks Moderate -0.5
8 The pre-release material contains information on Pulse Rate and Body Mass Index (BMI). A student is investigating whether there is a relationship between pulse rate and BMI. A section of the available data is shown in the table.
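\begin{tabular}{|l|c|c|c|}
\hline
Sex & Age & BMI & Pulse \\
\hline
Male & 62 & 29.54 & 60 \\
\hline
Female & 20 & 23.68 & \#N/A \\
\hline
Male & 17 & 26.97 & 72 \\
\hline
Male & 35 & 24.7 & 64 \\
\hline
Male & 17 & 20.09 & 54 \\
\hline
Male & 85 & 23.86 & 54 \\
\hline
Female & 81 & 24.04 & \#N/A \\
\hline
\end{tabular}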
The student decides to draw a scatter diagram.
  1. With reference to the table, explain which data should be cleaned before any analysis takes place. The student cleans the data for BMI and Pulse Rate in the pre-release material and draws a scatter diagram. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Scatter diagram of Pulse Rate against BMI} \includegraphics[alt={},max width=\textwidth]{82438df0-6550-4ffd-92d8-3c67bec59a6b-06_869_1575_1585_246}
    \end{figure} The student identifies one outlier.
  2. On the copy of the scatter diagram in the Printed Answer Booklet, circle this outlier. The student decides to remove this outlier from the data. They then use the LINEST function in the spreadsheet to obtain the following formula for the line of best fit. \(\mathrm { P } = 0.29 \mathrm { Q } + 64.2\),
    where \(P =\) PulseRate and \(Q = \mathrm { BMI }\). They use this to estimate the Pulse Rate of a person with BMI 23.68.
    They obtain a value of 71 correct to the nearest whole number.
  3. With reference to the scatter diagram, explain whether it is appropriate to use the formula for the line of best fit. It is suggested that all pairs of values where the pulse rate is above 100 should also be cleaned from the data, as they must be incorrect.
  4. Use your knowledge of the pre-release material to explain whether or not all pairs of values with a pulse rate of more than 100 should be cleaned from the data.
OCR MEI AS Paper 2 2024 June Q5
3 marks Easy -1.8
5 The pre-release material contains information for countries in the world concerning real GDP per capita in US\$ and mobile phone subscribers per 100 population. In an investigation into the relationship between these two variables, a student takes a sample of 20 countries in Africa. The student draws a scatter diagram for the data, which is shown in Fig. 5.1. \section*{Fig. 5.1} \section*{Africa 1st sample} \includegraphics[max width=\textwidth, alt={}, center]{ce94c1ea-ffe5-42d0-8f8a-43c47105d6bf-4_433_1043_842_244}
  1. What does Fig. 5.1 suggest about the relationship between real GDP per capita and the number of mobile phone subscribers per 100 population? Another student collects a different sample of 20 countries from Africa, and draws a scatter diagram for the data, which is shown in Fig. 5.2. \section*{Fig. 5.2} \section*{Africa 2nd sample}
    \includegraphics[max width=\textwidth, alt={}]{ce94c1ea-ffe5-42d0-8f8a-43c47105d6bf-4_273_1084_1818_244}
    Mobile phone subscribers per 100 population
  2. What does Fig. 5.2 suggest about the relationship between real GDP per capita and the number of mobile phone subscribers per 100 population?
  3. Explain whether either of the two scatter diagrams is likely to be representative of the true relationship between real GDP per capita and the number of mobile phone subscribers per 100 population, for countries in Africa.
OCR MEI AS Paper 2 2020 November Q10
9 marks Moderate -0.8
10 Fig. 10.1 shows a sample collected from the large data set. BMI is defined as \(\frac { \text { mass of person in kilograms } } { \text { square of person's height in metres } }\). \begin{table}[h]
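\begin{tabular}{|l|c|c|c|c|}
\hline
Sex & Age in years & Mass in kg & Height in cm & BMI \\
\hline
Male & 38 & 77.6 & 164.8 & 28.57 \\
\hline
Male & 17 & 63.5 & 170.3 & 21.89 \\
\hline
Male & 18 & 68.0 & 172.3 & 22.91 \\
\hline
Male & 18 & 57.2 & 172.2 & 19.29 \\
\hline
Male & 19 & 77.6 & 191.2 & 21.23 \\
\hline
Male & 24 & 72.7 & 177.0 & 23.21 \\
\hline
Male & 25 & 92.5 & 177.9 & 29.23 \\
\hline
Male & 26 & 70.4 & 159.4 & 27.71 \\
\hline
Male & 31 & 77.5 & 174.0 & 25.60 \\
\hline
Male & 34 & 132.4 & 182.2 & 39.88 \\
\hline
Male & 38 & 115.0 & 186.4 & 33.10 \\
\hline
Male & 40 & 112.1 & 171.7 & 38.02 \\
\hline
\end{tabular}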
\captionsetup{labelformat=empty} \caption{Fig. 10.1}
\end{table}
  1. Calculate the mass in kg of a person with a BMI of 23.56 and a height of 181.6 cm, giving your answer correct to 1 decimal place. Fig. 10.2 shows a scatter diagram of BMI against age for the data in the table. A line of best fit has also been drawn. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{c08a2212-3104-425e-8aee-7f2d46f23924-09_682_1212_351_248} \captionsetup{labelformat=empty} \caption{Fig. 10.2}
    \end{figure}
  2. Describe the correlation between age and BMI.
  3. Use the line of best fit to estimate the BMI of a 30-year-old man.
  4. Explain why it would not be sensible to use the line of best fit to estimate the BMI of a 60-year-old man.
  5. Use your knowledge of the large data set to suggest two reasons why the sample data in the table may not be representative of the population.
  6. Once the data in the large data set had been cleaned there were 196 values available for selection. Describe how a sample of size 12 could be generated using systematic sampling so that each of the 196 values could be selected in the sample.
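The systematic sampling described in the last part above can be sketched in Python. This is one possible approach, not a mark scheme: it assumes the 196 values are numbered 1 to 196, and the function name `circular_systematic_sample` is illustrative. With an interval of 16 and a start chosen from 1 to 16, the largest position reached would be 16 + 11 × 16 = 192, so values 193 to 196 could never be chosen. The sketch instead picks the start from all 196 positions and wraps around at the end of the list.

```python
import random


def circular_systematic_sample(population_size=196, sample_size=12, seed=None):
    """Return 1-based positions of a systematic sample that wraps around the list."""
    rng = random.Random(seed)
    interval = population_size // sample_size  # 196 // 12 = 16
    start = rng.randint(1, population_size)    # any of the 196 positions can be the start
    # Take every 16th position from the start, returning to position 1 after 196.
    return [((start - 1 + i * interval) % population_size) + 1
            for i in range(sample_size)]


positions = circular_systematic_sample(seed=1)
print(positions)
```

Because the start is equally likely to be any position from 1 to 196, every value has a chance of appearing in the sample.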
OCR MEI AS Paper 2 2021 November Q9
5 marks Moderate -0.5
9 Arun, Beth and Charlie are investigating whether there is any association between death rate per 1000 and physician density per 1000. They each collect a random sample of size 10. Arun's sample is shown in Fig. 9.1. \begin{table}[h]
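\begin{tabular}{|l|c|c|}
\hline
 & death rate per 1000 & physician density per 1000 \\
\hline
Canberra & 7.2 & 3.62 \\
Dhaka & 5.3 & 0.49 \\
Brasilia & 6.8 & 2.23 \\
Yaounde & 9.3 & 0.08 \\
Zagreb & 12.5 & 3.08 \\
Tehran & 5.4 & 1.16 \\
Rome & 10.7 & 4.14 \\
Tripoli & 3.8 & 2.09 \\
Oslo & 7.9 & 4.51 \\
Abuja & 9.7 & 0.35 \\
\hline
\end{tabular}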
\captionsetup{labelformat=empty} \caption{Fig. 9.1}
\end{table}
  1. Explain whether or not Arun collected his data from the pre-release material, or whether it is not possible to say.
    Beth and Charlie collected their samples from the pre-release material. Each of them drew a scatter diagram for their samples. The samples and scatter diagrams are shown in Figs. 9.2 and 9.3.
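    \begin{tabular}{|l|c|c|}
    \hline
    Beth's sample & death rate per 1000 & physician density per 1000 \\
    \hline
    Sudan & 6.7 & 0.41 \\
    Cambodia & 7.4 & 0.17 \\
    Gabon & 6.2 & 0.36 \\
    Seychelles & 7 & 0.95 \\
    Mexico & 5.4 & 2.25 \\
    Kuwait & 2.3 & 2.58 \\
    Haiti & 7.5 & 0.23 \\
    Maldives & 4 & 1.04 \\
    Nauru & 5.9 & 1.24 \\
    Jordan & 3.4 & 2.34 \\
    \hline
    \end{tabular}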
    \includegraphics[max width=\textwidth, alt={}]{2b9ce212-84e2-4817-be94-98e2adff12a3-08_545_1024_340_918}
    \begin{table}[h]
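    \begin{tabular}{|l|c|c|}
    \hline
    Charlie's sample & death rate per 1000 & physician density per 1000 \\
    \hline
    Vanuatu & 4 & 0.17 \\
    Solomon Islands & 3.8 & 0.2 \\
    N. Mariana Islands & 4.9 & 0.36 \\
    Nauru & 5.9 & 1.24 \\
    United Kingdom & 9.4 & 2.81 \\
    Portugal & 10.6 & 3.34 \\
    North Macedonia & 9.6 & 2.87 \\
    Faroe Islands & 8.8 & 2.62 \\
    Bulgaria & 14.5 & 3.99 \\
    St. Kitts and Nevis & 7.2 & 2.52 \\
    \hline
    \end{tabular}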
    \captionsetup{labelformat=empty} \caption{Fig. 9.3}
    \end{table} \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Fig. 9.2} \includegraphics[alt={},max width=\textwidth]{2b9ce212-84e2-4817-be94-98e2adff12a3-08_572_899_1400_1041}
    \end{figure} Arun states that Charlie's sample and Beth's sample cannot both be random for the following reasons.
    Kofi collects a sample of 10 African countries and 10 European countries. The scatter diagram for his results is shown in Fig. 9.4. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{2b9ce212-84e2-4817-be94-98e2adff12a3-09_485_903_902_260} \captionsetup{labelformat=empty} \caption{Fig. 9.4}
    \end{figure}
  2. On the copy of Fig. 9.4 in the Printed Answer Booklet, use your knowledge of the pre-release material to identify the points representing the 10 European countries, justifying your choice.
OCR MEI Paper 2 2023 June Q9
5 marks Easy -1.2
9 The pre-release material contains information concerning the median income of taxpayers in different areas of London. Some of the data for Camden is shown in the table below. The years quoted in this question refer to the end of the financial years used in the pre-release material. For example, the year 2004 in the table refers to the year 2003/04 in the pre-release material.
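\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
Year & 2004 & 2005 & 2006 & 2007 & 2008 & 2009 & 2010 & 2011 \\
\hline
Median income in \(\pounds\) & 21300 & 23200 & 24200 & 25900 & 26900 & \#N/A & 28400 & 29400 \\
\hline
\end{tabular}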
  1. Explain whether these data are a sample or a population of Camden taxpayers.
    A time series for the data is shown below. \begin{figure}[h]
    \captionsetup{labelformat=empty} \caption{Median income of taxpayers in Camden 2004-2011} \includegraphics[alt={},max width=\textwidth]{11788aaf-98fb-4a78-8a40-a40743b1fe15-07_624_1469_950_242}
    \end{figure} The LINEST function on a spreadsheet is used to formulate the following model for the data: \(I = 1115 Y - 2212950\), where \(I =\) median income of taxpayers in \(\pounds\) and \(Y =\) year.
  2. Use this model to find an estimate of the median income of taxpayers in Camden in 2009.
  3. Give two reasons why this estimate is likely to be close to the true value.
    The median income of taxpayers in Croydon in 2009 is also not available.
  4. Use your knowledge of the pre-release material to explain whether the model used in part (b) would give a reasonable estimate of the missing value for Croydon.
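The linear model quoted in this question, \(I = 1115 Y - 2212950\), can be evaluated directly. The sketch below is illustrative only: the function name `camden_model` is an assumption, and the two neighbouring values are taken from the Camden table above. It evaluates the model at \(Y = 2009\) and checks that the result lies between the recorded values for 2008 and 2010, which is what makes the estimate an interpolation.

```python
def camden_model(year: int) -> int:
    """Median income estimate in pounds from the model I = 1115Y - 2212950."""
    return 1115 * year - 2212950


estimate_2009 = camden_model(2009)
print(estimate_2009)  # 27085

# Recorded values either side of the missing year, from the Camden table.
known = {2008: 26900, 2010: 28400}
print(known[2008] <= estimate_2009 <= known[2010])  # True
```

Whether the same coefficients are appropriate for another borough depends on that borough's own data, which this sketch does not model.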