5.09a Dependent/independent variables

164 questions

OCR MEI Further Statistics Major 2019 June Q6
18 marks Moderate -0.8
6
  1. A researcher is investigating the date of the 'start of spring' at different locations around the country.
    A suitable date (measured in days from the start of the year) can be identified by checking, for example, when buds first appear for certain species of trees and plants, but this is time-consuming and expensive. Satellite data, measuring microwave emissions, can alternatively be used to estimate the date that land-based measurements would give. The researcher chooses a random sample of 12 locations, and obtains land-based measurements for the start of spring date at each location, together with relevant satellite measurements. The scatter diagram in Fig. 6.1 shows the results; the land-based measurements are denoted by \(x\) days and the corresponding values derived from satellite measurements by \(y\) days. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{3a89edc4-ac93-4691-ade8-4d4665b55202-06_732_1342_781_333} \captionsetup{labelformat=empty} \caption{Fig. 6.1}
    \end{figure} Fig. 6.2 shows part of a spreadsheet used to analyse the data. Some rows of the spreadsheet have been deliberately omitted. \begin{table}[h]
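    \begin{tabular}{c|c|c|c|c|c|c}
     & A & B & C & D & E & F \\ \hline
    1 &  & \(x\) & \(y\) & \(x^2\) & \(y^2\) & \(xy\) \\
    2 &  & 90 & 102 & 8100 & 10404 & 9180 \\
    3 &  &  &  &  &  &  \\
    10 &  &  &  &  &  &  \\
    11 &  &  &  &  &  &  \\
    12 &  & 94 & 97 & 8836 & 9409 & 9118 \\
    13 &  & 99 & 101 & 9801 & 10201 & 9999 \\
    14 & Sum & 1131 & 1227 & 107783 & 126725 & 116724 \\
    15 &  &  &  &  &  &  \\
    \end{tabular}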
    \captionsetup{labelformat=empty} \caption{Fig. 6.2}
    \end{table}
    1. Calculate the equation of a regression line suitable for estimating the land-based date of the start of spring from satellite measurements.
    2. Using this equation, estimate the land-based date of the start of spring for the following dates from satellite measurements.
      • 95 days
      • 60 days
    3. Comment on the reliability of each of your estimates.
  2. The researcher is also investigating whether there is any correlation between the average temperature during a month in spring and the total rainfall during that month at a particular location. The average temperatures in degrees Celsius and total rainfall in mm for a random selection, over several years, of 10 spring months at this location are as follows.
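      \begin{tabular}{l|cccccccccc}
      Temperature (\({ }^{\circ}\mathrm{C}\)) & 4.2 & 7.1 & 5.6 & 3.5 & 8.6 & 6.5 & 2.7 & 5.9 & 6.7 & 4.1 \\
      Rainfall (mm) & 18 & 26 & 42 & 76 & 15 & 43 & 84 & 53 & 66 & 36 \\
      \end{tabular}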
      The researcher plots the scatter diagram shown in Fig. 6.3 to check which type of test to carry out. \begin{figure}[h]
      \includegraphics[alt={},max width=\textwidth]{3a89edc4-ac93-4691-ade8-4d4665b55202-07_693_880_1174_338} \captionsetup{labelformat=empty} \caption{Fig. 6.3}
      \end{figure}
      1. Explain why the researcher might come to the conclusion that a test based on Pearson's product moment correlation coefficient may be valid.
      2. Find the value of Pearson's product moment correlation coefficient.
      3. Carry out a test at the \(5 \%\) significance level to investigate whether there is any correlation between temperature and rainfall.
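A minimal Python sketch of the arithmetic behind this question, assuming only the column totals in the "Sum" row of Fig. 6.2 (n = 12). Because the aim is to estimate the land-based date \(x\) from a satellite value \(y\), the line fitted is \(x\) on \(y\). The second half computes Pearson's \(r\) for the temperature/rainfall sample; the rainfall row is assumed to split into ten two-digit values. The helper names `land_based_estimate` and `pmcc` are hypothetical.

```python
from math import sqrt

# Totals from row 14 ("Sum") of Fig. 6.2, for n = 12 locations.
n = 12
sum_x, sum_y = 1131, 1227
sum_x2, sum_y2, sum_xy = 107783, 126725, 116724

# Corrected sums of squares and products.
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Regression of x (land-based) on y (satellite): x = c + d*y, d = S_xy / S_yy.
d = s_xy / s_yy
c = sum_x / n - d * sum_y / n


def land_based_estimate(y):
    """Hypothetical helper: estimated land-based day for satellite day y."""
    return c + d * y


for y in (95, 60):
    print(f"satellite day {y}: land-based estimate ~ {land_based_estimate(y):.1f}")

# Temperature/rainfall sample (rainfall split into two-digit values: an assumption).
temperature = [4.2, 7.1, 5.6, 3.5, 8.6, 6.5, 2.7, 5.9, 6.7, 4.1]
rainfall = [18, 26, 42, 76, 15, 43, 84, 53, 66, 36]


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient from raw data."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((u - mx) ** 2 for u in xs)
    syy = sum((v - my) ** 2 for v in ys)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


r = pmcc(temperature, rainfall)
print(f"r = {r:.4f}")
```

The satellite values in Fig. 6.1 cluster in the 90s, so the estimate for 95 days is an interpolation while the one for 60 days is a long extrapolation.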
OCR MEI Further Statistics Major 2022 June Q5
11 marks Moderate -0.3
5 A motorist is investigating the relationship between tyre pressure and temperature. As the temperature increases during a hot day, she records the pressure (measured in bars) of one of her car tyres at specific temperatures of \(20 ^ { \circ } \mathrm { C } , 22 ^ { \circ } \mathrm { C } , \ldots , 36 ^ { \circ } \mathrm { C }\). The results are shown in Table 5.1. \begin{table}[h]
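\begin{tabular}{l|ccccccccc}
Temperature \(\left( t ^ { \circ } \mathrm { C } \right)\) & 20 & 22 & 24 & 26 & 28 & 30 & 32 & 34 & 36 \\
Tyre pressure \((P\) bar\()\) & 2.012 & 2.036 & 2.065 & 2.074 & 2.114 & 2.140 & 2.149 & 2.176 & 2.192 \\
\end{tabular}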
\captionsetup{labelformat=empty} \caption{Table 5.1}
\end{table}
  1. Calculate the equation of the regression line of pressure on temperature. Give your answer in the form \(P = a t + b\), giving the values of \(a\) and \(b\) to \(\mathbf { 4 }\) significant figures.
  2. Table 5.2 shows the residuals for most of the data values. Complete the copy of the table in the Printed Answer Booklet. \begin{table}[h]
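    \begin{tabular}{l|ccccccccc}
    Temperature & 20 & 22 & 24 & 26 & 28 & 30 & 32 & 34 & 36 \\
    Residual tyre pressure & \(-0.003\) & \(-0.002\) & 0.004 & \(-0.010\) &  & 0.011 & \(-0.003\) & 0.001 &  \\
    \end{tabular}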
    \captionsetup{labelformat=empty} \caption{Table 5.2}
    \end{table}
  3. With reference to the values of the residuals, comment on the goodness of fit of the regression line.
  4. Use your answer to part (a) to calculate an estimate of the pressure in the tyre at each of the following temperatures, giving your answers to \(\mathbf { 3 }\) decimal places.
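A minimal Python sketch of parts 1 and 2, assuming the nine readings in Table 5.1. It fits \(P = at + b\) by least squares and lists every residual (observed minus fitted), which fills the two blanks in Table 5.2 (at 28 and 36 degrees).

```python
temps = [20, 22, 24, 26, 28, 30, 32, 34, 36]
pressures = [2.012, 2.036, 2.065, 2.074, 2.114, 2.140, 2.149, 2.176, 2.192]

n = len(temps)
t_bar = sum(temps) / n
p_bar = sum(pressures) / n
s_tt = sum((t - t_bar) ** 2 for t in temps)
s_tp = sum((t - t_bar) * (p - p_bar) for t, p in zip(temps, pressures))

a = s_tp / s_tt          # gradient
b = p_bar - a * t_bar    # intercept, so the line is P = a*t + b
print(f"P = {a:.4g} t + {b:.4g}")

# Residual = observed pressure - pressure predicted by the line.
residuals = [p - (a * t + b) for t, p in zip(temps, pressures)]
for t, e in zip(temps, residuals):
    print(f"{t} deg C: residual {e:+.3f}")

# Least squares residuals always sum to zero, up to floating-point error.
print(f"sum of residuals = {sum(residuals):.1e}")
```

All residuals are within about 0.011 bar of zero and show no obvious pattern in sign, which is the kind of evidence part 3 asks you to comment on.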
OCR MEI Further Statistics Major 2023 June Q2
5 marks Easy -1.2
2 A student is investigating the link between temperature and electricity consumption in the winter months. The student finds the average minimum temperature, \(x ^ { \circ } \mathrm { C }\), from across the country on a day. The student then finds the total electricity consumption for that day, \(y \mathrm { GWh }\). The scatter diagram below shows the values of \(x\) and \(y\) obtained from a random sample of 10 winter days. It also shows the equation of the regression line of \(y\) on \(x\) and the value of \(r ^ { 2 }\), where \(r\) is the product moment correlation coefficient. \includegraphics[max width=\textwidth, alt={}, center]{c692fb20-436f-4bc1-89bd-10fdba41ceba-03_776_1043_609_244}
  1. Use the regression line to estimate the electricity consumption at each of the following average minimum temperatures.
OCR MEI Further Statistics Major 2020 November Q5
13 marks Moderate -0.3
5 A hearing expert is investigating whether web-based hearing tests can be used instead of hearing tests in a hearing laboratory. The expert selects a random sample of 16 people with normal hearing. Each of them is given two hearing tests, one in the laboratory and one web-based. The scores in the laboratory-based test, \(x\), and the web-based test, \(y\), are both measured in the same suitable units.
  1. Half of the participants do the laboratory-based test first and the other half do the web-based test first. Explain why the expert adopts this approach. The scatter diagram in Fig. 5 shows the data that the expert collected. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{8d36bc92-07ac-40c3-9e75-26f2bc9d2fcc-05_785_1360_1009_242} \captionsetup{labelformat=empty} \caption{Fig. 5}
    \end{figure} Summary statistics for these data are as follows. $$\Sigma x = 198.0 \quad \Sigma x ^ { 2 } = 2936.92 \quad \Sigma y = 188.7 \quad \Sigma y ^ { 2 } = 2605.35 \quad \Sigma x y = 2554.87$$
  2. Calculate the equation of the regression line suitable for estimating web-based scores from laboratory-based scores.
  3. Estimate the web-based scores of people whose laboratory-based scores were as follows.
    Stating the approximate coordinates of the outlier, suggest what the expert should do.
WJEC Further Unit 2 2019 June Q6
6 marks Moderate -0.3
6. The University of Arizona surveyed a large number of households. One purpose of the survey was to determine if annual household income could be predicted from size of family home. The graph of Annual household income, \(y\), versus Size of family home, \(x\), is shown below. \includegraphics[max width=\textwidth, alt={}, center]{4ecf99c5-c4b3-41b7-a8df-a7c2ca7fcd6a-5_616_1257_566_365}
  1. State the limitations of using the regression line above with reference to the scatter diagram. The data for size of family homes between 2000 and 3000 square feet are shown in the diagram below. \includegraphics[max width=\textwidth, alt={}, center]{4ecf99c5-c4b3-41b7-a8df-a7c2ca7fcd6a-5_652_1244_1516_360} Summary statistics for these data are as follows. $$\begin{array} { r c c } \sum x = 93160 & \sum y = 3907142 & n = 37 \\ S _ { x x } = 2869673.03 & S _ { y y } = 44312797167 & S _ { x y } = 348512820 \cdot 6 \end{array}$$
  2. Calculate the equation of the least squares regression line to predict Annual household income from Size of family home for these data.
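A minimal Python sketch of the least squares calculation, assuming only the summary statistics given for the 37 homes between 2000 and 3000 square feet. The gradient is \(S_{xy}/S_{xx}\) and the line passes through \((\bar{x}, \bar{y})\).

```python
n = 37
sum_x, sum_y = 93160, 3907142
s_xx = 2869673.03
s_xy = 348512820.6

b = s_xy / s_xx                 # gradient: income per extra square foot
a = sum_y / n - b * sum_x / n   # intercept from y_bar = a + b * x_bar
print(f"y = {a:.0f} + {b:.2f}x")
```

The large negative intercept has no practical meaning on its own; it only reflects that the line was fitted to homes of 2000 to 3000 square feet, far from \(x = 0\).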
WJEC Further Unit 2 2022 June Q7
7 marks Moderate -0.3
7. Data from a large dataset shows the percentage of children enrolled in secondary education and the percentage of the adult population who are literate. The following graphs show data from 30 randomly selected regions from each of the Arab World, Africa and Asia. In each case, the least squares regression line of '\% Literacy' on '\% Enrolled in Secondary Education' is shown. \includegraphics[max width=\textwidth, alt={}, center]{77fd7ad7-f5a3-4947-afc6-e5ef45bef7a8-6_682_1200_584_395} \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{Africa} \includegraphics[alt={},max width=\textwidth]{77fd7ad7-f5a3-4947-afc6-e5ef45bef7a8-6_623_1191_1548_397}
\end{figure} \includegraphics[max width=\textwidth, alt={}, center]{77fd7ad7-f5a3-4947-afc6-e5ef45bef7a8-7_665_1200_331_434}
  1. Calculate the equation of the least squares regression line of '\% Literacy' ( \(y\) ) on '\% Enrolled in Secondary Education' ( \(x\) ) for Asia, given the following summary statistics. $$\begin{array} { l l l } \sum x = 2850.836 & \sum y = 2738.656 & S _ { x x } = 88.42142 \\ S _ { y y } = 204.733 & S _ { x y } = 96.60984 & n = 30 \end{array}$$
  2. The Arab World, Africa and Asia each contain a region where \(70 \%\) are enrolled in secondary education. The three regression lines are used to estimate the corresponding \% Literacy. Which of these estimates is likely to be the most reliable? Clearly explain your reasoning.
WJEC Further Unit 2 2024 June Q4
12 marks Standard +0.8
4. An author poses the following question: Does using cash for transactions affect people's financial behaviour?
She collects data on 'Cash transactions as a \% of all transactions' and 'Household debt as a \(\%\) of net disposable income' from a random sample of 25 countries. The table below shows the data she collected. There are missing values, \(p\) and \(q\), for Malta and Denmark respectively.
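\begin{tabular}{l|c|c||l|c|c}
Country & Cash transactions as a \% of all transactions \(\boldsymbol { x }\) & Household debt as a \% of net disposable income \(\boldsymbol { y }\) & Country & Cash transactions as a \% of all transactions \(\boldsymbol { x }\) & Household debt as a \% of net disposable income \(\boldsymbol { y }\) \\ \hline
Malta & 92 & \(p\) & France & 68 & 120 \\
Mexico & 90 & \(-14\) & Luxembourg & 64 & 177 \\
Greece & 88 & 107 & Belgium & 63 & 113 \\
Spain & 87 & 110 & Finland & 54 & 137 \\
Italy & 86 & 87 & Estonia & 48 & 82 \\
Austria & 85 & 91 & The Netherlands & 45 & 247 \\
Portugal & 81 & 131 & UK & 42 & 147 \\
Slovenia & 80 & 56 & Australia & 37 & 214 \\
Germany & 80 & 95 & USA & 32 & 109 \\
Ireland & 79 & 154 & Sweden & 20 & 187 \\
Slovakia & 78 & 74 & South Korea & 14 & 182 \\
Lithuania & 75 & 46 & Denmark & \(q\) & 261 \\
Latvia & 71 & 43 &  &  &  \\
\end{tabular}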
The summary statistics and scatter diagram below are for the other 23 countries. \begin{figure}[h]
\captionsetup{labelformat=empty} \caption{Household debt versus Cash transactions} \includegraphics[alt={},max width=\textwidth]{1538fa56-5b61-40ec-bb02-cf1ed9da5eb0-13_664_1296_511_379}
\end{figure} $$\begin{gathered} \sum x = 1467 \quad \sum y = 2695 \quad \sum x ^ { 2 } = 105073 \quad S _ { x x } = 11503 \cdot 91304 \quad S _ { y y } = 78669 \cdot 30435 \\ \sum y ^ { 2 } = 394453 \quad \sum x y = 152999 \quad S _ { x y } = - 18895 \cdot 13043 \end{gathered}$$
  1. Using the summary statistics for the 23 countries, calculate and interpret Pearson's product moment correlation coefficient.
  2. Calculate the equation of the least squares regression line of Household debt as a \% of net disposable income \(( y )\) on Cash transactions as a \% of all transactions ( \(x\) ). The regression line of \(x\) on \(y\) is given below. $$x = - 0 \cdot 24 y + 91 \cdot 92$$
  3. By selecting the appropriate regression line in each case, estimate the values of \(p\) and \(q\) in the table.
  4. Comment on the reliability of your answers in part (c).
  5. Interpret the negative value of \(y\) for Mexico.
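A minimal Python sketch of parts 1 to 3, assuming the summary statistics for the 23 countries and the \(x\) on \(y\) line quoted in the question. The rule it illustrates: to estimate \(y\) from a known \(x\) (Malta, \(x = 92\)) use the \(y\) on \(x\) line; to estimate \(x\) from a known \(y\) (Denmark, \(y = 261\)) use the \(x\) on \(y\) line. The function name `x_on_y` is hypothetical.

```python
from math import sqrt

n, sum_x, sum_y = 23, 1467, 2695
s_xx, s_yy, s_xy = 11503.91304, 78669.30435, -18895.13043

r = s_xy / sqrt(s_xx * s_yy)

# y on x: use when x is known and y is to be estimated.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n


def x_on_y(y):
    """x on y line as given in the question: x = -0.24 y + 91.92."""
    return -0.24 * y + 91.92


p_est = a + b * 92      # Malta: x = 92 known, y = p unknown
q_est = x_on_y(261)     # Denmark: y = 261 known, x = q unknown

print(f"r = {r:.3f}")
print(f"y = {a:.1f} + ({b:.3f})x")
print(f"p ~ {p_est:.1f}, q ~ {q_est:.1f}")
```

Both inputs lie just outside the observed ranges of the 23 countries (largest \(x\) is 90, largest \(y\) is 247), so both estimates involve some extrapolation, and \(r\) is only moderately strong.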
Edexcel FS2 AS 2018 June Q1
11 marks Moderate -0.3
  1. The scores achieved on a maths test, \(m\), and the scores achieved on a physics test, \(p\), by 16 students are summarised below.
$$\sum m = 392 \quad \sum p = 254 \quad \sum p ^ { 2 } = 4748 \quad \mathrm {~S} _ { m m } = 1846 \quad \mathrm {~S} _ { m p } = 1115$$
  1. Find the product moment correlation coefficient between \(m\) and \(p\)
  2. Find the equation of the linear regression line of \(p\) on \(m\) Figure 1 shows a plot of the residuals. \begin{figure}[h]
    \includegraphics[alt={},max width=\textwidth]{0fcb4d83-9763-4edd-8006-93f75a44c596-02_808_1222_997_429} \captionsetup{labelformat=empty} \caption{Figure 1}
    \end{figure}
  3. Calculate the residual sum of squares (RSS). For the person who scored 30 marks on the maths test,
  4. find the score on the physics test. The data for the person who scored 20 on the maths test is removed from the data set.
  5. Suggest a reason why. The product moment correlation coefficient between \(m\) and \(p\) is now recalculated for the remaining 15 students.
  6. Without carrying out any further calculations, suggest how you would expect this recalculated value to compare with your answer to part (a).
    Give a reason for your answer.
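A minimal Python sketch of parts 1 to 3, assuming only the summary statistics given for the 16 students. \(S_{pp}\) is first recovered from \(\sum p\) and \(\sum p^2\); the residual sum of squares then follows from \(\mathrm{RSS} = S_{pp} - S_{mp}^2/S_{mm}\).

```python
from math import sqrt

n = 16
sum_m, sum_p, sum_p2 = 392, 254, 4748
s_mm, s_mp = 1846, 1115

s_pp = sum_p2 - sum_p ** 2 / n
r = s_mp / sqrt(s_mm * s_pp)

b = s_mp / s_mm
a = sum_p / n - b * sum_m / n      # regression line p = a + b m

rss = s_pp - s_mp ** 2 / s_mm      # equivalently s_pp * (1 - r**2)

print(f"r = {r:.3f}")
print(f"p = {a:.3f} + {b:.3f}m")
print(f"RSS = {rss:.2f}")
```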
Edexcel FS2 AS 2019 June Q3
11 marks Standard +0.3
  1. Two students, Jim and Dora, collected data on the mean annual rainfall, \(w \mathrm {~cm}\), and the annual yield of leeks, \(l\) tonnes per hectare, for 10 years.
Jim summarised the data as follows $$\mathrm { S } _ { w l } = 42.786 \quad \mathrm {~S} _ { w w } = 9936.9 \quad \sum l ^ { 2 } = 26.2326 \quad \sum l = 16.06$$
  1. Find the product moment correlation coefficient between \(l\) and \(w\) Dora decided to code the data first using \(s = w - 6\) and \(t = l - 20\)
  2. Write down the value of the product moment correlation coefficient between \(s\) and \(t\). Give a justification for your answer. Dora calculates the equation of the regression line of \(t\) on \(s\) to be \(t = 0.00431 s - 18.87\)
  3. Find the equation of the regression line of \(l\) on \(w\) in the form \(l = a + b w\), giving the values of \(a\) and \(b\) to 3 significant figures.
  4. Use your equation to estimate the yield of leeks when \(w\) is 100 cm.
  5. Calculate the residual sum of squares. The graph shows the residual for each value of \(l\) \includegraphics[max width=\textwidth, alt={}, center]{7e46e14a-0f5a-4d02-8f00-a92bc4def6d7-08_716_1594_1594_239}
    1. State whether this graph suggests that the use of a linear regression model is suitable for these data. Give a reason for your answer.
    2. Other than collecting more data, suggest how to improve the fit of the model in part (c) to the data.
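A minimal Python sketch of parts 1 to 4, assuming Jim's summary statistics and Dora's coded line exactly as quoted. Subtracting constants from \(w\) and \(l\) shifts every point by the same amount, so \(r\) and the gradient are unchanged; only the intercept moves. Substituting \(s = w - 6\) and \(t = l - 20\) into \(t = 0.00431s - 18.87\) gives the line of \(l\) on \(w\).

```python
from math import sqrt

n = 10
s_wl, s_ww = 42.786, 9936.9
sum_l, sum_l2 = 16.06, 26.2326

s_ll = sum_l2 - sum_l ** 2 / n
r = s_wl / sqrt(s_ww * s_ll)   # also the value of r between s and t

# Dora's coded line: t = 0.00431 s - 18.87, with s = w - 6 and t = l - 20.
# Substituting: l - 20 = 0.00431 (w - 6) - 18.87
grad_coded, int_coded = 0.00431, -18.87
b = grad_coded                       # gradient unchanged by the shifts
a = int_coded + 20 - grad_coded * 6  # intercept absorbs both shifts

print(f"r = {r:.3f}")
print(f"l = {a:.3g} + {b:.3g}w")
print(f"estimated yield at w = 100: {a + b * 100:.2f} tonnes per hectare")
```

As a cross-check, Jim's gradient \(S_{wl}/S_{ww}\) agrees with Dora's 0.00431 to three significant figures.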
Edexcel FS2 AS 2020 June Q4
14 marks Standard +0.3
  1. Some students are investigating the strength of wire by suspending a weight at the end of the wire. They measure the diameter of the wire, \(d \mathrm {~mm}\), and the weight, \(w\) grams, when the wire fails. Their results are given in the following table.
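\begin{tabular}{l|*{14}{c}|*{4}{c}}
\multicolumn{1}{l|}{} & \multicolumn{14}{c|}{These 14 points are plotted on page 13} & \multicolumn{4}{c}{Not yet plotted} \\ \hline
\(d\) & 0.5 & 0.6 & 0.7 & 0.8 & 0.9 & 1.1 & 1.3 & 1.6 & 2 & 2.4 & 2.8 & 3.3 & 3.5 & 3.9 & \(\mathbf { 4 . 5 }\) & \(\mathbf { 4 . 6 }\) & \(\mathbf { 4 . 8 }\) & \(\mathbf { 5 . 4 }\) \\
\(w\) & 1.2 & 1.7 & 2.3 & 3.0 & 3.8 & 5.6 & 7.7 & 11.6 & 18 & 25.9 & 34.9 & 47.4 & 52.7 & 63.9 & \(\mathbf { 8 1 }\) & \(\mathbf { 8 3 . 6 }\) & \(\mathbf { 8 9 . 9 }\) & \(\mathbf { 1 0 9 . 4 }\) \\
\end{tabular}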
The first 14 points are plotted on the axes on page 13.
  1. On the axes on page 13, complete the scatter diagram for these data.
  2. Use your calculator to write down the equation of the regression line of \(w\) on \(d\).
  3. With reference to the scatter diagram, comment on the appropriateness of using this linear regression model to make predictions for \(w\) for different values of \(d\) between 0.5 and 5.4. The product moment correlation coefficient for these data is \(r = 0.987\) (to 3 significant figures).
  4. Calculate the residual sum of squares (RSS) for this model. Robert, one of the students, suggests that the model could be improved and intends to find the equation of the line of regression of \(w\) on \(u\), where \(u = d ^ { 2 }\) He finds the following statistics $$\mathrm { S } _ { w u } = 5721.625 \quad \mathrm {~S} _ { u u } = 1482.619 \quad \sum u = 157.57$$
  5. By considering the physical nature of the problem, give a reason to support Robert's suggestion.
  6. Find the equation of the regression line of \(w\) on \(u\).
  7. Find the residual sum of squares (RSS) for Robert's model.
  8. State, giving a reason based on these calculations, which of these models better describes these data.
    1. Hence estimate the weight at which a piece of wire with diameter 3 mm will fail. \begin{figure}[h]
      \captionsetup{labelformat=empty} \caption{Question 4 continued} \includegraphics[alt={},max width=\textwidth]{fbd7b196-5372-4956-8d38-92f05c92a5f7-13_2315_1363_301_358}
      \end{figure}
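A minimal Python sketch comparing the two models, assuming the 18 data pairs in the table. It fits \(w\) on \(d\) and \(w\) on \(u = d^2\) by least squares and reports each residual sum of squares; the helper name `fit` is hypothetical.

```python
d = [0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.6, 2, 2.4, 2.8, 3.3, 3.5, 3.9,
     4.5, 4.6, 4.8, 5.4]
w = [1.2, 1.7, 2.3, 3.0, 3.8, 5.6, 7.7, 11.6, 18, 25.9, 34.9, 47.4, 52.7,
     63.9, 81, 83.6, 89.9, 109.4]


def fit(xs, ys):
    """Hypothetical helper: least squares line ys = a + b*xs, returns (a, b, RSS)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b, syy - sxy ** 2 / sxx


u = [x ** 2 for x in d]   # Robert's transformed variable u = d^2

a1, b1, rss_linear = fit(d, w)
a2, b2, rss_squared = fit(u, w)

print(f"w on d: w = {a1:.2f} + {b1:.2f}d, RSS = {rss_linear:.1f}")
print(f"w on u: w = {a2:.2f} + {b2:.3f}u, RSS = {rss_squared:.1f}")
print(f"w on u at d = 3 mm: {a2 + b2 * 3 ** 2:.1f} g")
```

A high \(r\) (0.987) does not by itself rule out curvature; comparing RSS on the same response variable \(w\) is a fairer way to choose between the two models.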
Edexcel FS2 AS 2022 June Q3
10 marks Standard +0.3
  1. Gabriela is investigating a particular type of fish, called bream. She wants to create a model to predict the weight, \(w\) grams, of bream based on their length, \(x \mathrm {~cm}\).
For a sample of 27 bream, some summary statistics are given below. $$\begin{gathered} \bar { x } = 31.07 \quad \bar { w } = 628.59 \quad \sum w ^ { 2 } = 11386134 \\ \mathrm {~S} _ { x w } = 13082.3 \quad \mathrm {~S} _ { x x } = 260.8 \end{gathered}$$
  1. Find the value of the product moment correlation coefficient between \(x\) and \(w\)
  2. Explain whether the answer to part (a) is consistent with a linear model for these data.
  3. Find the equation of the regression line of \(w\) on \(x\) in the form \(w = a + b x\). A residual plot for these data is shown below. \includegraphics[max width=\textwidth, alt={}, center]{128c408d-3e08-4f74-8f19-d33ecd5c882f-06_931_1790_1107_139} One of the bream in the sample has a length of 32 cm.
  4. Find its weight.
  5. With reference to the residual plot, comment on the model for bream with lengths above 33 cm .
Edexcel FS2 AS 2023 June Q3
10 marks Standard +0.3
  1. Pat is investigating the relationship between the height of professional tennis players and the speed of their serve. Data from 9 randomly selected professional male tennis players were collected. The variables recorded were the height of each player, \(h\) metres, and the maximum speed of their serve, \(v \mathrm {~km} / \mathrm { h }\).
Pat summarised these data as follows $$\sum h = 17.63 \quad \sum v = 2174.9 \quad \sum v ^ { 2 } = 526407.8 \quad S _ { h h } = 0.0487 \quad S _ { h v } = 5.1376$$
  1. Calculate the product moment correlation coefficient between \(h\) and \(v\)
  2. Explain whether the answer to part (a) is consistent with a linear model for these data.
  3. Find the equation of the regression line of \(v\) on \(h\) in the form \(v = a + b h\) where \(a\) and \(b\) are to be given to one decimal place. Pat calculated the sum of the residuals for the 9 tennis players as 1.04
  4. Without doing a calculation, explain how you know Pat has made a mistake. Pat made one mistake in the calculation. For the tennis player of height 1.96 m Pat misread the residual as 2.27
  5. Find the maximum speed of serve, in km/h, for the tennis player of height 1.96 m
Edexcel FS2 2019 June Q2
10 marks Standard +0.3
2 A large field of wheat is split into 8 plots of equal area. Each plot is treated with a different amount of fertiliser, \(f\) grams \(/ \mathrm { m } ^ { 2 }\). The yield of wheat, \(w\) tonnes, from each plot is recorded. The results are summarised below. $$\sum f = 28 \quad \sum w = 303 \quad \sum w ^ { 2 } = 13447 \quad \mathrm {~S} _ { f f } = 42 \quad \mathrm {~S} _ { f w } = 269.5$$
  1. Calculate the product moment correlation coefficient between \(f\) and \(w\)
  2. Interpret the value of your product moment correlation coefficient.
  3. Find the equation of the regression line of \(w\) on \(f\) in the form \(w = a + b f\)
  4. Using your equation, estimate the decrease in yield when the amount of fertiliser decreases by 0.5 grams \(/ \mathrm { m } ^ { 2 }\) The residuals of the data recorded are calculated and plotted on the graph below. \includegraphics[max width=\textwidth, alt={}, center]{67df73d4-6ce4-45f7-8a69-aa94292ea814-04_1232_1294_1169_301}
  5. With reference to this graph, comment on the suitability of the model you found in part (c).
  6. Suggest how you might be able to refine your model.
Edexcel FS2 2021 June Q4
10 marks Standard +0.3
  1. A researcher is investigating the relationship between elevation, \(x\) metres, and annual mean temperature, \(t ^ { \circ } \mathrm { C }\).
From a random sample of 20 weather stations in Switzerland, the following results were obtained: $$\mathrm { S } _ { x x } = 8820655 \quad \mathrm {~S} _ { t t } = 444.7 \quad \sum x = 28130 \quad \sum t = 94.62$$ The product moment correlation coefficient for these data is found to be \(-0.959\).
  1. Interpret the value of this correlation coefficient.
  2. Show that the equation of the regression line of \(t\) on \(x\) can be written as $$t = 14.3 - 0.00681 x$$ The random variable \(W\) represents the elevations of the weather stations in kilometres.
  3. Write down the equation of the regression line of \(t\) on \(w\) for these 20 weather stations in the form \(t = a + b w\)
  4. Show that the residual sum of squares (RSS) for the model for \(t\) and \(x\) is 35.7 correct to one decimal place. One of the weather stations in the sample had a recorded elevation of 1100 metres and an annual mean temperature of \(1.4 ^ { \circ } \mathrm { C }\)
    1. Calculate this weather station's contribution to the residual sum of squares. Give your answer as a percentage
    2. Comment on the data for this weather station in light of your answer to part (e)(i).
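A minimal Python sketch of parts 2 to 5, assuming the summary statistics and \(r = -0.959\) as given. \(S_{xt}\) is recovered from \(r = S_{xt}/\sqrt{S_{xx}S_{tt}}\); the RSS is \(S_{tt}(1 - r^2)\); and converting metres to kilometres (\(x = 1000w\)) multiplies the gradient by 1000 while leaving the intercept alone.

```python
from math import sqrt

n = 20
s_xx, s_tt = 8820655, 444.7
sum_x, sum_t = 28130, 94.62
r = -0.959

s_xt = r * sqrt(s_xx * s_tt)       # recover S_xt from r
b = s_xt / s_xx
a = sum_t / n - b * sum_x / n      # t = a + b x   (x in metres)
b_km = 1000 * b                    # t = a + b_km w (w in kilometres)

rss = s_tt * (1 - r ** 2)          # equivalently s_tt - s_xt**2 / s_xx

# Station at 1100 m with annual mean temperature 1.4 deg C.
residual = 1.4 - (a + b * 1100)
share = 100 * residual ** 2 / rss

print(f"t = {a:.1f} + ({b:.5f})x   or   t = {a:.1f} + ({b_km:.2f})w")
print(f"RSS = {rss:.1f}")
print(f"residual = {residual:.2f}, share of RSS = {share:.0f}%")
```

A single station accounting for most of the RSS suggests its reading is unusual (possibly an error or a local effect) and is worth checking.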
Edexcel FS2 2022 June Q1
7 marks Standard +0.3
  1. Kwame is investigating a possible relationship between average March temperature, \(t ^ { \circ } \mathrm { C }\), and tea yield, \(y \mathrm {~kg} /\) hectare, for tea grown in a particular location. He uses 30 years of past data to produce the following summary statistics for a linear regression model, with tea yield as the dependent variable.
$$\begin{aligned} & \text { Residual Sum of Squares } ( \mathrm { RSS } ) = 1666567 \quad \mathrm {~S} _ { t t } = 52.0 \quad \mathrm {~S} _ { y y } = 1774155 \\ & \text { least squares regression line: } \quad \text { gradient } = 45.5 \quad y \text {-intercept } = 2080 \end{aligned}$$
  1. Use the regression model to predict the tea yield for an average March temperature of \(20 ^ { \circ } \mathrm { C }\) He also produces the following residual plot for the data. \includegraphics[max width=\textwidth, alt={}, center]{d139840b-16ec-42ce-8501-f79c263c8017-02_663_880_868_589}
  2. Explain what you understand by the term residual.
  3. Calculate the product moment correlation coefficient between \(t\) and \(y\)
  4. Explain why the linear model may not be a good fit for the data
    1. with reference to your answer to part (c)
    2. with reference to the residual plot. Kwame also collects data on total March rainfall, \(w \mathrm {~mm}\), for each of these 30 years. For a linear regression model of \(w\) on \(t\) the following summary statistic is found. $$\text { Residual Sum of Squares (RSS) = } 86754$$ Kwame concludes that since this model has a smaller RSS, there must be a stronger linear relationship between \(w\) and \(t\) than between \(y\) and \(t\) (where RSS \(= 1666567\) )
  5. State, giving a reason, whether or not you agree with the reasoning that led to Kwame's conclusion.
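A minimal Python sketch of parts 1 and 3 and of the point behind part 5, assuming the summary statistics as given. From \(\mathrm{RSS} = S_{yy}(1 - r^2)\) we get \(r^2 = 1 - \mathrm{RSS}/S_{yy}\), with \(r\) taking the sign of the gradient. The rescaling at the end uses a hypothetical unit change to show that RSS depends on the units of the response while \(r\) does not, so RSS values for different response variables cannot be compared directly.

```python
from math import sqrt

rss = 1666567
s_yy = 1774155
s_tt = 52.0
gradient, intercept = 45.5, 2080

r_squared = 1 - rss / s_yy
r = sqrt(r_squared)      # positive, matching the sign of the gradient

yield_at_20 = intercept + gradient * 20

# Consistency check: S_yy - RSS should be close to gradient^2 * S_tt.
explained = gradient ** 2 * s_tt

# Hypothetical change of units for the response (e.g. kg -> g).
scale = 1000
rss_rescaled = rss * scale ** 2
r_squared_rescaled = 1 - rss_rescaled / (s_yy * scale ** 2)

print(f"predicted yield at 20 deg C: {yield_at_20:.0f} kg/hectare")
print(f"r = {r:.3f}")
print(f"S_yy - RSS = {s_yy - rss}, gradient^2 * S_tt = {explained:.0f}")
print(f"RSS grows by {scale ** 2}x under rescaling, r^2 unchanged")
```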
Edexcel FS2 2024 June Q1
9 marks Standard +0.3
  1. Two students are experimenting with some water in a plastic bottle. The bottle is filled with water and a hole is put in the bottom of the bottle. The students record the time, \(t\) seconds, it takes for the water level to fall to each of 10 given values of the height, \(h \mathrm {~cm}\), above the hole.
Student \(A\) models the data with an equation of the form \(t = a + b \sqrt { h }\) The data is coded using \(v = t - 40\) and \(w = \sqrt { h }\) and the following information is obtained. $$\sum v = 626 \quad \sum v ^ { 2 } = 64678 \quad \sum w = 22.47 \quad \mathrm {~S} _ { w w } = 4.52 \quad \mathrm {~S} _ { v w } = - 338.83$$
  1. Find the equation of the regression line of \(t\) on \(\sqrt { h }\) in the form \(t = a + b \sqrt { h }\) The time it takes the water level to fall to a height of 9 cm above the hole is 47 seconds.
  2. Calculate the residual for this data point. Give your answer to 2 decimal places. Given that the residual sum of squares (RSS) for the model of \(t\) on \(\sqrt { h }\) is the same as the RSS for the model of \(v\) on \(w\),
  3. calculate the RSS for these 10 data points. Student \(B\) models the data with an equation of the form \(t = c + d h\) The regression line of \(t\) on \(h\) is calculated and the residual sum of squares (RSS) is found to be 980 to 3 significant figures.
  4. With reference to part (c) state, giving a reason, whether Student B's model or Student A's model is the more suitable for these data.
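A minimal Python sketch of parts 1 to 3 for Student A's model, assuming the coded summary statistics as given. The line of \(v\) on \(w\) is fitted first and then decoded using \(v = t - 40\) and \(w = \sqrt{h}\); because the coding only shifts \(t\), the RSS in coded form equals the RSS for \(t\) on \(\sqrt{h}\). The function name `predict_t` is hypothetical.

```python
from math import sqrt

n = 10
sum_v, sum_v2 = 626, 64678
sum_w, s_ww, s_vw = 22.47, 4.52, -338.83

b = s_vw / s_ww
a_v = sum_v / n - b * sum_w / n     # coded line: v = a_v + b w
a = a_v + 40                        # decoded: t = a + b sqrt(h)


def predict_t(h):
    """Hypothetical helper: time predicted by Student A's model."""
    return a + b * sqrt(h)


residual_9 = 47 - predict_t(9)      # observed minus predicted at h = 9 cm

s_vv = sum_v2 - sum_v ** 2 / n
rss = s_vv - s_vw ** 2 / s_ww

print(f"t = {a:.2f} + ({b:.2f}) sqrt(h)")
print(f"residual at h = 9: {residual_9:.2f}")
print(f"RSS = {rss:.1f}")
```

Student A's RSS is far below the 980 quoted for Student B's \(t = c + dh\) model, with the same response \(t\) in both cases.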
Edexcel FS2 Specimen Q6
12 marks Standard +0.3
  1. A random sample of 10 female pigs was taken. The number of piglets, \(x\), born to each female pig and their average weight at birth, \(m \mathrm {~kg}\), was recorded. The results were as follows:
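\begin{tabular}{l|cccccccccc}
Number of piglets, \(\boldsymbol { x }\) & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 \\
Average weight at birth, \(\boldsymbol { m } \mathbf { ~ k g }\) & 1.50 & 1.20 & 1.40 & 1.40 & 1.23 & 1.30 & 1.20 & 1.15 & 1.25 & 1.15 \\
\end{tabular}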
(You may use \(\mathrm { S } _ { x x } = 82.5\) and \(\mathrm { S } _ { m m } = 0.12756\) and \(\mathrm { S } _ { x m } = - 2.29\) )
  1. Find the equation of the regression line of \(m\) on \(x\) in the form \(m = a + b x\) as a model for these results.
  2. Show that the residual sum of squares (RSS) is 0.064 to 3 decimal places.
  3. Calculate the residual values.
  4. Write down the outlier.
    1. Comment on the validity of ignoring this outlier.
    2. Ignoring the outlier, produce another model.
    3. Use this model to estimate the average weight at birth if \(x = 15\)
    4. Comment, giving a reason, on the reliability of your estimate.
OCR FS1 AS 2017 December Q5
8 marks Moderate -0.5
5 A shop manager recorded the maximum daytime temperature \(T ^ { \circ } \mathrm { C }\) and the number \(C\) of ice creams sold on 9 summer days. The results are given in the table and illustrated in the scatter diagram.
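\begin{tabular}{c|ccccccccc}
\(T\) & 17 & 21 & 25 & 26 & 27 & 27 & 29 & 30 & 30 \\
\(C\) & 21 & 16 & 20 & 38 & 32 & 37 & 35 & 39 & 42 \\
\end{tabular}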
\includegraphics[max width=\textwidth, alt={}]{64d7ed6d-fadd-4c59-afb0-97d1788ba369-3_661_1189_1320_431}
$$n = 9 , \Sigma t = 232 , \Sigma c = 280 , \Sigma t ^ { 2 } = 6130 , \Sigma c ^ { 2 } = 9444 , \Sigma t c = 7489$$
  1. State, with a reason, whether one of the variables \(C\) or \(T\) is likely to be dependent upon the other.
  2. Calculate Pearson's product-moment correlation coefficient \(r\) for the data.
  3. State with a reason what the value of \(r\) would have been if the temperature had been measured in \({ } ^ { \circ } \mathrm { F }\) rather than \({ } ^ { \circ } \mathrm { C }\).
  4. Calculate the equation of the least squares regression line of \(c\) on \(t\).
  5. The regression line is drawn on the copy of the scatter diagram in the Printed Answer Booklet. Use this diagram to explain what is meant by "least squares".
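A minimal Python sketch of what "least squares" means here, assuming the nine \((T, C)\) pairs in the table. The fitted line of \(c\) on \(t\) is the one that makes the sum of the squared vertical distances from the points to the line as small as possible, so nudging either coefficient always increases that sum. The function name `sse` is hypothetical.

```python
T = [17, 21, 25, 26, 27, 27, 29, 30, 30]
C = [21, 16, 20, 38, 32, 37, 35, 39, 42]


def sse(a, b):
    """Hypothetical helper: sum of squared vertical deviations from c = a + b t."""
    return sum((c - (a + b * t)) ** 2 for t, c in zip(T, C))


n = len(T)
mt, mc = sum(T) / n, sum(C) / n
s_tt = sum((t - mt) ** 2 for t in T)
s_tc = sum((t - mt) * (c - mc) for t, c in zip(T, C))
b = s_tc / s_tt
a = mc - b * mt
print(f"c = {a:.2f} + {b:.3f}t")

# Any other line (here: small changes to a and/or b) gives a larger sum.
best = sse(a, b)
for da, db in [(1, 0), (-1, 0), (0, 0.1), (0, -0.1), (0.5, -0.05)]:
    assert sse(a + da, b + db) > best
print(f"minimum sum of squared residuals = {best:.1f}")
```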
OCR Further Statistics 2018 March Q9
8 marks Challenging +1.2
9 The values of a set of bivariate data \(\left( x _ { i } , y _ { i } \right)\) can be summarised by $$n = 50 , \sum x = 1270 , \sum y = 5173 , \sum x ^ { 2 } = 42767 , \sum y ^ { 2 } = 701301 , \sum x y = 173161 .$$ Ten independent observations of \(Y\) are obtained, all corresponding to \(x = 20\). It may be assumed that the variance of \(Y\) is 1.9 , independently of the value of \(x\). Find a \(95 \%\) confidence interval for the mean \(\bar { Y }\) of the 10 observations of \(Y\).
OCR Further Statistics 2018 September Q1
4 marks Moderate -0.8
1 An experiment involves releasing a coin on a sloping plane so that it slides down the slope and then slides along a horizontal plane at the bottom of the slope before coming to rest. The angle \(\theta ^ { \circ }\) of the sloping plane is varied, and for each value of \(\theta\), the distance \(d \mathrm {~cm}\) the coin slides on the horizontal plane is recorded. A scatter diagram to illustrate the results of the experiment is shown below, together with the least squares regression line of \(d\) on \(\theta\). \includegraphics[max width=\textwidth, alt={}, center]{28c6a0d9-09a6-4743-af0e-fe2e43e256c9-2_639_972_561_548}
  1. State which two of the following correctly describe the variable \(\theta\).
    \begin{tabular}{ll}
    Controlled variable & Correlation coefficient \\
    Dependent variable & Independent variable \\
    Response variable & Regression coefficient \\
    \end{tabular}
    The least squares regression line of \(d\) on \(\theta\) has equation \(d = 1.96 + 0.11 \theta\).
  2. Use the diagram in the Printed Answer Booklet to explain the term "least squares".
  3. State what difference, if any, it would make to the equation of the regression line if \(d\) were measured in inches rather than centimetres. ( 1 inch \(\approx 2.54 \mathrm {~cm}\) ).
OCR Further Statistics 2018 December Q5
10 marks Moderate -0.3
5 The birth rate, \(x\) per thousand members of the population, and the life expectancy at birth, \(y\) years, in 14 randomly selected African countries are given in the table.
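\begin{tabular}{l|c|c||l|c|c}
Country & \(x\) & \(y\) & Country & \(x\) & \(y\) \\ \hline
Benin & 4.8 & 59.2 & Mozambique & 5.4 & 54.63 \\
Cameroon & 4.7 & 54.87 & Nigeria & 5.7 & 52.29 \\
Congo & 4.9 & 61.42 & Senegal & 5.1 & 65.81 \\
Gambia & 5.7 & 59.83 & Somalia & 6.5 & 54.88 \\
Liberia & 4.7 & 60.25 & Sudan & 4.4 & 63.08 \\
Malawi & 5.1 & 60.97 & Uganda & 5.8 & 57.25 \\
Mauritania & 4.6 & 62.77 & Zambia & 5.4 & 58.75 \\
\end{tabular}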
\(n = 14 , \sum x = 72.8 , \sum y = 826 , \sum x ^ { 2 } = 392.96 , \sum y ^ { 2 } = 48924.54 , \sum x y = 4279.16\)
  1. Calculate Pearson's product-moment correlation coefficient \(r\) for the data.
  2. State what would be the effect on the value of \(r\) if the birth rate were given per hundred and not per thousand.
  3. Explain what the sign of \(r\) tells you about the relationship between life expectancy and birth rate for these countries.
  4. Test at the \(5 \%\) significance level whether there is correlation between birth rate and life expectancy at birth in African countries.
  5. A researcher wants to estimate the life expectancy at birth in Zimbabwe, where the birth rate is 3.9 per thousand. Explain whether a reliable estimate could be obtained using the regression line of \(y\) on \(x\) for the given data.
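A minimal Python sketch of parts 1 and 4, assuming the summary statistics for the 14 countries. For the two-tailed test of \(H_0: \rho = 0\) at the 5% level, the critical value of \(|r|\) is obtained from \(r_{\text{crit}} = t/\sqrt{t^2 + n - 2}\), using the tabulated value \(t_{0.975,\,12} \approx 2.179\) (hard-coded here as an assumption, in place of a statistics library).

```python
from math import sqrt

n = 14
sum_x, sum_y = 72.8, 826
sum_x2, sum_y2, sum_xy = 392.96, 48924.54, 4279.16

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / sqrt(s_xx * s_yy)

# Critical |r| for a two-tailed 5% test with n - 2 = 12 degrees of freedom.
t_crit = 2.179   # t_{0.975, 12} from standard tables (assumed value)
r_crit = t_crit / sqrt(t_crit ** 2 + n - 2)
reject_h0 = abs(r) > r_crit

print(f"r = {r:.4f}, critical value = {r_crit:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The sample birth rates run from 4.4 to 6.5 per thousand, so a birth rate of 3.9 would also require extrapolation beyond the data.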
Edexcel S1 2017 October Q5
13 marks Moderate -0.8
  1. A company wants to pay its employees according to their performance at work. Last year's performance score \(x\) and annual salary \(y\), in thousands of dollars, were recorded for a random sample of 10 employees of the company.
The performance scores were $$\begin{array} { l l l l l l l l l l } 15 & 24 & 32 & 39 & 41 & 18 & 16 & 22 & 34 & 42 \end{array}$$ (You may use \(\sum x ^ { 2 } = 9011\) )
  1. Find the mean and the variance of these performance scores. The corresponding \(y\) values for these 10 employees are summarised by $$\sum y = 306.1 \quad \text { and } \quad \mathrm { S } _ { y y } = 546.3$$
  2. Find the mean and the variance of these \(y\) values. The regression line of \(y\) on \(x\) based on this sample is $$y = 12.0 + 0.659 x$$
  3. Find the product moment correlation coefficient for these data.
  4. State, giving a reason, whether or not the value of the product moment correlation coefficient supports the use of a regression line to model the relationship between performance score and annual salary. The company decides to use this regression model to determine future salaries.
  5. Find the proposed annual salary, in dollars, for an employee who has a performance score of 35
Edexcel S1 2021 October Q2
12 marks Moderate -0.5
2. A large company is analysing how much money it spends on paper in its offices each year. The number of employees in the office, \(x\), and the amount spent on paper in a year, \(p\) (\$ hundreds), in each of 12 randomly selected offices were recorded. The results are summarised in the following statistics. $$\sum x = 93 \quad \mathrm {~S} _ { x x } = 148.25 \quad \sum p = 273 \quad \sum p ^ { 2 } = 6602.72 \quad \sum x p = 2347$$
  1. Show that \(\mathrm { S } _ { x p } = 231.25\)
  2. Find the product moment correlation coefficient for these data.
  3. Find the equation of the regression line of \(p\) on \(x\) in the form \(p = a + b x\)
  4. Give an interpretation of the gradient of your regression line. The director of the company wants to reduce the amount spent on paper each year. He wants each office to aim for a model of the form \(p = \frac { 4 } { 5 } a + \frac { 1 } { 2 } b x\), where \(a\) and \(b\) are the values found in part (c). Using the data for the 93 employees from the 12 offices,
  5. estimate the percentage saving in the amount spent on paper each year by the company using the director's model.
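A minimal Python sketch of parts 1, 3 and 5, assuming the summary statistics as given and reading part 5 as a comparison of total annual spend across the 12 offices. Summing \(a + bx\) over the offices gives \(12a + b\sum x\); for the fitted line this equals \(\sum p\) exactly, because the line passes through \((\bar{x}, \bar{p})\).

```python
n = 12
sum_x, s_xx = 93, 148.25
sum_p, sum_xp = 273, 2347

s_xp = sum_xp - sum_x * sum_p / n      # part 1: should be 231.25
b = s_xp / s_xx
a = sum_p / n - b * sum_x / n          # p = a + b x

# Total over the 12 offices under each model (sum_x = 93 employees in all).
current_total = n * a + b * sum_x
director_total = n * (0.8 * a) + 0.5 * b * sum_x
saving_pct = 100 * (current_total - director_total) / current_total

print(f"S_xp = {s_xp}")
print(f"p = {a:.2f} + {b:.3f}x")
print(f"estimated saving = {saving_pct:.1f}%")
```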
Edexcel S1 2003 June Q7
16 marks Moderate -0.8
  1. Eight students took tests in mathematics and physics. The marks for each student are given in the table below where \(m\) represents the mathematics mark and \(p\) the physics mark.
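\begin{tabular}{ll|cccccccc}
 &  & \multicolumn{8}{c}{Student} \\
 &  & \(A\) & \(B\) & \(C\) & \(D\) & \(E\) & \(F\) & \(G\) & \(H\) \\ \hline
\multirow{2}{*}{Mark} & \(m\) & 9 & 14 & 13 & 10 & 7 & 8 & 20 & 17 \\
 & \(p\) & 11 & 23 & 21 & 15 & 19 & 10 & 31 & 26 \\
\end{tabular}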
A science teacher believes that students' marks in physics depend upon their mathematical ability. The teacher decides to investigate this relationship using the test marks.
  1. Write down which is the explanatory variable in this investigation.
  2. Draw a scatter diagram to illustrate these data.
  3. Showing your working, find the equation of the regression line of \(p\) on \(m\).
  4. Draw the regression line on your scatter diagram. A ninth student was absent for the physics test, but she sat the mathematics test and scored 15.
  5. Using this model, estimate the mark she would have scored in the physics test.
AQA S1 2005 June Q4
12 marks Moderate -0.8
4 The time taken for a fax machine to scan an A4 sheet of paper is dependent, in part, on the number of lines of print on the sheet. The table below shows, for each of a random sample of 8 sheets of A4 paper, the number, \(x\), of lines of print and the scanning time, \(y\) seconds, taken by the fax machine.
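\begin{tabular}{c|cccccccc}
Sheet & \(\mathbf { 1 }\) & \(\mathbf { 2 }\) & \(\mathbf { 3 }\) & \(\mathbf { 4 }\) & \(\mathbf { 5 }\) & \(\mathbf { 6 }\) & \(\mathbf { 7 }\) & \(\mathbf { 8 }\) \\ \hline
\(\boldsymbol { x }\) & 10 & 16 & 23 & 27 & 31 & 35 & 38 & 44 \\
\(\boldsymbol { y }\) & 2.4 & 3.5 & 3.2 & 4.1 & 4.1 & 5.6 & 4.6 & 5.3 \\
\end{tabular}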
  1. Calculate the equation of the least squares regression line of \(y\) on \(x\).
  2. The following table lists some of the residuals for the regression line.
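    \begin{tabular}{c|cccccccc}
    Sheet & \(\mathbf { 1 }\) & \(\mathbf { 2 }\) & \(\mathbf { 3 }\) & \(\mathbf { 4 }\) & \(\mathbf { 5 }\) & \(\mathbf { 6 }\) & \(\mathbf { 7 }\) & \(\mathbf { 8 }\) \\ \hline
    Residual & \(-0.174\) & 0.418 &  & 0.085 & \(-0.254\) & 0.906 &  & \(-0.157\) \\
    \end{tabular}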
    1. Calculate the values of the residuals for sheets 3 and 7.
    2. Hence explain what can be deduced about the regression line.
  3. The time, \(z\) seconds, to transmit an A4 page after scanning is given by: $$z = 0.80 + 0.05 x$$ Estimate the total time to scan and transmit an A4 page containing:
    1. 15 lines of print;
    2. 75 lines of print. In each case comment on the likely reliability of your estimate.
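A minimal Python sketch of parts 1 to 3, assuming the eight \((x, y)\) pairs in the table. It fits \(y\) on \(x\), computes every residual (including the missing ones for sheets 3 and 7), and adds the given transmission model \(z = 0.80 + 0.05x\) to the fitted scanning time. The function name `total_time` is hypothetical.

```python
x = [10, 16, 23, 27, 31, 35, 38, 44]
y = [2.4, 3.5, 3.2, 4.1, 4.1, 5.6, 4.6, 5.3]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
s_xx = sum((xi - mx) ** 2 for xi in x)
s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = my - b * mx
print(f"y = {a:.3f} + {b:.4f}x")

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
for sheet, e in enumerate(residuals, start=1):
    print(f"sheet {sheet}: residual {e:+.3f}")


def total_time(lines):
    """Hypothetical helper: fitted scan time plus given transmit time, in seconds."""
    scan = a + b * lines
    transmit = 0.80 + 0.05 * lines
    return scan + transmit


for lines in (15, 75):
    print(f"{lines} lines: total ~ {total_time(lines):.2f} s")
```

The sample covers 10 to 44 lines of print, so the 15-line estimate is an interpolation, while the 75-line estimate extrapolates well beyond the data.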