OCR FS1 AS (Further Statistics 1 AS) 2021 June

Question 1
1 Five observations of bivariate data \((x, y)\) are given in the table.

| \(x\) | 7 | 8 | 12 | 6 | 4 |
| \(y\) | 20 | 16 | 7 | 17 | 23 |
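
As a check on the arithmetic for this data set, Pearson's coefficient can be computed directly from the five pairs. The sketch below is a plain-Python illustration and not part of the paper; the function name `pearson_r` is chosen here for illustration only. It also recomputes the coefficient with \(x\) replaced by \(3x + 4\), so the two values can be compared.

```python
from math import sqrt

# Data from the table above.
x = [7, 8, 12, 6, 4]
y = [20, 16, 7, 17, 23]


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient via Sxx, Syy, Sxy."""
    n = len(xs)
    sxx = sum(v * v for v in xs) - sum(xs) ** 2 / n
    syy = sum(v * v for v in ys) - sum(ys) ** 2 / n
    sxy = sum(u * v for u, v in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / sqrt(sxx * syy)


r = pearson_r(x, y)

# Linear coding of x: a = 3x + 4.
a = [3 * v + 4 for v in x]
r_coded = pearson_r(a, y)

print(round(r, 3))        # coefficient for (x, y)
print(round(r_coded, 3))  # coefficient for (a, y)
```

The two printed values agree, which is consistent with the coefficient being unaffected by a linear transformation with positive scale factor.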
  (a) Find the value of Pearson's product-moment correlation coefficient.
  (b) State what your answer to part (a) tells you about a scatter diagram representing the data.
  (c) A new variable \(a\) is defined by \(a = 3x + 4\). Dee says "The value of Pearson's product-moment correlation coefficient between \(a\) and \(y\) will not be the same as the answer to part (a)." State with a reason whether you agree with Dee.

Question 2
2 An investor obtains data about the profits of 8 randomly chosen investment accounts over two one-year periods. The profit in the first year for each account is \(p \%\) and the profit in the second year for each account is \(q \%\). The results are shown in the table and in the scatter diagram.
    | Account | A | B | C | D | E | F | G | H |
    | \(p\) | 1.6 | 2.1 | 2.4 | 2.7 | 2.8 | 3.3 | 5.2 | 8.4 |
    | \(q\) | 1.6 | 2.3 | 2.2 | 2.2 | 3.1 | 2.9 | 7.6 | 4.8 |
    \(n = 8 \quad \Sigma p = 28.5 \quad \Sigma q = 26.7 \quad \Sigma p ^ { 2 } = 136.35 \quad \Sigma q ^ { 2 } = 116.35 \quad \Sigma p q = 116.70\)
    \includegraphics[max width=\textwidth, alt={}, center]{4c7546b9-03ee-47a1-915f-41e2b4ca19c0-03_762_1248_906_260}
  (a) State which, if either, of the variables \(p\) and \(q\) is independent.
  (b) Calculate the equation of the regression line of \(q\) on \(p\).
  (c) (i) Use the regression line to estimate the value of \(q\) for an investment account for which \(p = 2.5\).
      (ii) Give two reasons why this estimate could be considered reliable.
  (d) Comment on the reliability of using the regression line to predict the value of \(q\) when \(p = 7.0\).
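
The regression line of \(q\) on \(p\) can be obtained from the summary statistics given with the table. The following is a minimal sketch in plain Python, intended only as a check on the arithmetic; the variable names (`s_pp`, `s_pq`, `gradient`, `intercept`) are illustrative and not taken from the paper.

```python
# Summary statistics quoted in the question.
n = 8
sum_p, sum_q = 28.5, 26.7
sum_p2, sum_pq = 136.35, 116.70

# Corrected sums of squares and products.
s_pp = sum_p2 - sum_p ** 2 / n
s_pq = sum_pq - sum_p * sum_q / n

# Least-squares line q = intercept + gradient * p.
gradient = s_pq / s_pp
intercept = sum_q / n - gradient * sum_p / n

# Estimate of q at p = 2.5, which lies inside the observed range of p.
q_hat = intercept + gradient * 2.5

print(f"q = {intercept:.3f} + {gradient:.3f} p")
print(f"estimate at p = 2.5: {q_hat:.2f}")
```

Working from the summary statistics rather than the raw table mirrors how the question presents the data; the same line would result from the eight raw pairs.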
Question 3 38 marks
3 At a cinema there are three film sessions each Saturday, "early", "middle" and "late". The numbers of the audience, in different age groups, at the three showings on a randomly chosen Saturday are given in Table 1.
| Question | Solution | Marks | AOs | Guidance |
| 1(a) | -0.954 BC | B2 [2] | 1.1 1.1 | SC: If B0, give B1 if two of 7.04, 29.0[4], -13.6[4] (or 35.2, 145[.2], -68.2) seen |
| 1(b) | Points lie close to a straight line; line has negative gradient | B1 B1 [2] | 2.2b 1.1 | Must refer to line, not just "negative correlation" |
| 1(c) | No, it will be the same, as \(x \rightarrow a\) is a linear transformation | B1 [1] | 2.2a | OE. Either "same" with correct reason, or "disagree" with correct reason. Allow any clear valid technical term |
| 2(a) | Neither | B1 [1] | 1.2 | |
| 2(b) | \(q = 1.13 + 0.620p\) | B1 B1 B1 [3] | 1.1 1.1 1.1 | 0.62(0) correct; both numbers correct; fully correct answer including letters |
| 2(c)(i) | 2.68 | B1ft [1] | 1.1 | awrt 2.68, ft on their (b) if letters correct |
| 2(c)(ii) | 2.5 is within data range, and points (here) are close to line/well correlated | B1 B1 [2] | 2.2b 2.2b | At least one reason, allow "no because points not close to line"; full argument, two reasons needed |
| 2(d) | Not much data here/points scattered/possible outliers, so not very reliable | M1 A1 [2] | 2.3 1.1 | Reason for not very reliable (not "extrapolation"); full argument and conclusion, not too assertive (not wholly unreliable!) |
| 3(a) | Expected frequency for Middle/25 to 60 is 4.4, which is < 5, so must combine cells | B1*ft depB1 [2] | 2.4 3.5b | Correctly obtain this \(F_E\), ft on addition errors; "< 5" explicit and correct deduction |
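| 3(b) | Expected frequencies (Early, Middle, Late): first row 29.4, 23.1, 31.5; second row 26.6, 20.9, 28.5. Contributions to \(X^2\) (Early, Middle, Late): first row 0.9918, 0.4160, 2.2937; second row 1.0962, 0.4598, 2.5351 | B1 | 1.1 | Both, allow 28.4 for 28.5; awrt 2.29, but allow 2.3; in range [2.53, 2.54] |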
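
The contributions listed for 3(b) lead directly to the test statistic and decision in 3(c). The sketch below, in plain Python, simply adds them up and compares the total with the critical value quoted in the mark scheme; no statistics library is assumed, and the names `contributions`, `critical_value` and `largest_by_row` are illustrative.

```python
sessions = ["Early", "Middle", "Late"]

# Contributions (O - E)^2 / E from 3(b), one list per row of the combined table.
contributions = [
    [0.9918, 0.4160, 2.2937],
    [1.0962, 0.4598, 2.5351],
]

# Test statistic: sum of all contributions.
test_statistic = sum(sum(row) for row in contributions)

# Degrees of freedom for an r x c contingency table.
rows, cols = len(contributions), len(sessions)
degrees_of_freedom = (rows - 1) * (cols - 1)

# Critical value copied from the mark scheme, not computed here.
critical_value = 5.991
reject_h0 = test_statistic > critical_value

# Session giving the largest contribution in each row.
largest_by_row = [sessions[row.index(max(row))] for row in contributions]

print(round(test_statistic, 3), degrees_of_freedom, reject_h0)
print(largest_by_row)
```

The last line shows which session dominates the statistic in each row, which is the observation that 3(d) asks candidates to interpret.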
| Question | Solution | Marks | AOs | Guidance |
| 3(c) | \(\mathrm{H}_0\): no association between session and age group. \(\mathrm{H}_1\): some association | B1 | 1.1 | Both. Allow "independent" etc |
| | \(\Sigma X^2 = 7.793\) | B1 | 1.1 | Correct value of \(X^2\), awrt 7.79 (allow even if wrong in (b)) |
| | \(\nu = 2\), \(\chi^2(2)_{\text{crit}} = 5.991\) | B1 | 1.1 | Correct CV and comparison |
| | Reject \(\mathrm{H}_0\). | M1ft | 1.1 | Correct first conclusion, FT on their TS only |
| | Significant evidence of association between session attended and age group. | A1ft [5] | 2.2b | Contextualised, not too assertive |
| 3(d) | The two biggest contributions to \(\chi^2\) are both for the late session ... | M1ft | 1.1 | Refer to biggest contribution(s), FT on their answers to (b), needs "reject \(\mathrm{H}_0\)" |
| | ... when the proportion of younger people is higher, and of older people is lower, than the null hypothesis would suggest. | A1ft [2] | 2.4 | Full answer, referring to at least one cell (ignore comments on next highest cells) |
| 4 | \(\frac{{}^{2m}C_2 \times m}{{}^{3m}C_3}\) | M1 | 3.1b | Use \({}^{2m}C_2\) and \(m\) |
| | | M1 | 3.1b | Divide by \({}^{3m}C_3\) |
| | \(= \frac{2m(2m-1)}{2} \times m \div \frac{3m(3m-1)(3m-2)}{6} = \frac{2m(2m-1)}{(3m-1)(3m-2)}\) | A1 | 2.1 | Correct expression in terms of \(m\) (allow with \(m\) not cancelled yet) |
| | \(\frac{2m(2m-1)}{(3m-1)(3m-2)} = \frac{28}{55}\) | M1 | 3.1a | Equate to \(\frac{28}{55}\) and simplify to three-term quadratic |
| | \(\Rightarrow 16m^2 - 71m + 28 = 0\) | A1 | 2.1 | Correct simplified quadratic, or (quadratic) \(\times m\), \(= 0\), aef |
| | \(m = 4\) BC | M1 | 1.1 | Solve to get both 4 and \(\frac{7}{16}\) |
| | Reject \(m = \frac{7}{16}\) as \(m\) is an integer | A1 [7] | 3.2a | Explicitly reject \(m = \frac{7}{16}\) |
| OR: | \(\frac{2m(2m-1) \times m \times 3!}{3m(3m-1)(3m-2) \times 2}\) then as above | | | Multiplication method can get full marks, but if no 3 or 3!, max M1M0A0 M1A0M0A0 |
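
The algebra in the solution to Question 4 can be checked with exact arithmetic. The sketch below uses only the Python standard library; the function name `expression` is illustrative, and it evaluates the mark scheme's expression \(\frac{{}^{2m}C_2 \times m}{{}^{3m}C_3}\) without assuming anything further about the wording of the question.

```python
from fractions import Fraction
from math import comb, isqrt


def expression(m):
    """Exact value of C(2m, 2) * m / C(3m, 3) for a positive integer m."""
    return Fraction(comb(2 * m, 2) * m, comb(3 * m, 3))


# Solve 16m^2 - 71m + 28 = 0 exactly with the quadratic formula.
a, b, c = 16, -71, 28
discriminant = b * b - 4 * a * c
root = isqrt(discriminant)
assert root * root == discriminant  # the discriminant is a perfect square

roots = [Fraction(-b + root, 2 * a), Fraction(-b - root, 2 * a)]

# Only integer values of m are admissible.
integer_roots = [int(r) for r in roots if r.denominator == 1]

print(roots)                 # both solutions of the quadratic
print(integer_roots)         # the admissible value(s) of m
print(expression(integer_roots[0]))
```

Using `Fraction` keeps every step exact, so the rejected root and the final comparison with \(\frac{28}{55}\) involve no rounding.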