Spreadsheet-based chi-squared test

A question is of this type if and only if it presents chi-squared test data in a spreadsheet format, with some values deliberately omitted for the student to calculate.

8 questions · Standard +0.2

Edexcel S3 2020 October Q2
9 marks Moderate -0.3
2. A university awards its graduates a degree in one of three categories, Distinction, Merit or Pass. Table 1 shows information about a random sample of 200 graduates from three departments, Arts, Humanities and Sciences. \begin{table}[h]
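\begin{tabular}{|l|c|c|c|c|}
\cline{2-5}
\multicolumn{1}{c|}{} & Arts & Humanities & Sciences & Total \\ \hline
Distinction & 22 & 32 & 38 & 92 \\ \hline
Merit & 15 & 30 & 13 & 58 \\ \hline
Pass & 18 & 15 & 17 & 50 \\ \hline
Total & 55 & 77 & 68 & \\ \hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 1}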
\end{table} Xiu wants to carry out a test of independence between the category of degree and the department. Table 2 shows some of the values of \(\frac { ( O - E ) ^ { 2 } } { E }\) for this test. \begin{table}[h]
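\begin{tabular}{|l|c|c|c|c|}
\cline{2-5}
\multicolumn{1}{c|}{} & Arts & Humanities & Sciences & Total \\ \hline
Distinction & 0.43 & 0.33 & 1.44 & 2.20 \\ \hline
Merit & 0.06 & 2.63 & 2.29 & 4.98 \\ \hline
Pass & & & & \\ \hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Table 2}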
\end{table}
  1. Complete Table 2
  2. Hence, complete Xiu’s hypothesis test using a \(5 \%\) level of significance. You should state the hypotheses, the degrees of freedom and the critical value used for this test.
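
As a rough sketch of how the Pass row of Table 2 could be filled in, the snippet below computes each expected frequency as row total × column total ÷ grand total and then the contribution \(\frac{(O - E)^2}{E}\). The variable and function names are illustrative and do not come from the paper; the grand total is obtained by summing Table 1 and should match the stated sample of 200.

```python
# Observed frequencies from Table 1 (rows: degree category, columns: department).
observed = {
    "Distinction": [22, 32, 38],
    "Merit": [15, 30, 13],
    "Pass": [18, 15, 17],
}

row_totals = {name: sum(row) for name, row in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())  # should equal the sample size of 200


def contributions(row_name):
    """Return the (O - E)^2 / E values for one row of the table."""
    result = []
    for o, col_total in zip(observed[row_name], col_totals):
        e = row_totals[row_name] * col_total / grand_total
        result.append((o - e) ** 2 / e)
    return result


pass_row = contributions("Pass")
# Summing the contributions over all nine cells gives the test statistic.
test_statistic = sum(sum(contributions(name)) for name in observed)
# A 3 x 3 contingency table has (3 - 1) * (3 - 1) degrees of freedom.
degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)

print([round(c, 2) for c in pass_row], round(test_statistic, 2), degrees_of_freedom)
```

The resulting statistic would then be compared with the 5% critical value for 4 degrees of freedom from statistical tables (9.488).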
OCR MEI Further Statistics A AS 2018 June Q5
13 marks Standard +0.3
5 A random sample of workers for a large company were asked whether they are smokers, ex-smokers or have never smoked. The responses were classified by the type of worker: Managerial, Production line or Administrative. Fig. 5 is a screenshot showing part of the spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted. \begin{table}[h]
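\begin{tabular}{|c|l|c|c|c|c|c|}
\hline
 & A & B & C & D & E & F \\ \hline
1 & Observed frequencies & & & & & \\ \hline
2 & & Smoker & Ex-smoker & Never smoked & Totals & \\ \hline
3 & Managerial & 2 & 10 & 5 & 17 & \\ \hline
4 & Production line & 18 & 15 & 21 & 54 & \\ \hline
5 & Administrative & 13 & 6 & 14 & 33 & \\ \hline
6 & Totals & 33 & 31 & 40 & 104 & \\ \hline
7 & & & & & & \\ \hline
8 & Expected frequencies & & & & & \\ \hline
9 & & 5.3942 & 5.0673 & 6.5385 & & \\ \hline
10 & & 17.1346 & & 20.7692 & & \\ \hline
11 & & 10.4712 & 9.8365 & 12.6923 & & \\ \hline
12 & & & & & & \\ \hline
13 & Contributions to the test statistic & & & & & \\ \hline
14 & & 2.1358 & 4.8017 & 0.3620 & & \\ \hline
15 & & 0.0437 & & 0.0026 & & \\ \hline
16 & & & 1.4964 & 0.1347 & & \\ \hline
17 & Test statistic & 9.66 & & & & \\ \hline
18 & & & & & & \\ \hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 5}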
\end{table}
  1. (A) State the sample size.
    (B) State the null and alternative hypotheses for a test to investigate whether there is any association between type of worker and smoking status.
  2. Showing your calculations, find the missing values in each of the following cells.
    • C10
    • C15
    • B16
  3. Complete the hypothesis test at the \(10 \%\) level of significance.
  4. Discuss briefly what the data suggest about smoking status for different types of workers. You should make a comment for each type of worker.
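
The missing cells C10, C15 and B16 can be sketched in code as follows. The dictionary keyed by spreadsheet-style cell references is an illustrative convention, not something the question defines. Because the screenshot displays the test statistic (9.66), summing all nine contributions gives a cross-check on the recomputed cells.

```python
# Observed frequencies and totals as shown in rows 3-6 of the spreadsheet.
# Keys are the cell references of the observed values (an assumed convention).
observed = {
    "B3": 2, "C3": 10, "D3": 5,
    "B4": 18, "C4": 15, "D4": 21,
    "B5": 13, "C5": 6, "D5": 14,
}
row_totals = {3: 17, 4: 54, 5: 33}
col_totals = {"B": 33, "C": 31, "D": 40}
sample_size = 104


def expected(cell):
    """Expected frequency: row total * column total / sample size."""
    col, row = cell[0], int(cell[1:])
    return row_totals[row] * col_totals[col] / sample_size


def contribution(cell):
    """Contribution (O - E)^2 / E for one observed cell."""
    e = expected(cell)
    return (observed[cell] - e) ** 2 / e


c10 = expected("C4")      # expected frequency: Production line, Ex-smoker
c15 = contribution("C4")  # contribution: Production line, Ex-smoker
b16 = contribution("B5")  # contribution: Administrative, Smoker

# Cross-check: the nine contributions should sum to the displayed statistic.
total = sum(contribution(cell) for cell in observed)
print(round(c10, 4), round(c15, 4), round(b16, 4), round(total, 2))
```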
OCR MEI Further Statistics A AS 2022 June Q5
14 marks Standard +0.3
5 A researcher is investigating whether there is any relationship between the overall performance of a student at GCSE and their grade in A Level Mathematics. Their A Level Mathematics grade is classified as A* or A, B, C or lower, and their overall performance at GCSE is classified as Low, Middle, High. Data are collected for a sample of 80 students in a particular area. The researcher carries out a chi-squared test. The screenshot below shows part of a spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted.
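\begin{tabular}{|c|l|c|c|c|c|}
\hline
 & A & B & C & D & E \\ \hline
1 & & \multicolumn{4}{c|}{Observed frequency} \\ \hline
2 & & A* or A & B & C or lower & Totals \\ \hline
3 & Low & 6 & 13 & 9 & 28 \\ \hline
4 & Middle & 10 & 6 & 8 & 24 \\ \hline
5 & High & 15 & 10 & 3 & 28 \\ \hline
6 & Totals & 31 & 29 & 20 & 80 \\ \hline
7 & & & & & \\ \hline
8 & & & & & \\ \hline
9 & & A* or A & B & C or lower & \\ \hline
10 & Low & 10.85 & & & \\ \hline
11 & Middle & 9.30 & & & \\ \hline
12 & High & 10.85 & & & \\ \hline
13 & & & & & \\ \hline
14 & Contribution to the test statistic & & & & \\ \hline
15 & & A* or A & B & C or lower & \\ \hline
16 & Low & 2.1680 & 0.8002 & 0.5714 & \\ \hline
17 & Middle & 0.0527 & 0.8379 & 0.6667 & \\ \hline
18 & High & 1.5873 & & 2.2857 & \\ \hline
19 & & & & & \\ \hline
\end{tabular}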
  1. State what needs to be known about the sample for the test to be valid. For the remainder of this question, you should assume that the test is valid.
  2. Determine the missing values in each of the following cells.
    • C11
    • C18
  3. In this question you must show detailed reasoning.
    Carry out a hypothesis test at the \(10 \%\) significance level to investigate whether there is any association between level of performance at GCSE and A Level Mathematics grade.
  4. Discuss briefly what the data suggest about A Level Mathematics grade for different levels of performance at GCSE.
  5. State one disadvantage of using a 10\% significance level rather than a 5\% significance level in a hypothesis test.
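
To illustrate how the choice of significance level can affect the conclusion, the sketch below sums the contributions and compares the result with the 10% and 5% critical values for 4 degrees of freedom. The critical values are taken from standard statistical tables, the value for C18 is recomputed from the observed table because the screenshot omits it, and all names are illustrative.

```python
# C18 is omitted in the question, so it is recomputed from the observed
# table: High row total 28, grade B column total 29, sample size 80.
row_total_high, col_total_grade_b, sample_size = 28, 29, 80
expected_c18 = row_total_high * col_total_grade_b / sample_size
c18 = (10 - expected_c18) ** 2 / expected_c18

# Contributions as displayed in rows 16-18 of the spreadsheet.
contributions = [
    2.1680, 0.8002, 0.5714,  # Low
    0.0527, 0.8379, 0.6667,  # Middle
    1.5873, c18, 2.2857,     # High
]
test_statistic = sum(contributions)

# Upper-tail critical values of chi-squared with (3 - 1) * (3 - 1) = 4
# degrees of freedom, copied from standard statistical tables.
critical_values = {0.10: 7.779, 0.05: 9.488}

decisions = {
    level: "reject H0" if test_statistic > critical else "do not reject H0"
    for level, critical in critical_values.items()
}
print(round(test_statistic, 2), decisions)
```

With these data the null hypothesis is rejected at the 10% level but not at the 5% level, which reflects the greater chance of wrongly rejecting a true null hypothesis when the larger significance level is used.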
OCR MEI Further Statistics A AS 2023 June Q6
15 marks Standard +0.3
6 An eight-sided dice has its faces numbered \(1, 2, \ldots, 8\).
  1. In this part of the question you should assume that the dice is fair.
    1. State the probability that, when the dice is rolled once, the score is at least 6.
    2. Show that the probability that the score is within 2 standard deviations of its mean is 1.
  2. A student thinks that the dice may be biased. To investigate this, the student decides to roll the dice 80 times and then carry out a \(\chi ^ { 2 }\) goodness of fit test of a uniform distribution. The spreadsheet below shows the data for the test, where some of the values have been deliberately omitted.
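    \begin{tabular}{|c|c|c|c|c|}
    \hline
     & A & B & C & D \\ \hline
    1 & Score & Observed frequency & Expected frequency & Chi-squared contribution \\ \hline
    2 & 1 & 14 & 10 & 1.6 \\ \hline
    3 & 2 & 4 & 10 & 3.6 \\ \hline
    4 & 3 & 10 & 10 & 0 \\ \hline
    5 & 4 & 15 & 10 & \\ \hline
    6 & 5 & 6 & 10 & 1.6 \\ \hline
    7 & 6 & 11 & 10 & 0.1 \\ \hline
    8 & 7 & 7 & 10 & 0.9 \\ \hline
    9 & 8 & & 10 & 0.9 \\ \hline
    \end{tabular}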
    1. Explain why all of the expected frequencies are equal to 10 .
    2. Determine the missing values in each of the following cells.
      • B9
      • D5
    3. In this question you must show detailed reasoning.
      Carry out the \(\chi ^ { 2 }\) test at the \(5 \%\) significance level.
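
A minimal sketch of the goodness of fit calculation is given below, assuming the spreadsheet layout shown above. B9 follows from the fact that the observed frequencies must sum to the 80 rolls, and D5 from the contribution formula \((O - E)^2 / E\). The 5% critical value for 7 degrees of freedom is taken from standard tables, and the variable names are illustrative.

```python
# Observed frequencies for scores 1-7 (cells B2 to B8); B9 is omitted.
observed = [14, 4, 10, 15, 6, 11, 7]
rolls = 80
faces = 8

# Under the null hypothesis every face is equally likely.
expected = rolls / faces  # 10.0

# B9: the observed frequencies must sum to the number of rolls.
b9 = rolls - sum(observed)
observed.append(b9)

# D5: contribution for a score of 4, (O - E)^2 / E.
d5 = (observed[3] - expected) ** 2 / expected

test_statistic = sum((o - expected) ** 2 / expected for o in observed)
# No parameters are estimated from the data, so df = number of faces - 1.
degrees_of_freedom = faces - 1

# 5% upper-tail critical value for 7 degrees of freedom, from tables.
critical_value = 14.07
reject_h0 = test_statistic > critical_value
print(b9, d5, round(test_statistic, 1), degrees_of_freedom, reject_h0)
```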
OCR MEI Further Statistics A AS 2020 November Q6
12 marks Standard +0.3
6 A researcher is investigating whether there is any relationship between whether a cyclist wears a helmet and the distance, \(x \mathrm {~m}\), the cyclist is from the kerb (the edge of the road). Data are collected at a particular location for a random sample of 250 cyclists. The researcher carries out a chi-squared test. Fig. 6 is a screenshot showing part of a spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted. \begin{table}[h]
\begin{tabular}{|c|l|l|c|c|c|c|c|}
\hline
 & A & B & C & D & E & F & G \\ \hline
1 & & & \multicolumn{5}{c|}{Observed frequency} \\ \hline
2 & & & \(x \leq 0.3\) & \(0.3 < x \leq 0.5\) & \(0.5 < x \leq 0.8\) & \(x > 0.8\) & Totals \\ \hline
3 & Wears helmet & Yes & 26 & 27 & 23 & 46 & 122 \\ \hline
4 & & No & 45 & 31 & 21 & 31 & 128 \\ \hline
5 & & Totals & 71 & 58 & 44 & 77 & 250 \\ \hline
6 & & & & & & & \\ \hline
7 & & & \multicolumn{5}{c|}{Expected frequency} \\ \hline
8 & & & \(x \leq 0.3\) & \(0.3 < x \leq 0.5\) & \(0.5 < x \leq 0.8\) & \(x > 0.8\) & \\ \hline
9 & Wears helmet & Yes & 34.6480 & & & 37.5760 & \\ \hline
10 & & No & 36.3520 & & & 39.4240 & \\ \hline
11 & & & & & & & \\ \hline
12 & & & \multicolumn{5}{c|}{Contribution to the test statistic} \\ \hline
13 & & & \(x \leq 0.3\) & \(0.3 < x \leq 0.5\) & \(0.5 < x \leq 0.8\) & \(x > 0.8\) & \\ \hline
14 & Wears helmet & Yes & 2.1585 & 0.0601 & 0.1087 & 1.8885 & \\ \hline
15 & & No & 2.0573 & 0.0573 & & 1.8000 & \\ \hline
16 & & & & & & & \\ \hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 6}
\end{table}
  1. Showing your calculations, find the missing values in each of the following cells.
    • E10
    • E15
  2. In this question you must show detailed reasoning.
    Carry out a hypothesis test at the \(10 \%\) significance level to investigate whether there is any association between helmet wearing and distance from the kerb.
  3. Discuss briefly what the data suggest about helmet wearing for different distances from the kerb.
OCR MEI Further Statistics Minor 2021 November Q3
13 marks Standard +0.3
3 A student wants to know whether there is any association between age and whether or not people smoke. The student takes a sample of 120 adults and asks each of them whether or not they smoke. Below is a screenshot showing part of a spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted.
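\begin{tabular}{|c|l|l|c|c|c|}
\hline
 & A & B & C & D & E \\ \hline
1 & & & \multicolumn{3}{c|}{Observed frequency} \\ \hline
2 & & & \multicolumn{3}{c|}{Age} \\ \hline
3 & & & 16-34 & 35-59 & 60 and over \\ \hline
4 & Smoking status & Smoker & 13 & 7 & 3 \\ \hline
5 & & Non-smoker & 28 & 43 & 26 \\ \hline
6 & & & & & \\ \hline
7 & & & \multicolumn{3}{c|}{Expected frequency} \\ \hline
8 & & & 7.8583 & & \\ \hline
9 & & & 33.1417 & & \\ \hline
10 & & & & & \\ \hline
11 & & & \multicolumn{3}{c|}{Contributions to the test statistic} \\ \hline
12 & & & 3.3642 & 0.6964 & 1.1775 \\ \hline
13 & & & & 0.1651 & 0.2792 \\ \hline
\end{tabular}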
  1. The student wants to carry out a chi-squared test to analyse the data. State a requirement of the sample if the test is to be valid. For the rest of this question, you should assume that this requirement is met.
  2. Determine the missing values in each of the following cells.
    • E8
    • C13
  3. In this question you must show detailed reasoning.
    Carry out a hypothesis test at the \(5 \%\) significance level to investigate whether there is any association between age and smoking status.
  4. Discuss what the data suggest about the smoking status for each different age group.
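
Unlike some of the other spreadsheets, this one shows no row or column totals, so they have to be worked out before any expected frequency can be found. The sketch below does this and then evaluates cells E8 and C13. The function and variable names are illustrative and not part of the question.

```python
# Observed frequencies from cells C4:E5
# (columns: 16-34, 35-59, 60 and over).
observed = {
    "Smoker": [13, 7, 3],
    "Non-smoker": [28, 43, 26],
}

# The screenshot shows no totals, so they are computed first.
row_totals = {status: sum(counts) for status, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
sample_size = sum(col_totals)  # should equal the stated sample of 120


def expected(status, column):
    """Expected frequency: row total * column total / sample size."""
    return row_totals[status] * col_totals[column] / sample_size


def contribution(status, column):
    """Contribution (O - E)^2 / E for one cell."""
    e = expected(status, column)
    return (observed[status][column] - e) ** 2 / e


e8 = expected("Smoker", 2)           # smokers aged 60 and over
c13 = contribution("Non-smoker", 0)  # non-smokers aged 16-34
print(row_totals, col_totals, round(e8, 4), round(c13, 4))
```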
OCR MEI Further Statistics Major 2019 June Q5
13 marks Standard +0.3
5 In an investigation into the possible relationship between smoking and weight in adults in a particular country, a researcher selected a random sample of 500 adults.
The adults in the sample were classified according to smoking status (non-smoker, light smoker or heavy smoker, where light smoker indicates less than 10 cigarettes per day) and body weight (underweight, normal weight or overweight). Fig. 5 is a screenshot showing part of the spreadsheet used to calculate the contributions for a chi-squared test. Some values in the spreadsheet have been deliberately omitted. \begin{table}[h]
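\begin{tabular}{|c|l|c|c|c|c|c|}
\hline
 & A & B & C & D & E & F \\ \hline
1 & Observed frequencies & & & & & \\ \hline
2 & & Underweight & Normal & Overweight & Totals & \\ \hline
3 & Non-smoker & 8 & 52 & 178 & 238 & \\ \hline
4 & Light smoker & 10 & 40 & 68 & 118 & \\ \hline
5 & Heavy smoker & 5 & 47 & 92 & 144 & \\ \hline
6 & Totals & 23 & 139 & 338 & 500 & \\ \hline
7 & & & & & & \\ \hline
8 & Expected frequencies & & & & & \\ \hline
9 & Non-smoker & 10.9480 & 66.1640 & 160.8880 & & \\ \hline
10 & Light smoker & 5.4280 & & 79.7680 & & \\ \hline
11 & Heavy smoker & & 40.0320 & 97.3440 & & \\ \hline
12 & & & & & & \\ \hline
13 & & & & & & \\ \hline
14 & Non-smoker & 0.7938 & & 1.8200 & & \\ \hline
15 & Light smoker & 3.8510 & 1.5785 & 1.7361 & & \\ \hline
16 & Heavy smoker & 0.3982 & 1.2129 & 0.2934 & & \\ \hline
17 & & & & & & \\ \hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 5}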
\end{table}
  1. Showing your calculations, find the missing values in each of the following cells.
    • B11
    • C10
    • C14
  2. Complete the hypothesis test at the \(1 \%\) level of significance.
  3. For each smoking status, give a brief interpretation of the largest of the three contributions to the test statistic.
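
When interpreting contributions, it helps to know not only which cell in a row is largest but also whether the observed frequency is above or below the expected one. The sketch below recomputes each contribution from the observed table and reports both pieces of information for each smoking status. The helper name `largest_contribution` and the wording of the direction labels are illustrative assumptions.

```python
weights = ["Underweight", "Normal", "Overweight"]
observed = {
    "Non-smoker": [8, 52, 178],
    "Light smoker": [10, 40, 68],
    "Heavy smoker": [5, 47, 92],
}
row_totals = {status: sum(counts) for status, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
sample_size = sum(col_totals)  # should equal the stated sample of 500


def largest_contribution(status):
    """Return (weight, contribution, direction) for the row's largest term."""
    best = None
    for weight, o, col_total in zip(weights, observed[status], col_totals):
        e = row_totals[status] * col_total / sample_size
        term = (o - e) ** 2 / e
        direction = "more than expected" if o > e else "fewer than expected"
        if best is None or term > best[1]:
            best = (weight, term, direction)
    return best


for status in observed:
    weight, term, direction = largest_contribution(status)
    print(f"{status}: {weight}, {term:.4f}, {direction}")
```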
OCR MEI Further Statistics Major Specimen Q9
13 marks Standard +0.3
9 A random sample of adults in the UK were asked to state their primary source of news: television (T), internet (I), newspapers (N) or radio (R). The responses were classified by age group, and an analysis was carried out to see if there is any association between age group and primary source of news. Fig. 9 is a screenshot showing part of the spreadsheet used to analyse the data. Some values in the spreadsheet have been deliberately omitted. \begin{table}[h]
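\begin{tabular}{|c|l|c|c|c|c|c|}
\hline
 & A & B & C & D & E & F \\ \hline
1 & Source & \multicolumn{4}{c|}{Age group} & \\ \hline
2 & of news & 18-32 & 33-47 & 48-64 & 65+ & \\ \hline
3 & T & 63 & 61 & 71 & 80 & 275 \\ \hline
4 & I & 33 & 33 & 22 & 12 & 100 \\ \hline
5 & N & 9 & 8 & 11 & 20 & 48 \\ \hline
6 & R & 4 & 9 & 9 & 5 & 27 \\ \hline
7 & & 109 & 111 & 113 & 117 & 450 \\ \hline
8 & & & & & & \\ \hline
9 & Expected frequencies & & & & & \\ \hline
10 & & 66.61 & 67.83 & 69.06 & 71.50 & \\ \hline
11 & & 24.22 & 24.67 & & 26.00 & \\ \hline
12 & & 11.63 & 11.84 & 12.05 & 12.48 & \\ \hline
13 & & 6.54 & 6.66 & 6.78 & 7.02 & \\ \hline
14 & & & & & & \\ \hline
15 & Contributions to the test statistic & & & & & \\ \hline
16 & & 0.20 & 0.69 & 0.05 & 1.01 & \\ \hline
17 & & 3.18 & 2.82 & & 7.54 & \\ \hline
18 & & 0.59 & & 0.09 & 4.53 & \\ \hline
19 & & 0.99 & 0.82 & 0.73 & 0.58 & \\ \hline
20 & test statistic & 25.45 & & & & \\ \hline
\end{tabular}
\captionsetup{labelformat=empty} \caption{Fig. 9}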
\end{table}
  1. (A) State the sample size.
    (B) Give the name of the appropriate hypothesis test.
    (C) State the null and alternative hypotheses.
  2. Showing your calculations, find the missing values in cells
    • D11,
    • D17 and
    • C18.
  3. Complete the appropriate hypothesis test at the \(5 \%\) level of significance.
  4. Discuss briefly what the data suggest about primary source of news. You should make a comment for each age group.