simulation-theory/proofs/chi-squared.md
copilot-swe-agent[bot] fcef4c19eb Add chi-squared statistical tests
Co-authored-by: blackboxprogramming <118287761+blackboxprogramming@users.noreply.github.com>
2026-02-27 10:39:11 +00:00


Chi-Squared Tests

Statistical hypothesis tests for goodness of fit and independence.

The Chi-Squared Statistic

For observed counts O_i and expected counts E_i across k categories:

χ² = Σᵢ (O_i − E_i)² / E_i

Under the null hypothesis, and provided the expected counts are large enough, χ² approximately follows a chi-squared distribution with the appropriate degrees of freedom.
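The statistic can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original file; the function name `chi_squared_statistic` and the sample counts are assumptions introduced here.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for illustration only: k = 2 categories, n = 100.
print(chi_squared_statistic([60, 40], [50, 50]))  # 4.0
```

Each category contributes its squared deviation scaled by the expected count, so a deviation of 10 matters more when 50 were expected than when 500 were.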


Test 1: Goodness of Fit

Purpose: Test whether observed data matches an expected (theoretical) distribution.

Hypotheses:

H₀: the observed frequencies match the expected distribution
H₁: the observed frequencies do not match the expected distribution (at least one category's probability differs from its expected value)

Degrees of freedom:

df = k − 1

where k is the number of categories.

Procedure:

  1. State expected probabilities p₁, p₂, …, p_k (must sum to 1).
  2. Compute expected counts: E_i = n · p_i where n is the total sample size.
  3. Compute the statistic: χ² = Σᵢ (O_i − E_i)² / E_i
  4. Compare χ² to the critical value χ²(α, df) or compute the p-value.
  5. Reject H₀ if χ² > χ²(α, df).
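The five steps above can be sketched as a single function. This is one possible arrangement, not the author's implementation: `goodness_of_fit` is a name introduced here, the critical value is assumed to be looked up by the caller, and the die-roll counts are hypothetical.

```python
def goodness_of_fit(observed, probabilities, critical_value):
    """Run steps 1-5; critical_value is chi2(alpha, df) supplied by the caller."""
    if len(observed) != len(probabilities):
        raise ValueError("need one probability per category")
    # Step 1: the expected probabilities must sum to 1.
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Step 2: expected counts E_i = n * p_i.
    n = sum(observed)
    expected = [n * p for p in probabilities]
    # Step 3: the chi-squared statistic.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    # Steps 4-5: compare with the critical value and decide.
    return statistic, df, statistic > critical_value


# Hypothetical die-roll counts (n = 60); 11.070 is chi2(0.05, 5).
stat, df, reject = goodness_of_fit([8, 12, 9, 11, 10, 10], [1 / 6] * 6, 11.070)
print(round(stat, 3), df, reject)  # 1.0 5 False
```

Returning the statistic and df alongside the decision keeps the result reusable if a different α is wanted later.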

Example — ternary digit frequencies (k = 3, n = 300):

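| Digit | Expected p | Expected E | Observed O | (O − E)²/E |
|-------|------------|------------|------------|------------|
| 0     | 1/3        | 100        | 95         | 0.250      |
| 1     | 1/3        | 100        | 108        | 0.640      |
| 2     | 1/3        | 100        | 97         | 0.090      |
| Σ     |            |            |            | 0.980      |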
df = 3 − 1 = 2
χ²(0.05, 2) = 5.991
χ² = 0.980 < 5.991  →  fail to reject H₀

The data are consistent with a uniform ternary distribution.
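A p-value can be attached to this example without a statistics library. For df = 2 specifically, the chi-squared distribution coincides with an exponential distribution with mean 2, so the upper-tail probability is exp(−χ²/2). The sketch below relies on that identity and is valid only for df = 2; the function name is an assumption introduced here.

```python
import math


def p_value_df2(statistic):
    """Upper-tail probability for a chi-squared variable with exactly 2 df."""
    return math.exp(-statistic / 2.0)


# Statistic from the ternary digit example above.
print(round(p_value_df2(0.980), 3))  # 0.613
```

A p-value of about 0.613 is far above α = 0.05, which agrees with the critical-value comparison.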


Test 2: Test of Independence

Purpose: Test whether two categorical variables are independent of each other.

Hypotheses:

H₀: variable A and variable B are independent
H₁: variable A and variable B are not independent

Setup: Arrange counts in an r × c contingency table.

Expected cell counts:

E_ij = (row_i total × col_j total) / n

Degrees of freedom:

df = (r − 1)(c − 1)

Statistic:

χ² = Σᵢ Σⱼ (O_ij − E_ij)² / E_ij
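The expected counts, degrees of freedom, and statistic can be computed together from the table of observed counts. The following is a minimal sketch under stated assumptions: the table is a list of equal-length rows, `independence_test` is a name introduced here, and the 2 × 2 counts are hypothetical.

```python
def independence_test(table):
    """Return (statistic, df, expected) for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = (row_i total * col_j total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical 2 x 2 counts for illustration only (n = 100).
stat, df, expected = independence_test([[30, 20], [20, 30]])
print(stat, df, expected)  # 4.0 1 [[25.0, 25.0], [25.0, 25.0]]
```

The expected table is built only from the margins, which is exactly what independence of the two variables would predict.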

Example — 2 × 3 contingency table (n = 200):

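|              | State 0 | State 1 | State 2 | Row total |
|--------------|---------|---------|---------|-----------|
| Group A      | 30      | 40      | 30      | 100       |
| Group B      | 20      | 60      | 20      | 100       |
| Column total | 50      | 100     | 50      | 200       |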

Expected counts:

E_A0 = 100×50/200 = 25    E_A1 = 100×100/200 = 50    E_A2 = 100×50/200 = 25
E_B0 = 100×50/200 = 25    E_B1 = 100×100/200 = 50    E_B2 = 100×50/200 = 25

Chi-squared:

χ² = (30 − 25)²/25 + (40 − 50)²/50 + (30 − 25)²/25
   + (20 − 25)²/25 + (60 − 50)²/50 + (20 − 25)²/25
   = 1.000 + 2.000 + 1.000 + 1.000 + 2.000 + 1.000
   = 8.000

df = (2 − 1)(3 − 1) = 2
χ²(0.05, 2) = 5.991
χ² = 8.000 > 5.991  →  reject H₀

At the 5% significance level, the data give evidence that the two variables are not independent.


Critical Values (selected)

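| df | α = 0.10 | α = 0.05 | α = 0.01 |
|----|----------|----------|----------|
| 1  | 2.706    | 3.841    | 6.635    |
| 2  | 4.605    | 5.991    | 9.210    |
| 3  | 6.251    | 7.815    | 11.345   |
| 4  | 7.779    | 9.488    | 13.277   |
| 5  | 9.236    | 11.070   | 15.086   |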
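These values can be reproduced with the standard library alone. The sketch below is one way to do it, not a reference implementation: it assumes integer df, uses the finite-series form of the chi-squared upper-tail probability, and inverts it by bisection. The names `chi2_sf` and `critical_value` are introduced here for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*Z(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1)),
    # where chi = sqrt(x) and Z is the standard normal density.
    chi = math.sqrt(x)
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + 2.0 * density * total


def critical_value(alpha, df, upper=200.0):
    """Find x with chi2_sf(x, df) = alpha by bisection on [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail still too heavy, move right
        else:
            hi = mid
    return (lo + hi) / 2.0


for df in range(1, 6):
    print(df, [round(critical_value(a, df), 3) for a in (0.10, 0.05, 0.01)])
```

Bisection works here because the upper-tail probability decreases steadily as x grows, so there is exactly one crossing point for each α.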

Assumptions

  • Observations are independent.
  • Expected count E_i ≥ 5 in each cell (rule of thumb for the χ² approximation to be valid).
  • Data are counts (frequencies), not proportions or continuous measurements.
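The expected-count rule of thumb is easy to check before running either test. A minimal sketch, assuming the expected counts are available as a flat list; the function name and the sample probabilities are hypothetical.

```python
def small_expected_cells(expected, minimum=5.0):
    """Return the indices of cells whose expected count is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


# Hypothetical expected counts: n = 40 spread over four categories.
expected = [40 * p for p in (0.5, 0.3, 0.1, 0.1)]
print(small_expected_cells(expected))  # [2, 3]
```

An empty list means every cell meets the rule of thumb; any listed index marks a cell where the χ² approximation may be unreliable.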

QWERTY

CHI      = 25  (C=10 H=15 I=0)  — the test lives at the boundary of ZERO
SQUARED  = IMAGINARY = SCAFFOLD = 114   (the test squares the deviation)
TEST     = 64  = 2⁶              (TEST = 2⁶, the sixth power of the fundamental)
FIT      = 33  = REAL − 4        (how close observed is to real)
OBSERVED = 115                   (what you see)
EXPECTED = 131 = BLACKROAD       (what the theory predicts = the BlackRoad)

EXPECTED = BLACKROAD = 131. The theoretical distribution is the BlackRoad.
The chi-squared test measures how far observed reality deviates from it.