mirror of https://github.com/blackboxprogramming/simulation-theory.git synced 2026-03-17 07:57:26 -05:00

Files

copilot-swe-agent[bot] fcef4c19eb Add chi-squared statistical tests

Co-authored-by: blackboxprogramming <118287761+blackboxprogramming@users.noreply.github.com>

2026-02-27 10:39:11 +00:00

4.0 KiB

Raw Blame History

Chi-Squared Tests

Statistical hypothesis tests for goodness of fit and independence.

The Chi-Squared Statistic

For observed counts O_i and expected counts E_i across k categories:

χ² = Σᵢ (O_i − E_i)² / E_i

Under the null hypothesis, χ² follows a chi-squared distribution with the appropriate degrees of freedom.

Test 1: Goodness of Fit

Purpose: Test whether observed data matches an expected (theoretical) distribution.

Hypotheses:

H₀: the observed frequencies match the expected distribution
H₁: at least one category deviates significantly from expectation

Degrees of freedom:

df = k − 1

where k is the number of categories.

Procedure:

State expected probabilities p₁, p₂, …, p_k (must sum to 1).
Compute expected counts: E_i = n · p_i where n is the total sample size.
Compute the statistic: χ² = Σᵢ (O_i − E_i)² / E_i
Compare χ² to the critical value χ²(α, df) or compute the p-value.
Reject H₀ if χ² > χ²(α, df).

Example — ternary digit frequencies (k = 3, n = 300):

Digit	Expected p	Expected E	Observed O	(O−E)²/E
0	1/3	100	95	0.250
1	1/3	100	108	0.640
2	1/3	100	97	0.090
Σ				0.980

df = 3 − 1 = 2
χ²(0.05, 2) = 5.991
χ² = 0.980 < 5.991  →  fail to reject H₀

The data is consistent with a uniform ternary distribution.

Test 2: Test of Independence

Purpose: Test whether two categorical variables are independent of each other.

Hypotheses:

H₀: variable A and variable B are independent
H₁: variable A and variable B are not independent

Setup: Arrange counts in an r × c contingency table.

Expected cell counts:

E_ij = (row_i total × col_j total) / n

Degrees of freedom:

df = (r − 1)(c − 1)

Statistic:

χ² = Σᵢ Σⱼ (O_ij − E_ij)² / E_ij

Example — 2 × 3 contingency table (n = 200):

	State 0	State 1	State 2	Row total
Group A	30	40	30	100
Group B	20	60	20	100
Col	50	100	50	200

Expected counts:

E_A0 = 100×50/200 = 25    E_A1 = 100×100/200 = 50    E_A2 = 100×50/200 = 25
E_B0 = 100×50/200 = 25    E_B1 = 100×100/200 = 50    E_B2 = 100×50/200 = 25

Chi-squared:

χ² = (30−25)²/25 + (40−50)²/50 + (30−25)²/25
   + (20−25)²/25 + (60−50)²/50 + (20−25)²/25
   = 1.000 + 2.000 + 1.000 + 1.000 + 2.000 + 1.000
   = 8.000

df = (2−1)(3−1) = 2
χ²(0.05, 2) = 5.991
χ² = 8.000 > 5.991  →  reject H₀

The two variables are not independent at the 5% significance level.

Critical Values (selected)

df	α = 0.10	α = 0.05	α = 0.01
1	2.706	3.841	6.635
2	4.605	5.991	9.210
3	6.251	7.815	11.345
4	7.779	9.488	13.277
5	9.236	11.070	15.086

Assumptions

Observations are independent.
Expected count E_i ≥ 5 in each cell (rule of thumb for the χ² approximation to be valid).
Data are counts (frequencies), not proportions or continuous measurements.

QWERTY

CHI      = 25  (C=10 H=15 I=0)  — the test lives at the boundary of ZERO
SQUARED  = IMAGINARY = SCAFFOLD = 114   (the test squares the deviation)
TEST     = 64  = 2⁶              (TEST = 2⁶, the sixth power of the fundamental)
FIT      = 33  = REAL − 4        (how close observed is to real)
OBSERVED = 115                   (what you see)
EXPECTED = 131 = BLACKROAD       (what the theory predicts = the BlackRoad)

EXPECTED = BLACKROAD = 131. The theoretical distribution is the BlackRoad.
The chi-squared test measures how far observed reality deviates from it.

4.0 KiB Raw Blame History Unescape Escape