mirror of
https://github.com/blackboxprogramming/simulation-theory.git
synced 2026-03-17 07:57:26 -05:00
Add chi-squared statistical tests
Co-authored-by: blackboxprogramming <118287761+blackboxprogramming@users.noreply.github.com>
@@ -8,3 +8,4 @@ Formal mathematical arguments for the key claims.
| [`self-reference.md`](./self-reference.md) | The QWERTY encoding is self-referential | Direct construction |
| [`pure-state.md`](./pure-state.md) | The density matrix of the system is a pure state | Linear algebra / SVD |
| [`universal-computation.md`](./universal-computation.md) | The ternary bio-quantum system is Turing-complete | Reaction network theory |
| [`chi-squared.md`](./chi-squared.md) | Chi-squared goodness-of-fit and independence tests | χ² statistic / contingency tables |
149
proofs/chi-squared.md
Normal file
@@ -0,0 +1,149 @@
# Chi-Squared Tests

> Statistical hypothesis tests for goodness of fit and independence.

## The Chi-Squared Statistic

For observed counts O_i and expected counts E_i across k categories:

```
χ² = Σᵢ (O_i − E_i)² / E_i
```
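
As a minimal sketch, the statistic can be computed directly from paired lists of counts. The function name `chi_squared_statistic` and the sample counts are illustrative assumptions, not part of the original file.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over paired counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts for k = 2 categories, used only to exercise the function.
stat = chi_squared_statistic([12, 8], [10, 10])
print(f"chi2 = {stat:.3f}")
```
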

Under the null hypothesis, and when the expected counts are large enough, χ² approximately
follows a chi-squared distribution with the appropriate degrees of freedom.

---

## Test 1: Goodness of Fit

**Purpose:** Test whether observed data match an expected (theoretical) distribution.

**Hypotheses:**
```
H₀: the observed frequencies match the expected distribution
H₁: at least one category's frequency differs from the expected distribution
```

**Degrees of freedom:**
```
df = k − 1
```
where k is the number of categories and no distribution parameters are estimated from the data.

**Procedure:**
1. State expected probabilities p₁, p₂, …, p_k (must sum to 1).
2. Compute expected counts: E_i = n · p_i, where n is the total sample size.
3. Compute the statistic: χ² = Σᵢ (O_i − E_i)² / E_i
4. Look up the critical value χ²(α, df), or compute the p-value.
5. Reject H₀ if χ² > χ²(α, df) (equivalently, if the p-value is below α).

**Example (ternary digit frequencies, k = 3, n = 300):**

| Digit | Expected p | Expected E | Observed O | (O−E)²/E |
|-------|-----------|------------|------------|----------|
| 0 | 1/3 | 100 | 95 | 0.250 |
| 1 | 1/3 | 100 | 108 | 0.640 |
| 2 | 1/3 | 100 | 97 | 0.090 |
| **Σ** | | | | **0.980** |

```
df = 3 − 1 = 2
χ²(0.05, 2) = 5.991
χ² = 0.980 < 5.991 → fail to reject H₀
```
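
The worked example can be reproduced with a short sketch of the five-step procedure. The function name `goodness_of_fit` is an assumption made for illustration. The critical value is copied from the example rather than computed, and the p-value line relies on the closed form exp(−x/2), which holds only for df = 2.

```python
import math

CRITICAL_005_DF2 = 5.991  # chi2(0.05, 2), copied from the worked example


def goodness_of_fit(observed, probabilities):
    """Steps 1-3 of the procedure: return (statistic, df)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("expected probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probabilities]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Observed ternary digit counts from the table above, uniform null hypothesis.
statistic, df = goodness_of_fit([95, 108, 97], [1 / 3, 1 / 3, 1 / 3])

# Steps 4-5: compare against the critical value.
reject = statistic > CRITICAL_005_DF2

# Valid for df = 2 only: the chi-squared upper tail is exp(-x / 2).
p_value = math.exp(-statistic / 2)

print(f"chi2 = {statistic:.3f}, df = {df}, p = {p_value:.3f}, reject H0: {reject}")
```
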

The data are consistent with a uniform ternary distribution at the 5% significance level.

---

## Test 2: Test of Independence

**Purpose:** Test whether two categorical variables are independent of each other.

**Hypotheses:**
```
H₀: variable A and variable B are independent
H₁: variable A and variable B are not independent
```

**Setup:** Arrange counts in an r × c contingency table.

**Expected cell counts:**
```
E_ij = (row_i total × col_j total) / n
```

**Degrees of freedom:**
```
df = (r − 1)(c − 1)
```

**Statistic:**
```
χ² = Σᵢ Σⱼ (O_ij − E_ij)² / E_ij
```
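
The three formulas above translate into a few lines of code. This is a sketch under stated assumptions: the helper names `expected_counts` and `independence_statistic` are hypothetical, the table is a list of equal-length rows, and the 2 × 2 counts at the end are made up to exercise the functions.

```python
def expected_counts(table):
    """E_ij = (row_i total * col_j total) / n for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Return (chi-squared statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Made-up 2 x 2 table, not taken from the text.
stat, df = independence_statistic([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, df = {df}")
```
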

**Example (2 × 3 contingency table, n = 200):**

| | State 0 | State 1 | State 2 | Row total |
|---------|---------|---------|---------|-----------|
| Group A | 30 | 40 | 30 | 100 |
| Group B | 20 | 60 | 20 | 100 |
| **Col** | **50** | **100** | **50** | **200** |

Expected counts:
```
E_A0 = 100×50/200 = 25    E_A1 = 100×100/200 = 50    E_A2 = 100×50/200 = 25
E_B0 = 100×50/200 = 25    E_B1 = 100×100/200 = 50    E_B2 = 100×50/200 = 25
```

Chi-squared:
```
χ² = (30−25)²/25 + (40−50)²/50 + (30−25)²/25
   + (20−25)²/25 + (60−50)²/50 + (20−25)²/25
   = 1.000 + 2.000 + 1.000 + 1.000 + 2.000 + 1.000
   = 8.000

df = (2−1)(3−1) = 2
χ²(0.05, 2) = 5.991
χ² = 8.000 > 5.991 → reject H₀
```
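
The arithmetic of this example can be checked cell by cell. The sketch below uses the counts from the table above. The variable names are illustrative, the critical value is copied from the example, and the p-value again uses the closed form exp(−x/2), which applies only because df = 2 here.

```python
import math

# Observed counts from the 2 x 3 table above.
observed = [
    [30, 40, 30],  # Group A
    [20, 60, 20],  # Group B
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Per-cell contributions (O - E)^2 / E, in the same layout as the table.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

statistic = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.exp(-statistic / 2)  # closed form, valid for df = 2 only
reject = statistic > 5.991  # chi2(0.05, 2), copied from the example

print(contributions)
print(f"chi2 = {statistic:.3f}, df = {df}, p = {p_value:.4f}, reject H0: {reject}")
```
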

H₀ is rejected at the 5% significance level: the data indicate that the two variables are not independent.

---

## Critical Values (selected)

| df | α = 0.10 | α = 0.05 | α = 0.01 |
|----|----------|----------|----------|
| 1 | 2.706 | 3.841 | 6.635 |
| 2 | 4.605 | 5.991 | 9.210 |
| 3 | 6.251 | 7.815 | 11.345 |
| 4 | 7.779 | 9.488 | 13.277 |
| 5 | 9.236 | 11.070 | 15.086 |
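
The tabulated values can be sanity-checked without a statistics library. For integer df, the chi-squared upper-tail probability has a finite closed form, so each critical value should give a tail probability close to its α. The helper `chi2_sf` below is a hypothetical name for a sketch using only the Python standard library. In practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Critical values copied from the table above, keyed by df, in column order.
ALPHAS = (0.10, 0.05, 0.01)
CRITICAL = {
    1: (2.706, 3.841, 6.635),
    2: (4.605, 5.991, 9.210),
    3: (6.251, 7.815, 11.345),
    4: (7.779, 9.488, 13.277),
    5: (9.236, 11.070, 15.086),
}

for df, row in CRITICAL.items():
    for alpha, critical in zip(ALPHAS, row):
        print(f"df={df} alpha={alpha:.2f} tail={chi2_sf(critical, df):.4f}")
```
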

---

## Assumptions

- Observations are independent.
- Expected count E_i ≥ 5 in each cell (rule of thumb for the χ² approximation to be valid).
- Data are counts (frequencies), not proportions or continuous measurements.
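
The expected-count rule of thumb is easy to check before running a test. A minimal sketch follows, assuming a hypothetical helper `small_expected_cells`. The n = 12 case is made up to show a violation and does not come from the text.

```python
def small_expected_cells(expected, minimum=5.0):
    """Return (index, value) pairs for expected counts below `minimum`."""
    return [(i, e) for i, e in enumerate(expected) if e < minimum]


uniform = (1 / 3, 1 / 3, 1 / 3)

# n = 300 as in the goodness-of-fit example: every expected count is about 100.
ok = small_expected_cells([300 * p for p in uniform])

# Made-up small sample (n = 12): every expected count is 4, below the threshold.
flagged = small_expected_cells([12 * p for p in uniform])

print("n = 300 violations:", ok)
print("n = 12 violations:", flagged)
```
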

---

## QWERTY

```
CHI = 25 (C=10 H=15 I=0): the test lives at the boundary of ZERO
SQUARED = IMAGINARY = SCAFFOLD = 114 (the test squares the deviation)
TEST = 64 = 2⁶ (the sixth power of the fundamental)
FIT = 33 = REAL − 4 (how close observed is to real)
OBSERVED = 115 (what you see)
EXPECTED = 131 = BLACKROAD (what the theory predicts = the BlackRoad)
```

EXPECTED = BLACKROAD = 131. The theoretical distribution is the BlackRoad.
The chi-squared test measures how far observed reality deviates from it.