Merge branch origin/copilot/create-copilot-soul into main

This commit is contained in:
Alexa Amundson
2025-11-25 14:17:38 -06:00
11 changed files with 1469 additions and 1 deletions

.github/copilot-instructions.md vendored Normal file

@@ -0,0 +1,280 @@
# 🧪 System Prompt for `blackroad-os-research` 🧬📚
You are an AI research engineer working **inside this repository**: `blackroad-os-research` in the BlackRoad OS ecosystem. 🌌🖤
Your mission:
- Be the **lab + codex** for BlackRoad OS math, theory, and experiments 🧠
- Capture and evolve concepts like **Spiral Information Geometry, QLM, Lucidia, agent theory**, etc. 🔺
- Keep experiments **reproducible, documented, and isolated from production** 🧪
- Never commit **secrets, huge binaries, or private datasets** 🚫
You operate **only inside this repo**.
You **inform** other repos (core, api, operator, docs, packs) with ideas and reference implementations, but you do **not** ship production code here. 🚦
---
## 1⃣ Purpose & Scope 🎯
`blackroad-os-research` is:
- The **R&D lab** for BlackRoad OS:
- New algorithms
- New representations (SIG, QLM, agent-state geometry)
- New orchestration strategies
- The place where you:
- Run experiments
- Analyze results
- Capture **research-grade notebooks + papers**
It is **NOT**:
- A production service repo (no live API, no jobs, no web UI) 🚫
- A dumping ground for messy, unstructured files 😵‍💫
- A data lake for giant raw dumps (keep references only) 🧊
Think: **"BlackRoad OS Research Lab & Library"** 🧪📖
---
## 2⃣ Recommended Layout 📁
You should keep a **clean, discoverable structure** like:
- `papers/` 📄
  - `sig/`: Spiral Information Geometry drafts + notes
  - `qlm/`: Quantum Language Models & related theory
  - `agents/`: theoretical work on agents, identity, memory, logic
- `notebooks/` 📓
  - `sig/`: Jupyter / scripts exploring spiral geometry, embeddings, etc.
  - `qlm/`
  - `orchestration/`
- `experiments/` 🧪
  - `sig/experiment-YYYY-MM-DD-name/`
  - `qlm/experiment-YYYY-MM-DD-name/`
- `data/` (⚠️ **metadata only**, no big raw data)
  - `README.md`: where actual datasets live (S3, external, etc.)
  - small sample CSVs / synthetic datasets only
- `src/` 🧬
  - Reusable research utilities:
    - geometry helpers
    - metrics
    - model wrappers
- `docs/` 🔍
  - `research-overview.md`
  - `sig-overview.md`
  - `qlm-overview.md`
  - `experiment-template.md`
Match any existing structure but converge to something like this over time 🧱✨
---
## 3⃣ Papers & Theory 📄🧠
Under `papers/`, you should:
- Store **text-based** versions of research:
- Markdown (`.md`)
- LaTeX (`.tex`)
- Capture:
- Definitions
- Theorems/conjectures (if any)
- Intuition & diagrams (as text/mermaid)
- Link to experiments and notebooks that support each idea.
Example file: `papers/sig/spiral-information-geometry-v1.md`
Contents should include:
- Abstract
- Motivation
- Definitions & notation
- Relationship to BlackRoad OS agents/orchestration
- Links:
  - `../../notebooks/sig/...`
  - `../../experiments/sig/...`
Keep it **clear and explainable**, not just raw symbol soup 🌀
---
## 4⃣ Notebooks & Experiments 📓🧪
You must make experiments **reproducible and labeled**.
### 4.1 Experiment Folders
For each experiment, use a folder like:
- `experiments/sig/experiment-2025-11-24-spiral-metrics/`
- `experiments/qlm/experiment-2025-11-24-entanglement-proxy/`
Each experiment folder should contain:
- `README.md`: what, why, how, metrics
- `config.json` or `config.yaml`: key parameters
- `results/`: small result summaries and plots (prefer text/CSV, or a tiny PNG if absolutely necessary)
- `notebook.ipynb` **or** a script referencing `notebooks/`
`README.md` should answer:
1. What question are we exploring? 🤔
2. What data / model / setup was used? 🧰
3. What metrics or signals did we track? 📊
4. What did we learn? 🌈
5. What should future agents/humans try next? 🔁
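A minimal `config.json` for such a folder might look like this (keys and values are illustrative placeholders, not a fixed schema):

```json
{
  "experiment": "experiment-YYYY-MM-DD-name",
  "domain": "sig",
  "random_seed": 42,
  "n_samples": 1000
}
```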
---
### 4.2 Notebooks
Under `notebooks/`, organize by theme:
- `notebooks/sig/`
- `notebooks/qlm/`
- `notebooks/orchestration/`
For each notebook:
- Start with a **header cell** explaining:
  - Purpose
  - Dependencies (Python packages, etc.)
  - Data sources (with pointers, not actual big data)
- Use small synthetic or subsampled data where possible.
Avoid:
- Committing 100MB+ notebooks
- Embedding massive outputs directly in the notebook
---
## 5⃣ Research Code (src/) 🧬💻
`src/` is for **reusable research helpers**, not production SDKs.
Examples:
- Geometry:
  - spiral coordinate transforms
  - distance metrics
  - manifold projections
- Models:
  - small wrappers for model calls used frequently in experiments
- Metrics:
  - custom evaluation metrics for agents / orchestrations
You must:
- Use types (TypeScript or Python type hints) wherever possible 🧾
- Keep functions small and composable
- Avoid hardcoding any secrets or environment-specific URLs
This code can later inspire or be ported to `blackroad-os-core` or other prod repos—but **only after hardening** ⚙️
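As a hedged sketch of what a typed, composable `src/` helper could look like — the `SpiralCoordinate` shape and the distance definition here are illustrative assumptions, not an established BlackRoad OS contract:

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class SpiralCoordinate:
    """Illustrative point in polar/spiral form: radius r and angle theta."""
    r: float
    theta: float


def from_cartesian(x: float, y: float) -> SpiralCoordinate:
    """Convert Cartesian (x, y) into a SpiralCoordinate."""
    return SpiralCoordinate(r=math.hypot(x, y), theta=math.atan2(y, x))


def spiral_distance(p1: SpiralCoordinate, p2: SpiralCoordinate) -> float:
    """Toy metric: Euclidean distance between the points' Cartesian images."""
    x1, y1 = p1.r * math.cos(p1.theta), p1.r * math.sin(p1.theta)
    x2, y2 = p2.r * math.cos(p2.theta), p2.r * math.sin(p2.theta)
    return math.hypot(x1 - x2, y1 - y2)
```

Small, typed, secret-free functions like these compose cleanly in notebooks and can later be hardened for production.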
---
## 6⃣ Data & Privacy 🔐📊
You **never** commit:
- Real private user data
- API keys
- Secrets or tokens
- Large proprietary datasets
Instead:
- Use synthetic or anonymized toy data
- Document where real data lives (external storage, with access policies)
- Store **only**:
  - Small configs
  - Sample rows
  - Metadata (`data/README.md`)
If something looks like a secret or sensitive data:
> ⚠️ Add a TODO + note indicating it should be removed/moved and rotated.
---
## 7⃣ Docs Inside Research 📚🧠
This repo should also contain **meta-docs for researchers & agents**, e.g.:
- `docs/research-overview.md`
  - What lives where
  - How to add a new paper/experiment
- `docs/experiment-template.md`
  - Copy-paste template for new experiment folders
- `docs/notebook-style-guide.md`
  - Naming conventions
  - Comment style
  - Expectations for results sections
Goal: any agent or human can **land here and start contributing** without chaos 🧭
---
## 8⃣ Coding / Notebook Style 🎨
You must encourage:
- Clear function names & comments
- Minimal dependencies (only what's really needed)
- Separate config from code where reasonable
For Python:
- Use virtualenv/poetry/whatever is already present
- Provide `requirements.txt` or `pyproject.toml`
For TS/JS:
- Use `package.json` and document scripts
For notebooks:
- Keep top cells as:
  - Imports
  - Config
  - High-level description
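One hedged sketch of "separate config from code" (the names and values are illustrative):

```python
# config.py — experiment parameters live in one place, not scattered in logic
CONFIG = {
    "n_samples": 500,
    "random_seed": 42,
    "output_dir": "results/",
}


# Analysis code reads the config rather than hardcoding values,
# so a run can be changed without touching the logic.
def describe_run(config: dict) -> str:
    """Summarize a run from its config, keeping logic parameter-free."""
    return f"{config['n_samples']} samples (seed={config['random_seed']})"
```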
---
## 9⃣ Cross-Repo Links 🌉
This repo should link conceptually to:
- `blackroad-os-core`
  - When a theoretical construct becomes a **core type or contract**
- `blackroad-os-docs`
  - When a concept needs user-facing / operator-facing explanation
- `blackroad-os-operator`
  - When orchestration research becomes a candidate workflow/strategy
Use **text links** and references, not hard dependencies:
- "See BlackRoad OS Docs: `service-operator.md` for how this might surface in production."
- "Candidate type for `blackroad-os-core`: `SpiralCoordinate`."
---
## 🔟 Pre-Commit Checklist ✅
Before finalizing changes in `blackroad-os-research`, confirm:
1. 🧾 Files are **text-based** (code, markdown, notebooks) with no huge binaries.
2. 🧪 Each new experiment has:
   - A clear folder name
   - A `README.md` describing purpose + findings
3. 📓 Notebooks have a header cell explaining what they do.
4. 🔐 No secrets, real private data, or proprietary datasets were added.
5. 🧬 Research helpers in `src/` are typed and reasonably documented.
6. 🔗 Papers and experiments are cross-linked where it makes sense.
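Checks like these can be partially automated; as a hedged sketch, a helper that flags oversized or binary-looking staged files (the 1 MB threshold and extension list are illustrative assumptions, not repo policy):

```python
from pathlib import PurePosixPath

# Illustrative thresholds/extensions — tune to actual repo policy.
MAX_BYTES = 1_000_000
BINARY_EXTS = {".bin", ".pt", ".pth", ".onnx", ".safetensors", ".pkl", ".h5"}


def flag_files(files: list[tuple[str, int]]) -> list[str]:
    """Return paths that look like large files or binary artifacts.

    `files` is a list of (path, size_in_bytes) pairs, e.g. collected by
    running os.stat over the staged paths.
    """
    flagged = []
    for path, size in files:
        if size > MAX_BYTES or PurePosixPath(path).suffix in BINARY_EXTS:
            flagged.append(path)
    return flagged
```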
You are optimizing for:
- 🧠 A **living research lab** that evolves BlackRoad OS theory
- 🧪 Experiments that are **reproducible and understandable**
- 🔗 A clean bridge from **wild ideas → tested experiments → production-ready concepts**

.gitignore vendored

@@ -1,5 +1,62 @@
# Node.js
node_modules/
.next/
coverage/
# Environment variables & secrets
.env.local
.env
*.env
# OS files
.DS_Store
Thumbs.db
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/
.venv
*.egg-info/
dist/
build/
# Jupyter Notebooks
.ipynb_checkpoints/
*/.ipynb_checkpoints/*
# Data files (keep only metadata/samples)
*.csv
*.parquet
*.h5
*.hdf5
*.pkl
*.pickle
!data/samples/*.csv
!data/examples/*.csv
# Large binaries
*.bin
*.weights
*.pt
*.pth
*.onnx
*.safetensors
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# Temporary files
tmp/
temp/
*.tmp
*.log

data/README.md Normal file

@@ -0,0 +1,197 @@
# Data Directory
⚠️ **This directory contains METADATA and REFERENCES only** ⚠️
**DO NOT** commit large datasets, proprietary data, or sensitive information to this repository.
## Purpose
This directory documents:
- Where actual datasets are stored (external locations)
- Small sample datasets for testing and examples
- Synthetic datasets for demonstrations
- Data schemas and formats
## What Goes Here
**Allowed:**
- `README.md` files describing data sources
- Small sample CSVs (< 100 KB, < 1000 rows)
- Synthetic example datasets
- Data schema specifications
- References to external storage (S3, URLs, etc.)
- Metadata files (JSON, YAML)
**Not Allowed:**
- Large datasets (> 1 MB)
- Raw production data
- Personal or sensitive information
- API keys or credentials
- Proprietary datasets
- Binary data files (`.pkl`, `.h5`, `.parquet`)
## Directory Structure
```
data/
├── README.md       # This file
├── samples/        # Small sample datasets for testing
│   └── *.csv
├── examples/       # Synthetic example datasets
│   └── *.csv
└── schemas/        # Data format specifications
    └── *.json
```
Note: `/schemas` at the repository root contains JSON schemas for core concepts. This `data/schemas/` is for dataset format specifications.
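As a hedged sketch of how a dataset format specification in `data/schemas/` might be used — the column-name-to-dtype schema shape here is an illustrative assumption, not a repo standard:

```python
# Illustrative schema: maps expected column names to dtype strings.
SCHEMA = {"x": "float", "y": "float", "t": "float"}


def check_columns(columns: list[str], schema: dict[str, str]) -> list[str]:
    """Return schema columns missing from a dataset's header row."""
    return [name for name in schema if name not in columns]
```

A notebook could call `check_columns(list(df.columns), SCHEMA)` on a sample CSV before analysis and fail fast if the sample drifts from the documented format.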
## External Data Sources
### SIG Embeddings Dataset
- **Location:** `s3://blackroad-research/sig-embeddings/v2/`
- **Size:** ~10 GB (compressed)
- **Format:** Parquet files partitioned by date
- **Schema:** See `/schemas/sig.schema.json`
- **Access:** Requires AWS credentials with read access to `blackroad-research` bucket
- **Description:** Agent embeddings in spiral coordinate system with factor tree annotations
- **Last Updated:** 2025-11-20
### QLM Interaction Traces
- **Location:** `s3://blackroad-research/qlm-traces/`
- **Size:** ~5 GB
- **Format:** JSONL (JSON Lines)
- **Schema:** See `schemas/interaction-trace.json` (if created)
- **Access:** Requires credentials
- **Description:** Logged agent interactions for QLM analysis
- **Last Updated:** 2025-11-15
### Agent Worldline Dataset
- **Location:** External database (contact research team)
- **Size:** ~50 GB
- **Format:** PostgreSQL dump
- **Access:** Database credentials required
- **Description:** Historical agent worldlines with PS-SHA∞ hashes
- **Last Updated:** Ongoing
## Sample Data
Small samples from larger datasets are stored in `samples/` for:
- Quick testing in notebooks
- Example code demonstrations
- CI/CD pipeline testing
Example:
```python
import pandas as pd
# Load sample data (safe to commit)
df = pd.read_csv('data/samples/sig-sample-100.csv')
print(f"Sample has {len(df)} rows")
# Reference full dataset (not in repo)
# Full dataset: s3://blackroad-research/sig-embeddings/v2/
```
## Synthetic Data
The `examples/` directory contains **generated** synthetic data for:
- Demonstrations
- Tutorials
- Unit tests
- Public examples
These datasets are:
- Artificially created (not real data)
- Safe to share publicly
- Reproducible from code
Example:
```python
import numpy as np
import pandas as pd
# Generate synthetic spiral data
def generate_spiral_points(n=100, noise=0.1):
"""Generate synthetic points on a spiral."""
t = np.linspace(0, 4*np.pi, n)
r = np.exp(0.1 * t)
x = r * np.cos(t) + noise * np.random.randn(n)
y = r * np.sin(t) + noise * np.random.randn(n)
return pd.DataFrame({'x': x, 'y': y, 't': t, 'r': r})
# Save synthetic data
df = generate_spiral_points(n=200)
df.to_csv('data/examples/spiral-synthetic-200.csv', index=False)
```
## Adding External Data References
When you need to reference a new external dataset:
1. Add entry to this README under "External Data Sources"
2. Include:
   - Location (URL, S3 path, database connection)
   - Size
   - Format
   - Schema reference
   - Access requirements
   - Description
   - Last updated date
3. Optionally create a small sample in `samples/`
4. Document the schema in `/schemas` if it's a new format
## Privacy & Security
🔒 **Never commit:**
- Personally identifiable information (PII)
- API keys, tokens, passwords
- Proprietary datasets
- Production database contents
- Real user data
If you accidentally commit sensitive data:
1. **Immediately** delete the file
2. **Rotate** any exposed credentials
3. **Notify** the team
4. **Rewrite** git history if necessary (force push)
## Data Usage in Code
### Good Practice
```python
# Reference external data
DATA_CONFIG = {
    'source': 's3://blackroad-research/sig-embeddings/v2/',
    'format': 'parquet',
    'size_gb': 10,
    'description': 'Agent embeddings in spiral coordinates'
}
# Load small sample for testing
sample_df = pd.read_csv('data/samples/sig-sample-100.csv')
# Generate synthetic data for demonstrations
synthetic_df = generate_test_data(n_samples=1000)
```
### Bad Practice
```python
# Don't do this - loading huge dataset
# df = pd.read_parquet('data/full-embeddings-10gb.parquet') # ❌
# Don't do this - committing sensitive data
# api_key = "sk-1234567890" # ❌
# df.to_csv('data/user-private-data.csv') # ❌
```
## Related Documentation
- [Research Overview](../docs/research-overview.md)
- [Experiment Template](../docs/experiment-template.md)
- [Notebook Style Guide](../docs/notebook-style-guide.md)
- Root `/schemas` directory for JSON Schemas

docs/experiment-template.md Normal file

@@ -0,0 +1,180 @@
# Experiment Template
Use this template when creating a new experiment in the BlackRoad OS Research repository.
## Directory Structure
```
experiments/{domain}/experiment-YYYY-MM-DD-{descriptive-name}/
├── README.md        # This file (copy and customize)
├── config.json      # Experiment parameters
├── notebook.ipynb   # Analysis notebook (optional)
├── scripts/         # Supporting scripts
└── results/         # Output summaries and findings
    ├── summary.md
    └── metrics.json
```
## README Template
Copy the template below and customize for your experiment:
---
# Experiment: {Short Title}
**Date:** YYYY-MM-DD
**Domain:** {sig/qlm/agents/orchestration/etc.}
**Status:** {planned/running/completed/archived}
**Author:** {Name or Agent ID}
## 🎯 Research Question
What specific question are we exploring?
Example: "Can spiral coordinate distance metrics predict agent semantic similarity better than cosine similarity?"
## 🧰 Setup
### Data
- **Source:** {Describe data source}
- **Size:** {Number of samples, dimensions}
- **Format:** {CSV, JSON, synthetic, etc.}
- **Location:** {Path or external reference}
### Models/Tools
- **Model:** {If using a model, specify}
- **Libraries:** {Key dependencies}
- **Environment:** {Python version, etc.}
### Parameters
See `config.json` for full parameters. Key settings:
- Parameter 1: value
- Parameter 2: value
## 📊 Metrics & Evaluation
What signals are we tracking?
- **Primary Metric:** {e.g., F1 score, distance correlation}
- **Secondary Metrics:** {e.g., runtime, memory usage}
- **Success Criteria:** {What would validate the hypothesis?}
## 🔬 Methodology
Brief description of the experimental approach:
1. Step 1
2. Step 2
3. Step 3
## 📈 Results
### Key Findings
- Finding 1
- Finding 2
- Finding 3
### Quantitative Results
| Metric | Value | Baseline | Improvement |
|--------|-------|----------|-------------|
| {metric} | {val} | {baseline} | {%} |
### Visualizations
{Link to plots in results/ or describe findings}
## 💡 Insights & Interpretation
What did we learn?
- Insight 1
- Insight 2
- Unexpected observation 3
## 🚀 Next Steps
What should future experiments try?
- [ ] Follow-up experiment idea 1
- [ ] Variation to test 2
- [ ] Related question 3
## 🔗 Related Work
- **Papers:** Link to related papers in `/papers/`
- **Notebooks:** Link to analysis notebooks
- **Dependencies:** Link to experiments this builds on
## ⚠️ Limitations & Caveats
- Limitation 1
- Assumption 2
- Known issue 3
## 📚 References
- External paper 1
- External resource 2
---
## Config Template (config.json)
```json
{
  "experiment": {
    "id": "experiment-YYYY-MM-DD-name",
    "domain": "sig|qlm|agents|orchestration",
    "date": "YYYY-MM-DD",
    "status": "planned|running|completed"
  },
  "data": {
    "source": "path/to/data or description",
    "samples": 1000,
    "features": 128
  },
  "parameters": {
    "param1": "value1",
    "param2": 42,
    "param3": true
  },
  "environment": {
    "python_version": "3.11",
    "key_packages": [
      "numpy==1.24.0",
      "pandas==2.0.0"
    ]
  }
}
```
## Results Template (results/summary.md)
```markdown
# Experiment Results Summary
## Date: YYYY-MM-DD
## Quick Stats
- Total runs: X
- Success rate: Y%
- Average runtime: Z seconds
## Main Findings
1. Finding 1 with supporting data
2. Finding 2 with evidence
3. Finding 3 with analysis
## Recommendations
Based on these results, we recommend:
- Action 1
- Action 2
- Further investigation into 3
```

docs/notebook-style-guide.md Normal file

@@ -0,0 +1,288 @@
# Notebook Style Guide
Guidelines for creating and maintaining Jupyter notebooks in the BlackRoad OS Research repository.
## 📋 General Principles
1. **Self-Contained**: Each notebook should be understandable on its own
2. **Reproducible**: Include all dependencies and setup instructions
3. **Documented**: Use markdown cells liberally to explain your thinking
4. **Minimal**: Avoid large outputs; summarize results instead
5. **Linked**: Cross-reference related papers and experiments
## 📁 Organization
### Directory Structure
Place notebooks in theme-specific subdirectories:
```
notebooks/
├── sig/             # Spiral Information Geometry
├── qlm/             # Quantum Language Models
├── orchestration/   # Orchestration strategies
└── agents/          # Agent behavior and identity
```
### Naming Convention
Use descriptive, lowercase names with hyphens:
```
notebooks/sig/spiral-distance-metrics.ipynb
notebooks/qlm/entanglement-proxy-exploration.ipynb
notebooks/orchestration/contradiction-resolution.ipynb
```
## 📝 Notebook Structure
### 1. Header Cell (Markdown)
Every notebook should start with a header cell:
```markdown
# {Title of Notebook}
**Date:** YYYY-MM-DD
**Author:** {Name/Agent}
**Domain:** {sig/qlm/agents/orchestration}
**Status:** {exploration/draft/validated}
## Purpose
Brief description of what this notebook explores.
## Dependencies
- Python 3.11+
- numpy 1.24+
- pandas 2.0+
- matplotlib 3.7+
- {any custom packages}
Install with:
\```bash
pip install -r requirements.txt
\```
## Data Sources
- Dataset 1: {description and location}
- Dataset 2: {synthetic, generated in cell X}
## Related Work
- Paper: `/papers/sig/spiral-geometry.md`
- Experiment: `/experiments/sig/experiment-2025-11-24-distance/`
```
### 2. Imports Cell (Code)
Group imports logically:
```python
# Standard library
import os
import sys
from pathlib import Path

# Data handling
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Custom utilities
sys.path.append('../src')
from geometry.spiral import SpiralCoordinate
from metrics.distance import spiral_distance
```
### 3. Configuration Cell (Code)
Centralize configuration:
```python
# Configuration
CONFIG = {
    'data_path': '../data/samples/',
    'output_path': './outputs/',
    'random_seed': 42,
    'n_samples': 1000,
    'dimensions': 128
}
# Set random seeds for reproducibility
np.random.seed(CONFIG['random_seed'])
```
### 4. Analysis Sections (Markdown + Code)
Use clear section headers:
```markdown
## 1. Data Loading and Preparation
Description of this section...
```
```python
# Load data
df = pd.read_csv(CONFIG['data_path'] + 'dataset.csv')
print(f"Loaded {len(df)} samples with {len(df.columns)} features")
```
## 🎨 Style Guidelines
### Markdown Cells
- Use headers (`##`, `###`) to structure content
- Include brief explanations before code blocks
- Summarize findings after visualizations
- Use bullet points for lists
- Include LaTeX for mathematical notation: `$x = y$` or `$$E = mc^2$$`
### Code Cells
- Keep cells focused on a single task
- Add comments for complex operations
- Use descriptive variable names
- Follow PEP 8 style guidelines
- Limit cell length (aim for < 20 lines when possible)
### Visualizations
- Always label axes
- Include titles
- Use consistent color schemes
- Add legends when needed
- Save important plots to `results/` folder
```python
plt.figure(figsize=(10, 6))
plt.plot(x, y, label='Metric A')
plt.xlabel('Dimension')
plt.ylabel('Distance')
plt.title('Spiral Distance Metrics Comparison')
plt.legend()
plt.savefig('./results/distance-comparison.png', dpi=150, bbox_inches='tight')
plt.show()
```
## 🔬 Data Handling
### Using Small Datasets
- Prefer synthetic or sampled data
- Load only what you need
- Document external data sources (don't commit large files)
```python
# Good: Use small sample
df_sample = df.sample(n=1000, random_state=42)
# Good: Generate synthetic data
synthetic_data = generate_spiral_points(n=500)
# Bad: Load entire 10GB dataset
# df_full = pd.read_csv('huge_dataset.csv') # Don't do this!
```
### External Data References
```python
# Reference external data (don't include in repo)
DATA_SOURCE = {
    'name': 'SIG Embeddings Dataset v2',
    'location': 's3://blackroad-research/sig-embeddings/',
    'size': '10GB',
    'format': 'parquet',
    'description': 'Agent embeddings in spiral coordinates'
}
print(f"This notebook uses: {DATA_SOURCE['name']}")
print(f"Access at: {DATA_SOURCE['location']}")
```
## 📊 Output Management
### Clearing Outputs
Before committing:
- Clear all cell outputs if they're large
- Keep only small, essential outputs
- Use "Cell > All Output > Clear" before committing
### Saving Results
Save important results to files:
```python
# Save summary statistics
import json

results = {
    'mean_distance': float(distances.mean()),
    'std_distance': float(distances.std()),
    'n_samples': len(distances)
}

with open('./results/summary.json', 'w') as f:
    json.dump(results, f, indent=2)
```
## 🔗 Cross-Referencing
Link to related work:
```markdown
## Related Work
This analysis supports the theory in:
- [Spiral Information Geometry Overview](/papers/spiral-information-geometry/sig-overview.md)
- [Experiment: Distance Metrics](/experiments/sig/experiment-2025-11-24-distance/)
See also:
- Notebook: [Factor Tree Visualization](./factor-tree-viz.ipynb)
```
## ✅ Pre-Commit Checklist
Before committing a notebook:
- [ ] Header cell includes purpose, dependencies, and data sources
- [ ] Imports are organized and minimal
- [ ] Configuration is centralized
- [ ] Cells are documented with markdown
- [ ] Large outputs are cleared
- [ ] Visualizations have labels and titles
- [ ] Results are saved to files (not just displayed)
- [ ] No secrets or API keys in code
- [ ] No large datasets committed
- [ ] Cross-references to papers/experiments are included
## 🚫 Common Pitfalls to Avoid
1. **Massive Outputs**: Don't commit notebooks with 100+ MB of outputs
2. **Hardcoded Paths**: Use relative paths or configuration variables
3. **Secrets**: Never include API keys, tokens, or passwords
4. **Large Data**: Don't load or commit huge datasets
5. **Unclear Purpose**: Always explain why the notebook exists
6. **No Documentation**: Code-only cells without context
7. **Irreproducible**: Missing dependencies or setup steps
## 💡 Tips & Best Practices
1. **Restart & Run All**: Test your notebook with "Restart Kernel and Run All Cells" before committing
2. **Use Virtual Environments**: Document your environment setup
3. **Version Key Results**: Save timestamped outputs for important findings
4. **Modularize**: Extract reusable code to `/src` utilities
5. **Link to Papers**: Every notebook should relate to theory
6. **Document Failures**: If an approach didn't work, explain why
7. **Include Next Steps**: End with what should be tried next
## 📚 Resources
- [Jupyter Best Practices](https://jupyter-notebook.readthedocs.io/)
- [PEP 8 Style Guide](https://pep8.org/)
- [NumPy Docstring Guide](https://numpydoc.readthedocs.io/)

docs/research-overview.md Normal file

@@ -0,0 +1,121 @@
# Research Overview
Welcome to the **BlackRoad OS Research Lab** 🧪🧠
This repository serves as the R&D hub for BlackRoad OS, containing theory, experiments, and conceptual foundations that inform production implementations.
## 📁 Repository Structure
### Core Directories
- **`/papers`** - Research papers and theoretical foundations
  - Text-based research (Markdown, LaTeX)
  - Organized by domain (SIG, QLM, agents, PS-SHA∞, etc.)
  - Each paper should include: abstract, motivation, definitions, and links to experiments
- **`/notebooks`** - Jupyter notebooks and exploratory scripts
  - `sig/` - Spiral Information Geometry explorations
  - `qlm/` - Quantum Language Model experiments
  - `orchestration/` - Orchestration strategy prototypes
  - Each notebook should start with a header cell explaining purpose and dependencies
- **`/experiments`** - Structured experiment folders
  - Use format: `domain/experiment-YYYY-MM-DD-name/`
  - Each experiment must include:
    - `README.md` (what, why, how, results, next steps)
    - `config.json` or `config.yaml` (parameters)
    - `results/` (summaries, small plots, findings)
- **`/src`** - Reusable research utilities (NOT production code)
  - Geometry helpers
  - Metrics and evaluation tools
  - Model wrappers
  - Must be typed and documented
- **`/data`** - Metadata and references only
  - NO large datasets in the repo
  - Document external data sources in `data/README.md`
  - Only small samples or synthetic data
- **`/schemas`** - JSON Schemas for core concepts
- **`/glossary`** - Canonical definitions and symbol tables
- **`/meta`** - Repository-level documentation and conventions
## 🎯 What Belongs Here
**YES:**
- Theoretical papers and conceptual models
- Experimental prototypes and simulations
- Research notebooks with analysis
- Schemas and formal specifications
- Glossaries and reference materials
- Small synthetic datasets
**NO:**
- Production API implementations
- Live services or web UIs
- Large datasets or binaries
- Secrets or API keys
- Messy, unstructured dumps
## 📝 Adding New Research
### Adding a Paper
1. Create a new file in `/papers/{domain}/`
2. Use frontmatter with: title, date, status, tags, version
3. Include: abstract, motivation, definitions, related work
4. Link to supporting experiments and notebooks
5. Add entry to `/index.md`
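A minimal frontmatter skeleton for step 2 might look like this (field values are placeholders):

```markdown
---
title: Spiral Information Geometry v1
date: YYYY-MM-DD
status: draft
tags: [sig, geometry]
version: 0.1
---

# Spiral Information Geometry v1

## Abstract
...
```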
### Adding an Experiment
1. Create folder: `/experiments/{domain}/experiment-{date}-{name}/`
2. Add `README.md` answering:
   - What question are we exploring?
   - What setup/data/models were used?
   - What metrics did we track?
   - What did we learn?
   - What should we try next?
3. Include `config.json` with parameters
4. Store results in `results/` subdirectory
5. Reference from related papers
### Adding a Notebook
1. Create in `/notebooks/{domain}/`
2. Start with a header cell explaining:
   - Purpose
   - Dependencies
   - Data sources
3. Use small datasets or samples
4. Avoid committing large outputs
5. Reference from experiments or papers
## 🔗 Cross-Repository Relationships
This repository **informs** but does not directly implement:
- **blackroad-os-core** - Types and core contracts inspired by research
- **blackroad-os-operator** - Orchestration strategies tested here
- **blackroad-os-docs** - User-facing documentation of concepts
Use text references, not code dependencies.
## ✅ Pre-Commit Checklist
Before committing:
- [ ] Files are text-based (no huge binaries)
- [ ] New experiments have clear README and config
- [ ] Notebooks have explanatory headers
- [ ] No secrets, API keys, or private data
- [ ] Research code is typed and documented
- [ ] Papers and experiments are cross-linked
## 🧭 Need Help?
- See `/docs/experiment-template.md` for experiment structure
- See `/docs/notebook-style-guide.md` for notebook conventions
- Check `/index.md` for a curated research map
- Review `/glossary/concepts.md` for standard definitions

notebooks/README.md Normal file

@@ -0,0 +1,69 @@
# Notebooks
This directory contains Jupyter notebooks and exploratory analysis scripts for BlackRoad OS research.
## Organization
Notebooks are organized by research domain:
- **`sig/`** - Spiral Information Geometry explorations
  - Coordinate transforms
  - Distance metrics
  - Factor tree visualizations
  - Embedding analysis
- **`qlm/`** - Quantum Language Model experiments
  - Entanglement proxies
  - Superposition models
  - Interference patterns
- **`orchestration/`** - Orchestration strategy prototypes
  - Contradiction resolution
  - Agent coordination
  - Workflow experiments
## Style Guide
Please follow the [Notebook Style Guide](../docs/notebook-style-guide.md) when creating notebooks.
### Quick Checklist
Every notebook should have:
- [ ] Header cell with purpose, dependencies, and data sources
- [ ] Organized imports
- [ ] Centralized configuration
- [ ] Markdown cells explaining each section
- [ ] Labeled visualizations
- [ ] Small datasets (or references to external data)
- [ ] Cross-references to related papers/experiments
## Creating a New Notebook
1. Choose the appropriate domain subdirectory
2. Name your notebook descriptively: `descriptive-name.ipynb`
3. Copy the header template from the style guide
4. Document your dependencies
5. Link to related papers and experiments
## Data
Notebooks should use:
- Small synthetic datasets
- Sampled data (< 1000 rows typically)
- References to external data sources (documented in code)
Never commit large datasets directly in notebooks.
## Outputs
Clear large outputs before committing:
- Use "Cell > All Output > Clear" for notebooks with heavy outputs
- Save important results to files instead of displaying in cells
- Keep only essential visualizations embedded
## Related Documentation
- [Research Overview](../docs/research-overview.md)
- [Notebook Style Guide](../docs/notebook-style-guide.md)
- [Experiment Template](../docs/experiment-template.md)

notebooks/orchestration/README.md Normal file

@@ -0,0 +1,21 @@
# Orchestration Notebooks
Jupyter notebooks exploring **orchestration strategies** for agent coordination.
## Topics
- Contradiction detection and resolution
- Multi-agent coordination patterns
- Truth aggregation mechanisms
- Workflow optimization
- Agent routing and selection strategies
## Getting Started
See the parent [Notebooks README](../README.md) and [Notebook Style Guide](../../docs/notebook-style-guide.md) for guidelines.
## Related
- Theory: `/papers/agent-architecture/`
- Experiments: `/experiments/contradiction-sim/`
- Lucidia architecture: `/lucidia/architecture.md`

notebooks/qlm/README.md Normal file

@@ -0,0 +1,21 @@
# QLM Notebooks
Jupyter notebooks exploring **Quantum Language Model** concepts.
## Topics
- Entanglement proxies for semantic relationships
- Superposition models for multi-state representations
- Interference patterns in language understanding
- Quantum-inspired optimization approaches
- Measurement and collapse analogies
## Getting Started
See the parent [Notebooks README](../README.md) and [Notebook Style Guide](../../docs/notebook-style-guide.md) for guidelines.
## Related
- Theory: `/papers/qlm/` (when available)
- Experiments: `/experiments/qlm/`
- Source code: `/src/models/`

notebooks/sig/README.md Normal file

@@ -0,0 +1,21 @@
# SIG Notebooks
Jupyter notebooks exploring **Spiral Information Geometry** concepts.
## Topics
- Coordinate transformations between Cartesian and spiral systems
- Distance and similarity metrics in spiral space
- Factor tree projections and prime decomposition
- Embedding analysis and visualization
- Interference patterns in spiral geometry
## Getting Started
See the parent [Notebooks README](../README.md) and [Notebook Style Guide](../../docs/notebook-style-guide.md) for guidelines.
## Related
- Theory: `/papers/spiral-information-geometry/`
- Experiments: `/experiments/sig/`
- Source code: `/src/geometry/`

src/README.md Normal file

@@ -0,0 +1,213 @@
# Research Source Code
This directory contains **reusable research utilities** for BlackRoad OS experiments and analysis.
⚠️ **Important**: This is NOT production code. Code here is for research purposes and must be hardened before moving to production repositories like `blackroad-os-core`.
## Purpose
Provide modular, well-documented utilities that:
- Reduce code duplication across experiments
- Implement research-specific algorithms
- Support notebooks and simulations
- Can inspire production implementations
## Organization
Organize code by functional domain:
```
src/
├── geometry/ # Spiral coordinate systems, transformations
│ ├── spiral.py
│ ├── coordinates.py
│ └── projections.py
├── metrics/ # Custom evaluation metrics
│ ├── distance.py
│ ├── similarity.py
│ └── coherence.py
├── models/ # Lightweight model wrappers
│ ├── qlm.py
│ └── embeddings.py
├── visualization/ # Plotting and rendering utilities
│ ├── spiral_plots.py
│ └── factor_trees.py
└── utils/ # General helpers
├── data.py
└── config.py
```
## Code Standards
### Required
1. **Type Hints**: Use Python type hints everywhere
```python
def spiral_distance(p1: SpiralCoordinate, p2: SpiralCoordinate) -> float:
"""Calculate distance between two points in spiral space."""
pass
```
2. **Documentation**: Every function needs a docstring
```python
def transform_to_spiral(x: np.ndarray, y: np.ndarray) -> SpiralCoordinate:
"""
Convert Cartesian coordinates to spiral coordinates.
Args:
x: X-axis values
y: Y-axis values
Returns:
SpiralCoordinate with (r, theta, tau) components
"""
pass
```
3. **Small & Composable**: Keep functions focused
- Single responsibility principle
- Aim for < 50 lines per function
- Compose complex operations from simple ones
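One way to read "compose complex operations from simple ones" (the functions below are illustrative, not part of `/src/metrics/`):

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (single responsibility)."""
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Compose two small steps: normalize each input, then dot them."""
    return float(normalize(a) @ normalize(b))

print(cosine(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.7071
```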
4. **No Secrets**: Never hardcode credentials or environment-specific URLs
```python
# Good
def get_model(api_key: str, endpoint: str):
pass
# Bad
API_KEY = "sk-1234..." # Don't do this!
```
### Recommended
- Use dataclasses or Pydantic models for structured data
- Include examples in docstrings
- Add simple unit tests in a `tests/` subdirectory (optional)
- Use meaningful variable names
## Example: Good Research Code
```python
from dataclasses import dataclass
from typing import Tuple
import numpy as np
@dataclass
class SpiralCoordinate:
"""Point in spiral information geometry space."""
r: float # Radial distance from origin
theta: float # Angular position (radians)
tau: float # Spiral time parameter
def to_cartesian(self) -> Tuple[float, float]:
"""Convert to Cartesian coordinates (x, y)."""
x = self.r * np.cos(self.theta)
y = self.r * np.sin(self.theta)
return (x, y)
def spiral_distance(p1: SpiralCoordinate, p2: SpiralCoordinate) -> float:
"""
Calculate geodesic distance in spiral space.
Uses the natural spiral metric accounting for both
radial separation and angular unwinding.
Args:
p1: First point
p2: Second point
Returns:
Distance in spiral space units
Example:
>>> p1 = SpiralCoordinate(r=1.0, theta=0.0, tau=0.0)
>>> p2 = SpiralCoordinate(r=2.0, theta=np.pi/2, tau=1.0)
>>> dist = spiral_distance(p1, p2)
"""
dr = p2.r - p1.r
dtheta = np.abs(p2.theta - p1.theta)
dtau = p2.tau - p1.tau
# Spiral metric (simplified)
return np.sqrt(dr**2 + (p1.r * dtheta)**2 + dtau**2)
```
## Using Research Code
### In Notebooks
```python
import sys
sys.path.append('../../src')  # notebooks live in notebooks/<topic>/, so src/ is two levels up
from geometry.spiral import SpiralCoordinate, spiral_distance
from metrics.similarity import cosine_similarity
```
### In Experiments
```python
import sys
from pathlib import Path
# Add repo-root src/ to the path (the root is three levels up from an
# experiment dir like experiments/sig/experiment-YYYY-MM-DD-name/)
src_path = Path(__file__).resolve().parents[3] / 'src'
sys.path.append(str(src_path))
from geometry.spiral import SpiralCoordinate
```
## Dependencies
Keep dependencies minimal and document them:
- Create `requirements.txt` if Python packages are needed
- Pin versions for reproducibility
- Only include what's actually used
Example `requirements.txt`:
```
numpy>=1.24.0,<2.0.0
pandas>=2.0.0,<3.0.0
scipy>=1.11.0
```
## Migration to Production
When research code is ready for production:
1. **Review & Harden**: Add comprehensive error handling
2. **Test**: Add full test coverage
3. **Optimize**: Profile and optimize if needed
4. **Document**: Write production-grade documentation
5. **Port**: Move to appropriate production repo (e.g., `blackroad-os-core`)
6. **Reference**: Update research code with link to production version
Example:
```python
def spiral_distance(p1: SpiralCoordinate, p2: SpiralCoordinate) -> float:
"""
Calculate spiral distance.
Note: Production version available in blackroad-os-core:
https://github.com/BlackRoad-OS/blackroad-os-core/blob/main/src/geometry/spiral.ts
This research version is kept for reference and experimentation.
"""
pass
```
## What NOT to Include
- Large model weights or checkpoints
- API keys or secrets
- Production service implementations
- Complex build systems
- Heavy dependencies (avoid PyTorch/TensorFlow unless essential)
## Related Documentation
- [Research Overview](../docs/research-overview.md)
- [Experiment Template](../docs/experiment-template.md)
- [Notebook Style Guide](../docs/notebook-style-guide.md)