Files
blackroad-operating-system/infra/README.md
Claude abdbc764e6 Establish BlackRoad OS infrastructure control plane
Add comprehensive infrastructure management system to centralize all service
definitions, deployment configurations, and operational tooling.

## New Infrastructure Components

### 1. Service Manifest (infra/blackroad-manifest.yml)
- Complete catalog of all active and planned services
- Deployment configuration for each service
- Environment variable definitions
- Domain mappings and routing
- Database and cache dependencies
- Health check endpoints
- CI/CD integration specifications

### 2. Operations CLI (scripts/br_ops.py)
- Command-line tool for managing all BlackRoad services
- Commands: list, env, repo, open, status, health
- Reads from service manifest for unified operations
- Colored terminal output for better readability

### 3. Service Analysis Documents (infra/analysis/)
- Detailed technical analysis for each service
- Active services:
  - blackroad-backend.md (FastAPI backend)
  - postgres.md (PostgreSQL database)
  - redis.md (Redis cache)
  - docs-site.md (MkDocs documentation)
- Planned services:
  - blackroad-api.md (API gateway - Phase 2)
  - prism-console.md (Admin console - Phase 2)

### 4. Infrastructure Templates (infra/templates/)
- railway.toml.template - Railway deployment config
- railway.json.template - Alternative Railway config
- Dockerfile.fastapi.template - Multi-stage FastAPI Dockerfile
- github-workflow-railway-deploy.yml.template - CI/CD workflow
- .env.example.template - Comprehensive env var template

### 5. Documentation (infra/README.md)
- Complete guide to infrastructure control plane
- Usage instructions for ops CLI
- Service manifest documentation
- Deployment procedures
- Troubleshooting guide
- Phase 2 migration plan

## Architecture

This establishes BlackRoad-Operating-System as the canonical control plane
for all BlackRoad services, both current and planned:

**Phase 1 (Active)**:
- blackroad-backend (FastAPI + static UI)
- postgres (Railway managed)
- redis (Railway managed)
- docs-site (GitHub Pages)

**Phase 2 (Planned)**:
- blackroad-api (API gateway)
- blackroad-prism-console (Admin UI)
- blackroad-agents (Orchestration)
- blackroad-web (Marketing site)

**Phase 3 (Future)**:
- lucidia (AI orchestration)
- Additional microservices

## Usage

# List all services
python scripts/br_ops.py list

# Show environment variables
python scripts/br_ops.py env blackroad-backend

# Show repository info
python scripts/br_ops.py repo blackroad-backend

# Show service URL
python scripts/br_ops.py open blackroad-backend prod

# Show overall status
python scripts/br_ops.py status

# Show health checks
python scripts/br_ops.py health blackroad-backend

## Benefits

1. **Single Source of Truth**: All service configuration in one manifest
2. **Unified Operations**: One CLI for all services
3. **Documentation**: Comprehensive per-service analysis
4. **Templates**: Reusable infrastructure patterns
5. **Migration Ready**: Clear path to Phase 2 microservices

## References

- MASTER_ORCHESTRATION_PLAN.md - 7-layer architecture
- ORG_STRUCTURE.md - Repository strategy
- PRODUCTION_STACK_AUDIT_2025-11-18.md - Current state

Implemented by: Atlas (AI Infrastructure Orchestrator)
Date: 2025-11-19
2025-11-19 21:04:14 +00:00

632 lines
16 KiB
Markdown

# BlackRoad OS Infrastructure Control Plane
> **Last Updated**: 2025-11-19
> **Maintained By**: Atlas (AI Infrastructure Orchestrator) + Alexa Louise Amundson
> **Version**: 1.0
---
## 📋 Overview
This directory contains the **infrastructure control plane** for all BlackRoad OS services. It serves as the single source of truth for service definitions, deployment configurations, and operational tooling.
### Key Components
1. **`blackroad-manifest.yml`** - Complete service catalog with configuration
2. **`analysis/`** - Per-service technical analysis documents
3. **`templates/`** - Reusable infrastructure templates
4. **`../scripts/br_ops.py`** - Command-line operations tool
5. **`cloudflare/`** - DNS and CDN configuration
6. **`env/`** - Environment variable mapping
---
## 🗂️ Directory Structure
```
infra/
├── README.md # This file
├── blackroad-manifest.yml # Service manifest (SSOT)
├── analysis/ # Service analyses
│ ├── blackroad-backend.md # Active: Main backend
│ ├── postgres.md # Active: Database
│ ├── redis.md # Active: Cache
│ ├── docs-site.md # Active: Documentation
│ ├── blackroad-api.md # Planned: API gateway
│ └── prism-console.md # Planned: Admin console
├── templates/ # Infrastructure templates
│ ├── railway.toml.template # Railway config
│ ├── railway.json.template # Railway JSON config
│ ├── Dockerfile.fastapi.template # FastAPI Dockerfile
│ ├── github-workflow-railway-deploy.yml.template
│ └── .env.example.template # Environment variables
├── cloudflare/ # DNS & CDN
│ ├── records.yaml # DNS records
│ ├── CLOUDFLARE_DNS_BLUEPRINT.md # DNS setup guide
│ └── migrate_to_cloudflare.md # Migration guide
└── env/ # Environment mapping
└── ENVIRONMENT_MAP.md # Cross-platform env vars
```
---
## 🚀 Quick Start
### Using the Ops CLI
The `br_ops.py` CLI tool provides unified operations across all services:
```bash
# List all services
python scripts/br_ops.py list
# Show environment variables for a service
python scripts/br_ops.py env blackroad-backend
# Show repository information
python scripts/br_ops.py repo blackroad-backend
# Show service URL
python scripts/br_ops.py open blackroad-backend prod
# Show overall status
python scripts/br_ops.py status
# Show health check commands
python scripts/br_ops.py health blackroad-backend
# Show help
python scripts/br_ops.py help
```
### Example Output
```
$ python scripts/br_ops.py list
BLACKROAD OS SERVICES
================================================================================
🟢 ACTIVE SERVICES
--------------------------------------------------------------------------------
blackroad-backend
Type: backend
Repo: blackboxprogramming/BlackRoad-Operating-System
Domain: blackroad.systems
Project: blackroad-core
Phase: 1
postgres
Type: database
Domain: N/A
Project: blackroad-core
Phase: 1
redis
Type: cache
Domain: N/A
Project: blackroad-core
Phase: 1
📋 PLANNED SERVICES (Future)
--------------------------------------------------------------------------------
public-api
Type: api-gateway
Repo: blackboxprogramming/blackroad-api
Target Date: 2026-Q2
Project: blackroad-api
Total Services: 10
Active: 4
Development: 1
Planned: 5
```
---
## 📖 Service Manifest
### What is `blackroad-manifest.yml`?
The manifest is a YAML file that defines:
- All active and planned services
- Deployment configuration
- Environment variables
- Domain mappings
- Dependencies (databases, caches, etc.)
- Health check endpoints
- CI/CD integration
### Manifest Structure
```yaml
version: "1.0"
workspace: "BlackRoad OS, Inc."
# Deployment state
deployment_state:
phase: "Phase 1 - Monolith"
strategy: "Monorepo with consolidation"
active_services: 3
planned_services: 11
# Domain configuration
domains:
primary: "blackroad.systems"
api: "api.blackroad.systems"
prism: "prism.blackroad.systems"
docs: "docs.blackroad.systems"
# Active projects
projects:
blackroad-core:
description: "Core backend API + static UI"
services:
blackroad-backend:
repo: "blackboxprogramming/BlackRoad-Operating-System"
kind: "backend"
language: "python"
framework: "FastAPI 0.104.1"
# ... detailed configuration ...
# Planned projects
planned_projects:
blackroad-api:
description: "Public API gateway"
status: "planned"
phase: 2
# ... configuration ...
```
### When to Update the Manifest
Update `blackroad-manifest.yml` when:
- ✅ Adding a new service
- ✅ Changing environment variables
- ✅ Updating domain routing
- ✅ Modifying deployment configuration
- ✅ Changing service dependencies
- ✅ Adding or removing health endpoints
---
## 📊 Service Analysis Documents
Each service has a detailed analysis document in `analysis/`:
### Active Services
- **`blackroad-backend.md`** - Main FastAPI backend (33+ routers)
- **`postgres.md`** - PostgreSQL database (Railway managed)
- **`redis.md`** - Redis cache (Railway managed)
- **`docs-site.md`** - MkDocs documentation (GitHub Pages)
### Planned Services
- **`blackroad-api.md`** - Future API gateway (Phase 2)
- **`prism-console.md`** - Admin console UI (Phase 2)
### Analysis Document Contents
Each analysis includes:
- Overview and purpose
- Technology stack
- Current endpoints/features
- Infrastructure configuration
- Database schema (if applicable)
- Security configuration
- Environment variables
- Monitoring & observability
- Performance benchmarks
- Testing strategy
- Development workflow
- Deployment process
- Rollback procedures
- Future enhancements
---
## 🛠️ Infrastructure Templates
### Available Templates
#### 1. `railway.toml.template`
Railway deployment configuration.
**Usage**:
```bash
cp infra/templates/railway.toml.template ./railway.toml
# Edit and customize for your service
```
#### 2. `railway.json.template`
Alternative Railway configuration (JSON format).
**Usage**:
```bash
cp infra/templates/railway.json.template ./railway.json
# Edit and customize
```
#### 3. `Dockerfile.fastapi.template`
Multi-stage Dockerfile optimized for FastAPI services.
**Features**:
- Multi-stage build (smaller image)
- Non-root user (security)
- Health check integrated
- Optimized layer caching
**Usage**:
```bash
cp infra/templates/Dockerfile.fastapi.template ./Dockerfile
# Customize for your service
```
#### 4. `github-workflow-railway-deploy.yml.template`
GitHub Actions workflow for automated Railway deployment.
**Features**:
- Branch-based environment routing
- Health check verification
- Failure notifications
- Manual workflow dispatch
**Usage**:
```bash
cp infra/templates/github-workflow-railway-deploy.yml.template \
.github/workflows/railway-deploy.yml
# Edit secrets and configuration
```
#### 5. `.env.example.template`
Comprehensive environment variable template.
**Usage**:
```bash
cp infra/templates/.env.example.template ./.env.example
# Document your service's required env vars
```
---
## 🌍 Domain & DNS Configuration
### Current Domain Mapping
| Domain | Points To | Service | Status |
|--------|-----------|---------|--------|
| `blackroad.systems` | Railway backend | blackroad-backend | ✅ Active |
| `docs.blackroad.systems` | GitHub Pages | docs-site | ✅ Active |
| `api.blackroad.systems` | TBD | public-api | 📋 Planned (Phase 2) |
| `prism.blackroad.systems` | TBD | prism-console-web | 📋 Planned (Phase 2) |
| `console.blackroad.systems` | TBD | prism-console-web | 📋 Planned (Phase 2) |
| `agents.blackroad.systems` | TBD | agents-api | 📋 Planned (Phase 2) |
| `lucidia.earth` | TBD | lucidia-api | 📋 Planned (Phase 3) |
### DNS Management
DNS is managed via **Cloudflare**. See:
- `infra/cloudflare/CLOUDFLARE_DNS_BLUEPRINT.md` - Complete DNS setup
- `infra/cloudflare/records.yaml` - Current DNS records
---
## 🔐 Environment Variables
### Environment Variable Management
Environment variables are:
1. **Documented** in `infra/env/ENVIRONMENT_MAP.md`
2. **Defined** in manifest (`blackroad-manifest.yml`)
3. **Templated** in `.env.example.template`
4. **Set** in Railway dashboard (production)
5. **Accessed** via `br_ops.py env <service>`
### Viewing Required Variables
```bash
# Show all required env vars for a service
python scripts/br_ops.py env blackroad-backend
```
Output:
```
🔴 REQUIRED (Must Set)
--------------------------------------------------------------------------------
DATABASE_URL
Description: PostgreSQL connection string
Source: ${{Postgres.DATABASE_URL}}
Example: postgresql+asyncpg://user:pass@host:5432/blackroad
SECRET_KEY
Description: JWT signing key
Generate: openssl rand -hex 32
Secret: Yes (keep secure!)
...
```
### Setting Variables in Railway
1. Railway Dashboard → Project → Service
2. Variables tab
3. Add each required variable
4. Use `${{Postgres.DATABASE_URL}}` syntax for references
---
## 📦 Deployment
### Current Deployment Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ PRODUCTION STACK (Phase 1) │
└─────────────────────────────────────────────────────────────┘
Cloudflare CDN
Railway Backend (blackroad-backend)
├── FastAPI Application
├── Postgres Database
└── Redis Cache
GitHub Pages
└── Documentation (docs-site)
```
### Deployment Process
#### Automatic (Recommended)
1. Create feature branch
2. Make changes
3. Open PR → CI runs
4. Merge to main → Auto-deploy to Railway
5. Monitor Railway logs
#### Manual (Emergency)
```bash
# Using Railway CLI
railway login
railway link <PROJECT_ID>
railway up --service blackroad-backend
railway logs --tail 100
# Verify deployment
curl https://blackroad.systems/health
```
### Health Checks
```bash
# View health check commands
python scripts/br_ops.py health blackroad-backend
# Example output:
# curl https://blackroad.systems/health
# curl https://blackroad.systems/api/health/summary
# curl https://blackroad.systems/api/system/version
```
---
## 🔄 Phase 2 Migration Plan
### Timeline
- **Q1 2026**: Create `blackroad.io` marketing site
- **Q1 2026**: Extract Prism Console to standalone service
- **Q2 2026**: Extract API gateway (`blackroad-api`)
- **Q2 2026**: Deploy agents runtime (`blackroad-operator`)
### Migration Strategy
1. **Keep monolith running** during extraction
2. **Blue-green deployment** with DNS switching
3. **Monitor error rates** for 24 hours before cutover
4. **Document rollback procedures**
### Extraction Tools
```bash
# Extract API gateway
git subtree split --prefix=backend/app/routers --branch=api-split
cd ../blackroad-api
git pull ../BlackRoad-Operating-System api-split
# Deploy new service
railway up --service blackroad-api
# Update DNS
# Cloudflare: api.blackroad.systems → new service
```
---
## 📚 Documentation References
### Primary Docs
- **MASTER_ORCHESTRATION_PLAN.md** - Complete 7-layer architecture
- **ORG_STRUCTURE.md** - Repository organization strategy
- **PRODUCTION_STACK_AUDIT_2025-11-18.md** - Current production state
- **BLACKROAD_OS_BIG_KAHUNA_VISION.md** - Long-term roadmap
- **CLAUDE.md** - AI assistant guide
### Infrastructure Docs
- **infra/cloudflare/CLOUDFLARE_DNS_BLUEPRINT.md** - DNS configuration
- **infra/env/ENVIRONMENT_MAP.md** - Environment variables
- **DEPLOYMENT_NOTES.md** - Production deployment guide
---
## 🛡️ Security Best Practices
### 1. Environment Variables
- ✅ Never commit `.env` files
- ✅ Use Railway secrets for sensitive data
- ✅ Generate secure keys: `openssl rand -hex 32`
- ✅ Rotate secrets quarterly
### 2. Docker Security
- ✅ Use non-root user in containers
- ✅ Multi-stage builds to reduce attack surface
- ✅ Scan images for vulnerabilities
- ✅ Pin dependency versions
### 3. API Security
- ✅ JWT authentication with short expiry
- ✅ Rate limiting on public endpoints
- ✅ Input validation with Pydantic
- ✅ CORS properly configured
### 4. Database Security
- ✅ Use connection pooling
- ✅ Encrypted connections (SSL/TLS)
- ✅ Regular backups (Railway managed)
- ✅ Access control via environment
---
## 🔧 Troubleshooting
### Common Issues
#### 1. Deployment Fails
```bash
# Check Railway logs
railway logs --service blackroad-backend --tail 100
# Verify environment variables
railway variables list
# Test locally
docker build -t test .
docker run -p 8000:8000 test
curl http://localhost:8000/health
```
#### 2. Service Won't Start
- ✅ Check `DATABASE_URL` and `REDIS_URL` are set
- ✅ Verify `SECRET_KEY` is generated
- ✅ Check port configuration (`$PORT`)
- ✅ Review startup logs for errors
#### 3. Health Check Failing
```bash
# Check endpoint directly
curl -v https://blackroad.systems/health
# Verify Railway health check settings
# Dashboard → Service → Settings → Health Check
```
#### 4. DNS Issues
- ✅ Verify CNAME in Cloudflare dashboard
- ✅ Check proxy status (orange cloud)
- ✅ Wait for DNS propagation (up to 24h)
- ✅ Test with `dig blackroad.systems`
---
## 📞 Getting Help
### Ops CLI Help
```bash
python scripts/br_ops.py help
```
### AI Assistants
- **Atlas** - Infrastructure orchestration
- **Cece** - Engineering & deployment
### Documentation
- Read the service analysis: `infra/analysis/<service>.md`
- Check the manifest: `infra/blackroad-manifest.yml`
- Review deployment docs: `DEPLOYMENT_NOTES.md`
### Support Channels
- **GitHub Issues**: Technical problems
- **Documentation**: `docs.blackroad.systems`
- **Operator**: Alexa Louise Amundson (Cadillac)
---
## 🎯 Roadmap
### Phase 1 (Current)
- ✅ Monolith backend deployed
- ✅ Postgres + Redis managed by Railway
- ✅ Documentation on GitHub Pages
- ✅ Control plane infrastructure established
### Phase 2 (2026 Q1-Q2)
- 📋 Extract API gateway
- 📋 Deploy Prism Console standalone
- 📋 Add agents runtime
- 📋 Create marketing site
### Phase 3 (2026 Q3-Q4)
- 📋 Microservices architecture
- 📋 Multi-region deployment
- 📋 Service mesh (Istio/Linkerd)
- 📋 Advanced observability
---
## 📝 Contributing
### Adding a New Service
1. **Update manifest**:
```bash
vim infra/blackroad-manifest.yml
# Add service definition under projects or planned_projects
```
2. **Create analysis document**:
```bash
cp infra/analysis/blackroad-backend.md infra/analysis/new-service.md
# Fill in service details
```
3. **Test ops CLI**:
```bash
python scripts/br_ops.py list
python scripts/br_ops.py env new-service
```
4. **Commit changes**:
```bash
git add infra/
git commit -m "Add new-service to infrastructure manifest"
```
### Updating Existing Service
1. Update `infra/blackroad-manifest.yml`
2. Update service analysis in `infra/analysis/`
3. Update any affected templates
4. Test with ops CLI
5. Commit and push
---
## 📄 License
This infrastructure documentation is part of the BlackRoad Operating System project.
---
*Control Plane Established: 2025-11-19*
*Maintained By: Atlas (AI Infrastructure Orchestrator)*
*Operator: Alexa Louise Amundson (Cadillac)*
*"Where AI meets the open road." 🛣️*