Establish BlackRoad OS infrastructure control plane

Add comprehensive infrastructure management system to centralize all service
definitions, deployment configurations, and operational tooling.

## New Infrastructure Components

### 1. Service Manifest (infra/blackroad-manifest.yml)
- Complete catalog of all active and planned services
- Deployment configuration for each service
- Environment variable definitions
- Domain mappings and routing
- Database and cache dependencies
- Health check endpoints
- CI/CD integration specifications

### 2. Operations CLI (scripts/br_ops.py)
- Command-line tool for managing all BlackRoad services
- Commands: list, env, repo, open, status, health
- Reads from service manifest for unified operations
- Colored terminal output for better readability

### 3. Service Analysis Documents (infra/analysis/)
- Detailed technical analysis for each service
- Active services:
  - blackroad-backend.md (FastAPI backend)
  - postgres.md (PostgreSQL database)
  - redis.md (Redis cache)
  - docs-site.md (MkDocs documentation)
- Planned services:
  - blackroad-api.md (API gateway - Phase 2)
  - prism-console.md (Admin console - Phase 2)

### 4. Infrastructure Templates (infra/templates/)
- railway.toml.template - Railway deployment config
- railway.json.template - Alternative Railway config
- Dockerfile.fastapi.template - Multi-stage FastAPI Dockerfile
- github-workflow-railway-deploy.yml.template - CI/CD workflow
- .env.example.template - Comprehensive env var template

### 5. Documentation (infra/README.md)
- Complete guide to infrastructure control plane
- Usage instructions for ops CLI
- Service manifest documentation
- Deployment procedures
- Troubleshooting guide
- Phase 2 migration plan

## Architecture

This establishes BlackRoad-Operating-System as the canonical control plane
for all BlackRoad services, both current and planned:

**Phase 1 (Active)**:
- blackroad-backend (FastAPI + static UI)
- postgres (Railway managed)
- redis (Railway managed)
- docs-site (GitHub Pages)

**Phase 2 (Planned)**:
- blackroad-api (API gateway)
- blackroad-prism-console (Admin UI)
- blackroad-agents (Orchestration)
- blackroad-web (Marketing site)

**Phase 3 (Future)**:
- lucidia (AI orchestration)
- Additional microservices

## Usage

# List all services
python scripts/br_ops.py list

# Show environment variables
python scripts/br_ops.py env blackroad-backend

# Show repository info
python scripts/br_ops.py repo blackroad-backend

# Show service URL
python scripts/br_ops.py open blackroad-backend prod

# Show overall status
python scripts/br_ops.py status

# Show health checks
python scripts/br_ops.py health blackroad-backend

## Benefits

1. **Single Source of Truth**: All service configuration in one manifest
2. **Unified Operations**: One CLI for all services
3. **Documentation**: Comprehensive per-service analysis
4. **Templates**: Reusable infrastructure patterns
5. **Migration Ready**: Clear path to Phase 2 microservices

## References

- MASTER_ORCHESTRATION_PLAN.md - 7-layer architecture
- ORG_STRUCTURE.md - Repository strategy
- PRODUCTION_STACK_AUDIT_2025-11-18.md - Current state

Implemented by: Atlas (AI Infrastructure Orchestrator)
Date: 2025-11-19
Claude
2025-11-19 21:04:14 +00:00
parent 2610c3a07f
commit abdbc764e6
14 changed files with 3090 additions and 0 deletions

631
infra/README.md Normal file

@@ -0,0 +1,631 @@
# BlackRoad OS Infrastructure Control Plane
> **Last Updated**: 2025-11-19
> **Maintained By**: Atlas (AI Infrastructure Orchestrator) + Alexa Louise Amundson
> **Version**: 1.0
---
## 📋 Overview
This directory contains the **infrastructure control plane** for all BlackRoad OS services. It serves as the single source of truth for service definitions, deployment configurations, and operational tooling.
### Key Components
1. **`blackroad-manifest.yml`** - Complete service catalog with configuration
2. **`analysis/`** - Per-service technical analysis documents
3. **`templates/`** - Reusable infrastructure templates
4. **`../scripts/br_ops.py`** - Command-line operations tool
5. **`cloudflare/`** - DNS and CDN configuration
6. **`env/`** - Environment variable mapping
---
## 🗂️ Directory Structure
```
infra/
├── README.md                           # This file
├── blackroad-manifest.yml              # Service manifest (SSOT)
├── analysis/                           # Service analyses
│   ├── blackroad-backend.md            # Active: Main backend
│   ├── postgres.md                     # Active: Database
│   ├── redis.md                        # Active: Cache
│   ├── docs-site.md                    # Active: Documentation
│   ├── blackroad-api.md                # Planned: API gateway
│   └── prism-console.md                # Planned: Admin console
├── templates/                          # Infrastructure templates
│   ├── railway.toml.template           # Railway config
│   ├── railway.json.template           # Railway JSON config
│   ├── Dockerfile.fastapi.template     # FastAPI Dockerfile
│   ├── github-workflow-railway-deploy.yml.template
│   └── .env.example.template           # Environment variables
├── cloudflare/                         # DNS & CDN
│   ├── records.yaml                    # DNS records
│   ├── CLOUDFLARE_DNS_BLUEPRINT.md     # DNS setup guide
│   └── migrate_to_cloudflare.md        # Migration guide
└── env/                                # Environment mapping
    └── ENVIRONMENT_MAP.md              # Cross-platform env vars
```
---
## 🚀 Quick Start
### Using the Ops CLI
The `br_ops.py` CLI tool provides unified operations across all services:
```bash
# List all services
python scripts/br_ops.py list
# Show environment variables for a service
python scripts/br_ops.py env blackroad-backend
# Show repository information
python scripts/br_ops.py repo blackroad-backend
# Show service URL
python scripts/br_ops.py open blackroad-backend prod
# Show overall status
python scripts/br_ops.py status
# Show health check commands
python scripts/br_ops.py health blackroad-backend
# Show help
python scripts/br_ops.py help
```
### Example Output
```
$ python scripts/br_ops.py list

BLACKROAD OS SERVICES
================================================================================

🟢 ACTIVE SERVICES
--------------------------------------------------------------------------------
  blackroad-backend
    Type: backend
    Repo: blackboxprogramming/BlackRoad-Operating-System
    Domain: blackroad.systems
    Project: blackroad-core
    Phase: 1

  postgres
    Type: database
    Domain: N/A
    Project: blackroad-core
    Phase: 1

  redis
    Type: cache
    Domain: N/A
    Project: blackroad-core
    Phase: 1

📋 PLANNED SERVICES (Future)
--------------------------------------------------------------------------------
  public-api
    Type: api-gateway
    Repo: blackboxprogramming/blackroad-api
    Target Date: 2026-Q2
    Project: blackroad-api

Total Services: 10
  Active: 4
  Development: 1
  Planned: 5
```
---
## 📖 Service Manifest
### What is `blackroad-manifest.yml`?
The manifest is a YAML file that defines:
- All active and planned services
- Deployment configuration
- Environment variables
- Domain mappings
- Dependencies (databases, caches, etc.)
- Health check endpoints
- CI/CD integration
### Manifest Structure
```yaml
version: "1.0"
workspace: "BlackRoad OS, Inc."

# Deployment state
deployment_state:
  phase: "Phase 1 - Monolith"
  strategy: "Monorepo with consolidation"
  active_services: 3
  planned_services: 11

# Domain configuration
domains:
  primary: "blackroad.systems"
  api: "api.blackroad.systems"
  prism: "prism.blackroad.systems"
  docs: "docs.blackroad.systems"

# Active projects
projects:
  blackroad-core:
    description: "Core backend API + static UI"
    services:
      blackroad-backend:
        repo: "blackboxprogramming/BlackRoad-Operating-System"
        kind: "backend"
        language: "python"
        framework: "FastAPI 0.104.1"
        # ... detailed configuration ...

# Planned projects
planned_projects:
  blackroad-api:
    description: "Public API gateway"
    status: "planned"
    phase: 2
    # ... configuration ...
```
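
The structure above can be walked programmatically. The following is a minimal sketch, not the actual `br_ops.py` implementation: it assumes the manifest has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`), uses only keys shown in the excerpt, and the helper names `iter_services` and `planned_project_names` are hypothetical.

```python
from typing import Iterator

# Hand-written stand-in for the parsed manifest (assumption: same shape as the excerpt).
MANIFEST = {
    "version": "1.0",
    "projects": {
        "blackroad-core": {
            "description": "Core backend API + static UI",
            "services": {
                "blackroad-backend": {
                    "repo": "blackboxprogramming/BlackRoad-Operating-System",
                    "kind": "backend",
                    "language": "python",
                },
            },
        },
    },
    "planned_projects": {
        "blackroad-api": {
            "description": "Public API gateway",
            "status": "planned",
            "phase": 2,
        },
    },
}


def iter_services(manifest: dict) -> Iterator[tuple[str, str, dict]]:
    """Yield (project, service, config) for every service under `projects`."""
    for project_name, project in manifest.get("projects", {}).items():
        for service_name, config in project.get("services", {}).items():
            yield project_name, service_name, config


def planned_project_names(manifest: dict) -> list[str]:
    """Return the names listed under `planned_projects`, in manifest order."""
    return list(manifest.get("planned_projects", {}))
```

Keeping active and planned entries under separate top-level keys, as the excerpt does, means a tool can list either group without inspecting a per-service status field.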
### When to Update the Manifest
Update `blackroad-manifest.yml` when:
- ✅ Adding a new service
- ✅ Changing environment variables
- ✅ Updating domain routing
- ✅ Modifying deployment configuration
- ✅ Changing service dependencies
- ✅ Adding or removing health endpoints
---
## 📊 Service Analysis Documents
Each service has a detailed analysis document in `analysis/`:
### Active Services
- **`blackroad-backend.md`** - Main FastAPI backend (33+ routers)
- **`postgres.md`** - PostgreSQL database (Railway managed)
- **`redis.md`** - Redis cache (Railway managed)
- **`docs-site.md`** - MkDocs documentation (GitHub Pages)
### Planned Services
- **`blackroad-api.md`** - Future API gateway (Phase 2)
- **`prism-console.md`** - Admin console UI (Phase 2)
### Analysis Document Contents
Each analysis includes:
- Overview and purpose
- Technology stack
- Current endpoints/features
- Infrastructure configuration
- Database schema (if applicable)
- Security configuration
- Environment variables
- Monitoring & observability
- Performance benchmarks
- Testing strategy
- Development workflow
- Deployment process
- Rollback procedures
- Future enhancements
---
## 🛠️ Infrastructure Templates
### Available Templates
#### 1. `railway.toml.template`
Railway deployment configuration.
**Usage**:
```bash
cp infra/templates/railway.toml.template ./railway.toml
# Edit and customize for your service
```
#### 2. `railway.json.template`
Alternative Railway configuration (JSON format).
**Usage**:
```bash
cp infra/templates/railway.json.template ./railway.json
# Edit and customize
```
#### 3. `Dockerfile.fastapi.template`
Multi-stage Dockerfile optimized for FastAPI services.
**Features**:
- Multi-stage build (smaller image)
- Non-root user (security)
- Health check integrated
- Optimized layer caching
**Usage**:
```bash
cp infra/templates/Dockerfile.fastapi.template ./Dockerfile
# Customize for your service
```
#### 4. `github-workflow-railway-deploy.yml.template`
GitHub Actions workflow for automated Railway deployment.
**Features**:
- Branch-based environment routing
- Health check verification
- Failure notifications
- Manual workflow dispatch
**Usage**:
```bash
cp infra/templates/github-workflow-railway-deploy.yml.template \
.github/workflows/railway-deploy.yml
# Edit secrets and configuration
```
#### 5. `.env.example.template`
Comprehensive environment variable template.
**Usage**:
```bash
cp infra/templates/.env.example.template ./.env.example
# Document your service's required env vars
```
---
## 🌍 Domain & DNS Configuration
### Current Domain Mapping
| Domain | Points To | Service | Status |
|--------|-----------|---------|--------|
| `blackroad.systems` | Railway backend | blackroad-backend | ✅ Active |
| `docs.blackroad.systems` | GitHub Pages | docs-site | ✅ Active |
| `api.blackroad.systems` | TBD | public-api | 📋 Planned (Phase 2) |
| `prism.blackroad.systems` | TBD | prism-console-web | 📋 Planned (Phase 2) |
| `console.blackroad.systems` | TBD | prism-console-web | 📋 Planned (Phase 2) |
| `agents.blackroad.systems` | TBD | agents-api | 📋 Planned (Phase 2) |
| `lucidia.earth` | TBD | lucidia-api | 📋 Planned (Phase 3) |
### DNS Management
DNS is managed via **Cloudflare**. See:
- `infra/cloudflare/CLOUDFLARE_DNS_BLUEPRINT.md` - Complete DNS setup
- `infra/cloudflare/records.yaml` - Current DNS records
---
## 🔐 Environment Variables
### Environment Variable Management
Environment variables are:
1. **Documented** in `infra/env/ENVIRONMENT_MAP.md`
2. **Defined** in manifest (`blackroad-manifest.yml`)
3. **Templated** in `.env.example.template`
4. **Set** in Railway dashboard (production)
5. **Accessed** via `br_ops.py env <service>`
### Viewing Required Variables
```bash
# Show all required env vars for a service
python scripts/br_ops.py env blackroad-backend
```
Output:
```
🔴 REQUIRED (Must Set)
--------------------------------------------------------------------------------
DATABASE_URL
Description: PostgreSQL connection string
Source: ${{Postgres.DATABASE_URL}}
Example: postgresql+asyncpg://user:pass@host:5432/blackroad
SECRET_KEY
Description: JWT signing key
Generate: openssl rand -hex 32
Secret: Yes (keep secure!)
...
```
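
A service can also verify these variables itself before it starts. The sketch below is illustrative only: the function name `missing_required` is hypothetical, and the variable list is limited to the two names visible in the output above (the real list lives in the manifest).

```python
import os
from typing import Mapping, Optional

# Names taken from the example output above; assumed to be incomplete.
REQUIRED_VARS = ("DATABASE_URL", "SECRET_KEY")


def missing_required(env: Optional[Mapping[str, str]] = None,
                     required: tuple[str, ...] = REQUIRED_VARS) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

A startup script could call `missing_required()` and exit with a non-zero status when the returned list is not empty.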
### Setting Variables in Railway
1. Railway Dashboard → Project → Service
2. Variables tab
3. Add each required variable
4. Use `${{Postgres.DATABASE_URL}}` syntax for references
---
## 📦 Deployment
### Current Deployment Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ PRODUCTION STACK (Phase 1) │
└─────────────────────────────────────────────────────────────┘
Cloudflare CDN
Railway Backend (blackroad-backend)
├── FastAPI Application
├── Postgres Database
└── Redis Cache
GitHub Pages
└── Documentation (docs-site)
```
### Deployment Process
#### Automatic (Recommended)
1. Create feature branch
2. Make changes
3. Open PR → CI runs
4. Merge to main → Auto-deploy to Railway
5. Monitor Railway logs
#### Manual (Emergency)
```bash
# Using Railway CLI
railway login
railway link <PROJECT_ID>
railway up --service blackroad-backend
railway logs --tail 100
# Verify deployment
curl https://blackroad.systems/health
```
### Health Checks
```bash
# View health check commands
python scripts/br_ops.py health blackroad-backend
# Example output:
# curl https://blackroad.systems/health
# curl https://blackroad.systems/api/health/summary
# curl https://blackroad.systems/api/system/version
```
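
The same checks can be scripted. This is a sketch under stated assumptions rather than what `br_ops.py` does: the paths are copied from the example output above, while the names `http_status` and `check_endpoints`, the 5-second timeout, and the rule that only HTTP 200 counts as healthy are illustrative choices. The `fetch` parameter exists so the logic can be exercised without network access.

```python
from typing import Callable
from urllib.error import URLError
from urllib.request import urlopen

BASE_URL = "https://blackroad.systems"
# Paths copied from the example output above.
HEALTH_PATHS = ("/health", "/api/health/summary", "/api/system/version")


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def check_endpoints(base_url: str = BASE_URL,
                    paths: tuple[str, ...] = HEALTH_PATHS,
                    fetch: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Map each path to True when it answers 200, False on any other outcome."""
    results = {}
    for path in paths:
        try:
            results[path] = fetch(base_url.rstrip("/") + path) == 200
        except (URLError, OSError):
            # urlopen raises for 4xx/5xx responses and for connection failures.
            results[path] = False
    return results
```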
---
## 🔄 Phase 2 Migration Plan
### Timeline
- **Q1 2026**: Create `blackroad.io` marketing site
- **Q1 2026**: Extract Prism Console to standalone service
- **Q2 2026**: Extract API gateway (`blackroad-api`)
- **Q2 2026**: Deploy agents runtime (`blackroad-operator`)
### Migration Strategy
1. **Keep monolith running** during extraction
2. **Blue-green deployment** with DNS switching
3. **Monitor error rates** for 24 hours before cutover
4. **Document rollback procedures**
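
Step 3 can be expressed as a simple gate. The sketch below is hypothetical: the plan does not state an error-rate threshold, so the 5% default is an assumption, as are the hourly sampling and the names `HourlySample` and `ready_for_cutover`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HourlySample:
    """Request and error counts observed for the new service in one hour."""
    requests: int
    errors: int


def ready_for_cutover(samples: list[HourlySample],
                      max_error_rate: float = 0.05,
                      required_hours: int = 24) -> bool:
    """True when the latest `required_hours` samples all stay within the limit."""
    if len(samples) < required_hours:
        return False  # not enough observation time yet
    for sample in samples[-required_hours:]:
        if sample.requests == 0:
            continue  # no traffic in this hour, nothing to judge
        if sample.errors / sample.requests > max_error_rate:
            return False
    return True
```

Requiring every hour in the window to pass, rather than the 24-hour average, keeps a short spike from being hidden by quiet periods.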
### Extraction Tools
```bash
# Extract API gateway
git subtree split --prefix=backend/app/routers --branch=api-split
cd ../blackroad-api
git pull ../BlackRoad-Operating-System api-split
# Deploy new service
railway up --service blackroad-api
# Update DNS
# Cloudflare: api.blackroad.systems → new service
```
---
## 📚 Documentation References
### Primary Docs
- **MASTER_ORCHESTRATION_PLAN.md** - Complete 7-layer architecture
- **ORG_STRUCTURE.md** - Repository organization strategy
- **PRODUCTION_STACK_AUDIT_2025-11-18.md** - Current production state
- **BLACKROAD_OS_BIG_KAHUNA_VISION.md** - Long-term roadmap
- **CLAUDE.md** - AI assistant guide
### Infrastructure Docs
- **infra/cloudflare/CLOUDFLARE_DNS_BLUEPRINT.md** - DNS configuration
- **infra/env/ENVIRONMENT_MAP.md** - Environment variables
- **DEPLOYMENT_NOTES.md** - Production deployment guide
---
## 🛡️ Security Best Practices
### 1. Environment Variables
- ✅ Never commit `.env` files
- ✅ Use Railway secrets for sensitive data
- ✅ Generate secure keys: `openssl rand -hex 32`
- ✅ Rotate secrets quarterly
### 2. Docker Security
- ✅ Use non-root user in containers
- ✅ Multi-stage builds to reduce attack surface
- ✅ Scan images for vulnerabilities
- ✅ Pin dependency versions
### 3. API Security
- ✅ JWT authentication with short expiry
- ✅ Rate limiting on public endpoints
- ✅ Input validation with Pydantic
- ✅ CORS properly configured
### 4. Database Security
- ✅ Use connection pooling
- ✅ Encrypted connections (SSL/TLS)
- ✅ Regular backups (Railway managed)
- ✅ Access control via environment
---
## 🔧 Troubleshooting
### Common Issues
#### 1. Deployment Fails
```bash
# Check Railway logs
railway logs --service blackroad-backend --tail 100
# Verify environment variables
railway variables list
# Test locally
docker build -t test .
docker run -p 8000:8000 test
curl http://localhost:8000/health
```
#### 2. Service Won't Start
- ✅ Check `DATABASE_URL` and `REDIS_URL` are set
- ✅ Verify `SECRET_KEY` is generated
- ✅ Check port configuration (`$PORT`)
- ✅ Review startup logs for errors
#### 3. Health Check Failing
```bash
# Check endpoint directly
curl -v https://blackroad.systems/health
# Verify Railway health check settings
# Dashboard → Service → Settings → Health Check
```
#### 4. DNS Issues
- ✅ Verify CNAME in Cloudflare dashboard
- ✅ Check proxy status (orange cloud)
- ✅ Wait for DNS propagation (up to 24h)
- ✅ Test with `dig blackroad.systems`
---
## 📞 Getting Help
### Ops CLI Help
```bash
python scripts/br_ops.py help
```
### AI Assistants
- **Atlas** - Infrastructure orchestration
- **Cece** - Engineering & deployment
### Documentation
- Read the service analysis: `infra/analysis/<service>.md`
- Check the manifest: `infra/blackroad-manifest.yml`
- Review deployment docs: `DEPLOYMENT_NOTES.md`
### Support Channels
- **GitHub Issues**: Technical problems
- **Documentation**: `docs.blackroad.systems`
- **Operator**: Alexa Louise Amundson (Cadillac)
---
## 🎯 Roadmap
### Phase 1 (Current)
- ✅ Monolith backend deployed
- ✅ Postgres + Redis managed by Railway
- ✅ Documentation on GitHub Pages
- ✅ Control plane infrastructure established
### Phase 2 (2026 Q1-Q2)
- 📋 Extract API gateway
- 📋 Deploy Prism Console standalone
- 📋 Add agents runtime
- 📋 Create marketing site
### Phase 3 (2026 Q3-Q4)
- 📋 Microservices architecture
- 📋 Multi-region deployment
- 📋 Service mesh (Istio/Linkerd)
- 📋 Advanced observability
---
## 📝 Contributing
### Adding a New Service
1. **Update manifest**:
```bash
vim infra/blackroad-manifest.yml
# Add service definition under projects or planned_projects
```
2. **Create analysis document**:
```bash
cp infra/analysis/blackroad-backend.md infra/analysis/new-service.md
# Fill in service details
```
3. **Test ops CLI**:
```bash
python scripts/br_ops.py list
python scripts/br_ops.py env new-service
```
4. **Commit changes**:
```bash
git add infra/
git commit -m "Add new-service to infrastructure manifest"
```
### Updating Existing Service
1. Update `infra/blackroad-manifest.yml`
2. Update service analysis in `infra/analysis/`
3. Update any affected templates
4. Test with ops CLI
5. Commit and push
---
## 📄 License
This infrastructure documentation is part of the BlackRoad Operating System project.

---
*Control Plane Established: 2025-11-19*
*Maintained By: Atlas (AI Infrastructure Orchestrator)*
*Operator: Alexa Louise Amundson (Cadillac)*
*"Where AI meets the open road." 🛣️*


@@ -0,0 +1,88 @@
# Service Analysis: blackroad-api (PLANNED)
**Status**: 📋 PLANNED (Phase 2)
**Target Date**: Q2 2026
**Service Type**: Public API Gateway
**Repository**: `blackboxprogramming/blackroad-api` (to be created)
---
## Overview
Standalone public API gateway to be extracted from the monolith in Phase 2. It will serve versioned API endpoints with stricter security, rate limiting, and a better developer experience.

---
## Extraction Plan
### Source
- **Current Location**: `backend/app/routers/` in monolith
- **Target Repo**: New `blackroad-api` repository
- **Migration Method**: `git subtree split`
### Timeline
1. **Month 1-2**: Plan API contract versioning
2. **Month 3-4**: Extract routers, create new repo
3. **Month 5**: Deploy to Railway (parallel with monolith)
4. **Month 6**: DNS cutover, deprecate monolith API
---
## Architecture
### Technology Stack
- **Language**: Python 3.11+
- **Framework**: FastAPI 0.104.1
- **Features**:
- Versioned endpoints (`/v1/`, `/v2/`)
- Enhanced rate limiting
- API key management
- Developer portal
### Endpoints (Planned)
- `/v1/health` - Health check
- `/v1/version` - API version
- `/v1/auth/*` - Authentication
- `/v1/blockchain/*` - RoadChain access
- `/v1/agents/*` - Agent orchestration
- `/v1/data/*` - Data access APIs
---
## Configuration
### Environment Variables
```bash
CORE_API_URL=https://core-internal.blackroad.systems
AGENTS_API_URL=https://agents-internal.blackroad.systems
DATABASE_URL=${{Postgres.DATABASE_URL}}
REDIS_URL=${{Redis.REDIS_URL}}
API_KEYS_ENCRYPTION_KEY=<generate>
RATE_LIMIT_PER_MINUTE=60
```
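
To make `RATE_LIMIT_PER_MINUTE=60` concrete, here is a minimal fixed-window limiter. It is a sketch, not the planned implementation: the gateway's actual algorithm is not specified, the class name `FixedWindowLimiter` is hypothetical, and an in-memory dictionary stands in for Redis counters with a TTL.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """In-memory fixed-window limiter; a stand-in for Redis counters with a TTL."""

    def __init__(self, limit: int = 60, window_seconds: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str) -> bool:
        """Count one request and report whether it fits in the current window."""
        window = int(self._clock() // self.window_seconds)
        # Drop counters from earlier windows, mimicking key expiry.
        self._counts = {k: v for k, v in self._counts.items() if k[1] == window}
        key = (client_id, window)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self.limit
```

A fixed window is the simplest scheme, but it allows up to twice the limit across a window boundary; a sliding window or token bucket avoids that at the cost of more state.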
### Domains
- **Production**: `api.blackroad.systems`
- **Staging**: `staging.api.blackroad.systems`
- **Dev**: `dev.api.blackroad.systems`
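
A client or ops script might resolve these hostnames per environment as sketched below. The hostnames come from the list above; the environment keys (`prod`, `staging`, `dev`), the function name `api_base_url`, and the use of HTTPS with a version path segment are assumptions for illustration.

```python
# Hostnames copied from the list above; the dictionary keys are illustrative.
API_DOMAINS = {
    "prod": "api.blackroad.systems",
    "staging": "staging.api.blackroad.systems",
    "dev": "dev.api.blackroad.systems",
}


def api_base_url(environment: str, version: str = "v1") -> str:
    """Build the versioned base URL for the given environment."""
    try:
        host = API_DOMAINS[environment]
    except KeyError:
        known = ", ".join(sorted(API_DOMAINS))
        raise ValueError(
            f"unknown environment {environment!r}; expected one of: {known}"
        ) from None
    return f"https://{host}/{version}"
```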
---
## Dependencies
- Internal core API (monolith)
- Internal agents API
- PostgreSQL (shared or dedicated)
- Redis (shared or dedicated)
---
## Risks & Mitigation
- **Risk**: Breaking changes for existing clients
- **Mitigation**: Version API endpoints, maintain v1 compatibility
- **Risk**: Performance degradation with extra hop
- **Mitigation**: Implement intelligent caching, optimize internal calls
---
*Analysis Date: 2025-11-19*
*Status: Planning Phase*


@@ -0,0 +1,392 @@
# Service Analysis: blackroad-backend
**Status**: ✅ ACTIVE (Production)
**Last Analyzed**: 2025-11-19
**Service Type**: Backend API + Static UI Server
**Repository**: `blackboxprogramming/BlackRoad-Operating-System` (monorepo)
---
## Overview
The `blackroad-backend` service is the **canonical production backend** for BlackRoad OS. It serves multiple purposes:
- REST API gateway (33+ routers)
- Static UI hosting (Pocket OS at `/`)
- Health & monitoring endpoints
- WebSocket support (planned)
---
## Technology Stack
### Language & Framework
- **Language**: Python 3.11+
- **Framework**: FastAPI 0.104.1
- **ASGI Server**: Uvicorn 0.24.0
- **Async Support**: Full async/await with asyncio
### Dependencies
- **Web**: FastAPI, Uvicorn, Pydantic 2.5.0
- **Database**: SQLAlchemy 2.0.23 (async), asyncpg, psycopg2-binary
- **Cache**: redis-py 5.0.1, hiredis 2.2.3
- **Auth**: python-jose (JWT), passlib (bcrypt)
- **Testing**: pytest, pytest-asyncio, pytest-cov
- **Monitoring**: prometheus-client, sentry-sdk
---
## Current Endpoints
### Core System
- `GET /` → Pocket OS UI (static HTML/CSS/JS)
- `GET /health` → Basic health check
- `GET /api/health/summary` → Comprehensive health with integration status
- `GET /api/system/version` → System version info
- `GET /api/system/config/public` → Public configuration
- `GET /api/docs` → OpenAPI/Swagger UI
### Authentication
- `POST /api/auth/register` → User registration
- `POST /api/auth/login` → User login (JWT)
- `POST /api/auth/refresh` → Token refresh
- `GET /api/auth/me` → Current user
### Blockchain (RoadChain)
- `GET /api/blockchain/blocks` → List blocks
- `POST /api/blockchain/blocks` → Create block
- `GET /api/blockchain/verify` → Verify chain integrity
### Applications
- `/api/email/*` → RoadMail
- `/api/social/*` → BlackRoad Social
- `/api/video/*` → BlackStream
- `/api/miner/*` → Mining operations
- `/api/dashboard/*` → Dashboard data
- `/api/ai_chat/*` → AI/Lucidia integration
### Integrations (30+ routers)
See `backend/app/main.py` for the full list of routers.

---
## Infrastructure
### Deployment
- **Platform**: Railway
- **Container**: Docker (multi-stage build)
- **Dockerfile**: `backend/Dockerfile`
- **Build Context**: Repository root
- **Start Command**: `uvicorn app.main:app --host 0.0.0.0 --port $PORT`
### Resources (Current)
- **Memory**: ~512MB
- **CPU**: Shared
- **Port**: Dynamic ($PORT assigned by Railway)
### Healthcheck
- **Path**: `/health`
- **Interval**: 30s
- **Timeout**: 5s
- **Retries**: 3
---
## Database Schema
### Models (SQLAlchemy ORM)
Location: `backend/app/models/`
**Core Models**:
- `User` → User accounts
- `Wallet` → Blockchain wallets
- `Block` → RoadChain blocks
- `Transaction` → Blockchain transactions
- `Job` → Prism jobs (future)
- `Event` → Audit events
**Relationships**:
- User → Wallets (1:many)
- User → Jobs (1:many)
- Block → Transactions (1:many)
### Migrations
- **Tool**: Alembic 1.12.1
- **Location**: `backend/alembic/`
- **Auto-upgrade**: Disabled (manual for safety)
---
## Caching Strategy
### Redis Usage
- **Session Storage**:
- Key pattern: `session:{user_id}`
- TTL: 1 hour
- **API Response Cache**:
- Key pattern: `api:cache:{endpoint}:{params_hash}`
- TTL: 5-60 minutes (varies by endpoint)
- **WebSocket State** (future):
- Pub/sub channels for real-time updates
- **Rate Limiting**:
- Key pattern: `ratelimit:{ip}:{endpoint}`
- TTL: 1 minute
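
The key patterns above could be generated by small helpers like the following. This is a sketch: the function names are hypothetical, and the way `params_hash` is derived (SHA-256 over sorted JSON, truncated to 16 hex characters) is an assumption, since the document does not specify it.

```python
import hashlib
import json


def session_key(user_id: int) -> str:
    """Key for a user's session entry."""
    return f"session:{user_id}"


def api_cache_key(endpoint: str, params: dict) -> str:
    """Hash the parameters in sorted order so equal queries share one key."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    params_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"api:cache:{endpoint}:{params_hash}"


def ratelimit_key(ip: str, endpoint: str) -> str:
    """Key for the per-client, per-endpoint request counter."""
    return f"ratelimit:{ip}:{endpoint}"
```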
---
## Security
### Authentication
- **Method**: JWT (JSON Web Tokens)
- **Access Token**: 30 minutes expiry
- **Refresh Token**: 7 days expiry
- **Algorithm**: HS256
- **Secret**: `SECRET_KEY` environment variable
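
To illustrate what HS256 signing with these expiry values involves, here is a standard-library sketch. It is not the service's code: the backend lists python-jose for JWT handling, and the function names and claim set (`sub`, `iat`, `exp`) below are assumptions. Use a maintained JWT library in production.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ACCESS_TTL_SECONDS = 30 * 60            # 30 minutes
REFRESH_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("utf-8"),
                      hashlib.sha256).digest()
    return _b64url(digest)


def encode_hs256(claims: dict, secret: str) -> str:
    """Serialize header and claims, then append the HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_sign(signing_input, secret)}"


def decode_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Verify signature and expiry, then return the claims."""
    signing_input, _, signature = token.rpartition(".")
    expected = _sign(signing_input, secret)
    if not hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8")):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims


def issue_access_token(user_id: str, secret: str, now: Optional[float] = None) -> str:
    """Create an access token that expires after ACCESS_TTL_SECONDS."""
    issued = int(time.time() if now is None else now)
    return encode_hs256(
        {"sub": user_id, "iat": issued, "exp": issued + ACCESS_TTL_SECONDS}, secret
    )
```

Because HS256 is symmetric, any component that can verify a token can also mint one, which is why `SECRET_KEY` must stay confined to the backend.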
### Password Hashing
- **Algorithm**: bcrypt
- **Rounds**: 12 (default)
### CORS
- **Allowed Origins**: Configurable via `ALLOWED_ORIGINS` env var
- **Default**: `https://blackroad.systems`
- **Credentials**: Allowed
### Input Validation
- **Framework**: Pydantic schemas
- **SQL Injection**: Protected (SQLAlchemy ORM)
- **XSS**: Frontend sanitization
---
## Current Issues & Fixes
### Recent Deployment Issues (Fixed)
**Fixed 2025-11-18**: Railway `startCommand` mismatch
- Issue: `cd backend && uvicorn ...` failed inside container
- Fix: Removed override, let Dockerfile CMD handle it
**Fixed 2025-11-18**: Dockerfile security hardening
- Added non-root user
- Multi-stage build for smaller image
- Health check integrated
### Known Limitations
- ⚠️ **Prism Console**: Not yet served at `/prism` (planned)
- ⚠️ **WebSockets**: Not yet implemented (planned)
- ⚠️ **GraphQL**: REST-only currently

---
## Environment Variables
### Critical (Must Set)
```bash
DATABASE_URL=${{Postgres.DATABASE_URL}}
REDIS_URL=${{Redis.REDIS_URL}}
SECRET_KEY=<generate>
ENVIRONMENT=production
DEBUG=False
ALLOWED_ORIGINS=https://blackroad.systems
```
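
Because the stack uses SQLAlchemy in async mode with asyncpg, the connection string may need the `postgresql+asyncpg://` scheme. The sketch below assumes the injected `DATABASE_URL` arrives with a plain `postgresql://` or `postgres://` scheme; whether the backend already performs this rewrite is not stated here, and `to_asyncpg_url` is a hypothetical helper.

```python
ASYNC_PREFIX = "postgresql+asyncpg://"


def to_asyncpg_url(url: str) -> str:
    """Rewrite a plain Postgres URL to the `postgresql+asyncpg://` form."""
    if url.startswith(ASYNC_PREFIX):
        return url  # already in the async form
    for prefix in ("postgresql://", "postgres://"):
        if url.startswith(prefix):
            return ASYNC_PREFIX + url[len(prefix):]
    raise ValueError("not a PostgreSQL URL")
```

Only the scheme is replaced, so credentials, host, port, and query parameters pass through unchanged.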
### Important
```bash
ACCESS_TOKEN_EXPIRE_MINUTES=30
REFRESH_TOKEN_EXPIRE_DAYS=7
WALLET_MASTER_KEY=<generate>
API_BASE_URL=https://blackroad.systems
FRONTEND_URL=https://blackroad.systems
```
### Optional (Features)
```bash
OPENAI_API_KEY=sk-...
GITHUB_TOKEN=ghp_...
GITHUB_WEBHOOK_SECRET=<generate>
STRIPE_SECRET_KEY=sk_...
SENTRY_DSN=https://...
```
---
## Monitoring & Observability
### Metrics
- **Prometheus**: Custom metrics at `/metrics` (planned)
- **Railway**: Built-in CPU, memory, network metrics
### Logging
- **Format**: Structured JSON
- **Level**: INFO (production), DEBUG (development)
- **Destination**: stdout → Railway logs
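
A logging setup matching these three settings might look like the sketch below, using only the standard library. The JSON field names, the logger name `blackroad`, and the helper `configure_logging` are assumptions; the backend's actual configuration may differ.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(environment: str) -> logging.Logger:
    """INFO in production, DEBUG elsewhere; output goes to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("blackroad")
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(logging.INFO if environment == "production" else logging.DEBUG)
    return logger
```

Writing one JSON object per line to stdout lets the platform's log collector ingest entries without a custom parser.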
### Error Tracking
- **Sentry**: Configured when `SENTRY_DSN` set
- **Coverage**: All uncaught exceptions
### Alerts (Recommended)
- Health check failures
- Error rate > 5%
- Response time p95 > 1s
- Memory usage > 80%
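
The recommended thresholds can be evaluated with a few lines of code. This sketch is not tied to any monitoring product: the function names, the alert labels, and the nearest-rank percentile method are illustrative choices.

```python
def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def active_alerts(health_ok: bool, requests: int, errors: int,
                  latencies_s: list[float], memory_fraction: float) -> list[str]:
    """Return the names of the recommended alerts that would currently fire."""
    alerts = []
    if not health_ok:
        alerts.append("health_check_failure")
    if requests and errors / requests > 0.05:
        alerts.append("error_rate")
    if latencies_s and p95(latencies_s) > 1.0:
        alerts.append("latency_p95")
    if memory_fraction > 0.80:
        alerts.append("memory")
    return alerts
```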
---
## Performance
### Current Benchmarks
- **Cold start**: ~3-5 seconds
- **Warm response**: ~50-200ms (cached)
- **Database query**: ~10-50ms (simple)
- **API throughput**: ~100 req/s (estimated)
### Optimization Opportunities
1. **Add Redis caching** for expensive queries
2. **Implement CDN** for static assets
3. **Enable Gzip compression**
4. **Database connection pooling** (already configured)
5. **Async background tasks** with Celery
---
## Testing
### Test Suite
- **Location**: `backend/tests/`
- **Framework**: pytest + pytest-asyncio
- **Coverage**: ~60-75% (target: 80%)
### Test Types
- **Unit**: Individual function tests
- **Integration**: Database + Redis tests
- **API**: Endpoint contract tests
- **E2E**: Frontend + backend flows (manual)
### Running Tests
```bash
cd backend
pytest -v --cov=app
```
---
## Development Workflow
### Local Setup
```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with local DATABASE_URL, REDIS_URL, SECRET_KEY
uvicorn app.main:app --reload
```
### Docker Setup
```bash
cd backend
docker-compose up
# Starts FastAPI, Postgres, Redis, Adminer
```
### Making Changes
1. Create feature branch: `git checkout -b feature/my-feature`
2. Make changes in `backend/app/`
3. Add tests in `backend/tests/`
4. Run tests: `pytest -v`
5. Commit and push
6. Open PR → CI runs → Auto-merge (if passing)
---
## Deployment Process
### Automatic (Recommended)
```bash
git checkout main
git pull origin main
# Merge PR via GitHub UI or merge queue
# GitHub Actions triggers Railway deployment
# Monitor Railway dashboard for status
```
### Manual (Emergency)
```bash
railway login
railway link <PROJECT_ID>
railway up --service blackroad-backend
railway logs --tail 100
```
### Verification
```bash
curl https://blackroad.systems/health
# Should return: {"status": "healthy", "environment": "production"}
curl https://blackroad.systems/api/docs
# Should return Swagger UI HTML
```
---
## Rollback Procedure
### Via Railway Dashboard
1. Go to Railway → Deployments
2. Find previous working deployment
3. Click "Rollback" button
4. Verify health check passes
### Via Git
```bash
git revert <bad-commit>
git push origin main
# Wait for auto-deployment
```
---
## Future Enhancements
### Phase 2 (Q1-Q2 2026)
- [ ] Extract API gateway to separate service
- [ ] Serve Prism Console at `/prism`
- [ ] WebSocket support for real-time updates
- [ ] GraphQL endpoint for flexible queries
### Phase 3 (Q3-Q4 2026)
- [ ] Microservices architecture
- [ ] Service mesh (Istio/Linkerd)
- [ ] Multi-region deployment
- [ ] Advanced caching with CDN
---
## Dependencies
### Runtime Dependencies
- **Postgres**: Required (managed by Railway)
- **Redis**: Required (managed by Railway)
- **Cloudflare**: Optional (for CDN + DNS)
### External APIs (Optional)
- OpenAI API (for AI features)
- GitHub API (for agent integrations)
- Stripe API (for payments)
- Twilio API (for SMS)
---
## Contact & Support
- **Primary Operator**: Alexa Louise Amundson (Cadillac)
- **AI Assistants**: Atlas (Infrastructure), Cece (Engineering)
- **Documentation**: See `CLAUDE.md`, `MASTER_ORCHESTRATION_PLAN.md`
- **Issues**: GitHub Issues in `blackboxprogramming/BlackRoad-Operating-System`

---
*Analysis Date: 2025-11-19*
*Next Review: 2025-12-19*
*Maintained By: Atlas (AI Infrastructure Orchestrator)*

149
infra/analysis/docs-site.md Normal file

@@ -0,0 +1,149 @@
# Service Analysis: docs-site
**Status**: ✅ ACTIVE (GitHub Pages)
**Last Analyzed**: 2025-11-19
**Service Type**: Static Documentation Site
**Platform**: GitHub Pages + Cloudflare CDN
---
## Overview
Technical documentation site built with MkDocs and the Material theme. It is deployed to GitHub Pages and served via the Cloudflare CDN at `docs.blackroad.systems`.

---
## Technology Stack
### Static Site Generator
- **Tool**: MkDocs 1.5+
- **Theme**: Material for MkDocs
- **Plugins**: search, minify, git-revision-date
### Build Process
```bash
cd codex-docs
mkdocs build --strict
# Output: site/
```
---
## Content Structure
### Documentation Categories
- **Architecture**: System design, decisions, deployment
- **API Reference**: Endpoint documentation, schemas
- **Guides**: Quickstart, development, deployment
- **Agents**: Agent ecosystem, creating agents
- **Contributing**: Contribution guidelines
### Source Location
- **Root**: `codex-docs/`
- **Config**: `codex-docs/mkdocs.yml`
- **Docs**: `codex-docs/docs/`
- **Build Output**: `codex-docs/site/` (gitignored)
---
## Deployment
### Platform
- **Primary**: GitHub Pages
- **Branch**: `gh-pages` (auto-generated)
- **CDN**: Cloudflare (proxied)
### CI/CD
```yaml
# .github/workflows/docs-deploy.yml
on:
  push:
    branches: [main]
    paths: ['codex-docs/**']
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: pip install mkdocs mkdocs-material
      - name: Build docs
        run: cd codex-docs && mkdocs build --strict
      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./codex-docs/site
```
---
## Domain Configuration
### DNS (Cloudflare)
```
Type: CNAME
Name: docs
Value: blackboxprogramming.github.io
Proxy: Enabled (Orange cloud)
TTL: Auto
```
### GitHub Pages
```
Repository Settings → Pages
Source: gh-pages branch
Custom domain: docs.blackroad.systems
HTTPS: Enforced
```
---
## Performance
### Metrics
- **Build Time**: ~10-20 seconds
- **Deploy Time**: ~30-60 seconds
- **Page Load**: < 1s (CDN cached)
- **Size**: ~5-10 MB (all pages)
### Optimizations
- Minified HTML/CSS/JS
- Image optimization
- Cloudflare CDN caching
- Gzip compression
---
## Maintenance
### Regular Tasks
- Update documentation with code changes
- Review for outdated content
- Check broken links
- Update screenshots
### Versioning
- Current: Single version (latest)
- Future: Multi-version support with mike
---
## Troubleshooting
### Build Failures
1. Check MkDocs config syntax
2. Verify all referenced files exist
3. Look for broken internal links
4. Check plugin compatibility
### DNS Issues
1. Verify CNAME record in Cloudflare
2. Check GitHub Pages custom domain setting
3. Wait for DNS propagation (up to 24h)
---
*Analysis Date: 2025-11-19*
*Next Review: 2025-12-19*

infra/analysis/postgres.md

@@ -0,0 +1,150 @@
# Service Analysis: Postgres
**Status**: ✅ ACTIVE (Production)
**Last Analyzed**: 2025-11-19
**Service Type**: Managed Database (PostgreSQL)
**Provider**: Railway
---
## Overview
Managed PostgreSQL database service provided by Railway. Stores all persistent data for BlackRoad OS including users, wallets, blockchain, and application data.
---
## Configuration
### Version
- **PostgreSQL**: 15+ (Railway managed, auto-updates minor versions)
### Resources
- **Storage**: Auto-scaling (starts at 1GB)
- **Memory**: Shared (Railway managed)
- **Connections**: Max 100 concurrent
### Performance Tuning
```ini
# Current settings (Railway defaults), in postgresql.conf syntax
max_connections = 100
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 4MB
maintenance_work_mem = 64MB
```
---
## Schema
### Current Tables
- `users` - User accounts (auth, profiles)
- `wallets` - Blockchain wallets (encrypted keys)
- `blocks` - RoadChain blockchain
- `transactions` - Blockchain transactions
- `jobs` - Prism job queue (future)
- `events` - Audit/compliance events
### Migration Management
- **Tool**: Alembic
- **Location**: `backend/alembic/`
- **Strategy**: Manual migrations (no auto-upgrade in production)
---
## Backups
### Automated Backups (Railway)
- **Frequency**: Daily
- **Retention**: 30 days
- **Storage**: Railway managed
### Restore Process
1. Railway Dashboard → Database → Backups
2. Select backup date
3. Click "Restore"
4. Verify data integrity
---
## Security
### Access Control
- **Network**: Private Railway network only
- **Credentials**: Auto-generated, injected via `${{Postgres.DATABASE_URL}}`
- **SSL/TLS**: Enforced
### Encryption
- **At Rest**: Railway managed encryption
- **In Transit**: SSL/TLS (required)
---
## Monitoring
### Metrics (Railway Dashboard)
- Connection count
- Query performance
- Storage usage
- CPU/memory usage
### Alerts (Recommended)
- Storage > 80% full
- Connection pool exhausted
- Slow queries (> 1s)
---
## Maintenance
### Regular Tasks
- **Vacuum**: Automatic (Railway managed)
- **Analyze**: Automatic (Railway managed)
- **Index optimization**: Manual (as needed)
### Scaling
- **Vertical**: Upgrade Railway plan
- **Storage**: Auto-scales up to plan limit
---
## Connection Details
### Environment Variable
```bash
DATABASE_URL=${{Postgres.DATABASE_URL}}
# Format: postgresql+asyncpg://user:pass@host:port/db
```
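
SQLAlchemy only selects the asyncpg driver when the URL scheme says so. Managed providers commonly hand out a plain `postgresql://` (or `postgres://`) URL, and whether the injected value already carries the `+asyncpg` suffix is not stated here. Under that assumption, a small normalizing helper (the name `to_asyncpg_url` is hypothetical) might look like this:

```python
def to_asyncpg_url(url: str) -> str:
    """Rewrite a plain PostgreSQL URL so SQLAlchemy selects the asyncpg driver."""
    for prefix in ("postgresql://", "postgres://"):
        if url.startswith(prefix):
            return "postgresql+asyncpg://" + url[len(prefix):]
    # Already names an explicit driver, or is not a PostgreSQL URL: leave as-is.
    return url
```

URLs that already name a driver pass through unchanged, so the helper is safe to apply unconditionally at startup.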
### Connection Pool (SQLAlchemy)
```python
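# backend/app/database.py
import os

from sqlalchemy.ext.asyncio import create_async_engine

# Assumption: the URL is read directly from the environment.
DATABASE_URL = os.environ["DATABASE_URL"]

engine = create_async_engine(
    DATABASE_URL,
    echo=False,
    pool_size=10,        # persistent connections kept open
    max_overflow=20,     # extra connections allowed under load
    pool_pre_ping=True,  # test each connection before handing it out
)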
```
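
The pool settings interact with the server-side limit of 100 connections: each backend process can open up to `pool_size + max_overflow` connections. The arithmetic can be sketched as follows; the helper names and the `reserved` parameter (headroom for migrations or admin sessions) are assumptions for illustration.

```python
def max_connections_per_process(pool_size: int, max_overflow: int) -> int:
    """Upper bound on connections a single engine can hold open."""
    return pool_size + max_overflow


def max_safe_processes(
    server_max: int, pool_size: int, max_overflow: int, reserved: int = 0
) -> int:
    """How many processes fit under the server limit, keeping `reserved` free."""
    per_process = max_connections_per_process(pool_size, max_overflow)
    return (server_max - reserved) // per_process


# With the values documented above: 10 + 20 = 30 connections per process,
# so at most 100 // 30 = 3 processes fit under max_connections = 100.
print(max_safe_processes(server_max=100, pool_size=10, max_overflow=20))
```

Running more worker processes than this bound risks the "connection pool exhausted" condition listed under Alerts.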
---
## Troubleshooting
### Connection Issues
1. Verify `DATABASE_URL` is set in Railway
2. Check service logs for connection errors
3. Verify database service is running
4. Check connection pool exhaustion
### Performance Issues
1. Check slow query log
2. Analyze query plans with `EXPLAIN`
3. Add missing indexes
4. Optimize N+1 queries
---
*Analysis Date: 2025-11-19*
*Next Review: 2025-12-19*


@@ -0,0 +1,87 @@
# Service Analysis: prism-console (PLANNED)
**Status**: 📋 PLANNED (Phase 2)
**Target Date**: Q1 2026
**Service Type**: Admin Console UI
**Repository**: `blackboxprogramming/blackroad-prism-console` (to be created)
---
## Overview
Standalone admin console for job queue monitoring, system observability, and operator dashboards. Currently exists as static files in `prism-console/` directory.
---
## Current State
### Location
- **Source**: `prism-console/` in monolith
- **Status**: Built but not integrated
- **Access**: Intended to be served at `/prism` by the backend (route not yet mounted)
### Features (Implemented)
- Job queue dashboard
- System metrics display
- Event log viewer
- Multi-tab navigation
- Dark theme UI
---
## Phase 2 Plan
### Extraction Strategy
1. **Immediate (Phase 1.5)**:
- Integrate into backend: `app.mount("/prism", StaticFiles(...))`
- Deploy and test
2. **Phase 2**:
- Extract to separate repo
- Build as React/Next.js app (or keep Vanilla JS)
- Deploy to Railway/Vercel
### Technology Options
**Option A: Keep Vanilla JS**
- Pros: Zero dependencies, fast load
- Cons: Limited scalability for complex features
**Option B: React 18+**
- Pros: Component reusability, rich ecosystem
- Cons: Build complexity, larger bundle
**Option C: Next.js 14+**
- Pros: SSR, routing, optimizations
- Cons: More infrastructure needed
---
## Configuration
### Environment Variables
```bash
REACT_APP_API_URL=https://blackroad.systems/api
REACT_APP_CORE_API_URL=https://api.blackroad.systems
REACT_APP_AGENTS_API_URL=https://agents.blackroad.systems
```
### Domains
- **Production**: `prism.blackroad.systems`
- **Staging**: `staging.prism.blackroad.systems`
---
## API Integration
### Required Endpoints
- `GET /api/prism/jobs` - Job queue
- `GET /api/prism/events` - Event stream
- `GET /api/system/version` - Version info
- `GET /api/health/summary` - Health status
- `WebSocket /api/prism/stream` - Real-time updates
---
*Analysis Date: 2025-11-19*
*Status: Planning Phase*

infra/analysis/redis.md

@@ -0,0 +1,163 @@
# Service Analysis: Redis
**Status**: ✅ ACTIVE (Production)
**Last Analyzed**: 2025-11-19
**Service Type**: Managed Cache (Redis)
**Provider**: Railway
---
## Overview
Managed Redis cache service provided by Railway. Used for session storage, API caching, WebSocket state, and pub/sub messaging.
---
## Configuration
### Version
- **Redis**: 7+ (Railway managed)
### Resources
- **Memory**: 256MB (default, configurable)
- **Eviction Policy**: `allkeys-lru` (least recently used)
- **Persistence**: RDB snapshots (Railway managed)
---
## Usage Patterns
### Session Storage
```python
# Store session
await redis.setex(
    f"session:{user_id}",
    3600,  # 1 hour TTL
    json.dumps(session_data)
)
# Retrieve session
session_json = await redis.get(f"session:{user_id}")
```
### API Response Caching
```python
# Cache API response
cache_key = f"api:cache:{endpoint}:{params_hash}"
await redis.setex(cache_key, 300, json.dumps(response)) # 5 min TTL
# Retrieve cached response
cached = await redis.get(cache_key)
```
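
The caching snippet relies on a `params_hash` value whose derivation is not shown. One possible approach, offered as an assumption rather than the project's actual method, is to hash a canonical JSON encoding of the parameters so that key order does not affect the cache key. The 16-character truncation is an arbitrary choice for readability.

```python
import hashlib
import json


def params_hash(params: dict) -> str:
    """Stable short hash of request parameters, independent of key order."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cache_key(endpoint: str, params: dict) -> str:
    """Build a key in the documented api:cache:{endpoint}:{hash} namespace."""
    return f"api:cache:{endpoint}:{params_hash(params)}"
```

Sorting keys before hashing means `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` share one cache entry.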
### WebSocket State (Planned)
```python
# Pub/sub for real-time updates
await redis.publish("prism:events", json.dumps(event))
# Subscribe to events
pubsub = redis.pubsub()
await pubsub.subscribe("prism:events")
```
### Rate Limiting
```python
# Track API rate limits
key = f"ratelimit:{ip}:{endpoint}"
count = await redis.incr(key)
if count == 1:
    await redis.expire(key, 60)  # 1 minute window
```
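
The snippet above only tracks the counter; the decision to reject a request is not shown. The fixed-window logic can be illustrated without a Redis server using an in-memory stand-in. The class name, the `limit` threshold, and the injectable `clock` are all assumptions for this sketch, and it is single-process only, unlike the Redis-backed version.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """In-memory stand-in for the INCR + EXPIRE pattern (single process only)."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counters: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def allow(self, key: str) -> bool:
        now = self.clock()
        count, expires_at = self._counters.get(key, (0, 0.0))
        if count == 0 or now >= expires_at:
            # First hit of a window: reset the counter and set its expiry (like EXPIRE).
            count, expires_at = 0, now + self.window
        count += 1  # like INCR
        self._counters[key] = (count, expires_at)
        return count <= self.limit
```

In the Redis version, `INCR` and `EXPIRE` are two separate commands, so a crash between them can leave a counter without a TTL; running both in a transaction or a Lua script is a common way to close that gap.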
---
## Key Namespaces
| Namespace | Pattern | TTL | Purpose |
|-----------|---------|-----|---------|
| `session:*` | `session:{user_id}` | 1 hour | User session data |
| `api:cache:*` | `api:cache:{endpoint}:{hash}` | 5-60 min | Cached API responses |
| `ratelimit:*` | `ratelimit:{ip}:{endpoint}` | 1 min | Rate limit counters |
| `websocket:*` | `websocket:{connection_id}` | Variable | WebSocket state |
| `prism:*` | Pub/sub channels | N/A | Event bus |
---
## Performance
### Metrics
- **Hit Rate**: Target > 80%
- **Latency**: < 1ms (local network)
- **Memory Usage**: Monitor for evictions
### Optimization
- Use pipelining for bulk operations
- Implement connection pooling
- Use hiredis for faster parsing
- Monitor key expiration patterns
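
The hit-rate target can be checked from the `keyspace_hits` and `keyspace_misses` counters that Redis reports under `INFO stats`. A minimal sketch of the calculation, with a hypothetical helper name:

```python
def cache_hit_rate(keyspace_hits: int, keyspace_misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there is no traffic yet."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0
```

For example, 800 hits and 200 misses give a rate of 0.8, which sits exactly at the 80% boundary.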
---
## Monitoring
### Metrics (Railway Dashboard)
- Memory usage
- Connections count
- Commands per second
- Evicted keys
### Alerts (Recommended)
- Memory > 90% full
- High eviction rate
- Connection errors
---
## Connection Details
### Environment Variable
```bash
REDIS_URL=${{Redis.REDIS_URL}}
# Format: redis://host:port/db
```
### Connection Pool
```python
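# backend/app/redis_client.py
import os

# redis-py >= 4.2 ships the asyncio client; the standalone aioredis package
# is deprecated and its 1.x create_redis_pool() has no pubsub() method.
import redis.asyncio as aioredis

REDIS_URL = os.environ["REDIS_URL"]

redis = aioredis.from_url(
    REDIS_URL,
    max_connections=10,
    encoding="utf-8",
    decode_responses=True,
)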
```
---
## Security
### Access Control
- **Network**: Private Railway network only
- **Credentials**: Auto-generated, injected via `${{Redis.REDIS_URL}}`
- **Encryption**: TLS optional (not required on private network)
---
## Troubleshooting
### Connection Issues
1. Verify `REDIS_URL` is set
2. Check Redis service status
3. Verify network connectivity
4. Check connection pool exhaustion
### Memory Issues
1. Check eviction metrics
2. Analyze key distribution
3. Adjust TTLs for less critical data
4. Upgrade Redis plan if needed
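
For the "analyze key distribution" step, counting keys per namespace prefix is usually enough to see which category dominates memory. The sketch below works on any iterable of key names; in practice the names would come from an incremental `SCAN` (for example `scan_iter` in redis-py) rather than `KEYS *`. The helper name is an assumption.

```python
from collections import Counter
from typing import Iterable


def namespace_counts(keys: Iterable[str]) -> Counter:
    """Count keys by the prefix before the first ':' (e.g. 'session', 'api')."""
    return Counter(key.split(":", 1)[0] for key in keys)
```

Calling `namespace_counts(...).most_common()` then lists namespaces from largest to smallest.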
---
*Analysis Date: 2025-11-19*
*Next Review: 2025-12-19*


@@ -0,0 +1,651 @@
version: "1.0"
workspace: "BlackRoad OS, Inc."
control_plane_repo: "blackboxprogramming/BlackRoad-Operating-System"
last_updated: "2025-11-19"
operator: "Alexa Louise Amundson (Cadillac)"
# =============================================================================
# BLACKROAD OS SERVICE MANIFEST
# =============================================================================
# This file is the single source of truth for all BlackRoad services, their
# deployment configuration, dependencies, and infrastructure mapping.
#
# Usage:
# - Read by br-ops CLI tool for service management
# - Referenced by CI/CD workflows for deployment automation
# - Used by documentation generation tools
# - Maintained by AI agents (Atlas, Cece) and human operators
# =============================================================================
# Current deployment state
deployment_state:
  phase: "Phase 1 - Monolith"
  strategy: "Monorepo with consolidation"
  active_services: 3
  planned_services: 11
  target_phase: "Phase 2 - Strategic Split"
# Railway production environment
railway:
  project_id: "TBD"  # Set after Railway connection
  project_name: "blackroad-production"
  region: "us-west1"
# Domain configuration (managed via Cloudflare)
domains:
  primary: "blackroad.systems"
  api: "api.blackroad.systems"
  prism: "prism.blackroad.systems"
  docs: "docs.blackroad.systems"
  console: "console.blackroad.systems"
  lucidia: "lucidia.earth"
  network: "blackroad.network"
# =============================================================================
# ACTIVE PROJECTS (PHASE 1)
# =============================================================================
projects:
  # ---------------------------------------------------------------------------
  # CORE BACKEND (Current Production)
  # ---------------------------------------------------------------------------
  blackroad-core:
    description: "Core backend API + static UI serving (monolith)"
    railway_project: "blackroad-production"
    status: "active"
    phase: 1
    services:
      blackroad-backend:
        repo: "blackboxprogramming/BlackRoad-Operating-System"
        branch: "main"
        kind: "backend"
        language: "python"
        framework: "FastAPI 0.104.1"
        runtime: "Python 3.11+"
        # Deployment configuration
        deployment:
          dockerfile: "backend/Dockerfile"
          build_context: "."
          start_command: "uvicorn app.main:app --host 0.0.0.0 --port $PORT"
          healthcheck_path: "/health"
          port: "${PORT:-8000}"
        # Health & monitoring endpoints
        entrypoints:
          - path: "/health"
            method: "GET"
            description: "Basic health check"
          - path: "/api/health/summary"
            method: "GET"
            description: "Comprehensive health with integration status"
          - path: "/api/system/version"
            method: "GET"
            description: "System version and build info"
          - path: "/api/docs"
            method: "GET"
            description: "OpenAPI/Swagger documentation"
        # Domain routing
        domains:
          prod: "blackroad.systems"
          staging: "staging.blackroad.systems"
          dev: "dev.blackroad.systems"
        # Routes served
        routes:
          - path: "/"
            serves: "Pocket OS UI (backend/static/)"
            type: "static"
          - path: "/api/*"
            serves: "REST API (33+ routers)"
            type: "api"
          - path: "/prism"
            serves: "Prism Console UI"
            type: "static"
            status: "planned"
          - path: "/api/docs"
            serves: "Swagger UI"
            type: "docs"
        # Environment variables
        env:
          required:
            - name: "DATABASE_URL"
              description: "PostgreSQL connection string"
              source: "${{Postgres.DATABASE_URL}}"
              example: "postgresql+asyncpg://user:pass@host:5432/blackroad"
            - name: "REDIS_URL"
              description: "Redis connection string"
              source: "${{Redis.REDIS_URL}}"
              example: "redis://host:6379/0"
            - name: "SECRET_KEY"
              description: "JWT signing key"
              generate: "openssl rand -hex 32"
              secret: true
            - name: "ENVIRONMENT"
              description: "Deployment environment"
              values: ["development", "staging", "production"]
            - name: "DEBUG"
              description: "Debug mode flag"
              values: ["True", "False"]
              default: "False"
            - name: "ALLOWED_ORIGINS"
              description: "CORS allowed origins"
              example: "https://blackroad.systems,https://prism.blackroad.systems"
          important:
            - name: "ACCESS_TOKEN_EXPIRE_MINUTES"
              description: "JWT access token expiry"
              default: "30"
            - name: "REFRESH_TOKEN_EXPIRE_DAYS"
              description: "JWT refresh token expiry"
              default: "7"
            - name: "WALLET_MASTER_KEY"
              description: "Blockchain wallet encryption key"
              generate: "openssl rand -hex 32"
              secret: true
            - name: "API_BASE_URL"
              description: "Public API base URL"
              example: "https://blackroad.systems"
            - name: "FRONTEND_URL"
              description: "Frontend base URL"
              example: "https://blackroad.systems"
          optional:
            - name: "OPENAI_API_KEY"
              description: "OpenAI API access"
              secret: true
            - name: "GITHUB_TOKEN"
              description: "GitHub API access for agents"
              secret: true
            - name: "GITHUB_WEBHOOK_SECRET"
              description: "GitHub webhook validation secret"
              generate: "openssl rand -hex 32"
              secret: true
            - name: "STRIPE_SECRET_KEY"
              description: "Stripe payment processing"
              secret: true
            - name: "SENTRY_DSN"
              description: "Sentry error tracking"
              secret: true
        # Database dependencies
        databases:
          primary:
            service: "Postgres"
            type: "postgresql"
            version: "15+"
            connection_var: "DATABASE_URL"
            migrations: "alembic"
            models_path: "backend/app/models/"
        # Cache dependencies
        caches:
          primary:
            service: "Redis"
            type: "redis"
            version: "7+"
            connection_var: "REDIS_URL"
            usage:
              - "Session storage"
              - "API response caching"
              - "WebSocket state"
              - "Pub/sub for events"
        # Monitoring & observability
        observability:
          metrics:
            - "Prometheus metrics at /metrics"
            - "Railway built-in metrics"
          logging:
            - "Structured JSON logs"
            - "Log level: INFO (prod), DEBUG (dev)"
          tracing:
            - "Sentry for error tracking"
          alerts:
            - "Railway health check failures"
            - "High error rate (>5%)"
            - "Response time > 1s (p95)"
      # -----------------------------------------------------------------------
      # DATABASE (Railway managed)
      # -----------------------------------------------------------------------
      postgres:
        kind: "database"
        type: "postgresql"
        version: "15+"
        provider: "railway-managed"
        env:
          exported:
            - name: "DATABASE_URL"
              description: "Full PostgreSQL connection string"
              format: "postgresql+asyncpg://user:pass@host:port/db"
        configuration:
          max_connections: 100
          shared_buffers: "256MB"
          effective_cache_size: "1GB"
        backups:
          enabled: true
          frequency: "daily"
          retention_days: 30
        migrations:
          tool: "alembic"
          location: "backend/alembic/"
          auto_upgrade: false  # Manual for safety
      # -----------------------------------------------------------------------
      # CACHE (Railway managed)
      # -----------------------------------------------------------------------
      redis:
        kind: "cache"
        type: "redis"
        version: "7+"
        provider: "railway-managed"
        env:
          exported:
            - name: "REDIS_URL"
              description: "Redis connection string"
              format: "redis://host:port/db"
        configuration:
          maxmemory: "256MB"
          maxmemory_policy: "allkeys-lru"
        usage:
          - service: "blackroad-backend"
            purposes:
              - "HTTP session storage"
              - "API response caching"
              - "WebSocket connection state"
              - "Pub/sub for agent events"
  # ---------------------------------------------------------------------------
  # DOCUMENTATION (GitHub Pages)
  # ---------------------------------------------------------------------------
  blackroad-docs:
    description: "Technical documentation (MkDocs)"
    status: "active"
    phase: 1
    services:
      docs-site:
        repo: "blackboxprogramming/BlackRoad-Operating-System"
        branch: "gh-pages"
        kind: "docs"
        framework: "MkDocs Material"
        deployment:
          platform: "github-pages"
          source: "codex-docs/"
          build_command: "mkdocs build"
          output_dir: "site/"
        domains:
          prod: "docs.blackroad.systems"
        # CDN configuration
        cdn:
          provider: "cloudflare"
          caching: true
          ssl: true
        env:
          required:
            - name: "SITE_URL"
              value: "https://docs.blackroad.systems"
# =============================================================================
# PLANNED PROJECTS (PHASE 2)
# =============================================================================
planned_projects:
  # ---------------------------------------------------------------------------
  # PUBLIC API GATEWAY (Phase 2 extraction)
  # ---------------------------------------------------------------------------
  blackroad-api:
    description: "Public API gateway (extracted from monolith)"
    railway_project: "blackroad-api"
    status: "planned"
    phase: 2
    target_date: "2026-Q2"
    services:
      public-api:
        repo: "blackboxprogramming/blackroad-api"  # Future split
        kind: "api-gateway"
        language: "python"
        framework: "FastAPI 0.104.1"
        entrypoints:
          - path: "/health"
          - path: "/version"
          - path: "/v1/*"
            description: "Versioned API endpoints"
        domains:
          prod: "api.blackroad.systems"
          staging: "staging.api.blackroad.systems"
        env:
          required:
            - "NODE_ENV"
            - "CORE_API_URL"  # Points to internal core service
            - "AGENTS_API_URL"
            - "API_KEYS_ENCRYPTION_KEY"
  # ---------------------------------------------------------------------------
  # PRISM CONSOLE (Phase 2 standalone UI)
  # ---------------------------------------------------------------------------
  blackroad-prism-console:
    description: "Admin console UI for job queue & observability"
    railway_project: "blackroad-prism-console"
    status: "planned"
    phase: 2
    target_date: "2026-Q1"
    services:
      prism-console-web:
        repo: "blackboxprogramming/blackroad-prism-console"  # Future split
        kind: "frontend-console"
        language: "javascript"
        framework: "React 18+ or Vanilla JS"
        domains:
          prod: "prism.blackroad.systems"
          staging: "staging.prism.blackroad.systems"
        env:
          required:
            - "REACT_APP_API_URL"
            - "REACT_APP_CORE_API_URL"
            - "REACT_APP_AGENTS_API_URL"
  # ---------------------------------------------------------------------------
  # OPERATOR/WORKER ENGINE (Phase 2 background jobs)
  # ---------------------------------------------------------------------------
  blackroad-agents:
    description: "Agent orchestration runtime + workers"
    railway_project: "blackroad-agents"
    status: "planned"
    phase: 2
    target_date: "2026-Q2"
    services:
      agents-api:
        repo: "blackboxprogramming/blackroad-operator"  # Future split
        kind: "backend"
        language: "python"
        framework: "FastAPI"
        domains:
          prod: "agents.blackroad.systems"
          staging: "staging.agents.blackroad.systems"
        env:
          required:
            - "DATABASE_URL"
            - "REDIS_URL"
            - "CORE_API_URL"
            - "PUBLIC_AGENTS_URL"
      agents-worker:
        repo: "blackboxprogramming/blackroad-operator"
        kind: "worker"
        language: "python"
        framework: "Celery or custom"
        env:
          required:
            - "DATABASE_URL"
            - "REDIS_URL"
            - "CORE_API_URL"
  # ---------------------------------------------------------------------------
  # MARKETING SITE (Phase 1-2 transition)
  # ---------------------------------------------------------------------------
  blackroad-web:
    description: "Corporate marketing website"
    status: "planned"
    phase: 1  # Should be created soon
    target_date: "2026-Q1"
    services:
      web-app:
        repo: "blackboxprogramming/blackroad.io"  # Future repo
        kind: "frontend"
        language: "javascript"
        framework: "Next.js 14+ or static HTML"
        domains:
          prod: "blackroad.systems"  # Will replace current root
          staging: "staging.blackroad.systems"
        env:
          required:
            - "NEXT_PUBLIC_API_URL"
            - "NEXT_PUBLIC_APP_URL"
  # ---------------------------------------------------------------------------
  # AI ORCHESTRATION (Phase 2)
  # ---------------------------------------------------------------------------
  lucidia:
    description: "Multi-model AI orchestration layer"
    status: "development"
    phase: 2
    target_date: "2026-Q3"
    services:
      lucidia-api:
        repo: "blackboxprogramming/lucidia"
        kind: "backend"
        language: "python"
        framework: "FastAPI"
        domains:
          prod: "lucidia.blackroad.systems"
          staging: "staging.lucidia.blackroad.systems"
        env:
          required:
            - "OPENAI_API_KEY"
            - "ANTHROPIC_API_KEY"
            - "GROQ_API_KEY"
            - "DATABASE_URL"
            - "REDIS_URL"
# =============================================================================
# EXPERIMENTAL/RESEARCH (NOT FOR PRODUCTION)
# =============================================================================
experimental_projects:
  lucidia-lab:
    description: "AI research & testing environment"
    repo: "blackboxprogramming/lucidia-lab"
    status: "research"
    phase: 3
  quantum-math-lab:
    description: "Quantum computing research"
    repo: "blackboxprogramming/quantum-math-lab"
    status: "research"
    phase: 3
  native-ai-quantum-energy:
    description: "Quantum + AI integration research"
    repo: "blackboxprogramming/native-ai-quantum-energy"
    status: "research"
    phase: 3
# =============================================================================
# INFRASTRUCTURE DEPENDENCIES
# =============================================================================
infrastructure:
  # DNS & CDN
  cloudflare:
    zones:
      - domain: "blackroad.systems"
        records_managed: true
        proxy_enabled: true
        ssl_mode: "full"
      - domain: "lucidia.earth"
        records_managed: true
        proxy_enabled: true
        ssl_mode: "full"
  # Primary hosting platform
  railway:
    projects:
      production:
        name: "blackroad-production"
        region: "us-west1"
        services: 3  # backend, postgres, redis
      staging:
        name: "blackroad-staging"
        region: "us-west1"
        services: 3
  # Code & CI/CD
  github:
    organization: "blackboxprogramming"
    repositories:
      active: 4  # Core repos in Phase 1
      total: 23  # Including experimental
    workflows:
      - "backend-tests.yml"
      - "railway-deploy.yml"
      - "docs-deploy.yml"
      - "railway-automation.yml"
      - "infra-ci-bucketed.yml"
  # Monitoring & observability
  monitoring:
    railway:
      - "Built-in metrics"
      - "Health checks"
      - "Log aggregation"
    sentry:
      - "Error tracking"
      - "Performance monitoring"
    prometheus:
      - "Custom metrics export"
# =============================================================================
# CI/CD CONFIGURATION
# =============================================================================
ci_cd:
  github_actions:
    workflows:
      backend_tests:
        file: ".github/workflows/backend-tests.yml"
        triggers: ["push", "pull_request"]
        paths: ["backend/**", "agents/**"]
      railway_deploy:
        file: ".github/workflows/railway-deploy.yml"
        triggers: ["push"]
        branches: ["main"]
        requires: ["backend_tests"]
      docs_deploy:
        file: ".github/workflows/docs-deploy.yml"
        triggers: ["push"]
        paths: ["codex-docs/**"]
    secrets_required:
      - "RAILWAY_TOKEN"
      - "CF_API_TOKEN"
      - "CF_ZONE_ID"
      - "SENTRY_DSN"
  railway:
    auto_deploy:
      enabled: true
      branch: "main"
    healthcheck_timeout: 300  # 5 minutes
    rollback_on_failure: true
# =============================================================================
# OPERATIONAL RUNBOOKS
# =============================================================================
runbooks:
  deployment:
    manual_deploy:
      - "git checkout main"
      - "git pull origin main"
      - "git push origin main  # Triggers CI/CD"
      - "Watch Railway dashboard for deploy status"
      - "Verify /health endpoint returns 200"
    rollback:
      - "Railway dashboard → Deployments → Previous deployment → Rollback"
      - "Or: git revert <commit> && git push"
  incident_response:
    service_down:
      - "Check Railway service logs"
      - "Verify DATABASE_URL and REDIS_URL are set"
      - "Check /health endpoint"
      - "Review recent deployments"
      - "Rollback if necessary"
    database_issues:
      - "Check Postgres logs in Railway"
      - "Verify connection string"
      - "Check for migration issues"
      - "Contact Railway support if managed DB issue"
# =============================================================================
# MIGRATION PATHS
# =============================================================================
migrations:
  phase_1_to_2:
    description: "Extract API gateway and Prism console from monolith"
    steps:
      - step: 1
        action: "Create blackroad-api repo"
        command: "git subtree split --prefix=backend/app/routers"
      - step: 2
        action: "Deploy blackroad-api to Railway"
        command: "railway up --service blackroad-api"
      - step: 3
        action: "Update DNS routing"
        command: "Update Cloudflare DNS: api.blackroad.systems"
      - step: 4
        action: "Verify health checks"
        command: "curl https://api.blackroad.systems/health"
    rollback_plan:
      - "Keep monolith running during migration"
      - "Blue-green deployment with DNS switch"
      - "Monitor error rates for 24 hours before removing old service"
# =============================================================================
# METADATA
# =============================================================================
metadata:
  version: "1.0"
  schema_version: "2025-11-19"
  maintained_by:
    - "Alexa Louise Amundson (Operator)"
    - "Atlas (AI Infrastructure Orchestrator)"
    - "Cece (AI Engineer)"
  last_audit: "2025-11-18"
  next_review: "2025-12-19"
  references:
    - "MASTER_ORCHESTRATION_PLAN.md"
    - "ORG_STRUCTURE.md"
    - "PRODUCTION_STACK_AUDIT_2025-11-18.md"
    - "BLACKROAD_OS_BIG_KAHUNA_VISION.md"


@@ -0,0 +1,127 @@
# Environment Variables Template
# Copy this file to .env and fill in the values
# NEVER commit .env to git!
# =============================================================================
# CORE CONFIGURATION
# =============================================================================
# Application
APP_NAME="BlackRoad Service"
APP_VERSION="1.0.0"
# One of: development, staging, production
ENVIRONMENT=development
# Set to False in production
DEBUG=True
# Server
PORT=8000
HOST=0.0.0.0
# Security
# Generate with: openssl rand -hex 32
SECRET_KEY=
ALLOWED_ORIGINS=http://localhost:8000,http://localhost:3000
# =============================================================================
# DATABASE
# =============================================================================
# PostgreSQL (Railway managed)
# Format: postgresql+asyncpg://user:pass@host:5432/db
DATABASE_URL=
# Database pool settings
DB_POOL_SIZE=10
DB_MAX_OVERFLOW=20
# =============================================================================
# CACHE
# =============================================================================
# Redis (Railway managed)
# Format: redis://host:6379/0
REDIS_URL=
# Redis settings
REDIS_MAX_CONNECTIONS=10
# =============================================================================
# AUTHENTICATION
# =============================================================================
# JWT settings
ACCESS_TOKEN_EXPIRE_MINUTES=30
REFRESH_TOKEN_EXPIRE_DAYS=7
# Password hashing
BCRYPT_ROUNDS=12
# =============================================================================
# BLOCKCHAIN (If applicable)
# =============================================================================
# Generate with: openssl rand -hex 32
WALLET_MASTER_KEY=
# =============================================================================
# EXTERNAL APIs (Optional)
# =============================================================================
# OpenAI
OPENAI_API_KEY=
# GitHub
GITHUB_TOKEN=
# Generate with: openssl rand -hex 32
GITHUB_WEBHOOK_SECRET=
# Stripe (Payments)
STRIPE_SECRET_KEY=
STRIPE_PUBLISHABLE_KEY=
STRIPE_WEBHOOK_SECRET=
# Sentry (Error Tracking)
SENTRY_DSN=
# Twilio (SMS)
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=
TWILIO_PHONE_NUMBER=
# =============================================================================
# MONITORING & OBSERVABILITY
# =============================================================================
# Logging
# One of: DEBUG, INFO, WARNING, ERROR, CRITICAL
LOG_LEVEL=INFO
# Metrics
ENABLE_METRICS=true
METRICS_PORT=9090
# =============================================================================
# FEATURE FLAGS
# =============================================================================
ENABLE_WEBSOCKETS=false
ENABLE_GRAPHQL=false
ENABLE_RATE_LIMITING=true
# =============================================================================
# CORS
# =============================================================================
CORS_ALLOW_CREDENTIALS=true
CORS_MAX_AGE=3600
# =============================================================================
# NOTES
# =============================================================================
# 1. Generate secure random keys:
# openssl rand -hex 32
#
# 2. Never commit .env to git!
# Add .env to .gitignore
#
# 3. For Railway deployment:
# Set these in Railway dashboard
# Use ${{Postgres.DATABASE_URL}} and ${{Redis.REDIS_URL}} references
#
# 4. For local development:
# Use docker-compose for Postgres and Redis
# Or connect to Railway services


@@ -0,0 +1,59 @@
# Multi-stage Dockerfile Template for FastAPI Services
# Based on BlackRoad OS best practices
# =============================================================================
# Stage 1: Build stage
# =============================================================================
FROM python:3.11-slim AS builder
WORKDIR /build
# Install build dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*
# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
# =============================================================================
# Stage 2: Runtime stage
# =============================================================================
FROM python:3.11-slim
WORKDIR /app
# Install runtime dependencies
RUN apt-get update && apt-get install -y \
    libpq5 \
    curl \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean
# Create non-root user for security
RUN useradd -m -u 1000 appuser && \
    chown -R appuser:appuser /app
# Copy installed packages from builder
COPY --from=builder /root/.local /home/appuser/.local
# Copy application code
COPY --chown=appuser:appuser . .
# Make sure scripts in .local are usable
ENV PATH=/home/appuser/.local/bin:$PATH
# Switch to non-root user
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:${PORT:-8000}/health || exit 1
# Expose port (Railway will override with $PORT)
EXPOSE 8000
# Start command: run through a shell so that $PORT is expanded at runtime
# (exec-form arguments are not variable-expanded); falls back to 8000
CMD ["sh", "-c", "exec uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8000}"]


@@ -0,0 +1,93 @@
name: Deploy to Railway
on:
  push:
    branches:
      - main
      - staging
      - dev
    paths:
      - '**/*.py'
      - 'requirements.txt'
      - 'Dockerfile'
      - 'railway.toml'
      - 'railway.json'
      - '.github/workflows/railway-deploy.yml'
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deployment environment'
        required: true
        default: 'production'
        type: choice
        options:
          - production
          - staging
          - development
jobs:
  deploy:
    name: Deploy to Railway
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Determine environment
        id: env
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            echo "environment=${{ inputs.environment }}" >> "$GITHUB_OUTPUT"
          elif [ "${{ github.ref }}" = "refs/heads/main" ]; then
            echo "environment=production" >> "$GITHUB_OUTPUT"
          elif [ "${{ github.ref }}" = "refs/heads/staging" ]; then
            echo "environment=staging" >> "$GITHUB_OUTPUT"
          else
            echo "environment=development" >> "$GITHUB_OUTPUT"
          fi
      - name: Install Railway CLI
        run: npm i -g @railway/cli
      - name: Deploy to Railway
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}
        run: |
          railway up --service ${{ secrets.RAILWAY_SERVICE_NAME }} \
            --environment ${{ steps.env.outputs.environment }} \
            --ci
      - name: Wait for deployment
        run: sleep 30
      - name: Health check
        run: |
          if [ "${{ steps.env.outputs.environment }}" = "production" ]; then
            HEALTH_URL="https://${{ secrets.PRODUCTION_DOMAIN }}/health"
          elif [ "${{ steps.env.outputs.environment }}" = "staging" ]; then
            HEALTH_URL="https://${{ secrets.STAGING_DOMAIN }}/health"
          else
            # Same scheme as the other environments; DEV_DOMAIN holds a bare hostname
            HEALTH_URL="https://${{ secrets.DEV_DOMAIN }}/health"
          fi
          echo "Checking health at: $HEALTH_URL"
          for i in {1..10}; do
            if curl -f "$HEALTH_URL"; then
              echo "✅ Health check passed!"
              exit 0
            fi
            echo "Attempt $i failed, retrying in 10s..."
            sleep 10
          done
          echo "❌ Health check failed after 10 attempts"
          exit 1
      - name: Notify on failure
        if: failure()
        run: |
          echo "::error::Deployment to ${{ steps.env.outputs.environment }} failed!"
          # Add notification logic here (Slack, Discord, email, etc.)


@@ -0,0 +1,14 @@
{
  "$schema": "https://railway.app/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE",
    "dockerfilePath": "Dockerfile",
    "watchPatterns": ["**/*.py", "requirements.txt", "Dockerfile"]
  },
  "deploy": {
    "startCommand": "uvicorn app.main:app --host 0.0.0.0 --port $PORT",
    "healthcheckPath": "/health",
    "healthcheckTimeout": 300,
    "restartPolicyType": "ON_FAILURE"
  }
}


@@ -0,0 +1,31 @@
# Railway Deployment Configuration Template
# Copy this file to your service repository and customize

[build]
builder = "DOCKERFILE"
dockerfilePath = "Dockerfile" # Adjust if Dockerfile is in a subdirectory
# Watch patterns for auto-redeployment (a build setting in Railway's config schema)
watchPatterns = ["**/*.py", "requirements.txt", "Dockerfile"]

[deploy]
startCommand = "uvicorn app.main:app --host 0.0.0.0 --port $PORT" # Adjust for your service
healthcheckPath = "/health"
healthcheckTimeout = 300 # 5 minutes
restartPolicyType = "ON_FAILURE"
restartPolicyMaxRetries = 10

# Environment variables (set these in Railway dashboard)
# Required:
# - DATABASE_URL
# - REDIS_URL
# - SECRET_KEY
# - ENVIRONMENT
# - DEBUG
# - ALLOWED_ORIGINS
# Optional:
# - OPENAI_API_KEY
# - GITHUB_TOKEN
# - SENTRY_DSN
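
The template only lists the required variables in comments, so nothing enforces them at startup. As a sketch of how a service could fail fast when one is missing, the helper below checks a mapping against that list. The name `missing_required` and the idea of calling it at startup are assumptions for illustration, not something this commit implements.

```python
from typing import Iterable, List, Mapping

# Names copied from the "Required" comment block in the template above.
REQUIRED_VARS = (
    "DATABASE_URL",
    "REDIS_URL",
    "SECRET_KEY",
    "ENVIRONMENT",
    "DEBUG",
    "ALLOWED_ORIGINS",
)


def missing_required(
    env: Mapping[str, str],
    required: Iterable[str] = REQUIRED_VARS,
) -> List[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]
```

A service could call `missing_required(os.environ)` before creating the app and exit with a clear message if the returned list is non-empty.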

scripts/br_ops.py Executable file

@@ -0,0 +1,455 @@
#!/usr/bin/env python3
"""
BlackRoad OS Operations CLI (br-ops)

Central command-line tool for managing all BlackRoad services from the control plane.
Reads from infra/blackroad-manifest.yml to provide unified operations across all repos.

Usage:
    python scripts/br_ops.py list
    python scripts/br_ops.py env blackroad-backend
    python scripts/br_ops.py repo blackroad-backend
    python scripts/br_ops.py status
    python scripts/br_ops.py open blackroad-backend prod
    python scripts/br_ops.py health blackroad-backend

Author: Atlas (AI Infrastructure Orchestrator)
Version: 1.0.0
Date: 2025-11-19
"""

import sys
import os
import yaml
import json
from pathlib import Path
from typing import Dict, Any, List, Optional
from datetime import datetime


class Colors:
    """ANSI color codes for terminal output"""
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'


class BlackRoadOps:
    """Main operations CLI class"""

    def __init__(self):
        self.repo_root = Path(__file__).parent.parent
        self.manifest_path = self.repo_root / "infra" / "blackroad-manifest.yml"
        self.manifest = self._load_manifest()

    def _load_manifest(self) -> Dict[str, Any]:
        """Load the service manifest YAML"""
        if not self.manifest_path.exists():
            self._error(f"Manifest not found: {self.manifest_path}")
            sys.exit(1)
        with open(self.manifest_path, 'r', encoding='utf-8') as f:
            # safe_load returns None for an empty file; fall back to an empty dict
            return yaml.safe_load(f) or {}

    def _print(self, message: str, color: str = ""):
        """Print colored message"""
        if color:
            print(f"{color}{message}{Colors.ENDC}")
        else:
            print(message)

    def _error(self, message: str):
        """Print error message"""
        self._print(f"❌ ERROR: {message}", Colors.FAIL)

    def _success(self, message: str):
        """Print success message"""
        self._print(message, Colors.OKGREEN)

    def _warning(self, message: str):
        """Print warning message"""
        self._print(f"⚠️ {message}", Colors.WARNING)

    def _header(self, message: str):
        """Print section header"""
        self._print(f"\n{'=' * 80}", Colors.BOLD)
        self._print(message, Colors.HEADER + Colors.BOLD)
        self._print('=' * 80, Colors.BOLD)

    def _get_all_services(self) -> Dict[str, Dict[str, Any]]:
        """Get all services from active and planned projects"""
        all_services = {}

        # Active projects
        for project_name, project_data in self.manifest.get('projects', {}).items():
            for service_name, service_data in project_data.get('services', {}).items():
                all_services[service_name] = {
                    **service_data,
                    'project': project_name,
                    'status': project_data.get('status', 'unknown'),
                    'phase': project_data.get('phase', 'unknown')
                }

        # Planned projects
        for project_name, project_data in self.manifest.get('planned_projects', {}).items():
            for service_name, service_data in project_data.get('services', {}).items():
                all_services[service_name] = {
                    **service_data,
                    'project': project_name,
                    'status': project_data.get('status', 'planned'),
                    'phase': project_data.get('phase', 'unknown'),
                    'target_date': project_data.get('target_date', 'TBD')
                }

        return all_services

    def cmd_list(self, args: List[str]):
        """List all services with their details"""
        self._header("BLACKROAD OS SERVICES")
        services = self._get_all_services()
        if not services:
            self._warning("No services found in manifest")
            return

        # Group by status
        active_services = {k: v for k, v in services.items() if v['status'] == 'active'}
        planned_services = {k: v for k, v in services.items() if v['status'] == 'planned'}
        dev_services = {k: v for k, v in services.items() if v['status'] == 'development'}

        # Active services
        if active_services:
            self._print("\n🟢 ACTIVE SERVICES", Colors.OKGREEN + Colors.BOLD)
            self._print("-" * 80)
            for name, data in active_services.items():
                kind = data.get('kind', 'unknown')
                repo = data.get('repo', 'N/A')
                domains = data.get('domains', {})
                prod_domain = domains.get('prod', 'N/A')
                self._print(f"\n {Colors.BOLD}{name}{Colors.ENDC}")
                self._print(f" Type: {kind}")
                self._print(f" Repo: {repo}")
                self._print(f" Domain: {prod_domain}")
                self._print(f" Project: {data['project']}")
                self._print(f" Phase: {data['phase']}")

        # Development services
        if dev_services:
            self._print("\n🟡 DEVELOPMENT SERVICES", Colors.WARNING + Colors.BOLD)
            self._print("-" * 80)
            for name, data in dev_services.items():
                kind = data.get('kind', 'unknown')
                repo = data.get('repo', 'N/A')
                self._print(f"\n {Colors.BOLD}{name}{Colors.ENDC}")
                self._print(f" Type: {kind}")
                self._print(f" Repo: {repo}")
                self._print(f" Project: {data['project']}")

        # Planned services
        if planned_services:
            self._print("\n📋 PLANNED SERVICES (Future)", Colors.OKCYAN + Colors.BOLD)
            self._print("-" * 80)
            for name, data in planned_services.items():
                kind = data.get('kind', 'unknown')
                repo = data.get('repo', 'N/A')
                target = data.get('target_date', 'TBD')
                self._print(f"\n {Colors.BOLD}{name}{Colors.ENDC}")
                self._print(f" Type: {kind}")
                self._print(f" Repo: {repo}")
                self._print(f" Target Date: {target}")
                self._print(f" Project: {data['project']}")

        # Summary
        self._print("\n" + "=" * 80)
        self._print(f"Total Services: {len(services)}", Colors.BOLD)
        self._print(f" Active: {len(active_services)}")
        self._print(f" Development: {len(dev_services)}")
        self._print(f" Planned: {len(planned_services)}")

    def cmd_env(self, args: List[str]):
        """Show required environment variables for a service"""
        if not args:
            self._error("Usage: br-ops env <service-name>")
            return

        service_name = args[0]
        services = self._get_all_services()
        if service_name not in services:
            self._error(f"Service not found: {service_name}")
            self._print("\nAvailable services:")
            for name in services.keys():
                self._print(f" - {name}")
            return

        service = services[service_name]
        env_config = service.get('env', {})
        self._header(f"ENVIRONMENT VARIABLES: {service_name}")

        # Required variables
        required = env_config.get('required', [])
        if required:
            self._print("\n🔴 REQUIRED (Must Set)", Colors.FAIL + Colors.BOLD)
            self._print("-" * 80)
            for var in required:
                if isinstance(var, dict):
                    name = var.get('name')
                    desc = var.get('description', '')
                    example = var.get('example', '')
                    source = var.get('source', '')
                    generate = var.get('generate', '')
                    secret = var.get('secret', False)
                    self._print(f"\n {Colors.BOLD}{name}{Colors.ENDC}")
                    if desc:
                        self._print(f" Description: {desc}")
                    if source:
                        self._print(f" Source: {source}")
                    if example:
                        self._print(f" Example: {example}")
                    if generate:
                        self._print(f" Generate: {generate}", Colors.OKCYAN)
                    if secret:
                        self._print(" Secret: Yes (keep secure!)", Colors.WARNING)

        # Important variables
        important = env_config.get('important', [])
        if important:
            self._print("\n🟡 IMPORTANT (Recommended)", Colors.WARNING + Colors.BOLD)
            self._print("-" * 80)
            for var in important:
                if isinstance(var, dict):
                    name = var.get('name')
                    desc = var.get('description', '')
                    default = var.get('default', '')
                    self._print(f"\n {Colors.BOLD}{name}{Colors.ENDC}")
                    if desc:
                        self._print(f" Description: {desc}")
                    if default:
                        self._print(f" Default: {default}")

        # Optional variables
        optional = env_config.get('optional', [])
        if optional:
            self._print("\n🟢 OPTIONAL (Features)", Colors.OKGREEN + Colors.BOLD)
            self._print("-" * 80)
            for var in optional:
                if isinstance(var, dict):
                    name = var.get('name')
                    desc = var.get('description', '')
                    self._print(f"\n {Colors.BOLD}{name}{Colors.ENDC}")
                    if desc:
                        self._print(f" Description: {desc}")

    def cmd_repo(self, args: List[str]):
        """Show repository information for a service"""
        if not args:
            self._error("Usage: br-ops repo <service-name>")
            return

        service_name = args[0]
        services = self._get_all_services()
        if service_name not in services:
            self._error(f"Service not found: {service_name}")
            return

        service = services[service_name]
        self._header(f"REPOSITORY INFO: {service_name}")
        repo = service.get('repo', 'N/A')
        branch = service.get('branch', 'main')
        kind = service.get('kind', 'unknown')
        language = service.get('language', 'N/A')
        framework = service.get('framework', 'N/A')
        self._print(f"\nRepository: {repo}", Colors.BOLD)
        self._print(f"Branch: {branch}")
        self._print(f"Type: {kind}")
        self._print(f"Language: {language}")
        self._print(f"Framework: {framework}")

        # Git URLs
        if repo != 'N/A':
            self._print("\nGit URLs:", Colors.BOLD)
            self._print(f" HTTPS: https://github.com/{repo}.git")
            self._print(f" SSH: git@github.com:{repo}.git")

    def cmd_open(self, args: List[str]):
        """Print the service URL for an environment, plus a pointer to the Railway dashboard"""
        if not args:
            self._error("Usage: br-ops open <service-name> [env]")
            self._print(" env: prod (default), staging, dev")
            return

        service_name = args[0]
        env = args[1] if len(args) > 1 else 'prod'
        services = self._get_all_services()
        if service_name not in services:
            self._error(f"Service not found: {service_name}")
            return

        service = services[service_name]
        domains = service.get('domains', {})
        domain = domains.get(env)
        if not domain:
            self._error(f"No {env} domain configured for {service_name}")
            return

        self._header(f"SERVICE URL: {service_name} ({env})")
        self._print(f"\n🌐 {Colors.BOLD}https://{domain}{Colors.ENDC}\n")

        # Also print Railway info
        self._print("Railway Dashboard:", Colors.BOLD)
        self._print(" Visit https://railway.app/ and select your project")
        self._print(f" Service: {service_name}")

    def cmd_status(self, args: List[str]):
        """Show status of all services"""
        self._header("SERVICE STATUS")
        services = self._get_all_services()
        active_count = sum(1 for s in services.values() if s['status'] == 'active')
        planned_count = sum(1 for s in services.values() if s['status'] == 'planned')
        dev_count = sum(1 for s in services.values() if s['status'] == 'development')
        self._print(f"\nTotal Services: {len(services)}", Colors.BOLD)
        self._print(f" 🟢 Active: {active_count}")
        self._print(f" 🟡 Development: {dev_count}")
        self._print(f" 📋 Planned: {planned_count}")

        # Deployment state
        state = self.manifest.get('deployment_state', {})
        self._print(f"\nDeployment Phase: {state.get('phase', 'unknown')}", Colors.BOLD)
        self._print(f"Strategy: {state.get('strategy', 'unknown')}")
        self._print(f"Active Services: {state.get('active_services', 'unknown')}")
        self._print(f"Target Phase: {state.get('target_phase', 'unknown')}")

        # Instructions for health checks
        self._print("\n" + "=" * 80)
        self._print("Health Check Commands:", Colors.BOLD)
        self._print(" curl https://blackroad.systems/health")
        self._print(" curl https://blackroad.systems/api/health/summary")

    def cmd_health(self, args: List[str]):
        """Show health check instructions for a service"""
        if not args:
            self._error("Usage: br-ops health <service-name>")
            return

        service_name = args[0]
        services = self._get_all_services()
        if service_name not in services:
            self._error(f"Service not found: {service_name}")
            return

        service = services[service_name]
        domains = service.get('domains', {})
        prod_domain = domains.get('prod')
        entrypoints = service.get('entrypoints', [])
        self._header(f"HEALTH CHECKS: {service_name}")
        if not prod_domain:
            self._warning("No production domain configured")
            return

        self._print(f"\nProduction Domain: https://{prod_domain}\n", Colors.BOLD)
        if entrypoints:
            self._print("Health Endpoints:", Colors.BOLD)
            for endpoint in entrypoints:
                if isinstance(endpoint, dict):
                    path = endpoint.get('path')
                    method = endpoint.get('method', 'GET')
                    desc = endpoint.get('description', '')
                    self._print(f"\n {method} {path}")
                    if desc:
                        self._print(f" {desc}")
                    self._print(f" curl https://{prod_domain}{path}", Colors.OKCYAN)
                else:
                    self._print(f"\n GET {endpoint}")
                    self._print(f" curl https://{prod_domain}{endpoint}", Colors.OKCYAN)

    def cmd_help(self, args: List[str]):
        """Show help message"""
        self._header("BLACKROAD OS OPERATIONS CLI")
        self._print("\nManage all BlackRoad services from the control plane.\n")
        self._print("Usage:", Colors.BOLD)
        self._print("  python scripts/br_ops.py <command> [args]\n")
        self._print("Commands:", Colors.BOLD)
        self._print("  list                  List all services")
        self._print("  env <service>         Show environment variables for service")
        self._print("  repo <service>        Show repository info for service")
        self._print("  open <service> [env]  Show service URL (env: prod, staging, dev)")
        self._print("  status                Show overall status")
        self._print("  health <service>      Show health check commands")
        self._print("  help                  Show this help message")
        self._print("\nExamples:", Colors.BOLD)
        self._print("  python scripts/br_ops.py list")
        self._print("  python scripts/br_ops.py env blackroad-backend")
        self._print("  python scripts/br_ops.py repo blackroad-backend")
        self._print("  python scripts/br_ops.py open blackroad-backend prod")
        self._print("  python scripts/br_ops.py health blackroad-backend")
        self._print("\nNote:", Colors.WARNING)
        self._print("  This tool reads from infra/blackroad-manifest.yml")
        self._print("  For actual deployment operations, use Railway CLI or GitHub Actions")

    def run(self):
        """Main entry point"""
        if len(sys.argv) < 2:
            self.cmd_help([])
            return

        command = sys.argv[1]
        args = sys.argv[2:]
        commands = {
            'list': self.cmd_list,
            'env': self.cmd_env,
            'repo': self.cmd_repo,
            'open': self.cmd_open,
            'status': self.cmd_status,
            'health': self.cmd_health,
            'help': self.cmd_help,
        }
        if command not in commands:
            self._error(f"Unknown command: {command}")
            self._print("\nRun 'python scripts/br_ops.py help' for usage")
            sys.exit(1)

        try:
            commands[command](args)
        except Exception as e:
            self._error(f"Command failed: {e}")
            import traceback
            traceback.print_exc()
            sys.exit(1)


if __name__ == '__main__':
    cli = BlackRoadOps()
    cli.run()
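
The manifest layout the CLI expects can be inferred from the keys it reads with `.get()`. The sketch below shows one manifest shape consistent with those lookups, together with a standalone version of the flattening logic in `_get_all_services`. The project names, the `kind` values, and the `example-org/example-repo` repo slug are hypothetical placeholders; the real infra/blackroad-manifest.yml may be structured differently or carry more fields.

```python
from typing import Any, Dict

# Hypothetical manifest, shaped after the keys br_ops.py looks up.
SAMPLE_MANIFEST: Dict[str, Any] = {
    "projects": {
        "blackroad-core": {  # hypothetical project name
            "status": "active",
            "phase": 1,
            "services": {
                "blackroad-backend": {
                    "kind": "backend",
                    "repo": "example-org/example-repo",  # placeholder slug
                    "domains": {"prod": "blackroad.systems"},
                },
            },
        },
    },
    "planned_projects": {
        "blackroad-phase2": {  # hypothetical project name
            "phase": 2,
            "services": {"blackroad-api": {"kind": "api-gateway"}},
        },
    },
}


def flatten_services(manifest: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Merge active and planned services into one dict keyed by service name."""
    services: Dict[str, Dict[str, Any]] = {}
    sections = (("projects", "unknown"), ("planned_projects", "planned"))
    for section, default_status in sections:
        for project_name, project in (manifest.get(section) or {}).items():
            for service_name, service in (project.get("services") or {}).items():
                entry = {
                    **service,
                    "project": project_name,
                    "status": project.get("status", default_status),
                    "phase": project.get("phase", "unknown"),
                }
                # Only planned projects carry a target date in the CLI's output.
                if section == "planned_projects":
                    entry["target_date"] = project.get("target_date", "TBD")
                services[service_name] = entry
    return services
```

Keeping the flattening as a pure function over a plain dict makes it testable without a YAML file on disk. As in the CLI, a service name that appears under more than one project is overwritten by the later entry.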