mirror of https://github.com/blackboxprogramming/BlackRoad-Operating-System.git synced 2026-03-17 04:57:15 -05:00


Claude abdbc764e6 Establish BlackRoad OS infrastructure control plane

Add comprehensive infrastructure management system to centralize all service
definitions, deployment configurations, and operational tooling.

## New Infrastructure Components

### 1. Service Manifest (infra/blackroad-manifest.yml)
- Complete catalog of all active and planned services
- Deployment configuration for each service
- Environment variable definitions
- Domain mappings and routing
- Database and cache dependencies
- Health check endpoints
- CI/CD integration specifications

### 2. Operations CLI (scripts/br_ops.py)
- Command-line tool for managing all BlackRoad services
- Commands: list, env, repo, open, status, health
- Reads from service manifest for unified operations
- Colored terminal output for better readability
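A manifest-driven CLI of this kind reduces to argparse subcommands that look services up in one parsed structure. The sketch below is an illustration, not the contents of scripts/br_ops.py: it uses an in-memory dict in place of the YAML file, covers only `list`, `env` and `open`, and its manifest keys (`status`, `env`, `urls`) are assumptions.

```python
import argparse
import sys

# Stand-in for the parsed manifest; the real br_ops.py reads infra/blackroad-manifest.yml.
MANIFEST = {
    "blackroad-backend": {
        "status": "active",
        "env": ["DATABASE_URL", "REDIS_URL", "SECRET_KEY"],
        "urls": {"prod": "https://blackroad.systems"},
    },
    "blackroad-api": {"status": "planned", "env": [], "urls": {}},
}


def build_parser(manifest: dict) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="br_ops")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list", help="list all services and their status")
    p_env = sub.add_parser("env", help="show a service's environment variables")
    p_env.add_argument("service", choices=sorted(manifest))
    p_open = sub.add_parser("open", help="show a service URL for an environment")
    p_open.add_argument("service", choices=sorted(manifest))
    p_open.add_argument("env", nargs="?", default="prod")
    return parser


def run(argv: list[str], manifest: dict = MANIFEST) -> str:
    """Parse argv and return the text the command would print."""
    args = build_parser(manifest).parse_args(argv)
    if args.command == "list":
        return "\n".join(f"{name:<20} {svc['status']}" for name, svc in sorted(manifest.items()))
    service = manifest[args.service]
    if args.command == "env":
        return "\n".join(service["env"])
    return service["urls"].get(args.env, f"no {args.env} URL for {args.service}")


def main() -> None:
    print(run(sys.argv[1:]))
```

`run(["open", "blackroad-backend", "prod"])` returns the production URL. Wiring `main()` to an `if __name__ == "__main__":` guard turns the sketch into a script, and keeping `run` free of printing makes each command easy to unit-test.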

### 3. Service Analysis Documents (infra/analysis/)
- Detailed technical analysis for each service
- Active services:
  - blackroad-backend.md (FastAPI backend)
  - postgres.md (PostgreSQL database)
  - redis.md (Redis cache)
  - docs-site.md (MkDocs documentation)
- Planned services:
  - blackroad-api.md (API gateway - Phase 2)
  - prism-console.md (Admin console - Phase 2)

### 4. Infrastructure Templates (infra/templates/)
- railway.toml.template - Railway deployment config
- railway.json.template - Alternative Railway config
- Dockerfile.fastapi.template - Multi-stage FastAPI Dockerfile
- github-workflow-railway-deploy.yml.template - CI/CD workflow
- .env.example.template - Comprehensive env var template

### 5. Documentation (infra/README.md)
- Complete guide to infrastructure control plane
- Usage instructions for ops CLI
- Service manifest documentation
- Deployment procedures
- Troubleshooting guide
- Phase 2 migration plan

## Architecture

This establishes BlackRoad-Operating-System as the canonical control plane
for all BlackRoad services, both current and planned:

**Phase 1 (Active)**:
- blackroad-backend (FastAPI + static UI)
- postgres (Railway managed)
- redis (Railway managed)
- docs-site (GitHub Pages)

**Phase 2 (Planned)**:
- blackroad-api (API gateway)
- blackroad-prism-console (Admin UI)
- blackroad-agents (Orchestration)
- blackroad-web (Marketing site)

**Phase 3 (Future)**:
- lucidia (AI orchestration)
- Additional microservices

## Usage

# List all services
python scripts/br_ops.py list

# Show environment variables
python scripts/br_ops.py env blackroad-backend

# Show repository info
python scripts/br_ops.py repo blackroad-backend

# Show service URL
python scripts/br_ops.py open blackroad-backend prod

# Show overall status
python scripts/br_ops.py status

# Show health checks
python scripts/br_ops.py health blackroad-backend

## Benefits

1. **Single Source of Truth**: All service configuration in one manifest
2. **Unified Operations**: One CLI for all services
3. **Documentation**: Comprehensive per-service analysis
4. **Templates**: Reusable infrastructure patterns
5. **Migration Ready**: Clear path to Phase 2 microservices

## References

- MASTER_ORCHESTRATION_PLAN.md - 7-layer architecture
- ORG_STRUCTURE.md - Repository strategy
- PRODUCTION_STACK_AUDIT_2025-11-18.md - Current state

Implemented by: Atlas (AI Infrastructure Orchestrator)
Date: 2025-11-19

2025-11-19 21:04:14 +00:00


Service Analysis: blackroad-backend

Status: ✅ ACTIVE (Production)
Last Analyzed: 2025-11-19
Service Type: Backend API + Static UI Server
Repository: blackboxprogramming/BlackRoad-Operating-System (monorepo)

Overview

The blackroad-backend service is the canonical production backend for BlackRoad OS. It serves multiple purposes:

REST API gateway (33+ routers)
Static UI hosting (Pocket OS at /)
Health & monitoring endpoints
WebSocket support (planned)

Technology Stack

Language & Framework

Language: Python 3.11+
Framework: FastAPI 0.104.1
ASGI Server: Uvicorn 0.24.0
Async Support: Full async/await with asyncio

Dependencies

Web: FastAPI, Uvicorn, Pydantic 2.5.0
Database: SQLAlchemy 2.0.23 (async), asyncpg, psycopg2-binary
Cache: redis-py 5.0.1, hiredis 2.2.3
Auth: python-jose (JWT), passlib (bcrypt)
Testing: pytest, pytest-asyncio, pytest-cov
Monitoring: prometheus-client, sentry-sdk

Current Endpoints

Core System

GET / → Pocket OS UI (static HTML/CSS/JS)
GET /health → Basic health check
GET /api/health/summary → Comprehensive health with integration status
GET /api/system/version → System version info
GET /api/system/config/public → Public configuration
GET /api/docs → OpenAPI/Swagger UI

Authentication

POST /api/auth/register → User registration
POST /api/auth/login → User login (JWT)
POST /api/auth/refresh → Token refresh
GET /api/auth/me → Current user

Blockchain (RoadChain)

GET /api/blockchain/blocks → List blocks
POST /api/blockchain/blocks → Create block
GET /api/blockchain/verify → Verify chain integrity
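RoadChain's actual block format and hashing rules are not described here. The sketch below shows what "verify chain integrity" generally means for a hash-linked chain, assuming hypothetical fields (`index`, `previous_hash`, `data`, `block_hash`) and SHA-256 over canonical JSON:

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class Block:
    index: int
    previous_hash: str
    data: str
    block_hash: str = ""

    def compute_hash(self) -> str:
        # Hash a canonical JSON encoding of every field except the stored hash.
        payload = json.dumps(
            {"index": self.index, "previous_hash": self.previous_hash, "data": self.data},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


def append_block(chain: list[Block], data: str) -> Block:
    prev = chain[-1].block_hash if chain else GENESIS_PREV
    block = Block(index=len(chain), previous_hash=prev, data=data)
    block.block_hash = block.compute_hash()
    chain.append(block)
    return block


def verify_chain(chain: list[Block]) -> bool:
    """True if every block's hash matches its contents and links to its predecessor."""
    for i, block in enumerate(chain):
        if block.index != i or block.block_hash != block.compute_hash():
            return False  # reordered block, or contents changed after hashing
        expected_prev = chain[i - 1].block_hash if i > 0 else GENESIS_PREV
        if block.previous_hash != expected_prev:
            return False  # broken link to the previous block
    return True
```

Because each block stores its predecessor's hash, editing any earlier block invalidates that block's own hash and, if rehashed, every later link.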

Applications

/api/email/* → RoadMail
/api/social/* → BlackRoad Social
/api/video/* → BlackStream
/api/miner/* → Mining operations
/api/dashboard/* → Dashboard data
/api/ai_chat/* → AI/Lucidia integration

Integrations (30+ routers)

See backend/app/main.py for the full list of routers.

Infrastructure

Deployment

Platform: Railway
Container: Docker (multi-stage build)
Dockerfile: backend/Dockerfile
Build Context: Repository root
Start Command (Dockerfile CMD): uvicorn app.main:app --host 0.0.0.0 --port $PORT

Resources (Current)

Memory: ~512MB
CPU: Shared
Port: Dynamic ($PORT assigned by Railway)

Healthcheck

Path: /health
Interval: 30s
Timeout: 5s
Retries: 3
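The interval, timeout and retry values above can be read as: probe every 30 s, treat a probe slower than 5 s as failed, and report unhealthy only after 3 consecutive failures (the Docker HEALTHCHECK meaning of retries, assumed here). A stdlib sketch of that policy, with an injectable probe and sleep so it can be exercised offline:

```python
import time
import urllib.request
from typing import Callable

INTERVAL_S = 30  # time between probes
TIMEOUT_S = 5    # a probe slower than this counts as a failure
RETRIES = 3      # consecutive failures before the service is reported unhealthy


def http_probe(url: str, timeout: float = TIMEOUT_S) -> bool:
    """True if GET url answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


class HealthTracker:
    """Counts consecutive failures; any success resets the counter."""

    def __init__(self, retries: int = RETRIES) -> None:
        self.retries = retries
        self.consecutive_failures = 0

    def record(self, ok: bool) -> str:
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return "unhealthy" if self.consecutive_failures >= self.retries else "healthy"


def watch(url: str, rounds: int,
          probe: Callable[[str], bool] = http_probe,
          sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Probe `url` `rounds` times, INTERVAL_S apart, and return the reported states."""
    tracker = HealthTracker()
    states = []
    for _ in range(rounds):
        states.append(tracker.record(probe(url)))
        sleep(INTERVAL_S)
    return states
```

Requiring consecutive failures keeps a single slow response during a deploy from flapping the service to unhealthy.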

Database Schema

Models (SQLAlchemy ORM)

Location: backend/app/models/

Core Models:

User → User accounts
Wallet → Blockchain wallets
Block → RoadChain blocks
Transaction → Blockchain transactions
Job → Prism jobs (future)
Event → Audit events

Relationships:

User → Wallets (1:many)
User → Jobs (1:many)
Block → Transactions (1:many)
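The real models are SQLAlchemy ORM classes on Postgres. To show only the shape of the 1:many relationships, the sketch below swaps in stdlib sqlite3 with hypothetical table and column names; each "many" side holds a foreign key to its parent:

```python
import sqlite3

# Illustrative tables only; column names are hypothetical.
SCHEMA = """
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE wallets (                 -- User -> Wallets (1:many)
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    address TEXT NOT NULL UNIQUE
);
CREATE TABLE jobs (                    -- User -> Jobs (1:many)
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    status  TEXT NOT NULL
);
CREATE TABLE blocks (
    id         INTEGER PRIMARY KEY,
    block_hash TEXT NOT NULL UNIQUE
);
CREATE TABLE transactions (            -- Block -> Transactions (1:many)
    id       INTEGER PRIMARY KEY,
    block_id INTEGER NOT NULL REFERENCES blocks(id),
    amount   INTEGER NOT NULL          -- smallest currency unit, avoids float rounding
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
    conn.executescript(SCHEMA)
    return conn


def wallet_addresses(conn: sqlite3.Connection, user_id: int) -> list[str]:
    rows = conn.execute(
        "SELECT address FROM wallets WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]
```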

Migrations

Tool: Alembic 1.12.1
Location: backend/alembic/
Auto-upgrade: Disabled (manual for safety)

Caching Strategy

Redis Usage

Session Storage:
- Key pattern: session:{user_id}
- TTL: 1 hour
API Response Cache:
- Key pattern: api:cache:{endpoint}:{params_hash}
- TTL: 5-60 minutes (varies by endpoint)
WebSocket State (future):
- Pub/sub channels for real-time updates
Rate Limiting:
- Key pattern: ratelimit:{ip}:{endpoint}
- TTL: 1 minute
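The key patterns and TTLs above can be captured in a few helpers. This sketch assumes `params_hash` is a truncated SHA-256 of canonical JSON (the actual scheme is not documented) and simulates the rate-limit counter in memory. With redis-py the same counter would be `r.incr(key)` followed by `r.expire(key, RATE_WINDOW_S)` when the returned count is 1; the request limit itself is left as a parameter because no value is given above.

```python
import hashlib
import json
import time
from typing import Callable

SESSION_TTL_S = 60 * 60   # session:{user_id} lives 1 hour
RATE_WINDOW_S = 60        # ratelimit:{ip}:{endpoint} lives 1 minute


def session_key(user_id: int) -> str:
    return f"session:{user_id}"


def cache_key(endpoint: str, params: dict) -> str:
    # Assumption: params_hash is a digest of canonical JSON, so argument order is irrelevant.
    digest = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()[:16]
    return f"api:cache:{endpoint}:{digest}"


def ratelimit_key(ip: str, endpoint: str) -> str:
    return f"ratelimit:{ip}:{endpoint}"


class FixedWindowLimiter:
    """In-memory stand-in for the Redis INCR + EXPIRE fixed-window counter."""

    def __init__(self, limit: int, window_s: int = RATE_WINDOW_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._counters: dict[str, tuple[int, float]] = {}  # key -> (count, expires_at)

    def allow(self, ip: str, endpoint: str) -> bool:
        key = ratelimit_key(ip, endpoint)
        now = self.clock()
        count, expires_at = self._counters.get(key, (0, now + self.window_s))
        if now >= expires_at:  # the key would have expired in Redis: start a new window
            count, expires_at = 0, now + self.window_s
        count += 1             # INCR
        self._counters[key] = (count, expires_at)
        return count <= self.limit
```

Session writes would pair `session_key` with `SESSION_TTL_S`, for example via redis-py's `setex(name, time, value)`.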

Security

Authentication

Method: JWT (JSON Web Tokens)
Access Token: 30 minutes expiry
Refresh Token: 7 days expiry
Algorithm: HS256
Secret: SECRET_KEY environment variable

Password Hashing

Algorithm: bcrypt
Rounds: 12 (default)

CORS

Allowed Origins: Configurable via ALLOWED_ORIGINS env var
Default: https://blackroad.systems
Credentials: Allowed

Input Validation

Framework: Pydantic schemas
SQL Injection: Protected (SQLAlchemy ORM)
XSS: Frontend sanitization

Current Issues & Fixes

Recent Deployment Issues (Fixed)

✅ Fixed 2025-11-18: Railway startCommand mismatch

Issue: cd backend && uvicorn ... failed inside the container
Fix: Removed the startCommand override so the Dockerfile CMD handles startup

✅ Fixed 2025-11-18: Dockerfile security hardening

Added non-root user
Multi-stage build for smaller image
Health check integrated

Known Limitations

⚠️ Prism Console: Not yet served at /prism (planned)
⚠️ WebSockets: Not yet implemented (planned)
⚠️ GraphQL: Not yet available (REST-only)

Environment Variables

Critical (Must Set)

DATABASE_URL=${{Postgres.DATABASE_URL}}
REDIS_URL=${{Redis.REDIS_URL}}
SECRET_KEY=<generate>
ENVIRONMENT=production
DEBUG=False
ALLOWED_ORIGINS=https://blackroad.systems
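A startup check can catch a missing critical variable or a leftover `<generate>` placeholder before the service accepts traffic. This is a sketch: the 32-character floor for SECRET_KEY is an assumed conservative value, not a documented requirement, and `secrets.token_urlsafe` is one stdlib way to produce a value for the `<generate>` slots.

```python
import os
import secrets
from collections.abc import Mapping

CRITICAL_VARS = ("DATABASE_URL", "REDIS_URL", "SECRET_KEY",
                 "ENVIRONMENT", "DEBUG", "ALLOWED_ORIGINS")
MIN_SECRET_LENGTH = 32  # assumption: a conservative floor, not a documented requirement


def check_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return configuration problems; an empty list means the critical set looks sane."""
    errors = [f"{name} is not set" for name in CRITICAL_VARS if not env.get(name)]
    secret = env.get("SECRET_KEY", "")
    if secret and (secret == "<generate>" or len(secret) < MIN_SECRET_LENGTH):
        errors.append("SECRET_KEY looks like a placeholder or is too short")
    if env.get("ENVIRONMENT") == "production" and env.get("DEBUG", "").lower() in ("true", "1", "yes"):
        errors.append("DEBUG must be False in production")
    return errors


def generate_secret() -> str:
    # 32 random bytes, URL-safe base64 encoded (43 characters).
    return secrets.token_urlsafe(32)
```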

Important

ACCESS_TOKEN_EXPIRE_MINUTES=30
REFRESH_TOKEN_EXPIRE_DAYS=7
WALLET_MASTER_KEY=<generate>
API_BASE_URL=https://blackroad.systems
FRONTEND_URL=https://blackroad.systems

Optional (Features)

OPENAI_API_KEY=sk-...
GITHUB_TOKEN=ghp_...
GITHUB_WEBHOOK_SECRET=<generate>
STRIPE_SECRET_KEY=sk_...
SENTRY_DSN=https://...

Monitoring & Observability

Metrics

Prometheus: Custom metrics at /metrics (planned)
Railway: Built-in CPU, memory, network metrics

Logging

Format: Structured JSON
Level: INFO (production), DEBUG (development)
Destination: stdout → Railway logs
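Structured JSON logging to stdout needs only the standard logging module. A minimal sketch, with the output field names (`timestamp`, `level`, `logger`, `message`, `exception`) chosen for illustration rather than taken from the codebase:

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style args to the message
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(environment: str) -> logging.Logger:
    """Send JSON lines to stdout: INFO in production, DEBUG everywhere else."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(logging.INFO if environment == "production" else logging.DEBUG)
    return root
```

One object per line keeps each event a single Railway log entry and lets the fields be filtered without regex parsing.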

Error Tracking

Sentry: Configured when SENTRY_DSN set
Coverage: All uncaught exceptions

Alerts (Recommended)

Health check failures
Error rate > 5%
Response time p95 > 1s
Memory usage > 80%
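The recommended thresholds can be expressed as a single evaluation over a metrics window. This sketch assumes the metrics have already been collected (the window structure is hypothetical) and computes p95 with the nearest-rank method, one of several common percentile definitions:

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical metrics for one evaluation period."""
    requests: int
    errors: int
    latencies_s: list[float]
    memory_used_mb: float
    memory_limit_mb: float  # e.g. the ~512MB allocation noted under Resources
    health_ok: bool


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile: smallest sample with at least 95% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = -(-95 * len(ordered) // 100)  # integer ceil(0.95 * n), avoids float rounding
    return ordered[rank - 1]


def alerts(w: Window) -> list[str]:
    fired = []
    if not w.health_ok:
        fired.append("health check failing")
    if w.requests and w.errors / w.requests > 0.05:
        fired.append("error rate > 5%")
    if w.latencies_s and p95(w.latencies_s) > 1.0:
        fired.append("p95 latency > 1s")
    if w.memory_used_mb / w.memory_limit_mb > 0.80:
        fired.append("memory > 80%")
    return fired
```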

Performance

Current Benchmarks

Cold start: ~3-5 seconds
Warm response: ~50-200ms (cached)
Database query: ~10-50ms (simple)
API throughput: ~100 req/s (estimated)

Optimization Opportunities

Add Redis caching for expensive queries
Implement CDN for static assets
Enable Gzip compression
Database connection pooling (already configured)
Async background tasks with Celery

Testing

Test Suite

Location: backend/tests/
Framework: pytest + pytest-asyncio
Coverage: ~60-75% (target: 80%)

Test Types

Unit: Individual function tests
Integration: Database + Redis tests
API: Endpoint contract tests
E2E: Frontend + backend flows (manual)

Running Tests

cd backend
pytest -v --cov=app

Development Workflow

Local Setup

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with local DATABASE_URL, REDIS_URL, SECRET_KEY
uvicorn app.main:app --reload

Docker Setup

cd backend
docker-compose up
# Starts FastAPI, Postgres, Redis, Adminer

Making Changes

Create feature branch: git checkout -b feature/my-feature
Make changes in backend/app/
Add tests in backend/tests/
Run tests: pytest -v
Commit and push
Open PR → CI runs → Auto-merge (if passing)

Deployment Process

Automatic (Recommended)

git checkout main
git pull origin main
# Merge PR via GitHub UI or merge queue
# GitHub Actions triggers Railway deployment
# Monitor Railway dashboard for status

Manual (Emergency)

railway login
railway link <PROJECT_ID>
railway up --service blackroad-backend
railway logs --tail 100

Verification

curl https://blackroad.systems/health
# Should return: {"status": "healthy", "environment": "production"}

curl https://blackroad.systems/api/docs
# Should return Swagger UI HTML
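The two curl checks can be scripted as a post-deploy smoke test. A stdlib sketch: the expected `/health` body comes from the comment above, while matching `/api/docs` on an HTML content type is an assumption about how Swagger UI is served.

```python
import json
import urllib.request

BASE_URL = "https://blackroad.systems"


def check_health_payload(body: bytes, environment: str = "production") -> list[str]:
    """Compare a /health response body against the documented expectation."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return ["/health did not return JSON"]
    if not isinstance(data, dict):
        return ["/health did not return a JSON object"]
    problems = []
    if data.get("status") != "healthy":
        problems.append(f"status is {data.get('status')!r}, expected 'healthy'")
    if data.get("environment") != environment:
        problems.append(f"environment is {data.get('environment')!r}, expected {environment!r}")
    return problems


def smoke_test(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Hit /health and /api/docs; return a list of problems (empty means OK)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        problems = check_health_payload(resp.read())
    with urllib.request.urlopen(f"{base_url}/api/docs", timeout=timeout) as resp:
        if "html" not in resp.headers.get("Content-Type", ""):
            problems.append("/api/docs did not return HTML")
    return problems
```

Calling `smoke_test()` after a deploy (or after a rollback) gives a pass/fail signal that a CI step can act on.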

Rollback Procedure

Via Railway Dashboard

Go to Railway → Deployments
Find previous working deployment
Click "Rollback" button
Verify health check passes

Via Git

git revert <bad-commit>
git push origin main
# Wait for auto-deployment

Future Enhancements

Phase 2 (Q1-Q2 2026)

Extract API gateway to separate service
Serve Prism Console at /prism
WebSocket support for real-time updates
GraphQL endpoint for flexible queries

Phase 3 (Q3-Q4 2026)

Microservices architecture
Service mesh (Istio/Linkerd)
Multi-region deployment
Advanced caching with CDN

Dependencies

Runtime Dependencies

Postgres: Required (managed by Railway)
Redis: Required (managed by Railway)
Cloudflare: Optional (for CDN + DNS)

External APIs (Optional)

OpenAI API (for AI features)
GitHub API (for agent integrations)
Stripe API (for payments)
Twilio API (for SMS)

Contact & Support

Primary Operator: Alexa Louise Amundson (Cadillac)
AI Assistant: Atlas (Infrastructure), Cece (Engineering)
Documentation: See CLAUDE.md, MASTER_ORCHESTRATION_PLAN.md
Issues: GitHub Issues in blackboxprogramming/BlackRoad-Operating-System

Analysis Date: 2025-11-19
Next Review: 2025-12-19
Maintained By: Atlas (AI Infrastructure Orchestrator)
