Establish BlackRoad OS infrastructure control plane

Add comprehensive infrastructure management system to centralize all service definitions, deployment configurations, and operational tooling.

## New Infrastructure Components

### 1. Service Manifest (infra/blackroad-manifest.yml)
- Complete catalog of all active and planned services
- Deployment configuration for each service
- Environment variable definitions
- Domain mappings and routing
- Database and cache dependencies
- Health check endpoints
- CI/CD integration specifications

### 2. Operations CLI (scripts/br_ops.py)
- Command-line tool for managing all BlackRoad services
- Commands: list, env, repo, open, status, health
- Reads from service manifest for unified operations
- Colored terminal output for better readability

### 3. Service Analysis Documents (infra/analysis/)
- Detailed technical analysis for each service
- Active services:
  - blackroad-backend.md (FastAPI backend)
  - postgres.md (PostgreSQL database)
  - redis.md (Redis cache)
  - docs-site.md (MkDocs documentation)
- Planned services:
  - blackroad-api.md (API gateway - Phase 2)
  - prism-console.md (Admin console - Phase 2)

### 4. Infrastructure Templates (infra/templates/)
- railway.toml.template - Railway deployment config
- railway.json.template - Alternative Railway config
- Dockerfile.fastapi.template - Multi-stage FastAPI Dockerfile
- github-workflow-railway-deploy.yml.template - CI/CD workflow
- .env.example.template - Comprehensive env var template

### 5. Documentation (infra/README.md)
- Complete guide to infrastructure control plane
- Usage instructions for ops CLI
- Service manifest documentation
- Deployment procedures
- Troubleshooting guide
- Phase 2 migration plan

## Architecture

This establishes BlackRoad-Operating-System as the canonical control plane for all BlackRoad services, both current and planned:

**Phase 1 (Active)**:
- blackroad-backend (FastAPI + static UI)
- postgres (Railway managed)
- redis (Railway managed)
- docs-site (GitHub Pages)

**Phase 2 (Planned)**:
- blackroad-api (API gateway)
- blackroad-prism-console (Admin UI)
- blackroad-agents (Orchestration)
- blackroad-web (Marketing site)

**Phase 3 (Future)**:
- lucidia (AI orchestration)
- Additional microservices

## Usage

    # List all services
    python scripts/br_ops.py list

    # Show environment variables
    python scripts/br_ops.py env blackroad-backend

    # Show repository info
    python scripts/br_ops.py repo blackroad-backend

    # Show service URL
    python scripts/br_ops.py open blackroad-backend prod

    # Show overall status
    python scripts/br_ops.py status

    # Show health checks
    python scripts/br_ops.py health blackroad-backend

## Benefits

1. **Single Source of Truth**: All service configuration in one manifest
2. **Unified Operations**: One CLI for all services
3. **Documentation**: Comprehensive per-service analysis
4. **Templates**: Reusable infrastructure patterns
5. **Migration Ready**: Clear path to Phase 2 microservices

## References

- MASTER_ORCHESTRATION_PLAN.md - 7-layer architecture
- ORG_STRUCTURE.md - Repository strategy
- PRODUCTION_STACK_AUDIT_2025-11-18.md - Current state

Implemented by: Atlas (AI Infrastructure Orchestrator)
Date: 2025-11-19
2026-03-17 05:57:21 -05:00 · 2025-11-19 21:04:14 +00:00
parent 2610c3a07f
commit abdbc764e6
14 changed files with 3090 additions and 0 deletions
--- a/infra/analysis/redis.md
+++ b/infra/analysis/redis.md
@@ -0,0 +1,163 @@
+# Service Analysis: Redis
+
+**Status**: ✅ ACTIVE (Production)
+**Last Analyzed**: 2025-11-19
+**Service Type**: Managed Cache (Redis)
+**Provider**: Railway
+
+---
+
+## Overview
+
+Managed Redis cache service provided by Railway. Used for session storage, API caching, WebSocket state, and pub/sub messaging.
+
+---
+
+## Configuration
+
+### Version
+- **Redis**: 7+ (Railway managed)
+
+### Resources
+- **Memory**: 256MB (default, configurable)
+- **Eviction Policy**: `allkeys-lru` (least recently used)
+- **Persistence**: RDB snapshots (Railway managed)
+
+---
+
+## Usage Patterns
+
+### Session Storage
+```python
+# Store session
+await redis.setex(
+    f"session:{user_id}",
+    3600,  # 1 hour TTL
+    json.dumps(session_data)
+)
+
+# Retrieve session
+session_json = await redis.get(f"session:{user_id}")
+```
+
+### API Response Caching
+```python
+# Cache API response
+cache_key = f"api:cache:{endpoint}:{params_hash}"
+await redis.setex(cache_key, 300, json.dumps(response))  # 5 min TTL
+
+# Retrieve cached response
+cached = await redis.get(cache_key)
+```
+
+### WebSocket Events via Pub/Sub (Planned)
+```python
+# Pub/sub for real-time updates
+await redis.publish("prism:events", json.dumps(event))
+
+# Subscribe to events
+pubsub = redis.pubsub()
+await pubsub.subscribe("prism:events")
+```
+
+### Rate Limiting
+```python
+# Track API rate limits (fixed 1-minute window; INCR and EXPIRE are separate, non-atomic calls)
+key = f"ratelimit:{ip}:{endpoint}"
+count = await redis.incr(key)
+if count == 1:
+    await redis.expire(key, 60)  # 1 minute window
+```
+
+---
+
+## Key Namespaces
+
+| Namespace | Pattern | TTL | Purpose |
+|-----------|---------|-----|---------|
+| `session:*` | `session:{user_id}` | 1 hour | User session data |
+| `api:cache:*` | `api:cache:{endpoint}:{hash}` | 5-60 min | Cached API responses |
+| `ratelimit:*` | `ratelimit:{ip}:{endpoint}` | 1 min | Rate limit counters |
+| `websocket:*` | `websocket:{connection_id}` | Variable | WebSocket state |
+| `prism:*` | Pub/sub channels | N/A | Event bus |
+
+---
+
+## Performance
+
+### Metrics
+- **Hit Rate**: Target > 80%
+- **Latency**: Target < 1 ms per command (within the private network)
+- **Memory Usage**: Monitor for evictions
+
+### Optimization
+- Use pipelining for bulk operations
+- Implement connection pooling
+- Use hiredis for faster parsing
+- Monitor key expiration patterns
+
+---
+
+## Monitoring
+
+### Metrics (Railway Dashboard)
+- Memory usage
+- Connections count
+- Commands per second
+- Evicted keys
+
+### Alerts (Recommended)
+- Memory > 90% full
+- High eviction rate
+- Connection errors
+
+---
+
+## Connection Details
+
+### Environment Variable
+```bash
+REDIS_URL=${{Redis.REDIS_URL}}
+# Format: redis://[user:password@]host:port[/db]
+```
+
+### Connection Pool
+```python
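+# backend/app/redis_client.py (assumes redis-py >= 4.2; aioredis 1.x create_redis_pool() was removed in 2.0)
+import redis.asyncio as aioredis
+redis = aioredis.from_url(
+    REDIS_URL,
+    max_connections=10,
+    encoding="utf-8", decode_responses=True,
+)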
+```
+
+---
+
+## Security
+
+### Access Control
+- **Network**: Private Railway network only
+- **Credentials**: Auto-generated, injected via `${{Redis.REDIS_URL}}`
+- **Encryption**: TLS optional (not required on private network)
+
+---
+
+## Troubleshooting
+
+### Connection Issues
+1. Verify `REDIS_URL` is set
+2. Check Redis service status
+3. Verify network connectivity
+4. Check connection pool exhaustion
+
+### Memory Issues
+1. Check eviction metrics
+2. Analyze key distribution
+3. Adjust TTLs for less critical data
+4. Upgrade Redis plan if needed
+
+---
+
+*Analysis Date: 2025-11-19*
+*Next Review: 2025-12-19*
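
The API response caching example in the analysis above builds keys as `api:cache:{endpoint}:{params_hash}` but does not say how `params_hash` is derived. The following is a minimal sketch of one way to do it, not the backend's actual implementation: the helper names `params_hash` and `cache_key`, the choice of SHA-256, and the 16-character truncation are all assumptions made for illustration. The idea is to serialize the parameters in a canonical order so that logically identical requests map to the same key.

```python
import hashlib
import json


def params_hash(params: dict) -> str:
    """Return a short, order-independent digest of request parameters.

    Hypothetical helper: the digest algorithm and length are assumptions.
    """
    # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} serialize identically;
    # default=str is a fallback for values json cannot encode natively (e.g. dates).
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cache_key(endpoint: str, params: dict) -> str:
    """Build a key in the api:cache:{endpoint}:{hash} namespace."""
    return f"api:cache:{endpoint}:{params_hash(params)}"
```

A key built this way can be passed directly to the `setex` and `get` calls shown in the caching example. Truncating the digest keeps keys short at the cost of a small collision probability; a longer slice is a reasonable choice if the cached endpoints take many distinct parameter sets.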