mirror of
https://github.com/blackboxprogramming/BlackRoad-Operating-System.git
synced 2026-03-17 07:57:19 -05:00
Add comprehensive infrastructure management system to centralize all service definitions, deployment configurations, and operational tooling.

## New Infrastructure Components

### 1. Service Manifest (infra/blackroad-manifest.yml)

- Complete catalog of all active and planned services
- Deployment configuration for each service
- Environment variable definitions
- Domain mappings and routing
- Database and cache dependencies
- Health check endpoints
- CI/CD integration specifications

### 2. Operations CLI (scripts/br_ops.py)

- Command-line tool for managing all BlackRoad services
- Commands: list, env, repo, open, status, health
- Reads from service manifest for unified operations
- Colored terminal output for better readability

### 3. Service Analysis Documents (infra/analysis/)

- Detailed technical analysis for each service
- Active services:
  - blackroad-backend.md (FastAPI backend)
  - postgres.md (PostgreSQL database)
  - redis.md (Redis cache)
  - docs-site.md (MkDocs documentation)
- Planned services:
  - blackroad-api.md (API gateway - Phase 2)
  - prism-console.md (Admin console - Phase 2)

### 4. Infrastructure Templates (infra/templates/)

- railway.toml.template - Railway deployment config
- railway.json.template - Alternative Railway config
- Dockerfile.fastapi.template - Multi-stage FastAPI Dockerfile
- github-workflow-railway-deploy.yml.template - CI/CD workflow
- .env.example.template - Comprehensive env var template

### 5. Documentation (infra/README.md)

- Complete guide to infrastructure control plane
- Usage instructions for ops CLI
- Service manifest documentation
- Deployment procedures
- Troubleshooting guide
- Phase 2 migration plan

## Architecture

This establishes BlackRoad-Operating-System as the canonical control plane for all BlackRoad services, both current and planned:

**Phase 1 (Active)**:
- blackroad-backend (FastAPI + static UI)
- postgres (Railway managed)
- redis (Railway managed)
- docs-site (GitHub Pages)

**Phase 2 (Planned)**:
- blackroad-api (API gateway)
- blackroad-prism-console (Admin UI)
- blackroad-agents (Orchestration)
- blackroad-web (Marketing site)

**Phase 3 (Future)**:
- lucidia (AI orchestration)
- Additional microservices

## Usage

    # List all services
    python scripts/br_ops.py list

    # Show environment variables
    python scripts/br_ops.py env blackroad-backend

    # Show repository info
    python scripts/br_ops.py repo blackroad-backend

    # Show service URL
    python scripts/br_ops.py open blackroad-backend prod

    # Show overall status
    python scripts/br_ops.py status

    # Show health checks
    python scripts/br_ops.py health blackroad-backend

## Benefits

1. **Single Source of Truth**: All service configuration in one manifest
2. **Unified Operations**: One CLI for all services
3. **Documentation**: Comprehensive per-service analysis
4. **Templates**: Reusable infrastructure patterns
5. **Migration Ready**: Clear path to Phase 2 microservices

## References

- MASTER_ORCHESTRATION_PLAN.md - 7-layer architecture
- ORG_STRUCTURE.md - Repository strategy
- PRODUCTION_STACK_AUDIT_2025-11-18.md - Current state

Implemented by: Atlas (AI Infrastructure Orchestrator)
Date: 2025-11-19
Service Analysis: Redis
Status: ✅ ACTIVE (Production)
Last Analyzed: 2025-11-19
Service Type: Managed Cache (Redis)
Provider: Railway
Overview
Managed Redis cache service provided by Railway. Used for session storage, API caching, WebSocket state, and pub/sub messaging.
Configuration
Version
- Redis: 7+ (Railway managed)
Resources
- Memory: 256MB (default, configurable)
- Eviction Policy: allkeys-lru (least recently used)
- Persistence: RDB snapshots (Railway managed)
Usage Patterns
Session Storage
# Store session
await redis.setex(
    f"session:{user_id}",
    3600,  # 1 hour TTL
    json.dumps(session_data)
)
# Retrieve session
session_json = await redis.get(f"session:{user_id}")
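The store and retrieve calls above can be wrapped in a pair of helpers that also handle a missing session. This is a minimal sketch, not the backend's actual code: `save_session`, `load_session`, and the `InMemoryRedis` stand-in are hypothetical names, and the stand-in exists only so the example runs without a Redis server.

```python
import asyncio
import json

SESSION_TTL_SECONDS = 3600  # 1 hour, matching the TTL used above


class InMemoryRedis:
    """Hypothetical stand-in exposing only setex/get (the TTL is ignored)."""

    def __init__(self):
        self._data = {}

    async def setex(self, key, ttl, value):
        self._data[key] = value

    async def get(self, key):
        return self._data.get(key)


async def save_session(redis, user_id, session_data):
    """Serialize the session to JSON and store it with the session TTL."""
    await redis.setex(
        f"session:{user_id}", SESSION_TTL_SECONDS, json.dumps(session_data)
    )


async def load_session(redis, user_id):
    """Return the decoded session, or None if the key is missing or expired."""
    raw = await redis.get(f"session:{user_id}")
    return json.loads(raw) if raw is not None else None


async def demo():
    redis = InMemoryRedis()
    await save_session(redis, 42, {"role": "admin"})
    return await load_session(redis, 42), await load_session(redis, 7)
```

Returning `None` for an absent key lets callers treat an expired session and a never-created session the same way.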
API Response Caching
# Cache API response
cache_key = f"api:cache:{endpoint}:{params_hash}"
await redis.setex(cache_key, 300, json.dumps(response)) # 5 min TTL
# Retrieve cached response
cached = await redis.get(cache_key)
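The snippet above does not show how `params_hash` is derived. One possible approach, sketched here as an assumption rather than the project's method, is to hash a canonical JSON encoding of the request parameters so that parameter order does not change the key. The function name `make_cache_key` and the 16-character truncation are illustrative choices.

```python
import hashlib
import json


def make_cache_key(endpoint: str, params: dict) -> str:
    """Build an api:cache:{endpoint}:{hash} key from request parameters."""
    # Sorting keys makes the encoding canonical, so {"a": 1, "b": 2} and
    # {"b": 2, "a": 1} produce the same hash.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    params_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"api:cache:{endpoint}:{params_hash}"
```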
WebSocket State (Planned)
# Pub/sub for real-time updates
await redis.publish("prism:events", json.dumps(event))
# Subscribe to events
pubsub = redis.pubsub()
await pubsub.subscribe("prism:events")
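Subscribing alone does not deliver events; the subscriber still has to read them. The sketch below assumes the `pubsub` object follows the redis-py asyncio `PubSub` interface, where `listen()` is an async generator yielding dicts with `type` and `data` keys. `read_events` and `FakePubSub` are hypothetical names; the fake only replays canned messages so the example runs without a server.

```python
import asyncio
import json


async def read_events(pubsub, limit):
    """Collect up to `limit` decoded events from a subscribed pubsub object."""
    events = []
    async for message in pubsub.listen():
        # Subscribe confirmations arrive on the same stream; skip them.
        if message["type"] != "message":
            continue
        events.append(json.loads(message["data"]))
        if len(events) >= limit:
            break
    return events


class FakePubSub:
    """Hypothetical stand-in that replays a fixed list of messages."""

    def __init__(self, messages):
        self._messages = messages

    async def listen(self):
        for message in self._messages:
            yield message
```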
Rate Limiting
# Track API rate limits
key = f"ratelimit:{ip}:{endpoint}"
count = await redis.incr(key)
if count == 1:
    await redis.expire(key, 60)  # 1 minute window
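The counter above only tracks requests; a caller still has to compare it against a limit. A minimal fixed-window sketch follows. The limit of 100 requests is a hypothetical value (the source does not state one), and `is_allowed` and `CounterStub` are illustrative names, with the stub standing in for a Redis client.

```python
import asyncio

WINDOW_SECONDS = 60   # 1 minute window, as above
MAX_REQUESTS = 100    # hypothetical limit; not specified in the source


async def is_allowed(redis, ip, endpoint, limit=MAX_REQUESTS):
    """Return True while the caller is within the limit for this window."""
    key = f"ratelimit:{ip}:{endpoint}"
    count = await redis.incr(key)
    if count == 1:
        # First hit in this window: start the expiry clock.
        await redis.expire(key, WINDOW_SECONDS)
    return count <= limit


class CounterStub:
    """Hypothetical stand-in implementing only incr/expire."""

    def __init__(self):
        self.counts = {}
        self.ttls = {}

    async def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    async def expire(self, key, seconds):
        self.ttls[key] = seconds


async def demo():
    stub = CounterStub()
    results = [await is_allowed(stub, "10.0.0.1", "/login", limit=2) for _ in range(3)]
    return results, stub.ttls
```

Because INCR and EXPIRE are separate commands, a process that fails between them leaves a counter with no TTL; running both in a transaction or a server-side script would close that gap.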
Key Namespaces
| Namespace | Pattern | TTL | Purpose |
|---|---|---|---|
| session:* | session:{user_id} | 1 hour | User session data |
| api:cache:* | api:cache:{endpoint}:{hash} | 5-60 min | Cached API responses |
| ratelimit:* | ratelimit:{ip}:{endpoint} | 1 min | Rate limit counters |
| websocket:* | websocket:{connection_id} | Variable | WebSocket state |
| prism:* | Pub/sub channels | N/A | Event bus |
Performance
Metrics
- Hit Rate: Target > 80%
- Latency: < 1ms (local network)
- Memory Usage: Monitor for evictions
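The hit-rate target can be checked from the `keyspace_hits` and `keyspace_misses` counters that Redis reports in its INFO stats section. The sketch below assumes those counters have already been fetched into a dict; `hit_rate` is a hypothetical helper name.

```python
def hit_rate(info: dict) -> float:
    """Compute the cache hit rate from INFO stats counters."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    total = hits + misses
    # With no lookups yet, report 0.0 rather than dividing by zero.
    return hits / total if total else 0.0


def meets_target(info: dict, target: float = 0.80) -> bool:
    """True when the hit rate is above the 80% target."""
    return hit_rate(info) > target
```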
Optimization
- Use pipelining for bulk operations
- Implement connection pooling
- Use hiredis for faster parsing
- Monitor key expiration patterns
Monitoring
Metrics (Railway Dashboard)
- Memory usage
- Connections count
- Commands per second
- Evicted keys
Alerts (Recommended)
- Memory > 90% full
- High eviction rate
- Connection errors
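The memory and eviction alerts could be evaluated from two INFO samples, roughly as sketched below. `used_memory`, `maxmemory`, and `evicted_keys` are standard INFO fields; the function name `check_alerts` and the eviction threshold of 10 keys per second are assumptions, since the source only says "high eviction rate".

```python
def check_alerts(prev_info, curr_info, interval_seconds,
                 memory_threshold=0.90, max_evictions_per_sec=10.0):
    """Return the names of alerts that fire between two INFO samples."""
    alerts = []

    used = int(curr_info.get("used_memory", 0))
    limit = int(curr_info.get("maxmemory", 0))
    # maxmemory is 0 when no limit is configured, so skip the ratio then.
    if limit > 0 and used / limit > memory_threshold:
        alerts.append("memory")

    # evicted_keys is a running total, so a rate needs two samples.
    evicted = int(curr_info.get("evicted_keys", 0)) - int(prev_info.get("evicted_keys", 0))
    if interval_seconds > 0 and evicted / interval_seconds > max_evictions_per_sec:
        alerts.append("evictions")

    return alerts
```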
Connection Details
Environment Variable
REDIS_URL=${{Redis.REDIS_URL}}
# Format: redis://host:port/db
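To confirm that an injected `REDIS_URL` has the expected shape, the URL can be split with the standard library. This is a sketch under the assumption that the URL may also carry credentials (`redis://user:password@host:port/db`) and that a `rediss://` scheme indicates TLS; `parse_redis_url` is a hypothetical helper and the example host is made up.

```python
from urllib.parse import urlparse


def parse_redis_url(url: str) -> dict:
    """Split a Redis URL into host, port, db, and TLS flag."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    return {
        "host": parsed.hostname,
        "port": parsed.port or 6379,            # Redis default port
        "db": int(parsed.path.lstrip("/") or 0),  # missing path means db 0
        "tls": parsed.scheme == "rediss",
        "has_password": parsed.password is not None,
    }
```

Reporting only whether a password is present, not its value, keeps the result safe to log.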
Connection Pool
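# backend/app/redis_client.py
import aioredis  # create_redis_pool is the aioredis 1.x API (removed in 2.0)

# Must run inside a coroutine, for example an application startup handler.
# The pool keeps 5 connections open and grows to at most 10.
redis = await aioredis.create_redis_pool(
    REDIS_URL,
    minsize=5,
    maxsize=10,
    encoding='utf-8'
)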
Security
Access Control
- Network: Private Railway network only
- Credentials: Auto-generated, injected via ${{Redis.REDIS_URL}}
- Encryption: TLS optional (not required on private network)
Troubleshooting
Connection Issues
- Verify REDIS_URL is set
- Check Redis service status
- Verify network connectivity
- Check connection pool exhaustion
Memory Issues
- Check eviction metrics
- Analyze key distribution
- Adjust TTLs for less critical data
- Upgrade Redis plan if needed
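Analyzing key distribution can start with counting keys per namespace. The sketch below groups key names by the prefixes listed under Key Namespaces; it assumes the key names have already been collected (for example with an incremental SCAN rather than a blocking KEYS call), and `namespace_of` and `key_distribution` are hypothetical helper names.

```python
from collections import Counter

# Prefixes taken from the Key Namespaces table.
NAMESPACES = ("session", "api:cache", "ratelimit", "websocket", "prism")


def namespace_of(key: str) -> str:
    """Map a key to its namespace, or "other" if no prefix matches."""
    for namespace in NAMESPACES:
        if key.startswith(namespace + ":"):
            return namespace
    return "other"


def key_distribution(keys) -> Counter:
    """Count how many keys fall into each namespace."""
    return Counter(namespace_of(key) for key in keys)
```

A large "other" bucket suggests keys written outside the documented namespaces, which are worth checking for missing TTLs.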
Analysis Date: 2025-11-19
Next Review: 2025-12-19