Establish BlackRoad OS infrastructure control plane

Add comprehensive infrastructure management system to centralize all service
definitions, deployment configurations, and operational tooling.

## New Infrastructure Components

### 1. Service Manifest (infra/blackroad-manifest.yml)
- Complete catalog of all active and planned services
- Deployment configuration for each service
- Environment variable definitions
- Domain mappings and routing
- Database and cache dependencies
- Health check endpoints
- CI/CD integration specifications

### 2. Operations CLI (scripts/br_ops.py)
- Command-line tool for managing all BlackRoad services
- Commands: list, env, repo, open, status, health
- Reads from service manifest for unified operations
- Colored terminal output for better readability

### 3. Service Analysis Documents (infra/analysis/)
- Detailed technical analysis for each service
- Active services:
  - blackroad-backend.md (FastAPI backend)
  - postgres.md (PostgreSQL database)
  - redis.md (Redis cache)
  - docs-site.md (MkDocs documentation)
- Planned services:
  - blackroad-api.md (API gateway - Phase 2)
  - prism-console.md (Admin console - Phase 2)

### 4. Infrastructure Templates (infra/templates/)
- railway.toml.template - Railway deployment config
- railway.json.template - Alternative Railway config
- Dockerfile.fastapi.template - Multi-stage FastAPI Dockerfile
- github-workflow-railway-deploy.yml.template - CI/CD workflow
- .env.example.template - Comprehensive env var template

### 5. Documentation (infra/README.md)
- Complete guide to infrastructure control plane
- Usage instructions for ops CLI
- Service manifest documentation
- Deployment procedures
- Troubleshooting guide
- Phase 2 migration plan

## Architecture

This establishes BlackRoad-Operating-System as the canonical control plane
for all BlackRoad services, both current and planned:

**Phase 1 (Active)**:
- blackroad-backend (FastAPI + static UI)
- postgres (Railway managed)
- redis (Railway managed)
- docs-site (GitHub Pages)

**Phase 2 (Planned)**:
- blackroad-api (API gateway)
- blackroad-prism-console (Admin UI)
- blackroad-agents (Orchestration)
- blackroad-web (Marketing site)

**Phase 3 (Future)**:
- lucidia (AI orchestration)
- Additional microservices

## Usage

    # List all services
    python scripts/br_ops.py list

    # Show environment variables
    python scripts/br_ops.py env blackroad-backend

    # Show repository info
    python scripts/br_ops.py repo blackroad-backend

    # Show service URL
    python scripts/br_ops.py open blackroad-backend prod

    # Show overall status
    python scripts/br_ops.py status

    # Show health checks
    python scripts/br_ops.py health blackroad-backend

## Benefits

1. **Single Source of Truth**: All service configuration in one manifest
2. **Unified Operations**: One CLI for all services
3. **Documentation**: Comprehensive per-service analysis
4. **Templates**: Reusable infrastructure patterns
5. **Migration Ready**: Clear path to Phase 2 microservices

## References

- MASTER_ORCHESTRATION_PLAN.md - 7-layer architecture
- ORG_STRUCTURE.md - Repository strategy
- PRODUCTION_STACK_AUDIT_2025-11-18.md - Current state

Implemented by: Atlas (AI Infrastructure Orchestrator)
Date: 2025-11-19

Author: Claude
Committed: 2025-11-19 21:04:14 +00:00
Parent: 2610c3a07f
Commit: abdbc764e6
14 changed files with 3090 additions and 0 deletions

infra/analysis/redis.md Normal file

@@ -0,0 +1,163 @@
# Service Analysis: Redis

**Status**: ✅ ACTIVE (Production)
**Last Analyzed**: 2025-11-19
**Service Type**: Managed Cache (Redis)
**Provider**: Railway

---
## Overview
Managed Redis cache service provided by Railway. Used for session storage, API caching, WebSocket state, and pub/sub messaging.

---
## Configuration
### Version
- **Redis**: 7+ (Railway managed)
### Resources
- **Memory**: 256MB (default, configurable)
- **Eviction Policy**: `allkeys-lru` (least recently used)
- **Persistence**: RDB snapshots (Railway managed)
---
## Usage Patterns
### Session Storage
```python
# Store session
await redis.setex(
    f"session:{user_id}",
    3600,  # 1 hour TTL
    json.dumps(session_data),
)
# Retrieve session (returns None once the key has expired)
session_json = await redis.get(f"session:{user_id}")
```
### API Response Caching
```python
# Cache API response
cache_key = f"api:cache:{endpoint}:{params_hash}"
await redis.setex(cache_key, 300, json.dumps(response)) # 5 min TTL
# Retrieve cached response
cached = await redis.get(cache_key)
```
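
The cache key above relies on a `params_hash` that the snippet does not define. One possible way to derive it is sketched below, assuming the request parameters are a JSON-serializable dict. The helper name `make_cache_key` and the 16-character truncation are illustrative choices, not taken from the backend code.

```python
import hashlib
import json


def make_cache_key(endpoint: str, params: dict) -> str:
    """Build an `api:cache:{endpoint}:{hash}` key (hypothetical helper)."""
    # sort_keys makes the serialization independent of dict insertion order,
    # so logically equal parameter sets map to the same cache key.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    # Truncating the digest keeps keys short; 16 hex chars is an assumption.
    params_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"api:cache:{endpoint}:{params_hash}"
```

Sorting the keys before hashing matters because two requests with the same parameters in a different order would otherwise miss each other's cache entries.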
### WebSocket State (Planned)
```python
# Pub/sub for real-time updates
await redis.publish("prism:events", json.dumps(event))
# Subscribe to events
pubsub = redis.pubsub()
await pubsub.subscribe("prism:events")
```
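
On the subscribing side, redis-py's pub/sub interface delivers each message as a dict with `type`, `channel`, and `data` fields, and subscription confirmations arrive on the same stream with a `type` other than `message`. The sketch below shows one way to filter and decode those dicts. `decode_event` is a hypothetical helper, and it assumes events are published as JSON, as in the snippet above.

```python
import json
from typing import Optional


def decode_event(message: Optional[dict]) -> Optional[dict]:
    """Return the JSON payload of a pub/sub message, or None for anything else."""
    # Subscribe/unsubscribe confirmations carry no event payload, so skip them.
    if not message or message.get("type") != "message":
        return None
    data = message["data"]
    # Payloads are bytes unless the client was created with decode_responses=True.
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    return json.loads(data)
```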
### Rate Limiting
```python
# Track API rate limits
key = f"ratelimit:{ip}:{endpoint}"
count = await redis.incr(key)
if count == 1:
    await redis.expire(key, 60)  # 1 minute window, set on the first hit only
```
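
The counter above can be wrapped into an allow/deny decision. The following is a minimal sketch of a fixed-window limiter, assuming `client` exposes async `incr` and `expire` methods as redis-py's asyncio client does. The function name `allow_request` and the default `limit` of 100 are illustrative, not values from the backend.

```python
async def allow_request(client, ip: str, endpoint: str,
                        limit: int = 100, window_seconds: int = 60) -> bool:
    """Fixed-window check: True while the caller is at or under `limit`."""
    key = f"ratelimit:{ip}:{endpoint}"
    count = await client.incr(key)
    if count == 1:
        # First hit in this window: start the expiry countdown.
        await client.expire(key, window_seconds)
    return count <= limit
```

Note that `INCR` and `EXPIRE` are two separate commands here. If the process stops between them, the key is left without a TTL, so a production version might issue both inside a transaction or a Lua script.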
---
## Key Namespaces
| Namespace | Pattern | TTL | Purpose |
|-----------|---------|-----|---------|
| `session:*` | `session:{user_id}` | 1 hour | User session data |
| `api:cache:*` | `api:cache:{endpoint}:{hash}` | 5-60 min | Cached API responses |
| `ratelimit:*` | `ratelimit:{ip}:{endpoint}` | 1 min | Rate limit counters |
| `websocket:*` | `websocket:{connection_id}` | Variable | WebSocket state |
| `prism:*` | Pub/sub channels | N/A | Event bus |
---
## Performance
### Metrics
- **Hit Rate**: Target > 80%
- **Latency**: < 1ms (local network)
- **Memory Usage**: Monitor for evictions
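
The hit-rate target can be checked against the `keyspace_hits` and `keyspace_misses` counters that Redis reports in the stats section of `INFO`. A minimal sketch, assuming the `INFO` output has already been parsed into a dict (redis-py's `info()` returns one); `hit_rate` is a hypothetical helper:

```python
from typing import Optional


def hit_rate(info: dict) -> Optional[float]:
    """Fraction of key lookups served from cache, or None if there were no lookups."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    total = hits + misses
    # Avoid dividing by zero on a freshly started instance.
    return hits / total if total else None
```

Both counters are cumulative since the server started, so the value reflects the lifetime average rather than recent traffic.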
### Optimization
- Use pipelining for bulk operations
- Implement connection pooling
- Use hiredis for faster parsing
- Monitor key expiration patterns
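
As an illustration of the pipelining recommendation, the sketch below writes several cache entries in one round trip. It assumes `client` is a redis-py asyncio client, whose `pipeline()` buffers commands until `execute()` is awaited. The function name `cache_many` and the 300-second default TTL are assumptions.

```python
import json


async def cache_many(client, entries: dict, ttl_seconds: int = 300) -> None:
    """Write all entries with one network round trip (hypothetical helper)."""
    # transaction=False batches the commands without wrapping them in MULTI/EXEC.
    async with client.pipeline(transaction=False) as pipe:
        for key, value in entries.items():
            # Commands are queued locally; nothing is sent until execute().
            pipe.setex(key, ttl_seconds, json.dumps(value))
        await pipe.execute()
```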
---
## Monitoring
### Metrics (Railway Dashboard)
- Memory usage
- Connections count
- Commands per second
- Evicted keys
### Alerts (Recommended)
- Memory > 90% full
- High eviction rate
- Connection errors
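
The memory and eviction alerts could be evaluated from two successive `INFO` samples, roughly as sketched below. The field names `used_memory`, `maxmemory`, and `evicted_keys` are standard `INFO` fields. The function name `check_alerts` and the eviction-rate threshold are assumptions, since the document does not define what counts as a high eviction rate.

```python
from typing import List


def check_alerts(prev: dict, curr: dict, interval_seconds: float,
                 memory_threshold: float = 0.90,
                 max_evictions_per_second: float = 10.0) -> List[str]:
    """Return the names of alerts that fire for the current sample."""
    alerts = []
    max_memory = int(curr.get("maxmemory", 0))
    used_memory = int(curr.get("used_memory", 0))
    # maxmemory == 0 means no limit is configured, so the ratio is undefined.
    if max_memory > 0 and used_memory / max_memory > memory_threshold:
        alerts.append("memory")
    # evicted_keys is cumulative; the rate is the delta between two samples.
    evicted = int(curr.get("evicted_keys", 0)) - int(prev.get("evicted_keys", 0))
    if interval_seconds > 0 and evicted / interval_seconds > max_evictions_per_second:
        alerts.append("evictions")
    return alerts
```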
---
## Connection Details
### Environment Variable
```bash
REDIS_URL=${{Redis.REDIS_URL}}
# Format: redis://[user:password@]host:port[/db]
```
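
Before opening a connection it can help to validate the injected URL. The sketch below parses `REDIS_URL` with the standard library only. `parse_redis_url` is a hypothetical helper, and the defaults (port 6379, database 0) are the conventional Redis defaults rather than values confirmed by the manifest.

```python
from urllib.parse import urlparse


def parse_redis_url(url: str) -> dict:
    """Split a Redis URL into its parts, raising ValueError if it is malformed."""
    parsed = urlparse(url)
    # "rediss" is the conventional scheme for Redis over TLS.
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("REDIS_URL has no host")
    db = parsed.path.lstrip("/")
    return {
        "host": parsed.hostname,
        "port": parsed.port or 6379,
        "db": int(db) if db else 0,
        "tls": parsed.scheme == "rediss",
        # Report only whether a password is present, never the value itself.
        "has_password": parsed.password is not None,
    }
```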
### Connection Pool
```python
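# backend/app/redis_client.py
# Note: create_redis_pool is the aioredis 1.x API; it was removed in aioredis 2.x.
redis = await aioredis.create_redis_pool(
    REDIS_URL,
    minsize=5,   # connections opened up front and kept in the pool
    maxsize=10,  # upper bound on pooled connections
    encoding='utf-8'
)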
```
---
## Security
### Access Control
- **Network**: Private Railway network only
- **Credentials**: Auto-generated, injected via `${{Redis.REDIS_URL}}`
- **Encryption**: TLS optional (not required on private network)
---
## Troubleshooting
### Connection Issues
1. Verify `REDIS_URL` is set
2. Check Redis service status
3. Verify network connectivity
4. Check connection pool exhaustion
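
Step 3 can be checked without a Redis client at all. The sketch below only tests whether a TCP connection to the Redis host and port can be opened. `can_connect` is a hypothetical helper and the 2-second timeout is an arbitrary choice.

```python
import socket


def can_connect(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout to connect().
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A successful TCP connection only shows that the port is reachable. Authentication and pool exhaustion still need to be checked separately.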
### Memory Issues
1. Check eviction metrics
2. Analyze key distribution
3. Shorten TTLs for less critical data
4. Upgrade Redis plan if needed
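
For step 2, key distribution can be summarized by counting keys per namespace prefix. A minimal sketch, assuming the keys follow the colon-separated naming used in this document; `namespace_counts` is a hypothetical helper that groups by the first segment only, so `api:cache:*` keys are counted under `api`.

```python
from collections import Counter
from typing import Iterable, Union


def namespace_counts(keys: Iterable[Union[str, bytes]]) -> Counter:
    """Count keys by the segment before the first colon."""
    counts: Counter = Counter()
    for key in keys:
        # Keys come back as bytes unless the client decodes responses.
        if isinstance(key, bytes):
            key = key.decode("utf-8", errors="replace")
        counts[key.split(":", 1)[0]] += 1
    return counts
```

The keys would typically come from an incremental `SCAN` (for example redis-py's `scan_iter`) rather than `KEYS *`, which blocks the server while it walks the whole keyspace.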
---
*Analysis Date: 2025-11-19*
*Next Review: 2025-12-19*