blackroad-operating-system/infra/analysis/redis.md
Claude abdbc764e6 Establish BlackRoad OS infrastructure control plane
Add comprehensive infrastructure management system to centralize all service
definitions, deployment configurations, and operational tooling.

## New Infrastructure Components

### 1. Service Manifest (infra/blackroad-manifest.yml)
- Complete catalog of all active and planned services
- Deployment configuration for each service
- Environment variable definitions
- Domain mappings and routing
- Database and cache dependencies
- Health check endpoints
- CI/CD integration specifications

### 2. Operations CLI (scripts/br_ops.py)
- Command-line tool for managing all BlackRoad services
- Commands: list, env, repo, open, status, health
- Reads from service manifest for unified operations
- Colored terminal output for better readability

### 3. Service Analysis Documents (infra/analysis/)
- Detailed technical analysis for each service
- Active services:
  - blackroad-backend.md (FastAPI backend)
  - postgres.md (PostgreSQL database)
  - redis.md (Redis cache)
  - docs-site.md (MkDocs documentation)
- Planned services:
  - blackroad-api.md (API gateway - Phase 2)
  - prism-console.md (Admin console - Phase 2)

### 4. Infrastructure Templates (infra/templates/)
- railway.toml.template - Railway deployment config
- railway.json.template - Alternative Railway config
- Dockerfile.fastapi.template - Multi-stage FastAPI Dockerfile
- github-workflow-railway-deploy.yml.template - CI/CD workflow
- .env.example.template - Comprehensive env var template

### 5. Documentation (infra/README.md)
- Complete guide to infrastructure control plane
- Usage instructions for ops CLI
- Service manifest documentation
- Deployment procedures
- Troubleshooting guide
- Phase 2 migration plan

## Architecture

This establishes BlackRoad-Operating-System as the canonical control plane
for all BlackRoad services, both current and planned:

**Phase 1 (Active)**:
- blackroad-backend (FastAPI + static UI)
- postgres (Railway managed)
- redis (Railway managed)
- docs-site (GitHub Pages)

**Phase 2 (Planned)**:
- blackroad-api (API gateway)
- blackroad-prism-console (Admin UI)
- blackroad-agents (Orchestration)
- blackroad-web (Marketing site)

**Phase 3 (Future)**:
- lucidia (AI orchestration)
- Additional microservices

## Usage

# List all services
python scripts/br_ops.py list

# Show environment variables
python scripts/br_ops.py env blackroad-backend

# Show repository info
python scripts/br_ops.py repo blackroad-backend

# Show service URL
python scripts/br_ops.py open blackroad-backend prod

# Show overall status
python scripts/br_ops.py status

# Show health checks
python scripts/br_ops.py health blackroad-backend

## Benefits

1. **Single Source of Truth**: All service configuration in one manifest
2. **Unified Operations**: One CLI for all services
3. **Documentation**: Comprehensive per-service analysis
4. **Templates**: Reusable infrastructure patterns
5. **Migration Ready**: Clear path to Phase 2 microservices

## References

- MASTER_ORCHESTRATION_PLAN.md - 7-layer architecture
- ORG_STRUCTURE.md - Repository strategy
- PRODUCTION_STACK_AUDIT_2025-11-18.md - Current state

Implemented by: Atlas (AI Infrastructure Orchestrator)
Date: 2025-11-19
2025-11-19 21:04:14 +00:00


Service Analysis: Redis

Status: ACTIVE (Production)
Last Analyzed: 2025-11-19
Service Type: Managed Cache (Redis)
Provider: Railway


Overview

Managed Redis cache service provided by Railway. It is used for session storage, API response caching, and rate limiting; WebSocket state and pub/sub messaging are planned.


Configuration

Version

  • Redis: 7+ (Railway managed)

Resources

  • Memory: 256MB (default, configurable)
  • Eviction Policy: allkeys-lru (least recently used)
  • Persistence: RDB snapshots (Railway managed)

Usage Patterns

Session Storage

# Store session
await redis.setex(
    f"session:{user_id}",
    3600,  # 1 hour TTL
    json.dumps(session_data)
)

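# Retrieve session (get returns None once the key has expired or been evicted)
session_json = await redis.get(f"session:{user_id}")
session_data = json.loads(session_json) if session_json else None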

API Response Caching

# Cache API response
cache_key = f"api:cache:{endpoint}:{params_hash}"
await redis.setex(cache_key, 300, json.dumps(response))  # 5 min TTL

# Retrieve cached response
cached = await redis.get(cache_key)
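
The snippet above leaves `params_hash` undefined. One possible way to derive it, offered only as a sketch, is to serialize the request parameters in a canonical order and hash the result, so that equivalent requests map to the same key. The helper name `make_cache_key`, the choice of SHA-256, and the 16-character truncation are assumptions for illustration and are not taken from the BlackRoad backend.

```python
import hashlib
import json


def make_cache_key(endpoint: str, params: dict) -> str:
    """Build an api:cache:* key that is stable across parameter ordering."""
    # sort_keys gives a canonical serialization, so {"a": 1, "b": 2} and
    # {"b": 2, "a": 1} hash to the same value.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    # Truncating the digest keeps keys short; 16 hex chars is an arbitrary choice.
    params_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"api:cache:{endpoint}:{params_hash}"
```

The returned string can be passed to `setex` and `get` exactly as `cache_key` is used above.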

WebSocket State (Planned)

# Pub/sub for real-time updates
await redis.publish("prism:events", json.dumps(event))

# Subscribe to events
pubsub = redis.pubsub()
await pubsub.subscribe("prism:events")

Rate Limiting

# Track API rate limits
key = f"ratelimit:{ip}:{endpoint}"
count = await redis.incr(key)
if count == 1:
    await redis.expire(key, 60)  # 1 minute window
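
The counter above records requests but does not show the comparison against a limit. A minimal sketch of a complete fixed-window check might look like the following. The function name `allow_request` and the default `limit` of 100 are hypothetical; `client` is assumed to be any async Redis client exposing `incr` and `expire`.

```python
async def allow_request(client, ip: str, endpoint: str,
                        limit: int = 100, window_s: int = 60) -> bool:
    """Fixed-window limiter: True while the caller is within `limit`."""
    key = f"ratelimit:{ip}:{endpoint}"
    count = await client.incr(key)          # creates the key at 1 if missing
    if count == 1:
        await client.expire(key, window_s)  # start the window on the first hit
    return count <= limit
```

INCR and EXPIRE are separate commands here, so a process that dies between them could leave a counter without a TTL. Wrapping both in a MULTI/EXEC transaction or a Lua script is one way to close that gap.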

Key Namespaces

| Namespace | Pattern | TTL | Purpose |
|---|---|---|---|
| session:* | session:{user_id} | 1 hour | User session data |
| api:cache:* | api:cache:{endpoint}:{hash} | 5-60 min | Cached API responses |
| ratelimit:* | ratelimit:{ip}:{endpoint} | 1 min | Rate limit counters |
| websocket:* | websocket:{connection_id} | Variable | WebSocket state |
| prism:* | Pub/sub channels | N/A | Event bus |

Performance

Metrics

  • Hit Rate: Target > 80%
  • Latency: < 1ms (local network)
  • Memory Usage: Monitor for evictions
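
To make the hit-rate target concrete: Redis reports `keyspace_hits` and `keyspace_misses` in the stats section of its INFO output, and the hit rate is the share of hits among all lookups. The sketch below assumes those counters have already been fetched into a dictionary; the function name `hit_rate` is hypothetical.

```python
def hit_rate(stats: dict) -> float:
    """Return hits / (hits + misses), or 0.0 when there were no lookups."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0
```

With redis-py the dictionary would typically come from `await redis.info("stats")`; a result below 0.8 would miss the target stated above.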

Optimization

  • Use pipelining for bulk operations
  • Implement connection pooling
  • Use hiredis for faster parsing
  • Monitor key expiration patterns
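
As an illustration of the pipelining advice, the sketch below writes several cache entries in one round trip instead of one per key. It assumes a redis-py style async client whose `pipeline()` buffers commands until `execute()` is awaited; the helper name `cache_many` is hypothetical.

```python
import json


async def cache_many(client, entries: dict, ttl_s: int = 300) -> int:
    """Queue one SETEX per entry and send them all in a single batch."""
    # transaction=False asks for plain pipelining without MULTI/EXEC.
    pipe = client.pipeline(transaction=False)
    for key, value in entries.items():
        pipe.setex(key, ttl_s, json.dumps(value))  # buffered, not yet sent
    results = await pipe.execute()                 # one network round trip
    return len(results)
```

The return value is the number of commands the server answered, which should equal `len(entries)`.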

Monitoring

Metrics (Railway Dashboard)

  • Memory usage
  • Connections count
  • Commands per second
  • Evicted keys

Alerts

  • Memory > 90% full
  • High eviction rate
  • Connection errors
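
The memory and eviction conditions listed above could be evaluated from Redis INFO counters roughly as follows. This is a sketch under assumptions: `used_memory`, `maxmemory`, and `evicted_keys` are real INFO fields, but the function name `check_alerts` and the threshold of 10 evictions per second are hypothetical, since the document does not define what counts as a high eviction rate.

```python
from typing import List


def check_alerts(info: dict, prev_evicted: int, interval_s: float,
                 memory_threshold: float = 0.90,
                 eviction_rate_threshold: float = 10.0) -> List[str]:
    """Return the names of the alert conditions that currently hold."""
    alerts = []
    used = int(info.get("used_memory", 0))
    limit = int(info.get("maxmemory", 0))  # 0 means no memory limit is set
    if limit > 0 and used / limit > memory_threshold:
        alerts.append("memory")
    # evicted_keys is cumulative, so compare against the previous sample.
    evicted = int(info.get("evicted_keys", 0))
    rate = (evicted - prev_evicted) / interval_s if interval_s > 0 else 0.0
    if rate > eviction_rate_threshold:
        alerts.append("evictions")
    return alerts
```

The caller is expected to keep the previous `evicted_keys` value and the sampling interval between checks.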

Connection Details

Environment Variable

REDIS_URL=${{Redis.REDIS_URL}}
# Format: redis://[user:password@]host:port[/db]
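
When debugging configuration it can help to confirm what a given `REDIS_URL` actually contains without printing the password. The sketch below uses only the standard library; the function name `describe_redis_url` is hypothetical, and the defaults of port 6379 and database 0 follow common Redis convention rather than anything stated in this document.

```python
from urllib.parse import urlparse


def describe_redis_url(url: str) -> dict:
    """Summarize a Redis URL without exposing the credentials."""
    parts = urlparse(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "tls": parts.scheme == "rediss",           # rediss:// means TLS
        "host": parts.hostname,
        "port": parts.port or 6379,
        "db": int(parts.path.lstrip("/") or 0),
        "has_password": parts.password is not None,
    }
```

For example, `describe_redis_url(os.environ["REDIS_URL"])` can be logged at startup to verify the host and port the backend will use.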

Connection Pool

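# backend/app/redis_client.py
import os

import redis.asyncio as aioredis  # redis-py >= 4.2, which supersedes the aioredis package

REDIS_URL = os.environ["REDIS_URL"]

# from_url creates a lazily filled pool; there is no direct minsize equivalent.
redis = aioredis.from_url(
    REDIS_URL,
    max_connections=10,
    encoding="utf-8",
    decode_responses=True,  # return str instead of bytes
)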

Security

Access Control

  • Network: Private Railway network only
  • Credentials: Auto-generated, injected via ${{Redis.REDIS_URL}}
  • Encryption: TLS optional (not required on private network)

Troubleshooting

Connection Issues

  1. Verify REDIS_URL is set
  2. Check Redis service status
  3. Verify network connectivity
  4. Check connection pool exhaustion
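
The first three checks above can be folded into a small probe that pings Redis with a timeout. This is a sketch only: the function name `redis_is_healthy` and the one-second default are assumptions, and `client` is assumed to be an async client with a `ping()` method.

```python
import asyncio


async def redis_is_healthy(client, timeout_s: float = 1.0) -> bool:
    """Return True if PING answers within timeout_s, False otherwise."""
    try:
        return bool(await asyncio.wait_for(client.ping(), timeout=timeout_s))
    except Exception:
        # Timeouts, refused connections, and auth failures all count as
        # unhealthy for a probe; the caller should log the details separately.
        return False
```

A probe like this does not detect pool exhaustion on its own, although a ping that times out while Redis itself is up may point in that direction.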

Memory Issues

  1. Check eviction metrics
  2. Analyze key distribution
  3. Adjust TTLs for less critical data
  4. Upgrade Redis plan if needed

Analysis Date: 2025-11-19
Next Review: 2025-12-19