mirror of
https://github.com/blackboxprogramming/BlackRoad-Operating-System.git
synced 2026-03-17 05:57:21 -05:00
Establish BlackRoad OS infrastructure control plane
Add comprehensive infrastructure management system to centralize all service definitions, deployment configurations, and operational tooling.

## New Infrastructure Components

### 1. Service Manifest (infra/blackroad-manifest.yml)
- Complete catalog of all active and planned services
- Deployment configuration for each service
- Environment variable definitions
- Domain mappings and routing
- Database and cache dependencies
- Health check endpoints
- CI/CD integration specifications

### 2. Operations CLI (scripts/br_ops.py)
- Command-line tool for managing all BlackRoad services
- Commands: list, env, repo, open, status, health
- Reads from service manifest for unified operations
- Colored terminal output for better readability

### 3. Service Analysis Documents (infra/analysis/)
- Detailed technical analysis for each service
- Active services:
  - blackroad-backend.md (FastAPI backend)
  - postgres.md (PostgreSQL database)
  - redis.md (Redis cache)
  - docs-site.md (MkDocs documentation)
- Planned services:
  - blackroad-api.md (API gateway - Phase 2)
  - prism-console.md (Admin console - Phase 2)

### 4. Infrastructure Templates (infra/templates/)
- railway.toml.template - Railway deployment config
- railway.json.template - Alternative Railway config
- Dockerfile.fastapi.template - Multi-stage FastAPI Dockerfile
- github-workflow-railway-deploy.yml.template - CI/CD workflow
- .env.example.template - Comprehensive env var template

### 5. Documentation (infra/README.md)
- Complete guide to infrastructure control plane
- Usage instructions for ops CLI
- Service manifest documentation
- Deployment procedures
- Troubleshooting guide
- Phase 2 migration plan

## Architecture

This establishes BlackRoad-Operating-System as the canonical control plane for all BlackRoad services, both current and planned:

**Phase 1 (Active)**:
- blackroad-backend (FastAPI + static UI)
- postgres (Railway managed)
- redis (Railway managed)
- docs-site (GitHub Pages)

**Phase 2 (Planned)**:
- blackroad-api (API gateway)
- blackroad-prism-console (Admin UI)
- blackroad-agents (Orchestration)
- blackroad-web (Marketing site)

**Phase 3 (Future)**:
- lucidia (AI orchestration)
- Additional microservices

## Usage

    # List all services
    python scripts/br_ops.py list

    # Show environment variables
    python scripts/br_ops.py env blackroad-backend

    # Show repository info
    python scripts/br_ops.py repo blackroad-backend

    # Show service URL
    python scripts/br_ops.py open blackroad-backend prod

    # Show overall status
    python scripts/br_ops.py status

    # Show health checks
    python scripts/br_ops.py health blackroad-backend

## Benefits

1. **Single Source of Truth**: All service configuration in one manifest
2. **Unified Operations**: One CLI for all services
3. **Documentation**: Comprehensive per-service analysis
4. **Templates**: Reusable infrastructure patterns
5. **Migration Ready**: Clear path to Phase 2 microservices

## References

- MASTER_ORCHESTRATION_PLAN.md - 7-layer architecture
- ORG_STRUCTURE.md - Repository strategy
- PRODUCTION_STACK_AUDIT_2025-11-18.md - Current state

Implemented by: Atlas (AI Infrastructure Orchestrator)
Date: 2025-11-19

infra/analysis/redis.md (new file, 163 lines)

# Service Analysis: Redis

**Status**: ✅ ACTIVE (Production)
**Last Analyzed**: 2025-11-19
**Service Type**: Managed Cache (Redis)
**Provider**: Railway

---

## Overview

Managed Redis cache service provided by Railway. Used for session storage, API caching, WebSocket state, and pub/sub messaging.

---

## Configuration

### Version
- **Redis**: 7+ (Railway managed)

### Resources
- **Memory**: 256MB (default, configurable)
- **Eviction Policy**: `allkeys-lru` (least recently used)
- **Persistence**: RDB snapshots (Railway managed)

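To illustrate what `allkeys-lru` means in practice, here is a minimal in-memory sketch of least-recently-used eviction. It is a teaching model, not Redis itself: Redis evicts when the memory limit is reached and uses an approximated LRU over sampled keys, while the hypothetical `LRUCache` class below evicts by entry count with exact ordering.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Toy model of least-recently-used eviction (hypothetical, not Redis code)."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._data = OrderedDict()
        self.evicted = []

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        # Reading a key marks it as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        # Over capacity: drop the least recently used entry.
        while len(self._data) > self.max_entries:
            old_key, _ = self._data.popitem(last=False)
            self.evicted.append(old_key)


cache = LRUCache(max_entries=2)
cache.set("session:1", "a")
cache.set("session:2", "b")
cache.get("session:1")        # touch session:1 so session:2 becomes the oldest
cache.set("session:3", "c")   # exceeds capacity and evicts session:2
print(cache.evicted)
```

Because the policy applies to all keys, long-lived entries such as sessions can be evicted alongside short-lived cache entries once memory fills up, so callers should treat a missing key as a normal case.
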
---

## Usage Patterns

### Session Storage
```python
import json

# Fragment: runs inside an async function; `redis` is the shared async client
# Store session
await redis.setex(
    f"session:{user_id}",
    3600,  # 1 hour TTL
    json.dumps(session_data)
)

# Retrieve session (None if the key is missing or expired)
session_json = await redis.get(f"session:{user_id}")
session_data = json.loads(session_json) if session_json else None
```

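The store and retrieve steps above can be wrapped in two small helpers. The following is a sketch under stated assumptions, not the backend's actual code: `save_session`, `load_session`, and the `InMemoryRedis` stand-in are hypothetical names introduced here so the example runs without a Redis server. Any async client exposing `setex` and `get` should fit the same shape.

```python
import asyncio
import json
from typing import Any, Optional

SESSION_TTL_SECONDS = 3600  # 1 hour, matching the TTL above


class InMemoryRedis:
    """Hypothetical stand-in exposing only setex/get so the sketch runs offline.

    It ignores the TTL; a real Redis server expires the key automatically.
    """

    def __init__(self) -> None:
        self._store = {}

    async def setex(self, key: str, seconds: int, value: str) -> None:
        self._store[key] = value

    async def get(self, key: str) -> Optional[str]:
        return self._store.get(key)


async def save_session(client: Any, user_id: str, data: dict) -> None:
    await client.setex(f"session:{user_id}", SESSION_TTL_SECONDS, json.dumps(data))


async def load_session(client: Any, user_id: str) -> Optional[dict]:
    raw = await client.get(f"session:{user_id}")
    # A missing key means the session expired, was evicted, or never existed.
    return json.loads(raw) if raw is not None else None


async def demo() -> tuple:
    client = InMemoryRedis()
    await save_session(client, "42", {"role": "admin"})
    return await load_session(client, "42"), await load_session(client, "missing")


found, missing = asyncio.run(demo())
print(found, missing)
```

`json.loads` accepts both `str` and `bytes`, so the helper works whether or not the real client is configured to decode responses.
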
### API Response Caching
```python
# Cache API response
cache_key = f"api:cache:{endpoint}:{params_hash}"
await redis.setex(cache_key, 300, json.dumps(response))  # 5 min TTL

# Retrieve cached response
cached = await redis.get(cache_key)
```

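The snippet above uses `params_hash` without showing how it is computed. One possible approach, offered as an assumption rather than the project's actual method, is to hash a canonical JSON encoding of the request parameters so that the same parameters always map to the same key regardless of their order. The function names and the 16-character truncation are illustrative choices.

```python
import hashlib
import json


def params_hash(params: dict) -> str:
    """Return a short, deterministic digest of the query parameters."""
    # sort_keys makes the digest independent of parameter order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cache_key(endpoint: str, params: dict) -> str:
    """Build a key in the api:cache:{endpoint}:{hash} namespace."""
    return f"api:cache:{endpoint}:{params_hash(params)}"


key_a = cache_key("/users", {"page": 1, "limit": 20})
key_b = cache_key("/users", {"limit": 20, "page": 1})
print(key_a == key_b)
```

Python's built-in `hash()` is deliberately avoided here because it is randomized per process for strings, which would give each worker a different key for the same request.
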
### WebSocket State (Planned)
```python
# Pub/sub for real-time updates
await redis.publish("prism:events", json.dumps(event))

# Subscribe to events
pubsub = redis.pubsub()
await pubsub.subscribe("prism:events")
```

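The snippet above stops at `subscribe`; a subscriber then has to read and decode incoming messages. Assuming a redis-py style client (which the `redis.pubsub()` call suggests), each message arrives as a dict with `type`, `pattern`, `channel`, and `data` keys, and the first one after subscribing is a confirmation rather than an event. The hypothetical `decode_event` helper below sketches that filtering step; the event payload is made up for illustration.

```python
import json
from typing import Optional


def decode_event(message: Optional[dict]) -> Optional[dict]:
    """Return the JSON event carried by a pub/sub message, or None."""
    # None means no message was available; non-"message" types are
    # subscribe/unsubscribe confirmations rather than published events.
    if message is None or message.get("type") != "message":
        return None
    data = message["data"]
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    return json.loads(data)


confirmation = {"type": "subscribe", "pattern": None, "channel": b"prism:events", "data": 1}
published = {
    "type": "message",
    "pattern": None,
    "channel": b"prism:events",
    "data": b'{"kind": "deploy", "service": "blackroad-backend"}',
}

print(decode_event(confirmation))
print(decode_event(published))
```

In a real consumer this would typically sit inside a loop that polls the pub/sub object for the next message and forwards each decoded event to the connected WebSocket clients.
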
### Rate Limiting
```python
# Track API rate limits
key = f"ratelimit:{ip}:{endpoint}"
count = await redis.incr(key)
if count == 1:
    # First request in this window: start the expiry clock
    await redis.expire(key, 60)  # 1 minute window
```

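The counter above only tracks requests; the caller still has to compare it against a limit. A minimal fixed-window sketch follows. The `allow_request` function, the default limit of 100, and the `InMemoryCounter` stand-in are assumptions introduced so the example runs without a Redis server; the source does not state the actual limits.

```python
import asyncio
from typing import Any


class InMemoryCounter:
    """Hypothetical stand-in with incr/expire so the sketch runs offline."""

    def __init__(self) -> None:
        self.counts = {}
        self.ttls = {}

    async def incr(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    async def expire(self, key: str, seconds: int) -> bool:
        self.ttls[key] = seconds  # recorded only; nothing actually expires here
        return True


async def allow_request(client: Any, ip: str, endpoint: str,
                        limit: int = 100, window_seconds: int = 60) -> bool:
    """Fixed-window check: True while the caller is within the limit."""
    key = f"ratelimit:{ip}:{endpoint}"
    count = await client.incr(key)
    if count == 1:
        # First hit of the window starts the expiry clock.
        await client.expire(key, window_seconds)
    return count <= limit


async def demo() -> list:
    client = InMemoryCounter()
    return [await allow_request(client, "203.0.113.7", "/api/login", limit=3)
            for _ in range(5)]


results = asyncio.run(demo())
print(results)
```

One caveat of this pattern: `INCR` and `EXPIRE` are separate commands, so if the process fails between them the key is left without a TTL. Running both in a transaction or a server-side script is a common way to close that gap.
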
---

## Key Namespaces

| Namespace | Pattern | TTL | Purpose |
|-----------|---------|-----|---------|
| `session:*` | `session:{user_id}` | 1 hour | User session data |
| `api:cache:*` | `api:cache:{endpoint}:{hash}` | 5-60 min | Cached API responses |
| `ratelimit:*` | `ratelimit:{ip}:{endpoint}` | 1 min | Rate limit counters |
| `websocket:*` | `websocket:{connection_id}` | Variable | WebSocket state |
| `prism:*` | Pub/sub channels | N/A | Event bus |

---

## Performance

### Metrics
- **Hit Rate**: Target > 80%
- **Latency**: < 1ms (local network)
- **Memory Usage**: Monitor for evictions

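The hit rate target can be checked from the `keyspace_hits` and `keyspace_misses` counters that Redis reports in the stats section of `INFO`. The sketch below assumes those counters have already been fetched into a dict; the `hit_rate` helper and the sample numbers are illustrative.

```python
def hit_rate(info: dict) -> float:
    """Cache hit rate from the keyspace_hits/keyspace_misses INFO counters."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    total = hits + misses
    # No lookups yet: report 0.0 rather than dividing by zero.
    return hits / total if total else 0.0


sample = {"keyspace_hits": 850, "keyspace_misses": 150}  # made-up numbers
rate = hit_rate(sample)
print(f"{rate:.0%}", "OK" if rate > 0.80 else "below target")
```

These counters are cumulative since the server started, so comparing two samples taken some time apart gives a better picture of the current hit rate than a single reading.
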
### Optimization
- Use pipelining for bulk operations
- Implement connection pooling
- Use hiredis for faster parsing
- Monitor key expiration patterns

---

## Monitoring

### Metrics (Railway Dashboard)
- Memory usage
- Connection count
- Commands per second
- Evicted keys

### Alerts (Recommended)
- Memory > 90% full
- High eviction rate
- Connection errors

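The first two recommended alerts can be derived from standard `INFO` fields (`used_memory`, `maxmemory`, `evicted_keys`). The following is a rough sketch, assuming two `INFO` samples are available as dicts. The `check_alerts` name and the eviction threshold of 100 keys per sample are hypothetical; only the 90% memory threshold comes from the list above.

```python
from typing import List


def check_alerts(previous: dict, current: dict,
                 memory_threshold: float = 0.90,
                 max_evictions_per_sample: int = 100) -> List[str]:
    """Compare two INFO samples and return human-readable alert messages."""
    alerts = []
    maxmemory = int(current.get("maxmemory", 0))
    # maxmemory == 0 means no limit is configured, so the ratio is undefined.
    if maxmemory > 0:
        ratio = int(current["used_memory"]) / maxmemory
        if ratio > memory_threshold:
            alerts.append(f"memory at {ratio:.0%} of maxmemory")
    # evicted_keys is cumulative, so the rate is the difference between samples.
    evicted = int(current.get("evicted_keys", 0)) - int(previous.get("evicted_keys", 0))
    if evicted > max_evictions_per_sample:
        alerts.append(f"{evicted} keys evicted since last sample")
    return alerts


MB = 1024 * 1024
before = {"used_memory": 200 * MB, "maxmemory": 256 * MB, "evicted_keys": 10}
after = {"used_memory": 245 * MB, "maxmemory": 256 * MB, "evicted_keys": 510}
print(check_alerts(before, after))
```
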
---

## Connection Details

### Environment Variable
```bash
REDIS_URL=${{Redis.REDIS_URL}}
# Format: redis://[user:password@]host:port[/db]
```

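When debugging a connection it can help to confirm that `REDIS_URL` parses into the expected parts without printing the password. The sketch below uses only the standard library. The `describe_redis_url` helper and the example host `redis.internal` are hypothetical, and the defaults (port 6379, database 0) are the usual Redis conventions rather than values taken from this deployment.

```python
from urllib.parse import urlparse


def describe_redis_url(url: str) -> dict:
    """Split a Redis URL into its parts without exposing the password."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("REDIS_URL has no host")
    db = parsed.path.lstrip("/")
    return {
        "tls": parsed.scheme == "rediss",
        "host": parsed.hostname,
        "port": parsed.port or 6379,   # Redis default port
        "db": int(db) if db else 0,    # database 0 when the path is empty
        "has_password": parsed.password is not None,
    }


# Example value only; in deployment the URL comes from the REDIS_URL variable.
example = "redis://default:secret@redis.internal:6379/0"
print(describe_redis_url(example))
```

In the backend the input would come from `os.environ["REDIS_URL"]`; a `KeyError` there corresponds to the variable not being set at all.
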
### Connection Pool
```python
# backend/app/redis_client.py
# Note: create_redis_pool() is the aioredis 1.x API; it was removed in aioredis 2.0.
import aioredis

redis = await aioredis.create_redis_pool(
    REDIS_URL,
    minsize=5,
    maxsize=10,
    encoding='utf-8'
)
```

---

## Security

### Access Control
- **Network**: Private Railway network only
- **Credentials**: Auto-generated, injected via `${{Redis.REDIS_URL}}`
- **Encryption**: TLS optional (not required on private network)

---

## Troubleshooting

### Connection Issues
1. Verify `REDIS_URL` is set
2. Check Redis service status
3. Verify network connectivity
4. Check connection pool exhaustion

### Memory Issues
1. Check eviction metrics
2. Analyze key distribution
3. Adjust TTLs for less critical data
4. Upgrade Redis plan if needed

---

*Analysis Date: 2025-11-19*
*Next Review: 2025-12-19*