Mirror of https://github.com/blackboxprogramming/BlackRoad-Operating-System.git, synced 2026-03-17 05:57:21 -05:00
BlackRoad OS - AIops Service
AI-powered operations and automated incident response
Overview
The AIops service provides operational monitoring and automated remediation for BlackRoad OS infrastructure: canary analysis, configuration drift detection, event correlation, runbook-based remediation, and SLO error-budget management.
Features
Canary Analysis (canary.py)
- Compare metric snapshots between baseline and canary deployments
- Configurable thresholds for latency (p50, p95) and error rates
- Automatic pass/fail determination
- Artifact generation for audit trails
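The baseline-versus-canary comparison can be illustrated with a minimal sketch. This is not the code in canary.py: the snapshot keys (latency_p50, latency_p95, error_rate), the CanaryThresholds fields and their default values, and the name analyze_snapshots are all hypothetical. In the service, the real thresholds come from configs/aiops/canary.yaml.

```python
from dataclasses import dataclass


@dataclass
class CanaryThresholds:
    # Hypothetical threshold names and defaults; real values live in canary.yaml.
    max_p50_increase_pct: float = 10.0  # allowed relative p50 latency regression (%)
    max_p95_increase_pct: float = 15.0  # allowed relative p95 latency regression (%)
    max_error_rate_delta: float = 0.01  # allowed absolute error-rate increase


def pct_increase(base: float, canary: float) -> float:
    """Relative increase of the canary value over the baseline, in percent."""
    if base == 0:
        return 0.0 if canary == 0 else float("inf")
    return (canary - base) / base * 100.0


def analyze_snapshots(base: dict, canary: dict, t: CanaryThresholds) -> dict:
    """Compare baseline and canary snapshots; return a verdict plus the diff."""
    diff = {
        "p50_increase_pct": pct_increase(base["latency_p50"], canary["latency_p50"]),
        "p95_increase_pct": pct_increase(base["latency_p95"], canary["latency_p95"]),
        "error_rate_delta": canary["error_rate"] - base["error_rate"],
    }
    failures = []
    if diff["p50_increase_pct"] > t.max_p50_increase_pct:
        failures.append("latency_p50")
    if diff["p95_increase_pct"] > t.max_p95_increase_pct:
        failures.append("latency_p95")
    if diff["error_rate_delta"] > t.max_error_rate_delta:
        failures.append("error_rate")
    # The canary passes only if no metric breached its threshold.
    return {"passed": not failures, "failures": failures, "diff": diff}
```

Latency is compared as a relative increase and error rate as an absolute delta, because percentage changes of an error rate close to zero swing wildly.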
Config Drift Detection (config_drift.py)
- Detect configuration changes across environments
- Severity-based alerting (critical/warning)
- Baseline comparison for compliance
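A drift check reduces to diffing a baseline configuration against the live one and grading each difference. In this hypothetical sketch, detect_drift, the MISSING marker, and the CRITICAL_KEYS set that separates critical from warning findings are assumptions; the actual severity rules in config_drift.py are not documented here.

```python
# Marker for keys present in only one of the two configs.
MISSING = "<missing>"

# Hypothetical: the keys whose drift is treated as critical.
CRITICAL_KEYS = frozenset({"database_url", "auth_mode", "tls_enabled"})


def detect_drift(baseline: dict, current: dict, critical_keys=CRITICAL_KEYS) -> list:
    """Return one finding per key whose value differs from the baseline."""
    findings = []
    # Union of both key sets, so added and removed keys are reported too.
    for key in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(key, MISSING)
        actual = current.get(key, MISSING)
        if expected != actual:
            findings.append({
                "key": key,
                "expected": expected,
                "actual": actual,
                "severity": "critical" if key in critical_keys else "warning",
            })
    return findings
```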
Event Correlation (correlation.py)
- Rule-based event correlation engine
- Multi-source event integration (incidents, healthchecks, changes, anomalies)
- Time-window-based pattern matching
- Root-cause identification
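One way to express rule-based, time-window correlation is sketched below, assuming each rule names a set of event sources that must co-occur for the same service within a window, plus the source to report as the probable root cause. The Event and Rule types and the correlate function are illustrative and are not the API of correlation.py.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    source: str  # e.g. "incident", "healthcheck", "change", "anomaly"
    service: str
    at: datetime


@dataclass
class Rule:
    # Hypothetical rule shape: all `sources` must appear within `window`.
    name: str
    sources: frozenset
    window: timedelta
    root_cause_source: str  # source reported as the probable root cause


def correlate(events: list, rules: list) -> list:
    """Return at most one match per (rule, service) pair."""
    by_service = {}
    for e in sorted(events, key=lambda e: e.at):
        by_service.setdefault(e.service, []).append(e)

    matches = []
    for rule in rules:
        for service, evs in by_service.items():
            # Try a window starting at each event, in time order.
            for i, anchor in enumerate(evs):
                in_window = [e for e in evs[i:] if e.at - anchor.at <= rule.window]
                if rule.sources <= {e.source for e in in_window}:
                    root = next(
                        (e for e in in_window if e.source == rule.root_cause_source), None
                    )
                    matches.append({"rule": rule.name, "service": service,
                                    "root_cause": root, "events": in_window})
                    break
    return matches
```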
Auto-Remediation (remediation.py)
- Runbook-based automated responses
- Maintenance window enforcement
- Dry-run execution support
- Execution blocking for safety
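The safety gates listed above can be sketched as a planning step followed by a gated execution step. The RUNBOOKS mapping, the BLOCKED_ACTIONS set, the MaintenanceWindow type, and the plan_steps/execute_steps functions are hypothetical illustrations, not the interface of remediation.py; in the service, maintenance windows are read from configs/aiops/maintenance.yaml.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, Optional

# Hypothetical runbook map: correlation rule name -> remediation action.
RUNBOOKS = {"deploy-regression": "rollback_last_deploy", "disk-pressure": "rotate_logs"}

# Hypothetical: actions that must never run automatically.
BLOCKED_ACTIONS = frozenset({"failover_primary_db"})


@dataclass
class MaintenanceWindow:
    start: time
    end: time

    def contains(self, at: datetime) -> bool:
        t = at.time()
        if self.start <= self.end:
            return self.start <= t < self.end
        return t >= self.start or t < self.end  # window crosses midnight


def plan_steps(correlations: list, runbooks: dict = RUNBOOKS) -> list:
    """Map each correlation to a runbook action; unmapped rules are skipped."""
    return [
        {"rule": c["rule"], "service": c["service"], "action": runbooks[c["rule"]]}
        for c in correlations
        if c["rule"] in runbooks
    ]


def execute_steps(steps: list, now: datetime, windows: list, dry_run: bool = True,
                  runner: Optional[Callable[[dict], None]] = None,
                  blocked: frozenset = BLOCKED_ACTIONS) -> list:
    """Apply the gates in order: blocklist, maintenance window, dry run."""
    in_window = any(w.contains(now) for w in windows)
    results = []
    for step in steps:
        if step["action"] in blocked:
            status = "blocked"
        elif not in_window:
            status = "skipped_outside_window"
        elif dry_run or runner is None:
            status = "dry_run"  # report what would run without running it
        else:
            runner(step)
            status = "executed"
        results.append({**step, "status": status})
    return results
```

The blocklist is checked first so that a blocked action is always reported as blocked, never as merely deferred until the next maintenance window.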
SLO Budget Management (slo_budget.py)
- Error budget calculation and tracking
- Budget state management (ok/warn/burning)
- Alert generation for budget exhaustion
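The error budget is the number of failures an SLO tolerates: a 99.9% target over 1,000,000 requests allows (1 - 0.999) x 1,000,000 = 1,000 failed requests, so 250 failures consume 25% of the budget. The sketch below computes this and maps it to the ok/warn/burning states. The function name compute_budget and the warn_at/burn_at cut-offs are assumptions; slo_budget.py may define the states differently (for example, by burn rate).

```python
def compute_budget(slo_target: float, total_requests: int, failed_requests: int,
                   warn_at: float = 0.5, burn_at: float = 0.9) -> dict:
    """Error-budget status for one service over one window.

    Hypothetical thresholds: 'warn' once warn_at of the budget is consumed,
    'burning' once burn_at is consumed; 'exhausted' means the budget is spent.
    """
    allowed = (1.0 - slo_target) * total_requests  # failures the SLO tolerates
    if allowed > 0:
        consumed = failed_requests / allowed
    else:
        consumed = 0.0 if failed_requests == 0 else float("inf")

    if consumed >= burn_at:
        state = "burning"
    elif consumed >= warn_at:
        state = "warn"
    else:
        state = "ok"
    return {
        "allowed_failures": allowed,
        "consumed_fraction": consumed,
        "remaining_fraction": max(0.0, 1.0 - consumed),
        "state": state,
        "exhausted": consumed >= 1.0,  # condition for a budget-exhaustion alert
    }
```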
Configuration
Config files are expected at:
configs/aiops/
├── canary.yaml # Canary analysis thresholds
├── correlation.yaml # Correlation rules
└── maintenance.yaml # Maintenance windows
Usage
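from services.aiops import canary, remediation, slo_budget
# Run canary analysis (base_path and canary_path point to the baseline and canary metric snapshots)
result = canary.analyze(base_path, canary_path)
# Check the SLO error budget for a service over a 7-day window
status = slo_budget.budget_status("api-gateway", "7d")
# Plan remediation from the output of the event correlation engine
plan = remediation.plan(correlations)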
Integration
Railway Deployment
- Service: blackroad-os-infra
- Domain: infra.blackroad.systems
- Health: GET /health
Endpoints
- POST /v1/aiops/canary - Run canary analysis
- POST /v1/aiops/correlate - Correlate events
- POST /v1/aiops/remediate - Execute remediation
- GET /v1/aiops/slo/:service - Get SLO budget status
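A client for these endpoints might look like the following sketch, which only builds requests with the standard library's urllib.request and does not send them. It assumes the endpoints are served from infra.blackroad.systems over HTTPS and accept JSON bodies; the payload fields and the helper names (build_request, canary_request, slo_request) are hypothetical, since the request schemas are not documented here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the AIops endpoints are served from the infra domain over HTTPS.
BASE_URL = "https://infra.blackroad.systems"


def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the AIops API."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def canary_request(baseline_path: str, canary_path: str) -> urllib.request.Request:
    # Hypothetical body: the real request schema is not documented.
    return build_request("POST", "/v1/aiops/canary",
                         {"baseline": baseline_path, "canary": canary_path})


def slo_request(service: str) -> urllib.request.Request:
    # The service name fills the :service path parameter, URL-encoded.
    return build_request("GET", "/v1/aiops/slo/" + urllib.parse.quote(service, safe=""))
```

To send a request, pass it to urllib.request.urlopen(req) and decode the JSON response body.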
Artifacts
All operations generate artifacts in:
artifacts/aiops/
├── canary_YYYYMMDDHHMMSS/
│ ├── diff.json
│ └── report.md
├── correlation_YYYYMMDDHHMMSS.json
├── plan.json
└── exec_YYYYMMDDHHMMSS/
├── log.jsonl
└── summary.md
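The timestamped layout above can be produced with pathlib. This sketch assumes UTC timestamps and treats log.jsonl as JSON Lines (one JSON object per line, as the extension suggests); stamp, write_canary_artifacts, and append_exec_log are hypothetical helper names, not functions exported by the service.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def stamp(now: Optional[datetime] = None) -> str:
    """YYYYMMDDHHMMSS timestamp; using UTC is an assumption."""
    return (now or datetime.now(timezone.utc)).strftime("%Y%m%d%H%M%S")


def write_canary_artifacts(root, diff: dict, report_md: str,
                           now: Optional[datetime] = None) -> Path:
    """Create canary_<stamp>/ with diff.json and report.md under root."""
    out_dir = Path(root) / f"canary_{stamp(now)}"
    # exist_ok=False: refuse to overwrite a run from the same second.
    out_dir.mkdir(parents=True, exist_ok=False)
    (out_dir / "diff.json").write_text(json.dumps(diff, indent=2, sort_keys=True), encoding="utf-8")
    (out_dir / "report.md").write_text(report_md, encoding="utf-8")
    return out_dir


def append_exec_log(exec_dir, record: dict) -> None:
    """Append one JSON object per line to log.jsonl (JSON Lines)."""
    exec_dir = Path(exec_dir)
    exec_dir.mkdir(parents=True, exist_ok=True)
    with open(exec_dir / "log.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```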
Source
Extracted from: blackroad-prism-console/aiops/