mirror of
https://github.com/blackboxprogramming/BlackRoad-Operating-System.git
synced 2026-03-18 04:33:59 -05:00
Create master orchestration prompt for BlackRoad-OS (#120)
…nt guides

This implements the "Big Kahuna" master orchestration plan to get BlackRoad OS fully online and deployable without manual PR management.

## Backend Service (blackroad-core)

- Add /version endpoint with build metadata
- Prism Console already mounted at /prism
- Health check at /health
- Comprehensive API health at /api/health/summary

## Operator Service (blackroad-operator)

- Add /version endpoint with build metadata
- Create requirements.txt for dependencies
- Create Dockerfile for containerization
- Create railway.toml for Railway deployment
- Health check at /health

## Infrastructure

- Consolidate railway.toml for monorepo multi-service deployment
  - Backend service (Dockerfile-based)
  - Operator service (Nixpacks-based)
- Remove conflicting railway.json

## Documentation

- Add DEPLOYMENT_SMOKE_TEST_GUIDE.md
  - Complete deployment instructions (local + Railway)
  - Automated smoke test suite
  - Troubleshooting guide
  - Monitoring & health check setup
- Add infra/DNS_CLOUDFLARE_PLAN.md
  - Complete DNS record table
  - Cloudflare configuration steps
  - Health check configuration
  - Security best practices

## Testing

- Add scripts/smoke-test.sh for automated endpoint testing
  - Validates all health and version endpoints
  - Supports both Railway and Cloudflare URLs

## Result

Alexa can now:

1. Push to main → GitHub Actions deploys to Railway
2. Configure Cloudflare DNS (one-time setup)
3. Run smoke tests to verify everything works
4. Visit https://os.blackroad.systems and use the OS

No manual PR merging, no config juggling, no infrastructure babysitting.
794
DEPLOYMENT_SMOKE_TEST_GUIDE.md
Normal file
@@ -0,0 +1,794 @@
|
|||||||
|
# BlackRoad OS - Deployment & Smoke Test Guide
|
||||||
|
|
||||||
|
**Version:** 1.0
|
||||||
|
**Last Updated:** 2025-11-19
|
||||||
|
**For:** Alexa Louise (Cadillac)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
This guide provides **one-click deployment instructions** and **smoke tests** to verify that BlackRoad OS is fully operational across all services.
|
||||||
|
|
||||||
|
**Goal:** Alexa can deploy and verify the entire BlackRoad OS stack without touching individual PRs, configs, or manual interventions.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
|
||||||
|
1. [Architecture Overview](#architecture-overview)
|
||||||
|
2. [Prerequisites](#prerequisites)
|
||||||
|
3. [Local Development](#local-development)
|
||||||
|
4. [Railway Deployment](#railway-deployment)
|
||||||
|
5. [Cloudflare DNS Setup](#cloudflare-dns-setup)
|
||||||
|
6. [Smoke Tests](#smoke-tests)
|
||||||
|
7. [Monitoring & Health](#monitoring--health)
|
||||||
|
8. [Troubleshooting](#troubleshooting)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture Overview
|
||||||
|
|
||||||
|
```
┌─────────────────────────────────────────────────────────┐
│                     Cloudflare CDN                      │
│                (SSL, DDoS, Caching, WAF)                │
└────────────────┬────────────────────────────────────────┘
                 │
        ┌────────┴────────┐
        │                 │
   ┌────▼────┐      ┌─────▼──────┐
   │ Backend │      │  Operator  │
   │ Service │      │   Engine   │
   └────┬────┘      └─────┬──────┘
        │                 │
   ┌────▼─────────────────▼────┐
   │     Railway Platform      │
   │  - PostgreSQL             │
   │  - Redis                  │
   │  - Auto-scaling           │
   └───────────────────────────┘
```
|
||||||
|
|
||||||
|
### Services
|
||||||
|
|
||||||
|
1. **Backend (Core API)**
   - FastAPI application
   - Serves: Frontend UI, API endpoints, Prism Console
   - Port: 8000
   - Health: `/health`
   - Version: `/version`

2. **Operator Engine**
   - Job scheduler and GitHub automation
   - Port: 8001
   - Health: `/health`
   - Version: `/version`

3. **Frontend (Served by Backend)**
   - Windows 95-style OS UI
   - Vanilla JavaScript, zero dependencies
   - Served at `/` from backend

4. **Prism Console (Served by Backend)**
   - Admin interface
   - Served at `/prism` from backend
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
### Required Tools
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Git
|
||||||
|
git --version # Should be 2.x+
|
||||||
|
|
||||||
|
# Python (for local testing)
|
||||||
|
python --version # Should be 3.11+
|
||||||
|
|
||||||
|
# Node.js (optional, for SDK development)
|
||||||
|
node --version # Should be 18+
|
||||||
|
|
||||||
|
# Railway CLI (for deployment)
|
||||||
|
curl -fsSL https://railway.app/install.sh | sh
|
||||||
|
railway --version
|
||||||
|
```
|
||||||
|
|
||||||
|
### Required Accounts
|
||||||
|
|
||||||
|
- ✅ GitHub account (for repo access)
|
||||||
|
- ✅ Railway account (for deployment)
|
||||||
|
- ✅ Cloudflare account (for DNS)
|
||||||
|
|
||||||
|
### Required Secrets
|
||||||
|
|
||||||
|
Ensure you have these secrets ready:
|
||||||
|
|
||||||
|
**Backend:**
|
||||||
|
- `SECRET_KEY` - JWT signing key (generate with `openssl rand -hex 32`)
|
||||||
|
- `WALLET_MASTER_KEY` - Wallet encryption key (32 chars)
|
||||||
|
- `DATABASE_URL` - PostgreSQL connection string (Railway provides)
|
||||||
|
- `REDIS_URL` - Redis connection string (Railway provides)
|
||||||
|
- `GITHUB_TOKEN` - GitHub PAT for API access
|
||||||
|
- `OPENAI_API_KEY` - OpenAI API key (optional)
|
||||||
|
- `STRIPE_SECRET_KEY` - Stripe secret (optional)
|
||||||
|
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` - AWS S3 (optional)
|
||||||
|
|
||||||
|
**Operator:**
|
||||||
|
- `GITHUB_TOKEN` - GitHub PAT with repo permissions
|
||||||
|
- `GITHUB_WEBHOOK_SECRET` - Webhook signature secret
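
If you would rather generate the two backend keys from Python than from `openssl`, the following is a minimal sketch. `secrets.token_hex(32)` produces 64 hex characters, the same length as `openssl rand -hex 32`. The helper name `generate_secrets` and the use of `token_hex(16)` to get a 32-character `WALLET_MASTER_KEY` are illustrative assumptions; the guide only says the wallet key is 32 characters.

```python
import secrets


def generate_secrets() -> dict:
    """Return freshly generated values for the two backend keys."""
    return {
        # 32 random bytes -> 64 hex characters, like `openssl rand -hex 32`
        "SECRET_KEY": secrets.token_hex(32),
        # 16 random bytes -> 32 hex characters (assumed format for the 32-char key)
        "WALLET_MASTER_KEY": secrets.token_hex(16),
    }
```

Print the result once, paste the values into Railway, and avoid committing them to the repository.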
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Local Development
|
||||||
|
|
||||||
|
### 1. Clone Repository
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/blackboxprogramming/BlackRoad-Operating-System.git
|
||||||
|
cd BlackRoad-Operating-System
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Backend Local Setup
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd backend
|
||||||
|
|
||||||
|
# Create virtual environment
|
||||||
|
python -m venv .venv
|
||||||
|
source .venv/bin/activate # On Windows: .venv\Scripts\activate
|
||||||
|
|
||||||
|
# Install dependencies
|
||||||
|
pip install -r requirements.txt
|
||||||
|
|
||||||
|
# Create .env file
|
||||||
|
cp .env.example .env
|
||||||
|
|
||||||
|
# Edit .env with your local settings
|
||||||
|
# (Use defaults for local development)
|
||||||
|
|
||||||
|
# Run with Docker Compose (recommended)
|
||||||
|
docker-compose up
|
||||||
|
|
||||||
|
# OR run directly
|
||||||
|
uvicorn app.main:app --reload
|
||||||
|
```
|
||||||
|
|
||||||
|
**Access locally:**
|
||||||
|
- Frontend: http://localhost:8000
|
||||||
|
- API Docs: http://localhost:8000/api/docs
|
||||||
|
- Prism Console: http://localhost:8000/prism
|
||||||
|
- Health: http://localhost:8000/health
|
||||||
|
- Version: http://localhost:8000/version
|
||||||
|
|
||||||
|
### 3. Operator Local Setup
|
||||||
|
|
||||||
|
```bash
# Run these from the repo root so the `operator_engine` package is importable

# Install dependencies (in a separate venv or reuse the backend's)
pip install -r operator_engine/requirements.txt

# Run operator server
uvicorn operator_engine.server:app --reload --port 8001
```
|
||||||
|
|
||||||
|
**Access locally:**
|
||||||
|
- Health: http://localhost:8001/health
|
||||||
|
- Version: http://localhost:8001/version
|
||||||
|
- Jobs: http://localhost:8001/jobs
|
||||||
|
|
||||||
|
### 4. Run Tests
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Backend tests
|
||||||
|
cd backend
|
||||||
|
pytest -v
|
||||||
|
|
||||||
|
# Or use helper script
|
||||||
|
cd ..
|
||||||
|
bash scripts/run_backend_tests.sh
|
||||||
|
|
||||||
|
# Operator tests
|
||||||
|
cd operator_engine
|
||||||
|
pytest -v
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Railway Deployment
|
||||||
|
|
||||||
|
### Option 1: Automatic Deployment (via GitHub)
|
||||||
|
|
||||||
|
**This is the recommended approach.**
|
||||||
|
|
||||||
|
1. **Push to main branch:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add .
|
||||||
|
git commit -m "Deploy BlackRoad OS"
|
||||||
|
git push origin main
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **GitHub Actions automatically triggers:**
|
||||||
|
- `.github/workflows/railway-deploy.yml` runs
|
||||||
|
- Builds and deploys both services to Railway
|
||||||
|
- Runs health checks
|
||||||
|
- Sends notifications (if configured)
|
||||||
|
|
||||||
|
3. **Monitor deployment:**
|
||||||
|
- Go to: https://github.com/blackboxprogramming/BlackRoad-Operating-System/actions
|
||||||
|
- Watch the "Deploy to Railway" workflow
|
||||||
|
|
||||||
|
### Option 2: Manual Deployment (via Railway CLI)
|
||||||
|
|
||||||
|
**Use this for testing or troubleshooting.**
|
||||||
|
|
||||||
|
1. **Install Railway CLI:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -fsSL https://railway.app/install.sh | sh
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Login to Railway:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
railway login
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Link to your Railway project:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# From repo root
|
||||||
|
railway link
|
||||||
|
|
||||||
|
# Select your project from the list
|
||||||
|
# Or create a new project
|
||||||
|
```
|
||||||
|
|
||||||
|
4. **Deploy services:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Deploy both services (uses railway.toml)
|
||||||
|
railway up
|
||||||
|
|
||||||
|
# Or deploy specific service
|
||||||
|
railway up -s blackroad-backend
|
||||||
|
railway up -s blackroad-operator
|
||||||
|
```
|
||||||
|
|
||||||
|
5. **Check deployment status:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
railway status
|
||||||
|
|
||||||
|
# View logs
|
||||||
|
railway logs -s blackroad-backend
|
||||||
|
railway logs -s blackroad-operator
|
||||||
|
```
|
||||||
|
|
||||||
|
6. **Get service URLs:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# In Railway dashboard or CLI
|
||||||
|
railway domain
|
||||||
|
|
||||||
|
# Should output something like:
|
||||||
|
# blackroad-backend: blackroad-backend-production.up.railway.app
|
||||||
|
# blackroad-operator: blackroad-operator-production.up.railway.app
|
||||||
|
```
|
||||||
|
|
||||||
|
### Configure Environment Variables in Railway
|
||||||
|
|
||||||
|
**Via Railway Dashboard:**
|
||||||
|
|
||||||
|
1. Go to: https://railway.app/dashboard
|
||||||
|
2. Select your project
|
||||||
|
3. Click on each service (backend, operator)
|
||||||
|
4. Go to **Variables** tab
|
||||||
|
5. Add environment variables from `.env.example`
|
||||||
|
|
||||||
|
**Via Railway CLI:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Set a variable for backend service
|
||||||
|
railway variables set SECRET_KEY="your-secret-key-here" -s blackroad-backend
|
||||||
|
|
||||||
|
# Set a variable for operator service
|
||||||
|
railway variables set GITHUB_TOKEN="ghp_..." -s blackroad-operator
|
||||||
|
|
||||||
|
# Or set from .env file
|
||||||
|
railway variables set -f backend/.env -s blackroad-backend
|
||||||
|
```
|
||||||
|
|
||||||
|
**Required Variables (Backend):**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
SECRET_KEY=<your-secret-32-char-key>
|
||||||
|
WALLET_MASTER_KEY=<your-wallet-key-32-chars>
|
||||||
|
ALLOWED_ORIGINS=https://blackroad.systems,https://api.blackroad.systems,https://os.blackroad.systems
|
||||||
|
ENVIRONMENT=production
|
||||||
|
DEBUG=False
|
||||||
|
GITHUB_TOKEN=<your-github-pat>
|
||||||
|
OPENAI_API_KEY=<optional>
|
||||||
|
STRIPE_SECRET_KEY=<optional>
|
||||||
|
AWS_ACCESS_KEY_ID=<optional>
|
||||||
|
AWS_SECRET_ACCESS_KEY=<optional>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Required Variables (Operator):**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ENVIRONMENT=production
|
||||||
|
GITHUB_TOKEN=<your-github-pat>
|
||||||
|
GITHUB_WEBHOOK_SECRET=<your-webhook-secret>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Note:** `DATABASE_URL` and `REDIS_URL` are automatically provided by Railway when you add PostgreSQL and Redis services.
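
A missing variable is one of the common deployment failures listed later in this guide, so a quick pre-flight check can help. The sketch below is an assumption-laden illustration: the helper `missing_vars` is hypothetical, and which names count as required is my reading of the lists above (optional keys are left out, and `DATABASE_URL`/`REDIS_URL` are included because Railway is expected to inject them).

```python
import os

# Names taken from the "Required Variables" lists above; optional keys omitted.
REQUIRED_BACKEND = [
    "SECRET_KEY", "WALLET_MASTER_KEY", "ALLOWED_ORIGINS",
    "ENVIRONMENT", "DATABASE_URL", "REDIS_URL", "GITHUB_TOKEN",
]
REQUIRED_OPERATOR = ["ENVIRONMENT", "GITHUB_TOKEN", "GITHUB_WEBHOOK_SECRET"]


def missing_vars(required, env=None):
    """Return the required names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_vars(REQUIRED_BACKEND)` inside the service environment returns an empty list when everything is set.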
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Cloudflare DNS Setup
|
||||||
|
|
||||||
|
Follow the **[DNS_CLOUDFLARE_PLAN.md](./infra/DNS_CLOUDFLARE_PLAN.md)** document.
|
||||||
|
|
||||||
|
**Quick Summary:**
|
||||||
|
|
||||||
|
1. Get Railway production URLs (from Railway dashboard or `railway domain`)
2. Login to Cloudflare: https://dash.cloudflare.com
3. Select `blackroad.systems` domain
4. Add CNAME records:
   - `api` → `<backend-railway-url>`
   - `core` → `<backend-railway-url>`
   - `operator` → `<operator-railway-url>`
   - `console` → `<backend-railway-url>`
   - `docs` → `<backend-railway-url>`
   - `web` → `<backend-railway-url>`
   - `os` → `<backend-railway-url>`
   - `@` (root) → `<backend-railway-url>`
   - `www` → `<backend-railway-url>`
5. Enable **Proxy** (orange cloud) for all records
6. Set SSL/TLS to **Full (strict)**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Smoke Tests
|
||||||
|
|
||||||
|
Run these tests **after deployment** to ensure everything works.
|
||||||
|
|
||||||
|
### Automated Smoke Test Script
|
||||||
|
|
||||||
|
```bash
#!/bin/bash
# smoke-test.sh - Run after deployment

# `set -e` is deliberately not used: every check should run even if an
# earlier one fails. Failures are counted and reported at the end.
set -u

# Set your domain or Railway URL
BACKEND_URL=${BACKEND_URL:-"https://api.blackroad.systems"}
OPERATOR_URL=${OPERATOR_URL:-"https://operator.blackroad.systems"}

FAILURES=0

# check <label> <url>: print PASS/FAIL for one endpoint and count failures
check() {
  echo -n "✓ $1... "
  if curl -f -s --max-time 15 "$2" > /dev/null; then
    echo "✅ PASS"
  else
    echo "❌ FAIL"
    FAILURES=$((FAILURES + 1))
  fi
}

echo "🔍 Running BlackRoad OS Smoke Tests..."
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

check "Backend Health Check"    "$BACKEND_URL/health"              # Test 1
check "Backend Version"         "$BACKEND_URL/version"             # Test 2
check "API Health Summary"      "$BACKEND_URL/api/health/summary"  # Test 3
check "API Documentation"       "$BACKEND_URL/api/docs"            # Test 4
check "Frontend UI"             "$BACKEND_URL/"                    # Test 5
check "Prism Console"           "$BACKEND_URL/prism"               # Test 6
check "Operator Health Check"   "$OPERATOR_URL/health"             # Test 7
check "Operator Version"        "$OPERATOR_URL/version"            # Test 8
check "Operator Jobs List"      "$OPERATOR_URL/jobs"               # Test 9

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
if [ "$FAILURES" -eq 0 ]; then
  echo "✅ All smoke tests passed!"
else
  echo "❌ $FAILURES smoke test(s) failed"
  exit 1
fi
```
|
||||||
|
|
||||||
|
**Save this as `scripts/smoke-test.sh` and run:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
chmod +x scripts/smoke-test.sh
|
||||||
|
|
||||||
|
# Test with Railway URLs
|
||||||
|
BACKEND_URL=https://<your-backend>.up.railway.app \
|
||||||
|
OPERATOR_URL=https://<your-operator>.up.railway.app \
|
||||||
|
./scripts/smoke-test.sh
|
||||||
|
|
||||||
|
# Or test with Cloudflare domains
|
||||||
|
BACKEND_URL=https://api.blackroad.systems \
|
||||||
|
OPERATOR_URL=https://operator.blackroad.systems \
|
||||||
|
./scripts/smoke-test.sh
|
||||||
|
```
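
If you want to run the same checks from Python (for example inside a CI job that already has an interpreter), a rough equivalent might look like the sketch below. It uses only the standard library; the names `http_ok` and `run_smoke_tests` are hypothetical, and the `probe` parameter exists only so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request

# Endpoint paths copied from the smoke tests described in this section.
BACKEND_PATHS = ["/health", "/version", "/api/health/summary", "/api/docs", "/", "/prism"]
OPERATOR_PATHS = ["/health", "/version", "/jobs"]


def http_ok(url, timeout=10.0):
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP 4xx/5xx, DNS failures, refused connections and timeouts land here.
        return False


def run_smoke_tests(backend_url, operator_url, probe=http_ok):
    """Probe every endpoint and return the list of URLs that failed."""
    targets = [backend_url.rstrip("/") + p for p in BACKEND_PATHS]
    targets += [operator_url.rstrip("/") + p for p in OPERATOR_PATHS]
    return [url for url in targets if not probe(url)]
```

An empty return value from `run_smoke_tests("https://api.blackroad.systems", "https://operator.blackroad.systems")` means every endpoint responded successfully.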
|
||||||
|
|
||||||
|
### Manual Smoke Tests
|
||||||
|
|
||||||
|
**Test 1: Backend Health**
|
||||||
|
```bash
|
||||||
|
curl -i https://api.blackroad.systems/health
|
||||||
|
|
||||||
|
# Expected:
|
||||||
|
# HTTP/2 200
|
||||||
|
# {"status":"healthy","timestamp":1234567890.123}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Test 2: Backend Version**
|
||||||
|
```bash
|
||||||
|
curl -i https://api.blackroad.systems/version
|
||||||
|
|
||||||
|
# Expected:
|
||||||
|
# HTTP/2 200
|
||||||
|
# {
|
||||||
|
# "service": "blackroad-core",
|
||||||
|
# "version": "1.0.0",
|
||||||
|
# "environment": "production",
|
||||||
|
# "commit": "abc123...",
|
||||||
|
# "built_at": "2025-11-19T...",
|
||||||
|
# "python_version": "3.11.x",
|
||||||
|
# "platform": "Linux"
|
||||||
|
# }
|
||||||
|
```
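
To check a `/version` response programmatically rather than by eye, something like the following sketch could be used. The key set mirrors the expected output above; the helper name `check_version_payload` is hypothetical. The `"unknown"` check assumes the behavior shown in the backend handler in this commit, which falls back to that string when `GIT_COMMIT` is not set.

```python
EXPECTED_VERSION_KEYS = {
    "service", "version", "environment", "commit",
    "built_at", "python_version", "platform",
}


def check_version_payload(payload, expected_service):
    """Return a list of problems; an empty list means the payload looks right."""
    problems = []
    missing = EXPECTED_VERSION_KEYS - payload.keys()
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    if payload.get("service") != expected_service:
        problems.append(f"unexpected service: {payload.get('service')!r}")
    if payload.get("commit") == "unknown":
        problems.append("commit is 'unknown' (GIT_COMMIT was not set at build time)")
    return problems
```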
|
||||||
|
|
||||||
|
**Test 3: API Health Summary**
|
||||||
|
```bash
|
||||||
|
curl -i https://api.blackroad.systems/api/health/summary
|
||||||
|
|
||||||
|
# Expected:
|
||||||
|
# HTTP/2 200
|
||||||
|
# {
|
||||||
|
# "status": "healthy" | "degraded" | "unhealthy",
|
||||||
|
# "summary": {
|
||||||
|
# "total": 12,
|
||||||
|
# "connected": 5,
|
||||||
|
# "not_configured": 7,
|
||||||
|
# "errors": 0
|
||||||
|
# },
|
||||||
|
# "connected_apis": ["github", "openai", ...],
|
||||||
|
# ...
|
||||||
|
# }
|
||||||
|
```
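
The summary payload can be reduced to a single pass/fail signal. The sketch below is one possible reading, not the backend's own logic: it assumes `total` equals `connected + not_configured + errors` (which holds for the sample above, 5 + 7 + 0 = 12) and treats any reported error as a failure. The function name `summarize_health` is hypothetical.

```python
def summarize_health(payload):
    """Return (ok, message) for a /api/health/summary response body."""
    counts = payload.get("summary", {})
    total = counts.get("total", 0)
    parts = sum(counts.get(k, 0) for k in ("connected", "not_configured", "errors"))
    if parts != total:
        return False, f"counts do not add up: {parts} != {total}"
    if payload.get("status") == "unhealthy" or counts.get("errors", 0) > 0:
        return False, f"{counts.get('errors', 0)} API(s) reporting errors"
    return True, f"{counts.get('connected', 0)}/{total} APIs connected"
```

APIs that are merely `not_configured` do not fail the check here, since several integrations in this guide are optional.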
|
||||||
|
|
||||||
|
**Test 4: Operator Health**
|
||||||
|
```bash
|
||||||
|
curl -i https://operator.blackroad.systems/health
|
||||||
|
|
||||||
|
# Expected:
|
||||||
|
# HTTP/2 200
|
||||||
|
# {"status":"healthy","version":"0.1.0"}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Test 5: Operator Version**
|
||||||
|
```bash
|
||||||
|
curl -i https://operator.blackroad.systems/version
|
||||||
|
|
||||||
|
# Expected:
|
||||||
|
# HTTP/2 200
|
||||||
|
# {
|
||||||
|
# "service": "blackroad-operator",
|
||||||
|
# "version": "0.1.0",
|
||||||
|
# ...
|
||||||
|
# }
|
||||||
|
```
|
||||||
|
|
||||||
|
**Test 6: Frontend UI**
|
||||||
|
```bash
|
||||||
|
# Open in browser
|
||||||
|
open https://os.blackroad.systems
|
||||||
|
|
||||||
|
# Should see Windows 95-style desktop interface
|
||||||
|
# Check for:
|
||||||
|
# - Desktop icons
|
||||||
|
# - Taskbar at bottom
|
||||||
|
# - Start menu works
|
||||||
|
# - Windows can be opened/closed
|
||||||
|
```
|
||||||
|
|
||||||
|
**Test 7: Prism Console**
|
||||||
|
```bash
|
||||||
|
# Open in browser
|
||||||
|
open https://console.blackroad.systems/prism
|
||||||
|
|
||||||
|
# Should see dark admin interface
|
||||||
|
# Check for:
|
||||||
|
# - Navigation tabs (Overview, Jobs, Agents, Logs, System)
|
||||||
|
# - Metrics cards
|
||||||
|
# - System status
|
||||||
|
```
|
||||||
|
|
||||||
|
**Test 8: API Documentation**
|
||||||
|
```bash
|
||||||
|
# Open in browser
|
||||||
|
open https://api.blackroad.systems/api/docs
|
||||||
|
|
||||||
|
# Should see Swagger UI
|
||||||
|
# Check for:
|
||||||
|
# - All routers listed
|
||||||
|
# - Endpoints grouped by tags
|
||||||
|
# - Try out authentication endpoints
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Monitoring & Health
|
||||||
|
|
||||||
|
### Health Endpoints
|
||||||
|
|
||||||
|
**Backend:**
|
||||||
|
- Basic health: `GET /health`
|
||||||
|
- Version info: `GET /version`
|
||||||
|
- API health: `GET /api/health/summary`
|
||||||
|
- Full API health: `GET /api/health/all`
|
||||||
|
- Individual API: `GET /api/health/{api_name}`
|
||||||
|
|
||||||
|
**Operator:**
|
||||||
|
- Basic health: `GET /health`
|
||||||
|
- Version info: `GET /version`
|
||||||
|
- Scheduler status: `GET /scheduler/status`
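
Right after a deploy, a service may need a few seconds before `/health` answers. A minimal sketch of polling with exponential backoff is shown below; `wait_until_healthy` is a hypothetical helper, `probe` is any zero-argument callable that returns `True` once the endpoint is healthy, and the default attempt count and delay are arbitrary assumptions.

```python
import time


def wait_until_healthy(probe, attempts=5, delay=2.0, sleep=time.sleep):
    """Call `probe()` until it returns True, doubling the wait between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        # Skip the sleep after the final attempt; there is nothing left to wait for.
        if attempt < attempts - 1:
            sleep(delay * (2 ** attempt))
    return False
```

The `sleep` parameter is injectable so the retry schedule can be checked without actually waiting.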
|
||||||
|
|
||||||
|
### Railway Monitoring
|
||||||
|
|
||||||
|
**View Metrics:**
|
||||||
|
1. Go to Railway dashboard
|
||||||
|
2. Select your project
|
||||||
|
3. Click on a service
|
||||||
|
4. View **Metrics** tab:
|
||||||
|
- CPU usage
|
||||||
|
- Memory usage
|
||||||
|
- Network traffic
|
||||||
|
- Request count
|
||||||
|
|
||||||
|
**View Logs:**
|
||||||
|
```bash
|
||||||
|
# Via CLI
|
||||||
|
railway logs -s blackroad-backend
|
||||||
|
railway logs -s blackroad-operator --tail
|
||||||
|
|
||||||
|
# Via Dashboard
|
||||||
|
# Go to service > Deployments > View Logs
|
||||||
|
```
|
||||||
|
|
||||||
|
### Cloudflare Analytics
|
||||||
|
|
||||||
|
1. Login to Cloudflare
|
||||||
|
2. Select `blackroad.systems`
|
||||||
|
3. Go to **Analytics** tab
|
||||||
|
4. Monitor:
|
||||||
|
- Requests
|
||||||
|
- Bandwidth
|
||||||
|
- Unique visitors
|
||||||
|
- Threats blocked
|
||||||
|
- Cache performance
|
||||||
|
|
||||||
|
### Set Up Alerts
|
||||||
|
|
||||||
|
**Railway Alerts:**
|
||||||
|
- Go to Project Settings > Notifications
|
||||||
|
- Enable alerts for:
|
||||||
|
- Deployment failures
|
||||||
|
- High CPU/Memory usage
|
||||||
|
- Service crashes
|
||||||
|
|
||||||
|
**Cloudflare Health Checks:**
|
||||||
|
- See: `infra/DNS_CLOUDFLARE_PLAN.md`
|
||||||
|
- Configure health checks for both services
|
||||||
|
- Get email alerts on downtime
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Issue: Deployment failed
|
||||||
|
|
||||||
|
**Check:**
|
||||||
|
```bash
|
||||||
|
# View recent logs
|
||||||
|
railway logs -s blackroad-backend --tail
|
||||||
|
|
||||||
|
# Check build logs
|
||||||
|
railway logs --deployment <deployment-id>
|
||||||
|
|
||||||
|
# Re-deploy
|
||||||
|
railway up -s blackroad-backend
|
||||||
|
```
|
||||||
|
|
||||||
|
**Common causes:**
|
||||||
|
- Missing environment variables
|
||||||
|
- Database migration failed
|
||||||
|
- Dependency installation failed
|
||||||
|
|
||||||
|
### Issue: Health check returns 502/503
|
||||||
|
|
||||||
|
**Check:**
|
||||||
|
1. Service is running: `railway status`
|
||||||
|
2. Logs for errors: `railway logs -s blackroad-backend`
|
||||||
|
3. Environment variables are set
|
||||||
|
4. Database and Redis are accessible
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```bash
|
||||||
|
# Restart service
|
||||||
|
railway restart -s blackroad-backend
|
||||||
|
|
||||||
|
# Or redeploy
|
||||||
|
railway up -s blackroad-backend
|
||||||
|
```
|
||||||
|
|
||||||
|
### Issue: CORS errors in browser
|
||||||
|
|
||||||
|
**Check:**
|
||||||
|
1. `ALLOWED_ORIGINS` environment variable includes your domain
|
||||||
|
2. Cloudflare SSL mode is **Full (strict)**
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```bash
|
||||||
|
# Update allowed origins
|
||||||
|
railway variables set ALLOWED_ORIGINS="https://blackroad.systems,https://api.blackroad.systems,https://os.blackroad.systems" -s blackroad-backend
|
||||||
|
|
||||||
|
# Restart service
|
||||||
|
railway restart -s blackroad-backend
|
||||||
|
```
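
CORS origins must match exactly, including the scheme, so a stray space or trailing slash in `ALLOWED_ORIGINS` can be enough to cause a rejection. The sketch below shows one way to sanity-check a value before setting it. How the backend actually parses this variable is not shown in this guide, so the comma-splitting and normalization here are assumptions, and both function names are hypothetical.

```python
def parse_allowed_origins(raw):
    """Split a comma-separated ALLOWED_ORIGINS value into a clean list."""
    return [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]


def origin_allowed(origin, raw):
    """Return True if `origin` exactly matches one of the configured origins."""
    return origin.rstrip("/") in parse_allowed_origins(raw)
```

For example, `origin_allowed("http://os.blackroad.systems", ...)` is false when only the `https://` origin is configured.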
|
||||||
|
|
||||||
|
### Issue: DNS not resolving
|
||||||
|
|
||||||
|
**Check:**
|
||||||
|
```bash
|
||||||
|
# Check DNS propagation
|
||||||
|
dig api.blackroad.systems
|
||||||
|
nslookup api.blackroad.systems
|
||||||
|
|
||||||
|
# Or use online tool
|
||||||
|
# https://dnschecker.org
|
||||||
|
```
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
1. Wait up to 24 hours for global propagation
|
||||||
|
2. Flush local DNS cache
|
||||||
|
3. Verify CNAME records in Cloudflare
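
To check all subdomains in one pass instead of running `dig` for each, a small standard-library sketch could look like this. The subdomain list is taken from the CNAME records in this guide; the function names are hypothetical, and the result only reflects your local resolver, not global propagation.

```python
import socket

SUBDOMAINS = ["api", "core", "operator", "console", "docs", "web", "os", "www"]


def hostnames(domain="blackroad.systems", subdomains=SUBDOMAINS):
    """Return the root domain followed by each fully qualified subdomain."""
    return [domain] + [f"{sub}.{domain}" for sub in subdomains]


def resolves(host):
    """Return True if the local resolver can look up `host`."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False


def unresolved(hosts, lookup=resolves):
    """Return the hosts that fail to resolve."""
    return [h for h in hosts if not lookup(h)]
```

`unresolved(hostnames())` returns an empty list once every record is visible from your machine.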
|
||||||
|
|
||||||
|
### Issue: 500 Internal Server Error
|
||||||
|
|
||||||
|
**Check:**
|
||||||
|
1. Railway logs: `railway logs -s blackroad-backend`
|
||||||
|
2. Database connectivity
|
||||||
|
3. Missing secrets
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```bash
|
||||||
|
# Check database connection
|
||||||
|
railway run -s blackroad-backend -- python -c "from app.database import async_engine; print('DB OK')"
|
||||||
|
|
||||||
|
# Verify all required env vars are set
|
||||||
|
railway variables -s blackroad-backend
|
||||||
|
```
|
||||||
|
|
||||||
|
### Issue: API endpoints return 404
|
||||||
|
|
||||||
|
**Check:**
|
||||||
|
1. Correct URL path
|
||||||
|
2. Service is deployed correctly
|
||||||
|
3. Routes are registered in `main.py`
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```bash
|
||||||
|
# Check API documentation for correct paths
|
||||||
|
open https://api.blackroad.systems/api/docs
|
||||||
|
|
||||||
|
# Verify routes
|
||||||
|
railway logs -s blackroad-backend | grep "Application startup complete"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
BlackRoad OS is considered **fully operational** when:
|
||||||
|
|
||||||
|
- ✅ Backend service deploys without errors
|
||||||
|
- ✅ Operator service deploys without errors
|
||||||
|
- ✅ All health endpoints return 200 OK
|
||||||
|
- ✅ Frontend UI loads and is interactive
|
||||||
|
- ✅ Prism Console loads and shows metrics
|
||||||
|
- ✅ API documentation is accessible
|
||||||
|
- ✅ DNS resolves correctly for all subdomains
|
||||||
|
- ✅ SSL certificates are valid
|
||||||
|
- ✅ All smoke tests pass
|
||||||
|
- ✅ No errors in Railway logs
|
||||||
|
- ✅ Cloudflare proxy is active (orange cloud)
|
||||||
|
|
||||||
|
**Alexa's Victory Condition:**
|
||||||
|
|
||||||
|
> Visit https://os.blackroad.systems and see the Windows 95 desktop.
|
||||||
|
> Click around, open apps, and everything just works.
|
||||||
|
> No Git, no PRs, no manual config.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Reference Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Deploy to Railway
|
||||||
|
railway up
|
||||||
|
|
||||||
|
# View status
|
||||||
|
railway status
|
||||||
|
|
||||||
|
# View logs
|
||||||
|
railway logs -s blackroad-backend --tail
|
||||||
|
|
||||||
|
# Set environment variable
|
||||||
|
railway variables set SECRET_KEY="xxx" -s blackroad-backend
|
||||||
|
|
||||||
|
# Restart service
|
||||||
|
railway restart -s blackroad-backend
|
||||||
|
|
||||||
|
# Run smoke tests
|
||||||
|
./scripts/smoke-test.sh
|
||||||
|
|
||||||
|
# View Railway dashboard
|
||||||
|
railway open
|
||||||
|
|
||||||
|
# Open a local subshell with the service's Railway variables loaded (for debugging; this is not SSH)
|
||||||
|
railway shell -s blackroad-backend
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next Steps After Deployment
|
||||||
|
|
||||||
|
1. **Configure monitoring:**
|
||||||
|
- Set up Cloudflare health checks
|
||||||
|
- Enable Railway alerts
|
||||||
|
- Set up external uptime monitoring
|
||||||
|
|
||||||
|
2. **Set up CI/CD:**
|
||||||
|
- GitHub Actions already configured
|
||||||
|
- Ensure `RAILWAY_TOKEN` secret is set in GitHub
|
||||||
|
|
||||||
|
3. **Configure webhooks:**
|
||||||
|
- Set up GitHub webhooks for operator
|
||||||
|
- Configure Stripe webhooks (if using payments)
|
||||||
|
|
||||||
|
4. **Performance tuning:**
|
||||||
|
- Monitor Railway metrics
|
||||||
|
- Adjust replica count if needed
|
||||||
|
- Configure Cloudflare caching rules
|
||||||
|
|
||||||
|
5. **Security hardening:**
|
||||||
|
- Enable Cloudflare WAF
|
||||||
|
- Set up rate limiting
|
||||||
|
- Configure IP whitelisting for `/prism` (optional)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Document Version:** 1.0
|
||||||
|
**Last Updated:** 2025-11-19
|
||||||
|
**Maintained by:** Atlas (AI Infrastructure Engineer)
|
||||||
|
**For:** Alexa Louise (Cadillac), Founder & Operator
|
||||||
@@ -230,6 +230,24 @@ async def health_check():
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# Version info
@app.get("/version")
async def version_info():
    """Version information endpoint"""
    import platform
    from datetime import datetime, timezone

    return {
        "service": "blackroad-core",
        "version": settings.APP_VERSION,
        "environment": settings.ENVIRONMENT,
        "commit": os.getenv("GIT_COMMIT", "unknown"),
        "built_at": os.getenv("BUILD_TIMESTAMP", datetime.now(timezone.utc).isoformat()),
        "python_version": platform.python_version(),
        "platform": platform.system(),
    }
|
||||||
|
|
||||||
|
|
||||||
# API info
|
# API info
|
||||||
@app.get("/api")
|
@app.get("/api")
|
||||||
async def api_info():
|
async def api_info():
|
||||||
|
|||||||
355
infra/DNS_CLOUDFLARE_PLAN.md
Normal file
@@ -0,0 +1,355 @@
|
|||||||
|
# Cloudflare DNS Configuration Plan
|
||||||
|
|
||||||
|
**Version:** 1.0
|
||||||
|
**Last Updated:** 2025-11-19
|
||||||
|
**Owner:** Alexa Louise (Cadillac)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
This document provides the DNS configuration plan for routing BlackRoad OS services through Cloudflare. All services are deployed to Railway and fronted by Cloudflare for CDN, DDoS protection, and SSL termination.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## DNS Records
|
||||||
|
|
||||||
|
Configure the following DNS records in the Cloudflare dashboard for `blackroad.systems`:
|
||||||
|
|
||||||
|
| Subdomain | Type | Target (Railway URL) | Proxy | SSL/TLS Mode | Purpose |
|-----------|------|----------------------|-------|--------------|---------|
| `@` (root) | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Main website redirect |
| `www` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | WWW redirect |
| `api` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Public API Gateway |
| `core` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Core Backend (alias) |
| `operator` | CNAME | `<operator-railway>.up.railway.app` | ☁️ ON | Full (strict) | Operator Engine |
| `console` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Prism Console (served at /prism) |
| `docs` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Documentation site |
| `web` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Web/Frontend OS UI |
| `os` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Operating System UI |
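
The table above can also be kept as data, which makes it easier to review or to feed into a provisioning script. The following is a minimal sketch and not part of the repository: the `build_records` helper is hypothetical, and the two targets are the same placeholders used in the table, to be replaced with the real Railway hostnames once they are known.

```python
# Hypothetical helper: expands the subdomain table into record dicts.
BACKEND_TARGET = "<backend-railway>.up.railway.app"    # placeholder from the table
OPERATOR_TARGET = "<operator-railway>.up.railway.app"  # placeholder from the table

# Subdomains served by the backend; only `operator` points elsewhere.
BACKEND_SUBDOMAINS = ["@", "www", "api", "core", "console", "docs", "web", "os"]


def build_records(zone="blackroad.systems"):
    """Return one proxied CNAME record per row of the DNS table."""
    records = []
    for name in BACKEND_SUBDOMAINS:
        records.append(
            {"type": "CNAME", "name": name, "content": BACKEND_TARGET, "proxied": True}
        )
    records.append(
        {"type": "CNAME", "name": "operator", "content": OPERATOR_TARGET, "proxied": True}
    )
    for record in records:
        # Fully qualified name, handy when comparing against resolver output.
        record["fqdn"] = zone if record["name"] == "@" else f"{record['name']}.{zone}"
    return records


if __name__ == "__main__":
    for r in build_records():
        print(f"{r['fqdn']:32} CNAME {r['content']}")
```

Keeping the backend subdomains in one list makes it obvious that eight of the nine names resolve to the same service.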
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Service Mapping
|
||||||
|
|
||||||
|
### Backend Service (Railway)
|
||||||
|
|
||||||
|
**Railway Service Name:** `blackroad-backend`
|
||||||
|
**Railway URL:** `<TBD>-production.up.railway.app`
|
||||||
|
**Internal Port:** 8000
|
||||||
|
|
||||||
|
**Serves:**
|
||||||
|
- `/` → Frontend OS UI (backend/static/index.html)
|
||||||
|
- `/prism` → Prism Console UI
|
||||||
|
- `/api/*` → All API endpoints
|
||||||
|
- `/health` → Health check
|
||||||
|
- `/version` → Version info
|
||||||
|
- `/api/health/summary` → Comprehensive API health
|
||||||
|
|
||||||
|
**Routed via:**
|
||||||
|
- `api.blackroad.systems`
|
||||||
|
- `core.blackroad.systems`
|
||||||
|
- `console.blackroad.systems`
|
||||||
|
- `docs.blackroad.systems`
|
||||||
|
- `web.blackroad.systems`
|
||||||
|
- `os.blackroad.systems`
|
||||||
|
- `blackroad.systems` (root)
|
||||||
|
- `www.blackroad.systems`
|
||||||
|
|
||||||
|
### Operator Service (Railway)
|
||||||
|
|
||||||
|
**Railway Service Name:** `blackroad-operator`
|
||||||
|
**Railway URL:** `<TBD>-production.up.railway.app`
|
||||||
|
**Internal Port:** 8001
|
||||||
|
|
||||||
|
**Serves:**
|
||||||
|
- `/health` → Health check
|
||||||
|
- `/version` → Version info
|
||||||
|
- `/jobs` → Job list
|
||||||
|
- `/jobs/{id}` → Job details
|
||||||
|
- `/scheduler/status` → Scheduler status
|
||||||
|
|
||||||
|
**Routed via:**
|
||||||
|
- `operator.blackroad.systems`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## How to Configure
|
||||||
|
|
||||||
|
### Step 1: Get Railway URLs
|
||||||
|
|
||||||
|
After deploying to Railway, retrieve the production URLs:
|
||||||
|
|
||||||
|
```bash
# Install Railway CLI
curl -fsSL https://railway.app/install.sh | sh

# Login
railway login

# Link project
railway link <project-id>

# Get service URLs
railway status

# Or check Railway dashboard:
# https://railway.app/dashboard
```
|
||||||
|
|
||||||
|
You should see URLs like:
|
||||||
|
- `blackroad-backend-production.up.railway.app`
|
||||||
|
- `blackroad-operator-production.up.railway.app`
|
||||||
|
|
||||||
|
### Step 2: Add DNS Records in Cloudflare
|
||||||
|
|
||||||
|
1. **Login to Cloudflare:** https://dash.cloudflare.com
|
||||||
|
2. **Select domain:** `blackroad.systems`
|
||||||
|
3. **Go to DNS tab**
|
||||||
|
4. **Add CNAME records** per table above
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
```
Type: CNAME
Name: api
Target: blackroad-backend-production.up.railway.app
Proxy status: Proxied (orange cloud)
TTL: Auto
```
|
||||||
|
|
||||||
|
### Step 3: Configure SSL/TLS
|
||||||
|
|
||||||
|
1. Go to **SSL/TLS** tab in Cloudflare
|
||||||
|
2. Set encryption mode to **Full (strict)**
|
||||||
|
3. Enable **Always Use HTTPS**
|
||||||
|
4. Enable **Automatic HTTPS Rewrites**
|
||||||
|
|
||||||
|
### Step 4: Configure Page Rules (Optional)
|
||||||
|
|
||||||
|
Add page rules for better routing:
|
||||||
|
|
||||||
|
1. **Force HTTPS:**
|
||||||
|
- URL: `http://*blackroad.systems/*`
|
||||||
|
- Setting: Always Use HTTPS
|
||||||
|
|
||||||
|
2. **Cache API responses (optional):**
|
||||||
|
- URL: `api.blackroad.systems/api/*`
|
||||||
|
- Settings:
|
||||||
|
- Cache Level: Standard
|
||||||
|
- Edge Cache TTL: 2 hours
|
||||||
|
- Browser Cache TTL: 30 minutes
|
||||||
|
|
||||||
|
3. **No cache for dynamic endpoints:**
|
||||||
|
- URL: `api.blackroad.systems/api/auth/*`
|
||||||
|
- Setting: Cache Level: Bypass
|
||||||
|
|
||||||
|
### Step 5: Verify Configuration
|
||||||
|
|
||||||
|
Test each endpoint:
|
||||||
|
|
||||||
|
```bash
# Backend health
curl -i https://api.blackroad.systems/health

# Backend version
curl -i https://api.blackroad.systems/version

# API health summary
curl -i https://api.blackroad.systems/api/health/summary

# Operator health
curl -i https://operator.blackroad.systems/health

# Operator version
curl -i https://operator.blackroad.systems/version

# Frontend (`open` is macOS-specific; use `xdg-open` on Linux)
open https://os.blackroad.systems

# Prism Console
open https://console.blackroad.systems/prism
```
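
The same checks can be scripted when a pass/fail summary is more useful than raw `curl` output. This is a rough sketch using only the Python standard library; the `http_status` and `run_checks` names are hypothetical, and the URLs are the planned domains from this document, which will only respond once DNS is configured.

```python
import urllib.error
import urllib.request

# Paths taken from the curl commands above; base URLs are the planned domains.
CHECKS = [
    ("https://api.blackroad.systems", "/health"),
    ("https://api.blackroad.systems", "/version"),
    ("https://api.blackroad.systems", "/api/health/summary"),
    ("https://operator.blackroad.systems", "/health"),
    ("https://operator.blackroad.systems", "/version"),
]


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, ...


def run_checks(checks=CHECKS, fetch=http_status):
    """Return a list of (url, status, ok) tuples; ok means HTTP 200."""
    results = []
    for base, path in checks:
        url = base + path
        status = fetch(url)
        results.append((url, status, status == 200))
    return results


if __name__ == "__main__":
    for url, status, ok in run_checks():
        print(f"{'PASS' if ok else 'FAIL'} {status} {url}")
```

The `fetch` parameter is injectable so the loop can be exercised without network access, for example in CI before the domains exist.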
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Health Check Configuration
|
||||||
|
|
||||||
|
Configure Cloudflare Health Checks for uptime monitoring:
|
||||||
|
|
||||||
|
### Backend Health Check
|
||||||
|
|
||||||
|
- **Name:** BlackRoad Backend
|
||||||
|
- **URL:** `https://api.blackroad.systems/health`
|
||||||
|
- **Interval:** 60 seconds
|
||||||
|
- **Retries:** 2
|
||||||
|
- **Expected codes:** 200
|
||||||
|
- **Notification:** Email on failure
|
||||||
|
|
||||||
|
### Operator Health Check
|
||||||
|
|
||||||
|
- **Name:** BlackRoad Operator
|
||||||
|
- **URL:** `https://operator.blackroad.systems/health`
|
||||||
|
- **Interval:** 60 seconds
|
||||||
|
- **Retries:** 2
|
||||||
|
- **Expected codes:** 200
|
||||||
|
- **Notification:** Email on failure
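
To make the `Retries` and `Expected codes` settings concrete, here is a small sketch of one plausible reading of them: a probe is attempted once and then retried up to two more times, and the origin counts as unhealthy only if every attempt fails. This illustrates the idea only; it is not a description of how Cloudflare implements health checks, and `probe_with_retries` is a hypothetical name.

```python
def probe_with_retries(probe, retries=2, expected_codes=(200,)):
    """Call probe() once plus up to `retries` more times.

    Returns (healthy, attempts). The origin is healthy as soon as one
    attempt returns a status code listed in expected_codes.
    """
    attempts = 0
    for _ in range(1 + retries):
        attempts += 1
        try:
            if probe() in expected_codes:
                return True, attempts
        except OSError:
            pass  # timeouts and connection errors count as a failed attempt
    return False, attempts
```

With `retries=2`, a single transient 5xx response does not trigger the failure notification; three consecutive failures do.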
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Firewall Rules
|
||||||
|
|
||||||
|
Add Cloudflare firewall rules to protect your services:
|
||||||
|
|
||||||
|
### 1. Rate Limiting (Recommended)
|
||||||
|
|
||||||
|
- **URL:** `api.blackroad.systems/api/*`
|
||||||
|
- **Rule:** Block if rate > 100 requests/minute from same IP
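
The rule "100 requests/minute from same IP" can be read as a per-client sliding window. The sketch below shows that semantics in plain Python, for example as an application-level fallback or for testing expectations locally. It is an assumption-laden illustration: `SlidingWindowLimiter` is a hypothetical class, and Cloudflare's own counting method may differ.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client IP."""

    def __init__(self, limit=100, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip):
        """Record a request from ip and return False if it exceeds the limit."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Each IP has its own queue of timestamps, so one noisy client does not consume another client's budget.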
|
||||||
|
|
||||||
|
### 2. Bot Protection
|
||||||
|
|
||||||
|
- **URL:** `*blackroad.systems/*`
|
||||||
|
- **Rule:** Challenge known bots, block malicious bots
|
||||||
|
|
||||||
|
### 3. Geo-blocking (Optional)
|
||||||
|
|
||||||
|
- **URL:** `*blackroad.systems/*`
|
||||||
|
- **Rule:** Block countries with high spam rates (if desired)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Issue: DNS not resolving
|
||||||
|
|
||||||
|
**Solution:**
|
||||||
|
1. Check DNS propagation: https://dnschecker.org
|
||||||
|
2. Wait up to 24 hours for global propagation
|
||||||
|
3. Flush local DNS (macOS): `sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder`
|
||||||
|
|
||||||
|
### Issue: 521 Error (Web server is down)
|
||||||
|
|
||||||
|
**Solution:**
|
||||||
|
1. Check Railway service status
|
||||||
|
2. Verify health endpoint: `curl <railway-url>/health`
|
||||||
|
3. Check Railway logs: `railway logs`
|
||||||
|
|
||||||
|
### Issue: 525 Error (SSL handshake failed)
|
||||||
|
|
||||||
|
**Solution:**
|
||||||
|
1. Ensure Railway has valid SSL certificate
|
||||||
|
2. Set Cloudflare SSL/TLS to **Full (strict)**
|
||||||
|
3. Check Railway environment variables
|
||||||
|
|
||||||
|
### Issue: CORS errors
|
||||||
|
|
||||||
|
**Solution:**
|
||||||
|
1. Ensure `ALLOWED_ORIGINS` in backend `.env` includes Cloudflare domains
|
||||||
|
2. Add every origin that serves the UI (for example `https://os.blackroad.systems`), not only `https://api.blackroad.systems`, to the allowed origins
|
||||||
|
3. Restart backend service
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
Ensure these environment variables are set in Railway for both services:
|
||||||
|
|
||||||
|
### Backend Service
|
||||||
|
|
||||||
|
```bash
ENVIRONMENT=production
DEBUG=False
ALLOWED_ORIGINS=https://blackroad.systems,https://www.blackroad.systems,https://os.blackroad.systems,https://api.blackroad.systems,https://console.blackroad.systems,https://docs.blackroad.systems,https://web.blackroad.systems,https://core.blackroad.systems
DATABASE_URL=<postgres-connection-string>
REDIS_URL=<redis-connection-string>
SECRET_KEY=<your-secret-key>
# ... other secrets
```
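
`ALLOWED_ORIGINS` is a single comma-separated string, so stray spaces or trailing slashes are an easy source of CORS failures. How the backend actually parses this variable is not shown in this change; the following is a hedged sketch of one defensive way to do it, with hypothetical helper names.

```python
import os


def parse_allowed_origins(raw):
    """Split a comma-separated origin list, trimming whitespace and trailing slashes."""
    origins = []
    for item in raw.split(","):
        origin = item.strip().rstrip("/")
        if origin and origin not in origins:  # skip empty entries and duplicates
            origins.append(origin)
    return origins


def is_allowed(origin, allowed):
    """Exact-match check of a request's Origin value against the parsed list."""
    return origin.rstrip("/") in allowed


if __name__ == "__main__":
    allowed = parse_allowed_origins(os.getenv("ALLOWED_ORIGINS", ""))
    print(f"{len(allowed)} origins configured")
```

A parsed list like this could then be passed to whatever CORS layer the backend uses.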
|
||||||
|
|
||||||
|
### Operator Service
|
||||||
|
|
||||||
|
```bash
ENVIRONMENT=production
GITHUB_TOKEN=<github-pat>
GITHUB_WEBHOOK_SECRET=<webhook-secret>
# ... other secrets
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Monitoring & Alerts
|
||||||
|
|
||||||
|
### Cloudflare Analytics
|
||||||
|
|
||||||
|
Monitor:
|
||||||
|
- **Requests:** Total requests, unique visitors
|
||||||
|
- **Bandwidth:** Data transfer
|
||||||
|
- **Threats:** Blocked requests
|
||||||
|
- **Performance:** Cache hit ratio
|
||||||
|
|
||||||
|
### Railway Metrics
|
||||||
|
|
||||||
|
Monitor:
|
||||||
|
- **CPU usage:** Should stay < 80%
|
||||||
|
- **Memory usage:** Should stay < 90%
|
||||||
|
- **Request latency:** p50, p95, p99
|
||||||
|
- **Error rate:** Should stay < 1%
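
The thresholds listed above can be expressed as a simple check. This sketch assumes the metrics have already been collected into a dict with the hypothetical keys shown; it does not call any Railway API.

```python
# Limits from the list above; the key names are illustrative assumptions.
THRESHOLDS = {
    "cpu_percent": 80.0,
    "memory_percent": 90.0,
    "error_rate_percent": 1.0,
}


def breached(metrics, thresholds=THRESHOLDS):
    """Return the sorted names of metrics that have reached or passed their limit."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if metrics.get(name, 0.0) >= limit
    )
```

A missing metric is treated as zero here, so absent data never raises an alert; a stricter monitor might flag missing values instead.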
|
||||||
|
|
||||||
|
### Uptime Monitoring (External)
|
||||||
|
|
||||||
|
Use a third-party service for independent monitoring:
|
||||||
|
- UptimeRobot: https://uptimerobot.com
|
||||||
|
- Pingdom: https://pingdom.com
|
||||||
|
- StatusCake: https://www.statuscake.com
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Security Best Practices
|
||||||
|
|
||||||
|
1. **Enable Cloudflare WAF (Web Application Firewall)**
|
||||||
|
- Protects against OWASP Top 10
|
||||||
|
- Managed rulesets for common attacks
|
||||||
|
|
||||||
|
2. **Use Cloudflare Zero Trust (Optional)**
|
||||||
|
- Add authentication layer for admin routes
|
||||||
|
- Restrict `/prism` to authorized IPs
|
||||||
|
|
||||||
|
3. **Enable DDoS Protection**
|
||||||
|
- Already included with Cloudflare proxy
|
||||||
|
- Configure rate limiting per endpoint
|
||||||
|
|
||||||
|
4. **Regular SSL Certificate Rotation**
|
||||||
|
- Railway auto-renews Let's Encrypt certs
|
||||||
|
- Cloudflare Universal SSL auto-renews
|
||||||
|
|
||||||
|
5. **Audit Logs**
|
||||||
|
- Review Cloudflare Security Events weekly
|
||||||
|
- Review Railway deployment logs
|
||||||
|
- Monitor GitHub webhook events
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
1. ✅ Deploy backend to Railway
|
||||||
|
2. ✅ Deploy operator to Railway
|
||||||
|
3. ⏳ Get Railway production URLs
|
||||||
|
4. ⏳ Configure Cloudflare DNS records
|
||||||
|
5. ⏳ Set up health checks
|
||||||
|
6. ⏳ Configure firewall rules
|
||||||
|
7. ⏳ Enable monitoring & alerts
|
||||||
|
8. ⏳ Test all endpoints
|
||||||
|
9. ⏳ Update documentation with actual URLs
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Contact
|
||||||
|
|
||||||
|
**Owner:** Alexa Louise (Cadillac)
|
||||||
|
**Repository:** https://github.com/blackboxprogramming/BlackRoad-Operating-System
|
||||||
|
**Railway Project:** `<TBD>`
|
||||||
|
**Cloudflare Account:** `<TBD>`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Last Updated:** 2025-11-19
|
||||||
|
**Document Version:** 1.0
|
||||||
22
operator_engine/Dockerfile
Normal file
@@ -0,0 +1,22 @@
|
|||||||
|
FROM python:3.11-slim
|
||||||
|
|
||||||
|
WORKDIR /app
|
||||||
|
|
||||||
|
# Copy requirements
|
||||||
|
COPY requirements.txt .
|
||||||
|
|
||||||
|
# Install dependencies
|
||||||
|
RUN pip install --no-cache-dir -r requirements.txt
|
||||||
|
|
||||||
|
# Copy application code
|
||||||
|
COPY . .
|
||||||
|
|
||||||
|
# Expose port
|
||||||
|
EXPOSE 8001
|
||||||
|
|
||||||
|
# Health check
|
||||||
|
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8001/health')"
|
||||||
|
|
||||||
|
# Run the application
|
||||||
|
CMD ["uvicorn", "operator_engine.server:app", "--host", "0.0.0.0", "--port", "8001"]
|
||||||
18
operator_engine/railway.toml
Normal file
@@ -0,0 +1,18 @@
|
|||||||
|
[build]
|
||||||
|
builder = "NIXPACKS"
|
||||||
|
|
||||||
|
[deploy]
|
||||||
|
startCommand = "uvicorn operator_engine.server:app --host 0.0.0.0 --port $PORT"
|
||||||
|
healthcheckPath = "/health"
|
||||||
|
watchPatterns = ["operator_engine/**/*.py"]
|
||||||
|
|
||||||
|
[[services]]
|
||||||
|
name = "blackroad-operator"
|
||||||
|
|
||||||
|
[services.healthcheck]
|
||||||
|
path = "/health"
|
||||||
|
timeout = 10
|
||||||
|
|
||||||
|
[[services.env]]
|
||||||
|
name = "ENVIRONMENT"
|
||||||
|
value = "production"
|
||||||
7
operator_engine/requirements.txt
Normal file
@@ -0,0 +1,7 @@
|
|||||||
|
fastapi==0.104.1
|
||||||
|
uvicorn==0.24.0
|
||||||
|
pydantic==2.5.0
|
||||||
|
pydantic-settings==2.1.0
|
||||||
|
httpx==0.25.2
|
||||||
|
pytest==7.4.3
|
||||||
|
pytest-asyncio==0.21.1
|
||||||
@@ -20,6 +20,24 @@ async def health_check():
|
|||||||
return {"status": "healthy", "version": settings.APP_VERSION}
|
return {"status": "healthy", "version": settings.APP_VERSION}
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/version")
|
||||||
|
async def version_info():
|
||||||
|
"""Version information endpoint"""
|
||||||
|
import platform
|
||||||
|
import os
|
||||||
|
from datetime import datetime
|
||||||
|
|
||||||
|
return {
|
||||||
|
"service": "blackroad-operator",
|
||||||
|
"version": settings.APP_VERSION,
|
||||||
|
"environment": os.getenv("ENVIRONMENT", "development"),
|
||||||
|
"commit": os.getenv("GIT_COMMIT", "unknown"),
|
||||||
|
"built_at": os.getenv("BUILD_TIMESTAMP", datetime.utcnow().isoformat()),
|
||||||
|
"python_version": platform.python_version(),
|
||||||
|
"platform": platform.system(),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
@app.get("/jobs", response_model=List[Dict[str, Any]])
|
@app.get("/jobs", response_model=List[Dict[str, Any]])
|
||||||
async def list_jobs():
|
async def list_jobs():
|
||||||
"""List all jobs in the registry"""
|
"""List all jobs in the registry"""
|
||||||
|
|||||||
11
railway.json
@@ -1,11 +0,0 @@
|
|||||||
{
|
|
||||||
"$schema": "https://railway.app/railway.schema.json",
|
|
||||||
"build": {
|
|
||||||
"builder": "NIXPACKS"
|
|
||||||
},
|
|
||||||
"deploy": {
|
|
||||||
"startCommand": "uvicorn app.main:app --host 0.0.0.0 --port $PORT",
|
|
||||||
"healthcheckPath": "/health",
|
|
||||||
"watchPatterns": ["backend/**", "app/**", "*.py"]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
39
railway.toml
@@ -1,16 +1,20 @@
|
|||||||
[build]
|
# BlackRoad OS Monorepo - Railway Configuration
|
||||||
|
# This configures multiple services from a single repository
|
||||||
|
|
||||||
|
# Backend (Core API)
|
||||||
|
[[services]]
|
||||||
|
name = "blackroad-backend"
|
||||||
|
source = "backend"
|
||||||
|
|
||||||
|
[services.build]
|
||||||
builder = "DOCKERFILE"
|
builder = "DOCKERFILE"
|
||||||
dockerfilePath = "backend/Dockerfile"
|
dockerfilePath = "backend/Dockerfile"
|
||||||
|
|
||||||
[deploy]
|
[services.deploy]
|
||||||
numReplicas = 1
|
numReplicas = 1
|
||||||
sleepApplication = false
|
sleepApplication = false
|
||||||
restartPolicyType = "ON_FAILURE"
|
restartPolicyType = "ON_FAILURE"
|
||||||
restartPolicyMaxRetries = 10
|
restartPolicyMaxRetries = 10
|
||||||
# startCommand is handled by Dockerfile CMD - no need to override
|
|
||||||
|
|
||||||
[[services]]
|
|
||||||
name = "blackroad-backend"
|
|
||||||
|
|
||||||
[services.healthcheck]
|
[services.healthcheck]
|
||||||
path = "/health"
|
path = "/health"
|
||||||
@@ -23,3 +27,26 @@ value = "production"
|
|||||||
[[services.env]]
|
[[services.env]]
|
||||||
name = "DEBUG"
|
name = "DEBUG"
|
||||||
value = "False"
|
value = "False"
|
||||||
|
|
||||||
|
# Operator Engine (Job Scheduler & GitHub Automation)
|
||||||
|
[[services]]
|
||||||
|
name = "blackroad-operator"
|
||||||
|
source = "operator_engine"
|
||||||
|
|
||||||
|
[services.build]
|
||||||
|
builder = "NIXPACKS"
|
||||||
|
|
||||||
|
[services.deploy]
|
||||||
|
startCommand = "uvicorn operator_engine.server:app --host 0.0.0.0 --port $PORT"
|
||||||
|
numReplicas = 1
|
||||||
|
sleepApplication = false
|
||||||
|
restartPolicyType = "ON_FAILURE"
|
||||||
|
restartPolicyMaxRetries = 10
|
||||||
|
|
||||||
|
[services.healthcheck]
|
||||||
|
path = "/health"
|
||||||
|
timeout = 10
|
||||||
|
|
||||||
|
[[services.env]]
|
||||||
|
name = "ENVIRONMENT"
|
||||||
|
value = "production"
|
||||||
|
|||||||
79
scripts/smoke-test.sh
Executable file
@@ -0,0 +1,79 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
# smoke-test.sh - BlackRoad OS Smoke Tests
|
||||||
|
# Run this after deployment to verify all services are operational
|
||||||
|
|
||||||
|
set -e
|
||||||
|
|
||||||
|
# Colors for output
|
||||||
|
RED='\033[0;31m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
YELLOW='\033[1;33m'
|
||||||
|
BLUE='\033[0;34m'
|
||||||
|
NC='\033[0m' # No Color
|
||||||
|
|
||||||
|
# Set your domain or Railway URL
|
||||||
|
BACKEND_URL=${BACKEND_URL:-"https://api.blackroad.systems"}
|
||||||
|
OPERATOR_URL=${OPERATOR_URL:-"https://operator.blackroad.systems"}
|
||||||
|
|
||||||
|
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||||
|
echo -e "${BLUE}🔍 BlackRoad OS Smoke Tests${NC}"
|
||||||
|
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||||
|
echo ""
|
||||||
|
echo -e "Backend URL: ${YELLOW}$BACKEND_URL${NC}"
|
||||||
|
echo -e "Operator URL: ${YELLOW}$OPERATOR_URL${NC}"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
PASSED=0
|
||||||
|
FAILED=0
|
||||||
|
|
||||||
|
# Helper function to run tests
|
||||||
|
test_endpoint() {
    local name=$1
    local url=$2
    local expected_code=${3:-200}

    echo -n "Testing $name... "

    response=$(curl -s -o /dev/null -w "%{http_code}" "$url" 2>/dev/null) || response="000"

    if [ "$response" -eq "$expected_code" ]; then
        echo -e "${GREEN}✅ PASS${NC} (HTTP $response)"
        # Plain assignment: ((PASSED++)) returns status 1 when PASSED is 0,
        # which would abort the script under `set -e`.
        PASSED=$((PASSED + 1))
    else
        echo -e "${RED}❌ FAIL${NC} (HTTP $response, expected $expected_code)"
        FAILED=$((FAILED + 1))
    fi
}
|
||||||
|
|
||||||
|
# Backend Tests
|
||||||
|
echo -e "\n${BLUE}Backend Service Tests:${NC}"
|
||||||
|
test_endpoint "Health Check" "$BACKEND_URL/health"
|
||||||
|
test_endpoint "Version Info" "$BACKEND_URL/version"
|
||||||
|
test_endpoint "API Info" "$BACKEND_URL/api"
|
||||||
|
test_endpoint "API Health Summary" "$BACKEND_URL/api/health/summary"
|
||||||
|
test_endpoint "API Documentation" "$BACKEND_URL/api/docs"
|
||||||
|
test_endpoint "Frontend UI" "$BACKEND_URL/"
|
||||||
|
test_endpoint "Prism Console" "$BACKEND_URL/prism"
|
||||||
|
|
||||||
|
# Operator Tests
|
||||||
|
echo -e "\n${BLUE}Operator Service Tests:${NC}"
|
||||||
|
test_endpoint "Health Check" "$OPERATOR_URL/health"
|
||||||
|
test_endpoint "Version Info" "$OPERATOR_URL/version"
|
||||||
|
test_endpoint "Jobs List" "$OPERATOR_URL/jobs"
|
||||||
|
test_endpoint "Scheduler Status" "$OPERATOR_URL/scheduler/status"
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
echo -e "\n${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||||
|
echo -e "${BLUE}Summary:${NC}"
|
||||||
|
echo -e " ${GREEN}Passed: $PASSED${NC}"
|
||||||
|
echo -e " ${RED}Failed: $FAILED${NC}"
|
||||||
|
|
||||||
|
if [ $FAILED -eq 0 ]; then
|
||||||
|
echo -e "\n${GREEN}✅ All smoke tests passed!${NC}"
|
||||||
|
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||||
|
exit 0
|
||||||
|
else
|
||||||
|
echo -e "\n${RED}❌ Some tests failed. Check logs for details.${NC}"
|
||||||
|
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||