Create master orchestration prompt for BlackRoad-OS (#120)

…nt guides

This implements the "Big Kahuna" master orchestration plan to get
BlackRoad OS fully online and deployable without manual PR management.

## Backend Service (blackroad-core)
- Add /version endpoint with build metadata
- Prism Console already mounted at /prism
- Health check at /health
- Comprehensive API health at /api/health/summary

## Operator Service (blackroad-operator)
- Add /version endpoint with build metadata
- Create requirements.txt for dependencies
- Create Dockerfile for containerization
- Create railway.toml for Railway deployment
- Health check at /health

## Infrastructure
- Consolidate railway.toml for monorepo multi-service deployment
  - Backend service (Dockerfile-based)
  - Operator service (Nixpacks-based)
- Remove conflicting railway.json

## Documentation
- Add DEPLOYMENT_SMOKE_TEST_GUIDE.md
  - Complete deployment instructions (local + Railway)
  - Automated smoke test suite
  - Troubleshooting guide
  - Monitoring & health check setup
- Add infra/DNS_CLOUDFLARE_PLAN.md
  - Complete DNS record table
  - Cloudflare configuration steps
  - Health check configuration
  - Security best practices

## Testing
- Add scripts/smoke-test.sh for automated endpoint testing
- Validates all health and version endpoints
- Supports both Railway and Cloudflare URLs

## Result
Alexa can now:
1. Push to main → GitHub Actions deploys to Railway
2. Configure Cloudflare DNS (one-time setup)
3. Run smoke tests to verify everything works
4. Visit https://os.blackroad.systems and use the OS

No manual PR merging, no config juggling, no infrastructure babysitting.

# Pull Request

## Description

<!-- Provide a brief description of the changes in this PR -->

## Type of Change

<!-- Mark the relevant option with an 'x' -->

- [ ] 📝 Documentation update
- [ ] 🧪 Tests only
- [ ] 🏗️ Scaffolding/stubs
- [ ] ✨ New feature
- [ ] 🐛 Bug fix
- [ ] ♻️ Refactoring
- [ ] ⚙️ Infrastructure/CI
- [ ] 📦 Dependencies update
- [ ] 🔒 Security fix
- [ ] 💥 Breaking change

## Checklist

<!-- Mark completed items with an 'x' -->

- [ ] Code follows the project's style guidelines
- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] New and existing unit tests pass locally with my changes

## Auto-Merge Eligibility

<!-- This section helps determine if this PR qualifies for auto-merge
-->

**Eligible for auto-merge?**
- [ ] Yes - This is a docs-only, tests-only, or small AI-generated PR
- [ ] No - Requires human review

**Reason for auto-merge eligibility:**
- [ ] Docs-only (Tier 1)
- [ ] Tests-only (Tier 2)
- [ ] Scaffolding < 200 lines (Tier 3)
- [ ] AI-generated < 500 lines (Tier 4)
- [ ] Dependency patch/minor (Tier 5)

**If not auto-merge eligible, why?**
- [ ] Breaking change
- [ ] Security-related
- [ ] Infrastructure changes
- [ ] Requires discussion
- [ ] Large PR (> 500 lines)

## Related Issues

<!-- Link to related issues -->

Closes #
Related to #

## Test Plan

<!-- Describe how you tested these changes -->

## Screenshots (if applicable)

<!-- Add screenshots for UI changes -->

---

**Note**: This PR will be automatically labeled based on files changed.
See `GITHUB_AUTOMATION_RULES.md` for details.

If this PR meets auto-merge criteria (see `AUTO_MERGE_POLICY.md`), it
will be automatically approved and merged after checks pass.

For questions about the merge queue system, see `MERGE_QUEUE_PLAN.md`.
This commit is contained in:
Alexa Amundson
2025-11-19 16:22:17 -06:00
committed by GitHub
10 changed files with 1344 additions and 17 deletions


@@ -0,0 +1,794 @@
# BlackRoad OS - Deployment & Smoke Test Guide
**Version:** 1.0
**Last Updated:** 2025-11-19
**For:** Alexa Louise (Cadillac)
---
## Overview
This guide provides **one-click deployment instructions** and **smoke tests** to verify that BlackRoad OS is fully operational across all services.
**Goal:** Alexa can deploy and verify the entire BlackRoad OS stack without touching individual PRs, configs, or manual interventions.
---
## Table of Contents
1. [Architecture Overview](#architecture-overview)
2. [Prerequisites](#prerequisites)
3. [Local Development](#local-development)
4. [Railway Deployment](#railway-deployment)
5. [Cloudflare DNS Setup](#cloudflare-dns-setup)
6. [Smoke Tests](#smoke-tests)
7. [Monitoring & Health](#monitoring--health)
8. [Troubleshooting](#troubleshooting)
---
## Architecture Overview
```
┌─────────────────────────────────────────────────────────┐
│ Cloudflare CDN │
│ (SSL, DDoS, Caching, WAF) │
└────────────────┬────────────────────────────────────────┘
┌────────┴────────┐
│ │
┌────▼────┐ ┌─────▼──────┐
│ Backend │ │ Operator │
│ Service │ │ Engine │
└────┬────┘ └─────┬──────┘
│ │
┌────▼─────────────────▼────┐
│ Railway Platform │
│ - PostgreSQL │
│ - Redis │
│ - Auto-scaling │
└────────────────────────────┘
```
### Services
1. **Backend (Core API)**
- FastAPI application
- Serves: Frontend UI, API endpoints, Prism Console
- Port: 8000
- Health: `/health`
- Version: `/version`
2. **Operator Engine**
- Job scheduler and GitHub automation
- Port: 8001
- Health: `/health`
- Version: `/version`
3. **Frontend (Served by Backend)**
- Windows 95-style OS UI
- Vanilla JavaScript, zero dependencies
- Served at `/` from backend
4. **Prism Console (Served by Backend)**
- Admin interface
- Served at `/prism` from backend
---
## Prerequisites
### Required Tools
```bash
# Git
git --version # Should be 2.x+
# Python (for local testing)
python --version # Should be 3.11+
# Node.js (optional, for SDK development)
node --version # Should be 18+
# Railway CLI (for deployment)
curl -fsSL https://railway.app/install.sh | sh
railway --version
```
### Required Accounts
- ✅ GitHub account (for repo access)
- ✅ Railway account (for deployment)
- ✅ Cloudflare account (for DNS)
### Required Secrets
Ensure you have these secrets ready:
**Backend:**
- `SECRET_KEY` - JWT signing key (generate with `openssl rand -hex 32`)
- `WALLET_MASTER_KEY` - Wallet encryption key (32 chars)
- `DATABASE_URL` - PostgreSQL connection string (Railway provides)
- `REDIS_URL` - Redis connection string (Railway provides)
- `GITHUB_TOKEN` - GitHub PAT for API access
- `OPENAI_API_KEY` - OpenAI API key (optional)
- `STRIPE_SECRET_KEY` - Stripe secret (optional)
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` - AWS S3 (optional)
**Operator:**
- `GITHUB_TOKEN` - GitHub PAT with repo permissions
- `GITHUB_WEBHOOK_SECRET` - Webhook signature secret
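As a rough alternative to `openssl rand -hex 32`, both backend keys could be generated with Python's standard library. This is only a sketch: the helper name `generate_secrets` is hypothetical, and it assumes that `WALLET_MASTER_KEY` accepts 32 hexadecimal characters, which this guide does not state explicitly.

```python
import secrets


def generate_secrets() -> dict[str, str]:
    """Return freshly generated values for the two backend keys (illustrative)."""
    return {
        # 32 random bytes rendered as 64 hex characters, like `openssl rand -hex 32`
        "SECRET_KEY": secrets.token_hex(32),
        # 16 random bytes rendered as 32 hex characters (assumed format)
        "WALLET_MASTER_KEY": secrets.token_hex(16),
    }


if __name__ == "__main__":
    # Print in KEY=value form so the lines can be pasted into a local .env file
    for name, value in generate_secrets().items():
        print(f"{name}={value}")
```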
---
## Local Development
### 1. Clone Repository
```bash
git clone https://github.com/blackboxprogramming/BlackRoad-Operating-System.git
cd BlackRoad-Operating-System
```
### 2. Backend Local Setup
```bash
cd backend
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create .env file
cp .env.example .env
# Edit .env with your local settings
# (Use defaults for local development)
# Run with Docker Compose (recommended)
docker-compose up
# OR run directly
uvicorn app.main:app --reload
```
**Access locally:**
- Frontend: http://localhost:8000
- API Docs: http://localhost:8000/api/docs
- Prism Console: http://localhost:8000/prism
- Health: http://localhost:8000/health
- Version: http://localhost:8000/version
### 3. Operator Local Setup
```bash
# Run from the repo root so that the operator_engine package is importable
# Install dependencies (in a separate venv or reuse the backend's)
pip install -r operator_engine/requirements.txt
# Run operator server
uvicorn operator_engine.server:app --reload --port 8001
```
**Access locally:**
- Health: http://localhost:8001/health
- Version: http://localhost:8001/version
- Jobs: http://localhost:8001/jobs
### 4. Run Tests
```bash
# Backend tests
cd backend
pytest -v
# Or use helper script
cd ..
bash scripts/run_backend_tests.sh
# Operator tests
cd operator_engine
pytest -v
```
---
## Railway Deployment
### Option 1: Automatic Deployment (via GitHub)
**This is the recommended approach.**
1. **Push to main branch:**
```bash
git add .
git commit -m "Deploy BlackRoad OS"
git push origin main
```
2. **GitHub Actions automatically triggers:**
- `.github/workflows/railway-deploy.yml` runs
- Builds and deploys both services to Railway
- Runs health checks
- Sends notifications (if configured)
3. **Monitor deployment:**
- Go to: https://github.com/blackboxprogramming/BlackRoad-Operating-System/actions
- Watch the "Deploy to Railway" workflow
### Option 2: Manual Deployment (via Railway CLI)
**Use this for testing or troubleshooting.**
1. **Install Railway CLI:**
```bash
curl -fsSL https://railway.app/install.sh | sh
```
2. **Login to Railway:**
```bash
railway login
```
3. **Link to your Railway project:**
```bash
# From repo root
railway link
# Select your project from the list
# Or create a new project
```
4. **Deploy services:**
```bash
# Deploy both services (uses railway.toml)
railway up
# Or deploy specific service
railway up -s blackroad-backend
railway up -s blackroad-operator
```
5. **Check deployment status:**
```bash
railway status
# View logs
railway logs -s blackroad-backend
railway logs -s blackroad-operator
```
6. **Get service URLs:**
```bash
# In Railway dashboard or CLI
railway domain
# Should output something like:
# blackroad-backend: blackroad-backend-production.up.railway.app
# blackroad-operator: blackroad-operator-production.up.railway.app
```
### Configure Environment Variables in Railway
**Via Railway Dashboard:**
1. Go to: https://railway.app/dashboard
2. Select your project
3. Click on each service (backend, operator)
4. Go to **Variables** tab
5. Add environment variables from `.env.example`
**Via Railway CLI:**
```bash
# Set a variable for backend service
railway variables set SECRET_KEY="your-secret-key-here" -s blackroad-backend
# Set a variable for operator service
railway variables set GITHUB_TOKEN="ghp_..." -s blackroad-operator
# Or set from .env file
railway variables set -f backend/.env -s blackroad-backend
```
**Required Variables (Backend):**
```bash
SECRET_KEY=<your-secret-32-char-key>
WALLET_MASTER_KEY=<your-wallet-key-32-chars>
ALLOWED_ORIGINS=https://blackroad.systems,https://api.blackroad.systems,https://os.blackroad.systems
ENVIRONMENT=production
DEBUG=False
GITHUB_TOKEN=<your-github-pat>
OPENAI_API_KEY=<optional>
STRIPE_SECRET_KEY=<optional>
AWS_ACCESS_KEY_ID=<optional>
AWS_SECRET_ACCESS_KEY=<optional>
```
**Required Variables (Operator):**
```bash
ENVIRONMENT=production
GITHUB_TOKEN=<your-github-pat>
GITHUB_WEBHOOK_SECRET=<your-webhook-secret>
```
**Note:** `DATABASE_URL` and `REDIS_URL` are automatically provided by Railway when you add PostgreSQL and Redis services.
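Before deploying, it can help to confirm that none of the variables listed above is missing. The following is a minimal sketch, assuming the variable names from this guide; `missing_variables` and the two name tuples are hypothetical and not part of the repository.

```python
import os
from collections.abc import Iterable, Mapping

# Names taken from the "Required Variables" lists above; optional keys are omitted.
BACKEND_REQUIRED = (
    "SECRET_KEY", "WALLET_MASTER_KEY", "ALLOWED_ORIGINS", "ENVIRONMENT",
    "DEBUG", "GITHUB_TOKEN", "DATABASE_URL", "REDIS_URL",
)
OPERATOR_REQUIRED = ("ENVIRONMENT", "GITHUB_TOKEN", "GITHUB_WEBHOOK_SECRET")


def missing_variables(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the required names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real process environment
example_env = {"ENVIRONMENT": "production", "GITHUB_TOKEN": "placeholder"}
print(missing_variables(OPERATOR_REQUIRED, example_env))
```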
---
## Cloudflare DNS Setup
Follow the **[DNS_CLOUDFLARE_PLAN.md](./infra/DNS_CLOUDFLARE_PLAN.md)** document.
**Quick Summary:**
1. Get Railway production URLs (from Railway dashboard or `railway domain`)
2. Login to Cloudflare: https://dash.cloudflare.com
3. Select `blackroad.systems` domain
4. Add CNAME records:
- `api` → `<backend-railway-url>`
- `core` → `<backend-railway-url>`
- `operator` → `<operator-railway-url>`
- `console` → `<backend-railway-url>`
- `docs` → `<backend-railway-url>`
- `web` → `<backend-railway-url>`
- `os` → `<backend-railway-url>`
- `@` (root) → `<backend-railway-url>`
- `www` → `<backend-railway-url>`
5. Enable **Proxy** (orange cloud) for all records
6. Set SSL/TLS to **Full (strict)**
---
## Smoke Tests
Run these tests **after deployment** to ensure everything works.
### Automated Smoke Test Script
```bash
#!/bin/bash
# smoke-test.sh - Run after deployment
set -e
# Set your domain or Railway URL
BACKEND_URL=${BACKEND_URL:-"https://api.blackroad.systems"}
OPERATOR_URL=${OPERATOR_URL:-"https://operator.blackroad.systems"}
echo "🔍 Running BlackRoad OS Smoke Tests..."
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
# Test 1: Backend Health
echo -n "✓ Backend Health Check... "
curl -f -s "$BACKEND_URL/health" > /dev/null && echo "✅ PASS" || echo "❌ FAIL"
# Test 2: Backend Version
echo -n "✓ Backend Version... "
curl -f -s "$BACKEND_URL/version" > /dev/null && echo "✅ PASS" || echo "❌ FAIL"
# Test 3: API Health Summary
echo -n "✓ API Health Summary... "
curl -f -s "$BACKEND_URL/api/health/summary" > /dev/null && echo "✅ PASS" || echo "❌ FAIL"
# Test 4: API Docs
echo -n "✓ API Documentation... "
curl -f -s "$BACKEND_URL/api/docs" > /dev/null && echo "✅ PASS" || echo "❌ FAIL"
# Test 5: Frontend UI
echo -n "✓ Frontend UI... "
curl -f -s "$BACKEND_URL/" > /dev/null && echo "✅ PASS" || echo "❌ FAIL"
# Test 6: Prism Console
echo -n "✓ Prism Console... "
curl -f -s "$BACKEND_URL/prism" > /dev/null && echo "✅ PASS" || echo "❌ FAIL"
# Test 7: Operator Health
echo -n "✓ Operator Health Check... "
curl -f -s "$OPERATOR_URL/health" > /dev/null && echo "✅ PASS" || echo "❌ FAIL"
# Test 8: Operator Version
echo -n "✓ Operator Version... "
curl -f -s "$OPERATOR_URL/version" > /dev/null && echo "✅ PASS" || echo "❌ FAIL"
# Test 9: Operator Jobs
echo -n "✓ Operator Jobs List... "
curl -f -s "$OPERATOR_URL/jobs" > /dev/null && echo "✅ PASS" || echo "❌ FAIL"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "✅ All smoke tests complete!"
```
**Save this as `scripts/smoke-test.sh` and run:**
```bash
chmod +x scripts/smoke-test.sh
# Test with Railway URLs
BACKEND_URL=https://<your-backend>.up.railway.app \
OPERATOR_URL=https://<your-operator>.up.railway.app \
./scripts/smoke-test.sh
# Or test with Cloudflare domains
BACKEND_URL=https://api.blackroad.systems \
OPERATOR_URL=https://operator.blackroad.systems \
./scripts/smoke-test.sh
```
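The shell script above prints PASS or FAIL for each endpoint, but because every check ends in `|| echo`, it still finishes with the success banner and exit status 0 when endpoints fail. For CI use, a variant that reports the number of failures may be more practical. The sketch below uses only the Python standard library; `check`, `run_smoke_tests`, and the two path lists are hypothetical names, and the paths are the ones listed in this guide. Unlike `curl -f` without `-L`, `urllib` follows redirects before judging the status.

```python
import http.client
import urllib.error
import urllib.request

BACKEND_PATHS = ["/health", "/version", "/api/health/summary", "/api/docs", "/", "/prism"]
OPERATOR_PATHS = ["/health", "/version", "/jobs"]


def check(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET request to `url` ends in a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTP 4xx/5xx, connection problems, timeouts, and malformed URLs all count as failures
        return False


def run_smoke_tests(backend_url: str, operator_url: str) -> int:
    """Check every endpoint, print one line per check, and return the failure count."""
    targets = [backend_url.rstrip("/") + path for path in BACKEND_PATHS]
    targets += [operator_url.rstrip("/") + path for path in OPERATOR_PATHS]
    failures = 0
    for url in targets:
        ok = check(url)
        failures += 0 if ok else 1
        print(f"{'PASS' if ok else 'FAIL'}  {url}")
    return failures
```

A wrapper script could read `BACKEND_URL` and `OPERATOR_URL` from the environment and call `sys.exit(1 if run_smoke_tests(backend, operator) else 0)`, so that a failed check fails the pipeline step.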
### Manual Smoke Tests
**Test 1: Backend Health**
```bash
curl -i https://api.blackroad.systems/health
# Expected:
# HTTP/2 200
# {"status":"healthy","timestamp":1234567890.123}
```
**Test 2: Backend Version**
```bash
curl -i https://api.blackroad.systems/version
# Expected:
# HTTP/2 200
# {
# "service": "blackroad-core",
# "version": "1.0.0",
# "environment": "production",
# "commit": "abc123...",
# "built_at": "2025-11-19T...",
# "python_version": "3.11.x",
# "platform": "Linux"
# }
```
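The expected payload above can also be checked programmatically. A minimal sketch follows: `validate_version_payload` is a hypothetical helper, and the field list is taken from the example response in this guide rather than from a published schema.

```python
import json

EXPECTED_FIELDS = (
    "service", "version", "environment", "commit",
    "built_at", "python_version", "platform",
)


def validate_version_payload(body: str, expected_service: str) -> list[str]:
    """Return a list of problems found in a /version response body (empty if none)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in payload]
    if payload.get("service") != expected_service:
        problems.append(f"unexpected service: {payload.get('service')!r}")
    return problems
```

The same helper would apply to the operator by passing `"blackroad-operator"` as `expected_service`.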
**Test 3: API Health Summary**
```bash
curl -i https://api.blackroad.systems/api/health/summary
# Expected:
# HTTP/2 200
# {
# "status": "healthy" | "degraded" | "unhealthy",
# "summary": {
# "total": 12,
# "connected": 5,
# "not_configured": 7,
# "errors": 0
# },
# "connected_apis": ["github", "openai", ...],
# ...
# }
```
**Test 4: Operator Health**
```bash
curl -i https://operator.blackroad.systems/health
# Expected:
# HTTP/2 200
# {"status":"healthy","version":"0.1.0"}
```
**Test 5: Operator Version**
```bash
curl -i https://operator.blackroad.systems/version
# Expected:
# HTTP/2 200
# {
# "service": "blackroad-operator",
# "version": "0.1.0",
# ...
# }
```
**Test 6: Frontend UI**
```bash
# Open in browser
open https://os.blackroad.systems
# Should see Windows 95-style desktop interface
# Check for:
# - Desktop icons
# - Taskbar at bottom
# - Start menu works
# - Windows can be opened/closed
```
**Test 7: Prism Console**
```bash
# Open in browser
open https://console.blackroad.systems/prism
# Should see dark admin interface
# Check for:
# - Navigation tabs (Overview, Jobs, Agents, Logs, System)
# - Metrics cards
# - System status
```
**Test 8: API Documentation**
```bash
# Open in browser
open https://api.blackroad.systems/api/docs
# Should see Swagger UI
# Check for:
# - All routers listed
# - Endpoints grouped by tags
# - Try out authentication endpoints
```
---
## Monitoring & Health
### Health Endpoints
**Backend:**
- Basic health: `GET /health`
- Version info: `GET /version`
- API health: `GET /api/health/summary`
- Full API health: `GET /api/health/all`
- Individual API: `GET /api/health/{api_name}`
**Operator:**
- Basic health: `GET /health`
- Version info: `GET /version`
- Scheduler status: `GET /scheduler/status`
### Railway Monitoring
**View Metrics:**
1. Go to Railway dashboard
2. Select your project
3. Click on a service
4. View **Metrics** tab:
- CPU usage
- Memory usage
- Network traffic
- Request count
**View Logs:**
```bash
# Via CLI
railway logs -s blackroad-backend
railway logs -s blackroad-operator --tail
# Via Dashboard
# Go to service > Deployments > View Logs
```
### Cloudflare Analytics
1. Login to Cloudflare
2. Select `blackroad.systems`
3. Go to **Analytics** tab
4. Monitor:
- Requests
- Bandwidth
- Unique visitors
- Threats blocked
- Cache performance
### Set Up Alerts
**Railway Alerts:**
- Go to Project Settings > Notifications
- Enable alerts for:
- Deployment failures
- High CPU/Memory usage
- Service crashes
**Cloudflare Health Checks:**
- See: `infra/DNS_CLOUDFLARE_PLAN.md`
- Configure health checks for both services
- Get email alerts on downtime
---
## Troubleshooting
### Issue: Deployment failed
**Check:**
```bash
# View recent logs
railway logs -s blackroad-backend --tail
# Check build logs
railway logs --deployment <deployment-id>
# Re-deploy
railway up -s blackroad-backend
```
**Common causes:**
- Missing environment variables
- Database migration failed
- Dependency installation failed
### Issue: Health check returns 502/503
**Check:**
1. Service is running: `railway status`
2. Logs for errors: `railway logs -s blackroad-backend`
3. Environment variables are set
4. Database and Redis are accessible
**Fix:**
```bash
# Restart service
railway restart -s blackroad-backend
# Or redeploy
railway up -s blackroad-backend
```
### Issue: CORS errors in browser
**Check:**
1. `ALLOWED_ORIGINS` environment variable includes your domain
2. Cloudflare SSL mode is **Full (strict)**
**Fix:**
```bash
# Update allowed origins
railway variables set ALLOWED_ORIGINS="https://blackroad.systems,https://api.blackroad.systems,https://os.blackroad.systems" -s blackroad-backend
# Restart service
railway restart -s blackroad-backend
```
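When debugging CORS, a quick offline check of the configured value can rule out a simple omission. The sketch below assumes the backend treats `ALLOWED_ORIGINS` as a comma-separated list and matches origins exactly; how the backend actually parses the variable is not shown in this guide, and both function names are hypothetical.

```python
def parse_allowed_origins(raw: str) -> set[str]:
    """Split a comma-separated ALLOWED_ORIGINS value, ignoring blanks and padding."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def is_origin_allowed(origin: str, raw_allowed: str) -> bool:
    """Return True if `origin` exactly matches one configured origin."""
    return origin.strip() in parse_allowed_origins(raw_allowed)


configured = (
    "https://blackroad.systems,"
    "https://api.blackroad.systems,"
    "https://os.blackroad.systems"
)
for candidate in ("https://os.blackroad.systems", "https://console.blackroad.systems"):
    print(candidate, is_origin_allowed(candidate, configured))
```

With the three-origin value used in the fix above, `https://console.blackroad.systems` is reported as not allowed, so pages loaded from that subdomain would need it added to the list.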
### Issue: DNS not resolving
**Check:**
```bash
# Check DNS propagation
dig api.blackroad.systems
nslookup api.blackroad.systems
# Or use online tool
# https://dnschecker.org
```
**Fix:**
1. Wait up to 24 hours for global propagation
2. Flush local DNS cache
3. Verify CNAME records in Cloudflare
### Issue: 500 Internal Server Error
**Check:**
1. Railway logs: `railway logs -s blackroad-backend`
2. Database connectivity
3. Missing secrets
**Fix:**
```bash
# Check that the database module imports cleanly (this alone does not open a connection)
railway run -s blackroad-backend -- python -c "from app.database import async_engine; print('DB OK')"
# Verify all required env vars are set
railway variables -s blackroad-backend
```
### Issue: API endpoints return 404
**Check:**
1. Correct URL path
2. Service is deployed correctly
3. Routes are registered in `main.py`
**Fix:**
```bash
# Check API documentation for correct paths
open https://api.blackroad.systems/api/docs
# Confirm the app finished starting up (routes are registered during startup)
railway logs -s blackroad-backend | grep "Application startup complete"
```
---
## Success Criteria
BlackRoad OS is considered **fully operational** when:
- ✅ Backend service deploys without errors
- ✅ Operator service deploys without errors
- ✅ All health endpoints return 200 OK
- ✅ Frontend UI loads and is interactive
- ✅ Prism Console loads and shows metrics
- ✅ API documentation is accessible
- ✅ DNS resolves correctly for all subdomains
- ✅ SSL certificates are valid
- ✅ All smoke tests pass
- ✅ No errors in Railway logs
- ✅ Cloudflare proxy is active (orange cloud)
**Alexa's Victory Condition:**
> Visit https://os.blackroad.systems and see the Windows 95 desktop.
> Click around, open apps, and everything just works.
> No Git, no PRs, no manual config.
---
## Quick Reference Commands
```bash
# Deploy to Railway
railway up
# View status
railway status
# View logs
railway logs -s blackroad-backend --tail
# Set environment variable
railway variables set SECRET_KEY="xxx" -s blackroad-backend
# Restart service
railway restart -s blackroad-backend
# Run smoke tests
./scripts/smoke-test.sh
# View Railway dashboard
railway open
# Open a local subshell with the service's Railway variables loaded (for debugging)
railway shell -s blackroad-backend
```
---
## Next Steps After Deployment
1. **Configure monitoring:**
- Set up Cloudflare health checks
- Enable Railway alerts
- Set up external uptime monitoring
2. **Set up CI/CD:**
- GitHub Actions already configured
- Ensure `RAILWAY_TOKEN` secret is set in GitHub
3. **Configure webhooks:**
- Set up GitHub webhooks for operator
- Configure Stripe webhooks (if using payments)
4. **Performance tuning:**
- Monitor Railway metrics
- Adjust replica count if needed
- Configure Cloudflare caching rules
5. **Security hardening:**
- Enable Cloudflare WAF
- Set up rate limiting
- Configure IP whitelisting for `/prism` (optional)
---
**Document Version:** 1.0
**Last Updated:** 2025-11-19
**Maintained by:** Atlas (AI Infrastructure Engineer)
**For:** Alexa Louise (Cadillac), Founder & Operator


@@ -230,6 +230,24 @@ async def health_check():
     }
+# Version info
+@app.get("/version")
+async def version_info():
+    """Version information endpoint"""
+    import platform
+    from datetime import datetime
+    return {
+        "service": "blackroad-core",
+        "version": settings.APP_VERSION,
+        "environment": settings.ENVIRONMENT,
+        "commit": os.getenv("GIT_COMMIT", "unknown"),
+        "built_at": os.getenv("BUILD_TIMESTAMP", datetime.utcnow().isoformat()),
+        "python_version": platform.python_version(),
+        "platform": platform.system(),
+    }
 # API info
 @app.get("/api")
 async def api_info():


@@ -0,0 +1,355 @@
# Cloudflare DNS Configuration Plan
**Version:** 1.0
**Last Updated:** 2025-11-19
**Owner:** Alexa Louise (Cadillac)
---
## Overview
This document provides the DNS configuration plan for routing BlackRoad OS services through Cloudflare. All services are deployed to Railway and fronted by Cloudflare for CDN, DDoS protection, and SSL termination.
---
## DNS Records
Configure the following DNS records in the Cloudflare dashboard for `blackroad.systems`:
| Subdomain | Type | Target (Railway URL) | Proxy | SSL/TLS Mode | Purpose |
|-----------|------|----------------------|-------|--------------|---------|
| `@` (root) | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Main website redirect |
| `www` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | WWW redirect |
| `api` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Public API Gateway |
| `core` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Core Backend (alias) |
| `operator` | CNAME | `<operator-railway>.up.railway.app` | ☁️ ON | Full (strict) | Operator Engine |
| `console` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Prism Console (served at /prism) |
| `docs` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Documentation site |
| `web` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Web/Frontend OS UI |
| `os` | CNAME | `<backend-railway>.up.railway.app` | ☁️ ON | Full (strict) | Operating System UI |
---
## Service Mapping
### Backend Service (Railway)
**Railway Service Name:** `blackroad-backend`
**Railway URL:** `<TBD>-production.up.railway.app`
**Internal Port:** 8000
**Serves:**
- `/` → Frontend OS UI (backend/static/index.html)
- `/prism` → Prism Console UI
- `/api/*` → All API endpoints
- `/health` → Health check
- `/version` → Version info
- `/api/health/all` → Comprehensive API health
**Routed via:**
- `api.blackroad.systems`
- `core.blackroad.systems`
- `console.blackroad.systems`
- `docs.blackroad.systems`
- `web.blackroad.systems`
- `os.blackroad.systems`
- `blackroad.systems` (root)
- `www.blackroad.systems`
### Operator Service (Railway)
**Railway Service Name:** `blackroad-operator`
**Railway URL:** `<TBD>-production.up.railway.app`
**Internal Port:** 8001
**Serves:**
- `/health` → Health check
- `/version` → Version info
- `/jobs` → Job list
- `/jobs/{id}` → Job details
- `/scheduler/status` → Scheduler status
**Routed via:**
- `operator.blackroad.systems`
---
## How to Configure
### Step 1: Get Railway URLs
After deploying to Railway, retrieve the production URLs:
```bash
# Install Railway CLI
curl -fsSL https://railway.app/install.sh | sh
# Login
railway login
# Link project
railway link <project-id>
# Get service URLs
railway status
# Or check Railway dashboard:
# https://railway.app/dashboard
```
You should see URLs like:
- `blackroad-backend-production.up.railway.app`
- `blackroad-operator-production.up.railway.app`
### Step 2: Add DNS Records in Cloudflare
1. **Login to Cloudflare:** https://dash.cloudflare.com
2. **Select domain:** `blackroad.systems`
3. **Go to DNS tab**
4. **Add CNAME records** per table above
**Example:**
```
Type: CNAME
Name: api
Target: blackroad-backend-production.up.railway.app
Proxy status: Proxied (orange cloud)
TTL: Auto
```
### Step 3: Configure SSL/TLS
1. Go to **SSL/TLS** tab in Cloudflare
2. Set encryption mode to **Full (strict)**
3. Enable **Always Use HTTPS**
4. Enable **Automatic HTTPS Rewrites**
### Step 4: Configure Page Rules (Optional)
Add page rules for better routing:
1. **Force HTTPS:**
- URL: `http://*blackroad.systems/*`
- Setting: Always Use HTTPS
2. **Cache API responses (optional):**
- URL: `api.blackroad.systems/api/*`
- Settings:
- Cache Level: Standard
- Edge Cache TTL: 2 hours
- Browser Cache TTL: 30 minutes
3. **No cache for dynamic endpoints:**
- URL: `api.blackroad.systems/api/auth/*`
- Setting: Cache Level: Bypass
### Step 5: Verify Configuration
Test each endpoint:
```bash
# Backend health
curl -i https://api.blackroad.systems/health
# Backend version
curl -i https://api.blackroad.systems/version
# API health summary
curl -i https://api.blackroad.systems/api/health/summary
# Operator health
curl -i https://operator.blackroad.systems/health
# Operator version
curl -i https://operator.blackroad.systems/version
# Frontend
open https://os.blackroad.systems
# Prism Console
open https://console.blackroad.systems/prism
```
---
## Health Check Configuration
Configure Cloudflare Health Checks for uptime monitoring:
### Backend Health Check
- **Name:** BlackRoad Backend
- **URL:** `https://api.blackroad.systems/health`
- **Interval:** 60 seconds
- **Retries:** 2
- **Expected codes:** 200
- **Notification:** Email on failure
### Operator Health Check
- **Name:** BlackRoad Operator
- **URL:** `https://operator.blackroad.systems/health`
- **Interval:** 60 seconds
- **Retries:** 2
- **Expected codes:** 200
- **Notification:** Email on failure
---
## Firewall Rules
Add Cloudflare firewall rules to protect your services:
### 1. Rate Limiting (Recommended)
- **URL:** `api.blackroad.systems/api/*`
- **Rule:** Block if rate > 100 requests/minute from same IP
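To make the threshold concrete, the rule "block if rate > 100 requests/minute from same IP" can be modeled as a sliding window per client. This is only an illustration of the rule's semantics; it is not how Cloudflare implements rate limiting, and the class name `SlidingWindowLimiter` is hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client IP."""

    def __init__(self, limit: int = 100, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` (seconds) and report whether it is allowed."""
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: blocked requests are not recorded
        hits.append(now)
        return True
```

With the defaults, the first 100 requests from one IP inside any 60-second span are allowed and the 101st is blocked, while other IPs are counted separately.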
### 2. Bot Protection
- **URL:** `*blackroad.systems/*`
- **Rule:** Challenge known bots, block malicious bots
### 3. Geo-blocking (Optional)
- **URL:** `*blackroad.systems/*`
- **Rule:** Block countries with high spam rates (if desired)
---
## Troubleshooting
### Issue: DNS not resolving
**Solution:**
1. Check DNS propagation: https://dnschecker.org
2. Wait up to 24 hours for global propagation
3. Flush local DNS: `sudo dscacheutil -flushcache` (macOS)
### Issue: 521 Error (Web server is down)
**Solution:**
1. Check Railway service status
2. Verify health endpoint: `curl <railway-url>/health`
3. Check Railway logs: `railway logs`
### Issue: 525 Error (SSL handshake failed)
**Solution:**
1. Ensure Railway has valid SSL certificate
2. Set Cloudflare SSL/TLS to **Full (strict)**
3. Check Railway environment variables
### Issue: CORS errors
**Solution:**
1. Ensure `ALLOWED_ORIGINS` in backend `.env` includes Cloudflare domains
2. Add `https://api.blackroad.systems` to allowed origins
3. Restart backend service
---
## Environment Variables
Ensure these environment variables are set in Railway for both services:
### Backend Service
```bash
ENVIRONMENT=production
DEBUG=False
ALLOWED_ORIGINS=https://blackroad.systems,https://www.blackroad.systems,https://os.blackroad.systems,https://api.blackroad.systems,https://console.blackroad.systems,https://docs.blackroad.systems,https://web.blackroad.systems,https://core.blackroad.systems
DATABASE_URL=<postgres-connection-string>
REDIS_URL=<redis-connection-string>
SECRET_KEY=<your-secret-key>
# ... other secrets
```
### Operator Service
```bash
ENVIRONMENT=production
GITHUB_TOKEN=<github-pat>
GITHUB_WEBHOOK_SECRET=<webhook-secret>
# ... other secrets
```
---
## Monitoring & Alerts
### Cloudflare Analytics
Monitor:
- **Requests:** Total requests, unique visitors
- **Bandwidth:** Data transfer
- **Threats:** Blocked requests
- **Performance:** Cache hit ratio
### Railway Metrics
Monitor:
- **CPU usage:** Should stay < 80%
- **Memory usage:** Should stay < 90%
- **Request latency:** p50, p95, p99
- **Error rate:** Should stay < 1%
### Uptime Monitoring (External)
Use a third-party service for independent monitoring:
- UptimeRobot: https://uptimerobot.com
- Pingdom: https://pingdom.com
- StatusCake: https://www.statuscake.com
---
## Security Best Practices
1. **Enable Cloudflare WAF (Web Application Firewall)**
- Protects against OWASP Top 10
- Managed rulesets for common attacks
2. **Use Cloudflare Zero Trust (Optional)**
- Add authentication layer for admin routes
- Restrict `/prism` to authorized IPs
3. **Enable DDoS Protection**
- Already included with Cloudflare proxy
- Configure rate limiting per endpoint
4. **Regular SSL Certificate Rotation**
- Railway auto-renews Let's Encrypt certs
- Cloudflare Universal SSL auto-renews
5. **Audit Logs**
- Review Cloudflare Security Events weekly
- Review Railway deployment logs
- Monitor GitHub webhook events
---
## Next Steps
1. ✅ Deploy backend to Railway
2. ✅ Deploy operator to Railway
3. ⏳ Get Railway production URLs
4. ⏳ Configure Cloudflare DNS records
5. ⏳ Set up health checks
6. ⏳ Configure firewall rules
7. ⏳ Enable monitoring & alerts
8. ⏳ Test all endpoints
9. ⏳ Update documentation with actual URLs
---
## Contact
**Owner:** Alexa Louise (Cadillac)
**Repository:** https://github.com/blackboxprogramming/BlackRoad-Operating-System
**Railway Project:** `<TBD>`
**Cloudflare Account:** `<TBD>`
---
**Last Updated:** 2025-11-19
**Document Version:** 1.0


@@ -0,0 +1,22 @@
FROM python:3.11-slim
WORKDIR /app
# Copy requirements
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port
EXPOSE 8001
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8001/health')"
# Run the application
CMD ["uvicorn", "operator_engine.server:app", "--host", "0.0.0.0", "--port", "8001"]


@@ -0,0 +1,18 @@
[build]
builder = "NIXPACKS"
[deploy]
startCommand = "uvicorn operator_engine.server:app --host 0.0.0.0 --port $PORT"
healthcheckPath = "/health"
watchPatterns = ["operator_engine/**/*.py"]
[[services]]
name = "blackroad-operator"
[services.healthcheck]
path = "/health"
timeout = 10
[[services.env]]
name = "ENVIRONMENT"
value = "production"


@@ -0,0 +1,7 @@
fastapi==0.104.1
uvicorn==0.24.0
pydantic==2.5.0
pydantic-settings==2.1.0
httpx==0.25.2
pytest==7.4.3
pytest-asyncio==0.21.1


@@ -20,6 +20,24 @@ async def health_check():
     return {"status": "healthy", "version": settings.APP_VERSION}
+@app.get("/version")
+async def version_info():
+    """Version information endpoint"""
+    import platform
+    import os
+    from datetime import datetime
+    return {
+        "service": "blackroad-operator",
+        "version": settings.APP_VERSION,
+        "environment": os.getenv("ENVIRONMENT", "development"),
+        "commit": os.getenv("GIT_COMMIT", "unknown"),
+        "built_at": os.getenv("BUILD_TIMESTAMP", datetime.utcnow().isoformat()),
+        "python_version": platform.python_version(),
+        "platform": platform.system(),
+    }
 @app.get("/jobs", response_model=List[Dict[str, Any]])
 async def list_jobs():
     """List all jobs in the registry"""

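One detail of the `/version` handler worth noting: when `BUILD_TIMESTAMP` is unset, `built_at` falls back to the current time, so the value changes on every request. The sketch below shows one possible alternative, not the code in this PR: a hypothetical pure helper, `build_version_payload`, that takes the environment as a mapping and reports `"unknown"` instead. The helper name and the `"unknown"` fallback for `built_at` are assumptions made for illustration.

```python
import platform
from typing import Dict, Mapping


def build_version_payload(env: Mapping[str, str], app_version: str) -> Dict[str, str]:
    """Assemble a /version-style response from an environment mapping.

    Hypothetical helper: it mirrors the fields returned by the handler in
    this diff, but reports "unknown" when BUILD_TIMESTAMP is missing
    instead of substituting the request time.
    """
    return {
        "service": "blackroad-operator",
        "version": app_version,
        "environment": env.get("ENVIRONMENT", "development"),
        "commit": env.get("GIT_COMMIT", "unknown"),
        "built_at": env.get("BUILD_TIMESTAMP", "unknown"),  # assumption: no time fallback
        "python_version": platform.python_version(),
        "platform": platform.system(),
    }
```

Inside the handler this could be called as `build_version_payload(os.environ, settings.APP_VERSION)`. Taking the environment as a parameter keeps the function testable without patching `os.environ`.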
@@ -1,11 +0,0 @@
{
"$schema": "https://railway.app/railway.schema.json",
"build": {
"builder": "NIXPACKS"
},
"deploy": {
"startCommand": "uvicorn app.main:app --host 0.0.0.0 --port $PORT",
"healthcheckPath": "/health",
"watchPatterns": ["backend/**", "app/**", "*.py"]
}
}

@@ -1,16 +1,20 @@
-[build]
+# BlackRoad OS Monorepo - Railway Configuration
+# This configures multiple services from a single repository
+# Backend (Core API)
+[[services]]
+name = "blackroad-backend"
+source = "backend"
+[services.build]
 builder = "DOCKERFILE"
 dockerfilePath = "backend/Dockerfile"
-[deploy]
+[services.deploy]
 numReplicas = 1
 sleepApplication = false
 restartPolicyType = "ON_FAILURE"
 restartPolicyMaxRetries = 10
-# startCommand is handled by Dockerfile CMD - no need to override
-[[services]]
-name = "blackroad-backend"
 [services.healthcheck]
 path = "/health"
@@ -23,3 +27,26 @@ value = "production"
 [[services.env]]
 name = "DEBUG"
 value = "False"
+# Operator Engine (Job Scheduler & GitHub Automation)
+[[services]]
+name = "blackroad-operator"
+source = "operator_engine"
+[services.build]
+builder = "NIXPACKS"
+[services.deploy]
+startCommand = "uvicorn operator_engine.server:app --host 0.0.0.0 --port $PORT"
+numReplicas = 1
+sleepApplication = false
+restartPolicyType = "ON_FAILURE"
+restartPolicyMaxRetries = 10
+[services.healthcheck]
+path = "/health"
+timeout = 10
+[[services.env]]
+name = "ENVIRONMENT"
+value = "production"

scripts/smoke-test.sh Executable file

@@ -0,0 +1,79 @@
#!/bin/bash
# smoke-test.sh - BlackRoad OS Smoke Tests
# Run this after deployment to verify all services are operational
set -e
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Set your domain or Railway URL
BACKEND_URL=${BACKEND_URL:-"https://api.blackroad.systems"}
OPERATOR_URL=${OPERATOR_URL:-"https://operator.blackroad.systems"}
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo -e "${BLUE}🔍 BlackRoad OS Smoke Tests${NC}"
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo ""
echo -e "Backend URL: ${YELLOW}$BACKEND_URL${NC}"
echo -e "Operator URL: ${YELLOW}$OPERATOR_URL${NC}"
echo ""
PASSED=0
FAILED=0
# Helper function to run tests
test_endpoint() {
    local name=$1
    local url=$2
    local expected_code=${3:-200}
    echo -n "Testing $name... "
    response=$(curl -s -o /dev/null -w "%{http_code}" "$url" 2>/dev/null) || response="000"
    if [ "$response" -eq "$expected_code" ]; then
        echo -e "${GREEN}✅ PASS${NC} (HTTP $response)"
        # Use plain assignment: ((PASSED++)) returns a non-zero status when
        # PASSED is 0, which aborts the script under `set -e`.
        PASSED=$((PASSED + 1))
    else
        echo -e "${RED}❌ FAIL${NC} (HTTP $response, expected $expected_code)"
        FAILED=$((FAILED + 1))
    fi
}
# Backend Tests
echo -e "\n${BLUE}Backend Service Tests:${NC}"
test_endpoint "Health Check" "$BACKEND_URL/health"
test_endpoint "Version Info" "$BACKEND_URL/version"
test_endpoint "API Info" "$BACKEND_URL/api"
test_endpoint "API Health Summary" "$BACKEND_URL/api/health/summary"
test_endpoint "API Documentation" "$BACKEND_URL/api/docs"
test_endpoint "Frontend UI" "$BACKEND_URL/"
test_endpoint "Prism Console" "$BACKEND_URL/prism"
# Operator Tests
echo -e "\n${BLUE}Operator Service Tests:${NC}"
test_endpoint "Health Check" "$OPERATOR_URL/health"
test_endpoint "Version Info" "$OPERATOR_URL/version"
test_endpoint "Jobs List" "$OPERATOR_URL/jobs"
test_endpoint "Scheduler Status" "$OPERATOR_URL/scheduler/status"
# Summary
echo -e "\n${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo -e "${BLUE}Summary:${NC}"
echo -e " ${GREEN}Passed: $PASSED${NC}"
echo -e " ${RED}Failed: $FAILED${NC}"
if [ "$FAILED" -eq 0 ]; then
    echo -e "\n${GREEN}✅ All smoke tests passed!${NC}"
    echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
    exit 0
else
    echo -e "\n${RED}❌ Some tests failed. Check logs for details.${NC}"
    echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
    exit 1
fi
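
The same check can be sketched in Python for environments where the service code is already tested with pytest. This is a minimal illustration using only the standard library, not part of the PR: the names `fetch_status` and `run_smoke_tests` are hypothetical. One behavioral difference to keep in mind: `urllib` follows redirects by default, whereas the `curl` call in the script above reports the redirect status itself.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError; the code is still useful.
        return exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return 0


def run_smoke_tests(
    base_url: str,
    paths: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
    expected: int = 200,
) -> List[Tuple[str, int, bool]]:
    """Check each path under base_url; return (path, status, passed) tuples."""
    results = []
    for path in paths:
        url = base_url.rstrip("/") + path
        status = fetch(url)
        results.append((path, status, status == expected))
    return results
```

A call such as `run_smoke_tests("https://operator.blackroad.systems", ["/health", "/version", "/jobs"])` would cover part of the operator checks above. The `fetch` parameter exists so the logic can be exercised with a stub instead of real network calls.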