BlackRoad OS - Railway Deployment Guide

Version: 2.0
Last Updated: 2025-11-20
Author: Atlas (Infrastructure Architect)
Status: Production Guide


Table of Contents

  1. Overview
  2. Prerequisites
  3. Deployment Architecture
  4. Setting Up a New Service
  5. Environment Variables
  6. Deploying to Production
  7. Monitoring & Health Checks
  8. Troubleshooting
  9. Best Practices

Overview

BlackRoad OS uses a monorepo-to-satellite deployment model:

  1. Monorepo (BlackRoad-Operating-System): Source of truth, NOT deployed
  2. Satellite repos: Individual deployable services (e.g., blackroad-os-core)
  3. Railway: Hosting platform for all services
  4. Cloudflare: DNS + CDN in front of Railway

Railway Project: 57687c3e-5e71-48b6-bc3e-c5b52aebdfc9


Prerequisites

Required Tools

# Railway CLI
curl -fsSL https://railway.app/install.sh | sh

# GitHub CLI (optional)
brew install gh

# Node.js 20+
nvm install 20
nvm use 20

Required Access

  • Railway account with access to BlackRoad OS project
  • GitHub access to BlackRoad-OS organization
  • Cloudflare account (for DNS management)

Deployment Architecture

┌─────────────────────────────────────┐
│  Monorepo (BlackRoad-Operating-     │
│  System) - NOT DEPLOYED              │
│  Contains: services/, kernel/        │
└──────────────┬──────────────────────┘
               │
      GitHub Actions Sync
               │
      ┌────────┴────────┐
      ▼                 ▼
┌──────────────┐  ┌──────────────┐
│ Satellite    │  │ Satellite    │
│ blackroad-os-│  │ blackroad-os-│
│ core (repo)  │  │ api (repo)   │
└──────┬───────┘  └──────┬───────┘
       │                 │
  Railway Deploy    Railway Deploy
       │                 │
       ▼                 ▼
┌──────────────┐  ┌──────────────┐
│ Railway Svc  │  │ Railway Svc  │
│ core-prod    │  │ api-prod     │
└──────┬───────┘  └──────┬───────┘
       │                 │
       └────────┬────────┘
                │
         Cloudflare DNS
                │
       ┌────────┴────────┐
       ▼                 ▼
core.blackroad.   api.blackroad.
systems           systems
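
The naming pattern that runs through this guide (satellite repo, Railway service, public domain, private hostname) can be sketched as a small helper. This is illustrative only: `deploymentTargets` and `DeploymentTargets` are hypothetical names, and the patterns are inferred from the examples in this guide, so individual services may deviate (the console's internal hostname, for instance, is `blackroad-os-prism-console`).

```typescript
// Hypothetical helper: derives deployment names from a service key.
// The patterns are assumptions inferred from this guide's examples.
interface DeploymentTargets {
  satelliteRepo: string;  // GitHub repo under the BlackRoad-OS organization
  railwayService: string; // Railway service name for production
  publicDomain: string;   // Cloudflare-managed hostname
  internalHost: string;   // Railway private-network hostname
}

function deploymentTargets(service: string): DeploymentTargets {
  const base = `blackroad-os-${service}`;
  return {
    satelliteRepo: `BlackRoad-OS/${base}`,
    railwayService: `${base}-production`,
    publicDomain: `${service}.blackroad.systems`,
    internalHost: `${base}.railway.internal`,
  };
}

// Example: the names used for the core service
const core = deploymentTargets("core");
console.log(core.railwayService); // blackroad-os-core-production
```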

Setting Up a New Service

Step 1: Create Satellite Repository

# In GitHub (via web or gh CLI)
gh repo create BlackRoad-OS/blackroad-os-{service} \
  --private \
  --description "BlackRoad OS - {Service} Service"

Step 2: Initialize Service Code

# Clone satellite repo
git clone https://github.com/BlackRoad-OS/blackroad-os-{service}
cd blackroad-os-{service}

# Copy service code from monorepo
cp -r /path/to/monorepo/services/{service}/* .

# Copy kernel
cp -r /path/to/monorepo/kernel/typescript src/kernel

# Initialize package.json (if not present)
pnpm init

# Add kernel dependencies
pnpm add express dotenv
pnpm add -D typescript @types/node @types/express tsx

Step 3: Create Railway Service

# Login to Railway
railway login

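# Link to BlackRoad OS project
# (newer CLI releases take the ID via a flag: railway link --project <id>)
railway link 57687c3e-5e71-48b6-bc3e-c5b52aebdfc9

# Create new service
# (subcommand names differ between CLI releases; confirm with `railway --help`)
railway service create blackroad-os-{service}-production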

# Or use Railway dashboard:
# https://railway.app/project/57687c3e-5e71-48b6-bc3e-c5b52aebdfc9

Step 4: Configure railway.json

Create railway.json in satellite repo:

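{
  "$schema": "https://railway.app/railway.schema.json",
  "build": {
    "builder": "NIXPACKS",
    "buildCommand": "pnpm install && pnpm build"
  },
  "deploy": {
    "startCommand": "pnpm start",
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 3,
    "healthcheckPath": "/health",
    "healthcheckTimeout": 10
  }
}

Railway's config schema places health check settings under deploy: healthcheckPath is the route to probe, and healthcheckTimeout is the number of seconds Railway waits for a new deployment to become healthy. Raise the timeout if the service starts slowly. The schema has no top-level healthcheck block and no interval setting.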

Step 5: Set Environment Variables

In Railway dashboard or CLI:

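# Required variables
# (Railway CLI v3+ syntax: railway variables --set "KEY=value";
#  the same applies to every `railway variables set` command in this guide)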
railway variables set SERVICE_NAME=blackroad-os-{service}
railway variables set SERVICE_ROLE={role}
railway variables set ENVIRONMENT=production
railway variables set PORT=8000

# Service URLs
railway variables set OPERATOR_URL=https://operator.blackroad.systems
railway variables set CORE_API_URL=https://core.blackroad.systems
railway variables set PUBLIC_API_URL=https://api.blackroad.systems

# Internal URLs
railway variables set OPERATOR_INTERNAL_URL=http://blackroad-os-operator.railway.internal:8001
railway variables set CORE_API_INTERNAL_URL=http://blackroad-os-core.railway.internal:8000

# Service-specific secrets
railway variables set DATABASE_URL=postgresql://...
railway variables set REDIS_URL=redis://...
railway variables set API_KEY=...

Step 6: Deploy

# Push to main branch
git add .
git commit -m "Initial commit"
git push origin main

# Railway auto-deploys on push to main
# Or manually deploy:
railway up

Environment Variables

Required for All Services

SERVICE_NAME=blackroad-os-{service}
SERVICE_ROLE=core|api|operator|web|console|docs|shell
ENVIRONMENT=production|development|staging
PORT=8000
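
A service can check these four variables at startup and fail fast with a readable message. The following is a minimal sketch, not the kernel's actual config module: `loadConfig`, `ServiceConfig`, and the error format are hypothetical, and the 8000 fallback for PORT is taken from the default used in this guide.

```typescript
// Sketch of startup validation for the required variables listed above.
// `loadConfig` and `ServiceConfig` are hypothetical names.
type Env = Record<string, string | undefined>;

const ROLES = ["core", "api", "operator", "web", "console", "docs", "shell"] as const;
const ENVIRONMENTS = ["production", "development", "staging"] as const;

interface ServiceConfig {
  serviceName: string;
  serviceRole: (typeof ROLES)[number];
  environment: (typeof ENVIRONMENTS)[number];
  port: number;
}

function loadConfig(env: Env): ServiceConfig {
  const errors: string[] = [];

  const serviceName = env.SERVICE_NAME ?? "";
  if (serviceName === "") errors.push("SERVICE_NAME is required");

  const role = ROLES.find((r) => r === env.SERVICE_ROLE);
  if (role === undefined) errors.push(`SERVICE_ROLE must be one of: ${ROLES.join(", ")}`);

  const environment = ENVIRONMENTS.find((e) => e === env.ENVIRONMENT);
  if (environment === undefined) {
    errors.push(`ENVIRONMENT must be one of: ${ENVIRONMENTS.join(", ")}`);
  }

  // Fall back to 8000 when PORT is unset (assumed default, as in this guide).
  const port = Number(env.PORT ?? "8000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push("PORT must be an integer in 1..65535");
  }

  // Report every problem at once rather than stopping at the first.
  if (errors.length > 0 || role === undefined || environment === undefined) {
    throw new Error(`Invalid configuration:\n- ${errors.join("\n- ")}`);
  }
  return { serviceName, serviceRole: role, environment, port };
}
```

In a Node.js service this would typically be called once as `loadConfig(process.env)` before the server starts listening.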

Railway Auto-Provided

RAILWAY_STATIC_URL=<auto>
RAILWAY_ENVIRONMENT=<auto>
RAILWAY_PROJECT_ID=<auto>

Inter-Service URLs

# Public URLs
OPERATOR_URL=https://operator.blackroad.systems
CORE_API_URL=https://core.blackroad.systems
PUBLIC_API_URL=https://api.blackroad.systems
CONSOLE_URL=https://console.blackroad.systems
DOCS_URL=https://docs.blackroad.systems
WEB_URL=https://web.blackroad.systems
OS_URL=https://os.blackroad.systems

# Internal URLs (Railway private network)
OPERATOR_INTERNAL_URL=http://blackroad-os-operator.railway.internal:8001
CORE_API_INTERNAL_URL=http://blackroad-os-core.railway.internal:8000
PUBLIC_API_INTERNAL_URL=http://blackroad-os-api.railway.internal:8000
CONSOLE_INTERNAL_URL=http://blackroad-os-prism-console.railway.internal:8000
DOCS_INTERNAL_URL=http://blackroad-os-docs.railway.internal:8000
WEB_INTERNAL_URL=http://blackroad-os-web.railway.internal:8000

Service-Specific Secrets

# Database (if applicable)
DATABASE_URL=postgresql://user:pass@host:5432/db

# Redis (if applicable)
REDIS_URL=redis://host:6379/0

# API Keys (service-specific)
API_KEY=...
JWT_SECRET=...
GITHUB_TOKEN=...

Deploying to Production

Automatic Deployment (GitHub Actions)

Create .github/workflows/deploy.yml in satellite repo:

name: Deploy to Railway

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Railway CLI
        run: npm install -g @railway/cli

      - name: Deploy to Railway
        run: railway up --service blackroad-os-{service}-production
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}

      - name: Health Check
        run: |
          # Retry for about a minute while the new deployment starts
          curl -f --retry 6 --retry-delay 10 --retry-all-errors \
            https://{service}.blackroad.systems/health

Manual Deployment

# Deploy current directory
railway up

# Deploy specific service
railway up --service blackroad-os-core-production

# Deploy with environment
railway up --environment production

Deployment Checklist

Before deploying:

  • All tests pass locally
  • Environment variables configured
  • railway.json present
  • Health check endpoint implemented
  • Dockerfile builds successfully (if using Docker)
  • No hardcoded secrets in code
  • Logs configured for Railway

Monitoring & Health Checks

Health Check Endpoints

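Railway requests the /health endpoint while a new deployment starts and routes traffic to it once the endpoint returns 200:

import express from 'express';

const app = express();

app.get('/health', (_req, res) => {
  res.json({ status: 'healthy' });
});

Configuration:

  • Interval: 30 seconds (continuous checks need an external monitor; Railway probes only during deployment)
  • Timeout: 10 seconds
  • Expected: 200 OK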
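
A health endpoint is more useful when it reflects the state of the service's dependencies. The sketch below shows one way to fold several checks into a single response that still returns 200 only when everything passes. `buildHealthResponse` and the `checks` field are assumptions for illustration; the guide itself only specifies the `status` field.

```typescript
// Sketch: aggregate dependency checks into one /health response.
// `buildHealthResponse` and the payload beyond `status` are hypothetical.
type CheckResult = { name: string; ok: boolean };

interface HealthResponse {
  statusCode: 200 | 503;
  body: { status: "healthy" | "unhealthy"; checks: Record<string, "ok" | "fail"> };
}

function buildHealthResponse(results: CheckResult[]): HealthResponse {
  const checks: Record<string, "ok" | "fail"> = {};
  for (const r of results) checks[r.name] = r.ok ? "ok" : "fail";
  // With no checks registered the service counts as healthy.
  const healthy = results.every((r) => r.ok);
  return {
    statusCode: healthy ? 200 : 503,
    body: { status: healthy ? "healthy" : "unhealthy", checks },
  };
}

// In an Express handler the result would be sent as:
//   const { statusCode, body } = buildHealthResponse(results);
//   res.status(statusCode).json(body);
```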

Viewing Logs

# Real-time logs
# (flag names below vary between CLI releases; confirm with `railway logs --help`)
railway logs

# Specific service
railway logs --service blackroad-os-core-production

# Follow logs
railway logs -f

# Last 100 lines
railway logs --tail 100

Metrics Dashboard

View in Railway:

  • CPU usage
  • Memory usage
  • Request rate
  • Error rate
  • Network traffic

URL: https://railway.app/project/57687c3e-5e71-48b6-bc3e-c5b52aebdfc9

Alerts

Configure alerts in Railway dashboard:

  • Service down
  • High CPU usage (> 80%)
  • High memory usage (> 90%)
  • Error rate (> 5%)
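
The thresholds above can also be expressed in code, for example in a custom monitor that posts to a chat channel. This is a sketch under assumptions: `MetricsSnapshot` and `evaluateAlerts` are hypothetical names, and nothing here calls a Railway API; how the metrics are collected is left open.

```typescript
// Sketch: evaluate a metrics snapshot against the thresholds listed above.
// `MetricsSnapshot` and `evaluateAlerts` are hypothetical.
interface MetricsSnapshot {
  up: boolean;
  cpuPercent: number;
  memoryPercent: number;
  errorRatePercent: number;
}

function evaluateAlerts(m: MetricsSnapshot): string[] {
  const alerts: string[] = [];
  if (!m.up) alerts.push("Service down");
  if (m.cpuPercent > 80) alerts.push(`High CPU usage: ${m.cpuPercent}%`);
  if (m.memoryPercent > 90) alerts.push(`High memory usage: ${m.memoryPercent}%`);
  if (m.errorRatePercent > 5) alerts.push(`High error rate: ${m.errorRatePercent}%`);
  return alerts;
}
```

The comparisons are strict, so a reading exactly at a threshold (for example 80% CPU) does not raise an alert.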

Troubleshooting

Service Won't Start

Symptoms: Service crashes immediately after deploy

Solutions:

# Check logs
railway logs --tail 100

# Common issues:
# 1. Missing environment variables
railway variables

# 2. Port mismatch
# Ensure PORT=8000 or read from process.env.PORT

# 3. Build failed
# Check build logs in Railway dashboard

Health Check Failing

Symptoms: Railway shows "unhealthy" status

Solutions:

# Test health endpoint directly
curl https://{service-url}.up.railway.app/health

# Check if server is listening on correct port
# Ensure: app.listen(process.env.PORT || 8000)

# Verify health check path in railway.json
# Should be: "deploy": { "healthcheckPath": "/health" }

Can't Connect to Other Services

Symptoms: RPC calls timeout or fail

Solutions:

# 1. Verify internal URLs are set
railway variables | grep INTERNAL

# 2. Test internal connectivity
# Deploy a test service that curls other services

# 3. Check Railway private network
# All services must be in the same Railway project and environment

# 4. Verify service names
# Must match {service}.railway.internal format
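
Malformed internal URLs (an `https` scheme, a missing port, or a public hostname) are a common cause of these failures. The sketch below checks a URL against the `http://<service>.railway.internal:<port>` shape used in this guide. `checkInternalUrl` is a hypothetical helper, and the pattern is an assumption based on the examples above.

```typescript
// Sketch: check that a URL looks like a Railway private-network address,
// i.e. http://<service>.railway.internal:<port>. Hypothetical helper.
const INTERNAL_URL = /^http:\/\/([a-z0-9-]+)\.railway\.internal:(\d{1,5})(\/.*)?$/;

function checkInternalUrl(url: string): { service: string; port: number } | null {
  const match = INTERNAL_URL.exec(url);
  if (match === null) return null;
  return { service: match[1], port: Number(match[2]) };
}

// Example: a valid internal URL and a public URL that should be rejected
console.log(checkInternalUrl("http://blackroad-os-core.railway.internal:8000"));
console.log(checkInternalUrl("https://core.blackroad.systems"));
```

The first call returns the parsed service name and port; the second returns `null` because the host is public.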

Environment Variables Not Loading

Symptoms: Config validation errors

Solutions:

# List all variables
railway variables

# Set missing variables
railway variables set KEY=value

# Reload service after setting variables
railway redeploy

Deployment Stuck

Symptoms: Deploy in progress for > 10 minutes

Solutions:

# Cancel deployment
# (if your CLI release lacks this subcommand, cancel from the Railway dashboard)
railway cancel-deploy

# Redeploy
railway up

# If still stuck, check Railway status:
# https://railway.statuspage.io

Best Practices

Code Organization

satellite-repo/
├── src/
│   ├── index.ts          # Entry point
│   ├── server.ts         # Express server
│   ├── kernel/           # Copied from monorepo
│   ├── routes/           # API routes
│   ├── middleware/       # Express middleware
│   └── workers/          # Background workers
├── tests/
├── .github/
│   └── workflows/
│       └── deploy.yml    # Auto-deploy
├── railway.json          # Railway config
├── package.json
├── tsconfig.json
├── .env.example
└── README.md

Secrets Management

Never:

  • Commit secrets to Git
  • Hardcode API keys
  • Log sensitive data
  • Expose secrets in error messages

Always:

  • Use Railway environment variables
  • Rotate secrets regularly
  • Use different secrets per environment
  • Validate secrets on startup
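
To keep secrets out of logs, sensitive values can be masked before a configuration snapshot is printed. This is a minimal sketch: `redactEnv` is a hypothetical helper, and the key pattern is an assumption that covers the variable names used in this guide, so extend it for your own secrets.

```typescript
// Sketch: mask sensitive values before logging an environment snapshot.
// `redactEnv` and the key pattern are assumptions; adjust to your variables.
const SENSITIVE_KEY = /(SECRET|TOKEN|PASSWORD|API_KEY|DATABASE_URL|REDIS_URL)/i;

function redactEnv(env: Record<string, string | undefined>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (value === undefined) continue; // skip unset variables
    safe[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : value;
  }
  return safe;
}

// Example: PORT stays visible, the secrets are masked
console.log(redactEnv({ PORT: "8000", JWT_SECRET: "abc", DATABASE_URL: "postgresql://example" }));
```

Connection strings are masked as a whole because they usually embed a password.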

Performance

  • Use Railway internal URLs for inter-service calls
  • Enable HTTP/2 for faster connections
  • Implement connection pooling (DB, Redis)
  • Cache frequently-accessed data
  • Use streaming for large responses

Reliability

  • Implement graceful shutdown
  • Handle SIGTERM signal
  • Use health checks
  • Implement circuit breakers for RPC
  • Add retry logic with exponential backoff
  • Log all errors
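
Two of these items, exponential backoff and circuit breaking, can be sketched in a few lines. The names, defaults, and the breaker's simple state model below are assumptions for illustration and are not the kernel's RPC client. The sketch omits jitter; a production version would likely add random jitter to the delays.

```typescript
// Sketch of two reliability helpers for RPC calls (hypothetical names).

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Opens after `threshold` consecutive failures and rejects calls until
// `cooldownMs` has elapsed, then lets a trial call through.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold = 3, cooldownMs = 10000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  canRequest(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}

// Retries `call` with backoff; `sleep` is injected so callers pick the timer.
async function callWithRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 4,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Passing the clock (`now`) and `sleep` as arguments keeps both helpers deterministic in tests. In a service, `now` would typically be `Date.now()` and `sleep` a `setTimeout` wrapper.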

Cost Optimization

  • Right-size services (CPU/memory)
  • Use development environment for testing
  • Clean up unused services
  • Monitor usage in Railway dashboard
  • Use Railway's free tier for development
