mirror of
https://github.com/blackboxprogramming/BlackRoad-Operating-System.git
synced 2026-03-17 07:57:19 -05:00
This commit introduces a comprehensive infrastructure overhaul that transforms
BlackRoad OS into a true distributed operating system with unified kernel,
DNS-aware service discovery, and standardized syscall APIs.
## New Infrastructure Components
### 1. Kernel Module (kernel/typescript/)
- Complete TypeScript kernel implementation for all services
- Service registry with production and dev DNS mappings
- RPC client for inter-service communication
- Event bus, job queue, state management
- Structured logging with log levels
- Full type safety with TypeScript
Modules:
- types.ts: Complete type definitions
- serviceRegistry.ts: DNS-aware service discovery
- identity.ts: Service identity and metadata
- config.ts: Environment-aware configuration
- logger.ts: Structured logging
- rpc.ts: Inter-service RPC client
- events.ts: Event bus (pub/sub)
- jobs.ts: Background job queue
- state.ts: Key-value state management
- index.ts: Main exports
### 2. DNS Infrastructure Documentation (infra/DNS.md)
- Complete Cloudflare DNS mapping
- Railway production and dev endpoints
- Email configuration (MX, SPF, DKIM, DMARC)
- SSL/TLS, security, and monitoring settings
- Service-to-domain mapping
- Health check configuration
Production Services:
- operator.blackroad.systems
- core.blackroad.systems
- api.blackroad.systems
- console.blackroad.systems
- docs.blackroad.systems
- web.blackroad.systems
- os.blackroad.systems
- app.blackroad.systems
### 3. Service Registry & Architecture (INFRASTRUCTURE.md)
- Canonical service registry with all endpoints
- Monorepo-to-satellite deployment model
- Service-as-process architecture
- DNS-as-filesystem model
- Inter-service communication patterns
- Service lifecycle management
- Complete environment variable documentation
### 4. Syscall API Specification (SYSCALL_API.md)
- Standard kernel API for all services
- Required syscalls: health, version, identity, RPC
- Optional syscalls: logging, metrics, events, jobs, state
- Complete API documentation with examples
- Express.js implementation guide
Core Endpoints:
- GET /health
- GET /version
- GET /v1/sys/identity
- GET /v1/sys/health
- POST /v1/sys/rpc
- POST /v1/sys/event
- POST /v1/sys/job
- GET/PUT /v1/sys/state
### 5. Railway Deployment Guide (docs/RAILWAY_DEPLOYMENT.md)
- Step-by-step deployment instructions
- Environment variable configuration
- Monitoring and health checks
- Troubleshooting guide
- Best practices for Railway deployment
### 6. Atlas Kernel Scaffold Prompt (prompts/atlas/ATLAS_KERNEL_SCAFFOLD.md)
- Complete prompt for generating new services
- Auto-generates full kernel implementation
- Includes all DNS and Railway mappings
- Production-ready output with zero TODOs
### 7. GitHub Workflow Templates (templates/github-workflows/)
- deploy.yml: Railway auto-deployment
- test.yml: Test suite with coverage
- validate-kernel.yml: Kernel validation
- README.md: Template documentation
## Updated Files
### CLAUDE.md
- Added "Kernel Architecture & DNS Infrastructure" section
- Updated Table of Contents
- Added service architecture diagram
- Documented all new infrastructure files
- Updated repository structure with new directories
- Added kernel and infrastructure to critical path files
## Architecture Impact
This update establishes BlackRoad OS as a distributed operating system where:
- Each Railway service = OS process
- Each Cloudflare domain = mount point
- All services communicate via syscalls
- Unified kernel ensures interoperability
- DNS-aware service discovery
- Production and development environments
## Service Discovery
Services can now discover and call each other:
```typescript
import { rpc } from './kernel';
const user = await rpc.call('core', 'getUserById', { id: 123 });
```
## DNS Mappings
Production:
- operator.blackroad.systems → blackroad-os-operator-production-3983.up.railway.app
- core.blackroad.systems → 9gw4d0h2.up.railway.app
- api.blackroad.systems → ac7bx15h.up.railway.app
Internal (Railway):
- blackroad-os-operator.railway.internal:8001
- blackroad-os-core.railway.internal:8000
- blackroad-os-api.railway.internal:8000
## Next Steps
1. Sync kernel to satellite repos
2. Implement syscall endpoints in all services
3. Update services to use RPC for inter-service calls
4. Configure Cloudflare health checks
5. Deploy updated services to Railway
---
Files Added:
- INFRASTRUCTURE.md
- SYSCALL_API.md
- infra/DNS.md
- docs/RAILWAY_DEPLOYMENT.md
- kernel/typescript/* (9 modules + README)
- prompts/atlas/ATLAS_KERNEL_SCAFFOLD.md
- templates/github-workflows/* (4 files)
Files Modified:
- CLAUDE.md
Total: 22 new files, 1 updated file
557 lines
12 KiB
Markdown
557 lines
12 KiB
Markdown
# BlackRoad OS - Railway Deployment Guide
|
|
|
|
**Version:** 2.0
|
|
**Last Updated:** 2025-11-20
|
|
**Author:** Atlas (Infrastructure Architect)
|
|
**Status:** Production Guide
|
|
|
|
---
|
|
|
|
## Table of Contents
|
|
|
|
1. [Overview](#overview)
|
|
2. [Prerequisites](#prerequisites)
|
|
3. [Deployment Architecture](#deployment-architecture)
|
|
4. [Setting Up a New Service](#setting-up-a-new-service)
|
|
5. [Environment Variables](#environment-variables)
|
|
6. [Deploying to Production](#deploying-to-production)
|
|
7. [Monitoring & Health Checks](#monitoring--health-checks)
|
|
8. [Troubleshooting](#troubleshooting)
|
|
9. [Best Practices](#best-practices)
|
|
|
|
---
|
|
|
|
## Overview
|
|
|
|
BlackRoad OS uses a **monorepo-to-satellite deployment model**:
|
|
|
|
1. **Monorepo** (`BlackRoad-Operating-System`): Source of truth, NOT deployed
|
|
2. **Satellite repos**: Individual deployable services (e.g., `blackroad-os-core`)
|
|
3. **Railway**: Hosting platform for all services
|
|
4. **Cloudflare**: DNS + CDN in front of Railway
|
|
|
|
**Railway Project**: 57687c3e-5e71-48b6-bc3e-c5b52aebdfc9
|
|
|
|
---
|
|
|
|
## Prerequisites
|
|
|
|
### Required Tools
|
|
|
|
```bash
|
|
# Railway CLI
|
|
curl -fsSL https://railway.app/install.sh | sh
|
|
|
|
# GitHub CLI (optional)
|
|
brew install gh
|
|
|
|
# Node.js 20+
|
|
nvm install 20
|
|
nvm use 20
|
|
```
|
|
|
|
### Required Access
|
|
|
|
- [ ] Railway account with access to BlackRoad OS project
|
|
- [ ] GitHub access to BlackRoad-OS organization
|
|
- [ ] Cloudflare account (for DNS management)
|
|
|
|
---
|
|
|
|
## Deployment Architecture
|
|
|
|
```
|
|
┌─────────────────────────────────────┐
|
|
│ Monorepo (BlackRoad-Operating- │
|
|
│ System) - NOT DEPLOYED │
|
|
│ Contains: services/, kernel/ │
|
|
└──────────────┬──────────────────────┘
|
|
│
|
|
GitHub Actions Sync
|
|
│
|
|
┌────────┴────────┐
|
|
▼ ▼
|
|
┌──────────────┐ ┌──────────────┐
|
|
│ Satellite │ │ Satellite │
|
|
│ blackroad-os-│ │ blackroad-os-│
|
|
│ core (repo) │ │ api (repo) │
|
|
└──────┬───────┘ └──────┬───────┘
|
|
│ │
|
|
Railway Deploy Railway Deploy
|
|
│ │
|
|
▼ ▼
|
|
┌──────────────┐ ┌──────────────┐
|
|
│ Railway Svc │ │ Railway Svc │
|
|
│ core-prod │ │ api-prod │
|
|
└──────┬───────┘ └──────┬───────┘
|
|
│ │
|
|
└────────┬────────┘
|
|
│
|
|
Cloudflare DNS
|
|
│
|
|
┌────────┴────────┐
|
|
▼ ▼
|
|
core.blackroad. api.blackroad.
|
|
systems systems
|
|
```
|
|
|
|
---
|
|
|
|
## Setting Up a New Service
|
|
|
|
### Step 1: Create Satellite Repository
|
|
|
|
```bash
|
|
# In GitHub (via web or gh CLI)
|
|
gh repo create BlackRoad-OS/blackroad-os-{service} \
|
|
--private \
|
|
--description "BlackRoad OS - {Service} Service"
|
|
```
|
|
|
|
### Step 2: Initialize Service Code
|
|
|
|
```bash
|
|
# Clone satellite repo
|
|
git clone https://github.com/BlackRoad-OS/blackroad-os-{service}
|
|
cd blackroad-os-{service}
|
|
|
|
# Copy service code from monorepo
|
|
cp -r /path/to/monorepo/services/{service}/* .
|
|
|
|
# Copy kernel
|
|
cp -r /path/to/monorepo/kernel/typescript src/kernel
|
|
|
|
# Initialize package.json (if not present)
|
|
pnpm init
|
|
|
|
# Add kernel dependencies
|
|
pnpm add express dotenv
|
|
pnpm add -D typescript @types/node @types/express tsx
|
|
```
|
|
|
|
### Step 3: Create Railway Service
|
|
|
|
```bash
|
|
# Login to Railway
|
|
railway login
|
|
|
|
# Link to BlackRoad OS project
|
|
railway link 57687c3e-5e71-48b6-bc3e-c5b52aebdfc9
|
|
|
|
# Create new service
|
|
railway service create blackroad-os-{service}-production
|
|
|
|
# Or use Railway dashboard:
|
|
# https://railway.app/project/57687c3e-5e71-48b6-bc3e-c5b52aebdfc9
|
|
```
|
|
|
|
### Step 4: Configure railway.json
|
|
|
|
Create `railway.json` in satellite repo:
|
|
|
|
```json
|
|
{
|
|
"$schema": "https://railway.app/railway.schema.json",
|
|
"build": {
|
|
"builder": "NIXPACKS",
|
|
"buildCommand": "pnpm install && pnpm build"
|
|
},
|
|
"deploy": {
|
|
"startCommand": "pnpm start",
|
|
"restartPolicyType": "ON_FAILURE",
|
|
"restartPolicyMaxRetries": 3
|
|
},
|
|
"healthcheck": {
|
|
"path": "/health",
|
|
"interval": 30,
|
|
"timeout": 10
|
|
}
|
|
}
|
|
```
|
|
|
|
### Step 5: Set Environment Variables
|
|
|
|
In Railway dashboard or CLI:
|
|
|
|
```bash
|
|
# Required variables
|
|
railway variables set SERVICE_NAME=blackroad-os-{service}
|
|
railway variables set SERVICE_ROLE={role}
|
|
railway variables set ENVIRONMENT=production
|
|
railway variables set PORT=8000
|
|
|
|
# Service URLs
|
|
railway variables set OPERATOR_URL=https://operator.blackroad.systems
|
|
railway variables set CORE_API_URL=https://core.blackroad.systems
|
|
railway variables set PUBLIC_API_URL=https://api.blackroad.systems
|
|
|
|
# Internal URLs
|
|
railway variables set OPERATOR_INTERNAL_URL=http://blackroad-os-operator.railway.internal:8001
|
|
railway variables set CORE_API_INTERNAL_URL=http://blackroad-os-core.railway.internal:8000
|
|
|
|
# Service-specific secrets
|
|
railway variables set DATABASE_URL=postgresql://...
|
|
railway variables set REDIS_URL=redis://...
|
|
railway variables set API_KEY=...
|
|
```
|
|
|
|
### Step 6: Deploy
|
|
|
|
```bash
|
|
# Push to main branch
|
|
git add .
|
|
git commit -m "Initial commit"
|
|
git push origin main
|
|
|
|
# Railway auto-deploys on push to main
|
|
# Or manually deploy:
|
|
railway up
|
|
```
|
|
|
|
---
|
|
|
|
## Environment Variables
|
|
|
|
### Required for All Services
|
|
|
|
```bash
|
|
SERVICE_NAME=blackroad-os-{service}
|
|
SERVICE_ROLE=core|api|operator|web|console|docs|shell
|
|
ENVIRONMENT=production|development|staging
|
|
PORT=8000
|
|
```
|
|
|
|
### Railway Auto-Provided
|
|
|
|
```bash
|
|
RAILWAY_STATIC_URL=<auto>
|
|
RAILWAY_ENVIRONMENT=<auto>
|
|
RAILWAY_PROJECT_ID=<auto>
|
|
```
|
|
|
|
### Inter-Service URLs
|
|
|
|
```bash
|
|
# Public URLs
|
|
OPERATOR_URL=https://operator.blackroad.systems
|
|
CORE_API_URL=https://core.blackroad.systems
|
|
PUBLIC_API_URL=https://api.blackroad.systems
|
|
CONSOLE_URL=https://console.blackroad.systems
|
|
DOCS_URL=https://docs.blackroad.systems
|
|
WEB_URL=https://web.blackroad.systems
|
|
OS_URL=https://os.blackroad.systems
|
|
|
|
# Internal URLs (Railway private network)
|
|
OPERATOR_INTERNAL_URL=http://blackroad-os-operator.railway.internal:8001
|
|
CORE_API_INTERNAL_URL=http://blackroad-os-core.railway.internal:8000
|
|
PUBLIC_API_INTERNAL_URL=http://blackroad-os-api.railway.internal:8000
|
|
CONSOLE_INTERNAL_URL=http://blackroad-os-prism-console.railway.internal:8000
|
|
DOCS_INTERNAL_URL=http://blackroad-os-docs.railway.internal:8000
|
|
WEB_INTERNAL_URL=http://blackroad-os-web.railway.internal:8000
|
|
```
|
|
|
|
### Service-Specific Secrets
|
|
|
|
```bash
|
|
# Database (if applicable)
|
|
DATABASE_URL=postgresql://user:pass@host:5432/db
|
|
|
|
# Redis (if applicable)
|
|
REDIS_URL=redis://host:6379/0
|
|
|
|
# API Keys (service-specific)
|
|
API_KEY=...
|
|
JWT_SECRET=...
|
|
GITHUB_TOKEN=...
|
|
```
|
|
|
|
---
|
|
|
|
## Deploying to Production
|
|
|
|
### Automatic Deployment (GitHub Actions)
|
|
|
|
Create `.github/workflows/deploy.yml` in satellite repo:
|
|
|
|
```yaml
|
|
name: Deploy to Railway
|
|
|
|
on:
|
|
push:
|
|
branches: [main]
|
|
|
|
jobs:
|
|
deploy:
|
|
runs-on: ubuntu-latest
|
|
steps:
|
|
- uses: actions/checkout@v3
|
|
|
|
- name: Install Railway CLI
|
|
run: npm install -g @railway/cli
|
|
|
|
- name: Deploy to Railway
|
|
run: railway up --service blackroad-os-{service}-production
|
|
env:
|
|
RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}
|
|
|
|
- name: Health Check
|
|
run: |
|
|
sleep 10
|
|
curl -f https://{service}.blackroad.systems/health || exit 1
|
|
```
|
|
|
|
### Manual Deployment
|
|
|
|
```bash
|
|
# Deploy current directory
|
|
railway up
|
|
|
|
# Deploy specific service
|
|
railway up --service blackroad-os-core-production
|
|
|
|
# Deploy with environment
|
|
railway up --environment production
|
|
```
|
|
|
|
### Deployment Checklist
|
|
|
|
Before deploying:
|
|
|
|
- [ ] All tests pass locally
|
|
- [ ] Environment variables configured
|
|
- [ ] `railway.json` present
|
|
- [ ] Health check endpoint implemented
|
|
- [ ] Dockerfile builds successfully (if using Docker)
|
|
- [ ] No hardcoded secrets in code
|
|
- [ ] Logs configured for Railway
|
|
|
|
---
|
|
|
|
## Monitoring & Health Checks
|
|
|
|
### Health Check Endpoints
|
|
|
|
Railway checks `/health` endpoint:
|
|
|
|
```typescript
|
|
app.get('/health', (req, res) => {
|
|
res.json({ status: 'healthy' });
|
|
});
|
|
```
|
|
|
|
**Configuration**:
|
|
- Interval: 30 seconds
|
|
- Timeout: 10 seconds
|
|
- Expected: 200 OK
|
|
|
|
### Viewing Logs
|
|
|
|
```bash
|
|
# Real-time logs
|
|
railway logs
|
|
|
|
# Specific service
|
|
railway logs --service blackroad-os-core-production
|
|
|
|
# Follow logs
|
|
railway logs -f
|
|
|
|
# Last 100 lines
|
|
railway logs --tail 100
|
|
```
|
|
|
|
### Metrics Dashboard
|
|
|
|
View in Railway:
|
|
- CPU usage
|
|
- Memory usage
|
|
- Request rate
|
|
- Error rate
|
|
- Network traffic
|
|
|
|
**URL**: https://railway.app/project/57687c3e-5e71-48b6-bc3e-c5b52aebdfc9
|
|
|
|
### Alerts
|
|
|
|
Configure alerts in Railway dashboard:
|
|
- Service down
|
|
- High CPU usage (> 80%)
|
|
- High memory usage (> 90%)
|
|
- Error rate (> 5%)
|
|
|
|
---
|
|
|
|
## Troubleshooting
|
|
|
|
### Service Won't Start
|
|
|
|
**Symptoms**: Service crashes immediately after deploy
|
|
|
|
**Solutions**:
|
|
```bash
|
|
# Check logs
|
|
railway logs --tail 100
|
|
|
|
# Common issues:
|
|
# 1. Missing environment variables
|
|
railway variables
|
|
|
|
# 2. Port mismatch
|
|
# Ensure PORT=8000 or read from process.env.PORT
|
|
|
|
# 3. Build failed
|
|
# Check build logs in Railway dashboard
|
|
```
|
|
|
|
### Health Check Failing
|
|
|
|
**Symptoms**: Railway shows "unhealthy" status
|
|
|
|
**Solutions**:
|
|
```bash
|
|
# Test health endpoint directly
|
|
curl https://{service-url}.up.railway.app/health
|
|
|
|
# Check if server is listening on correct port
|
|
# Ensure: app.listen(process.env.PORT || 8000)
|
|
|
|
# Verify health check path in railway.json
|
|
# Should be: "healthcheck": { "path": "/health" }
|
|
```
|
|
|
|
### Can't Connect to Other Services
|
|
|
|
**Symptoms**: RPC calls timeout or fail
|
|
|
|
**Solutions**:
|
|
```bash
|
|
# 1. Verify internal URLs are set
|
|
railway variables | grep INTERNAL
|
|
|
|
# 2. Test internal connectivity
|
|
# Deploy a test service that curls other services
|
|
|
|
# 3. Check Railway private network
|
|
# All services must be in same Railway project
|
|
|
|
# 4. Verify service names
|
|
# Must match {service}.railway.internal format
|
|
```
|
|
|
|
### Environment Variables Not Loading
|
|
|
|
**Symptoms**: Config validation errors
|
|
|
|
**Solutions**:
|
|
```bash
|
|
# List all variables
|
|
railway variables
|
|
|
|
# Set missing variables
|
|
railway variables set KEY=value
|
|
|
|
# Reload service after setting variables
|
|
railway redeploy
|
|
```
|
|
|
|
### Deployment Stuck
|
|
|
|
**Symptoms**: Deploy in progress for > 10 minutes
|
|
|
|
**Solutions**:
|
|
```bash
|
|
# Cancel deployment
|
|
railway cancel-deploy
|
|
|
|
# Redeploy
|
|
railway up
|
|
|
|
# If still stuck, check Railway status:
|
|
# https://railway.statuspage.io
|
|
```
|
|
|
|
---
|
|
|
|
## Best Practices
|
|
|
|
### Code Organization
|
|
|
|
```
|
|
satellite-repo/
|
|
├── src/
|
|
│ ├── index.ts # Entry point
|
|
│ ├── server.ts # Express server
|
|
│ ├── kernel/ # Copied from monorepo
|
|
│ ├── routes/ # API routes
|
|
│ ├── middleware/ # Express middleware
|
|
│ └── workers/ # Background workers
|
|
├── tests/
|
|
├── .github/
|
|
│ └── workflows/
|
|
│ └── deploy.yml # Auto-deploy
|
|
├── railway.json # Railway config
|
|
├── package.json
|
|
├── tsconfig.json
|
|
├── .env.example
|
|
└── README.md
|
|
```
|
|
|
|
### Secrets Management
|
|
|
|
❌ **Never**:
|
|
- Commit secrets to Git
|
|
- Hardcode API keys
|
|
- Log sensitive data
|
|
- Expose secrets in error messages
|
|
|
|
✅ **Always**:
|
|
- Use Railway environment variables
|
|
- Rotate secrets regularly
|
|
- Use different secrets per environment
|
|
- Validate secrets on startup
|
|
|
|
### Performance
|
|
|
|
- Use Railway internal URLs for inter-service calls
|
|
- Enable HTTP/2 for faster connections
|
|
- Implement connection pooling (DB, Redis)
|
|
- Cache frequently-accessed data
|
|
- Use streaming for large responses
|
|
|
|
### Reliability
|
|
|
|
- Implement graceful shutdown
|
|
- Handle SIGTERM signal
|
|
- Use health checks
|
|
- Implement circuit breakers for RPC
|
|
- Add retry logic with exponential backoff
|
|
- Log all errors
|
|
|
|
### Cost Optimization
|
|
|
|
- Right-size services (CPU/memory)
|
|
- Use development environment for testing
|
|
- Clean up unused services
|
|
- Monitor usage in Railway dashboard
|
|
- Use Railway's free tier for development
|
|
|
|
---
|
|
|
|
## Additional Resources
|
|
|
|
- **Railway Docs**: https://docs.railway.app
|
|
- **Railway Dashboard**: https://railway.app/project/57687c3e-5e71-48b6-bc3e-c5b52aebdfc9
|
|
- **Monorepo**: https://github.com/blackboxprogramming/BlackRoad-Operating-System
|
|
- **Satellite Org**: https://github.com/BlackRoad-OS
|
|
- **Cloudflare Dashboard**: https://dash.cloudflare.com
|
|
- **DNS Map**: `../infra/DNS.md`
|
|
- **Service Registry**: `../INFRASTRUCTURE.md`
|
|
- **Syscall API**: `../SYSCALL_API.md`
|
|
|
|
---
|
|
|
|
**Version:** 2.0
|
|
**Last Updated:** 2025-11-20
|
|
**Author:** Atlas (Infrastructure Architect)
|
|
**Status:** ✅ Production Guide
|