blackroad-private-enhancements/docs/TROUBLESHOOTING.md
blackboxprogramming 4acdf1f8ac
Initial commit — RoadCode import
2026-03-08 20:04:29 -05:00

🔧 Troubleshooting Guide

Common issues and solutions for BlackRoad-Private infrastructure.

Deployment Issues

Railway Deployment Fails

Symptoms:

  • Workflow fails at "Deploy to Railway" step
  • Error: "Authentication failed"

Solutions:

  1. Verify RAILWAY_TOKEN secret is set
  2. Check token hasn't expired
  3. Regenerate token: https://railway.app/account/tokens
  4. Ensure RAILWAY_PROJECT_ID matches your project
# Test locally
railway login
railway link
railway status
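
Before re-running the workflow, it can help to confirm that the expected variables are at least present in your local shell. The sketch below is a hypothetical helper (`missing_vars` is not part of the Railway CLI) that prints the name of every listed variable that is unset or empty. It assumes you export the same names locally that the workflow reads from GitHub Secrets.

```shell
# Hypothetical helper: print the name of every listed variable that is
# unset or empty. Exits non-zero if anything is missing.
missing_vars() (
  status=0
  for name in "$@"; do
    # Indirect lookup that also works in POSIX sh (no bash-only ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  exit "$status"
)

# Usage (variable names taken from this guide):
#   missing_vars RAILWAY_TOKEN RAILWAY_PROJECT_ID
```

The same helper can be pointed at the Cloudflare and Vercel secret names listed in the sections that follow.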

Cloudflare Workers Deployment Fails

Symptoms:

  • Error: "Account ID not found"
  • Error: "Zone ID invalid"

Solutions:

  1. Verify secrets in GitHub:

    • CLOUDFLARE_API_TOKEN
    • CLOUDFLARE_ACCOUNT_ID
    • CLOUDFLARE_ZONE_ID
  2. Check API token permissions:

    • Workers: Edit
    • Account Settings: Read
    • Zone: Edit
# Test locally
wrangler whoami
wrangler deploy

Vercel Deployment Fails

Symptoms:

  • Error: "Project not found"
  • Error: "Team/Organization mismatch"

Solutions:

  1. Verify all Vercel secrets:

    • VERCEL_TOKEN
    • VERCEL_ORG_ID
    • VERCEL_PROJECT_ID
  2. Check token scope includes deployment access

# Test locally
vercel whoami
vercel --prod

Health Check Issues

All Health Checks Failing

Symptoms:

  • Health check workflow shows all platforms unhealthy
  • No actual service issues

Solutions:

  1. Verify health URL secrets are set correctly
  2. Check health endpoints return 200 status
  3. Ensure endpoints don't require authentication
# Test manually
curl -v https://your-service-url/api/health
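
The three checks above mostly come down to reading the HTTP status code. As a rough sketch, the hypothetical helpers below fetch only the status code and map it to a diagnosis. The 10-second `--max-time` is an assumption for illustration, not a value taken from the workflow.

```shell
# Map an HTTP status code to a rough diagnosis for the checklist above.
classify_status() {
  case "$1" in
    200)      echo "healthy" ;;
    401|403)  echo "requires authentication" ;;
    000)      echo "unreachable" ;;
    *)        echo "unhealthy" ;;
  esac
}

# Fetch only the status code; curl reports 000 when no response arrives.
health_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Usage (placeholder URL from this guide):
#   classify_status "$(health_status https://your-service-url/api/health)"
```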

Intermittent Health Check Failures

Symptoms:

  • Health checks sometimes fail
  • Services are actually healthy

Solutions:

  1. Increase health check timeout (currently 100s)
  2. Check platform status pages for outages
  3. Review application logs for slow responses

Build Issues

npm install Fails

Symptoms:

  • Build fails at dependency installation
  • Error: "EACCES" or "Permission denied"

Solutions:

  1. Check package-lock.json is committed
  2. Verify Node version (requires 20+)
  3. Clear cache and retry:
npm cache clean --force
npm ci
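
To check the Node requirement from step 2 in a script rather than by eye, a minimal sketch could parse the output of `node --version`. The function names here are hypothetical.

```shell
# Extract the major version from a string such as "v20.11.0".
node_major() {
  version=${1#v}          # drop the leading "v"
  echo "${version%%.*}"   # keep everything before the first dot
}

# Succeed only if the given version meets the guide's Node 20+ requirement.
node_ok() {
  [ "$(node_major "$1")" -ge 20 ]
}

# Usage:
#   node_ok "$(node --version)" || echo "Node 20+ required"
```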

Build Timeout

Symptoms:

  • Build takes too long and times out
  • Error: "Process exceeded timeout"

Solutions:

  1. Optimize build process
  2. Increase timeout in railway.json
  3. Use build cache effectively
  4. Consider multi-stage builds

Secret Issues

Secret Not Found

Symptoms:

  • Workflow fails with "secret not found"
  • Variable shows as empty

Solutions:

  1. Go to Settings → Secrets and variables → Actions
  2. Verify secret name matches exactly (case-sensitive)
  3. Check secret is in correct environment
  4. Secret values can't be viewed after saving; if unsure what is stored, re-create the secret

Secret Exposed in Logs

Symptoms:

  • Secret value visible in workflow logs

Solutions:

  1. Immediately rotate the secret
  2. Check for echo commands that output secrets
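  3. Rely on *** masking, but know its limits: values stored in GitHub Secrets are masked in logs automatically, while values derived from them (for example, base64-encoded) are not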
  4. Never log full API responses
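
Automatic masking only covers the exact values stored as secrets. A value produced at runtime can be registered for masking with the GitHub Actions `::add-mask::` workflow command. The wrapper below is a small sketch; `mask_value`, `TOKEN_B64`, and `API_TOKEN` are hypothetical names.

```shell
# Emit the GitHub Actions workflow command that masks a runtime value.
# Once this line is printed in a step, the runner replaces later
# occurrences of the value in the log with ***.
mask_value() {
  printf '::add-mask::%s\n' "$1"
}

# Usage inside a workflow `run:` step:
#   TOKEN_B64=$(printf '%s' "$API_TOKEN" | base64)
#   mask_value "$TOKEN_B64"
```

Register the mask before the first command that might print the value, since masking applies only to output produced afterwards.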

Workflow Issues

Workflow Doesn't Trigger

Symptoms:

  • Push to main but no workflow runs
  • Manual dispatch button missing

Solutions:

  1. Check workflow file syntax (YAML is valid)
  2. Verify workflow file is in .github/workflows/
  3. Check branch name matches trigger
  4. Ensure workflow is enabled (Actions → Workflows)
# Validate YAML
yamllint .github/workflows/*.yml
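
A missing manual dispatch button usually means the workflow does not declare the `workflow_dispatch` trigger (the button also only appears once the file is on the default branch). The sketch below is a hypothetical helper that does a plain text search for the trigger.

```shell
# Succeed if the workflow file mentions the workflow_dispatch trigger.
# This is a plain text match, not a YAML parse, so treat it as a hint only.
has_manual_trigger() {
  grep -q 'workflow_dispatch' "$1"
}

# Usage: list workflows that cannot be started manually.
#   for f in .github/workflows/*.yml; do
#     has_manual_trigger "$f" || echo "no manual trigger: $f"
#   done
```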

Workflow Stuck on "Queued"

Symptoms:

  • Workflow shows "Queued" for extended time
  • No progress

Solutions:

  1. Check GitHub Actions minutes quota
  2. Verify no concurrent job limits hit
  3. Check for required status checks blocking
  4. Cancel and re-run workflow

Platform-Specific Issues

Railway

Port Binding Error

Error: address already in use

Solution:

# Ensure PORT is set as a Railway service variable and that the app listens on it
PORT=3000
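
This error often appears when the app hard-codes a port instead of reading the one it is given. As a minimal sketch, a start script could fall back to 3000 only when `PORT` is absent. Both `resolve_port` and `server.js` are hypothetical names used for illustration.

```shell
# Use the platform-provided PORT when present, else fall back to 3000
# (the value shown in this guide).
resolve_port() {
  echo "${PORT:-3000}"
}

# Usage in a start script:
#   PORT=$(resolve_port) node server.js
# To see what already holds the port on a local machine:
#   lsof -i :"$(resolve_port)"
```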

Database Connection Fails

Error: connection refused

Solution:

  1. Check database service is running
  2. Verify connection string in environment
  3. Check Railway service dependencies

Cloudflare

Worker Exceeds Size Limit

Error: script too large

Solutions:

  1. Reduce bundle size
  2. Use webpack/rollup to minimize
  3. Split into multiple workers
  4. Use ES modules

KV Namespace Not Found

Error: KV namespace binding not found

Solution:

# Create namespace (newer Wrangler releases use `wrangler kv namespace create`)
wrangler kv:namespace create BLACKROAD_PRIVATE_KV

# Add the returned namespace ID to the kv_namespaces binding in wrangler.toml

Vercel

Function Timeout

Error: FUNCTION_INVOCATION_TIMEOUT

Solutions:

  1. Optimize function performance
  2. Increase timeout in vercel.json:
{
  "functions": {
    "api/*.js": {
      "maxDuration": 10
    }
  }
}

Build Memory Exceeded

Error: Command failed with exit code 137

Solutions:

  1. Reduce build complexity
  2. Use NODE_OPTIONS="--max_old_space_size=4096"
  3. Upgrade Vercel plan for more memory
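
Exit code 137 points at memory because of how shells report signals: a process killed by signal N exits with code 128 + N, and 137 - 128 = 9, which is SIGKILL, the signal typically sent when a build is killed for exceeding its memory limit. The arithmetic as a small sketch (`signal_from_exit` is a hypothetical name):

```shell
# For exit codes above 128, the shell convention is code = 128 + signal.
signal_from_exit() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0   # not a signal-style exit code
  fi
}

# Usage:
#   signal_from_exit 137   # prints 9 (SIGKILL)
```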

Security Scan Issues

False Positive Vulnerabilities

Symptoms:

  • Security scan reports vulnerabilities in dev dependencies
  • Vulnerabilities don't affect production

Solutions:

  1. Run: npm audit --omit=dev (replaces the deprecated --production flag)
  2. Add exception in workflow if safe
  3. Update to patched versions when available

TruffleHog Blocks Commit

Symptoms:

  • Secret detected in commit
  • Deployment blocked

Solutions:

  1. DO NOT commit actual secrets
  2. Remove secret from history:
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch path/to/file" \
  --prune-empty --tag-name-filter cat -- --all
  3. Use .gitignore for sensitive files
  4. Rotate exposed secrets immediately

Monitoring Issues

Health Check Alert Spam

Symptoms:

  • Many health check failure alerts
  • Services are actually healthy

Solutions:

  1. Adjust health check frequency in workflow
  2. Implement retry logic
  3. Add grace period before alerting
  4. Check for network issues
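
Retry logic and a grace period can be combined in one small wrapper: repeat the check a few times and report failure only if every attempt fails. The sketch below is one possible shape, not the workflow's actual implementation. `retry` is a hypothetical helper, and the attempt count and delay in the usage line are assumed values.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as one attempt succeeds; fails only if all attempts fail.
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      exit 0
    fi
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  exit 1
)

# Usage: alert only if three checks in a row fail, 10 seconds apart.
#   retry 3 10 curl -fsS -o /dev/null https://your-service-url/api/health
```

With `curl -f`, any HTTP status of 400 or above counts as a failed attempt, so a single slow or failing response no longer triggers an alert on its own.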

Missing Deployment Logs

Symptoms:

  • Workflow succeeds but no logs
  • Can't debug issues

Solutions:

  1. Add more logging to workflows:
- name: Debug
  run: |
    echo "Variable: ${{ env.VAR }}"
    ls -la
  2. Enable debug logging:
    • Re-run workflow with "Enable debug logging"
  3. Check platform-specific logs

Emergency Procedures

Complete Deployment Failure

All platforms failing:

  1. Check the status pages for Railway, Cloudflare, and Vercel

  2. Rollback:

# Railway
railway rollback

# Cloudflare
wrangler rollback

# Vercel
vercel rollback <deployment-url>
  3. Manual Deploy:
# Deploy to Railway directly
cd BlackRoad-Private
railway up

# Deploy to Cloudflare
wrangler deploy

# Deploy to Vercel
vercel --prod

Data Loss Prevention

If deployment corrupts data:

  1. Immediate:

    • Stop all deployments
    • Isolate affected services
  2. Restore:

    • Find latest backup artifact
    • Download and extract
    • Redeploy from backup
  3. Verify:

    • Test all endpoints
    • Check data integrity
    • Monitor for issues

Getting Help

Before Opening Issue

  1. Check this troubleshooting guide
  2. Review workflow logs
  3. Check platform status pages
  4. Search existing issues

Creating Issue

Include:

  • Platform affected (Railway/Cloudflare/Vercel)
  • Workflow run URL
  • Error message (full text)
  • Steps to reproduce
  • Expected vs actual behavior

Escalation

For critical production issues:

  1. Create high-priority issue
  2. Tag repository maintainers
  3. Contact platform support if needed

Useful Commands

Debugging

# Check all platform statuses
curl -s https://your-railway-url/api/health | jq
curl -s https://your-cloudflare-url/api/health | jq
curl -s https://your-vercel-url/api/health | jq

# View recent deployments
railway deployments
wrangler deployments list
vercel ls

# Check logs
railway logs --tail
wrangler tail
vercel logs

# Test build locally
npm ci
npm run build
npm start

Quick Fixes

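# Clear all caches (deleting package-lock.json re-resolves dependency
# versions; commit the regenerated lockfile afterwards)
npm cache clean --force
rm -rf node_modules package-lock.json
npm install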

# Reset to working state (warning: discards local commits and uncommitted changes)
git fetch origin
git reset --hard origin/main

# Force redeploy
git commit --allow-empty -m "trigger deploy"
git push

Prevention

Best Practices

  1. Test Locally First:

    npm ci && npm run build && npm start
    
  2. Use Feature Branches:

    • Never push directly to main
    • Use PRs for review
    • Test in staging first
  3. Monitor Actively:

    • Check health dashboards daily
    • Review security scans weekly
    • Verify backups monthly
  4. Document Changes:

    • Update README for config changes
    • Note breaking changes
    • Keep runbooks current
  5. Have Rollback Plan:

    • Know how to rollback each platform
    • Keep previous deployment info
    • Test rollback procedures

Additional Resources