Fix and Expansion Blueprint for BlackRoad OS

This document provides a concise, repo-wide action plan to stabilize the existing code, raise test coverage, and expand capabilities in a controlled, incremental manner. It prioritizes high-risk areas first and defines clear ownership and exit criteria so workstreams can proceed in parallel without blocking each other.

Objectives

Restore baseline stability for backend APIs, static UI delivery, and SDKs.
Eliminate configuration drift between environments and align .env with runtime settings.
Increase automated test coverage (unit + integration) across backend, agents, and SDKs.
Consolidate the authoritative UI bundle and publish a verified artifact per release.
Prepare the platform for feature expansion (integrations, analytics, observability) with guardrails.

Workstreams

1) Environment and Config Hardening

Actions:
- Run scripts/railway/validate_env_template.py against .env.example and reconcile with app.config.Settings.
- Enforce fail-fast defaults for non-dev environments (disallow SQLite/localhost unless explicitly enabled).
- Add CI check to block merges if required env keys are missing.
Exit criteria: CI gate fails when required env vars are absent or misaligned; docs updated to reflect required secrets.

2) Backend Stabilization

Actions:
- Run ./test_all.sh --suite backend --strict; fix failing tests in routers (auth, identity, payments, integrations).
- Add contract tests around /health, auth flows, and critical integrations with mocks for external providers.
- Ensure lifespan handlers close Redis/DB cleanly; add regression test for graceful shutdown.
Exit criteria: Backend suite green in strict mode; coverage report published; health and auth routes validated in CI.

3) Agent Library Reliability

Actions:
- Execute ./test_all.sh --suite agents --strict and address flaky agents or missing fixtures.
- Document category-level capabilities and mark experimental agents; add smoke tests for registry/executor.
- Introduce deterministic seeds for any stochastic behaviors to stabilize CI runs.
Exit criteria: Agents suite green; registry smoke test executes deterministically; docs list stable vs experimental agents.

4) SDK (Python & TypeScript) Quality Pass

Actions:
- Run ./test_all.sh --suite sdk-python --strict and ./test_all.sh --suite sdk-typescript --strict.
- Align SDK authentication and error handling with backend responses; add E2E tests against local backend.
- Publish typed client generation steps so released SDKs mirror API schema.
Exit criteria: Both SDK suites green; generated clients match API schema; publish instructions in sdk/README.

5) Static UI Consolidation

Actions:
- Choose backend/static as the authoritative bundle; document deprecation path for blackroad-os/.
- Add visual regression snapshots for key views (dashboard, auth, notifications) and wire into CI.
- Provide a release script that fingerprints assets and uploads a versioned bundle for backend to serve.
Exit criteria: Single source of truth for UI; regression snapshots stored; release script produces versioned artifacts.

6) Observability & Ops

Actions:
- Enable structured logging across backend routers; add tracing hooks where supported.
- Integrate Sentry (or configured alternative) behind env flag with safe defaults.
- Document smoke test checklist in DEPLOYMENT_SMOKE_TEST_GUIDE.md and ensure it references the consolidated UI.
Exit criteria: Logs/traces emitted with request correlation IDs; optional Sentry integrated; smoke guide updated and used.

7) Expansion Pipeline

Actions:
- Define a feature toggle framework for new integrations (Stripe/Twilio/Discord/Slack) to allow staged rollout.
- Add analytics hooks for user actions in the UI and relevant backend events, guarded by opt-in env vars.
- Schedule quarterly dependency audits and supply-chain checks (pip/npm vulnerability scans) in CI.
Exit criteria: Feature flags available; analytics opt-in documented; automated dependency scans included in CI.

Execution Guidance

Start with environment validation to unblock all suites, then tackle backend and agents in parallel.
Keep changes small and merged frequently; avoid large rebases by gating on suite-level CI runs.
For any integration requiring secrets, rely on mocked providers in CI and document manual smoke steps separately.

Milestones

Stability Gate (Week 1): Env validation CI check merged; backend + agents tests audited with failing cases identified.
Consolidation (Week 2-3): Backend/static UI aligned; SDKs synced to API schema; majority of tests passing in strict mode.
Expansion Ready (Week 4): Feature flags landed; observability wired; dependency scan jobs active; release process documented.

4.8 KiB Raw Blame History