# BlackRoad Raspberry Pi Cluster Test Suite **Zero physical intervention. All tests over SSH. Non-destructive. Idempotent.** ## Quick Start ```bash # Run full test suite on all nodes ./run-tests.sh # Dry run (shows what would be tested) ./run-tests.sh --dry-run # Fast mode (skips slow tests like SMART, latency) ./run-tests.sh --fast # Test single node ./run-tests.sh --node lucidia # Combine flags ./run-tests.sh --fast --node octavia ``` ## Prerequisites - Passwordless SSH to all nodes (use `ssh-copy-id` or Tailscale) - Nodes must be reachable by hostname (use `.local` mDNS or `/etc/hosts`) - Bash 4.0+ on orchestrator machine - `jq` for JSON parsing (optional but recommended) ## What Gets Tested | Test Module | Validates | Critical Checks | |------------|-----------|----------------| | `test_os.sh` | Boot & OS health | Uptime, kernel, filesystem R/W, boot device | | `test_storage.sh` | Storage integrity | Disk/inode usage, SMART health | | `test_network.sh` | Network fabric | LAN, Tailscale, DNS, mDNS, inter-node ping | | `test_time.sh` | Time authority | Chrony tracking, octavia as NTP source, drift | | `test_mqtt.sh` | MQTT fabric | Broker reachable, pub/sub roundtrip, retained msgs | | `test_systemd.sh` | Service health | BlackRoad services active, no failed units | | `test_hardware.sh` | Hardware status | Temperature, throttling, USB, I2C, HDMI | | `test_role.sh` | Role correctness | node.yaml exists, role matches hardware | | `test_ui.sh` | Dashboard/UI | Grafana, kiosk service, HDMI output | | `test_agent.sh` | Operator layer | Heartbeat MQTT, phase cycling, emotional envelope | ## Test Results Results are saved in `results/` directory: - Per-node JSON files: `{node}_{timestamp}.json` - Summary report: `summary_{timestamp}.txt` ### Exit Codes - `0` = All tests passed - `1` = Warnings present - `2` = Failures detected ## Example Output ``` ╔════════════════════════════════════════════════════════╗ ║ BlackRoad Cluster Test Suite ║ ║ Fri Dec 26 15:30:00 PST 2025 ║ ╚════════════════════════════════════════════════════════╝ Mode: LIVE | Speed: FULL Nodes: lucidia alice aria octavia shellfish ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Testing: lucidia ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ test_os... ✓ PASS test_storage... ✓ PASS test_network... ✓ PASS test_time... ✓ PASS test_mqtt... ✓ PASS test_systemd... ✓ PASS test_hardware... ✓ PASS test_role... ✓ PASS test_ui... ✓ PASS test_agent... ✓ PASS Summary: 10 pass | 0 warn | 0 fail ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Testing: octavia ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ test_os... ✓ PASS test_storage... ⚠ WARN test_network... ✓ PASS test_time... ✓ PASS test_mqtt... ✓ PASS test_systemd... ✓ PASS test_hardware... ✓ PASS test_role... ✓ PASS test_ui... ✓ PASS test_agent... ✓ PASS Summary: 9 pass | 1 warn | 0 fail ╔════════════════════════════════════════════════════════╗ ║ Cluster Summary ║ ╚════════════════════════════════════════════════════════╝ Total Tests: 50 Pass: 48 Warn: 2 Fail: 0 Results saved to: /Users/alexa/blackroad-cluster-tests/results Status: WARNINGS PRESENT ``` ## Safety Guarantees ✅ **No destructive operations** - No `mkfs`, `dd`, `fdisk`, or partition changes - No EEPROM writes - No forced reboots - No service restarts ✅ **Read-only by default** - All tests use read operations - MQTT tests use temporary topics and clean up - File system checks are non-invasive ✅ **Idempotent execution** - Safe to run every minute - Results don't affect system state - No cumulative side effects ## Integration Examples ### Cron Job (every 15 minutes) ```bash */15 * * * * cd /home/alexa/blackroad-cluster-tests && ./run-tests.sh --fast ``` ### MQTT Alert on Failure ```bash ./run-tests.sh || mosquitto_pub -h octavia -t blackroad/alerts/tests -m "Test suite failed at $(date)" ``` ### Dashboard Integration ```bash # Parse JSON results and send to monitoring system ./run-tests.sh --fast for result in results/*_$(date +%Y%m%d)*.json; do jq -c . "$result" | curl -X POST https://monitor.blackroad.com/api/test-results -d @- done ``` ### Pre-Deployment Check ```bash #!/bin/bash # deploy.sh set -e echo "Running cluster health check..." ./run-tests.sh --fast if [ $? -ne 0 ]; then echo "❌ Cluster health check failed. Aborting deployment." exit 1 fi echo "✅ Cluster healthy. Proceeding with deployment..." # ... deployment logic ``` ## Customization ### Add Custom Test Module Create `modules/test_custom.sh`: ```bash #!/usr/bin/env bash set -euo pipefail output_result() { local status=$1 local message=$2 local details=${3:-"{}"} echo "{\"test\": \"custom\", \"status\": \"$status\", \"message\": \"$message\", \"details\": $details}" } if [[ "${DRY_RUN:-false}" == "true" ]]; then output_result "DRY_RUN" "Would check custom metric" exit 0 fi # Your test logic here # ... output_result "PASS" "Custom check passed" "{}" ``` Then add to `TEST_MODULES` array in `run-tests.sh`. ### Change Target Nodes Edit `NODES` array in `run-tests.sh`: ```bash NODES=(lucidia alice aria octavia shellfish) ``` ### Adjust Thresholds Edit individual test modules. Example in `test_storage.sh`: ```bash # Change disk warning from 80% to 70% elif [[ $ROOT_USAGE -gt 70 ]]; then output_result "WARN" "Root disk usage high (${ROOT_USAGE}%)" "$DETAILS" ``` ## Troubleshooting ### SSH Connection Issues ```bash # Test manual SSH ssh lucidia "echo test" # Set up passwordless auth ssh-copy-id lucidia ``` ### Missing Tools on Nodes ```bash # Install on each node ssh lucidia "sudo apt-get update && sudo apt-get install -y jq mosquitto-clients i2c-tools smartmontools" ``` ### Slow Test Execution ```bash # Use fast mode to skip SMART checks and latency tests ./run-tests.sh --fast ``` ### JSON Parsing Errors ```bash # Install jq on orchestrator machine brew install jq # macOS apt-get install jq # Debian/Ubuntu ``` ## Architecture ``` run-tests.sh (orchestrator) │ ├─ SSH to each node │ │ │ ├─ modules/test_os.sh → JSON output │ ├─ modules/test_storage.sh → JSON output │ ├─ modules/test_network.sh → JSON output │ └─ ... (all test modules) │ └─ Aggregate results → results/{node}_{timestamp}.json → results/summary_{timestamp}.txt ``` Each test module: 1. Runs read-only commands 2. Outputs single-line JSON to stdout 3. Returns 0 (script always succeeds; status in JSON) Orchestrator: 1. Copies test module via stdin to SSH 2. Executes remotely 3. Captures JSON output (last line) 4. Aggregates into cluster summary ## License MIT - BlackRoad Systems 2025