BlackRoad Raspberry Pi Cluster Test Suite
Zero physical intervention. All tests over SSH. Non-destructive. Idempotent.
Quick Start
# Run full test suite on all nodes
./run-tests.sh
# Dry run (shows what would be tested)
./run-tests.sh --dry-run
# Fast mode (skips slow tests like SMART, latency)
./run-tests.sh --fast
# Test single node
./run-tests.sh --node lucidia
# Combine flags
./run-tests.sh --fast --node octavia
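The flags above can be given in any order. A minimal sketch of how an orchestrator might parse them follows; `DRY_RUN` appears in the module template later in this document, while `FAST` and `ONLY_NODE` are hypothetical names, and the real `run-tests.sh` may handle its arguments differently.

```bash
#!/usr/bin/env bash
# Sketch only: FAST and ONLY_NODE are assumed variable names.
parse_args() {
    DRY_RUN=false
    FAST=false
    ONLY_NODE=""
    while (( $# > 0 )); do
        case "$1" in
            --dry-run) DRY_RUN=true ;;
            --fast)    FAST=true ;;
            --node)
                # --node needs a value; reject a missing or flag-like argument
                if (( $# < 2 )) || [[ "$2" == --* ]]; then
                    echo "error: --node requires a hostname" >&2
                    return 2
                fi
                ONLY_NODE=$2
                shift
                ;;
            *)
                echo "error: unknown option: $1" >&2
                return 2
                ;;
        esac
        shift
    done
}

parse_args --fast --node octavia
echo "dry_run=$DRY_RUN fast=$FAST node=${ONLY_NODE:-all}"
```

Rejecting a missing `--node` value early avoids silently testing every node when a hostname was intended.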
Prerequisites
- Passwordless SSH to all nodes (use ssh-copy-id or Tailscale)
- Nodes must be reachable by hostname (use .local mDNS or /etc/hosts)
- Bash 4.0+ on orchestrator machine
- jq for JSON parsing (optional but recommended)
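The prerequisites on the orchestrator can be checked before a run. The following is a sketch of such a preflight check, not part of the suite; the function names `bash_version_ok` and `preflight` are illustrative.

```bash
#!/usr/bin/env bash
# Sketch of a preflight check; function names are illustrative.

# Succeeds when the given Bash major version is 4 or newer.
bash_version_ok() {
    local major=$1
    (( major >= 4 ))
}

# Prints one line per problem found; prints nothing when everything is in place.
preflight() {
    bash_version_ok "${BASH_VERSINFO[0]}" || echo "Bash 4.0+ required (found ${BASH_VERSION})"
    command -v ssh >/dev/null 2>&1 || echo "ssh client not found"
    command -v jq  >/dev/null 2>&1 || echo "jq not found (optional but recommended)"
}

preflight
```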
What Gets Tested
| Test Module | Validates | Critical Checks |
|---|---|---|
| test_os.sh | Boot & OS health | Uptime, kernel, filesystem R/W, boot device |
| test_storage.sh | Storage integrity | Disk/inode usage, SMART health |
| test_network.sh | Network fabric | LAN, Tailscale, DNS, mDNS, inter-node ping |
| test_time.sh | Time authority | Chrony tracking, octavia as NTP source, drift |
| test_mqtt.sh | MQTT fabric | Broker reachable, pub/sub roundtrip, retained msgs |
| test_systemd.sh | Service health | BlackRoad services active, no failed units |
| test_hardware.sh | Hardware status | Temperature, throttling, USB, I2C, HDMI |
| test_role.sh | Role correctness | node.yaml exists, role matches hardware |
| test_ui.sh | Dashboard/UI | Grafana, kiosk service, HDMI output |
| test_agent.sh | Operator layer | Heartbeat MQTT, phase cycling, emotional envelope |
Test Results
Results are saved in results/ directory:
- Per-node JSON files: {node}_{timestamp}.json
- Summary report: summary_{timestamp}.txt
Exit Codes
- 0 = All tests passed
- 1 = Warnings present
- 2 = Failures detected
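A caller can branch on these codes. The sketch below maps each documented code to its meaning; `describe_exit_code` is a hypothetical helper, and the commented lines show one way to capture the code without tripping `set -e`.

```bash
#!/usr/bin/env bash
# Sketch: translate the suite's documented exit codes into a label.
describe_exit_code() {
    case "$1" in
        0) echo "All tests passed" ;;
        1) echo "Warnings present" ;;
        2) echo "Failures detected" ;;
        *) echo "Unexpected exit code: $1" ;;
    esac
}

# Typical use, capturing the code safely even under `set -e`:
#   rc=0
#   ./run-tests.sh --fast || rc=$?
#   describe_exit_code "$rc"
describe_exit_code 1
```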
Example Output
╔════════════════════════════════════════════════════════╗
║ BlackRoad Cluster Test Suite ║
║ Fri Dec 26 15:30:00 PST 2025 ║
╚════════════════════════════════════════════════════════╝
Mode: LIVE | Speed: FULL
Nodes: lucidia alice aria octavia shellfish
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Testing: lucidia
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
test_os... ✓ PASS
test_storage... ✓ PASS
test_network... ✓ PASS
test_time... ✓ PASS
test_mqtt... ✓ PASS
test_systemd... ✓ PASS
test_hardware... ✓ PASS
test_role... ✓ PASS
test_ui... ✓ PASS
test_agent... ✓ PASS
Summary: 10 pass | 0 warn | 0 fail
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Testing: octavia
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
test_os... ✓ PASS
test_storage... ⚠ WARN
test_network... ✓ PASS
test_time... ✓ PASS
test_mqtt... ✓ PASS
test_systemd... ✓ PASS
test_hardware... ✓ PASS
test_role... ✓ PASS
test_ui... ✓ PASS
test_agent... ✓ PASS
Summary: 9 pass | 1 warn | 0 fail
╔════════════════════════════════════════════════════════╗
║ Cluster Summary ║
╚════════════════════════════════════════════════════════╝
Total Tests: 50
Pass: 48
Warn: 2
Fail: 0
Results saved to: /Users/alexa/blackroad-cluster-tests/results
Status: WARNINGS PRESENT
Safety Guarantees
✅ No destructive operations
- No mkfs, dd, fdisk, or partition changes
- No EEPROM writes
- No forced reboots
- No service restarts
✅ Read-only by default
- All tests use read operations
- MQTT tests use temporary topics and clean up
- File system checks are non-invasive
✅ Idempotent execution
- Safe to run every minute
- Results don't affect system state
- No cumulative side effects
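The MQTT guarantee (temporary topics that are cleaned up) could be implemented along these lines. This is a sketch under assumptions: the topic prefix `blackroad/test/` and both function names are hypothetical, and the actual `test_mqtt.sh` may work differently. It relies on the `mosquitto_sub` and `mosquitto_pub` clients from the mosquitto-clients package.

```bash
#!/usr/bin/env bash
# Sketch only: the blackroad/test/ prefix and function names are assumptions.

# Build a topic that is unique per node and per process.
temp_topic() {
    local node=$1
    echo "blackroad/test/${node}/$$-${RANDOM}"
}

# Publish one message, read it back, then clear any retained copy.
mqtt_roundtrip() {
    local broker=$1 topic=$2 payload="ping-$$" received tmp sub_pid
    tmp=$(mktemp)
    # Subscribe in the background: -C 1 exits after one message,
    # -W 5 gives up after 5 seconds.
    mosquitto_sub -h "$broker" -t "$topic" -C 1 -W 5 > "$tmp" &
    sub_pid=$!
    sleep 1   # give the subscriber time to connect
    mosquitto_pub -h "$broker" -t "$topic" -m "$payload"
    wait "$sub_pid" || true
    received=$(cat "$tmp")
    rm -f "$tmp"
    # A retained zero-length message removes anything retained on the topic;
    # it is harmless when nothing was retained.
    mosquitto_pub -h "$broker" -t "$topic" -r -n
    [[ "$received" == "$payload" ]]
}

# Example (requires a reachable broker):
#   mqtt_roundtrip octavia "$(temp_topic lucidia)" && echo "roundtrip ok"
```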
Integration Examples
Cron Job (every 15 minutes)
*/15 * * * * cd /home/alexa/blackroad-cluster-tests && ./run-tests.sh --fast
MQTT Alert on Failure
./run-tests.sh || mosquitto_pub -h octavia -t blackroad/alerts/tests -m "Test suite failed at $(date)"
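Because exit code 1 signals warnings, the one-liner above also publishes an alert when the suite only warns. The following sketch alerts on failures (exit code 2) alone; `alert_message` and `run_and_alert` are hypothetical names, while the broker and topic are taken from the one-liner.

```bash
#!/usr/bin/env bash
# Sketch: alert on failures (exit code 2) but stay quiet on warnings (exit code 1).

# Prints the alert text for a given exit code, or nothing if no alert is due.
alert_message() {
    local rc=$1
    if (( rc >= 2 )); then
        echo "Test suite failed (exit ${rc}) at $(date)"
    fi
}

# Runs the suite, publishes an alert if one is due, and passes the code through.
run_and_alert() {
    local rc=0 msg
    ./run-tests.sh --fast || rc=$?
    msg=$(alert_message "$rc")
    if [[ -n "$msg" ]]; then
        mosquitto_pub -h octavia -t blackroad/alerts/tests -m "$msg"
    fi
    return "$rc"
}
```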
Dashboard Integration
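# Parse JSON results and send to monitoring system
./run-tests.sh --fast
for result in results/*_"$(date +%Y%m%d)"*.json; do
    # Skip the literal pattern when no result file matched
    [ -e "$result" ] || continue
    # Declare the body as JSON; curl -d alone would send it as form data
    jq -c . "$result" | curl -sS -X POST -H 'Content-Type: application/json' --data-binary @- https://monitor.blackroad.com/api/test-results
done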
Pre-Deployment Check
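#!/bin/bash
# deploy.sh
set -e
echo "Running cluster health check..."
# Test the command directly: under set -e, a separate $? check would never be reached.
# Any non-zero exit (warnings or failures) aborts the deployment.
if ! ./run-tests.sh --fast; then
    echo "❌ Cluster health check failed. Aborting deployment."
    exit 1
fi
echo "✅ Cluster healthy. Proceeding with deployment..."
# ... deployment logic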
Customization
Add Custom Test Module
Create modules/test_custom.sh:
#!/usr/bin/env bash
set -euo pipefail
output_result() {
local status=$1
local message=$2
local details=${3:-"{}"}
echo "{\"test\": \"custom\", \"status\": \"$status\", \"message\": \"$message\", \"details\": $details}"
}
if [[ "${DRY_RUN:-false}" == "true" ]]; then
output_result "DRY_RUN" "Would check custom metric"
exit 0
fi
# Your test logic here
# ...
output_result "PASS" "Custom check passed" "{}"
Then add to TEST_MODULES array in run-tests.sh.
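The template interpolates the message directly into the JSON string, so a message containing a double quote or a backslash would produce invalid JSON. A sketch of one way to guard against that in pure Bash follows; `json_escape` is a hypothetical helper, not part of the suite.

```bash
#!/usr/bin/env bash
# Sketch: json_escape is a hypothetical helper, not part of the suite.
json_escape() {
    local s=$1
    s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
    s=${s//\"/\\\"}      # double quote
    s=${s//$'\n'/\\n}    # newline
    s=${s//$'\t'/\\t}    # tab
    printf '%s' "$s"
}

output_result() {
    local status=$1
    local message=$2
    local details=${3:-"{}"}
    printf '{"test": "custom", "status": "%s", "message": "%s", "details": %s}\n' \
        "$status" "$(json_escape "$message")" "$details"
}

output_result "WARN" 'Path "/mnt/data" not mounted'
```

The `details` argument is passed through unescaped because it is expected to be JSON already.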
Change Target Nodes
Edit NODES array in run-tests.sh:
NODES=(lucidia alice aria octavia shellfish)
Adjust Thresholds
Edit individual test modules. Example in test_storage.sh:
# Change disk warning from 80% to 70%
elif [[ $ROOT_USAGE -gt 70 ]]; then
output_result "WARN" "Root disk usage high (${ROOT_USAGE}%)" "$DETAILS"
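Thresholds can also be passed as parameters so that they are adjusted in one place. The sketch below is an assumption about how that might look, not the code in test_storage.sh: the failure threshold of 90 and the variables `DISK_WARN` and `DISK_FAIL` are hypothetical; only the warning threshold of 70 comes from the excerpt above.

```bash
#!/usr/bin/env bash
# Sketch: thresholds as parameters instead of literals.
# The FAIL threshold (90) and DISK_WARN/DISK_FAIL are assumptions.
classify_usage() {
    local usage=$1 warn=${2:-70} fail=${3:-90}
    if (( usage > fail )); then
        echo "FAIL"
    elif (( usage > warn )); then
        echo "WARN"
    else
        echo "PASS"
    fi
}

# Root filesystem usage as a bare integer (POSIX df output, column 5, "%" stripped).
root_usage() {
    df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

classify_usage "$(root_usage)" "${DISK_WARN:-70}" "${DISK_FAIL:-90}"
```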
Troubleshooting
SSH Connection Issues
# Test manual SSH
ssh lucidia "echo test"
# Set up passwordless auth
ssh-copy-id lucidia
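To check every node in one pass, a loop such as the following may help. It is a sketch; `check_nodes` is an illustrative name. `BatchMode=yes` makes ssh fail rather than prompt for a password, which is what distinguishes working key-based auth from a password fallback.

```bash
#!/usr/bin/env bash
# Sketch: report which nodes accept non-interactive SSH.
check_nodes() {
    local node failed=0
    for node in "$@"; do
        # -n keeps ssh from consuming stdin; ConnectTimeout bounds the wait per node.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
            echo "$node: ok"
        else
            echo "$node: unreachable or key not installed"
            failed=1
        fi
    done
    return "$failed"
}

# Example:
#   check_nodes lucidia alice aria octavia shellfish
```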
Missing Tools on Nodes
# Install on each node
ssh lucidia "sudo apt-get update && sudo apt-get install -y jq mosquitto-clients i2c-tools smartmontools"
Slow Test Execution
# Use fast mode to skip SMART checks and latency tests
./run-tests.sh --fast
JSON Parsing Errors
# Install jq on orchestrator machine
brew install jq # macOS
sudo apt-get install -y jq # Debian/Ubuntu
Architecture
run-tests.sh (orchestrator)
│
├─ SSH to each node
│ │
│ ├─ modules/test_os.sh → JSON output
│ ├─ modules/test_storage.sh → JSON output
│ ├─ modules/test_network.sh → JSON output
│ └─ ... (all test modules)
│
└─ Aggregate results → results/{node}_{timestamp}.json
→ results/summary_{timestamp}.txt
Each test module:
- Runs read-only commands
- Outputs single-line JSON to stdout
- Returns 0 (script always succeeds; status in JSON)
Orchestrator:
- Copies test module via stdin to SSH
- Executes remotely
- Captures JSON output (last line)
- Aggregates into cluster summary
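The orchestrator steps above can be sketched as follows. This is an illustration under assumptions, not the code in `run-tests.sh`: `run_module` and `status_of` are hypothetical names, and passing `DRY_RUN` on the remote command line is one possible way to forward the setting.

```bash
#!/usr/bin/env bash
# Sketch of the orchestrator steps; run_module and status_of are illustrative names.

# Stream a module to a node over SSH and print the last line of its stdout.
run_module() {
    local node=$1 module=$2
    # `bash -s` reads the script from stdin, so nothing is written to the node.
    # Diagnostics may precede the result, so keep only the last line.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" \
        "DRY_RUN=${DRY_RUN:-false} bash -s" < "$module" | tail -n 1
}

# Pull the "status" field out of a module's JSON line without requiring jq.
status_of() {
    local json=$1
    local re='"status": *"([A-Z_]+)"'
    if [[ "$json" =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        echo "UNKNOWN"
    fi
}

# Example:
#   line=$(run_module lucidia modules/test_os.sh)
#   echo "test_os: $(status_of "$line")"
```

Treating unparseable output as `UNKNOWN` keeps a broken module from being counted as a pass.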
License
MIT - BlackRoad Systems 2025