# 🤗 GreenLight AI & ML Extension
**Extension to GreenLight for HuggingFace & AI Infrastructure**
---
## 🤖 AI Model Lifecycle States
Add these to the **Lifecycle States** category:
| Emoji | State | Code | Trinary | Description |
|-------|-------|------|---------|-------------|
| 🤗 | MODEL_LOADING | `model_loading` | 0 | Loading model into memory |
| 🧠 | MODEL_READY | `model_ready` | +1 | Model loaded and ready |
| ⚡ | INFERENCE_RUNNING | `inference_running` | +1 | Generating output |
| 🔄 | TOKEN_STREAMING | `token_streaming` | +1 | Streaming tokens |
| 💾 | MODEL_CACHED | `model_cached` | +1 | Model in cache |
| 📥 | MODEL_DOWNLOADING | `model_downloading` | 0 | Downloading weights |
| 🏋️ | MODEL_TRAINING | `model_training` | +1 | Model training |
| 📊 | MODEL_EVAL | `model_eval` | 0 | Evaluating performance |
| 🔧 | MODEL_FINE_TUNING | `model_fine_tuning` | +1 | Fine-tuning model |
| ⏱️ | INFERENCE_TIMEOUT | `inference_timeout` | -1 | Request timed out |
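The trinary column in this table can be resolved programmatically. A minimal sketch — the function name `gl_state_trinary` is illustrative, not part of base GreenLight:

```bash
# Map an AI lifecycle state code to its trinary value, per the table above.
# Hypothetical helper; name and interface are not part of base GreenLight.
gl_state_trinary() {
  case "$1" in
    model_ready|inference_running|token_streaming|model_cached|model_training|model_fine_tuning)
      echo "+1" ;;   # active / healthy states
    model_loading|model_downloading|model_eval)
      echo "0" ;;    # in-progress / neutral states
    inference_timeout)
      echo "-1" ;;   # failure state
    *)
      echo "0" ;;    # unknown states default to neutral
  esac
}
```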
---
## 🎯 AI Task Categories
Add to **Domain Tags**:
| Emoji | Category | Code | Description |
|-------|----------|------|-------------|
| 💬 | TEXT_GEN | `text_gen` | Text generation / completion |
| 🗣️ | CHAT | `chat` | Chat completions |
| 🎨 | IMAGE_GEN | `image_gen` | Image generation |
| 🖼️ | IMAGE_EDIT | `image_edit` | Image editing / inpainting |
| 🔤 | EMBEDDINGS | `embeddings` | Vector embeddings |
| 🔍 | OCR | `ocr` | Optical character recognition |
| 🎙️ | TTS | `tts` | Text to speech |
| 👂 | STT | `stt` | Speech to text |
| 🎥 | VIDEO_GEN | `video_gen` | Video generation |
| 🔬 | CLASSIFICATION | `classification` | Classification tasks |
---
## 🏗️ AI Infrastructure Components
| Emoji | Component | Code | Description |
|-------|-----------|------|-------------|
| 🤗 | HUGGINGFACE | `huggingface` | HuggingFace platform |
| ⚡ | VLLM | `vllm` | vLLM inference server |
| 🦙 | LLAMA_CPP | `llama_cpp` | llama.cpp engine |
| 🔥 | TRANSFORMERS | `transformers` | Transformers library |
| 🌐 | INFERENCE_ENDPOINT | `inference_endpoint` | HF Inference Endpoint |
| 📦 | MODEL_HUB | `model_hub` | Model repository |
| 🚀 | SPACE | `space` | HuggingFace Space |
| 💾 | MODEL_CACHE | `model_cache` | Model caching layer |
| 🖥️ | GPU_INSTANCE | `gpu_instance` | GPU compute instance |
---
## 🎛️ GPU Instance Types
| Emoji | Instance | Code | VRAM | Description |
|-------|----------|------|------|-------------|
| 🟢 | T4 | `t4` | 16GB | Small models / testing |
| 🔵 | L4 | `l4` | 24GB | 7B-13B models |
| 🟡 | A10G | `a10g` | 24GB | Production inference |
| 🟠 | A100 | `a100` | 80GB | Large models (70B+) |
| 🔴 | H100 | `h100` | 80GB | Maximum performance |
| 🟣 | JETSON | `jetson` | 8GB | Edge inference |
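The VRAM column can back a simple capacity lookup. A sketch under the table's figures — `gl_instance_vram_gb` is a hypothetical name, not part of base GreenLight:

```bash
# Look up VRAM (in GB) for a GPU instance code, per the table above.
# Hypothetical helper for capacity checks; not part of base GreenLight.
gl_instance_vram_gb() {
  case "$1" in
    t4)        echo 16 ;;
    l4|a10g)   echo 24 ;;
    a100|h100) echo 80 ;;
    jetson)    echo 8  ;;
    *)         echo 0  ;;   # unknown instance type
  esac
}
```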
---
## 🎨 Composite Patterns for AI
### Model Operations
```
🤗📥👉📌 = Downloading model, micro scale
🧠✅🎢⭐ = Model loaded, macro impact, high priority
⚡🔄💬🌀 = Streaming chat tokens, AI domain
🎨✅👉📌 = Image generated, micro scale
```
### Inference Flows
```
⚡💬🧠✅ = Chat inference running, model ready
⚡🎨🖼️✅ = Image generation complete
🔤💾🎢✅ = Embeddings cached, macro scale
🔍📄👉✅ = OCR completed, micro scale
```
### Infrastructure
```
🚀🤗🌐✅ = HF Space deployed
🌐⚡🟡📌 = Inference endpoint on A10G
💾🧠🎢⭐ = Model cached, high priority
⏱️❌🧠🔥 = Inference timeout, fire priority
```
### Combined AI Flow
```
[⚡📥] [🤗🧠] [⚡💬] [🔄📊] [✅🎉] = Request → Load → Inference → Stream → Complete
[🎨⚡] [🖼️✅] = Image generation → success
[🔤💾] [✅🎢] = Embeddings → cached
```
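The bracketed flow notation above can be generated from a list of composites. A small sketch — `gl_flow` is a hypothetical name, not part of base GreenLight:

```bash
# Join emoji composites into the bracketed flow notation used above,
# e.g. gl_flow "⚡📥" "🤗🧠" prints: [⚡📥] [🤗🧠]
# Hypothetical helper; not part of base GreenLight.
gl_flow() {
  local out="" pair
  for pair in "$@"; do
    out="${out:+$out }[$pair]"
  done
  printf '%s\n' "$out"
}
```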
---
## 📝 NATS Subject Patterns (AI)
### Inference Events
```
greenlight.inference.started.micro.ai.{model_name}
greenlight.inference.completed.micro.ai.{model_name}
greenlight.inference.failed.micro.ai.{model_name}
greenlight.inference.timeout.micro.ai.{model_name}
```
### Model Events
```
greenlight.model.loaded.macro.ai.{model_name}
greenlight.model.cached.micro.ai.{model_name}
greenlight.model.downloading.micro.ai.{model_name}
greenlight.model.uploaded.macro.ai.{model_name}
```
### Endpoint Events
```
greenlight.endpoint.created.macro.ai.{endpoint_name}
greenlight.endpoint.paused.micro.ai.{endpoint_name}
greenlight.endpoint.resumed.micro.ai.{endpoint_name}
greenlight.endpoint.scaled.macro.ai.{endpoint_name}
```
### Task-Specific Events
```
greenlight.chat.completed.micro.ai.{model}
greenlight.image.generated.micro.ai.{model}
greenlight.embeddings.cached.micro.ai.{model}
greenlight.ocr.completed.micro.ai.{file}
```
---
## 🔨 AI Memory Templates
### Model Operations
```bash
# Model loading
gl_model_loading() {
  local model_name="$1"
  local size="${2:-unknown}"
  gl_log "🤗📥👉📌" "model_loading" "$model_name" \
    "Loading model: $size"
}

# Model ready
gl_model_ready() {
  local model_name="$1"
  local vram="${2:-unknown}"
  gl_log "🧠✅🎢⭐" "model_ready" "$model_name" \
    "Model loaded, VRAM: $vram"
}

# Model cached
gl_model_cached() {
  local model_name="$1"
  local cache_key="$2"
  gl_log "💾🧠👉📌" "model_cached" "$model_name" \
    "Model cached: $cache_key"
}

# Model downloading
gl_model_downloading() {
  local model_name="$1"
  local size="${2:-unknown}"
  gl_log "📥🤗👉📌" "model_downloading" "$model_name" \
    "Downloading: $size"
}

# Model uploaded
gl_model_uploaded() {
  local model_name="$1"
  local repo_id="$2"
  gl_log "📤🤗🎢✅" "model_uploaded" "$model_name" \
    "Uploaded to: $repo_id"
}
```
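These templates assume a `gl_log` helper provided by base GreenLight. For testing them in isolation, a minimal stub that mimics the `[emoji] event: subject — message` output format used in this document's example scenarios might look like this; the real implementation may differ:

```bash
# Minimal gl_log stub for local testing. The real gl_log ships with base
# GreenLight; this sketch only reproduces the output format shown in the
# example scenarios: [emoji] event: subject — message
gl_log() {
  local emoji="$1" event="$2" subject="$3" message="$4"
  printf '[%s] %s: %s — %s\n' "$emoji" "$event" "$subject" "$message"
}
```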
### Inference Operations
```bash
# Inference started
gl_inference_start() {
  local task_type="$1"   # chat, text_gen, image_gen, etc.
  local model="$2"
  local request_id="${3:-$(uuidgen)}"
  local task_emoji=""
  case "$task_type" in
    chat)       task_emoji="💬" ;;
    text_gen)   task_emoji="💬" ;;
    image_gen)  task_emoji="🎨" ;;
    embeddings) task_emoji="🔤" ;;
    ocr)        task_emoji="🔍" ;;
    tts)        task_emoji="🎙️" ;;
    video_gen)  task_emoji="🎥" ;;
    *)          task_emoji="🤖" ;;
  esac
  gl_log "⚡${task_emoji}👉📌" "inference_start" "$model" \
    "$task_type inference started: $request_id"
}

# Inference complete
gl_inference_complete() {
  local task_type="$1"
  local model="$2"
  local duration="${3:-unknown}"
  local task_emoji=""
  case "$task_type" in
    chat)       task_emoji="💬" ;;
    text_gen)   task_emoji="💬" ;;
    image_gen)  task_emoji="🎨" ;;
    embeddings) task_emoji="🔤" ;;
    ocr)        task_emoji="🔍" ;;
    *)          task_emoji="🤖" ;;
  esac
  gl_log "✅${task_emoji}🎢🎉" "inference_complete" "$model" \
    "$task_type complete in $duration"
}

# Inference failed
gl_inference_failed() {
  local task_type="$1"
  local model="$2"
  local error="${3:-unknown error}"
  gl_log "❌⚡🤖🔥" "inference_failed" "$model" \
    "$task_type failed: $error"
}

# Token streaming
gl_token_streaming() {
  local model="$1"
  local tokens_generated="$2"
  gl_log "🔄💬⚡👉" "token_streaming" "$model" \
    "Streaming: $tokens_generated tokens"
}

# Inference timeout
gl_inference_timeout() {
  local model="$1"
  local timeout_seconds="$2"
  gl_log "⏱️❌🤖🔥" "inference_timeout" "$model" \
    "Timed out after ${timeout_seconds}s"
}
```
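`gl_inference_start` and `gl_inference_complete` each carry their own task-emoji `case` block. The mapping could instead be factored into one shared helper covering the full AI Task Categories table. A sketch — `gl_task_emoji` is a hypothetical refactor, not part of base GreenLight:

```bash
# Shared task-type → emoji mapping, per the AI Task Categories table.
# Hypothetical refactor; not part of base GreenLight.
gl_task_emoji() {
  case "$1" in
    chat|text_gen)  echo "💬" ;;
    image_gen)      echo "🎨" ;;
    image_edit)     echo "🖼️" ;;
    embeddings)     echo "🔤" ;;
    ocr)            echo "🔍" ;;
    tts)            echo "🎙️" ;;
    stt)            echo "👂" ;;
    video_gen)      echo "🎥" ;;
    classification) echo "🔬" ;;
    *)              echo "🤖" ;;   # unknown task types fall back to generic AI
  esac
}
```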
### Endpoint Management
```bash
# Endpoint created
gl_endpoint_created() {
  local endpoint_name="$1"
  local model="$2"
  local instance_type="${3:-unknown}"
  local instance_emoji=""
  case "$instance_type" in
    *t4*)   instance_emoji="🟢" ;;
    *l4*)   instance_emoji="🔵" ;;
    *a10g*) instance_emoji="🟡" ;;
    *a100*) instance_emoji="🟠" ;;
    *h100*) instance_emoji="🔴" ;;
    *)      instance_emoji="🖥️" ;;
  esac
  gl_log "🚀🌐${instance_emoji}✅" "endpoint_created" "$endpoint_name" \
    "Endpoint created: $model on $instance_type"
}

# Endpoint paused
gl_endpoint_paused() {
  local endpoint_name="$1"
  gl_log "⏸️🌐👉📌" "endpoint_paused" "$endpoint_name" \
    "Endpoint paused (cost savings)"
}

# Endpoint resumed
gl_endpoint_resumed() {
  local endpoint_name="$1"
  gl_log "▶️🌐👉📌" "endpoint_resumed" "$endpoint_name" \
    "Endpoint resumed"
}

# Endpoint scaled
gl_endpoint_scaled() {
  local endpoint_name="$1"
  local replicas="$2"
  gl_log "📈🌐🎢⭐" "endpoint_scaled" "$endpoint_name" \
    "Scaled to $replicas replicas"
}

# Endpoint deleted
gl_endpoint_deleted() {
  local endpoint_name="$1"
  gl_log "🗑️🌐👉📌" "endpoint_deleted" "$endpoint_name" \
    "Endpoint deleted"
}
```
### Space Operations
```bash
# Space deployed
gl_space_deployed() {
  local space_name="$1"
  local url="$2"
  gl_log "🚀🤗🌐✅" "space_deployed" "$space_name" \
    "Space deployed: $url"
}

# Space invoked
gl_space_invoked() {
  local space_id="$1"
  local task_type="$2"
  gl_log "⚡🤗👉📌" "space_invoked" "$space_id" \
    "Space invoked for: $task_type"
}
```
---
## 🎯 Example Integration: Complete Inference Flow
### Scenario: Chat inference with DeepSeek
```bash
# 1. Load model
gl_model_loading "deepseek-ai/DeepSeek-V3.2" "7B"
# [🤗📥👉📌] model_loading: deepseek-ai/DeepSeek-V3.2 — Loading model: 7B
# 2. Model ready
gl_model_ready "deepseek-ai/DeepSeek-V3.2" "16GB"
# [🧠✅🎢⭐] model_ready: deepseek-ai/DeepSeek-V3.2 — Model loaded, VRAM: 16GB
# 3. Start inference
gl_inference_start "chat" "deepseek-ai/DeepSeek-V3.2" "req_abc123"
# [⚡💬👉📌] inference_start: deepseek-ai/DeepSeek-V3.2 — chat inference started: req_abc123
# 4. Token streaming
gl_token_streaming "deepseek-ai/DeepSeek-V3.2" "247"
# [🔄💬⚡👉] token_streaming: deepseek-ai/DeepSeek-V3.2 — Streaming: 247 tokens
# 5. Complete
gl_inference_complete "chat" "deepseek-ai/DeepSeek-V3.2" "3.2s"
# [✅💬🎢🎉] inference_complete: deepseek-ai/DeepSeek-V3.2 — chat complete in 3.2s
```
### Scenario: Image generation with FLUX
```bash
# 1. Space invoked
gl_space_invoked "black-forest-labs/FLUX.1-dev" "image_gen"
# [⚡🤗👉📌] space_invoked: black-forest-labs/FLUX.1-dev — Space invoked for: image_gen
# 2. Start inference
gl_inference_start "image_gen" "FLUX.1-dev" "img_xyz789"
# [⚡🎨👉📌] inference_start: FLUX.1-dev — image_gen inference started: img_xyz789
# 3. Complete
gl_inference_complete "image_gen" "FLUX.1-dev" "12.4s"
# [✅🎨🎢🎉] inference_complete: FLUX.1-dev — image_gen complete in 12.4s
```
### Scenario: Endpoint lifecycle (Lucidia)
```bash
# 1. Create endpoint
gl_endpoint_created "lucidia-inference" "blackroadio/Lucidia" "nvidia-a10g"
# [🚀🌐🟡✅] endpoint_created: lucidia-inference — Endpoint created: blackroadio/Lucidia on nvidia-a10g
# 2. Run inference
gl_inference_start "chat" "lucidia-inference" "req_lucidia_001"
# [⚡💬👉📌] inference_start: lucidia-inference — chat inference started: req_lucidia_001
# 3. Complete
gl_inference_complete "chat" "lucidia-inference" "2.1s"
# [✅💬🎢🎉] inference_complete: lucidia-inference — chat complete in 2.1s
# 4. Pause for cost savings
gl_endpoint_paused "lucidia-inference"
# [⏸️🌐👉📌] endpoint_paused: lucidia-inference — Endpoint paused (cost savings)
# 5. Resume when needed
gl_endpoint_resumed "lucidia-inference"
# [▶️🌐👉📌] endpoint_resumed: lucidia-inference — Endpoint resumed
```
### Scenario: Inference failure and timeout
```bash
# 1. Start inference
gl_inference_start "chat" "large-model" "req_fail"
# [⚡💬👉📌] inference_start: large-model — chat inference started: req_fail
# 2. Timeout
gl_inference_timeout "large-model" "30"
# [⏱️❌🤖🔥] inference_timeout: large-model — Timed out after 30s
# 3. Failed
gl_inference_failed "chat" "large-model" "OOM error"
# [❌⚡🤖🔥] inference_failed: large-model — chat failed: OOM error
```
---
## 📊 AI Analytics Integration
### Performance Tracking
```bash
# Inference latency
gl_log "📊⚡🎢📌" "latency_metric" "ai-metrics" "p95 latency: 3.2s (chat)"
# Token throughput
gl_log "📊💬👉📌" "throughput_metric" "ai-metrics" "Throughput: 45 tokens/sec"
# Cache hit rate
gl_log "📊💾🎢⭐" "cache_metric" "ai-metrics" "Cache hit rate: 78%"
```
### Cost Tracking
```bash
# GPU costs
gl_log "💰🖥️🎢📌" "gpu_cost" "ai-billing" "A10G usage: $2.47/hour"
# Inference costs
gl_log "💰⚡👉📌" "inference_cost" "ai-billing" "1,247 requests: $12.34"
```
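Dollar figures like the A10G entry above can be computed rather than hand-written. A sketch using `awk` for the floating-point arithmetic — `gl_gpu_cost`, and the rate and hours in the example, are illustrative assumptions:

```bash
# Compute a GPU cost string from an hourly rate (USD) and hours used.
# Hypothetical helper for the ai-billing entries above; not part of
# base GreenLight. Uses awk because POSIX shell lacks float arithmetic.
gl_gpu_cost() {
  local rate="$1" hours="$2"
  awk -v r="$rate" -v h="$hours" 'BEGIN { printf "$%.2f\n", r * h }'
}

# Example: 10 hours of A10G at $2.47/hour
# gl_gpu_cost 2.47 10  → $24.70
```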
---
## 📚 Integration Checklist
- [x] Extended lifecycle states for AI operations
- [x] Added AI task category tags
- [x] Created infrastructure component tags
- [x] Mapped GPU instance types
- [x] Created composite patterns for inference flows
- [x] Extended NATS subjects for AI events
- [x] Built 15+ AI-specific templates
- [x] Integrated with 27-step GreenLight workflow
- [x] Added analytics tracking patterns
- [x] Added cost tracking patterns
---
## 🎯 HuggingFace Account Details
**Username:** blackroadio
**Profile:** https://huggingface.co/blackroadio
**Models:** 2 (Lucidia, qwen3-235b-a22b)
**API Tokens:** https://huggingface.co/settings/tokens
**Endpoints:** https://endpoints.huggingface.co
### Recommended Models
**Text Generation:**
- openai/gpt-oss-20b (7.2M downloads)
- deepseek-ai/DeepSeek-V3.2 (90.9K downloads)
- nvidia/Nemotron-3-Nano-30B-A3B (247.7K downloads)
**Image Generation:**
- black-forest-labs/FLUX.1-dev (809.7K downloads)
- stabilityai/stable-diffusion-xl-base-1.0 (2.1M downloads)
**Embeddings:**
- sentence-transformers/all-MiniLM-L6-v2 (149.3M downloads)
- BAAI/bge-m3 (8.2M downloads)
**OCR:**
- deepseek-ai/DeepSeek-OCR (4.7M downloads)
### Available Spaces (15+)
- evalstate/flux1_schnell - Fast image generation
- mcp-tools/FLUX.1-Krea-dev - High quality images
- not-lain/background-removal - Remove backgrounds
- ResembleAI/Chatterbox - Text to speech
- mcp-tools/DeepSeek-OCR-experimental - OCR
---
**Created:** December 23, 2025
**For:** HuggingFace AI Infrastructure
**Version:** 2.0.0-ai
**Status:** 🔨 IMPLEMENTATION