mirror of
https://github.com/blackboxprogramming/BlackRoad-Operating-System.git
synced 2026-03-18 03:33:59 -05:00
Add Cognitive Layer - The missing OS layer for AI-human collaboration

This is what AI collaboration should have been from day one. A comprehensive cognitive layer that solves the fundamental problems of context loss, information silos, and coordination chaos.

## Core Components

**Intent Graph** - Tracks WHY things happen
- Every goal, task, and decision has a rationale
- Relationships between objectives are explicit
- Context is never lost

**Semantic File System** - Files that know what they ARE
- Auto-classification based on content and purpose
- Semantic search (find by meaning, not just name)
- Auto-organization (no more downloads folder chaos)
- Files suggest where they belong

**Living Documents** - Self-updating documentation
- Code-aware: understands what code it documents
- Detects when code changes and docs are stale
- Can auto-generate from code
- Always in sync

**Context Engine** - Right information at the right time
- Provides relevant context based on current task
- Integrates intent, code, docs, and decisions
- Proactive intelligence (suggests next actions)
- Answers: "Why does this exist?" "What's related?"

**Agent Coordination Protocol** - Multi-agent collaboration that works
- Shared context via the cognitive layer
- Clear task ownership and handoffs
- No duplicate work
- Conflict resolution
- Progress tracking

**Smart Documents** - OCR, templates, auto-formatting
- Extract text from PDFs and images
- Identify document types automatically
- ATS-friendly resume formatting
- Business plan templates
- Auto-filing based on content
- Template matching and application

## What This Solves

Traditional problems:
❌ Files in arbitrary folders
❌ Context lives in people's heads
❌ Docs get out of sync
❌ Multi-agent chaos
❌ Downloads folder anarchy
❌ Lost decisions and rationale

Cognitive OS solutions:
✅ Files organize by meaning and purpose
✅ Context is captured and connected
✅ Docs update themselves
✅ Agents coordinate cleanly
✅ Everything auto-organizes
✅ Every decision is recorded with WHY

## Architecture

cognitive/
├── __init__.py            # Main CognitiveOS integration
├── intent_graph.py        # Goals, tasks, decisions, relationships
├── semantic_fs.py         # Content-aware file organization
├── living_docs.py         # Self-updating documentation
├── context_engine.py      # Intelligent context retrieval
├── agent_coordination.py  # Multi-agent collaboration
├── smart_documents.py     # OCR, templates, auto-format
├── README.md              # Vision and philosophy
├── USAGE.md               # Complete usage guide
├── quickstart.py          # Interactive demo
└── requirements.txt       # Optional dependencies

## Quick Start

```python
from cognitive import CognitiveOS

# Initialize
cog = CognitiveOS()

# Create a goal with rationale
goal = cog.create_goal(
    "Build user authentication",
    rationale="Users need secure access"
)

# Process a document (auto-classify, auto-organize)
cog.process_new_file("~/Downloads/resume.pdf")

# Get context for what you're working on
context = cog.get_context(task_id="current-task")
```

## Philosophy

This is how AI and data should have been handled from the start:

- **Semantic over Hierarchical**: Organize by meaning, not folders
- **Intent-Preserving**: Capture WHY, not just WHAT
- **Auto-Linking**: Related things connect automatically
- **Context-Aware**: The system knows what you're trying to do
- **Agent-First**: Designed for AI-human collaboration

Combines the best of Notion + Asana + actual code awareness + auto-organization + OCR + business planning + ATS-friendly formatting.

No more hoping the world doesn't catch on fire. No more downloads folder chaos. No more lost context.

This is the cognitive layer every OS should have had.
59
cognitive/README.md
Normal file
@@ -0,0 +1,59 @@
# Cognitive Layer - BlackRoad OS

## The Problem We're Solving

Every day, agents and humans struggle with:

- **Context fragmentation**: Information scattered across files, commits, docs, chats
- **Intent loss**: We forget WHY decisions were made
- **Document chaos**: Downloads folder anarchy
- **Knowledge silos**: Related info doesn't connect automatically
- **Coordination overhead**: Multi-agent/multi-human collaboration is messy

## The Vision

What if your OS actually understood what you're trying to accomplish? Not just files and folders, but **intent, context, and meaning**.

## Core Components

### 1. Intent Graph

Tracks goals, decisions, tasks, and their relationships. Every action has a "why" attached.

### 2. Semantic File System

Auto-organizes files based on content and purpose. No more "hope the world doesn't catch on fire" file management.

### 3. Living Documents

Documents that are code-aware, self-updating, and understand their relationship to your project.

### 4. Context Engine

Provides exactly the information you need based on what you're trying to do right now.

### 5. Knowledge Weaver

Automatically connects related information across code, docs, tasks, and decisions.

### 6. Agent Coordination Protocol

Multi-agent collaboration that actually works: shared context, clear handoffs, no duplicate work.

### 7. Temporal Inspector

Understand how your system evolved and why: time travel for decision making.

## Philosophy

- **Semantic over Hierarchical**: Organize by meaning, not arbitrary folder structures
- **Intent-Preserving**: Capture WHY, not just WHAT
- **Auto-Linking**: Related things connect automatically
- **Context-Aware**: The system knows what you're trying to do
- **Agent-First**: Designed for AI-human collaboration from the ground up

## Real-World Use Case: The Smart Document System

Imagine a doc editor that:

- Understands whether it's a resume, business plan, technical spec, or meeting notes
- Automatically makes it ATS-friendly if it's a resume
- OCR-scans it and extracts structured data
- Auto-files itself based on project/purpose
- Links to related code, tasks, and decisions
- Formats itself correctly for the target audience
- Updates when linked code changes
- Suggests relevant templates based on intent

That's what we're building.
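The Intent Graph idea above (every goal, task, and decision carries its "why", and related nodes link to each other) can be sketched as a minimal data structure. This is a hypothetical illustration only, not the actual `intent_graph.py` implementation:

```python
from dataclasses import dataclass, field
from typing import Dict, List
from uuid import uuid4


@dataclass
class IntentNode:
    """A goal, task, or decision - always carrying its rationale."""
    title: str
    rationale: str  # the WHY, captured at creation time
    node_id: str = field(default_factory=lambda: uuid4().hex)
    links: List[str] = field(default_factory=list)  # ids of related nodes


class IntentGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, IntentNode] = {}

    def add(self, title: str, rationale: str) -> IntentNode:
        node = IntentNode(title=title, rationale=rationale)
        self.nodes[node.node_id] = node
        return node

    def link(self, a_id: str, b_id: str) -> None:
        # Bidirectional link so context can be walked from either side
        self.nodes[a_id].links.append(b_id)
        self.nodes[b_id].links.append(a_id)

    def why(self, node_id: str) -> str:
        return self.nodes[node_id].rationale


graph = IntentGraph()
goal = graph.add("Build user authentication", "Users need secure access")
task = graph.add("Implement password hashing", "Passwords must be securely stored")
graph.link(goal.node_id, task.node_id)
print(graph.why(task.node_id))  # -> Passwords must be securely stored
```

The key design point is that the rationale is a required field: a node cannot be created without its "why".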
428
cognitive/USAGE.md
Normal file
@@ -0,0 +1,428 @@
# Cognitive OS - Usage Guide

## Quick Start

```python
from cognitive import CognitiveOS

# Initialize the cognitive layer
cog = CognitiveOS(workspace_path=".")

# Show current state
cog.show_current_state()
```

## Core Concepts

### 1. Intent Graph - WHY Things Happen

The Intent Graph tracks goals, tasks, decisions, and their relationships. Every action has a "why" attached.

```python
# Create a goal
goal = cog.create_goal(
    title="Build user authentication system",
    description="Add login, signup, and password reset functionality",
    rationale="Users need secure access to their accounts"
)

# Create tasks under that goal
task = cog.create_task(
    title="Implement password hashing",
    goal_id=goal.id,
    rationale="Passwords must be securely stored"
)

# Make a decision and record WHY
decision = cog.intent_graph.create_decision(
    title="Use bcrypt for password hashing",
    rationale="Industry standard, proven security, good performance",
    alternatives_considered=[
        "argon2 (newer but less battle-tested)",
        "scrypt (more memory intensive)",
        "pbkdf2 (older, slower)"
    ]
)

# Link decision to task
cog.intent_graph.link_nodes(decision.id, task.id, "related")
```

### 2. Semantic File System - WHAT Files Are

No more "throw it in downloads and hope." Files organize themselves based on what they ARE, not arbitrary folder structures.

```python
from cognitive import DocumentType

# Process a new file
metadata = cog.semantic_fs.index_file("~/Downloads/resume.pdf")

print(f"Document type: {metadata.document_type}")
# Output: Document type: resume

# Get suggested location
suggested = cog.semantic_fs.suggest_location("~/Downloads/resume.pdf")
print(f"Should go here: {suggested}")
# Output: Should go here: documents/career/resumes/resume.pdf

# Auto-organize (actually move the file)
cog.semantic_fs.auto_organize("~/Downloads/resume.pdf", dry_run=False)

# Search semantically
results = cog.semantic_fs.search(
    "technical skills",
    filters={'document_type': DocumentType.RESUME}
)
```

### 3. Living Documents - Self-Updating Docs

Documentation that understands code and updates itself.

```python
from cognitive import DocType

# Create API documentation from code
doc = cog.doc_manager.generate_doc(
    doc_type=DocType.API_REFERENCE,
    code_file="src/auth.py",
    output_path="docs/api/auth.md"
)

# Check if docs are out of sync
out_of_sync = cog.doc_manager.check_all_docs()
for doc in out_of_sync:
    print(f"⚠ Needs update: {doc.file_path}")
    # Auto-update it (without mutating the reference set)
    doc.update_from_code(next(iter(doc.code_references)))
```

### 4. Context Engine - Right Info, Right Time

Get exactly the context you need based on what you're doing.

```python
# Get context for a task
context = cog.get_context(task_id="task-123")

# See what's relevant
for item in context.get_top_items(10):
    print(f"[{item.type}] {item.title} - relevance: {item.relevance_score}")

# Get context for a file
file_context = cog.get_context(file_path="src/auth.py")
# Returns: related tasks, documentation, decisions, similar files

# Get current context (what's happening now?)
current = cog.get_context()
# Returns: active goals, blocked tasks, recent files, stale docs

# Query-based context
query_context = cog.get_context(query="How does authentication work?")
# Returns: relevant code, docs, decisions about auth
```

### 5. Agent Coordination - Multi-Agent Collaboration

Multiple agents working together without chaos.

```python
from cognitive import AgentInfo, AgentRole
from cognitive.agent_coordination import HandoffType

# Register agents
coder = AgentInfo(name="CodeWriter", role=AgentRole.CODER)
reviewer = AgentInfo(name="CodeReviewer", role=AgentRole.REVIEWER)

cog.agent_coordinator.register_agent(coder)
cog.agent_coordinator.register_agent(reviewer)

# Create a collaboration session
session = cog.agent_coordinator.create_session(
    goal="Implement user authentication",
    description="Add login, signup, password reset"
)

# Assign task to coder
cog.agent_coordinator.assign_task("implement-login", coder.id)

# When the coder is done, hand off to the reviewer
handoff = cog.agent_coordinator.create_handoff(
    from_agent_id=coder.id,
    to_agent_id=reviewer.id,
    task_id="implement-login",
    handoff_type=HandoffType.REVIEW,
    message="Login implementation complete, ready for review"
)

# Reviewer accepts and reviews
cog.agent_coordinator.accept_handoff(handoff.id, reviewer.id)
# ... review happens ...
cog.agent_coordinator.complete_handoff(handoff.id, result={"approved": True})

# Get agent context (what should the agent know?)
agent_context = cog.agent_coordinator.get_agent_context(coder.id)
# Returns: current task, pending handoffs, session info, task context
```

### 6. Smart Documents - OCR, Templates, Auto-Format

Documents that understand themselves.

```python
from cognitive import DocumentTemplate

# Process a resume with OCR
doc = cog.doc_processor.process_document("~/Downloads/resume.pdf")

# See what was extracted
print(f"Name: {doc.structured_data.fields.get('name')}")
print(f"Email: {doc.structured_data.fields.get('email')}")
print(f"Skills: {doc.structured_data.fields.get('sections', {}).get('skills')}")

# Format as ATS-friendly
ats_content = doc.format_as_ats_friendly()
# Saves a clean, ATS-compatible version

# Create new document from template
cog.doc_processor.create_from_template(
    DocumentTemplate.MEETING_NOTES,
    "meeting_2024_01_15.md",
    data={'date': '2024-01-15'}
)

# Process business plan
biz_plan = cog.doc_processor.process_document("business_plan.pdf")
print(f"Executive summary: {biz_plan.structured_data.fields.get('executive_summary')}")
```

## Complete Workflow Example

Here's how it all works together:

```python
from cognitive import CognitiveOS, AgentRole, DocumentTemplate

# 1. Initialize
cog = CognitiveOS(workspace_path="~/projects/my_startup")

# 2. Create a business goal
goal = cog.create_goal(
    title="Launch MVP by Q2",
    description="Build and ship minimum viable product",
    rationale="Need to validate product-market fit with real users"
)

# 3. Create tasks
task1 = cog.create_task(
    "Write technical specification",
    goal_id=goal.id,
    rationale="Team needs clarity on what to build"
)

task2 = cog.create_task(
    "Implement core features",
    goal_id=goal.id,
    rationale="MVP needs basic functionality"
)

# 4. Create spec document from template
spec_path = cog.doc_processor.create_from_template(
    DocumentTemplate.TECHNICAL_SPEC,
    "docs/mvp_spec.md"
)

# 5. Link document to task
cog.intent_graph.link_artifact(task1.id, spec_path)

# 6. Create collaboration session with agents
session = cog.agent_coordinator.create_session(
    goal="Build MVP",
    coordinator_id="human-founder"
)

# 7. Process incoming documents (investor deck, market research)
cog.process_new_file("~/Downloads/market_research.pdf")
cog.process_new_file("~/Downloads/investor_deck.pptx")
# These auto-organize into the correct folders

# 8. Get context when working on implementation
context = cog.get_context(task_id=task2.id)
print("Here's what you need to know:")
for item in context.get_top_items(5):
    print(f"  - [{item.type}] {item.title}")

# 9. Make decisions and record them
decision = cog.intent_graph.create_decision(
    title="Use React for frontend",
    rationale="Team knows it, large ecosystem, good performance",
    alternatives_considered=["Vue.js", "Svelte", "Angular"]
)
cog.intent_graph.link_nodes(decision.id, task2.id, "related")

# 10. Check overall progress
cog.show_current_state()
```

## Integration with Existing Tools

### Git Integration

```python
# When you make a commit
commit_hash = "abc123"

# Link it to a task
cog.intent_graph.link_artifact(
    task_id="implement-auth",
    artifact_path="src/auth.py",
    commit_hash=commit_hash
)

# Now you can always find: why was this code written?
tasks = cog.intent_graph.find_by_artifact("src/auth.py")
for task in tasks:
    print(f"This file exists because: {task.rationale}")
```

### IDE Integration

```python
# When you open a file in your IDE
file_path = "src/auth.py"

# Get full context
context = cog.get_context(file_path=file_path)

# Show in IDE sidebar:
# - Why this file exists (linked tasks)
# - What documentation exists
# - Recent decisions about this code
# - Related files
```

### Document Management

```python
# Watch the downloads folder
import os
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class DownloadHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            print(f"New file detected: {event.src_path}")
            cog.process_new_file(event.src_path)


# Auto-process new downloads (watchdog does not expand "~" itself)
observer = Observer()
observer.schedule(DownloadHandler(), os.path.expanduser("~/Downloads"), recursive=False)
observer.start()

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```

## Best Practices

### 1. Always Capture Intent

```python
# ❌ Bad - no context
task = cog.create_task("Refactor auth module")

# ✅ Good - clear intent
task = cog.create_task(
    "Refactor auth module to use dependency injection",
    rationale="Current tight coupling makes testing difficult. DI will improve testability and modularity."
)
```

### 2. Link Everything

```python
# Create task
task = cog.create_task("Add user profile page")

# Link the code
cog.intent_graph.link_artifact(task.id, "src/components/UserProfile.tsx")

# Link the docs
cog.intent_graph.link_artifact(task.id, "docs/features/user_profile.md")

# Link the design
cog.intent_graph.link_artifact(task.id, "designs/user_profile.fig")

# Now it's all connected!
```

### 3. Use Context Proactively

```python
# Before starting work, get context
context = cog.get_context(task_id="current-task")

# Check for blockers
if context.filter_by_type("blocker"):
    print("⚠ There are blockers - resolve these first")

# Check for relevant decisions
decisions = context.filter_by_type("decision")
for decision in decisions:
    print(f"Remember: {decision.title} - {decision.content}")
```

### 4. Let Documents Organize Themselves

```python
# ❌ Bad - manual organization
# User manually moves the file to "somewhere logical"

# ✅ Good - semantic organization
metadata = cog.semantic_fs.index_file(file_path)
suggested = cog.semantic_fs.suggest_location(file_path)
print(f"This belongs in: {suggested}")

# Optional: auto-organize
if input("Auto-organize? (y/n): ") == 'y':
    cog.semantic_fs.auto_organize(file_path, dry_run=False)
```

### 5. Keep Docs in Sync

```python
# Regularly check doc status
out_of_sync = cog.doc_manager.check_all_docs()

if out_of_sync:
    print(f"⚠ {len(out_of_sync)} documents need updating")
    for doc in out_of_sync:
        # Review and update
        doc.update_from_code(next(iter(doc.code_references)))
```

## What Makes This Different?

### Traditional Approach

- Files in arbitrary folders
- Context lives in people's heads
- Docs get out of sync
- Multi-agent chaos
- Downloads folder anarchy

### Cognitive OS Approach

- Files organize by meaning and purpose
- Context is captured and connected
- Docs update themselves
- Agents coordinate cleanly
- Everything auto-organizes

## The Big Picture

The Cognitive OS solves the fundamental problem: **information silos and context loss**.

Instead of:

- Searching for files by name
- Wondering why code exists
- Docs that lie
- Duplicate work
- Lost decisions

You get:

- Files that know what they are
- Code linked to purpose
- Docs that update themselves
- Clear collaboration
- Preserved context

This is how AI + humans should have been working all along.
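The auto-classification the guide leans on (`index_file` detecting a `document_type`) can be approximated with a simple keyword scorer. This is a hedged sketch of the idea only; the real `semantic_fs.py` logic may work differently:

```python
# Hypothetical keyword-based classifier illustrating auto-classification.
KEYWORDS = {
    "resume": ["experience", "skills", "education", "employment"],
    "business_plan": ["executive summary", "market", "revenue", "competitors"],
    "meeting_notes": ["attendees", "agenda", "action items"],
}


def classify(text: str) -> str:
    """Return the document type whose keywords appear most often in the text."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(word) for word in words)
        for doc_type, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to "unknown" when nothing matched at all
    return best if scores[best] > 0 else "unknown"


print(classify("Skills: Python. Education: BSc. Experience: 5 years."))  # -> resume
```

A production classifier would likely use embeddings rather than keyword counts, but the contract is the same: text in, document type out.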
253
cognitive/__init__.py
Normal file
@@ -0,0 +1,253 @@
|
|||||||
|
"""
|
||||||
|
Cognitive Layer for BlackRoad OS
|
||||||
|
|
||||||
|
The missing layer that should have existed from day one.
|
||||||
|
|
||||||
|
This module integrates:
|
||||||
|
- Intent Graph: WHY things happen
|
||||||
|
- Semantic FS: WHAT files are and WHERE they belong
|
||||||
|
- Living Docs: Documentation that updates itself
|
||||||
|
- Context Engine: RIGHT information at the RIGHT time
|
||||||
|
- Agent Coordination: Multi-agent collaboration that works
|
||||||
|
- Smart Documents: OCR, ATS-friendly, auto-organizing documents
|
||||||
|
"""
|
||||||
|
|
||||||
|
from .intent_graph import IntentGraph, IntentNode, IntentType, IntentStatus
|
||||||
|
from .semantic_fs import SemanticFileSystem, DocumentType, DocumentPurpose
|
||||||
|
from .living_docs import DocManager, LivingDocument, DocType
|
||||||
|
from .context_engine import ContextEngine, ContextBundle, ContextItem
|
||||||
|
from .agent_coordination import AgentCoordinator, AgentInfo, AgentRole, Handoff
|
||||||
|
from .smart_documents import DocumentProcessor, SmartDocument, DocumentTemplate
|
||||||
|
|
||||||
|
__version__ = "0.1.0"
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
# Intent Graph
|
||||||
|
'IntentGraph',
|
||||||
|
'IntentNode',
|
||||||
|
'IntentType',
|
||||||
|
'IntentStatus',
|
||||||
|
|
||||||
|
# Semantic FS
|
||||||
|
'SemanticFileSystem',
|
||||||
|
'DocumentType',
|
||||||
|
'DocumentPurpose',
|
||||||
|
|
||||||
|
# Living Docs
|
||||||
|
'DocManager',
|
||||||
|
'LivingDocument',
|
||||||
|
'DocType',
|
||||||
|
|
||||||
|
# Context Engine
|
||||||
|
'ContextEngine',
|
||||||
|
'ContextBundle',
|
||||||
|
'ContextItem',
|
||||||
|
|
||||||
|
# Agent Coordination
|
||||||
|
'AgentCoordinator',
|
||||||
|
'AgentInfo',
|
||||||
|
'AgentRole',
|
||||||
|
'Handoff',
|
||||||
|
|
||||||
|
# Smart Documents
|
||||||
|
'DocumentProcessor',
|
||||||
|
'SmartDocument',
|
||||||
|
'DocumentTemplate',
|
||||||
|
|
||||||
|
# Main integration
|
||||||
|
'CognitiveOS',
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
class CognitiveOS:
|
||||||
|
"""
|
||||||
|
The Cognitive Operating System - integrates all cognitive components.
|
||||||
|
|
||||||
|
This is the main entry point for using the cognitive layer.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, workspace_path: str = "."):
|
||||||
|
"""
|
||||||
|
Initialize the Cognitive OS.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
workspace_path: Root path for the workspace
|
||||||
|
"""
|
||||||
|
# Initialize all systems
|
||||||
|
self.intent_graph = IntentGraph()
|
||||||
|
self.semantic_fs = SemanticFileSystem(f"{workspace_path}/.semantic_fs_index.json")
|
||||||
|
self.doc_manager = DocManager(f"{workspace_path}/.living_docs_index.json")
|
||||||
|
self.context_engine = ContextEngine(
|
||||||
|
intent_graph=self.intent_graph,
|
||||||
|
semantic_fs=self.semantic_fs,
|
||||||
|
doc_manager=self.doc_manager
|
||||||
|
)
|
||||||
|
self.agent_coordinator = AgentCoordinator(
|
||||||
|
intent_graph=self.intent_graph,
|
||||||
|
context_engine=self.context_engine
|
||||||
|
)
|
||||||
|
self.doc_processor = DocumentProcessor(
|
||||||
|
semantic_fs=self.semantic_fs,
|
||||||
|
doc_manager=self.doc_manager,
|
||||||
|
intent_graph=self.intent_graph
|
||||||
|
)
|
||||||
|
|
||||||
|
self.workspace_path = workspace_path
|
||||||
|
|
||||||
|
def process_new_file(self, file_path: str) -> None:
|
||||||
|
"""
|
||||||
|
Process a new file through the cognitive layer.
|
||||||
|
|
||||||
|
This is the magic - a file gets:
|
||||||
|
- Analyzed and classified (Semantic FS)
|
||||||
|
- Auto-organized (suggested location)
|
||||||
|
- Linked to intent (why does it exist?)
|
||||||
|
- OCR'd if needed (Smart Docs)
|
||||||
|
- Template-matched (Smart Docs)
|
||||||
|
"""
|
||||||
|
# 1. Index in semantic FS
|
||||||
|
metadata = self.semantic_fs.index_file(file_path)
|
||||||
|
print(f"✓ Indexed: {metadata.document_type.value}")
|
||||||
|
|
||||||
|
# 2. Suggest where it should go
|
||||||
|
suggested_path = self.semantic_fs.suggest_location(file_path)
|
||||||
|
print(f"✓ Suggested location: {suggested_path}")
|
||||||
|
|
||||||
|
# 3. Process with smart docs if it's a document
|
||||||
|
if metadata.document_type in [DocumentType.RESUME, DocumentType.BUSINESS_PLAN,
|
||||||
|
DocumentType.MEETING_NOTES]:
|
||||||
|
doc = self.doc_processor.process_document(file_path, auto_organize=False)
|
||||||
|
print(f"✓ Processed with template: {doc.suggested_template.value if doc.suggested_template else 'none'}")
|
||||||
|
|
||||||
|
print()
|
||||||
|
|
||||||
|
def create_goal(self, title: str, description: str = "", rationale: str = "") -> IntentNode:
|
||||||
|
"""Create a new goal with full cognitive integration"""
|
||||||
|
goal = self.intent_graph.create_goal(title, description, rationale)
|
||||||
|
print(f"✓ Created goal: {title}")
|
||||||
|
if rationale:
|
||||||
|
print(f" Why: {rationale}")
|
||||||
|
return goal
|
||||||
|
|
||||||
|
def create_task(self, title: str, goal_id: str = None, rationale: str = "") -> IntentNode:
|
||||||
|
"""Create a task linked to a goal"""
|
||||||
|
task = self.intent_graph.create_task(title, parent_id=goal_id, rationale=rationale)
|
||||||
|
print(f"✓ Created task: {title}")
|
||||||
|
return task
|
||||||
|
|
||||||
|
def get_context(self, query: str = None, task_id: str = None, file_path: str = None):
|
||||||
|
"""
|
||||||
|
Get relevant context based on what you're asking about.
|
||||||
|
|
||||||
|
This is the intelligence - provide the right info at the right time.
|
||||||
|
"""
|
||||||
|
if query:
|
||||||
|
return self.context_engine.get_context_for_query(query)
|
||||||
|
elif task_id:
|
||||||
|
return self.context_engine.get_context_for_task(task_id)
|
||||||
|
elif file_path:
|
||||||
|
return self.context_engine.get_context_for_file(file_path)
|
||||||
|
else:
|
||||||
|
return self.context_engine.get_current_context()
|
||||||
|
|
||||||
|
def show_current_state(self) -> None:
|
||||||
|
"""Show the current state of the cognitive OS"""
|
||||||
|
print("\n" + "=" * 60)
|
||||||
|
print("COGNITIVE OS STATE")
|
||||||
|
print("=" * 60)
|
||||||
|
|
||||||
|
# Intent graph summary
|
||||||
|
print("\n" + self.intent_graph.get_summary())
|
||||||
|
|
||||||
|
# Active context
|
||||||
|
print("\nCURRENT CONTEXT")
|
||||||
|
print("-" * 60)
|
||||||
|
current = self.context_engine.get_current_context()
|
||||||
|
for item in current.get_top_items(5):
|
||||||
|
print(f" [{item.type}] {item.title} (relevance: {item.relevance_score:.2f})")
|
||||||
|
|
||||||
|
# File system stats
|
||||||
|
print("\nSEMANTIC FILE SYSTEM")
|
||||||
|
print("-" * 60)
|
||||||
|
print(f" Indexed files: {len(self.semantic_fs.files)}")
|
||||||
|
|
||||||
|
# Document stats
|
||||||
|
print("\nLIVING DOCUMENTS")
|
||||||
|
print("-" * 60)
|
||||||
|
print(f" Tracked documents: {len(self.doc_manager.documents)}")
|
||||||
|
out_of_sync = self.doc_manager.check_all_docs()
|
||||||
|
if out_of_sync:
|
||||||
|
print(f" ⚠ Out of sync: {len(out_of_sync)}")
|
||||||
|
|
||||||
|
# Agent stats
|
||||||
|
print("\nAGENT COORDINATION")
|
||||||
|
print("-" * 60)
|
||||||
|
print(f" Registered agents: {len(self.agent_coordinator.agents)}")
|
||||||
|
print(f" Active sessions: {len(self.agent_coordinator.sessions)}")
        print("\n" + "=" * 60 + "\n")

    def export_all(self, export_dir: str = ".cognitive_export") -> None:
        """Export all cognitive data"""
        from pathlib import Path
        Path(export_dir).mkdir(exist_ok=True)

        # Export intent graph
        self.intent_graph.export_json(f"{export_dir}/intent_graph.json")

        # Export semantic FS
        self.semantic_fs.save_index()

        # Export living docs
        self.doc_manager.export_index()

        # Export agent coordination
        self.agent_coordinator.export_coordination_state(f"{export_dir}/agent_coordination.json")

        print(f"✓ Exported all cognitive data to {export_dir}/")


# Quick start example
def demo():
    """Demonstration of the Cognitive OS"""
    print("\n" + "=" * 60)
    print("COGNITIVE OS - DEMONSTRATION")
    print("=" * 60 + "\n")

    # Initialize
    cog = CognitiveOS()

    # Create a goal
    goal = cog.create_goal(
        title="Build a smart document management system",
        rationale="Current file management is chaos. Need semantic organization."
    )

    # Create tasks
    task1 = cog.create_task(
        "Implement OCR for document scanning",
        goal_id=goal.id,
        rationale="Extract structured data from PDFs"
    )

    task2 = cog.create_task(
        "Build auto-filing system",
        goal_id=goal.id,
        rationale="Documents should organize themselves"
    )

    # Show state
    cog.show_current_state()

    # Get context for the first task
    print("\nCONTEXT FOR: Implement OCR")
    print("-" * 60)
    context = cog.get_context(task_id=task1.id)
    for item in context.get_top_items(3):
        print(f"  [{item.type}] {item.title}")

    print("\n✓ Cognitive OS is ready!\n")


if __name__ == "__main__":
    demo()
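`export_all` above leans on `Path.mkdir(exist_ok=True)` plus one JSON file per subsystem, which keeps re-exports idempotent. A minimal self-contained sketch of that pattern (the directory name and payload here are illustrative stand-ins, not the real subsystem state):

```python
import json
import tempfile
from pathlib import Path

# exist_ok=True makes the export idempotent: re-running never raises.
export_dir = Path(tempfile.mkdtemp()) / ".cognitive_export"
export_dir.mkdir(exist_ok=True)

# Each subsystem serializes its own state to its own JSON file.
state = {"nodes": [], "edges": []}  # hypothetical stand-in for an intent-graph export
(export_dir / "intent_graph.json").write_text(json.dumps(state, indent=2))

print(sorted(p.name for p in export_dir.iterdir()))  # ['intent_graph.json']
```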
560	cognitive/agent_coordination.py	Normal file
@@ -0,0 +1,560 @@
"""
Agent Coordination Protocol - Multi-agent collaboration that actually works

The problem with current multi-agent systems:
- Agents duplicate work
- No shared context
- Messy handoffs
- Lost information
- No clear ownership

This protocol solves that by providing:
- Shared context via the cognitive layer
- Clear task ownership
- Handoff protocols
- Conflict resolution
- Progress tracking
- Collaboration patterns
"""

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Dict, List, Optional, Set, Any
from uuid import uuid4
import json
class AgentRole(Enum):
    """Agent specializations"""
    COORDINATOR = "coordinator"  # Manages other agents
    CODER = "coder"              # Writes code
    REVIEWER = "reviewer"        # Reviews code/docs
    RESEARCHER = "researcher"    # Finds information
    DOCUMENTER = "documenter"    # Writes documentation
    TESTER = "tester"            # Tests code
    PLANNER = "planner"          # Creates plans
    EXECUTOR = "executor"        # Executes tasks


class TaskStatus(Enum):
    """Task status in agent workflow"""
    PENDING = "pending"
    CLAIMED = "claimed"
    IN_PROGRESS = "in_progress"
    REVIEW_NEEDED = "review_needed"
    BLOCKED = "blocked"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


class HandoffType(Enum):
    """Types of handoffs between agents"""
    SEQUENTIAL = "sequential"  # A does task, then B does next task
    PARALLEL = "parallel"      # A and B work simultaneously
    REVIEW = "review"          # A does work, B reviews
    ASSIST = "assist"          # B helps A with current task
    DELEGATE = "delegate"      # A assigns subtask to B
@dataclass
class AgentInfo:
    """Information about an agent"""
    id: str = field(default_factory=lambda: str(uuid4()))
    name: str = ""
    role: AgentRole = AgentRole.EXECUTOR
    capabilities: Set[str] = field(default_factory=set)
    current_task_id: Optional[str] = None
    active: bool = True
    created_at: datetime = field(default_factory=datetime.now)
    last_seen: datetime = field(default_factory=datetime.now)
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Handoff:
    """A handoff between agents"""
    id: str = field(default_factory=lambda: str(uuid4()))
    from_agent_id: str = ""
    to_agent_id: str = ""
    task_id: str = ""
    handoff_type: HandoffType = HandoffType.SEQUENTIAL
    context: Dict[str, Any] = field(default_factory=dict)  # Info for receiving agent
    message: str = ""  # Handoff message
    created_at: datetime = field(default_factory=datetime.now)
    accepted_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    status: str = "pending"  # pending, accepted, rejected, completed


@dataclass
class CollaborationSession:
    """A multi-agent collaboration session"""
    id: str = field(default_factory=lambda: str(uuid4()))
    goal: str = ""
    description: str = ""
    agents: Set[str] = field(default_factory=set)  # Agent IDs
    task_ids: Set[str] = field(default_factory=set)  # Tasks in this session
    coordinator_id: Optional[str] = None
    created_at: datetime = field(default_factory=datetime.now)
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    status: str = "planning"  # planning, active, completed, cancelled
    metadata: Dict[str, Any] = field(default_factory=dict)
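`find_available_agent` (defined further down) matches agents against requirements via set containment over `AgentInfo.capabilities`. The rule in isolation, using illustrative capability names rather than real agent data:

```python
# Illustrative capability set; in the real system this is AgentInfo.capabilities.
agent_capabilities = {"python", "testing", "docs"}

# An agent qualifies only if it offers every required capability.
print({"python", "testing"}.issubset(agent_capabilities))  # True

# A single missing capability disqualifies the agent.
print({"python", "rust"}.issubset(agent_capabilities))  # False
```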
class AgentCoordinator:
    """
    Coordinates multiple agents working together.

    Key responsibilities:
    - Task assignment and load balancing
    - Handoff management
    - Conflict resolution
    - Progress tracking
    - Context sharing
    """

    def __init__(self, intent_graph=None, context_engine=None):
        """
        Initialize with connections to cognitive systems.

        This allows agents to share context and intent.
        """
        self.intent_graph = intent_graph
        self.context_engine = context_engine
        self.agents: Dict[str, AgentInfo] = {}
        self.handoffs: Dict[str, Handoff] = {}
        self.sessions: Dict[str, CollaborationSession] = {}
        self.task_ownership: Dict[str, str] = {}  # task_id -> agent_id

    def register_agent(self, agent: AgentInfo) -> AgentInfo:
        """Register a new agent"""
        self.agents[agent.id] = agent
        return agent

    def create_session(self, goal: str, description: str = "",
                       coordinator_id: Optional[str] = None) -> CollaborationSession:
        """Create a new collaboration session"""
        session = CollaborationSession(
            goal=goal,
            description=description,
            coordinator_id=coordinator_id
        )
        self.sessions[session.id] = session

        # Create a goal in the intent graph
        if self.intent_graph:
            goal_node = self.intent_graph.create_goal(
                title=goal,
                description=description,
                metadata={"session_id": session.id}
            )
            session.metadata['intent_node_id'] = goal_node.id

        return session
    def assign_task(self, task_id: str, agent_id: str,
                    context: Optional[Dict] = None) -> bool:
        """
        Assign a task to a specific agent.

        Returns True if assignment successful, False otherwise.
        """
        if agent_id not in self.agents:
            return False

        agent = self.agents[agent_id]

        # Check if agent is available
        if not agent.active:
            return False
        if agent.current_task_id:
            # Agent is busy - could queue or reject
            return False

        # Assign the task
        self.task_ownership[task_id] = agent_id
        agent.current_task_id = task_id
        agent.last_seen = datetime.now()

        # Record the assignment in the intent graph (status is left unchanged)
        if self.intent_graph and task_id in self.intent_graph.nodes:
            task_node = self.intent_graph.nodes[task_id]
            task_node.assigned_to = agent_id

        # Provide context to the agent
        if self.context_engine and context:
            # Store context for agent to retrieve
            pass

        return True

    def find_available_agent(self, required_role: Optional[AgentRole] = None,
                             required_capabilities: Optional[Set[str]] = None) -> Optional[AgentInfo]:
        """
        Find an available agent with specific requirements.

        This is for automatic task assignment.
        """
        for agent in self.agents.values():
            if not agent.active:
                continue

            if agent.current_task_id:
                continue  # Agent is busy

            if required_role and agent.role != required_role:
                continue

            if required_capabilities:
                if not required_capabilities.issubset(agent.capabilities):
                    continue

            return agent

        return None
    def create_handoff(self, from_agent_id: str, to_agent_id: str,
                       task_id: str, handoff_type: HandoffType,
                       message: str = "", context: Optional[Dict] = None) -> Handoff:
        """
        Create a handoff from one agent to another.

        This is how agents collaborate!
        """
        handoff = Handoff(
            from_agent_id=from_agent_id,
            to_agent_id=to_agent_id,
            task_id=task_id,
            handoff_type=handoff_type,
            message=message,
            context=context or {}
        )

        self.handoffs[handoff.id] = handoff

        # Get full context for the handoff
        if self.context_engine:
            context_bundle = self.context_engine.get_context_for_task(task_id)
            handoff.context['cognitive_context'] = {
                'goal': context_bundle.task_title,
                'top_items': [
                    {
                        'type': item.type,
                        'title': item.title,
                        'content': item.content
                    }
                    for item in context_bundle.get_top_items(5)
                ]
            }

        return handoff

    def accept_handoff(self, handoff_id: str, agent_id: str) -> bool:
        """Agent accepts a handoff"""
        if handoff_id not in self.handoffs:
            return False

        handoff = self.handoffs[handoff_id]

        if handoff.to_agent_id != agent_id:
            return False  # Wrong agent

        handoff.status = "accepted"
        handoff.accepted_at = datetime.now()

        # Update task ownership
        self.task_ownership[handoff.task_id] = agent_id
        if agent_id in self.agents:
            self.agents[agent_id].current_task_id = handoff.task_id

        return True

    def complete_handoff(self, handoff_id: str, result: Optional[Dict] = None) -> bool:
        """Mark a handoff as completed"""
        if handoff_id not in self.handoffs:
            return False

        handoff = self.handoffs[handoff_id]
        handoff.status = "completed"
        handoff.completed_at = datetime.now()

        if result:
            handoff.context['result'] = result

        # Update receiving agent
        if handoff.to_agent_id in self.agents:
            agent = self.agents[handoff.to_agent_id]
            if agent.current_task_id == handoff.task_id:
                agent.current_task_id = None

        return True
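The handoff methods above implement a small state machine (pending → accepted → completed), with acceptance gated on the addressed agent. A self-contained sketch of that lifecycle, using a simplified stand-in rather than the real `Handoff` dataclass:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class MiniHandoff:  # simplified stand-in for the Handoff dataclass above
    to_agent_id: str
    status: str = "pending"
    accepted_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None

def accept(h: MiniHandoff, agent_id: str) -> bool:
    if h.to_agent_id != agent_id:  # only the addressed agent may accept
        return False
    h.status = "accepted"
    h.accepted_at = datetime.now()
    return True

def complete(h: MiniHandoff) -> None:
    h.status = "completed"
    h.completed_at = datetime.now()

h = MiniHandoff(to_agent_id="reviewer-1")
print(accept(h, "coder-1"))     # False: wrong agent is rejected
print(accept(h, "reviewer-1"))  # True
complete(h)
print(h.status)  # completed
```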
    def get_agent_context(self, agent_id: str) -> Dict[str, Any]:
        """
        Get full context for an agent.

        This tells the agent what they need to know:
        - Current task
        - Related context
        - Active handoffs
        - Session info
        """
        if agent_id not in self.agents:
            return {}

        agent = self.agents[agent_id]
        context = {
            'agent': {
                'id': agent.id,
                'name': agent.name,
                'role': agent.role.value,
                'current_task': agent.current_task_id
            },
            'active_handoffs': [],
            'pending_handoffs': [],
            'sessions': []
        }

        # Get active handoffs
        for handoff in self.handoffs.values():
            if handoff.to_agent_id == agent_id:
                if handoff.status == "pending":
                    context['pending_handoffs'].append({
                        'id': handoff.id,
                        'from_agent': handoff.from_agent_id,
                        'task_id': handoff.task_id,
                        'type': handoff.handoff_type.value,
                        'message': handoff.message
                    })
                elif handoff.status == "accepted":
                    context['active_handoffs'].append({
                        'id': handoff.id,
                        'task_id': handoff.task_id,
                        'type': handoff.handoff_type.value
                    })

        # Get task context if agent has current task
        if agent.current_task_id and self.context_engine:
            task_context = self.context_engine.get_context_for_task(agent.current_task_id)
            context['task_context'] = {
                'title': task_context.task_title,
                'top_items': [
                    {
                        'type': item.type,
                        'title': item.title,
                        'relevance': item.relevance_score
                    }
                    for item in task_context.get_top_items(5)
                ]
            }

        # Get active sessions
        for session in self.sessions.values():
            if agent_id in session.agents:
                context['sessions'].append({
                    'id': session.id,
                    'goal': session.goal,
                    'status': session.status,
                    'is_coordinator': session.coordinator_id == agent_id
                })

        return context
    def suggest_collaboration(self, task_id: str) -> List[Dict[str, Any]]:
        """
        Suggest how to collaborate on a task.

        Analyzes task and suggests:
        - Which agents should work on it
        - What collaboration pattern to use
        - How to split the work
        """
        suggestions = []

        if not self.intent_graph or task_id not in self.intent_graph.nodes:
            return suggestions

        task = self.intent_graph.nodes[task_id]

        # If task has subtasks, suggest parallel work
        if task.child_ids:
            suggestions.append({
                "pattern": "parallel",
                "description": f"Task has {len(task.child_ids)} subtasks. Assign to multiple agents for parallel execution.",
                "agents_needed": min(len(task.child_ids), 3),
                "roles": [AgentRole.EXECUTOR.value]
            })

        # If task is complex, suggest planner + executor pattern
        if task.description and len(task.description) > 500:
            suggestions.append({
                "pattern": "plan_and_execute",
                "description": "Complex task. Use planner to break down, then executor to implement.",
                "agents_needed": 2,
                "roles": [AgentRole.PLANNER.value, AgentRole.EXECUTOR.value]
            })

        # If task involves code, suggest coder + reviewer pattern
        code_extensions = {'.py', '.js', '.java', '.cpp', '.go', '.rs'}
        has_code = any(
            any(path.endswith(ext) for ext in code_extensions)
            for path in task.file_paths
        )

        if has_code:
            suggestions.append({
                "pattern": "code_and_review",
                "description": "Task involves code. Use coder + reviewer for quality.",
                "agents_needed": 2,
                "roles": [AgentRole.CODER.value, AgentRole.REVIEWER.value]
            })

        return suggestions
    def get_session_status(self, session_id: str) -> Dict[str, Any]:
        """Get status of a collaboration session"""
        if session_id not in self.sessions:
            return {}

        session = self.sessions[session_id]

        # Gather task statuses
        task_statuses = {}
        if self.intent_graph:
            for task_id in session.task_ids:
                if task_id in self.intent_graph.nodes:
                    task = self.intent_graph.nodes[task_id]
                    status = task.status.value
                    task_statuses[status] = task_statuses.get(status, 0) + 1

        # Get agent activity
        active_agents = []
        for agent_id in session.agents:
            if agent_id in self.agents:
                agent = self.agents[agent_id]
                if agent.active and agent.current_task_id:
                    active_agents.append(agent_id)

        return {
            'session_id': session.id,
            'goal': session.goal,
            'status': session.status,
            'agents': {
                'total': len(session.agents),
                'active': len(active_agents)
            },
            'tasks': {
                'total': len(session.task_ids),
                'by_status': task_statuses
            },
            'progress': self._calculate_progress(session.task_ids),
            # total_seconds(), not .seconds: .seconds wraps around at 24 hours
            'duration': (datetime.now() - session.created_at).total_seconds() if session.created_at else 0
        }

    def _calculate_progress(self, task_ids: Set[str]) -> float:
        """Calculate progress for a set of tasks"""
        if not task_ids or not self.intent_graph:
            return 0.0

        total = len(task_ids)
        completed = 0

        for task_id in task_ids:
            if task_id in self.intent_graph.nodes:
                task = self.intent_graph.nodes[task_id]
                if task.status.value == "completed":
                    completed += 1

        return completed / total if total > 0 else 0.0
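`_calculate_progress` above is a plain completed/total ratio over task statuses. The same arithmetic in isolation (the task IDs and statuses here are illustrative):

```python
# Hypothetical task statuses, mirroring IntentNode.status.value strings.
statuses = {"t1": "completed", "t2": "in_progress", "t3": "completed", "t4": "pending"}

# Progress is the fraction of tasks whose status is "completed".
completed = sum(1 for s in statuses.values() if s == "completed")
progress = completed / len(statuses) if statuses else 0.0
print(progress)  # 0.5
```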
    def resolve_conflict(self, task_id: str, agent1_id: str, agent2_id: str) -> str:
        """
        Resolve conflict when multiple agents try to work on same task.

        Resolution strategies:
        - Check who claimed first
        - Consider agent roles/capabilities
        - Suggest splitting task
        - Suggest collaboration pattern
        """
        # Simple strategy: first come, first served
        if task_id in self.task_ownership:
            return self.task_ownership[task_id]

        # Otherwise, assign to the agent with less work
        if agent1_id in self.agents and agent2_id in self.agents:
            # Count active tasks per agent
            agent1_tasks = sum(1 for aid in self.task_ownership.values() if aid == agent1_id)
            agent2_tasks = sum(1 for aid in self.task_ownership.values() if aid == agent2_id)

            return agent1_id if agent1_tasks <= agent2_tasks else agent2_id

        return agent1_id
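The resolution order above is: an existing claim always wins, otherwise the lighter-loaded agent gets the task (ties go to the first agent). A standalone sketch of that decision, with illustrative agent names and load counts:

```python
task_ownership = {"t1": "agent-a"}  # existing claims (illustrative)

def resolve(task_id: str, a1: str, a1_load: int, a2: str, a2_load: int) -> str:
    if task_id in task_ownership:             # first claim wins
        return task_ownership[task_id]
    return a1 if a1_load <= a2_load else a2   # otherwise, lighter load wins

print(resolve("t1", "agent-b", 0, "agent-c", 5))  # agent-a (already claimed)
print(resolve("t2", "agent-b", 3, "agent-c", 1))  # agent-c (lighter load)
print(resolve("t3", "agent-b", 2, "agent-c", 2))  # agent-b (tie goes to first)
```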
    def export_coordination_state(self, file_path: str) -> None:
        """Export coordination state for persistence"""
        data = {
            'agents': {
                aid: {
                    'id': a.id,
                    'name': a.name,
                    'role': a.role.value,
                    'current_task': a.current_task_id,
                    'active': a.active
                }
                for aid, a in self.agents.items()
            },
            'sessions': {
                sid: {
                    'id': s.id,
                    'goal': s.goal,
                    'agents': list(s.agents),
                    'tasks': list(s.task_ids),
                    'status': s.status
                }
                for sid, s in self.sessions.items()
            },
            'task_ownership': self.task_ownership
        }

        with open(file_path, 'w') as f:
            json.dump(data, f, indent=2)
# Example usage and collaboration patterns
if __name__ == "__main__":
    # Example: Multi-agent code review workflow
    # coordinator = AgentCoordinator(intent_graph, context_engine)

    # Register agents
    # coder = AgentInfo(name="CodeWriter", role=AgentRole.CODER)
    # reviewer = AgentInfo(name="CodeReviewer", role=AgentRole.REVIEWER)
    # coordinator.register_agent(coder)
    # coordinator.register_agent(reviewer)

    # Create session
    # session = coordinator.create_session(
    #     goal="Implement user authentication feature",
    #     description="Add login, signup, and password reset"
    # )

    # Assign work
    # coordinator.assign_task("implement-login", coder.id)

    # When coder is done, handoff to reviewer
    # handoff = coordinator.create_handoff(
    #     from_agent_id=coder.id,
    #     to_agent_id=reviewer.id,
    #     task_id="implement-login",
    #     handoff_type=HandoffType.REVIEW,
    #     message="Login implementation complete, ready for review"
    # )

    print("Agent Coordination Protocol initialized")
515	cognitive/context_engine.py	Normal file
@@ -0,0 +1,515 @@
"""
Context Engine - Provides the right information at the right time

The problem: Information overload. You have code, docs, tasks, decisions,
history... but finding the RIGHT information for what you're doing NOW is hard.

The solution: A context engine that understands:
- What you're currently trying to do (from intent graph)
- What code you're working with
- What decisions have been made
- What documentation exists
- What related work has been done

And provides exactly what you need, when you need it.
"""

from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Set, Any
from pathlib import Path
import json
@dataclass
class ContextItem:
    """A piece of contextual information"""
    id: str
    type: str  # code, doc, decision, task, artifact, etc.
    title: str
    content: str
    relevance_score: float  # 0.0 to 1.0
    source: str  # Where this came from
    metadata: Dict[str, Any] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=datetime.now)


@dataclass
class ContextBundle:
    """A collection of relevant context for a specific task/goal"""
    task_id: str
    task_title: str
    items: List[ContextItem] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    last_updated: datetime = field(default_factory=datetime.now)

    def add_item(self, item: ContextItem) -> None:
        """Add a context item"""
        self.items.append(item)
        self.last_updated = datetime.now()

    def get_top_items(self, n: int = 10) -> List[ContextItem]:
        """Get top N most relevant items"""
        sorted_items = sorted(self.items, key=lambda x: x.relevance_score, reverse=True)
        return sorted_items[:n]

    def filter_by_type(self, item_type: str) -> List[ContextItem]:
        """Get all items of a specific type"""
        return [item for item in self.items if item.type == item_type]
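`get_top_items` is a sort-and-truncate over `relevance_score`, and `filter_by_type` is a simple predicate filter. A self-contained sketch of both, with a minimal stand-in carrying only the fields ranking needs (the items themselves are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Item:  # stand-in for ContextItem: only the fields ranking uses
    type: str
    title: str
    relevance_score: float

items = [
    Item("doc", "README", 0.6),
    Item("goal", "Ship auth", 0.9),
    Item("decision", "Use JWT", 0.85),
]

# get_top_items: sort by relevance, highest first, then truncate to n.
top = sorted(items, key=lambda x: x.relevance_score, reverse=True)[:2]
print([i.title for i in top])  # ['Ship auth', 'Use JWT']

# filter_by_type: keep only items of one type.
print([i.title for i in items if i.type == "doc"])  # ['README']
```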
class ContextEngine:
    """
    The Context Engine - provides the right information at the right time.

    This is what makes working with the system feel intelligent.
    Instead of searching for information, it comes to you.
    """

    def __init__(self, intent_graph=None, semantic_fs=None, doc_manager=None):
        """
        Initialize with references to other cognitive systems.

        The Context Engine is the integration point that pulls together:
        - Intent Graph (what we're trying to do)
        - Semantic FS (what files exist and what they are)
        - Doc Manager (what documentation exists)
        """
        self.intent_graph = intent_graph
        self.semantic_fs = semantic_fs
        self.doc_manager = doc_manager
        self.context_cache: Dict[str, ContextBundle] = {}
    def get_context_for_task(self, task_id: str, max_items: int = 20) -> ContextBundle:
        """
        Get relevant context for a specific task.

        This is the main entry point - given a task, what context do you need?
        """
        # Check cache first; serve cached bundles younger than 5 minutes
        if task_id in self.context_cache:
            bundle = self.context_cache[task_id]
            if (datetime.now() - bundle.last_updated).total_seconds() < 300:
                return bundle

        # Get the task from intent graph
        if not self.intent_graph or task_id not in self.intent_graph.nodes:
            return ContextBundle(task_id=task_id, task_title="Unknown")

        task_node = self.intent_graph.nodes[task_id]
        bundle = ContextBundle(task_id=task_id, task_title=task_node.title)

        # 1. Add parent context (why are we doing this?)
        for parent_id in task_node.parent_ids:
            if parent_id in self.intent_graph.nodes:
                parent = self.intent_graph.nodes[parent_id]
                bundle.add_item(ContextItem(
                    id=parent_id,
                    type="goal",
                    title=parent.title,
                    content=parent.description,
                    relevance_score=0.9,
                    source="intent_graph",
                    metadata={"rationale": parent.rationale}
                ))

        # 2. Add related decisions
        for related_id in task_node.related_ids:
            if related_id in self.intent_graph.nodes:
                related = self.intent_graph.nodes[related_id]
                if related.type.value == "decision":
                    bundle.add_item(ContextItem(
                        id=related_id,
                        type="decision",
                        title=related.title,
                        content=related.rationale,
                        relevance_score=0.85,
                        source="intent_graph",
                        metadata=related.metadata
                    ))

        # 3. Add linked artifacts (code/docs)
        for file_path in task_node.file_paths:
            # Get file metadata from semantic FS
            if self.semantic_fs and file_path in self.semantic_fs.files:
                metadata = self.semantic_fs.files[file_path]
                bundle.add_item(ContextItem(
                    id=file_path,
                    type="artifact",
                    title=Path(file_path).name,
                    content=metadata.summary or "",
                    relevance_score=0.95,  # Direct links are highly relevant
                    source="semantic_fs",
                    metadata={
                        "doc_type": metadata.document_type.value,
                        "purpose": metadata.purpose.value
                    }
                ))

        # 4. Add related documentation
        if self.doc_manager:
            for file_path in task_node.file_paths:
                # Find docs that reference this file
                if file_path in self.doc_manager.code_to_docs:
                    for doc_path in self.doc_manager.code_to_docs[file_path]:
                        if doc_path in self.doc_manager.documents:
                            doc = self.doc_manager.documents[doc_path]
                            bundle.add_item(ContextItem(
                                id=doc_path,
                                type="documentation",
                                title=Path(doc_path).name,
                                content=f"Documentation for {file_path}",
                                relevance_score=0.8,
                                source="doc_manager",
                                metadata={
                                    "doc_type": doc.doc_type.value,
                                    "sync_status": doc.sync_status.value
                                }
                            ))

        # 5. Add similar tasks (based on tags)
        if task_node.tags and self.intent_graph:
            for tag in task_node.tags:
                similar_tasks = self.intent_graph.find_by_tag(tag)
                for similar in similar_tasks[:3]:  # Top 3 similar tasks
                    if similar.id != task_id:
                        bundle.add_item(ContextItem(
                            id=similar.id,
                            type="similar_task",
                            title=similar.title,
                            content=similar.description,
                            relevance_score=0.6,
                            source="intent_graph",
                            metadata={"status": similar.status.value}
                        ))

        # Honor max_items: keep only the most relevant entries
        bundle.items = bundle.get_top_items(max_items)

        # Cache the result
        self.context_cache[task_id] = bundle
        return bundle
    def get_context_for_file(self, file_path: str) -> ContextBundle:
        """
        Get context for a specific file.

        When you open a file, what context do you need?
        - Why does this file exist? (intent graph)
        - What does it do? (semantic FS)
        - What documentation exists? (doc manager)
        - What tasks reference it? (intent graph)
        """
        bundle = ContextBundle(task_id=file_path, task_title=Path(file_path).name)

        # 1. Get semantic metadata
        if self.semantic_fs and file_path in self.semantic_fs.files:
            metadata = self.semantic_fs.files[file_path]
            bundle.add_item(ContextItem(
                id=file_path,
                type="file_metadata",
                title="File Information",
                content=metadata.summary or "",
                relevance_score=1.0,
                source="semantic_fs",
                metadata={
                    "doc_type": metadata.document_type.value,
                    "purpose": metadata.purpose.value,
                    "keywords": list(metadata.keywords)
                }
            ))

        # 2. Find tasks that reference this file
        if self.intent_graph:
            tasks = self.intent_graph.find_by_artifact(file_path)
            for task in tasks:
                bundle.add_item(ContextItem(
                    id=task.id,
                    type="related_task",
                    title=task.title,
                    content=task.description,
                    relevance_score=0.9,
                    source="intent_graph",
                    metadata={
                        "rationale": task.rationale,
                        "status": task.status.value
                    }
                ))

        # 3. Find related documentation
        if self.doc_manager and file_path in self.doc_manager.code_to_docs:
            for doc_path in self.doc_manager.code_to_docs[file_path]:
                if doc_path in self.doc_manager.documents:
                    doc = self.doc_manager.documents[doc_path]
                    bundle.add_item(ContextItem(
                        id=doc_path,
                        type="documentation",
                        title=Path(doc_path).name,
                        content=f"Documentation for {file_path}",
                        relevance_score=0.95,
                        source="doc_manager",
                        metadata={"sync_status": doc.sync_status.value}
                    ))

        # 4. Find related files
        if self.semantic_fs and file_path in self.semantic_fs.files:
            metadata = self.semantic_fs.files[file_path]
            for related_path in metadata.related_files:
                if related_path in self.semantic_fs.files:
                    related_meta = self.semantic_fs.files[related_path]
                    bundle.add_item(ContextItem(
                        id=related_path,
                        type="related_file",
                        title=Path(related_path).name,
                        content=related_meta.summary or "",
                        relevance_score=0.7,
                        source="semantic_fs"
                    ))

        return bundle
    def get_context_for_query(self, query: str) -> ContextBundle:
        """
        Get context for a natural language query.

        User asks: "How does authentication work?"
        System provides: relevant code, docs, decisions, etc.
        """
        bundle = ContextBundle(task_id=f"query:{query}", task_title=query)

        query_lower = query.lower()

        # 1. Search semantic FS for relevant files
        if self.semantic_fs:
            results = self.semantic_fs.search(query)
            for metadata in results[:5]:  # Top 5 results
                bundle.add_item(ContextItem(
                    id=metadata.file_path,
                    type="file",
                    title=Path(metadata.file_path).name,
                    content=metadata.summary or "",
                    relevance_score=0.8,
                    source="semantic_fs",
                    metadata={"doc_type": metadata.document_type.value}
                ))

        # 2. Search intent graph for relevant nodes
        if self.intent_graph:
            for node in self.intent_graph.nodes.values():
                score = 0.0

                # Match against title
                if query_lower in node.title.lower():
                    score += 0.5

                # Match against description
                if query_lower in node.description.lower():
                    score += 0.3

                # Match against rationale
                if query_lower in node.rationale.lower():
                    score += 0.2

                if score > 0.3:
                    bundle.add_item(ContextItem(
                        id=node.id,
                        type=node.type.value,
                        title=node.title,
                        content=node.description,
                        relevance_score=score,
                        source="intent_graph",
                        metadata={"rationale": node.rationale}
                    ))

        return bundle
    def get_current_context(self) -> ContextBundle:
        """
        Get context for "right now" - what's currently being worked on?

        Looks at:
        - Active tasks in intent graph
        - Recently modified files
        - Current goals
        """
        bundle = ContextBundle(task_id="current", task_title="Current Context")

        if self.intent_graph:
            # 1. Get active goals
            active_goals = self.intent_graph.get_active_goals()
            for goal in active_goals:
                bundle.add_item(ContextItem(
                    id=goal.id,
                    type="active_goal",
                    title=goal.title,
                    content=goal.description,
                    relevance_score=1.0,
                    source="intent_graph",
                    metadata={"rationale": goal.rationale}
                ))

            # 2. Get blocked tasks (need attention!)
            blocked = self.intent_graph.get_blocked_tasks()
            for task in blocked:
                bundle.add_item(ContextItem(
                    id=task.id,
                    type="blocked_task",
                    title=task.title,
                    content=task.description,
                    relevance_score=0.95,
                    source="intent_graph",
                    metadata={"blocker_count": len(task.blocked_by_ids)}
                ))

        # 3. Get recently modified files
        if self.semantic_fs:
            recent_files = sorted(
                self.semantic_fs.files.values(),
                key=lambda x: x.modified_at,
                reverse=True
            )[:5]

            for metadata in recent_files:
                bundle.add_item(ContextItem(
                    id=metadata.file_path,
                    type="recent_file",
                    title=Path(metadata.file_path).name,
                    content=metadata.summary or "",
                    relevance_score=0.8,
                    source="semantic_fs",
                    metadata={"modified_at": metadata.modified_at.isoformat()}
                ))

        # 4. Get out-of-sync docs (need updating!)
        if self.doc_manager:
            out_of_sync = [
                doc for doc in self.doc_manager.documents.values()
                if doc.sync_status.value == "out_of_sync"
            ]

            for doc in out_of_sync[:3]:
                bundle.add_item(ContextItem(
                    id=doc.file_path,
                    type="stale_doc",
                    title=Path(doc.file_path).name,
                    content="Documentation out of sync with code",
                    relevance_score=0.9,
                    source="doc_manager"
                ))

        return bundle

    def suggest_next_actions(self, task_id: str) -> List[Dict[str, Any]]:
        """
        Based on context, suggest what to do next.

        This is proactive intelligence - the system suggests next steps!
        """
        suggestions = []

        if not self.intent_graph or task_id not in self.intent_graph.nodes:
            return suggestions

        task = self.intent_graph.nodes[task_id]

        # 1. If the task has no linked artifacts, suggest creating them
        if not task.file_paths:
            suggestions.append({
                "action": "create_artifact",
                "description": "This task has no linked code or documentation. Consider creating or linking relevant files.",
                "priority": "medium"
            })

        # 2. If blocked, suggest addressing blockers
        if task.blocked_by_ids:
            suggestions.append({
                "action": "resolve_blockers",
                "description": f"This task is blocked by {len(task.blocked_by_ids)} items. Address blockers first.",
                "priority": "high"
            })

        # 3. If the task has child tasks, check their status
        if task.child_ids:
            children = [self.intent_graph.nodes[cid] for cid in task.child_ids if cid in self.intent_graph.nodes]
            pending_children = [c for c in children if c.status.value == "pending"]

            if pending_children:
                suggestions.append({
                    "action": "start_subtasks",
                    "description": f"{len(pending_children)} subtasks are pending. Start with the highest priority.",
                    "priority": "medium"
                })

        # 4. Check whether related docs are out of sync
        context = self.get_context_for_task(task_id)
        stale_docs = [item for item in context.items if item.type == "documentation"
                      and item.metadata.get("sync_status") == "out_of_sync"]

        if stale_docs:
            suggestions.append({
                "action": "update_docs",
                "description": f"{len(stale_docs)} related documents are out of sync. Update documentation.",
                "priority": "low"
            })

        return suggestions

    def export_context(self, bundle: ContextBundle, format: str = "markdown") -> str:
        """Export a context bundle in a readable format"""
        if format == "markdown":
            output = f"# Context: {bundle.task_title}\n\n"
            output += f"*Generated at {bundle.created_at.strftime('%Y-%m-%d %H:%M')}*\n\n"

            # Group by type
            by_type: Dict[str, List[ContextItem]] = {}
            for item in bundle.get_top_items(20):
                if item.type not in by_type:
                    by_type[item.type] = []
                by_type[item.type].append(item)

            for item_type, items in by_type.items():
                output += f"## {item_type.replace('_', ' ').title()}\n\n"
                for item in items:
                    output += f"### {item.title}\n"
                    if item.content:
                        output += f"{item.content}\n"
                    output += f"\n*Relevance: {item.relevance_score:.2f} | Source: {item.source}*\n\n"

            return output

        elif format == "json":
            return json.dumps({
                "task_id": bundle.task_id,
                "task_title": bundle.task_title,
                "created_at": bundle.created_at.isoformat(),
                "items": [
                    {
                        "id": item.id,
                        "type": item.type,
                        "title": item.title,
                        "content": item.content,
                        "relevance_score": item.relevance_score,
                        "source": item.source,
                        "metadata": item.metadata
                    }
                    for item in bundle.items
                ]
            }, indent=2)

        return str(bundle)


# Example usage
if __name__ == "__main__":
    # This would be initialized with actual systems
    # engine = ContextEngine(intent_graph, semantic_fs, doc_manager)

    # Get context for a task
    # context = engine.get_context_for_task("task-123")
    # print(engine.export_context(context))

    # Get current context
    # current = engine.get_current_context()
    # print("Currently working on:")
    # for item in current.get_top_items(5):
    #     print(f"  - {item.title} ({item.type})")

    print("Context Engine initialized")
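The context engine above leans on `ContextItem` and `ContextBundle` types defined earlier in the file; their exact fields are visible in the calls but not their definitions. A minimal, self-contained sketch of that bundle-and-ranking pattern (the dataclass shapes here are inferred from usage, not copied from the real module):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class ContextItem:
    id: str
    type: str
    title: str
    content: str
    relevance_score: float
    source: str
    metadata: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ContextBundle:
    task_id: str
    task_title: str
    items: List[ContextItem] = field(default_factory=list)

    def add_item(self, item: ContextItem) -> None:
        self.items.append(item)

    def get_top_items(self, n: int) -> List[ContextItem]:
        # Highest-relevance items first, which is the order export_context expects
        return sorted(self.items, key=lambda i: i.relevance_score, reverse=True)[:n]

bundle = ContextBundle(task_id="demo", task_title="Demo")
bundle.add_item(ContextItem("a", "file", "auth.py", "", 0.7, "semantic_fs"))
bundle.add_item(ContextItem("b", "active_goal", "Ship auth", "", 1.0, "intent_graph"))
print([i.id for i in bundle.get_top_items(2)])  # most relevant item first
```

Sorting at read time (in `get_top_items`) rather than insert time keeps `add_item` cheap while each engine method appends items in whatever order its sources return them.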
cognitive/intent_graph.py (new file, 372 lines)

"""
Intent Graph - The Core of Cognitive OS

Tracks goals, decisions, tasks, and their relationships.
Every action has a "why" attached. No more context loss.
"""

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Dict, List, Optional, Set, Any
from uuid import uuid4
import json


class IntentType(Enum):
    """Types of intent nodes in the graph"""
    GOAL = "goal"          # High-level objective
    TASK = "task"          # Specific action item
    DECISION = "decision"  # Choice made and why
    QUESTION = "question"  # Open question
    CONTEXT = "context"    # Background information
    ARTIFACT = "artifact"  # Code, doc, file created
    INSIGHT = "insight"    # Learning or realization
    BLOCKER = "blocker"    # Something preventing progress


class IntentStatus(Enum):
    """Status of an intent node"""
    ACTIVE = "active"
    COMPLETED = "completed"
    BLOCKED = "blocked"
    CANCELLED = "cancelled"
    PENDING = "pending"


@dataclass
class IntentNode:
    """A node in the intent graph"""
    id: str = field(default_factory=lambda: str(uuid4()))
    type: IntentType = IntentType.TASK
    title: str = ""
    description: str = ""
    status: IntentStatus = IntentStatus.PENDING
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)

    # The WHY - this is crucial
    rationale: str = ""

    # Relationships
    parent_ids: Set[str] = field(default_factory=set)
    child_ids: Set[str] = field(default_factory=set)
    related_ids: Set[str] = field(default_factory=set)
    blocks_ids: Set[str] = field(default_factory=set)      # This node blocks these
    blocked_by_ids: Set[str] = field(default_factory=set)  # This node is blocked by these

    # Linked artifacts
    file_paths: Set[str] = field(default_factory=set)
    commit_hashes: Set[str] = field(default_factory=set)
    urls: Set[str] = field(default_factory=set)

    # Metadata
    tags: Set[str] = field(default_factory=set)
    priority: int = 0  # Higher = more important
    effort_estimate: Optional[int] = None  # In minutes
    actual_effort: Optional[int] = None

    # Agent collaboration
    assigned_to: Optional[str] = None  # Agent or human
    created_by: Optional[str] = None

    # Custom metadata
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict:
        """Serialize to a dict"""
        return {
            'id': self.id,
            'type': self.type.value,
            'title': self.title,
            'description': self.description,
            'status': self.status.value,
            'created_at': self.created_at.isoformat(),
            'updated_at': self.updated_at.isoformat(),
            'rationale': self.rationale,
            'parent_ids': list(self.parent_ids),
            'child_ids': list(self.child_ids),
            'related_ids': list(self.related_ids),
            'blocks_ids': list(self.blocks_ids),
            'blocked_by_ids': list(self.blocked_by_ids),
            'file_paths': list(self.file_paths),
            'commit_hashes': list(self.commit_hashes),
            'urls': list(self.urls),
            'tags': list(self.tags),
            'priority': self.priority,
            'effort_estimate': self.effort_estimate,
            'actual_effort': self.actual_effort,
            'assigned_to': self.assigned_to,
            'created_by': self.created_by,
            'metadata': self.metadata
        }


class IntentGraph:
    """
    The Intent Graph - tracks what we're doing and why.

    This solves the biggest problem in AI-human collaboration:
    context loss. Every decision, task, and artifact is connected
    to its purpose and rationale.
    """

    def __init__(self):
        self.nodes: Dict[str, IntentNode] = {}
        self.created_at = datetime.now()
        self.updated_at = datetime.now()

    def add_node(self, node: IntentNode) -> IntentNode:
        """Add a node to the graph"""
        self.nodes[node.id] = node
        self.updated_at = datetime.now()
        return node

    def create_goal(self, title: str, description: str = "",
                    rationale: str = "", **kwargs) -> IntentNode:
        """Create a goal node"""
        node = IntentNode(
            type=IntentType.GOAL,
            title=title,
            description=description,
            rationale=rationale,
            status=IntentStatus.ACTIVE,
            **kwargs
        )
        return self.add_node(node)

    def create_task(self, title: str, parent_id: Optional[str] = None,
                    rationale: str = "", **kwargs) -> IntentNode:
        """Create a task node, optionally linked to a parent goal"""
        node = IntentNode(
            type=IntentType.TASK,
            title=title,
            rationale=rationale,
            **kwargs
        )
        if parent_id:
            node.parent_ids.add(parent_id)
            if parent_id in self.nodes:
                self.nodes[parent_id].child_ids.add(node.id)
        return self.add_node(node)

    def create_decision(self, title: str, rationale: str,
                        alternatives_considered: Optional[List[str]] = None,
                        **kwargs) -> IntentNode:
        """
        Create a decision node - this is CRITICAL.
        Always capture WHY a decision was made and what alternatives were considered.
        """
        metadata = kwargs.get('metadata', {})
        metadata['alternatives_considered'] = alternatives_considered or []
        kwargs['metadata'] = metadata

        node = IntentNode(
            type=IntentType.DECISION,
            title=title,
            rationale=rationale,
            status=IntentStatus.COMPLETED,
            **kwargs
        )
        return self.add_node(node)

    def link_nodes(self, from_id: str, to_id: str,
                   relationship: str = "related") -> None:
        """Link two nodes with a relationship"""
        if from_id not in self.nodes or to_id not in self.nodes:
            return

        if relationship == "parent":
            self.nodes[to_id].parent_ids.add(from_id)
            self.nodes[from_id].child_ids.add(to_id)
        elif relationship == "blocks":
            self.nodes[from_id].blocks_ids.add(to_id)
            self.nodes[to_id].blocked_by_ids.add(from_id)
        else:  # related
            self.nodes[from_id].related_ids.add(to_id)
            self.nodes[to_id].related_ids.add(from_id)

        self.updated_at = datetime.now()

    def link_artifact(self, node_id: str, artifact_path: str,
                      commit_hash: Optional[str] = None) -> None:
        """Link a code/doc artifact to an intent node"""
        if node_id in self.nodes:
            self.nodes[node_id].file_paths.add(artifact_path)
            if commit_hash:
                self.nodes[node_id].commit_hashes.add(commit_hash)
            self.updated_at = datetime.now()

    def get_context(self, node_id: str, depth: int = 2) -> Dict[str, Any]:
        """
        Get the full context for a node - parents, children, related items.
        This is what makes the system context-aware.
        """
        if node_id not in self.nodes:
            return {}

        context = {
            'node': self.nodes[node_id],
            'parents': [],
            'children': [],
            'related': [],
            'blockers': [],
            'artifacts': []
        }

        node = self.nodes[node_id]

        # Get parents
        for parent_id in node.parent_ids:
            if parent_id in self.nodes:
                context['parents'].append(self.nodes[parent_id])

        # Get children
        for child_id in node.child_ids:
            if child_id in self.nodes:
                context['children'].append(self.nodes[child_id])

        # Get related nodes
        for related_id in node.related_ids:
            if related_id in self.nodes:
                context['related'].append(self.nodes[related_id])

        # Get blockers
        for blocker_id in node.blocked_by_ids:
            if blocker_id in self.nodes:
                context['blockers'].append(self.nodes[blocker_id])

        # Get artifacts
        context['artifacts'] = {
            'files': list(node.file_paths),
            'commits': list(node.commit_hashes),
            'urls': list(node.urls)
        }

        return context

    def get_active_goals(self) -> List[IntentNode]:
        """Get all active goals - what are we trying to accomplish?"""
        return [
            node for node in self.nodes.values()
            if node.type == IntentType.GOAL and node.status == IntentStatus.ACTIVE
        ]

    def get_blocked_tasks(self) -> List[IntentNode]:
        """Find all blocked tasks - what needs unblocking?"""
        return [
            node for node in self.nodes.values()
            if node.status == IntentStatus.BLOCKED or len(node.blocked_by_ids) > 0
        ]

    def find_by_tag(self, tag: str) -> List[IntentNode]:
        """Find all nodes with a specific tag"""
        return [
            node for node in self.nodes.values()
            if tag in node.tags
        ]

    def find_by_artifact(self, file_path: str) -> List[IntentNode]:
        """Find all intent nodes related to a file - WHY does this file exist?"""
        return [
            node for node in self.nodes.values()
            if file_path in node.file_paths
        ]

    def export_json(self, file_path: str) -> None:
        """Export the entire graph to JSON"""
        data = {
            'created_at': self.created_at.isoformat(),
            'updated_at': self.updated_at.isoformat(),
            'nodes': {
                node_id: node.to_dict()
                for node_id, node in self.nodes.items()
            }
        }
        with open(file_path, 'w') as f:
            json.dump(data, f, indent=2)

    def import_json(self, file_path: str) -> None:
        """Import a graph from JSON"""
        with open(file_path, 'r') as f:
            data = json.load(f)

        # TODO: Implement full deserialization
        # This would parse the JSON and recreate all nodes
        pass

    def get_summary(self) -> str:
        """Get a human-readable summary of the current state"""
        goals = [n for n in self.nodes.values() if n.type == IntentType.GOAL]
        tasks = [n for n in self.nodes.values() if n.type == IntentType.TASK]
        decisions = [n for n in self.nodes.values() if n.type == IntentType.DECISION]

        active_goals = [g for g in goals if g.status == IntentStatus.ACTIVE]
        active_tasks = [t for t in tasks if t.status == IntentStatus.ACTIVE]
        blocked = self.get_blocked_tasks()

        summary = f"""
Intent Graph Summary
====================
Total Nodes: {len(self.nodes)}
Goals: {len(goals)} ({len(active_goals)} active)
Tasks: {len(tasks)} ({len(active_tasks)} active)
Decisions: {len(decisions)}
Blocked Items: {len(blocked)}

Active Goals:
"""
        for goal in active_goals:
            summary += f"\n  - {goal.title}"
            if goal.rationale:
                summary += f"\n    Why: {goal.rationale}"

        if blocked:
            summary += "\n\nBlocked Tasks:"
            for task in blocked:
                summary += f"\n  - {task.title}"
                if task.blocked_by_ids:
                    summary += f" (blocked by {len(task.blocked_by_ids)} items)"

        return summary


# Example usage
if __name__ == "__main__":
    # Create a sample intent graph
    graph = IntentGraph()

    # Add a goal
    goal = graph.create_goal(
        title="Build a smart document management system",
        rationale="Current file management is chaos. Downloads folder anarchy. Need semantic organization."
    )

    # Add tasks under that goal
    task1 = graph.create_task(
        title="Implement OCR for document scanning",
        parent_id=goal.id,
        rationale="Need to extract structured data from PDFs and images"
    )

    task2 = graph.create_task(
        title="Build auto-filing system",
        parent_id=goal.id,
        rationale="Documents should organize themselves based on content"
    )

    # Make a decision
    decision = graph.create_decision(
        title="Use Tesseract for OCR",
        rationale="Open source, well-maintained, good accuracy, supports multiple languages",
        alternatives_considered=[
            "Google Cloud Vision API (too expensive for local-first OS)",
            "AWS Textract (vendor lock-in)",
            "pytesseract (Python wrapper around Tesseract - good option)"
        ]
    )

    # Link the decision to the task
    graph.link_nodes(decision.id, task1.id, "related")

    print(graph.get_summary())
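`import_json` in `IntentGraph` is still a TODO: deserialization has to invert what `to_dict` does, turning JSON lists back into sets and enum value strings back into enum members. A self-contained sketch of that round-trip pattern, using a deliberately simplified node shape (a `Status` enum and a tag set) rather than the full `IntentNode`:

```python
import json
from enum import Enum
from typing import Set, Tuple

class Status(Enum):
    ACTIVE = "active"
    COMPLETED = "completed"

def export_node(title: str, status: Status, tags: Set[str]) -> str:
    # Sets and enums are not JSON-serializable; convert on the way out
    return json.dumps({"title": title, "status": status.value, "tags": sorted(tags)})

def import_node(payload: str) -> Tuple[str, Status, Set[str]]:
    # ...and invert the conversion on the way in
    data = json.loads(payload)
    return data["title"], Status(data["status"]), set(data["tags"])

blob = export_node("Build auto-filing", Status.ACTIVE, {"fs", "ocr"})
title, status, tags = import_node(blob)
print(title, status.value, sorted(tags))
```

A full `import_json` would apply the same inversions field by field (including `datetime.fromisoformat` for the timestamps) and then repopulate `self.nodes` keyed by each node's `id`.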
cognitive/living_docs.py (new file, 407 lines)

"""
Living Documents - Code-aware, self-updating documentation

The problem with traditional docs:
- They get out of sync with code
- They don't know what code they're documenting
- They can't update themselves
- They're disconnected from the actual system

Living Documents solve this by being:
- Code-aware: Understand what code they document
- Self-updating: Automatically update when code changes
- Context-linked: Connected to intent graph and semantic FS
- Smart: Know when they're out of date and need attention
"""

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from pathlib import Path
from typing import Dict, List, Optional, Set, Any
import re
import ast
import json


class DocType(Enum):
    """Types of living documents"""
    API_REFERENCE = "api_reference"
    ARCHITECTURE = "architecture"
    TUTORIAL = "tutorial"
    GUIDE = "guide"
    SPEC = "spec"
    README = "readme"
    CHANGELOG = "changelog"
    RUNBOOK = "runbook"


class SyncStatus(Enum):
    """Document sync status with code"""
    IN_SYNC = "in_sync"
    OUT_OF_SYNC = "out_of_sync"
    NEEDS_REVIEW = "needs_review"
    UNKNOWN = "unknown"


@dataclass
class CodeReference:
    """A reference to a piece of code"""
    file_path: str
    line_start: int
    line_end: int
    element_type: str  # function, class, method, variable, etc.
    element_name: str
    signature: Optional[str] = None
    docstring: Optional[str] = None
    last_modified: Optional[datetime] = None
    hash: Optional[str] = None  # Hash of the referenced code


@dataclass
class DocumentSection:
    """A section of a living document"""
    id: str
    title: str
    content: str
    code_references: List[CodeReference] = field(default_factory=list)
    last_updated: datetime = field(default_factory=datetime.now)
    auto_generated: bool = False  # Was this section auto-generated from code?


class LivingDocument:
    """
    A document that knows what code it's documenting and can update itself.

    Key features:
    - Tracks which code it documents
    - Detects when code changes
    - Can auto-update based on code changes
    - Understands document structure
    - Links to intent graph (WHY this doc exists)
    """

    def __init__(self, file_path: str, doc_type: DocType = DocType.GUIDE):
        self.file_path = file_path
        self.doc_type = doc_type
        self.sections: List[DocumentSection] = []
        self.code_references: Set[str] = set()  # All files referenced
        self.created_at = datetime.now()
        self.last_updated = datetime.now()
        self.last_sync_check = datetime.now()
        self.sync_status = SyncStatus.UNKNOWN
        self.intent_node_id: Optional[str] = None  # Link to intent graph
        self.metadata: Dict[str, Any] = {}

    def parse_document(self) -> None:
        """
        Parse the document and extract code references.

        Looks for patterns like:
        - `file.py:123` - reference to a specific line
        - ```python ... ``` - code blocks
        - @ref(file.py:ClassName.method) - explicit references
        """
        if not Path(self.file_path).exists():
            return

        with open(self.file_path, 'r') as f:
            content = f.read()

        # Extract code references
        # Pattern: file_path:line_number
        refs = re.findall(r'`?([a-zA-Z0-9_/.-]+\.py):(\d+)`?', content)
        for file_path, line_num in refs:
            self.code_references.add(file_path)

        # Extract code blocks
        code_blocks = re.findall(r'```(\w+)?\n(.*?)```', content, re.DOTALL)

        # TODO: Parse the document structure into sections
        # TODO: Link code blocks to actual files where possible

    def check_sync_status(self) -> SyncStatus:
        """
        Check whether the document is in sync with the code it references.

        Returns:
        - IN_SYNC: All referenced code matches
        - OUT_OF_SYNC: Referenced code has changed
        - NEEDS_REVIEW: Can't determine automatically
        """
        if not self.code_references:
            return SyncStatus.UNKNOWN

        # Check whether referenced files have been modified since the last update
        for ref_file in self.code_references:
            if not Path(ref_file).exists():
                return SyncStatus.OUT_OF_SYNC

            file_mtime = datetime.fromtimestamp(Path(ref_file).stat().st_mtime)
            if file_mtime > self.last_updated:
                return SyncStatus.OUT_OF_SYNC

        return SyncStatus.IN_SYNC

    def extract_code_info(self, file_path: str) -> List[CodeReference]:
        """
        Extract information about code elements from a file.

        For Python files, this extracts:
        - Classes and their methods
        - Functions
        - Docstrings
        - Signatures
        """
        if not file_path.endswith('.py'):
            return []

        try:
            with open(file_path, 'r') as f:
                content = f.read()
            tree = ast.parse(content)
        except Exception:
            return []

        references = []

        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                ref = CodeReference(
                    file_path=file_path,
                    line_start=node.lineno,
                    line_end=node.end_lineno or node.lineno,
                    element_type='function',
                    element_name=node.name,
                    signature=self._get_function_signature(node),
                    docstring=ast.get_docstring(node)
                )
                references.append(ref)

            elif isinstance(node, ast.ClassDef):
                ref = CodeReference(
                    file_path=file_path,
                    line_start=node.lineno,
                    line_end=node.end_lineno or node.lineno,
                    element_type='class',
                    element_name=node.name,
                    docstring=ast.get_docstring(node)
                )
                references.append(ref)

        return references

    def _get_function_signature(self, node: ast.FunctionDef) -> str:
        """Extract a function signature from an AST node"""
        args = []
        for arg in node.args.args:
            arg_str = arg.arg
            if arg.annotation:
                arg_str += f": {ast.unparse(arg.annotation)}"
            args.append(arg_str)

        return_type = ""
        if node.returns:
            return_type = f" -> {ast.unparse(node.returns)}"

        return f"{node.name}({', '.join(args)}){return_type}"

    def auto_generate_api_reference(self, code_file: str) -> str:
        """
        Auto-generate API reference documentation from code.

        This is the dream - docs that write themselves from code!
        """
        refs = self.extract_code_info(code_file)

        doc = f"# API Reference: {Path(code_file).stem}\n\n"
        doc += f"*Auto-generated on {datetime.now().strftime('%Y-%m-%d %H:%M')}*\n\n"

        # Group by type
        classes = [r for r in refs if r.element_type == 'class']
        functions = [r for r in refs if r.element_type == 'function']

        if classes:
            doc += "## Classes\n\n"
            for cls in classes:
                doc += f"### `{cls.element_name}`\n\n"
                if cls.docstring:
                    doc += f"{cls.docstring}\n\n"
                doc += f"*Defined in `{cls.file_path}:{cls.line_start}`*\n\n"

        if functions:
            doc += "## Functions\n\n"
            for func in functions:
                doc += f"### `{func.signature}`\n\n"
                if func.docstring:
                    doc += f"{func.docstring}\n\n"
                doc += f"*Defined in `{func.file_path}:{func.line_start}`*\n\n"

        return doc

def update_from_code(self, code_file: str) -> None:
|
||||||
|
"""
|
||||||
|
Update this document based on changes in referenced code.
|
||||||
|
|
||||||
|
This is where the "living" part comes in - the doc updates itself!
|
||||||
|
"""
|
||||||
|
if self.doc_type == DocType.API_REFERENCE:
|
||||||
|
new_content = self.auto_generate_api_reference(code_file)
|
||||||
|
with open(self.file_path, 'w') as f:
|
||||||
|
f.write(new_content)
|
||||||
|
self.last_updated = datetime.now()
|
||||||
|
self.sync_status = SyncStatus.IN_SYNC
|
||||||
|
|
||||||
|
def add_code_reference(self, file_path: str, line_num: Optional[int] = None,
|
||||||
|
element_name: Optional[str] = None) -> str:
|
||||||
|
"""
|
||||||
|
Add a reference to code in the document.
|
||||||
|
|
||||||
|
Returns the markdown to insert in the document.
|
||||||
|
"""
|
||||||
|
self.code_references.add(file_path)
|
||||||
|
|
||||||
|
if element_name:
|
||||||
|
return f"`{file_path}:{element_name}`"
|
||||||
|
elif line_num:
|
||||||
|
return f"`{file_path}:{line_num}`"
|
||||||
|
else:
|
||||||
|
return f"`{file_path}`"
|
||||||
|
|
||||||
|
|
||||||
|
class DocManager:
|
||||||
|
"""
|
||||||
|
Manages all living documents in the system.
|
||||||
|
|
||||||
|
Responsibilities:
|
||||||
|
- Track all living documents
|
||||||
|
- Monitor code changes
|
||||||
|
- Trigger doc updates when code changes
|
||||||
|
- Generate new docs from code
|
||||||
|
- Keep docs in sync
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, index_path: str = ".living_docs_index.json"):
|
||||||
|
self.index_path = index_path
|
||||||
|
self.documents: Dict[str, LivingDocument] = {}
|
||||||
|
self.code_to_docs: Dict[str, Set[str]] = {} # Maps code files to docs that reference them
|
||||||
|
|
||||||
|
def register_document(self, doc: LivingDocument) -> None:
|
||||||
|
"""Register a living document"""
|
||||||
|
self.documents[doc.file_path] = doc
|
||||||
|
|
||||||
|
# Build reverse index: code file -> docs
|
||||||
|
for code_file in doc.code_references:
|
||||||
|
if code_file not in self.code_to_docs:
|
||||||
|
self.code_to_docs[code_file] = set()
|
||||||
|
self.code_to_docs[code_file].add(doc.file_path)
|
||||||
|
|
||||||
|
def check_all_docs(self) -> List[LivingDocument]:
|
||||||
|
"""
|
||||||
|
Check sync status of all documents.
|
||||||
|
|
||||||
|
Returns list of out-of-sync documents.
|
||||||
|
"""
|
||||||
|
out_of_sync = []
|
||||||
|
for doc in self.documents.values():
|
||||||
|
status = doc.check_sync_status()
|
||||||
|
doc.sync_status = status
|
||||||
|
doc.last_sync_check = datetime.now()
|
||||||
|
|
||||||
|
if status == SyncStatus.OUT_OF_SYNC:
|
||||||
|
out_of_sync.append(doc)
|
||||||
|
|
||||||
|
return out_of_sync
|
||||||
|
|
||||||
|
def on_code_change(self, code_file: str) -> List[str]:
|
||||||
|
"""
|
||||||
|
Handle a code file change - find and update affected docs.
|
||||||
|
|
||||||
|
Returns list of documents that were updated.
|
||||||
|
"""
|
||||||
|
if code_file not in self.code_to_docs:
|
||||||
|
return []
|
||||||
|
|
||||||
|
updated_docs = []
|
||||||
|
for doc_path in self.code_to_docs[code_file]:
|
||||||
|
if doc_path in self.documents:
|
||||||
|
doc = self.documents[doc_path]
|
||||||
|
doc.sync_status = SyncStatus.OUT_OF_SYNC
|
||||||
|
# Could auto-update here if configured
|
||||||
|
updated_docs.append(doc_path)
|
||||||
|
|
||||||
|
return updated_docs
|
||||||
|
|
||||||
|
def generate_doc(self, doc_type: DocType, code_file: str,
|
||||||
|
output_path: str) -> LivingDocument:
|
||||||
|
"""
|
||||||
|
Generate a new living document from code.
|
||||||
|
|
||||||
|
This is the magic - auto-generating docs!
|
||||||
|
"""
|
||||||
|
doc = LivingDocument(output_path, doc_type)
|
||||||
|
|
||||||
|
if doc_type == DocType.API_REFERENCE:
|
||||||
|
content = doc.auto_generate_api_reference(code_file)
|
||||||
|
with open(output_path, 'w') as f:
|
||||||
|
f.write(content)
|
||||||
|
|
||||||
|
doc.code_references.add(code_file)
|
||||||
|
self.register_document(doc)
|
||||||
|
|
||||||
|
return doc
|
||||||
|
|
||||||
|
def get_stale_docs(self, max_age_days: int = 30) -> List[LivingDocument]:
|
||||||
|
"""Find documents that haven't been updated in a while"""
|
||||||
|
cutoff = datetime.now().timestamp() - (max_age_days * 24 * 60 * 60)
|
||||||
|
stale = []
|
||||||
|
|
||||||
|
for doc in self.documents.values():
|
||||||
|
if doc.last_updated.timestamp() < cutoff:
|
||||||
|
stale.append(doc)
|
||||||
|
|
||||||
|
return stale
|
||||||
|
|
||||||
|
def export_index(self) -> None:
|
||||||
|
"""Save the document index"""
|
||||||
|
data = {
|
||||||
|
'documents': {},
|
||||||
|
'code_to_docs': {
|
||||||
|
k: list(v) for k, v in self.code_to_docs.items()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for path, doc in self.documents.items():
|
||||||
|
data['documents'][path] = {
|
||||||
|
'file_path': doc.file_path,
|
||||||
|
'doc_type': doc.doc_type.value,
|
||||||
|
'code_references': list(doc.code_references),
|
||||||
|
'sync_status': doc.sync_status.value,
|
||||||
|
'last_updated': doc.last_updated.isoformat(),
|
||||||
|
'intent_node_id': doc.intent_node_id
|
||||||
|
}
|
||||||
|
|
||||||
|
with open(self.index_path, 'w') as f:
|
||||||
|
json.dump(data, f, indent=2)
|
||||||
|
|
||||||
|
|
||||||
|
# Example usage
|
||||||
|
if __name__ == "__main__":
|
||||||
|
# Create a document manager
|
||||||
|
manager = DocManager()
|
||||||
|
|
||||||
|
# Generate API docs from a code file
|
||||||
|
# doc = manager.generate_doc(
|
||||||
|
# doc_type=DocType.API_REFERENCE,
|
||||||
|
# code_file="cognitive/intent_graph.py",
|
||||||
|
# output_path="docs/api/intent_graph.md"
|
||||||
|
# )
|
||||||
|
|
||||||
|
# Check all docs for sync status
|
||||||
|
# out_of_sync = manager.check_all_docs()
|
||||||
|
# if out_of_sync:
|
||||||
|
# print(f"Found {len(out_of_sync)} out-of-sync documents:")
|
||||||
|
# for doc in out_of_sync:
|
||||||
|
# print(f" - {doc.file_path}")
|
||||||
|
|
||||||
|
print("Living Documents system initialized")
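The reverse index that `register_document` builds (`code_to_docs`) is what makes `on_code_change` cheap: one dictionary lookup finds every affected doc. A minimal standalone sketch of that pattern, with hypothetical doc/code paths for illustration:

```python
code_to_docs = {}

def register(doc_path, code_references):
    # Build the reverse index: each referenced code file maps to the docs citing it.
    for code_file in code_references:
        code_to_docs.setdefault(code_file, set()).add(doc_path)

def on_code_change(code_file):
    # A single dict lookup finds every doc affected by this change.
    return sorted(code_to_docs.get(code_file, set()))

register("docs/api/intent_graph.md", ["cognitive/intent_graph.py"])
register("docs/architecture.md", ["cognitive/intent_graph.py", "cognitive/semantic_fs.py"])

print(on_code_change("cognitive/intent_graph.py"))
# → ['docs/api/intent_graph.md', 'docs/architecture.md']
```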
185 cognitive/quickstart.py Executable file
@@ -0,0 +1,185 @@
#!/usr/bin/env python3
"""
Cognitive OS - Quick Start Script

This demonstrates the Cognitive OS in action.
Run this to see what it can do!
"""

from cognitive import CognitiveOS, AgentRole, DocumentTemplate


def main():
    print("\n" + "=" * 70)
    print(" " * 20 + "COGNITIVE OS - QUICK START")
    print("=" * 70 + "\n")

    print("Initializing Cognitive Operating System...")
    cog = CognitiveOS(workspace_path=".")
    print("✓ Cognitive OS initialized\n")

    # Demo 1: Intent Graph
    print("-" * 70)
    print("DEMO 1: Intent Graph - Tracking Goals, Tasks, and WHY")
    print("-" * 70)

    goal = cog.create_goal(
        title="Build a smart document management system",
        description="Create a system that understands documents and organizes them automatically",
        rationale="Current file management is chaos. Downloads folder anarchy. Need semantic organization."
    )

    task1 = cog.create_task(
        "Implement OCR for document scanning",
        goal_id=goal.id,
        rationale="Need to extract structured data from PDFs and images"
    )

    task2 = cog.create_task(
        "Build auto-filing system",
        goal_id=goal.id,
        rationale="Documents should organize themselves based on content"
    )

    decision = cog.intent_graph.create_decision(
        title="Use Tesseract for OCR",
        rationale="Open source, well-maintained, good accuracy, supports multiple languages",
        alternatives_considered=[
            "Google Cloud Vision API (too expensive for local-first OS)",
            "AWS Textract (vendor lock-in)",
            "Azure Computer Vision (vendor lock-in)"
        ]
    )

    cog.intent_graph.link_nodes(decision.id, task1.id, "related")
    print()

    # Demo 2: Semantic File System
    print("-" * 70)
    print("DEMO 2: Semantic File System - Files That Know What They Are")
    print("-" * 70)

    print("\nExample: If you had a resume in downloads...")
    print("  Traditional: ~/Downloads/john_resume_final_v2_FINAL.pdf")
    print("  Cognitive OS suggests: documents/career/resumes/john_resume_final_v2_FINAL.pdf")
    print("\nBased on content analysis, not filename!")
    print()

    # Demo 3: Context Engine
    print("-" * 70)
    print("DEMO 3: Context Engine - Right Info at Right Time")
    print("-" * 70)

    print(f"\nGetting context for task: '{task1.title}'")
    context = cog.get_context(task_id=task1.id)

    print("\nRelevant context:")
    for item in context.get_top_items(5):
        print(f"  [{item.type:15s}] {item.title}")
        if item.metadata.get('rationale'):
            print(f"      → Why: {item.metadata['rationale']}")

    print()

    # Demo 4: Agent Coordination
    print("-" * 70)
    print("DEMO 4: Agent Coordination - Multi-Agent Collaboration")
    print("-" * 70)

    from cognitive.agent_coordination import AgentInfo, HandoffType

    # Register agents
    coder = AgentInfo(name="CodeWriter", role=AgentRole.CODER)
    reviewer = AgentInfo(name="CodeReviewer", role=AgentRole.REVIEWER)

    cog.agent_coordinator.register_agent(coder)
    cog.agent_coordinator.register_agent(reviewer)

    print(f"\n✓ Registered agent: {coder.name} ({coder.role.value})")
    print(f"✓ Registered agent: {reviewer.name} ({reviewer.role.value})")

    # Create collaboration session
    session = cog.agent_coordinator.create_session(
        goal="Implement OCR feature",
        description="Add OCR capability to document processor"
    )

    print(f"\n✓ Created collaboration session: {session.goal}")

    # Assign task
    cog.agent_coordinator.assign_task(task1.id, coder.id)
    print(f"✓ Assigned task to {coder.name}")

    # Create handoff
    handoff = cog.agent_coordinator.create_handoff(
        from_agent_id=coder.id,
        to_agent_id=reviewer.id,
        task_id=task1.id,
        handoff_type=HandoffType.REVIEW,
        message="OCR implementation complete, ready for review"
    )

    print(f"✓ Created handoff: {coder.name} → {reviewer.name} (for review)")
    print()

    # Demo 5: Smart Documents
    print("-" * 70)
    print("DEMO 5: Smart Documents - Templates and Auto-Formatting")
    print("-" * 70)

    print("\nAvailable templates:")
    print("  ✓ ATS-friendly Resume (beats applicant tracking systems)")
    print("  ✓ Business Plan (executive summary, financials, market analysis)")
    print("  ✓ Meeting Notes (structured with action items)")
    print("  ✓ Technical Spec (architecture, requirements, API docs)")
    print("  ✓ And more...")

    print("\nDocuments can:")
    print("  • Extract text via OCR from images/PDFs")
    print("  • Identify what type of document they are")
    print("  • Auto-format for specific purposes (ATS, business, etc.)")
    print("  • Organize themselves into correct folders")
    print()

    # Show overall state
    print("-" * 70)
    print("CURRENT STATE")
    print("-" * 70)
    print(cog.intent_graph.get_summary())

    # Next steps
    print("\n" + "=" * 70)
    print("NEXT STEPS")
    print("=" * 70)
    print("""
1. Try it yourself:
   from cognitive import CognitiveOS
   cog = CognitiveOS()
   goal = cog.create_goal("Your goal here", rationale="Why you're doing this")

2. Process a document:
   cog.process_new_file("path/to/your/document.pdf")

3. Get context for your work:
   context = cog.get_context(query="What am I working on?")

4. Check the full usage guide:
   See cognitive/USAGE.md for complete examples

5. Integrate with your workflow:
   - Connect to your IDE
   - Watch your downloads folder
   - Link to git commits
   - Coordinate multiple agents

This is what AI collaboration should have been from day one.
No more context loss. No more file chaos. No more docs out of sync.

Welcome to the Cognitive OS.
""")

    print("=" * 70 + "\n")


if __name__ == "__main__":
    main()
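The handoff created in Demo 4 is the core of the coordination protocol: a typed record of who hands work to whom, for which task, and why. A standalone sketch of that shape (names here are hypothetical stand-ins, not the real `cognitive.agent_coordination` API):

```python
from dataclasses import dataclass
from enum import Enum

class HandoffType(Enum):
    REVIEW = "review"

@dataclass
class Handoff:
    from_agent: str
    to_agent: str
    task_id: str
    handoff_type: HandoffType
    message: str = ""

handoffs = []  # shared log so both agents see the same history

def create_handoff(from_agent, to_agent, task_id, handoff_type, message=""):
    h = Handoff(from_agent, to_agent, task_id, handoff_type, message)
    handoffs.append(h)
    return h

h = create_handoff("CodeWriter", "CodeReviewer", "task-1", HandoffType.REVIEW,
                   "OCR implementation complete, ready for review")
print(f"{h.from_agent} → {h.to_agent} ({h.handoff_type.value})")
# → CodeWriter → CodeReviewer (review)
```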
25 cognitive/requirements.txt Normal file
@@ -0,0 +1,25 @@
# Cognitive OS Requirements

# Optional dependencies for enhanced functionality
# Install with: pip install -r requirements.txt

# For PDF processing
pdfplumber>=0.9.0
PyPDF2>=3.0.0

# For image OCR
pytesseract>=0.3.10
Pillow>=10.0.0

# For Word document processing
python-docx>=0.8.11

# For file watching (auto-organization)
watchdog>=3.0.0

# Note: Core cognitive functionality works without these dependencies.
# They're only needed for:
# - PDF text extraction (pdfplumber, PyPDF2)
# - OCR from images (pytesseract, Pillow)
# - Word document processing (python-docx)
# - Auto-organizing new files (watchdog)
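The note above says the core works without these packages. A minimal sketch of how such graceful degradation can be probed at runtime (the `optional_features` helper is illustrative, not part of the cognitive package):

```python
import importlib.util

def optional_features():
    """Report which optional capabilities are available in this environment."""
    probes = {
        'pdf_extraction': 'pdfplumber',
        'image_ocr': 'pytesseract',
        'word_documents': 'docx',      # distribution name: python-docx
        'file_watching': 'watchdog',
    }
    # find_spec checks availability without importing (and without crashing
    # when a package is absent), so core features can simply skip a capability.
    return {feature: importlib.util.find_spec(module) is not None
            for feature, module in probes.items()}

print(optional_features())
```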
551 cognitive/semantic_fs.py Normal file
@@ -0,0 +1,551 @@
"""
Semantic File System - Auto-organizing file management

No more downloads folder chaos. Files organize themselves based on:
- Content (what's in them)
- Purpose (why they exist)
- Context (what they're related to)
- Usage patterns (how they're accessed)

This is what file management should have been from the start.
"""

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from pathlib import Path
from typing import Dict, List, Optional, Set, Any
import hashlib
import mimetypes
import json
import re


class DocumentType(Enum):
    """Semantic document types - not just file extensions"""
    RESUME = "resume"
    COVER_LETTER = "cover_letter"
    BUSINESS_PLAN = "business_plan"
    TECHNICAL_SPEC = "technical_spec"
    MEETING_NOTES = "meeting_notes"
    FINANCIAL_DOC = "financial_doc"
    CONTRACT = "contract"
    RESEARCH_PAPER = "research_paper"
    CODE = "code"
    DATA = "data"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"
    ARCHIVE = "archive"
    CONFIG = "config"
    DOCUMENTATION = "documentation"
    PRESENTATION = "presentation"
    SPREADSHEET = "spreadsheet"
    EMAIL = "email"
    CHAT_LOG = "chat_log"
    UNKNOWN = "unknown"


class DocumentPurpose(Enum):
    """Why does this document exist?"""
    REFERENCE = "reference"          # For looking things up
    ACTIVE_WORK = "active_work"      # Currently working on
    ARCHIVE = "archive"              # Historical record
    TEMPLATE = "template"            # To be copied/used as starting point
    COLLABORATION = "collaboration"  # Shared with others
    PERSONAL = "personal"            # Just for me
    DELIVERABLE = "deliverable"      # To be sent to someone
    INPUT = "input"                  # Source material for something else
    OUTPUT = "output"                # Result of a process


@dataclass
class SemanticMetadata:
    """Rich metadata about a file"""
    # Basic info
    file_path: str
    file_hash: str
    file_size: int
    mime_type: str
    created_at: datetime
    modified_at: datetime
    last_accessed: datetime

    # Semantic classification
    document_type: DocumentType = DocumentType.UNKNOWN
    purpose: DocumentPurpose = DocumentPurpose.REFERENCE
    confidence: float = 0.0  # Confidence in classification

    # Content analysis
    title: Optional[str] = None
    summary: Optional[str] = None
    keywords: Set[str] = field(default_factory=set)
    entities: Dict[str, List[str]] = field(default_factory=dict)  # people, orgs, dates, etc.

    # Relationships
    related_files: Set[str] = field(default_factory=set)
    parent_project: Optional[str] = None
    tags: Set[str] = field(default_factory=set)

    # Usage patterns
    access_count: int = 0
    edit_count: int = 0
    share_count: int = 0

    # Intent graph link
    intent_node_ids: Set[str] = field(default_factory=set)

    # Custom metadata
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {
            'file_path': self.file_path,
            'file_hash': self.file_hash,
            'file_size': self.file_size,
            'mime_type': self.mime_type,
            'created_at': self.created_at.isoformat(),
            'modified_at': self.modified_at.isoformat(),
            'last_accessed': self.last_accessed.isoformat(),
            'document_type': self.document_type.value,
            'purpose': self.purpose.value,
            'confidence': self.confidence,
            'title': self.title,
            'summary': self.summary,
            'keywords': list(self.keywords),
            'entities': self.entities,
            'related_files': list(self.related_files),
            'parent_project': self.parent_project,
            'tags': list(self.tags),
            'access_count': self.access_count,
            'edit_count': self.edit_count,
            'share_count': self.share_count,
            'intent_node_ids': list(self.intent_node_ids),
            'metadata': self.metadata
        }

class SemanticFileSystem:
    """
    A file system that understands what files ARE, not just where they're stored.

    Key features:
    - Auto-classification based on content
    - Semantic search (find by purpose, not just name)
    - Auto-organization (files suggest where they belong)
    - Relationship tracking (what's related to what)
    - Intent-aware (files know why they exist)
    """

    def __init__(self, index_path: str = ".semantic_fs_index.json"):
        self.index_path = index_path
        self.files: Dict[str, SemanticMetadata] = {}
        self.load_index()

    def load_index(self):
        """Load the semantic index from disk"""
        try:
            if Path(self.index_path).exists():
                with open(self.index_path, 'r') as f:
                    # TODO: Implement full deserialization
                    pass
        except Exception as e:
            print(f"Error loading index: {e}")

    def save_index(self):
        """Save the semantic index to disk"""
        data = {
            'files': {
                path: metadata.to_dict()
                for path, metadata in self.files.items()
            }
        }
        with open(self.index_path, 'w') as f:
            json.dump(data, f, indent=2)

    def analyze_file(self, file_path: str) -> SemanticMetadata:
        """
        Analyze a file and extract semantic metadata.
        This is where the magic happens - understanding what a file IS.
        """
        path = Path(file_path)
        if not path.exists():
            raise FileNotFoundError(f"File not found: {file_path}")

        # Basic file info
        stat = path.stat()
        mime_type, _ = mimetypes.guess_type(file_path)

        # Compute hash
        with open(file_path, 'rb') as f:
            file_hash = hashlib.sha256(f.read()).hexdigest()

        metadata = SemanticMetadata(
            file_path=str(path.absolute()),
            file_hash=file_hash,
            file_size=stat.st_size,
            mime_type=mime_type or "application/octet-stream",
            created_at=datetime.fromtimestamp(stat.st_ctime),
            modified_at=datetime.fromtimestamp(stat.st_mtime),
            last_accessed=datetime.fromtimestamp(stat.st_atime)
        )

        # Classify the document
        doc_type, confidence = self._classify_document(file_path, mime_type)
        metadata.document_type = doc_type
        metadata.confidence = confidence

        # Extract content if it's text-based
        if self._is_text_file(mime_type):
            content = self._extract_text(file_path)
            metadata.keywords = self._extract_keywords(content)
            metadata.entities = self._extract_entities(content)
            metadata.title = self._extract_title(content, path.name)
            metadata.summary = self._generate_summary(content)

        # Infer purpose based on location and type
        metadata.purpose = self._infer_purpose(file_path, doc_type)

        return metadata

    def _classify_document(self, file_path: str, mime_type: Optional[str]) -> tuple[DocumentType, float]:
        """
        Classify document based on content and structure.
        Returns (DocumentType, confidence_score)
        """
        path = Path(file_path)
        extension = path.suffix.lower()

        # Extension-based classification (basic)
        ext_map = {
            '.py': DocumentType.CODE,
            '.js': DocumentType.CODE,
            '.ts': DocumentType.CODE,
            '.java': DocumentType.CODE,
            '.cpp': DocumentType.CODE,
            '.c': DocumentType.CODE,
            '.rs': DocumentType.CODE,
            '.go': DocumentType.CODE,
            '.pdf': DocumentType.UNKNOWN,   # Need content analysis
            '.docx': DocumentType.UNKNOWN,  # Need content analysis
            '.doc': DocumentType.UNKNOWN,
            '.txt': DocumentType.UNKNOWN,
            '.md': DocumentType.DOCUMENTATION,
            '.csv': DocumentType.DATA,
            '.json': DocumentType.DATA,
            '.xml': DocumentType.DATA,
            '.yaml': DocumentType.CONFIG,
            '.yml': DocumentType.CONFIG,
            '.png': DocumentType.IMAGE,
            '.jpg': DocumentType.IMAGE,
            '.jpeg': DocumentType.IMAGE,
            '.gif': DocumentType.IMAGE,
            '.mp4': DocumentType.VIDEO,
            '.mp3': DocumentType.AUDIO,
            '.zip': DocumentType.ARCHIVE,
            '.tar': DocumentType.ARCHIVE,
            '.gz': DocumentType.ARCHIVE,
            '.pptx': DocumentType.PRESENTATION,
            '.xlsx': DocumentType.SPREADSHEET,
        }

        if extension in ext_map:
            doc_type = ext_map[extension]
            if doc_type != DocumentType.UNKNOWN:
                return doc_type, 0.8

        # Content-based classification for unknown types
        if self._is_text_file(mime_type):
            content = self._extract_text(file_path)
            return self._classify_by_content(content, path.name)

        return DocumentType.UNKNOWN, 0.0

    def _classify_by_content(self, content: str, filename: str) -> tuple[DocumentType, float]:
        """Classify document by analyzing its content"""
        content_lower = content.lower()
        filename_lower = filename.lower()

        # Resume detection
        resume_keywords = ['resume', 'curriculum vitae', 'cv', 'experience', 'education', 'skills']
        resume_score = sum(1 for kw in resume_keywords if kw in content_lower or kw in filename_lower)
        if resume_score >= 3:
            return DocumentType.RESUME, min(0.9, 0.3 * resume_score)

        # Cover letter
        if ('dear' in content_lower and 'sincerely' in content_lower) or 'cover letter' in filename_lower:
            return DocumentType.COVER_LETTER, 0.7

        # Business plan
        business_keywords = ['executive summary', 'market analysis', 'financial projections', 'business model']
        if sum(1 for kw in business_keywords if kw in content_lower) >= 2:
            return DocumentType.BUSINESS_PLAN, 0.8

        # Technical spec
        tech_keywords = ['architecture', 'requirements', 'specification', 'api', 'implementation']
        if sum(1 for kw in tech_keywords if kw in content_lower) >= 2:
            return DocumentType.TECHNICAL_SPEC, 0.7

        # Meeting notes
        meeting_keywords = ['meeting', 'attendees', 'action items', 'agenda']
        if sum(1 for kw in meeting_keywords if kw in content_lower) >= 2:
            return DocumentType.MEETING_NOTES, 0.7

        return DocumentType.UNKNOWN, 0.0

    def _infer_purpose(self, file_path: str, doc_type: DocumentType) -> DocumentPurpose:
        """Infer why this file exists based on location and type"""
        path = Path(file_path)
        path_lower = str(path).lower()

        # Location-based inference
        if 'download' in path_lower:
            return DocumentPurpose.INPUT
        if 'archive' in path_lower or 'backup' in path_lower:
            return DocumentPurpose.ARCHIVE
        if 'template' in path_lower:
            return DocumentPurpose.TEMPLATE
        if 'draft' in path_lower or 'wip' in path_lower:
            return DocumentPurpose.ACTIVE_WORK
        if 'output' in path_lower or 'export' in path_lower:
            return DocumentPurpose.OUTPUT

        # Type-based inference
        if doc_type == DocumentType.RESUME:
            return DocumentPurpose.DELIVERABLE
        if doc_type == DocumentType.MEETING_NOTES:
            return DocumentPurpose.REFERENCE

        return DocumentPurpose.REFERENCE

def _is_text_file(self, mime_type: Optional[str]) -> bool:
|
||||||
|
"""Check if file is text-based"""
|
||||||
|
if not mime_type:
|
||||||
|
return False
|
||||||
|
return mime_type.startswith('text/') or mime_type in [
|
||||||
|
'application/json',
|
||||||
|
'application/xml',
|
||||||
|
'application/javascript'
|
||||||
|
]
|
||||||
|
|
||||||
|
def _extract_text(self, file_path: str) -> str:
|
||||||
|
"""Extract text content from file"""
|
||||||
|
try:
|
||||||
|
with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
|
||||||
|
return f.read()
|
||||||
|
except Exception:
|
||||||
|
return ""
|
||||||
|
|
||||||
|
def _extract_keywords(self, content: str, max_keywords: int = 20) -> Set[str]:
|
||||||
|
"""Extract important keywords from content"""
|
||||||
|
# Simple keyword extraction - in production, use TF-IDF or similar
|
||||||
|
words = re.findall(r'\b[a-z]{4,}\b', content.lower())
|
||||||
|
|
||||||
|
# Remove common words
|
||||||
|
stop_words = {'that', 'this', 'with', 'from', 'have', 'been', 'will', 'your', 'their'}
|
||||||
|
words = [w for w in words if w not in stop_words]
|
||||||
|
|
||||||
|
# Count frequency
|
||||||
|
word_freq = {}
|
||||||
|
for word in words:
|
||||||
|
word_freq[word] = word_freq.get(word, 0) + 1
|
||||||
|
|
||||||
|
# Get top keywords
|
||||||
|
top_words = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)[:max_keywords]
|
||||||
|
return set(word for word, _ in top_words)
|
||||||
|
|
||||||
|
def _extract_entities(self, content: str) -> Dict[str, List[str]]:
|
||||||
|
"""Extract named entities (people, places, orgs, dates, etc.)"""
|
||||||
|
# Simplified entity extraction - in production, use NER
|
||||||
|
entities = {
|
||||||
|
'emails': [],
|
||||||
|
'urls': [],
|
||||||
|
'dates': [],
|
||||||
|
'phone_numbers': []
|
||||||
|
}
|
||||||
|
|
||||||
|
# Extract emails
|
||||||
|
entities['emails'] = re.findall(r'\b[\w.-]+@[\w.-]+\.\w+\b', content)
|
||||||
|
|
||||||
|
# Extract URLs
|
||||||
|
entities['urls'] = re.findall(r'https?://[^\s]+', content)
|
||||||
|
|
||||||
|
# Extract dates (simple patterns)
|
||||||
|
entities['dates'] = re.findall(r'\b\d{1,2}/\d{1,2}/\d{2,4}\b', content)
|
||||||
|
|
||||||
|
# Extract phone numbers (simple pattern)
|
||||||
|
entities['phone_numbers'] = re.findall(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', content)
|
||||||
|
|
||||||
|
return entities
|
||||||
|
|
||||||
|
def _extract_title(self, content: str, filename: str) -> str:
|
||||||
|
"""Extract or infer document title"""
|
||||||
|
lines = content.split('\n')
|
||||||
|
|
||||||
|
# Look for common title patterns
|
||||||
|
for line in lines[:10]: # Check first 10 lines
|
||||||
|
line = line.strip()
|
||||||
|
if not line:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Markdown heading
|
||||||
|
if line.startswith('# '):
|
||||||
|
return line[2:].strip()
|
||||||
|
|
||||||
|
# If it's a short line at the start, might be a title
|
||||||
|
if len(line) < 100 and len(line) > 5:
|
||||||
|
return line
|
||||||
|
|
||||||
|
# Fall back to filename
|
||||||
|
return Path(filename).stem.replace('_', ' ').replace('-', ' ').title()
|
||||||
|
|
||||||
|
def _generate_summary(self, content: str, max_length: int = 200) -> str:
|
||||||
|
"""Generate a brief summary of the content"""
|
||||||
|
# Simple summary - first few sentences
|
||||||
|
sentences = re.split(r'[.!?]+', content)
|
||||||
|
summary = ""
|
||||||
|
for sentence in sentences:
|
||||||
|
sentence = sentence.strip()
|
||||||
|
if not sentence:
|
||||||
|
continue
|
||||||
|
if len(summary) + len(sentence) > max_length:
|
||||||
|
break
|
||||||
|
summary += sentence + ". "
|
||||||
|
return summary.strip()
|
||||||
|
|
||||||
|
def index_file(self, file_path: str) -> SemanticMetadata:
|
||||||
|
"""Index a file in the semantic file system"""
|
||||||
|
metadata = self.analyze_file(file_path)
|
||||||
|
self.files[metadata.file_path] = metadata
|
||||||
|
self.save_index()
|
||||||
|
return metadata
|
||||||
|
|
||||||
|
    def search(self, query: str, filters: Optional[Dict] = None) -> List[SemanticMetadata]:
        """
        Semantic search - find files by content, purpose, type, etc.
        Not just filename matching!
        """
        results = []
        query_lower = query.lower()

        for metadata in self.files.values():
            # Apply filters first - skip non-matching files outright
            if filters:
                if 'document_type' in filters and metadata.document_type != filters['document_type']:
                    continue
                if 'purpose' in filters and metadata.purpose != filters['purpose']:
                    continue
                if 'tags' in filters and not set(filters['tags']).intersection(metadata.tags):
                    continue

            score = 0.0

            # Match against title
            if metadata.title and query_lower in metadata.title.lower():
                score += 2.0

            # Match against keywords
            if any(query_lower in kw for kw in metadata.keywords):
                score += 1.5

            # Match against summary
            if metadata.summary and query_lower in metadata.summary.lower():
                score += 1.0

            # Match against filename
            if query_lower in Path(metadata.file_path).name.lower():
                score += 0.5

            if score > 0:
                results.append((metadata, score))

        # Sort by descending score
        results.sort(key=lambda x: x[1], reverse=True)
        return [metadata for metadata, _ in results]

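The weighted ranking in `search()` can be demonstrated in isolation. A sketch with a hypothetical in-memory index of plain dicts (the field weights match the method: title 2.0, keywords 1.5, summary 1.0, filename 0.5; `SemanticMetadata` itself is not used here):

```python
def rank(query, index):
    """Score each entry by where the query matches, highest first."""
    query = query.lower()
    results = []
    for entry in index:
        score = 0.0
        if query in entry["title"].lower():
            score += 2.0      # a title hit weighs the most
        if any(query in kw for kw in entry["keywords"]):
            score += 1.5
        if query in entry["summary"].lower():
            score += 1.0
        if query in entry["filename"].lower():
            score += 0.5
        if score > 0:
            results.append((entry["filename"], score))
    return sorted(results, key=lambda r: r[1], reverse=True)

index = [
    {"title": "Resume 2024", "keywords": ["resume", "career"],
     "summary": "My resume", "filename": "resume.pdf"},
    {"title": "Q3 Report", "keywords": ["finance"],
     "summary": "Quarterly numbers", "filename": "q3.xlsx"},
]
print(rank("resume", index))  # [('resume.pdf', 5.0)]
```

Because the weights are additive, a file that matches in every field ranks above one that matches only in its filename, which is the point of "find by meaning, not just name".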
    def suggest_location(self, file_path: str) -> str:
        """
        Suggest where a file should be organized.
        This solves the "downloads folder chaos" problem.
        """
        metadata = self.analyze_file(file_path)

        # Base directory structure
        base_map = {
            DocumentType.RESUME: "documents/career/resumes",
            DocumentType.COVER_LETTER: "documents/career/cover_letters",
            DocumentType.BUSINESS_PLAN: "documents/business",
            DocumentType.TECHNICAL_SPEC: "documents/technical",
            DocumentType.MEETING_NOTES: "documents/meetings",
            DocumentType.FINANCIAL_DOC: "documents/financial",
            DocumentType.CONTRACT: "documents/legal",
            DocumentType.CODE: "code",
            DocumentType.DATA: "data",
            DocumentType.IMAGE: "media/images",
            DocumentType.VIDEO: "media/videos",
            DocumentType.AUDIO: "media/audio",
            DocumentType.DOCUMENTATION: "docs",
            DocumentType.PRESENTATION: "documents/presentations",
            DocumentType.SPREADSHEET: "documents/spreadsheets",
        }

        base_dir = base_map.get(metadata.document_type, "misc")

        # Add purpose subdirectory
        if metadata.purpose == DocumentPurpose.ARCHIVE:
            base_dir += "/archive"
        elif metadata.purpose == DocumentPurpose.TEMPLATE:
            base_dir += "/templates"
        elif metadata.purpose == DocumentPurpose.ACTIVE_WORK:
            base_dir += "/active"

        # Add project subdirectory if applicable
        if metadata.parent_project:
            base_dir += f"/{metadata.parent_project}"

        filename = Path(file_path).name
        return f"{base_dir}/{filename}"

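The path-building logic above reduces to a lookup plus conditional suffixes. A standalone sketch (the string keys stand in for the real `DocumentType`/`DocumentPurpose` enum members):

```python
from pathlib import PurePosixPath

# Illustrative subset of the type -> directory mapping
BASE_MAP = {
    "resume": "documents/career/resumes",
    "code": "code",
    "image": "media/images",
}

def suggest(doc_type: str, purpose: str, filename: str) -> str:
    """Compose base dir + purpose subdir + original filename."""
    base = BASE_MAP.get(doc_type, "misc")   # unknown types land in misc/
    if purpose == "archive":
        base += "/archive"
    elif purpose == "template":
        base += "/templates"
    return str(PurePosixPath(base) / filename)

print(suggest("resume", "archive", "jane_doe_resume.pdf"))
# documents/career/resumes/archive/jane_doe_resume.pdf
```

The filename is preserved verbatim, so the suggestion is purely a relocation: no renaming happens at this stage.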
    def auto_organize(self, file_path: str, dry_run: bool = True) -> str:
        """
        Automatically organize a file based on its semantic classification.

        dry_run=True: Just return where the file should go
        dry_run=False: Actually move the file
        """
        suggested_path = self.suggest_location(file_path)

        if not dry_run:
            # Create the destination directory if needed
            Path(suggested_path).parent.mkdir(parents=True, exist_ok=True)

            # Move the file
            Path(file_path).rename(suggested_path)

            # Update the index to point at the new location
            if file_path in self.files:
                metadata = self.files.pop(file_path)
                metadata.file_path = suggested_path
                self.files[suggested_path] = metadata
                self.save_index()

        return suggested_path

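One caveat with `Path.rename`: it raises `OSError` when source and destination sit on different filesystems (e.g. `~/Downloads` on one mount and the archive on another). A sketch of the same dry-run pattern using `shutil.move`, which falls back to copy-and-delete across devices (function and file names here are illustrative):

```python
import shutil
import tempfile
from pathlib import Path

def safe_move(src: str, dest: str, dry_run: bool = True) -> str:
    """Same dry_run contract as auto_organize(); shutil.move also
    handles moves across filesystems, where Path.rename raises OSError."""
    if not dry_run:
        Path(dest).parent.mkdir(parents=True, exist_ok=True)
        shutil.move(src, dest)
    return dest

root = Path(tempfile.mkdtemp())
src = root / "resume.pdf"
src.write_text("stub")

# Dry run: nothing moves
safe_move(str(src), str(root / "documents" / "resume.pdf"))
print(src.exists())  # True

# Real move
dest = safe_move(str(src), str(root / "documents" / "resume.pdf"), dry_run=False)
print(Path(dest).exists(), src.exists())  # True False
```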

# Example usage
if __name__ == "__main__":
    sfs = SemanticFileSystem()

    # Example: Analyze a resume
    # metadata = sfs.index_file("~/Downloads/john_doe_resume.pdf")
    # print(f"Document type: {metadata.document_type}")
    # print(f"Suggested location: {sfs.suggest_location('~/Downloads/john_doe_resume.pdf')}")

    # Example: Search for all resumes
    # resumes = sfs.search("", filters={'document_type': DocumentType.RESUME})
    # for resume in resumes:
    #     print(f"Found resume: {resume.title} at {resume.file_path}")

    print("Semantic File System initialized")
cognitive/smart_documents.py (new file, 707 lines)

"""
Smart Document Processing - OCR, ATS-friendly, Auto-organizing

This is what document management should be:
- OCR: Extract text from images and PDFs automatically
- ATS-friendly: Format resumes/documents for applicant tracking systems
- Auto-format: Documents format themselves for their purpose
- Template matching: Automatically apply the right template
- Structure extraction: Pull out structured data from documents
- Auto-filing: Documents organize themselves (via Semantic FS)

Combines Notion + Asana + business planning + actual functionality
"""

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from pathlib import Path
from typing import Dict, List, Optional, Set, Any
import re
import json


class DocumentTemplate(Enum):
    """Document templates"""
    RESUME_ATS = "resume_ats"
    RESUME_CREATIVE = "resume_creative"
    COVER_LETTER = "cover_letter"
    BUSINESS_PLAN = "business_plan"
    TECHNICAL_SPEC = "technical_spec"
    MEETING_NOTES = "meeting_notes"
    PROPOSAL = "proposal"
    INVOICE = "invoice"
    CONTRACT = "contract"
    REPORT = "report"
    BLANK = "blank"


@dataclass
class ExtractedText:
    """Text extracted from a document"""
    content: str
    confidence: float = 0.0  # OCR confidence
    language: str = "en"
    page_number: Optional[int] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class StructuredData:
    """Structured data extracted from a document"""
    document_type: str
    fields: Dict[str, Any] = field(default_factory=dict)
    entities: Dict[str, List[str]] = field(default_factory=dict)
    sections: List[Dict[str, str]] = field(default_factory=list)
    confidence: float = 0.0


class SmartDocument:
    """
    A smart document that can:
    - Read itself (OCR)
    - Format itself (templates)
    - Organize itself (semantic FS)
    - Update itself (living docs)
    """

    def __init__(self, file_path: str):
        self.file_path = file_path
        self.extracted_text: Optional[ExtractedText] = None
        self.structured_data: Optional[StructuredData] = None
        self.suggested_template: Optional[DocumentTemplate] = None
        self.metadata: Dict[str, Any] = {}

    def extract_text(self) -> ExtractedText:
        """
        Extract text from the document, using OCR where needed.

        For PDFs: pdfplumber (optional dependency)
        For images: Tesseract OCR via pytesseract (optional dependency)
        For Word docs: python-docx (optional dependency)
        """
        ext = Path(self.file_path).suffix.lower()

        if ext in ['.txt', '.md']:
            with open(self.file_path, 'r', encoding='utf-8', errors='ignore') as f:
                content = f.read()
            self.extracted_text = ExtractedText(content=content, confidence=1.0)

        elif ext == '.pdf':
            try:
                import pdfplumber
                with pdfplumber.open(self.file_path) as pdf:
                    content = '\n'.join(page.extract_text() or '' for page in pdf.pages)
                self.extracted_text = ExtractedText(content=content, confidence=0.95)
            except ImportError:
                self.extracted_text = ExtractedText(
                    content="[PDF extraction requires pdfplumber]",
                    confidence=0.0
                )

        elif ext in ['.png', '.jpg', '.jpeg', '.tiff']:
            try:
                import pytesseract
                from PIL import Image
                img = Image.open(self.file_path)
                content = pytesseract.image_to_string(img)
                self.extracted_text = ExtractedText(content=content, confidence=0.8)
            except ImportError:
                self.extracted_text = ExtractedText(
                    content="[OCR requires pytesseract]",
                    confidence=0.0
                )

        elif ext in ['.docx', '.doc']:
            try:
                import docx
                doc = docx.Document(self.file_path)
                content = '\n'.join(para.text for para in doc.paragraphs)
                self.extracted_text = ExtractedText(content=content, confidence=1.0)
            except ImportError:
                self.extracted_text = ExtractedText(
                    content="[DOCX extraction requires python-docx]",
                    confidence=0.0
                )

        return self.extracted_text or ExtractedText(content="", confidence=0.0)

    def extract_structured_data(self) -> StructuredData:
        """
        Extract structured data from the document.

        For resumes: name, contact, experience, education, skills
        For business plans: executive summary, financials, market analysis
        For contracts: parties, terms, dates, amounts
        """
        if not self.extracted_text:
            self.extract_text()

        content = self.extracted_text.content if self.extracted_text else ""

        # Detect the document type
        doc_type = self._detect_document_type(content)

        structured = StructuredData(document_type=doc_type)

        # Extract fields based on type
        if doc_type == "resume":
            structured.fields = self._extract_resume_fields(content)
        elif doc_type == "business_plan":
            structured.fields = self._extract_business_plan_fields(content)
        elif doc_type == "meeting_notes":
            structured.fields = self._extract_meeting_fields(content)

        # Extract common entities
        structured.entities = self._extract_entities(content)

        self.structured_data = structured
        return structured

    def _detect_document_type(self, content: str) -> str:
        """Detect what type of document this is"""
        content_lower = content.lower()

        # Resume detection: count how many indicator terms appear
        resume_indicators = ['resume', 'curriculum vitae', 'experience', 'education', 'skills']
        resume_score = sum(1 for ind in resume_indicators if ind in content_lower)
        if resume_score >= 3:
            return "resume"

        # Business plan
        if 'executive summary' in content_lower and 'market analysis' in content_lower:
            return "business_plan"

        # Meeting notes
        if 'meeting' in content_lower and 'attendees' in content_lower:
            return "meeting_notes"

        # Contract
        if 'agreement' in content_lower and 'parties' in content_lower:
            return "contract"

        return "unknown"

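The indicator-count heuristic used for resume detection is easy to exercise on its own. A standalone sketch with hypothetical sample strings:

```python
# Same indicator list and threshold as _detect_document_type()
RESUME_INDICATORS = ['resume', 'curriculum vitae', 'experience', 'education', 'skills']

def looks_like_resume(content: str, threshold: int = 3) -> bool:
    """True when at least `threshold` indicator terms appear in the text."""
    content = content.lower()
    return sum(1 for ind in RESUME_INDICATORS if ind in content) >= threshold

print(looks_like_resume("Jane Doe\nExperience\nEducation\nSkills: Python"))  # True
print(looks_like_resume("Meeting agenda for Tuesday"))                       # False
```

Requiring three of five indicators keeps a single stray word like "experience" in an unrelated document from triggering a false positive.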
    def _extract_resume_fields(self, content: str) -> Dict[str, Any]:
        """Extract structured fields from a resume"""
        fields = {}

        # Extract name (usually the first non-empty line)
        lines = [line.strip() for line in content.split('\n') if line.strip()]
        if lines:
            fields['name'] = lines[0]

        # Extract email (note [A-Za-z], not [A-Z|a-z], which would also match '|')
        email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
        emails = re.findall(email_pattern, content)
        if emails:
            fields['email'] = emails[0]

        # Extract phone
        phone_pattern = r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
        phones = re.findall(phone_pattern, content)
        if phones:
            fields['phone'] = phones[0]

        # Extract sections by scanning for header keywords
        sections = {}
        current_section = None

        for line in lines:
            line_lower = line.lower()

            # Detect section headers
            if any(keyword in line_lower for keyword in ['experience', 'employment', 'work history']):
                current_section = 'experience'
                sections[current_section] = []
            elif any(keyword in line_lower for keyword in ['education', 'academic']):
                current_section = 'education'
                sections[current_section] = []
            elif any(keyword in line_lower for keyword in ['skills', 'technical skills', 'competencies']):
                current_section = 'skills'
                sections[current_section] = []
            elif current_section:
                sections[current_section].append(line)

        fields['sections'] = sections

        return fields

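The contact-field patterns can be checked against a hypothetical sample. Note the `[A-Za-z]{2,}` TLD class: `[A-Z|a-z]` would also match a literal pipe character.

```python
import re

EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
PHONE_RE = re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b')

sample = "Jane Doe\njane.doe@example.com\n555-867-5309"
print(EMAIL_RE.findall(sample))  # ['jane.doe@example.com']
print(PHONE_RE.findall(sample))  # ['555-867-5309']
```

The phone pattern accepts `555-867-5309`, `555.867.5309`, and `5558675309`, but deliberately not international formats; extending it is a matter of adding alternatives to the regex.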
    def _extract_business_plan_fields(self, content: str) -> Dict[str, Any]:
        """Extract fields from a business plan"""
        fields = {}

        # Look for key sections; each pattern captures until the next heading or end
        sections = {
            'executive_summary': r'executive summary[:\n](.*?)(?=\n\n[A-Z]|\Z)',
            'market_analysis': r'market analysis[:\n](.*?)(?=\n\n[A-Z]|\Z)',
            'financial_projections': r'financial projections[:\n](.*?)(?=\n\n[A-Z]|\Z)',
            'business_model': r'business model[:\n](.*?)(?=\n\n[A-Z]|\Z)',
        }

        for section_name, pattern in sections.items():
            match = re.search(pattern, content, re.IGNORECASE | re.DOTALL)
            if match:
                fields[section_name] = match.group(1).strip()

        return fields

    def _extract_meeting_fields(self, content: str) -> Dict[str, Any]:
        """Extract fields from meeting notes"""
        fields = {}

        # Extract date
        date_pattern = r'\b\d{1,2}/\d{1,2}/\d{2,4}\b'
        dates = re.findall(date_pattern, content)
        if dates:
            fields['date'] = dates[0]

        # Extract attendees
        attendees_match = re.search(r'attendees[:\s]+(.*?)(?=\n\n|\Z)', content, re.IGNORECASE | re.DOTALL)
        if attendees_match:
            attendees = [name.strip() for name in attendees_match.group(1).split(',')]
            fields['attendees'] = attendees

        # Extract action items
        action_items = []
        for line in content.split('\n'):
            if any(marker in line.lower() for marker in ['action:', 'todo:', '- [ ]', 'task:']):
                action_items.append(line.strip())
        if action_items:
            fields['action_items'] = action_items

        return fields

    def _extract_entities(self, content: str) -> Dict[str, List[str]]:
        """Extract named entities with simple regex patterns"""
        entities = {
            'emails': re.findall(r'\b[\w.-]+@[\w.-]+\.\w+\b', content),
            'urls': re.findall(r'https?://[^\s]+', content),
            'dates': re.findall(r'\b\d{1,2}/\d{1,2}/\d{2,4}\b', content),
            'phone_numbers': re.findall(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', content),
            'monetary_amounts': re.findall(r'\$[\d,]+\.?\d*', content),
        }
        return entities

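The monetary-amount pattern is the least obvious of the entity regexes; it tolerates thousands separators and an optional decimal part. A quick check on a hypothetical sentence:

```python
import re

# Same pattern as the 'monetary_amounts' entry in _extract_entities()
MONEY_RE = re.compile(r'\$[\d,]+\.?\d*')

text = "Seed round of $1,500,000 with $25.50 left over."
print(MONEY_RE.findall(text))  # ['$1,500,000', '$25.50']
```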
    def suggest_template(self) -> DocumentTemplate:
        """Suggest the best template for this document"""
        if not self.structured_data:
            self.extract_structured_data()

        doc_type = self.structured_data.document_type if self.structured_data else "unknown"

        template_map = {
            "resume": DocumentTemplate.RESUME_ATS,
            "business_plan": DocumentTemplate.BUSINESS_PLAN,
            "meeting_notes": DocumentTemplate.MEETING_NOTES,
            "contract": DocumentTemplate.CONTRACT,
        }

        self.suggested_template = template_map.get(doc_type, DocumentTemplate.BLANK)
        return self.suggested_template

    def format_as_ats_friendly(self) -> str:
        """
        Format document as ATS-friendly (for resumes).

        ATS (Applicant Tracking System) requirements:
        - Simple formatting (no tables, columns, graphics)
        - Standard section headers
        - Plain text
        - Standard fonts
        - No headers/footers
        - .docx or .pdf format
        """
        if not self.structured_data:
            self.extract_structured_data()

        if not self.structured_data or self.structured_data.document_type != "resume":
            return self.extracted_text.content if self.extracted_text else ""

        # Build ATS-friendly version
        output = []

        # Name
        if 'name' in self.structured_data.fields:
            output.append(self.structured_data.fields['name'].upper())
            output.append('')

        # Contact info
        contact_parts = []
        if 'email' in self.structured_data.fields:
            contact_parts.append(self.structured_data.fields['email'])
        if 'phone' in self.structured_data.fields:
            contact_parts.append(self.structured_data.fields['phone'])

        if contact_parts:
            output.append(' | '.join(contact_parts))
            output.append('')

        # Standard sections with conventional ATS headers
        sections_data = self.structured_data.fields.get('sections', {})
        for key, header in [('experience', 'PROFESSIONAL EXPERIENCE'),
                            ('education', 'EDUCATION'),
                            ('skills', 'SKILLS')]:
            if key in sections_data:
                output.append(header)
                output.append('-' * 50)
                output.extend(sections_data[key])
                output.append('')

        return '\n'.join(output)

    def apply_template(self, template: DocumentTemplate, output_path: str) -> None:
        """
        Apply a template to the document and save.

        Templates define structure and formatting for different document types.
        """
        if not self.structured_data:
            self.extract_structured_data()

        if template == DocumentTemplate.RESUME_ATS:
            content = self.format_as_ats_friendly()
        elif template == DocumentTemplate.MEETING_NOTES:
            content = self._format_meeting_notes()
        elif template == DocumentTemplate.BUSINESS_PLAN:
            content = self._format_business_plan()
        else:
            content = self.extracted_text.content if self.extracted_text else ""

        # Save to file
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(content)

    def _format_meeting_notes(self) -> str:
        """Format as structured meeting notes"""
        if not self.structured_data:
            return ""

        output = []
        fields = self.structured_data.fields

        output.append("# Meeting Notes")
        output.append("")

        if 'date' in fields:
            output.append(f"**Date:** {fields['date']}")

        if 'attendees' in fields:
            output.append(f"**Attendees:** {', '.join(fields['attendees'])}")

        output.append("")
        output.append("## Discussion")
        output.append("")

        if 'action_items' in fields:
            output.append("## Action Items")
            output.append("")
            for item in fields['action_items']:
                output.append(f"- [ ] {item}")

        return '\n'.join(output)

    def _format_business_plan(self) -> str:
        """Format as structured business plan"""
        if not self.structured_data:
            return ""

        output = []
        fields = self.structured_data.fields

        output.append("# Business Plan")
        output.append("")

        # Emit each extracted section under a conventional heading
        for key, heading in [('executive_summary', 'Executive Summary'),
                             ('market_analysis', 'Market Analysis'),
                             ('business_model', 'Business Model'),
                             ('financial_projections', 'Financial Projections')]:
            if key in fields:
                output.append(f"## {heading}")
                output.append("")
                output.append(fields[key])
                output.append("")

        return '\n'.join(output)


class DocumentProcessor:
    """
    Central processor for smart documents.

    Integrates with:
    - Semantic FS (auto-filing)
    - Living Docs (code-aware docs)
    - Intent Graph (track purpose)
    """

    def __init__(self, semantic_fs=None, doc_manager=None, intent_graph=None):
        self.semantic_fs = semantic_fs
        self.doc_manager = doc_manager
        self.intent_graph = intent_graph

    def process_document(self, file_path: str, auto_organize: bool = True,
                         apply_template: bool = True) -> SmartDocument:
        """
        Fully process a document:
        1. Extract text (OCR if needed)
        2. Extract structured data
        3. Suggest/apply template
        4. Auto-organize into correct folder
        5. Link to intent graph
        """
        doc = SmartDocument(file_path)

        # Extract text and structure
        doc.extract_text()
        doc.extract_structured_data()

        # Index in semantic FS
        if self.semantic_fs:
            self.semantic_fs.index_file(file_path)

            # Auto-organize (requires a semantic FS, so guarded here)
            if auto_organize:
                suggested_path = self.semantic_fs.suggest_location(file_path)
                print(f"Suggested location: {suggested_path}")

        # Suggest template
        template = doc.suggest_template()

        # Apply template if requested
        if apply_template and template != DocumentTemplate.BLANK:
            output_path = str(Path(file_path).with_suffix('.formatted.txt'))
            doc.apply_template(template, output_path)
            print(f"Formatted document saved to: {output_path}")

        return doc

    def create_from_template(self, template: DocumentTemplate,
                             output_path: str, data: Optional[Dict] = None) -> str:
        """
        Create a new document from a template.

        Like Notion/Asana templates, but smart: the result is indexed
        and auto-filed.
        """
        content = ""

        if template == DocumentTemplate.RESUME_ATS:
            content = self._create_resume_template(data or {})
        elif template == DocumentTemplate.MEETING_NOTES:
            content = self._create_meeting_template(data or {})
        elif template == DocumentTemplate.BUSINESS_PLAN:
            content = self._create_business_plan_template(data or {})

        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(content)

        # Index the new document
        if self.semantic_fs:
            self.semantic_fs.index_file(output_path)

        return output_path

    def _create_resume_template(self, data: Dict) -> str:
        """Create ATS-friendly resume template"""
        return """[YOUR NAME]

[Email] | [Phone] | [LinkedIn] | [Location]

PROFESSIONAL SUMMARY
---
[2-3 sentences about your professional background and key strengths]

PROFESSIONAL EXPERIENCE
---
[Company Name] | [Job Title] | [Start Date] - [End Date]
- [Achievement or responsibility]
- [Achievement or responsibility]
- [Achievement or responsibility]

[Company Name] | [Job Title] | [Start Date] - [End Date]
- [Achievement or responsibility]
- [Achievement or responsibility]

EDUCATION
---
[Degree] in [Field] | [University Name] | [Graduation Year]
- [Relevant coursework, honors, or achievements]

SKILLS
---
Technical: [List technical skills]
Tools: [List tools and software]
Languages: [List programming/spoken languages]
"""

    def _create_meeting_template(self, data: Dict) -> str:
        """Create meeting notes template"""
        date = data.get('date', '[Date]')
        return f"""# Meeting Notes

**Date:** {date}
**Attendees:** [List attendees]
**Duration:** [Duration]

## Agenda
1. [Topic 1]
2. [Topic 2]
3. [Topic 3]

## Discussion

### Topic 1
[Notes]

### Topic 2
[Notes]

## Decisions Made
- [Decision 1]
- [Decision 2]

## Action Items
- [ ] [Action item] - [Owner] - [Due date]
- [ ] [Action item] - [Owner] - [Due date]

## Next Steps
[What happens next]

## Next Meeting
**Date:** [Date]
**Time:** [Time]
"""

    def _create_business_plan_template(self, data: Dict) -> str:
        """Create business plan template"""
        return """# Business Plan: [Company Name]

## Executive Summary
[1-2 page summary of the entire business plan]

## Company Description
- **Mission:** [Mission statement]
- **Vision:** [Vision statement]
- **Legal Structure:** [LLC, Corp, etc.]
- **Location:** [Where based]

## Market Analysis
### Target Market
- **Demographics:** [Who are your customers]
- **Market Size:** [How big is the market]
- **Growth Potential:** [Is it growing?]

### Competitive Analysis
- **Competitors:** [Who are they]
- **Competitive Advantage:** [What makes you different]

## Products and Services
[Describe what you're selling]

## Marketing and Sales Strategy
- **Marketing Channels:** [How you'll reach customers]
- **Sales Process:** [How you'll close deals]
- **Pricing Strategy:** [How you'll price]

## Operations Plan
[How the business will operate day-to-day]

## Management Team
- **Founder/CEO:** [Name and background]
- **Key Team Members:** [Names and roles]

## Financial Projections
### Revenue Projections
- Year 1: $[Amount]
- Year 2: $[Amount]
- Year 3: $[Amount]

### Expenses
- Fixed Costs: $[Amount]
- Variable Costs: $[Amount]

### Funding Needs
- **Amount Needed:** $[Amount]
- **Use of Funds:** [How money will be used]

## Appendix
[Supporting documents, charts, etc.]
"""


# Example usage
if __name__ == "__main__":
    # processor = DocumentProcessor()

    # Process a resume
    # doc = processor.process_document("~/Downloads/resume.pdf")
    # print(f"Document type: {doc.structured_data.document_type}")
    # print(f"Suggested template: {doc.suggested_template}")

    # Create a new document from template
    # processor.create_from_template(
    #     DocumentTemplate.MEETING_NOTES,
    #     "meeting_2024_01_15.md",
    #     data={'date': '2024-01-15'}
    # )

    print("Smart Document Processing initialized")