Lean Context Protocol: Token-Efficient AI Context
Lean Context Protocol (LCP) is a context engineering layer that dramatically reduces token consumption while preserving full project awareness for AI coding assistants. It's built into Hermes Agent and available as a standalone MCP server.
Instead of dumping entire files into context (wasting tokens), LCP provides smart compression, caching, and semantic search that delivers only what the AI needs — when it needs it.
This guide covers all 10 compression modes, the architecture, integration with Hermes/MCP, and practical workflows.
The Problem: Context Window Exhaustion
When working on large codebases, AI assistants quickly exhaust their context windows:
- Dumping entire files — A 500-line file costs 2000+ tokens. Most of it is irrelevant to the task.
- Manual curation — You spend time figuring out what to include instead of solving the problem.
- Working blind — Skipping context leads to wrong assumptions and rework.
- No persistence — Every new chat starts from zero. Previous context is lost.
The result: more API calls, higher costs, lower quality answers.
The Solution: Smart Context Layer
LCP sits between the AI and your filesystem, delivering just enough context — compressed, cached, and targeted.
| Feature | Traditional Approach | Lean Context Protocol |
|---|---|---|
| File reads | Always full dump (~2000 tokens per file) | Cached re-reads cost ~13 tokens |
| Search results | Full line context (100s of lines) | Compact regex results, line numbers only |
| Shell output | Raw stdout/stderr | Pattern compression (git/npm/cargo summaries) |
| Project awareness | Explicit file listing each time | Project graph indexed once, reused forever |
| Cross-session memory | Lost between chats | Persistent knowledge graph + session recall |
10 Compression Modes
LCP offers 10 distinct read modes for different contexts. Each mode optimizes for a specific use case:
| Mode | Use Case | Output | Token Cost |
|---|---|---|---|
| full | Need complete file | All lines, original formatting | 100% |
| map | Understanding structure | Function/class names + line numbers | ~5% |
| signatures | API surface inspection | Fn signatures, types, return types | ~8% |
| diff | Recent changes | Only lines changed since last read | ~2% |
| task | Working on specific task | Context-aware filtering by task keywords | Variable |
| reference | Looking up specific symbols | Just the symbol definition + 5 context lines | ~3% |
| aggressive | Minimal viable context | Function names + 1-line summaries | ~2% |
| entropy | Auto-detects patterns | Uses entropy analysis to keep high-information lines | ~10% |
| lines:N-M | Specific line range | Only lines N through M | Proportional |
| fresh | Bypass cache | Fresh read, ignores cached version | 100% + ~13 overhead |
Use ctx_read(path, mode="signatures") for ~92% token savings on large files: check the function list first, then drill down with mode="reference" on specific functions.
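The entropy mode's core idea — score each line by how much information it carries and drop the filler — can be sketched in a few lines. This is a minimal illustration of per-line Shannon entropy, not LCP's actual algorithm; the 3.0-bit threshold is an arbitrary assumption:

```python
import math
from collections import Counter

def line_entropy(line: str) -> float:
    """Shannon entropy (bits per char) of a line's character distribution."""
    if not line:
        return 0.0
    counts = Counter(line)
    n = len(line)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_filter(text: str, threshold: float = 3.0) -> str:
    """Keep lines whose character entropy clears the threshold,
    dropping low-information lines (blanks, separators, repeated fills)."""
    kept = [ln for ln in text.splitlines() if line_entropy(ln) >= threshold]
    return "\n".join(kept)

source = (
    "# ----------------------------\n"
    "def validate_token(token, allow_refresh=False):\n"
    "    return decode_and_check(token, allow_refresh)\n"
)
print(entropy_filter(source))  # separator line is dropped, code lines survive
```

Comment banners and blank lines have near-zero entropy, while dense code lines score well above 3 bits per character, which is why this crude heuristic already separates signal from filler.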
Core Tools & API
ctx_read(path, mode)
The primary context retrieval tool. Reads are cached, all 10 compression modes are supported, and a cached re-read costs ~13 tokens.
# Default: full file (use sparingly)
ctx_read("src/auth.py")
# Only function signatures (great for API overview)
ctx_read("src/auth.py", mode="signatures")
# Specific line range
ctx_read("src/auth.py", mode="lines:100-150")
# Task-aware filtering (pass task in context)
ctx_read("src/auth.py", mode="task")
# Auto-select mode
ctx_analyze("src/auth.py") # Recommends optimal mode
ctx_search(pattern, path)
Regex search with compact output — filename::line_num:match, no full lines.
# Find all validate functions
ctx_search(r"def validate", path="src/")

# Search across project
ctx_search(r"TODO|FIXME", path=".", max_results=50)
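The compact output convention is easy to picture in code. Here is a rough sketch of a filename::line_num:match emitter — not the server's implementation, and the .py-only glob is a simplification:

```python
import re
from pathlib import Path

def compact_search(pattern: str, path: str = ".", max_results: int = 20):
    """Emit matches as 'filename::line_num:match' — only the matched text,
    never the full line, which is where the token savings come from."""
    rx = re.compile(pattern)
    results = []
    for file in Path(path).rglob("*.py"):
        for num, line in enumerate(file.read_text(errors="ignore").splitlines(), 1):
            m = rx.search(line)
            if m:
                results.append(f"{file}::{num}:{m.group(0)}")
                if len(results) >= max_results:
                    return results
    return results
```

Returning only the matched span (not the surrounding line) is the key design choice: a thousand matches compress to a thousand short locators instead of a thousand full source lines.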
ctx_shell(command)
Execute shell commands with intelligent output compression. Git, npm, cargo, pytest outputs are automatically summarized.
ctx_shell("git log --oneline -10")
# Output: 10 commits condensed to hashes + subjects
ctx_shell("npm test -- --coverage")
# Output: Test summary, not 10k lines of raw test output
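The idea behind pattern compression can be sketched for pytest output — keep the failures and the final summary line, drop everything else. This is a toy stand-in for LCP's per-tool patterns, not its actual rules:

```python
import re

def compress_pytest(output: str) -> str:
    """Reduce a pytest run to its FAILED lines plus the summary line."""
    keep = []
    for line in output.splitlines():
        if line.startswith("FAILED") or re.search(r"=+ .*(passed|failed|error).* =+", line):
            keep.append(line)
    # If nothing matched (unexpected format), fall back to the output's tail.
    return "\n".join(keep) or output[-200:]
```

The same shape works for git, npm, or cargo: a handful of regexes per tool that know which lines carry the verdict and which are scaffolding.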
ctx_graph & ctx_impact
Dependency graph construction and impact analysis. Know what breaks before you change it.
# Build project graph
ctx_graph(action="build")

# What files depend on auth.py?
ctx_graph(action="related", path="src/auth.py")

# Impact analysis for a change
ctx_impact(action="analyze", path="src/auth.py", depth=3)
ctx_semantic_search(query)
Natural language code search — find code by what it does, not by filename.
# Find "JWT token validation"
ctx_semantic_search("JWT token validation")
# Find "rate limiting middleware"
ctx_semantic_search("rate limiting middleware")
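Under the hood, natural-language code search of this kind is typically embedding similarity. A toy sketch using cosine similarity over pre-computed vectors — the two-dimensional vectors and snippet IDs here are placeholders for real embedding-model output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query_vec, index, top_k=10):
    """index: list of (snippet_id, embedding) pairs, e.g. produced by an
    embedding model over code chunks. Returns the top_k closest snippets."""
    scored = [(cosine(query_vec, vec), sid) for sid, vec in index]
    scored.sort(reverse=True)
    return [sid for _, sid in scored[:top_k]]
```

In practice the query string is embedded with the same model as the code chunks (the config above names text-embedding-3-small), and only the top-k snippet locators are returned to the LLM, not the snippets themselves.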
ctx_overview(task)
Task-aware project map — shows only files relevant to your current task.
ctx_overview(task="fix login bug")
# Returns: auth.py, middleware.py, test_auth.py, relevant configs
Installation & Setup
As Hermes Agent Built-in
LCP is pre-installed with Hermes Agent. No separate setup needed — it's the default context layer.
As Standalone MCP Server
# Install the LCP MCP server
pip install lean-ctx
# Add to ~/.hermes/mcp_config.json
{
"mcpServers": {
"lean-ctx": {
"command": "python",
"args": ["-m", "lean_ctx_mcp", "--root", "."]
}
}
}
Configuration
# ~/.hermes/lean_ctx_config.yaml
cache:
enabled: true
max_age_days: 7
graph_auto_build: true
compression:
default_mode: "signatures"
auto_select: true # ctx_analyze() chooses mode automatically
graph:
index:
- "src/**/*.py"
- "tests/**/*.py"
exclude:
- "**/node_modules/**"
- "**/.git/**"
- "**/venv/**"
providers:
- name: "openai"
api_key: "${OPENAI_API_KEY}"
model: "text-embedding-3-small"
Integration with Hermes Agent
Hermes Agent uses LCP as its native context layer. Every ctx_* call goes through LCP automatically.
Agent Workflow
User: "Fix the bug in auth.py where JWT tokens aren't validated on refresh"
→ Hermes receives message
→ Task classification: "bugfix", area: "auth"
→ ctx_overview(task="fix login bug") returns relevant files
→ ctx_read("src/auth.py", mode="task") fetches targeted context
→ ctx_search("validate_token", path="src/") finds relevant functions
→ ctx_shell("git log -p src/auth.py") gets recent changes
→ All context sent to LLM (90% fewer tokens than naive dump)
→ LLM produces fix
→ Hermes applies patch, runs tests, reports result
The Token Math
Typical session without LCP:
- 5 files read at 2000 tokens each = 10,000 tokens
- Git log dump = 3000 tokens
- Grep results full lines = 1500 tokens
- Total: ~14,500 tokens per session
Same session with LCP:
- 5 files @ mode="signatures" ~ 160 tokens each = 800 tokens
- Git log compressed pattern = 300 tokens
- Grep compact results = 100 tokens
- Total: ~1,200 tokens per session
Savings: 91% — and quality is equal or better because the AI gets cleaner, focused context.
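The arithmetic above is easy to verify:

```python
# Naive session: 5 full file dumps, raw git log, full-line grep results
without_lcp = 5 * 2000 + 3000 + 1500   # = 14,500 tokens
# LCP session: signature-mode reads, compressed log, compact grep
with_lcp = 5 * 160 + 300 + 100         # = 1,200 tokens

savings = 1 - with_lcp / without_lcp
print(without_lcp, with_lcp, f"{savings:.1%}")  # 14500 1200 91.7%
```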
Project Graph Indexing
LCP builds a persistent dependency graph of your project. Once built, it answers questions like:
- "What files import auth.py?"
- "Which tests cover validate_token()?"
- "What's the call chain from main() to process_payment()?"
The graph indexes:
- Symbols — Functions, classes, methods with full signatures
- Imports — Cross-file dependencies
- Call graph — Function-to-function invocation paths
- File metadata — Last modified, size, language
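The kind of facts the indexer extracts can be sketched with Python's standard ast module — a minimal symbol-and-import pass, not LCP's parser:

```python
import ast

def index_file(source: str) -> dict:
    """Extract the graph facts described above: imports (cross-file edges)
    and function/class symbols with their line numbers."""
    tree = ast.parse(source)
    imports, symbols = [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or "")   # "" for relative imports
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols.append((node.name, node.lineno))
    return {"imports": imports, "symbols": symbols}
```

Run over every indexed file, the import lists become the edges of the dependency graph and the symbol lists become its nodes; the call graph needs a second pass over ast.Call nodes, omitted here for brevity.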
Index Lifecycle
- Boot — Graph loads from ~/.hermes/graph/cache (instant)
- Warm — Background scan starts for changed files
- Hot — New symbols indexed incrementally as you work
- Rebuild — ctx_graph(action="rebuild") for a fresh start
Cross-Session Memory
LCP persists memory across sessions via ctx_session. When you start a new chat, previous context is automatically available.
Session Types
| Action | What It Does |
|---|---|
| load | Restore previous session (~400 tokens compressed) |
| save | Persist current conversation state |
| task | Set current task (affects mode selection) |
| finding | Record a discovery (auto-compressed) |
| decision | Record a choice made (for future reference) |
# Manually save a session
ctx_session(action="save")

# Load previous session (auto-restores context)
ctx_session(action="load")

# Record a key insight
ctx_session(action="finding", value="Auth bug: validate_token() skips refresh tokens")
In Hermes Agent, session persistence is automatic — every conversation is saved and recalled on demand.
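A minimal sketch of what session persistence might look like on disk — the directory layout, file naming, and zlib compression are all assumptions for illustration, not LCP's documented format:

```python
import json
import zlib
from pathlib import Path

def save_session(session_id: str, state: dict,
                 root: Path = Path.home() / ".hermes" / "sessions") -> None:
    """Compress and persist session state so a later chat can reload it."""
    root.mkdir(parents=True, exist_ok=True)
    blob = zlib.compress(json.dumps(state).encode("utf-8"))
    (root / f"{session_id}.json.z").write_bytes(blob)

def load_session(session_id: str,
                 root: Path = Path.home() / ".hermes" / "sessions") -> dict:
    """Restore a previously saved session's findings and decisions."""
    blob = (root / f"{session_id}.json.z").read_bytes()
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```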
Advanced Features
Prefetch & Preload
Anticipate which files you'll need and cache them proactively.
# Preload context for a task (caches in background)
ctx_prefetch(task="refactor database layer")

# Fill token budget with most relevant files
ctx_fill(budget=3000, paths=["src/", "tests/"])
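Budget filling of this kind amounts to greedy packing by relevance. A sketch under assumed inputs — the (path, score, cost) triples are hypothetical, and a real implementation would score relevance against the task:

```python
def fill_budget(candidates, budget):
    """candidates: list of (path, relevance_score, token_cost).
    Greedily pack the highest-relevance files that fit the token budget."""
    chosen, spent = [], 0
    for path, score, cost in sorted(candidates, key=lambda c: -c[1]):
        if spent + cost <= budget:
            chosen.append(path)
            spent += cost
    return chosen, spent
```

Greedy packing isn't optimal for the underlying knapsack problem, but it is fast and predictable, which matters more when the budget check runs on every turn.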
Delta Updates
Instead of re-reading entire files, get only changed lines.
# Get changes since last read
ctx_delta(path="src/auth.py")
Knowledge Consolidation
Extract patterns from a session and save them as reusable skills.
# Consolidate findings across files
ctx_knowledge(action="consolidate")

# Search cross-session knowledge
ctx_knowledge(action="search", query="JWT validation patterns")
Auto Handoff
Share context between agents without re-reading files.
# Hand off task to another agent with full context
ctx_agent(action="handoff", to_agent="reviewer", summary="Auth fix ready for review")
Complete Tool Reference
| Tool | Purpose | Example |
|---|---|---|
| ctx_read | Read file with compression | ctx_read("main.py", mode="signatures") |
| ctx_search | Regex code search | ctx_search("def.*test", path="tests/") |
| ctx_shell | Compressed shell output | ctx_shell("git status") |
| ctx_semantic_search | Natural language search | ctx_semantic_search("user authentication logic") |
| ctx_overview | Task-aware project map | ctx_overview(task="add logging") |
| ctx_graph | Dependency graph ops | ctx_graph(action="related", path="utils.py") |
| ctx_impact | Change impact analysis | ctx_impact(path="db.py", depth=2) |
| ctx_session | Cross-session memory | ctx_session(action="save") |
| ctx_knowledge | Persistent fact store | ctx_knowledge(action="remember", key="auth-lib", value="Passport.js") |
| ctx_agent | Multi-agent coordination | ctx_agent(action="handoff", to="tester") |
Workflow Examples
Bug Fix Workflow
1. User: "Fix the bug where password reset expires immediately"
2. Hermes → ctx_overview(task="bugfix:password-reset-expiry")
Returns: auth_service.py, email_templates.py, test_auth.py
3. ctx_read("auth_service.py", mode="task")
Returns: Only password reset related code (85% smaller)
4. ctx_search("reset_token_expiry", path="src/")
Returns: 3 function matches with line numbers
5. LLM fix generated and applied
6. ctx_shell("pytest tests/test_auth.py::test_password_reset")
Returns: Compressed test result summary
Feature Implementation Workflow
1. User: "Add rate limiter to the API"
2. ctx_overview(task="feature:rate-limiting")
Returns: middleware.py, api_server.py, config.py
3. ctx_read("middleware.py", mode="map")
Returns: Function list → identify existing middleware patterns
4. ctx_semantic_search("rate limiting")
Returns: Any existing rate-limiting code (maybe from another project)
5. ctx_graph(action="related", path="middleware.py")
Returns: Routes that use this middleware — ensure coverage
6. LLM implements feature with full structural context
Integration with Other Tools
Hermes Agent
Native, automatic. All ctx_* tools available by default. Session persistence built-in.
VS Code / Cursor
# Install the Lean Context extension
# .vscode/settings.json
{
"leanContext.enabled": true,
"leanContext.defaultMode": "signatures",
"leanContext.projectRoot": "${workspaceFolder}"
}
Claude Code CLI
# Use via MCP server
# ~/.claude/settings.json
{
"mcpServers": {
"lean-ctx": {
"command": "python",
"args": ["-m", "lean_ctx_mcp"]
}
}
}
Custom Scripts
from lean_ctx import ContextManager
ctx = ContextManager(root=".")
# Programmatic access
sig = ctx.read("main.py", mode="signatures")
matches = ctx.search("def test_")
graph = ctx.build_graph()
ctx.save_session()
Performance Benchmarks
| Operation | Without LCP | With LCP | Savings |
|---|---|---|---|
| Re-read a 500-line file (cached) | 2100 tokens | ~13 tokens | 99.4% |
| Read with mode="signatures" | 2100 tokens | ~180 tokens | 91.4% |
| Grep 1000 matches (full lines) | 8500 tokens | ~220 tokens (compact) | 97.4% |
| git log -20 (raw) | 3200 tokens | ~150 tokens | 95.3% |
| Project overview (100 files) | ~8000 tokens | ~400 tokens (graph only) | 95% |
| Typical 30-min session | ~14,500 tokens | ~1,200 tokens | 91.7% |
Caching Strategy
LCP uses a multi-level cache:
- L1: In-memory (session) — Re-reads cost ~13 tokens (hash lookup)
- L2: Disk (~/.hermes/cache/) — Persists across restarts, 7-day TTL default
- L3: Graph index — Symbol-level cache, never expires unless file changes
Cache invalidation is hash-based (SHA-256 of file contents). If the file changes, the cache key changes automatically — no manual invalidation needed.
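The hash-keyed scheme is simple to sketch: the compressed view is cached under the SHA-256 of the file's bytes, so an edit changes the key and the stale entry is simply never hit again. The compress callable here is a stand-in for a real compression mode:

```python
import hashlib
from pathlib import Path

_cache: dict = {}   # L1 in-memory layer; disk (L2) and graph (L3) omitted

def cached_view(path: str, compress) -> tuple:
    """Return (cache key, compressed view) keyed by SHA-256 of file bytes.
    A changed file yields a new key, so no explicit invalidation is needed."""
    data = Path(path).read_bytes()
    key = hashlib.sha256(data).hexdigest()
    if key not in _cache:                           # miss: compute and store
        _cache[key] = compress(data.decode("utf-8", errors="replace"))
    return key, _cache[key]
```

Stale entries are garbage, not hazards — they are unreachable once the content changes, and the TTL sweep described above exists only to reclaim the disk space they occupy.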
Cache Management
# View cache stats
ctx_cache(action="status")

# Clear all caches
ctx_cache(action="clear")

# Invalidate specific file
ctx_cache(action="invalidate", path="src/auth.py")

# Rebuild graph index (force)
ctx_graph(action="rebuild")

# Semantic embeddings reindex
ctx_knowledge(action="embeddings_reindex")
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| "Cache directory not writable" | ~/.hermes/cache/ permissions | chmod 755 ~/.hermes/cache |
| "Graph build failed" | Unsupported language/parser | Check ctx_graph(action="status") for unsupported files |
| "Mode not recognized" | Outdated version | Update: pip install --upgrade lean-ctx |
| "Session not found" | Session file deleted or corrupted | ctx_session(action="list"), then load a valid session ID |
| "Provider quota exceeded" | Embedding API limit | Switch to local embeddings, disable semantic search in config, or wait for the quota to reset |
Verdict: Essential for AI-Native Development
Lean Context Protocol is not optional if you're using AI agents on large codebases. The token savings are too significant to ignore — a 60–90% reduction turns a $50/month habit into a $5/month one.
What makes LCP stand out is that it's not just compression — it's smart compression. The different modes mean you can tailor context to exactly what the AI needs. The project graph means you get structural understanding for free. And the caching means you never pay for the same context twice.
If you're running Hermes Agent, you're already using it. If you're using Claude Code or another agent, install the MCP server — it's a force multiplier.
Pros
- 60–90% token reduction typical
- 10 specialized compression modes
- Project graph indexing (dependencies, call chains)
- Cross-session memory persistence
- Zero additional cost — built into Hermes
- Standalone MCP server available
- Active development, frequent improvements
Cons
- Initial graph build can be slow on huge repos
- Mode selection requires understanding your use case
- Cache can grow large on multi-project setups
- Some advanced features need OpenAI API (embeddings)
- Documentation scattered across repos
Quick API Reference
# File Operations
ctx_read(path, mode="full|map|signatures|diff|task|reference|aggressive|entropy|lines:N-M|fresh")
ctx_delta(path) # Changes since last read
ctx_search(pattern, path, max_results=20)
ctx_semantic_search(query, top_k=10)
# Shell & External
ctx_shell(command, raw=False) # Pattern-compressed output
ctx_shell("git diff HEAD~1", raw=True) # Skip compression
# Project Graph
ctx_graph(action="build|related|symbol|impact|status", path=...)
ctx_impact(action="analyze", path, depth=5)
ctx_overview(task="...") # Task-aware project map
# Session & Memory
ctx_session(action="load|save|list|task|finding|decision")
ctx_knowledge(action="remember|recall|pattern|consolidate|gotcha")
ctx_agent(action="handoff|sync|diary")
# Cache & Performance
ctx_cache(action="status|clear|invalidate", path=...)
ctx_prefetch(task="...", budget_tokens=3000)
ctx_compress_memory(path) # Compress config files
ctx_feedback(action="record") # Latency/token tracking
# Advanced
ctx_execute(language="python", code="...") # Sandboxed execution
ctx_expand(action="retrieve", id="...") # Retrieve archived output
ctx_handoff(action="create") # Create context ledger
Resources & Repos
- Hermes Agent (built-in LCP): github.com/Kilo-AI/hermes-agent
- LCP MCP Server: github.com/Kilo-AI/lean-ctx-mcp
- lean-ctx Python package: pypi.org/project/lean-ctx/
- Hermes CLI: github.com/Kilo-AI/hermes-cli
- Kilo Gateway: github.com/Kilo-AI/hermes-gateway
- MCP Specification: spec.modelcontextprotocol.io
- Hermes Docs: kilo-ai.gitbook.io/hermes-agent