
Lean Context Protocol: Token-Efficient AI Context

Published: April 17, 2026 · Tags: Tools, Productivity, MCP, Token Optimization · Read time: 10 min

At a glance: ~13 tokens per cached re-read · 10 read modes · 60–90% token reduction · $0 extra cost

Lean Context Protocol (LCP) is a context engineering layer that dramatically reduces token consumption while preserving full project awareness for AI coding assistants. It's built into Hermes Agent and available as a standalone MCP server.

Instead of dumping entire files into context (wasting tokens), LCP provides smart compression, caching, and semantic search that delivers only what the AI needs — when it needs it.

This guide covers all 10 compression modes, the architecture, integration with Hermes/MCP, and practical workflows.

The Problem: Context Window Exhaustion

When working on large codebases, AI assistants quickly exhaust their context windows:

  • Dumping entire files — A 500-line file costs 2000+ tokens. Most of it is irrelevant to the task.
  • Manual curation — You spend time figuring out what to include instead of solving the problem.
  • Working blind — Skipping context leads to wrong assumptions and rework.
  • No persistence — Every new chat starts from zero. Previous context is lost.

The result: more API calls, higher costs, lower quality answers.

The Solution: Smart Context Layer

LCP sits between the AI and your filesystem, delivering just enough context — compressed, cached, and targeted.

| Feature | Traditional Approach | Lean Context Protocol |
|---|---|---|
| File reads | Always full dump (~2000 tokens per file) | Cached re-reads cost ~13 tokens |
| Search results | Full line context (100s of lines) | Compact regex results, line numbers only |
| Shell output | Raw stdout/stderr | Pattern compression (git/npm/cargo summaries) |
| Project awareness | Explicit file listing each time | Project graph indexed once, reused forever |
| Cross-session memory | Lost between chats | Persistent knowledge graph + session recall |

10 Compression Modes

LCP offers 10 distinct read modes for different contexts. Each mode optimizes for a specific use case:

| Mode | Use Case | Output | Token Cost |
|---|---|---|---|
| full | Need complete file | All lines, original formatting | 100% |
| map | Understanding structure | Function/class names + line numbers | ~5% |
| signatures | API surface inspection | Function signatures, types, return types | ~8% |
| diff | Recent changes | Only lines changed since last read | ~2% |
| task | Working on a specific task | Context-aware filtering by task keywords | Variable |
| reference | Looking up specific symbols | Symbol definition + 5 context lines | ~3% |
| aggressive | Minimal viable context | Function names + 1-line summaries | ~2% |
| entropy | Auto-detects patterns | Entropy analysis keeps high-information lines | ~10% |
| lines:N-M | Specific line range | Only lines N through M | Proportional |
| fresh | Bypass cache | Fresh read, ignores cached version | 100% + ~13 overhead |
💡 Pro tip — Use ctx_read(path, mode="signatures") for 92% token savings on large files. Check the function list first, then drill down with mode="reference" on specific functions.
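To make the "map" and "signatures" modes concrete, here is a minimal sketch of what structure-only compression could look like for Python files, using the standard `ast` module. This is illustrative only, not the actual LCP implementation — `map_mode` is a hypothetical helper name.

```python
import ast

def map_mode(source: str) -> list[str]:
    """Sketch of a 'map'-style read: keep only function/class names
    with line numbers and drop every body. Illustrative only -- not
    the actual LCP implementation."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            entries.append((node.lineno, f"L{node.lineno}: {kind} {node.name}"))
    # Sort by line number so the map reads top-to-bottom like the file.
    return [text for _, text in sorted(entries)]

code = """\
class Auth:
    def login(self, user): ...
    def logout(self): ...

def validate_token(tok): ...
"""
print("\n".join(map_mode(code)))
```

A 500-line module collapses to a few dozen lines like these, which is where the ~5% figure for map mode comes from.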

Core Tools & API

ctx_read(path, mode)

The primary context retrieval tool. Cached, 10 compression modes, re-reads ~13 tokens.

# Default: full file (use sparingly)
ctx_read("src/auth.py")

# Only function signatures (great for API overview)
ctx_read("src/auth.py", mode="signatures")

# Specific line range
ctx_read("src/auth.py", mode="lines:100-150")

# Task-aware filtering (pass task in context)
ctx_read("src/auth.py", mode="task")

# Auto-select mode
ctx_analyze("src/auth.py")  # Recommends optimal mode

ctx_search(pattern, path)

Regex search with compact output — filename::line_num:match, no full lines.

# Find all validate functions
ctx_search(r"def validate", path="src/")

# Search across project
ctx_search(r"TODO|FIXME", path=".", max_results=50)
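The compact output format is the whole trick: emitting one `file::line:match` entry per hit instead of full source lines. A minimal sketch of that idea (with a hypothetical `ctx_search_sketch` helper that takes an in-memory mapping of paths to sources rather than touching the filesystem):

```python
import re

def ctx_search_sketch(pattern: str, files: dict[str, str], max_results: int = 20) -> list[str]:
    """Sketch of compact regex search: one 'file::line:match' entry per
    hit, never the full line. Hypothetical helper, not the real ctx_search."""
    rx = re.compile(pattern)
    out = []
    for path in sorted(files):
        for num, line in enumerate(files[path].splitlines(), start=1):
            m = rx.search(line)
            if m:
                out.append(f"{path}::{num}:{m.group(0)}")
                if len(out) >= max_results:
                    return out
    return out

files = {"src/auth.py": "def validate(t):\n    pass\ndef validate_email(e):\n    pass\n"}
print(ctx_search_sketch(r"def validate", files))
```

Emitting only the matched fragment rather than the whole line is what keeps grep-style results in the hundreds of tokens instead of thousands.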

ctx_shell(command)

Execute shell commands with intelligent output compression. Git, npm, cargo, pytest outputs are automatically summarized.

ctx_shell("git log --oneline -10")
# Output: 10 commits condensed to hashes + subjects

ctx_shell("npm test -- --coverage")
# Output: Test summary, not 10k lines of diff output
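The compression idea is pattern matching against known tool output shapes. A toy sketch, assuming two recognizers (pytest summary lines and git log) with a tail-only fallback — illustrative patterns, not the real compressor:

```python
import re

def compress_shell(command: str, stdout: str) -> str:
    """Sketch of pattern-based shell output compression: keep only the
    high-signal lines for tools we recognize, otherwise fall back to the
    tail. Illustrative -- the real compressor covers many more tools."""
    if "pytest" in command:
        # Keep pytest's final summary line ("4 passed, 1 failed in ...").
        for line in reversed(stdout.splitlines()):
            if re.search(r"\d+ (passed|failed|error)", line):
                return line.strip("= ").strip()
    if command.startswith("git log"):
        lines = [l for l in stdout.splitlines() if l]
        return f"{len(lines)} commits: " + "; ".join(lines[:3])
    return "\n".join(stdout.splitlines()[-5:])  # generic fallback: tail only

print(compress_shell("pytest -q", "....F\n== 4 passed, 1 failed in 0.21s =="))
```

A 10,000-line test run collapses to one summary line; the raw output can still be fetched with `raw=True` when the details matter.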

ctx_graph & ctx_impact

Dependency graph construction and impact analysis. Know what breaks before you change it.

# Build project graph
ctx_graph(action="build")

# What files depend on auth.py?
ctx_graph(action="related", path="src/auth.py")

# Impact analysis for a change
ctx_impact(action="analyze", path="src/auth.py", depth=3)

ctx_semantic_search(query)

Natural language code search — find code by what it does, not by filename.

# Find "JWT token validation"
ctx_semantic_search("JWT token validation")

# Find "rate limiting middleware"
ctx_semantic_search("rate limiting middleware")
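Under the hood, semantic search ranks code by vector similarity to the query. As a toy stand-in for embedding search, here is a bag-of-words cosine ranking — the real system uses embedding vectors (e.g. text-embedding-3-small, per the config above), and `semantic_search_sketch` is a hypothetical helper:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search_sketch(query: str, snippets: dict[str, str], top_k: int = 3) -> list[str]:
    """Toy stand-in for embedding search: rank snippets by bag-of-words
    cosine similarity. Real LCP uses dense embeddings instead."""
    qv = Counter(query.lower().split())
    scored = [(cosine(qv, Counter(text.lower().split())), name)
              for name, text in snippets.items()]
    return [name for score, name in sorted(scored, reverse=True)[:top_k] if score > 0]

snippets = {
    "auth.py": "def validate jwt token check signature and expiry",
    "limits.py": "rate limiting middleware token bucket per client",
}
print(semantic_search_sketch("JWT token validation", snippets, top_k=1))
```

Swapping the term-count vectors for embedding vectors is the only conceptual change needed to get true "find code by what it does" behavior.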

ctx_overview(task)

Task-aware project map — shows only files relevant to your current task.

ctx_overview(task="fix login bug")
# Returns: auth.py, middleware.py, test_auth.py, relevant configs

Installation & Setup

As Hermes Agent Built-in

LCP is pre-installed with Hermes Agent. No separate setup needed — it's the default context layer.

As Standalone MCP Server

# Install the LCP MCP server
pip install lean-ctx

# Add to ~/.hermes/mcp_config.json
{
  "mcpServers": {
    "lean-ctx": {
      "command": "python",
      "args": ["-m", "lean_ctx_mcp", "--root", "."]
    }
  }
}

Configuration

# ~/.hermes/lean_ctx_config.yaml
cache:
  enabled: true
  max_age_days: 7
  graph_auto_build: true

compression:
  default_mode: "signatures"
  auto_select: true  # ctx_analyze() chooses mode automatically

graph:
  index:
    - "src/**/*.py"
    - "tests/**/*.py"
  exclude:
    - "**/node_modules/**"
    - "**/.git/**"
    - "**/venv/**"

providers:
  - name: "openai"
    api_key: "${OPENAI_API_KEY}"
    model: "text-embedding-3-small"

Integration with Hermes Agent

Hermes Agent uses LCP as its native context layer. Every ctx_* call goes through LCP automatically.

Agent Workflow

User: "Fix the bug in auth.py where JWT tokens aren't validated on refresh"

→ Hermes receives message
→ Task classification: "bugfix", area: "auth"
→ ctx_overview(task="fix JWT refresh validation") returns relevant files
→ ctx_read("src/auth.py", mode="task") fetches targeted context
→ ctx_search("validate_token", path="src/") finds relevant functions
→ ctx_shell("git log -p src/auth.py") gets recent changes
→ All context sent to LLM (90% fewer tokens than naive dump)
→ LLM produces fix
→ Hermes applies patch, runs tests, reports result
How it works — Hermes Agent automatically chooses the optimal compression mode per file based on the task. You get maximum token savings without manual mode selection.

The Token Math

Typical session without LCP:

  • 5 files read at 2000 tokens each = 10,000 tokens
  • Git log dump = 3000 tokens
  • Grep results full lines = 1500 tokens
  • Total: ~14,500 tokens per session

Same session with LCP:

  • 5 files @ mode="signatures" ~ 160 tokens each = 800 tokens
  • Git log compressed pattern = 300 tokens
  • Grep compact results = 100 tokens
  • Total: ~1,200 tokens per session

Savings: ~92% — and quality is equal or better, because the AI gets cleaner, more focused context.
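The arithmetic above checks out directly (all figures are the article's per-session estimates, not measurements from any particular codebase):

```python
# Reproducing the session arithmetic; every figure is an estimate
# from the token-math comparison, not a measured value.
naive = 5 * 2000 + 3000 + 1500   # full file dumps + raw git log + full-line grep
lean = 5 * 160 + 300 + 100       # signatures mode + compressed log + compact grep
print(naive, lean, f"{1 - lean / naive:.1%} saved")
```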

Project Graph Indexing

LCP builds a persistent dependency graph of your project. Once built, it answers questions like:

  • "What files import auth.py?"
  • "Which tests cover validate_token()?"
  • "What's the call chain from main() to process_payment()?"

The graph indexes:

  • Symbols — Functions, classes, methods with full signatures
  • Imports — Cross-file dependencies
  • Call graph — Function-to-function invocation paths
  • File metadata — Last modified, size, language
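Extracting the import edges behind "what files import auth.py?" is straightforward with the standard `ast` module. A simplified sketch (`imports_of` is a hypothetical helper; real indexing also tracks symbols and call sites):

```python
import ast

def imports_of(source: str) -> set[str]:
    """Sketch of import-edge extraction: parse a module and collect the
    names it imports. Simplified relative to a real project graph."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module)
    return mods

# Reverse-dependency question: "what files import auth?"
sources = {
    "api.py": "import auth\nfrom db import session",
    "tasks.py": "from auth import validate_token",
}
dependents = sorted(f for f, src in sources.items() if "auth" in imports_of(src))
print(dependents)
```

Inverting this mapping across the whole repo yields the reverse-dependency index that `ctx_graph(action="related", ...)` answers from.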

Index Lifecycle

  1. Boot — Graph loads from ~/.hermes/graph/ cache (instant)
  2. Warm — Background scan starts for changed files
  3. Hot — New symbols indexed incrementally as you work
  4. Rebuild — ctx_graph(action="rebuild") for a fresh start

Cross-Session Memory

LCP persists memory across sessions via ctx_session. When you start a new chat, previous context is automatically available.

Session Types

| Action | What It Does |
|---|---|
| load | Restore previous session (~400 tokens compressed) |
| save | Persist current conversation state |
| task | Set current task (affects mode selection) |
| finding | Record a discovery (auto-compressed) |
| decision | Record a choice made (for future reference) |

# Manually save a session
ctx_session(action="save")

# Load previous session (auto-restores context)
ctx_session(action="load")

# Record a key insight
ctx_session(action="finding", value="Auth bug: validate_token() skips refresh tokens")

In Hermes Agent, session persistence is automatic — every conversation is saved and recalled on demand.

Advanced Features

Prefetch & Preload

Anticipate which files you'll need and cache them proactively.

# Preload context for a task (caches in background)
ctx_prefetch(task="refactor database layer")

# Fill token budget with most relevant files
ctx_fill(budget=3000, paths=["src/", "tests/"])
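Budget-constrained filling is essentially a packing problem. One plausible approach — a greedy sketch with a hypothetical `fill_budget` helper and made-up relevance scores, not LCP's actual selection logic:

```python
def fill_budget(candidates: list[tuple[float, int, str]], budget: int):
    """Greedy sketch of ctx_fill-style packing: take the highest-relevance
    files whose token costs still fit. candidates: (relevance, tokens, path).
    Hypothetical helper with illustrative scoring."""
    chosen, spent = [], 0
    for rel, cost, path in sorted(candidates, reverse=True):
        if spent + cost <= budget:
            chosen.append(path)
            spent += cost
    return chosen, spent

candidates = [(0.9, 1800, "src/db.py"), (0.8, 900, "src/models.py"),
              (0.5, 2500, "src/api.py"), (0.4, 300, "tests/test_db.py")]
print(fill_budget(candidates, budget=3000))
```

Note the greedy pass skips the large mid-relevance file and still picks up a cheap low-relevance one that fits — exactly the trade-off a token budget forces.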

Delta Updates

Instead of re-reading entire files, get only changed lines.

# Get changes since last read
ctx_delta(path="src/auth.py")
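Conceptually, a delta read diffs the cached copy against the current file and returns only the new or changed lines. A minimal sketch using the standard `difflib` module (`ctx_delta_sketch` is a hypothetical helper illustrating the idea, not the real implementation):

```python
import difflib

def ctx_delta_sketch(old: str, new: str) -> list[str]:
    """Sketch of delta reads: diff cached vs. current contents and return
    only added/changed lines, tagged with their new line numbers."""
    new_lines = new.splitlines()
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new_lines)
    out = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            out.extend(f"L{j + 1}: {new_lines[j]}" for j in range(j1, j2))
    return out

old = "def f():\n    return 1\n"
new = "def f():\n    return 2\n"
print(ctx_delta_sketch(old, new))
```

For a one-line change in a 500-line file, this is the difference between ~2000 tokens and a handful — the ~2% figure quoted for diff mode.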

Knowledge Consolidation

Extract patterns from a session and save them as reusable skills.

# Consolidate findings across files
ctx_knowledge(action="consolidate")

# Search cross-session knowledge
ctx_knowledge(action="search", query="JWT validation patterns")

Auto Handoff

Share context between agents without re-reading files.

# Hand off task to another agent with full context
ctx_agent(action="handoff", to_agent="reviewer", summary="Auth fix ready for review")

Complete Tool Reference

| Tool | Purpose | Example |
|---|---|---|
| ctx_read | Read file with compression | ctx_read("main.py", mode="signatures") |
| ctx_search | Regex code search | ctx_search("def.*test", path="tests/") |
| ctx_shell | Compressed shell output | ctx_shell("git status") |
| ctx_semantic_search | Natural language search | ctx_semantic_search("user authentication logic") |
| ctx_overview | Task-aware project map | ctx_overview(task="add logging") |
| ctx_graph | Dependency graph ops | ctx_graph(action="related", path="utils.py") |
| ctx_impact | Change impact analysis | ctx_impact(path="db.py", depth=2) |
| ctx_session | Cross-session memory | ctx_session(action="save") |
| ctx_knowledge | Persistent fact store | ctx_knowledge(action="remember", key="auth-lib", value="Passport.js") |
| ctx_agent | Multi-agent coordination | ctx_agent(action="handoff", to="tester") |

Workflow Examples

Bug Fix Workflow

1. User: "Fix the bug where password reset expires immediately"

2. Hermes → ctx_overview(task="bugfix:password-reset-expiry")
   Returns: auth_service.py, email_templates.py, test_auth.py

3. ctx_read("auth_service.py", mode="task")
   Returns: Only password reset related code (85% smaller)

4. ctx_search("reset_token_expiry", path="src/")
   Returns: 3 function matches with line numbers

5. LLM fix generated and applied

6. ctx_shell("pytest tests/test_auth.py::test_password_reset")
   Returns: Compressed test result summary

Feature Implementation Workflow

1. User: "Add rate limiter to the API"

2. ctx_overview(task="feature:rate-limiting")
   Returns: middleware.py, api_server.py, config.py

3. ctx_read("middleware.py", mode="map")
   Returns: Function list → identify existing middleware patterns

4. ctx_semantic_search("rate limiting")
   Returns: Any existing rate-limiting code (maybe from another project)

5. ctx_graph(action="related", path="middleware.py")
   Returns: Routes that use this middleware — ensure coverage

6. LLM implements feature with full structural context

Integration with Other Tools

Hermes Agent

Native, automatic. All ctx_* tools available by default. Session persistence built-in.

VS Code / Cursor

# Install the Lean Context extension
# .vscode/settings.json
{
  "leanContext.enabled": true,
  "leanContext.defaultMode": "signatures",
  "leanContext.projectRoot": "${workspaceFolder}"
}

Claude Code CLI

# Use via MCP server
# ~/.claude/settings.json
{
  "mcpServers": {
    "lean-ctx": {
      "command": "python",
      "args": ["-m", "lean_ctx_mcp"]
    }
  }
}

Custom Scripts

from lean_ctx import ContextManager

ctx = ContextManager(root=".")

# Programmatic access
sig = ctx.read("main.py", mode="signatures")
matches = ctx.search("def test_")
graph = ctx.build_graph()
ctx.save_session()

Performance Benchmarks

| Operation | Without LCP | With LCP | Savings |
|---|---|---|---|
| Read 500-line file (full dump) | 2100 tokens | ~13 tokens (cached) | 99.4% |
| Read same file (subsequent) | 2100 tokens | ~13 tokens | 99.4% |
| Read with mode="signatures" | 2100 tokens | ~180 tokens | 91.4% |
| Grep 1000 matches (full lines) | 8500 tokens | ~220 tokens (compact) | 97.4% |
| git log -20 (raw) | 3200 tokens | ~150 tokens | 95.3% |
| Project overview (100 files) | ~8000 tokens | ~400 tokens (graph only) | 95% |
| Typical 30-min session | ~14,500 tokens | ~1,200 tokens | 91.7% |
💡 Real-world result — At $0.01 per 1K tokens (Opus pricing), the ~13,300 tokens saved in a typical session are worth about $0.14. At high volume, LCP pays for itself within the first few days of use.

Caching Strategy

LCP uses a multi-level cache:

  • L1: In-memory (session) — Re-reads cost ~13 tokens (hash lookup)
  • L2: Disk (~/.hermes/cache/) — Persists across restarts, 7-day TTL default
  • L3: Graph index — Symbol-level cache, never expires unless file changes

Cache invalidation is hash-based (SHA-256 of file contents). If the file changes, the cache key changes automatically — no manual invalidation needed.
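Hash-based keying means stale entries are never served; they are simply never hit again. A minimal sketch of the idea, assuming a key built from path, mode, and a SHA-256 content digest (`cache_key` is a hypothetical helper; the real key scheme may differ):

```python
import hashlib

def cache_key(path: str, contents: bytes, mode: str) -> str:
    """Sketch of hash-based cache invalidation: the key embeds a digest
    of the file contents, so any edit yields a fresh key automatically.
    Illustrative -- not the actual LCP key format."""
    digest = hashlib.sha256(contents).hexdigest()[:16]
    return f"{path}:{mode}:{digest}"

k1 = cache_key("src/auth.py", b"def f(): pass", "signatures")
k2 = cache_key("src/auth.py", b"def f(): return 1", "signatures")
print(k1 != k2)  # any content change produces a different key
```

The trade-off is that orphaned entries for old file versions linger until the TTL expires, which is why multi-project setups can see the cache grow.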

Cache Management

# View cache stats
ctx_cache(action="status")

# Clear all caches
ctx_cache(action="clear")

# Invalidate specific file
ctx_cache(action="invalidate", path="src/auth.py")

# Rebuild graph index (force)
ctx_graph(action="rebuild")

# Semantic embeddings reindex
ctx_knowledge(action="embeddings_reindex")

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| "Cache directory not writable" | ~/.hermes/cache/ permissions | chmod 755 ~/.hermes/cache |
| "Graph build failed" | Unsupported language/parser | Check ctx_graph(action="status") for unsupported files |
| "Mode not recognized" | Old version | Update: pip install --upgrade lean-ctx |
| "Session not found" | Session file deleted or corrupted | ctx_session(action="list"), then load a valid session ID |
| "Provider quota exceeded" | Embedding API limit | Switch to local embeddings (disable in config) or wait for quota reset |

Verdict: Essential for AI-Native Development

Lean Context Protocol is not optional if you're using AI agents on large codebases. The token savings are too significant to ignore — 60–90% reduction is the difference between a $5/month habit and a $50/month habit.

What makes LCP stand out is that it's not just compression — it's smart compression. The different modes mean you can tailor context to exactly what the AI needs. The project graph means you get structural understanding for free. And the caching means you never pay for the same context twice.

If you're running Hermes Agent, you're already using it. If you're using Claude Code or another agent, install the MCP server — it's a force multiplier.

Pros

  • 60–90% token reduction typical
  • 10 specialized compression modes
  • Project graph indexing (dependencies, call chains)
  • Cross-session memory persistence
  • Zero additional cost — built into Hermes
  • Standalone MCP server available
  • Active development, frequent improvements

Cons

  • Initial graph build can be slow on huge repos
  • Mode selection requires understanding your use case
  • Cache can grow large on multi-project setups
  • Some advanced features need OpenAI API (embeddings)
  • Documentation scattered across repos

API Quick Reference

# File Operations
ctx_read(path, mode="full|map|signatures|diff|task|reference|aggressive|entropy|lines:N-M|fresh")
ctx_delta(path)                    # Changes since last read
ctx_search(pattern, path, max_results=20)
ctx_semantic_search(query, top_k=10)

# Shell & External
ctx_shell(command, raw=False)       # Pattern-compressed output
ctx_shell("git diff HEAD~1", raw=True)  # Skip compression

# Project Graph
ctx_graph(action="build|related|symbol|impact|status", path=...)
ctx_impact(action="analyze", path=..., depth=5)
ctx_overview(task="...")           # Task-aware project map

# Session & Memory
ctx_session(action="load|save|list|task|finding|decision")
ctx_knowledge(action="remember|recall|pattern|consolidate|gotcha")
ctx_agent(action="handoff|sync|diary")

# Cache & Performance
ctx_cache(action="status|clear|invalidate", path=...)
ctx_prefetch(task="...", budget_tokens=3000)
ctx_compress_memory(path)          # Compress config files
ctx_feedback(action="record")      # Latency/token tracking

# Advanced
ctx_execute(language="python", code="...")  # Sandboxed execution
ctx_expand(action="retrieve", id="...")    # Retrieve archived output
ctx_handoff(action="create")       # Create context ledger

Resources & Repos

Repository (LCP MCP): github.com/Kilo-AI/lean-ctx-mcp

Python package: pypi.org/project/lean-ctx