
Lean Context Protocol: Token-Efficient AI Context

Published: April 17, 2026 · Tags: Tools, Productivity, MCP, Token Optimization · Read time: 10 min

At a glance: ~13 tokens per cached re-read · 10 read modes · 60–90% token reduction · $0 extra cost

Lean Context Protocol (LCP) is a context engineering layer that dramatically reduces token consumption while preserving full project awareness for AI coding assistants. It's built into Hermes Agent and available as a standalone MCP server.

Instead of dumping entire files into context (wasting tokens), LCP provides smart compression, caching, and semantic search that delivers only what the AI needs — when it needs it.

This guide covers all 10 compression modes, the architecture, integration with Hermes/MCP, and practical workflows.

The Problem: Context Window Exhaustion

When working on large codebases, AI assistants quickly exhaust their context windows:

  • Dumping entire files — A 500-line file costs 2000+ tokens. Most of it is irrelevant to the task.
  • Manual curation — You spend time figuring out what to include instead of solving the problem.
  • Working blind — Skipping context leads to wrong assumptions and rework.
  • No persistence — Every new chat starts from zero. Previous context is lost.

The result: more API calls, higher costs, lower quality answers.

The Solution: Smart Context Layer

LCP sits between the AI and your filesystem, delivering just enough context — compressed, cached, and targeted.

| Feature | Traditional Approach | Lean Context Protocol |
|---|---|---|
| File reads | Always full dump (~2000 tokens per file) | Cached re-reads cost ~13 tokens |
| Search results | Full line context (100s of lines) | Compact regex results, line numbers only |
| Shell output | Raw stdout/stderr | Pattern compression (git/npm/cargo summaries) |
| Project awareness | Explicit file listing each time | Project graph indexed once, reused forever |
| Cross-session memory | Lost between chats | Persistent knowledge graph + session recall |

10 Compression Modes

LCP offers 10 distinct read modes for different contexts. Each mode optimizes for a specific use case:

| Mode | Use Case | Output | Token Cost |
|---|---|---|---|
| full | Need complete file | All lines, original formatting | 100% |
| map | Understanding structure | Function/class names + line numbers | ~5% |
| signatures | API surface inspection | Function signatures, types, return types | ~8% |
| diff | Recent changes | Only lines changed since last read | ~2% |
| task | Working on a specific task | Context-aware filtering by task keywords | Variable |
| reference | Looking up specific symbols | Symbol definition + 5 context lines | ~3% |
| aggressive | Minimal viable context | Function names + 1-line summaries | ~2% |
| entropy | Auto-detects patterns | Entropy analysis keeps high-information lines | ~10% |
| lines:N-M | Specific line range | Only lines N through M | Proportional |
| fresh | Bypass cache | Fresh read, ignores cached version | 100% + ~13 overhead |
💡 Pro tip — Use ctx_read(path, mode="signatures") for 92% token savings on large files. Check the function list first, then drill down with mode="reference" on specific functions.
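To make the "map" and "signatures" modes concrete, here is a minimal sketch of what structure-only compression could look like for Python files, using the standard `ast` module. This is illustrative only, not the actual LCP implementation — `map_mode` is a hypothetical helper name.

```python
import ast

def map_mode(source: str) -> list[str]:
    """Sketch of a 'map'-style read: keep only function/class names
    with line numbers and drop every body. Illustrative only -- not
    the actual LCP implementation."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            entries.append((node.lineno, f"L{node.lineno}: {kind} {node.name}"))
    # Sort by line number so the map reads top-to-bottom like the file.
    return [text for _, text in sorted(entries)]

code = """\
class Auth:
    def login(self, user): ...
    def logout(self): ...

def validate_token(tok): ...
"""
print("\n".join(map_mode(code)))
```

A 500-line module collapses to a few dozen lines like these, which is where the ~5% figure for map mode comes from.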

Core Tools & API

ctx_read(path, mode)

The primary context retrieval tool. Cached, 10 compression modes, re-reads ~13 tokens.

# Default: full file (use sparingly)
ctx_read("src/auth.py")

# Only function signatures (great for API overview)
ctx_read("src/auth.py", mode="signatures")

# Specific line range
ctx_read("src/auth.py", mode="lines:100-150")

# Task-aware filtering (pass task in context)
ctx_read("src/auth.py", mode="task")

# Auto-select mode
ctx_analyze("src/auth.py")  # Recommends optimal mode

ctx_search(pattern, path)

Regex search with compact output — filename::line_num:match, no full lines.

# Find all validate functions
ctx_search(r"def validate", path="src/")

# Search across project
ctx_search(r"TODO|FIXME", path=".", max_results=50)
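The compact output format is the whole trick: emitting one `file::line:match` entry per hit instead of full source lines. A minimal sketch of that idea (with a hypothetical `ctx_search_sketch` helper that takes an in-memory mapping of paths to sources rather than touching the filesystem):

```python
import re

def ctx_search_sketch(pattern: str, files: dict[str, str], max_results: int = 20) -> list[str]:
    """Sketch of compact regex search: one 'file::line:match' entry per
    hit, never the full line. Hypothetical helper, not the real ctx_search."""
    rx = re.compile(pattern)
    out = []
    for path in sorted(files):
        for num, line in enumerate(files[path].splitlines(), start=1):
            m = rx.search(line)
            if m:
                out.append(f"{path}::{num}:{m.group(0)}")
                if len(out) >= max_results:
                    return out
    return out

files = {"src/auth.py": "def validate(t):\n    pass\ndef validate_email(e):\n    pass\n"}
print(ctx_search_sketch(r"def validate", files))
```

Emitting only the matched fragment rather than the whole line is what keeps grep-style results in the hundreds of tokens instead of thousands.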

ctx_shell(command)

Execute shell commands with intelligent output compression. Git, npm, cargo, pytest outputs are automatically summarized.

ctx_shell("git log --oneline -10")
# Output: 10 commits condensed to hashes + subjects

ctx_shell("npm test -- --coverage")
# Output: Test summary, not 10k lines of diff output
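The compression idea is pattern matching against known tool output shapes. A toy sketch, assuming two recognizers (pytest summary lines and git log) with a tail-only fallback — illustrative patterns, not the real compressor:

```python
import re

def compress_shell(command: str, stdout: str) -> str:
    """Sketch of pattern-based shell output compression: keep only the
    high-signal lines for tools we recognize, otherwise fall back to the
    tail. Illustrative -- the real compressor covers many more tools."""
    if "pytest" in command:
        # Keep pytest's final summary line ("4 passed, 1 failed in ...").
        for line in reversed(stdout.splitlines()):
            if re.search(r"\d+ (passed|failed|error)", line):
                return line.strip("= ").strip()
    if command.startswith("git log"):
        lines = [l for l in stdout.splitlines() if l]
        return f"{len(lines)} commits: " + "; ".join(lines[:3])
    return "\n".join(stdout.splitlines()[-5:])  # generic fallback: tail only

print(compress_shell("pytest -q", "....F\n== 4 passed, 1 failed in 0.21s =="))
```

A 10,000-line test run collapses to one summary line; the raw output can still be fetched with `raw=True` when the details matter.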

ctx_graph & ctx_impact

Dependency graph construction and impact analysis. Know what breaks before you change it.

# Build project graph
ctx_graph(action="build")

# What files depend on auth.py?
ctx_graph(action="related", path="src/auth.py")

# Impact analysis for a change
ctx_impact(action="analyze", path="src/auth.py", depth=3)

ctx_semantic_search(query)

Natural language code search — find code by what it does, not by filename.

# Find "JWT token validation"
ctx_semantic_search("JWT token validation")

# Find "rate limiting middleware"
ctx_semantic_search("rate limiting middleware")
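Under the hood, semantic search ranks code by vector similarity to the query. As a toy stand-in for embedding search, here is a bag-of-words cosine ranking — the real system uses embedding vectors (e.g. text-embedding-3-small, per the config above), and `semantic_search_sketch` is a hypothetical helper:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search_sketch(query: str, snippets: dict[str, str], top_k: int = 3) -> list[str]:
    """Toy stand-in for embedding search: rank snippets by bag-of-words
    cosine similarity. Real LCP uses dense embeddings instead."""
    qv = Counter(query.lower().split())
    scored = [(cosine(qv, Counter(text.lower().split())), name)
              for name, text in snippets.items()]
    return [name for score, name in sorted(scored, reverse=True)[:top_k] if score > 0]

snippets = {
    "auth.py": "def validate jwt token check signature and expiry",
    "limits.py": "rate limiting middleware token bucket per client",
}
print(semantic_search_sketch("JWT token validation", snippets, top_k=1))
```

Swapping the term-count vectors for embedding vectors is the only conceptual change needed to get true "find code by what it does" behavior.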

ctx_overview(task)

Task-aware project map — shows only files relevant to your current task.

ctx_overview(task="fix login bug")
# Returns: auth.py, middleware.py, test_auth.py, relevant configs

Installation & Setup

As Hermes Agent Built-in

LCP is pre-installed with Hermes Agent. No separate setup needed — it's the default context layer.

As Standalone MCP Server

# Install the LCP MCP server
pip install lean-ctx

# Add to ~/.hermes/mcp_config.json
{
  "mcpServers": {
    "lean-ctx": {
      "command": "python",
      "args": ["-m", "lean_ctx_mcp", "--root", "."]
    }
  }
}

Configuration

# ~/.hermes/lean_ctx_config.yaml
cache:
  enabled: true
  max_age_days: 7
  graph_auto_build: true

compression:
  default_mode: "signatures"
  auto_select: true  # ctx_analyze() chooses mode automatically

graph:
  index:
    - "src/**/*.py"
    - "tests/**/*.py"
  exclude:
    - "**/node_modules/**"
    - "**/.git/**"
    - "**/venv/**"

providers:
  - name: "openai"
    api_key: "${OPENAI_API_KEY}"
    model: "text-embedding-3-small"

Integration with Hermes Agent

Hermes Agent uses LCP as its native context layer. Every ctx_* call goes through LCP automatically.

Agent Workflow

User: "Fix the bug in auth.py where JWT tokens aren't validated on refresh"

→ Hermes receives message
→ Task classification: "bugfix", area: "auth"
→ ctx_overview(task="fix JWT refresh validation") returns relevant files
→ ctx_read("src/auth.py", mode="task") fetches targeted context
→ ctx_search("validate_token", path="src/") finds relevant functions
→ ctx_shell("git log -p src/auth.py") gets recent changes
→ All context sent to LLM (90% fewer tokens than naive dump)
→ LLM produces fix
→ Hermes applies patch, runs tests, reports result
How it works — Hermes Agent automatically chooses the optimal compression mode per file based on the task. You get maximum token savings without manual mode selection.

The Token Math

Typical session without LCP:

  • 5 files read at 2000 tokens each = 10,000 tokens
  • Git log dump = 3000 tokens
  • Grep results full lines = 1500 tokens
  • Total: ~14,500 tokens per session

Same session with LCP:

  • 5 files @ mode="signatures" ~ 160 tokens each = 800 tokens
  • Git log compressed pattern = 300 tokens
  • Grep compact results = 100 tokens
  • Total: ~1,200 tokens per session

Savings: ~92% — and quality is equal or better, because the AI gets cleaner, more focused context.
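The arithmetic above checks out directly (all figures are the article's per-session estimates, not measurements from any particular codebase):

```python
# Reproducing the session arithmetic; every figure is an estimate
# from the token-math comparison, not a measured value.
naive = 5 * 2000 + 3000 + 1500   # full file dumps + raw git log + full-line grep
lean = 5 * 160 + 300 + 100       # signatures mode + compressed log + compact grep
print(naive, lean, f"{1 - lean / naive:.1%} saved")
```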

Project Graph Indexing

LCP builds a persistent dependency graph of your project. Once built, it answers questions like:

  • "What files import auth.py?"
  • "Which tests cover validate_token()?"
  • "What's the call chain from main() to process_payment()?"

The graph indexes:

  • Symbols — Functions, classes, methods with full signatures
  • Imports — Cross-file dependencies
  • Call graph — Function-to-function invocation paths
  • File metadata — Last modified, size, language
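Extracting the import edges behind "what files import auth.py?" is straightforward with the standard `ast` module. A simplified sketch (`imports_of` is a hypothetical helper; real indexing also tracks symbols and call sites):

```python
import ast

def imports_of(source: str) -> set[str]:
    """Sketch of import-edge extraction: parse a module and collect the
    names it imports. Simplified relative to a real project graph."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module)
    return mods

# Reverse-dependency question: "what files import auth?"
sources = {
    "api.py": "import auth\nfrom db import session",
    "tasks.py": "from auth import validate_token",
}
dependents = sorted(f for f, src in sources.items() if "auth" in imports_of(src))
print(dependents)
```

Inverting this mapping across the whole repo yields the reverse-dependency index that `ctx_graph(action="related", ...)` answers from.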

Index Lifecycle

  1. Boot — Graph loads from ~/.hermes/graph/ cache (instant)
  2. Warm — Background scan starts for changed files
  3. Hot — New symbols indexed incrementally as you work
  4. Rebuild — ctx_graph(action="rebuild") for a fresh start

Cross-Session Memory

LCP persists memory across sessions via ctx_session. When you start a new chat, previous context is automatically available.

Session Types

| Action | What It Does |
|---|---|
| load | Restore previous session (~400 tokens compressed) |
| save | Persist current conversation state |
| task | Set current task (affects mode selection) |
| finding | Record a discovery (auto-compressed) |
| decision | Record a choice made (for future reference) |

# Manually save a session
ctx_session(action="save")

# Load previous session (auto-restores context)
ctx_session(action="load")

# Record a key insight
ctx_session(action="finding", value="Auth bug: validate_token() skips refresh tokens")

In Hermes Agent, session persistence is automatic — every conversation is saved and recalled on demand.

Advanced Features

Prefetch & Preload

Anticipate which files you'll need and cache them proactively.

# Preload context for a task (caches in background)
ctx_prefetch(task="refactor database layer")

# Fill token budget with most relevant files
ctx_fill(budget=3000, paths=["src/", "tests/"])
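Budget-constrained filling is essentially a packing problem. One plausible approach — a greedy sketch with a hypothetical `fill_budget` helper and made-up relevance scores, not LCP's actual selection logic:

```python
def fill_budget(candidates: list[tuple[float, int, str]], budget: int):
    """Greedy sketch of ctx_fill-style packing: take the highest-relevance
    files whose token costs still fit. candidates: (relevance, tokens, path).
    Hypothetical helper with illustrative scoring."""
    chosen, spent = [], 0
    for rel, cost, path in sorted(candidates, reverse=True):
        if spent + cost <= budget:
            chosen.append(path)
            spent += cost
    return chosen, spent

candidates = [(0.9, 1800, "src/db.py"), (0.8, 900, "src/models.py"),
              (0.5, 2500, "src/api.py"), (0.4, 300, "tests/test_db.py")]
print(fill_budget(candidates, budget=3000))
```

Note the greedy pass skips the large mid-relevance file and still picks up a cheap low-relevance one that fits — exactly the trade-off a token budget forces.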

Delta Updates

Instead of re-reading entire files, get only changed lines.

# Get changes since last read
ctx_delta(path="src/auth.py")
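Conceptually, a delta read diffs the cached copy against the current file and returns only the new or changed lines. A minimal sketch using the standard `difflib` module (`ctx_delta_sketch` is a hypothetical helper illustrating the idea, not the real implementation):

```python
import difflib

def ctx_delta_sketch(old: str, new: str) -> list[str]:
    """Sketch of delta reads: diff cached vs. current contents and return
    only added/changed lines, tagged with their new line numbers."""
    new_lines = new.splitlines()
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new_lines)
    out = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            out.extend(f"L{j + 1}: {new_lines[j]}" for j in range(j1, j2))
    return out

old = "def f():\n    return 1\n"
new = "def f():\n    return 2\n"
print(ctx_delta_sketch(old, new))
```

For a one-line change in a 500-line file, this is the difference between ~2000 tokens and a handful — the ~2% figure quoted for diff mode.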

Knowledge Consolidation

Extract patterns from a session and save them as reusable skills.

# Consolidate findings across files
ctx_knowledge(action="consolidate")

# Search cross-session knowledge
ctx_knowledge(action="search", query="JWT validation patterns")

Auto Handoff

Share context between agents without re-reading files.

# Hand off task to another agent with full context
ctx_agent(action="handoff", to_agent="reviewer", summary="Auth fix ready for review")

Complete Tool Reference

| Tool | Purpose | Example |
|---|---|---|
| ctx_read | Read file with compression | ctx_read("main.py", mode="signatures") |
| ctx_search | Regex code search | ctx_search("def.*test", path="tests/") |
| ctx_shell | Compressed shell output | ctx_shell("git status") |
| ctx_semantic_search | Natural language search | ctx_semantic_search("user authentication logic") |
| ctx_overview | Task-aware project map | ctx_overview(task="add logging") |
| ctx_graph | Dependency graph ops | ctx_graph(action="related", path="utils.py") |
| ctx_impact | Change impact analysis | ctx_impact(path="db.py", depth=2) |
| ctx_session | Cross-session memory | ctx_session(action="save") |
| ctx_knowledge | Persistent fact store | ctx_knowledge(action="remember", key="auth-lib", value="Passport.js") |
| ctx_agent | Multi-agent coordination | ctx_agent(action="handoff", to="tester") |

Workflow Examples

Bug Fix Workflow

1. User: "Fix the bug where password reset expires immediately"

2. Hermes → ctx_overview(task="bugfix:password-reset-expiry")
   Returns: auth_service.py, email_templates.py, test_auth.py

3. ctx_read("auth_service.py", mode="task")
   Returns: Only password reset related code (85% smaller)

4. ctx_search("reset_token_expiry", path="src/")
   Returns: 3 function matches with line numbers

5. LLM fix generated and applied

6. ctx_shell("pytest tests/test_auth.py::test_password_reset")
   Returns: Compressed test result summary

Feature Implementation Workflow

1. User: "Add rate limiter to the API"

2. ctx_overview(task="feature:rate-limiting")
   Returns: middleware.py, api_server.py, config.py

3. ctx_read("middleware.py", mode="map")
   Returns: Function list → identify existing middleware patterns

4. ctx_semantic_search("rate limiting")
   Returns: Any existing rate-limiting code (maybe from another project)

5. ctx_graph(action="related", path="middleware.py")
   Returns: Routes that use this middleware — ensure coverage

6. LLM implements feature with full structural context

Integration with Other Tools

Hermes Agent

Native, automatic. All ctx_* tools available by default. Session persistence built-in.

VS Code / Cursor

# Install the Lean Context extension
# .vscode/settings.json
{
  "leanContext.enabled": true,
  "leanContext.defaultMode": "signatures",
  "leanContext.projectRoot": "${workspaceFolder}"
}

Claude Code CLI

# Use via MCP server
# ~/.claude/settings.json
{
  "mcpServers": {
    "lean-ctx": {
      "command": "python",
      "args": ["-m", "lean_ctx_mcp"]
    }
  }
}

Custom Scripts

from lean_ctx import ContextManager

ctx = ContextManager(root=".")

# Programmatic access
sig = ctx.read("main.py", mode="signatures")
matches = ctx.search("def test_")
graph = ctx.build_graph()
ctx.save_session()

Performance Benchmarks

| Operation | Without LCP | With LCP | Savings |
|---|---|---|---|
| Read 500-line file (full dump) | 2100 tokens | ~13 tokens (cached) | 99.4% |
| Read same file (subsequent) | 2100 tokens | ~13 tokens | 99.4% |
| Read with mode="signatures" | 2100 tokens | ~180 tokens | 91.4% |
| Grep 1000 matches (full lines) | 8500 tokens | ~220 tokens (compact) | 97.4% |
| git log -20 (raw) | 3200 tokens | ~150 tokens | 95.3% |
| Project overview (100 files) | ~8000 tokens | ~400 tokens (graph only) | 95% |
| Typical 30-min session | ~14,500 tokens | ~1,200 tokens | 91.7% |
💡 Real-world result — At $0.01 per 1K tokens (Opus pricing), the ~13,300 tokens saved in a typical session are worth about $0.14. At high volume, LCP pays for itself within the first few days of use.

Caching Strategy

LCP uses a multi-level cache:

  • L1: In-memory (session) — Re-reads cost ~13 tokens (hash lookup)
  • L2: Disk (~/.hermes/cache/) — Persists across restarts, 7-day TTL default
  • L3: Graph index — Symbol-level cache, never expires unless file changes

Cache invalidation is hash-based (SHA-256 of file contents). If the file changes, the cache key changes automatically — no manual invalidation needed.
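Hash-based keying means stale entries are never served; they are simply never hit again. A minimal sketch of the idea, assuming a key built from path, mode, and a SHA-256 content digest (`cache_key` is a hypothetical helper; the real key scheme may differ):

```python
import hashlib

def cache_key(path: str, contents: bytes, mode: str) -> str:
    """Sketch of hash-based cache invalidation: the key embeds a digest
    of the file contents, so any edit yields a fresh key automatically.
    Illustrative -- not the actual LCP key format."""
    digest = hashlib.sha256(contents).hexdigest()[:16]
    return f"{path}:{mode}:{digest}"

k1 = cache_key("src/auth.py", b"def f(): pass", "signatures")
k2 = cache_key("src/auth.py", b"def f(): return 1", "signatures")
print(k1 != k2)  # any content change produces a different key
```

The trade-off is that orphaned entries for old file versions linger until the TTL expires, which is why multi-project setups can see the cache grow.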

Cache Management

# View cache stats
ctx_cache(action="status")

# Clear all caches
ctx_cache(action="clear")

# Invalidate specific file
ctx_cache(action="invalidate", path="src/auth.py")

# Rebuild graph index (force)
ctx_graph(action="rebuild")

# Semantic embeddings reindex
ctx_knowledge(action="embeddings_reindex")

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| "Cache directory not writable" | ~/.hermes/cache/ permissions | chmod 755 ~/.hermes/cache |
| "Graph build failed" | Unsupported language/parser | Check ctx_graph(action="status") for unsupported files |
| "Mode not recognized" | Old version | Update: pip install --upgrade lean-ctx |
| "Session not found" | Session file deleted or corrupted | ctx_session(action="list"), then load a valid session ID |
| "Provider quota exceeded" | Embedding API limit | Switch to local embeddings (disable in config) or wait for quota reset |

Verdict: Essential for AI-Native Development

Lean Context Protocol is not optional if you're using AI agents on large codebases. The token savings are too significant to ignore — 60–90% reduction is the difference between a $5/month habit and a $50/month habit.

What makes LCP stand out is that it's not just compression — it's smart compression. The different modes mean you can tailor context to exactly what the AI needs. The project graph means you get structural understanding for free. And the caching means you never pay for the same context twice.

If you're running Hermes Agent, you're already using it. If you're using Claude Code or another agent, install the MCP server — it's a force multiplier.

Pros

  • 60–90% token reduction typical
  • 10 specialized compression modes
  • Project graph indexing (dependencies, call chains)
  • Cross-session memory persistence
  • Zero additional cost — built into Hermes
  • Standalone MCP server available
  • Active development, frequent improvements

Cons

  • Initial graph build can be slow on huge repos
  • Mode selection requires understanding your use case
  • Cache can grow large on multi-project setups
  • Some advanced features need OpenAI API (embeddings)
  • Documentation scattered across repos

API Quick Reference

# File Operations
ctx_read(path, mode="full|map|signatures|diff|task|reference|aggressive|entropy|lines:N-M|fresh")
ctx_delta(path)                    # Changes since last read
ctx_search(pattern, path, max_results=20)
ctx_semantic_search(query, top_k=10)

# Shell & External
ctx_shell(command, raw=False)       # Pattern-compressed output
ctx_shell("git diff HEAD~1", raw=True)  # Skip compression

# Project Graph
ctx_graph(action="build|related|symbol|impact|status", path=...)
ctx_impact(action="analyze", path=..., depth=5)
ctx_overview(task="...")           # Task-aware project map

# Session & Memory
ctx_session(action="load|save|list|task|finding|decision")
ctx_knowledge(action="remember|recall|pattern|consolidate|gotcha")
ctx_agent(action="handoff|sync|diary")

# Cache & Performance
ctx_cache(action="status|clear|invalidate", path=...)
ctx_prefetch(task="...", budget_tokens=3000)
ctx_compress_memory(path)          # Compress config files
ctx_feedback(action="record")      # Latency/token tracking

# Advanced
ctx_execute(language="python", code="...")  # Sandboxed execution
ctx_expand(action="retrieve", id="...")    # Retrieve archived output
ctx_handoff(action="create")       # Create context ledger

Resources & Repos

Repository (LCP MCP): github.com/Kilo-AI/lean-ctx-mcp

Python package: pypi.org/project/lean-ctx