Highlights
- 60–95% token reduction
- Library, proxy, MCP server
- Preserves model answers
Overview
headroom compresses tool outputs, logs, files, and RAG chunks before they reach the LLM. The project claims 60–95% fewer tokens with the same model answers. It ships as a Python library, a network proxy, and an MCP server, so you can call compression directly from your code, place it in front of model endpoints, or run it as a centralized compression service.
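To make the idea of compressing context before it reaches the model concrete, here is a minimal sketch. It is not headroom's API or algorithm: the functions `collapse_repeated_lines`, `rough_token_count`, and `reduction` are hypothetical names invented for illustration, the technique shown (collapsing consecutive duplicate log lines) is just one simple form of compression, and whitespace splitting is only a crude stand-in for a real tokenizer.

```python
def collapse_repeated_lines(text: str) -> str:
    """Collapse runs of identical consecutive lines into one line plus a count.

    Hypothetical helper for illustration; not part of headroom.
    """
    out = []
    prev = None
    count = 0
    for line in text.splitlines():
        if line == prev:
            count += 1
            continue
        if prev is not None:
            out.append(prev if count == 1 else f"{prev} (x{count})")
        prev, count = line, 1
    # Flush the final run.
    if prev is not None:
        out.append(prev if count == 1 else f"{prev} (x{count})")
    return "\n".join(out)


def rough_token_count(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    return len(text.split())


def reduction(original: str, compressed: str) -> float:
    """Fraction of (approximate) tokens removed, between 0.0 and 1.0."""
    before = rough_token_count(original)
    after = rough_token_count(compressed)
    return 0.0 if before == 0 else 1 - after / before


if __name__ == "__main__":
    log = "GET /health 200\n" * 5 + "ERROR db timeout"
    compressed = collapse_repeated_lines(log)
    print(compressed)
    print(f"approx. reduction: {reduction(log, compressed):.0%}")
```

The distinct lines (here, the error) survive intact while the repetitive ones shrink, which is the general shape of a compression step meant to leave the model's answer unchanged. How much any real tool saves depends on the input and on the tokenizer in use.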