prune.
Private beta · Waitlist open

Cut LLM bills without rewriting your app

One import change. Security on every call. Estimate savings before you ship; receipts on every response.

Join the waitlist — we'll email your beta invite. Use the same email when you create your account.

Every response is a receipt

Saved about 340 tokens on a JSON workload through template optimization, schema compression, and an output cap.

Full receipt (for your logs)

AUTO MODE
prune_metadata
{
  "choices": [{ "message": { "content": "{ ... }" } }],
  "usage": { "prompt_tokens": 684, "completion_tokens": 412 },
  "prune_metadata": {
    "cache_hit": false,
    "tokens_saved": 340,
    "template_opt_saved": 142,
    "compressed_tokens_saved": 10,
    "schema_compression_applied": true,
    "schema_compression_tokens_saved": 18,
    "suggested_max_tokens": 512,
    "optimizations_applied": [
      "template_opt",
      "schema_compress",
      "output_cap:512"
    ]
  }
}
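A minimal sketch of how an app might log a receipt like the one above. The field names are taken from that example; the `summarize_receipt` helper and the shape of the log line are illustrative assumptions, not part of Prune's SDK.

```python
# Receipt copied from the example above, as a Python dict.
receipt = {
    "choices": [{"message": {"content": "{ ... }"}}],
    "usage": {"prompt_tokens": 684, "completion_tokens": 412},
    "prune_metadata": {
        "cache_hit": False,
        "tokens_saved": 340,
        "template_opt_saved": 142,
        "compressed_tokens_saved": 10,
        "schema_compression_applied": True,
        "schema_compression_tokens_saved": 18,
        "suggested_max_tokens": 512,
        "optimizations_applied": ["template_opt", "schema_compress", "output_cap:512"],
    },
}


def summarize_receipt(response: dict) -> str:
    """Build a one-line log entry from a response (hypothetical helper)."""
    meta = response.get("prune_metadata", {})
    usage = response.get("usage", {})
    # Tokens reported by the upstream model for this call.
    used = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
    saved = meta.get("tokens_saved", 0)
    applied = ", ".join(meta.get("optimizations_applied", [])) or "none"
    hit = meta.get("cache_hit", False)
    return f"used={used} saved={saved} cache_hit={hit} applied={applied}"


print(summarize_receipt(receipt))
```

Reading every field with `.get` keeps the logger from failing on a response that carries no `prune_metadata` at all.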

Works with

OpenAI · Anthropic · Gemini · Bedrock

Built for production

Security on every request. Savings on every layer.

More than a cache: pre-request estimates, schema compression, template optimization, output caps, and spend signals all ship today.

Shield

Every request

Vault, rate limits, and spend caps enforced before any LLM call.
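To illustrate what "enforced before any LLM call" can mean in practice, here is a rough sketch of a pre-call guard. The `Guard` class, its fields, and the use of integer cents are assumptions made for this example; Prune's actual Shield logic is not described on this page.

```python
from dataclasses import dataclass


class RequestBlocked(Exception):
    """Raised when a request must not reach the upstream model."""


@dataclass
class Guard:
    # Hypothetical limits for one key within one accounting window.
    max_requests: int
    spend_cap_cents: int
    requests_seen: int = 0
    spent_cents: int = 0

    def check(self, estimated_cost_cents: int) -> None:
        """Admit the request or raise before any upstream call is made."""
        if self.requests_seen >= self.max_requests:
            raise RequestBlocked("rate limit reached")
        if self.spent_cents + estimated_cost_cents > self.spend_cap_cents:
            raise RequestBlocked("spend cap would be exceeded")
        # Only admitted requests count against the limits.
        self.requests_seen += 1
        self.spent_cents += estimated_cost_cents


guard = Guard(max_requests=2, spend_cap_cents=100)
guard.check(40)  # admitted
guard.check(40)  # admitted
try:
    guard.check(10)
except RequestBlocked as err:
    print(err)
```

The point of the ordering is that a blocked request costs nothing upstream, because the check runs before the model is ever contacted.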

Savings estimate

New

POST a sample prompt — get per-mechanism breakdown with zero LLM cost.

Multi-tier cache

Savings

Exact, semantic, and template-aware reuse — $0 upstream on hits.
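As a sketch of the exact tier only, the snippet below keys a cache on a hash of the canonicalized request, so a repeated request never reaches the upstream model. The `ExactCache` class and its method names are hypothetical; the semantic and template-aware tiers are not shown, and this is not Prune's implementation.

```python
import hashlib
import json
from typing import Any, Callable


def exact_cache_key(model: str, messages: list, params: dict) -> str:
    """Hash a canonical JSON form so key order does not change the key."""
    canonical = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ExactCache:
    """In-memory exact-match cache (illustrative only)."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def get_or_call(self, model: str, messages: list, params: dict,
                    call: Callable[[], Any]) -> tuple[Any, bool]:
        """Return (response, cache_hit); invoke `call` only on a miss."""
        key = exact_cache_key(model, messages, params)
        if key in self._store:
            return self._store[key], True
        response = call()
        self._store[key] = response
        return response, False
```

On a hit, `call` is never invoked, which is where the "$0 upstream" comes from.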

Template optimization

Live

Stable system prompts optimized once per template — savings on call #1.

Schema compression

New

Models emit terse JSON keys; Prune expands to your full schema.
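The expansion step could look roughly like the following. The key map and the `expand_keys` helper are invented for illustration; how Prune actually derives or stores the mapping is not stated here.

```python
from typing import Any

# Hypothetical mapping from terse keys the model emits to full schema keys.
KEY_MAP = {"n": "name", "p": "price", "t": "tags", "d": "details"}


def expand_keys(value: Any, key_map: dict[str, str]) -> Any:
    """Recursively rename terse keys; unknown keys pass through unchanged."""
    if isinstance(value, dict):
        return {key_map.get(k, k): expand_keys(v, key_map) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_keys(item, key_map) for item in value]
    return value


terse = {"n": "Widget", "d": {"p": 9.5}, "t": ["a", "b"]}
print(expand_keys(terse, KEY_MAP))
```

The savings come from the model writing `n` instead of `name` on every object; the app still receives the full key names.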

Variable compress

Beta

Opt-in trimming of unique user goal text — separate from template opt.

Output caps

Live

Auto-tighten max_tokens from historical output lengths.
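One plausible way to derive a cap from history is a high percentile of past output lengths plus some headroom. The nearest-rank percentile, the 10% headroom, and the floor of 64 tokens below are assumptions chosen for the sketch, not documented Prune defaults.

```python
def suggest_max_tokens(history: list[int], percentile: int = 95,
                       headroom_pct: int = 10, floor: int = 64) -> int:
    """Suggest a max_tokens value from past completion lengths.

    Uses integer arithmetic throughout to avoid floating-point rounding.
    """
    if not history:
        raise ValueError("need at least one historical output length")
    ordered = sorted(history)
    # Nearest-rank percentile: ceil(percentile / 100 * n), at least 1.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    p_value = ordered[rank - 1]
    # Add headroom, rounding up, so typical outputs are not truncated.
    cap = (p_value * (100 + headroom_pct) + 99) // 100
    return max(floor, cap)


lengths = list(range(100, 500, 20))  # 20 past completions: 100, 120, ..., 480
print(suggest_max_tokens(lengths))   # 506
```

A cap set this way still truncates the rare output above the percentile, so it suits workloads where output length is stable.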

JSON repair

Live

Fix malformed structured output without a full model retry.
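For a sense of what repair without a retry involves, this sketch handles two common failures: trailing commas and output truncated mid-string or mid-object. It is a simplified illustration under those assumptions, not Prune's repair logic, and it will not fix every malformed payload (for example, output cut off right after a key's colon).

```python
import json


def repair_json(text: str) -> str:
    """Drop trailing commas and close unterminated strings and brackets."""
    out: list[str] = []
    closers: list[str] = []  # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]":
            # Remove whitespace and a trailing comma before the closer.
            while out and out[-1].isspace():
                out.pop()
            if out and out[-1] == ",":
                out.pop()
            if closers:
                closers.pop()
        out.append(ch)
    if in_string:
        out.append('"')  # terminate a string cut off by truncation
    while out and (out[-1].isspace() or out[-1] == ","):
        out.pop()
    out.extend(reversed(closers))  # close whatever is still open
    return "".join(out)


broken = '{"a": [1, 2,], "b": {"c": "trunc'
print(json.loads(repair_json(broken)))
```

A truncated string is closed as-is, so the repaired value may be incomplete even though the document parses.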

Spend signals

Live

Off-hours spikes, model drift, and arbitrage opportunities surfaced.

  • Shield on every request — encrypted vault, rate limits, spend caps
  • Pre-request estimate API — model savings before your first live call
  • Schema compression — terse JSON in, full schema out to your app
  • Template + variable compression — stable instructions and unique goal text
  • Output caps from real p95 lengths — Auto mode tightens max_tokens
  • One Prune key — OpenAI, Claude, Gemini, Bedrock

Read the docs · code samples, headers, and receipt fields