Shield
Every request
Vault, rate limits, and spend caps enforced before any LLM call.
One import change. Security on every call. Estimate savings before you ship; receipts on every response.
Join the waitlist — we'll email your beta invite. Use the same email when you create your account.
Every response is a receipt
Saved about 340 tokens — template opt, schema compression, and output cap on a JSON workload.
Full receipt (for your logs)
AUTO MODE
{
  "choices": [{ "message": { "content": "{ ... }" } }],
  "usage": { "prompt_tokens": 684, "completion_tokens": 412 },
  "prune_metadata": {
    "cache_hit": false,
    "tokens_saved": 340,
    "template_opt_saved": 142,
    "compressed_tokens_saved": 10,
    "schema_compression_applied": true,
    "schema_compression_tokens_saved": 18,
    "suggested_max_tokens": 512,
    "optimizations_applied": [
      "template_opt",
      "schema_compress",
      "output_cap:512"
    ]
  }
}
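For logging, the receipt fields shown above can be pulled out of the response body with a few lines of code. The sketch below is illustrative only: it assumes the response has already been parsed into a dict with the same keys as the sample receipt, and the helper name `summarize_receipt` is hypothetical, not part of any SDK.

```python
# Sample receipt, copied from the response shown above.
# The message content is elided in the original and stays elided here.
SAMPLE_RECEIPT = {
    "choices": [{"message": {"content": "{ ... }"}}],
    "usage": {"prompt_tokens": 684, "completion_tokens": 412},
    "prune_metadata": {
        "cache_hit": False,
        "tokens_saved": 340,
        "template_opt_saved": 142,
        "compressed_tokens_saved": 10,
        "schema_compression_applied": True,
        "schema_compression_tokens_saved": 18,
        "suggested_max_tokens": 512,
        "optimizations_applied": [
            "template_opt",
            "schema_compress",
            "output_cap:512",
        ],
    },
}


def summarize_receipt(response: dict) -> dict:
    """Flatten the receipt into a small record suitable for a log line.

    Hypothetical helper: it only reads keys that appear in the sample
    receipt and falls back to neutral defaults when a key is missing.
    """
    usage = response.get("usage", {})
    meta = response.get("prune_metadata", {})
    return {
        # Tokens reported in the usage block (prompt + completion).
        "used_tokens": usage.get("prompt_tokens", 0)
        + usage.get("completion_tokens", 0),
        "tokens_saved": meta.get("tokens_saved", 0),
        "cache_hit": meta.get("cache_hit", False),
        "suggested_max_tokens": meta.get("suggested_max_tokens"),
        "optimizations": list(meta.get("optimizations_applied", [])),
    }


summary = summarize_receipt(SAMPLE_RECEIPT)
print(summary)
```

Using `.get` with defaults keeps the logger from failing on a response that carries no `prune_metadata` block.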
Built for production
Not cache-only — pre-request estimates, schema compression, template opt, output caps, and spend signals ship today.
Vault, rate limits, and spend caps enforced before any LLM call.
POST a sample prompt — get per-mechanism breakdown with zero LLM cost.
Exact, semantic, and template-aware reuse — $0 upstream on hits.
Stable system prompts optimized once per template — savings on call #1.
Models emit terse JSON keys; Prune expands to your full schema.
Opt-in trimming of unique user goal text — separate from template opt.
Auto-tighten max_tokens from historical output lengths.
Fix malformed structured output without a full model retry.
Off-hours spikes, model drift, and arbitrage opportunities surfaced.
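To make the "auto-tighten max_tokens from historical output lengths" idea concrete, here is one way such a cap could be derived. This is a minimal sketch under stated assumptions, not a description of how Prune computes it: the nearest-rank percentile, the 20% headroom, and the rounding step of 64 are all illustrative choices, and `suggest_max_tokens` is a hypothetical name.

```python
import math
from typing import Optional, Sequence


def suggest_max_tokens(
    history: Sequence[int],
    percentile: float = 0.95,  # assumed: cover 95% of past outputs
    headroom: float = 1.2,     # assumed: 20% safety margin on top
    step: int = 64,            # assumed: round up to a multiple of 64
    floor: int = 64,           # assumed: never suggest less than this
) -> Optional[int]:
    """Suggest an output cap from past completion lengths (in tokens).

    Returns None when there is no history to learn from.
    """
    if not history:
        return None
    ordered = sorted(history)
    # Nearest-rank percentile: the smallest value with at least
    # `percentile` of the observations at or below it.
    rank = max(1, math.ceil(percentile * len(ordered)))
    typical_upper = ordered[rank - 1]
    padded = typical_upper * headroom
    # Round up to the next multiple of `step`, respecting the floor.
    return max(floor, math.ceil(padded / step) * step)


# Illustrative history of completion lengths for one template.
past_lengths = [380, 395, 401, 412, 420, 398, 405, 388, 410, 415]
cap = suggest_max_tokens(past_lengths)
print(cap)
```

A percentile plus headroom tolerates the occasional long response better than the raw maximum would, while still trimming the cap well below a generic default.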
Read the docs · code samples, headers, and receipt fields