Usage-based pricing

Pricing based on actual usage.

There is no required monthly platform subscription. LLMLab charges only for the tokens and system activity you actually use.

Costs follow workflow execution, model usage, codebase parsing, retrieval, storage, hosted surfaces, and optional GPU activity. LLMLab's own support-agent setup used less than $0.10 in platform usage.

Secure payments: LLMLab uses Stripe to manage secure payments, subscriptions, invoices, and usage credits.
Adaptive model spend: Use lower-cost paths by default and escalate only when a model requests it or validation fails.
Answer memory: Saved answers can resolve known questions before a workflow spends tokens on a fresh response.
Budget controls: Monthly caps and usage visibility help keep workflow spend predictable.
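
As a rough illustration of how answer memory and adaptive model spend could fit together, the sketch below checks a saved-answer store first, then tries models from cheapest to most expensive until a response passes validation. It is not LLMLab's implementation: the function `answer`, the `MODEL_TIERS` ordering, and the `call_model` and `is_valid` callbacks are all hypothetical names chosen for this example. Only the model identifiers come from the rate card.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical escalation order, cheapest first. The identifiers appear in the
# rate card; the ordering and choice of tiers are assumptions for this sketch.
MODEL_TIERS = [
    "google:gemini-2.5-flash-lite",
    "openai:gpt-5.4-mini",
    "openai:gpt-5.4",
]


def answer(
    question: str,
    memory: Dict[str, str],
    call_model: Callable[[str, str], str],
    is_valid: Callable[[str], bool],
) -> Tuple[Optional[str], str]:
    """Return (answer, source), spending tokens only when memory misses."""
    key = question.strip().lower()  # naive normalization, for illustration only

    # Answer memory: a known question is resolved without any model call.
    if key in memory:
        return memory[key], "answer-memory"

    # Adaptive spend: start on the cheapest path, escalate on failed validation.
    for model in MODEL_TIERS:
        candidate = call_model(model, question)
        if is_valid(candidate):
            memory[key] = candidate  # save so the next ask costs no tokens
            return candidate, model

    return None, "unresolved"
```

In this sketch a repeated question is served from `memory` and never reaches `call_model`, which is the behavior the answer-memory item describes.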
Current rate card* 2026-05-18
USD pricing Usage-based No monthly minimum


Service | Price | Billing unit | Notes

Platform activity
Workflow run | $0.003 | per run | Structured workflow execution overhead
Preset generation | $0.001 | per preset | Template and preset generation overhead
Codebase parsing | $0.01 | per 100K lines parsed | Source parsing for codebase ingestion

Hosted API tokens
openai:gpt-5.4 | $4.225 input / $25.35 output | per 1M tokens | OpenAI flagship hosted path†
openai:gpt-5.4-mini | $1.2675 input / $7.605 output | per 1M tokens | Lower-cost OpenAI hosted path†
anthropic:claude-sonnet-4-0 | $5.07 input / $25.35 output | per 1M tokens | Anthropic Sonnet hosted path†
anthropic:claude-opus-4-1 | $25.35 input / $126.75 output | per 1M tokens | Anthropic highest-cost hosted path†
google:gemini-2.5-pro | $4.225 input / $25.35 output | per 1M tokens | Google hosted pro path†
google:gemini-2.5-flash | $0.507 input / $4.225 output | per 1M tokens | Faster lower-cost Google hosted path†
google:gemini-2.5-flash-lite | $0.169 input / $0.676 output | per 1M tokens | Lowest-cost Google flash hosted path†
deepseek:deepseek-chat | $0.4563 input / $1.859 output | per 1M tokens | DeepSeek chat hosted path†
deepseek:deepseek-reasoner | $0.9295 input / $3.7011 output | per 1M tokens | DeepSeek reasoning hosted path†
xai:grok-4 | $5.07 input / $25.35 output | per 1M tokens | xAI Grok hosted path†
mistral:mistral-medium-2508 | $0.676 input / $3.38 output | per 1M tokens | Mistral medium hosted path†
mistral:mistral-small-2603 | $0.2535 input / $1.014 output | per 1M tokens | Mistral small hosted path†

Embeddings and reranking†
sentence-transformers/all-minilm-l6-v2 | $1.00 | per 1M tokens | Hosted dense embedding model
intfloat/multilingual-e5-small | $1.00 | per 1M tokens | Hosted multilingual embedding model
qdrant/bm25 | $0.40 | per 1M tokens | Hosted sparse lexical embedding
voyage:rerank-2.5 / voyage:voyage-rerank-2.5 | $0.10 | per 1K documents | Hosted reranker usage

GPU compute‡
Budget GPU tier | $0.60 | per GPU-hour | RTX 3070 Ti / RTX 3080(Ti) / T4
Mid GPU tier | $1.20 | per GPU-hour | RTX 3090(Ti) / L4 / A10
High GPU tier | $4.00 | per GPU-hour | RTX 4090 / A100 40GB / A40 / L40
Ultra GPU tier | $10.00 | per GPU-hour | A100 80GB / H100 / H200 class

Storage
Vector storage | $0.60 | per GiB-month | Persistent vector index footprint
Hosted model storage | $0.60 | per GiB-month | Persistent hosted model storage

* Amounts are rounded up to the next whole $0.01 increment across charge groups.

† User-provided tokens are not billed by LLMLab. Provider rates are charged separately by the provider.

‡ One of these GPU options will be selected depending on availability and current cloud pricing.
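
To make the rate card concrete, here is a small estimate of what a batch of workflow runs might cost. The rates are copied from the table above. The function name `estimate` and the example workload are hypothetical, and the sketch assumes the rounding footnote means the combined total is rounded up once to the next $0.01. The actual invoice logic may group charges differently.

```python
from decimal import Decimal, ROUND_CEILING

# (input, output) USD per 1M tokens, copied from the rate card.
TOKEN_RATES = {
    "openai:gpt-5.4-mini": (Decimal("1.2675"), Decimal("7.605")),
    "google:gemini-2.5-flash-lite": (Decimal("0.169"), Decimal("0.676")),
}
WORKFLOW_RUN = Decimal("0.003")  # USD per run, from the rate card
MILLION = Decimal(1_000_000)


def estimate(model: str, runs: int, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate platform cost in USD for hosted-token workflow runs."""
    in_rate, out_rate = TOKEN_RATES[model]
    raw = (
        runs * WORKFLOW_RUN
        + Decimal(input_tokens) / MILLION * in_rate
        + Decimal(output_tokens) / MILLION * out_rate
    )
    # Assumption: round the total up to the next whole cent, once.
    return raw.quantize(Decimal("0.01"), rounding=ROUND_CEILING)


# 1,000 runs using 2M input and 500K output tokens on gpt-5.4-mini:
# 3.000 + 2.535 + 3.8025 = 9.3375, rounded up to 9.34
print(estimate("openai:gpt-5.4-mini", 1000, 2_000_000, 500_000))
```

Using `Decimal` rather than floats keeps sub-cent rates such as $0.003 exact, so the round-up step never tips over a cent boundary because of floating-point noise.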

Cost controls

Usage-based does not mean uncontrolled.

A public support integration needs pricing controls, abuse protection, and visibility. LLMLab is designed to keep spend tied to real support value.

Traffic signal Abuse monitoring

Monitor suspicious traffic patterns and spam-like usage against a public integration or organization surface.

Traffic monitoring Abuse detection
Protective action Automated lockout

LLMLab supports protective actions when traffic appears to abuse an exposed integration, assistant, or organization surface.

IP lockouts Org protection
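
One common way to implement this kind of lockout is a per-IP sliding-window counter. The sketch below is a generic illustration and not LLMLab's actual mechanism: the class `IpLockout` and its thresholds (`max_requests`, `window_s`, `lockout_s`) are hypothetical values chosen for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class IpLockout:
    """Lock out an IP that exceeds max_requests within window_s seconds."""

    def __init__(self, max_requests: int = 30, window_s: float = 60.0,
                 lockout_s: float = 600.0) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.lockout_s = lockout_s
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._locked_until: Dict[str, float] = {}

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` (seconds) and say whether to serve it."""
        if self._locked_until.get(ip, 0.0) > now:
            return False  # still inside an active lockout

        hits = self._hits[ip]
        # Drop timestamps that have aged out of the sliding window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        hits.append(now)

        if len(hits) > self.max_requests:
            # Too many requests in the window: start a lockout and reset.
            self._locked_until[ip] = now + self.lockout_s
            hits.clear()
            return False
        return True
```

A caller would pass a monotonic timestamp, for example `guard.allow(client_ip, time.monotonic())`, and reject the request before any workflow run or token spend when the result is `False`.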
What you are paying for

Usage across the actual system surface.

LLMLab pricing is designed around actual platform activity: workflow runs, model-backed responses, context ingestion, codebase parsing, retrieval, answer memory, web integration interactions, hosted model paths, and optional model infrastructure.

Tokens: Model-backed responses and validation
Workflow runs: Structured workflow execution
Parsing: GitHub and codebase context extraction
Retrieval: Knowledge search, embeddings, and reranking
Storage and GPU: Vector storage, hosted models, and optional GPU worker activity

Get started

See how far $5 can go with a usage-based plan

Use the free credit to build workflows, support agents, assistants, and operational AI systems. See how far $5 can go before you need to reach for a credit card.