
LLM Pricing Calculator — Compare AI API Costs (GPT, Claude, Gemini, Gemma) [2026]

Compare AI API costs across providers. Enter token counts to see pricing for OpenAI GPT-4o, Anthropic Claude, Google Gemini, Mistral, DeepSeek and more. Calculate monthly AI spend. Free cost calculator.


Cheapest: Gemma 4 (self-hosted), $0.00/mo

Most expensive: Claude Opus 4, $157.50/mo

Potential monthly savings: $156.75 vs the cheapest paid option

| Model | Provider | Input $/1M | Output $/1M | Cost/Request | Monthly Cost |
|---|---|---|---|---|---|
| Gemma 4 (self-hosted) | Self-Hosted (Open Source) | $0 | $0 | $0.00 | $0.00 |
| Llama 3 (self-hosted) | Self-Hosted (Open Source) | $0 | $0 | $0.00 | $0.00 |
| Mistral Small | Mistral | $0.10 | $0.30 | $0.0003 | $0.75 |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | $0.0003 | $0.90 |
| GPT-4o Mini | OpenAI | $0.15 | $0.60 | $0.0004 | $1.35 |
| DeepSeek V3 | DeepSeek | $0.27 | $1.10 | $0.0008 | $2.46 |
| GPT-4.1 Mini | OpenAI | $0.40 | $1.60 | $0.0012 | $3.60 |
| DeepSeek R1 | DeepSeek | $0.55 | $2.19 | $0.0016 | $4.94 |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | $0.0028 | $8.40 |
| o4-mini | OpenAI | $1.10 | $4.40 | $0.0033 | $9.90 |
| Mistral Large | Mistral | $2.00 | $6.00 | $0.0050 | $15.00 |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | $0.0060 | $18.00 |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | $0.0063 | $18.75 |
| GPT-4o | OpenAI | $2.50 | $10.00 | $0.0075 | $22.50 |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | $0.010 | $31.50 |
| o3 | OpenAI | $10.00 | $40.00 | $0.030 | $90.00 |
| Claude Opus 4 | Anthropic | $15.00 | $75.00 | $0.052 | $157.50 |

The table above assumes an example workload of 1,000 input tokens and 500 output tokens per request at 100 requests per day. Prices reflect published API rates as of early 2026 and may change. Self-hosted models have $0 API cost but require GPU hardware; use our AI VRAM Calculator to estimate hardware requirements.

What is the LLM Pricing Calculator?

AI API pricing varies dramatically across providers, from $0.10 per million tokens for budget models to $75+ per million for frontier models. Choosing the right model is a cost/capability tradeoff that directly impacts your monthly AI spend, and for developers building AI applications the per-token price difference between models can exceed 100x. With the example workload above (1,000 input and 500 output tokens per request), a chatbot handling 10,000 requests per day would cost about $90/month on Gemini 2.0 Flash versus roughly $15,750/month on Claude Opus 4. Picking the wrong model can make the difference between a profitable product and a money pit.

This calculator lets you compare real costs across all major providers: OpenAI (GPT-4o, GPT-4.1, o3, o4-mini), Anthropic (Claude Sonnet 4, Opus 4, Haiku 3.5), Google (Gemini 2.5 Pro, Flash), DeepSeek (V3, R1), Mistral, and self-hosted open-source models like Gemma 4. Enter your expected usage and see exactly what each option costs.

How to Use the LLM Pricing Calculator

Enter the number of input tokens and output tokens per request, then set how many requests you make per day. The calculator instantly shows the cost per request and estimated monthly cost for every major AI model. Filter by provider to focus on specific APIs. Sort by cost to find the cheapest option. The cheapest model is highlighted in green. For self-hosted models like Gemma 4, API cost is $0 — use our VRAM Calculator to estimate hardware costs instead.
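For example, at 1,000 input tokens, 500 output tokens, and 100 requests per day, GPT-4o Mini works out to (1,000 ÷ 1,000,000 × $0.15) + (500 ÷ 1,000,000 × $0.60) = $0.00045 per request, or about $1.35 per month.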

How the LLM Pricing Calculator Works

The calculator uses a straightforward pricing formula:

**Cost per Request** = (Input Tokens ÷ 1,000,000 × Input Price) + (Output Tokens ÷ 1,000,000 × Output Price)

**Monthly Cost** = Cost per Request × Requests per Day × 30

All prices are per million tokens as published by each provider. The tool compares 17+ models across 6 providers, sorted by total cost per request.

For self-hosted open-source models like Gemma 4, Llama 3, and Mistral, the API cost is $0 since you run inference on your own hardware. The real cost is GPU hardware; use our AI VRAM Calculator to estimate the hardware requirements for these models.

Note: Prices reflect published rates as of early 2026. Providers may change pricing, so always verify on the provider's website before committing to large-scale usage.
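If you prefer to script the estimate yourself, here is a minimal Python sketch of the same formula; the function names are illustrative, not part of the tool:

```python
# Minimal sketch of the pricing formula above; rates are USD per 1M tokens.

def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
    """Cost of a single request given per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)

def monthly_cost(per_request: float, requests_per_day: int) -> float:
    """Estimated monthly cost assuming 30 days of steady traffic."""
    return per_request * requests_per_day * 30

# Example matching the table: Claude Sonnet 4 ($3.00 in / $15.00 out per 1M tokens),
# 1,000 input + 500 output tokens per request, 100 requests per day.
per_req = cost_per_request(1_000, 500, 3.00, 15.00)
print(f"${per_req:.4f} per request, ${monthly_cost(per_req, 100):.2f} per month")
# -> $0.0105 per request, $31.50 per month
```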

Common Use Cases

  • Comparing API costs across OpenAI, Anthropic, Google, DeepSeek, and Mistral before committing to a provider
  • Estimating monthly AI spend for a new product or feature before launch
  • Finding the cheapest model that meets your quality requirements for production workloads
  • Budgeting AI infrastructure costs for startups and small teams
  • Evaluating whether to switch from an expensive model to a cheaper alternative
  • Calculating the cost difference between using cloud APIs vs self-hosting open-source models like Gemma 4

Frequently Asked Questions

Which AI model is the cheapest?

For cloud APIs, Google Gemini 2.0 Flash ($0.10/$0.40 per million tokens) and DeepSeek V3 ($0.27/$1.10) are among the cheapest. For zero API cost, self-hosted open-source models like Gemma 4, Llama 3, and Mistral are free to run — you only pay for GPU hardware.

Why are output tokens more expensive than input tokens?

Generating output text (completions) requires sequential computation — the model predicts one token at a time, each depending on the previous one. Processing input tokens can be parallelized across the GPU. This makes output generation 2–5x more compute-intensive, which is reflected in higher per-token pricing.

How accurate are the prices shown?

Prices reflect published API rates as of early 2026. Providers occasionally update pricing — usually downward. Always verify current rates on each provider's pricing page before committing to large-scale usage. The relative cost rankings rarely change significantly.

Should I self-host instead of using APIs?

Self-hosting makes sense if you have high, consistent volume (100K+ requests/day), need data privacy, or already own GPU hardware. For most startups and moderate workloads, cloud APIs are cheaper and simpler. Use our AI VRAM Calculator to estimate hardware costs for self-hosting.
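As a rough way to sanity-check that threshold, here is a minimal break-even sketch; the GPU price, power draw, and electricity rate are illustrative assumptions, not benchmarks, and engineering time is ignored:

```python
# Break-even sketch: cloud API spend vs. self-hosting on your own GPU.
# All hardware and electricity figures are assumed example values; this
# ignores engineering and maintenance time.

def monthly_api_cost(cost_per_request: float, requests_per_day: int) -> float:
    return cost_per_request * requests_per_day * 30

def monthly_selfhost_cost(gpu_price: float, amortization_months: int,
                          power_watts: float, price_per_kwh: float) -> float:
    hardware = gpu_price / amortization_months              # amortized hardware cost
    electricity = power_watts / 1000 * 24 * 30 * price_per_kwh
    return hardware + electricity

# Example: a GPT-4o Mini-class workload ($0.00045/request from the table)
# at 100,000 requests/day, vs. an assumed $8,000 GPU server amortized over 3 years.
api = monthly_api_cost(0.00045, 100_000)                    # ~$1,350/month
selfhost = monthly_selfhost_cost(8_000, 36, 700, 0.15)      # ~$298/month
print(f"API: ${api:,.0f}/mo  Self-hosted: ${selfhost:,.0f}/mo")
```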

What is the difference between per-token and per-request pricing?

Most AI APIs charge per token (per million tokens). A few services offer per-request pricing (flat rate regardless of length). Per-token is more common and more predictable for variable-length inputs. This calculator uses per-token pricing as that is the industry standard.

How can I reduce my AI API costs?

Route simple tasks to cheaper models (use GPT-4o Mini or Gemini Flash instead of Opus). Cache repeated prompts. Use batch APIs for non-real-time tasks (typically 50% cheaper). Reduce prompt length by removing unnecessary context. Set max_tokens limits to avoid runaway output costs. Consider fine-tuning a smaller model for specific tasks.
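As a sketch of the first and last ideas, here is what model routing plus an output cap might look like with the OpenAI Python SDK; the length-based routing rule, model names, and 300-token limit are illustrative choices, not recommendations:

```python
# Sketch: send simple prompts to a cheaper model and cap output length.
# The routing heuristic and the 300-token cap are example values.
from openai import OpenAI

client = OpenAI()

def answer(prompt: str) -> str:
    # Crude heuristic: long prompts go to the larger model, short ones to the cheap one.
    model = "gpt-4o" if len(prompt) > 800 else "gpt-4o-mini"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,  # limit output tokens to avoid runaway completion costs
    )
    return response.choices[0].message.content
```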
