Comparison
February 2, 2026 · 12 min read

I Calculated What 1M Tokens Costs Across 50+ LLM Models

A comprehensive cost comparison of 50+ LLM models from OpenAI, Anthropic, Google, Mistral, and more. Real pricing data for GPT-5, Claude 4.5, Gemini 3, and every major model.

I spent weeks compiling pricing for every major LLM model while building a cost tracking tool. Here's the complete breakdown—50+ models across OpenAI, Anthropic, Google, Mistral, and more.

Most developers guess their AI costs. They pick a model, ship to production, and hope for the best. But the price difference between models is staggering—o1-pro costs 1,500x more than GPT-5-nano for output tokens.

This guide covers every major model as of February 2026, with real pricing data and practical recommendations.

How I Got This Data
I compiled this while building Orbit, an LLM cost tracking tool. All prices are from official API documentation, updated February 2026.

The Complete Pricing Table

Prices are per 1 million tokens. Most models charge differently for input (what you send) and output (what the model generates).
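
To make the math concrete, here's a minimal TypeScript sketch of the per-call formula. The rates in the example are pulled from the tables that follow; treat it as a back-of-envelope helper, not a billing-accurate calculator:

// Cost of a single call in USD, given per-1M-token rates.
function callCost(
  inputTokens: number,
  outputTokens: number,
  inputPricePer1M: number,
  outputPricePer1M: number
): number {
  return (inputTokens / 1_000_000) * inputPricePer1M
       + (outputTokens / 1_000_000) * outputPricePer1M;
}

// Example: a 500-in / 300-out call on GPT-5 ($1.25 / $10.00 per 1M)
console.log(callCost(500, 300, 1.25, 10.0)); // ≈ $0.003625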

OpenAI Models

GPT-5 Series (Latest)

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| GPT-5.2 | $1.75 | $14.00 | Most capable, latest |
| GPT-5.2-pro | $21.00 | $168.00 | Maximum quality |
| GPT-5 | $1.25 | $10.00 | General purpose |
| GPT-5-pro | $15.00 | $120.00 | Complex reasoning |
| GPT-5-mini | $0.25 | $2.00 | Cost-effective |
| GPT-5-nano | $0.05 | $0.40 | High volume, simple tasks |

GPT-4.1 Series

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| GPT-4.1 | $2.00 | $8.00 | Balanced performance |
| GPT-4.1-mini | $0.40 | $1.60 | Budget option |
| GPT-4.1-nano | $0.10 | $0.40 | Ultra-low cost |

GPT-4o Series

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | Multimodal, production |
| GPT-4o-mini | $0.15 | $0.60 | Fast, cheap |

O-Series (Reasoning Models)

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| o4-mini | $1.10 | $4.40 | Latest reasoning, efficient |
| o3 | $2.00 | $8.00 | Advanced reasoning |
| o3-pro | $20.00 | $80.00 | Maximum reasoning power |
| o3-mini | $1.10 | $4.40 | Cost-effective reasoning |
| o1 | $15.00 | $60.00 | Complex problem-solving |
| o1-pro | $150.00 | $600.00 | Research-grade reasoning |
| o1-mini | $1.10 | $4.40 | Reasoning on a budget |

Hidden Costs with Reasoning Models
O-series models generate internal "thinking" tokens you pay for but don't see. A simple query can use 10x more tokens than expected. Always monitor actual usage.
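
You can at least measure the overhead: OpenAI's chat completions response includes a usage breakdown with a reasoning-token count for o-series models. A minimal sketch (field names follow the current OpenAI Node SDK; verify against your SDK version):

import OpenAI from 'openai';

const openai = new OpenAI();

const response = await openai.chat.completions.create({
  model: 'o3-mini',
  messages: [{ role: 'user', content: 'Plan a 3-step database migration.' }],
});

// Reasoning tokens are billed as output tokens but never returned to you.
const usage = response.usage;
const reasoning = usage?.completion_tokens_details?.reasoning_tokens ?? 0;
const visible = (usage?.completion_tokens ?? 0) - reasoning;
console.log(`visible output: ${visible} tokens, hidden reasoning: ${reasoning} tokens`);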

Anthropic Models

Claude 4.5 (Latest)

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| Claude 4.5 Opus | $5.00 | $25.00 | Most capable Claude |
| Claude 4.5 Sonnet | $3.00 | $15.00 | Balanced performance |
| Claude 4.5 Haiku | $1.00 | $5.00 | Fast, cost-effective |

Claude 4 Series

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| Claude 4 Opus | $15.00 | $75.00 | Complex tasks |
| Claude 4 Sonnet | $3.00 | $15.00 | Production workloads |

Claude 3.x Series

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| Claude 3.7 Sonnet | $3.00 | $15.00 | Coding, analysis |
| Claude 3.5 Sonnet | $3.00 | $15.00 | General purpose |
| Claude 3.5 Haiku | $1.00 | $5.00 | Fast responses |
| Claude 3 Haiku | $0.25 | $1.25 | Cheapest Claude |

Google Gemini Models

Gemini 3 (Latest)

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| Gemini 3 Pro | $2.00 | $12.00 | Latest, most capable |
| Gemini 3 Flash | $0.50 | $3.00 | Fast, efficient |

Gemini 2.5

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10.00 | Complex tasks |
| Gemini 2.5 Flash | $0.30 | $2.50 | Balanced |
| Gemini 2.5 Flash Lite | $0.10 | $0.40 | Ultra-low cost |

Gemini 2.0 (Deprecating March 2026)

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | Cheapest option (for now) |
| Gemini 2.0 Flash Lite | $0.075 | $0.30 | Absolute minimum cost |

Mistral Models

| Model | Input/1M | Output/1M | Best For |
|---|---|---|---|
| Mistral Large | $2.00 | $6.00 | Complex reasoning |
| Mistral Small | $0.20 | $0.60 | Fast, efficient |
| Codestral | $0.20 | $0.60 | Code generation |
| Ministral 8B | $0.10 | $0.10 | Edge deployment |
| Ministral 3B | $0.04 | $0.04 | Ultra-light |

Surprising Findings

1. The Price Range is Insane

Output tokens range from $0.04/1M (Ministral 3B) to $600/1M (o1-pro). That's a 15,000x difference.


2. The "Mini" Model Wars

Every provider now has a mini model competing for the budget tier:

  • GPT-5-nano: $0.05 input / $0.40 output
  • Gemini 2.5 Flash Lite: $0.10 input / $0.40 output
  • Ministral 3B: $0.04 input / $0.04 output
  • GPT-4o-mini: $0.15 input / $0.60 output

For simple classification and extraction, these models are practically free at scale.
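
To put numbers on it: classifying 1M short messages (say 150 input / 10 output tokens each) on GPT-5-nano works out to 150 × $0.05 + 10 × $0.40 ≈ $11.50 for the entire month.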

3. Reasoning Models are 10-100x More Expensive

The o-series models (o1, o3, o4-mini) cost significantly more—and that's just the sticker price. They also generate hidden "thinking" tokens that inflate your actual costs.

4. Claude 4.5 Opus is Surprisingly Cheap

At $5/1M input and $25/1M output, Claude 4.5 Opus is cheaper than Claude 4 Opus ($15/$75). Anthropic is getting more aggressive on pricing.

5. Google Offers the Best Value for High-Volume

Gemini 2.0 Flash at $0.10/$0.40 is hard to beat for volume, though it's deprecating in March 2026. Its successor, Gemini 2.5 Flash Lite, matches that $0.10/$0.40 price while maintaining quality.

Real Cost Comparisons

Let's calculate actual costs for common tasks:

Customer Support Bot (1M conversations/month)

Average: 500 input tokens, 300 output tokens per conversation

| Model | Monthly Cost |
|---|---|
| GPT-5-nano | $145 |
| GPT-4o-mini | $255 |
| Claude 3 Haiku | $500 |
| GPT-5 | $3,625 |
| Claude 4.5 Opus | $10,000 |
| o1 | $25,500 |
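
To show where these numbers come from: 1M conversations at 500/300 tokens each is 500M input and 300M output tokens per month. For GPT-5-nano, that's 500 × $0.05 + 300 × $0.40 = $25 + $120 = $145. Every other row follows the same formula with that model's rates.
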
Cost Savings Tip
For customer support, GPT-5-nano or GPT-4o-mini handles 90% of queries at 1/100th the cost of premium models. Route complex queries to better models.

Document Summarization (100K docs/month)

Average: 4,000 input tokens, 500 output tokens per document

| Model | Monthly Cost |
|---|---|
| Gemini 2.0 Flash | $60 |
| GPT-5-mini | $200 |
| Claude 3.5 Haiku | $650 |
| GPT-5 | $1,000 |
| Claude 4.5 Opus | $3,250 |

AI Agent (50 steps per task, 10K tasks/month)

Average: 1,000 input tokens, 500 output tokens per step

| Model | Monthly Cost |
|---|---|
| GPT-4o-mini | $225 |
| o3-mini | $1,650 |
| o3 | $3,000 |
| Claude 4.5 Sonnet | $5,250 |
| o1 | $22,500 |

My Recommendations

For Startups (Cost-Sensitive)

  1. Default to GPT-5-nano or GPT-4o-mini — Handle 80% of tasks
  2. Use Gemini 2.0/2.5 Flash for volume — Best price/performance
  3. Route complex queries to GPT-5 or Claude 4.5 Sonnet (routing sketch after this list)
  4. Avoid reasoning models unless necessary — 10x+ cost increase
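
The routing in point 3 doesn't need to be fancy. Here's a minimal sketch of tiered model selection; the complexity heuristic is a placeholder assumption (in practice, use a cheap classifier call or explicit task metadata), and the model IDs simply mirror the names in the tables above:

import OpenAI from 'openai';

const openai = new OpenAI();

// Placeholder heuristic: long or code-heavy prompts go to the stronger model.
function isComplex(prompt: string): boolean {
  return prompt.length > 2_000 || /```|stack trace|refactor|prove/i.test(prompt);
}

async function routedCompletion(prompt: string) {
  const model = isComplex(prompt) ? 'gpt-5' : 'gpt-5-nano';
  const response = await openai.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }],
  });
  return { model, text: response.choices[0].message.content };
}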

For Enterprise (Quality-Focused)

  1. Claude 4.5 Opus or GPT-5.2 for critical tasks
  2. Claude 4.5 Sonnet or GPT-5 for general production
  3. o3-mini for reasoning tasks — Good balance of capability and cost
  4. Track everything — Know your cost-per-feature

For AI Agents

  1. Start with GPT-4o-mini — Test your agent logic cheaply
  2. Use o3-mini for reasoning steps — Not o1 or o3-pro
  3. Batch and cache aggressively — Agents make many similar calls
  4. Set per-task budgets — Runaway agents can cost hundreds per task (budget sketch below)
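
For point 4, a per-task budget can be a running total that kills the loop once spend crosses a hard cap. A sketch, hard-coding GPT-4o-mini's rates ($0.15 in / $0.60 out per 1M) for illustration:

// Aborts an agent loop once accumulated spend crosses a hard cap.
class TaskBudget {
  private spentUsd = 0;
  constructor(private readonly capUsd: number) {}

  record(inputTokens: number, outputTokens: number): void {
    this.spentUsd += (inputTokens / 1e6) * 0.15 + (outputTokens / 1e6) * 0.60;
    if (this.spentUsd > this.capUsd) {
      throw new Error(`Budget exceeded: $${this.spentUsd.toFixed(4)} > $${this.capUsd}`);
    }
  }
}

// Usage inside the agent loop, after each model call:
//   budget.record(response.usage.prompt_tokens, response.usage.completion_tokens);
const budget = new TaskBudget(0.50); // hard cap: 50 cents per task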

How to Track All This

Knowing prices is one thing. Tracking actual costs in production is another. Provider dashboards show totals, but not which features drive costs.

I built Orbit to solve this. One-line SDK integration, and you get per-feature cost breakdowns across all providers:

import { Orbit } from '@with-orbit/sdk';
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';

const orbit = new Orbit({ apiKey: process.env.ORBIT_API_KEY });

// Track OpenAI costs by feature
const openai = orbit.wrapOpenAI(new OpenAI(), {
  feature: 'chat-assistant'
});

// Track Anthropic costs by feature
const anthropic = orbit.wrapAnthropic(new Anthropic(), {
  feature: 'document-analysis'
});

// All calls automatically tracked with cost, tokens, latency

Track LLM Costs Across All Providers

Orbit gives you real-time visibility into costs across OpenAI, Anthropic, Google, and more. See spending by feature, model, and environment.

  • 50+ models supported
  • Per-feature cost tracking
  • Free tier: 10,000 events/month
Start tracking free