Mercury

Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to provide responsive user experiences in voice agents, search interfaces, and chatbots. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury).

Model details

Context window: 128,000 tokens
Max completion size: 83 tokens
Prompt cost / 1K tokens: $0.00000025
Completion cost / 1K tokens: $0.000001
Accepts
Produces

Benchmark performance

| Category | Score | Placement |
| --- | --- | --- |
| Overall | 73 | 12th |
| Cost | 99 | 2nd |
| Logic | 66 | 13th |
| Speed | 100 | 1st |
| Scoring | 53 | 7th |
| Tool Use | 26 | 8th |
| Hallucination | 29 | 22nd |
| Classification | 50 | 1st |
| Structured Output | 75 | 4th |

Pricing

Usage pricing
Prompt: $0.00000025
Completion: $0.000001
Request: FREE
Image: FREE
Web Search: FREE
Internal Reasoning: FREE
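
The listed rates can be turned into a per-request cost estimate. The following is a minimal sketch, assuming the prompt and completion figures are USD per 1K tokens, as labeled under Model details (the usage pricing list itself states no unit). The function `estimate_cost` and the constant names are hypothetical and not part of any Mercury API.

```python
from decimal import Decimal

# Rates copied from the listing. ASSUMPTION: USD per 1K tokens,
# following the "cost / 1K tokens" labels under Model details.
PROMPT_RATE_PER_1K = Decimal("0.00000025")
COMPLETION_RATE_PER_1K = Decimal("0.000001")


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Request, image, web search, and internal reasoning charges are
    listed as FREE, so only token usage contributes to the total.
    """
    prompt_cost = PROMPT_RATE_PER_1K * prompt_tokens / 1000
    completion_cost = COMPLETION_RATE_PER_1K * completion_tokens / 1000
    return prompt_cost + completion_cost


if __name__ == "__main__":
    # Example: a 2,000-token prompt with a 50-token completion.
    print(estimate_cost(2000, 50))
```

`Decimal` is used instead of `float` so that very small per-token amounts are not distorted by binary rounding. If the figures turn out to be per token rather than per 1K tokens, drop the division by 1000.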

Best Overall scoring LLMs

| Provider | Model | Score | Placement |
| --- | --- | --- | --- |
| xAI | Grok 4 Fast | 88 | 1st |
| Qwen | Qwen3 VL 235B A22B Instruct | 86 | 2nd |
| xAI | Grok 4.1 Fast | 84 | 3rd |
| OpenAI | GPT-5.1 Chat | 82 | 4th |
| OpenAI | GPT-5.1-Codex | 82 | 4th |
| Anthropic | Claude Haiku 4.5 | 80 | 5th |
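
The placements in this leaderboard appear to follow dense ranking: the two models tied at 82 both take 4th place, and the next score, 80, takes 5th rather than 6th. The sketch below reproduces that pattern from the listed scores. It is an illustration inferred from the numbers shown here, not a description of how the site computes placements, and `dense_rank` is a hypothetical helper.

```python
def dense_rank(scores):
    """Return the 1-based dense rank of each score, highest first.

    Tied scores share a rank, and the next distinct score gets the
    next consecutive rank (no gaps after ties).
    """
    distinct = sorted(set(scores), reverse=True)
    rank_of = {score: position + 1 for position, score in enumerate(distinct)}
    return [rank_of[score] for score in scores]


if __name__ == "__main__":
    # Overall scores in the order listed above.
    overall_scores = [88, 86, 84, 82, 82, 80]
    print(dense_rank(overall_scores))
```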