Rankings

Real-world LLM rankings, built from production traffic

Most leaderboards are benchmark runs. NemoRouter's rankings are aggregated from real customer traffic — what actually gets shipped, what costs what, what fails when. Live data, hourly refresh, anonymized at the source.


Sample preview

#1 throughput: gemini-2.5-flash

#1 latency p50: 142 ms

#1 cost / 1M tok: $0.075

#1 reliability (24h): 99.99%

Models tracked: 20+

Anonymized · Real traffic · Hourly

Live models: 20+
Providers: 76
Ranking categories: 6
Update cadence: Hourly

Sample data only. Live rankings ship once the network has aggregated enough traffic to be statistically meaningful. Coverage currently spans 20+ models on Google Vertex AI (Gemini, Imagen, Veo), with Anthropic, OpenAI, and AWS Bedrock shipping next.

What ships at launch

Six rankings, all driven by real traffic

Rankings only matter when they reflect production reality. Each ranking below ships with the same methodology page — open dataset definitions, refresh cadence, and statistical caveats.

Token volume leaderboard

Top 20 models ranked by 30-day token throughput across the NemoRouter network. The volume tells you what teams are actually shipping with — not what they pinned for a benchmark run.

Rolling 30-day window, anonymized per-org
Filter by capability tag (vision, code, long-context)
Sparkline shows 7-day momentum
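
As a rough illustration of the rolling-window ranking described above, here is a minimal Python sketch. The event shape `(timestamp, model, tokens)` and the function name `top_models_by_tokens` are hypothetical, chosen for the example; they are not NemoRouter's actual schema or API.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def top_models_by_tokens(events, now, window_days=30, limit=20):
    """Rank models by total tokens inside a rolling window.

    `events` is an iterable of (timestamp, model, tokens) tuples.
    This record shape is an assumption made for illustration only.
    """
    cutoff = now - timedelta(days=window_days)
    totals = Counter()
    for ts, model, tokens in events:
        # Keep only events inside the (cutoff, now] window.
        if cutoff < ts <= now:
            totals[model] += tokens
    # most_common sorts by token total, highest first.
    return totals.most_common(limit)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
events = [
    (now - timedelta(days=1), "model-a", 1200),
    (now - timedelta(days=10), "model-b", 5000),
    (now - timedelta(days=45), "model-a", 9000),  # outside the window, ignored
    (now - timedelta(days=2), "model-a", 800),
]
leaderboard = top_models_by_tokens(events, now)
print(leaderboard)
```

The 45-day-old event drops out of the 30-day window, so `model-b` ranks first even though `model-a` has more tokens overall.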

Performance matrix

Sortable matrix combining speed (TTFT, tokens/sec), quality (independent eval scores), cost efficiency ($/1M tokens), and reliability (success rate, error class).

Linked to /vs/* for side-by-side context
Pin a model to compare against the rest
Quality scores sourced from public eval suites + community votes

Provider market share

Treemap of provider token share, refreshed in real time. Watch which provider gets called most — and where customer traffic shifts after a price drop or a new model release.

Live treemap with weekly delta indicators
Drill into per-provider model breakdown
Historical mode shows the last 6 months of share movement

Cost efficiency

Dollar-per-million-tokens compared across the catalog, normalized for context length and tool-use overhead. The ranking that should drive routing-strategy decisions.

Normalized for input/output token mix
Includes provider list price + observed effective price
Updated when upstream provider pricing data changes
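
One plausible way to normalize for input/output token mix is a blended price weighted by the share of input tokens, compared against an effective price derived from observed spend. The sketch below assumes that approach. The function names and the prices in the example are illustrative, not published NemoRouter figures or its actual formula.

```python
def blended_price_per_million(input_price, output_price, input_share):
    """Blend list prices ($/1M tokens) by the fraction of tokens that are input.

    `input_share` is the input-token fraction of total tokens, in [0, 1].
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


def effective_price_per_million(total_cost_usd, total_tokens):
    """Observed price: what was actually paid per 1M tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost_usd / total_tokens * 1_000_000


# Illustrative numbers only: $0.10 input, $0.40 output, 75% input tokens.
list_price = blended_price_per_million(0.10, 0.40, input_share=0.75)
observed = effective_price_per_million(total_cost_usd=3.50, total_tokens=20_000_000)
print(f"list: ${list_price:.3f}/1M, observed: ${observed:.3f}/1M")
```

Showing both numbers side by side makes it visible when the price actually paid drifts from the list price.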

Trending models

7-day sparklines showing which models are gaining or losing share — the early-signal page for which provider to evaluate next.

7-day delta in token share + active orgs
Filter by tier (frontier, mid-tier, small/fast)
Subscribe to a weekly trending email
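
The 7-day delta in token share can be sketched as the difference between each model's share in two consecutive windows. The helpers below are a hypothetical illustration using plain dictionaries of token totals, not the production pipeline.

```python
def token_share(tokens_by_model):
    """Convert raw token totals into each model's fraction of the total."""
    total = sum(tokens_by_model.values())
    if total == 0:
        return {model: 0.0 for model in tokens_by_model}
    return {model: tokens / total for model, tokens in tokens_by_model.items()}


def share_delta(previous, current):
    """Return (model, change in share) pairs, biggest gainers first.

    A model missing from one window is treated as having zero share there.
    """
    prev_share = token_share(previous)
    cur_share = token_share(current)
    models = set(prev_share) | set(cur_share)
    deltas = {
        model: cur_share.get(model, 0.0) - prev_share.get(model, 0.0)
        for model in models
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


previous_week = {"model-a": 600, "model-b": 400}
current_week = {"model-a": 500, "model-b": 300, "model-c": 200}
trending = share_delta(previous_week, current_week)
for model, delta in trending:
    print(f"{model}: {delta:+.1%}")
```

Because shares always sum to one, the deltas sum to zero: a newcomer's gain (here `model-c`) is exactly the share the incumbents lost.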

Latency benchmarks

P50/P95/P99 measurements per model and provider, refreshed hourly. These are observed numbers from production traffic — not vendor-reported claims.

Streaming TTFT + non-streaming end-to-end
Filter by region (US, EU, APAC)
Outlier flagging when P99 spikes
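
For readers who want to reproduce P50/P95/P99 from their own request logs, here is a minimal sketch using Python's standard `statistics.quantiles`. The spike rule (current P99 above a multiple of a baseline P99) is an assumed heuristic for illustration. The page does not specify how outliers are actually flagged.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return P50/P95/P99 for a list of latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index i-1 holds the i-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def p99_spiked(current_p99, baseline_p99, factor=2.0):
    """Hypothetical rule: flag when P99 exceeds `factor` times the baseline."""
    return current_p99 > factor * baseline_p99


samples = list(range(1, 102))  # 1 ms .. 101 ms, evenly spaced
stats = latency_percentiles(samples)
print(stats)
print(p99_spiked(current_p99=450.0, baseline_p99=stats["p99"]))
```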

Methodology

Where the numbers come from

A ranking is only as trustworthy as its data source. Here is exactly what we use.

Anonymized network traffic

All rankings are aggregated from real production calls flowing through NemoRouter. We never log prompt content for ranking purposes — only aggregate counts, latency, cost, and provider/model identity.

Live cost headers

Cost data comes from the x-nemo-response-cost header on every call — provider-authoritative, never estimated. The same number that hits your credit ledger drives the ranking.
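
Client code could read the same header to track spend locally. The sketch below assumes the header value is a plain decimal string in USD. The source names the header but does not document its value format, so treat the parsing as an assumption. The helper names `response_cost` and `total_cost` are hypothetical.

```python
from decimal import Decimal, InvalidOperation

COST_HEADER = "x-nemo-response-cost"


def response_cost(headers):
    """Return the call's cost as a Decimal, or None if missing or unparseable.

    Assumes the header carries a decimal number (an assumption; the value
    format is not documented in the source).
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(COST_HEADER)
    if raw is None:
        return None
    try:
        return Decimal(raw.strip())
    except InvalidOperation:
        return None


def total_cost(header_sets):
    """Sum the cost over many responses, skipping ones without a usable header."""
    costs = (response_cost(headers) for headers in header_sets)
    return sum((c for c in costs if c is not None), Decimal("0"))


responses = [
    {"X-Nemo-Response-Cost": "0.10"},
    {"x-nemo-response-cost": "0.25"},
    {"content-type": "application/json"},  # no cost header
]
print(total_cost(responses))
```

`Decimal` is used instead of `float` so that summing many small per-call costs does not accumulate rounding error.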

Public eval suites

Quality scores supplement throughput rankings using public eval results (HumanEval, MMLU, GSM8K, MT-Bench). We disclose every source on the methodology page.

Providers tracked

The shortlist that drives the rankings

OpenAI, Anthropic, Google, Gemini, Mistral, Meta, Cohere, Azure OpenAI, AWS Bedrock, Vertex AI, Groq, DeepSeek, Together, Perplexity, Replicate, Fireworks, OpenRouter, NVIDIA, Ollama, Hugging Face, Cerebras, AI21, xAI, SambaNova, DeepInfra, Cloudflare, Databricks, Inflection, Qwen, IBM watsonx, Snowflake, Heroku, Vercel, GitHub Copilot, Lambda, Alibaba Cloud, Baidu, Tencent, Xiaomi, ByteDance, Meituan, Stability AI, ElevenLabs, AssemblyAI, Deepgram, Voyage, Jina, fal.ai, Runway, DuckDuckGo, Weights & Biases, Firecrawl, MiniMax, Moonshot, NLP Cloud, SageMaker, Volcengine, Anyscale, Writer, Upstage, Reka, Liquid, Morph, StepFun, Nous Research, Arcee, Allen AI, EleutherAI, DeepCogito, Essential AI, Prime Intellect, Inception, KwaiPilot, Z-AI, Relace, Aion Labs

Launch list · ~weekly drops

Get the launch email — first to see live rankings

One email when the data crosses the statistical-significance threshold. Then nothing until something interesting happens.
