Rankings

Real-world LLM rankings, built from production traffic

Most leaderboards are built from benchmark runs. NemoRouter's rankings are aggregated from real customer traffic: what actually gets shipped, what costs what, and what fails when. Live data, hourly refresh, anonymized at the source.


Sample preview

#1 throughput: gemini-2.5-flash
#1 latency (p50): 142 ms
#1 cost / 1M tok: $0.075
#1 reliability (24h): 99.99%
Models tracked: 20+
Anonymized · Real traffic · Hourly
Live models
20+

Across the NemoRouter network

Providers
76

Tracked for ranking eligibility

Ranking categories
6

Volume, latency, cost, quality, trend, reliability

Update cadence
Hourly

P50/P95/P99 windows refresh on the hour

Sample data only. Live rankings ship once the network has aggregated enough traffic to be statistically meaningful. Coverage today is 20+ models on Google Vertex AI (Gemini, Imagen, Veo), with Anthropic, OpenAI, and AWS Bedrock shipping next.
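What "statistically meaningful" can look like for a success-rate metric such as reliability: a minimal sketch, assuming a 95% Wilson score interval and a hypothetical maximum interval width of 0.001. The page does not say which test or threshold NemoRouter actually uses; `is_publishable` and `max_width` are illustrative names.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a success rate; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return center - half, center + half


def is_publishable(successes: int, total: int, max_width: float = 0.001) -> bool:
    """Hypothetical gate: publish a rate only when its interval is narrower than max_width."""
    low, high = wilson_interval(successes, total)
    return (high - low) <= max_width
```

With these assumptions, 99 successes out of 100 calls gives an interval roughly 0.05 wide, far too loose to back a 99.99% claim, while 9,999 out of 10,000 narrows it to about 0.0005 and passes the gate.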

What ships at launch

Six rankings, all driven by real traffic

Rankings only matter when they reflect production reality. Every ranking below is documented on the same methodology page, with open dataset definitions, refresh cadence, and statistical caveats.

Token volume leaderboard

Top 20 models ranked by 30-day token volume across the NemoRouter network. Volume tells you what teams are actually shipping with, not what they pinned for a benchmark run.

  • Rolling 30-day window, anonymized per-org
  • Filter by capability tag (vision, code, long-context)
  • Sparkline shows 7-day momentum
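The rolling window and the momentum sparkline reduce to date-filtered sums. A minimal sketch, assuming per-day `(day, model, tokens)` rows that are already anonymized upstream; the row shape and the function names are hypothetical, not NemoRouter's schema.

```python
from collections import defaultdict
from datetime import date, timedelta


def top_models(rows, as_of: date, window_days: int = 30, n: int = 20):
    """Rank models by total tokens over the window ending on as_of (inclusive)."""
    start = as_of - timedelta(days=window_days - 1)
    totals = defaultdict(int)
    for day, model, tokens in rows:
        if start <= day <= as_of:
            totals[model] += tokens
    # Highest volume first; ties broken by model name so the order is stable.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def sparkline(rows, model: str, as_of: date, days: int = 7):
    """Daily token totals for one model over the last `days` days, oldest first."""
    start = as_of - timedelta(days=days - 1)
    series = [0] * days
    for day, m, tokens in rows:
        if m == model and start <= day <= as_of:
            series[(day - start).days] += tokens
    return series
```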

Performance matrix

Sortable matrix combining speed (TTFT, tokens/sec), quality (independent eval scores), cost efficiency ($/1M tokens), and reliability (success rate, error class).

  • Linked to /vs/* for side-by-side context
  • Pin a model to compare against the rest
  • Quality scores sourced from public eval suites + community votes
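Sorting and pinning can be sketched as a single sort followed by moving the pinned row to the top. This assumes each row carries the metrics named above as plain numbers; the `Row` fields and the `LOWER_IS_BETTER` set are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    model: str
    ttft_ms: float
    tokens_per_sec: float
    quality: float
    usd_per_1m: float
    success_rate: float


# Columns where smaller is better; every other column sorts descending.
LOWER_IS_BETTER = {"ttft_ms", "usd_per_1m"}


def sort_matrix(rows: List[Row], column: str, pinned: Optional[str] = None) -> List[Row]:
    """Sort by one column, best first, then move the pinned model (if any) to the top."""
    ordered = sorted(rows, key=lambda r: getattr(r, column), reverse=column not in LOWER_IS_BETTER)
    if pinned is not None:
        ordered = [r for r in ordered if r.model == pinned] + [r for r in ordered if r.model != pinned]
    return ordered
```

Keeping the pinned row at the top, whatever the sort column, is what lets one model be read against the rest of the catalog column by column.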

Provider market share

Treemap of provider token share, refreshed in real time. Watch which provider gets called most, and where customer traffic shifts after a price drop or a new model release.

  • Live treemap with weekly delta indicators
  • Drill into per-provider model breakdown
  • Historical mode shows the last 6 months of share movement
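Token share and the per-provider drill-down are two grouped sums over the same traffic. A minimal sketch, assuming `(provider, model, tokens)` tuples as input; the shape and names are illustrative.

```python
from collections import defaultdict


def provider_share(calls):
    """Return (share by provider, model breakdown within each provider) as token fractions."""
    by_provider = defaultdict(int)
    by_model = defaultdict(lambda: defaultdict(int))
    for provider, model, tokens in calls:
        by_provider[provider] += tokens
        by_model[provider][model] += tokens
    total = sum(by_provider.values())
    if total == 0:
        return {}, {}
    share = {p: t / total for p, t in by_provider.items()}
    # Drill-down: each model's fraction of its own provider's tokens.
    breakdown = {
        p: {m: t / by_provider[p] for m, t in models.items()}
        for p, models in by_model.items()
        if by_provider[p] > 0
    }
    return share, breakdown
```

The treemap's tile sizes would come from `share`; clicking a tile would show that provider's entry in `breakdown`.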

Cost efficiency

Dollar-per-million-tokens compared across the catalog, normalized for context length and tool-use overhead. The ranking that should drive routing-strategy decisions.

  • Normalized for input/output token mix
  • Includes provider list price + observed effective price
  • Updated when upstream provider pricing data changes
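Normalizing for the input/output mix amounts to a token-weighted average of the two list prices: blended = (input_tokens × input_price + output_tokens × output_price) / (input_tokens + output_tokens). The observed effective price instead divides what was actually billed by the tokens actually used. A sketch with hypothetical prices. It covers the input/output mix only, not the context-length or tool-use adjustments, whose method the page does not describe.

```python
def blended_usd_per_1m(input_tokens: int, output_tokens: int,
                       input_usd_per_1m: float, output_usd_per_1m: float) -> float:
    """List-price cost per 1M tokens, weighted by the observed input/output mix."""
    total = input_tokens + output_tokens
    if total == 0:
        raise ValueError("no tokens")
    return (input_tokens * input_usd_per_1m + output_tokens * output_usd_per_1m) / total


def effective_usd_per_1m(billed_usd: float, total_tokens: int) -> float:
    """Observed cost per 1M tokens, computed from the amount actually billed."""
    if total_tokens == 0:
        raise ValueError("no tokens")
    return billed_usd / total_tokens * 1_000_000


# Hypothetical prices: $0.10 per 1M input tokens, $0.40 per 1M output, 3:1 input:output mix.
blended = blended_usd_per_1m(3_000_000, 1_000_000, 0.10, 0.40)
```

With a 3:1 input-heavy mix, the blended rate (0.175) sits much closer to the input price than a naive average of the two list prices (0.25) would.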

Trending models

7-day sparklines showing which models are gaining or losing share. This is the early-signal page for deciding which provider to evaluate next.

  • 7-day delta in token share + active orgs
  • Filter by tier (frontier, mid-tier, small/fast)
  • Subscribe to a weekly trending email
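The 7-day delta compares two consecutive weekly snapshots. A minimal sketch, assuming each snapshot maps a model name to `(tokens, active_orgs)`; the snapshot shape and function name are hypothetical.

```python
def trending(prev_week: dict, this_week: dict):
    """Return (model, share_delta, active_org_delta) rows, biggest share gainers first."""
    prev_total = sum(tokens for tokens, _ in prev_week.values())
    cur_total = sum(tokens for tokens, _ in this_week.values())
    rows = []
    for model in set(prev_week) | set(this_week):
        prev_tokens, prev_orgs = prev_week.get(model, (0, 0))
        cur_tokens, cur_orgs = this_week.get(model, (0, 0))
        prev_share = prev_tokens / prev_total if prev_total else 0.0
        cur_share = cur_tokens / cur_total if cur_total else 0.0
        rows.append((model, cur_share - prev_share, cur_orgs - prev_orgs))
    # Gainers first; ties broken by model name for a stable order.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows
```

Tracking share rather than raw tokens keeps overall network growth from making every model look like it is trending up.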

Latency benchmarks

P50/P95/P99 measurements per model and provider, refreshed hourly. These are observed numbers from production traffic, not vendor-reported claims.

  • Streaming TTFT + non-streaming end-to-end
  • Filter by region (US, EU, APAC)
  • Outlier flagging when P99 spikes
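Percentile windows can be sketched with the nearest-rank method over one window of latency samples. The spike rule below (current P99 above a hypothetical multiple of a baseline P99) is an assumption; the page does not define how outliers are flagged.

```python
import math


def percentile(samples, q: int):
    """Nearest-rank percentile for q in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(samples_ms):
    """P50/P95/P99 for one refresh window of latency samples (milliseconds)."""
    return {f"p{q}": percentile(samples_ms, q) for q in (50, 95, 99)}


def p99_spike(current_p99: float, baseline_p99: float, factor: float = 2.0) -> bool:
    """Hypothetical outlier rule: flag when P99 exceeds `factor` times its baseline."""
    return current_p99 > factor * baseline_p99
```

Nearest-rank always returns a latency that was actually observed, which suits a page that promises observed numbers over interpolated ones.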

Methodology

Where the numbers come from

The integrity of the ranking is the integrity of the data source. Here is exactly what we use.

Anonymized network traffic

All rankings are aggregated from real production calls flowing through NemoRouter. We never log prompt content for ranking purposes; rankings use only aggregate counts, latency, cost, and provider/model identity.
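One way to keep prompt content out of a ranking pipeline is to copy only the aggregate fields into a separate event type and replace the org ID with a keyed hash. This is a sketch under those assumptions: HMAC-SHA256 keying, the `RankingEvent` fields, and the `call` dict shape are all hypothetical, not a description of NemoRouter's pipeline.

```python
import hashlib
import hmac
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RankingEvent:
    org_key: str          # keyed hash of the org ID, never the raw ID
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: Decimal
    ok: bool


def to_ranking_event(call: dict, secret: bytes) -> RankingEvent:
    """Copy only aggregate fields; prompt and completion text are never read."""
    org_key = hmac.new(secret, call["org_id"].encode("utf-8"), hashlib.sha256).hexdigest()
    return RankingEvent(
        org_key=org_key,
        provider=call["provider"],
        model=call["model"],
        input_tokens=call["input_tokens"],
        output_tokens=call["output_tokens"],
        latency_ms=call["latency_ms"],
        cost_usd=call["cost_usd"],
        ok=call["ok"],
    )
```

A keyed hash (rather than a plain hash) still lets per-org counts such as "active orgs" be computed, while making it impractical to recover an org ID by hashing guesses without the secret.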

Live cost headers

Cost data comes from the x-nemo-response-cost header on every call: provider-authoritative, never estimated. The same number that hits your credit ledger drives the ranking.
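On the client side, reading that number is a header lookup. A minimal sketch, assuming the header value is a plain decimal USD amount (the page does not specify the format). It uses `Decimal` to avoid float rounding on money, and matches the header name case-insensitively, since HTTP field names are case-insensitive.

```python
from collections.abc import Mapping
from decimal import Decimal, InvalidOperation
from typing import Optional


def response_cost(headers: Mapping) -> Optional[Decimal]:
    """Return x-nemo-response-cost as a Decimal, or None if absent or not a number."""
    for name, value in headers.items():
        if name.lower() == "x-nemo-response-cost":
            try:
                return Decimal(value.strip())
            except InvalidOperation:
                return None
    return None
```

With the `requests` library, `response.headers` can be passed straight in, since it behaves like a mapping.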

Public eval suites

Quality scores supplement throughput rankings using public eval results (HumanEval, MMLU, GSM8K, MT-Bench). We disclose every source on the methodology page.

Providers tracked

The shortlist that drives the rankings

OpenAI, Anthropic, Google, Gemini, Mistral, Meta, Cohere, Azure OpenAI, AWS Bedrock, Vertex AI, Groq, DeepSeek, Together, Perplexity, Replicate, Fireworks, OpenRouter, NVIDIA, Ollama, Hugging Face, Cerebras, AI21, xAI, SambaNova, DeepInfra, Cloudflare, Databricks, Inflection, Qwen, IBM watsonx, Snowflake, Heroku, Vercel, GitHub Copilot, Lambda, Alibaba Cloud, Baidu, Tencent, Xiaomi, ByteDance, Meituan, Stability AI, ElevenLabs, AssemblyAI, Deepgram, Voyage, Jina, fal.ai, Runway, DuckDuckGo, Weights & Biases, Firecrawl, MiniMax, Moonshot, NLP Cloud, SageMaker, Volcengine, Anyscale, Writer, Upstage, Reka, Liquid, Morph, StepFun, Nous Research, Arcee, Allen AI, EleutherAI, DeepCogito, Essential AI, Prime Intellect, Inception, KwaiPilot, Z-AI, Relace, Aion Labs

Launch list · ~weekly drops

Get the launch email: be first to see live rankings

One email when the data crosses the statistical-significance threshold. Then nothing until something interesting happens.