API Reference

One key, one base URL, the OpenAI API you already know.

This is the scannable reference: endpoints, authentication, rate limits, error codes, webhooks, and the OpenAPI spec — on one page. NemoRouter speaks the OpenAI REST API, so the request and response shapes are exactly what your SDK already sends.


The contract at a glance

Base URL: https://api.nemorouter.ai/v1
Auth scheme: Bearer, one virtual key (sk-nemo-…)
API format: OpenAI-compatible; drop-in /v1/chat/completions, /embeddings, /models
Response headers: x-nemo-* (cost, latency, guardrails) on every response
Cost header: x-nemo-request-cost
Error classes: NemoError family
OpenAPI spec: downloadable

This page is a quick-reference overview. The full documentation carries per-endpoint request/response schemas and guides.

Quickstart

Your first request, in seven languages

These snippets are generated from the canonical NemoRouter SDK examples — the same source the playground and dashboard keys page use. Set NEMOROUTER_API_KEY in your environment and run.

Install: pip install nemoroutersdk

# Cache: enabled (org default). Pass nemo_cache=False to skip.
from nemoroutersdk import NemoRouter, NemoGuardrailBlockedError

client = NemoRouter()  # reads NEMOROUTER_API_KEY from env

try:
    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        temperature=1,
        max_tokens=1024,
        top_p=1,
        messages=[
            {"role": "user", "content": "Hello! What models do you support?"},
        ],
        # nemo_cache=False,  # Uncomment to skip cache
    )
    print(response.choices[0].message.content)

    # Auto-captured metadata from the most recent response
    meta = client.last_response
    if meta:
        print(f"Cost: ${meta.cost}")
        print(f"Guardrails: {meta.guardrails_applied}")

except NemoGuardrailBlockedError as e:
    # Raised when a guardrail rejects the request
    print(f"Blocked by guardrail: {e}")

You can also point any OpenAI SDK at the NemoRouter endpoint; set NEMOROUTER_API_KEY in your environment before running.

The base URL is always https://api.nemorouter.ai/v1. It works with the official OpenAI SDKs and any OpenAI-compatible client.

Authentication

Bearer auth with a virtual key

Every LLM request authenticates with a NemoRouter virtual key (sk-nemo-…) in the Authorization header. There are no provider keys to manage — NemoRouter holds those — and no master key ever leaves a trusted service.

The Authorization header

Pass your virtual key as a bearer token. This is the only credential your application ever holds.

Authorization: Bearer sk-nemo-xxxxxxxxxxxx
  • Create keys in the dashboard — the full key is shown once at creation
  • Each key carries its own RPM/TPM caps, budget, and spend tracking
  • Revoke a key instantly — the blast radius of a leak is one key
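
As a minimal sketch of the header in practice, the following builds a raw chat-completions request with only the Python standard library, using the base URL and Bearer scheme described on this page. The helper names build_chat_request and send_chat_request are illustrative and not part of any NemoRouter SDK, and the response parsing assumes the standard OpenAI chat-completions body.

```python
import json
import urllib.request

BASE_URL = "https://api.nemorouter.ai/v1"  # base URL stated on this page


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            # The virtual key is the only credential the application holds.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text (network call)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes the header logic easy to check without touching the network.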

Why virtual keys, not provider keys

NemoRouter is a managed gateway: it owns the OpenAI, Anthropic, and Google credentials. You never see them, never rotate them, and never risk leaking them.

  • One key reaches every model — switch models by changing a string
  • Per-key budgets and rate limits are enforced at the gateway
  • Keys are hashed at rest; observability scopes by key, team, and org

Endpoints

The endpoints you will actually call

The inference surface is OpenAI-compatible — same paths, same payloads. Models and usage endpoints are NemoRouter additions for catalog discovery and spend visibility.

Inference

The OpenAI-compatible inference surface. Chat completions is the primary endpoint and runs the full feature path — guardrails, prompts, A/B tests, caching.

  • POST /v1/chat/completions
    chat.completions.create()
  • POST /v1/completions
    completions.create()
  • POST /v1/embeddings
    embeddings.create()
  • POST /v1/images/generations
    images.generate()
  • POST /v1/audio/speech
    audio.speech.create()
  • POST /v1/audio/transcriptions
    audio.transcriptions.create()

Models

Discover the live catalog. Both endpoints are cache-aware, so listing models is cheap to poll from your app.

  • GET /v1/models
    models.list()
  • GET /v1/models/{model_id}
    models.retrieve()
  • GET /api/models/pricing
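
Because the models endpoints follow the OpenAI shape, a list response can be reduced to model IDs in a few lines. This sketch assumes the standard OpenAI list body (an object with a "data" array of entries carrying "id"); the helper names and the sample payload are illustrative only and do not reflect the real catalog.

```python
import json
import urllib.request

BASE_URL = "https://api.nemorouter.ai/v1"  # base URL stated on this page


def extract_model_ids(payload: dict) -> list:
    """Return sorted model IDs from an OpenAI-style list response."""
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_model_ids(api_key: str) -> list:
    """GET /v1/models and return the model IDs (performs a network call)."""
    request = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_model_ids(json.load(response))


# Illustrative payload, not real catalog data.
sample = {
    "object": "list",
    "data": [
        {"id": "model-b", "object": "model"},
        {"id": "model-a", "object": "model"},
    ],
}
print(extract_model_ids(sample))  # ['model-a', 'model-b']
```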

Usage & credits

Read your credit balance and spend history. Cost is authoritative — the routing engine owns it and surfaces it on the x-nemo-request-cost header.

  • GET /api/credits/balance
  • GET /api/credits/history
  • GET /nemo/guardrail/logs

Files, fine-tuning, batches, and the Assistants API are not yet available through NemoRouter — use chat.completions.create() for now. The docs track the full supported list.

Response headers

Every response carries x-nemo-* metadata

Beyond the standard OpenAI body, NemoRouter attaches headers describing what happened on the gateway — cost, latency, which guardrails ran, which prompt version and A/B variant were used.

Header | Description | Present
x-nemo-request-id | Unique request ID; quote it in any support ticket. | Always
x-nemo-latency-ms | Server-side latency in milliseconds. | Always
x-nemo-request-cost | Exact request cost in USD (authoritative). | When applicable
x-nemo-guardrails-applied | Guardrails that ran on this request (CSV). | When applicable
x-nemo-prompt-version | Prompt template version injected, if any. | When applicable
x-nemo-ab-test | A/B test variant selected for this request. | When applicable
x-nemo-cache | Cache hit/miss on the models endpoint. | When applicable
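
If you call the API without the NemoRouter SDK, the headers can be collected into a small structure yourself. The sketch below is one possible approach, assuming the numeric headers carry plain decimal strings and the guardrails header is comma-separated as the table states; the NemoMeta class and parse_nemo_headers function are hypothetical names, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class NemoMeta:
    """Hypothetical container for the x-nemo-* response headers."""
    request_id: Optional[str] = None
    latency_ms: Optional[float] = None
    cost_usd: Optional[float] = None
    guardrails: List[str] = field(default_factory=list)
    prompt_version: Optional[str] = None
    ab_test: Optional[str] = None
    cache: Optional[str] = None


def _to_float(raw: Optional[str]) -> Optional[float]:
    """Convert a header value to float, or None if absent or malformed."""
    try:
        return float(raw) if raw else None
    except ValueError:
        return None


def parse_nemo_headers(headers: Mapping[str, str]) -> NemoMeta:
    """Read x-nemo-* headers case-insensitively; missing ones stay None."""
    h = {name.lower(): value for name, value in headers.items()}
    applied = h.get("x-nemo-guardrails-applied", "")
    return NemoMeta(
        request_id=h.get("x-nemo-request-id"),
        latency_ms=_to_float(h.get("x-nemo-latency-ms")),
        cost_usd=_to_float(h.get("x-nemo-request-cost")),
        guardrails=[g.strip() for g in applied.split(",") if g.strip()],
        prompt_version=h.get("x-nemo-prompt-version"),
        ab_test=h.get("x-nemo-ab-test"),
        cache=h.get("x-nemo-cache"),
    )
```

Only the request ID and latency headers are listed as always present, so every other field defaults to an empty value.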

Cost tracking is authoritative — the routing engine owns it and NemoRouter reads it back; we never re-estimate cost ourselves.

Rate limits

RPM and TPM, per tier

Rate limits and throughput are the only differentiators between plans. All features are free on every tier.

Plan tier | RPM | TPM | TPS
Tier 1: Pay As You Go | 500 | 500K | 10
Tier 2: Monthly | 500 | 500K | 50
Tier 3: Annual | 1K | 1M | 200
Enterprise | Custom | Custom | Custom

Limits apply per key

RPM, TPM, and per-key budgets are enforced at the gateway. Spread load across keys, or scale a single key up by moving tiers.
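
One simple way to spread load across keys is round-robin selection on the client. This is only a sketch: rotate_keys is a hypothetical helper, the key strings are placeholders, and whether rotation raises your aggregate ceiling depends on how your tier's limits are applied, which this page does not spell out.

```python
import itertools
from typing import Iterator, Sequence


def rotate_keys(keys: Sequence[str]) -> Iterator[str]:
    """Return an endless round-robin iterator over the given keys."""
    if not keys:
        raise ValueError("at least one key is required")
    return itertools.cycle(keys)


# Placeholder keys for illustration only.
picker = rotate_keys(["sk-nemo-aaa", "sk-nemo-bbb"])
first, second, third = next(picker), next(picker), next(picker)
print(first, second, third)  # sk-nemo-aaa sk-nemo-bbb sk-nemo-aaa
```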

Hitting a ceiling returns 429

Exceed RPM or TPM and the gateway returns 429 with a NemoRateLimitError — retry after a short backoff.
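
The "short backoff" can be sketched as exponential backoff with jitter. The page does not prescribe delays or attempt counts, so the defaults below are assumptions, and RateLimited is a stand-in exception defined here for illustration; with the SDK you would catch NemoRateLimitError instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for a 429 response (hypothetical, defined for this sketch)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the retry schedule can be tested without actually waiting.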

Error reference

Error codes you should handle

NemoRouter raises a small, typed family of errors for gateway-specific conditions, on top of the standard OpenAI HTTP status codes. Catch the class, read the code, branch accordingly.

NemoRouter gateway errors

HTTP | Code | Class | What it means
400 | guardrail_blocked | NemoGuardrailBlockedError | A guardrail blocked the request, e.g. detected PII, a prompt-injection attempt, or a secret in the payload.
402 | insufficient_credits | NemoCreditError | The key would breach its credit balance or budget. No partial debit occurs; the reservation is released cleanly.
429 | rate_limit_exceeded | NemoRateLimitError | The requests-per-minute (RPM) ceiling for the key or tier was exceeded. Retry after a short backoff.
429 | tpm_limit_exceeded | NemoRateLimitError | The tokens-per-minute (TPM) ceiling was exceeded. Lower request size or spread load across the window.
503 | credit_check_failed | NemoServiceError | The credit system was briefly unavailable. The request is rejected rather than run un-metered; fail-safe by design.
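
"Read the code, branch accordingly" might look like the sketch below. The error codes come from the table; the action labels and the choice of which plain HTTP statuses to retry are assumptions made for illustration, not behavior documented by NemoRouter. The function takes the status and code as arguments because this page does not specify where the code sits in the error body.

```python
from typing import Optional

# Gateway codes from the table, mapped to hypothetical client-side actions.
_ACTIONS = {
    "guardrail_blocked": "revise_request",
    "insufficient_credits": "add_credits",
    "rate_limit_exceeded": "retry_with_backoff",
    "tpm_limit_exceeded": "shrink_or_retry",
    "credit_check_failed": "retry_with_backoff",
}

# Statuses treated as transient here; this is a judgment call.
_RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


def next_action(status: int, code: Optional[str] = None) -> str:
    """Suggest what a client could do next for a failed request."""
    if code in _ACTIONS:
        return _ACTIONS[code]
    if status in _RETRYABLE_STATUS:
        return "retry_with_backoff"
    if 400 <= status < 500:
        return "fix_request"
    return "unknown"
```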

Standard HTTP status codes

HTTP | Name | What it means
400 | Bad Request | The request was malformed or missing required parameters.
401 | Authentication Error | Invalid or missing API key.
403 | Permission Denied | Your key does not have access to this model or resource.
404 | Not Found | The requested model or endpoint does not exist.
408 | Request Timeout | The LLM provider did not respond in time.
429 | Rate Limit Exceeded | You have exceeded your tokens-per-minute or requests-per-minute limit.
500 | Internal Server Error | An unexpected error occurred in the proxy.
502 | Bad Gateway | The LLM provider returned an invalid response.
503 | Service Unavailable | The LLM provider is temporarily unavailable.
504 | Gateway Timeout | The LLM provider did not respond in time.

A 402 never partially debits credits — the reserve is released cleanly, so a rejected request costs nothing.

Webhooks

Webhooks for alerts and lifecycle events

NemoRouter pushes operational events to an HTTPS endpoint you control — no polling. Configure webhook and Slack delivery under Observability → Alerts in the dashboard; there is no webhook plumbing to write yourself.

Event types

  • llm.error

    A request failed after retries and failover — surfaces provider outages fast.

  • budget.threshold

    A key, team, or org crossed a configured spend threshold.

  • provider.outage

    A provider circuit breaker opened — the gateway is failing over.

  • guardrail.triggered

    A guardrail blocked or redacted content — useful for a security channel.

  • key.created / key.revoked

    A virtual key changed — pair with the audit trail for compliance.
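
On the receiving side, an endpoint typically dispatches on the event type. The sketch below routes the event types listed above to destination channels. The payload shape is an assumption (this page does not document it): the code expects a top-level "type" field. The channel names and route_event are hypothetical.

```python
from typing import Any, Mapping

# Event types come from the list above; channel names are hypothetical.
ROUTES = {
    "llm.error": "ops",
    "provider.outage": "ops",
    "budget.threshold": "finance",
    "guardrail.triggered": "security",
    "key.created": "security",
    "key.revoked": "security",
}


def route_event(event: Mapping[str, Any]) -> str:
    """Pick a destination channel for one webhook event.

    Assumes the event type is carried in a top-level "type" field;
    unrecognized or missing types fall through to "unrouted".
    """
    return ROUTES.get(str(event.get("type", "")), "unrouted")
```

Falling back to an "unrouted" bucket rather than raising means a new event type does not cause the handler to reject deliveries.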

Delivery you can trust

Events are delivered with retries and backoff. Pair webhooks with the Slack callback to fan critical alerts straight into a channel.

Honest scope: webhooks here cover NemoRouter operational events. An OpenAI-style /v1/webhooks management API is not yet available; see the observability overview.

OpenAPI spec

Machine-readable specification

The NemoRouter API specification is published as a JSON document — endpoints, response headers, error classes, and the extra_body fields that drive guardrails, prompts, and caching. Download it, or read the source in the open SDK repository.

The published spec covers the supported surface today. Endpoints that are not yet available are listed explicitly inside the spec under unsupported_endpoints — nothing is implied that the API does not do.
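
A client could consult the downloaded spec before calling an endpoint. This sketch assumes unsupported_endpoints is a top-level collection of path strings; the actual layout of the spec is not shown on this page, so treat the shape, the helper names, and the sample paths as illustrative.

```python
import json
from pathlib import Path


def load_spec(path: str) -> dict:
    """Load a downloaded spec file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def is_unsupported(spec: dict, endpoint: str) -> bool:
    """True if the endpoint is listed under unsupported_endpoints."""
    return endpoint in spec.get("unsupported_endpoints", [])


# Illustrative fragment, not the real spec contents.
sample_spec = {"unsupported_endpoints": ["/v1/files", "/v1/batches"]}
print(is_unsupported(sample_spec, "/v1/files"))             # True
print(is_unsupported(sample_spec, "/v1/chat/completions"))  # False
```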

One key · two-line switch

Build against the API in five minutes

Create a virtual key, point your OpenAI client at the NemoRouter base URL, and ship. The full per-endpoint reference and guides live in the docs.

Test endpoints live in the playground.