API Reference

One key, one base URL, the OpenAI API you already know.

This is the scannable reference: endpoints, authentication, rate limits, error codes, webhooks, and the OpenAPI spec — on one page. NemoRouter speaks the OpenAI REST API, so the request and response shapes are exactly what your SDK already sends.

The contract at a glance

Base URL: /v1
Auth scheme: Bearer sk-nemo-…
API format: OpenAI-compatible
Cost header: x-nemo-request-cost
Response headers: x-nemo-*
Error classes: NemoError family
OpenAPI spec: downloadable

This page is a quick-reference overview. The full documentation carries per-endpoint request/response schemas and guides.

Quickstart

Your first request, in seven languages

These snippets are generated from the canonical NemoRouter SDK examples — the same source the playground and dashboard keys page use. Set NEMOROUTER_API_KEY in your environment and run.

Install: pip install nemoroutersdk

```python
# Cache: enabled (org default). Pass nemo_cache=False to skip.
from nemoroutersdk import NemoRouter, NemoGuardrailBlockedError

client = NemoRouter()  # reads NEMOROUTER_API_KEY from env

try:
    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        temperature=1,
        max_tokens=1024,
        top_p=1,
        messages=[
            {"role": "user", "content": "Hello! What models do you support?"},
        ],
        # nemo_cache=False,  # Uncomment to skip cache
    )
    print(response.choices[0].message.content)

    # Auto-captured metadata from the most recent call
    meta = client.last_response
    if meta:
        print(f"Cost: ${meta.cost}")
        print(f"Guardrails: {meta.guardrails_applied}")

except NemoGuardrailBlockedError as e:
    # Raised when a guardrail rejects the request
    print(f"Blocked by guardrail: {e}")
```

You can also point any official OpenAI SDK, or any other OpenAI-compatible client, at the NemoRouter endpoint. The base URL is always https://api.nemorouter.ai/v1. Set NEMOROUTER_API_KEY in your environment before running.
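Because the surface is OpenAI-compatible, the request such a client sends can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the SDK's implementation: `build_chat_request` and `send` are hypothetical helper names, and the JSON body follows the OpenAI chat-completions shape that this page says NemoRouter accepts.

```python
import json
import urllib.request

BASE_URL = "https://api.nemorouter.ai/v1"  # base URL stated on this page


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions.

    Hypothetical helper for illustration; the body uses the OpenAI
    chat-completions shape.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

To try it, read the key from NEMOROUTER_API_KEY, pass it to `build_chat_request` with a model such as "gemini-2.5-flash", and hand the result to `send`. Building and sending are kept separate so the request can be inspected or logged without touching the network.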

Authentication

Bearer auth with a virtual key

Every LLM request authenticates with a NemoRouter virtual key (sk-nemo-…) in the Authorization header. There are no provider keys to manage — NemoRouter holds those — and no master key ever leaves a trusted service.

The Authorization header

Pass your virtual key as a bearer token. This is the only credential your application ever holds.

Authorization: Bearer sk-nemo-xxxxxxxxxxxx

Create keys in the dashboard — the full key is shown once at creation
Each key carries its own RPM/TPM caps, budget, and spend tracking
Revoke a key instantly — the blast radius of a leak is one key
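One way an application might load the key and build the header is sketched below. The helper names, the prefix check, and the redaction format are illustrative assumptions rather than SDK behaviour; only the NEMOROUTER_API_KEY variable, the sk-nemo- prefix, and the Bearer scheme come from this page.

```python
import os

KEY_PREFIX = "sk-nemo-"  # prefix shown on this page


def load_virtual_key(env_var: str = "NEMOROUTER_API_KEY") -> str:
    """Read the virtual key from the environment, failing loudly if absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith(KEY_PREFIX):
        # Assumption: every virtual key carries this prefix.
        raise ValueError(f"expected a key starting with {KEY_PREFIX!r}")
    return key


def auth_headers(key: str) -> dict:
    """Return the bearer header carried by every request."""
    return {"Authorization": f"Bearer {key}"}


def redact(key: str) -> str:
    """Mask a key for logs, keeping only the prefix and the last four characters."""
    return f"{KEY_PREFIX}...{key[-4:]}"
```

Since the full key is shown only once at creation, logging the redacted form keeps a leaked log line from exposing a usable credential.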

Why virtual keys, not provider keys

NemoRouter is a managed gateway: it owns the OpenAI, Anthropic, and Google credentials. You never see them, never rotate them, and never risk leaking them.

One key reaches every model — switch models by changing a string
Per-key budgets and rate limits are enforced at the gateway
Keys are hashed at rest; observability scopes by key, team, and org

Endpoints

The endpoints you will actually call

The inference and model-listing paths are OpenAI-compatible: same paths, same payloads. The pricing, credits, and guardrail-log endpoints are NemoRouter additions for catalog discovery and spend visibility.

Inference

The OpenAI-compatible inference surface. Chat completions is the primary endpoint and runs the full feature path — guardrails, prompts, A/B tests, caching.

Method	Path	SDK call
POST	/v1/chat/completions	chat.completions.create()
POST	/v1/completions	completions.create()
POST	/v1/embeddings	embeddings.create()
POST	/v1/images/generations	images.generate()
POST	/v1/audio/speech	audio.speech.create()
POST	/v1/audio/transcriptions	audio.transcriptions.create()

Models

Discover the live catalog. Both endpoints are cache-aware, so listing models is cheap to poll from your app.

Method	Path	SDK call
GET	/v1/models	models.list()
GET	/v1/models/{model_id}	models.retrieve()
GET	/api/models/pricing	none listed
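A short sketch of reading the catalog response follows. It assumes GET /v1/models returns the OpenAI list shape (an object with a `data` array of entries that each carry an `id`); the helper names and the second model ID in the sample are made up for illustration.

```python
from typing import Any, Dict, List


def model_ids(payload: Dict[str, Any]) -> List[str]:
    """Extract sorted model IDs from a GET /v1/models response body."""
    return sorted(item["id"] for item in payload.get("data", []))


def has_model(payload: Dict[str, Any], model_id: str) -> bool:
    """Check whether a model ID appears in the catalog payload."""
    return model_id in model_ids(payload)


# Sample payload in the OpenAI list shape; "example-model" is a made-up ID.
sample = {
    "object": "list",
    "data": [
        {"id": "gemini-2.5-flash", "object": "model"},
        {"id": "example-model", "object": "model"},
    ],
}

print(model_ids(sample))
print(has_model(sample, "gemini-2.5-flash"))
```

Checking the catalog before sending a request turns a 404 for an unknown model into a local, cheaper failure.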

Usage & credits

Read your credit balance and spend history. Cost is authoritative — the routing engine owns it and surfaces it on the x-nemo-request-cost header.

Method	Path	SDK call
GET	/api/credits/balance	none listed
GET	/api/credits/history	none listed
GET	/nemo/guardrail/logs	none listed

Files, fine-tuning, batches, and the Assistants API are not yet available through NemoRouter — use chat.completions.create() for now. The docs track the full supported list.

Response headers

Every response carries x-nemo-* metadata

Beyond the standard OpenAI body, NemoRouter attaches headers describing what happened on the gateway — cost, latency, which guardrails ran, which prompt version and A/B variant were used.

Header	Description	Present
`x-nemo-request-id`	Unique request ID — quote it in any support ticket.	Always
`x-nemo-latency-ms`	Server-side latency in milliseconds.	Always
`x-nemo-request-cost`	Exact request cost in USD — authoritative.	When applicable
`x-nemo-guardrails-applied`	Guardrails that ran on this request (CSV).	When applicable
`x-nemo-prompt-version`	Prompt template version injected, if any.	When applicable
`x-nemo-ab-test`	A/B test variant selected for this request.	When applicable
`x-nemo-cache`	Cache hit/miss on the models endpoint.	When applicable

Cost tracking is authoritative — the routing engine owns it and NemoRouter reads it back; we never re-estimate cost ourselves.
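When calling the API over raw HTTP, the headers in the table can be collected into one structure. The sketch below is illustrative only: `NemoMeta` and `parse_nemo_headers` are hypothetical names, and it assumes the cost arrives as a plain decimal string, the latency as a number, and the guardrail list as comma-separated values.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Mapping, Optional


@dataclass
class NemoMeta:
    request_id: Optional[str] = None
    latency_ms: Optional[float] = None
    cost_usd: Optional[Decimal] = None
    guardrails: List[str] = field(default_factory=list)
    prompt_version: Optional[str] = None
    ab_test: Optional[str] = None
    cache: Optional[str] = None


def parse_nemo_headers(headers: Mapping[str, str]) -> NemoMeta:
    """Collect x-nemo-* headers; header names are matched case-insensitively."""
    h = {name.lower(): value for name, value in headers.items()}
    cost = h.get("x-nemo-request-cost")
    latency = h.get("x-nemo-latency-ms")
    applied = h.get("x-nemo-guardrails-applied", "")
    return NemoMeta(
        request_id=h.get("x-nemo-request-id"),
        # Assumed formats: numeric latency, decimal-string cost.
        latency_ms=float(latency) if latency is not None else None,
        cost_usd=Decimal(cost) if cost is not None else None,
        # CSV header split into names; an absent header yields an empty list.
        guardrails=[g.strip() for g in applied.split(",") if g.strip()],
        prompt_version=h.get("x-nemo-prompt-version"),
        ab_test=h.get("x-nemo-ab-test"),
        cache=h.get("x-nemo-cache"),
    )
```

Parsing the cost as `Decimal` rather than `float` avoids rounding drift when many small per-request costs are summed. Optional headers simply stay `None` when the gateway does not send them.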

Rate limits

RPM and TPM, per tier

Rate limits and throughput are the only differentiators between plans. All features are free on every tier.

Plan tier	RPM	TPM	TPS
Tier 1 — Pay As You Go	500	500K	10
Tier 2 — Monthly	500	500K	50
Tier 3 — Annual	1K	1M	200
Enterprise	Custom	Custom	Custom

Limits apply per key

RPM, TPM, and per-key budgets are enforced at the gateway. Spread load across keys, or scale a single key up by moving tiers.

Hitting a ceiling returns 429

Exceed RPM or TPM and the gateway returns 429 with a NemoRateLimitError — retry after a short backoff.
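The "short backoff" can be sketched as exponential backoff with jitter. The delays, attempt count, and the `RateLimited` stand-in exception below are assumptions chosen for illustration; in real code you would catch the SDK's NemoRateLimitError or check for HTTP 429 instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response (NemoRateLimitError in the SDK)."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited using exponential backoff with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Ceiling doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")
```

The random jitter keeps several workers that hit the limit at the same moment from retrying in lockstep. The `sleep` parameter exists so the function can be tested without waiting.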

Error reference

Error codes you should handle

NemoRouter raises a small, typed family of errors for gateway-specific conditions, on top of the standard OpenAI HTTP status codes. Catch the class, read the code, branch accordingly.

NemoRouter gateway errors

HTTP	Code / class	What it means
400	`guardrail_blocked` / `NemoGuardrailBlockedError`	A guardrail blocked the request, e.g. detected PII, a prompt-injection attempt, or a secret in the payload.
402	`insufficient_credits` / `NemoCreditError`	The key would breach its credit balance or budget. No partial debit occurs: the reservation is released cleanly.
429	`rate_limit_exceeded` / `NemoRateLimitError`	The requests-per-minute (RPM) ceiling for the key or tier was exceeded. Retry after a short backoff.
429	`tpm_limit_exceeded` / `NemoRateLimitError`	The tokens-per-minute (TPM) ceiling was exceeded. Lower request size or spread load across the window.
503	`credit_check_failed` / `NemoServiceError`	The credit system was briefly unavailable. The request is rejected rather than run un-metered (fail-safe by design).
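"Read the code, branch accordingly" might look like the sketch below. The codes come from the table above; the suggested reactions and the `next_action` helper are this example's assumptions, not documented SDK behaviour, and how you obtain the status and code from a response or exception is left out.

```python
from typing import Optional

# Gateway error codes mapped to a suggested reaction (assumed, not documented).
GATEWAY_ACTIONS = {
    "guardrail_blocked": "fix_request",           # 400: change the payload before resending
    "insufficient_credits": "top_up",             # 402: add credits or raise the key budget
    "rate_limit_exceeded": "retry_with_backoff",  # 429: RPM ceiling
    "tpm_limit_exceeded": "retry_with_backoff",   # 429: TPM ceiling, consider smaller requests
    "credit_check_failed": "retry_with_backoff",  # 503: credit system briefly unavailable
}

# Plain HTTP statuses that are conventionally treated as transient.
TRANSIENT_STATUSES = {408, 429, 500, 502, 503, 504}


def next_action(status: int, code: Optional[str] = None) -> str:
    """Pick a reaction from the gateway code first, then from the HTTP status."""
    if code in GATEWAY_ACTIONS:
        return GATEWAY_ACTIONS[code]
    if status in TRANSIENT_STATUSES:
        return "retry_with_backoff"
    if 400 <= status < 500:
        return "fix_request"
    return "give_up"
```

Checking the code before the status matters for 400 and 503, where the same status can mean either a gateway-specific condition or a generic failure.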

Standard HTTP status codes

HTTP	Name	What it means
400	Bad Request	The request was malformed or missing required parameters.
401	Authentication Error	Invalid or missing API key.
403	Permission Denied	Your key does not have access to this model or resource.
404	Not Found	The requested model or endpoint does not exist.
408	Request Timeout	The LLM provider did not respond in time.
429	Rate Limit Exceeded	You have exceeded your tokens-per-minute or requests-per-minute limit.
500	Internal Server Error	An unexpected error occurred in the proxy.
502	Bad Gateway	The LLM provider returned an invalid response.
503	Service Unavailable	The LLM provider is temporarily unavailable.
504	Gateway Timeout	The LLM provider did not respond in time.

A 402 never partially debits credits — the reserve is released cleanly, so a rejected request costs nothing.

Webhooks

Webhooks for alerts and lifecycle events

NemoRouter pushes operational events to an HTTPS endpoint you control — no polling. Configure webhook and Slack delivery under Observability → Alerts in the dashboard; there is no webhook plumbing to write yourself.

Event types

Event	When it fires
llm.error	A request failed after retries and failover. Surfaces provider outages fast.
budget.threshold	A key, team, or org crossed a configured spend threshold.
provider.outage	A provider circuit breaker opened and the gateway is failing over.
guardrail.triggered	A guardrail blocked or redacted content. Useful for a security channel.
key.created / key.revoked	A virtual key changed. Pair with the audit trail for compliance.

Delivery you can trust

Events are delivered with retries and backoff. Pair webhooks with the Slack callback to fan critical alerts straight into a channel.
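On the receiving side, a handler could route events by type and tolerate redelivery, since retried deliveries may arrive more than once. This page does not document the payload, so the sketch assumes a JSON object with hypothetical `id` and `type` fields; the handler names and return values are also made up for illustration.

```python
from typing import Any, Callable, Dict, Set

Event = Dict[str, Any]
Handler = Callable[[Event], str]

HANDLERS: Dict[str, Handler] = {}
_seen_ids: Set[str] = set()  # in production this would live in a shared store


def on(*event_types: str) -> Callable[[Handler], Handler]:
    """Register one handler for one or more event types."""
    def register(fn: Handler) -> Handler:
        for event_type in event_types:
            HANDLERS[event_type] = fn
        return fn
    return register


@on("budget.threshold")
def handle_budget(event: Event) -> str:
    return "notify-finance"


@on("guardrail.triggered")
def handle_guardrail(event: Event) -> str:
    return "notify-security"


@on("key.created", "key.revoked")
def handle_key_change(event: Event) -> str:
    return "append-to-audit-log"


def dispatch(event: Event) -> str:
    """Route an event to its handler, skipping redelivered duplicates."""
    event_id = event.get("id")  # hypothetical field name
    if event_id is not None:
        if event_id in _seen_ids:
            return "duplicate"
        _seen_ids.add(event_id)
    handler = HANDLERS.get(event.get("type", ""))  # hypothetical field name
    return handler(event) if handler else "ignored"
```

Unknown event types are ignored rather than rejected, so new event types added later do not turn into delivery failures on your endpoint.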

Honest scope: webhooks here cover NemoRouter operational events. An OpenAI-style /v1/webhooks management API is not yet available — see the observability overview.

OpenAPI spec

Machine-readable specification

The NemoRouter API specification is published as a JSON document — endpoints, response headers, error classes, and the extra_body fields that drive guardrails, prompts, and caching. Download it, or read the source in the open SDK repository.

Download

openapi.json

The full API spec, served from this site. Feed it to your client generator or API tooling.

Source

nemo-router-sdk

The spec is canonical at spec/nemo-sdk-spec.json in the open SDK repo — alongside runnable examples.

The published spec covers the supported surface today. Endpoints that are not yet available are listed explicitly inside the spec under unsupported_endpoints — nothing is implied that the API does not do.
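Once downloaded, the spec can be inspected with a few lines of Python. This sketch assumes the document uses the standard OpenAPI `paths` layout and that `unsupported_endpoints` is a list of strings; neither detail is confirmed on this page, and `summarize_spec` is a hypothetical helper.

```python
from typing import Any, Dict, List

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def summarize_spec(spec: Dict[str, Any]) -> Dict[str, List[str]]:
    """List supported operations and the explicitly unsupported endpoints."""
    supported = [
        f"{method.upper()} {path}"
        for path, operations in spec.get("paths", {}).items()
        for method in operations
        if method.lower() in HTTP_METHODS  # skip non-operation keys like "parameters"
    ]
    # Assumed shape: a flat list of endpoint strings.
    unsupported = list(spec.get("unsupported_endpoints", []))
    return {"supported": sorted(supported), "unsupported": unsupported}


# Tiny stand-in for the real document; load the real one with json.load().
sample_spec = {
    "paths": {
        "/v1/chat/completions": {"post": {}},
        "/v1/models": {"get": {}, "parameters": []},
    },
    "unsupported_endpoints": ["/v1/files", "/v1/batches"],
}

print(summarize_spec(sample_spec))
```

Running a check like this in CI against the downloaded openapi.json is one way to notice when an endpoint your code depends on moves between the supported and unsupported lists.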

One key · two-line switch

Build against the API in five minutes

Create a virtual key, point your OpenAI client at the NemoRouter base URL, and ship. The full per-endpoint reference and guides live in the docs.

Test endpoints live in the playground.