Why do AI agents need a gateway?

An agent fans out many LLM calls per task — planning, tool selection, reflection, synthesis. A single provider hiccup can break a long run. A gateway adds auto-failover, spend caps, and a unified request log so an agent run is reliable, bounded, and auditable.

Can I cap how much an agent can spend?

Yes. Issue each agent its own virtual key and set a per-key budget. When the agent exhausts its budget, the gateway returns a 402 instead of silently burning credits — a hard ceiling on a runaway loop.

How do I audit what an agent did?

Every LLM call an agent makes lands in the request log with the model, prompt metadata, latency, cost, and result. Filter by the agent’s virtual key to replay an entire run, or export logs to Langfuse, Datadog, or S3 via a logging callback.

Does the OpenAI Agents SDK or LangChain work with Nemo Router?

Yes. Nemo Router exposes an OpenAI-compatible API, so any agent framework that targets the OpenAI SDK works by changing the base URL and key. Tool use, streaming, and JSON mode are all proxied transparently.

Use Case · AI Agents

Agents that stay up and stay in budget.

An autonomous agent fans out dozens of LLM calls per task. Nemo Router keeps a long run alive when a provider degrades, caps each agent with a per-key budget, and logs every tool call for audit.


agent-run · key sk-nemo-7f2a

One agent, one key, one budget

Steps this run: 24 calls

Fallback used: step 11

Run status: completed

Budget: $2.40 / $5.00

Tool calls logged: 24 / 24

Runaway protection: 402 ceiling

per-agent key · failover-safe · audited

Long runs: Stay up
Per-agent spend: Budgeted
Tool calls: 100% logged
Catalog: 20+ models

Why Nemo for agents

Reliability, spend control, and a trail

Agents are unpredictable by design. The gateway makes them dependable: a run that survives a provider blip, a budget it cannot exceed, and a log of everything it did.

Auto-failover for long runs

Agents fan out dozens of LLM calls per task. One provider hiccup halfway through a run shouldn’t kill it. The fallback chain retries the next link transparently so the agent keeps going.

Ordered fallback chain per model group
Timeouts, 5xx responses, and tripped circuit breakers all trigger the next link
Cross-provider failover for planning and tool-use calls alike
Each fallback logged so you can see what degraded
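
To make the fallback chain concrete, here is a minimal sketch of the idea in plain Python. It is not Nemo Router's implementation: `ProviderError`, `FallbackResult`, and `call_with_fallback` are hypothetical names, and the two provider functions are stand-ins for real upstream calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Hypothetical error standing in for a timeout or 5xx from one provider."""


@dataclass
class FallbackResult:
    provider: str                  # the link that finally answered
    response: str
    degraded: List[str] = field(default_factory=list)  # links that failed first


def call_with_fallback(
    chain: List[Tuple[str, Callable[[str], str]]], prompt: str
) -> FallbackResult:
    """Try each link in order; record every link that failed along the way."""
    degraded: List[str] = []
    for name, call in chain:
        try:
            return FallbackResult(name, call(prompt), degraded)
        except ProviderError:
            degraded.append(name)  # note the failure, then try the next link
    raise ProviderError(f"all links failed: {degraded}")


# Stand-in providers for the demo (assumptions, not real endpoints).
def flaky(prompt: str) -> str:
    raise ProviderError("timeout")


def healthy(prompt: str) -> str:
    return f"ok: {prompt}"


result = call_with_fallback(
    [("primary", flaky), ("fallback-1", healthy)], "plan step 11"
)
print(result.provider, result.degraded)
```

The caller only sees the successful response; the `degraded` list plays the role of the failover record you review afterward.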

Per-agent budgets

A buggy agent loop can burn credits fast. Give each agent its own virtual key with a per-key budget — when it runs out, the gateway returns a 402 instead of spending without a ceiling.

One virtual key per agent, each with its own budget
Hard 402 ceiling stops a runaway reasoning loop
Per-key RPM/TPM limits throttle a misbehaving agent
Org and team budgets cap the whole fleet, on top of each per-agent budget
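
On the agent side, a 402 is worth handling as a clean stop rather than a crash. The sketch below assumes the 402 surfaces as an exception carrying a `status_code` attribute, which is how the OpenAI Python SDK's `APIStatusError` reports HTTP errors. `run_agent`, `FakeStatusError`, and `limited_llm` are hypothetical names used only for illustration.

```python
from typing import Callable, List


def is_budget_exhausted(exc: Exception) -> bool:
    """True when an error carries HTTP status 402 (assumed budget ceiling)."""
    return getattr(exc, "status_code", None) == 402


def run_agent(steps: List[str], call_llm: Callable[[str], str]) -> dict:
    """Run steps in order; stop cleanly if the gateway reports 402."""
    completed: List[str] = []
    for step in steps:
        try:
            completed.append(call_llm(step))
        except Exception as exc:
            if is_budget_exhausted(exc):
                return {"status": "budget_exhausted", "completed": completed}
            raise  # anything else is a real error; let it propagate
    return {"status": "completed", "completed": completed}


class FakeStatusError(Exception):
    """Test double that mimics an SDK error with a status_code attribute."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


calls_made = 0


def limited_llm(prompt: str) -> str:
    """Stand-in LLM call that hits the budget ceiling on the third call."""
    global calls_made
    calls_made += 1
    if calls_made > 2:
        raise FakeStatusError(402)
    return f"done: {prompt}"


outcome = run_agent(["plan", "pick tool", "reflect", "synthesize"], limited_llm)
print(outcome["status"], len(outcome["completed"]))
```

Returning the partial results lets the caller decide whether to surface them, raise the budget, or discard the run.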

Every tool call audited

When an agent does something surprising, you need the trail. Every LLM call — planning, tool selection, reflection — lands in the request log with model, latency, cost, and result.

Filter the request log by an agent’s virtual key to replay a run
Model, prompt metadata, latency, cost, and result per call
Export to Langfuse, Datadog, or S3 via a logging callback
The same trail powers spend analytics and incident review
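
As a rough illustration of replaying a run from exported logs, the sketch below filters records by virtual key and totals the cost. The record fields (`key`, `ts`, `model`, `cost_usd`, `result`) and the sample values are assumptions made up for this example; the actual export schema may differ.

```python
from typing import Dict, Iterable, List

# Made-up sample records; field names and values are assumptions for this sketch.
SAMPLE_LOG: List[Dict] = [
    {"key": "agent-a", "ts": 2, "model": "gemini-2.5-flash", "cost_usd": 0.25, "result": "tool_call"},
    {"key": "agent-b", "ts": 1, "model": "gemini-2.5-flash", "cost_usd": 0.50, "result": "ok"},
    {"key": "agent-a", "ts": 1, "model": "gemini-2.5-flash", "cost_usd": 0.10, "result": "ok"},
]


def replay_run(log: Iterable[Dict], key: str) -> List[Dict]:
    """Return one agent's calls in timestamp order."""
    return sorted((r for r in log if r["key"] == key), key=lambda r: r["ts"])


def total_cost(records: Iterable[Dict]) -> float:
    """Sum the recorded cost of the given calls."""
    return sum(r["cost_usd"] for r in records)


run = replay_run(SAMPLE_LOG, "agent-a")
for record in run:
    print(record["ts"], record["model"], record["result"], record["cost_usd"])
print("total:", round(total_cost(run), 2))
```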

Model choice per agent step

Planning wants a strong reasoning model; a quick tool-argument extraction wants a fast one. Pick the model per call from the catalog — no provider account, no SDK swap.

Choose any catalog model per request
Tag-filtered routing keeps function-calling steps on capable models
Latency-based routing for the steps a user is waiting on
20+ models live, more shipping
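
One simple way to pick a model per step is a lookup table in the agent code. This is a sketch under assumptions: `STEP_MODELS` and `build_request` are hypothetical helpers, and `"strong-reasoning-model"` is a placeholder, not a real catalog name. Only `gemini-2.5-flash` appears in the example on this page.

```python
from typing import Dict

# Placeholder routing table. "strong-reasoning-model" is NOT a real catalog
# name; substitute whichever catalog model you use for planning.
STEP_MODELS: Dict[str, str] = {
    "plan": "strong-reasoning-model",
    "extract_args": "gemini-2.5-flash",
}
DEFAULT_MODEL = "gemini-2.5-flash"


def build_request(step_kind: str, prompt: str) -> dict:
    """Build chat-completions keyword arguments for one agent step."""
    return {
        "model": STEP_MODELS.get(step_kind, DEFAULT_MODEL),
        "messages": [{"role": "user", "content": prompt}],
    }


print(build_request("plan", "Outline the steps")["model"])
print(build_request("summarize", "Wrap up")["model"])
```

With an OpenAI-compatible client, the result can be passed straight through as `client.chat.completions.create(**build_request("plan", "..."))`.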

How it works

An agent run, end to end

Each agent carries its own virtual key. Every planning and tool-selection step is a separate LLM call — routed, failover-protected, budget-checked, and logged.

Agent run flow

Agent key
sk-nemo-... · per agent
One virtual key per agent — its own budget and rate limit.
Plan + tool calls
many /chat/completions
Each reasoning + tool-selection step is a separate call.
Routing + failover
fallback chain
A degraded provider triggers the next link — the run survives.
Budget check
reserve + settle
Out of budget → 402, before the call. No runaway spend.
Logged per step
request log
Model, latency, cost, result — replay the whole run.

The budget check works as reserve-then-settle: credits are reserved before the call and settled at the actual cost afterward. An agent that is out of budget gets a 402 before the call is made, never a surprise overspend.
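
The reserve-then-settle pattern can be sketched as a small ledger. This is an illustration of the general technique, not the gateway's code: `KeyBudget` and `BudgetExceeded` are hypothetical names, amounts are kept in integer cents to avoid float rounding, and how the pre-call estimate is computed is an assumption the page does not specify.

```python
class BudgetExceeded(Exception):
    """Hypothetical error; mirrors the 402 the gateway is described as returning."""

    status_code = 402


class KeyBudget:
    """Per-key ledger in integer cents: reserve before a call, settle after."""

    def __init__(self, limit_cents: int):
        self.limit_cents = limit_cents
        self.spent_cents = 0
        self.reserved_cents = 0

    def reserve(self, estimate_cents: int) -> None:
        """Hold an estimated cost, or refuse before any call is made."""
        committed = self.spent_cents + self.reserved_cents
        if committed + estimate_cents > self.limit_cents:
            raise BudgetExceeded("budget exhausted")
        self.reserved_cents += estimate_cents

    def settle(self, estimate_cents: int, actual_cents: int) -> None:
        """Release the hold and record what the call really cost."""
        self.reserved_cents -= estimate_cents
        self.spent_cents += actual_cents


budget = KeyBudget(limit_cents=500)            # a $5.00 per-key budget
budget.reserve(300)                            # hold an upper-bound estimate
budget.settle(300, 240)                        # the call actually cost $2.40
try:
    budget.reserve(300)                        # 240 + 300 > 500, so refused
    refused = False
except BudgetExceeded:
    refused = True
print(budget.spent_cents, budget.reserved_cents, refused)
```

Because the refusal happens at `reserve`, the over-budget call is never sent, which is what makes the ceiling hard.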

Failover

A degraded provider should not kill a 40-step run

Fallback chains

Step 11 fails over — step 12 never knows

The longer the run, the higher the chance one call hits a degraded provider. With an ordered fallback chain per model group, a 5xx or timeout triggers the next link transparently. The agent loop keeps iterating; the failover is recorded for you to review later.

Ordered chain per model group — not a global toggle
Cross-provider chains for planning and tool-use calls
Retries honor cooldown and provider rate-limit hints
Final failure returns a clean 502 with the chain in headers
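
The cooldown idea can be illustrated with a small tracker that skips a link for a fixed window after it fails. This is a sketch of one common approach, assuming a fixed cooldown; `CooldownTracker` is a hypothetical name and the 30-second window is an arbitrary example value, not a documented default.

```python
import time
from typing import Callable, Dict, List


class CooldownTracker:
    """Skip a provider for `cooldown_s` seconds after it is marked failed."""

    def __init__(self, cooldown_s: float, clock: Callable[[], float] = time.monotonic):
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._blocked_until: Dict[str, float] = {}

    def mark_failed(self, provider: str) -> None:
        """Start (or restart) the cooldown window for a provider."""
        self._blocked_until[provider] = self._clock() + self._cooldown_s

    def available(self, chain: List[str]) -> List[str]:
        """Return the chain in order, minus providers still cooling down."""
        now = self._clock()
        return [
            p for p in chain
            if now >= self._blocked_until.get(p, float("-inf"))
        ]


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
tracker = CooldownTracker(cooldown_s=30.0, clock=lambda: fake_now[0])
chain = ["primary", "fallback-1"]

tracker.mark_failed("primary")
during = tracker.available(chain)   # primary is skipped while cooling down
fake_now[0] = 31.0
after = tracker.available(chain)    # primary is eligible again
print(during, after)
```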

run · step 11 fallback

Fallback during a run

Step 11 primary: timeout

Fallback 1: retried

Step 11 result: tool_call ok

Run interrupted: no

Steps completed: 24 / 24

transparent · cross-provider · logged

The code

Point your agent framework at one endpoint

Nemo Router speaks the OpenAI API, so any agent framework that targets the OpenAI SDK works with a base-URL and key change. The snippet below is generated from the same SDK examples the playground uses. Give each agent its own key and the per-key budget does the rest.

Install: `pip install openai`

```python
# Cache: enabled (org default). Pass nemo_cache: false to skip.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    temperature=1,
    max_tokens=1024,
    top_p=1,
    messages=[
        {"role": "user", "content": "Hello! What models do you support?"},
    ],
    extra_body={
        # "nemo_cache": False,  # Uncomment to skip cache
    },
)

print(response.choices[0].message.content)
```

Issue one virtual key per agent — spend, rate limits, and the request log all scope to that key.

FAQ

Common agent questions

Reliable, bounded, auditable

Run agents on a gateway built for long runs

Auto-failover, per-agent budgets, and a full tool-call log, included on every plan behind a single Nemo Router key.
