Use Case · AI Agents

Agents that stay up and stay in budget.

An autonomous agent fans out dozens of LLM calls per task. Nemo Router keeps a long run alive when a provider degrades, caps each agent with a per-key budget, and logs every tool call for audit.

agent-run · key sk-nemo-7f2a

One agent, one key, one budget

Steps this run: 24 calls
Fallback used: step 11
Run status: completed
Budget: $2.40 / $5.00
Tool calls logged: 24 / 24
Runaway protection: 402 ceiling
per-agent key · failover-safe · audited

Long runs: Stay up. Fallback chains survive a provider blip.

Per-agent spend: Budgeted. A 402 stops a runaway loop.

Tool calls: 100% logged. Every step in the request log.

Catalog: 20+ models on Google Vertex AI.

Why Nemo for agents

Reliability, spend control, and a trail

Agents are unpredictable by design. The gateway makes them dependable: a run that survives a provider blip, a budget it cannot exceed, and a log of everything it did.

Auto-failover for long runs

Agents fan out dozens of LLM calls per task, and one provider hiccup halfway through a run shouldn’t kill it. The fallback chain transparently retries the failed call on the next link, so the agent keeps going.

  • Ordered fallback chain per model group
  • Timeouts, 5xx errors, and open circuit breakers all trigger the next link
  • Cross-provider failover for planning and tool-use calls alike
  • Each fallback logged so you can see what degraded
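To make the fallback idea concrete, here is a minimal client-side sketch of an ordered chain, assuming each link is a plain Python callable. The names `ProviderError`, `ChainResult`, and `call_with_fallback` are hypothetical and this is not Nemo Router's implementation; it models timeouts and 5xx responses only (circuit breaking is left out) and records every failed link, the way the gateway logs each fallback.

```python
# Illustrative sketch of an ordered fallback chain, not Nemo Router's code.
# Links are plain callables here; a gateway would call provider APIs.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Hypothetical provider error carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned {status}")
        self.status = status


@dataclass
class ChainResult:
    value: str
    served_by: str
    # (link name, reason) for each link that failed before one succeeded
    failed_links: List[Tuple[str, str]] = field(default_factory=list)


def call_with_fallback(
    chain: List[Tuple[str, Callable[[str], str]]], prompt: str
) -> ChainResult:
    """Try each link in order; a timeout or 5xx moves on to the next link."""
    failed: List[Tuple[str, str]] = []
    for name, call in chain:
        try:
            return ChainResult(call(prompt), name, failed)
        except TimeoutError:
            failed.append((name, "timeout"))
        except ProviderError as err:
            if 500 <= err.status < 600:
                failed.append((name, str(err.status)))
            else:
                raise  # a 4xx is a request problem; another provider won't fix it
    raise RuntimeError(f"every link failed: {failed}")


def flaky(prompt: str) -> str:
    raise TimeoutError()


def healthy(prompt: str) -> str:
    return f"tool_call ok for {prompt!r}"


result = call_with_fallback([("primary", flaky), ("fallback-1", healthy)], "step 11")
print(result.served_by, result.failed_links)  # fallback-1 [('primary', 'timeout')]
```

The 4xx branch re-raises on purpose: a malformed request fails the same way on every provider, so failing over would only burn retries.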

Per-agent budgets

A buggy agent loop can burn credits fast. Give each agent its own virtual key with a per-key budget — when it runs out, the gateway returns a 402 instead of spending without a ceiling.

  • One virtual key per agent, each with its own budget
  • Hard 402 ceiling stops a runaway reasoning loop
  • Per-key RPM/TPM limits throttle a misbehaving agent
  • Org and team budgets add a fleet-wide cap on top of each per-agent cap
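The layering of agent, team, and org budgets can be pictured as a check that every layer must pass. The sketch below is a hypothetical model (the `Budget` class and `check_budgets` function are illustrative, not a Nemo Router API): a call is admitted only if the agent key, its team, and the org all have room, and the first layer that would be exceeded produces the 402.

```python
# Illustrative sketch of layered budgets (agent key, team, org); hypothetical model.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Budget:
    name: str
    limit_usd: float
    spent_usd: float = 0.0


def check_budgets(layers: List[Budget], cost_usd: float) -> Tuple[int, Optional[str]]:
    """Return (200, None) if every layer can absorb the cost, else (402, blocking layer)."""
    for layer in layers:
        if layer.spent_usd + cost_usd > layer.limit_usd:
            return 402, layer.name
    return 200, None


planner = Budget("agent:planner", limit_usd=5.00, spent_usd=2.40)
team = Budget("team:research", limit_usd=50.00, spent_usd=49.90)
org = Budget("org", limit_usd=500.00, spent_usd=120.00)

print(check_budgets([planner, team, org], 0.05))  # (200, None)
print(check_budgets([planner, team, org], 0.50))  # (402, 'team:research')
```

The second call shows why the fleet-level cap matters: the planner still has $2.60 left on its own key, but the team budget blocks it.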

Every tool call audited

When an agent does something surprising, you need the trail. Every LLM call — planning, tool selection, reflection — lands in the request log with model, latency, cost, and result.

  • Filter the request log by an agent’s virtual key to replay a run
  • Model, prompt metadata, latency, cost, and result per call
  • Export to Langfuse, Datadog, or S3 via a logging callback
  • The same trail powers spend analytics and incident review
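Replaying a run amounts to filtering log records by one agent's virtual key and ordering them by time. This sketch assumes a hypothetical record shape (`LogRecord` with `key`, `ts`, `model`, `latency_ms`, `cost_usd`, `result`) over an in-memory list; the real request-log or exported schema may differ. The model name `gemini-2.5-pro` is an assumption used only as sample data.

```python
# Illustrative replay of one agent's run from exported request-log records.
# The LogRecord fields are hypothetical; the real log schema may differ.
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LogRecord:
    key: str          # virtual key that made the call
    ts: float         # request start time, epoch seconds
    model: str
    latency_ms: int
    cost_usd: float
    result: str       # e.g. "ok" or "tool_call ok"


def replay(records: Iterable[LogRecord], key: str) -> List[LogRecord]:
    """Return one agent's calls in the order they were made."""
    return sorted((r for r in records if r.key == key), key=lambda r: r.ts)


def summarize(run: List[LogRecord]) -> dict:
    """Step count, total cost, and distinct models for a replayed run."""
    return {
        "steps": len(run),
        "cost_usd": round(sum(r.cost_usd for r in run), 4),
        "models": sorted({r.model for r in run}),
    }


log = [
    LogRecord("sk-nemo-7f2a", 3.0, "gemini-2.5-flash", 410, 0.002, "tool_call ok"),
    LogRecord("sk-nemo-other", 1.5, "gemini-2.5-flash", 380, 0.001, "ok"),
    LogRecord("sk-nemo-7f2a", 1.0, "gemini-2.5-pro", 1900, 0.030, "ok"),
]
run = replay(log, "sk-nemo-7f2a")
print(summarize(run))
```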

Model choice per agent step

Planning wants a strong reasoning model; a quick tool-argument extraction wants a fast one. Pick the model per call from the catalog — no provider account, no SDK swap.

  • Choose any catalog model per request
  • Tag-filtered routing keeps function-calling steps on capable models
  • Latency-based routing for the steps a user is waiting on
  • 20+ models live, more shipping
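Per-step model choice can be as simple as mapping each kind of step to the capabilities it needs and picking the first catalog entry that has them. This is a client-side sketch under stated assumptions: the `CATALOG` entries, their tags, `STEP_REQUIREMENTS`, and the model names `gemini-2.5-pro` and `tiny-draft-model` are hypothetical, and the gateway's own tag-filtered routing is configured separately.

```python
# Illustrative per-step model selection by capability tags (hypothetical data).
from typing import Dict, List, Set

# Hypothetical catalog slice; real names and tags come from the Nemo catalog.
CATALOG: List[Dict] = [
    {"model": "gemini-2.5-pro", "tags": {"reasoning", "function-calling"}},
    {"model": "gemini-2.5-flash", "tags": {"fast", "function-calling"}},
    {"model": "tiny-draft-model", "tags": {"fast"}},
]

# Tags each kind of agent step requires.
STEP_REQUIREMENTS: Dict[str, Set[str]] = {
    "plan": {"reasoning"},
    "extract_tool_args": {"fast", "function-calling"},
}


def pick_model(step: str) -> str:
    """Return the first catalog model whose tags cover the step's requirements."""
    required = STEP_REQUIREMENTS[step]
    for entry in CATALOG:
        if required <= entry["tags"]:  # subset test: all required tags present
            return entry["model"]
    raise LookupError(f"no catalog model satisfies {sorted(required)}")


print(pick_model("plan"))               # gemini-2.5-pro
print(pick_model("extract_tool_args"))  # gemini-2.5-flash
```

The chosen name would then be passed as `model=` on that step's `/chat/completions` call; the key and base URL stay the same.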

How it works

An agent run, end to end

Each agent carries its own virtual key. Every planning and tool-selection step is a separate LLM call — routed, failover-protected, budget-checked, and logged.

Agent run flow

  1. Agent key

    sk-nemo-... · per agent

    One virtual key per agent — its own budget and rate limit.

  2. Plan + tool calls

    many /chat/completions

    Each reasoning + tool-selection step is a separate call.

  3. Routing + failover

    fallback chain

    A degraded provider triggers the next link — the run survives.

  4. Budget check

    reserve + settle

    Out of budget → 402, before the call. No runaway spend.

  5. Logged per step

    request log

    Model, latency, cost, result — replay the whole run.

The budget check is reserve-then-settle: credits are reserved before the call and settled at the actual cost afterward. If an agent is out of budget, it gets a 402 before the call goes out, never a surprise overspend.
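A minimal sketch of reserve-then-settle for a single virtual key, assuming amounts in integer cents and a hold sized to the worst-case cost of the call (for example, from `max_tokens`). `KeyLedger` and `InsufficientBudget` are hypothetical names; the exception stands in for the gateway's 402.

```python
# Illustrative reserve-then-settle ledger for one virtual key (hypothetical).
import threading
from typing import Dict


class InsufficientBudget(Exception):
    """Raised where the gateway would answer HTTP 402."""


class KeyLedger:
    def __init__(self, limit_cents: int) -> None:
        self.limit = limit_cents
        self.settled = 0                      # cost of finished calls
        self.reserved: Dict[str, int] = {}    # in-flight holds by request id
        self._lock = threading.Lock()         # concurrent agent steps share the key

    def available(self) -> int:
        return self.limit - self.settled - sum(self.reserved.values())

    def reserve(self, request_id: str, max_cost: int) -> None:
        """Hold the worst-case cost before the call, or refuse (the 402 case)."""
        with self._lock:
            if max_cost > self.available():
                raise InsufficientBudget(request_id)
            self.reserved[request_id] = max_cost

    def settle(self, request_id: str, actual_cost: int) -> None:
        """Release the hold and charge what the call really cost (0 if it failed)."""
        with self._lock:
            self.reserved.pop(request_id)
            self.settled += actual_cost


ledger = KeyLedger(limit_cents=500)   # a $5.00 per-agent budget
ledger.reserve("step-1", 300)
ledger.settle("step-1", 40)
print(ledger.available())             # 460
```

Reserving first is what makes the ceiling hard: two concurrent steps cannot both pass the check against the same remaining balance.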

Failover

A degraded provider should not kill a 40-step run

Fallback chains

Step 11 fails over — step 12 never knows

The longer the run, the higher the chance one call hits a degraded provider. With an ordered fallback chain per model group, a 5xx or timeout triggers the next link transparently. The agent loop keeps iterating; the failover is recorded for you to review later.
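The "longer run, higher chance" claim is simple arithmetic. Assuming each call fails independently with a fixed probability p (a simplification, since real outages are often correlated across calls and sometimes across providers), a run of n steps avoids every failure with probability (1 − p)^n, and a chain of k independent links only fails a step when all k links fail, at probability p^k.

```python
# Worked arithmetic, assuming each call fails independently with probability p.
def p_run_hits_failure(p_fail: float, steps: int, links: int = 1) -> float:
    """Probability that at least one step of a run fails on every link of its chain."""
    p_step_fails = p_fail ** links          # a step fails only if all links fail
    return 1 - (1 - p_step_fails) ** steps  # complement of "every step succeeds"


for links in (1, 2):
    print(links, round(p_run_hits_failure(0.01, 40, links), 4))
# 1 0.331
# 2 0.004
```

With a hypothetical 1% per-call failure rate, a 40-step run on a single provider hits at least one failure about 33% of the time; adding one independent fallback link brings that to about 0.4%.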

  • Ordered chain per model group — not a global toggle
  • Cross-provider chains for planning and tool-use calls
  • Retries honor cooldown and provider rate-limit hints
  • If every link fails, the gateway returns a clean 502 with the attempted chain in the response headers
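"Honoring cooldown and rate-limit hints" usually means skipping a link for a while after it fails, and waiting at least as long as the provider's `Retry-After` header asks. The sketch below assumes the standard HTTP `Retry-After` forms (delay in seconds, or an HTTP-date); `retry_after_seconds`, `Cooldowns`, and the 30-second default are hypothetical and are not the gateway's actual settings.

```python
# Illustrative cooldown tracking that honors HTTP Retry-After hints (hypothetical).
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def retry_after_seconds(header: Optional[str], now: float) -> float:
    """Parse Retry-After as delay-seconds or an HTTP-date; 0.0 if absent or invalid."""
    if not header:
        return 0.0
    header = header.strip()
    if header.isdigit():
        return float(header)
    try:
        when = parsedate_to_datetime(header).timestamp()
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad dates
        return 0.0
    return max(0.0, when - now)


class Cooldowns:
    """Skip a link until its cooldown, or the provider's hint, has elapsed."""

    def __init__(self, default_cooldown: float = 30.0) -> None:
        self.default = default_cooldown
        self.until: Dict[str, float] = {}

    def mark_failed(self, link: str, now: float, retry_after: Optional[str] = None) -> None:
        hint = retry_after_seconds(retry_after, now)
        self.until[link] = now + max(self.default, hint)

    def available(self, link: str, now: float) -> bool:
        return now >= self.until.get(link, 0.0)


cooldowns = Cooldowns()
cooldowns.mark_failed("primary", now=100.0, retry_after="120")
print(cooldowns.available("primary", 200.0), cooldowns.available("primary", 220.0))
# False True
```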

run · step 11 fallback

Fallback during a run

Step 11 primary: timeout
Fallback 1: retried
Step 11 result: tool_call ok
Run interrupted: no
Steps completed: 24 / 24
transparent · cross-provider · logged

The code

Point your agent framework at one endpoint

Nemo Router speaks the OpenAI API, so any agent framework that targets the OpenAI SDK works with a base-URL and key change. This snippet is generated from the same SDK examples the playground uses. Give each agent its own key and the per-key budget does the rest.

Install: pip install openai

```python
# Cache: enabled (org default). Set "nemo_cache": False in extra_body to skip it.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    temperature=1,
    max_tokens=1024,
    top_p=1,
    messages=[
        {"role": "user", "content": "Hello! What models do you support?"},
    ],
    extra_body={
        # "nemo_cache": False,  # Uncomment to skip cache
    },
)

print(response.choices[0].message.content)
```

Issue one virtual key per agent — spend, rate limits, and the request log all scope to that key.
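On the client side, an agent loop should treat a 402 as "stop cleanly", not as an error to retry. This sketch assumes the OpenAI Python SDK raises an exception exposing a `status_code` attribute for non-2xx responses (as its `APIStatusError` does); `run_agent` and its result dict are hypothetical names, and `call` stands for whatever wraps `client.chat.completions.create` for that agent's own key.

```python
# Illustrative agent loop that ends cleanly when its per-agent budget returns 402.
from typing import Callable, List


def run_agent(steps: List[str], call: Callable[[str], str]) -> dict:
    """Run steps in order; stop on a 402, let any other error propagate."""
    outputs: List[str] = []
    for step in steps:
        try:
            outputs.append(call(step))
        except Exception as err:
            # Assumption: SDK errors expose status_code (e.g. openai.APIStatusError).
            if getattr(err, "status_code", None) == 402:
                return {"status": "budget_exhausted", "outputs": outputs}
            raise
    return {"status": "completed", "outputs": outputs}


print(run_agent(["plan", "act"], lambda step: f"done: {step}"))
```

In practice `call` might be `lambda prompt: client.chat.completions.create(model="gemini-2.5-flash", messages=[{"role": "user", "content": prompt}]).choices[0].message.content`, with `client` built from that agent's own virtual key, so one runaway agent exhausts only its own budget.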


Reliable, bounded, auditable

Run agents on a gateway built for long runs

Auto-failover, per-agent budgets, and a full tool-call log — unlocked on every plan, behind one NemoRouter key.