Does Nemo Router support vision-capable models for document work?

Yes. The model catalog tags vision-capable models, and tag-filtered routing keeps a document request on a vision model — a request that needs vision can never fall back to a text-only model by accident.

Can I run large batches of documents through the gateway?

Yes. Each document is a standard request through the OpenAI-compatible endpoint, so a batch job is a loop of those calls. Per-key RPM/TPM limits pace the batch, fallback chains absorb a provider blip mid-run, and caching skips identical pages.

How do I track the cost of a document-processing job?

LiteLLM reports the real cost of every call via the response-cost header. Tag a batch with a job id in request metadata and the dashboard attributes total spend to that job — so cost-per-document is a measured number.

What happens if a provider degrades in the middle of a batch?

The fallback chain retries the request on the next provider transparently. A 5xx, timeout, or circuit break on the primary triggers the next link; the batch continues from where it was, and the failover is logged.

Use Case · Document Processing

Extract and classify documents at volume.

Document workloads run vision-capable models over scans and PDFs in large batches. Nemo Router keeps every request on a vision model, paces the batch against provider quotas, and tracks cost per job.

batch · job-id invoices-0418

A document batch in flight

Documents: 4,200 pages
Tag filter: vision
Model: gemini-2.5-pro
Cache hits: 312 repeats
Fallback used: 2 pages
Job cost: tracked

vision-only · batch-paced · cost-tracked

Vision models: Tag-filtered
Batch jobs: Paced
Cost per job: Tracked
Catalog: 20+

Why Nemo for documents

The right model, at volume, with a cost number

Document processing has three constants: it needs vision, it runs in bulk, and it has a budget. Nemo Router handles all three behind one key.

Vision-capable models, by tag

Extracting fields from a scanned invoice or classifying a PDF needs a vision model. The catalog tags vision capability, and tag-filtered routing guarantees a document request lands on a vision-capable model.

Catalog tags surface vision-capable models
Tag-filtered routing — a vision request never falls back to text-only
Multi-tag filters combine vision with long-context for big PDFs
A mismatched tag set returns a 400, not a wrong model

Built for batch workloads

Document processing runs in bulk — thousands of pages per job. Each page is a standard request, so a batch is a loop. Per-key rate limits pace it, and failover absorbs a provider blip mid-run.

Each document is a standard chat.completions call
Per-key RPM/TPM limits pace the batch against provider quotas
Fallback chains keep a long batch alive through a degradation
Caching skips identical pages — boilerplate cover sheets, repeats
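
The caching bullet depends on two pages being byte-for-byte identical. The sketch below is a client-side illustration of that idea, not the gateway's cache: it fingerprints each page so you can estimate how many repeats a batch contains before sending it. The names `page_fingerprint` and `count_repeats` are hypothetical.

```python
import hashlib
from collections import Counter


def page_fingerprint(page_bytes: bytes) -> str:
    """Return a stable SHA-256 hex digest for a page's raw bytes."""
    return hashlib.sha256(page_bytes).hexdigest()


def count_repeats(pages: list[bytes]) -> int:
    """Count pages whose bytes exactly match an earlier page in the batch."""
    seen = Counter(page_fingerprint(p) for p in pages)
    # Every occurrence after the first of a given fingerprint is a repeat.
    return sum(n - 1 for n in seen.values())
```

A batch with many identical cover sheets will show a high repeat count, which is where caching is likely to pay off.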

Cost tracking per job

A batch job has a budget. LiteLLM reports the real cost of every call; tag the batch with a job id and the dashboard attributes total spend to that job — cost-per-document becomes a measured number.

Real per-call cost from the LiteLLM response-cost header
Tag a batch with a job id in request metadata
Spend analytics break down cost by job and model
Per-team and per-key budgets cap a runaway batch
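
To cross-check the dashboard figure on the client, per-call costs can be summed by job id. This is a minimal sketch under one assumption: that the gateway forwards LiteLLM's cost header as `x-litellm-response-cost`. `JobLedger` and `cost_from_headers` are hypothetical helpers, not part of any SDK.

```python
from collections import defaultdict
from typing import Mapping

# Assumption: the gateway forwards LiteLLM's cost header under this name.
COST_HEADER = "x-litellm-response-cost"


def cost_from_headers(headers: Mapping[str, str]) -> float:
    """Read the per-call cost from response headers; 0.0 if absent."""
    for name, value in headers.items():
        if name.lower() == COST_HEADER:
            return float(value)
    return 0.0


class JobLedger:
    """Accumulates spend and call counts per job id on the client side."""

    def __init__(self) -> None:
        self.spend = defaultdict(float)
        self.calls = defaultdict(int)

    def record(self, job_id: str, headers: Mapping[str, str]) -> None:
        """Add one call's cost to the running total for job_id."""
        self.spend[job_id] += cost_from_headers(headers)
        self.calls[job_id] += 1

    def cost_per_document(self, job_id: str) -> float:
        """Average cost per recorded call; 0.0 if nothing was recorded."""
        calls = self.calls[job_id]
        return self.spend[job_id] / calls if calls else 0.0
```

With the OpenAI Python SDK, response headers are reachable through `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` and `.parse()`.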

Failover for long-running batches

A batch can run for hours. A provider degradation halfway through shouldn’t mean restarting from page one. The fallback chain retries the request on the next link transparently and the run continues.

Ordered fallback chain per model group
Timeouts, 5xx, and circuit-breaks trigger the next link
Retries honor cooldown and provider rate-limit hints
Each fallback logged so you can see what degraded
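
The ordering logic in the list above can be pictured with a small local sketch. The gateway does this server-side; the code below only illustrates "try the next link and log it" and leaves out cooldowns and rate-limit hints. `ProviderError` and `call_with_fallback` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Stands in for a 5xx, timeout, or open circuit breaker."""


def call_with_fallback(
    chain: Sequence[Tuple[str, Callable[[str], str]]],
    document: str,
    log: List[str],
) -> str:
    """Try each provider in order; return the first successful result."""
    last_error: Optional[Exception] = None
    for name, call in chain:
        try:
            return call(document)
        except ProviderError as exc:
            # Record which link degraded before moving to the next one.
            log.append(f"fallback: {name} failed ({exc})")
            last_error = exc
    raise RuntimeError("all providers in the chain failed") from last_error
```

Only the failing document is retried on the next link, so the rest of the batch is unaffected.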

How it works

A document batch, end to end

A batch is a loop of standard requests. Each document routes to a vision-capable model, the loop is rate-paced and failover-protected, and total spend rolls up by job id.

Document batch flow

Batch job (job-id in metadata): Thousands of documents, from scans to PDFs to images.
Per document (one /chat/completions): Each page is a standard vision request.
Vision tag route (catalog · vision filter): Tag filter pins the request to a vision-capable model.
Rate-paced + failover (RPM/TPM · fallback): Limits pace the loop; failover survives a provider blip.
Cost per job (spend analytics): Total spend rolled up by the job-id tag.
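
On the client, the flow above reduces to a loop that backs off when the per-key limit pushes back. A minimal sketch, assuming the send function raises on a rate-limit response; `RateLimited` and `run_batch` are hypothetical names, and the delays are illustrative rather than recommended values.

```python
import time
from typing import Callable, Dict, Iterable


class RateLimited(Exception):
    """Stands in for an HTTP 429 from the gateway."""


def run_batch(
    documents: Iterable[str],
    send: Callable[[str], str],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Send each document once, backing off and retrying when rate limited."""
    results: Dict[str, str] = {}
    for doc in documents:
        for attempt in range(max_retries + 1):
            try:
                results[doc] = send(doc)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # Give up on this batch after the last retry.
                sleep(base_delay * 2 ** attempt)  # Exponential backoff.
    return results
```

With the OpenAI Python SDK, the exception to catch in place of `RateLimited` would be `openai.RateLimitError`.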

Tag-filtered routing is the safety rail — a request tagged vision can never silently fall back to a text-only model.

Tag-filtered routing

A vision request stays on a vision model

Capability tags

The catalog enforces the capability — not your code

Document work breaks if a request meant for a vision model lands on a text-only one. Tag a request with the capability it needs and the candidate pool is filtered to models that have it. Combine vision with long-context for multi-page PDFs; a mismatched tag set returns a 400 instead of a wrong answer.

Capability tags resolved from the model catalog at request time
Multi-tag intersection — e.g. vision AND long-context
Per-key default tags with a per-request override
Mismatched tag set returns 400 — never a wrong model
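
The filtering rule can be sketched locally as a set intersection. This illustrates the behavior described above, not the gateway's implementation: the catalog contents and model names are invented, and treating a per-request tag set as a full replacement for the key's defaults is an assumption.

```python
from typing import Dict, Iterable, List, Optional, Set


class TagMismatch(Exception):
    """No model satisfies the requested tag set; maps to an HTTP 400."""

    status_code = 400


# Hypothetical catalog; real tags come from the gateway's model catalog.
CATALOG: Dict[str, Set[str]] = {
    "vision-long": {"vision", "long-context"},
    "vision-basic": {"vision"},
    "text-only": {"long-context"},
}


def effective_tags(
    key_defaults: Iterable[str], request_tags: Optional[Iterable[str]] = None
) -> Set[str]:
    """Per-request tags replace the key's defaults when provided."""
    return set(request_tags) if request_tags is not None else set(key_defaults)


def candidate_pool(
    required: Iterable[str], catalog: Dict[str, Set[str]] = CATALOG
) -> List[str]:
    """Return models carrying every required tag, or raise TagMismatch."""
    needed = set(required)
    pool = [model for model, tags in catalog.items() if needed <= tags]
    if not pool:
        raise TagMismatch(f"no model has all of: {sorted(needed)}")
    return pool
```

Because the pool is computed before any model is chosen, a text-only model never becomes a fallback candidate for a vision request.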

routing · vision tag filter

Tag filter in action

Request tags: vision
Candidate pool: vision models
Text-only models: excluded
Chosen: gemini-2.5-pro
Wrong-model risk: none

capability-aware · multi-tag · safe fallback

The code

One request per document, in a loop

A document batch is a loop of standard chat.completions calls — image content in, structured fields out. These snippets are generated from the SDK examples the playground uses; add a job-id to metadata to attribute the batch.

Install: pip install openai

```python
# Cache: enabled (org default). Pass nemo_cache: false to skip.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    temperature=1,
    max_tokens=1024,
    top_p=1,
    messages=[
        {"role": "user", "content": "Hello! What models do you support?"},
    ],
    extra_body={
        # "nemo_cache": False,  # Uncomment to skip cache
    },
)

print(response.choices[0].message.content)
```

Tag each call with a job id in request metadata — spend analytics rolls the whole batch up to that job.
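
For a document page, the request carries image content plus the job id. The sketch below builds keyword arguments for `client.chat.completions.create()` using the standard chat-completions image format. The prompt text, the `build_page_request` helper, and the `metadata` / `job_id` keys inside `extra_body` are assumptions for illustration; check the gateway docs for the exact metadata field.

```python
import base64


def build_page_request(
    page_png: bytes, job_id: str, model: str = "gemini-2.5-flash"
) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    encoded = base64.b64encode(page_png).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Extract the invoice number and total as JSON.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{encoded}"},
                    },
                ],
            }
        ],
        # Assumption: job attribution travels in the body under these keys.
        "extra_body": {"metadata": {"job_id": job_id}},
    }
```

Usage would look like `client.chat.completions.create(**build_page_request(png_bytes, "invoices-0418"))`, called once per page inside the batch loop.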

FAQ

Common document-processing questions

Vision, at volume, with a cost number

Process documents without worrying about provider quotas

Tag-filtered vision routing, batch-friendly rate limits, failover, and per-job cost tracking — all unlocked on every plan.
