Use Case · Document Processing

Extract and classify documents at volume.

Document workloads run vision-capable models over scans and PDFs in large batches. Nemo Router keeps every request on a vision model, paces the batch against provider quotas, and tracks cost per job.

batch · job-id invoices-0418

A document batch in flight

Documents: 4,200 pages
Tag filter: vision
Model: gemini-2.5-pro
Cache hits: 312 repeats
Fallback used: 2 pages
Job cost: tracked
vision-only · batch-paced · cost-tracked

Vision models: tag-filtered. Requests stay on vision-capable models.
Batch jobs: paced. Per-key RPM/TPM limits throttle the loop.
Cost per job: tracked. Spend is attributed by job-id metadata.
Catalog: 20+ models on Google Vertex AI.

Why Nemo for documents

The right model, at volume, with a cost number

Document processing has three constants: it needs vision, it runs in bulk, and it has a budget. Nemo Router handles all three behind one key.

Vision-capable models, by tag

Extracting fields from a scanned invoice or classifying a PDF needs a vision model. The catalog tags vision capability, and tag-filtered routing guarantees a document request lands on a vision-capable model.

  • Catalog tags surface vision-capable models
  • Tag-filtered routing — a vision request never falls back to text-only
  • Multi-tag filters combine vision with long-context for big PDFs
  • A mismatched tag set returns a 400, not a wrong model

Built for batch workloads

Document processing runs in bulk — thousands of pages per job. Each page is a standard request, so a batch is a loop. Per-key rate limits pace it, and failover absorbs a provider blip mid-run.

  • Each document is a standard chat.completions call
  • Per-key RPM/TPM limits pace the batch against provider quotas
  • Fallback chains keep a long batch alive through a degradation
  • Caching skips identical pages — boilerplate cover sheets, repeats
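
The loop described above can be sketched on the client side. This is a minimal sketch under stated assumptions, not Nemo Router's implementation: `run_batch`, `send_page`, and the `rpm` value are hypothetical names, and the fixed interval between calls is only one simple way to stay under a per-key RPM limit. The router's own limits would still apply server-side.

```python
import time
from typing import Callable, Iterable, List


def run_batch(
    pages: Iterable[str],
    send_page: Callable[[str], str],
    rpm: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Send one request per page, spacing request starts to at most `rpm` per minute."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    min_interval = 60.0 / rpm  # seconds between request starts
    results: List[str] = []
    next_start = clock()
    for page in pages:
        wait = next_start - clock()
        if wait > 0:
            sleep(wait)  # pace the loop instead of bursting into a rate limit
        next_start = max(clock(), next_start) + min_interval
        results.append(send_page(page))  # hypothetical: wraps one chat.completions call
    return results
```

In practice `send_page` would wrap a single `client.chat.completions.create(...)` call. The `sleep` and `clock` parameters are injectable so the pacing logic can be exercised without real waiting.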

Cost tracking per job

A batch job has a budget. LiteLLM reports the real cost of every call; tag the batch with a job id and the dashboard attributes total spend to that job — cost-per-document becomes a measured number.

  • Real per-call cost from the LiteLLM response-cost header
  • Tag a batch with a job id in request metadata
  • Spend analytics break down cost by job and model
  • Per-team and per-key budgets cap a runaway batch
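
To make the cost-per-document idea concrete, here is a rough sketch of rolling per-call costs up by job id on the client. The header name `x-litellm-response-cost` is an assumption (the page only says "the LiteLLM response-cost header"), and `cost_from_headers` and `cost_per_document` are hypothetical helpers; the dashboard's spend analytics presumably does the equivalent for you.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Assumed header name; confirm against the headers your deployment returns.
COST_HEADER = "x-litellm-response-cost"


def cost_from_headers(headers: Mapping[str, str]) -> float:
    """Read the per-call cost in USD from response headers, or 0.0 if absent."""
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get(COST_HEADER)
    return float(raw) if raw is not None else 0.0


def cost_per_document(
    calls: Iterable[Tuple[str, Mapping[str, str]]],
) -> Dict[str, float]:
    """Average cost per call for each job id, given (job_id, headers) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for job_id, headers in calls:
        totals[job_id] += cost_from_headers(headers)
        counts[job_id] += 1
    return {job: totals[job] / counts[job] for job in totals}
```

With the OpenAI Python SDK, response headers are available through `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` and `.parse()`.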

Failover for long-running batches

A batch can run for hours. A provider degradation halfway through shouldn’t mean restarting from page one. The fallback chain retries the next link transparently and the run continues.

  • Ordered fallback chain per model group
  • Timeouts, 5xx, and circuit-breaks trigger the next link
  • Retries honor cooldown and provider rate-limit hints
  • Each fallback logged so you can see what degraded
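
The ordered-chain behavior listed above happens inside the router, but its semantics can be illustrated with a small client-side sketch. Everything here is hypothetical (`ProviderError`, `call_with_fallback`, and the model names in the usage note); it only shows "try the next link, record what failed", not cooldowns or rate-limit hints.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Stand-in for a timeout or 5xx from one link in the chain."""


def call_with_fallback(
    chain: Sequence[str],
    call_model: Callable[[str], str],
) -> Tuple[str, str, List[str]]:
    """Try each model in order; return (model_used, result, failed_models)."""
    failed: List[str] = []
    for model in chain:
        try:
            return model, call_model(model), failed
        except ProviderError:
            failed.append(model)  # record what degraded, then try the next link
    raise ProviderError(f"all links failed: {failed}")
```

Called as `call_with_fallback(["primary", "secondary"], call_model)`, a failure on `primary` is recorded in the returned list and the page is answered by `secondary`, so the batch does not restart.
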
How it works

A document batch, end to end

A batch is a loop of standard requests. Each document routes to a vision-capable model, the loop is rate-paced and failover-protected, and total spend rolls up by job id.

Document batch flow

  1. Batch job

    job-id in metadata

    Thousands of documents — scans, PDFs, images.

  2. Per document

    one /chat/completions

    Each page is a standard vision request.

  3. Vision tag route

    catalog · vision filter

    Tag filter pins the request to a vision-capable model.

  4. Rate-paced + failover

    RPM/TPM · fallback

    Limits pace the loop; failover survives a provider blip.

  5. Cost per job

    spend analytics

    Total spend rolled up by the job-id tag.

Tag-filtered routing is the safety rail — a request tagged vision can never silently fall back to a text-only model.

Tag-filtered routing

A vision request stays on a vision model

Capability tags

The catalog enforces the capability — not your code

Document work breaks if a request meant for a vision model lands on a text-only one. Tag a request with the capability it needs and the candidate pool is filtered to models that have it. Combine vision with long-context for multi-page PDFs; a mismatched tag set returns a 400 instead of a wrong answer.

  • Capability tags resolved from the model catalog at request time
  • Multi-tag intersection — e.g. vision AND long-context
  • Per-key default tags with a per-request override
  • Mismatched tag set returns 400 — never a wrong model
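
The multi-tag intersection can be pictured as a set filter over the catalog. The sketch below is illustrative only: the catalog entries and their tags are made up, and `candidate_pool` and `TagMismatch` are hypothetical names standing in for the router's server-side behavior.

```python
from typing import Dict, List, Set


class TagMismatch(Exception):
    """Stand-in for the 400 returned when no model satisfies the tag set."""

    status_code = 400


def candidate_pool(catalog: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Return the models whose tags include every required tag."""
    pool = sorted(name for name, tags in catalog.items() if required <= tags)
    if not pool:
        # Fail loudly rather than route to a model lacking a capability.
        raise TagMismatch(f"no model has all of: {sorted(required)}")
    return pool
```

For example, with an illustrative catalog where one model carries both `vision` and `long-context`, requesting both tags narrows the pool to that model, while a tag set no model satisfies raises instead of returning a text-only model.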

routing · vision tag filter

Tag filter in action

Request tags: vision
Candidate pool: vision models
Text-only models: excluded
Chosen: gemini-2.5-pro
Wrong-model risk: none
capability-aware · multi-tag · safe fallback

The code

One request per document, in a loop

A document batch is a loop of standard chat.completions calls — image content in, structured fields out. These snippets are generated from the SDK examples the playground uses; add a job-id to metadata to attribute the batch.

Install: pip install openai

# Cache: enabled (org default). Pass nemo_cache: false to skip.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    temperature=1,
    max_tokens=1024,
    top_p=1,
    messages=[
        {"role": "user", "content": "Hello! What models do you support?"},
    ],
    extra_body={
        # "nemo_cache": False,  # Uncomment to skip cache
    },
)

print(response.choices[0].message.content)

Tag each call with a job id in request metadata — spend analytics rolls the whole batch up to that job.
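
For a document page rather than a text prompt, the request might be assembled roughly as follows. The image part uses the standard OpenAI chat-completions `image_url` content format; the `metadata` layout (`job_id`, `tags`) passed through `extra_body` is an assumption, since this page does not spell out the schema, and `build_page_request` is a hypothetical helper.

```python
import base64
from typing import Any, Dict


def build_page_request(
    image_bytes: bytes,
    job_id: str,
    model: str = "gemini-2.5-pro",
) -> Dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create for one PNG page."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Extract the invoice number, date, and total as JSON.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{encoded}"},
                    },
                ],
            }
        ],
        # Assumed field names: check the Nemo Router docs for the real schema.
        "extra_body": {"metadata": {"job_id": job_id, "tags": ["vision"]}},
    }
```

A page would then be sent with `client.chat.completions.create(**build_page_request(png_bytes, "invoices-0418"))`, reusing the `client` configured above.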

Vision, at volume, with a cost number

Process documents without minding provider quotas

Tag-filtered vision routing, batch-friendly rate limits, failover, and per-job cost tracking — all unlocked on every plan.