Five routing strategies
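Pick the strategy that fits your workload. Simple-shuffle picks uniformly at random among healthy endpoints, least-busy routes to the endpoint with the fewest in-flight requests, and the usage-based, latency-based, and cost-based strategies choose an endpoint from live signals: rate-limit headroom, observed latency, and cost.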
- simple-shuffle: uniform random choice across healthy endpoints
- least-busy: the endpoint with the fewest in-flight requests wins
- usage-based: distribute by remaining RPM/TPM (requests and tokens per minute) headroom
- latency-based: optimize for latency
- cost-based: optimize for cost
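
To make the differences concrete, here is a minimal sketch of how each strategy could select an endpoint. It is an illustration under stated assumptions, not the router's actual implementation: the `Endpoint` class, its field names, and the `pick` function are all hypothetical, and usage-based routing is simplified to "most remaining RPM headroom" (TPM would work the same way).

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    # Hypothetical stand-ins for the live signals a router might track.
    name: str
    healthy: bool = True
    in_flight: int = 0               # requests currently being served
    rpm_used: int = 0                # requests sent in the current minute
    rpm_limit: int = 100             # requests-per-minute cap
    avg_latency_s: float = 1.0       # recent average response time
    cost_per_1k_tokens: float = 1.0  # price in arbitrary units


def pick(endpoints, strategy, rng=random):
    """Return one healthy endpoint according to the named strategy."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        raise RuntimeError("no healthy endpoints available")

    if strategy == "simple-shuffle":
        # Uniform random choice; ignores every load signal.
        return rng.choice(healthy)
    if strategy == "least-busy":
        return min(healthy, key=lambda e: e.in_flight)
    if strategy == "usage-based":
        # Assumption: prefer the endpoint with the most RPM headroom left.
        return max(healthy, key=lambda e: e.rpm_limit - e.rpm_used)
    if strategy == "latency-based":
        return min(healthy, key=lambda e: e.avg_latency_s)
    if strategy == "cost-based":
        return min(healthy, key=lambda e: e.cost_per_1k_tokens)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Calling `pick(endpoints, "least-busy")` with a list of `Endpoint` objects returns the healthy endpoint with the lowest `in_flight` count. Unhealthy endpoints are filtered out before any strategy runs, so a fast or cheap endpoint that is down is never selected.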