Rate limits
Per-workspace and per-endpoint limits, headers, and retry behavior.
UnifAPI enforces two layers of rate limiting: a per-workspace limit that smooths traffic across all endpoints, and a per-endpoint limit that mirrors what the upstream provider will tolerate.
Headers
Every response — 2xx or 429 — carries the standard headers:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Requests allowed in the current window |
| X-RateLimit-Remaining | Requests left in the current window |
| X-RateLimit-Reset | Unix timestamp (seconds) when the window resets |
| Retry-After | Seconds to wait before retrying (429 only) |
When both a workspace and an endpoint limit apply, the headers report whichever is tighter.
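As a rough illustration of reading these headers on the client, the sketch below parses them into a small object and computes how long remains until the window resets. The `RateLimitInfo` shape and the `parseRateLimit` and `msUntilReset` helper names are hypothetical and not part of UnifAPI; the header names are the ones from the table above.

```typescript
// Hypothetical helper types and names, for illustration only.
// Anything with a get(name) method works here, including the fetch Headers object.
type HeaderSource = { get(name: string): string | null };

interface RateLimitInfo {
  limit: number;                    // X-RateLimit-Limit
  remaining: number;                // X-RateLimit-Remaining
  resetAt: Date;                    // X-RateLimit-Reset, converted to a Date
  retryAfterSeconds: number | null; // Retry-After, null when the header is absent
}

function parseRateLimit(headers: HeaderSource): RateLimitInfo {
  // A missing header becomes NaN so callers can detect it.
  const num = (name: string): number => Number(headers.get(name) ?? NaN);
  const retryAfter = headers.get('Retry-After');
  return {
    limit: num('X-RateLimit-Limit'),
    remaining: num('X-RateLimit-Remaining'),
    // The reset header is a Unix timestamp in seconds; Date expects milliseconds.
    resetAt: new Date(num('X-RateLimit-Reset') * 1000),
    retryAfterSeconds: retryAfter === null ? null : Number(retryAfter),
  };
}

// Milliseconds until the window resets, never negative.
function msUntilReset(info: RateLimitInfo, now: number = Date.now()): number {
  return Math.max(0, info.resetAt.getTime() - now);
}
```

With a `fetch` response, `parseRateLimit(res.headers)` should work as-is. A client could, for example, pause for `msUntilReset(info)` once `info.remaining` reaches zero instead of waiting for a 429.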
Defaults
| Plan | Workspace limit | Per-endpoint limit |
|---|---|---|
| Free | 60 req/min | Whatever the upstream allows |
| Pay-as-you-go | 600 req/min | Whatever the upstream allows |
| Enterprise | Custom | Custom |
Handling 429
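    async function call(url: string, init: RequestInit, attempt = 0): Promise<Response> {
      const res = await fetch(url, init);
      // Return anything that is not a 429, and give up after five retries.
      if (res.status !== 429 || attempt >= 5) return res;
      // Retry-After is in seconds; fall back to 1 s if it is missing or not numeric.
      const parsed = Number(res.headers.get('Retry-After'));
      const retryAfter = Number.isFinite(parsed) && parsed > 0 ? parsed : 1;
      await new Promise(r => setTimeout(r, retryAfter * 1000));
      return call(url, init, attempt + 1);
    }

Two rules to live by: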
- Honor Retry-After. UnifAPI returns the exact value the upstream provider asked for; guessing usually makes things worse.
- Back off on repeated 429s. If you hit the limit five times in a row, the workspace is undersized for the workload. Upgrade or shard across workspaces.
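One way to combine the two rules is to treat Retry-After as a floor and let the wait grow on each consecutive 429. The sketch below is a minimal illustration, not UnifAPI's prescribed algorithm: `retryDelayMs`, `BASE_DELAY_MS`, and `MAX_DELAY_MS` are hypothetical names, and the constants are assumptions to tune for your workload.

```typescript
// Hypothetical tuning constants (assumptions, not UnifAPI values).
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 60000;

// attempt is the number of consecutive 429s seen so far, starting at 0.
function retryDelayMs(retryAfterHeader: string | null, attempt: number): number {
  const parsed = Number(retryAfterHeader);
  // Use the server's value only when it is a positive number of seconds.
  const serverMs =
    retryAfterHeader !== null && Number.isFinite(parsed) && parsed > 0
      ? parsed * 1000
      : 0;
  // Doubles on each consecutive 429: 1 s, 2 s, 4 s, ... capped at MAX_DELAY_MS.
  const backoffMs = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  // Never wait less than the server asked for.
  return Math.max(serverMs, backoffMs);
}
```

A retry loop could call `retryDelayMs(res.headers.get('Retry-After'), attempt)` and sleep for the result. Because the server value is a lower bound, the first rule still holds even when the backoff term is smaller.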
Burst behavior
Limits use a token bucket: a workspace that has been idle for a minute can briefly burst above its steady-state limit. Don't rely on this for production traffic; design for the steady-state rate.
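To make the burst behavior concrete, here is a minimal token-bucket sketch. It is a generic illustration of the technique, not UnifAPI's server implementation. The `TokenBucket` class is hypothetical, and the capacity of 10 with a refill of 1 token per second is an assumed example, since the actual bucket size is not documented here.

```typescript
// Generic token bucket: tokens refill at a steady rate up to a fixed capacity.
// Each request spends one token; an empty bucket means the request is rejected.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number = 0) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // an idle bucket starts full
    this.lastMs = nowMs;
  }

  // Returns true if a request at time nowMs is allowed.
  tryTake(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastMs) / 1000;
    // Refill for the elapsed time, but never above capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Assumed example values: capacity 10, refill 1 token per second (60 req/min).
const bucket = new TokenBucket(10, 1);
```

In this sketch a full bucket lets ten requests through at once, after which requests are admitted only as fast as tokens refill. That is why a burst after idle time succeeds while sustained traffic above the refill rate does not.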
Asking for more
Need a higher limit? Email support@unifapi.com with your workspace ID (in the dashboard) and a rough QPS estimate.