Rate limits

UnifAPI enforces two layers of rate limiting: a per-workspace limit that smooths traffic across all endpoints, and a per-endpoint limit that mirrors what the upstream provider will tolerate.

Headers

Every response, whether 2xx or 429, carries the standard rate-limit headers (`Retry-After` appears only on `429`):

Header	Meaning
`X-RateLimit-Limit`	Requests allowed in the current window
`X-RateLimit-Remaining`	Requests left in the current window
`X-RateLimit-Reset`	Unix timestamp (seconds) when the window resets
`Retry-After`	Seconds to wait before retrying (`429` only)

When both a workspace and an endpoint limit apply, the headers report whichever is tighter.
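As a rough illustration of how a client might use these headers, the sketch below parses them and works out how long to pause before the next request. The names `HeaderSource`, `RateLimitInfo`, `parseRateLimit`, and `msToWait` are hypothetical helpers, not part of any UnifAPI SDK. The only assumption is that the header values are plain decimal numbers, as the table above describes.

```typescript
// Minimal shape satisfied by the Fetch API's Headers object and by simple test doubles.
interface HeaderSource {
  get(name: string): string | null;
}

interface RateLimitInfo {
  limit: number;     // X-RateLimit-Limit
  remaining: number; // X-RateLimit-Remaining
  resetAt: number;   // X-RateLimit-Reset, Unix timestamp in seconds
}

// Returns null when any of the three headers is missing or not a finite number.
function parseRateLimit(headers: HeaderSource): RateLimitInfo | null {
  const read = (name: string): number => {
    const raw = headers.get(name);
    return raw === null || raw.trim() === '' ? NaN : Number(raw);
  };
  const limit = read('X-RateLimit-Limit');
  const remaining = read('X-RateLimit-Remaining');
  const resetAt = read('X-RateLimit-Reset');
  if (!Number.isFinite(limit) || !Number.isFinite(remaining) || !Number.isFinite(resetAt)) {
    return null;
  }
  return { limit, remaining, resetAt };
}

// Milliseconds to pause before the next request: zero while quota remains,
// otherwise the time left until the window resets.
function msToWait(info: RateLimitInfo, nowMs: number = Date.now()): number {
  if (info.remaining > 0) return 0;
  return Math.max(0, info.resetAt * 1000 - nowMs);
}
```

With the Fetch API, `parseRateLimit(res.headers)` can be called on any response. Because the headers already report the tighter of the two limits, the client does not need to know which layer it is up against.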

Defaults

Plan	Workspace limit	Per-endpoint limit
Free	60 req/min	Whatever the upstream allows
Pay-as-you-go	600 req/min	Whatever the upstream allows
Enterprise	Custom	Custom

Handling `429`

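// Retries a request on 429, waiting for the delay given in Retry-After.
async function call(url: string, init: RequestInit, attempt = 0): Promise<Response> {
  const res = await fetch(url, init);
  // Give up after 5 retries and hand the last response back to the caller.
  if (res.status !== 429 || attempt >= 5) return res;

  // Fall back to 1 second if the header is missing or not a plain number;
  // a NaN delay would make setTimeout fire immediately and retry in a tight loop.
  const header = res.headers.get('Retry-After');
  const parsed = header === null ? NaN : Number(header);
  const retryAfter = Number.isFinite(parsed) && parsed >= 0 ? parsed : 1;
  await new Promise(r => setTimeout(r, retryAfter * 1000));
  return call(url, init, attempt + 1);
}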

Two rules to live by:

Honor `Retry-After`. UnifAPI returns the exact value the upstream provider asked for; guessing usually makes things worse.
Back off on repeated `429`s. If you hit the limit five times in a row, the workspace is undersized for the workload. Upgrade or shard across workspaces.
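One way to combine the two rules is to treat `Retry-After` as a floor and let the delay grow with each consecutive `429`. The sketch below uses exponential backoff with full jitter, a common general-purpose technique rather than anything UnifAPI prescribes. The function name `retryDelayMs` and the defaults `baseMs = 500` and `capMs = 30000` are illustrative assumptions.

```typescript
// Delay in milliseconds before retry number `attempt` (0-based).
// retryAfterSec: the parsed Retry-After value, or null if absent or unparseable.
// baseMs and capMs are illustrative defaults, not values documented by UnifAPI.
function retryDelayMs(
  attempt: number,
  retryAfterSec: number | null,
  baseMs: number = 500,
  capMs: number = 30000,
  random: () => number = Math.random,
): number {
  // Exponential ceiling: baseMs * 2^attempt, capped at capMs.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter: pick uniformly below the ceiling so clients do not retry in lockstep.
  const jittered = random() * ceiling;
  // Never retry sooner than the server asked.
  const floorMs = retryAfterSec !== null && retryAfterSec >= 0 ? retryAfterSec * 1000 : 0;
  return Math.max(floorMs, jittered);
}
```

The `random` parameter exists only so the function can be tested deterministically; production callers can omit it.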

Burst behavior

Limits are token-bucket: a workspace that has been idle for a minute can briefly burst above its steady-state limit. Don't rely on this for production traffic — design for the steady-state rate.
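To make the burst behavior concrete, here is a minimal client-side model of a token bucket. It is a sketch of the general algorithm, not UnifAPI's server implementation: the `TokenBucket` class and its `capacity` and `refillPerSec` parameters are hypothetical, and the actual burst size UnifAPI allows is not stated here.

```typescript
class TokenBucket {
  private readonly capacity: number;     // maximum burst size (assumed value)
  private readonly refillPerSec: number; // steady-state rate
  private tokens: number;
  private lastMs: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // an idle bucket starts full
    this.lastMs = nowMs;
  }

  // Try to spend one token; returns true if the request may proceed.
  tryTake(nowMs: number = Date.now()): boolean {
    // Refill in proportion to elapsed time, never beyond capacity.
    const elapsedSec = Math.max(0, nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

For example, `new TokenBucket(60, 1)` models a 60 req/min limit: a full bucket lets 60 requests through back to back, after which requests succeed only at the refill rate of one per second. That is why a burst helps briefly but sustained traffic has to fit the steady-state rate.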

Asking for more

Need a higher limit? Email support@unifapi.com with your workspace ID (in the dashboard) and a rough QPS estimate.