
Transient vs permanent AI error classification

generate-content retries AI generation and falls back to a commit-derived description when generation fails. Not all failures are worth retrying or hiding behind a generic fallback: a 401 Unauthorized means the API key is wrong — retrying wastes time and the fallback silently masks a configuration problem the user needs to fix.

Before this decision, all AI failures were treated uniformly: they either triggered retries or the fallback path, and none surfaced directly as user-visible configuration errors.

  • Treat all AI failures as transient: simple. Every failure retries and then falls back. The user never sees an auth error directly; they get a commit-derived description with no indication of why AI generation failed.
  • Treat all AI failures as permanent: conservative. Every failure surfaces as an error. This is safe for auth errors but breaks the fallback path for network blips and rate limits.
  • Classify by HTTP status or error reason: inspect the failure type and route accordingly. Permanent failures surface as AutoPrConfigError; transient failures continue to the fallback.

Chosen option: classify by HTTP status or error reason, via isTransientAiError in src/auto-pr/errors.ts.

| Error type | Condition | Classification |
| --- | --- | --- |
| AiProviderError | status 401 or 403 | Permanent: bad credentials or auth config |
| AiProviderError | status null, 429, 5xx, or other 4xx | Transient: network, rate limit, server error |
| AiError (Effect AI) | reason AuthenticationError | Permanent: invalid API key |
| AiError (Effect AI) | any other reason (including InvalidRequestError) | Transient |
| DescriptionParseError | any | Transient: schema decode failure; retry may succeed with a different model output |
| Anything else | any | Transient: unknown errors default to transient |

InvalidRequestError (HTTP 400) is classified as transient even though it is technically a client error. Local llama.cpp servers can return 400 for model-limitation reasons (e.g. context overflow on a specific prompt) that may not reproduce on retry or that the fallback path handles fine. Treating it as permanent would break the fallback for local model users.
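The classification rules above can be sketched as a small pure function. This is an illustrative sketch, not the code in src/auto-pr/errors.ts: the error shapes below (a `_tag` discriminator, a numeric `status`, a string `reason`) and the `tagOf` helper are assumptions made for the example, and the real AiProviderError and Effect AI AiError types may be structured differently.

```typescript
// Hypothetical error shapes; the real types may differ.
type AiProviderError = { readonly _tag: "AiProviderError"; readonly status: number | null };
type AiError = { readonly _tag: "AiError"; readonly reason: string };

// Hypothetical helper: read a string `_tag` from an unknown value, if present.
function tagOf(error: unknown): string | undefined {
  if (typeof error === "object" && error !== null && "_tag" in error) {
    const tag = (error as { _tag: unknown })._tag;
    return typeof tag === "string" ? tag : undefined;
  }
  return undefined;
}

function isTransientAiError(error: unknown): boolean {
  switch (tagOf(error)) {
    case "AiProviderError": {
      // 401/403 indicate bad credentials or auth config; retrying cannot help.
      // A null status (no HTTP response), 429, 5xx, and other 4xx stay transient.
      const { status } = error as AiProviderError;
      return status !== 401 && status !== 403;
    }
    case "AiError":
      // Only an authentication failure is permanent.
      // InvalidRequestError (HTTP 400) deliberately stays transient.
      return (error as AiError).reason !== "AuthenticationError";
    case "DescriptionParseError":
      // Schema decode failure: a different model output may parse.
      return true;
    default:
      // Unknown errors default to transient so the fallback path keeps working.
      return true;
  }
}
```

With these assumed shapes, `isTransientAiError({ _tag: "AiProviderError", status: 401 })` is false, while the same call with status 400, 429, 503, or null is true.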

  • Good: Auth errors surface immediately as AutoPrConfigError, naming the problem. Users don’t see a commit-derived description and wonder why AI generation silently failed.
  • Good: Transient errors (network, rate limit, server errors) continue to the existing retry-then-fallback path — no regression for those cases.
  • Good: isTransientAiError is a pure function, so it is straightforward to unit-test.
  • Neutral: The classification logic must be kept in sync with new AiError reason types as Effect AI evolves. Until a new reason is classified explicitly, the final return true treats it as transient, which deliberately favors keeping the fallback path working over surfacing an error.
  • Implementation: src/auto-pr/errors.ts (isTransientAiError), src/workflow/auto-pr-generate-content.ts (catchTags for AiError)
  • Related: ADR 0007 (AI provider abstraction), ADR 0011 (DiffToolkit and GitContext)
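The routing described in this record (permanent failures surface as a configuration error, transient failures retry and then fall back) might look roughly like the sketch below. It is written in plain synchronous TypeScript for brevity, whereas the actual workflow uses Effect and catchTags. The names `generateDescription`, `Outcome`, and `maxAttempts`, the shape of the AutoPrConfigError value, and its message text are all hypothetical.

```typescript
// Hypothetical shape for the configuration error; the real AutoPrConfigError may differ.
type AutoPrConfigError = {
  readonly _tag: "AutoPrConfigError";
  readonly message: string;
  readonly cause: unknown;
};

// Hypothetical result type describing which path produced the outcome.
type Outcome =
  | { readonly kind: "generated"; readonly description: string }
  | { readonly kind: "fallback"; readonly description: string }
  | { readonly kind: "config-error"; readonly error: AutoPrConfigError };

function generateDescription(
  generate: () => string,                   // AI generation; throws on failure
  fallback: () => string,                   // commit-derived description
  isTransient: (error: unknown) => boolean, // e.g. isTransientAiError
  maxAttempts = 3,                          // assumed retry budget
): Outcome {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { kind: "generated", description: generate() };
    } catch (error) {
      if (!isTransient(error)) {
        // Permanent failure: stop retrying and surface it instead of masking it.
        return {
          kind: "config-error",
          error: {
            _tag: "AutoPrConfigError",
            message: "AI generation failed because of a configuration problem",
            cause: error,
          },
        };
      }
      // Transient failure: fall through and try again.
    }
  }
  // Every attempt failed transiently; use the commit-derived description.
  return { kind: "fallback", description: fallback() };
}
```

Passing the classifier as a parameter keeps the routing independent of any particular error taxonomy, which also makes each branch easy to exercise in a test with a stub classifier.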