Skip to main content

Retry policies

Every agent runs its LLM call inside a RetryPolicy. When the response fails validation (malformed JSON for a typed agent, custom rule from ValidateOutput, anything else you signal), the policy re-prompts the LLM with the failure context.

RetryPolicy reference

public sealed class RetryPolicy
{
public int MaxRetries { get; } // default 3
public TimeSpan Delay { get; } // default 500 ms

public RetryPolicy(int maxRetries = 3, TimeSpan? delay = null);
}

The total number of attempts is MaxRetries + 1 (one initial call, plus up to MaxRetries retries). Delay between attempts is Delay × attemptNumber — linearly increasing.

Default policy

Every Agent<T> and AgentBase<T> defaults to new RetryPolicy() — 3 retries, 500 ms base delay. Total wait budget on a fully exhausted run: 0 + 500 + 1000 + 1500 = 3000 ms plus four LLM round-trips.

Tuning

Pass to Agent<T>:

IAgent agent = new Agent<string>(
"Strict",
"Picky agent.",
"Reply with exactly three bullet lines.",
llm,
retryPolicy: new RetryPolicy(maxRetries: 5, delay: TimeSpan.FromMilliseconds(200)));

Or override on AgentBase<T>:

protected override RetryPolicy RetryPolicy =>
new RetryPolicy(maxRetries: 5, delay: TimeSpan.FromMilliseconds(200));

When the agent gives up

If validation never passes, the policy throws AgentOutputException with the last raw response and last error. The admin then turns that into a RunFailedEvent (and the exception bubbles out of RunAsync).

try
{
var output = await admin.RunAsync(input);
}
catch (AgentOutputException ex)
{
// last response and last error are in ex.Message
log.Error(ex, "Agent never produced valid output.");
}

What triggers a retry

The framework retries when ValidateOutput returns a non-null string. The defaults are:

  • Agent<string> — never retries on content. JSON parsing isn't enforced.
  • Agent<TPoco> — retries when the response can't be extracted to a JSON object/array, or when deserialisation to TPoco fails.

You can add your own validation by overriding AgentBase<T>.ValidateOutput.

What does not trigger a retry

  • Network exceptions from the LLM provider. These bubble up directly. Wrap your own retry / circuit breaker around the call site if you want to retry transport failures.
  • Tool exceptions. A failing tool call is reported via ToolCallFailedEvent and re-thrown — RetryPolicy doesn't intervene. If you want the LLM to recover from a tool error, catch inside the tool and return an error message instead of throwing (see Custom tools (typed)).
  • BudgetExceededException. Spend cap reached — retrying would burn more budget. The exception escapes the policy.

Observability

Every retry fires LlmCallRetryingEvent on the bus. The default LogicGridLogger writes a [Agent] retry N — error line when ShowRetries = true (the default).

09:14:03.402 [WRN] [a3f2c891] [Strict] retry 1 — Expected exactly 3 bullets, got 4.

The trace records every retry under AgentSpan.RetryAttempts:

foreach (var attempt in trace.AgentSpans.SelectMany(s => s.RetryAttempts))
Console.WriteLine($"retry {attempt.AttemptNumber}: {attempt.Error}");

Common pitfalls

  • High MaxRetries + flaky validator. A non-deterministic validator (e.g. one that depends on external state) will retry until exhaustion every time. Keep validation pure.
  • Zero Delay in production. Most providers rate-limit. A retry storm with no delay can lock you out.
  • Retry to mask a bad prompt. If you're regularly hitting MaxRetries, fix the prompt or relax the validation — retries cost real tokens.