Retry policies
Every agent runs its LLM call inside a RetryPolicy. When the response
fails validation (malformed JSON for a typed agent, custom rule from
ValidateOutput, anything else you signal), the policy re-prompts the
LLM with the failure context.
RetryPolicy reference
public sealed class RetryPolicy
{
public int MaxRetries { get; } // default 3
public TimeSpan Delay { get; } // default 500 ms
public RetryPolicy(int maxRetries = 3, TimeSpan? delay = null);
}
The total number of attempts is MaxRetries + 1 (one initial call,
plus up to MaxRetries retries). Delay between attempts is
Delay × attemptNumber — linearly increasing.
Default policy
Every Agent<T> and AgentBase<T> defaults to
new RetryPolicy() — 3 retries, 500 ms base delay. Total wait
budget on a fully exhausted run: 0 + 500 + 1000 + 1500 = 3000 ms
plus four LLM round-trips.
Tuning
Pass to Agent<T>:
IAgent agent = new Agent<string>(
"Strict",
"Picky agent.",
"Reply with exactly three bullet lines.",
llm,
retryPolicy: new RetryPolicy(maxRetries: 5, delay: TimeSpan.FromMilliseconds(200)));
Or override on AgentBase<T>:
protected override RetryPolicy RetryPolicy =>
new RetryPolicy(maxRetries: 5, delay: TimeSpan.FromMilliseconds(200));
When the agent gives up
If validation never passes, the policy throws
AgentOutputException with the last raw response and last error.
The admin then turns that into a RunFailedEvent (and the
exception bubbles out of RunAsync).
try
{
var output = await admin.RunAsync(input);
}
catch (AgentOutputException ex)
{
// last response and last error are in ex.Message
log.Error(ex, "Agent never produced valid output.");
}
What triggers a retry
The framework retries when ValidateOutput returns a non-null
string. The defaults are:
Agent<string>— never retries on content. JSON parsing isn't enforced.Agent<TPoco>— retries when the response can't be extracted to a JSON object/array, or when deserialisation toTPocofails.
You can add your own validation by overriding AgentBase<T>.ValidateOutput.
What does not trigger a retry
- Network exceptions from the LLM provider. These bubble up directly. Wrap your own retry / circuit breaker around the call site if you want to retry transport failures.
- Tool exceptions. A failing tool call is reported via
ToolCallFailedEventand re-thrown —RetryPolicydoesn't intervene. If you want the LLM to recover from a tool error, catch inside the tool and return an error message instead of throwing (see Custom tools (typed)). BudgetExceededException. Spend cap reached — retrying would burn more budget. The exception escapes the policy.
Observability
Every retry fires LlmCallRetryingEvent on the bus. The default
LogicGridLogger writes a [Agent] retry N — error line when
ShowRetries = true (the default).
09:14:03.402 [WRN] [a3f2c891] [Strict] retry 1 — Expected exactly 3 bullets, got 4.
The trace records every retry under AgentSpan.RetryAttempts:
foreach (var attempt in trace.AgentSpans.SelectMany(s => s.RetryAttempts))
Console.WriteLine($"retry {attempt.AttemptNumber}: {attempt.Error}");
Common pitfalls
- High
MaxRetries+ flaky validator. A non-deterministic validator (e.g. one that depends on external state) will retry until exhaustion every time. Keep validation pure. - Zero
Delayin production. Most providers rate-limit. A retry storm with no delay can lock you out. - Retry to mask a bad prompt. If you're regularly hitting
MaxRetries, fix the prompt or relax the validation — retries cost real tokens.