Retry policies

Every agent runs its LLM call inside a RetryPolicy. When the response fails validation (malformed JSON for a typed agent, custom rule from ValidateOutput, anything else you signal), the policy re-prompts the LLM with the failure context.

`RetryPolicy` reference

public sealed class RetryPolicy
{
    public int      MaxRetries { get; }     // default 3
    public TimeSpan Delay      { get; }     // default 500 ms

    public RetryPolicy(int maxRetries = 3, TimeSpan? delay = null);
}

The total number of attempts is MaxRetries + 1 (one initial call, plus up to MaxRetries retries). Delay between attempts is Delay × attemptNumber — linearly increasing.

Default policy

Every Agent<T> and AgentBase<T> defaults to new RetryPolicy() — 3 retries, 500 ms base delay. Total wait budget on a fully exhausted run: 0 + 500 + 1000 + 1500 = 3000 ms plus four LLM round-trips.

Tuning

Pass to Agent<T>:

IAgent agent = new Agent<string>(
    "Strict",
    "Picky agent.",
    "Reply with exactly three bullet lines.",
    llm,
    retryPolicy: new RetryPolicy(maxRetries: 5, delay: TimeSpan.FromMilliseconds(200)));

Or override on AgentBase<T>:

protected override RetryPolicy RetryPolicy =>
    new RetryPolicy(maxRetries: 5, delay: TimeSpan.FromMilliseconds(200));

When the agent gives up

If validation never passes, the policy throws AgentOutputException with the last raw response and last error. The admin then turns that into a RunFailedEvent (and the exception bubbles out of RunAsync).

try
{
    var output = await admin.RunAsync(input);
}
catch (AgentOutputException ex)
{
    // last response and last error are in ex.Message
    log.Error(ex, "Agent never produced valid output.");
}

What triggers a retry

The framework retries when ValidateOutput returns a non-null string. The defaults are:

Agent<string> — never retries on content. JSON parsing isn't enforced.
Agent<TPoco> — retries when the response can't be extracted to a JSON object/array, or when deserialisation to TPoco fails.

You can add your own validation by overriding AgentBase<T>.ValidateOutput.

What does not trigger a retry

Network exceptions from the LLM provider. These bubble up directly. Wrap your own retry / circuit breaker around the call site if you want to retry transport failures.
Tool exceptions. A failing tool call is reported via ToolCallFailedEvent and re-thrown — RetryPolicy doesn't intervene. If you want the LLM to recover from a tool error, catch inside the tool and return an error message instead of throwing (see Custom tools (typed)).
BudgetExceededException. Spend cap reached — retrying would burn more budget. The exception escapes the policy.

Observability

Every retry fires LlmCallRetryingEvent on the bus. The default LogicGridLogger writes a [Agent] retry N — error line when ShowRetries = true (the default).

09:14:03.402 [WRN] [a3f2c891] [Strict] retry 1 — Expected exactly 3 bullets, got 4.

The trace records every retry under AgentSpan.RetryAttempts:

foreach (var attempt in trace.AgentSpans.SelectMany(s => s.RetryAttempts))
    Console.WriteLine($"retry {attempt.AttemptNumber}: {attempt.Error}");

Common pitfalls

High MaxRetries + flaky validator. A non-deterministic validator (e.g. one that depends on external state) will retry until exhaustion every time. Keep validation pure.
Zero Delay in production. Most providers rate-limit. A retry storm with no delay can lock you out.
Retry to mask a bad prompt. If you're regularly hitting MaxRetries, fix the prompt or relax the validation — retries cost real tokens.

RetryPolicy reference​

Default policy​

Tuning​

When the agent gives up​

What triggers a retry​

What does not trigger a retry​

Observability​

Common pitfalls​