
RAG pipeline

RagPipeline is the end-to-end retrieval-augmented generation component. It chunks documents, embeds the chunks, stores them, and retrieves the most relevant chunks at query time. Pair it with RagAgent to answer questions over the corpus.

End-to-end flow

Ingestion runs once per document; SearchAsync and RagAgent run on every query.

Minimum example

using LogicGrid.Core.Agents;
using LogicGrid.Core.Llm;
using LogicGrid.Memory.Embeddings;
using LogicGrid.Memory.VectorStores;
using LogicGrid.Rag;

var embedder = new OllamaEmbeddingClient("nomic-embed-text");
var store = new InMemoryVectorStore();
var pipeline = new RagPipeline(embedder, store);

await pipeline.IngestAsync("./docs/architecture.md");
await pipeline.IngestAsync("./docs/deployment.md");

var llm = LlmClientBase.Ollama("llama3.2");
IAgent agent = new RagAgent(llm, pipeline);

var answer = await agent.RunAsync(
    "How do I deploy to production?",
    new AgentContext());

Console.WriteLine(answer);

What IngestAsync does

  1. Load. Picks the right document loader for the file extension.
  2. Chunk. Splits the text with the configured chunker (RecursiveTextChunker by default).
  3. Embed. Batches each chunk through the embedder.
  4. Store. Upserts every chunk into the vector store with source + chunkIndex metadata.

IngestAsync returns the number of chunks created. Re-ingestion is idempotent because chunk IDs derive from (source, chunkIndex).
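The idempotency guarantee amounts to deterministic ID derivation. The helper below is a sketch of how such an ID could be computed; the scheme RagPipeline actually uses internally may differ:

using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: derive a stable chunk ID from (source, chunkIndex).
static string ChunkId(string source, int chunkIndex)
{
    // Hashing the pair means the same file always yields the same IDs,
    // so upserts overwrite existing chunks instead of duplicating them.
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes($"{source}#{chunkIndex}"));
    return Convert.ToHexString(bytes)[..16];
}

Because ChunkId("./docs/architecture.md", 0) is the same on every run, re-ingesting a file upserts its chunks in place rather than growing the store.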

Search directly (no agent)

IList<VectorSearchResult> hits =
    await pipeline.SearchAsync("query text", topK: 5);

foreach (var hit in hits)
    Console.WriteLine($"[{hit.Score:F2}] {hit.Document.Text}");
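With IncludeSourceInMetadata enabled (the default), each hit carries its provenance. Assuming the metadata is exposed as a string-keyed dictionary on the document (an assumption about the result shape, not a documented property), a citation-style printout might look like:

// Assumes hit.Document.Metadata is a string-keyed dictionary -- adjust to
// the actual VectorSearchResult shape in your version of the library.
foreach (var hit in hits)
{
    var source = hit.Document.Metadata["source"];     // e.g. "./docs/architecture.md"
    var index = hit.Document.Metadata["chunkIndex"];  // position within that file
    Console.WriteLine($"[{hit.Score:F2}] {source} (chunk {index})");
}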

Configuration

public RagPipeline(
    EmbeddingClientBase embedder,
    IVectorStore vectorStore,
    IChunkingStrategy? chunker = null,
    RagPipelineOptions? options = null)

var pipeline = new RagPipeline(
    embedder,
    store,
    chunker: new RecursiveTextChunker(maxChunkSize: 500, overlap: 50),
    options: new RagPipelineOptions
    {
        BatchSize = 32,
        IncludeSourceInMetadata = true,
        MaxConcurrentIngest = 4,
    });
Option                    Default  Effect
BatchSize                 32       Embedding requests per batch.
IncludeSourceInMetadata   true     Add source + chunkIndex keys to each chunk's metadata.
MaxConcurrentIngest       4        Files ingested concurrently.

Persistent storage

Swap in a persistent vector store; re-ingestion stays idempotent:

var store = new QdrantVectorStore(
    collectionName: "docs",
    dimensions: 768,
    baseUrl: "http://qdrant:6333");

var pipeline = new RagPipeline(embedder, store);
await pipeline.IngestAsync("./docs/architecture.md");

Hybrid retrieval

Wrap your vector store in HybridVectorStore to add BM25 keyword indexing alongside dense retrieval. Most production deployments want this — it rescues queries that hinge on a specific term, code, or proper noun.

using LogicGrid.Memory.Search;

var inner = new InMemoryVectorStore();
var hybrid = new HybridVectorStore(inner);
var pipeline = new RagPipeline(embedder, hybrid);

await pipeline.IngestAsync("./docs/architecture.md");

IList<HybridSearchResult> hits = await pipeline.HybridSearchAsync(
    "How do I configure Qdrant?",
    topK: 5);

See Hybrid search.
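For intuition, hybrid stores typically merge the dense and keyword result lists with a rank-based fusion such as reciprocal rank fusion (RRF). Whether HybridVectorStore uses RRF specifically is an assumption here; the sketch below shows the general technique:

using System.Collections.Generic;
using System.Linq;

// Reciprocal rank fusion: score each ID by 1 / (k + rank) in every result
// list it appears in, then sort by the summed score. IDs found near the top
// of either list -- dense or BM25 -- rise to the top of the fused list.
static List<string> FuseRrf(List<string> denseIds, List<string> keywordIds, int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var list in new[] { denseIds, keywordIds })
        for (int rank = 0; rank < list.Count; rank++)
            scores[list[rank]] = scores.GetValueOrDefault(list[rank]) + 1.0 / (k + rank + 1);

    return scores.OrderByDescending(kv => kv.Value).Select(kv => kv.Key).ToList();
}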

RagAgent

RagAgent is a ready-made agent that calls pipeline.SearchAsync, formats the top chunks as system-prompt context, then answers:

public sealed class RagAgent : AgentBase<string>
{
    public RagAgent(LlmClientBase llm, RagPipeline pipeline, int topK = 5);
    // overrides RenderSystemPromptAsync to inject retrieved chunks
}

IAgent agent = new RagAgent(llm, pipeline, topK: 5);

If you need a different prompt or different behaviour, write your own RAG agent by deriving from AgentBase<string> and overriding RenderSystemPromptAsync. See Overriding AgentBase<T>.
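As a starting point, a custom agent might look like the sketch below. The AgentBase<string> surface -- the base constructor and the exact signature of RenderSystemPromptAsync -- is assumed from the snippet above, so check Overriding AgentBase<T> for the real contract:

// Hypothetical custom RAG agent: retrieves more chunks and uses a stricter prompt.
// The base-class constructor and override signature are assumptions.
public sealed class CitingRagAgent : AgentBase<string>
{
    private readonly RagPipeline _pipeline;

    public CitingRagAgent(LlmClientBase llm, RagPipeline pipeline) : base(llm)
        => _pipeline = pipeline;

    protected override async Task<string> RenderSystemPromptAsync(string input)
    {
        // Retrieve context, then instruct the model to answer only from it.
        var hits = await _pipeline.SearchAsync(input, topK: 8);
        var context = string.Join("\n---\n", hits.Select(h => h.Document.Text));
        return $"Answer strictly from the context below; say if it is missing.\n{context}";
    }
}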