# RAG pipeline
`RagPipeline` is the end-to-end retrieval-augmented generation
component. It chunks documents, embeds the chunks, stores them, and
retrieves the most relevant chunks at query time. Pair it with
`RagAgent` to answer questions over the corpus.
## End-to-end flow

Ingest happens once per document; search and `RagAgent` happen
per-query.
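In outline: ingest runs load → chunk → embed → upsert into the vector
store; at query time the query is embedded, the store returns the top-K
chunks, and `RagAgent` injects them into the system prompt before
answering.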
## Minimum example
```csharp
using LogicGrid.Core.Agents;
using LogicGrid.Core.Llm;
using LogicGrid.Memory.Embeddings;
using LogicGrid.Memory.VectorStores;
using LogicGrid.Rag;

// Embed with a local Ollama model; keep vectors in memory.
var embedder = new OllamaEmbeddingClient("nomic-embed-text");
var store = new InMemoryVectorStore();
var pipeline = new RagPipeline(embedder, store);

// Ingest once per document: load → chunk → embed → store.
await pipeline.IngestAsync("./docs/architecture.md");
await pipeline.IngestAsync("./docs/deployment.md");

// Answer questions over the ingested corpus.
var llm = LlmClientBase.Ollama("llama3.2");
IAgent agent = new RagAgent(llm, pipeline);

var answer = await agent.RunAsync(
    "How do I deploy to production?",
    new AgentContext());

Console.WriteLine(answer);
```
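The example assumes Ollama is running locally with the
`nomic-embed-text` and `llama3.2` models already pulled; any other
`EmbeddingClientBase` and `LlmClientBase` pair works the same way.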
## What `IngestAsync` does
- Load. Picks the right document loader for the file extension.
- Chunk. Splits the text with the configured chunker
  (`RecursiveTextChunker` by default).
- Embed. Batches each chunk through the embedder.
- Store. Upserts every chunk into the vector store with `source` and
  `chunkIndex` metadata.
Returns the number of chunks created. Re-ingestion is idempotent:
chunk IDs derive from `(source, chunkIndex)`, so re-ingesting a file
replaces its chunks rather than duplicating them.
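The exact ID scheme isn't spelled out here, but a deterministic
derivation along these lines (the `ChunkId` helper is hypothetical,
not the library's) is what makes the upsert idempotent:

```csharp
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch of a stable chunk ID: the same (source, chunkIndex)
// always hashes to the same ID, so re-ingesting a file upserts over the
// existing records instead of appending duplicates.
static string ChunkId(string source, int chunkIndex)
{
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes($"{source}:{chunkIndex}"));
    return Convert.ToHexString(bytes);
}
```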
## Search directly (no agent)
```csharp
IList<VectorSearchResult> hits =
    await pipeline.SearchAsync("query text", topK: 5);

foreach (var hit in hits)
    Console.WriteLine($"[{hit.Score:F2}] {hit.Document.Text}");
```
## Configuration
```csharp
public RagPipeline(
    EmbeddingClientBase embedder,
    IVectorStore vectorStore,
    IChunkingStrategy? chunker = null,
    RagPipelineOptions? options = null);
```
```csharp
var pipeline = new RagPipeline(
    embedder,
    store,
    chunker: new RecursiveTextChunker(maxChunkSize: 500, overlap: 50),
    options: new RagPipelineOptions
    {
        BatchSize = 32,
        IncludeSourceInMetadata = true,
        MaxConcurrentIngest = 4,
    });
```
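The `overlap` repeats a slice of text across adjacent chunk boundaries
so content that straddles a boundary stays retrievable from either
side; around 10% of `maxChunkSize`, as here, is a common starting
point.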
| Option | Default | Effect |
|---|---|---|
| `BatchSize` | 32 | Number of chunks sent to the embedder per batch. |
| `IncludeSourceInMetadata` | `true` | Adds `source` and `chunkIndex` keys to each chunk's metadata. |
| `MaxConcurrentIngest` | 4 | Maximum number of files ingested concurrently. |
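For bulk ingestion you can fan out over a directory yourself; whether
`MaxConcurrentIngest` throttles independent `IngestAsync` calls or only
applies inside a single call is worth verifying against your version:

```csharp
// Ingest every markdown file under ./docs and report total chunks created
// (IngestAsync returns the per-file chunk count, per the docs above).
var files = Directory.EnumerateFiles("./docs", "*.md");
var counts = await Task.WhenAll(files.Select(f => pipeline.IngestAsync(f)));
Console.WriteLine($"Ingested {counts.Sum()} chunks from {counts.Length} files.");
```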
## Persistent storage

With a persistent vector store, re-ingestion stays idempotent across
restarts:
```csharp
var store = new QdrantVectorStore(
    collectionName: "docs",
    dimensions: 768,
    baseUrl: "http://qdrant:6333");

var pipeline = new RagPipeline(embedder, store);
await pipeline.IngestAsync("./docs/architecture.md");
```
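`dimensions` must match the embedder's output size; `nomic-embed-text`
emits 768-dimensional vectors, hence 768 here.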
## Hybrid retrieval

Wrap your vector store in `HybridVectorStore` to add BM25 keyword
indexing alongside dense retrieval. Most production deployments want
this: it rescues queries that hinge on a specific term, code, or
proper noun that embeddings alone can miss.
```csharp
using LogicGrid.Memory.Search;

var inner = new InMemoryVectorStore();
var hybrid = new HybridVectorStore(inner);
var pipeline = new RagPipeline(embedder, hybrid);

await pipeline.IngestAsync("./docs/architecture.md");

IList<HybridSearchResult> hits = await pipeline.HybridSearchAsync(
    "How do I configure Qdrant?",
    topK: 5);
```
See Hybrid search.
## RagAgent

`RagAgent` is a ready-made agent that calls `pipeline.SearchAsync`,
formats the top chunks as system-prompt context, then answers:
```csharp
public sealed class RagAgent : AgentBase<string>
{
    public RagAgent(LlmClientBase llm, RagPipeline pipeline, int topK = 5);

    // Overrides RenderSystemPromptAsync to inject retrieved chunks.
}
```

```csharp
IAgent agent = new RagAgent(llm, pipeline, topK: 5);
```
If you need a different prompt or different behaviour, write your
own RAG agent by deriving from `AgentBase<string>` and overriding
`RenderSystemPromptAsync`. See Overriding `AgentBase<T>`.
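As a sketch of that pattern, here is a hypothetical citing variant.
The `RenderSystemPromptAsync` signature, the `base(llm)` constructor,
and the `Document.Metadata` indexer are all assumptions to check
against your version of the library; only `SearchAsync` is taken from
the documented surface above:

```csharp
using LogicGrid.Core.Agents;
using LogicGrid.Core.Llm;
using LogicGrid.Rag;

// Hypothetical sketch, not the library's implementation. Assumes:
// - AgentBase<string> exposes a base(llm) constructor,
// - RenderSystemPromptAsync(string, AgentContext) is the override point,
// - hit.Document.Metadata carries the "source" key written at ingest.
public sealed class CitingRagAgent : AgentBase<string>
{
    private readonly RagPipeline _pipeline;
    private readonly int _topK;

    public CitingRagAgent(LlmClientBase llm, RagPipeline pipeline, int topK = 5)
        : base(llm)
    {
        _pipeline = pipeline;
        _topK = topK;
    }

    protected override async Task<string> RenderSystemPromptAsync(
        string input, AgentContext context)
    {
        // Retrieve the top chunks and label each with its source file.
        var hits = await _pipeline.SearchAsync(input, topK: _topK);
        var sources = string.Join("\n\n", hits.Select(h =>
            $"[{h.Document.Metadata["source"]}] {h.Document.Text}"));
        return "Answer from the sources below and cite the [source] used.\n\n"
             + sources;
    }
}
```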