# OpenAI-compatible
Any service that exposes the OpenAI Chat Completions API works through this client. Most modern inference runtimes target the OpenAI shape, so the same configuration covers a wide range of hosted and self-hosted providers.
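Concretely, "the OpenAI shape" means a POST to {baseUrl}/chat/completions with a model and a list of messages. The sketch below shows that wire format directly, assuming a local vLLM server (the URL and model name are placeholders); the client builds and sends this request for you, so this is for illustration only.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;

// The shared wire format: POST {baseUrl}/chat/completions with model + messages.
// Hosted services additionally require an Authorization: Bearer header.
using var http = new HttpClient();
var response = await http.PostAsJsonAsync(
    "http://localhost:8000/v1/chat/completions",
    new
    {
        model = "Qwen/Qwen2.5-7B-Instruct",
        messages = new[] { new { role = "user", content = "Hello" } }
    });
Console.WriteLine(await response.Content.ReadAsStringAsync());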
## Verified integrations
| Service | Base URL pattern | Auth | Docs |
|---|---|---|---|
| vLLM | http://localhost:8000/v1 | None or token | docs.vllm.ai |
| LM Studio | http://localhost:1234/v1 | None | lmstudio.ai/docs |
| Text Generation Inference (TGI) | http://localhost:3000/v1 | None | huggingface.co/docs/text-generation-inference |
| DeepSeek | https://api.deepseek.com/v1 | Bearer | platform.deepseek.com |
| Groq | https://api.groq.com/openai/v1 | Bearer | console.groq.com |
| Together AI | https://api.together.xyz/v1 | Bearer | docs.together.ai |
| Mistral La Plateforme | https://api.mistral.ai/v1 | Bearer | docs.mistral.ai |
| Anyscale | https://api.endpoints.anyscale.com/v1 | Bearer | anyscale.com/endpoints |
| Fireworks | https://api.fireworks.ai/inference/v1 | Bearer | fireworks.ai/docs |
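Because every row above speaks the same protocol, switching providers is a configuration change rather than a code change. A minimal sketch, assuming you read the endpoint, model, and key from environment variables (the variable names here are examples, not part of the library):

```csharp
using System;
using LogicGrid.Core.Llm;

// Provider is selected entirely by configuration; swap env vars to swap backends.
var llm = LlmClientBase.Compatible(
    baseUrl: Environment.GetEnvironmentVariable("LLM_BASE_URL")
             ?? "http://localhost:8000/v1",
    model: Environment.GetEnvironmentVariable("LLM_MODEL")
           ?? "Qwen/Qwen2.5-7B-Instruct",
    apiKey: Environment.GetEnvironmentVariable("LLM_API_KEY")); // null for local runtimes
```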
## Use it
There are three ways to instantiate the OpenAI-compatible client. The first two produce equivalent clients; the third hands you control of the underlying HttpClient.
### Option 1 — static factory (recommended)
```csharp
using LogicGrid.Core.Llm;

// vLLM
var vllm = LlmClientBase.Compatible(
    baseUrl: "http://localhost:8000/v1",
    model: "Qwen/Qwen2.5-7B-Instruct");

// LM Studio
var lmstudio = LlmClientBase.Compatible(
    baseUrl: "http://localhost:1234/v1",
    model: "lmstudio-community/Llama-3.2-3B-Instruct-GGUF");

// DeepSeek
var deepseek = LlmClientBase.Compatible(
    baseUrl: "https://api.deepseek.com/v1",
    model: "deepseek-chat",
    apiKey: Environment.GetEnvironmentVariable("DEEPSEEK_KEY"));

// Groq
var groq = LlmClientBase.Compatible(
    baseUrl: "https://api.groq.com/openai/v1",
    model: "llama-3.3-70b-versatile",
    apiKey: Environment.GetEnvironmentVariable("GROQ_KEY"));
```
| Parameter | Type | Default | Notes |
|---|---|---|---|
| baseUrl | string | (required) | Service base URL — usually ends in /v1. |
| model | string | (required) | Model identifier expected by the runtime. |
| apiKey | string? | null | Optional Bearer token. Local runtimes (vLLM, LM Studio, TGI) typically don't need one; hosted services (DeepSeek, Groq, Together) do. |
### Option 2 — direct construction
```csharp
using LogicGrid.Core.Providers;

var llm = new OpenAiCompatibleClient(
    baseUrl: "http://localhost:8000/v1",
    defaultModel: "Qwen/Qwen2.5-7B-Instruct",
    apiKey: null);
```
| Parameter | Type | Default | Notes |
|---|---|---|---|
| baseUrl | string | (required) | Same as the factory's baseUrl. |
| defaultModel | string | (required) | The model used when the agent or call site doesn't override it. |
| apiKey | string? | null | Same as the factory's apiKey. |
The factory and the constructor produce equivalent clients. Use direct construction when you need an injected HttpClient (explained below) for retries, proxies, or testing.
### Option 3 — injected HttpClient
```csharp
using System.Net.Http;
using LogicGrid.Core.Providers;

var http = new HttpClient();
// http.DefaultRequestHeaders.Add("Authorization", "Bearer ...");

var llm = new OpenAiCompatibleClient(
    httpClient: http,
    baseUrl: "https://api.deepseek.com/v1",
    defaultModel: "deepseek-chat");
```
| Parameter | Type | Default | Notes |
|---|---|---|---|
| httpClient | HttpClient | (required) | Pre-configured client. Caller sets any Authorization header. |
| baseUrl | string | (required) | Same as above. |
| defaultModel | string | (required) | Same as above. |
This overload takes no apiKey — the caller is responsible for setting the Authorization: Bearer ... header on the supplied HttpClient (typically via IHttpClientFactory, a DelegatingHandler, or a test fake). Use it when auth is managed outside the client, or in unit tests with a mocked transport.
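One common way to manage auth outside the client is a DelegatingHandler that stamps the Bearer header onto every outgoing request. A sketch under that assumption (the handler is standard .NET; the constructor call matches Option 3 above):

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;
using LogicGrid.Core.Providers;

var http = new HttpClient(new BearerHandler(
    Environment.GetEnvironmentVariable("DEEPSEEK_KEY") ?? ""));
var llm = new OpenAiCompatibleClient(
    httpClient: http,
    baseUrl: "https://api.deepseek.com/v1",
    defaultModel: "deepseek-chat");

// Adds an Authorization: Bearer header to every request passing through.
sealed class BearerHandler : DelegatingHandler
{
    private readonly string _token;
    public BearerHandler(string token) : base(new HttpClientHandler()) => _token = token;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken ct)
    {
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", _token);
        return base.SendAsync(request, ct);
    }
}
```

The same handler slots into IHttpClientFactory registrations via AddHttpMessageHandler, which also gives you pooled connections and a natural place for retry policies.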
## Tool calling

Native tool calling depends on both the runtime and the model. Most OpenAI-compatible backends (vLLM, LM Studio, etc.) implement the OpenAI tool-call protocol, but the model itself has to actually emit tool calls. Stay on PromptSchemaStrategy (the default) and switch to native only after testing the specific runtime + model. See Tool calling strategy for the strategy reference and how to switch.
## Embeddings
Same two-way pattern. Works with TEI, vLLM running an embedding model, or any service that exposes /v1/embeddings.
### Option 1 — static factory (recommended)
```csharp
using LogicGrid.Memory.Embeddings;

var embedder = EmbeddingClientBase.Compatible(
    baseUrl: "http://localhost:3000", // TEI
    model: "BAAI/bge-large-en-v1.5");
```
| Parameter | Type | Default | Notes |
|---|---|---|---|
| baseUrl | string | (required) | Service base URL without the trailing /v1 — the client appends it. |
| model | string | (required) | Model identifier expected by the runtime. |
| dimensions | int | 0 | Vector size. 0 = auto-detect from the first response. |
| apiKey | string? | null | Optional Bearer token. |
### Option 2 — direct construction
Mirrors OpenAiCompatibleClient exactly: baseUrl first, then defaultModel.
```csharp
using LogicGrid.Memory.Embeddings;

var embedder = new OpenAiCompatibleEmbeddingClient(
    baseUrl: "http://localhost:3000",
    defaultModel: "BAAI/bge-large-en-v1.5",
    dimensions: 1024,
    apiKey: null);
```
| Parameter | Type | Default | Notes |
|---|---|---|---|
| baseUrl | string | (required) | Server base URL without the trailing /v1. |
| defaultModel | string | (required) | Embedding model used when the call site doesn't override it. |
| dimensions | int | 0 | Vector size. 0 = auto-detect from first response. |
| apiKey | string? | null | Optional Bearer token. |
A second overload accepts a custom HttpClient for retries, proxies, or DI — the caller is responsible for any required Authorization header:
```csharp
using System.Net.Http;
using LogicGrid.Memory.Embeddings;

var http = new HttpClient();
// http.DefaultRequestHeaders.Add("Authorization", "Bearer ...");

var embedder = new OpenAiCompatibleEmbeddingClient(
    httpClient: http,
    baseUrl: "http://localhost:3000",
    defaultModel: "BAAI/bge-large-en-v1.5",
    dimensions: 1024);
```
| Parameter | Type | Default | Notes |
|---|---|---|---|
| httpClient | HttpClient | (required) | Pre-configured client. Caller sets any Authorization header. |
| baseUrl | string | (required) | Same as above. |
| defaultModel | string | (required) | Same as above. |
| dimensions | int | 0 | Same as above. |
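For dependency-injection scenarios, the HttpClient overload pairs naturally with IHttpClientFactory. A sketch under that assumption (the "embeddings" client name and the TEI endpoint are examples, not library requirements):

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using LogicGrid.Memory.Embeddings;

var services = new ServiceCollection();

// Named HttpClient managed by IHttpClientFactory: pooled handlers,
// and a natural hook for timeouts, auth headers, or retry policies.
services.AddHttpClient("embeddings", c =>
{
    // c.DefaultRequestHeaders.Add("Authorization", "Bearer ...");
    c.Timeout = TimeSpan.FromSeconds(30);
});

services.AddSingleton(sp => new OpenAiCompatibleEmbeddingClient(
    httpClient: sp.GetRequiredService<IHttpClientFactory>().CreateClient("embeddings"),
    baseUrl: "http://localhost:3000",
    defaultModel: "BAAI/bge-large-en-v1.5",
    dimensions: 1024));
```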
## Compatibility caveats
| Issue | What to do |
|---|---|
| Service rejects unknown fields | Some compat layers reject parameters they don't support (such as temperature or max_tokens) instead of ignoring them. Check the runtime's docs. |
| No usage field in response | Cost tracking will be zero. Expected on most local runtimes — they don't bill. |
| Tool calls return as plain text | Stay on PromptSchemaStrategy. |
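When debugging compatibility problems, a quick first check is whether the server answers the standard GET /v1/models endpoint, which most of the runtimes above expose. A raw-HTTP sketch (the localhost URL is a placeholder):

```csharp
using System;
using System.Net.Http;

// Connectivity and compat probe: list the models the server advertises.
using var http = new HttpClient();
// Hosted services need: http.DefaultRequestHeaders.Add("Authorization", "Bearer ...");
var json = await http.GetStringAsync("http://localhost:8000/v1/models");
Console.WriteLine(json); // JSON listing of model ids the server will accept
```

If this call fails or returns a non-OpenAI payload, fix the base URL and auth before investigating client-side configuration.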