LLM providers
anyclaude-sdk talks to any OpenAI- or Anthropic-compatible endpoint through a single small interface. Pick a transport client, point it at a base URL, pass a model — that's the whole integration. The agent loop, tools, and streaming work identically across providers.
The LLMClient interface
Everything the agent needs from a model is one method. You pass an LLMClient to query({ llm }); the SDK calls streamChat() once per turn, streams text through onToken, and reads back any tool calls.
interface LLMClient {
  streamChat(
    messages: ChatMsg[],
    opts: {
      model?: string                        // per-call model override
      tools?: ToolDef[]                     // OpenAI-shape function definitions
      signal?: AbortSignal                  // cancellation (Esc / abort)
      onToken: (delta: string) => void      // streamed text deltas
      onTool?: (calls: ToolCall[]) => void  // tool calls as they assemble
    }
  ): Promise<StreamResult>
}

type StreamResult = {
  text: string            // full assistant text for the turn
  toolCalls: ToolCall[]   // tool calls the model requested
  model: string
  usage?: Usage           // input/output tokens, cache reads
  stopReason?: StopReason
}
Because it's just an interface, you can also bring your own client: wrap any SDK or HTTP API that can stream text and emit tool calls, and the agent loop won't know the difference.
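Since the contract is a single method, a scripted stand-in is enough to exercise an agent loop without a network. The sketch below is self-contained under stated assumptions: the types are simplified local copies of the ones shown above (the real ChatMsg and ToolCall shapes are not spelled out on this page), and createScriptedClient is a hypothetical helper, not an SDK export.

```typescript
// Simplified local stand-ins for the SDK types (assumed shapes, for illustration only).
type ChatMsg = { role: 'system' | 'user' | 'assistant' | 'tool'; content: string }
type ToolCall = { id: string; name: string; arguments: string }
type StreamResult = { text: string; toolCalls: ToolCall[]; model: string; stopReason?: string }

interface LLMClient {
  streamChat(
    messages: ChatMsg[],
    opts: { model?: string; onToken: (delta: string) => void },
  ): Promise<StreamResult>
}

// Hypothetical helper: answers with canned text, one reply per turn, streamed word by word.
function createScriptedClient(replies: string[]): LLMClient {
  let turn = 0
  return {
    async streamChat(_messages, opts) {
      // Reuse the last reply once the script runs out.
      const reply = replies[Math.min(turn++, replies.length - 1)] ?? ''
      let text = ''
      // Split after each whitespace character so every word (with its space) is one delta.
      for (const piece of reply.split(/(?<=\s)/)) {
        opts.onToken(piece)
        text += piece
      }
      return { text, toolCalls: [], model: opts.model ?? 'scripted', stopReason: 'end_turn' }
    },
  }
}
```

A client like this can be handed to the loop wherever a real one would go, which makes it a convenient fixture for unit tests.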
Bring your own transport, reuse the SDK's codec
If your custom client speaks the OpenAI /chat/completions shape but you own the transport (a proxy, an encrypted RPC, a different URL), don't re-implement the ChatMsg → wire conversion — import it. anyclaude-sdk/llm exports the same mappers createOpenAIClient uses internally, plus consumeSSE and the inline tool-call parser, so your client stays in lockstep with the SDK's content-block handling (text / image / PDF document / tool_result) and never drifts.
import {
  toOpenAIMessages,      // ChatMsg[] → OpenAI messages[] (text/image/PDF/tool_result)
  consumeSSE,            // read an SSE body, one data: payload at a time
  parseInlineToolCalls,  // recover tool calls a weak model emitted as text
  type LLMClient, type ChatMsg, type StreamResult, // fully typed, no bare-root import
} from 'anyclaude-sdk/llm'

const myClient: LLMClient = {
  async streamChat(messages, opts) {
    const body = {
      model: opts.model,
      messages: toOpenAIMessages(messages),
      stream: true,
      ...(opts.tools?.length ? { tools: opts.tools, tool_choice: 'auto' } : {}),
    }
    const res = await myTransport.post('/chat', seal(body), { signal: opts.signal }) // your own transport
    let text = ''
    await consumeSSE(res.body, (data) => { /* accumulate deltas, call opts.onToken */ })
    return { text, toolCalls: [], model: opts.model ?? '', stopReason: 'end_turn' }
  },
}
Also exported: toOpenAIMessage (singular), blocksToOpenAIContent, blocksToText, the OpenAIChatMessage type, and the client types ToolCall / ToolDef / StopReason / Usage / ContentBlockParam, all from the browser-clean /llm subpath (no node: modules pulled).
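The callback passed to consumeSSE in the example above is left as a comment. One way to fill it in is a small accumulator over OpenAI-style streaming chunks. This is a sketch under assumptions: it treats each data: payload as a string (the exact callback signature of consumeSSE is not shown on this page), and StreamState, newStreamState and applyChunk are hypothetical names. The chunk layout it reads (choices[0].delta.content, tool-call fragments keyed by index, a final [DONE] sentinel) follows the public chat-completions streaming format.

```typescript
// Hypothetical accumulator for OpenAI-style streaming chunks; not an SDK export.
type PartialCall = { id: string; name: string; arguments: string }
type StreamState = { text: string; calls: PartialCall[]; finishReason?: string; done: boolean }

function newStreamState(): StreamState {
  return { text: '', calls: [], done: false }
}

// Apply one `data:` payload. Returns the text delta (possibly empty) so the
// caller can forward it to onToken.
function applyChunk(state: StreamState, data: string): string {
  if (data.trim() === '[DONE]') {
    state.done = true
    return ''
  }
  let chunk: any
  try {
    chunk = JSON.parse(data)
  } catch {
    return '' // skip malformed payloads instead of failing the whole turn
  }
  const choice = chunk?.choices?.[0]
  if (!choice) return ''
  if (choice.finish_reason) state.finishReason = choice.finish_reason

  // Tool calls arrive in fragments: id and name first, then pieces of the arguments string.
  for (const tc of choice.delta?.tool_calls ?? []) {
    const slot = (state.calls[tc.index ?? 0] ??= { id: '', name: '', arguments: '' })
    if (tc.id) slot.id = tc.id
    if (tc.function?.name) slot.name = tc.function.name
    if (tc.function?.arguments) slot.arguments += tc.function.arguments
  }

  const delta: string = typeof choice.delta?.content === 'string' ? choice.delta.content : ''
  state.text += delta
  return delta
}
```

Inside streamChat, the callback would then call applyChunk(state, data), pass any non-empty delta to opts.onToken, and build the returned text and toolCalls from state once the stream ends.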
Reliable tool use on cheap / open models (0.5.0)
Frontier models (GPT, Claude) emit clean native function-calls. The cheaper and open models people route to — Qwen, DeepSeek, Kimi/Moonshot, GLM, Mistral, Llama via Ollama — frequently don't: they narrate tool calls as text in their own "dialect," or emit malformed/incomplete argument JSON. anyclaude-sdk closes that gap with three layers so the same agent loop works across the long tail, not just the frontier.
1 · Tool-call dialects
When a model skips native tool_calls, the client recovers the call from text. Three pluggable dialects ship in anyclaude-sdk/llm:
| Dialect | Shape | Common in |
|---|---|---|
| xml-function | <function=write_file><parameter=path>…</parameter></function> | vLLM, many relays |
| hermes | <tool_call>{"name":…,"arguments":{…}}</tool_call> | Qwen, Hermes, Ollama models |
| json-fence | a ```json block with {"name":…,"arguments":…} | DeepSeek, Mistral, generic |
Use parseToolCalls(text, { dialects }) directly, or let the client do it automatically. The detector is conservative — ordinary JSON the model prints for the user is not misread as a tool call.
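To make the "conservative detector" idea concrete, here is a simplified standalone sketch of two of the dialects (hermes and json-fence). It is not the SDK's parser, and parseDialects, asToolCall and ParsedCall are hypothetical names; the real parseToolCalls may differ in signature and behaviour. The point it illustrates is the acceptance rule: a payload counts as a tool call only when it has a string name and an object of arguments, so ordinary JSON shown to the user is left alone.

```typescript
// Hypothetical, simplified dialect parser; the SDK's parseToolCalls may behave differently.
type ParsedCall = { name: string; arguments: Record<string, unknown> }
type Dialect = 'hermes' | 'json-fence'

// Accept a payload only if it has the tool-call shape: a string name and an object of arguments.
function asToolCall(raw: string): ParsedCall | null {
  try {
    const v = JSON.parse(raw)
    const ok =
      v !== null && typeof v === 'object' &&
      typeof v.name === 'string' &&
      v.arguments !== null && typeof v.arguments === 'object' && !Array.isArray(v.arguments)
    return ok ? { name: v.name, arguments: v.arguments } : null
  } catch {
    return null // not JSON at all, so not a tool call
  }
}

// Three backticks, built at runtime so this sample can itself sit inside a fenced block.
const FENCE = '`'.repeat(3)

const PATTERNS: Record<Dialect, RegExp> = {
  'hermes': /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/g,
  'json-fence': new RegExp(FENCE + 'json\\s*([\\s\\S]*?)' + FENCE, 'g'),
}

// Try each requested dialect in order and collect every payload that passes the shape check.
function parseDialects(text: string, dialects: Dialect[]): ParsedCall[] {
  const calls: ParsedCall[] = []
  for (const d of dialects) {
    for (const m of text.matchAll(PATTERNS[d])) {
      const call = asToolCall(m[1])
      if (call) calls.push(call)
    }
  }
  return calls
}
```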
2 · Model profiles (auto-detected)
The client picks per-model defaults — which dialects to try, tool_choice, parallel_tool_calls, and a sane temperature — from the model id. Pass profile to override, or omit it to auto-detect. Explicit options always win.
import { createOpenAIClient } from 'anyclaude-sdk/llm'

// auto-detected: 'qwen' profile → hermes/xml/json dialects, parallel off, temp 0.3
const llm = createOpenAIClient({ baseUrl: 'http://localhost:11434/v1', model: 'qwen2.5-coder:7b' })
// or force a profile / custom dialects:
const llm2 = createOpenAIClient({ baseUrl, model, profile: 'deepseek' })
const llm3 = createOpenAIClient({ baseUrl, model, toolDialects: ['hermes'] })
Built-ins: openai, anthropic (native, no fallback) · qwen, deepseek, moonshot (Kimi), zhipu (GLM), mistral, llama · generic (unknown models: full fallback + guidance). toolGuidancePrompt(tools) returns a short scaffolding prompt you can append for the weakest models.
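How a profile might be chosen from a model id, and how explicit options can override its defaults, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the matching hints, detectProfile and resolveOptions are hypothetical, and the SDK's actual detection rules are not documented on this page. Only the profile names come from the list above.

```typescript
// Hypothetical sketch of id-based profile detection; the SDK's real rules may differ.
type ProfileName =
  | 'openai' | 'anthropic' | 'qwen' | 'deepseek' | 'moonshot'
  | 'zhipu' | 'mistral' | 'llama' | 'generic'

// Assumed hints per profile. First match wins; unknown ids fall back to 'generic'.
const HINTS: Array<[ProfileName, RegExp]> = [
  ['openai', /(^|\/)(gpt-|o\d)/i],
  ['anthropic', /claude/i],
  ['qwen', /qwen/i],
  ['deepseek', /deepseek/i],
  ['moonshot', /kimi|moonshot/i],
  ['zhipu', /glm/i],
  ['mistral', /mistral|mixtral/i],
  ['llama', /llama/i],
]

function detectProfile(modelId: string): ProfileName {
  for (const [profile, hint] of HINTS) {
    if (hint.test(modelId)) return profile
  }
  return 'generic'
}

type Defaults = { temperature?: number; parallelToolCalls?: boolean }

// "Explicit options always win": spread the profile defaults first, then the caller's options.
function resolveOptions(profileDefaults: Defaults, explicit: Defaults): Defaults {
  return { ...profileDefaults, ...explicit }
}
```

Spreading the caller's options last is the simplest way to get the override order described above; a fuller version would also skip keys whose explicit value is undefined.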
3 · Self-healing argument repair
Before a tool runs, the loop validates the model's arguments against the tool schema. On malformed or incomplete JSON it does not execute with garbage — it returns a corrective is_error tool_result naming exactly what was wrong plus the expected schema, so the model retries with a valid call. On by default; set query({ repairToolCalls: false }) to opt out.
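The check described above can be sketched as a small pre-execution validator. This is an illustration, not the SDK's code: checkArguments, describe, ParamSchema and ToolResult are hypothetical names, the schema type is a reduced JSON-Schema-like shape, and only required-field presence and JSON well-formedness are checked.

```typescript
// Hypothetical sketch of pre-execution argument validation; not the SDK's implementation.
type ParamSchema = {
  properties: Record<string, { type: 'string' | 'number' | 'boolean' | 'object' }>
  required?: string[]
}
type ToolResult = { is_error: boolean; content: string }

// Render the schema as a short hint the model can read, e.g. "{ path: string (required) }".
function describe(schema: ParamSchema): string {
  const req = new Set(schema.required ?? [])
  const fields = Object.entries(schema.properties)
    .map(([key, p]) => `${key}: ${p.type}${req.has(key) ? ' (required)' : ''}`)
  return `{ ${fields.join(', ')} }`
}

// Returns null when the arguments are usable, or a corrective error result otherwise.
function checkArguments(tool: string, rawArgs: string, schema: ParamSchema): ToolResult | null {
  let args: unknown
  try {
    args = JSON.parse(rawArgs)
  } catch {
    return { is_error: true, content: `Arguments for "${tool}" were not valid JSON. Call it again. Expected: ${describe(schema)}` }
  }
  if (args === null || typeof args !== 'object' || Array.isArray(args)) {
    return { is_error: true, content: `Arguments for "${tool}" must be a JSON object. Expected: ${describe(schema)}` }
  }
  const given = args as Record<string, unknown>
  const missing = (schema.required ?? []).filter((key) => given[key] === undefined)
  if (missing.length > 0) {
    const names = missing.map((k) => `"${k}"`).join(', ')
    return { is_error: true, content: `Missing required argument for "${tool}": ${names}. Call it again including it. Expected: ${describe(schema)}` }
  }
  return null
}
```

The tool only runs when the result is null; otherwise the error result goes back to the model as the tool_result. A corrective message from the SDK looks like this: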
Missing required argument for "write_file": "content". Call it again
including it. Expected: { path: string (required), content: string (required) }
Prove it on your endpoints
Don't take our word for it — the repo ships a harness that runs the real loop against any models you list and prints a pass/fail matrix (native vs. with-anyclaude):
npm run build
node scripts/compat-matrix.mjs ./compat.config.json # your endpoints; keys via env:NAME
It scores each model on "called the tool and used the result." Models that fail native often pass once dialects + repair are on; that delta is the point. A GitHub Action regenerates the table on a schedule; current results live in COMPATIBILITY.md.
Three built-in transport clients
All three are exported from the package root and implement LLMClient. They differ only in the wire format they speak:
createOpenAIClient
POSTs to {baseUrl}/chat/completions. The workhorse — OpenAI, xAI, Groq, Together, OpenRouter, Ollama, Kilo, local servers.
createAnthropicClient
POSTs to {baseUrl}/messages with the Anthropic Messages shape (system extracted, anthropic-version header).
createResponsesClient
POSTs to {baseUrl}/responses (OpenAI Responses API). Sends full history each turn; store optional.
Options
createOpenAIClient(options) — the most featureful:
createOpenAIClient({
  apiKey?: string | (() => string | undefined), // static key OR a getter (key rotation)
  baseUrl?: string,                 // default https://api.openai.com/v1
  model?: string,                   // default model id (override per-call via opts.model)
  headers?: Record<string,string>,  // extra headers (Groq/OpenRouter/Together/xAI)
  temperature?: number,
  maxTokens?: number,               // → max_tokens
  reasoningEffort?: string,         // → reasoning_effort ('none' | 'low' | 'high' …) for reasoning models
  parallelToolCalls?: boolean,      // → parallel_tool_calls (only sent when tools are present)
})
createAnthropicClient takes apiKey, baseUrl, model, headers, temperature, maxTokens (default 4096), and anthropicVersion (default '2023-06-01'). createResponsesClient takes apiKey, baseUrl, model, headers, temperature, maxTokens (→ max_output_tokens), and store (default false, so full history is sent each turn).
All clients stream Server-Sent Events and request stream_options.include_usage, so token usage and cost are still reported for streamed responses. Keyless endpoints just omit apiKey, in which case no authorization header is sent.
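The apiKey handling described here (static string, getter, or omitted) can be pictured as a small header builder that runs on every request. This is a sketch with hypothetical names (buildHeaders, ApiKey), not the SDK's internals, and it assumes the OpenAI-style Authorization: Bearer scheme; Anthropic's own API expects an x-api-key header instead.

```typescript
// Hypothetical sketch of per-request header assembly; names are illustrative, not SDK exports.
type ApiKey = string | (() => string | undefined) | undefined

function buildHeaders(apiKey: ApiKey, extra: Record<string, string> = {}): Record<string, string> {
  // A getter is evaluated on every call, so each request can see a different key.
  const key = typeof apiKey === 'function' ? apiKey() : apiKey
  const headers: Record<string, string> = { 'content-type': 'application/json' }
  // Keyless endpoint (for example a local server): send no authorization header at all.
  if (key) headers['authorization'] = `Bearer ${key}`
  // Caller-supplied headers are applied last so they can add to or override the defaults.
  return { ...headers, ...extra }
}
```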
Provider recipes
Every recipe is just a client + a base URL + a model. Use it as the llm in query().
OpenAI
import { createOpenAIClient } from 'anyclaude-sdk'
const llm = createOpenAIClient({
  baseUrl: 'https://api.openai.com/v1',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
})

Anthropic
import { createAnthropicClient } from 'anyclaude-sdk'
const llm = createAnthropicClient({
  baseUrl: 'https://api.anthropic.com/v1',
  model: 'claude-sonnet-4-6',
  apiKey: process.env.ANTHROPIC_API_KEY,
  maxTokens: 8192,
})

xAI (Grok)
import { createOpenAIClient } from 'anyclaude-sdk'
// grok-4.x: 0 reasoning tokens (cheaper/faster) + parallel tools +
// a stable conv id for ~6× cheaper prefix-cached tokens.
const llm = createOpenAIClient({
  baseUrl: 'https://api.x.ai/v1',
  model: 'grok-4.3',
  apiKey: process.env.XAI_API_KEY,
  temperature: 0.3,
  reasoningEffort: 'none',
  parallelToolCalls: true,
  headers: { 'x-grok-conv-id': 'my-stable-run-id' },
})

Ollama
import { createOpenAIClient } from 'anyclaude-sdk'
const llm = createOpenAIClient({
  baseUrl: 'https://ollama.com/v1', // or http://localhost:11434/v1
  model: 'qwen3-coder-next',
  apiKey: process.env.OLLAMA_API_KEY, // omit for local Ollama
})

Kilo
import { createOpenAIClient } from 'anyclaude-sdk'
const llm = createOpenAIClient({
  baseUrl: 'https://api.kilo.ai/api/gateway',
  model: 'kilo-auto/free',
  apiKey: process.env.KILO_API_KEY,
})

Groq
import { createOpenAIClient } from 'anyclaude-sdk'
const llm = createOpenAIClient({
  baseUrl: 'https://api.groq.com/openai/v1',
  model: 'llama-3.3-70b-versatile',
  apiKey: process.env.GROQ_API_KEY,
})

Together
import { createOpenAIClient } from 'anyclaude-sdk'
const llm = createOpenAIClient({
  baseUrl: 'https://api.together.xyz/v1',
  model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  apiKey: process.env.TOGETHER_API_KEY,
})

OpenRouter
import { createOpenAIClient } from 'anyclaude-sdk'
const llm = createOpenAIClient({
  baseUrl: 'https://openrouter.ai/api/v1',
  model: 'anthropic/claude-sonnet-4',
  apiKey: process.env.OPENROUTER_API_KEY,
  headers: { 'HTTP-Referer': 'https://your.app', 'X-Title': 'Your App' },
})

Local server
import { createOpenAIClient } from 'anyclaude-sdk'
// llama.cpp server, LM Studio, vLLM: any OpenAI-compatible local server.
const llm = createOpenAIClient({
  baseUrl: 'http://localhost:8080/v1',
  model: 'local-model',
  // no apiKey needed
})
Inline tool-call parsing
Many relays and open models don't emit native function-calling blocks — they write tool calls as inline XML text in the assistant message:
<tool_call>
<function=write_file>
<parameter=path>index.html</parameter>
<parameter=content><!DOCTYPE html> ...</parameter>
</function>
</tool_call>
The OpenAI client detects and parses this format automatically (tolerant of missing closing tags and a missing <tool_call> wrapper), then strips the markup out of the user-visible text. The result: tools work even on endpoints with no native tool support, and your agent code is identical.
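A rough picture of what such tolerant parsing involves is sketched below. It is a simplified illustration and not the SDK's parser: parseInlineXml and InlineCall are hypothetical names, argument values are kept as trimmed strings, and edge cases such as markup inside parameter values are not handled.

```typescript
// Hypothetical, simplified parser for the inline XML shape above; not the SDK's parser.
type InlineCall = { name: string; arguments: Record<string, string> }

function parseInlineXml(text: string): { calls: InlineCall[]; visibleText: string } {
  const calls: InlineCall[] = []
  // A call starts at <function=NAME> (with an optional <tool_call> wrapper) and ends at
  // </function>, a bare </tool_call>, the start of the next call, or the end of the text.
  const callRe =
    /(?:<tool_call>\s*)?<function=([\w.-]+)>([\s\S]*?)(?:<\/function>(?:\s*<\/tool_call>)?|<\/tool_call>|(?=(?:<tool_call>\s*)?<function=)|$)/g
  const stripped = text.replace(callRe, (_match, name: string, body: string) => {
    const args: Record<string, string> = {}
    // A value ends at </parameter>, or at the next parameter / end of body if the close is missing.
    const paramRe = /<parameter=([\w.-]+)>([\s\S]*?)(?:<\/parameter>|(?=<parameter=)|$)/g
    for (const p of body.matchAll(paramRe)) args[p[1]] = p[2].trim()
    calls.push({ name, arguments: args })
    return '' // remove the markup from what the user sees
  })
  return { calls, visibleText: stripped.trim() }
}
```

Doing the extraction and the stripping in one replace pass keeps the two in sync: whatever was recognised as a call is exactly what disappears from the visible text.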
Key rotation
apiKey accepts a function evaluated per request, so you can spread calls across a pool of keys (handy for rate-limited free tiers):
const keys = ['key_a', 'key_b', 'key_c']
let i = 0
const nextKey = () => keys[i++ % keys.length]
const llm = createOpenAIClient({
  baseUrl: 'https://ollama.com/v1',
  model: 'qwen3-coder-next',
  apiKey: nextKey, // round-robin: every request uses the next key
})
Each streamChat call resolves the function fresh, so main-loop turns, sub-agents, and background tasks all rotate automatically.
Next
Tools →
Built-in tools, file reading, and your own custom tools.
Sandboxes & FS →
Where the agent's tools actually run.
API reference →
Every query() option and message type.