LLM providers

anyclaude-sdk talks to any OpenAI- or Anthropic-compatible endpoint through a single small interface. Pick a transport client, point it at a base URL, pass a model — that's the whole integration. The agent loop, tools, and streaming work identically across providers.

The `LLMClient` interface

Everything the agent needs from a model is one method. You pass an LLMClient to query({ llm }); the SDK calls streamChat() once per turn, streams text through onToken, and reads back any tool calls.

interface LLMClient {
  streamChat(
    messages: ChatMsg[],
    opts: {
      model?: string                       // per-call model override
      tools?: ToolDef[]                    // OpenAI-shape function definitions
      signal?: AbortSignal                 // cancellation (Esc / abort)
      onToken: (delta: string) => void     // streamed text deltas
      onTool?: (calls: ToolCall[]) => void  // tool calls as they assemble
    }
  ): Promise<StreamResult>
}

type StreamResult = {
  text: string            // full assistant text for the turn
  toolCalls: ToolCall[]   // tool calls the model requested
  model: string
  usage?: Usage           // input/output tokens, cache reads
  stopReason?: StopReason
}

Because it's just an interface, you can also bring your own client — wrap any SDK or HTTP API that can stream text and emit tool calls, and the agent loop won't know the difference.
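As a minimal sketch of bringing your own client, the toy LLMClient below "answers" by echoing the last user message back one word at a time. The ChatMsg, ToolCall, StreamResult, and LLMClient types here are simplified local stand-ins (assumed shapes, not the SDK's real definitions) so the snippet runs on its own. In real code you would import the types from anyclaude-sdk/llm and call your provider inside streamChat.

```typescript
// Simplified stand-ins for the SDK types (assumed shapes, for illustration only).
type ChatMsg = { role: 'system' | 'user' | 'assistant' | 'tool'; content: string }
type ToolCall = { id: string; name: string; arguments: string }
type StreamResult = { text: string; toolCalls: ToolCall[]; model: string; stopReason?: string }
interface LLMClient {
  streamChat(
    messages: ChatMsg[],
    opts: {
      model?: string
      signal?: { readonly aborted: boolean } // structural stand-in for AbortSignal
      onToken: (delta: string) => void
    },
  ): Promise<StreamResult>
}

// A hypothetical "echo" model: streams the last user message back, one word per delta.
const echoClient: LLMClient = {
  async streamChat(messages, opts) {
    const lastUser = messages.slice().reverse().find((m) => m.role === 'user')
    const words = (lastUser ? lastUser.content : '').split(/\s+/).filter((w) => w.length > 0)
    let text = ''
    for (let i = 0; i < words.length; i++) {
      if (opts.signal && opts.signal.aborted) break // honour cancellation between deltas
      const delta = i === 0 ? words[i] : ' ' + words[i]
      text += delta
      opts.onToken(delta) // stream each delta to the caller
    }
    return { text, toolCalls: [], model: opts.model ?? 'echo', stopReason: 'end_turn' }
  },
}
```

Swapping this toy for a real provider changes nothing in query(). The loop only ever sees streamChat.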

Bring your own transport, reuse the SDK's codec

If your custom client speaks the OpenAI /chat/completions shape but you own the transport (a proxy, an encrypted RPC, a different URL), don't re-implement the ChatMsg → wire conversion — import it. anyclaude-sdk/llm exports the same mappers createOpenAIClient uses internally, plus consumeSSE and the inline tool-call parser, so your client stays in lockstep with the SDK's content-block handling (text / image / PDF document / tool_result) and never drifts.

import {
  toOpenAIMessages,        // ChatMsg[] → OpenAI messages[] (text/image/PDF/tool_result)
  consumeSSE,              // read an SSE body, one data: payload at a time
  parseInlineToolCalls,    // recover tool calls a weak model emitted as text
  type LLMClient, type ChatMsg, type StreamResult, // fully typed, no bare-root import
} from 'anyclaude-sdk/llm'

// myTransport and seal() are placeholders for your own transport and payload wrapper.
const myClient: LLMClient = {
  async streamChat(messages, opts) {
    const body = {
      model: opts.model,
      messages: toOpenAIMessages(messages),
      stream: true,
      ...(opts.tools?.length ? { tools: opts.tools, tool_choice: 'auto' } : {}),
    }
    const res = await myTransport.post('/chat', seal(body), { signal: opts.signal })
    let text = ''
    await consumeSSE(res.body, (data) => {
      /* parse each chunk, append text deltas to `text`, and call opts.onToken(delta) */
    })
    return { text, toolCalls: [], model: opts.model ?? '', stopReason: 'end_turn' }
  },
}

Also exported: toOpenAIMessage (singular), blocksToOpenAIContent, blocksToText, the OpenAIChatMessage type, and the client types ToolCall / ToolDef / StopReason / Usage / ContentBlockParam — all from the browser-clean /llm subpath (no node: modules pulled).

Reliable tool use on cheap / open models (0.5.0)

Frontier models (GPT, Claude) emit clean native function-calls. The cheaper and open models people route to — Qwen, DeepSeek, Kimi/Moonshot, GLM, Mistral, Llama via Ollama — frequently don't: they narrate tool calls as text in their own "dialect," or emit malformed/incomplete argument JSON. anyclaude-sdk closes that gap with three layers so the same agent loop works across the long tail, not just the frontier.

1 · Tool-call dialects

When a model skips native tool_calls, the client recovers the call from text. Three pluggable dialects ship in anyclaude-sdk/llm:

Dialect	Shape	Common in
`xml-function`	`<function=write_file><parameter=path>…</parameter></function>`	vLLM, many relays
`hermes`	`<tool_call>{"name":…,"arguments":{…}}</tool_call>`	Qwen, Hermes, Ollama models
`json-fence`	a ```json block with `{"name":…,"arguments":…}`	DeepSeek, Mistral, generic

Use parseToolCalls(text, { dialects }) directly, or let the client do it automatically. The detector is conservative — ordinary JSON the model prints for the user is not misread as a tool call.
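To make the conservative detection concrete, here is a minimal sketch of a hermes-dialect parser. It is a hypothetical illustration, not the SDK's parseToolCalls: the name parseHermes and its { calls, rest } return shape are assumptions. A block counts as a call only if it is wrapped in <tool_call> tags and its JSON has a string name and an object arguments, so JSON printed for the user is left alone.

```typescript
type ParsedCall = { name: string; arguments: Record<string, unknown> }

// Hypothetical hermes-dialect parser: <tool_call>{"name":…,"arguments":{…}}</tool_call>
function parseHermes(text: string): { calls: ParsedCall[]; rest: string } {
  const calls: ParsedCall[] = []
  const re = /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/g
  const rest = text.replace(re, (block: string, body: string) => {
    try {
      const obj = JSON.parse(body)
      const ok =
        obj !== null && typeof obj === 'object' &&
        typeof obj.name === 'string' &&
        obj.arguments !== null && typeof obj.arguments === 'object' &&
        !Array.isArray(obj.arguments)
      if (ok) {
        calls.push({ name: obj.name, arguments: obj.arguments })
        return '' // strip recognised markup from the visible text
      }
    } catch {
      // not valid JSON: fall through and keep the block as ordinary text
    }
    return block
  })
  return { calls, rest: rest.trim() }
}
```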

2 · Model profiles (auto-detected)

The client picks per-model defaults — which dialects to try, tool_choice, parallel_tool_calls, and a sane temperature — from the model id. Pass profile to override, or omit it to auto-detect. Explicit options always win.

import { createOpenAIClient } from 'anyclaude-sdk/llm'

const baseUrl = 'http://localhost:11434/v1'   // local Ollama
const model = 'qwen2.5-coder:7b'

// auto-detected: 'qwen' profile → hermes/xml/json dialects, parallel off, temp 0.3
const llm = createOpenAIClient({ baseUrl, model })

// or force a profile / custom dialects:
const llm2 = createOpenAIClient({ baseUrl, model, profile: 'deepseek' })
const llm3 = createOpenAIClient({ baseUrl, model, toolDialects: ['hermes'] })

Built-ins: openai, anthropic (native, no fallback) · qwen, deepseek, moonshot (Kimi), zhipu (GLM), mistral, llama · generic (unknown models: full fallback + guidance). toolGuidancePrompt(tools) returns a short scaffolding prompt you can append for the weakest models.
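Auto-detection can be pictured as an ordered list of model-id patterns with a generic fallback. The sketch below is hypothetical: detectProfile and its regexes are assumptions, not the SDK's matcher. The profile names are the built-ins listed above, and, as in the SDK, an explicit profile wins.

```typescript
type ProfileName =
  | 'openai' | 'anthropic' | 'qwen' | 'deepseek' | 'moonshot'
  | 'zhipu' | 'mistral' | 'llama' | 'generic'

// Ordered (pattern, profile) pairs; the first match wins. The patterns are illustrative assumptions.
const PROFILE_PATTERNS: Array<[RegExp, ProfileName]> = [
  [/claude|anthropic/i, 'anthropic'],
  [/\bgpt-|\bo[134]\b|openai/i, 'openai'],
  [/qwen/i, 'qwen'],
  [/deepseek/i, 'deepseek'],
  [/kimi|moonshot/i, 'moonshot'],
  [/glm|zhipu/i, 'zhipu'],
  [/mistral|mixtral|codestral/i, 'mistral'],
  [/llama/i, 'llama'],
]

function detectProfile(modelId: string, explicit?: ProfileName): ProfileName {
  if (explicit) return explicit // an explicit profile always wins
  for (const [pattern, profile] of PROFILE_PATTERNS) {
    if (pattern.test(modelId)) return profile
  }
  return 'generic' // unknown models get the full fallback
}
```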

3 · Self-healing argument repair

Before a tool runs, the loop validates the model's arguments against the tool schema. On malformed or incomplete JSON it does not execute with garbage — it returns a corrective is_error tool_result naming exactly what was wrong plus the expected schema, so the model retries with a valid call. On by default; set query({ repairToolCalls: false }) to opt out.

Missing required argument for "write_file": "content". Call it again
including it. Expected: { path: string (required), content: string (required) }
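The validation step can be sketched as follows, assuming a simplified schema (a property-type map plus a required list) instead of full JSON Schema. ToolSchema and checkArgs are hypothetical names. The corrective message mirrors the example above, and a non-null return is what would go back to the model as an is_error tool_result.

```typescript
type ToolSchema = {
  name: string
  properties: Record<string, { type: 'string' | 'number' | 'boolean' | 'object' | 'array' }>
  required: string[]
}

// Returns null when the call is valid, otherwise a corrective message for the model.
function checkArgs(schema: ToolSchema, rawArgs: string): string | null {
  const expected = '{ ' + Object.keys(schema.properties)
    .map((k) => `${k}: ${schema.properties[k].type}${schema.required.indexOf(k) !== -1 ? ' (required)' : ''}`)
    .join(', ') + ' }'
  let args: unknown
  try {
    args = JSON.parse(rawArgs)
  } catch {
    return `Arguments for "${schema.name}" are not valid JSON. Call it again. Expected: ${expected}`
  }
  if (args === null || typeof args !== 'object' || Array.isArray(args)) {
    return `Arguments for "${schema.name}" must be a JSON object. Expected: ${expected}`
  }
  const obj = args as Record<string, unknown>
  const missing = schema.required.filter((k) => !(k in obj))
  if (missing.length > 0) {
    const list = missing.map((k) => `"${k}"`).join(', ')
    return `Missing required argument for "${schema.name}": ${list}. Call it again including it. Expected: ${expected}`
  }
  for (const k of Object.keys(obj)) {
    const spec = schema.properties[k]
    const actual = Array.isArray(obj[k]) ? 'array' : typeof obj[k]
    if (spec && actual !== spec.type) {
      return `Argument "${k}" for "${schema.name}" should be ${spec.type}, got ${actual}. Expected: ${expected}`
    }
  }
  return null // valid: safe to execute the tool
}
```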

Prove it on your endpoints

Don't take our word for it — the repo ships a harness that runs the real loop against any models you list and prints a pass/fail matrix (native vs. with-anyclaude):

npm run build
node scripts/compat-matrix.mjs ./compat.config.json   # your endpoints; keys via env:NAME

It scores each model on "called the tool and used the result." Models that fail native often pass once dialects + repair are on — that delta is the point. A GitHub Action regenerates the table on a schedule; current results live in COMPATIBILITY.md.

Three built-in transport clients

All three are exported from the package root and implement LLMClient. They differ only in the wire format they speak:

`createOpenAIClient`

POSTs to {baseUrl}/chat/completions. The workhorse — OpenAI, xAI, Groq, Together, OpenRouter, Ollama, Kilo, local servers.

`createAnthropicClient`

POSTs to {baseUrl}/messages with the Anthropic Messages shape (system extracted, anthropic-version header).
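The main reshaping the Anthropic client performs is pulling system messages out of the list. The Messages API takes the system prompt as a top-level system field, requires max_tokens, and expects an anthropic-version header (plus x-api-key for auth). Below is a minimal sketch with simplified text-only messages. toAnthropicRequest is a hypothetical helper, not the SDK's internal code; the defaults mirror the documented maxTokens and anthropicVersion defaults.

```typescript
type Msg = { role: 'system' | 'user' | 'assistant'; content: string }

// Build an Anthropic Messages API request from an OpenAI-style message list (sketch).
function toAnthropicRequest(
  messages: Msg[],
  model: string,
  apiKey: string,
  maxTokens = 4096,
  anthropicVersion = '2023-06-01',
) {
  // System messages become the top-level `system` field, joined in order.
  const system = messages.filter((m) => m.role === 'system').map((m) => m.content).join('\n\n')
  const rest = messages
    .filter((m) => m.role !== 'system')
    .map((m) => ({ role: m.role as 'user' | 'assistant', content: m.content }))
  return {
    headers: {
      'content-type': 'application/json',
      'x-api-key': apiKey,
      'anthropic-version': anthropicVersion,
    },
    body: {
      model,
      max_tokens: maxTokens, // required by the Messages API
      stream: true,
      ...(system ? { system } : {}),
      messages: rest,
    },
  }
}
```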

`createResponsesClient`

POSTs to {baseUrl}/responses (OpenAI Responses API). Sends full history each turn; store optional.

Options

createOpenAIClient(options) — the most featureful:

createOpenAIClient({
  apiKey?: string | (() => string | undefined), // static key OR a getter (key rotation)
  baseUrl?: string,            // default https://api.openai.com/v1
  model?: string,              // default model id (override per-call via opts.model)
  headers?: Record<string,string>,  // extra headers (Groq/OpenRouter/Together/xAI)
  temperature?: number,
  maxTokens?: number,          // → max_tokens
  reasoningEffort?: string,    // → reasoning_effort ('none' | 'low' | 'high' …) for reasoning models
  parallelToolCalls?: boolean, // → parallel_tool_calls (only sent when tools are present)
})

createAnthropicClient takes apiKey, baseUrl, model, headers, temperature, maxTokens (default 4096), and anthropicVersion (default '2023-06-01'). createResponsesClient takes apiKey, baseUrl, model, headers, temperature, maxTokens (→ max_output_tokens), and store (default false — full history is sent each turn).

All clients stream Server-Sent Events and request stream_options.include_usage, so token usage and cost are reported even for streamed responses. Keyless endpoints just omit apiKey; no authorization header is sent.
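With stream_options: { include_usage: true }, an OpenAI-style endpoint sends one extra chunk just before data: [DONE]. Its choices array is empty and its usage field carries the token counts for the whole request; earlier chunks have usage: null. Below is a minimal sketch of folding such a stream into text plus usage. foldChatStream is a hypothetical helper that takes the whole SSE transcript as a string and assumes one data: line per event, whereas the SDK's consumeSSE reads a live response body.

```typescript
type Usage = { prompt_tokens: number; completion_tokens: number; total_tokens: number }

// Fold a complete OpenAI-style SSE transcript into the assistant text and final usage (sketch).
function foldChatStream(sse: string, onToken: (delta: string) => void) {
  let text = ''
  let usage: Usage | undefined
  for (const line of sse.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue // skip blank lines, comments, and other fields
    const data = line.slice(5).trim()
    if (data === '[DONE]') break
    const chunk = JSON.parse(data)
    const delta: unknown = chunk.choices?.[0]?.delta?.content
    if (typeof delta === 'string' && delta.length > 0) {
      text += delta
      onToken(delta)
    }
    if (chunk.usage) usage = chunk.usage as Usage // set by the final include_usage chunk
  }
  return { text, usage }
}
```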

Provider recipes

Every recipe is just a client, a base URL, and a model. Each snippet below is a standalone alternative; pass the resulting client as llm to query({ llm }).

OpenAI

import { createOpenAIClient } from 'anyclaude-sdk'

const llm = createOpenAIClient({
  baseUrl: 'https://api.openai.com/v1',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
})

Anthropic

import { createAnthropicClient } from 'anyclaude-sdk'

const llm = createAnthropicClient({
  baseUrl: 'https://api.anthropic.com/v1',
  model: 'claude-sonnet-4-6',
  apiKey: process.env.ANTHROPIC_API_KEY,
  maxTokens: 8192,
})

xAI (Grok)

import { createOpenAIClient } from 'anyclaude-sdk'

// grok-4.x: 0 reasoning tokens (cheaper/faster) + parallel tools +
// a stable conv id for ~6× cheaper prefix-cached tokens.
const llm = createOpenAIClient({
  baseUrl: 'https://api.x.ai/v1',
  model: 'grok-4.3',
  apiKey: process.env.XAI_API_KEY,
  temperature: 0.3,
  reasoningEffort: 'none',
  parallelToolCalls: true,
  headers: { 'x-grok-conv-id': 'my-stable-run-id' },
})

Ollama

import { createOpenAIClient } from 'anyclaude-sdk'

const llm = createOpenAIClient({
  baseUrl: 'https://ollama.com/v1',   // or http://localhost:11434/v1
  model: 'qwen3-coder-next',
  apiKey: process.env.OLLAMA_API_KEY, // omit for local Ollama
})

Kilo

import { createOpenAIClient } from 'anyclaude-sdk'

const llm = createOpenAIClient({
  baseUrl: 'https://api.kilo.ai/api/gateway',
  model: 'kilo-auto/free',
  apiKey: process.env.KILO_API_KEY,
})

Groq

import { createOpenAIClient } from 'anyclaude-sdk'

const llm = createOpenAIClient({
  baseUrl: 'https://api.groq.com/openai/v1',
  model: 'llama-3.3-70b-versatile',
  apiKey: process.env.GROQ_API_KEY,
})

Together AI

import { createOpenAIClient } from 'anyclaude-sdk'

const llm = createOpenAIClient({
  baseUrl: 'https://api.together.xyz/v1',
  model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  apiKey: process.env.TOGETHER_API_KEY,
})

OpenRouter

import { createOpenAIClient } from 'anyclaude-sdk'

const llm = createOpenAIClient({
  baseUrl: 'https://openrouter.ai/api/v1',
  model: 'anthropic/claude-sonnet-4',
  apiKey: process.env.OPENROUTER_API_KEY,
  headers: { 'HTTP-Referer': 'https://your.app', 'X-Title': 'Your App' },
})

Local server

import { createOpenAIClient } from 'anyclaude-sdk'

// llama.cpp server, LM Studio, vLLM — any OpenAI-compatible local server.
const llm = createOpenAIClient({
  baseUrl: 'http://localhost:8080/v1',
  model: 'local-model',
  // no apiKey needed
})

Inline tool-call parsing

Many relays and open models don't emit native function-calling blocks — they write tool calls as inline XML text in the assistant message:

<tool_call>
<function=write_file>
<parameter=path>index.html</parameter>
<parameter=content><!DOCTYPE html> ...</parameter>
</function>
</tool_call>

The OpenAI client detects and parses this format automatically (tolerant of missing closing tags and a missing <tool_call> wrapper), then strips the markup out of the user-visible text. The result: tools work even on endpoints with no native tool support — your agent code is identical.
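Below is a minimal sketch of a parser that is tolerant in the ways described above. It accepts a missing <tool_call> wrapper and missing </parameter> or </function> tags, and it returns the visible text with the markup stripped. parseXmlFunctionCalls is a hypothetical stand-in, not an SDK export. Parameter values are kept as raw strings, minus one surrounding newline.

```typescript
type XmlCall = { name: string; arguments: Record<string, string> }

// Hypothetical tolerant parser for <function=…><parameter=…>…</parameter></function> calls.
function parseXmlFunctionCalls(text: string): { calls: XmlCall[]; rest: string } {
  const calls: XmlCall[] = []
  // A function block ends at </function>, before the next <function=…>, before </tool_call>, or at end of text.
  const fnRe = /<function=([^>\s]+)>([\s\S]*?)(?:<\/function>|(?=<function=)|(?=<\/tool_call>)|$)/g
  const withoutCalls = text.replace(fnRe, (_block: string, name: string, body: string) => {
    const args: Record<string, string> = {}
    // A parameter value ends at </parameter>, before the next <parameter=…>, or at end of the block.
    const paramRe = /<parameter=([^>\s]+)>([\s\S]*?)(?:<\/parameter>|(?=<parameter=)|$)/g
    let m: RegExpExecArray | null
    while ((m = paramRe.exec(body)) !== null) {
      // Drop one leading/trailing newline that models often put around values.
      args[m[1]] = m[2].replace(/^\n/, '').replace(/\n$/, '')
    }
    calls.push({ name, arguments: args })
    return '' // strip the call from the visible text
  })
  // Remove any <tool_call> wrapper tags left behind.
  const rest = withoutCalls.replace(/<\/?tool_call>/g, '').trim()
  return { calls, rest }
}
```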

Key rotation

apiKey accepts a function evaluated per request, so you can spread calls across a pool of keys (handy for rate-limited free tiers):

import { createOpenAIClient } from 'anyclaude-sdk'

const keys = ['key_a', 'key_b', 'key_c']   // in real code, load these from env
let i = 0
const nextKey = () => keys[i++ % keys.length]

const llm = createOpenAIClient({
  baseUrl: 'https://ollama.com/v1',
  model: 'qwen3-coder-next',
  apiKey: nextKey,   // round-robin: every request uses the next key
})

Each streamChat call resolves the function fresh, so main-loop turns, sub-agents, and background tasks all rotate automatically.
