Deploy anywhere

The SDK is headless and runtime-agnostic: the same code runs in a browser tab, a serverless function, or a long-lived sandbox. The only design question is where the agent loop runs, because any function time limit applies to whichever runtime hosts the loop.

Pure browser — no backend

Drive the agent straight from the tab against your LLM endpoint. Use a WebContainer (real Node + shell in the browser) or an in-memory/IndexedDB FS for file-only agents. Nothing server-side.

import { query, createOpenAIClient, WebContainerWorkspace } from 'anyclaude-sdk'

// Boot a WebContainer to act as the agent's workspace (real Node + shell in the tab).
const workspace = await WebContainerWorkspace.boot()
// userKey, prompt, and render() are supplied by your app.
const llm = createOpenAIClient({ baseUrl: 'https://api.openai.com/v1', model: 'gpt-4o', apiKey: userKey })
for await (const m of query({ prompt, workspace, llm, model: 'gpt-4o', includePartialMessages: true })) render(m)

Recipe 1 — Serverless function (browser + function)

Run query() in a serverless function; stream the SDKMessages back to a browser client. Great for "no infra, free tier." The function's wall-clock cap bounds the agent loop, so checkpoint to a SessionStore if a run can exceed it.

// app/api/agent/route.ts  (Vercel function, streamed)
import { query, createOpenAIClient, MemoryFileSystem, NoopCommandExecutor, composeWorkspace } from 'anyclaude-sdk'

export const maxDuration = 300 // set ~10s below your platform's cap

export async function POST(req: Request) {
  const { prompt, sessionId } = await req.json()
  // File-only workspace: in-memory file system, no shell.
  const workspace = composeWorkspace(new MemoryFileSystem(), new NoopCommandExecutor(), '/work')
  const llm = createOpenAIClient({ baseUrl: process.env.LLM_BASE, model: process.env.LLM_MODEL, apiKey: process.env.LLM_KEY })
  const encoder = new TextEncoder() // one encoder reused for the whole stream
  const stream = new ReadableStream({
    async start(c) {
      // Emit one JSON message per line (NDJSON); sessionId ties the run to its session.
      for await (const m of query({ prompt, workspace, llm, model: process.env.LLM_MODEL, sessionId })) c.enqueue(encoder.encode(JSON.stringify(m) + '\n'))
      c.close()
    },
  })
  return new Response(stream, { headers: { 'content-type': 'application/x-ndjson' } })
}
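
On the browser side, the NDJSON response has to be split back into messages, and a network chunk can end in the middle of a line. The following is a minimal sketch of one way to do that, not part of the SDK: readNdjson is a hypothetical helper name, and the /api/agent path in the usage comment is assumed from the route file above.

```typescript
// Sketch: turn an NDJSON byte stream into parsed messages, one per line.
// Buffers partial lines so a message split across chunks is parsed only once complete.
async function* readNdjson(body: ReadableStream<Uint8Array>): AsyncGenerator<unknown> {
  const reader = body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      // stream: true keeps multi-byte characters intact across chunk boundaries
      buffer += decoder.decode(value, { stream: true })
      let newline = buffer.indexOf('\n')
      while (newline !== -1) {
        const line = buffer.slice(0, newline).trim()
        buffer = buffer.slice(newline + 1)
        if (line) yield JSON.parse(line)
        newline = buffer.indexOf('\n')
      }
    }
    // Flush the decoder and parse a final line that had no trailing newline.
    buffer += decoder.decode()
    const tail = buffer.trim()
    if (tail) yield JSON.parse(tail)
  } finally {
    reader.releaseLock()
  }
}

// Usage (assumed endpoint path; render() is your own UI function):
// const res = await fetch('/api/agent', { method: 'POST', body: JSON.stringify({ prompt }) })
// for await (const m of readNdjson(res.body!)) render(m)
```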

Free-tier function limits

Limits change, so check your platform's current documentation and set maxDuration about 10s below the configured cap. Free-tier defaults at the time of writing:

Platform (free)	Limit model	Suggested budget
Vercel Hobby	wall-clock, ~300s (Fluid Compute)	~290s
Appwrite	wall-clock, 30s	~25s
Netlify	sync funcs ~10s (background up to 15 min)	~8s, or background funcs
Cloudflare Workers	CPU-time (I/O waits are free)	I/O-bound agents rarely hit it; or use Durable Objects / Workflows
AWS Lambda	wall-clock, 15 min	~14m50s
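
To make the arithmetic concrete, the table can be encoded as a small lookup that yields a millisecond budget, for instance as a candidate value for query({ maxDurationMs }). This is only a sketch: the platform keys, SUGGESTED_BUDGET_MS, and budgetFor are hypothetical names, and the numbers are copied from the table rather than read from any platform API.

```typescript
// Suggested budgets from the table, in milliseconds.
// null means no wall-clock budget applies (Cloudflare Workers meters CPU time instead).
type Platform = 'vercel-hobby' | 'appwrite' | 'netlify-sync' | 'cloudflare-workers' | 'aws-lambda'

const SUGGESTED_BUDGET_MS: Record<Platform, number | null> = {
  'vercel-hobby': 290_000, // ~300s cap
  'appwrite': 25_000, // 30s cap
  'netlify-sync': 8_000, // ~10s cap for synchronous functions
  'cloudflare-workers': null,
  'aws-lambda': 890_000, // 15 min cap, i.e. 14m50s
}

// If you know your configured cap, derive the budget from it (cap minus a safety margin);
// otherwise fall back to the table's suggestion.
function budgetFor(platform: Platform, capSeconds?: number, marginSeconds = 10): number | null {
  if (capSeconds !== undefined) return Math.max(0, capSeconds - marginSeconds) * 1000
  return SUGGESTED_BUDGET_MS[platform]
}
```

Deriving the budget from the configured cap is the safer path, since the defaults in the table can change.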

The "survivor" — span the cap (shipped in 0.2.0)

For runs longer than your function's limit, the SDK checkpoints at a turn boundary just before the deadline (query({ maxDurationMs })), persists to a pluggable SessionStore (Supabase / Neon / Vercel KV / Upstash Redis / IndexedDB / memory), and emits a paused message. The continuation runs with query({ resume: true, continueRun: true }) and the same sessionId. The anyclaude-react client (createAgentClient / createEndpointClient) auto-stitches the streams into one, so a function capped at 30s (Appwrite's free tier, for example) can carry an agent run of any length as a chain of resumed segments.
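
The stitching idea can be sketched independently of the SDK. The code below is not the anyclaude-react implementation: stitchSegments and RunSegment are hypothetical names, and the paused message is assumed to look like { type: 'system', subtype: 'paused' }, by analogy with the client_tool_request message shown in Recipe 3. The run callback stands in for one request to your function.

```typescript
type AgentMessage = { type: string; subtype?: string; [key: string]: unknown }
type SegmentOptions = { sessionId: string; prompt?: string; resume: boolean }
// One call = one function invocation that streams messages until it finishes or pauses.
type RunSegment = (opts: SegmentOptions) => AsyncIterable<AgentMessage>

// Assumed shape of the pause marker (not confirmed against the SDK's types).
const isPaused = (m: AgentMessage): boolean => m.type === 'system' && m.subtype === 'paused'

// Presents a chain of paused/resumed segments as a single message stream.
async function* stitchSegments(
  run: RunSegment,
  sessionId: string,
  prompt: string,
  maxSegments = 100, // guard against a run that never completes
): AsyncGenerator<AgentMessage> {
  let resume = false
  for (let segment = 0; segment < maxSegments; segment++) {
    let paused = false
    // The prompt is only sent on the first segment; later segments resume by sessionId.
    for await (const m of run({ sessionId, prompt: resume ? undefined : prompt, resume })) {
      if (isPaused(m)) paused = true // swallow the marker; the consumer never sees the seam
      else yield m
    }
    if (!paused) return // the segment ended without pausing, so the run is complete
    resume = true
  }
  throw new Error(`run did not finish within ${maxSegments} segments`)
}
```

The segment cap is a design choice for the sketch: without it, a run that pauses on every invocation would loop forever.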

Recipe 2 — Sandbox-as-runtime (no time limit)

Run query() inside a long-lived sandbox (E2B, Vercel Sandbox, Daytona) and stream to the browser. The sandbox microVM is the compute, so there's no per-invocation function cap — the agent runs as long as the sandbox is alive (configurable, minutes to hours). Best for heavy / long autonomous runs. No survivor needed.

// runs in a process started inside the E2B sandbox
import { query, LocalSandbox, createOpenAIClient } from 'anyclaude-sdk'

const workspace = new LocalSandbox({ cwd: '/home/user' }) // local to the sandbox
const llm = createOpenAIClient({ baseUrl: process.env.LLM_BASE, model: process.env.LLM_MODEL, apiKey: process.env.LLM_KEY })
for await (const m of query({ prompt, workspace, llm, model: process.env.LLM_MODEL })) emit(m) // → exposed port → browser

Gotcha: if the loop runs in a function and only uses E2B for tool execution, the function's time limit still applies — the sandbox being alive doesn't save the orchestrator. The no-limit benefit requires the loop itself to run in the sandbox.

Recipe 3 — Server brain, browser hands (clientTools)

Keep your LLM key and the agent loop on the server, but execute specific tools in the user's browser — e.g. run bash in a WebContainer so the agent edits and runs real code on the client, no sandbox bill. List those names in clientTools: the run pauses with a client_tool_request, the browser executes them, and you resume with clientToolResults.

// server function
for await (const m of query({ prompt, workspace, llm, model, sessionId, clientTools: ['bash'] })) {
  if (m.type === 'system' && m.subtype === 'client_tool_request') return respond(m.requests) // → browser runs them
  emit(m)
}
// next request: the browser sends back results and the run resumes
for await (const m of query({ workspace, llm, model, sessionId, resume: true, continueRun: true,
                              clientTools: ['bash'], clientToolResults })) emit(m)

Full working example: vercel-clienttools.
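
The browser half of this exchange might look like the sketch below. The request and result shapes here are assumptions made for illustration ({ id, name, input } in, { id, output, isError } out); the SDK's real types may differ, so check them before relying on this. runClientTools and the handler map are hypothetical names.

```typescript
// Assumed shapes, not taken from the SDK's type definitions.
type ClientToolRequest = { id: string; name: string; input: unknown }
type ClientToolResult = { id: string; output: string; isError: boolean }
type ToolHandler = (input: unknown) => Promise<string> | string

// Runs each requested tool in order and collects one result per request.
// Failures become error results rather than exceptions, so the agent can see and react to them.
async function runClientTools(
  requests: ClientToolRequest[],
  handlers: Map<string, ToolHandler>,
): Promise<ClientToolResult[]> {
  const results: ClientToolResult[] = []
  for (const req of requests) {
    const handler = handlers.get(req.name)
    if (!handler) {
      results.push({ id: req.id, output: `no client handler for tool "${req.name}"`, isError: true })
      continue
    }
    try {
      results.push({ id: req.id, output: await handler(req.input), isError: false })
    } catch (err) {
      results.push({ id: req.id, output: err instanceof Error ? err.message : String(err), isError: true })
    }
  }
  return results
}
```

Requests run sequentially rather than in parallel because shell commands often depend on the effects of the previous one. The returned array is what the browser would post back for the server to pass as clientToolResults.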

Recipe 4 — Drop-in Claude Code router (0.6.0)

Run Claude Code itself against any OpenAI-compatible model — DeepSeek, Qwen, GLM, Kimi, local Ollama, OpenRouter. The anyclaude-sdk/anthropic-endpoint subpath bridges the Anthropic Messages API to the SDK's LLMClient: stand up an Anthropic-compatible /v1/messages server, point Claude Code at it with ANTHROPIC_BASE_URL, and every turn is served by your chosen model. Unlike a naive proxy, inline tool-call dialects are recovered into proper tool_use blocks (via model profiles), so tool use actually works on cheap models.

// server (Node) — one turn per request; Claude Code runs the loop
import { createOpenAIClient } from 'anyclaude-sdk/llm'
import { anthropicToChat, anthropicSSE } from 'anyclaude-sdk/anthropic-endpoint'

const llm = createOpenAIClient({ baseUrl: 'https://api.deepseek.com/v1', model: 'deepseek-chat', apiKey: process.env.DEEPSEEK_API_KEY })

// POST /v1/messages
const req = anthropicToChat(body)                 // Anthropic request → ChatMsg[] + tools
for await (const evt of anthropicSSE(llm, req, { model: 'deepseek-chat' })) res.write(evt) // → Anthropic SSE

# then point Claude Code at it:
ANTHROPIC_BASE_URL=http://localhost:8787 ANTHROPIC_API_KEY=dummy claude

Runnable, config-driven (default / background / long-context routing): examples/claude-code-router. Also exports streamResultToAnthropicMessage (non-streaming) and anthropicToolsToDefs.
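
To illustrate what recovering an inline tool-call dialect means, here is a sketch for a single dialect: the <tool_call>{"name": ..., "arguments": ...}</tool_call> convention that some open models emit as plain text. It is not the SDK's model-profile code; recoverToolCalls and the generated ids are hypothetical. The output blocks follow the Anthropic Messages content-block shape (text and tool_use).

```typescript
type TextBlock = { type: 'text'; text: string }
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> }
type ContentBlock = TextBlock | ToolUseBlock

const TOOL_CALL = /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/g

// Splits raw model text into text blocks and tool_use blocks.
// Malformed tool calls are left in place as ordinary text instead of being dropped.
function recoverToolCalls(raw: string): ContentBlock[] {
  const blocks: ContentBlock[] = []
  let cursor = 0 // start of the text not yet emitted
  let counter = 0
  const pushText = (s: string): void => {
    const t = s.trim()
    if (t) blocks.push({ type: 'text', text: t })
  }
  for (const match of raw.matchAll(TOOL_CALL)) {
    const start = match.index ?? 0
    let parsed: { name?: unknown; arguments?: unknown } | null = null
    try {
      parsed = JSON.parse(match[1])
    } catch {
      parsed = null
    }
    if (!parsed || typeof parsed.name !== 'string') continue // not a valid call: keep as text
    pushText(raw.slice(cursor, start))
    const args = parsed.arguments
    const input = args && typeof args === 'object' && !Array.isArray(args) ? (args as Record<string, unknown>) : {}
    // Synthetic id for the sketch; a real bridge would need ids that stay stable across the turn.
    blocks.push({ type: 'tool_use', id: `toolu_recovered_${counter++}`, name: parsed.name, input })
    cursor = start + match[0].length
  }
  pushText(raw.slice(cursor))
  return blocks
}
```

A naive proxy would forward the raw string as a single text block, and the caller would never see a tool call to execute.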

Working examples

Each is a runnable Vite project (Vercel-ready, or browser-only) in the repo's examples/:

browser-ide
vercel-clienttools
vercel-kv-survivor
vercel-supabase-survivor
vercel-indexeddb-survivor
browser-chat
claude-code-router

Hosting the frontend

The client UI is just static files (or a Vite/Next app). Deploy it anywhere: Vercel, Netlify, Cloudflare Pages, or Puter static hosting. This very docs site is served from Puter.