Advanced

Sessions, hooks & more

The capabilities that turn the agent loop into a product: persistent sessions, fine-grained permissions and plan mode, lifecycle hooks, skills and memory, a live message queue, slash commands, cost accounting, and auto-compaction. Each is opt-in, enabled through an option on query().

Sessions & resume

Persist the full transcript per turn and pick it up later. The built-in SessionStore writes session metadata + the LLM transcript to IndexedDB (via Dexie), so list / resume / fork / rename survive reloads. Pass a store, a stable sessionId, and resume: true.

import { query, SessionStore } from 'anyclaude-sdk'

const store = new SessionStore()                 // IndexedDB-backed (browser)
const sessionId = 'project-alpha'

for await (const m of query({
  prompt: 'continue where we left off',
  workspace, llm,
  sessionStore: store,
  sessionId,
  resume: true,                                  // reload this session's transcript first
})) { /* … */ }

// Manage sessions
const sessions = await store.list()              // SessionMeta[] (id, messageCount, updatedAt…)
const loaded   = await store.load(sessionId)     // ChatMsg[] | null
await store.rename(sessionId, 'Alpha build')
await store.fork(sessionId, 'project-alpha-2')
await store.remove(sessionId)

A transcript is an array of ChatMsg ({ role, content }, where content is a string or an array of content blocks — text / tool_use / tool_result / image / document). SessionStore is an interface: implement load/save/list against any backend (a Node file store, Postgres, KV, Redis) to persist outside the browser — see Deploy.
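As a rough illustration of a non-browser backend, here is a minimal in-memory store. The text names load/save/list but not their signatures, so the `MinimalSessionStore` interface and the local `ChatMsg` / `SessionMeta` types below are assumptions standing in for the SDK's own definitions.

```typescript
// Local stand-ins for the SDK's types; the real definitions may carry more fields.
type ContentBlock = { type: string; [key: string]: unknown }
type ChatMsg = { role: string; content: string | ContentBlock[] }
type SessionMeta = { id: string; messageCount: number; updatedAt: number }

// Assumed contract: the method names come from the text, the signatures are guesses.
interface MinimalSessionStore {
  load(id: string): Promise<ChatMsg[] | null>
  save(id: string, messages: ChatMsg[]): Promise<void>
  list(): Promise<SessionMeta[]>
}

// Transcripts are plain JSON, so a JSON round-trip is enough for a deep copy.
function clone<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T
}

class InMemorySessionStore implements MinimalSessionStore {
  private sessions = new Map<string, { messages: ChatMsg[]; updatedAt: number }>()

  async load(id: string): Promise<ChatMsg[] | null> {
    const entry = this.sessions.get(id)
    // Return a copy so callers cannot mutate the stored transcript.
    return entry ? clone(entry.messages) : null
  }

  async save(id: string, messages: ChatMsg[]): Promise<void> {
    this.sessions.set(id, { messages: clone(messages), updatedAt: Date.now() })
  }

  async list(): Promise<SessionMeta[]> {
    return Array.from(this.sessions.entries())
      .map(([id, s]) => ({ id, messageCount: s.messages.length, updatedAt: s.updatedAt }))
      .sort((a, b) => b.updatedAt - a.updatedAt) // most recently updated first
  }
}
```

A file-backed or database-backed store would keep the same three methods and swap the Map for real I/O.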

Permissions

Gate every tool call. Two ways, often combined: declarative rules and an imperative callback.

Permission rules

permissionRules takes allow / deny / ask lists of rule strings matched against tool calls (by tool name, with optional argument patterns). Deny always wins — a matching deny rule blocks the call even under bypassPermissions. A built-in dangerous-bash backstop flags destructive shell commands.

for await (const m of query({
  prompt, workspace, llm,
  permissionMode: 'default',
  permissionRules: {
    allow: ['read_file', 'glob', 'grep', 'bash(npm run test)'],
    deny:  ['bash(rm *)', 'delete_file'],          // deny wins, always
    ask:   ['write_file', 'edit_file'],            // prompt the user
  },
  onPermissionAsk: async (tool, input) => confirm(`Allow ${tool}?`),
})) { /* … */ }
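To make the matching order concrete, here is a small sketch of how rule strings like the ones above could be evaluated. It is not the SDK's matcher: the `*` wildcard semantics, the single string argument, and the ask-before-allow order are assumptions, and only "deny wins" comes from the text.

```typescript
type Rules = { allow?: string[]; deny?: string[]; ask?: string[] }
type Outcome = 'allow' | 'deny' | 'ask' | 'none'

// Split "bash(rm *)" into a tool name and an optional argument pattern.
function parseRule(rule: string): { tool: string; pattern?: string } {
  const m = /^([^()]+)\((.*)\)$/.exec(rule.trim())
  return m ? { tool: m[1].trim(), pattern: m[2] } : { tool: rule.trim() }
}

// Turn a pattern with "*" wildcards into an anchored regular expression.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*')
  return new RegExp(`^${escaped}$`)
}

function matches(rule: string, tool: string, arg: string): boolean {
  const parsed = parseRule(rule)
  if (parsed.tool !== tool) return false
  // A rule without a pattern matches every call to that tool.
  return parsed.pattern === undefined || patternToRegExp(parsed.pattern).test(arg)
}

// Deny is checked first so it wins over any allow or ask rule.
function evaluate(rules: Rules, tool: string, arg = ''): Outcome {
  const hit = (list?: string[]) => (list ?? []).some((r) => matches(r, tool, arg))
  if (hit(rules.deny)) return 'deny'
  if (hit(rules.ask)) return 'ask'     // assumed to take precedence over allow
  if (hit(rules.allow)) return 'allow'
  return 'none'
}
```

With the rules from the example, `evaluate(rules, 'bash', 'rm -rf build')` is denied by `bash(rm *)`, while `evaluate(rules, 'bash', 'npm run test')` is allowed.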

Permission modes

default

Rules + ask prompts apply.

acceptEdits

Auto-allow file edits; still gate the rest.

plan

Research only — mutating tools blocked (see Plan mode).

bypassPermissions

Allow everything except explicit deny rules.

dontAsk

Apply rules but never prompt (deny on would-ask).

For full control, supply canUseTool(name, input) directly — it runs before each call and returns an allow/deny decision.
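The interaction between modes and rule outcomes can be summarized as a pure function. This is a sketch of the behavior described above, not the SDK's gate: the `ToolTraits` flags, the fallback to asking when no rule matches, and the order in which modes are checked are assumptions.

```typescript
type PermissionMode = 'default' | 'acceptEdits' | 'plan' | 'bypassPermissions' | 'dontAsk'
type RuleOutcome = 'allow' | 'deny' | 'ask' | 'none'
type Decision = 'allow' | 'deny' | 'ask'

// Hypothetical classification of a tool, supplied by the caller.
interface ToolTraits {
  isFileEdit: boolean // for example write_file or edit_file
  isMutating: boolean // anything that changes the workspace
}

function resolve(mode: PermissionMode, rule: RuleOutcome, tool: ToolTraits): Decision {
  if (rule === 'deny') return 'deny'                      // deny wins in every mode
  if (mode === 'bypassPermissions') return 'allow'        // everything else is allowed
  if (mode === 'plan' && tool.isMutating) return 'deny'   // research only
  if (mode === 'acceptEdits' && tool.isFileEdit) return 'allow'
  // Assumption: a call matched by no rule falls back to asking.
  const base: Decision = rule === 'allow' ? 'allow' : 'ask'
  if (mode === 'dontAsk' && base === 'ask') return 'deny' // never prompt
  return base
}
```

A `canUseTool` callback could wrap a function like this, mapping its result onto whatever decision shape the SDK expects.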

Plan mode

In plan mode the agent researches and proposes, but cannot mutate. The enter_plan_mode / exit_plan_mode tools toggle it, and while active all mutating tools (write/edit/delete/bash-writes) are denied by the gate. Start in plan mode with permissionMode: 'plan', or let the model enter it on demand.

import { query, PLAN_MODE_TOOLS, enterPlanMode, exitPlanMode } from 'anyclaude-sdk'

query({ prompt: 'design the refactor', workspace, llm, permissionMode: 'plan' })
// The agent calls exit_plan_mode with a finalized plan when ready to act.

Lifecycle hooks

Observe and intervene at every stage. hooks maps an event name to an array of callbacks; a PreToolUse hook can even block or inject context. All 15 events:

PreToolUse

Before a tool runs — may block or add context.

PostToolUse

After a tool returns successfully.

PostToolUseFailure

After a tool errors.

PermissionRequest

A tool call is awaiting a decision.

PermissionDenied

A call was denied by the gate.

FileChanged

A file was written/edited/deleted.

UserPromptSubmit

A user turn entered the loop.

Notification

An agent notification fired.

Stop

The turn finished.

SessionStart / SessionEnd

Run lifecycle boundaries.

SubagentStart / SubagentStop

A sub-agent began / finished.

PreCompact / PostCompact

Around transcript compaction.

query({
  prompt, workspace, llm,
  hooks: {
    PreToolUse: [(input) => { console.log('about to run', input.tool_name) }],
    PostCompact: [() => { console.log('context compacted — token budget reset') }],
    Stop: [(input) => { console.log('done:', input.last_assistant_message) }],
  },
})
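Since hooks is a map from event name to an array of callbacks, a small factory can wire several events to one audit log. The sketch below uses local stand-in types; the text only confirms `input.tool_name` on PreToolUse, so its presence on other events is an assumption, and `makeAuditHooks` is a hypothetical helper.

```typescript
// Local stand-ins; the SDK's hook input types are richer than this.
type HookInput = { tool_name?: string; [key: string]: unknown }
type HookFn = (input: HookInput) => void
type HookMap = Record<string, HookFn[]>

type AuditEntry = { event: string; tool?: string; at: number }

// Build a hooks map that records every listed event into one shared log.
function makeAuditHooks(events: string[]): { hooks: HookMap; log: AuditEntry[] } {
  const log: AuditEntry[] = []
  const hooks: HookMap = {}
  for (const event of events) {
    hooks[event] = [
      (input) => {
        // tool is undefined for events that carry no tool name.
        log.push({ event, tool: input.tool_name, at: Date.now() })
      },
    ]
  }
  return { hooks, log }
}

const audit = makeAuditHooks(['PreToolUse', 'PostToolUse', 'PostToolUseFailure', 'PermissionDenied'])
// audit.hooks would be passed as the hooks option; audit.log fills up as the run proceeds.
```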

Skills

Skills are Markdown prompt templates the agent can invoke as slash commands. Set skills: true to auto-load .claude/skills/*.md from the workspace, or pass your own Skill[].

import { query, loadSkillsFromFs } from 'anyclaude-sdk'

// auto-load from .claude/skills/*.md
query({ prompt, workspace, llm, skills: true })

// or load explicitly / supply your own
const skills = await loadSkillsFromFs(workspace)
query({ prompt, workspace, llm, skills })

Each loaded skill ({ name, description, argumentHint, prompt }) appears as /name and expands its template into the conversation.
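Because each skill carries a name, description, and argument hint, a UI can build a slash-command menu straight from the Skill[] array. The sketch below assumes `argumentHint` is optional and uses a local `Skill` type in place of the SDK's; `skillMenu` is a hypothetical helper.

```typescript
// Local stand-in for the SDK's Skill type, using the fields named in the text.
type Skill = { name: string; description: string; argumentHint?: string; prompt: string }

// Produce one menu line per skill, sorted by name, e.g. "/review <path>: Review a diff".
function skillMenu(skills: Skill[]): string[] {
  return skills
    .slice() // copy so the caller's array order is left untouched
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((s) => `/${s.name}${s.argumentHint ? ' ' + s.argumentHint : ''}: ${s.description}`)
}

const exampleSkills: Skill[] = [
  { name: 'review', description: 'Review a diff', argumentHint: '<path>', prompt: 'Review the changes.' },
  { name: 'changelog', description: 'Draft a changelog', prompt: 'Summarize recent commits.' },
]
console.log(skillMenu(exampleSkills).join('\n'))
```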

Memory

A persistent MemoryStore whose entries are rendered into the system prompt each run and are editable by the agent via memory tools (memory_write / memory_list / memory_delete). Use it for durable facts, preferences, and project context that should survive across sessions.

import { query, MemoryStore } from 'anyclaude-sdk'

const memory = new MemoryStore(/* options */)
query({ prompt, workspace, llm, memory })   // entries load into the prompt; tools can edit them

Message queue

Type while the agent works. Push follow-up messages into a live run with MessageQueue; the loop drains them one per turn boundary — each queued message is injected as a user turn before the next LLM call, and a turn that would otherwise end keeps going while the queue is non-empty.

import { query, MessageQueue } from 'anyclaude-sdk'

const queue = new MessageQueue()
const run = query({ prompt: 'build the landing page', workspace, llm, messageQueue: queue })

// …meanwhile, the user types a follow-up; it's delivered at the next turn boundary
const id = queue.push('also add a dark mode toggle')   // returns a stable id

// changed your mind before it's drained? cancel just that one (e.g. a per-pill ✕):
queue.remove(id)                                       // → true if still pending
queue.onChange((size) => renderPills(queue.list()))    // list() items carry { id, content, at }

for await (const m of run) { /* render */ }

push returns a stable id; remove(id) cancels a single still-pending message (returns false for unknown / already-drained ids). list() snapshots pending items as { id, content, at } for rendering removable chips — alongside shift / clear / peek / size.
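The push / remove / list semantics described above can be modeled in a few lines. `MiniQueue` below is an illustration of that contract, not the SDK's MessageQueue: the id format and whether `size` is a getter or a method are assumptions.

```typescript
type Pending = { id: string; content: string; at: number }

class MiniQueue {
  private items: Pending[] = []
  private listeners: Array<(size: number) => void> = []
  private nextId = 1

  // Enqueue a message and return an id that stays valid until it is drained or removed.
  push(content: string): string {
    const id = `q${this.nextId++}`
    this.items.push({ id, content, at: Date.now() })
    this.emit()
    return id
  }

  // Cancel one pending message; false means unknown or already drained.
  remove(id: string): boolean {
    const index = this.items.findIndex((p) => p.id === id)
    if (index === -1) return false
    this.items.splice(index, 1)
    this.emit()
    return true
  }

  // What the loop would call at a turn boundary: take the oldest message, if any.
  shift(): Pending | undefined {
    const head = this.items.shift()
    if (head) this.emit()
    return head
  }

  // Snapshot for rendering; copies keep the UI from mutating queue state.
  list(): Pending[] {
    return this.items.map((p) => ({ ...p }))
  }

  get size(): number {
    return this.items.length
  }

  onChange(fn: (size: number) => void): void {
    this.listeners.push(fn)
  }

  private emit(): void {
    for (const fn of this.listeners) fn(this.items.length)
  }
}
```

Draining one message per boundary means a removed message never reaches the model as long as `remove` runs before the boundary at which it would be delivered.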

Slash commands

Built-in commands run inside the loop when a user turn starts with /; their output arrives as a system message with subtype: 'local_command_output'. Merge your own with commands.

BUILTIN_COMMANDS includes: /help, /clear, /compact, /tools, /cost, /model, /sessions, /resume, /rename, /diff, /tasks, /board, /context, /files, /agents, /mcp, /permissions, /memory, /export, /review, /init.

import { query, BUILTIN_COMMANDS } from 'anyclaude-sdk'

query({
  prompt: '/cost',
  workspace, llm,
  commands: [
    { name: 'deploy', description: 'Ship to prod', run: async (ctx) => ({ systemText: 'Deploying…' }) },
  ],
})
// handle the result:
//   if (m.type === 'system' && m.subtype === 'local_command_output') render(m.content)
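For intuition, the dispatch step can be sketched as parsing the leading / and looking the name up in the merged command list. This is a simplified model, not the loop's actual code: the `{ args }` context shape and the unknown-command message are assumptions, while the `{ systemText }` result follows the example above.

```typescript
type CommandResult = { systemText: string }
type Command = {
  name: string
  description: string
  run: (ctx: { args: string }) => Promise<CommandResult> // ctx shape is assumed
}

// "/deploy staging now" -> { name: 'deploy', args: 'staging now' }; plain text -> null.
function parseSlash(text: string): { name: string; args: string } | null {
  const m = /^\/([\w-]+)(?:\s+([\s\S]*))?$/.exec(text.trim())
  return m ? { name: m[1], args: (m[2] ?? '').trim() } : null
}

// Returns the command's output, or null when the turn is not a slash command.
async function dispatch(text: string, commands: Command[]): Promise<string | null> {
  const parsed = parseSlash(text)
  if (!parsed) return null // an ordinary user turn goes to the model instead
  const command = commands.find((c) => c.name === parsed.name)
  if (!command) return `Unknown command: /${parsed.name}`
  const result = await command.run({ args: parsed.args })
  return result.systemText
}
```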

Cost & usage

Every run ends with a result message carrying usage (input / output / cache-read / cache-creation tokens) and total_cost_usd, computed from a built-in pricing table covering the Claude, GPT, and Grok families (and free/local endpoints at $0).

for await (const m of query({ prompt, workspace, llm })) {
  if (m.type === 'result') {
    console.log(m.usage)            // { input_tokens, output_tokens, cache_read_input_tokens, … }
    console.log('$', m.total_cost_usd)
  }
}
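Since each run reports its own usage and total_cost_usd, tracking spend across several runs is a matter of summing result messages. The `CostMeter` class below is a hypothetical helper with a local `ResultMsg` type; it assumes every usage field is numeric.

```typescript
// Local stand-ins based on the fields shown above; all usage values are assumed numeric.
type Usage = { input_tokens: number; output_tokens: number; [key: string]: number }
type ResultMsg = { type: 'result'; usage: Usage; total_cost_usd: number }

class CostMeter {
  totalUsd = 0
  tokens: Record<string, number> = {}
  private budgetUsd: number

  constructor(budgetUsd: number) {
    this.budgetUsd = budgetUsd
  }

  // Fold one run's result message into the running totals.
  add(message: ResultMsg): void {
    this.totalUsd += message.total_cost_usd
    for (const key of Object.keys(message.usage)) {
      this.tokens[key] = (this.tokens[key] ?? 0) + message.usage[key]
    }
  }

  get overBudget(): boolean {
    return this.totalUsd > this.budgetUsd
  }
}
```

Inside the loop above, `if (m.type === 'result') meter.add(m)` would keep the meter current, and `meter.overBudget` could gate whether another run starts.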

Auto-compaction

Long runs can summarize themselves before hitting the context window. Enable autoCompact; tune the window and trigger point. It runs PreCompact/PostCompact hooks and emits compact_boundary system messages around the work.

query({
  prompt, workspace, llm,
  autoCompact: true,
  contextLimit: 200_000,      // tokens; defaults to the model's window
  compactThreshold: 0.8,      // compact when the transcript passes 80% of the limit
})
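The trigger point is simple arithmetic: the limit times the threshold. The helper names below are hypothetical, and the sketch assumes "passes" means strictly greater than.

```typescript
// Tokens at which compaction would start, e.g. 200_000 * 0.8.
function compactTrigger(contextLimit: number, compactThreshold: number): number {
  return contextLimit * compactThreshold
}

// True once the transcript has passed the trigger point.
function shouldCompact(transcriptTokens: number, contextLimit: number, compactThreshold: number): boolean {
  return transcriptTokens > compactTrigger(contextLimit, compactThreshold)
}

// Tokens still available before compaction kicks in (never negative).
function headroom(transcriptTokens: number, contextLimit: number, compactThreshold: number): number {
  return Math.max(0, Math.floor(compactTrigger(contextLimit, compactThreshold)) - transcriptTokens)
}

console.log(shouldCompact(150_000, 200_000, 0.8)) // below the trigger
console.log(shouldCompact(170_000, 200_000, 0.8)) // above the trigger
```

Lowering compactThreshold trades more frequent summarization for a larger safety margin before the window fills.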

Live compaction marker 0.6.2

Summarization can take seconds, so auto-compaction emits the boundary twice: status: 'start' before it begins (show a live "compacting…" shimmer) and status: 'end' after (swap to a retroactive marker with the token reduction). pre_tokens / post_tokens give the before/after estimates.

for await (const m of run) {
  if (m.type === 'system' && m.subtype === 'compact_boundary') {
    const { status, pre_tokens, post_tokens } = m.compact_metadata
    if (status === 'start') showCompactingShimmer()                    // live, during the LLM call
    else hideShimmer(`compacted ${pre_tokens} → ${post_tokens} tokens`) // status 'end' (or absent on old streams)
  }
}

Compaction is a turn-boundary operation, so it never interrupts a tool mid-flight. Treat an absent status as 'end' (pre-0.6.2 streams only emitted the post-compaction boundary). Pair it with a persistent session for unbounded multi-turn runs.
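The start/end/absent handling can be isolated in a pure function that maps compact_metadata to UI state, which keeps the rendering code trivial. The `MarkerState` shape and the percentage label are illustrative choices, and the fields are typed as optional because the text does not say which ones are present on the 'start' boundary.

```typescript
type CompactMeta = { status?: 'start' | 'end'; pre_tokens?: number; post_tokens?: number }
type MarkerState = { kind: 'compacting' } | { kind: 'compacted'; label: string }

function markerFor(meta: CompactMeta): MarkerState {
  if (meta.status === 'start') return { kind: 'compacting' }
  // status 'end', or absent on pre-0.6.2 streams: show the retroactive marker.
  const pre = meta.pre_tokens ?? 0
  const post = meta.post_tokens ?? 0
  const saved = pre > 0 ? Math.round((1 - post / pre) * 100) : 0
  return { kind: 'compacted', label: `compacted ${pre} → ${post} tokens (${saved}% smaller)` }
}

console.log(markerFor({ status: 'end', pre_tokens: 160000, post_tokens: 40000 }))
```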

Token & latency tuning 0.8–0.9

Opt-in knobs to make the loop cheaper and faster — biggest impact on weak / uncached models (where every fixed token is paid each turn):

query({
  prompt, workspace, llm,
  systemPromptPreset: 'lean',       // ~70% shorter built-in prompt (paid every turn on uncached models)
  keepToolResults: 6,               // context editing: stub tool_results older than the last 6 — caps transcript growth
  parallelToolExecution: true,      // run a turn's read-only tool calls concurrently (~2× on multi-read turns)
  deferredTools: ['stripe_charge', 'supabase_query', /* … */], // keep niche tools out of the payload
})

Each knob and its effect:

systemPromptPreset: 'lean'

Swaps the full Claude-Code contract for a compact one. Per-turn savings on uncached endpoints.

keepToolResults: N

Keeps the most recent N tool_result messages verbatim; older ones become a stub. Long-run growth cap.

parallelToolExecution

Concurrently executes a turn's calls when all are read-only / parallelSafe server tools (mutating/bash/delegated stay serial; order preserved).

deferredTools / defineTool({ defer })

Large tool pools: register all, send a lean core; tool_search arms a tool on demand. Register 35, send ~10.

Mark a custom read tool concurrency-safe with defineTool({ parallelSafe: true }). All four are correctness-preserving — they trim cost/latency, not behavior.
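To show what stubbing older tool results might look like, here is a sketch that keeps the newest N tool_result blocks and replaces the content of older ones. It is not the SDK's implementation: the stub text is hypothetical, and counting individual blocks rather than whole messages is an assumption.

```typescript
type Block = { type: string; content?: unknown; [key: string]: unknown }
type ChatMsg = { role: string; content: string | Block[] }

const STUB = '[tool result elided]' // hypothetical placeholder text

// Returns a new transcript; the input array and its messages are left untouched.
function stubOldToolResults(messages: ChatMsg[], keep: number): ChatMsg[] {
  let seen = 0 // tool_result blocks encountered so far, counted from the end
  const out: ChatMsg[] = []
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i]
    if (typeof msg.content === 'string') {
      out.unshift(msg)
      continue
    }
    // Visit blocks newest-first so the most recent results are the ones kept.
    const blocks = msg.content
      .slice()
      .reverse()
      .map((b): Block => {
        if (b.type !== 'tool_result') return b
        seen += 1
        return seen <= keep ? b : { ...b, content: STUB }
      })
      .reverse()
    out.unshift({ ...msg, content: blocks })
  }
  return out
}
```

Only the content field is replaced, so ids linking a tool_result to its tool_use stay intact in the stubbed blocks.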

Telemetry 0.7.0

The SDK can emit anonymous, aggregate usage telemetry — one run event per query() carrying only sdk_version, runtime, a random non-identifying install id, a coarse model_family bucket, and feature booleans. It never sends repo URLs, project names, paths, source, prompts, tool args, LLM responses, API keys, or endpoints — track() whitelists keys and value types and drops everything else.

Telemetry is on by default and can be turned off with a single switch. The default collector is an aggregate-only Puter Worker (https://anyclaude-telemetry.puter.work); you can repoint or disable it:

ANYCLAUDE_TELEMETRY=0      # or DO_NOT_TRACK=1; auto-off under CI
ANYCLAUDE_TELEMETRY_URL=https://your-collector/   # send to your own collector instead
query({ /* … */, disableTelemetry: true })                 // off for this run
query({ /* … */, telemetry: { url: 'https://your-collector/' } })  // or point it explicitly
// browser opt-out: localStorage.setItem('anyclaude_telemetry','0')

Every field is documented in TELEMETRY.md, and a reference aggregate-only collector (Cloudflare/Vercel/Deno/Puter) is in examples/telemetry-collector. The model is opt-out, anonymous, and DO_NOT_TRACK-aware, the same approach Next.js and Astro use.
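The whitelist approach attributed to track() can be illustrated with a small filter that keeps a key only when both its name and its value type are expected. This is a sketch, not the SDK's code: the `install_id` key name and the `feature_` prefix for feature booleans are assumptions, and TELEMETRY.md remains the authority on the real field list.

```typescript
type Kind = 'string' | 'boolean'

// Assumed key names, based on the fields listed in the text.
const ALLOWED = new Map<string, Kind>([
  ['sdk_version', 'string'],
  ['runtime', 'string'],
  ['install_id', 'string'],
  ['model_family', 'string'],
])

// Keep a key only if it is whitelisted and its value has the expected primitive type.
function sanitize(event: Record<string, unknown>): Record<string, string | boolean> {
  const out: Record<string, string | boolean> = {}
  for (const key of Object.keys(event)) {
    const value = event[key]
    // Feature flags are assumed to share a "feature_" prefix and must be booleans.
    const expected: Kind | undefined = key.startsWith('feature_') ? 'boolean' : ALLOWED.get(key)
    if (expected !== undefined && typeof value === expected) {
      out[key] = value as string | boolean
    }
  }
  return out
}
```

Dropping by default means a new field added elsewhere in the codebase cannot leak into an event unless someone also extends the whitelist.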