Sandbox Agent

Why `SandboxAgent`?

Most agents are stateless: each run() starts from scratch. Many real workloads need state that carries over between runs:

Code agents that iteratively edit a file, run tests, fix the file, run tests again.
Research agents that take notes to disk across many turns.
Migration agents that clone a repo, transform files, commit, push.
Multi-day investigations that need to resume right where they left off.

SandboxAgent provides a persistent workspace (an isolated filesystem, shell, and git checkout) that survives across runs and can be snapshotted and restored.

Architecture

                    ┌─────────────────────────────────────┐
                    │           SandboxAgent              │
                    │                                     │
                    │  ┌────────────────────────────┐    │
                    │  │  WorkspaceManifest         │    │
                    │  │   • files (seeded)         │    │
                    │  │   • gitClones              │    │
                    │  │   • env                    │    │
                    │  └────────────────────────────┘    │
                    │                                     │
                    │  ┌────────────────────────────┐    │
                    │  │  Backend                   │    │
                    │  │   "unix-local"  - tempdir  │    │
                    │  │   "docker"      - container│    │
                    │  │   "remote"      - CloudSbx │    │
                    │  └────────────────────────────┘    │
                    └─────────────────────────────────────┘
                                     ▲
                                     │
                       snapshot()  ──┴──  resume(snapshot)
                       (capture entire workspace)

Backends

| Backend | Implementation | Best for | Deps |
| --- | --- | --- | --- |
| `"unix-local"` (default) | `child_process.spawn` in a system tempdir | Local dev, trusted CI | none |
| `"docker"` | Bind-mount tempdir into a container | Untrusted code, host isolation | `npm i dockerode` |
| `"remote"` | Delegates to a `CloudSandbox` (E2B / Daytona) | Production, multi-user, regulated | `npm i @e2b/sdk` or `@daytonaio/sdk` |

Quick start

import { SandboxAgent } from "@agentium/core";

const agent = new SandboxAgent({
  backend: "unix-local",
  workspace: {
    env: { OPENAI_API_KEY: process.env.OPENAI_API_KEY! },
    files: [
      { path: "data.csv",    contents: "name,score\nalice,9\nbob,7\n" },
      { path: "src/main.ts", contents: 'console.log("hi");' },
    ],
    gitClones: [
      { repo: "https://github.com/agentiumOS/example-skill.git", path: "vendor/skill", ref: "v1.0.0" },
    ],
  },
});

await agent.start();

const r = await agent.run("console.log(require('fs').readdirSync('.'))", { language: "node" });
console.log(r.output);

await agent.close();

API

Constructor

interface SandboxAgentConfig {
  backend: "unix-local" | "docker" | "remote";
  remote?: CloudSandbox;          // required when backend === "remote"
  workspace?: WorkspaceManifest;
  dockerImage?: string;           // default "node:20-alpine"; only used for backend "docker"
}

interface WorkspaceManifest {
  files?: WorkspaceFile[];        // seeded into the workspace at start()
  gitClones?: { repo: string; path: string; ref?: string }[];
  env?: Record<string, string>;   // exposed to every run() / shell() call
}

interface WorkspaceFile {
  path: string;                   // relative to workspace root
  contents: string;               // utf-8 or base64 (per encoding field)
  encoding?: "utf8" | "base64";   // default "utf8"
}
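
For binary assets, the encoding field implies that contents carries base64 text. A minimal sketch of building and decoding such an entry with Node's Buffer follows; the WorkspaceFile shape is copied from above, while the helper names binaryFile and decodeFile are hypothetical and not part of the library:

```typescript
// Shape copied from the WorkspaceFile interface above.
interface WorkspaceFile {
  path: string;
  contents: string;
  encoding?: "utf8" | "base64";
}

// Hypothetical helper: wrap raw bytes as a base64-encoded workspace file.
function binaryFile(path: string, bytes: Uint8Array): WorkspaceFile {
  return { path, contents: Buffer.from(bytes).toString("base64"), encoding: "base64" };
}

// Hypothetical helper: recover the raw bytes, honoring the default "utf8".
function decodeFile(file: WorkspaceFile): Buffer {
  return Buffer.from(file.contents, file.encoding ?? "utf8");
}
```

An entry produced by binaryFile can be placed directly in the files array of a WorkspaceManifest.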

Methods

`start(): Promise<void>`

Creates the workspace (tempdir or remote session), writes seeded files, runs gitClones. Idempotent — calling twice is a no-op. For backend: "remote", this also calls remote.start() and writes the seeded files into the remote sandbox via remote.writeFile().

`run(code, options?): Promise<SandboxRunResult>`

Execute code in the workspace. The language option picks the interpreter:

"node" (default): node -e "${code}"
"python": python3 -c "${code}"
"shell": passes code directly to /bin/sh -c

For backend: "remote", delegates to remote.run(code, options).
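
The language mapping above could be sketched as follows. This is an illustration rather than the library's implementation: interpreterArgv is a hypothetical name, and the code is passed as its own argv element instead of being interpolated into a quoted string, which avoids quoting problems when the code itself contains double quotes.

```typescript
type Language = "node" | "python" | "shell";

// Hypothetical helper: pick the interpreter and argument list for a snippet.
function interpreterArgv(code: string, language: Language = "node"): [string, string[]] {
  switch (language) {
    case "node":
      return ["node", ["-e", code]];
    case "python":
      return ["python3", ["-c", code]];
    case "shell":
      return ["/bin/sh", ["-c", code]];
  }
}
```

The returned pair can be handed to child_process.spawn(command, args) without going through a shell.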

const r = await agent.run("import math; print(math.pi)", {
  language: "python",
  timeoutSeconds: 30,
  env: { LOG_LEVEL: "debug" },
});
console.log(r.output);   // "3.141592653589793\n"
console.log(r.exitCode); // 0
console.log(r.timedOut); // false

If the command exceeds timeoutSeconds, the child is killed with SIGKILL and the result has timedOut: true, exitCode: 124.
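
To make the timeout contract concrete, here is a minimal sketch of how such behavior can be built on child_process.spawn. It is an assumption about the mechanism, not the library's source; runWithTimeout and RunResult are hypothetical names. Exit code 124 matches the convention used by the GNU timeout utility.

```typescript
import { spawn } from "node:child_process";

interface RunResult {
  output: string;
  exitCode: number;
  timedOut: boolean;
}

// Sketch: run a command, SIGKILL it once the deadline passes,
// and report exit code 124 for a timed-out run.
function runWithTimeout(cmd: string, args: string[], timeoutSeconds: number): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] });
    let output = "";
    let timedOut = false;

    // Merge stdout and stderr into a single output string.
    child.stdout.on("data", (chunk: Buffer) => (output += chunk.toString()));
    child.stderr.on("data", (chunk: Buffer) => (output += chunk.toString()));

    const timer = setTimeout(() => {
      // kill() returns false if the process has already exited.
      timedOut = child.kill("SIGKILL");
    }, timeoutSeconds * 1000);

    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      resolve({ output, timedOut, exitCode: timedOut ? 124 : (code ?? 1) });
    });
  });
}
```

Note that SIGKILL only reaches the direct child; a command that forks further processes may leave them running unless the whole process group is killed.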

`shell(command, options?): Promise<SandboxRunResult>`

Same as run(command, { language: "shell" }) but more explicit:

await agent.shell("git status && ls -la");

`writeFile(path, contents, encoding?): Promise<void>`

Writes a file in the workspace. Creates parent directories automatically. For backend: "remote", delegates to remote.writeFile().

`readFile(path, encoding?): Promise<string | null>`

Reads a file. Returns null if the file doesn’t exist. For binary files, pass encoding: "base64".
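
The "null instead of throwing" contract can be sketched with Node's fs promises API as below. This is an assumed implementation strategy, with readOrNull as a hypothetical name; only a missing file maps to null, and every other error is rethrown.

```typescript
import { promises as fs } from "node:fs";

// Sketch: return the file contents, or null when the file does not exist.
async function readOrNull(
  file: string,
  encoding: "utf8" | "base64" = "utf8",
): Promise<string | null> {
  try {
    return await fs.readFile(file, encoding);
  } catch (err) {
    // ENOENT means "no such file or directory"; anything else is a real failure.
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return null;
    throw err;
  }
}
```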

`snapshot(): Promise<WorkspaceSnapshot>`

Captures the full workspace state (every file, base64-encoded, plus the env vars) and returns it as a plain object you can serialize and store.

import { promises as fs } from "node:fs";

const snap = await agent.snapshot();
await fs.writeFile("snapshot.json", JSON.stringify(snap));

interface WorkspaceSnapshot {
  takenAt: number;
  files: WorkspaceFile[];
  env: Record<string, string>;
}

For backend: "remote", snapshotting is provider-specific and currently returns an empty file list (use the cloud provider’s native snapshot API instead).
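
For the local backends, capturing a workspace amounts to walking the directory and base64-encoding each file. The sketch below shows one way to produce a WorkspaceSnapshot-shaped object. It is not the library's implementation: snapshotDir is a hypothetical name, and treating takenAt as epoch milliseconds is an assumption, since the unit is not stated above.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

interface WorkspaceFile { path: string; contents: string; encoding?: "utf8" | "base64"; }
interface WorkspaceSnapshot { takenAt: number; files: WorkspaceFile[]; env: Record<string, string>; }

// Sketch: recursively collect every regular file under `root` as base64.
async function snapshotDir(root: string, env: Record<string, string> = {}): Promise<WorkspaceSnapshot> {
  const files: WorkspaceFile[] = [];

  async function walk(dir: string): Promise<void> {
    for (const entry of await fs.readdir(dir, { withFileTypes: true })) {
      const abs = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        await walk(abs);
      } else if (entry.isFile()) {
        const bytes = await fs.readFile(abs);
        files.push({
          // Store workspace-relative paths with forward slashes.
          path: path.relative(root, abs).split(path.sep).join("/"),
          contents: bytes.toString("base64"),
          encoding: "base64",
        });
      }
    }
  }

  await walk(root);
  // Assumption: takenAt is epoch milliseconds.
  return { takenAt: Date.now(), files, env: { ...env } };
}
```

Symlinks and other special files are skipped in this sketch, and the whole tree is held in memory, so large workspaces would need a streaming or archive-based approach.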

`resume(snapshot): Promise<void>`

Restores a workspace from a snapshot. It behaves like start() on a fresh agent, except that the workspace files are materialized from the snapshot.

const next = new SandboxAgent({ backend: "unix-local" });
await next.resume(snap);
// Files are restored. Previous tempdir is unrelated.
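
Materializing a snapshot is the inverse of capturing it: decode each entry and write it under a new root. A minimal sketch under that assumption, with restoreInto as a hypothetical name:

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

interface WorkspaceFile { path: string; contents: string; encoding?: "utf8" | "base64"; }

// Sketch: write each snapshot file under `root`, creating parent directories.
async function restoreInto(root: string, files: WorkspaceFile[]): Promise<void> {
  for (const file of files) {
    const target = path.join(root, file.path);
    await fs.mkdir(path.dirname(target), { recursive: true });
    // Decode according to the entry's encoding; "utf8" is the default.
    await fs.writeFile(target, Buffer.from(file.contents, file.encoding ?? "utf8"));
  }
}
```

This sketch trusts the paths stored in the snapshot; a snapshot loaded from an untrusted source should have its paths validated first.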

`close(): Promise<void>`

Removes the local tempdir (unix-local / docker) or calls remote.close(). Always call this in a finally block.
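
One way to make the finally rule hard to forget is a small wrapper that owns the lifecycle. The sketch below is hypothetical (withWorkspace is not a library export) and is typed against a minimal interface containing only the start() and close() methods documented here, so it also works with a test double.

```typescript
// Minimal stand-in for the two lifecycle methods used here.
interface Workspace {
  start(): Promise<void>;
  close(): Promise<void>;
}

// Hypothetical helper: start the workspace, run `fn`, and always close it,
// even when `fn` throws.
async function withWorkspace<W extends Workspace, T>(
  ws: W,
  fn: (ws: W) => Promise<T>,
): Promise<T> {
  await ws.start();
  try {
    return await fn(ws);
  } finally {
    await ws.close();
  }
}
```

With a real agent this would read withWorkspace(agent, (a) => a.shell("ls")); the tempdir is removed whether the callback resolves or rejects.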

`ready: boolean`

true after start() succeeds; false after close(). Read-only.

Compose with `CloudSandbox`

The killer combo is SandboxAgent + CloudSandbox — a persistent workspace in a hardened cloud VM:

import { E2BSandbox, SandboxAgent } from "@agentium/core";

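// Example CSV payload (same contents as data.csv in the quick start).
const csvData = "name,score\nalice,9\nbob,7\n";

const remote = new E2BSandbox({ template: "data-science" });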

const agent = new SandboxAgent({
  backend: "remote",
  remote,
  workspace: {
    files: [{ path: "data.csv", contents: csvData }],
  },
});

await agent.start();
const r = await agent.run("import pandas; print(pandas.read_csv('data.csv').describe())", { language: "python" });
console.log(r.output);
await agent.close();

Compose with `Agent`

SandboxAgent is not itself an LLM-driven agent — it’s a workspace. Plug it into a regular Agent by exposing its methods as tools:

import { Agent, defineTool, openai, SandboxAgent } from "@agentium/core";
import { z } from "zod";

const sandbox = new SandboxAgent({ backend: "unix-local" });
await sandbox.start();

const tools = [
  defineTool({
    name: "shell",
    description: "Run a shell command in the workspace.",
    parameters: z.object({ command: z.string() }),
    execute: async ({ command }) => {
      const r = await sandbox.shell(command);
      return JSON.stringify(r);
    },
  }),
  defineTool({
    name: "writeFile",
    description: "Write a file at the given workspace path.",
    parameters: z.object({ path: z.string(), contents: z.string() }),
    execute: async ({ path, contents }) => {
      await sandbox.writeFile(path, contents);
      return "ok";
    },
  }),
];

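const agent = new Agent({ name: "code-bot", model: openai("gpt-4o"), tools });
try {
  await agent.run("Create a Node script that prints hello.");
} finally {
  // Always release the workspace, even if the agent run fails.
  await sandbox.close();
}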

(A higher-level helper createSandboxTools(sandbox) may land in a future release; for now wire them yourself.)

Persistence across processes

A common pattern: a long-running investigation where each user turn is a separate process.

// Turn 1
const agent = new SandboxAgent({ backend: "unix-local", workspace: {...} });
await agent.start();
// ... do work ...
const snap = await agent.snapshot();
await redis.set(`session:${id}:snapshot`, JSON.stringify(snap));
await agent.close();

// Turn 2 (different process)
const raw = await redis.get(`session:${id}:snapshot`);
if (raw === null) throw new Error(`no snapshot stored for session ${id}`);
const next = new SandboxAgent({ backend: "unix-local" });
await next.resume(JSON.parse(raw));
// ... continue work ...

Failure modes

| Situation | Behavior |
| --- | --- |
| `start()` before workspace deps available | Throws, e.g. `"dockerode is required"` if `backend: "docker"` and `dockerode` is not installed |
| `run()` / `shell()` exceeds `timeoutSeconds` | Returns `{ timedOut: true, exitCode: 124, output }` |
| `readFile` on missing path | Returns `null` |
| `writeFile` outside workspace tempdir | Path is joined with the workspace root via `path.join`. Note that `path.join` normalizes `..` segments rather than rejecting them, so only the OS file permissions of your process limit where such a write lands; validate untrusted paths before calling `writeFile` |
| `close()` called twice | No-op the second time |
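
Because path.join normalizes .. segments instead of rejecting them, callers that accept paths from an LLM or a user may want an explicit containment check before calling writeFile. A minimal sketch follows; resolveInside is a hypothetical helper, not a library export.

```typescript
import * as path from "node:path";

// Hypothetical guard: resolve `relPath` under `root` and reject anything
// that would land outside of it.
function resolveInside(root: string, relPath: string): string {
  const base = path.resolve(root);
  const target = path.resolve(base, relPath);
  const rel = path.relative(base, target);
  // A relative path that climbs out of `base` starts with "..";
  // on Windows a different drive yields an absolute path instead.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes workspace: ${relPath}`);
  }
  return target;
}
```

This check is purely lexical: it does not follow symlinks, so a symlink inside the workspace that points elsewhere would still need a realpath-based check.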
