Session Checkpointing
Pause checkpoints with conversation tails; full automatic resume is planned
How Checkpoints Work
The runtime currently implements session pause/checkpoint creation. A checkpoint captures constraint state, active agent handles, and per-agent conversation tails so a later resume implementation has the data it needs to reconstruct context.
Full automatic resume is still a roadmap item. This page separates what exists today from the intended restore flow so operators do not mistake checkpoint support for complete pause/resume orchestration.
When the engine pauses a session, it serializes a SessionCheckpoint that captures the current state needed for a future restore path.
SessionCheckpoint
| Field | Type | Description |
|---|---|---|
| sessionId | string | Identifier of the paused session |
| constraintState | object | Budget spent, rounds completed, and time elapsed at pause time |
| activeAgents | AgentCheckpoint[] | One checkpoint record per agent that was running |
| roundsCompleted | number | Total deliberation rounds completed before pause |
| pendingDelegations | object[] | Delegations that were in flight at pause time |
| transcriptReplayDepth | number | How many tail events to replay per agent on resume |
| createdAt | string | ISO timestamp of when the checkpoint was written |
AgentCheckpoint
| Field | Type | Description |
|---|---|---|
| agentId | string | The agent's unique identifier |
| parentAgentId | string \| undefined | Set for child agents in a hierarchical delegation tree |
| depth | number | Nesting depth (0 for top-level perspective agents) |
| conversationTail | TranscriptEvent[] | The last N transcript events where this agent was involved |
| expertiseSnapshot | string \| undefined | Full contents of the agent's scratch pad at pause time |
Note
The checkpoint is stored as a JSON blob associated with the session record. Only one checkpoint exists per session at a time — pausing again overwrites the previous checkpoint.
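The two tables and the storage note can be read as the following TypeScript sketch. Field names and types come from the tables above; the `TranscriptEvent` shape and the `InMemoryCheckpointStore` class are hypothetical stand-ins, since the real event type and session store are not documented on this page.

```typescript
// Hypothetical minimal event shape; the real TranscriptEvent is not shown here.
interface TranscriptEvent {
  type: string;
  agentId?: string;
}

interface AgentCheckpoint {
  agentId: string;
  parentAgentId?: string; // set only for child agents
  depth: number; // 0 for top-level perspective agents
  conversationTail: TranscriptEvent[];
  expertiseSnapshot?: string;
}

interface SessionCheckpoint {
  sessionId: string;
  constraintState: Record<string, unknown>;
  activeAgents: AgentCheckpoint[];
  roundsCompleted: number;
  pendingDelegations: Record<string, unknown>[];
  transcriptReplayDepth: number;
  createdAt: string; // ISO timestamp
}

// Hypothetical in-memory stand-in for the session store.
class InMemoryCheckpointStore {
  private blobs = new Map<string, string>();

  save(checkpoint: SessionCheckpoint): void {
    // One JSON blob per session: saving again replaces the previous blob.
    this.blobs.set(checkpoint.sessionId, JSON.stringify(checkpoint));
  }

  load(sessionId: string): SessionCheckpoint | undefined {
    const blob = this.blobs.get(sessionId);
    return blob === undefined ? undefined : (JSON.parse(blob) as SessionCheckpoint);
  }
}
```

Keying the map by `sessionId` is what gives the overwrite-on-second-pause behavior described in the note; a store that needed checkpoint history would have to key on something else.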
Per-Agent Conversation Tails
Each agent receives its own filtered slice of the transcript, not a copy of the full session history. The tail contains the last N events where the agent was meaningfully involved.
An event is included in an agent's tail if any of the following are true:
- The event's agentId matches this agent
- The event's childAgentId matches this agent
- The event is a delegation whose target is this agent
The default tail depth is 50 events. This is configurable via transcriptReplayDepth in the execution profile. Shorter tails reduce prompt size and cost on resume; longer tails preserve more deliberation nuance.
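A minimal sketch of the inclusion rule and the depth cutoff, assuming a simplified event shape. The `type` value `"delegation"` and the `targetAgentId` field are hypothetical names for "a delegation whose target is this agent"; the real transcript schema may differ.

```typescript
// Assumed event shape for illustration only.
interface TranscriptEvent {
  type: string;
  agentId?: string;
  childAgentId?: string;
  targetAgentId?: string; // hypothetical: target of a delegation event
}

const DEFAULT_TAIL_DEPTH = 50;

// True if any of the three inclusion rules applies to this agent.
function involvesAgent(event: TranscriptEvent, agentId: string): boolean {
  return (
    event.agentId === agentId ||
    event.childAgentId === agentId ||
    (event.type === "delegation" && event.targetAgentId === agentId)
  );
}

// Returns the last `depth` events involving the agent, oldest first.
function extractConversationTail(
  transcript: TranscriptEvent[],
  agentId: string,
  depth: number = DEFAULT_TAIL_DEPTH,
): TranscriptEvent[] {
  if (depth <= 0) return []; // slice(-0) would return the whole array
  const involved = transcript.filter((e) => involvesAgent(e, agentId));
  return involved.slice(-depth);
}
```

Filtering before slicing matters: the depth counts events that involve the agent, not positions in the full transcript, so a quiet agent still gets up to N of its own events.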
Context Isolation
The filtering ensures agents do not receive unrelated content from other agents' delegations or file operations. A cost analyst agent waking up after a pause sees its own reasoning history, not the legal agent's domain access log.
Pause and Planned Resume Flow
Pausing
- Engine.pauseSession() is called, either by the harness when a constraint is hit, by a gate result, or by an explicit API call.
- The engine captures current constraint state (budget spent, rounds completed, elapsed time) and collects handles for all active agents.
- Per-agent conversation tails are extracted by filtering the full transcript for each agent's involvement.
- Each agent's expertise scratch pad is read and stored in the corresponding AgentCheckpoint.
- The SessionCheckpoint is serialized and written to the session store.
- A session_paused transcript event is emitted with the checkpoint timestamp.
- All agent processes are torn down cleanly.
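The ordering of the pause steps can be sketched as a single function with its collaborators injected. This is not the engine's actual `Engine.pauseSession()` implementation: the `PauseDeps` interface, the `AgentHandle` type, and the shape of `constraintState` are all assumptions made so the sequence can be shown in isolation.

```typescript
// Hypothetical handle for a running agent.
interface AgentHandle {
  agentId: string;
  parentAgentId?: string;
  depth: number;
}

interface AgentCheckpoint extends AgentHandle {
  conversationTail: unknown[];
  expertiseSnapshot: string | undefined;
}

interface SessionCheckpoint {
  sessionId: string;
  constraintState: { budgetSpent: number; roundsCompleted: number; elapsedMs: number };
  activeAgents: AgentCheckpoint[];
  roundsCompleted: number;
  pendingDelegations: object[];
  transcriptReplayDepth: number;
  createdAt: string;
}

// Hypothetical collaborators; the real engine wires these internally.
interface PauseDeps {
  constraintState(): SessionCheckpoint["constraintState"];
  activeAgents(): AgentHandle[];
  pendingDelegations(): object[];
  tailFor(agentId: string, depth: number): unknown[];
  readScratchPad(agentId: string): string | undefined;
  save(checkpoint: SessionCheckpoint): void;
  emit(event: { type: "session_paused"; checkpointCreatedAt: string }): void;
  tearDown(agentId: string): void;
}

function pauseSession(
  sessionId: string,
  replayDepth: number,
  deps: PauseDeps,
  now: Date = new Date(),
): SessionCheckpoint {
  // Capture constraint state and the handles of all active agents.
  const constraintState = deps.constraintState();
  const handles = deps.activeAgents();

  // Build one AgentCheckpoint per agent: filtered tail plus scratch pad.
  const activeAgents: AgentCheckpoint[] = handles.map((h) => ({
    ...h,
    conversationTail: deps.tailFor(h.agentId, replayDepth),
    expertiseSnapshot: deps.readScratchPad(h.agentId),
  }));

  const checkpoint: SessionCheckpoint = {
    sessionId,
    constraintState,
    activeAgents,
    roundsCompleted: constraintState.roundsCompleted,
    pendingDelegations: deps.pendingDelegations(),
    transcriptReplayDepth: replayDepth,
    createdAt: now.toISOString(),
  };

  // Persist first, then announce, then tear agents down.
  deps.save(checkpoint);
  deps.emit({ type: "session_paused", checkpointCreatedAt: checkpoint.createdAt });
  for (const h of handles) deps.tearDown(h.agentId);
  return checkpoint;
}
```

Saving before teardown means a failure while writing the checkpoint leaves the agents running rather than losing their state.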
Planned Resuming
The following flow describes the target resume behavior. It is not currently exposed as a complete CLI/API workflow.
- The resume command or API call loads the SessionCheckpoint for the given session ID.
- Agents listed in activeAgents are re-spawned using the same agent definitions and model configuration as the original session.
- Expertise scratch pads are reloaded from expertiseSnapshot so each agent's memory is current.
- Each agent's conversationTail is replayed into its context as a structured prompt section:

      ## Session Context (Resumed)
      This session was paused and has now resumed. Below is a summary of recent activity
      you were involved in before the pause.
      [Filtered transcript tail for this agent]

- The orchestrator resumes from the round where the session left off, using roundsCompleted from the checkpoint.
- A session_resumed transcript event is emitted.
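Because resume is still planned, the following is only a sketch of how the replay prompt section might be assembled from a stored tail. The heading and intro text are taken from the template above; the `summary` field and the one-line-per-event rendering are assumptions, since the page does not specify how tail events are serialized into the prompt.

```typescript
// Assumed event shape; `summary` is a hypothetical human-readable field.
interface TranscriptEvent {
  type: string;
  agentId?: string;
  summary: string;
}

// Renders the resumed-session prompt section for one agent's tail.
function buildResumeSection(tail: TranscriptEvent[]): string {
  const eventLines = tail.map(
    (e) => `- [${e.type}] ${e.agentId ?? "unknown"}: ${e.summary}`,
  );
  return [
    "## Session Context (Resumed)",
    "This session was paused and has now resumed. Below is a summary of recent activity",
    "you were involved in before the pause.",
    ...eventLines,
  ].join("\n");
}
```

Each agent would receive the section built from its own `conversationTail` only, which keeps the context isolation described earlier intact across a resume.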
Planned Resume Constraint Behavior
When full resume support lands, constraint handling should differ across the three tracked dimensions:
| Constraint | Behavior on resume |
|---|---|
| Time elapsed | Resets to zero — the clock starts fresh from the moment of resume |
| Budget spent | Continues from checkpoint value — cumulative spend is preserved |
| Rounds | Continues from checkpoint value — no rounds are credited for free |
Design Rationale
A session paused at $4.80 of a $5.00 budget would resume with only $0.20 remaining, but a session paused after 45 minutes of a 60-minute limit would get a fresh 60-minute working window. Time limits are intended to bound individual working sessions, not total calendar time.
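The worked example can be checked with a small sketch of the planned rule: budget and rounds carry over, elapsed time resets. The type and function names are hypothetical, amounts are held in integer cents to avoid floating-point drift, and the round count of 12 is illustrative rather than taken from the text.

```typescript
// Hypothetical constraint state; budget is kept in cents.
interface ConstraintState {
  budgetSpentCents: number;
  roundsCompleted: number;
  elapsedMs: number;
}

// Applies the planned resume rule to a checkpointed constraint state.
function constraintStateOnResume(saved: ConstraintState): ConstraintState {
  return {
    budgetSpentCents: saved.budgetSpentCents, // carried over
    roundsCompleted: saved.roundsCompleted, // carried over
    elapsedMs: 0, // clock restarts at resume
  };
}

const MINUTE_MS = 60 * 1000;
const limits = { budgetCents: 500, maxElapsedMs: 60 * MINUTE_MS };

// Paused at $4.80 of $5.00 and 45 minutes of a 60-minute limit.
const paused: ConstraintState = {
  budgetSpentCents: 480,
  roundsCompleted: 12, // illustrative value
  elapsedMs: 45 * MINUTE_MS,
};

const resumed = constraintStateOnResume(paused);
const budgetLeftCents = limits.budgetCents - resumed.budgetSpentCents; // 20 cents
const timeLeftMs = limits.maxElapsedMs - resumed.elapsedMs; // full 60 minutes
```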
Limitations
Automatic resume is not complete.
The runtime can create checkpoints; it does not yet provide an end-to-end CLI/API resume flow that re-spawns every agent and continues orchestration.
Agent context window is not preserved.
Only the conversation tail can be replayed. The full internal context the LLM held at pause time — intermediate reasoning chains, cached attention — is gone. A future replay prompt can reconstruct working context, but very long chains of reasoning within a single round may not fully survive a pause.
Long sessions may lose nuance.
The tail depth is a fixed window. In a session with 200 rounds of dense deliberation, only the last N events per agent fit in the tail; earlier reasoning falls outside the window and is not replayed. For sessions expected to run long, increase transcriptReplayDepth in the execution profile.
Replay depth is configurable per profile.
Set transcriptReplayDepth in the execution profile's constraint block:
constraints:
  transcriptReplayDepth: 100 # default: 50

Higher values improve resume fidelity at the cost of larger prompts and higher token spend on the first resumed round.