Claude Code adapter notes

Empirical freeze of what the Claude Code adapter handles in portaconv v0.1, based on a spike against six real JSONLs pulled from ~/.claude/projects/ on 2026-04-20 (host: Linux/WSL, Claude Code versions 2.0.76 → 2.1.51 observed).

The adapter trait and schema types get committed in Phase 2 against this contract.

  • Normative: any record type listed in §3 is handled as specified.
  • Any record type not listed is unknown: the adapter must not silently drop it. It must either land in a per-record extensions: serde_json::Value bag or trigger a one-line warning.
  • The schema contract itself (Conversation / Message / ContentBlock) is locked in the project’s design decisions; this doc does not reopen it. It only defines the adapter’s mapping onto it.

Six JSONLs, spread across size, OS encoding, and on-disk shape. Parser: examples/scan-claude-jsonl.rs (throwaway, deleted before Phase 2 lands).

Project directory names from the sampled corpus are redacted (<project-N>) since this repo is public and they reference the repo author’s own other projects.

| # | Slot | Size | Lines | Encoded dir (prefix) | Filename shape | Claude ver(s) |
|---|---|---|---|---|---|---|
| 1 | Tiny | 708 B | 3 | -mnt-c-…-<project-1> (WSL) | <uuid>.jsonl | (no version field in records) |
| 2 | Small WSL | 100 KB | 42 | -mnt-c-…-<project-2> (WSL) | <uuid>.jsonl | 2.1.39, 2.1.42 |
| 3 | Medium Win | 995 KB | 489 | C--…-<project-3> (Windows) | <uuid>.jsonl | 2.1.20 |
| 4 | Subagent (new) | 11 KB | 6 | -mnt-c-…-<project-4>/<parent-uuid>/subagents/ | agent-acompact-<hash>.jsonl | 2.1.51 |
| 5 | Large | 19.8 MB | 805 | -mnt-c-…-<project-5> (WSL) | <uuid>.jsonl | 2.1.2, 2.1.9 |
| 6 | Agent (old) | 80 KB | 23 | -mnt-c-…-<project-6> (WSL) | agent-<hash>.jsonl | 2.0.76 |

Streaming line-parse completed all six (≈22 MB combined) in 5.5 s wall-time — well under the “seconds not minutes” bar the plan set. Zero parse errors across 1368 records.

The top-level type discriminator has six values in this sample.

| type | Description | Bucket | v0.1 handling | Notes |
|---|---|---|---|---|
| user | Conversational message, user role. message.{role, content[]} is the event-stream shape. | Schema | Map to Message { role: User, content: Vec<ContentBlock>, timestamp, extensions }. Copy top-level uuid, parentUuid, requestId, isSidechain, slug, userType into message.extensions. | Includes compact-summary records (isCompactSummary: true); those are regular user records whose content is the summary text. Keep them. |
| assistant | Conversational message, assistant role. Same event-stream shape. | Schema | Same as user. Content includes tool_use and thinking blocks alongside text. | thinkingMetadata on the record goes into message.extensions. |
| system | Metadata event (e.g. subtype: "turn_duration" with durationMs). No message.content. | Extensions (conv-level) | Append as-is to Conversation.extensions.system_events[] (append-only array). Do not render to paste output in v0.1. | Only 6–9 per long session. Small. |
| file-history-snapshot | Claude Code’s file-tracking metadata: {messageId, snapshot: {trackedFileBackups, timestamp}, isSnapshotUpdate}. | Skip | Drop. Not surfaced in Conversation.extensions. | These files are tracked elsewhere in Claude’s workspace; portaconv is not a file-state tool. |
| progress | Live streaming events for subagent runs. Carries data.{message, type: "agent_progress", prompt, agentId} + toolUseID + parentToolUseID. | Skip | Drop. | The final assistant message already carries the consolidated tool_use; the stream is transient. Re-evaluate if v0.1 paste output looks lossy on subagent-heavy sessions. |
| queue-operation | Claude Code’s prompt-queue bookkeeping (operation: "enqueue", raw user-typed content). | Skip | Drop. | Internal scheduling state; not part of the rendered dialogue. |

Unknown record types encountered during adapter load must be accumulated into Conversation.extensions.unknown_records[] with their raw JSON — this is the resilience hook the locked plan calls for. Don’t silently ignore.
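
The routing in the table above, together with the catch-all for unknown types, can be sketched as a single dispatch on the top-level type string. This is only an illustration: the Bucket enum and bucket_for function are hypothetical names, not the adapter's committed API.

```rust
/// Where a top-level record ends up in v0.1 (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Bucket {
    /// Mapped onto a `Message` in the schema.
    Schema,
    /// Appended to `Conversation.extensions.system_events[]`.
    ConversationExtensions,
    /// Dropped on purpose.
    Skip,
    /// Not listed: kept raw in `Conversation.extensions.unknown_records[]`.
    Unknown,
}

/// Classify a record by its top-level `type` discriminator.
pub fn bucket_for(record_type: &str) -> Bucket {
    match record_type {
        "user" | "assistant" => Bucket::Schema,
        "system" => Bucket::ConversationExtensions,
        "file-history-snapshot" | "progress" | "queue-operation" => Bucket::Skip,
        // Anything else must be preserved, never silently ignored.
        _ => Bucket::Unknown,
    }
}
```

Keeping the fallthrough arm explicit means a new record type from a later Claude Code release lands in unknown_records by default instead of disappearing.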

Types observed in production but not in the original spike sample (first real-corpus run on 2026-04-20 surfaced 150 such records across 5 distinct types in a 457-message session):

| type | Payload fields | v0.2 target bucket |
|---|---|---|
| permission-mode | permissionMode, sessionId | Extensions: track state changes |
| attachment | attachment, entrypoint, cwd, sessionId, plus the standard record-header fields | Extensions until renderer decides how to show |
| last-prompt | lastPrompt, sessionId | Skip |
| custom-title | customTitle, sessionId | Schema: promote to Conversation.title when present (overrides the first-user-message derivation) |
| agent-name | agentName, sessionId | Extensions: relates to subagent naming |

v0.1 lands these as unknown_records via the adapter’s catch-all. The v0.2 polish pass that promotes them should also re-examine whether the sample was too small (likely yes — 6 JSONLs out of 2607 in the author’s corpus).

Two on-disk shapes observed for subagent sessions:

  • Old (pre-2.1.x?): <project-dir>/agent-<hash>.jsonl at the project root. Example: the v2.0.76 sample #6. Records carry agentId but no nested subdir.
  • New (≥ 2.1.x): <project-dir>/<parent-session-uuid>/subagents/agent-<name>-<hash>.jsonl. Example: the v2.1.51 sample #4.

Both carry the distinguishing field agentId on every record, and both omit version information on some records (sample #6 did not emit requestId on user records, for instance — earlier Claude versions carried less metadata).

v0.1 decision: option (a) from the plan — ignore subagent JSONLs in pconv list and pconv dump. Rationale:

  • Subagent sessions are transient reasoning loops, not user-facing dialogues the user typically wants to paste-resume.
  • The parent session’s tool_use / tool_result pair already captures the subagent invocation and its consolidated output in the main stream.
  • Surfacing subagents would require either a subagent_of field (breaks the Conversation model shape) or a join (exposes tool-call structure the schema is designed to hide behind ContentBlock::ToolUse).

Detection rule for the adapter’s list(): skip any JSONL whose path contains /subagents/ or whose filename matches ^agent-[a-z0-9_-]+\.jsonl$ at the project root.
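
The detection rule can be sketched with the standard library alone. The function names are hypothetical, and the sketch assumes the caller passes the JSONL path relative to the encoded project directory, so that "at the project root" means a single-component path.

```rust
use std::path::Path;

/// True if `name` matches `^agent-[a-z0-9_-]+\.jsonl$` (hand-rolled, no regex crate).
pub fn is_agent_filename(name: &str) -> bool {
    let stem = match name
        .strip_prefix("agent-")
        .and_then(|rest| rest.strip_suffix(".jsonl"))
    {
        Some(stem) => stem,
        None => return false,
    };
    !stem.is_empty()
        && stem
            .bytes()
            .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_' || b == b'-')
}

/// Assumption: `rel` is relative to the project directory.
pub fn is_subagent_jsonl(rel: &Path) -> bool {
    // New layout: some parent directory is named `subagents`.
    let in_subagents_dir = rel
        .parent()
        .map(|dir| dir.components().any(|c| c.as_os_str() == "subagents"))
        .unwrap_or(false);

    // Old layout: `agent-<hash>.jsonl` directly at the project root.
    let at_root = rel.components().count() == 1;
    let old_style = at_root
        && rel
            .file_name()
            .and_then(|n| n.to_str())
            .map(is_agent_filename)
            .unwrap_or(false);

    in_subagents_dir || old_style
}
```

A nested agent-<hash>.jsonl outside a subagents directory is deliberately not skipped here, since the rule only names the project root and the /subagents/ path.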

Re-evaluate in v0.2+ — if users ask for “show me what the subagent actually thought,” option (b) (subagent_of in extensions) becomes the natural add without reshaping the core model.

Observed across the six samples:

| Block type | Count across samples | v0.1 handling |
|---|---|---|
| text | 120 | ContentBlock::Text(String) |
| tool_use | 271 | ContentBlock::ToolUse { id, name, input } |
| tool_result | 270 | ContentBlock::ToolResult { tool_use_id, output, is_error } |
| thinking | 211 | ContentBlock::Text with a one-line prefix noting it was a thinking block. Rendering decision: paste output collapses thinking by default; --include-thinking opts in. |
| <string-content> | 36 | When message.content is a bare JSON string instead of an array (older Claude versions do this on some user records), treat as a single Text(String) block. |
| <no-content> | 13 | Observed on some assistant records with tool-only turns mid-stream. Normalize to an empty Vec<ContentBlock>; don’t drop the message. |

Not observed in this sample but known to exist in Anthropic’s API: image, document, redacted_thinking. Adapter must accept them:

  • image → ContentBlock::Text("<image omitted in v0.1>") with the raw block stashed in message.extensions.original_content[].
  • document → same passthrough.
  • redacted_thinking → dropped entirely, not even mentioned in paste output (respects the redaction).

Any other unknown block type goes to Text("<unknown block: X>") with the raw JSON in message.extensions.original_content[].
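
The content-shape rules above can be sketched as one normalization function. Several things here are assumptions made only for illustration: the RawContent and RawBlock types stand in for whatever the JSON layer produces, tool input and output are kept as raw JSON text to avoid a JSON dependency, the "[thinking]" prefix and the document placeholder wording are invented, and stashing the raw block into extensions is left out.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text(String),
    ToolUse { id: String, name: String, input: String },
    ToolResult { tool_use_id: String, output: String, is_error: bool },
}

/// Hypothetical pre-parsed view of one entry in `message.content[]`.
#[derive(Debug, Clone, PartialEq)]
pub enum RawBlock {
    Text(String),
    Thinking(String),
    RedactedThinking,
    ToolUse { id: String, name: String, input: String },
    ToolResult { tool_use_id: String, output: String, is_error: bool },
    /// `image`, `document`, or any type the adapter does not know.
    Other { kind: String },
}

/// Hypothetical pre-parsed view of `message.content` itself.
pub enum RawContent {
    Missing,
    Bare(String),
    Blocks(Vec<RawBlock>),
}

pub fn normalize(content: RawContent) -> Vec<ContentBlock> {
    let blocks = match content {
        // <no-content>: keep the message, with an empty block list.
        RawContent::Missing => return Vec::new(),
        // <string-content>: a bare string becomes a single Text block.
        RawContent::Bare(s) => return vec![ContentBlock::Text(s)],
        RawContent::Blocks(b) => b,
    };
    blocks
        .into_iter()
        .filter_map(|b| match b {
            RawBlock::Text(s) => Some(ContentBlock::Text(s)),
            // Assumed prefix; the notes only require "a one-line prefix".
            RawBlock::Thinking(s) => Some(ContentBlock::Text(format!("[thinking]\n{s}"))),
            // Redacted thinking is dropped entirely.
            RawBlock::RedactedThinking => None,
            RawBlock::ToolUse { id, name, input } => {
                Some(ContentBlock::ToolUse { id, name, input })
            }
            RawBlock::ToolResult { tool_use_id, output, is_error } => {
                Some(ContentBlock::ToolResult { tool_use_id, output, is_error })
            }
            RawBlock::Other { kind } => Some(ContentBlock::Text(match kind.as_str() {
                "image" => "<image omitted in v0.1>".to_string(),
                // Assumed wording; the notes only say "same passthrough".
                "document" => "<document omitted in v0.1>".to_string(),
                other => format!("<unknown block: {other}>"),
            })),
        })
        .collect()
}
```

Returning an empty Vec for missing content, instead of an Option, keeps the message in the conversation so that parent/child ordering is not disturbed.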

Feeds the later --rewrite transform design. Counts below are line-level presence (regex match anywhere in a record line), not strict content-only substring counts; the real transform needs a stricter scope (prose text + tool-call args, not JSON field names or cwd). Noted as an open question in §7.

| Sample | Lines with /mnt/ | Lines with X:\\ |
|---|---|---|
| #1 tiny | 0 | 0 |
| #2 small WSL | 38 (90%) | 4 |
| #3 medium Windows-encoded | 464 (95%) | 57 |
| #4 subagent | 6 (100%) | 0 |
| #5 large WSL | 800 (99%) | 158 |
| #6 old-agent | 23 (100%) | 3 |
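
Line-level presence counting of this kind can be approximated with a streaming scan. This is a sketch, not the spike's actual scanner: the PathHits struct, the function names, the blank-line skipping, and the exact drive pattern (an ASCII letter, a colon, then a JSON-escaped backslash) are all assumptions.

```rust
use std::io::BufRead;

/// Hypothetical tally for one JSONL file.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct PathHits {
    pub lines: usize,
    pub with_mnt: usize,
    pub with_drive: usize,
}

/// True if the raw line contains `<letter>:\\`, i.e. a drive letter, a colon,
/// and a backslash as it appears escaped inside JSON text.
pub fn has_drive_path(line: &str) -> bool {
    line.as_bytes()
        .windows(4)
        .any(|w| w[0].is_ascii_alphabetic() && &w[1..] == b":\\\\")
}

/// Count non-blank lines and how many mention each path style anywhere in the line.
pub fn count_path_hits<R: BufRead>(reader: R) -> std::io::Result<PathHits> {
    let mut hits = PathHits::default();
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue;
        }
        hits.lines += 1;
        if line.contains("/mnt/") {
            hits.with_mnt += 1;
        }
        if has_drive_path(&line) {
            hits.with_drive += 1;
        }
    }
    Ok(hits)
}
```

Because the match runs over the whole record line, a hit in the cwd field counts the same as a hit in message text, which is exactly why these numbers are only a proxy.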

Key observation: the Windows-encoded bucket (#3) has 464 lines mentioning /mnt/ paths but only 57 mentioning C:\\. This is the content-layer path-poisoning the research doc predicted — confirmed at scale. A WSL-authored session lives in the Windows-encoded dir because it was launched from Windows once, but its content still carries WSL path references from the authoring side. Pure file-layer sync cannot fix this; --rewrite wsl-to-win on this session’s output is the right fix.

The WSL-encoded files (#2, #5) also carry C:\ references, on 4 and 158 lines respectively. These are probably paste-ins from user messages or web search results, not authored paths. The rewrite transform must not assume all encountered paths are rewritable.
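
For a single path, the wsl-to-win direction might look like the sketch below. It assumes the default WSL automount layout, where drive C: appears as /mnt/c, and the function name wsl_to_win is hypothetical. It converts only paths that fit that shape and returns None otherwise, so the caller can leave anything else untouched.

```rust
/// Convert one WSL drive-mount path such as `/mnt/c/Users/me` to `C:\Users\me`.
/// Returns `None` for anything not under `/mnt/<single drive letter>`.
pub fn wsl_to_win(path: &str) -> Option<String> {
    let rest = path.strip_prefix("/mnt/")?;
    let mut chars = rest.chars();
    let drive = chars.next().filter(|c| c.is_ascii_alphabetic())?;
    let tail = chars.as_str();

    // The drive letter must be a whole path segment: `/mnt/c` or `/mnt/c/...`,
    // not `/mnt/cache/...`.
    if !(tail.is_empty() || tail.starts_with('/')) {
        return None;
    }

    let mut out = String::with_capacity(path.len());
    out.push(drive.to_ascii_uppercase());
    out.push(':');
    if tail.is_empty() {
        out.push('\\');
    } else {
        out.push_str(&tail.replace('/', "\\"));
    }
    Some(out)
}
```

Deciding which strings to hand to a function like this (prose, tool-call arguments, but not record metadata) is the harder part and is left open here.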

Flag these to the user before types are committed.

  1. Multi-session JSONLs. Sample #5 (20 MB) contained 2 distinct sessionId values in one file. Spot-check suggests /compact writes the continuation under a new session UUID but appends to the same file. v0.1 adapter must decide:
    • (a) list() surfaces every distinct sessionId seen, load(id) filters to records matching that sessionId only; OR
    • (b) list() surfaces only the file-level id (first sessionId seen), load() returns the full file concatenated.
    • Recommendation: (a). Matches the Claude-side mental model where /resume picks by sessionId, not by file.
  2. Compact-summary record inclusion. The isCompactSummary: true user record carries the summarized prior session’s text in message.content[0].text. v0.1 treats this as a normal user message. Confirm the user wants this (it can be 10+ KB of prose that wasn’t typed by the human). An --exclude-compact-summaries flag may be worthwhile.
  3. Path-rewrite scope. §6 counted line-level hits as a proxy. The real transform must target prose inside text blocks plus tool-call input fields (e.g. Read.file_path) — not the record’s own cwd field, which is metadata describing where the session ran. Spec this during Phase 2 before regex tuning.
  4. Oldest Claude Code version supported. Sample #6 (v2.0.76) was missing requestId and slug fields on many records. v2.1.x samples have them. The adapter must tolerate missing fields gracefully. Worth picking an explicit floor (“tested against ≥ 2.0.x; earlier versions best-effort”).
  5. progress record skip. Confirm: does dropping all 204 progress records from sample #5 lose anything a paste-recovery user would want? If the subagent ran a 5-step plan that only showed up in progress streams (never in the final tool_result), that context is gone. Spot-check one paste output to validate.
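
Option (a) from item 1 can be sketched as two small functions over already-parsed records. The Record struct and both function names are hypothetical, and the sketch does not decide what to do with records that lack a sessionId.

```rust
use std::collections::HashSet;

/// Hypothetical minimal record: the one field this sketch needs, plus the raw line.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub session_id: String,
    pub raw: String,
}

/// Option (a) `list()`: every distinct sessionId in the file, in first-seen order.
pub fn list_session_ids(records: &[Record]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut ids = Vec::new();
    for r in records {
        if seen.insert(r.session_id.as_str()) {
            ids.push(r.session_id.clone());
        }
    }
    ids
}

/// Option (a) `load(id)`: only the records whose sessionId matches, in file order.
pub fn load_session<'a>(records: &'a [Record], id: &str) -> Vec<&'a Record> {
    records.iter().filter(|r| r.session_id == id).collect()
}
```

First-seen ordering keeps the pre-compact session ahead of its continuation, which matches the order in which the records were appended to the file.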

Appendix A — full top-level field set observed

Union across all six samples, for reference when building the serde structs:

agentId, compactMetadata, content, cwd, data, durationMs,
error, gitBranch, isApiErrorMessage, isCompactSummary, isMeta,
isSidechain, isSnapshotUpdate, isVisibleInTranscriptOnly, level,
logicalParentUuid, message, messageId, operation, parentToolUseID,
parentUuid, permissionMode, requestId, sessionId, slug, snapshot,
sourceToolAssistantUUID, subtype, thinkingMetadata, timestamp,
todos, toolUseID, toolUseResult, type, userType, uuid, version

Core set the schema claims first-class: uuid, parentUuid, sessionId, cwd, gitBranch, version, timestamp, type, message. Everything else goes into extensions until a future adapter version promotes it.
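
The core/extensions split can be expressed as a simple partition over field names. This is a sketch with hypothetical names (CORE_FIELDS, is_core_field, split_fields); the real serde structs would presumably encode the same split through typed fields plus a catch-all map.

```rust
/// The nine top-level fields listed as first-class above.
pub const CORE_FIELDS: [&str; 9] = [
    "uuid", "parentUuid", "sessionId", "cwd", "gitBranch",
    "version", "timestamp", "type", "message",
];

pub fn is_core_field(name: &str) -> bool {
    CORE_FIELDS.contains(&name)
}

/// Split field names into (core, extensions), preserving input order.
pub fn split_fields<'a>(names: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    names.iter().copied().partition(|n| is_core_field(n))
}
```

For example, a record carrying uuid, slug, type, and todos would keep uuid and type as first-class fields and route slug and todos to extensions.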