Claude Code adapter notes
Empirical freeze of what the Claude Code adapter handles in portaconv
v0.1, based on a spike against six real JSONLs pulled from
~/.claude/projects/ on 2026-04-20 (host: Linux/WSL, Claude Code
versions 2.0.76 → 2.1.51 observed).
The adapter trait and schema types get committed in Phase 2 against this contract.
1. Scope
Section titled “1. Scope”- Normative: any record type listed in §3 is handled as specified.
- Any record type not listed is unknown — the adapter must not
silently drop it; it must either land in a per-record
extensions: serde_json::Valuebag or flip a one-line warning. - The schema contract itself (
Conversation/Message/ContentBlock) is locked in the project’s design decisions; this doc does not reopen it. It only defines the adapter’s mapping onto it.
2. Sample inventory
Section titled “2. Sample inventory”Six JSONLs, spread across size, OS encoding, and on-disk shape.
Parser: examples/scan-claude-jsonl.rs (throwaway, deleted before
Phase 2 lands).
Project directory names from the sampled corpus are redacted
(<project-N>) since this repo is public and they reference the
repo author’s own other projects.
| # | Slot | Size | Lines | Encoded dir (prefix) | Filename shape | Claude ver(s) |
|---|---|---|---|---|---|---|
| 1 | Tiny | 708 B | 3 | -mnt-c-…-<project-1> (WSL) | <uuid>.jsonl | (no version field in records) |
| 2 | Small WSL | 100 KB | 42 | -mnt-c-…-<project-2> (WSL) | <uuid>.jsonl | 2.1.39, 2.1.42 |
| 3 | Medium Win | 995 KB | 489 | C--…-<project-3> (Windows) | <uuid>.jsonl | 2.1.20 |
| 4 | Subagent (new) | 11 KB | 6 | -mnt-c-…-<project-4>/<parent-uuid>/subagents/ | agent-acompact-<hash>.jsonl | 2.1.51 |
| 5 | Large | 19.8 MB | 805 | -mnt-c-…-<project-5> (WSL) | <uuid>.jsonl | 2.1.2, 2.1.9 |
| 6 | Agent (old) | 80 KB | 23 | -mnt-c-…-<project-6> (WSL) | agent-<hash>.jsonl | 2.0.76 |
Streaming line-parse completed all six (≈22 MB combined) in 5.5 s wall-time — well under the “seconds not minutes” bar the plan set. Zero parse errors across 1368 records.
3. Record type table
Section titled “3. Record type table”The top-level type discriminator has six values in this sample.
type | Description | Bucket | v0.1 handling | Notes |
|---|---|---|---|---|
user | Conversational message, user role. message.{role, content[]} is the event-stream shape. | Schema | Map to Message { role: User, content: Vec<ContentBlock>, timestamp, extensions }. Copy top-level uuid, parentUuid, requestId, isSidechain, slug, userType into message.extensions. | Includes compact-summary records (isCompactSummary: true) — those are regular user records whose content is the summary text. Keep them. |
assistant | Conversational message, assistant role. Same event-stream shape. | Schema | Same as user. Content includes tool_use and thinking blocks alongside text. | thinkingMetadata on the record goes into message.extensions. |
system | Metadata event (e.g. subtype: "turn_duration" with durationMs). No message.content. | Extensions (conv-level) | Append as-is to Conversation.extensions.system_events[] (append-only array). Do not render to paste output in v0.1. | Only 6–9 per long session. Small. |
file-history-snapshot | Claude Code’s file-tracking metadata: {messageId, snapshot: {trackedFileBackups, timestamp}, isSnapshotUpdate}. | Skip | Drop. Not surfaced in Conversation.extensions. | These files are tracked elsewhere in Claude’s workspace; portaconv is not a file-state tool. |
progress | Live streaming events for subagent runs. Carries data.{message, type: "agent_progress", prompt, agentId} + toolUseID + parentToolUseID. | Skip | Drop. | The final assistant message already carries the consolidated tool_use; the stream is transient. Re-evaluate if v0.1 paste output looks lossy on subagent-heavy sessions. |
queue-operation | Claude Code’s prompt-queue bookkeeping (operation: "enqueue", raw user-typed content). | Skip | Drop. | Internal scheduling state; not part of the rendered dialogue. |
Unknown record types encountered during adapter load must be
accumulated into Conversation.extensions.unknown_records[] with
their raw JSON — this is the resilience hook the locked plan calls
for. Don’t silently ignore.
Types observed in production but not in the original spike sample (first real-corpus run on 2026-04-20 surfaced 150 such records across 5 distinct types in a 457-message session):
type | Payload fields | v0.2 target bucket |
|---|---|---|
permission-mode | permissionMode, sessionId | Extensions — track state changes |
attachment | attachment, entrypoint, cwd, sessionId, plus the standard record-header fields | Extensions until renderer decides how to show |
last-prompt | lastPrompt, sessionId | Skip |
custom-title | customTitle, sessionId | Schema — promote to Conversation.title when present (overrides the first-user-message derivation) |
agent-name | agentName, sessionId | Extensions — relates to subagent naming |
v0.1 lands these as unknown_records via the adapter’s catch-all.
The v0.2 polish pass that promotes them should also re-examine
whether the sample was too small (likely yes — 6 JSONLs out of 2607
in the author’s corpus).
4. Subagent decision
Section titled “4. Subagent decision”Two on-disk shapes observed for subagent sessions:
- Old (pre-2.1.x?):
<project-dir>/agent-<hash>.jsonlat the project root. Example: the v2.0.76 sample #6. Records carryagentIdbut no nested subdir. - New (≥ 2.1.x):
<project-dir>/<parent-session-uuid>/subagents/agent-<name>-<hash>.jsonl. Example: the v2.1.51 sample #4.
Both carry the distinguishing field agentId on every record, and
both omit version information on some records (sample #6 did not
emit requestId on user records, for instance — earlier Claude
versions carried less metadata).
v0.1 decision: option (a) from the plan — ignore subagent JSONLs
in pconv list and pconv dump. Rationale:
- Subagent sessions are transient reasoning loops, not user-facing dialogues the user typically wants to paste-resume.
- The parent session’s
tool_use/tool_resultpair already captures the subagent invocation and its consolidated output in the main stream. - Surfacing subagents would require either a
subagent_offield (breaks the Conversation model shape) or a join (exposes tool-call structure the schema is designed to hide behindContentBlock::ToolUse).
Detection rule for the adapter’s list(): skip any JSONL whose
path contains /subagents/ or whose filename matches
^agent-[a-z0-9_-]+\.jsonl$ at the project root.
Re-evaluate in v0.2+ — if users ask for “show me what the subagent
actually thought,” option (b) (subagent_of in extensions) becomes
the natural add without reshaping the core model.
5. Content-block variants
Section titled “5. Content-block variants”Observed across the six samples:
Block type | Count across samples | v0.1 handling |
|---|---|---|
text | 120 | ContentBlock::Text(String) |
tool_use | 271 | ContentBlock::ToolUse { id, name, input } |
tool_result | 270 | ContentBlock::ToolResult { tool_use_id, output, is_error } |
thinking | 211 | ContentBlock::Text with a one-line prefix noting it was a thinking block. Rendering decision: paste output collapses thinking by default; --include-thinking opts in. |
<string-content> | 36 | When message.content is a bare JSON string instead of an array (older Claude versions do this on some user records), treat as a single Text(String) block. |
<no-content> | 13 | Observed on some assistant records with tool-only turns mid-stream. Normalize to an empty Vec<ContentBlock>; don’t drop the message. |
Not observed in this sample but known to exist in Anthropic’s API:
image, document, redacted_thinking. Adapter must accept them:
image→ContentBlock::Text("<image omitted in v0.1>")with the raw block stashed in theMessage.extensions.original_content[].document→ same passthrough.redacted_thinking→ dropped entirely, not even mentioned in paste output (respects the redaction).
Any other unknown block type goes to Text("<unknown block: X>")
with the raw JSON in message.extensions.original_content[].
6. Path-content observations
Section titled “6. Path-content observations”Feeds the later --rewrite transform design. Counts below are
line-level presence (regex match anywhere in a record line), not
strict content-only substring counts; the real transform needs a
stricter scope (prose text + tool-call args, not JSON field names or
cwd). Noted as an open question in §7.
| Sample | Lines with /mnt/ | Lines with X:\\ |
|---|---|---|
| #1 tiny | 0 | 0 |
| #2 small WSL | 38 (90%) | 4 |
| #3 medium Windows-encoded | 464 (95%) | 57 |
| #4 subagent | 6 (100%) | 0 |
| #5 large WSL | 800 (99%) | 158 |
| #6 old-agent | 23 (100%) | 3 |
Key observation: the Windows-encoded bucket (#3) has 464 lines
mentioning /mnt/ paths but only 57 mentioning C:\\. This is
the content-layer path-poisoning the research doc predicted —
confirmed at scale. A WSL-authored session lives in the
Windows-encoded dir because it was launched from Windows once, but
its content still carries WSL path references from the authoring
side. Pure file-layer sync cannot fix this; --rewrite wsl-to-win
on this session’s output is the right fix.
Single-session WSL-encoded files (#2, #5) also carry small numbers
of C:\ references — 4 and 158. These are probably paste-ins from
user messages or web search results, not authored paths. The
rewrite transform must not assume all encountered paths are
rewritable.
7. Open questions surfaced for Phase 2
Section titled “7. Open questions surfaced for Phase 2”Flag these to the user before types are committed.
- Multi-session JSONLs. Sample #5 (20 MB) contained 2 distinct
sessionIdvalues in one file. Spot-check suggests/compactwrites the continuation under a new session UUID but appends to the same file. v0.1 adapter must decide:- (a)
list()surfaces every distinctsessionIdseen,load(id)filters to records matching that sessionId only; OR - (b)
list()surfaces only the file-level id (first sessionId seen),load()returns the full file concatenated. - Recommendation: (a). Matches the Claude-side mental model
where
/resumepicks by sessionId, not by file.
- (a)
- Compact-summary record inclusion. The
isCompactSummary: trueuser record carries the summarized prior session’s text inmessage.content[0].text. v0.1 treats this as a normal user message. Confirm the user wants this (it can be 10+ KB of prose that wasn’t typed by the human). An--exclude-compact-summariesflag may be worthwhile. - Path-rewrite scope. §6 counted line-level hits as a proxy.
The real transform must target prose inside
textblocks plus tool-callinputfields (e.g.Read.file_path) — not the record’s owncwdfield, which is metadata describing where the session ran. Spec this during Phase 2 before regex tuning. - Oldest Claude Code version supported. Sample #6 (v2.0.76)
was missing
requestIdandslugfields on many records. v2.1.x samples have them. The adapter must tolerate missing fields gracefully. Worth picking an explicit floor (“tested against ≥ 2.0.x; earlier versions best-effort”). progressrecord skip. Confirm: does dropping all 204progressrecords from sample #5 lose anything a paste-recovery user would want? If the subagent ran a 5-step plan that only showed up in progress streams (never in the final tool_result), that context is gone. Spot-check one paste output to validate.
Appendix A — full top-level field set observed
Section titled “Appendix A — full top-level field set observed”Union across all six samples, for reference when building the serde structs:
agentId, compactMetadata, content, cwd, data, durationMs,error, gitBranch, isApiErrorMessage, isCompactSummary, isMeta,isSidechain, isSnapshotUpdate, isVisibleInTranscriptOnly, level,logicalParentUuid, message, messageId, operation, parentToolUseID,parentUuid, permissionMode, requestId, sessionId, slug, snapshot,sourceToolAssistantUUID, subtype, thinkingMetadata, timestamp,todos, toolUseID, toolUseResult, type, userType, uuid, versionCore set the schema claims first-class: uuid, parentUuid,
sessionId, cwd, gitBranch, version, timestamp, type,
message. Everything else goes into extensions until a future
adapter version promotes it.