Stale sessions-index.json after ungraceful WSL shutdown
Why this is captured
First encounter with this failure mode. /resume couldn’t find an actively-in-use session that had four months of continuous history. Not a portaconv case: that tool is for cross-OS content poisoning and moved-folder recovery, neither of which applied. Different bug, same surface complaint, worth its own lineage.
The evidence
Single project, single encoded bucket (WSL only), single session UUID:
| File | mtime | Size | What it should describe |
|---|---|---|---|
| 97d7b58b-…jsonl | 2026-04-23 02:22 | 59 MB | Active session: continuous append activity from 2025-12-10 through today |
| sessions-index.json | 2026-01-27 16:25 | 2.8 KB | Describes the session state as of 2026-01-27 |
The index’s entry for the active session:

    {
      "sessionId": "97d7b58b-09f5-41ea-a59f-a12f230083b0",
      "firstPrompt": "No prompt",
      "summary": "OpenCode permissions setup and Terminal Workspaces bug fix",
      "messageCount": 9,
      "created": "2026-01-09T15:38:59.766Z",
      "modified": "2026-01-09T15:48:27.815Z"
    }

Frozen in January. The jsonl itself meanwhile grew from ~100 KB to 59 MB across four months of real work. The /resume picker reads the index to build its list, so from the user’s perspective the active session looked like a trivial nine-message January stub and got outranked (or misrendered) by other entries.
No path issue. No OS-crossing. Just metadata that stopped tracking reality.
Hypothesis
Claude Code rewrites sessions-index.json on graceful shutdown of the CLI: a clean exit flushes the cache. Ungraceful closures (window-close of the WSL terminal, wsl --shutdown, machine suspend without cleanup) kill the process before the index rewrite. The jsonl-append path appears to be per-turn fsync, so conversation content survives fine, but the index doesn’t catch up.
Repeat a few times across a long-running, repeatedly-resumed session and the index drifts arbitrarily far from the jsonls.

    flowchart TD
      Turn["Conversation turn"] --> Append["jsonl: append + fsync<br/>(per-turn, durable)"]
      Append --> Cache["sessions-index.json:<br/>updated in-memory cache"]
      Cache --> Exit{"How does the<br/>process end?"}
      Exit -- "/exit, clean SIGTERM" --> Flush["Flush cache → write index<br/>✅ index matches jsonl"]
      Exit -- "window close, wsl --shutdown,<br/>SIGKILL, suspend" --> Drop["Process dies before flush<br/>❌ index frozen at last write"]
      Drop --> Drift["Next launch: picker reads<br/>stale index, jsonl keeps growing"]
      Drift -.repeat over months.-> Drift

Even some clean exits drop entries (#41946): the write path also re-reads cached state and overwrites rather than merging, so the drift compounds across both forced kills and certain graceful closures.
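The hypothesized split between a durable per-turn append and an exit-time index flush can be modeled in a few lines. This is a toy sketch under stated assumptions, not Claude Code's implementation: the `SessionWriter` class, its method names, and the `{"entries": [...]}` index layout are all hypothetical and exist only to show how the two files drift apart when the flush never runs.

```python
import json
import os
import tempfile


class SessionWriter:
    """Toy model: transcript is durable per turn, index only on flush()."""

    def __init__(self, project_dir, session_id):
        self.jsonl_path = os.path.join(project_dir, f"{session_id}.jsonl")
        self.index_path = os.path.join(project_dir, "sessions-index.json")
        self.session_id = session_id
        self.message_count = 0  # lives in memory until flush()

    def append_turn(self, record):
        # Per-turn append + fsync: this survives an abrupt kill.
        with open(self.jsonl_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.message_count += 1

    def flush(self):
        # Hypothesized to run only on clean exit; SIGKILL never reaches it.
        entry = {"sessionId": self.session_id, "messageCount": self.message_count}
        with open(self.index_path, "w", encoding="utf-8") as f:
            json.dump({"entries": [entry]}, f)  # hypothetical layout


def demo():
    """Return (lines in the jsonl, messageCount in the index) after a simulated kill."""
    with tempfile.TemporaryDirectory() as d:
        writer = SessionWriter(d, "demo-session")
        writer.append_turn({"type": "user"})
        writer.flush()  # one clean exit: index records 1 message
        writer.append_turn({"type": "assistant"})
        writer.append_turn({"type": "user"})
        # Simulated ungraceful end: flush() is never called again.
        with open(writer.jsonl_path, encoding="utf-8") as f:
            lines_on_disk = sum(1 for _ in f)
        with open(writer.index_path, encoding="utf-8") as f:
            indexed = json.load(f)["entries"][0]["messageCount"]
        return lines_on_disk, indexed
```

In this model the transcript holds three records while the index still claims one, which is the same shape as the nine-message January stub sitting next to a 59 MB jsonl.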
This is systemic, not a one-off
Same-day scan of ~/.claude/projects/*/ showed 14 other project directories with sessions-index.json at least 7 days behind the newest jsonl in the same dir. Lag distribution:
| Lag (days) | Project (tail of encoded name) |
|---|---|
| 93 | -mnt-d-MEDIA |
| 90 | home-assistant-projects |
| 84 | …tasknotes…obsidian-plugins-tasknotes (duplicate-encoded) |
| 80 | …tasknotes-enhancements |
| 79 | cynario |
| 66 | retake-studio |
| 65 | 4-VAULTS |
| 57 | …terminal-workspaces |
| 57 | b-g-33---Home-Lab--Home-Server |
| 46 | b-g-15---Family-Planning--Parenthood |
| 44 | 3d-printing |
| 12 (each) | b-g-vault-b-g, …tasknotes…plugins-tasknotes (dot-prefix variant) |
| 11 | DFD-Excalidraw-System |
That rules out “one-time WSL crash.” It’s consistent with every ungraceful close dropping an index update, accumulating across months of normal WSL lifecycle. Whatever fraction of Claude Code shutdowns on this machine are graceful, it’s not 100%, and the drift is cumulative.
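A scan like the one behind the table can be approximated by comparing file mtimes. The following is a minimal sketch, assuming only what this note states (each project directory holds a sessions-index.json next to its .jsonl files); the function names `index_lag_days` and `stale_projects` are made up for illustration and this is not the exact scan that was run.

```python
from pathlib import Path

SECONDS_PER_DAY = 86400


def index_lag_days(project_dir):
    """Days the index trails the newest .jsonl; None if either is missing."""
    project_dir = Path(project_dir)
    index = project_dir / "sessions-index.json"
    jsonls = list(project_dir.glob("*.jsonl"))
    if not index.is_file() or not jsonls:
        return None
    newest = max(p.stat().st_mtime for p in jsonls)
    return (newest - index.stat().st_mtime) / SECONDS_PER_DAY


def stale_projects(projects_root, min_lag_days=7):
    """Return [(lag_days, dir_name)] for stale projects, largest lag first."""
    results = []
    for child in Path(projects_root).iterdir():
        if not child.is_dir():
            continue
        lag = index_lag_days(child)
        if lag is not None and lag >= min_lag_days:
            results.append((lag, child.name))
    return sorted(results, reverse=True)
```

Calling `stale_projects(Path.home() / ".claude" / "projects")` would produce a list comparable to the lag table. Note that mtime comparison only detects an index that stopped being written; it says nothing about whether the entries inside it are correct.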
Action: the same mv sessions-index.json sessions-index.json.bak-<date> recipe applies to each of those dirs — one command per project, indexes rebuild on next claude launch. Deferred pending user approval since touching other projects’ state is out of scope for this bug note.
Fix applied

    cd ~/.claude/projects/-mnt-c-…-mcp-workflow-and-tech-stack/
    mv sessions-index.json sessions-index.json.bak-2026-04-23

Next claude launch should scan the jsonls and regenerate the index. Backup kept for one cycle in case the rebuild misbehaves.
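For repeating the same rename across several project directories, a small helper avoids clobbering an earlier backup. This is a sketch of the mv recipe only; `backup_index` is a hypothetical name, and whether the next launch actually regenerates the index is still the open question noted in this document.

```python
import datetime
from pathlib import Path


def backup_index(project_dir, today=None):
    """Rename sessions-index.json to sessions-index.json.bak-<date>.

    Returns the backup path, or None when there is no index to move.
    Refuses to overwrite a backup that already exists for the same date.
    """
    today = today or datetime.date.today()
    index = Path(project_dir) / "sessions-index.json"
    if not index.is_file():
        return None
    backup = index.with_name(f"{index.name}.bak-{today.isoformat()}")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    index.rename(backup)
    return backup
```

The rename leaves every .jsonl untouched, so the operation is reversible by renaming the backup to its original name.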
Alternative: resume by explicit ID, skip the picker
When you already know which session you want, this is immune to index staleness:

    claude -r 97d7b58b-09f5-41ea-a59f-a12f230083b0

The resume-by-ID path reads the jsonl directly. Worth promoting as the recovery primitive any time the picker looks wrong: cheaper and safer than rebuilding the index if you only need one session back.
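Resuming by ID presupposes knowing the UUID. Since the session ID is the transcript's filename minus .jsonl, the most recently written files in the project directory are usually the candidates. A minimal sketch, with `recent_session_ids` as a hypothetical helper name:

```python
from pathlib import Path


def recent_session_ids(project_dir, limit=5):
    """Return up to `limit` (session_id, size_bytes) pairs, newest mtime first."""
    files = sorted(
        Path(project_dir).glob("*.jsonl"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return [(p.stem, p.stat().st_size) for p in files[:limit]]
```

The size is included because, in a case like this one, the long-running session stands out immediately as the largest file with the newest mtime.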
Why this is distinct from challenge 02
Sibling bug, different mechanism. Surfaces the same “/resume is lying to me” complaint, so it’s worth documenting under the same lineage.
| Dimension | 02 · Fragmentation | This bug (stale index) |
|---|---|---|
| Encoded buckets involved | Two (WSL + Windows) | One |
| Session files involved | Two diverging .jsonls | One .jsonl, one stale index |
| Broken layer | Content — OS-specific paths baked into jsonl | Metadata — index lags jsonl |
| Trigger | Dual-OS workspace access | Ungraceful shutdown |
| portaconv is primary fix? | Yes (paste-first extraction) | No (but pconv dump is a fine rescue) |
| User-visible symptom | Wrong session list | Wrong session details for a session that is in the list |
Why portaconv isn’t the primary fix here
portaconv is the escape hatch for this bug, not the primary fix. Its value proposition is paste-first extraction when (a) the jsonl content is poisoned with the wrong OS’s paths, or (b) the file is in a bucket the current /resume can’t see. Neither applies here: the jsonl is clean, in the right bucket, keyed to the right cwd. The index is the only broken artifact, and Claude Code can regenerate it from the jsonls.
pconv dump 97d7b58b-… remains a rescue path if:
- Rebuilding the index doesn’t help (jsonl itself is corrupted)
- The session is too big for /resume to load practically
- You’d rather carve out a slice and paste into a fresh session than continue the 59 MB monolith
But the right shape of the fix for this specific bug is a small dedicated rebuilder (or an upstream fix), not something bolted onto portaconv’s extractor surface.
Upstream status — already widely reported
This is well-known upstream. Multiple open GitHub issues on anthropics/claude-code with the same symptom pattern (stale picker, active sessions missing from list):
| Issue | Title / context | Relevance |
|---|---|---|
| #25032 | “sessions-index.json not updated” (macOS, v2.1.39) | Canonical. Commenter @tirufege ships a repair script, the most-cited workaround. |
| #24729 | “summaries not generated, new sessions not indexed since ~v2.1.31” | Commenter @agatho deobfuscated cli.js, points to function xa with a case-sensitivity bug in the multi-worktree code path. Others confirm the bug occurs without worktrees too. |
| #44346 | WSL2, v2.1.92 — same env as here | Commenter @SRHSoulja claims picker scans .jsonl files directly (contradicting the index-driven theory); @junaidtitan confirms via workaround. |
| #38340 | Synthetic test: manually-placed valid .jsonl invisible to picker despite v2.1.81 changelog claiming filesystem scan | Proves picker does NOT scan the filesystem in that version. |
| #41946 | Clean-exit session missing from picker | Rules out “only SIGKILL causes this” — even graceful exits can lose sessions from the index. |
| #46522 | Mixed-cwd sessions hidden from picker | Adjacent: sessions excluded based on cwd mismatch, not index staleness. |
| #47128 | PID-keyed ~/.claude/sessions/<PID>.json files collide after container/PID-namespace reset | Different artifact, but confirms index state is tied to process lifecycle. |
| #42030, #22878, #18619, #18897 | Duplicate reports | Shows this is widespread and cross-platform (not WSL-specific). |
Not covered by fragmentation issues #17682 / #9668 / #9306 — those are about WSL↔Windows path encoding, a different mechanism.
No Anthropic docs mention sessions-index.json at all. The format is undocumented. Ref: docs.anthropic.com, docs.claude.com.
Related upstream evidence — the “clean-exit also fails” wrinkle
The graceful-shutdown hypothesis is partially corroborated but not sufficient:
- Zenn article by tjst_t explicitly calls out: “SIGKILL terminates a process immediately, application-side termination logic (signal handlers, atexit) is never executed” → index update lost; JSONL survives because it’s append-on-write. Also reports the index “rolled back two weeks” — the write path re-reads cached state and overwrites, rather than merging.
- But #41946 shows clean exits can also lose sessions. So the bug isn’t purely “SIGKILL skips the flush.” There’s additional breakage: possibly the case-sensitivity bug in xa (#24729 comment), possibly buffered/debounced writes that drop on any abnormal timing.
The fuller mechanism is likely: write path is both racy and re-reads stale cached state, so even some graceful exits lose entries. SIGKILL just makes it worse. The 14-project-wide drift on this machine could be a mix of both.
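The difference between overwriting with cached state and merging can be made concrete. The sketch below operates on plain lists of entries shaped like the one shown earlier (`sessionId`, `modified`); both function names are hypothetical, and the overwrite variant only illustrates the hypothesized behavior rather than confirmed upstream code.

```python
def overwrite_entries(on_disk, cached):
    """Hypothesized failure mode: stale cached state simply replaces disk."""
    return list(cached)


def merge_entries(on_disk, cached):
    """Merge two entry lists by sessionId, keeping whichever copy is newer.

    'Newer' is decided by the `modified` string. This assumes every timestamp
    uses the same fixed-width UTC ISO-8601 format, which sorts correctly as
    plain text.
    """
    merged = {entry["sessionId"]: entry for entry in on_disk}
    for entry in cached:
        current = merged.get(entry["sessionId"])
        if current is None or entry["modified"] > current["modified"]:
            merged[entry["sessionId"]] = entry
    return sorted(merged.values(), key=lambda e: e["modified"], reverse=True)
```

With a cache that predates two on-disk updates, `overwrite_entries` rolls one session back and drops the other, which matches the “rolled back two weeks” report; `merge_entries` keeps both current entries.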
Third-party tools targeting exactly this problem
None of these use portaconv. They fit the “small dedicated thing” shape.
| Tool | Shape | What it does |
|---|---|---|
| tirufege/repair-sessions-index.py | Python gist, ~100 lines | Rebuilds the index by scanning .jsonl files directly. Backs up first. Reference algorithm below. |
| KirillPuljavin/cres | Rust + Homebrew | Drop-in /resume picker replacement. Reads .jsonl directly, filters out headless runs. |
| riii111/claude-resume | Rust + Ink | Rich picker UI, .jsonl-direct, bypasses index entirely. |
| Ruya-AI/cozempic | doctor subcommand (planned: index-integrity check) | Broader health-check tool; this is on roadmap. |
| kanafm/claude-code-resume-anywhere | Helper | Resumes from any directory (orthogonal: cwd scoping). |
Reference algorithm (tirufege’s, confirmed from the gist), a per-.jsonl-file scan:
- sessionId ← filename minus .jsonl
- fullPath ← absolute path
- fileMtime ← os.path.getmtime() * 1000
- firstPrompt ← first 200 chars of first "user"-typed message text block
- customTitle ← from "custom-title"-typed entries if present
- messageCount ← count of "user" + "assistant" lines
- created ← timestamp of first message, fall back to st_birthtime / st_ctime
- modified ← timestamp of last message, fall back to file mtime
- gitBranch ← empty string (not reconstructed)
- projectPath ← from old index if present, otherwise derivable from dir name
- isSidechain ← False (doesn’t actually detect sidechains)
- Edge cases handled: malformed JSON lines skipped, missing timestamps fall back, content types (string vs list-of-blocks) both handled
- Write: shutil.copy2 backup first, then json.dump overwrite. Not atomic (no temp+rename). Good enough in practice; not robust under crash.
Gaps vs a proper rebuild: doesn’t detect sidechains, doesn’t repopulate gitBranch, non-atomic write. Fine for manual recovery; would want hardening for an automated tool.
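The field derivations listed above, plus the temp+rename hardening the gist lacks, can be sketched as follows. This is an independent illustration, not tirufege's script: the record keys (`type`, `timestamp`, `message.content`, `customTitle`) are assumptions about the .jsonl line format, `projectPath` is omitted because it depends on the old index, and the top-level layout of sessions-index.json is not described in this note, so the writer takes an arbitrary payload.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def _iso(epoch_seconds):
    # Epoch seconds -> UTC string such as "2026-01-09T15:38:59.766Z".
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def _first_text(content):
    # Content may be a plain string or a list of typed blocks.
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        for block in content:
            if isinstance(block, dict) and block.get("type") == "text":
                return str(block.get("text") or "")
    return ""


def build_entry(jsonl_path):
    """Derive one index entry from a transcript (record keys are assumed)."""
    path = Path(jsonl_path)
    st = path.stat()
    first_prompt, custom_title, count = "", None, 0
    first_ts = last_ts = None
    with path.open(encoding="utf-8", errors="replace") as f:
        for line in f:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed line: skip it
            if not isinstance(rec, dict):
                continue
            kind = rec.get("type")
            if kind == "custom-title":
                custom_title = rec.get("customTitle", custom_title)  # assumed key
            if kind not in ("user", "assistant"):
                continue
            count += 1
            if rec.get("timestamp"):
                first_ts = first_ts or rec["timestamp"]
                last_ts = rec["timestamp"]
            if kind == "user" and not first_prompt:
                message = rec.get("message")
                if isinstance(message, dict):
                    first_prompt = _first_text(message.get("content"))[:200]
    entry = {
        "sessionId": path.stem,
        "fullPath": str(path.resolve()),
        "fileMtime": int(st.st_mtime * 1000),
        "firstPrompt": first_prompt,
        "messageCount": count,
        "created": first_ts or _iso(getattr(st, "st_birthtime", st.st_ctime)),
        "modified": last_ts or _iso(st.st_mtime),
        "gitBranch": "",       # not reconstructed
        "isSidechain": False,  # sidechains are not detected
    }
    if custom_title is not None:
        entry["customTitle"] = custom_title
    return entry


def write_json_atomic(target, payload):
    """Write JSON through a temp file, then os.replace it into place."""
    target = Path(target)
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, target)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Creating the temp file in the same directory as the target keeps `os.replace` on one filesystem, so a reader sees either the old index or the new one and never a partial write.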
Solutions matrix
| Option | Shape | Pros | Cons | When to use |
|---|---|---|---|---|
| Rebuild manually (mv + next launch) | One command | Zero code | Requires a clean launch to regenerate; if picker doesn’t FS-scan, rebuild needs a write trigger | One-off recovery, willing to verify next launch |
| claude -r <uuid> | One command | Immune to index; works now | Requires knowing the UUID; doesn’t fix picker for future launches | Immediate recovery of a known session |
| Rebuild script (tirufege-style, batch-capable) | Small Python/Bash, .claude/tools/rebuild-sessions-index | Deterministic, doesn’t depend on Claude Code’s write path; batch across all ~/.claude/projects/*/ | Bypasses whatever write-path logic upstream provides; can’t repopulate gitBranch correctly | Routine recovery, systemic cleanup (14+ stale projects on this machine) |
| Replacement picker (cres, claude-resume) | External binary + shell alias | Full fix for the picker UX; index becomes irrelevant | Another tool to install; doesn’t help anything else that reads the index | If picker UX is the only thing you care about |
| SessionEnd hook in settings.json | Add hook that calls a graceful-shutdown action | Prevents future drift without changing user habits | Doesn’t fire on SIGKILL / WSL force-close (same failure mode as the bug itself) | Partial: only catches the subset of closures that already fire SessionEnd |
| Wrapper script (claude-safe that traps SIGTERM and calls /exit) | Shell wrapper in ~/.local/bin/ | Covers more shutdown types than SessionEnd | SIGKILL / process-group-kill from wsl --shutdown still bypasses traps; complex to get right | Marginal improvement; not a full fix |
| File upstream with evidence | GitHub issue comment on #25032 or #24729 | Root-cause fix for everyone | Slow; depends on Anthropic prioritizing; you’d want measurements first | Once the mechanism is pinned down (strace or deobf’d xa reading) |
| Move to ~/.local (WSL-native FS) | One-time project move | Sidesteps dual-encoding AND is faster I/O | Breaks Windows-editor access; doesn’t actually fix index staleness (still happens on single-OS) | Only solves fragmentation, not this bug |
Recommended stack for this machine (ranked by leverage):
- Immediate recovery (done): rename this project’s index. Use claude -r <uuid> if the rebuild doesn’t trigger on next launch.
- Systemic cleanup: a batch rebuild script that walks ~/.claude/projects/*/, detects lag > N days, rebuilds. One-shot run clears the 14-project backlog, and a weekly cron keeps it clean going forward.
- Upstream push: add an evidence-bearing comment on #25032 with the “cumulative 14-project lag distribution” observation and the Zenn article’s “rolled back two weeks” mechanism hypothesis. Goal: raise the bar for what “fix” means, since the write path appears to re-read stale cached state rather than merge.
- Defer: the picker replacement (cres / claude-resume). Worth revisiting if #38340 is right and the picker really doesn’t FS-scan in current versions; a replacement picker would make the index entirely bypassable.
Not recommended: wrapping portaconv. The escape hatch is fine where it is.
Empirical tests still worth running (by cost)
- strace -fe trace=openat,rename,write,fsync -p $(pgrep -f 'claude.*cli') through a full session, diffed against /exit vs SIGKILL vs wsl --shutdown. Pins down the write cadence. ~30 min of work.
- Check whether next claude launch with absent index rebuilds it, or picker just starts empty. ~1 min: start fresh, observe /resume picker + index file presence after clean exit. (I can’t run this myself from inside a session; worth you observing.)
- Diff cli.js from v2.1.31 vs v2.1.118 to see if function xa changed shape. If it did, the case-sensitivity hypothesis is falsifiable across versions. Medium cost: deobfuscate both.
Open questions (updated after investigation)
- Is this already tracked upstream? Yes, extensively. #25032 is the canonical; #24729 has the deepest RE (function xa, case-sensitivity bug); #44346 is the exact WSL2 environment match; #38340 proves picker doesn’t FS-scan as of v2.1.81; #41946 proves clean exits can also lose entries.
- Does graceful shutdown always succeed? No: #41946 shows even clean exits can drop sessions from the index. The write path is racier than the pure “SIGKILL skips flush” model.
- Is /resume’s picker list derived purely from sessions-index.json, or does it also consult .jsonl mtimes? Still contested: #38340 says no FS scan (synthetic-test evidence), #44346 says FS scan does happen. Likely version-dependent, and the answer drives whether a rebuild script is sufficient or whether a replacement picker is needed.
- What exact trigger events write sessions-index.json? Undocumented. Would need strace -fe trace=openat,rename,write,fsync on a live session across /exit, SIGTERM, SIGKILL to pin down. Zenn writeup suggests the write path re-reads cached state and overwrites, which would explain “rolled back two weeks” and match this machine’s cumulative 14-project drift.
- Is there a practical max jsonl size beyond which starting a fresh session beats continuing? 59 MB crosses some threshold of pain (compaction cost, load time, index churn). Not formally bounded.
- Does this reproduce on OpenCode? Different storage format, but the ungraceful-shutdown failure mode is generic enough to be worth checking. Out of scope for this note.
- Does the case-sensitivity bug in function xa (#24729 comment) still exist in v2.1.118? @agatho’s RE was against an older version. Worth diffing cli.js across versions.
Actions taken
- Renamed this project’s sessions-index.json → sessions-index.json.bak-2026-04-23
- Scanned all ~/.claude/projects/*/ and found 14 other projects with lag > 7 days (table above)
- Surveyed upstream bug tracker, docs, and third-party tooling landscape
- Collected reference rebuild algorithm (tirufege gist, fields + derivations)
- Documented solutions matrix with tradeoffs
- Shipped pconv doctor + pconv rebuild-index subcommands in portaconv v0.1.0: shared adapter helpers (detect_staleness, build_index_for_project, write_index_atomic), MCP integration for doctor, 23 new integration tests across both subcommands. Rebuilt this project’s index via pconv rebuild-index --project <path>; .bak-2026-04-23 preserved as historical artifact. Round-trip verified: pconv doctor reports clean.
- Posted upstream comment on #25032 (comment 4304663073). Evidence-bearing (14-project drift distribution, observed sessions-index.json format, mechanism hypothesis tying graceful-shutdown dependency to Zenn’s “overwrite-with-cached-state” observation) with four prioritized asks (rebuild-on-startup-if-stale; merge-with-filesystem write path; document the format; emit a warning from the picker when it disagrees with .jsonl mtimes).
- Promoted to stack pattern + research learning; see “Published as” below.
- Verify next clean claude launch in this project regenerates the index (follow-up, requires launch)
- Monitor upstream response on #25032; add OpenCode adapter if a PR opens that needs testing
Published as
Findings landed across three scaffold layers:
- Stack pattern (tier 2, the discoverable how-to): 02-stack/patterns/claude-code-session-recovery.md, a decision tree for picking among claude -r <uuid> / pconv rebuild-index / pconv doctor --dump-stale.
- Research learning (the insight-archive entry): research/learnings/2026-04-23-stale-sessions-index-detection-and-recovery.md, on why the detect + recover-via-extractor pattern generalizes to any append-only-log-plus-summary-index architecture.
- Upstream comment (the slow-path fix): drafted at 2026-04-23-upstream-comment-25032.md, posted at #25032 comment 4304663073.
This research note stays in place as the investigation proof — full evidence, hypothesis-chase, and dead ends. The three published artifacts above are the usable takeaways.
See also
- 02 · Claude Code conversation fragmentation: sibling /resume failure mode, resolved via portaconv
- portaconv: extractor you’d reach for if the jsonl itself becomes unusable
- anthropics/claude-code#17682: cross-environment history sync (different problem, adjacent area)