
2026-04-26 — Worklog

S5 · Instance log 2026-04-26

Session 1: lost-context recovery via pconv (recurrence of stale-sessions-index)


The same WSL-shutdown-induced stale-sessions-index bug surfaced again — /resume showed wrong/old metadata for the active 97d7b58b session despite the JSONL being current. Recovery followed the exact recipe documented in research/learnings/2026-04-23-stale-sessions-index-detection-and-recovery.md:

  • pconv list confirmed the active session was 97d7b58b, ~10,430 messages, 66 MB JSONL.
  • pconv dump 97d7b58b --tail 200 --full-results --include-system-events > /tmp/recovered-dokploy-context.md (968 KB).
  • Read targeted slices around the Dokploy thread anchors (lines 5732 → 6786) — recovered the full TigerVNC walkthrough I had given earlier.
  • Renamed sessions-index.json → sessions-index.json.bak-2026-04-26-stale so the next graceful Claude launch rebuilds it.
  • Added a new project-memory entry (memory/reference_pconv_recovery.md + MEMORY.md index update) so the next recurrence runs the recipe immediately rather than re-deriving.
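The steps above condense to a short sequence. A sketch as a reusable function — pconv is the portaconv CLI from this project; the ~/.claude/sessions-index.json location is an assumption, so adjust to wherever Claude Code keeps the index on this machine:

```shell
# Hedged sketch of the stale-index recovery recipe; not executed here.
pconv_recover() {
  local sid=$1                          # session ID, e.g. 97d7b58b
  pconv list                            # confirm which session is actually live
  pconv dump "$sid" --tail 200 --full-results --include-system-events \
    > "/tmp/recovered-${sid}-context.md"
  # Move the stale index aside; the next graceful Claude launch rebuilds it.
  # Path is an assumption — adjust to the real sessions-index.json location.
  mv "$HOME/.claude/sessions-index.json" \
     "$HOME/.claude/sessions-index.json.bak-$(date +%F)-stale"
}
```

Keeping the dump step before the rename means the recovered context file exists even if the rebuild goes sideways.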

Why it mattered: demonstrates the recovery pattern works repeatedly with no surprises. Confirms the pconv dump --tail + sessions-index rename combination is the right reflex for this failure mode. The portaconv doctor / rebuild-index subcommands shipped 2026-04-25 would streamline this further; revisit using them next time instead of the manual sequence.

Session 2: Dokploy VM stood up live on TrueNAS Scale


Followed the playbook from the lost-context-pre-recovery research directly into a live deployment. The arc:

  • Provisioned personal/vm/dokploy zvol (60 GB sparse, lz4) on the TrueNAS Scale box.
  • Created the VM (Debian 13, 2 vCPU / 4 GB RAM, UEFI, single VirtIO disk, bridged on br0).
  • Installed Debian via TigerVNC. Hit two known footguns: (1) setting a root password during install means sudo does NOT get installed and the user is NOT added to the sudo group — fix is su -, then apt install sudo && usermod -aG sudo <user>, then relog; (2) curl is no longer in Debian’s “standard system utilities” — fix is apt install -y curl ca-certificates.
  • Hit the CD-ROM-not-detached loop once — first reboot landed back at the installer because the ISO wasn’t detached before clicking “Continue” at install-complete. Lost ~15 min recovering. Documented prominently in the new tier-2 bootstrap pattern (Phase 2 step 14).
  • Tailscale up on the VM (tailscale up --ssh --hostname=dokploy); MagicDNS resolved immediately. VNC retired permanently after this point.
  • Dokploy install via curl -sSL https://dokploy.com/install.sh | sh — Docker installed, docker swarm init, all services converged, total ~3 min.
  • Dokploy installer’s final URL prints the box’s WAN IP (from curl ifconfig.me) — corrected via Dokploy → Settings → Server → IP / Domain to the tailnet identity. The WAN URL must NEVER be opened or port-forwarded. Documented this footgun explicitly in the new pattern doc.
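The VM-side arc above condenses to a few commands. A sketch, wrapped as a never-invoked function so the ordering is explicit; the user name `debuser` is a placeholder:

```shell
# Condensed Debian-13-guest bootstrap from the arc above; names are placeholders.
vm_tailnet_bootstrap() {
  # Footgun 1: with a root password set, the installer skips sudo entirely
  # and does not add the user to the sudo group.
  su - root -c 'apt-get update && apt-get install -y sudo && usermod -aG sudo debuser'
  # (log out and back in so the new group membership takes effect)

  # Footgun 2: curl is no longer in Debian's "standard system utilities".
  sudo apt-get install -y curl ca-certificates

  # Join the tailnet with Tailscale SSH enabled; VNC can be retired after this.
  sudo tailscale up --ssh --hostname=dokploy

  # Dokploy installer: Docker, swarm init, service convergence (~3 min).
  curl -sSL https://dokploy.com/install.sh | sh
}
```

After this point every subsequent step (including fixing the printed WAN URL in Dokploy's settings) happens over the tailnet.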

End state: Dokploy reachable at http://dokploy:3000 from any tailnet device. Currently HTTP-over-tailnet (Level 0); HTTPS upgrade plan logged separately for when the trigger fires.

Session 3: stuck-ZFS-dataset post-mortem + new escape-hatch pattern


While cleaning up the leftover personal/dokploy/{data,pgdb,redis} datasets from a March 2026 failed Dokploy-on-TrueNAS-direct attempt, hit EBUSY — dataset is busy despite:

  • mounted=no on every dataset
  • mount, findmnt -A, and lsof all came back empty
  • No snapshots, no holds, not encrypted
  • No Docker container/volume references
  • No NFS/SMB shares
  • No receive_resume_token set
  • systemctl restart middlewared didn’t help
  • No relevant kernel log entries
  • A reboot would have been next, but one more thing was worth trying first

Escape hatch that worked: zfs rename personal/dokploy personal/_trash_dokploy_2026-04-26 && zfs destroy -R -f -v personal/_trash_dokploy_2026-04-26. Renaming the dataset out of the way unblocked the destroy.

Why it works: TrueNAS Scale’s middleware caches dataset references by path string in its internal SQLite DB, not by ZFS dataset GUID. Some prior Apps operation registered the dataset for management, and that bookkeeping entry held the path open as EBUSY even after the App was gone. Renaming changed the path string; middlewared’s stale entry now pointed at a non-existent path; the reference dangled; the kernel released the lock.

This is exactly the kind of finding that needed to be written down — it isn’t in zfs(8) (it’s a layer above ZFS), it isn’t in TrueNAS official docs, and the diagnostic ladder (mount → lsof → findmnt → docker → snapshots → holds → encryption → receive_resume_token → middlewared restart → reboot → rename) is the kind of canonical decision tree that future-me would reinvent the hard way.
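The ladder can be walked mechanically. A sketch as a function (never invoked here); the /mnt mountpoint prefix and the exact per-rung commands are assumptions rather than the pattern doc's canonical form:

```shell
# Hedged sketch of the EBUSY diagnostic ladder; run as root on TrueNAS Scale.
zfs_ebusy_ladder() {
  local ds=$1                                      # e.g. personal/dokploy
  zfs get -r mounted "$ds"                         # 1. anything still mounted?
  findmnt --all | grep -F "$ds"                    # 2. stray mount-table entries?
  lsof +D "/mnt/$ds" 2>/dev/null                   # 3. open file handles?
  docker ps -a --filter volume="/mnt/$ds"          # 4. container/volume refs?
  zfs list -r -t snapshot "$ds"                    # 5. snapshots?
  zfs list -H -r -t snapshot -o name "$ds" \
    | xargs -r zfs holds                           # 6. holds on any snapshot?
  zfs get -r encryption,receive_resume_token "$ds" # 7. encryption / interrupted recv?
  systemctl restart middlewared                    # 8. stale middleware state?

  # Escape hatch once every rung comes back clean: rename, then destroy.
  local trash="${ds%/*}/_trash_${ds##*/}_$(date +%F)"
  zfs rename "$ds" "$trash" && zfs destroy -R -f -v "$trash"
}
```

The rename-then-destroy tail is the part that matters: it changes the path string the middleware cached, which is exactly the mechanism described above.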

Session 4: documentation pass — 4 tier-2 patterns + 2 tier-3 docs


Captured everything from the day’s live work into structured pattern docs. Frontmatter validates per the schema enum.

  • 02-stack/patterns/dokploy-on-truenas-via-vm.md (status stable) — practitioner walkthrough; cross-refs the existing self-hosted-paas-truenas-conflict.md learning (the why) and self-hosted-deployment-platforms.md reference (the broader landscape). Architecture diagram, the full setup arc, footgun catalogue, “when this pattern is wrong” section.
  • 02-stack/patterns/tailscale-https-three-levels.md (stable) — three-level decision framework (HTTP-over-tunnel / tailscale serve / tailscale cert + Traefik + systemd timer) with full Level 2 recipe (renewal script + service + timer units). Mermaid decision tree. Migration cost table between levels.
  • 02-stack/patterns/truenas-stuck-zfs-dataset.md (stable) — 8-step diagnostic ladder + the rename-then-destroy escape hatch + the why-it-works callout. Mermaid decision tree. Worst-case-acceptance section so the next person doesn’t waste hours past the right giving-up point.
  • 02-stack/patterns/debian-vm-tailnet-bootstrap.md (stable) — every-screen netinst recipe + the two installer footguns (sudo, curl) + the cloud-init alternative for “the next time you’re doing this routinely” + comprehensive failure-mode table.
  • 03-work/homelab/dokploy-vm.md (active) — cybersader’s specific deployment instance. Cross-refs all four tier-2 patterns; contains the post-mortem on the personal/dokploy/* cleanup; lists what’s next (first deploy / HTTPS trigger / DHCP reservation / re-evaluating the cron-pull design).
  • 03-work/homelab/tailnet-https.md (planning) — plan for the Level 2 upgrade when triggered. Decision logged (skip Level 1 entirely; go Level 0 → Level 2 directly because of the Traefik-port-443 conflict). Trigger criteria + pre-flight checklist + cybersader-specific value substitutions.
  • Tier-2 patterns index updated with 4 new entries.
  • Tier-3 homelab index updated with 2 new entries.
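The tailscale-https-three-levels pattern's Level 2 recipe pairs a renewal script with systemd service + timer units. A minimal sketch of what those units might look like — unit names, script path, and the weekly cadence are all hypothetical, not the pattern doc's canonical values:

```ini
# /etc/systemd/system/tailscale-cert-renew.service  (names/paths illustrative)
[Unit]
Description=Renew the tailnet TLS cert consumed by Traefik

[Service]
Type=oneshot
# Hypothetical wrapper around `tailscale cert <host>.<tailnet>.ts.net` that
# drops the cert/key pair where Traefik reads them.
ExecStart=/usr/local/bin/tailscale-cert-renew.sh

# /etc/systemd/system/tailscale-cert-renew.timer
[Unit]
Description=Periodic tailnet cert renewal

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target
```

Persistent=true makes a missed window fire on next boot, which suits a homelab VM that isn't always up.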

Session 5: cc-resume-here shell helper + recurrence research note


End-of-day follow-up to Session 1’s stale-sessions-index recovery. The recurrence (3 days after pconv doctor shipped in portaconv v0.1.0) makes it clear that the gap isn’t a missing tool but the lack of an automatic launch-time trigger.

  • profiles/bashrc-snippets/claude-code-helpers.sh — added cc-resume-here function (alias ccrh). Reads encoded-cwd dir directly, picks most-recent .jsonl by mtime, resumes via claude -r <uuid>. Bypasses sessions-index.json entirely. Encoded-cwd derivation matches Claude Code’s convention (every non-alnum char → single dash). Bash syntax verified; sources cleanly.
  • agent-context/zz-research/2026-04-26-stale-index-recurrence-and-shell-helper-layer.md (status research) — captures (a) the recurrence-fact (1 day after a fresh rebuild, the index drifted again), (b) the shell helper as a complementary layer to pconv (additive, not replacement), and (c) five look-into items for closing the delivery gap (SessionStart hook, portagenty shim, replacement picker, upstream fix, promotion-to-challenge-03). Cadence-tracking table left for future appends.

Why it mattered: the 2026-04-23 fix-by-tooling shipped (pconv doctor + pconv rebuild-index). The 2026-04-26 recurrence proves the tool isn’t enough on its own — nothing fires it automatically. The shell helper is the bashrc-layer answer (“I just want to resume without thinking”). Worth tracking as research because the right next move (which look-into to commit to) isn’t obvious yet.

  • The four new tier-2 patterns ship to the agentic public mirror per the existing 02-stack/ allowlist.
  • The two tier-3 homelab docs stay in the gitignored 03-work/homelab/ per the existing convention.
  • This worklog itself is tier-2-clean (zz-log/ ships per CLAUDE.md hard rule) and serves as the cross-reference index for the four-pattern + two-doc bundle.
  • The pconv recovery worked end-to-end in <5 minutes — context recovered, sessions-index renamed, memory note added, work resumed without losing the day. This is exactly the outcome the 2026-04-21 portaconv ship targeted; it’s now the reflex, not a discovery. With recovery this reflexive, the per-recurrence cost is essentially zero despite the high recurrence rate (every WSL terminal-session close on this path).
  • The Dokploy installer’s WAN-IP-as-the-go-here-URL is genuinely dangerous — anyone who blindly pastes that URL into firewall → port forward 3000 has just exposed the admin UI to the public internet. Worth flagging upstream to Dokploy (they could detect WAN vs LAN/tailnet IPs and prefer the latter).
  • Rename-then-destroy was discovered on the live system in the moment, not from prior knowledge — the canonical “everything else failed” answer. Worth promoting from “discovered today” → “documented pattern” in this same session, which is what the new tier-2 doc accomplishes.
  • The four-pattern tier-2 set composes deliberately — Dokploy-on-TrueNAS-via-VM points at Debian-VM-tailnet-bootstrap for §2, points at Tailscale-HTTPS-three-levels for the eventual upgrade, points at TrueNAS-stuck-ZFS-dataset for cleanup. Each doc is single-concern; the network of cross-references creates the full picture without any one doc becoming bloated.