2026-04-26 — Worklog
Session 1: lost-context recovery via pconv (recurrence of stale-sessions-index)
The same WSL-shutdown-induced stale-sessions-index bug surfaced again — `/resume` showed wrong/old metadata for the active `97d7b58b` session despite the JSONL being current. Recovery followed the exact recipe documented in `research/learnings/2026-04-23-stale-sessions-index-detection-and-recovery.md` (condensed in the sketch after the list below):
- `pconv list` confirmed the active session was `97d7b58b`, ~10,430 messages, 66 MB JSONL.
- `pconv dump 97d7b58b --tail 200 --full-results --include-system-events > /tmp/recovered-dokploy-context.md` (968 KB).
- Read targeted slices around the Dokploy thread anchors (lines 5732 → 6786) — recovered the full TigerVNC walkthrough I had given prior.
- Renamed `sessions-index.json` → `sessions-index.json.bak-2026-04-26-stale` so the next graceful Claude launch rebuilds it.
- Added a new project-memory entry (`memory/reference_pconv_recovery.md` + `MEMORY.md` index update) so the next recurrence runs the recipe immediately rather than re-deriving.
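Condensed as a shell sketch for the next recurrence (session ID and dump path are from today's run; the `sessions-index.json` location is an assumption, adjust to wherever your Claude Code install keeps it):

```bash
#!/usr/bin/env bash
# Stale-sessions-index recovery, per the 2026-04-23 recipe (manual form).
set -euo pipefail

SESSION=97d7b58b                               # active session, confirmed via `pconv list`
DUMP=/tmp/recovered-dokploy-context.md         # recovered-context dump target
INDEX="$HOME/.claude/sessions-index.json"      # ASSUMPTION: adjust to your install's path

pconv list                                     # sanity-check which session is actually live
pconv dump "$SESSION" --tail 200 --full-results --include-system-events > "$DUMP"

# Move the stale index aside; the next graceful Claude launch rebuilds it.
mv "$INDEX" "$INDEX.bak-$(date +%F)-stale"
```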
Why it mattered: demonstrates the recovery pattern works repeatedly with no surprises. Confirms the `pconv dump --tail` + `sessions-index.json` rename combination is the right reflex for this failure mode. The portaconv `doctor` / `rebuild-index` subcommands that shipped 2026-04-25 would streamline this further; revisit using them next time instead of the manual sequence.
Session 2: Dokploy VM stood up live on TrueNAS Scale
Followed the playbook from the lost-context-pre-recovery research directly into a live deployment. The arc:
- Provisioned the `personal/vm/dokploy` zvol (60 GB sparse, lz4) on the TrueNAS Scale box.
- Created the VM (Debian 13, 2 vCPU / 4 GB RAM, UEFI, single VirtIO disk, bridged on br0).
- Installed Debian via TigerVNC. Hit two known footguns (fixes condensed in the sketch after this list): (1) setting a root password during install means `sudo` does NOT get installed and the user is NOT added to the sudo group — fix is `su -` && `apt install sudo` && `usermod -aG sudo <user>`, then relog; (2) `curl` is no longer in Debian's "standard system utilities" — fix is `apt install -y curl ca-certificates`.
- Hit the CD-ROM-not-detached loop once — first reboot landed back at the installer because the ISO wasn't detached before clicking "Continue" at install-complete. Lost ~15 min recovering. Documented prominently in the new tier-2 bootstrap pattern (Phase 2 step 14).
- Tailscale up on the VM (`tailscale up --ssh --hostname=dokploy`); MagicDNS resolved immediately. VNC retired permanently after this point.
- Dokploy install via `curl -sSL https://dokploy.com/install.sh | sh` — Docker installed, `docker swarm init`, all services converged, total ~3 min.
- The Dokploy installer's final URL prints the box's WAN IP (from `curl ifconfig.me`) — corrected via Dokploy → Settings → Server → IP / Domain to the tailnet identity. The WAN URL must NEVER be opened or port-forwarded. Documented this footgun explicitly in the new pattern doc.
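A minimal sketch of the in-VM sequence above, assuming `<user>` as the placeholder install user; the Tailscale install one-liner is the standard upstream script (a step assumed here, not logged explicitly today), everything else is verbatim from the arc:

```bash
# Inside the fresh Debian 13 VM, logged in as the non-root install user.

# Footgun 1: a root password was set during install, so sudo is absent and
# the user is not in the sudo group.
su -                                     # root shell with root's environment
apt install -y sudo
usermod -aG sudo <user>                  # replace <user> with the install user
exit
# log out and back in here so the new sudo group membership takes effect

# Footgun 2: curl is no longer part of "standard system utilities".
sudo apt install -y curl ca-certificates

# Join the tailnet with Tailscale SSH enabled; VNC is retired after this point.
curl -fsSL https://tailscale.com/install.sh | sh   # standard installer (assumed step)
sudo tailscale up --ssh --hostname=dokploy

# Dokploy install: Docker, docker swarm init, all services converge in ~3 min.
curl -sSL https://dokploy.com/install.sh | sh
```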
End state: Dokploy reachable at http://dokploy:3000 from any tailnet device. Currently HTTP-over-tailnet (Level 0); HTTPS upgrade plan logged separately for when the trigger fires.
Session 3: stuck-ZFS-dataset post-mortem + new escape-hatch pattern
While cleaning up the leftover `personal/dokploy/{data,pgdb,redis}` datasets from a March 2026 failed Dokploy-on-TrueNAS-direct attempt, hit `EBUSY` (dataset is busy) despite:
- `mounted=no` on every dataset
- Empty `mount`, `findmnt -A`, `lsof`
- No snapshots, no holds, not encrypted
- No Docker container/volume references
- No NFS/SMB shares
- No `receive_resume_token` set
- `systemctl restart middlewared`: no help
- No relevant kernel log entries
- Reboot would have been next, but tried one more thing first
Escape hatch that worked: `zfs rename personal/dokploy personal/_trash_dokploy_2026-04-26 && zfs destroy -R -f -v personal/_trash_dokploy_2026-04-26`. Renaming the dataset out of the way unblocked the destroy.
Why it works: TrueNAS Scale’s middleware caches dataset references by path string in its internal SQLite DB, not by ZFS dataset GUID. Some prior Apps operation registered the dataset for management, and that bookkeeping entry held the path open as EBUSY even after the App was gone. Renaming changed the path string; middlewared’s stale entry now pointed at a non-existent path; the reference dangled; the kernel released the lock.
This is exactly the kind of finding that needed to be written down — it isn't in zfs(8) (it's a layer above ZFS), it isn't in TrueNAS official docs, and the diagnostic ladder (`mount` → `lsof` → `findmnt` → docker → snapshots → holds → encryption → `receive_resume_token` → middlewared restart → reboot → rename) is the kind of canonical decision tree that future-me would reinvent the hard way.
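The ladder as shell, for next time. Dataset names are from today's cleanup; the mountpoint and the lsof/dmesg forms are illustrative rather than the pattern doc's exact commands:

```bash
DS=personal/dokploy                                    # stuck parent dataset from today

# Rungs 1–8: confirm nothing legitimately holds the dataset.
zfs get -r mounted,encryption,receive_resume_token "$DS"
mount | grep "$DS"; findmnt -A | grep "$DS"; lsof +D "/mnt/$DS" 2>/dev/null
zfs list -r -t snapshot "$DS"
zfs list -H -r -t snapshot -o name "$DS" | xargs -r -n1 zfs holds
docker ps -a; docker volume ls                         # no container/volume references
# (also checked: no NFS/SMB shares on the path)
systemctl restart middlewared                          # rung 9: middleware restart (no help today)
dmesg | tail -n 50                                     # rung 10: kernel log (nothing relevant)

# Escape hatch that finally worked: rename out from under middlewared's
# path-string bookkeeping, then destroy recursively.
zfs rename "$DS" personal/_trash_dokploy_2026-04-26
zfs destroy -R -f -v personal/_trash_dokploy_2026-04-26
```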
Session 4: documentation pass — 4 tier-2 patterns + 2 tier-3 docs
Captured everything from the day's live work into structured pattern docs. Frontmatter validates per the schema enum.
- `02-stack/patterns/dokploy-on-truenas-via-vm.md` (status `stable`) — practitioner walkthrough; cross-refs the existing `self-hosted-paas-truenas-conflict.md` learning (the why) and the `self-hosted-deployment-platforms.md` reference (the broader landscape). Architecture diagram, the full setup arc, footgun catalogue, "when this pattern is wrong" section.
- `02-stack/patterns/tailscale-https-three-levels.md` (`stable`) — three-level decision framework (HTTP-over-tunnel / `tailscale serve` / `tailscale cert` + Traefik + systemd timer) with full Level 2 recipe (renewal script + service + timer units; a flavor of the renewal script is sketched after this list). Mermaid decision tree. Migration cost table between levels.
- `02-stack/patterns/truenas-stuck-zfs-dataset.md` (`stable`) — 8-step diagnostic ladder + the rename-then-destroy escape hatch + the why-it-works callout. Mermaid decision tree. Worst-case-acceptance section so the next person doesn't waste hours past the right giving-up point.
- `02-stack/patterns/debian-vm-tailnet-bootstrap.md` (`stable`) — every-screen netinst recipe + the two installer footguns (sudo, curl) + the cloud-init alternative for "the next time you're doing this routinely" + comprehensive failure-mode table.
- `03-work/homelab/dokploy-vm.md` (`active`) — cybersader's specific deployment instance. Cross-refs all four tier-2 patterns; contains the post-mortem on the `personal/dokploy/*` cleanup; lists what's next (first deploy / HTTPS trigger / DHCP reservation / re-evaluating the cron-pull design).
- `03-work/homelab/tailnet-https.md` (`planning`) — plan for the Level 2 upgrade when triggered. Decision logged (skip Level 1 entirely; go Level 0 → Level 2 directly because of the Traefik-port-443 conflict). Trigger criteria + pre-flight checklist + cybersader-specific value substitutions.
- Tier-2 patterns index updated with 4 new entries.
- Tier-3 homelab index updated with 2 new entries.
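For flavor, a minimal sketch of the kind of renewal script the Level 2 recipe pairs with a systemd service + timer. The hostname, cert directory, and reload step are illustrative assumptions, not lifted from the pattern doc:

```bash
#!/usr/bin/env bash
# Illustrative Level 2 renewal: fetch/renew the tailnet cert and hand it to Traefik.
set -euo pipefail

HOST="dokploy.example-tailnet.ts.net"     # ASSUMPTION: the VM's tailnet FQDN
CERT_DIR=/etc/traefik/certs               # ASSUMPTION: wherever Traefik reads certs

mkdir -p "$CERT_DIR"
tailscale cert \
  --cert-file "$CERT_DIR/$HOST.crt" \
  --key-file  "$CERT_DIR/$HOST.key" \
  "$HOST"

# Hypothetical container name; restart/reload Traefik however it actually runs.
docker restart traefik 2>/dev/null || true
```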
Session 5: cc-resume-here shell helper + recurrence research note
End-of-day follow-up to Session 1's stale-sessions-index recovery. The recurrence (3 days after `pconv doctor` shipped in portaconv v0.1.0) makes it clear that the gap isn't a missing tool but the lack of an automatic launch-time trigger.
- `profiles/bashrc-snippets/claude-code-helpers.sh` — added `cc-resume-here` function (alias `ccrh`), sketched below. Reads the encoded-cwd dir directly, picks the most-recent `.jsonl` by mtime, resumes via `claude -r <uuid>`. Bypasses `sessions-index.json` entirely. Encoded-cwd derivation matches Claude Code's convention (every non-alnum char → single dash). Bash syntax verified; sources cleanly.
- `agent-context/zz-research/2026-04-26-stale-index-recurrence-and-shell-helper-layer.md` (status `research`) — captures (a) the recurrence-fact (1 day after a fresh rebuild, the index drifted again), (b) the shell helper as a complementary layer to pconv (additive, not replacement), and (c) five look-into items for closing the delivery gap (SessionStart hook, portagenty shim, replacement picker, upstream fix, promotion-to-challenge-03). Cadence-tracking table left for future appends.
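A sketch of the helper's shape (the real one lives in `profiles/bashrc-snippets/claude-code-helpers.sh`; the `~/.claude/projects` location and the UUID-as-filename detail are assumptions here):

```bash
# Resume the most recent Claude Code session for the current directory,
# bypassing sessions-index.json entirely.
cc-resume-here() {
  local encoded dir latest uuid
  encoded=$(pwd | sed 's/[^a-zA-Z0-9]/-/g')        # every non-alnum char -> single dash
  dir="$HOME/.claude/projects/$encoded"            # ASSUMPTION: encoded-cwd session dir
  latest=$(ls -1t "$dir"/*.jsonl 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "cc-resume-here: no .jsonl sessions under $dir" >&2
    return 1
  fi
  uuid=$(basename "$latest" .jsonl)                # session UUID taken from the filename
  claude -r "$uuid"
}
alias ccrh=cc-resume-here
```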
Why it mattered: the 2026-04-23 fix-by-tooling shipped (`pconv doctor` + `pconv rebuild-index`). The 2026-04-26 recurrence proves the tool isn't enough on its own — nothing fires it automatically. The shell helper is the bashrc-layer answer ("I just want to resume without thinking"). Worth tracking as research because the right next move (which look-into to commit to) isn't obvious yet.
Cross-references
- The four new tier-2 patterns ship to the agentic public mirror per the existing 02-stack/ allowlist.
- The two tier-3 homelab docs stay in the gitignored `03-work/homelab/` per the existing convention.
- This worklog itself is tier-2-clean (zz-log/ ships per CLAUDE.md hard rule) and serves as the cross-reference index for the four-pattern + two-doc bundle.
Notes / observations
- The pconv recovery worked end-to-end in <5 minutes — context recovered, sessions-index renamed, memory note added, work resumed without losing the day. This is exactly the outcome the 2026-04-21 portaconv ship targeted; it's now the reflex, not a discovery. The recurrence rate (every WSL terminal-session close on this path) makes the per-recurrence cost essentially zero.
- The Dokploy installer's WAN-IP-as-the-go-here-URL is genuinely dangerous — anyone who blindly pastes that URL into `firewall → port forward 3000` has just exposed the admin UI to the public internet. Worth flagging upstream to Dokploy (they could detect WAN vs LAN/tailnet IPs and prefer the latter).
- Rename-then-destroy was discovered on the live system in the moment, not from prior knowledge — the canonical "everything else failed" answer. Worth promoting from "discovered today" → "documented pattern" in this same session, which is what the new tier-2 doc accomplishes.
- The four-pattern tier-2 set composes deliberately — Dokploy-on-TrueNAS-via-VM points at Debian-VM-tailnet-bootstrap for §2, points at Tailscale-HTTPS-three-levels for the eventual upgrade, points at TrueNAS-stuck-ZFS-dataset for cleanup. Each doc is single-concern; the network of cross-references creates the full picture without any one doc becoming bloated.