Direction — research wave (Challenges 08/09/10), critical gaps, and the roadmap reshape
Table of contents
- §1 Why this log exists
- §2 Research waves landed
- §3 Critical assessment — Ch 10 gaps, Ch 09 minor, Ch 08 omissions, Ch 11/12/13 reads
- §4 Research challenges status — 11 resolved + Ch 14/15/16 spun up
- §5 Direction posture — superseded by TL;DR
- §6 Phase plan refresh
- §7 Proposed roadmap deltas — listed for review; not yet applied
- §8 StewardshipProfile rename ripples — listed for review; not yet applied
- §9 What’s still deferred
- §10 Long-horizon ideas considered, not committed
- §11 Related
Why this log exists
Yesterday’s pair of logs (orientation AM, decisions PM) set the table: web-of-webs framing, five Foundation commitments, three new research challenges (08/09/10), the StewardshipProfile rename, the meta-schema lifecycle commitment (“Crosswalker eats its own dog food”), and an explicit deferral of roadmap edits to this log.
Today two research waves landed, totalling 9 fresh-agent deliverables:
- First wave (Ch 08/09/10): 3 deliverables addressing the gaps surfaced by yesterday’s commitments — git audit-trail tenability, identifier strategy, graph→tabular bridging engine.
- Second wave (Ch 11/12/13): 6 deliverables (Ch 11 produced three independent runs with slightly different recommendations on Tier 2/3 engine choice; Ch 12 produced two; Ch 13 produced one). Multi-agent convergence across the three Ch 11 runs is itself load-bearing evidence for the Tier 2 layered stack.
This log:
- Summarizes both waves (§2.A first wave, §2.B second wave)
- Gives critical reads of each (§3)
- Marks the resolved challenges and spins up Challenge 14 for the engines surfaced during Ch 11 research (§4)
- Sets the direction posture: confirmed / pending / partial / follow-on (§5)
- Refreshes the phase plan (§6)
- Lists roadmap deltas for review (§7)
- Lists StewardshipProfile rename ripples (§8), preserved from earlier today
- Records what’s still deferred (§9)
- NEW: Long-horizon ideas considered, not committed (§10) — LinkML, IPLD, Tier 1.5 compilation, AI-augmented mapping, etc.
§2.A First research wave (Ch 08/09/10)
Three fresh-agent research deliverables, all 2026-05-02 morning:
2.1 Challenge 08 — Git history as a compliance audit trail (verdict: augment, not replace)
Full deliverable: Ch 08: Is git history a tenable compliance audit trail?. Brief: Challenge 08.
Headline finding: bare signed-commit + branch-protection git fails on four of five core audit-evidence standards (SAS 142 “susceptibility to management bias,” PCAOB AS 1105 IPE controls, ISO 27001 A.8.15 tamper-resistance against privileged insiders, SOX §802 WORM expectation). It satisfies the integrity leg via the Merkle DAG but fails the trusted-time and non-repudiation legs because the author controls the timestamp and the repository is internally written.
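The split between the integrity leg and the trusted-time leg can be illustrated with a toy hash chain. This is a minimal sketch, not git's actual object format: `commit_id`, `build_chain`, and `verify_chain` are hypothetical names, SHA-256 over a JSON record stands in for git's object hashing, and the log entries are made up.

```python
import hashlib
import json

def commit_id(parent: str, content: str, author_time: str) -> str:
    """Hash a commit-like record; including the parent hash links records into a chain."""
    record = json.dumps(
        {"parent": parent, "content": content, "author_time": author_time},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

def build_chain(entries):
    """Turn (content, author_time) pairs into [(id, content, author_time), ...]."""
    chain, parent = [], ""
    for content, author_time in entries:
        cid = commit_id(parent, content, author_time)
        chain.append((cid, content, author_time))
        parent = cid
    return chain

def verify_chain(chain) -> bool:
    """Recompute every hash; any in-place edit breaks the chain from that point on."""
    parent = ""
    for cid, content, author_time in chain:
        if commit_id(parent, content, author_time) != cid:
            return False
        parent = cid
    return True

honest = build_chain([
    ("link control A to evidence 1", "2026-05-02T09:00:00Z"),
    ("link control B to evidence 2", "2026-05-02T10:00:00Z"),
])

# Integrity leg: editing a record without rewriting the hashes is detected.
tampered = list(honest)
tampered[0] = (tampered[0][0], "link control A to evidence 9", tampered[0][2])

# Trusted-time leg: the author supplies the timestamps, so a chain written
# today with backdated times verifies exactly like an honest one.
backdated = build_chain([
    ("link control A to evidence 1", "2025-01-01T09:00:00Z"),
    ("link control B to evidence 2", "2025-01-01T10:00:00Z"),
])

print(verify_chain(honest), verify_chain(tampered), verify_chain(backdated))
```

The backdated chain passing verification is the gap that an external witness such as an RFC 3161 receipt or a remote mirror is meant to close.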
Recommended Tier 1 hardening (all required, none deferred to Tier 2/3):
- Signed commits on every evidence-link state change (SSH signing preferred over GPG)
- Mandatory remote mirror — refuse to claim “audit trail” for an unmirrored vault
- RFC 3161 trusted-timestamp receipts on every audit-relevant commit (free public TSAs make this operationally negligible)
- gc.reflogExpire = never and gc.reflogExpireUnreachable = never
- Built-in Audit Authenticity Report export (FRE 902(13)/(14)-shaped certification PDF/JSON)
Tier 2 adds nothing audit-wise on its own (the sql.js sidecar is a derived view, not a system of record) but is the right place to enforce write-through-to-git in code.
Tier 3 adds external monitoring (continuous git verify-commit + TSA receipt verification + fsck --full), per-commit/hourly WORM mirror to S3 Object Lock Compliance (or equivalent — Cohasset confirms compliance with SEC 17a-4, FINRA 4511, CFTC 1.31), and SOC 2-attestable infrastructure controls converting the user’s IPE problem into the server’s IPE problem.
The deliverable includes a 14-failure-mode inventory ranked by likelihood and mitigation cost. The residual risk after augmentation is a three-party collusion (admin bypass + WORM compromise + forged TSA), a threat that no commercial GRC tool credibly defends against either.
2.2 Challenge 09 — UUID / CWUUID cross-cutting identifier strategy
Full deliverable: Ch 09: UUID/CWUUID cross-cutting identifier strategy. Brief: Challenge 09.
Headline finding: layered scheme. UUIDv7 (RFC 9562, May 2024) for almost everything Crosswalker generates; content-addressed sha256 CIDs only where the entity is its content (spine snapshots, schema releases); CURIEs for external references (controls, frameworks, ORCIDs); ORCID CURIEs verbatim for SSSOM author/reviewer slots. “CWUUID” is a display convention, not a new algebra — every CW-minted ID is a canonical UUIDv7 stored in YAML frontmatter; cw: prefix and short hex suffixes are UI affordances only.
Filename strategy: human-readable filenames for browseable classes (ontology nodes, evidence notes); composite filenames with a --cw<6-hex> suffix for collision-prone classes (junction notes, lifecycle records, crosswalk edges with multiple predicates). Aligns with the prevailing Obsidian convention of uid in frontmatter (Advanced URI plugin, etc.).
OSCAL round-trip: preserve incoming @uuid flags verbatim on import; mint UUIDv7 only for entities Crosswalker creates de novo. UUIDv7 is syntactically valid in every OSCAL uuid slot — published OSCAL schemas constrain only the regex grammar, not the version nibble. Risk of NIST tightening to v4-only is low.
Minimum viable Foundation set (v0.1) — six identifier classes:
- Vault UUID
- Ontology-web UUID
- Ontology-node UUID (alongside CURIE natural key)
- Junction-note UUID (covers crosswalk + evidence-link)
- Spine snapshot CID
- SSSOM author CURIE (ORCID-preferred)
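A minimal sketch of how the minted classes above could be produced, assuming the RFC 9562 bit layout for UUIDv7 (48-bit Unix millisecond timestamp, version nibble, random tail). The function names are hypothetical, the `sha256:<hex>` string is a simplified stand-in rather than a real multiformats CID, and the `cw:` short form only illustrates the display-convention idea.

```python
import hashlib
import os
import time
import uuid

def mint_uuid7(now_ms=None) -> uuid.UUID:
    """Build a UUIDv7: 48-bit Unix ms timestamp, version 7, RFC variant, random rest."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                             # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                 # low 62 bits
    value = (now_ms & ((1 << 48) - 1)) << 80        # unix_ts_ms
    value |= 0x7 << 76                              # version nibble
    value |= rand_a << 64
    value |= 0b10 << 62                             # RFC 4122/9562 variant
    value |= rand_b
    return uuid.UUID(int=value)

def content_id(data: bytes, algorithm: str = "sha256") -> str:
    """Content-addressed ID with the algorithm named in the prefix (assumed format)."""
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"

def display_id(entity_id: uuid.UUID) -> str:
    """Hypothetical UI short form; the canonical ID stays the full UUID."""
    return f"cw:{entity_id.hex[-6:]}"

junction_note_id = mint_uuid7()
snapshot_id = content_id(b"spine snapshot bytes")
print(junction_note_id, display_id(junction_note_id), snapshot_id)
```

Keeping the algorithm name in the prefix means a different digest (for example `sha3_256`) could be substituted later without changing the ID shape.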
The deliverable is substantively complete. Minor gaps flagged in §3.2 below don’t warrant a re-run.
2.3 Challenge 10 — Graph→tabular bridging engine for the web-of-webs
Full deliverable: Ch 10: Graph→tabular bridging engine for the web-of-webs. Brief: Challenge 10.
Headline finding: hybrid 3-tier strategy.
| Tier | Strategy | Engine choice |
|---|---|---|
| Tier 1 | Build | Materialized-folder generator inside the plugin; flattens graph queries into Bases-compatible YAML notes; per-folder .view.yaml, .view.lock.json dependency manifest, .view.stale flag |
| Tier 2 | Integrate | DuckDB-WASM (~3.2 MB compressed shell, MIT, MotherDuck-backed). Recursive CTEs handle multi-hop traversal; PIVOT/UNPIVOT/window functions handle cross-tabs; Apache Arrow zero-copy to a Polars-JS or Arquero renderer-side pivot layer |
| Tier 3 | Integrate | PostgreSQL + Apache AGE (openCypher graph traversal + full SQL, Apache 2.0, ASF governance). Oxigraph as RDF sidecar for SSSOM/SKOS/STRM workloads needing SPARQL property-path closure |
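The Tier 2 row leans on recursive CTEs for multi-hop traversal. The sketch below shows the query shape using Python's built-in `sqlite3` as a stand-in, since both SQLite and DuckDB accept `WITH RECURSIVE`; it is not DuckDB-WASM code. The `crosswalk` table, the `reachable` helper, and the node IDs are all made up for illustration.

```python
import sqlite3

def reachable(conn, start, max_hops):
    """Return (node, min_hops) pairs reachable from `start` within `max_hops` edges."""
    query = """
        WITH RECURSIVE reach(node, hops) AS (
            SELECT dst, 1 FROM crosswalk WHERE src = :start
            UNION
            SELECT c.dst, r.hops + 1
            FROM reach AS r
            JOIN crosswalk AS c ON c.src = r.node
            WHERE r.hops < :max_hops      -- hop bound also guarantees termination on cycles
        )
        SELECT node, MIN(hops) AS min_hops
        FROM reach
        GROUP BY node
        ORDER BY min_hops, node
    """
    return conn.execute(query, {"start": start, "max_hops": max_hops}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crosswalk (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO crosswalk VALUES (?, ?)",
    [("A:1", "B:7"), ("B:7", "C:3"), ("C:3", "D:9")],
)

print(reachable(conn, "A:1", 3))
# [('B:7', 1), ('C:3', 2), ('D:9', 3)]
```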
KuzuDB explicitly rejected despite Cypher elegance and native property-graph fit, on the basis that the upstream project was archived 10 October 2025 with no upstream maintenance commitment. The deliverable claims this is a load-bearing supply-chain risk for a multi-year compliance tool. ⚠️ Verify before acting on it (see §3.1).
Five data-flow invariants — file canonicity, determinism/idempotency, explicit staleness, writes-always-land-in-files, transparent cross-tier query routing. Same posture as the 05-01 §2.5 meta-schema commitment: files are the only writable surface; everything else is a content-addressed cache.
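The explicit-staleness invariant could be checked along these lines. This is a sketch under assumptions: the file names `.view.lock.json` and `.view.stale` come from the Tier 1 design above, but the manifest layout (a JSON map of source path to SHA-256 digest) and the helper names are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def write_lock(view_dir: Path, sources) -> None:
    """Record the digest of every source note the materialized view was built from."""
    lock = {str(p): file_digest(p) for p in sorted(sources)}
    (view_dir / ".view.lock.json").write_text(json.dumps(lock, indent=2, sort_keys=True))
    (view_dir / ".view.stale").unlink(missing_ok=True)

def refresh_stale_flag(view_dir: Path) -> bool:
    """Compare current source digests with the lock; set or clear the stale flag."""
    lock = json.loads((view_dir / ".view.lock.json").read_text())
    stale = any(
        not Path(p).exists() or file_digest(Path(p)) != digest
        for p, digest in lock.items()
    )
    flag = view_dir / ".view.stale"
    if stale:
        flag.touch()
    else:
        flag.unlink(missing_ok=True)
    return stale

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    note = root / "control-a.md"
    note.write_text("status: draft\n")
    view = root / "views" / "by-status"
    view.mkdir(parents=True)

    write_lock(view, [note])
    fresh = refresh_stale_flag(view)        # source unchanged since the view was built
    note.write_text("status: approved\n")   # edit the canonical file
    stale = refresh_stale_flag(view)        # the derived view is now marked stale
    flag_exists = (view / ".view.stale").exists()

print(fresh, stale, flag_exists)
```

Because only the source files are ever edited, the derived folder can be deleted and regenerated at any time, which is what makes it a cache rather than a system of record.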
Cost ceilings sketched per tier (e.g., Tier 1 caps at ~10K vault notes / 30K crosswalk edges; Tier 2 at ~250K notes / 5M edges; Tier 3 effectively unbounded). Marked design-time targets, not benchmarks.
The deliverable is technically thorough on the shortlist it considered, but the shortlist is incomplete in ways that matter — see §3.1.
§2.B Second research wave (Ch 11/12/13)
Six fresh-agent research deliverables, all 2026-05-02 (afternoon/evening):
2.4 Challenge 11 — Tier 2/3 engine deep survey (3 deliverables)
Three independent fresh-agent runs, each producing slightly different recommendations. Multi-agent convergence is itself load-bearing evidence.
| Deliverable | Distinguishing recommendation |
|---|---|
| Ch 11a — TerminusDB-as-Tier-3 emphasis | Keep DuckDB-WASM Tier 2; drop AGE; adopt TerminusDB as default Tier 3 (git-style branch/diff/merge). Includes a Grafeo follow-up (potential game-changer engine that could collapse the layered Tier 2 stack to one engine). |
| Ch 11b — Layered Tier 2 stack | Layer Tier 2: DuckDB-WASM + Oxigraph-WASM + Nemo-WASM (each lazy-loaded). AGE remains optional Tier 3. |
| Ch 11c — Layered + OSCAL/FedRAMP angle | Same layered Tier 2 as 11b. Tier 3 = AGE+Jena Fuseki + optional TerminusDB vault-mirror. Strategic insight: FedRAMP RFC-0024 mandates machine-readable authorisation packages by Sept 2026 → OSCAL native support is a 10× value-multiplier. |
Convergence summary (multi-agent evidence):
| Recommendation | 11a | 11b | 11c | Convergence |
|---|---|---|---|---|
| KuzuDB upstream dead, no fork stable | ✅ | ✅ | ✅ | 3-of-3 strong |
| DuckDB-WASM stays Tier 2 default | ✅ | ✅ | ✅ | 3-of-3 strong |
| Datalog (Nemo) for SSSOM derivation | ✅ | ✅ | ✅ | 3-of-3 strong |
| AGE alone is insufficient for Tier 3 (no RDF) | ✅ | ✅ | ✅ | 3-of-3 strong |
| Polars-WASM not viable today | ✅ | ✅ | ✅ | 3-of-3 strong |
| TerminusDB cannot be Tier 2 (no embedded WASM) | ✅ | ✅ | ✅ | 3-of-3 strong |
| Tier 2 layered stack (DuckDB + Oxigraph + Nemo) | ❌ (single-engine + Tier 3 swap) | ✅ | ✅ | 2-of-3 — 11a takes a different shape |
| Tier 3 default = AGE+Jena Fuseki | ❌ (TerminusDB-as-default) | ✅ | ✅ | 2-of-3 — split |
| TerminusDB as optional Tier 3 vault-mirror | ✅ (as primary) | ✅ (as alternative) | ✅ (as optional vault-mirror) | 3-of-3 strong (role varies) |
New engines surfaced during research (absent from original Ch 11 brief): Grafeo (potential game-changer — pure-Rust LPG+RDF+vector with WASM bindings, all major query languages, could collapse layered Tier 2 to single engine), Minigraf (embedded bi-temporal Datalog with WASM), CozoDB (Datalog+graph+vector embedded), SurrealDB-WASM (multi-model unified, BSL 1.1 license), Comunica + N3 + HDT (TS-native SPARQL meta-engine, ~200 KB gzipped vs Oxigraph’s ~3 MB), cr-sqlite (CRDT SQLite), simple-graph + sqlite-vec stack. All flagged for follow-on Challenge 14.
2.5 Challenge 12 — Datalog vs SQL for SSSOM chain rules (2 deliverables)
| Deliverable | Distinguishing recommendation |
|---|---|
| Ch 12a — Focused fork-in-the-road | Hybrid: rules expressed as Datalog DSL, executable via either Datalog engine (Nemo, browser/CLI) or compiled to SQL recursive CTEs (DuckDB-WASM/SQLite-WASM). OxO2 reference architecture validates Datalog at scale (1.16M mappings → 49.5K inferences, ~17 min, ~380 MB on a laptop). |
| Ch 12b — Beyond the known engine landscape | Long-horizon: LinkML as canonical schema substrate (auto-generates JSON Schema, OWL, SHACL, Pydantic, TypeScript from one YAML). IPLD content-addressed crosswalks. Tier 1.5 compilation pipeline producing Parquet/HDT/JSON-LD/OSCAL/IPLD-CAR. AI-augmented mapping per OAEI 2025/26 (LogMap-LLM, GenOM). Many engines absent from prior surveys. |
Convergence: Both deliverables converge on Datalog (Nemo) as primary derivation engine, OxO2 as the reference architecture. Both endorse the rules-as-data DSL pattern compiled to either Datalog or SQL. Deliverable B expands well beyond the Ch 12 brief into long-horizon architectural ideas — those land in §10.
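The rules-as-data pattern can be sketched as a rule table plus a generic evaluator. The evaluator below is a naive bottom-up fixpoint in plain Python, which is how Datalog semantics are usually explained; it is not Nemo, and the rule table is an illustrative assumption rather than a copy of the SSSOM chaining rules. Mapping IDs are made up.

```python
# Rules as data: (predicate of A->B, predicate of B->C) -> predicate derived for A->C.
CHAIN_RULES = {
    ("exactMatch", "exactMatch"): "exactMatch",
    ("exactMatch", "broadMatch"): "broadMatch",
    ("broadMatch", "exactMatch"): "broadMatch",
    ("broadMatch", "broadMatch"): "broadMatch",
}

def derive(facts, rules=CHAIN_RULES):
    """Apply every chain rule repeatedly until no new mapping appears (fixpoint)."""
    known = set(facts)
    while True:
        new = {
            (s1, rules[(p1, p2)], o2)
            for (s1, p1, o1) in known
            for (s2, p2, o2) in known
            if o1 == s2 and (p1, p2) in rules and s1 != o2
        } - known
        if not new:
            return known
        known |= new

asserted = {
    ("A:1", "exactMatch", "B:7"),
    ("B:7", "broadMatch", "C:3"),
    ("C:3", "exactMatch", "D:9"),
}
derived = derive(asserted) - asserted
print(sorted(derived))
# [('A:1', 'broadMatch', 'C:3'), ('A:1', 'broadMatch', 'D:9'), ('B:7', 'broadMatch', 'D:9')]
```

Because the rules live in a table rather than in code, the same table could in principle be handed to a Datalog engine or translated into a recursive CTE, which is the dual-target idea both deliverables endorse.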
2.6 Challenge 13 — Modern attestation primitives (confirms Ch 08, adds in-toto)
Full deliverable: Ch 13 — Modern attestation primitives. Brief: Challenge 13.
Verdict by primitive:
| Primitive | Decision | Why |
|---|---|---|
| Sigstore / gitsign / Rekor | Complement (configurable alternative for commit signing) | Cleanest path to SLSA L3; non-air-gap only; offer alongside GPG/SSH. |
| in-toto attestations | Complement (mandatory, Tier 1) | Replaces commit messages as authoritative review-chain record; backward-compatible with PDF Audit Authenticity Report. |
| SLSA targeting | Adopt as framing model | L1 for v0.1, L2 for v1.0, L3 for v2.0+ (via gitsign). |
| OpenTimestamps | Skip Tier 1; offer Tier 2 | Latency + auditor unfamiliarity outweigh marginal benefit for 7-year retention; permanence value re-emerges only for >25-year horizon. |
| W3C Verifiable Credentials | Skip near-term; track for v2.0+ | Standards-stable since May 2025 but not in audit toolchains today; viable for cross-vault federation later. |
| AWS QLDB | Drop entirely | Service ended 2025-07-31. |
| Azure Confidential Ledger | Skip Tier 1; document as Tier 3 | High cost (~$3/day per ledger), narrow incremental benefit, vendor-locked. |
| immudb | Skip Tier 1; document as Tier 3 | Useful for high-volume scale but not auditor-recognized as external party. |
| Challenge 08 stack | Confirm + extend with in-toto | Auditor-familiar floor; in-toto is the missing review/approval schema. |
Net architectural impact: Ch 13 adds exactly one mandatory primitive (in-toto) to Ch 08’s Tier 1 stack. Tier 1 minimum bar is now: signed commits + RFC 3161 TSA + S3 Object Lock WORM + FRE 902(13) PDF cert + in-toto attestations.
Critical assessment of the research waves
3.1 Challenge 10 has substantive gaps
The deliverable evaluates ~9 engines and makes the DuckDB+AGE call on the basis of that shortlist. Whole classes of relevant systems are not engaged with at all:
| Class | Specific systems missed | Why this matters for Crosswalker |
|---|---|---|
| Datalog engines | Soufflé, Nemo, Differential Datalog, Datomic, RDFox | Datalog is the native fit for SSSOM chain rules. Recursive Datalog with provenance is mathematically cleaner than recursive CTE for transitive crosswalk derivation. The OxO2 paper Crosswalker already cites uses Nemo (Datalog). Direct fork in the road that the deliverable doesn’t even pose |
| Production triple stores | Apache Jena Fuseki, GraphDB (Ontotext), Virtuoso, RDF4J, Stardog, AnzoGraph, Blazegraph | The project is RDF-flavored (SSSOM, SKOS, STRM) by design. Oxigraph is “the WASM one we found”; no comparative analysis against the mature RDF stack |
| Versioned graph databases | TerminusDB (Git-style branching/diff/merge over RDF) | Terminus’s versioned-graph-with-diff-and-merge model is uncannily aligned with Crosswalker’s “files canonical, derived stores rebuildable” ethos. Should arguably be a top contender; not mentioned at all |
| Other property graphs | Memgraph, NebulaGraph, ArangoDB, Dgraph, FalkorDB, OrientDB | The “Kuzu rejected → AGE accepted” leap skips half the relevant landscape |
| Embedded analytical | Polars-WASM (Tier 1.5 candidate), DataFusion (Apache), LanceDB, ClickHouse-local, Velox | Polars-WASM as a Tier 1.5 join/pivot layer without SQL is a real alternative the deliverable mentions as a renderer-side helper but doesn’t seriously evaluate as a primary engine |
| Vector + graph hybrids | Weaviate, Qdrant, Milvus, FalkorDB+vec | Becomes critical for AI-assisted schema matching (deferred future workstream). No architectural slot reserved |
| Streaming / incremental MV | Materialize, Differential Dataflow, Snowflake Dynamic Tables, ksqlDB | Deliverable cites Postgres MV/BQ/Redshift but skips state-of-the-art incremental view maintenance — directly relevant for the materialized-folder Tier 1 design |
| Virtual / federated | Ontop (SPARQL-over-relational), Trino, Dremio | Could turn “files → derived store” into “files → virtual SQL view” without materializing |
| Query unification | GraphQL gateway, Substrait | A unified query layer that abstracts the engine choice across tiers. Not mentioned at all |
Architectural questions left unasked:
- Datalog vs recursive CTE for the core SSSOM chain-rule derivation. The deliverable picks the harder, less-expressive option without justifying it
- TerminusDB’s versioned-graph model deserves first-class evaluation given Crosswalker’s ethos
- Polars-WASM as Tier 1.5 (without DuckDB) — bundle-size argument deserves deeper treatment
- GraphQL as a tier-agnostic query surface (compiles to SQL/Cypher/SPARQL per tier)
- CRDT layer for the deferred live-edit team mode (Yjs / Automerge / Loro)
- Concrete WASM bundle optimization strategies (tree-shaking, code-splitting, on-demand loading)
- LLM/NL-query architecture for AI-assisted features
- No real benchmarks against representative GRC data — explicitly out-of-scope per the brief, but means the choices are theoretical against an unmeasured workload
Empirical claims worth verifying before acting:
- “KuzuDB archived 10 October 2025” — load-bearing for the entire engine choice. Verify upstream + the “bighorn” community fork status
- “DuckDB-WASM ~3.2 MB compressed” — confirm against current build; bundle has grown over releases
- “DuckPGQ extension not yet WASM-friendly” — check current state
- Apache AGE PostgreSQL version compatibility window
3.2 Challenge 09 minor gaps (no new research session needed)
Challenge 09 is substantively complete. Minor flags for implementation only:
- SHA-256 vs SHA-3 for content addressing: sha256 fine, but if a downstream user mandates SHA-3 (some federal contexts), allow algorithm agility in the CID prefix
- UUIDv7 timestamp leakage: every UUIDv7 reveals creation millisecond — a side-channel in adversarial/forensic settings. Document explicitly
- UUIDv7-from-mtime migration entropy: the migration script derives UUIDs from file mtime, which has near-zero entropy. Flag that migrated UUIDs are predictable and unsuitable as security tokens
- Cross-vault federation protocol: punted to Phase 2 in the deliverable; the URN form urn:crosswalker:<vault-uuid>:<entity-uuid> is sketched but not analyzed against DID:web for vault identity or IPFS CIDs for content
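The timestamp-leakage flag above is easy to demonstrate: anyone holding a UUIDv7 can read its creation time from the top 48 bits. The sketch uses the UUIDv7 example value published in RFC 9562; `uuid7_created_at` is a hypothetical helper name.

```python
import uuid
from datetime import datetime, timezone

def uuid7_created_at(value: uuid.UUID) -> datetime:
    """Recover the creation time embedded in the top 48 bits of a UUIDv7."""
    if value.version != 7:
        raise ValueError("not a UUIDv7")
    unix_ms = value.int >> 80
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)

example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")  # example value from RFC 9562
created = uuid7_created_at(example)
print(created.isoformat())
# 2022-02-22T19:22:22+00:00
```

The same property cuts the other way for migrated notes: an ID derived from file mtime is predictable to anyone who can guess the mtime, which is why such IDs should not double as secrets.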
Action: lessons-learned annotation on the existing Challenge 09 brief; no new research session.
3.3 Challenge 08 — two real omissions
Challenge 08 is broadly complete but skips a category of modern attestation primitives that meaningfully change the design:
- Sigstore + in-toto attestations + SLSA framework — federated OIDC-backed signing (Sigstore alone could replace the entire “manage GPG/SSH signing keys” UX); in-toto is the standard for “this evidence was reviewed by X using process Y” attestations; SLSA frames the whole supply-chain integrity story. None of these are engaged with
- OpenTimestamps (Bitcoin-anchored, free, decentralized) — gets one passing mention in the failure-mode table but isn’t compared against RFC 3161 TSAs as an alternative or complement
- W3C Verifiable Credentials for the qualified-person certification — flexible alternative to the proposed PDF Audit Authenticity Report
- AWS QLDB / Azure Confidential Ledger rejected without deep analysis
Action: spin up a narrower follow-on research challenge focused specifically on these primitives — resolved by Challenge 13 (deliverable summarized in §2.6 above; critical read in §3.6 below).
3.4 Critical read of Ch 11
Convergence is strong across the three independent runs:
- 3-of-3 rejected KuzuDB upstream and all four forks (Bighorn, Ladybug, RyuGraph, Vela) as not stable enough for a multi-year compliance tool
- 3-of-3 kept DuckDB-WASM as Tier 2 default
- 3-of-3 committed to Datalog (Nemo, OxO2 architecture) for SSSOM derivation
- 3-of-3 found AGE alone insufficient at Tier 3 (no RDF; SSSOM/SKOS/STRM are RDF-native)
- 3-of-3 rejected Polars-WASM as Tier 1.5 (Pyodide-only path, alpha)
Divergence on the layered Tier 2 stack — 11b and 11c explicitly recommend layered Tier 2 (DuckDB-WASM + Oxigraph-WASM + Nemo-WASM lazy-loaded). 11a takes a different shape: keep DuckDB single-engine at Tier 2 and replace AGE with TerminusDB at Tier 3. Both architectures are coherent. Decision pending user input — see §5.B.
Divergence on Tier 3 default:
- 11a: TerminusDB as default (versioning is the load-bearing requirement)
- 11b/11c: AGE+Jena Fuseki as default; TerminusDB as optional vault-mirror
New engines surfaced — Grafeo (most consequential potential game-changer), Minigraf, CozoDB, SurrealDB-WASM, Comunica + N3 + HDT, cr-sqlite, sqlite-vec stack. Spin up Challenge 14.
FedRAMP RFC-0024 (mandates machine-readable authorisation packages by Sept 2026) flagged by 11c as a 10× value-multiplier for Crosswalker’s federal market — argues for elevating OSCAL native support from feature to architectural concern.
3.5 Critical read of Ch 12
Strong convergence across 12a and 12b plus all three Ch 11 deliverables: Datalog (Nemo) is the right derivation engine, OxO2 is the reference architecture, rules-as-data DSL is the architectural pattern.
Ch 12b expansion beyond brief: deliverable B explicitly went beyond the narrow Datalog-vs-SQL question and proposed major architectural ideas not previously on the table:
- LinkML as canonical schema substrate — auto-generates JSON Schema, OWL, SHACL, Pydantic, TypeScript from one YAML; SSSOM is itself defined in LinkML; could become Crosswalker’s “Tier 0”
- IPLD content-addressed crosswalks — every SSSOM bundle hashes to a CID; distribution via Merkle-DAG; signed with W3C VCs
- Tier 1.5 compilation pipeline — Markdown+SSSOM TSV → Parquet/HDT/JSON-LD/OSCAL/IPLD-CAR multi-target artifact compiler
- AI-augmented mapping — LogMap-LLM, GenOM, BERTMap; OAEI 2025/2026 results validate hybrid LLM approach
These are major architectural pivots, not adopted today. Catalogued in §10 (long-horizon ideas considered, not committed).
3.6 Critical read of Ch 13
Highest confidence of the second wave. Ch 13 directly confirms Ch 08’s stack with one explicit addition (in-toto attestations as mandatory Tier 1) and several explicit non-adoptions (skip OpenTimestamps for Tier 1, drop QLDB entirely because the service has ended, skip Azure Confidential Ledger, immudb, and W3C VCs at Tier 1).
Single new commitment vector: in-toto attestations as the mandatory schema for review/approval evidence. Custom Crosswalker predicate type https://crosswalker.dev/predicates/evidence-review/v1 plus reuse of SLSA Provenance v1.
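A review attestation of this kind might look roughly as follows. The outer shape (`_type`, `subject`, `predicateType`, `predicate`) follows the in-toto Statement v1 layout and the predicate type URL is the one named above; the fields inside `predicate`, the file path, the placeholder ORCID, and the helper name are assumptions, since the predicate schema is not defined in this log. Signing (normally a DSSE envelope around the statement) is left out.

```python
import hashlib
import json

STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
PREDICATE_TYPE = "https://crosswalker.dev/predicates/evidence-review/v1"

def evidence_review_statement(note_path, note_bytes, reviewer, decision):
    """Build an unsigned in-toto Statement binding a review decision to a note's digest."""
    return {
        "_type": STATEMENT_TYPE,
        "subject": [
            {
                "name": note_path,
                "digest": {"sha256": hashlib.sha256(note_bytes).hexdigest()},
            }
        ],
        "predicateType": PREDICATE_TYPE,
        # Hypothetical predicate body; the real schema is still to be designed.
        "predicate": {"reviewer": reviewer, "decision": decision},
    }

note = b"---\nuid: 017f22e2-79b0-7cc3-98c4-dc0c0c07398f\n---\nControl A is satisfied by evidence 1.\n"
statement = evidence_review_statement(
    "junctions/control-a--evidence-1.md",   # made-up path
    note,
    "orcid:0000-0000-0000-0000",            # placeholder reviewer CURIE
    "approved",
)
print(json.dumps(statement, indent=2))
```

Binding the decision to the note's digest, rather than to a commit message, is what lets the attestation survive independently of how the repository history is later presented.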
No surprises, no contradictions, no major divergences from Ch 08. Lift to commit in §5.B.
Research challenges status
4.1 Challenge 11 — Tier 2/3 engine deep survey
✅ RESOLVED by 3 fresh-agent deliverables (one of the strongest convergence signals in the project so far). Summarized in §2.4; critical read in §3.4.
- Ch 11a — TerminusDB-as-Tier-3 emphasis
- Ch 11b — Layered Tier 2 stack (DuckDB + Oxigraph + Nemo)
- Ch 11c — Layered + OSCAL/FedRAMP angle
Net direction: layered Tier 2 stack (2-of-3 explicit + alignment from 11a’s Grafeo follow-up); Tier 3 default open between AGE+Jena and TerminusDB-as-primary; new engines surfaced → Challenge 14.
4.2 Challenge 12 — Datalog vs SQL for SSSOM chain-rule derivation
✅ RESOLVED by 2 fresh-agent deliverables. Strong convergence on Datalog (Nemo) primary + OxO2 reference architecture. Summarized in §2.5; critical read in §3.5.
- Ch 12a — focused fork-in-the-road analysis; rules-as-data DSL
- Ch 12b — long-horizon expansion (LinkML, IPLD, Tier 1.5 compilation, AI-augmented mapping); long-horizon ideas catalogued in §10
4.3 Challenge 13 — Modern attestation primitives
✅ RESOLVED by 1 fresh-agent deliverable. Confirms Ch 08 Tier 1 stack; adds in-toto attestations as mandatory; offers Sigstore/gitsign as configurable alternative; SLSA L1→L2→L3 progression. Summarized in §2.6; critical read in §3.6.
- Ch 13 — modern attestation primitives
4.4 Challenge 14 — Missed engines evaluation
✅ RESOLVED by 1 fresh-agent deliverable later on 2026-05-02 (third wave). Verdict: keep Ch 11 layered Tier 2 stack as production; add Tier 2-Lite (sqlite-wasm + sqlite-vec + simple-graph + recursive-CTE) and Comunica + N3 + HDT federation as additive extensions; track Grafeo and Minigraf with explicit, falsifiable migration triggers; reject SurrealDB (BSL + 12.6 MB bundle), cr-sqlite (stalled Oct 2024), CozoDB (no release since v0.7 in 2023). Synthesized in third-wave log §2.
- Ch 14 deliverable — missed engines evaluation
- Archived Challenge 14 brief
4.5 Challenge 15 — Audit-trail alternatives without external git tooling
✅ RESOLVED by 1 fresh-agent deliverable later on 2026-05-02 (third wave). Verdict: adopt 4-tier model (T0/T1/T2/T3) with OpenTimestamps .ots on signed chain checkpoints as new T2 default; reposition the Ch 08+13 git+RFC3161+S3-Object-Lock+FRE 902 stack as one of three T3 options (others: eIDAS QTSA + W3C VC for EU; Sigstore Rekor v2 + in-toto for supply-chain). Crypto-agile PQC migration plan 2026→2032 ahead of NIST IR 8547 2035 deadline. Synthesized in third-wave log §4.
- Ch 15 deliverable — non-git audit-trail alternatives
- Archived Challenge 15 brief
4.6 Challenge 16 — Tier 3 stack reconsideration
✅ RESOLVED by 1 fresh-agent deliverable later on 2026-05-02 (third wave). Verdict: demote Apache AGE from default to optional fallback; promote Apache Jena Fuseki as new Tier 3 default; document oxigraph-server as the lighter same-API alternative (architectural symmetry: same engine as Tier 2, just oxigraph serve); document layered Fuseki + DuckDB-on-server as power-user upgrade path; TerminusDB v12 as opt-in vault-mirror with small-vendor (DFRNT) risk explicitly flagged. Synthesized in third-wave log §3.
- Ch 16 deliverable — Tier 3 stack reconsideration
- Archived Challenge 16 brief
4.7 (informational) Possible future challenges identified
Not spun up; flagged for user signal:
- Ch 17 candidate — LinkML adoption as canonical schema substrate (per Ch 12b §2.3)
- Ch 18 candidate — Tier 2-Lite SSSOM rule subset and scale ceiling (per Ch 14 §2.7) — most actionable follow-on
- Ch 19 candidate — PQC dual-sign protocol detail (per Ch 15 §5.6) — defer toward 2027
- Ch 20 candidate — eIDAS 2.0 / EUDI Wallet integration profile (per Ch 15 §3.1) — defer to 2027+
- (older) IPLD content-addressed crosswalk distribution (per Ch 12b §4.2); reactive/incremental computation for derived crosswalks (Feldera/DBSP/Materialize)
Direction posture
The user signal: lift convergent items in this log; embed inline checkpoint questions for explicit yes/no per item; do follow-on research where convergence is partial.
Four buckets:
5.A Confirmed commitments (locked in this log)
Items with strong convergence and no objections expected; lock in immediately.
| Topic | Source |
|---|---|
| Identifier strategy — UUIDv7 + sha256 CIDs + CURIEs + ORCID for SSSOM authors | Ch 09 deliverable; single deliverable, substantively complete |
| KuzuDB and forks: do NOT adopt as Tier 2 primary (no fork has 12+ months stability) | 3-of-3 Ch 11 convergence |
| Polars-WASM: NOT viable as Tier 1.5 today (alpha, Pyodide-only path) | 3-of-3 Ch 11 convergence; reaffirmed by Ch 12b |
| AWS QLDB: drop entirely (service ended 2025-07-31) | Ch 13 |
| Pairwise + optional pivot crosswalk architecture | 05-01 §2.1 commitment stays |
| Junction-note evidence-link form factor | 04-10 synthesis + 05-01 §2.2 commitment stays |
| OxO2 reference architecture for SSSOM chain-rule derivation | 3-of-3 Ch 11 + 2-of-2 Ch 12 |
5.B Convergent commitments — pending user sign-off
Strong convergence; flagged for explicit yes/no per item before locking. Each row ends with a checkpoint question.
| Topic | Convergence | Checkpoint |
|---|---|---|
| Tier 2 layered stack: DuckDB-WASM + Oxigraph-WASM + Nemo-WASM (all lazy-loaded). Total compressed ~10 MB worst case; under 100 KB plugin shell + ~3 MB on first analytical query. | 2-of-3 Ch 11 (11a takes a different shape — single-engine + TerminusDB-as-Tier-3); reaffirmed by Ch 12 + Ch 14 brief | [Confirm? Y/N] Adopt layered Tier 2 (DuckDB + Oxigraph + Nemo) over single-engine Tier 2? |
| Tier 1 audit-trail bar: signed commits + RFC 3161 TSA + S3 Object Lock WORM mirror + FRE 902(13) PDF cert + in-toto attestations (mandatory). | Total convergence (Ch 13 confirms Ch 08 + adds in-toto only) | [Confirm? Y/N] Adopt the 5-layer Tier 1 audit-trail bar including in-toto? |
| Datalog (Nemo) for SSSOM chain-rule derivation per OxO2 architecture (build-pipeline derivation, not live query). DuckDB recursive CTE remains for ad-hoc queries over already-derived facts. | Total convergence (3-of-3 Ch 11 + 2-of-2 Ch 12) | [Confirm? Y/N] Lock in Nemo-as-derivation-engine, OxO2-as-reference-architecture? |
| Sigstore/gitsign as configurable alternative for commit signing (path to SLSA L3); GPG/SSH remains default. | Ch 13 explicit | [Confirm? Y/N] Offer Sigstore/gitsign as alternative-not-replacement? |
| SLSA targeting: L1 for v0.1, L2 for v1.0, L3 for v2.0+ via gitsign. | Ch 13 explicit | [Confirm? Y/N] Adopt SLSA progression (L1→L2→L3 per version)? |
| Tier 1 materialized-folder generator for Bases-compatible pre-joined views (Ch 10 §2 design). | Stays sound across Ch 11 deliverables; concept doesn’t change with engine choice | [Confirm? Y/N] Lock in materialized-folder-generator as Tier 1 pattern? |
5.C Partial convergence — need user input
Items where multiple acceptable answers emerged from the research. User input needed.
| Topic | Convergence pattern | User input needed |
|---|---|---|
| Tier 3 default stack: AGE+Jena Fuseki (Ch 11b, 11c) vs TerminusDB-as-primary (Ch 11a) | 2-of-3 favors AGE+Jena; 1-of-3 favors TerminusDB | Pick: AGE+Jena Fuseki primary, TerminusDB optional vault-mirror? OR TerminusDB primary, AGE fallback? |
| TerminusDB role: vault-mirror only / Tier 3 alternative / Tier 3 default? | 3-of-3 mention TerminusDB; role varies | Pick role for TerminusDB. |
| LinkML as canonical schema substrate (Ch 12b major architectural pivot) | 1-of-2 Ch 12 deliverables; substantial implications | Adopt LinkML as Tier 0 schema layer (auto-generates JSON Schema/OWL/SHACL/Pydantic/TypeScript)? Or stick with bespoke schemas? Or spin up Challenge 15 to evaluate? |
| OSCAL native support priority (Ch 11c flag: FedRAMP RFC-0024 mandates by Sept 2026) | 1-of-3 Ch 11 explicitly flagged; demand-side validation event | Promote OSCAL import/export from feature to architectural concern? Add roadmap item? |
| Grafeo evaluation timing (Ch 11a follow-up + Ch 14 brief) | Identified as potential game-changer; Ch 14 will evaluate | Wait for Ch 14 deliverable, OR also evaluate now via narrower in-session research? |
5.D Identified for follow-on research
Items where the convergence pointed at a gap. New research challenges spun up:
- Challenge 14 (created today): missed engines (Grafeo, Minigraf, CozoDB, SurrealDB-WASM, Comunica, cr-sqlite, sqlite-vec). Brief published.
- Possible Challenge 15 (await user signal): LinkML adoption as canonical schema substrate.
- Possible Challenge 16 (await user signal): IPLD content-addressed crosswalk distribution.
- Possible Challenge 17 (await user signal): reactive/incremental computation for derived crosswalks.
Phase plan refresh
The two research waves of 2026-05-02 collapsed Phases B and C into the same calendar day:
- Phase A — Today’s housekeeping (✅ done)
  - Adopted the Challenge 09 identifier strategy — locked in §5.A
  - Spun up Ch 11, Ch 12, Ch 13 (earlier today) and Ch 14 (this log)
  - StewardshipProfile rename ripples — listed in §8 (still not yet applied; awaits user)
  - Roadmap deltas — listed in §7 (still not yet applied; awaits user)
- Phase B — Run the second research wave (✅ done, today)
  - Ch 11 produced 3 independent fresh-agent runs
  - Ch 12 produced 2 deliverables (one focused, one long-horizon)
  - Ch 13 produced 1 deliverable
- Phase C — Synthesize (✅ in flight, this log update)
  - This log captures convergences (§5.A and §5.B) and divergences (§5.C)
  - Key open question: Tier 3 stack default (AGE+Jena vs TerminusDB-as-primary)
- Phase B′ — Run the third research wave
  - Hand Challenge 14 to a fresh agent (engines that surfaced during Ch 11)
  - Optionally spin up Challenges 15/16/17 if the user signals interest in LinkML / IPLD / reactive computation (§5.D)
- Phase D — Implementation begins (gated on Phase B′ + §5.B sign-off)
  - Tier 1 materialized-folder generator (committed in §5.B pending sign-off)
  - Tier 1 audit-trail hardening: signed commits + TSA + WORM + cert export + in-toto (committed in §5.B pending sign-off)
  - Tier 2 engine integration: layered stack (DuckDB-WASM + Oxigraph-WASM + Nemo-WASM) — committed in §5.B pending sign-off and contingent on Ch 14 not surfacing a single-engine collapse
  - Tier 3 engine integration: gated on §5.C user input
  - Schemas (StewardshipProfile, junction-note, FrameworkConfig v2, _crosswalker v2) — implementation following the meta-schema versioning policy once that policy is concretized
- Phase E — Out of scope for this Foundation cycle
  - Marketplace mechanics
  - Cross-vault federation (Phase 2)
  - LLM/AI-assisted features (informed by Ch 11 §2.6 + Ch 12b § AI-augmented mapping; defer concrete commitment)
  - Multi-user collaboration / CRDT layer (informed by Yjs/Loro/Automerge/cr-sqlite analyses across Ch 11 deliverables)
The big change vs yesterday’s plan: Phases B and C collapsed into one calendar day because the user ran multiple fresh-agent sessions in parallel. The new gate before Phase D is (a) §5.B explicit sign-off from the user, (b) Ch 14 deliverable on missed engines, (c) §5.C resolution of partial-convergence items.
Proposed roadmap deltas
Concrete edits to land in docs/src/content/docs/reference/roadmap/index.mdx — listed here for review; not yet applied. Edits are bucketed by section of the roadmap.
A. Foundation — “Get the architecture right” — existing items to update
A1. “Pairwise crosswalks vs synthetic spine architecture” (currently the biggest open Foundation question)
- ✅ Mark as resolved by 05-01 §2.1 deferred-pivot hybrid commitment
- Update item description to summarize the commitment: pairwise primary, optional inheritable pivot (default SCF), SSSOM-on-markdown persistence, derived-mappings-computed-not-stored
- Update internal challenge link from /zz-challenges/06-synthetic-spine to /zz-challenges/archive/06-synthetic-spine (already done in 05-01 commit; verify still correct)
A2. “EvolutionPattern vs transformation recipes”
- Rename to: “StewardshipProfile (formerly EvolutionPattern) vs transformation recipes”
- Update body text: every “EvolutionPattern” → “StewardshipProfile” with first-mention “(formerly EvolutionPattern)” parenthetical
- Add link to 05-01 §3.2 rename
A3. “Evidence-framework edge model (the other edge type)” (currently marked “Direction committed”)
- Add a sentence: Tier 1 audit-trail hardening (TSA + WORM + cert-export per Challenge 08) is deferred pending Challenge 13 — modern attestation primitives.
- Rest of the existing item (junction notes, 13-field schema, OSCAL by-component, etc.) stands
A4. “Crosswalk edge semantics commitment (STRM + SSSOM)”
- Mark as ✅ committed direction per 05-01 §2.1 and the Challenge 06 deliverable
- No body changes; status marker only
A5. “Progressive tier architecture (pillar)”
- Add: Tier 2 engine choice deferred pending Challenge 11; Datalog vs SQL fork for derivation deferred pending Challenge 12.
- Rename “Tier 2 (sql.js sidecar)” descriptor to “Tier 2 (embedded analytical engine, choice TBD per Ch 11)”
- Existing 3-tier framing stands
A6. “Obsidian Bases direction research”
- No change — still active research; coordinates with Challenge 11 but doesn’t get superseded
B. Foundation — new items to add
B1. NEW: “Identifier strategy (UUIDv7 + sha256 CIDs + CURIEs)” ✅ committed
- Per Challenge 09 deliverable: UUIDv7 default, sha256 multibase CIDs for content-addressed (spine snapshots, schema releases), CURIEs for external references (controls, frameworks, ORCIDs)
- “CWUUID” is display convention only, not a new algebra
- Filename rule: human-readable + --cwunder 6-hex> suffix on collision-prone classes
- Six-class minimum viable Foundation set (vault, ontology web, ontology node, junction note, spine snapshot CID, SSSOM author CURIE)
- OSCAL round-trip: preserve incoming @uuid verbatim; mint UUIDv7 only for new entities
- Links to deliverable
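The three identifier families above can be sketched in a few lines. This is a minimal illustration, not the Challenge 09 deliverable's implementation: it assumes that "sha256 multibase CID" means a CIDv1 with the raw codec, a sha2-256 multihash, and base32 multibase encoding, and the `PREFIXES` map and all function names are hypothetical.

```python
import base64
import hashlib
import os
import time
import uuid


def uuid7(now_ms=None):
    """Hand-rolled UUIDv7 (RFC 9562 layout): 48-bit ms timestamp, then random bits."""
    ts = (time.time_ns() // 1_000_000) if now_ms is None else now_ms
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits; 74 are used
    rand_a = (rand >> 62) & 0xFFF                  # 12 bits after the version nibble
    rand_b = rand & ((1 << 62) - 1)                # 62 bits after the variant bits
    value = (
        ((ts & ((1 << 48) - 1)) << 80)
        | (0x7 << 76)      # version 7
        | (rand_a << 64)
        | (0b10 << 62)     # RFC 4122/9562 variant
        | rand_b
    )
    return uuid.UUID(int=value)


def sha256_cid(content: bytes) -> str:
    """Assumed CID shape: CIDv1 + raw codec + sha2-256 multihash, base32 multibase."""
    digest = hashlib.sha256(content).digest()
    multihash = bytes([0x12, 0x20]) + digest      # 0x12 = sha2-256, 0x20 = 32-byte digest
    cid_bytes = bytes([0x01, 0x55]) + multihash   # 0x01 = CIDv1, 0x55 = raw codec
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32                              # 'b' = multibase prefix for base32


# Hypothetical prefix map; a real vault would load this from configuration.
PREFIXES = {"orcid": "https://orcid.org/"}


def expand_curie(curie: str, prefixes=PREFIXES) -> str:
    """Expand prefix:local into a full IRI using the prefix map."""
    prefix, _, local = curie.partition(":")
    return prefixes[prefix] + local
```

For the OSCAL round-trip rule, `uuid7()` would only be called for entities that arrive without an identifier; an incoming `@uuid` is stored as-is.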
B2. NEW: “Tier 2/3 engine deep survey” — research item
- Links to Challenge 11
- Required to complete before Tier 2 implementation can start
B3. NEW: “Datalog vs SQL for SSSOM chain-rule derivation” — research item
- Links to Challenge 12
- Coordinates with Challenge 11 (Ch 11 surveys the engines; Ch 12 settles the Datalog-vs-SQL paradigm for derivation)
B4. NEW: “Modern attestation primitives evaluation (Sigstore, in-toto, SLSA, OpenTimestamps, VCs)” — research item
- Links to Challenge 13
- Required before the Tier 1 audit-trail bar from Challenge 08 can be locked in
B5. NEW: “Crosswalker-internal schema versioning and migration policy”
- Per 05-01 §2.5 dog-food commitment: every Crosswalker-internal schema (StewardshipProfile, junction-note 13-field schema, FrameworkConfig, _crosswalker metadata, pivot snapshot manifest, lifecycle change record, SSSOM crosswalk record) is versioned and migration-aware
- Versioning convention TBD (semver vs content-addressed vs release-tag-aligned)
- Direction log next pass to commit the convention
B6. NEW: “Tier 2 layered engine stack (DuckDB-WASM + Oxigraph-WASM + Nemo-WASM)” (PENDING §5.B sign-off)
- Per Ch 11 deliverables b+c convergence — DuckDB-WASM for tabular SQL; Oxigraph-WASM for SPARQL/SKOS; Nemo-WASM for SSSOM chain-rule derivation
- Bundle target: under 100 KB plugin shell + ~3 MB on first analytical query + lazy-load Oxigraph (~3 MB) and Nemo (~3–4 MB) on demand
- Alternative collapse-to-single-engine pending Ch 14 evaluation of Grafeo
B7. NEW: “SSSOM chain-rule derivation engine (Nemo, OxO2 architecture)” (PENDING §5.B sign-off)
- Per Ch 12 convergence — Nemo (Rust → WASM) as the canonical Datalog derivation engine
- OxO2 reference architecture: Markdown vault → SSSOM facts → Nemo (Datalog with chain rules) → derived facts (with provenance) → DuckDB Parquet shard / TerminusDB graph commit
- Rules expressed as data (Datalog DSL), not code; compilable to either Nemo or SQL recursive CTE per deployment
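As a rough illustration of the "rules as data, compilable to SQL recursive CTE" option, the sketch below stores a small predicate-composition table and derives chained mappings with SQLite. The composition table is an illustrative assumption, not the full SSSOM chaining rules, and the table, column, and function names are hypothetical. It tracks only hop count, not the richer provenance the OxO2 pipeline calls for.

```python
import sqlite3

# Rules as data: (left predicate, right predicate) -> derived predicate.
# Illustrative subset only; an assumption, not the complete SSSOM chain rules.
COMPOSE = [
    ("skos:exactMatch", "skos:exactMatch", "skos:exactMatch"),
    ("skos:exactMatch", "skos:broadMatch", "skos:broadMatch"),
    ("skos:broadMatch", "skos:exactMatch", "skos:broadMatch"),
    ("skos:broadMatch", "skos:broadMatch", "skos:broadMatch"),
]

DERIVE_SQL = """
WITH RECURSIVE derived(subject_id, predicate_id, object_id, hops) AS (
    SELECT subject_id, predicate_id, object_id, 1 FROM mappings
    UNION
    SELECT d.subject_id, c.result, m.object_id, d.hops + 1
    FROM derived AS d
    JOIN mappings AS m ON m.subject_id = d.object_id
    JOIN compose AS c ON c.lhs = d.predicate_id AND c.rhs = m.predicate_id
    WHERE d.hops < :max_hops
)
SELECT DISTINCT subject_id, predicate_id, object_id FROM derived WHERE hops > 1
"""


def derive(mappings, max_hops=4):
    """Return the set of (subject, predicate, object) triples derived by chaining."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mappings (subject_id TEXT, predicate_id TEXT, object_id TEXT)")
    con.execute("CREATE TABLE compose (lhs TEXT, rhs TEXT, result TEXT)")
    con.executemany("INSERT INTO mappings VALUES (?, ?, ?)", mappings)
    con.executemany("INSERT INTO compose VALUES (?, ?, ?)", COMPOSE)
    rows = con.execute(DERIVE_SQL, {"max_hops": max_hops}).fetchall()
    con.close()
    return set(rows)


asserted = [
    ("ctrl:A-1", "skos:exactMatch", "ctrl:B-1"),
    ("ctrl:B-1", "skos:broadMatch", "ctrl:C-1"),
]
derived = derive(asserted)
```

Because derived rows are computed on demand from the asserted rows and the rule table, nothing derived needs to be stored in the vault, which matches the derived-mappings-computed-not-stored commitment. The `max_hops` bound is a simple guard against cycles in the mapping graph.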
B8. NEW: “Tier 1 audit-trail hardening (Ch 08 + in-toto)” (PENDING §5.B sign-off)
- Per Ch 08 + Ch 13 convergence — five-layer Tier 1 stack:
  - Signed commits (GPG/SSH default; gitsign as configurable alternative)
  - RFC 3161 trusted timestamps on every commit
  - S3 Object Lock WORM mirror
  - FRE 902(13) qualified-person certification PDF
  - in-toto attestations (mandatory) for review/approval evidence — custom predicate type https://crosswalker.dev/predicates/evidence-review/v1
- Tier 2: Rekor cross-publication when gitsign in use; OpenTimestamps as parallel .ots proof for high-retention vaults
- Tier 3 (optional): Azure Confidential Ledger or immudb for high-volume installations
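To make the in-toto layer concrete, here is a minimal sketch of an unsigned in-toto Statement (v1) carrying the custom predicate type named above. The predicate body fields (`reviewer`, `decision`) and the function name are hypothetical; the actual evidence-review predicate schema has not been defined in this log.

```python
import hashlib
import json

PREDICATE_TYPE = "https://crosswalker.dev/predicates/evidence-review/v1"


def evidence_review_statement(path: str, content: bytes, reviewer: str, decision: str) -> dict:
    """Build an unsigned in-toto v1 Statement about one evidence file."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": path, "digest": {"sha256": hashlib.sha256(content).hexdigest()}}
        ],
        "predicateType": PREDICATE_TYPE,
        # Hypothetical predicate fields, for illustration only.
        "predicate": {"reviewer": reviewer, "decision": decision},
    }


statement = evidence_review_statement(
    "evidence/ac-2-review.md", b"reviewed control AC-2", "orcid:0000-0002-1825-0097", "approved"
)
serialized = json.dumps(statement, sort_keys=True)
```

In practice the statement would be wrapped in a DSSE envelope and signed with whichever signing path the vault uses (GPG/SSH or gitsign); that step is omitted here.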
B9. NEW: “Sigstore/gitsign as configurable alternative; SLSA targeting (L1→L2→L3)” (PENDING §5.B sign-off)
- Per Ch 13 — Sigstore (Fulcio/Rekor/gitsign) as alternative-not-replacement for GPG/SSH
- SLSA Build target: L1 v0.1, L2 v1.0, L3 v2.0+ (gitsign + Fulcio + ephemeral keys)
B10. NEW: “Tier 3 engine stack” (PENDING §5.C user input)
- Per Ch 11 partial convergence — Tier 3 default is split between AGE+Jena Fuseki (deliverables b+c) and TerminusDB-as-primary (deliverable a)
- One row blocked on user signal; not yet a roadmap delta
B11. NEW: “OSCAL native support (FedRAMP RFC-0024 demand-side validation)” (PENDING §5.C user input)
- Per Ch 11c — FedRAMP RFC-0024 mandates machine-readable authorisation packages by Sept 2026; OSCAL native import/export becomes a 10× value-multiplier for Crosswalker’s federal market
- Could promote OSCAL from “feature” to “architectural concern”; not yet a roadmap delta
C. Decision log (bottom of roadmap) — append entries
The current “Decision log” section at the bottom of the roadmap lists 04-03 through 04-09 entries. Append:
- Foundation state of play (orientation, 05-01 AM) — web-of-webs framing, six open questions
- Foundation commitments + follow-on research (decisions, 05-01 PM) — five commitments, three new challenges, StewardshipProfile rename, meta-schema commitment
- Direction — research wave + roadmap reshape (this log, 05-02) — the present log
D. Inventory
Total proposed roadmap edits:
- 6 existing items modified (A1–A6)
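- 5 new items added (B1–B5); 4 more pending §5.B sign-off (B6–B9); 2 flagged pending §5.C user input (B10–B11, not yet roadmap deltas)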
- 3 decision-log entries appended (C)
These are deltas to the active Foundation section only. The Formats / Crosswalks / Evolution / Community sections of the roadmap are not touched in this pass.
StewardshipProfile rename ripples (proposed edits)
A grep across docs/src/content/docs finds 27 files containing “EvolutionPattern” (last counted 2026-05-02). The rename strategy preserves history but updates the canonical present:
E1. Concept pages (canonical present — full update)
These pages define the term. On first mention, replace “EvolutionPattern” with “StewardshipProfile (formerly EvolutionPattern)”; use “StewardshipProfile” thereafter:
- concepts/terminology.mdx (3 mentions) — entry header
- concepts/ontology-evolution.mdx (3 mentions)
- concepts/ontology-lifecycle.mdx (2 mentions)
- concepts/institutional-landscape.mdx (2 mentions)
- concepts/operational-landscape.mdx (4 mentions)
- concepts/what-makes-crosswalker-unique.mdx (8 mentions — most-affected concept page)
E2. Reference pages — full update (registry + roadmap)
- reference/registry/cis.mdx (1)
- reference/registry/mitre.mdx (1)
- reference/registry/nist.mdx (1)
- reference/registry/oscal.mdx (1)
- reference/roadmap/index.mdx (3) — covered by §7 A2 above; same rename pass
E3. Active research challenges — full update
- agent-context/zz-challenges/03-competitive-landscape.mdx (1)
- agent-context/zz-challenges/04-first-principles-audit.mdx (1)
- agent-context/zz-challenges/05-transformation-problem.mdx (1)
E4. Historical log entries — top-of-file rename callout, body preserved
These logs were written before the rename. Add a single :::tip callout at the top of each linking to 05-01 §3.2; leave body alone (preserves history). The most-prominent callout goes on the original EvolutionPattern taxonomy draft.
- agent-context/zz-log/2026-04-03-evolution-pattern-taxonomy-draft.mdx (4) — prominent rename callout
- agent-context/zz-log/2026-04-03-deep-research-synthesis.mdx (1)
- agent-context/zz-log/2026-04-03-distribution-architecture-research.mdx (1)
- agent-context/zz-log/2026-04-03-layered-architecture-vision.mdx (2)
- agent-context/zz-log/2026-04-03-vision-alignment-decisions.mdx (2)
- agent-context/zz-log/2026-04-04-volatility-and-registry.mdx (1)
- agent-context/zz-log/2026-04-08-ontology-evolution-first-principles.mdx (5)
- agent-context/zz-log/2026-04-09-primitives-depth-and-pluggable-layers.mdx (1)
- agent-context/zz-log/2026-04-09-user-first-ontology-maintenance.mdx (2)
- agent-context/zz-log/2026-04-10-foundation-research-synthesis.mdx (12) — most-mentioned log; prominent callout
E5. Today’s logs — already use “StewardshipProfile (formerly EvolutionPattern)” pattern; no edit
- agent-context/zz-log/2026-05-01-foundation-state-of-play.mdx (6) — already updated
- agent-context/zz-log/2026-05-01-foundation-commitments-and-followon-research.mdx (12) — the rename log itself
- agent-context/zz-log/2026-05-02-direction-research-wave-and-roadmap-reshape.mdx (3) — this log; references the rename in §3.2 link only
Rename mechanics
- Categories E1–E3 (canonical present, 14 files): surgical edits replacing “EvolutionPattern” → “StewardshipProfile (formerly EvolutionPattern)” on first mention per file, “StewardshipProfile” thereafter
- Category E4 (historical, 10 files): a single inserted :::tip[Renamed: EvolutionPattern → StewardshipProfile] callout at the top of each, ~3 lines. Body unchanged
- Total = ~24 file edits in the rename pass
The rename pass should be its own commit, separate from this log’s commit, so the diff is reviewable.
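The first-mention rule for categories E1–E3 is mechanical enough to sketch. This is an illustrative helper, not a script that exists in the repository; the function name is hypothetical, and it deliberately skips any file that already contains the parenthetical so that re-running it is harmless.

```python
OLD = "EvolutionPattern"
NEW = "StewardshipProfile"
FIRST_MENTION = f"{NEW} (formerly {OLD})"


def rename_mentions(text: str) -> str:
    """First mention gets the '(formerly ...)' parenthetical; later mentions get NEW."""
    if FIRST_MENTION in text:
        # Already processed (or hand-edited); leave it for manual review.
        return text
    head, found, tail = text.partition(OLD)
    if not found:
        return text  # no mention in this file
    return head + FIRST_MENTION + tail.replace(OLD, NEW)


before = "EvolutionPattern classifies sources. Each EvolutionPattern has a cadence."
after = rename_mentions(before)
```

Plurals, headings, and identifiers in code samples still deserve a manual look in the diff, which is one more reason to keep the rename pass in its own commit.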
What’s still deferred
The deferred list is shorter now that Ch 11/12/13 resolved. Remaining items:
| Deferred topic | Gates on |
|---|---|
| Tier 3 default stack (AGE+Jena vs TerminusDB-as-primary) | §5.C user input |
| TerminusDB role (vault-mirror only / Tier 3 alternative / primary) | §5.C user input |
| OSCAL native support architectural priority | §5.C user input |
| Layered Tier 2 vs single-engine collapse | Challenge 14 deliverable (Grafeo viability) |
| Bi-temporal Datalog (Minigraf) for SSSOM | Challenge 14 |
| LinkML adoption as canonical schema substrate | Possible Challenge 15 (await user signal) |
| IPLD content-addressed crosswalk distribution | Possible Challenge 16 (await user signal) |
| Reactive/incremental computation (Feldera/DBSP) | Possible Challenge 17 (await user signal); see §10 |
| Concrete versioning convention for Crosswalker-internal schemas | Phase B′ synthesis |
| StewardshipProfile keep/replace/stack (formal grounding via Stojanovic & Flouris) | Already on Foundation roadmap |
| Marketplace mechanics (Obsidian plugin distribution, signing, sandboxing) | Post-Foundation (Phase 2) |
| Cross-vault federation protocol | Phase 2 (informed by Linked Data Fragments + Comunica federated SPARQL per Ch 11/12) |
| LLM/AI-assisted features (NL query, schema-matching assistance, mapping suggestion) | Post-Foundation (informed by OAEI 2025/26 hybrid LLM patterns; see §10) |
| Multi-user collaboration / CRDT layer | Post-Foundation (informed by Yjs/Loro/Automerge/cr-sqlite analyses; see §10) |
| Marketplace meta-schema (schema-of-schema question) | Resolved in spirit by §2.5 dog-food; concrete mechanics still deferred |
§10 Long-horizon ideas considered, not committed
This section catalogs architectural ideas that surfaced during the second research wave but are not adopted in this log. They are worth surfacing for future decisions and may seed individual research challenges if the user signals interest.
LinkML as canonical schema substrate (Tier 0)
Source: Ch 12b §2.3.
The idea: LinkML is the schema language SSSOM itself is defined in. From a single YAML LinkML schema, codegen produces JSON Schema, ShEx, SHACL, OWL, Python dataclasses, Pydantic models, TypeScript types, and more. If Crosswalker adopted LinkML as a “Tier 0,” every engine choice (DuckDB / Oxigraph / Nemo / TerminusDB / etc.) becomes a serializer/deserializer plugin against the canonical LinkML schema rather than a competing schema authority.
Why interesting: Resolves the schema-substrate question that the §2.5 dog-food commitment opened. Future-proof against engine churn (engines come and go; LinkML schemas persist).
Why deferred: Adopting LinkML as Tier 0 is a meaningful architectural commitment with cascading implications across every Crosswalker schema (StewardshipProfile, junction-note 13-field, FrameworkConfig, _crosswalker metadata, etc.). Worth a dedicated Challenge 15 if the user signals interest.
Possible Challenge 15 brief: “Adopt LinkML as Crosswalker’s canonical schema substrate? Evaluate codegen overhead, contributor learning curve, alignment with SSSOM-py / OAK / Mondo prior art, migration path from current ad-hoc schemas.”
IPLD content-addressed crosswalks
Source: Ch 12b §4.2.
The idea: Every SSSOM row, bundle, and mapping_set hashes to an IPLD CID. Crosswalk releases become immutable Merkle DAGs. An audit can verify “we used exactly these mappings on the assessment date” by checking one hash. Distribution via CAR files; signing via W3C VCs.
Why interesting: Solves provenance permanently; couples cleanly with the cid: content-addressed identifier convention from Ch 09; enables federated distribution without a central registry.
Why deferred: Adds operational complexity; depends on at least some users adopting IPLD-aware tooling.
Possible Challenge 16 brief: “IPLD/CAR content-addressed distribution for SSSOM mapping sets — feasibility, tooling overhead, integration with existing Git-based vault.”
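The "check one hash" property can be illustrated without any IPLD tooling. The sketch below is a deliberate simplification: it uses plain SHA-256 hex digests over canonical JSON rather than DAG-CBOR blocks, CIDs, or CAR files, and the function names and sample rows are hypothetical.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Hash one mapping row over a canonical JSON form (sorted keys, no whitespace)."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def bundle_hash(rows) -> str:
    """Hash the sorted row hashes so the result does not depend on row order."""
    leaves = sorted(row_hash(r) for r in rows)
    return hashlib.sha256("\n".join(leaves).encode("ascii")).hexdigest()


rows = [
    {"subject_id": "ctrl:A-1", "predicate_id": "skos:exactMatch", "object_id": "ctrl:B-1"},
    {"subject_id": "ctrl:B-1", "predicate_id": "skos:broadMatch", "object_id": "ctrl:C-1"},
]
release_hash = bundle_hash(rows)

# Changing any field of any row changes the bundle hash.
tampered = [dict(rows[0], predicate_id="skos:closeMatch"), rows[1]]
```

An auditor holding `release_hash` from the assessment date can recompute it from the mapping set in hand and compare the two values.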
Tier 1.5 compilation pipeline
Source: Ch 12b §2.2.
The idea: A Rust CLI (crosswalker compile) ingests the vault and emits multi-target artifacts in one pass: mappings.parquet (DuckDB/Polars/Datafusion), mappings.hdt and mappings.ttl (Comunica/Oxigraph), mappings.json-ld (web), oscal.json and sssom.tsv (regulator-facing exports), mappings.car (IPLD content-addressed bundle), rules.wasm (Ascent-compiled Datalog rules for browser inference).
Why interesting: Decouples the canonical layer (Markdown + SSSOM TSV) from any specific query engine. Engines become consumers of the compiled artifacts rather than competing storage layers. Each tier loads only the artifact it needs.
Why deferred: Substantial new architectural component. Worth its own decision after §5.B is locked in.
Reactive/incremental computation for derived crosswalk views
Source: Ch 12b §2.4 + Ch 11b §4.7.
The idea: Adopt a Differential-Dataflow / Materialize-style execution model so derived views (coverage matrices, gap reports, MITRE ATT&CK Navigator overlays) update partially on each Markdown save rather than rebuilding from scratch. Realistic substrates: declarative-dataflow, Materialize OSS, Feldera/DBSP.
Why interesting: Live crosswalk editing is naturally a streaming workload; partial updates would dramatically improve UX at scale.
Why deferred: None of these run in the browser yet. Crosswalker’s “files canonical, derived stores rebuildable” principle already provides coarse incrementalism. Revisit if rebuild times become unbearable. Possible Challenge 17.
CRDT-based collaborative editing
Source: Ch 11 §2.4 (all three deliverables), Ch 12b §4.1.
The idea: Yjs (~10–30 KB gzipped, mature ecosystem) or Loro (newer, Rust-native, full editing-history DAG) for live multi-analyst editing of SSSOM rows. Or cr-sqlite (CRDT SQLite, preserves SQL shape over CRDT semantics).
Why interesting: Live-edit team mode is on the deferred list; this is the architectural option for it.
Why deferred: Live-edit is post-Foundation (Phase 2). When the time comes, Yjs is the safe default; cr-sqlite is the architecturally-intriguing alternative because it preserves SQL semantics natively.
AI-augmented mapping (LogMap-LLM, GenOM)
Source: Ch 12b §2.5 + Ch 11 §2.6.
The idea: OAEI 2025/2026 results show hybrid symbolic-skeleton (LogMap/AML) + embedding-retrieval (sqlite-vec or LanceDB-WASM) + LLM-oracle outperforms pure symbolic baselines by 5–16% F1 on biomedical alignments. Same pattern transfers to GRC frameworks. Architecture: vector-search nearest unmapped controls → top-K candidate to LLM → LLM emits SSSOM row with mapping_justification, confidence, predicate_modifier: candidate for review.
Why interesting: LLMs are now empirically validated for ontology alignment. Combined with sqlite-vec or LanceDB at Tier 2, in-Obsidian “suggest next mapping” UX becomes feasible.
Why deferred: Crosswalker’s GRC audience is acutely privacy-sensitive — LLM placement (in-browser via WebLLM, local sidecar via Ollama, BYOK cloud) is a separate architectural question. Defer until the basic three-tier engine stack is shipping.
Verifiable Credentials for mapping_set provenance
Source: Ch 13 §5 + Ch 12b §4.7.
The idea: W3C VCs v2.0 (Recommendation since May 2025) on the mapping_set so consumers can cryptographically verify “this mapping was signed by the official NIST OLIR submitter.” Combines naturally with IPLD CID-anchored bundles.
Why deferred: Track for v2.0+; auditor-familiarity is “very low” today (per Ch 13 table). EU eIDAS 2 + EUDI Wallet rollout is the catalyst that will eventually mainstream VCs in audit toolchains.
OSCAL native support (FedRAMP RFC-0024 demand-side validation)
Source: Ch 11c §4.4.
The idea: FedRAMP RFC-0024 mandates machine-readable authorisation packages by September 2026. NIST OSCAL (catalog/profile/component-definition/SSP/AP/AR/POA&M models) is the format federal agencies will require. Crosswalker that natively imports/exports OSCAL becomes 10× more valuable to the federal market.
Decision flagged in §5.C — this could become an architectural concern, not just a feature, depending on user input.
Federated crosswalks via Linked Data Fragments
Source: Ch 12b §4.7.
The idea: Every Crosswalker installation publishes a Triple Pattern Fragments (TPF) endpoint; Comunica federates queries across them client-side. No central registry. Combined with W3C VCs for signed mapping_sets, this enables a fully federated crosswalk distribution model without a central authority.
Why deferred: Cross-vault federation is Phase 2.
Related
The 05-01 pair (load-bearing for today’s direction):
- Foundation state of play (orientation, 05-01 AM) — web-of-webs framing, six open questions
- Foundation commitments and follow-on research (decisions, 05-01 PM) — five commitments, three new challenges, the StewardshipProfile rename, the meta-schema commitment
First-wave deliverables (Ch 08/09/10 — 2026-05-02 morning):
- Ch 08 deliverable: Is git history a tenable compliance audit trail?
- Ch 09 deliverable: UUID/CWUUID cross-cutting identifier strategy
- Ch 10 deliverable: Graph→tabular bridging engine for the web-of-webs
- Ch 06 deliverable: Pairwise vs synthetic spine (predecessor; resolved 2026-05-01)
Second-wave deliverables (Ch 11/12/13 — 2026-05-02 afternoon/evening):
- Ch 11 deliverable A: Engine survey, TerminusDB-as-Tier-3 emphasis — includes Grafeo follow-up
- Ch 11 deliverable B: Engine survey, layered Tier 2 stack — DuckDB-WASM + Oxigraph-WASM + Nemo-WASM
- Ch 11 deliverable C: Engine survey, layered + OSCAL/FedRAMP angle — multi-agent validation; FedRAMP RFC-0024 strategic insight
- Ch 12 deliverable A: Datalog vs SQL focused fork-in-the-road
- Ch 12 deliverable B: Beyond the known engine landscape (long-horizon) — LinkML, IPLD, Tier 1.5 compilation, AI-augmented mapping
- Ch 13 deliverable: Modern attestation primitives — Sigstore/in-toto/SLSA/OpenTimestamps/VCs/QLDB(dead)
The challenges (briefs):
- Challenge 08: Git audit-trail tenability — ✅ resolved (Ch 13 follow-on)
- Challenge 09: UUID/CWUUID strategy — ✅ resolved
- Challenge 10: Graph→tabular bridging engine — ✅ resolved (Ch 11+12 follow-on)
- Challenge 11: Tier 2/3 engine deep survey — ✅ resolved (3 deliverables)
- Challenge 12: Datalog vs SQL for SSSOM chain rules — ✅ resolved (2 deliverables)
- Challenge 13: Modern attestation primitives — ✅ resolved
- 🆕 Challenge 14: Missed engines evaluation — Phase 2 follow-on; spun up today
Roadmap (target of this log’s deltas):