Direction commitments (TL;DR) — what's locked in, what's still researching
Status at a glance
| # | Item | Status | Where |
|---|---|---|---|
| 1 | Tier 2 layered stack (DuckDB-WASM + Oxigraph + Nemo) — confirmed; extended with Tier 2-Lite alternate + Comunica federation add-on by Ch 14 | ✅ Confirmed | §2.1 |
| 2 | Tier 1 audit-trail — 4-tier model adopted by Ch 15; T2 OpenTimestamps default; git stack repositioned as one of three T3 options | ✅ Confirmed | §3.1 |
| 3 | Datalog (Nemo) for SSSOM derivation — placement explained | ✅ Committed | §2.2 |
| 4 | Sigstore/gitsign as configurable alternative (now scoped to T3 architecture C) | ✅ Committed | §2.3 |
| 5 | SLSA targeting L1→L2→L3 — explanation added | ✅ Committed | §2.4 |
| 6 | Materialized-folder Tier 1 generator | ✅ Committed | §2.5 |
| 7 | Tier 3 default — flipped by Ch 16: Apache Jena Fuseki primary + oxigraph-server same-API alternative; AGE retained as fallback | ✅ Confirmed | §3.2 |
| 8 | TerminusDB as vault-mirror only — small-vendor (DFRNT) risk explicitly flagged by Ch 16 | ✅ Committed | §2.6 |
| 9 | LinkML as canonical schema substrate | 🅿️ Parked (idea bucket) | §4.1 |
| 10 | OSCAL native support — wire format on export/import boundary; documented via a registry/oscal page mapping OSCAL into the Crosswalker mental model | ✅ Yes (after core); deferred to Phase 2+ | §3.3 |
| 11 | Grafeo evaluation — resolved by Ch 14 deliverable; track in long-horizon list with explicit migration triggers; do not adopt yet | ✅ Resolved | §3.4 |
Score after third wave + sign-off: 9 confirmed or committed (#1, #2, #3, #4, #5, #6, #7, #8, #10), 1 parked (#9 LinkML), and 1 resolved and tracked (#11 Grafeo). #10 OSCAL is committed in principle but deferred to Phase 2+.
See the third-wave architectural shifts log for the deltas behind #1, #2, #7, and #11.
v0.1 build-target reframe: rows #1 (Tier 2 layered stack), #2 (Tier 1 audit-trail T2 OTS default), #4 (Sigstore/gitsign Tier 3 audit option), #7 (Tier 3 default), #8 (TerminusDB vault-mirror), and #10 (OSCAL native) are back-pocket research, opt-in companion plugins, or future phases; none of them is a v0.1 build target. The v0.1 stack is defined in §3 of the v0.1 stack-pivot log. Rows #3 (Datalog placement), #5 (SLSA targeting), #6 (materialized-folder generator), #9 (LinkML parked), and #11 (Grafeo) are unaffected by the v0.1 pivot.
§2 Confirmed commitments
2.1 Tier 2 layered stack — DuckDB-WASM + Oxigraph + Nemo (confirmed)
Confirmed by Challenge 14 deliverable — see third-wave log §2. None of the seven candidate engines (Grafeo, Minigraf, CozoDB, SurrealDB-WASM, Comunica, cr-sqlite, sqlite-wasm) succeeded at “collapse to one engine”; SurrealDB busted the bundle budget; Grafeo and Minigraf earned watchlist slots with explicit migration triggers.
What Ch 14 added:
- Tier 2-Lite alternate stack (sqlite-wasm + sqlite-vec + simple-graph + recursive-CTE; ~1.5 MB compressed) for Obsidian Mobile, low-end devices, and restricted-CSP environments. Its SSSOM rule subset and scale ceiling need a dedicated brief, listed as a Ch 18 candidate.
- Comunica + N3 + HDT federation add-on (~250–300 KB gzipped) for cross-vault, cross-org, and external-SPARQL-endpoint queries. Genuinely additive: Oxigraph stays primary for local queries.
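Tier 2-Lite has no Datalog engine, so chain derivation would have to fall back to recursive CTEs. The sketch below uses Python's built-in sqlite3 as a stand-in for sqlite-wasm; the edges table, its columns, and the example IDs are assumptions. It scores a chain by its weakest edge and keeps the best chain per target, and it ignores mapping predicates entirely, which is exactly the kind of gap the SSSOM rule-subset brief would need to settle.

```python
import sqlite3

# Hypothetical crosswalk edges: (subject, object, confidence). The last edge
# closes a cycle, which the recursive CTE must survive.
EDGES = [
    ("nist:AC-2", "cis:5.1", 0.9),
    ("cis:5.1", "iso:A.5.16", 0.8),
    ("iso:A.5.16", "nist:AC-2", 0.7),
]

CHAIN_SQL = """
WITH RECURSIVE chain(target, conf) AS (
    SELECT object, confidence FROM edges WHERE subject = :start
    UNION  -- UNION (not UNION ALL) discards repeated rows, so cycles terminate
    SELECT e.object, MIN(c.conf, e.confidence)  -- two-argument MIN is scalar
    FROM chain AS c JOIN edges AS e ON e.subject = c.target
)
SELECT target, MAX(conf) FROM chain
WHERE target <> :start
GROUP BY target
ORDER BY target
"""


def reachable(conn: sqlite3.Connection, start: str) -> dict:
    """Map each target reachable from `start` to its best chain confidence,
    where a chain scores the minimum confidence along its edges."""
    return dict(conn.execute(CHAIN_SQL, {"start": start}).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (subject TEXT, object TEXT, confidence REAL)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", EDGES)
print(reachable(conn, "nist:AC-2"))  # {'cis:5.1': 0.9, 'iso:A.5.16': 0.8}
```

UNION is what keeps the cycle from looping forever: once every distinct (target, confidence) row has been produced, the recursive step has nothing new to add.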
What was chosen and why (with alternatives that lost)
Analytical / SQL surface — DuckDB-WASM
| | DuckDB-WASM (chosen) | Polars-WASM | sql.js | DataFusion-WASM | ClickHouse-local |
|---|---|---|---|---|---|
| Bundle | ~3.2 MB Brotli (lazy-loaded) | tens of MB (Pyodide) | ~1.5 MB | ~10s of MB | n/a (no WASM) |
| License | MIT | MIT | Public domain | Apache-2.0 | Apache-2.0 |
| Project health | DuckDB Foundation, weekly releases | Alpha; “not for production” | Active but slow | Experimental WASM playground | Active; no WASM port |
| Joins/pivots | Native PIVOT/UNPIVOT/window | Native | Limited | Yes | Yes |
| Recursive CTE | Yes (USING KEY since May 2025) | No | Yes (basic) | Yes | n/a |
| Verdict | Pick | Reject (alpha) | Fallback only | Reject (size + experimental) | Reject (no WASM) |
RDF / SPARQL surface — Oxigraph-WASM
| | Oxigraph-WASM (chosen) | Comunica + N3 + HDT | Apache Jena Fuseki | GraphDB (Ontotext) | Stardog |
|---|---|---|---|---|---|
| Bundle | ~3-4 MB | ~200 KB gzipped | n/a (JVM server) | n/a | n/a |
| License | Apache-2.0/MIT | Apache-2.0/MIT | Apache-2.0 | Commercial | Commercial |
| SPARQL 1.1 | Full + RDF 1.2 preview | Full + federation | Full + RDFS/OWL inference | Full + best OWL reasoning | Full + OWL+ICV |
| Browser/Embedded | Yes (in-memory only on Wasm) | Yes (TS-native, lighter) | No | No | No |
| Verdict | Pick (current) | Adopted by Ch 14 as a federation add-on, not a replacement | Tier 3 default (per Ch 16) | Reject (commercial) | Reject (commercial) |
Datalog / SSSOM derivation engine — Nemo
| | Nemo (chosen) | Soufflé | CozoDB | RDFox | Differential Datalog |
|---|---|---|---|---|---|
| License | Apache-2.0/MIT | UPL-1.0 | MPL-2.0 | Commercial | MIT (archived) |
| WASM | Yes (shipping) | No (C++→native) | Yes (in-memory only) | No (JVM/native) | Archived |
| Production validation | EBI’s OxO2 (1.16M mappings → 49.5K inferences in 17 min on a laptop) | Industry-grade for static analysis | Slowing in 2024–2025 | Highest-quality OWL 2 RL reasoner | VMware archived |
| Provenance support | Native (“tracing”) | Yes | Yes | Best-in-class (why-provenance) | Yes |
| Verdict | Pick | Reject (no WASM) | Evaluated in Ch 14; not adopted | Reject (commercial) | Reject (archived) |
Why “layered” and not “single engine”?
Crosswalker needs three different kinds of computation at Tier 2:
- Tabular pivots (analyst spreadsheet) → SQL → DuckDB
- Ontology semantics (RDF, SKOS, SSSOM standards) → SPARQL → Oxigraph
- Logical derivation (SSSOM chain rules over crosswalk edges) → Datalog → Nemo
Forcing all three onto a single engine produces 30% great-fit code and 70% awkward workarounds.
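Caveat: Grafeo claims to do all three (SQL/PGQ + SPARQL + Cypher + GQL + vector), which would collapse the layered stack to one engine. Ch 14 evaluated the claim and did not adopt Grafeo; it stays on the long-horizon watchlist, and its migration trigger A defines when such a collapse would be justified (see §3.4).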
2.2 Datalog (Nemo) for SSSOM chain-rule derivation
Datalog is a declarative logic-programming language — you write rules like “if A maps to B and B maps to C, then A maps to C with min confidence” once, and the engine derives all consequences. See the verbose Datalog glossary entry for the full explainer of why we use it instead of plain SQL.
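To make the evaluation model concrete, here is a minimal Python sketch of how a Datalog engine would apply that one rule: a semi-naive fixpoint that only re-joins facts that changed in the previous round. It is not Nemo's syntax or implementation. Reading “min confidence” as “a chain is as strong as its weakest link”, and keeping the best-scoring chain when several reach the same pair, are assumptions of the sketch; the control IDs are illustrative.

```python
def _offer(out: dict, pair: tuple, conf: float) -> None:
    # Keep the highest confidence proposed for this (A, C) pair in this round.
    if conf > out.get(pair, -1.0):
        out[pair] = conf


def derive(asserted: dict) -> dict:
    """Semi-naive fixpoint for the single rule
        maps(A, C, min(c1, c2)) :- maps(A, B, c1), maps(B, C, c2).
    Keys are (subject, object) pairs; values are confidences in [0, 1]."""
    best = dict(asserted)   # every fact known so far
    delta = dict(asserted)  # facts that are new or improved since the last round
    while delta:
        candidates: dict = {}
        for (a, b), c1 in delta.items():      # changed fact on the left of the join
            for (b2, c), c2 in best.items():
                if b2 == b and a != c:
                    _offer(candidates, (a, c), min(c1, c2))
        for (b, c), c2 in delta.items():      # changed fact on the right of the join
            for (a, b2), c1 in best.items():
                if b2 == b and a != c:
                    _offer(candidates, (a, c), min(c1, c2))
        # Only strict improvements feed the next round, so the loop terminates.
        delta = {p: v for p, v in candidates.items() if v > best.get(p, -1.0)}
        best.update(delta)
    return best


# Illustrative asserted crosswalk edges: (subject, object) -> confidence.
ASSERTED = {
    ("nist:AC-2", "cis:5.1"): 0.9,
    ("cis:5.1", "iso:A.5.16"): 0.8,
    ("iso:A.5.16", "soc2:CC6.1"): 0.95,
}
derived = {p: c for p, c in derive(ASSERTED).items() if p not in ASSERTED}
print(derived)  # three derived pairs, each scored 0.8 by its weakest link
```

Rules are written once and the loop simply runs until no rule produces a new or better fact, which is the property that makes Datalog a better fit than hand-written SQL for chained derivations.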
Where Nemo lives in the architecture (web-of-webs mapping)
Nemo is not a query engine for live user queries. It is a build-pipeline stage that sits between Tier 1 (canonical files) and Tier 2 (query surface).
In web-of-webs terms
From the orientation log’s web-of-webs framing:
- The source-ontology webs (NIST/CIS/MITRE/etc.) start with only their own internal edges (hierarchy, references)
- Crosswalk edges between source-ontology webs are asserted by the user (or imported from SCF/OLIR)
- Derived crosswalk edges are produced by Nemo composing pairwise mappings through chains — including chains that route through the optional pivot/spine web
So Nemo’s job is to densify the crosswalk web by computing transitive closures over asserted edges using SSSOM chain rules. It runs as a build step (not on every query), and derived edges are saved to canonical files alongside asserted ones.
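A sketch of that build step, under stated assumptions: the composition table below is a simplified, hypothetical subset of chain rules over SKOS mapping predicates (not the normative SSSOM rule set), and the output is an SSSOM-style TSV body without the YAML metadata header, whereas Crosswalker's canonical files are Markdown + YAML. Derived rows are tagged with the SEMAPV justification semapv:MappingChaining so they stay distinguishable from curated ones.

```python
import csv
import io

# Simplified, assumed composition table: (first hop, second hop) -> derived
# predicate. Pairs not listed derive nothing.
COMPOSE = {
    ("skos:exactMatch", "skos:exactMatch"): "skos:exactMatch",
    ("skos:exactMatch", "skos:broadMatch"): "skos:broadMatch",
    ("skos:broadMatch", "skos:exactMatch"): "skos:broadMatch",
    ("skos:exactMatch", "skos:narrowMatch"): "skos:narrowMatch",
    ("skos:narrowMatch", "skos:exactMatch"): "skos:narrowMatch",
    ("skos:broadMatch", "skos:broadMatch"): "skos:broadMatch",
    ("skos:narrowMatch", "skos:narrowMatch"): "skos:narrowMatch",
}


def densify(asserted: set) -> set:
    """Naive fixpoint over (subject_id, predicate_id, object_id) triples:
    compose two-hop chains until nothing new appears; return only derived ones."""
    known = set(asserted)
    while True:
        new = {
            (s, COMPOSE[(p1, p2)], o)
            for (s, p1, m) in known
            for (m2, p2, o) in known
            if m == m2 and s != o and (p1, p2) in COMPOSE
        } - known
        if not new:
            return known - set(asserted)
        known |= new


def to_sssom_tsv(asserted: set, derived: set) -> str:
    """SSSOM-style TSV body (metadata header omitted), curated rows first."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["subject_id", "predicate_id", "object_id", "mapping_justification"])
    for rows, justification in (
        (asserted, "semapv:ManualMappingCuration"),
        (derived, "semapv:MappingChaining"),
    ):
        for s, p, o in sorted(rows):
            writer.writerow([s, p, o, justification])
    return buf.getvalue()


ASSERTED_TRIPLES = {
    ("nist:AC-2", "skos:exactMatch", "cis:5.1"),
    ("cis:5.1", "skos:broadMatch", "iso:A.5.16"),
    ("iso:A.5.16", "skos:narrowMatch", "soc2:CC6.1"),
}
DERIVED = densify(ASSERTED_TRIPLES)  # {('nist:AC-2', 'skos:broadMatch', 'iso:A.5.16')}
print(to_sssom_tsv(ASSERTED_TRIPLES, DERIVED))
```

The third asserted edge shows the other half of chain rules: broadMatch followed by narrowMatch derives nothing, because sharing a broader concept says nothing definite about how the two endpoints relate.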
This matches EBI’s OxO2 architecture exactly. Crosswalker is essentially “OxO2 for compliance frameworks.”
2.3 Sigstore/gitsign as configurable alternative
Applies to commit signing when git is in use. GPG/SSH stays the default; Sigstore is an optional swap-in for teams that want federated, OIDC-backed signing (the cleanest path to SLSA L3).
Caveat: this assumes git is in use at all. §3.1's audit-trail model also works without git, and Sigstore is now scoped to Tier 3 architecture (c).
2.4 SLSA targeting L1→L2→L3 (explained)
SLSA = Supply-chain Levels for Software Artifacts (slsa.dev/spec/v1.0/). Originally a software-build-integrity framework; we apply it analogically to compliance evidence pipelines.
The progression Crosswalker targets:
| Version | SLSA Target | What’s required | Cost |
|---|---|---|---|
| v0.1 | L1 | Documented build process; emit a provenance file alongside each commit (just an in-toto SLSA-Provenance JSON). | Trivial — already implicit in git+commit-signing; just needs an in-toto Provenance predicate emitted on commit |
| v1.0 | L2 | Provenance generation runs in a “hosted” context (CI workflow), digitally signed with a key the user can’t forge. | Modest — requires a Crosswalker-managed pre-commit hook or CI workflow signing the provenance |
| v2.0+ | L3 | Hardened build with non-extractable signing keys (Fulcio’s ephemeral certs are the cleanest way). Requires gitsign + Fulcio + Rekor (Sigstore). | Significant — requires either a managed Crosswalker SaaS-style verifier service OR gitsign with OIDC; this is the architectural argument for Sigstore at v2.0+ |
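For the L1 row, the provenance file can be as small as one in-toto Statement. A minimal Python sketch, assuming one subject per committed file; the buildType URI, builder id, file name, and commit id below are placeholders, not Crosswalker values. The statement is unsigned, which is all L1 asks for.

```python
import hashlib
import json
from datetime import datetime, timezone


def slsa_provenance(files: dict, commit: str, builder_id: str) -> dict:
    """Unsigned in-toto Statement v1 carrying a SLSA Provenance v1 predicate,
    with one subject (name + sha256 digest) per committed file."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": name, "digest": {"sha256": hashlib.sha256(data).hexdigest()}}
            for name, data in sorted(files.items())
        ],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                # Hypothetical buildType URI; SLSA lets each builder define its own.
                "buildType": "https://example.invalid/crosswalker/commit/v0.1",
                "externalParameters": {"commit": commit},
                "resolvedDependencies": [],
            },
            "runDetails": {
                "builder": {"id": builder_id},
                "metadata": {"startedOn": now, "finishedOn": now},
            },
        },
    }


statement = slsa_provenance(
    {"crosswalks/nist-to-cis.md": b"nist:AC-2 exactMatch cis:5.1\n"},
    commit="0123abc",  # placeholder commit id
    builder_id="https://example.invalid/crosswalker/local-hook",  # hypothetical
)
print(json.dumps(statement, indent=2))
```

Reaching L2 would mean producing the same statement inside a hosted workflow and wrapping it in a signed envelope (in-toto's DSSE format) with a key the user cannot forge.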
Why this matters: L1/L2/L3 is auditor vocabulary. Saying “we target SLSA Build L2 by v1.0” gives a SOC 2 / ISO 27001 auditor a familiar handle. It’s a credibility lever, not an engineering checklist.
2.5 Materialized-folder Tier 1 generator
The plugin auto-generates Bases-compatible folders containing pre-joined/merged data so users can browse cross-tabs that Bases cannot compute on its own (Bases has no joins and no pivots). The concept survives any Tier 2 engine choice because it is independent of engine selection.
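Bases reads note properties (YAML frontmatter) but cannot join them, so the generator can do the join up front and write one note per joined row. A minimal Python sketch; the folder path, file-naming scheme, property names, and example controls are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical inputs: control titles from two source-ontology webs plus one crosswalk edge.
TITLES = {
    "nist:AC-2": "Account Management",
    "cis:5.1": "Establish and Maintain an Inventory of Accounts",
}
EDGES = [("nist:AC-2", "skos:exactMatch", "cis:5.1")]


def materialize(edges, titles, out_dir: Path) -> list:
    """Write one note per pre-joined crosswalk row. Each note's frontmatter
    carries both sides' titles, so a Base can filter and sort on them
    without needing a join of its own."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src, pred, tgt in edges:
        props = {
            "source": src,
            "source_title": titles.get(src, ""),
            "predicate": pred,
            "target": tgt,
            "target_title": titles.get(tgt, ""),
        }
        # json.dumps yields double-quoted strings, which YAML also accepts.
        lines = ["---"] + [f"{k}: {json.dumps(v)}" for k, v in props.items()] + ["---", ""]
        path = out_dir / (f"{src}__{tgt}".replace(":", "-") + ".md")  # ':' is unsafe in filenames
        path.write_text("\n".join(lines), encoding="utf-8")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    notes = materialize(EDGES, TITLES, Path(tmp) / "_crosswalker" / "nist-x-cis")
    print(notes[0].name)  # nist-AC-2__cis-5.1.md
    print(notes[0].read_text(encoding="utf-8"))
```

Because the generator only reads canonical data and writes plain notes, it does not care which Tier 2 engine (if any) produced the joined rows.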
2.6 TerminusDB as optional vault-mirror only
If a user wants Git-style branch/diff/merge over the curated crosswalk graph, TerminusDB can be deployed as a parallel governance database that reads from the canonical Markdown vault. It is neither the system of record nor the default Tier 3.
§3 Needs more research (with new challenges spun up)
3.1 Tier 1 audit-trail — 4-tier model with OpenTimestamps T2 default
Resolved by Challenge 15 deliverable — see third-wave log §4.
What was committed by Ch 15
A 4-tier audit-trail model (T0 floor / T1 credible / T2 defensible / T3 court-defensible) with .audit/chain.jsonl as the universal substrate that works in both git and non-git modes.
| Tier | Default? | Substrate |
|---|---|---|
| T0 Floor | — | Edit History plugin or Obsidian Sync version history (no cryptographic guarantee) |
| T1 Credible | — | .audit/chain.jsonl with prev_hash links + Ed25519 signatures, vault-anchored |
| T2 Defensible | ✅ New default | T1 + OpenTimestamps .ots on signed chain checkpoints (free, decentralized, offline-buffered, license-free) |
| T3 Court-Defensible | — | T2 + (a) FRE 902(13) PDF + S3 Object Lock; or (b) eIDAS qualified TSA + W3C VC; or (c) Sigstore Rekor v2 + in-toto |
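The T1 substrate can be sketched as an append-only JSONL file in which every entry commits to its predecessor's hash. This Python sketch covers only the prev_hash linking and its verification; the field names are assumptions, and the Ed25519 signature that T1 adds over each entry's hash is omitted because Python's standard library has no Ed25519 implementation.

```python
import hashlib
import json

GENESIS = "0" * 64  # prev_hash of the first entry


def _digest(entry: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so any verifier hashes identical bytes.
    canon = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canon).hexdigest()


def append(lines: list, payload: dict) -> None:
    """Append one entry that commits to the previous entry's hash."""
    prev = json.loads(lines[-1])["hash"] if lines else GENESIS
    entry = {"seq": len(lines), "prev_hash": prev, "payload": payload}
    entry["hash"] = _digest(entry)  # hash covers seq, prev_hash and payload
    lines.append(json.dumps(entry, sort_keys=True))


def verify(lines: list) -> bool:
    """Recompute every hash and link; any edit, reorder or mid-chain deletion fails."""
    prev = GENESIS
    for seq, line in enumerate(lines):
        entry = json.loads(line)
        claimed = entry.pop("hash", None)
        if entry.get("seq") != seq or entry.get("prev_hash") != prev or _digest(entry) != claimed:
            return False
        prev = claimed
    return True


chain: list = []  # stands in for the lines of .audit/chain.jsonl
append(chain, {"op": "assert", "edge": ["nist:AC-2", "cis:5.1"]})
append(chain, {"op": "retract", "edge": ["nist:AC-2", "cis:5.1"]})
print(verify(chain))  # True
tampered = [chain[0].replace('"assert"', '"assrt"'), chain[1]]
print(verify(tampered))  # False
```

Verification like this proves internal consistency only: deleting the newest entries leaves a shorter chain that still verifies. Timestamping signed checkpoints, as T2 does with OpenTimestamps, is what ties the chain's head to a point in time.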
Where the Ch 08+13 git stack went
The signed-commit + RFC 3161 + S3 Object Lock + FRE 902(13) + in-toto stack from Ch 08+13 is kept as a first-class option but no longer the default. It is now Tier 3 architecture A — recommended for users who already have signed-commit workflows, deploy to AWS or another Object-Lock-capable provider, and whose auditors specifically request “WORM + 902(13) PDF”. For the great majority of GRC consultancy work, T2 with OpenTimestamps reaches a comparable evidentiary standard without requiring git.
What’s new
- PQC migration plan: 2026 (Ed25519) → 2027 (dual-sign Ed25519 + ML-DSA-44) → 2030 (deprecate Ed25519-only) → 2032 (fully PQC), well ahead of NIST IR 8547’s 2035 deadline.
- Single audit-ready badge with progressive disclosure: gray T0 / blue T1 / green T2 / gold T3, plus honest tier-floor messaging (“T0 — version history only” is not “audit-ready”).
- External CLI verifier (crosswalker-verify) with zero Obsidian dependencies, runnable on the auditor’s machine.
- Per-persona tier mapping: see third-wave log §4.4 for the full table (solo consultant US/EU, locked-down enterprise, federal/air-gapped, multi-tenant team, EU AI Act / DORA / NIS2).
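The tier-floor rule behind the single badge can be sketched as a small decision function. It is hypothetical: in practice the inputs would come from real checks (chain and signature verification, OpenTimestamps proof verification), and the messages are placeholders.

```python
from typing import Optional

# The three Tier 3 architectures from the tier table: (a) FRE 902(13) PDF +
# S3 Object Lock, (b) eIDAS qualified TSA + W3C VC, (c) Sigstore Rekor v2 + in-toto.
T3_ARCHITECTURES = {"a", "b", "c"}


def audit_badge(chain_verifies: bool, has_ots_checkpoint: bool,
                t3_architecture: Optional[str] = None) -> tuple:
    """Return (tier, color, message) for the audit-ready badge.
    Each tier requires every tier below it, so the badge shows the floor
    that is actually met and never rounds up."""
    if not chain_verifies:
        return ("T0", "gray", "version history only; not audit-ready")
    if not has_ots_checkpoint:
        return ("T1", "blue", "signed hash chain, vault-anchored")
    if t3_architecture in T3_ARCHITECTURES:
        return ("T3", "gold", f"T2 plus Tier 3 architecture ({t3_architecture})")
    return ("T2", "green", "T1 plus OpenTimestamps checkpoints")


# A configured T3 architecture does not lift a vault that lacks T2 evidence.
print(audit_badge(True, False, "a"))  # ('T1', 'blue', 'signed hash chain, vault-anchored')
```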
3.2 Tier 3 stack — default flipped from AGE to Fuseki/oxigraph-server
Resolved by Challenge 16 deliverable — see third-wave log §3.
The flip
| Profile | New Tier 3 default | Why |
|---|---|---|
| Default — small GRC team, ≤500k mappings | Apache Jena Fuseki | Apache TLP governance (multi-employer PMC, no key-person risk); ~2 decades of releases; SPARQL 1.1 + RDFS/OWL inference + SHACL; safest 5–10-year bet |
| Same-API lighter alternative | oxigraph-server | Architectural symmetry with Tier 2 (same engine, just oxigraph serve); single Rust binary or Docker container; smaller footprint than Fuseki’s JVM |
| Power user — multi-team, mixed SQL+graph, multi-million mappings | Layered Fuseki + DuckDB-on-server | Federated via SPARQL SERVICE and DuckDB httpfs / postgres_scanner. Crossover point: above ~250k mappings with mixed workloads |
| Postgres-standardized shop | Apache AGE (kept as supported fallback) or plain Postgres + JSONB + recursive CTEs | AGE is the only option that lives inside an existing Postgres; the boring SQL option remains viable for ≥90% of queries |
Why AGE was demoted (not dropped)
The user’s concern was substantiated: a sponsor pivot (Bitnine → SKAI Worldwide, which moved into AI advertising); the November 2024 PG 17.1 ABI break that hit AGE alongside TimescaleDB; a slow per-PG-line release cadence (PG 18 support landed late 2025/early 2026); and Apache board minutes reporting “reduced activity year-over-year” with no new committers. AGE remains supported as a fallback because its killer feature (running graph queries inside an existing Postgres instance with shared transactions and indexes) has no substitute for Postgres-standardized environments.
Migration is a re-projection, not a translation
The architectural payoff of files-canonical: because mappings are canonically SSSOM (markdown + YAML in the vault), any database is by definition a projection of the canonical files, not the source of truth. AGE→Fuseki migration is “re-run Crosswalker’s SSSOM-to-RDF projector against the new engine,” not “translate AGE data to Fuseki data.” See third-wave log §3.6 for concrete steps.
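A minimal Python sketch of what “re-run the projector” means: SSSOM rows in, sorted N-Triples out, a format both Fuseki and oxigraph-server can load. The prefix map is an assumption (only the SKOS namespace is real), and the sketch drops mapping metadata such as confidence and justification, which a real projector would have to carry.

```python
# Hypothetical prefix map; skos is the real SKOS namespace, the others are placeholders.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "nist": "https://example.invalid/nist-800-53/",
    "cis": "https://example.invalid/cis-v8/",
}


def expand(curie: str) -> str:
    """Turn a CURIE such as 'nist:AC-2' into an N-Triples IRI term."""
    prefix, _, local = curie.partition(":")
    return f"<{PREFIXES[prefix]}{local}>"


def project(rows) -> list:
    """SSSOM rows (subject_id, predicate_id, object_id) -> sorted, de-duplicated
    N-Triples lines. Deterministic output makes a re-projection easy to diff."""
    return sorted({f"{expand(s)} {expand(p)} {expand(o)} ." for s, p, o in rows})


print("\n".join(project([("nist:AC-2", "skos:exactMatch", "cis:5.1")])))
```

Since the output depends only on the canonical rows, switching engines means loading the same file into a different store rather than converting one store's data model into another's.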
Watch but do not adopt
HelixDB (AGPL + YC-stage + custom DSL); ArcadeDB (small contributor base, but Apache-2.0 and built-in MCP); SurrealDB (BSL license is the procurement blocker). Each is interesting, but none is mature or governance-stable enough today to sit under a small open-source GRC tool with a 5–10-year horizon.
3.3 OSCAL native support — placement in the web-of-webs
Where OSCAL fits in the architecture
OSCAL is not an internal data model. It is a federally recognized wire format for export/import.
In web-of-webs terms
OSCAL is the wire format between Crosswalker’s web-of-webs and external federal GRC systems. It is not part of the web-of-webs itself; it is the edge of it — the export/import boundary where the internal SSSOM-flavored data is reshaped into NIST’s preferred JSON/XML/YAML formats.
In the orientation log’s web-of-webs diagram, OSCAL is the port through which:
- Source-ontology webs ↔ NIST OSCAL catalogs (machine-readable control catalogs)
- Crosswalk edges ↔ OSCAL Control Mapping Model records
- Evidence-vault web ↔ OSCAL Assessment Results
Why we’d promote it from “feature” to “architectural concern”
| Consideration | Argument |
|---|---|
| FedRAMP RFC-0024 | Mandates machine-readable authorization packages by Sept 2026. Federal customers will require OSCAL. |
| Credibility multiplier | Even non-federal customers see OSCAL native support as evidence of auditor-grade data modeling. |
| Adjacent ecosystem | Many commercial GRC tools (Hyperproof, Drata, Vanta, AuditBoard, RegScale) already produce OSCAL output. Crosswalker as OSCAL bridge = useful glue. |
| No internal compromise | The SSSOM-internal model does not change; OSCAL is just a serializer/deserializer. |
What “promotion” actually means
- Add crosswalker import oscal <file> and crosswalker export oscal --type {catalog,profile,ssp,ar,...} commands
- Make OSCAL round-trip a tested feature (not best-effort)
- Add Foundation roadmap item for OSCAL native support
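A round-trip test could start as small as this Python sketch, covering the catalog type only: a source-ontology web's controls go out as a minimal OSCAL catalog and come back unchanged. The web id, document version, and oscal-version value are assumptions, and real output should be validated against NIST's published OSCAL schemas. The importer also walks groups and nested controls, since real catalogs such as SP 800-53 are organized that way.

```python
import json
import uuid


def export_catalog(web_id: str, title: str, controls: dict, last_modified: str) -> str:
    """Serialize a source-ontology web's controls (id -> title) as a minimal OSCAL catalog."""
    catalog = {
        "catalog": {
            # Deterministic UUID so re-exporting the same web yields the same id.
            "uuid": str(uuid.uuid5(uuid.NAMESPACE_URL, web_id)),
            "metadata": {
                "title": title,
                "last-modified": last_modified,
                "version": "0.1",          # hypothetical document version
                "oscal-version": "1.1.2",  # assumed target OSCAL release
            },
            "controls": [{"id": cid, "title": t} for cid, t in sorted(controls.items())],
        }
    }
    return json.dumps(catalog, indent=2)


def _walk(node: dict):
    """Yield (id, title) for controls, including ones nested inside groups
    or under a parent control (as control enhancements are)."""
    for control in node.get("controls", []):
        yield control["id"], control["title"]
        yield from _walk(control)
    for group in node.get("groups", []):
        yield from _walk(group)


def import_catalog(text: str) -> dict:
    """Deserialize an OSCAL catalog back into a flat id -> title mapping."""
    return dict(_walk(json.loads(text)["catalog"]))


controls = {"ac-2": "Account Management", "ac-3": "Access Enforcement"}
exported = export_catalog(
    "crosswalker:web/nist-800-53", "NIST SP 800-53 (excerpt)", controls, "2026-05-02T00:00:00Z"
)
assert import_catalog(exported) == controls  # lossless for this subset
```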
Decision (received 2026-05-02): ✅ yes, but after the core is working. User read on OSCAL: it sounds a lot like a custom schema we map to (i.e., a wire format on the export/import boundary, not an internal model). Treat it as that, document it in Crosswalker’s mental-model vocabulary, and don’t promote it to an architectural concern until the core SSSOM-internal pipeline is solid.
Action items:
- Update reference/registry/oscal.mdx to map OSCAL into Crosswalker’s mental model (web-of-webs framing; map OSCAL-native terms to Crosswalker synonyms: catalog ↔ source-ontology web, Control Mapping Model ↔ crosswalk edges, Assessment Result ↔ junction notes / evidence-vault web)
- Cross-link from any page that mentions OSCAL back to that page
- Roadmap placement: deferred until core export/import boundary is well-defined; flagged for Phase 2+ rather than Foundation
3.4 Grafeo evaluation — resolved by Ch 14
Resolved. Folded into the Ch 14 deliverable §2.1. Verdict: track in long-horizon list with explicit migration triggers; do not adopt yet. Genuinely impressive surface area (LPG+RDF+six query languages+HNSW+CDC+WASM+IndexedDB persistence), but ~6 months old, v0.5.x, ~582 stars, single-sponsor (Supernovae), vendor-only benchmarks, no W3C SPARQL conformance proof. A 3-year survival probability of 50–60% is the honest estimate.
Migration trigger A spells out the conditions under which Grafeo would collapse the Tier 2 stack to a single engine.
§4 Parked
4.1 LinkML as canonical schema substrate (idea bucket)
This would be a major architectural pivot. Ch 12b deliverable B made the strongest argument: LinkML auto-generates JSON Schema, OWL, SHACL, Pydantic, and TypeScript from a single YAML schema; SSSOM is itself defined in LinkML; and LinkML could become Crosswalker’s “Tier 0”.
Real benefit: decouples engines from schema authority. Every engine becomes a serializer/deserializer plugin against the canonical LinkML schema rather than a competing schema authority.
Cost: cascading commitment across every Crosswalker schema (StewardshipProfile, junction-note 13-field, FrameworkConfig, _crosswalker metadata, etc.).
Park for now. Spin up a future Challenge 17 if interest renews after Ch 14/15/16 land.
§5 Related
- 2026-05-02 direction third-wave shifts log — captures the deltas behind the third-wave updates to §2.1, §3.1, §3.2, §3.4 above
- Ch 14 deliverable: Missed engines evaluation
- Ch 15 deliverable: Audit-trail alternatives without external git tooling
- Ch 16 deliverable: Tier 3 stack reconsideration
- 2026-05-02 direction log (bloated, full second wave) — research record for 9 fresh-agent deliverables across Ch 08–13. Don’t read for normal navigation.
- 05-01 commitments log — predecessor; pairwise+pivot, junction-notes-by-tier, StewardshipProfile rename, meta-schema commitment
- 05-01 orientation log — the web-of-webs framing referenced in §2.2 (Nemo placement) and §3.3 (OSCAL placement)
- Active challenges — Ch 14 (Grafeo, user-driven), Ch 15 NEW (non-git audit trail), Ch 16 NEW (Tier 3 alternatives)