
Direction commitments (TL;DR) — what's locked in, what's still researching

| # | Item | Status | Where |
|---|---|---|---|
| 1 | Tier 2 layered stack (DuckDB-WASM + Oxigraph + Nemo): confirmed; extended with Tier 2-Lite alternate + Comunica federation add-on by Ch 14 | ✅ Confirmed | §2.1 |
| 2 | Tier 1 audit-trail: 4-tier model adopted by Ch 15; T2 OpenTimestamps default; git stack repositioned as one of three T3 options | ✅ Confirmed | §3.1 |
| 3 | Datalog (Nemo) for SSSOM derivation (placement explained) | ✅ Committed | §2.2 |
| 4 | Sigstore/gitsign as configurable alternative (now scoped to T3 architecture C) | ✅ Committed | §2.3 |
| 5 | SLSA targeting L1→L2→L3 (explanation added) | ✅ Committed | §2.4 |
| 6 | Materialized-folder Tier 1 generator | ✅ Committed | §2.5 |
| 7 | Tier 3 default, flipped by Ch 16: Apache Jena Fuseki primary + oxigraph-server same-API alternative; AGE retained as fallback | ✅ Confirmed | §3.2 |
| 8 | TerminusDB as vault-mirror only; small-vendor (DFRNT) risk explicitly flagged by Ch 16 | ✅ Committed | §2.6 |
| 9 | LinkML as canonical schema substrate | 🅿️ Parked (idea bucket) | §4.1 |
| 10 | OSCAL native support: wire format on export/import boundary | ✅ Yes (after core); deferred to Phase 2+; document via registry/oscal page mapping OSCAL into Crosswalker mental model | §3.3 |
| 11 | Grafeo evaluation: resolved by Ch 14 deliverable; track in long-horizon list with explicit migration triggers; do not adopt yet | ✅ Resolved | §3.4 |

Score after third wave + sign-off: 9 confirmed (#1, #2, #3, #4, #5, #6, #7, #8, #10). 1 parked (#9 LinkML). 1 resolved-track (#11 Grafeo). #10 OSCAL deferred to Phase 2+ but committed-in-principle.

See the third-wave architectural shifts log for the deltas behind #1, #2, #7, and #11.

v0.1 build-target reframe: rows #1 (Tier 2 layered stack), #2 (Tier 1 audit-trail T2 OTS default), #4 (Sigstore/gitsign Tier 3 audit option), #7 (Tier 3 default), #8 (TerminusDB vault-mirror), #10 (OSCAL native) are all back-pocket research / opt-in companion plugins or future phases, not v0.1 build target. The v0.1 stack is in the v0.1 stack-pivot log §3. Rows #3 (Datalog placement), #5 (SLSA targeting), #6 (materialized-folder generator), #9 (LinkML parked), #11 (Grafeo) are unaffected by the v0.1 pivot.

2.1 Tier 2 layered stack — DuckDB-WASM + Oxigraph + Nemo (confirmed)

Confirmed by Challenge 14 deliverable — see third-wave log §2. None of the seven candidate engines (Grafeo, Minigraf, CozoDB, SurrealDB-WASM, Comunica, cr-sqlite, sqlite-wasm) succeeded at “collapse to one engine”; SurrealDB busted the bundle budget; Grafeo and Minigraf earned watchlist slots with explicit migration triggers.

What Ch 14 added:

  • Tier 2-Lite alternate stack (sqlite-wasm + sqlite-vec + simple-graph + recursive-CTE; ~1.5 MB compressed) for Obsidian Mobile / low-end / restricted-CSP environments. SSSOM rule subset and scale ceiling need a dedicated brief — listed as Ch 18 candidate.
  • Comunica + N3 + HDT federation add-on (~250–300 KB gzipped) for cross-vault, cross-org, external SPARQL endpoint queries. Genuinely additive — Oxigraph stays primary for local queries.

What was chosen and why (with alternatives that lost)

Analytical / SQL surface — DuckDB-WASM

| | DuckDB-WASM (chosen) | Polars-WASM | sql.js | DataFusion-WASM | ClickHouse-local |
|---|---|---|---|---|---|
| Bundle | ~3.2 MB Brotli (lazy-loaded) | tens of MB (Pyodide) | ~1.5 MB | ~10s of MB | n/a (no WASM) |
| License | MIT | MIT | Public domain | Apache-2.0 | Apache-2.0 |
| Project health | DuckDB Foundation, weekly releases | Alpha; “not for production” | Active but slow | Experimental WASM playground | Active; no WASM port |
| Joins/pivots | Native PIVOT/UNPIVOT/window | Native | Limited | Yes | Yes |
| Recursive CTE | Yes (USING KEY since May 2025) | No | Yes (basic) | Yes | n/a |
| Verdict | Pick | Reject (alpha) | Fallback only | Reject (size + experimental) | Reject (no WASM) |

RDF / SPARQL surface — Oxigraph-WASM

| | Oxigraph-WASM (chosen) | Comunica + N3 + HDT | Apache Jena Fuseki | GraphDB (Ontotext) | Stardog |
|---|---|---|---|---|---|
| Bundle | ~3–4 MB | ~200 KB gzipped | n/a (JVM server) | n/a | n/a |
| License | Apache-2.0/MIT | Apache-2.0/MIT | Apache-2.0 | Commercial | Commercial |
| SPARQL 1.1 | Full + RDF 1.2 preview | Full + federation | Full + RDFS/OWL inference | Full + best OWL reasoning | Full + OWL+ICV |
| Browser/Embedded | Yes (in-memory only on Wasm) | Yes (TS-native, lighter) | No | No | No |
| Verdict | Pick (current) | Re-evaluate in Ch 14 (could be lighter) | Tier 3 sidecar option | Reject (commercial) | Reject (commercial) |

Datalog / SSSOM derivation engine — Nemo

| | Nemo (chosen) | Soufflé | CozoDB | RDFox | Differential Datalog |
|---|---|---|---|---|---|
| License | Apache-2.0/MIT | UPL-1.0 | MPL-2.0 | Commercial | MIT (archived) |
| WASM | Yes (shipping) | No (C++→native) | Yes (in-memory only) | No (JVM/native) | Archived |
| Production validation | EBI’s OxO2 (1.16M mappings → 49.5K inferences in 17 min on a laptop) | Industry-grade for static analysis | Slowing in 2024–2025 | Highest-quality OWL 2 RL reasoner | VMware archived |
| Provenance support | Native (“tracing”) | Yes | Yes | Best-in-class (why-provenance) | Yes |
| Verdict | Pick | Reject (no WASM) | Re-evaluate in Ch 14 | Reject (commercial) | Reject (archived) |

Why “layered” and not “single engine”?

Crosswalker has three mathematically distinct workloads at Tier 2:

  • Tabular pivots (analyst spreadsheet) → SQL → DuckDB
  • Ontology semantics (RDF, SKOS, SSSOM standards) → SPARQL → Oxigraph
  • Logical derivation (SSSOM chain rules over crosswalk edges) → Datalog → Nemo

Forcing all three onto a single engine produces 30% great-fit code and 70% awkward workarounds.

Caveat: Grafeo (in Challenge 14) claims to do all three (SQL/PGQ + SPARQL + Cypher + GQL + vector). If it works, the layered stack collapses to one engine.

2.2 Datalog (Nemo) for SSSOM chain-rule derivation

Datalog is a declarative logic-programming language — you write rules like “if A maps to B and B maps to C, then A maps to C with min confidence” once, and the engine derives all consequences. See the verbose Datalog glossary entry for the full explainer of why we use it instead of plain SQL.

Where Nemo lives in the architecture (web-of-webs mapping)

Nemo is not a query engine for live user queries. It is a build-pipeline tier between Tier 1 (canonical files) and Tier 2 (query surface):

Tier 1 (canonical files)

    │  Markdown + SSSOM TSV
    │  (asserted crosswalk edges only)

┌─────────────────────────────────────────────────────────┐
│  Build-pipeline tier  ← NEMO LIVES HERE                 │
│                                                         │
│  Nemo applies SSSOM chain rules:                        │
│    if A→[skos:exactMatch]→B  AND  B→[skos:closeMatch]→C │
│    then derive A→[skos:closeMatch]→C                    │
│    with provenance: derivation_path = [edge1, edge2]    │
│                                                         │
│  Output: NEW derived edges saved alongside              │
│  asserted edges in the canonical files                  │
└─────────────────────────────────────────────────────────┘

    │  Asserted + derived edges (both in files)

Tier 2 (query surface, in-Obsidian)

    │  DuckDB-WASM queries asserted + derived edges as SQL
    │  Oxigraph queries the same graph as RDF/SPARQL


End user sees a complete crosswalk

From the orientation log’s web-of-webs framing:

  • The source-ontology webs (NIST/CIS/MITRE/etc.) start with only their own internal edges (hierarchy, references)
  • Crosswalk edges between source-ontology webs are asserted by the user (or imported from SCF/OLIR)
  • Derived crosswalk edges are produced by Nemo composing pairwise mappings through chains — including chains that route through the optional pivot/spine web

So Nemo’s job: densify the crosswalk web by computing transitive closures over asserted edges using SSSOM chain rules. It runs as a build step (not on every query); derived edges are saved to canonical files alongside asserted ones.
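
The chain-rule derivation described above can be sketched in plain Python as a naive fixpoint loop. This is an illustration of the idea only, not Nemo's rule language or API: the `Edge` type, the `COMPOSE` table (a small assumed subset of predicate compositions), and the min-confidence policy are all assumptions made for the example.

```python
from dataclasses import dataclass

EXACT = "skos:exactMatch"
CLOSE = "skos:closeMatch"

# Assumed composition table: an illustrative subset, not the full SSSOM rule set.
COMPOSE = {
    (EXACT, EXACT): EXACT,
    (EXACT, CLOSE): CLOSE,
    (CLOSE, EXACT): CLOSE,
}


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    object: str
    confidence: float
    path: tuple = ()  # empty for asserted edges, hop list for derived ones


def _hops(edge):
    """Hops behind an edge: its own (subject, object) if asserted."""
    return edge.path or ((edge.subject, edge.object),)


def derive(asserted):
    """Compose edges until no new (subject, predicate, object) triple appears."""
    known = {(e.subject, e.predicate, e.object): e for e in asserted}
    changed = True
    while changed:
        changed = False
        for a in list(known.values()):
            for b in list(known.values()):
                pred = COMPOSE.get((a.predicate, b.predicate))
                if pred is None or a.object != b.subject or a.subject == b.object:
                    continue
                key = (a.subject, pred, b.object)
                if key in known:
                    continue  # first derivation wins in this sketch
                known[key] = Edge(a.subject, pred, b.object,
                                  min(a.confidence, b.confidence),
                                  _hops(a) + _hops(b))
                changed = True
    return [e for e in known.values() if e.path]
```

Given `Edge("A", EXACT, "B", 0.9)` and `Edge("B", CLOSE, "C", 0.8)`, `derive` yields a single derived edge from A to C with predicate `skos:closeMatch`, confidence 0.8, and a two-hop path. A real Datalog engine evaluates such rules far more efficiently (semi-naive evaluation) and tracks provenance natively; the sketch only shows why derived edges can be computed once at build time and stored next to the asserted ones.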

This matches EBI’s OxO2 architecture exactly. Crosswalker is essentially “OxO2 for compliance frameworks.”

2.3 Sigstore/gitsign as configurable alternative

This applies to commit signing when git is in use. GPG/SSH stays the default; Sigstore is an optional swap-in for teams that want federated OIDC-backed signing (the cleanest path to SLSA L3).

Caveat: assumes git is in use at all — see §3.1 on whether that assumption holds in some environments.

2.4 SLSA targeting L1→L2→L3 (explained)

SLSA = Supply-chain Levels for Software Artifacts (slsa.dev/spec/v1.0/). Originally a software-build-integrity framework, it is applied here by analogy to compliance evidence pipelines.

The progression Crosswalker targets:

| Version | SLSA Target | What’s required | Cost |
|---|---|---|---|
| v0.1 | L1 | Documented build process; emit a provenance file alongside each commit (just an in-toto SLSA-Provenance JSON). | Trivial: already implicit in git+commit-signing; just needs an in-toto Provenance predicate emitted on commit |
| v1.0 | L2 | Provenance generation runs in a “hosted” context (CI workflow), digitally signed with a key the user can’t forge. | Modest: requires a Crosswalker-managed pre-commit hook or CI workflow signing the provenance |
| v2.0+ | L3 | Hardened build with non-extractable signing keys (Fulcio’s ephemeral certs are the cleanest way). Requires gitsign + Fulcio + Rekor (Sigstore). | Significant: requires either a managed Crosswalker SaaS-style verifier service OR gitsign with OIDC; this is the architectural argument for Sigstore at v2.0+ |

Why this matters: L1/L2/L3 is auditor vocabulary. Saying “we target SLSA Build L2 by v1.0” gives a SOC 2 / ISO 27001 auditor a familiar handle. It’s a credibility lever, not an engineering checklist.
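
For the v0.1 / L1 row, the provenance file could look roughly like the unsigned in-toto Statement built below. The top-level field names follow the in-toto Statement v1 and SLSA Provenance v1 layouts; the `buildType` URI, the `externalParameters` content, and the builder id are placeholders invented for this sketch, not values Crosswalker defines.

```python
import hashlib
import json


def provenance_statement(artifact_name, artifact_bytes, builder_id):
    """Build an unsigned in-toto Statement carrying a SLSA v1 provenance predicate."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": artifact_name, "digest": {"sha256": digest}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                # Hypothetical build type URI; a real one would be documented by the project.
                "buildType": "https://example.org/crosswalker/vault-build/v1",
                "externalParameters": {"vault": "."},  # hypothetical parameter
                "internalParameters": {},
                "resolvedDependencies": [],
            },
            "runDetails": {"builder": {"id": builder_id}},
        },
    }


if __name__ == "__main__":
    stmt = provenance_statement("mappings.sssom.tsv", b"example bytes",
                                "https://example.org/local-builder")
    print(json.dumps(stmt, indent=2))
```

At L1 the statement only needs to exist and describe the build. L2 and L3 add the requirement that it be produced and signed somewhere the user cannot forge, which is where the CI workflow and Sigstore rows of the table come in.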

2.5 Materialized-folder Tier 1 generator

The plugin auto-generates Bases-compatible folders containing pre-joined/merged data so users can browse cross-tabs that Bases otherwise cannot compute (Bases has no joins and no pivots). This survives any Tier 2 engine choice: the concept is independent of engine selection.
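
A minimal sketch of the materialized-folder idea: pre-join the mappings in code and emit one note per source control, with the joined targets stored as frontmatter properties that a Bases view can read without computing a join itself. The `_materialized/` folder name, the property layout, and the input tuple shape are assumptions for illustration.

```python
from collections import defaultdict


def materialize_notes(mappings):
    """Return {relative_path: markdown_text}, one pre-joined note per source control.

    `mappings` is an iterable of (source_control, target_framework, target_control).
    """
    joined = defaultdict(lambda: defaultdict(list))
    for source, framework, target in mappings:
        joined[source][framework].append(target)

    notes = {}
    for source in sorted(joined):
        lines = ["---", f'control: "{source}"']
        for framework in sorted(joined[source]):
            # One list-valued property per target framework acts as a pivot column.
            lines.append(f"{framework}:")
            lines += [f'  - "{t}"' for t in sorted(joined[source][framework])]
        lines += ["---", ""]
        filename = source.replace(":", "_") + ".md"
        notes[f"_materialized/{filename}"] = "\n".join(lines)
    return notes
```

The function returns the file contents rather than writing them, so the same logic could sit behind any Tier 2 engine: whichever engine computes the join, the output is still a folder of plain notes.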

2.6 TerminusDB as optional vault-mirror only

If a user wants Git-style branch/diff/merge over the curated crosswalk graph, TerminusDB can be deployed as a parallel governance database that reads from the canonical Markdown vault. It is not the system of record; not the default Tier 3.

§3 Needs more research (with new challenges spun up)

3.1 Tier 1 audit-trail — 4-tier model with OpenTimestamps T2 default

Resolved by Challenge 15 deliverable — see third-wave log §4.

A 4-tier audit-trail model (T0 floor / T1 credible / T2 defensible / T3 court-defensible) with .audit/chain.jsonl as the universal substrate that works in both git and non-git modes.

| Tier | Default? | Substrate |
|---|---|---|
| T0 Floor | | Edit History plugin or Obsidian Sync version history (no cryptographic guarantee) |
| T1 Credible | | .audit/chain.jsonl with prev_hash links + Ed25519 signatures, vault-anchored |
| T2 Defensible | ✅ New default | T1 + OpenTimestamps .ots on signed chain checkpoints (free, decentralized, offline-buffered, license-free) |
| T3 Court-Defensible | | T2 + (a) FRE 902(13) PDF + S3 Object Lock; or (b) eIDAS qualified TSA + W3C VC; or (c) Sigstore Rekor v2 + in-toto |

The signed-commit + RFC 3161 + S3 Object Lock + FRE 902(13) + in-toto stack from Ch 08+13 is kept as a first-class option but no longer the default. It is now Tier 3 architecture A — recommended for users who already have signed-commit workflows, deploy to AWS or another Object-Lock-capable provider, and whose auditors specifically request “WORM + 902(13) PDF”. For the great majority of GRC consultancy work, T2 with OpenTimestamps reaches a comparable evidentiary standard without requiring git.

  • PQC migration plan: 2026 (Ed25519) → 2027 (dual-sign Ed25519 + ML-DSA-44) → 2030 (deprecate Ed25519-only) → 2032 (fully PQC), well ahead of NIST IR 8547’s 2035 deadline.
  • Single audit-ready badge with progressive disclosure: gray T0 / blue T1 / green T2 / gold T3, plus honest tier-floor messaging (“T0 — version history only” is not “audit-ready”).
  • External CLI verifier (crosswalker-verify) with zero Obsidian dependencies, runnable on the auditor’s machine.
  • Per-persona tier mapping — see third-wave log §4.4 for the full table (solo consultant US/EU, locked-down enterprise, federal/air-gapped, multi-tenant team, EU AI Act / DORA / NIS2).
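
The prev_hash linkage behind `.audit/chain.jsonl`, and the kind of check an external verifier would run, can be sketched with the standard library alone. The entry fields (`seq`, `prev_hash`, `event`, `hash`) and the all-zero genesis value are assumptions, not the actual chain format. The Ed25519 signatures that T1 requires are deliberately left out because Python's standard library has no Ed25519 implementation; a real verifier would also check a signature on each entry.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def _entry_hash(entry):
    """SHA-256 over the canonical JSON of the entry, excluding its own hash."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(chain, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"seq": len(chain), "prev_hash": prev, "event": event}
    entry["hash"] = _entry_hash(entry)
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Recompute every hash and link; False on any gap, reorder, or edit."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry.get("seq") != i or entry.get("prev_hash") != prev:
            return False
        if entry.get("hash") != _entry_hash(entry):
            return False
        prev = entry["hash"]
    return True


def to_jsonl(chain):
    """Serialize one JSON object per line, as a .jsonl file would store it."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in chain)
```

Because each entry commits to the hash of its predecessor, editing any past event invalidates every later link. That property is what makes a checkpoint over the latest hash worth timestamping at T2.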

3.2 Tier 3 stack — default flipped from AGE to Fuseki/oxigraph-server

Resolved by Challenge 16 deliverable — see third-wave log §3.

| Profile | New Tier 3 default | Why |
|---|---|---|
| Default: small GRC team, ≤500k mappings | Apache Jena Fuseki | Apache TLP governance (multi-employer PMC, no key-person risk); ~2 decades of releases; SPARQL 1.1 + RDFS/OWL inference + SHACL; safest 5–10-year bet |
| Same-API lighter alternative | oxigraph-server | Architectural symmetry with Tier 2 (same engine, just oxigraph serve); single Rust binary or Docker container; smaller footprint than Fuseki’s JVM |
| Power user: multi-team, mixed SQL+graph, multi-million mappings | Layered Fuseki + DuckDB-on-server | Federated via SPARQL SERVICE and DuckDB httpfs / postgres_scanner. Crossover point: above ~250k mappings with mixed workloads |
| Postgres-standardized shop | Apache AGE (kept as supported fallback) or plain Postgres + JSONB + recursive CTEs | AGE is the only option that lives inside an existing Postgres; the boring SQL option remains viable for ≥90% of queries |

The user’s concern about Apache AGE was substantiated: a sponsor pivot (Bitnine → SKAI Worldwide moved into AI advertising), the November 2024 PG 17.1 ABI break that hit AGE alongside TimescaleDB, a slow per-PG-line release cadence (PG 18 support landed late 2025/early 2026), and Apache board minutes reporting “reduced activity year-over-year” with no new committers. AGE remains supported as a fallback because its killer feature, running graph queries inside an existing Postgres instance with shared transactions and indexes, has no substitute for Postgres-standardized environments.

Migration is a re-projection, not a translation

The architectural payoff of files-canonical: because mappings are canonically SSSOM (markdown + YAML in the vault), any database is by definition a projection of the canonical files, not the source of truth. AGE→Fuseki migration is “re-run Crosswalker’s SSSOM-to-RDF projector against the new engine,” not “translate AGE data to Fuseki data.” See third-wave log §3.6 for concrete steps.
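
To make "re-projection" concrete, here is a minimal sketch of an SSSOM-to-RDF projector that turns SSSOM TSV rows into N-Triples lines. It is not Crosswalker's actual projector: the prefix map (including the example.org namespaces) is assumed, and only the three core SSSOM columns `subject_id`, `predicate_id`, and `object_id` are used.

```python
# Assumed CURIE prefix map; the nist/cis namespaces are placeholders.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "nist": "https://example.org/nist/",
    "cis": "https://example.org/cis/",
}


def expand(curie):
    """Expand a CURIE such as 'skos:closeMatch' to a full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def project(sssom_tsv):
    """Project SSSOM TSV text to a list of N-Triples lines."""
    # SSSOM TSV files carry their metadata block in leading '#' comment lines.
    lines = [ln for ln in sssom_tsv.splitlines()
             if ln.strip() and not ln.startswith("#")]
    header = lines[0].split("\t")
    triples = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("\t")))
        triples.append("<%s> <%s> <%s> ." % (
            expand(row["subject_id"]),
            expand(row["predicate_id"]),
            expand(row["object_id"]),
        ))
    return triples
```

The output is plain N-Triples, a serialization that both Fuseki and oxigraph-server can load. Switching engines therefore means re-running the projection against the same canonical files rather than converting one database's contents into another's.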

Other candidates were considered and not adopted: HelixDB (AGPL + YC-stage + custom DSL); ArcadeDB (small contributor base, but Apache-2.0 and built-in MCP); SurrealDB (BSL license is the procurement blocker). Each is interesting; none is mature or governance-stable enough to sit under a small open-source GRC tool with a 5–10-year horizon today.

3.3 OSCAL native support — placement in the web-of-webs

OSCAL is not an internal data model. It is a federally recognized wire format for export/import:

Tier 1 (canonical Crosswalker vault)

    │  Markdown + SSSOM TSV (internal source of truth)


═════════════════════════════════════════════════════════
EXPORT/IMPORT BOUNDARY  ← OSCAL LIVES HERE
═════════════════════════════════════════════════════════

    │  Crosswalker exports source-ontology webs as OSCAL `catalog` JSON
    │  Crosswalker exports crosswalk edges as OSCAL `mapping` records
    │     (Control Mapping Model, currently NIST pre-release)
    │  Crosswalker exports evidence-link junction notes as
    │     `assessment-result/observation` records

    │  AND vice versa (OSCAL imports into Crosswalker vault)


External GRC tooling, federal authorisation packages,
agency consumers (FedRAMP, ATO packages, etc.)

OSCAL is the wire format between Crosswalker’s web-of-webs and external federal GRC systems. It is not part of the web-of-webs itself; it is the edge of it — the export/import boundary where the internal SSSOM-flavored data is reshaped into NIST’s preferred JSON/XML/YAML formats.

In the orientation log’s web-of-webs diagram, OSCAL is the port through which:

  • Source-ontology webs ↔ NIST OSCAL catalogs (machine-readable control catalogs)
  • Crosswalk edges ↔ OSCAL Control Mapping Model records
  • Evidence-vault web ↔ OSCAL Assessment Results
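
The first of these mappings, a source-ontology web exported as an OSCAL catalog, might be serialized along these lines. The sketch emits only the small set of fields the OSCAL catalog JSON format expects at the top level (`uuid`, `metadata`, `controls`) and has not been validated against the NIST schema. The input control shape and the `version` value are assumptions.

```python
import json
import uuid
from datetime import datetime, timezone


def export_catalog(title, controls, oscal_version="1.1.2"):
    """Wrap a flat list of controls in a minimal OSCAL catalog document."""
    return {
        "catalog": {
            "uuid": str(uuid.uuid4()),
            "metadata": {
                "title": title,
                "last-modified": datetime.now(timezone.utc).isoformat(),
                "version": "0.1.0",  # assumed document version
                "oscal-version": oscal_version,
            },
            # Only id and title are carried over in this sketch.
            "controls": [{"id": c["id"], "title": c["title"]} for c in controls],
        }
    }


if __name__ == "__main__":
    doc = export_catalog("Demo catalog",
                         [{"id": "ac-2", "title": "Account Management"}])
    print(json.dumps(doc, indent=2))
```

Because the function is a pure serializer over data that already lives in the vault, it fits the point made in this section: OSCAL sits at the export/import boundary and does not alter the internal SSSOM model.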

Why we’d promote it from “feature” to “architectural concern”

| Consideration | Argument |
|---|---|
| FedRAMP RFC-0024 | Mandates machine-readable authorisation packages by Sept 2026. Federal customers will require OSCAL. |
| Credibility multiplier | Even non-federal customers see OSCAL native support as evidence of auditor-grade data modeling. |
| Adjacent ecosystem | Many commercial GRC tools (Hyperproof, Drata, Vanta, AuditBoard, RegScale) already produce OSCAL output. Crosswalker as OSCAL bridge = useful glue. |
| No internal compromise | We don’t change SSSOM-internally; OSCAL is just a serializer/deserializer. |

  • Add crosswalker import oscal <file> and crosswalker export oscal --type {catalog,profile,ssp,ar,...} commands
  • Make OSCAL round-trip a tested feature (not best-effort)
  • Add Foundation roadmap item for OSCAL native support

Decision (received 2026-05-02): ✅ yes, but after core working. User read on OSCAL: it sounds a lot like a custom schema we map to (i.e., a wire format on the export/import boundary, not an internal model). Treat it as that, document it in Crosswalker’s mental-model vocabulary, and don’t promote it to architectural concern until the core SSSOM-internal pipeline is solid.

Action items:

  • Update reference/registry/oscal.mdx to map OSCAL into Crosswalker’s mental model (web-of-webs framing; map OSCAL-native terms to Crosswalker synonyms — catalog ↔ source-ontology web, Control Mapping Model ↔ crosswalk edges, Assessment Result ↔ junction notes / evidence-vault web)
  • Cross-link from any page that mentions OSCAL back to that page
  • Roadmap placement: deferred until core export/import boundary is well-defined; flagged for Phase 2+ rather than Foundation

3.4 Grafeo evaluation — resolved by Ch 14

Resolved. Folded into the Ch 14 deliverable §2.1. Verdict: track in long-horizon list with explicit migration triggers; do not adopt yet. Genuinely impressive surface area (LPG+RDF+six query languages+HNSW+CDC+WASM+IndexedDB persistence), but ~6 months old, v0.5.x, ~582 stars, single-sponsor (Supernovae), vendor-only benchmarks, no W3C SPARQL conformance proof. A 3-year survival probability of 50–60% is the honest estimate.

Migration trigger A spells out the conditions under which Grafeo would collapse the Tier 2 stack to a single engine.

4.1 LinkML as canonical schema substrate (idea bucket)

Major architectural pivot. Ch 12b deliverable B made the strongest argument: LinkML auto-generates JSON Schema, OWL, SHACL, Pydantic, TypeScript from a single YAML; SSSOM is itself defined in LinkML; could be Crosswalker’s “Tier 0”.

Real benefit: decouples engines from schema authority. Every engine becomes a serializer/deserializer plugin against the canonical LinkML schema rather than a competing schema authority.

Cost: cascading commitment across every Crosswalker schema (StewardshipProfile, junction-note 13-field, FrameworkConfig, _crosswalker metadata, etc.).

Park for now. Spin up a future Challenge 17 if interest renews after Ch 14/15/16 land.