Challenge 33: Multi-modal query engine landscape audit + revisit of prior commitments

Crosswalker’s v0.1 engine substrate is @sqlite.org/sqlite-wasm + (deferred) sqlite-vec per Ch 24 synthesis. Tier 3 server-side path is Apache Jena Fuseki + oxigraph-server per Ch 16 synthesis. These choices were made:

  • Under a GRC framing (small-to-medium vaults; bounded scale)
  • Without surveying multi-modal DBs (DuckDB, Polars, Cozo, ClickHouse, Materialize, etc.)
  • Before the ontology-web positioning was made explicit

The 2026-05-08 alignment review surfaced that v0.1.7+ commitments rest on assumptions that haven’t been adversarially tested under the ontology-web framing. Ch 33 runs that test.

| Asset | What it gives us |
| --- | --- |
| Ch 11 synthesis | Picked layered Tier 2 stack (DuckDB-WASM + Oxigraph + Nemo); later pivoted to sqlite-wasm-only |
| Ch 14 synthesis | Added Tier 2-Lite (sqlite-wasm + sqlite-vec + recursive CTE) for Obsidian Mobile / low-end |
| Ch 16 synthesis | Demoted Apache AGE; promoted Fuseki/oxigraph-server as Tier 3 default |
| Ch 18 synthesis | ~100K mapping ceiling for sqlite-wasm + recursive CTE |
| Ch 24 synthesis | Rejected libSQL/Turso/Limbo; confirmed sqlite-wasm + sqlite-vec; 5 explicit migration triggers locked |
| Ch 25 / WASM-A pivot | Pivoted to plain sqlite-wasm; sqlite-vec deferred; 2026-11-06 revisit anchored |
| v0.1.5 Tier 2 sidecar shipped | Working sqlite-wasm sidecar with closure cache (recursive CTE) |
| Plugin handles | plugin.queryConcepts/Crosswalk/Closure already shipped against sqlite-wasm |
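
For readers unfamiliar with the closure-cache approach named in the table, here is a minimal sketch of a transitive closure computed with a recursive CTE and materialized into a cache table. It uses Python's standard-library `sqlite3` as a stand-in for sqlite-wasm. The `mapping` and `closure_cache` tables and both helper functions are hypothetical names chosen for illustration, not the shipped sidecar schema.

```python
import sqlite3

# Hypothetical schema: one row per directed mapping between concept IDs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mapping (src TEXT NOT NULL, dst TEXT NOT NULL);
    CREATE TABLE closure_cache (
        src TEXT NOT NULL,
        dst TEXT NOT NULL,
        PRIMARY KEY (src, dst)
    );
""")
conn.executemany(
    "INSERT INTO mapping (src, dst) VALUES (?, ?)",
    # The last edge closes a cycle: B -> C -> D -> B.
    [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")],
)


def reachable(conn, start):
    """Return every concept reachable from `start` via one or more mappings."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM mapping WHERE src = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT m.dst FROM mapping AS m JOIN reach AS r ON m.src = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    return [node for (node,) in rows]


def refresh_cache(conn, start):
    """Materialize the closure for one start node into the cache table."""
    conn.execute("DELETE FROM closure_cache WHERE src = ?", (start,))
    conn.executemany(
        "INSERT INTO closure_cache (src, dst) VALUES (?, ?)",
        [(start, node) for node in reachable(conn, start)],
    )


refresh_cache(conn, "A")
cached = [
    dst
    for (dst,) in conn.execute(
        "SELECT dst FROM closure_cache WHERE src = 'A' ORDER BY dst"
    )
]
print(cached)
```

The cost of the recursive step grows with the number of mappings visited, which is why closure latency is the natural metric to watch as vault size increases.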

1. Survey multi-modal databases — substrate alternatives at ontology-web scale

For each of the following, document:

  • Scale ceilings: at what vault size does the engine break? Latency profile vs concept count?
  • Query model: SQL, Datalog, SPARQL, Cypher, native API, GraphQL?
  • Multi-paradigm support: relational + graph + KV + time-series + vector? Which combinations?
  • Embedded vs server: WASM-able? Browser-runnable? Memory-bound?
  • Streaming / chunked execution: out-of-core? Spill-to-disk? Incremental view maintenance?
  • License + governance: OSS license? Vendor concentration? Long-term sustainability?

Engines to survey:

| Engine | Notes |
| --- | --- |
| DuckDB + DuckPGQ (https://duckdb.org/docs/extensions/duckpgq) | Columnar OLAP + property graph extension; out-of-core; WASM build |
| Polars (https://pola.rs/) | Columnar / Arrow-native; lazy + streaming engine; Python + Rust + JS |
| Cozo (https://www.cozodb.org/) | Datalog + relational + graph; Rust; WASM build |
| Oxigraph-WASM (https://github.com/oxigraph/oxigraph) | SPARQL in Rust/WASM; embedded |
| Stardog (https://www.stardog.com/) | Knowledge graph + reasoning; commercial |
| RDF4J (https://rdf4j.org/) | Java RDF stack; SPARQL; commercial-friendly OSS |
| Apache Jena + Fuseki | Java RDF stack; current Crosswalker Tier 3 default |
| Materialize (https://materialize.com/) | Streaming SQL; incremental view maintenance |
| Datomic (https://www.datomic.com/) | Immutable indexes; Datalog query; commercial |
| ClickHouse (https://clickhouse.com/) | Columnar OLAP; massive scale; SQL |
| LanceDB (https://lancedb.com/) | Vector + relational; embedded; Rust |
| TerminusDB (https://terminusdb.com/) | Document + graph; Rust core; vault-mirror potential |
| Neo4j (https://neo4j.com/) | Native graph; Cypher; commercial-OSS |
| GraphDB (https://graphdb.ontotext.com/) | RDF; SHACL; commercial-OSS |
| HelixDB (https://github.com/HelixDB/helix-db) | Graph-DB; pre-1.0 |
| SurrealDB (https://surrealdb.com/) | Multi-model; BSL-licensed; flagged in Ch 14 as REJECT |

For each: what would Crosswalker GAIN by adopting? What would it LOSE? What’s the migration cost from sqlite-wasm?

2. Re-audit prior Crosswalker commitments under ontology-web framing

For each of Ch 10, Ch 11, Ch 12, Ch 14, Ch 16, Ch 18, Ch 24:

  • Does the verdict still hold under ontology-web framing?
  • Were assumptions GRC-specific that don’t generalize?
  • Are migration triggers still right? Earlier? Later?
  • Should the verdict be REVISED, REAFFIRMED, or DEFERRED to a follow-on rerun (Ch 35/36/37)?

For each substrate alternative + Crosswalker’s current sqlite-wasm:

  • Small-vault scale (~10K concepts, ~5K mappings; current GRC bound): performance? acceptable?
  • Medium-vault scale (~100K concepts, ~50K mappings): performance? memory? closure-query cost?
  • Large-vault scale (~1M concepts, ~500K mappings; OLIR-scale): performance? feasibility?
  • Ontology-web scale (~10M concepts; UMLS/MeSH/OBO Foundry-scale): feasibility? required architecture?

Output: scale × engine matrix showing where each substrate breaks and what alternatives unlock the next tier.
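
One possible shape for that matrix is sketched below, under loud assumptions. It reduces each engine to a single mapping-count ceiling and classifies each tier as `ok`, `breaks`, or `unknown`. Only the sqlite-wasm ceiling (~100K mappings, from Ch 18) and the tier mapping counts come from this brief; every other ceiling is left as `None` because the survey has not been run yet. The function names are hypothetical, and a real matrix would presumably record measured latency and memory rather than one threshold.

```python
from typing import Optional

# Mapping counts per tier, taken from the tier definitions above.
# The ontology-web tier states only a concept count (~10M), so its
# mapping count is left as None rather than guessed.
TIER_MAPPINGS: dict[str, Optional[int]] = {
    "small": 5_000,
    "medium": 50_000,
    "large": 500_000,
    "ontology-web": None,
}

# Mapping ceilings per engine. Only the sqlite-wasm figure comes from
# this brief (Ch 18); None means "not yet surveyed".
ENGINE_CEILING: dict[str, Optional[int]] = {
    "sqlite-wasm + recursive CTE": 100_000,
    "DuckDB + DuckPGQ": None,
    "Cozo": None,
}


def classify(ceiling: Optional[int], mappings: Optional[int]) -> str:
    """Compare one tier's mapping count against one engine's ceiling."""
    if ceiling is None or mappings is None:
        return "unknown"
    return "ok" if mappings <= ceiling else "breaks"


def build_matrix(ceilings, tiers):
    """Return {engine: {tier: status}} for every engine/tier pair."""
    return {
        engine: {tier: classify(ceiling, count) for tier, count in tiers.items()}
        for engine, ceiling in ceilings.items()
    }


def render(matrix, tiers) -> str:
    """Format the matrix as pipe-separated text, one engine per line."""
    lines = [" | ".join(["engine", *tiers])]
    for engine, row in matrix.items():
        lines.append(" | ".join([engine, *(row[tier] for tier in tiers)]))
    return "\n".join(lines)


matrix = build_matrix(ENGINE_CEILING, TIER_MAPPINGS)
print(render(matrix, TIER_MAPPINGS))
```

Keeping unsurveyed cells as `unknown` rather than a guessed value makes it visible which parts of the matrix still need evidence.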

The current Tier 2 stack decouples vector (sqlite-vec) from substrate (sqlite-wasm). Under ontology-web framing, vector queries become more important (semantic similarity across ontologies). Survey:

  • LanceDB (native vector + relational)
  • DuckDB + vector extensions
  • ClickHouse + vector
  • Embedded vector stores (Chroma, Qdrant, Faiss-WASM)

Does decoupling still make sense? Or does an integrated multi-modal substrate (LanceDB?) win on cross-modal queries?
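
To make the decoupling question concrete, the sketch below shows what "vector layer decoupled from substrate" could look like as an interface: calling code depends only on a small protocol, and the backing store can be swapped. The `VectorIndex` protocol, the `BruteForceIndex` class, and the concept IDs are all hypothetical illustrations, not Crosswalker's actual API. The brute-force cosine search stands in for whatever sqlite-vec, LanceDB, or another store would provide.

```python
import math
from typing import Protocol, Sequence


class VectorIndex(Protocol):
    """Hypothetical interface any vector backend would need to satisfy."""

    def add(self, concept_id: str, vector: Sequence[float]) -> None: ...

    def nearest(self, query: Sequence[float], k: int) -> list[tuple[str, float]]: ...


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BruteForceIndex:
    """Reference backend: exact cosine similarity over an in-memory dict."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, concept_id: str, vector: Sequence[float]) -> None:
        self._vectors[concept_id] = list(vector)

    def nearest(self, query: Sequence[float], k: int) -> list[tuple[str, float]]:
        scored = [(cid, _cosine(query, vec)) for cid, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def similar_concepts(index: VectorIndex, query: Sequence[float], k: int = 3) -> list[str]:
    """Caller code depends only on the protocol, not on the backing store."""
    return [cid for cid, _ in index.nearest(query, k)]


index = BruteForceIndex()
index.add("concept-a", [1.0, 0.0])
index.add("concept-b", [0.9, 0.1])
index.add("concept-c", [0.0, 1.0])
print(similar_concepts(index, [1.0, 0.0], k=2))
```

An integrated multi-modal substrate would collapse this boundary. The trade-off to evaluate is whether cross-modal queries that join vector results against relational or graph data are awkward enough across this interface to justify losing the ability to swap backends.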

Ch 24’s 5 migration triggers might shift earlier under ontology-web framing. Re-audit each:

  1. Vector extension packaging — does ontology-web framing change urgency?
  2. WASM bundle size — does the larger primitive set (Ch 29) blow the budget?
  3. Closure query latency — at ontology-web scale, what’s the real ceiling?
  4. Mobile / low-end performance — does the larger query surface make mobile harder?
  5. Federation requirement — does ontology-web cross-ontology query make federation more pressing?

Output: revised trigger list + new triggers if needed.
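
A trigger is only useful if it can be checked against a measurement. The sketch below encodes triggers as data with a metric name, a threshold, and a comparison, then evaluates them against a dict of measurements. The metric names, threshold values, and sample measurements are placeholders invented for illustration; the actual Ch 24 thresholds are not stated in this brief.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trigger:
    name: str
    metric: str  # key to look up in the measurements dict
    threshold: float
    fires_when: Callable[[float, float], bool]  # called as (measured, threshold)


# Placeholder thresholds, NOT the values locked in Ch 24.
TRIGGERS = [
    Trigger("Closure query latency", "closure_p95_ms", 500.0, operator.gt),
    Trigger("WASM bundle size", "bundle_mb", 5.0, operator.gt),
]


def fired(triggers, measurements):
    """Return the names of triggers whose condition is met.

    A trigger with no measurement is treated as not fired, so an
    unmeasured metric can never force a migration on its own.
    """
    result = []
    for trigger in triggers:
        value = measurements.get(trigger.metric)
        if value is not None and trigger.fires_when(value, trigger.threshold):
            result.append(trigger.name)
    return result


# Invented sample measurements for the usage example.
sample = {"closure_p95_ms": 820.0, "bundle_mb": 3.2}
print(fired(TRIGGERS, sample))
```

Expressing triggers this way forces each one to name a measurable quantity. A trigger such as "Federation requirement" may not reduce to a single number and might need a boolean or checklist form instead.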

6. Substrate-neutral architecture verification

Ch 24 Settled #4 committed to “vector layer is decoupled from substrate” as a load-bearing modularity commitment. Verify this still holds:

The deliverable must NOT recommend:

  1. Migrating off sqlite-wasm without a strong concrete trigger — Ch 24 confirmed sqlite-wasm; reversal requires hard evidence
  2. Adopting an engine that violates Mobile / Capacitor constraints (Crosswalker must run on Obsidian Mobile, which has no SharedArrayBuffer and has OPFS limitations)
  3. Reintroducing libSQL / Turso / Limbo — Ch 24 rejected explicitly
  4. Adopting commercial-OSS engines with vendor concentration — Datomic, Stardog, GraphDB are flagged; community substrates preferred
  5. Forking the engine internals — Crosswalker is a plugin, not a database product
  6. Multi-engine deployments in v0.1 — at most one Tier 2 substrate; one Tier 3 substrate
  7. Speculative migration without a working prototype — recommendations to migrate must include a feasible POC path

The deliverable must produce:

  1. Engine survey matrix — 15+ engines × 8 dimensions (scale / query model / multi-paradigm / embed / streaming / license / vendor concentration / Mobile feasibility)
  2. Re-audit verdicts — for each of Ch 10/11/12/14/16/18/24: REAFFIRMED / REVISED / DEFERRED-TO-RERUN
  3. Scale × engine matrix — performance estimates at small / medium / large / ontology-web scale
  4. Vector-layer decoupling verification — does the architecture still support substrate swap?
  5. Migration trigger updates — revised list of falsifiable conditions to migrate off sqlite-wasm
  6. Recommended v0.1.7+ commitments — substrate, vector, query language, scale path
  7. Concrete next-action items — what should v0.1.7 milestone scope include based on findings
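
The first deliverable has a checkable shape (15+ engines, 8 dimensions each), so a completeness check can be sketched as below. The dimension keys follow the list in item 1. The function name, the dict layout, and the sample rows are assumptions made for illustration.

```python
# Dimension keys follow deliverable item 1; the snake_case spelling is an assumption.
DIMENSIONS = (
    "scale",
    "query_model",
    "multi_paradigm",
    "embed",
    "streaming",
    "license",
    "vendor_concentration",
    "mobile_feasibility",
)
MIN_ENGINES = 15


def validate_survey(rows: dict[str, dict[str, str]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means complete."""
    problems = []
    if len(rows) < MIN_ENGINES:
        problems.append(
            f"only {len(rows)} engines surveyed; need at least {MIN_ENGINES}"
        )
    for engine, cells in rows.items():
        # A dimension counts as missing if it is absent or blank.
        missing = [dim for dim in DIMENSIONS if not cells.get(dim, "").strip()]
        if missing:
            problems.append(f"{engine}: missing {', '.join(missing)}")
    return problems


# Invented sample rows: one complete, one with a blank license cell.
complete_row = {dim: "filled" for dim in DIMENSIONS}
partial_row = {**complete_row, "license": ""}
problems = validate_survey({"engine-a": complete_row, "engine-b": partial_row})
print("\n".join(problems))
```

This only checks that cells are present. It says nothing about whether their content is sourced or correct, which remains a review task.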

Project context:

Engines (priority survey targets):

Adjacent Crosswalker challenges:

Write the deliverable to docs/.../zz-research/YYYY-MM-DD-challenge-33-deliverable-a-<slug>.md. After the deliverable lands: update the synthesis log §9 status row for Ch 33 from ⏳ to ✅; update the Ch 24 migration triggers per findings; flip prior-challenge verdicts as needed (REAFFIRMED / REVISED); update the v0.1.7 milestone scope; archive this brief.