Challenge 33: Multi-modal query engine landscape audit + revisit of prior commitments

Created May 8, 2026 · Updated Jun 1, 2026

Why this exists

Crosswalker’s v0.1 engine substrate is @sqlite.org/sqlite-wasm plus (deferred) sqlite-vec, per the Ch 24 synthesis. The Tier 3 server-side path is Apache Jena Fuseki + oxigraph-server, per the Ch 16 synthesis. These choices were made:

Under a GRC framing (small-to-medium vaults; bounded scale)
Without surveying multi-modal DBs (DuckDB, Polars, Cozo, ClickHouse, Materialize, etc.)
Before the ontology-web positioning was made explicit

The 2026-05-08 alignment review surfaced that the v0.1.7+ commitments rest on assumptions that have not been adversarially tested under the ontology-web framing. Ch 33 runs that test.

What we already have

Asset	What it gives us
Ch 11 synthesis	Picked layered Tier 2 stack (DuckDB-WASM + Oxigraph + Nemo) — but later pivoted to sqlite-wasm-only
Ch 14 synthesis	Added Tier 2-Lite (sqlite-wasm + sqlite-vec + recursive CTE) for Obsidian Mobile / low-end
Ch 16 synthesis	Demoted Apache AGE; promoted Fuseki/oxigraph-server as Tier 3 default
Ch 18 synthesis	~100K mapping ceiling for sqlite-wasm + recursive CTE
Ch 24 synthesis	Rejected libSQL/Turso/Limbo; confirmed sqlite-wasm + sqlite-vec; 5 explicit migration triggers locked
Ch 25 / WASM-A pivot	Pivoted to plain sqlite-wasm; sqlite-vec deferred; 2026-11-06 revisit anchored
v0.1.5 Tier 2 sidecar shipped	Working sqlite-wasm sidecar with closure cache (recursive CTE)
Plugin handles	`plugin.queryConcepts/Crosswalk/Closure` already shipped against sqlite-wasm

What to investigate

1. Multi-modal engine landscape survey

For each engine in the survey list below, document:

Scale ceilings: at what vault size does the engine break? Latency profile vs concept count?
Query model: SQL, Datalog, SPARQL, Cypher, native API, GraphQL?
Multi-paradigm support: relational + graph + KV + time-series + vector? Which combinations?
Embedded vs server: WASM-able? Browser-runnable? Memory-bound?
Streaming / chunked execution: out-of-core? Spill-to-disk? Incremental view maintenance?
License + governance: OSS license? Vendor concentration? Long-term sustainability?

Engines to survey:

Engine	Notes
DuckDB + DuckPGQ (https://duckdb.org/docs/extensions/duckpgq)	Columnar OLAP + property graph extension; out-of-core; WASM build
Polars (https://pola.rs/)	Columnar / Arrow-native; lazy + streaming engine; Python + Rust + JS
Cozo (https://www.cozodb.org/)	Datalog + relational + graph; Rust; WASM build
Oxigraph-WASM (https://github.com/oxigraph/oxigraph)	SPARQL in Rust/WASM; embedded
Stardog (https://www.stardog.com/)	Knowledge graph + reasoning; commercial
RDF4J (https://rdf4j.org/)	Java RDF stack; SPARQL; commercial-friendly OSS
Apache Jena + Fuseki	Java RDF stack; current Crosswalker Tier 3 default
Materialize (https://materialize.com/)	Streaming SQL; incremental view maintenance
Datomic (https://www.datomic.com/)	Immutable indexes; Datalog query; commercial
ClickHouse (https://clickhouse.com/)	Columnar OLAP; massive scale; SQL
LanceDB (https://lancedb.com/)	Vector + relational; embedded; Rust
TerminusDB (https://terminusdb.com/)	Document + graph; Prolog server over a Rust storage layer; vault-mirror potential
Neo4j (https://neo4j.com/)	Native graph; Cypher; commercial-OSS
GraphDB (https://graphdb.ontotext.com/)	RDF; SHACL; commercial with a free edition
HelixDB (https://github.com/HelixDB/helix-db)	Graph-DB; pre-1.0
SurrealDB (https://surrealdb.com/)	Multi-model; BSL-licensed; flagged in Ch 14 as REJECT

For each: what would Crosswalker GAIN by adopting? What would it LOSE? What’s the migration cost from sqlite-wasm?

2. Re-audit prior Crosswalker commitments under ontology-web framing

For each of Ch 10, Ch 11, Ch 12, Ch 14, Ch 16, Ch 18, Ch 24:

Does the verdict still hold under ontology-web framing?
Were assumptions GRC-specific that don’t generalize?
Are the migration triggers still right, or should they fire earlier or later?
Should the verdict be REVISED, REAFFIRMED, or DEFERRED to a follow-on rerun (Ch 35/36/37)?

3. Scale models

For each substrate alternative + Crosswalker’s current sqlite-wasm:

Small-vault scale (~10K concepts, ~5K mappings; current GRC bound): performance? acceptable?
Medium-vault scale (~100K concepts, ~50K mappings): performance? memory? closure-query cost?
Large-vault scale (~1M concepts, ~500K mappings; OLIR-scale): performance? feasibility?
Ontology-web scale (~10M concepts; UMLS/MeSH/OBO Foundry-scale): feasibility? required architecture?

Output: scale × engine matrix showing where each substrate breaks and what alternatives unlock the next tier.
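A common baseline for the matrix could be a synthetic graph generated at each tier, with closure timed the same way against every engine. The sketch below is a minimal in-memory version of such a harness. The tier sizes come from this brief; everything else (the names `TIERS`, `syntheticGraph`, `measure`, the uniform-random edge model, and the sample size) is an assumption for illustration. Real crosswalk graphs are not uniformly random, so the numbers it produces are only a floor to compare engines against, not a prediction.

```typescript
// Tier sizes as stated in this brief. The ontology-web tier lists no mapping
// count, so it is omitted here rather than guessed.
interface Tier { concepts: number; mappings: number; }
const TIERS = {
  small: { concepts: 10000, mappings: 5000 },
  medium: { concepts: 100000, mappings: 50000 },
  large: { concepts: 1000000, mappings: 500000 },
};

// Small deterministic generator so that runs are repeatable across engines.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// ASSUMPTION: uniform random edges; real mapping graphs are more structured.
function syntheticGraph(tier: Tier, rand: () => number): Map<number, number[]> {
  const outgoing = new Map<number, number[]>();
  for (let i = 0; i < tier.mappings; i++) {
    const from = Math.floor(rand() * tier.concepts);
    const to = Math.floor(rand() * tier.concepts);
    const list = outgoing.get(from);
    if (list) list.push(to);
    else outgoing.set(from, [to]);
  }
  return outgoing;
}

// Number of concepts reachable from `start`, excluding `start` itself.
function closureSize(outgoing: Map<number, number[]>, start: number): number {
  const visited = new Set<number>([start]);
  const queue: number[] = [start];
  for (let head = 0; head < queue.length; head++) {
    const current = queue[head] as number;
    for (const next of outgoing.get(current) ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  return visited.size - 1;
}

// Times closure from `sampleStarts` random start concepts on one tier.
function measure(tier: Tier, seed: number, sampleStarts = 100) {
  const rand = lcg(seed);
  const graph = syntheticGraph(tier, rand);
  const started = Date.now();
  let reached = 0;
  for (let i = 0; i < sampleStarts; i++) {
    reached += closureSize(graph, Math.floor(rand() * tier.concepts));
  }
  return { reached, elapsedMs: Date.now() - started };
}
```

Running `measure(TIERS.small, 42)` and `measure(TIERS.medium, 42)` gives two cells of the in-memory baseline column; each surveyed engine would fill its own column from the same seeded graph.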

4. Vector-layer decoupling

The current Tier 2 stack decouples vector (sqlite-vec) from substrate (sqlite-wasm). Under ontology-web framing, vector queries become more important (semantic similarity across ontologies). Survey:

LanceDB (native vector + relational)
DuckDB + vector extensions
ClickHouse + vector
Embedded vector stores (Chroma, Qdrant, Faiss-WASM)

Does decoupling still make sense? Or does an integrated multi-modal substrate (LanceDB?) win on cross-modal queries?
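To make the trade-off concrete, here is a minimal sketch of what a decoupled vector layer looks like from the plugin side. The `VectorIndex` interface, the brute-force implementation, and `nearestWithin` are all hypothetical names, not shipped Crosswalker code. The point is that the vector layer sees only ids and embeddings, so a cross-modal query has to over-fetch neighbours and then filter by a set of ids that the substrate supplies.

```typescript
// Hypothetical contract: the vector layer knows ids and embeddings, nothing else.
interface Hit { id: string; score: number; }
interface VectorIndex {
  upsert(id: string, embedding: number[]): void;
  nearest(query: number[], k: number): Hit[];
}

// Cosine similarity; missing components are treated as 0.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Exhaustive scan; stands in for sqlite-vec or any other backend.
class BruteForceIndex implements VectorIndex {
  private readonly vectors = new Map<string, number[]>();

  upsert(id: string, embedding: number[]): void {
    this.vectors.set(id, embedding);
  }

  nearest(query: number[], k: number): Hit[] {
    const scored: Hit[] = [];
    this.vectors.forEach((embedding, id) => {
      scored.push({ id, score: cosine(query, embedding) });
    });
    scored.sort((x, y) => y.score - x.score);
    return scored.slice(0, k);
  }
}

// Cross-modal query under decoupling: over-fetch from the vector layer, then
// keep only ids the substrate allows (for example, one ontology's concepts).
function nearestWithin(
  index: VectorIndex,
  allowedIds: Set<string>,
  query: number[],
  k: number,
  overfetch = 4,
): Hit[] {
  return index
    .nearest(query, k * overfetch)
    .filter((hit) => allowedIds.has(hit.id))
    .slice(0, k);
}
```

The `overfetch` factor is where decoupling costs something: if the allowed set is a small fraction of the index, the filter can return fewer than `k` hits. An integrated engine could push the filter into the index scan instead, which is one measurable way to frame the question above.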

5. Migration trigger updates

Ch 24’s 5 migration triggers might shift earlier under ontology-web framing. Re-audit each:

Vector extension packaging — does ontology-web framing change urgency?
WASM bundle size — does the larger primitive set (Ch 29) blow the budget?
Closure query latency — at ontology-web scale, what’s the real ceiling?
Mobile / low-end performance — does the larger query surface make mobile harder?
Federation requirement — does ontology-web cross-ontology query make federation more pressing?

Output: revised trigger list + new triggers if needed.
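One way to keep the revised triggers falsifiable is to express each as a named predicate over recorded measurements, so that a trigger either fires on the data or does not. The sketch below uses the five trigger names listed above. The `Measurements` fields, the predicates, and every threshold are placeholders invented for illustration; they are not Ch 24's actual wording or numbers.

```typescript
// HYPOTHETICAL measurement record; field names and units are assumptions.
interface Measurements {
  vectorSearchRequired: boolean;
  vectorExtensionPackaged: boolean;
  wasmBundleBytes: number;
  closureP95Ms: number;
  mobileClosureP95Ms: number;
  federationRequired: boolean;
}

interface Trigger {
  name: string;
  fired: (m: Measurements) => boolean;
}

// PLACEHOLDER thresholds; the audit is expected to replace these.
const THRESHOLDS = {
  wasmBundleBytes: 5 * 1024 * 1024,
  closureP95Ms: 200,
  mobileClosureP95Ms: 1000,
};

// Trigger names follow the list above; each predicate is a guess at what the
// trigger might measure.
const TRIGGERS: Trigger[] = [
  {
    name: "Vector extension packaging",
    fired: (m) => m.vectorSearchRequired && !m.vectorExtensionPackaged,
  },
  {
    name: "WASM bundle size",
    fired: (m) => m.wasmBundleBytes > THRESHOLDS.wasmBundleBytes,
  },
  {
    name: "Closure query latency",
    fired: (m) => m.closureP95Ms > THRESHOLDS.closureP95Ms,
  },
  {
    name: "Mobile / low-end performance",
    fired: (m) => m.mobileClosureP95Ms > THRESHOLDS.mobileClosureP95Ms,
  },
  {
    name: "Federation requirement",
    fired: (m) => m.federationRequired,
  },
];

// Names of all triggers that fire for one set of measurements.
function firedTriggers(m: Measurements): string[] {
  return TRIGGERS.filter((t) => t.fired(m)).map((t) => t.name);
}
```

Written this way, "shift a trigger earlier" becomes a change to one threshold, and adding a new trigger is one more entry in the list.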

6. Substrate-neutral architecture verification

Ch 24 Settled #4 committed to “vector layer is decoupled from substrate” as a load-bearing modularity commitment. Verify this still holds:

Are the three shipped query primitives (getConceptsByOntology / crosswalkBetween / closureFromConcept) substrate-specific or substrate-neutral?
Could they be reimplemented over Cozo / DuckDB / Polars / Oxigraph without breaking the recipe schema?
What’s the real abstraction line — is plugin.queryClosure() engine-neutral, or does it leak SQL semantics?
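A quick way to probe that abstraction line is to write the three primitives as an interface that returns plain data, then check whether an implementation with no SQL underneath can satisfy it. The sketch below does that over in-memory arrays. Only the three primitive names come from this brief; the `QueryEngine` interface, the `Concept` and `Mapping` shapes, and the `InMemoryEngine` class are hypothetical, and the real recipe schema may carry more fields.

```typescript
// HYPOTHETICAL shapes; the shipped recipe schema may differ.
interface Concept { id: string; ontology: string; }
interface Mapping { from: string; to: string; }

// The three shipped primitives, stated without any SQL types.
interface QueryEngine {
  getConceptsByOntology(ontology: string): Concept[];
  crosswalkBetween(sourceOntology: string, targetOntology: string): Mapping[];
  closureFromConcept(conceptId: string): string[];
}

// Array-backed implementation: if callers cannot tell this apart from the
// sqlite-wasm sidecar, the contract itself leaks no SQL semantics.
class InMemoryEngine implements QueryEngine {
  private readonly concepts: Concept[];
  private readonly mappings: Mapping[];
  private readonly ontologyOf = new Map<string, string>();
  private readonly outgoing = new Map<string, string[]>();

  constructor(concepts: Concept[], mappings: Mapping[]) {
    this.concepts = concepts;
    this.mappings = mappings;
    for (const c of concepts) this.ontologyOf.set(c.id, c.ontology);
    for (const m of mappings) {
      const list = this.outgoing.get(m.from);
      if (list) list.push(m.to);
      else this.outgoing.set(m.from, [m.to]);
    }
  }

  getConceptsByOntology(ontology: string): Concept[] {
    return this.concepts.filter((c) => c.ontology === ontology);
  }

  crosswalkBetween(sourceOntology: string, targetOntology: string): Mapping[] {
    return this.mappings.filter(
      (m) =>
        this.ontologyOf.get(m.from) === sourceOntology &&
        this.ontologyOf.get(m.to) === targetOntology,
    );
  }

  // Breadth-first transitive closure; the visited set keeps cycles finite.
  closureFromConcept(conceptId: string): string[] {
    const visited = new Set<string>([conceptId]);
    const queue: string[] = [conceptId];
    const reached: string[] = [];
    for (let head = 0; head < queue.length; head++) {
      const current = queue[head] as string;
      for (const next of this.outgoing.get(current) ?? []) {
        if (!visited.has(next)) {
          visited.add(next);
          reached.push(next);
          queue.push(next);
        }
      }
    }
    return reached;
  }
}
```

Things this sketch cannot answer, and which the audit would need to check against the shipped code, include whether result ordering, depth limits, or NULL handling are part of the observable contract. Those are the usual places where SQL semantics leak through an otherwise neutral signature.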

Anti-patterns to reject upfront

The deliverable must NOT recommend:

Migrating off sqlite-wasm without a strong concrete trigger: Ch 24 confirmed sqlite-wasm; reversal requires hard evidence
Adopting an engine that violates Mobile / Capacitor constraints: Crosswalker must run on Obsidian Mobile, where SharedArrayBuffer is unavailable and OPFS is limited
Reintroducing libSQL / Turso / Limbo: Ch 24 rejected them explicitly
Adopting commercial or commercial-OSS engines with vendor concentration: Datomic, Stardog, and GraphDB are flagged; community-governed substrates are preferred
Forking the engine internals: Crosswalker is a plugin, not a database product
Multi-engine deployments in v0.1: at most one Tier 2 substrate and one Tier 3 substrate
Speculative migration without a working prototype: any recommendation to migrate must include a feasible POC path
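These rules can be applied mechanically as a screen over candidate recommendations before any deeper evaluation. The sketch below is one possible shape for that screen. The `Candidate` fields and the `violations` function are hypothetical; every flag is optional so that a property the survey has not yet established stays unknown instead of being guessed.

```typescript
// HYPOTHETICAL record; unset flags mean "not yet established by the survey".
interface Candidate {
  name: string;
  previouslyRejected?: boolean;
  vendorConcentrated?: boolean;
  runsOnObsidianMobile?: boolean;
  requiresEngineFork?: boolean;
  hasPocPath?: boolean;
}

// Returns the anti-patterns a candidate is known to hit. Unknown flags never
// count as violations; they are reported separately as open questions.
function violations(c: Candidate): { violated: string[]; unknown: string[] } {
  const violated: string[] = [];
  const unknown: string[] = [];

  const check = (flag: boolean | undefined, badValue: boolean, label: string) => {
    if (flag === undefined) unknown.push(label);
    else if (flag === badValue) violated.push(label);
  };

  check(c.previouslyRejected, true, "previously rejected");
  check(c.vendorConcentrated, true, "vendor concentration");
  check(c.runsOnObsidianMobile, false, "Mobile / Capacitor constraints");
  check(c.requiresEngineFork, true, "requires forking engine internals");
  check(c.hasPocPath, false, "no feasible POC path");

  return { violated, unknown };
}

// Entries below encode only what this brief states about each engine.
const libsql: Candidate = { name: "libSQL", previouslyRejected: true };
const datomic: Candidate = { name: "Datomic", vendorConcentrated: true };
```

For example, `violations(libsql).violated` contains only `"previously rejected"`, while its `unknown` list names the four properties the survey would still need to fill in.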

Success criteria for the deliverable

The deliverable must produce:

Engine survey matrix — 15+ engines × 8 dimensions (scale / query model / multi-paradigm / embed / streaming / license / vendor concentration / Mobile feasibility)
Re-audit verdicts — for each of Ch 10/11/12/14/16/18/24: REAFFIRMED / REVISED / DEFERRED-TO-RERUN
Scale × engine matrix — performance estimates at small / medium / large / ontology-web scale
Vector-layer decoupling verification — does the architecture still support substrate swap?
Migration trigger updates — revised list of falsifiable conditions to migrate off sqlite-wasm
Recommended v0.1.7+ commitments — substrate, vector, query language, scale path
Concrete next-action items — what should v0.1.7 milestone scope include based on findings

Anchored references

Project context:

Ch 24 synthesis — sqlite-wasm + 5 migration triggers
Ch 11 / 14 / 16 / 18 syntheses + archived briefs — prior engine work
WASM-A pivot synthesis — current substrate state
v0.1.5 Tier 2 sidecar shipped — what’s actually running today

Engines (priority survey targets):

Adjacent Crosswalker challenges:

Hand-off

Write the deliverable to docs/.../zz-research/YYYY-MM-DD-challenge-33-deliverable-a-<slug>.md. After the deliverable lands: update the synthesis log §9 status row for Ch 33 from ⏳ to ✅; update the Ch 24 migration triggers per the findings; flip prior-challenge verdicts as needed (REAFFIRMED / REVISED); update the v0.1.7 milestone scope; archive this brief.