Challenge 33: Multi-modal query engine landscape audit + revisit of prior commitments
Why this exists
Crosswalker’s v0.1 engine substrate is @sqlite.org/sqlite-wasm + (deferred) sqlite-vec per Ch 24 synthesis. Tier 3 server-side path is Apache Jena Fuseki + oxigraph-server per Ch 16 synthesis. These choices were made:
- Under a GRC framing (small-to-medium vaults; bounded scale)
- Without surveying multi-modal DBs (DuckDB, Polars, Cozo, ClickHouse, Materialize, etc.)
- Before the ontology-web positioning was made explicit
The 2026-05-08 alignment review surfaced that v0.1.7+ commitments rest on assumptions that haven’t been adversarially tested under the ontology-web framing. Ch 33 does the test.
What we already have
| Asset | What it gives us |
|---|---|
| Ch 11 synthesis | Picked layered Tier 2 stack (DuckDB-WASM + Oxigraph + Nemo) — but later pivoted to sqlite-wasm-only |
| Ch 14 synthesis | Added Tier 2-Lite (sqlite-wasm + sqlite-vec + recursive CTE) for Obsidian Mobile / low-end |
| Ch 16 synthesis | Demoted Apache AGE; promoted Fuseki/oxigraph-server as Tier 3 default |
| Ch 18 synthesis | ~100K mapping ceiling for sqlite-wasm + recursive CTE |
| Ch 24 synthesis | Rejected libSQL/Turso/Limbo; confirmed sqlite-wasm + sqlite-vec; 5 explicit migration triggers locked |
| Ch 25 / WASM-A pivot | Pivoted to plain sqlite-wasm; sqlite-vec deferred; 2026-11-06 revisit anchored |
| v0.1.5 Tier 2 sidecar shipped | Working sqlite-wasm sidecar with closure cache (recursive CTE) |
| Plugin handles | plugin.queryConcepts/Crosswalk/Closure already shipped against sqlite-wasm |
What to investigate
1. Survey multi-modal databases: substrate alternatives at ontology-web scale
For each of the following, document:
- Scale ceilings: at what vault size does the engine break? Latency profile vs concept count?
- Query model: SQL, Datalog, SPARQL, Cypher, native API, GraphQL?
- Multi-paradigm support: relational + graph + KV + time-series + vector? Which combinations?
- Embedded vs server: WASM-able? Browser-runnable? Memory-bound?
- Streaming / chunked execution: out-of-core? Spill-to-disk? Incremental view maintenance?
- License + governance: OSS license? Vendor concentration? Long-term sustainability?
Engines to survey:
| Engine | Notes |
|---|---|
| DuckDB + DuckPGQ (https://duckdb.org/docs/extensions/duckpgq) | Columnar OLAP + property graph extension; out-of-core; WASM build |
| Polars (https://pola.rs/) | Columnar / Arrow-native; lazy + streaming engine; Python + Rust + JS |
| Cozo (https://www.cozodb.org/) | Datalog + relational + graph; Rust; WASM build |
| Oxigraph-WASM (https://github.com/oxigraph/oxigraph) | SPARQL in Rust/WASM; embedded |
| Stardog (https://www.stardog.com/) | Knowledge graph + reasoning; commercial |
| RDF4J (https://rdf4j.org/) | Java RDF stack; SPARQL; commercial-friendly OSS |
| Apache Jena + Fuseki | Java RDF stack; current Crosswalker Tier 3 default |
| Materialize (https://materialize.com/) | Streaming SQL; incremental view maintenance |
| Datomic (https://www.datomic.com/) | Immutable indexes; Datalog query; commercial |
| ClickHouse (https://clickhouse.com/) | Columnar OLAP; massive scale; SQL |
| LanceDB (https://lancedb.com/) | Vector + relational; embedded; Rust |
| TerminusDB (https://terminusdb.com/) | Document + graph; Rust core; vault-mirror potential |
| Neo4j (https://neo4j.com/) | Native graph; Cypher; commercial-OSS |
| GraphDB (https://graphdb.ontotext.com/) | RDF; SHACL; commercial-OSS |
| HelixDB (https://github.com/HelixDB/helix-db) | Graph-DB; pre-1.0 |
| SurrealDB (https://surrealdb.com/) | Multi-model; BSL-licensed; flagged in Ch 14 as REJECT |
For each: what would Crosswalker GAIN by adopting? What would it LOSE? What’s the migration cost from sqlite-wasm?
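One way to keep the survey rows comparable is to give every engine the same typed record, covering the six dimensions above plus the GAIN / LOSE / migration-cost questions. The sketch below is hypothetical: the type, field names and the `openQuestions` helper are illustrative, not part of the Crosswalker codebase. The Cozo draft is filled only with what the engine table states (Datalog, relational + graph, WASM build); everything else is left open.

```typescript
// Hypothetical row shape for the Ch 33 engine survey. Field names are illustrative;
// only the dimensions themselves come from the brief.
type QueryModel = "SQL" | "Datalog" | "SPARQL" | "Cypher" | "native API" | "GraphQL";
type Paradigm = "relational" | "graph" | "kv" | "time-series" | "vector";
type Answer = "yes" | "no" | "unknown";

interface EngineSurveyRow {
  engine: string;
  scaleCeiling: string; // e.g. "breaks above N concepts", or "unmeasured"
  queryModels: QueryModel[];
  paradigms: Paradigm[];
  embedding: { wasm: Answer; browser: Answer; memoryBound: Answer };
  streaming: { outOfCore: Answer; spillToDisk: Answer; incrementalViews: Answer };
  license: string;
  gain: string; // what Crosswalker would GAIN by adopting
  lose: string; // what it would LOSE
  migrationCost: string; // cost of moving off sqlite-wasm
}

const TEXT_FIELDS = ["scaleCeiling", "license", "gain", "lose", "migrationCost"] as const;

// Collects "group.field" names whose answer is still "unknown".
function unknowns(group: string, answers: Record<string, Answer>): string[] {
  return Object.entries(answers)
    .filter(([, value]) => value === "unknown")
    .map(([name]) => `${group}.${name}`);
}

// Lists every field that is still blank or "unknown", so an unfinished row is
// visible before it lands in the matrix.
function openQuestions(row: EngineSurveyRow): string[] {
  const open: string[] = [];
  for (const key of TEXT_FIELDS) {
    if (row[key].trim() === "") open.push(key);
  }
  if (row.queryModels.length === 0) open.push("queryModels");
  if (row.paradigms.length === 0) open.push("paradigms");
  open.push(...unknowns("embedding", row.embedding), ...unknowns("streaming", row.streaming));
  return open;
}

// Draft row filled only with what the engine table says about Cozo.
const cozoDraft: EngineSurveyRow = {
  engine: "Cozo",
  scaleCeiling: "",
  queryModels: ["Datalog"],
  paradigms: ["relational", "graph"],
  embedding: { wasm: "yes", browser: "unknown", memoryBound: "unknown" },
  streaming: { outOfCore: "unknown", spillToDisk: "unknown", incrementalViews: "unknown" },
  license: "",
  gain: "",
  lose: "",
  migrationCost: "",
};
```

`openQuestions(cozoDraft)` returns the research still owed for Cozo, which doubles as a per-engine to-do list.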
2. Re-audit prior Crosswalker commitments under ontology-web framing
For each of Ch 10, Ch 11, Ch 12, Ch 14, Ch 16, Ch 18, Ch 24:
- Does the verdict still hold under ontology-web framing?
- Were assumptions GRC-specific that don’t generalize?
- Are migration triggers still right? Earlier? Later?
- Should the verdict be REVISED, REAFFIRMED, or DEFERRED to a follow-on rerun (Ch 35/36/37)?
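The re-audit produces one verdict per chapter, and a typed encoding makes two mistakes easy to catch: a chapter left unaudited, and a verdict that contradicts its own finding. The sketch below is hypothetical; the type names are illustrative, and the consistency rules in `inconsistency` are an assumption, not something the brief states.

```typescript
// Hypothetical encoding of the re-audit verdicts; names are illustrative.
type Verdict =
  | { kind: "REAFFIRMED" }
  | { kind: "REVISED"; change: string }
  | { kind: "DEFERRED-TO-RERUN"; rerun: 35 | 36 | 37 };

type Chapter = 10 | 11 | 12 | 14 | 16 | 18 | 24;
const AUDITED_CHAPTERS: Chapter[] = [10, 11, 12, 14, 16, 18, 24];

interface ReauditEntry {
  chapter: Chapter;
  holdsUnderOntologyWeb: boolean;
  grcSpecificAssumptions: string[];
  triggerTiming: "unchanged" | "earlier" | "later";
  verdict: Verdict;
}

// Assumed consistency rules (not stated in the brief): a verdict that still holds
// should not be REVISED, and one that no longer holds should not be REAFFIRMED.
function inconsistency(e: ReauditEntry): string | null {
  if (e.holdsUnderOntologyWeb && e.verdict.kind === "REVISED") {
    return `Ch ${e.chapter}: still holds but marked REVISED`;
  }
  if (!e.holdsUnderOntologyWeb && e.verdict.kind === "REAFFIRMED") {
    return `Ch ${e.chapter}: no longer holds but marked REAFFIRMED`;
  }
  return null;
}

// Chapters in scope that do not yet have an entry.
function unaudited(entries: ReauditEntry[]): Chapter[] {
  const done = new Set(entries.map((e) => e.chapter));
  return AUDITED_CHAPTERS.filter((c) => !done.has(c));
}
```

DEFERRED-TO-RERUN is allowed either way, since deferring is precisely the option for findings Ch 33 cannot settle.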
3. Scale models
For each substrate alternative + Crosswalker’s current sqlite-wasm:
- Small-vault scale (~10K concepts, ~5K mappings; current GRC bound): performance? acceptable?
- Medium-vault scale (~100K concepts, ~50K mappings): performance? memory? closure-query cost?
- Large-vault scale (~1M concepts, ~500K mappings; OLIR-scale): performance? feasibility?
- Ontology-web scale (~10M concepts; UMLS/MeSH/OBO Foundry-scale): feasibility? required architecture?
Output: scale × engine matrix showing where each substrate breaks and what alternatives unlock the next tier.
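A row of the scale × engine matrix can be derived mechanically once an engine's ceiling is known. The sketch below assumes each ceiling is expressed as a maximum mapping count and that breaking is monotone (once a tier breaks, every larger tier breaks). The tier sizes are the ones listed above, and the only ceiling used is the ~100K-mapping figure Ch 18 gives for sqlite-wasm + recursive CTE. Because the brief gives no mapping count for the ontology-web tier, an engine that survives the large tier gets "unknown" there rather than a guess.

```typescript
type Cell = "ok" | "breaks" | "unknown";

interface ScaleTier {
  name: string;
  concepts: number;
  mappings?: number; // the brief gives no mapping count for the ontology-web tier
}

// Tier sizes as stated in the brief.
const TIERS: ScaleTier[] = [
  { name: "small", concepts: 10_000, mappings: 5_000 },
  { name: "medium", concepts: 100_000, mappings: 50_000 },
  { name: "large", concepts: 1_000_000, mappings: 500_000 },
  { name: "ontology-web", concepts: 10_000_000 },
];

// Assumption: the ceiling is a maximum mapping count, and breaking is monotone,
// so every tier after the first broken one is also marked "breaks".
function matrixRow(mappingCeiling: number | undefined, tiers: ScaleTier[]): Cell[] {
  let broken = false;
  return tiers.map((t): Cell => {
    if (broken) return "breaks";
    if (mappingCeiling === undefined || t.mappings === undefined) return "unknown";
    if (t.mappings > mappingCeiling) {
      broken = true;
      return "breaks";
    }
    return "ok";
  });
}

// Ch 18 puts sqlite-wasm + recursive CTE at roughly 100K mappings.
const sqliteWasmRow = matrixRow(100_000, TIERS);
```

Under this model sqlite-wasm covers the small and medium tiers and breaks at the large (OLIR-scale) tier, which is exactly the boundary the scale rerun needs to measure rather than assume.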
4. Vector layer + multi-modal composition
The current Tier 2 design decouples the vector layer (sqlite-vec, currently deferred) from the substrate (sqlite-wasm). Under the ontology-web framing, vector queries become more important (semantic similarity across ontologies). Survey:
- LanceDB (native vector + relational)
- DuckDB + vector extensions
- ClickHouse + vector
- Embedded vector stores (Chroma, Qdrant, Faiss-WASM)
Does decoupling still make sense? Or does an integrated multi-modal substrate (LanceDB?) win on cross-modal queries?
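The cost of decoupling shows up in cross-modal queries such as "concepts in ontology X most similar to this embedding": the substrate filters, the vector layer ranks, and application code joins the two. The sketch below makes that join explicit. `VectorIndex`, `BruteForceIndex` and `similarInOntology` are hypothetical names, and the brute-force cosine scan merely stands in for sqlite-vec or any other vector store. An integrated substrate would push the ontology filter into a single query instead of passing an id set across the boundary.

```typescript
// Hypothetical types: a concept row from the substrate, ranked by a separate vector index.
interface Concept {
  id: string;
  ontology: string;
}

interface VectorIndex {
  // Returns up to k ids from `allowed`, ordered by descending cosine similarity.
  nearest(query: number[], k: number, allowed: Set<string>): { id: string; score: number }[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Brute-force stand-in for a real vector store.
class BruteForceIndex implements VectorIndex {
  private readonly vectors: Map<string, number[]>;

  constructor(vectors: Map<string, number[]>) {
    this.vectors = vectors;
  }

  nearest(query: number[], k: number, allowed: Set<string>) {
    const scored: { id: string; score: number }[] = [];
    this.vectors.forEach((vector, id) => {
      if (allowed.has(id)) scored.push({ id, score: cosine(query, vector) });
    });
    return scored.sort((x, y) => y.score - x.score).slice(0, k);
  }
}

// Decoupled cross-modal query: filter in the substrate, rank in the vector layer,
// join in application code by handing the filtered id set across.
function similarInOntology(
  concepts: Concept[],
  index: VectorIndex,
  query: number[],
  ontology: string,
  k: number,
) {
  const allowed = new Set(concepts.filter((c) => c.ontology === ontology).map((c) => c.id));
  return index.nearest(query, k, allowed);
}
```

At ontology-web scale the `allowed` set can hold millions of ids, which is one concrete place where an integrated store might win.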
5. Migration trigger updates
Ch 24’s 5 migration triggers might shift earlier under ontology-web framing. Re-audit each:
- Vector extension packaging — does ontology-web framing change urgency?
- WASM bundle size — does the larger primitive set (Ch 29) blow the budget?
- Closure query latency — at ontology-web scale, what’s the real ceiling?
- Mobile / low-end performance — does the larger query surface make mobile harder?
- Federation requirement — does ontology-web cross-ontology query make federation more pressing?
Output: revised trigger list + new triggers if needed.
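For the revised triggers to stay falsifiable, each one needs a measurable metric, a threshold, and a direction. The sketch below shows that shape with a hypothetical `Trigger` type and a `firedTriggers` evaluator; metric names and any thresholds supplied to it are placeholders, not Ch 24's actual values. A trigger with no measurement never fires, so missing benchmarks cannot silently justify a migration.

```typescript
// Hypothetical encoding of migration triggers as falsifiable conditions.
// Metric names and thresholds are placeholders, not Ch 24's actual values.
interface Trigger {
  id: string;
  metric: string;
  threshold: number;
  direction: "above" | "below"; // fires when the observed value crosses the threshold this way
}

// Returns the ids of triggers whose observed metric crosses its threshold.
function firedTriggers(triggers: Trigger[], observed: Partial<Record<string, number>>): string[] {
  return triggers
    .filter((t) => {
      const value = observed[t.metric];
      if (value === undefined) return false; // no measurement, no firing
      return t.direction === "above" ? value > t.threshold : value < t.threshold;
    })
    .map((t) => t.id);
}
```

Recording triggers this way also makes "earlier or later?" concrete: shifting a trigger earlier means lowering (or raising) a number, which can be argued against benchmark data.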
6. Substrate-neutral architecture verification
Ch 24 Settled #4 committed to “vector layer is decoupled from substrate” as a load-bearing modularity commitment. Verify this still holds:
- Are the 3 shipped query primitives (`getConceptsByOntology` / `crosswalkBetween` / `closureFromConcept`) substrate-specific or substrate-neutral?
- Could they be reimplemented over Cozo / DuckDB / Polars / Oxigraph without breaking the recipe schema?
- What’s the real abstraction line: is `plugin.queryClosure()` engine-neutral, or does it leak SQL semantics?
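One practical test of neutrality is whether a non-SQL engine can satisfy the same contract. The sketch below is hypothetical: the `QuerySubstrate` interface, its signatures and the edge types are illustrative, not the plugin's actual API, and it assumes closure follows mapping edges (the shipped closure may follow a different relation). The contract takes and returns plain values, with no SQL fragments in arguments and no row or cursor objects in results. A signature such as `closureFromConcept(id, whereClause: string)` would be the kind of SQL leak the audit is looking for.

```typescript
// Hypothetical substrate-neutral contract for the three shipped primitives.
// Signatures are illustrative, not the plugin's actual API.
interface ConceptRef {
  id: string;
  ontology: string;
}

interface Mapping {
  from: string;
  to: string;
}

interface QuerySubstrate {
  getConceptsByOntology(ontology: string): ConceptRef[];
  crosswalkBetween(sourceOntology: string, targetOntology: string): Mapping[];
  closureFromConcept(conceptId: string): string[]; // reachable ids, excluding the start
}

// A second "engine" with no SQL at all, to check the contract does not assume one.
class InMemorySubstrate implements QuerySubstrate {
  private readonly concepts: ConceptRef[];
  private readonly mappings: Mapping[];

  constructor(concepts: ConceptRef[], mappings: Mapping[]) {
    this.concepts = concepts;
    this.mappings = mappings;
  }

  getConceptsByOntology(ontology: string): ConceptRef[] {
    return this.concepts.filter((c) => c.ontology === ontology);
  }

  crosswalkBetween(sourceOntology: string, targetOntology: string): Mapping[] {
    const ontologyOf = new Map(this.concepts.map((c) => [c.id, c.ontology] as [string, string]));
    return this.mappings.filter(
      (m) => ontologyOf.get(m.from) === sourceOntology && ontologyOf.get(m.to) === targetOntology,
    );
  }

  // Breadth-first search with a visited set, so cycles in the mapping graph terminate.
  closureFromConcept(conceptId: string): string[] {
    const seen = new Set<string>([conceptId]);
    const queue: string[] = [conceptId];
    const reached: string[] = [];
    while (queue.length > 0) {
      const current = queue.shift()!;
      for (const m of this.mappings) {
        if (m.from === current && !seen.has(m.to)) {
          seen.add(m.to);
          reached.push(m.to);
          queue.push(m.to);
        }
      }
    }
    return reached;
  }
}
```

If the sqlite-wasm implementation and an adapter like this return the same values for the same vault, the abstraction line holds for that primitive.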
Anti-patterns to reject upfront
The deliverable must NOT recommend:
- Migrating off sqlite-wasm without a strong concrete trigger — Ch 24 confirmed sqlite-wasm; reversal requires hard evidence
- Adopting an engine that violates Mobile / Capacitor constraints: Crosswalker must run on Obsidian Mobile (no SharedArrayBuffer; OPFS support is limited)
- Reintroducing libSQL / Turso / Limbo — Ch 24 rejected explicitly
- Adopting commercial-OSS engines with vendor concentration — Datomic, Stardog, GraphDB are flagged; community substrates preferred
- Forking the engine internals — Crosswalker is a plugin, not a database product
- Multi-engine deployments in v0.1 — at most one Tier 2 substrate; one Tier 3 substrate
- Speculative migration without a working prototype — recommendations to migrate must include a feasible POC path
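The anti-patterns above can double as a screening checklist applied to each candidate recommendation before it reaches the deliverable. The sketch below is hypothetical: the `Recommendation` fields are illustrative flags a reviewer would fill in, and each check mirrors one bullet above.

```typescript
// Hypothetical screening of a candidate recommendation against the anti-patterns.
// Field names are illustrative; a reviewer fills them in per recommendation.
interface Recommendation {
  engine: string;
  migratesOffSqliteWasm: boolean;
  concreteTrigger?: string; // which falsifiable trigger fired, if any
  runsOnObsidianMobile: boolean;
  libsqlFamily: boolean; // libSQL / Turso / Limbo
  vendorConcentrated: boolean;
  forksEngineInternals: boolean;
  tier2Substrates: number;
  tier3Substrates: number;
  pocPath?: string;
}

// Returns one message per anti-pattern the recommendation hits, in checklist order.
function antiPatternViolations(r: Recommendation): string[] {
  const v: string[] = [];
  if (r.migratesOffSqliteWasm && !r.concreteTrigger) v.push("migration without a concrete trigger");
  if (!r.runsOnObsidianMobile) v.push("violates Mobile / Capacitor constraints");
  if (r.libsqlFamily) v.push("reintroduces libSQL / Turso / Limbo");
  if (r.vendorConcentrated) v.push("vendor-concentrated commercial-OSS engine");
  if (r.forksEngineInternals) v.push("forks engine internals");
  if (r.tier2Substrates > 1 || r.tier3Substrates > 1) v.push("multi-engine deployment in v0.1");
  if (r.migratesOffSqliteWasm && !r.pocPath) v.push("speculative migration without a POC path");
  return v;
}
```

An empty result does not make a recommendation good; it only means none of the rejected patterns apply.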
Success criteria for the deliverable
The deliverable must produce:
- Engine survey matrix — 15+ engines × 8 dimensions (scale / query model / multi-paradigm / embed / streaming / license / vendor concentration / Mobile feasibility)
- Re-audit verdicts — for each of Ch 10/11/12/14/16/18/24: REAFFIRMED / REVISED / DEFERRED-TO-RERUN
- Scale × engine matrix — performance estimates at small / medium / large / ontology-web scale
- Vector-layer decoupling verification — does the architecture still support substrate swap?
- Migration trigger updates — revised list of falsifiable conditions to migrate off sqlite-wasm
- Recommended v0.1.7+ commitments — substrate, vector, query language, scale path
- Concrete next-action items — what should v0.1.7 milestone scope include based on findings
Anchored references
Project context:
- Ch 24 synthesis — sqlite-wasm + 5 migration triggers
- Ch 11 / 14 / 16 / 18 syntheses + archived briefs — prior engine work
- WASM-A pivot synthesis — current substrate state
- v0.1.5 Tier 2 sidecar shipped — what’s actually running today
Engines (priority survey targets):
Adjacent Crosswalker challenges:
- Ch 35 — Graph→tabular bridging rerun
- Ch 36 — Query language rerun
- Ch 37 — Tier 2-Lite scale rerun
- Ch 34 — Streaming / chunked execution (sister challenge)
Hand-off
Write the deliverable to `docs/.../zz-research/YYYY-MM-DD-challenge-33-deliverable-a-<slug>.md`. After the deliverable lands:
- In synthesis log §9, flip the Ch 33 status row from ⏳ to ✅.
- Update the Ch 24 migration triggers per the findings.
- Flip prior-challenge verdicts as needed (REAFFIRMED / REVISED).
- Update the v0.1.7 milestone scope.
- Archive this brief.
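The file name follows a fixed `YYYY-MM-DD-challenge-33-deliverable-a-<slug>.md` pattern, so it can be generated rather than typed. The sketch below is a hypothetical helper: it builds only the file name (the directory prefix is left to the caller), uses the UTC date, and assumes slugs should be lower-case kebab-case, which the brief does not specify.

```typescript
// Hypothetical helper for the deliverable file name. Builds only the file name;
// the directory prefix is left to the caller. Kebab-case slugs are an assumption.
function deliverableFileName(date: Date, slug: string): string {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  const cleanSlug = slug
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse any run of other characters into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  if (cleanSlug === "") throw new Error("slug must contain at least one letter or digit");
  return `${yyyy}-${mm}-${dd}-challenge-33-deliverable-a-${cleanSlug}.md`;
}
```

For example, a date of 2026-05-08 and the slug "Engine Landscape Audit" yield `2026-05-08-challenge-33-deliverable-a-engine-landscape-audit.md`.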