Challenge 11: Tier 2/3 graph + analytical engine deep survey
Why this exists
Section titled “Why this exists”Challenge 10’s deliverable made the Tier 2/3 engine call (DuckDB-WASM + Apache AGE) on the basis of a 9-engine shortlist. Whole classes of relevant systems were not engaged with at all — Datalog engines, production triple stores, versioned graph databases, vector+graph hybrids, streaming/incremental view-maintenance systems, virtual/federated approaches.
For a multi-year compliance tool whose primary value is durability of a curated dataset, locking in the engine choice on an incomplete shortlist is premature. This challenge re-evaluates against the full design space.
What to investigate
Section titled “What to investigate”1. Engines NOT covered in Challenge 10 — full evaluation matrix
Section titled “1. Engines NOT covered in Challenge 10 — full evaluation matrix”For each system below, score the same Challenge 10 axes (graph fit, tabular fit, browser/Obsidian compatibility, bundle size, license, project health) plus three new axes (RDF semantics fit, native versioning support, vector+graph hybrid):
Datalog engines (high priority — direct fit for SSSOM chain rules):
- Soufflé
- Nemo — already used by OxO2 for SSSOM derivation
- Differential Datalog
- Datomic / Datomic Pro
- RDFox (commercial; instructive for design)
Production triple stores (RDF-native, relevant for SSSOM/SKOS/STRM):
- Apache Jena Fuseki
- GraphDB (Ontotext)
- Virtuoso
- RDF4J
- Stardog
- AnzoGraph
- Blazegraph
Versioned graph databases (high priority — TerminusDB especially):
- TerminusDB — Git-style branching/diff/merge over RDF; uncannily aligned with Crosswalker’s files-canonical ethos. Demands first-class evaluation
- Dolt (relational, but versioned-DB pattern worth understanding)
Other property graphs:
- Memgraph
- NebulaGraph
- ArangoDB
- Dgraph
- FalkorDB (formerly RedisGraph; vector+graph hybrid)
- OrientDB
Embedded analytical engines (Tier 1.5 / alternative to DuckDB-WASM):
- Polars-WASM (Rust→WASM, native joins/pivots, no SQL required)
- DataFusion (Apache Arrow ecosystem)
- LanceDB
- ClickHouse-local (clickhouse-local in browser?)
- Velox
Vector + graph hybrids (relevant for AI-assisted features):
- Weaviate
- Qdrant
- Milvus
- FalkorDB+vec
- KuzuDB had vector — confirm fork status (see §3 below)
Streaming / incremental MV systems:
- Materialize
- Differential Dataflow
- Snowflake Dynamic Tables
- ksqlDB
Virtual / federated:
- Ontop (SPARQL-over-relational)
- Trino
- Dremio
Query unification:
- GraphQL gateway (compiles to SQL/Cypher/SPARQL per backend)
- Substrait
2. Architectural questions Challenge 10 didn’t ask
Section titled “2. Architectural questions Challenge 10 didn’t ask”- TerminusDB as Tier 2 or Tier 3 primary — its versioned-graph-with-diff-and-merge model maps directly onto Crosswalker’s “files canonical, derived stores rebuildable” ethos. Should be evaluated as a top contender, not glossed
- Polars-WASM as Tier 1.5 — bundle-size-sensitive users could get real joins/pivots without DuckDB’s 6 MB. Doc treats it only as a renderer-side helper
- GraphQL as a tier-agnostic query surface — a unified query layer that abstracts the engine choice across tiers
- CRDT layer for the deferred live-edit team mode — Yjs / Automerge / Loro
- WASM bundle optimization strategies — tree-shaking, code-splitting, on-demand loading. Concrete plan, not vague gestures
- LLM/NL-query architecture — where does the LLM live? Sidecar API? In-browser? Local? How does it bind to whichever engine is at Tier 2?
- Datalog vs recursive CTE for the core SSSOM chain-rule derivation — overlaps with Challenge 12; coordinate but don’t duplicate
3. Verify Challenge 10’s empirical claims
Section titled “3. Verify Challenge 10’s empirical claims”- “KuzuDB archived 10 October 2025” — load-bearing for the entire engine choice. Verify upstream state, the “bighorn” community fork status (Kineviz fork), and whether any other production-grade fork has emerged
- “DuckDB-WASM ~3.2 MB compressed” — confirm against current build; bundle has grown over releases
- “DuckPGQ extension not yet WASM-friendly” — check current state of the SQL/PGQ extension
- Apache AGE PostgreSQL version compatibility window — current AGE supports Postgres 11–18; confirm
Success criteria for the deliverable
Section titled “Success criteria for the deliverable”- Engine evaluation matrix covering ≥15 engines from §1 above on the unified scoring axes
- TerminusDB explicit deep dive — does it deserve to be the Tier 2 or Tier 3 primary?
- Recommendation against Challenge 10’s call: keep DuckDB+AGE, replace, layer (e.g., Datalog-on-top-of-AGE), or hybrid (e.g., DuckDB-WASM + TerminusDB option)
- Verification of empirical claims in Challenge 10 with citations
- Bundle size strategy for whichever Tier 2 engine is recommended — concrete tree-shaking / code-splitting / on-demand-loading plan
- Migration / re-decision triggers — under what conditions should this decision be revisited (e.g., KuzuDB fork stabilizes for 12+ months)
Out of scope
Section titled “Out of scope”- Actual benchmarks against representative GRC data — separate work item, see Challenge 02
- Specific UI design for the query-builder layer
- Implementation details of any chosen engine
- The Datalog vs SQL fork for the core derivation engine — see Challenge 12 for that narrower question
- The audit-trail attestation primitives — see Challenge 13
Relationship to prior challenges
Section titled “Relationship to prior challenges”- Supersedes the engine-selection portion of Challenge 10. Challenge 10’s broader 3-tier architecture (materialized folders → embedded engine → server) stands; this challenge re-decides which embedded engine and which server stack
- Coordinates with Challenge 12 — Challenge 12 is narrower (Datalog vs SQL for chain rules specifically); Challenge 11 is broader (the whole engine survey)
- Independent of Challenge 13 — different layer of the architecture
Related
Section titled “Related”- Challenge 10: Graph→tabular bridging engine — predecessor; engine shortlist this challenge expands
- Challenge 12: Datalog vs SQL for SSSOM chain rules — sibling; narrow fork-in-the-road
- 05-02 Direction log §3.1 Challenge 10 gap inventory — the source of this challenge’s scope
- Roadmap: Foundation
- Ch 10 deliverable: Graph→tabular bridging engine — full predecessor research this challenge expands