Challenge 36: Query language under ontology-web framing (RERUN of Ch 12)

Original Ch 12 (resolved 2026-05-02 via 2 convergent deliverables):

  • Asked: Datalog vs SQL for SSSOM chain-rule derivation
  • Verdict: Hybrid. Rules are expressed as a Datalog DSL and executed either by a Datalog engine (Nemo) or by compiling to SQL recursive CTEs (DuckDB-WASM). Validated against the OxO2 architecture
  • Scope: SSSOM chain-rules only (e.g., “if A maps to B with confidence X and B maps to C with confidence Y, derive A maps to C with confidence min(X,Y)”)
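
As an illustration of the chain rule quoted above, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not the Ch 12 deliverable's implementation. The `mapping` table, its column names, the `derive_chains` helper, and the `max_hops` cutoff are all assumptions made for the example; real SSSOM mapping sets carry more fields. Keeping the highest confidence when several chains reach the same pair is also an assumption.

```python
import sqlite3

# Hypothetical minimal schema: one row per asserted mapping.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (subject TEXT, object TEXT, confidence REAL)")
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?, ?)",
    [("A", "B", 0.9), ("B", "C", 0.8), ("C", "D", 0.95)],
)

# Chain rule as a recursive CTE: extend a derived mapping by one asserted
# mapping and carry min(X, Y) as the confidence. The hop limit (an assumption)
# guarantees termination even if the mappings contain a cycle.
CHAIN_RULE = """
WITH RECURSIVE derived(subject, object, confidence, hops) AS (
    SELECT subject, object, confidence, 1 FROM mapping
    UNION ALL
    SELECT d.subject, m.object, MIN(d.confidence, m.confidence), d.hops + 1
    FROM derived AS d
    JOIN mapping AS m ON m.subject = d.object
    WHERE d.hops < :max_hops
)
SELECT subject, object, MAX(confidence) AS confidence
FROM derived
WHERE hops > 1
GROUP BY subject, object
ORDER BY subject, object
"""


def derive_chains(conn, max_hops=3):
    """Return (subject, object, confidence) rows derived through 2+ hops."""
    return conn.execute(CHAIN_RULE, {"max_hops": max_hops}).fetchall()


for row in derive_chains(conn):
    print(row)
```

With the three sample rows, the query derives A→C, A→D, and B→D, each with confidence 0.8, because 0.8 is the weakest link on every chain.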

What’s different now (2026-05-08 rerun):

  1. Broader framing: the original was SSSOM-specific; the rerun asks the general question of what language recipe authors and users write
  2. More candidates: the rerun considers Bases DSL, SPARQL property paths, Cypher, GraphQL, and a unified Crosswalker DSL, none of which were compared in the original
  3. 3-layer architecture: the rerun has Layer A primitives + Layer B view shapes + Layer C recipes; the query language has to fit across this taxonomy
  4. Substrate shifted: Crosswalker is now on plain sqlite-wasm (per the WASM-A pivot); the Datalog/Nemo path was paired with DuckDB-WASM in the original
  5. User questions: “Are we limited to SQL type queries with crazy merging?” is a direct user concern that the original Ch 12 didn’t engage

A query engine has multiple language surfaces:

  • What the user types in a .base file or codeblock (Bases DSL, SQL, SPARQL, Datalog, custom?)
  • What the recipe author writes in YAML (query: block — pending Ch 31)
  • What the engine compiles to internally (SQL recursive CTE, Datalog rule, Cypher pattern, native API call)
  • What inter-tier protocols use (Arrow Flight? GraphQL? custom?)

Each surface has different ergonomics, expressivity, and substrate compatibility. Original Ch 12 answered for ONE surface (the rule definition language for SSSOM chains) and ONE substrate (DuckDB-WASM). The rerun asks for ALL surfaces under the current architecture.

| Asset | What it gives us |
|---|---|
| Ch 12 archived brief + 2 deliverables | Original verdict for SSSOM chain-rules |
| concepts/query-primitives | 7 candidate primitives + cross-domain precedents |
| Tier 2 query helpers (src/tier2/queries.ts) | 3 typed helpers; substrate-bound to sqlite-wasm + recursive CTE |
| Bases DSL | The .base file YAML grammar; already a query language, inherited rather than chosen |
| kepano/obsidian-skills | Steph Ango’s Agent Skills pattern; LLM-friendly Bases authoring |

1. Survey query languages for ontology webs

For each language, document what it is good at, what it is weak at, and which substrate it binds to.

| Language | Standard? | Strengths | Weaknesses |
|---|---|---|---|
| SQL (recursive CTE) | ANSI/ISO | Universal; relational; WITH RECURSIVE for closure | Not graph-native; pivots awkward; no built-in path syntax |
| SPARQL | W3C | Native triple-store query; property paths; built for RDF | Requires triple store; not natural for tabular pivot |
| Datalog | (multiple dialects) | Recursion natural; aggregation in rules; logic-programming clarity | Multiple dialects (Cozo, Nemo, Datomic, Soufflé); no single standard |
| Cypher / openCypher / GQL | Cypher: Neo4j; openCypher: community; GQL: ISO/IEC 39075 (2024) | Native graph traversal; readable | Substrate-bound (Neo4j primary; GQL emerging) |
| GraphQL | Spec | Typed selection sets; field-level args | Doesn’t express closure / anti-join / aggregation natively |
| Bases DSL (Obsidian) | Vendor-specific | Native to Crosswalker’s primary mechanism | Limited primitives (no joins, no recursion, no anti-join) |
| Custom Crosswalker DSL | | Tailored to ontology-web semantics | Anti-pattern per Ch 27; rejected as “fourth language” |

2. Surface × language matrix

For each query language, which surfaces is it appropriate for in Crosswalker?

| Surface | Bases DSL | SQL | SPARQL | Datalog | Cypher | GraphQL | YAML (recipe query: block) |
|---|---|---|---|---|---|---|---|
| User types in .base file | ✅ native | | | | | | partial (recipe references) |
| User writes in codeblock body | ⚠️ Bases doesn’t support inline | ✅ via crosswalker-query (v0.1.7) | | | | | partial |
| Recipe author writes in YAML | | embedded? | embedded? | embedded? | | | ✅ native |
| Engine compiles to internally | | | possible if Tier 3 | possible if Tier 3 | unlikely | unlikely | n/a |
| Inter-tier protocol | n/a | partial | possible | n/a | n/a | possible | n/a |

Argue the optimal surface×language pairing.

3. The “unified Crosswalker DSL” question

Original Ch 12 verdict was “Datalog DSL compiled to SQL CTEs.” Was this DSL ever defined? Is it still the right approach? Three positions:

  • Yes, define a Crosswalker DSL — domain-specific; users write declarative rules; compiles to mechanism
  • No, use existing standards — pick ONE language (SQL? SPARQL? Datalog?) and stick with it
  • Compositional — different surfaces use different languages; recipe-layer YAML hides the underlying language choice

Argue for one. The anti-pattern from Ch 27/28 (“don’t invent a fourth language”) is a strong argument for the second or third position.
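
To make the "declarative rule compiled to SQL CTEs" idea concrete, the sketch below takes one hypothetical rule shape (transitive closure over a single edge table) and renders it both as generic textbook Datalog and as a SQLite recursive CTE. Nothing here is the Ch 12 DSL, whose definition this brief leaves open: `ClosureRule`, `to_datalog`, `to_sql`, the `broader` table, and the `max_depth` cutoff are all invented for illustration, and the Datalog text is not written for Nemo or any other specific dialect.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosureRule:
    """Hypothetical declarative rule: transitive closure over one edge table."""
    name: str          # name of the derived relation
    edge_table: str    # table holding the base edges
    src: str           # source column of the edge table
    dst: str           # target column of the edge table
    max_depth: int = 8 # assumed recursion cap so cycles cannot loop forever

    def __post_init__(self):
        # Names are interpolated into SQL text, so accept plain identifiers only.
        for value in (self.name, self.edge_table, self.src, self.dst):
            if not value.isidentifier():
                raise ValueError(f"not a plain identifier: {value!r}")


def to_datalog(rule):
    """Render the rule in generic Datalog syntax (base case + recursive case)."""
    base = f"{rule.name}(X, Y) :- {rule.edge_table}(X, Y)."
    step = f"{rule.name}(X, Z) :- {rule.name}(X, Y), {rule.edge_table}(Y, Z)."
    return base + "\n" + step


def to_sql(rule):
    """Render the same rule as a SQLite recursive CTE."""
    return f"""
WITH RECURSIVE {rule.name}(src, dst, depth) AS (
    SELECT {rule.src}, {rule.dst}, 1 FROM {rule.edge_table}
    UNION
    SELECT r.src, e.{rule.dst}, r.depth + 1
    FROM {rule.name} AS r
    JOIN {rule.edge_table} AS e ON e.{rule.src} = r.dst
    WHERE r.depth < {int(rule.max_depth)}
)
SELECT DISTINCT src, dst FROM {rule.name} ORDER BY src, dst
"""


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE broader (child TEXT, parent TEXT)")
conn.executemany("INSERT INTO broader VALUES (?, ?)", [("a", "b"), ("b", "c")])

rule = ClosureRule(name="ancestor", edge_table="broader", src="child", dst="parent")
print(to_datalog(rule))
rows = conn.execute(to_sql(rule)).fetchall()
print(rows)
```

The point of the sketch is the shape of the trade-off: the rule object is the only thing an author would touch, and the target language becomes an engine detail. Whether that rule object is a new DSL or just structured recipe data is exactly the question this section asks.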

4. Mobile / WASM / Publish constraints

Each language has substrate constraints:

  • SQL (sqlite-wasm) — works everywhere; recursive CTE works
  • SPARQL (Oxigraph-WASM) — possible; ~3MB WASM bundle
  • Datalog (Nemo-WASM) — possible; but Nemo is research-grade
  • Cypher — no good WASM implementation
  • GraphQL — pure spec; works anywhere

Mobile / Publish parity (per Ch 28): SQL via sqlite-wasm is the safe path; SPARQL via Oxigraph adds bundle weight. Argue the v0.1.6 / v0.1.7 / v0.1.8 sequencing.

5. LLM-friendliness

Increasingly, queries will be authored by AI agents:

  • SQL — well-known to all LLMs
  • SPARQL — known to LLMs but less common
  • Datalog — uncommon to LLMs; requires careful prompting
  • Cypher — known to LLMs (Neo4j popular)
  • Bases DSL — newer; depends on training cutoff
  • YAML recipes — highly LLM-friendly (declarative; structured)

Which language(s) does an LLM agent most reliably author? What’s the failure mode of each?
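
One way to probe failure modes for the SQL case is to compile agent-authored queries without running them. The sketch below assumes two plausible failure modes, hallucinated identifiers and unintended write statements; these are assumptions for the example, not findings of this brief. It uses Python's standard sqlite3 module: `EXPLAIN` forces SQLite to compile the statement, and `Connection.set_authorizer` rejects anything other than read operations at compile time. The `mapping` table and the helper names are hypothetical.

```python
import sqlite3

# Authorizer action codes that a read-only query is allowed to trigger.
READ_ONLY_ACTIONS = {
    sqlite3.SQLITE_SELECT,
    sqlite3.SQLITE_READ,
    sqlite3.SQLITE_FUNCTION,
    getattr(sqlite3, "SQLITE_RECURSIVE", 33),  # needed for WITH RECURSIVE
}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow reads only; SQLite calls this while compiling each statement."""
    if action in READ_ONLY_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def check_query(conn, sql):
    """Compile a query without running it.

    Returns None if SQLite accepts it as a read-only statement,
    otherwise the SQLite error message.
    """
    try:
        conn.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        return str(exc)
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (subject TEXT, object TEXT)")
conn.set_authorizer(read_only_authorizer)  # installed after schema setup

for candidate in (
    "SELECT subject FROM mapping",       # fine
    "SELECT predicate FROM mapping",     # hallucinated column
    "DELETE FROM mapping",               # write statement
):
    print(candidate, "->", check_query(conn, candidate))
```

A comparable check for the other languages would need a parser or engine for each, which is part of what makes SQL on sqlite-wasm cheap to validate.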

6. Original Ch 12 reconciliation

The original Datalog-DSL-compiled-to-SQL pattern (for SSSOM chain rules) is genuinely useful. But is it the right pattern for the broader set of ontology-web queries (closure / anti-join / pivot / cross-ontology join)? Choose one of:

  • Affirm: Datalog is the right choice for rule-based derivations (SSSOM)
  • Revise: Datalog’s complexity overhead doesn’t pay off outside SSSOM rules; SQL+CTEs handle most cases
  • Defer: pick a primary path; accept SSSOM is a special case that may use Datalog DSL internally
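
To help weigh the "Revise" position, the sketch below expresses two of the non-recursive query shapes named above, anti-join and pivot, in plain SQL on SQLite through Python's standard sqlite3 module. The `term` and `mapping` tables, their columns, and the sample identifiers are assumptions made for the example and do not reflect Crosswalker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (ontology TEXT, term_id TEXT);
CREATE TABLE mapping (subject TEXT, predicate TEXT, object TEXT);
INSERT INTO term VALUES
    ('onto1', 'o1:a'), ('onto1', 'o1:b'), ('onto1', 'o1:c'),
    ('onto2', 'o2:x'), ('onto2', 'o2:y');
INSERT INTO mapping VALUES
    ('o1:a', 'exactMatch', 'o2:x'),
    ('o1:a', 'broadMatch', 'o2:y'),
    ('o1:b', 'exactMatch', 'o2:y');
""")

# Anti-join: terms of onto1 that have no outgoing mapping at all.
ANTI_JOIN = """
SELECT t.term_id
FROM term AS t
WHERE t.ontology = 'onto1'
  AND NOT EXISTS (SELECT 1 FROM mapping AS m WHERE m.subject = t.term_id)
ORDER BY t.term_id
"""

# Pivot: one row per subject, one column per predicate. The column list is
# fixed in the query text, which is the awkward part of pivoting in SQL.
PIVOT = """
SELECT subject,
       SUM(CASE WHEN predicate = 'exactMatch' THEN 1 ELSE 0 END) AS exact_match,
       SUM(CASE WHEN predicate = 'broadMatch' THEN 1 ELSE 0 END) AS broad_match
FROM mapping
GROUP BY subject
ORDER BY subject
"""

unmapped = conn.execute(ANTI_JOIN).fetchall()
pivoted = conn.execute(PIVOT).fetchall()
print(unmapped)
print(pivoted)
```

Both shapes fit in standard SQL without recursion. The pivot does require knowing the predicate columns in advance, so a recipe layer would probably have to generate that column list rather than ask authors to write it by hand.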

The deliverable must NOT recommend:

  1. A new custom Crosswalker query language — explicit anti-pattern per Ch 27/28
  2. Reintroducing Dataview / DataviewJS — explicit project memory commitment to Bases-only
  3. Locking the engine to one substrate — Ch 24 settled-item-#5 commits to “vector layer decoupled from substrate”
  4. Forking SPARQL or Cypher implementations — out of scope
  5. Speculative LLM-only authoring — query languages must work for human authors first
  6. Dropping SQL — too foundational; sqlite-wasm is the v0.1 substrate
  7. Replacing recipe YAML with code-fences — violates “recipes declare data, not code”

The deliverable must produce:

  1. Language survey — 7+ languages × strengths/weaknesses/substrate-bind
  2. Surface × language matrix — which language for which surface
  3. Unified-DSL verdict — argued YES/NO with rationale
  4. Mobile/WASM/Publish constraint analysis — what works where
  5. LLM-friendliness ranking — which languages agents author most reliably
  6. Original Ch 12 reconciliation — REAFFIRM / REVISE / DEFER for the Datalog-DSL→SQL CTE pattern
  7. Recommended layering — Layer A primitive vocabulary + Layer B shape declarations + Layer C recipe-query-block + execution language; language choices for each
  8. Concrete v0.1.6/v0.1.7/v0.1.8 implementation guidance

Write the deliverable to docs/.../zz-research/YYYY-MM-DD-challenge-36-deliverable-a-<slug>.md. After the deliverable lands:

  • Flip the Ch 36 row in synthesis log §9 from ⏳ to ✅
  • Update the Ch 12 archived brief with a :::note callout pointing to this rerun
  • If the verdict revises the original Ch 12 verdict, document that explicitly in the synthesis log
  • Archive this brief