Challenge 36: Query language under ontology-web framing (RERUN of Ch 12)

Created May 8, 2026 Updated Jun 1, 2026

Predecessor + what’s different now

Original Ch 12 (resolved 2026-05-02 via 2 convergent deliverables):

Asked: Datalog vs SQL for SSSOM chain-rule derivation
Verdict: Hybrid. Rules are expressed in a Datalog DSL and executed either by a Datalog engine (Nemo) or by compiling to SQL recursive CTEs (DuckDB-WASM). Validated against the OxO2 architecture.
Scope: SSSOM chain-rules only (e.g., “if A maps to B with confidence X and B maps to C with confidence Y, derive A maps to C with confidence min(X,Y)”)
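
As a concrete reading of the chain rule quoted above, here is a minimal sketch that derives transitive mappings with a recursive CTE. It uses Python's built-in `sqlite3` as a stand-in for the sqlite-wasm substrate. The `mapping` table, its column names, the hop cap, and the choice to keep the highest-confidence derivation when several chains reach the same pair are all assumptions made for illustration; none of them come from the Ch 12 deliverables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per asserted mapping.
conn.execute("CREATE TABLE mapping (subject TEXT, object TEXT, confidence REAL)")
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?, ?)",
    [("A", "B", 0.9), ("B", "C", 0.7), ("C", "D", 0.8)],
)

CHAIN_RULE = """
WITH RECURSIVE derived(subject, object, confidence, hops) AS (
    -- Base case: every asserted mapping is a chain of length 1.
    SELECT subject, object, confidence, 1 FROM mapping
    UNION
    -- Chain step: A->B (X) and B->C (Y) derive A->C with min(X, Y).
    -- Two-argument MIN is SQLite's scalar function, not the aggregate.
    SELECT d.subject, m.object, MIN(d.confidence, m.confidence), d.hops + 1
    FROM derived AS d
    JOIN mapping AS m ON m.subject = d.object
    WHERE d.hops < 5  -- assumed cap so cyclic mappings still terminate
)
-- Assumption: when several chains reach the same pair, keep the strongest.
SELECT subject, object, MAX(confidence) AS confidence
FROM derived
GROUP BY subject, object
ORDER BY subject, object
"""

rows = conn.execute(CHAIN_RULE).fetchall()
for subject, obj, confidence in rows:
    print(subject, "->", obj, confidence)
```

With the three asserted mappings above, the query also yields A->C, A->D, and B->D, each carrying the weakest confidence found along its chain.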

What’s different now (2026-05-08 rerun):

Broader framing: the original was SSSOM-specific; the rerun asks the general question of what language recipe authors and users write.
More candidates: the rerun considers Bases DSL, SPARQL property paths, Cypher, GraphQL, and a unified Crosswalker DSL, none of which the original compared.
3-layer architecture: the rerun has Layer A primitives, Layer B view shapes, and Layer C recipes; the query language has to be placed across this taxonomy.
Substrate shifted: Crosswalker now runs on plain sqlite-wasm (per the WASM-A pivot); the original paired the Datalog/Nemo path with DuckDB-WASM.
User questions: “Are we limited to SQL type queries with crazy merging?” is a direct user concern that the original Ch 12 did not engage with.

Why this exists (under the new framing)

A query engine has multiple language surfaces:

What the user types in a .base file or codeblock (Bases DSL, SQL, SPARQL, Datalog, custom?)
What the recipe author writes in YAML (`query:` block, pending Ch 31)
What the engine compiles to internally (SQL recursive CTE, Datalog rule, Cypher pattern, native API call)
What inter-tier protocols use (Arrow Flight? GraphQL? custom?)

Each surface has different ergonomics, expressivity, and substrate compatibility. Original Ch 12 answered for ONE surface (the rule definition language for SSSOM chains) and ONE substrate (DuckDB-WASM). The rerun asks for ALL surfaces under the current architecture.

What we already have

| Asset | What it gives us |
| --- | --- |
| Ch 12 archived brief + 2 deliverables | Original verdict for SSSOM chain-rules |
| `concepts/query-primitives` | 7 candidate primitives + cross-domain precedents |
| Tier 2 query helpers (`src/tier2/queries.ts`) | 3 typed helpers; substrate-bound to sqlite-wasm + recursive CTE |
| Bases DSL | The `.base` file YAML grammar: already a query language, not chosen but inherited |
| `kepano/obsidian-skills` | Steph Ango’s Agent Skills pattern; LLM-friendly Bases authoring |

What to investigate

1. Survey query languages for ontology webs

For each language, document what it is good at, what it is weak at, and what its substrate-bind is (the engine or store it requires).

| Language | Standard? | Strengths | Weaknesses |
| --- | --- | --- | --- |
| SQL (recursive CTE) | ANSI/ISO | Universal; relational; `WITH RECURSIVE` for closure | Not graph-native; pivots awkward; no built-in path syntax |
| SPARQL | W3C | Native triple-store query; property paths; built for RDF | Requires triple store; not natural for tabular pivot |
| Datalog | (multiple dialects) | Recursion natural; aggregation in rules; logic-programming clarity | Multiple dialects (Cozo, Nemo, Datomic, Soufflé); no single standard |
| Cypher / openCypher / GQL | Cypher: Neo4j; openCypher: community; GQL: ISO/IEC 39075 (2024) | Native graph traversal; readable | Substrate-bound (Neo4j primary; GQL emerging) |
| GraphQL | Spec | Typed selection sets; field-level args | Doesn’t express closure / anti-join / aggregation natively |
| Bases DSL (Obsidian) | Vendor-specific | Native to Crosswalker’s primary mechanism | Limited primitives (no joins, no recursion, no anti-join) |
| Custom Crosswalker DSL | ❌ | Tailored to ontology-web semantics | Anti-pattern per Ch 27; rejected as “fourth language” |
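
To make two of the table's entries concrete ("pivots awkward" for SQL, "no anti-join" for Bases DSL), the following sketch runs an anti-join and a pivot in SQLite through Python's `sqlite3`. The `concept` and `mapping` tables and the ontology names are hypothetical and do not reflect Crosswalker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (id TEXT PRIMARY KEY, ontology TEXT);
CREATE TABLE mapping (subject TEXT, object TEXT);
INSERT INTO concept VALUES
  ('a1', 'alpha'), ('a2', 'alpha'), ('a3', 'alpha'),
  ('b1', 'beta'), ('g1', 'gamma');
INSERT INTO mapping VALUES ('a1', 'b1'), ('a1', 'g1'), ('a2', 'g1');
""")

# Anti-join: alpha concepts that have no mapping into beta.
ANTI_JOIN = """
SELECT c.id
FROM concept AS c
WHERE c.ontology = 'alpha'
  AND NOT EXISTS (
      SELECT 1
      FROM mapping AS m
      JOIN concept AS t ON t.id = m.object
      WHERE m.subject = c.id AND t.ontology = 'beta')
ORDER BY c.id
"""
unmapped = [row[0] for row in conn.execute(ANTI_JOIN)]

# Pivot: one row per alpha concept, one column per target ontology.
# Each output column has to be spelled out by hand, which is the
# awkwardness the table refers to.
PIVOT = """
SELECT c.id,
       COUNT(CASE WHEN t.ontology = 'beta'  THEN 1 END) AS beta,
       COUNT(CASE WHEN t.ontology = 'gamma' THEN 1 END) AS gamma
FROM concept AS c
LEFT JOIN mapping AS m ON m.subject = c.id
LEFT JOIN concept AS t ON t.id = m.object
WHERE c.ontology = 'alpha'
GROUP BY c.id
ORDER BY c.id
"""
coverage = conn.execute(PIVOT).fetchall()

print(unmapped)
print(coverage)
```

Both queries are plain SQL with no recursion. The pivot only works because the target ontologies are known in advance; a dynamic set of columns would need the SQL text to be generated.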

2. Surface × language matrix

For each query language, what surfaces is it appropriate for in Crosswalker?

| Surface | Bases DSL | SQL | SPARQL | Datalog | Cypher | GraphQL | YAML (recipe `query:` block) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| User types in `.base` file | ✅ native | ❌ | ❌ | ❌ | ❌ | ❌ | partial (recipe references) |
| User writes in codeblock body | ⚠️ Bases doesn’t support inline | ✅ via `crosswalker-query` (v0.1.7) | ❌ | ❌ | ❌ | ❌ | partial |
| Recipe author writes in YAML | ❌ | embedded? | embedded? | embedded? | ❌ | ❌ | ✅ native |
| Engine compiles to internally | ✅ | ✅ | possible if Tier 3 | possible if Tier 3 | unlikely | unlikely | n/a |
| Inter-tier protocol | n/a | partial | possible | n/a | n/a | possible | n/a |

Argue the optimal surface×language pairing.

3. The “unified Crosswalker DSL” question

Original Ch 12 verdict was “Datalog DSL compiled to SQL CTEs.” Was this DSL ever defined? Is it still the right approach? Three positions:

Option 1. Yes, define a Crosswalker DSL: domain-specific; users write declarative rules; compiles to the underlying mechanism
Option 2. No, use existing standards: pick ONE language (SQL? SPARQL? Datalog?) and stick with it
Option 3. Compositional: different surfaces use different languages; recipe-layer YAML hides the underlying language choice

Argue. The “anti-pattern: don’t invent a fourth language” from Ch 27/28 is a strong argument for option 2 or 3.
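
One way to picture the compositional position is a declarative request that the engine renders into whichever language the active substrate speaks. The sketch below is illustrative only: `ClosureRequest`, its fields, the `edge` table, and the `http://example.org/` namespace are hypothetical, and the real recipe `query:` schema is still pending Ch 31. The SQL rendering is executed with Python's `sqlite3`; the SPARQL rendering is only produced as text.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass

EX = "http://example.org/"  # placeholder namespace for the demo


@dataclass(frozen=True)
class ClosureRequest:
    """Hypothetical recipe-level request: nodes reachable from `start` via `relation`."""
    start: str
    relation: str


def to_sql(req: ClosureRequest) -> tuple[str, tuple[str, str, str]]:
    """Render as a parameterised recursive CTE over an assumed
    edge(subject, relation, object) table."""
    sql = """
        WITH RECURSIVE reach(id) AS (
            SELECT object FROM edge WHERE subject = ? AND relation = ?
            UNION
            SELECT e.object FROM edge AS e
            JOIN reach AS r ON e.subject = r.id
            WHERE e.relation = ?
        )
        SELECT id FROM reach ORDER BY id
    """
    return sql, (req.start, req.relation, req.relation)


def to_sparql(req: ClosureRequest) -> str:
    """Render the same request as a SPARQL 1.1 property path (one or more hops).
    The IRIs are interpolated directly, so this is not safe for untrusted input."""
    return f"SELECT ?id WHERE {{ <{req.start}> <{req.relation}>+ ?id }} ORDER BY ?id"


request = ClosureRequest(start=EX + "A", relation=EX + "broader")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (subject TEXT, relation TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?, ?)",
    [
        (EX + "A", EX + "broader", EX + "B"),
        (EX + "B", EX + "broader", EX + "C"),
        (EX + "B", EX + "related", EX + "X"),
    ],
)

sql, params = to_sql(request)
reachable = [row[0] for row in conn.execute(sql, params)]
sparql = to_sparql(request)

print(reachable)
print(sparql)
```

The author of the request never sees either target language, which is the property option 3 relies on. The trade-off is that every request type needs a renderer per supported substrate.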

4. Mobile / WASM language constraints

Each language has substrate constraints:

SQL (sqlite-wasm) — works everywhere; recursive CTE works
SPARQL (Oxigraph-WASM) — possible; ~3MB WASM bundle
Datalog (Nemo-WASM) — possible; but Nemo is research-grade
Cypher — no good WASM implementation
GraphQL — pure spec; works anywhere

Mobile / Publish parity (per Ch 28): SQLSpec via sqlite-wasm is the safe path. SPARQL via Oxigraph adds bundle weight. Argue the v0.1.6/v0.1.7/v0.1.8 sequencing.

5. LLM-friendliness

Increasingly, queries will be authored by AI agents:

SQL — well-known to all LLMs
SPARQL — known to LLMs but less common
Datalog — uncommon to LLMs; requires careful prompting
Cypher — known to LLMs (Neo4j popular)
Bases DSL — newer; depends on training cutoff
YAML recipes — highly LLM-friendly (declarative; structured)

Which language(s) does an LLM agent most reliably author? What’s the failure mode of each?
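
One failure mode that can be checked mechanically for the SQL surface is a query that does not compile: a misspelled keyword, or a table or column name the agent invented. The sketch below is an assumption about how such a guard could look, not an existing Crosswalker helper. It asks SQLite to prepare each candidate with `EXPLAIN`, which compiles the statement without running it. The `mapping` schema and the candidate queries are made up for the demo.

```python
import sqlite3


def compile_error(conn, sql):
    """Return None if SQLite can prepare `sql`, else the error message.

    EXPLAIN compiles the statement without executing it, so this catches
    syntax errors and unknown tables or columns, but not wrong logic.
    """
    try:
        conn.execute("EXPLAIN " + sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return str(exc)
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (subject TEXT, object TEXT, confidence REAL)")

candidates = [
    "SELECT subject, object FROM mapping WHERE confidence >= 0.8",  # valid
    "SELECT subject, target FROM mapping",                          # invented column
    "SELECT subject FROM mappings",                                 # invented table
    "SELEC subject FROM mapping",                                   # misspelled keyword
]
report = {sql: compile_error(conn, sql) for sql in candidates}

for sql, error in report.items():
    print("ok  " if error is None else "FAIL", sql, "" if error is None else f"({error})")
```

A check like this says nothing about whether a query that compiles answers the right question, so it complements a per-language comparison of failure modes rather than replacing it.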

6. Reconcile with Ch 12 original verdict

The original Datalog-DSL-compiled-to-SQL pattern (for SSSOM chain rules) is genuinely useful — but is it the right pattern for the BROADER set of ontology-web queries (closure / anti-join / pivot / cross-ontology join)?

Reaffirm: Datalog is the right choice for rule-based derivations (SSSOM)
Revise: Datalog’s complexity overhead doesn’t pay off outside SSSOM rules; SQL+CTEs handle most cases
Defer: pick a primary path; accept SSSOM is a special case that may use Datalog DSL internally
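
When weighing these positions, it may help to see that the two execution paths in the original verdict compute the same result for a simple recursive rule. The sketch below evaluates a two-clause reachability rule with a naive bottom-up fixpoint, standing in for what a Datalog engine does conceptually, and again as a recursive CTE in SQLite. The rule, the edge data, and both function names are illustrative assumptions; a real engine such as Nemo uses far more efficient evaluation strategies.

```python
import sqlite3

# Demo graph with a cycle (a -> b -> c -> a) plus one exit edge (c -> d).
EDGES = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}


def datalog_fixpoint(edges):
    """Naive bottom-up evaluation of:
         reach(X, Y) :- edge(X, Y).
         reach(X, Z) :- reach(X, Y), edge(Y, Z).
    Repeats the join until no new facts appear."""
    reach = set(edges)
    while True:
        new = {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2} - reach
        if not new:
            return reach
        reach |= new


def sql_closure(edges):
    """The same rule written as a recursive CTE; UNION removes duplicates,
    which is what lets the recursion terminate on cyclic data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (subject TEXT, object TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", sorted(edges))
    rows = conn.execute("""
        WITH RECURSIVE reach(subject, object) AS (
            SELECT subject, object FROM edge
            UNION
            SELECT r.subject, e.object
            FROM reach AS r
            JOIN edge AS e ON e.subject = r.object
        )
        SELECT subject, object FROM reach
    """).fetchall()
    conn.close()
    return set(rows)


by_rules = datalog_fixpoint(EDGES)
by_sql = sql_closure(EDGES)
print(len(by_rules), by_rules == by_sql)
```

For plain closure the two forms are interchangeable, which fits the Revise position. The comparison does not cover rules with aggregation or negation, where the ergonomics of the two languages may diverge more.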

Anti-patterns to reject upfront

The deliverable must NOT recommend:

A new custom Crosswalker query language — explicit anti-pattern per Ch 27/28
Reintroducing Dataview / DataviewJS — explicit project memory commitment to Bases-only
Locking the engine to one substrate — Ch 24 settled-item-#5 commits to “vector layer decoupled from substrate”
Forking SPARQL or Cypher implementations — out of scope
Speculative LLM-only authoring — query languages must work for human authors first
Dropping SQL — too foundational; sqlite-wasm is the v0.1 substrate
Replacing recipe YAML with code-fences — violates “recipes declare data, not code”

Success criteria for the deliverable

The deliverable must produce:

Language survey — 7+ languages × strengths/weaknesses/substrate-bind
Surface × language matrix — which language for which surface
Unified-DSL verdict — argued YES/NO with rationale
Mobile/WASM/Publish constraint analysis — what works where
LLM-friendliness ranking — which languages agents author most reliably
Original Ch 12 reconciliation — REAFFIRM / REVISE / DEFER for the Datalog-DSL→SQL CTE pattern
Recommended layering — Layer A primitive vocabulary + Layer B shape declarations + Layer C recipe-query-block + execution language; language choices for each
Concrete v0.1.6/v0.1.7/v0.1.8 implementation guidance

Anchored references

Predecessor:

Project context:

Standards:

Sister challenges:

Ch 29 — Query verbs validation — primitive vocabulary
Ch 31 — Recipe query: block schema — YAML surface
Ch 33 — Multi-modal landscape — substrate alternatives

Hand-off

Write the deliverable to docs/.../zz-research/YYYY-MM-DD-challenge-36-deliverable-a-<slug>.md.

After the deliverable lands:

Flip the synthesis log §9 status for the Ch 36 row from ⏳ to ✅
Update the Ch 12 archived brief with a :::note callout pointing to this rerun
If the verdict revises the original Ch 12 verdict, document that explicitly in the synthesis log
Archive this brief