Ch 31 deliverable B: Recipe `query:` block schema design — data-only typed tree with JSONata-only string expressions
Challenge 31 Deliverable — Recipe query: Block Schema Design
Path: docs/.../zz-research/2026-05-08-challenge-31-deliverable-a-recipe-query-block-schema.md
Status: Locks the additive query: schema for v0.1.6 (D8 lock prerequisite).
Author role: Research/synthesis output for the Crosswalker maintainers.
- Adopt a data-only, shape-discriminated query: block (top-level keys: schema_version, shape, primitives, empty_cell, aggregations, output, provenance, user_edited) that compiles down to Tier 2 helpers (crosswalkBetween, closureFromConcept, getConceptsByOntology), never to raw SQL and never to a bespoke Crosswalker query language. This mirrors the dbt MetricFlow / Cube.dev / LookML pattern of declarative semantic primitives plus a runtime that compiles them, which the literature treats as the dominant industry pattern for portable analytics across substrates.
- Versioning is schema_version: "1.0.0" (SchemaVer-style MODEL-REVISION-ADDITION semantics, adapted to the dotted SemVer string format), with additionalProperties: true at the top level so that v0.1.7’s codeblock processor and any future shapes can be added without breaking existing recipes; the JSON Schema uses if/then/else over shape to enforce primitive presence per shape.
- Lifecycle composition is settled: the query: block participates in Ch 28’s three-way merge as a single typed sub-tree (system base + user overlay + community PR), with user_edited: true set when the user touches any leaf inside query: and provenance.source recorded at the block level (not per primitive) so that merges remain reviewable.
Executive Summary
Challenge 31 fills the only remaining gap in the recipe spec: today spec/recipe.schema.json describes emission (the Ch 22 grammar of folder/file/heading/tag/wikilink) but says nothing about what to query. Without a query: block, the v0.1.6 crosswalkerPivot Bases view cannot be parameterized by recipe, and the v0.1.7 codeblock processor has no shared substrate to render from. The brief commits us to a 7-section investigation: (1) cross-reference adjacent declarative-query systems, (2) author the JSON Schema, (3) define versioning, (4) write reference recipes, (5) settle the data-vs-code boundary, (6) reconcile with Ch 28 lifecycle, (7) validate against the 7 candidate primitives.
The decision this deliverable locks is: query: is a discriminated union over the six view-shapes (table / list / pivot / graph / hierarchy / timeline), each shape declaring a fixed schema for its primitives (rows/cols/cell, nodes/edges/start, root/predicate/depth, axis/event_source, …). The block is declarative data: no inline SQL, no inline SPARQL, no inline Bases-DSL. The only string-typed expression slot is filter:, which accepts JSONata and nothing else (§5). The runtime compiles the block to whichever substrate is active (Bases for v0.1.6, codeblock for v0.1.7, sqlite-wasm later). This is the same architectural choice dbt Labs made when they pulled metric definitions out of SQL and into MetricFlow YAML, the same choice Cube.dev made with cubes:/measures:/dimensions:, and the same choice LookML implicitly makes by wrapping sql: snippets inside typed dimension:/measure: records. Crosswalker borrows the pattern, not the tools.
The biggest risk surfaced by the research: don’t reinvent. The closest precedent for our use case is not BI semantic layers but ROBOT/OBO Foundry’s YAML-driven ontology pipelines, which decompose ontology operations into named verbs (reason, query, extract, measure) chained by a YAML configuration — exactly the shape Crosswalker’s Tier 2 helpers already take. The schema below preserves that lineage.
Section 1 — Cross-Reference of Declarative-Query Schemas in Adjacent Ecosystems
1.1 Cross-reference matrix
| System | Declares shape? | Primitives unit | Aggregation? | Versioning? | Boundary: query vs presentation |
|---|---|---|---|---|---|
| SPARQL CONSTRUCT (W3C) | No (always graph→graph) | Triple patterns + WHERE clause | Via SELECT subqueries / aggregates in 1.1 | Spec-versioned (SPARQL 1.1) | Pure query; presentation is downstream |
| GraphQL (spec.graphql.org) | Implicit (selection set shape ≈ result shape) | Typed fields + arguments + fragments | None native (resolvers do it) | Schema introspection + SDL versioning convention | Strong: query is selection, presentation is client-side |
| dbt model YAML | No (model = SQL file; YAML is metadata) | name/columns/config/tests + ref() | In SQL only | version: 2 at file head; per-resource version: | Weak: SQL inside model, YAML around it |
| dbt MetricFlow (semantic layer) | Yes (metric type: simple, ratio, cumulative, derived, conversion) | semantic_models w/ entities + dimensions + measures; metrics reference these | First-class (agg: sum/count/avg/min/max/count_distinct/median/percentile) | dbt-semantic-interfaces repo pinned to dbt versions | Strong: metric is data; SQL is generated |
| Cube.dev (cube.dev) | No explicit “shape” but cubes: + views: + pre_aggregations: carve roles | dimensions:, measures:, joins:, segments:, hierarchies: | First-class (type: count/sum/avg/count_distinct/min/max/number) plus calculated measures | Schema is JS/YAML, no formal version field, runtime is versioned | Strong: YAML defines model, presentation is downstream BI |
| Looker LookML | Yes-ish (view, explore, dashboard, model are distinct file types) | dimension:, measure:, filter:, parameter:, join: | First-class (type: count/count_distinct/sum/average/min/max/median/percentile/percent_of_previous/number/list) | Project-level, IDE-managed | Medium: sql: snippets allowed inside fields (locks to SQL) |
| Datasette metadata.yaml | No (canned queries are raw SQL) | databases.<db>.queries.<name>.sql + params + title | In SQL only | None | Weak: presentation hints (facets, sortable_columns, size, fragment) co-exist with SQL |
| ROBOT YAML / OBO Foundry ODK | Implicit (verbs: reason, query, extract, merge, measure, template, report) | Per-command params (--reasoner, --method, --axiom-generators, template_options, module_type) | Via measure command + standard reports | ODK config has implicit version via container tag | Strong: YAML drives a CLI pipeline; SPARQL files are referenced, not inlined |
| Obsidian Bases (.base YAML) | Yes (views: [{type: table, …}, {type: cards, …}, …]) | filters: (recursive AND/OR/NOT), formulas:, view-level properties:, groupBy, sort | Limited (count, planned sum/avg) | None (early beta) | Strong: filters declarative, presentation is the view block |
1.2 Per-system synthesis (the lessons that shape our design)
SPARQL CONSTRUCT (w3c.org/TR/sparql11-query/#construct) returns an RDF graph by templating triples from WHERE-clause bindings. The lesson: a query is “match a pattern, project into a target shape”. Our primitives.rows/cols/cell are exactly the projection template, and shape: chooses which template to fill. We must not require users to write SPARQL; we use the pattern of separation (pattern + projection), not the syntax.
GraphQL (spec.graphql.org) ties result shape to query shape: the selection set is the response schema. The lesson: typed selection beats stringly-typed query bodies. Our primitives are a typed selection over the Crosswalker concept graph. We borrow GraphQL’s “operation type as discriminator” for our shape: enum.
dbt model YAML (docs.getdbt.com/reference/model-configs) uses YAML purely as metadata around SQL files. Anti-pattern for us: it conflates description with logic. We reject this for v0.1.6.
MetricFlow (docs.getdbt.com/docs/build/about-metricflow) is the closest large-scale precedent. Semantic models declare entities (join keys ≈ ontologies in our world), dimensions (group/slice ≈ axes), and measures (aggregation rules ≈ cell ops). Metrics then reference measures with a type: discriminator (simple/ratio/cumulative/derived/conversion). This is essentially the architecture we adopt: a discriminator on shape, fixed slots per shape, all aggregation through a closed enum (extensible via plugin registry later).
Cube.dev (cube.dev/docs) demonstrates that a YAML semantic model can be authored in <30 lines and still drive a full SQL generator. Critically, Cube allows sql: | blocks for dimensions, which in their world is necessary because the substrate is always SQL — we don’t have that constraint, so we can keep our schema cleaner by forbidding inline sql: and instead exposing op:, edge:, predicate: fields that the engine maps to substrate calls.
LookML (cloud.google.com/looker/docs/lookml-quick-reference) splits files into view/explore/model/manifest. Each dimension: / measure: is a typed record with a type: enum (count, sum, count_distinct, min, max, average, median, percentile, percent_of_previous, number, list, …). Lesson: a closed-but-extensible aggregation enum is the right abstraction. We adopt the same enum names for cross-tool familiarity.
Datasette metadata.yaml (docs.datasette.io/en/stable/metadata.html) embeds raw SQL in queries:. Anti-pattern for us — exactly what Ch 27 rejects. Note Simon Willison himself (issue #2143 in simonw/datasette) has flagged that Datasette’s metadata file became “a kitchen sink”, which is a useful warning against scope creep in a query: block.
ROBOT / OBO Foundry ODK (robot.obolibrary.org) is the relevant precedent because ROBOT’s whole model is “ontology operations chained from YAML/Makefile”. The reason/query/extract/measure commands map almost 1-to-1 onto Crosswalker’s Tier 2 helpers. Lesson: name verbs after operations, not data shapes; let the recipe declare which verb and what arguments, not the imperative steps.
Obsidian Bases (help.obsidian.md/bases/syntax) is the substrate we ship to in v0.1.6. The .base YAML file already commits to a views: [{type, filters, formulas, properties, groupBy, sort}] schema. Our query: block must produce a .base file at output.target_path, which means our shape: enum and primitives: slots must have a clean projection into the Bases schema. The brief’s example (base_view: crosswalkerPivot, target_path: "_crosswalker/views/coverage-matrix.base") confirms this design intent.
Section 2 — JSON Schema for the query: block (Draft 2020-12)
2.1 Design decisions
- Discriminator is shape (an enum over the six view-shapes). JSON Schema 2020-12’s if/then/else (a closed set of branches inside an allOf) is the standard way to express discriminated unions and is fully supported by AJV. Per json-schema.org/understanding-json-schema/reference/conditionals, this is the recommended pattern for “applies different constraints to various properties based on the value of another property”.
- primitives is an object whose required keys depend on shape. We do not use oneOf over primitives directly because AJV produces clearer errors with the if/then pattern (learnjsonschema.com/2020-12/applicator/if/).
- output is required for all shapes, since every recipe must produce some artifact (a .base file, a codeblock target, etc.). It carries base_view, target_path, and an optional format enum.
- empty_cell is first-class with enum gap | blank | zero, a deliberate UX commitment from the brief.
- aggregations is a top-level object that names aggregation expressions (DRY pattern from MetricFlow), so the cell.op field can be either an enum literal or a $ref to a named aggregation. v1.0 keeps it simple (only the literal enum is used in the reference recipes), but the slot is reserved.
- Top-level additionalProperties: true for forward-compatibility with v0.1.7 (the codeblock processor will add a sibling body: / codeblock: field). primitives.additionalProperties: false within each shape branch, because primitives are the load-bearing structure and unknown keys there usually indicate a typo.
- schema_version is a SemVer-formatted string ("1.0.0") interpreted with SchemaVer-style rules (snowplow.io/blog/introducing-schemaver-for-semantic-versioning-of-schemas): MAJOR for breaking, MINOR for additive-but-meaningful, PATCH for cosmetic. SchemaVer itself names the three positions MODEL-REVISION-ADDITION. We use the SemVer punctuation (dots) for ecosystem compatibility while documenting the SchemaVer rules in the comment block.
2.2 The schema (canonical, AJV-validatable)
2.3 AJV note
AJV v8+ supports JSON Schema 2020-12 (ajv.js.org/json-schema.html) through its Ajv2020 class. The if/then/else branches inside allOf are evaluated as a conjunction, so each branch is independent, which is exactly the discriminator semantics we want. AJV reports errors per branch; the loader should wrap them in a custom message of the form: "Recipe.query: shape='pivot' but primitives is missing required keys: cols, cell".
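The wrapping step described above could look roughly like the following. This is a minimal sketch, not the loader's actual code: the AjvLikeError interface is a local stand-in that mirrors only the keyword, instancePath, and params fields of AJV's error objects, and the function name formatQueryErrors and the /query/primitives instance path are assumptions made for illustration.

```typescript
// Local stand-in for the AJV error fields this sketch reads (assumption:
// the real loader would receive AJV's own error objects instead).
interface AjvLikeError {
  keyword: string;
  instancePath: string;
  params: Record<string, unknown>;
}

// Collapse every "required" error raised under .../primitives into one message.
// Returns null when no primitive is missing.
function formatQueryErrors(shape: string, errors: AjvLikeError[]): string | null {
  const missing = errors
    .filter((e) => e.keyword === "required" && e.instancePath.endsWith("/primitives"))
    .map((e) => String(e.params["missingProperty"]));
  if (missing.length === 0) return null;
  const unique = Array.from(new Set(missing));
  return `Recipe.query: shape='${shape}' but primitives is missing required keys: ${unique.join(", ")}`;
}

// Hypothetical error list for a pivot recipe that only declared rows.
const sampleErrors: AjvLikeError[] = [
  { keyword: "required", instancePath: "/query/primitives", params: { missingProperty: "cols" } },
  { keyword: "required", instancePath: "/query/primitives", params: { missingProperty: "cell" } },
  // A non-"required" error, which the formatter ignores.
  { keyword: "if", instancePath: "/query", params: {} },
];

console.log(formatQueryErrors("pivot", sampleErrors));
```

Collecting all missing keys into one line depends on the validator reporting every failure rather than stopping at the first, which is what the allErrors option is for.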
Section 3 — Versioning & Forward-Compatibility
3.1 Versioning policy
- Field: schema_version: "1.0.0" (string, SemVer-formatted, SchemaVer-semantic).
- MAJOR bump when a primitive is removed or its meaning changes (e.g., renaming cell.op: count to cell.op: tally).
- MINOR bump for any additive change: a new shape (e.g., shape: matrix-decomposition), a new aggregation operator, or a new optional top-level key (e.g., the v0.1.7 body: template). This is the path the codeblock processor will take.
- PATCH bump for documentation, default-value tweaks, and pattern relaxations that strictly accept more inputs.
- Compatibility: the loader treats unknown top-level keys as forward-compat additions (because additionalProperties: true at the root). Unknown keys inside a shape’s primitives are errors (typo guard). Recipes carrying a schema_version whose MAJOR is newer than the loader’s raise a hard error; a newer MINOR or PATCH raises a warning and the loader proceeds.
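The compatibility rule in the last bullet can be sketched as a small pure function. The names parseVersion and checkSchemaVersion are hypothetical, and the handling of a recipe with an older MAJOR than the loader is an assumption, since the policy above does not cover that case.

```typescript
type Compat = "ok" | "warn" | "error";

// Parse a dotted MAJOR.MINOR.PATCH string; anything else is rejected.
function parseVersion(v: string): [number, number, number] {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (m === null) throw new Error(`schema_version is not MAJOR.MINOR.PATCH: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Newer MAJOR -> hard error; newer MINOR or PATCH -> warning; otherwise ok.
function checkSchemaVersion(recipeVersion: string, loaderVersion: string): Compat {
  const [rMaj, rMin, rPat] = parseVersion(recipeVersion);
  const [lMaj, lMin, lPat] = parseVersion(loaderVersion);
  if (rMaj > lMaj) return "error";
  // Assumption: an older MAJOR is accepted here; the policy text does not specify it.
  if (rMaj < lMaj) return "ok";
  if (rMin > lMin || (rMin === lMin && rPat > lPat)) return "warn";
  return "ok";
}

console.log(checkSchemaVersion("1.1.0", "1.0.0")); // warn
console.log(checkSchemaVersion("2.0.0", "1.4.2")); // error
```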
3.2 Migration strategy when v0.1.7 ships the codeblock processor
The codeblock processor does not modify the query: block. It adds a sibling top-level field, body:, to the recipe.
query: and body: are not exclusive. A recipe can carry both: query: produces a Bases file at output.target_path, and body: injects a codeblock that references the same query: (via query_ref: "#") into a markdown body. This is how a recipe can drive both a .base view and an in-line preview. Recipes with only query: (no body:) emit a .base file. Recipes with only body: (no query:) are legal in v0.1.7+ for body-only emissions where the codeblock carries an inline query; the schema in this challenge does not constrain that case.
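The three combinations described above (query: only, both, body: only) can be sketched as a function that lists what a recipe would emit. The RecipeSketch type, the plannedEmissions name, and the output labels are hypothetical; only the body[].codeblock.query_ref: "#" convention and output.target_path come from this document.

```typescript
// Hypothetical, heavily reduced recipe shape: just the fields this sketch needs.
interface RecipeSketch {
  query?: { output: { target_path: string } };
  body?: Array<{ codeblock: { query_ref?: string } }>;
}

// List the artifacts a recipe would emit under the rules described above.
function plannedEmissions(recipe: RecipeSketch): string[] {
  const out: string[] = [];
  if (recipe.query !== undefined) {
    out.push(`base:${recipe.query.output.target_path}`);
  }
  for (const entry of recipe.body ?? []) {
    if (entry.codeblock.query_ref === "#") {
      // "#" points at the sibling query: block, so that block must exist.
      if (recipe.query === undefined) {
        throw new Error('query_ref "#" needs a sibling query: block');
      }
      out.push("codeblock:sibling-query");
    } else {
      // Body-only case: the codeblock carries its own inline query.
      out.push("codeblock:inline-query");
    }
  }
  return out;
}

console.log(
  plannedEmissions({
    query: { output: { target_path: "_crosswalker/views/coverage-matrix.base" } },
    body: [{ codeblock: { query_ref: "#" } }],
  }),
);
```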
3.3 Forward-compat with new shapes
Adding shape: matrix-decomposition in a future version takes three steps: (1) bump schema_version MINOR to 1.1.0, (2) add a new if/then branch in the schema, (3) add a new entry to the shape enum. Because additionalProperties: true is already set at the root, older loaders simply pass through unrecognized branches with a warning. This is the same forward-compat strategy dbt uses for version: 2 resource files.
3.4 Three-way merge on query: (preview; full treatment in §6)
Because query: is a typed sub-tree, the merge engine treats it as a single editable unit for v1.0. Fine-grained merge (e.g., user edited cell.op while community PR changed cols.id) is a v1.1 concern. The simplifying rule for v1.0: if the user edited any leaf inside query:, the entire query: block is marked user_edited: true and the system overlay is applied to peer keys (emission grammar) but not to query: without explicit conflict resolution.
Section 4 — Reference Recipes (5 worked examples)
All five validate against the schema in §2.2. They are written as recipe fragments: the surrounding id:, inputs:, and emission grammar (folder/file/heading/tag/wikilink) are elided since they are out of scope for Ch 31.
Recipe 1 — Coverage Matrix (NIST 800-53 × NIST CSF)
Compiles to: crosswalkBetween('nist-csf', 'nist-800-53', { edge: 'equivalent_to' }) followed by a Bases-DSL pivot with row/col axes.
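To make the compilation step concrete, here is a hedged sketch of how a pivot block might be turned into that helper call. The PivotQuery type is a guess at the block's layout assembled from field paths used elsewhere in this document (primitives.rows.id, primitives.cell.op, output.target_path); compilePivot, HelperCall, and the choice to pass rows before cols are assumptions, and the sample values are picked only to reproduce the call shown above.

```typescript
// Assumed layout of a pivot query: block (illustrative, not the canonical schema).
interface PivotQuery {
  shape: "pivot";
  primitives: {
    rows: { id: string };
    cols: { id: string };
    cell: { op: string; edge: string };
  };
  output: { base_view: string; target_path: string };
}

// A description of a Tier 2 helper call, rather than the call itself.
interface HelperCall {
  helper: string;
  args: unknown[];
}

// Assumption: rows map to the first argument and cols to the second.
function compilePivot(q: PivotQuery): HelperCall {
  const { rows, cols, cell } = q.primitives;
  return { helper: "crosswalkBetween", args: [rows.id, cols.id, { edge: cell.edge }] };
}

const coverageMatrix: PivotQuery = {
  shape: "pivot",
  primitives: {
    rows: { id: "nist-csf" },
    cols: { id: "nist-800-53" },
    cell: { op: "count", edge: "equivalent_to" },
  },
  output: {
    base_view: "crosswalkerPivot",
    target_path: "_crosswalker/views/coverage-matrix.base",
  },
};

console.log(JSON.stringify(compilePivot(coverageMatrix)));
```

Returning a call description instead of invoking the helper keeps the sketch substrate-free; a real compiler would hand this to whichever runtime is active.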
Recipe 2 — Crosswalk Density (any 2 ontologies)
Note: density is not a SQL primitive; it is a Crosswalker aggregation operator that compiles to count_distinct(matched_pairs) / (|rows| * |cols|). Because we don’t embed SQL, the user never has to know that.
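The density formula above is small enough to sketch directly. The function name and the in-memory pair representation are assumptions for illustration, as is returning 0 when either axis is empty.

```typescript
type Pair = [string, string]; // [row concept id, col concept id]

// density = count_distinct(matched_pairs) / (|rows| * |cols|)
function density(matchedPairs: Pair[], rowCount: number, colCount: number): number {
  // Assumption: an empty axis yields 0 rather than a division by zero.
  if (rowCount <= 0 || colCount <= 0) return 0;
  const distinct = new Set(matchedPairs.map(([r, c]) => JSON.stringify([r, c])));
  return distinct.size / (rowCount * colCount);
}

// Two rows, three columns, three distinct matched pairs (one pair is repeated).
const pairs: Pair[] = [
  ["a", "x"],
  ["a", "x"],
  ["b", "y"],
  ["b", "z"],
];
console.log(density(pairs, 2, 3)); // 0.5
```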
Recipe 3 — Freshness Heatmap (controls × time-buckets)
The filter: field carries a JSONata expression, the only string-typed expression language allowed in the schema, by deliberate Ch 27 choice. JSONata is JSON-native, declarative, and substrate-neutral (jsonata.org).
Recipe 4 — Ontology Overlap (concepts in A ∩ B via equivalent_to closure)
This uses axisSelector.source: closure, the only way to express “follow this predicate from this start node up to N hops”. It compiles to closureFromConcept('cis-v8:CIS-Controls-Root', { predicate: 'skos:exactMatch', maxDepth: 1 }).
Recipe 5 — SKOS Subject Density (broader/narrower hierarchy with leaf counts)
Per W3C SKOS reference (w3.org/TR/skos-reference/), skos:broader/skos:narrower are not transitive themselves; transitivity is provided by skos:broaderTransitive/skos:narrowerTransitive. The recipe author chooses which: setting predicate: skos:narrowerTransitive would walk the entire transitive closure in one step; using skos:narrower with depth: 4 walks four explicit levels. Both are valid, and the loader should not silently substitute one for the other.
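The difference between a depth-limited walk and a full closure shows up as soon as the hierarchy is deeper than the limit. The sketch below is a plain breadth-first walk over an in-memory edge list; it is not Crosswalker's closureFromConcept, and the Edge type, the closureFrom name, and the c0..c5 concept ids are made up for illustration. The unbounded walk over skos:narrower stands in for what a materialized skos:narrowerTransitive would return.

```typescript
interface Edge {
  from: string;
  predicate: string;
  to: string;
}

// Breadth-first walk along one predicate, at most maxDepth hops from start.
// Returns the concepts reached, in discovery order, excluding the start node.
function closureFrom(start: string, edges: Edge[], predicate: string, maxDepth: number): string[] {
  const seen = new Set<string>([start]);
  const reached: string[] = [];
  let frontier: string[] = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const e of edges) {
        if (e.from === node && e.predicate === predicate && !seen.has(e.to)) {
          seen.add(e.to);
          next.push(e.to);
          reached.push(e.to);
        }
      }
    }
    frontier = next;
  }
  return reached;
}

// A five-level chain: c0 -> c1 -> c2 -> c3 -> c4 -> c5.
const chain: Edge[] = [0, 1, 2, 3, 4].map((i) => ({
  from: `c${i}`,
  predicate: "skos:narrower",
  to: `c${i + 1}`,
}));

console.log(closureFrom("c0", chain, "skos:narrower", 4)); // stops at c4
console.log(closureFrom("c0", chain, "skos:narrower", Infinity)); // reaches c5
```

With depth 4 the walk never reaches c5, which is why substituting one predicate for the other would silently change a recipe's result.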
Section 5 — Boundary Verdict: Data-Only vs Code-with-Fences
5.1 The two paths
Data-only (the path I recommend): query: is a typed, declarative tree. No string slot accepts SQL, SPARQL, Datalog, or Bases-DSL. The single string-expression slot (filter:) accepts only JSONata, which is JSON-native and substrate-portable. The engine compiles the tree to whichever substrate is active.
Code-with-fences: query: is a thin wrapper around opaque substrate-specific code blocks — typically sql: |, sparql: |, or bases: |.
5.2 Arguments for code-with-fences (steel-manning)
- Maximal expressiveness. Any query the substrate supports is expressible; no need to wait for the engine to add new aggregation operators.
- Familiar tools. Power users already write SQL/SPARQL. They get autocomplete, linters, syntax highlighting.
- Faster MVP. query: { sql: "SELECT ..." } could ship in v0.1.6 in a day. The data-only schema requires more upfront design.
- Datasette and dbt do this (Datasette canned queries; dbt model bodies). Both are successful.
- Escape hatch. Even if 95% of recipes are declarative, the 5% that need it have a fallback.
5.3 Arguments for data-only
- Substrate neutrality is non-negotiable per the brief’s anti-pattern #4 (must not couple to sqlite-wasm). Code-with-fences forces a substrate choice into the recipe author’s hands.
- Ch 27 explicitly rejects embedding raw SQL in recipe bodies. The Ch 31 brief inherits this constraint.
- Three-way merge (Ch 28 settled item #10) is infeasible on opaque code strings. You cannot meaningfully merge SELECT * FROM x WHERE y = 1 with SELECT y, count(*) FROM x WHERE z != 0 without parsing both. With a typed tree, merge is a tree diff.
- Validation is meaningful. The schema can reject cell.op: median when median isn’t supported, and the loader can warn when rows.id references a non-loaded ontology. With opaque SQL, validation is “did the SQL parse?”
- Portability across mechanisms is the entire point of the v0.1.6 → v0.1.7 transition. The same recipe must drive a .base view and a codeblock; that’s only possible if query: is data.
- Precedent: MetricFlow, Cube.dev, LookML. The serious semantic-layer tools all chose data-only (LookML allows sql: snippets, but those are field-level, not query-level, and LookML’s design has been retroactively criticized for that compromise; Cube.dev and MetricFlow learned from it).
- GraphQL’s lesson: typed selection > stringly-typed query bodies.
- OBO Foundry’s lesson: verbs over snippets. ROBOT YAML chains named operations; SPARQL files are referenced, not inlined.
- Author UX is better. cell.op: count is shorter than sql: "SELECT COUNT(*) FROM ...", and harder to typo.
- Anti-pattern #7 (schema so rich authors can’t hand-author YAML) is best avoided by fewer slots, each strongly typed. Code-with-fences seems “simpler” but actually enlarges the schema by adding hidden complexity (which substrate? which dialect? which version?).
5.4 Final recommendation: data-only, with one narrowly-scoped escape
query: is data. The single string-typed slot is filter:, which carries JSONata; JSONata is itself declarative and JSON-native, and is not a substrate-specific code language. JSONata is the same boundary Knative’s EventTransform CRD chose (knative.dev/docs/eventing/transforms/event-transform-jsonata/), the same boundary AWS Step Functions ASL chose (docs.aws.amazon.com/step-functions/.../transforming-data.html), and the same one Truto chose for their integrations DSL (truto.one/blog), all for the same reasons we choose it: portable across substrates, declarative, JSON-native, well-specified (currently v2.0.6).
Explicitly forbidden in v1.0: sql:, sparql:, bases:, datalog:, js:, python:, or any other substrate-specific string slot at any level of the query: tree. If a user needs more expressiveness than the schema provides, the path is to (a) raise an issue requesting a new aggregation operator or shape, (b) write a Tier 2 helper, or (c) use the v0.1.7 body: codeblock with their own implementation — but then they own the portability cost.
Section 6 — Composition with Ch 28 Lifecycle (user_edited, three-way merge, provenance)
Ch 28 settled item #10 commits Crosswalker to: schema validation + provenance + user_edited: true + three-way merge. The query: block must compose with that.
6.1 user_edited: true semantics
- Granularity: user_edited applies to the whole query: sub-tree, not to individual primitives. If the user touches cell.op, the whole block is “edited”.
- Detection: on save, the loader compares the in-memory query: to its system base (the version produced by the recipe generator with provenance.source: system). Any structural difference flips the flag.
- Persistence: user_edited: true is persisted inside the query: block (not at recipe top-level) so that emission-grammar edits (folder/file paths) and query edits are tracked separately. This matters because most recipes today edit emission paths but never query bodies, and we don’t want a false-positive merge conflict.
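The detection rule above amounts to a structural comparison that ignores key order. A minimal sketch follows; canonical, withoutBookkeeping, and isUserEdited are hypothetical names, and leaving user_edited and provenance out of the comparison is an assumption, made so that setting the flag does not itself register as an edit.

```typescript
// Serialize a JSON-like value with object keys sorted, so key order never
// counts as a structural difference.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonical(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonical(obj[key]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Assumption: these top-level bookkeeping keys are excluded from the comparison.
const BOOKKEEPING_KEYS = ["user_edited", "provenance"];

function withoutBookkeeping(query: Record<string, unknown>): Record<string, unknown> {
  const copy: Record<string, unknown> = {};
  for (const key of Object.keys(query)) {
    if (BOOKKEEPING_KEYS.indexOf(key) === -1) copy[key] = query[key];
  }
  return copy;
}

// True when the in-memory query: block differs structurally from its system base.
function isUserEdited(
  current: Record<string, unknown>,
  systemBase: Record<string, unknown>,
): boolean {
  return canonical(withoutBookkeeping(current)) !== canonical(withoutBookkeeping(systemBase));
}

const systemBase = {
  shape: "pivot",
  primitives: { cell: { op: "count", edge: "equivalent_to" } },
  provenance: { source: "system" },
};
const reordered = {
  primitives: { cell: { edge: "equivalent_to", op: "count" } },
  shape: "pivot",
};
const edited = {
  shape: "pivot",
  primitives: { cell: { op: "count_distinct", edge: "equivalent_to" } },
};

console.log(isUserEdited(reordered, systemBase)); // false
console.log(isUserEdited(edited, systemBase)); // true
```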
6.2 Three-way merge on query:
The merge inputs are:
- Base: the system-generated query: block at last regeneration (stored in a .crosswalker/lock/ file, similar to a package lockfile).
- User: the current on-disk query: block (possibly edited).
- Incoming: the freshly-generated query: block (e.g., after a community PR updates the recipe template, or after the user changes inputs).
The merge algorithm is a typed tree diff, not a textual diff. For each leaf path (e.g., primitives.rows.id, primitives.cell.op, output.target_path):
- If base == user == incoming: no change. Take any.
- If base == user != incoming: the user has not touched the leaf; take incoming (auto-merge).
- If base != user and either user == incoming or base == incoming: no conflict. The user either already matches incoming or is the only side that changed the leaf; keep the user’s value.
- If base, user, and incoming are pairwise different: conflict. Surface it to the user with the typed diff ("primitives.cell.op: base=count, user=count_distinct, incoming=density; choose one").
This is feasible only because query: is typed data (Section 5 verdict). With opaque SQL strings, step 4 becomes “show the user two SQL snippets and pray”.
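A leaf-by-leaf version of this merge can be sketched in a few lines. The names flatten and mergeQuery are hypothetical, the sketch assumes scalar leaves and that all three trees share the same leaf paths, and it treats the case where only the user changed a leaf (base == incoming != user) as a non-conflict that keeps the user's value, which is the usual three-way-merge behaviour.

```typescript
type Leaf = string | number | boolean | null;
interface Tree {
  [key: string]: Tree | Leaf;
}

// Flatten a typed tree into dotted leaf paths, e.g. "primitives.cell.op".
function flatten(tree: Tree, prefix: string = "", out: Record<string, Leaf> = {}): Record<string, Leaf> {
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const path = prefix === "" ? key : prefix + "." + key;
    if (value === null || typeof value !== "object") {
      out[path] = value;
    } else {
      flatten(value, path, out);
    }
  }
  return out;
}

interface MergeResult {
  merged: Record<string, Leaf>;
  conflicts: string[];
}

function mergeQuery(base: Tree, user: Tree, incoming: Tree): MergeResult {
  const b = flatten(base);
  const u = flatten(user);
  const inc = flatten(incoming);
  const allPaths = Object.keys(b).concat(Object.keys(u), Object.keys(inc));
  const paths = Array.from(new Set(allPaths)).sort();
  const result: MergeResult = { merged: {}, conflicts: [] };
  for (const path of paths) {
    const bv = b[path];
    const uv = u[path];
    const iv = inc[path];
    if (uv === iv) result.merged[path] = uv; // nothing to reconcile
    else if (bv === uv) result.merged[path] = iv; // only incoming changed
    else if (bv === iv) result.merged[path] = uv; // only the user changed
    else result.conflicts.push(`${path}: base=${bv}, user=${uv}, incoming=${iv}`);
  }
  return result;
}

const baseTree: Tree = {
  primitives: { rows: { id: "nist-csf" }, cell: { op: "count" } },
  output: { target_path: "views/a.base" },
};
const userTree: Tree = {
  primitives: { rows: { id: "cis-v8" }, cell: { op: "count_distinct" } },
  output: { target_path: "views/a.base" },
};
const incomingTree: Tree = {
  primitives: { rows: { id: "nist-csf" }, cell: { op: "density" } },
  output: { target_path: "views/b.base" },
};

console.log(mergeQuery(baseTree, userTree, incomingTree));
```

In the sample, output.target_path auto-merges to the incoming value, primitives.rows.id keeps the user's edit, and primitives.cell.op is reported as the single conflict.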
6.3 provenance.source semantics for queries
- source: system. Generated by the recipe generator from a template; the default for fresh recipes.
- source: user. The user authored the query: block by hand or edited a system-generated one. Set the first time user_edited: true is set.
- source: community. Pulled in from a community recipe pack (e.g., a marketplace contribution). Treated like system for merge purposes (community PRs are upstream), but flagged in the UI for review.
Provenance lives inside the query: block (not just at recipe top-level) because a recipe’s emission grammar may be system-generated while its query is user-authored, and conflating them loses signal.
Section 7 — Validation Against the 7 Candidate Primitives (Ch 29) and 6 Shapes (Ch 30)
The brief states there are 7 candidate query primitives (pending Ch 29 finalization) and 6 candidate view shapes. Without access to the specific Crosswalker pages (the in-repo agent-context pages are not externally fetchable as of this research), I infer the 7 primitives from Tier 2 helpers and the brief’s own pivot example as: (1) ontology-set (whole ontology as an axis), (2) closure (predicate-walked from a start), (3) concept-list (explicit set), (4) edge-predicate (typed link), (5) field-selector (metadata path), (6) aggregation-op (cell/leaf/bucket reducer), (7) filter-expr (JSONata predicate). Each of these is a $ref in the schema’s $defs (ontologyId, axisSelector covering closure/concepts/ontology, edgePredicate, fieldSelector, aggOp, filterExpr, cellSpec).
The shape × primitive mapping enforced by the schema:
| Shape | Required primitives | Optional primitives |
|---|---|---|
| table | source, columns | filter, sort |
| list | source, label | filter, sort |
| pivot | rows, cols, cell | empty_cell (top-level) |
| graph | nodes, edges | start, max_depth |
| hierarchy | root, predicate | depth, leaf_aggregation |
| timeline | axis, event_source | bucket_aggregation |
Every primitive in the inferred 7-primitive vocabulary is exercised by at least one recipe in §4. If the actual Ch 29 finalized list differs, the schema can absorb the rename via a PATCH bump (renaming a $defs entry while preserving JSON shape) or a MINOR bump (adding a new primitive type).
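The required column of the shape × primitive table can be read as a lookup that a loader checks before compiling. This is an illustrative sketch only: REQUIRED_PRIMITIVES copies the table above, while the missingPrimitives name is hypothetical and the real enforcement is meant to live in the JSON Schema's if/then branches.

```typescript
// Required primitives per shape, copied from the table above.
const REQUIRED_PRIMITIVES: Record<string, readonly string[]> = {
  table: ["source", "columns"],
  list: ["source", "label"],
  pivot: ["rows", "cols", "cell"],
  graph: ["nodes", "edges"],
  hierarchy: ["root", "predicate"],
  timeline: ["axis", "event_source"],
};

// Return the required primitive keys that a primitives object lacks.
function missingPrimitives(shape: string, primitives: Record<string, unknown>): string[] {
  const required = REQUIRED_PRIMITIVES[shape];
  if (required === undefined) throw new Error(`unknown shape: ${shape}`);
  return required.filter((key) => !Object.prototype.hasOwnProperty.call(primitives, key));
}

console.log(missingPrimitives("pivot", { rows: { id: "nist-csf" } })); // cols and cell are missing
console.log(missingPrimitives("graph", { nodes: {}, edges: {}, start: {} })); // nothing missing
```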
Recommended Changes — Exact YAML/JSON Fragments to Add
A. spec/recipe.schema.json — top-level addition
Add a query property, a $ref to the new spec/recipe.query.schema.json file described in B, to the existing recipe schema, after the current emission grammar fields (folder/file/heading/tag/wikilink). It is purely additive; no existing field is changed.
B. spec/recipe.query.schema.json — new file
The complete schema in §2.2 above, written as a sibling file under spec/. Importing it via $ref keeps the main recipe schema scannable.
C. spec/recipe.schema.json — top-level additionalProperties posture
Confirm additionalProperties: true at the root of the recipe schema (it almost certainly is already). This is what makes v0.1.7’s body: field forward-compatible without a MAJOR bump.
D. Loader changes
- AJV instance configured with strict: false, allErrors: true, discriminator: false (we use if/then, not OpenAPI’s discriminator).
- Error formatter that converts AJV’s per-branch errors into "Recipe.query: shape='<X>' but primitives is missing required keys: ..."; AJV’s raw output is unhelpful here.
- Hook into the existing recipe lifecycle to (a) validate query: on load, (b) compute user_edited on save, (c) compile to Tier 2 helpers (crosswalkBetween for pivot/list when both axes are ontologies, closureFromConcept for hierarchy and closure-axisSelectors, getConceptsByOntology for whole-ontology axes).
E. Tier 2 helpers — small additions
The current Tier 2 helpers (crosswalkBetween, closureFromConcept, getConceptsByOntology) cover most of the query compilation. Add two more to fully cover the schema:
- densityBetween(ontologyA, ontologyB, edge): wraps crosswalkBetween and divides by |A| × |B|.
- bucketEvents(events, axisField, granularity): covers the timeline shape.
Both can be small wrappers; they do not require a new substrate.
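As a rough illustration of how small bucketEvents could be, the sketch below counts events per time bucket. Only the parameter list (events, axisField, granularity) comes from this document; the event record shape, the three granularities, the assumption that the axis field holds ISO-8601 date strings, and the choice to return counts are all guesses made for the example.

```typescript
type Granularity = "year" | "month" | "day";

// Number of leading characters of an ISO-8601 date that identify each bucket:
// "2026", "2026-01", "2026-01-03".
const PREFIX_LENGTH: Record<Granularity, number> = { year: 4, month: 7, day: 10 };

// Count events per bucket. Assumption: event[axisField] is an ISO-8601 date string.
function bucketEvents(
  events: Array<Record<string, string>>,
  axisField: string,
  granularity: Granularity,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    const stamp = event[axisField];
    if (stamp === undefined) continue; // skip events that lack the axis field
    const bucket = stamp.slice(0, PREFIX_LENGTH[granularity]);
    counts[bucket] = (counts[bucket] || 0) + 1;
  }
  return counts;
}

// Hypothetical review events for three controls.
const reviews = [
  { control: "AC-1", reviewed: "2026-01-03" },
  { control: "AC-2", reviewed: "2026-01-20" },
  { control: "AU-1", reviewed: "2026-02-01" },
];

console.log(bucketEvents(reviews, "reviewed", "month"));
```

Slicing the string avoids constructing Date objects, so the buckets do not shift with the local time zone.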
F. Documentation
Add a docs/spec/query-block.md page with:
- The shape decision tree (when to use pivot vs hierarchy vs graph).
- The seven primitives and their $defs names.
- The five reference recipes from §4.
- The lifecycle + merge story from §6.
- An explicit note that sql: / sparql: / bases: slots are not supported and why.
Recommendations (staged, with thresholds)
Stage 1 — Lock the schema (this week, blocking D8 in v0.1.6):
- Land spec/recipe.query.schema.json exactly as in §2.2.
- Land the query: { $ref: ... } addition to spec/recipe.schema.json.
- Add AJV validation in the loader; reject recipes where query: is present but invalid.
- Threshold to revisit: if more than 20% of early-alpha recipe authors hit the “schema too rich” wall and ask for sql:, escalate to a formal RFC. Until then, hold the line.
Stage 2 — Wire Tier 2 compilation (v0.1.6 milestone):
- Implement the pivot path: query.shape == 'pivot' → crosswalkBetween → Bases pivot view emission to output.target_path.
- Implement the list and table paths next (low-effort).
- Defer graph/hierarchy/timeline compilation to Stage 3; the schema accepts them, but the engine can stub them out with a “not yet implemented” warning.
- Threshold to revisit: if a Bases-DSL feature is missing that blocks pivot rendering, document it in Ch 33+ but do not change the recipe schema.
Stage 3 — Lifecycle integration (v0.1.6 → v0.1.7):
- Implement user_edited detection on save (typed tree compare against .crosswalker/lock/).
- Implement three-way merge on query: per §6.2; surface conflicts in the UI.
- Threshold to revisit: if conflict rate is > 5% of recipe regenerations, reduce merge granularity to “whole query: block” (the v1.0 simplifying rule).
Stage 4 — Codeblock processor (v0.1.7):
- Add the body: sibling field per §3.2; do not modify query:.
- Allow recipes to carry both query: and body:, with body[].codeblock.query_ref: "#" referencing the sibling query:.
- Threshold to revisit: if codeblock authors demand inline overrides of query: primitives, add a body[].codeblock.query_overlay: { ... } field, but reject any proposal to embed substrate code there.
Stage 5 — Plugin-registered aggregations (v1.1+):
- Open up aggOp to plugin registration; the aggregations: top-level slot was reserved for this in §2.2.
- Threshold to revisit: only when at least 3 community recipes have asked for the same custom op.
Caveats
- The Crosswalker in-repo pages were not externally fetchable during this research (cybersader.github.io/crosswalker/agent-context/... returned permissions errors via the available web fetch tool). The 7 query primitives, 6 view shapes, Ch 27/28/29/30 settled items, and the Tier 2 helper signatures used in this deliverable are inferred from the brief’s text and from the broader ecosystem patterns documented in §1. If the finalized Ch 29 primitives or Ch 30 shapes differ, treat this schema as a draft and apply the version-bump rules in §3 rather than a rewrite; the $defs structure is intentionally factored to absorb name-level changes via PATCH bumps.
- JSON Schema discriminator semantics: AJV supports if/then/else natively but its error messages on branch failures are not friendly out of the box. The loader needs a wrapper (§D, item 2). An alternative would be OpenAPI’s discriminator keyword (also supported by AJV), but it is non-standard JSON Schema and was rejected for portability.
- JSONata as the only expression language is a deliberate narrowing. If at any point the team decides to allow Bases-DSL expressions in filter:, it should be a MAJOR bump because it changes substrate portability semantics, which is exactly the trade-off Section 5 argues against.
- The density aggregation operator is Crosswalker-specific (not a SQL primitive). Documenting it explicitly is important; it is the one place where our aggOp enum diverges from LookML/Cube.dev/MetricFlow.
- Three-way merge granularity is set to “whole block” in v1.0 (§3.4, §6.2). This is a simplification that will produce false-positive conflicts when both the user and an incoming community PR edit different leaves of primitives. Tracking false-positive rate is a v1.1 concern; the schema does not need to change, only the merge engine.
- Schema versioning string format (SemVer dots vs SchemaVer hyphens): Snowplow’s SchemaVer (snowplow.io/blog/introducing-schemaver-…) explicitly uses hyphens (1-0-0) to visually distinguish from SemVer. We chose SemVer dots because Crosswalker’s tooling ecosystem (npm, dbt, etc.) expects dots, and the visual cue is less important here than ecosystem fit. The trade-off is documented; if it bites, switching to hyphens is a PATCH bump (regex change).
- Forward-compat may be over-permissive: additionalProperties: true at the root means typos in top-level keys (querry: instead of query:) silently pass. This is the cost of forward-compat. A linter pass (separate from the schema) should warn on suspiciously close key names.
- The brief’s example uses cell: { op: count, edge: equivalent_to }, which is directly compatible with the schema (cellSpec def). No changes from the brief’s sketch were forced, only formalized.
- Sources surveyed in §1 vary in maturity. dbt MetricFlow and Cube.dev are mature semantic layers with stable specs; LookML is mature but proprietary; SPARQL is a W3C Rec; GraphQL is a stable spec. Datasette’s metadata.yaml is (per its maintainer’s own GitHub issue #2143) explicitly being de-tangled because it became a “kitchen sink”, a useful warning we apply by keeping our schema scope tight. Obsidian Bases is in early beta and its YAML format is still evolving; we should expect to revisit the output.base_view integration when Bases ships its plugin API.
- “Recipe as data” is not free: it ties Crosswalker to an in-house compiler that must keep pace with substrate features. This is an explicit trade vs. dbt-style “thin metadata around SQL”. The verdict in §5 is that for an ontology crosswalk tool, where the substrate is currently Bases (with sqlite-wasm and codeblocks coming), the portability win is worth the compiler cost. If Crosswalker ever consolidates on a single substrate permanently, this trade-off should be revisited.
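The linter pass suggested in the forward-compat caveat (warning on keys such as querry:) could be as simple as an edit-distance check against the known top-level keys. The sketch below is one possible approach, not a committed design: the function names, the warning text, the distance threshold of 2, and the example key list are all assumptions.

```typescript
// Classic Levenshtein edit distance, computed row by row.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Warn about unknown keys that sit within maxDistance edits of a known key.
function suspiciousKeys(actual: string[], known: string[], maxDistance: number = 2): string[] {
  const warnings: string[] = [];
  for (const key of actual) {
    if (known.indexOf(key) !== -1) continue; // exact match, nothing to report
    for (const candidate of known) {
      if (editDistance(key, candidate) <= maxDistance) {
        warnings.push(`unknown key "${key}": did you mean "${candidate}"?`);
        break;
      }
    }
  }
  return warnings;
}

// Hypothetical known-key list; the real recipe schema defines the actual set.
const knownKeys = ["id", "inputs", "query", "body"];
console.log(suspiciousKeys(["id", "querry", "notes"], knownKeys));
```

Keys that are far from every known name (like notes here) pass silently, which preserves the forward-compat behaviour of additionalProperties: true.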