Challenge 31: Recipe `query:` block schema design — how recipes declare what to query in YAML
Why this exists
Crosswalker’s recipe schema today (`spec/recipe.schema.json`) describes emission: what to import and how to write it to disk through the five mechanisms of the Ch 22 grammar (folder, file, heading, tag, wikilink). It does NOT describe queries.
For v0.1.6 we ship a custom Bases view (`crosswalkerPivot`) parameterized by the recipe. That parameterization needs a YAML structure: which axes to pivot, which edge predicate to count, and what cell aggregation to use. Without a schema, recipe authors will invent ad-hoc options that break later.
What recipe authors might write (sketch — Ch 31 will lock the actual structure):
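To make the parameters above concrete, here is one hypothetical way a parsed `query:` block for `crosswalkerPivot` could be typed in TypeScript. Every field name (`rows`, `cols`, `cell`, `aggregate`, `empty`, `output.view`) and both ontology IDs are illustrative assumptions. This is not the structure Ch 31 will lock.

```typescript
// Hypothetical typing of a parsed recipe with a `query:` block.
// Field names and ontology IDs are illustrative assumptions, not the locked Ch 31 schema.
type Aggregation = "count" | "sum" | "avg" | "min" | "max" | "count_distinct";
type EmptyCell = "gap" | "blank" | "zero";

interface PivotQuery {
  shape: "pivot";
  primitives: {
    rows: { ontology: string }; // concepts laid out on the row axis
    cols: { ontology: string }; // concepts laid out on the column axis
    cell: { predicate: string; aggregate: Aggregation; empty: EmptyCell };
  };
  output: { view: string }; // where the result lands
}

interface Recipe {
  schema_version: string;
  name: string;
  query?: PivotQuery; // optional: recipes without `query:` keep working
}

// One-line summary of what a recipe asks the view to compute.
function summarize(recipe: Recipe): string {
  if (!recipe.query) return `${recipe.name}: emission only`;
  const { rows, cols, cell } = recipe.query.primitives;
  return `${recipe.name}: ${rows.ontology} x ${cols.ontology}, ` +
    `${cell.aggregate}(${cell.predicate}) -> ${recipe.query.output.view}`;
}

const coverageMatrix: Recipe = {
  schema_version: "1.0.0",
  name: "coverage-matrix",
  query: {
    shape: "pivot",
    primitives: {
      rows: { ontology: "nist-800-53" },
      cols: { ontology: "nist-csf" },
      cell: { predicate: "equivalent_to", aggregate: "count", empty: "gap" },
    },
    output: { view: "crosswalkerPivot" },
  },
};

console.log(summarize(coverageMatrix));
```

Keeping `query` optional on the recipe type mirrors the additive requirement: a recipe without a `query:` block still type-checks.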
Ch 31 designs the JSON Schema that validates this YAML, and writes 3-5 reference recipes that exercise it.
What we already have
| Asset | What it gives us |
|---|---|
| `spec/recipe.schema.json` | Existing emission grammar; the `query:` block is additive and must not break existing recipes |
| `concepts/query-primitives` | Layer A vocabulary that recipes compose from |
| `concepts/view-shapes` | Layer B vocabulary that recipes select |
| Tier 2 query helpers (`src/tier2/queries.ts`) | `crosswalkBetween`, `closureFromConcept`, `getConceptsByOntology`: the executable surface recipes invoke |
| Synthesis log D8 | “v0.1.6 additive bump” lean, confirmed pending this design |
What to investigate
1. Cross-reference declarative-query schemas in adjacent ecosystems
For each system, document how it lets users declare a query in YAML or config: which fields, which primitives, and where it draws the boundary between query and presentation.
- SPARQL CONSTRUCT queries: how does SPARQL let you declare “what to fetch” (the WHERE pattern) and “what to return” (the graph template)?
- GraphQL query schemas: typed selection sets with field-level args
- dbt model YAML (https://docs.getdbt.com/docs/build/models): the `models/*.sql` + `*.yml` schema; how do dbt models describe queries declaratively?
- Looker LookML (https://cloud.google.com/looker/docs/lookml-quick-reference): semantic layer with `view`/`explore`/`measure`/`dimension`
- Cube.dev semantic layer (https://cube.dev/docs/): JS/YAML semantic layer with measures/dimensions
- MetricFlow (dbt’s semantic layer): semantic models with measures + dimensions
- Datasette metadata.yaml: declarative views + canned queries
- ROBOT YAML configs (OBO Foundry tooling): how do biomedical-ontology pipelines declare queries?
2. Design the JSON Schema for the `query:` block
Concrete design pass. Specify:
- Top-level structure: `shape` (enum) + `primitives` (object; structure depends on shape) + `output` (where the result lands)
- `primitives` per shape: pivot has `rows`/`cols`/`cell`; graph has `nodes`/`edges`/`start`; hierarchy has `root`/`predicate`/`depth`; timeline has `axis`/`event-source`
- Type system: what can a value reference? An ontology ID? A concept CURIE? A field selector? An edge predicate?
- Aggregation operators: count, sum, avg, min, max, count_distinct. Extensible? Plugin-registered?
- Empty-cell semantics: gap vs blank vs zero — first-class field
- Validation rules: which combinations of shape + primitives are valid?
Output: a complete JSON Schema (draft 2020-12) for the additive query: block.
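The validation rules above can be sketched in TypeScript as two pieces: a table of allowed primitive keys per shape, and a registry of aggregation operators.

- The per-shape keys come from this brief.
- Several choices are assumptions for illustration: treating table and list as having no primitives yet, the strict "exactly these keys" rule, and the `registerAggregator` plugin hook.
- In the draft 2020-12 schema itself, the same rules would likely become `if`/`then` branches keyed on `shape`.

```typescript
type Shape = "table" | "list" | "pivot" | "graph" | "hierarchy" | "timeline";

// Primitive keys per shape as listed in the brief. Table and list primitives are
// not specified there yet, so this sketch leaves their lists empty (assumption).
const PRIMITIVES_BY_SHAPE: Record<Shape, readonly string[]> = {
  table: [],
  list: [],
  pivot: ["rows", "cols", "cell"],
  graph: ["nodes", "edges", "start"],
  hierarchy: ["root", "predicate", "depth"],
  timeline: ["axis", "event-source"],
};

// Hypothetical plugin registry: built-in operators plus anything registered later.
const aggregators = new Set<string>(["count", "sum", "avg", "min", "max", "count_distinct"]);

function registerAggregator(opName: string): void {
  aggregators.add(opName);
}

interface RawQuery {
  shape: string;
  primitives: Record<string, unknown>;
}

function isShape(s: string): s is Shape {
  return Object.prototype.hasOwnProperty.call(PRIMITIVES_BY_SHAPE, s);
}

// Returns human-readable errors; an empty list means the shape/primitives combination is valid.
function validateQuery(q: RawQuery): string[] {
  if (!isShape(q.shape)) return [`unknown shape "${q.shape}"`];
  const allowed = PRIMITIVES_BY_SHAPE[q.shape];
  const errors: string[] = [];
  for (const key of allowed) {
    if (!(key in q.primitives)) errors.push(`${q.shape}: missing primitive "${key}"`);
  }
  for (const key of Object.keys(q.primitives)) {
    if (allowed.indexOf(key) === -1) errors.push(`${q.shape}: primitive "${key}" not allowed`);
  }
  // Only pivot cells carry an aggregation in this sketch.
  const cell = q.primitives["cell"] as { aggregate?: string } | undefined;
  if (q.shape === "pivot" && cell?.aggregate !== undefined && !aggregators.has(cell.aggregate)) {
    errors.push(`pivot: unknown aggregate "${cell.aggregate}"`);
  }
  return errors;
}
```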
3. Versioning and forward-compatibility
The `query:` block is an additive bump to the recipe schema. Specify:
- What is the schema version field? (`schema_version: 1.0.0`, SemVer)
- What is the migration path when the v0.1.7 codeblock processor adds new fields?
- Can a recipe specify both a `query:` block AND a `body:` template (where the body has codeblock invocations)? Or are they exclusive?
- How does the loader handle unknown fields (forward-compat)?
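A minimal sketch of how a loader might answer the `schema_version` and unknown-field questions. It rests on two assumptions:

- A conventional SemVer policy: the same major version loads, a newer minor loads with a warning, and a different major is refused.
- Unknown top-level fields are preserved rather than dropped.

`loadRecipe`, `SUPPORTED_VERSION`, and the known-key list are hypothetical.

```typescript
// Assumed policy (standard SemVer reading): same major = compatible,
// newer minor = load with a warning, different major = refuse.
const SUPPORTED_VERSION = { major: 1, minor: 0 }; // hypothetical loader capability
// Illustrative subset of top-level recipe keys this loader understands.
const KNOWN_TOP_LEVEL = new Set(["schema_version", "name", "query"]);

interface LoadResult {
  ok: boolean;
  warnings: string[];
  unknown: Record<string, unknown>; // preserved verbatim so a save does not drop them
}

// Parses MAJOR.MINOR.PATCH; pre-release and build metadata are not handled in this sketch.
function parseSemVer(v: string): [number, number, number] | null {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

function loadRecipe(raw: Record<string, unknown>): LoadResult {
  const unknown: Record<string, unknown> = {};
  const sv = raw["schema_version"];
  const version = typeof sv === "string" ? parseSemVer(sv) : null;
  if (version === null) {
    return { ok: false, warnings: ["missing or malformed schema_version"], unknown };
  }
  const [major, minor] = version;
  if (major !== SUPPORTED_VERSION.major) {
    return { ok: false, warnings: [`unsupported major version ${major}`], unknown };
  }
  const warnings: string[] = [];
  if (minor > SUPPORTED_VERSION.minor) {
    warnings.push(
      `schema ${major}.${minor} is newer than loader ${SUPPORTED_VERSION.major}.${SUPPORTED_VERSION.minor}`,
    );
  }
  for (const key of Object.keys(raw)) {
    if (!KNOWN_TOP_LEVEL.has(key)) {
      unknown[key] = raw[key];
      warnings.push(`unknown field "${key}" preserved`);
    }
  }
  return { ok: true, warnings, unknown };
}
```

Preserving unknown fields, rather than stripping them, would let a recipe written for v0.1.7 survive a round-trip through a v0.1.6 loader.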
4. Reference recipes (3-5 worked examples)
Write actual .yaml recipes that exercise the schema:
- Coverage Matrix (compliance launch recipe) — NIST 800-53 × NIST CSF, cells = count of equivalent_to edges
- Crosswalk Density (cross-domain) — any 2 ontologies; cells = density of mappings (covered count / cross-product)
- Freshness Heatmap (compliance) — controls × time-buckets; cells = count of evidence reviewed in bucket
- Ontology Overlap (cross-domain) — concepts in ontology A ∩ ontology B (via equivalent_to closure)
- SKOS Subject Density (taxonomy) — broader-narrower hierarchy with leaf counts
Each recipe should validate against the JSON Schema and produce a working `.base` file when run through the recipe loader.
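The cell arithmetic behind the Coverage Matrix and Crosswalk Density recipes can be sketched as follows, together with one possible encoding of the gap/blank/zero empty-cell choice. The sketch makes three assumptions:

- Edges are stored from row concept to column concept. A symmetric `equivalent_to` would need both directions checked.
- "Covered" means at least one edge of the given predicate.
- The concept IDs in any usage are placeholders.

```typescript
type EmptyPolicy = "gap" | "blank" | "zero";
// Assumed encodings: null marks a gap (uncovered pair worth highlighting),
// "" renders as a blank cell, and 0 is a literal zero count.
type Cell = number | "" | null;

interface Edge {
  from: string; // concept on the row axis (direction is an assumption)
  to: string; // concept on the column axis
  predicate: string;
}

const pairKey = (a: string, b: string): string => `${a}\u0000${b}`;

// Coverage Matrix: cell = number of `predicate` edges from the row concept to the column concept.
function pivotCounts(rows: string[], cols: string[], edges: Edge[], predicate: string, empty: EmptyPolicy): Cell[][] {
  const counts = new Map<string, number>();
  for (const e of edges) {
    if (e.predicate !== predicate) continue;
    const k = pairKey(e.from, e.to);
    counts.set(k, (counts.get(k) ?? 0) + 1);
  }
  const emptyValue: Cell = empty === "zero" ? 0 : empty === "blank" ? "" : null;
  return rows.map((r) => cols.map((c): Cell => counts.get(pairKey(r, c)) ?? emptyValue));
}

// Crosswalk Density: covered (row, col) pairs divided by the full cross-product.
function crosswalkDensity(rows: string[], cols: string[], edges: Edge[], predicate: string): number {
  const rowSet = new Set(rows);
  const colSet = new Set(cols);
  const covered = new Set<string>();
  for (const e of edges) {
    if (e.predicate === predicate && rowSet.has(e.from) && colSet.has(e.to)) {
      covered.add(pairKey(e.from, e.to));
    }
  }
  const total = rows.length * cols.length;
  return total === 0 ? 0 : covered.size / total;
}
```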
5. Boundary: recipe as data vs recipe as code
A core question: does `query:` declare data (axes, edges, cell operations) that the engine interprets, OR does it declare a snippet of code (an SQL fragment, a Datalog rule)? The two approaches:
- Data-only: recipe lists primitives + their parameters; engine compiles to SQL/SPARQL. Closest to dbt/LookML.
- Code-with-fences: the recipe has a `query.sql` file or inline string. Closest to dbt models with `.sql` content.
Argue the choice. The data-only path keeps recipes portable across mechanisms (Bases vs. SQL vs. codeblock); the code-with-fences path is more flexible but locks each recipe to one mechanism.
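To illustrate why the data-only path stays portable, this sketch compiles one parameter-only pivot spec two ways: to parameterized SQL, and to a direct in-memory evaluation. The `concepts`/`edges` table layouts, the column names, and both function names are hypothetical. Only the `count` aggregation is covered.

```typescript
// A data-only pivot spec: parameters only, no SQL or code (field names hypothetical).
interface PivotSpec {
  rowOntology: string;
  colOntology: string;
  predicate: string; // aggregation is implicitly `count` in this sketch
}

interface Concept { id: string; ontology: string; }
interface Edge { src: string; dst: string; predicate: string; }

// Backend 1: compile to parameterized SQL against hypothetical `concepts` and `edges` tables.
function compileToSql(q: PivotSpec): { sql: string; params: string[] } {
  const sql = [
    "SELECT r.id AS row_id, c.id AS col_id, COUNT(e.src) AS cell",
    "FROM concepts r CROSS JOIN concepts c",
    "LEFT JOIN edges e ON e.src = r.id AND e.dst = c.id AND e.predicate = ?",
    "WHERE r.ontology = ? AND c.ontology = ?",
    "GROUP BY r.id, c.id",
  ].join("\n");
  // Positional placeholders bind in the order they appear in the statement text.
  return { sql, params: [q.predicate, q.rowOntology, q.colOntology] };
}

// Backend 2: evaluate the same spec directly over in-memory arrays.
function evaluateInMemory(q: PivotSpec, concepts: Concept[], edges: Edge[]): Map<string, number> {
  const rows = concepts.filter((c) => c.ontology === q.rowOntology);
  const cols = concepts.filter((c) => c.ontology === q.colOntology);
  const out = new Map<string, number>();
  for (const r of rows) {
    for (const c of cols) {
      const n = edges.filter((e) => e.src === r.id && e.dst === c.id && e.predicate === q.predicate).length;
      out.set(`${r.id}|${c.id}`, n);
    }
  }
  return out;
}
```

The recipe never carries SQL. Each backend owns the "how", which is the property the data-only option is meant to preserve.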
6. Compatibility with Ch 28 settled items
Settled item #10 in the synthesis log commits to “schema validation + provenance + user_edited:true flag + three-way merge” for the recipe lifecycle. The `query:` block must compose with this lifecycle:
- Does `query:` get the same `user_edited: true` flag treatment?
- How does three-way merge work on a `query:` block (rather than on mapping rows)?
- What does `provenance.source = system | user | community` mean for a query?
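One plausible answer to the three-way-merge question is to merge the `query:` block key by key instead of as an opaque blob. This sketch assumes:

- `base` is the system version the user started from.
- `ours` is the user-edited copy, the one flagged `user_edited: true`.
- `theirs` is the new system version.

Arrays are treated as atomic values. On a genuine clash the sketch keeps the user's value and reports the path. `merge3` is a hypothetical name.

```typescript
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

function isObject(v: Json | undefined): v is { [k: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Structural equality via serialization; key-order sensitive, which is acceptable for a sketch.
function same(a: Json | undefined, b: Json | undefined): boolean {
  return JSON.stringify(a) === JSON.stringify(b);
}

interface MergeResult {
  merged: Json | undefined; // undefined means "key deleted"
  conflicts: string[]; // dotted paths where both sides changed the same leaf differently
}

function merge3(base: Json | undefined, ours: Json | undefined, theirs: Json | undefined, path = "query"): MergeResult {
  if (same(ours, theirs)) return { merged: ours, conflicts: [] };
  if (same(base, ours)) return { merged: theirs, conflicts: [] }; // only the system changed it
  if (same(base, theirs)) return { merged: ours, conflicts: [] }; // only the user changed it
  if (isObject(ours) && isObject(theirs)) {
    const b: { [k: string]: Json } = isObject(base) ? base : {};
    const merged: { [k: string]: Json } = {};
    const conflicts: string[] = [];
    const keys = Array.from(new Set([...Object.keys(b), ...Object.keys(ours), ...Object.keys(theirs)]));
    for (const k of keys) {
      const r = merge3(b[k], ours[k], theirs[k], `${path}.${k}`);
      conflicts.push(...r.conflicts);
      if (r.merged !== undefined) merged[k] = r.merged;
    }
    return { merged, conflicts };
  }
  // Both sides changed a leaf (or array) differently: keep the user's value and flag it.
  return { merged: ours, conflicts: [path] };
}
```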
7. Validate against the 7 candidate primitives (pending Ch 29)
There are currently 7 candidate primitives. The `query:` block must express compositions of all 7. Walk through the shape × required-primitives mapping and show the YAML for each shape.
If Ch 29 revises the primitive set, Ch 31’s schema must absorb the revision before locking.
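Because Ch 29 may revise the primitive set, a mechanical check can flag drift before the schema locks. It verifies that every catalog primitive is composed by at least one shape, and that no shape references an undefined primitive. The seven primitive identifiers below are placeholders, not the real names from `concepts/query-primitives`.

```typescript
type Shape = "table" | "list" | "pivot" | "graph" | "hierarchy" | "timeline";

// Placeholder identifiers: substitute the seven candidate primitives from
// concepts/query-primitives (and any Ch 29 revision) before using this check.
const PRIMITIVE_CATALOG: string[] = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"];

interface CoverageReport {
  unused: string[]; // catalog primitives that no shape composes
  unknown: string[]; // primitives a shape uses that the catalog does not define
}

function coverageReport(catalog: string[], byShape: Record<Shape, string[]>): CoverageReport {
  const used = new Set<string>();
  for (const shape of Object.keys(byShape) as Shape[]) {
    for (const p of byShape[shape]) used.add(p);
  }
  const known = new Set(catalog);
  return {
    unused: catalog.filter((p) => !used.has(p)),
    unknown: Array.from(used).filter((p) => !known.has(p)),
  };
}
```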
Anti-patterns to reject upfront
The deliverable must NOT recommend:
- Embedding raw SQL in the recipe — violates the “no raw SQL in concept-note bodies” anti-pattern (Ch 27). Recipes declare what; the engine compiles to how.
- A custom Crosswalker query language — Ch 27 anti-pattern. JSONata + SQL + Bases-DSL are the existing layers; don’t invent a fourth.
- Schema that only works for the pivot shape — must support all 6 candidate shapes (table / list / pivot / graph / hierarchy / timeline) even if v0.1.6 only ships pivot.
- Tightly coupling `query:` to sqlite-wasm: the schema must be substrate-neutral; the engine compiles per mechanism.
- Reinventing dbt/LookML/Cube.dev: borrow patterns, don’t fork.
- Breaking changes to the existing emission grammar: `query:` is additive; recipes without `query:` continue to work.
- A schema so rich that recipe authors can’t write it by hand: explicit non-goal; recipes are YAML written by humans.
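The raw-SQL anti-pattern can also be enforced mechanically at recipe-load time. This is a heuristic sketch, not a parser. It flags three things:

- keys ending in `sql`;
- string values that begin with an SQL statement keyword;
- references to `.sql` files.

The keyword list and the `findRawSql` name are assumptions. A real linter would need an allow-list for legitimate labels.

```typescript
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

// Heuristic only: statement keywords at the start of a string, or a .sql file reference.
const SQL_STATEMENT = /^\s*(select|insert|update|delete)\b/i;
const SQL_FILE_REF = /\.sql$/i;

// Walks a parsed recipe and returns the paths of values that look like embedded SQL.
function findRawSql(node: Json, path = "recipe", hits: string[] = []): string[] {
  if (typeof node === "string") {
    if (SQL_STATEMENT.test(node) || SQL_FILE_REF.test(node)) hits.push(path);
  } else if (Array.isArray(node)) {
    node.forEach((v, i) => findRawSql(v, `${path}[${i}]`, hits));
  } else if (node !== null && typeof node === "object") {
    for (const k of Object.keys(node)) {
      if (/sql$/i.test(k)) hits.push(`${path}.${k}`); // key itself signals embedded SQL
      else findRawSql(node[k], `${path}.${k}`, hits);
    }
  }
  return hits;
}
```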
Success criteria for the deliverable
The deliverable must produce:
- Cross-reference matrix — 6+ adjacent declarative-query systems (dbt, LookML, Cube, GraphQL, SPARQL CONSTRUCT, Datasette) × design dimensions (declares shape? primitives? aggregation? versioning?)
- Working JSON Schema for the `query:` block (draft 2020-12, validates with AJV)
- 3-5 reference recipes (`.yaml` files) that validate against the schema and exercise all candidate shapes
- Versioning + forward-compat plan: `schema_version` handling, additive-bump path to v0.1.7
- Boundary verdict: data-only vs. code-with-fences argued; final choice
- Lifecycle integration: `user_edited: true`, three-way merge, and provenance fields applied to the `query:` block
- Recommended changes: exact YAML fragments to add to `spec/recipe.schema.json`
Anchored references
Project context:
- `spec/recipe.schema.json`: current emission grammar
- `concepts/query-primitives`: Layer A vocabulary recipes compose
- `concepts/view-shapes`: Layer B shapes recipes select
- v0.1.6 milestone: recipe-schema additive bump task
Standards:
Adjacent declarative-query systems:
- dbt models YAML
- Looker LookML
- Cube.dev
- Datasette metadata.yaml
- MetricFlow semantic models
- ROBOT (OBO Foundry tooling)
Hand-off
Write the deliverable to `docs/.../zz-research/YYYY-MM-DD-challenge-31-deliverable-a-<slug>.md`. Include the full JSON Schema as a fenced code block and the 3-5 reference recipes inline. After the deliverable lands: integrate it into `spec/recipe.schema.json` (additive bump), update the v0.1.6 milestone tasks, flip the Ch 31 row in synthesis log §9 from ⏳ to ✅, and archive this brief.