Challenge 29: Ontology-web query verbs — adversarial validation
Why this exists
Crosswalker’s query engine has a three-layer architecture: primitives (Layer A), view shapes (Layer B), recipes (Layer C). Layer A’s vocabulary is load-bearing: every recipe in the marketplace, every codeblock query users write, and every internal SQL helper composes from it. If we lock in the wrong primitive set, every downstream artifact inherits a shaky vocabulary.
The proposed set (the candidate “LEGO bricks for asking questions about ontology webs”):
| # | Primitive | Plain language | Cross-domain precedent |
|---|---|---|---|
| 1 | filter | Restrict by predicate | SPARQL FILTER, SQL WHERE, Datalog body |
| 2 | project | Choose attributes | SPARQL SELECT, SQL projection |
| 3 | traversal | Hop along an edge (1 step) | SPARQL property path single step, SKOS broader/narrower |
| 4 | closure | Transitive reachability | SPARQL :p+/:p*, Datalog recursion, OWL transitive |
| 5 | anti-join | “X without Y” | SPARQL MINUS, SQL EXCEPT |
| 6 | pivot | 2D crosstab | OLAP cube, pandas pivot_table |
| 7 | aggregate | Count / sum / max / min | SPARQL COUNT, SQL aggregates |
This set was surfaced by the 2026-05-08 alignment review, not by deep adversarial cross-reference. Ch 29 does the cross-reference.
What we already have
| Asset | What it gives us |
|---|---|
| concepts/query-primitives.mdx | The candidate 7-primitive page; cross-domain precedent table; primitive × mechanism matrix |
| Tier 2 query helpers (src/tier2/queries.ts) | getConceptsByOntology, crosswalkBetween, closureFromConcept — 3 of 7 primitives have shipped executable surfaces |
| Synthesis log Settled #14 | Marks the candidate set as pending this challenge |
| Recipe schema EMISSION primitives (Ch 22 synthesis) | Domain-neutral 5-mechanism set (folder/file/heading/tag/wikilink); precedent for closed primitive grammar |
What to investigate
Each section below is a focused question the deliverable must answer concretely.
1. Cross-reference the 7 primitives against established standards
Build a primitive × standard matrix. For each of the 7, document:
- SPARQL (W3C standard for RDF query): which SPARQL constructs map onto each primitive? Are there SPARQL constructs we’re missing?
- Datalog (logic-programming paradigm; Cozo, Nemo, Datomic): how does the primitive express in rule-body form? Stratified negation, recursion, aggregation as rule heads.
- OLAP cube operators (slice, dice, drill-down, roll-up, pivot): which are subsumed by our primitives, which are missing?
- SKOS (W3C, taxonomy): broader/narrower/related/exactMatch — are these traversal, closure, or something else?
- OWL transitive properties + reasoning: what gets us beyond traversal/closure into actual reasoning, and is reasoning in scope?
- SSSOM mapping vocabulary: does its predicate set imply any primitives we’re missing (`narrower_match` aggregation, justification-weighted matches)?
- Cypher / Gremlin / GQL: graph-database query languages; what primitives are central there?
- GraphQL schemas: how does GraphQL’s “select fields, follow connections” map?
Output: a table where each row is a primitive and each column is a standard. Mark “covered”, “approximate”, “missing entirely”. Find the gaps.
2. Find missing primitives
Specific candidate primitives to evaluate:
- diff — “what changed between v1 and v2 of this ontology?” (relevant for ontology evolution; Ch 03 territory)
- rank — relevance scoring, weighted similarity (relevant for embedding-based queries; sqlite-vec territory)
- constraint-satisfy — OWL DL reasoning; “what concepts satisfy this class definition?” (probably out of scope, but argue)
- temporal primitives — snapshot, change-feed, point-in-time (relevant for v0.1.8 audit trail)
- set operations — union, intersection, symmetric difference (compositional or primitive?)
- projection-with-rename — does naming aliases at projection time deserve to be a separate primitive?
- window functions — running totals, partition aggregates (SQL `OVER`, OLAP roll-up)
- constraint propagation — given partial information about a concept, what’s derivable?
For each: argue YES or NO with a concrete decision rule. If YES, reframe as #8/#9/#10. If NO, document the rejection so future agents don’t re-litigate.
3. Find redundant or wrong-abstraction primitives in the candidate set
Specific challenges to the candidate 7:
- project: is project really a primitive, or is it a Layer B concern (view shape config)?
- aggregate: similar question — are aggregates Layer A operations, or Layer B view-config?
- traversal vs closure: should they be one parameterized primitive (`traverse(predicate, depth=1|*)`)?
- pivot: is pivot really an operation, or a composition of (group-by + group-by + aggregate)? If compositional, does it belong as a Layer A primitive?
- anti-join: is anti-join a primitive or a compositional pattern?
Output: an argued list of which primitives stay (or get merged/split), with rationale. The goal is the smallest complete set, not the largest possible set.
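To make the traversal-vs-closure question concrete, the merged form could be sketched as below. This is only an illustration under stated assumptions: the `Edge` shape, the `traverse` signature, and the sample edges are hypothetical and are not taken from `src/tier2/queries.ts` or the shipped `closureFromConcept` helper. The sketch follows "one or more hops" semantics (like SPARQL `:p+`), so the start node is not included unless a cycle leads back to it.

```typescript
// Hypothetical edge shape, for illustration only.
type Edge = { from: string; predicate: string; to: string };

// depth = 1 is a single hop (traversal); depth = "*" is transitive
// reachability (closure). Any other positive integer bounds the hop count.
function traverse(
  edges: Edge[],
  start: string,
  predicate: string,
  depth: number | "*" = 1,
): Set<string> {
  const maxDepth = depth === "*" ? Infinity : depth;
  const reached = new Set<string>();
  let frontier: string[] = [start];

  // Breadth-first expansion; the reached set also guards against cycles.
  for (let hop = 0; hop < maxDepth && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const e of edges) {
        if (e.from === node && e.predicate === predicate && !reached.has(e.to)) {
          reached.add(e.to);
          next.push(e.to);
        }
      }
    }
    frontier = next;
  }
  return reached;
}

// Made-up sample edges.
const sampleEdges: Edge[] = [
  { from: "AC", predicate: "narrower", to: "AC-2" },
  { from: "AC-2", predicate: "narrower", to: "AC-2(1)" },
  { from: "AC", predicate: "related", to: "IA" },
];

const oneHop = traverse(sampleEdges, "AC", "narrower", 1);
const closure = traverse(sampleEdges, "AC", "narrower", "*");
```

Here `oneHop` holds only `AC-2`, while `closure` also reaches `AC-2(1)`. Whether a single parameterized primitive reads better than two named ones is exactly what the deliverable should argue; the sketch only shows that the merge is mechanically possible.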
4. Boundary check: Layer A (primitives) vs Layer B (view shapes)
The synthesis distinguishes Layer A (mechanism-neutral query operations) from Layer B (mechanism-neutral visual presentations). Is the boundary right?
- Is pivot Layer A (a query operation that produces 2D-shaped result) or Layer B (a visual presentation of any tabular result)?
- Is sort Layer A or Layer B? (Both arguably; depends on whether sort affects semantics or just display)
- Is limit / pagination Layer A or Layer B?
- Where does search relevance ranking belong?
Argue the boundary. Provide examples where a primitive placed at the wrong layer makes the architecture awkward.
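One way to probe the pivot question is to write pivot as two group-bys plus an aggregate and see what it returns before any rendering happens. The sketch below is an assumption-laden illustration: `groupBy`, `pivot`, the `Mapping` type, and the sample rows are all hypothetical names and data, not part of Crosswalker.

```typescript
// Group rows into buckets by a key function.
function groupBy<T>(rows: T[], key: (row: T) => string): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const row of rows) {
    const k = key(row);
    const bucket = groups.get(k);
    if (bucket) bucket.push(row);
    else groups.set(k, [row]);
  }
  return groups;
}

// pivot = group by the row key, group each bucket by the column key,
// then aggregate every cell.
function pivot<T>(
  rows: T[],
  rowKey: (row: T) => string,
  colKey: (row: T) => string,
  aggregate: (cell: T[]) => number,
): Map<string, Map<string, number>> {
  const table = new Map<string, Map<string, number>>();
  groupBy(rows, rowKey).forEach((rowBucket, r) => {
    const cells = new Map<string, number>();
    groupBy(rowBucket, colKey).forEach((cellBucket, c) => {
      cells.set(c, aggregate(cellBucket));
    });
    table.set(r, cells);
  });
  return table;
}

// Made-up sample mappings: control family × target framework.
type Mapping = { control: string; family: string; target: string };
const mappings: Mapping[] = [
  { control: "AC-2", family: "AC", target: "ISO 27001" },
  { control: "AC-3", family: "AC", target: "ISO 27001" },
  { control: "AC-2", family: "AC", target: "NIST CSF" },
  { control: "IA-5", family: "IA", target: "NIST CSF" },
];

const crosstab = pivot(mappings, (m) => m.family, (m) => m.target, (cell) => cell.length);
```

The result is a nested map of counts with no display decisions in it, which is an argument for treating the computation as Layer A. The counter-argument is that the same nested map falls out of existing primitives, so only the 2D rendering would be new. The sketch does not settle the placement; it only isolates what is being placed.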
5. Re-audit the candidate primitives against actual Crosswalker queries
Take the 20-question query-routing matrix from Ch 28a §4 and decompose each query into Layer A primitives. Find:
- Queries that need primitives the candidate set lacks
- Queries that decompose into combinations not naturally expressible
- Primitives that are NEVER used in any of the 20 queries (vestigial?)
6. Worked composition examples
Demonstrate that the primitive set is actually compositional. For each of these representative ontology-web queries, write the explicit primitive composition:
- “Coverage gaps: NIST 800-53 controls without evidence” (compliance, GRC)
- “Crosswalk chain: NIST CSF → 800-53 → ISO 27001 (transitive)” (compliance, GRC)
- “MITRE ATT&CK techniques mitigated by NIST AC-family controls” (cross-framework)
- “SKOS broader/narrower closure from a top-level subject heading” (taxonomy)
- “OBO Foundry: gene-ontology terms with no MONDO disease mapping” (biomedical)
- “OLIR: which submitted crosswalks have inconsistent confidence values?” (quality assurance)
- “Ontology version diff: terms removed between OBO 2024-Q1 and 2024-Q2” (ontology evolution; needs `diff` primitive?)
Output: each query as a primitive composition. Where a composition is awkward, that’s evidence for an additional primitive.
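As a shape for what one such decomposition might look like, here is a minimal sketch of the first query (“NIST 800-53 controls without evidence”) as filter, then anti-join, then project. The `Control` and `Evidence` types, the three helper functions, and the sample rows are hypothetical and chosen only for illustration; the real deliverable would express the composition in whatever notation the primitive set settles on.

```typescript
// Hypothetical record shapes.
type Control = { id: string; framework: string };
type Evidence = { controlId: string; note: string };

// filter: restrict rows by a predicate.
function filter<T>(rows: T[], predicate: (row: T) => boolean): T[] {
  return rows.filter(predicate);
}

// anti-join: keep left rows that have no matching right row ("X without Y").
function antiJoin<L, R>(
  left: L[],
  right: R[],
  leftKey: (l: L) => string,
  rightKey: (r: R) => string,
): L[] {
  const matched = new Set<string>();
  for (const r of right) matched.add(rightKey(r));
  return left.filter((l) => !matched.has(leftKey(l)));
}

// project: choose the attributes to return.
function project<T, U>(rows: T[], pick: (row: T) => U): U[] {
  return rows.map(pick);
}

// Made-up sample data.
const controls: Control[] = [
  { id: "AC-2", framework: "NIST 800-53" },
  { id: "AC-3", framework: "NIST 800-53" },
  { id: "IA-5", framework: "NIST 800-53" },
  { id: "A.9.2.1", framework: "ISO 27001" },
];
const evidence: Evidence[] = [
  { controlId: "AC-2", note: "quarterly account review" },
  { controlId: "A.9.2.1", note: "user registration procedure" },
];

// Coverage gaps = project(antiJoin(filter(controls), evidence)).
const gaps = project(
  antiJoin(
    filter(controls, (c) => c.framework === "NIST 800-53"),
    evidence,
    (c) => c.id,
    (e) => e.controlId,
  ),
  (c) => c.id,
);
```

With this sample data `gaps` is `["AC-3", "IA-5"]`. The composition nests cleanly with three primitives, which is the kind of evidence this section asks for; a query that needs helper logic outside the primitive calls would count as awkward.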
Anti-patterns to reject upfront
The deliverable must NOT recommend:
- A “minimum complete set” that’s actually incomplete — if the cross-reference shows we’re missing a primitive that 80% of users will need, add it
- Adding primitives speculatively — every addition must be justified by a real query that can’t compose otherwise
- Implementing the full SPARQL feature set — Crosswalker is a Markdown-vault tool, not a triple store. Borrow semantics; don’t reimplement reasoning
- OWL DL reasoning — explicitly out of scope
- Reinventing existing standards — if SSSOM/SKOS/STRM already say it, point at them; don’t invent parallel vocabulary
- Folding Layer A primitives into Layer B view shapes to “simplify” — that’s how query languages become unmaintainable
- Making the primitive set engine-specific — primitives must be mechanism-neutral; if a “primitive” only works in SQL, it’s not a primitive
Success criteria for the deliverable
The deliverable must produce:
- Primitive × standard matrix — 7 candidate primitives × 7+ standards (SPARQL/Datalog/OLAP/SKOS/OWL/SSSOM/Cypher/GraphQL); each cell marked covered/approximate/missing
- Missing-primitive evaluation — each of (diff, rank, constraint-satisfy, temporal, set ops, projection-with-rename, window, constraint-propagation) argued YES/NO with rationale
- Redundant-primitive evaluation — argued challenge to project/aggregate/pivot/anti-join as Layer A primitives; final verdict on whether they stay
- Final primitive set — confirmed list (could be 5, 7, 9, or other number); each item with its plain-language framing + cross-domain anchor
- Layer A/B boundary — argued ruling on pivot/sort/limit/rank placement
- Composition examples — 7+ representative ontology-web queries decomposed into primitive compositions
- Recommended changes to synthesis log — exact wording for Settled-item #14 + any changes to query-primitives concept page
Anchored references
Project context:
- concepts/query-primitives — candidate set + cross-domain precedent table
- concepts/view-shapes — Layer B, for boundary check
- concepts/ontology-web-querying — positioning
- In-progress synthesis log — Settled #14 references this challenge
Standards:
- SPARQL 1.1 spec — property paths, MINUS, FILTER NOT EXISTS, aggregates
- SKOS reference
- SSSOM specification
- OWL 2 transitive properties
- Datalog (Wikipedia)
- OLAP cube operators — slice / dice / drill / roll-up / pivot
Engines (for primitive-vocabulary comparison):
- Cozo — Datalog + relational
- Nemo — Datalog WASM (Crosswalker reference)
- Oxigraph — SPARQL in Rust/WASM
- Stardog — Knowledge graph + reasoning
- DuckDB DuckPGQ — graph extension
- Apache DataFusion — query engine
Adjacent Crosswalker challenges:
- Ch 12 archived (Datalog vs SQL — narrow scope)
- Ch 36 — Query language rerun (sister challenge; broader query-language question)
- Ch 33 — Multi-modal landscape audit (sister; how engines expose primitives)
Hand-off
Write the deliverable to docs/src/content/docs/agent-context/zz-research/YYYY-MM-DD-challenge-29-deliverable-a-<slug>.md (plain .md, frontmatter only for sidebar). Convention per zz-research/index.md — verbatim preservation; multi-deliverable runs split per agent.
After the deliverable lands: update concepts/query-primitives.mdx with the confirmed/revised set; update synthesis log Settled-item #14; flip §9 status table from ⏳ to ✅; archive this brief.