Challenge 29: Ontology-web query verbs — adversarial validation

Created May 8, 2026 · Updated Jun 1, 2026

Why this exists

Crosswalker’s query engine has a three-layer architecture: primitives (Layer A), view shapes (Layer B), recipes (Layer C). Layer A’s vocabulary is load-bearing — every recipe in the marketplace, every codeblock query users write, every internal SQL helper composes from it. If we lock the wrong primitive set, every downstream artifact is built on a shaky vocabulary.

The proposed set (the candidate “LEGO bricks for asking questions about ontology webs”):

#	Primitive	Plain language	Cross-domain precedent
1	filter	Restrict by predicate	SPARQL `FILTER`, SQL `WHERE`, Datalog body
2	project	Choose attributes	SPARQL `SELECT`, SQL projection
3	traversal	Hop along an edge (1 step)	SPARQL property path single step, SKOS broader/narrower
4	closure	Transitive reachability	SPARQL `:p+`/`:p*`, Datalog recursion, OWL transitive
5	anti-join	“X without Y”	SPARQL `MINUS`, SQL `EXCEPT`
6	pivot	2D crosstab	OLAP cube, pandas `pivot_table`
7	aggregate	Count / sum / max / min	SPARQL `COUNT`, SQL aggregates

This set was surfaced by the 2026-05-08 alignment review, not by deep adversarial cross-reference. Ch 29 does the cross-reference.
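To make the candidate vocabulary concrete, the seven primitives can be sketched as a closed TypeScript union. This is only an illustration: `DataRow`, `PrimitiveStep`, `QueryPlan`, `primitivesUsed`, and the `mapsTo` edge name are hypothetical and do not come from the Crosswalker codebase.

```typescript
// Hypothetical types for illustration; none of these names come from the Crosswalker codebase.
type DataRow = Record<string, string | number | null>;

// One variant per candidate primitive (#1 to #7 in the table above).
type PrimitiveStep =
  | { op: "filter"; predicate: (row: DataRow) => boolean }
  | { op: "project"; attributes: string[] }
  | { op: "traversal"; edge: string } // one hop along a named edge
  | { op: "closure"; edge: string } // transitive reachability along a named edge
  | { op: "anti-join"; against: string; on: string } // "X without Y"
  | { op: "pivot"; rowKey: string; columnKey: string }
  | { op: "aggregate"; fn: "count" | "sum" | "max" | "min"; attribute?: string };

type PrimitiveName = PrimitiveStep["op"];

// A query plan is an ordered composition of primitive steps.
type QueryPlan = PrimitiveStep[];

// List the distinct primitives a plan uses, in first-use order.
function primitivesUsed(plan: QueryPlan): PrimitiveName[] {
  const seen: PrimitiveName[] = [];
  for (const step of plan) {
    if (seen.indexOf(step.op) === -1) seen.push(step.op);
  }
  return seen;
}

// Illustrative plan for a transitive crosswalk chain starting at NIST CSF.
// The "mapsTo" edge name is an assumption made for this example.
const crosswalkChainPlan: QueryPlan = [
  { op: "filter", predicate: (row) => row["framework"] === "NIST CSF" },
  { op: "closure", edge: "mapsTo" },
  { op: "project", attributes: ["id", "framework"] },
];

const usedByChain = primitivesUsed(crosswalkChainPlan);
```

A closed union like this lets the type checker reject any step outside the agreed vocabulary, which is one way to keep recipes and codeblock queries tied to the same set.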

What we already have

Asset	What it gives us
`concepts/query-primitives.mdx`	The candidate 7-primitive page; cross-domain precedent table; primitive × mechanism matrix
Tier 2 query helpers (`src/tier2/queries.ts`)	`getConceptsByOntology`, `crosswalkBetween`, `closureFromConcept` — 3 of 7 primitives have shipped executable surfaces
Synthesis log Settled #14	Marks the candidate set as pending this challenge
Recipe schema EMISSION primitives (Ch 22 synthesis)	Domain-neutral 5-mechanism set (folder/file/heading/tag/wikilink); precedent for closed primitive grammar

What to investigate

Each section below is a focused question the deliverable must answer concretely.

1. Cross-reference the 7 primitives against established standards

Build a primitive × standard matrix. For each of the 7 primitives, document how it maps onto each of the following:

SPARQL (W3C standard for RDF query): which SPARQL constructs map onto each primitive? Are there SPARQL constructs we’re missing?
Datalog (logic-programming paradigm; Cozo, Nemo, Datomic): how is each primitive expressed in rule-body form? Cover stratified negation, recursion, and aggregation in rule heads.
OLAP cube operators (slice, dice, drill-down, roll-up, pivot): which are subsumed by our primitives, which are missing?
SKOS (W3C, taxonomy): broader/narrower/related/exactMatch — are these traversal, closure, or something else?
OWL transitive properties + reasoning: what gets us beyond traversal/closure into actual reasoning, and is reasoning in scope?
SSSOM mapping vocabulary: does its predicate set imply any primitives we’re missing (narrower_match aggregation, justification-weighted matches)?
Cypher / Gremlin / GQL: graph-database query languages; what primitives are central there?
GraphQL schemas: how does GraphQL’s “select fields, follow connections” map?

Output: a table where each row is a primitive and each column is a standard. Mark “covered”, “approximate”, “missing entirely”. Find the gaps.
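One possible encoding of that matrix is sketched below. All type and function names are hypothetical. The seeded cells are taken only from the precedent column of the candidate table, and each "covered" mark is a provisional reading that the cross-reference still has to confirm or downgrade.

```typescript
// Hypothetical encoding of the deliverable's matrix; all names are illustrative.
const PRIMITIVES = [
  "filter", "project", "traversal", "closure", "anti-join", "pivot", "aggregate",
] as const;
const STANDARDS = [
  "SPARQL", "Datalog", "OLAP", "SKOS", "OWL", "SSSOM", "Cypher/Gremlin/GQL", "GraphQL",
] as const;

type PrimitiveId = (typeof PRIMITIVES)[number];
type StandardId = (typeof STANDARDS)[number];
type CellMark = "covered" | "approximate" | "missing";
type CoverageMatrix = Partial<Record<PrimitiveId, Partial<Record<StandardId, CellMark>>>>;
type MatrixCell = { primitive: PrimitiveId; standard: StandardId };

// Seeded from the precedent column only; every mark is provisional.
const seedMatrix: CoverageMatrix = {
  filter: { SPARQL: "covered", Datalog: "covered" },
  project: { SPARQL: "covered" },
  traversal: { SPARQL: "covered", SKOS: "covered" },
  closure: { SPARQL: "covered", Datalog: "covered", OWL: "covered" },
  "anti-join": { SPARQL: "covered" },
  pivot: { OLAP: "covered" },
  aggregate: { SPARQL: "covered" },
};

// Cells with no mark yet: the cross-reference work still to do.
function unfilledCells(matrix: CoverageMatrix): MatrixCell[] {
  const cells: MatrixCell[] = [];
  for (const primitive of PRIMITIVES) {
    for (const standard of STANDARDS) {
      if (matrix[primitive]?.[standard] === undefined) cells.push({ primitive, standard });
    }
  }
  return cells;
}

// Cells explicitly marked "approximate" or "missing": the gaps to argue about.
function gapCells(matrix: CoverageMatrix): MatrixCell[] {
  const cells: MatrixCell[] = [];
  for (const primitive of PRIMITIVES) {
    for (const standard of STANDARDS) {
      const mark = matrix[primitive]?.[standard];
      if (mark !== undefined && mark !== "covered") cells.push({ primitive, standard });
    }
  }
  return cells;
}
```

Keeping "not yet assessed" distinct from "missing" avoids mistaking an empty cell for a finding.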

2. Find missing primitives

Specific candidate primitives to evaluate:

diff — “what changed between v1 and v2 of this ontology?” (relevant for ontology evolution; Ch 03 territory)
rank — relevance scoring, weighted similarity (relevant for embedding-based queries; sqlite-vec territory)
constraint-satisfy — OWL DL reasoning; “what concepts satisfy this class definition?” (probably out of scope, but argue)
temporal primitives — snapshot, change-feed, point-in-time (relevant for v0.1.8 audit trail)
set operations — union, intersection, symmetric difference (compositional or primitive?)
projection-with-rename — does naming aliases at projection time deserve to be a separate primitive?
window functions — running totals, partition aggregates (SQL OVER, OLAP roll-up)
constraint propagation — given partial information about a concept, what’s derivable?

For each: argue YES or NO with a concrete decision rule. If YES, reframe as #8/#9/#10. If NO, document the rejection so future agents don’t re-litigate.
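One decision rule is to check whether the candidate composes from the existing seven. The sketch below applies that rule to `diff`, expressing it as two anti-joins. The term shape, function names, and term IDs are made up for the example, and the result is not a verdict: this composition reports added and removed terms but not terms whose content changed.

```typescript
// Hypothetical term shape; the IDs and labels below are made up for the example.
type OntologyTerm = { id: string; label: string };

// anti-join (#5): rows of `left` that have no id match in `right`.
function antiJoinById(left: OntologyTerm[], right: OntologyTerm[]): OntologyTerm[] {
  const rightIds = new Set(right.map((term) => term.id));
  return left.filter((term) => !rightIds.has(term.id));
}

// Candidate "diff" expressed purely as two anti-joins.
function diffVersions(
  older: OntologyTerm[],
  newer: OntologyTerm[],
): { removed: OntologyTerm[]; added: OntologyTerm[] } {
  return {
    removed: antiJoinById(older, newer),
    added: antiJoinById(newer, older),
  };
}

const versionOne: OntologyTerm[] = [
  { id: "T:0001", label: "alpha" },
  { id: "T:0002", label: "beta" },
];
const versionTwo: OntologyTerm[] = [
  { id: "T:0002", label: "beta (relabelled)" },
  { id: "T:0003", label: "gamma" },
];

// T:0001 is removed and T:0003 is added; the relabelled T:0002 is not reported.
const versionDelta = diffVersions(versionOne, versionTwo);
```

Whether the missing "changed" case is awkward enough to justify a dedicated primitive is the kind of argument the YES/NO ruling needs to make explicit.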

3. Find redundant or wrong-abstraction primitives in the candidate set

Specific challenges to the candidate 7:

project: is project really a primitive, or is it a Layer B concern (view shape config)?
aggregate: similar question — are aggregates Layer A operations, or Layer B view-config?
traversal vs closure: should they be one parameterized primitive (traverse(predicate, depth=1|*))?
pivot: is pivot really an operation, or a composition of (group-by + group-by + aggregate)? If compositional, does it belong as a Layer A primitive?
anti-join: is anti-join a primitive or a compositional pattern?

Output: an argued list of which primitives stay (or get merged/split), with rationale. The goal is the smallest complete set, not the largest possible set.
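For the traversal-versus-closure question, the merged form could look like the sketch below. The `GraphEdge` shape and the `traverse` signature are assumptions for illustration, not the shipped `closureFromConcept` helper, and the node names are made up.

```typescript
// Hypothetical edge shape; node and predicate names are made up for the example.
type GraphEdge = { from: string; predicate: string; to: string };

// One parameterized primitive: depth 1 is a single hop, "*" is transitive closure.
// "*" here behaves like SPARQL `:p+` (one or more hops): the start node is
// returned only if a cycle leads back to it.
function traverse(
  edges: GraphEdge[],
  start: string,
  predicate: string,
  depth: 1 | "*",
): string[] {
  const reached = new Set<string>();
  let frontier: string[] = [start];
  while (frontier.length > 0) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const edge of edges) {
        const matches = edge.from === node && edge.predicate === predicate;
        if (matches && !reached.has(edge.to)) {
          reached.add(edge.to); // the reached set also guards against cycles
          next.push(edge.to);
        }
      }
    }
    if (depth === 1) break; // single hop: stop after the first expansion
    frontier = next;
  }
  return Array.from(reached).sort();
}

const sampleEdges: GraphEdge[] = [
  { from: "A", predicate: "narrower", to: "B" },
  { from: "B", predicate: "narrower", to: "C" },
  { from: "A", predicate: "related", to: "D" },
];

const oneHop = traverse(sampleEdges, "A", "narrower", 1); // B only
const allHops = traverse(sampleEdges, "A", "narrower", "*"); // B and C
```

If the two primitives differ only by the `depth` argument, that supports merging them. If closure needs its own cost model or cycle semantics, that supports keeping them separate.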

4. Boundary check: Layer A (primitives) vs Layer B (view shapes)

The synthesis distinguishes Layer A (mechanism-neutral query operations) from Layer B (mechanism-neutral visual presentations). Is the boundary right?

Is pivot Layer A (a query operation that produces 2D-shaped result) or Layer B (a visual presentation of any tabular result)?
Is sort Layer A or Layer B? (Both arguably; depends on whether sort affects semantics or just display)
Is limit / pagination Layer A or Layer B?
Where does search relevance ranking belong?

Argue the boundary. Provide examples where a primitive placed at the wrong layer makes the architecture awkward.
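For the pivot question, one way to probe the boundary is to check whether pivot can be built from grouping plus an aggregate, which would leave only layout for Layer B. The sketch below does that over in-memory rows. `MappingRow`, `groupRows`, `pivotCount`, and the sample rows are all hypothetical.

```typescript
// Hypothetical mapping rows; the field names and values are illustrative.
type MappingRow = { source: string; target: string; relation: string };

// group-by: bucket rows by the value of one key.
function groupRows(rows: MappingRow[], key: keyof MappingRow): Map<string, MappingRow[]> {
  const groups = new Map<string, MappingRow[]>();
  for (const row of rows) {
    const bucket = groups.get(row[key]) ?? [];
    bucket.push(row);
    groups.set(row[key], bucket);
  }
  return groups;
}

// pivot expressed as group-by (rows) + group-by (columns) + aggregate (count).
function pivotCount(
  rows: MappingRow[],
  rowKey: keyof MappingRow,
  columnKey: keyof MappingRow,
): Record<string, Record<string, number>> {
  const table: Record<string, Record<string, number>> = {};
  groupRows(rows, rowKey).forEach((rowGroup, rowValue) => {
    const cells: Record<string, number> = {};
    groupRows(rowGroup, columnKey).forEach((cellRows, columnValue) => {
      cells[columnValue] = cellRows.length; // aggregate: count
    });
    table[rowValue] = cells;
  });
  return table;
}

const sampleMappings: MappingRow[] = [
  { source: "CSF", target: "t1", relation: "exactMatch" },
  { source: "CSF", target: "t2", relation: "exactMatch" },
  { source: "CSF", target: "t3", relation: "broadMatch" },
  { source: "ISO", target: "t4", relation: "exactMatch" },
];

// Counts of mappings per source framework and relation type.
const relationCrosstab = pivotCount(sampleMappings, "source", "relation");
```

The result is a plain nested record with no layout decisions in it, so this sketch leans toward pivot being a query-side composition. The ruling still needs to argue whether that holds for non-count aggregates and sparse columns.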

5. Re-audit the candidate primitives against actual Crosswalker queries

Take the 20-question query-routing matrix from Ch 28a §4 and decompose each query into Layer A primitives. Find:

Queries that need primitives the candidate set lacks
Queries that decompose into combinations not naturally expressible
Primitives that are NEVER used in any of the 20 queries (vestigial?)
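The first and third findings can be computed mechanically once each query has a decomposition. The sketch below assumes a hypothetical `QueryDecomposition` record. The three sample decompositions are illustrative guesses based on queries named elsewhere in this brief, not the Ch 28a matrix. Judging whether a combination is "naturally expressible" remains a manual call.

```typescript
// Hypothetical audit helper; the decompositions below are illustrative guesses.
const CANDIDATE_SET: readonly string[] = [
  "filter", "project", "traversal", "closure", "anti-join", "pivot", "aggregate",
];

type QueryDecomposition = { query: string; uses: string[] };

function auditDecompositions(
  candidates: readonly string[],
  decompositions: QueryDecomposition[],
): { unmet: { query: string; missing: string[] }[]; neverUsed: string[] } {
  const used = new Set<string>();
  const unmet: { query: string; missing: string[] }[] = [];
  for (const decomposition of decompositions) {
    // Primitives a query needs that the candidate set does not offer.
    const missing = decomposition.uses.filter((p) => candidates.indexOf(p) === -1);
    if (missing.length > 0) unmet.push({ query: decomposition.query, missing });
    decomposition.uses.forEach((p) => used.add(p));
  }
  // Candidates that no query touched: possibly vestigial.
  const neverUsed = candidates.filter((p) => !used.has(p));
  return { unmet, neverUsed };
}

const sampleDecompositions: QueryDecomposition[] = [
  { query: "Coverage gaps: controls without evidence", uses: ["filter", "anti-join", "project"] },
  { query: "Crosswalk chain (transitive)", uses: ["filter", "closure", "project"] },
  { query: "Ontology version diff", uses: ["diff"] },
];

const auditResult = auditDecompositions(CANDIDATE_SET, sampleDecompositions);
```

With only three sample queries the `neverUsed` list is long. It becomes meaningful only once all 20 questions are decomposed.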

6. Worked composition examples

Demonstrate that the primitive set is actually compositional. For each of these representative ontology-web queries, write the explicit primitive composition:

“Coverage gaps: NIST 800-53 controls without evidence” (compliance, GRC)
“Crosswalk chain: NIST CSF → 800-53 → ISO 27001 (transitive)” (compliance, GRC)
“MITRE ATT&CK techniques mitigated by NIST AC-family controls” (cross-framework)
“SKOS broader/narrower closure from a top-level subject heading” (taxonomy)
“OBO Foundry: gene-ontology terms with no MONDO disease mapping” (biomedical)
“OLIR: which submitted crosswalks have inconsistent confidence values?” (quality assurance)
“Ontology version diff: terms removed between OBO 2024-Q1 and 2024-Q2” (ontology evolution; needs diff primitive?)

Output: each query as a primitive composition. Where a composition is awkward, that’s evidence for an additional primitive.
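As a format reference, the first query ("NIST 800-53 controls without evidence") might decompose into filter, anti-join, and project as sketched below. The `Control` and `Evidence` shapes and the sample rows are assumptions for illustration and do not reflect the Tier 2 helpers or any real vault data.

```typescript
// Hypothetical record shapes; the sample rows are illustrative, not real vault data.
type Control = { id: string; framework: string; family: string };
type Evidence = { controlId: string; note: string };

const sampleControls: Control[] = [
  { id: "AC-2", framework: "NIST 800-53", family: "AC" },
  { id: "AC-3", framework: "NIST 800-53", family: "AC" },
  { id: "A.5.15", framework: "ISO 27001", family: "A.5" },
];
const sampleEvidence: Evidence[] = [
  { controlId: "AC-2", note: "placeholder evidence record" },
];

// Step 1, filter: restrict to NIST 800-53 controls.
const nistControls = sampleControls.filter((c) => c.framework === "NIST 800-53");

// Step 2, anti-join: keep controls with no matching evidence row.
const evidencedIds = new Set(sampleEvidence.map((e) => e.controlId));
const uncovered = nistControls.filter((c) => !evidencedIds.has(c.id));

// Step 3, project: keep only the control id.
const coverageGapIds = uncovered.map((c) => c.id);
// With the sample rows above, only AC-3 remains.
```

Each step maps to exactly one candidate primitive, which is the property the deliverable should check for every query in the list.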

Anti-patterns to reject upfront

The deliverable must NOT recommend:

A “minimum complete set” that’s actually incomplete — if the cross-reference shows we’re missing a primitive that 80% of users will need, add it
Adding primitives speculatively — every addition must be justified by a real query that can’t compose otherwise
Implementing the full SPARQL feature set — Crosswalker is a Markdown-vault tool, not a triple store. Borrow semantics; don’t reimplement reasoning
OWL DL reasoning — explicitly out of scope
Reinventing existing standards — if SSSOM/SKOS/STRM already say it, point at them; don’t invent parallel vocabulary
Folding Layer A primitives into Layer B view shapes to “simplify” — that’s how query languages become unmaintainable
Making the primitive set engine-specific — primitives must be mechanism-neutral; if a “primitive” only works in SQL, it’s not a primitive

Success criteria for the deliverable

The deliverable must produce:

Primitive × standard matrix — 7 candidate primitives × the 8 standards from §1 (SPARQL/Datalog/OLAP/SKOS/OWL/SSSOM/Cypher/GraphQL); each cell marked covered/approximate/missing
Missing-primitive evaluation — each of (diff, rank, constraint-satisfy, temporal, set operations, projection-with-rename, window functions, constraint propagation) argued YES/NO with rationale
Redundant-primitive evaluation — argued challenge to project/aggregate/pivot/anti-join as Layer A primitives, plus the traversal/closure merge question; final verdict on whether each stays, merges, or splits
Final primitive set — confirmed list (could be 5, 7, 9, or another number); each item with its plain-language framing + cross-domain anchor
Layer A/B boundary — argued ruling on pivot/sort/limit/rank placement
Composition examples — 7+ representative ontology-web queries decomposed into primitive compositions
Recommended changes to synthesis log — exact wording for Settled-item #14 + any changes to query-primitives concept page

Anchored references

Project context:

concepts/query-primitives — candidate set + cross-domain precedent table
concepts/view-shapes — Layer B, for boundary check
concepts/ontology-web-querying — positioning
In-progress synthesis log — Settled #14 references this challenge

Standards:

SPARQL 1.1 spec — property paths, MINUS, FILTER NOT EXISTS, aggregates
SKOS reference
SSSOM specification
OWL 2 transitive properties
Datalog (Wikipedia)
OLAP cube operators — slice / dice / drill / roll-up / pivot

Engines (for primitive-vocabulary comparison):

Cozo — Datalog + relational
Nemo — Datalog WASM (Crosswalker reference)
Oxigraph — SPARQL in Rust/WASM
Stardog — Knowledge graph + reasoning
DuckDB DuckPGQ — graph extension
Apache DataFusion — query engine

Adjacent Crosswalker challenges:

Ch 12 archived (Datalog vs SQL — narrow scope)
Ch 36 — Query language rerun (sister challenge; broader query-language question)
Ch 33 — Multi-modal landscape audit (sister; how engines expose primitives)

Hand-off

Write the deliverable to docs/src/content/docs/agent-context/zz-research/YYYY-MM-DD-challenge-29-deliverable-a-<slug>.md (plain .md, frontmatter only for sidebar). Convention per zz-research/index.md — verbatim preservation; multi-deliverable runs split per agent.

After deliverable lands: update concepts/query-primitives.mdx with confirmed/revised set; update synthesis log Settled-item #14; flip §9 status table from ⏳ to ✅; archive this brief.