Ch 29: Adversarial validation of the 7-primitive Layer-A query set
Adversarial Validation of the Crosswalker 7-Primitive Layer-A Query Set
TL;DR
- The 7-primitive set is roughly correct in intent but wrong in shape: keep `filter`, `traversal` (parameterized to subsume `closure`), `aggregate`, and `anti-join`; demote `pivot` to Layer B, drop standalone `closure` (it’s `traversal` with a `*` quantifier), and add three primitives that real Crosswalker queries cannot decompose without: `union`/`difference` (set-ops), `bind` (computed columns / rename / projection-with-rename), and `diff` (ontology-version delta). `project` survives but only as a thin output-shaping primitive distinct from `bind`.
- The decision-ready final set is 8 primitives: filter, traverse, bind, aggregate, anti-join, set-op, diff, project. It is anchored to relational algebra (Codd’s six minus cross-product), SPARQL 1.1 (BGP/FILTER/MINUS/property-paths/aggregates/BIND), Datalog (recursion + stratified negation), and SSSOM (mapping predicates as edge labels for `traverse`).
- Settled-item #14 should be revised: replace “7 primitives including pivot, closure” with the 8-primitive set above; add a Layer A/B ruling table (pivot/sort/limit/rank are all Layer B); and add an explicit non-goal statement excluding OWL-DL reasoning, full SPARQL federation, and constraint-satisfaction.
1. Primitive × Standard Cross-Reference Matrix
Cells are marked C (covered, native first-class operator), A (approximate / expressible but not primitive), M (missing entirely / only via composition or extension). Rationale follows.
| Primitive ↓ / Standard → | SPARQL 1.1 | Datalog (Cozo/Nemo/Soufflé) | OLAP cube ops | SKOS | OWL 2 (out-of-scope ref) | SSSOM | Cypher / Gremlin / GQL | GraphQL | Codd Rel-Alg |
|---|---|---|---|---|---|---|---|---|---|
| filter | C (FILTER, BGP constraints) | C (rule body literals, comparison built-ins) | A (slice = filter on 1 dim; dice = filter on N) | A (no FILTER, but filtered access by predicate) | A (class-restriction is a filter under reasoning) | C (predicate_id, confidence thresholds drive filtering) | C (WHERE in MATCH/RETURN, has() in Gremlin) | A (resolver args ≠ true relational filter) | C (σ selection) |
| project | C (SELECT var-list) | A (rule heads project; no rename) | A (cube → table is implicit projection) | M (SKOS has no result-shape) | M (out of scope) | A (mapping table columns) | C (RETURN var, AS in Cypher; select() in Gremlin) | C (field selection IS the language) | C (π projection) |
| traversal (1-hop) | C (BGP triple pattern) | C (literal in rule body) | M (cubes have no edges; drill-across is closest) | C (broader/narrower/related/Match family ARE traversal edges) | C (object property assertion) | C (predicate_id IS the traversal edge type) | C (MATCH (a)-[:R]->(b); out(‘R’) in Gremlin) | C (nested resolver = traversal step) | A (θ-join over edge relation) |
| closure (transitive *) | C (property paths r*, r+) | C (recursive rules — central feature) | M (no transitive op on dimensions) | A (skos:broaderTransitive declared but not asserted; closure is application duty) | A (TransitiveObjectProperty axiom — but full reasoning out of scope) | A (declared per-predicate via mapping_set_id chains; not a SSSOM op) | C (variable-length [*1..n]; QPP) | M (not in spec; needs custom resolver) | M (not first-order; α-extension needed) |
| anti-join | C (MINUS, FILTER NOT EXISTS) | C (stratified negation — not p(x)) | M (no native; expressible via set-ops) | M (no native) | A (ComplementOf — but reasoning) | M (predicate_modifier “not” is metadata, not a query op) | A (WHERE NOT EXISTS; Cypher MATCH … WHERE NOT (a)-[]->(b)) | M (no native) | A (derivable from − and ⨝; not primitive) |
| pivot | M (no native; subquery + GROUP BY composition) | M (no native; recursive aggregation patterns) | C (canonical cube op — rotate axes) | M (no native) | M (out of scope) | M (presentation concern) | M (no native; GQL has no PIVOT) | M (presentation concern) | M (not in Codd; SQL extension only) |
| aggregate | C (COUNT/SUM/AVG/MIN/MAX/GROUP BY/HAVING) | C (rule heads with aggregate functions; many engines) | C (roll-up = aggregate over hierarchy) | M (no aggregation in vocabulary) | M (out of scope) | A (mapping density / confidence stats are aggregates over rows) | C (count/sum/group() in Gremlin; aggregating functions in Cypher/GQL) | A (computed via resolver, not declarative) | A (not in original 8; standard extension γ) |
Rationale highlights.
- SPARQL is the most complete reference standard: every primitive except `pivot` has a first-class operator (BGP/path/FILTER/MINUS/COUNT/GROUP BY). SPARQL multiset semantics map exactly to a multiset relational algebra of {π, σ, ⨝, ∪, \} per Angles & Gutierrez (2016), i.e., 5 primitives, and property paths add `closure`. That’s 6, plus aggregate = 7. SPARQL itself is the strongest external evidence that the right number is in the 6-8 range.
- Datalog covers everything except pivot, but folds `closure` and `traversal` into “rule body + recursion.” Stratified negation gives `anti-join`; rule heads with aggregate functions (Soufflé, Cozo, Nemo) give `aggregate`. Datalog is the strongest argument that `closure` is not a separate primitive: recursion just happens to be allowed in `traverse`.
- OLAP is the only standard with native `pivot`. Notably, OLAP also has `slice`/`dice` (= filter), `roll-up` (= aggregate), and `drill-down` (= traverse downward in a hierarchy = closure on `narrower`). OLAP’s pivot is rotating axes for display, which is a Layer B concern when transplanted to an ontology web.
- SKOS contributes the vocabulary of edges (`broader`/`narrower`/`broadMatch`/`exactMatch`/etc.) consumed by `traverse` and `closure`. The W3C SKOS reference is explicit that `skos:broader` is deliberately not transitive; only `skos:broaderTransitive` is, and it “by convention is not used to make assertions”, meaning closure is a query-time operation, not a stored fact. This validates `traverse(predicate, depth=*)` as a Layer A primitive.
- OWL 2 is explicitly out of scope per the task’s anti-patterns. Property chain axioms and class subsumption go beyond `closure` into reasoning. Crosswalker should not implement them; transitive-property entailment must be approximated by user-invoked `traverse(*)`.
- SSSOM is a data model, not a query language. It contributes `predicate_id`, `confidence`, `mapping_justification`, and `mapping_tool` as filterable columns and as edge labels, but it has no notion of pivot, aggregation, or anti-join. SSSOM defines the shape of the graph that the primitives operate on; STRM (NIST IR 8477) is the analogous shape for GRC crosswalks and uses set-theoretic relationships (subset-of, intersects-with, equal, superset-of, no-relationship). These are predicate values for `traverse`, not new primitives.
- Cypher / Gremlin / GQL: all three center on traversal patterns + filter + return-projection. None has native pivot; GQL (ISO/IEC 39075:2024) deliberately keeps MATCH/FILTER/LET/ORDER BY/LIMIT/RETURN as its core. Gremlin’s ~30 steps reduce to map / flatMap / filter / sideEffect / branch, which is essentially `traverse + filter + bind + aggregate`.
- GraphQL is a resolver protocol, not a query algebra. Field selection ≈ `project`; nested resolvers ≈ `traverse`; field arguments ≈ weak `filter`. It cannot do anti-join, closure, pivot, or aggregate declaratively. It is best treated as a Layer C output protocol, not a primitive source.
- Codd’s relational algebra: the canonical primitive set is {σ, π, ρ, ×, ∪, −} (5 + rename); ⨝, ∩, ÷ are derived. Notably anti-join is derivable from − and ⨝, but real engines treat it as primitive for performance; SPARQL elevates it to MINUS / FILTER NOT EXISTS for the same reason. This is the precedent for Crosswalker keeping `anti-join` as Layer A.
2. Missing-Primitive Evaluation
Decision rule used throughout: a primitive enters Layer A iff (a) at least one real Crosswalker query needs it, (b) it cannot be expressed as a short, natural composition of the other primitives, and (c) at least two of {SPARQL, Datalog, OLAP, Codd, GQL, SSSOM} treat it as first-class.
| Candidate | Verdict | Rationale |
|---|---|---|
| diff (ontology v1 vs v2) | YES — add | Real query #7 (“terms removed between OBO 2024-Q1 and 2024-Q2”) cannot be expressed as anti-join alone because the operands live in different ontology versions, not different relations within one graph. Naïvely composing as traverse(v1) anti-join traverse(v2) requires snapshot semantics that the rest of the algebra does not provide. The OWL-Manchester ecco, CODEX, and DynDiff tools all treat ontology-diff as a first-class operation, generating typed change records (added/removed/strengthened/weakened class, axiom, mapping). For Crosswalker’s audit-trail use case (v0.1.8) this is a primitive. Anchor: Unix diff, git diff, OWL-ecco. |
| rank (relevance / similarity scoring) | NO — reject for Layer A | Vector / embedding ranking is a value-producing operation (each row gets a score), which is the same shape as bind (computed column). Once you have scores you can filter score > τ and sort/limit (Layer B). Cozo’s HNSW vector search is reachable from inside Datalog as just-another-relation. Treating ranking as bind keeps the algebra mechanism-neutral; treating it as a primitive bakes embeddings into Layer A and conflicts with the “engine-neutral” requirement. |
| constraint-satisfy (OWL-DL “concepts satisfying class def”) | NO — reject (out of scope) | Explicit anti-pattern in the task brief. Crosswalker is a markdown vault with crosswalk tables, not a reasoner. If a user wants this, they invoke an external reasoner that produces additional skos:exactMatch triples; those triples then become input data for traverse. |
| temporal primitives (snapshot / point-in-time / change-feed) | PARTIAL — fold into diff | ”Snapshot at time t” is a parameter to every primitive (read-vault-as-of), not a separate operator. “Change feed” is the stream of diff(t, t+1) outputs. Audit-trail does not require new primitives; it requires (a) diff as a primitive and (b) versioned input addressing as a query parameter. Reject as standalone. |
| set operations (∪, ∩, ⊖) | YES — add as one primitive set-op | Codd has them; SPARQL has UNION; SSSOM mapping merging requires them; the framework-overlap query (“controls in both NIST and CIS”) and quality-assurance comparisons need union/intersection of result-sets. They are not compositional — you cannot get ∪ from {filter, traverse, aggregate, anti-join} because anti-join is one-sided. Combine ∪/∩/⊖ into a single parameterized set-op(left, right, mode) to keep the count down; symmetric difference is (A ∪ B) − (A ∩ B), so it can stay derived. |
| projection-with-rename | NO — fold into bind | A projection with rename is bind(new_name = old_name); project(new_name). SPARQL spells this (?old AS ?new) inside SELECT, which is exactly the algebra Extend(P, ?new, ?old). No reason for a separate primitive. |
| window functions | NO — reject for Layer A; expose at Layer B | ”Running totals”, “rank within partition” are presentation/analytics concerns over a result set. SQL added them as syntactic sugar; SPARQL still does not have them. Adding them invites scope creep toward analytics. The single legitimate use case in Crosswalker (mapping density per framework) is already an aggregate(group=framework). |
| constraint propagation | NO — reject (out of scope) | This is reasoning, not querying. Same fate as constraint-satisfy. |
| OPTIONAL / left-join | NO — fold into traverse(optional=true) | SPARQL’s OPTIONAL is left-outer-join. In an ontology-web context, “give me each control plus its evidence if any” decomposes to traverse(:hasEvidence, optional=true). Making OPTIONAL a separate primitive doubles the surface area of traverse. Parameterize instead. |
| CONSTRUCT / graph-output | NO — Layer B (or Layer C: serialization) | Producing an RDF graph or a markdown rollup from a result set is an output-shape concern. Same role as OLAP pivot for tabular display. The query algebra produces relations; what the UI/exporter does with them is downstream. Crosswalker explicitly is “not a triple store,” so CONSTRUCT semantics are non-goals. |
| BIND / computed-column | YES — add | Required for: confidence-threshold computations, predicate normalization (mapping oboInOwl:hasDbXref → skos:closeMatch), evidence-age calculations (“controls with evidence older than 1 year”). SPARQL elevates BIND to a primitive (Extend in the algebra); Datalog gets it via head-expressions; SQL has computed columns and AS. Without bind, evidence-freshness queries are inexpressible. |
| subquery / nesting | NO — meta-property of the algebra, not a primitive | The algebra is closed under composition by definition. Subquery support means “any primitive can take a primitive’s output as input” — that is a property of the system, not a Layer A operator. Document it as such, do not add an operator. |
| federation / SERVICE | NO — out of scope | Crosswalker is a single-vault tool. Cross-vault or cross-endpoint federation is a deployment concern. Reject. |
Net add list: diff, set-op, bind. Net remove list: closure (subsumed by parameterized traverse), and pivot (demoted to Layer B).
3. Redundant or Wrong-Abstraction Evaluation
project — Layer A or Layer B?
Verdict: keep at Layer A, but minimally. The argument for demoting it: Codd’s π and SPARQL’s SELECT are both relation-shaping, and a “view shape” (Layer B) plausibly subsumes them. But three things keep project in Layer A: (1) projection changes cardinality under set semantics (deduplication after column drop), which is observable in subsequent operators, not just in display; (2) every query language in the matrix (SPARQL, Cypher/Gremlin/GQL, GraphQL, Codd) treats it as first-class; (3) it is the natural counterpart of bind (one adds columns, the other removes/selects). What is not Layer A is “render this column as a chip with this color”; that’s Layer B. Final framing for Layer A: project(cols) := output-relation has exactly these columns.
aggregate — Layer A or Layer B?
Verdict: Layer A. Aggregation produces rows that did not exist as input rows; it is value-producing, not display-shaping. Pushing aggregation into Layer B would force every UI-shape to re-derive group counts, breaking compositionality (you cannot then filter count > 5 after the aggregate). SPARQL’s choice (GROUP BY/HAVING in the language) and Codd-extended algebra’s γ-operator both confirm. Keep.
traversal vs closure — one parameterized primitive?
Verdict: one primitive traverse(predicate, depth=k|*). Strong evidence: (a) Cypher and GQL collapse them into a single MATCH with a *1..n quantifier; (b) SPARQL property paths use a single grammar with r, r+, r*, r? as parameters; (c) Datalog expresses both the same way (a closure is an ordinary rule whose body refers to its own head predicate); (d) the only standards that do separate them are pedagogical (textbook closure-vs-edge). Two primitives where one suffices is exactly the Layer-A bloat the task warned against. Collapse.
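The collapse can be illustrated with a minimal Python sketch. It assumes the graph is a plain list of `(subject, predicate, object)` triples; the function name, the `depth=None` stand-in for `*`, and the sample edges are all hypothetical, not part of Crosswalker.

```python
from collections import deque

def traverse(edges, start, predicates, depth=1):
    """Follow edges labelled with any of `predicates` from `start`.

    depth=k walks at most k hops; depth=None stands in for `*` (transitive).
    Returns the set of reached nodes, excluding the start node itself.
    """
    adjacency = {}
    for subj, pred, obj in edges:
        if pred in predicates:
            adjacency.setdefault(subj, []).append(obj)

    reached = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if depth is not None and hops >= depth:
            continue  # hop budget exhausted on this branch
        for nxt in adjacency.get(node, []):
            if nxt not in reached and nxt != start:
                reached.add(nxt)
                frontier.append((nxt, hops + 1))
    return reached

# Hypothetical asserted edges.
EDGES = [
    ("A", "skos:narrower", "B"),
    ("B", "skos:narrower", "C"),
    ("C", "skos:narrower", "D"),
    ("A", "skos:related", "X"),
]

one_hop = traverse(EDGES, "A", {"skos:narrower"}, depth=1)
two_hops = traverse(EDGES, "A", {"skos:narrower"}, depth=2)
closure = traverse(EDGES, "A", {"skos:narrower"}, depth=None)
```

The single-hop walk and the transitive walk differ only in the `depth` argument, which is the sense in which a separate `closure` operator adds nothing.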
pivot — primitive or composition?
Verdict: not Layer A. Demote to Layer B (view-shape). Three independent arguments converge:
- Compositional: Pivot decomposes into `aggregate(group=row_dim, group=col_dim, fn=agg) → reshape rows-to-columns`. The reshape step is purely display.
- No analog in graph standards: Cypher, Gremlin, GQL, SPARQL, SSSOM all lack pivot. Only OLAP and SQL extensions have it, and both treat it as report-shaping.
- Wrong domain: Crosswalker queries a graph of crosswalk mappings, not a multidimensional cube. The few queries that “need pivot” (e.g., NIST-vs-CIS coverage matrix) actually want a Layer B table view over the result of an `aggregate(group=control, group=framework)`. The pivot is the renderer, not the query.
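The compositional argument can be sketched in Python as an aggregate step followed by a display-only reshape. The function names and the sample rows are hypothetical; the sketch only assumes that mappings arrive as `(control, framework)` pairs.

```python
from collections import Counter

# Hypothetical input: one (control, framework) row per mapping.
MAPPINGS = [
    ("AC-1", "NIST"), ("AC-1", "CIS"), ("AC-1", "CIS"),
    ("AC-2", "NIST"),
]

def aggregate_count(rows):
    """Layer A: group by (control, framework) and count rows per group."""
    return dict(Counter(rows))

def pivot_view(counts):
    """Layer B: reshape the long result into a control x framework grid."""
    frameworks = sorted({fw for _, fw in counts})
    controls = sorted({c for c, _ in counts})
    header = ["control"] + frameworks
    body = [[c] + [counts.get((c, fw), 0) for fw in frameworks] for c in controls]
    return [header] + body

counts = aggregate_count(MAPPINGS)   # the query result
grid = pivot_view(counts)            # the rendering of that result
```

Only `aggregate_count` produces new values; `pivot_view` rearranges them, so further filtering or traversal would operate on `counts`, not on `grid`.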
anti-join — primitive or compositional?
Verdict: keep as Layer A primitive even though theoretically derivable. Codd’s algebra derives anti-join from − plus ⨝. But:
- SPARQL elevates it (MINUS, FILTER NOT EXISTS) because optimizers handle it specially.
- Datalog cannot encode anti-join at all without stratified negation as a first-class concept.
- Real Crosswalker queries (#1, #5, #7) are dominated by anti-join: coverage gaps, missing mappings, removed terms. Forcing users to write `set-op(left, ⊖, traverse(...))` is awkward and the optimizer cannot push the negation as efficiently.
- The hidden cost of “deriving” anti-join: NULL-handling semantics differ (see Oracle’s null-aware anti-join patent). Making it explicit pins down semantics.
Keep.
Goal: smallest complete set
The minimum complete set we are converging on is 8 primitives:
filter, traverse, bind, project, aggregate, anti-join, set-op, diff.
(7-primitive options that drop set-op or diff fail real queries; 9+-primitive options add closure or pivot redundantly.)
4. Layer A / Layer B Boundary Ruling
Layer A operators change the value or cardinality of the result. Layer B operators change the presentation of an already-determined result and are reversible / referentially transparent w.r.t. value.
| Operator | Ruling | Why | Awkwardness if misplaced |
|---|---|---|---|
| pivot | Layer B | Reorients axes for display; same underlying tuples. Aggregation already happened. | If Layer A: forces every downstream consumer to re-flatten before further filter/traverse. SPARQL/Cypher/GQL deliberately omit pivot — following them is correct. |
| sort / order-by | Layer B | Set semantics make ordering invisible; only matters when paired with limit (and even then, “top-k” is a Layer B view, not a query value). | If Layer A: forces optimizer to preserve order across joins, producing pessimistic plans. |
| limit / pagination | Layer B | Same value-set, different prefix shown. Even SPARQL classifies LIMIT/OFFSET as “solution modifiers,” not query operators. | If Layer A: composing two queries where one has LIMIT becomes semantically chaotic. |
| search relevance ranking | Split: scoring is bind (Layer A); top-k display is Layer B | Scoring assigns a value to each row (legitimate bind); presenting “top 10 most similar” is Layer B (sort+limit). | If you treat ranking as one Layer A primitive: you re-introduce pivot-style category confusion (value-producing AND view-shaping in one op). Splitting keeps the algebra clean. |
Examples of awkwardness when boundary is wrong:
- Pivot in Layer A: a user writes `pivot → traverse`. What does it mean to follow a `:hasEvidence` edge from a 2D pivot table? Nonsense. Layer B containment prevents this.
- Sort in Layer A: composing `sort(by=date) ⨝ filter(framework=NIST)` cannot be reordered, because the sort operator sees a different cardinality on each side. Optimizer breakage.
- Rank in Layer A as one op: you cannot then `filter rank < 0.7`, because you’d need to project the rank value out first, meaning rank-as-primitive secretly is a `bind` + `sort` already.
5. Re-Audit Against Real Crosswalker Queries
Each row below decomposes the canonical query into the proposed 8-primitive set. ❗ flags awkwardness; ✅ flags clean fit.
| Query | Decomposition | Verdict |
|---|---|---|
| 1. Coverage gaps: NIST 800-53 controls without evidence | traverse(rdf:type=Control, framework=NIST-800-53) anti-join traverse(:hasEvidence, optional=false) | ✅ Native fit; anti-join is essential. |
| 2. Crosswalk chain NIST CSF → 800-53 → ISO 27001 (transitive) | traverse(:mapsTo \| skos:exactMatch, depth=*, start=CSF, frameworks=\{CSF,800-53,ISO27001\}) | ✅ Closure-as-parameterized-traverse fits cleanly. STRM/SSSOM predicate restricts edge type. |
| 3. MITRE ATT&CK techniques mitigated by NIST AC-family controls | filter(family=AC) → traverse(:mitigates, target_framework=ATT&CK) then project(technique_id, control_id) | ✅ Filter-then-traverse is the canonical pattern. |
| 4. SKOS broader/narrower closure from a top-level heading | traverse(skos:narrower, depth=*, start=TopHeading) | ✅ Clean. Note: must NOT use SKOS reasoning entailment; just walk the asserted edges. |
| 5. OBO Foundry GO terms with no MONDO mapping | traverse(rdf:type=GOTerm) anti-join traverse(skos:exactMatch \| skos:closeMatch \| oboInOwl:hasDbXref, target=MONDO) | ✅ Anti-join with predicate-set parameter. SSSOM lets us specify the predicate set explicitly. |
| 6. OLIR crosswalks with inconsistent confidence values | traverse(:hasMapping) → bind(disagreement = max(conf) - min(conf) per (subject,object)) → filter(disagreement > τ) | ❗ Uses bind essentially. Without bind this is not expressible. Strongest evidence for adding bind. |
| 7. Ontology version diff: terms removed between OBO 2024-Q1 and 2024-Q2 | diff(v1=2024-Q1, v2=2024-Q2, kind=removed_terms) | ❗❗ Without diff primitive, this becomes set-op(traverse(v1) ⊖ traverse(v2)) which requires versioned snapshot semantics outside the algebra. Strongest evidence for adding diff. |
| 8. Framework overlap: controls in both NIST and CIS | set-op(traverse(framework=NIST), ∩, traverse(framework=CIS)) keyed on skos:exactMatch co-membership | ❗ Without set-op, you’d write a contrived double-anti-join. Strongest evidence for adding set-op. |
| 9. Mapping density (avg mappings per control per framework) | traverse(:hasMapping) → aggregate(group=control, group=framework, fn=count) → aggregate(group=framework, fn=avg) | ✅ Two-stage aggregate. Note: a UI may display this as a pivot; the query itself does not pivot. |
| 10. Evidence freshness (controls w/ evidence > 1 year old) | traverse(:hasEvidence) → bind(age = now − evidence.date) → filter(age > 365d) | ❗ Requires bind. |
Vestigial primitives: in this audit, pivot is never used in any query — confirming Layer-B demotion. closure as separate primitive is also never used — traverse(*) covers all cases.
Primitives that real queries demand and that the original 7-set lacks: bind (queries 6, 10), set-op (queries 8 and any union of result sets), diff (query 7).
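As an illustration of why `bind` is load-bearing, query 10 (evidence freshness) can be sketched in Python as bind-then-filter. The row layout, helper names, and the fixed `AS_OF` date (used in place of `now` so the example is deterministic) are assumptions made for this sketch.

```python
from datetime import date, timedelta

# Hypothetical result of traverse(:hasEvidence): one row per control.
EVIDENCE = [
    {"control": "AC-1", "evidence_date": date(2023, 1, 10)},
    {"control": "AC-2", "evidence_date": date(2024, 11, 2)},
    {"control": "AC-3", "evidence_date": date(2022, 6, 30)},
]

def bind(rows, name, fn):
    """Add a computed column `name` to every row."""
    return [{**row, name: fn(row)} for row in rows]

def filter_rows(rows, pred):
    """Keep only the rows that satisfy `pred`."""
    return [row for row in rows if pred(row)]

AS_OF = date(2025, 1, 1)  # assumed evaluation date

aged = bind(EVIDENCE, "age", lambda r: AS_OF - r["evidence_date"])
stale = filter_rows(aged, lambda r: r["age"] > timedelta(days=365))
stale_controls = [r["control"] for r in stale]
```

The filter condition refers to `age`, a column that exists only because `bind` computed it; without that step there is nothing for the filter to test.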
6. Worked Composition Examples
Notation: T(p, depth=k) = traverse predicate p to depth k (* = transitive); F(cond) = filter; B(name=expr) = bind; P(cols) = project; G(group, fn) = aggregate; \ = anti-join; ∪/∩/⊖ via set-op (⊖ is plain set difference, as used in §5); D(v1, v2, kind) = diff.
Q1. Coverage gaps (NIST 800-53 controls without evidence)
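A minimal Python sketch of this composition, following the §5 decomposition (select the NIST 800-53 controls, then anti-join against the controls that have a `:hasEvidence` edge). The data layout, file path, and helper name are hypothetical.

```python
# Hypothetical vault contents.
CONTROLS = [
    {"id": "AC-1", "framework": "NIST-800-53"},
    {"id": "AC-2", "framework": "NIST-800-53"},
    {"id": "1.1", "framework": "CIS"},
]
HAS_EVIDENCE = [("AC-1", "evidence/ac-1-policy.md")]  # (control, evidence note)

def anti_join(left, right_keys, key):
    """Keep rows from `left` whose `key` value has no match in `right_keys`."""
    right = set(right_keys)
    return [row for row in left if row[key] not in right]

nist = [c for c in CONTROLS if c["framework"] == "NIST-800-53"]
with_evidence = {control for control, _ in HAS_EVIDENCE}
gaps = anti_join(nist, with_evidence, key="id")
gap_ids = [row["id"] for row in gaps]
```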
Q2. Crosswalk chain CSF → 800-53 → ISO 27001 (transitive)
Q3. MITRE ATT&CK techniques mitigated by NIST AC-family controls
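A minimal Python sketch of the filter → traverse → project chain from §5. The technique identifiers are placeholders rather than real ATT&CK IDs, and the `:mitigates` edges are invented for illustration; the helper names are likewise hypothetical.

```python
# Hypothetical controls and :mitigates edges (placeholder technique ids).
CONTROLS = [
    {"control_id": "AC-2", "family": "AC"},
    {"control_id": "AC-6", "family": "AC"},
    {"control_id": "SI-4", "family": "SI"},
]
MITIGATES = [
    ("AC-2", "TECH-1"),
    ("AC-6", "TECH-2"),
    ("SI-4", "TECH-3"),
]

def filter_rows(rows, pred):
    """Keep only the rows that satisfy `pred`."""
    return [row for row in rows if pred(row)]

def traverse(rows, edges, key, out_col):
    """Follow one hop from row[key]; rows with no outgoing edge are dropped."""
    out = []
    for row in rows:
        for src, dst in edges:
            if src == row[key]:
                out.append({**row, out_col: dst})
    return out

def project(rows, cols):
    """Keep exactly the named columns."""
    return [{c: row[c] for c in cols} for row in rows]

ac_controls = filter_rows(CONTROLS, lambda r: r["family"] == "AC")
mitigated = traverse(ac_controls, MITIGATES, "control_id", "technique_id")
result = project(mitigated, ["technique_id", "control_id"])
```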
Q4. SKOS broader/narrower closure from a top-level subject heading
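A minimal Python sketch of the transitive walk over asserted `skos:narrower` edges, recording the path length from the root as a derived column. The heading names and the function name are hypothetical; no reasoner or entailment is involved.

```python
def narrower_closure(edges, root):
    """Walk asserted narrower edges from `root`.

    Returns {concept: depth_from_root}; the root itself is not included.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    depth = {}
    frontier = [root]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for node in frontier:
            for child in children.get(node, []):
                if child not in depth and child != root:
                    depth[child] = level  # first visit gives the shortest path
                    next_frontier.append(child)
        frontier = next_frontier
    return depth

# Hypothetical (broader, narrower) pairs.
NARROWER = [
    ("Security", "Access control"),
    ("Security", "Cryptography"),
    ("Access control", "Authentication"),
    ("Authentication", "Passwords"),
]

depths = narrower_closure(NARROWER, "Security")
```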
Note: depth as a derived column requires B(depth_from_root = path_length); this is the second small piece of evidence that bind is unavoidable.
Q5. OBO/GO terms with no MONDO disease mapping
Q6. OLIR crosswalks with inconsistent confidence values
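A minimal Python sketch of this query as aggregate → bind → filter. The identifiers, confidence values, threshold `TAU`, and helper names are all hypothetical; the sketch assumes each mapping row carries a `(subject, object, confidence)` triple.

```python
from collections import defaultdict

# Hypothetical mapping rows: (subject, object, confidence).
MAPPINGS = [
    ("src:1", "tgt:9", 0.9),
    ("src:1", "tgt:9", 0.4),
    ("src:2", "tgt:5", 0.8),
    ("src:2", "tgt:5", 0.75),
]

def aggregate(rows):
    """Group by (subject, object); reduce to row count, min and max confidence."""
    groups = defaultdict(list)
    for subj, obj, conf in rows:
        groups[(subj, obj)].append(conf)
    return [
        {"subject": s, "object": o, "n": len(c), "min": min(c), "max": max(c)}
        for (s, o), c in groups.items()
    ]

def bind_disagreement(rows):
    """Add the computed column disagreement = max - min."""
    return [{**r, "disagreement": r["max"] - r["min"]} for r in rows]

TAU = 0.2  # assumed disagreement threshold

flagged = [r for r in bind_disagreement(aggregate(MAPPINGS)) if r["disagreement"] > TAU]
```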
Without bind, the disagreement column cannot exist. Without aggregate, neither can n and the min/max. This query alone justifies adding bind as Layer A.
Q7. Ontology version diff: terms removed between OBO 2024-Q1 and 2024-Q2
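A deliberately small Python sketch of what a typed `diff` could return. It assumes each snapshot is just a mapping from term id to label, and it treats "same id, different label" as a rename; both are simplifying assumptions for illustration. Axiom-level change kinds (strengthened/weakened) are not modelled here, and the term ids and labels are invented.

```python
# Hypothetical snapshots: term id -> label.
V1 = {"T:001": "cell growth", "T:002": "cell death", "T:003": "apoptosis"}
V2 = {"T:001": "cell growth", "T:002": "programmed cell death", "T:004": "autophagy"}

def diff(v1, v2):
    """Compare two snapshots and emit typed change records."""
    changes = []
    for term_id in sorted(v1.keys() - v2.keys()):
        changes.append({"kind": "removed", "id": term_id, "old": v1[term_id]})
    for term_id in sorted(v2.keys() - v1.keys()):
        changes.append({"kind": "added", "id": term_id, "new": v2[term_id]})
    for term_id in sorted(v1.keys() & v2.keys()):
        if v1[term_id] != v2[term_id]:
            changes.append(
                {"kind": "renamed", "id": term_id, "old": v1[term_id], "new": v2[term_id]}
            )
    return changes

changes = diff(V1, V2)
removed = [c["id"] for c in changes if c["kind"] == "removed"]
kinds = [(c["kind"], c["id"]) for c in changes]
```

Selecting `kind == "removed"` answers the query, while the other records remain available to an audit-trail consumer.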
The composition-without-diff alternative, `set-op(traverse(v1) ⊖ traverse(v2))` from the §5 table, silently fails because:
(a) it requires the algebra to address two versioned worlds simultaneously — semantically novel;
(b) it cannot distinguish “removed” from “renamed” or “merged into another concept” — distinctions DynDiff/CODEX/ecco treat as primitive change types;
(c) it cannot produce the change characterization (effectual vs. ineffectual, strengthening vs. weakening axiom changes) that audit-trail consumers need.
This query alone justifies adding diff as Layer A.
(Bonus) Q8. Framework overlap (NIST and CIS)
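A minimal Python sketch of a parameterized `set-op`, assuming the two operands have already been reduced to comparable keys (here, hypothetical shared concept names that stand in for `skos:exactMatch` co-membership). The mode names and data are illustrative only.

```python
def set_op(left, right, mode):
    """Combine two compatible result-sets by union, intersection, or difference."""
    left, right = set(left), set(right)
    if mode == "union":
        return left | right
    if mode == "intersection":
        return left & right
    if mode == "difference":
        return left - right
    raise ValueError(f"unknown set-op mode: {mode}")

# Hypothetical concept keys covered by each framework.
NIST = {"account-management", "audit-logging", "media-protection"}
CIS = {"account-management", "audit-logging", "secure-configuration"}

overlap = set_op(NIST, CIS, "intersection")

# The same result written only with one-sided difference: A - (A - B).
via_double_difference = set_op(NIST, set_op(NIST, CIS, "difference"), "difference")
```

Both expressions give the same set, but the second hides the intent behind two nested subtractions.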
Without set-op, you would have to express ∩ via two anti-joins, which is contrived. This query justifies adding set-op.
7. Final Deliverables
7.1 Final Primitive Set (8)
| # | Primitive | Plain-language framing | Cross-domain anchor |
|---|---|---|---|
| 1 | filter | “Keep only the rows that satisfy this condition.” | σ (Codd); FILTER (SPARQL); WHERE (SQL/Cypher/GQL); rule-body literal (Datalog); slice/dice (OLAP). |
| 2 | traverse (parameterized: predicate-set, depth=k\|*, optional) | “Walk these edges, possibly transitively, possibly leaving rows that have no neighbor.” | BGP + property paths (SPARQL); MATCH + var-length (Cypher/GQL); out()/in() steps (Gremlin); recursive rule (Datalog); broader/narrower/match (SKOS); SSSOM predicate_id; STRM relationship. |
| 3 | bind (computed column / rename) | “Add a new column whose value is computed from existing columns.” | BIND/Extend (SPARQL); rule head expression (Datalog); SELECT … AS / computed columns (SQL); LET (GQL). |
| 4 | project | “Keep only these columns; drop the rest.” | π (Codd); SELECT var-list (SPARQL); RETURN (Cypher/GQL); field selection (GraphQL). |
| 5 | aggregate (group-by + reduction) | “Group rows by these keys; reduce each group with this function.” | γ (extended algebra); GROUP BY / aggregates (SPARQL/SQL); aggregation rule heads (Datalog); roll-up (OLAP). |
| 6 | anti-join | “Keep rows from the left that have no match on the right.” | MINUS / FILTER NOT EXISTS (SPARQL); stratified negation (Datalog); WHERE NOT EXISTS (SQL/Cypher); − and ⨝ pattern (Codd, derived). |
| 7 | set-op (∪, ∩, ⊖) | “Combine two compatible result-sets by union, intersection, or difference.” | UNION plus difference (SPARQL/Codd); UNION/INTERSECT/EXCEPT (SQL); set ops (Datalog, GQL). |
| 8 | diff (versioned ontology delta) | “Compare two snapshots of the ontology and emit typed changes (added/removed/strengthened/weakened/renamed).” | OWL ecco; CODEX; DynDiffOnto; PROMPTDIFF; git diff. Note: this is uniquely Crosswalker-essential and has no exact analog in pure query languages — that’s the point. |
Removed from candidate list: closure (now traverse(depth=*)), pivot (Layer B).
7.2 Layer A / Layer B Ruling
| Concern | Layer | One-sentence justification |
|---|---|---|
| filter, traverse, bind, project, aggregate, anti-join, set-op, diff | A | Value- or cardinality-changing operations on the ontology web. |
| pivot, sort, limit/pagination, top-k presentation | B | Reshaping or windowing of an already-determined result for display. |
| relevance scoring | A as bind; top-k as B | Score is a value (Layer A); selecting top-N for display is Layer B. |
| graph output (CONSTRUCT-style) / table view / pivot table / kanban | B (or Layer C if serialization-specific) | Renderer responsibility. |
| OWL-DL reasoning, constraint satisfaction, federation | out of scope | Explicit non-goals; consume reasoner output as input data instead. |
7.3 Recommended Changes to Synthesis-Log Settled-Item #14
Before (per task brief, item #14 is the prior 7-primitive Settled item):
Layer A query primitives = {filter, project, traversal, closure, anti-join, pivot, aggregate}.
After (recommended replacement text):
#14 (revised). Layer A query primitives = {filter, traverse, bind, project, aggregate, anti-join, set-op, diff} (8 mechanism-neutral operators).
Notes attached to the item:
- `traverse(predicate, depth=k|*, optional=true|false)` is one parameterized primitive that subsumes both single-hop edge-following and transitive-closure walks (`depth=*`). The previous separate `closure` primitive is removed; SPARQL property paths, Cypher variable-length patterns, and Datalog recursion all confirm the unification.
- `pivot` is Layer B (visual presentation), not Layer A. Likewise `sort`, `limit`, top-k display, and CONSTRUCT-style graph output.
- `bind` (computed columns / rename) is added because evidence-freshness, confidence-disagreement, and depth-annotation queries are inexpressible without it.
- `set-op` (∪ / ∩ / ⊖) is added because framework-overlap and result-set merging cannot be reduced to anti-join.
- `diff` is added as a Crosswalker-specific primitive for ontology version comparison (audit-trail v0.1.8 use case). It is parameterized by `(v1, v2, kind ∈ {added, removed, strengthened, weakened, renamed})`.
- `rank`, `window`, `OPTIONAL`, `subquery`, `federation`, `constraint-satisfy`, `temporal-snapshot` are rejected as separate primitives. They are either compositional (`rank` = `bind` + sort/limit; `OPTIONAL` = parameter on `traverse`), out-of-scope (federation, OWL-DL), or properties of the algebra rather than operators (subquery / nesting / closure-under-composition).
- The set is not engine-specific. Implementations may use Datalog (Cozo, Nemo), SPARQL (Oxigraph), DataFusion + DuckPGQ, or pure JS over markdown frontmatter; the primitives translate to each.
- Anchored standards: SPARQL 1.1 (W3C), SKOS (W3C), SSSOM, NIST OLIR / IR 8477 STRM, Codd relational algebra, OLAP cube ops, GQL ISO/IEC 39075:2024. Out of scope: OWL 2 DL reasoning, full SPARQL federation/SERVICE, full SPARQL CONSTRUCT semantics.
Recommendations (Staged)
Stage 1 — Adopt the 8-primitive set in the synthesis log this revision.
- Update Settled-item #14 with the text above.
- Add a one-line non-goals statement: “OWL-DL reasoning, constraint propagation, full federation, and CONSTRUCT graph-output are out of scope.”
- Trigger to revisit: if any new representative query is found that none of the 8 primitives can express in ≤4 composed steps.
Stage 2 — Build the test harness around the 10 representative queries in §5.
- Each query should be expressible as a primitive composition tree of depth ≤ 4.
- Threshold to add a new primitive: ≥2 distinct real queries require the same workaround pattern of length ≥ 4 — and that pattern has cross-domain precedent.
Stage 3 — Implement Layer A on a Datalog backend (Cozo or Nemo) first, not SPARQL.
- Datalog gives recursion, stratified negation, and aggregation in a single coherent semantics.
- Cozo additionally gives transactional embedded operation (matches Obsidian’s vault model) and HNSW vector indices reachable from the same query (covers the future `bind(score=cosine(…))` case without adding primitives).
- Oxigraph remains an option if SSSOM/RDF round-tripping becomes the dominant use case; revisit if >50% of vault content arrives as RDF rather than markdown frontmatter.
Stage 4 — Implement Layer B as a thin renderer over Layer A.
- pivot, sort, limit, top-k, and the graph/table/kanban views consume Layer A results.
- Trigger to reconsider boundary: if a Layer B feature requires re-issuing the Layer A query with different parameters more than once per render, that feature is secretly Layer A and needs to be examined.
Stage 5 — Defer diff until v0.1.8 (audit trail) work begins.
- Until then, scaffolding-only (interface defined, implementation = unimplemented). The cost of leaving it on the primitive list now is zero; the cost of discovering it should have been Layer A after shipping is high.
Caveats
- “Smallest complete” is empirical, not provable. Codd showed his 5 primitives are minimal for first-order relational queries; Crosswalker’s 8 are minimal-against-the-10-representative-queries-we-listed. New queries may move the boundary. The 5-vs-8 gap reflects that Crosswalker’s data model is a graph (needs `traverse`), is multi-versioned (needs `diff`), and serves analytics use cases (needs `aggregate` + `bind`).
- `diff` is the most contestable inclusion. A reasonable alternative architecture treats versioned snapshots as a parameter dimension applied to every primitive (read-as-of-version), making `diff` derivable as `set-op(read(v1) ⊖ read(v2))`. The argument for keeping `diff` Layer A is pragmatic (audit-trail consumers need typed change records, not raw set differences) rather than theoretically forced. If the audit-trail consumer is happy with raw deltas, demote `diff` and the set shrinks to 7.
- SKOS, SSSOM, STRM contribute predicate vocabularies, not primitives. Make sure Layer A is parameterized over a configurable predicate set rather than hardcoded SKOS terms; otherwise users with bespoke OLIR or STRM relationship vocabularies are second-class citizens.
- OWL 2 transitive properties remain a tempting trap. `skos:broaderTransitive` is technically reasoning, not closure-walking; the W3C SKOS spec deliberately blurs this. Crosswalker should commit to the asserted-graph-only interpretation: `traverse(skos:broader, depth=*)` walks asserted edges and does not invoke an OWL reasoner. Document this loudly.
- `bind` opens a small door to expression-language scope creep. Restrict it to: arithmetic, string ops, date arithmetic, and a small whitelist of similarity functions. Resist adding general user-defined functions until a real query forces it.
- The proposed set is engine-neutral but not implementation-neutral. A markdown-vault tool will not have native graph indices, so `traverse(depth=*)` over a 50k-node ontology web will need careful materialization or pruning strategies. This is an engineering concern, not a primitive-design concern, but it should not be hidden.
- External standards used in the matrix evolve. SPARQL 1.2 work (RDF-star, etc.), GQL revisions, and SSSOM minor versions could change the “covered/approximate/missing” cells over time. Re-audit when any of those publish a new edition.