Datalog
Datalog is a declarative query and rule language: you write rules about what is true, and the engine derives everything that follows. It has been around since the 1970s (it originated as a sublanguage of Prolog) and shares its theoretical foundations with relational query evaluation: the body of a Datalog rule is, in effect, a join.
In Crosswalker it’s used for one specific job — deriving new SSSOM mappings from existing ones. If NIST CSF Identify maps to ISO 27002 Risk Assessment and ISO 27002 Risk Assessment maps to MITRE ATT&CK Reconnaissance, Crosswalker can derive NIST CSF Identify → MITRE ATT&CK Reconnaissance automatically with a confidence score that’s the minimum of the two source confidences.
You write that rule once, and the Datalog engine handles the rest, including chains of any length, without you having to write a loop.
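To make "handles the rest" concrete, here is a minimal Python sketch of the fixpoint loop a Datalog engine runs on your behalf. It is an illustration only, not Crosswalker's or Nemo's code: the function name `derive_closure`, the tuple layout, and the confidence values are all assumptions. It also assumes that when several chains reach the same pair, the highest resulting confidence is kept, which the text above does not specify.

```python
# Hypothetical illustration of the chain rule, in generic Datalog-style notation:
#   derived(A, C, min(c1, c2)) :- mapping(A, B, c1), mapping(B, C, c2).

def derive_closure(mappings):
    """Return {(subject, object): confidence} for every pair reachable by chaining.

    `mappings` is an iterable of (subject, object, confidence) triples.
    Assumption: if several chains derive the same pair, keep the highest confidence.
    """
    best = {}
    for subj, obj, conf in mappings:
        best[(subj, obj)] = max(conf, best.get((subj, obj), 0.0))

    changed = True
    while changed:  # keep applying the rule until a full pass adds nothing new
        changed = False
        for (a, b), c1 in list(best.items()):
            for (b2, c), c2 in list(best.items()):
                if b != b2 or a == c:  # must share the intermediate; skip self-mappings
                    continue
                derived = min(c1, c2)
                if derived > best.get((a, c), 0.0):
                    best[(a, c)] = derived
                    changed = True
    return best


facts = [
    ("NIST CSF Identify", "ISO 27002 Risk Assessment", 0.9),            # made-up confidence
    ("ISO 27002 Risk Assessment", "MITRE ATT&CK Reconnaissance", 0.7),  # made-up confidence
]
closure = derive_closure(facts)
print(closure[("NIST CSF Identify", "MITRE ATT&CK Reconnaissance")])  # min(0.9, 0.7)
```

The `while changed` loop is exactly the part you do not write in Datalog: the engine re-applies the rule until no new mapping appears, so longer chains fall out of the same single rule.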
Why Crosswalker uses it
The alternative is plain SQL with WITH RECURSIVE (a “recursive CTE”). Both can express transitive closure. Datalog wins on three things that matter to Crosswalker:
- Stratified negation — Datalog can say “find mappings A→C that are NOT derivable through any intermediate B” naturally. SQL recursive CTEs can fake this with NOT EXISTS / LEFT JOIN, but the queries get ugly fast.
- Aggregation inside recursion — Datalog rules can compute min(confidence_a, confidence_b) as part of the rule. Recursive CTEs handle this but with awkward window functions.
- Magic-set rewriting — modern Datalog engines (Nemo, CozoDB) automatically optimize chain queries by working backward from what you asked for. Recursive CTEs don’t do this; they materialize the full transitive frontier.
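The stratified-negation point can be sketched in a few lines of Python. This is a hedged illustration, not engine code: the function name `direct_only` and the pair-based input are assumptions, and "derivable through an intermediate" is simplified here to a two-hop route. The idea it shows is the stratification itself: the negated relation is computed in full first, and only then is the negation applied to it.

```python
def direct_only(mappings):
    """Return direct mappings (a, c) that have no two-hop route a -> b -> c.

    `mappings` is an iterable of (subject, object) pairs (hypothetical layout).
    """
    edges = set(mappings)

    # Stratum 1: via(A, C) :- mapping(A, B), mapping(B, C).
    via = {(a, c) for (a, b) in edges for (b2, c) in edges if b == b2}

    # Stratum 2: direct_only(A, C) :- mapping(A, C), not via(A, C).
    # Safe to negate because `via` is already complete at this point.
    return edges - via


pairs = [("A", "B"), ("B", "C"), ("A", "C")]
print(sorted(direct_only(pairs)))  # ("A", "C") is dropped: it is also reachable via "B"
```

In a Datalog engine the two strata are ordered automatically from the rules; in SQL the same ordering has to be spelled out by hand with a subquery or an anti-join.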
For Crosswalker at scale (10⁶+ mappings with branching factor over 5), the difference is real. At small scale (≤10⁵ mappings) recursive CTE is fine — which is why Tier 2-Lite drops Datalog and uses recursive CTE instead.
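For comparison, the recursive-CTE route can be sketched with Python's built-in `sqlite3` module. This is an assumption-laden example rather than the Tier 2-Lite implementation: the table name `mapping`, the function `chain_mappings`, and the `max_hops` bound are hypothetical, and the column names borrow the SSSOM slots `subject_id`, `object_id`, and `confidence`. The hop bound is there so that cyclic mappings cannot recurse forever.

```python
import sqlite3

CHAIN_SQL = """
WITH RECURSIVE chain(subject_id, object_id, confidence, hops) AS (
    SELECT subject_id, object_id, confidence, 1 FROM mapping
    UNION
    SELECT chain.subject_id,
           mapping.object_id,
           MIN(chain.confidence, mapping.confidence),  -- two-argument scalar MIN
           chain.hops + 1
    FROM chain JOIN mapping ON mapping.subject_id = chain.object_id
    WHERE chain.hops < :max_hops
)
SELECT subject_id, object_id, MAX(confidence)
FROM chain
GROUP BY subject_id, object_id
ORDER BY subject_id, object_id
"""


def chain_mappings(rows, max_hops=8):
    """Derive chained mappings from (subject_id, object_id, confidence) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mapping (subject_id TEXT, object_id TEXT, confidence REAL)"
        )
        conn.executemany("INSERT INTO mapping VALUES (?, ?, ?)", rows)
        return conn.execute(CHAIN_SQL, {"max_hops": max_hops}).fetchall()
    finally:
        conn.close()


for row in chain_mappings([("a", "b", 0.9), ("b", "c", 0.7), ("c", "d", 0.8)]):
    print(row)
```

Note that the CTE expands every chain from every starting mapping before the final `GROUP BY` collapses them, which is the full-materialization behaviour described above.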
Engine choice: Nemo
Per the Ch 12 Datalog vs SQL deliverable, Crosswalker uses Nemo — a Rust Datalog engine from TU Dresden with native WASM support, stratified negation, existential rules, and W3C-tested OBDA semantics. It’s the Datalog engine in the Tier 2 layered stack.
Sister Datalog engines on the watchlist:
- CozoScript (CozoDB) — superset of Datalog with built-in graph algorithms (PageRank, Dijkstra, Yen-K) and HNSW vector queries integrated as first-class joins. Rejected for adoption due to maintenance signal weakening — see Ch 14 §2.3.
- WOQL (TerminusDB) — Datalog with path(), dot(), slice() operators. Relevant only if TerminusDB is adopted.
- Datalevin (Clojure/JVM) — best-in-class query optimizer; not WASM, so out of scope for Tier 2.
- Minigraf (bi-temporal Datalog) — pre-1.0, single-maintainer; on the Ch 14 trigger B watchlist for SSSOM bi-temporal history queries.
How it relates to other Crosswalker query layers
Datalog fills a gap that SQL and SPARQL each cover only imperfectly. SQL is great at joins but awkward at recursion. SPARQL is great at graph patterns but limited at rule-level aggregation. Datalog handles both recursion and rule-level aggregation.
When Datalog is not the right tool
- Tier 2-Lite: bundle size matters more than expressivity. Use recursive CTE.
- Local user-facing queries: SPARQL is easier to write by hand and Oxigraph is fast enough. Use SPARQL.
- Analytical rollups (coverage matrices, predicate distributions): use DuckDB SQL.
- Federation across multiple endpoints: Comunica + SPARQL SERVICE is the right abstraction.
Datalog is reserved for derivation rules — the place where its expressivity premium pays off.
Further reading
- Ch 12 deliverable A: Datalog vs SQL for SSSOM chain rules — the full evaluation that picked Nemo
- Ch 18: Tier 2-Lite SSSOM rule subset and scale ceiling — defines which rules survive without Datalog
- Third-wave architectural shifts log §2 — where Nemo was confirmed in the Tier 2 stack
- Nemo (TU Dresden) — the Rust Datalog engine Crosswalker uses
- Datalog Wikipedia — academic background