Research challenges
What this is
These are adversarial/exploratory research briefs — focused assignments that can be handed to a fresh agent with no prior context bias. Each challenge:
- Targets a specific architectural assumption, decision, or design
- Asks the agent to critically assess, stress-test, or find alternatives
- Is grounded in first principles and the project’s philosophical pillars
- Thinks long-term and large-scale — not just “does this work today” but “does this hold at 100K notes, 50 frameworks, 10 years from now”
How to use
- Pick a challenge from the list below
- Hand it to a fresh agent (one with no prior conversation context about this project)
- Point the agent at the KB (`docs/src/content/docs/`) for context
- Let it research, critique, and report
- Log findings in `zz-log/` if they surface decisions or insights worth preserving
Why fresh agents
Every agent that works on a project accumulates context bias — it starts agreeing with past decisions because it helped make them. A fresh agent given only the KB and a challenge brief can find blind spots that embedded agents can’t see.
Active challenges
- Challenge 01: Are the ontology diff primitives truly atomic?
- Challenge 02: Architecture stress test at scale
- Challenge 03: What’s the actual competitive landscape? — partly addressed by Ch 19 §1; archive-eligible if user signals
- Challenge 04: First principles audit — partly addressed by 04-10 Foundation synthesis + Ch 20 dialog; archive-eligible if user signals
- Challenge 21: Should Crosswalker build its own ETL engine, adopt an existing one, or compose them? — strictly upstream of Ch 20; build-vs-buy meta-question with critical/long-term thinking; opportunity-cost + governance + sustainability lens; protocol-surface (Path D) opportunity surfaced
- Challenge 27: Bases query layer architecture — v0.1.6 pre-requisite. Bases can’t do joins / recursion / anti-joins / multi-file aggregation. Junction notes reify edges as Bases-queryable files, but cross-junction aggregation still hits Bases’ wall. Tier 2 SQL helpers (just shipped v0.1.5 P3) cover the gaps. Design: query routing, junction-note query patterns, Bases+SQL composition, recipe-driven query emission, junction-subject-string resolution. Status: 3 deliverables landed 2026-05-07 — see 27a, 27b, 27c; follow-on stress test filed as Ch 28
- Challenge 28: Bases query layer follow-on stress test — v0.1.6 synthesis pre-requisite. Adversarial follow-on after Ch 27’s 3 deliverables converged on Hybrid + Pattern B+D but left specific lifecycle/UX questions open. Stress-tests: materialization-file lifecycle (vault pollution, sync churn, freshness UX, audit bloat); registerBasesView vs codeblock processor integration mechanism decision; mobile/Publish parity; recipe lifecycle and ownership; audit-trail implications of materialized fact tables; the contrarian “no materialization” third path. Status: 3 deliverables landed 2026-05-07 — see 28a, 28b, 28c; pre-decision synthesis log open at `zz-log/2026-05-07-bases-query-layer-architecture-synthesis` — finalization PAUSED pending Ch 29-37
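The anti-join gap named in Challenge 27 can be pictured with a small sketch. It treats each junction note as one reified edge and asks which concepts have no outgoing mapping, which is the kind of question the brief says Bases alone cannot answer. The `JunctionNote` shape, its field names, and the sample IDs are hypothetical and are not taken from the Crosswalker schema.

```typescript
// Hypothetical shape: one junction note reifies one edge between two concepts.
interface JunctionNote {
  subject: string;   // e.g. a control ID in framework A
  object: string;    // e.g. a control ID in framework B
  predicate: string; // e.g. a SKOS mapping predicate
}

// Anti-join: keep the concepts that appear as the subject of NO junction note.
function unmapped(concepts: string[], junctions: JunctionNote[]): string[] {
  // Null-prototype object used as a string set (no inherited keys to collide with).
  const mapped: Record<string, boolean> = Object.create(null);
  for (const j of junctions) {
    mapped[j.subject] = true;
  }
  return concepts.filter((c) => !mapped[c]);
}

const sampleConcepts = ["A-1", "A-2", "A-3"];
const sampleJunctions: JunctionNote[] = [
  { subject: "A-1", object: "B-7", predicate: "skos:exactMatch" },
  { subject: "A-3", object: "B-2", predicate: "skos:broadMatch" },
];

const gaps = unmapped(sampleConcepts, sampleJunctions);
console.log(gaps); // logs the single unmapped ID, "A-2"
```

In SQL terms this is a `NOT EXISTS` or `LEFT JOIN ... IS NULL` query, which is the sort of gap the Tier 2 SQL helpers are described as covering.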
Filed 2026-05-08 — ontology-web alignment + landscape audit (synthesis-log finalization blockers)
The 2026-05-08 alignment review surfaced that the synthesis log had drifted GRC-coded and that several v0.1.7+ commitments rest on assumptions never validated under the ontology-web framing. Nine challenges were filed to close the gap. Briefs are filed; fresh-agent runs are deferred to user’s schedule.
NEW challenges (questions never asked before):
- Challenge 29: Ontology-web query verbs — adversarial validation — “Are these the right LEGO bricks for asking questions about ontology webs?” Stress-tests the 7-primitive set (filter / project / traversal / closure / anti-join / pivot / aggregate) against SPARQL/Datalog/OLAP/SKOS/SSSOM/OLIR prior art. Direct input to settled-item #14 in synthesis log + `concepts/query-primitives.mdx` lock.
- Challenge 30: View shape taxonomy — “Beyond pivot, what shapes does the engine need? (graph, hierarchy, timeline, sankey, treemap…)” Direct input to `concepts/view-shapes.mdx` lock + v0.2+ roadmap.
- Challenge 31: Recipe `query:` block schema design — “How does a recipe declare its query in YAML?” Compares to SPARQL CONSTRUCT / GraphQL / dbt / LookML / Cube.dev. Deliverable: working JSON Schema + 3-5 reference recipes. Direct input to D8.
- Challenge 32: Intuitive query UX — “How does a user go from intent to a working .base file?” Wizard / recipe-picker / inline-editor analysis. v0.2+ UX work.
- Challenge 33: Multi-modal query engine landscape audit — Survey DuckDB+DuckPGQ, Polars, Cozo, Oxigraph-WASM, Stardog, RDF4j, Materialize, Datomic, ClickHouse, LanceDB. Re-audit prior engine commitments under ontology-web framing. v0.1.7+ substrate decisions; updates Ch 24 migration triggers.
- Challenge 34: Streaming / chunked query execution — “How does merging scale chunk-by-chunk to not blow memory?” Genuine gap — prior streaming refactor (v0.1.4.5) handled IMPORT only; query side is not designed.
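As a rough illustration of the question Challenge 34 poses, the sketch below aggregates mappings one chunk at a time so that only the current chunk and a running total are held in memory. It is an assumption-laden toy, not the project's design: the `MappingRow` shape, the `ChunkReader` callback, and the in-memory demo reader are all hypothetical stand-ins for whatever storage the query side ends up reading from.

```typescript
// Hypothetical row shape for one mapping edge.
interface MappingRow {
  source: string;
  target: string;
}

// Hypothetical reader: returns up to `limit` rows starting at `offset`,
// and an empty array once the input is exhausted.
type ChunkReader = (offset: number, limit: number) => MappingRow[];

// Count mappings per source, merging each chunk into a running total.
function countBySource(read: ChunkReader, chunkSize: number): Record<string, number> {
  const totals: Record<string, number> = Object.create(null);
  for (let offset = 0; ; offset += chunkSize) {
    const chunk = read(offset, chunkSize);
    if (chunk.length === 0) break; // no more input
    for (const row of chunk) {
      totals[row.source] = (totals[row.source] || 0) + 1;
    }
  }
  return totals;
}

// Demo reader backed by a small array; a real reader would page from disk or a database.
const demoRows: MappingRow[] = [
  { source: "A", target: "x" },
  { source: "B", target: "y" },
  { source: "A", target: "z" },
  { source: "A", target: "w" },
  { source: "B", target: "v" },
];
const demoReader: ChunkReader = (offset, limit) => demoRows.slice(offset, offset + limit);

const perSource = countBySource(demoReader, 2);
console.log(perSource.A, perSource.B); // 3 2
```

The accumulator grows with the number of distinct sources rather than the number of rows, which is why counts and sums merge cleanly chunk by chunk. Primitives such as closure or pivot do not decompose this simply, which is part of what the brief leaves open.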
RERUN challenges (revisit prior commitments under ontology-web framing — original briefs preserved):
- Challenge 35: Graph→tabular bridging (RERUN of Ch 10) — Original answered for SSSOM/GRC at small-medium scale; rerun asks for arbitrary ontology webs (BioPortal, OBO Foundry, OLS, UMLS scale). The user’s framing “we’re pulling tabular views from graph/web-connected networks” was the central question; original only nipped at it.
- Challenge 36: Query language under ontology-web framing (RERUN of Ch 12) — Original asked Datalog vs SQL narrowly for SSSOM chain-rules; rerun asks the broader question — what language does the recipe author / user actually write? Bases DSL / SQL / SPARQL / Datalog / Cypher / GraphQL.
- Challenge 37: Tier 2-Lite scale model (RERUN of Ch 18) — Original ceiling was ~100K mappings sized for GRC vault; rerun asks the same scale question for ontology-web vaults (BioPortal: 700+ ontologies; UMLS: 3.5M concepts; OBO Foundry: hundreds × thousands). Substrate may not survive.
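The “tabular views from graph/web-connected networks” framing in Challenge 35 can be made concrete with a minimal sketch: a breadth-first closure over an edge list that emits one table row per reachable node. The `GraphEdge` and `ClosureRow` shapes are hypothetical, and the linear scan over edges is chosen for brevity; it would not hold at the scales the rerun briefs ask about.

```typescript
// Hypothetical directed edge between two concept IDs.
interface GraphEdge {
  from: string;
  to: string;
}

// One output row of the tabular view: which node is reachable, and at what depth.
interface ClosureRow {
  root: string;
  reachable: string;
  depth: number;
}

// Breadth-first transitive closure from `root`, flattened into rows.
function closureRows(root: string, edges: GraphEdge[]): ClosureRow[] {
  const seen: Record<string, boolean> = Object.create(null);
  seen[root] = true; // guards against cycles leading back to the root
  const rows: ClosureRow[] = [];
  let frontier = [root];
  let depth = 0;
  while (frontier.length > 0) {
    depth += 1;
    const next: string[] = [];
    for (const node of frontier) {
      for (const edge of edges) {
        if (edge.from === node && !seen[edge.to]) {
          seen[edge.to] = true;
          rows.push({ root, reachable: edge.to, depth });
          next.push(edge.to);
        }
      }
    }
    frontier = next;
  }
  return rows;
}

// A three-node cycle: A -> B -> C -> A.
const cycle: GraphEdge[] = [
  { from: "A", to: "B" },
  { from: "B", to: "C" },
  { from: "C", to: "A" },
];

const table = closureRows("A", cycle);
console.log(table.length); // 2 (B at depth 1, C at depth 2)
```

Each row is a plain record, so the result can be filtered, pivoted, or aggregated like any other table once the traversal has run.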
Rerun convention: original briefs (Ch 10/12/18) stay archived as historical record. Each rerun is a NEW brief with a fresh Ch number that explicitly anchors to the original as predecessor and frames “what’s different now”. Original deliverables preserved verbatim; reruns produce NEW deliverables that may revise prior verdicts.
Filed 2026-06-01 — unified model: spine, backbone, audit role (pre-ingestion gate)
The full security & GRC framework corpus is assembled; the next phase is ingesting it into Crosswalker. Before that, the unified-model assumptions (inherited, not tested) get stress-tested — most pointedly “is CRI the spine?”. These three run roughly in order (39 → 40 → 41); 40 and 41 partly depend on 39’s outcome.
- Challenge 39: Is there a unified-model spine — and is it CRI? — Foundational, gates ingestion. Tests the single-spine + CRI-as-spine assumption against SCF/STRM, NIST CSF, 800-53, ISO, synthetic, and no-designated-spine alternatives. May demote CRI to a sector profile.
- Challenge 40: Control, risk, or obligation as the backbone entity? — What the model’s nodes are. Stress-tests the implicit control-centric lean against risk- and obligation-centric models; shapes the Tier 1 schema.
- Challenge 41: Is internal audit a spine element or a lens? (and what to name the model) — Decides audit’s structural status and lets the model name (“Unified Risk and Audit Model” vs “Assurance” vs “GRC” …) follow from structure.
Archived (resolved)
Challenges that have been resolved by a research deliverable and a corresponding log entry. The brief itself is preserved (re-runnable) — see each archived file for the resolution callout pointing to the resolving log.
- Challenge 05: The transformation problem — resolved 2026-05-03 by Ch 20 import-primitive synthesis (data has fundamental forms; ETL is tree-to-tree mapping; primitives are tree transducers)
- Challenge 06: Pairwise vs synthetic spine — resolved 2026-05-01
- Challenge 07: Link metadata / edge model — resolved 2026-04-10
- Challenge 08: Git history audit-trail tenability — resolved 2026-05-02; verdict: augment-not-replace (TSA + WORM + cert-export at Tier 1); v0.1 default = T1 git+signed commits
- Challenge 09: UUID / CWUUID cross-cutting strategy — resolved 2026-05-02; UUIDv7 + sha256 CIDs + CURIEs + ORCIDs adopted; CWUUID display-only
- Challenge 10: Graph→tabular bridging engine — resolved 2026-05-02 via cascade (Ch 11 + Ch 14 + Ch 16 → v0.1 stack pivot)
- Challenge 11: Tier 2/3 engine deep survey — resolved 2026-05-02 by 3 independent fresh-agent deliverables + Ch 14 follow-on; layered Tier 2 + Fuseki/oxigraph-server Tier 3
- Challenge 12: Datalog vs SQL for SSSOM chain-rule derivation — resolved 2026-05-02 by 2 convergent deliverables; Datalog (Nemo) primary + OxO2 architecture
- Challenge 13: Modern attestation primitives (Sigstore, in-toto, SLSA, OpenTimestamps, VCs) — resolved 2026-05-02; confirms Ch 08 + adds in-toto; absorbed into v0.1 audit decision
- Challenge 14: Missed engines evaluation — resolved 2026-05-02 (third research wave); keep Ch 11 layered Tier 2 stack + add Tier 2-Lite + Comunica federation
- Challenge 15: Audit-trail alternatives without external git tooling — resolved 2026-05-02 (third research wave); 4-tier model with OpenTimestamps T2 default
- Challenge 16: Tier 3 stack reconsideration — resolved 2026-05-02 (third research wave); demote AGE, promote Fuseki + oxigraph-server
- Challenge 18: Tier 2-Lite SSSOM rule subset and scale ceiling — resolved 2026-05-02; rule expressivity matrix + ~100K mapping ceiling; Tier 2-Lite promoted to default-bundled v0.1 sidecar
- Challenge 19: Over-engineering stress test — resolved 2026-05-02; “radically simplify with narrow tiered escape hatch” verdict drove the v0.1 stack pivot
- Challenge 20: Import primitive formal foundation — resolved 2026-05-03; three fresh-agent deliverables + dialog; convergent transformation-algebra layer (RML retargeted, 5–6 primitives, MTT-justified) plus complementary boundary-semantics layer (ref/resolve/bind/seal); synthesis is a wargaming setup, not a decision log
- Challenge 23: Bundle engine implementation language — resolved 2026-05-04 by adversarial fresh-agent deliverable; Path A (Pure TS in-plugin) for v0.1; Path C (Hybrid: optional external producer) reserved for v0.5+; mobile-Obsidian portability + small-OSS contributor pool are the two irreversible constraints; 9 concrete v0.1 commitments adopted (8 of 9; Bun-stays disagreement explicitly recorded)
- Challenge 24: Turso / libSQL evaluation — resolved 2026-05-04 by adversarial fresh-agent deliverable; REJECT all three Qs (libSQL-WASM Tier 2 migration, Turso Cloud Tier 3 listing, Limbo near-term adoption); stay on `@sqlite.org/sqlite-wasm` + `sqlite-vec`; vendor-trajectory signal — Turso publicly de-prioritized libSQL; vector-layer-decoupled-from-substrate elevated as load-bearing modularity commitment
- Challenge 22: Target-structure expressivity — resolved 2026-05-04 by adversarial fresh-agent deliverable; closed grammar of 5 mechanisms × ordered layout × also_emit × graph_edges; `render(Recipe, ConceptIdentity) → Address` as single coupling point modeled on RML/R2RML; content addressing computed BEFORE render (target structure is a view); managed/user_preserve frontmatter split; v0.1 ships full schema with folder+file+heading wired; tag+wikilink layout-levels deferred to v0.2
- Challenge 38: Query state location + folder-note pattern — resolved 2026-05-18 by two convergent fresh-agent deliverables (Ch 38a same-name + Ch 38b query-pack) and user pick of Layout B+ (`_crosswalker/queries/<slug>/index.md` + explicit `![[<slug>/view.base]]` embed); both rejected the literal folder-note `index.md` magic-embed (would require LostPaul Folder Notes plugin, violating Commitment #3); ~20-case edge-case policy table locked; Phase 4.6 migration sub-phase opened ahead of Phase 5 — see synthesis log