Skip to content
🚧 Early alpha — building the foundation. See the roadmap →

Challenge 27: Bases query layer architecture (where complex query logic lives)

Created Updated

Bases is locked as the v0.1 query layer (per the Ch 23 synthesis + the project memory commitment — Dataview removed). And per Ch 07 (evidence-link edge model), junction notes (edge-as-note reification) are the only architecture compatible with Bases’ flat-YAML constraint.

But these two commitments leave a real architectural gap unaddressed:

Bases is tabular only. It does NOT support: joins across files, recursive graph traversal, aggregation across linked notes, edge-metadata-on-links, inline field queries. metadata-ecosystem concept page

The user-facing queries Crosswalker exists to enable include exactly the things Bases can’t do natively:

User questionBases-doable?Notes
”Show all NIST AC controls”✅ YesFlat filter on file.inFolder() or framework frontmatter
”Show all junction notes”✅ YesFlat filter on kind == 'junction-note'
”Show junction notes where coverage = ‘partial’”✅ YesFlat filter on frontmatter
”Show junction notes for AC-2”✅ YesFlat filter on subject field (wikilink-target string match)
“Show controls WITHOUT any junction note”❌ NoAnti-join across files; Bases has no left-anti-join semantics
”Coverage matrix: each control × evidence count × framework”❌ NoMulti-table aggregate join
”All controls reachable from NIST CSF Identify via any chain of crosswalks”❌ NoRecursive graph traversal
”Crosswalk between two ontologies, filtered by STRM predicate”❌ NoJoining mappings + concepts tables
”Show evidence freshness per control with computed staleness”⚠ PartialBases formulas can compute per-row; can’t aggregate per-control across multiple junctions

The first 4 (Bases-doable) are flat queries over Tier 1 frontmatter. The last 5 (Bases-impossible) require either Tier 2 SQL or DataviewJS. We’ve removed Dataview. We’ve shipped Tier 2 SQL helpers (v0.1.5 Phase 3) that handle exactly these cases. The architectural question is: how do these two surfaces compose at user-experience level?

This challenge designs that architecture.

What we already have (the unchosen-but-built parts)

Section titled “What we already have (the unchosen-but-built parts)”
AssetWhat it gives us
Tier 1 canonical Markdown vault (system architecture Layer 2)Everything Bases queries against. Concepts, junction notes, crosswalk-edge notes — all flat-YAML files conforming to spec/tier1.schema.json
Tier 2 sqlite-wasm sidecar (v0.1.5)SQLite database derived from Tier 1; deletable; recoverable. Tables: concepts, mappings, junction_notes, closure_cache, ontologies. Already populated by the projector
Tier 2 query API (v0.1.5 Phase 3)3 typed helpers: getConceptsByOntology, crosswalkBetween(subj, obj, predicate?), closureFromConcept(start, predicate?, maxDepth?). Plus closure cache lazy materialization via recursive CTE
junction_notes_with_freshness viewAlready built in Tier 2 schema; computes expired / stale / fresh / not-set per junction at query time
Junction note 13-field schema (Ch 07 synthesis)subject + predicate + object + coverage + confidence + status + reviewer + review_date + 5 more fields, all flat YAML
Bases skill (obsidian-bases)Authoring patterns for .base files (filters, formulas, summaries, view types)
Plugin instance handles for SQL helpers (plugin.queryConcepts, queryCrosswalk, queryClosure)Programmatic access to Tier 2 from any Crosswalker code path

What we don’t have:

  • A user-facing surface for the SQL queries (currently only callable via executeObsidian in tests)
  • A decision on how Bases views and SQL queries compose in the UX
  • A pattern for emitting useful queries into concept-note bodies on import
  • A pattern for “complex query result → Bases-renderable view”
  • Junction-note coverage queries that span concept × junction × evidence (currently neither Bases nor our SQL helpers handle this cleanly)

Each section below is a focused question the deliverable must answer concretely with worked examples.

For every common compliance/crosswalk query a Crosswalker user would ask, determine where it routes:

  • Tier 1 / Bases native (flat YAML; .base file; no plugin code needed)
  • Tier 2 / SQL helper (calling plugin.queryConcepts/Crosswalk/Closure or new helpers)
  • Hybrid (SQL helper produces a result; Bases renders it via some bridge)
  • Out of scope for v0.1.6 (deferred / DataviewJS-style in a future milestone / out-of-product)

Produce a table: ~20 user questions × routing decision × rationale.

2. Junction-note query patterns specifically

Section titled “2. Junction-note query patterns specifically”

Junction notes are the most architecturally novel piece (Ch 07). Six query patterns matter for compliance:

a. “All evidence for control X” — junction notes where subject points at X b. “All controls covered by evidence Y” — junction notes where object points at Y c. “Coverage gaps” — controls with no junction note pointing at them (left-anti-join) d. “Stale evidence” — junctions with freshness == 'stale' or expired e. “Coverage matrix” — table of (control, count of junctions, latest review_date, set of frameworks crosswalked) f. “Multi-control evidence” — junction notes where the same evidence covers multiple controls

For each: which tier handles it, how the query is expressed, what the user sees.

The hard ones are (c) and (e). Bases CAN’T do anti-joins. SQL helpers CAN do (c) easily but the result needs to surface in the user’s vault — how?

Three plausible patterns for how Bases + Tier 2 SQL compose at user-surface level:

PatternWhat it looks likeTradeoffs
A: Bases-only (sacrifice complex queries)Concept-note bodies emit pure Bases queries; complex queries are out-of-scope for v0.1.6; user runs SQL queries via palette commands that show modal resultsSimple; but incomplete — coverage gaps, closure queries unavailable in note views
B: SQL → Materialized Tier 1 file → Bases renders itCrosswalker periodically (or on-demand) runs Tier 2 SQL queries; result rows written to a generated Markdown file with frontmatter; Bases queries that file nativelyBases UX preserved; result file needs lifecycle (regenerate? when?); freshness becomes complicated
C: Custom Crosswalker views (not Bases-native)Plugin emits its own query views that render SQL results in custom panes; bypasses Bases for complex queriesMost flexible; least Obsidian-native UX; “Bases query layer” commitment becomes “Bases for simple, custom for complex”
D: Dual surfaceRecipe emits Bases for simple queries inline; recipe also emits a “[Run complex query]” link that opens a modal calling Tier 2 SQLTwo surfaces user has to know; honest about the constraint

Which is the right v0.1.6 architecture? What’s the minimum viable user experience?

When a recipe imports a framework, what queries should the bundled engine emit into note bodies automatically?

For a concept note (e.g., AC-2):

  • “Evidence covering this control” — Bases query? SQL bridge? Both?
  • “Controls in this family” — Bases (flat tag/frontmatter filter)
  • “Crosswalks from this control” — Bases (junction notes filter on subject)
  • “Crosswalks to this control” — Bases (junction notes filter on object)
  • “Transitive reach via crosswalks” — SQL only

For a junction note:

  • “All other junctions for this control” — Bases (subject filter)
  • “All other junctions for this evidence” — Bases (object filter)
  • “Freshness status” — Bases formula

For a crosswalk-edge note:

  • “Other edges with same subject” — Bases
  • “Other edges with same predicate” — Bases
  • “Transitive closure from this edge” — SQL only

What’s the recipe API for declaring “emit this query into the note body”? Is it a recipe field (e.g., also_emit.body.queries) or a per-kind default in the recipe schema?

  • Tier 2 closure cache invalidates on any mappings change. What about Bases queries — do they auto-update? (Yes, Bases re-runs on file change.)
  • If we use Pattern B (materialize SQL → Tier 1 file → Bases renders), what’s the materialization cadence? On vault load? On-demand? On Tier 1 change?
  • For multi-thousand-edge OLIR crosswalks, what’s the wall-clock cost of common queries?

6. The junction-note-subject-string problem

Section titled “6. The junction-note-subject-string problem”

junction_notes.subject is a wikilink-target string (e.g., [[Frameworks/NIST 800-53 r5/AC/AC-2]]). To answer “evidence for control X” we need to match this string against the concept’s vault path or curie. Three approaches:

  • String LIKE: WHERE junction_notes.subject LIKE '%' || concepts.curie || '%' — brittle (false positives if curie is a substring of another)
  • Resolve via Obsidian’s metadataCache: app.metadataCache.resolveLink() returns the target file; lookup that file’s CURIE; match — accurate but requires plugin code, not pure SQL
  • Projector pre-resolution: at projection time, resolve subject wikilink to a concept curie + store in a new column — cleanest if the resolved-curie is stable

Which approach? What’s the schema impact? Does it need a v0.1.5 schema bump (tier2-sqlite-v2)?

A user with multiple vaults (corporate vs personal; NIST vault vs ISO vault) might want queries that span them. Out of scope for v0.1.6, but: does the design preclude it later? Or does the architecture leave the door open?

The deliverable must NOT recommend:

  1. Reintroducing Dataview — the project memory commits to Bases-only as the v0.1 query layer; reversing that is a synthesis-log-level decision, not a query-architecture choice
  2. Embedding raw SQL in concept-note bodies — SQL is not portable; Bases queries are because they’re upstream Obsidian
  3. Custom plugin-only query language — would compete with both Bases and SQL; adds maintenance debt; isn’t Obsidian-native
  4. Closure-on-every-query — closure cache exists for a reason; the architecture must use it
  5. DataviewJS as a fallback — same project memory; Dataview removed
  6. A separate “Crosswalker query language” — JSONata is already the in-recipe expression sub-language (transform-engine-depth log); not adding a third language

The deliverable must produce:

  1. Query-routing decision matrix — ~20 user questions × tier × rationale
  2. Junction-note query pattern catalog — 6 patterns × concrete .base file or SQL helper call × user-surface rendering
  3. Cross-tier composition recommendation — which of patterns A/B/C/D wins, with explicit tradeoffs vs the alternatives
  4. Recipe-emission API proposal — what fields recipe authors use to declare body queries; how the engine emits them
  5. Junction-note-subject-string resolution decision — string LIKE vs metadataCache vs projector pre-resolution; schema implications
  6. Performance worked examples — wall-clock estimates for the 3-5 hardest query types at OLIR-scale (~1000 mappings, ~1000 junctions, ~10K concepts)
  7. v0.1.6 milestone scope refinement — what’s in / out / deferred to v0.2 with that recommendation in hand
  8. Open questions for v0.2+ — concrete deferral list (e.g., “cross-vault queries deferred per §7”; “DataviewJS migration when Datacore stabilizes”)

Why this needs to be a research challenge (not just a planning session)

Section titled “Why this needs to be a research challenge (not just a planning session)”

Three reasons:

  1. The decision is genuinely architecturally consequential — gets v0.1.6 + v0.1.7 + future query work right or wrong for years
  2. It synthesizes 5+ prior decisions — Ch 07 (junction notes); Ch 23 (Bases not Dataview); Ch 18 (closure cache); v0.1.5 Phase 3 (SQL helpers); junction note schema
  3. The right answer is non-obvious — Patterns A/B/C/D have real tradeoffs; “obvious” choices (e.g., “just use Bases everywhere”) fail at the first complex query

The deliverable should be adversarial — argue against each chosen approach before recommending it. Test each pattern against a hostile real-world query: 5000 controls × 8 frameworks × 30K junction notes; user wants “stale-evidence coverage matrix grouped by framework, sorted by oldest review_date.” Which pattern actually serves them?

Write the deliverable to docs/src/content/docs/agent-context/zz-research/YYYY-MM-DD-challenge-27-bases-query-layer-architecture.md (plain .md, frontmatter only for sidebar — per the synthesis-log skill convention).

After deliverable lands: write a synthesis log resolving the verdicts adopted/rejected and update the v0.1.6 milestone page scope. Use the synthesis-log skill.