System architecture — components, tiers, and data flow
This page is the canonical entry point for understanding how Crosswalker’s pieces fit together. It consolidates information that was previously scattered across the ETL concept, the embedded substrates concept, the v0.1 schema spec, and per-milestone pages.
If you’re new here, read this first.
Crosswalker has three storage tiers and six logical layers that move data between them.
Three tiers, simply:
- Tier 0 = your source data (whatever shape it arrives in)
- Tier 1 = the canonical Markdown vault (what Obsidian sees, what git tracks, what survives plugin uninstall) — the source of truth
- Tier 2 = a deletable SQLite cache derived from Tier 1 — convenience for fast queries
Six logical layers, simply:
| # | Layer | What it does | Where it operates |
|---|---|---|---|
| 1 | Import | Read source → write canonical Markdown | T0 → T1 |
| 2 | Storage | Hold canonical state on disk | T1 (also T2 once projected) |
| 3 | Projection | Read T1 → write T2 (the projector) | T1 → T2 |
| 4 | Query | Answer user questions over T1 + T2 | T1, T2 → user |
| 5 | Export | Read T1 (or T2) → write external formats | T1, T2 → external |
| 6 | Audit | Track who-changed-what + integrity | git, signed manifests |
The rest of this page expands each layer with diagrams, components, and code references.
The simplified picture
Three rules that make the whole architecture tractable:
- Tier 1 is canonical. Every other tier (T0, T2, exports) is either upstream of T1 or derived from T1. Nothing important lives only in T2.
- Tier 2 is deletable. If .crosswalker.sqlite is missing, corrupted, or stale, the projector rebuilds it from T1 on next vault load. This is what makes Tier 2 risk-free to bundle (Ch 24 §2).
- The contract is the schema, not the engine. Anyone who can write valid Tier 1 frontmatter is a first-class producer (schema-as-primitive commitment).
These three rules together mean: you can lose, replace, or reimplement everything except the Tier 1 schema and the canonical Markdown vault. That property is what makes the architecture portable, modular, and resilient.
The three storage tiers
| Tier | What it is | Lifetime | Authoritative? | Spec |
|---|---|---|---|---|
| Tier 0 | Source data in whatever shape it ships (CSV, JSON, JSONL, XLSX, OSCAL, etc.) | Whenever the user has it | No — it’s the input | n/a (any structured format) |
| Tier 1 | Canonical Markdown vault — .md files with YAML frontmatter | Permanent (git-tracked) | Yes — the source of truth | spec/tier1.schema.json |
| Tier 2 | .crosswalker.sqlite sidecar — a SQL projection of Tier 1 | Recreatable on demand | No — derived from T1 | v0.1 schema spec §7 |
What “canonical” means: every field in Tier 2 has to be derivable from Tier 1 frontmatter. Every queryable Tier 1 frontmatter field has to be a flat scalar or wikilink (no nested objects, no inline expressions) so it’s Bases-queryable directly without going through T2.
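The flatness rule can be checked mechanically. The following is a minimal sketch, not the project's validator (which this page says is AJV against spec/tier1.schema.json). The helper names are hypothetical, and treating a list of flat scalars as acceptable is an assumption made for illustration.

```typescript
// Hypothetical helper for illustration; not part of the Crosswalker codebase.
type FlatScalar = string | number | boolean | null;

function isFlatScalar(value: unknown): value is FlatScalar {
  return (
    value === null ||
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean"
  );
}

// Returns the names of frontmatter fields that are not flat.
// Assumption: a list of flat scalars (e.g. several wikilinks) still counts as flat.
// A wikilink is just a string such as "[[AC-2]]", so it passes as a scalar.
function findNonFlatFields(frontmatter: Record<string, unknown>): string[] {
  return Object.keys(frontmatter).filter((key) => {
    const value = frontmatter[key];
    if (isFlatScalar(value)) return false;
    if (Array.isArray(value)) return !value.every(isFlatScalar);
    return true; // nested object or any other non-scalar shape
  });
}

const offenders = findNonFlatFields({
  kind: "concept",
  parent: "[[AC Access Control]]",
  related: ["[[AC-3]]", "[[IA-2]]"],
  owner: { team: "grc" }, // nested object: not Bases-queryable
});
console.log(offenders); // ["owner"]
```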
What’s NOT a tier:
- ParsedData (the bundled engine’s in-memory iteration shape) is not a tier — it’s an implementation detail of the import layer (see ETL concept “What ParsedData is”)
- The recipe (spec/recipe.schema.json) is not a tier — it’s a transformation declaration consumed by the import layer
- The closure cache (rows in the closure_cache table) is not a separate tier — it’s a materialized view inside Tier 2
Layer 1 — Import (T0 → T1)
One-liner: take a source ontology + a recipe, produce Tier 1 Markdown files.
Components in this layer
| Component | Code | What it does |
|---|---|---|
| Source parsers | src/import/parsers/csv-parser.ts (+ JSONL + JSON-with-iterator-path planned for v0.2) | T0 file → ParsedData (eager array OR AsyncIterable<Row>) |
| Streaming foundation | src/import/parsers/csv-parser.ts:parseCSVFileStream | PapaParse step callback → AsyncIterable with backpressure (v0.1.4.5) |
| render() | src/render/index.ts | Pure function (Recipe, ConceptIdentity) → Address (v0.1.2) |
| Template engine | src/render/template.ts | R2RML-style {var\|filter} interpolation; closed 7-filter set; JSONata sub-language reserved for v0.3+ |
| Mechanisms | src/render/mechanisms/{folder,file,heading,tag,wikilink}.ts | The 5 closed hierarchy primitives — folder/file/heading wired in v0.1; tag/wikilink reserved for v0.2 |
| Kind dispatch | src/render/index.ts (Tier1Kind) | kind: concept \| junction-note \| crosswalk-edge discriminator (v0.1.4) |
| Generation engine | src/generation/generation-engine.ts:generateNotes / generateFromRecipe | Per-row loop calling render() + provenance + validation + merge → vault.create/modify |
| Frontmatter merge | src/generation/frontmatter-merge.ts | managed (recipe-owned, overwritten) vs user_preserve (recipe-untouched) per Ch 22 §8.4 |
| Provenance writer | src/generation/provenance.ts | _crosswalker block per Tier 1 schema $defs/provenance_block |
| Validator (pre-write gate) | src/validation/validator.ts:validateTier1Frontmatter | AJV against spec/tier1.schema.json. Strict mode rejects bad rows pre-write — STRM predicate enforcement happens here |
| Legacy-recipe shim | src/generation/legacy-recipe-shim.ts | Translates v0.1.0 column-role configs → Ch 22 layout Recipe (Phase-0 compat per Ch 22 §10.7) |
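
The managed vs user_preserve split in the frontmatter merge row can be pictured as follows. This is a sketch under stated assumptions, not the code in src/generation/frontmatter-merge.ts: the function name, the managedKeys parameter, and the choice to drop a managed key the recipe no longer emits are all hypothetical.

```typescript
type Frontmatter = Record<string, unknown>;

// Hypothetical merge helper. `managedKeys` lists every key the recipe owns.
function mergeFrontmatter(
  existing: Frontmatter,
  incoming: Frontmatter,
  managedKeys: string[]
): Frontmatter {
  const merged: Frontmatter = {};
  // user_preserve: copy every existing key the recipe does not own.
  for (const key of Object.keys(existing)) {
    if (managedKeys.indexOf(key) === -1) merged[key] = existing[key];
  }
  // managed: take the recipe's fresh value. Assumption: a managed key that
  // the recipe no longer emits is dropped rather than kept.
  for (const key of managedKeys) {
    if (key in incoming) merged[key] = incoming[key];
  }
  return merged;
}

const merged = mergeFrontmatter(
  { title: "Old title", status: "draft", reviewer: "alice" }, // on disk
  { title: "Account Management" },                            // fresh from the recipe
  ["title", "status"]                                         // recipe-owned keys
);
console.log(merged); // { reviewer: "alice", title: "Account Management" }
```

Re-running an import with this shape of merge leaves user-added keys such as reviewer untouched while recipe-owned keys track the source.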
Two-mode architecture
The import layer supports two architectural modes (decided 2026-05-05):
- Mode 1 — bundled projector: external producer (or wizard) hands the bundled engine structured rows + recipe → engine emits Tier 1
- Mode 2 — direct emission: external producer (AI agent, MCP server, marketplace bundle) writes Tier 1 Markdown directly, bypassing the bundled engine
Both are first-class. ChunkyCSV / JSONaut / dbt are natural Mode 1 feeders. AI agents are natural Mode 2 emitters. See ETL concept “two-mode architecture” for the full picture.
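For Mode 2, a producer only has to write a Markdown file whose frontmatter satisfies the Tier 1 schema. The sketch below shows one possible emitter for flat frontmatter. It is illustrative only: the function name and the id and title fields are hypothetical (only kind and its concept value come from this page), and the real required fields are defined by spec/tier1.schema.json.

```typescript
type Scalar = string | number | boolean;

// Hypothetical Mode 2 emitter: serializes flat frontmatter plus a body.
// Assumption: keys are plain identifiers that need no YAML quoting.
function emitTier1Note(frontmatter: Record<string, Scalar>, body: string): string {
  const lines = Object.keys(frontmatter)
    .sort() // stable key order keeps output deterministic and diff-friendly
    .map((key) => {
      // JSON.stringify yields double-quoted strings, which are also valid
      // YAML scalars. Quoting matters for wikilinks: a bare [[X]] would be
      // read by a YAML parser as a nested list.
      return key + ": " + JSON.stringify(frontmatter[key]);
    });
  return "---\n" + lines.join("\n") + "\n---\n" + body + "\n";
}

const note = emitTier1Note(
  { kind: "concept", id: "AC-2", title: "Account Management" },
  "Body text written by the external producer."
);
console.log(note);
```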
Milestones that ship this layer
v0.1.1 (validation), v0.1.2 (render), v0.1.3 (engine), v0.1.4 (kind dispatch + STRM), v0.1.4.5 (streaming foundation).
Layer 2 — Storage (Tier 1 = canonical)
One-liner: hold the truth as Markdown files on disk.
Properties of Tier 1 storage
- File-based: every concept, evidence link, and crosswalk is its own .md file
- YAML frontmatter is the data: per Ch 22 §1.3, frontmatter is flat-scalar / wikilink only, not nested objects
- git-tracked: every change is a commit (v0.1.8 audit trail)
- Obsidian-native: files appear in the file explorer, search, graph view, Bases queries, etc., natively
- Survives plugin uninstall: even with no plugins enabled, the vault is still readable Markdown
- Three Tier 1 shapes discriminated by the kind field — concept / junction-note / crosswalk-edge (Tier 1 schema)
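The kind discriminator maps naturally onto a TypeScript discriminated union. The sketch below is an assumption-laden illustration rather than the actual Tier1Kind definition in src/render/index.ts: every field other than kind is hypothetical, and the kind-to-table mapping is inferred from the Tier 2 table names listed in the component matrix on this page.

```typescript
// Hypothetical note shapes; only the `kind` values come from the Tier 1 schema.
interface ConceptNote { kind: "concept"; id: string; }
interface JunctionNote { kind: "junction-note"; concept: string; evidence: string; }
interface CrosswalkEdge { kind: "crosswalk-edge"; source: string; target: string; }

type Tier1Note = ConceptNote | JunctionNote | CrosswalkEdge;
type Tier1Kind = Tier1Note["kind"];

// Dispatch by kind. The compiler narrows `note` in each case, and the
// `never` check fails to compile if a new kind is added but not handled.
function targetTable(note: Tier1Note): string {
  switch (note.kind) {
    case "concept":
      return "concepts";
    case "junction-note":
      return "junction_notes";
    case "crosswalk-edge":
      return "mappings"; // inferred mapping, not confirmed by the spec
    default: {
      const unreachable: never = note;
      throw new Error("unknown kind: " + String(unreachable));
    }
  }
}

const kinds: Tier1Kind[] = ["concept", "junction-note", "crosswalk-edge"];
console.log(kinds.length);
console.log(targetTable({ kind: "crosswalk-edge", source: "AC-2", target: "A.9.2.1" }));
```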
Why files instead of a database
See concepts/file-based-graph-database for the full philosophical case. Short version: files are diffable, mergeable, exportable, archivable, attestable, and survive every tool/plugin in the stack getting replaced.
Layer 3 — Projection (T1 → T2)
One-liner: read Tier 1 Markdown frontmatter, write rows into the SQLite sidecar.
Components in this layer
| Component | Code | What it does |
|---|---|---|
| Sidecar lifecycle | src/tier2/sidecar.ts:openSidecar/clearSidecar | Init sqlite-wasm; OPFS sahpool VFS; open .crosswalker.sqlite |
| Schema migrations | src/tier2/migrations.ts | drop-and-recreate on version mismatch (correct because T2 is purely a T1 projection) |
| Schema DDL | src/tier2/schema.sql | The full Tier 2 SQL schema per spec §7 |
| Projector | src/tier2/projector.ts (Phase 2 of v0.1.5) | Walk vault .md files via streaming foundation; dispatch by kind; idempotent upsert |
| Closure cache | src/tier2/queries.ts (Phase 3 of v0.1.5) | Recursive-CTE materialization on first transitive query; mtime-based invalidation |
Recovery property (load-bearing)
If .crosswalker.sqlite is missing, corrupted, or stale, the projector reprojects from canonical Tier 1 on next vault load. This is what makes Tier 2 risk-free to bundle. It follows from Ch 24 §2, the modularity rule that lets us swap substrates (libSQL/Limbo/server) without data migration.
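The recovery decision can be modeled in memory without any SQLite dependency. The sketch below is a simplified stand-in, not the code in src/tier2/sidecar.ts or src/tier2/projector.ts: the Sidecar and VaultFile shapes, SCHEMA_VERSION, and the mtime-plus-row-count staleness test are all assumptions chosen to illustrate "rebuild from Tier 1 whenever Tier 2 cannot be trusted".

```typescript
const SCHEMA_VERSION = 1; // hypothetical version number

interface VaultFile { path: string; mtime: number; frontmatter: Record<string, string>; }
interface Sidecar {
  schemaVersion: number;
  projectedAt: number;
  rows: Record<string, Record<string, string>>;
}

// Full reprojection: Tier 2 is rebuilt purely from Tier 1 content.
function project(vault: VaultFile[], now: number): Sidecar {
  const rows: Record<string, Record<string, string>> = {};
  for (const file of vault) rows[file.path] = file.frontmatter;
  return { schemaVersion: SCHEMA_VERSION, projectedAt: now, rows };
}

// A sidecar is usable only if it exists, matches the schema version,
// and no vault file changed after the last projection.
function isUsable(sidecar: Sidecar | null, vault: VaultFile[]): sidecar is Sidecar {
  if (sidecar === null) return false; // missing or unreadable
  if (sidecar.schemaVersion !== SCHEMA_VERSION) return false; // drop-and-recreate
  if (Object.keys(sidecar.rows).length !== vault.length) return false; // stale
  const projectedAt = sidecar.projectedAt;
  return vault.every((file) => file.mtime <= projectedAt);
}

function loadSidecar(sidecar: Sidecar | null, vault: VaultFile[], now: number): Sidecar {
  return isUsable(sidecar, vault) ? sidecar : project(vault, now);
}

const vault: VaultFile[] = [{ path: "AC-2.md", mtime: 10, frontmatter: { kind: "concept" } }];
const rebuilt = loadSidecar(null, vault, 20);   // missing: projected from T1
const reused = loadSidecar(rebuilt, vault, 30); // fresh: returned unchanged
```

Because every branch that distrusts the sidecar ends in the same full reprojection, there is no migration path to maintain; the cost is a rebuild on load.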
Substrate
@sqlite.org/sqlite-wasm (canonical, foundation-governed). sqlite-vec deferred — see the WASM-A path notes in v0.1.5 and the calendar-anchored 2026-11-06 revisit checkpoint in Ch 24 §5 Q4.
Milestones that ship this layer
v0.1.5 (the entire projection layer).
Layer 4 — Query (T1 + T2 → user)
One-liner: let users ask questions over their vault — flat lookups via Bases over T1, transitive/joined queries via SQL over T2.
Components in this layer
| Component | Code | What it does |
|---|---|---|
| Bases (upstream Obsidian) | n/a (built-in to Obsidian) | Flat YAML-frontmatter queries directly over Tier 1 — no plugin needed |
| SQL query helpers | src/tier2/queries.ts (v0.1.5 Phase 3) | Typed getControlsByOntology, findCoverageGaps, crosswalkBetween, closureFromConcept |
| Bases query templates emitted in concept-note bodies | src/generation/templates/ (v0.1.6) | The recipe author writes a body template; the engine emits a working Bases query in each concept note |
When to use which
| Question shape | Tier | Why |
|---|---|---|
| Flat key/value over a single note kind | Tier 1 via Bases | Native; no plugin; survives Tier 2 corruption |
| Tag-prefix or single-frontmatter-field filter | Tier 1 via Bases | Same |
| Multi-ontology joins | Tier 2 via SQL | Bases doesn’t do joins; SQL does |
| Transitive closure (chain of crosswalks) | Tier 2 via SQL recursive CTE | Closure cache materialized once; subsequent queries hit the cache |
| Coverage matrix (controls × evidence × frameworks) | Tier 2 via SQL | Multi-table join; awkward in Bases |
| Vector / semantic similarity | Tier 2 via sqlite-vec | Deferred to a future milestone (see Ch 24 §5 Q4) |
The choice is the recipe author’s (or query author’s). Tier 1 is the default for simple queries because it survives Tier 2 corruption. Tier 2 is for when SQL’s expressivity is needed.
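To see why transitive closure needs more than a flat filter, the following sketch computes it in memory with a breadth-first walk. It is an analogue of what a recursive CTE over the mappings table would return, not the closureFromConcept helper itself. The Mapping shape, the function name, and the choice to follow edges in one direction only are assumptions.

```typescript
// Hypothetical row shape for a crosswalk edge.
interface Mapping { source: string; target: string; }

// Returns every concept reachable from `start` by following one or more
// mappings, in breadth-first discovery order. Cycles are handled by `seen`.
function reachableConcepts(start: string, mappings: Mapping[]): string[] {
  const seen: Record<string, boolean> = {};
  seen[start] = true;
  const queue: string[] = [start];
  const reached: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const m of mappings) {
      if (m.source === current && seen[m.target] !== true) {
        seen[m.target] = true;
        reached.push(m.target);
        queue.push(m.target);
      }
    }
  }
  return reached;
}

const chain: Mapping[] = [
  { source: "A", target: "B" },
  { source: "B", target: "C" },
  { source: "C", target: "A" }, // cycle back to the start
  { source: "D", target: "E" }, // unrelated edge
];
console.log(reachableConcepts("A", chain)); // ["B", "C"]
```

A single-hop filter would stop at B; reaching C requires iterating until no new concepts appear, which is the work the closure cache stores so it is not repeated per query.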
Three-layer query engine architecture
The query layer separates concerns across three orthogonal layers (settled 2026-05-08; see synthesis log):
| Layer | Vocabulary | Concept page |
|---|---|---|
| A — Query primitives | filter / project / traversal / closure / anti-join / pivot / aggregate | query-primitives |
| B — View shapes | table / list / pivot / graph / hierarchy / timeline | view-shapes |
| C — Recipes / reports | “Coverage Matrix”, “Crosswalk Density”, “Ontology Overlap”, etc. | recipe registry |
7 primitives × 4 mechanisms
| Primitive | Bases-native | Tier 2 SQL helpers | Codeblock processor (v0.1.7) | Materialized snapshot (v0.1.8) |
|---|---|---|---|---|
| filter | ✅ default | ✅ WHERE | ✅ | ✅ pre-resolved |
| project | ✅ default | ✅ projection | ✅ | ✅ |
| traversal | partial (single-hop via file.hasLink) | ✅ via mappings table + crosswalkBetween | ✅ | ✅ |
| closure | ❌ no recursion | ✅ recursive CTE via closureFromConcept (already shipped v0.1.5 P3) | ✅ | ✅ |
| anti-join | ❌ | ✅ via EXCEPT / LEFT JOIN ... NULL | ✅ | ✅ |
| pivot | ❌ summaries are 1-D | composes filter + traversal + aggregate | ✅ | ✅ pre-rendered |
| aggregate | ✅ summaries (1-D) | ✅ multi-D aggregates | ✅ | ✅ |
Crosswalker recipes (Layer C) name a primitive composition + a view shape. The engine picks the right mechanism for each primitive based on what’s expressible at that layer. v0.1.6 ships the crosswalkerPivot custom Bases view (registerBasesView) for the pivot shape; v0.1.7 ships the crosswalker-query codeblock processor; v0.1.8 ships materialized snapshot output.
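As an illustration of one primitive from the table, an anti-join answers "which rows on the left have no match on the right", for example concepts with no evidence attached. The sketch below is an in-memory analogue of the LEFT JOIN ... NULL pattern, not the findCoverageGaps helper; the EvidenceLink shape and the function name are hypothetical.

```typescript
// Hypothetical shape for a junction between a concept and a piece of evidence.
interface EvidenceLink { concept: string; evidence: string; }

// Anti-join: keep the concept ids that appear in no evidence link.
function uncoveredConcepts(conceptIds: string[], links: EvidenceLink[]): string[] {
  const covered: Record<string, boolean> = {};
  for (const link of links) covered[link.concept] = true;
  return conceptIds.filter((id) => covered[id] !== true);
}

const gaps = uncoveredConcepts(
  ["AC-1", "AC-2", "AC-3"],
  [
    { concept: "AC-1", evidence: "policy.pdf" },
    { concept: "AC-3", evidence: "audit-log.csv" },
    { concept: "AC-3", evidence: "screenshot.png" },
  ]
);
console.log(gaps); // ["AC-2"]
```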
Milestones that ship this layer
v0.1.6 (Bases templates + crosswalkerPivot custom view + recipe loader); v0.1.7 (codeblock processor); v0.1.8 (materialized snapshot output).
Layer 5 — Export (T1 → external; or T2 → external)
One-liner: serialize Tier 1 (or Tier 2) data into external interop formats.
Components in this layer
| Component | Code | What it does |
|---|---|---|
| STRM TSV exporter | src/export/strm-tsv.ts (v0.1.7) | Reads T2 mappings; emits NIST IR 8278A r1 OLIR template format |
| OSCAL JSON exporter | src/export/oscal-profile.ts (v0.1.7) | Reads T1 junction notes (or T2 junction_notes); emits OSCAL Control Mapping profile |
| SSSOM TSV exporter | src/export/sssom-tsv.ts (v0.1.7) | Reads T2 mappings; emits SSSOM-shaped TSV; opt-in toggle |
Round-trip determinism
The export layer commits to byte-identical round-trip: import a STRM TSV → export the resulting Tier 1 vault → STRM TSV bytes match. This is enforced by tests in v0.1.7. Importing your own export and re-exporting must be a fixed point.
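The fixed-point property can be expressed as a small test. The sketch below uses a toy three-column TSV, not the real STRM column set, and the importTsv and exportTsv names are hypothetical. It assumes the exporter makes output deterministic by sorting rows; how the actual exporter orders rows is not stated on this page.

```typescript
// Toy mapping row; the real STRM template has more columns.
interface MappingRow { source: string; relation: string; target: string; }

const HEADER = "source\trelation\ttarget"; // hypothetical column names

function importTsv(tsv: string): MappingRow[] {
  const lines = tsv.split("\n").filter((line) => line.length > 0);
  return lines.slice(1).map((line) => {
    const [source = "", relation = "", target = ""] = line.split("\t");
    return { source, relation, target };
  });
}

// Deterministic export: rows are sorted by their serialized form, so the
// same set of rows always yields the same bytes.
function exportTsv(rows: MappingRow[]): string {
  const lines = rows.map((r) => [r.source, r.relation, r.target].join("\t"));
  lines.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  return [HEADER].concat(lines).join("\n") + "\n";
}

const input = HEADER + "\nB\tequal\tY\nA\tsubset-of\tX\n"; // rows not in canonical order
const once = exportTsv(importTsv(input));
const twice = exportTsv(importTsv(once));
console.log(once === twice); // true: re-importing an export is a fixed point
```

In this sketch the first export may reorder rows from a third-party file, but anything the exporter produced round-trips to identical bytes, which is the property the fixed-point sentence above describes.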
Milestones that ship this layer
v0.1.7 (the entire export layer).
Layer 6 — Audit (cross-cutting)
One-liner: track who-changed-what, when, with what integrity guarantees.
Components in this layer
| Component | Code | What it does |
|---|---|---|
| git commit hook | src/audit/git-commit.ts (v0.1.8) | One commit per generation pass; deterministic message |
| Ed25519 sign helper | src/audit/sign.ts (v0.1.8) | @noble/curves; signs release manifests |
| Release manifest writer | src/audit/manifest.ts (v0.1.8) | _crosswalker_manifest.json at vault root |
| FRE 902(13) PDF template | src/audit/fre-902.ts (v0.1.8) | pdf-lib-rendered certification template |
Milestones that ship this layer
v0.1.8 (Tier 1 audit floor: git + Ed25519 + FRE 902(13)).
Component-to-tier matrix
A single table of every load-bearing logical component, where it lives, and what it reads/writes:
| Component | Layer | Reads | Writes | Code |
|---|---|---|---|---|
| Source parser | Import | T0 file | ParsedData (in-memory iter) | src/import/parsers/*.ts |
| render() | Import | Recipe + ConceptIdentity | Address (in-memory) | src/render/index.ts |
| Generation engine | Import | ParsedData + Recipe | Tier 1 .md files | src/generation/generation-engine.ts |
| Validator | Import | Frontmatter object | (returns valid/invalid) | src/validation/validator.ts |
| Frontmatter merge | Import | Existing frontmatter + new managed | Merged frontmatter | src/generation/frontmatter-merge.ts |
| Provenance writer | Import | source ref + recipe id | _crosswalker block | src/generation/provenance.ts |
| Tier 1 vault | Storage | (passive) | (held on disk) | n/a — files |
| Sidecar lifecycle | Projection | (init only) | Opens .crosswalker.sqlite; applies migrations | src/tier2/sidecar.ts |
| Projector | Projection | Tier 1 .md frontmatter | ontologies, concepts, mappings, junction_notes rows | src/tier2/projector.ts (v0.1.5 Phase 2) |
| Closure cache populator | Projection | mappings table | closure_cache table | src/tier2/queries.ts:closureFromConcept |
| Bases queries | Query | Tier 1 frontmatter | (returns user data) | n/a (upstream Obsidian) |
| SQL query helpers | Query | Tier 2 tables | (returns user data) | src/tier2/queries.ts |
| Body-query templates | Query | Recipe template | Tier 1 note body (Bases query string) | src/generation/templates/ (v0.1.6) |
| STRM TSV exporter | Export | T2 mappings (or T1 crosswalk-edge notes) | .tsv file | src/export/strm-tsv.ts (v0.1.7) |
| OSCAL JSON exporter | Export | T1 junction notes (or T2) | .json file | src/export/oscal-profile.ts (v0.1.7) |
| SSSOM TSV exporter | Export | T2 mappings | .tsv file | src/export/sssom-tsv.ts (v0.1.7) |
| git commit hook | Audit | (write event) | git commit | src/audit/git-commit.ts (v0.1.8) |
| Ed25519 signer | Audit | Release bundle | Signature on _crosswalker_manifest.json | src/audit/sign.ts (v0.1.8) |
| FRE 902(13) PDF | Audit | Vault hash + signed manifest | .pdf file | src/audit/fre-902.ts (v0.1.8) |
Read/write data flow
Which components read/write which storage tier:
| Tier | Written by | Read by |
|---|---|---|
| T0 (source) | the user (or external producer) | Source parser |
| T1 (canonical Markdown) | Generation engine; external Mode 2 producers | Bases queries; projector; exporters; git audit |
| T2 (SQLite sidecar) | Sidecar lifecycle (open + migrations); projector; closure cache populator | SQL query helpers; exporters (when joining over T2 tables) |
| External (TSV / JSON / PDF) | Exporters; FRE 902(13) PDF generator | (downstream tools — Excel, OSCAL validators, courts) |
Notice that everything ultimately traces back to Tier 1: T0 produces T1, T2 derives from T1, exports come from T1 (or T2 which derives from T1), audit hashes T1. No data lives only in T2.
Where to look in the codebase
Where to look in the docs
| If you want… | Read this |
|---|---|
| Quickest way to grasp the system | This page (TL;DR section) |
| The truth about what Tier 1 must look like | v0.1 schema spec |
| The recipe shape (transformation declaration) | spec/recipe.schema.json + Ch 22 synthesis |
| Why the architecture is shaped this way (the philosophy) | What makes Crosswalker unique + Vision |
| Why Tier 1 is files (not a database) | File-based graph database concept |
| Why Tier 2 is sqlite-wasm (and not Turso/libSQL/Limbo) | Ch 24 substrate synthesis |
| The closed 5-mechanism recipe grammar | Hierarchy primitives concept + Ch 22 synthesis |
| The two-mode architecture (bundled vs direct) | ETL two-mode section + decision log |
| Definitions of every term used here | Terminology |
| The full implementation roadmap | Milestones hub |
| The 6 architectural commitments locked 2026-05-04 | v0.1 import-engine design log |
| Why no JSONaut/ChunkyCSV port + GUI scope line | Transform-engine-depth log |
How v0.1 implementation maps to this architecture
Phase 2 of v0.1.5 (the projector) is the next unit of work — it’s the bridge between “Tier 1 exists” and “fast queries are possible.”
Related
Concept pages (this page links to and from):
- ETL and import — the import layer details + two-mode architecture
- Hierarchy primitives — the 5 closed mechanisms render() dispatches over
- Embedded vs server substrates — why Tier 2 is embedded
- File-based graph database — why Tier 1 is files
- Terminology — definitions for every term
- What makes Crosswalker unique — architectural differentiators
- Ontology evolution — how T1 changes are tracked
- Metadata ecosystem — Bases query layer context
- Consistency models — T1 ↔ T2 consistency semantics
Agent context:
- v0.1 schema spec — Tier 1 + recipe + Tier 2 SQL DDL
- Vision — why this architecture exists
- Tradeoffs
Foundation decisions (synthesis logs — six load-bearing commitments settled 2026-05-04):
- v0.1 import-engine design log — the canonical commitments page
- Ch 22 — target-structure expressivity — recipe grammar + render()
- Ch 23 — bundle/engine/language — TS in-plugin engine + JSONata
- Ch 24 — Tier 2 substrate — sqlite-wasm + sqlite-vec + 5 migration triggers + 2026-11-06 vec revisit
Recent design decisions (2026-05-05):
- Two-mode architecture — bundled projector + direct emission
- Transform-engine depth + input formats — JSONata sub-language; v0.2 JSONL
- ETL pipeline clarification — what ParsedData is/isn’t
Implementation milestones:
- Milestone hub
- v0.1.1 (validation) · v0.1.2 (render) · v0.1.3 (engine) · v0.1.4 (kind dispatch) · v0.1.4.5 (streaming)
- v0.1.5 (projection) · v0.1.6 (query) · v0.1.7 (export) · v0.1.8 (audit) · v0.1-RC (ship)
Spec files:
- spec/tier1.schema.json — what Tier 1 frontmatter must look like
- spec/recipe.schema.json — what a recipe must look like
Registry references (frameworks the architecture is generic over):