v0.1 schema spec — the four interconnected schemas
§1 Why this page exists
The Foundation phase intro states: “The three interconnected schemas (_crosswalker metadata, ImportRecipe, StewardshipProfile) must be designed together — getting any wrong is expensive to fix later.” The v0.1 stack-pivot makes this concrete by committing to a buildable Tier 1 + Tier 2 sidecar architecture.
This page is the unified spec the v0.1 codebase will be built against. It pulls the four interconnected schemas into one place:
- _crosswalker metadata — the per-note frontmatter Crosswalker writes onto generated control / mapping / evidence files. Tier 1 canonical. (Extends constraint-enforcement § Metadata tiers.)
- ImportRecipe — the import recipe (JSON/YAML on disk; user-authored or community-shared) that drives generation. Supersedes the older “FrameworkConfig” naming from config-schema-design; see §4 naming history for context.
- Junction note 13-field schema — the per-edge markdown file that holds evidence-link metadata. Tier 1 canonical. (Ch 07 resolution.)
- Tier 2 sidecar SQL schema — the projection of Tier 1 frontmatter into a queryable SQLite store. Tier 2, deletable, recoverable from Tier 1.
The earlier-written config-schema-design and constraint-enforcement pages stay as background design rationale; this page is the consolidated v0.1 build target.
Where this spec is implemented (forward-links to milestones)
Each schema section in this spec is implemented by specific v0.1 implementation milestones:
| Spec section | Implementing milestone(s) | Status |
|---|---|---|
| §3 _crosswalker metadata block | v0.1.3 — Generation engine integration (provenance writer); v0.1.1 — Type system + validation (AJV) | ✅ Done |
| §4 ImportRecipe + render() | v0.1.2 — render() v1 (pure function); v0.1.4 — Junction notes + crosswalk edges (kind dispatch); v0.1.4.5 — Streaming refactor (AsyncIterable rows) | ✅ Done |
| §5 Junction note 13-field schema | v0.1.4 (kind: junction-note dispatch); Ch 07 resolution | ✅ Done |
| §7 Tier 2 sidecar SQL schema | v0.1.5 — Tier 2 sqlite-wasm sidecar (Phase 1 substrate; Phase 2 projector; Phase 3 query API + closure cache) | 🚧 Phase 1+2+3 done |
| §7 Recursive-CTE closure cache | v0.1.5 Phase 3 — see Ch 18 deliverable for the algorithmic patterns and engineering scale model | 🚧 In progress |
| §8 Cross-schema invariants | All v0.1 milestones | Enforced via test harness |
Higher-level system view: see the system architecture page for the 6-layer view (import / storage / projection / query / export / audit) showing how each schema fits the broader pipeline.
Naming history: FrameworkConfig → ImportRecipe
The type now called ImportRecipe was originally proposed as FrameworkConfig in the config-schema-design page (2026-04). On 2026-05-03 the name was changed for two reasons:
- General-ontology positioning. “Framework” baked the GRC use case into a type that’s actually general — any structured ontology (compliance frameworks, biomedical taxonomies, library classifications, custom domain hierarchies) can be imported via this recipe. “Ontology” is the broader term that already pervades the project’s vocabulary (ontology lifecycle, ontology evolution, ontology diff primitives).
- Recipe vs config. “Config” sounds like settings; this artifact is actually a reusable transformation recipe applied to a source. A recipe is shareable, version-controlled, and replayable — semantics that “config” doesn’t carry.
Companion field renames in _crosswalker metadata and junction-note schemas:
- framework_id → ontology_id
- framework_version → ontology_version
- framework (junction-note field) → ontology
- config_id (in _crosswalker) → recipe_id
Folder convention (Frameworks/...) is user-controlled via recipe.output.base_path — examples in this doc still use Frameworks/NIST-800-53-r5/ because NIST 800-53 is a framework and that’s the natural choice; non-GRC users would set base_path to Ontologies/, Standards/, Domain/X/, etc.
Historical decision logs (zz-log/) and research deliverables (zz-research/) are preserved verbatim with the original FrameworkConfig / framework_id naming — they are dated decision records, not living spec.
§2 The schemas at a glance
Cross-schema invariants (load-bearing):
- Every field in the Tier 2 SQL schema must be derivable from Tier 1 frontmatter.
- Every Tier 1 frontmatter field that needs to be queryable in Tier 2 must be a flat scalar or a wikilink (no nested objects, no inline expressions). Bases-queryable constraint.
- _crosswalker metadata is additive — it never overwrites user-authored frontmatter, only adds keys under _crosswalker:.
- Junction notes are generated, but users can edit them; v0.1 generation is non-destructive (read git history to confirm).
- ImportRecipe is user-authored (or imported from a community-shared recipe); the plugin does not auto-generate recipes.
§3 _crosswalker metadata schema (per-note frontmatter)
Every note Crosswalker generates carries a _crosswalker frontmatter block. It is additive only — no other top-level frontmatter keys are touched.
TypeScript shape
YAML on-disk shape (canonical)
Mandatory vs optional fields
| Field | Required for | Notes |
|---|---|---|
| schema_version | All _crosswalker-bearing notes | Enables migration on schema bumps |
| source_file | All | Provenance |
| source_hash | All | Re-import / staleness detection |
| import_date | All | Audit trail |
| recipe_id | All | Links to ImportRecipe |
| ontology_id + ontology_version | Control notes & ontology-bound mapping notes | Optional for evidence-only notes |
| control_id | Control notes only | The predicate_id namespace for STRM crosswalk fields |
| status | Optional | Defaults to active if absent |
| Other lifecycle fields | Optional | Written when state transitions occur |
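The field table above can be sketched as a type plus a required-field check. This is a reconstruction from the table only, not the canonical type: the NoteKind values, the helper names, and the choice to treat an empty string as missing are all assumptions made for illustration.

```typescript
// Hypothetical reconstruction from the field table; not the canonical type.
type NoteKind = "control" | "ontology-mapping" | "evidence";

interface CrosswalkerMetadata {
  schema_version: string; // e.g. "crosswalker-v1"
  source_file: string;
  source_hash: string;
  import_date: string;
  recipe_id: string;
  ontology_id?: string;
  ontology_version?: string;
  control_id?: string;
  status?: string; // treated as "active" when absent
}

type FieldName = keyof CrosswalkerMetadata;

// Fields the table marks as required for every _crosswalker-bearing note.
const ALWAYS_REQUIRED: FieldName[] = [
  "schema_version", "source_file", "source_hash", "import_date", "recipe_id",
];

// Returns the names of required fields that are absent for the given note kind.
function missingRequiredFields(
  meta: Partial<CrosswalkerMetadata>,
  kind: NoteKind,
): FieldName[] {
  const required: FieldName[] = [...ALWAYS_REQUIRED];
  if (kind !== "evidence") required.push("ontology_id", "ontology_version");
  if (kind === "control") required.push("control_id");
  return required.filter((field) => meta[field] === undefined || meta[field] === "");
}

// Applies the documented default: status is "active" if absent.
function effectiveStatus(meta: Partial<CrosswalkerMetadata>): string {
  return meta.status ?? "active";
}
```

A check of this kind would typically run before the generation engine writes a note, so that incomplete provenance is caught at import time rather than at projection time.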
Schema-version migration policy
Per the meta-schema lifecycle commitment (“Crosswalker eats own dog food”): every internal schema is versioned. When the schema bumps from crosswalker-v1 to crosswalker-v2, a migration script:
- Reads all _crosswalker: blocks
- Detects the old schema_version
- Applies a per-version migration function (additive transform: add new fields with defaults; rename fields with aliasing entry in previous_ids; never destructive)
- Writes the new _crosswalker: block back, preserving non-Crosswalker frontmatter unchanged
The migration is idempotent (re-running on already-migrated data is a no-op) and resumable (per-file state-tracked).
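A minimal sketch of the per-version migration loop, assuming frontmatter has already been parsed into a plain object. The MIGRATIONS registry, the function names, and the illustrative v1 to v2 step (adding an empty previous_ids default) are hypothetical; the real transform depends on what crosswalker-v2 actually changes.

```typescript
type Block = Record<string, unknown>;
type Migration = (block: Block) => Block;

// Hypothetical registry keyed by the version being migrated FROM.
const MIGRATIONS: Record<string, { to: string; apply: Migration }> = {
  "crosswalker-v1": {
    to: "crosswalker-v2",
    // Illustrative additive step: add a default, keep every existing key.
    apply: (block) => ({ previous_ids: [], ...block }),
  },
};

// Walks the block forward one version at a time until it reaches the target.
// A block already at the target is returned unchanged, which gives idempotency.
function migrateBlock(block: Block, target: string): Block {
  let current: Block = { ...block };
  while (current["schema_version"] !== target) {
    const from = String(current["schema_version"]);
    const step = MIGRATIONS[from];
    if (step === undefined) throw new Error(`no migration from ${from}`);
    current = { ...step.apply(current), schema_version: step.to };
  }
  return current;
}

// Rewrites only the _crosswalker key; all other frontmatter passes through.
function migrateFrontmatter(frontmatter: Block, target: string): Block {
  const block = frontmatter["_crosswalker"];
  if (typeof block !== "object" || block === null) return frontmatter;
  return { ...frontmatter, _crosswalker: migrateBlock(block as Block, target) };
}
```

Resumability (the per-file state tracking mentioned above) is not modeled here; it would sit in the caller that iterates over vault files.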
§4 ImportRecipe schema (import config)
An ImportRecipe is JSON/YAML on disk that drives the import wizard. One per ontology; can be saved, shared, version-controlled, and matched against incoming source files via fingerprinting.
TypeScript shape
Example: NIST 800-53 r5 ImportRecipe
Migration from v1 (current code) to v2
Current src/types/config.ts defines CrosswalkerConfig for column mapping in a single import session. v2 wraps it:
- v1 CrosswalkerConfig becomes v2’s columns + output + transforms (existing fields lifted)
- v2 adds: id, version, source_file_patterns, fingerprint_columns, crosswalks, stewardship_profile_id, schema_version
- The fingerprint-based config matching (current code uses content-derived fingerprint) is preserved; v2 adds explicit source_file_patterns for filename-based matching as a faster pre-check
Migration script:
- Read existing v1 saved configs
- Wrap each in a v2 envelope, generating an id from the name field
- Set schema_version: import-recipe-v1
- Empty crosswalks: [] and source_file_patterns: [] for now (user fills in)
- Write back
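The wrap step of the migration script might look like the sketch below. The v1 and v2 interfaces are deliberately reduced to the fields named in this section, with their inner shapes left as unknown, and the slug rule used to derive id from name is an assumption; the actual id scheme is not specified here.

```typescript
// Assumed v1 shape: only `name` plus the three lifted fields are modeled.
interface CrosswalkerConfigV1 {
  name: string;
  columns: unknown;
  output: unknown;
  transforms: unknown;
}

// Reduced v2 envelope covering only what the migration steps set.
interface ImportRecipeV2 {
  id: string;
  schema_version: "import-recipe-v1";
  name: string;
  columns: unknown;
  output: unknown;
  transforms: unknown;
  crosswalks: unknown[];
  source_file_patterns: string[];
}

// Hypothetical id rule: lowercase, non-alphanumerics collapsed to hyphens.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Lifts the v1 fields into a v2 envelope and fills the new fields with empties.
function wrapV1Config(v1: CrosswalkerConfigV1): ImportRecipeV2 {
  return {
    id: slugify(v1.name),
    schema_version: "import-recipe-v1",
    name: v1.name,
    columns: v1.columns,
    output: v1.output,
    transforms: v1.transforms,
    crosswalks: [],
    source_file_patterns: [],
  };
}
```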
§4.5 Recipe query: block (added v0.1.6 — SchemaVer 1.1.0; additive)
Per Ch 31 schema design + Ch 36 compositional language stack, v0.1.6 added an optional query: block to the recipe schema. Recipes can now declare WHAT to query (axes, edges, aggregation) without writing SQL — the engine compiles the declared block to SQL recursive CTEs against the Tier 2 sqlite-wasm cache.
Backward compatibility: existing recipes WITHOUT query: continue to validate. The bump is purely additive (SchemaVer ADDITION); SchemaVer URI did NOT bump (https://crosswalker.dev/spec/recipe.schema.json stays).
Eight Layer A query verbs (per Ch 29 adversarial validation): filter / traverse / bind / project / aggregate / anti-join / set-op / diff. Closure folded into parameterized traverse(depth=*, transitive=true). Pivot demoted to Layer B (presentation, not value-producing).
Six view shapes (per Ch 30 view shape taxonomy): table / list / pivot / graph / hierarchy / timeline. v0.1 first-class catalog: Pivot (custom Bases view crosswalkerPivot) + Table/List/Cards (Bases-native consumed) + Hierarchy (graduates v0.1.7-v0.1.8). Graph + Timeline schema-declared; renderers ship v0.2+.
Top-level query block shape:
Pivot primitives example (the v0.1.6 launch-market Coverage Matrix shape):
Two discriminator styles ship (per Ch 31a + Ch 31b):
| Style | Discriminator | Default? | Trade-off |
|---|---|---|---|
| A | oneOf + const | ✅ default | “Must match exactly one schema” errors |
| B | if/then/else | advanced | Focused per-shape errors; better IDE autocomplete |
Settings → “Recipe schema → Recipe query block schema style” picks the active validator. Both styles produce identical validity verdicts and differ only in error-message UX. Implementation detail: the validator compiles both styles at init; buildStyleBSchema() deep-clones the schema and patches query_block.allOf[0] to reference ShapeDispatchB (strips $id so AJV compiles it as an anonymous variant).
Allowed string-expression language: JSONata only (per Ch 36 compositional language stack). No inline SQL, SPARQL, or Crosswalker-invented DSL inside the query: block.
Reference recipes shipped to recipes/v0-1/:
| File | Shape | Notes |
|---|---|---|
| coverage-matrix.json | pivot | Launch-market NIST CSF × ISO 27001 |
| crosswalk-density.json | table | Aggregates per framework pair |
| orphan-controls.json | list | Demonstrates anti-join verb |
| hierarchy-view.json | hierarchy | Renderer ships v0.1.7-v0.1.8 |
| list-view.json | list | Minimal Bases-native list |
Implementing milestone: v0.1.6 — Bases query layer ships the schema (Phase 1 ✅) + the crosswalkerPivot view that consumes pivot-shaped queries (Phase 3) + recipe-picker UX that emits embedded base code blocks (Phase 4) + opt-in materialization (Phase 5).
§5 Junction note 13-field schema (per evidence-link edge)
Per Ch 07 resolution: evidence links are edge-as-note reified — one markdown file per evidence→control relationship.
TypeScript shape
YAML on-disk shape
Filename convention
Junctions/{ontology}/{control_id}--{evidence_slug}.md — e.g., Junctions/nist-800-53-r5/AC-2--MFA-Policy.md.
The composite filename (control + evidence) makes diff-on-rename easy and prevents collision when one piece of evidence implements multiple controls.
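The filename convention can be sketched as a pair of helpers. The function names are hypothetical, and the parser assumes that neither the ontology segment contains a slash nor the control ID contains a double hyphen, which the convention implies but does not state.

```typescript
interface JunctionPathParts {
  ontology: string;
  controlId: string;
  evidenceSlug: string;
}

// Builds Junctions/{ontology}/{control_id}--{evidence_slug}.md
function junctionPath(parts: JunctionPathParts): string {
  return `Junctions/${parts.ontology}/${parts.controlId}--${parts.evidenceSlug}.md`;
}

// Inverse of junctionPath; returns null when the path does not follow the convention.
// The control ID is matched lazily, so the split happens at the first "--".
function parseJunctionPath(path: string): JunctionPathParts | null {
  const match = /^Junctions\/([^/]+)\/(.+?)--(.+)\.md$/.exec(path);
  if (match === null) return null;
  const [, ontology, controlId, evidenceSlug] = match;
  if (ontology === undefined || controlId === undefined || evidenceSlug === undefined) {
    return null;
  }
  return { ontology, controlId, evidenceSlug };
}
```

Because both the control and the evidence appear in the name, one evidence document linked to two controls yields two distinct paths rather than a collision.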
Mandatory vs optional fields (rule)
- Mandatory (5): link_type, evidence, control, ontology, status. Plugin generation refuses to create a junction note missing any of these. Bases queries assume their presence.
- Optional (8): confidence, evidence_type, method, reviewer, review_date, responsible, collected, expires. Plugin renders missing as null/empty.
- Computed: freshness derived from review_date + expires at query time (Tier 1: in-memory; Tier 2: SQL view).
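The mandatory-field rule and the computed freshness value can be illustrated as follows. The mandatory list comes from the rule above; the freshness logic is an assumption, since this spec only says the value is derived from review_date + expires. The three labels and the ISO-date string comparison are hypothetical choices for the sketch.

```typescript
const MANDATORY_FIELDS = ["link_type", "evidence", "control", "ontology", "status"] as const;

type Freshness = "expired" | "current" | "unknown";

// Returns the mandatory fields that are absent; generation would refuse if non-empty.
function missingMandatory(note: Record<string, unknown>): string[] {
  return MANDATORY_FIELDS.filter(
    (field) => note[field] === undefined || note[field] === null || note[field] === "",
  );
}

// ASSUMPTION: dates are ISO 8601 (YYYY-MM-DD) strings, so string order is date order.
// ASSUMPTION: an expiry date decides freshness when present; otherwise a recorded
// review counts as current, and a note with neither date is unknown.
function freshness(
  note: { review_date?: string; expires?: string },
  today: string,
): Freshness {
  if (note.expires !== undefined) {
    return note.expires < today ? "expired" : "current";
  }
  return note.review_date !== undefined ? "current" : "unknown";
}
```

Keeping freshness out of the stored frontmatter means it never goes stale on disk; both the in-memory path and the SQL view recompute it against the current date.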
OSCAL by-component mapping
The 13-field schema is structurally isomorphic to OSCAL’s by-component assembly per Ch 07 deliverable, so the OSCAL export pipeline is a straightforward field rename. See reference/registry/oscal for the mapping table once the OSCAL mental-model doc lands.
§6 STRM + SSSOM hybrid wire format (crosswalk edges)
Per the v0.1 stack-pivot §6: user-facing wire format is STRM-shaped; SSSOM remains the internal validation envelope.
Crosswalk edges between ontology nodes are stored as STRM predicate frontmatter on control notes (not as separate junction notes — junction notes are evidence links, distinct from crosswalks).
STRM predicates (user-facing, frontmatter keys)
The 5 STRM relationships from NIST IR 8477:
| Predicate | Frontmatter key | Semantics |
|---|---|---|
| Equal To | is_equivalent_to | A ≡ B (semantic equivalence) |
| Subset Of | is_narrower_than | A ⊂ B (A is a special case of B) |
| Superset Of | is_broader_than | A ⊃ B (A subsumes B) |
| Intersects With | is_approximate_to | A ∩ B ≠ ∅ but A ≠ B |
| No Relationship | no_relationship | A ⊥ B (used for explicit “not related” assertions) |
Plus SSSOM predicate_modifier:
- is_equivalent_to_NOT: ["[[X]]"] represents predicate_modifier: NOT — explicit negation per SSSOM spec
YAML on-disk shape (control note example)
Two formats supported:
- Simple wikilink array: is_equivalent_to: ["[[X]]", "[[Y]]"] — when no per-edge metadata is needed (the 90% case).
- Object array with metadata: when an edge needs confidence, mapping_justification, mapping_date, etc.
The object form is fully SSSOM-compatible; the simple wikilink form is the SSSOM minimal envelope (just the predicate triple).
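Reading both formats into one normalized edge list could look like the sketch below. It assumes frontmatter is already parsed into a plain object and that the object form carries its wikilink under a key named target; that key name, the Edge shape, and the function names are hypothetical.

```typescript
const STRM_KEYS = [
  "is_equivalent_to",
  "is_narrower_than",
  "is_broader_than",
  "is_approximate_to",
  "no_relationship",
] as const;

interface Edge {
  predicate: string;
  negated: boolean; // true when read from the _NOT modifier form
  target: string; // wikilink target without the [[ ]] brackets
  meta?: Record<string, unknown>; // per-edge metadata from the object form
}

// Strips [[ ]] from a wikilink; returns other strings trimmed but unchanged.
function stripWikilink(value: string): string {
  const match = /^\[\[(.+?)\]\]$/.exec(value.trim());
  return match !== null && match[1] !== undefined ? match[1] : value.trim();
}

// Collects edges from every STRM key and its _NOT form, accepting either format.
function extractEdges(frontmatter: Record<string, unknown>): Edge[] {
  const edges: Edge[] = [];
  for (const key of STRM_KEYS) {
    for (const negated of [false, true]) {
      const raw = frontmatter[negated ? `${key}_NOT` : key];
      if (!Array.isArray(raw)) continue;
      for (const item of raw as unknown[]) {
        if (typeof item === "string") {
          edges.push({ predicate: key, negated, target: stripWikilink(item) });
        } else if (typeof item === "object" && item !== null) {
          const { target, ...meta } = item as Record<string, unknown>;
          if (typeof target !== "string") continue; // skip malformed entries
          edges.push({ predicate: key, negated, target: stripWikilink(target), meta });
        }
      }
    }
  }
  return edges;
}
```

Keys outside the five STRM predicates are ignored here rather than rejected; enforcing the vocabulary rule would be the validator's job.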
TSV exports
The plugin emits two TSV variants from the same source data:
- STRM-TSV (OLIR-template-shaped): columns match NIST IR 8278A r1 OLIR template (Source Document, Source Element, Relationship, Target Document, Target Element, Strength, Comments). Excel-friendly. Default user-facing export. Submittable to NIST OLIR directly.
- SSSOM-TSV: columns match SSSOM spec (subject_id, predicate_id, object_id, mapping_justification, confidence, author_id, mapping_date, comment, etc.). Round-trip-compatible with sssom-py. Optional academic emission.
Both come from the same in-memory representation; the SSSOM envelope is always validated internally even when the STRM-TSV export is what the user sees. This is the hybrid resolution — STRM-foreground, SSSOM-internal.
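As an illustration of the STRM-TSV side, the sketch below serializes in-memory mapping rows using the seven column names listed above. The MappingRow shape is hypothetical, and the relationship labels are taken from the predicate table in this section; whether the OLIR template expects exactly those strings is an assumption that would need checking against the template itself.

```typescript
const STRM_COLUMNS = [
  "Source Document", "Source Element", "Relationship",
  "Target Document", "Target Element", "Strength", "Comments",
];

// Frontmatter key to relationship label, per the predicate table above.
const STRM_LABELS: Record<string, string> = {
  is_equivalent_to: "Equal To",
  is_narrower_than: "Subset Of",
  is_broader_than: "Superset Of",
  is_approximate_to: "Intersects With",
  no_relationship: "No Relationship",
};

// Hypothetical in-memory row; the real representation is not specified here.
interface MappingRow {
  sourceDoc: string;
  sourceElement: string;
  predicate: string;
  targetDoc: string;
  targetElement: string;
  strength?: string;
  comments?: string;
}

// Tabs and newlines inside a cell would break the TSV grid, so collapse them.
function cleanCell(value: string): string {
  return value.replace(/[\t\r\n]+/g, " ");
}

function toStrmTsv(rows: MappingRow[]): string {
  const lines = [STRM_COLUMNS.join("\t")];
  for (const row of rows) {
    const label = STRM_LABELS[row.predicate];
    if (label === undefined) throw new Error(`not an STRM predicate: ${row.predicate}`);
    const cells = [
      row.sourceDoc, row.sourceElement, label,
      row.targetDoc, row.targetElement, row.strength ?? "", row.comments ?? "",
    ];
    lines.push(cells.map(cleanCell).join("\t"));
  }
  return lines.join("\n");
}
```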
OSCAL JSON profile export
The third export format. Structured per OSCAL’s profile and mapping assemblies. Round-trip with NIST’s OSCAL toolchain. Documentation in reference/registry/oscal/ (TODO).
§7 Tier 2 sidecar SQL schema (sqlite-wasm projection)
The Tier 2 sidecar is a deletable, recoverable projection of Tier 1 frontmatter into SQLite tables. It exists to enable performant queries (transitive closure, multi-ontology joins, coverage matrices, perspective views) that are awkward over markdown frontmatter.
Recovery property (load-bearing): if .crosswalker.sqlite is missing, corrupted, or stale, the projector rebuilds it from canonical Tier 1 on next vault load. This is what makes Tier 2 risk-free to bundle in v0.1.
Projection rules (Tier 1 → Tier 2)
The projector (a TypeScript module in packages/core/) executes on vault load:
- Detect missing/stale state: if .crosswalker.sqlite doesn’t exist, or schema_meta.projected_at is older than the vault’s most recent Tier 1 mtime, reproject.
- Walk ontologies: for each ImportRecipe in the vault, insert/upsert into ontologies.
- Walk control notes: for each .md file with _crosswalker.control_id set, insert/upsert into controls.
- Extract crosswalk edges: for each control note, parse STRM predicate frontmatter (is_equivalent_to, etc.) and insert one mappings row per wikilink target. Object form (with metadata) is parsed into the SSSOM envelope columns.
- Walk junction notes: for each .md file with link_type: evidence_link, insert/upsert into junction_notes.
- Invalidate closure cache if any mappings row changed.
Projection is idempotent. Re-running on an unchanged vault is a no-op (mtime check + content hash).
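The first and last rules above reduce to two small decisions, sketched here without any SQLite or file-system calls. The ProjectionState shape and both function names are hypothetical, timestamps are assumed to be epoch milliseconds, and mapping rows are assumed to be comparable through some stable string key.

```typescript
// Hypothetical snapshot of what the projector knows at vault load.
interface ProjectionState {
  sqliteExists: boolean; // is .crosswalker.sqlite present?
  projectedAt: number | null; // schema_meta.projected_at, null if unreadable
  latestTier1Mtime: number; // most recent mtime across Tier 1 notes
}

// Reproject when the sidecar is missing, unreadable, or older than Tier 1.
function needsReprojection(state: ProjectionState): boolean {
  if (!state.sqliteExists) return true;
  if (state.projectedAt === null) return true;
  return state.projectedAt < state.latestTier1Mtime;
}

// The closure cache is invalidated only when the set of mapping rows changed.
// Each row is represented by a stable key (for example subject|predicate|object).
function mappingsChanged(previous: Set<string>, next: Set<string>): boolean {
  if (previous.size !== next.size) return true;
  for (const key of next) {
    if (!previous.has(key)) return true;
  }
  return false;
}
```

Treating an unreadable projected_at as stale errs on the side of rebuilding, which is safe because Tier 2 holds no data that Tier 1 lacks.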
Schema-version migration
When the Tier 2 SQL schema bumps from tier2-sqlite-v1 to tier2-sqlite-v2:
- Simplest path: drop the .sqlite, reproject from canonical Tier 1.
- Faster path: per-version ALTER TABLE migration.
- v0.1 ships only tier2-sqlite-v1; the migration mechanism is documented but not exercised yet.
§8 Cross-schema invariants & constraints
These are the rules the projector and the import wizard both have to respect:
- Tier 1 is canonical: every Tier 2 column has a Tier 1 source. No data lives only in Tier 2.
- Frontmatter is flat-scalar or wikilink: enforces Bases-queryability of the Tier 1 path. Object-form metadata on STRM predicates is permitted because it round-trips through SSSOM; the projector flattens it into Tier 2 SSSOM-envelope columns.
- _crosswalker is additive: never overwrites user-authored top-level frontmatter.
- Mandatory junction note fields: link_type, evidence, control, ontology, status. Plugin generation refuses to create incomplete notes.
- STRM predicates are the only crosswalk vocabulary: user-facing crosswalk frontmatter keys MUST be one of the 5 STRM predicates (or their _NOT modifier form). SKOS predicates rejected as base vocab; SSSOM is internal validation only.
- Schema versions on every artifact: every persisted schema (ImportRecipe, _crosswalker, junction note, Tier 2 SQL) carries a schema_version for migration. “Crosswalker eats own dog food.”
- Composite IDs in Tier 2: subject_id / object_id in mappings are ontology-qualified (nist-800-53-r5/AC-2) to enable cross-ontology queries without ID collision.
- Source-hash provenance: every projected row carries source_hash (sha256 of the originating Tier 1 byte content) so re-import detects changes byte-accurately.
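The composite-ID invariant can be illustrated with a pair of helpers. The function names are hypothetical, and the sketch assumes the ontology ID itself never contains a slash, so that the first slash always separates the two parts.

```typescript
// Builds an ontology-qualified ID such as "nist-800-53-r5/AC-2".
function qualifyId(ontologyId: string, localId: string): string {
  if (ontologyId === "" || localId === "") {
    throw new Error("both parts of a composite id are required");
  }
  if (ontologyId.includes("/")) {
    throw new Error(`ontology id must not contain "/": ${ontologyId}`);
  }
  return `${ontologyId}/${localId}`;
}

// Splits at the FIRST slash, so local IDs may themselves contain slashes.
function splitQualifiedId(id: string): { ontologyId: string; localId: string } {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`not an ontology-qualified id: ${id}`);
  }
  return { ontologyId: id.slice(0, slash), localId: id.slice(slash + 1) };
}
```

With qualification in place, two ontologies that both define a control called AC-2 produce distinct subject_id values, so a cross-ontology join cannot conflate them.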
§9 Build sequence
The four schemas land in this order in v0.1 implementation:
- _crosswalker metadata v2 — extend existing src/generation/generation-engine.ts to write the v2 fields. Lowest-risk: this is just a frontmatter shape change.
- Junction note 13-field schema — implement junction-note generation as a new code path in the generation engine. Triggered by ImportRecipe with crosswalks that target evidence documents.
- ImportRecipe v2 — refactor src/types/config.ts to the v2 shape. Migration script for existing v1 saved configs.
- Tier 2 sidecar SQL projector — new module in packages/core/. Auto-runs on vault load. Lazy closure cache + sqlite-vec embedding integration.
Each step has a corresponding test:
- Round-trip test: write Tier 1 frontmatter, project to Tier 2 SQL, query, confirm semantic equivalence to source data.
- Idempotency test: project twice; second projection is a no-op.
- Recovery test: delete the .sqlite, reload, confirm reprojection produces byte-identical state.
§10 Open sub-decisions (deferred from this spec)
Items flagged in Ch 07’s “remaining open sub-decisions” and elsewhere that this spec does not lock in:
- UUID enterprise resilience scope — Ch 09 settled UUIDv7 + sha256 + CURIEs; the enterprise scope (cross-tenant ID stability) needs a separate design pass.
- Multi-editor conflict resolution — when two users edit the same junction note simultaneously, what does the merge look like? Out of v0.1 scope; v1.0+ collaboration story.
- Inline-Dataview migration script for v0 vaults — for users who have notes with the legacy key:: value syntax. Out of v0.1 scope.
- Tested vault scale threshold documentation — Ch 18 gives a model; need empirical confirmation in real vaults of varying size.
- OSCAL ↔ Crosswalker mental-model documentation — reference/registry/oscal/ page extension. Listed in roadmap Foundation tasks.
- StewardshipProfile v2 schema — separate from this spec but referenced via ImportRecipe.stewardship_profile_id. Per the 05-01 commitments log, the schema design pass is deferred until after v0.1 schemas land.
§11 Related
- Decision logs:
  - v0.1 stack-pivot log — the stack commitment this spec implements
  - Evidence-link edge model synthesis (Ch 07) — junction note schema source
  - 04-10 Foundation research synthesis — STRM + SSSOM crosswalk edge semantics
  - 05-01 Foundation commitments + meta-schema lifecycle — “schemas are versioned, migration-aware”
- Research deliverables:
  - Ch 09 deliverable: identifier strategy
  - Ch 18 deliverable: Tier 2-Lite scale ceiling — informs Tier 2 SQL design
- Earlier design pages (background, partly superseded by this unified spec):
  - Config schema design — original ImportRecipe draft
  - Constraint enforcement & metadata tiers — original _crosswalker metadata tiers
- External references:
  - STRM (NIST IR 8477) — predicate vocabulary
  - SSSOM spec — internal validation envelope
  - OSCAL — junction-note isomorphism (by-component)
  - Datalog glossary — context for query layers above this schema