
v0.1.6 Phase 2 shipped — SSSOM TSV import + materialized closure precompute


Phase 2 of v0.1.6 delivers SSSOM TSV import and materialized closure precompute. It is a mid-milestone phase: Phases 3–5 are still pending. Phase 2 is the data-ingestion foundation that everything downstream queries against.

| Surface | Delivered |
| --- | --- |
| src/import/sssom-parser.ts | TSV parser per SSSOM 0.15+ spec. Handles #-prefixed YAML-shaped headers (curie_map, mapping_set_id, license, subject_source, object_source, etc.), required cols (subject_id/predicate_id/object_id), optional cols (subject_label, object_label, mapping_justification, confidence, mapping_provider, mapping_set_id), CURIE-prefix-based ontology-pair detection. |
| src/import/sssom-importer.ts | Orchestrator: parse → SKOS→STRM predicate normalization → synthetic crosswalk-edge recipe → generateFromRecipe → Tier 2 projection → eager closure precompute. Idempotent re-imports via overwriteMode: 'replace'. |
| src/import/sssom-import-modal.ts | Modal UX: file picker (vault .tsv/.sssom.tsv) OR paste-TSV; preview row count + detected ontology pair + warnings; confirm → execute. |
| src/tier2/queries.ts | New precomputeClosureForOntologyPair(db, source, target, predicate?), which eagerly populates closure_cache for the imported pair. |
| src/main.ts | New Crosswalker: Import SSSOM mapping file command + plugin.precomputeClosure(source, target, predicate?) handle. |
| tools/fixtures/synthetic/nist-csf-to-iso27001.sssom.tsv | Test fixture; 11 mappings covering all 5 SKOS predicates + curie_map header + mapping_set_id. |
| tests/sssom-parser.test.ts | 19 unit tests (happy path, error paths, ontology-pair detection, CURIE prefix). |
| tests/sssom-importer.test.ts | 6 integration tests with mock vault (full round-trip; error paths; option overrides). |
| TEST_PHASE2_SSSOM_IMPORT.md | Manual test guide; 7 scenarios. |
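
To make the parser row above concrete, here is a minimal sketch of splitting an SSSOM TSV into #-prefixed header lines and data rows, and of enforcing the three required columns. The names parseSssomTsv, SssomRow, and ParsedSssom are hypothetical and are not taken from src/import/sssom-parser.ts; the sketch also leaves the YAML-shaped header unparsed.

```typescript
// Hypothetical shapes for illustration; not the actual sssom-parser.ts API.
interface SssomRow {
  subject_id: string;
  predicate_id: string;
  object_id: string;
  [column: string]: string;
}

interface ParsedSssom {
  headerLines: string[]; // "#"-prefixed metadata lines with the "#" stripped
  rows: SssomRow[];
}

const REQUIRED_COLUMNS = ["subject_id", "predicate_id", "object_id"];

function parseSssomTsv(text: string): ParsedSssom {
  const headerLines: string[] = [];
  const tableLines: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith("#")) {
      headerLines.push(line.replace(/^#\s?/, ""));
    } else if (line.trim() !== "") {
      tableLines.push(line);
    }
  }
  if (tableLines.length === 0) {
    throw new Error("SSSOM TSV has no column header row");
  }
  // The first non-comment line names the columns.
  const columns = tableLines[0].split("\t");
  for (const required of REQUIRED_COLUMNS) {
    if (columns.indexOf(required) === -1) {
      throw new Error(`Missing required column: ${required}`);
    }
  }
  const rows = tableLines.slice(1).map((line) => {
    const cells = line.split("\t");
    const row: Record<string, string> = {};
    columns.forEach((col, i) => {
      row[col] = cells[i] ?? ""; // missing trailing cells become empty strings
    });
    return row as SssomRow;
  });
  return { headerLines, rows };
}
```

A real parser would go on to interpret the header lines as YAML (curie_map, subject_source, and so on); the sketch only separates them from the table.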

Test coverage: 164/164 tests pass (added 25 new for SSSOM). Build clean. Fixture-drift CI gate clean.

Phase 2 plugs into the canonical Crosswalker pipeline at the import side of the data flow, populating Tier 1 + Tier 2 surfaces that Phase 3 (crosswalkerPivot Bases view) will read from:

       INPUT                STORAGE                 PROJECTION              QUERY
   ┌─────────────┐    ┌────────────────────┐    ┌──────────────────┐    ┌────────────────────┐
   │ .sssom.tsv  │ →  │ Tier 1: junction-  │ →  │ Tier 2 sqlite    │ →  │ Bases / future     │
   │ (user file  │    │ edge .md per row   │    │ cache:           │    │ crosswalkerPivot   │
   │  OR paste)  │    │ in _crosswalker/   │    │ • mappings       │    │ view (Phase 3)     │
   │             │    │ mappings/<src>-    │    │ • closure_cache  │    │                    │
   │ THIS PHASE  │    │ to-<tgt>/          │    │ (precomputed)    │    │                    │
   │ Phase 2     │    │ THIS PHASE         │    │ THIS PHASE       │    │                    │
   └─────────────┘    └────────────────────┘    └──────────────────┘    └────────────────────┘

                                                        Phase 3 unblocked: real data to query

Reuse: Phase 2 composes existing v0.1.5 P3 + v0.1.4 infrastructure rather than introducing new schema:

  • mappings table — already SSSOM-shaped from v0.1.5 P3; importer just feeds it
  • closure_cache — already lazy-built by closureFromConcept; eager precompute reuses that path per-subject
  • generateFromRecipe — already handles kind: 'crosswalk-edge' from existing crosswalk recipes
  • frontmatter-merge — managed/user_preserve semantics for idempotent re-imports
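
As an illustration of the last bullet, the following is a minimal sketch of how managed/user_preserve merging can make a re-import idempotent. It assumes that "managed" keys are overwritten by the importer and that every other key is preserved as the user left it; the function name mergeFrontmatter and its signature are hypothetical, not the plugin's actual frontmatter-merge API.

```typescript
type Frontmatter = Record<string, unknown>;

// Assumption: managed keys are owned by the importer; all other keys belong to the user.
function mergeFrontmatter(
  existing: Frontmatter,
  incoming: Frontmatter,
  managedKeys: string[],
): Frontmatter {
  const merged: Frontmatter = { ...existing }; // start from what is on disk
  for (const key of managedKeys) {
    if (Object.prototype.hasOwnProperty.call(incoming, key)) {
      merged[key] = incoming[key]; // importer wins for managed keys only
    }
  }
  return merged;
}

const onDisk: Frontmatter = { predicate_id: "intersects_with", reviewer_note: "checked by hand" };
const fromImport: Frontmatter = { predicate_id: "is_equivalent_to", reviewer_note: "ignored" };

const once = mergeFrontmatter(onDisk, fromImport, ["predicate_id"]);
const twice = mergeFrontmatter(once, fromImport, ["predicate_id"]);
```

Running the merge a second time with the same input changes nothing, which is the property the idempotent re-import relies on.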

The SSSOM importer is mostly orchestration glue plus a small predicate-normalization table, not new substrate logic. This is consistent with the Ch 35 substrate-neutrality audit checklist.

Implementation decisions made during Phase 2


These weren’t pre-locked in the synthesis log; they emerged at coding time. Documented here so future agents understand why the code looks the way it does.

Decision 1 — SKOS → STRM predicate normalization table


Context: Tier 1 schema requires STRM predicates (is_equivalent_to, is_approximate_to, is_broader_than, is_narrower_than, intersects_with, no_relationship) on crosswalk-edge frontmatter. SSSOM TSVs use SKOS predicates (skos:exactMatch, skos:closeMatch, skos:broadMatch, skos:narrowMatch, skos:relatedMatch). The two vocabularies do not match string-for-string, but they do correspond semantically.

Decision: hardcode the mapping table in src/import/sssom-importer.ts:

| SSSOM/SKOS predicate | STRM predicate_id | Rationale |
| --- | --- | --- |
| skos:exactMatch | is_equivalent_to | "Perfect synonym" semantics aligned |
| skos:closeMatch | is_approximate_to | "Near-synonym, exchangeable in many contexts" |
| skos:broadMatch | is_broader_than | Subject broader than object |
| skos:narrowMatch | is_narrower_than | Subject narrower than object |
| skos:relatedMatch | intersects_with | Overlapping concepts |
| (unknown) | intersects_with (with warning) | Permissive fallback; warns the user |

Preservation: original SSSOM predicate goes into sssom_predicate frontmatter so no information is lost. Round-trip-safe.
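
The mapping and the preservation rule can be sketched as follows. The table contents come from this decision; the names SKOS_TO_STRM, NormalizedPredicate, and normalizePredicate are hypothetical and may not match the identifiers in src/import/sssom-importer.ts.

```typescript
// Hardcoded SKOS -> STRM table, as described in Decision 1.
const SKOS_TO_STRM = new Map<string, string>([
  ["skos:exactMatch", "is_equivalent_to"],
  ["skos:closeMatch", "is_approximate_to"],
  ["skos:broadMatch", "is_broader_than"],
  ["skos:narrowMatch", "is_narrower_than"],
  ["skos:relatedMatch", "intersects_with"],
]);

interface NormalizedPredicate {
  predicate_id: string;    // STRM predicate written to Tier 1 frontmatter
  sssom_predicate: string; // original SSSOM value, preserved verbatim
  warning?: string;        // set only when the fallback was used
}

function normalizePredicate(sssomPredicate: string): NormalizedPredicate {
  const mapped = SKOS_TO_STRM.get(sssomPredicate);
  if (mapped !== undefined) {
    return { predicate_id: mapped, sssom_predicate: sssomPredicate };
  }
  // Permissive fallback: keep the row, but tell the user.
  return {
    predicate_id: "intersects_with",
    sssom_predicate: sssomPredicate,
    warning: `Unknown predicate "${sssomPredicate}"; falling back to intersects_with`,
  };
}
```

Because the original predicate always travels alongside the normalized one, an export back to SSSOM can read sssom_predicate instead of trying to invert the table.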

Why hardcode (vs config-file mapping): only 5 SKOS predicates; the mapping is canonical per SKOS Mapping Properties spec; making it user-configurable would invite inconsistency across vaults. Move to config in v0.2+ if a real user needs it.

Decision 2 — match_confidence numeric coercion deferred to v0.1.7+


Context: Tier 1 schema requires match_confidence as { "type": "number", "minimum": 0, "maximum": 1 }. SSSOM confidence column comes in as a string, parsed to number by parseFloat in the parser. But the render template engine (src/render/template.ts) emits ALL frontmatter values as strings — so match_confidence: "0.85" would fail Tier 1 validation.

Decision: drop match_confidence from the synthetic recipe’s managed frontmatter for Phase 2. SSSOM confidence value is preserved as sssom_confidence (a non-validated extra field, stays a string). Tier 1 schema’s match_confidence field stays optional and unused for SSSOM-imported edges in Phase 2.
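
The mismatch behind this decision can be reproduced in a few lines. The validator below mirrors the Tier 1 constraint quoted above, and renderFrontmatterValue is a hypothetical stand-in for the template engine's "everything becomes a string" behavior; neither is the project's actual code.

```typescript
// Mirrors the Tier 1 constraint { "type": "number", "minimum": 0, "maximum": 1 }.
function isValidMatchConfidence(value: unknown): boolean {
  return typeof value === "number" && value >= 0 && value <= 1;
}

// Stand-in for the render template engine: every frontmatter value is emitted as a string.
function renderFrontmatterValue(value: unknown): string {
  return String(value);
}

const parsedConfidence = parseFloat("0.85"); // the parser yields the number 0.85
const renderedConfidence = renderFrontmatterValue(parsedConfidence); // templating yields "0.85"

const validBeforeRender = isValidMatchConfidence(parsedConfidence); // true
const validAfterRender = isValidMatchConfidence(renderedConfidence); // false: it is a string now

// Phase 2 therefore stores the value under a non-validated key instead.
const edgeFrontmatter = { sssom_confidence: renderedConfidence };
```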

Follow-up trigger: when the render template engine gets numeric coercion (templates emit numbers when the source value is a number), flip sssom_confidence → match_confidence so that the field becomes Tier 1-validated. Tracked as v0.1.7+ work in TEST_PHASE2_SSSOM_IMPORT.md “known limitations”.

Decision 3 — Synthetic recipe pattern (vs shipped SSSOM recipe template)


Context: SSSOM imports need a recipe to drive generateFromRecipe. Two options:

  • A. Ship a pre-built recipe at recipes/sssom/sssom-default.json that users copy + customize per ontology pair
  • B. Build the recipe in-memory from the parsed SSSOM (the importer constructs a Recipe object from the detected source/target ontology pair)

Decision: option B (in-memory synthetic recipe in buildSyntheticRecipe(source, target) inside sssom-importer.ts). Reasons:

  1. SSSOM imports are mostly mechanical — the recipe is fully derivable from the SSSOM header (subject_source / object_source) + content (CURIE prefixes); no per-import customization needed
  2. Keeps SSSOM-handling code in the importer module rather than scattering it across recipe templates
  3. Future tweaks to the SSSOM-import recipe shape don’t require users to migrate their copy of the recipe — just rebuild

Trade-off: users who want a CUSTOM crosswalk-edge recipe (e.g., different folder structure, different managed fields) can still author one and import via the existing runImportFromRecipe path; the SSSOM modal is the convention-over-configuration default, not the only option.
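
A minimal sketch of option B follows. Only kind: 'crosswalk-edge', overwriteMode: 'replace', and the _crosswalker/mappings/<src>-to-<tgt>/ folder convention come from this page; the SyntheticRecipe shape, the managedFields list, and the lowercasing of the folder name are assumptions made for illustration, and the real Recipe type is likely richer.

```typescript
// Hypothetical recipe shape; the project's real Recipe type is not shown on this page.
interface SyntheticRecipe {
  kind: "crosswalk-edge";
  source: string;
  target: string;
  outputFolder: string;
  overwriteMode: "replace";
  managedFields: string[];
}

function buildSyntheticRecipe(source: string, target: string): SyntheticRecipe {
  // Assumption: folder names are the lowercased ontology identifiers.
  const slug = (s: string): string => s.trim().toLowerCase();
  return {
    kind: "crosswalk-edge",
    source,
    target,
    outputFolder: `_crosswalker/mappings/${slug(source)}-to-${slug(target)}/`,
    overwriteMode: "replace", // makes re-imports idempotent
    managedFields: ["subject_id", "predicate_id", "object_id", "sssom_predicate", "sssom_confidence"],
  };
}

const exampleRecipe = buildSyntheticRecipe("CSF", "ISO27001");
// exampleRecipe.outputFolder is "_crosswalker/mappings/csf-to-iso27001/"
```

Since the recipe is rebuilt from the ontology pair on every import, a change to its shape ships with the plugin rather than requiring users to edit a copied file.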

Decision 4 — Eager closure precompute reuses lazy closureFromConcept


Context: Per Ch 35: “every production ontology-web system materializes precomputed pairwise crosswalks — nobody computes graph→tabular projections at query time.” So Phase 2 needs eager precompute on import. But v0.1.5 P3 already shipped a lazy closure path via closureFromConcept that builds and caches on first call.

Decision: precomputeClosureForOntologyPair in src/tier2/queries.ts doesn’t introduce a new closure-build path. It (a) selects all distinct subject_ids in the imported ontology pair, (b) loops over them calling closureFromConcept (which lazy-builds + caches), (c) reports the count of cache rows now populated.

Why this works: closureFromConcept is idempotent (cache-checks before recomputing). Calling it eagerly at import time just front-loads what would otherwise happen on first user query. Cleanest possible eager precompute — no schema change, no new code path, no logic duplication.
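
The eager-over-lazy pattern can be sketched with an in-memory stand-in. The real implementation runs against SQLite (the mappings table, closure_cache, and a recursive CTE); here a Map plays the cache and a breadth-first walk plays the CTE. The ClosureStore class and its method signatures are hypothetical.

```typescript
interface Mapping {
  subject_id: string;
  object_id: string;
}

// In-memory stand-in for the mappings table plus closure_cache.
class ClosureStore {
  readonly cache = new Map<string, string[]>();
  builds = 0; // counts real graph walks, to make cache hits visible
  private readonly mappings: Mapping[];

  constructor(mappings: Mapping[]) {
    this.mappings = mappings;
  }

  // Lazy path: return the cached closure, or walk the graph once and cache it.
  closureFromConcept(subjectId: string): string[] {
    const hit = this.cache.get(subjectId);
    if (hit !== undefined) return hit;
    this.builds++;
    const seen = new Set<string>();
    const queue: string[] = [subjectId];
    while (queue.length > 0) {
      const current = queue.shift() as string;
      for (const m of this.mappings) {
        if (m.subject_id === current && !seen.has(m.object_id)) {
          seen.add(m.object_id);
          queue.push(m.object_id);
        }
      }
    }
    const closure = Array.from(seen);
    this.cache.set(subjectId, closure);
    return closure;
  }

  // Eager path: (a) collect distinct subjects, (b) call the lazy path for each,
  // (c) report how many cache entries are now populated.
  precomputeClosure(): number {
    const subjects = new Set<string>(this.mappings.map((m) => m.subject_id));
    subjects.forEach((s) => this.closureFromConcept(s));
    return this.cache.size;
  }
}
```

Calling precomputeClosure a second time performs no new walks, because every subject is already a cache hit; that is the idempotence the decision depends on.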

Trade-off: the per-subject loop makes O(N) closureFromConcept calls, where N is the number of distinct subjects. For a 1M-mapping import, that is up to 1M cache lookups plus recursive-CTE walks. This is acceptable for v0.1.6 (target scale: ≤100K mappings per Ch 37 Yellow tier). v0.1.7’s per-ontology partitioning + bounded LRU closure cache (per Ch 37 deliverables) will replace this with a more efficient batched precompute when scale demands it.

| Cascade | What |
| --- | --- |
| Phase 3 (crosswalkerPivot Bases view) | Real SSSOM-imported data exists for the view to render against. The reference .base file shipped in Phase 3 can target _crosswalker/mappings/csf-to-iso27001/ directly; the launch-market Coverage Matrix recipe can be tested end-to-end. |
| Phase 4 (recipe-picker UX) | The SSSOM import command + modal pattern is the precedent for the recipe-picker modal: same Modal class, same setting-builder pattern. |
| Phase 5 (materialization command) | _crosswalker/mappings/ folder convention is set; _crosswalker/audit/ follows the same prefix. |
| v0.1.7 (per-ontology partitioning + scale work) | The eager closure precompute is the simple version; v0.1.7 swaps for a batched per-ontology precompute path. |
| v0.1.8 (audit trail) | Junction notes from SSSOM imports are committed to git as the audit-truth source per CQRS commitment #7. The materialized concept_closure is rebuildable cache; audit attestation hashes the junction notes, not the cache. |

  • Numeric template coercion: render template engine should emit numbers when source value is numeric. Then flip sssom_confidence (string) → match_confidence (number, Tier 1-validated). v0.1.7+ work.
  • E2E env diagnosis — RESOLVED. Backfilled 2026-05-10: WebdriverIO + wdio-obsidian-service runs against real Obsidian 1.12.7. tests/e2e/sssom-import.spec.ts ships 7 real E2E tests verifying command registration, plugin.precomputeClosure handle, TSV → 5 junction notes round-trip, STRM normalization in frontmatter, Tier 2 mappings table population, closure_cache eager-precompute. bun run e2e confirms 17/17 spec files pass (full suite, ~18 min sequential run).
  • Incremental refresh on SSSOM file change: Phase 2 idempotent re-import handles this manually (user re-runs import); a file watcher that auto-detects .sssom.tsv changes is v0.1.7 work per the milestone page deferral.
  • match_confidence numeric: see follow-up #1.
  • CURIE-prefix-vs-header source/target precedence: importer uses header subject_source/object_source first, falls back to first-row CURIE prefix. Documented in src/import/sssom-parser.ts detectOntologyPair. Edge case: if the two disagree, header wins. Not exercised by current tests; add fixture in v0.1.7 if any user reports an ambiguous case.
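
The precedence rule in the last bullet can be sketched as below. The function shares its name with detectOntologyPair in src/import/sssom-parser.ts, but the signature, the SssomHeader and FirstRow types, and the choice to resolve each side independently are assumptions made for illustration.

```typescript
interface SssomHeader {
  subject_source?: string;
  object_source?: string;
}

interface FirstRow {
  subject_id: string;
  object_id: string;
}

// "CSF:ID.AM-1" -> "CSF"; a value without a colon is returned unchanged.
function curiePrefix(curie: string): string {
  const idx = curie.indexOf(":");
  return idx === -1 ? curie : curie.slice(0, idx);
}

// Header values win; the first row's CURIE prefix is only a fallback.
function detectOntologyPair(
  header: SssomHeader,
  firstRow: FirstRow,
): { source: string; target: string } {
  return {
    source: header.subject_source ?? curiePrefix(firstRow.subject_id),
    target: header.object_source ?? curiePrefix(firstRow.object_id),
  };
}
```

A fixture for the disagreement case would pass a header whose subject_source differs from the first row's prefix and assert that the header value is returned.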

Research deliverable:

  • Ch 35 — Graph→tabular bridging rerun (Ch 10) — every production ontology-web system materializes precomputed pairwise crosswalks; nobody computes N×N at query time; Crosswalker should not either. Ch 35 §6 “five non-GRC ontology-web archetypes all reduce to existing helpers” → Phase 2 confirms (the importer reuses crosswalkBetween + closureFromConcept without new primitives).