v0.1.6 Phase 2 shipped — SSSOM TSV import + materialized closure precompute
What shipped
v0.1.6 Phase 2: SSSOM TSV import + materialized closure precompute. Mid-milestone phase; v0.1.6 still has Phases 3–5 pending. Phase 2 is the data-ingestion foundation everything downstream queries against.
| Surface | Delivered |
|---|---|
| `src/import/sssom-parser.ts` | TSV parser per the SSSOM 0.15+ spec. Handles `#`-prefixed YAML-shaped headers (`curie_map`, `mapping_set_id`, `license`, `subject_source`, `object_source`, etc.), required columns (`subject_id`/`predicate_id`/`object_id`), optional columns (`subject_label`, `object_label`, `mapping_justification`, `confidence`, `mapping_provider`, `mapping_set_id`), and CURIE-prefix-based ontology-pair detection. |
| `src/import/sssom-importer.ts` | Orchestrator: parse → SKOS→STRM predicate normalization → synthetic crosswalk-edge recipe → `generateFromRecipe` → Tier 2 projection → eager closure precompute. Re-imports are idempotent via `overwriteMode: 'replace'`. |
| `src/import/sssom-import-modal.ts` | Modal UX: file picker (vault `.tsv`/`.sssom.tsv`) or paste-TSV; preview of row count, detected ontology pair, and warnings; confirm → execute. |
| `src/tier2/queries.ts` | New `precomputeClosureForOntologyPair(db, source, target, predicate?)`, which eagerly populates `closure_cache` for the imported pair. |
| `src/main.ts` | New `Crosswalker: Import SSSOM mapping file` command + `plugin.precomputeClosure(source, target, predicate?)` handle. |
| `tools/fixtures/synthetic/nist-csf-to-iso27001.sssom.tsv` | Test fixture: 11 mappings covering all 5 SKOS predicates, plus a `curie_map` header and `mapping_set_id`. |
| `tests/sssom-parser.test.ts` | 19 unit tests (happy path, error paths, ontology-pair detection, CURIE prefix). |
| `tests/sssom-importer.test.ts` | 6 integration tests with a mock vault (full round-trip, error paths, option overrides). |
| `TEST_PHASE2_SSSOM_IMPORT.md` | Manual test guide; 7 scenarios. |
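The parser's job is "header block, then table". It can be sketched as below.

This is a minimal illustration under several assumptions:
- The header is a simplified YAML subset: flat `key: value` lines plus one indented `curie_map` block.
- Cell values stay strings. The shipped parser additionally converts `confidence` with `parseFloat`.
- `parseSssomTsv` and the `ParsedSssom` shape are hypothetical names, not the shipped module's API.

A production parser would use a real YAML library.

```typescript
// Hypothetical types: the shipped sssom-parser.ts may shape its output differently.
interface SssomRow {
  subject_id: string;
  predicate_id: string;
  object_id: string;
  [column: string]: string; // optional columns such as confidence, subject_label, ...
}

interface ParsedSssom {
  metadata: Record<string, string>; // flat "key: value" header entries
  curieMap: Record<string, string>; // prefix -> IRI base from the curie_map block
  rows: SssomRow[];
}

const REQUIRED_COLUMNS = ["subject_id", "predicate_id", "object_id"];

function parseSssomTsv(text: string): ParsedSssom {
  const metadata: Record<string, string> = {};
  const curieMap: Record<string, string> = {};
  const lines = text.split(/\r?\n/);
  let curieIndent = -1; // indentation of the "curie_map:" line, or -1 when outside it
  let i = 0;

  // 1. "#"-prefixed header, read as a simplified YAML subset.
  for (; i < lines.length && lines[i].startsWith("#"); i++) {
    const body = lines[i].slice(1);
    const match = body.trim().match(/^([^:]+):\s*(.*)$/);
    if (!match) continue;
    const key = match[1].trim();
    const value = match[2].trim();
    const indent = body.search(/\S/);
    if (curieIndent >= 0 && indent > curieIndent) {
      curieMap[key] = value; // nested entry under curie_map
      continue;
    }
    curieIndent = key === "curie_map" && value === "" ? indent : -1;
    if (curieIndent < 0) metadata[key] = value;
  }

  // 2. Column header row: first non-blank line after the metadata block.
  while (i < lines.length && lines[i].trim() === "") i++;
  if (i >= lines.length) throw new Error("SSSOM: missing column header row");
  const columns = lines[i++].split("\t").map((c) => c.trim());
  const missing = REQUIRED_COLUMNS.filter((c) => columns.indexOf(c) === -1);
  if (missing.length > 0) {
    throw new Error(`SSSOM: missing required column(s): ${missing.join(", ")}`);
  }

  // 3. One mapping per non-blank data line; values stay strings in this sketch.
  const rows: SssomRow[] = [];
  for (; i < lines.length; i++) {
    if (lines[i].trim() === "") continue;
    const cells = lines[i].split("\t");
    const row: Record<string, string> = {};
    columns.forEach((col, idx) => {
      row[col] = (cells[idx] ?? "").trim();
    });
    rows.push(row as SssomRow);
  }
  return { metadata, curieMap, rows };
}
```

A missing required column fails fast with an error, which matches the "error paths" category the unit tests cover.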
Test coverage: 164/164 tests pass (added 25 new for SSSOM). Build clean. Fixture-drift CI gate clean.
System-design integration
Phase 2 plugs into the canonical Crosswalker pipeline on the import side of the data flow. It populates the Tier 1 and Tier 2 surfaces that Phase 3 (`crosswalkerPivot` Bases view) will read from.
Reuse: Phase 2 composes existing v0.1.5 P3 + v0.1.4 infrastructure rather than introducing new schema:
- `mappings` table: already SSSOM-shaped from v0.1.5 P3; the importer just feeds it
- `closure_cache`: already lazy-built by `closureFromConcept`; eager precompute reuses that path per subject
- `generateFromRecipe`: already handles `kind: 'crosswalk-edge'` from existing crosswalk recipes
- `frontmatter-merge`: managed/user_preserve semantics for idempotent re-imports
The SSSOM importer is mostly orchestration glue plus a small predicate-normalization table, not new substrate logic, in line with the Ch 35 substrate-neutrality audit checklist.
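The managed/user_preserve idea behind idempotent re-imports can be sketched as a small merge function. This is a hypothetical illustration, not the shipped `frontmatter-merge` API; the `mergeFrontmatter` name and signature are assumptions. It rests on two rules:
- Managed keys are always taken from the fresh import.
- Every other key is treated as user-owned and left untouched.

```typescript
type Frontmatter = Record<string, unknown>;

// Hypothetical sketch of managed/user_preserve semantics.
function mergeFrontmatter(
  existing: Frontmatter,
  incoming: Frontmatter,
  managedKeys: ReadonlySet<string>,
): Frontmatter {
  const merged: Frontmatter = {};
  // user_preserve: copy every non-managed key from the existing note as-is.
  for (const [key, value] of Object.entries(existing)) {
    if (!managedKeys.has(key)) merged[key] = value;
  }
  // managed: values come only from the fresh import; a managed key the import
  // no longer provides is dropped, which is what "replace" implies.
  managedKeys.forEach((key) => {
    if (Object.prototype.hasOwnProperty.call(incoming, key)) merged[key] = incoming[key];
  });
  return merged;
}
```

Running the same import twice gives the same note: `mergeFrontmatter(mergeFrontmatter(e, i, m), i, m)` equals `mergeFrontmatter(e, i, m)`. A user's own annotations survive each re-run.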
Implementation decisions made during Phase 2
These decisions weren’t pre-locked in the synthesis log; they emerged at coding time. They are documented here so future agents understand why the code looks the way it does.
Decision 1 — SKOS → STRM predicate normalization table
Context: the Tier 1 schema requires STRM predicates (`is_equivalent_to`, `is_approximate_to`, `is_broader_than`, `is_narrower_than`, `intersects_with`, `no_relationship`) on crosswalk-edge frontmatter. SSSOM TSVs use SKOS predicates (`skos:exactMatch`, `skos:closeMatch`, `skos:broadMatch`, `skos:narrowMatch`, `skos:relatedMatch`). They don’t line up 1:1 by string, but they do by semantics.
Decision: hardcode the mapping table in src/import/sssom-importer.ts:
| SSSOM/SKOS predicate | STRM predicate_id | Rationale |
|---|---|---|
| `skos:exactMatch` | `is_equivalent_to` | “Perfect synonym” semantics aligned |
| `skos:closeMatch` | `is_approximate_to` | “Near-synonym, exchangeable in many contexts” |
| `skos:broadMatch` | `is_broader_than` | Subject broader than object |
| `skos:narrowMatch` | `is_narrower_than` | Subject narrower than object |
| `skos:relatedMatch` | `intersects_with` | Overlapping concepts |
| (unknown) | intersects_with (with warning) | Permissive fallback; warns the user |
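The table above can be sketched as a lookup plus a warning-emitting fallback. It mirrors the documented rows exactly, and the original predicate is carried along in `sssom_predicate`. The names `SKOS_TO_STRM`, `normalizePredicate`, and `NormalizedPredicate` are illustrative assumptions, not the shipped exports. A `Map` is used instead of a plain object so that inputs like `"constructor"` cannot hit inherited prototype keys.

```typescript
// Mirrors the Tier 1 STRM enum listed above.
type StrmPredicate =
  | "is_equivalent_to"
  | "is_approximate_to"
  | "is_broader_than"
  | "is_narrower_than"
  | "intersects_with"
  | "no_relationship";

// Hardcoded normalization table, one entry per documented SKOS predicate.
const SKOS_TO_STRM: ReadonlyMap<string, StrmPredicate> = new Map<string, StrmPredicate>([
  ["skos:exactMatch", "is_equivalent_to"],
  ["skos:closeMatch", "is_approximate_to"],
  ["skos:broadMatch", "is_broader_than"],
  ["skos:narrowMatch", "is_narrower_than"],
  ["skos:relatedMatch", "intersects_with"],
]);

interface NormalizedPredicate {
  predicate_id: StrmPredicate; // value written to Tier 1 frontmatter
  sssom_predicate: string; // original SSSOM value, kept for round-tripping
  warning?: string; // set only when the permissive fallback fires
}

function normalizePredicate(sssomPredicate: string): NormalizedPredicate {
  const strm = SKOS_TO_STRM.get(sssomPredicate);
  if (strm !== undefined) {
    return { predicate_id: strm, sssom_predicate: sssomPredicate };
  }
  // Unknown predicate: keep the edge, fall back to intersects_with, warn the user.
  return {
    predicate_id: "intersects_with",
    sssom_predicate: sssomPredicate,
    warning: `Unknown SSSOM predicate "${sssomPredicate}"; normalized to intersects_with`,
  };
}
```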
Preservation: the original SSSOM predicate is written to the `sssom_predicate` frontmatter field, so no information is lost and the import is round-trip-safe.
Why hardcode (vs config-file mapping): only 5 SKOS predicates; the mapping is canonical per SKOS Mapping Properties spec; making it user-configurable would invite inconsistency across vaults. Move to config in v0.2+ if a real user needs it.
Decision 2 — match_confidence numeric coercion deferred to v0.1.7+
Context: the Tier 1 schema requires `match_confidence` as `{ "type": "number", "minimum": 0, "maximum": 1 }`. The SSSOM `confidence` column arrives as a string and is parsed to a number by `parseFloat` in the parser. But the render template engine (`src/render/template.ts`) emits all frontmatter values as strings, so `match_confidence: "0.85"` would fail Tier 1 validation.
Decision: drop match_confidence from the synthetic recipe’s managed frontmatter for Phase 2. SSSOM confidence value is preserved as sssom_confidence (a non-validated extra field, stays a string). Tier 1 schema’s match_confidence field stays optional and unused for SSSOM-imported edges in Phase 2.
Follow-up trigger: when render template engine gets numeric coercion (templates emit numbers when the source value is a number), flip sssom_confidence → match_confidence and the field becomes Tier 1-validated. Tracked as v0.1.7+ work in TEST_PHASE2_SSSOM_IMPORT.md “known limitations”.
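The validation gap can be shown in a few lines. The sketch below makes three assumptions:
- `isValidMatchConfidence` is a hand-written stand-in for the Tier 1 JSON Schema rule, not the project's validator.
- `confidenceFields` models the Phase 2 workaround of storing the raw string as `sssom_confidence`.
- `renderFrontmatterValue` is one hypothetical shape for the v0.1.7+ coercion, where numbers pass through unstringified.

```typescript
// Stand-in for the Tier 1 rule { "type": "number", "minimum": 0, "maximum": 1 }.
function isValidMatchConfidence(value: unknown): boolean {
  return typeof value === "number" && Number.isFinite(value) && value >= 0 && value <= 1;
}

// Phase 2 workaround: keep the raw confidence string under a non-validated key.
function confidenceFields(rawConfidence: string | undefined): Record<string, string> {
  if (rawConfidence === undefined || rawConfidence.trim() === "") return {};
  return { sssom_confidence: rawConfidence.trim() };
}

// Possible v0.1.7+ coercion: numeric sources are emitted as numbers, everything else as strings.
function renderFrontmatterValue(source: unknown): string | number {
  return typeof source === "number" ? source : String(source);
}

// A stringified value fails the number check; a coerced number passes.
console.log(isValidMatchConfidence("0.85")); // false
console.log(isValidMatchConfidence(renderFrontmatterValue(parseFloat("0.85")))); // true
```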
Decision 3 — Synthetic recipe pattern (vs shipped SSSOM recipe template)
Context: SSSOM imports need a recipe to drive `generateFromRecipe`. Two options:
- A. Ship a pre-built recipe at `recipes/sssom/sssom-default.json` that users copy and customize per ontology pair
- B. Build the recipe in memory from the parsed SSSOM (the importer constructs a `Recipe` object from the detected source/target ontology pair)
Decision: option B (in-memory synthetic recipe in buildSyntheticRecipe(source, target) inside sssom-importer.ts). Reasons:
- SSSOM imports are mostly mechanical: the recipe is fully derivable from the SSSOM header (`subject_source`/`object_source`) plus content (CURIE prefixes), so no per-import customization is needed
- It keeps SSSOM-handling code in the importer module rather than scattering it across recipe templates
- Future tweaks to the SSSOM-import recipe shape don’t require users to migrate their copy of the recipe; the importer just rebuilds it
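Because the recipe is fully derivable from the ontology pair, building it is a pure function of `(source, target)`. The sketch below rests on two assumptions:
- `SyntheticCrosswalkRecipe` is a hypothetical subset of the real `Recipe` type, and the `managed_frontmatter` list is illustrative.
- The `slugify` helper is invented here to produce folder names like `_crosswalker/mappings/csf-to-iso27001/`.

```typescript
// Hypothetical subset of a Recipe; the real type has more (and possibly different) fields.
interface SyntheticCrosswalkRecipe {
  kind: "crosswalk-edge";
  id: string;
  source_ontology: string;
  target_ontology: string;
  output_folder: string;
  overwriteMode: "replace";
  managed_frontmatter: string[];
}

// Lowercase, collapse runs of non-alphanumerics to "-", trim leading/trailing dashes.
function slugify(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Pure function of the detected pair: same input, same recipe, nothing for users to migrate.
function buildSyntheticRecipe(source: string, target: string): SyntheticCrosswalkRecipe {
  const pair = `${slugify(source)}-to-${slugify(target)}`;
  return {
    kind: "crosswalk-edge",
    id: `sssom-${pair}`,
    source_ontology: source,
    target_ontology: target,
    output_folder: `_crosswalker/mappings/${pair}/`,
    overwriteMode: "replace", // idempotent re-imports
    managed_frontmatter: ["subject_id", "object_id", "predicate_id", "sssom_predicate", "sssom_confidence"],
  };
}
```

Since nothing here is persisted as a user-editable template, changing the recipe shape later only means changing this function.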
Trade-off: users who want a CUSTOM crosswalk-edge recipe (e.g., different folder structure, different managed fields) can still author one and import via the existing runImportFromRecipe path; the SSSOM modal is the convention-over-configuration default, not the only option.
Decision 4 — Eager closure precompute reuses lazy closureFromConcept
Context: Per Ch 35: “every production ontology-web system materializes precomputed pairwise crosswalks — nobody computes graph→tabular projections at query time.” So Phase 2 needs eager precompute on import. But v0.1.5 P3 already shipped a lazy closure path via `closureFromConcept` that builds and caches on first call.
Decision: precomputeClosureForOntologyPair in src/tier2/queries.ts doesn’t introduce a new closure-build path. It (a) selects all distinct subject_ids in the imported ontology pair, (b) loops over them calling closureFromConcept (which lazy-builds + caches), (c) reports the count of cache rows now populated.
Why this works: closureFromConcept is idempotent (cache-checks before recomputing). Calling it eagerly at import time just front-loads what would otherwise happen on first user query. Cleanest possible eager precompute — no schema change, no new code path, no logic duplication.
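The "eager = loop over lazy" pattern can be sketched against an in-memory stand-in for the Tier 2 store. Assumptions in this sketch:
- `InMemoryClosureStore` and `MappingRow` are hypothetical.
- A depth-first walk replaces the shipped recursive CTE.
- "Cache rows" are counted as one per (concept, reachable id) pair.

Only the three-step shape (a) distinct subjects, (b) call the lazy path, (c) report rows follows the description above.

```typescript
// Hypothetical row shape; the real Tier 2 mappings table has more columns.
interface MappingRow {
  subject_id: string;
  predicate_id: string;
  object_id: string;
  source: string; // ontology of subject_id
  target: string; // ontology of object_id
}

// In-memory stand-in for the Tier 2 store (the shipped code queries a SQL database).
class InMemoryClosureStore {
  private readonly mappings: readonly MappingRow[];
  // "<concept>|<predicate or *>" -> sorted ids reachable from that concept
  private readonly cache = new Map<string, string[]>();

  constructor(mappings: readonly MappingRow[]) {
    this.mappings = mappings;
  }

  // Analogue of SELECT DISTINCT subject_id ... WHERE source = ? AND target = ?
  distinctSubjects(source: string, target: string, predicate?: string): string[] {
    const subjects = this.mappings
      .filter((m) => m.source === source && m.target === target)
      .filter((m) => predicate === undefined || m.predicate_id === predicate)
      .map((m) => m.subject_id);
    return Array.from(new Set(subjects));
  }

  // Lazy path: build on first call, then serve from cache, so repeat calls are no-ops.
  closureFromConcept(conceptId: string, predicate?: string): string[] {
    const key = `${conceptId}|${predicate ?? "*"}`;
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;

    // Depth-first transitive walk, standing in for the recursive CTE.
    const reached = new Set<string>();
    const frontier: string[] = [conceptId];
    while (frontier.length > 0) {
      const current = frontier.pop()!;
      for (const m of this.mappings) {
        if (m.subject_id !== current) continue;
        if (predicate !== undefined && m.predicate_id !== predicate) continue;
        if (m.object_id !== conceptId && !reached.has(m.object_id)) {
          reached.add(m.object_id);
          frontier.push(m.object_id);
        }
      }
    }
    const result = Array.from(reached).sort();
    this.cache.set(key, result);
    return result;
  }

  // One row per (concept, reachable id) pair, like a closure table.
  cacheRowCount(): number {
    let rows = 0;
    this.cache.forEach((reachable) => {
      rows += reachable.length;
    });
    return rows;
  }
}

// Eager precompute that only loops over the lazy path; no second closure algorithm.
function precomputeClosureForOntologyPair(
  store: InMemoryClosureStore,
  source: string,
  target: string,
  predicate?: string,
): number {
  // (a) distinct subjects for the imported pair
  for (const subject of store.distinctSubjects(source, target, predicate)) {
    store.closureFromConcept(subject, predicate); // (b) builds or hits the cache
  }
  return store.cacheRowCount(); // (c) rows now populated
}
```

Calling the precompute a second time performs only cache hits and reports the same row count. This is the idempotence the paragraph above relies on.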
Trade-off: the per-subject loop makes O(N) calls, where N = distinct subjects. For a 1M-mapping import, that can mean up to 1M cache lookups plus recursive-CTE walks. Acceptable for v0.1.6 (target scale: ≤100K mappings per Ch 37 Yellow tier). v0.1.7’s per-ontology partitioning + bounded LRU closure cache (per Ch 37 deliverables) will replace this with a more efficient batched precompute when scale demands it.
What Phase 2 unblocks
| Cascade | What |
|---|---|
| Phase 3 (`crosswalkerPivot` Bases view) | Real SSSOM-imported data exists for the view to render against. The reference `.base` file shipped in Phase 3 can target `_crosswalker/mappings/csf-to-iso27001/` directly, and the launch-market Coverage Matrix recipe can be tested end-to-end. |
| Phase 4 (recipe-picker UX) | The SSSOM import command + modal pattern is the precedent for the recipe-picker modal — same Modal class, same setting-builder pattern. |
| Phase 5 (materialization command) | _crosswalker/mappings/ folder convention is set; _crosswalker/audit/ follows the same prefix. |
| v0.1.7 (per-ontology partitioning + scale work) | The eager closure precompute is the simple version; v0.1.7 swaps for a batched per-ontology precompute path. |
| v0.1.8 (audit trail) | Junction notes from SSSOM imports are committed to git as the audit-truth source per CQRS commitment #7. The materialized concept_closure is rebuildable cache; audit attestation hashes the junction notes, not the cache. |
Open follow-ups (tracked, not blocking)
- Numeric template coercion: the render template engine should emit numbers when the source value is numeric. Then flip `sssom_confidence` (string) → `match_confidence` (number, Tier 1-validated). v0.1.7+ work.
- E2E env diagnosis: RESOLVED. Backfilled 2026-05-10: WebdriverIO + wdio-obsidian-service runs against real Obsidian 1.12.7. `tests/e2e/sssom-import.spec.ts` ships 7 real E2E tests verifying command registration, the `plugin.precomputeClosure` handle, TSV → 5 junction notes round-trip, STRM normalization in frontmatter, Tier 2 `mappings` table population, and `closure_cache` eager precompute. `bun run e2e` confirms 17/17 spec files pass (full suite, ~18 min sequential run).
- Incremental refresh on SSSOM file change: Phase 2’s idempotent re-import handles this manually (the user re-runs the import); a file watcher that auto-detects `.sssom.tsv` changes is v0.1.7 work per the milestone page deferral.
- `match_confidence` numeric: see follow-up #1.
- CURIE-prefix-vs-header source/target precedence: the importer uses header `subject_source`/`object_source` first and falls back to the first-row CURIE prefix, as documented in `detectOntologyPair` in `src/import/sssom-parser.ts`. Edge case: if the two disagree, the header wins. Not exercised by current tests; add a fixture in v0.1.7 if any user reports an ambiguous case.
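The precedence rule in the last follow-up can be sketched as follows. This is a hypothetical stand-in, not the shipped `detectOntologyPair`. It also assumes that each side falls back independently: a header with only `subject_source` still takes the target from the first row's object CURIE. The shipped function may treat a partial header differently.

```typescript
interface OntologyPair {
  source: string;
  target: string;
}

// "NIST_CSF:PR.AC-1" -> "NIST_CSF"; undefined when there is no prefix.
function curiePrefix(curie: string): string | undefined {
  const colon = curie.indexOf(":");
  return colon > 0 ? curie.slice(0, colon) : undefined;
}

// Header values win; each side falls back to the first row's CURIE prefix on its own.
function detectOntologyPair(
  header: Record<string, string | undefined>,
  firstRow?: { subject_id: string; object_id: string },
): OntologyPair | undefined {
  const source = header["subject_source"] || (firstRow ? curiePrefix(firstRow.subject_id) : undefined);
  const target = header["object_source"] || (firstRow ? curiePrefix(firstRow.object_id) : undefined);
  return source && target ? { source, target } : undefined;
}
```

A v0.1.7 fixture for the disagreement case would pair a header like `subject_source: CSF` with rows whose CURIEs use a different prefix. It would then assert that `CSF` is chosen.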
Related
Decision logs that drove Phase 2:
- Synthesis log §1 + §3.5 — D1 “Ch 35 nuance” lock — locked 2026-05-09 per user direction “might as well do it earlier”; SSSOM TSV import + materialized closure-table promoted from v0.1.8 to v0.1.6
- Settled #2 + #16 — Tier 3 SPARQL stack reframe (Oxigraph + Fuseki); Phase 2 stays in Tier 1+2 per CQRS
Research deliverable:
- Ch 35: Graph→tabular bridging rerun (Ch 10). Every production ontology-web system materializes precomputed pairwise crosswalks; nobody computes N×N at query time, and Crosswalker should not either. Ch 35 §6 “five non-GRC ontology-web archetypes all reduce to existing helpers”: Phase 2 confirms this (the importer reuses `crosswalkBetween` + `closureFromConcept` without new primitives).
Concept pages referenced in code:
- System architecture Layer 3 (Projection T1 → T2) — where Phase 2 lives in the pipeline
- Hierarchy primitives — junction-note shape + crosswalk-edge model
- Terminology — STRM — STRM predicate vocabulary the SKOS→STRM normalization targets
Prior phases:
- v0.1.6 Phase 1 (recipe `query:` block schema): schema bump that Phases 2-5 build on
- v0.1.6 Phase 1.5 (test infrastructure): deterministic fixtures + drift CI gate; the Phase 2 fixture follows the deterministic-timestamp convention
Adjacent milestones:
- v0.1.5 — Tier 2 sidecar — Phase 2 reuses the projector + query helpers + closure cache shipped in v0.1.5 P3
- v0.1.6 milestone hub — Phase 3 next
- v0.1.7 — Exporters — SSSOM-shaped data Phase 2 produces is what v0.1.7 exporters serialize back out
Spec files:
- `spec/recipe.schema.json`: `query:` block (Phase 1) + crosswalk-edge layout (Phase 2 reuses)
- `spec/tier1.schema.json`: STRM predicate enum that the SKOS→STRM normalization targets
Manual test guide:
- `TEST_PHASE2_SSSOM_IMPORT.md`: 7 scenarios for end-user verification