Foundation research synthesis — testing our decisions against 4 parallel research sessions
Why this log exists
The project is in its Foundation phase, and decisions made here calcify fast. Before we start pouring concrete, we pointed four fresh research agents at the current architecture, with no prior context bias and each with a different angle:
- First-principles roadmap critique — stress-test the technical roadmap against known problem-domain literature
- Formal primitives audit (the long one, ~94KB) — ground our primitives in information science, category theory, BFO, FCA, SSSOM, OAEI literature
- Resilient knowledge work primitives — what makes a knowledge structure last 20+ years
- Accessibility pass — same rigor, but explicitly asked to keep it anchored to things a non-specialist can hold onto
All four sit in .workspace/ (local, gitignored). This log is the distillation — what converged, what pushed back, what new concepts showed up, and what we now need to decide.
What we tested against
The Foundation decisions under review:
- The 9 ontology diff primitives — see atomic operations research (6 GED atoms refined to 9, with 4 composites)
- The EvolutionPattern taxonomy — see evolution pattern draft
- File-first / progressive tiers — see why Obsidian, why files and layered architecture vision
- User-first entity-aligned maintenance UX — see user-first ontology maintenance
- Pluggable detection & decisioning layers — see primitives depth and pluggable layers
Convergences — where the research agrees with Foundation
The 9 primitives are sound
All four research sessions, independently, converged on the same answer: the 6 mathematical atoms of Graph Edit Distance theory (Bunke & Allermann 1983, Sanfeliu & Fu 1983) are provably complete for transforming any labeled directed graph into any other — and our 9-primitive set is a valid refinement of those 6, splitting the two “substitution” operations into semantically distinct sub-cases (id change vs. property change vs. type change).
Plain-English anchor: If you had a Lego model and wanted to turn it into a different Lego model, the only things you can do are add a brick, remove a brick, swap a brick, or change how two bricks are connected. That’s it. The math guarantees those six moves are enough. We kept all six, then split “swap a brick” into three kinds because renaming a control (id change), editing its description (property change), and reclassifying it from detective to preventive (type change) mean very different things to a compliance user, even though the math doesn’t care.
The 4 composite operations (node.moved, subgraph.merged, subgraph.split, hierarchy.restructured) also got formal backing: Klein (2004) calls these “complex changes” and proved the set is infinite (you can always define new compositions). Our decision to recognize only 4 and keep the list open is formally correct.
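For concreteness, here is a minimal sketch of the primitive set as an enum. The member names, and the exact three-way node split plus two-way edge split of the two substitution atoms, are our illustrative reading rather than canonical identifiers:

```python
from enum import Enum

class Primitive(Enum):
    """Sketch of the 9 diff primitives (names illustrative, not canonical).

    The 6 GED atoms are add/remove node, add/remove edge, and substitute
    node/edge label. We keep all six moves, then split node substitution
    into three sub-cases and edge substitution into two.
    """
    NODE_ADDED = "node.added"
    NODE_REMOVED = "node.removed"
    NODE_ID_CHANGED = "node.id_changed"              # substitution: rename
    NODE_PROPERTY_CHANGED = "node.property_changed"  # substitution: edit
    NODE_TYPE_CHANGED = "node.type_changed"          # substitution: reclassify
    EDGE_ADDED = "edge.added"
    EDGE_REMOVED = "edge.removed"
    EDGE_PROPERTY_CHANGED = "edge.property_changed"
    EDGE_TYPE_CHANGED = "edge.type_changed"

# Composites are recognized patterns over primitive sequences; the set is
# deliberately open-ended (Klein 2004 proved it is infinite).
COMPOSITES = ("node.moved", "subgraph.merged", "subgraph.split",
              "hierarchy.restructured")

assert len(Primitive) == 9 and len(COMPOSITES) == 4
```

The point of the enum shape: every detected change is exactly one of nine values, and composites live in a separate, growable list rather than in the closed core.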
Also confirmed: The property graph model (nodes + edges + properties on both) is the right formal substrate. Unlike RDF’s triple model, property graphs allow metadata directly on edges — essential for crosswalks where a mapping carries confidence scores, rationale, and provenance. The new ISO GQL standard (ISO/IEC 39075:2024) enshrines property graphs as the international standard. This is Lindy-compatible.
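A minimal sketch of what “metadata directly on edges” buys us, with invented IDs and field names:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    type: str
    properties: dict = field(default_factory=dict)  # properties on nodes...

@dataclass
class Edge:
    source: str
    target: str
    type: str
    properties: dict = field(default_factory=dict)  # ...and on edges too

# In RDF's triple model, an edge carrying its own metadata needs
# reification (a blank node plus extra triples). In a property graph the
# confidence, rationale, and provenance simply live on the edge.
mapping = Edge(
    source="nist:AC-2",
    target="iso:A.9.2.1",
    type="maps_to",
    properties={"confidence": 0.85,
                "rationale": "both govern account provisioning",
                "author": "analyst@example.org"},
)

assert mapping.properties["confidence"] == 0.85
```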
File-first is valid at current scale
The resilient-knowledge-work research found that sixty years of information science converge on the same architecture when the criteria are human readability, version controllability, extensibility, and multi-decade resilience:
- Ranganathan’s faceted classification (Colon Classification, 1933) — the “break it into independent tags” idea. The point of this bullet: our YAML frontmatter structure inherited a 90-year-old library-science property that makes it safe to extend forever without reorganizing. Here’s why that matters:
  - The old way (rigid trees): pick one giant pre-made category tree and force every item into exactly one slot. “19th-century French landscape painting” becomes a single drawer you have to find. If you later decide you care about the artist’s mood, you have to rebuild the tree — every drawer needs a new subdivision, and all your old cross-references break.
  - Ranganathan’s way (independent labels): break the same item into small independent labels and combine them at query time: `era: 19th century` + `country: France` + `medium: painting` + `subject: landscape`. Now you can ask for “all French things,” “all landscapes,” or any combination on the fly.
  - The key property: adding a new label later never touches the old labels. If next week you want to track `mood: melancholy`, you just start writing it on new items. Nothing reshuffles. Old queries keep working unchanged because they never looked at `mood` in the first place. Ranganathan called this hospitality — the system is “hospitable” to new information.
  - Why this matters for Crosswalker: YAML frontmatter keys are facets. Every `type:`, `framework:`, `control_id:`, `reviewer:` field on a note is an independent facet. When we want to track a new property on controls (say, `quantum-resistant: true` in 2029), we just start adding it. No schema migration, no reshuffling folders, no breaking existing queries.
  - And to address the obvious question: this is not the same as “deprecate but never delete” (which is Protobuf’s schema-evolution discipline for binary wire formats). Hospitality is a step more fundamental — because new facets can’t collide with old ones, there’s nothing to deprecate in the first place. You just grow the vocabulary. The old fields aren’t “legacy kept alive for compatibility”; they’re simply facets you still happen to use.

  This is the deepest reason the markdown + YAML file-first architecture is likely to age well — we got a 90-year-old library-science property for free just by picking plain text files with key-value frontmatter.
- Luhmann’s Zettelkasten — 90,000+ file-per-node cards maintained for 30 years, produced 70+ books. File-first scales to serious intellectual work without a database.
- Markdown + YAML + WikiLinks vs OWL+RDF+SPARQL — the “simple plain-text stack vs. formal Semantic Web stack” tradeoff. Richard Gabriel’s “Worse is Better” (1991) predicted exactly this outcome:
  - The “worse” stack wins on adoption. Markdown notes with YAML frontmatter and `[[wiki links]]` are technically less expressive than OWL+RDF+SPARQL (no Description Logic reasoner, no SPARQL query engine, no formal axioms). But the plain-text stack is radically simpler to implement, adopt, and read. Obsidian has 1.5M+ monthly active users; formal ontology tooling sits at roughly 27% production deployment even among organizations that bought into the Semantic Web. This is exactly why we picked Obsidian + files in the first place — adoption and human-readability beat formal expressiveness in practice.
  - We don’t lose the formal stack — we build a bridge when we need it. The research flagged two mature bridge technologies: YAML-LD (W3C Community Group Final Report, December 2023) defines conventions for serializing Linked Data as YAML on top of JSON-LD syntax, and LinkML provides a single YAML schema language that compiles to JSON Schema, OWL, SHACL, and SQL DDL. So if we ever need to export a vault to RDF for interoperability with institutional ontology tooling, or hand a SHACL schema to a compliance auditor, the path exists — without abandoning files as source-of-truth. See the new formal concepts reference table below for pointers.
  - The deeper design principle. This is the same “worse is better” bet Crosswalker makes everywhere: human-editable files first, machine-queryable layers built on top, formal-export bridges reserved for the rare moments they’re actually needed. See what makes Crosswalker unique for the philosophical pillar this connects to, and the property graph callout above for how the same logic extends the claim all the way down to storage (files canonical + Tier 2 sidecar for edge-property queries).
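The hospitality property from the faceted-classification bullet above is easy to demonstrate. A sketch, with frontmatter modeled as plain dicts and all field names illustrative:

```python
# Frontmatter as independent facets: each key is a facet, and a query
# only inspects the facets it names.
notes = [
    {"type": "control", "framework": "NIST-800-53", "control_id": "AC-2"},
    {"type": "control", "framework": "CIS", "control_id": "5.1"},
]

def query(notes, **facets):
    """Return notes matching every given facet; notes lacking a key never match."""
    return [n for n in notes
            if all(n.get(k) == v for k, v in facets.items())]

old_result = query(notes, framework="NIST-800-53")

# Hospitality: a brand-new facet appears on NEW notes only...
notes.append({"type": "control", "framework": "NIST-800-53",
              "control_id": "AC-99", "quantum_resistant": True})

# ...and old queries keep working unchanged, because they never looked
# at the new facet in the first place.
assert query(notes, framework="NIST-800-53")[0] == old_result[0]
assert len(query(notes, quantum_resistant=True)) == 1
```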
Pluggable layers match the research
Our “pluggable detection + pluggable decisioning” separation (see primitives depth log) lines up with Javed, Abgaz & Pahl’s (2009) four-layer change operator model: atomic operations → composite operations → domain-specific patterns → domain-specific complex patterns. Our “math atoms → detectors → decisioners → handlers” stack is the same structure under different names. Good sign.
Challenges — where the research pushes back
Files-canonical ceiling — documented, already answered by the 3-tier pillar
The first-principles critique confirmed a ceiling we’d already identified: files-as-source-of-truth is strategically correct, but Tier 1 (pure files + in-vault queries) has finite headroom. This is not a surprise finding and not a crisis — it’s the exact reason the Progressive tier architecture is a Foundation pillar on the roadmap. We’re documenting the ceiling here so the research on record agrees with our existing plan.
What the ceilings look like:
- Obsidian’s graph view caps usefully around 25K nodes
- The V8 engine inside Electron caps at ~4 GB heap
- A full NIST-800-53 × CIS × ISO × MITRE crosswalk is ~210,000 potential mapping pairs to evaluate
- Any in-vault query layer doing linear scans over YAML frontmatter degrades around 3–5K notes (the research literature cites this as a Dataview number — see the expandable note below on why that’s a misleading framing for us)
Note: we are not building on Dataview (recurring research-agent confusion)
The underlying research reports all cite Dataview’s 3–5K note ceiling as the “Obsidian query layer” limit. Crosswalker is not building on Dataview. Dataview is deprecated, and we explicitly chose not to take on that dependency.
The roadmap’s Obsidian Bases direction research Foundation item captures the actual plan: build the viewing/querying layer on top of Obsidian Bases (the native successor to Dataview-style queries), with Datacore as a backup to investigate if Bases turns out to be insufficient.
What this means for the ceiling claim: the 3–5K note figure is a reasonable first estimate for any in-vault query engine doing linear scans over YAML frontmatter — Bases is likely to face similar orders-of-magnitude limits until we do our own benchmarks against it. The Tier 2 sql.js sidecar argument stands either way (it rescues query performance at scale regardless of which viewing layer we pick). The specific number should be re-measured against Bases as part of the Foundation research item, not inherited from Dataview benchmarks.
Research items to reconcile: Obsidian internals research (04-04) and the roadmap’s Bases direction item.
Plain-English anchor: Think of a filing cabinet. At 500 folders it’s fine. At 5,000 it’s still workable if you have an index card drawer. At 50,000 you’re overturning the cabinet every time you want to find something. At that point you need a librarian sitting beside the cabinet who keeps a notebook of where everything is — that’s Tier 2. The files are still the source of truth, but the librarian makes them queryable. Our current roadmap has Tier 2 as “when needed” — the research says we need to commit to when it activates, because the ceiling is closer than it looks.
This is not a new challenge — it’s the exact ceiling the Progressive tier architecture pillar is committed to handling. Files stay canonical at all three tiers; what changes is the machine-queryable index layer sitting beside them: Tier 1 (files only + validation) → Tier 2 (files + sql.js WASM sidecar for property-graph queries) → Tier 3 (files + server: PocketBase / Postgres). The research’s contribution is narrower than “we need a new plan” — it’s that Tier 2’s activation threshold should be explicit and designed rather than emergent. See the Decisions this forces section below for the specific Tier 2 cutoff question, and the property graph callout above for why the sidecar is also what makes first-class edge properties practically queryable.
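To make “explicit and designed” concrete, here is one possible shape for the Tier 2 activation rule, combining the three candidate triggers (user-toggled, note-count, latency-adaptive). Every number below is an illustrative placeholder, not a committed cutoff:

```python
# Illustrative thresholds only; the real cutoffs are exactly the open
# question in the "Decisions this forces" section.
NOTE_COUNT_SOFT = 3_000    # where in-vault linear scans start degrading
NOTE_COUNT_HARD = 10_000
LATENCY_BUDGET_MS = 200    # acceptable interactive query latency

def should_activate_tier2(note_count: int, p95_query_ms: float,
                          user_forced: bool = False) -> bool:
    """Decide whether to spin up the sql.js sidecar index."""
    if user_forced:                    # user-toggled trigger
        return True
    if note_count >= NOTE_COUNT_HARD:  # note-count trigger
        return True
    # Adaptive trigger: in the soft zone, only activate once queries
    # measurably exceed the latency budget.
    return note_count >= NOTE_COUNT_SOFT and p95_query_ms > LATENCY_BUDGET_MS

assert not should_activate_tier2(500, 20.0)     # small vault: files only
assert should_activate_tier2(4_000, 450.0)      # soft zone + slow queries
assert should_activate_tier2(12_000, 50.0)      # past the hard ceiling
```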
The synthetic spine is the missing architectural insight
This is the single most transformational finding in all four research sessions. It showed up independently in three of them:
Every mature compliance meta-framework — SCF (~1,300 controls across 175+ frameworks), the DESM (Data Exchange Standards Mapper), Hyperproof’s topic-based mapping, UCF’s ~10,000 Common Controls — converges on the same architecture: instead of maintaining O(n²) pairwise mappings between every framework pair, each framework maps once to a canonical intermediate representation (the “spine”), and cross-framework mappings are derived transitively through the spine.
Plain-English anchor: Imagine translating between every pair of 50 languages. You’d need 1,225 dictionaries. Or — you pick one pivot language (say, a simplified Esperanto), translate each of the 50 into it once, and now to translate from Korean to Swahili you just go Korean → pivot → Swahili. You need only 50 dictionaries instead of 1,225. The research says: pick a pivot for compliance controls. Candidates include SCF’s ~1,300 canonical controls, an OSCAL-based catalog, or a Crosswalker-authored canonical. Category theory backs this formally — alignments are “spans”, merging through a pivot is the “pushout” construction, and it’s provably the best-possible structure-preserving transformation (Spivak’s functorial data model).
This is a concept we don’t currently have a formal position on. The roadmap talks about crosswalks as pairwise edges. The research is saying: that’s the O(n²) trap, and there’s a proven architectural alternative that every mature system has converged on independently. This is the biggest open architectural question surfaced by the research — now tracked as the roadmap item “Pairwise crosswalks vs synthetic spine architecture” and the fresh-agent research challenge 06, which also goes deep on the long-term resilience and audit-grade trustworthiness questions that have to be answered before any spine can be committed to.
Is a synthetic spine the same as a crosswalk?
Short answer: no. They’re related but sit at different levels of the stack.
- A crosswalk is a mapping artifact between two specific ontologies. “NIST 800-53 AC-2 maps to ISO 27001 A.9.2.1 with justification X” is a crosswalk entry. It’s an edge — a statement about two concrete things that already exist. Crosswalks are what Crosswalker is named after and what it produces.
- A synthetic spine is not a crosswalk — it’s an architectural choice for how you generate and maintain crosswalks at scale. Instead of authoring N×(N-1)/2 direct crosswalks between every pair of frameworks, you author a canonical intermediate ontology (the “spine”), map each framework to the spine once (N mappings instead of N²), and let cross-framework crosswalks be derived transitively through the spine.
The practical difference:
| | Pairwise crosswalks | Synthetic spine |
|---|---|---|
| What you author | A↔B, A↔C, B↔C, … (N² edges) | A→spine, B→spine, C→spine, … (N mappings) |
| When framework A updates | Re-review every crosswalk involving A | Re-review only A’s mapping to the spine |
| Consistency guarantee | Pairs can silently disagree (A→B ∘ B→C ≠ A→C) | Transitivity enforced by construction |
| Example in the wild | Hand-curated NIST↔ISO spreadsheets | SCF (~1,300 controls as pivot), DESM, UCF |
So which does Crosswalker make? Crosswalks — that’s the product. The spine question is how we architect the production of those crosswalks at scale. A vault could contain both: framework-to-spine mappings (efficient to maintain) and transitively-derived framework-to-framework crosswalks (what the user actually queries against).
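Both the arithmetic and the transitive derivation are small enough to sketch. A toy version, with invented control IDs and a single shared spine control:

```python
from math import comb

# Maintenance burden: direct pairwise vs. through a spine.
n_frameworks = 50
assert comb(n_frameworks, 2) == 1225  # pairwise crosswalks to author
assert n_frameworks == 50             # spine mappings to author instead

# Each framework maps once to the spine (framework control, spine control).
to_spine = {
    ("nist:AC-2", "spine:ACCT-MGMT"),
    ("iso:A.9.2.1", "spine:ACCT-MGMT"),
    ("cis:5.1", "spine:ACCT-MGMT"),
}

def derive_crosswalk(to_spine, prefix_a, prefix_b):
    """Derive an A<->B crosswalk by joining on the shared spine control."""
    a = {(c, s) for c, s in to_spine if c.startswith(prefix_a)}
    b = {(c, s) for c, s in to_spine if c.startswith(prefix_b)}
    return {(ca, cb) for ca, sa in a for cb, sb in b if sa == sb}

# The NIST<->ISO crosswalk falls out without ever being authored directly.
assert derive_crosswalk(to_spine, "nist:", "iso:") == {("nist:AC-2", "iso:A.9.2.1")}
```

Consistency comes for free in this shape: because every derived pair routes through the same spine control, A→B and B→C can never silently disagree with A→C.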
For more context on where crosswalks, frameworks, interchange formats, and canonical ontologies sit in the broader ecosystem, see the institutional landscape page (who creates, maps, mandates, and consumes each of these) and the operational landscape page (which explicitly distinguishes “frameworks themselves vs crosswalks vs evidence vs interchange formats” as different resource classes with different update cadences).
Edge semantics should be set-theoretic, not SKOS-ish
Where this sits in the bigger picture. This is the decision about what vocabulary every Crosswalker crosswalk edge must speak. It’s the ground-level instantiation of the broader interlingua / pivot approach pattern — also called a pivot ontology or meta-framework — where every framework maps once to a canonical intermediate instead of O(n²) pairwise. The mapping organizations that already operate at that scale — SCF with its ~1,300-control STRM bundle, NIST OLIR, CTID — have each committed to a specific edge-type vocabulary. Crosswalker’s question is which one to adopt.
The schema matching concept page already documents most of this terrain — specifically its NIST OLIR formal relationship types section (the 5 set-theory relationships, with domain examples) and its SSSOM section (the metadata envelope). This log’s contribution is not to reinvent that content, it’s to tie those pieces to an explicit Foundation commitment: STRM as the required predicate vocabulary, SSSOM as the required metadata envelope, SKOS rejected as the base (kept only as an export format via the YAML-LD / LinkML bridge).
Right now we don’t have a locked-in vocabulary for crosswalk edge types. The research is unambiguous: SKOS’s 5 mapping relations (exactMatch, closeMatch, broadMatch, narrowMatch, relatedMatch) are insufficient for compliance crosswalking. They lack confidence scores, provenance, many-to-many support, and negation.
The literature converges on 5 set-theory relationships from NIST IR 8477 (February 2024), the SCF’s STRM methodology, and OSCAL’s Control Mapping Model — three independent efforts that picked the same vocabulary:
Plain-English anchor: Forget SKOS’s vague “closeMatch” / “relatedMatch” — think in terms of what the two things actually cover. Equivalent = same requirement. Subset = framework A’s control is a narrower version of framework B’s. Superset = A covers everything B covers and more. Intersects = they overlap but neither contains the other. No-relationship = genuinely unrelated. That’s the whole vocabulary, and it’s the same one that NIST, SCF, and OSCAL’s Control Mapping Model all independently arrived at.
Technically, STRM is the edge-type vocabulary (the 5 allowed values for a crosswalk edge’s predicate_id slot) and SSSOM is the row-schema envelope (the required + optional metadata fields every edge carries alongside that predicate). They sit at different layers and work together, not as alternatives — the edge semantics stack entry in terminology and the schema matching concept page both define this distinction formally. (Expand the collapsible callout below for a worked example with both layers filled in.)
Each ontology-to-ontology edge then carries SSSOM-style metadata (Matentzoglu et al., Database, 2022): mapping_justification (mandatory), confidence score, author_id, mapping_date, mapping_tool, and predicate_modifier for negation. This is the edge model our crosswalks (but not our evidence links) should commit to.
How STRM and SSSOM fit together in practice — they are not alternatives, they are layers
This is the part that gets confusing on first read: STRM is not a replacement for SSSOM, and SSSOM is not a replacement for STRM. They’re two different layers of the same edge, and you use them together.
The envelope/content analogy. Think of a shipping manifest form. The form itself has mandatory fields — sender, recipient, declared contents, weight, date shipped — with rules about which are required. That’s SSSOM: the row schema, the “what fields must every crosswalk carry.” Then one specific field on that form (“declared contents category”) has a controlled vocabulary — electronics, liquid, fragile, perishable. That’s STRM: the allowed values for the predicate_id field only. Without the form, you have nothing to write on. Without the vocabulary, the most important field is free-text slop that can’t be audited.
What a real Crosswalker edge looks like with both layers filled in:
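One plausible shape, shown as a record with invented values; the field names follow SSSOM, and the `strm:` CURIEs are illustrative rather than a published prefix:

```python
# SSSOM envelope fields plus the one STRM-constrained slot.
STRM_PREDICATES = {"strm:equivalent-to", "strm:subset-of", "strm:superset-of",
                   "strm:intersects-with", "strm:no-relationship-to"}

edge = {
    "subject_id": "nist:AC-2",
    "object_id": "iso:A.9.2.1",
    "predicate_id": "strm:subset-of",    # STRM's vocabulary lives here only
    "mapping_justification": "semapv:ManualMappingCuration",  # mandatory
    "confidence": 0.85,
    "author_id": "orcid:0000-0000-0000-0000",
    "mapping_date": "2025-04-09",
}

def validate(edge):
    # SSSOM requires the justification; STRM constrains the predicate.
    assert edge.get("mapping_justification"), "SSSOM: justification is mandatory"
    assert edge["predicate_id"] in STRM_PREDICATES, "not a STRM predicate"

validate(edge)
```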
- Every field except `predicate_id` is an SSSOM field. SSSOM is predicate-agnostic — it doesn’t care what vocabulary you use for the `predicate_id`; it just requires there be one, and that the other envelope fields travel with it for audit.
- The `predicate_id` value itself (`strm:subset-of`) is from STRM’s 5-relationship vocabulary. If you picked SKOS instead, it would be `skos:narrowMatch` — same slot, different vocabulary.
- Together they give you auditable, precisely-typed crosswalks: STRM gives the compliance auditor confidence that “subset-of” means exactly one thing mathematically; SSSOM gives them the justification, confidence, author, and date needed to trust the assertion.
Where this decision lives:
- Roadmap pillar: Crosswalk edge semantics commitment (STRM + SSSOM) in Foundation phase — this is the commitment we have to lock in before any crosswalk authoring features ship
- Registry entries (canonical facts): SKOS · SSSOM · STRM
- Bigger-picture concept: schema matching — the interlingua / pivot approach shows where these edge types sit in the larger crosswalking workflow
- Terminology: interlingua / pivot entry
- Related research angle: the synthetic spine question in Challenge 06 asks how STRM+SSSOM edges compose when crosswalks are derived transitively through a pivot rather than authored directly — does `subset-of ∘ subset-of = subset-of`? That’s the composition math we’ll need to answer.
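A first-pass sketch of that composition math, using naive set algebra. Entries set theory alone does not determine are left as "unknown", which is exactly what a derived crosswalk would have to flag for human review (this table is our reading, not a settled answer):

```python
# Compose STRM relations A~S and S~B into a candidate A~B relation.
EQ, SUB, SUP, INT, NO = ("equivalent", "subset", "superset",
                         "intersects", "no-relationship")

COMPOSE = {
    # Equivalence is the identity for composition.
    (EQ, EQ): EQ, (EQ, SUB): SUB, (EQ, SUP): SUP, (EQ, INT): INT, (EQ, NO): NO,
    (SUB, EQ): SUB, (SUP, EQ): SUP, (INT, EQ): INT, (NO, EQ): NO,
    # Containment chains compose.
    (SUB, SUB): SUB,   # A⊆S and S⊆B  implies  A⊆B
    (SUP, SUP): SUP,   # A⊇S and S⊇B  implies  A⊇B
    # Disjointness propagates down a subset chain.
    (SUB, NO): NO,     # A⊆S and S∩B=∅  implies  A∩B=∅
}

def compose(r1, r2):
    """Look up the derived relation; anything not provable is 'unknown'."""
    return COMPOSE.get((r1, r2), "unknown")

assert compose(SUB, SUB) == SUB
assert compose(SUB, SUP) == "unknown"  # both A and B inside S: no conclusion
```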
EvolutionPattern needs formal grounding — or replacement
The EvolutionPattern taxonomy classifies how a framework evolves (stewardship profile, cadence, backwards compatibility, etc.) so we can set sensible default handling strategies. The research didn’t invalidate it, but it surfaced a sharper question we’d already been circling (see roadmap research item):
EvolutionPattern taxonomy: a per-framework profile. “NIST evolves like X, so expect Y when a new version drops.”
- Pros: works before any new version exists; sets defaults for unknown changes; portable across frameworks in the same class
- Cons: predictive, so can be wrong; needs formal grounding (Stojanovic 2004 evolution ontology, Flouris 11 change tasks); coarse-grained vs. per-version reality

Transformation recipe: a per-version-transition record. “NIST r5→r6 renamed these 12 IDs, merged these 3, split this one.”
- Pros: grounded in actual changes, not prediction; auditable — every change has provenance; reuses our 9 primitives directly
- Cons: only exists after a new version is released; doesn’t help set defaults for first-time imports; per-transition labor cost
The research’s verdict: keep EvolutionPattern, but treat it as a default-setter that transformation recipes can override when actual version deltas become available. Both layers coexist — EvolutionPattern sets expectations before a new version; the recipe records what actually happened. And EvolutionPattern itself needs formal grounding in Stojanovic’s (2004) evolution ontology (capture → represent → semantics → implement → propagate → validate) and Flouris et al.’s (2008) 11-task classification. Currently it’s a draft taxonomy with no literature anchor.
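The “default-setter that recipes override” resolution order can be sketched directly. All strategy names, control IDs, and dict shapes below are invented for illustration:

```python
# EvolutionPattern = framework-level defaults, set BEFORE a new version
# exists; transformation recipe = per-transition ground truth, recorded
# AFTER the version ships and overriding the defaults where it applies.
pattern_defaults = {            # "NIST evolves like X, so expect Y"
    "node.id_changed": "auto-apply-with-alias",
    "node.removed": "flag-for-review",
}

recipe = {                      # what r5 -> r6 ACTUALLY did, once known
    ("AC-2", "node.id_changed"): "auto-apply-with-alias",
    ("SC-7", "node.removed"): "merge-into:SC-7(1)",  # recipe knows better
}

def handling_strategy(control_id: str, primitive: str) -> str:
    """Recipe wins when a real delta exists; the pattern fills the gaps."""
    return recipe.get((control_id, primitive),
                      pattern_defaults.get(primitive, "flag-for-review"))

assert handling_strategy("SC-7", "node.removed") == "merge-into:SC-7(1)"
assert handling_strategy("AC-1", "node.removed") == "flag-for-review"
```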
New formal concepts the research introduced
These are concepts that weren’t in our Foundation vocabulary but the research leans on heavily. Each deserves a short definition now so future logs and concept pages can cite them.
| Concept | Source | One-line definition | Relevance to Crosswalker |
|---|---|---|---|
| Graph Edit Distance (GED) | Bunke & Allermann 1983, Sanfeliu & Fu 1983 | The minimum-cost sequence of atomic graph operations to transform one labeled graph into another | Proves our 9 primitives are complete |
| Algebraic graph transformation / DPO | Ehrig, Prange & Taentzer 2006 | Formalizes graph rewrite rules via Double Pushout category-theoretic constructions | Each of our 9 primitives maps to a DPO production rule |
| Category theory (spans, functors, pushouts) | Spivak, Kent — ologs | Math of structure-preserving mappings between categories; alignments are spans, merges are pushouts | Formal basis for the synthetic spine and transitive crosswalks — see Challenge 06 |
| Formal Concept Analysis (FCA) | Ganter & Wille | Given a cross-table of objects × attributes, computes the unique concept lattice of all co-occurrence patterns | Could auto-discover implicit control equivalences from data — a candidate mechanism for spine distillation in Challenge 06 |
| SSSOM | Matentzoglu et al. 2022 (Database) | Simple Standard for Sharing Ontological Mappings — SKOS predicates + mandatory justification, confidence, author, date, tool | The metadata model our crosswalk edges should adopt — tracked in the edge semantics roadmap item |
| NIST IR 8477 / STRM | NIST Feb 2024 | Set Theory Relationship Mapping: equivalent, subset, superset, intersects, no-relationship | The 5 edge-type primitives for compliance crosswalks — tracked in the edge semantics roadmap item |
| BFO (Basic Formal Ontology) | ISO/IEC 21838-2:2021 | 36-class ISO-standardized upper ontology; baseline for DOD/IC since Jan 2024 | Its continuants/occurrents distinction clarifies “framework as entity” vs. “framework revision as event” — see decision #5 below |
| Stojanovic’s evolution ontology | Stojanovic 2004 (KAON) | Models ontology changes as first-class entities with a six-phase lifecycle | Should ground our EvolutionPattern taxonomy — tracked in the EvolutionPattern vs transformation recipes roadmap item |
| Flouris 11 change tasks | Flouris et al. 2008 | Taxonomy distinguishing evolution, versioning, integration, and alignment as formally separate problems | Clarifies that our “evolution” work is actually four different problems — see user-first maintenance log |
| Ranganathan PMEST / faceted classification | Colon Classification, 1933 | 5 fundamental facets — Personality, Matter, Energy, Space, Time — for decomposing any subject | The deepest root for why YAML frontmatter works as a facet system — see file-first is valid at current scale |
| Synthetic spine / hub-and-spoke mapping | SCF, DESM, UCF convergence | Map each framework to a canonical intermediate, derive cross-framework mappings transitively | The biggest architectural insight — see the synthetic spine section above, the pairwise-vs-spine roadmap item, and Challenge 06. Related concept: schema matching — interlingua / pivot approach |
| YAML-LD / LinkML | W3C CG Final Report 2023; LinkML project | Bridges from YAML frontmatter to full RDF/OWL/SHACL | Our escape hatch to formal Semantic Web tooling without abandoning files — see the Markdown + YAML vs OWL+RDF+SPARQL bullet |
| Content-addressable versioning | Git, IPFS, Nix, Dolt | Identify versions by SHA of content, not by central numbering | Candidate versioning model for framework snapshots — see Versioning model in Next research items below |
Decisions this forces
The research doesn’t make these decisions — it makes them unavoidable. Each of these now needs an explicit position, and each should become a dated log entry or zz-challenge when picked up:
1. Tier 2 activation threshold. Where does the sql.js sidecar kick in? Candidate cutoffs: 3K notes (the “community query tooling” threshold), 5K notes (soft user-noticeable), 10K notes (hard). And is it user-toggled, note-count-triggered, or adaptive based on operation latency? Right now this is “when needed” — that’s not a decision, that’s a deferral. Tracked under the Progressive tier architecture pillar and the Obsidian Bases direction research item (which needs to re-measure the ceiling against Bases specifically — see the Dataview callout above).
2. EvolutionPattern: keep, replace, or stack? Keep EvolutionPattern as framework-level default-setter AND add per-version transformation recipes, OR replace it entirely with recipes and let recipes generalize. Research leans “stack both”, but we need a commitment. Tracked under the EvolutionPattern vs transformation recipes roadmap item.
3. Crosswalk edge vocabulary. Commit to the 5 NIST IR 8477 set-theory relationships — i.e. STRM — as the edge-type vocabulary? And commit to SSSOM’s metadata model for edge properties (mandatory justification + optional confidence, author, date, tool)? Tracked under the new Crosswalk edge semantics commitment roadmap item. See the STRM and SSSOM fit together in practice callout above for how the two layers compose on a single edge.
4. Synthetic spine — adopt or reject. Do we architect around a canonical pivot (SCF’s ~1,300 controls? An OSCAL catalog? A Crosswalker-authored canonical?) so crosswalks become transitive through the spine, or do we keep direct pairwise mappings and accept the O(n²) maintenance burden? This is the biggest open question. Tracked under the Pairwise vs synthetic spine roadmap item and explored in depth in Challenge 06, which also covers the long-term resilience and audit-grade trustworthiness angles. Bigger context: schema matching — interlingua / pivot approach.
5. BFO-style formal grounding — how much? Adopt BFO’s continuant/occurrent distinction as a `node.type` convention (near-zero cost, high conceptual clarity) without adopting the full 36-class ontology? Or stay fully platform-independent and resist any upper-ontology commitment as premature lock-in? No roadmap item exists yet — this decision is currently only captured here.
6. Formal grounding for EvolutionPattern. Rebuild the EvolutionPattern taxonomy on top of Stojanovic’s six-phase evolution ontology (capture → represent → semantics → implement → propagate → validate) and Flouris et al.’s 11 change tasks — or keep it as a pragmatic draft and defer the grounding? Tracked under the same EvolutionPattern vs transformation recipes roadmap item as decision #2, since the “keep / replace / stack” question and the “formalize against Stojanovic / Flouris” question travel together.
Next research items
These don’t need decisions yet — they need investigation:
- Is FCA tractable on real framework data? Build a formal context (rows = controls across NIST + CIS + ISO, columns = properties extracted from description text) and see whether the auto-computed concept lattice reveals real equivalences that match SCF’s hand-curated mappings. If yes, we have an automated mapping discovery path. This is one of the explicit investigation branches in Challenge 06 (spine distillation option).
- Sizing the synthetic spine. If we adopt hub-and-spoke, is the spine inherited (reuse SCF or OSCAL), distilled (compute it from FCA over the imported frameworks), or authored (handcraft a small canonical set)? Each has very different operational implications. Explicitly addressed in section 4 of Challenge 06, with resilience and trustworthiness profiles for each option.
- Versioning model. Content-addressable (Git-style CIDs per framework snapshot)? Semantic version strings (NIST r5 / r6)? Datomic-style immutable accumulation? Research pointed at all three as viable — we haven’t picked. Related: framework versioning concept page. No dedicated roadmap item yet.
- CRDT layer for future distributed editing. Not Foundation-phase, but the research flagged that if we ever want multi-user editing, we should layer Yjs (via the Relay plugin architecture for Obsidian) on top rather than invent our own reconciler. See the consistency models concept page for the formal backdrop. Note for later.
- LLM matching pipeline (Tier 3 of crosswalk matching). Magneto / MILA / LLMs4OM achieved 0.83–0.95 F1 on ontology alignment by using LLMs only for uncertain mappings. When we get to automated crosswalk suggestion, this is the architecture to adopt — three-tier: lexical → embedding → LLM-for-uncertain-only. No roadmap item yet; will fit under a future “AI-assisted transforms” workstream. Related concept: schema matching.
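To make the FCA item above concrete, here is a toy version of the two derivation operators and a brute-force concept enumeration. It is exponential in the number of controls, which is precisely why tractability on real framework data is the open question; the control IDs and attributes are invented:

```python
from itertools import chain, combinations

# Toy formal context: objects = controls, attributes = properties
# extracted from their description text.
context = {
    "nist:AC-2":   {"accounts", "provisioning", "review"},
    "iso:A.9.2.1": {"accounts", "provisioning"},
    "cis:5.1":     {"accounts", "inventory"},
}

def intent(objs):
    """Attributes shared by every object in the set."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set(chain(*context.values()))

def extent(attrs):
    """Objects carrying every attribute in the set."""
    return {o for o, a in context.items() if attrs <= a}

def concepts():
    """Enumerate all formal concepts (extent, intent) by closing every
    subset of objects. Fine for a toy; intractable naively at scale."""
    objs = list(context)
    found = set()
    for r in range(len(objs) + 1):
        for combo in combinations(objs, r):
            i = frozenset(intent(set(combo)))
            found.add((frozenset(extent(i)), i))
    return found

# The lattice surfaces AC-2 and A.9.2.1 as sharing a concept
# ("accounts" + "provisioning") -- a candidate implicit equivalence.
assert (frozenset({"nist:AC-2", "iso:A.9.2.1"}),
        frozenset({"accounts", "provisioning"})) in concepts()
```

The research item then asks whether concepts discovered this way line up with SCF’s hand-curated mappings on real data.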
Each of these is a candidate for a new research challenge brief so fresh agents can attack them without inheriting our assumptions. Challenge 06 is the first one spun out from this synthesis; future items should follow the same pattern.
What this log does NOT settle
To be explicit: this log is the synthesis, not the decisions. Nothing in the KB, roadmap, or terminology pages gets modified based on this log alone. The next steps are:
- Review this synthesis with fresh eyes
- Pick the decisions above one at a time, each becoming its own dated log entry
- Update roadmap / concepts / terminology pages as each decision lands
- Archive the `.workspace/` research docs once their insights are represented in the KB
The Foundation phase holds up. It gets refined, not rebuilt.
Related
- Atomic operations research (04-09) — the 6 GED atoms, our 9 primitives, completeness proof
- Ontology evolution first principles (04-08) — the original 13 structural change primitives exercise
- User-first ontology maintenance (04-09) — entity-aligned UX, Path C
- Primitives depth and pluggable layers (04-09) — detection / decisioning / handling separation
- Evolution pattern taxonomy draft (04-03) — the taxonomy this research wants grounded in Stojanovic
- Why Obsidian, why files (04-03) — the file-first commitment
- Layered architecture vision (04-03) — progressive Tier 1/2/3
- Terminology — definitions for ontology diff primitives, EvolutionPattern, transformation recipes, decisioning, handling strategy, pluggable layer
- Roadmap — where the ontology diff primitives and EvolutionPattern research items live
- Research challenges — the place where “next research items” become fresh-agent assignments