
Foundation research synthesis — testing our decisions against 4 parallel research sessions

Created Apr 10, 2026 Updated Jun 1, 2026

Why this log exists

The project is in its Foundation phase, and decisions made here calcify fast. Before pouring concrete, we pointed four fresh research agents at the current architecture. Each started with no prior context (to avoid inherited bias), and each took a different angle:

First-principles roadmap critique — stress-test the technical roadmap against known problem-domain literature
Formal primitives audit (the long one, ~94KB) — ground our primitives in information science, category theory, BFO, FCA, SSSOM, OAEI literature
Resilient knowledge work primitives — what makes a knowledge structure last 20+ years
Accessibility pass — same rigor, but explicitly asked to keep it anchored to things a non-specialist can hold onto

All four sit in .workspace/ (local, gitignored). This log is the distillation — what converged, what pushed back, what new concepts showed up, and what we now need to decide.

What we tested against

The Foundation decisions under review:

The 9 ontology diff primitives — see atomic operations research (6 GED atoms refined to 9, with 4 composites)
The EvolutionPattern taxonomy — see evolution pattern draft
File-first / progressive tiers — see why Obsidian, why files and layered architecture vision
User-first entity-aligned maintenance UX — see user-first ontology maintenance
Pluggable detection & decisioning layers — see primitives depth and pluggable layers

Convergences — where the research agrees with Foundation

The 9 primitives are sound

All four research sessions independently converged on the same answer. The 6 atomic operations of Graph Edit Distance (GED) theory (Bunke & Allermann 1983, Sanfeliu & Fu 1983) are provably complete for transforming any labeled directed graph into any other, and our 9-primitive set is a valid refinement of those 6. It splits the two “substitution” atoms into semantically distinct sub-cases: node substitution into id, property, and type changes, and edge substitution into type and property changes.

Math atoms → Crosswalker’s refinement

	6 GED atoms (provably complete)	9 Crosswalker primitives
1	node insertion	node.added
2	node deletion	node.removed
3	node substitution	3a node.id_changed · 3b node.properties_changed · 3c node.type_changed
4	edge insertion	edge.added
5	edge deletion	edge.removed
6	edge substitution	6a edge.type_changed · 6b edge.properties_changed

Plain-English anchor: If you had a Lego model and wanted to turn it into a different Lego model, the only things you can do are add, remove, or swap a brick, and add, remove, or change a connection between two bricks. That’s it. The math guarantees those six moves are enough. We kept all six, then split “swap a brick” into three kinds because renaming a control (id change), editing its description (property change), and reclassifying it from detective to preventive (type change) mean very different things to a compliance user, even though the math doesn’t care. “Change a connection” splits the same way into two kinds (type change and property change).
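To make the refinement concrete, here is a minimal Python sketch of a snapshot differ that emits the primitives above. It assumes a hypothetical in-memory shape (nodes keyed by a stable id, at most one edge per ordered source/target pair, each carrying a type and a property dict); none of these names come from the Crosswalker codebase. node.id_changed is deliberately not emitted: telling a rename apart from a removal plus an addition needs a matching step, which belongs in a detector rather than in the raw diff.

```python
from typing import Any

# Hypothetical snapshot shape (illustration only):
#   {"nodes": {node_id: {"type": str, "props": dict}},
#    "edges": {(src_id, dst_id): {"type": str, "props": dict}}}
Snapshot = dict[str, dict[Any, dict[str, Any]]]


def diff(old: Snapshot, new: Snapshot) -> list[tuple[str, Any]]:
    """Emit (primitive, key) pairs that turn `old` into `new`."""
    ops: list[tuple[str, Any]] = []
    for kind in ("node", "edge"):
        before, after = old[kind + "s"], new[kind + "s"]
        # GED atoms 1/2 and 4/5: deletions and insertions.
        for key in sorted(before.keys() - after.keys()):
            ops.append((f"{kind}.removed", key))
        for key in sorted(after.keys() - before.keys()):
            ops.append((f"{kind}.added", key))
        # GED atoms 3 and 6: substitutions, split by what actually changed.
        for key in sorted(before.keys() & after.keys()):
            if before[key]["type"] != after[key]["type"]:
                ops.append((f"{kind}.type_changed", key))
            if before[key].get("props", {}) != after[key].get("props", {}):
                ops.append((f"{kind}.properties_changed", key))
    return ops
```

Edges get only type and property changes here, mirroring 6a and 6b; there is no edge counterpart to 3a.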

The 4 composite operations (node.moved, subgraph.merged, subgraph.split, hierarchy.restructured) also got formal backing: Klein (2004) calls these “complex changes” and proved the set is infinite (you can always define new compositions). Our decision to recognize only 4 and keep the list open is formally correct.
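Composites sit one layer above the atoms. A minimal sketch of recognizing one of them, assuming a hypothetical parent_of edge type for hierarchy and at most one parent per node: a node.moved is reported when the same child loses one parent edge and gains another in the same set of primitives.

```python
from typing import Any


def find_moves(ops: list[dict[str, Any]], parent_type: str = "parent_of") -> list[dict[str, Any]]:
    """Fold edge.removed + edge.added pairs on hierarchy edges into node.moved composites.

    Each op is a hypothetical dict such as
    {"kind": "edge.removed", "src": parent_id, "dst": child_id, "type": "parent_of"}.
    """
    old_parent: dict[str, str] = {}
    new_parent: dict[str, str] = {}
    for op in ops:
        if op.get("type") != parent_type:
            continue  # only hierarchy edges can express a move
        if op["kind"] == "edge.removed":
            old_parent[op["dst"]] = op["src"]
        elif op["kind"] == "edge.added":
            new_parent[op["dst"]] = op["src"]
    # A child that lost one parent and gained another has moved.
    return [
        {"kind": "node.moved", "node": child, "from": old_parent[child], "to": new_parent[child]}
        for child in sorted(old_parent.keys() & new_parent.keys())
    ]
```

The same pattern (scan atoms, group by node, emit a named composite) would extend to merges and splits, which is why the composite list can stay open.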

Also confirmed: The property graph model (nodes + edges + properties on both) is the right formal substrate. Unlike RDF’s triple model, property graphs allow metadata directly on edges — essential for crosswalks where a mapping carries confidence scores, rationale, and provenance. The new ISO GQL standard (ISO/IEC 39075:2024) enshrines property graphs as the international standard. This is Lindy-compatible.

“Property graphs are the right substrate” sounds like a conceptual choice, but it is actually an architectural commitment that reaches all the way down to storage. To implement it in practice you have to start at the storage layer and work up: first-class edge properties (confidence, provenance, rationale attached to a mapping) need a backend that can natively store and query edges-with-properties, not just nodes.

Markdown files alone can only fake this: we’d be encoding edge metadata in link syntax (the framework_here:: [[AC-2]] {"sufficient": true} pattern), which is fragile, slow to query, and breaks as soon as the vault gets large (see the files-canonical ceiling below). The practical answer is the Tier 2 sql.js WASM sidecar: it lives beside the vault, stays gitignored, is rebuildable from the files (so files remain source-of-truth), and gives us the property-graph operations Obsidian alone can’t. The files stay canonical; the sidecar is the machine-queryable index layer built on top.
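The sidecar idea can be sketched with Python’s built-in sqlite3 module, since sql.js is SQLite compiled to WebAssembly and speaks the same SQL. This is a minimal sketch, not the planned schema: the `mappings` frontmatter key, its field names, and the two-table layout are assumptions. It shows the two properties that matter: the index is derived entirely from note frontmatter (so it can be dropped and rebuilt at any time), and edge properties such as confidence become ordinary queryable columns.

```python
import sqlite3
from typing import Any


def rebuild_index(notes: dict[str, dict[str, Any]]) -> sqlite3.Connection:
    """Build a throwaway property-graph index from parsed note frontmatter.

    `notes` maps a note id to its already-parsed frontmatter. The hypothetical
    `mappings` key holds outgoing edges, each carrying its own properties.
    """
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT);
        CREATE TABLE edges (
            src TEXT NOT NULL,
            dst TEXT NOT NULL,
            predicate TEXT NOT NULL,
            confidence REAL,  -- edge property
            rationale TEXT    -- edge property
        );
        CREATE INDEX edges_by_confidence ON edges (confidence);
        """
    )
    for note_id, fm in notes.items():
        db.execute("INSERT INTO nodes (id, type) VALUES (?, ?)", (note_id, fm.get("type")))
        for m in fm.get("mappings", []):
            db.execute(
                "INSERT INTO edges VALUES (?, ?, ?, ?, ?)",
                (note_id, m["target"], m["predicate"], m.get("confidence"), m.get("rationale")),
            )
    db.commit()
    return db
```

A query such as `SELECT src, dst FROM edges WHERE confidence >= 0.8` then answers “which mappings are we confident in?” without scanning any markdown.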

See the file-based graph database concept page for the full formal model and the layered architecture vision (04-03) for how the tiers stack.

File-first is valid at current scale

The resilient-knowledge-work research found that sixty years of information science converge on the same architecture when the criteria are human readability, version controllability, extensibility, and multi-decade resilience:

Ranganathan’s faceted classification (Colon Classification, 1933), the “break it into independent tags” idea. The takeaway: our YAML frontmatter structure inherits a 90-year-old library-science property that makes it safe to extend indefinitely without reorganizing. Here’s why that matters:
- The old way (rigid trees): pick one giant pre-made category tree and force every item into exactly one slot. “19th-century French landscape painting” becomes a single drawer you have to find. If you later decide you care about the artist’s mood, you have to rebuild the tree — every drawer needs a new subdivision, and all your old cross-references break.
- Ranganathan’s way (independent labels): break the same item into small independent labels and combine them at query time: era: 19th century + country: France + medium: painting + subject: landscape. Now you can ask for “all French things,” “all landscapes,” or any combination on the fly.
- The key property: adding a new label later never touches the old labels. If next week you want to track mood: melancholy, you just start writing it on new items. Nothing reshuffles. Old queries keep working unchanged because they never looked at mood in the first place. Ranganathan called this hospitality — the system is “hospitable” to new information.
- Why this matters for Crosswalker: YAML frontmatter keys are facets. Every type:, framework:, control_id:, reviewer: field on a note is an independent facet. When we want to track a new property on controls (say, quantum-resistant: true in 2029), we just start adding it. No schema migration, no reshuffling folders, no breaking existing queries.
- And to address the obvious question: this is not the same as “deprecate but never delete” (which is Protobuf’s schema-evolution discipline for binary wire formats). Hospitality is a step more fundamental — because new facets can’t collide with old ones, there’s nothing to deprecate in the first place. You just grow the vocabulary. The old fields aren’t “legacy kept alive for compatibility”; they’re simply facets you still happen to use.
This is the deepest reason the markdown + YAML file-first architecture is likely to age well — we got a 90-year-old library-science property for free just by picking plain text files with key-value frontmatter.
Luhmann’s Zettelkasten — 90,000+ file-per-node cards maintained for 30 years, produced 70+ books. File-first scales to serious intellectual work without a database.
Markdown + YAML + WikiLinks vs OWL+RDF+SPARQL — the “simple plain-text stack vs. formal Semantic Web stack” tradeoff. Richard Gabriel’s “Worse is Better” (1991) predicted exactly this outcome:
- The “worse” stack wins on adoption. Markdown notes with YAML frontmatter and [[wiki links]] is technically less expressive than OWL+RDF+SPARQL (no Description Logic reasoner, no SPARQL query engine, no formal axioms). But it’s radically simpler to implement, adopt, and read. Obsidian has 1.5M+ monthly active users; formal ontology tooling sits at roughly 27% production deployment even among organizations that bought into the Semantic Web. This is exactly why we picked Obsidian + files in the first place — adoption and human-readability beat formal expressiveness in practice.
- We don’t lose the formal stack — we build a bridge when we need it. The research flagged two mature bridge technologies: YAML-LD (W3C Community Group Final Report, December 2023) defines conventions for serializing Linked Data as YAML on top of JSON-LD syntax, and LinkML provides a single YAML schema language that compiles to JSON Schema, OWL, SHACL, and SQL DDL. So if we ever need to export a vault to RDF for interoperability with institutional ontology tooling, or hand a SHACL schema to a compliance auditor, the path exists — without abandoning files as source-of-truth. See the new formal concepts reference table below for pointers.
- The deeper design principle. This is the same “worse is better” bet Crosswalker makes everywhere: human-editable files first, machine-queryable layers built on top, formal-export bridges reserved for the rare moments they’re actually needed. See what makes Crosswalker unique for the philosophical pillar this connects to, and the property graph callout above for how the same logic extends the claim all the way down to storage (files canonical + Tier 2 sidecar for edge-property queries).
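Hospitality is easy to see in code. In this minimal sketch each note’s frontmatter is modelled as a plain dict (the keys and values are illustrative, not a Crosswalker schema), and a query is just a set of facet constraints, so a facet added later is invisible to every query that doesn’t mention it.

```python
from typing import Any


def facet_query(notes: dict[str, dict[str, Any]], constraints: dict[str, Any]) -> list[str]:
    """Return ids of notes whose frontmatter matches every constraint.

    A note lacking a constrained key simply doesn't match; keys a query doesn't
    mention are ignored, which is what makes new facets "hospitable".
    """
    return sorted(
        note_id
        for note_id, fm in notes.items()
        if all(fm.get(key) == value for key, value in constraints.items())
    )


vault = {
    "AC-2": {"type": "control", "framework": "NIST-800-53"},
    "A.9.2.1": {"type": "control", "framework": "ISO-27001"},
}
nist_before = facet_query(vault, {"framework": "NIST-800-53"})

# Years later: start writing a new facet on new notes only. Nothing is migrated.
vault["SC-13"] = {"type": "control", "framework": "NIST-800-53", "quantum-resistant": True}

nist_after = facet_query(vault, {"framework": "NIST-800-53"})  # old query, unchanged
quantum = facet_query(vault, {"quantum-resistant": True})      # new query
```

The old query keeps working and simply picks up the new NIST note; no existing note had to be touched.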

Pluggable layers match the research

Our “pluggable detection + pluggable decisioning” separation (see primitives depth log) lines up with Javed, Abgaz & Pahl’s (2009) four-layer change operator model: atomic operations → composite operations → domain-specific patterns → domain-specific complex patterns. Our “math atoms → detectors → decisioners → handlers” stack is the same structure under different names. Good sign.
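A minimal sketch of how the layers can stay swappable, using Python `Protocol`s as the plug points. Every class, field, and decision label here is hypothetical: one toy detector proposes an id change when a removed and an added node share a title, one toy decisioner applies a confidence threshold, and handlers are looked up by decision.

```python
from typing import Any, Callable, Protocol

Change = dict[str, Any]


class Detector(Protocol):
    def detect(self, ops: list[Change]) -> list[Change]: ...


class Decisioner(Protocol):
    def decide(self, change: Change) -> str: ...


class TitleMatchRenameDetector:
    """Hypothetical detector: a removed and an added node sharing a title look like an id change."""

    def detect(self, ops: list[Change]) -> list[Change]:
        removed = {op["title"]: op["id"] for op in ops if op["kind"] == "node.removed" and "title" in op}
        return [
            {"kind": "node.id_changed", "from": removed[op["title"]], "to": op["id"], "confidence": 0.9}
            for op in ops
            if op["kind"] == "node.added" and op.get("title") in removed
        ]


class ThresholdDecisioner:
    """Hypothetical decisioner: auto-apply above a confidence threshold, otherwise ask the user."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def decide(self, change: Change) -> str:
        return "auto-apply" if change.get("confidence", 0.0) >= self.threshold else "ask-user"


def run_pipeline(
    ops: list[Change],
    detectors: list[Detector],
    decisioner: Decisioner,
    handlers: dict[str, Callable[[Change], Any]],
) -> list[Any]:
    """Atoms -> detectors -> decisioner -> handler; each layer can be replaced independently."""
    results = []
    for detector in detectors:
        for change in detector.detect(ops):
            results.append(handlers[decisioner.decide(change)](change))
    return results
```

Swapping the decisioner (say, for a stricter threshold) changes the outcome without touching the detector or the atoms.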

Challenges — where the research pushes back

Files-canonical ceiling — documented, already answered by the 3-tier pillar

The first-principles critique confirmed a ceiling we’d already identified: files-as-source-of-truth is strategically correct, but Tier 1 (pure files + in-vault queries) has finite headroom. This is not a surprise finding and not a crisis — it’s the exact reason the Progressive tier architecture is a Foundation pillar on the roadmap. We’re documenting the ceiling here so the research on record agrees with our existing plan.

What the ceilings look like:

Obsidian’s graph view caps usefully around 25K nodes
The V8 engine inside Electron caps at ~4 GB heap
A full NIST-800-53 × CIS × ISO × MITRE crosswalk is ~210,000 potential mapping pairs to evaluate
Any in-vault query layer doing linear scans over YAML frontmatter degrades around 3–5K notes (the research literature cites this as a Dataview number — see the expandable note below on why that’s a misleading framing for us)

Note: we are not building on Dataview (recurring research-agent confusion)

The underlying research reports all cite Dataview’s 3–5K note ceiling as the “Obsidian query layer” limit. Crosswalker is not building on Dataview. Dataview is deprecated, and we explicitly chose not to take on that dependency.

The roadmap’s Obsidian Bases direction research Foundation item captures the actual plan: build the viewing/querying layer on top of Obsidian Bases (the native successor to Dataview-style queries), with Datacore as a backup to investigate if Bases turns out to be insufficient.

What this means for the ceiling claim: the 3–5K note figure is a reasonable first estimate for any in-vault query engine doing linear scans over YAML frontmatter, and Bases may well hit limits of the same order of magnitude; we won’t know until we benchmark it ourselves. The Tier 2 sql.js sidecar argument stands either way (it rescues query performance at scale regardless of which viewing layer we pick). The specific number should be re-measured against Bases as part of the Foundation research item, not inherited from Dataview benchmarks.

Research items to reconcile: Obsidian internals research (04-04) and the roadmap’s Bases direction item.

Where files-canonical starts to hurt

Plain-English anchor: Think of a filing cabinet. At 500 folders it’s fine. At 5,000 it’s still workable if you have an index card drawer. At 50,000 you’re overturning the cabinet every time you want to find something. At that point you need a librarian sitting beside the cabinet who keeps a notebook of where everything is — that’s Tier 2. The files are still the source of truth, but the librarian makes them queryable. Our current roadmap has Tier 2 as “when needed” — the research says we need to commit to when it activates, because the ceiling is closer than it looks.

This is not a new challenge — it’s the exact ceiling the Progressive tier architecture pillar is committed to handling. Files stay canonical at all three tiers; what changes is the machine-queryable index layer sitting beside them: Tier 1 (files only + validation) → Tier 2 (files + sql.js WASM sidecar for property-graph queries) → Tier 3 (files + server: PocketBase / Postgres). The research’s contribution is narrower than “we need a new plan” — it’s that Tier 2’s activation threshold should be explicit and designed rather than emergent. See the Decisions this forces section below for the specific Tier 2 cutoff question, and the property graph callout above for why the sidecar is also what makes first-class edge properties practically queryable.
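To show that an explicit activation rule is cheap to state, here is a sketch that combines the three options still open (user toggle, note-count trigger, latency-adaptive). The 5K soft and 10K hard limits are taken from the candidate cutoffs listed under the Tier 2 decision; the latency budget and all names are made up, and none of this is a committed policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierPolicy:
    # Placeholder numbers drawn from the candidate cutoffs; none is decided.
    soft_note_limit: int = 5_000      # between soft and hard, latency decides
    hard_note_limit: int = 10_000     # at or above this, always use the sidecar
    latency_budget_ms: float = 200.0  # hypothetical p95 query budget


def use_sidecar(
    note_count: int,
    p95_query_ms: float,
    policy: TierPolicy = TierPolicy(),
    user_override: Optional[bool] = None,
) -> bool:
    """Decide whether the Tier 2 index should be active. Files stay canonical either way."""
    if user_override is not None:  # user-toggled
        return user_override
    if note_count >= policy.hard_note_limit:  # note-count-triggered
        return True
    # Adaptive: past the soft limit, switch only if queries are actually slow.
    return note_count >= policy.soft_note_limit and p95_query_ms > policy.latency_budget_ms
```

Because the sidecar is rebuildable from files, getting this threshold wrong costs an index rebuild, not data.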

The synthetic spine is the missing architectural insight

This is the single most transformational finding in all four research sessions. It showed up independently in three of them:

Every mature compliance meta-framework — SCF (~1,300 controls across 175+ frameworks), the DESM (Data Exchange Standards Mapper), Hyperproof’s topic-based mapping, UCF’s ~10,000 Common Controls — converges on the same architecture: instead of maintaining O(n²) pairwise mappings between every framework pair, each framework maps once to a canonical intermediate representation (the “spine”), and cross-framework mappings are derived transitively through the spine.

Figure: Pairwise mapping vs. synthetic spine. Pairwise (what we have now): 21 edges, O(n²), for n=7. Synthetic spine (hub-and-spoke): 7 edges, O(n), for n=7.

Plain-English anchor: Imagine translating between every pair of 50 languages. You’d need 1,225 dictionaries. Or — you pick one pivot language (say, a simplified Esperanto), translate each of the 50 into it once, and now to translate from Korean to Swahili you just go Korean → pivot → Swahili. You need only 50 dictionaries instead of 1,225. The research says: pick a pivot for compliance controls. Candidates include SCF’s ~1,300 canonical controls, an OSCAL-based catalog, or a Crosswalker-authored canonical. Category theory backs this formally — alignments are “spans”, merging through a pivot is the “pushout” construction, and it’s provably the best-possible structure-preserving transformation (Spivak’s functorial data model).
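The counts in the figure and in the language analogy are just n(n−1)/2 versus n. A few lines of Python make the arithmetic, and the maintenance difference when one framework publishes a new version, explicit:

```python
def pairwise_crosswalks(n: int) -> int:
    """Direct crosswalks needed to connect every pair of n frameworks: n choose 2."""
    return n * (n - 1) // 2


def spine_mappings(n: int) -> int:
    """Framework-to-spine mappings needed with a single pivot: one per framework."""
    return n


def rereviews_when_one_updates(n: int, spine: bool) -> int:
    """Mappings to re-check when a single framework publishes a new version."""
    return 1 if spine else n - 1


for n in (7, 50):
    print(n, pairwise_crosswalks(n), spine_mappings(n))
# 7 21 7
# 50 1225 50
```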

This is a concept we don’t currently have a formal position on. The roadmap talks about crosswalks as pairwise edges. The research is saying: that’s the O(n²) trap, and there’s a proven architectural alternative that every mature system has converged on independently. This is the biggest open architectural question surfaced by the research — now tracked as the roadmap item “Pairwise crosswalks vs synthetic spine architecture” and the fresh-agent research challenge 06, which also goes deep on the long-term resilience and audit-grade trustworthiness questions that have to be answered before any spine can be committed to.

Is a synthetic spine the same as a crosswalk?

Short answer: no. They’re related but sit at different levels of the stack.

A crosswalk is a mapping artifact between two specific ontologies. “NIST 800-53 AC-2 maps to ISO 27001 A.9.2.1 with justification X” is a crosswalk entry. It’s an edge — a statement about two concrete things that already exist. Crosswalks are what Crosswalker is named after and what it produces.
A synthetic spine is not a crosswalk; it’s an architectural choice for how you generate and maintain crosswalks at scale. Instead of authoring N×(N−1)/2 direct crosswalks between every pair of frameworks, you author a canonical intermediate ontology (the “spine”), map each framework to the spine once (N mappings instead of O(N²)), and let cross-framework crosswalks be derived transitively through the spine.

The practical difference:

	Pairwise crosswalks	Synthetic spine
What you author	A↔B, A↔C, B↔C, … (N×(N−1)/2 edges, O(N²))	A→spine, B→spine, C→spine, … (N mappings)
When framework A updates	Re-review every crosswalk involving A	Re-review only A’s mapping to the spine
Consistency guarantee	Pairs can silently disagree (A→B ∘ B→C ≠ A→C)	Transitivity enforced by construction
Example in the wild	Hand-curated NIST↔ISO spreadsheets	SCF (~1,300 controls as pivot), DESM, UCF

So which does Crosswalker make? Crosswalks — that’s the product. The spine question is how we architect the production of those crosswalks at scale. A vault could contain both: framework-to-spine mappings (efficient to maintain) and transitively-derived framework-to-framework crosswalks (what the user actually queries against).
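Deriving framework-to-framework crosswalks from framework-to-spine mappings is, at its simplest, a join on the shared spine concept. This minimal sketch assumes each control maps to a set of spine concept ids (the SPINE-n ids are made up) and ignores edge semantics entirely: it only produces candidate pairs, which would still need an STRM type before they count as crosswalk edges.

```python
from collections import defaultdict


def derive_crosswalk(
    a_to_spine: dict[str, set[str]], b_to_spine: dict[str, set[str]]
) -> set[tuple[str, str]]:
    """Candidate A↔B crosswalk pairs: controls that map to at least one shared spine concept."""
    controls_by_concept: dict[str, set[str]] = defaultdict(set)
    for b_ctrl, concepts in b_to_spine.items():
        for concept in concepts:
            controls_by_concept[concept].add(b_ctrl)
    return {
        (a_ctrl, b_ctrl)
        for a_ctrl, concepts in a_to_spine.items()
        for concept in concepts
        for b_ctrl in controls_by_concept.get(concept, ())
    }
```

Only the framework-to-spine dictionaries are authored; every pair in the result is recomputed whenever either side changes.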

For more context on where crosswalks, frameworks, interchange formats, and canonical ontologies sit in the broader ecosystem, see the institutional landscape page (who creates, maps, mandates, and consumes each of these) and the operational landscape page (which explicitly distinguishes “frameworks themselves vs crosswalks vs evidence vs interchange formats” as different resource classes with different update cadences).

Edge semantics should be set-theoretic, not SKOS-ish

Where this sits in the bigger picture. This is the decision about what vocabulary every Crosswalker crosswalk edge must speak. It’s the ground-level instantiation of the broader interlingua / pivot approach pattern — also called a pivot ontology or meta-framework — where every framework maps once to a canonical intermediate instead of O(n²) pairwise. The mapping organizations that already operate at that scale — SCF with its ~1,300-control STRM bundle, NIST OLIR, CTID — have each committed to a specific edge-type vocabulary. Crosswalker’s question is which one to adopt.

The schema matching concept page already documents most of this terrain, specifically its NIST OLIR formal relationship types section (the 5 set-theory relationships, with domain examples) and its SSSOM section (the metadata envelope). This log doesn’t reinvent that content; it ties those pieces to an explicit Foundation commitment: STRM as the required predicate vocabulary, SSSOM as the required metadata envelope, and SKOS rejected as the base (kept only as an export format via the YAML-LD / LinkML bridge).

Right now we don’t have a locked-in vocabulary for crosswalk edge types. The research is unambiguous: SKOS’s 5 mapping relations (exactMatch, closeMatch, broadMatch, narrowMatch, relatedMatch) are insufficient for compliance crosswalking. They lack confidence scores, provenance, many-to-many support, and negation.

The literature converges on 5 set-theory relationships from NIST IR 8477 (February 2024), the SCF’s STRM methodology, and OSCAL’s Control Mapping Model — three independent efforts that picked the same vocabulary:

NIST IR 8477 set-theory relationships (5 primitives)

Plain-English anchor: Forget SKOS’s vague “closeMatch” / “relatedMatch” — think in terms of what the two things actually cover. Equivalent = same requirement. Subset = framework A’s control is a narrower version of framework B’s. Superset = A covers everything B covers and more. Intersects = they overlap but neither contains the other. No-relationship = genuinely unrelated. That’s the whole vocabulary, and it’s the same one that NIST, SCF, and OSCAL’s Control Mapping Model all independently arrived at.
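The “what they actually cover” reading can be made literal. This sketch models each control as a set of atomic requirement statements (an idealization; real controls don’t arrive pre-split like this) and derives the relationship by set comparison. The enum’s string values are illustrative serializations, not official NIST identifiers.

```python
from enum import Enum


class STRM(Enum):
    EQUAL = "equal"
    SUBSET_OF = "subset-of"
    SUPERSET_OF = "superset-of"
    INTERSECTS_WITH = "intersects-with"
    NO_RELATIONSHIP = "no-relationship"

    def converse(self) -> "STRM":
        """Read the same edge from the other end: A subset-of B means B superset-of A."""
        swap = {STRM.SUBSET_OF: STRM.SUPERSET_OF, STRM.SUPERSET_OF: STRM.SUBSET_OF}
        return swap.get(self, self)


def relate(a: frozenset[str], b: frozenset[str]) -> STRM:
    """Classify control A against control B by the requirements each one covers."""
    if a == b:
        return STRM.EQUAL
    if a < b:  # proper subset
        return STRM.SUBSET_OF
    if a > b:  # proper superset
        return STRM.SUPERSET_OF
    if a & b:
        return STRM.INTERSECTS_WITH
    return STRM.NO_RELATIONSHIP
```

Note that only subset-of and superset-of are direction-sensitive; the other three relationships read the same from either end.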

Technically, STRM is the edge-type vocabulary (the 5 allowed values for a crosswalk edge’s predicate_id slot) and SSSOM is the row-schema envelope (the required + optional metadata fields every edge carries alongside that predicate). They sit at different layers and work together, not as alternatives — the edge semantics stack entry in terminology and the schema matching concept page both define this distinction formally. (Expand the collapsible callout below for a worked example with both layers filled in.)

The STRM + SSSOM stack above applies specifically to crosswalk edges — edges where both ends are concepts in structured ontologies (framework↔framework, framework↔synthetic spine, transitively-derived). It does not cover evidence links — edges from user-authored documents to framework controls with properties like status: covered, sufficient: true, reviewer: Alice. Evidence-link semantics (“this document demonstrates this control at such-and-such implementation status”) are fundamentally different from crosswalk semantics (“these two concepts correspond at such-and-such confidence”), and SSSOM’s vocabulary does not fit the evidence case.

The evidence-link edge model is a separate, unresolved Foundation question that this synthesis didn’t settle. Candidate approaches include adopting OSCAL’s Implementation Layer concepts (by-component, implementation-status, satisfied), reusing Crosswalker’s original framework_here.applies_to:: [[AC-2]] {JSON} link metadata syntax, or inventing a Crosswalker-specific schema. Tracked as its own research item in the Foundation roadmap.

Each ontology-to-ontology edge then carries SSSOM-style metadata (Matentzoglu et al., Database, 2022): mapping_justification (mandatory), confidence score, author_id, mapping_date, mapping_tool, and predicate_modifier for negation. This is the edge model our crosswalks (but not our evidence links) should commit to.

How STRM and SSSOM fit together in practice: they are not alternatives, they are layers

This is the part that gets confusing on first read: STRM is not a replacement for SSSOM, and SSSOM is not a replacement for STRM. They’re two different layers of the same edge, and you use them together.

The envelope/content analogy. Think of a shipping manifest form. The form itself has mandatory fields — sender, recipient, declared contents, weight, date shipped — with rules about which are required. That’s SSSOM: the row schema, the “what fields must every crosswalk carry.” Then one specific field on that form (“declared contents category”) has a controlled vocabulary — electronics, liquid, fragile, perishable. That’s STRM: the allowed values for the predicate_id field only. Without the form, you have nothing to write on. Without the vocabulary, the most important field is free-text slop that can’t be audited.

What a real Crosswalker edge looks like with both layers filled in:

```yaml
# A single crosswalk edge: SSSOM row schema, STRM predicate vocabulary
subject_id:            NIST-800-53/AC-2              # ← SSSOM: mandatory (what this edge starts from)
predicate_id:          strm:subset-of                 # ← SSSOM: mandatory slot; value from STRM's vocabulary (1 of 5)
object_id:             ISO-27001/A.9.2.1              # ← SSSOM: mandatory (what it points to)
mapping_justification: semapv:ManualMappingCuration   # ← SSSOM: mandatory
confidence:            0.85                           # ← SSSOM: optional
author_id:             alice@example.org              # ← SSSOM: optional
mapping_date:          2026-04-10                     # ← SSSOM: optional
mapping_tool:          crosswalker-wizard             # ← SSSOM: optional
```

Every field in this row is an SSSOM slot, including predicate_id. SSSOM is predicate-agnostic: it doesn’t care which vocabulary fills predicate_id; it just requires that there be one, and that the other envelope fields travel with it for audit.
The predicate_id value itself (strm:subset-of) is from STRM’s 5-relationship vocabulary. If you picked SKOS instead, it would be skos:narrowMatch — same slot, different vocabulary.
Together they give you auditable, precisely-typed crosswalks: STRM gives the compliance auditor confidence that “subset-of” means exactly one thing mathematically; SSSOM gives them the justification, confidence, author, and date needed to trust the assertion.
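Because the two layers are independent, they can be checked independently. A minimal validation sketch for rows like the example above: the four required slots are SSSOM’s required slots, and `Not` is the only value SSSOM allows for predicate_modifier. The STRM CURIEs other than strm:subset-of are hypothetical spellings chosen to match it, and the function name is illustrative.

```python
from typing import Any

# SSSOM layer: slots every mapping row must carry.
REQUIRED_SLOTS = ("subject_id", "predicate_id", "object_id", "mapping_justification")

# STRM layer: allowed predicate_id values (hypothetical CURIE spellings).
STRM_PREDICATES = frozenset({
    "strm:equal",
    "strm:subset-of",
    "strm:superset-of",
    "strm:intersects-with",
    "strm:no-relationship",
})


def validate_edge(row: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the edge passes both layers."""
    problems = [f"missing required slot: {slot}" for slot in REQUIRED_SLOTS if not row.get(slot)]
    predicate = row.get("predicate_id")
    if predicate and predicate not in STRM_PREDICATES:
        problems.append(f"predicate_id outside the STRM vocabulary: {predicate}")
    confidence = row.get("confidence")
    if confidence is not None and not 0.0 <= float(confidence) <= 1.0:
        problems.append("confidence must be between 0 and 1")
    if row.get("predicate_modifier") not in (None, "Not"):
        problems.append("predicate_modifier, if present, must be 'Not' (negation)")
    return problems
```

Swapping SKOS in for STRM would change only `STRM_PREDICATES`; the SSSOM checks stay the same, which is the layering described above.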

Where this decision lives:

Roadmap pillar: Crosswalk edge semantics commitment (STRM + SSSOM) in Foundation phase — this is the commitment we have to lock in before any crosswalk authoring features ship
Registry entries (canonical facts): SKOS · SSSOM · STRM
Bigger-picture concept: schema matching — the interlingua / pivot approach shows where these edge types sit in the larger crosswalking workflow
Terminology: interlingua / pivot entry
Related research angle: the synthetic spine question in Challenge 06 asks how STRM+SSSOM edges compose when crosswalks are derived transitively through a pivot rather than authored directly — does subset-of ∘ subset-of = subset-of? That’s the composition math we’ll need to answer
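As a starting point for that composition question, here is what plain set theory can and can’t decide, sketched in Python. It assumes each control is a non-empty set of requirements, reads subset-of and superset-of as proper, and reads no-relationship as disjoint; whether NIST IR 8477’s definitions support those readings is exactly what the research still has to confirm. A result of None means the composition is underdetermined and a derived edge would need human review.

```python
from enum import Enum
from typing import Optional


class STRM(Enum):
    EQUAL = "equal"
    SUBSET_OF = "subset-of"
    SUPERSET_OF = "superset-of"
    INTERSECTS_WITH = "intersects-with"
    NO_RELATIONSHIP = "no-relationship"


def compose(ab: STRM, bc: STRM) -> Optional[STRM]:
    """Given A ab B and B bc C (B being the pivot), infer how A relates to C, or None if undetermined."""
    if ab is STRM.EQUAL:
        return bc
    if bc is STRM.EQUAL:
        return ab
    if ab is bc and ab in (STRM.SUBSET_OF, STRM.SUPERSET_OF):
        return ab  # transitivity: A ⊂ B ⊂ C gives A ⊂ C, and likewise for ⊃
    if (ab, bc) in {(STRM.SUBSET_OF, STRM.NO_RELATIONSHIP), (STRM.NO_RELATIONSHIP, STRM.SUPERSET_OF)}:
        return STRM.NO_RELATIONSHIP  # A inside B and B disjoint from C, or C inside B and B disjoint from A
    return None  # e.g. subset-of then superset-of: A and C both sit inside B, anything goes
```

Under these readings subset-of ∘ subset-of = subset-of does hold, but most of the 25 combinations come back None, which is a concrete way to size the audit burden of transitively derived crosswalks.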

EvolutionPattern needs formal grounding — or replacement

The EvolutionPattern taxonomy classifies how a framework evolves (stewardship profile, cadence, backwards compatibility, etc.) so we can set sensible default handling strategies. The research didn’t invalidate it, but it surfaced a sharper question we’d already been circling (see roadmap research item):

EvolutionPattern vs. transformation recipes

	EvolutionPattern taxonomy (prediction)	Transformation recipe (record)
What it is	A per-framework profile: “NIST evolves like X, so expect Y when a new version drops.”	A per-version-transition record: “NIST r5→r6 renamed these 12 IDs, merged these 3, split this one.”
Strengths	Works before any new version exists; sets defaults for unknown changes; portable across frameworks in the same class	Grounded in actual changes, not prediction; auditable (every change has provenance); reuses our 9 primitives directly
Weaknesses	Predictive, so can be wrong; needs formal grounding (Stojanovic 2004 evolution ontology, Flouris 11 change tasks); coarse-grained vs. per-version reality	Only exists after a new version is released; doesn’t help set defaults for first-time imports; per-transition labor cost

The research’s verdict: keep EvolutionPattern, but treat it as a default-setter that transformation recipes can override when actual version deltas become available. Both layers coexist — EvolutionPattern sets expectations before a new version; the recipe records what actually happened. And EvolutionPattern itself needs formal grounding in Stojanovic’s (2004) evolution ontology (capture → represent → semantics → implement → propagate → validate) and Flouris et al.’s (2008) 11-task classification. Currently it’s a draft taxonomy with no literature anchor.
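The “default-setter that recipes override” rule is small enough to sketch. Both classes, their fields, and the handling labels are hypothetical, and “r5→r6” is the same illustrative transition used above rather than a real NIST release. The recipe wins wherever it records what actually happened to a control; the EvolutionPattern fills every gap.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvolutionPattern:
    """Hypothetical per-framework profile: expected handling per primitive kind."""
    name: str
    defaults: dict[str, str]
    fallback: str = "ask-user"


@dataclass
class TransformationRecipe:
    """Hypothetical per-transition record: what actually happened to each control."""
    from_version: str
    to_version: str
    handled: dict[str, str] = field(default_factory=dict)  # control id -> recorded handling


def resolve(
    control_id: str,
    kind: str,
    pattern: EvolutionPattern,
    recipe: Optional[TransformationRecipe] = None,
) -> tuple[str, str]:
    """Return (handling, source): a recipe entry overrides; the pattern only fills gaps."""
    if recipe is not None and control_id in recipe.handled:
        return recipe.handled[control_id], f"recipe {recipe.from_version}->{recipe.to_version}"
    return pattern.defaults.get(kind, pattern.fallback), f"pattern {pattern.name}"
```

Returning the source alongside the handling keeps the audit trail: every applied change says whether it came from a prediction or a record.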

New formal concepts the research introduced

These are concepts that weren’t in our Foundation vocabulary but the research leans on heavily. Each deserves a short definition now so future logs and concept pages can cite them.

Concept	Source	One-line definition	Relevance to Crosswalker
Graph Edit Distance (GED)	Bunke & Allermann 1983, Sanfeliu & Fu 1983	The minimum-cost sequence of atomic graph operations to transform one labeled graph into another	Proves our 9 primitives are complete
Algebraic graph transformation / DPO	Ehrig, Ehrig, Prange & Taentzer 2006	Formalizes graph rewrite rules via Double Pushout category-theoretic constructions	Each of our 9 primitives maps to a DPO production rule
Category theory (spans, functors, pushouts)	Spivak, Kent — ologs	Math of structure-preserving mappings between categories; alignments are spans, merges are pushouts	Formal basis for the synthetic spine and transitive crosswalks — see Challenge 06
Formal Concept Analysis (FCA)	Ganter & Wille	Given a cross-table of objects × attributes, computes the unique concept lattice of all co-occurrence patterns	Could auto-discover implicit control equivalences from data — a candidate mechanism for spine distillation in Challenge 06
SSSOM	Matentzoglu et al. 2022 (Database)	Simple Standard for Sharing Ontological Mappings: a mapping predicate (commonly SKOS) plus a mandatory justification and optional confidence, author, date, and tool	The metadata model our crosswalk edges should adopt (tracked in the edge semantics roadmap item)
NIST IR 8477 / STRM	NIST Feb 2024	Set Theory Relationship Mapping: equivalent, subset, superset, intersects, no-relationship	The 5 edge-type primitives for compliance crosswalks — tracked in the edge semantics roadmap item
BFO (Basic Formal Ontology)	ISO/IEC 21838-2:2021	36-class ISO-standardized upper ontology; baseline for DOD/IC since Jan 2024	Its continuants/occurrents distinction clarifies “framework as entity” vs. “framework revision as event” — see decision #5 below
Stojanovic’s evolution ontology	Stojanovic 2004 (KAON)	Models ontology changes as first-class entities with a six-phase lifecycle	Should ground our EvolutionPattern taxonomy — tracked in the EvolutionPattern vs transformation recipes roadmap item
Flouris 11 change tasks	Flouris et al. 2008	Taxonomy distinguishing evolution, versioning, integration, and alignment as formally separate problems	Clarifies that our “evolution” work is actually four different problems — see user-first maintenance log
Ranganathan PMEST / faceted classification	Colon Classification, 1933	5 fundamental facets — Personality, Matter, Energy, Space, Time — for decomposing any subject	The deepest root for why YAML frontmatter works as a facet system — see file-first is valid at current scale
Synthetic spine / hub-and-spoke mapping	SCF, DESM, UCF convergence	Map each framework to a canonical intermediate, derive cross-framework mappings transitively	The biggest architectural insight — see the synthetic spine section above, the pairwise-vs-spine roadmap item, and Challenge 06. Related concept: schema matching — interlingua / pivot approach
YAML-LD / LinkML	W3C CG Final Report 2023; LinkML project	Bridges from YAML frontmatter to full RDF/OWL/SHACL	Our escape hatch to formal Semantic Web tooling without abandoning files — see the Markdown + YAML vs OWL+RDF+SPARQL bullet
Content-addressable versioning	Git, IPFS, Nix, Dolt	Identify versions by SHA of content, not by central numbering	Candidate versioning model for framework snapshots — see Versioning model in Next research items below
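Content addressing in miniature: serialize a framework snapshot canonically and hash it, so identical content always yields the identical version id regardless of who imported it or in what order. This sketch uses SHA-256 over sorted-key JSON; it is not Git’s object format or an IPFS CID, and the snapshot shape is hypothetical.

```python
import hashlib
import json
from typing import Any


def snapshot_id(snapshot: dict[str, Any]) -> str:
    """Content address of a framework snapshot: SHA-256 of its canonical JSON form."""
    # sort_keys + fixed separators make the serialization independent of dict insertion order.
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two vaults that import the same catalog independently would compute the same id without any central numbering authority.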

Decisions this forces

The research doesn’t make these decisions — it makes them unavoidable. Each of these now needs an explicit position, and each should become a dated log entry or zz-challenge when picked up:

Tier 2 activation threshold. Where does the sql.js sidecar kick in? Candidate cutoffs: 3K notes (the “community query tooling” threshold), 5K notes (soft user-noticeable), 10K notes (hard). And is it user-toggled, note-count-triggered, or adaptive based on operation latency? Right now this is “when needed” — that’s not a decision, that’s a deferral. Tracked under the Progressive tier architecture pillar and the Obsidian Bases direction research item (which needs to re-measure the ceiling against Bases specifically — see the Dataview callout above).
EvolutionPattern: keep, replace, or stack? Keep EvolutionPattern as framework-level default-setter AND add per-version transformation recipes, OR replace it entirely with recipes and let recipes generalize. Research leans “stack both”, but we need a commitment. Tracked under the EvolutionPattern vs transformation recipes roadmap item.
Crosswalk edge vocabulary. Commit to the 5 NIST IR 8477 set-theory relationships — i.e. STRM — as the edge type vocabulary? And commit to SSSOM’s metadata model for edge properties (mandatory justification + optional confidence, author, date, tool)? Tracked under the new Crosswalk edge semantics commitment roadmap item. See the STRM and SSSOM fit together in practice callout above for how the two layers compose on a single edge.
Synthetic spine — adopt or reject. Do we architect around a canonical pivot (SCF’s ~1,300 controls? An OSCAL catalog? A Crosswalker-authored canonical?) so crosswalks become transitive through the spine, or do we keep direct pairwise mappings and accept the O(n²) maintenance burden? This is the biggest open question. Tracked under the Pairwise vs synthetic spine roadmap item and explored in depth in Challenge 06, which also covers the long-term resilience and audit-grade trustworthiness angles. Bigger context: schema matching — interlingua / pivot approach.
BFO-style formal grounding — how much? Adopt BFO’s continuant/occurrent distinction as a node.type convention (near-zero cost, high conceptual clarity) without adopting the full 36-class ontology? Or stay fully platform-independent and resist any upper-ontology commitment as premature lock-in? No roadmap item exists yet — this decision is currently only captured here.
Formal grounding for EvolutionPattern. Do we rebuild the EvolutionPattern taxonomy on top of Stojanovic’s six-phase ontology evolution process (capture → represent → semantics → implement → propagate → validate) and Flouris et al.’s 11 change tasks? Or do we keep it as a pragmatic draft and defer the grounding? Tracked under the same EvolutionPattern vs transformation recipes roadmap item as decision #2, since the “keep / replace / stack” question and the “formalize against Stojanovic / Flouris” question travel together.
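The Tier 2 activation threshold question can be made concrete with a minimal TypeScript sketch that expresses the three trigger styles as one policy type. It is an assumption-laden illustration, not existing code. `ActivationPolicy`, `VaultStats`, `shouldActivateTier2` and the latency budget are hypothetical names and values, and the p95 uses the simple nearest-rank method.

```typescript
// Hypothetical policy shapes for when the sql.js sidecar (Tier 2) turns on.
type ActivationPolicy =
  | { kind: "user-toggled"; enabled: boolean }
  | { kind: "note-count"; threshold: number } // e.g. 3_000, 5_000 or 10_000
  | { kind: "adaptive"; softThreshold: number; latencyBudgetMs: number };

interface VaultStats {
  noteCount: number;
  recentQueryLatenciesMs: number[]; // latencies of recent Tier 1 queries
}

// 95th percentile by the nearest-rank method; 0 for an empty sample.
function p95(samples: number[]): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length);
  return sorted[rank - 1];
}

// Decides whether Tier 2 should be active under the given policy.
function shouldActivateTier2(policy: ActivationPolicy, stats: VaultStats): boolean {
  switch (policy.kind) {
    case "user-toggled":
      return policy.enabled;
    case "note-count":
      return stats.noteCount >= policy.threshold;
    case "adaptive":
      // Gate on size first so small vaults never pay the sidecar's startup cost
      // because of a single slow query; then require measured slowness.
      return (
        stats.noteCount >= policy.softThreshold &&
        p95(stats.recentQueryLatenciesMs) > policy.latencyBudgetMs
      );
  }
}
```

Consider a vault with 6,000 notes and recent latencies of 40, 60, 350, and 420 ms, which gives a nearest-rank p95 of 420 ms. An adaptive policy with a 5K soft threshold and a 200 ms budget activates the sidecar. A note-count policy at 10K does not.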
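The crosswalk edge vocabulary decision can be sketched as a TypeScript type that commits to both layers. STRM supplies the relationship and SSSOM supplies the metadata. The field names are borrowed from SSSOM slots: `subject_id`, `predicate_id`, `object_id`, `mapping_justification`, `confidence`, `author_id`, `mapping_date`, and `mapping_tool`. Several parts are assumptions for illustration:

- storing a STRM label in `predicate_id` instead of a SKOS-style CURIE;
- the snake_case spelling of the STRM relationships;
- the helper functions.

```typescript
// The five NIST IR 8477 STRM relationships as snake_case labels
// (the encoding is an assumption; NIST spells them "subset of", etc.).
const STRM_RELATIONSHIPS = [
  "subset_of",
  "intersects_with",
  "equal",
  "superset_of",
  "no_relationship",
] as const;
type StrmRelationship = (typeof STRM_RELATIONSHIPS)[number];

// Edge metadata shaped after SSSOM: justification is required, the rest optional.
interface CrosswalkEdge {
  subject_id: string; // control in the source framework
  predicate_id: StrmRelationship; // STRM label in place of a SKOS-style CURIE
  object_id: string; // control in the target framework
  mapping_justification: string; // e.g. "semapv:ManualMappingCuration"
  confidence?: number; // expected range 0..1
  author_id?: string;
  mapping_date?: string; // ISO 8601 date
  mapping_tool?: string;
}

// Set-theoretic inverse: A ⊂ B iff B ⊃ A; the other three relations are symmetric.
function inverseRelationship(rel: StrmRelationship): StrmRelationship {
  if (rel === "subset_of") return "superset_of";
  if (rel === "superset_of") return "subset_of";
  return rel;
}

// Reads an edge from the target framework's side.
function invertEdge(edge: CrosswalkEdge): CrosswalkEdge {
  return {
    ...edge,
    subject_id: edge.object_id,
    object_id: edge.subject_id,
    predicate_id: inverseRelationship(edge.predicate_id),
  };
}

// Returns readable problems; an empty array means the edge is well-formed.
// The predicate check guards data parsed from disk, where the type is only asserted.
function validateEdge(edge: CrosswalkEdge): string[] {
  const problems: string[] = [];
  if (!STRM_RELATIONSHIPS.includes(edge.predicate_id)) {
    problems.push(`unknown STRM relationship: ${edge.predicate_id}`);
  }
  if (edge.mapping_justification.trim() === "") {
    problems.push("mapping_justification is required");
  }
  if (edge.confidence !== undefined && (edge.confidence < 0 || edge.confidence > 1)) {
    problems.push("confidence must be between 0 and 1");
  }
  return problems;
}
```

This design stores one directed edge and derives its reverse with `invertEdge`. That way “A subset_of B” and “B superset_of A” cannot drift apart.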
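The O(n²) figure in the synthetic spine decision comes from counting mapping sets. Assume one mapping set per unordered pair of frameworks and one mapping set per framework into the spine. A directed pairwise model would double the pairwise count to n(n−1).

```latex
\begin{align*}
\text{pairwise mapping sets} &= \binom{n}{2} = \frac{n(n-1)}{2} \in O(n^2) \\
\text{spine mapping sets} &= n \in O(n) \\
n = 10:\quad & \tfrac{10 \cdot 9}{2} = 45 \text{ pairwise vs. } 10 \text{ spine} \\
n = 20:\quad & \tfrac{20 \cdot 19}{2} = 190 \text{ pairwise vs. } 20 \text{ spine}
\end{align*}
```

The spine's linear count is paid for with indirection. Every A-to-B answer is composed from two edges through the spine, so it can lose precision that a direct pairwise edge would have kept.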

Next research items

These don’t need decisions yet — they need investigation:

Is FCA tractable on real framework data? Build a formal context (rows = controls across NIST + CIS + ISO, columns = properties extracted from description text) and see whether the auto-computed concept lattice reveals real equivalences that match SCF’s hand-curated mappings. If yes, we have an automated mapping discovery path. This is one of the explicit investigation branches in Challenge 06 (spine distillation option).
Sizing the synthetic spine. If we adopt hub-and-spoke, is the spine inherited (reuse SCF or OSCAL), distilled (compute it from FCA over the imported frameworks), or authored (handcraft a small canonical set)? Each has very different operational implications. Explicitly addressed in section 4 of Challenge 06, with resilience and trustworthiness profiles for each option.
Versioning model. Content-addressable IDs (Git-style content hashes per framework snapshot)? Semantic version strings (NIST r5 / r6)? Datomic-style immutable accumulation? The research found all three viable, and we haven’t picked one. Related: framework versioning concept page. No dedicated roadmap item yet.
CRDT layer for future distributed editing. Not Foundation-phase, but the research flagged that if we ever want multi-user editing, we should layer Yjs (via the Relay plugin architecture for Obsidian) on top rather than invent our own reconciler. See the consistency models concept page for the formal backdrop. Note for later.
LLM matching pipeline (Tier 3 of crosswalk matching). Magneto / MILA / LLMs4OM achieved 0.83–0.95 F1 on ontology alignment by using LLMs only for uncertain mappings. When we get to automated crosswalk suggestion, this is the architecture to adopt — three-tier: lexical → embedding → LLM-for-uncertain-only. No roadmap item yet; will fit under a future “AI-assisted transforms” workstream. Related concept: schema matching.
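The FCA tractability question can be prototyped in a few lines before touching real framework data. Below is a naive TypeScript enumeration of formal concepts. It assumes each control has already been reduced to a set of attribute strings; extracting those attributes from description text is out of scope here.

The sketch relies on a standard fact: concept intents are exactly the intersections of object intents, plus the full attribute set. This enumeration is exponential in the worst case, so a real run over NIST + CIS + ISO would want an established algorithm such as Ganter's NextClosure. The control IDs and attributes in `exampleContext` are made up.

```typescript
// A formal context: each object (a control) mapped to its attribute set.
type FormalContext = Map<string, Set<string>>;

interface Concept {
  extent: string[]; // objects that have every attribute in the intent
  intent: string[]; // attributes shared by every object in the extent
}

// Stable key for deduplicating attribute sets.
const setKey = (s: Set<string>): string => JSON.stringify([...s].sort());

function formalConcepts(ctx: FormalContext): Concept[] {
  // The full attribute set M is the intent of the bottom concept.
  const allAttrs = new Set<string>();
  for (const attrs of ctx.values()) attrs.forEach((a) => allAttrs.add(a));

  // Close M under intersection with every object's attribute set.
  const intents = new Map<string, Set<string>>([[setKey(allAttrs), allAttrs]]);
  for (const objAttrs of ctx.values()) {
    for (const intent of [...intents.values()]) {
      const meet = new Set([...intent].filter((a) => objAttrs.has(a)));
      intents.set(setKey(meet), meet);
    }
  }

  // Each extent is every object whose attributes include the whole intent.
  return [...intents.values()].map((intent) => ({
    extent: [...ctx.entries()]
      .filter(([, attrs]) => [...intent].every((a) => attrs.has(a)))
      .map(([obj]) => obj)
      .sort(),
    intent: [...intent].sort(),
  }));
}

// Hypothetical toy context: three controls, four extracted properties.
const exampleContext: FormalContext = new Map([
  ["A-1", new Set(["account", "lifecycle", "review"])],
  ["B-7", new Set(["account", "inventory"])],
  ["C-3", new Set(["account", "review"])],
]);
```

On the toy context this yields five concepts. One of them has intent [account, review] and extent [A-1, C-3]. That is the kind of shared-property grouping the investigation would compare against SCF's hand-curated mappings.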
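For the content-addressable option in the versioning model question, here is a minimal sketch of deriving a snapshot ID from content. It assumes a framework snapshot is plain JSON data. It hashes a key-sorted serialization with SHA-256 via Node's built-in `node:crypto` module. The helpers `canonicalJson` and `snapshotId` are hypothetical. The result is a plain hex digest, not an IPFS CID, which would additionally encode the codec and hash function.

```typescript
import { createHash } from "node:crypto";

// Serializes plain JSON data with object keys sorted, so logically equal
// snapshots produce byte-identical input to the hash.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Content address for a framework snapshot: same content, same ID.
function snapshotId(snapshot: unknown): string {
  return "sha256-" + createHash("sha256").update(canonicalJson(snapshot)).digest("hex");
}
```

Two snapshots with the same content in a different key order get the same ID, and any content change yields a new one. Array order still counts, so if control order carries no meaning, the snapshot should be sorted before hashing. A content ID could also coexist with the semantic-version option, with a label like `r5` pointing at one ID.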
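The three-tier matching architecture can be sketched as a cascade that escalates only uncertain pairs. In this TypeScript sketch, the lexical tier is a token-set Jaccard score. The embedding and LLM tiers are injected as hypothetical `Scorer` functions because no model or client has been chosen. The 0.3/0.8 uncertainty band is an illustrative assumption, not a value taken from Magneto, MILA, or LLMs4OM.

```typescript
// Hypothetical tier interface: returns a similarity in [0, 1].
type Scorer = (source: string, target: string) => Promise<number>;

interface MatchDecision {
  source: string;
  target: string;
  score: number;
  decidedBy: "lexical" | "embedding" | "llm";
}

// Tier 1: Jaccard similarity over lowercase word tokens.
function lexicalScore(a: string, b: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const ta = tokens(a);
  const tb = tokens(b);
  const shared = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : shared / union;
}

// Escalates a pair tier by tier; only scores inside [low, high) reach the LLM.
async function matchPair(
  source: string,
  target: string,
  embeddingScore: Scorer,
  llmScore: Scorer,
  band = { low: 0.3, high: 0.8 },
): Promise<MatchDecision> {
  // Tier 1 may only accept: low lexical overlap is not evidence of a mismatch.
  const lex = lexicalScore(source, target);
  if (lex >= band.high) return { source, target, score: lex, decidedBy: "lexical" };

  // Tier 2 settles confident matches and confident non-matches.
  const emb = await embeddingScore(source, target);
  if (emb >= band.high || emb < band.low) {
    return { source, target, score: emb, decidedBy: "embedding" };
  }

  // Tier 3: the expensive call, reserved for the uncertain band.
  return { source, target, score: await llmScore(source, target), decidedBy: "llm" };
}
```

The lexical tier never rejects on its own, because paraphrased controls often share few tokens. The embedding tier absorbs both clear outcomes, so LLM calls scale with the number of genuinely ambiguous pairs.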

Each of these is a candidate for a new research challenge brief so fresh agents can attack them without inheriting our assumptions. Challenge 06 is the first one spun out from this synthesis; future items should follow the same pattern.

What this log does NOT settle

To be explicit: this log is the synthesis, not the decisions. Nothing in the KB, roadmap, or terminology pages gets modified based on this log alone. The next steps are:

Review this synthesis with fresh eyes
Pick the decisions above one at a time, each becoming its own dated log entry
Update roadmap / concepts / terminology pages as each decision lands
Archive the .workspace/ research docs once their insights are represented in the KB

The Foundation phase holds up. It gets refined, not rebuilt.

Atomic operations research (04-09) — the 6 GED atoms, our 9 primitives, completeness proof
Ontology evolution first principles (04-08) — the original 13 structural change primitives exercise
User-first ontology maintenance (04-09) — entity-aligned UX, Path C
Primitives depth and pluggable layers (04-09) — detection / decisioning / handling separation
Evolution pattern taxonomy draft (04-03) — the taxonomy this research wants grounded in Stojanovic
Why Obsidian, why files (04-03) — the file-first commitment
Layered architecture vision (04-03) — progressive Tier 1/2/3
Terminology — definitions for ontology diff primitives, EvolutionPattern, transformation recipes, decisioning, handling strategy, pluggable layer
Roadmap — where the ontology diff primitives and EvolutionPattern research items live
Research challenges — the place where “next research items” become fresh-agent assignments