
How to enforce round-trip correctness at CI time

Principle 2 says Obsidian semantics must round-trip between the Web CMS and the vault without corruption. That’s enforceable only if:

  1. We have a test corpus of real Obsidian-flavored pages
  2. Each page has a canonical “expected after round-trip” snapshot
  3. CI runs the full pipeline: parse → render → simulate an edit → save → parse again → compare

The comparison step is where this gets hard. “Equal” for markdown isn’t string equality — whitespace, frontmatter ordering, link aliasing, callout syntax variants all produce non-corrupting differences. A naive diff will flag every benign reformat as a regression.
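
To make that concrete: here are two byte-different but semantically identical renderings of the same page (illustrative content, not a real fixture). A plain string comparison flags the pair as a regression:

```typescript
// Two renderings of the same page: identical Obsidian semantics,
// different bytes. Content is illustrative, not from the test corpus.
const before = [
  "---",
  "title: Foo",
  "tags: [kb]",
  "---",
  "",
  "- item one",
].join("\n");

const after = [
  "---",
  "tags: [kb]", // frontmatter keys reordered on save
  "title: Foo",
  "---",
  "",
  "-   item one", // list marker re-padded
].join("\n");

console.log(before === after); // false, although nothing was corrupted
```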

Why it’s an open challenge, not just a test-harness problem

Frame it as a testing problem and the answer is “write unit tests.” But the actual blocker is defining what “round-trip correct” means for each feature:

  • Wikilinks: is `[[Foo]]` equivalent to `[[Foo|Foo]]` and `[[Foo| Foo ]]`? Probably yes. Codify.
  • Callouts: is `> [!note]` equivalent to `> [!note]-`? No: the trailing `-` means collapsed. Not equivalent.
  • Frontmatter: YAML key ordering is not semantic; re-ordering on save is benign.
  • Embeds: `![[image.png]]` should round-trip as the same literal. But what about `![[image.png|200]]`? That’s Obsidian’s size hint — it probably needs to survive.
  • Block refs: `[[Note#^xyz]]` — the block ID must be preserved exactly; a re-key would break inbound references.

Each of these needs a written equivalence rule before we can automate the test.
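
Those written rules can eventually become executable. A minimal sketch (the `EquivRule` type and the rule set below are hypothetical names for illustration, not existing project code):

```typescript
// Hedged sketch: per-feature equivalence rules as named predicates.
type EquivRule = {
  feature: string;
  equal: (a: string, b: string) => boolean;
};

// Frontmatter: key order is not semantic, so compare sorted lines.
const sortYamlLines = (s: string) =>
  s.split("\n").map((l) => l.trim()).filter(Boolean).sort().join("\n");

const rules: EquivRule[] = [
  {
    feature: "frontmatter",
    equal: (a, b) => sortYamlLines(a) === sortYamlLines(b),
  },
  {
    // Callouts: the collapse marker `-` is semantic and must match.
    feature: "callout",
    equal: (a, b) => a.trim() === b.trim(),
  },
  {
    // Block refs: the ^id must be byte-identical or inbound links break.
    feature: "blockref",
    equal: (a, b) => a === b,
  },
];
```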

Approach A: property-based testing. Use fast-check or similar: generate synthetic Obsidian markdown, run it through the pipeline twice, and assert equality modulo the equivalence rules. Pros: catches generalization bugs. Cons: generators for realistic markdown are non-trivial to write, and generated edge cases that never occur in real vaults produce false positives.

Approach B: fixture corpus. Hand-write a set of vault pages covering every Tier 1 / Tier 2 feature. Each is a .in.md file paired with a .expected.md file (the state after one round-trip). CI runs the round-trip and does a semantic diff. Pros: concrete and understandable. Cons: corpus maintenance burden grows with the feature list.
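
The CI loop for a fixture corpus could be this small (a sketch: `roundTrip` and `semanticEqual` are placeholders for the real pipeline and diff tool, and the `.in.md`/`.expected.md` pairing follows the convention above):

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Placeholders — the real pipeline and semantic diff go here.
const roundTrip = (md: string) => md;
const semanticEqual = (a: string, b: string) => a.trim() === b.trim();

// One fixture: round-trip the input, compare against the snapshot.
function checkFixturePair(input: string, expected: string): boolean {
  return semanticEqual(roundTrip(input), expected);
}

// Walk the corpus directory and collect failing fixture names.
function runCorpus(dir: string): string[] {
  const failures: string[] = [];
  for (const f of readdirSync(dir).filter((n) => n.endsWith(".in.md"))) {
    const input = readFileSync(join(dir, f), "utf8");
    const expected = readFileSync(
      join(dir, f.replace(/\.in\.md$/, ".expected.md")),
      "utf8",
    );
    if (!checkFixturePair(input, expected)) failures.push(f);
  }
  return failures;
}
```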

Approach C: real-vault smoke test. Clone cybersader/cyberbase at a known commit, run the full pipeline, render, simulate edits on a random 1% of pages, save, and compare. Pros: catches real-world regressions. Cons: requires network I/O at test time; slow and flaky.

Approach D: accept and document. Accept that round-trip correctness is an aspiration, not a property, and document each failure mode as it’s discovered. Pros: zero upfront cost. Cons: Principle 2 becomes a lie, and contributors will eventually lose trust.

Recommendation: Approach B plus Approach A, in that order. Fixture corpus first (concrete, testable, documents expectations), then property-based testing layered on top once the equivalence rules are codified.

  1. Define equivalence rules for each Tier 1 feature (wikilinks, callouts, embeds, code blocks, math, Mermaid, tables). One Markdown document per feature, documenting what “equal” means.
  2. Write 3–5 fixture pages covering the common cases. Store in docs/tests/fixtures/ (or wherever, TBD).
  3. Build a minimal semantic-diff tool — initially just `normalize(a) === normalize(b)`, where `normalize` canonicalizes whitespace, frontmatter order, and link aliases.
  4. Wire into CI as a Playwright test or a standalone script.
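
Step 3’s normalizer might start as small as this (a sketch: the regexes cover only the three canonicalizations named above and would grow alongside the equivalence rules):

```typescript
// Minimal normalize(): canonicalize whitespace, frontmatter key order,
// and redundant link aliases. A sketch, not the project's actual tool.
function normalize(md: string): string {
  let out = md;

  // Redundant alias: [[Foo|Foo]] (or [[Foo| Foo ]]) -> [[Foo]].
  out = out.replace(/\[\[([^\]|]+)\|\s*\1\s*\]\]/g, "[[$1]]");

  // Frontmatter: sort keys so ordering differences disappear.
  out = out.replace(/^---\n([\s\S]*?)\n---/, (_m, body: string) =>
    "---\n" + body.split("\n").sort().join("\n") + "\n---",
  );

  // Whitespace: strip trailing spaces, collapse runs of blank lines.
  out = out
    .split("\n")
    .map((l) => l.replace(/\s+$/, ""))
    .join("\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();

  return out;
}

const semanticallyEqual = (a: string, b: string) =>
  normalize(a) === normalize(b);
```

Note this normalizer must stay conservative: it equates benign variants but must never touch semantic markers like a callout’s `-` or a block ID.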

This is probably 2–3 sessions of work once Phase R exits and Phase 1 begins. It is deliberately not scheduled during Phase R, because the principle needs to be grounded first.