Regex vs. path templates — abstraction research

Frame: what’s actually being mapped

The plugin sits between two structurally different namespaces. A clean picture of the bidirectional problem before any technical depth:

THE ABSTRACTION CHOICE

Same input, same matching outcome. What does each view tell the engine about its structure?

SETUP · one rule, one path

Rule (PARA Projects, identity transfer): folderEntryPoint = "Projects" · folderAnchor = 'root' · transfer.op = 'identity'

INPUT FOLDER PATH

Projects/Web Auth/oauth-flow/notes.md

two views ↓

REGEX VIEW (TODAY) · positional

DERIVED PATTERN

^Projects/([^/]+)/([^/]+)/([^/]+)$

✓ WHAT IT GIVES US

match? yes
3 positional capture groups: "Web Auth", "oauth-flow", "notes.md"

✗ WHAT'S MISSING

what role does each group play? unnamed
per-slot transform handle? none
how to invert? hand-rolled string surgery
bijection visible from pattern alone? no — asserted via cardinality + bijective metadata

TEMPLATE VIEW (PROPOSED) · named

TEMPLATE

Projects/{project}/{tail...}

✓ WHAT IT GIVES US

match? yes
project = "Web Auth" ← one project entry under PARA
tail = "oauth-flow/notes.md" ← everything deeper, glob

✓ ALSO BUILT-IN

layer = literal prefix Projects/ (no separate folderAnchor needed)
per-slot transform handle: {project | kebab} kebab-cases just the project name
inverse = template instantiation with slot values
bijection: visible — both sides share {project} and {tail...} = round-trips

Slot names are labels, not domain claims. {project} here just means "the single path segment that comes immediately after the Projects/ entry." It doesn't try to be the user's vocabulary — it labels what role that segment plays in this rule. The same template applied to a different vault would still call it {project}; the slot is named for its position relative to the literal prefix, not for what it semantically represents on disk.

Aside — “what about Johnny Decimal’s 10 - Projects?” Both abstractions agree: 10 - Projects is one composite path segment on disk. The regex captures it via ([^/]+) as one group; a template captures it via one slot like {jdEntry}. Neither view splits the 10 - prefix from the rest — that’s not what either pattern is doing. The prefix is sort-order metadata baked into the folder name; turning 10 - Projects into the tag-side projects is the job of an existing transform primitive: numberPrefixHandling: 'strip'. That runs after match/extract, in the transform pipeline. Both regex rules and template rules are agnostic to it — the transform primitive does the work either way. So the pattern abstraction question (regex vs. template) and the prefix-stripping question (number-prefix transform) are orthogonal. Templates don’t change what numberPrefixHandling does; they change what the engine knows about the slot it’s applied to.

Three components, three asymmetries:

Filesystem is a strict hierarchy — every file has exactly one path, exactly one parent folder. The OS enforces this; we don’t get to negotiate it.
Tag namespace is a polyhierarchy — the same file can be reachable via many tag paths (#projects/web, #ritual/auth, #q4-2026 all addressing the same note.md). Tags compose freely; the same nested term (#projects/web) can sit under multiple roots without contradiction.
Sync engine mediates: each rule is a (folder pattern, tag pattern, transfer-op) triple that describes a correspondence. Whether that correspondence is invertible — i.e. whether forward(inverse(t)) === t for every tag t the rule produces — is the question at the heart of this document.
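The invertibility question can be phrased as an executable check. The following is a minimal sketch, assuming a hypothetical SyncRule shape with forward and inverse functions (illustrative names, not the plugin's API) and ignoring case transforms:

```typescript
// Hypothetical rule shape for illustration; not the plugin's real MappingRule.
interface SyncRule {
  forward: (folderPath: string) => string; // folder path -> tag
  inverse: (tag: string) => string;        // tag -> folder path
}

// Identity-style rule: Projects/{slug} <-> #projects/{slug} (case handling omitted)
const identityRule: SyncRule = {
  forward: (p) => '#projects/' + p.replace(/^Projects\//, ''),
  inverse: (t) => 'Projects/' + t.replace(/^#projects\//, ''),
};

// Marker-only rule: everything under Inbox/ collapses to #inbox
const markerRule: SyncRule = {
  forward: () => '#inbox',
  inverse: () => 'Inbox',
};

// A rule round-trips on a sample set when inverse(forward(p)) === p for every folder
// and forward(inverse(t)) === t for every tag the rule produced.
function roundTrips(rule: SyncRule, folders: string[]): boolean {
  return folders.every((p) => {
    const tag = rule.forward(p);
    return rule.inverse(tag) === p && rule.forward(rule.inverse(tag)) === tag;
  });
}

console.log(roundTrips(identityRule, ['Projects/Web', 'Projects/Web/auth'])); // true
console.log(roundTrips(markerRule, ['Inbox/today.md', 'Inbox/2024/Q4/note.md'])); // false
```

A sample-based check like this can only refute invertibility, not prove it; that gap is what the rest of this document is about.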

Direct answer to “is full bidirectional determinism always achievable?”: no, and that’s by design. Identity-style rules (PARA’s Projects/{slug} ↔ #projects/{slug}) round-trip perfectly. Lossy operations — truncation-with-drop, marker-only, promotion-to-root — deliberately throw information away in one direction; the inverse can’t reconstruct what was dropped. The abstraction we want isn’t one that forces determinism (that would just disallow useful rules), but one that makes the per-rule determinism status visible at authoring time, instead of being asserted as metadata afterward. That’s the criterion path templates with typed slots satisfy and bare regex doesn’t.

Vocabulary borrowed from prior fields

These terms are load-bearing for the rest of this document. They’re pulled from information science (classification theory, knowledge organization), formal language theory, and bidirectional programming research — established terms, not invented for this plugin.

Strict hierarchy vs. polyhierarchy

Classification-theory terms. A strict hierarchy (one parent per child — a tree) enforces single parentage at every level. A polyhierarchy (multi-parent structure — same item sits under several broader categories at once) permits a directed acyclic graph where the same node can sit under several broader categories without ambiguity. Library Subject Heading systems (LCSH = Library of Congress Subject Headings; MeSH = Medical Subject Headings) are explicitly polyhierarchical for exactly the reasons Obsidian tags are: real-world concepts don’t fit into one parent category. Folder-tag-sync’s reason-to-exist is bridging a strict-hierarchy primitive (filesystem) to a polyhierarchical addressing system (tags).

Pre-coordination vs. post-coordination

Already covered in philosophy. Briefly: a pre-coordinated descriptor fuses concepts into a single hierarchical token (#projects/q4-roadmap is one term carrying two concepts joined by subordination). A post-coordinated descriptor splits concepts into independent tags applied together (#projects AND #q4 AND #roadmap). Folder paths are inherently pre-coordinated; tag systems can be either.

Syntax vs. semantics

Regex captures syntax: does this character sequence satisfy this pattern? It says nothing about semantics: which part of the matched sequence is the layer (where in the tree the rule fires), which part is a variable (capturable, recoordinable content), what name the variable carries, what role it plays in the rule. Phase G’s folderAnchor field exists because the syntactic representation hid the semantic question “where does this rule anchor?” Path templates with named slots are the more direct encoding: the literal segments are syntax, the {name} slots carry semantics, and the conversion between them is mechanical rather than interpretive.

Plain-English version: regex tells the engine “does this string look right?” Templates tell the engine “what role does each piece of this string play?” Both produce the same yes/no match answer. The difference is what the system can do after the match — generate the inverse, surface a slot to the user, run a per-slot transform, prove bijection — because templates know which piece is which.

Lossy vs. lossless transformation

Information-theoretic. A transformation is lossless when the input can be perfectly reconstructed from the output; lossy when it can’t. Identity rules are lossless in both directions (folder ↔ tag preserves content). truncation with tailHandling: 'drop' is lossy folder-to-tag (segments past the depth cap are erased) and partial tag-to-folder (the inverse can only restore the depth-capped prefix). marker-only collapses any folder under the entry into a single fixed tag — maximally lossy in the folder→tag direction.

IDENTITY · lossless both ways

Projects/Web
→ forward →
#projects/web
every character round-trips · forward and inverse are perfect inverses

TRUNCATION (depth=3, tailHandling: drop) · lossy forward

Projects/Web/Auth/Backend/details
→ forward →
#projects/web/auth
"Backend/details" dropped · inverse can recover Projects/Web/Auth, not the discarded segments

MARKER-ONLY · maximally lossy forward

Inbox/today.md
Inbox/yesterday.md
Inbox/2024/Q4/note.md
→ all forward →
#inbox
(one tag for all)
many-to-one by design · inverse can only recover the entry folder Inbox/, not the specific path that produced any given tagged file
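The three panels can be reproduced with a few lines of code. This is a sketch only: the helper names are hypothetical, and lowercasing stands in for whatever case transform the real rule applies.

```typescript
// Illustrative forward functions for the three panels above (hypothetical helpers).
const toTagSegments = (folderPath: string): string[] =>
  folderPath.split('/').map((seg) => seg.toLowerCase());

// identity: keep every segment
const identityForward = (p: string): string => '#' + toTagSegments(p).join('/');

// truncation with tailHandling 'drop': keep the first `depth` segments, discard the rest
const truncateForward = (p: string, depth: number): string =>
  '#' + toTagSegments(p).slice(0, depth).join('/');

// marker-only: every path under the entry maps to one fixed tag
const markerForward = (_p: string): string => '#inbox';

console.log(identityForward('Projects/Web'));                         // #projects/web
console.log(truncateForward('Projects/Web/Auth/Backend/details', 3)); // #projects/web/auth

// Three distinct inbox paths produce a single distinct tag.
const inboxTags = new Set(
  ['Inbox/today.md', 'Inbox/yesterday.md', 'Inbox/2024/Q4/note.md'].map(markerForward),
);
console.log(inboxTags.size); // 1
```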

Collision vs. lossy — distinct failure modes

These two terms describe different problems that both look like “the abstraction is letting me down”:

Collision is a forward-direction problem: two distinct inputs accidentally produce the same output because the rule’s pattern was too permissive.
Lossy is an inverse-direction problem: one output could map back to many inputs by design — the forward transformation deliberately dropped information.

Solving one doesn’t automatically solve the other. A rule can be perfectly bijective on the domain its author had in mind (not lossy) but still cause collisions if its pattern over-matches: Entity/Cybersader/10 - Projects/foo and Entity/Bob/10 - Projects/foo both match an any-segment (?:^|/)10 - Projects rule, whereas a root-anchored ^10 - Projects rule would match neither. Conversely, a marker-only rule is lossy by definition but never collides in this sense: every match goes to the same tag intentionally.

COLLISION (forward problem)

Two distinct folders → same tag by accident. Pattern too permissive.

Entity/Cybersader/10 - Projects/foo
Entity/Bob/10 - Projects/foo
↓ same any-segment (?:^|/)10 - Projects rule fires on both
#10-projects/foo   ← same tag for both

Different intended meanings collapse to the same output. The user's mental model said "Cybersader's projects" and "Bob's projects" should be distinct namespaces. The rule didn't capture that.

LOSSY (inverse problem)

One tag → many possible folders by design. Forward transformation dropped info.

#inbox   ← any of these produced it
↓ inverse cannot uniquely reconstruct
Inbox/
Inbox/today.md
Inbox/2024/Q4/note.md

marker-only rules deliberately collapse many sources to one tag. The forward direction is many-to-one, so the inverse is one-to-many: there's no unique folder to reconstruct.

Plain-English version: Collision is when the abstraction lets two things look the same that shouldn’t. Lossy is when the abstraction deliberately throws information away. Different failure modes; different fixes. The fix for collision is usually a more specific pattern (or capturing the disambiguator into the tag). The fix for lossy is to accept that the rule is many-to-one and design the user experience around that.
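Collisions are mechanical to detect once a rule's forward function exists: group the matched folders by the tag they produce and look for groups larger than one. The sketch below assumes a hypothetical rule that matches 10 - Projects at any segment boundary and discards the parent path.

```typescript
// Hypothetical any-segment rule: match "10 - Projects" wherever it appears, ignore its parent.
const anySegmentRule = /(?:^|\/)10 - Projects\/(.+)$/;

const toTag = (folderPath: string): string | null => {
  const m = anySegmentRule.exec(folderPath);
  return m ? '#10-projects/' + m[1] : null;
};

// Group matched folders by produced tag; any group with more than one source is a collision.
function findCollisions(folders: string[]): Map<string, string[]> {
  const byTag = new Map<string, string[]>();
  for (const folder of folders) {
    const tag = toTag(folder);
    if (tag === null) continue;
    byTag.set(tag, [...(byTag.get(tag) ?? []), folder]);
  }
  return new Map(Array.from(byTag.entries()).filter(([, sources]) => sources.length > 1));
}

const collisions = findCollisions([
  'Entity/Cybersader/10 - Projects/foo',
  'Entity/Bob/10 - Projects/foo',
  'Entity/Bob/10 - Projects/bar',
]);
console.log(collisions.get('#10-projects/foo')); // the two "foo" folders
```

Capturing the disambiguating segment (Cybersader vs. Bob) into the tag would make each group a singleton again.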

Surjection, injection, bijection

Function-theoretic terms (formal vocabulary from set theory for the shapes a function can take) that formalize what “lossless in both directions” means. Given a function f: A → B:

Injective (one-to-one — distinct inputs always go to distinct outputs; no two inputs collide). Folder→tag of an identity rule is injective: two different folder paths produce two different tags.
Surjective (onto: every possible output is reached by some input; no “unused” outputs). Folder→tag of a marker-only rule is surjective onto its (tiny) image: the single marker tag is reached by every matching folder. It is not injective, because many folders share that one tag.
Bijective (both injective and surjective — perfect 1-to-1 correspondence; the function has a true inverse). Identity rules with no destructive transforms are bijective.

The plugin’s bijective: boolean field on rules is asking exactly this question. Today it’s asserted by the typed-spec semantics. The point of Phase H is to make it computable from the (folder template, tag template) pair — slots that appear on both sides round-trip; slots only on one side document a lossy direction.
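A sketch of what "computable from the template pair" could mean in practice, assuming the {name} and {name...} slot syntax used in this document. The function and type names are hypothetical.

```typescript
// Collect slot names from a template such as 'Projects/{project}/{tail...}'.
function slotNames(template: string): string[] {
  const re = /\{([A-Za-z_]\w*)(?:\.\.\.)?\}/g;
  const names: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(template)) !== null) names.push(m[1]);
  return names;
}

type RoundTripStatus = 'bijective' | 'lossy-forward' | 'lossy-inverse' | 'lossy-both';

// Slots on both sides round-trip; a slot present on only one side marks a lossy direction.
function classify(folderTemplate: string, tagTemplate: string): RoundTripStatus {
  const folderSlots = slotNames(folderTemplate);
  const tagSlots = slotNames(tagTemplate);
  const droppedForward = folderSlots.some((s) => !tagSlots.includes(s)); // folder info absent from tag
  const droppedInverse = tagSlots.some((s) => !folderSlots.includes(s)); // tag info absent from folder
  if (droppedForward && droppedInverse) return 'lossy-both';
  if (droppedForward) return 'lossy-forward';
  if (droppedInverse) return 'lossy-inverse';
  return 'bijective';
}

console.log(classify('Projects/{project}/{tail...}', '#projects/{project}/{tail...}')); // bijective
console.log(classify('Inbox/{rest...}', '#inbox'));                                     // lossy-forward
```

Slot overlap is a necessary condition rather than a sufficient one: a destructive per-slot transform can still break the round trip even when both sides name the same slots.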

Homomorphism, isomorphism

Abstract-algebra terms (vocabulary from the math of structure-preserving maps between systems). A homomorphism is a structure-preserving map in one direction (e.g., the function preserves “this thing is a sub-part of that thing”). An isomorphism is a homomorphism with a structure-preserving inverse, so the two structures are formally interchangeable. A perfectly bidirectional rule is asking to be an isomorphism between a folder shape and a tag shape. The lens calculus’s three round-trip laws are the closest formal version of “this rule defines an isomorphism on its domain”: GetPut (putting back what you got gives the original), PutGet (getting after putting gives what you put), and PutPut (two puts in a row leave the same result as the second put alone).
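The three laws are easy to state as code. A minimal sketch, assuming a hand-rolled SimpleLens interface (hypothetical, not a library type) and checking the laws on concrete samples rather than proving them:

```typescript
// A minimal lens: get extracts a view from a source, put writes a view back into a source.
interface SimpleLens<S, V> {
  get: (source: S) => V;
  put: (source: S, view: V) => S;
}

// Hypothetical example: the source is a folder path under Projects/, the view is the tag.
const projectsLens: SimpleLens<string, string> = {
  get: (folder) => '#projects/' + folder.replace(/^Projects\//, ''),
  put: (_folder, tag) => 'Projects/' + tag.replace(/^#projects\//, ''),
};

// Check the three round-trip laws for one source and two candidate views.
function checkLaws<S, V>(lens: SimpleLens<S, V>, s: S, v1: V, v2: V) {
  return {
    getPut: lens.put(s, lens.get(s)) === s,                    // put back what you got
    putGet: lens.get(lens.put(s, v1)) === v1,                  // get what you just put
    putPut: lens.put(lens.put(s, v1), v2) === lens.put(s, v2), // the last put wins
  };
}

const lawResults = checkLaws(projectsLens, 'Projects/Web', '#projects/Api', '#projects/Docs');
console.log(lawResults); // { getPut: true, putGet: true, putPut: true }
```

A marker-only rule written in the same shape would fail PutGet, since its put has nowhere to store an arbitrary tag.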

Putting them together: regex vs. templates, in vocabulary

Question	Regex view	Template view
Where does the rule anchor?	Buried in pattern syntax (`^X` vs `(?:^|/)X` vs `^P/X`); semantics inferred from syntax shape	Literal prefix of the template is the anchor
What part is variable?	Unnamed capture group `(.+)` — positional, no role	Named slot `{slug}` — role visible at authoring time
Is the rule bijective?	Asserted via separate `bijective` metadata field	Computable from slot overlap on both sides
Is one direction lossy?	Asserted via `cardinality` metadata	Computable from which slots appear in which template
Forward composition (folder → tag)	Compile regex, match, position-extract, transform	Compile template, slot-extract, instantiate target template
Inverse composition (tag → folder)	Hand-rolled string surgery (entry-strip, anchor prepend)	Same as forward, with templates swapped

In short: regex hides semantics inside syntax; templates surface them. Both compile to the same runtime regex, but at authoring time the template view answers directly the questions the regex view forces us to answer via metadata.

What surfaced

Phase G made layer a first-class concept on rules — every rule now declares whether it anchors at vault root, at any path-segment boundary, or under a specific parent prefix. The motivating bug was concrete: the user’s dev vault has Johnny Decimal folders nested under fixtures/10 - Projects, but the JD pack’s ^\d{2} - X pattern requires path-start. Preview showed 0 matches; the rule was correctly imported but anchored to a layer the vault didn’t use.

Adding the folderAnchor field fixed the immediate bug. But while writing it, several pieces of code felt like they were doing the wrong job:

src/engine/inferTyped.ts:inferEntryFromPattern is hand-rolled regex parsing — it strips known suffixes ((?:/|$), (?:/.*)?$), checks for leftover metacharacters, returns a string. We’re parsing our own emitted regex back into structure.
src/engine/applyTransfer.ts:buildEntryStripPattern builds another regex to strip the entry portion from a matched path. Three branches, one per anchor mode.
The cardinality and bijective fields on MappingRule are computed from typed-spec semantics — not derivable from the regex pair. The regex doesn’t know whether the rule is bidirectional.

The pattern is the same in each case: regex captures syntax (this string matches that regex), not semantics (this rule lives at this layer, has these named parts, round-trips to that target). The semantic information that makes folder-tag-sync interesting — entry points, anchors, slot extraction, transform composition — has been re-encoded around the regex rather than expressed in the regex itself.

This entry is the research that follows from noticing that.

The tension — where regex leaks

Regex does two distinct jobs in the plugin today, and we’re conflating them:

Membership predicate — does this folder path satisfy the rule? (a gate)
Structural extractor — given a matching path, what are its parts? (a parser)

For (1), regex is fine. RegExp.test() is fast, well-understood, escapable. For (2) the seams show:

Job	Current implementation	Why it’s awkward
Strip the entry-point prefix from a matched path so the remainder can be recoordinated	`folderPath.replace(new RegExp(\`^${entry}/?\`), '')` (anchor-aware variant added in Phase G)	The pattern matched the path, but we run a different regex to extract structure. Two passes, two sources of truth.
Recover the entry literal from a derived rule, so the guided modal can show it as a form field	`inferEntryFromPattern(rule.folderPattern)` (pattern-shape parsing)	We emit `(?:/|$)` and then parse our own emitted regex back into structure.
Decide whether two rules round-trip without information loss	`cardinality` + `bijective` fields, computed from `TransferOp` shape	The regex pair `(folderPattern, tagPattern)` doesn’t tell you. We compute it from the typed-spec semantics, then attach as metadata.

The third one is the deepest leak: bijection is asserted, not proven. We say a rule is bijective because the typed model says identity-transfer + entry-points-on-both-sides ⇒ round-trip. But there’s no automated check that the regex pair is consistent with that claim. If a rule pack author writes folderPattern: '^Projects(?:/|$)' and tagPattern: '^archive/', the metadata might still claim bijective: true if the typed fields say so.
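One way to picture the missing automated check: run sample folders through the rule and confirm that the produced tags satisfy the rule's own tagPattern. This is a sketch under assumptions. The PatternRule shape is simplified, and forwardIdentity is a hypothetical stand-in for the plugin's identity transfer.

```typescript
// Simplified, hypothetical rule shape; field names follow this document.
interface PatternRule {
  folderPattern: string;
  tagPattern: string;
  folderEntry: string;
  tagEntry: string;
  bijective: boolean;
}

// Identity transfer as sketched here: swap the folder entry for the tag entry, keep the remainder.
function forwardIdentity(rule: PatternRule, folderPath: string): string {
  const remainder = folderPath.slice(rule.folderEntry.length).replace(/^\//, '');
  return remainder ? rule.tagEntry + '/' + remainder : rule.tagEntry;
}

// A claimed-bijective rule should at least produce tags that its own tagPattern accepts.
function claimIsConsistent(rule: PatternRule, sampleFolders: string[]): boolean {
  const folderRe = new RegExp(rule.folderPattern);
  const tagRe = new RegExp(rule.tagPattern);
  return sampleFolders
    .filter((p) => folderRe.test(p))
    .every((p) => tagRe.test(forwardIdentity(rule, p)));
}

const baseRule = { folderPattern: '^Projects(?:/|$)', folderEntry: 'Projects', tagEntry: 'projects', bijective: true };
const consistentRule: PatternRule = { ...baseRule, tagPattern: '^projects/' };
const inconsistentRule: PatternRule = { ...baseRule, tagPattern: '^archive/' };

console.log(claimIsConsistent(consistentRule, ['Projects/Web']));   // true
console.log(claimIsConsistent(inconsistentRule, ['Projects/Web'])); // false
```

The second rule still carries bijective: true in its metadata; only a check against the pattern pair exposes the mismatch.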

For PARA / JD / SEACOW, the typed model is enough — these rules are simple enough that semantics + heuristics gets us there. But the abstraction is leaking, and Phase G’s folderAnchor field is the most recent leak made visible.

Evaluation criteria

What does “the right abstraction” need to satisfy here?

Bijectivity by construction. Forward + inverse pairs that compose, with round-trip consistency provable (or at least checkable) from the rule’s structure alone — not from a separate metadata field.
User authoring cost. The guided-modal must remain learnable for non-regex users; the abstraction can’t require knowing recursion schemes or category theory.
Performance. 10k+ file vault scans must stay fast. The abstraction should compile to something close to a regex (or be one) at runtime.
Composability. Rule packs that nest. SEACOW outer wrapping PARA wrapping individual projects. The abstraction should support a rule pack that says “I live inside whatever pack scopes me.”
Graceful degradation for power users. When a rule shape is too complex for the abstraction, raw regex stays available as an escape hatch. The advanced editor remains the power-user surface.
Reversibility limits. Some rules are intentionally lossy (marker-only, promotion-to-root). The abstraction must let lossy be a first-class property — not a bug to design around.

Prior art

Lenses & bidirectional programming

Combinators for Bi-Directional Tree Transformations: A Linguistic Approach to the View-Update Problem (Foster, Greenwald, Moore, Pierce, Schmitt; POPL 2005, TOPLAS 2007). The seminal academic work. A lens is a forward + backward pair (get, put) satisfying round-trip laws that guarantee consistency: GetPut and PutGet for well-behaved lenses, plus the optional PutPut for very well-behaved ones. Lenses compose: a complex bidirectional transformation is built from primitive lenses combined with sequencing, mapping, conditional, etc.

Implementations:

Boomerang — the canonical bidirectional language built on the lens calculus
Haskell lens (Kmett) — gold-standard functional optics
monocle-ts (Giulio Canti): TypeScript optics, a partial port of Scala’s Monocle rather than a profunctor encoding; closest fit if we wanted to vendor an existing JS-ecosystem library
partial-lenses — JS lenses with first-class handling for missing fields (relevant to optional slots)

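// monocle-ts shape: a sketch of what a lens-based PARA rule could look like.
// FolderPath and Tag are illustrative types, not the plugin's real model.
import { Iso, Lens } from 'monocle-ts';

interface FolderPath { prefix: string; slug: string }    // { prefix: 'Projects', slug: 'Web' }
type Tag = string;                                        // '#projects/Web'

const slugLens = Lens.fromProp<FolderPath>()('slug');     // focus on the captured part
const tagIso = new Iso<string, Tag>(                      // slug ↔ tag, invertible by construction
  (slug) => `#projects/${slug}`,
  (tag) => tag.replace(/^#projects\//, ''),
);

// Compose forward (folder → tag); the inverse (set) falls out of the composition:
const paraProjects = slugLens.composeIso(tagIso);         // Lens<FolderPath, Tag>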

Folder-tag-sync’s transfer and inverseTransfer fields are an informal lens. Making them formal would mean: each rule is literally a lens; sync is lens.get; reverse-sync is lens.set; the laws guarantee bidirectional consistency.

Production lens implementations worth studying:

Augeas — a C library that edits Linux config files via lenses. Each config-file format (/etc/hosts, /etc/sshd_config, etc.) has a hand-written lens that round-trips between the on-disk text format and a structured tree. The most production-tested lens implementation in real-world software. Read source at github.com/hercules-team/augeas. Lens definitions live in lenses/*.aug files — DSL syntax like let lns = (record . eol)* that compiles to bidirectional get/put pairs. Closest precedent to “we have on-disk artifacts (config files / vault folders) and want a structured edit/query interface.”
Unison file synchronizer — Benjamin Pierce’s earlier project (1995+), the system that motivated the original lens research. Bidirectional file synchronization: edits from either side propagate, conflicts are surfaced, the sync runs to fixpoint. Same author who later wrote the lens papers. Folder-tag-sync’s bidirectional sync sits in nearly the same problem space — user edits in either world (filesystem vs. tag namespace), system propagates.

The lens calculus has been extended in several useful directions:

Quotient lenses (Foster, Pilkiewicz, Pierce; ICFP 2008): lenses up to equivalence. Useful for transformations that should be insensitive to whitespace, ordering, or case differences (exactly the situation the plugin’s caseTransform creates).
Edit lenses (Hofmann, Pierce, Wagner; POPL 2012): propagate edits rather than complete states. Maps well onto folder-tag-sync’s sync model: don’t recompute everything when one folder moves; propagate the move as a delta.
Putback-based bidirectional programming (Hu, Mu, Takeichi — JFP 2014) — start from the put (inverse) direction; the get falls out. Often more intuitive for non-academics; matches how rule pack authors actually think (“I want the tag side to look like this; what folder produces it?”).
Bidirectional Transformations Workshop (BX) — annual academic venue. Living bibliography of bx research; useful for finding more recent papers.
Triple Graph Grammars (TGG) — bidirectional graph transformation, mostly used in model-driven engineering. Heavier-weight than what folder-tag-sync needs but worth noting as a parallel lineage.

// Hand-written sketch (no library):
const paraProjectsLens: Lens<FolderPath, Tag> = compose(
  prefixLens('Projects'),       // get: strip 'Projects/' ; put: prepend 'Projects/'
  caseLens('Title', 'kebab'),   // get: kebab-ize    ; put: title-ize
  tagPrefixLens('projects'),    // get: prepend '#projects/' ; put: strip
);
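A runnable version of the hand-written sketch, under simplifying assumptions: the lens here is a plain get/put pair over strings whose put ignores the old source, and every helper name is hypothetical. The case step only round-trips for inputs that are already title-cased, which is the quotient-lens caveat in miniature.

```typescript
// Minimal string-to-string lens; put ignores the old source in this simplified form.
interface StrLens {
  get: (s: string) => string; // forward
  put: (v: string) => string; // inverse
}

// Sequential composition: get runs left to right, put runs right to left.
const composeLenses = (...lenses: StrLens[]): StrLens => ({
  get: (s) => lenses.reduce((acc, l) => l.get(acc), s),
  put: (v) => lenses.reduceRight((acc, l) => l.put(acc), v),
});

const prefixLens = (entry: string): StrLens => ({
  get: (s) => s.slice(entry.length + 1), // strip 'Projects/'
  put: (v) => entry + '/' + v,           // prepend 'Projects/'
});

// Title Case <-> kebab-case; invertible only up to normalization.
const caseLens: StrLens = {
  get: (s) => s.toLowerCase().replace(/ /g, '-'),
  put: (v) => v.replace(/-/g, ' ').replace(/\b[a-z]/g, (c) => c.toUpperCase()),
};

const tagPrefixLens = (ns: string): StrLens => ({
  get: (s) => '#' + ns + '/' + s,    // prepend '#projects/'
  put: (v) => v.slice(ns.length + 2), // strip '#projects/'
});

const paraProjectsLens = composeLenses(prefixLens('Projects'), caseLens, tagPrefixLens('projects'));

console.log(paraProjectsLens.get('Projects/Web Auth'));  // #projects/web-auth
console.log(paraProjectsLens.put('#projects/web-auth')); // Projects/Web Auth
```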

Asymmetric and symmetric lenses

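Hofmann, Pierce, Wagner: Symmetric Lenses (POPL 2011). Classic lenses are asymmetric: the source holds everything and the view may discard information, so get can be lossy while put restores the missing parts from the old source. Symmetric lenses relax that assumption and let each side hold information the other lacks. Maps directly onto our cardinality field: lossy direction = many:1, lossless = 1:1.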

This is the most directly relevant academic frame for folder-tag-sync. Folders → tags is sometimes lossy (truncation drops segments below the cap; marker-only collapses any structure under the entry to a fixed term). Tags → folders is correspondingly partial (you can’t recover what truncation dropped). Asymmetric lenses formalize that exactly.

BiYacc / BiGUL

BiGUL (Ko, Zan, Hu) is the Bidirectional Generic Update Language, a putback-based core language; BiYacc builds on it to derive a parser and a printer from a single grammar. Write the grammar once, get parse + print for free with consistency guarantees. More tractable for implementation than full lenses; pattern-matches into how rule packs already feel (declarative, structural).

If we replaced regex with a tiny grammar — Projects/ segment / rest with segment and rest as named bindings — we’d get parse + print symmetrically.

Path templates with named slots

The “least surprising” evolution, and probably the right first step. URL routing systems have used this for decades — same primitive, well-defined limits, familiar to power users.

// Express and NestJS (default adapter) use path-to-regexp underneath; Fastify's find-my-way router accepts the same :name syntax
app.get('/users/:userId/posts/:postId', (req, res) => {
  // req.params.userId, req.params.postId
});

# FastAPI — slots typed at the function signature
@app.get('/users/{user_id}/posts/{post_id}')
async def read_post(user_id: int, post_id: int): ...

// URL Pattern Standard (browser-native in Chromium; polyfill exists)
const pattern = new URLPattern({ pathname: '/users/:userId/posts/:postId' });
const result = pattern.exec({ pathname: '/users/42/posts/100' });
//   → { pathname: { groups: { userId: '42', postId: '100' } } }

# Next.js / SvelteKit / Astro — file-system as syntax
pages/users/[userId]/posts/[postId].tsx
pages/blog/[...slug].tsx              ← glob: catches arbitrary depth
pages/shop/[[...filters]].tsx          ← optional glob

# OpenAPI 3 — language-neutral path-templating standard
paths:
  /pets/{petId}:
    parameters:
      - name: petId
        in: path
        required: true
        schema: { type: string }

# gRPC HTTP transcoding — path templates as the REST↔RPC bridge
rpc GetBook(GetBookRequest) returns (Book) {
  option (google.api.http) = {
    get: "/v1/{name=publishers/*/books/*}"
  };
}

# Rails — :name slots, *splat for multi-segment
get '/users/:user_id/posts/:post_id', to: 'posts#show'
get '/files/*path', to: 'files#serve'                  # *path captures rest

// Symfony — {name} braces with constraint syntax
#[Route('/users/{userId}/posts/{postId}', requirements: ['userId' => '\d+'])]
public function show(int $userId, int $postId) { ... }

// Spring Boot — {name} braces with PathVariable annotation
@GetMapping("/users/{userId}/posts/{postId}")
public Post show(@PathVariable Long userId, @PathVariable Long postId) { ... }

# Phoenix — :name colons, route definitions in compile-time DSL
scope "/api", AppWeb do
  get "/users/:user_id/posts/:post_id", PostController, :show
end

// Tanstack Router — typed slots, file-system + code-defined routes
const postRoute = createRoute({
  path: '/users/$userId/posts/$postId',  // $name slot syntax
  parseParams: (params) => ({ userId: Number(params.userId), postId: Number(params.postId) }),
});

// Hono — :name slots, ergonomic for edge runtimes
app.get('/users/:userId/posts/:postId', (c) => {
  const { userId, postId } = c.req.param();
});

Weaker than full lenses (no formal laws, composition is informal), but covers the vast majority of folder-tag-sync use cases. The semantic information that regex hides — what part is the layer, what part is the variable, what name does it carry — becomes explicit in the syntax. The set of primitives is small enough to fit on one card: literal segments, single-segment slots {name}, glob slots {name...}, optional slots {name?}.
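To show how small that primitive set is, here is a sketch of a compiler from the four primitives to a RegExp with named groups. The {name}, {name...} and {name?} forms follow this document; compileTemplate itself is a hypothetical helper, not existing plugin code.

```typescript
// Compile a path template into a RegExp with named capture groups.
// Supported: literal segments, {name} (one segment), {name...} (one or more), {name?} (optional).
function compileTemplate(template: string): RegExp {
  const escapeLiteral = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const parts = template.split('/').map((segment, i) => {
    const sep = i === 0 ? '' : '/';
    const slot = /^\{([A-Za-z_]\w*)(\.\.\.|\?)?\}$/.exec(segment);
    if (!slot) return sep + escapeLiteral(segment);                  // literal segment
    const [, slotName, modifier] = slot;
    if (modifier === '...') return `${sep}(?<${slotName}>.+)`;       // glob slot
    if (modifier === '?') return `(?:${sep}(?<${slotName}>[^/]+))?`; // optional slot
    return `${sep}(?<${slotName}>[^/]+)`;                            // single-segment slot
  });
  return new RegExp('^' + parts.join('') + '$');
}

const projectTemplate = compileTemplate('Projects/{project}/{tail...}');
const matched = projectTemplate.exec('Projects/Web Auth/oauth-flow/notes.md');
console.log(matched?.groups); // project = "Web Auth", tail = "oauth-flow/notes.md"
```

The compiled output is still an ordinary regex, so the membership test keeps its current cost; the slot names simply survive the match as group names.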

Syntax convergence. Looking across the dozen+ frameworks above, two slot conventions dominate:

Convention	Used by	Pros	Cons
`{name}` braces	OpenAPI, FastAPI, Spring, Symfony, gRPC	Reads as “data shape” — intuitive for non-developers	Conflict with template-string interpolation in some langs (Bash, JS)
`:name` colons	Express, NestJS, React Router, Phoenix, Hono, Rails	Less escape-character pressure, idiomatic in URL conventions	Looks like a CSS pseudo-class or YAML key to outsiders

$name (Tanstack), [name] (Next.js, Astro), *name (Rails splat) are minority dialects. For folder-tag-sync, {name} braces feel right — our user is a knowledge-worker authoring rule packs, not a backend engineer; the data-shape framing of {slug} and {rest...} is closer to the typed model already in place.

Template engines (forward / instantiation half)

Path-template matching (the “get” half) is well-explored above. The instantiation half (the “put”) has its own decades-deep prior art under “template engines” — same primitive applied to text generation rather than path matching. Most relevant for folder-tag-sync: when a rule’s tag template is #projects/{slug}/{rest...} and we have slot values { slug: 'Web', rest: 'auth' }, instantiation is exactly what these engines do.

Mustache / Handlebars — {{name}} syntax, deliberately logic-less, implementations in 40+ languages. Pure substitution model.
Go text/template — {{.UserId}} syntax, action grammar. Used in Helm charts, Kubernetes manifests, Hugo. Production-grade compile-once-execute-many.
Jinja2 (Python) — {{ name }} braces with filter pipeline ({{ name|upper }}). Closest analog to “slot value with per-slot transform” — exactly what Phase H’s per-slot transform composition would need.
Liquid (Shopify) — {{ name }} with safe-by-default rendering, used in Jekyll, Eleventy. Same shape as Jinja2.
ERB / EJS — <%= name %> block syntax. Less aligned with our needs (we want declarative templates, not embedded code).

The pattern across these: {{name}} for slot-with-transforms, {name} for slot-only. If we adopt the Jinja-style filter syntax for per-slot transforms in Phase H+ ({slug|kebab}), there’s decades of user familiarity to lean on.

{# Jinja-style per-slot transform — what folder-tag-sync's tag template
   could look like if we extend slots with transform pipelines: #}
folder: 'Projects/{slug}/{rest...}'
tag:    '#projects/{slug|kebab}/{rest|kebab}'
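Instantiating a template written in that style could look like the sketch below. The filter registry and the instantiate function are assumptions for illustration; only the {slot|filter} syntax comes from the example above.

```typescript
// Hypothetical filter registry; only 'kebab' and 'lower' are sketched here.
const slotFilters: Record<string, (value: string) => string> = {
  kebab: (v) => v.trim().toLowerCase().replace(/[\s_]+/g, '-'),
  lower: (v) => v.toLowerCase(),
};

// Fill {slot}, {slot...} and {slot|filter|filter} placeholders from a slot-value map.
function instantiate(template: string, slots: Record<string, string>): string {
  return template.replace(
    /\{\s*([A-Za-z_]\w*)(?:\.\.\.)?\s*((?:\|\s*\w+\s*)*)\}/g,
    (_whole, slotName: string, pipeline: string) => {
      const value = slots[slotName];
      if (value === undefined) throw new Error(`missing slot value: ${slotName}`);
      return pipeline
        .split('|')
        .map((f) => f.trim())
        .filter((f) => f.length > 0)
        .reduce((acc, f) => {
          const fn = slotFilters[f];
          if (!fn) throw new Error(`unknown filter: ${f}`);
          return fn(acc); // apply filters left to right
        }, value);
    },
  );
}

const rendered = instantiate('#projects/{slug|kebab}/{rest|kebab}', {
  slug: 'Web Auth',
  rest: 'OAuth Flow/Notes',
});
console.log(rendered); // #projects/web-auth/oauth-flow/notes
```

Note that kebab is destructive (case and spacing are lost), so a rule using it is no longer invertible by slot overlap alone.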

Knowledge-management adjacent tools

How do other note-taking and file-organization tools handle the same problem (declaring how files map to a different addressable namespace)? Useful comparison points:

TagSpaces — embeds tags in filenames: note[tag1 tag2].md. Effectively a path template note[{tags...}].md where the slot lives in the filename rather than the folder path. Same primitive, different placement. The bidirectional sync is implicit (rename the file, tags update; edit tags, filename updates).
Hazel (macOS) — rule-based filing tool. Rules are if-then chains: “if filename matches X, move to folder Y”. Forward-only (no inverse), but the condition language is regex-on-paths — exactly the primitive folder-tag-sync uses, applied to a different domain.
Logseq and Roam Research — block-based knowledge-graph tools. Block references (((block-id))) are a different primitive than path templates, but the design tension is the same: how does the underlying file/block layout connect to the user-facing knowledge graph?
DEVONthink — rule-based document filing with regex conditions and AI-assisted classification. A useful reminder that “deterministic regex rules” and “AI suggestions” can coexist — DEVONthink layers them cleanly.
Tinderbox — “smart adornments” that auto-tag notes based on declarative pattern conditions. Mark Bernstein has been refining this since 2002. Worth studying as a long-evolved design point.
Obsidian Templater plugin — <% tp.file.title %> syntax for note templating. Forward-only (a template renders into a new note); not bidirectional. But the syntax convention sits adjacent to where folder-tag-sync’s tag templates would land if we wanted in-vault discoverability.
Maggie Appleton’s research notes on note-taking systems — not a tool, but a thoughtful set of design observations on the folder/tag/link tension that informs the same problem space.

The pattern across these tools: forward-only is the norm; bidirectional is rare and a real differentiator. Folder-tag-sync’s commitment to bidirectional sync is itself a design choice worth highlighting in the docs (and Phase H makes it more rigorous).

Glob patterns and pathspec

Adjacent prior art worth mentioning: shell-style globs are the de facto path-pattern language across the Unix ecosystem. Less expressive than templates with named slots (no captures, position-only), but the syntax conventions are deeply familiar:

micromatch / minimatch — the npm-ecosystem matchers behind ESLint, Prettier, file-glob libraries. Support ** (globstar — multi-segment), * (single-segment), ? (single char), {a,b} (alternation), !(...) (negation).
Git pathspec and .gitignore patterns: anchored with leading /, recursive with **, negation with ! (gitignore syntax). The mental model that’s already in users’ heads when they author .gitignore.
rsync include/exclude — anchored, ordered rule lists with +/- prefixes. Production-tested for “select these files, skip those” matching at scale.

Glob doesn’t give us bijection. But the syntax conventions (** for multi-segment globstar, anchoring with /, alternation with {a,b}) are reusable lexicon when we design template syntax — borrow what’s familiar.

Tree pattern languages

XPath, JSONPath, JsonLogic. The vault folder structure IS a tree. Tree pattern languages match on tree shape rather than serialized path strings. Useful if the abstraction needs to handle structural queries beyond linear paths (“all leaf folders under X”, “any folder whose parent matches Y”). Worth holding in reserve; not the immediate target.

Datalog & logic programming

Datalog, Soufflé, or any logic-based bidirectional rules engine. Maximum expressiveness — bidirectional reasoning falls out of the relational model essentially for free. Almost certainly overkill for an Obsidian plugin. Useful as a north-star (“what would the most powerful version look like?”), not a near-term implementation target.

Proposed evolution — bidirectional path templates with typed slots

What would folder-tag-sync’s rule data model look like if templates replaced regex as the user-facing primitive?

Today (Phase G):

{
  folderEntry: 'Projects',
  folderAnchor: { under: 'fixtures' },
  // ... derived: folderPattern: '^fixtures/Projects(?:/|$)'
  tagEntry: 'projects',
  // ... derived: tagPattern: '^projects/'
  transfer: { op: 'identity' },
}

Phase H sketch:

{
  folderTemplate: 'fixtures/Projects/{rest...}',
  tagTemplate:    '#projects/{rest...}',
  // bijection automatic from slot overlap
}

Slots are written as {name} (single segment) or {name...} (one or more — glob). Both templates compile to regex internally; sync engines still consume the compiled folderPattern for matching speed. The slot data flows in both directions:

Forward (folder → tag):
  fixtures/Projects/Web/auth-rewrite
  ───────  ──────── ─────────────────
  literal  literal  {rest...}
                        │
                        ▼ slot extraction
                    rest = "Web/auth-rewrite"
                        │
                        ▼ instantiate tag template
                    #projects/Web/auth-rewrite
                                 ─────────────
                                 {rest...} filled

Inverse (tag → folder):
  #projects/Web/auth-rewrite
   ──────── ─────────────────
   literal  {rest...}
              │
              ▼ slot extraction
          rest = "Web/auth-rewrite"
              │
              ▼ instantiate folder template
          fixtures/Projects/Web/auth-rewrite
                            ─────────────────
                            {rest...} filled
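
The two diagrams above can be sketched in code. This is a minimal illustration, not the plugin's implementation: the function names compileTemplate, extractSlots and instantiateTemplate are taken from the migration plan, but their signatures, the Segment type and the CompiledTemplate fields are assumptions made for this example. It covers only {name} and {name...} slots that occupy a whole path segment.

```typescript
// Hypothetical shapes: the function names come from the migration plan,
// but the signatures and the CompiledTemplate fields are assumptions.
type Segment =
  | { kind: 'literal'; text: string }
  | { kind: 'slot'; name: string; glob: boolean };

interface CompiledTemplate {
  segments: Segment[];
  slotNames: string[]; // in capture-group order
  regex: RegExp;
}

const SLOT = /^\{([A-Za-z_][A-Za-z0-9_]*)(\.\.\.)?\}$/;

function compileTemplate(template: string): CompiledTemplate {
  const segments = template.split('/').map((part): Segment => {
    const m = SLOT.exec(part);
    if (!m) return { kind: 'literal', text: part };
    return { kind: 'slot', name: m[1] ?? '', glob: m[2] !== undefined };
  });
  const slotNames: string[] = [];
  const source = segments
    .map((seg) => {
      // Literal segments are regex-escaped so "." or "(" match themselves.
      if (seg.kind === 'literal') return seg.text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
      slotNames.push(seg.name);
      // {name} is exactly one segment; {name...} is one or more segments.
      return seg.glob ? '([^/]+(?:/[^/]+)*)' : '([^/]+)';
    })
    .join('/');
  return { segments, slotNames, regex: new RegExp(`^${source}$`) };
}

function extractSlots(compiled: CompiledTemplate, path: string): Record<string, string> | null {
  const m = compiled.regex.exec(path);
  if (!m) return null;
  const slots: Record<string, string> = {};
  compiled.slotNames.forEach((name, i) => {
    slots[name] = m[i + 1] ?? '';
  });
  return slots;
}

function instantiateTemplate(compiled: CompiledTemplate, slots: Record<string, string>): string {
  return compiled.segments
    .map((seg) => {
      if (seg.kind === 'literal') return seg.text;
      const value = slots[seg.name];
      if (value === undefined) throw new Error(`missing slot: ${seg.name}`);
      return value;
    })
    .join('/');
}

const folder = compileTemplate('fixtures/Projects/{rest...}');
const tag = compileTemplate('#projects/{rest...}');

// Forward: folder path -> slots -> tag
const slots = extractSlots(folder, 'fixtures/Projects/Web/auth-rewrite');
const tagPath = slots ? instantiateTemplate(tag, slots) : null;
// tagPath is '#projects/Web/auth-rewrite'

// Inverse: tag -> slots -> folder path
const back = extractSlots(tag, '#projects/Web/auth-rewrite');
const folderPath = back ? instantiateTemplate(folder, back) : null;
// folderPath is 'fixtures/Projects/Web/auth-rewrite'
```

Both directions use the same two functions; only the roles of the two compiled templates swap. Optional slots, per-slot transforms and slots embedded inside a segment are deliberately left out of this sketch.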

What this gets us

Bijection visible at authoring time. Slots that appear on both sides round-trip. Slots only on one side are derivation-only or capture-only — the structure tells you. No more separate bijective: boolean field.
Anchor concept disappears. The template’s literal prefix IS the anchor. 'Projects/{slug}' is root-anchored; '{base}/Projects/{slug}' is the any-segment case with the single parent segment captured into base (a parent at arbitrary depth would need the glob form '{base...}/Projects/{slug}'); 'fixtures/Projects/{slug}' is the under-prefix case spelled out literally.
Inference becomes parsing instead of regex pattern-matching. No more inferEntryFromPattern hand-rolled string surgery. Re-loading a rule means parsing its template once.
Sync engine gains slot-level access for transforms. Per-slot case rules become possible — {slug} could carry a transform spec (“this slot is kebab-cased on the tag side”). Today’s caseTransform applies globally; templates open up per-slot composition cleanly.
Power-user escape hatch remains. Raw regex stays available in the advanced modal for cases templates can’t express.

What about the existing typed model?

FolderClassifier, TagVocabulary, and TransferOp don’t go away — they’re orthogonal. The template describes the shape; the typed model describes the semantics. A marker-only rule with template 'Capture/Inbox/{rest...}' and a tag template that omits {rest...} (just emits #capture-inbox) is still a marker-only rule — the typed semantics tell you that, the templates tell you the structural mapping.

Cardinality/bijective fall out of the template shapes too: count slots that appear on both sides. All slots shared → bijective. Folder-side has a slot the tag side doesn’t → lossy folder-to-tag direction. The metadata becomes a derivable view over the structure rather than asserted alongside it.
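
The slot-counting rule above can be sketched as a small comparison over slot names. The helper names slotNames and classifyPair and the PairShape result are hypothetical; only the {name} / {name...} syntax comes from this document. Treating a tag-only slot as a loss in the inverse direction is an assumed symmetric reading of the folder-side rule stated above.

```typescript
// Hypothetical helper names; only the slot syntax comes from the text above.
interface PairShape {
  shared: string[];     // slots that appear on both sides and round-trip
  folderOnly: string[]; // captured on the folder side, dropped going to the tag
  tagOnly: string[];    // present on the tag side only (assumed symmetric case)
  bijective: boolean;   // true when every slot is shared
}

function slotNames(template: string): string[] {
  const names: string[] = [];
  const re = /\{([A-Za-z_][A-Za-z0-9_]*)(?:\.\.\.)?\}/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(template)) !== null) {
    const name = m[1] ?? '';
    if (names.indexOf(name) === -1) names.push(name);
  }
  return names;
}

function classifyPair(folderTemplate: string, tagTemplate: string): PairShape {
  const folder = slotNames(folderTemplate);
  const tag = slotNames(tagTemplate);
  const shared = folder.filter((n) => tag.indexOf(n) !== -1);
  const folderOnly = folder.filter((n) => tag.indexOf(n) === -1);
  const tagOnly = tag.filter((n) => folder.indexOf(n) === -1);
  return {
    shared,
    folderOnly,
    tagOnly,
    bijective: folderOnly.length === 0 && tagOnly.length === 0,
  };
}

// Mirror rule: every slot is shared, so the pair round-trips.
const mirror = classifyPair('fixtures/Projects/{rest...}', '#projects/{rest...}');

// Marker-only rule: the tag template omits {rest...}, so folder-to-tag is lossy.
const marker = classifyPair('Capture/Inbox/{rest...}', '#capture-inbox');
```

Slot overlap is a structural check only. Transforms applied to a shared slot can still lose information, so a result of bijective: true here is a necessary condition rather than a proof.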

Reference implementations — what we could borrow

Phase H doesn’t have to be greenfield. Several existing libraries do exactly the compile-template-to-regex + extract-slots + instantiate-from-slots dance. Listed in priority order for fit:

`path-to-regexp` (most directly applicable)

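path-to-regexp — the route-pattern compiler behind Express (and, through its Express adapter, NestJS) and earlier react-router releases. Production-grade and, as an Express dependency, among the most heavily downloaded packages on npm. Exports both directions: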

import { match, compile } from 'path-to-regexp';

// Forward: extract slots from a path
const fn = match('/users/:userId/posts/:postId');
fn('/users/42/posts/100');
// → { path: '/users/42/posts/100', params: { userId: '42', postId: '100' } }

// Inverse: build a path from slot values
const toPath = compile('/users/:userId/posts/:postId');
toPath({ userId: '42', postId: '100' });
// → '/users/42/posts/100'

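In its 6.x syntax, the library handles sugar we’d otherwise build ourselves: optional slots (:name?), repeating segments (:rest+ and :rest*), custom slot patterns (:name(\\d+)), escape characters. The 8.x line reworked this surface (wildcards are written *rest, optional parts use {...} groups, and custom per-slot regex was dropped), so a dependency would need to pin a major version. Either way it compiles down to standard RegExp so sync engines stay pattern-agnostic.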

Tradeoffs: 8KB+ minified, opinionated :name syntax (no {name} braces), tied to web/URL conventions (separator is always /). Could vendor a tiny subset, or pull in as a dependency.

URL Pattern Standard / `urlpattern-polyfill`

URL Pattern Standard — modern web standard, browser-native in Chromium. urlpattern-polyfill for non-browser environments.

const pattern = new URLPattern({ pathname: '/Projects/:slug/:rest*' });
const result = pattern.exec({ pathname: '/Projects/Web/auth-rewrite' });
// result.pathname.groups → { slug: 'Web', rest: 'auth-rewrite' }

Same primitive as path-to-regexp but with a structured spec. Slightly heavier (it’s URL-shaped, not just path-shaped), but stable / standardized / has multi-vendor implementation effort behind it.

`micromatch` (glob-flavored matching)

micromatch — the matcher behind most npm-ecosystem path tooling. Glob-shaped (no named captures), but battle-tested for vault-scale path enumeration:

import micromatch from 'micromatch';

micromatch(['Projects/Web', 'Areas/Health'], 'Projects/**');
// → ['Projects/Web']

// Capture mode (limited; positional, not named):
const captures = micromatch.capture('Projects/*/auth', 'Projects/Web/auth');
// → ['Web']

Useful for the match half of the equation; useless for the inverse (positional capture without named slots can’t reliably round-trip). Worth knowing about as the reference implementation for “vault scan, find candidates” workflows.

`monocle-ts` (lens-flavored, TypeScript-native)

monocle-ts — a TypeScript optics library (a partial port of Scala’s Monocle). A lens here is a getter/setter pair over an already-parsed value rather than a path parser, but lenses compose cleanly. The “what would adopting lenses look like in our actual codebase” reference.

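import { Lens } from 'monocle-ts';

interface ParaPath { entry: 'Projects'; slug: string; rest?: string }

// Example value standing in for the output of a path parser
const parsedPath: ParaPath = { entry: 'Projects', slug: 'Web' };

const slugLens = Lens.fromProp<ParaPath>()('slug');
const slug = slugLens.get(parsedPath);               // 'Web'
const updated = slugLens.set('NewName')(parsedPath); // { entry: 'Projects', slug: 'NewName' }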

Heavier learning curve than path-to-regexp; pays off if we eventually want full lens-law guarantees rather than just slot extraction.

Side-by-side capability matrix

Library / spec	Forward (match)	Inverse (instantiate)	Named slots	Globs	Optional	Per-slot transforms	License	Bundle size	Fit
`path-to-regexp`	✓	✓	✓	✓ (`*`, `+`)	✓ (`?`)	✗	MIT	~8 KB	Best
URL Pattern Standard	✓	partial	✓	✓ (`*`)	✓	✗	Spec	(native)	Good
`urlpattern-polyfill`	✓	partial	✓	✓	✓	✗	Apache-2.0	~30 KB	Heavy
`micromatch`	✓	✗	✗ (positional)	✓ (`**`, `{a,b}`)	✓	✗	MIT	~25 KB	Match-only
`monocle-ts`	✓	✓	n/a (typed access)	✗	n/a	✗	MIT	~15 KB	Heavy / formal
Augeas (C)	✓	✓	✓	✓	✓	✓	LGPL	C lib	Reference only
Mustache/Handlebars	✗	✓	✓	✗	✗	✓ (helpers)	MIT	varies	Inverse-only
Jinja2	✗	✓	✓	✗	✗	✓ (filters)	BSD	Python	Syntax inspiration
Hand-rolled (~50 LOC)	✓	✓	✓	✓	✓	future	n/a	~1 KB	Likely choice

Recommendation

For Phase H’s first cut: write the compiler ourselves (~50 lines as the Migration story section sketches, plus tests). The surface is small enough that vendoring path-to-regexp is overkill, and the rule-pack file format already has its own JSON shape — the slot syntax just needs to round-trip cleanly through that.

What we borrow from the prior art:

Slot syntax: {name} braces (the OpenAPI / FastAPI / Spring / Symfony convention; Mustache’s {{name}} is a close cousin). Reads as “data shape” rather than URL path, which fits how rule packs are authored.
Glob slot suffix: {name...} for multi-segment (Next.js [...rest]-flavored, since * already has regex meaning).
Optional slots: {name?} (after path-to-regexp’s :name? modifier).
Future per-slot transforms: {name|kebab} (Jinja-style pipe operator) — Phase H+ or wherever transform composition lands.
Glob conventions for any-segment matching: ** from gitignore/micromatch, if we extend templates to support arbitrary-depth matching beyond the explicit {name...} glob slot.
Bidirectional consistency thinking from lenses — even if we don’t formalize the laws, we name the consistency requirement explicitly: “slots that appear on both sides round-trip; everything else is documented as one-way.”
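
The borrowed slot spellings in the list above can be recognized by one small token parser. This is a sketch under assumptions: SlotDef is named in the migration plan, but its fields here are hypothetical, as is the parseSlot function. Combining the ... and ? suffixes is rejected because this document does not define what that would mean, and whitespace around the pipe is tolerated as an assumption.

```typescript
// Hypothetical field layout for SlotDef; the plan names the type but not its shape.
interface SlotDef {
  name: string;
  glob: boolean;      // {name...}
  optional: boolean;  // {name?}
  transform?: string; // {name|kebab}
}

// One slot token: {name}, {name...}, {name?}, each optionally followed by |transform.
const SLOT_TOKEN =
  /^\{\s*([A-Za-z_][A-Za-z0-9_]*)(\.\.\.|\?)?\s*(?:\|\s*([A-Za-z_][A-Za-z0-9_]*)\s*)?\}$/;

function parseSlot(token: string): SlotDef | null {
  const m = SLOT_TOKEN.exec(token);
  if (!m) return null; // not a slot: treat as a literal segment
  const [, name = '', suffix, transform] = m;
  const def: SlotDef = { name, glob: suffix === '...', optional: suffix === '?' };
  if (transform !== undefined) def.transform = transform;
  return def;
}

const single = parseSlot('{slug}');
const glob = parseSlot('{rest...}');
const optional = parseSlot('{name?}');
const piped = parseSlot('{project | kebab}');
const literal = parseSlot('Projects'); // null
```

Keeping the parser to a single regex per token is what makes the "~50 lines" estimate plausible; anything the regex rejects stays a literal and can be reported by loader validation.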

If we hit composition/expressiveness limits in Phase H+ (multi-template fan-out, formal bijection checking, edit propagation), revisit monocle-ts or full Boomerang-style lenses then. The path-template surface has plenty of room to grow without leaving the ~50-LOC compiler.

Open questions — where the abstraction might still leak

Optional vs required slots. {slug?} or some trailing-? syntax? What does omission mean — does the template fall through to a shorter form, or does the rule decline to match?
Slot cardinality. {slug} is exactly one segment; {rest...} is one-or-more. What about zero-or-more? What about a fixed depth ({a}/{b} matches exactly two segments)? Maps to the existing truncation.depth + tailHandling choices, but that translation has corners.
Per-slot transforms. If {slug} on the tag side is implicitly kebab-cased, what does that mean when the template also declares a global caseTransform? Composition order matters and gets confusing fast. The rule for “transforms apply per-slot only when explicitly declared” is probably the right default.
Many-to-one fan-out. Multiple folder templates collapsing into the same tag (e.g., 'Projects/{slug}' AND 'Active/Projects/{slug}' both emit #projects/{slug}). Single-template rules can’t express this; needs a higher-level “alternation” or multiple rules + priority.
Static bijection checking. Can we tell at authoring time whether a template pair is lossy? Slot-set comparison gets us most of the way: folderSlots = tagSlots ⇒ the pair round-trips; a slot in folderSlots that is missing from tagSlots ⇒ folder-to-tag is lossy (and symmetrically for the inverse). Transforms and conditional logic complicate the picture.
Unicode literals in templates. The cyberbase-actual rule pack uses emoji prefixes (⬇️ Clipping). Templates need to handle unicode in literal segments cleanly — verifiable in the compiler tests.
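
The many-to-one fan-out question above can be made concrete with the "multiple rules + priority" option. The sketch below is illustrative only: the Rule shape, the first-match-wins ordering, and the helper names toRegex, convert, forward and inverse are all assumptions, and the matcher handles single-segment {name} slots only.

```typescript
// Hypothetical rule shape and resolution order (first match wins).
interface Rule {
  folderTemplate: string;
  tagTemplate: string;
}

// Minimal single-segment matcher, enough for this illustration only.
function toRegex(template: string): { regex: RegExp; names: string[] } {
  const names: string[] = [];
  const source = template
    .split('/')
    .map((part) => {
      const m = /^\{([A-Za-z_][A-Za-z0-9_]*)\}$/.exec(part);
      if (!m) return part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
      names.push(m[1] ?? '');
      return '([^/]+)';
    })
    .join('/');
  return { regex: new RegExp(`^${source}$`), names };
}

// Match `input` against `from`, then fill the slots of `to` with the captures.
function convert(from: string, to: string, input: string): string | null {
  const { regex, names } = toRegex(from);
  const m = regex.exec(input);
  if (!m) return null;
  return to.replace(/\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (whole: string, name: string) => {
    const i = names.indexOf(name);
    return i >= 0 ? (m[i + 1] ?? '') : whole;
  });
}

function forward(rules: Rule[], folderPath: string): string | null {
  for (const rule of rules) {
    const out = convert(rule.folderTemplate, rule.tagTemplate, folderPath);
    if (out !== null) return out;
  }
  return null;
}

function inverse(rules: Rule[], tagPath: string): string | null {
  for (const rule of rules) {
    const out = convert(rule.tagTemplate, rule.folderTemplate, tagPath);
    if (out !== null) return out;
  }
  return null;
}

const rules: Rule[] = [
  { folderTemplate: 'Projects/{slug}', tagTemplate: '#projects/{slug}' },
  { folderTemplate: 'Active/Projects/{slug}', tagTemplate: '#projects/{slug}' },
];

const tagOut = forward(rules, 'Active/Projects/web-auth'); // '#projects/web-auth'
const folderOut = inverse(rules, '#projects/web-auth');    // 'Projects/web-auth'
```

The example shows why fan-out is an open question: each rule is individually round-trippable, yet the rule set is not. A folder under Active/Projects maps to a tag whose inverse lands under Projects, so the priority order silently picks a canonical folder.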

Migration story (Phase H plan summary)

Define the type and slot syntax — PathTemplate, SlotDef, CompiledTemplate in src/types/typed.ts. Optional fields on TypedRuleSpec.
Pure compiler — new src/engine/compileTemplate.ts with compileTemplate, extractSlots, instantiateTemplate. Comprehensive unit tests for single-segment, glob, mixed, optional, unicode literals, escape characters.
Sync-engine slot extraction — applyRuleForward / applyRuleInverse use extractSlots + instantiateTemplate when a rule has templates. Anchor-aware regex strip stays as the legacy path.
Derivation branch — when a rule pack provides folderTemplate, deriveRule compiles it and stores both the regex (for engine matching) and the slot metadata (for forward/inverse extraction).
Loader validation — balanced braces, valid slot names, optional fields. Existing packs continue to load without templates.
Guided modal — visual slot diagram. The most uncertain piece. Two text inputs (folder template, tag template); below, a visual shows each slot as a chip — green if it appears on both sides, yellow if only one (lossy), blue if it picks up a per-slot transform. Will likely need its own mini-plan after the engine work is solid.
Migrate one shipped rule pack — PARA most likely (simplest). Verify both old and new paths produce identical sync behavior. Worked example for this very document to point to.
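
The loader-validation step in the plan above (balanced braces, valid slot names) could look roughly like this. It is a sketch, not the planned loader: the validateTemplate name, the error wording and the accepted name pattern are assumptions, and pipe transforms are rejected here because the plan defers them.

```typescript
// Hypothetical validator; accepts {name}, {name...} and {name?} slot bodies.
const SLOT_BODY = /^[A-Za-z_][A-Za-z0-9_]*(?:\.\.\.|\?)?$/;

function validateTemplate(template: string): string[] {
  const errors: string[] = [];
  let open = -1; // index of the currently open "{", or -1 when outside a slot
  for (let i = 0; i < template.length; i++) {
    const ch = template.charAt(i);
    if (ch === '{') {
      if (open !== -1) errors.push(`nested "{" at index ${i}`);
      open = i;
    } else if (ch === '}') {
      if (open === -1) {
        errors.push(`unmatched "}" at index ${i}`);
        continue;
      }
      const body = template.slice(open + 1, i);
      if (!SLOT_BODY.test(body)) errors.push(`invalid slot name "${body}"`);
      open = -1;
    }
  }
  if (open !== -1) errors.push(`unclosed "{" at index ${open}`);
  return errors;
}

const ok = validateTemplate('fixtures/Projects/{rest...}'); // []
const unclosed = validateTemplate('Projects/{slug');        // one "unclosed" error
const badName = validateTemplate('Projects/{my slot}');     // one "invalid slot name" error
const emoji = validateTemplate('⬇️ Clipping/{rest...}');     // [] (unicode literal is fine)
```

Returning a list of messages rather than throwing lets the loader report every problem in a rule pack at once, and a pack with no template fields never reaches this function, which keeps existing packs loading unchanged.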

Open invitation

This is a research challenge in the literal sense — an architectural question we want to explore in code, not just on paper. Counterexamples (rules templates can’t express), pointers to additional prior art, or implementation contributions are all welcome. Open an issue at obsidian-folder-tag-sync to discuss.

Phase G commits 1-5 already shipped (folderAnchor first-class). The remaining Phase G commits (anchor selector UI, fixtures) land before Phase H starts. The research here grounds why Phase H is the next step, not a far-future evolution.

Transfer operations — the 8 primitives templates layer over (this is the load-bearing primitives page)
Bijection and loss — the bridge from primitives to round-trip behavior; the collision-vs-lossy distinction explained at length
Terminology — plain-English glossary covering the vocabulary used in this entry
Philosophy — why typed layers exist, why determinism is non-negotiable
When to use regex — current escape hatch (will remain in Phase H)
Open questions — design decisions still in flight
Tradeoffs — chosen-vs-rejected captures