Regex vs. path templates — abstraction research
Frame: what’s actually being mapped
The plugin sits between two structurally different namespaces. A clean picture of the bidirectional problem before any technical depth:
Worked example: folderEntryPoint = "Projects" · folderAnchor = 'root' · transfer.op = 'identity', applied to the path Projects/Web Auth/oauth-flow/notes.md.

Regex view: ^Projects/([^/]+)/([^/]+)/([^/]+)$
- match? yes
- 3 positional capture groups: "Web Auth", "oauth-flow", "notes.md"
- what role does each group play? unnamed
- per-slot transform handle? none
- how to invert? hand-rolled string surgery
- bijection visible from pattern alone? no; asserted via cardinality + bijective metadata

Template view: Projects/{project}/{tail...}
- match? yes
- project = "Web Auth" (one project entry under PARA); tail = "oauth-flow/notes.md" (everything deeper, glob)
- layer = literal prefix Projects/ (no separate folderAnchor needed)
- per-slot transform handle: {project | kebab} kebab-cases just the project name
- inverse = template instantiation with slot values
- bijection: visible; both sides share {project} and {tail...}, so the rule round-trips
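The contrast is easy to see in code. This minimal TypeScript sketch runs both views against the example path. The template is compiled by hand into a regex with JavaScript named capture groups. That compilation step is an assumption of this sketch, not the plugin's code.

```typescript
// Regex view: three positional groups; their roles are not recorded anywhere.
const regexRule = /^Projects\/([^/]+)\/([^/]+)\/([^/]+)$/;

// Template view, hand-compiled for this sketch from 'Projects/{project}/{tail...}':
// {project} matches exactly one segment, {tail...} matches one or more segments.
const templateRule = /^Projects\/(?<project>[^/]+)\/(?<tail>.+)$/;

function positional(path: string): string[] | null {
  const m = regexRule.exec(path);
  return m ? m.slice(1) : null;
}

function named(path: string): Record<string, string> | null {
  const m = templateRule.exec(path);
  return m?.groups ? { ...m.groups } : null;
}

const example = 'Projects/Web Auth/oauth-flow/notes.md';
console.log(positional(example)); // [ 'Web Auth', 'oauth-flow', 'notes.md' ]
console.log(named(example));      // { project: 'Web Auth', tail: 'oauth-flow/notes.md' }
```

Both rules accept the same path. Only the second one tells downstream code which piece is the project and which is the tail.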
{project} here just means "the single path segment that comes immediately after the Projects/ entry." It doesn't try to be the user's vocabulary; it labels what role that segment plays in this rule. The same template applied to a different vault would still call it {project}: the slot is named for its position relative to the literal prefix, not for what it semantically represents on disk.

Aside: "what about Johnny Decimal's 10 - Projects?" Both abstractions agree that 10 - Projects is one composite path segment on disk. The regex captures it via ([^/]+) as one group; a template captures it via one slot like {jdEntry}. Neither view splits the 10 - prefix from the rest, because that's not what either pattern is doing. The prefix is sort-order metadata baked into the folder name. Turning 10 - Projects into the tag-side projects is the job of an existing transform primitive, numberPrefixHandling: 'strip' (with caseTransform handling the lowercasing). That runs after match/extract, in the transform pipeline. Both regex rules and template rules are agnostic to it; the transform primitive does the work either way. So the pattern abstraction question (regex vs. template) and the prefix-stripping question (number-prefix transform) are orthogonal. Templates don't change what numberPrefixHandling does; they change what the engine knows about the slot it's applied to.
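The orthogonality can be sketched in a few lines. The sketch assumes a hypothetical stripNumberPrefix helper that mimics what numberPrefixHandling: 'strip' is described as doing. The plugin's real prefix grammar may differ. The helper is applied to a slot value only after extraction, whichever abstraction did the extracting.

```typescript
// Hypothetical stand-in for numberPrefixHandling: 'strip'. Assumes a Johnny
// Decimal-style "NN - " prefix; the plugin's real prefix rules may differ.
function stripNumberPrefix(segment: string): string {
  return segment.replace(/^\d{2}\s*-\s*/, '');
}

// Either abstraction extracts the same composite segment...
const viaRegex = /^([^/]+)(?:\/|$)/.exec('10 - Projects/foo')?.[1];
const viaTemplate = /^(?<jdEntry>[^/]+)(?:\/|$)/.exec('10 - Projects/foo')?.groups?.jdEntry;

// ...and the transform pipeline runs afterwards, on the extracted value.
// The toLowerCase step stands in for caseTransform.
function toTagSegment(raw: string): string {
  return stripNumberPrefix(raw).toLowerCase();
}

console.log(viaRegex, viaTemplate);           // 10 - Projects 10 - Projects
console.log(toTagSegment(viaTemplate ?? '')); // projects
```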
Three components, three asymmetries:
- Filesystem is a strict hierarchy — every file has exactly one path, exactly one parent folder. The OS enforces this; we don’t get to negotiate it.
- Tag namespace is a polyhierarchy: the same file can be reachable via many tag paths (#projects/web, #ritual/auth, #q4-2026 all addressing the same note.md). Tags compose freely; one note can sit under several roots at once (#projects/web and #ritual/auth) without contradiction.
- Sync engine mediates: each rule is a (folder pattern, tag pattern, transfer-op) triple that describes a correspondence. Whether that correspondence is invertible, i.e. whether forward(inverse(t)) === t for every tag t the rule produces, is the question at the heart of this document.
Direct answer to “is full bidirectional determinism always achievable?”: no, and that’s by design. Identity-style rules (PARA’s Projects/{slug} ↔ #projects/{slug}) round-trip perfectly. Lossy operations — truncation-with-drop, marker-only, promotion-to-root — deliberately throw information away in one direction; the inverse can’t reconstruct what was dropped. The abstraction we want isn’t one that forces determinism (that would just disallow useful rules), but one that makes the per-rule determinism status visible at authoring time, instead of being asserted as metadata afterward. That’s the criterion path templates with typed slots satisfy and bare regex doesn’t.
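What "per-rule determinism status" means can be shown operationally. The sketch gives each rule a forward and an inverse function, then tests the round trip on sample inputs. Three things are illustrative assumptions, not the plugin's API: the function names, the Projects/projects pair (tag '#' omitted), and the depth cap of 1.

```typescript
type Fn = (s: string) => string;

// Identity rule: Projects/<rest> <-> projects/<rest>.
const identityForward: Fn = (f) => f.replace(/^Projects\//, 'projects/');
const identityInverse: Fn = (t) => t.replace(/^projects\//, 'Projects/');

// Truncation with tailHandling 'drop', depth cap 1: keep one segment below the entry.
const truncForward: Fn = (f) => {
  const [, first] = f.split('/'); // ['Projects', first, ...dropped]
  return `projects/${first}`;
};
const truncInverse: Fn = (t) => t.replace(/^projects\//, 'Projects/');

// Folder-side round trip: can the inverse rebuild the original folder path?
function roundTripsFromFolder(fwd: Fn, inv: Fn, folders: string[]): boolean {
  return folders.every((f) => inv(fwd(f)) === f);
}

const folders = ['Projects/Web', 'Projects/Web/auth-rewrite'];
console.log(roundTripsFromFolder(identityForward, identityInverse, folders)); // true
console.log(roundTripsFromFolder(truncForward, truncInverse, folders));       // false

// On the tags truncation actually produces, forward(inverse(t)) === t still holds.
console.log(['projects/Web'].every((t) => truncForward(truncInverse(t)) === t)); // true
```

The last line is the tag-side criterion stated above. Truncation satisfies it, yet the folder-side round trip still fails. That is exactly the "lossy in one direction" status a rule should declare.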
Vocabulary borrowed from prior fields
These terms are load-bearing for the rest of this document. They’re pulled from information science (classification theory, knowledge organization), formal language theory, and bidirectional programming research — established terms, not invented for this plugin.
Strict hierarchy vs. polyhierarchy
Classification-theory terms. A strict hierarchy (a tree: one parent per child) enforces single parentage at every level. A polyhierarchy permits a directed acyclic graph in which the same item sits under several broader categories at once without ambiguity. Library subject heading systems (LCSH = Library of Congress Subject Headings; MeSH = Medical Subject Headings) are explicitly polyhierarchical for exactly the reasons Obsidian tags are: real-world concepts don't fit into one parent category. Folder-tag-sync's reason to exist is bridging a strict-hierarchy primitive (filesystem) to a polyhierarchical addressing system (tags).
Pre-coordination vs. post-coordination
Already covered in philosophy. Briefly: a pre-coordinated descriptor fuses concepts into a single hierarchical token (#projects/q4-roadmap is one term carrying two concepts joined by subordination). A post-coordinated descriptor splits concepts into independent tags applied together (#projects AND #q4 AND #roadmap). Folder paths are inherently pre-coordinated; tag systems can be either.
Syntax vs. semantics
Regex captures syntax: does this character sequence satisfy this pattern? It says nothing about semantics: which part of the matched sequence is the layer (where in the tree the rule fires), which part is a variable (capturable, recoordinable content), what name the variable carries, what role it plays in the rule. Phase G’s folderAnchor field exists because the syntactic representation hid the semantic question “where does this rule anchor?” Path templates with named slots are the more direct encoding: the literal segments are syntax, the {name} slots carry semantics, and the conversion between them is mechanical rather than interpretive.
Plain-English version: regex tells the engine “does this string look right?” Templates tell the engine “what role does each piece of this string play?” Both produce the same yes/no match answer. The difference is what the system can do after the match — generate the inverse, surface a slot to the user, run a per-slot transform, prove bijection — because templates know which piece is which.
Lossy vs. lossless transformation
Information-theoretic. A transformation is lossless when the input can be perfectly reconstructed from the output; lossy when it can’t. Identity rules are lossless in both directions (folder ↔ tag preserves content). truncation with tailHandling: 'drop' is lossy folder-to-tag (segments past the depth cap are erased) and partial tag-to-folder (the inverse can only restore the depth-capped prefix). marker-only collapses any folder under the entry into a single fixed tag — maximally lossy in the folder→tag direction.
Marker-only, pictured: Inbox/yesterday.md and Inbox/2024/Q4/note.md both collapse to the same single tag (one tag for all).
Collision vs. lossy — distinct failure modes
These two terms describe different problems that both look like “the abstraction is letting me down”:
- Collision is a forward-direction problem: two distinct inputs accidentally produce the same output because the rule’s pattern was too permissive.
- Lossy is an inverse-direction problem: one output could map back to many inputs by design — the forward transformation deliberately dropped information.
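Solving one doesn't automatically solve the other. A rule's transfer can be lossless by design (an identity transform on whatever it captures) and still cause collisions if its pattern over-matches. Entity/Cybersader/10 - Projects/foo and Entity/Bob/10 - Projects/foo both match an any-segment (?:^|/)10 - Projects rule, and the uncaptured parent context (Cybersader vs. Bob) is silently discarded. Conversely, a marker-only rule is defined to be lossy, but never collides: every match goes to the same tag intentionally.

- Collision (accidental): Entity/Cybersader/10 - Projects/foo and Entity/Bob/10 - Projects/foo → the same any-segment (?:^|/)10 - Projects rule fires on both → #10-projects/foo for both → the inverse cannot uniquely reconstruct the source folder.
- Lossy (by design, marker-only): Inbox/, Inbox/today.md and Inbox/2024/Q4/note.md → all map to the single marker tag → the inverse cannot tell which of them a tagged note came from.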
Plain-English version: Collision is when the abstraction lets two things look the same that shouldn’t. Lossy is when the abstraction deliberately throws information away. Different failure modes; different fixes. The fix for collision is usually a more specific pattern (or capturing the disambiguator into the tag). The fix for lossy is to accept that the rule is many-to-one and design the user experience around that.
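A collision check is cheap to sketch. Run a rule's forward function over the vault's folder paths and group the results by output. Any output with more than one source is a collision. The anchoring regexes and the fixed #10-projects/ tag format below are simplified assumptions chosen to reproduce the Entity/... example. Note that the collision needs an any-segment anchor: a strictly root-anchored ^10 - Projects pattern matches neither path.

```typescript
// Any-segment anchoring: the entry may appear after any '/' (or at the start).
const anySegment = /(?:^|\/)10 - Projects\/(.+)$/;
// Root anchoring: the entry must be the first path segment.
const rootAnchored = /^10 - Projects\/(.+)$/;

function forward(rule: RegExp, folder: string): string | null {
  const m = rule.exec(folder);
  return m ? `#10-projects/${m[1]}` : null;
}

// Group folders by the tag they produce; keep only tags with 2+ sources.
function findCollisions(rule: RegExp, folders: string[]): Map<string, string[]> {
  const byTag = new Map<string, string[]>();
  for (const f of folders) {
    const tag = forward(rule, f);
    if (tag === null) continue;
    byTag.set(tag, [...(byTag.get(tag) ?? []), f]);
  }
  return new Map([...byTag].filter(([, sources]) => sources.length > 1));
}

const vault = ['Entity/Cybersader/10 - Projects/foo', 'Entity/Bob/10 - Projects/foo'];
const collisions = findCollisions(anySegment, vault);
console.log(collisions.get('#10-projects/foo'));       // both Entity/... paths
console.log(findCollisions(rootAnchored, vault).size); // 0 (neither path matches)
```

Both fixes the paragraph names would empty the collision map. One is a more specific pattern. The other captures the Entity/<name> segment into the tag.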
Surjection, injection, bijection
Function-theoretic terms (formal vocabulary from set theory for the shapes a function can take) that formalize what “lossless in both directions” means. Given a function f: A → B:
- Injective (one-to-one — distinct inputs always go to distinct outputs; no two inputs collide). Folder→tag of an identity rule is injective: two different folder paths produce two different tags.
- Surjective (onto: every possible output is reached by some input; no "unused" outputs). Folder→tag of a marker-only rule is surjective onto its (tiny) codomain: the single marker tag is reached, but by many folders at once, so the rule is surjective without being injective.
- Bijective (both injective and surjective — perfect 1-to-1 correspondence; the function has a true inverse). Identity rules with no destructive transforms are bijective.
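Over a finite sample, these properties are directly checkable. The sketch below takes the rule's forward function, a domain of distinct inputs and a codomain as plain values. The rule shapes and the '#inbox-marker' tag name are hypothetical stand-ins.

```typescript
interface Shape { injective: boolean; surjective: boolean; bijective: boolean }

// Classify f restricted to a finite domain (assumed to hold distinct elements)
// against a finite codomain.
function classify<A, B>(f: (a: A) => B, domain: A[], codomain: B[]): Shape {
  const images = domain.map(f);
  const hit = new Set(images);
  const injective = hit.size === images.length;         // no two inputs collide
  const surjective = codomain.every((b) => hit.has(b)); // every output is reached
  return { injective, surjective, bijective: injective && surjective };
}

const folders = ['Inbox/today.md', 'Inbox/2024/Q4/note.md'];

// Identity-style rule, classified against its own image: a bijection.
const identity = (f: string) => f.replace(/^Inbox\//, 'inbox/');
console.log(classify(identity, folders, folders.map(identity)));
// { injective: true, surjective: true, bijective: true }

// Marker-only rule: every folder maps to one marker tag (name is hypothetical).
const marker = (_f: string) => '#inbox-marker';
console.log(classify(marker, folders, ['#inbox-marker']));
// { injective: false, surjective: true, bijective: false }
```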
The plugin’s bijective: boolean field on rules is asking exactly this question. Today it’s asserted by the typed-spec semantics. The point of Phase H is to make it computable from the (folder template, tag template) pair — slots that appear on both sides round-trip; slots only on one side document a lossy direction.
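One way "computable from the template pair" could work is sketched below. It assumes the {name} / {name...} / {name?} slot syntax proposed later in this document. Filters like {slug|kebab} are ignored when comparing. The function names are hypothetical. The sketch only checks slot overlap: a lossy per-slot transform would still have to be accounted for separately.

```typescript
// Collect slot names from a template such as 'Projects/{slug}/{rest...}'.
function slotNames(template: string): Set<string> {
  const names = new Set<string>();
  for (const m of template.matchAll(/\{([A-Za-z_]\w*)(?:\.\.\.|\?)?\s*(?:\|[^}]*)?\}/g)) {
    names.add(m[1]);
  }
  return names;
}

interface RuleShape { bijective: boolean; folderOnly: string[]; tagOnly: string[] }

// Slots on both sides round-trip; a slot on one side only marks a lossy direction.
function analyze(folderTemplate: string, tagTemplate: string): RuleShape {
  const f = slotNames(folderTemplate);
  const t = slotNames(tagTemplate);
  const folderOnly = [...f].filter((n) => !t.has(n)); // dropped going folder → tag
  const tagOnly = [...t].filter((n) => !f.has(n));    // unfillable going tag → folder
  return { bijective: folderOnly.length === 0 && tagOnly.length === 0, folderOnly, tagOnly };
}

console.log(analyze('fixtures/Projects/{rest...}', '#projects/{rest...}'));
// { bijective: true, folderOnly: [], tagOnly: [] }
console.log(analyze('Capture/Inbox/{rest...}', '#capture-inbox'));
// { bijective: false, folderOnly: [ 'rest' ], tagOnly: [] }
```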
Homomorphism, isomorphism
Abstract-algebra terms (vocabulary from the math of structure-preserving maps between systems). A homomorphism is a structure-preserving map in one direction (e.g., the function preserves "this thing is a sub-part of that thing"). An isomorphism is a homomorphism with a structure-preserving inverse: the two structures are formally interchangeable. A perfectly bidirectional rule is asking to be an isomorphism between a folder shape and a tag shape. The lens calculus's round-trip laws are:
- GetPut: putting back what you got gives the original.
- PutGet: getting after putting gives what you put.
- PutPut (for very well-behaved lenses): putting twice equals putting the second value once.

Together they are the formal version of "this rule round-trips consistently on its domain". A lens whose get is a bijection is the isomorphism case.
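For reference, here are the laws in their usual notation. They apply to a lens with get : S → V and put : S × V → S, where S is the source (here a folder path) and V is the view (here a tag):

$$
\begin{aligned}
\textsf{GetPut:}\quad & \mathit{put}(s,\ \mathit{get}(s)) = s \\
\textsf{PutGet:}\quad & \mathit{get}(\mathit{put}(s,\ v)) = v \\
\textsf{PutPut:}\quad & \mathit{put}(\mathit{put}(s,\ v),\ v') = \mathit{put}(s,\ v')
\end{aligned}
$$

When get is a bijection, put(s, v) = get⁻¹(v) ignores s entirely. That is the isomorphism case.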
Putting them together: regex vs. templates, in vocabulary
| Question | Regex view | Template view |
|---|---|---|
| Where does the rule anchor? | Buried in pattern syntax (^X vs. (?:^\|/)X vs. ^P/X); semantics inferred from syntax shape | The template's literal prefix is the anchor |
| What part is variable? | Unnamed capture group (.+) — positional, no role | Named slot {slug} — role visible at authoring time |
| Is the rule bijective? | Asserted via separate bijective metadata field | Computable from slot overlap on both sides |
| Is one direction lossy? | Asserted via cardinality metadata | Computable from which slots appear in which template |
| Forward composition (folder → tag) | Compile regex, match, position-extract, transform | Compile template, slot-extract, instantiate target template |
| Inverse composition (tag → folder) | Hand-rolled string surgery (entry-strip, anchor prepend) | Same as forward, with templates swapped |
In short: regex hides semantics inside syntax; templates surface them. Both compile to the same runtime regex, but at authoring time the template view answers the questions the regex view forces us to compute via metadata.
What surfaced
Phase G made layer a first-class concept on rules — every rule now declares whether it anchors at vault root, at any path-segment boundary, or under a specific parent prefix. The motivating bug was concrete: the user’s dev vault has Johnny Decimal folders nested under fixtures/10 - Projects, but the JD pack’s ^\d{2} - X pattern requires path-start. Preview showed 0 matches; the rule was correctly imported but anchored to a layer the vault didn’t use.
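The bug is easy to reproduce with the three anchor shapes the comparison table names (^X, (?:^|/)X, ^P/X). The compileAnchor helper is a hypothetical reconstruction for illustration. It is not the plugin's buildEntryStripPattern. The (?:/|$) suffix is the one this document says the plugin emits.

```typescript
type Anchor = 'root' | 'any-segment' | { under: string };

// Escape regex metacharacters in a literal path prefix.
const esc = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical: build the membership regex for an entry (given as regex source) at an anchor.
function compileAnchor(entrySource: string, anchor: Anchor): RegExp {
  const suffix = '(?:/|$)';
  if (anchor === 'root') return new RegExp(`^${entrySource}${suffix}`);
  if (anchor === 'any-segment') return new RegExp(`(?:^|/)${entrySource}${suffix}`);
  return new RegExp(`^${esc(anchor.under)}/${entrySource}${suffix}`);
}

const jdEntry = '\\d{2} - Projects'; // regex source, in the JD pack's ^\d{2} - X shape
const path = 'fixtures/10 - Projects/foo';

console.log(compileAnchor(jdEntry, 'root').test(path));                // false: the Phase G bug
console.log(compileAnchor(jdEntry, 'any-segment').test(path));         // true
console.log(compileAnchor(jdEntry, { under: 'fixtures' }).test(path)); // true
```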
Adding the folderAnchor field fixed the immediate bug. But while writing it, several pieces of code felt like they were doing the wrong job:
- src/engine/inferTyped.ts: inferEntryFromPattern is hand-rolled regex parsing. It strips known suffixes ((?:/|$), (?:/.*)?$), checks for leftover metacharacters, and returns a string. We're parsing our own emitted regex back into structure.
- src/engine/applyTransfer.ts: buildEntryStripPattern builds another regex to strip the entry portion from a matched path. Three branches, one per anchor mode.
- The cardinality and bijective fields on MappingRule are computed from typed-spec semantics, not derivable from the regex pair. The regex doesn't know whether the rule is bidirectional.
The pattern is the same in each case: regex captures syntax (this string matches that regex), not semantics (this rule lives at this layer, has these named parts, round-trips to that target). The semantic information that makes folder-tag-sync interesting — entry points, anchors, slot extraction, transform composition — has been re-encoded around the regex rather than expressed in the regex itself.
This entry is the research that follows from noticing that.
The tension — where regex leaks
Regex does two distinct jobs in the plugin today, and we’re conflating them:
- Membership predicate — does this folder path satisfy the rule? (a gate)
- Structural extractor — given a matching path, what are its parts? (a parser)
For (1), regex is fine. RegExp.test() is fast, well-understood, escapeable. For (2) the seams show:
| Job | Current implementation | Why it’s awkward |
|---|---|---|
| Strip the entry-point prefix from a matched path so the remainder can be recoordinated | ``folderPath.replace(new RegExp(`^${entry}/?`), '')``; anchor-aware variant added in Phase G | The pattern matched the path, but we run a different regex to extract structure. Two passes, two sources of truth. |
| Recover the entry literal from a derived rule, so the guided modal can show it as a form field | inferEntryFromPattern(rule.folderPattern): pattern-shape parsing | We emit suffixes like (?:/\|$), then parse them back off to recover the entry: our own emitted regex, read back into structure. |
| Decide whether two rules round-trip without information loss | cardinality + bijective fields, computed from TransferOp shape | The regex pair (folderPattern, tagPattern) doesn’t tell you. We compute it from the typed-spec semantics, then attach as metadata. |
The third one is the deepest leak: bijection is asserted, not proven. We say a rule is bijective because the typed model says identity-transfer + entry-points-on-both-sides ⇒ round-trip. But there’s no automated check that the regex pair is consistent with that claim. If a rule pack author writes folderPattern: '^Projects(?:/|$)' and tagPattern: '^archive/', the metadata might still claim bijective: true if the typed fields say so.
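One cheap consistency check is sketched below. Take sample folder paths that match folderPattern, run the forward transform the typed fields imply, and confirm the result matches tagPattern. The identityForward function and the sample paths are hypothetical. The point is that the regex pair can contradict the bijective claim, and nothing currently notices.

```typescript
interface RuleClaim {
  folderPattern: string;
  tagPattern: string;
  bijective: boolean; // the asserted metadata being audited
}

// What the typed fields imply for an identity transfer with entry 'Projects' -> 'projects'.
const identityForward = (folder: string) => folder.replace(/^Projects(?=\/|$)/, 'projects');

// Returns the sample folders whose forward image the declared tagPattern rejects.
function inconsistencies(rule: RuleClaim, samples: string[]): string[] {
  const folderRe = new RegExp(rule.folderPattern);
  const tagRe = new RegExp(rule.tagPattern);
  return samples
    .filter((f) => folderRe.test(f))
    .filter((f) => !tagRe.test(identityForward(f)));
}

const suspect: RuleClaim = { folderPattern: '^Projects(?:/|$)', tagPattern: '^archive/', bijective: true };
console.log(inconsistencies(suspect, ['Projects/Web', 'Projects/Web/auth'])); // [ 'Projects/Web', 'Projects/Web/auth' ]

const sound: RuleClaim = { ...suspect, tagPattern: '^projects/' };
console.log(inconsistencies(sound, ['Projects/Web', 'Projects/Web/auth']));   // []
```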
For PARA / JD / SEACOW, the typed model is enough — these rules are simple enough that semantics + heuristics gets us there. But the abstraction is leaking, and Phase G’s folderAnchor field is the most recent leak made visible.
Evaluation criteria
What does “the right abstraction” need to satisfy here?
- Bijectivity by construction. Forward + inverse pairs that compose, with round-trip consistency provable (or at least checkable) from the rule’s structure alone — not from a separate metadata field.
- User authoring cost. The guided-modal must remain learnable for non-regex users; the abstraction can’t require knowing recursion schemes or category theory.
- Performance. 10k+ file vault scans must stay fast. The abstraction should compile to something close to a regex (or be one) at runtime.
- Composability. Rule packs that nest. SEACOW outer wrapping PARA wrapping individual projects. The abstraction should support a rule pack that says “I live inside whatever pack scopes me.”
- Graceful degradation for power users. When a rule shape is too complex for the abstraction, raw regex stays available as an escape hatch. The advanced editor remains the power-user surface.
- Reversibility limits. Some rules are intentionally lossy (marker-only, promotion-to-root). The abstraction must let lossy be a first-class property — not a bug to design around.
Prior art
Lenses & bidirectional programming
Combinators for Bi-Directional Tree Transformations: A Linguistic Approach to the View-Update Problem (Foster, Greenwald, Moore, Pierce, Schmitt; POPL 2005, TOPLAS 2007). The seminal academic work. A lens is a forward + backward pair (get, put) constrained by round-trip laws that guarantee consistency. Well-behaved lenses satisfy GetPut and PutGet; very well-behaved ones add PutPut. Lenses compose: a complex bidirectional transformation is built from primitive lenses combined with sequencing, mapping, conditionals, etc.
Implementations:
- Boomerang — the canonical bidirectional language built on the lens calculus
- Haskell lens (Edward Kmett): gold-standard functional optics
- monocle-ts (Giulio Canti): TypeScript port of Scala's Monocle optics library; closest fit if we wanted to vendor an existing JS-ecosystem library
- partial-lenses: JS lenses with first-class handling for missing fields (relevant to optional slots)
```typescript
// monocle-ts shape: what a lens-based PARA rule could look like.
// Illustrative only: FolderPath and TagSpec are assumed types, and the chain
// below sketches the idea rather than type-checking as written (each lens
// focuses a different structure).
import { Lens } from 'monocle-ts';

const projectsPrefix = Lens.fromProp<FolderPath>()('prefix'); // get/set 'Projects/'
const slugLens = Lens.fromProp<FolderPath>()('slug');         // captured part
const tagPrefix = Lens.fromProp<TagSpec>()('namespace');      // '#projects/'

// Compose forward (folder → tag); inverse falls out automatically:
const paraProjects = projectsPrefix.composeLens(slugLens).composeLens(tagPrefix);
```

Folder-tag-sync's transfer and inverseTransfer fields are an informal lens. Making them formal would mean: each rule is literally a lens; sync is lens.get; reverse-sync is lens.set; the laws guarantee bidirectional consistency.
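To illustrate "each rule is literally a lens" without a library, here is a hand-rolled lens for the PARA identity rule. The GetPut and PutGet laws are checked on samples. The Lens interface and paraProjects are hypothetical names for this sketch; this is not the monocle-ts API.

```typescript
interface Lens<S, V> {
  get: (s: S) => V;       // forward sync: folder -> tag
  put: (s: S, v: V) => S; // reverse sync: write a tag back onto a folder
}

// PARA identity rule: Projects/<rest> <-> #projects/<rest>.
// Assumes inputs already matched the rule (correct prefixes present).
const paraProjects: Lens<string, string> = {
  get: (folder) => '#projects/' + folder.slice('Projects/'.length),
  put: (_folder, tag) => 'Projects/' + tag.slice('#projects/'.length),
};

const getPut = <S, V>(l: Lens<S, V>, s: S) => l.put(s, l.get(s)) === s;
const putGet = <S, V>(l: Lens<S, V>, s: S, v: V) => l.get(l.put(s, v)) === v;

const folder = 'Projects/Web/auth-rewrite';
console.log(paraProjects.get(folder));                                // #projects/Web/auth-rewrite
console.log(getPut(paraProjects, folder));                            // true
console.log(putGet(paraProjects, folder, '#projects/Mobile/login'));  // true
```

Because put ignores its first argument here, this lens is bijective. The lossy rules are where put genuinely needs the original source.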
Production lens implementations worth studying:
- Augeas: a C library that edits Linux config files via lenses. Each config-file format (/etc/hosts, /etc/ssh/sshd_config, etc.) has a hand-written lens that round-trips between the on-disk text format and a structured tree. It is the most production-tested lens implementation in real-world software. The source is at github.com/hercules-team/augeas. Lens definitions live in lenses/*.aug files, in a DSL with syntax like let lns = (record . eol)* that compiles to bidirectional get/put pairs. Closest precedent to "we have on-disk artifacts (config files / vault folders) and want a structured edit/query interface."
- Unison file synchronizer: Benjamin Pierce's earlier project (1995+), the system that motivated the original lens research. Bidirectional file synchronization: edits from either side propagate, conflicts are surfaced, the sync runs to fixpoint. Same author who later wrote the lens papers. Folder-tag-sync's bidirectional sync sits in nearly the same problem space: user edits in either world (filesystem vs. tag namespace), system propagates.
Refinements & follow-up academic work
The lens calculus has been extended in several useful directions:
- Quotient lenses (Foster, Pilkiewicz, Pierce; ICFP 2008): lenses up to equivalence. Useful for transformations that should be insensitive to whitespace, ordering, or case differences (exactly the kind of difference the plugin's caseTransform introduces).
- Edit lenses (Hofmann, Pierce, Wagner; POPL 2012): propagate edits rather than complete states. Maps well onto folder-tag-sync's sync model: don't recompute everything when one folder moves; propagate the move as a delta.
- Putback-based bidirectional programming (Hu, Mu, Takeichi — JFP 2014) — start from the put (inverse) direction; the get falls out. Often more intuitive for non-academics; matches how rule pack authors actually think (“I want the tag side to look like this; what folder produces it?”).
- Bidirectional Transformations Workshop (BX) — annual academic venue. Living bibliography of bx research; useful for finding more recent papers.
- Triple Graph Grammars (TGG) — bidirectional graph transformation, mostly used in model-driven engineering. Heavier-weight than what folder-tag-sync needs but worth noting as a parallel lineage.
```typescript
// Hand-written sketch (no library). compose, prefixLens, caseLens and
// tagPrefixLens are hypothetical helpers, not an existing API.
const paraProjectsLens: Lens<FolderPath, Tag> = compose(
  prefixLens('Projects'),     // get: strip 'Projects/' ; put: prepend 'Projects/'
  caseLens('Title', 'kebab'), // get: kebab-ize ; put: title-ize (lawful only up to case: a quotient lens)
  tagPrefixLens('projects'),  // get: prepend '#projects/' ; put: strip
);
```

Asymmetric and symmetric lenses
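Hofmann, Pierce, Wagner: Symmetric Lenses (POPL 2011). The original lenses are asymmetric. One side (the source) holds all the information, and the other (the view) is a possibly lossy projection of it; put uses the old source to restore what the view lacks. Symmetric lenses relax that asymmetry: each side may carry information the other lacks, tracked in a shared complement. Either way, one direction can be lossy as long as the structure accounts for it. This maps directly onto our cardinality field: lossy direction = many:1, lossless = 1:1.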
This is the most directly relevant academic frame for folder-tag-sync. Folders → tags is sometimes lossy (truncation drops segments below the cap; marker-only collapses any structure under the entry to a fixed term). Tags → folders is correspondingly partial (you can’t recover what truncation dropped). Asymmetric lenses formalize that exactly.
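Here is how an asymmetric lens accounts for the lossy direction, sketched as truncation with a depth cap of 1. get drops everything below the first segment. put restores the dropped tail from the original folder when the kept segment is unchanged. Three things are assumptions for illustration: the depth cap, the tag format, and the put policy.

```typescript
interface Lens<S, V> {
  get: (s: S) => V;
  put: (s: S, v: V) => S;
}

// Split 'Projects/<head>/<tail...>' into its kept head and dropped tail.
function parts(folder: string): { head: string; tail: string } {
  const [, head = '', ...rest] = folder.split('/');
  return { head, tail: rest.join('/') };
}

const truncate1: Lens<string, string> = {
  // Lossy: only the first segment below the entry survives into the tag.
  get: (folder) => '#projects/' + parts(folder).head,
  // The original folder supplies what the tag cannot: keep its tail if the head is unchanged.
  put: (folder, tag) => {
    const newHead = tag.slice('#projects/'.length);
    const { head, tail } = parts(folder);
    return newHead === head && tail ? `Projects/${head}/${tail}` : `Projects/${newHead}`;
  },
};

const src = 'Projects/Web/auth-rewrite/notes';
console.log(truncate1.get(src));                     // #projects/Web
console.log(truncate1.put(src, truncate1.get(src))); // Projects/Web/auth-rewrite/notes (GetPut)
console.log(truncate1.put(src, '#projects/Mobile')); // Projects/Mobile (policy: a renamed head drops the old tail)
```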
BiYacc / BiGUL
BiGUL (Ko, Zan, Hu: the Bidirectional Generic Update Language, a putback-based core language) and BiYacc, built on top of it: write the grammar once, get parse + print for free with consistency guarantees. More tractable for implementation than full lenses; pattern-matches into how rule packs already feel (declarative, structural).
If we replaced regex with a tiny grammar — Projects/ segment / rest with segment and rest as named bindings — we’d get parse + print symmetrically.
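That idea can be sketched without any library. Describe the path once as a list of parts, then derive both the parser and the printer from the same description, so they cannot drift apart. The Part type and the parse/print helpers are hypothetical. BiGUL and BiYacc provide this with formal guarantees; this sketch does not.

```typescript
type Part =
  | { kind: 'literal'; text: string }
  | { kind: 'segment'; name: string } // exactly one path segment
  | { kind: 'rest'; name: string };   // one or more remaining segments (must be last)

// The grammar from the text: Projects/ segment / rest
const grammar: Part[] = [
  { kind: 'literal', text: 'Projects' },
  { kind: 'segment', name: 'segment' },
  { kind: 'rest', name: 'rest' },
];

// Parse: walk the grammar over the path's segments, binding names.
function parse(g: Part[], path: string): Record<string, string> | null {
  const segs = path.split('/');
  const out: Record<string, string> = {};
  for (let i = 0; i < g.length; i++) {
    const p = g[i];
    if (i >= segs.length) return null;
    if (p.kind === 'literal' && segs[i] !== p.text) return null;
    if (p.kind !== 'literal' && segs[i] === '') return null; // slots must be non-empty
    if (p.kind === 'segment') out[p.name] = segs[i];
    if (p.kind === 'rest') { out[p.name] = segs.slice(i).join('/'); return out; }
  }
  return segs.length === g.length ? out : null;
}

// Print: walk the same grammar, filling names from the bindings.
function print(g: Part[], b: Record<string, string>): string {
  return g.map((p) => (p.kind === 'literal' ? p.text : b[p.name])).join('/');
}

const bindings = parse(grammar, 'Projects/Web/auth-rewrite/notes');
console.log(bindings);                             // { segment: 'Web', rest: 'auth-rewrite/notes' }
console.log(bindings && print(grammar, bindings)); // Projects/Web/auth-rewrite/notes
```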
Path templates with named slots
The “least surprising” evolution, and probably the right first step. URL routing systems have used this for decades — same primitive, well-defined limits, familiar to power users.
```javascript
// Express, Fastify, NestJS: all use :name slots (Express via path-to-regexp,
// Fastify via its own find-my-way router)
app.get('/users/:userId/posts/:postId', (req, res) => {
  // req.params.userId, req.params.postId
});
```

```python
# FastAPI: slots typed at the function signature
@app.get('/users/{user_id}/posts/{post_id}')
async def read_post(user_id: int, post_id: int): ...
```

```javascript
// URL Pattern Standard (browser-native in Chromium; polyfill exists)
const pattern = new URLPattern({ pathname: '/users/:userId/posts/:postId' });
const result = pattern.exec({ pathname: '/users/42/posts/100' });
// → { pathname: { groups: { userId: '42', postId: '100' } } }
```

```text
# Next.js / SvelteKit / Astro: file-system as syntax
pages/users/[userId]/posts/[postId].tsx
pages/blog/[...slug].tsx        ← glob: catches arbitrary depth
pages/shop/[[...filters]].tsx   ← optional glob
```

```yaml
# OpenAPI 3: language-neutral path-templating standard
paths:
  /pets/{petId}:
    parameters:
      - name: petId
        in: path
        required: true
        schema: { type: string }
```

```protobuf
// gRPC HTTP transcoding: path templates as the REST↔RPC bridge
rpc GetBook(GetBookRequest) returns (Book) {
  option (google.api.http) = { get: "/v1/{name=publishers/*/books/*}" };
}
```

```ruby
# Rails: :name slots, *splat for multi-segment
get '/users/:user_id/posts/:post_id', to: 'posts#show'
get '/files/*path', to: 'files#serve'  # *path captures rest
```

```php
// Symfony: {name} braces with constraint syntax
#[Route('/users/{userId}/posts/{postId}', requirements: ['userId' => '\d+'])]
public function show(int $userId, int $postId) { ... }
```

```java
// Spring Boot: {name} braces with PathVariable annotation
@GetMapping("/users/{userId}/posts/{postId}")
public Post show(@PathVariable Long userId, @PathVariable Long postId) { ... }
```

```elixir
# Phoenix: :name colons, route definitions in compile-time DSL
scope "/api", AppWeb do
  get "/users/:user_id/posts/:post_id", PostController, :show
end
```

```typescript
// Tanstack Router: typed slots, file-system + code-defined routes
const postRoute = createRoute({
  path: '/users/$userId/posts/$postId', // $name slot syntax
  parseParams: (params) => ({ userId: Number(params.userId), postId: Number(params.postId) }),
});
```

```typescript
// Hono: :name slots, ergonomic for edge runtimes
app.get('/users/:userId/posts/:postId', (c) => {
  const { userId, postId } = c.req.param();
});
```

Path templates are weaker than full lenses (no formal laws, composition is informal), but they cover the vast majority of folder-tag-sync use cases. The semantic information that regex hides becomes explicit in the syntax: what part is the layer, what part is the variable, what name it carries. The set of primitives is small enough to fit on one card: literal segments, single-segment slots {name}, glob slots {name...}, optional slots {name?}.
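To show how small that primitive set is, here is a sketch of a compiler from the syntax to a JavaScript regex with named groups. The {name}, {name...} and {name?} syntax follows this document's proposal. Two things are assumptions of this sketch: the compileTemplate name, and the reading of {name?} as an optional single segment. Duplicate slot names would make the RegExp constructor throw, which is arguably the right behavior.

```typescript
// Escape regex metacharacters in literal text.
const escapeLiteral = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Compile 'Projects/{slug}/{rest...}'-style templates to an anchored RegExp.
function compileTemplate(template: string): RegExp {
  let source = '^';
  for (const [i, seg] of template.split('/').entries()) {
    const sep = i === 0 ? '' : '/';
    const slot = /^\{([A-Za-z_]\w*)(\.\.\.|\?)?\}$/.exec(seg);
    if (!slot) {
      source += sep + escapeLiteral(seg);          // literal segment
    } else if (slot[2] === '...') {
      source += `${sep}(?<${slot[1]}>.+)`;         // glob: one or more segments
    } else if (slot[2] === '?') {
      source += `(?:${sep}(?<${slot[1]}>[^/]+))?`; // optional single segment
    } else {
      source += `${sep}(?<${slot[1]}>[^/]+)`;      // exactly one segment
    }
  }
  return new RegExp(source + '$');
}

const re = compileTemplate('Projects/{slug}/{rest...}');
console.log(re.exec('Projects/Web/auth-rewrite/notes')?.groups?.rest); // auth-rewrite/notes
console.log(compileTemplate('Archive/{year?}').test('Archive'));       // true
```

At runtime the engine still runs an ordinary RegExp. The performance criterion is unaffected; only the authoring surface changes.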
Syntax convergence. Looking across the dozen+ frameworks above, two slot conventions dominate:
| Convention | Used by | Pros | Cons |
|---|---|---|---|
{name} braces | OpenAPI, FastAPI, Spring, Symfony, gRPC | Reads as “data shape” — intuitive for non-developers | Conflict with template-string interpolation in some langs (Bash, JS) |
:name colons | Express, NestJS, React Router, Phoenix, Hono, Rails | Less escape-character pressure, idiomatic in URL conventions | Looks like a CSS pseudo-class or YAML key to outsiders |
$name (Tanstack), [name] (Next.js, Astro), *name (Rails splat) are minority dialects. For folder-tag-sync, {name} braces feel right — our user is a knowledge-worker authoring rule packs, not a backend engineer; the data-shape framing of {slug} and {rest...} is closer to the typed model already in place.
Template engines (forward / instantiation half)
Path-template matching (the “get” half) is well-explored above. The instantiation half (the “put”) has its own decades-deep prior art under “template engines” — same primitive applied to text generation rather than path matching. Most relevant for folder-tag-sync: when a rule’s tag template is #projects/{slug}/{rest...} and we have slot values { slug: 'Web', rest: 'auth' }, instantiation is exactly what these engines do.
- Mustache / Handlebars: {{name}} syntax, deliberately logic-less, implementations in 40+ languages. Pure substitution model.
- Go text/template: {{.UserId}} syntax, action grammar. Used by Helm charts (to generate Kubernetes manifests) and Hugo. Production-grade compile-once-execute-many.
- Jinja2 (Python): {{ name }} braces with a filter pipeline ({{ name|upper }}). Closest analog to "slot value with per-slot transform", exactly what Phase H's per-slot transform composition would need.
- Liquid (Shopify): {{ name }} with safe-by-default rendering, used in Jekyll and Eleventy. Same shape as Jinja2.
- ERB / EJS: <%= name %> block syntax. Less aligned with our needs (we want declarative templates, not embedded code).
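The pattern across these: template engines converge on double braces ({{name}}), route templates on single braces ({name}). Jinja/Liquid-style filter pipelines ({{ name|upper }}) are the established way to attach a transform to a slot. If we adopt the Jinja-style filter syntax for per-slot transforms in Phase H+ ({slug|kebab}), there's long-standing user familiarity to lean on.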
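The instantiation half, filter pipeline included, fits in a few lines. The sketch assumes a {slot|filter|filter} syntax and a small hypothetical filter table (kebab, lower). Neither is an existing folder-tag-sync API.

```typescript
type Filter = (s: string) => string;

// Hypothetical filter table; a real one would mirror the plugin's transform primitives.
const filters: Record<string, Filter> = {
  lower: (s) => s.toLowerCase(),
  // Kebab-case each path segment separately so '/' boundaries survive a glob slot.
  kebab: (s) =>
    s.split('/')
      .map((seg) => seg.trim().toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, ''))
      .join('/'),
};

// Replace each {name}, {name...} or {name|f|g} with the slot value piped through its filters.
function instantiate(template: string, slots: Record<string, string>): string {
  return template.replace(/\{([A-Za-z_]\w*)(?:\.\.\.)?((?:\|\w+)*)\}/g, (_m, name: string, pipe: string) => {
    if (!(name in slots)) throw new Error(`missing slot: ${name}`);
    return pipe.split('|').filter(Boolean).reduce((value, f) => {
      const fn = filters[f];
      if (!fn) throw new Error(`unknown filter: ${f}`);
      return fn(value);
    }, slots[name]);
  });
}

console.log(instantiate('#projects/{slug}/{rest...}', { slug: 'Web', rest: 'auth' }));
// #projects/Web/auth
console.log(instantiate('#projects/{slug|kebab}/{rest|kebab}', { slug: 'Web Auth', rest: 'OAuth Flow/Notes' }));
// #projects/web-auth/oauth-flow/notes
```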
```text
{# Jinja-style per-slot transform: what folder-tag-sync's tag template could
   look like if we extend slots with transform pipelines #}
folder: 'Projects/{slug}/{rest...}'
tag:    '#projects/{slug|kebab}/{rest|kebab}'
```

Knowledge-management adjacent tools
How do other note-taking and file-organization tools handle the same problem (declaring how files map to a different addressable namespace)? Useful comparison points:
- TagSpaces: embeds tags in filenames, e.g. note[tag1 tag2].md. Effectively a path template note[{tags...}].md where the slot lives in the filename rather than the folder path. Same primitive, different placement. The bidirectional sync is implicit (rename the file, tags update; edit tags, filename updates).
- Hazel (macOS): rule-based filing tool. Rules are if-then chains: "if filename matches X, move to folder Y". Forward-only (no inverse), but the condition language is regex-on-paths, exactly the primitive folder-tag-sync uses, applied to a different domain.
- Logseq and Roam Research: block-based knowledge-graph tools. Block references (((block-id))) are a different primitive than path templates, but the design tension is the same: how does the underlying file/block layout connect to the user-facing knowledge graph?
- DEVONthink: rule-based document filing with regex conditions and AI-assisted classification. A useful reminder that "deterministic regex rules" and "AI suggestions" can coexist; DEVONthink layers them cleanly.
- Tinderbox: "smart adornments" that auto-tag notes based on declarative pattern conditions. Mark Bernstein has been refining this since 2002. Worth studying as a long-evolved design point.
- Obsidian Templater plugin: <% tp.file.title %> syntax for note templating. Forward-only (a template renders into a new note); not bidirectional. But the syntax convention sits adjacent to where folder-tag-sync's tag templates would land if we wanted in-vault discoverability.
- Maggie Appleton's research notes on note-taking systems: not a tool, but a thoughtful set of design observations on the folder/tag/link tension that informs the same problem space.
The pattern across these tools: forward-only is the norm; bidirectional is rare and a real differentiator. Folder-tag-sync’s commitment to bidirectional sync is itself a design choice worth highlighting in the docs (and Phase H makes it more rigorous).
Glob patterns and pathspec
Adjacent prior art worth mentioning: shell-style globs are the de facto path-pattern language across the Unix ecosystem. Less expressive than templates with named slots (no captures, position-only), but the syntax conventions are deeply familiar:
- micromatch / minimatch: the npm-ecosystem matchers behind ESLint, Prettier and file-glob libraries. They support ** (globstar: multi-segment), * (single-segment), ? (single char), {a,b} (alternation) and !(...) (negation).
- Git pathspec and .gitignore patterns: anchored with a leading /, recursive with **, exclusion with !. The mental model that's already in users' heads when they author .gitignore.
- rsync include/exclude: anchored, ordered rule lists with +/- prefixes. Production-tested for "select these files, skip those" matching at scale.
Glob doesn’t give us bijection. But the syntax conventions (** for multi-segment globstar, anchoring with /, alternation with {a,b}) are reusable lexicon when we design template syntax — borrow what’s familiar.
Tree pattern languages
XPath, JSONPath, JsonLogic. The vault folder structure IS a tree. Tree pattern languages match on tree shape rather than serialized path strings. Useful if the abstraction needs to handle structural queries beyond linear paths (“all leaf folders under X”, “any folder whose parent matches Y”). Worth holding in reserve; not the immediate target.
Datalog & logic programming
Datalog, Soufflé, or any logic-based bidirectional rules engine. Maximum expressiveness — bidirectional reasoning falls out of the relational model essentially for free. Almost certainly overkill for an Obsidian plugin. Useful as a north-star (“what would the most powerful version look like?”), not a near-term implementation target.
Proposed evolution — bidirectional path templates with typed slots
What would folder-tag-sync’s rule data model look like if templates replaced regex as the user-facing primitive?
Today (Phase G):
```typescript
{
  folderEntry: 'Projects',
  folderAnchor: { under: 'fixtures' },
  // ... derived: folderPattern: '^fixtures/Projects(?:/|$)'
  tagEntry: 'projects',
  // ... derived: tagPattern: '^projects/'
  transfer: { op: 'identity' },
}
```

Phase H sketch:
Section titled “Phase H sketch:”{ folderTemplate: 'fixtures/Projects/{rest...}', tagTemplate: '#projects/{rest...}', // bijection automatic from slot overlap}Slots are written as {name} (single segment) or {name...} (one or more — glob). Both templates compile to regex internally; sync engines still consume the compiled folderPattern for matching speed. The slot data flows in both directions:
Forward (folder → tag):

```
fixtures/Projects/Web/auth-rewrite
──────── ──────── ────────────────
literal  literal  {rest...}
        │
        ▼  slot extraction
rest = "Web/auth-rewrite"
        │
        ▼  instantiate tag template
#projects/Web/auth-rewrite
          ────────────────
          {rest...} filled
```

Inverse (tag → folder):

```
#projects/Web/auth-rewrite
───────── ────────────────
literal   {rest...}
        │
        ▼  slot extraction
rest = "Web/auth-rewrite"
        │
        ▼  instantiate folder template
fixtures/Projects/Web/auth-rewrite
                  ────────────────
                  {rest...} filled
```

What this gets us
- Bijection visible at authoring time. Slots that appear on both sides round-trip. Slots that appear on only one side are derivation-only or capture-only, and the structure tells you which. No more separate `bijective: boolean` field.
- Anchor concept disappears. The template’s literal prefix IS the anchor. `'Projects/{slug}'` is root-anchored; `'{base}/Projects/{slug}'` is any-segment with the parent captured into `base`; `'fixtures/Projects/{slug}'` is the under-prefix case spelled out literally.
- Inference becomes parsing instead of regex pattern-matching. No more `inferEntryFromPattern` hand-rolled string surgery. Re-loading a rule means parsing its template once.
- Sync engine gains slot-level access for transforms. Per-slot case rules become possible: `{slug}` could carry a transform spec (“this slot is kebab-cased on the tag side”). Today’s `caseTransform` applies globally; templates open up per-slot composition cleanly.
- Power-user escape hatch remains. Raw regex stays available in the advanced modal for cases templates can’t express.
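The forward and inverse flows diagrammed above can be sketched in roughly fifty lines. This is a hypothetical sketch, not the planned `src/engine/compileTemplate.ts`, even though it borrows the function names from the migration plan. It makes these assumptions:

- There are only two slot forms: `{name}` (exactly one segment) and `{name...}` (one or more `/`-joined segments).
- Everything else is regex-escaped as a literal, so unicode literals such as emoji prefixes pass through untouched.
- Optional slots, transforms and validation are left out.

```typescript
// Hypothetical compiled form of a path template.
interface CompiledTemplate {
  source: string;
  regex: RegExp;       // anchored; what a sync engine would match against
  slots: string[];     // slot names, in capture-group order
  globSlots: string[]; // slots written as {name...}
}

// Fresh /g regex per call, so lastIndex state never leaks between uses.
const slotRegex = (): RegExp => /\{([A-Za-z_]\w*)(\.\.\.)?\}/g;

function escapeLiteral(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function compileTemplate(source: string): CompiledTemplate {
  const slots: string[] = [];
  const globSlots: string[] = [];
  const re = slotRegex();
  let pattern = '';
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const [whole, name, dots] = m;
    if (slots.includes(name)) throw new Error(`duplicate slot {${name}} in "${source}"`);
    pattern += escapeLiteral(source.slice(last, m.index));
    // {name} = exactly one segment; {name...} = one or more '/'-joined segments
    pattern += dots ? '([^/]+(?:/[^/]+)*)' : '([^/]+)';
    slots.push(name);
    if (dots) globSlots.push(name);
    last = m.index + whole.length;
  }
  pattern += escapeLiteral(source.slice(last));
  return { source, regex: new RegExp(`^${pattern}$`, 'u'), slots, globSlots };
}

// Match a concrete path and return slot values, or null if the template doesn't match.
function extractSlots(t: CompiledTemplate, path: string): Record<string, string> | null {
  const m = t.regex.exec(path);
  if (m === null) return null;
  const values: Record<string, string> = {};
  t.slots.forEach((name, i) => {
    values[name] = m[i + 1];
  });
  return values;
}

// Fill the template's slots; refuse values that would change a single segment's depth.
function instantiateTemplate(t: CompiledTemplate, values: Record<string, string>): string {
  return t.source.replace(slotRegex(), (_whole: string, name: string) => {
    const v = values[name];
    if (v === undefined) throw new Error(`no value for slot {${name}}`);
    if (!t.globSlots.includes(name) && v.includes('/')) {
      throw new Error(`slot {${name}} is single-segment but got "${v}"`);
    }
    return v;
  });
}

const folder = compileTemplate('fixtures/Projects/{rest...}');
const tag = compileTemplate('#projects/{rest...}');

const fwd = extractSlots(folder, 'fixtures/Projects/Web/auth-rewrite');
console.log(fwd); // { rest: 'Web/auth-rewrite' }
if (fwd) console.log(instantiateTemplate(tag, fwd)); // #projects/Web/auth-rewrite

const inv = extractSlots(tag, '#projects/Web/auth-rewrite');
if (inv) console.log(instantiateTemplate(folder, inv)); // fixtures/Projects/Web/auth-rewrite
```

In this sketch, inference is just `compileTemplate(source)`, so re-loading a rule never has to reverse-engineer a regex. And because `instantiateTemplate` refuses a `/` inside a single-segment slot, a round-trip can’t silently change a path’s depth.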
What about the existing typed model?
FolderClassifier, TagVocabulary, and TransferOp don’t go away — they’re orthogonal. The template describes the shape; the typed model describes the semantics. A marker-only rule with template 'Capture/Inbox/{rest...}' and a tag template that omits {rest...} (just emits #capture-inbox) is still a marker-only rule — the typed semantics tell you that, the templates tell you the structural mapping.
Cardinality/bijective fall out of the template shapes too: count slots that appear on both sides. All slots shared → bijective. Folder-side has a slot the tag side doesn’t → lossy folder-to-tag direction. The metadata becomes a derivable view over the structure rather than asserted alongside it.
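Deriving that view is a set comparison over slot names. Here is a minimal sketch, assuming the `{name}` / `{name...}` slot syntax. `deriveSlotOverlap` and the `SlotOverlap` shape are hypothetical. The check deliberately ignores transforms, which can make a slot-preserving pair lossy in ways slot names alone can’t reveal. The second call is the marker-only rule from the previous paragraph.

```typescript
// Hypothetical helper: slot names in a template, accepting {name} and {name...}.
function slotNames(template: string): string[] {
  const names: string[] = [];
  const re = /\{([A-Za-z_]\w*)(?:\.\.\.)?\}/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(template)) !== null) names.push(m[1]);
  return names;
}

interface SlotOverlap {
  shared: string[];     // appear on both sides: round-trip
  folderOnly: string[]; // captured from the folder, dropped on the tag side (lossy folder → tag)
  tagOnly: string[];    // required by the tag template but never captured from the folder
  bijective: boolean;   // every slot appears on both sides
}

// Derive the cardinality/bijective view from template shapes alone (transforms ignored).
function deriveSlotOverlap(folderTemplate: string, tagTemplate: string): SlotOverlap {
  const folder = slotNames(folderTemplate);
  const tag = slotNames(tagTemplate);
  const shared = folder.filter((n) => tag.includes(n));
  const folderOnly = folder.filter((n) => !tag.includes(n));
  const tagOnly = tag.filter((n) => !folder.includes(n));
  return { shared, folderOnly, tagOnly, bijective: folderOnly.length === 0 && tagOnly.length === 0 };
}

console.log(deriveSlotOverlap('fixtures/Projects/{rest...}', '#projects/{rest...}').bijective); // true
console.log(deriveSlotOverlap('Capture/Inbox/{rest...}', '#capture-inbox').folderOnly);       // [ 'rest' ]
```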
Reference implementations — what we could borrow
Phase H doesn’t have to be greenfield. Several existing libraries do exactly the compile-template-to-regex + extract-slots + instantiate-from-slots dance. Listed in priority order for fit:
path-to-regexp (most directly applicable)
path-to-regexp: the regex compiler behind Express (and therefore NestJS’s default Express adapter) and older react-router releases. Production-grade, ~7M weekly downloads. Exports both directions:
```ts
import { match, compile } from 'path-to-regexp';

// Forward: extract slots from a path
const fn = match('/users/:userId/posts/:postId');
fn('/users/42/posts/100');
// → { path: '/users/42/posts/100', params: { userId: '42', postId: '100' } }

// Inverse: build a path from slot values
const toPath = compile('/users/:userId/posts/:postId');
toPath({ userId: '42', postId: '100' });
// → '/users/42/posts/100'
```

The library handles syntax sugar we’d otherwise build ourselves: optional slots (`:name?`), repeating segments (`:rest+` and `:rest*`), custom slot patterns (`:name(\\d+)`), and escape characters. Those are the 6.x modifiers; 8.x replaced them with `{...}` optional groups and `*name` wildcards and dropped inline regex patterns. Either way it compiles down to a standard `RegExp`, so sync engines stay pattern-agnostic.
Tradeoffs: 8KB+ minified, opinionated `:name` syntax (no `{name}` slot braces), and URL-oriented defaults (`/` as the segment delimiter). Could vendor a tiny subset, or pull in as a dependency.
URL Pattern Standard / urlpattern-polyfill
URL Pattern Standard: a modern web standard, browser-native in Chromium, with urlpattern-polyfill for non-browser environments.

```ts
const pattern = new URLPattern({ pathname: '/Projects/:slug/:rest*' });
const result = pattern.exec({ pathname: '/Projects/Web/auth-rewrite' });
// exec() returns null when the input doesn't match
// result?.pathname.groups → { slug: 'Web', rest: 'auth-rewrite' }
```

Same primitive as path-to-regexp (its pattern syntax is derived from it), but with a structured spec. Slightly heavier (it’s URL-shaped, not just path-shaped), but stable, standardized, and backed by multi-vendor implementation effort.
micromatch (glob-flavored matching)
micromatch: the matcher behind most npm-ecosystem path tooling. Glob-shaped (no named captures), but battle-tested for vault-scale path enumeration:

```ts
import micromatch from 'micromatch';

micromatch(['Projects/Web', 'Areas/Health'], 'Projects/**');
// → ['Projects/Web']

// Capture mode (limited; positional, not named):
const captures = micromatch.capture('Projects/*/auth', 'Projects/Web/auth');
// → ['Web']
```

Useful for the match half of the equation; useless for the inverse (positional capture without named slots can’t reliably round-trip). Worth knowing about as the reference implementation for “vault scan, find candidates” workflows.
monocle-ts (lens-flavored, TypeScript-native)
monocle-ts: a TypeScript port of Scala’s Monocle optics (lenses, prisms, isos, traversals). Its lenses get and set fields of an already-parsed value rather than parsing path strings themselves, but they compose cleanly. This is the “what would adopting lenses look like in our actual codebase” reference.

```ts
import { Lens } from 'monocle-ts';

interface ParaPath { entry: 'Projects'; slug: string; rest?: string }

// Assumes the raw path has already been parsed into this shape.
const parsedPath: ParaPath = { entry: 'Projects', slug: 'Web', rest: 'auth-rewrite' };

const slugLens = Lens.fromProp<ParaPath>()('slug');
const slug = slugLens.get(parsedPath); // 'Web'
const updated = slugLens.set('NewName')(parsedPath); // same object shape, slug: 'NewName'
```

Heavier learning curve than path-to-regexp; pays off if we eventually want full lens-law guarantees rather than just slot extraction.
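For a feel of what “lens-law guarantees” would mean without adopting monocle-ts, here is a hand-rolled sketch. It is a lens that focuses the segment after `Projects/` in a raw path string, plus checks for two of the classic well-behavedness laws: GetPut and PutGet (PutPut is omitted). `SimpleLens` is a plain object assumed for illustration, not monocle-ts’s `Lens` class. The `Projects/<slug>/...` path shape is likewise hypothetical.

```typescript
// Plain-object lens (not monocle-ts's Lens class): a getter and a curried setter.
interface SimpleLens<S, A> {
  get: (s: S) => A;
  set: (a: A) => (s: S) => S;
}

// Hypothetical path shape: 'Projects/<slug>' optionally followed by '/<rest>'.
const PROJECT_PATH = /^Projects\/([^/]+)(\/.*)?$/;

const projectSlug: SimpleLens<string, string> = {
  get: (path) => {
    const m = PROJECT_PATH.exec(path);
    if (m === null) throw new Error(`not a Projects path: ${path}`);
    return m[1];
  },
  set: (slug) => (path) => {
    const m = PROJECT_PATH.exec(path);
    if (m === null) throw new Error(`not a Projects path: ${path}`);
    if (slug === '' || slug.includes('/')) throw new Error(`slug must be one segment: "${slug}"`);
    return `Projects/${slug}${m[2] || ''}`; // keep whatever followed the slug
  },
};

// Lens laws, compared with === (adequate here because S and A are strings).
// GetPut: writing back what you read leaves the source unchanged.
function getPut<S, A>(l: SimpleLens<S, A>, s: S): boolean {
  return l.set(l.get(s))(s) === s;
}

// PutGet: reading after a write returns what was written.
function putGet<S, A>(l: SimpleLens<S, A>, s: S, a: A): boolean {
  return l.get(l.set(a)(s)) === a;
}

const p = 'Projects/Web/auth-rewrite';
console.log(projectSlug.get(p));            // Web
console.log(projectSlug.set('NewName')(p)); // Projects/NewName/auth-rewrite
console.log(getPut(projectSlug, p), putGet(projectSlug, p, 'NewName')); // true true
```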
Side-by-side capability matrix
| Library / spec | Forward (match) | Inverse (instantiate) | Named slots | Globs | Optional | Per-slot transforms | License | Bundle size | Fit |
|---|---|---|---|---|---|---|---|---|---|
| path-to-regexp | ✓ | ✓ | ✓ | ✓ (`*`, `+`) | ✓ (`?`) | ✗ | MIT | ~8 KB | Best |
| URL Pattern Standard | ✓ | partial | ✓ | ✓ (`*`) | ✓ | ✗ | Spec | (native) | Good |
| urlpattern-polyfill | ✓ | partial | ✓ | ✓ | ✓ | ✗ | Apache-2.0 | ~30 KB | Heavy |
| micromatch | ✓ | ✗ | ✗ (positional) | ✓ (`**`, `{a,b}`) | ✓ | ✗ | MIT | ~25 KB | Match-only |
| monocle-ts | ✓ | ✓ | n/a (typed access) | ✗ | n/a | ✗ | MIT | ~15 KB | Heavy / formal |
| Augeas (C) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | LGPL | C lib | Reference only |
| Mustache/Handlebars | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ (helpers) | MIT | varies | Inverse-only |
| Jinja2 | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ (filters) | BSD | Python | Syntax inspiration |
| Hand-rolled (~50 LOC) | ✓ | ✓ | ✓ | ✓ | ✓ | future | n/a | ~1 KB | Likely choice |
Recommendation
For Phase H’s first cut: write the compiler ourselves (~50 lines as the Migration story section sketches, plus tests). The surface is small enough that vendoring path-to-regexp is overkill, and the rule-pack file format already has its own JSON shape — the slot syntax just needs to round-trip cleanly through that.
What we borrow from the prior art:
- Slot syntax: `{name}` braces (OpenAPI / FastAPI / Spring / Symfony / Mustache convention). Reads as “data shape” rather than URL path, which fits how rule packs are authored.
- Glob slot suffix: `{name...}` for multi-segment (Next.js `[...rest]`-flavored, since `*` already has regex meaning).
- Optional slots: `{name?}` (path-to-regexp / Mustache).
- Future per-slot transforms: `{name|kebab}` (Jinja-style pipe operator), in Phase H+ or wherever transform composition lands.
- Glob conventions for any-segment matching: `**` from gitignore/micromatch, if we extend templates to support arbitrary-depth matching beyond the explicit `{name...}` glob slot.
- Bidirectional consistency thinking from lenses. Even if we don’t formalize the laws, we name the consistency requirement explicitly: “slots that appear on both sides round-trip; everything else is documented as one-way.”
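The slot grammar in that list is small enough to parse with one regular expression. Here is a hypothetical sketch, assuming the `{name}`, `{name...}`, `{name?}` and `{name|transform}` forms above, with optional spaces around `|` as in `{project | kebab}`. `ParsedSlot` is an illustrative shape, not the planned `SlotDef`. The sketch only records transforms in declared order. What they mean, and how they compose with a rule-level `caseTransform`, is still open.

```typescript
// Hypothetical parsed form of a single slot token.
interface ParsedSlot {
  name: string;
  glob: boolean;        // {name...}: one or more segments
  optional: boolean;    // {name?}
  transforms: string[]; // {name|kebab} or {name | kebab | lower}, in declared order
}

// Sketch grammar: '{' name ['...'] ['?'] ('|' transform)* '}', spaces allowed around '|'.
const SLOT_TOKEN = /^\{\s*([A-Za-z_]\w*)(\.\.\.)?(\?)?\s*((?:\|\s*[A-Za-z_]\w*\s*)*)\}$/;

function parseSlot(token: string): ParsedSlot {
  const m = SLOT_TOKEN.exec(token);
  if (m === null) throw new Error(`malformed slot token: ${token}`);
  const transforms = m[4]
    .split('|')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return { name: m[1], glob: m[2] !== undefined, optional: m[3] !== undefined, transforms };
}

console.log(JSON.stringify(parseSlot('{project | kebab}')));
// {"name":"project","glob":false,"optional":false,"transforms":["kebab"]}
```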
If we hit composition/expressiveness limits in Phase H+ (multi-template fan-out, formal bijection checking, edit propagation), revisit monocle-ts or full Boomerang-style lenses then. The path-template surface has plenty of room to grow without leaving the ~50-LOC compiler.
Open questions — where the abstraction might still leak
- Optional vs required slots. `{slug?}` or some trailing-`?` syntax? What does omission mean: does the template fall through to a shorter form, or does the rule decline to match?
- Slot cardinality. `{slug}` is exactly one segment; `{rest...}` is one-or-more. What about zero-or-more? What about a fixed depth (`{a}/{b}` matches exactly two segments)? This maps to the existing `truncation.depth` + `tailHandling` choices, but that translation has corners.
- Per-slot transforms. If `{slug}` on the tag side is implicitly kebab-cased, what does that mean when the rule also declares a global `caseTransform`? Composition order matters and gets confusing fast. “Transforms apply per-slot only when explicitly declared” is probably the right default.
- Many-to-one fan-out. Multiple folder templates collapsing into the same tag (e.g., `'Projects/{slug}'` AND `'Active/Projects/{slug}'` both emit `#projects/{slug}`). Single-template rules can’t express this; it needs a higher-level “alternation” or multiple rules + priority.
- Static bijection checking. Can we tell at authoring time whether a template pair is lossy? Slot-set comparison gets us most of the way (`folderSlots ⊆ tagSlots` ⇒ folder-to-tag is lossless, etc.), but transforms and conditional logic complicate the picture.
- Unicode literals in templates. The `cyberbase-actual` rule pack uses emoji prefixes (`⬇️ Clipping`). Templates need to handle unicode in literal segments cleanly, which is verifiable in the compiler tests.
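The many-to-one fan-out question is easiest to see in code. In this hypothetical sketch each rule is a folder template, a tag template and a numeric priority, with only single-segment `{name}` slots supported. Forward mapping is unambiguous. The inverse of `#projects/Web`, however, matches both rules, and only the assumed priority tiebreak picks a folder. That choice is information a single template pair can’t carry.

```typescript
// Hypothetical rule shape; only single-segment {name} slots are supported.
interface FanOutRule {
  folder: string;   // e.g. 'Projects/{slug}'
  tag: string;      // e.g. '#projects/{slug}'
  priority: number; // assumed inverse tiebreak: higher wins
}

function toRegex(template: string): { re: RegExp; names: string[] } {
  const names: string[] = [];
  // split() with a capturing group keeps the slot tokens at odd indices
  const body = template
    .split(/(\{[A-Za-z_]\w*\})/)
    .map((part, i) => {
      if (i % 2 === 1) {
        names.push(part.slice(1, -1));
        return '([^/]+)';
      }
      return part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('');
  return { re: new RegExp(`^${body}$`), names };
}

// Match `input` against template `from`, then fill template `to` with the captured slots.
function mapThrough(from: string, to: string, input: string): string | null {
  const { re, names } = toRegex(from);
  const m = re.exec(input);
  if (m === null) return null;
  return to.replace(/\{([A-Za-z_]\w*)\}/g, (_whole: string, name: string) => {
    const i = names.indexOf(name);
    if (i < 0) throw new Error(`slot {${name}} is not captured by "${from}"`);
    return m[i + 1];
  });
}

function forward(rules: FanOutRule[], folderPath: string): string[] {
  return rules
    .map((r) => mapThrough(r.folder, r.tag, folderPath))
    .filter((t): t is string => t !== null);
}

// Every rule whose tag template matches yields a candidate folder;
// only the (assumed) priority decides which one sync would create.
function inverse(rules: FanOutRule[], tag: string): { chosen: string | null; candidates: string[] } {
  const hits = rules
    .map((r) => ({ priority: r.priority, folder: mapThrough(r.tag, r.folder, tag) }))
    .filter((h): h is { priority: number; folder: string } => h.folder !== null)
    .sort((a, b) => b.priority - a.priority);
  return { chosen: hits.length > 0 ? hits[0].folder : null, candidates: hits.map((h) => h.folder) };
}

const rules: FanOutRule[] = [
  { folder: 'Projects/{slug}', tag: '#projects/{slug}', priority: 2 },
  { folder: 'Active/Projects/{slug}', tag: '#projects/{slug}', priority: 1 },
];
console.log(forward(rules, 'Active/Projects/Web')); // [ '#projects/Web' ]
console.log(JSON.stringify(inverse(rules, '#projects/Web')));
// {"chosen":"Projects/Web","candidates":["Projects/Web","Active/Projects/Web"]}
```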
Migration story (Phase H plan summary)
- Define the type and slot syntax: `PathTemplate`, `SlotDef`, `CompiledTemplate` in `src/types/typed.ts`. Optional fields on `TypedRuleSpec`.
- Pure compiler: new `src/engine/compileTemplate.ts` with `compileTemplate`, `extractSlots`, `instantiateTemplate`. Comprehensive unit tests for single-segment, glob, mixed, optional, unicode literals, and escape characters.
- Sync-engine slot extraction: `applyRuleForward` / `applyRuleInverse` use `extractSlots` + `instantiateTemplate` when a rule has templates. Anchor-aware regex strip stays as the legacy path.
- Derivation branch: when a rule pack provides `folderTemplate`, `deriveRule` compiles it and stores both the regex (for engine matching) and the slot metadata (for forward/inverse extraction).
- Loader validation: balanced braces, valid slot names, optional fields. Existing packs continue to load without templates.
- Guided modal with a visual slot diagram. The most uncertain piece. Two text inputs (folder template, tag template); below them, a visual shows each slot as a chip: green if it appears on both sides, yellow if only on one (lossy), blue if it picks up a per-slot transform. Will likely need its own mini-plan after the engine work is solid.
- Migrate one shipped rule pack: PARA most likely (simplest). Verify that the old and new paths produce identical sync behavior; this also gives this document a worked example to point to.
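Loader validation is small enough to sketch as well. The hypothetical `validateTemplate` below checks balanced braces, slot-name shape and duplicate slot names. It returns a list of problems rather than throwing, so a loader could report every bad template in a pack at once. The accepted slot forms (`{name}`, `{name...}`, `{name?}`, optional `|transform` suffixes) follow the syntax proposed in the Recommendation. The error wording is illustrative.

```typescript
// Hypothetical loader-side check: collect every problem instead of throwing on the first.
function validateTemplate(template: string): string[] {
  const problems: string[] = [];
  const seen: string[] = [];
  // Accepts {name}, {name...}, {name?} and optional '|transform' suffixes.
  const slotShape = /^\{\s*([A-Za-z_]\w*)(?:\.\.\.)?\??\s*(?:\|\s*[A-Za-z_]\w*\s*)*\}$/;
  let open = -1; // index of the currently unclosed '{', or -1

  for (let i = 0; i < template.length; i++) {
    const ch = template.charAt(i);
    if (ch === '{') {
      if (open !== -1) problems.push(`'{' at index ${i} opened before the '{' at index ${open} was closed`);
      open = i;
    } else if (ch === '}') {
      if (open === -1) {
        problems.push(`unmatched '}' at index ${i}`);
        continue;
      }
      const token = template.slice(open, i + 1);
      open = -1;
      const m = slotShape.exec(token);
      if (m === null) {
        problems.push(`malformed slot ${token}`);
      } else if (seen.includes(m[1])) {
        problems.push(`duplicate slot {${m[1]}}`);
      } else {
        seen.push(m[1]);
      }
    }
  }
  if (open !== -1) problems.push(`unclosed '{' at index ${open}`);
  return problems;
}

console.log(validateTemplate('Projects/{slug}/{rest...}')); // []
console.log(validateTemplate('{slug}/{slug}'));              // [ 'duplicate slot {slug}' ]
```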
Open invitation
This is a research challenge in the literal sense — an architectural question we want to explore in code, not just on paper. Counterexamples (rules templates can’t express), pointers to additional prior art, or implementation contributions are all welcome. Open an issue at obsidian-folder-tag-sync to discuss.
Phase G commits 1-5 already shipped (folderAnchor first-class). The remaining Phase G commits (anchor selector UI, fixtures) land before Phase H starts. The research here grounds why Phase H is the next step, not a far-future evolution.
Related concepts
- Transfer operations — the 8 primitives templates layer over (this is the load-bearing primitives page)
- Bijection and loss — the bridge from primitives to round-trip behavior; the collision-vs-lossy distinction explained at length
- Terminology — plain-English glossary covering the vocabulary used in this entry
- Philosophy — why typed layers exist, why determinism is non-negotiable
- When to use regex — current escape hatch (will remain in Phase H)
- Open questions — design decisions still in flight
- Tradeoffs — chosen-vs-rejected captures