# Challenge 01: Rule priority stress test
## Prompt for the dispatched agent

Open this challenge in a fresh-context Claude / LLM session, paste the URL, and say “research this challenge.” The reading list below is layered for progressive disclosure — start at level 1 if you’re new to the project; jump deeper if you already know the context.
The question in one sentence: does the engine’s current priority + first-match-wins model actually match how users author rules, or do they expect “more specific wins” — and what’s the right replacement primitive?
## Reading order (level 1 → level 4)

- Foundations (orient first if new to the project):
  - Terminology — plain-English glossary of every load-bearing term
  - Philosophy — typed-model layers (Layer 1 regex, Layer 2 typed); why determinism is the project’s non-negotiable
- Core concepts for this question:
  - Bijection and loss — cardinality, lossy vs lossless, the collision-vs-lossy distinction (relevant because priority decides which rule fires when several plausibly match)
  - Transfer operations — the eight library-science primitives the priority decision dispatches on
- Direct context (the research that frames this challenge):
  - Tag → folder resolution research — the six-candidate survey of inverse-direction resolution; this challenge is the framing the survey expanded on
  - Specificity + groups research — combined design for B+C; surfaces that `calculateMatchConfidence` is already a partial implementation
- Reference (optional, code-level grounding):
  - Rule schema — exact field definitions for `priority` and `confidence`
  - `src/engine/ruleMatcher.ts:97-185` — the actual `findBestMatch` and `calculateMatchConfidence` implementations (read on the GitHub repo)
## Deliverable

Write a short report at `agent-context/zz-log/YYYY-MM-DD-challenge-01-findings.md` (~1500–2500 words). Required sections: your framing of the problem (does priority-as-scalar genuinely fail in practice?); concrete failure cases drawn from real-world rule stacks; a verdict on the alternatives (specificity-aware, rule groups, all-rules-fire, conflict-UI); a recommended replacement primitive with a migration sketch; and the open questions left unresolved.
Treat existing recommendations as hypotheses to test, not conclusions to defend. If practical analysis says “actually, priority is fine and the recommendations are over-engineered,” that’s a more valuable finding than confirming the recommendation. Fresh-agent context-skepticism is the point.
## Assumption under test

Rules are ordered by integer priority (lower number = higher precedence). The first rule whose regex pattern matches the incoming event “wins,” and no other rules fire for that file.
This is simple, deterministic, and implemented.
But is it the right model for users?
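The model under test can be sketched in a few lines. This is an illustrative reconstruction under assumed names (`Rule`, `findFirstMatch`), not the actual `findBestMatch` implementation in `ruleMatcher.ts`:

```typescript
// Sketch of the first-match-wins model; illustrative, not the real engine.
interface Rule {
  name: string;
  pattern: RegExp;  // tested against the incoming file path
  priority: number; // lower number = higher precedence
}

// Sort ascending by priority and return the first rule whose pattern
// matches. Every later rule is skipped for this file, by design.
function findFirstMatch(rules: Rule[], path: string): Rule | undefined {
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  return ordered.find((rule) => rule.pattern.test(path));
}
```

Note that the sketch leaves ties between equal priorities to the sort’s stability, which is exactly the kind of implicit behavior this challenge should scrutinize.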
## Why it might not be

### Specificity vs priority mismatch

A user sets Rule 1: `^Projects/(.*)$` → `#projects/*` at priority 10. Then they add Rule 2: `^Projects/Archive/(.*)$` → `#archive/*` at priority 20.
Rule 2 is more specific — it matches a narrower path — but has lower precedence (a higher priority number). Under first-match-wins, Rule 1 fires first and Rule 2 never gets a chance.
Users naturally expect “more specific wins.” They’ll write rules in the order they think of them and expect the system to figure out specificity.
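The mismatch can be made concrete with a crude specificity score: count the literal characters left after stripping common regex metacharacters. This is a heuristic assumption for illustration, not something the engine implements:

```typescript
// Crude specificity heuristic: more literal content = more specific.
// Strips common regex metacharacters before counting; illustrative only.
function specificityScore(pattern: RegExp): number {
  return pattern.source.replace(/[\^$.*+?()[\]\\|]/g, "").length;
}

const rule1 = /^Projects\/(.*)$/;          // priority 10, scores 9  ("Projects/")
const rule2 = /^Projects\/Archive\/(.*)$/; // priority 20, scores 17 ("Projects/Archive/")

// rule2 scores higher than rule1, yet under first-match-wins the lower
// priority number means rule1 fires for everything under Projects/.
```

Whether such a heuristic is robust enough for real rule stacks (alternation, character classes, lookarounds) is one of the questions the research brief should answer.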
### Multi-match-needed cases

A user has a file in `Projects/Archive/Q4-retrospective/`. They want it tagged both `#projects/q4-retrospective` AND `#archive/q4-retrospective`. First-match-wins can’t express this.
The escape hatch today is: run the sync manually, one rule direction at a time. But that’s a workflow, not a feature.
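For contrast, an “all matching rules fire” model (one of the alternatives surveyed in the research brief; the names here are assumptions for illustration) expresses the double-tag case directly:

```typescript
interface TagRule {
  pattern: RegExp;     // matched against the file path
  tagTemplate: string; // "$1" stands for the first capture group
}

// Every matching rule contributes a tag, instead of stopping at the
// first match — so one file can pick up several tags.
function allMatchingTags(rules: TagRule[], path: string): string[] {
  const tags: string[] = [];
  for (const rule of rules) {
    const m = rule.pattern.exec(path);
    if (m) tags.push(rule.tagTemplate.replace("$1", m[1] ?? ""));
  }
  return tags;
}
```

With one rule capturing under `Projects/Archive/` and a broader one capturing under `Projects/`, the retrospective folder receives both tags; the open problem this model trades for is explicit conflict resolution when the contributed tags disagree.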
### Silent dead rules

If Rule 1 matches everything Rule 2 would, Rule 2 silently never fires. The user has no way to see this: the settings UI still shows it as “configured and enabled.”
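Exact shadow detection between two regexes is expensive in general (and undecidable once backreferences enter), but a sample-based lint could surface the common cases. This is a hypothetical sketch; nothing like it exists in the settings UI today:

```typescript
interface Rule {
  name: string;
  pattern: RegExp;
  priority: number; // lower number = higher precedence
}

// Heuristic dead-rule check: a rule is suspect if it matches at least
// one representative sample path but never wins first-match-wins for
// any of them. Sampling sidesteps exact regex-containment analysis.
function shadowedRules(rules: Rule[], samplePaths: string[]): Rule[] {
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  const winners = new Set<Rule>();
  for (const path of samplePaths) {
    const winner = ordered.find((r) => r.pattern.test(path));
    if (winner) winners.add(winner);
  }
  return ordered.filter(
    (r) => !winners.has(r) && samplePaths.some((p) => r.pattern.test(p))
  );
}
```

Run against the user’s actual vault paths, this would flag Rule 2 in the Projects/Archive example as dead under the current ordering.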
## Research brief

- Survey prior art. How do similar tools handle this?
  - Auto Note Mover — first match wins (confirmed)
  - Obsidian Tasks’ rule system — ?
  - CSS specificity algorithm — adapted to regex, would this make sense?
  - Firewall rule ordering (iptables, pf) — user-ordered, first-match-wins; has the same problem
  - Email filter systems (Gmail, Fastmail) — a mix of approaches
- Stress-test with a 20-rule SEACOW pack. Draft the full SEACOW rule set on paper. Identify every pairwise priority conflict. Is the priority number field expressive enough, or do users need “groups with inheritance”?
- Design alternatives. Score each against the current first-match-wins baseline:
  - “Most specific wins” via a regex-pattern-length heuristic
  - “All matching rules fire” with explicit conflict resolution
  - “Rule groups” with priority within a group; all groups apply
  - An explicit `combine` flag per rule
- Consider observability. If we keep first-match-wins, can we give users a “test this file path” preview in the rule editor that shows which rule would win?
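That preview is cheap to prototype: evaluate every rule for the path and report both the winner and the matches that were silently outranked. The shape below is an assumption for illustration, not existing rule-editor code:

```typescript
interface Rule {
  name: string;
  pattern: RegExp;
  priority: number; // lower number = higher precedence
}

interface Preview {
  winner?: string;  // rule that fires under first-match-wins
  losers: string[]; // rules that matched but were outranked
}

// "Test this file path" preview: evaluates every rule so the user sees
// both the winner and the silently skipped matches for a given path.
function previewMatch(rules: Rule[], path: string): Preview {
  const matched = rules
    .filter((r) => r.pattern.test(path))
    .sort((a, b) => a.priority - b.priority);
  return {
    winner: matched[0]?.name,
    losers: matched.slice(1).map((r) => r.name),
  };
}
```

A non-empty `losers` list in the editor is also a free dead-rule warning for the path being tested.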
## Deliverable

Short report:
- Recommendation: keep first-match-wins or switch
- If switch: which model and why
- If keep: what observability tools close the UX gap
- Migration path if we switch (existing user configs)
Log findings in `zz-log/` as `YYYY-MM-DD-challenge-01-findings.md`.