File-based graph databases
What kind of database is an Obsidian vault?
When Crosswalker generates folders and notes with frontmatter properties and WikiLinks, it’s building a file-based graph database. Understanding this helps explain both the power and the limitations of the approach.
An Obsidian vault is a hybrid data model combining concepts from three database paradigms: property graphs, document stores, and file-based databases.
Property graph model
A property graph is a directed, labeled multigraph where both nodes and edges can have key-value properties.
Formal structure:
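One common way to state this, given as a sketch rather than a canonical definition: the symbols V, E, σ and λ match the list that follows, while ρ, Lab, Prop and Val are names assumed here for illustration.

```latex
\begin{aligned}
G &= (V, E, \rho, \lambda, \sigma) \\
\rho &: E \to V \times V
  && \text{each edge has an ordered (source, target) pair} \\
\lambda &: V \cup E \to \mathcal{P}(\mathit{Lab})
  && \text{each node or edge carries a set of labels} \\
\sigma &: (V \cup E) \times \mathit{Prop} \rightharpoonup \mathit{Val}
  && \text{partial map from (element, key) to value}
\end{aligned}
```

Because ρ is not required to be injective, two distinct edges may connect the same pair of nodes, which is what makes the structure a multigraph.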
Crosswalker’s implementation:
- Nodes (V): Markdown files in generated folders (one per framework control, technique, safeguard)
- Edges (E): WikiLinks between notes (crosswalk relationships, hierarchy links)
- Properties (σ): YAML frontmatter on each file
- Labels (λ): Folder location + `_crosswalker.type` metadata
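The mapping above can be sketched in a few lines of Python. This is an illustration, not Crosswalker's actual code: the `Note` class, the nesting of `type` under a `_crosswalker` mapping, and the example path and property values are all assumptions, and frontmatter is taken as already parsed into a dict.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures only the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


@dataclass
class Note:
    """One node (V). Hypothetical structure used only for this sketch."""
    path: str                                       # vault-relative path, the node identity
    properties: dict = field(default_factory=dict)  # parsed YAML frontmatter (sigma)
    body: str = ""                                  # markdown below the frontmatter


def labels(note: Note) -> set[str]:
    """Labels (lambda): the folder plus the assumed `_crosswalker.type` value."""
    result = {str(PurePosixPath(note.path).parent)}
    note_type = note.properties.get("_crosswalker", {}).get("type")
    if note_type:
        result.add(note_type)
    return result


def edges(note: Note) -> list[tuple[str, str]]:
    """Edges (E): one directed edge per WikiLink in the body, as (source, target)."""
    return [(note.path, target.strip()) for target in WIKILINK.findall(note.body)]


# Hypothetical example note.
note = Note(
    path="Frameworks/NIST-800-53/AC-2.md",
    properties={"_crosswalker": {"type": "control"}, "title": "Account Management"},
    body="Related to [[AC-3]] and [[IA-2|Identification]].",
)
print(sorted(labels(note)))
print(edges(note))
```

Note that edge targets are link names, not paths. Resolving a name to a file is a separate step that Obsidian performs itself.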
Comparison to established databases
| Concept | Neo4j | MongoDB | Obsidian Vault |
|---|---|---|---|
| Node/Document | Node in node store | BSON document | Markdown file |
| Edge/Relationship | Relationship store | Manual references | WikiLink |
| Properties | Property store | JSON fields | YAML frontmatter |
| Collection/Table | Labels | Collections | Folders |
| Primary key | Internal ID | _id field | File path |
| Foreign key | Relationship | $ref / manual | [[WikiLink]] |
| Query language | Cypher | MQL | Dataview DQL / Bases formulas |
| Index | Native indexes | B-tree indexes | Obsidian cache + folder structure |
| Transactions | ACID transactions | ACID (single document; multi-document since 4.0) | None |
| Graph traversal | Native, O(1) per hop (index-free adjacency) | Manual joins | WikiLink following |
The denormalization tax
Denormalization is the deliberate introduction of redundancy into a data model to optimize read performance at the cost of write complexity.
In Crosswalker’s output:
When Crosswalker generates both forward links (note A references note B) and frontmatter properties that duplicate relationship data, it creates a materialized view — cached relationship data for quick access that must be kept in sync with the source.
This is the denormalization tax: every re-import must update all copies, or the data becomes inconsistent. See data model resilience for strategies.
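A minimal sketch of how drift between the two copies could be detected, assuming a note stores its relationships both as WikiLinks in the body and as a frontmatter list. The function names and the example values are hypothetical, not part of Crosswalker.

```python
import re

# Captures the target of [[Target]], [[Target|alias]] or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def link_targets(text: str) -> set[str]:
    """Return the set of WikiLink targets found in a string."""
    return {target.strip() for target in WIKILINK.findall(text)}


def denormalization_drift(frontmatter_links: list[str], body: str) -> dict[str, set[str]]:
    """Compare the frontmatter copy of the relationships with the body links."""
    in_frontmatter: set[str] = set()
    for value in frontmatter_links:
        in_frontmatter |= link_targets(value)
    in_body = link_targets(body)
    return {
        "only_in_frontmatter": in_frontmatter - in_body,
        "only_in_body": in_body - in_frontmatter,
    }


# Hypothetical note whose two copies have diverged after a partial re-import.
drift = denormalization_drift(
    frontmatter_links=["[[AC-2]]", "[[AC-3]]"],
    body="Maps to [[AC-2]] and [[IA-5]].",
)
print(drift)
```

Both sets are empty when the copies agree. Anything else is a copy that a re-import failed to update.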
Relationship storage patterns
Forward and reverse pointers
In graph databases, edges can be traversed in both directions:
- Forward pointer: Source note links to target (`maps_to:: [[AC-2]]`)
- Reverse pointer: Target note’s backlinks show incoming references (Obsidian computes this)
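Reverse pointers are derived data: they can be rebuilt from the forward pointers at any time. The sketch below shows the idea with an in-memory dict. It illustrates the concept only and makes no claim about how Obsidian's own backlink index is implemented; the note names are made up.

```python
from collections import defaultdict


def backlinks(forward: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a forward-link map (source -> targets) into target -> sources."""
    reverse: dict[str, list[str]] = defaultdict(list)
    for source, targets in forward.items():
        for target in targets:
            reverse[target].append(source)
    return dict(reverse)


# Hypothetical forward pointers, e.g. collected from `maps_to` fields.
forward = {
    "CIS-5.1": ["AC-2"],
    "CSF-PR.AC-1": ["AC-2", "IA-2"],
}
print(backlinks(forward))
```

Only the forward direction needs to be stored in the files. The reverse direction costs one pass over every note, which is why it is computed by the tool rather than written into each target.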
Adjacency lists
Frontmatter arrays function as adjacency lists: each note stores a list of its connections in its own frontmatter.
This enables efficient local queries (“what does this control relate to?”) without scanning all files.
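As a sketch of why the local query is cheap, the example below models each note's frontmatter array as one entry in an adjacency map. A one-hop lookup touches a single entry, and a bounded breadth-first search follows links a fixed number of hops. The note names and the `reachable` helper are hypothetical.

```python
from collections import deque

# Hypothetical adjacency lists, one per note, as they might appear in frontmatter arrays.
adjacency = {
    "CIS-5.1": ["AC-2"],
    "AC-2": ["AC-3", "IA-2"],
    "AC-3": [],
    "IA-2": ["IA-5"],
    "IA-5": [],
}


def neighbors(graph: dict[str, list[str]], note: str) -> list[str]:
    """One-hop query: reads only the list stored on the note itself."""
    return graph.get(note, [])


def reachable(graph: dict[str, list[str]], start: str, max_hops: int) -> list[str]:
    """Breadth-first search that follows links at most `max_hops` steps from `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        current, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


print(neighbors(adjacency, "AC-2"))        # ['AC-3', 'IA-2']
print(reachable(adjacency, "CIS-5.1", 2))  # ['AC-2', 'AC-3', 'IA-2']
```

The reverse question ("what relates to this control?") is not local: it needs either a scan of every list or a precomputed backlink index.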
Materialized views
Obsidian Bases and Dataview function as views over the file graph: they compute and display aggregated data from individual files. Unlike a SQL materialized view, which stores its result until it is refreshed, they recompute on every query, so there is no staleness risk but also no persistence.
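The recompute-on-query behavior can be sketched as a plain function over the current set of notes. Nothing is cached, so a change to the notes shows up in the next call. The `framework` and `type` property names and the sample values are assumptions made for this example.

```python
from collections import Counter


def controls_per_framework(notes: list[dict]) -> Counter:
    """Aggregate at query time: count notes of type 'control' per framework."""
    return Counter(
        note["framework"] for note in notes if note.get("type") == "control"
    )


# Hypothetical frontmatter of three notes.
notes = [
    {"framework": "NIST", "type": "control"},
    {"framework": "NIST", "type": "control"},
    {"framework": "CIS", "type": "control"},
]
before = controls_per_framework(notes)

# A new note appears; the next query reflects it without any refresh step.
notes.append({"framework": "CIS", "type": "control"})
after = controls_per_framework(notes)

print(before["CIS"], after["CIS"])  # 1 2
```

The cost is paid on every read instead of on every write, which is the opposite trade-off from duplicating relationship data into frontmatter.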
Trade-offs of file-based storage
| Advantage | Disadvantage |
|---|---|
| Human-readable (plain markdown) | No ACID transactions |
| Git-friendly (version control) | No built-in constraints |
| Portable (copy the folder) | No query optimization |
| No server needed | Manual relationship maintenance |
| Works offline | No referential integrity |
| Obsidian-native (graph view, backlinks, search) | Eventual consistency only |
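To make the "No ACID transactions" row concrete: the strongest guarantee plain files offer is atomic replacement of one file at a time. The sketch below uses the write-to-temp-then-rename pattern with Python's `os.replace`. The helper name is hypothetical, and this is a generic technique, not a description of how Crosswalker or Obsidian writes files.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def write_note_atomically(path: Path, text: str) -> None:
    """Replace `path` in one step so readers never see a half-written note."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_name, path)  # atomic swap of the directory entry
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)  # clean up the temp file on failure
        raise


with tempfile.TemporaryDirectory() as vault:
    target = Path(vault) / "Frameworks" / "AC-2.md"
    write_note_atomically(target, "first import\n")
    write_note_atomically(target, "re-import\n")
    print(target.read_text(encoding="utf-8"))
```

This protects a single note only. An import that touches hundreds of files can still stop halfway, leaving some notes updated and others not, which is the gap a transaction would close.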
Why this matters for Crosswalker
Crosswalker generates a file-based graph database every time it imports a framework. Understanding this model explains:
- Why re-import is complex — updating a materialized view across hundreds of files requires careful coordination. See ontology evolution.
- Why crosswalk links can become stale — no foreign key constraints mean broken links accumulate silently. See constraint enforcement.
- Why frontmatter-first is the right approach — Obsidian Bases can query frontmatter but not inline fields, making YAML properties the most queryable storage location.
- Why the `_crosswalker` metadata exists — it provides the schema versioning and provenance tracking that file-based systems lack natively.
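The stale-link point can be illustrated with a check that a database would run automatically as a foreign key constraint, but which a vault has to run by hand. This is a minimal sketch under simplifying assumptions: notes are identified by name rather than full path, and the function name and sample notes are hypothetical.

```python
import re

# Captures the target of [[Target]], [[Target|alias]] or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def dangling_links(notes: dict[str, str]) -> dict[str, list[str]]:
    """Map each note name to the link targets that do not exist in the vault."""
    existing = set(notes)
    report = {}
    for name, text in notes.items():
        targets = {target.strip() for target in WIKILINK.findall(text)}
        missing = sorted(targets - existing)
        if missing:
            report[name] = missing
    return report


# Hypothetical vault: note name -> markdown text.
vault = {
    "CIS-5.1": "Maps to [[AC-2]] and [[AC-99]].",
    "AC-2": "See also [[IA-2]].",
}
print(dangling_links(vault))
# {'CIS-5.1': ['AC-99'], 'AC-2': ['IA-2']}
```

Running a check like this after each import surfaces broken links immediately instead of letting them accumulate.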
Resources
Database theory
- Robinson, Webber, Eifrem — “Graph Databases” (O’Reilly, 2015)
- Codd, E.F. — “A Relational Model of Data for Large Shared Data Banks” (1970)
Property graphs
- Neo4j Graph Data Science — property graph reference implementation
- Apache TinkerPop — open property graph computing framework
Related pages
- The problem — hierarchy vs. graph tension
- Consistency models — ACID, CAP, and eventual consistency
- Data model resilience — handling framework updates
- Metadata ecosystem — Properties, Dataview, Bases, Datacore