File-based graph databases

What kind of database is an Obsidian vault?

When Crosswalker generates folders and notes with frontmatter properties and WikiLinks, it’s building a file-based graph database. Understanding this helps explain both the power and the limitations of the approach.

An Obsidian vault is a hybrid data model combining concepts from three database paradigms: property graphs, document stores, and file-based databases.

A property graph is a directed, labeled multigraph where both nodes and edges can have key-value properties.

Formal structure:

G = (V, E, ρ, λ, σ)
where:
  V = set of vertices (nodes)
  E = set of edges
  ρ: E → V × V (edge endpoint mapping)
  λ: V ∪ E → Σ (labeling function)
  σ: V ∪ E → Properties (property assignment)

Crosswalker’s implementation:

  • Nodes (V): Markdown files in generated folders (one per framework control, technique, safeguard)
  • Edges (E): WikiLinks between notes (crosswalk relationships, hierarchy links)
  • Properties (σ): YAML frontmatter on each file
  • Labels (λ): Folder location + _crosswalker.type metadata
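
As a rough illustration of this mapping, the sketch below builds the vertices, edges, and labels from a couple of in-memory notes. The `Note` class, the `build_graph` helper, the sample folder names, and the nested shape of the `_crosswalker` property are all assumptions made for the example; they are not Crosswalker's actual code, and frontmatter is assumed to be already parsed into a dict.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


@dataclass
class Note:
    path: str                                        # vault-relative file path
    properties: dict = field(default_factory=dict)   # parsed frontmatter (sigma)
    body: str = ""                                   # markdown text with WikiLinks


def build_graph(notes):
    """Return (vertices, edges, labels) for a list of Note objects."""
    # V: one vertex per file, keyed by note name (file name without ".md").
    vertices = {PurePosixPath(n.path).stem: n for n in notes}
    # E: one directed edge per WikiLink found in a note body.
    edges = [
        (name, target.strip())
        for name, note in vertices.items()
        for target in WIKILINK.findall(note.body)
    ]
    # lambda: folder location plus the (assumed) _crosswalker.type value.
    labels = {
        name: (
            str(PurePosixPath(note.path).parent),
            note.properties.get("_crosswalker", {}).get("type"),
        )
        for name, note in vertices.items()
    }
    return vertices, edges, labels


notes = [
    Note("NIST-CSF/PR.AC-1.md", {"_crosswalker": {"type": "control"}},
         "Maps to [[AC-2]] and [[AC-3|Access Enforcement]]."),
    Note("NIST-800-53/AC-2.md", {"_crosswalker": {"type": "control"}}),
]
vertices, edges, labels = build_graph(notes)
print(edges)   # [('PR.AC-1', 'AC-2'), ('PR.AC-1', 'AC-3')]
```

Note that the edge to `AC-3` points at a note that does not exist in this sample: nothing in the file model prevents dangling edges.
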

| Concept | Neo4j | MongoDB | Obsidian Vault |
| --- | --- | --- | --- |
| Node/Document | Node in node store | BSON document | Markdown file |
| Edge/Relationship | Relationship store | Manual references | WikiLink |
| Properties | Property store | JSON fields | YAML frontmatter |
| Collection/Table | Labels | Collections | Folders |
| Primary key | Internal ID | _id field | File path |
| Foreign key | Relationship | $ref / manual | [[WikiLink]] |
| Query language | Cypher | MQL | Dataview DQL / Bases formulas |
| Index | Native indexes | B-tree indexes | Obsidian cache + folder structure |
| Transactions | ACID transactions | ACID (single-document; multi-document since 4.0) | None |
| Graph traversal | Native, O(1) per hop | Manual joins | WikiLink following |
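
To make "WikiLink following" concrete, here is a minimal breadth-first traversal over forward links. It assumes the links have already been extracted into a plain dict of note name to target names; the `reachable` helper and the sample control IDs are hypothetical and only illustrate the idea.

```python
from collections import deque


def reachable(forward, start, max_hops):
    """Return {note: hop_count} for notes within max_hops links of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for target in forward.get(current, []):
            if target not in seen:
                seen[target] = seen[current] + 1
                queue.append(target)
    seen.pop(start)  # the start note is not part of the result
    return seen


forward = {"PR.AC-1": ["AC-2"], "AC-2": ["AC-3"], "AC-3": ["AC-4"]}
print(reachable(forward, "PR.AC-1", 2))   # {'AC-2': 1, 'AC-3': 2}
```

Unlike a native graph store, each hop here depends on link data that had to be parsed out of files (or read from a cache) beforehand.
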

Denormalization is the deliberate introduction of redundancy into a data model to optimize read performance at the cost of write complexity.

In Crosswalker’s output:

# Source of truth: the crosswalk link in the note
nist_csf.maps_to:: [[AC-2]]

# Derived: AC-2.md's backlinks show this note links to it
# (Obsidian computes this automatically from WikiLinks)

When Crosswalker generates both forward links (note A references note B) and frontmatter properties that duplicate relationship data, it creates a materialized view — cached relationship data for quick access that must be kept in sync with the source.

This is the denormalization tax: every re-import must update all copies, or the data becomes inconsistent. See data model resilience for strategies.
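
One way to picture the tax is a check that compares the two copies of a relationship. The sketch below assumes the inline `maps_to::` field in the note body is the source of truth and that a frontmatter list holds the duplicated copy; `find_drift` and `link_target` are hypothetical helpers, not part of Crosswalker.

```python
import re

# Inline field such as "nist_csf.maps_to:: [[AC-2]]"; captures the link target.
INLINE_MAPS_TO = re.compile(r"^[\w.]*maps_to::\s*\[\[([^\]|#]+)", re.MULTILINE)


def link_target(value):
    """Turn '[[AC-2]]' or '[[AC-2|alias]]' into 'AC-2'."""
    return value.strip().strip("[]").split("|")[0].strip()


def find_drift(body, frontmatter_links):
    """Compare inline links (source) with the frontmatter copy (derived)."""
    source = {t.strip() for t in INLINE_MAPS_TO.findall(body)}
    derived = {link_target(v) for v in frontmatter_links}
    return {
        "missing_from_frontmatter": sorted(source - derived),
        "stale_in_frontmatter": sorted(derived - source),
    }


body = "nist_csf.maps_to:: [[AC-2]]\nnist_csf.maps_to:: [[AC-3]]\n"
print(find_drift(body, ["[[AC-2]]", "[[AC-5]]"]))
# {'missing_from_frontmatter': ['AC-3'], 'stale_in_frontmatter': ['AC-5']}
```
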

In graph databases, edges can be traversed in both directions:

  • Forward pointer: Source note links to target (maps_to:: [[AC-2]])
  • Reverse pointer: Target note’s backlinks show incoming references (Obsidian computes this)
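
The reverse pointer is simply the forward-link map inverted. The sketch below shows that inversion on a small dict; it is an illustration of the concept only, not a description of how Obsidian builds its backlink index, and the `backlinks` helper is hypothetical.

```python
from collections import defaultdict


def backlinks(forward):
    """Invert {source: [targets]} into {target: [sources]}."""
    reverse = defaultdict(list)
    for source, targets in forward.items():
        for target in targets:
            reverse[target].append(source)
    return dict(reverse)


forward = {"PR.AC-1": ["AC-2", "AC-3"], "PR.AC-4": ["AC-2"]}
print(backlinks(forward)["AC-2"])   # ['PR.AC-1', 'PR.AC-4']
```
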

Frontmatter arrays function as adjacency lists — each note stores a list of its connections:

related_controls:
  - "[[AC-1]]"
  - "[[AC-3]]"
  - "[[AC-5]]"

This enables efficient local queries (“what does this control relate to?”) without scanning all files.
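
A local query of this kind only needs the one file. The sketch below reads a note's adjacency list with a deliberately narrow parser that handles just the block-list shape shown above; a real implementation would use a full YAML parser. The `related_controls` function name and its `key` parameter are assumptions for the example.

```python
def related_controls(markdown_text, key="related_controls"):
    """Return link targets listed under `key` in the note's frontmatter."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []                      # no frontmatter block
    targets, in_list = [], False
    for line in lines[1:]:
        if line.strip() == "---":      # end of frontmatter
            break
        if line.startswith(f"{key}:"):
            in_list = True
            continue
        if in_list:
            item = line.strip()
            if item.startswith("- "):
                # '"[[AC-1]]"' -> 'AC-1'
                targets.append(item[2:].strip().strip("\"'").strip("[]"))
            elif item:
                in_list = False        # reached the next property
    return targets


note = '---\nrelated_controls:\n  - "[[AC-1]]"\n  - "[[AC-3]]"\n  - "[[AC-5]]"\ntitle: AC-2\n---\nBody text\n'
print(related_controls(note))   # ['AC-1', 'AC-3', 'AC-5']
```
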

Obsidian Bases and Dataview act as views over the file graph: they compute and display aggregated data from individual files. Unlike a SQL materialized view, which stores its result set, they recompute on every query, so there is no staleness risk but also no persistence.
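
The recompute-on-query behaviour can be pictured as a plain function over the current note properties, with nothing stored between calls. In this sketch the `framework` property and the `controls_per_framework` helper are hypothetical.

```python
from collections import Counter


def controls_per_framework(notes):
    """Recompute the aggregate from the current note properties on every call."""
    return Counter(n["framework"] for n in notes if "framework" in n)


notes = [
    {"name": "PR.AC-1", "framework": "NIST CSF"},
    {"name": "AC-2", "framework": "NIST 800-53"},
]
first = controls_per_framework(notes)

# A newly imported note shows up in the next query with no refresh step.
notes.append({"name": "AC-3", "framework": "NIST 800-53"})
second = controls_per_framework(notes)
print(first["NIST 800-53"], second["NIST 800-53"])   # 1 2
```
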

| Advantage | Disadvantage |
| --- | --- |
| Human-readable (plain markdown) | No ACID transactions |
| Git-friendly (version control) | No built-in constraints |
| Portable (copy the folder) | No query optimization |
| No server needed | Manual relationship maintenance |
| Works offline | No referential integrity |
| Obsidian-native (graph view, backlinks, search) | Eventual consistency only |

Crosswalker generates a file-based graph database every time it imports a framework. Understanding this model explains:

  1. Why re-import is complex — updating a materialized view across hundreds of files requires careful coordination. See ontology evolution.
  2. Why crosswalk links can become stale — no foreign key constraints mean broken links accumulate silently. See constraint enforcement.
  3. Why frontmatter-first is the right approach: Obsidian Bases can query frontmatter but not inline fields, making YAML properties the most queryable storage location.
  4. Why the _crosswalker metadata exists — it provides the schema versioning and provenance tracking that file-based systems lack natively.
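
Because nothing enforces foreign keys, stale links have to be found by an explicit check. A minimal sketch of such a check follows, assuming forward links and the set of existing note names have already been collected; `find_broken_links` is a hypothetical helper, not an existing Crosswalker feature.

```python
def find_broken_links(forward, existing_notes):
    """Return (source, target) pairs whose target note does not exist."""
    existing = set(existing_notes)
    return [
        (source, target)
        for source, targets in forward.items()
        for target in targets
        if target not in existing
    ]


forward = {"PR.AC-1": ["AC-2", "AC-3"], "AC-2": ["AC-1"]}
existing = ["PR.AC-1", "AC-2", "AC-3"]
print(find_broken_links(forward, existing))   # [('AC-2', 'AC-1')]
```
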
