
Embedded vs server substrates — the file-IS-the-database pattern

Most databases are servers — long-running processes that own a private directory of files you must not touch directly. The user talks to the server via a network socket; the database files are an implementation detail.

A few databases reject this. They’re libraries the application links into. The application owns the database file. Anyone can copy it, version it, mail it, diff it. SQLite’s quiet revolution was proving this could work for a real DB.
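A minimal sketch of what "the application owns the database file" means in practice, using Python's built-in `sqlite3` module. The `notes` table and the file names are made up for illustration. The point is that an ordinary file copy yields a second, fully usable database.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def make_db(path: Path) -> None:
    """Create a small SQLite database file at `path`."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
        conn.commit()
    finally:
        conn.close()


def read_notes(path: Path) -> list[str]:
    """Open any copy of the file and read it back; no server involved."""
    conn = sqlite3.connect(path)
    try:
        return [row[0] for row in conn.execute("SELECT body FROM notes ORDER BY id")]
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "app.db"
    backup = Path(tmp) / "app-copy.db"
    make_db(original)
    shutil.copyfile(original, backup)  # a plain file copy is a full database copy
    copied_rows = read_notes(backup)
```

The copy here is taken after the writing connection is closed. Copying a file while another process is mid-write is a different situation and would need more care.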

Crosswalker’s three-tier architecture leans hard into this — Tier 1 and Tier 2 are both “the file IS the database,” in two different shapes.

| Pattern | Shape | Examples |
| --- | --- | --- |
| Single-file embedded DB | One opaque (usually binary) file holds everything; a library API reads and writes it | SQLite, libSQL, DuckDB, kuzu |
| Directory-of-text-files DB | Many small human-readable files; the filesystem IS the database; any tool can read them | Crosswalker Tier 1 (Markdown + YAML), Git, Obsidian vaults |

Both reject the server model. Both let cp / rsync / git diff / Dropbox / GitHub do the right thing without a daemon. Tier 1 is even more file-based than SQLite — every record is its own readable Markdown file.
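To make the directory-of-text-files shape concrete, here is a minimal sketch of reading such a directory. The frontmatter keys (`id`, `framework`), the file layout, and the helper names `parse_record` and `load_vault` are illustrative assumptions, not Crosswalker's actual schema or API. The parser handles only flat `key: value` pairs, where a real implementation would use a full YAML parser.

```python
import tempfile
from pathlib import Path


def parse_record(text: str) -> tuple[dict[str, str], str]:
    """Split a Markdown file into (frontmatter, body).

    Handles only flat `key: value` lines between two `---` fences.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:])
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing fence: treat the whole file as body


def load_vault(root: Path) -> dict[str, dict[str, str]]:
    """Map each record's relative path to its frontmatter."""
    return {
        p.relative_to(root).as_posix(): parse_record(p.read_text(encoding="utf-8"))[0]
        for p in sorted(root.rglob("*.md"))
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "controls").mkdir()
    (root / "controls" / "ac-2.md").write_text(
        "---\nid: AC-2\nframework: example\n---\nAccount management.\n",
        encoding="utf-8",
    )
    vault = load_vault(root)
```

Nothing in this sketch needs a database library. The "index" is the directory listing, which is why generic file tools keep working on the same data.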

| Tier | File model | Substrate (v0.1) | Why |
| --- | --- | --- | --- |
| Tier 1 (canonical, must-work) | Directory of text files (Markdown + YAML) | Filesystem | Plain text, version-controllable, readable by every tool, no server, no lock-in. The load-bearing commitment |
| Tier 2 (projection, recoverable) | Single-file embedded DB | @sqlite.org/sqlite-wasm (~600 KB) | Same anti-server philosophy, in DB shape; deletable and reprojected from Tier 1; a library, not a server |
| Tier 3 (opt-in, recoverable) | Server with file-backing | Postgres / Fuseki / oxigraph-server / TerminusDB | Breaks the philosophy intentionally for power-user scale-up. Requires ops. Not in v0.1 |

The reason Tier 3 is opt-in and not bundled is exactly the embedded-vs-server distinction. A server can’t ship inside an Obsidian plugin.

The user’s question: “are there OTHER embedded databases like the ones we’ve talked about — graph-based, document, etc. — but embedded?”

Yes. Embedded DBs exist across every major data model. Here’s the landscape, grouped by data model, with file-shape annotated and cross-linked to the Crosswalker research challenges that evaluated each.

Relational / SQL

| Engine | File shape | Notes |
| --- | --- | --- |
| SQLite (@sqlite.org/sqlite-wasm) | Single-file .sqlite / .db | Canonical. Foundation-governed. ✅ Tier 2 v0.1 commitment. Per Ch 18, comfortable up to ~100K mappings |
| libSQL (Turso) | Single-file (claims SQLite compat) | SQLite soft-fork; native vector + replication. Under evaluation; see Ch 24 |
| DuckDB(-WASM) | Single-file .duckdb | Analytical/columnar instead of OLTP. Was in the third-wave Tier 2 stack; deferred from v0.1 per the stack pivot (kept as researched back-pocket) |
| Stoolap | Single-file | Modern Go-based embedded SQL DB; column-store + row-store hybrid; HTAP (transactional + analytical in one engine); built for embedded use via Go API. Long-horizon track via Ch 12b; interesting if Crosswalker outgrows recursive-CTE patterns on sqlite-wasm and wants single-file analytical performance without DuckDB’s bundle weight |
| Turso Database / Limbo | Single-file (Rust rewrite of SQLite) | Pre-1.0; experimental. Long-horizon track via Ch 24 Q3 |

Graph: RDF / triplestore

| Engine | File shape | Notes |
| --- | --- | --- |
| Oxigraph (library mode) | Directory of RocksDB files | Rust embedded RDF triplestore. SPARQL. Per Ch 11: kept as opt-in federation layer; deferred from v0.1 bundle |
| HDT (Header-Dictionary-Triples) | Single-file (binary RDF) | Read-only after build; great for shipping pre-built RDF datasets. Per Ch 14: opt-in federation format |
| Apache Jena TDB (embedded library mode) | Directory of files | Java embedded; usually wrapped by Fuseki as a server, but can run as a library |
| rdflib (Python) | N-Triples / Turtle text files | Embedded RDF library; emits/parses RDF text formats |
| N3.js | RDF text files (Turtle / N3) | JS RDF library; small; embedded |
| Comunica | Federates over local + remote sources | Per Ch 14: opt-in federation engine; runs in browser |

Graph — property graph (Cypher / GQL flavor)

| Engine | File shape | Notes |
| --- | --- | --- |
| kuzu | Single-file .kz | Modern embedded property-graph DB. Cypher. C++ core; growing JS/Python bindings. Not yet evaluated in Crosswalker KB; emerging candidate worth tracking |
| DuckDB-PGQ (extension) | Lives inside DuckDB single-file | Experimental property-graph query layer over DuckDB |
| simple-graph (SQL pattern) | Lives inside SQLite single-file | Per Ch 18: recursive-CTE pattern over SQLite; what Crosswalker actually uses for transitive closure |
| Apache AGE | Postgres extension (in Postgres data dir) | NOT embedded standalone; requires Postgres. Demoted from Tier 3 default per Ch 16 |
| Neo4j Embedded | Directory of files (Java) | Largely retired as Neo4j has shifted to server-only |
| Minigraf | Embedded property-graph (early stage) | Tracked in Ch 11 and Ch 12b; not adopted |

Document

| Engine | File shape | Notes |
| --- | --- | --- |
| Crosswalker Tier 1 | Directory of .md files with YAML frontmatter | A document store where the filesystem is the index. The canonical file-based DB Crosswalker actually ships |
| PouchDB | Browser IndexedDB / LevelDB | JS embedded document DB; CouchDB API; bidirectionally syncs with CouchDB |
| RxDB | Browser IndexedDB / OPFS | JS reactive document DB; multi-storage backend |
| NeDB | Single text file (JSON-lines) | Historical embedded JS document DB; abandoned |
| TinyDB | Single JSON file | Python embedded document DB |
| DuckLake (DuckDB JSON view) | Single DuckDB file with JSON columns | Document-shaped queries over DuckDB |

Key-value (the storage primitive most others build on)

| Engine | File shape | Notes |
| --- | --- | --- |
| LevelDB | Directory of SST files | Google’s embedded KV; many higher-level DBs build on it |
| RocksDB | Directory of SST files | Facebook’s LevelDB fork. Oxigraph uses this internally |
| LMDB | Single memory-mapped file | Compact embedded KV with B+tree; very fast reads |
| IndexedDB | Browser-native | Embedded KV/document store in every browser |
| OPFS (Origin Private File System) | Browser-native filesystem | What sqlite-wasm uses for persistence in the browser |

Vector

| Engine | File shape | Notes |
| --- | --- | --- |
| sqlite-vec (Alex Garcia) | Lives inside SQLite single-file | Vector tables as SQLite extension. ✅ Crosswalker Tier 2 v0.1 commitment for vector search per Ch 18 |
| DuckDB VSS (extension) | Lives inside DuckDB single-file | Vector similarity search for DuckDB |
| LanceDB | Single-file (.lance columnar) | Embedded vector DB; Rust core; columnar; has JS bindings |
| chromadb (embedded mode) | Directory of files | Python embedded vector DB; can run in-process |
| FAISS | Single binary index file | In-process similarity search library; not really a DB, more an index |
| hnswlib | Single binary index file | Same; index library |
| libSQL native vectors | Lives inside libSQL single-file | Native vector type built into libSQL fork; the strongest specific case for migrating to libSQL per Ch 24 |

Datalog / rule engines

| Engine | File shape | Notes |
| --- | --- | --- |
| Nemo | In-memory; reads/writes facts to text files | Per Ch 12: Crosswalker’s chain-rule engine; deferred from v0.1 bundle, kept as researched back-pocket |
| Differential dataflow / Datafrog | In-memory; library only | Rust Datalog primitives; very fast incremental |
| Souffle (compiled) | Compiles Datalog → C++ binary | Heavyweight; for big static rule sets |
| Recursive CTE (in SQL) | Lives in SQLite single-file | Pure SQL emulation of Datalog rules; ✅ What Crosswalker v0.1 actually uses per Ch 12 + Ch 18 |
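
As an illustration of the recursive-CTE approach, the sketch below computes transitive closure over a small edge table with Python's built-in `sqlite3` module. The `mapping` table, its `src`/`dst` columns, and the sample rows are hypothetical and are not Crosswalker's schema. The comment shows the Datalog rule pair the SQL emulates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO mapping (src, dst) VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")],
)

# Datalog equivalent:
#   reach(S, D) :- mapping(S, D).
#   reach(S, D) :- reach(S, M), mapping(M, D).
REACHABLE_FROM = """
WITH RECURSIVE reach(node) AS (
    SELECT dst FROM mapping WHERE src = :start
    UNION                      -- UNION (not UNION ALL) dedupes, so cycles terminate
    SELECT m.dst FROM mapping AS m JOIN reach AS r ON m.src = r.node
)
SELECT node FROM reach ORDER BY node
"""

reachable = [row[0] for row in conn.execute(REACHABLE_FROM, {"start": "A"})]
conn.close()
```

Starting from `A`, the query follows the chain through `B` and `C` to `D` and ignores the unrelated `X` to `Y` edge. Using `UNION` instead of `UNION ALL` is the design choice that keeps the recursion finite if the data contains a cycle.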

These engines try to combine relational + graph + document + vector in one substrate. Tempting because Crosswalker has facets across multiple models — but each came with caveats.

| Engine | Status in Crosswalker KB |
| --- | --- |
| CozoDB | ❌ Rejected per Ch 14: no release since v0.7; project signal too weak |
| SurrealDB (embedded mode) | ❌ Rejected per Ch 14: BSL license + 12.6 MB bundle |
| HelixDB | ⚠️ Tracked-but-not-adopted per Ch 14 and Ch 16; too immature |
| TerminusDB (v12) | ⚠️ Opt-in vault-mirror per Ch 16; single-vendor (DFRNT) flagged |
| Datalevin (Clojure) | Not yet evaluated; embedded multi-model with Clojure focus |
| DuckDB with extensions (vector + graph PGQ + JSON + FTS + geo) | ⚠️ Multi-model-via-extensions on a single substrate; in third-wave back-pocket |
| cr-sqlite (CRDT SQLite) | ❌ Rejected per Ch 14: stalled |
| Grafeo | ⚠️ Follow-up tracked in Ch 11 |

The pattern: multi-model embedded DBs look attractive on paper but have consistently failed the Crosswalker governance/maturity bar. The single-model embedded DBs (SQLite, DuckDB, Oxigraph, kuzu) have stronger track records.

Why this matters for Crosswalker design decisions

Three load-bearing consequences fall out of the embedded-vs-server split:

1. Why Tier 2 had to be SQLite (not DuckDB or Oxigraph) for v0.1

DuckDB is also single-file and embedded. Oxigraph is also embedded (library mode). Why did v0.1 pivot to sqlite-wasm specifically?

  • Bundle size: sqlite-wasm ~600 KB; DuckDB-WASM ~2.5–3 MB; Oxigraph WASM larger still
  • Governance: sqlite.org Consortium (multi-stakeholder, 20+ years) vs DuckDB Labs (single-vendor, ~5 years) vs Oxigraph (single-maintainer)
  • Maturity: every Obsidian plugin developer has shipped SQLite once; far fewer have shipped DuckDB-WASM
  • Ecosystem: sqlite-vec, FTS5, recursive CTEs cover Crosswalker’s actual workload per Ch 18

Per the v0.1 stack pivot, DuckDB-WASM and Oxigraph are researched back-pocket — ready when needed, not the v0.1 default.

2. Why Tier 3 is opt-in and outside the plugin

Every Tier 3 candidate (Postgres, Fuseki, oxigraph-server, TerminusDB, AGE) is a server. You can’t ship a server inside an Obsidian plugin. Tier 3 always requires the user to deploy a separate process. That’s the cost of getting beyond Tier 2’s scale ceiling.

3. Why “marketplace bundles” fit cleanly

The marketplace pattern ships Tier 1 only — directory of Markdown files. Substrate-irrelevant. A user who downloads a community-shared NIST 800-53 bundle gets a directory of .md files; their local Tier 2 reprojects them into whatever single-file SQLite they have. Server-database substrates can’t do this — the data lives inside the server’s private storage.
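
The bundle-to-projection flow described above can be sketched as follows. This is an illustration under assumptions: the `record` table, the `reproject` helper, and the file names are hypothetical, and Python's built-in `sqlite3` stands in for the sqlite-wasm build a plugin would use.

```python
import sqlite3
import tempfile
from pathlib import Path


def reproject(tier1_root: Path, db_path: Path) -> int:
    """Rebuild the Tier 2 database from scratch out of the Tier 1 files."""
    db_path.unlink(missing_ok=True)  # Tier 2 is disposable: delete, then rebuild
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE record (path TEXT PRIMARY KEY, content TEXT NOT NULL)")
        rows = [
            (p.relative_to(tier1_root).as_posix(), p.read_text(encoding="utf-8"))
            for p in sorted(tier1_root.rglob("*.md"))
        ]
        conn.executemany("INSERT INTO record (path, content) VALUES (?, ?)", rows)
        conn.commit()
        return len(rows)
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp) / "bundle"
    bundle.mkdir()
    (bundle / "ac-1.md").write_text("---\nid: AC-1\n---\nPolicy.\n", encoding="utf-8")
    (bundle / "ac-2.md").write_text("---\nid: AC-2\n---\nAccounts.\n", encoding="utf-8")

    db = Path(tmp) / "tier2.sqlite"
    first = reproject(bundle, db)
    second = reproject(bundle, db)  # deleting and rebuilding gives the same result
```

Because the database is derived entirely from the directory, the only thing a bundle has to ship is the directory itself.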

The watch register below lists substrates and adjacent file-based tools that Crosswalker has evaluated and chosen not to adopt today, each with conditions for re-evaluation. The pattern: explicit, dated, reasoned non-adoption with falsifiable triggers is itself documentation. Cargo-culting would be refusing to revisit; conservatism with explicit triggers is engineering discipline.

| Entry | Category | Status | Re-evaluation trigger |
| --- | --- | --- | --- |
| Turso Database / Limbo | Single-file embedded relational (Rust rewrite of SQLite, BETA) | Watch | Pre-1.0 stable; non-Turso fork or alternative steward emerges; Obsidian core / Dataview / Bases adopts Limbo. Review at v1.0 + 12 months. Per Ch 24 §7 |
| libSQL-WASM | SQLite soft fork; native vector + replication | Reject (Q1 of Ch 24) | sqlite-vec lapses + libSQL becomes de facto standard, OR Obsidian itself adopts libSQL, OR Turso recommits libSQL as long-term peer to Limbo |
| Turso Cloud / sqld | Hosted edge-replicated libSQL Tier 3 option | Reject (Q2 of Ch 24) | Q1 triggers fire OR Turso publicly recommits Cloud to long-term libSQL backend |
| kuzu | Embedded property graph (Cypher); single-file .kz | Watch | Recursive-CTE-on-SQLite pattern hits expressivity wall on user-relevant graph queries; kuzu reaches 1.0 with stable file format; multi-stakeholder governance signal |
| LanceDB | Embedded columnar vector DB; Rust core | Watch | sqlite-vec maintenance lapses OR Crosswalker grows into AI-heavy semantic-search workloads where columnar vector storage matters |
| DuckDB-PGQ | Property graph extension on DuckDB single-file | Watch | DuckDB-WASM becomes Tier 2 alternative (back-pocket activated); user-facing graph queries need a non-recursive-CTE substrate |
| Stoolap | Modern Go-based embedded HTAP (column + row hybrid); single-file | Watch | Recursive-CTE-on-SQLite hits analytical performance wall; user wants single-file analytical engine without DuckDB’s ~2.5–3 MB bundle |
| Datalevin | Clojure embedded multi-model | Watch | Niche; architecturally interesting; revisit if Crosswalker grows a Clojure-flavored ecosystem |
| PouchDB / RxDB | JS embedded document DBs (IndexedDB-backed) | Watch | Future cross-vault sync features; Tier 1 is already a document store, but these could power “live-sync between two users editing the same crosswalk” workflows |

Adjacent file-based tools — version control

Crosswalker’s audit trail commitment (Ch 08 + Ch 13 + Ch 15) is git as the v0.1 default — signed commits over Tier 1 markdown, with optional layered attestation primitives (RFC 3161, Sigstore Rekor v2 + in-toto, eIDAS QTSA + W3C VC) for higher compliance tiers. This is a settled commitment; the watch register tracks emerging alternatives.

| Entry | Category | Status | Why it matters for Crosswalker |
| --- | --- | --- | --- |
| jj / jujutsu | Modern VCS; git-compatible front-end + standalone backend; “operations as data” model with first-class conflict objects and rich history rewriting | Watch | Three potential angles: (1) recipe versioning: recipes are YAML files that evolve continuously; jj’s conflict-as-data model handles “two users edited the same recipe” more gracefully than git merges; (2) marketplace bundle history: pre-transformed Tier 1 bundles need version history shareable across users; jj’s content-addressed history stays portable; (3) audit trail: jj inherits git’s signed-commit story while adding richer rewrite history that some compliance models could leverage. Revisit if jj reaches 1.0, the Obsidian community adopts it, or recipe-collaboration UX becomes a v1.0 priority |
| Pijul | Patch-theoretic VCS (Darcs descendant); content-addressed | Watch | More radical than jj; not git-compatible; theoretical model is elegant but adoption is small |
| Sapling (sl) | Meta’s Mercurial-derived VCS; git-compatible | Watch | Heavyweight; designed for monorepo workflows; less obviously relevant to Crosswalker’s vault-scale use case |

These are not blockers for v0.1. Git remains the v0.1 audit-trail substrate. The watch register exists so a future reader can see what alternatives have been considered and on what grounds they remain unadopted.

Adjacent file-based tools — content-addressed storage

| Entry | Category | Status | Why it matters for Crosswalker |
| --- | --- | --- | --- |
| IPLD | Content-addressed graph data | Watch | Marketplace bundle distribution: content-addressed Tier 1 directories could be shared via IPFS without trust assumptions about the source. Per Ch 12b deliverable, long-horizon |
| Unison | Content-addressed code/data; “everything is hashed” | Watch | Influential design; Unison-inspired patterns may appear in Crosswalker’s recipe versioning if recipes-by-hash becomes a workflow |
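
To show what "content-addressed" could mean for a Tier 1 directory, here is a minimal sketch that derives one digest from a directory's relative paths and file bytes. It is a plain SHA-256 illustration only. It does not implement IPLD's CID format or Unison's hashing scheme, and `bundle_digest` is a hypothetical helper name.

```python
import hashlib
import tempfile
from pathlib import Path


def bundle_digest(root: Path) -> str:
    """SHA-256 over the sorted relative paths and file bytes of a directory."""
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    h = hashlib.sha256()
    for rel, path in files:
        name = rel.encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so path/content boundaries are unambiguous.
        h.update(len(name).to_bytes(8, "big") + name)
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    left, right = Path(tmp) / "left", Path(tmp) / "right"
    for root in (left, right):
        root.mkdir()
        (root / "ac-2.md").write_bytes(b"---\nid: AC-2\n---\nAccounts.\n")
    same = bundle_digest(left) == bundle_digest(right)      # identical content, same digest
    (right / "ac-2.md").write_bytes(b"---\nid: AC-2\n---\nEdited.\n")
    changed = bundle_digest(left) != bundle_digest(right)   # any edit changes the digest
```

Two users holding the same files compute the same digest regardless of where the bundle came from, which is the property that removes the need to trust the distribution channel.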

Database semantics don’t have to come at the cost of file-system simplicity.

The ChunkyCSV / JSONaut / SEACOW / folder-tag-sync portfolio (the project owner’s prior tools — see ETL and import § broader portfolio context) is essentially the same observation applied to ETL, knowledge organization, and tag/folder semantics. Crosswalker’s Tier 1 is the same observation applied to ontology storage: human-readable files with structured metadata, no server, no lock-in.

This is why the libSQL question (Ch 24) is high-stakes despite the substrate being “another single-file SQLite-shaped thing” — single-vendor governance is exactly the dimension SQLite optimized away.