
Embedded vs server substrates — the file-IS-the-database pattern

Updated Jun 1, 2026

The architectural pattern

Most databases are servers — long-running processes that own a private directory of files you must not touch directly. The user talks to the server via a network socket; the database files are an implementation detail.

A few databases reject this. They’re libraries the application links into. The application owns the database file. Anyone can copy it, version it, mail it, diff it. SQLite’s quiet revolution was proving this could work for a real DB.
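
As a minimal sketch of what "the application owns the database file" means in practice, the following uses Python's standard `sqlite3` module. The file names, the `mapping` table, and the sample row are hypothetical and are not part of Crosswalker's schema. It creates a single-file database, copies it with an ordinary file copy, and reads the copy back.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def make_db(path: Path) -> None:
    """Create a tiny single-file database; no server process is involved."""
    con = sqlite3.connect(path)
    with con:  # commits on success
        con.execute("CREATE TABLE mapping (source TEXT, target TEXT)")
        con.execute("INSERT INTO mapping VALUES ('AC-2', 'A.5.16')")  # hypothetical row
    con.close()


def read_mappings(path: Path) -> list[tuple[str, str]]:
    """Open any copy of the file and query it like the original."""
    con = sqlite3.connect(path)
    try:
        return con.execute("SELECT source, target FROM mapping").fetchall()
    finally:
        con.close()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "crosswalk.db"
    backup = Path(tmp) / "crosswalk-copy.db"
    make_db(original)
    shutil.copy(original, backup)  # a plain file copy is a complete backup
    copied_rows = read_mappings(backup)
    print(copied_rows)
```

The copy here is taken while no connection is writing to the file; copying a database in the middle of a write is a different situation and is not covered by this sketch.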

Crosswalker’s three-tier architecture leans hard into this — Tier 1 and Tier 2 are both “the file IS the database,” in two different shapes.

Two distinct file-based patterns

Pattern	Shape	Examples
Single-file embedded DB	One opaque (usually binary) file holds everything; library API reads/writes it	SQLite, libSQL, DuckDB, kuzu
Directory-of-text-files DB	Many small human-readable files; the filesystem IS the database; any tool can read	Crosswalker Tier 1 (Markdown + YAML), Git, Obsidian vaults

Both reject the server model. Both let cp / rsync / git diff / Dropbox / GitHub do the right thing without a daemon. Tier 1 is even more file-based than SQLite — every record is its own readable Markdown file.
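
To illustrate the directory-of-text-files shape, here is a small sketch that treats a folder of Markdown notes as a queryable store. Everything in it is an assumption made for illustration: the frontmatter keys (`framework`, `family`), the note names, and the deliberately tiny parser, which only handles flat `key: value` pairs. Real YAML frontmatter needs a real YAML parser.

```python
import tempfile
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between leading '---' fences.

    Assumption: scalar string values only; this is not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def query(vault: Path, **where: str) -> list[str]:
    """Return the names of notes whose frontmatter matches every filter."""
    hits = []
    for note in sorted(vault.rglob("*.md")):
        meta = parse_frontmatter(note.read_text(encoding="utf-8"))
        if all(meta.get(k) == v for k, v in where.items()):
            hits.append(note.name)
    return hits


with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    (vault / "AC-2.md").write_text(
        "---\nframework: nist-800-53\nfamily: AC\n---\nAccount management.\n",
        encoding="utf-8",
    )
    (vault / "A.5.16.md").write_text(
        "---\nframework: iso-27001\n---\nIdentity management.\n",
        encoding="utf-8",
    )
    nist_notes = query(vault, framework="nist-800-53")
    print(nist_notes)
```

The "index" is a directory walk and the "record" is a file, which is why any tool that can read text can also read the database.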

How Crosswalker maps onto the axis

Tier	File model	Substrate (v0.1)	Why
Tier 1 (canonical, must-work)	Directory of text files (Markdown + YAML)	Filesystem	Plain text, version-controllable, every tool can read, no server, no lock-in. The load-bearing commitment
Tier 2 (projection, recoverable)	Single-file embedded DB	`@sqlite.org/sqlite-wasm` (~600 KB)	Same anti-server philosophy, in DB-shape; deletable, reprojected from Tier 1; library not a server
Tier 3 (opt-in, recoverable)	Server with file-backing	Postgres / Fuseki / oxigraph-server / TerminusDB	Breaks the philosophy intentionally for power-user scale-up. Requires ops. Not in v0.1

The reason Tier 3 is opt-in and not bundled is exactly the embedded-vs-server distinction. A server can’t ship inside an Obsidian plugin.

The embedded landscape by data model

The question behind this section: are there other embedded databases like the ones discussed so far, covering graph, document, and other data models?

Yes. Embedded DBs exist across every major data model. Here’s the landscape, grouped by data model, with file-shape annotated and cross-linked to the Crosswalker research challenges that evaluated each.

Relational / SQL

Engine	File shape	Notes
SQLite (`@sqlite.org/sqlite-wasm`)	Single-file `.sqlite` / `.db`	Canonical. Foundation-governed. ✅ Tier 2 v0.1 commitment. Per Ch 18, comfortable up to ~100K mappings
libSQL (Turso)	Single-file (claims SQLite compat)	SQLite soft-fork; native vector + replication. Under evaluation — see Ch 24
DuckDB(-WASM)	Single-file `.duckdb`	Analytical/columnar instead of OLTP. Was in the third-wave Tier 2 stack; deferred from v0.1 per the stack pivot (kept as researched back-pocket)
Stoolap	Single-file	Modern Go-based embedded SQL DB; column-store + row-store hybrid; HTAP (transactional + analytical in one engine); built for embedded use via Go API. Long-horizon track via Ch 12b — interesting if Crosswalker outgrows recursive-CTE patterns on `sqlite-wasm` and wants single-file analytical performance without DuckDB’s bundle weight
Turso Database / Limbo	Single-file (Rust rewrite of SQLite)	Pre-1.0; experimental. Long-horizon track via Ch 24 Q3

Graph — RDF / triplestore

Engine	File shape	Notes
Oxigraph (library mode)	Directory of RocksDB files	Rust embedded RDF triplestore. SPARQL. Per Ch 11 — kept as opt-in federation layer; deferred from v0.1 bundle
HDT (Header-Dictionary-Triples)	Single-file (binary RDF)	Read-only after build; great for shipping pre-built RDF datasets. Per Ch 14 — opt-in federation format
Apache Jena TDB (embedded library mode)	Directory of files	Java embedded; usually wrapped by Fuseki as a server, but can run as library
rdflib (Python)	N-Triples / Turtle text files	Embedded RDF library; emits/parses RDF text formats
N3.js	RDF text files (Turtle / N3)	JS RDF library; small; embedded
Comunica	Federates over local + remote sources	Per Ch 14 — opt-in federation engine; runs in browser

Graph — property graph (Cypher / GQL flavor)

Engine	File shape	Notes
kuzu	Single-file `.kz`	Modern embedded property-graph DB. Cypher. C++ core; growing JS/Python bindings. Not yet evaluated in Crosswalker KB — emerging candidate worth tracking
DuckDB-PGQ (extension)	Lives inside DuckDB single-file	Experimental property-graph query layer over DuckDB
simple-graph (SQL pattern)	Lives inside SQLite single-file	Per Ch 18 — recursive-CTE pattern over SQLite; what Crosswalker actually uses for transitive closure
Apache AGE	Postgres extension (in Postgres data dir)	NOT embedded standalone; requires Postgres. Demoted from Tier 3 default per Ch 16
Neo4j Embedded	Directory of files (Java)	Largely retired as Neo4j has shifted to server-only
Minigraf	Embedded property-graph (early stage)	Tracked in Ch 11 and Ch 12b; not adopted
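
The recursive-CTE approach to transitive closure mentioned in the table can be sketched with Python's standard `sqlite3` module. The `edge` table and the control IDs below are hypothetical and are not Crosswalker's actual schema. The sketch only shows the general technique of walking edges with `WITH RECURSIVE`, including a cycle to show that the walk still terminates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE edge (source TEXT NOT NULL, target TEXT NOT NULL);
    INSERT INTO edge VALUES
        ('nist:AC-2', 'iso:A.5.16'),
        ('iso:A.5.16', 'cis:5.1'),
        ('cis:5.1', 'nist:AC-2'),   -- cycle back to the start
        ('nist:AU-2', 'iso:A.8.15');
    """
)

REACHABLE = """
WITH RECURSIVE reach(node) AS (
    SELECT target FROM edge WHERE source = :start
    UNION   -- UNION (not UNION ALL) drops duplicate rows, so cycles terminate
    SELECT e.target FROM edge AS e JOIN reach AS r ON e.source = r.node
)
SELECT node FROM reach ORDER BY node
"""


def reachable(start: str) -> list[str]:
    """Every node reachable from `start` by following one or more edges."""
    return [row[0] for row in con.execute(REACHABLE, {"start": start})]


print(reachable("nist:AC-2"))
print(reachable("nist:AU-2"))
```

Because the starting node sits on a cycle, it appears in its own result set. A caller that wants to exclude it can filter on `node <> :start` in the final `SELECT`.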

Document

Engine	File shape	Notes
Crosswalker Tier 1	Directory of `.md` files with YAML frontmatter	A document store where the filesystem is the index. The canonical file-based DB Crosswalker actually ships
PouchDB	Browser IndexedDB / LevelDB	JS embedded document DB; CouchDB API; bidirectionally syncs with CouchDB
RxDB	Browser IndexedDB / OPFS	JS reactive document DB; multi-storage backend
NeDB	Single text file (JSON-lines)	Historical embedded JS document DB; abandoned
TinyDB	Single JSON file	Python embedded document DB
DuckDB JSON extension	Lives inside DuckDB single-file	Document-shaped queries over JSON columns in DuckDB

Key-value (the storage primitive most others build on)

Engine	File shape	Notes
LevelDB	Directory of SST files	Google’s embedded KV; many higher-level DBs build on it
RocksDB	Directory of SST files	Facebook’s LevelDB fork. Oxigraph uses this internally
LMDB	Single memory-mapped file	Compact embedded KV with B+tree; very fast reads
IndexedDB	Browser-native	Embedded KV/document store in every browser
OPFS (Origin Private File System)	Browser-native filesystem	What `sqlite-wasm` uses for persistence in the browser

Vector / similarity search

Engine	File shape	Notes
`sqlite-vec` (Alex Garcia)	Lives inside SQLite single-file	Vector tables as SQLite extension. ✅ Crosswalker Tier 2 v0.1 commitment for vector search per Ch 18
DuckDB VSS (extension)	Lives inside DuckDB single-file	Vector similarity search for DuckDB
LanceDB	Directory of `.lance` columnar files	Embedded vector DB; Rust core; columnar; has JS bindings
chromadb (embedded mode)	Directory of files	Python embedded vector DB; can run in-process
FAISS	Single binary index file	In-process similarity search library; not really a DB, more an index
hnswlib	Single binary index file	Same as FAISS: an in-process index library rather than a DB
libSQL native vectors	Lives inside libSQL single-file	Native vector type built into libSQL fork; the strongest specific case for migrating to libSQL per Ch 24

Datalog / rule engines

Engine	File shape	Notes
Nemo	In-memory; reads/writes facts to text files	Per Ch 12 — Crosswalker’s chain-rule engine; deferred from v0.1 bundle, kept as researched back-pocket
Differential dataflow / Datafrog	In-memory; library only	Rust Datalog primitives; very fast incremental
Souffle (compiled)	Compiles Datalog → C++ binary	Heavyweight; for big static rule sets
Recursive CTE (in SQL)	Lives in SQLite single-file	Pure SQL emulation of Datalog rules; ✅ What Crosswalker v0.1 actually uses per Ch 12 + Ch 18

Multi-model embedded

These engines try to combine relational + graph + document + vector in one substrate. They are tempting because Crosswalker’s data has facets across multiple models, but each came with caveats.

Engine	Status in Crosswalker KB
CozoDB	❌ Rejected per Ch 14 — no release since v0.7; project signal too weak
SurrealDB (embedded mode)	❌ Rejected per Ch 14 — BSL license + 12.6 MB bundle
HelixDB	⚠️ Tracked-but-not-adopted per Ch 14 and Ch 16; too immature
TerminusDB (v12)	⚠️ Opt-in vault-mirror per Ch 16; single-vendor (DFRNT) flagged
Datalevin (Clojure)	Not yet evaluated; embedded multi-model with Clojure focus
DuckDB with extensions (vector + graph PGQ + JSON + FTS + geo)	⚠️ Multi-model-via-extensions on a single substrate; in third-wave back-pocket
cr-sqlite (CRDT SQLite)	❌ Rejected per Ch 14 — stalled
Grafeo	⚠️ Follow-up tracked in Ch 11

The pattern: multi-model embedded DBs look attractive on paper but have consistently failed the Crosswalker governance/maturity bar. The single-model embedded DBs (SQLite, DuckDB, Oxigraph, kuzu) have stronger track records.

Why this matters for Crosswalker design decisions

Three load-bearing consequences fall out of the embedded-vs-server split:

1. Why Tier 2 had to be SQLite (not DuckDB or Oxigraph) for v0.1

DuckDB is also single-file and embedded. Oxigraph is also embedded (library mode). Why did v0.1 pivot to sqlite-wasm specifically?

Bundle size: sqlite-wasm is ~600 KB; DuckDB-WASM is ~2.5–3 MB; Oxigraph WASM is larger still
Governance: sqlite.org Consortium (multi-stakeholder, 20+ years) vs DuckDB Labs (single-vendor, ~5 years) vs Oxigraph (single-maintainer)
Maturity: SQLite is familiar to most plugin developers; far fewer have shipped DuckDB-WASM
Ecosystem: sqlite-vec, FTS5, and recursive CTEs cover Crosswalker’s actual workload per Ch 18

Per the v0.1 stack pivot, DuckDB-WASM and Oxigraph are researched back-pocket — ready when needed, not the v0.1 default.

2. Why Tier 3 is opt-in and outside the plugin

Every Tier 3 candidate (Postgres, Fuseki, oxigraph-server, TerminusDB, AGE) is a server. You can’t ship a server inside an Obsidian plugin. Tier 3 always requires the user to deploy a separate process. That’s the cost of getting beyond Tier 2’s scale ceiling.

3. Why “marketplace bundles” fit cleanly

The marketplace pattern ships Tier 1 only — directory of Markdown files. Substrate-irrelevant. A user who downloads a community-shared NIST 800-53 bundle gets a directory of .md files; their local Tier 2 reprojects them into whatever single-file SQLite they have. Server-database substrates can’t do this — the data lives inside the server’s private storage.
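
The "delete Tier 2 and reproject it from Tier 1" property can be sketched as follows. This is an illustration under stated assumptions, not Crosswalker's projection code: the `note` table, the `reproject` helper, and the sample files are hypothetical, and the sketch stores raw note text rather than parsing frontmatter.

```python
import sqlite3
import tempfile
from pathlib import Path


def reproject(vault: Path, db_path: Path) -> int:
    """Rebuild the projection from scratch; returns the number of notes indexed."""
    db_path.unlink(missing_ok=True)  # the projection is disposable
    rows = [
        (p.relative_to(vault).as_posix(), p.read_text(encoding="utf-8"))
        for p in sorted(vault.rglob("*.md"))
    ]
    con = sqlite3.connect(db_path)
    with con:  # commits on success
        con.execute("CREATE TABLE note (path TEXT PRIMARY KEY, body TEXT NOT NULL)")
        con.executemany("INSERT INTO note VALUES (?, ?)", rows)
    con.close()
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp) / "bundle"  # stands in for a downloaded Tier 1 bundle
    vault.mkdir()
    (vault / "AC-2.md").write_text("---\nid: AC-2\n---\nAccount management.\n", encoding="utf-8")
    (vault / "AU-2.md").write_text("---\nid: AU-2\n---\nEvent logging.\n", encoding="utf-8")
    sidecar = Path(tmp) / "projection.db"

    first = reproject(vault, sidecar)
    sidecar.unlink()  # losing the projection loses nothing canonical
    second = reproject(vault, sidecar)
    print(first, second)
```

The bundle itself never contains the database file. Each recipient rebuilds a local projection from the Markdown, which is what keeps the shared artifact independent of the substrate.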

Long-horizon watch register

This register lists substrates and adjacent file-based tools that Crosswalker has considered and chosen not to adopt today, along with the conditions that would trigger re-evaluation. Explicit, dated, reasoned non-adoption with falsifiable triggers is itself documentation: refusing ever to revisit a decision would be cargo-culting, whereas conservatism with explicit triggers is engineering discipline.

Substrates under observation

Entry	Category	Status	Re-evaluation trigger
Turso Database / Limbo	Single-file embedded relational (Rust rewrite of SQLite, BETA)	Watch	Reaches a stable 1.0; non-Turso fork or alternative steward emerges; Obsidian core / Dataview / Bases adopts Limbo. Review at v1.0 + 12 months. Per Ch 24 §7
libSQL-WASM	SQLite soft fork; native vector + replication	Reject (Q1 of Ch 24)	sqlite-vec lapses + libSQL becomes de facto standard, OR Obsidian itself adopts libSQL, OR Turso recommits libSQL as long-term peer to Limbo
Turso Cloud / sqld	Hosted edge-replicated libSQL Tier 3 option	Reject (Q2 of Ch 24)	Q1 triggers fire OR Turso publicly recommits Cloud to long-term libSQL backend
kuzu	Embedded property graph (Cypher); single-file `.kz`	Watch	Recursive-CTE-on-SQLite pattern hits expressivity wall on user-relevant graph queries; kuzu reaches 1.0 with stable file format; multi-stakeholder governance signal
LanceDB	Embedded columnar vector DB; Rust core	Watch	sqlite-vec maintenance lapses OR Crosswalker grows into AI-heavy semantic-search workloads where columnar vector storage matters
DuckDB-PGQ	Property graph extension on DuckDB single-file	Watch	DuckDB-WASM becomes Tier 2 alternative (back-pocket activated); user-facing graph queries need a non-recursive-CTE substrate
Stoolap	Modern Go-based embedded HTAP (column + row hybrid); single-file	Watch	Recursive-CTE-on-SQLite hits analytical performance wall; user wants single-file analytical engine without DuckDB’s ~2.5–3 MB bundle
Datalevin	Clojure embedded multi-model	Watch	Niche; architecturally interesting; revisit if Crosswalker grows a Clojure-flavored ecosystem
PouchDB / RxDB	JS embedded document DBs (IndexedDB-backed)	Watch	Future cross-vault sync features; Tier 1 is already a document store, but these could power “live-sync between two users editing the same crosswalk” workflows

Adjacent file-based tools — version control

Crosswalker’s audit-trail commitment (Ch 08 + Ch 13 + Ch 15) uses git as the v0.1 default: signed commits over Tier 1 Markdown, with optional layered attestation primitives (RFC 3161, Sigstore Rekor v2 + in-toto, eIDAS QTSA + W3C VC) for higher compliance tiers. This commitment is settled; the watch register only tracks emerging alternatives.

Entry	Category	Status	Why it matters for Crosswalker
jj / jujutsu	Modern VCS; git-compatible front-end + standalone backend; “operations as data” model with first-class conflict objects and rich history rewriting	Watch	Three potential angles: (1) recipe versioning — recipes are YAML files that evolve continuously; jj’s conflict-as-data model handles “two users edited the same recipe” more gracefully than git merges; (2) marketplace bundle history — pre-transformed Tier 1 bundles need version history shareable across users; jj’s content-addressed history stays portable; (3) audit trail — jj inherits git’s signed-commit story while adding richer rewrite history that some compliance models could leverage. Revisit if jj reaches 1.0, the Obsidian community adopts it, or recipe-collaboration UX becomes a v1.0 priority
Pijul	Patch-theoretic VCS (Darcs descendant); content-addressed	Watch	More radical than jj; not git-compatible; theoretical model is elegant but adoption is small
Sapling (sl)	Meta’s Mercurial-derived VCS; git-compatible	Watch	Heavyweight; designed for monorepo workflows; less obviously relevant to Crosswalker’s vault-scale use case

These are not blockers for v0.1. Git remains the v0.1 audit-trail substrate. The watch register exists so a future reader can see what alternatives have been considered and on what grounds they remain unadopted.

Adjacent file-based tools — content-addressed storage

Entry	Category	Status	Why it matters for Crosswalker
IPLD	Content-addressed graph data	Watch	Marketplace bundle distribution — content-addressed Tier 1 directories could be shared via IPFS without trust assumptions about the source. Per Ch 12b deliverable, long-horizon
Unison	Content-addressed code/data; “everything is hashed”	Watch	Influential design; Unison-inspired patterns may appear in Crosswalker’s recipe versioning if recipes-by-hash becomes a workflow
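
The idea of content-addressing a Tier 1 directory can be sketched with a plain SHA-256 over the bundle's files. This is not IPLD's CID format or Unison's hashing scheme. The `bundle_digest` helper and its encoding of path plus file digest are assumptions chosen for illustration, to show that identical bundles get identical addresses regardless of where they came from.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of one file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def bundle_digest(root: Path) -> str:
    """Hash a directory: sorted (relative path, file digest) pairs feed one SHA-256."""
    entries = sorted(
        (p.relative_to(root).as_posix(), file_digest(p))
        for p in root.rglob("*")
        if p.is_file()
    )
    h = hashlib.sha256()
    for rel, digest in entries:
        h.update(f"{rel}\0{digest}\n".encode("utf-8"))
    return h.hexdigest()


def write_bundle(root: Path, body: str) -> None:
    """Create a one-note bundle (hypothetical sample content)."""
    root.mkdir()
    (root / "AC-2.md").write_text(body, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    a, b, c = (Path(tmp) / name for name in ("a", "b", "c"))
    write_bundle(a, "Account management.\n")
    write_bundle(b, "Account management.\n")  # same content, different location
    write_bundle(c, "Account management!\n")  # one character changed
    same = bundle_digest(a) == bundle_digest(b)
    changed = bundle_digest(a) != bundle_digest(c)
    print(same, changed)
```

A recipient who recomputes the digest can check that a bundle matches its published address without trusting the channel it arrived through.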

SQLite’s contribution in one sentence

Database semantics don’t have to come at the cost of file-system simplicity.

The ChunkyCSV / JSONaut / SEACOW / folder-tag-sync portfolio (the project owner’s prior tools — see ETL and import § broader portfolio context) is essentially the same observation applied to ETL, knowledge organization, and tag/folder semantics. Crosswalker’s Tier 1 is the same observation applied to ontology storage: human-readable files with structured metadata, no server, no lock-in.

This is why the libSQL question (Ch 24) is high-stakes despite the substrate being “another single-file SQLite-shaped thing” — single-vendor governance is exactly the dimension SQLite optimized away.

File-based graph databases — Obsidian vault as a graph DB; complementary view
The problem — why files-not-servers matters for ontology lifecycle
v0.1 stack pivot (2026-05-02) — three-tier architecture commitment
v0.1 schema spec §7 — Tier 2 sidecar SQL DDL
Ch 11 — Tier 2/3 engine deep survey — original triplestore + DuckDB + Nemo evaluation
Ch 14 — Missed engines — multi-model rejections
Ch 16 — Tier 3 reconsideration — Fuseki vs AGE vs TerminusDB
Ch 18 — Tier 2-Lite scale ceiling — what the SQLite stack actually delivers
Ch 24 — Turso/libSQL evaluation (archived) — resolved 2026-05-04: REJECT all three Qs
2026-05-04 Tier 2 substrate synthesis log — vector-layer-decoupled-from-substrate as load-bearing modularity commitment
What makes Crosswalker unique — Spec / Library / Integrations philosophy that this substrate choice serves