Embedded vs server substrates — the file-IS-the-database pattern
The architectural pattern
Most databases are servers — long-running processes that own a private directory of files you must not touch directly. The user talks to the server via a network socket; the database files are an implementation detail.
A few databases reject this. They’re libraries the application links into. The application owns the database file. Anyone can copy it, version it, mail it, diff it. SQLite’s quiet revolution was proving this could work for a real DB.
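As a minimal sketch of what "the application owns the database file" means in practice, the snippet below creates a SQLite file with Python's standard `sqlite3` module, copies it with an ordinary file copy, and queries the copy. The file names, the `mapping` table, and the sample row are hypothetical and only serve the illustration.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def demo_copy_is_backup() -> list[tuple[str, str]]:
    """Create a SQLite file, copy it like any other file, and query the copy."""
    workdir = Path(tempfile.mkdtemp())
    original = workdir / "crosswalk.db"        # hypothetical file name
    duplicate = workdir / "crosswalk-copy.db"  # hypothetical file name

    # The application, not a server process, opens and owns the file.
    conn = sqlite3.connect(original)
    conn.execute("CREATE TABLE mapping (source TEXT, target TEXT)")  # hypothetical schema
    conn.execute("INSERT INTO mapping VALUES ('src-1', 'tgt-1')")    # placeholder data
    conn.commit()
    conn.close()  # close first so the copy sees a fully written file

    # Any file tool works on the database; here, a plain copy.
    shutil.copy(original, duplicate)

    copy_conn = sqlite3.connect(duplicate)
    rows = copy_conn.execute("SELECT source, target FROM mapping").fetchall()
    copy_conn.close()
    return rows
```

No daemon is involved at any point: the copy is a complete, independent database.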
Crosswalker’s three-tier architecture leans hard into this — Tier 1 and Tier 2 are both “the file IS the database,” in two different shapes.
Two distinct file-based patterns
| Pattern | Shape | Examples |
|---|---|---|
| Single-file embedded DB | One opaque (usually binary) file holds everything; library API reads/writes it | SQLite, libSQL, DuckDB, kuzu |
| Directory-of-text-files DB | Many small human-readable files; the filesystem IS the database; any tool can read | Crosswalker Tier 1 (Markdown + YAML), Git, Obsidian vaults |
Both reject the server model. Both let cp / rsync / git diff / Dropbox / GitHub do the right thing without a daemon. Tier 1 is even more file-based than SQLite — every record is its own readable Markdown file.
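The directory-of-text-files shape can be sketched in a few lines: each record is one Markdown file, and a "query" is a scan over frontmatter. This is an illustration only, not Crosswalker's implementation. The frontmatter parser handles flat `key: value` pairs and is an assumption made to keep the example dependency-free (a real vault would need a full YAML parser), and the file names and fields are hypothetical.

```python
import tempfile
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter between two `---` lines.

    Deliberately minimal: nested YAML, lists, and quoting are not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as having no frontmatter


def query(vault: Path, **where: str) -> list[str]:
    """Return names of notes whose frontmatter matches every key=value filter."""
    hits = []
    for note in sorted(vault.rglob("*.md")):
        meta = parse_frontmatter(note.read_text(encoding="utf-8"))
        if all(meta.get(key) == value for key, value in where.items()):
            hits.append(note.name)
    return hits


def build_sample_vault() -> Path:
    """Write two hypothetical records, one Markdown file per record."""
    vault = Path(tempfile.mkdtemp())
    (vault / "AC-2.md").write_text(
        "---\nframework: nist-800-53\nfamily: AC\n---\nBody text.\n",
        encoding="utf-8",
    )
    (vault / "other.md").write_text(
        "---\nframework: framework-x\n---\nBody text.\n", encoding="utf-8"
    )
    return vault
```

For example, `query(build_sample_vault(), framework="nist-800-53")` returns only the first note. The same files remain readable by `grep`, `git diff`, or a text editor, which is the point of the pattern.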
How Crosswalker maps onto the axis
| Tier | File model | Substrate (v0.1) | Why |
|---|---|---|---|
| Tier 1 (canonical, must-work) | Directory of text files (Markdown + YAML) | Filesystem | Plain text, version-controllable, every tool can read, no server, no lock-in. The load-bearing commitment |
| Tier 2 (projection, recoverable) | Single-file embedded DB | @sqlite.org/sqlite-wasm (~600 KB) | Same anti-server philosophy, in DB-shape; deletable, reprojected from Tier 1; library not a server |
| Tier 3 (opt-in, recoverable) | Server with file-backing | Postgres / Fuseki / oxigraph-server / TerminusDB | Breaks the philosophy intentionally for power-user scale-up. Requires ops. Not in v0.1 |
The reason Tier 3 is opt-in and not bundled is exactly the embedded-vs-server distinction. A server can’t ship inside an Obsidian plugin.
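The "deletable, reprojected from Tier 1" property in the table can be illustrated with a small sketch: rebuild a sidecar SQLite file from a directory of Markdown files, delete it, rebuild it, and check that nothing was lost. The `note` table and the file names are assumptions for the example; the real Tier 2 schema is defined elsewhere in the Crosswalker docs and is not reproduced here.

```python
import sqlite3
import tempfile
from pathlib import Path


def project(vault: Path, db_path: Path) -> int:
    """Rebuild the sidecar database from scratch out of the Markdown files."""
    db_path.unlink(missing_ok=True)  # the projection is disposable by design
    conn = sqlite3.connect(db_path)
    # Hypothetical one-table schema: relative path plus raw file content.
    conn.execute("CREATE TABLE note (path TEXT PRIMARY KEY, body TEXT NOT NULL)")
    rows = [
        (md.relative_to(vault).as_posix(), md.read_text(encoding="utf-8"))
        for md in sorted(vault.rglob("*.md"))
    ]
    conn.executemany("INSERT INTO note VALUES (?, ?)", rows)
    conn.commit()
    conn.close()
    return len(rows)


def dump(db_path: Path) -> list[tuple[str, str]]:
    """Read the whole projection back in a stable order."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT path, body FROM note ORDER BY path").fetchall()
    conn.close()
    return rows


def demo() -> bool:
    root = Path(tempfile.mkdtemp())
    vault = root / "vault"
    vault.mkdir()
    (vault / "a.md").write_text("---\nframework: framework-x\n---\n", encoding="utf-8")
    (vault / "b.md").write_text("---\nframework: framework-y\n---\n", encoding="utf-8")
    db_path = root / "tier2.sqlite"  # hypothetical sidecar location

    project(vault, db_path)
    before = dump(db_path)
    db_path.unlink()         # lose the projection entirely
    project(vault, db_path)  # recover it from the text files
    return dump(db_path) == before
```

Because the text files are the source of truth, losing the sidecar costs only the time to rebuild it.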
The embedded landscape by data model
The user’s question: “are there OTHER embedded databases like the ones we’ve talked about — graph-based, document, etc. — but embedded?”
Yes. Embedded DBs exist across every major data model. Here’s the landscape, grouped by data model, with file-shape annotated and cross-linked to the Crosswalker research challenges that evaluated each.
Relational / SQL
| Engine | File shape | Notes |
|---|---|---|
| SQLite (@sqlite.org/sqlite-wasm) | Single-file .sqlite / .db | Canonical. Foundation-governed. ✅ Tier 2 v0.1 commitment. Per Ch 18, scales comfortably to ~100K mappings |
| libSQL (Turso) | Single-file (claims SQLite compat) | SQLite soft-fork; native vector + replication. Under evaluation — see Ch 24 |
| DuckDB(-WASM) | Single-file .duckdb | Analytical/columnar instead of OLTP. Was in the third-wave Tier 2 stack; deferred from v0.1 per the stack pivot (kept as researched back-pocket) |
| Stoolap | Single-file | Modern Go-based embedded SQL DB; column-store + row-store hybrid; HTAP (transactional + analytical in one engine); built for embedded use via Go API. Long-horizon track via Ch 12b — interesting if Crosswalker outgrows recursive-CTE patterns on sqlite-wasm and wants single-file analytical performance without DuckDB’s bundle weight |
| Turso Database / Limbo | Single-file (Rust rewrite of SQLite) | Pre-1.0; experimental. Long-horizon track via Ch 24 Q3 |
Graph — RDF / triplestore
| Engine | File shape | Notes |
|---|---|---|
| Oxigraph (library mode) | Directory of RocksDB files | Rust embedded RDF triplestore. SPARQL. Per Ch 11 — kept as opt-in federation layer; deferred from v0.1 bundle |
| HDT (Header-Dictionary-Triples) | Single-file (binary RDF) | Read-only after build; great for shipping pre-built RDF datasets. Per Ch 14 — opt-in federation format |
| Apache Jena TDB (embedded library mode) | Directory of files | Java embedded; usually wrapped by Fuseki as a server, but can run as library |
| rdflib (Python) | N-Triples / Turtle text files | Embedded RDF library; emits/parses RDF text formats |
| N3.js | RDF text files (Turtle / N3) | JS RDF library; small; embedded |
| Comunica | Federates over local + remote sources | Per Ch 14 — opt-in federation engine; runs in browser |
Graph — property graph (Cypher / GQL flavor)
| Engine | File shape | Notes |
|---|---|---|
| kuzu | Single-file .kz | Modern embedded property-graph DB. Cypher. C++ core; growing JS/Python bindings. Not yet evaluated in Crosswalker KB — emerging candidate worth tracking |
| DuckDB-PGQ (extension) | Lives inside DuckDB single-file | Experimental property-graph query layer over DuckDB |
| simple-graph (SQL pattern) | Lives inside SQLite single-file | Per Ch 18 — recursive-CTE pattern over SQLite; what Crosswalker actually uses for transitive closure |
| Apache AGE | Postgres extension (in Postgres data dir) | NOT embedded standalone; requires Postgres. Demoted from Tier 3 default per Ch 16 |
| Neo4j Embedded | Directory of files (Java) | Largely retired as Neo4j has shifted to server-only |
| Minigraf | Embedded property-graph (early stage) | Tracked in Ch 11 and Ch 12b; not adopted |
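The recursive-CTE pattern named in the simple-graph row can be sketched with the standard `sqlite3` module. The `edge` table and its column names are assumptions for illustration, not Crosswalker's actual schema.

```python
import sqlite3


def reachable(edges: list[tuple[str, str]], start: str) -> list[str]:
    """Return every node reachable from `start` by following one or more edges."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")  # hypothetical schema
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE closure(node) AS (
            SELECT dst FROM edge WHERE src = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT edge.dst FROM edge JOIN closure ON edge.src = closure.node
        )
        SELECT node FROM closure ORDER BY node
        """,
        (start,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

With edges `a→b`, `b→c`, `c→a`, calling `reachable(edges, "a")` walks the cycle once and stops. The deduplicating `UNION` is what keeps the recursion finite on cyclic graphs.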
Document
| Engine | File shape | Notes |
|---|---|---|
| Crosswalker Tier 1 | Directory of .md files with YAML frontmatter | A document store where the filesystem is the index. The canonical file-based DB Crosswalker actually ships |
| PouchDB | Browser IndexedDB / LevelDB | JS embedded document DB; CouchDB API; bidirectionally syncs with CouchDB |
| RxDB | Browser IndexedDB / OPFS | JS reactive document DB; multi-storage backend |
| NeDB | Single text file (JSON-lines) | Historical embedded JS document DB; abandoned |
| TinyDB | Single JSON file | Python embedded document DB |
| DuckLake (DuckDB JSON view) | Single DuckDB file with JSON columns | Document-shaped queries over DuckDB |
Key-value (the storage primitive most others build on)
| Engine | File shape | Notes |
|---|---|---|
| LevelDB | Directory of SST files | Google’s embedded KV; many higher-level DBs build on it |
| RocksDB | Directory of SST files | Facebook’s LevelDB fork. Oxigraph uses this internally |
| LMDB | Single memory-mapped file | Compact embedded KV with B+tree; very fast reads |
| IndexedDB | Browser-native | Embedded KV/document store in every browser |
| OPFS (Origin Private File System) | Browser-native filesystem | What sqlite-wasm uses for persistence in the browser |
Vector / similarity search
| Engine | File shape | Notes |
|---|---|---|
| sqlite-vec (Alex Garcia) | Lives inside SQLite single-file | Vector tables as SQLite extension. ✅ Crosswalker Tier 2 v0.1 commitment for vector search per Ch 18 |
| DuckDB VSS (extension) | Lives inside DuckDB single-file | Vector similarity search for DuckDB |
| LanceDB | Single-file (.lance columnar) | Embedded vector DB; Rust core; columnar; has JS bindings |
| chromadb (embedded mode) | Directory of files | Python embedded vector DB; can run in-process |
| FAISS | Single binary index file | In-process similarity search library; not really a DB, more an index |
| hnswlib | Single binary index file | Same; index library |
| libSQL native vectors | Lives inside libSQL single-file | Native vector type built into libSQL fork; the strongest specific case for migrating to libSQL per Ch 24 |
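To make "vectors live inside the SQLite single-file" concrete without depending on any extension, the sketch below stores embeddings as BLOBs and ranks them by brute-force cosine similarity through a Python user-defined function. This is a stand-in technique and does not use or imitate the sqlite-vec API; the `embedding` table, the `cosine` function name, and the float32 packing are all assumptions made for the example.

```python
import math
import sqlite3
import struct


def pack(vec: list[float]) -> bytes:
    """Serialize a vector as native-order float32 values."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob: bytes) -> tuple[float, ...]:
    """Inverse of pack(): 4 bytes per float32 component."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def cosine(a: bytes, b: bytes) -> float:
    """Cosine similarity of two packed vectors; 0.0 if either has zero length."""
    va, vb = unpack(a), unpack(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def nearest(items: dict[str, list[float]], query: list[float], k: int = 1) -> list[str]:
    """Return the ids of the k stored vectors most similar to `query`."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("cosine", 2, cosine)  # expose the Python function to SQL
    conn.execute("CREATE TABLE embedding (id TEXT PRIMARY KEY, vec BLOB)")
    conn.executemany(
        "INSERT INTO embedding VALUES (?, ?)",
        [(item_id, pack(vec)) for item_id, vec in items.items()],
    )
    rows = conn.execute(
        "SELECT id FROM embedding ORDER BY cosine(vec, ?) DESC LIMIT ?",
        (pack(query), k),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

A full scan like this is linear in the number of rows. Dedicated vector extensions exist to do the same job faster while keeping the data in the same file.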
Datalog / rule engines
| Engine | File shape | Notes |
|---|---|---|
| Nemo | In-memory; reads/writes facts to text files | Per Ch 12 — Crosswalker’s chain-rule engine; deferred from v0.1 bundle, kept as researched back-pocket |
| Differential dataflow / Datafrog | In-memory; library only | Rust Datalog primitives; very fast incremental |
| Souffle (compiled) | Compiles Datalog → C++ binary | Heavyweight; for big static rule sets |
| Recursive CTE (in SQL) | Lives in SQLite single-file | Pure SQL emulation of Datalog rules; ✅ What Crosswalker v0.1 actually uses per Ch 12 + Ch 18 |
Multi-model embedded
These engines try to combine relational + graph + document + vector in one substrate. That is tempting because Crosswalker has facets across multiple models, but each candidate came with caveats.
| Engine | Status in Crosswalker KB |
|---|---|
| CozoDB | ❌ Rejected per Ch 14 — no release since v0.7; project signal too weak |
| SurrealDB (embedded mode) | ❌ Rejected per Ch 14 — BSL license + 12.6 MB bundle |
| HelixDB | ⚠️ Tracked-but-not-adopted per Ch 14 and Ch 16; too immature |
| TerminusDB (v12) | ⚠️ Opt-in vault-mirror per Ch 16; single-vendor (DFRNT) flagged |
| Datalevin (Clojure) | Not yet evaluated; embedded multi-model with Clojure focus |
| DuckDB with extensions (vector + graph PGQ + JSON + FTS + geo) | ⚠️ Multi-model-via-extensions on a single substrate; in third-wave back-pocket |
| cr-sqlite (CRDT SQLite) | ❌ Rejected per Ch 14 — stalled |
| Grafeo | ⚠️ Follow-up tracked in Ch 11 |
The pattern: multi-model embedded DBs look attractive on paper but have consistently failed the Crosswalker governance/maturity bar. The single-model embedded DBs (SQLite, DuckDB, Oxigraph, kuzu) have stronger track records.
Why this matters for Crosswalker design decisions
Three load-bearing consequences fall out of the embedded-vs-server split:
1. Why Tier 2 had to be SQLite (not DuckDB or Oxigraph) for v0.1
DuckDB is also single-file and embedded. Oxigraph is also embedded (library mode). Why did v0.1 pivot to sqlite-wasm specifically?
- Bundle size — sqlite-wasm ~600 KB; DuckDB-WASM ~2.5–3 MB; Oxigraph WASM larger still
- Governance — sqlite.org Consortium (multi-stakeholder, 20+ years) vs DuckDB Labs (single-vendor, ~5 years) vs Oxigraph (single-maintainer)
- Maturity — every Obsidian plugin developer has shipped SQLite once; far fewer have shipped DuckDB-WASM
- Ecosystem — sqlite-vec, FTS5, recursive CTEs cover Crosswalker’s actual workload per Ch 18
Per the v0.1 stack pivot, DuckDB-WASM and Oxigraph are researched back-pocket — ready when needed, not the v0.1 default.
2. Why Tier 3 is opt-in and outside the plugin
Every Tier 3 candidate (Postgres, Fuseki, oxigraph-server, TerminusDB, AGE) is a server. You can’t ship a server inside an Obsidian plugin. Tier 3 always requires the user to deploy a separate process. That’s the cost of getting beyond Tier 2’s scale ceiling.
3. Why “marketplace bundles” fit cleanly
The marketplace pattern ships Tier 1 only: a directory of Markdown files, independent of any substrate. A user who downloads a community-shared NIST 800-53 bundle gets a directory of .md files; their local Tier 2 reprojects them into whatever single-file SQLite they have. Server-database substrates can’t do this, because the data lives inside the server’s private storage.
Long-horizon watch register
Substrates and adjacent file-based tools that Crosswalker has evaluated and chosen not to adopt today, with conditions for re-evaluation. The pattern: explicit, dated, reasoned non-adoption with falsifiable triggers is itself documentation. Cargo-culting would be refusing to revisit; conservatism with explicit triggers is engineering discipline.
Substrates under observation
| Entry | Category | Status | Re-evaluation trigger |
|---|---|---|---|
| Turso Database / Limbo | Single-file embedded relational (Rust rewrite of SQLite, BETA) | Watch | Pre-1.0 stable; non-Turso fork or alternative steward emerges; Obsidian core / Dataview / Bases adopts Limbo. Review at v1.0 + 12 months. Per Ch 24 §7 |
| libSQL-WASM | SQLite soft fork; native vector + replication | Reject (Q1 of Ch 24) | sqlite-vec lapses + libSQL becomes de facto standard, OR Obsidian itself adopts libSQL, OR Turso recommits libSQL as long-term peer to Limbo |
| Turso Cloud / sqld | Hosted edge-replicated libSQL Tier 3 option | Reject (Q2 of Ch 24) | Q1 triggers fire OR Turso publicly recommits Cloud to long-term libSQL backend |
| kuzu | Embedded property graph (Cypher); single-file .kz | Watch | Recursive-CTE-on-SQLite pattern hits expressivity wall on user-relevant graph queries; kuzu reaches 1.0 with stable file format; multi-stakeholder governance signal |
| LanceDB | Embedded columnar vector DB; Rust core | Watch | sqlite-vec maintenance lapses OR Crosswalker grows into AI-heavy semantic-search workloads where columnar vector storage matters |
| DuckDB-PGQ | Property graph extension on DuckDB single-file | Watch | DuckDB-WASM becomes Tier 2 alternative (back-pocket activated); user-facing graph queries need a non-recursive-CTE substrate |
| Stoolap | Modern Go-based embedded HTAP (column + row hybrid); single-file | Watch | Recursive-CTE-on-SQLite hits analytical performance wall; user wants single-file analytical engine without DuckDB’s ~2.5–3 MB bundle |
| Datalevin | Clojure embedded multi-model | Watch | Niche; architecturally interesting; revisit if Crosswalker grows a Clojure-flavored ecosystem |
| PouchDB / RxDB | JS embedded document DBs (IndexedDB-backed) | Watch | Future cross-vault sync features; Tier 1 is already a document store, but these could power “live-sync between two users editing the same crosswalk” workflows |
Adjacent file-based tools — version control
Crosswalker’s audit trail commitment (Ch 08 + Ch 13 + Ch 15) is git as the v0.1 default — signed commits over Tier 1 markdown, with optional layered attestation primitives (RFC 3161, Sigstore Rekor v2 + in-toto, eIDAS QTSA + W3C VC) for higher compliance tiers. This is a settled commitment; the watch register tracks emerging alternatives.
| Entry | Category | Status | Why it matters for Crosswalker |
|---|---|---|---|
| jj / jujutsu | Modern VCS; git-compatible front-end + standalone backend; “operations as data” model with first-class conflict objects and rich history rewriting | Watch | Three potential angles: (1) recipe versioning — recipes are YAML files that evolve continuously; jj’s conflict-as-data model handles “two users edited the same recipe” more gracefully than git merges; (2) marketplace bundle history — pre-transformed Tier 1 bundles need version history shareable across users; jj’s content-addressed history stays portable; (3) audit trail — jj inherits git’s signed-commit story while adding richer rewrite history that some compliance models could leverage. Revisit if jj reaches 1.0, the Obsidian community adopts it, or recipe-collaboration UX becomes a v1.0 priority |
| Pijul | Patch-theoretic VCS (Darcs descendant); content-addressed | Watch | More radical than jj; not git-compatible; theoretical model is elegant but adoption is small |
| Sapling (sl) | Meta’s Mercurial-derived VCS; git-compatible | Watch | Heavyweight; designed for monorepo workflows; less obviously relevant to Crosswalker’s vault-scale use case |
These are not blockers for v0.1. Git remains the v0.1 audit-trail substrate. The watch register exists so a future reader can see what alternatives have been considered and on what grounds they remain unadopted.
Adjacent file-based tools — content-addressed storage
| Entry | Category | Status | Why it matters for Crosswalker |
|---|---|---|---|
| IPLD | Content-addressed graph data | Watch | Marketplace bundle distribution — content-addressed Tier 1 directories could be shared via IPFS without trust assumptions about the source. Per Ch 12b deliverable, long-horizon |
| Unison | Content-addressed code/data; “everything is hashed” | Watch | Influential design; Unison-inspired patterns may appear in Crosswalker’s recipe versioning if recipes-by-hash becomes a workflow |
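The idea of a content-addressed Tier 1 directory can be sketched with plain SHA-256: hash each file, then hash a sorted manifest of (file hash, relative path) lines. This is a simplified illustration of content addressing in general. It does not produce an IPLD CID or follow any IPFS format, and the function names and manifest layout are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def bundle_digest(root: Path) -> str:
    """Derive one digest for a directory of Markdown files.

    The digest depends only on relative paths and file bytes, so two copies
    of the same bundle hash identically wherever they are stored.
    """
    manifest = hashlib.sha256()
    files = sorted(root.rglob("*.md"), key=lambda p: p.relative_to(root).as_posix())
    for md in files:
        file_hash = hashlib.sha256(md.read_bytes()).hexdigest()
        rel = md.relative_to(root).as_posix()
        manifest.update(f"{file_hash}  {rel}\n".encode("utf-8"))
    return manifest.hexdigest()


def make_bundle(files: dict[str, str]) -> Path:
    """Write a hypothetical bundle into a fresh temporary directory."""
    root = Path(tempfile.mkdtemp())
    for name, text in files.items():
        (root / name).write_text(text, encoding="utf-8")
    return root
```

Two bundles with the same files produce the same digest regardless of where or in what order they were written, and any edit to any file changes it. That is the property that would let a recipient verify a shared bundle without trusting the channel it arrived through.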
SQLite’s contribution in one sentence
Database semantics don’t have to come at the cost of file-system simplicity.
The ChunkyCSV / JSONaut / SEACOW / folder-tag-sync portfolio (the project owner’s prior tools — see ETL and import § broader portfolio context) is essentially the same observation applied to ETL, knowledge organization, and tag/folder semantics. Crosswalker’s Tier 1 is the same observation applied to ontology storage: human-readable files with structured metadata, no server, no lock-in.
This is why the libSQL question (Ch 24) is high-stakes despite the substrate being “another single-file SQLite-shaped thing” — single-vendor governance is exactly the dimension SQLite optimized away.
Related
- File-based graph databases — Obsidian vault as a graph DB; complementary view
- The problem — why files-not-servers matters for ontology lifecycle
- v0.1 stack pivot (2026-05-02) — three-tier architecture commitment
- v0.1 schema spec §7 — Tier 2 sidecar SQL DDL
- Ch 11 — Tier 2/3 engine deep survey — original triplestore + DuckDB + Nemo evaluation
- Ch 14 — Missed engines — multi-model rejections
- Ch 16 — Tier 3 reconsideration — Fuseki vs AGE vs TerminusDB
- Ch 18 — Tier 2-Lite scale ceiling — what the SQLite stack actually delivers
- Ch 24 — Turso/libSQL evaluation (archived) — resolved 2026-05-04: REJECT all three Qs
- 2026-05-04 Tier 2 substrate synthesis log — vector-layer-decoupled-from-substrate as load-bearing modularity commitment
- What makes Crosswalker unique — Spec / Library / Integrations philosophy that this substrate choice serves