Tagged: Import
Data import pipeline — parsing, transformation, generation
Pages with this tag
Agent tooling — progressive-disclosure space for AI agents helping with imports
A structured surface for AI agents (Claude, GPT, etc.) that have been pointed at Crosswalker by a user who wants help transforming their data into Tier 1. Header here; body fills as the underlying specs land. Anyone, agent or human, can navigate from this page to the artifacts they need to do the import work.
Prior art
Existing tools, research, and the history of the Crosswalker project.
Challenge 21: Should Crosswalker build its own import/ETL engine, adopt an existing one, or compose them? — long-term build-vs-buy with critical and adversarial thinking
Ch 20 confirmed that Crosswalker's import primitive IS an ETL engine (specifically: graph-aware, semantically-constrained, format-diverse). That answered 'what shape should the primitive take.' This challenge asks the strictly upstream question: should Crosswalker BUILD its own ETL engine, ADOPT an existing one wholesale, or COMPOSE (wrap an existing engine and add a thin Tier-1-specific layer)? Critical, long-term, opportunity-cost-aware. Evaluate against dbt, dlt, Singer/Meltano, Airbyte, Apache Hop, Kettle/PDI, RMLMapper, Morph-KGC, Apache Beam, Pandas, Arquero, JSONata, Jolt, the user's own ChunkyCSV + JSONaut, and others. Surface the maintenance/governance/single-vendor risks. Recommend a path that's durable for 5–10 years.
Challenge 20: First-principles primitive for the import side (archived)
First-principles primitive for ImportRecipe. RESOLVED 2026-05-03 by three fresh-agent deliverables + a substantive user/agent dialog. See callout.
Challenge 22: Target-structure expressivity in import recipes (archived)
RESOLVED 2026-05-04: closed grammar of 5 mechanisms (folder/file/heading/tag/wikilink) × ordered layout × also_emit × graph_edges; render(Recipe, ConceptIdentity) → Address as single coupling point; content addressing computed BEFORE render (target structure is a view); managed/user_preserve frontmatter split; v0.1 ships full schema with folder+file+heading wired. See callout.
Challenge 23: Bundle engine implementation language (archived)
Bundle engine language: TypeScript in-plugin vs external Python vs hybrid vs Rust/Go-WASM vs JVM. RESOLVED 2026-05-04: Path A (Pure TS in-plugin) for v0.1; Path C (Hybrid: optional external producer) reserved for v0.5+. See callout.
Challenge 25: Two-mode import architecture and streaming
Should Crosswalker's bundled engine accept a normalized intermediate format produced by external ETL tools, or should external producers emit Tier 1 directly? Resolves how ChunkyCSV / JSONaut / dbt / Polars compose with the bundled engine; resolves the streaming-at-the-right-boundary question. Filed and resolved same session 2026-05-05.
Import primitive formal foundation — Ch 20 synthesis (wargaming setup, not a decision log)
Three Ch 20 fresh-agent deliverables landed 2026-05-03. Runs A and C converge on a transformation-algebra primitive set (5–6 primitives, RML/YARRRML retargeted to Tier-1, MTT-justified). Run B operates at a complementary boundary-semantics layer (ref/resolve/bind/seal, Backpack-style holes/fills, Nix content addressing). User declined to make Path A vs Path B v0.1 decisions yet; this log is a wargaming setup, not a decision-driver. Centerpiece: 7 concrete worked examples spanning simple-CSV to external-API-protocol scenarios; explicit wargaming questions; reference baseline (Obsidian Importer plugin) of overly-simplified import; the protocol-surface insight that Crosswalker's import side may be a protocol, not just internal Obsidian-plugin logic.
Two-mode import architecture decision — bundled projector + direct emission, both first-class
Decision: Crosswalker supports two architectural modes for producing Tier 1 vaults: Mode 1 (hand structured rows to bundled engine) and Mode 2 (bypass engine, emit Tier 1 Markdown directly). Both are first-class. ChunkyCSV/JSONaut compose naturally with Mode 1. Streaming is at the engine boundary so huge inputs work. Supersedes earlier 'three producer paths' framing.
Ch 20 deliverable A: T1TMA — Tier-1 Term-Map Algebra (RML retargeted, 6 primitives, MTT-justified, lens-contracted)
Fresh-agent research deliverable A for Challenge 20. Recommends a YARRRML-shaped DSL retargeted from RDF to Tier-1 Notes. Six primitives: ITERATE, REFERENCE, TEMPLATE, BIND, JOIN, INVERT. Closed Tier-1 slot vocabulary (id, label, body.section, frontmatter.k, links.role, folder, aliases, tags, metadata.sssom-key). JSONata as expression sub-language; CSVW as tabular type profile. Macro Tree Transducer theory as completeness justification; Foster/Pierce lens semantics for the round-trippable subset. ~480 KB total bundle. Concrete NIST 800-53 r5 worked example, v0.1→v0.2 transpiler plan, adversarial cognitive-load check.
Ch 20 deliverable B: Boundary semantics — ref / resolve / bind / seal (Backpack-style holes/fills, Nix content addressing, sheaf-theoretic gluing)
Fresh-agent research deliverable B for Challenge 20. Operates at a different layer than Runs A and C: not transformation algebra, but boundary semantics. Four primitives: ref (typed reference, Selector + ContextHints), resolve (sandboxed effectful artifact retrieval producing Provenance), bind (artifact + LocalName → VaultDelta), seal (VaultDelta + Manifest → SealedImport). Identity = content-digest; names = presentation; versions = aliases. Append-only Merkle provenance. Object-capability-style sealed manifests. Backpack mixin holes/fills as the type-theoretic model; vaults-as-sheaves over a site of contexts as the categorical model. Three-layer syntax (vault root manifest, import declaration files, materialized notes). Theoretically rigorous; ranges from production-shippable to v3+ aspirational depending on adoption depth.
Ch 20 deliverable C: 5+4 primitive set — RML/YARRRML retargeted, FNML transforms, SSSOM/T filter→action overlay (s-t tgds + MTT + functorial migration justified)
Fresh-agent research deliverable C for Challenge 20. Recommends a hybrid grounded in RML/YARRRML retargeted from RDF triples to Tier-1 Note tuples. Five import primitives (Source, Term, Map, Join, Function) producing into four output sinks (path, frontmatter, body, wikilink). Theoretical justification: RML's algebraic core matches data-exchange theory's source-to-target tuple-generating dependencies; the 5 primitives correspond to MTT operation classes; bidirectionality as opt-in lens contract on the round-trippable subset. Concrete NIST 800-53 r5 OSCAL JSON example with TypeScript schema and YAML surface. Convergent with Run A on the substantive recommendation; differs in framing (Run A names INVERT as a sixth primitive; Run C makes bidirectionality an annotation flag).
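
As a rough illustration of the "primitives producing into four output sinks" idea summarized for deliverable C, the sketch below maps one source record into a note tuple with path, frontmatter, body, and wikilink sinks. It is not taken from the deliverable: the type names (`SourceRecord`, `NoteTuple`, `Term`, `NoteMap`), the `applyMap` function, the folder layout, and the sample record are all hypothetical, and only the Term and Map ideas are modeled (Source, Join, and Function are left out).

```typescript
// Hypothetical shapes: the sink names come from the summary above,
// everything else is an assumption made for illustration.
type SourceRecord = Record<string, string>;

// One output note, expressed as the four sinks.
interface NoteTuple {
  path: string;
  frontmatter: Record<string, string>;
  body: string;
  wikilinks: string[];
}

// Assumed meaning of "Term": a function from a source record to a value.
type Term<T> = (record: SourceRecord) => T;

// Assumed meaning of "Map": one Term (or set of Terms) per sink.
interface NoteMap {
  path: Term<string>;
  frontmatter: Record<string, Term<string>>;
  body: Term<string>;
  wikilinks: Term<string[]>;
}

// Evaluate every Term in the map against a single record.
function applyMap(map: NoteMap, record: SourceRecord): NoteTuple {
  const frontmatter: Record<string, string> = {};
  for (const key of Object.keys(map.frontmatter)) {
    frontmatter[key] = map.frontmatter[key](record);
  }
  return {
    path: map.path(record),
    frontmatter,
    body: map.body(record),
    wikilinks: map.wikilinks(record),
  };
}

// Illustrative map for a control-catalog row (field names are made up).
const controlMap: NoteMap = {
  path: (r) => `controls/${r.family}/${r.id}.md`,
  frontmatter: { id: (r) => r.id, label: (r) => r.title },
  body: (r) => r.statement,
  wikilinks: (r) =>
    r.related
      .split(";")
      .filter((s) => s.length > 0)
      .map((s) => `[[${s}]]`),
};

// Placeholder input row, not real catalog text.
const note = applyMap(controlMap, {
  id: "ac-2",
  family: "ac",
  title: "Account Management",
  statement: "Placeholder statement text.",
  related: "ac-3;ac-5",
});

console.log(note.path);
console.log(note.wikilinks.join(", "));
```

Keeping each sink as its own Term makes the mapping declarative: the same record can feed several sinks without the Terms knowing about one another, which is one plausible reading of why the summary separates primitives from sinks.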