# Transform engine research
## The question

Should Crosswalker use an existing declarative transformation engine, or build a custom one?
## Engines evaluated

| Engine | Good at | Why not for Crosswalker |
|---|---|---|
| JSONata | JSON query + transformation | Not tabular-aware; overhead for simple CSV operations |
| Arquero | SQL-like column operations | No hierarchical output; no Obsidian integration |
| Apache Arrow | Columnar analytics at scale | Overengineered for under 100K rows; 200KB+ bundle |
| DanfoJS | Pandas-like for JS | ML/numerical focus; TensorFlow overhead |
| Polars | 10-30x faster than alternatives | No native JavaScript binding |
| dbt | Declarative SQL transforms | Warehouse-only; not applicable to files |
| OpenETL / ETL-Gun | Database I/O pipelines | Infrastructure overhead; wrong abstraction |
| n8n / Retool / Zapier | SaaS workflow automation | Not embeddable in a plugin |
| lodash/fp / Ramda | Functional composition | Too low-level; no declarative config format |
| csv-parse + csv-stringify | CSV stream transforms | No structured transform DSL |
| jq (JS ports) | JSON filtering | CLI-focused; awkward in browser |
| JSON Schema + JSON Patch | Schema validation + patching | Patching, not transformation |
| XSLT | XML transformation | XML-only concept (though the declarative model applies) |
| Great Expectations | Data validation | Validation, not transformation |
## Decision: Build custom

| Criterion | Custom engine | Best alternative (Arquero) |
|---|---|---|
| Bundle size | ~2KB | ~180KB |
| Performance (1000 rows) | under 25ms | 30-40ms |
| Obsidian-native output | Built-in (WikiLinks, YAML, vault paths) | Not supported |
| Hierarchical folder generation | Native | Not supported |
| Per-framework transforms | Native (tag aggregation, forward-fill, ID normalization) | Requires custom wrapper |
| Learning curve | None (our code) | Medium (their API) |
| Breaking updates risk | Zero | Library-dependent |
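To make the trade-off concrete, here is a minimal sketch of what a tiny custom engine looks like: a discriminated union of transform steps plus an interpreter loop. The type and function names are illustrative assumptions, not Crosswalker's actual API.

```typescript
// Hypothetical declarative transform spec (names are illustrative).
type TransformStep =
  | { op: "trim" }
  | { op: "lowercase" }
  | { op: "prefix"; value: string }
  | { op: "split"; separator: string };

// Apply each step to a cell value in order. Array-valued ops end the
// string pipeline in this sketch; the real engine would keep chaining.
function applyTransforms(
  input: string,
  steps: TransformStep[],
): string | string[] {
  let value: string | string[] = input;
  for (const step of steps) {
    if (typeof value !== "string") break;
    switch (step.op) {
      case "trim": value = value.trim(); break;
      case "lowercase": value = value.toLowerCase(); break;
      case "prefix": value = step.value + value; break;
      case "split": value = value.split(step.separator); break;
    }
  }
  return value;
}
```

An interpreter of this shape is a few kilobytes, has no dependencies, and can grow Obsidian-specific ops that no general-purpose library ships.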
## Escape hatches (future, optional)

- Arquero — for cross-row aggregation on huge datasets (>100K rows)
- JSONata — for complex JSON expression transforms (opt-in advanced feature)
- Apache Arrow — for massive datasets (>1M rows, unlikely for framework data)
These would plug into the Transform stage of the pipeline as optional providers, not replacements.
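One plausible shape for such a provider hook is shown below; the interface and selection logic are a sketch under assumed names, not a committed design.

```typescript
// Hypothetical provider hook for the Transform stage (names illustrative).
interface TransformProvider {
  name: string;
  // True when this provider should take over, e.g. for row counts
  // beyond what the built-in engine targets.
  canHandle(rowCount: number): boolean;
  transform(rows: Record<string, string>[]): Record<string, string>[];
}

// The built-in engine is the unconditional fallback.
class BuiltinProvider implements TransformProvider {
  name = "builtin";
  canHandle() { return true; }
  transform(rows: Record<string, string>[]) { return rows; }
}

// Pick the first provider that opts in, falling back to the built-in one.
function selectProvider(
  providers: TransformProvider[],
  rowCount: number,
): TransformProvider {
  return providers.find((p) => p.canHandle(rowCount)) ?? new BuiltinProvider();
}
```

Under this shape, an Arquero-backed provider would simply register with a `canHandle` threshold around 100K rows and never load for typical framework data.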
## Connection to ChunkyCSV

ChunkyCSV handles structured-to-tabular conversion with search and flatten operations. These are exactly the kind of transform operations the custom engine should support natively, so the engine's design should be informed by ChunkyCSV's patterns. See the earlier log.
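As a rough illustration of the flatten pattern, the sketch below turns a nested record into a dot-keyed flat row. This is an assumption about the general technique, not ChunkyCSV's actual code.

```typescript
// Illustrative flatten: nested object -> dot-keyed flat record, the
// structured-to-tabular step described above (not ChunkyCSV's real code).
function flatten(
  obj: Record<string, unknown>,
  prefix = "",
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into nested objects, extending the dotted path.
      Object.assign(out, flatten(value as Record<string, unknown>, path));
    } else {
      // Arrays collapse to a delimited string; scalars stringify directly.
      out[path] = Array.isArray(value) ? value.join(";") : String(value);
    }
  }
  return out;
}
```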
## Implementation scope

25 transform types across 4 categories, implementable in 3-5 days:
- String: trim, lowercase, uppercase, titlecase, replace, regex_extract, prefix, suffix, template
- Array: split, join, unique, filter, first, last, map
- Type: to_number, to_boolean, to_date, to_tags, to_wikilinks
- Conditional: if_empty, if_matches, coalesce, lookup
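Of these, `to_wikilinks` is the most Obsidian-specific; a minimal sketch under assumed option names (the real transform's options may differ):

```typescript
// Sketch of a to_wikilinks transform: wrap each value in Obsidian
// [[WikiLink]] syntax, optionally under a vault folder (illustrative).
function toWikilinks(values: string[], folder = ""): string[] {
  return values.map((v) =>
    folder ? `[[${folder}/${v.trim()}]]` : `[[${v.trim()}]]`,
  );
}
```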
Plus the per-framework transforms (hierarchical forward-fill, tag aggregation, ID normalization, preamble extraction) as built-in operations.
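For one of the per-framework operations, hierarchical forward-fill can be sketched as follows: empty cells inherit the last non-empty value in the same column, so sparse "merged cell" spreadsheet exports become fully populated rows. This is an illustration of the technique, not the shipped implementation.

```typescript
// Hierarchical forward-fill sketch: blank cells in `column` inherit the
// most recent non-blank value above them (illustrative implementation).
function forwardFill(rows: string[][], column: number): string[][] {
  let last = "";
  return rows.map((row) => {
    const copy = [...row];
    if (copy[column].trim() === "") copy[column] = last;
    else last = copy[column];
    return copy;
  });
}
```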