
Transform engine research


Should Crosswalker use an existing declarative transformation engine, or build a custom one?

| Engine | Good at | Why not for Crosswalker |
| --- | --- | --- |
| JSONata | JSON query + transformation | Not tabular-aware; overhead for simple CSV operations |
| Arquero | SQL-like column operations | No hierarchical output; no Obsidian integration |
| Apache Arrow | Columnar analytics at scale | Overengineered for under 100K rows; 200KB+ bundle |
| DanfoJS | Pandas-like for JS | ML/numerical focus; TensorFlow overhead |
| Polars | 10-30x faster than alternatives | No native JavaScript binding |
| dbt | Declarative SQL transforms | Warehouse-only; not applicable to files |
| OpenETL / ETL-Gun | Database I/O pipelines | Infrastructure overhead; wrong abstraction |
| n8n / Retool / Zapier | SaaS workflow automation | Not embeddable in a plugin |
| lodash/fp / Ramda | Functional composition | Too low-level; no declarative config format |
| csv-parse + csv-stringify | CSV stream transforms | No structured transform DSL |
| jq (JS ports) | JSON filtering | CLI-focused; awkward in browser |
| JSON Schema + JSON Patch | Schema validation + patching | Patching, not transformation |
| XSLT | XML transformation | XML-only concept (though the declarative model applies) |
| Great Expectations | Data validation | Validation, not transformation |
| Criteria | Custom engine | Best alternative (Arquero) |
| --- | --- | --- |
| Bundle size | ~2KB | ~180KB |
| Performance (1000 rows) | under 25ms | 30-40ms |
| Obsidian-native output | Built-in (WikiLinks, YAML, vault paths) | Not supported |
| Hierarchical folder generation | Native | Not supported |
| Per-framework transforms | Native (tag aggregation, forward-fill, ID normalization) | Requires custom wrapper |
| Learning curve | None (our code) | Medium (their API) |
| Breaking updates risk | Zero | Library-dependent |
Engines worth revisiting later as optional integrations:

  • Arquero — for cross-row aggregation on huge datasets (>100K rows)
  • JSONata — for complex JSON expression transforms (opt-in advanced feature)
  • Apache Arrow — for massive datasets (>1M rows, unlikely for framework data)

These would plug into the Transform stage of the pipeline as optional providers, not replacements.
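One way to keep those engines optional is a small provider registry in the Transform stage. The sketch below is an assumption about the shape such an interface could take — `TransformProvider`, `TransformRegistry`, and the `Row` shape are all hypothetical names, not the actual Crosswalker API:

```typescript
// Hypothetical provider interface for the Transform stage.
// Optional engines (Arquero, JSONata) would register alongside
// the built-in engine rather than replace it.
export type Row = Record<string, string>;

export interface TransformProvider {
  /** Id used in the declarative config, e.g. "builtin" or "arquero". */
  id: string;
  /** True if this provider implements the given operation name. */
  supports(op: string): boolean;
  /** Applies the operation to the row set, returning new rows. */
  apply(op: string, rows: Row[], args?: Record<string, unknown>): Row[];
}

export class TransformRegistry {
  private providers: TransformProvider[] = [];

  register(p: TransformProvider): void {
    this.providers.push(p);
  }

  run(op: string, rows: Row[], args?: Record<string, unknown>): Row[] {
    // First registered provider that supports the op wins, so the
    // built-in engine stays the default when registered first.
    const provider = this.providers.find((p) => p.supports(op));
    if (!provider) throw new Error(`No provider for operation "${op}"`);
    return provider.apply(op, rows, args);
  }
}
```

Registering the built-in engine first keeps it the default; an opt-in Arquero provider would only be consulted for operations the built-in engine declines.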

ChunkyCSV does structured-to-tabular conversion with search and flatten operations — specific transform operations that the custom engine should handle natively, and whose patterns should inform the engine's design. See the earlier log.
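To make "flatten" concrete: a minimal sketch of a structured-to-tabular flatten, where nested JSON becomes dot-path columns that the tabular pipeline can consume. The path syntax and behavior here are assumptions in the ChunkyCSV style, not its actual implementation:

```typescript
// Flattens nested JSON into a single-level record keyed by dot paths
// (objects) and index paths (arrays), e.g. { a: { b: 1 } } -> { "a.b": "1" }.
// All leaf values are stringified, since the downstream pipeline is tabular.
function flatten(
  value: unknown,
  prefix = "",
  out: Record<string, string> = {}
): Record<string, string> {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const [key, v] of Object.entries(value as Record<string, unknown>)) {
      flatten(v, prefix ? `${prefix}.${key}` : key, out);
    }
  } else if (Array.isArray(value)) {
    value.forEach((v, i) => flatten(v, `${prefix}[${i}]`, out));
  } else {
    out[prefix] = String(value); // primitive leaf
  }
  return out;
}
```

For example, `flatten({ a: { b: 1 }, c: [2, 3] })` yields `{ "a.b": "1", "c[0]": "2", "c[1]": "3" }`.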

25 transform types across 4 categories, implementable in 3-5 days:

  • String: trim, lowercase, uppercase, titlecase, replace, regex_extract, prefix, suffix, template
  • Array: split, join, unique, filter, first, last, map
  • Type: to_number, to_boolean, to_date, to_tags, to_wikilinks
  • Conditional: if_empty, if_matches, coalesce, lookup
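These operations compose into per-column pipelines. A minimal sketch of a few of them, using a discriminated union as the declarative config format (signatures and option names are assumptions):

```typescript
// A handful of the listed ops, as a tagged-union config. Each transform
// is pure data, so pipelines can be serialized in YAML/JSON mappings.
type Transform =
  | { op: "trim" }
  | { op: "lowercase" }
  | { op: "prefix"; value: string }
  | { op: "split"; separator: string }
  | { op: "to_number" }
  | { op: "if_empty"; fallback: string };

function applyTransform(input: unknown, t: Transform): unknown {
  switch (t.op) {
    case "trim":
      return String(input).trim();
    case "lowercase":
      return String(input).toLowerCase();
    case "prefix":
      return t.value + String(input);
    case "split":
      return String(input).split(t.separator);
    case "to_number":
      return Number(input);
    case "if_empty":
      return input === "" || input == null ? t.fallback : input;
  }
}

// A column's transforms run in declared order, left to right.
function applyPipeline(input: unknown, pipeline: Transform[]): unknown {
  return pipeline.reduce((acc, t) => applyTransform(acc, t), input);
}
```

For example, the pipeline `[{ op: "trim" }, { op: "lowercase" }, { op: "prefix", value: "#" }]` turns `"  Alpha  "` into `"#alpha"` — exactly the shape of a tag-generation column mapping.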

Plus the per-framework transforms (hierarchical forward-fill, tag aggregation, ID normalization, preamble extraction) as built-in operations.
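Of those, hierarchical forward-fill is the least obvious, so a sketch may help: in framework spreadsheets, parent columns (domain, category) are often left blank on continuation rows. The semantics below — fill each blank cell from the nearest non-empty value above, and reset remembered child values when a parent changes — are an assumed interpretation, not the final spec:

```typescript
// Hierarchical forward-fill over an ordered list of hierarchy columns,
// from outermost parent to innermost child. A new value at level i
// invalidates remembered values for all deeper levels.
function forwardFill(
  rows: Record<string, string>[],
  hierarchy: string[]
): Record<string, string>[] {
  const last: Record<string, string> = {};
  return rows.map((row) => {
    const filled = { ...row };
    for (let i = 0; i < hierarchy.length; i++) {
      const col = hierarchy[i];
      if (filled[col] && filled[col].trim() !== "") {
        last[col] = filled[col];
        // Parent changed: forget stale child values.
        for (const child of hierarchy.slice(i + 1)) delete last[child];
      } else if (last[col]) {
        filled[col] = last[col];
      }
    }
    return filled;
  });
}
```

So a row list like `[{ Domain: "Govern", Control: "GV-1" }, { Domain: "", Control: "GV-2" }]` filled over `["Domain", "Control"]` gives the second row `Domain: "Govern"` while keeping its own control ID.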