DuckDB at the Core
SQL-first transformations and in-process analytics powered by DuckDB
ETLX is built around DuckDB as its execution engine and embraces a SQL-first philosophy, enabling powerful in-process analytics and transformations without requiring external compute engines or distributed systems.
DuckDB acts as the analytical backbone of the pipeline, executing transformations, validations, exports, and even orchestration-related logic using standard SQL.
🧠 Why DuckDB?
DuckDB is a modern analytical database designed for OLAP workloads, embedded directly into applications. This makes it a perfect fit for ETLX.
Key advantages:
- Embedded & In-process – No external service to manage
- Columnar execution – High performance for analytical queries
- Rich SQL support – Window functions, CTEs, complex joins
- Extensible – Load extensions (SQLite, Postgres, Excel, JSON, HTTP, Parquet, etc.)
- Portable – Same SQL runs locally, in CI, or in production
ETLX leverages all of these features while keeping the workflow declarative and reproducible.
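As an illustration of the SQL surface this makes available, a CTE and a window function can be combined in a single query. A minimal sketch, where the `sales` table is a hypothetical example and not part of ETLX itself:

```sql
-- Find each customer's largest order using a CTE and a window function
WITH ranked AS (
    SELECT
        customer_id,
        amount,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rnk
    FROM sales
)
SELECT customer_id, amount
FROM ranked
WHERE rnk = 1;
```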
🔹 SQL-First Transformations
In ETLX, SQL is the primary transformation language.
This means:
- Business logic is written in plain SQL
- Transformations are self-documented
- Pipelines are easier to review, audit, and version-control
Example:
```sql
SELECT
    customer_id,
    SUM(amount) AS total_spent,
    COUNT(*) AS total_orders
FROM sales
GROUP BY customer_id;
```
No DSLs. No custom operators. Just SQL.
🔹 In-Process Analytics
Because DuckDB runs inside the ETLX process, analytics are:
- Executed in-memory or on local disk
- Free from network latency
- Deterministic and easy to debug
This enables advanced use cases such as:
- Ad-hoc analytics during ETL
- Data quality checks
- Profiling and statistics generation
- Report aggregation
- Metadata-driven validation
All without shipping data to an external engine.
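Data quality checks of this kind reduce to ordinary SQL predicates. A minimal sketch, assuming a hypothetical `sales` table whose `customer_id` column must never be null:

```sql
-- A validation query: one row per rule, with a count of violations
SELECT
    'customer_id_not_null' AS rule,
    COUNT(*) FILTER (WHERE customer_id IS NULL) AS violations
FROM sales;
```

A result of zero violations means the rule passes; any other value can fail the pipeline stage.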
🔹 One Engine, Multiple Data Sources
DuckDB allows ETLX to query multiple data sources using SQL:
- Local files (CSV, Parquet, JSON, Excel)
- SQLite databases
- Postgres / MySQL (via extensions)
- HTTP / S3-compatible storage
Example:
```sql
SELECT *
FROM read_parquet('s3://bucket/data/*.parquet');
```
This enables federated analytics while keeping a single execution engine.
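A single query can also join sources directly. A sketch using DuckDB's `sqlite` extension, where the file names (`app.db`, `totals.csv`) and table columns are placeholders:

```sql
-- Attach a SQLite database and join it against a local CSV file
INSTALL sqlite;
LOAD sqlite;
ATTACH 'app.db' AS app (TYPE sqlite);

SELECT c.name, t.total
FROM app.customers AS c
JOIN read_csv_auto('totals.csv') AS t
  ON t.customer_id = c.id;
```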
🔹 Foundation for ETLX Features
DuckDB powers almost every ETLX feature:
- ETL / ELT transformations
- DATA_QUALITY validations
- MULTI_QUERIES aggregation
- EXPORTS (CSV, Excel, templates)
- LOGS persistence
- SCRIPTS execution
By standardizing on DuckDB, ETLX ensures consistent behavior across all pipeline stages.
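The EXPORTS stage, for instance, maps naturally onto DuckDB's `COPY` statement. A sketch, where the query and output file name are placeholders:

```sql
-- Export a query result to a CSV file with a header row
COPY (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM sales
    GROUP BY customer_id
) TO 'totals.csv' (HEADER, DELIMITER ',');
```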
🎯 Design Philosophy
If it can be expressed in SQL, ETLX should execute it.
DuckDB enables ETLX to remain:
- Minimal – fewer moving parts
- Transparent – SQL is visible and inspectable
- Performant – optimized analytical execution
- Portable – runs anywhere
✅ Summary
- DuckDB is the core execution engine of ETLX
- Enables SQL-first, declarative pipelines
- Provides in-process analytics with no external dependencies
- Powers transformations, validations, exports, and observability
DuckDB is not just a dependency in ETLX — it is the foundation.
Last updated 05 Jan 2026, 15:03 -01.