On this page

DuckDB at the Core

SQL-first transformations and in-process analytics powered by DuckDB

DuckDB at the Core

ETLX is built around DuckDB as its execution engine. At its core, ETLX embraces a SQL-first philosophy, enabling powerful in-process analytics and transformations without requiring external compute engines or distributed systems.

DuckDB acts as the analytical backbone of the pipeline, executing transformations, validations, exports, and even orchestration-related logic using standard SQL.

🧠 Why DuckDB?

DuckDB is a modern analytical database designed for OLAP workloads, embedded directly into applications. This makes it a perfect fit for ETLX.

Key advantages:

Embedded & In-process – No external service to manage
Columnar execution – High performance for analytical queries
Rich SQL support – Window functions, CTEs, complex joins
Extensible – Load extensions (SQLite, Postgres, Excel, JSON, HTTP, Parquet, etc.)
Portable – Same SQL runs locally, in CI, or in production

ETLX leverages all of these features while keeping the workflow declarative and reproducible.

🔹 SQL-First Transformations

In ETLX, SQL is the primary transformation language.

This means:

Business logic is written in plain SQL
Transformations are self-documented
Pipelines are easier to review, audit, and version-control

Example:

  SELECT
  customer_id,
  SUM(amount) AS total_spent,
  COUNT(*) AS total_orders
FROM sales
GROUP BY customer_id;

No DSLs. No custom operators. Just SQL.

🔹 In-Process Analytics

Because DuckDB runs inside the ETLX process, analytics are:

Executed in-memory or on local disk
Free from network latency
Deterministic and easy to debug

This enables advanced use cases such as:

Ad-hoc analytics during ETL
Data quality checks
Profiling and statistics generation
Report aggregation
Metadata-driven validation

All without shipping data to an external engine.

🔹 One Engine, Multiple Data Sources

DuckDB allows ETLX to query multiple data sources using SQL:

Local files (CSV, Parquet, JSON, Excel)
SQLite databases
Postgres / MySQL (via extensions)
HTTP / S3-compatible storage

Example:

  SELECT *
FROM read_parquet('s3://bucket/data/*.parquet');

This enables federated analytics while keeping a single execution engine.

🔹 Foundation for ETLX Features

DuckDB powers almost every ETLX feature:

ETL / ELT transformations
DATA_QUALITY validations
MULTI_QUERIES aggregation
EXPORTS (CSV, Excel, templates)
LOGS persistence
SCRIPTS execution

By standardizing on DuckDB, ETLX ensures consistent behavior across all pipeline stages.

🎯 Design Philosophy

If it can be expressed in SQL, ETLX should execute it.

DuckDB enables ETLX to remain:

Minimal – fewer moving parts
Transparent – SQL is visible and inspectable
Performant – optimized analytical execution
Portable – runs anywhere

✅ Summary

DuckDB is the core execution engine of ETLX
Enables SQL-first, declarative pipelines
Provides in-process analytics with no external dependencies
Powers transformations, validations, exports, and observability

DuckDB is not just a dependency in ETLX — it is the foundation.

Edit this page

Last updated 05 Jan 2026, 15:03 -01 . history

Embedding in GO

Embedding ETLX in Go …

Multi-Engine Execution

Run ETLX pipelines across …

DuckDB at the Core

DuckDB at the Core link

🧠 Why DuckDB? link

🔹 SQL-First Transformations link

🔹 In-Process Analytics link

🔹 One Engine, Multiple Data Sources link

🔹 Foundation for ETLX Features link

🎯 Design Philosophy link

✅ Summary link

DuckDB at the Core

🧠 Why DuckDB?

🔹 SQL-First Transformations

🔹 In-Process Analytics

🔹 One Engine, Multiple Data Sources

🔹 Foundation for ETLX Features

🎯 Design Philosophy

✅ Summary