Overview

Central Set uses ETLX as its data integration and transformation engine.

When used as recommended, with DuckDB as the query engine, ETLX lets Central Set integrate data from a broad ecosystem of sources, normalize and transform it in a secure, reproducible way, and expose it through standardized consumption protocols such as Apache Arrow Flight and OData v4.

The result is a unified, secure, analytics-ready data layer that can sit in front of operational systems, data lakes, files, APIs, and cloud platforms.


ETLX + DuckDB: The Integration Core

ETLX is designed around declarative, SQL-first data integration, using DuckDB as its execution engine.

DuckDB provides:

  • High-performance analytical execution
  • Native support for Apache Arrow
  • A rich extension ecosystem
  • Embedded, serverless-friendly deployment

ETLX builds on top of this by:

  • Managing connections and credentials securely
  • Orchestrating ingestion, transformation, and exposure
  • Making integrations reproducible and environment-safe

🔗 ETLX: https://github.com/realdatadriven/etlx
🔗 DuckDB: https://duckdb.org
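
As a sketch of this division of labor, an ETLX step might execute DuckDB SQL like the following. The connection string, schema, and table names are illustrative only:

```sql
-- Attach an operational source, then materialize a cleaned, analytics-ready table.
-- ETLX would manage the connection details and run steps like these reproducibly.
ATTACH 'dbname=appdb host=db.internal user=readonly' AS src (TYPE postgres);

CREATE OR REPLACE TABLE daily_orders AS
SELECT
    order_date,
    count(*)    AS orders,
    sum(amount) AS revenue
FROM src.public.orders
GROUP BY order_date;
```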


Supported Data Sources

Through DuckDB core extensions and community extensions, Central Set can integrate data from a wide and growing range of sources.

File-Based Sources

Native or extension-based support for:

  • Parquet
  • CSV
  • JSON
  • Avro
  • Spreadsheets (Excel / Sheets-like formats)
  • Vortex

🔗 DuckDB File Formats:
https://duckdb.org/docs/data/overview
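
For example, DuckDB can query these formats directly with table functions; the file paths below are placeholders:

```sql
-- Parquet is supported natively
SELECT * FROM read_parquet('data/events.parquet');

-- CSV with automatic schema and type inference
SELECT * FROM read_csv_auto('data/customers.csv');

-- JSON (regular or newline-delimited), via the json extension
SELECT * FROM read_json_auto('data/orders.json');
```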


Relational Databases

Supported via native connectors and extensions:

  • PostgreSQL
  • MySQL
  • SQLite
  • ODBC (planned / evolving)

🔗 DuckDB PostgreSQL Extension:
https://duckdb.org/docs/extensions/postgres

🔗 DuckDB MySQL Extension:
https://duckdb.org/docs/extensions/mysql

🔗 DuckDB SQLite Scanner:
https://duckdb.org/docs/extensions/sqlite
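
A hypothetical attach-and-query session (database names, hosts, and credentials are placeholders):

```sql
-- PostgreSQL via the postgres extension
INSTALL postgres;
LOAD postgres;
ATTACH 'dbname=appdb host=db.internal user=readonly' AS pg (TYPE postgres);
SELECT * FROM pg.public.orders LIMIT 10;

-- SQLite files attach the same way
ATTACH 'local.db' AS sq (TYPE sqlite);
SELECT count(*) FROM sq.customers;
```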


Open Table & Lakehouse Formats

Modern analytics and lakehouse integrations:

  • Apache Iceberg
  • Delta Lake
  • Unity Catalog

🔗 DuckDB Iceberg Extension:
https://duckdb.org/docs/extensions/iceberg

🔗 DuckDB Delta Extension:
https://duckdb.org/docs/extensions/delta
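
As a sketch, DuckDB exposes these formats through scan functions; the table locations below are placeholders:

```sql
INSTALL iceberg;
LOAD iceberg;
SELECT count(*) FROM iceberg_scan('s3://lake/warehouse/db/events');

INSTALL delta;
LOAD delta;
SELECT * FROM delta_scan('s3://lake/tables/orders') LIMIT 10;
```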


Arrow, Flight & ADBC Ecosystem

Deep integration with the Arrow ecosystem:

  • Apache Arrow
  • Apache Arrow Flight
  • ADBC Drivers

These are foundational for how Central Set exposes data efficiently and safely.

🔗 Apache Arrow:
https://arrow.apache.org

🔗 Arrow Flight:
https://arrow.apache.org/docs/format/Flight.html

🔗 ADBC:
https://arrow.apache.org/adbc/


Community & Advanced Integrations

Through DuckDB community extensions and ETLX adapters, the range of integrable sources keeps expanding:

  • Web APIs
  • OData
  • BigQuery
  • Snowflake
  • Google Sheets
  • Microsoft SQL Server
  • SSH-based sources
  • Web scraping
  • XML / HTML
  • Custom connectors

This allows Central Set to act as a data unification layer, even across heterogeneous, non-traditional sources.

🔗 DuckDB Community Extensions:
https://duckdb.org/docs/extensions/overview


Secure Credential Management

All integrations are designed with security-first principles.

ETLX supports:

  • Environment variable injection (@ENV_VAR_NAME)
  • DuckDB CREATE SECRET syntax
  • Token- and key-based authentication
  • No hardcoded credentials in configs or SQL
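
A minimal sketch of the pattern, assuming ETLX resolves the @-prefixed placeholders from environment variables before execution (secret and variable names are illustrative):

```sql
-- DuckDB secret for S3 access; no literal credentials appear in the config
CREATE SECRET lake_s3 (
    TYPE s3,
    KEY_ID '@AWS_ACCESS_KEY_ID',
    SECRET '@AWS_SECRET_ACCESS_KEY',
    REGION 'us-east-1'
);
```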

This makes configurations safe to:

  • Commit to version control
  • Share across environments
  • Deploy in containerized or cloud setups

Transform Once, Expose Anywhere

Once data is integrated and transformed, Central Set can expose it in multiple standardized ways.

Apache Arrow Flight

  • High-performance, columnar transport
  • Language-agnostic (Python, Go, Java, Rust, etc.)
  • Ideal for analytics, ML, and large-scale consumers
  • Secure, token-authenticated access

Arrow Flight is the primary and recommended exposure protocol.


OData v4

  • REST-based, standards-compliant API
  • Ideal for BI tools, dashboards, and lightweight consumers
  • Supports filtering, projection, and pagination
  • Best suited for smaller or interactive workloads
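
For instance, an OData v4 consumer might issue a request like the following (the service path and entity set are hypothetical):

```
GET /odata/v4/Orders?$filter=status eq 'open'&$select=id,customer,amount&$top=50
```

$filter, $select, and $top are standard OData v4 query options covering filtering, projection, and pagination.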

Access Control & Data Scoping

All exposed data respects Central Set's access rules:

  • Authorization header is required
  • Access tokens are created in:
    • Admin → Access Keys
  • Tokens are scoped to users
  • Users must have access to the underlying resources
  • Row-level access rules, when defined, are enforced

This allows Central Set to:

  • Securely expose shared datasets
  • Scope data per tenant, team, or user
  • Reuse the same integrations safely across consumers

Why This Matters

By combining:

  • ETLX for integration and orchestration
  • DuckDB for execution
  • Arrow Flight and OData for exposure
  • Strong access control and security

Central Set becomes a single, unified gateway between raw data sources and final consumers, without forcing data duplication, fragile pipelines, or vendor lock-in.


👉 For hands-on examples, configuration files, and real-world setups, see the documentation site:
https://realdatadriven.github.io/central-set-go/

Last updated 29 Jan 2026, 10:02 -01
