ETLX Workflow Visualization
How Central Set automatically generates visual workflows from ETLX models using depends_on and query inference.
ETLX is not just for executing pipelines; it also enables automatic workflow visualization.
Every time an ETLX Markdown model is processed, Central Set:
- parses the structure
- detects dependencies
- generates a graph (nodes + edges)
- renders a Mermaid workflow diagram
Concept
Each ETLX model is composed of:
- Level 1: logical groups (e.g. EXTRACT_LOAD, TRANSFORM, QUALITY_CHECK)
- Level 2: actual nodes (e.g. TRIP_DATA, ZONES, MostPopularRoutes)
These are transformed into a graph structure:
- Level 1 → Subgraph
- Level 2 → Node
- Dependencies → Edges
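The mapping above can be sketched in a few lines of Python. This is only an illustration: the `Graph` class, the `build_graph` function, and the nested-dict shape of `model` are assumptions made for this example, not Central Set's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Level 1 name -> list of Level 2 names (one subgraph per Level 1 group)
    subgraphs: dict[str, list[str]] = field(default_factory=dict)
    # (source, target) pairs of qualified "LEVEL1.LEVEL2" names
    edges: list[tuple[str, str]] = field(default_factory=list)


def build_graph(model: dict[str, dict[str, dict]]) -> Graph:
    """Map a hypothetical {LEVEL1: {LEVEL2: settings}} dict onto a graph."""
    graph = Graph()
    for level1, nodes in model.items():
        graph.subgraphs[level1] = list(nodes)
        for level2, settings in nodes.items():
            for dep in settings.get("depends_on", []):
                graph.edges.append((dep, f"{level1}.{level2}"))
    return graph


model = {
    "EXTRACT_LOAD": {"TRIP_DATA": {}, "ZONES": {}},
    "TRANSFORM": {
        "MostPopularRoutes": {
            "depends_on": ["EXTRACT_LOAD.TRIP_DATA", "EXTRACT_LOAD.ZONES"]
        }
    },
}
graph = build_graph(model)
```

Keeping nodes grouped by their Level 1 name makes the later conversion to Mermaid subgraphs a direct loop over `graph.subgraphs`.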
Defining Dependencies
Explicit Dependencies (depends_on)
You can explicitly define dependencies using the depends_on key.
depends_on:
- EXTRACT_LOAD.TRIP_DATA
- EXTRACT_LOAD.ZONES
Rules
- Must be a list (array)
- Format: LEVEL1.LEVEL2
Example:
TRANSFORM.MostPopularRoutes depends_on:
- EXTRACT_LOAD.TRIP_DATA
- EXTRACT_LOAD.ZONES
This generates edges:
TRIP_DATA → MostPopularRoutes
ZONES → MostPopularRoutes
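As a rough sketch of how the two rules could be checked and turned into edges, consider the Python function below. The name `parse_depends_on` and its error messages are hypothetical. How Central Set itself reports an invalid depends_on value is not described here.

```python
def parse_depends_on(target: str, depends_on: object) -> list[tuple[str, str]]:
    """Return (source, target) edges after checking the two rules above."""
    # Rule 1: the value must be a list
    if not isinstance(depends_on, list):
        raise TypeError("depends_on must be a list")
    edges = []
    for ref in depends_on:
        # Rule 2: each entry must look like LEVEL1.LEVEL2
        parts = ref.split(".") if isinstance(ref, str) else []
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"expected LEVEL1.LEVEL2, got {ref!r}")
        edges.append((ref, target))
    return edges


edges = parse_depends_on(
    "TRANSFORM.MostPopularRoutes",
    ["EXTRACT_LOAD.TRIP_DATA", "EXTRACT_LOAD.ZONES"],
)
```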
Inferred Dependencies (Automatic)
If depends_on is not defined, Central Set will infer dependencies automatically.
How it works
- The system scans queries (SQL, ETLX steps, etc.)
- If a query references a Level 2 name, it assumes a dependency
Example
SELECT *
FROM TRIP_DATA
Central Set infers:
TRANSFORM.X depends_on EXTRACT_LOAD.TRIP_DATA
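One plausible way to implement this kind of scan is a whole-word search for each known Level 2 name. The sketch below assumes case-sensitive matching and a hypothetical `infer_dependencies` helper. The matching rules Central Set actually applies may differ.

```python
import re


def infer_dependencies(query: str, known_nodes: dict[str, str]) -> list[str]:
    """Return qualified names of known Level 2 nodes referenced in the query.

    known_nodes maps each Level 2 name to its Level 1 group.
    """
    found = []
    for level2, level1 in known_nodes.items():
        # Whole-word match, so TRIP_DATA does not match TRIP_DATA_ARCHIVE
        if re.search(rf"(?<!\w){re.escape(level2)}(?!\w)", query):
            found.append(f"{level1}.{level2}")
    return found


known = {"TRIP_DATA": "EXTRACT_LOAD", "ZONES": "EXTRACT_LOAD"}
inferred = infer_dependencies("SELECT *\nFROM TRIP_DATA", known)
```

A name-based scan like this cannot tell a table reference from the same word inside a string literal or comment, which is one reason to prefer explicit depends_on when the inferred edges look wrong.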
Resolution Logic
Dependency resolution follows:
- Explicit depends_on (highest priority)
- Query-based inference
- No dependency → standalone node
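The priority order above amounts to a short fallback chain. In this sketch the function name `resolve_dependencies` and the `settings` dict are illustrative. It also assumes that a depends_on key that is present but empty still counts as explicit, which the text above does not specify.

```python
def resolve_dependencies(settings: dict, inferred: list[str]) -> list[str]:
    """Pick the dependency list for one node, following the priority order."""
    explicit = settings.get("depends_on")
    if explicit is not None:
        # 1. Explicit depends_on wins, even when queries suggest other edges
        return list(explicit)
    # 2. Otherwise fall back to query-based inference.
    # 3. An empty result means the node is standalone.
    return list(inferred)


explicit_case = resolve_dependencies(
    {"depends_on": ["EXTRACT_LOAD.ZONES"]}, ["EXTRACT_LOAD.TRIP_DATA"]
)
inferred_case = resolve_dependencies({}, ["EXTRACT_LOAD.TRIP_DATA"])
standalone_case = resolve_dependencies({}, [])
```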
Graph Generation
From the model, Central Set generates:
Nodes
- Each LEVEL2 becomes a node
- Grouped by LEVEL1 into subgraphs
Edges
- Created from depends_on or inferred relationships
Mermaid Workflow Generation
The graph is converted into a Mermaid flowchart.
Example:
```mermaid
---
config:
look: handDrawn
theme: neutral
---
flowchart LR
%% NODES
subgraph extract_load["EXTRACT_LOAD (ETL)"]
extract_load_trip_data["TRIP_DATA"]
extract_load_zones["ZONES"]
end
subgraph transform["TRANSFORM (ETL)"]
transform_mostpopularroutes["MostPopularRoutes"]
end
subgraph quality_check["QUALITY_CHECK (DATA_QUALITY)"]
quality_check_rule0001["Rule0001"]
quality_check_rule0002["Rule0002"]
end
%% EDGES (generated from depends_on keys)
extract_load_trip_data --> transform_mostpopularroutes
extract_load_zones --> transform_mostpopularroutes
extract_load_trip_data --> quality_check_rule0001
extract_load_trip_data --> quality_check_rule0002
```
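A minimal sketch of how such a flowchart could be emitted from subgraphs and edges is shown below. The `to_mermaid` function is hypothetical. It derives node IDs by lowercasing LEVEL1_LEVEL2, as the example diagram appears to do. It omits the config header and the type suffix in the subgraph labels, such as "(ETL)". It also assumes names contain only characters that are valid in Mermaid IDs.

```python
def to_mermaid(subgraphs: dict[str, list[str]], edges: list[tuple[str, str]]) -> str:
    """Render subgraphs and (source, target) edges as Mermaid flowchart text."""

    def node_id(qualified: str) -> str:
        # "EXTRACT_LOAD.TRIP_DATA" -> "extract_load_trip_data"
        return qualified.replace(".", "_").lower()

    lines = ["flowchart LR"]
    for level1, nodes in subgraphs.items():
        lines.append(f'subgraph {level1.lower()}["{level1}"]')
        for level2 in nodes:
            nid = node_id(f"{level1}.{level2}")
            lines.append(f'{nid}["{level2}"]')
        lines.append("end")
    for source, target in edges:
        lines.append(f"{node_id(source)} --> {node_id(target)}")
    return "\n".join(lines)


diagram = to_mermaid(
    {"EXTRACT_LOAD": ["TRIP_DATA"], "TRANSFORM": ["MostPopularRoutes"]},
    [("EXTRACT_LOAD.TRIP_DATA", "TRANSFORM.MostPopularRoutes")],
)
print(diagram)
```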
Visual Output
The generated Mermaid diagram is rendered as a workflow graph:
- Subgraphs represent ETL stages
- Nodes represent datasets or rules
- Edges represent dependencies
This allows you to instantly understand:
- data flow
- transformation steps
- validation rules
- pipeline structure
Real Example Breakdown
From the SQLite example:
Extract Layer
- TRIP_DATA
- ZONES
Transform Layer
- MostPopularRoutes
Data Quality
- Rule0001
- Rule0002
Relationships
TRIP_DATA → MostPopularRoutes
ZONES → MostPopularRoutes
TRIP_DATA → Rule0001
TRIP_DATA → Rule0002
Why This Matters
This approach provides:
Instant Visualization
No need to manually draw diagrams.
Always Up-to-Date
The diagram is generated directly from the model.
Debugging Power
Quickly identify:
- missing dependencies
- circular flows
- unused nodes
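The same graph can also be checked programmatically rather than by eye. The following is a sketch, not a Central Set feature: `diagnose` is a hypothetical helper that treats "missing" as an edge endpoint that is not a known node, "unused" as a node with no edges at all, and finds circular flows with Kahn's algorithm.

```python
def diagnose(nodes: set[str], edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Report missing references, isolated nodes, and nodes stuck behind a cycle."""
    connected = {name for edge in edges for name in edge}
    missing = sorted(connected - nodes)
    unused = sorted(nodes - connected)

    # Kahn's algorithm: repeatedly visit nodes whose dependencies are all
    # visited. Nodes never visited lie on a cycle or downstream of one.
    remaining = {name: 0 for name in nodes | connected}
    for _, target in edges:
        remaining[target] += 1
    ready = [name for name, count in remaining.items() if count == 0]
    visited = set()
    while ready:
        current = ready.pop()
        visited.add(current)
        for source, target in edges:
            if source == current:
                remaining[target] -= 1
                if remaining[target] == 0:
                    ready.append(target)
    blocked = sorted(set(remaining) - visited)
    return {"missing": missing, "unused": unused, "in_or_after_cycle": blocked}


report = diagnose({"A", "B", "C", "D"}, [("A", "B"), ("B", "C"), ("C", "B")])
```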
Documentation for Free
Your ETLX model becomes:
- execution logic
- documentation
- architecture diagram
Best Practices
Use explicit depends_on when:
- pipelines are complex
- dependencies are not obvious from queries
- you want full control
Rely on inference when:
- queries are simple
- naming is consistent
- you are prototyping rapidly
Naming Tip
Keep names consistent:
TRIP_DATA
ZONES
CUSTOMERS
This improves dependency detection accuracy.
Future Possibilities
This graph structure can also power:
- pipeline execution planners
- dependency validation
- impact analysis
- lineage tracking
- visual editors
Summary
Central Set automatically transforms ETLX models into visual workflows by:
- parsing model structure
- detecting dependencies (depends_on or inferred)
- generating nodes and edges
- rendering Mermaid diagrams
This makes ETLX:
- self-documenting
- visual by default
- easier to debug
- easier to understand
ETLX is not just a pipeline definition; it is a living, visual data workflow system.
Last updated 17 2026, 15:33 -01 .