Skip to content

ml-labs

Concepts

sunkusun9/ml-labs

Concepts¶

This section explains the core ideas behind ml-labs — how it models ML workflows, manages state, and moves data through a pipeline.

Architecture ¶

ml-labs is built around four cooperating modules: Pipeline, Experimenter, Trainer, and Inferencer. Each has a single, well-defined responsibility. Understanding how they relate to each other is the starting point for everything else.

Pipeline ¶

A Pipeline is a directed node graph that describes what to run and how nodes connect — not when or with what data. Nodes are organised into groups for shared configuration, and edges specify which upstream outputs feed into each node.

State Model ¶

Every node tracks its own lifecycle: init → built → finalized (or error). At the Experimenter level, a session is either open or closed. Knowing the state model helps you understand when results are available, when resources are held, and how to recover from errors.

Data Flow ¶

Data travels from DataSource through Stage nodes to Head nodes, assembled at each step according to the node's edges. This page explains the data_dict structure that processors receive, how the LRU cache works, and how X-less nodes (e.g. LabelEncoder) are handled.