
Architecture

ml-labs is composed of four core modules, each with a distinct responsibility.

Pipeline ──────── defines the node graph (structure)
Experimenter ──── executes builds and experiments (single dataset)
Trainer ────────── trains with cross-validation splits
Inferencer ─────── applies trained processors to new data

Pipeline

Pipeline is a directed graph of nodes that describes the structure of an ML workflow: which processors exist, how they connect, and which parameters they use. It holds no data and performs no computation; it is the blueprint that Experimenter and Trainer read.
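Because Pipeline is pure structure, it can be pictured as a set of node records plus parent edges. The sketch below assumes each node carries a name, a kind (Stage or Head), a processor identifier, parameters, and the names of its parents. Node, PipelineSketch, add(), and order() are hypothetical names for illustration, not the ml-labs API. Processors are plain strings here, so nothing is fitted or executed.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any


@dataclass
class Node:
    """Hypothetical node record: structure only, no data or fitted state."""
    name: str
    kind: str  # "stage" (transformer) or "head" (predictor)
    processor: str  # processor identifier, e.g. a class name
    params: dict[str, Any] = field(default_factory=dict)
    parents: list[str] = field(default_factory=list)


class PipelineSketch:
    """Blueprint only: stores nodes and their edges, never runs anything."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def add(self, node: Node) -> None:
        # Unique names plus "parents must already exist" keeps the graph acyclic.
        if node.name in self.nodes:
            raise ValueError(f"duplicate node name: {node.name!r}")
        missing = [p for p in node.parents if p not in self.nodes]
        if missing:
            raise ValueError(f"unknown parents for {node.name!r}: {missing}")
        self.nodes[node.name] = node

    def order(self) -> list[str]:
        # Topological order: each node comes after all of its parents.
        graph = {name: set(n.parents) for name, n in self.nodes.items()}
        return list(TopologicalSorter(graph).static_order())


p = PipelineSketch()
p.add(Node("scale", "stage", "StandardScaler"))
p.add(Node("pca", "stage", "PCA", {"n_components": 5}, ["scale"]))
p.add(Node("model", "head", "LGBMRegressor", {"n_estimators": 100}, ["pca"]))
print(p.order())  # ['scale', 'pca', 'model']
```

A topological order like this is what an executor needs in order to run the graph node by node.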

Experimenter

Experimenter takes a Pipeline and a dataset, then executes the graph node by node. It manages:

  • Build (build()): runs Stage nodes (transformers)
  • Experiment (exp()): runs Head nodes (predictors)
  • Collectors: pluggable objects that capture metrics, outputs, SHAP values, or stacking data during execution
  • Cache: a capacity-bounded LRU cache that avoids recomputing Stage outputs
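The cache and collector roles can be illustrated with a small, self-contained sketch. StageCache is a hypothetical capacity-bounded LRU cache built on collections.OrderedDict, keyed here (as an assumption) by node name plus frozen parameters. MetricCollector and its on_head_end() hook are likewise hypothetical and do not reflect how ml-labs actually registers or calls collectors.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class StageCache:
    """Capacity-based LRU cache for Stage outputs (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self._store: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute()
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return value


class MetricCollector:
    """Hypothetical collector: records one metric value per Head node."""

    def __init__(self, metric: Callable[[list, list], float]) -> None:
        self.metric = metric
        self.results: dict[str, float] = {}

    def on_head_end(self, node: str, y_true: list, y_pred: list) -> None:
        self.results[node] = self.metric(y_true, y_pred)


def mae(y_true: list, y_pred: list) -> float:
    # Mean absolute error.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Usage: the second lookup of the same Stage key is served from the cache.
calls = []


def scale(xs):
    calls.append(1)  # count how often the Stage actually runs
    return [2 * v for v in xs]


cache = StageCache(capacity=2)
key = ("scale", ())  # assumed key: node name plus frozen params
first = cache.get_or_compute(key, lambda: scale([1, 2, 3]))
second = cache.get_or_compute(key, lambda: scale([1, 2, 3]))
print(first, len(calls), cache.hits)  # [2, 4, 6] 1 1

collector = MetricCollector(mae)
collector.on_head_end("model", [1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(collector.results)  # {'model': 0.6666666666666666}
```

Keying on node name and parameters only makes sense because Experimenter works on a single dataset; a cache shared across datasets would also need a data identifier in the key.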

Trainer

Trainer handles cross-validation. It splits data using a splitter, then runs each node across all splits. Stage outputs are kept in memory; Head outputs are written to disk per split. The result can be converted to an Inferencer via to_inferencer().
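The Trainer flow can be sketched in plain Python under several assumptions: kfold is a hypothetical contiguous splitter, MeanCenter and MeanPredictor are toy Stage and Head processors, and one JSON file per split stands in for whatever on-disk format ml-labs actually uses. The function runs each node across all splits in turn, keeping Stage outputs in memory and writing Head predictions to disk.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator

Split = tuple[list[int], list[int]]


def kfold(n: int, k: int) -> Iterator[Split]:
    """Assumed splitter: k contiguous, unshuffled folds of (train, valid) indices."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        valid = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, valid
        start += size


class MeanCenter:
    """Toy Stage: subtracts the training mean."""

    def fit(self, xs: list[float]) -> "MeanCenter":
        self.mu = sum(xs) / len(xs)
        return self

    def transform(self, xs: list[float]) -> list[float]:
        return [x - self.mu for x in xs]


class MeanPredictor:
    """Toy Head: always predicts the training-target mean."""

    def fit(self, xs: list[float], ys: list[float]) -> "MeanPredictor":
        self.value = sum(ys) / len(ys)
        return self

    def predict(self, xs: list[float]) -> list[float]:
        return [self.value] * len(xs)


def train_cv(x: list[float], y: list[float], splits, out_dir: str):
    splits = list(splits)
    # Stage node, run across all splits; its outputs stay in memory.
    stages, stage_out = [], []
    for tr, va in splits:
        s = MeanCenter().fit([x[j] for j in tr])
        stages.append(s)
        stage_out.append((s.transform([x[j] for j in tr]), s.transform([x[j] for j in va])))
    # Head node, run across all splits; predictions go to disk, one file per split.
    heads = []
    for i, ((tr, va), (x_tr, x_va)) in enumerate(zip(splits, stage_out)):
        h = MeanPredictor().fit(x_tr, [y[j] for j in tr])
        heads.append(h)
        record = {"index": va, "pred": h.predict(x_va)}
        Path(out_dir, f"head_split{i}.json").write_text(json.dumps(record))
    return list(zip(stages, heads))  # fitted (Stage, Head) pair per split


with tempfile.TemporaryDirectory() as d:
    fitted = train_cv([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], kfold(4, 2), d)
    files = sorted(p.name for p in Path(d).iterdir())
print(files)                         # ['head_split0.json', 'head_split1.json']
print([h.value for _, h in fitted])  # [35.0, 15.0]
```

The per-split (Stage, Head) pairs are the kind of fitted state that to_inferencer() would hand over; that conversion itself is not modeled here.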

Inferencer

Inferencer holds the fitted processors produced by Trainer. Given new data, it runs each split's processors and aggregates the results (mean, mode, or a custom callable).
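Aggregation across splits can be sketched as follows, assuming each split's fitted processors are wrapped as a single callable that maps inputs to predictions. InferencerSketch and its "mean"/"mode" strings are hypothetical; the built-in options use Python's statistics.mean and statistics.mode, and any other callable receives the list of per-split predictions for one sample.

```python
import statistics
from typing import Callable, Sequence, Union

SplitModel = Callable[[list], list]  # one split's fitted processors, applied end to end
Aggregate = Union[str, Callable[[list], object]]


class InferencerSketch:
    """Hypothetical sketch: run every split's model, then aggregate per sample."""

    def __init__(self, split_models: Sequence[SplitModel], aggregate: Aggregate = "mean") -> None:
        self.split_models = list(split_models)
        if isinstance(aggregate, str):
            builtin = {"mean": statistics.mean, "mode": statistics.mode}
            if aggregate not in builtin:
                raise ValueError(f"unknown aggregate: {aggregate!r}")
            self._agg = builtin[aggregate]
        elif callable(aggregate):
            self._agg = aggregate
        else:
            raise TypeError("aggregate must be 'mean', 'mode', or a callable")

    def predict(self, xs: list) -> list:
        per_split = [model(xs) for model in self.split_models]
        # zip(*...) regroups by sample: one tuple holding every split's prediction.
        return [self._agg(list(column)) for column in zip(*per_split)]


regressors = [lambda xs: [x + 1 for x in xs], lambda xs: [x + 2 for x in xs]]
print(InferencerSketch(regressors, "mean").predict([0, 10]))  # [1.5, 11.5]
print(InferencerSketch(regressors, max).predict([0, 10]))     # [2, 12]

classifiers = [lambda xs: ["a"] * len(xs), lambda xs: ["b"] * len(xs), lambda xs: ["a"] * len(xs)]
print(InferencerSketch(classifiers, "mode").predict([0, 1]))  # ['a', 'a']
```

Mean suits regression outputs, mode suits class labels, and a custom callable (here max) covers anything else.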