Collectors

mllabs.collector.Collector

Base class for data collectors attached to an Experimenter.

Subclasses override the lifecycle hooks _start, _collect, _end_idx, and _end to capture data during mllabs.Experimenter.exp.

Parameters:

Name Type Description Default
name str

Collector name (unique within an Experimenter).

required
connector Connector

Determines which Head nodes this collector attaches to.

required

Attributes:

Name Type Description
path Path | None

Set by Experimenter on registration.

has(node)

has_node(node)

reset_nodes(nodes)

save()

load(path) classmethod

mllabs.collector.MetricCollector

Bases: Collector

Computes a scalar metric against ground-truth y for each fold.

Parameters:

Name Type Description Default
name str

Collector name.

required
connector Connector

Node matching criteria.

required
output_var

Column selector for prediction output. None uses all output columns.

required
metric_func callable

func(y_true, y_pred) -> float.

required
include_train bool

If True, also compute on train/inner-valid folds.

False
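As an illustration of the func(y_true, y_pred) -> float contract, the sketch below defines a root mean squared error with NumPy. The name rmse is hypothetical, and flattening the inputs assumes a single prediction column; the shape in which MetricCollector passes y_pred is not stated here.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error in the (y_true, y_pred) -> float shape."""
    # Assumption: one prediction column, so both inputs flatten to 1-D.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Any callable with the same two-argument shape, such as the scikit-learn metric functions, should fit the same slot.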

get_metric(node)

Return per-fold metrics for a single node.

Parameters:

Name Type Description Default
node str

Node name.

required

Returns:

Type Description

pd.Series: Metrics indexed by (split, inner_split, metric_key).

get_metrics(nodes=None)

Return per-fold metrics for multiple nodes.

Parameters:

Name Type Description Default
nodes

Node query — None (all), list, or regex str.

None

Returns:

Type Description

pd.DataFrame: Rows are nodes, columns are fold MultiIndex.

get_metrics_agg(nodes=None, inner_fold=True, outer_fold=True, include_std=False)

Return aggregated metrics across folds.

Parameters:

Name Type Description Default
nodes

Node query. None uses all collected nodes.

None
inner_fold bool

Aggregate inner folds first (mean). Must be True when outer_fold=True.

True
outer_fold bool

Aggregate outer folds after inner aggregation.

True
include_std bool

Also return a std DataFrame.

False

Returns:

Type Description

tuple[pd.DataFrame, pd.DataFrame | None]: (mean, std), where std is None unless include_std=True. When inner_fold=False, returns the raw DataFrame directly.
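The two-stage aggregation can be pictured with plain pandas. This is only a sketch on hand-made data, not the library's implementation: the node names, the 'valid' metric key, and taking the std over outer folds are all assumptions.

```python
import pandas as pd

# Hypothetical per-fold metrics: rows are nodes, columns are a
# (split, inner_split, metric_key) MultiIndex, as get_metrics describes.
columns = pd.MultiIndex.from_tuples(
    [(0, 0, "valid"), (0, 1, "valid"), (1, 0, "valid"), (1, 1, "valid")],
    names=["split", "inner_split", "metric_key"],
)
per_fold = pd.DataFrame(
    [[0.80, 0.82, 0.90, 0.88], [0.70, 0.74, 0.60, 0.64]],
    index=["model_a", "model_b"],
    columns=columns,
)

# Step 1: mean over inner folds, one value per (split, metric_key).
inner_mean = per_fold.T.groupby(level=["split", "metric_key"]).mean().T

# Step 2: mean and std over outer folds, one value per metric_key.
outer = inner_mean.T.groupby(level="metric_key")
mean, std = outer.mean().T, outer.std().T
```

Averaging inner folds first gives every outer fold equal weight, which is one reason the inner step has to run before the outer one.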

mllabs.collector.StackingCollector

Bases: Collector

Collects out-of-fold (OOF) predictions for stacking.

Predictions are aggregated across inner folds and saved per outer fold, then assembled into a dataset aligned to the original data index.

Parameters:

Name Type Description Default
name str

Collector name.

required
connector Connector

Node matching criteria. The 'y' entry of edges is used to extract the target column.

required
output_var

Column selector for the Head output.

required
experimenter Experimenter

Used to build the OOF index and target.

required
method str

Inner-fold aggregation — 'mean' (default), 'mode', or 'simple' (concatenate).

'mean'

get_dataset(nodes=None, include_target=True)

Return OOF predictions as a DataFrame aligned to the original index.

Parameters:

Name Type Description Default
nodes

Node query. None returns all collected nodes.

None
include_target bool

Append the target column(s) if available.

True

Returns:

Name Type Description
DataFrame

OOF prediction columns (+ target) indexed to match the original dataset.
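To make the out-of-fold assembly concrete, here is a minimal pandas sketch of the 'mean' method: inner-fold predictions are averaged per outer fold and written back at the validation rows of the original index. The fold layout, row labels, and column names are invented for illustration and do not reflect how StackingCollector stores its data.

```python
import numpy as np
import pandas as pd

index = pd.Index(["a", "b", "c", "d"], name="row_id")  # original data index

# Hypothetical fold results: for each outer fold, the validation row labels
# and one prediction array per inner-fold model.
outer_folds = [
    (["a", "b"], [np.array([0.2, 0.8]), np.array([0.4, 0.6])]),
    (["c", "d"], [np.array([0.1, 0.9]), np.array([0.3, 0.7])]),
]

oof = pd.Series(np.nan, index=index, name="model_a")
for valid_rows, inner_preds in outer_folds:
    # 'mean' aggregation: average the inner-fold predictions row by row.
    oof.loc[valid_rows] = np.mean(inner_preds, axis=0)

# Append a (made-up) target column, analogous to include_target=True.
dataset = oof.to_frame().assign(target=[0, 1, 0, 1])
```

Because each row is a validation row in exactly one outer fold, every row receives exactly one prediction.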

mllabs.collector.ModelAttrCollector

Bases: Collector

Collects model attributes (e.g. feature importances) for each fold.

Parameters:

Name Type Description Default
name str

Collector name.

required
connector Connector

Node matching criteria. Used to infer the adapter from connector.processor when adapter is None.

required
result_key str

Key in the adapter's result_objs dict (e.g. 'feature_importances').

required
adapter ModelAdapter

Explicit adapter. Auto-inferred from connector.processor if omitted.

None
params dict

Extra keyword arguments forwarded to the result extractor function.

None

get_attr(node, idx=None)

get_attrs(nodes=None)

get_attrs_agg(node, agg_inner=True, agg_outer=True)

Return aggregated model attributes across folds.

Only valid for mergeable result types (result_objs[key][1] == True).

Parameters:

Name Type Description Default
node str

Node name.

required
agg_inner bool

Average inner folds. Must be True when agg_outer=True.

True
agg_outer bool

Average outer folds after inner aggregation. Returns a pd.Series when both are True.

True

Returns:

Type Description

pd.Series | pd.DataFrame: Aggregated result.

Raises:

Type Description
ValueError

If the result type is not mergeable, or if agg_outer=True while agg_inner=False.

mllabs.collector.SHAPCollector

Bases: Collector

Computes SHAP values and feature importance for each fold.

Applies an optional data_filter to subsample rows before computing SHAP values. Supports tree-based multiclass models (3-D SHAP arrays are averaged over the class axis).

Parameters:

Name Type Description Default
name str

Collector name.

required
connector Connector

Node matching criteria.

required
explainer_cls

SHAP explainer class. None uses shap.TreeExplainer.

None
data_filter DataFilter

Applied to train and valid data before calling the explainer.

None

get_feature_importance(node, idx)

Return per-inner-fold feature importance for one outer fold.

Parameters:

Name Type Description Default
node str

Node name.

required
idx int

Outer fold index.

required

Returns:

Type Description

list[pd.Series]: One Series per inner fold (mean absolute SHAP values over samples).
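The reduction from raw SHAP values to one importance per feature can be sketched with NumPy alone. The array below is synthetic, the (n_samples, n_features, n_classes) layout is an assumption, and so is the order of operations (class average first, then mean absolute value over samples); the collector's actual computation may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
feature_names = ["f0", "f1", "f2"]

# Synthetic stand-in for multiclass SHAP values.
# Assumed layout: (n_samples, n_features, n_classes).
shap_values = rng.normal(size=(50, 3, 4))

# Collapse the class axis when present, leaving (n_samples, n_features).
per_sample = shap_values.mean(axis=-1) if shap_values.ndim == 3 else shap_values

# Mean absolute SHAP value over samples: one importance per feature.
importance = pd.Series(np.abs(per_sample).mean(axis=0), index=feature_names)
```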

get_feature_importance_agg(node, agg_inner='mean', agg_outer='mean')

Return aggregated feature importance across all folds.

Parameters:

Name Type Description Default
node str

Node name.

required
agg_inner str | None

Aggregation function name for inner folds (passed to pd.DataFrame.agg). None keeps inner fold axis as a MultiIndex level.

'mean'
agg_outer str | None

Aggregation function name for outer folds. None returns a DataFrame with one column per outer fold.

'mean'

Returns:

Type Description

pd.Series | pd.DataFrame: When both agg_inner and agg_outer are set, returns a pd.Series. When agg_outer is None, returns a DataFrame. When agg_inner is None, returns a MultiIndex DataFrame.
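The return shapes follow from applying a named aggregation one fold level at a time. A small pandas sketch on made-up importances, where the (outer, inner) column layout is an assumption rather than the collector's internal format:

```python
import pandas as pd

# Hypothetical importances: rows are features, columns are (outer, inner) folds.
cols = pd.MultiIndex.from_product([[0, 1], [0, 1]], names=["outer", "inner"])
imp = pd.DataFrame(
    [[1.0, 3.0, 2.0, 4.0], [0.0, 2.0, 1.0, 1.0]],
    index=["f0", "f1"],
    columns=cols,
)

# agg_inner='mean' with agg_outer=None: a DataFrame, one column per outer fold.
per_outer = imp.T.groupby(level="outer").agg("mean").T

# agg_outer='mean' applied afterwards: a Series, one value per feature.
overall = per_outer.agg("mean", axis=1)
```

Passing the aggregation by name means other pandas reducers, such as 'median' or 'max', could be swapped in the same way.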

mllabs.collector.OutputCollector

Bases: Collector

Saves raw train/valid outputs to disk for each fold.

Data is stored as {path}/{node}/{idx}_{inner_idx}.pkl.

Parameters:

Name Type Description Default
name str

Collector name.

required
connector Connector

Node matching criteria.

required
output_var

Column selector applied to Head output.

required
include_target bool

Whether to save target alongside output.

True

get_output(node, idx, inner_idx)

Load saved output for a specific fold.

Parameters:

Name Type Description Default
node str

Node name.

required
idx int

Outer fold index.

required
inner_idx int

Inner fold index.

required

Returns:

Name Type Description
dict

{'output_train': (train_arr, valid_sub_arr), 'output_valid': arr, 'columns': [...]}.

get_outputs(node)

Load all saved outputs for a node.

Parameters:

Name Type Description Default
node str

Node name.

required

Returns:

Type Description

dict[tuple[int, int], dict]: {(idx, inner_idx): entry} for all saved folds.
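Given the {path}/{node}/{idx}_{inner_idx}.pkl layout, a lookup keyed by fold can be rebuilt from file names alone. The sketch below writes a few dummy entries to a temporary directory and reads them back; load_outputs is a hypothetical helper, not part of mllabs, and the entry contents are placeholders.

```python
import pickle
import tempfile
from pathlib import Path


def load_outputs(root: Path, node: str) -> dict:
    """Read every {root}/{node}/{idx}_{inner_idx}.pkl into a dict keyed by fold."""
    outputs = {}
    for file in sorted((root / node).glob("*_*.pkl")):
        # The file stem encodes the outer and inner fold indices.
        idx, inner_idx = (int(part) for part in file.stem.split("_"))
        with file.open("rb") as fh:
            outputs[(idx, inner_idx)] = pickle.load(fh)
    return outputs


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model_a").mkdir()
    for idx, inner_idx in [(0, 0), (0, 1), (1, 0)]:
        entry = {"output_valid": [idx, inner_idx], "columns": ["pred"]}
        with (root / "model_a" / f"{idx}_{inner_idx}.pkl").open("wb") as fh:
            pickle.dump(entry, fh)
    outputs = load_outputs(root, "model_a")
```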

mllabs.collector.ProcessCollector

Bases: Collector

Collects predictions on external (test) data for each matched node.

For each matched head node, passes the external data through upstream stage processors (via mllabs.Experimenter.process_ext) and calls the fitted processor to produce predictions. Inner-fold predictions are aggregated per outer fold; outer-fold predictions are aggregated on query.

Parameters:

Name Type Description Default
name str

Collector name.

required
connector Connector

Node matching criteria.

required
ext_data

External dataset (pandas/polars/numpy) to predict on.

required
experimenter Experimenter

Used to run upstream stage transforms via process_ext.

required
output_var

Column selector applied to processor output.

None
method str

Inner-fold aggregation — 'mean' (default), 'mode', or 'simple'.

'mean'

get_output(nodes=None, agg='mean')

Return aggregated test predictions, optionally for multiple nodes.

Parameters:

Name Type Description Default
nodes

Node query — None (all), list, or regex str.

None
agg str

Outer-fold aggregation — 'mean' (default), 'mode', or 'simple'.

'mean'

Returns:

Type Description

Native DataFrame with columns from all matched nodes concatenated.
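The three aggregation names can be read as different ways of combining per-fold predictions that share one index. The helper below is a hypothetical pandas sketch, not the collector's code; treating 'simple' as keeping one column per fold is an assumption based on the "concatenate" note for StackingCollector.

```python
import pandas as pd


def aggregate(preds: list, method: str = "mean"):
    """Combine per-fold prediction Series that share one index."""
    stacked = pd.concat(preds, axis=1, keys=range(len(preds)))
    if method == "mean":
        return stacked.mean(axis=1)
    if method == "mode":
        # Most frequent value per row; ties resolve to the smallest value.
        return stacked.mode(axis=1)[0]
    if method == "simple":
        return stacked  # keep one column per fold, no reduction
    raise ValueError(f"unknown method: {method!r}")


folds = [pd.Series([0, 1, 1]), pd.Series([0, 1, 0]), pd.Series([1, 1, 0])]
mean_pred = aggregate(folds, "mean")
mode_pred = aggregate(folds, "mode")
```

'mean' suits probabilities or regression outputs, while 'mode' acts as a majority vote over class labels.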