Collectors
mllabs.collector.Collector
Base class for data collectors attached to an Experimenter.
Subclasses override the lifecycle hooks `_start`, `_collect`, `_end_idx`, and `_end` to capture data during `Experimenter.exp`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Collector name (unique within an Experimenter). | required |
| `connector` | `Connector` | Determines which Head nodes this collector attaches to. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| `path` | `Path \| None` | Set by Experimenter on registration. |
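The hook names above suggest a per-fold lifecycle. The following is a minimal, self-contained sketch of that pattern and does not import mllabs: `ToyCollector`, `MeanCollector`, `run_folds`, and every hook signature shown are hypothetical, because this page does not document how an Experimenter actually calls the hooks.

```python
from statistics import mean


class ToyCollector:
    """Stand-in base class for illustration; NOT mllabs.collector.Collector."""

    def __init__(self, name):
        self.name = name

    # The hook signatures below are assumptions made for this sketch.
    def _start(self, node):
        pass

    def _collect(self, node, idx, inner_idx, value):
        pass

    def _end_idx(self, node, idx):
        pass

    def _end(self, node):
        pass


class MeanCollector(ToyCollector):
    """Buffers one value per inner fold and averages them per outer fold."""

    def __init__(self, name):
        super().__init__(name)
        self._buffer = []
        self.per_outer = {}
        self.finished = set()

    def _start(self, node):
        self.per_outer[node] = {}

    def _collect(self, node, idx, inner_idx, value):
        self._buffer.append(value)

    def _end_idx(self, node, idx):
        # Called once all inner folds of outer fold `idx` have been collected.
        self.per_outer[node][idx] = mean(self._buffer)
        self._buffer.clear()

    def _end(self, node):
        self.finished.add(node)


def run_folds(collector, node, fold_values):
    """Hypothetical driver: fold_values[idx][inner_idx] is one collected value."""
    collector._start(node)
    for idx, inner_values in enumerate(fold_values):
        for inner_idx, value in enumerate(inner_values):
            collector._collect(node, idx, inner_idx, value)
        collector._end_idx(node, idx)
    collector._end(node)


collector = MeanCollector("toy_mean")
run_folds(collector, "model_a", [[1.0, 3.0], [2.0, 4.0]])
print(collector.per_outer)
```

Keeping per-fold state in the subclass and finalizing it in the end-of-outer-fold hook is one plausible way to use such hooks; the real call order may differ.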
mllabs.collector.MetricCollector
Bases: Collector
Computes a scalar metric against ground-truth y for each fold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Collector name. | required |
| `connector` | `Connector` | Node matching criteria. | required |
| `output_var` | | Column selector for prediction output. | required |
| `metric_func` | `callable` | | required |
| `include_train` | `bool` | If | `False` |
get_metric(node)
Return per-fold metrics for a single node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node` | `str` | Node name. | required |
Returns:
| Type | Description |
|---|---|
| `pd.Series` | Metrics indexed by |
get_metrics(nodes=None)
Return per-fold metrics for multiple nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes` | | Node query: | `None` |
Returns:
| Type | Description |
|---|---|
| `pd.DataFrame` | Rows are nodes; columns are a fold MultiIndex. |
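To make the return shape concrete, here is a small pandas-only sketch of a frame with nodes as rows and a fold MultiIndex as columns. It does not call mllabs; the node names, the scores, and the level names `outer` and `inner` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical layout: 2 outer folds x 2 inner folds, one row per node.
folds = pd.MultiIndex.from_product([[0, 1], [0, 1]], names=["outer", "inner"])
metrics = pd.DataFrame(
    [[0.81, 0.79, 0.84, 0.80],
     [0.75, 0.77, 0.74, 0.78]],
    index=["lgbm", "ridge"],
    columns=folds,
)

# One node's scores: a Series indexed by (outer, inner) fold pairs.
one_node = metrics.loc["lgbm"]

# All nodes' scores for outer fold 0: columns are the inner folds.
outer0 = metrics.xs(0, axis=1, level="outer")

print(one_node)
print(outer0)
```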
get_metrics_agg(nodes=None, inner_fold=True, outer_fold=True, include_std=False)
Return aggregated metrics across folds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes` | | Node query. | `None` |
| `inner_fold` | `bool` | Aggregate inner folds first (mean). Required when | `True` |
| `outer_fold` | `bool` | Aggregate outer folds after inner aggregation. | `True` |
| `include_std` | `bool` | Also return a std DataFrame. | `False` |
Returns:
| Type | Description |
|---|---|
| `tuple[pd.DataFrame, pd.DataFrame \| None]` | |
| | is |
| | returns the raw DataFrame directly. |
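The two-step aggregation (inner folds first, then outer folds) can be sketched with plain pandas. This is an illustration of the idea only, not the library's implementation: the scores, the level names, and the choice to compute the standard deviation across outer-fold means are assumptions.

```python
import pandas as pd

# Hypothetical per-fold scores with an (outer, inner) column MultiIndex.
folds = pd.MultiIndex.from_product([[0, 1], [0, 1]], names=["outer", "inner"])
metrics = pd.DataFrame(
    [[0.81, 0.79, 0.84, 0.80],
     [0.75, 0.77, 0.74, 0.78]],
    index=["lgbm", "ridge"],
    columns=folds,
)

# Step 1: average the inner folds inside each outer fold.
# Transposing first avoids the deprecated groupby(axis=1).
per_outer = metrics.T.groupby(level="outer").mean().T

# Step 2: aggregate the outer-fold means. The std here is the spread
# across outer folds (an assumption about what "std" refers to).
mean_score = per_outer.mean(axis=1)
std_score = per_outer.std(axis=1)

print(per_outer)
print(mean_score)
```

Stopping after step 1 leaves one column per outer fold, which is the shape to expect when only inner folds are aggregated.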
mllabs.collector.StackingCollector
Bases: Collector
Collects out-of-fold (OOF) predictions for stacking.
Predictions are aggregated across inner folds and saved per outer fold, then assembled into a dataset aligned to the original data index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Collector name. | required |
| `connector` | `Connector` | Node matching criteria. The | required |
| `output_var` | | Column selector for the Head output. | required |
| `experimenter` | `Experimenter` | Used to build the OOF index and target. | required |
| `method` | `str` | Inner-fold aggregation: | `'mean'` |
get_dataset(nodes=None, include_target=True)
Return OOF predictions as a DataFrame aligned to the original index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes` | | Node query. | `None` |
| `include_target` | `bool` | Append the target column(s) if available. | `True` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | OOF prediction columns (+ target) indexed to match the original dataset. |
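As an illustration of how out-of-fold predictions line up with the original index, the sketch below averages inner-fold predictions for each outer fold and writes them back by row label. It uses only pandas and NumPy; the split, the predictions, and the column name `model_a` are made up, and it assumes every inner-fold model predicts on the outer fold's validation rows.

```python
import numpy as np
import pandas as pd

index = pd.Index(list("abcdef"), name="id")
y = pd.Series([0, 1, 0, 1, 1, 0], index=index, name="target")

# Hypothetical outer split: each row is validated exactly once.
outer_valid_rows = [["a", "d"], ["b", "e"], ["c", "f"]]

# Hypothetical inner-fold predictions per outer fold, shape (n_inner, n_valid).
inner_preds = [
    np.array([[0.2, 0.8], [0.4, 0.6]]),
    np.array([[0.9, 0.7], [0.7, 0.9]]),
    np.array([[0.1, 0.3], [0.3, 0.1]]),
]

oof = pd.Series(np.nan, index=index, name="model_a")
for rows, preds in zip(outer_valid_rows, inner_preds):
    # Aggregate across inner folds (mean), then align by row label.
    oof.loc[rows] = preds.mean(axis=0)

# Append the target so the frame can feed a second-level model.
dataset = pd.concat([oof, y], axis=1)
print(dataset)
```

Because each row receives a prediction only from models that did not train on it, the resulting column can serve as a feature for a stacked model without leaking the target.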
mllabs.collector.ModelAttrCollector
Bases: Collector
Collects model attributes (e.g. feature importances) for each fold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Collector name. | required |
| `connector` | `Connector` | Node matching criteria. Used to infer the adapter from | required |
| `result_key` | `str` | Key in the adapter's | required |
| `adapter` | `ModelAdapter` | Explicit adapter. Auto-inferred from | `None` |
| `params` | `dict` | Extra keyword arguments forwarded to the result extractor function. | `None` |
get_attr(node, idx=None)
get_attrs(nodes=None)
get_attrs_agg(node, agg_inner=True, agg_outer=True)
Return aggregated model attributes across folds.
Only valid for mergeable result types (result_objs[key][1] == True).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node` | `str` | Node name. | required |
| `agg_inner` | `bool` | Average inner folds. Required when | `True` |
| `agg_outer` | `bool` | Average outer folds after inner aggregation. Returns a | `True` |
Returns:
| Type | Description |
|---|---|
| `pd.Series \| pd.DataFrame` | Aggregated result. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the result type is not mergeable or |
mllabs.collector.SHAPCollector
Bases: Collector
Computes SHAP values and feature importance for each fold.
Applies an optional data_filter to subsample rows before computing
SHAP values. Supports tree-based multiclass models (3-D SHAP arrays are
averaged over the class axis).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Collector name. | required |
| `connector` | `Connector` | Node matching criteria. | required |
| `explainer_cls` | | SHAP explainer class. Default | `None` |
| `data_filter` | `DataFilter` | Applied to train and valid data before calling the explainer. | `None` |
get_feature_importance(node, idx)
Return per-inner-fold feature importance for one outer fold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node` | `str` | Node name. | required |
| `idx` | `int` | Outer fold index. | required |
Returns:
| Type | Description |
|---|---|
| `list[pd.Series]` | One Series per inner fold (mean absolute SHAP values over samples). |
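Mean absolute SHAP importance can be sketched with NumPy alone. The helper `feature_importance` below is hypothetical and is not the collector's code. It assumes a 3-D array is laid out as `(n_samples, n_features, n_classes)` and that absolute values are taken before averaging over classes; both points are assumptions, since this page only says that 3-D arrays are averaged over the class axis.

```python
import numpy as np
import pandas as pd


def feature_importance(shap_values, feature_names):
    """Mean |SHAP| per feature; hypothetical helper for illustration."""
    values = np.abs(np.asarray(shap_values, dtype=float))
    if values.ndim == 3:
        # Assumed layout: (n_samples, n_features, n_classes).
        values = values.mean(axis=2)
    # Average over samples to get one score per feature.
    return pd.Series(values.mean(axis=0), index=feature_names)


# Made-up SHAP values: 2 samples x 2 features.
binary = np.array([[0.5, -0.1],
                   [-0.3, 0.1]])
imp_2d = feature_importance(binary, ["age", "income"])

# Made-up multiclass values: stack three per-class arrays on the last axis.
multiclass = np.stack([binary, -binary, 2 * binary], axis=2)
imp_3d = feature_importance(multiclass, ["age", "income"])

print(imp_2d)
print(imp_3d)
```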
get_feature_importance_agg(node, agg_inner='mean', agg_outer='mean')
Return aggregated feature importance across all folds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node` | `str` | Node name. | required |
| `agg_inner` | `str \| None` | Aggregation function name for inner folds (passed to | `'mean'` |
| `agg_outer` | `str \| None` | Aggregation function name for outer folds. | `'mean'` |
Returns:
| Type | Description |
|---|---|
| `pd.Series \| pd.DataFrame` | When both agg_inner and agg_outer are set, returns a |
| | a DataFrame. When agg_inner is |
mllabs.collector.OutputCollector
Bases: Collector
Saves raw train/valid outputs to disk for each fold.
Data is stored as {path}/{node}/{idx}_{inner_idx}.pkl.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Collector name. | required |
| `connector` | `Connector` | Node matching criteria. | required |
| `output_var` | | Column selector applied to Head output. | required |
| `include_target` | `bool` | Whether to save target alongside output. | `True` |
get_output(node, idx, inner_idx)
Load saved output for a specific fold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node` | `str` | Node name. | required |
| `idx` | `int` | Outer fold index. | required |
| `inner_idx` | `int` | Inner fold index. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | `{'output_train': (train_arr, valid_sub_arr), 'output_valid': arr, 'columns': [...]}` |
get_outputs(node)
Load all saved outputs for a node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node` | `str` | Node name. | required |
Returns:
| Type | Description |
|---|---|
| `dict[tuple[int, int], dict]` | saved folds. |
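The on-disk layout `{path}/{node}/{idx}_{inner_idx}.pkl` can be reproduced with the standard library. The sketch below is not the collector's code: `save_output` and `load_outputs` are hypothetical helpers, and the payload is a made-up dictionary that only borrows the key names shown above.

```python
import pickle
import tempfile
from pathlib import Path


def save_output(root, node, idx, inner_idx, payload):
    """Write one fold's payload to {root}/{node}/{idx}_{inner_idx}.pkl."""
    node_dir = Path(root) / node
    node_dir.mkdir(parents=True, exist_ok=True)
    target = node_dir / f"{idx}_{inner_idx}.pkl"
    with target.open("wb") as fh:
        pickle.dump(payload, fh)
    return target


def load_outputs(root, node):
    """Load every saved fold for a node, keyed by (idx, inner_idx)."""
    outputs = {}
    for file in sorted((Path(root) / node).glob("*.pkl")):
        idx, inner_idx = (int(part) for part in file.stem.split("_"))
        with file.open("rb") as fh:
            outputs[(idx, inner_idx)] = pickle.load(fh)
    return outputs


with tempfile.TemporaryDirectory() as tmp:
    for idx in range(2):
        for inner_idx in range(2):
            payload = {"output_valid": [idx, inner_idx], "columns": ["pred"]}
            save_output(tmp, "model_a", idx, inner_idx, payload)
    loaded = load_outputs(tmp, "model_a")

print(sorted(loaded))
```

Encoding both fold indices in the file name means a single directory listing is enough to rebuild the `(idx, inner_idx)` keys without a separate manifest.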
mllabs.collector.ProcessCollector
Bases: Collector
Collects predictions on external (test) data for each matched node.
For each matched Head node, passes the external data through the upstream stage processors (via `Experimenter.process_ext`) and calls the fitted processor to produce predictions. Inner-fold predictions are aggregated per outer fold; outer-fold predictions are aggregated on query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Collector name. | required |
| `connector` | `Connector` | Node matching criteria. | required |
| `ext_data` | | External dataset (pandas/polars/numpy) to predict on. | required |
| `experimenter` | `Experimenter` | Used to run upstream stage transforms via | required |
| `output_var` | | Column selector applied to processor output. | `None` |
| `method` | `str` | Inner-fold aggregation: | `'mean'` |
get_output(nodes=None, agg='mean')
Return aggregated test predictions, optionally for multiple nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes` | | Node query: | `None` |
| `agg` | `str` | Outer-fold aggregation: | `'mean'` |
Returns:
| Type | Description |
|---|---|
| | Native DataFrame with columns from all matched nodes concatenated. |
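The two aggregation stages for external predictions (inner folds per outer fold, then outer folds on query) reduce to two means over a 3-D array. The NumPy sketch below uses made-up predictions and plain arrays rather than the native DataFrame the method returns; the array layout is an assumption.

```python
import numpy as np

# Hypothetical predictions on 3 external rows,
# laid out as (n_outer, n_inner, n_rows).
preds = np.array([
    [[0.2, 0.6, 0.9], [0.4, 0.8, 0.7]],
    [[0.1, 0.5, 1.0], [0.3, 0.7, 0.8]],
])

# Stage 1: aggregate inner folds within each outer fold (mean).
per_outer = preds.mean(axis=1)

# Stage 2: aggregate the outer folds when the result is queried (mean).
final = per_outer.mean(axis=0)

print(per_outer)
print(final)
```

Keeping the per-outer-fold result around and deferring the second mean lets the outer aggregation be changed at query time without recomputing predictions.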