Experimenter

mllabs._experimenter.Experimenter

Executes and manages a Pipeline experiment on a single dataset.

Splits data using sp (outer) and optionally sp_v (inner), then runs Stage builds and Head experiments fold-by-fold.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` |  | Input dataset (pandas DataFrame, polars DataFrame, or numpy array). | *required* |
| `path` | `str \| Path` | Directory for persisting experiment artifacts. | *required* |
| `data_names` | `list[str]` | Column names override. | `None` |
| `sp` |  | Outer splitter (sklearn splitter API). | `ShuffleSplit(n_splits=1, random_state=1)` |
| `sp_v` |  | Inner splitter for nested cross-validation. `None` disables it. | `None` |
| `splitter_params` | `dict` | Maps splitter keyword args to column names in `data`, e.g. `{'y': 'target'}`. | `None` |
| `title` | `str` | Human-readable experiment title. | `None` |
| `data_key` | `str` | Identifier verified on `load()` to prevent a data mismatch. | `None` |
| `cache_maxsize` | `int` | Stage output cache size in bytes. | `4 * 1024 ** 3` (4 GB) |
| `logger` |  | Logger instance. | `DefaultLogger(level=['info', 'progress'])` |
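
The outer/inner split pattern and the `splitter_params` column mapping can be reproduced with plain scikit-learn, no mllabs required. A minimal sketch, assuming a DataFrame with a `target` column (the `test_size` value and splitter choices here are illustrative, not the class defaults):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit, StratifiedKFold

# Toy dataset with a target column.
df = pd.DataFrame({
    "x": np.arange(20),
    "target": [0, 1] * 10,
})

# Outer splitter (sp) and inner splitter (sp_v), as plain sklearn objects.
sp = ShuffleSplit(n_splits=1, test_size=0.25, random_state=1)
sp_v = StratifiedKFold(n_splits=2)

# splitter_params={'y': 'target'} means: pass df['target'] as the
# splitter's `y` keyword. Reproduced here by hand:
splitter_params = {"y": "target"}
kwargs = {k: df[col] for k, col in splitter_params.items()}

for train_idx, test_idx in sp.split(df, **kwargs):
    # Inner (nested) folds are drawn from the outer training portion only.
    inner = df.iloc[train_idx]
    inner_kwargs = {k: inner[col] for k, col in splitter_params.items()}
    for tr, va in sp_v.split(inner, **inner_kwargs):
        print(len(tr), len(va))
```

The same dictionary-to-keyword expansion generalizes to any splitter argument (`groups`, for instance), which is presumably why the mapping is expressed as a dict rather than a fixed `y` column.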

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `pipeline` | `Pipeline` | The pipeline being experimented on. |
| `node_objs` | `dict` | `{node_name: StageObj \| HeadObj}`. |
| `cache` | `DataCache` | Shared LRU cache. |
| `collectors` | `dict` | Registered `mllabs.collector.Collector` instances. |
| `trainers` | `dict` | Registered `mllabs._trainer.Trainer` instances. |
| `status` | `str` | `'open'` or `'closed'`. |

create(data, path, data_names=None, sp=ShuffleSplit(n_splits=1, random_state=1), sp_v=None, splitter_params=None, title=None, data_key=None, cache_maxsize=4 * 1024 ** 3, logger=DefaultLogger(level=['info', 'progress']), aug_data=None) staticmethod

load(filepath, data, data_key=None, aug_data=None) staticmethod

Load a saved Experimenter from disk.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `filepath` | `str \| Path` | Path to the experiment directory (contains `__exp.pkl`). | *required* |
| `data` |  | Dataset to attach. Must match the original data shape. | *required* |
| `data_key` | `str` | If the saved experiment has a `data_key`, this must match. | `None` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `Experimenter` |  | Restored experimenter with all nodes, collectors, and trainers reloaded. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `data_key` does not match the saved value. |

set_grp(name, role=None, processor=None, edges=None, method=None, parent=None, adapter=None, params=None, desc=None, exist='diff')

set_node(name, grp, processor=None, edges=None, method=None, adapter=None, params=None, desc=None, exist='diff')

rename_grp(name_from, name_to)

remove_grp(name)

remove_node(name)

Remove a node.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` |  | Name of the node to remove. | *required* |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the node does not exist or has child nodes. |

build(nodes=None, rebuild=False)

Build Stage nodes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `nodes` |  | Node query: `None` (all stages), a list of names, or a regex str. | `None` |
| `rebuild` | `bool` | If `True`, rebuild already-built nodes. | `False` |

exp(nodes=None)

Run Head nodes and invoke all matching Collectors.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `nodes` |  | Node query: `None` (all heads), a list of names, or a regex str. | `None` |
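
The three query forms (`None`, list, regex str) might resolve roughly as follows. This is a hypothetical helper for illustration only; whether mllabs matches with `re.search` or requires a full match is an assumption:

```python
import re

def resolve_nodes(query, all_nodes):
    """Hypothetical helper mirroring the documented query forms:
    None -> every node, str -> regex match, list -> exact names."""
    if query is None:
        return list(all_nodes)
    if isinstance(query, str):
        pat = re.compile(query)
        return [n for n in all_nodes if pat.search(n)]
    return [n for n in all_nodes if n in set(query)]

heads = ["head_lgbm", "head_xgb", "stage_scale"]
print(resolve_nodes(None, heads))            # every node
print(resolve_nodes("head_", heads))         # regex: the two head_* nodes
print(resolve_nodes(["stage_scale"], heads)) # exact-name list
```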

reset_nodes(nodes)

Reset nodes to init state.

Removes node objects, clears cache entries, and resets Collector and Trainer data for the affected nodes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `nodes` | `list[str]` | Node names to reset. | *required* |

show_error_nodes(nodes=None, traceback=False)

Print nodes in error state.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `nodes` |  | Node query to filter. `None` checks all nodes. | `None` |
| `traceback` | `bool` | Include the full traceback in the output. | `False` |

finalize(nodes)

Release memory for built Head nodes (state transition: built → finalized).

Disk artifacts are preserved so nodes can be reloaded.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `nodes` |  | Node query for Head nodes to finalize. | *required* |

reinitialize(nodes)

close_exp()

Finalize all built nodes and mark the experiment as closed.

Collector data is preserved. After this call, `status` is `'closed'` and no further builds or experiments are permitted until `reopen_exp()` is called.

reopen_exp()

Reopen a closed experiment and rebuild Stage nodes.

Clears all node objects, sets `status` back to `'open'`, then calls `build()`.

add_collector(collector, exist='skip')

Register a Collector and immediately collect from built Head nodes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `collector` | `Collector` | Collector instance to register. | *required* |
| `exist` | `str` | `'skip'` (default) returns the existing collector if already registered; `'error'` raises. | `'skip'` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `Collector` |  | The registered collector. |

get_collector(name)

remove_collector(name)

collect(collector, nodes=None, exist='skip')

Run a Collector ad-hoc over already-built Head nodes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `collector` | `Collector` | Collector instance to run. | *required* |
| `nodes` |  | Node query: `None` (all heads), a list of names, or a regex str. | `None` |
| `exist` | `str` | `'skip'` (default) skips nodes already collected. | `'skip'` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `Collector` |  | The same collector after collection. |
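
The skip-if-already-collected behavior can be sketched with a toy stand-in. `ToyCollector` and `collect_node` are hypothetical names for illustration, not the mllabs Collector API:

```python
class ToyCollector:
    """Hypothetical stand-in for a Collector: gathers one record
    per Head node and skips nodes it has already seen."""
    def __init__(self, name):
        self.name = name
        self.records = {}

    def collect_node(self, node_name, output, exist="skip"):
        if node_name in self.records:
            if exist == "skip":
                return  # already collected: leave the record untouched
            raise ValueError(f"{node_name} already collected")
        self.records[node_name] = output

built_outputs = {"head_a": 0.91, "head_b": 0.88}
col = ToyCollector("scores")
for node, out in built_outputs.items():
    col.collect_node(node, out)

# Re-running with exist='skip' does not overwrite the existing record.
col.collect_node("head_a", 0.0)
print(col.records)
```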

add_trainer(name, data=None, splitter='same', splitter_params=None, exist='skip', aug_data=None)

Create and register a Trainer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Trainer name. | *required* |
| `data` |  | Dataset for the Trainer. `None` → use the Experimenter's data. | `None` |
| `splitter` |  | Splitter to use. `'same'` reuses `sp_v`; pass a sklearn splitter object for a custom split strategy; `None` trains on the full dataset. | `'same'` |
| `splitter_params` | `dict` | Column mappings for the splitter. Must be `None` when `splitter='same'`. | `None` |
| `exist` | `str` | `'skip'` (default) returns the existing Trainer if the name is already registered; `'error'` raises. | `'skip'` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `Trainer` |  | The newly created (or existing) Trainer. |
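
The documented `splitter` rules can be condensed into a small resolution function. `resolve_splitter` is a hypothetical helper sketching those rules, not mllabs internals:

```python
from sklearn.model_selection import KFold

def resolve_splitter(splitter, splitter_params, sp_v):
    """Sketch of the documented add_trainer rules: 'same' reuses sp_v
    (and forbids new splitter_params), a splitter object is used as-is,
    and None means train on the full dataset."""
    if splitter == "same":
        if splitter_params is not None:
            raise ValueError("splitter_params must be None when splitter='same'")
        return sp_v
    return splitter  # a splitter object, or None for full-data training

sp_v = KFold(n_splits=3)
assert resolve_splitter("same", None, sp_v) is sp_v   # reuse inner splitter
assert resolve_splitter(None, None, sp_v) is None     # full-data training
custom = KFold(n_splits=5)
assert resolve_splitter(custom, {"y": "target"}, sp_v) is custom
```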

get_trainer(name)

remove_trainer(name)

get_node_output(node, idx, v=None)

get_node_train_output(node, idx, v=None)

get_node_valid_output(node, idx, v=None)

process_ext(data, node, idx)

get_n_splits()

get_n_splits_inner()

desc_status()

desc_spec()

desc_pipeline(max_depth=None, direction='TD')

desc_node(node_name, direction='TD', show_params=False)

mllabs._experimenter.DataCache

LRU cache for Stage node outputs, keyed by (node, type, fold_idx).

Capacity is measured in bytes using nbytes / memory_usage.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `maxsize` | `int` | Maximum cache capacity in bytes. | `4 * 1024 ** 3` (4 GB) |

get_data(node, typ, idx)

put_data(node, typ, idx, data)

clear_nodes(nodes)
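
A byte-capacity LRU keyed by `(node, type, idx)` can be sketched with an `OrderedDict`. `ByteLRUCache` is a simplified stand-in, not the actual `DataCache` implementation:

```python
from collections import OrderedDict
import numpy as np

class ByteLRUCache:
    """Minimal sketch of a byte-capacity LRU keyed by (node, type, idx),
    measuring entries via .nbytes as the docs describe for numpy data."""
    def __init__(self, maxsize=4 * 1024 ** 3):
        self.maxsize = maxsize
        self._store = OrderedDict()  # key -> (data, size_in_bytes)
        self._bytes = 0

    def put_data(self, node, typ, idx, data):
        key = (node, typ, idx)
        size = int(data.nbytes)
        if key in self._store:
            self._bytes -= self._store[key][1]
            del self._store[key]
        self._store[key] = (data, size)
        self._bytes += size
        # Evict least-recently-used entries until under capacity.
        while self._bytes > self.maxsize and len(self._store) > 1:
            _, (_, old_size) = self._store.popitem(last=False)
            self._bytes -= old_size

    def get_data(self, node, typ, idx):
        key = (node, typ, idx)
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as recently used
        return self._store[key][0]

cache = ByteLRUCache(maxsize=2_000)
cache.put_data("stage_a", "train", 0, np.zeros(100))  # 800 bytes
cache.put_data("stage_a", "valid", 0, np.zeros(100))  # 800 bytes
cache.put_data("stage_b", "train", 0, np.zeros(100))  # exceeds 2000, evicts oldest
print(cache.get_data("stage_a", "train", 0))  # None, evicted
```

Measuring capacity in bytes rather than entry count fits this workload: Stage outputs vary wildly in size, so a count-based LRU could either thrash on large folds or hoard memory on small ones.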