Experimenter
mllabs._experimenter.Experimenter
Executes and manages a Pipeline experiment on a single dataset.
Splits the data with the outer splitter `sp` and, optionally, the inner splitter `sp_v`, then builds Stage nodes and runs Head experiments fold by fold.
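The outer/inner splitting described above can be sketched with the standard library alone. This is a minimal sketch, not mllabs code. `ToyShuffleSplit` and `nested_folds` are hypothetical names, and the splitter only mimics the sklearn convention of `split()` yielding `(train, valid)` index lists. The sketch also assumes, as is usual for nested cross-validation, that `sp_v` re-splits the training rows of each outer fold.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple

Split = Tuple[List[int], List[int]]


class ToyShuffleSplit:
    """Hypothetical stand-in for an sklearn-style splitter (split() yields index lists)."""

    def __init__(self, n_splits: int = 1, valid_frac: float = 0.25, random_state: int = 1):
        self.n_splits = n_splits
        self.valid_frac = valid_frac
        self.random_state = random_state

    def split(self, X: Sequence) -> Iterator[Split]:
        rng = random.Random(self.random_state)  # fresh RNG: same seed gives same folds
        n = len(X)
        n_valid = max(1, int(n * self.valid_frac))
        for _ in range(self.n_splits):
            idx = list(range(n))
            rng.shuffle(idx)
            yield sorted(idx[n_valid:]), sorted(idx[:n_valid])


def nested_folds(n_rows: int, sp, sp_v=None) -> Iterator[Tuple[int, Optional[int], List[int], List[int]]]:
    """Yield (outer_idx, inner_idx, train_rows, valid_rows); inner_idx is None without sp_v."""
    rows = list(range(n_rows))
    for i, (train, valid) in enumerate(sp.split(rows)):
        if sp_v is None:
            yield i, None, train, valid
            continue
        # The inner splitter only sees the outer training rows; its positional
        # indices are mapped back to row ids of the full dataset.
        for j, (tr, va) in enumerate(sp_v.split(train)):
            yield i, j, [train[p] for p in tr], [train[p] for p in va]
```

With two outer and three inner splits, `nested_folds` yields six `(outer, inner)` folds, and every inner fold partitions only the training rows of its outer fold, so the outer validation rows stay untouched.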
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | | Input dataset (pandas DataFrame, polars DataFrame, or numpy array). | required |
| `path` | `str \| Path` | Directory for persisting experiment artifacts. | required |
| `data_names` | `list[str]` | Column names override. | `None` |
| `sp` | | Outer splitter (sklearn splitter API). | `ShuffleSplit(n_splits=1, random_state=1)` |
| `sp_v` | | Inner splitter for nested cross-validation. | `None` |
| `splitter_params` | `dict` | Maps splitter keyword args to column names in `data`, e.g. | `None` |
| `title` | `str` | Human-readable experiment title. | `None` |
| `data_key` | `str` | Identifier verified on :meth: | `None` |
| `cache_maxsize` | `int` | Stage output cache size in bytes. Default 4 GiB. | `4 * 1024 ** 3` |
| `logger` | | Logger instance. | `DefaultLogger(level=['info', 'progress'])` |
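One way a `splitter_params` mapping could be resolved: each key is a keyword argument of the splitter's `split()` method (for example sklearn's `groups`), and each value is a column name in `data`. The sketch below is an assumption about the mechanism, not mllabs code. `resolve_split_kwargs` and `ToyGroupSplit` are hypothetical, and `data` is modelled as a plain dict of columns.

```python
from typing import Dict, Iterator, List, Mapping, Sequence, Tuple


def resolve_split_kwargs(data: Mapping[str, Sequence], splitter_params: Mapping[str, str]) -> Dict[str, List]:
    """Turn {'groups': 'user_id'} into {'groups': <values of the user_id column>}."""
    missing = [col for col in splitter_params.values() if col not in data]
    if missing:
        raise KeyError(f"columns not found in data: {missing}")
    return {kwarg: list(data[col]) for kwarg, col in splitter_params.items()}


class ToyGroupSplit:
    """Hypothetical group-aware splitter: each distinct group is held out once."""

    def split(self, X, y=None, groups=None) -> Iterator[Tuple[List[int], List[int]]]:
        for g in sorted(set(groups)):
            valid = [i for i, v in enumerate(groups) if v == g]
            train = [i for i, v in enumerate(groups) if v != g]
            yield train, valid


# Usage: the "groups" keyword is filled from the user_id column.
data = {"x": [0.1, 0.2, 0.3, 0.4], "user_id": ["a", "a", "b", "b"]}
kwargs = resolve_split_kwargs(data, {"groups": "user_id"})
for train, valid in ToyGroupSplit().split(data["x"], **kwargs):
    print(train, valid)
```

Keeping the mapping as column names rather than arrays means it can be stored with the experiment and re-resolved against whatever dataset is attached later.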
Attributes:
| Name | Type | Description |
|---|---|---|
| `pipeline` | `Pipeline` | The pipeline being experimented on. |
| `node_objs` | `dict` | |
| `cache` | `DataCache` | Shared LRU cache. |
| `collectors` | `dict` | Registered :class: |
| `trainers` | `dict` | Registered :class: |
| `status` | `str` | Experiment status: `'closed'` after `close_exp()`, `'open'` after `reopen_exp()`. |
create(data, path, data_names=None, sp=ShuffleSplit(n_splits=1, random_state=1), sp_v=None, splitter_params=None, title=None, data_key=None, cache_maxsize=4 * 1024 ** 3, logger=DefaultLogger(level=['info', 'progress']), aug_data=None)
staticmethod
load(filepath, data, data_key=None, aug_data=None)
staticmethod
Load a saved Experimenter from disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filepath` | `str \| Path` | Path to the experiment directory (contains | required |
| `data` | | Dataset to attach. Must match the original data shape. | required |
| `data_key` | `str` | If the saved experiment has a | `None` |
Returns:
| Type | Description |
|---|---|
| `Experimenter` | Restored experimenter with all nodes, collectors, and trainers reloaded. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If |
set_grp(name, role=None, processor=None, edges=None, method=None, parent=None, adapter=None, params=None, desc=None, exist='diff')
set_node(name, grp, processor=None, edges=None, method=None, adapter=None, params=None, desc=None, exist='diff')
rename_grp(name_from, name_to)
remove_grp(name)
remove_node(name)
Remove a node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | | Name of the node to remove. | required |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the node does not exist or has child nodes. |
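The two error conditions listed above can be illustrated with a small guard. This is a hypothetical sketch: `remove_node_sketch` is an invented name, and it models the pipeline as a dict mapping each node name to the list of its parent names, which is an assumption about representation rather than how mllabs stores nodes.

```python
from typing import Dict, List


def remove_node_sketch(parents: Dict[str, List[str]], name: str) -> None:
    """Remove `name` from a {node: [parent, ...]} map, refusing unknown nodes or nodes with children."""
    if name not in parents:
        raise ValueError(f"node {name!r} does not exist")
    children = sorted(n for n, ps in parents.items() if name in ps)
    if children:
        # Removing a node that others depend on would leave dangling edges.
        raise ValueError(f"node {name!r} has child nodes: {children}")
    del parents[name]


graph = {"scaler": [], "pca": ["scaler"], "model": ["pca"]}
remove_node_sketch(graph, "model")  # leaf node, so it is removed
```

Refusing to remove a node with children forces removal from the leaves upward, so the remaining graph never references a missing parent.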
build(nodes=None, rebuild=False)
Build Stage nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes` | | Node query — | `None` |
| `rebuild` | `bool` | If | `False` |
exp(nodes=None)
Run Head nodes and invoke all matching Collectors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes` | | Node query — | `None` |
reset_nodes(nodes)
Reset nodes to their initial state.
Removes node objects, clears their cache entries, and resets Collector and Trainer data for the affected nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes` | `list[str]` | Node names to reset. | required |
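A minimal sketch of what a reset touches, assuming plain dicts for the node objects, for the `(node, type, fold_idx)`-keyed cache described under DataCache, and for per-collector results keyed by node name. `reset_nodes_sketch` and this data layout are hypothetical; Trainer data would be cleared the same way as the collector data.

```python
from typing import Dict, Iterable, Tuple

CacheKey = Tuple[str, str, int]  # (node, type, fold_idx)


def reset_nodes_sketch(
    nodes: Iterable[str],
    node_objs: Dict[str, object],
    cache: Dict[CacheKey, object],
    collector_data: Dict[str, Dict[str, object]],
) -> None:
    targets = set(nodes)
    # 1. Drop the built node objects.
    for name in targets:
        node_objs.pop(name, None)
    # 2. Invalidate every cached output whose key belongs to a reset node.
    for key in [k for k in cache if k[0] in targets]:
        del cache[key]
    # 3. Forget collected results for those nodes in every collector.
    for per_node in collector_data.values():
        for name in targets:
            per_node.pop(name, None)
```

Because the node name is the first element of every cache key, invalidation is a single pass over the keys; entries of other nodes stay cached.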
show_error_nodes(nodes=None, traceback=False)
Print nodes in error state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes` | | Node query to filter. | `None` |
| `traceback` | `bool` | Include full traceback in output. | `False` |
finalize(nodes)
Release memory for built Head nodes (built → finalized).
Disk artifacts are preserved so nodes can be reloaded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes` | | Node query for Head nodes to finalize. | required |
reinitialize(nodes)
close_exp()
Finalize all built nodes and mark the experiment as closed.
Collector data is preserved. After this call, `status` is `'closed'` and no further builds or experiments are permitted until `reopen_exp()` is called.
reopen_exp()
Reopen a closed experiment and rebuild Stage nodes.
Clears all node objects, sets `status` back to `'open'`, then calls `build()`.
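The state changes described for `finalize()`, `close_exp()` and `reopen_exp()` can be summarised as a small state machine. This is a hypothetical model, not mllabs code: `ToyLifecycle` and `ExperimentClosedError` are invented names, node objects are reduced to a state string, and the refusal to build while closed follows the `close_exp()` description but uses an assumed exception type.

```python
from typing import Dict, Iterable, List, Optional


class ExperimentClosedError(RuntimeError):
    """Hypothetical error type; the exception mllabs actually raises is not documented here."""


class ToyLifecycle:
    """Models status ('open'/'closed') and per-node state ('built'/'finalized')."""

    def __init__(self, stage_nodes: Iterable[str]):
        self.status = "open"
        self.stage_nodes: List[str] = list(stage_nodes)
        self.node_state: Dict[str, str] = {}

    def build(self, nodes: Optional[Iterable[str]] = None) -> None:
        if self.status == "closed":
            raise ExperimentClosedError("experiment is closed; call reopen_exp() first")
        for name in self.stage_nodes if nodes is None else nodes:
            self.node_state[name] = "built"

    def finalize(self, nodes: Iterable[str]) -> None:
        for name in nodes:
            if self.node_state.get(name) == "built":
                # In-memory objects released; disk artifacts would be kept for reloading.
                self.node_state[name] = "finalized"

    def close_exp(self) -> None:
        self.finalize([n for n, s in self.node_state.items() if s == "built"])
        self.status = "closed"

    def reopen_exp(self) -> None:
        self.node_state.clear()  # clear all node objects
        self.status = "open"
        self.build()  # rebuild Stage nodes
```

A typical cycle is `build()`, then `close_exp()` (every node becomes `'finalized'`, further `build()` calls are refused), then `reopen_exp()` (status back to `'open'`, Stage nodes rebuilt from scratch).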
add_collector(collector, exist='skip')
Register a Collector and immediately collect from built Head nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collector` | `Collector` | Collector instance to register. | required |
| `exist` | `str` | | `'skip'` |
Returns:
| Type | Description |
|---|---|
| `Collector` | The registered collector. |
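A minimal sketch of "register, then immediately collect" using plain dicts. `ToyCollector` and `add_collector_sketch` are hypothetical. The meaning of `exist` is not documented above, so the sketch assumes `'skip'` keeps an already-registered collector of the same name and leaves every other value unmodelled.

```python
from typing import Dict


class ToyCollector:
    """Hypothetical collector that stores one result per Head node."""

    def __init__(self, name: str):
        self.name = name
        self.results: Dict[str, object] = {}

    def collect(self, node: str, output: object) -> None:
        self.results[node] = output


def add_collector_sketch(registry: Dict[str, ToyCollector], collector: ToyCollector,
                         built_heads: Dict[str, object], exist: str = "skip") -> ToyCollector:
    existing = registry.get(collector.name)
    if existing is not None:
        if exist == "skip":
            return existing  # assumption: keep the collector already registered
        raise NotImplementedError(f"exist={exist!r} is not modelled in this sketch")
    registry[collector.name] = collector
    # Collect right away from Head nodes that are already built.
    for node, output in built_heads.items():
        collector.collect(node, output)
    return collector
```

Collecting at registration time means a collector added late in an experiment still sees results from Head nodes that were built before it existed.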
get_collector(name)
remove_collector(name)
collect(collector, nodes=None, exist='skip')
Run a Collector ad hoc over already-built Head nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collector` | `Collector` | Collector instance to run. | required |
| `nodes` | | Node query — | `None` |
| `exist` | `str` | | `'skip'` |
Returns:
| Type | Description |
|---|---|
| `Collector` | The same collector after collection. |
add_trainer(name, data=None, splitter='same', splitter_params=None, exist='skip', aug_data=None)
Create and register a Trainer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Trainer name. | required |
| `data` | | Dataset for the Trainer. | `None` |
| `splitter` | | Splitter to use. | `'same'` |
| `splitter_params` | `dict` | Column mappings for the splitter. Must be | `None` |
| `exist` | `str` | | `'skip'` |
Returns:
| Type | Description |
|---|---|
| `Trainer` | The newly created (or existing) Trainer. |
get_trainer(name)
remove_trainer(name)
get_node_output(node, idx, v=None)
get_node_train_output(node, idx, v=None)
get_node_valid_output(node, idx, v=None)
process_ext(data, node, idx)
get_n_splits()
get_n_splits_inner()
desc_status()
desc_spec()
desc_pipeline(max_depth=None, direction='TD')
desc_node(node_name, direction='TD', show_params=False)
mllabs._experimenter.DataCache
LRU cache for Stage node outputs, keyed by `(node, type, fold_idx)`.
Capacity is measured in bytes, using each output's `nbytes` / `memory_usage`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `maxsize` | `int` | Maximum cache capacity in bytes. Default 4 GiB. | `4 * 1024 ** 3` |
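A byte-bounded LRU cache of the kind described above can be sketched with `collections.OrderedDict`. This is not the mllabs implementation. `ByteLRUCache` and `estimate_nbytes` are hypothetical, and the size estimate is a simple assumption: read an `nbytes` attribute (as numpy arrays expose), else call a `memory_usage(deep=True)` method (as pandas objects expose), else fall back to `sys.getsizeof`. Refusing to cache an entry larger than the whole budget is also a choice made only for this sketch.

```python
import sys
from collections import OrderedDict
from typing import Any, Hashable, Tuple


def estimate_nbytes(obj: Any) -> int:
    """Best-effort size in bytes of a cached output."""
    nbytes = getattr(obj, "nbytes", None)  # numpy.ndarray exposes .nbytes
    if isinstance(nbytes, int):
        return nbytes
    memory_usage = getattr(obj, "memory_usage", None)  # pandas DataFrame/Series
    if callable(memory_usage):
        usage = memory_usage(deep=True)  # per-column Series for DataFrame, int for Series
        return int(usage.sum()) if hasattr(usage, "sum") else int(usage)
    return sys.getsizeof(obj)


class ByteLRUCache:
    """LRU cache whose capacity is a byte budget rather than an entry count."""

    def __init__(self, maxsize: int = 4 * 1024 ** 3):
        self.maxsize = maxsize
        self.currsize = 0
        self._data: "OrderedDict[Hashable, Tuple[Any, int]]" = OrderedDict()

    def __contains__(self, key: Hashable) -> bool:
        return key in self._data

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key][0]

    def put(self, key: Hashable, value: Any) -> bool:
        size = estimate_nbytes(value)
        if key in self._data:
            self.currsize -= self._data.pop(key)[1]
        if size > self.maxsize:
            return False  # larger than the whole budget: do not cache
        while self.currsize + size > self.maxsize:
            _, (_, evicted_size) = self._data.popitem(last=False)  # evict least recently used
            self.currsize -= evicted_size
        self._data[key] = (value, size)
        self.currsize += size
        return True
```

Keys would be `(node, type, fold_idx)` tuples such as `("scaler", "train", 0)`; reading an entry with `get()` moves it to the most-recently-used end, so the entries evicted first are the fold outputs that have gone unused longest.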