emloop.hooks

Module with official emloop hooks.

Tip

Hooks listed here may be configured without specifying their fully qualified names. E.g.:

hooks:
  - SaveBest

Classes

  • AbstractHook: emloop hook interface.
  • AccumulateVariables: Accumulate the specified variables allowing their aggregation after each epoch.
  • WriteCSV: Log epoch_data variables to a CSV file after each epoch.
  • StopAfter: Stop the training after any of the specified conditions is met.
  • LogVariables: Log the training results to stderr via standard logging module.
  • LogProfile: Summarize and log epoch profile via standard logging.
  • LogDir: Log the output dir before training, after each epoch and after training.
  • SaveEvery: Save the model every n_epochs epoch.
  • SaveBest: Maintain the best performing model given the specified criteria.
  • SaveLatest: Save the latest model.
  • ComputeStats: Accumulate the specified variables, compute the specified aggregation values and save them to the epoch data.
  • Check: Terminate training if the given stream variable exceeds the threshold within the specified number of epochs.
  • ShowProgress: Show stream progresses and ETA in the current epoch.
  • EveryNEpoch: This hook should be used as a base hook when some action needs to be performed every n epochs.
  • OnPlateau: Base hook for hooks taking actions when certain variable reaches its plateau.
  • StopOnPlateau: Terminate the training when the observed variable reaches its plateau.
  • StopOnNaN: Stop the training when any of the specified variables contain NaN.
  • SaveConfusionMatrix: After each epoch, compute and save/store confusion matrix figure for the predicted and expected labels.
  • Flatten: Flatten a stream variable.
  • PlotLines: Plot sequences of numbers using matplotlib.
  • LogitsToCsv: Save a stream of logits to a csv file.
  • SequenceToCsv: Save a stream of sequences to a csv file.
  • SaveFile: Save files to the output dir before training.
  • Benchmark: Log mean and median example times via standard logging.
  • ClassificationMetrics: Accumulate the specified prediction and gt variables and compute their classification statistics after each epoch.
class emloop.hooks.AbstractHook(**kwargs)

Bases: object

emloop hook interface.

Hook lifecycle (event -> method invocation):

  1. emloop constructs the hooks -> __init__()
  2. emloop enters the main loop -> before_training()
    1. emloop starts an epoch
    2. emloop computes a batch -> after_batch()
    3. emloop finishes the epoch -> after_epoch() and after_epoch_profile()
  3. emloop terminates the main loop -> after_training()
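
The event order listed above can be illustrated with a toy loop. This is a minimal sketch, not emloop.MainLoop: RecordingHook and run_toy_loop are hypothetical names, the batch contents are made up, and numbering epochs from 1 is an assumption.

```python
from typing import Any, List, Mapping, Sequence


class RecordingHook:
    """Toy hook that records the order in which its events fire."""

    def __init__(self) -> None:
        self.calls: List[str] = ["__init__"]

    def before_training(self) -> None:
        self.calls.append("before_training")

    def after_batch(self, stream_name: str, batch_data: Mapping[str, Sequence[Any]]) -> None:
        self.calls.append(f"after_batch({stream_name})")

    def after_epoch(self, epoch_id: int, epoch_data: dict) -> None:
        self.calls.append(f"after_epoch({epoch_id})")

    def after_epoch_profile(self, epoch_id: int, profile: dict, streams: List[str]) -> None:
        self.calls.append(f"after_epoch_profile({epoch_id})")

    def after_training(self) -> None:
        self.calls.append("after_training")


def run_toy_loop(hooks: List[RecordingHook], batches_per_epoch: int = 2, epochs: int = 1) -> None:
    """Hypothetical stand-in for the main loop (assumption, not emloop code)."""
    for hook in hooks:
        hook.before_training()
    try:
        for epoch_id in range(1, epochs + 1):
            epoch_data: dict = {}  # initially empty and shared among all hooks
            for _ in range(batches_per_epoch):
                batch_data = {"loss": [0.5, 0.4]}  # made-up batch
                for hook in hooks:
                    hook.after_batch("train", batch_data)
            for hook in hooks:
                hook.after_epoch(epoch_id=epoch_id, epoch_data=epoch_data)
            for hook in hooks:
                hook.after_epoch_profile(epoch_id=epoch_id, profile={}, streams=["train"])
    finally:
        # fires whether the loop ended naturally or was interrupted
        for hook in hooks:
            hook.after_training()


lifecycle_hook = RecordingHook()
run_toy_loop([lifecycle_hook])
```

After the run, lifecycle_hook.calls holds the events in the order in which they were triggered.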

Caution

Hook naming conventions:

  • hook names should describe hook actions with verb stems. E.g.: LogProfile or SaveBest
  • hook names should not include Hook suffix

__init__(**kwargs)[source]

Check and warn if there is any argument created by the user yet not recognized in the child hook __init__ method.

Parameters:kwargs – **kwargs not recognized in the child hook
after_batch(stream_name, batch_data)[source]

After batch event.

This event is triggered after every processed batch regardless of stream type. Batch results are available in the batch_data argument.

Parameters:
  • stream_name (str) – name of the stream (usually train, valid or test)
  • batch_data (Mapping[str, Sequence[Any]]) – batch inputs and model outputs
Return type:

None

after_epoch(epoch_id, epoch_data)[source]

After epoch event.

This event is triggered after every epoch wherein all the streams were iterated. The epoch_data object is initially empty and shared among all the hooks.

Parameters:
  • epoch_id (int) – finished epoch id
  • epoch_data (Mapping[str, object]) – epoch data flowing through all hooks
Return type:

None

after_epoch_profile(epoch_id, profile, streams)[source]

After epoch profile event.

This event provides opportunity to process time profile of the finished epoch.

Parameters:
  • epoch_id (int) – finished epoch id
  • streams (List[str]) – streams which are in profile
Return type:

None

after_training()[source]

After training event.

This event is triggered after the training has finished, either naturally or due to an interrupt.

Note

This method is called exactly once during the training.

Return type:None
before_training()[source]

Before training event.

No data were processed at this moment.

Note

This method is called exactly once during the training.

Return type:None
register_mainloop(main_loop)[source]

Pass emloop.MainLoop to hook. Raise ValueError if MainLoop was already passed before.

Parameters:main_loop (emloop.MainLoop) – emloop main loop for training
Raises:ValueError – if MainLoop was already passed before
Return type:None
class emloop.hooks.AccumulateVariables(variables, **kwargs)[source]

Bases: hooks.AbstractHook

Accumulate the specified variables allowing their aggregation after each epoch.

The hook itself does not utilize the accumulated variables. It is meant to be inherited from. The child hook will have the accumulated variables available in self._accumulator after each epoch.

The data are accumulated in the form of a nested mapping stream_name -> variable_name -> Iterable[values].

Warning

This hook should not be used directly as it does nothing on its own.


__init__(variables, **kwargs)[source]

Create new AccumulateVariables hook.

Parameters:variables (Iterable[str]) – collection of variable names to be logged
_reset_accumulator()[source]

Set the accumulator to an empty double-index collections.defaultdict.

after_batch(stream_name, batch_data)[source]

Extend the accumulated variables with the given batch data.

Parameters:
  • stream_name (str) – stream name; e.g. train or any other…
  • batch_data (Mapping[str, Sequence[Any]]) – batch data = stream sources + model outputs
Raises:
  • KeyError – if the variables to be aggregated are missing
  • TypeError – if the variable value is not iterable (e.g. it is only a scalar)
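
The accumulation described above can be sketched with a double-index collections.defaultdict. ToyAccumulator is a hypothetical stand-in written for illustration; it mirrors the documented behaviour (KeyError on a missing variable, TypeError on a non-iterable value) but is not the emloop implementation.

```python
from collections import defaultdict
from collections.abc import Iterable as IterableABC
from typing import Any, Iterable, Mapping, Sequence


class ToyAccumulator:
    """Hypothetical sketch of the accumulate-per-stream idea."""

    def __init__(self, variables: Iterable[str]) -> None:
        self._variables = list(variables)
        self._reset_accumulator()

    def _reset_accumulator(self) -> None:
        # stream_name -> variable_name -> list of accumulated values
        self._accumulator = defaultdict(lambda: defaultdict(list))

    def after_batch(self, stream_name: str, batch_data: Mapping[str, Sequence[Any]]) -> None:
        for variable in self._variables:
            if variable not in batch_data:
                raise KeyError(f"Variable `{variable}` is missing in stream `{stream_name}`.")
            value = batch_data[variable]
            if not isinstance(value, IterableABC):
                raise TypeError(f"Variable `{variable}` is not iterable.")
            self._accumulator[stream_name][variable].extend(value)

    def after_epoch(self, **_: Any) -> None:
        # a child hook would read self._accumulator before this reset
        self._reset_accumulator()


toy_accumulator = ToyAccumulator(["loss"])
toy_accumulator.after_batch("train", {"loss": [0.5, 0.4], "accuracy": [1, 0]})
toy_accumulator.after_batch("train", {"loss": [0.3]})
accumulated_loss = list(toy_accumulator._accumulator["train"]["loss"])
```

Only the variables named in the constructor are kept; the accuracy values in the first batch are ignored.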
after_epoch(**_)[source]

Reset the accumulator after each epoch.

class emloop.hooks.WriteCSV(output_dir, output_file='training.csv', delimiter=', ', default_value='', variables=None, on_unknown_type='default', on_missing_variable='error', **kwargs)[source]

Bases: hooks.AbstractHook

Log epoch_data variables to a CSV file after each epoch.

Log all the variables
hooks:
  - WriteCSV
Log only certain variables
hooks:
  - WriteCSV:
      variables: [loss, fscore]
Warn about unsupported variables
hooks:
  - WriteCSV:
      variables: [loss, fscore, xxx]
      on_unknown_type: warn

MISSING_VARIABLE_ACTIONS = ['error', 'warn', 'default']

Action executed on missing variable.

UNKNOWN_TYPE_ACTIONS = ['error', 'warn', 'default']

Action executed on unknown type detection.

__init__(output_dir, output_file='training.csv', delimiter=', ', default_value='', variables=None, on_unknown_type='default', on_missing_variable='error', **kwargs)[source]
Parameters:
  • output_dir (str) – directory to save the output CSV
  • output_file (str) – name of the output CSV file
  • delimiter (str) – CSV delimiter
  • default_value (str) – default value to supplement missing variables
  • variables (Optional[Iterable[str]]) – subset of variable names to be written (all the variables are written by default)
  • on_unknown_type (str) – an action to be taken if the variable value type is not supported (e.g. a list)
  • on_missing_variable (str) – an action to be taken if the variable is specified but not provided
_write_header(epoch_data)[source]

Write CSV header row with column names.

Column names are inferred from the epoch_data and self.variables (if specified). Variables and streams expected later on are stored in self._variables and self._streams respectively.

Parameters:epoch_data (Mapping[str, object]) – epoch data to be logged
Return type:None
_write_row(epoch_id, epoch_data)[source]

Write a single epoch result row to the CSV file.

Parameters:
  • epoch_id (int) – epoch number (will be written at the first column)
  • epoch_data (Mapping[str, object]) – epoch data
Raises:
  • KeyError – if the variable is missing and self._on_missing_variable is set to error
  • TypeError – if the variable has wrong type and self._on_unknown_type is set to error
Return type:

None

after_epoch(epoch_id, epoch_data)[source]

Write a new row to the CSV file with the given epoch data.

In the case of first invocation, create the CSV header.

Parameters:
  • epoch_id (int) – number of the epoch
  • epoch_data (Mapping[str, object]) – epoch data to be logged
Return type:

None
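
The header-on-first-invocation behaviour can be sketched with the standard csv module. ToyCsvWriter is hypothetical; the epoch_data layout (stream -> variable -> value) and the stream_variable column naming are assumptions made for this example only.

```python
import csv
import io
from typing import List, Mapping, TextIO, Tuple


class ToyCsvWriter:
    """Hypothetical sketch: write the header once, then one row per epoch."""

    def __init__(self, target: TextIO, delimiter: str = ",") -> None:
        self._writer = csv.writer(target, delimiter=delimiter, lineterminator="\n")
        self._columns: List[Tuple[str, str]] = []
        self._header_written = False

    def after_epoch(self, epoch_id: int, epoch_data: Mapping[str, Mapping[str, object]]) -> None:
        if not self._header_written:
            # infer the columns from the first epoch and remember them
            self._columns = [(stream, variable)
                             for stream in epoch_data
                             for variable in epoch_data[stream]]
            header = ["epoch_id"] + [f"{stream}_{variable}" for stream, variable in self._columns]
            self._writer.writerow(header)
            self._header_written = True
        # a variable missing in a later epoch raises KeyError here
        row = [epoch_id] + [epoch_data[stream][variable] for stream, variable in self._columns]
        self._writer.writerow(row)


csv_buffer = io.StringIO()
toy_csv_writer = ToyCsvWriter(csv_buffer)
toy_csv_writer.after_epoch(1, {"train": {"loss": 0.5}, "valid": {"loss": 0.7}})
toy_csv_writer.after_epoch(2, {"train": {"loss": 0.4}, "valid": {"loss": 0.6}})
csv_text = csv_buffer.getvalue()
```

An in-memory buffer replaces the output file here so that the sketch has no side effects.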

class emloop.hooks.StopAfter(epochs=None, iterations=None, minutes=None, train_stream_name='train', **kwargs)[source]

Bases: hooks.AbstractHook

Stop the training after any of the specified conditions is met.

stop the training after 500 epochs
hooks:
  - StopAfter:
      epochs: 500
stop the training after 1000 iterations or 1 hour, whichever comes first
hooks:
  - StopAfter:
      minutes: 60
      iterations: 1000

__init__(epochs=None, iterations=None, minutes=None, train_stream_name='train', **kwargs)[source]

Create new StopAfter hook.

Possible stopping conditions are:

  • after the specified number of epochs
  • after the specified number of iterations (only train stream batches are counted as iterations)
  • after the model is trained for more than the specified number of minutes (checked whenever an after_batch or after_epoch event is triggered)
Parameters:
  • epochs (Optional[int]) – stop after the specified number of epochs
  • iterations (Optional[int]) – stop after the specified number of iterations
  • minutes (Optional[float]) – stop after the specified number of minutes
Raises:

ValueError – if no stopping condition is specified
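
The three stopping conditions can be sketched as follows. ToyStopAfter and the local TrainingTerminated class are hypothetical stand-ins, the injectable clock parameter exists only to make the sketch testable, and the comparison operators follow the wording "reaches" and "exceeded" used in this reference.

```python
import time
from typing import Callable, Optional


class TrainingTerminated(Exception):
    """Local stand-in for the exception named in this reference."""


class ToyStopAfter:
    """Hypothetical sketch of the stopping conditions described above."""

    def __init__(self, epochs: Optional[int] = None, iterations: Optional[int] = None,
                 minutes: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if epochs is None and iterations is None and minutes is None:
            raise ValueError("At least one stopping condition must be specified.")
        self._epochs, self._iters, self._minutes = epochs, iterations, minutes
        self._clock = clock  # assumption: injectable clock, seconds
        self._iters_done = 0
        self._start: Optional[float] = None

    def before_training(self) -> None:
        self._start = self._clock()  # start measuring the train time

    def _check_train_time(self) -> None:
        if self._minutes is None or self._start is None:
            return
        if (self._clock() - self._start) / 60 > self._minutes:
            raise TrainingTerminated("time limit exceeded")

    def after_batch(self, stream_name: str, batch_data: object = None) -> None:
        if stream_name == "train":  # only train batches count as iterations
            self._iters_done += 1
            if self._iters is not None and self._iters_done >= self._iters:
                raise TrainingTerminated("iteration limit reached")
        self._check_train_time()

    def after_epoch(self, epoch_id: int, epoch_data: object = None) -> None:
        if self._epochs is not None and epoch_id >= self._epochs:
            raise TrainingTerminated("epoch limit reached")
        self._check_train_time()


stop_reason = None
toy_stopper = ToyStopAfter(iterations=2)
toy_stopper.before_training()
toy_stopper.after_batch("valid")  # not counted
toy_stopper.after_batch("train")  # iteration 1
try:
    toy_stopper.after_batch("train")  # iteration 2 reaches the limit
except TrainingTerminated as ex:
    stop_reason = str(ex)
```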

_check_train_time()[source]

Stop the training if the training time exceeded self._minutes.

Raises:TrainingTerminated – if the training time exceeded self._minutes
Return type:None
after_batch(stream_name, batch_data)[source]

If stream_name equals emloop.constants.TRAIN_STREAM, increase the iterations counter and possibly stop the training; additionally, call _check_train_time().

Parameters:
Raises:

TrainingTerminated – if the number of iterations reaches self._iters

Return type:

None

after_epoch(epoch_id, epoch_data)[source]

Stop the training if the epoch_id reaches self._epochs; additionally, call _check_train_time().

Parameters:
Raises:

TrainingTerminated – if the epoch_id reaches self._epochs

Return type:

None

before_training()[source]

Start measuring the train time.

class emloop.hooks.LogVariables(variables=None, on_unknown_type='ignore', **kwargs)[source]

Bases: hooks.AbstractHook

Log the training results to stderr via standard logging module.

log all the variables
hooks:
  - LogVariables
log only certain variables
hooks:
  - LogVariables:
      variables: [loss]
warn about unsupported variables
hooks:
  - LogVariables:
      on_unknown_type: warn

UNKNOWN_TYPE_ACTIONS = ['error', 'warn', 'str', 'ignore']

Possible actions to take on unknown variable type.

__init__(variables=None, on_unknown_type='ignore', **kwargs)[source]

Create new LogVariables hook.

Parameters:
  • variables (Optional[Iterable[str]]) – variable names to be logged; log all the variables by default
  • on_unknown_type – an action to be taken if the variable type is not supported (e.g. a list)
_log_variables(epoch_data)[source]

Log variables from the epoch data.

Warning

At the moment, only scalars and dicts of scalars are properly formatted and logged. Other value types are ignored by default.

One may set on_unknown_type to str in order to log all the variables anyway.

Parameters:

epoch_data (Mapping[str, object]) – epoch data to be logged

Raises:
  • KeyError – if the specified variable is not found in the stream
  • TypeError – if the variable value is of unsupported type and self._on_unknown_type is set to error
after_epoch(epoch_id, epoch_data)[source]

Log the epoch data via logging API. Additionally, a blank line is printed directly to stderr to delimit the outputs from other epochs.

Parameters:
  • epoch_id (int) – number of processed epoch
  • epoch_data (Mapping[str, object]) – epoch data to be logged
Return type:

None

class emloop.hooks.LogProfile(**kwargs)[source]

Bases: hooks.AbstractHook

Summarize and log epoch profile via standard logging.

Epoch profile contains info about time spent training, reading data etc. For full reference, see emloop.MainLoop.

log the time profile after each epoch
hooks:
  - LogProfile

after_epoch_profile(epoch_id, profile, streams)[source]

Summarize and log the given epoch profile.

The profile is expected to contain at least:
  • read_data_train, eval_batch_train and after_batch_hooks_train entries produced by the train stream (if train stream name is train)
  • after_epoch_hooks entry
Parameters:
  • profile (Mapping[str, List[float]]) – epoch timings profile
  • streams (List[str]) – streams for which profiling times will be printed
Return type:

None

class emloop.hooks.LogDir(output_dir, **kwargs)[source]

Bases: hooks.AbstractHook

Log the output dir before training, after each epoch and after training.

log the training dir
hooks:
  - LogDir

__init__(output_dir, **kwargs)[source]

Create new LogDir hook.

Parameters:output_dir (str) – training output directory
after_epoch(**_)[source]

Log the output directory.

Return type:None
after_training()[source]

Log the output directory.

Return type:None
before_training()[source]

Log the output directory.

Return type:None
class emloop.hooks.SaveEvery(model, on_failure='error', **kwargs)[source]

Bases: emloop.hooks.every_n_epoch.EveryNEpoch

Save the model every n_epochs epoch.

save every 10th epoch
hooks:
  - SaveEvery:
      n_epochs: 10
save every epoch and only warn on failure
hooks:
  - SaveEvery:
      on_failure: warn

SAVE_FAILURE_ACTIONS = ['error', 'warn', 'ignore']

Action to be executed when model save fails.

__init__(model, on_failure='error', **kwargs)[source]
Parameters:
_after_n_epoch(epoch_id, **_)[source]

Save the model every n_epochs epoch.

Parameters:epoch_id (int) – number of the processed epoch
Return type:None
static save_model(model, name_suffix, on_failure)[source]

Save the given model with the given name_suffix. On failure, take the specified action.

Parameters:
Raises:

IOError – on save failure with on_failure set to error

Return type:

None

class emloop.hooks.SaveBest(model, model_name='best', variable='loss', condition='min', stream='valid', aggregation='mean', on_save_failure='error', **kwargs)[source]

Bases: hooks.AbstractHook

Maintain the best performing model given the specified criteria.

save model with minimal valid loss
hooks:
  - SaveBest
save model with max accuracy
hooks:
  - SaveBest:
      variable: accuracy
      condition: max

OBJECTIVES = {'max', 'min'}

Possible objectives for the monitor variable.

__init__(model, model_name='best', variable='loss', condition='min', stream='valid', aggregation='mean', on_save_failure='error', **kwargs)[source]

Example: variable=loss, condition=min -> save the model whenever the loss on the monitored stream is the lowest so far.

Parameters:
  • model (AbstractModel) – trained model
  • model_name (str) – name under which model will be saved
  • variable (str) – variable name to be monitored
  • condition (str) – performance objective; one of OBJECTIVES
  • stream (str) – stream name to be monitored
  • aggregation (str) – variable aggregation to be used (mean by default)
  • on_save_failure (str) – action to be taken when model fails to save itself, one of SaveEvery.SAVE_FAILURE_ACTIONS
_get_value(epoch_data)[source]

Retrieve the value of the monitored variable from the given epoch data.

Parameters:

epoch_data (Mapping[str, object]) – epoch data which determine whether the model will be saved or not

Raises:
  • KeyError – if any of the specified stream, variable or aggregation is not present in the epoch_data
  • TypeError – if the variable value is not a dict when aggregation is specified
  • ValueError – if the variable value is not a scalar
Return type:

float

_is_value_better(new_value)[source]

Test if the new value is better than the best so far.

Parameters:new_value (float) – current value of the objective function
Return type:bool
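
The comparison against the best value so far can be sketched as below. ToyBestTracker and its update method are hypothetical; treating the very first value as an improvement and using strict comparisons are assumptions.

```python
from typing import Optional


class ToyBestTracker:
    """Hypothetical sketch of tracking the best value of a monitored variable."""

    OBJECTIVES = {"max", "min"}

    def __init__(self, condition: str = "min") -> None:
        if condition not in self.OBJECTIVES:
            raise ValueError(f"Unknown condition `{condition}`.")
        self._condition = condition
        self._best_value: Optional[float] = None

    def _is_value_better(self, new_value: float) -> bool:
        if self._best_value is None:  # assumption: the first value always counts
            return True
        if self._condition == "min":
            return new_value < self._best_value
        return new_value > self._best_value

    def update(self, new_value: float) -> bool:
        """Return True when a model save would be triggered."""
        if self._is_value_better(new_value):
            self._best_value = new_value
            return True
        return False


loss_tracker = ToyBestTracker(condition="min")
save_decisions = [loss_tracker.update(value) for value in [0.9, 0.7, 0.8, 0.6]]
```

With condition min, the third value (0.8) does not trigger a save because 0.7 has already been seen.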
after_epoch(epoch_data, **_)[source]

Save the model if the new value of the monitored variable is better than the best value so far.

Parameters:epoch_data (Mapping[str, object]) – epoch data to be processed
Return type:None
class emloop.hooks.SaveLatest(model, on_save_failure='error', **kwargs)[source]

Bases: hooks.AbstractHook

Save the latest model.

save the latest model
hooks:
  - SaveLatest

__init__(model, on_save_failure='error', **kwargs)[source]

Create new SaveLatest hook.

Parameters:
after_epoch(**_)[source]

Save/override the latest model after every epoch.

Return type:None
class emloop.hooks.ComputeStats(variables, **kwargs)[source]

Bases: emloop.hooks.accumulate_variables.AccumulateVariables

Accumulate the specified variables, compute the specified aggregation values and save them to the epoch data.

compute loss and accuracy means after each epoch
hooks:
  - ComputeStats:
      variables: [loss, accuracy]
compute min and max loss after each epoch
hooks:
  - ComputeStats:
      variables:
        - loss: [min, max]

EXTRA_AGGREGATIONS = {'nancount', 'nanfraction'}

Extra aggregation methods extending the set of all NumPy functions.

__init__(variables, **kwargs)[source]

Create new stats hook.

Parameters:
  • variables – list of variables mapping: variable_name -> List [aggregations…] wherein aggregations are the names of arbitrary NumPy functions returning a scalar (e.g., 'mean', 'nanmean', 'max', etc.) or one of EXTRA_AGGREGATIONS. Passing just the variable name instead of a mapping is the same as passing {variable_name: ['mean']}.
  • kwargs – Ignored
Raises:

ValueError – if the specified aggregation function is not supported

static _compute_aggregation(aggregation, data)[source]

Compute the specified aggregation on the given data.

Parameters:
  • aggregation (str) – the name of an arbitrary NumPy function (e.g., mean, max, median, nanmean, …) or one of EXTRA_AGGREGATIONS.
  • data (Iterable[Any]) – data to be aggregated
Raises:

ValueError – if the specified aggregation is not supported or found in NumPy

static _raise_check_aggregation(aggregation)[source]

Check whether the given aggregation is present in NumPy or it is one of EXTRA_AGGREGATIONS.

Parameters:aggregation (str) – the aggregation name
Raises:ValueError – if the specified aggregation is not supported or found in NumPy
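
The check-then-compute pattern can be sketched with the standard library only. In this sketch a small lookup table stands in for the NumPy function lookup, and the meaning of nancount (number of NaN values) and nanfraction (share of NaN values) is an assumption inferred from the names; all function names are hypothetical.

```python
import math
import statistics
from typing import Any, Callable, Dict, Iterable

EXTRA_AGGREGATIONS = {"nancount", "nanfraction"}

# Stand-in for looking the name up in NumPy (assumption made to stay stdlib-only).
BASIC_AGGREGATIONS: Dict[str, Callable[[list], Any]] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def raise_check_aggregation(aggregation: str) -> None:
    """Raise ValueError if the aggregation name is not supported."""
    if aggregation not in EXTRA_AGGREGATIONS and aggregation not in BASIC_AGGREGATIONS:
        raise ValueError(f"Aggregation `{aggregation}` is not supported.")


def compute_aggregation(aggregation: str, data: Iterable[float]) -> Any:
    """Compute the named aggregation on the given data."""
    raise_check_aggregation(aggregation)
    values = list(data)
    if not values:
        raise ValueError("Cannot aggregate empty data.")
    if aggregation in EXTRA_AGGREGATIONS:
        nan_count = sum(1 for value in values if math.isnan(value))
        if aggregation == "nancount":
            return nan_count
        return nan_count / len(values)  # nanfraction
    return BASIC_AGGREGATIONS[aggregation](values)


loss_values = [1.0, float("nan"), float("nan"), 4.0]
loss_nan_count = compute_aggregation("nancount", loss_values)
loss_nan_fraction = compute_aggregation("nanfraction", loss_values)
clean_mean = compute_aggregation("mean", [1.0, 2.0, 3.0])
```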
_save_stats(epoch_data)[source]

Extend epoch_data by stream:variable:aggregation data.

Parameters:epoch_data (Mapping[str, object]) – data source from which the statistics are computed
Return type:None
after_epoch(epoch_data, **kwargs)[source]

Compute the specified aggregations and save them to the given epoch data.

Parameters:epoch_data (Mapping[str, object]) – epoch data to be processed
Return type:None
class emloop.hooks.Check(variable, required_min_value, max_epoch, stream='valid', **kwargs)[source]

Bases: hooks.AbstractHook

Terminate training if the given stream variable exceeds the threshold within the specified number of epochs.

Raise ValueError if the threshold is not exceeded within the given number of epochs.

exceed 93% accuracy on valid (default) stream within at most 10 epochs
hooks:
  - Check:
      variable: accuracy
      required_min_value: 0.93
      max_epoch: 10

__init__(variable, required_min_value, max_epoch, stream='valid', **kwargs)[source]

Create new Check hook.

Parameters:
  • variable (str) – variable to be checked
  • required_min_value (float) – threshold to be exceeded
  • max_epoch (int) – maximum epochs to be run
  • stream (str) – stream to be checked
after_epoch(epoch_id, epoch_data)[source]

Check termination conditions.

Parameters:
  • epoch_id (int) – number of the processed epoch
  • epoch_data (Mapping[str, object]) – epoch data to be checked
Raises:
  • KeyError – if the stream or variable was not found in epoch_data
  • TypeError – if the monitored variable is not a scalar or scalar mean aggregation
  • ValueError – if the specified number of epochs is exceeded
  • TrainingTerminated – if the monitor variable is above the required level
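
The termination conditions listed above can be sketched as a single function. check_after_epoch and the local TrainingTerminated class are hypothetical; the strict comparison against the threshold and the epoch_id >= max_epoch test are assumptions about details this reference does not spell out.

```python
from typing import Mapping


class TrainingTerminated(Exception):
    """Local stand-in for the exception named in this reference."""


def check_after_epoch(epoch_id: int, epoch_data: Mapping[str, Mapping[str, object]],
                      variable: str, required_min_value: float, max_epoch: int,
                      stream: str = "valid") -> None:
    """Hypothetical sketch of the documented checks, in the order they are listed."""
    if stream not in epoch_data:
        raise KeyError(f"Stream `{stream}` not found in epoch data.")
    if variable not in epoch_data[stream]:
        raise KeyError(f"Variable `{variable}` not found in stream `{stream}`.")

    value = epoch_data[stream][variable]
    if isinstance(value, dict) and "mean" in value:
        value = value["mean"]  # scalar mean aggregation
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"Variable `{variable}` is not a scalar.")

    if value > required_min_value:  # assumption: strictly above the threshold
        raise TrainingTerminated("required value reached")
    if epoch_id >= max_epoch:  # assumption: last allowed epoch has finished
        raise ValueError("required value not reached in time")


check_outcomes = []
for epoch, accuracy in [(1, 0.80), (2, 0.90), (3, 0.95)]:
    try:
        check_after_epoch(epoch, {"valid": {"accuracy": {"mean": accuracy}}},
                          variable="accuracy", required_min_value=0.93, max_epoch=10)
        check_outcomes.append("continue")
    except TrainingTerminated:
        check_outcomes.append("terminated")
```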
class emloop.hooks.ShowProgress(dataset, **kwargs)[source]

Bases: hooks.AbstractHook

Show stream progresses and ETA in the current epoch.

Tip

If the dataset provides num_batches property, the hook will be able to display the progress and ETA for the 1st epoch as well. The property should return a mapping of <stream name> -> <batch count>.
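
A dataset exposing such a property might look like the sketch below. ToyDataset, its constructor arguments and the train_stream/valid_stream method names are hypothetical; only the num_batches property returning a mapping of stream name to batch count comes from the tip above.

```python
import math
from typing import Dict, Iterator, List, Mapping


class ToyDataset:
    """Hypothetical dataset with a num_batches property."""

    def __init__(self, train_examples: int = 10, valid_examples: int = 3,
                 batch_size: int = 2) -> None:
        self._sizes = {"train": train_examples, "valid": valid_examples}
        self._batch_size = batch_size

    def _stream(self, name: str) -> Iterator[Dict[str, List[int]]]:
        # yield batches of example ids; the last batch may be smaller
        ids = list(range(self._sizes[name]))
        for start in range(0, len(ids), self._batch_size):
            yield {"ids": ids[start:start + self._batch_size]}

    def train_stream(self) -> Iterator[Dict[str, List[int]]]:
        return self._stream("train")

    def valid_stream(self) -> Iterator[Dict[str, List[int]]]:
        return self._stream("valid")

    @property
    def num_batches(self) -> Mapping[str, int]:
        # <stream name> -> <batch count>
        return {name: math.ceil(size / self._batch_size)
                for name, size in self._sizes.items()}


toy_dataset = ToyDataset()
batch_counts = dict(toy_dataset.num_batches)
```

Because the counts are known up front, a progress display would not need to wait for the first epoch to learn the stream sizes.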

Caution

ShowProgress hook should be placed as the first in hooks config section, otherwise the progress bar may not be displayed correctly.

show progress of the current epoch
hooks:
  - ShowProgress

__init__(dataset, **kwargs)[source]

Create new ShowProgress hook.

Fetch the batch counts from dataset.num_batches property if available.

Parameters:dataset (AbstractDataset) – training dataset
after_batch(stream_name, batch_data)[source]

Display the progress and ETA for the current stream in the epoch. If the stream size (total batch count) is unknown (1st epoch), print only the number of processed batches.

Return type:None
after_epoch(**_)[source]

Reset progress counters. Save total_batch_count after the 1st epoch.

Return type:None
class emloop.hooks.EveryNEpoch(n_epochs=1, **kwargs)[source]

Bases: hooks.AbstractHook

This hook should be used as a base hook when some action needs to be performed every n epochs. Call _after_n_epoch method every n_epochs epoch.


__init__(n_epochs=1, **kwargs)[source]

Create EveryNEpoch hook.

Parameters:n_epochs (int) – how often _after_n_epoch method is called
_after_n_epoch()[source]

Abstract method which is called every n_epochs epoch. This method must be overridden.

Raises:TypeError – if this method is not overridden
after_epoch(epoch_id, **kwargs)[source]

Call _after_n_epoch method every n_epochs epoch.

Parameters:epoch_id (int) – number of the processed epoch
Return type:None
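
The base-hook pattern can be sketched as follows. ToyEveryNEpoch and RecordEveryThird are hypothetical classes; deciding with epoch_id % n_epochs == 0 is an assumption about how "every n_epochs epoch" is evaluated.

```python
from typing import Any, List


class ToyEveryNEpoch:
    """Hypothetical sketch of a base hook acting every n_epochs epoch."""

    def __init__(self, n_epochs: int = 1) -> None:
        self._n_epochs = n_epochs

    def _after_n_epoch(self, epoch_id: int, **kwargs: Any) -> None:
        # must be overridden by the child hook
        raise TypeError("_after_n_epoch must be overridden.")

    def after_epoch(self, epoch_id: int, **kwargs: Any) -> None:
        if epoch_id % self._n_epochs == 0:  # assumption: modulo-based schedule
            self._after_n_epoch(epoch_id=epoch_id, **kwargs)


class RecordEveryThird(ToyEveryNEpoch):
    """Child hook that records the epochs on which it fired."""

    def __init__(self) -> None:
        super().__init__(n_epochs=3)
        self.fired: List[int] = []

    def _after_n_epoch(self, epoch_id: int, **_: Any) -> None:
        self.fired.append(epoch_id)


every_third = RecordEveryThird()
for finished_epoch in range(1, 10):
    every_third.after_epoch(epoch_id=finished_epoch, epoch_data={})
fired_epochs = every_third.fired
```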
class emloop.hooks.OnPlateau(long_term=50, short_term=10, stream='valid', variable='loss', objective='min', **kwargs)[source]

Bases: emloop.hooks.compute_stats.ComputeStats

Base hook for hooks taking actions when certain variable reaches its plateau. The variable is observed on epoch level and plateau is reached when its long_term mean is lower/greater than the short_term mean.

Call _on_plateau_action() method when the observed variable reaches its plateau.


OBJECTIVES = {'max', 'min'}

Possible objectives for the observed variable.

_AGGREGATION = 'mean'

Epoch aggregation method of the observed variable.

__init__(long_term=50, short_term=10, stream='valid', variable='loss', objective='min', **kwargs)[source]

Create new OnPlateau hook.

Parameters:
  • long_term (int) – count of last epochs representing long training period
  • short_term (int) – count of last epochs representing short training period
  • stream (str) – name of the processed stream
  • variable (str) – name of the observed variable
  • objective (str) – observed variable objective; one of OnPlateau.OBJECTIVES
  • kwargs – ignored
Raises:

AssertionError – if long_term < short_term

_on_plateau_action(**kwargs)[source]

Abstract method which is called when the observed variable reaches its plateau.

Return type:None
after_epoch(epoch_id, epoch_data)[source]

Call _on_plateau_action() if the long_term variable mean is lower/greater than the short_term mean.

Return type:None
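
The plateau test can be sketched with a bounded history of epoch means. ToyPlateauDetector is hypothetical, and waiting until long_term values have been collected before testing is an assumption; the comparison itself follows the description above (for objective min, a plateau means the long_term mean is lower than the short_term mean).

```python
from collections import deque
from statistics import mean


class ToyPlateauDetector:
    """Hypothetical sketch of the long-term vs. short-term mean comparison."""

    OBJECTIVES = {"max", "min"}

    def __init__(self, long_term: int = 50, short_term: int = 10,
                 objective: str = "min") -> None:
        assert long_term >= short_term, "long_term must not be shorter than short_term"
        if objective not in self.OBJECTIVES:
            raise ValueError(f"Unknown objective `{objective}`.")
        self._long_term = long_term
        self._short_term = short_term
        self._objective = objective
        self._history = deque(maxlen=long_term)  # keeps the last long_term values

    def update(self, epoch_mean: float) -> bool:
        """Add one epoch-level value; return True when a plateau is detected."""
        self._history.append(epoch_mean)
        if len(self._history) < self._long_term:  # assumption: wait for enough data
            return False
        values = list(self._history)
        long_mean = mean(values)
        short_mean = mean(values[-self._short_term:])
        if self._objective == "min":
            return long_mean < short_mean
        return long_mean > short_mean


plateau_detector = ToyPlateauDetector(long_term=4, short_term=2, objective="min")
plateau_flags = [plateau_detector.update(loss) for loss in [1.0, 0.8, 0.6, 0.4, 0.9, 1.0]]
```

In this run the loss falls for four epochs and then rises again, so only the last update reports a plateau.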
class emloop.hooks.StopOnPlateau(long_term=50, short_term=10, stream='valid', variable='loss', objective='min', **kwargs)[source]

Bases: emloop.hooks.on_plateau.OnPlateau

Terminate the training when the observed variable reaches its plateau.

stop the training when the mean of last 100 valid loss values is smaller than the mean of last 30 loss values.
hooks:
  - StopOnPlateau:
      long_term: 100
      short_term: 30
stop the training when accuracy stops improving (rising)
hooks:
  - StopOnPlateau:
      variable: accuracy
      objective: max

_on_plateau_action(**kwargs)[source]

Terminate the training when the observed variable reaches its plateau.

Raises:TrainingTerminated – if the model stops improving
Return type:None
class emloop.hooks.StopOnNaN(variables=None, on_unknown_type='ignore', stop_on_inf=False, after_batch=False, after_epoch=True, **kwargs)[source]

Bases: hooks.AbstractHook

Stop the training when any of the specified variables contain NaN.

stop as soon as any variable contains NaN
hooks:
  - StopOnNaN
stop on NaN in loss variable
hooks:
  - StopOnNaN:
      variables: [loss]

UNKNOWN_TYPE_ACTIONS = ['error', 'warn', 'ignore']

Possible actions to take on unknown variable type.

__init__(variables=None, on_unknown_type='ignore', stop_on_inf=False, after_batch=False, after_epoch=True, **kwargs)[source]

Create new StopOnNaN hook.

Parameters:
  • variables (Optional[Iterable[str]]) – variable names to be checked; check all variables in epoch_data by default
  • on_unknown_type – option for handling unknown data types, possible options are 'warn', 'error' and default 'ignore'
  • stop_on_inf (bool) – if True consider infinity values as NaN, default is False
  • after_batch (bool) – check data after each batch? default is False
  • after_epoch (bool) – check data after each epoch? default is True
Raises:
_check_nan(epoch_data)[source]

Raise an exception when some of the monitored data is NaN.

Parameters:

epoch_data (Mapping[str, object]) – epoch data checked

Raises:
  • KeyError – if the specified variable is not found in the stream
  • ValueError – if the variable value is of unsupported type and self._on_unknown_type is set to error
Return type:

None

_is_nan(variable, data)[source]

Recursively search passed data and find NaNs.

Parameters:
  • variable (str) – name of variable to be checked
  • data – data object (dict, list, scalar)
Return type:

bool

Returns:

True if there is a NaN value in the data; False otherwise.

Raises:

ValueError – if the variable value is of unsupported type and on_unknown_type is set to error
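
The recursive search over dicts, lists and scalars can be sketched as below. contains_nan is a hypothetical function name; the set of container types it descends into and the use of the logging module for the warn action are assumptions.

```python
import logging
import math
from typing import Any


def contains_nan(data: Any, stop_on_inf: bool = False,
                 on_unknown_type: str = "ignore") -> bool:
    """Hypothetical sketch: return True if any nested value is NaN (or inf)."""
    if isinstance(data, dict):
        return any(contains_nan(value, stop_on_inf, on_unknown_type)
                   for value in data.values())
    if isinstance(data, (list, tuple)):
        return any(contains_nan(value, stop_on_inf, on_unknown_type)
                   for value in data)
    if isinstance(data, (int, float)):
        value = float(data)
        return math.isnan(value) or (stop_on_inf and math.isinf(value))
    # anything else is an unknown type
    if on_unknown_type == "error":
        raise ValueError(f"Unsupported type `{type(data).__name__}`.")
    if on_unknown_type == "warn":
        logging.getLogger(__name__).warning("Unsupported type `%s`.", type(data).__name__)
    return False


nested_epoch_data = {"train": {"loss": [0.1, float("nan")]}, "valid": {"loss": 0.2}}
has_nan = contains_nan(nested_epoch_data)
inf_ignored = contains_nan([1.0, float("inf")])
inf_detected = contains_nan([1.0, float("inf")], stop_on_inf=True)
```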

after_batch(stream_name, batch_data)[source]

If initialized to check after each batch, stop the training once the batch data contains a monitored variable equal to NaN.

Parameters:
  • stream_name (str) – name of the stream to be checked
  • batch_data – batch data to be checked
Return type:

None

after_epoch(epoch_data, **kwargs)[source]

If initialized to check after each epoch, stop the training once the epoch data contains a monitored variable equal to NaN.

Parameters:epoch_data (Mapping[str, object]) – epoch data to be checked
Return type:None
class emloop.hooks.SaveConfusionMatrix(output_dir, dataset, labels_name='labels', predictions_name='predictions', classes_names=None, figsize=None, figure_action='save', num_classes_method_name='num_classes', classes_names_method_name='classes_names', mask_name=None, normalize=True, cmap='Blues', **kwargs)[source]

Bases: emloop.hooks.accumulate_variables.AccumulateVariables

After each epoch, compute and save/store confusion matrix figure for the predicted and expected labels.

Store confusion matrix figure to epoch data with green colorbar
hooks:
  - SaveConfusionMatrix:
      figure_action: store
      cmap: Greens
Define the classes’ names and save the confusion matrix figure to the training logdir with absolute values
hooks:
  - SaveConfusionMatrix:
      classes_names: [class_with_index_zero, class_with_index_one, class_with_index_two]
      normalize: False

FIGURE_ACTIONS = ['save', 'store']

Possible actions to be taken with the plotted figure. It can be either saved to a file or stored in the epoch data.

__init__(output_dir, dataset, labels_name='labels', predictions_name='predictions', classes_names=None, figsize=None, figure_action='save', num_classes_method_name='num_classes', classes_names_method_name='classes_names', mask_name=None, normalize=True, cmap='Blues', **kwargs)[source]

Create new SaveConfusionMatrix hook.

Parameters:
  • output_dir (str) – output directory
  • dataset (BaseDataset) – dataset (needed to translate predictions to strings)
  • labels_name (str) – annotation variable name
  • predictions_name (str) – prediction variable name
  • classes_names (Optional[Sequence[str]]) – List of classes’ names
  • figsize (Optional[Tuple[int, int]]) – the size of the matplotlib figure
  • figure_action (str) – action to be taken with the plotted figure, one of FIGURE_ACTIONS
  • normalize (bool) – False for plotting absolute values in confusion matrix, True for relative
  • num_classes_method_name (str) – self._dataset method name to get number of classes
  • classes_names_method_name (str) – self._dataset method name to get classes’ names. This parameter is ignored when classes_names is provided.
  • mask_name (Optional[str]) – the variable masking valid records (1 = valid, 0 = invalid)
  • cmap (str) – name of the matplotlib colormap used for the colorbar (see http://matplotlib.org/examples/color/colormaps_reference.html)
Raises:

ValueError – if the figure_action is not in FIGURE_ACTIONS

after_epoch(epoch_id, epoch_data)[source]

Reset the accumulator after each epoch.

Return type:None
class emloop.hooks.Flatten(variables, streams=None, **kwargs)[source]

Bases: hooks.AbstractHook

Flatten a stream variable.

Example: Flatten xs variable in test stream and save the result into variable xs_flat to be able to feed it into SaveConfusionMatrix hook.
hooks:
  - Flatten:
      variables: {xs: xs_flat}
      streams: [test]
  - SaveConfusionMatrix:
      variables: [xs_flat]
      streams: [test]

__init__(variables, streams=None, **kwargs)[source]

Hook constructor.

Parameters:
  • variables (Mapping[str, str]) – names of the variables to be flattened
  • streams (Optional[Iterable[str]]) – list of stream names to be considered; if None, the hook will be applied to all the available streams
after_batch(stream_name, batch_data)[source]

Flatten given variables.

Return type:None
class emloop.hooks.PlotLines(output_dir, variables, streams=None, id_variable='ids', pad_mask_variable=None, out_format='png', ymin=None, ymax=None, example_count=None, batch_count=None, root_dir='visual', **kwargs)[source]

Bases: hooks.AbstractHook

Plot sequences of numbers using matplotlib.

Plot xs variable for each example in test and valid streams.
hooks:
  - PlotLines:
      variables: [xs]
      streams: [test, valid]
Plot xs and ys variables only for the first two examples from the first ten batches (from the train stream).
hooks:
  - PlotLines:
      variables: [xs, ys]
      example_count: 2
      batch_count: 10

__init__(output_dir, variables, streams=None, id_variable='ids', pad_mask_variable=None, out_format='png', ymin=None, ymax=None, example_count=None, batch_count=None, root_dir='visual', **kwargs)[source]

Hook constructor.

Parameters:
  • output_dir (str) – output directory where plots will be saved
  • variables (Iterable[str]) – names of the variables to be plotted
  • streams (Optional[Iterable[str]]) – list of stream names to be dumped; can be None to dump all streams
  • id_variable (str) – name of the source which represents a unique example id
  • pad_mask_variable (Optional[str]) – name of the source which represents the padding mask
  • out_format (str) – extension of the saved image
  • ymin (Optional[float]) – minimum on the Y axis
  • ymax (Optional[float]) – maximum on the Y axis
  • example_count (Optional[int]) – count of examples which will be plotted from each batch (first example_count examples will be plotted)
  • batch_count (Optional[int]) – count of batches from which the plot will be saved (first batch_count will be processed)
  • root_dir (str) – default directory where the plots will be saved
_reset()[source]

Reset _batch_count to initial value.

Return type:None
after_batch(stream_name, batch_data)[source]

Save images of the selected variables for the given streams. The number of batches and examples processed can be limited with the batch_count and example_count parameters.
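How batch_count and example_count might interact with the per-epoch reset can be sketched as follows. This is only an illustration under assumptions: the ExampleSelector class and its methods are hypothetical and no plotting is done; it merely shows "first example_count examples of the first batch_count batches, counted anew each epoch".

```python
from typing import Any, List, Optional, Sequence


class ExampleSelector:
    """Hypothetical helper limiting the batches and examples handled per epoch."""

    def __init__(self, example_count: Optional[int] = None,
                 batch_count: Optional[int] = None) -> None:
        self._example_count = example_count
        self._batch_count = batch_count
        self._batches_seen = 0

    def reset(self) -> None:
        """Start counting batches from zero again (e.g. after each epoch)."""
        self._batches_seen = 0

    def select(self, examples: Sequence[Any]) -> List[Any]:
        """Return the examples of this batch that should be processed."""
        if self._batch_count is not None and self._batches_seen >= self._batch_count:
            return []  # batch limit reached, skip the whole batch
        self._batches_seen += 1
        if self._example_count is None:
            return list(examples)  # no per-batch limit
        return list(examples[:self._example_count])


selector = ExampleSelector(example_count=2, batch_count=1)
first = selector.select(["a", "b", "c"])   # first two examples of batch 1
second = selector.select(["d", "e"])       # batch 2 exceeds batch_count
selector.reset()                           # new epoch
third = selector.select(["f"])             # counted as batch 1 again
```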

after_epoch(epoch_id, **_)[source]

Set _current_epoch_id, which is used to distinguish between epoch directories, and call the _reset function.

figure_suffix

The suffix of the saved figure, used to distinguish between images from different hooks.

Return type:str
plot_figure(idx, batch_data)[source]

Plot the selected variables to a new figure.

Return type:Figure
class emloop.hooks.LogitsToCsv(variable, class_names, id_variable, output_file, streams=None, **kwargs)[source]

Bases: hooks.AbstractHook

Save a stream of logits to a csv file.

The generated file has |class_names| + 1 columns for each example; the extra column holds the id of the example. The class names are used as headers for the corresponding columns, and the id column is named after the corresponding stream source.

Save a csv with columns picture_id, red, green and blue to /tmp/colors.csv. The stream variable color is expected to be a sequence of three numbers.
hooks:
  - LogitsToCsv:
      variable: color
      class_names: [red, green, blue]
      id_variable: picture_id
      output_file: /tmp/colors.csv
Inheritance diagram of LogitsToCsv

__init__(variable, class_names, id_variable, output_file, streams=None, **kwargs)[source]
Parameters:
  • variable (str) – name of the source with a sequence for each example
  • class_names (Iterable[str]) – the names of the individual classes; should correspond to the size of the variable source
  • id_variable (str) – name of the source which represents a unique example id
  • output_file (str) – the desired name of the output csv file
  • streams (Optional[Iterable[str]]) – names of the streams to be considered; leave None to consider all streams
after_batch(stream_name, batch_data)[source]

Accumulate the given logits.

Return type:None
after_epoch(epoch_id, **_)[source]

Save all the accumulated data to csv.

Return type:None
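The column layout described for LogitsToCsv can be reproduced with the standard csv module. The sketch below is an illustration, not the hook's code: the function logits_to_csv and the sample values are made up, and it writes to an in-memory string instead of output_file.

```python
import csv
import io
from typing import Sequence


def logits_to_csv(ids: Sequence[str], logits: Sequence[Sequence[float]],
                  class_names: Sequence[str], id_variable: str) -> str:
    """Render one row per example: the id followed by one column per class."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header: the id column is named after the id source, then the class names.
    writer.writerow([id_variable, *class_names])
    for example_id, row in zip(ids, logits):
        if len(row) != len(class_names):
            raise ValueError("each logits row must have one value per class name")
        writer.writerow([example_id, *row])
    return buffer.getvalue()


csv_text = logits_to_csv(
    ids=["img_0", "img_1"],
    logits=[[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]],
    class_names=["red", "green", "blue"],
    id_variable="picture_id",
)
print(csv_text)
```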
class emloop.hooks.SequenceToCsv(variables, id_variable, output_file, pad_mask_variable=None, streams=None, **kwargs)[source]

Bases: hooks.AbstractHook

Save a stream of sequences to a csv file.

The file contains the following columns: <id_source>, index and <source_name…>, where <id_source> is the name of the id column and <source_name…> are the names of the stream columns to be dumped.

Save a csv with columns video_id, index, area and color to /tmp/areas.csv.
hooks:
  - SequenceToCsv:
      variables: [area, color]
      id_variable: video_id
      output_file: /tmp/areas.csv
Inheritance diagram of SequenceToCsv

__init__(variables, id_variable, output_file, pad_mask_variable=None, streams=None, **kwargs)[source]
Parameters:
  • variables (Iterable[str]) – names of the sources with an equally long sequence for each example
  • id_variable (str) – name of the source which represents a unique example id
  • output_file (str) – the desired name of the output csv file
  • pad_mask_variable (Optional[str]) – name of the source which represents the padding mask
  • streams (Optional[Iterable[str]]) – names of the streams to be considered; leave None to consider all streams
after_batch(stream_name, batch_data)[source]

Accumulate the given sequences.

Return type:None
after_epoch(epoch_id, **_)[source]

Save all the accumulated data to csv.

Return type:None
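The <id_source>, index, <source_name…> layout can be sketched in the same way. The code below is a hedged illustration rather than the hook itself: sequences_to_csv is a hypothetical function, the data are invented, and it assumes that positions where the padding mask is false are simply skipped.

```python
import csv
import io
from typing import Any, Mapping, Optional, Sequence


def sequences_to_csv(ids: Sequence[str],
                     sequences: Mapping[str, Sequence[Sequence[Any]]],
                     id_variable: str,
                     pad_mask: Optional[Sequence[Sequence[bool]]] = None) -> str:
    """Write one row per sequence position: id, index, then each variable's value."""
    names = list(sequences)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([id_variable, "index", *names])
    for example_idx, example_id in enumerate(ids):
        # All variables are expected to be equally long for a given example.
        length = len(sequences[names[0]][example_idx])
        for index in range(length):
            if pad_mask is not None and not pad_mask[example_idx][index]:
                continue  # assumption: padded positions are not written
            values = [sequences[name][example_idx][index] for name in names]
            writer.writerow([example_id, index, *values])
    return buffer.getvalue()


csv_text = sequences_to_csv(
    ids=["vid_0", "vid_1"],
    sequences={"area": [[10, 12], [7, 0]],
               "color": [["red", "red"], ["blue", "pad"]]},
    id_variable="video_id",
    pad_mask=[[True, True], [True, False]],
)
print(csv_text)
```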
class emloop.hooks.SaveFile(files, output_dir, **kwargs)[source]

Bases: hooks.AbstractHook

Save files to the output dir before training.

Save files to the output dir.
hooks:
  - SaveFile:
      files: [path]
Inheritance diagram of SaveFile

__init__(files, output_dir, **kwargs)[source]
Parameters:
  • files (Iterable[str]) – files to be saved
  • output_dir (str) – directory to save the files
before_training()[source]

Before training event.

No data have been processed at this point.

Note

This method is called exactly once during the training.
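Copying a list of files into an output directory once, before any data are processed, could look roughly like the sketch below. It is not taken from the emloop source: save_files is a hypothetical helper built on shutil.copy, and the demo uses a temporary directory in place of a real output dir.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List, Union


def save_files(files: Iterable[str], output_dir: Union[str, Path]) -> List[Path]:
    """Copy every listed file into output_dir and return the new paths."""
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return [Path(shutil.copy(path, target)) for path in files]


# Demo with throwaway files; the file name and content are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "config.yaml"
    source.write_text("hooks: []\n")
    saved = save_files([str(source)], Path(tmp) / "output")
    saved_names = [path.name for path in saved]
    saved_text = saved[0].read_text()
```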

class emloop.hooks.Benchmark(batch_size, **kwargs)[source]

Bases: hooks.AbstractHook

Log mean and median example times via standard logging.

Log mean and median example times after each epoch.
hooks:
  - Benchmark
Inheritance diagram of Benchmark

__init__(batch_size, **kwargs)[source]

Check for arguments supplied by the user but not recognized by the child hook's __init__ method, and warn about them.

Parameters:kwargs – **kwargs not recognized in the child hook
after_epoch_profile(epoch_id, profile, streams)[source]

Log average example times after each epoch.

The profile is expected to contain at least eval_batch_{stream} entry for each logged stream.

Parameters:
  • epoch_id (int) – number of the processed epoch
  • profile (Mapping[str, List[float]]) – epoch timings profile
  • streams (List[str]) – streams for which example times will be logged
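The relation between the profile, batch_size and the logged values can be illustrated with a standalone sketch. It assumes that an example time is a batch time from the eval_batch_{stream} entry divided by batch_size; that assumption, the function names and the timings are not taken from the emloop source.

```python
import logging
from statistics import mean, median
from typing import Dict, List, Mapping, Sequence, Tuple


def example_times(profile: Mapping[str, List[float]], stream: str,
                  batch_size: int) -> List[float]:
    """Convert per-batch timings to per-example timings (assumed: time / batch_size)."""
    return [batch_time / batch_size for batch_time in profile[f"eval_batch_{stream}"]]


def log_example_times(profile: Mapping[str, List[float]], streams: Sequence[str],
                      batch_size: int) -> Dict[str, Tuple[float, float]]:
    """Log and return the (mean, median) example time for each stream."""
    summary = {}
    for stream in streams:
        times = example_times(profile, stream, batch_size)
        summary[stream] = (mean(times), median(times))
        logging.info("%s: mean %.4f s, median %.4f s per example",
                     stream, *summary[stream])
    return summary


# Illustrative timings in seconds for four batches of the train stream.
profile = {"eval_batch_train": [0.8, 1.0, 1.2, 2.0]}
summary = log_example_times(profile, ["train"], batch_size=4)
```

The median is less sensitive than the mean to a single slow batch (the 2.0 s entry here), which is one reason to report both.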
class emloop.hooks.ClassificationMetrics(predicted_variable, gt_variable, f1_average=None, var_prefix='', **kwargs)[source]

Bases: emloop.hooks.accumulate_variables.AccumulateVariables

Accumulate the specified prediction and ground-truth (gt) variables and compute their classification statistics after each epoch. In particular, accuracy, precision, recall and f1 are computed and saved to epoch data; specificity is added only if f1_average is set to ‘binary’.

Warning

Specificity will be computed only if f1_average is set to binary.

Compute and save classification statistics between model output prediction and stream source labels.
hooks:
  - ClassificationMetrics:
      predicted_variable: prediction
      gt_variable: labels
Inheritance diagram of ClassificationMetrics

__init__(predicted_variable, gt_variable, f1_average=None, var_prefix='', **kwargs)[source]
Parameters:
_get_metrics(gt, predicted)[source]

Compute accuracy, precision, recall, f1 and sometimes specificity (if f1_average is set to ‘binary’).

Return type:Mapping[str, Union[float, List[float]]]
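For the binary case, the listed metrics follow directly from the four confusion-matrix counts. The sketch below is a plain-Python illustration with invented labels; binary_metrics is a hypothetical function and says nothing about how emloop computes the values internally.

```python
from typing import Dict, Sequence


def binary_metrics(gt: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, f1 and specificity for 0/1 labels."""
    pairs = list(zip(gt, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)  # true negatives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


metrics = binary_metrics(gt=[1, 1, 1, 0, 0, 0, 0, 0],
                         predicted=[1, 1, 0, 1, 0, 0, 0, 0])
print(metrics)
```

Specificity depends on a single designated negative class, which is why it is only meaningful in the binary setting.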
_save_metrics(epoch_data)[source]

Compute the classification statistics from the accumulator and save the results to the given epoch data. Set up ‘accuracy’, ‘precision’, ‘recall’, ‘f1’ and sometimes ‘specificity’ (if f1_average is set to ‘binary’) epoch data variables prefixed with self._var_prefix.

Parameters:epoch_data (Mapping[str, object]) – epoch data to save the results to
Raises:ValueError – if the output variables are already set
Return type:None
after_epoch(epoch_data, **kwargs)[source]

Compute and save the classification statistics and reset the accumulator.

Return type:None

Exceptions

exception emloop.hooks.TrainingTerminated[source]

Exception that is raised when a hook terminates the training.
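The general pattern of a hook ending the training by raising an exception can be sketched as below. Everything here is a stand-in: the local TrainingTerminated class only mimics the name of emloop.hooks.TrainingTerminated, and stop_after_epochs and run_training are hypothetical; how the emloop main loop actually handles the exception is not described in this section.

```python
from typing import Callable, Iterable, Optional, Tuple


class TrainingTerminated(Exception):
    """Local stand-in for emloop.hooks.TrainingTerminated."""


def stop_after_epochs(limit: int) -> Callable[[int], None]:
    """Build a toy after-epoch callback that terminates training at `limit`."""
    def after_epoch(epoch_id: int) -> None:
        if epoch_id >= limit:
            raise TrainingTerminated(f"reached epoch {epoch_id}")
    return after_epoch


def run_training(hooks: Iterable[Callable[[int], None]],
                 max_epochs: int = 100) -> Tuple[int, Optional[str]]:
    """Run a toy epoch loop until a hook raises TrainingTerminated."""
    hooks = list(hooks)
    completed, reason = 0, None
    try:
        for epoch_id in range(1, max_epochs + 1):
            completed = epoch_id  # the epoch body would run here
            for hook in hooks:
                hook(epoch_id)
    except TrainingTerminated as ex:
        reason = str(ex)  # a regular, expected way to end the loop
    return completed, reason


result = run_training([stop_after_epochs(3)])
print(result)  # (3, 'reached epoch 3')
```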

Inheritance

Inheritance diagram of TrainingTerminated