emloop

emloop core module.

Functions

emloop.create_output_dir(config, output_root, default_model_name='Unnamed')[source]
Create output_dir under the given output_root and
  • dump the given config to YAML file under this dir

  • register a file logger logging to a file under this dir

Parameters
  • config (dict) – config to be dumped

  • output_root (str) – dir wherein output_dir shall be created

  • default_model_name (str) – name to be used when model.name is not found in the config

Return type

str

Returns

path to the created output_dir

emloop.create_dataset(config, output_dir=None)[source]

Create a dataset object according to the given config.

Dataset config section and the output_dir are passed to the constructor in a single YAML-encoded string.

Parameters
  • config (dict) – config dict with dataset config

  • output_dir (Optional[str]) – path to the training output dir or None

Return type

AbstractDataset

Returns

dataset object

emloop.create_model(config, output_dir=None, dataset=None, restore_from=None)[source]

Create a model object either from scratch or from the checkpoint specified by restore_from.

emloop allows the following scenarios:

  1. Create model: leave restore_from=None and specify class;

  2. Restore model: specify restore_from which is a backend-specific path to (a directory with) the saved model.

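Parameters
  • config (dict) – config dict with model config

  • output_dir (Optional[str]) – path to the training output dir or None

  • dataset (Optional[AbstractDataset]) – dataset object to be passed to the model

  • restore_from (Optional[str]) – if not None, where the model should be restored from (backend-specific information)
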
Return type

AbstractModel

Returns

model object

emloop.create_hooks(config, model=None, dataset=None, output_dir=None)[source]

Create hooks specified in config['hooks'] list.

Hook config entries may be one of the following types:

  • A hook with default args, specified only by its name as a string; e.g.

    hooks:
      - LogVariables
      - emloop_tensorflow.WriteTensorBoard

  • A hook with custom args, given as a dict name -> args; e.g.

    hooks:
      - StopAfter:
          n_epochs: 10

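Parameters
  • config (dict) – config dict with the hooks list

  • model (Optional[AbstractModel]) – model object to be passed to the hooks

  • dataset (Optional[AbstractDataset]) – dataset object to be passed to the hooks

  • output_dir (Optional[str]) – training output dir available to the hooks
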
Return type

Iterable[AbstractHook]

Returns

list of hook objects
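
The two entry forms described above can be normalized into (name, args) pairs before any hook is constructed. The following is a minimal sketch of that idea only; parse_hook_entry is a hypothetical helper written for illustration and is not part of the emloop API, and the actual resolution of hook names to classes is not shown.

```python
from typing import Any, Dict, List, Tuple, Union

# A hook entry is either a bare name or a single-key dict name -> args.
HookEntry = Union[str, Dict[str, Dict[str, Any]]]


def parse_hook_entry(entry: HookEntry) -> Tuple[str, Dict[str, Any]]:
    """Split one hook config entry into (hook name, keyword arguments)."""
    if isinstance(entry, str):
        # Form 1: only the name is given, so the hook uses its default args.
        return entry, {}
    if isinstance(entry, dict) and len(entry) == 1:
        # Form 2: a single-key dict mapping the hook name to its args.
        (name, kwargs), = entry.items()
        return name, dict(kwargs or {})
    raise ValueError(f'Unsupported hook config entry: {entry!r}')


# The same entries as in the YAML examples, already decoded to Python objects.
hooks_config: List[HookEntry] = [
    'LogVariables',
    'emloop_tensorflow.WriteTensorBoard',
    {'StopAfter': {'n_epochs': 10}},
]
parsed = [parse_hook_entry(entry) for entry in hooks_config]
```

Normalizing first keeps the later construction step uniform: every hook is then created from a name and a (possibly empty) dict of arguments.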

emloop.create_main_loop(config, output_root, restore_from=None)[source]

Creates MainLoop with model, dataset and hooks according to config.

Parameters
  • config (dict) – config dict

  • output_root (str) – dir where output_dir shall be created

  • restore_from (Optional[str]) – if not None, where the model should be restored from (backend-specific information)

Return type

MainLoop

Returns

main loop object

Classes

  • MainLoop: emloop main loop for training and model inference.

  • AbstractDataset: This concept prescribes the API that is required from every emloop dataset.

  • BaseDataset: Base class for datasets written in python.

  • DownloadableDataset: Dataset base class implementing routines for downloading and extracting data via the emloop dataset download command.

  • AbstractHook: emloop hook interface.

  • AbstractModel: Abstract machine learning model which exposes input and output names, run and save methods.

  • Batch: alias of typing.Mapping.

  • Stream: alias of typing.Iterable.

  • EpochData: alias of typing.Mapping.

  • TimeProfile: alias of typing.Mapping.

class emloop.MainLoop(model, dataset, hooks=(), train_stream_name='train', extra_streams=(), buffer=0, on_empty_batch='error', on_empty_stream='error', on_unused_sources='warn', on_incorrect_config='error', fixed_batch_size=None, fixed_epoch_size=None, skip_zeroth_epoch=False, **kwargs)[source]

Bases: emloop.utils.misc.CaughtInterrupts

emloop main loop for training and model inference.

EMPTY_ACTIONS = ['ignore', 'warn', 'error']

Possible actions to be taken when a batch/stream is empty.

INCORRECT_CONFIG_ACTIONS = ['ignore', 'warn', 'error']

Possible actions to be taken when a mainloop config contains some unexpected arguments.

UNUSED_SOURCE_ACTIONS = ['ignore', 'warn', 'error']

Possible actions to be taken when a stream source is unused by the trained model.

__enter__()[source]

Calls before_training() for all hooks.

__exit__(exc_type, exc_value, traceback)[source]

Calls after_training() for all hooks.

__init__(model, dataset, hooks=(), train_stream_name='train', extra_streams=(), buffer=0, on_empty_batch='error', on_empty_stream='error', on_unused_sources='warn', on_incorrect_config='error', fixed_batch_size=None, fixed_epoch_size=None, skip_zeroth_epoch=False, **kwargs)[source]
Parameters
  • model (AbstractModel) – trained model

  • dataset (AbstractDataset) – loaded dataset

  • hooks (Iterable[AbstractHook]) – training hooks

  • train_stream_name (str) – name of the training stream

  • extra_streams (Iterable[str]) – additional stream names to be evaluated between epochs

  • buffer (int) – size of the batch buffer, 0 means no buffer

  • on_empty_batch (str) – action to take when batch is empty; one of MainLoop.EMPTY_ACTIONS

  • on_empty_stream (str) – action to take when stream is empty; one of MainLoop.EMPTY_ACTIONS

  • on_unused_sources (str) – action to take when a stream provides unused sources; one of MainLoop.UNUSED_SOURCE_ACTIONS

  • on_incorrect_config (str) – action to take when mainloop config contains unexpected arguments; one of MainLoop.INCORRECT_CONFIG_ACTIONS

  • fixed_batch_size (Optional[int]) – if specified, main_loop removes all batches that do not have the specified size

  • fixed_epoch_size (Optional[int]) – if specified, cut the train stream to epochs of at most fixed_epoch_size batches

  • skip_zeroth_epoch (bool) – if True, the main loop skips the 0th epoch

Raises

AssertionError – in case of unsupported value of on_empty_batch, on_empty_stream or on_unused_sources
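
The effect of fixed_batch_size and fixed_epoch_size can be illustrated on a plain iterable of batches. This is a sketch of the described behaviour under stated assumptions, not the MainLoop implementation: keep_fixed_size, cut_epoch and batch_size are hypothetical helpers, and a batch is assumed to be a dict of equally long sequences.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional, Sequence

Batch = Dict[str, Sequence[object]]


def batch_size(batch: Batch) -> int:
    """Return the length of the first source, or 0 for a batch with no sources."""
    return len(next(iter(batch.values()), ()))


def keep_fixed_size(stream: Iterable[Batch],
                    fixed_batch_size: Optional[int]) -> Iterator[Batch]:
    """Drop every batch whose size differs from fixed_batch_size (None keeps all)."""
    for batch in stream:
        if fixed_batch_size is None or batch_size(batch) == fixed_batch_size:
            yield batch


def cut_epoch(stream: Iterator[Batch],
              fixed_epoch_size: Optional[int]) -> List[Batch]:
    """Take at most fixed_epoch_size batches from an iterator (None takes the rest)."""
    return list(islice(stream, fixed_epoch_size))


stream = [{'x': [1, 2]}, {'x': [3]}, {'x': [4, 5]}, {'x': [6, 7]}]
filtered = keep_fixed_size(stream, fixed_batch_size=2)  # drops {'x': [3]}
first_epoch = cut_epoch(filtered, fixed_epoch_size=2)
second_epoch = cut_epoch(filtered, fixed_epoch_size=2)  # continues where the first stopped
```

Because the filtered stream is a single iterator, consecutive calls to cut_epoch continue from the previous position, which is one way of cutting a long stream into shorter epochs.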

_check_sources(batch)[source]

Check for unused and missing sources.

Parameters

batch (Dict[str, object]) – batch to be checked

Raises

ValueError – if a source is missing or unused and self._on_unused_sources is set to error

Return type

None
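
A source check of this kind amounts to comparing the names provided by a batch with the names the model expects. The sketch below is illustrative only: find_source_problems and check_sources are hypothetical free functions, and it assumes that a missing source always raises while an unused source follows the configured action.

```python
import logging
from typing import Iterable, Mapping, Set, Tuple


def find_source_problems(batch: Mapping[str, object],
                         input_names: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unused) source names for one batch."""
    required = set(input_names)
    provided = set(batch)
    return required - provided, provided - required


def check_sources(batch: Mapping[str, object],
                  input_names: Iterable[str],
                  on_unused_sources: str = 'warn') -> None:
    """Raise or warn according to on_unused_sources ('ignore', 'warn' or 'error')."""
    missing, unused = find_source_problems(batch, input_names)
    if missing:
        # Assumption: the model cannot run without its inputs, so this always raises.
        raise ValueError(f'Missing sources: {sorted(missing)}')
    if unused and on_unused_sources == 'error':
        raise ValueError(f'Unused sources: {sorted(unused)}')
    if unused and on_unused_sources == 'warn':
        logging.warning('Unused sources: %s', sorted(unused))


problems = find_source_problems({'x': [1], 'extra': [2]}, ['x', 'y'])
```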

_epoch_impl(train_streams, eval_streams)[source]

Runs a single epoch with the given streams.

Parameters
  • train_streams (Iterable[str]) – list of training streams

  • eval_streams (Iterable[str]) – list of eval streams

Return type

None

_run_epoch(stream, train)[source]

Iterate through the given stream and evaluate/train the model with the received batches.

Calls emloop.hooks.AbstractHook.after_batch() on the hooks after every processed batch.

Parameters
  • stream (StreamWrapper) – stream to iterate

  • train (bool) – if set to True, the model will be trained

Raises
  • ValueError – in case of empty batch when on_empty_batch is set to error

  • ValueError – in case of empty stream when on_empty_stream is set to error

  • ValueError – in case of two batch variables having different lengths

Return type

None
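
The batch-related errors listed above can be pictured as a small validation step run on each batch. This is a sketch under stated assumptions rather than the actual _run_epoch code: checked_batch_size is a hypothetical helper and a batch is assumed to map source names to sequences.

```python
import logging
from typing import Mapping, Sequence


def checked_batch_size(batch: Mapping[str, Sequence[object]],
                       on_empty_batch: str = 'error') -> int:
    """Return the common length of all batch sources, handling empty batches."""
    sizes = {len(values) for values in batch.values()}
    if len(sizes) > 1:
        # Two batch variables with different lengths are always an error.
        raise ValueError(f'Batch sources have different lengths: {sorted(sizes)}')
    size = sizes.pop() if sizes else 0
    if size == 0:
        if on_empty_batch == 'error':
            raise ValueError('Received an empty batch.')
        if on_empty_batch == 'warn':
            logging.warning('Received an empty batch.')
    return size


size = checked_batch_size({'x': [1, 2], 'y': [3, 4]})
```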

epoch(train_streams, eval_streams)[source]

Runs a single epoch with the given streams.

Parameters
  • train_streams (Iterable[Iterable[+T_co]]) – list of training streams, each either string (e.g. ‘train’), StreamWrapper or iterator

  • eval_streams (Iterable[Iterable[+T_co]]) – list of eval streams, each either string (e.g. ‘valid’), StreamWrapper or iterator

Return type

None

extra_streams

List of extra stream names as specified in self.__init__().

Return type

List[str]

fixed_epoch_size

Fixed epoch size parameter as specified in self.__init__().

Return type

Optional[int]

get_stream(stream_name)[source]

Get a StreamWrapper with the given name.

Parameters

stream_name (str) – stream name

Return type

StreamWrapper

Returns

StreamWrapper for the stream provided by the respective dataset function

Raises

AttributeError – if the dataset does not provide the function creating the stream
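
The lookup relies on the <stream_name>_stream naming convention of dataset methods. A minimal sketch of that convention follows; ToyDataset and this free-standing get_stream are hypothetical, and the sketch returns the raw iterable instead of wrapping it in a StreamWrapper.

```python
from typing import Dict, Iterable, Iterator, List


class ToyDataset:
    """Hypothetical dataset providing only a train stream."""

    def train_stream(self) -> Iterator[Dict[str, List[int]]]:
        yield {'x': [1, 2]}
        yield {'x': [3, 4]}


def get_stream(dataset: object, stream_name: str) -> Iterable:
    """Find and call the dataset method named <stream_name>_stream."""
    method_name = f'{stream_name}_stream'
    try:
        stream_fn = getattr(dataset, method_name)
    except AttributeError as ex:
        raise AttributeError(
            f'Dataset does not provide the `{method_name}` method.') from ex
    return stream_fn()


train_batches = list(get_stream(ToyDataset(), 'train'))
```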

prepare_streams(stream_list, base_name)[source]

Converts streams to StreamWrappers, saves them to self._streams and returns their names as strings.

Parameters
  • stream_list (Iterable[Iterable[+T_co]]) – list of streams, each either string (e.g. ‘train’), StreamWrapper or iterator

  • base_name (str) – default base name for unnamed streams

Return type

Iterable[str]

run_evaluation(stream_name)[source]

Evaluates the given stream.

Parameters

stream_name (str) – name of the stream to evaluate (e.g. valid)

Return type

None

run_training()[source]

Trains until a TrainingTerminated exception is raised.

Return type

None

training_epochs_done

Number of training epochs done.

Return type

Optional[int]

class emloop.AbstractDataset(config_str)

Bases: object

This concept prescribes the API that is required from every emloop dataset.

Every emloop dataset has to have a constructor which takes a YAML string config. Additionally, one may implement any <stream_name>_stream method in order to make the stream_name stream available in emloop.MainLoop.

All the defined stream methods should return a Stream.

__init__(config_str)[source]

Create new dataset configured with the given YAML string (obligatory).

The configuration must contain dataset entry and may contain output_dir entry.

Parameters

config_str (str) – YAML string config

class emloop.BaseDataset(config_str)

Bases: datasets.AbstractDataset

Base class for datasets written in python.

In the inherited class, one should:
  • override the _configure_dataset method

  • (optional) implement train_stream method if intended to be used with emloop train ...

  • (optional) implement <stream_name>_stream method in order to make <stream_name> stream available

__init__(config_str)[source]

Create new dataset.

Decode the given YAML config string and pass the obtained **kwargs to _configure_dataset().

Parameters

config_str (str) – dataset configuration as YAML string

_configure_dataset(output_dir, **kwargs)[source]

Configure the dataset with **kwargs decoded from YAML configuration.

Parameters
  • output_dir (Optional[str]) – output directory for logging and any additional outputs (None if no output dir is available)

  • kwargs – dataset configuration as **kwargs parsed from config['dataset']

Raises

NotImplementedError – if not overridden
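
The constructor and _configure_dataset cooperate roughly as sketched below. This is not the emloop source: SketchBaseDataset and RangeDataset are hypothetical classes, the n_batches and batch_size options are invented for the example, and the config string is decoded with json (a JSON document is also valid YAML) only to keep the sketch free of third-party dependencies.

```python
import json
from typing import Dict, Iterator, List, Optional


class SketchBaseDataset:
    """Hypothetical stand-in mimicking the BaseDataset construction pattern."""

    def __init__(self, config_str: str):
        # Stand-in for YAML decoding; the real class receives a YAML string.
        config = json.loads(config_str)
        # The dataset entry is obligatory, the output_dir entry is optional.
        self._configure_dataset(output_dir=config.get('output_dir'),
                                **config['dataset'])

    def _configure_dataset(self, output_dir: Optional[str], **kwargs) -> None:
        raise NotImplementedError('_configure_dataset must be overridden.')


class RangeDataset(SketchBaseDataset):
    """Toy dataset yielding consecutive integers in fixed-size batches."""

    def _configure_dataset(self, output_dir: Optional[str] = None,
                           n_batches: int = 2, batch_size: int = 3,
                           **kwargs) -> None:
        self._n_batches = n_batches
        self._batch_size = batch_size

    def train_stream(self) -> Iterator[Dict[str, List[int]]]:
        for i in range(self._n_batches):
            start = i * self._batch_size
            yield {'x': list(range(start, start + self._batch_size))}


config_str = json.dumps({'dataset': {'n_batches': 2, 'batch_size': 2}})
batches = list(RangeDataset(config_str).train_stream())
```

Keeping all option handling in _configure_dataset means the subclass never has to touch the config string itself.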

stream_info()[source]

Check and report source names, dtypes and shapes of all the streams available.

Return type

None

class emloop.DownloadableDataset(config_str)

Bases: datasets.BaseDataset

DownloadableDataset is a dataset base class implementing routines for downloading and extracting data via the emloop dataset download command.

The typical use case is that the data_root and download_urls variables are passed to the dataset constructor. Alternatively, the corresponding properties may be implemented directly.

_configure_dataset(data_root=None, download_urls=None, **kwargs)[source]

Save the passed values and use them as a default property implementation.

Parameters
  • data_root (Optional[str]) – directory to which the files will be downloaded

  • download_urls (Optional[Iterable[str]]) – list of URLs to be downloaded

Return type

None

data_root

Path to the data root directory.

Return type

str

download()[source]

Maybe download and extract the extra files required.

If not already downloaded, download all files specified by download_urls(). Then, extract the downloaded files to data_root().

emloop CLI example:

    emloop dataset download <path-to-config>

Return type

None

download_urls

A list of URLs to be downloaded.

Return type

Iterable[str]

class emloop.AbstractHook(**kwargs)

Bases: object

emloop hook interface.

Hook lifecycle (event -> method invocation):

  1. emloop constructs the hooks -> __init__()

  2. emloop enters the main loop -> before_training()
    1. emloop starts an epoch

    2. emloop computes a batch -> after_batch()

    3. emloop finishes the epoch -> after_epoch() and after_epoch_profile()

  3. emloop terminates the main loop -> after_training()
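
The lifecycle above can be traced with a hook that only records the events it receives. The loop below is a deliberately simplified sketch, not the MainLoop implementation: RecordingHook and run_sketch_loop are hypothetical, only a single 'train' stream is iterated, and the profile passed to the hooks is an empty placeholder.

```python
from typing import Dict, Iterable, List, Sequence


class RecordingHook:
    """Hypothetical hook that records the order of lifecycle events."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def before_training(self) -> None:
        self.events.append('before_training')

    def after_batch(self, stream_name: str, batch_data: Dict) -> None:
        self.events.append(f'after_batch:{stream_name}')

    def after_epoch(self, epoch_id: int, epoch_data: Dict) -> None:
        self.events.append(f'after_epoch:{epoch_id}')

    def after_epoch_profile(self, epoch_id: int, profile: Dict,
                            streams: List[str]) -> None:
        self.events.append(f'after_epoch_profile:{epoch_id}')

    def after_training(self, success: bool) -> None:
        self.events.append(f'after_training:{success}')


def run_sketch_loop(hooks: Sequence[RecordingHook], batches: Iterable[Dict],
                    n_epochs: int) -> None:
    """Invoke hook events in the documented order for a fixed number of epochs."""
    for hook in hooks:
        hook.before_training()
    success = False
    try:
        for epoch_id in range(n_epochs):
            for batch in batches:
                for hook in hooks:
                    hook.after_batch('train', batch)
            epoch_data: Dict = {}  # initially empty, shared among all the hooks
            for hook in hooks:
                hook.after_epoch(epoch_id, epoch_data)
            for hook in hooks:
                hook.after_epoch_profile(epoch_id, {}, ['train'])
        success = True
    finally:
        # after_training runs whether the loop finished or raised.
        for hook in hooks:
            hook.after_training(success)


hook = RecordingHook()
run_sketch_loop([hook], batches=[{'x': [1]}, {'x': [2]}], n_epochs=1)
```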

Caution

Hook naming conventions:

  • hook names should describe hook actions with verb stems. E.g.: LogProfile or SaveBest

  • hook names should not include Hook suffix

__init__(**kwargs)[source]

Check and warn if there is any argument created by the user yet not recognized in the child hook __init__ method.

Parameters

kwargs – **kwargs not recognized in the child hook

after_batch(stream_name, batch_data)[source]

After batch event.

This event is triggered after every processed batch regardless of stream type. Batch inputs and model outputs are available in the batch_data argument.

Parameters
  • stream_name (str) – name of the stream (usually train, valid or test)

  • batch_data (Mapping[str, Sequence[Any]]) – batch inputs and model outputs

Return type

None

after_epoch(epoch_id, epoch_data)[source]

After epoch event.

This event is triggered after every epoch wherein all the streams were iterated. The epoch_data object is initially empty and shared among all the hooks.

Parameters
  • epoch_id (int) – finished epoch id

  • epoch_data (Mapping[str, object]) – epoch data flowing through all hooks

Return type

None
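
Since epoch_data is shared, a hook that runs earlier in the list can publish values for the hooks that follow it. The sketch below shows this with two hypothetical hooks; the class names and the 'mean_loss' key are assumptions made for the example, and a mutable dict is used for the shared object.

```python
from typing import Dict, List, Sequence, Tuple


class AccumulateLoss:
    """Hypothetical hook: collects batch losses and publishes their epoch mean."""

    def __init__(self) -> None:
        self._losses: List[float] = []

    def after_batch(self, stream_name: str,
                    batch_data: Dict[str, Sequence[float]]) -> None:
        self._losses.extend(batch_data['loss'])

    def after_epoch(self, epoch_id: int, epoch_data: Dict[str, object]) -> None:
        epoch_data['mean_loss'] = sum(self._losses) / len(self._losses)
        self._losses = []  # start the next epoch from scratch


class CollectHistory:
    """Hypothetical hook: reads what an earlier hook wrote to epoch_data."""

    def __init__(self) -> None:
        self.history: List[Tuple[int, object]] = []

    def after_epoch(self, epoch_id: int, epoch_data: Dict[str, object]) -> None:
        self.history.append((epoch_id, epoch_data['mean_loss']))


accumulate, collect = AccumulateLoss(), CollectHistory()
accumulate.after_batch('train', {'loss': [1.0, 3.0]})

epoch_data: Dict[str, object] = {}  # initially empty
for hook in (accumulate, collect):  # the order of the hooks matters
    hook.after_epoch(0, epoch_data)
```

Reversing the order of the two hooks would raise a KeyError in CollectHistory, which is why hooks that compute values are usually listed before the hooks that consume them.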

after_epoch_profile(epoch_id, profile, streams)[source]

After epoch profile event.

This event provides an opportunity to process the time profile of the finished epoch.

Parameters
  • epoch_id (int) – finished epoch id

  • profile – time profile of the finished epoch

  • streams (List[str]) – streams which are in profile

Return type

None

after_training(success)[source]

After training event.

This event is called after the training has finished, either naturally or due to an interrupt.

Note

This method is called exactly once during the training.

Parameters

success (bool) – whether the training ended successfully or with an exception

Return type

None

before_training()[source]

Before training event.

No data have been processed at this point.

Note

This method is called exactly once during the training.

Return type

None

register_mainloop(main_loop)[source]

Pass emloop.MainLoop to the hook. Raise ValueError if a MainLoop was already passed before.

Parameters

main_loop (emloop.MainLoop) – emloop main loop for training

Raises

ValueError – if MainLoop was already passed before

Return type

None

class emloop.AbstractModel(dataset, log_dir, restore_from=None, **kwargs)

Bases: object

Abstract machine learning model which exposes input and output names, run and save methods. AbstractModel implementations are trainable with emloop.MainLoop.

__init__(dataset, log_dir, restore_from=None, **kwargs)[source]

Model constructor interface.

Additional parameters (currently covered by **kwargs) are passed according to the configuration model section.

Parameters
  • dataset (Optional[AbstractDataset]) – dataset object

  • log_dir (str) – existing directory in which all output files should be stored

  • restore_from (Optional[str]) – information passed to the model constructor (backend-specific); usually a directory in which the trained model is stored

  • kwargs – configuration section model

input_names

List of model input names.

Return type

Iterable[str]

output_names

List of model output names.

Return type

Iterable[str]

run(batch, train, stream)[source]

Run feed-forward pass with the given batch and return the results as dict.

When train=True, also update parameters.

Parameters
  • batch (Mapping[str, Sequence[Any]]) – batch to be processed

  • train (bool) – True if this batch should be used for model update, False otherwise

  • stream (StreamWrapper) – stream wrapper (useful for precise buffer management)

Return type

Mapping[str, Sequence[Any]]

Returns

results dict

save(name_suffix)[source]

Save the model parameters with the given name_suffix.

Parameters

name_suffix (str) – name suffix to be appended to the saved model

Return type

str

Returns

path to the saved file/dir
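
A toy class following the interface described above might look as follows. This is a sketch under stated assumptions and not an emloop backend: RunningMeanModel is hypothetical, it does not subclass AbstractModel so that the example runs without emloop installed, and the JSON file format and the model_<name_suffix>.json file name are invented for illustration.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional, Sequence


class RunningMeanModel:
    """Hypothetical model: learns the mean of x and outputs deviations from it."""

    def __init__(self, dataset, log_dir: str,
                 restore_from: Optional[str] = None, **kwargs):
        self._log_dir = log_dir
        self._count, self._total = 0, 0.0
        if restore_from is not None:
            # Restore the parameters previously written by save().
            with open(restore_from) as file:
                state = json.load(file)
            self._count, self._total = state['count'], state['total']

    @property
    def input_names(self) -> List[str]:
        return ['x']

    @property
    def output_names(self) -> List[str]:
        return ['deviation']

    def run(self, batch: Dict[str, Sequence[float]], train: bool,
            stream) -> Dict[str, List[float]]:
        if train:
            # Update the parameters only when training.
            self._count += len(batch['x'])
            self._total += sum(batch['x'])
        mean = self._total / self._count if self._count else 0.0
        return {'deviation': [value - mean for value in batch['x']]}

    def save(self, name_suffix: str) -> str:
        path = os.path.join(self._log_dir, f'model_{name_suffix}.json')
        with open(path, 'w') as file:
            json.dump({'count': self._count, 'total': self._total}, file)
        return path


log_dir = tempfile.mkdtemp()
model = RunningMeanModel(dataset=None, log_dir=log_dir)
train_out = model.run({'x': [1.0, 3.0]}, train=True, stream=None)
saved_path = model.save('final')

restored = RunningMeanModel(dataset=None, log_dir=log_dir, restore_from=saved_path)
eval_out = restored.run({'x': [2.0]}, train=False, stream=None)
```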

emloop.Batch

alias of typing.Mapping

emloop.Stream

alias of typing.Iterable

emloop.EpochData

alias of typing.Mapping

emloop.TimeProfile

alias of typing.Mapping