Model

The model is the second component of the emloop environment. It defines the machine learning part of the whole workflow. The model object is defined by the emloop.models.AbstractModel interface.

The model constructor accepts a dataset instance, the path to the logging directory, and an indication of whether (and from where) the model should be restored or a new one created.

Additionally, every model has to define two properties, input_names and output_names. These properties should return the lists of input and output variable names, respectively. The input names are expected to exist in every batch as keys of the batch dictionary. The output names are the variables that the model computes and outputs, and that may be used by statistical hooks and the like. In the case of our animal recognition example from the dataset tutorial, the input names would contain image and animal, while the output names would contain predicted_animal and loss.
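The two properties can be sketched as follows. This is only an illustration under stated assumptions: AnimalModelSketch and missing_inputs are hypothetical names, and the class is a plain stand-in that does not inherit from emloop.models.AbstractModel so that the snippet runs without emloop installed.

```python
from typing import Dict, List, Optional


class AnimalModelSketch:
    """Hypothetical stand-in; a real model would implement emloop.models.AbstractModel."""

    def __init__(self, dataset: object, log_dir: Optional[str],
                 restore_from: Optional[str] = None):
        # The constructor arguments mirror the ones described in the text.
        self._dataset = dataset
        self._log_dir = log_dir
        self._restore_from = restore_from

    @property
    def input_names(self) -> List[str]:
        # Keys that every batch dictionary is expected to contain.
        return ['image', 'animal']

    @property
    def output_names(self) -> List[str]:
        # Variables the model computes and exposes, e.g. to hooks.
        return ['predicted_animal', 'loss']


def missing_inputs(model: AnimalModelSketch, batch: Dict[str, list]) -> List[str]:
    """Return the input names that are absent from the batch (hypothetical helper)."""
    return [name for name in model.input_names if name not in batch]


model = AnimalModelSketch(dataset=None, log_dir=None)
batch = {'image': [[0.0, 1.0]], 'animal': ['cat']}
```

A check such as missing_inputs(model, batch) makes the contract explicit: every name in input_names has to be a key of the batch.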

Running the Model

The most important method of the model is emloop.models.AbstractModel.run(). This method evaluates the model on a single batch, given as the first parameter. The second parameter is a boolean determining whether the model should update (train) on this batch or not.

Note that the trained model is not persistent, as it is stored only in memory. Persistence is provided by the emloop.models.AbstractModel.save() method, which dumps the model to the filesystem (although this behavior is model-specific and you may implement it as you wish in your own models). The emloop.models.AbstractModel.save() method shall accept only a single parameter: the name for the dumped file(s).
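To make the roles of run() and save() concrete, here is a minimal sketch of a toy model. The names MajorityModelSketch and model_<suffix>.json, the JSON format, and the returned dictionary of outputs are all assumptions made for illustration; the class deliberately does not depend on emloop.

```python
import json
import os
import tempfile
from collections import Counter


class MajorityModelSketch:
    """Hypothetical toy model that always predicts the most frequent label seen so far."""

    def __init__(self, log_dir):
        self._log_dir = log_dir
        self._counts = Counter()

    def run(self, batch, train):
        # Update the in-memory state only when training is requested.
        if train:
            self._counts.update(batch['animal'])
        guess = self._counts.most_common(1)[0][0] if self._counts else None
        predicted = [guess] * len(batch['animal'])
        errors = sum(p != a for p, a in zip(predicted, batch['animal']))
        # The loss here is simply the error rate on this batch.
        loss = errors / max(len(batch['animal']), 1)
        return {'predicted_animal': predicted, 'loss': loss}

    def save(self, name_suffix):
        # Persist the state; without this call the model lives only in memory.
        path = os.path.join(self._log_dir, 'model_{}.json'.format(name_suffix))
        with open(path, 'w') as file:
            json.dump(dict(self._counts), file)
        return path


log_dir = tempfile.mkdtemp()
model = MajorityModelSketch(log_dir)
model.run({'animal': ['cat', 'cat', 'dog']}, train=True)
outputs = model.run({'animal': ['cat', 'dog']}, train=False)
saved_path = model.save(name_suffix='0')
```

The second call passes train=False, so it produces outputs without changing the label counts.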

The pseudocode of model training, evaluation and saving may look as follows. Note that this loop is managed automatically by emloop.MainLoop; the snippet is shown only to demonstrate the process.

# `model` construction should be here

for epoch_id in range(10):
    for train_batch in dataset.train_stream():
        model.run(batch=train_batch, train=True)

    for test_batch in dataset.test_stream():
        model.run(batch=test_batch, train=False)

    model.save(name_suffix=str(epoch_id))

Restoring the Model

Once the model is successfully saved, it can also be restored. This is done when the training is about to continue (emloop resume) or in a production environment (emloop eval <stream_name>). Both commands expect a single positional argument specifying from where the model shall be loaded. This argument is called restore_from and it is passed to the model constructor (see below).

If the restore_from argument is passed to the constructor, the model attempts to restore itself. Most often, it will treat the argument as a file path and load that file, yet the behavior is model-specific and may be implemented differently.
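A constructor that treats restore_from as a file path might look like the following sketch. RestorableModelSketch, its JSON checkpoint format and the seen_batches counter are hypothetical and chosen only to keep the example self-contained; real models restore whatever their backend stores.

```python
import json
import os
import tempfile


class RestorableModelSketch:
    """Hypothetical model whose whole state is one JSON-serializable dict."""

    def __init__(self, dataset, log_dir, restore_from=None):
        self._dataset = dataset
        self._log_dir = log_dir
        if restore_from is None:
            # No checkpoint given: create a fresh model.
            self.state = {'seen_batches': 0}
        else:
            # Checkpoint given: interpret it as a file path and load it.
            with open(restore_from) as file:
                self.state = json.load(file)

    def run(self, batch, train):
        if train:
            self.state['seen_batches'] += 1
        return {}

    def save(self, name_suffix):
        path = os.path.join(self._log_dir, 'model_{}.json'.format(name_suffix))
        with open(path, 'w') as file:
            json.dump(self.state, file)
        return path


log_dir = tempfile.mkdtemp()
original = RestorableModelSketch(dataset=None, log_dir=log_dir)
for _ in range(2):
    original.run(batch={}, train=True)
checkpoint = original.save(name_suffix='final')

# A second instance picks up where the first one left off.
restored = RestorableModelSketch(dataset=None, log_dir=log_dir, restore_from=checkpoint)
```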

To restore the model, emloop needs to know which class to instantiate so that it can call the constructor with the given restore_from argument. The class is inferred from the dumped configuration file in the output directory, specifically from the model.class entry. However, there are cases in which the original class cannot be constructed (for example, somebody deleted the source code with the model implementation). For these cases, each model should implement an emloop.models.AbstractModel.restore_fallback() method, which usually points to a backend-specific base class that is able to restore the saved files of all its subclasses. For instance, in the emloop-tensorflow backend, the emloop.models.AbstractModel.restore_fallback() method returns emloop_tensorflow.BaseModel, which is able to load any checkpoint without the need for the original model source code.
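The idea of falling back to another class can be sketched in a few lines. This is not emloop's actual implementation: load_class and resolve_model_class are hypothetical helpers, the fallback name is simply passed in as a parameter (how emloop obtains it is not covered here), and a standard library class stands in for the backend base class.

```python
import importlib


def load_class(fully_qualified_name):
    """Import `package.module.ClassName` and return the class object."""
    module_name, _, class_name = fully_qualified_name.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def resolve_model_class(config, fallback_name):
    """Try the class named in the configuration first, then the fallback."""
    try:
        return load_class(config['model']['class'])
    except (ImportError, AttributeError, ValueError):
        # The original class is gone (e.g. its sources were deleted).
        return load_class(fallback_name)


# 'deleted_package.MyModel' does not exist, so the fallback is used instead.
config = {'model': {'class': 'deleted_package.MyModel'}}
resolved = resolve_model_class(config, fallback_name='collections.OrderedDict')
```

Once the class is resolved, it would be constructed with the restore_from argument in the same way as the original class.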