Using ModelHandler with Gluon

MXNet’s Gluon framework allows neural networks to be written in an imperative style. ModelHandler is currently built around MXNet’s symbolic graph implementation, so models written in Gluon cannot be used with it directly.

If the model is written in Gluon using HybridBlocks (i.e. if the network consists entirely of predefined MXNet layers), then the model can be compiled as a symbolic graph by calling .hybridize().

The Gluon-defined model can then be converted to a symbol and a set of parameters, which can be loaded as an MXNet Module and used with ModelHandler.

In this demo, we will show that you can define a model in Gluon using code from the Gluon MNIST demo and then convert it to a Module and use ModelHandler.

In [1]:
import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from mxnet import autograd as ag

import os

# Fixing the random seed
mx.random.seed(42)

Train model in Gluon

Define model in Gluon

In [2]:
mnist = mx.test_utils.get_mnist()
In [3]:
batch_size = 100
train_data = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_data = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)
In [4]:
# define network
net = nn.HybridSequential()
with net.name_scope():
    net.add(nn.Dense(128, activation='relu'))
    net.add(nn.Dense(64, activation='relu'))
    net.add(nn.Dense(10))

net.hybridize()
In [5]:
gpus = mx.test_utils.list_gpus()
ctx = [mx.gpu()] if gpus else [mx.cpu(0), mx.cpu(1)]
net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.02})

Training

In [6]:
%%time
epoch = 10
# Use Accuracy as the evaluation metric.
metric = mx.metric.Accuracy()
softmax_cross_entropy_loss = gluon.loss.SoftmaxCrossEntropyLoss()
for i in range(epoch):
    # Reset the train data iterator.
    train_data.reset()
    # Loop over the train data iterator.
    for batch in train_data:
        # Split the training data into slices along batch_axis
        # and copy each slice to a context.
        data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
        # Split the training labels into slices along batch_axis
        # and copy each slice to a context.
        label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
        outputs = []
        # Inside training scope
        with ag.record():
            for x, y in zip(data, label):
                z = net(x)
                # Computes softmax cross entropy loss.
                loss = softmax_cross_entropy_loss(z, y)
                # Backpropagate the error for one iteration.
                loss.backward()
                outputs.append(z)
        # Updates internal evaluation
        metric.update(label, outputs)
        # Make one step of parameter update. Trainer needs to know the
        # batch size of data to normalize the gradient by 1/batch_size.
        trainer.step(batch.data[0].shape[0])
    # Gets the evaluation result.
    name, acc = metric.get()
    # Reset evaluation result to initial state.
    metric.reset()
    print('training acc at epoch {}: {}={}'.format(i, name, acc))
training acc at epoch 0: accuracy=0.7816
training acc at epoch 1: accuracy=0.89915
training acc at epoch 2: accuracy=0.9134666666666666
training acc at epoch 3: accuracy=0.9225833333333333
training acc at epoch 4: accuracy=0.9305666666666667
training acc at epoch 5: accuracy=0.9366666666666666
training acc at epoch 6: accuracy=0.9418166666666666
training acc at epoch 7: accuracy=0.94585
training acc at epoch 8: accuracy=0.9495333333333333
training acc at epoch 9: accuracy=0.9532333333333334
CPU times: user 43.1 s, sys: 4.18 s, total: 47.3 s
Wall time: 29.7 s
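
The comments in the training loop describe two steps: each batch is split across the contexts in ctx, and trainer.step rescales the gradient by 1/batch_size. The pure-Python sketch below illustrates why the rescaling uses the full batch size rather than the slice size. It assumes plain SGD without momentum or weight decay and a toy one-parameter model; split_batch, slice_gradient and sgd_step are hypothetical helpers, not MXNet functions.

```python
def split_batch(batch, num_slices):
    """Split a list into num_slices contiguous slices (assumes even division)."""
    size = len(batch) // num_slices
    return [batch[i * size:(i + 1) * size] for i in range(num_slices)]


def slice_gradient(w, xs, ys):
    """Summed gradient of the squared error 0.5 * (w * x - y) ** 2 over one slice."""
    return sum((w * x - y) * x for x, y in zip(xs, ys))


def sgd_step(w, grads, batch_size, learning_rate):
    """Add up the per-slice gradients, rescale by 1 / batch_size, and update w."""
    total_grad = sum(grads)
    return w - learning_rate * total_grad / batch_size


# Toy batch of four samples, split across two hypothetical devices.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.0

x_slices = split_batch(xs, 2)
y_slices = split_batch(ys, 2)
grads = [slice_gradient(w, x, y) for x, y in zip(x_slices, y_slices)]

# The update uses the mean gradient over the whole batch, not over one slice.
w_new = sgd_step(w, grads, batch_size=len(xs), learning_rate=0.02)
print(w_new)
```

Dividing by the full batch size keeps the effective step size the same regardless of how many contexts the batch was split across.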

Testing

In [7]:
# Use Accuracy as the evaluation metric.
metric = mx.metric.Accuracy()
# Reset the validation data iterator.
val_data.reset()
# Loop over the validation data iterator.
for batch in val_data:
    # Split the validation data into slices along batch_axis
    # and copy each slice to a context.
    data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
    # Split the validation labels into slices along batch_axis
    # and copy each slice to a context.
    label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
    outputs = []
    for x in data:
        outputs.append(net(x))
    # Updates internal evaluation
    metric.update(label, outputs)
print('validation acc: {}={}'.format(*metric.get()))
assert metric.get()[1] > 0.94
validation acc: accuracy=0.9527
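
For reference, the accuracy reported above is the fraction of samples whose highest-scoring class matches the label. Below is a minimal pure-Python sketch of that calculation with made-up scores; the accuracy helper is hypothetical and only mirrors the idea behind mx.metric.Accuracy, not its implementation.

```python
def accuracy(scores, labels):
    """Fraction of rows whose arg-max class index equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Three samples, three classes; the scores are made-up raw network outputs.
scores = [[0.1, 2.0, -1.0],
          [1.5, 0.2, 0.3],
          [0.0, 0.1, 0.4]]
labels = [1, 0, 1]

print(accuracy(scores, labels))
```

Because the arg-max is unchanged by a softmax, the raw outputs of the final Dense(10) layer can be scored directly.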

Convert Gluon model to Module

Adapted from the snippet found `here <https://github.com/apache/incubator-mxnet/issues/9374>`__.

The symbol and parameters are extracted from the Gluon model and used to define a Module object.

In [8]:
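def block2symbol(block):
    # Trace the hybridized block with a symbolic input to obtain its graph
    data = mx.sym.Variable('data')
    sym = block(data)
    # Auxiliary states (e.g. BatchNorm running statistics) are kept apart
    # from the learnable arguments
    aux_names = set(sym.list_auxiliary_states())
    args = {}
    auxs = {}
    for k, v in block.collect_params().items():
        value = mx.nd.array(v.data().asnumpy())
        if k in aux_names:
            auxs[k] = value
        else:
            args[k] = value
    return sym, args, auxs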
In [9]:
def symbol2mod(sym, args, auxs, data_iter):
    mx_sym = mx.sym.SoftmaxOutput(data=sym, name='softmax')
    model = mx.mod.Module(symbol=mx_sym, context=mx.cpu(),
                          label_names=['softmax_label'])
    model.bind(data_shapes=data_iter.provide_data,
               label_shapes=data_iter.provide_label)
    model.set_params(args, auxs)
    return model
In [10]:
sym_params = block2symbol(net)
In [11]:
mod = symbol2mod(*sym_params, train_data)
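
Note that symbol2mod appends a SoftmaxOutput layer, so the Module's outputs are class probabilities, whereas the Gluon network returns the raw scores of its last Dense layer. The sketch below shows the softmax transformation itself in plain Python on made-up scores; the softmax helper is illustrative and is not the MXNet operator.

```python
import math


def softmax(scores):
    """Map raw scores to probabilities that sum to one."""
    # Subtracting the maximum keeps math.exp from overflowing.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw = [2.0, 1.0, 0.1]
probs = softmax(raw)
print(probs)
```

The ordering of the classes is preserved, so the predicted label is the same before and after the softmax.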

Alternative Method

Serialise Gluon model to file using .export().

Load the serialised model as an MXNet Module with Module.load() so that xfer can be used.

In [12]:
# model_name = 'gluon-model'
# net.export(model_name)

# mod = mx.mod.Module.load(model_name, 0, label_names=[])
# os.remove(model_name+'-symbol.json')
# os.remove(model_name+'-0000.params')
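
The two files removed in the commented-out cell follow MXNet's checkpoint naming convention: a symbol file named after the prefix, and a parameter file that also carries a zero-padded epoch number. Here is a small sketch of that convention in plain Python; checkpoint_files is a hypothetical helper, not an MXNet function.

```python
def checkpoint_files(prefix, epoch):
    """Return the (symbol, params) file names for a given prefix and epoch."""
    symbol_file = '{}-symbol.json'.format(prefix)
    params_file = '{}-{:04d}.params'.format(prefix, epoch)
    return symbol_file, params_file


print(checkpoint_files('gluon-model', 0))
```

This is why the cell passes epoch 0 to Module.load: the parameter file written by net.export(model_name) ends in -0000.params.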

Apply ModelHandler

Now we can load the model into ModelHandler and use it to visualise the model, return the layer names, extract features and much more!

In [13]:
import xfer
In [14]:
mh = xfer.model_handler.ModelHandler(mod)
In [15]:
# Show architecture of model
mh.visualize_net()
Out[15]:
../_images/demos_xfer-gluon-with-modelhandler_21_0.svg
In [16]:
mh.layer_names
Out[16]:
['hybridsequential0_dense0_fwd',
 'hybridsequential0_dense0_relu_fwd',
 'hybridsequential0_dense1_fwd',
 'hybridsequential0_dense1_relu_fwd',
 'hybridsequential0_dense2_fwd',
 'softmax']
In [17]:
# Get output from intermediate layers of the model
mh.get_layer_output(train_data, ['hybridsequential0_dense1_fwd'])
Out[17]:
(OrderedDict([('hybridsequential0_dense1_fwd',
               array([[ 1.93497527e+00,  2.40295935e+00,  1.16074115e-01, ...,
                       -4.74348217e-02, -3.76087427e-03,  1.39985621e+00],
                      [ 2.15391922e+00,  1.97971451e+00,  4.61517543e-01, ...,
                        2.28680030e-01, -8.29489648e-01,  9.69915807e-01],
                      [ 2.06626105e+00,  4.06703472e+00,  7.65578270e-01, ...,
                        3.74726385e-01,  1.03201318e+00, -5.41208267e-01],
                      ...,
                      [ 2.55671740e+00,  4.17255354e+00,  5.60081601e-01, ...,
                        5.68660349e-02, -1.58825326e+00,  1.59997427e+00],
                      [ 2.30686831e+00,  2.34434009e+00, -5.84015131e-01, ...,
                        3.16424906e-01, -1.08476102e-01,  6.86561584e-01],
                      [ 9.71719801e-01,  1.08340001e+00,  1.72682357e+00, ...,
                       -2.98302293e-01,  1.48507738e+00, -7.40276098e-01]], dtype=float32))]),
 array([8, 8, 6, ..., 8, 8, 4]))
In [18]:
mh.get_layer_type('hybridsequential0_dense0_relu_fwd')
Out[18]:
'Activation'
In [19]:
# Add/Remove layers from model output
mh.drop_layer_top(2)
mh.add_layer_top([mx.sym.FullyConnected(num_hidden=30),
                  mx.sym.Activation(act_type='relu'),
                  mx.sym.FullyConnected(num_hidden=10),
                  mx.sym.SoftmaxOutput()])
mh.visualize_net()
Out[19]:
../_images/demos_xfer-gluon-with-modelhandler_25_0.svg
In [20]:
# Add/remove layers from model input
mh.add_layer_bottom([mx.sym.Convolution(kernel=(2,2), num_filter=10)])
mh.visualize_net()
Out[20]:
../_images/demos_xfer-gluon-with-modelhandler_26_0.svg