Welcome to Xfer’s documentation!

Release: 1.1.0
Date: Jun 25, 2019

Xfer is a Transfer Learning framework written in Python.

Xfer provides Repurposers that take an MXNet model and either train a meta-model on top of it or modify the model itself for a new target dataset. To get started with Xfer, check out our introductory tutorial here.

The code can be found on our GitHub project page. It is open source and released under the Apache 2.0 license.

Deep Transfer Learning with Xfer

Transfer learning in 3 lines of code:

repurposer = xfer.LrRepurposer(source_model, feature_layer_names=['fc7'])
repurposer.repurpose(train_iterator)
predictions = repurposer.predict_label(test_iterator)

Keep reading below to see Xfer in action!

Overview

What is Xfer?

Xfer is a library that allows quick and easy transfer of knowledge stored in deep neural networks. It can be used for the classification of data of arbitrary numeric format, and can be applied to the common cases of image or text data.

Xfer can be used as a pipeline that spans from extracting features to training a repurposer. The repurposer is then an object that performs classification in the target task.

You can also use individual components of Xfer as part of your own pipeline. For example, you can use the feature extractor to pull features out of deep neural networks, or ModelHandler, which lets you build and modify neural networks quickly even if you are not an MXNet expert.
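
For example, a minimal sketch of standalone feature extraction might look like the following (it assumes source_model is a pre-trained mxnet.module.Module and data_iterator is an MXNet data iterator for your dataset):

import xfer

mh = xfer.model_handler.ModelHandler(source_model)
# Extract features from the 'fc7' layer; returns a dict of feature arrays and the labels
features, labels = mh.get_layer_output(data_iterator=data_iterator, layer_names=['fc7'])
print(features['fc7'].shape)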

How can Xfer help me?

  • Resource efficiency: you don’t have to train big neural networks from scratch.
  • Data efficiency: by transferring knowledge, you can classify complex data even if you have very few labels.
  • Easy access to neural networks: you don’t need to be an ML ninja in order to leverage the power of neural networks. With Xfer you can easily re-use them or even modify existing architectures and create your own solution.
  • Uncertainty modeling: With the Bayesian neural network (BNN) or the Gaussian process (GP) repurposers, you can obtain uncertainty in the predictions of the repurposer.
  • Utilities for feature extraction from neural networks.
  • Rapid prototyping.

This Demo

In this notebook we demonstrate Xfer on an image classification task. A pre-trained neural network is selected as the source model, and we transfer its knowledge to a classification task in the target domain. The target task uses a much smaller set of images from a different domain (hand-drawn sketches), so the source classifier cannot be used as-is without repurposing. The aim is therefore to train a new classifier, and because the target dataset is extremely scarce, transferring knowledge from the source task is vital. The new classifier for the target task is either a meta-model or a modified and fine-tuned clone of the source task’s neural network.

Components

Xfer is made up of 2 components:

  • ModelHandler - Extracts features from a pretrained model and performs model manipulation
  • Repurposer - Repurposes a model for the target task

Transfer Learning Pipeline

In the following, we demonstrate the Xfer workflow:

  1. Create a data iterator
  2. Select a pre-trained model (i.e. pick a source task)
  3. Extract features with the ModelHandler
  4. Use a Repurposer to perform transfer learning from the source task to the target task

First we import or define all relevant modules and utilities.

In [1]:
import numpy as np
import os
import json
import random
import logging
import glob

import mxnet as mx
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import classification_report
from matplotlib import pylab as plt
%matplotlib inline

import xfer

seed=2
random.seed(seed)
np.random.seed(seed)
mx.random.seed(seed)

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Change the default option below to test Xfer on other datasets (or use your own!).
TEST_IMAGES = 'test_sketches/' # Options: 'test_images' or 'test_sketches' or 'test_images_sketch'
In [2]:
def get_iterators_from_folder(data_dir, train_size=0.6, batchsize=10, label_name='softmax_label', data_name='data', random_state=1):
    """
    Method to create iterators from data stored in a folder with the following structure:
    /data_dir
        /class1
            class1_img1
            class1_img2
            ...
            class1_imgN
        /class2
            class2_img1
            class2_img2
            ...
            class2_imgN
        ...
        /classN
    """
    # assert dir exists
    if not os.path.isdir(data_dir):
        raise ValueError('Directory not found: {}'.format(data_dir))
    # get class names
    classes = [x.split('/')[-1] for x in glob.glob(data_dir+'/*')]
    classes.sort()
    fnames = []
    labels = []
    for c in classes:
            # get all the image filenames and labels
            images = glob.glob(data_dir+'/'+c+'/*')
            images.sort()
            fnames += images
            labels += [c]*len(images)
    # create label2id mapping
    id2label = dict(enumerate(set(labels)))
    label2id = dict((v,k) for k, v in id2label.items())

    # get indices of train and test
    sss = StratifiedShuffleSplit(n_splits=2, test_size=None, train_size=train_size, random_state=random_state)
    train_indices, test_indices = next(sss.split(labels, labels))

    train_img_list = []
    test_img_list = []
    train_labels = []
    test_labels = []
    # create imglist for training and test
    for idx in train_indices:
        train_img_list.append([label2id[labels[idx]], fnames[idx]])
        train_labels.append(label2id[labels[idx]])
    for idx in test_indices:
        test_img_list.append([label2id[labels[idx]], fnames[idx]])
        test_labels.append(label2id[labels[idx]])

    # make iterators
    train_iterator = mx.image.ImageIter(batchsize, (3,224,224), imglist=train_img_list, label_name=label_name, data_name=data_name,
                                        path_root='')
    test_iterator = mx.image.ImageIter(batchsize, (3,224,224), imglist=test_img_list, label_name=label_name, data_name=data_name,
                                      path_root='')

    return train_iterator, test_iterator, train_labels, test_labels, id2label, label2id


def get_images(iterator):
    """
    Returns list of image arrays from iterator
    """
    iterator.reset()
    images = []
    while True:
        try:
            batch = iterator.next().data[0]
            for n in range(batch.shape[0]):
                images.append(batch[n])
        except StopIteration:
            break
    return images


def show_predictions(predictions, images, id2label, uncertainty=None, figsize=(9,1.2), fontsize=12, n=8):
    """
    Plots images with predictions as labels. If uncertainty is given, it is plotted below as a
    series of horizontal bar charts.
    """
    num_rows = 1 if uncertainty is None else 2

    plt.figure(figsize=figsize)
    for cc in range(n):
        plt.subplot(num_rows,n,1+cc)
        plt.tick_params(
                        axis='both',          # changes apply to both axes
                        which='both',      # both major and minor ticks are affected
                        bottom=False,      # ticks along the bottom edge are off
                        top=False,         # ticks along the top edge are off
                        left=False,
                        labelleft=False,
                        labelbottom=False) # labels along the bottom edge are off
        plt.imshow(np.uint8(images[cc].asnumpy().transpose((1,2,0))))
        plt.title(id2label[predictions[cc]].split(',')[0], fontsize=fontsize)
        plt.axis('off')

    if uncertainty is not None:
        pos = range(len(id2label.values()))
        for cc in range(n):
            plt.subplot(num_rows,n,n+1+cc)
            # Normalize the bars to be 0-1 for better readability.
            xx = uncertainty[cc]
            xx = (xx-min(xx))/(max(xx)-min(xx))
            plt.barh(pos, xx, align='center', height=0.3)
            if cc == 0:
                plt.yticks(pos, id2label.values())
            else:
                plt.gca().set_yticklabels([])
            plt.gca().set_xticklabels([])
            plt.grid(True)

Data Handling

In order for Xfer to process data, it must be given as an MXNet data iterator (mxnet.io.DataIter). MXNet expects labels to be sequential integers starting at zero, so we map all our string labels to integers to avoid any unexpected behaviour.
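
For illustration, a minimal sketch of such an iterator built from in-memory arrays (dummy data, not the demo dataset) could look like this:

import numpy as np
import mxnet as mx

data = np.random.rand(8, 3, 224, 224).astype('float32')   # 8 dummy images
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])                # sequential integer labels starting at zero
toy_iterator = mx.io.NDArrayIter(data, labels, batch_size=4, label_name='softmax_label')

In this demo we instead build mxnet.image.ImageIter objects directly from the image files, as shown below.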

The data handling portion of the workflow is made up of the following steps:

  • Get iterators
  • Get labels
  • Get label to idx mapping dictionary
In [3]:
# We have chosen to split the data into train and test at a 60:40 ratio and use a batchsize of 4
train_iterator, test_iterator, train_labels, test_labels, id2label, label2id = get_iterators_from_folder(TEST_IMAGES, 0.6, 4, label_name='prob_label', random_state=1)
INFO:root:Using 1 threads for decoding...
INFO:root:Set enviroment variable MXNET_CPU_WORKER_NTHREADS to a larger number to use more threads.
INFO:root:ImageIter: loading image list...
INFO:root:Using 1 threads for decoding...
INFO:root:Set enviroment variable MXNET_CPU_WORKER_NTHREADS to a larger number to use more threads.
INFO:root:ImageIter: loading image list...

Source Model

ModelHandler is an Xfer module which handles everything related to the source pre-trained neural network. It can extract features given a target dataset and source model, and it can also manipulate the pre-trained network by adding/removing/freezing layers (we’ll see this functionality in the next section). For now, we simply:

  • Load MXNet Module from file
  • Instantiate ModelHandler object with VGG-19 model as source model

The VGG-19 model is a convolutional neural network trained on ImageNet and is good at image classification. Other models trained on ImageNet are likely to be good source models for this task.

In [4]:
# Download model
path = 'http://data.mxnet.io/models/imagenet/'
[mx.test_utils.download(path+'vgg/vgg19-0000.params'),
mx.test_utils.download(path+'vgg/vgg19-symbol.json')]
INFO:root:vgg19-0000.params exists, skipping download
INFO:root:vgg19-symbol.json exists, skipping download
Out[4]:
['vgg19-0000.params', 'vgg19-symbol.json']
In [5]:
source_model = mx.module.Module.load('vgg19', 0, label_names=['prob_label'])
mh = xfer.model_handler.ModelHandler(source_model)

How well does the pre-trained network alone perform (without repurposing)?

This section will show how well the pre-trained source model performs before any repurposing is applied.

In [6]:
# Get pre-trained model without modifications
model = mh.get_module(iterator=test_iterator)
# Predict on our test data
predictions = np.argmax(model.predict(test_iterator), axis=1).asnumpy().astype(int)
In [7]:
# This utility lets us translate ImageNet class IDs to human-readable labels
with open('imagenet1000-class-to-human.json', 'r') as fp:
    imagenet_class_to_human = json.load(fp)

imagenet_class_to_human = {int(k): v for k, v in imagenet_class_to_human.items()}
In [8]:
# Plot all test images along with the predicted labels
images = get_images(test_iterator)

show_predictions(predictions, images, imagenet_class_to_human, None, (15, 1.5))
_images/demos_xfer-overview_15_0.png

The model performs badly on our sketch images - it thinks most of our drawings are hooks! The reason is that the label and image distributions in the target task are different (having come from a different dataset), i.e. the model has been trained on photographs of objects and so cannot sensibly classify these sketches. The results would get even worse if the source/target dataset mismatch were larger. A repurposing step is required to better align the pre-trained model with the target data.

Repurposing

(a) Repurposing with meta-models

By repurposing with meta models, we use the neural network as a feature extractor and fit a different model on these features.

In [9]:
# Instantiate a Logistic Regression repurposer (other options: SVM, GP, NN, BNN repurposers)
logging.info("Logistic Regression (LR) Repurposer")
repLR = xfer.LrRepurposer(source_model=source_model, feature_layer_names=['fc7'])
repLR.repurpose(train_iterator)
predictionsLR = repLR.predict_label(test_iterator)

logging.info("LR Repurposer - Classification Results")
print(classification_report(test_labels, predictionsLR, target_names=list(id2label.values()), digits=3))
INFO:root:Logistic Regression (LR) Repurposer
INFO:root:Extracting features from layers: fc7
INFO:root:Processed batch 1
INFO:root:Processed batch 2
INFO:root:Processed batch 3
INFO:root:Processed batch 4
INFO:root:Processed batch 5
INFO:root:Processed batch 6
 /anaconda/envs/matplotlib-backend/lib/python3.5/site-packages/sklearn/linear_model/sag.py:326: ConvergenceWarning:The max_iter was reached which means the coef_ did not converge
INFO:root:Extracting features from layers: fc7
INFO:root:Processed batch 1
INFO:root:Processed batch 2
INFO:root:Processed batch 3
INFO:root:Processed batch 4
INFO:root:LR Repurposer - Classification Results
             precision    recall  f1-score   support

       tree      1.000     0.750     0.857         4
        car      1.000     1.000     1.000         4
     cheese      1.000     1.000     1.000         4
      house      0.800     1.000     0.889         4

avg / total      0.950     0.938     0.937        16

In [10]:
show_predictions(predictionsLR, images, id2label, None, (15,1.5))
_images/demos_xfer-overview_20_0.png

(b) Fine-tuning Neural Network repurposer

Neural network repurposers will:

  • Modify the pretrained neural network architecture by adding and removing layers
  • Retrain the network with certain layers held fixed or randomised
In [11]:
# Choose which layers of the model to fix during training - more fixed layers lead to faster training
fixed_layers = ['conv1_1','conv1_2','conv2_1','conv2_2','conv3_1','conv3_2','conv3_3','conv3_4',
                'conv4_1','conv4_2', 'conv4_3','conv4_4','conv5_1','conv5_2','conv5_3', 'conv5_4']
# Choose which layers of the model to randomise before training - we may want to forget some of what
# this model knows
random_layers = []

repNN = xfer.NeuralNetworkRandomFreezeRepurposer(source_model, target_class_count=4, fixed_layers=fixed_layers, random_layers=random_layers)
repNN.repurpose(train_iterator)
predictionsNN = repNN.predict_label(test_iterator)
logging.info("NN Repurposer - Classification Results")
print(classification_report(test_labels, predictionsNN, target_names=list(id2label.values()), digits=3))
INFO:root:fc8, prob deleted from model top
INFO:root:Added new_fully_connected_layer, prob to model top
WARNING:root:Already bound, ignoring bind()
 /anaconda/envs/matplotlib-backend/lib/python3.5/site-packages/mxnet/module/base_module.py:488: UserWarning:Parameters already initialized and force_init=False. init_params call ignored.
INFO:root:Epoch[0] Train-accuracy=0.541667
INFO:root:Epoch[0] Time cost=33.846
INFO:root:Epoch[1] Train-accuracy=1.000000
INFO:root:Epoch[1] Time cost=22.451
INFO:root:Epoch[2] Train-accuracy=1.000000
INFO:root:Epoch[2] Time cost=24.884
INFO:root:Epoch[3] Train-accuracy=1.000000
INFO:root:Epoch[3] Time cost=23.454
INFO:root:Epoch[4] Train-accuracy=1.000000
INFO:root:Epoch[4] Time cost=24.850
INFO:root:NN Repurposer - Classification Results
             precision    recall  f1-score   support

       tree      1.000     0.750     0.857         4
        car      1.000     1.000     1.000         4
     cheese      1.000     1.000     1.000         4
      house      0.800     1.000     0.889         4

avg / total      0.950     0.938     0.937        16

The neural network repurposer is unlikely to perform well when the target dataset is extremely small.

(c) Repurposing with probability and uncertainty

Two repurposers offer well-calibrated predictive probabilities: GpRepurposer and BnnRepurposer. Here we explore the former (the latter is better suited to datasets that are not tiny).

In [12]:
# Instantiate a GP repurposer
repGP = xfer.GpRepurposer(source_model, feature_layer_names=['fc6'], apply_l2_norm=True)
repGP.repurpose(train_iterator)

logging.info("GP Repurposer - Classification Results")
uncertaintyGP = repGP.predict_probability(test_iterator)
predictionsGP = np.argmax(uncertaintyGP, axis=1)

print(classification_report(test_labels, predictionsGP,
    target_names=list(id2label.values()), digits=3))
INFO:root:Extracting features from layers: fc6
INFO:root:Processed batch 1
INFO:root:Processed batch 2
INFO:root:Processed batch 3
INFO:root:Processed batch 4
INFO:root:Processed batch 5
INFO:root:Processed batch 6
INFO:GP:initializing Y
INFO:GP:initializing inference method
INFO:GP:adding kernel and likelihood as parameters
INFO:GP:initializing Y
INFO:GP:initializing inference method
INFO:GP:adding kernel and likelihood as parameters
INFO:GP:initializing Y
INFO:GP:initializing inference method
INFO:GP:adding kernel and likelihood as parameters
INFO:GP:initializing Y
INFO:GP:initializing inference method
INFO:GP:adding kernel and likelihood as parameters
INFO:root:GP Repurposer - Classification Results
INFO:root:Extracting features from layers: fc6
INFO:root:Processed batch 1
INFO:root:Processed batch 2
INFO:root:Processed batch 3
INFO:root:Processed batch 4
             precision    recall  f1-score   support

       tree      1.000     0.750     0.857         4
        car      1.000     1.000     1.000         4
     cheese      1.000     1.000     1.000         4
      house      0.800     1.000     0.889         4

avg / total      0.950     0.938     0.937        16

The code below will plot the predictions and the probability for each class.

In [13]:
show_predictions(predictionsGP, images, id2label, uncertaintyGP, (17,3.8))
_images/demos_xfer-overview_27_0.png

We not only get predictions from this model, we can also see the model’s uncertainty for any given prediction, which allows us to make better decisions about our data.
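
As a rough illustration (this is not part of the Xfer API), the per-class probabilities returned by predict_probability can be collapsed into a single uncertainty score per image, for example the predictive entropy:

# Sketch: summarise per-class probabilities into predictive entropy (higher = more uncertain)
entropy = -np.sum(uncertaintyGP * np.log(uncertaintyGP + 1e-12), axis=1)
most_uncertain = np.argsort(entropy)[::-1][:3]  # indices of the three least confident test images
print(most_uncertain, entropy[most_uncertain])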

Other repurposers

We have seen the use of LrRepurposer, NeuralNetworkRandomFreezeRepurposer and GpRepurposer. Other repurposers offered are: SvmRepurposer, BnnRepurposer, NeuralNetworkFineTuneRepurposer.

You can also write your own repurposer.

Using Xfer on your own data

All you need to do is generate your own data iterator and use it instead of the iterators used above.
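
For example, if your images are arranged in one folder per class, a sketch reusing the get_iterators_from_folder helper defined earlier might look like this (the 'my_data/' path is hypothetical):

# 'my_data/' is a hypothetical directory with one sub-folder of images per class
my_train_iter, my_test_iter, my_train_labels, my_test_labels, my_id2label, my_label2id = \
    get_iterators_from_folder('my_data/', train_size=0.6, batchsize=4, label_name='prob_label')

repurposer = xfer.LrRepurposer(source_model, feature_layer_names=['fc7'])
repurposer.repurpose(my_train_iter)
predictions = repurposer.predict_label(my_test_iter)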

For more details, see the API documentation.

Model Handler

ModelHandler is a utility class for manipulating and inspecting MXNet models. It can be used to:

  • Add and remove layers from an existing model or “freeze” selected layers
  • Discover information such as layer names and types
  • Extract features from pretrained models

In this tutorial, we will demonstrate some of the key capabilities of ModelHandler.

Initialisation

In [1]:
import mxnet as mx
import logging

import xfer

logger = logging.getLogger()
logger.setLevel(logging.INFO)
In [2]:
# Download vgg19 (trained on imagenet)
path = 'http://data.mxnet.io/models/imagenet/'
[mx.test_utils.download(path+'vgg/vgg19-0000.params'),
mx.test_utils.download(path+'vgg/vgg19-symbol.json')]
INFO:root:vgg19-0000.params exists, skipping download
INFO:root:vgg19-symbol.json exists, skipping download
Out[2]:
['vgg19-0000.params', 'vgg19-symbol.json']
In [3]:
source_model = mx.module.Module.load('vgg19', 0, label_names=['prob_label'])

# The ModelHandler constructor takes an MXNet Module as input
mh = xfer.model_handler.ModelHandler(source_model)

Model Inspection

Layer names

In [4]:
print(mh.layer_names)
['conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1', 'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2', 'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3', 'relu3_3', 'conv3_4', 'relu3_4', 'pool3', 'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3', 'relu4_3', 'conv4_4', 'relu4_4', 'pool4', 'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3', 'relu5_3', 'conv5_4', 'relu5_4', 'pool5', 'flatten_0', 'fc6', 'relu6', 'drop6', 'fc7', 'relu7', 'drop7', 'fc8', 'prob']

Layer Types

Given the name of a layer, get_layer_type() returns the layer’s type.

In [5]:
print(mh.get_layer_type('relu5_2'))
print(mh.get_layer_type('flatten_0'))
print(mh.get_layer_type('fc7'))
print(mh.get_layer_type('conv5_3'))
print(mh.get_layer_type('prob'))
Activation
Flatten
FullyConnected
Convolution
SoftmaxOutput

ModelHandler can be used to get a list of layers that are of a specific type.

In [6]:
import xfer.model_handler.consts as consts

print(mh.get_layer_names_matching_type('Convolution'))
print(mh.get_layer_names_matching_type('Pooling'))
print(mh.get_layer_names_matching_type('Activation'))
print(mh.get_layer_names_matching_type('BatchNorm'))
['conv1_1', 'conv1_2', 'conv2_1', 'conv2_2', 'conv3_1', 'conv3_2', 'conv3_3', 'conv3_4', 'conv4_1', 'conv4_2', 'conv4_3', 'conv4_4', 'conv5_1', 'conv5_2', 'conv5_3', 'conv5_4']
['pool1', 'pool2', 'pool3', 'pool4', 'pool5']
['relu1_1', 'relu1_2', 'relu2_1', 'relu2_2', 'relu3_1', 'relu3_2', 'relu3_3', 'relu3_4', 'relu4_1', 'relu4_2', 'relu4_3', 'relu4_4', 'relu5_1', 'relu5_2', 'relu5_3', 'relu5_4', 'relu6', 'relu7']
[]

Architecture Visualization

In [7]:
mh.visualize_net()
Out[7]:
_images/demos_xfer-modelhandler_12_0.svg

Feature Extraction

ModelHandler makes it easy to extract features from a dataset using a pretrained model.

By passing an MXNet DataIterator and a list of the layers to extract features from, the get_layer_output() method will return a feature dictionary and an ordered list of labels.

In [8]:
imglist = [[0, 'test_images/accordion/accordion_1.jpg'], [0, 'test_images/accordion/accordion_2.jpg'], [0, 'test_images/accordion/accordion_3.jpg'],
           [0, 'test_images/accordion/accordion_4.jpg'], [0, 'test_images/accordion/accordion_5.jpg'], [1, 'test_images/ant/ant_1.jpg'],
           [1, 'test_images/ant/ant_2.jpg'], [1, 'test_images/ant/ant_3.jpg'], [1, 'test_images/ant/ant_4.jpg'], [1, 'test_images/ant/ant_5.jpg'],
           [2, 'test_images/anchor/anchor_1.jpg'], [2, 'test_images/anchor/anchor_2.jpg'], [2, 'test_images/anchor/anchor_3.jpg'],
           [2, 'test_images/anchor/anchor_4.jpg'], [2, 'test_images/anchor/anchor_5.jpg'], [3, 'test_images/airplanes/airplanes_1.jpg'],
           [3, 'test_images/airplanes/airplanes_2.jpg'], [3, 'test_images/airplanes/airplanes_3.jpg'], [3, 'test_images/airplanes/airplanes_4.jpg'],
           [3, 'test_images/airplanes/airplanes_5.jpg']]
iterator = mx.img.ImageIter(imglist=imglist, batch_size=4, path_root='', data_shape=(3, 224, 224))
INFO:root:Using 1 threads for decoding...
INFO:root:Set enviroment variable MXNET_CPU_WORKER_NTHREADS to a larger number to use more threads.
INFO:root:ImageIter: loading image list...
In [9]:
features, labels = mh.get_layer_output(data_iterator=iterator, layer_names=['fc6', 'fc8'])

print('Shape of output from fc6:', features['fc6'].shape)
print('Shape of output from fc8:', features['fc8'].shape)
print('Labels:', labels)

print('Subset of example feature:', features['fc8'][0,:100])
INFO:root:Extracting features from layers: fc6 fc8
INFO:root:Processed batch 1
INFO:root:Processed batch 2
INFO:root:Processed batch 3
INFO:root:Processed batch 4
INFO:root:Processed batch 5
Shape of output from fc6: (20, 4096)
Shape of output from fc8: (20, 1000)
Labels: [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3]
Subset of example feature: [-2.5173218  -3.0606608  -0.18566455 -1.195584   -1.6561157  -0.3466432
 -0.27523482 -3.0833778  -3.987122   -4.711858   -2.5934136  -1.1821984
  0.06423883 -3.5403426   0.36486915 -2.0515091  -3.3651357   0.47566187
 -0.93592185 -0.53005326 -2.7707744  -1.2674817  -2.3202353  -0.33125317
 -1.5847255  -4.2490544  -4.2170153  -5.6999183  -2.6653297  -3.4800928
 -4.693992   -3.4104934  -3.673527   -4.2224913  -0.29074478 -6.513745
 -4.4927287  -4.5361094  -2.549627    2.1703975  -1.3125131  -2.1347325
 -3.761081   -2.3712082  -3.8052034  -1.6259451  -1.68117    -1.481512
 -2.2081814  -1.5731778  -1.287838    1.2327844  -3.9466934  -3.9385183
 -0.87836707 -2.9489741  -3.4411037  -4.030957   -1.4967936  -3.7117271
 -2.2397022  -3.325867   -2.8145652  -0.63274264 -3.2671835  -1.8046627
 -3.0445974  -1.5151932  -2.7235372   6.5550556  -1.62281    -1.5104069
  1.3592944   0.8891826  -0.14048216 -1.0063077  -1.5578198  -0.45763612
 -2.0689113   3.2839453  -2.0749338  -4.179339   -0.49392343 -0.5244163
 -1.9723302  -0.07367857 -2.2878125  -0.96980214 -2.8648748  -2.6847577
 -3.5610118  -3.7286394  -4.5710897  -4.949738   -0.80546796 -5.8007493
 -3.260846   -6.434879   -4.7502995  -4.953493  ]

These features can be used for training a meta-model or for clustering as shown in this example.

In [10]:
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

reduced_data = PCA(n_components=2).fit_transform(features['fc8'])

kmeans = KMeans(init='k-means++', n_clusters=4, n_init=10)
kmeans.fit(reduced_data)

h=0.1

x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1
y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.imshow(Z, interpolation='nearest',
           extent=(xx.min(), xx.max(), yy.min(), yy.max()),
           cmap=plt.cm.Paired,
           aspect='auto', origin='lower')
plt.plot(reduced_data[:, 0], reduced_data[:, 1], 'k.', markersize=7)
Out[10]:
[<matplotlib.lines.Line2D at 0x11a199128>]
_images/demos_xfer-modelhandler_17_1.png

Model Manipulation

Modifying models in MXNet can be problematic because symbols are held as graphs. This means that modifying the input of the model requires the graph to be reconstructed above any changes made. ModelHandler takes care of this for you, which means that adding and removing layers from either end of a neural network can be done in 1-2 lines of code.

Remove layers

In [11]:
# Dropping 4 layers from the top of the layer hierarchy (where top = output)
mh.drop_layer_top(4)
print(mh.layer_names)
INFO:root:relu7, drop7, fc8, prob deleted from model top
['conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1', 'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2', 'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3', 'relu3_3', 'conv3_4', 'relu3_4', 'pool3', 'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3', 'relu4_3', 'conv4_4', 'relu4_4', 'pool4', 'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3', 'relu5_3', 'conv5_4', 'relu5_4', 'pool5', 'flatten_0', 'fc6', 'relu6', 'drop6', 'fc7']
In [12]:
# Dropping a layer from the bottom of the layer hierarchy (where bottom = input)
mh.drop_layer_bottom(1)
print(mh.layer_names)
INFO:root:conv1_1 deleted from model bottom
['relu1_1', 'conv1_2', 'relu1_2', 'pool1', 'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2', 'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3', 'relu3_3', 'conv3_4', 'relu3_4', 'pool3', 'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3', 'relu4_3', 'conv4_4', 'relu4_4', 'pool4', 'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3', 'relu5_3', 'conv5_4', 'relu5_4', 'pool5', 'flatten_0', 'fc6', 'relu6', 'drop6', 'fc7']

Add layers

Layers can be added to models by first defining the layer with an mxnet.symbol object and using add_layer_top() or add_layer_bottom() to add the layer to the model.

In [13]:
# define layer symbols
fc = mx.sym.FullyConnected(name='fullyconntected1', num_hidden=4)
softmax = mx.sym.SoftmaxOutput(name='softmax')
conv1 = mx.sym.Convolution(name='convolution1', kernel=(20,20), num_filter=64)

# Add layer to the bottom of the layer hierarchy (where bottom = input)
mh.add_layer_bottom([conv1])
# Add layer to the top of the layer hierarchy (where top = output)
mh.add_layer_top([fc, softmax])

print(mh.layer_names)
INFO:root:Added convolution1 to model bottom
INFO:root:Added fullyconntected1, softmax to model top
['convolution1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1', 'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2', 'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3', 'relu3_3', 'conv3_4', 'relu3_4', 'pool3', 'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3', 'relu4_3', 'conv4_4', 'relu4_4', 'pool4', 'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3', 'relu5_3', 'conv5_4', 'relu5_4', 'pool5', 'flatten_0', 'fc6', 'relu6', 'drop6', 'fc7', 'fullyconntected1', 'softmax']

Once a model has been modified, ModelHandler can be used to return an MXNet Module which can then be used for training.

There is an option to specify parameters which should stay fixed during training or should be randomised before training to allow different modes of transfer learning.

In [14]:
# In this case, the conv1_1 layer will stay fixed during training and the layers fc6 and fc7 will be randomised prior to training
mod = mh.get_module(iterator,
                    fixed_layer_parameters=mh.get_layer_parameters(['conv1_1']),
                    random_layer_parameters=mh.get_layer_parameters(['fc6', 'fc7']))
In [15]:
iterator.reset()
mod.fit(iterator, num_epoch=5)
WARNING:root:Already bound, ignoring bind()
/anaconda/envs/xfer_env/lib/python3.5/site-packages/mxnet/module/base_module.py:488: UserWarning: Parameters already initialized and force_init=False. init_params call ignored.
  allow_missing=allow_missing, force_init=force_init)
INFO:root:Epoch[0] Train-accuracy=0.100000
INFO:root:Epoch[0] Time cost=58.709
INFO:root:Epoch[1] Train-accuracy=0.250000
INFO:root:Epoch[1] Time cost=56.932
INFO:root:Epoch[2] Train-accuracy=0.250000
INFO:root:Epoch[2] Time cost=52.735
INFO:root:Epoch[3] Train-accuracy=0.250000
INFO:root:Epoch[3] Time cost=61.472
INFO:root:Epoch[4] Train-accuracy=0.250000
INFO:root:Epoch[4] Time cost=58.052

We now have a trained model ready to be used for prediction. This new model isn’t very useful but demonstrates the concept: to train a better model, use more data and experiment with combinations of fixed and random layers.
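
As a quick sketch, the fitted Module returned by ModelHandler can now produce predictions on the same iterator used for training:

# Predict with the trained Module; argmax over the class probabilities gives the predicted class ids
iterator.reset()
predicted_classes = mod.predict(iterator).asnumpy().argmax(axis=1)
print(predicted_classes)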

Now that you have seen what ModelHandler can do, try it out for yourself!

For more details, see the API docs.

Transfer learning for text categorization

In this notebook we showcase how to use Xfer to tackle a simple transfer learning task for text categorization. To that end, we use the 20 Newsgroups text dataset (http://qwone.com/~jason/20Newsgroups/), which comprises a collection of approximately 20K newsgroup posts.

We use a Convolutional Neural Network (CNN) pre-trained on a subset of 13 classes (~12K instances). For the target task, we assume that we have access to a much smaller dataset with 100 posts from the remaining 7 categories. This is a common situation in many real-world applications, where the number of categories grows over time as we collect new data; we always start with a low number of labelled instances for the new categories (the cold-start problem).

In this scenario, training a neural network from scratch on the target task is not feasible: due to the scarcity of labeled instances (100 in this case) and the large number of parameters, the model would be prone to overfitting. Instead, we use Xfer to transfer the knowledge from the source model and obtain a data-efficient classifier.

In [1]:
import warnings
warnings.filterwarnings('ignore')

import re
import numpy as np
import pickle
from collections import Counter
import itertools
import mxnet as mx

import xfer

mx.random.seed(1)
np.random.seed(1)

%matplotlib inline
import matplotlib.pyplot as plt
In [2]:
config = {
    "source_vocab": "NewsGroupsSourceVocabulary.pickle",
    "model_prefix_source":'NewsGroupsSourceModel',
    "num_epoch_source": 100,
    "batch_size": 100,
    "num_train": 100,
    "context": mx.cpu(),
}

A) Load the pre-trained model

Before building the target dataset, we load the pre-trained model into an MXNet Module. In this case, the pre-trained model is a CNN that was trained on instances belonging to the following 13 categories, none of which are used in the target task:

  • comp.graphics
  • comp.os.ms-windows.misc
  • comp.sys.ibm.pc.hardware
  • comp.windows.x
  • misc.forsale
  • rec.motorcycles
  • rec.sport.baseball
  • sci.crypt
  • sci.med
  • sci.space
  • talk.politics.mideast
  • talk.politics.misc
  • talk.religion.misc
In [3]:
import zipfile
with zipfile.ZipFile("{}-{:04}.params.zip".format(config["model_prefix_source"], config["num_epoch_source"]),"r") as zip_ref:
    zip_ref.extractall()
In [4]:
sym, arg_params, aux_params = mx.model.load_checkpoint(config["model_prefix_source"], config["num_epoch_source"])
mx.viz.plot_network(sym)
Out[4]:
_images/demos_xfer-text-transfer_6_0.svg

B) Load the target dataset into an iterator

We define a helper function to download the dataset. The 7 classes used for the target task are the following:

  • alt.atheism
  • comp.sys.mac.hardware
  • rec.autos
  • rec.sport.hockey
  • sci.electronics
  • soc.religion.christian
  • talk.politics.guns
In [5]:
def download_dataset():
    from sklearn.datasets import fetch_20newsgroups
    categories=['alt.atheism',
                 'comp.sys.mac.hardware',
                 'rec.autos',
                 'rec.sport.hockey',
                 'sci.electronics',
                 'soc.religion.christian',
                 'talk.politics.guns'
                ]

    newsgroups_train = fetch_20newsgroups(subset='train',categories=categories)
    newsgroups_test = fetch_20newsgroups(subset='test',categories=categories)

    x_text = np.concatenate((newsgroups_train.data, newsgroups_test.data), axis=0)
    labels = np.concatenate((newsgroups_train.target, newsgroups_test.target))

    return x_text, labels, categories

In addition, we use two helper classes to create the corpus:

  • Vocabulary: creates the lexicon for a given corpus. In addition, it provides a basic string cleaning function based on regular expressions.
  • Corpus: given a corpus (text and labels) and a Vocabulary object, it converts the text instances into a numerical format. In particular, it uses the provided vocabulary object to tokenize and clean the text instances. Then, it pads the sentences using max_length/fix_length and the padding symbol defined in the vocabulary object. Finally, each token is encoded into a one-hot vector using the vocabulary. It also provides a helper function to build the training and test sets.

In [6]:
class Vocabulary(object):
    def __init__(self, sentences, padding_word="</s>", unknown_word="</ukw>"):
        self.padding_word = padding_word
        self.unknown_word = unknown_word
        sentences = [self.clean_str(sent).split(" ") for sent in sentences]
        self.max_length = max(len(x) for x in sentences)
        self.word_counts = Counter(itertools.chain(*sentences))
        self.id2word = [x[0] for x in self.word_counts.most_common()]
        self.id2word.append(self.padding_word)
        self.id2word.append(self.unknown_word)
        self.word2id = {x: i for i, x in enumerate(self.id2word)}

        print('Vocabulary size', len(self.id2word))

    def clean_str(self, string):
        string = re.sub(r"[^A-Za-z0-9(),;!?\']", " ", string)
        contractions = ["\'t", "\'ve", "\'d", "\'s", "\'ll", "\'m", "\'er"]
        punctuations =  [",", ";", "!", "\?", "\)", "\("]
        for ee in contractions + punctuations:
            string = re.sub(r"{}".format(ee), " {} ".format(ee), string)
        return string.strip().lower()
In [7]:
class Corpus(object):
    def __init__(self, sentences, labels, vocabulary, max_length=None, fix_length=None):
        self.vocabulary = vocabulary
        self.max_length = max_length
        self.fix_length = fix_length
        sentences = [self.vocabulary.clean_str(sent).split(" ") for sent in sentences]
        sentences_padded = self._pad_sentences(sentences, self.vocabulary.padding_word, self.max_length, self.fix_length)
        x = []
        for sentence in sentences_padded:
            x.append([self.vocabulary.word2id.get(word, self.vocabulary.word2id[self.vocabulary.unknown_word]) for word in sentence])
        self.x = np.array(x)
        self.y = np.array(labels)

        print('Data shape:', self.x.shape)
        print('Vocabulary size', len(vocabulary.id2word))
        print('Maximum number words per sentence', self.x.shape[1])
        print('Number of labels', len(np.unique(self.y)))

    def _pad_sentences(self, sentences, padding_word="</s>", max_length=None, fix_length=None):
        sequence_length = max(len(x) for x in sentences)
        if max_length is not None:
            sequence_length = min(sequence_length, max_length)
        if fix_length is not None:
            sequence_length = fix_length
        padded_sentences = []
        for i in range(len(sentences)):
            sentence = sentences[i]
            if len(sentence) > sequence_length:
                sentence = sentence[0:sequence_length]
            num_padding = sequence_length - len(sentence)
            new_sentence = sentence + [padding_word] * num_padding
            padded_sentences.append(new_sentence)

        return padded_sentences

    def split_train_test(self, num_train):
        shuffle_indices = np.random.permutation(np.arange(len(self.y)))
        x_shuffled = self.x[shuffle_indices]
        y_shuffled = self.y[shuffle_indices]
        x_train, x_dev = x_shuffled[:num_train], x_shuffled[num_train:]
        y_train, y_dev = y_shuffled[:num_train], y_shuffled[num_train:]

        print('Train/Test split: %d/%d' % (len(y_train), len(y_dev)))
        print('Train shape:', x_train.shape)
        print('Test shape:', x_dev.shape)

        return x_train, x_dev, y_train, y_dev

We import the vocabulary object of the source model

In [8]:
with open(config["source_vocab"], 'rb') as handle:
    vocab_source = pickle.load(handle)

We build the dataset for the target task and split it into training/test sets. Notice that we use only 100 instances for the training set, since we care about scenarios in which the target dataset is small compared to the source dataset.

In [9]:
x_text, labels, categories = download_dataset()
corpus = Corpus(x_text, labels, vocab_source, fix_length=arg_params["vocab_embed_weight"].shape[1])
x_train, x_dev, y_train, y_dev = corpus.split_train_test(config["num_train"])
Data shape: (6642, 300)
Vocabulary size 149371
Maximum number words per sentence 300
Number of labels 7
Train/Test split: 100/6542
Train shape: (100, 300)
Test shape: (6542, 300)

Finally, we load the training/test datasets into MXNet iterators:

In [10]:
train_iter = mx.io.NDArrayIter(x_train, y_train, config['batch_size'], shuffle=True)
val_iter = mx.io.NDArrayIter(x_dev, y_dev, config['batch_size'], shuffle=False)

C) How well do we predict without repurposing?

We create a new mxnet Module, bind the training/test iterators and load the parameters of the pre-trained model.

In [11]:
mod = mx.mod.Module(symbol=sym,
                    context=config["context"],
                    data_names=['data'],
                    label_names=['softmax_label'])
mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
mod.set_params(arg_params, aux_params)
In [12]:
mod.save_checkpoint("NewsGroupsSourceModelCompressed",1)

If we use the pre-trained network directly, we obtain poor performance since it was trained for a different task (a different set of classes).

In [13]:
y_predicted = mod.predict(val_iter).asnumpy().argmax(axis=1)
print("Accuracy score before repurposing: {}".format(np.mean((y_predicted == y_dev))))
Accuracy score before repurposing: 0.07169061449098135

D) Repurposing

For this demo, we use one class of repurposers called MetaModelRepurposer. These are repurposers that use the pre-trained model as a feature extractor and fit a different model on the extracted features. In particular, we use an LrRepurposer, which fits a logistic regression on the extracted features.

In [14]:
lr_repurposer = xfer.LrRepurposer(mod, ["dropout0"])
lr_repurposer.repurpose(train_iter)
y_predicted = lr_repurposer.predict_label(val_iter)
In [15]:
print("Accuracy score after repurposing: {}".format(np.mean((y_predicted == y_dev))))
Accuracy score after repurposing: 0.5889636196881688

Training the model from scratch results in an accuracy of approximately 0.5, and it is much slower to train, typically requiring GPUs to speed up the process.

Finally, we show the full pipeline to categorize a new test example. In this case, we have used the first paragraphs of the Wikipedia article on “Automotive Electronics”. (Feel free to replace it with your own example!)

In [16]:
test_example = ["""
Automotive electronics are electronic systems used in vehicles, including engine management, ignition, radio, carputers, telematics, in-car entertainment systems and others. Ignition, engine, and transmission electronics are also found in trucks, motorcycles, off-road vehicles, and other internal combustion-powered machinery such as forklifts, tractors, and excavators. Related elements for control of relevant electrical systems are found on hybrid vehicles and electric cars as well.

Electronic systems have become an increasingly large component of the cost of an automobile, from only around 1% of its value in 1950 to around 30% in 2010.[1]

The earliest electronics systems available as factory installations were vacuum tube car radios, starting in the early 1930s. The development of semiconductors after WWII greatly expanded the use of electronics in automobiles, with solid-state diodes making the automotive alternator the standard after about 1960, and the first transistorized ignition systems appearing about 1955.
"""]
In [17]:
corpus = Corpus(test_example, [1], vocab_source, fix_length=arg_params["vocab_embed_weight"].shape[1])
Data shape: (1, 300)
Vocabulary size 149371
Maximum number words per sentence 300
Number of labels 1
In [18]:
test_example_iter = mx.io.NDArrayIter(corpus.x, corpus.y, len(corpus.y), shuffle=False)
In [19]:
y_predicted_prob = lr_repurposer.predict_probability(test_example_iter)
In [20]:
plt.barh(list(range(y_predicted_prob.shape[1])), y_predicted_prob[0,:], align='center', height=0.3)
plt.yticks(list(range(y_predicted_prob.shape[1])), categories)
plt.title("Estimated probability for each of the categories")
plt.grid(True)
_images/demos_xfer-text-transfer_35_0.png

We can see that the two main identified topics are electronics and autos, which is what we would expect given the title of the article.
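
As a small follow-up sketch, the two most probable categories can be read directly from the probability vector:

# Print the two highest-probability categories for the test example
top2 = np.argsort(y_predicted_prob[0])[::-1][:2]
for idx in top2:
    print('{}: {:.3f}'.format(categories[idx], y_predicted_prob[0, idx]))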

Xfer with HyperParameter Optimization

When training neural networks, hyperparameters may have to be tuned to improve accuracy metrics. The purpose of this notebook is to demonstrate how to do HyperParameter Optimization (HPO) when repurposing neural networks in Xfer. Here, we use emukit to do HPO through Bayesian Optimization.

Note that depending on the number of epochs, the target dataset and the transferability between source and target tasks, the default hyperparameter settings in Xfer may already give the desired results and HPO may not be required. If you do want to try HPO, this notebook shows how to do it using emukit.

In [1]:
import warnings
warnings.filterwarnings("ignore")

import logging
logging.disable(logging.WARNING)

import gc
import glob
import os
import random
import time

import emukit
import mxnet as mx
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
import xfer

from matplotlib import pylab as plt
%matplotlib inline

Utility methods

In [2]:
def set_random_seeds():
    seed = 1234
    np.random.seed(seed)
    mx.random.seed(seed)
    random.seed(seed)

def get_iterators(data_dir, train_size=0.3, validation_size=0.3, test_size=0.4, batch_size=1,
                  label_name='softmax_label', data_name='data', random_state=1):
    """
    Method to create iterators from data stored in a folder with the following structure:
    /data_dir
        /class1
            class1_img1 ... class1_imgN
        /class2
            class2_img1 ... class2_imgN
        ...
        /classN
    """
    set_random_seeds()
    # Assert data_dir exists
    if not os.path.isdir(data_dir):
        raise ValueError('Directory not found: {}'.format(data_dir))
    # Get class names
    classes = [x.split('/')[-1] for x in glob.glob(data_dir+'/*')]
    classes.sort()
    fnames = []
    labels = []
    for c in classes:
            # Get all the image filenames and labels
            images = glob.glob(data_dir+'/'+c+'/*')
            images.sort()
            fnames += images
            labels += [c]*len(images)
    # Create label2id mapping
    id2label = dict(enumerate(set(labels)))
    label2id = dict((v,k) for k, v in id2label.items())

    # Split training(train+validation) and test data
    sss = StratifiedShuffleSplit(n_splits=2, test_size=None, train_size=train_size+validation_size, random_state=random_state)
    train_indices, test_indices = next(sss.split(labels, labels))

    # Training data (train+validation)
    train_validation_images = []
    train_validation_labels = []
    for idx in train_indices:
        train_validation_images.append([label2id[labels[idx]], fnames[idx]])
        train_validation_labels.append(label2id[labels[idx]])

    # Test data
    test_images = []
    test_labels = []
    for idx in test_indices:
        test_images.append([label2id[labels[idx]], fnames[idx]])
        test_labels.append(label2id[labels[idx]])

    # Separate validation set and train set
    train_percent = train_size / (train_size+validation_size)
    sss_1 = StratifiedShuffleSplit(n_splits=2, test_size=None, train_size=train_percent, random_state=random_state)
    train_indices, validation_indices = next(sss_1.split(train_validation_labels, train_validation_labels))
    train_images = []
    train_labels = []
    for idx in train_indices:
        train_images.append(train_validation_images[idx])
        train_labels.append(train_validation_labels[idx])
    validation_images = []
    validation_labels = []
    for idx in validation_indices:
        validation_images.append(train_validation_images[idx])
        validation_labels.append(train_validation_labels[idx])

    # Create iterators
    train_iterator = mx.image.ImageIter(batch_size, (3,224,224), imglist=train_images, label_name=label_name,
                                        data_name=data_name, path_root='')
    validation_iterator = mx.image.ImageIter(batch_size, (3,224,224), imglist=validation_images, label_name=label_name,
                                             data_name=data_name, path_root='')
    train_validation_iterator = mx.image.ImageIter(batch_size, (3,224,224), imglist=train_validation_images,
                                                   label_name=label_name, data_name=data_name, path_root='')
    test_iterator = mx.image.ImageIter(batch_size, (3,224,224), imglist=test_images, label_name=label_name,
                                       data_name=data_name, path_root='')

    return train_iterator, validation_iterator, train_validation_iterator, test_iterator, id2label

def get_labels(iterator):
    """ Return labels from data iterator """
    iterator.reset()
    labels = []
    while True:
        try:
            labels = labels + iterator.next().label[0].asnumpy().astype(int).tolist()
        except StopIteration:
            break
    return labels

def get_images(iterator):
    """ Return list of image arrays from iterator """
    iterator.reset()
    images = []
    while True:
        try:
            batch = iterator.next().data[0]
            for n in range(batch.shape[0]):
                images.append(batch[n])
        except StopIteration:
            break
    return images

def show_predictions(predictions, images, id2label, figsize=(15,1.5), fontsize=12, n=None):
    """ Display images along with predicted labels """
    n = len(images) if n is None else n
    num_rows = 1
    plt.figure(figsize=figsize)
    for cc in range(n):
        plt.subplot(num_rows,n,1+cc)
        plt.tick_params(
                        axis='both',          # changes apply to both axes
                        which='both',      # both major and minor ticks are affected
                        bottom=False,      # ticks along the bottom edge are off
                        top=False,         # ticks along the top edge are off
                        left=False,
                        labelleft=False,
                        labelbottom=False) # labels along the bottom edge are off
        plt.imshow(np.uint8(images[cc].asnumpy().transpose((1,2,0))))
        plt.title(id2label[predictions[cc]].split(',')[0], fontsize=fontsize)
        plt.axis('off')

A) Source model

In Transfer Learning, the model from which knowledge is transferred is called the source model. Here, we use the vgg19 model from the MXNet Model Zoo as the source model. vgg19 is a convolutional neural network trained on the ImageNet dataset, which contains 1 million natural images categorized into 1000 classes.

In [3]:
# Download source model
path = 'http://data.mxnet.io/models/imagenet/'
[mx.test_utils.download(path+'vgg/vgg19-0000.params'), mx.test_utils.download(path+'vgg/vgg19-symbol.json')]

# Load source model from file
source_model = mx.module.Module.load('vgg19', 0, label_names=['prob_label'])

B) Target data

The target data is a much smaller dataset with 40 images categorized into 4 classes from a different domain (hand-drawn sketches). We’ll demonstrate how to use Xfer along with HPO to learn to classify this target data by transferring knowledge from the vgg19 model.

In [4]:
TARGET_DATA_DIR = 'test_sketches'
set_random_seeds()
train_iterator, validation_iterator, train_validation_iterator, test_iterator, id2label = get_iterators(TARGET_DATA_DIR)
train_labels = get_labels(train_iterator)
validation_labels = get_labels(validation_iterator)
test_labels = get_labels(test_iterator)
train_validation_labels = get_labels(train_validation_iterator)

print('Number of train images: {}'.format(len(train_labels)))
print('Number of validation images: {}'.format(len(validation_labels)))
print('Number of test images: {}'.format(len(test_labels)))
Number of train images: 12
Number of validation images: 12
Number of test images: 16

How are these data sets used?

During HPO, we train the model using the training data and evaluate the hyperparameters on the validation data. Once we find an optimized learning rate, we do a final training run using both the training and validation data, and report precision on our held-out test data.

C) Repurpose without HPO

This section demonstrates how to repurpose the source model to target data with default hyperparameters.

In [5]:
# Default optimizer, learning rate and number of epochs used in Xfer to train neural network
DEFAULT_OPTIMIZER = 'sgd'
DEFAULT_LEARNING_RATE = 0.01
DEFAULT_OPTIMIZER_PARAMS = {'learning_rate': DEFAULT_LEARNING_RATE}
DEFAULT_NUM_EPOCHS = 4

TARGET_CLASS_COUNT = 4  # 4 classes of sketch images are used for this demo (car, cheese, house and tree)
CONTEXT_FUNCTION = mx.cpu # 'mx.gpu' or 'mx.cpu' (MXNet context function to train neural network)

# Layers to freeze or randomly initialize during neural network training.
# For this demo, we freeze the first 16 convolutional layers transferred from 'vgg19' model.
FIXED_LAYERS = ['conv1_1','conv1_2','conv2_1','conv2_2','conv3_1','conv3_2','conv3_3','conv3_4',
                'conv4_1','conv4_2','conv4_3','conv4_4','conv5_1','conv5_2','conv5_3','conv5_4']
RANDOM_LAYERS = []

# Method to repurpose neural network with given hyperparameters
def train_and_predict(source_model, train_data_iterator, test_data_iterator, optimizer, optimizer_params):
    set_random_seeds()
    repurposer = xfer.NeuralNetworkRandomFreezeRepurposer(source_model,
                                                          optimizer=optimizer,
                                                          optimizer_params=optimizer_params,
                                                          target_class_count=TARGET_CLASS_COUNT,
                                                          fixed_layers=FIXED_LAYERS,
                                                          random_layers=RANDOM_LAYERS,
                                                          context_function=CONTEXT_FUNCTION,
                                                          num_epochs=DEFAULT_NUM_EPOCHS)
    repurposer.repurpose(train_data_iterator)
    predictions = repurposer.predict_label(test_data_iterator)
    return predictions

# Train neural network with default hyperparameters
predictions = train_and_predict(source_model, train_validation_iterator, test_iterator,
                                   DEFAULT_OPTIMIZER, DEFAULT_OPTIMIZER_PARAMS)
precision_default = np.mean(predictions == test_labels)
print('Trained neural network with default hyperparameters. Precision: {}'.format(precision_default))

# Display test images and predictions
test_images = get_images(test_iterator)
show_predictions(predictions, test_images, id2label)
Trained neural network with default hyperparameters. Precision: 0.25
_images/demos_xfer-hpo_10_1.png

Note that the quality of the results may vary due to randomness, e.g. in the initialization of the neural network weights. The point is that precision varies with different choices of learning rate, and in certain cases the default learning rate might not be the best. We therefore wish to find a good learning rate automatically rather than by trial and error. HPO helps us achieve this, as demonstrated below.

D) Repurpose with HPO: Optimizing learning rate

i) Declare the hyperparameter to optimize and its domain

In [6]:
# Learning rate is the hyperparameter we will optimize here
# We allow emukit to operate in a normalized domain [0,1] and map the value to a desired log scale inside our objective function
# This helps emukit to learn a smooth underlying function in fewer iterations
from emukit.core import ParameterSpace, ContinuousParameter

emukit_param_space = ParameterSpace([ContinuousParameter('learning_rate', 0, 1)])

# Method to map a value given by emukit in [0, 1] to a desired range of learning rate
def map_learning_rate(source_value):
    if(source_value < 0 or source_value > 1):
        raise ValueError('source_value must be in the range [0,1]')

    # We explore learning rate in the range [1e-6 , 1e-1]. You can choose a different range to explore
    # Log scale is used here because it is an intuitive way to explore learning rates
    # For example, if 1e-2 doesn't work, we tend to explore 1e-3 or 1e-1 which is a jump in log scale
    log_learning_rate_start = -6  # 1e-6 in linear scale
    log_learning_rate_end = -1  # 1e-1 in linear scale

    log_span = abs(log_learning_rate_end - log_learning_rate_start)
    log_mapped_value = log_learning_rate_start + (source_value * log_span)
    mapped_value = 10 ** log_mapped_value  # Convert from log scale to linear scale
    return mapped_value

ii) Define an objective function to optimize the hyperparameter

In [7]:
def get_hyperparameters_from_config(config):
    """
    Extract hyperparameters from input configuration provided by emukit.
    Refer the caller 'hpo_objective_function' for more details.
    """
    learning_rate = map_learning_rate(config[0]) # Map learning_rate value given by emukit to the desired range
    optimizer = DEFAULT_OPTIMIZER  # Using default optimizer here i.e. 'sgd'
    return optimizer, learning_rate

def hpo_objective_function(config_matrix):
    """
    Objective function to optimize the hyperparameters for
    This method is called by emukit during the optimization loop
    to get outputs of objective function for different input configurations

    We train a neural network with given hyperparameters and return (1-precision) on validation data as the output
    You can choose to optimize for a different measure and create the objective function accordingly
    Here, we consider one hyperparameter (learning_rate) to optimize precision

    Note: config_matrix has m rows and n columns
    m denotes the number of experiments to run i.e. each row would contain input configuration to run one experiment
    n denotes the number of hyperparameters (e.g. 2 columns for learning_rate and batch_size)
    """
    # Output of objective function for each input configuration
    function_output = np.zeros((config_matrix.shape[0], 1))

    # For each input configuration, train a neural network and calculate precision on validation data
    for idx, config in enumerate(config_matrix):
        optimizer, learning_rate = get_hyperparameters_from_config(config)

        # Train neural network with the mapped learning rate and get predictions on validation data
        predictions = train_and_predict(source_model=source_model,
                                           train_data_iterator=train_iterator,
                                           test_data_iterator=validation_iterator,
                                           optimizer = optimizer,
                                           optimizer_params = {'learning_rate': learning_rate})

        # Calculate precision on validation set and update function_output with (1-precision)
        precision = np.mean(predictions == validation_labels)
        function_output[idx][0] = (1.0 - precision)  # (1-precision) to keep a minimization objective
        print('learning_rate: {}. optimizer: {}. precision: {}'.format(learning_rate, optimizer, precision))
        gc.collect()

    return function_output
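
For illustration only (a sketch that is not executed in this notebook): with learning_rate as the only hyperparameter, a config_matrix describing two experiments has shape (2, 1), and the objective returns a (2, 1) array of (1 - precision) values, one row per configuration:

example_config_matrix = np.array([[0.8],
                                  [0.2]])
# hpo_objective_function(example_config_matrix) would train two networks with the
# mapped learning rates and return their (1 - precision) on the validation set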

iii) Define a model using GPy for emukit to optimize

In [8]:
# Notice that our initial (default) learning rate of 1e-2 corresponds to 0.8 on the normalized [0, 1] scale
map_learning_rate(0.8) == DEFAULT_LEARNING_RATE
Out[8]:
True
In [9]:
set_random_seeds()

import GPy
X = np.array([[0.8]])
Y = np.array([[1.0 - precision_default]])

gpy_model = GPy.models.GPRegression(X, Y)

iv) Initialize a Bayesian optimizer using emukit

In [10]:
from emukit.bayesian_optimization.loops import BayesianOptimizationLoop
from emukit.model_wrappers import GPyModelWrapper

emukit_model = GPyModelWrapper(gpy_model)
hyperparameter_optimizer = BayesianOptimizationLoop(emukit_param_space, emukit_model)

v) Run optimization loop to identify a better learning rate for our objective function

In [11]:
NUM_ITERATIONS_TO_RUN = 7

hyperparameter_optimizer.run_loop(hpo_objective_function, NUM_ITERATIONS_TO_RUN)
results = hyperparameter_optimizer.get_results()
Optimization restart 1/1, f = 1.1312564608151783
learning_rate: 1e-06. optimizer: sgd. precision: 0.5
Optimization restart 1/1, f = 0.9815790760721594
learning_rate: 1e-06. optimizer: sgd. precision: 0.5
Optimization restart 1/1, f = -6.963249113616735
learning_rate: 1.604696232278503e-06. optimizer: sgd. precision: 0.5
Optimization restart 1/1, f = -9.048731845639432
learning_rate: 1.2681487716415876e-06. optimizer: sgd. precision: 0.5
Optimization restart 1/1, f = -16.18851622078782
learning_rate: 1.260324119615607e-06. optimizer: sgd. precision: 0.5
Optimization restart 1/1, f = -24.10206487648798
learning_rate: 6.481443855100692e-06. optimizer: sgd. precision: 0.5833333333333334
Optimization restart 1/1, f = -20.776248579222724
learning_rate: 0.00011946457911772839. optimizer: sgd. precision: 1.0
Optimization restart 1/1, f = -21.117596675753305

Optimized learning rate

In [12]:
# Take hyperparameters that minimized the objective function output
x_best = results.minimum_location
optimized_learning_rate = map_learning_rate(x_best[0])
precision = 1.0 - results.minimum_value  # Objective was to minimize 1-precision
print('Optimized learning rate: {}. Precision on validation data: {}'.format(optimized_learning_rate, precision))
Optimized learning rate: 0.00011946457911772839. Precision on validation data: 1.0
In [13]:
gpy_model
Out[13]:

Model: GP regression
Objective: -21.117596675753305
Number of Parameters: 3
Number of Optimization Parameters: 3
Updates: True

GP_regression.           value                   constraints  priors
rbf.variance             0.24506271895881576     +ve
rbf.lengthscale          0.2097337210916541      +ve
Gaussian_noise.variance  1.5082312531001544e-15  +ve

Note that the optimized learning rate is the best value found in the iterations run so far. Depending on the time available, one can run more iterations, which may yield an even better learning rate.
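
If you want to see which learning rates have already been evaluated before deciding whether to run further iterations, the points stored by the optimizer can be mapped back to the learning rate scale. This is a sketch that assumes emukit's loop_state.X and loop_state.Y attributes:

evaluated_learning_rates = [map_learning_rate(x[0]) for x in hyperparameter_optimizer.loop_state.X]
for lr, one_minus_precision in zip(evaluated_learning_rates, hyperparameter_optimizer.loop_state.Y.flatten()):
    print('learning_rate: {}. (1 - precision): {}'.format(lr, one_minus_precision))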

Precision on test data with optimal learning rate

In [14]:
# Train neural network with optimal learning rate and get predictions on test data
# Along with training data, validation data is also used to train the final model
predictions = train_and_predict(source_model=source_model,
                                   train_data_iterator=train_validation_iterator,  # train on (train + validation) set
                                   test_data_iterator=test_iterator,  # predict on test data
                                   optimizer = DEFAULT_OPTIMIZER,
                                   optimizer_params = {'learning_rate': optimized_learning_rate})
precision_optimized = np.mean(predictions == test_labels)
print('Optimized learning rate: {}. Precision on test data: {}'.format(optimized_learning_rate, precision_optimized))
show_predictions(predictions, test_images, id2label)
Optimized learning rate: 0.00011946457911772839. Precision on test data: 0.9375
_images/demos_xfer-hpo_29_1.png

E) Repurpose with HPO: Optimizing multiple hyperparameters

The following section can be used as a reference when you want to optimize multiple hyperparameters while repurposing models with Xfer. Here, the chosen hyperparameters are: (1) the optimizer for the neural network (sgd or adam) and (2) the learning rate.

Note that running more iterations could be useful here because there are more combinations of values to explore.

# Choose the hyperparameters and specify the domain
from emukit.core import CategoricalParameter, OneHotEncoding
p1 = ContinuousParameter('learning_rate', 0, 1)
p2 = CategoricalParameter('optimizer', OneHotEncoding(['sgd', 'adam']))
space_with_two_params = ParameterSpace([p1, p2])

# Override this method to extract the optimizer in addition to learning_rate from emukit config
def get_hyperparameters_from_config(config):
    """
    Extract hyperparameters from input configuration provided by emukit.
    Refer the caller 'hpo_objective_function' for more details.
    """
    learning_rate = map_learning_rate(config[0])  # Map learning_rate value given by emukit to the desired range
    optimizer = p2.encoding.get_category(config[1:])  # Using optimizer given by emukit
    return optimizer, learning_rate

# Initialize emukit with new domain, create the model and run optimization
set_random_seeds()

X = np.array([[0.8] + p2.encoding.get_encoding(DEFAULT_OPTIMIZER)])
Y = np.array([[1.0 - precision_default]])
gpy_model = GPy.models.GPRegression(X, Y)

emukit_model = GPyModelWrapper(gpy_model)
hyperparameter_optimizer2 = BayesianOptimizationLoop(space_with_two_params, emukit_model)

hyperparameter_optimizer2.run_loop(hpo_objective_function, NUM_ITERATIONS_TO_RUN)
results2 = hyperparameter_optimizer2.get_results()

# Take hyperparameters that minimized the objective function output
x_best2 = results2.minimum_location
best_learning_rate = map_learning_rate(x_best2[0])
best_optimizer = p2.encoding.get_category(x_best2[1:])
precision2 = 1.0 - results2.minimum_value  # Objective was to minimize 1-precision
print('Optimized learning rate: {}. Optimizer: {}. Precision on validation data: {}'
      .format(best_learning_rate, best_optimizer, precision2))

# Train neural network with optimal (learning rate, optimizer) and get predictions on test data
# Along with training data, validation data is also used to train the final model
predictions2 = train_and_predict(source_model=source_model,
                                   train_data_iterator=train_validation_iterator,  # train on (train + validation) set
                                   test_data_iterator=test_iterator,  # predict on test data
                                   optimizer = best_optimizer,
                                   optimizer_params = {'learning_rate': best_learning_rate})
precision_optimized2 = np.mean(predictions2 == test_labels)
print('Precision on test data after optimization: ' + str(precision_optimized2))
show_predictions(predictions2, test_images, id2label)
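
To see how the categorical optimizer is represented internally, the encoding defined above can be inspected directly. This is an illustrative sketch; the exact one-hot vectors depend on the order of categories passed to OneHotEncoding:

p2.encoding.get_encoding('sgd')   # e.g. [1, 0]
p2.encoding.get_encoding('adam')  # e.g. [0, 1]
p2.encoding.get_category([1, 0])  # 'sgd'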

Using ModelHandler with Gluon

MXNet’s Gluon framework allows Neural Networks to be written under an imperative paradigm. ModelHandler is currently based around the symbolic graph implementation of MXNet and as a result, models written in Gluon cannot directly be used.

If the model is written in Gluon using HybridBlocks (i.e. if the network consists entirely of predefined MXNet layers), then the model can be compiled as a symbolic graph using the command .hybridize().

The Gluon defined model can then be converted to a symbol and set of parameters which can then be loaded as an MXNet Module and used with ModelHandler.

In this demo, we will show that you can define a model in Gluon using code from the Gluon MNIST demo and then convert it to a Module and use ModelHandler.

In [1]:
import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from mxnet import autograd as ag

import os

# Fixing the random seed
mx.random.seed(42)

Train model in Gluon

Define model in Gluon

In [2]:
mnist = mx.test_utils.get_mnist()
In [3]:
batch_size = 100
train_data = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_data = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)
In [4]:
# define network
net = nn.HybridSequential()
with net.name_scope():
    net.add(nn.Dense(128, activation='relu'))
    net.add(nn.Dense(64, activation='relu'))
    net.add(nn.Dense(10))

net.hybridize()
In [5]:
gpus = mx.test_utils.list_gpus()
ctx =  [mx.gpu()] if gpus else [mx.cpu(0), mx.cpu(1)]
net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.02})

Training

In [6]:
%%time
epoch = 10
# Use Accuracy as the evaluation metric.
metric = mx.metric.Accuracy()
softmax_cross_entropy_loss = gluon.loss.SoftmaxCrossEntropyLoss()
for i in range(epoch):
    # Reset the train data iterator.
    train_data.reset()
    # Loop over the train data iterator.
    for batch in train_data:
        # Splits train data into multiple slices along batch_axis
        # and copy each slice into a context.
        data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
        # Splits train labels into multiple slices along batch_axis
        # and copy each slice into a context.
        label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
        outputs = []
        # Inside training scope
        with ag.record():
            for x, y in zip(data, label):
                z = net(x)
                # Computes softmax cross entropy loss.
                loss = softmax_cross_entropy_loss(z, y)
                # Backpropagate the error for one iteration.
                loss.backward()
                outputs.append(z)
        # Updates internal evaluation
        metric.update(label, outputs)
        # Make one step of parameter update. Trainer needs to know the
        # batch size of data to normalize the gradient by 1/batch_size.
        trainer.step(batch.data[0].shape[0])
    # Gets the evaluation result.
    name, acc = metric.get()
    # Reset evaluation result to initial state.
    metric.reset()
    print('training acc at epoch {}: {}={}'.format(i, name, acc))
training acc at epoch 0: accuracy=0.7816
training acc at epoch 1: accuracy=0.89915
training acc at epoch 2: accuracy=0.9134666666666666
training acc at epoch 3: accuracy=0.9225833333333333
training acc at epoch 4: accuracy=0.9305666666666667
training acc at epoch 5: accuracy=0.9366666666666666
training acc at epoch 6: accuracy=0.9418166666666666
training acc at epoch 7: accuracy=0.94585
training acc at epoch 8: accuracy=0.9495333333333333
training acc at epoch 9: accuracy=0.9532333333333334
CPU times: user 43.1 s, sys: 4.18 s, total: 47.3 s
Wall time: 29.7 s

Testing

In [7]:
# Use Accuracy as the evaluation metric.
metric = mx.metric.Accuracy()
# Reset the validation data iterator.
val_data.reset()
# Loop over the validation data iterator.
for batch in val_data:
    # Splits validation data into multiple slices along batch_axis
    # and copy each slice into a context.
    data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
    # Splits validation label into multiple slices along batch_axis
    # and copy each slice into a context.
    label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
    outputs = []
    for x in data:
        outputs.append(net(x))
    # Updates internal evaluation
    metric.update(label, outputs)
print('validation acc: {}={}'.format(*metric.get()))
assert metric.get()[1] > 0.94
validation acc: accuracy=0.9527

Convert Gluon model to Module

Adapted from a snippet found at https://github.com/apache/incubator-mxnet/issues/9374

From the Gluon model, the symbol and parameters are extracted and used to define a Module object.

In [8]:
def block2symbol(block):
    data = mx.sym.Variable('data')
    sym = block(data)
    args = {}
    auxs = {}
    for k, v in block.collect_params().items():
        args[k] = mx.nd.array(v.data().asnumpy())
        auxs[k] = mx.nd.array(v.data().asnumpy())
    return sym, args, auxs
In [9]:
def symbol2mod(sym, args, auxs, data_iter):
    mx_sym = mx.sym.SoftmaxOutput(data=sym, name='softmax')
    model = mx.mod.Module(symbol=mx_sym, context=mx.cpu(),
                          label_names=['softmax_label'])
    model.bind( data_shapes = data_iter.provide_data,
                label_shapes = data_iter.provide_label )
    model.set_params(args, auxs)
    return model
In [10]:
sym_params = block2symbol(net)
In [11]:
mod = symbol2mod(*sym_params, train_data)

Alternative Method

Serialise Gluon model to file using .export().

Load the serialised model as an MXNet Module with Module.load() so that xfer can be used.

In [12]:
# model_name = 'gluon-model'
# net.export(model_name)

# mod = mx.mod.Module.load(model_name, 0, label_names=[])
# os.remove(model_name+'-symbol.json')
# os.remove(model_name+'-0000.params')

Apply ModelHandler

Now we can load the model into ModelHandler and use it to visualise the model, return the layer names, extract features and much more!

In [13]:
import xfer
In [14]:
mh = xfer.model_handler.ModelHandler(mod)
In [15]:
# Show architecture of model
mh.visualize_net()
Out[15]:
_images/demos_xfer-gluon-with-modelhandler_21_0.svg
In [16]:
mh.layer_names
Out[16]:
['hybridsequential0_dense0_fwd',
 'hybridsequential0_dense0_relu_fwd',
 'hybridsequential0_dense1_fwd',
 'hybridsequential0_dense1_relu_fwd',
 'hybridsequential0_dense2_fwd',
 'softmax']
In [17]:
# Get output from intermediate layers of the model
mh.get_layer_output(train_data, ['hybridsequential0_dense1_fwd'])
Out[17]:
(OrderedDict([('hybridsequential0_dense1_fwd',
               array([[ 1.93497527e+00,  2.40295935e+00,  1.16074115e-01, ...,
                       -4.74348217e-02, -3.76087427e-03,  1.39985621e+00],
                      [ 2.15391922e+00,  1.97971451e+00,  4.61517543e-01, ...,
                        2.28680030e-01, -8.29489648e-01,  9.69915807e-01],
                      [ 2.06626105e+00,  4.06703472e+00,  7.65578270e-01, ...,
                        3.74726385e-01,  1.03201318e+00, -5.41208267e-01],
                      ...,
                      [ 2.55671740e+00,  4.17255354e+00,  5.60081601e-01, ...,
                        5.68660349e-02, -1.58825326e+00,  1.59997427e+00],
                      [ 2.30686831e+00,  2.34434009e+00, -5.84015131e-01, ...,
                        3.16424906e-01, -1.08476102e-01,  6.86561584e-01],
                      [ 9.71719801e-01,  1.08340001e+00,  1.72682357e+00, ...,
                       -2.98302293e-01,  1.48507738e+00, -7.40276098e-01]], dtype=float32))]),
 array([8, 8, 6, ..., 8, 8, 4]))
In [18]:
mh.get_layer_type('hybridsequential0_dense0_relu_fwd')
Out[18]:
'Activation'
In [19]:
# Add/Remove layers from model output
mh.drop_layer_top(2)
mh.add_layer_top([mx.sym.FullyConnected(num_hidden=30),
                  mx.sym.Activation(act_type='relu'),
                  mx.sym.FullyConnected(num_hidden=10),
                  mx.sym.SoftmaxOutput()])
mh.visualize_net()
Out[19]:
_images/demos_xfer-gluon-with-modelhandler_25_0.svg
In [20]:
# Add/remove layers from model input
mh.add_layer_bottom([mx.sym.Convolution(kernel=(2,2), num_filter=10)])
mh.visualize_net()
Out[20]:
_images/demos_xfer-gluon-with-modelhandler_26_0.svg

Using Gluon with Xfer

This notebook demonstrates how to use neural networks defined and trained with Gluon as source models for Transfer Learning with Xfer.

TL;DR Gluon models can be used with Xfer provided they use HybridBlocks so that the symbol can be extracted.

This demo is a dummy example where a CNN source model is trained on MNIST using Gluon and then repurposed for MNIST again. This is obviously redundant but shows the steps required to use Gluon with Xfer.

In [1]:
import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon
mx.random.seed(1)

import time
from sklearn.metrics import classification_report
from scipy import io as scipyio
import urllib.request
import zipfile
import os
import logging

import xfer

Train CNN with gluon

Using code taken from The Straight Dope

In [2]:
ctx = mx.cpu()
In [3]:
batch_size = 64
num_inputs = 784
num_outputs = 10
def transform(data, label):
    return nd.transpose(data.astype(np.float32), (2,0,1))/255, label.astype(np.float32)
train_data = gluon.data.DataLoader(gluon.data.vision.MNIST(train=True, transform=transform),
                                      batch_size, shuffle=True)
test_data = gluon.data.DataLoader(gluon.data.vision.MNIST(train=False, transform=transform),
                                     batch_size, shuffle=False)
In [4]:
num_fc = 512
net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    # The Flatten layer collapses all axes except the first one into a single axis.
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(num_fc, activation="relu"))
    net.add(gluon.nn.Dense(num_outputs))
In [5]:
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
In [6]:
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
In [7]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': .1})
In [8]:
net.hybridize()
In [9]:
def evaluate_accuracy(data_iterator, net):
    acc = mx.metric.Accuracy()
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]
In [10]:
epochs = 1
smoothing_constant = .01

for e in range(epochs):
    start_time_train = time.time()
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()
        trainer.step(data.shape[0])

        ##########################
        #  Keep a moving average of the losses
        ##########################
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (curr_loss if ((i == 0) and (e == 0))
                       else (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss)
    end_time_train = time.time()

    start_time_eval = time.time()
    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    end_time_eval = time.time()

    epoch_time = end_time_train - start_time_train
    eval_time = end_time_eval - start_time_eval
    print("Epoch {}.\nLoss: {}, Train_acc {}, Test_acc {}, Epoch_time {}, Eval_time {}".format(e, moving_loss, train_accuracy, test_accuracy, epoch_time, eval_time))
Epoch 0.
Loss: 0.11107716270094219, Train_acc 0.9745833333333334, Test_acc 0.9742, Epoch_time 54.26378679275513, Eval_time 23.154165029525757

Load MNIST dataset

Load MNIST into data iterators

In [19]:
mnist = mx.test_utils.get_mnist()
In [20]:
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)

Convert Gluon model net to Module

Adapted from a snippet found at https://github.com/apache/incubator-mxnet/issues/9374

From the Gluon model, the symbol and parameters are extracted and used to define a Module object.

In [21]:
def block2symbol(block):
    data = mx.sym.Variable('data')
    sym = block(data)
    args = {}
    auxs = {}
    for k, v in block.collect_params().items():
        args[k] = mx.nd.array(v.data().asnumpy())
        auxs[k] = mx.nd.array(v.data().asnumpy())
    return sym, args, auxs
In [22]:
def symbol2mod(sym, args, auxs, data_iter):
    mx_sym = mx.sym.SoftmaxOutput(data=sym, name='softmax')
    model = mx.mod.Module(symbol=mx_sym, context=mx.cpu(),
                          label_names=['softmax_label'])
    model.bind( data_shapes = data_iter.provide_data,
                label_shapes = data_iter.provide_label )
    model.set_params(args, auxs)
    return model
In [23]:
sym_params = block2symbol(net)
In [24]:
net_mod = symbol2mod(*sym_params, train_iter)

Alternative Method

Serialise Gluon model to file using .export().

Load the serialised model as an MXNet Module with Module.load() so that xfer can be used.

In [25]:
# model_name = 'gluon-model'
# net.export(model_name)

# mod = mx.mod.Module.load(model_name, 0, label_names=[])
# os.remove(model_name+'-symbol.json')
# os.remove(model_name+'-0000.params')

Inspect Module

In [26]:
mh = xfer.model_handler.ModelHandler(net_mod)
In [27]:
mh.layer_names
Out[27]:
['hybridsequential0_conv0_fwd',
 'hybridsequential0_conv0_relu_fwd',
 'hybridsequential0_pool0_fwd',
 'hybridsequential0_conv1_fwd',
 'hybridsequential0_conv1_relu_fwd',
 'hybridsequential0_pool1_fwd',
 'hybridsequential0_flatten0_reshape0',
 'hybridsequential0_dense0_fwd',
 'hybridsequential0_dense0_relu_fwd',
 'hybridsequential0_dense1_fwd',
 'softmax']

Neural Network Repurposer

In [28]:
repFT = xfer.NeuralNetworkFineTuneRepurposer(source_model=net_mod,
                                             transfer_layer_name='hybridsequential0_dense0_relu_fwd',
                                             target_class_count=26, num_epochs=2)
In [29]:
repFT.repurpose(train_iter)
WARNING:root:Already bound, ignoring bind()
/anaconda/envs/xfer-env/lib/python3.6/site-packages/mxnet/module/base_module.py:488: UserWarning: Parameters already initialized and force_init=False. init_params call ignored.
  allow_missing=allow_missing, force_init=force_init)
In [30]:
predictionsFT = repFT.predict_label(val_iter)
In [32]:
print(classification_report(mnist['test_label'], predictionsFT,
      digits=3))
             precision    recall  f1-score   support

          0      0.960     0.990     0.975       980
          1      0.984     0.989     0.986      1135
          2      0.968     0.965     0.967      1032
          3      0.967     0.972     0.970      1010
          4      0.977     0.971     0.974       982
          5      0.975     0.975     0.975       892
          6      0.974     0.966     0.970       958
          7      0.970     0.959     0.964      1028
          8      0.966     0.966     0.966       974
          9      0.966     0.954     0.960      1009

avg / total      0.971     0.971     0.971     10000

Meta-model Repurposer

In [33]:
repLR = xfer.LrRepurposer(source_model=net_mod, feature_layer_names=['hybridsequential0_dense0_fwd'])
In [34]:
repLR.repurpose(train_iter)
/anaconda/envs/xfer-env/lib/python3.6/site-packages/sklearn/linear_model/sag.py:326: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  "the coef_ did not converge", ConvergenceWarning)
In [35]:
predictionsLR = repLR.predict_label(val_iter)
In [36]:
print(classification_report(mnist['test_label'], predictionsLR,
      digits=3))
             precision    recall  f1-score   support

          0      0.990     0.993     0.991       980
          1      0.991     0.996     0.993      1135
          2      0.985     0.989     0.987      1032
          3      0.987     0.989     0.988      1010
          4      0.992     0.990     0.991       982
          5      0.979     0.982     0.980       892
          6      0.990     0.984     0.987       958
          7      0.983     0.985     0.984      1028
          8      0.987     0.986     0.986       974
          9      0.989     0.977     0.983      1009

avg / total      0.987     0.987     0.987     10000


Repurposing

Base Classes:

  • xfer.Repurposer - Base class for repurposers that train models using Transfer Learning (source_model -> target_model).
  • xfer.MetaModelRepurposer - Base class for repurposers that extract features from layers in the source neural network (Transfer) and train a meta-model using the extracted features (Learn).
  • xfer.NeuralNetworkRepurposer - Base class for repurposers that create a target neural network from a source neural network through Transfer Learning.

Repurposers:

  • xfer.LrRepurposer - Perform Transfer Learning through a Logistic Regression meta-model which repurposes the source neural network.
  • xfer.SvmRepurposer - Perform Transfer Learning through a Support Vector Machine (SVM) meta-model which repurposes the source neural network.
  • xfer.GpRepurposer - Repurpose source neural network to create a Gaussian Process (GP) meta-model through Transfer Learning.
  • xfer.BnnRepurposer - Perform Transfer Learning through a Bayesian Neural Network (BNN) meta-model which repurposes the source neural network.
  • xfer.NeuralNetworkFineTuneRepurposer - Class that creates a target neural network from a source neural network through Transfer Learning, by fine-tuning transferred layers.
  • xfer.NeuralNetworkRandomFreezeRepurposer - Class that creates a target neural network from a source neural network through Transfer Learning, by freezing some transferred layers and randomly re-initialising others.

Model Handler

  • xfer.model_handler.ModelHandler - Class for model manipulation and feature extraction.
  • xfer.model_handler.exceptions - Exceptions for Model Handler.
  • xfer.model_handler.consts - Model Handler constants.

Writing a custom Repurposer

Xfer implements and supports two kinds of Repurposers:

  • Meta-model Repurposer - this uses the source model to extract features and then fits a meta-model to the features
  • Neural network Repurposer - this modifies the source model to create a target model

Below are examples of creating custom Repurposers for both classes

Setup

First import relevant modules, define data iterators and load a source model

In [1]:
import warnings
warnings.filterwarnings("ignore")

import logging
logging.disable(logging.WARNING)

import xfer

import os
import glob
import mxnet as mx
import random
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import classification_report

random.seed(1)
In [2]:
def get_iterators_from_folder(data_dir, train_size=0.6, batchsize=10, label_name='softmax_label', data_name='data', random_state=1):
    """
    Method to create iterators from data stored in a folder with the following structure:
    /data_dir
        /class1
            class1_img1
            class1_img2
            ...
            class1_imgN
        /class2
            class2_img1
            class2_img2
            ...
            class2_imgN
        ...
        /classN
    """
    # assert dir exists
    if not os.path.isdir(data_dir):
        raise ValueError('Directory not found: {}'.format(data_dir))
    # get class names
    classes = [x.split('/')[-1] for x in glob.glob(data_dir+'/*')]
    classes.sort()
    fnames = []
    labels = []
    for c in classes:
        # get all the image filenames and labels
        images = glob.glob(data_dir+'/'+c+'/*')
        images.sort()
        fnames += images
        labels += [c]*len(images)
    # create label2id mapping
    id2label = dict(enumerate(sorted(set(labels))))  # sort for a deterministic label mapping
    label2id = dict((v,k) for k, v in id2label.items())

    # get indices of train and test
    sss = StratifiedShuffleSplit(n_splits=2, test_size=None, train_size=train_size, random_state=random_state)
    train_indices, test_indices = next(sss.split(labels, labels))

    train_img_list = []
    test_img_list = []
    train_labels = []
    test_labels = []
    # create imglist for training and test
    for idx in train_indices:
        train_img_list.append([label2id[labels[idx]], fnames[idx]])
        train_labels.append(label2id[labels[idx]])
    for idx in test_indices:
        test_img_list.append([label2id[labels[idx]], fnames[idx]])
        test_labels.append(label2id[labels[idx]])

    # make iterators
    train_iterator = mx.image.ImageIter(batchsize, (3,224,224), imglist=train_img_list, label_name=label_name, data_name=data_name,
                                        path_root='')
    test_iterator = mx.image.ImageIter(batchsize, (3,224,224), imglist=test_img_list, label_name=label_name, data_name=data_name,
                                      path_root='')

    return train_iterator, test_iterator, train_labels, test_labels, id2label, label2id
In [3]:
dataset = 'test_images' # options are: 'test_sketches', 'test_images_sketch', 'mnist-50', 'test_images' or your own data.
num_classes = 4

train_iterator, test_iterator, train_labels, test_labels, id2label, label2id = get_iterators_from_folder(dataset, 0.6, 4, label_name='prob_label', random_state=1)
In [4]:
# Download vgg19 (trained on imagenet)
path = 'http://data.mxnet.io/models/imagenet/'
[mx.test_utils.download(path+'vgg/vgg19-0000.params'),
mx.test_utils.download(path+'vgg/vgg19-symbol.json')]
Out[4]:
['vgg19-0000.params', 'vgg19-symbol.json']
In [5]:
# This will be the source model we use for repurposing later
source_model = mx.module.Module.load('vgg19', 0, label_names=['prob_label'])

Custom Meta-model Repurposer

We will create a new Repurposer that uses the KNN algorithm as a meta-model. The resulting Meta-model Repurposer will classify the features extracted by the neural network source model.

In [6]:
from sklearn.neighbors import KNeighborsClassifier

Definition

In [7]:
class KNNRepurposer(xfer.MetaModelRepurposer):
    def __init__(self, source_model: mx.mod.Module, feature_layer_names, context_function=mx.context.cpu, num_devices=1,
                 n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=-1):
        # Call init() of parent
        super(KNNRepurposer, self).__init__(source_model, feature_layer_names, context_function, num_devices)

        # Initialise parameters specific to the KNN algorithm
        self.n_neighbors = n_neighbors
        self.weights = weights
        self.algorithm = algorithm
        self.leaf_size = leaf_size
        self.p = p
        self.metric = metric
        self.metric_params = metric_params
        self.n_jobs = n_jobs

    # Define function that takes a set of features and labels and returns a trained model.
    # feature_indices_per_layer is a dictionary which gives the feature indices which correspond
    # to each layer's features.
    def _train_model_from_features(self, features, labels, feature_indices_per_layer=None):
        knn_model = KNeighborsClassifier(n_neighbors=self.n_neighbors,
                                         weights=self.weights,
                                         algorithm=self.algorithm,
                                         leaf_size=self.leaf_size,
                                         p=self.p,
                                         metric=self.metric,
                                         metric_params=self.metric_params,
                                         n_jobs=self.n_jobs)
        knn_model.fit(features, labels)
        return knn_model

    # Define a function that predicts the class probability given features
    def _predict_probability_from_features(self, features):
        return self.target_model.predict_proba(features)

    # Define a function that predicts the class label given features
    def _predict_label_from_features(self, features):
        return self.target_model.predict(features)

    # In order to make your repurposer serialisable, you will need to implement functions
    # which convert your model's parameters to a dictionary.
    def get_params(self):
        """
        This function should return a dictionary of all the parameters of the repurposer that
        are in the repurposer constructor arguments.
        """
        param_dict = super().get_params()
        param_dict['n_neighbors'] = self.n_neighbors
        param_dict['weights'] = self.weights
        param_dict['algorithm'] = self.algorithm
        param_dict['leaf_size'] = self.leaf_size
        param_dict['p'] = self.p
        param_dict['metric'] = self.metric
        param_dict['metric_params'] = self.metric_params
        param_dict['n_jobs'] = self.n_jobs
        return param_dict

    # Some repurposers will need a get_attributes() and set_attributes() to get and set the parameters
    # of the repurposer that are not in the constructor argument. An example is shown below:

    # def get_attributes(self):
    #     """
    #     This function should return a dictionary of all the parameters of the repurposer that
    #     are NOT in the constructor arguments.
    #
    #     This function does not need to be defined if the repurposer has no specific attributes.
    #     """
    #     param_dict = super().get_attributes()
    #     param_dict['example_attribute'] = self.example_attribute
    #     return param_dict

    # def set_attributes(self, input_dict):
    #     super().set_attributes(input_dict)
    #     self.example_attribute  = input_dict['example_attribute']

    def serialize(self, file_prefix):
        """
        Saves repurposer (excluding source model) to file_prefix.json.
        This method converts the repurposer to dictionary and saves as a json.


        :param str file_prefix: Prefix to save file with
        """
        output_dict = {}
        output_dict[repurposer_keys.PARAMS] = self.get_params()
        output_dict[repurposer_keys.TARGET_MODEL] = target_model_to_dict()  # This should be some serialised representation of the target model
        output_dict.update(self.get_attributes())

        utils.save_json(file_prefix, output_dict)

    def deserialize(self, input_dict):
        """
        Uses dictionary to set attributes of repurposer

        :param dict input_dict: Dictionary containing values for attributes to be set to
        """
        self.set_attributes(input_dict)  # Set attributes of the repurposer from input_dict
        self.target_model = target_model_from_dict()  # Unpack dictionary representation of target model

Use

In [8]:
repurposerKNN = KNNRepurposer(source_model, ['fc8'])
In [9]:
repurposerKNN.repurpose(train_iterator)
In [10]:
results = repurposerKNN.predict_label(test_iterator)
In [11]:
print(classification_report(y_pred=results, y_true=test_labels))
             precision    recall  f1-score   support

          0       1.00      0.50      0.67         2
          1       0.67      1.00      0.80         2
          2       1.00      1.00      1.00         2
          3       1.00      1.00      1.00         2

avg / total       0.92      0.88      0.87         8

Custom Neural Network Repurposer

Now we will define a custom Neural Network Repurposer which performs transfer learning by:

  1. taking the original source neural network and keeping all layers up to transfer_layer_name
  2. adding two fully connected layers on the top
  3. fine-tuning with any conv layers frozen

Definition

In [12]:
class Add2FullyConnectedRepurposer(xfer.NeuralNetworkRepurposer):
    def __init__(self, source_model: mx.mod.Module, transfer_layer_name, num_nodes, target_class_count,
                 context_function=mx.context.cpu, num_devices=1, batch_size=64, num_epochs=5):
        super().__init__(source_model, context_function, num_devices, batch_size, num_epochs)

        # initialise parameters
        self.transfer_layer_name = transfer_layer_name
        self.num_nodes = num_nodes
        self.target_class_count = target_class_count

    def _get_target_symbol(self, source_model_layer_names):
        # Check if 'transfer_layer_name' is present in source model
        if self.transfer_layer_name not in source_model_layer_names:
            raise ValueError('transfer_layer_name: {} not found in source model'.format(self.transfer_layer_name))

        # Create target symbol by transferring layers from source model up to 'transfer_layer_name'
        transfer_layer_key = self.transfer_layer_name + '_output'  # layer key with output suffix to lookup mxnet symbol group
        source_symbol = self.source_model.symbol.get_internals()
        target_symbol = source_symbol[transfer_layer_key]
        return target_symbol

    # All Neural Network Repurposers must implement this function which takes a training iterator and returns an MXNet Module
    def _create_target_module(self, train_iterator: mx.io.DataIter):
        # Create model handler to manipulate the source model
        model_handler = xfer.model_handler.ModelHandler(self.source_model, self.context_function, self.num_devices)

        # Create target symbol by transferring layers from source model up to and including 'transfer_layer_name'
        target_symbol = self._get_target_symbol(model_handler.layer_names)

        # Update model handler by replacing source symbol with target symbol
        # and cleaning up weights of layers that were not transferred
        model_handler.update_sym(target_symbol)

        # Add two fully connected layers (an intermediate layer with num_nodes hidden units and an output layer with one unit per target class) followed by a softmax output layer
        fully_connected_layer1 = mx.sym.FullyConnected(num_hidden=self.num_nodes, name='fc_rep')
        fully_connected_layer2 = mx.sym.FullyConnected(num_hidden=self.target_class_count, name='fc_from_fine_tune_repurposer')
        softmax_output_layer = mx.sym.SoftmaxOutput(name=train_iterator.provide_label[0][0].replace('_label', ''))
        model_handler.add_layer_top([fully_connected_layer1, fully_connected_layer2,  softmax_output_layer])

        # Get fixed layers
        conv_layer_names = model_handler.get_layer_names_matching_type('Convolution')
        conv_layer_params = model_handler.get_layer_parameters(conv_layer_names)

        # Create and return target mxnet module using the new symbol and params
        return model_handler.get_module(train_iterator, fixed_layer_parameters=conv_layer_params)

    # To be serialisable, Neural Network Repurposers require get_params, get_attributes, set_attributes as shown above
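
For example (a sketch following the pattern of the KNNRepurposer above, not part of the original notebook), get_params for this repurposer could simply add its constructor arguments to the dictionary returned by the parent class:

    def get_params(self):
        param_dict = super().get_params()
        param_dict['transfer_layer_name'] = self.transfer_layer_name
        param_dict['num_nodes'] = self.num_nodes
        param_dict['target_class_count'] = self.target_class_count
        return param_dict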

Use

In [13]:
# instantiate repurposer
repurposer2Fc = Add2FullyConnectedRepurposer(source_model, transfer_layer_name='fc7', num_nodes=64, target_class_count=num_classes)
In [14]:
train_iterator.reset()
repurposer2Fc.repurpose(train_iterator)
In [15]:
results = repurposer2Fc.predict_label(test_iterator)
In [16]:
print(classification_report(y_pred=results, y_true=test_labels))
             precision    recall  f1-score   support

          0       1.00      0.50      0.67         2
          1       1.00      1.00      1.00         2
          2       1.00      1.00      1.00         2
          3       0.67      1.00      0.80         2

avg / total       0.92      0.88      0.87         8
