Over the past few months, we have upgraded our deep learning stack to take advantage of the latest developments in the field. We transitioned from Theano to TensorFlow, upgraded our Keras version, introduced GPUs into the Talentpair stack, and rolled out helpful machine learning tools to accelerate our model iterations.
Transitioning from Theano to TensorFlow
We follow the TensorFlow community closely and were impressed by the ecosystem Google has created. TensorFlow has a strong focus on deploying models to production, which gave us tools and options to export models efficiently and to investigate model internals (more on our deep learning tools later). Since deep learning has become the bread and butter of Talentpair’s matching algorithms, we felt it was time to transition from Theano to TensorFlow. Because we build most of our models in Keras, the transition was simple: we switched the backend flag in Keras and retrained the models. For internal reasons, we decided to retrain the models instead of simply exporting the model structure and the learned weights (which is otherwise the preferred approach).
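In practice, the backend switch is a one-line configuration change. A minimal sketch, assuming Keras reads its settings from `~/.keras/keras.json` as in the 1.x and 2.x releases:

```json
{
    "backend": "tensorflow",
    "epsilon": 1e-07,
    "floatx": "float32"
}
```

Setting the environment variable `KERAS_BACKEND=tensorflow` overrides this file for a single session, which is handy for comparing backends before committing to the switch.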
Coincidentally, one day before our deployment (9/28/2017) of the new TensorFlow models, Yoshua Bengio announced that the Montreal Institute for Learning Algorithms would no longer support the development of Theano. Our timing couldn’t have been better. RIP, dear Theano.
Upgrading Keras from 1.2.2 to 2.1.2
We also upgraded our Keras version. Keras releases frequently, so we can’t upgrade with every version, but the transition from Keras 1.x to Keras 2 seemed like a no-brainer. Keras 2 includes various new layers and callbacks which have become very useful. It integrates nicely with TensorBoard (more on that later) and provides new capabilities like CUDA-optimized long short-term memory layers (CuDNNLSTM) which we wanted to check out.
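For the most part, the 1.x-to-2 migration came down to renamed arguments and layers. A minimal sketch of the kinds of changes involved, assuming the standalone Keras package with the TensorFlow backend (the layer sizes here are purely illustrative):

```python
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM

model = Sequential()
# Keras 1.x allowed Embedding(10000, 128, dropout=0.2); Keras 2
# drops the layer's dropout argument (use a dropout layer instead).
model.add(Embedding(input_dim=10000, output_dim=128))
# Keras 1.x: LSTM(output_dim=64); Keras 2 renames it to `units`.
# On a GPU instance, CuDNNLSTM(64) is the CUDA-optimized drop-in.
model.add(LSTM(units=64))
model.add(Dense(units=1, activation='sigmoid'))
# Keras 1.x: model.fit(..., nb_epoch=10); Keras 2 uses epochs=10.
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```

Running the old 1.x argument names against Keras 2 only raises deprecation warnings at first, so the migration can be done incrementally.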
Introducing GPUs at Talentpair
With the increased model complexities and the larger amounts of training data we are processing, we saw the need to introduce graphics processing units (GPUs) into the Talentpair infrastructure stack. Our stack is entirely cloud-based, and we had been running data science tasks on on-demand AWS m4.4xlarge instances, which would not scale past our initial models. So we set up two AWS instances with Nvidia’s GRID K520 GPUs, which provided us 4 GB or 16 GB of GPU memory (g2.2xlarge and g2.8xlarge, respectively). However, our models have grown even more complex. In the past, we experimented with batch normalization and developed a proof of concept for a ResNet for natural language understanding. For these experiments, we are switching to AWS P2 instances based on Nvidia’s K80 GPUs. Thanks to the work of the Talentpair infrastructure team, transitioning between instances is easy and seamless.
Deep Learning Tools
These framework and backend upgrades enabled us to roll out new deep learning tools.
TensorBoard is one of those tools that makes you ask, “How did we ever work without it?” It allows us to graph the network internals, visualize our domain-specific word embeddings, plot histograms and, of course, our networks’ metrics. TensorBoard has become an integrated piece of our deep learning pipelines. There is no need to write custom t-SNE dimensionality reductions anymore: TensorBoard performs t-SNE or PCA on request and plots the embeddings nicely in 2D or 3D. In addition, it allows us to quickly perform similarity calculations (cosine or Euclidean). Initially, we used it to plot word embeddings, but its usage was quickly extended to plotting the similarity of our document semantic vectors.
Our latest developments on a new matching model have greatly benefited from TensorBoard’s ability to plot the distribution of internal values during each training epoch. These insights have deepened our understanding of the ConvNet developments at Talentpair.
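Getting those per-epoch histograms requires no custom code; Keras ships a TensorBoard callback. A minimal sketch, assuming the Keras 2.1 callback API, with the log directory and frequency as placeholders:

```python
from keras.callbacks import TensorBoard

tensorboard = TensorBoard(
    log_dir='./logs',    # point `tensorboard --logdir=./logs` here
    histogram_freq=1,    # weight/activation histograms every epoch
    write_graph=True,    # also export the network graph
)

# Passed alongside any other callbacks during training:
# model.fit(x_train, y_train, epochs=10, callbacks=[tensorboard])
```

Keras 2.1 also accepts `embeddings_freq` and `embeddings_layer_names` arguments, which feed the embedding projector used for the t-SNE/PCA plots described above.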
We train and validate our deep learning models on remote AWS instances. While connecting to the instances via ssh is convenient, quickly completing data science tasks that require plotting our data isn’t. One way to provide an interactive, graphical session is to run a Jupyter notebook server on our data science machines. That way, we have access to our development databases and enjoy the same graphical capabilities as if we were developing locally on our laptops. Using Jupyter notebooks has reduced the turnaround time for plotting data dramatically.
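The setup itself boils down to two commands: a headless notebook server on the instance and an ssh tunnel from the laptop. A sketch, with the port and host name as placeholders:

```shell
# On the remote data science instance: start Jupyter without a browser.
jupyter notebook --no-browser --port=8888

# On the local laptop: forward the remote port over ssh, then open
# http://localhost:8888 in a browser.
ssh -N -L 8888:localhost:8888 ubuntu@data-science-instance
```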
All the upgrades to our deep learning stack over the last few months have laid the foundation for a long list of deep learning developments at Talentpair. The changes allowed us to rethink our core matching algorithm, design new ways of estimating a candidate’s seniority, and train new deep learning classifiers to assist Talentpair users in their workflow.