
Anything useful goes here

Possible research directions and fundamental discussions

Structural probing: visualizing and understanding deep self-supervised language models, by Manning @ Stanford NLP - very important read

On interpretability of BERT - similar in spirit to the work above

The 4 Biggest Open Problems in NLP Today

What innate priors should we build into the architecture of deep learning systems? A debate between Prof. Y. LeCun and Prof. C. Manning


For practically anything involving language models (LMs):

XLNet: The King is dead, long live the King! Sorry BERT, you were cool, but now you are obsolete :)

GPT-2: Too dangerous to release to the public. Well, here it is, with the weights and all.

Transformer-XL: Still the only one you can use for realistically large documents. In the long term, IMHO, this paper is a much more important contribution than BERT.

BERT: The great multi-tasker, trained to do a number of things really well. Great theoretical contributions: the pinnacle of attention-based models.

OpenAI Transformer, aka GPT-1

New Libraries we care about

PAIR: People + AI Research by Google Brain

StellarGraph: a Python library for machine learning on graph-structured or network-structured data

The purpose of this repository is to store tools for text classification with deep learning


Representation learning on large graphs using stochastic graph convolutions.

Wow, this is good! ULMFiT for graphs! This person has a ton of other stuff, more productive than some institutes

The Big-&-Extending-Repository-of-Transformers: Pretrained PyTorch models for Google's BERT, OpenAI GPT & GPT-2, Google/CMU Transformer-XL.

Flair - LM/embedding/general NLP library. The fastest-growing NLP project on GitHub (at the time of writing)

FAIRseq - Facebook AI Research toolkit for sequence modeling; supports multi-GPU (distributed) training on one machine or across multiple machines. PyTorch

AllenNLP - An open-source NLP research library, built on PyTorch by the Allen Institute for AI

FastAI - ULMFiT, Transformer, Transformer-XL implementations and more

Visualization Tools

People + AI Research (PAIR) by the Google Brain team:

What-If Tool: inspect a machine learning model, with minimal coding required



tensorboardX: TensorBoard for PyTorch

Visdom - similar to TensorBoard

LSTMVis: Visualizing LSTMs

Seq2Seq Vis: Visualization for Sequential Neural Networks with Attention

BertViz: Tool for visualizing attention in BERT and OpenAI GPT-2

tensor2tensor: Transformer attention visualization

How to do visualization or highly visual articles:

A.I. Experiments: Visualizing High-Dimensional Space

Guide to visualization of NLP representations and neural nets by C. Olah @ Google Brain

Data Visualization

The Annotated Transformer

Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters, Part 1

Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters, Part 2

The Illustrated BERT, ELMo, ULMFiT, and Transformer

Visualizing Representations: Deep Learning for Human Beings

Jay Alammar: Visualizing machine learning one concept at a time

Attention? Attention!

My Bag of Tricks for Deep Learning performance - add yours too:

First, make sure you have all the NVIDIA libraries:

NVIDIA Apex: A PyTorch extension with tools for easy mixed-precision and distributed training in PyTorch

NVIDIA cuDNN: provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

NVIDIA NCCL: The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs

NVIDIA DALI: A library containing both highly optimized building blocks and an execution engine for data pre-processing in deep learning applications

Consider using these:

NVIDIA optimized and tuned containers for various frameworks

Next, parallelism:


Swifter will automatically apply the fastest method available (or so it claims; more on that later). Make sure you have its optional backends, such as Dask, installed. It chooses between vectorization, Dask, and the traditional pandas.apply.

$ pip install -U pandas
$ pip install swifter

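import pandas as pd
import swifter  # importing swifter registers the .swifter accessor on pandas objects

# anyfunction is your own function; swifter picks the fastest way to apply it
df['outCol'] = df['inCol'].swifter.apply(anyfunction)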
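Roughly, swifter's first move is to try the function on the whole column at once (vectorized) and only fall back to a slower path if that fails. A minimal numpy-only sketch of that first step, using a hypothetical helper `apply_fastest` (this is not swifter's API, and it leaves out swifter's Dask path and its logic for choosing between Dask and pandas.apply):

```python
import numpy as np

def apply_fastest(func, values):
    """Hypothetical helper: try one vectorized call, else loop per element.

    Mimics the idea behind swifter's first step; it is not swifter's code.
    """
    arr = np.asarray(values)
    try:
        out = func(arr)  # one call on the whole array
        if np.shape(out) == arr.shape:
            return np.asarray(out)
    except Exception:
        pass  # func only works on scalars (e.g. it uses if/else)
    # fallback: one Python-level call per element, like pandas.apply
    return np.array([func(v) for v in arr])

def vectorizable(x):
    return x * 2 + 1

def scalar_only(x):
    if x > 0:  # on a multi-element array this raises ValueError
        return x + 1
    return 0

print(apply_fastest(vectorizable, [1, 2, 3]))  # [3 5 7]
print(apply_fastest(scalar_only, [-1, 2]))     # [0 3]
```

One caveat with this approach: a function that runs on an array but means something different there (for example, one that calls Python's built-in max) would silently take the vectorized path, so check the result on a few rows first.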

Dask - parallelizes NumPy, pandas, plain Python, scikit-learn, literally everything...
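Dask's basic idea is to split a big array or DataFrame into chunks, run the same operation on each chunk, and combine the partial results. The sketch below only illustrates that partition/map/combine pattern with the standard library's thread pool and NumPy; it is not Dask's API, and the helper `chunked_sum_of_squares` and the chunk count are hypothetical. Threads are used just to keep it self-contained: whether they actually run in parallel depends on whether the work releases Python's GIL, which is one reason Dask also offers process-based and distributed schedulers.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def chunked_sum_of_squares(arr, n_chunks=4):
    """Hypothetical sketch of partition -> map -> combine (not Dask's API)."""
    chunks = np.array_split(arr, n_chunks)  # partition
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # map: each worker reduces its own chunk
        partials = list(pool.map(lambda c: int(np.dot(c, c)), chunks))
    return sum(partials)  # combine

x = np.arange(1_000_000, dtype=np.int64)
print(chunked_sum_of_squares(x) == int(np.dot(x, x)))  # True
```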

Modin: An alternative to Dask, but only for pandas; much simpler and lighter if I/O is what you need. It can process a 10 GB DataFrame in seconds.

# replace the following line
#import pandas as pd
# with
import modin.pandas as pd

You are done: pandas is 10-30 times faster on some tasks! But it sometimes crashes :)

Mini-batch data parallelism: sort of the default in PyTorch
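The idea behind mini-batch data parallelism (and PyTorch's DistributedDataParallel, which averages gradients across processes) is to split each mini-batch across devices, let every replica compute gradients on its shard, and average them. A numpy-only sketch for a linear model with mean-squared-error loss, where the "devices" are simulated by a loop; `mse_grad`, `data_parallel_grad`, the shard count, and the random data are all hypothetical. With equal-sized shards, the averaged shard gradients equal the full-batch gradient:

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

def data_parallel_grad(w, X, y, n_devices=4):
    """Hypothetical sketch: shard the batch, compute per-shard grads, average."""
    X_shards = np.array_split(X, n_devices)
    y_shards = np.array_split(y, n_devices)
    # each shard's gradient would be computed on its own device
    grads = [mse_grad(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    return np.mean(grads, axis=0)  # the all-reduce (average) step

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))  # batch of 64 divides evenly into 4 shards
y = rng.normal(size=64)
w = np.zeros(3)
print(np.allclose(data_parallel_grad(w, X, y), mse_grad(w, X, y)))
```

If the shards are unequal in size, weight each shard's gradient by its number of samples instead of taking a plain mean.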

Compile your code the easy way

Numba: a JIT compiler that translates numerical Python and NumPy code into optimized machine code (via LLVM), so it runs at speeds approaching C/C++/Fortran instead of slow Python loops over NumPy arrays (in my experience, even Cython is slower)

Best of all, you still code in Python; you just need a decorator on top of the time-consuming function. MAKE SURE IT IS TIME-CONSUMING: spamming @njit everywhere will do the opposite of what you want, because compiling each decorated function costs time!

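import numpy as np
from numba import jit, njit, prange, int32

# njit, a 4-letter magic word (short for jit(nopython=True)): it can make
# a function that takes 20-30 seconds finish in 5 or so!
@njit
def function0(a, b):
    # your loop (toy example)
    result = 0
    for i in range(a):
        result += i * b
    return result

# Here we declare the return and argument types, so Numba compiles
# eagerly when the function is defined. nopython=True forbids falling
# back to slow Python object mode (it raises an error instead), which
# makes debugging harder but the code SO much faster. parallel=True lets
# Numba distribute prange loops and supported array operations across
# the cores of your CPU.
@jit(int32(int32, int32), nopython=True, parallel=True)
def function(a, b):
    # your loop or numerically intensive computations (toy example)
    result = 0
    for i in prange(a):
        result += i * b
    return result

# Here we are saying "you are no longer restricted to types we specify":
# Numba infers them on the first call and compiles a specialized version.
# parallel=True runs prange iterations on multiple threads of your CPU;
# Numba picks the threading layer for you.
@njit(parallel=True)
def function2(c):
    # your loop or numerically intensive computations (toy example)
    result = 0.0
    for i in prange(c.shape[0]):
        result += np.sqrt(c[i])
    return result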
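To see the compilation cost for yourself, time the first call (which includes JIT compilation) against the second. This sketch also falls back to a no-op decorator when Numba isn't installed, so it still runs (just without any speed-up); `sum_of_squares` and `timed` are hypothetical helpers:

```python
import time

try:
    from numba import njit  # the real Numba decorator, if installed
except ImportError:
    def njit(*args, **kwargs):
        """No-op stand-in so the example runs without Numba."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]        # used as @njit
        return lambda func: func  # used as @njit(...)

@njit
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(func, *args):
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

first, t_first = timed(sum_of_squares, 1_000_000)    # compile + run (with Numba)
second, t_second = timed(sum_of_squares, 1_000_000)  # run only
print(f"first call: {t_first:.4f}s, second call: {t_second:.4f}s")
```

With Numba installed, the first timing includes compilation and is usually much larger than the second; for a function that only ever runs briefly, that one-off cost can outweigh the savings.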

Eliminate memory leaks

ipyexperiments - will save you 20-30% of GPU memory and 10-15% of system memory

ipyexperiments usage examples in some Kaggle contest code I wrote

Make sure to either use IPyExperimentsPytorch all the time, or IPyExperimentsCPU if you don't need a GPU. If you are using a GPU, you must use IPyExperimentsPytorch and check that the report printed after the cell confirms it is indeed using the GPU backend.
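The core idea is to remember which variables existed when an experiment started, delete everything created since when it ends, and run the garbage collector. A standard-library sketch of that idea with a hypothetical `experiment` context manager; it is not ipyexperiments' API, and it does not touch GPU memory (in PyTorch, returning cached GPU memory also needs torch.cuda.empty_cache()):

```python
import gc
import tracemalloc
from contextlib import contextmanager

@contextmanager
def experiment(namespace):
    """Hypothetical sketch: delete every name created inside the block.

    In a notebook you would pass globals(); here any dict works.
    Names that existed before but were reassigned are NOT freed.
    """
    before = set(namespace)
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        yield
    finally:
        created = set(namespace) - before
        for name in created:
            del namespace[name]     # drop the references...
        unreachable = gc.collect()  # ...and collect reference cycles too
        current, peak = tracemalloc.get_traced_memory()
        if started_here:
            tracemalloc.stop()
        print(f"freed {sorted(created)}; gc found {unreachable} unreachable "
              f"objects; traced memory now {current:,} B (peak {peak:,} B)")

ns = {"keep_me": 1}
with experiment(ns):
    ns["big_list"] = list(range(100_000))
print(sorted(ns))  # ['keep_me']
```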

Speed up your loops

In general, using numpy operations is preferred, e.g. np.sum() beats iterating.
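A minimal comparison of the two approaches on a hypothetical array of 100,000 integers. Both give the same sum, but np.sum runs its loop in compiled code, while `loop_sum` (a hypothetical helper) pays Python's interpreter overhead on every element; time them with %timeit in a notebook to see the gap on your machine:

```python
import numpy as np

values = np.arange(100_000, dtype=np.int64)

def loop_sum(arr):
    """Python-level loop: one interpreted iteration per element."""
    total = 0
    for v in arr:
        total += int(v)
    return total

fast = int(np.sum(values))  # one call; the loop runs in compiled code
slow = loop_sum(values)     # same answer, typically far slower
print(fast == slow)  # True
```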

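Avoiding if-else by using np.where is a big one. Here is an example: with 1,000 arrays of 1,000 elements each, a scalar if-else means ~1,000,000 Python-level function calls, while np.where handles everything in a single call. The vectorized call still touches every element, but in compiled (SIMD) code rather than the Python interpreter, so each element costs far less. At scale the difference is dramatic: 1,000,000 arrays of 1,000,000 elements is ~1 trillion Python-level operations, and even at an optimistic nanosecond each, that's about 17 minutes (10^12 ns = 1,000 s) of pure interpreter overhead.

import numpy as np
import pandas as pd

# data: a list of 1,000 numpy arrays with 1,000 elements each
# (or a DataFrame column df['data'] holding those arrays)

# Scalar version: the if-else forces one Python call per element
def fun(x):
    if x > 0:
        x += 1
    else:
        x = 0
    return x

# ~1,000 * 1,000 = 1,000,000 Python-level calls
out = [np.array([fun(x) for x in X]) for X in data]

# Vectorized version: where x > 0 take x + 1, else 0. Runs in compiled,
# SIMD-friendly code with no Python-level loop over elements.
def fun2(x):
    return np.where(x > 0, x + 1, 0)

# ~1,000 Python-level calls (one per row): no inner loop,
# but pandas apply is single-threaded...
out = df['data'].apply(fun2)

# 1 Python-level call: stack all rows into one 2-D array
# (requires equal lengths) and let numpy do the rest
out = fun2(np.stack(data))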
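When the if-else has more than two branches, np.select is the multi-branch counterpart of np.where: conditions are checked in order, like if/elif, and `default` plays the role of else. A small self-check with a hypothetical three-way rule, comparing a scalar if/elif/else with the vectorized form:

```python
import numpy as np

def scalar_version(x):
    # hypothetical three-way rule written with if/elif/else
    if x > 0:
        return x + 1
    elif x == 0:
        return 100
    else:
        return 0

def vectorized_version(x):
    # first matching condition wins; default covers the "else" case
    return np.select([x > 0, x == 0], [x + 1, 100], default=0)

x = np.array([-2, 0, 3, 7])
expected = np.array([scalar_version(v) for v in x])
print(np.array_equal(vectorized_version(x), expected))  # True
```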


Assume data is a list of items. In general, follow this rule of thumb:

  1. Slowest: for i in range(len(data)):

  2. OK: for d in data:

  3. Faster: [d for d in data]

  4. Lightest: (d for d in data), a lazy generator that never builds the whole list in memory, though consuming it is not always faster than a list comprehension
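A quick way to check this rule of thumb on your own machine is the standard library's timeit; `data` and the doubling operation are hypothetical stand-ins. Timings vary by Python version and hardware, and the generator version is summed because a generator does no work until something consumes it:

```python
import timeit

data = list(range(100_000))  # hypothetical data

def index_loop():
    out = []
    for i in range(len(data)):  # 1. extra indexing on every step
        out.append(data[i] * 2)
    return out

def direct_loop():
    out = []
    for d in data:  # 2. iterate over the items directly
        out.append(d * 2)
    return out

def list_comp():
    return [d * 2 for d in data]  # 3. no per-item .append method call

def gen_expr():
    return sum(d * 2 for d in data)  # 4. lazy: never materializes a list

for f in (index_loop, direct_loop, list_comp, gen_expr):
    best = min(timeit.repeat(f, number=5, repeat=3))
    print(f"{f.__name__:>12}: {best:.4f}s for 5 runs")
```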

Help with Class Balance and Distribution Issues

Learning to Reweight Examples for Robust Deep Learning: an implementation of the "Learning to Reweight..." paper

PyTorch imbalanced-dataset-toolkit

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

CVPR, Kaggle Winner: "Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning" with Imbalanced Class Labels

8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset

Probability calibration

Training on validation set when train and test are different distributions

Deep Learning Unbalanced Data
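One of the simplest tactics for class imbalance is to reweight the loss by class. A minimal sketch of inverse-frequency class weights, the same formula scikit-learn uses for class_weight="balanced" (a simple, static alternative to the learned per-example weights in "Learning to Reweight..."). The labels and the helper `inverse_frequency_weights` are hypothetical; you would pass the resulting weights to your loss, e.g. as the `weight` tensor of PyTorch's CrossEntropyLoss:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight per class = n_samples / (n_classes * count_of_class).

    Rare classes get weights > 1 and common classes < 1, so every class
    contributes the same total weight to the loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}

labels = ["ok"] * 90 + ["fraud"] * 10  # hypothetical 90/10 split
weights = inverse_frequency_weights(labels)
print(weights)  # "fraud" gets 5.0, "ok" gets about 0.556
```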

Other useful tools

mlxtend - a library with useful extensions to a variety of ML/NLP tools

DataFrameSummary: an extension to the pandas DataFrame describe() function

Paper and Technical Writing HOWTO

On writing research papers:

"How to Write an Introduction" by Dr. Om Gnawali

Some of the best examples of technical writing (papers & blogs go hand in hand!):

How to trick a neural network into thinking a panda is a vulture

Picking an optimizer for Style Transfer

How do we 'train' Neural Networks?

Must-read papers and technical articles

Disciplined Training of Neural Networks

On Language Models:

NLP's ImageNet moment has arrived

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Attention is All You Need

Improving Language Understanding by Generative Pre-Training

Deep contextualized word representations

Universal Language Model Fine-tuning for Text Classification


Comparing complex NLP models for complex languages on a set of real tasks

You don't need RNNs: When Recurrent Models Don't Need to be Recurrent


Deep Graph Methods Survey

Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

Blogs You Should Follow

Stanford AI Salon: a bi-weekly discussion on important topics in AI/NLP/ML. It is closed to the public due to space restrictions, but notes and videos are now posted on a blog. Previous guests include LeCun, Hinton, Ng, and others

Andrej Karpathy

Vitaliy Bushaev

Sylvain Gugger

Sebastian Ruder

Jeremy Howard

Jay Alammar

