Anything useful goes here
On interpretability of BERT - similar in spirit to the work above
The 4 Biggest Open Problems in NLP Today
For practically anything involving LM:
GPT-2: Too dangerous to release to the public. Well here it is, with the weights and all.
PAIR: People + AI Research by Google Brain
StellarGraph: a Python library for machine learning on graph-structured or network-structured data
The purpose of this repository is to store tools for text classification with deep learning
Representation learning on large graphs using stochastic graph convolutions.
Flair - LM/embedding/general NLP lib. Fastest-growing NLP project on GitHub
AllenNLP - An open-source NLP research library, built on PyTorch by the Allen Institute for AI
FastAI - ULMFiT, Transformer, TransformerXL implementations and more
People + AI Research (PAIR) by the Google Brain team:
What If...you could inspect a machine learning model, with minimal coding required?
Visdom - similar to TensorBoard
Seq2Seq Vis: Visualization for Sequential Neural Networks with Attention
BERTViz - tool for visualizing attention in BERT and OpenAI GPT-2
tensor2tensor: visualization of the Transformer paper
A.I. Experiments: Visualizing High-Dimensional Space
Guide to visualization of NLP representations and neural nets by C.Olah @ Google Brain
Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters, Part 1
Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters, Part 2
The illustrated BERT, ELMo, ULMFit, and Transformer
Visualizing Representations: Deep Learning for Human Beings
Jay Alammar: Visualizing machine learning one concept at a time
NVIDIA Apex: a PyTorch extension with tools for easy mixed precision and distributed training in PyTorch
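A minimal sketch of the apex.amp mixed-precision pattern (the model, optimizer, data and opt_level here are illustrative assumptions, not from the original; requires a CUDA GPU and Apex installed):

import torch
from apex import amp

model = torch.nn.Linear(10, 2).cuda()                          # any PyTorch model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loader = [(torch.randn(32, 10), torch.randint(0, 2, (32,)))]   # stand-in for your DataLoader

# "O1" patches common ops to fp16 while keeping fp32 master weights
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x.cuda()), y.cuda())
    # scale the loss so fp16 gradients don't underflow
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()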
Consider using these:
NVIDIA optimized and tuned containers for various frameworks
pandas.DataFrame.swifter.apply
Swifter will automatically apply the fastest method available (or so it says, more on it later). You want to make sure you have stuff like Dask installed. It chooses between vectorization, Dask, and traditional pandas.apply
$ pip install -U pandas
$ pip install swifter
import pandas as pd
import swifter
df['outCol'] = df['inCol'].swifter.apply(anyfunction)
DASK - parallelizing NumPy, pandas, Python, scikit-learn, literally everything... The snippet below uses Modin, a drop-in pandas replacement that runs on top of Dask or Ray:
# replace the following line
#import pandas as pd
# with
import modin.pandas as pd
You are done - pandas is 10-30x faster on some tasks! But it will sometimes crash :)
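If you want Dask itself rather than Modin's drop-in layer, here is a minimal, hedged dask.dataframe sketch (the column names and sizes are made up for illustration):

import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"key": ["a", "b"] * 500_000, "value": range(1_000_000)})
ddf = dd.from_pandas(pdf, npartitions=8)        # split into 8 partitions

# operations only build a lazy task graph; .compute() runs it in parallel
result = ddf.groupby("key")["value"].mean().compute()
print(result)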
Mini-batch data parallelism - more or less the default way to use multiple GPUs in PyTorch
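Assuming the line above refers to torch.nn.DataParallel, here is a minimal sketch of wrapping a model so each mini-batch is split across the visible GPUs (the model and batch are placeholders):

import torch
import torch.nn as nn

model = nn.Linear(512, 10)                      # any model
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)              # scatter each batch, gather the outputs
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(64, 512).to(device)
out = model(x)                                  # the 64-sample batch is sliced across GPUs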
Best of all, you still code in Python; you just need a decorator on top of a time-consuming function. MAKE SURE IT IS TIME-CONSUMING - just spamming @njit everywhere will do the opposite of what you want, since initializing numba costs resources!
from numba import jit, njit, vectorize, int32

# a 4-letter magic word that will make any function that
# takes 20-30 seconds finish in 5 or so!
@njit
def function0(a, b):
    # your loop
    return result

# we declare the return and argument types, skip the object-mode
# fallback (nopython) and compile straight to machine code, making it
# harder to debug but SO much faster. Finally, all vector ops will be
# distributed between the cores of your CPU
@jit(int32(int32, int32), nopython=True, parallel=True)
def function(a, b):
    # your loop or numerically intensive computations
    return result

# in this function we are saying "you are no longer restricted to the
# types we specify" - numba infers them and turns the function into a
# NumPy-style ufunc that broadcasts over whole arrays (add explicit
# signatures and target='parallel' to spread it across cores)
@vectorize
def function2(c):
    # your numerically intensive computation on a scalar c
    return result
ipyexperiments - will save you 20-30% of GPU (video) memory and 10-15% of system memory
ipyexperiments usage examples in some kaggle contest code I wrote
Make sure to either use IPyTorchExperiments all the time, or IPyCPUExperiments if you don't care about the GPU. If you are using a GPU, be sure to use IPyTorchExperiments and check that the text printed after the cell tells you it is indeed using the GPU backend.
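A hedged sketch of the notebook pattern described above; the class names are taken from the text and may differ between ipyexperiments versions, so check the project's README for the current API:

# cell 1: start an experiment (the printout below the cell should mention the GPU backend)
from ipyexperiments import IPyTorchExperiments   # or IPyCPUExperiments for CPU-only work
exp = IPyTorchExperiments()

# cells 2..n: run your memory-hungry code; the experiment reports GPU/CPU RAM per cell

# last cell: finish the experiment so its temporary variables and memory are reclaimed
del exp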
In general, using numpy operations is preferred: np.sum() beats iterating. Avoiding if-else by using np.where is a big one. Here is an example of going from on the order of 1 trillion Python-level operations to, effectively, one vectorized operation per array. Assuming each operation takes a nanosecond, that's 17 minutes versus practically instant.
import numpy as np

# X is some numpy array, and you have 1000 of those in a dataframe (df['data']) or in a list (data)

def fun(x):
    if x > 0:
        x += 1
    else:
        x = 0
    return x

# ~1000000000000 operations: a pure-Python double loop over every element
output = []
for X in data:
    for x in X:
        output.append(fun(x))

# ~1000000 operations (1000 rows * size of each numpy array, say 1000):
# pandas drives the rows, but the inner loop is still Python
df['data'].apply(lambda X: [fun(x) for x in X])

# This is very fast, using vector math extensions - effectively 1 op per array
# on a Xeon or i9 with MKL installed
def fun2(x):
    x[np.where(x > 0)] += 1
    x[np.where(x <= 0)] = 0
    return x

# ~1000 operations: no Python-level looping over elements, but pandas is single-threaded...
df['data'].apply(fun2)

# swifter parallelizes the same vectorized call across cores
df['data'].swifter.apply(fun2)
Assume data contains some items that we abstract as ... In general, follow this rule of thumb:
- Slowest: for i in range(len(data)):
- OK: for d in data:
- Faster: [d for d in data]
- Fastest: (d for d in data)
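To see the ordering for yourself, here is a quick illustrative timeit comparison (the data and the work done per item are made up; note the generator version only looks instant because it is lazy and does no work until consumed):

import timeit

data = list(range(1_000_000))

def slowest():
    out = []
    for i in range(len(data)):
        out.append(data[i] * 2)
    return out

def ok():
    out = []
    for d in data:
        out.append(d * 2)
    return out

def faster():
    return [d * 2 for d in data]

def fastest():
    return (d * 2 for d in data)   # lazy: builds a generator, computes nothing yet

for f in (slowest, ok, faster, fastest):
    print(f.__name__, timeit.timeit(f, number=10))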
Learning to Reweight Examples for Robust Deep Learning - an implementation of the "Learning to Reweight..." paper
PyTorch imbalanced-dataset-toolkit
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset
Training on validation set when train and test are different distributions
mlxtend - a library with useful extensions to a variety of ML/NLP tools
DataFrameSummary: an extension to the pandas DataFrame describe() function
On writing research papers:
"How to Write an Introduction" by Dr. Om Gnawali
Some of the best examples of technical writing (papers & blogs go hand in hand!):
How to trick a neural network into thinking a panda is a vulture
Picking an optimizer for Style Transfer
How do we 'train' Neural Networks?
Disciplined Training of Neural Networks
On Language Models:
NLP's ImageNet moment has arrived
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Improving Language Understanding by Generative Pre-Training
Deep contextualized word representations
Universal Language Model Fine-tuning for Text Classification
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Comparing complex NLP models for complex languages on a set of real tasks
You don't need RNNs: When Recurrent Models Don't Need to be Recurrent
Other:
Baselines and Bigrams: Simple, Good Sentiment and Topic Classification