awesome-list

The research awesome list can be found here

A curated list of machine learning frameworks and tools, inspired by awesome-machine-learning.

Contribute

Contributions welcome! Read the contribution guidelines first.

Datasets

Speech

Common-Voice - Multi-language, open-source database of voice samples that anyone can use to train speech-enabled applications.

Environments

Graphical

AI-Blocks - a powerful and intuitive WYSIWYG interface that allows anyone to create Machine Learning models.

Hybrid

Luna Studio - Hybrid textual and visual functional programming

Libraries

Deep Learning

fast.ai - The fastai library simplifies training fast and accurate neural nets using modern best practices
PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
tensorflow - An Open Source Machine Learning Framework for Everyone by Google
neon - Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
cleverhans - An adversarial example library for constructing attacks, building defenses, and benchmarking both
Netron - a viewer for neural network, deep learning and machine learning models.
Online viewer -
List of conversion tools for DNN models - a list of libraries (GitHub projects) that provide options for converting DNN models between different frameworks

Dimensionality Reduction

umap - dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction

Multipurpose

DALI - A library containing both highly optimized building blocks and an execution engine for data pre-processing in deep learning applications docs
gin-config - Gin provides a lightweight configuration framework for Python, by Google.
imbalanced-learn - A Python package offering a number of re-sampling techniques. Compatible with scikit-learn and part of the scikit-learn-contrib projects.
mlxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries.
numpy - The fundamental package for scientific computing with Python.
PyOD - Outlier detection library
RAPIDS - Open GPU Data Science. More here or in cheatsheet
scikit-learn - machine learning in Python
scikit-learn-laboratory (SKLL) - CLI for sklearn, working with configuration files
scipy - open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.
statsmodels - statistical modeling and econometrics in Python: time-series analysis, survival analysis, discrete models, Generalized Linear Models
SymPy - A computer algebra system written in pure Python, library for symbolic mathematics
Vaex - Out-of-Core DataFrames for Python, visualize and explore big tabular data at a billion rows per second. Project page

Natural Language Processing

Allen NLP - An open-source NLP research library, built on PyTorch.
PyText - A natural language modeling framework based on PyTorch by Facebook Research
pytorch-transformers - A library of state-of-the-art pretrained models for Natural Language Processing (NLP), including BERT, GPT, GPT-2, Transformer-XL, XLNet and XLM, with multiple pre-trained model weights
flair - A very simple framework for state-of-the-art Natural Language Processing (NLP) by Zalando Research
gensim - Topic modeling for humans. Enables analysis of plain-text documents for semantic structure. Compatible with Word2Vec, FastText and other NLP models.
spaCy - spaCy is a library for advanced Natural Language Processing in Python and Cython. spaCy comes with pretrained statistical models and word vectors, and supports tokenization for 50+ languages.
TextBlob - Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

Optimization

hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python
nevergrad - A Python toolbox for performing gradient-free optimization by Facebook Research

Probabilistic Programming

pyro - Deep universal probabilistic programming with Python and PyTorch by Uber
pgmpy - Python Library for Probabilistic Graphical Models

Recommender systems

surprise - A Python scikit for building and analyzing recommender systems

Speech Processing

warp-ctc - loss function to train on misaligned data and labels by Baidu Research
DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
speech-to-text-wavenet - End-to-end sentence level English speech recognition based on DeepMind's WaveNet and tensorflow
pykaldi - A Python wrapper for Kaldi - a toolkit for speech recognition
pytorch-kaldi - pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

Model Deployment

gradio - Library to easily integrate models into existing python (web) apps.

Tools

Compilers

glow - Compiler for Neural Network hardware accelerators by PyTorch
jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more by Google
numba - NumPy aware dynamic Python compiler using LLVM

Data Adapters

csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats.
redash - Connect to any data source, easily visualize, dashboard and share your data.
odo - Odo migrates between many formats. These include in-memory structures like list, pd.DataFrame and np.ndarray and also data outside of Python like CSV/JSON/HDF5 files, SQL databases, data on remote machines, and the Hadoop File System.

Data Annotation

doccano - Open source text annotation tool for machine learning practitioners.
snorkel - A system for quickly generating training data with weak supervision

Data Gathering

scrapy - high-level library to write crawlers and spiders.

Data Management

Quilt - Quilt versions and deploys data

Data Visualization

matplotlib - plotting with Python
bokeh - Interactive Web Plotting for Python
plotly - An open-source, interactive graphing library for Python
dash - Analytical Web Apps for Python. No JavaScript Required.
Jupyter Dashboards - Jupyter layout extension
vega - visualization grammar, a declarative format for creating, saving, and sharing interactive visualization designs. With Vega you can describe data visualizations in a JSON format, and generate interactive views using either HTML5 Canvas or SVG.
schema crawler - a tool to visualize database schema
scikit-plot - sklearn wrapper to automate frequently used machine learning visualizations.

Feature engineering

featuretools - an open source python framework for automated feature engineering.

Hardware Management

nvtop - an (h)top-like task monitor for NVIDIA GPUs. It can handle multiple GPUs and print information about them in a way familiar to htop users.
s2i - Source-to-Image (S2I) is a toolkit and workflow for building reproducible container images from source code. S2I produces ready-to-run images by injecting source code into a container image and letting the container prepare that source code for execution. By creating self-assembling builder images, you can version and control your build environments exactly like you use container images to version your runtime environments.

Job Management

luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in. By Spotify.

Parallelization

pywren - parfor on AWS Lambda
horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and MXNet by Uber.
dask - library for parallel computing in Python with dynamic task scheduling: numpy computation graphs.

Reporting

shap - A unified approach to explain the output of any machine learning model
tensorboardX - tensorboard for pytorch (and chainer, mxnet, numpy, ...)
Weights and Biases - Experiment Tracking for Deep Learning
pandas-profiling - tool for generating exploratory data analysis for the provided DataFrame - presenting results in the form of HTML report

License

To the extent possible under law, Netguru has waived all copyright and related or neighboring rights to this work. See LICENSE.
