awesome-list

The research awesome list can be found here

A curated list of machine learning frameworks and tools, inspired by awesome-machine-learning.

Contribute

Contributions welcome! Read the contribution guidelines first.

Datasets

Speech

  • Common-Voice - A multi-language, open-source database of voice samples that anyone can use to train speech-enabled applications.

Environments

Graphical

  • AI-Blocks - a powerful and intuitive WYSIWYG interface that allows anyone to create Machine Learning models.

Hybrid

  • Luna Studio - Hybrid textual and visual functional programming

Libraries

Deep Learning

  • fast.ai - The fastai library simplifies training fast and accurate neural nets using modern best practices
  • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • tensorflow - An Open Source Machine Learning Framework for Everyone by Google
  • neon - Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
  • cleverhans - An adversarial example library for constructing attacks, building defenses, and benchmarking both
  • Netron - a viewer for neural network, deep learning and machine learning models.
  • Online viewer
  • List of conversion tools for DNN models - a list of libraries (GitHub projects) that provide options for converting DNN models between different frameworks

Dimensionality Reduction

  • umap - dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction

Multipurpose

  • DALI - A library containing both highly optimized building blocks and an execution engine for data pre-processing in deep learning applications (docs)
  • gin-config - Gin provides a lightweight configuration framework for Python, by Google.
  • imbalanced-learn - A Python package offering a number of re-sampling techniques for imbalanced datasets. Compatible with scikit-learn and part of the scikit-learn-contrib projects.
  • mlxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries.
  • numpy - The fundamental package for scientific computing with Python.
  • PyOD - Outlier detection library
  • RAPIDS - Open GPU Data Science. More here or in the cheatsheet
  • scikit-learn - machine learning in Python
  • scikit-learn-laboratory (SKLL) - a command-line interface for scikit-learn, driven by configuration files
  • scipy - open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.
  • statsmodels - statistical modeling and econometrics in Python: time-series analysis, survival analysis, discrete models, Generalized Linear Models
  • SymPy - A computer algebra system written in pure Python, library for symbolic mathematics
  • Vaex - Out-of-Core DataFrames for Python, visualize and explore big tabular data at a billion rows per second. Project page

Natural Language Processing

  • Allen NLP - An open-source NLP research library, built on PyTorch.
  • PyText - A natural language modeling framework based on PyTorch by Facebook Research
  • pytorch-transformers - A library of state-of-the-art pretrained models for Natural Language Processing (NLP), including BERT, GPT, GPT-2, Transformer-XL, XLNet and XLM, with multiple pre-trained model weights
  • flair - A very simple framework for state-of-the-art Natural Language Processing (NLP) by Zalando Research
  • gensim - Topic modeling for humans. Enables analysis of plain-text documents for semantic structure. Compatible with Word2Vec, FastText and other NLP models.
  • spaCy - spaCy is a library for advanced Natural Language Processing in Python and Cython. spaCy comes with pretrained statistical models and word vectors, and supports tokenization for 50+ languages.
  • TextBlob - Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

Optimization

  • hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python
  • nevergrad - A Python toolbox for performing gradient-free optimization by Facebook Research

Probabilistic Programming

  • pyro - Deep universal probabilistic programming with Python and PyTorch by Uber
  • pgmpy - Python Library for Probabilistic Graphical Models

Recommender systems

  • surprise - A Python scikit for building and analyzing recommender systems

Speech Processing

  • warp-ctc - loss function to train on misaligned data and labels by Baidu Research
  • DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
  • speech-to-text-wavenet - End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow
  • pykaldi - A Python wrapper for Kaldi, a toolkit for speech recognition
  • pytorch-kaldi - A project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.

Model Deployment

  • gradio - Library to easily integrate models into existing Python (web) apps.

Tools

Compilers

  • glow - Compiler for Neural Network hardware accelerators by PyTorch
  • jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more by Google
  • numba - NumPy aware dynamic Python compiler using LLVM

Data Adapters

  • csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats.
  • redash - Connect to any data source, easily visualize, dashboard and share your data.
  • odo - Odo migrates between many formats. These include in-memory structures like list, pd.DataFrame and np.ndarray and also data outside of Python like CSV/JSON/HDF5 files, SQL databases, data on remote machines, and the Hadoop File System.

Data Annotation

  • doccano - Open source text annotation tool for machine learning practitioners.
  • snorkel - A system for quickly generating training data with weak supervision

Data Gathering

  • scrapy - high-level library to write crawlers and spiders.

Data Management

  • Quilt - Quilt versions and deploys data

Data Visualization

  • matplotlib - plotting with Python
  • bokeh - Interactive Web Plotting for Python
  • plotly - An open-source, interactive graphing library for Python
  • dash - Analytical Web Apps for Python. No JavaScript Required.
  • Jupyter Dashboards - Jupyter layout extension
  • vega - visualization grammar, a declarative format for creating, saving, and sharing interactive visualization designs. With Vega you can describe data visualizations in a JSON format, and generate interactive views using either HTML5 Canvas or SVG.
  • schema crawler - a tool to visualize database schemas
  • scikit-plot - sklearn wrapper to automate frequently used machine learning visualizations.

Feature engineering

  • featuretools - an open-source Python framework for automated feature engineering.

Hardware Management

  • nvtop - an (h)top-like task monitor for NVIDIA GPUs. It can handle multiple GPUs and prints information about them in a way familiar to htop users.

  • s2i - Source-to-Image (S2I) is a toolkit and workflow for building reproducible container images from source code. S2I produces ready-to-run images by injecting source code into a container image and letting the container prepare that source code for execution. By creating self-assembling builder images, you can version and control your build environments exactly like you use container images to version your runtime environments.

Job Management

  • luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in. By Spotify.

Parallelization

  • pywren - parfor on AWS Lambda
  • horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and MXNet by Uber.
  • dask - library for parallel computing in Python with dynamic task scheduling and NumPy computation graphs.

Reporting

  • shap - A unified approach to explain the output of any machine learning model
  • tensorboardX - tensorboard for pytorch (and chainer, mxnet, numpy, ...)
  • Weights and Biases - Experiment Tracking for Deep Learning
  • pandas-profiling - a tool that generates an exploratory data analysis of a given DataFrame and presents the results as an HTML report

License

CC0

To the extent possible under law, Netguru has waived all copyright and related or neighboring rights to this work. See LICENSE.
