Data-Science
📝 An awesome Data Science repository to learn and apply for real world problems.
ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.
We will keep updating the paper list on machine learning + causal theory. We also hold weekly internal discussions of related papers between NExT++ (NUS) and LDS (USTC).
Must-read research papers and links to tools and datasets related to using machine learning for compilers and systems optimisation.
Blogs on machine learning and deep learning
Auton Survival - an open source package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events
Temporal Causal Discovery Framework (PyTorch): discovering causal relationships between time series
The Machine Learning & Deep Learning Compendium was a list of references in a single private document, which I curated to expand my knowledge; it is now an open knowledge-sharing projec…
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
A collection of 85 minority oversampling techniques (SMOTE) for imbalanced learning with multi-class oversampling and model selection features
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristi…
Labs and demos for courses for GCP Training (http://cloud.google.com/training).
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collec…
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.