🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
- Updated Dec 2, 2024
- Python
Machine learning with dataframes
Scalable data preprocessing and curation toolkit for LLMs
Visual Data Preparation and Transformation. Low-Code Python-based ETL.
Open source project for data preparation for GenAI applications
Data Preparation for Satellite Machine Learning
A New, Interactive Approach to Learning Data Science
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
【AAAI'2021】MVFNet: Multi-View Fusion Network for Efficient Video Recognition
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
ABAP unit testing framework: prepare in Excel, reuse in ABAP code
This repository contains my implementations of the algorithms which MoNuSAC participants could use for data preparation to train their models at ISBI 2020.
Go web crawler to scrape documentation sites and convert content to clean Markdown for LLM ingestion (RAG, training data).
Accelerating AI Training and Inference from Storage Perspective (Must-read Papers on Storage for AI)
“Data science” is just about as broad a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.
Market Mix Modelling for an eCommerce firm to estimate the impact of various marketing levers on sales
GWAS summary statistics files QC tool
Data preparation for data science projects.
Foofah: programming-by-example data transformation program synthesizer