A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Updated Aug 29, 2024
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
A lightweight, flexible, and expressive statistical data testing library
Concurrent and multi-stage data ingestion and data processing with Elixir
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Large-scale pretraining for dialogue
Extract Transform Load for Python 3.5+
Python Stream Processing
Source code accompanying the book Data Science on the Google Cloud Platform, by Valliappa Lakshmanan (O'Reilly, 2017)
Kubernetes-native platform to run massively parallel data/streaming jobs
Data and tools for generating and inspecting OLMo pre-training data.
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Large-scale pretrained models for goal-directed dialog
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.