polars-bio is a Python library for genomics built on top of polars, Apache Arrow and Apache DataFusion. It provides a DataFrame API for genomics data and is designed to be blazing fast, memory efficient and easy to use.
- optimized for performance and large-scale genomics datasets
- popular genomics operations with a DataFrame API (both pandas and polars)
- SQL-powered querying and manipulation of bioinformatics data
- native parallel engine powered by Apache DataFusion and sequila-native
- out-of-core/streaming processing (for data too large to fit into a computer's main memory) with Apache DataFusion and polars
- support for streaming data directly from cloud storage (e.g. S3, GCS), enabling processing of large-scale genomics data without materializing it in memory
- zero-copy data exchange with Apache Arrow
- support for bioinformatics file formats via noodles and exon
- pre-built wheel packages for Linux, Windows and macOS (arm64 and x86_64) available on PyPI
Read the documentation
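
To give a feel for the kind of genomics operation mentioned in the feature list, here is a minimal pure-Python sketch of an interval overlap join. It does not use polars-bio and is not its API: the `Interval` type and the `overlap` function are hypothetical names introduced for illustration, and half-open (BED-style) coordinates are assumed.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive (half-open, BED-style)


def overlap(left, right):
    """Return every (left, right) pair on the same chromosome sharing at least one base."""
    # Group the right-hand intervals by chromosome so each left interval
    # is only compared against candidates on its own chromosome.
    by_chrom = defaultdict(list)
    for r in right:
        by_chrom[r.chrom].append(r)

    pairs = []
    for l in left:
        for r in by_chrom.get(l.chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if l.start < r.end and r.start < l.end:
                pairs.append((l, r))
    return pairs


reads = [Interval("chr1", 100, 200), Interval("chr1", 500, 600), Interval("chr2", 50, 80)]
genes = [Interval("chr1", 150, 300), Interval("chr1", 600, 700), Interval("chr2", 10, 60)]

hits = overlap(reads, genes)
for read, gene in hits:
    print(read, "overlaps", gene)
```

This naive version compares every pair within a chromosome, which is quadratic in the worst case. A library built for large datasets would be expected to use an indexed or parallel algorithm instead, but the result it computes is the same kind of join.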