Truly universal encoding detector in pure Rust - port of Python version
-
Updated
Sep 21, 2025 - Rust
Truly universal encoding detector in pure Rust - port of Python version
A lightweight, fast, and optimized XML file splitter with build in tag data validation, written with the XMLParser library. The main goal of this is to split an XML file into multiple small chunks (hence the name), then save it into multiple different little XML files.
Charset and encoding detection for Python. Identifies the encoding of any byte sequence using byte-distribution fingerprinting, language coherence scoring, and mess detection. Supports streaming detection, mojibake repair, multi-encoding document analysis, and in-band HTML/XML hints. Drop-in replacement for chardet and charset-normalizer.
Add a description, image, and links to the charset-normalizer topic page so that developers can more easily learn about it.
To associate your repository with the charset-normalizer topic, visit your repo's landing page and select "manage topics."