Marker v2
Basically a full rewrite!
Main features:
- Extracts and saves images
- Improved table formatting
- Better markdown wrapping
- Better reading order on complex docs
- Improved OCR engine with more language options
- Simple pip package install with no required system dependencies, so it can be used easily on Windows
- Can be used commercially (pymupdf and layoutlmv3 dependencies removed)
It takes ~2x as long to run now, but that seems like a decent tradeoff.
See the README for details.