kolibri2zim allows you to create a ZIM file from a Kolibri channel.
It downloads the videos (webm or mp4 extension, optionally recompressed in lower quality and smaller size), the thumbnails, the subtitles and the authors' profile pictures; then it creates a folder of static HTML files from them before creating a ZIM file out of it.
Warning
This scraper is undergoing heavy modifications to prepare a v2, including a brand new UI for navigating the content tree and a move to Vue.js. These changes are already merged into the main branch but are not yet complete. If you need a stable version, please use the published versions (PyPI or Docker).
We also have a v1 branch for any urgent patch needed on the current production version.
- Node 20.x
- Python 3.11
- ffmpeg for video transcoding (only used with --use-webm or --low-quality)
- curl and unzip to install JavaScript dependencies. See get_web_deps.sh if you want to do it manually.
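The requirements listed above can be checked before a first run. The following is a minimal POSIX shell sketch, not a script shipped with kolibri2zim; the helper names missing_tools and major_version are hypothetical. It only checks that the tools are on PATH and warns when the Node or Python version differs from the documented one; ffmpeg is reported as optional because it is only used for transcoding.

```shell
# Sketch: report which of the listed requirements are missing.
# Only presence on PATH is checked; Node/Python versions only produce warnings.

# missing_tools: print each argument that is not found on PATH, one per line.
missing_tools() {
  for _tool in "$@"; do
    command -v "$_tool" >/dev/null 2>&1 || printf '%s\n' "$_tool"
  done
}

# major_version: "v20.11.1" -> "20" (drops a leading "v" and everything after the first dot).
major_version() {
  _ver=${1#v}
  printf '%s\n' "${_ver%%.*}"
}

missing=$(missing_tools node python3 curl unzip)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "missing required tools:" $missing >&2
fi

# ffmpeg is only needed with --use-webm or --low-quality, so it is optional.
if [ -n "$(missing_tools ffmpeg)" ]; then
  echo "note: ffmpeg not found (only needed for --use-webm / --low-quality)" >&2
fi

if command -v node >/dev/null 2>&1; then
  node_major=$(major_version "$(node --version)")
  [ "$node_major" = 20 ] || echo "warning: Node $node_major found, 20.x expected" >&2
fi

if command -v python3 >/dev/null 2>&1; then
  py_version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
  [ "$py_version" = 3.11 ] || echo "warning: Python $py_version found, 3.11 expected" >&2
fi
```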
kolibri2zim is a Python 3 program. If you are not using the Docker image, you are advised to install it in a virtual environment to avoid installing its dependencies system-wide.
python3 -m venv env # Create the virtualenv
source env/bin/activate # Activate the virtualenv (env\Scripts\activate on Windows)
pip3 install kolibri2zim # Install kolibri2zim and its dependencies
kolibri2zim --help # Display kolibri2zim help
Run deactivate to leave the virtual environment.
See pyproject.toml for the list of Python dependencies.
To test EPUB and PDF rendering, a useful command is:
kolibri2zim --name "Biblioteca Elejandria" --output /output --tmp-dir /tmp --zim-file Biblioteca_Elejandria.zim --channel-id "fed29d60e4d84a1e8dcfc781d920b40e" --node-ids 'd92c07655128458f8248416154b18a68,89fe2f86ee3f4fbaa7fb2bf9bd56d088,75f99e6b97d14b14a4e74762ad77391f,89fe2f86ee3f4fbaa7fb2bf9bd56d088'
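The --node-ids value in the command above is a plain comma-separated list, and it names 89fe2f86ee3f4fbaa7fb2bf9bd56d088 twice. A small helper can build the list from separate IDs and drop repeats while keeping their order. This is a sketch; join_node_ids is a hypothetical name, not part of kolibri2zim, and whether kolibri2zim itself tolerates duplicate IDs is not covered here.

```shell
# join_node_ids: join the arguments with commas for --node-ids,
# dropping repeated IDs but keeping the first-seen order.
join_node_ids() {
  printf '%s\n' "$@" | awk '!seen[$0]++' | paste -sd, -
}

node_ids=$(join_node_ids \
  d92c07655128458f8248416154b18a68 \
  89fe2f86ee3f4fbaa7fb2bf9bd56d088 \
  75f99e6b97d14b14a4e74762ad77391f \
  89fe2f86ee3f4fbaa7fb2bf9bd56d088)
echo "$node_ids"
# d92c07655128458f8248416154b18a68,89fe2f86ee3f4fbaa7fb2bf9bd56d088,75f99e6b97d14b14a4e74762ad77391f
```

The result can then be passed as --node-ids "$node_ids".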
With Docker:
docker run -v my_dir:/output ghcr.io/openzim/kolibri kolibri2zim --help
kolibri2zim works off a channel-id that you must provide. This is a 32-character ID that you can find in the URL of the channel you want, either from Kolibri Studio or from the Kolibri Catalog.
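Since the channel-id is embedded in a URL, it can be pulled out and sanity-checked with a couple of shell helpers. This sketch assumes the ID is 32 lowercase hexadecimal characters, as in every example in this README; the function names and the URL below are hypothetical and only illustrate the idea. If a URL also contains other IDs (for instance a node ID), check that the first match really is the channel.

```shell
# is_channel_id: succeed if the argument is exactly 32 lowercase hex characters.
# (Assumption: IDs are lowercase hexadecimal, as in the examples in this README.)
is_channel_id() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{32}$'
}

# extract_channel_id: print the first run of 32 hex characters found in a URL.
extract_channel_id() {
  printf '%s\n' "$1" | grep -Eo '[0-9a-f]{32}' | head -n 1
}

# Hypothetical URL, for illustration only.
url="https://studio.example.org/channels/7f744ce8d28b471eaf663abd60c92267/view"
channel_id=$(extract_channel_id "$url")
if is_channel_id "$channel_id"; then
  echo "channel-id: $channel_id"
fi
```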
kolibri2zim adheres to openZIM's Contribution Guidelines.
kolibri2zim has implemented openZIM's Python bootstrap, conventions and policies v1.0.0.
Before contributing, be sure to check out the CONTRIBUTING.md guidelines.
Some useful test channels:
- 7f744ce8d28b471eaf663abd60c92267: a very minimal channel with all kinds of content
- 9f15f4e9aeaa48b5ae271e5749d6fe80: a small channel with significantly nested items and all kinds of content
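To try both test channels in one go, a loop can emit one kolibri2zim command per channel. This is a dry-run sketch: it only prints the commands (remove echo to actually scrape), the --name values are hypothetical placeholders, and zim_file_name is a hypothetical helper that mirrors the naming used in the sample commands ("Biblioteca Elejandria" becomes Biblioteca_Elejandria.zim).

```shell
# zim_file_name: derive a ZIM file name from a channel name by replacing
# spaces with underscores and appending ".zim".
zim_file_name() {
  printf '%s.zim\n' "$(printf '%s' "$1" | tr ' ' '_')"
}

# Dry run: print one kolibri2zim command per test channel.
# Remove "echo" to actually run the scraper (names here are placeholders).
for channel_id in 7f744ce8d28b471eaf663abd60c92267 9f15f4e9aeaa48b5ae271e5749d6fe80; do
  name="test-$channel_id"
  echo kolibri2zim --name "$name" --output output \
    --zim-file "$(zim_file_name "$name")" --channel-id "$channel_id"
done
```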
For local development, you have to:
- build the zimui frontend, which will be embedded inside the ZIM (and rebuild it every time you modify the zimui)
- run the scraper to retrieve the Kolibri channel content and build the ZIM
Sample commands:
cd zimui
yarn install
yarn build
cd ../scraper
hatch run kolibri2zim --name "Biblioteca Elejandria" --output output --zim-file Biblioteca_Elejandria.zim --channel-id "fed29d60e4d84a1e8dcfc781d920b40e" --node-ids 'd92c07655128458f8248416154b18a68,89fe2f86ee3f4fbaa7fb2bf9bd56d088,75f99e6b97d14b14a4e74762ad77391f,89fe2f86ee3f4fbaa7fb2bf9bd56d088'
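Because the zimui frontend has to be rebuilt after every change, the sample commands above can be wrapped in a single function. This is a sketch under assumptions: it is meant to be called from the repository root, it assumes zimui and scraper are sibling folders (as the cd commands above imply), and the names run, rebuild_and_scrape and DRY_RUN are hypothetical, not part of the project.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# rebuild_and_scrape: rebuild the zimui frontend, then run the scraper
# with the given kolibri2zim arguments.
rebuild_and_scrape() {
  # Subshells keep the caller's working directory unchanged.
  ( run cd zimui && run yarn install && run yarn build ) || return 1
  ( run cd scraper && run hatch run kolibri2zim "$@" )
}
```

For example, rebuild_and_scrape --name "Biblioteca Elejandria" --output output --zim-file Biblioteca_Elejandria.zim --channel-id "fed29d60e4d84a1e8dcfc781d920b40e" rebuilds the frontend and then scrapes the channel; setting DRY_RUN=1 first only prints the commands.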
Run from the official version (published on GHCR.io); the ZIM will be available in the output sub-folder of the current working directory.
docker run --rm -it -v $(pwd)/output:/output ghcr.io/openzim/kolibri:latest kolibri2zim --name "Biblioteca Elejandria" --output /output --tmp-dir /tmp --zim-file Biblioteca_Elejandria.zim --channel-id "fed29d60e4d84a1e8dcfc781d920b40e" --node-ids 'd92c07655128458f8248416154b18a68,89fe2f86ee3f4fbaa7fb2bf9bd56d088,75f99e6b97d14b14a4e74762ad77391f,89fe2f86ee3f4fbaa7fb2bf9bd56d088'