A Python web-scraping script that downloads manga from mangareader.to using the Selenium, beautifulsoup4, and requests libraries.
You can either run the standalone executable packaged by PyInstaller or run the script directly.
- Use pip to install the package:
  pip install dist/manga-dl-0.3.0.tar.gz
- Create a Python 3.11 virtual environment:
  virtualenv venv --python=python3.11
- Activate the virtual environment, then install the dependencies:
  source venv/bin/activate
  pip install -r requirements.txt
manga-dl.py [OPTIONS] ✨Manga URL✨ ✨Save Path✨
- Go to mangareader.to and copy the URL of the manga you want to download. The URL should follow the pattern
  https://mangareader.to/read/<manga-name>/<language>/<chapter/volume>
- Run
  python3 app/manga-dl.py <URL> [PATH]
  to download the manga and save it in PATH. If PATH is not specified, the manga is saved in the current directory.
- Run
  python3 app/manga-dl.py --help
  to see the help message.
- The script uses Selenium to instantiate a Chrome webdriver that opens the manga page in a headless browser (running in the background).
- The website requires answering the cookie-consent prompt and selecting a reading mode on the first visit. The script uses Selenium's ActionChains to simulate user actions, such as clicking and scrolling, so that the manga pictures are loaded into the webpage.
- There are two reading modes: Vertical Follow and Horizontal Follow. In my implementation, the webdriver finds the Vertical Follow button by its XPath and clicks it to display the manga pictures. I will try to implement the Horizontal Follow mode in the future, because downloading manga in that mode is more efficient.
- Most manga pictures are located in a canvas tag whose data-url attribute holds a URL linking to a shuffled image. They are dynamically restored and loaded by JavaScript when the image is close to the user's viewport. Below is an example of a shuffled image from the Attack on Titan manga Volume 1 cover: