A Node.js downloader for books borrowed from Archive.org
To download borrowed books from Archive.org, first install:
apt-get install npm
npm install sleep
npm install request
To convert and OCR the downloaded images into a pdf with make_pdf.sh you will also need:
apt-get install imagemagick tesseract-ocr poppler-utils
- Install EditThisCookie for Chrome, or use something else for cookie extraction
- Log in to archive.org
- Borrow a book
- Copy your cookies:
  - In the EditThisCookie options, first set the preferred export format to 'Semicolon separated name=value pairs'
  - Click export and paste just the cookies (without comments) into `cookies = '';` in `node_dl.js`
  - If you retrieve your cookies some other way, just put them into the `cookies` variable in `node_dl.js`
- Set the other variables, such as `ua` (user agent), `pages` (how many pages the book has), and `local_name` (where to download the files and how to name them)
- You might want to create a directory for the files, e.g. `books/my_book`. In that case `local_name` should be `books/my_book/book_name`
- Run `node node_dl.js`
- Run `make_pdf.sh books/my_book output_name`
  - This converts all jp2 files in the folder `books/my_book` into JPGs, OCRs those JPG files into separate PDFs, and finally joins all PDFs into `output_name.pdf`