About:

Python Script To Copy WuxiaWorld Chapters Into EPUB File.

Copies The Novel Chapters Along With The Novel Details. Occasionally (About Once Every 6-10 Runs) The Cover Image Is Not Copied. The Cause Is Unknown, Possibly An Internal Problem In BeautifulSoup4.

How Does The Script Work? Just Enter The Novel URL Inside The Script And The Rest Follows.

I'll Try To Add Any Necessary Updates.

Initial Implementation By : Aundinn


Note :

Check Out This Other Novel Website: https://wxuiaworld.co. Why This Website? It Has Novels From Webnovel (Qidian) And WuxiaWorld With All The Latest Chapters Unlocked. No Spirit Stones, Patreon Or Subscription Are Required To Read The Latest Chapters. Don't Take My Word For It, Check It Out.


Task(s) :

  • Get The List Of Chapters From The Novel's Index Page And Follow Those Links, Rather Than Stepping Through Chapter Numbers Sequentially, Because Some Pages Do Not Have Sequential Names.
  • Implement Multiprocessing To Speed Up The Process.
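
A Possible Starting Point For Both Tasks, As A Sketch Only: chapter_links Collects The Chapter URLs From An Index Page, And fetch_all Downloads Them Concurrently. The Function Names, The "li.chapter-item a" Selector And The Example URLs Are Assumptions, So Adjust The Selector To The Real Page Markup. Threads Are Used Here Instead Of Multiprocessing Because Downloading Is Mostly Waiting On The Network.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def chapter_links(index_html, base_url, selector="li.chapter-item a", parser="html5lib"):
    """Return absolute chapter URLs in page order, without duplicates."""
    soup = BeautifulSoup(index_html, parser)
    links = []
    for anchor in soup.select(selector):
        href = anchor.get("href")
        if not href:
            continue  # skip anchors without a target
        url = urljoin(base_url, href)
        if url not in links:
            links.append(url)
    return links


def fetch_all(urls, fetch, workers=4):
    """Call fetch(url) for every URL concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

With requests Installed, The Two Can Be Combined As pages = fetch_all(links, lambda url: requests.get(url, timeout=30).text). Keep workers Small To Avoid Hammering The Website.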

Problem(s) :

  • None Yet (Report If Any).

Screenshot :

Image Not Available

Documentation :

  1. For Beginners: After Setting Up A Working Python 3 Environment (Along With The Latest pip), You Need To Install Some Packages. To Install Them, Run These Commands In Your CMD/Terminal :

    • pip3 install bs4
    • pip3 install ebooklib
    • pip3 install requests
    • pip3 install html5lib=="0.9999999"
  2. Download The Python Script And Unzip It.

  3. Open The Script With A Text Editor And Read The Details Inside.

  4. In Case The Script Has Not Been Updated To Match Changes In The Website, You Can Refer To The BeautifulSoup Docs And Adjust It Accordingly.

  5. To Run, Open CMD/Terminal, Navigate To The Unzip Location And Type :

    • Linux - python3 code.py
    • Windows - python code.py or py code.py
  6. The EPUB File Will Be Saved In The Same Folder As The Script.
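
Before Running The Script (Step 5), It Can Help To Confirm That The Packages From Step 1 Are Importable. Below Is A Small Sketch Using Only The Standard Library. The Function Name missing_modules Is Made Up For This Example And Is Not Part Of The Script.

```python
import importlib.util

# Import names of the packages installed in step 1.
REQUIRED_MODULES = ("bs4", "ebooklib", "requests", "html5lib")


def missing_modules(names=REQUIRED_MODULES):
    """Return the modules from names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

This Only Checks That Each Package Can Be Found, Not Which Version Is Installed.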

Working :

Parsing :

html5lib Is Used Because, Although It Is A Tiny Bit Slow, It Generates Valid HTML. You May Compare It With The Other Parsers In The 'Differences Between Parsers' Section Of The BeautifulSoup Docs. I've Copied The Table From The BS4 Website Below To Give A Brief Overview.

Parser: Python’s html.parser
Typical usage: BeautifulSoup(markup, "html.parser")
Advantages:
  • Batteries included
  • Decent speed
  • Lenient (as of Python 2.7.3 and 3.2.2)
Disadvantages:
  • Not very lenient (before Python 2.7.3 or 3.2.2)

Parser: lxml’s HTML parser
Typical usage: BeautifulSoup(markup, "lxml")
Advantages:
  • Very fast
  • Lenient
Disadvantages:
  • External C dependency

Parser: lxml’s XML parser
Typical usage: BeautifulSoup(markup, "lxml-xml") or BeautifulSoup(markup, "xml")
Advantages:
  • Very fast
  • The only currently supported XML parser
Disadvantages:
  • External C dependency

Parser: html5lib
Typical usage: BeautifulSoup(markup, "html5lib")
Advantages:
  • Extremely lenient
  • Parses pages the same way a web browser does
  • Creates valid HTML5
Disadvantages:
  • Very slow
  • External Python dependency
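
To See The Difference Yourself, The Sketch Below Feeds The Same Broken Markup To Each Parser And Prints What Comes Back. html.parser Returns Only The Fragment, While html5lib Builds A Full Document With html, head And body Tags Around It. The Helper parse_with Is A Made-Up Name For This Example; Parsers That Are Not Installed Are Reported As None.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_with(markup, parser):
    """Return the markup as re-serialised by the given parser, or None if it is not installed."""
    try:
        return str(BeautifulSoup(markup, parser))
    except FeatureNotFound:
        # Raised by BeautifulSoup when the requested parser is unavailable.
        return None


if __name__ == "__main__":
    broken = "<a></p>"  # an unclosed <a> followed by a stray </p>
    for parser in ("html.parser", "lxml", "html5lib"):
        print(f"{parser:12} -> {parse_with(broken, parser)}")
```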

If Any Problem Occurs With html5lib :

  • In Case You Update It Accidentally, Reinstall The Pinned Version With The html5lib Command From Step 1 Of The Documentation.
  • Alternatively, Change html5lib To lxml If It Is Installed, Otherwise To Python's Built-In html.parser.
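
The Fallback Order Above Can Also Be Chosen At Runtime. A Minimal Sketch, Assuming A Helper Called pick_parser (Not Part Of The Script) That Returns The First Parser Whose Package Is Installed:

```python
import importlib.util


def pick_parser(preferred=("html5lib", "lxml")):
    """Return the first installed parser from preferred, else the built-in html.parser."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    # html.parser ships with Python, so it is always available.
    return "html.parser"
```

It Would Be Used As BeautifulSoup(markup, pick_parser()). Note That This Only Checks Whether The Package Is Installed, Not Whether The Installed html5lib Version Works With Your BeautifulSoup Version.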

License

Copyright © 2018 Kogam22. Released under the terms of the Apache 2.0 license.