KitaitiMakoto/epub-parser

EPUB Parser

INSTALLATION

gem install epub-parser

USAGE

As a library

require 'epub/parser'

book = EPUB::Parser.parse('book.epub')
book.metadata.titles # => Array of EPUB::Publication::Package::Metadata::Title. Main title, subtitle, etc...
book.metadata.title # => Title string including all titles
book.metadata.creators # => Creators(authors)
book.each_page_on_spine do |page|
  page.media_type # => "application/xhtml+xml"
  page.entry_name # => "OPS/nav.xhtml" entry name in EPUB package(zip archive)
  page.read # => raw content document
  page.content_document.nokogiri # => Nokogiri::XML::Document. The same as Nokogiri.XML(page.read)
  # do something more
  #    :
end
book.cover_image # => EPUB::Publication::Package::Manifest::Item which represents cover image file
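
As a small illustration of the spine iteration above, the sketch below collects one summary line per page. The helper name spine_listing is hypothetical and not part of EPUB Parser; it relies only on each_page_on_spine, entry_name and media_type as shown in the usage above.

```ruby
# Hypothetical helper (not part of EPUB Parser): builds one line per spine page.
# `book` is expected to be the object returned by EPUB::Parser.parse.
def spine_listing(book)
  lines = []
  book.each_page_on_spine do |page|
    lines << "#{page.entry_name} (#{page.media_type})"
  end
  lines
end

# Usage, assuming the epub-parser gem is installed and book.epub exists:
#   require 'epub/parser'
#   puts spine_listing(EPUB::Parser.parse('book.epub'))
```

Returning an array instead of printing directly keeps the helper easy to reuse, for example to write the listing to a file.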

See {file:docs/Home.markdown} or the API Documentation for more info.

epubinfo command-line tool

The epubinfo tool extracts and shows the metadata of the specified EPUB book.

% epubinfo ./linear-algebra.epub
Title:              A First Course in Linear Algebra
Identifiers:        code.google.com.epub-samples.linear-algebra
Titles:             A First Course in Linear Algebra
Languages:          en
Contributors:
Coverages:
Creators:           Robert A. Beezer
Dates:
Descriptions:
Formats:
Publishers:
Relations:
Rights:             This work is shared with the public using the GNU Free Documentation License, Version 1.2., © 2004 by Robert A. Beezer.
Sources:
Subjects:
Types:
Modified:           2012-03-05T12:47:00Z
Unique identifier:  code.google.com.epub-samples.linear-algebra
Epub version:       3.0
Navigations:        toc, landmarks

See {file:docs/Epubinfo.markdown} for more info.
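
If you want to consume the plain-text output from a script, a minimal sketch could look like the following. The helper parse_epubinfo is hypothetical, and it assumes the "Label:  value" layout seen in the sample output above, with one field per line.

```ruby
# Hypothetical helper: turns "Label:   value" lines into a Hash.
# Assumes the plain-text layout of the sample epubinfo output.
def parse_epubinfo(text)
  text.each_line.with_object({}) do |line, info|
    # Split on the first colon only, so values such as timestamps stay intact.
    label, value = line.chomp.split(':', 2)
    next if value.nil? # skip lines without a "Label:" prefix

    info[label.strip] = value.strip
  end
end

sample = <<~OUTPUT
  Title:              A First Course in Linear Algebra
  Languages:          en
  Contributors:
  Modified:           2012-03-05T12:47:00Z
  Epub version:       3.0
OUTPUT

info = parse_epubinfo(sample)
puts info['Title']
puts info['Modified']
```

Fields without a value, such as Contributors in the sample, end up as empty strings.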

epub-open command-line tool

The epub-open tool provides an interactive shell (IRB) that helps you explore an EPUB book.

epub-open path/to/book.epub

IRB starts. self becomes the EPUB book, so you can call the book's methods directly.

title
=> "Title of the book"
metadata.creators
=> [Author 1, Author2, ...]
resources.first.properties
=> #<Set: {"nav"}> # This tells you that the first resource of this book is the nav document
nav = resources.first
=> ...
nav.href
=> #<Addressable::URI:0x15ce350 URI:nav.xhtml>
nav.media_type
=> "application/xhtml+xml"
puts nav.read
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
    :
    :
    :
</html>
=> nil
exit # Enter "exit" to end the session

See {file:docs/EpubOpen.markdown} for more info.
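
The session above finds the nav document by checking the first resource. When it is not the first one, a sketch like the following could search for it. The helper find_nav is hypothetical; it uses only resources and properties as they appear in the session.

```ruby
# Hypothetical helper: returns the first resource whose properties
# include "nav", or nil when no resource is marked that way.
def find_nav(resources)
  resources.find { |resource| resource.properties.include?('nav') }
end

# Usage inside an epub-open session, where self is the book:
#   nav = find_nav(resources)
#   nav.href if nav
```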

epub-cover command-line tool

The epub-cover tool extracts the cover image from an EPUB book.

% epub-cover childrens-literature.epub
Cover image output to cover.png

See {file:docs/EpubCover.adoc} for details.
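
The same result can be approximated from the library. The sketch below is not the tool's actual implementation: save_cover is a hypothetical helper, and it assumes that cover_image returns nil when the book declares no cover and that the returned item responds to entry_name and read like the spine pages shown in the library usage.

```ruby
# Hypothetical helper: writes the cover image into dir and returns the
# output path, or nil when the book has no cover image (assumed behavior).
def save_cover(book, dir = '.')
  cover = book.cover_image
  return nil if cover.nil?

  # Name the output file after the entry inside the EPUB, e.g. cover.png.
  path = File.join(dir, File.basename(cover.entry_name))
  File.binwrite(path, cover.read)
  path
end

# Usage, assuming the epub-parser gem is installed:
#   require 'epub/parser'
#   save_cover(EPUB::Parser.parse('childrens-literature.epub'))
```

File.binwrite is used so that image bytes are written without any encoding or newline conversion.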

DOCUMENTATION

Documentation is available on the homepage.

If you installed EPUB Parser with the gem command, you can also generate the documentation yourself (the rubygems-yardoc gem is needed):

$ gem install epub-parser
$ gem yardoc epub-parser
...
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented
YARD documentation is generated to:
/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc

The path to the generated documentation (/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc here) is shown at the end of the output.

Generating it with the yardoc command is also possible:

$ git clone https://gitlab.com/KitaitiMakoto/epub-parser.git
$ cd epub-parser
$ bundle install --path=deps
$ bundle exec rake doc:yard
...
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented

The documentation will then be available in the doc directory.

REQUIREMENTS

  • Ruby 2.6 or later

SIMILAR EFFORTS

  • gepub - a generic EPUB library for Ruby

  • epubinfo - Extracts metadata information from EPUB files. Supports EPUB2 and EPUB3 formats.

  • ReVIEW - ReVIEW is an easy-to-use digital publishing system for books and ebooks.

  • epzip - epzip is EPUB packing tool. It’s just only doing 'zip.' :)

  • eeepub - EeePub is a Ruby ePub generator

  • epub-maker - This library supports making and editing EPUB books based on this EPUB Parser library

  • epub-cfi - EPUB CFI library extracted from this EPUB Parser library.

If you find other gems, please tell me or send a pull request.

RECENT CHANGES

0.5.0

  • Follow Ruby 3.5 inspection change

0.4.9

  • Restructure test

  • Restructure Rake tasks

  • Update required Ruby version to 2.6

0.4.8

  • Add Rubyzip adapter

0.4.7

  • [BUG FIX] Fix a bug where epubinfo doesn’t handle navigation properly

0.4.6

  • [BUG FIX] Prevent the epubinfo tool from raising an exception when there are no nav elements

  • Tiny modification to Zip archive manipulation

  • Remove version specification from Nokogiri to migrate to Ruby 3.1

See {file:CHANGELOG.adoc} for older changelogs and details.

TODOS

  • Consider implementing an IRI feature instead of using Addressable

  • EPUB 3.2

  • Help features for epub-open tool

  • Vocabulary Association Mechanisms

  • Implementing navigation document and so on

  • Media Overlays

  • Content Document

  • Digital Signature

  • Handle encodings other than UTF-8

DONE

  • Simple inspect for epub-open tool

  • Using a zip library instead of the unzip command, which has a security issue

  • Modify methods around fallback to see bindings element in the package

  • Content Document(only for Navigation Documents)

  • Fixed Layout

  • Vocabulary Association Mechanisms(only for itemref)

  • Archive library abstraction

  • Extracting and organizing common behavior from some classes to modules

  • Multiple rootfiles

  • Abstraction of XML parser(making it possible to use REXML, standard bundled XML library of Ruby)

LICENSE

This library is distributed under the terms of the MIT License. See the {file:MIT-LICENSE} file for more info.