A simple WXR parser to parse the XML export from WordPress and export it into different formats.

johncylee/wxr-parser

WXR Parser

A simple WXR parser written in Python. It parses the XML export from WordPress and stores the information in Python's basic data structures, i.e. dictionaries and lists. It also comes with a backend that exports the parsed data in Markdown syntax suitable for Wintersmith. In its current form it simplifies migration from WordPress to Wintersmith, and it is easy to extend to other export formats.

It was created because the author could not find a simple existing parser to use.
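To illustrate the idea of turning a WXR document into dictionaries and lists, here is a minimal sketch using only the standard library. It is not the code in wxr_parser.py: the function name parse_wxr, the chosen dictionary keys, and the sample document are all hypothetical, and the "wp" namespace URI is an assumption because its version suffix differs between exports. It assumes Python 3.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they commonly appear in WXR exports. The version suffix
# of the "wp" namespace varies between exports, so treat it as an assumption.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.2/",
}


def parse_wxr(xml_text):
    """Return every <item> in the export as a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "type": item.findtext("wp:post_type", default="", namespaces=NS),
            "content": item.findtext("content:encoded", default="", namespaces=NS),
            "categories": [c.text for c in item.findall("category")],
        })
    return posts


# A tiny hand-written sample, not a real export.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Hello world</title>
      <link>http://example.com/?p=1</link>
      <category>News</category>
      <wp:post_type>post</wp:post_type>
      <content:encoded><![CDATA[<p>First post.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
"""

if __name__ == "__main__":
    for post in parse_wxr(SAMPLE):
        print(post["type"], post["title"], post["categories"])
```

Keeping the result as plain dictionaries and lists means a backend only needs to iterate over the list and read keys, which is what makes adding further export formats straightforward.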

Usage

Run python wxr_parser.py -h and python wxr_backend.py -h to see usage information. To understand how the tools work, run them on a WXR document and observe the logs.

Note

wxr_parser can optionally download the image files used by WordPress articles. This is a convenience feature and should be used with caution.

wxr_backend uses Pandoc to translate HTML into Markdown, so Pandoc must be installed first.
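As a rough illustration of the Pandoc dependency, the sketch below pipes an HTML fragment through the pandoc command line tool and returns the Markdown it prints. This is not how wxr_backend.py is necessarily written: the helper names pandoc_command and html_to_markdown are hypothetical, and the sketch assumes Python 3.7 or later with pandoc available on PATH.

```python
import shutil
import subprocess


def pandoc_command(source_format="html", target_format="markdown"):
    """Build the pandoc argument list (hypothetical helper)."""
    return ["pandoc", "--from", source_format, "--to", target_format]


def html_to_markdown(html):
    """Convert an HTML string to Markdown by piping it through pandoc."""
    if shutil.which("pandoc") is None:
        # Fail early with a clear message instead of a FileNotFoundError.
        raise RuntimeError("pandoc is not installed or not on PATH")
    result = subprocess.run(
        pandoc_command(),
        input=html,            # pandoc reads stdin when no input file is given
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    print(html_to_markdown("<h1>Title</h1><p>Some <em>text</em>.</p>"))
```

Checking for the executable up front gives a clearer error than letting the subprocess call fail when Pandoc is missing.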

ToDo

  1. Unsupported WordPress tags:

    1. [gallery]
    2. [slideshare]
    3. [youtube]
    4. more...
  2. Disqus support
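Until these tags are supported, it can help to know which posts contain them. The following is a small sketch, assuming Python 3, that reports the unsupported tags found in a post's content. The function name find_unsupported_tags and the regular expression are hypothetical and are not part of wxr_parser; the tag list covers only the three named above.

```python
import re

# Only the three tags named in the list above; extend as needed.
UNSUPPORTED_TAGS = ("gallery", "slideshare", "youtube")

# Matches an opening tag such as [gallery ids="1,2"] and captures its name.
TAG_PATTERN = re.compile(r"\[(" + "|".join(UNSUPPORTED_TAGS) + r")\b[^\]]*\]")


def find_unsupported_tags(content):
    """Return the names of unsupported tags in order of appearance."""
    return TAG_PATTERN.findall(content)


if __name__ == "__main__":
    text = 'Intro [gallery ids="1,2"] and [youtube https://example.com/v]'
    print(find_unsupported_tags(text))
```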
