mbox_tools/mbox_parser at master · searchisko/mbox_tools

src
README.md
pom.xml

README.md

The main goal of this code is to parse an mbox file and render a JSON representation of it. The JSON output is assumed to be indexed afterwards (into Lucene, Elasticsearch, Searchisko, etc.), so some content is preprocessed to remove garbage.

It relies on Apache James Mime4J to parse the mbox file.
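
To make the mbox-to-JSON idea concrete, here is a minimal sketch that uses only the Java standard library. It is not this project's code and it does not use Mime4J: the class name `MboxToJsonSketch`, its methods, and the choice of output fields (`subject`, `from`, `body`) are assumptions made for illustration. It also assumes that body lines starting with "From " are escaped (for example as ">From "), that messages are plain text, and it ignores folded header lines, MIME parts, and encoded words, all of which a real MIME parser would handle.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; the real project delegates parsing to Mime4J.
class MboxToJsonSketch {

    /** Splits raw mbox text into messages at separator lines starting with "From ". */
    public static List<String> splitMessages(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        for (String line : mbox.split("\r?\n", -1)) {
            if (line.startsWith("From ")) {
                // A separator line closes the previous message and opens a new one.
                if (current != null) messages.add(current.toString());
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) messages.add(current.toString());
        return messages;
    }

    /** Renders one message as a flat JSON object (assumed fields: subject, from, body). */
    public static String toJson(String message) {
        // Headers and body are separated by the first empty line.
        int split = message.indexOf("\n\n");
        String headerBlock = split < 0 ? message : message.substring(0, split);
        String body = split < 0 ? "" : message.substring(split + 2).trim();

        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : headerBlock.split("\n")) {
            // Folded header continuations are skipped in this sketch.
            if (line.startsWith(" ") || line.startsWith("\t")) continue;
            int colon = line.indexOf(':');
            if (colon > 0) {
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, line.substring(colon + 1).trim());
            }
        }
        return "{\"subject\":" + quote(headers.getOrDefault("subject", ""))
                + ",\"from\":" + quote(headers.getOrDefault("from", ""))
                + ",\"body\":" + quote(body) + "}";
    }

    /** Escapes a string as a JSON string literal. */
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String mbox = "From a@example.org Mon Jan  1 00:00:00 2024\n"
                + "Subject: Hello\n"
                + "From: Alice <a@example.org>\n"
                + "\n"
                + "First body\n";
        for (String message : splitMessages(mbox)) {
            System.out.println(toJson(message));
        }
    }
}
```

In practice a MIME-aware parser such as Mime4J is the better choice, since real mailing-list archives contain multipart messages, attachments, and non-ASCII headers that this sketch does not handle.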

Possibly similar projects

There are some other libraries that you can use to parse the mbox format.

Apache Tika (I did not find its mbox parser flexible enough for our needs.)
ScaleUnlimited/text-similarity, accompanied by articles: part #1, part #2. It also uses Tika to parse mbox files, but it discusses interesting related topics.
