Skip to content

A Java parser for Outlook messages (.msg files)

Notifications You must be signed in to change notification settings

bbottema/outlook-message-parser

Repository files navigation

APACHE v2 License Latest Release Javadocs Codacy

Outlook Message Parser

Outlook Message Parser is a small open source Java library that parses Outlook .msg files.

<dependency>
  <groupId>org.simplejavamail</groupId>
  <artifactId>outlook-message-parser</artifactId>
  <version>1.14.1</version>
</dependency>

Outlook Message Parser is a continuation (or fork if that project independently continues) of msgparser.

Under the hood it uses the Apache POI - POIFS library to parse the message files which use the OLE 2 Compound Document format. Thus, it is merely a convenience library that covers the details of the .msg file. The implementation is based on the information provided at fileformat.info.

v1.14.0 - v1.14.1

  • 1.14.1 (08-06-2024): #64: [Bug] Parsing lists to HTML has double bullet points
  • 1.14.0 (25-05-2024): #80: RTF converted to HTML doesn't always detect charset properly

v1.13.0 - v1.13.4

  • 1.13.4 (04-May-2024): bumped apache poi to 5.2.5 and managed commons-io to 2.16.1
  • 1.13.3 (04-May-2024): bumped angus-activation from 2.0.2 to 2.0.3
  • 1.13.2 (05-April-2024): #73 B: Don't overwrite existing address, but do retain X500 address if available
  • 1.13.1 (04-April-2024): #73 A: Further improve X500 addresses detection
  • 1.13.0 (18-January-2024): #71: Update to latest Jakarta+Angus dependencies

v1.12.0 (10-December-2023)

  • #70: [Enhancement] ignore recipients with null-address

v1.11.0 - v1.11.1

  • 1.11.1 (08-December-2023): #69: Enhancement: instead of ignoring them completely, only ignore for embedded images
  • 1.11.0 (08-December-2023): #69: Enhancement: ignore attachment with missing content

v1.10.0 - v1.10.2

  • 1.10.2 (03-December-2023): #68 Improved heuristics for X500 Names
  • 1.10.1 (24-October-2023): #67 Fixed "possibility to parse X500 Names"
  • 1.10.0 (24-October-2023): #67 Adding possibility to parse X500 Names (dont' use this version)

v1.9.0 - v1.9.6

  • v1.9.6 (18-July-2022): #57 Same, but now with Collection values to support duplicate headers
  • v1.9.5 (18-July-2022): #57 Headers should be more accessible, rather than just a big string of text
  • v1.9.x - a bunch of dependency fixes and tries apparently, my release train was not so smooth here, sorry
  • v1.9.0 (13-May-2021): #55 CVE issue: Update Apache POI and POI Scratchpad

v1.8.0 - v1.8.1

  • v1.8.1 (31-January-2022): #41 OutlookMessage.getPropertyValue() should be public
  • v1.8.0 (31-January-2022): #52 Adjust dependencies and make Java 9+ friendly
  • v1.8.0 (31-January-2022): #45 Bump commons-io from 2.6 to 2.7

v1.7.10 - v1.7.13 (17-November-2021)

  • #49 bugfix solved by improved charset handling
  • #46 bugfix Rare NPE case of producing empty nested outlook attachment when there should be no attachments
  • #43 bugfix bugfix getFromEmailFromHeaders cannot handle "quoted-name-with@at-sign"
  • some minor code improvements

v1.7.9 (10-October-2020)

  • #28 / #36 bugfix NumberFormatException on parsing .msg files

v1.7.8 (4-August-2020)

  • #35 Clarify permission to publish project using Apache v2 license

v1.7.0 - v1.7.7 (9-January-2020 - 17-July-2020)

  • v1.7.7 - #34 Wrong encoding for bodyHTML
  • v1.7.5 - #31 Bugfix for attachments with special characters in the name
  • v1.7.4 - #27 Same as 1.7.3, but now also for chinese senders
  • v1.7.3 - #27 When from name/address are not available (unsent emails), these fields are filled with binary garbage
  • v1.7.2 - #26 To email address is not handled properly when name is omitted
  • v1.7.1 - #25 NPE on ClientSubmitTime when original message has not been sent yet
  • v1.7.1 - #23 Bug: _nameid directory should not be parsed (and causing invalid HTML body)
  • v1.7.0 - #18 Upgrade Apache POI 3.9 -> 4.x

Note: Apache POI requires minimum Java 8

v1.6.0 (8-January-2020)

  • #21 Multiple TO recipients are not handles properly

v1.5.0 (18-December-2019)

  • #20 CC and BCC recipients are not parsed properly
  • #19 Use real Outlook ContentId Attribute to resolve CID Attachments

v1.4.1 (22-October-2019)

  • #17 Fixed encoding error for UTF-8's Windows legacy name (cp)65001

v1.4.0 (13-October-2019)

  • #9 Replaced the RFC to HTML converter with a brand new RFC-compliant convert! (thanks to @fadeyev!)

v1.3.0 (4-October-2019)

  • #14 Dependency problem with Java9+, missing Jakarta Activation Framework
  • #13 HTML start tags with extra space not handled correctly
  • #11 SimpleRTF2HTMLConverter inserts too many
    tags
  • #10 Embedded images with DOS-like names are classified as attachments
  • #9 SimpleRTF2HTMLConverter removes some valid tags during conversion

v1.2.1 (12-May-2019)

  • Ignore non S/MIME related content types when extracting S/MIME metadata
  • Added toString and equals methods to the S/MIME data classes

v1.1.21 (4-May-2019)

  • Upgraded mediatype recognition based on file extension for incomplete attachments
  • Added / improved support for public S/MIME meta data

v1.1.20 (14-April-2019)

  • #7 Fix missing S/MIME header details that are needed to determine the type of S/MIME application

v1.1.19 (10-April-2019)

  • Log rtf compression error, but otherwise ignore it and keep going and extract what we can.

v1.1.18 (5-April-2019)

  • #6 Missing mimeTag for attachments should be guessed based on file extension

v1.1.17 (19-August-2018)

  • #3 implemented robust support for character sets / code pages in RTF to HTML conversion (fixes chinese support #3)
  • fixed bug where too much text was cleaned up as part of superfluous RTF cleanup step when converting to HTML
  • Performance boost in the RTF -> HTML converter

v1.1.16 (~28-Februari-2017)

v1.16

  • Added support for replyTo name and address
  • cleaned up code (1st wave)