com.auxilii.msgparser
Class MsgParser

Object
  extended by com.auxilii.msgparser.MsgParser

public class MsgParser
extends Object

Main parser class that does the actual parsing of the Outlook .msg file. It uses the POI library for parsing the .msg container file and is based on a description posted by Peter Fiskerstrand at fileformat.info.

It parses the .msg file and stores the information in a Message object. Each attachment is stored in a FileAttachment object, so keep in mind that the complete mail is held in memory. If an attachment is itself a .msg file, it is not processed as a normal attachment but included as a MsgAttachment. The attached mail is again a Message object and may have further attachments, and so on.
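
Because attached mails can nest and everything stays in memory, code that consumes a parsed mail usually has to recurse. The following is a minimal, self-contained sketch of that idea. Mail and FileAtt are hypothetical stand-ins invented for illustration; they are not the library's Message, FileAttachment or MsgAttachment classes, whose accessors are not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedMailSketch {

    /** Hypothetical stand-in for a parsed mail; NOT the library's Message class. */
    static class Mail {
        final String subject;
        // Each element is either a FileAtt or a nested Mail (an attached .msg).
        final List<Object> attachments = new ArrayList<>();

        Mail(String subject) {
            this.subject = subject;
        }
    }

    /** Hypothetical stand-in for a file attachment whose bytes are held in memory. */
    static class FileAtt {
        final String name;
        final byte[] data;

        FileAtt(String name, byte[] data) {
            this.name = name;
            this.data = data;
        }
    }

    /** Sums the in-memory size of all file attachments, descending into attached mails. */
    static long totalAttachmentBytes(Mail mail) {
        long total = 0;
        for (Object att : mail.attachments) {
            if (att instanceof FileAtt) {
                total += ((FileAtt) att).data.length;
            } else if (att instanceof Mail) {
                total += totalAttachmentBytes((Mail) att); // attached mail: recurse
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Mail inner = new Mail("forwarded");
        inner.attachments.add(new FileAtt("scan.pdf", new byte[300]));

        Mail outer = new Mail("FW: forwarded");
        outer.attachments.add(new FileAtt("note.txt", new byte[20]));
        outer.attachments.add(inner);

        System.out.println(totalAttachmentBytes(outer));
    }
}
```

A walk like this is one way to estimate how much memory a deeply nested mail occupies before processing it further.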

The parser can also produce an HTML body when a message contains only an RTF body. It does this with a conversion class implementing RTF2HTMLConverter, which can be replaced with a custom implementation (see the usage example below).

Note: this code has not been tested on a wide range of .msg files. Use it at your own risk, in production or anywhere else.

Usage:

MsgParser msgp = new MsgParser();
msgp.setRtf2htmlConverter(new SimpleRTF2HTMLConverter()); // optional: only needed if you want to use your own implementation
Message msg = msgp.parseMsg("test.msg");

Author:
roman.kurmanowytsch

Field Summary
protected static Logger logger
           
protected static String propertyStreamPrefix
           
protected static String propsKey
           
protected  RTF2HTMLConverter rtf2htmlConverter
           
 
Constructor Summary
MsgParser()
          Empty constructor.
 
Method Summary
protected  FieldInformation analyzeDocumentEntry(org.apache.poi.poifs.filesystem.DocumentEntry de)
          Analyzes the DocumentEntry and returns a FieldInformation object containing the class (the field name, so to speak) and type of the entry.
protected  void checkDirectoryDocumentEntry(org.apache.poi.poifs.filesystem.DocumentEntry de, Message msg)
          Parses a directory document entry which can either be a simple entry or a stream that has to be split up into multiple document entries again.
protected  void checkDirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry dir, Message msg)
          Recursively parses the complete .msg file with the help of the POI library.
protected  void checkRecipientDirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry dir, Message msg)
          Parses a recipient directory entry which holds information about one of possibly multiple recipients.
protected  void checkRecipientDocumentEntry(org.apache.poi.poifs.filesystem.DocumentEntry de, RecipientEntry recipient)
          Parses a recipient document entry which can either be a simple entry or a stream that has to be split up into multiple document entries again.
protected  Object getData(org.apache.poi.poifs.filesystem.DocumentEntry de, FieldInformation info)
          Reads the data from the entry's InputStream and, based on the information in the FieldInformation object, returns it as either a String or a byte[] (e.g., for attachments).
protected  void parseAttachment(org.apache.poi.poifs.filesystem.DirectoryEntry dir, Message msg)
          Creates an Attachment object based on the given directory entry.
 Message parseMsg(File msgFile)
          Parses the .msg file given as a File.
 Message parseMsg(InputStream msgFileStream)
          Parses a .msg file provided by an input stream.
 Message parseMsg(InputStream msgFileStream, boolean closeStream)
          Parses a .msg file provided by an input stream.
 Message parseMsg(String msgFile)
          Parses the .msg file at the given path.
 void setRtf2htmlConverter(RTF2HTMLConverter rtf2htmlConverter)
          Setter for overriding the default RTF2HTMLConverter implementation which is used to get HTML code from an RTF body.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected static final Logger logger

propsKey

protected static final String propsKey
See Also:
Constant Field Values

propertyStreamPrefix

protected static final String propertyStreamPrefix
See Also:
Constant Field Values

rtf2htmlConverter

protected RTF2HTMLConverter rtf2htmlConverter

Constructor Detail

MsgParser

public MsgParser()
Empty constructor.

Method Detail

parseMsg

public Message parseMsg(File msgFile)
                 throws IOException,
                        UnsupportedOperationException
Parses the .msg file given as a File.

Parameters:
msgFile - The .msg file.
Returns:
A Message object representing the .msg file.
Throws:
IOException - Thrown if the file could not be loaded or parsed.
UnsupportedOperationException - Thrown if the .msg file cannot be parsed correctly.

parseMsg

public Message parseMsg(String msgFile)
                 throws IOException,
                        UnsupportedOperationException
Parses the .msg file at the given path.

Parameters:
msgFile - The .msg file as a String path.
Returns:
A Message object representing the .msg file.
Throws:
IOException - Thrown if the file could not be loaded or parsed.
UnsupportedOperationException - Thrown if the .msg file cannot be parsed correctly.

parseMsg

public Message parseMsg(InputStream msgFileStream)
                 throws IOException,
                        UnsupportedOperationException
Parses a .msg file provided by an input stream.

Parameters:
msgFileStream - The .msg file as an InputStream.
Returns:
A Message object representing the .msg file.
Throws:
IOException - Thrown if the file could not be loaded or parsed.
UnsupportedOperationException - Thrown if the .msg file cannot be parsed correctly.

parseMsg

public Message parseMsg(InputStream msgFileStream,
                        boolean closeStream)
                 throws IOException,
                        UnsupportedOperationException
Parses a .msg file provided by an input stream.

Parameters:
msgFileStream - The .msg file as an InputStream.
closeStream - Indicates whether the provided stream should be closed after the message has been read.
Returns:
A Message object representing the .msg file.
Throws:
IOException - Thrown if the file could not be loaded or parsed.
UnsupportedOperationException - Thrown if the .msg file cannot be parsed correctly.
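
The closeStream flag decides who owns the stream after the call. The self-contained sketch below shows that contract with a hypothetical stand-in method, readAll, which simply drains the stream; it is not the library's parsing code. Closing in a finally block (so the stream is also closed when reading fails) is an assumption, since this page does not say what happens on failure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class CloseStreamSketch {

    /** In-memory stream that records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical stand-in for a parse call: drains the stream, then closes it only if asked to. */
    static byte[] readAll(InputStream in, boolean closeStream) throws IOException {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            if (closeStream) {
                in.close(); // assumption: close even if reading failed
            }
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream keepOpen = new TrackingStream(new byte[] {1, 2, 3});
        readAll(keepOpen, false);
        System.out.println("closeStream=false, closed: " + keepOpen.closed);

        TrackingStream autoClose = new TrackingStream(new byte[] {1, 2, 3});
        readAll(autoClose, true);
        System.out.println("closeStream=true, closed: " + autoClose.closed);
    }
}
```

Passing false is useful when the caller wraps the stream in its own try-with-resources block or wants to keep reading from it afterwards.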

checkDirectoryEntry

protected void checkDirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry dir,
                                   Message msg)
                            throws IOException,
                                   UnsupportedOperationException
Recursively parses the complete .msg file with the help of the POI library. The parsed information is put into the Message object.

Parameters:
dir - The current node in the .msg file.
msg - The resulting Message object.
Throws:
IOException - Thrown if the .msg file could not be parsed.
UnsupportedOperationException - Thrown if the .msg file contains unknown data.
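
As a rough illustration of such a recursive walk, the self-contained sketch below dispatches on entry names in a tiny in-memory tree. Node is a hypothetical stand-in, not a POI class, and the name prefixes come from the general layout of .msg containers; whether MsgParser dispatches on exactly these strings is an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class DirectoryWalkSketch {

    /** Hypothetical stand-in for a container entry; NOT a POI class. */
    static class Node {
        final String name;
        final List<Node> children; // null marks a document (leaf) entry

        Node(String name, List<Node> children) {
            this.name = name;
            this.children = children;
        }
    }

    static Node dir(String name, Node... kids) {
        return new Node(name, Arrays.asList(kids));
    }

    static Node doc(String name) {
        return new Node(name, null);
    }

    /** Returns {recipients, attachments, fields} found below the given directory. */
    static int[] walk(Node dir) {
        int[] counts = new int[3];
        walk(dir, counts);
        return counts;
    }

    private static void walk(Node dir, int[] counts) {
        for (Node entry : dir.children) {
            if (entry.children == null) {
                counts[2]++; // document entry: a field of the current message
            } else if (entry.name.startsWith("__recip_version1.0_")) {
                counts[0]++; // a real parser would hand this to a recipient handler
            } else if (entry.name.startsWith("__attach_version1.0_")) {
                counts[1]++; // a real parser would hand this to an attachment handler
            } else {
                walk(entry, counts); // any other directory: keep descending
            }
        }
    }

    public static void main(String[] args) {
        Node root = dir("Root Entry",
                doc("__substg1.0_0037001E"),
                dir("__recip_version1.0_#00000000", doc("__substg1.0_3001001E")),
                dir("__attach_version1.0_#00000000", doc("__substg1.0_37010102")));
        System.out.println(Arrays.toString(walk(root)));
    }
}
```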

checkRecipientDirectoryEntry

protected void checkRecipientDirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry dir,
                                            Message msg)
                                     throws IOException
Parses a recipient directory entry which holds information about one of possibly multiple recipients. The parsed information is put into the Message object.

Parameters:
dir - The current node in the .msg file.
msg - The resulting Message object.
Throws:
IOException - Thrown if the .msg file could not be parsed.

checkDirectoryDocumentEntry

protected void checkDirectoryDocumentEntry(org.apache.poi.poifs.filesystem.DocumentEntry de,
                                           Message msg)
                                    throws IOException
Parses a directory document entry which can either be a simple entry or a stream that has to be split up into multiple document entries again. The parsed information is put into the Message object.

Parameters:
de - The current node in the .msg file.
msg - The resulting Message object.
Throws:
IOException - Thrown if the .msg file could not be parsed.

checkRecipientDocumentEntry

protected void checkRecipientDocumentEntry(org.apache.poi.poifs.filesystem.DocumentEntry de,
                                           RecipientEntry recipient)
                                    throws IOException
Parses a recipient document entry which can either be a simple entry or a stream that has to be split up into multiple document entries again. The parsed information is put into the RecipientEntry object.

Parameters:
de - The current node in the .msg file.
recipient - The resulting RecipientEntry object.
Throws:
IOException - Thrown if the .msg file could not be parsed.

getData

protected Object getData(org.apache.poi.poifs.filesystem.DocumentEntry de,
                         FieldInformation info)
                  throws IOException
Reads the data from the entry's InputStream and, based on the information in the FieldInformation object, returns it as either a String or a byte[] (e.g., for attachments).

Parameters:
de - The Document Entry.
info - The field information that is needed to determine the data type of the input stream.
Returns:
The String/byte[] object representing the data.
Throws:
IOException - Thrown if the .msg file could not be parsed.
UnsupportedOperationException - Thrown if the .msg file contains unknown data.
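
The type-driven choice between String and byte[] can be sketched as follows. This is a self-contained illustration, not the library's code: decode is a hypothetical helper, the type codes are the usual MAPI property types (001E 8-bit string, 001F UTF-16LE string, 0102 binary), and decoding 8-bit strings as ISO-8859-1 is an assumption because the real code page depends on the message.

```java
import java.nio.charset.StandardCharsets;

public class FieldDataSketch {

    /**
     * Hypothetical helper: turns the raw bytes of an entry into a String or a byte[],
     * depending on the 4-digit hex property type.
     */
    static Object decode(byte[] raw, String type) {
        switch (type) {
            case "001E":
                // 8-bit string; ISO-8859-1 is an assumption, the actual code page may differ
                return new String(raw, StandardCharsets.ISO_8859_1);
            case "001F":
                // Unicode string, stored as UTF-16 little endian
                return new String(raw, StandardCharsets.UTF_16LE);
            case "0102":
                // binary data such as attachment content: return the bytes unchanged
                return raw;
            default:
                throw new UnsupportedOperationException("Unknown field type: " + type);
        }
    }

    public static void main(String[] args) {
        byte[] subject = "Hello".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(decode(subject, "001F"));

        Object blob = decode(new byte[] {1, 2, 3}, "0102");
        System.out.println(blob instanceof byte[]);
    }
}
```

Callers of a method with an Object return type like this need an instanceof check before using the result.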

analyzeDocumentEntry

protected FieldInformation analyzeDocumentEntry(org.apache.poi.poifs.filesystem.DocumentEntry de)
Analyzes the DocumentEntry and returns a FieldInformation object containing the class (the field name, so to speak) and type of the entry.

Parameters:
de - The DocumentEntry that should be examined.
Returns:
A FieldInformation object containing class and type of the document entry or, if the entry is not an interesting field, an empty FieldInformation object containing FieldInformation.UNKNOWN class and type.
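
In .msg containers, property streams are conventionally named with a fixed prefix followed by four hex digits for the property and four for its type. The self-contained sketch below splits such a name; it is not the library's implementation. The prefix value and the analyze helper are assumptions (the value of propertyStreamPrefix is not shown on this page), and a plain string array stands in for FieldInformation.

```java
import java.util.Locale;

public class EntryNameSketch {

    // Assumed prefix of property streams; the real value of
    // MsgParser.propertyStreamPrefix is not shown on this page.
    static final String PREFIX = "__substg1.0_";
    static final String UNKNOWN = "UNKNOWN";

    /** Returns {class, type} as 4-digit hex strings, or {UNKNOWN, UNKNOWN} if the name does not match. */
    static String[] analyze(String entryName) {
        String[] unknown = {UNKNOWN, UNKNOWN};
        if (entryName == null || !entryName.startsWith(PREFIX)) {
            return unknown;
        }
        String rest = entryName.substring(PREFIX.length());
        if (rest.length() < 8) {
            return unknown;
        }
        String clazz = rest.substring(0, 4).toUpperCase(Locale.ROOT);
        String type = rest.substring(4, 8).toUpperCase(Locale.ROOT);
        if (!clazz.matches("[0-9A-F]{4}") || !type.matches("[0-9A-F]{4}")) {
            return unknown;
        }
        return new String[] {clazz, type};
    }

    public static void main(String[] args) {
        String[] info = analyze("__substg1.0_0037001E");
        System.out.println(info[0] + " " + info[1]);

        String[] other = analyze("__properties_version1.0");
        System.out.println(other[0] + " " + other[1]);
    }
}
```

Returning an explicit unknown value instead of null mirrors the behaviour described above and spares callers a null check.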

parseAttachment

protected void parseAttachment(org.apache.poi.poifs.filesystem.DirectoryEntry dir,
                               Message msg)
                        throws IOException
Creates an Attachment object based on the given directory entry. The entry may either point to an attached file or to an attached .msg file, which will be added as a MsgAttachment object instead.

Parameters:
dir - The directory entry containing the attachment document entry and some other document entries describing the attachment (name, extension, MIME type, ...).
msg - The Message object that this attachment should be added to.
Throws:
IOException - Thrown if the attachment could not be parsed/read.

setRtf2htmlConverter

public void setRtf2htmlConverter(RTF2HTMLConverter rtf2htmlConverter)
Setter for overriding the default RTF2HTMLConverter implementation which is used to get HTML code from an RTF body.

Parameters:
rtf2htmlConverter - The converter instance to be used.


Copyright © 2007 Roman Kurmanowytsch