Failing to parse this multi-part mime message body in Java

Solution 1

There are a few things wrong with the text you posted.

It is not a valid multipart MIME message. See the Wikipedia article on MIME, which, while non-normative, is still correct.

The MIME boundary is not defined. The Wikipedia example, Content-Type: multipart/mixed; boundary="frontier", declares that the boundary is "frontier". In your example the boundary is "----=_NextPart_000_005D_01CC73D5.3BA43FB0", but that can only be determined by scanning the text (i.e. the MIME is malformed). You need to tell the goofball who is passing you the MIME content that you also need the boundary value, which is normally declared in a message header that you are not being given. If you get the entire message you will have enough, because it starts with MIME-Version: 1.0 followed by Content-Type: multipart/mixed; boundary="frontier", where frontier is replaced with the boundary actually used for the encoded MIME.
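If the boundary value is known, one way to make the text parseable is to put the two missing header lines back in front of it yourself. This is only a minimal sketch, assuming the boundary has been supplied separately; the class and method names (MimeHeaderPrepender, withHeaders) are made up for illustration.

```java
final class MimeHeaderPrepender {
    /** Returns the body prefixed with the header lines that declare the boundary. */
    static String withHeaders(String boundary, String body) {
        return "MIME-Version: 1.0\r\n"
             + "Content-Type: multipart/mixed; boundary=\"" + boundary + "\"\r\n"
             + "\r\n" // a blank line separates the headers from the body
             + body;
    }

    public static void main(String[] args) {
        // Tiny hand-written body, not the message from the question.
        String body = "--frontier\r\nContent-Type: text/plain\r\n\r\nhello\r\n--frontier--\r\n";
        System.out.print(withHeaders("frontier", body));
    }
}
```

The result can then be handed to whichever MIME parser you are using, which should now be able to find the boundary in the Content-Type header.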

If the person who is sending the body is a goofball (changed from monkey because monkey is too judgemental - my bad DwB), and will not (more likely does not know how to) send the full body, you can derive the boundary by scanning the text for a line that starts and ends with "--" (i.e. --boundary--). Note that I mentioned a "line". The terminal boundary is actually "--boundary--\n".
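The scan described above could look roughly like this. It is a sketch, not a robust parser: it assumes the closing delimiter sits on a line of its own, and the names BoundaryScanner and findBoundary are hypothetical. The extra check that the candidate also opens at least one part is my own addition, to skip stray lines that merely start and end with "--".

```java
final class BoundaryScanner {
    /**
     * Returns the boundary taken from the closing delimiter line
     * ("--" + boundary + "--"), or null if no such line is found.
     */
    static String findBoundary(String body) {
        for (String line : body.split("\r?\n")) {
            String trimmed = line.replaceAll("\\s+$", ""); // drop trailing whitespace only
            // The shortest possible closing delimiter is "--x--" (5 characters).
            if (trimmed.length() >= 5 && trimmed.startsWith("--") && trimmed.endsWith("--")) {
                String candidate = trimmed.substring(2, trimmed.length() - 2);
                // The same boundary must also appear as an opening delimiter line.
                if (body.contains("--" + candidate + "\n")
                        || body.contains("--" + candidate + "\r\n")) {
                    return candidate;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the message in the question.
        String body = "preamble\n\n------=_NextPart_000\nContent-Type: text/plain\n\nhi\n"
                    + "------=_NextPart_000--\n";
        System.out.println(findBoundary(body));
    }
}
```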

Finally, the stuff you posted has 2 parts. The first part appears to define substitutions to take place in the second part. If this is true, the Content-Type: of the first part should probably be something other than "text/plain". Perhaps "companyname/substitution-definition" or something like that. This will allow for multiple (as in future enhancements) substitution formats.

Solution 2

First I took your example message and replaced all occurrences of \n with newlines and \t with tabs.
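In Java that replacement step might look like the following. It is a sketch that assumes the text really contains the two-character sequences backslash-n and backslash-t; the class name EscapeExpander is made up. Note that it would also rewrite a backslash followed by "n" or "t" that was meant literally.

```java
final class EscapeExpander {
    /** Replaces the literal sequences \n and \t with real newline and tab characters. */
    static String expand(String raw) {
        return raw.replace("\\n", "\n").replace("\\t", "\t");
    }

    public static void main(String[] args) {
        // The Java literal below holds a backslash followed by 'n', not a newline.
        System.out.println(expand("Content-Type: text/plain;\\n\\tcharset=\"iso-8859-1\""));
    }
}
```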

Then I downloaded the JARs from the Mime4J project, a subproject of Apache James, and executed the GUI parsing example org.apache.james.mime4j.samples.tree.MessageTree with the transformed message above as input. And apparently Mime4J was able to parse the message and to extract the HTML message part.

Solution 3

You can create a MimeMultipart from an HTTP request.

javax.mail.internet.MimeMultipart m = new MimeMultipart(new ServletMultipartDataSource(httpRequest));

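import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import javax.activation.DataSource;
import javax.servlet.ServletRequest;

public class ServletMultipartDataSource implements DataSource {
    String contentType;
    InputStream inputStream;
    public ServletMultipartDataSource(ServletRequest request) throws IOException {
        // Prepend a newline to the request body before handing it to MimeMultipart.
        inputStream = new SequenceInputStream(new ByteArrayInputStream("\n".getBytes()), request.getInputStream());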
        contentType = request.getContentType();
    }
    public InputStream getInputStream() throws IOException {
        return inputStream;
    }
    public OutputStream getOutputStream() throws IOException {
        return null;
    }
    public String getContentType() {
        return contentType;
    }
    public String getName() {
        return "ServletMultipartDataSource";
    }
}

To get a submitted form parameter, you need to parse the BodyPart headers:

public String getStringParameter(String name) throws MessagingException, IOException {
    for (int i = 0; i < m.getCount(); i++) {
        BodyPart bodyPart = m.getBodyPart(i);
        String[] nameHeader = bodyPart.getHeader("Content-Disposition");
        // Only parts whose decoded content is text are treated as form parameters.
        Object content = bodyPart.getContent();
        if (nameHeader != null && content instanceof String) {
            for (String bodyName : nameHeader) {
                if (bodyName.contains("name=\"" + name + "\"")) return (String) content;
            }
        }
    }
    return null;
}

Solution 4

If you are using javax.servlet.http.HttpServlet to receive the message, you will have to use HttpServletRequest.getHeaders to obtain the value of the HTTP Content-Type header. You then pass that value to org.apache.james.mime4j.stream.MimeConfig.setHeadlessParsing so that the parser can properly process the MIME message.

It appears that you are using HttpServletRequest.getInputStream to read the contents of the request. The input stream returned only has the content of the message after the HTTP headers (terminated by a blank line). That is why you have to extract content-type from the HTTP headers and feed it to the parser using setHeadlessParsing.
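If the parser you use has no headless mode, a rough alternative is to put the header back in front of the stream yourself. To be clear, this is a different technique from setHeadlessParsing and is not Mime4J API; it is a sketch using only java.io, assuming the Content-Type value has already been read from the HTTP headers. The names HeaderRestoringStream, withContentType and readAll are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class HeaderRestoringStream {
    /** Wraps the body so that it starts with a Content-Type header and a blank line. */
    static InputStream withContentType(String contentType, InputStream body) {
        String header = "Content-Type: " + contentType + "\r\n\r\n";
        InputStream headerStream =
                new ByteArrayInputStream(header.getBytes(StandardCharsets.US_ASCII));
        return new SequenceInputStream(headerStream, body);
    }

    /** Demo helper: drains a stream into an ASCII string. */
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for request.getInputStream(); the body here is a bare closing delimiter.
        InputStream body = new ByteArrayInputStream(
                "--frontier--\r\n".getBytes(StandardCharsets.US_ASCII));
        System.out.print(readAll(withContentType("multipart/mixed; boundary=\"frontier\"", body)));
    }
}
```

In a servlet, the two arguments would come from request.getContentType() and request.getInputStream().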

Author by

Bynan

Updated on July 09, 2022

Comments

  • Bynan
    Bynan over 1 year

    I'm not writing a mail application, so I don't have access to all the headers and such. All I have is something like the block at the end of this question. I've tried using the JavaMail API to parse this, using something like

    Session s = Session.getDefaultInstance(new Properties());
    InputStream is = new ByteArrayInputStream(<< String to parse >>);
    MimeMessage message = new MimeMessage(s, is);
    Multipart multipart = (Multipart) message.getContent();
    

    But it just tells me that message.getContent is a String, not a Multipart or MimeMultipart. Plus, I don't really need all the overhead of the whole JavaMail API; I just need to parse the text into its parts. Here's an example:

    This is a multi-part message in MIME format.\n\n------=_NextPart_000_005D_01CC73D5.3BA43FB0\nContent-Type: text/plain;\n\tcharset="iso-8859-1"\nContent-Transfer-Encoding: quoted-printable\n\nStuff:\n\n            Please read this stuff at the beginning of each week.  =\nFeel free to discuss it throughout the week.\n\n\n--=20\n\nMrs. Suzy M. Smith\n555-555-5555\[email protected]\n------=_NextPart_000_005D_01CC73D5.3BA43FB0\nContent-Type: text/html;\n\tcharset="iso-8859-1"\nContent-Transfer-Encoding: quoted-printable\n\n\n\n\n\n\n\n\n\nStuff:

    \n           =20\nPlease read this stuff at the beginning of each =\nweek.  Feel=20\nfree to discuss it throughout the week.

    \n
    --

    Mrs. Suzy M. Smith
    555-555-5555
    [email protected]\n\n------=_NextPart_000_005D_01CC73D5.3BA43FB0--\n\n
  • Bynan
    Bynan about 12 years
    I downloaded those JARs and tried them out. I created a ContentHandler class and used MimeStreamParser.parse. In my handler, startMessage gets called, then startHeader and endHeader get called, which is fine, since there is no header. Then, body gets called, and then endMessage. I would expect to get some start and end bodyParts in there. Or, have body called twice, since there are 2 parts?
  • Bynan
    Bynan about 12 years
    While very informative, there's really nothing I can do about it. The text comes to me in that fashion. I need a way to parse it, and if there are no existing tools out there that can, I guess I have to do it by hand...
  • Bynan
    Bynan about 12 years
    I put some header info in as DwB suggested above, and MimeStreamParser.parse is working great now. Thanks for the info. I can't +1 you, as I don't have enough rep...
  • Bynan
    Bynan about 12 years
    By putting some header info in as you suggested, the parser that vanje suggested below worked great. Thanks for the info. I can't +1 you, as I don't have enough rep...