Rails - Mail, getting the body as Plain Text

29,417

Solution 1

The code in the question:

message = Mail.new(params[:message])

creates a new Mail::Message instance (from the mail gem) from the full raw message. You can then use any of that object's methods to get the content; for example, you can get the plain-text part with:

message.text_part

or the HTML with

message.html_part

These methods simply find the first part of a multipart message with a text/plain or text/html content type, respectively. However, CloudMailin also provides these as convenience parameters via params[:plain] and params[:html]. Remember that a message is never guaranteed to have a plain or HTML part, so it may be worth using something like the following to be safe:

plain_part = message.multipart? ? (message.text_part ? message.text_part.body.decoded : nil) : message.body.decoded
html_part = message.html_part ? message.html_part.body.decoded : nil
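The same nil-safe logic can be written more compactly with Ruby's safe-navigation operator (`&.`, available since Ruby 2.3). This is a minimal sketch, assuming only that the message responds to `multipart?`, `text_part` and `body` the way a Mail::Message does; `plain_text_of` is a hypothetical helper name, and the Struct stand-ins exist only so the snippet runs without the mail gem installed.

```ruby
# Hypothetical helper: decoded plain-text body, or nil when a multipart
# message has no text/plain part. Mirrors the plain_part one-liner above.
def plain_text_of(message)
  if message.multipart?
    # text_part may be nil, so every call after it uses safe navigation
    message.text_part&.body&.decoded
  else
    # single-part message: use the body itself
    message.body.decoded
  end
end

# Stand-ins for Mail::Message / Mail::Body (assumption, illustration only)
FakeBody = Struct.new(:decoded)
FakeMessage = Struct.new(:multipart, :text_part, :body) do
  def multipart?
    multipart
  end
end

single    = FakeMessage.new(false, nil, FakeBody.new("hello"))
html_only = FakeMessage.new(true, nil, FakeBody.new(""))
puts plain_text_of(single).inspect    # => "hello"
puts plain_text_of(html_only).inspect # => nil
```

With a real Mail::Message you would call `plain_text_of(message)` directly; the stand-ins are not needed.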

As a side note, it's also important to read the charset from the message when you use these methods and to make sure the output is converted to the encoding you want (such as UTF-8).

Solution 2

What is Mail?

The message defined in the question appears to be an instance of the Mail::Message class (Mail.new returns one), the same class used by ActionMailer::Base and by the mailman gem.

I'm not sure where this is integrated into Rails, but Steve Smith has pointed out that it is defined in the mail gem.

Extracting a Part From a Multipart Email

In the gem's readme, there is an example section on reading multipart emails.

Besides the methods html_part and text_part, which simply find the first part of the corresponding mime type, one can access and loop through the parts manually and filter by the criteria as needed.

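message.parts.each do |part|
  # mime_type is the bare type ("text/plain"); content_type usually also
  # carries parameters such as "; charset=UTF-8", so == on it can miss parts
  if part.mime_type == 'text/plain'
    # ...
  elsif part.mime_type == 'text/html'
    # ...
  end
end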
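In the mail gem, `content_type` typically returns the whole header value including its parameters (for example `text/plain; charset=UTF-8`), while `mime_type` returns only the type. To show what that split involves, here is a small stdlib-only sketch that parses a Content-Type value into its MIME type and parameters. `parse_content_type` is a hypothetical helper; it ignores edge cases such as quoted semicolons or RFC 2231 parameter continuations, so in real code prefer the gem's own `mime_type` and `content_type_parameters`.

```ruby
# Hypothetical helper: split a Content-Type header value into its bare
# MIME type and a hash of parameters. Simplified: no quoted semicolons,
# no RFC 2231 continuations.
def parse_content_type(value)
  type, *params = value.split(";").map(&:strip)
  parameters = params.each_with_object({}) do |param, hash|
    key, val = param.split("=", 2)
    next if val.nil? # skip malformed parameters without "="
    # names are case-insensitive; values may be wrapped in double quotes
    hash[key.strip.downcase] = val.strip.delete_prefix('"').delete_suffix('"')
  end
  [type.to_s.downcase, parameters]
end

header = "text/plain; charset=UTF-8; format=flowed"
mime, params = parse_content_type(header)
puts header == "text/plain" # => false, a plain == check misses this part
puts mime == "text/plain"   # => true
puts params["charset"]      # => UTF-8
```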

The Mail::Part is documented here.

Encoding Issues

Depending on the source of the received mail, there might be encoding issues. For example, Rails could identify the wrong encoding. If one then tries to convert the body to UTF-8 in order to store it in the database (body_string.encode('UTF-8')), there might be encoding errors like

Encoding::UndefinedConversionError - "\xFC" from ASCII-8BIT to UTF-8

(like in this SO question).

To circumvent this, one can read the charset from the message part and tell Ruby which charset the bytes are in before encoding to UTF-8:

encoding = part_to_use.content_type_parameters['charset']
body = part_to_use.body.decoded.force_encoding(encoding).encode('UTF-8')

Here, the decoded method returns the body without header lines and with its transfer encoding (such as base64 or quoted-printable) undone, as shown in the encoding section of the mail gem's readme.
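The error and the fix can be reproduced with plain Ruby strings, no mail gem needed. This is a minimal sketch, assuming the bytes come from an ISO-8859-1 body; the charset is hard-coded here, whereas in real code it would come from `content_type_parameters['charset']`. `String#b` returns a copy tagged as ASCII-8BIT, which is how undeclared binary data typically shows up.

```ruby
# "Grüße" encoded as ISO-8859-1, tagged as binary (ASCII-8BIT)
raw = "Gr\xFC\xDFe".b

begin
  raw.encode("UTF-8") # Ruby cannot map byte 0xFC from binary to UTF-8
rescue Encoding::UndefinedConversionError => e
  puts e.class # => Encoding::UndefinedConversionError
end

charset = "ISO-8859-1" # assumption: read from the part's charset parameter
# dup first: force_encoding changes the receiver's encoding in place
text = raw.dup.force_encoding(charset).encode("UTF-8")
puts text # => Grüße
```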

EDIT: Hard Encoding Issues

If there are hard encoding issues that the former approach does not solve, have a look at the excellent charlock_holmes gem.

After adding this gem to the Gemfile, there is a more reliable way to convert email encodings: the detect_encoding method, which this gem adds to String (via require 'charlock_holmes/string').

I found it helpful to define a body_in_utf8 method for mail messages. (Mail::Part also inherits from Mail::Message.):

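require 'charlock_holmes/string'

module Mail
  class Message
    # Returns the decoded body converted to UTF-8, using charlock_holmes
    # to guess the source encoding.
    def body_in_utf8
      body = self.body.decoded
      if body.present?
        detection = body.detect_encoding
        # guard against a missing detection result (e.g. binary data)
        if detection && detection[:encoding]
          body = body.force_encoding(detection[:encoding]).encode('UTF-8')
        end
      end
      body
    end
  end
end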

Summary

# select the part to use, either as shown above or as a one-liner
part_to_use = message.html_part || message.text_part || message

# read the encoding (charset) of the part
encoding = part_to_use.content_type_parameters['charset'] if part_to_use.content_type_parameters

# get the message body without the header information
body = part_to_use.body.decoded

# and convert it to UTF-8
body = body.force_encoding(encoding).encode('UTF-8') if encoding
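The summary steps above can also be wrapped in a small pure-Ruby helper that copes with a missing or unknown charset. This is a sketch under assumptions: `to_utf8` is a hypothetical name; when no usable charset is declared it keeps the bytes if they are already valid UTF-8 and otherwise reads them as ISO-8859-1, which is a crude guess rather than real detection (that is what charlock_holmes is for); bytes that still cannot be converted are replaced with "?".

```ruby
# Hypothetical helper: convert raw body bytes to a UTF-8 string.
# charset: the part's charset parameter as a String, or nil if absent.
def to_utf8(raw, charset)
  bytes = raw.dup.force_encoding(Encoding::ASCII_8BIT)

  encoding =
    begin
      charset && Encoding.find(charset)
    rescue ArgumentError
      nil # unknown charset name in the header
    end
  encoding = nil if encoding&.dummy? # e.g. UTF-7: Ruby cannot transcode it

  if encoding.nil?
    candidate = bytes.dup.force_encoding(Encoding::UTF_8)
    # Assumption: valid UTF-8 is kept as-is, anything else is read as ISO-8859-1
    return candidate if candidate.valid_encoding?
    encoding = Encoding::ISO_8859_1
  end

  bytes.force_encoding(encoding)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
       .scrub("?") # also catch invalid bytes when source and target are both UTF-8
end
```

In the summary this would replace the last two steps: `body = to_utf8(part_to_use.body.decoded, encoding)`.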

EDIT: Or, after defining the body_in_utf8 method shown above, the same as a one-liner:

(message.html_part || message.text_part || message).body_in_utf8

Solution 3

email = Mail.new(params[:message])
text_body = (email.text_part || email.html_part || email).body.decoded

I'm using this solution in the RedmineCRM Helpdesk plugin.

Solution 4

I believe that if you call message.text_part.body.decoded, the Mail gem will convert it to UTF-8 for you, though the documentation isn't 100% clear on this.


Author: AnApprentice

working on Matter, a new way to gather professional feedback.

Updated on November 18, 2021

Comments

  • AnApprentice, over 2 years

    Given: message = Mail.new(params[:message])

    as seen here: http://docs.heroku.com/cloudmailin

    It shows how to get the message.body as HTML; how do you get the text/plain version?

    Thanks

  • Steve, about 12 years
    Thanks! I was having some issues with parsing out an email after decoding, but getting the text_part helped fix this.
  • David Morales, over 11 years
    Excellent answer. I must say this is working for the default Rails Action Mailer. No need for any mail gem.
  • Arnold Roa, over 11 years
    How can I extract the encoding? I'm doing this: ..force_encoding("ISO-8859-1").encode('utf_8'), and it works on some messages but not on others.
  • RocketR, about 11 years
    @David "default Rails Action Mailer" is the mail gem. At least, it depends on it heavily.
  • RocketR, about 11 years
    No, it doesn't. It returns a string like \xF0\xD2\x12...
  • New Alexandria, almost 11 years
    Seriously?? Answers like this need a special quality stamp.
  • coderuby, over 10 years
    What a GREAT answer. Thank you very much!!
  • Paul Danelli, about 4 years
    OMG, thank you. No wonder I missed it before; it's on line 1600+ in the Message class.
  • Paul Watson, almost 3 years
    Such a great answer, thank you. Working with emails is a real 80/20 headache.
  • Dorian, over 2 years
    Yeah... html_safe on user-provided content, that's not going to end well (XSS).