Python: How to parse fields such as From, To, and Body from a raw email source with Python
Solution 1
I don't really understand what your final code snippet has to do with anything - you haven't mentioned anything about HTML until that point, so I don't know why you would suddenly be giving an example of parsing HTML (which you should never do with a regex anyway).
In any case, to answer your original question about getting the headers from an email message, Python includes code to do that in the standard library:
import email
msg = email.message_from_string(email_string)
msg['from'] # '[email protected]'
msg['to'] # '[email protected]'
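As a minimal sketch of the lookup above: the sample message and its addresses below are made-up placeholders, not taken from the question. It illustrates that header names are matched case-insensitively, that a missing header gives None, and that email.utils.parseaddr can split a display name from the address.

```python
import email
from email.utils import parseaddr

# Hypothetical sample message; the addresses are placeholders.
raw = (
    "From: Alice Example <alice@example.com>\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Just a test.\n"
)

msg = email.message_from_string(raw)

# Header lookup is case-insensitive; a missing header yields None.
sender = msg["from"]
missing = msg["X-Does-Not-Exist"]

# parseaddr splits a header value into (display name, address).
name, address = parseaddr(sender)
print(name, address)
```

Because a missing header returns None instead of raising, it is worth checking the result before passing it on.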
Solution 2
Fortunately Python makes this simpler: http://docs.python.org/2.7/library/email.parser.html#email.parser.Parser
from email.parser import Parser
parser = Parser()
emailText = """PUT THE RAW TEXT OF YOUR EMAIL HERE"""
email = parser.parsestr(emailText)
print(email.get('From'))
print(email.get('To'))
print(email.get('Subject'))
The body is trickier. Call email.is_multipart(). If that's false, you can get the body by calling email.get_payload(). However, if it's true, email.get_payload() will return a list of messages, so you'll have to call get_payload() on each of those.
if email.is_multipart():
    # Each item in the list is itself a message object.
    for part in email.get_payload():
        print(part.get_payload())
else:
    print(email.get_payload())
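Nested multipart messages and encoded parts (base64, quoted-printable) need a little more care than a single loop. One possible approach, sketched here under the assumption that you want the first text/plain part, is to use walk() and get_payload(decode=True). The helper name get_text_body and the sample message are hypothetical, introduced only for illustration.

```python
import email


def get_text_body(msg):
    """Return the first text/plain part of msg as a str, or None if there is none."""
    # walk() visits the message itself and every nested part, depth first.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no text payload of their own
        if part.get_content_type() != "text/plain":
            continue
        if part.get_filename():
            continue  # skip text attachments
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None


# Hypothetical sample: a plain-text part (base64) and an HTML alternative.
raw = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: Demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "SGVsbG8sIHdvcmxkIQ==\n"
    "--b1\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>Hello, world!</p>\n"
    "--b1--\n"
)

body = get_text_body(email.message_from_string(raw))
print(body)
```

Falling back to UTF-8 with errors="replace" is a design choice for robustness against mislabeled mail; stricter handling may suit other cases better.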
Solution 3
"Body" is not present in your sample email
You can use the email module:
import email
msg = email.message_from_string(email_message_as_text)
Then use:
print(msg['To'])
print(msg['From'])
... etc.
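For the remaining headers, a short sketch may help. Indexing with msg['Name'] returns only one value even when a header such as Received appears several times, whereas items() and get_all() expose every occurrence. The sample message below is a made-up placeholder.

```python
import email

# Hypothetical sample with a repeated Received header.
raw = (
    "Received: from a.example (localhost [127.0.0.1])\n"
    "Received: from b.example (localhost [127.0.0.1])\n"
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: Demo\n"
    "\n"
    "Body text.\n"
)
msg = email.message_from_string(raw)

# items() yields every (name, value) pair in original order, repeats included.
for name, value in msg.items():
    print(name + ": " + value)

# get_all() returns a list of all values for a header, or the given default.
received = msg.get_all("Received", [])
header_names = msg.keys()
```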
Solution 4
You should probably use email.parser
# The backslash keeps the string from starting with a blank line,
# which the parser would treat as the end of the headers.
s = """\
From [email protected] Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <[email protected]>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: [email protected]
Subject: ooooooooooooooooooooooo
To: [email protected]
Cc:
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.
--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
--bound1374805739--
"""
import email.parser
msg = email.parser.Parser().parsestr(s)
help(msg)
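On Python 3.6 or later, one alternative is the newer policy-based API, which returns EmailMessage objects with get_body() and get_content(). The following is only a sketch under that assumption. The sample message, its addresses, and the fields dictionary are hypothetical, shaped loosely after the message in the question.

```python
import email
from email import policy

# Hypothetical multipart/mixed sample with one text/plain part.
raw = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: Status report\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="bound1"\n'
    "\n"
    "This is a multi-part message in MIME format.\n"
    "--bound1\n"
    "Content-Type: text/plain\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "All systems normal.\n"
    "--bound1--\n"
)

# policy.default makes the parser return an EmailMessage.
msg = email.message_from_string(raw, policy=policy.default)

fields = {
    "from": str(msg["From"]),
    "to": str(msg["To"]),
    "subject": str(msg["Subject"]),
}

# get_body() picks the best matching body part, or None if there is none.
body_part = msg.get_body(preferencelist=("plain",))
fields["body"] = body_part.get_content().strip() if body_part is not None else None

print(fields)
```

get_content() already undoes the transfer encoding and charset, so no manual decoding is needed in this variant.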
Admin
Updated on July 09, 2022
Comments
Admin almost 2 years
The raw email usually looks something like this:
From [email protected] Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <[email protected]>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: [email protected]
Subject: ooooooooooooooooooooooo
To: [email protected]
Cc:
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.
--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
--bound1374805739--
So if I wanted to write a Python script to get the From, To, Subject, and Body fields, is this the code I should build on, or is there a better method?
import re
a = '<title>aaa</title><title>aaa2</title><title>aaa3</title>'
a1 = re.findall(r'<(title)>(.*?)<(/title)>', a)