Reading .eml files with Python 3.6 using emaildata 0.3.4
Using the email package from the standard library, we can read in the .eml files. First, parse the file with the BytesParser class. Then call the get_body() method with a preference list of 'plain' (for plain text), followed by the get_content() method, to get the text of the email.
import email
from email import policy
from email.parser import BytesParser
import glob

file_list = glob.glob('*.eml')  # returns list of files
with open(file_list[2], 'rb') as fp:  # select a specific email file from the list
    msg = BytesParser(policy=policy.default).parse(fp)
# preferencelist must be a tuple, hence the trailing comma after 'plain'
text = msg.get_body(preferencelist=('plain',)).get_content()
print(text)  # print the email content
>>> "Hi,
>>> This is an email
>>> Regards,
>>> Mister. E"
Granted, this is a simplified example: it does not handle HTML or attachments. But it does essentially what the question asks and what I wanted to do.
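For messages that do carry HTML or attachments, the same parser can be pushed a little further. The following is only a minimal sketch, not a complete solution: the helper name summarize, the example.com addresses, and the data.csv attachment are made up for illustration, and the message is built in memory so the snippet runs without any .eml file on disk.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def summarize(raw_bytes):
    """Return (body_text, attachment_filenames) for raw .eml bytes.

    Hypothetical helper, written for this example only.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    # Prefer the plain-text body, fall back to HTML if there is none.
    body = msg.get_body(preferencelist=('plain', 'html'))
    text = body.get_content() if body is not None else ''
    # iter_attachments() yields the parts that are not body candidates.
    names = [part.get_filename() for part in msg.iter_attachments()]
    return text, names


# Build a small demo message in memory (stands in for a real .eml file).
demo = EmailMessage()
demo['From'] = 'Mister E <mister.e@example.com>'
demo['To'] = 'someone@example.com'
demo['Subject'] = 'Hello'
demo.set_content('Hi,\nThis is an email\n')
demo.add_attachment(b'col1,col2\n', maintype='application',
                    subtype='octet-stream', filename='data.csv')

text, names = summarize(demo.as_bytes())
print(text)
print(names)
```

get_body() returns None when no part matches the preference list, which is why the sketch checks for that before calling get_content().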
Here is how you would iterate over several emails and save each as a plain text file:
import os

file_list = glob.glob('*.eml')  # returns list of files
for file in file_list:
    with open(file, 'rb') as fp:
        msg = BytesParser(policy=policy.default).parse(fp)
    fnm = os.path.splitext(file)[0] + '.txt'  # same base name, .txt extension
    txt = msg.get_body(preferencelist=('plain',)).get_content()
    with open(fnm, 'w') as f:
        print(txt, file=f)  # write the plain-text body to the .txt file
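Header fields such as the sender, recipients, and subject are available on the same parsed message object. The sketch below is an illustration under assumptions: the raw message and its example.com addresses are invented for the example, and parseaddr and getaddresses from email.utils are just one way to split a header into display name and address.

```python
from email import policy
from email.parser import BytesParser
from email.utils import getaddresses, parseaddr

# A made-up message, used here in place of reading a real .eml file.
raw = (b"From: Mister E <mister.e@example.com>\n"
       b"To: a@example.com, B <b@example.com>\n"
       b"Subject: Hello\n"
       b"\n"
       b"Hi,\nThis is an email\n")

msg = BytesParser(policy=policy.default).parsebytes(raw)

# Split the From header into display name and bare address.
sender_name, sender_addr = parseaddr(str(msg['From']))

# The To header may hold several addresses; keep only the address part.
recipients = [addr for _, addr in getaddresses([str(msg['To'])])]

subject = str(msg['Subject'])

# True if the message has at least one attachment part.
has_attachments = any(True for _ in msg.iter_attachments())

print(sender_addr, recipients, subject, has_attachments)
```

A header that is absent from the message comes back as None from msg['...'], so real code would want to check for that before converting to str.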
PyRsquared
Updated on November 01, 2022
Comments
-
PyRsquared over 1 year
I am using Python 3.6.1 and I want to read in email files (.eml) for processing. I am using the emaildata 0.3.4 package; however, whenever I try to import the Text class as in the documentation, I get a module error:
import email
from email.text import Text
>>> ModuleNotFoundError: No module named 'cStringIO'
When I tried to correct this using this update, I got the next error, relating to mimetools:
>>> ModuleNotFoundError: No module named 'mimetools'
Is it possible to use emaildata 0.3.4 with Python 3.6 to parse .eml files? Or are there any other packages I can use to parse .eml files? Thanks
-
Dmitri Chubarov over 6 years
The emaildata module has not been updated for over 2 years. It is not compatible with Python 3. Consider using the email package from the standard library.
-
PatrickT over 5 years
Added an edit with a loop over file names, which I guess you had intended to add (feel free to roll back the edit).
-
Amey P Naik almost 5 years
Is there a way to extract only the sender address?
-
DGS about 4 years
How to find metadata information (from, sender, cc, subject, etc.) and check if any attachment is present?