Reading .eml files with Python 3.6 using emaildata 0.3.4

Using the standard-library email package, you can read .eml files directly. Open the file in binary mode and parse it with the BytesParser class. Then call get_body() with a preference for plain text, and get_content() on the result to get the text of the email.

import glob
from email import policy
from email.parser import BytesParser

file_list = glob.glob('*.eml')  # list of .eml files in the current directory
with open(file_list[2], 'rb') as fp:  # select a specific email file (here, the third)
    msg = BytesParser(policy=policy.default).parse(fp)
# ('plain',) is a one-element tuple; ('plain') would be just a string
text = msg.get_body(preferencelist=('plain',)).get_content()
print(text)  # print the email content
# Output:
# Hi,
# This is an email
# Regards,
# Mister. E

Granted, this is a simplified example: it does not handle HTML bodies or attachments. But it does essentially what the question asks and what I wanted to do.
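
For a message that has no plain-text part, get_body() returns None, so the get_body(...).get_content() call above would raise an AttributeError. Here is a minimal sketch of a fallback, assuming an HTML body is an acceptable substitute. The helper name extract_text and the sample message are made up for illustration; they are not part of the original answer.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def extract_text(raw_bytes):
    """Return the plain-text body if present, else the HTML body, else ''."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    # Earlier entries in preferencelist are preferred over later ones.
    body = msg.get_body(preferencelist=('plain', 'html'))
    if body is None:  # e.g. a message that carries no text part at all
        return ''
    return body.get_content()


# Hypothetical sample message, built in memory instead of read from a .eml file.
sample = EmailMessage()
sample['Subject'] = 'Demo'
sample.set_content('<p>Hello</p>', subtype='html')
print(extract_text(sample.as_bytes()))
```

To use it on a file, read the bytes first, for example extract_text(open(path, 'rb').read()).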

Here is how you would iterate over several emails and save each as a plain text file:

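import os  # needed in addition to the imports above

file_list = glob.glob('*.eml')  # returns list of files
for file in file_list:
    with open(file, 'rb') as fp:
        msg = BytesParser(policy=policy.default).parse(fp)
    body = msg.get_body(preferencelist=('plain',))
    if body is None:  # no plain-text part in this message, skip it
        continue
    fnm = os.path.splitext(file)[0] + '.txt'  # same name, .txt extension
    with open(fnm, 'w', encoding='utf-8') as f:
        f.write(body.get_content())  # save the email text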

Author: PyRsquared

Updated on November 01, 2022

Comments

  • PyRsquared
    PyRsquared over 1 year

    I am using Python 3.6.1 and I want to read in email files (.eml) for processing. I am using the emaildata 0.3.4 package; however, whenever I try to import the Text class as in the documentation, I get a module error:

    import email
    from email.text import Text
    >>> ModuleNotFoundError: No module named 'cStringIO'
    

    When I tried to correct this using this update, I got the next error, relating to mimetools:

    >>> ModuleNotFoundError: No module named 'mimetools'
    

    Is it possible to use emaildata 0.3.4 with Python 3.6 to parse .eml files? Or are there any other packages I can use to parse .eml files? Thanks.

    • Dmitri Chubarov
      Dmitri Chubarov over 6 years
      The emaildata module has not been updated for over 2 years. It is not compatible with Python 3. Consider using the email package from the standard library.
  • PatrickT
    PatrickT over 5 years
    added an edit with a loop over file names, which I guess you had intended to add (feel free to roll back edit).
  • Amey P Naik
    Amey P Naik almost 5 years
    Is there a way to extract only the sender address?
  • DGS
    DGS about 4 years
    How can I find metadata (from, sender, cc, subject, etc.) and check whether any attachment is present?
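
The follow-up questions in the comments (sender address, header metadata, attachments) can also be handled with the standard library. The following is a sketch only, assuming the message is parsed with policy.default as in the answer. The helper name summarize and the sample message are illustrative and not from the original thread.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from email.utils import parseaddr


def summarize(raw_bytes):
    """Collect common headers and attachment filenames from raw message bytes."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    # parseaddr splits 'Name <user@host>' into (display name, address).
    _, sender = parseaddr(str(msg['From'] or ''))
    return {
        'sender': sender,
        'to': msg['To'],
        'cc': msg['Cc'],  # None when the header is absent
        'subject': msg['Subject'],
        # iter_attachments() skips the body parts and yields the attachments.
        'attachments': [part.get_filename() for part in msg.iter_attachments()],
    }


# Hypothetical sample message, built in memory instead of read from a .eml file.
sample = EmailMessage()
sample['From'] = 'Mister E <mister.e@example.com>'
sample['To'] = 'someone@example.com'
sample['Subject'] = 'Report'
sample.set_content('See attached.')
sample.add_attachment(b'a,b\n1,2\n', maintype='application',
                      subtype='octet-stream', filename='data.csv')

info = summarize(sample.as_bytes())
print(info['sender'], info['subject'], info['attachments'])
```

An empty attachments list means no attachment is present; header lookups such as msg['Cc'] return None when the header is missing.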