How to get only the plain-text part of an email body

Asked: 2012-10-05 00:23:50

Tags: python

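I'm trying to figure out how to get only the text part of an email. With the code below I can get the body, but it is always followed by the email's HTML, which I don't need. How can I tell my script to ignore the HTML?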

import imaplib
import email

def extract_body(payload):
    if isinstance(payload,str):
        return payload
    else:
        return '\n'.join([extract_body(part.get_payload()) for part in payload])

conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
conn.login("username", "password")
conn.select()
typ, data = conn.search(None, 'UNSEEN')
try:
    for num in data[0].split():
        typ, msg_data = conn.fetch(num, '(RFC822)')
        for response_part in msg_data:
            if isinstance(response_part, tuple):
                msg = email.message_from_string(response_part[1])
                subject = msg['subject']
                print(subject)
                payload = msg.get_payload()
                body = extract_body(payload)
                print(body)
        typ, response = conn.store(num, '+FLAGS', r'(\Seen)')
finally:
    try:
        conn.close()
    except Exception:  # closing is best-effort; the mailbox may already be closed
        pass
    conn.logout()

1 Answer:

Answer 0 (score: 0)

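You are calling get_payload() on every item in the multipart container and joining the results together, so the HTML alternative ends up in the output along with the plain text. Instead, iterate over the payloads in the multipart container and keep only the one whose Content-Type you are looking for (here, text/plain).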
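
One way to do that is sketched below, assuming Python 3.5 or later and the standard library email package. The helper name extract_plain_text and the SAMPLE message are made up for illustration and are not part of the question's code. The function walks every part of the message, skips multipart containers and attachments, and keeps only the parts whose Content-Type is text/plain.

```python
import email
from email.message import Message


def extract_plain_text(msg: Message) -> str:
    """Join the text/plain parts of msg, ignoring HTML parts and attachments.

    Hypothetical helper for illustration; not taken from the question.
    """
    chunks = []
    for part in msg.walk():  # walk() visits the message and all nested parts
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_type() != "text/plain":
            continue  # skips text/html and everything else
        if part.get_content_disposition() == "attachment":
            continue  # a .txt attachment is not the body
        raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
        if raw is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(raw.decode(charset, errors="replace"))
        except LookupError:  # charset name unknown to Python
            chunks.append(raw.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


# Made-up two-part message: one plain-text and one HTML alternative.
SAMPLE = (
    "MIME-Version: 1.0\n"
    "Subject: demo\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Hello in plain text\n"
    "--XYZ\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>Hello in HTML</p>\n"
    "--XYZ--\n"
)

msg = email.message_from_string(SAMPLE)
body = extract_plain_text(msg)
print(body)
```

In the question's loop this would replace the get_payload() and extract_body() calls with body = extract_plain_text(msg). Note that on Python 3, imaplib returns bytes from fetch(), so the message would likely need to be built with email.message_from_bytes(response_part[1]) rather than message_from_string.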