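How do I remove HTML and attachments from email?

Time: 2008-12-16 12:27:31

Tags: html perl email attachment

I am using the following program to sort email messages and eventually print them. Some messages may contain attachments or HTML, which is bad for printing. Is there an easy way to remove the attachments and strip the HTML markup while keeping the text that the HTML formats?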

#!/usr/bin/perl
use warnings;
use strict;
use Mail::Box::Manager;

open(MYFILE, '>>', 'data.txt')
    or die "data.txt: Unable to open: $!\n";
binmode(MYFILE, ':encoding(UTF-8)');


my $file = shift || $ENV{MAIL};
my $mgr = Mail::Box::Manager->new(
    access          => 'r',
);

my $folder = $mgr->open( folder => $file )
or die "$file: Unable to open: $!\n";

for my $msg ( sort { $a->timestamp <=> $b->timestamp } $folder->messages)
{
    my $to          = join( ', ', map { $_->format } $msg->to );
    my $from        = join( ', ', map { $_->format } $msg->from );
    my $date        = localtime( $msg->timestamp );
    my $subject     = $msg->subject;
    my $body        = $msg->decoded->string;

    # Strip all quoted text
    $body =~ s/^>.*$//msg;

    print MYFILE <<"";
From: $from
To: $to
Date: $date
Subject: $subject
\n
$body

}

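4 answers:

Answer 0 (score: 3)

Mail::Message::isMultipart tells you whether a given message is multipart, which is the case for any message that carries attachments. Mail::Message::parts gives you the list of message parts.

So: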

if ( $msg->isMultipart ) {
    foreach my $part ( $msg->parts ) {
        if ( $part->contentType eq 'text/html' ) {
           # deal with html here.
        }
        elsif ( $part->contentType eq 'text/plain' ) {
           # deal with text here.
        }
        else {
           # well?
        }
    }
}
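One way to fill in those branches, as a rough sketch rather than a definitive implementation: prefer a text/plain part, fall back to a text/html part, and skip everything else as an attachment. The helper name pick_text_part is hypothetical. The sketch relies on parts('RECURSE'), which Mail::Message documents as returning all parts recursively (a single-part message yields itself). It does not check Content-Disposition, so a plain-text attachment could be picked if no body text precedes it.

```perl
use strict;
use warnings;
use Mail::Message;

# Hypothetical helper: return the part of $msg that is best suited for
# printing (text/plain first, text/html as a fallback), or undef if the
# message has no text part at all.
sub pick_text_part {
    my ($msg) = @_;
    my ( $plain, $html );

    # 'RECURSE' flattens nested multiparts into a list of leaf parts.
    for my $part ( $msg->parts('RECURSE') ) {
        my $type = lc $part->contentType;
        if    ( $type eq 'text/plain' ) { $plain //= $part }
        elsif ( $type eq 'text/html' )  { $html  //= $part }
        # Anything else (images, PDFs, ...) is treated as an attachment
        # and ignored.
    }

    return $plain // $html;
}
```

In the question's loop, this could replace the `$msg->decoded->string` line: call `pick_text_part($msg)`, and if it returns a part, take `$part->decoded->string` as the body. If the chosen part is text/html, the markup still has to be stripped separately.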

Answer 1 (score: 1)

Stripping the HTML is explained in FAQ #9 (or the first entry shown by perldoc -q html). In short, the relevant modules are HTML::Parser and HTML::FormatText.
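A minimal sketch of that approach, assuming HTML::TreeBuilder (which is built on HTML::Parser) is installed alongside HTML::FormatText. The helper name html_to_text and the margin values are illustrative choices, not something the FAQ prescribes.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical helper: turn an HTML string into plain text for printing.
sub html_to_text {
    my ($html) = @_;

    # Parse the markup into a tree, then let the formatter render it.
    my $tree      = HTML::TreeBuilder->new_from_content($html);
    my $formatter = HTML::FormatText->new(
        leftmargin  => 0,
        rightmargin => 72,
    );
    my $text = $formatter->format($tree);

    $tree->delete;    # release the parse tree explicitly
    return $text;
}

print html_to_text('<html><body><p>Hello <b>world</b></p></body></html>');
```

The formatter wraps paragraphs at the right margin and drops tags, so the result is suitable for appending to a plain-text file.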

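As for attachments, email with attachments is sent in MIME format. From this example you can see that the format is fairly simple, so you could come up with your own solution quite easily, or check the MIME modules at CPAN.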

Answer 2 (score: 0)

It looks like someone has already solved this on the linuxquestions forum.

From the forum:

# Part of Mail::POP3Client: fetch the headers and body of POP3 message $i
$body = $connection->HeadAndBody($i);
# Parse the message with MIME::Parser; the result is a MIME entity
$msg = $parser->parse_data($body);
# Find out whether this is a multipart MIME message or just plain text
$num_parts = $msg->parts;
# If it has 0 parts, it is plain text
if ($num_parts == 0) {
    # Get the message body via POP3Client
    $message = $connection->Body($i);
    # Use this series of regular expressions to make it safe for MySQL
    $message =~ s/</&lt;/g;
    $message =~ s/>/&gt;/g;
    $message =~ s/'//g;
}
else {
    # If it is MIME, read the first part (assumed to be the plain text) into a string
    $message = $msg->parts(0)->bodyhandle->as_string;
}
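The forum snippet assumes that the first part is always the plain text, which fails when the first part is itself a multipart container (it has no bodyhandle). A hedged alternative is to walk every entity and take the first one whose type is text/plain. The helper names parse_message and first_plain_text are hypothetical; the sketch uses only MIME::Parser and MIME::Entity, and takes the raw message as a string instead of fetching it over POP3.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: parse a raw message string into a MIME::Entity.
sub parse_message {
    my ($raw) = @_;
    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);    # keep decoded bodies in memory, no files
    return $parser->parse_data($raw);
}

# Hypothetical helper: return the first text/plain body in the entity
# tree, or undef if there is none.
sub first_plain_text {
    my ($entity) = @_;

    # parts_DFS lists the entity itself and all nested parts, depth first.
    for my $part ( $entity->parts_DFS ) {
        next unless $part->effective_type eq 'text/plain';
        my $bodyhandle = $part->bodyhandle or next;
        return $bodyhandle->as_string;
    }
    return;
}
```

MIME::Parser undoes the transfer encoding (base64, quoted-printable) but returns bytes, so the caller may still need to decode the character set before printing.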

Answer 3 (score: 0)

A complete example ships with the Perl Mail-Box-2.117 distribution:

http://cpansearch.perl.org/src/MARKOV/Mail-Box-2.117/examples/strip-attachments.pl