Extracting email addresses from a text file

Time: 2011-02-11 15:38:55

Tags: regex email

What is the right way to extract email addresses from a text file when they are written in the following forms?

someone at something.domainextension or someone.someone at something.domainextension

Is it possible to use a regular expression to convert these into normal email addresses?

Thanks in advance.

3 answers:

Answer 0: (score: 0)

I use Ruby, but the same approach works in Perl:

>> "someone.someone at something.domainextension".sub(/\bat\b/,"@").gsub(/\s+/,"")
=> "someone.someone@something.domainextension"

Basically, just replace the standalone word "at" with "@" and remove all whitespace.
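A minimal sketch of that one-liner wrapped in a helper, assuming each input string holds exactly one obfuscated address and nothing else. The method name `deobfuscate` is hypothetical. The `\b` word boundaries keep an "at" that sits inside a longer word (as in "cat") from being replaced.

```ruby
# Hypothetical helper: assumes the string contains a single obfuscated address only.
def deobfuscate(text)
  # Replace the first standalone word "at" with "@", then drop all whitespace.
  text.sub(/\bat\b/, "@").gsub(/\s+/, "")
end

puts deobfuscate("someone at something.domainextension")
puts deobfuscate("cat.owner at something.domainextension")
```

Because every whitespace character is removed, this only makes sense on a string that has already been cut down to the address itself, not on a whole line of prose.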

Answer 1: (score: 0)

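I believe the following code does what you need. However, it will not work if an email address is split across multiple lines, and it will give a false positive wherever ordinary prose happens to contain "at something.com" (for example, "look at something.com"). If you can post some sample data from your data set, I can make this code more specific to your case.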

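As mentioned in the comments above, this will not find absolutely every email address that is valid under the RFC, but I think it should handle your problem.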

use strict;
use warnings;

my @lines_from_file; #holds our test info

#load the test info
$lines_from_file[0] = 'this is some text.  We like to type to someone at somthing.com but sometimes';
$lines_from_file[1] = 'they go by someone.someone at something.com just to confuse us and hey you never';
$lines_from_file[2] = 'know, maybe they use parens like (someone at something.com).';
$lines_from_file[3] = 'make sure we do not find someone at .com. or someone something.com or someone at somethingcom';

my @all_email_addresses; #holds all found email addresses


#foreach line in the file
foreach my $line (@lines_from_file){
    while($line =~ /([0-9a-zA-Z.]+)    #capture any number or letter or dot 1 or more times
                    \sat\s             #" at "
                    ([0-9a-zA-Z.]+     #capture any number or letter or dot 1 or more times
                    \.                 #dot
                    \w{2,4})           #com or net or us or tv or info etc., 
                   /xg){
        #every time the line matches an address, save it in normal email form
        push @all_email_addresses, "$1\@$2" ;
    }

}

print "@all_email_addresses\n";
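For comparison with the Ruby in answer 0, here is a rough Ruby sketch of the same loop, using `String#scan` with the same pattern as the Perl code above. The method name `extract_emails` and the sample lines are assumptions made for illustration; a real script would read the lines from the file instead.

```ruby
# Hypothetical helper: returns every "local at domain.tld" occurrence as local@domain.tld.
def extract_emails(lines)
  # Same pattern as the Perl code: local part, " at ", domain, dot, 2-4 word characters.
  pattern = /([0-9a-zA-Z.]+)\sat\s([0-9a-zA-Z.]+\.\w{2,4})/
  lines.flat_map do |line|
    # scan yields one [local, domain] pair per match on the line.
    line.scan(pattern).map { |local, domain| "#{local}@#{domain}" }
  end
end

# Hypothetical sample lines standing in for the contents of the text file.
sample = [
  "they go by someone.someone at something.com just to confuse us",
  "maybe they use parens like (someone at something.com).",
  "do not find someone at .com. or someone at somethingcom"
]

puts extract_emails(sample)
```

It carries the same limitations as the Perl version: addresses split across lines are missed, and prose such as "look at something.com" is reported as an address.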

Answer 2: (score: 0)

/^(?:(\w+)\.)?(\w+)\s+at\s+(\w+)\.(\w+)$/

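This will not capture all email addresses, only the forms you gave. Because of the ^ and $ anchors, the address must also be the only thing on the line.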
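One way the capture groups from that pattern could be reassembled into a normal address, sketched in Ruby. The helper name `to_email` is hypothetical, and the sketch assumes each line contains nothing but the obfuscated address.

```ruby
# Hypothetical helper built on the pattern above; returns nil when the line does not match.
def to_email(line)
  m = line.match(/^(?:(\w+)\.)?(\w+)\s+at\s+(\w+)\.(\w+)$/)
  return nil unless m
  # m[1] is nil when there is no "first." prefix, so compact drops it.
  local = [m[1], m[2]].compact.join(".")
  "#{local}@#{m[3]}.#{m[4]}"
end

p to_email("someone at something.com")
p to_email("someone.someone at something.com")
p to_email("write to someone at something.com")
```

The third call returns nil, since the anchored pattern rejects any line with extra text around the address.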