I have two files, as follows.
POST OFFICE,PO
SUITE ACCESS ROOM, SAR
SUITE,STE
STREET,ST
NEW YORK,NY
POST,PST
LONG LINE STREET,LLS

ARIJIT, 192 POST OFFICE, SUITE
CHANDA, 13 HP STREET, NY
RAM, POSTING POST, LONG LINE STREET
ROY, POST 3009, SUITE ACCESS ROOM
Expected output:
ARIJIT, 192 PO, STE
CHANDA, 13 HP ST, NEW YORK
RAM, POSTING PST, LLS
ROY, PST 3009, SAR
I am using the code below, but I have not had any success with it. I am new to Perl. The code works for single words but not for multi-word phrases.
#!/usr/bin/perl
use warnings;
use strict;
open( my $out_fh, ">", "output.txt" ) || die "Can't open the output file for writing: $!";
open( my $address_fh, "<", "Address.txt" ) || die "Can't open the address file: $!";
my %lookup = map { chomp; split( /,/, $_, 2 ) } <$address_fh>;
open( my $file_fh, "<", $ARGV[0] ) || die "Can't open the file.txt file: $!";
while (<$file_fh>) {
my @line = split;
for my $char ( @line ) {
( exists $lookup{$char} ) ? print $out_fh "$lookup{$char} " : print $out_fh "$char ";
}
print $out_fh "\n";
}
Answer 0 (score: 4)
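Your problem lies in my @line = split;, which splits the line into individual words. Because some of the keys you want to replace consist of several words, splitting on whitespace means they can never be looked up as a whole.
Instead, you should build a regular expression that matches all of the keys, for example: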
my $keywords = join '|', map quotemeta, sort { length($b) <=> length($a) } keys %lookup;
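# The (?:...) group makes both \b assertions apply to every alternative, not just the first and last.
my $keywords_rx = qr/\b(?:$keywords)\b/;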
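The \b assertion matches at a word boundary, so a key such as POST is not replaced inside a longer word such as POSTING. We also have to sort the keys by length so that longer alternatives are tried before shorter ones. Otherwise, SUITE ACCESS ROOM might never match, because the shorter key SUITE would match first.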
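As a small illustration of why the ordering matters, the sketch below builds the alternation twice, once in an arbitrary order and once longest-first, and captures the first match in each case. The variable names ($unsorted, $sorted, $text) are only for this example and are not part of the original code.

```perl
use strict;
use warnings;

# Two keys from the lookup file, deliberately listed shortest first.
my @keys = ( 'SUITE', 'SUITE ACCESS ROOM' );

# Alternation in the order given: SUITE is tried first.
my $unsorted = join '|', map { quotemeta($_) } @keys;

# Alternation with the longest key first.
my $sorted = join '|', map { quotemeta($_) }
    sort { length($b) <=> length($a) } @keys;

my $text = 'ROY, POST 3009, SUITE ACCESS ROOM';

# Perl takes the first alternative that lets the whole pattern match.
my ($first_unsorted) = $text =~ /\b($unsorted)\b/;
my ($first_sorted)   = $text =~ /\b($sorted)\b/;

print "unsorted: $first_unsorted\n";    # unsorted: SUITE
print "sorted:   $first_sorted\n";      # sorted:   SUITE ACCESS ROOM
```

With the unsorted pattern, only SUITE would be replaced and ACCESS ROOM would be left behind in the output.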
Then perform the substitution on each line with something like s/($keywords_rx)/$lookup{$1}/g.
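Putting the pieces together, here is a minimal self-contained sketch of the approach. To keep it runnable without any files, the lookup pairs and input lines from the question are embedded in the script, and the helper name abbreviate is a hypothetical one chosen for this example. Splitting on /\s*,\s*/ is an added assumption so that a value such as " SAR" loses its leading space. The alternation is wrapped in (?:...) so that \b applies to every key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lookup pairs copied from the question (normally read from Address.txt).
my @pairs = (
    'POST OFFICE,PO',
    'SUITE ACCESS ROOM, SAR',
    'SUITE,STE',
    'STREET,ST',
    'NEW YORK,NY',
    'POST,PST',
    'LONG LINE STREET,LLS',
);

# Build the lookup hash; the split also swallows spaces around the comma.
my %lookup;
for my $pair (@pairs) {
    my ( $key, $value ) = split /\s*,\s*/, $pair, 2;
    $lookup{$key} = $value;
}

# Longest keys first, so multi-word keys win over their shorter prefixes.
my $keywords = join '|', map { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %lookup;
my $keywords_rx = qr/\b(?:$keywords)\b/;

# Hypothetical helper: replace every known key in one line of text.
sub abbreviate {
    my ($line) = @_;
    $line =~ s/($keywords_rx)/$lookup{$1}/g;
    return $line;
}

# Input lines copied from the question (normally read from the data file).
my @input = (
    'ARIJIT, 192 POST OFFICE, SUITE',
    'CHANDA, 13 HP STREET, NY',
    'RAM, POSTING POST, LONG LINE STREET',
    'ROY, POST 3009, SUITE ACCESS ROOM',
);

print abbreviate($_), "\n" for @input;
```

Note that the lookup file maps NEW YORK to NY, while the expected output shows NY expanded to NEW YORK, so with the data exactly as posted the second line keeps NY unchanged. Reversing that pair in the lookup file would be needed to get the expansion.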