Perl regular expression for repeated sentences

Time: 2014-03-16 23:29:32

Tags: perl pcre

I'm looking for a regular expression that matches repeated patterns.

For example:

The great eagle flied high flied high.  

Repeated: flied high

The call was done at night was done at night.  

Repeated: was done at night

Is there a way to achieve this? I only want the regular expression, so that I can use grep -P to filter some files.

Found 5 files under folders: home folder, home folder, home folder, home folder, home folder  

Repeated: home folder

The query returned this preferences for this user: color black, fried chicken, color black, fried chicken, white shirt, brown color

Repeated: color black,

Essentially, what I want to do is find and match "repeated sentences".

2 answers:

Answer 0: (score: 1)

You haven't defined your problem very well. As it stands, you could write

my $s = 'The great eagle flied high flied high.';
print qq{"$1"\n} if $s =~ /(.+)\1/;

Output

" flied high"

However, if you apply the same pattern to the second string

my $s = 'The call was done at night was done at night.';
print qq{"$1"\n} if $s =~ /(.+)\1/;

Output

"l"

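The pattern matches the doubled "l" in "call" because that is the leftmost place where any text is immediately repeated. So the solution depends on the data set you have. If you can define the problem more rigorously, we can help you better.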
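One possible way to tighten the definition, sketched here as an assumption rather than a definitive answer, is to require that the repeated text consist of whole words and that the two copies be separated by non-word characters. The helper names naive_repeat and word_repeat are hypothetical and only serve to compare the two patterns on the strings above.

```perl
use strict;
use warnings;

# Naive pattern from the answer: any run of characters that is
# immediately followed by a copy of itself.
sub naive_repeat {
    my ($s) = @_;
    return $s =~ /(.+)\1/ ? $1 : undef;
}

# Stricter variant (an assumption about what "repeated sentence" means):
# the phrase starts and ends on a word boundary, is made of whole words,
# and the copy follows after at least one non-word character.
sub word_repeat {
    my ($s) = @_;
    return $s =~ /\b(\w+(?:\W+\w+)*?)\W+\1\b/ ? $1 : undef;
}

for my $s (
    'The great eagle flied high flied high.',
    'The call was done at night was done at night.',
) {
    printf qq{naive: "%s"  word-based: "%s"\n},
        naive_repeat($s), word_repeat($s);
}
```

On these two strings the word-based variant captures "flied high" and "was done at night", while the naive one still yields " flied high" and "l". Whether this definition fits longer lists, such as the comma-separated examples in the question, still depends on what should count as one repeated unit.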

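Answer 1: (score: 0)

Yes, just use \1 in the regular expression to refer back to the text matched by the first capture group. I deliberately limited this regex to phrases of 2-4 words to limit how much work it has to do: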

#!/usr/bin/perl

use strict;
use warnings;

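# Read each line of the sample data below.
while (<DATA>) {
    # Capture a phrase of 2-4 whitespace-separated words that is followed
    # by whitespace and an exact copy of itself (\1).
    if (my @phrases = /\b(\S+(?:\s+\S+){1,3})\s+\1/g) {
        print "$_\n" for @phrases;
    }
}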

__DATA__
The great eagle flied high flied high.
The call was done at night was done at night.

Output

flied high
was done at night
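Since the question is about filtering files, the same idea can be turned into a grep-like filter. The following is a minimal sketch under stated assumptions: repeat_pattern and lines_with_repeats are hypothetical helper names, the word-count limits are parameters added for illustration, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Build a pattern for a phrase of $min..$max whitespace-separated words
# that is immediately repeated. The first word is matched by \S+, so the
# quantifier counts only the additional words.
sub repeat_pattern {
    my ($min, $max) = @_;
    my $quant = sprintf '{%d,%d}', $min - 1, $max - 1;
    return qr/\b(\S+(?:\s+\S+)$quant)\s+\1/;
}

# Return every line read from $fh that contains a repeated phrase.
sub lines_with_repeats {
    my ($fh, $re) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, $line if $line =~ $re;
    }
    return @hits;
}

my $text = <<'END';
The great eagle flied high flied high.
Nothing is repeated in this line.
The call was done at night was done at night.
END

# An in-memory filehandle keeps the example self-contained.
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
print "$_\n" for lines_with_repeats($fh, repeat_pattern(2, 4));
close $fh;
```

This prints the first and third lines and skips the second. The same pattern should also work directly on the command line, for example grep -P '\b(\S+(?:\s+\S+){1,3})\s+\1' file.txt, assuming a grep built with PCRE support (file.txt is a placeholder name).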