Question

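I have a huge file, and I want to delete duplicate lines, but only those that occur within each group of 3 lines. Can this be done with sed or a similar command?

My file looks like this: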

this is text

1234

1234

this is another text

5678

5678

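The second number is a copy of the first, and I want to delete that second number (the third line) in every group of 3 lines in the file. The reason I am not using less filename | uniq is that the numbers may repeat elsewhere in the file (outside the 3-line range), and I do not want those removed.

Thanks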

Answer 1

Would this solve your problem?

$ awk 'NR%3!=0' input
this is text


1234
this is another text


5678

Using sed (the first~step address form is a GNU extension):

$ sed '0~3d' input
this is text


1234
this is another text


5678

With Perl:

$ perl -n -e '$.%3!=0&&print' input
this is text


1234
this is another text


5678

But then again, I may have misunderstood the question...
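Note that all three commands above select lines purely by position: every third line is removed whether or not it repeats the line before it. A minimal sketch of that behaviour, assuming the real file has no blank separator lines (the helper name drop_every_third and the sample data are made up for illustration):

```shell
# Hypothetical helper: drop every third line by position, without
# looking at the content of the line.
drop_every_third() {
    awk 'NR % 3 != 0' "$@"
}

# Third line repeats the second: the duplicate is removed, as wanted.
printf '%s\n' 'this is text' 1234 1234 | drop_every_third

# Third line differs from the second: it is removed anyway.
printf '%s\n' foo bar qux | drop_every_third
```

If the third line of a group can ever differ from the second, a content-aware approach is probably the safer choice.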

Answer 2

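The uniq utility only filters out adjacent duplicate lines (does your input really have blank lines between every line?). If it does not, uniq may be enough. Given this input: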

this is text
1234
1234
this is another text
1234
1234

uniq input.txt gives:

this is text
1234
this is another text
1234
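One case where plain uniq might not match the 3-line rule is when the last line of one group happens to equal the first line of the next, because uniq has no notion of groups. A small sketch with made-up data (the sample function is only there to generate the input):

```shell
# Hypothetical data: group 1 is (first record, 1234, 1234),
# group 2 is (1234, 5678, 5678). The first line of group 2 happens
# to equal the lines just before it.
sample() {
    printf '%s\n' 'first record' 1234 1234 1234 5678 5678
}

# uniq collapses all three adjacent 1234 lines into one, so the first
# line of group 2 is lost along with the intended duplicate.
sample | uniq
```

Here uniq leaves three lines, whereas removing only the third line of each group would leave four. Whether this matters depends on whether such collisions can occur in the real file.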

Answer 3

This might work for you (GNU sed):

sed -r 'n;$!N;s/^([^\n]*)\n\1$/\1/' file

This prints the first line of each group of three, then deletes the third line if it duplicates the second.
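To see what each part of the sed command does, here is a small walk-through on made-up input. It assumes GNU sed and input with no blank separator lines; the wrapper name dedupe_triples is hypothetical:

```shell
# Assumes GNU sed (for -r and for \n inside the bracket expression).
#   n    print line 1 of the group, then load line 2
#   $!N  append line 3 to the pattern space (skipped on the last line)
#   s    if the two lines in the pattern space are identical, keep one
dedupe_triples() {
    sed -r 'n;$!N;s/^([^\n]*)\n\1$/\1/' "$@"
}

# First group has a duplicate third line, second group does not.
printf '%s\n' 'this is text' 1234 1234 foo bar qux | dedupe_triples
```

Unlike a purely positional delete, the third line is kept when it differs from the second, so qux survives in the second group.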

Answer 4

Assuming your input really looks like this, and this is the output you want:

awk 'NR%3 == 2 {val=$0} NR%3 == 0 && $0 == val {next} 1' <<END
this is text
1234
1234
this is another text
5678
5678
foo
bar
qux
END

this is text
1234
this is another text
5678
foo
bar
qux
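If the file really does have a blank line between every line, as the sample in the question suggests, the NR-based grouping above would no longer line up. One possible variant, assuming the blank lines are unwanted and can be dropped from the output (the function name and the counter n are made up here):

```shell
# Hypothetical variant: ignore blank lines, then drop line 3 of each
# group of three when it repeats line 2. Blank lines are NOT preserved.
dedupe_triples_skip_blank() {
    awk '
        !NF { next }                      # skip blank or whitespace-only lines
        { n++ }                           # count only the lines that remain
        n % 3 == 2 { val = $0 }           # remember line 2 of the group
        n % 3 == 0 && $0 == val { next }  # drop line 3 if it repeats line 2
        { print }
    ' "$@"
}

printf '%s\n' 'this is text' '' 1234 '' 1234 '' \
    'this is another text' '' 5678 '' 5678 | dedupe_triples_skip_blank
```

Counting with a separate variable instead of NR keeps the groups aligned once the blank lines are skipped.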