Question

I have two files, and I need to remove the data they do not have in common:

a.txt:

hello world 
tom tom 
super hero

b.txt:

hello dolly 1
tom sawyer 2
miss sunshine 3
super man 4

I tried:

grep -f a.txt b.txt >> c.txt

and:

awk '{print $1}' test1.txt

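because I only need to check whether the first word of a line appears in both files (even if not on the same line number).

But what is the best way to get the following output in a new file?

Output c.txt: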

hello dolly 1
tom sawyer 2
super man 4

Answer 1

Use awk and let it iterate over both files:

$ awk 'NR == FNR { a[$1] = 1; next } a[$1]' a.txt b.txt
hello dolly 1
tom sawyer 2
super man 4

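NR == FNR is true only while awk reads the first file, so { a[$1] = 1; next } runs only on that file. For the second file, the bare pattern a[$1] is true, and the line is printed, only when its first word was recorded from the first file.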
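As a rough sketch of the same idea written out with comments, here is a version that also writes the result to c.txt as the question asks. The scratch directory and the array name seen are my own choices for illustration, and `$1 in seen` is a swapped-in equivalent of a[$1] that avoids creating empty array entries.

```shell
# Recreate the sample files from the question in a scratch directory
# (the scratch directory is an assumption, used so nothing gets overwritten).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' 'hello world' 'tom tom' 'super hero' > a.txt
printf '%s\n' 'hello dolly 1' 'tom sawyer 2' 'miss sunshine 3' 'super man 4' > b.txt

awk '
    # NR counts lines across all files; FNR restarts for each file,
    # so NR == FNR holds only while reading the first file (a.txt).
    NR == FNR { seen[$1] = 1; next }
    # For b.txt: print the line if its first word was recorded from a.txt.
    $1 in seen
' a.txt b.txt > c.txt

cat c.txt
```

One thing to keep in mind: if a.txt is empty, NR == FNR stays true while reading b.txt as well, so nothing is printed.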

Answer 2

Use sed to generate a sed script from the input, then use a second sed to execute it.

sed 's=^=/^=;s= .*= /p=' a.txt | sed -nf- b.txt

The first sed turns your a.txt into

/^hello /p
/^tom /p
/^super /p

This script prints (p) every line that starts (^) with hello, tom, or super followed by a space.
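If your sed does not accept a script on stdin via -f -, the two stages can be split with an intermediate file. This is only a sketch: the file name filter.sed and the scratch directory are assumptions made for illustration.

```shell
# Recreate the sample files from the question in a scratch directory (an assumption).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' 'hello world' 'tom tom' 'super hero' > a.txt
printf '%s\n' 'hello dolly 1' 'tom sawyer 2' 'miss sunshine 3' 'super man 4' > b.txt

# Stage 1: turn each line of a.txt into a sed command of the form /^word /p.
# filter.sed is a hypothetical name for the generated script.
sed 's=^=/^=;s= .*= /p=' a.txt > filter.sed
cat filter.sed

# Stage 2: run the generated script against b.txt.
# -n suppresses the default output, so only lines hit by a p command appear.
sed -n -f filter.sed b.txt > c.txt
cat c.txt
```

Note that the generator assumes every line of a.txt contains at least one space. A single-word line would produce an unterminated address such as /^word, and the second sed would reject the script.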

Answer 3

This combines grep, cut, and sed with process substitution:

$ grep -f <(cut -d ' ' -f 1 a.txt | sed 's/^/^/;s/$/ /') b.txt
hello dolly 1
tom sawyer 2
super man 4

The output of the process substitution looks like this (piped to cat -A to make the trailing spaces visible):

$ cut -d ' ' -f 1 a.txt | sed 's/^/^/;s/$/ /' | cat -A
^hello $
^tom $
^super $

We then use this as the input for grep -f, which produces the output shown above.

If your shell does not support process substitution, but your grep can read patterns from stdin with the -f option (it should), you can use this instead:

$ cut -d ' ' -f 1 a.txt | sed 's/^/^/;s/$/ /' | grep -f - b.txt
hello dolly 1
tom sawyer 2
super man 4
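To see why the patterns end in a space, here is a small comparison. It is a sketch only: the line "tomato soup 5" is a hypothetical addition to b.txt and not part of the question's data, and the scratch directory and variable names are my own.

```shell
# Sample files in a scratch directory (an assumption).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' 'hello world' 'tom tom' 'super hero' > a.txt
# "tomato soup 5" is a hypothetical extra line, not part of the question data.
printf '%s\n' 'hello dolly 1' 'tom sawyer 2' 'tomato soup 5' 'super man 4' > b.txt

# Patterns anchored at the start only: ^tom also matches "tomato".
without_space=$(cut -d ' ' -f 1 a.txt | sed 's/^/^/' | grep -f - b.txt)

# Patterns anchored at the start and ended by a space: "^tom " needs the whole word.
with_space=$(cut -d ' ' -f 1 a.txt | sed 's/^/^/;s/$/ /' | grep -f - b.txt)

printf 'without trailing space:\n%s\n' "$without_space"
printf 'with trailing space:\n%s\n' "$with_space"
```

Without the trailing space the tomato line slips through, because ^tom only requires the line to begin with those three letters.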