Question

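I have two txt files, each with its information aligned in several tab-separated columns. All I want to do is find the lines in the two files where one of those columns matches. Not the whole line: only the first column should be the same. How can I do this in a bash script?

I tried using grep -Fwf.

This is what the files look like: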

aaaa   bbbb
cccc   dddd

and

aaaa   eeee
ffff   gggg

The output I would like to get is something like this:

bbbb and eeee match

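I really haven't found a command that can compare line by line and word by word at the same time. Sorry for not providing code of my own; I'm new to programming and haven't come up with anything reasonable so far. Thanks in advance!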

Answer 1

Have you looked at the join command? Combined with sort, it may be what you are looking for. https://shapeshed.com/unix-join/

For example:

$ cat a
aaaa   bbbb
cccc   dddd
$ cat b
aaaa   eeee
ffff   gggg
$ join a b
aaaa bbbb eeee

If the values in the first column are not sorted, you have to sort them first, otherwise join will not work.

join <(sort a) <(sort b)
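
To get from join's output to the sentence asked for in the question, the joined line can be reformatted with awk. The following is a minimal sketch, not a definitive solution: the temporary directory, the file names file1 and file2, and the variable name result are assumptions made for illustration, and it uses plain temporary files instead of process substitution so that it also runs in a POSIX sh.

```shell
# Hypothetical sample data, written with real tab characters.
tmp=$(mktemp -d)
printf 'aaaa\tbbbb\ncccc\tdddd\n' > "$tmp/file1"
printf 'aaaa\teeee\nffff\tgggg\n' > "$tmp/file2"

# join expects both inputs to be sorted on the join field (column 1).
sort -k 1,1 "$tmp/file1" > "$tmp/file1.sorted"
sort -k 1,1 "$tmp/file2" > "$tmp/file2.sorted"

# join prints the key, then the remaining fields of each file.
# awk turns "aaaa bbbb eeee" into the requested sentence.
result=$(join "$tmp/file1.sorted" "$tmp/file2.sorted" |
    awk '{ printf "%s and %s match\n", $2, $3 }')
printf '%s\n' "$result"
```

With the sample data from the question, only the key aaaa appears in both files, so a single line is printed.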

Kind regards, Oliver

Answer 2

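There are different kinds of comparison and different tools for them:

diff
cmp
comm
...

All of these commands have options that change how the comparison works.

For each command you can filter the input first. For example: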

# remove comments before comparison
diff <( grep -v ^# file1) <( grep -v ^# file2)

Without a concrete example, it is not possible to be more precise.
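
Applied to the files from the question, the same filter-then-compare idea might look like the sketch below. It uses comm rather than diff, and it assumes the files are tab-separated; the file names and the variable name common are made up for illustration.

```shell
# Hypothetical sample data, written with real tab characters.
tmp=$(mktemp -d)
printf 'aaaa\tbbbb\ncccc\tdddd\n' > "$tmp/file1"
printf 'aaaa\teeee\nffff\tgggg\n' > "$tmp/file2"

# Filter step: keep only column 1 (cut splits on tabs by default)
# and sort it, because comm requires sorted input.
cut -f 1 "$tmp/file1" | sort > "$tmp/keys1"
cut -f 1 "$tmp/file2" | sort > "$tmp/keys2"

# -1 and -2 suppress the lines unique to each file,
# leaving only the keys that occur in both.
common=$(comm -12 "$tmp/keys1" "$tmp/keys2")
printf '%s\n' "$common"
```

This only reports which first-column values are shared; the second columns still have to be looked up afterwards, for example with join or awk.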

Answer 3

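Assuming your tab-separated files keep a consistent structure, this should work:

diff <(awk '{print $2}' f1) <(awk '{print $2}' f2)
# File names: f1, f2
# Column: 2nd column.

The command prints the differences when the columns differ, and prints nothing when they are identical.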

I tried @Wiimm's answer, but it did not work for me.
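
In a script, the exit status of diff is often more convenient than its output: it is 0 when the inputs are identical and 1 when they differ. A minimal sketch of that idea follows, comparing the first column since that is the one the question is about. The temporary files and the variable name status are assumptions for illustration.

```shell
# Hypothetical sample data, written with real tab characters.
tmp=$(mktemp -d)
printf 'aaaa\tbbbb\ncccc\tdddd\n' > "$tmp/f1"
printf 'aaaa\teeee\nffff\tgggg\n' > "$tmp/f2"

# Extract column 1 of each file into its own temporary file.
awk '{ print $1 }' "$tmp/f1" > "$tmp/col1"
awk '{ print $1 }' "$tmp/f2" > "$tmp/col2"

# Discard diff's output and branch on its exit status only.
if diff "$tmp/col1" "$tmp/col2" > /dev/null; then
    status="identical"
else
    status="different"
fi
printf '%s\n' "$status"
```

Note that this only says whether the two columns are the same as a whole; it does not list the individual lines that match.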

Answer 4

You can use awk, like this:

awk 'NR==FNR{a[NR]=$1;b[NR]=$2;next}
     a[FNR]==$1{printf "%s and %s match\n", b[FNR], $2}' file1 file2

Output:

bbbb and eeee match

Explanation (the same code split across multiple lines):

# As long as we are reading file1, the overall record
# number NR is the same as the record number in the
# current input file FNR
NR==FNR{
    # Store column 1 and 2 in arrays called a and b
    # indexed by the record number
    a[NR]=$1
    b[NR]=$2
    next # Do not process more actions for file1
}

# The following code gets only executed when we read
# file2 because of the above _next_ statement

# Check if column 1 in file1 is the same as in file2
# for this line
a[FNR]==$1{
    printf "%s and %s match\n", b[FNR], $2
}
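
The script above compares the files line by line, so a key only matches when it sits on the same line number in both files. If matches should be found regardless of position, one possible variant indexes the array by the first-column value instead. This is a sketch under assumptions, not part of the original answer: it assumes the keys in file1 are unique and the files are tab-separated, and the array name second, the variable name matches, and the file names are illustrative.

```shell
# Hypothetical sample data; the matching key is deliberately on
# different line numbers in the two files.
tmp=$(mktemp -d)
printf 'cccc\tdddd\naaaa\tbbbb\n' > "$tmp/file1"
printf 'aaaa\teeee\nffff\tgggg\n' > "$tmp/file2"

matches=$(awk -F '\t' '
    # While reading file1: remember column 2 under its column 1 key.
    NR == FNR { second[$1] = $2; next }
    # While reading file2: report keys that were seen in file1.
    $1 in second { printf "%s and %s match\n", second[$1], $2 }
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$matches"
```

Unlike the join approach, this does not need sorted input, at the cost of holding all of file1's keys in memory.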

How do I compare parts of lines in two txt files?
