使用gawk / awk / sed从多个文件中删除行

时间:2012-05-13 10:41:47

标签: sed awk gawk

我有两组文本文件。第一组是在AA文件夹中。第二组是在BB文件夹中。第一组(AA文件夹)中ff.txt文件的内容如下所示。

Name        number     marks
john            1         60
maria           2         54
samuel          3         62
ben             4         63

如果标记> 60,我想从此文件中打印第二列(数字)。输出将是3,4。接下来,读取BB文件夹中的ff.txt文件并删除包含数字3,4的行。

BB文件夹中的

文件如下所示。第二列是数字。

 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       1      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18

我使用了以下代码。这段代码适用于一个文件。

gawk 'BEGIN {getline} $3>60{print $2}' AA/ff.txt | while read number; do gawk -v number=$number '$2 != number' BB/ff.txt > /tmp/ff.txt; mv /tmp/ff.txt BB/ff.txt; done

但是当我使用多个文件运行此代码时,我收到错误。

gawk 'BEGIN {getline} $3>60{print $2}' AA/*.txt | while read number; do gawk -v number=$number '$2 != number' BB/*.txt > /tmp/*.txt; mv /tmp/*.txt BB/*.txt; done

error:-
mv: target `BB/kk.txt' is not a directory

两天前我问过这个问题。请帮我解决这个错误。

3 个答案:

答案 0 :(得分:1)

> /tmp/*.txtmv /tmp/*.txt BB/*.txt错误。


对于单个文件

awk 'NR>1 && $3>60{print $2}' AA/ff.txt > idx.txt

awk 'NR==FNR{a[$0]; next}; !($2 in a)' idx.txt BB/ff.txt

对于多个文件

awk 'FNR>1 && $3>60{print $2}' AA/*.txt >idx.txt

cat BB/*.txt | awk 'NR==FNR{a[$0]; next}; !($2 in a)' idx.txt -

答案 1 :(得分:1)

这将创建文件夹AA中所有文件的索引,并检查文件夹BB中的所有文件:

cat AA/*.txt | awk 'FNR==NR { if ($3 > 60) array[$2]; next } !($2 in array)' - BB/*.txt

这会比较两个单独的文件,假设它们在文件夹AABB中具有相同的名称:

ls AA/*.txt | sed "s%AA/\(.*\)%awk 'FNR==NR { if (\$3 > 60) array[\$2]; next } !(\$2 in array)' & BB/\1 %" | sh

HTH

修改

这应该有帮助: - )

ls AA/*.txt | sed "s%AA/\(.*\)%awk 'FNR==NR { if (\$3 > 60) array[\$2]; next } !(\$2 in array)' & BB/\1 > \1_tmp \&\& mv \1_tmp BB/\1 %" | sh

答案 2 :(得分:0)

一个perl解决方案:

use warnings;
use strict;
use File::Spec;

## Hash to save data to delete from files of BB folder.
## key -> file name.
## value -> string with numbers of second column. They will be
## joined separated with '-...-', like: -2--3--1-. And it will be easier to
## search for them using a regexp.
my %delete;

## Check arguments:
## 1.- They are two.
## 2.- Both are directories.
## 3.- Both have same number of regular files and with identical names.
die qq[Usage: perl $0 <dir_AA> <dir_BB>\n] if
        @ARGV != 2 ||
        grep { ! -d } @ARGV;

{
        my %h;
        for ( glob join q[ ], map { qq[$_/*] } @ARGV ) {
                next unless -f;
                my $file = ( File::Spec->splitpath( $_ ) )[2] or next;
                $h{ $file }++;
        }

        for ( values %h ) {
                if ( $_ != 2 ) {
                        die qq[Different files in both directories\n];
                }
        }
}

## Get files from dir 'AA'. Process them, print to output lines which 
## matches condition and save the information in the %delete hash.
for my $file ( glob( shift . qq[/*] ) ) {
        open my $fh, q[<], $file or do { warn qq[Couldn't open file $file\n]; next };
        $file = ( File::Spec->splitpath( $file ) )[2] or do { 
                warn qq[Couldn't get file name from path\n]; next };
        while ( <$fh> ) {
                next if $. == 1;
                chomp;
                my @f = split;
                next unless @f >= 3;
                if ( $f[ $#f ] > 60 ) {
                        $delete{ $file } .= qq/-$f[1]-/;
                        printf qq[%s\n], $_;
                }
        }
}

## Process files found in dir 'BB'. For each line, print it if not found in
## file from dir 'AA'.
{
        @ARGV  = glob( shift . qq[/*] );
        $^I = q[.bak];
        while ( <> ) {

                ## Sanity check. Shouldn't occur.
                my $filename = ( File::Spec->splitpath( $ARGV ) )[2];
                if ( ! exists $delete{ $filename } ) {
                        close ARGV;
                        next;
                }

                chomp;
                my @f = split;
                if ( $delete{ $filename } =~ m/-$f[1]-/ ) {
                        next;
                }

                printf qq[%s\n], $_;
        }
}

exit 0;

测试

假设下一个文件树。命令:

ls -R1

输出:

.:
AA
BB
script.pl

./AA:
ff.txt
gg.txt

./BB:
ff.txt
gg.txt

下一个文件内容。命令:

head AA/*

输出:

==> AA/ff.txt <==
Name        number     marks
john            1         60
maria           2         54
samuel          3         62
ben             4         63
==> AA/gg.txt <==
Name        number     marks
john            1         70
maria           2         54
samuel          3         42
ben             4         33

命令:

head BB/*

输出:

==> BB/ff.txt <==
 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       1      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18
==> BB/gg.txt <==
 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       2      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18

运行如下脚本:

perl script.pl AA/ BB

随后输出到屏幕:

samuel          3         62
ben             4         63
john            1         70

BB目录的文件修改如下:

head BB/*

输出:

==> BB/ff.txt <==
 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       1      13.058  24.521  40.718  1.00 11.82

==> BB/gg.txt <==
 marks       2      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18

因此,来自ff.txt行的数字34已被删除,而1中的行号gg.txt已全部删除大于最后一列中的60。我想这就是你想要实现的目标。我希望它有所帮助,尽管不是awk