Question

I have an input file named part2.txt containing several thousand lines of input such as:

   46742       1   48276   48343   48199   48198
   46744       1   48343   48344   48200   48199
   46746       1   48344   48332   48201   48200
   48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02
   48282  3.67231688e+01 -2.97771595e+00  1.50419488e+02
   48285  3.58558188e+01 -1.98122787e+00  1.50894850e+02
   48287  3.67678239e+01 -1.98150619e+00  1.50432492e+02

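I need to change every integer in the second column to the number in the file name (part2.txt), so that every integer 1 becomes 2. The value is not necessarily 1, it could be any other integer, and it is not just 3 lines, it may be thousands. The result should be: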

   46742       2   48276   48343   48199   48198
   46744       2   48343   48344   48200   48199
   46746       2   48344   48332   48201   48200
   48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02
   48282  3.67231688e+01 -2.97771595e+00  1.50419488e+02
   48285  3.58558188e+01 -1.98122787e+00  1.50894850e+02
   48287  3.67678239e+01 -1.98150619e+00  1.50432492e+02

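Note that all columns are separated by spaces, and there are also some spaces to the left of the first column. I tried doing this with FNR, but it was not robust. I am looking for a way to do it with sed or awk on Linux.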

Answer 1

Using gawk (for RT), keeping the formatting as intact as possible:

$ gawk -v RS='\\s+' 'NR == 1 { n = FILENAME; gsub(/[^0-9]/, "", n) } NR % 6 == 3 && int($0) == $0 { $0 = n } { printf "%s%s", $0, RT }' part2.txt
   46742       2   48276   48343   48199   48198
   46744       2   48343   48344   48200   48199
   46746       2   48344   48332   48201   48200
   48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02
   48282  3.67231688e+01 -2.97771595e+00  1.50419488e+02
   48285  3.58558188e+01 -1.98122787e+00  1.50894850e+02
   48287  3.67678239e+01 -1.98150619e+00  1.50432492e+02

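With RS set to \s+, every field becomes a record of its own, and the whitespace that follows each record is stored in RT, which we use later when printing. The code is: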

NR == 1 {                      # First record of the file:
  n = FILENAME                 # isolate the number from the file name
  gsub(/[^0-9]/, "", n) 
}
NR % 6 == 3 && int($0) == $0 { # after that: For every sixth record, if it
                               # is an integer,
  $0 = n                       # replace it with the isolated number.
                               # it is NR % 6 == 3 instead of == 2 because
                               # the file begins with whitespaces that our
                               # RS matches, so the first record is an empty
                               # one and the first row in the first column
                               # is the second record.
}
{ printf "%s%s", $0, RT }      # after that: print everything separated by the
                               # remembered record terminators.
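
To see why the test is NR % 6 == 3, it can help to print the record numbers directly. The following is a minimal sketch, assuming gawk is installed and using a hypothetical three-line sample in the question's layout; the variable names sample and numbered are illustrative only.

```shell
# Hypothetical three-line sample in the layout shown in the question.
sample='   46742       1   48276   48343   48199   48198
   46744       1   48343   48344   48200   48199
   48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02'

# With a whitespace regex as RS (a gawk feature), every field is its own record.
# Print each record number next to the record text.
numbered=$(printf '%s\n' "$sample" | gawk -v RS='\\s+' '{ printf "%d:[%s]\n", NR, $0 }')

# Show the first few records: the empty leading record comes first.
printf '%s\n' "$numbered" | head -n 4
```

The second-column value 1 lands on records 3 and 9 in this sample. This numbering holds only while every preceding row has six fields; after a four-field row, later six-field rows may no longer line up with NR % 6 == 3, so the approach appears to depend on the row order shown in the question.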

Answer 2

You can play with FILENAME using a function like the following:

awk 'function name(file) {
        gsub(/[^0-9]+/, "", file)   # strip everything that is not a digit
        return file
     }
     {digits = name(FILENAME)}
     $2 ~ /^[0-9]+$/ {$2=digits}    # + rather than *, so an empty $2 never matches
     1' a2

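I don't understand why I can't call the function inside BEGIN{}; I guess it is because the file name is not available yet at that point. The downside is that the function gets called for every line. We could set a flag to avoid that, but I'll leave it as an exercise :)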
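
The flag mentioned above could be sketched with an FNR == 1 guard, so that the digits are computed once per input file rather than once per line. This is only an illustration under stated assumptions: the file a2.txt and its two lines are hypothetical, and choosing FNR over NR is a design choice made here so the value is refreshed when several files are passed.

```shell
workdir=$(mktemp -d)
# Hypothetical two-line input: one integer second column, one float second column.
printf '%s\n' '46742 1 48276' '48283 3.58e+01 -2.97e+00' > "$workdir/a2.txt"

# Run from inside the directory so FILENAME is just "a2.txt".
once=$(cd "$workdir" && awk '
    FNR == 1 {                      # first line of each input file
        digits = FILENAME
        gsub(/[^0-9]+/, "", digits) # keep only the digits of the name
    }
    $2 ~ /^[0-9]+$/ { $2 = digits } # assigning a field rebuilds $0 with single spaces
    1' a2.txt)

printf '%s\n' "$once"
rm -rf "$workdir"
```

As with the function-based version, assigning to $2 rejoins the fields with single spaces, so the original column spacing is not preserved on the lines that change.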

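Update: I don't know what I was missing that led me to write that function, since this works fine:

awk '{digits = FILENAME; gsub(/[^0-9]+/, "", digits) } $2 ~ /^[0-9]+$/ {match($0, /^[[:space:]]*[^[:space:]]+[[:space:]]+/); $0 = substr($0, 1, RLENGTH) digits substr($0, RLENGTH + length($2) + 1)}1' a2.txt

To avoid computing digits for every line, you can use the NR==1{} trick (taken from Wintermute's answer, +1).

Test

$ awk '{digits = FILENAME; gsub(/[^0-9]+/, "", digits) } $2 ~ /^[0-9]+$/ {match($0, /^[[:space:]]*[^[:space:]]+[[:space:]]+/); $0 = substr($0, 1, RLENGTH) digits substr($0, RLENGTH + length($2) + 1)}1' a2.txt
46742       2   48276   48343   48199   48198
46744       2   48343   48344   48200   48199
46746       2   48344   48332   48201   48200
465645       2   48566   48234  45201   48435
48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02
48282  3.67231688e+01 -2.97771595e+00  1.50419488e+02
48285  3.58558188e+01 -1.98122787e+00  1.50894850e+02
48287  3.67678239e+01 -1.98150619e+00  1.50432492e+02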
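
One caveat for the FILENAME-based variants: FILENAME holds the path exactly as it was given on the command line, so digits in a directory name would end up in the result. A minimal sketch of the difference, assuming a hypothetical directory run3/ that contains a2.txt (the names naive and base are illustrative):

```shell
workdir=$(mktemp -d)
mkdir "$workdir/run3"
printf '46742       1   48276\n' > "$workdir/run3/a2.txt"

# Compare digits taken from the whole path with digits taken from the base name.
report=$(cd "$workdir" && awk '
    FNR == 1 {
        naive = FILENAME
        gsub(/[^0-9]+/, "", naive)   # digits from the whole path
        base = FILENAME
        sub(/.*\//, "", base)        # drop any leading directories
        gsub(/[^0-9]+/, "", base)    # digits from the file name only
        print "naive=" naive " basename=" base
    }' run3/a2.txt)

printf '%s\n' "$report"
rm -rf "$workdir"
```

Stripping everything up to the last slash before removing the non-digits is one way to keep directory names out of the extracted number.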

Answer 3

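This can be done with a combination of sed and shell variables. Here are three scenarios, each of which should work as you expect. Also, if you want to change the file in place, you can use sed -i instead of sed.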

If you know the file's number, this will work, assuming $n holds the file number (e.g., for part2.txt, n=2):

n=2; sed 's:^\(\s*[0-9]\+\s\+\)\([0-9]\+\)\(\s\):\1'"$n"'\3:' part$n.txt

Otherwise, if your file name with its .txt extension is stored in $f (e.g., f=part2.txt), then this should work:

f=part2.txt; n=$(sed 's:^\(.*[^0-9]\|\)\([0-9]\+\)\.txt:\2:' <<<"$f"); sed 's:^\(\s*[0-9]\+\s\+\)\([0-9]\+\)\(\s\):\1'"$n"'\3:' "$f"

If you are using sh or an old version of bash, the snippet above may fail. In that case, you can try the following. It is slightly longer because it uses neither $(...) nor <<<.

f=part2.txt; n=`echo "$f" | sed 's:^\(.*[^0-9]\|\)\([0-9]\+\)\.txt:\2:'`; sed 's:^\(\s*[0-9]\+\s\+\)\([0-9]\+\)\(\s\):\1'"$n"'\3:' "$f"
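
The same two sed commands could be wrapped in a loop to process several files in place. This is a rough sketch, assuming GNU sed (for \s, \+, \| and -i, as in the commands above) and two hypothetical files part2.txt and part7.txt created in a temporary directory:

```shell
workdir=$(mktemp -d)
# Hypothetical inputs in the question's layout.
printf '   46742       1   48276\n   48283  3.58e+01 -2.97e+00\n' > "$workdir/part2.txt"
printf '   46742       1   48276\n' > "$workdir/part7.txt"

for f in "$workdir"/part*.txt; do
    # Extract the digits that sit directly before ".txt".
    n=$(printf '%s\n' "$f" | sed 's:^\(.*[^0-9]\|\)\([0-9]\+\)\.txt:\2:')
    # Replace an integer second column with that number, editing the file in place.
    sed -i 's:^\(\s*[0-9]\+\s\+\)\([0-9]\+\)\(\s\):\1'"$n"'\3:' "$f"
done

result2=$(cat "$workdir/part2.txt")
result7=$(cat "$workdir/part7.txt")
printf '%s\n' "$result2" "$result7"
rm -rf "$workdir"
```

Because -i overwrites the files, it may be worth trying the loop on copies first.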

Answer 4

Using GNU awk for gensub():

$ cat tst.awk
{
    fmt = gensub(/(\s*\S+\s+)\S+/,"\\1%s",1,$0)"\n"
    printf fmt, ($2~/^[0-9]+$/ ? gensub(/[^0-9]/,"","g",FILENAME) : $2)
}
$
$ awk -f tst.awk part2.txt
   46742       2   48276   48343   48199   48198
   46744       2   48343   48344   48200   48199
   46746       2   48344   48332   48201   48200
   48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02
   48282  3.67231688e+01 -2.97771595e+00  1.50419488e+02
   48285  3.58558188e+01 -1.98122787e+00  1.50894850e+02
   48287  3.67678239e+01 -1.98150619e+00  1.50432492e+02

You can do the same in any awk with match() and substr().

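The above preserves the input spacing by turning each input line into a printf format string, replacing only the specific field you want to change with %s. It would fail if the input already contained printf formatting characters such as %s, but yours does not. If it did, you could probably work around it by adding a simple gsub(/%/,"%%") as the first line, which turns every % sign in each input line into a literal one.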

Here is a version that works in any POSIX awk:

$ cat tst.awk
{
    match($0,/[[:space:]]*[^[:space:]]+[[:space:]]+/)
    fmt = substr($0,1,RLENGTH) "%s" 
    match($0,/[[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+/)
    fmt = fmt substr($0,RLENGTH+1) "\n"
    num = FILENAME
    gsub(/[^0-9]/,"",num)
    printf fmt, ($2~/^[0-9]+$/ ? num : $2)
}
$ 
$ awk -f tst.awk part2.txt
   46742       2   48276   48343   48199   48198
   46744       2   48343   48344   48200   48199
   46746       2   48344   48332   48201   48200
   48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02
   48282  3.67231688e+01 -2.97771595e+00  1.50419488e+02
   48285  3.58558188e+01 -1.98122787e+00  1.50894850e+02
   48287  3.67678239e+01 -1.98150619e+00  1.50432492e+02
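
The % caveat mentioned for the gensub() version applies to this one as well, since it also uses the line as a printf format. Below is a small sketch of the suggested gsub(/%/,"%%") first line added to the POSIX script, assuming a hypothetical input line whose last column contains a percent sign:

```shell
workdir=$(mktemp -d)
# Hypothetical input whose last column contains a literal percent sign.
printf '   46742       1   48276   50%%\n' > "$workdir/part2.txt"

escaped=$(cd "$workdir" && awk '
{
    gsub(/%/, "%%")                 # make every % in the line a literal for printf
    match($0, /[[:space:]]*[^[:space:]]+[[:space:]]+/)
    fmt = substr($0, 1, RLENGTH) "%s"
    match($0, /[[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+/)
    fmt = fmt substr($0, RLENGTH + 1) "\n"
    num = FILENAME
    gsub(/[^0-9]/, "", num)
    printf fmt, ($2 ~ /^[0-9]+$/ ? num : $2)
}' part2.txt)

printf '%s\n' "$escaped"
rm -rf "$workdir"
```

This covers percent signs outside the second column; a non-integer second field that itself contains % would still be printed with the doubled sign, so the sketch is not a complete fix.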

Change a certain number in a file based on the file name in Linux
