Question

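I am trying to write a shell script to average several identically formatted files named file1, file2, file3 and so on.

In each file, the data is laid out as a table of, for example, 4 columns and 5 rows. Let's assume file1, file2 and file3 are in the same directory. What I want to do is create an average file that has the same format as file1/file2/file3 and contains the average of each element of the table. For example,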

{(Element in row 1, column 1 in file1)+
 (Element in row 1, column 1 in file2)+
 (Element in row 1, column 1 in file3)} >> 
(Element in row 1, column 1 in average file)

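Similarly, I need to do this for every element in the table, so the average file has the same number of elements as file1, file2 and file3.

I tried to write a shell script, but it doesn't work. What I want is to read the files in a loop, grep the same element from each file, add them up, divide the sum by the number of files and finally write the result out in the same file format. This is what I tried to write: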

#!/bin/bash       
s=0
for i in {1..5..1} do
    for j in {1..4..1} do
        for f in m* do
            a=$(awk 'FNR == i {print $j}' $f)
            echo $a
            s=$s+$a
            echo $f
        done
        avg=$s/3
        echo $avg > output
    done
done

Answer 1

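This is a pretty inefficient way of going about it: for every number you try to extract, you process one of the input files completely. Even though you only have three files, you end up processing 60 (5 rows × 4 columns × 3 files)!

Also, mixing Bash and awk in this way is a massive antipattern. This here is a great Q&A explaining why.

A few more remarks: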

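For brace expansion, the default step size is 1, so {1..4..1} is the same as {1..4}.
Awk has no clue what i and j are. As far as it is concerned, those were never defined. If you really wanted to get your shell variables into awk, you could do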
```
a=$(awk -v i="$i" -v j="$j" 'FNR == i { print $j }' $f)
```
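but the approach is not sound anyway.
Shell arithmetic does not work like s=$s+$a or avg=$s/3: these just concatenate strings. To have the shell do calculations for you, you need arithmetic expansion: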
```
s=$(( s + a ))
```
or, a little shorter,
```
(( s += a ))
```
and
```
avg=$(( s / 3 ))
```
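Notice that you don't need the $ signs in an arithmetic context.
echo $avg > output would print each number on a separate line, which is probably not what you want. In addition, > truncates the file on every iteration, so only the last value would remain; >> appends instead.
Indentation matters! If not for the machine, then for human readers.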
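
The remarks about string concatenation, arithmetic expansion and -v can be checked with a short sketch. The sample values and the temporary file are made up for illustration and are not part of the question.

```shell
#!/bin/bash

# Made-up sample values for illustration
s=5
a=7

# Plain assignment only concatenates strings
concat=$s+$a          # the literal string "5+7"

# Arithmetic expansion actually adds; no $ needed inside $(( ))
sum=$(( s + a ))      # 12

# Integer division truncates
avg=$(( sum / 3 ))    # 4

echo "$concat $sum $avg"

# Passing shell variables into awk with -v: pick row i, column j
tmp=$(mktemp)
printf '%s\n' '10 20 30' '40 50 60' > "$tmp"
i=2
j=3
cell=$(awk -v i="$i" -v j="$j" 'FNR == i { print $j }' "$tmp")
echo "$cell"
rm -f "$tmp"
```

This prints `5+7 12 4` followed by `60`, the element in row 2, column 3 of the sample table.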

Bash solution

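This solves the problem using Bash. It is hardcoded to three files, but flexible in the number of lines and elements per line. There are no checks to make sure that the number of elements is the same for all lines and files.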
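
If such a check is wanted, it could be sketched as a pre-flight step that runs before the averaging. The helper names `shape` and `same_shape` are hypothetical and not part of the original answer.

```shell
#!/bin/bash

# Hypothetical helper: print "rows cols" for a file, or fail if its
# lines do not all have the same number of fields
shape() {
    awk 'FNR == 1 { cols = NF }
         NF != cols { bad = 1 }
         END { if (bad) exit 1; print NR, cols }' "$1"
}

# Hypothetical helper: succeed only if all files have the same shape
same_shape() {
    local first current f
    first=$(shape "$1") || return 1
    for f in "$@"; do
        current=$(shape "$f") || return 1
        [[ $current == "$first" ]] || return 1
    done
}
```

A call such as `same_shape file1 file2 file3 || exit 1` could then guard the averaging script.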

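Notice that Bash is not fast at this kind of thing and should only be used for small files, if at all. Also, it uses integer arithmetic, so the "average" of 3 and 4 would become 3.

I've added comments to explain what happens.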

#!/bin/bash

# Read a line from the first file into array arr1
while read -a arr1; do

    # Read a line from the second file at file descriptor 3 into array arr2
    read -a arr2 <&3

    # Read a line from the third file at file descriptor 4 into array arr3
    read -a arr3 <&4

    # Loop over elements
    for (( i = 0; i < ${#arr1[@]}; ++i )); do

        # Calculate average of element across files, assign to res array
        res[i]=$(( (arr1[i] + arr2[i] + arr3[i]) / 3 ))
    done

    # Print res array
    echo "${res[@]}"

# Read from files supplied as arguments
# Input for the second and third file is redirected to file descriptors 3 and 4
# to enable looping over multiple files concurrently
done < "$1" 3< "$2" 4< "$3"

This has to be called like this:

./bashsolution file1 file2 file3

The output can be redirected as desired.
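
The integer truncation noted earlier can be seen directly, along with two possible workarounds. This is only a sketch: the rounding trick assumes non-negative values, and the variable names are made up for illustration.

```shell
#!/bin/bash

# Truncating integer average, as in the script above
a=3
b=4
n=2
truncated=$(( (a + b) / n ))            # 7 / 2 = 3

# Round to nearest by adding half the divisor first
# (assumes non-negative values)
rounded=$(( (a + b + n / 2) / n ))      # (7 + 1) / 2 = 4

# Fractional result by delegating the division to awk
exact=$(awk -v s="$(( a + b ))" -v n="$n" 'BEGIN { print s / n }')

echo "$truncated $rounded $exact"
```

This prints `3 4 3.5`.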

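awk solution

This is a solution in pure awk. It is a bit more flexible in that it averages however many files are supplied as arguments; it should also be about an order of magnitude faster than the Bash solution.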

#!/usr/bin/awk -f

# Count number of files: increment on the first line of each new file
FNR == 1 { ++nfiles }

{
    # (Pseudo) 2D array summing up fields across files
    for (i = 1; i <= NF; ++i) {
        values[FNR, i] += $i
    }
}

END {
    # Loop over lines of array with sums
    for (i = 1; i <= FNR; ++i) {

        # Loop over fields of current line in array of sums
        for (j = 1; j <= NF; ++j) {

            # Build record with averages
            $j = values[i, j]/nfiles
        }
        print
    }
}

It has to be called like this:

./awksolution file1 file2 file3

And, as mentioned, there is no limit on the number of files to average over.
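
As a worked example, the same idea can be run inline on made-up sample data. This sketch is a variant, not the script above: as a precaution it stores the row and column counts itself (in the hypothetical variables `nrows` and `ncols`) instead of reading FNR and NF in the END block.

```shell
#!/bin/bash

# Made-up sample data: three files with 2 rows and 2 columns each
dir=$(mktemp -d)
printf '%s\n' '1 2' '3 4' > "$dir/file1"
printf '%s\n' '3 4' '5 6' > "$dir/file2"
printf '%s\n' '5 6' '7 8' > "$dir/file3"

# Sum each cell across files, then divide by the number of files
result=$(awk '
    FNR == 1 { ++nfiles }
    {
        for (i = 1; i <= NF; ++i) sum[FNR, i] += $i
        if (FNR > nrows) nrows = FNR
        if (NF > ncols) ncols = NF
    }
    END {
        for (i = 1; i <= nrows; ++i) {
            for (j = 1; j <= ncols; ++j)
                printf "%s%s", sum[i, j] / nfiles, (j < ncols ? " " : "\n")
        }
    }
' "$dir"/file1 "$dir"/file2 "$dir"/file3)

echo "$result"
rm -rf "$dir"
```

For this sample data the output is `3 4` on the first line and `5 6` on the second.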
