Modify pipe input

Time: 2019-07-10 15:37:57

Tags: bash

Consider strings such as:

I have two apples
He has 4 apples 
They have 10 pizzas

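I would like to replace every number I find with a string whose value is computed by an external script. In my case, the Python program digit_to_word.py converts a number to its word form, but any conversion will do as long as I can get the process working.

Expected output: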

I have two apples
He has four apples 
They have ten pizzas

Conceptually:

echo "He has 4 apples" |
while read word;
do
    if [[ "$word" == +([0-9+]) ]]; then
    NUM='${python digit_to_word.py "$word"}'
    $word="$NUM"
fi
done |
other_operation... | etc..

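I say "conceptually" because I have not even come close to making it work. I am struggling to find information about the problem, simply because I do not know exactly how to conceptualize it. At this point I am mostly thinking about process substitution, but I am afraid that is not the best approach.

Any hints would be really useful. Thank you in advance for sharing your knowledge with me!

4 answers:

Answer 0 (score: 2)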

regex='([[:space:]])([0-9]+)([[:space:]])'

echo "He has 4 apples" |
while IFS= read -r line; do
  line=" ${line} "  # pad with space so first and last words work consistently
  while [[ $line =~ $regex ]]; do       # loop while at least one replacement is pending
    pre_space=${BASH_REMATCH[1]}                # whitespace before the word, if any
    word=${BASH_REMATCH[2]}                     # actual word to replace
    post_space=${BASH_REMATCH[3]}               # whitespace after the word, if any
    replace=$(python digit_to_word.py "$word")  # new word to use
    in=${pre_space}${word}${post_space}         # old word padded with whitespace
    out=${pre_space}${replace}${post_space}     # new word padded with whitespace
    line=${line//$in/$out}                      # replace old w/ new, keeping whitespace
  done
  line=${line#' '}; line=${line%' '}            # remove the padding we added earlier
  printf '%s\n' "$line"                         # write the output line
done

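This is careful to work even in some tricky cases:

  • 4 score and 14 years ago only replaces the 4 in 4 score with four; it does not also modify the 4 in 14.
  • Input with mixed tabs and spaces produces output with the same kind of whitespace: with printf '1\t2 3\n' as input, you get a tab between one and two, but a space between two and three.

See it running at https://ideone.com/SOsuAD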
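
For comparison, the same whole-word rule can be sketched in Python. This is only an illustration under assumptions: digit_to_word and its small WORDS table are hypothetical stand-ins for digit_to_word.py, and the lookarounds play the role of the whitespace captures in the Bash regex.

```python
import re

# Hypothetical stand-in for digit_to_word.py; only a few values are mapped.
WORDS = {"1": "one", "2": "two", "3": "three", "4": "four"}


def digit_to_word(number):
    """Return the word for a number string, or the string unchanged if unmapped."""
    return WORDS.get(number, number)


# A run of digits counts only when whitespace (or the line edge) surrounds it.
WHOLE_NUMBER = re.compile(r"(?<!\S)[0-9]+(?!\S)")


def replace_whole_numbers(line):
    """Replace whitespace-delimited numbers, leaving all whitespace untouched."""
    return WHOLE_NUMBER.sub(lambda m: digit_to_word(m.group(0)), line)


print(replace_whole_numbers("4 score and 14 years ago"))
print(repr(replace_whole_numbers("1\t2 3\n")))
```

Because 14 is not in the illustrative table it is passed through unchanged, which makes it easy to see that the 4 inside it is never matched on its own.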

Answer 1 (score: 2)

I'd suggest that this is a better job for perl.

To recreate your scenario:

$ cat digit_to_word.sh
case $1 in
4) echo four;;
8) echo eight;;
10) echo ten;;
*) echo "$1";;
esac
$ bash digit_to_word.sh 10
ten

Then this

perl -pe 's/(\d+)/ chomp($word = qx{bash digit_to_word.sh $1}); $word /ge' <<END
I have two apples
He has 4 apples
They have 10 pizzas but only 8 cookies
END

outputs

I have two apples
He has four apples
They have ten pizzas but only eight cookies

But you already have some python, so why not implement the replacement part in python as well?
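
A minimal sketch of what that might look like, assuming the conversion logic can be called as a function. Here digit_to_word and its WORDS table are hypothetical placeholders for whatever digit_to_word.py actually does.

```python
import re

# Hypothetical stand-in for the logic in digit_to_word.py.
WORDS = {"4": "four", "8": "eight", "10": "ten"}


def digit_to_word(number):
    """Return the word for a number string, or the string unchanged if unmapped."""
    return WORDS.get(number, number)


def replace_numbers(line):
    """Replace every run of digits with its word form, like the perl s///ge above."""
    return re.sub(r"\d+", lambda m: digit_to_word(m.group(0)), line)


print(replace_numbers("They have 10 pizzas but only 8 cookies"))
```

To use it as a pipeline filter, the script could loop over sys.stdin and write replace_numbers(line) to sys.stdout, which avoids starting one external process per number.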

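Answer 2 (score: 1)

Revised

This approach decomposes each line into two arrays, one for the words and one for the whitespace. Each line is then rebuilt by interleaving the array elements, with numbers converted to words by the Python script. Thanks to @Charles Duffy for pointing out some common Bash pitfalls in my original answer.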

while IFS= read -r line; do
  # Decompose the line into an array of words delimited by whitespace
  IFS=" " read -ra word_array <<< "$(echo "$line" | sed 's/[[:space:]]/ /g')"

  # Invert the decomposition, creating an array of whitespace delimited by words
  IFS="w" read -ra wspace_array <<< "$(echo "$line" | sed 's/\S/w/g' | tr -s 'w')"

  # Interleave the array elements in the output, translating digits to text
  for ((i=0; i<${#wspace_array[@]}; i++))
  do
    printf "%s" "${wspace_array[$i]}"
    if [[ "${word_array[$i]}" =~ ^[0-9]+$ ]]; then
      printf "%s" "$(digit_to_word.py "${word_array[$i]}")"
    else
      printf "%s" "${word_array[$i]}"
    fi
  done
  printf "\n"
done < sample.txt
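
The same words-and-whitespace decomposition can be sketched in Python, where a single split keeps both kinds of token in order. This is an illustration only: digit_to_word and WORDS are hypothetical stand-ins for digit_to_word.py.

```python
import re

# Hypothetical stand-in for digit_to_word.py.
WORDS = {"4": "four", "10": "ten"}


def digit_to_word(number):
    """Return the word for a number string, or the string unchanged if unmapped."""
    return WORDS.get(number, number)


def convert_line(line):
    """Split into alternating word/whitespace tokens, convert numbers, rejoin."""
    # The capture group makes re.split keep the whitespace separators.
    tokens = re.split(r"(\s+)", line)
    return "".join(
        digit_to_word(tok) if re.fullmatch(r"[0-9]+", tok) else tok
        for tok in tokens
    )


print(repr(convert_line("He has 4\tapples")))
```

Since the whitespace tokens are rejoined untouched, tabs and runs of spaces survive exactly as they appeared in the input.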

Answer 3 (score: 0)

You can use sed for this. Here is an example:

$ echo "He has 4 apples" | sed 's/4/four/'
He has four apples

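Judging from your sample data, though, sed may not be a good fit. If you see "1" you want to replace it with "one", but your example replaces "10" with "ten". Do you need to support multi-digit numbers, such as replacing "230" with "two hundred and thirty"?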
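
The multi-digit concern can be made concrete with a short Python sketch. The lookup tables and function names below are hypothetical and exist only to contrast per-digit substitution with matching each number as a whole.

```python
import re

# Hypothetical lookup tables for illustration only.
DIGITS = {"1": "one", "4": "four"}
NUMBERS = {"1": "one", "4": "four", "10": "ten"}


def replace_each_digit(line):
    """Naive approach: one fixed substitution per digit, like a chain of sed s/// commands."""
    for digit, word in DIGITS.items():
        line = line.replace(digit, word)
    return line


def replace_each_number(line):
    """Match whole runs of digits so that 10 is looked up as 10, not as 1 and 0."""
    return re.sub(r"[0-9]+", lambda m: NUMBERS.get(m.group(0), m.group(0)), line)


print(replace_each_digit("They have 10 pizzas"))   # the 1 in 10 is rewritten on its own
print(replace_each_number("They have 10 pizzas"))
```

The first call mangles 10 because it rewrites the 1 in isolation, while the second looks the whole number up at once.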