Question

I'm writing a fairly long command to transform the text in a file:

grep -o '^[^#]*' file.txt | grep ':' | cut -d ':' -f1 | uniq | gcut -d '/' -f1,3 --output-delimiter=$'\t'

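I'd like to turn this into a shell script. It runs correctly as a series of pipes, but I'm having trouble breaking it up so that it performs one transformation at a time.

I first tried setting a variable at each stage, like: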

CONTENT=$(grep -o '^[^#]*' $1)
SEGMENTS=$($CONTENT | grep ':')

But I kept getting:

command too long:

I also tried splitting it up into subshells (I think that's what they're called):

CONTENT=(grep -o '^[^#]*' $1)

I can see that echo $CONTENT prints the command rather than the text, so I figured I could do:

SEGMENTS=($CONTENT | grep ':')

But that gives: parse error near `|'

I also tried:

CONTENT=$(grep -o '^[^#]*' $1)
SEGMENTS=(cat <($CONTENT) | grep ':')

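But that doesn't seem to work either.

How can I break up a long chain of text transformations into a more readable form? Thanks so much for your help!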

Answer 1

If all you're looking for is readability, just add line breaks:

grep -o '^[^#]*' file.txt |
grep ':' |
cut -d ':' -f1 |
uniq |
gcut -d '/' -f1,3 --output-delimiter=$'\t'

You can also align the | symbols and add comments:

 grep -o '^[^#]*' file.txt | # Find the lines
 grep :                    | # use grep and cut instead
 cut -d : -f1              | # of awk for no particular reason
 uniq                      | # remove duplicates
 gcut -d '/' -f1,3 --output-delimiter=$'\t'

Answer 2

A literal answer might look like this:

#!/bin/bash
#      ^^^^- needed for herestrings (the <<< syntax)

content=$(grep -o '^[^#]*' <file.txt)
segments=$(grep ':'        <<<"$content")
fields=$(cut -d ':' -f1    <<<"$segments")
uniq_fields=$(uniq         <<<"$fields")
result=$(gcut -d '/' -f1,3 --output-delimiter=$'\t' <<<"$uniq_fields")

Without bash, each of those stages could instead look like:

segments=$(printf '%s\n' "$content" | grep ':')

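Don't do this: it's extremely inefficient, uses far more memory than the original code, and can't run the stages in parallel (so it takes longer when the input file is large).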
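Purely as an illustration (not a recommendation), the sketch below shows why the `$CONTENT | grep ':'` attempt from the question fails while the herestring form works. The sample file contents and the `sample` and `piped` names are made up for this demonstration, and the final gcut stage is left out so that nothing depends on GNU cut being installed under that name.

```bash
#!/bin/bash
# Hypothetical sample input, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' \
  'a/b/c:1 # trailing comment' \
  '# whole-line comment' \
  'no colon here' \
  'a/b/c:2' > "$sample"

# Stage 1: keep the part of each line that precedes any '#'.
content=$(grep -o '^[^#]*' "$sample")

# Wrong: in bash, an unquoted $content at the start of a command is
# word-split and its first word is treated as a command name (here the
# shell would try to execute something called "a/b/c:1").
#   segments=$($content | grep ':')

# Right: feed the variable's text to each tool on stdin.
segments=$(grep ':' <<<"$content")
fields=$(cut -d ':' -f1 <<<"$segments")
uniq_fields=$(uniq <<<"$fields")

# The same stages as a plain pipeline, for comparison.
piped=$(grep -o '^[^#]*' "$sample" | grep ':' | cut -d ':' -f1 | uniq)

printf 'variables: %s\n' "$uniq_fields"
printf 'pipeline:  %s\n' "$piped"
rm -f -- "$sample"
```

Both forms should print the same text; the difference is that the variable version holds every intermediate result in memory and runs the stages one after another.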

If your goal is to allow inspection, consider the following:

grep -o '^[^#]*' file.txt | tee without_comments.txt \
  | grep ':'              | tee colons_only.txt \
  | cut -d ':' -f1        | tee fields_only.txt \
  | uniq                  | tee fields_uniq.txt \
  | gcut -d '/' -f1,3 --output-delimiter=$'\t'

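...which gives you a separate output file for each stage. Or, if you want code that doesn't need to be changed between development and production modes, consider using a function: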

set -o pipefail # prevent presence of a pipeline from changing exit status

logging() {
  filename=$1; shift
  if [ -n "$logdir" ]; then
    "$@" | tee -- "$logdir/$filename"
  else
    "$@"
  fi
}

logging     without_comments.txt grep -o '^[^#]*' file.txt \
  | logging colons_only.txt      grep ':' \
  | logging fields_only.txt      cut -d ':' -f1 \
  | logging fields_uniq.txt      uniq \
  | gcut -d '/' -f1,3 --output-delimiter $'\t'

...which will log only when the variable logdir is non-empty.
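As a quick check of that behaviour, the following sketch runs the first few stages twice: once with logdir empty and once with it pointing at a temporary directory. The sample input, the `run_stages` helper, and the use of mktemp are assumptions made for illustration; the logging function is repeated so that the snippet runs on its own.

```bash
#!/bin/bash
set -o pipefail

# Same function as above, repeated so this example is self-contained.
logging() {
  filename=$1; shift
  if [ -n "$logdir" ]; then
    "$@" | tee -- "$logdir/$filename"
  else
    "$@"
  fi
}

# Hypothetical sample input in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'x/y/z:1 # note' 'x/y/z:2' 'plain line' > "$workdir/input.txt"

# Hypothetical helper wrapping the stages so they can be run twice.
run_stages() {
  logging without_comments.txt grep -o '^[^#]*' "$workdir/input.txt" \
    | logging colons_only.txt grep ':' \
    | cut -d ':' -f1 \
    | uniq
}

# Production mode: logdir is empty, so no intermediate files are written.
logdir=''
quiet_result=$(run_stages)

# Development mode: each logged stage also writes its output to a file.
logdir=$(mktemp -d)
logged_result=$(run_stages)
logged_colon_lines=$(grep -c ':' "$logdir/colons_only.txt")

printf 'result: %s (logged %s lines with colons)\n' "$logged_result" "$logged_colon_lines"
rm -rf -- "$workdir" "$logdir"
```

The pipeline's final output is the same in both runs; only the side effect of writing the intermediate files changes.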

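Given the job at hand, I'd suggest using awk instead; the following will be far more efficient, and there's an argument to be made that it's more readable: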

awk '
  BEGIN { FS=":"; OFS="\t"; }     # split input on :s, combine output with tabs
  /#/ { gsub(/#.*/, "") }         # remove comments
  /:/ { seen[$1]++ }              # put field 1 of each line with a : into a map
  END {
    for (i in seen) {
      split(i, pieces, "/")       # split each map key on "/"s
      print pieces[1], pieces[3]  # and put the 1st and 3rd in output
    }
  }
' file.txt
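One way to sanity-check an awk version against the original pipeline is to run both on a small sample and compare the results. The sketch below is only an illustration under stated assumptions: the sample data and variable names are made up, the awk program uses FS with 1-based split() indices, and both outputs are sorted because the order of awk's for-in loop is unspecified. Plain cut followed by tr stands in for gcut's --output-delimiter so the snippet does not depend on GNU cut.

```bash
#!/bin/sh
# Hypothetical sample input, created only for this comparison.
sample=$(mktemp)
printf '%s\n' \
  'usr/local/bin:first # comment' \
  'usr/local/bin:second' \
  '# only a comment' \
  'etc/ssl/certs:third' \
  'no colon' > "$sample"

# The original pipeline; tr converts the "/" left by cut into a tab.
from_pipeline=$(
  grep -o '^[^#]*' "$sample" |
  grep ':' |
  cut -d ':' -f1 |
  uniq |
  cut -d '/' -f1,3 |
  tr '/' '\t' |
  sort
)

# A single awk program doing the same work.
from_awk=$(
  awk '
    BEGIN { FS = ":"; OFS = "\t" }   # split input on colons, join output with tabs
    { sub(/#.*/, "") }               # drop everything from the first # onward
    /:/ { seen[$1]++ }               # remember field 1 of each line that has a colon
    END {
      for (key in seen) {
        split(key, pieces, "/")      # split() fills pieces[1], pieces[2], ...
        print pieces[1], pieces[3]
      }
    }
  ' "$sample" | sort
)

if [ "$from_awk" = "$from_pipeline" ]; then
  echo "outputs match"
else
  echo "outputs differ"
fi
rm -f -- "$sample"
```

Note that the awk version removes all duplicate keys, whereas uniq only removes adjacent ones, so the two can differ on input where the same key reappears after a different one.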

Storing file processing stages in Bash