Question

This sounds a bit complicated, but it actually isn't. Say we have the following (apache2 config) file:

[...]

 <VirtualHost 123.123.123.123:80>
   ServerName one.domain.tld
   ServerAlias 1.domain.tld

   DocumentRoot "/path/to/anything"
   [...]
 </VirtualHost>

 <VirtualHost 321.321.321.321:80>
   ServerName two.domain.tld
   ServerAlias 2.domain.tld

   DocumentRoot "/path/to/something/else"
   [...]
 </VirtualHost>

 <VirtualHost 123.123.123.123:443>
   ServerName one.domain.tld
   ServerAlias 1.domain.tld
   ServerAlias secure.one.domain.tld
   ServerAlias secure.1.domain.tld

   DocumentRoot "/path/to/anything"
   [...]
 </VirtualHost>

 <VirtualHost 321.321.321.321:443>
   ServerName two.domain.tld
   ServerAlias 2.domain.tld
   ServerAlias secure.two.domain.tld
   ServerAlias secure.2.domain.tld

   DocumentRoot "/path/to/another/something/else"
   [...]
 </VirtualHost>

[...]

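I need to know which (sub)domains point to each document root. Because I need to run certain commands in a for loop, it is important to me to handle each DocumentRoot separately. I put together an approach using bash and a few other programs, shown below: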

DOCROOTS="$(egrep -ni '^[[:space:]]*DocumentRoot[[:space:]]' ${HTCONF} |sed -r 's/(DocumentRoot|"|'"'"'|[[:space:]])//gI')"
for DOCROOT in $(echo "${DOCROOTS}"); do
  LINE="$(printf ${DOCROOT} |cut -d':' -f1)"
  ROOT="$(printf ${DOCROOT} |cut -d':' -f2)"
  DOMAINS=$(sed "${LINE},\$d" ${HTCONF} |tac |sed '/VirtualHost/Iq' |tac |egrep -i 'ServerAlias|ServerName' |sed -r 's/(ServerName|ServerAlias| |   )//gI')

  [... rest of document root specific code goes here ...]
done

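Explanation:

Store all DocumentRoot definitions, together with their line numbers, in the variable DOCROOTS.
For each line in DOCROOTS:
1. Set the variable LINE to the line number
2. Set the variable ROOT to the document root
3. sed "${LINE},\$d" ${HTCONF} - deletes every line from $LINE through the end of the file, keeping only what precedes the DocumentRoot line
4. tac - much like rev, but instead of reversing the characters within a line it reverses the order of the lines of the whole input
5. sed '/VirtualHost/Iq' - prints up to and including the first line containing VirtualHost (case-insensitive), then quits, discarding everything after it
6. tac - reverses the whole input back into its original order
7. egrep -i 'ServerAlias|ServerName' - keeps only the lines containing ServerName or ServerAlias
8. sed -r 's/(ServerName|ServerAlias| | )//gI' - removes the ServerName and ServerAlias directive names (case-insensitive) as well as spaces and tabs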
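
Steps 3 to 8 can be tried in isolation. The following is a minimal sketch, assuming GNU sed and tac are available (the I flag is a GNU extension). The helper name domains_before_line and the small sample file are made up for illustration and are not part of the original script.

```shell
# Hypothetical sample file standing in for ${HTCONF}.
HTCONF="$(mktemp)"
cat > "${HTCONF}" <<'EOF'
<VirtualHost 123.123.123.123:80>
  ServerName one.domain.tld
  ServerAlias 1.domain.tld
  DocumentRoot "/path/to/anything"
</VirtualHost>
<VirtualHost 321.321.321.321:80>
  ServerName two.domain.tld
  ServerAlias 2.domain.tld
  DocumentRoot "/path/to/something/else"
</VirtualHost>
EOF

# domains_before_line FILE LINE (hypothetical helper):
# print the ServerName/ServerAlias values found between the last
# VirtualHost line above LINE and LINE itself.
domains_before_line() {
  sed "$2,\$d" "$1" |         # drop LINE and everything after it
    tac |                     # reverse the order of the lines
    sed '/VirtualHost/Iq' |   # print through the first VirtualHost line, then quit
    tac |                     # restore the original order
    grep -Ei 'ServerName|ServerAlias' |
    sed -E 's/(ServerName|ServerAlias|[[:space:]])//gI'
}

# Line 9 of the sample is the second DocumentRoot.
domains_before_line "${HTCONF}" 9
```

Because the input is reversed before sed quits, the first VirtualHost line sed sees is the last one before the DocumentRoot line, which is what limits the result to the enclosing block.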

The desired output of printf "${ROOT}\n${DOMAINS}\n\n" would look like this:

/path/to/anything
one.domain.tld
1.domain.tld

/path/to/something/else
two.domain.tld
2.domain.tld

/path/to/anything
one.domain.tld
1.domain.tld
secure.one.domain.tld
secure.1.domain.tld

/path/to/another/something/else
two.domain.tld
2.domain.tld
secure.two.domain.tld
secure.2.domain.tld

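Is there a better way to achieve this (possibly using awk)?

Is bash the wrong tool for this? Should I consider using a proper scripting language? If so, which one would you recommend?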

Answer 1

awk to the rescue!

$ awk '/VirtualHost/        {s=RS}
       /Server(Name|Alias)/ {s=s $2 RS}
       /DocumentRoot/       {gsub("\"",""); print $2 s}' file

/path/to/anything
one.domain.tld
1.domain.tld

/path/to/something/else
two.domain.tld
2.domain.tld

/path/to/anything
one.domain.tld
1.domain.tld
secure.one.domain.tld
secure.1.domain.tld

/path/to/another/something/else
two.domain.tld
2.domain.tld
secure.two.domain.tld
secure.2.domain.tld

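Explanation: The structure of this awk script is pattern {action} pairs (similar to if/then statements). I aggregate the second field of the lines containing Server(Name|Alias), and when DocumentRoot is found I print the path together with the aggregated fields. The aggregate is reset at each VirtualHost line. Both its initial value and the separator used when concatenating are the record separator (RS), which defaults to a newline. The quotes are stripped before printing.

Frankly, there isn't much more to explain. Perhaps the one trick is that building the aggregate this way (rather than, say, s=s RS $2) produces the empty line between groups by default.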
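
For readers who want to see where each newline comes from, here is the same aggregation spelled out with comments. It is only a sketch: the function name list_roots and the sample file are hypothetical, and the path and aggregate are concatenated directly (no comma in print) so that no output field separator ends up after the path.

```shell
# Hypothetical two-host sample file.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
<VirtualHost 123.123.123.123:80>
  ServerName one.domain.tld
  ServerAlias 1.domain.tld
  DocumentRoot "/path/to/anything"
</VirtualHost>
<VirtualHost 321.321.321.321:80>
  ServerName two.domain.tld
  ServerAlias 2.domain.tld
  DocumentRoot "/path/to/something/else"
</VirtualHost>
EOF

list_roots() {
  awk '
    # Any VirtualHost line (opening or closing) resets the aggregate to a
    # single newline, which later separates the path from the first domain.
    /VirtualHost/        { s = RS }
    # Append each domain followed by a newline. That trailing newline plus
    # the one added by print yields the blank line between groups.
    /Server(Name|Alias)/ { s = s $2 RS }
    # Strip the quotes, then print the path followed by the domains.
    /DocumentRoot/       { gsub(/"/, ""); print $2 s }
  ' "$1"
}

list_roots "$conf"
```

Like the one-liner, this assumes ServerName and ServerAlias appear before DocumentRoot inside each block and that the path contains no whitespace.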

Answer 2

It's still not entirely clear, but it sounds like what you need is:

$ cat tst.awk
{
    gsub(/^[[:space:]]+|[[:space:]]+$/,"")
    name = value = $0
    sub(/[[:space:]].*/,"",name)
    sub(/[^[:space:]]+[[:space:]]*/,"",value)
}

name ~ /^Server/ {
    flds = flds ORS value
}

name == "DocumentRoot" {
    gsub(/^"|"$/,"",value)

    print value flds
    print "doing rest of document root specific code ..."
    print ""

    flds = ""
}

$ awk -f tst.awk file
/path/to/anything
one.domain.tld
1.domain.tld
doing rest of document root specific code ...

/path/to/something/else
two.domain.tld
2.domain.tld
doing rest of document root specific code ...

/path/to/anything
one.domain.tld
1.domain.tld
secure.one.domain.tld
secure.1.domain.tld
doing rest of document root specific code ...

/path/to/another/something/else
two.domain.tld
2.domain.tld
secure.two.domain.tld
secure.2.domain.tld
doing rest of document root specific code ...

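It all depends on what the "rest of document root specific code" is: if it is more text manipulation, it belongs in awk, but if it is anything else, it belongs in the shell.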
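
If the per-root work does belong in the shell, one possible hand-off is to let awk emit one record per DocumentRoot and have a while/read loop consume it. This is a rough sketch under assumptions: the names emit_records and process_roots, the tab-separated record format, and the sample file are all invented here, and the printf line is only a placeholder for the real commands.

```shell
# Hypothetical two-host sample file.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
<VirtualHost 123.123.123.123:80>
  ServerName one.domain.tld
  ServerAlias 1.domain.tld
  DocumentRoot "/path/to/anything"
</VirtualHost>
<VirtualHost 321.321.321.321:80>
  ServerName two.domain.tld
  ServerAlias 2.domain.tld
  DocumentRoot "/path/to/something/else"
</VirtualHost>
EOF

# Emit one line per DocumentRoot: the path, a tab, then the domains
# separated by single spaces.
emit_records() {
  awk '
    /VirtualHost/        { d = "" }
    /Server(Name|Alias)/ { d = (d == "" ? $2 : d " " $2) }
    /DocumentRoot/       { gsub(/"/, ""); print $2 "\t" d }
  ' "$1"
}

# Read each record back in the shell, splitting only on the tab.
process_roots() {
  emit_records "$1" | while IFS="$(printf '\t')" read -r root domains; do
    # Placeholder for the real per-root commands (hypothetical).
    printf 'processing %s (served as: %s)\n' "$root" "$domains"
  done
}

process_roots "$conf"
```

Keeping the parsing in awk and the side effects in the shell loop means each DocumentRoot is still handled separately, as the question requires, without re-scanning the file once per root.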

Find all occurrences of a pattern after the last occurrence of another pattern before line X
