Filtering with awk using multiple fields

Time: 2011-05-09 23:08:34

Tags: awk

Suppose I have the following text file (it can contain more states, cities and colleges):

begin_state
New York
end_state

begin_cities
Albany
Buffalo
Syracuse
end_cities

begin_colleges
Cornell
Columbia
Stony Brook
end_colleges

begin_state
California
end_state

begin_cities
San Francisco
Sacramento
Los Angeles
end_cities

begin_colleges
Berkeley
Stanford
Caltech
end_colleges

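I want to use awk to filter out all the cities and list them under their state, or to select all the colleges and list them under their state. For example, if I want the cities, the output should look like this: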

**New York**
Albany
Buffalo
Syracuse
**California**
San Francisco
Sacramento
Los Angeles

Any suggestions are welcome.

1 answer:

Answer 0: (score: 1)

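Here are two solutions in awk. The first is naive and repetitive, but easier to follow and learn from. The second tries to reduce the repetition.

Both solutions are fragile when it comes to errors in the data file. If you are free to choose the implementation language, I suggest doing this in ruby, perl or python.

Save the script to a file (e.g. showinfo.sh) and invoke it with a single argument, "cities" or "colleges", to select the mode. You must also redirect the data file to stdin.

Example invocations (for either solution):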

./showinfo.sh cities < states.txt
./showinfo.sh colleges < states.txt

Naive solution:

#!/bin/bash
set -e
set -u
#mode=cities
mode=$1

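awk -v mode="$mode" '
# Marker lines: record the current section (or clear it) and skip the line.
/begin_state/    {st="states"; next}
/end_state/      {st=""; next}
/begin_cities/   {st="cities"; next}
/end_cities/     {st=""; next}
/begin_colleges/ {st="coll"; next}
/end_colleges/   {st=""; next}

# Data lines: remember the state name, or append the line to the list
# kept for the current state. Lines outside any section are ignored.
{
  if (st=="states") {
    sn=$0;
  }
  else
    if (st=="cities") cities[sn]=cities[sn]"\n"$0
    else if (st=="coll") colleges[sn]=colleges[sn]"\n"$0;
}

# Print the requested lists. The order of for-in iteration is unspecified in awk.
END {
  if (mode=="cities") {
    for (sn in cities) { print "=="sn"=="cities[sn] } ;
  }
  else if (mode=="colleges") {
    for (sn in colleges) { print "=="sn"=="colleges[sn] } ;
  }
  else { print "set mode either cities or colleges" }
}'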

Second solution, with the repetition removed:

#!/bin/bash
set -e
set -u
mode=$1
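awk -v mode="$mode" '
# Any begin_ marker names the current section; any end_ marker clears it.
/begin_/    {st=$1; next}
/end_/      {st=""; next}

# The state name is kept in sn; every other section line is appended
# under the combined key (section, state). Lines outside a section are ignored.
{
  if (st=="begin_state") { sn=$0 }
  else if (st!="") { data[st, sn]=data[st, sn]"\n"$0 }
}

# Split each combined key back into section and state, then print the
# entries whose section matches the requested mode.
END {
  for (combo in data) {
    split(combo, sep, SUBSEP);
    type = sep[1];
    state_name = sep[2];
    if (type == "begin_"mode) {
      print "==" state_name "==" data[combo];
    }
  }
}'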
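
The second solution leans on how awk simulates multi-dimensional arrays: a subscript such as `data[st, sn]` is stored under one string key made of the two parts joined by `SUBSEP`, which is why `split(combo, sep, SUBSEP)` can recover them in the END block. A minimal sketch of just that mechanism follows; the function name `subsep_demo` and the single key/value pair are illustrative and not part of the solution above.

```shell
#!/bin/bash
# subsep_demo (illustrative name): stores one value under a two-part
# subscript, then splits the combined key back apart with SUBSEP.
subsep_demo() {
  awk 'BEGIN {
    data["begin_cities", "New York"] = "Albany"
    for (combo in data) {
      n = split(combo, parts, SUBSEP)   # n is the number of key parts
      print n ": " parts[1] " / " parts[2] " -> " data[combo]
    }
  }'
}
subsep_demo
```

The loop sees a single key, and `split` reports two parts for it, the section and the state name.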

Input file used (since I notice it has recently been changed in the question):

begin_state
New York
end_state
begin_cities
Albany
Buffalo
Syracuse
end_cities
begin_colleges
Cornell
Columbia
Stony Brook
end_colleges
begin_state
California
end_state
begin_cities
San Francisco
Sacramento
Los Angeles
end_cities
begin_colleges
Berkeley
Stanford
Caltech
end_colleges
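
Since both solutions assume well-formed markers, a marker with a missing underscore or a stray space silently corrupts the result. One possible pre-check is sketched below. It is not part of the solutions above: the helper name `check_markers` is hypothetical, and it assumes that every marker sits alone on its line and that no data line starts with lowercase `begin_` or `end`.

```shell
#!/bin/bash
# check_markers (hypothetical helper): reads the data file on stdin and
# reports begin_/end_ marker lines that do not pair up.
# Assumes markers sit alone on a line and that no data line starts with
# lowercase "begin_" or "end".
check_markers() {
  awk '
    /^begin_/ {
      if (cur != "") { printf "line %d: %s opened inside %s\n", NR, $0, cur; bad = 1 }
      cur = substr($0, 7)          # section name after "begin_"
      next
    }
    /^end/ {
      if ($0 != ("end_" cur)) { printf "line %d: unexpected %s\n", NR, $0; bad = 1 }
      cur = ""
      next
    }
    END {
      if (cur != "") { printf "end of input: %s never closed\n", cur; bad = 1 }
      exit bad + 0                 # non-zero status when a problem was found
    }
  '
}
```

With the function defined, `check_markers < states.txt` prints nothing and returns status 0 for a well-formed file, so it could be run before either solution.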

Session when running the first solution:

$ bash showinfo.sh cities < states.txt 
==New York==
Albany
Buffalo
Syracuse
==California==
San Francisco
Sacramento
Los Angeles
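
Note that awk leaves the iteration order of `for (key in array)` unspecified, so the states are not guaranteed to come out in file order on every awk implementation. As an alternative, here is a sketch of a streaming variant that prints each line as it is read and therefore keeps file order. It is not one of the two solutions above: the function name `show_info` is hypothetical, and it assumes each state section comes before its cities and colleges and that markers start at the beginning of the line.

```shell
#!/bin/bash
# show_info MODE (hypothetical helper): reads the state file on stdin and
# prints each state header followed by the lines of the requested section
# (cities or colleges), in the order they appear in the file.
show_info() {
  awk -v mode="$1" '
    /^begin_/ { section = substr($0, 7); next }   # text after "begin_"
    /^end_/   { section = ""; next }
    section == "state" { print "==" $0 "=="; next }
    section == mode    { print }
  '
}
```

With the function defined, `show_info cities < states.txt` is the equivalent of the first example invocation. Because nothing is buffered in arrays, a state header is printed even when the requested section is missing for that state.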