Question

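I have an extremely painful task and I'm wondering whether anyone can help.

Our supplier has provided an SNMP MIB file (txt). Unfortunately, a lot of this file is obsolete and needs to be removed for our monitoring application.

I have been trying to do this, but the file is over 800,000 lines long and it is sapping my will to live.

The structure is something like: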

-- /*********************************************************************************/
-- /* MIB table for Hardware                                                        */
-- /* Valid from: 543.44                                                            */
-- /* Deprecated from: 600.3                                                        */
-- /*********************************************************************************/

Some text 
some text 
Some text

-- /*********************************************************************************/
-- /* MIB table for Hardware                                                        */
-- /* Valid from: 543.44                                                            */
-- /*********************************************************************************/

Some text 
some text 
Some text

-- /*********************************************************************************/
-- /* MIB table for Hardware                                                        */
-- /* Valid from: 364.44                                                            */
-- /* Deprecated from: 594.3                                                        */
-- /*********************************************************************************/

...repeating at random, ad nauseam.

What I have in mind is a script that will:

find the text "Deprecated from", then

delete that line, 
delete the preceding 3 lines, 
delete the following one line, 
delete then all following lines until the next
"-- /*********************************************************************************/"

Does this make sense? Is such a thing possible, or am I just dreaming?

Thanks!

Answer 1

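Edit: I just realised that I misread your question, even though this answer had already been upvoted a few times. My earlier reply was way off! It should be more correct now, but it relies on some extra assumptions. Simple solutions only get you so far!

This might help you, given some assumptions: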

cat -s data | awk -vFS='\n' -vRS='\n\n' '/Deprecated from/ { getline; next } 1'

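The cat -s command is only there to squeeze runs of blank lines down to one, so that awk has an easier job. For awk, -vFS='\n' says that fields are separated by newlines, and -vRS='\n\n' says that records are separated by two newlines in a row. (A multi-character RS works in gawk and mawk, which treat it as a regular expression; POSIX leaves the behaviour unspecified.) /Deprecated from/ then matches every record containing that text, and { getline; next } reads the record that follows it and moves on without printing either one. The trailing 1 is shorthand for printing every record that gets that far.

This makes the following assumptions:

All comment blocks and text blocks are separated by at least one blank line on either side
There are only comment blocks and text blocks, and they strictly alternate
There are no blank lines inside a text block

So it may not be perfect for you. If those assumptions hold, though, awk is a good fit for the job, as you can see: the script is tiny!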

$ cat -s data | awk -vFS='\n' -vRS='\n\n' '/Deprecated from/ { getline; next } 1'
-- /*********************************************************************************/
-- /* MIB table for Hardware                                                        */
-- /* Valid from: 543.44                                                            */
-- /*********************************************************************************/
Some text
some text
Some text

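Also, as you can see, the blank lines between the remaining blocks get squeezed out. To put them back, you can modify the command like this: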

$ cat -s data | awk -vFS='\n' -vRS='\n\n' '/Deprecated from/ { getline; next } { printf "%s\n\n", $0 }'
-- /*********************************************************************************/
-- /* MIB table for Hardware                                                        */
-- /* Valid from: 543.44                                                            */
-- /*********************************************************************************/

Some text
some text
Some text
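
If gawk is not available, the same record-based idea can be sketched in Python. This is only an illustration under the same three assumptions listed above, not a drop-in replacement; the function name drop_deprecated_records is made up for this example.

```python
import re


def drop_deprecated_records(text):
    """Drop each 'Deprecated from' comment block and the text block after it.

    Assumes blocks are separated by one or more blank lines and that comment
    blocks and text blocks strictly alternate (same assumptions as the awk
    one-liner).
    """
    # Splitting on runs of blank lines mirrors `cat -s` plus RS='\n\n'.
    records = [r for r in re.split(r"\n\s*\n", text.strip("\n")) if r.strip()]
    kept = []
    skip_next = False
    for record in records:
        if skip_next:
            # This is the text block that belongs to a deprecated header.
            skip_next = False
            continue
        if "Deprecated from" in record:
            skip_next = True
            continue
        kept.append(record)
    # Re-join the surviving blocks with a single blank line between them.
    return "\n\n".join(kept) + "\n" if kept else ""
```

This reads the whole file into memory as one string, which should be acceptable for a file of roughly 800,000 lines.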

Answer 2

This might work for you (it uses GNU sed syntax):

 sed '$!N;$!N;:a;$q;N;/Deprecated from/!{P;s/^[^\n]*\n//;ba};$d;$!N;$d;s/.*//;:b;$d;N;/^\n-- \/\*\+\/$/!{s/.*//;bb};D' file

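Here is a slightly simpler solution (less efficient, since it needs two passes). awk writes a sed script of line ranges to delete, and sed then applies that script to the file: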

awk '/Deprecated from/{a=NR-3;getline;next};a>0 && /^-- \/\*+\/$/{b=NR-1;print a "," b "d";a=b=0};END{if(a>0)print a ",$d"}' file |
sed -f - file
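
To make the two-pass idea easier to follow, here is a hedged Python sketch of the same logic: pass 1 collects the line ranges to delete, pass 2 drops them. Like the awk command, it assumes the "Deprecated from" line is always the fourth line of its header. The names deprecated_ranges and delete_ranges are invented for this illustration.

```python
import re

# A full-width rule line such as "-- /*********/".
HEADER_RULE = re.compile(r"^-- /\*+/\s*$")


def deprecated_ranges(lines):
    """Pass 1: return (first, last) 1-based inclusive line ranges to delete."""
    ranges = []
    start = None          # first line of the deprecated section being tracked
    skip_closing = False  # True right after a "Deprecated from" line
    for number, line in enumerate(lines, start=1):
        if skip_closing:
            # Closing rule of the deprecated header; it stays inside the range.
            skip_closing = False
            continue
        if "Deprecated from" in line:
            # Assumption: the header's opening rule is three lines above.
            start = number - 3
            skip_closing = True
        elif start is not None and HEADER_RULE.match(line):
            # The next header begins here, so the range ends one line earlier.
            ranges.append((start, number - 1))
            start = None
    if start is not None:
        # The deprecated section runs to the end of the file.
        ranges.append((start, len(lines)))
    return ranges


def delete_ranges(lines, ranges):
    """Pass 2: return the lines that fall outside every range."""
    doomed = set()
    for first, last in ranges:
        doomed.update(range(first, last + 1))
    return [line for n, line in enumerate(lines, start=1) if n not in doomed]
```

Printing each pair as "first,lastd" would reproduce the sed script that the awk command generates.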

Answer 3

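Here is a simple vim macro.

Open the file: $ vim filename
Type :set nowrapscan and press Enter (so that searches stop at the end of the file instead of wrapping around)
Press q a to start recording into register a
Type /Deprecated from: and press Enter (search for the text)
3k (move up 3 lines, to the top of the header)
5dd (delete all 5 lines of the header)
Type 0d/^-- \/\*\*\* and press Enter (delete every line up to, but not including, the next header)
Press q to stop recording the macro
Type 1000000@a (run the macro a million times; it stops as soon as a search fails)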

Answer 4

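I very much agree with the comments suggesting another scripting language for this problem. Ruby, Perl or Python would probably be a better fit. But for fun, here is an ugly awk script. The pattern matching may need some work if it does not fit your file exactly. It implements a simple state machine: it keeps track of whether it is inside a header and whether that header is deprecated, and it stores the header lines in an array. When it reaches the end of a header, it prints the header (if it is not deprecated). When it is not inside a header, it prints each line only if the preceding header was not deprecated.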

{
   if ( $0 ~ /^-- \/\*+\// ) {
      # This matches one of the -- /*********...****/ lines
      if ( headercount > 0 ) {
         # this must be the closing line in the header
         if ( !deprecated ) {
            for ( i = 0; i < headercount; i++ ) {
                print headers[i]
            }
            # print closing line
            print
         } # if not deprecated

         headercount = 0
      }
      else {
         # must be starting a new section
         headers[0] = $0
         headercount = 1
         deprecated = 0
      }
   }
   else {
      if ( headercount == 0 ) {
         # not in a header section - print if not deprecated
         if ( !deprecated ) {
            print
         }
      }
      else {
         # in a header section - track if it is a deprecated section
         if ( $0 ~ /Deprecated from/ ) {
            deprecated = 1
         }
         # store the header info to dump when we hit the end
         headers[headercount++] = $0;
      }

   }
}
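
Since this answer suggests that Python might be a better fit, here is a minimal sketch of the same state machine in Python. It is an illustration of the approach, not a tested tool for your supplier's file; the function name strip_deprecated and the exact rule-line pattern are assumptions.

```python
import re

# A full-width rule line such as "-- /*********/" opens or closes a header.
RULE = re.compile(r"^-- /\*+/\s*$")


def strip_deprecated(lines):
    """Yield the lines of every section whose header is not deprecated."""
    header = []          # buffered lines of the header currently being read
    in_header = False
    deprecated = False   # status of the most recently seen header
    for line in lines:
        if RULE.match(line):
            if in_header:
                # Closing rule: flush the header unless it is deprecated.
                header.append(line)
                if not deprecated:
                    yield from header
                header = []
                in_header = False
            else:
                # Opening rule: start buffering a new header.
                header = [line]
                in_header = True
                deprecated = False
        elif in_header:
            if "Deprecated from" in line:
                deprecated = True
            header.append(line)
        elif not deprecated:
            # Body line of a section that is still valid.
            yield line
```

Because it is a generator, it can stream a large file line by line, for example by passing an open file object as lines and writing the result with writelines.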

Strip blocks of text from huge text file
