sed: truncating long entries that span multiple lines

Time: 2012-01-18 22:03:31

Tags: regex sed

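First off, I think I have a workable solution. But test cases are one thing... reality is not always so kind. This is a "does this look OK?" question, or better yet, a "how could this fail, and what would you improve?" question.

Problem:
A title should never take up more than one line.

Test file: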

You have a hold available for pickup as of 2012-01-13:
Title: Really Long Test Title Regarding Random Gibberish. Volume 1, A-B, United States
 and affiliated territories, United Nations, countries of the world
Author: Barrel Roll Morton
Copy: 3
#end-of-record
You have a hold available for pickup as of 2012-01-13:
Title: Short Catalogue of Random Gibberish. Volume 1, A-B, United States
Author: Skippy Credenza
Copy: 12
#end-of-record

Expected output:

You have a hold available for pickup as of 2012-01-13:
Title: Really Long Test Title Regarding Random Gibberish. Volume 1, A-B, United States
Author: Barrel Roll Morton
Copy: 3
#end-of-record
You have a hold available for pickup as of 2012-01-13:
Title: Short Catalogue of Random Gibberish. Volume 1, A-B, United States
Author: Skippy Credenza
Copy: 12
#end-of-record

My solution:

sed -e '/^Title/{N;/\nAuthor:/!{s/\n.*$//}}' test-file.txt

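My logic for the solution above:

  • look for the regex /^Title/
  • grab the next line
  • if that next line does not match /\nAuthor:/
  • then search for the regex /\n.*$/
  • and replace it with nada.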
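The steps above can be written out as a commented, multi-line script. This is only a sketch: the function name `truncate_title` and the sample record are made up for illustration, and the sed commands are the same ones used in the one-liner.

```shell
# truncate_title is a hypothetical helper name; it reads records on stdin.
truncate_title() {
  sed '
/^Title/{
  # pull the next input line into the pattern space
  N
  # unless that second line is the Author line ...
  /\nAuthor:/!{
    # ... delete from the embedded newline to the end
    s/\n.*$//
  }
}'
}

# Made-up record whose title wraps onto one extra line.
printf '%s\n' \
  'Title: A Long Title, Part One' \
  ' and its wrapped continuation' \
  'Author: Barrel Roll Morton' | truncate_title
```

Because N reads exactly one extra line, a title that wraps onto three or more lines loses only its second line; the later continuation lines are printed unchanged on the following cycles.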

Is there a more bulletproof way to do this?

4 answers:

Answer 0 (score: 2)

Looks good, but if you can't control the length of the first line of text, you can truncate it further with something like:

sed '/^Title/{N;/\nAuthor:/!{s/^\(....................\).*\n.*$/\1/;};}' test-file.txt

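(You don't need the -e, but it doesn't hurt either.)

I'm using an old-school sed, so I need the extra ;};} bits.

Adjust the number of '.' characters in the match pattern to the length of the value you want to capture.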

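Newer seds support curly-brace repetition ranges, as in the following, but I don't have access to one to confirm:

sed '/^Title/{N;/\nAuthor:/!{s/^\(.\{30,50\}\).*\n.*$/\1/;};}' test-file.txt

Edit: fixed the range notation, per @JonathanLeffler's comment. Change 30,50 to values that suit you.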
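To see the effect of the interval form on a small input, here is a sketch that takes the width from a shell variable. The names `max` and `clip_title` and the sample lines are assumptions made for this example. The lower bound is set to 1 rather than 30, so that a short first line still matches and still loses its wrapped line.

```shell
max=20   # hypothetical width, picked only for this sketch

clip_title() {
  # Same script as above, with the run of dots replaced by \{1,max\}
  sed '/^Title/{N;/\nAuthor:/!{s/^\(.\{1,'"$max"'\}\).*\n.*$/\1/;};}'
}

# Made-up record with a wrapped title.
printf '%s\n' \
  'Title: Really Long Test Title' \
  ' wrapped part' \
  'Author: X' | clip_title
```

As in the scripts above, the substitution only runs when the line after Title is not the Author line, so a long title that already fits on one line is left at its full length.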

I hope this helps.

Answer 1 (score: 2)

This might work for you:

sed '/^Title/,/^Author/{//!d}' file

If you also want to truncate the Title line, then:

sed '/^Title/,/^Author/{//!d;s/^\(Title.\{25\}\).*/\1/}' file

This shortens the Title line to 30 characters (the 5 characters of "Title" plus the next 25).
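For readers unfamiliar with the empty-regex shortcut `//`, the same range idea can be spelled out with explicit patterns. This is a sketch under assumptions: `drop_wrapped` is a hypothetical helper name and the sample lines are invented. It keeps the two lines that delimit the range and deletes whatever sits between them, however many lines that is.

```shell
# drop_wrapped is a hypothetical helper name; it reads records on stdin.
drop_wrapped() {
  sed '
/^Title/,/^Author/{
  # keep the two lines that delimit the range
  /^Title/b
  /^Author/b
  # anything in between is a wrapped title line
  d
}'
}

# Made-up record whose title wraps onto two extra lines.
printf '%s\n' \
  'Title: A Long Title' \
  ' first wrapped line' \
  ' second wrapped line' \
  'Author: Barrel Roll Morton' \
  'Copy: 3' | drop_wrapped
```

One thing to watch with any range address: if a record has a Title line but no Author line, the range stays open until the next Author line or the end of input, and every line in between is deleted.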

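Answer 2 (score: 1)

While not exactly what you asked for (potong's solution seems the best), the following joins an N-line title onto a single line instead of truncating it.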

sed '/^Title:/{:a;N;/\nAuthor:/!s/\n//;ta;P;D}' test-file.txt

Output:

$ sed '/^Title:/{:a;N;/\nAuthor:/!s/\n//;ta;P;D}' test-file.txt
You have a hold available for pickup as of 2012-01-13:
Title: Really Long Test Title Regarding Random Gibberish. Volume 1, A-B, United States and affiliated territories, United Nations, countries of the world.  Also, this title is a whole three lines in length
Author: Barrel Roll Morton
Copy: 3
#end-of-record
You have a hold available for pickup as of 2012-01-13:
Title: Short Catalogue of Random Gibberish. Volume 1, A-B, United States
Author: Skippy Credenza
Copy: 12
#end-of-record

Answer 3 (score: 1)

If you're OK with awk, then you could do something like this:

awk '/Title:/{print $0; getline; while ($0!~/Author:/) {getline}}1' file
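The one-liner above keeps calling getline until it sees an Author line, so on input that ends before an Author line appears the loop never finishes. A guarded variant might look like the sketch below. The function name `one_line_title` and the sample data are assumptions for illustration, and the patterns are anchored with `^`, which the one-liner does not do.

```shell
# one_line_title is a hypothetical helper name; it reads stdin or named files.
one_line_title() {
  awk '
/^Title:/ {
  print                          # emit the first title line as is
  # skip lines until Author: shows up; stop if the input runs out
  while ((getline) > 0 && $0 !~ /^Author:/) { }
  if ($0 !~ /^Author:/) exit     # reached end of input without an Author line
}
{ print }' "$@"
}

# Made-up record whose title wraps onto two extra lines.
printf '%s\n' \
  'Title: A Long Title' \
  ' first wrapped line' \
  ' second wrapped line' \
  'Author: Barrel Roll Morton' \
  'Copy: 3' | one_line_title
```

Checking the return value of getline is what makes the loop safe: it returns 0 at end of input, at which point the script exits rather than spinning.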