Question

This is the command I use on a standard web page that I downloaded with wget.

tr '<' '\n<' < index.html

However, it gives me the newlines but does not add the left angle bracket back in. E.g.:

 echo "<hello><world>" | tr '<' '\n<'

returns

 (blank line which is fine)
 hello>
 world>

instead of

 (blank line or not)
 <hello>
 <world>

What is going wrong?

Answer 1

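That's because tr only translates (or deletes) single characters. Each character in the first set is mapped to the character at the same position in the second set, so '<' is replaced by '\n' and the extra '<' in '\n<' is simply ignored.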
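A quick way to see this positional pairing is sketched below; the abc/xy input is only an illustrative example, not from the question.

```shell
# tr pairs SET1[i] with SET2[i]; characters left over in SET2 are never used.
# Here '<' pairs with '\n', so every '<' becomes a newline and the second '<' is ignored.
printf '<hello><world>\n' | tr '<' '\n<'

# With two characters on each side the pairing is easy to see:
# 'a' -> 'x' and 'b' -> 'y', while 'c' has no partner and passes through unchanged.
printf 'abc\n' | tr 'ab' 'xy'
```

Because one input character can only ever become one output character, tr cannot turn '<' into the two-character string "\n<"; that needs a substitution tool such as the ones below.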

Try sed instead.

echo '<hello><world>' | sed -e 's/</\n&/g'
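The \n in the replacement above is a GNU sed extension; some other seds (for example older BSD/macOS versions) do not treat it as a newline. A more portable sketch, assuming only POSIX sed behaviour, puts a backslash followed by a real newline in the replacement:

```shell
# In POSIX sed, a backslash followed by a literal newline in the replacement
# inserts a newline. "&" stands for the matched text, i.e. the "<" itself.
printf '<hello><world>\n' | sed 's/</\
&/g'
```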

Or awk.

echo '<hello><world>' | awk '{gsub(/</,"\n<",$0)}1'

Or perl.

echo '<hello><world>' | perl -pe 's/</\n</g'

Or ruby.

echo '<hello><world>' | ruby -pe '$_.gsub!(/</,"\n<")'

Or python.

echo '<hello><world>' \
| python3 -c 'import sys; sys.stdout.write(sys.stdin.read().replace("<", "\n<"))'

Answer 2

If you have GNU grep, this might work for you:

grep -Po '<.*?>[^<]*' index.html

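This should pass through all of the HTML, but each tag will start at the beginning of a line, possibly followed by non-tag text on the same line.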

If you only want the tags:

grep -Po '<.*?>' index.html
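If your grep lacks -P (PCRE), a sketch of the same two commands without the lazy .*? is shown below. It assumes a grep that supports -o (GNU and BSD grep both do, although POSIX does not require it); the sample HTML is made up for illustration.

```shell
# "[^>]*" matches anything except ">", so "<[^>]*>" stops at the first ">",
# just like the non-greedy "<.*?>" does with grep -P.

# Tags plus the non-tag text that follows each one:
printf '<p>Hi <b>there</b></p>\n' | grep -o '<[^>]*>[^<]*'

# Tags only:
printf '<p>Hi <b>there</b></p>\n' | grep -o '<[^>]*>'
```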

However, you should be aware that parsing HTML with regular expressions is not a good idea.

Answer 3

Does this work for you?

awk -F"><" -v OFS=">\n<" '{$1=$1}1' index.html

[jaypal:~/Temp] echo "<hello><world>" | awk -F"><" -v OFS=">\n<" '{$1=$1}1';
<hello>
<world>

You can put a regex /.../ in front of the awk {} action so that it only applies to the lines you want.
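A minimal sketch of that pattern-action form, where /hello/ is a hypothetical pattern standing in for whatever selects the lines you care about:

```shell
# Only lines matching /hello/ get $1 reassigned, which makes awk rebuild the
# record with OFS and therefore split it. The trailing "1" is an always-true
# pattern that prints every line, split or not.
printf '<hello><world>\n<a><b>\n' |
  awk -F'><' -v OFS='>\n<' '/hello/ { $1 = $1 } 1'
```

Here the first line is split into <hello> and <world>, while <a><b> does not match and is printed unchanged.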

Answer 4

Where you put the newline matters a lot. You can also escape the "<".

tr '\/<' '\/<\n' < index.html

This works as well:

tr '<' '<\n' < index.html