Question

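I have a lot of HTML files in one folder. I need to somehow remove <div id="user-info" ...>...</div> from all of them. As far as I know, I need a Perl script for this, but I don't know Perl well enough to write it. Can someone help me?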

Here is what the "bad" code looks like:

<div id="user-info" class="logged-in">
    <a class="icon icon-key-delete" href="https://test.dev/login.php?0,logout=1">Log Out</a>
    <a class="icon icon-user-edit" href="https://test.dev/control.php">Control Center</a>


</div> <!-- end of div id=user-info -->

Thanks in advance!

Answer 1

Using XML::XSH2:

for { glob '*.html' } {
    open :F html (.) ;
    delete //div[@id="user-info" and @class="logged-in"] ;
    save :b ;
}

Answer 2

perl -0777 -i.withdiv -pe 's{<div[^>]+?id="user-info"[^>]*>.*?</div>}{}gsmi;' test.html

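-0777 means do not split the input on anything, so the whole file is slurped at once (instead of line by line, which is the default for -p).

-i.withdiv means edit the file in place, keeping the original as a backup with .withdiv appended to its name (by default -p only prints).

-p means loop over the input line by line (or file by file, since we are slurping), run the given code (see -e) on each chunk, and print the result.

-e expects the code to run.

See man perlrun or perldoc perlrun for details.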

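Here is another solution, which will look somewhat familiar to anyone who knows jQuery, because the syntax is similar. It uses Mojolicious's ojo module to load the HTML content into a Mojo::DOM object, transform it, and then print the transformed version: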

perl -Mojo -MFile::Slurp -E 'for (@ARGV) { say x(scalar(read_file $_))->at("#user-info")->replace("")->root; }' test.html test2.html test*.html

To replace the content in place:

perl -Mojo -MFile::Slurp -E 'for (@ARGV) { write_file( $_, x(scalar(read_file $_))->at("#user-info")->replace("")->root ); }' test.html

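Note that this does not JUST remove the div; it also rewrites the content the way Mojo's Mojo::DOM module serializes it, so the order of tag attributes may change. Specifically, I saw <div id="user-info2" class="logged-in"> rewritten as <div class="logged-in" id="user-info2">.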

Mojolicious requires at least perl 5.10, but beyond that it has no non-core requirements.

Perl script to search and replace multiple lines in multiple HTML files
