Question

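I want to use grep to find out whether, and where, an HTML class is used across a bunch of files. The regex pattern should match not only <p class="foo"> but also <p class="foo bar foo-bar">.

So far I can find class="foo" with the example below, but I can't get it to work with multiple class names: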

grep -Ern "class=\"result+(\"| )" *

Any suggestions? Thanks! Mike

Answer 1

How about something like this:

grep -Erno 'class[[:blank:]]*=[[:blank:]]*"[^"]+"' *

This also allows for extra whitespace around the equals sign, and should give output similar to:

1:class="foo bar baz"
3:class = "haha"
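If the goal is to find one particular class rather than every class attribute, the pattern can be narrowed so the name only matches as a whole, whitespace-delimited token. The following is a minimal sketch, not a definitive solution: the sample file, the class name foo, and the pattern itself are assumptions made for illustration, and it only handles double-quoted attribute values.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demonstration.
dir=$(mktemp -d)
cat > "$dir/sample.html" <<'EOF'
<p class="foo">one</p>
<p class="foo bar foo-bar">two</p>
<p class="foo-bar">three</p>
<p class = "bar foo">four</p>
<p class="barfoo">five</p>
EOF

# Match "foo" only when it is a complete token inside class="...":
# it must be preceded by the opening quote or whitespace, and
# followed by whitespace or the closing quote.
pattern='class[[:space:]]*=[[:space:]]*"([^"]*[[:space:]])?foo([[:space:]][^"]*)?"'
matches=$(grep -Eno "$pattern" "$dir/sample.html")
printf '%s\n' "$matches"

rm -rf "$dir"
```

Lines 3 and 5 of the sample are skipped because foo-bar and barfoo merely contain the string foo; they are different class names.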

To see all the classes in use, you can pipe the output above into the following:

cut -f2 -d'"' | xargs -n1 | sort | uniq
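Put together end to end, a pipeline of this kind might look like the sketch below. The two sample files are hypothetical, and -h is added (an assumption beyond the original command) so that filenames do not appear in the output. Each class name is placed on its own line before sorting, since sort and uniq work line by line.

```shell
#!/bin/sh
# Hypothetical sample files, created only for this demonstration.
dir=$(mktemp -d)
cat > "$dir/a.html" <<'EOF'
<p class="foo bar">one</p>
<div class = "haha">two</div>
EOF
cat > "$dir/b.html" <<'EOF'
<p class="bar baz">three</p>
EOF

# 1. grep prints each class attribute on its own line (-o), without filenames (-h).
# 2. cut keeps the text between the first pair of double quotes.
# 3. xargs -n1 echoes one class name per line.
# 4. sort -u sorts the names and drops duplicates.
classes=$(grep -Erho 'class[[:blank:]]*=[[:blank:]]*"[^"]+"' "$dir" |
  cut -d'"' -f2 |
  xargs -n1 |
  sort -u)
printf '%s\n' "$classes"

rm -rf "$dir"
```

For the sample files this lists bar, baz, foo, and haha, one per line.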

Answer 2

Depending on which metacharacters your grep supports, try:

'class=\"([a-z]+ ?)+\"'
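Written out as a full command, a pattern of this shape could be tried as in the sketch below. Note that [a-z] does not cover hyphens, so a name such as foo-bar is not matched; the widened character class in the second command is an assumption added here for comparison, not part of the suggestion above.

```shell
#!/bin/sh
# Two hypothetical input lines: one with plain names, one with a hyphenated name.
sample='<p class="foo bar">
<p class="foo-bar">'

# Lowercase letters only, names separated by single spaces.
strict=$(printf '%s\n' "$sample" | grep -Ec 'class="([a-z]+ ?)+"')

# Assumed widening: also allow digits, underscores, and hyphens in a name.
wide=$(printf '%s\n' "$sample" | grep -Ec 'class="([a-z0-9_-]+ ?)+"')

# -c counts matching lines.
echo "strict: $strict, wide: $wide"
```

Here the strict pattern matches one of the two lines and the widened one matches both.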

Answer 3

Use an HTML parser instead. It's not that hard.

Edit: here is an example in PowerShell:

Get-ChildItem -Recurse *.html | where { 
    ([xml](Get-Content $_)).SelectNodes( '//*' ) | where { $_.GetAttribute( "class" ).Contains( "foo" ) } 
}

Answer 4

Regular expressions are a very poor tool for parsing HTML. Try looking at SimpleXML (http://php.net/manual/en/book.simplexml.php). Rolling your own regex against HTML is asking for trouble.