Question

I have a regex that matches with a group, but I don't know how to extract just the part I need.

I have this string:

<myTagName >
<mySecondTagName >

I have this regex:

^(\s|\d).*?\<+([a-zA-Z])([0-9]|_|-|[a-zA-Z])*+(\s|\d)+(>)

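I then want to get the tag name, but only when the conditions match. How do I specify that capture group?

What I mean is: I want to tell JavaScript to return the tag name only when it is preceded by certain characters and followed by certain others. The rules are already spelled out in the regex above, but they are unconditional and use no lookaround.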

Answer 1

First, you don't want to parse XML/HTML with a regex. Just don't (see "RegEx match open tags except XHTML self-contained tags").

Second, here are some thoughts on your regex:

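~~.*? This is useless: you applied two quantifiers to the dot.~~
\<+ Do you really want to match something like <<<tag>?
([0-9]|_|-|[a-zA-Z]) This can be simplified to ([-0-9_a-zA-Z])
~~*+ Again, a double quantifier~~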
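
To back up the two points that were not struck out, here is a minimal sketch. It assumes that checking every printable ASCII character is a good enough comparison, and the variable names are made up for illustration.

```javascript
// The alternation from the question and the simplified character class.
const alternation = /^([0-9]|_|-|[a-zA-Z])$/;
const charClass = /^([-0-9_a-zA-Z])$/;

// Compare both patterns on every printable ASCII character.
let sameResults = true;
for (let code = 32; code < 127; code++) {
  const ch = String.fromCharCode(code);
  if (alternation.test(ch) !== charClass.test(ch)) {
    sameResults = false;
  }
}
console.log(sameResults); // true

// <+ accepts any run of "<", so malformed input is matched as well.
const repeatedOpen = /<+[a-zA-Z]/.test("<<<tag>");
console.log(repeatedOpen); // true
```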

Answer 2

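You won't get any captures unless the whole pattern matches.

However, you don't seem to be capturing the whole tag name in a single group: $1 is the leading whitespace character or digit, $2 is the first letter of the tag name, $3 is the repeated following character (which ends up being the last character of the tag name, because the * is outside the ()), $4 is the whitespace character or digit at the end of the tag (same problem, with the + outside the ()), and $5 is the final >.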
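
A small sketch of that behaviour in JavaScript. Two assumptions here: the possessive *+ is replaced by a plain *, because JavaScript's RegExp rejects *+ as a syntax error, and the sample input gets a leading space so that the initial (\s|\d) has something to match.

```javascript
// Same structure as the question's pattern, with * instead of *+ (assumption).
const original = /^(\s|\d).*?<+([a-zA-Z])([0-9]|_|-|[a-zA-Z])*(\s|\d)+(>)/;

// Leading space added so that the first group can match (assumption).
const input = " <myTagName >";
const groups = original.exec(input);

console.log(groups[2]); // "m"  (only the first letter)
console.log(groups[3]); // "e"  (a repeated group keeps its last iteration)
console.log(groups[5]); // ">"
```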

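You are probably trying to do something more like this:

^(\s|\d).*?<([A-Za-z][A-Za-z0-9_-]*+)((?:\s|\d)+)> where $1 is the first whitespace character or digit, $2 is the whole tag name, and $3 is the final run of whitespace or digits. (Note the use of a non-capturing group (?: ) inside $3.)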
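
Here is roughly how that could be used from JavaScript. This is only a sketch: getTagName is a hypothetical helper name, and the pattern uses * where the one above uses *+, since JavaScript's RegExp does not accept possessive quantifiers.

```javascript
// The pattern above with a plain * in place of *+ (assumption for JavaScript).
const fixed = /^(\s|\d).*?<([A-Za-z][A-Za-z0-9_-]*)((?:\s|\d)+)>/;

// Hypothetical helper: returns the tag name, or null when the line does not match.
function getTagName(line) {
  const m = fixed.exec(line);
  return m === null ? null : m[2];
}

console.log(getTagName(" <myTagName >")); // "myTagName"
console.log(getTagName("<myTagName >"));  // null, no leading whitespace or digit
```

The second call shows the first point of this answer: when the whole pattern fails, there are no groups to read at all.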

Answer 3

Actually, this is all you need:

<([a-zA-Z][\w-]*)[\s]*>

The string in the first capture group is your tag name.
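
For example, applied to the two lines from the question. This is a sketch that assumes the lines are joined with a newline and that String.prototype.matchAll is available; the g flag is added so every tag is found.

```javascript
// The pattern above, with the g flag added so matchAll can find every tag.
const tagPattern = /<([a-zA-Z][\w-]*)[\s]*>/g;

// The question's two lines, joined with a newline (assumption).
const text = "<myTagName >\n<mySecondTagName >";

// Collect the first capture group of each match.
const names = Array.from(text.matchAll(tagPattern), (m) => m[1]);
console.log(names); // [ 'myTagName', 'mySecondTagName' ]
```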

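Comments on your regex:

The leading ^[\s\d]* means that only whitespace or digits are allowed before the actual tag... why digits?

Some of the original constructs don't make much sense for the matching behavior you need: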

(\s|\d)+ // This means: match at least one whitespace character or digit and put it in a group

.*?\<    // Non-greedy "any character" until < is found => use [^<]* instead, it performs better

<+       // Means at least one <, and is here just as a workaround for the unnecessary non-greedy match-all

([a-zA-Z])([0-9]|_|-|[a-zA-Z])*+  // Here you wanted to say "a string that starts with a letter", but you actually have two capture groups, and the *+ makes no sense to me (at least one word?)

(\s|\d)+   // At least one whitespace character or digit? Why a digit? Does there really HAVE to be a space? Do you really want to capture it?

(>)       // You want to capture the final >? What for?

Regex to get the matching group
