Question

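What I have working at the moment matches on spaces only. I would like it to also match any HTML tags and punctuation marks as separate elements.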

var text = "<div>The Quick brown fox ran through it's forest darkly!</div>";

//this one uses spaces only but will match "darkly!</div>" as 1 element
console.log(text.match(/\S+/g));

//outputs: ["<div>The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly!</div>"]

I would like a match expression that outputs:

["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]

Here is a fiddle: https://jsfiddle.net/scottpatrickwright/og0bd0xj/2/

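Ultimately I will store all the matches in an array, do some processing (wrap each whole word in a span tag with some conditional data attributes) and re-output the original string in its altered form. I mention this because a solution that does not keep the string more or less intact would not work for me.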

I have found many near-miss solutions online, but my regex skills are not good enough to adapt their work.

Answer 1

How about:

/(<\/?)?[\w']+>?|[!\.,;\?]/g

Demo here.
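As a quick check, the expression can be run against the sample string from the question. This is only a sketch: the variable names are illustrative, and the pattern is copied unchanged from this answer.

```javascript
// Sample string from the question.
const text = "<div>The Quick brown fox ran through it's forest darkly!</div>";

// Pattern from this answer: an optional "<" or "</", a run of word
// characters or apostrophes, an optional ">", OR a single punctuation mark.
const tokens = text.match(/(<\/?)?[\w']+>?|[!\.,;\?]/g);

console.log(tokens);
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]
```

Note that the pattern only recognizes bare tags such as <div> and </div>. A tag with attributes (for example <div class="x">) would be broken into pieces, so the pattern would likely need extending for real markup.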

Answer 2

You can add spaces before and after the HTML tags, so that each tag is separated from the adjacent words.
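One way the space-padding idea might look in JavaScript is sketched below. The helper name padAndSplit and the punctuation set [!.,;?] are assumptions made for illustration, not something given in this answer. The sketch pads every tag and every punctuation mark with spaces and then reuses the whitespace-based match from the question.

```javascript
// Hypothetical helper: pad tags and punctuation with spaces,
// then collect every run of non-whitespace characters.
function padAndSplit(text) {
  return text
    .replace(/<[^>]+>/g, " $& ")   // surround each <...> tag with spaces
    .replace(/[!.,;?]/g, " $& ")   // surround each punctuation mark with spaces
    .match(/\S+/g) || [];          // match() returns null when nothing is found
}

const text = "<div>The Quick brown fox ran through it's forest darkly!</div>";
console.log(padAndSplit(text));
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]
```

This simple version assumes tags contain no spaces or punctuation. A tag with attributes would be split apart by the final whitespace match, so it is probably only suitable for plain tags like the ones in the question.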

Answer 3

My suggestion is:

console.log(text.match(/(<.+?>|[^\s<>]+)/g));

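In this regex, (<.+?>|[^\s<>]+), we specify two alternatives to capture:

<.+?> matches every <text> string, i.e. each tag (the match is non-greedy, so it stops at the first >)
[^\s<>]+ matches every run of characters that contains no whitespace, < or >

In the second part you can add any other characters you want to leave out of the matches.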
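To make the behaviour concrete, the sketch below runs this pattern on the sample string and then tries one possible variation. The variation is an assumption, not part of this answer: it adds the punctuation set [!.,;?] to the excluded characters and matches each mark through a third alternative, so punctuation is kept as its own element instead of being dropped.

```javascript
const text = "<div>The Quick brown fox ran through it's forest darkly!</div>";

// Pattern from this answer: tags are separated, but punctuation
// stays attached to the preceding word.
const original = text.match(/(<.+?>|[^\s<>]+)/g);
console.log(original);
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly!", "</div>"]

// Assumed variation: exclude punctuation from the word alternative
// and match each punctuation mark on its own.
const tokens = text.match(/<.+?>|[^\s<>!.,;?]+|[!.,;?]/g);
console.log(tokens);
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]
```

Because the tag alternative comes first, punctuation inside a tag (for example in an attribute value) stays part of the tag rather than being split out.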

Use a regular expression to split any string into an array of whole words, punctuation, and HTML code
