如何用REGEX匹配JSON-LD标记?

时间:2019-05-23 21:21:39

标签: html regex json-ld

我试图匹配整个json-ld条目,而不管特定的标记,换行符等等。

为什么没有这么简单的事情:

    \<script type\=\"application\/ld\+json\"\>(.*?)\<\/script\>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Recipe",
  "author": "John Smith",
  "cookTime": "PT1H",
  "datePublished": "2009-05-08",
  "description": "This classic banana bread recipe comes from my mom -- the walnuts add a nice texture and flavor to the banana bread.",
  "image": "bananabread.jpg",
  "recipeIngredient": [
    "3 or 4 ripe bananas, smashed",
    "1 egg",
    "3/4 cup of sugar"
  ],
  "interactionStatistic": {
    "@type": "InteractionCounter",
    "interactionType": "http://schema.org/Comment",
    "userInteractionCount": "140"
  },
  "name": "Mom's World Famous Banana Bread",
  "nutrition": {
    "@type": "NutritionInformation",
    "calories": "240 calories",
    "fatContent": "9 grams fat"
  },
  "prepTime": "PT15M",
  "recipeInstructions": "Preheat the oven to 350 degrees. Mix in the ingredients in a bowl. Add the flour last. Pour the mixture into a loaf pan and bake for one hour.",
  "recipeYield": "1 loaf",
  "suitableForDiet": "http://schema.org/LowFatDiet"
}
</script>

我希望输出内容是标签中的所有内容。

1 个答案:

答案 0 :(得分:0)

在这里,我们可能想用一个开放的json / ld标签作为开始边界来绑定表达式,然后收集所有字符和换行符,最后添加一个右边界并使用关闭脚本标签,可能类似于:

(<script type="application\/ld\+json">)([\s\S]*)(<\/script>)

^(<script type="application\/ld\+json">)([\w\W]*)(<\/script>)$

DEMO

但是,也许对用户正则表达式而言,这并不是最好的主意,并且应该有很多方法可以使操作起来容易得多。

测试

const regex = /^(<script type="application\/ld\+json">)([\w\W]*)(<\/script>)$/gm;
const str = `<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Recipe",
  "author": "John Smith",
  "cookTime": "PT1H",
  "datePublished": "2009-05-08",
  "description": "This classic banana bread recipe comes from my mom -- the walnuts add a nice texture and flavor to the banana bread.",
  "image": "bananabread.jpg",
  "recipeIngredient": [
    "3 or 4 ripe bananas, smashed",
    "1 egg",
    "3/4 cup of sugar"
  ],
  "interactionStatistic": {
    "@type": "InteractionCounter",
    "interactionType": "http://schema.org/Comment",
    "userInteractionCount": "140"
  },
  "name": "Mom's World Famous Banana Bread",
  "nutrition": {
    "@type": "NutritionInformation",
    "calories": "240 calories",
    "fatContent": "9 grams fat"
  },
  "prepTime": "PT15M",
  "recipeInstructions": "Preheat the oven to 350 degrees. Mix in the ingredients in a bowl. Add the flour last. Pour the mixture into a loaf pan and bake for one hour.",
  "recipeYield": "1 loaf",
  "suitableForDiet": "http://schema.org/LowFatDiet"
}
</script>`;
const subst = `$2`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

DEMO

RegEx

如果不需要此表达式,可以在regex101.com中对其进行修改或更改。

RegEx电路

jex.im可视化正则表达式:

enter image description here