How to get the text between custom dynamic HTML tags that have no closing tags

Asked: 2018-08-25 18:15:59

Tags: php html regex parsing

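My text is divided by multiple custom tags whose names are partly dynamic, and these tags have no closing tags.

What I need is to get each individual section of text between the custom tags, including the tags themselves.

For the last section, I can only take whatever follows the final tag, because there is no closing tag after it.

I have seen many similar questions, but none of them was enough to solve my problem.

Example: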

<*fixedTagName|Dynamic part of tag name> // * and | are included in fixed part of tag name
                                   //dynamic part can have spaces between words

  Random text I need to get of unknown length

  some paragraphs of text can start like this(look bellow)

  » name: value
  » name: value

<*fixedTagName|Dynamic part of tag>

  More random text I need to get

<*fixedTagName|Dynamic part of tag>

  Final part of random text I need to get

2 answers:

Answer 0 (score: 1)

To get the text between regex matches, you can use the preg_split function:

$result = preg_split('/<\*[^|]+\|[^>]+>/', $input);

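In this regular expression:

  • <\* matches the literal characters <*;
  • [^|]+ matches any character other than |, one or more times;
  • \| matches a literal |;
  • [^>]+ matches any character other than >, one or more times;
  • > matches a literal >.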
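
To see the pieces of the pattern at work, here is a minimal sketch that runs it against a single tag with preg_match. The two named capture groups (fixed and dynamic) and the variable names are additions for illustration only; they are not part of the pattern used in this answer.

```php
<?php
// Same pattern as in the answer, with two named capture groups
// added purely for illustration (an assumption, not the original regex).
$pattern = '/<\*(?<fixed>[^|]+)\|(?<dynamic>[^>]+)>/';

$tag = '<*fixedTagName|Dynamic part of tag name>';

if (preg_match($pattern, $tag, $m) === 1) {
    echo $m['fixed'], "\n";    // the part between <* and |
    echo $m['dynamic'], "\n";  // the part between | and >
}
```

Because [^|]+ and [^>]+ are negated character classes, spaces inside the dynamic part are matched without any special handling.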

Given this input to the preg_split call above:

$input = <<<EOL
<*fixedTagName|Dynamic part of tag name> // * and | are included in fixed part of tag name
                                   //dynamic part can have spaces between words

  Random text I need to get of unknown length

  some paragraphs of text can start like this(look bellow)

  » name: value
  » name: value

<*fixedTagName|Dynamic part of tag>

  More random text I need to get

<*fixedTagName|Dynamic part of tag>

  Final part of random text I need to get
EOL;

$result will be an array of strings like this:

Array
(
    [0] => 
    [1] =>  // * and | are included in fixed part of tag name
                                   //dynamic part can have spaces between words

  Random text I need to get of unknown length

  some paragraphs of text can start like this(look bellow)

  » name: value
  » name: value


    [2] => 

  More random text I need to get


    [3] => 

  Final part of random text I need to get
)
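
Note that the split above discards the tags, while the question asks for each section including its tag. One possible variation, sketched here under the assumption that every section should begin with its own tag, is to split on a zero-width lookahead so the tag is not consumed. The shortened input and the variable name $sections are illustrative.

```php
<?php
// Shortened, hypothetical input in the same shape as the question's example.
$input = <<<EOL
<*fixedTagName|First tag>
  Random text
  » name: value
<*fixedTagName|Second tag>
  More random text
<*fixedTagName|Third tag>
  Final part of random text
EOL;

// (?=...) is a lookahead: the split happens just before each tag,
// and the tag itself stays attached to the text that follows it.
// PREG_SPLIT_NO_EMPTY drops the empty piece before the first tag.
$sections = preg_split(
    '/(?=<\*[^|]+\|[^>]+>)/',
    $input,
    -1,
    PREG_SPLIT_NO_EMPTY
);

foreach ($sections as $i => $section) {
    echo "[$i] ", trim($section), "\n\n";
}
```

The last section needs no special case here: since nothing follows the final tag, its section simply runs to the end of the input.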

Answer 1 (score: 0)

I think this StackOverflow answer explains well enough how to do this: https://stackoverflow.com/a/3577662/7578179