Replace the second occurrence of a substring using a regular expression

Time: 2014-06-16 17:02:40

Tags: php regex

I have a string in XML format, as shown below.

How can I find the second occurrence of <para> and replace it with <para align='left'>?

$str = "<para>From the New York Times</para>

<para>Instacart, a two-year-old grocery delivery company, announced a $44 million round of financing on Monday led by Andreessen Horowitz. Three venture capital firms that previously invested in the company, Sequoia Capital, Khosla Ventures and Canaan Partners, participated in the latest round.</para>

<para>The company, which is based in San Francisco, lets customers shop online from grocery stores in their area. The orders are filled by other people who have signed up to be shoppers and who receive a cut of the delivery fees. Information about a store’s inventory comes from store managers and from the shoppers. The company says it can have groceries delivered within an hour.</para>";

2 answers:

Answer 0 (score: 3)

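Use a DOM parser with an XPath expression for this. The expression //para[2] selects the second <para> node. Strictly speaking, //para[2] selects every <para> that is the second <para> child of its parent; in this input all <para> elements are siblings, so it matches exactly one node. If the elements could be nested differently, (//para)[2] selects the second <para> in document order. Once the node is selected, you can change it however you like with the DOM API. In this case you only need setAttribute() to set the align attribute to left.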

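$dom = new DOMDocument;

// Collect libxml warnings internally instead of printing them
// (loadHTML() warns about <para>, which is not an HTML tag)
libxml_use_internal_errors(true);
// Note: without a charset declaration, loadHTML() assumes ISO-8859-1,
// so non-ASCII characters such as ’ may come out garbled
$dom->loadHTML($str);

$xpath = new DOMXPath($dom);

$secondPara = $xpath->query('//para[2]');
// Guard: item(0) is null when nothing matched
if ($secondPara->length > 0) {
    $secondPara->item(0)->setAttribute('align', 'left');
}

// Also prints the <html><body> wrapper that loadHTML() adds
echo $dom->saveHTML();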
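A variant sketch, not part of the original answer: because the input is XML-like, it can also be parsed with loadXML(), which reads UTF-8 by default and keeps the <para> elements as real XML. The fragment has several top-level elements, so this sketch assumes a wrapper element (the name root is an arbitrary choice) and serializes only its children afterwards. It uses (//para)[2], the second <para> in document order, and a shortened sample string in place of the question's text.

```php
<?php
// Shortened sample input (assumption: stands in for the question's $str)
$str = "<para>First</para>\n<para>Second</para>\n<para>Third</para>";

$dom = new DOMDocument();
// The fragment has no single root, so wrap it in an arbitrary <root> element
$dom->loadXML('<root>' . $str . '</root>');

$xpath = new DOMXPath($dom);
// (//para)[2] = the second <para> in document order
$nodes = $xpath->query('(//para)[2]');
if ($nodes->length > 0) {
    $nodes->item(0)->setAttribute('align', 'left');
}

// Serialize only the children of <root>, so the wrapper is not in the output
$result = '';
foreach ($dom->documentElement->childNodes as $child) {
    $result .= $dom->saveXML($child);
}
echo $result;
```

saveXML() writes attributes with double quotes, so the second element comes out as <para align="left">. If the real text can contain a bare &, loadXML() will reject it, because that is not well-formed XML.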

Answer 1 (score: 2)

Use this regular expression (see demo):

(?s)\A.*?<para>.*?\K<para>

In your PHP code (the question's string is in $str):

$regex = '~(?s)\A.*?<para>.*?\K<para>~';
$replaced = preg_replace($regex, "<para align='left'>", $str);
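For comparison, the same replacement can be done without a regular expression. This is a minimal sketch, not from either answer, using strpos() and substr_replace() on a shortened sample string: it finds the first <para>, searches again just past it, and replaces only that second occurrence, leaving the string unchanged when there are fewer than two.

```php
<?php
// Shortened sample input (assumption: stands in for the question's $str)
$str = "<para>One</para>\n<para>Two</para>\n<para>Three</para>";
$needle = '<para>';

// Offset of the first occurrence, then search again after it
$first = strpos($str, $needle);
$second = $first === false ? false : strpos($str, $needle, $first + strlen($needle));

$replaced = $second === false
    ? $str // fewer than two occurrences: leave the string unchanged
    : substr_replace($str, "<para align='left'>", $second, strlen($needle));
echo $replaced;
```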

Explanation of the regex:

(?s)                     # inline modifier: . also matches newlines
                         # (still case-sensitive; ^ and $ match normally)
\A                       # the beginning of the string
.*?                      # any character (0 or more times (matching
                         # the least amount possible))
<para>                   # '<para>'
.*?                      # any character (0 or more times (matching
                         # the least amount possible))
\K                       # reset the match start: the text matched so far
                         # is not part of the match, so it is not replaced
<para>                   # '<para>'
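The same \A ... \K idea extends to the n-th occurrence: repeat the "skip one occurrence" part n - 1 times with a quantifier. The function replaceNth() below is a hypothetical helper, not from the original answer. It escapes the needle with preg_quote() and uses preg_replace_callback() so that $ or \ in the replacement text is inserted literally rather than treated as a backreference.

```php
<?php
// Hypothetical helper (not from the original answer): replace only the $n-th
// occurrence of $needle in $subject, using the same \A ... \K technique.
function replaceNth(string $subject, string $needle, string $replacement, int $n): string
{
    if ($n < 1) {
        return $subject; // no meaningful occurrence to replace
    }
    $quoted = preg_quote($needle, '~');
    // \A(?:.*?NEEDLE){n-1} skips the first n-1 occurrences, .*? moves up to
    // the next one, and \K drops everything before it from the match.
    $regex = '~\A(?:.*?' . $quoted . '){' . ($n - 1) . '}.*?\K' . $quoted . '~s';
    // The callback returns the replacement verbatim, so $ and \ are not special
    $result = preg_replace_callback($regex, fn(array $m): string => $replacement, $subject, 1);
    return $result ?? $subject; // null signals a PCRE error; fall back to the input
}

echo replaceNth("<para>A</para><para>B</para><para>C</para>", '<para>', "<para align='left'>", 2);
```

With n = 2 this builds the same pattern as the answer, apart from the (?:...){1} group; if there are fewer than n occurrences, the subject is returned unchanged.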