How do I extract the first three sentences from a paragraph with XPath?

Asked: 2019-04-24 18:14:28

Tags: xpath

I need to use XPath to grab the first three sentences of a paragraph (if there are that many).

I have already isolated the paragraph I want to work with:

//h3[contains(., 'Synopsis')]/following-sibling::p[1]

This returns a plain paragraph with no inline formatting:

What do we do when the world's walls - its family structures, its value-systems, it political forms - crumble? The central character of this novel, 'Moor' Zogoiby, only son of a wealthy, artistic-bohemian Bombay family, finds himself in such a moment of crisis. His mother, a famous painter and an emotional despot, worships beauty, but Moor is ugly, he has a deformed hand. Moor falls in love, with a married woman; when their secret is revealed, both are expelled; a suicide pact is proposed, but only the woman dies. Moor chooses to accept his fate, plunges into a life of depravity in Bombay, then becomes embroiled in a major financial scandal. The novel ends in Spain, in the studio of a painter who was a lover of Moor's mother: in a violent climax Moor has, one more, to decide whether to save the life of his lover by sacrificing his own. 

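I only want the first three sentences. I'm willing to be lenient and ignore the first question mark, so I just want everything up to and including the third period.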

1 answer:

Answer 0 (score: 0)

concat(
  substring-before(//h3[contains(., 'Synopsis')]/following-sibling::p[1]/text(), '.'),
  '.',
  substring-before(substring-after(//h3[contains(., 'Synopsis')]/following-sibling::p[1]/text(), '.'), '.'),
  '.',
  substring-before(substring-after(substring-after(//h3[contains(., 'Synopsis')]/following-sibling::p[1]/text(), '.'), '.'), '.'),
  '.'
)
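The Python sketch below shows what the nested expression computes. It mirrors XPath 1.0's substring-before() and substring-after() with plain string slicing and composes them the same way the concat() call does. The Python function names are made up for illustration and are not part of any library. As in XPath 1.0, both helpers return an empty string when the separator is not found.

```python
def substring_before(s: str, sep: str) -> str:
    """Like XPath 1.0 substring-before(): text before the first sep, or '' if absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s: str, sep: str) -> str:
    """Like XPath 1.0 substring-after(): text after the first sep, or '' if absent."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def first_three_sentences(text: str) -> str:
    """Hypothetical helper reproducing the answer's concat(...) expression step by step."""
    first = substring_before(text, ".")
    rest1 = substring_after(text, ".")   # everything after the 1st period
    second = substring_before(rest1, ".")
    rest2 = substring_after(rest1, ".")  # everything after the 2nd period
    third = substring_before(rest2, ".")
    # Each piece had its period cut off, so concat() adds it back after every piece.
    return first + "." + second + "." + third + "."


print(first_three_sentences("A b? C d. E f. G h. I j."))
```

Note that substring-before() yields an empty string once no period is left. A paragraph with fewer than three periods therefore comes back with stray trailing dots (for example "One. Two." becomes "One. Two.."). The expression only gives a clean result when at least three periods are present.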

(Doing crazy things with XPath is fun, but in real life I wouldn't use it for a task like this unless there were absolutely no other option.)
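Following the remark above, here is a rough sketch of the alternative: select the paragraph's text with XPath, then cut it at the third period in ordinary code. Two details are assumptions for illustration: the function name up_to_nth_period, and the choice to return the whole text unchanged when it has fewer than n periods.

```python
def up_to_nth_period(text: str, n: int = 3) -> str:
    """Return text up to and including the n-th '.', or all of text if it has fewer."""
    end = -1
    for _ in range(n):
        nxt = text.find(".", end + 1)  # search just past the previous period
        if nxt == -1:
            return text  # fewer than n sentences: keep whatever exists
        end = nxt
    return text[: end + 1]


print(up_to_nth_period("A b? C d. E f. G h. I j."))
```

Unlike the pure XPath version, this handles the "if they exist" case without appending extra periods. The question mark is ignored simply because only '.' is searched for.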