How do I get the content of an HTML element with XPath when the element has no ID?

Time: 2016-01-08 13:10:33

Tags: html xpath

I'm trying to use XPath to find an element and get its text value. Please bear with me and help me solve this.

Visit: Click here

1. In `<div class="medium-8 columns">`, I need to extract the paragraph text up to "Further History" (that is, stop at "Further History" without including it).

2. In `<div class="medium-8 columns">`, I need to extract the paragraph text after "Further History" (not including "Further History").

I'm using the following XPath expression, but it returns nothing:

  

(//STRONG[not(contains(text(),'Further History'))]/following-sibling::text() | //STRONG[not(contains(text(),'Further History'))]/../following-sibling::p/text()) | //div[contains(@class,'articlecontent')]

1 answer:

Answer 0 (score: 1)

HTML may be case-insensitive, but XML (and therefore XPath) is case-sensitive: `STRONG` is not the same as `strong`, and the HTML you linked contains only `strong`.
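To see the case-sensitivity point in isolation, here is a minimal sketch in Java. The question names no host language, so using Java's built-in `javax.xml.xpath` is an assumption, as is the markup: a tiny, hypothetical, well-formed stand-in for the linked page, parsed as XML. The class name `CaseSensitivityDemo` and the helper `select` are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CaseSensitivityDemo {
    // Hypothetical, well-formed stand-in for the linked page (the real markup is not reproduced here).
    static final String SAMPLE =
            "<html><body><div class=\"medium-8 columns\">"
            + "<p>First paragraph.</p>"
            + "<p><strong>Further History</strong></p>"
            + "<p>Later paragraph.</p>"
            + "</div></body></html>";

    // Parses the markup as XML and returns the text content of every node the expression selects.
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Name tests are case-sensitive: the upper-case test matches no element here.
        System.out.println(select(SAMPLE, "//STRONG")); // prints []
        // The lower-case test matches the single <strong> element.
        System.out.println(select(SAMPLE, "//strong")); // prints [Further History]
    }
}
```

Real pages are often not well-formed XML, so in practice an HTML-aware parser may be needed; the sketch sidesteps that by using hand-written, well-formed markup.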

A useful XPath expression for retrieving the text you're interested in might be

//div[@class="medium-8 columns"]/p[following-sibling::p/strong]/text()

which means:

//div                           select all `div` elements, anywhere in the document
[@class="medium-8 columns"]     but only if they have a `class` attribute whose value is 
                                equal to "medium-8 columns"
/p                              of those `div` elements select all `p` child elements
[following-sibling::p/strong]   but only if they have a following sibling `p` which has a
                                `strong` element as a child
/text()                         of the remaining `p` elements, select their child text nodes

It returns the following (individual results are separated by lines of dashes):

Tim Bajarin is recognized as one of the leading industry
consultants, analysts and futurists, covering the field of
personal computers and consumer technology. Mr. Bajarin has
been with Creative Strategies since 1981 and has served as a
consultant to most of the leading hardware and software
vendors in the industry including IBM, Apple, Xerox, Hewlett
Packard/Compaq, Dell, AT&T, Microsoft, Polaroid, Lotus,
Epson, Toshiba and numerous others.
-----------------------
His articles and/or analysis have appeared in USA Today, Wall
Street Journal, The New York Times, Time and Newsweek
magazines, BusinessWeek and most of the leading business and
trade publications. He has appeared as a business analyst
commenting on the computer industry on all of the major
television networks and was a frequent guest on PBS’ The
Computer Chronicles.
-----------------------
Mr. Bajarin has been a columnist for US computer industry
publications such as PC Week and Computer Reseller News and
wrote for ABCNEWS.COM for two years and Mobile Computing for
10 years. His columns currently appear in Asia Computer
Weekly, Personal Computer World (UK), and Microscope (UK) as
well as Mobile Enterprise Magazine. His various columns and
analyses are syndicated in over 30 countries.
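The expression can be tried on a small sample. This sketch assumes Java's `javax.xml.xpath` and a hypothetical, well-formed stand-in for the page in which "Further History" is a `<strong>` inside its own `<p>`; that layout is an assumption about the linked markup. The names `BeforeFurtherHistory`, `BEFORE`, and `select` are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BeforeFurtherHistory {
    // Hypothetical sample: two paragraphs, the "Further History" heading paragraph, two more paragraphs.
    static final String SAMPLE =
            "<html><body><div class=\"medium-8 columns\">"
            + "<p>Before one.</p>"
            + "<p>Before two.</p>"
            + "<p><strong>Further History</strong></p>"
            + "<p>After one.</p>"
            + "<p>After two.</p>"
            + "</div></body></html>";

    // The answer's expression, with the attribute literal in single quotes so it fits in a Java string.
    static final String BEFORE =
            "//div[@class='medium-8 columns']/p[following-sibling::p/strong]/text()";

    // Parses the markup as XML and returns the text content of every node the expression selects.
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Only paragraphs that have a later sibling <p> containing <strong> are kept.
        System.out.println(select(SAMPLE, BEFORE)); // prints [Before one., Before two.]
    }
}
```

The heading paragraph itself is excluded because no later sibling of it contains a `strong`; this relies on "Further History" being the only such paragraph in the `div`.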

For your second case:

  

Here I need to extract the paragraph text after "Further History" (not including "Further History")

Just replace `following-sibling` with `preceding-sibling` in the path expression.
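On a hypothetical sample of the same shape, the swap selects the paragraphs after the heading. This sketch again assumes Java's `javax.xml.xpath` and hand-written, well-formed markup. It also shows an optional, stricter predicate (an addition, not part of the answer) that identifies the heading by its text instead of by the mere presence of `strong`; the names `AfterFurtherHistory`, `AFTER`, `AFTER_BY_TEXT`, and `select` are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AfterFurtherHistory {
    // Hypothetical sample with the same layout assumed for the linked page.
    static final String SAMPLE =
            "<html><body><div class=\"medium-8 columns\">"
            + "<p>Before one.</p>"
            + "<p>Before two.</p>"
            + "<p><strong>Further History</strong></p>"
            + "<p>After one.</p>"
            + "<p>After two.</p>"
            + "</div></body></html>";

    // The answer's expression with following-sibling swapped for preceding-sibling.
    static final String AFTER =
            "//div[@class='medium-8 columns']/p[preceding-sibling::p/strong]/text()";

    // Optional stricter variant: the earlier sibling must contain a <strong> whose text is "Further History".
    static final String AFTER_BY_TEXT =
            "//div[@class='medium-8 columns']/p[preceding-sibling::p[strong = 'Further History']]/text()";

    // Parses the markup as XML and returns the text content of every node the expression selects.
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, AFTER));         // prints [After one., After two.]
        System.out.println(select(SAMPLE, AFTER_BY_TEXT)); // prints [After one., After two.]
    }
}
```

On this sample both expressions agree; the text-based predicate would only matter if the `div` contained other `strong` elements.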