Question

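There is an anchor tag that is sometimes followed by one or two span tags. I have to select the anchor's href based on an equality comparison with the text in:

all three tags (anchor, sibling span 1, sibling span 2)
two tags (anchor, sibling span 1)
the anchor tag only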

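For any given arrangement of the anchor, sibling span 1 and sibling span 2, one of the above cases is true. If the text is found in whichever of these tags are present, I want that anchor tag's href for further processing.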

Example: consider the following HTML snippet:

<table class="table table-striped" width="95%">
  <tbody>
    <tr>
      <td><span class="badge">P</span>
        <a href="/abc" title="Title of anchor">some text</a>
        (
        <span style="font-weight:600;color:#666">ABC</span>
        <span style="font-weight:600;color:#666">DEF</span>
        )
      </td>
    </tr>
  </tbody>
</table>

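Now, from this arrangement of anchor, span and span, I want to get all the text, i.e. "some text ABC DEF", and check whether it contains my string, which here is "ABC DEF" (the complete string must appear in the text). Since my string is in the text, this is the point at which I want to take the anchor's href.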
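A minimal sketch of the whole procedure, using Python's standard-library html.parser instead of Scrapy (a swapped-in choice so the example runs without extra packages). It assumes, as in the snippet, that the spans of interest come after the anchor inside the same parent element; collection stops at the next anchor or when the parent closes. AnchorSiblingParser and find_hrefs are hypothetical names.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they are never pushed on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AnchorSiblingParser(HTMLParser):
    """Hypothetical helper: record each <a> with the text of the <span>
    elements that follow it inside the same parent element."""

    def __init__(self):
        super().__init__()
        self.results = []          # one {"href": ..., "parts": [...]} per anchor
        self._stack = []           # names of currently open elements
        self._anchor = None        # record whose siblings are still being collected
        self._anchor_depth = None  # stack depth at which that anchor was opened
        self._capture = False      # True while inside the anchor or a sibling span

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        depth = len(self._stack)
        if tag == "a":
            # A new anchor starts a new record; earlier anchors stop collecting.
            self._anchor = {"href": dict(attrs).get("href"), "parts": [""]}
            self._anchor_depth = depth
            self.results.append(self._anchor)
            self._capture = True
        elif tag == "span" and self._anchor is not None and depth == self._anchor_depth:
            # Same depth as the anchor and seen after it: a following sibling span.
            self._anchor["parts"].append("")
            self._capture = True
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in self._stack:
            return  # ignore void or unmatched end tags
        # Pop up to the matching element, tolerating omitted end tags (e.g. </tbody>).
        while self._stack.pop() != tag:
            pass
        if self._anchor is None:
            return
        depth = len(self._stack)
        if depth == self._anchor_depth:
            self._capture = False  # the anchor or one of its sibling spans closed
        elif depth < self._anchor_depth:
            self._anchor = None    # the parent closed: no further siblings possible
            self._capture = False

    def handle_data(self, data):
        if self._capture and self._anchor is not None:
            self._anchor["parts"][-1] += data


def find_hrefs(html, target):
    """Return the href of every anchor whose text, joined with the text of its
    following sibling spans, contains target."""
    parser = AnchorSiblingParser()
    parser.feed(html)
    parser.close()
    hrefs = []
    for record in parser.results:
        # Collapse runs of whitespace so newlines/indentation in the markup don't matter.
        combined = " ".join(" ".join(record["parts"]).split())
        if target in combined:
            hrefs.append(record["href"])
    return hrefs


snippet = """<table><tbody><tr><td><span class="badge">P</span>
<a href="/abc" title="Title of anchor">some text</a> (
<span>ABC</span> <span>DEF</span> )</td></tr></table>"""
print(find_hrefs(snippet, "ABC DEF"))  # ['/abc']
```

The "badge" span before the anchor is not collected, and the parentheses are plain text of the td, so the joined string is "some text ABC DEF". With an XPath-capable library the same rule corresponds to the following-sibling::span axis; the parser version just makes "same parent, after the anchor" explicit.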

Answer 1

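I suggest checking them separately, because a single XPath expression that handles all of this can get very complex and may even slow the program down.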

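Another tip is to create a selector over only the part of the page you know contains the necessary information (this helps a lot if the whole document is large):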

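from scrapy import Selector
...
# Re-parse only the table, so the following queries run on a small fragment.
sel = Selector(text=response.css('table.table').extract_first())
# Anchors in the table; extract_first() below reads the first one's text.
anchor_selector = sel.css('a')
anchor_text = anchor_selector.css('::text').extract_first()
# Text of every <span> that follows the anchor inside the same parent (<td>).
span_siblings = anchor_selector.xpath('./following-sibling::span/text()').extract()
# now play with anchor_text and the list of span_siblings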
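One possible way to fill in that last step, as a plain-Python sketch. href_if_match is a hypothetical helper, and it assumes the href has been read alongside the other values, for example with anchor_selector.css('::attr(href)').extract_first(). Because the check is containment rather than equality, the anchor-only, anchor plus one span, and anchor plus two spans cases all reduce to one test on the joined string.

```python
def href_if_match(href, anchor_text, span_siblings, target):
    """Return href if target occurs in the anchor text joined with the text of
    up to two following sibling spans; otherwise return None."""
    # The question describes at most two sibling spans, so ignore any extras.
    parts = [anchor_text or ""] + list(span_siblings[:2])
    # Join with spaces and collapse whitespace: "some text" + ["ABC", "DEF"]
    # becomes "some text ABC DEF".
    combined = " ".join(" ".join(parts).split())
    return href if target in combined else None


print(href_if_match("/abc", "some text", ["ABC", "DEF"], "ABC DEF"))  # /abc
print(href_if_match("/abc", "some text", [], "ABC DEF"))              # None
```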

Need an XPath expression to extract a specific node and its two sibling nodes (if they are present)
