Question

I am trying to get the href from this table:

<div class="squad-container">
  <table class="table squad sortable" id="page_team_1_block_team_squad_8-table">
    <thead>
      <tr class="group-head">
        <th colspan="4">Goalkeepers </th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="width:50px;"><a href="/474798/" style="display:block;width:50px; height:50px;">Reda Sayed</a></td>
        <td style="vertical-align: top;">
          <div><a href="/474798/" >Reda Sayed</a></div>
          <div style="padding-left: 27px;">25 years old</div>
        </td>
      </tr>
    </tbody>

I used:

response.xpath('//table[@class="table squad sortable"]//tr//td//a/@href').extract_first()

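but it did not work. I need to know what is wrong with my code, and what the difference is between using a double slash (//) and a single slash (/).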

Answer 1

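Reading it as a human, I do not see anything wrong with your XPath. However, an XPath or CSS selector can behave differently from your spider's point of view: the page your spider receives may not be the same as the one you see in the browser.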

Try using 'scrapy shell' to test your XPath or CSS selector and see whether it extracts any data. Here is the link to the documentation you need: https://doc.scrapy.org/en/latest/topics/shell.html

To sum up: revise the XPath you wrote, because your spider is not finding any data with it, and scrapy shell can help you do that. :)
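On the second part of the question (single versus double slash), here is a minimal sketch rather than a fix for your spider. It assumes the HTML from the question with the missing closing tags added so that it parses, and it uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors. ElementTree only supports a subset of XPath, so the href is read with .get("href") instead of /@href, but / and // mean the same thing in both.

```python
import xml.etree.ElementTree as ET

# The table from the question, with closing tags added (an assumption)
# so that it parses as well-formed XML.
HTML = """
<div class="squad-container">
  <table class="table squad sortable" id="page_team_1_block_team_squad_8-table">
    <thead>
      <tr class="group-head">
        <th colspan="4">Goalkeepers </th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="width:50px;"><a href="/474798/" style="display:block;width:50px; height:50px;">Reda Sayed</a></td>
        <td style="vertical-align: top;">
          <div><a href="/474798/">Reda Sayed</a></div>
          <div style="padding-left: 27px;">25 years old</div>
        </td>
      </tr>
    </tbody>
  </table>
</div>
"""

root = ET.fromstring(HTML)
table = root.find(".//table[@class='table squad sortable']")

# "//" selects descendants: it matches <a> at any depth below the table.
any_depth = [a.get("href") for a in table.findall(".//a")]

# "/" selects children: each step must be a direct child of the previous one,
# so the <a> that is wrapped in a <div> is not matched.
direct_only = [a.get("href") for a in table.findall("./tbody/tr/td/a")]

print(len(any_depth), len(direct_only))  # prints: 2 1
```

Against the markup as posted, the // query does find the links, which fits the point above that the expression itself looks fine. A /-only path is stricter, since it breaks as soon as one level differs. For example, browsers add a tbody element when the raw HTML has none, so a path copied from the browser inspector may not match what the spider downloads.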
