Question

This is the HTML I'm working with; the comment marks what I want to store.

<dd>
   Wed Sep 17, 2014 1:11 am
</dd>
<dd>
</dd>
<dd>
   Forum:
   <a href="./viewforum.php?f=12">
      Minewind Chat
   </a>
</dd>
<dd>
   Thread:
   # I wish to grab this href link extension:
   <a href="./viewtopic.php?f=12&amp;t=201&amp;hilit=yeah"> 
      1.8
   </a>
</dd>
<dd>
   Replies:
   <strong>
      3
   </strong>
</dd>
<dd>
   Views:
   <strong>
      108
   </strong>
</dd>

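I've got as far as printing both href links (I don't know how efficient my approach is):

from bs4 import BeautifulSoup

# s2 is the response object obtained earlier (not shown)
cleanup = BeautifulSoup(s2.content, "html.parser")

for links in cleanup.find_all("dd"):
    if links.find("a") is not None:
        print(links.a['href'])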

Output:

./viewforum.php?f=12
./viewtopic.php?f=12&t=201&hilit=yeah

But how do I store only the second line? Any tips?

Answer 1

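If you know what the href will contain, you can match on that, for example (inside your loop, with results being a list created before it):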

if 'viewtopic' in links.a['href']:
    results.append(links.a)
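
The check above can be dropped into the loop from the question. The following is a minimal sketch, not a definitive implementation: the html string is a shortened stand-in for s2.content, and it stores the href strings rather than the tag objects, which is an assumption about what "store" should mean here.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (assumption for this
# sketch; in the question the markup comes from s2.content).
html = """
<dd>Forum: <a href="./viewforum.php?f=12">Minewind Chat</a></dd>
<dd>Thread: <a href="./viewtopic.php?f=12&amp;t=201&amp;hilit=yeah">1.8</a></dd>
<dd>Replies: <strong>3</strong></dd>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for dd in soup.find_all("dd"):
    link = dd.find("a")
    # Skip <dd> tags without a link, and links that are not topic links.
    if link is not None and "viewtopic" in link.get("href", ""):
        results.append(link["href"])

print(results)
```

Using link.get("href", "") instead of link["href"] avoids a KeyError if an <a> tag happens to have no href attribute.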

Answer 2

You can use a CSS selector query:

# Get the topic links
topic_links = soup.select('dd a[href*="viewtopic.php"]')

This matches links that are inside a dd tag and contain viewtopic.php in their href attribute.

The result is a list containing only the matching <a> elements:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <dd>
...    Wed Sep 17, 2014 1:11 am
... </dd>
... <dd>
... </dd>
... <dd>
...    Forum:
...    <a href="./viewforum.php?f=12">
...       Minewind Chat
...    </a>
... </dd>
... <dd>
...    Thread:
...    # I wish to grab this href link extension:
...    <a href="./viewtopic.php?f=12&t=201&hilit=yeah">
...       1.8
...    </a>
... </dd>
... <dd>
...    Replies:
...    <strong>
...       3
...    </strong>
... </dd>
... <dd>
...    Views:
...    <strong>
...       108
...    </strong>
... </dd>
... ''', 'html.parser')
>>> topic_links = soup.select('dd a[href*="viewtopic.php"]')
>>> for link in topic_links:
...     print(link['href'])
...
./viewtopic.php?f=12&t=201&hilit=yeah
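
To store the link rather than print it, the selector result can be reduced to plain strings. This is a rough sketch under a few assumptions: the html string is a shortened stand-in for the page, the names topic_hrefs and first_href are made up for illustration, and select_one requires a reasonably recent bs4 release.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (assumption).
html = """
<dd>Forum: <a href="./viewforum.php?f=12">Minewind Chat</a></dd>
<dd>Thread: <a href="./viewtopic.php?f=12&amp;t=201&amp;hilit=yeah">1.8</a></dd>
"""

soup = BeautifulSoup(html, "html.parser")
selector = 'dd a[href*="viewtopic.php"]'

# Every matching href, stored as a list of strings.
topic_hrefs = [a["href"] for a in soup.select(selector)]

# Only the first match; select_one returns None when nothing matches.
first = soup.select_one(selector)
first_href = first["href"] if first is not None else None

print(topic_hrefs)
print(first_href)
```

The list form fits a page with several threads; the select_one form is enough when only a single topic link is expected.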

BeautifulSoup: grab a specific href link and store it
