Question

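I am trying to extract the URLs of the reviews on this page, http://uk.ign.com/games/reviews, and then open the top 5 in separate tabs.

So far I have tried several different selectors to pick out the right data, but nothing seems to be returned. I can't extract the URL of each review in the list, let alone open the first 5 in separate tabs.

I am using Python 3 in the Python IDE.

Here is my code: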

var maxDepth = 10;
function cloneObject(obj, depth) {
    if (!depth) depth = 1;
    var clone = {};
    for (var i in obj) {
        if (typeof obj[i] == "object" && obj[i] != null) {
            try {
                // Mark visited objects so circular references are not followed forever
                if (obj[i].wowImCloned) clone[i] = '[I\'ve seen you somewhere..]';
                else if (depth >= maxDepth) clone[i] = '[I\'m not going deeper]';
                else {
                    obj[i].wowImCloned = true;
                    clone[i] = cloneObject(obj[i], depth + 1);
                }
            } catch (err) {
                clone[i] = err.message;
            }
        }
        else if (typeof obj[i] == "function") clone[i] = obj[i].toString();
        else clone[i] = obj[i];
    }
    return clone;
}
var clone = cloneObject(window);
//console.log(JSON.stringify(clone))

Answer 1

Using bs4 (BeautifulSoup) and the soup object it returns (which you have as webPage), you can call:

webLinks = webPage.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

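find_all returns a list of elements matching the tag name (in your case, a). These are HTML elements; to get the links you need to go one step further. You can access an HTML element's attributes (in your case, you want href) just like a dict: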

# soup is your BeautifulSoup object (webPage in the snippet above)
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])

For more details, see "BeautifulSoup getting href", or of course the docs.

P.S. Python is usually written in snake_case rather than CamelCase :)
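
The question also asks about opening the first 5 links in separate tabs, which the loop above does not cover. A minimal sketch of that step, using only the standard library, might look like the following. The helper names first_links and open_in_tabs are hypothetical, and the sample hrefs are made up for illustration. It assumes you already have a list of href strings (for example from the find_all loop) and does not try to filter out navigation links, since that would need a selector specific to the IGN page.

```python
import webbrowser
from urllib.parse import urljoin

BASE_URL = "http://uk.ign.com/games/reviews"  # page from the question


def first_links(hrefs, base_url=BASE_URL, limit=5):
    """Return up to `limit` absolute, de-duplicated URLs, keeping page order."""
    seen = set()
    result = []
    for href in hrefs:
        if len(result) >= limit:
            break
        url = urljoin(base_url, href)  # resolves relative hrefs against the page URL
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


def open_in_tabs(urls):
    """Open each URL in a new tab of the default browser."""
    for url in urls:
        webbrowser.open_new_tab(url)


if __name__ == "__main__":
    # Hypothetical hrefs; in practice something like:
    #   hrefs = [a['href'] for a in soup.find_all('a', href=True)]
    sample = ["/games/a", "/games/b", "/games/a", "http://example.com/c"]
    print(first_links(sample, limit=3))
    # To actually open the tabs: open_in_tabs(first_links(sample))
```

Resolving with urljoin matters because pages often use relative hrefs, which webbrowser cannot open on their own. De-duplicating keeps the same review from opening twice if it is linked more than once.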

How to extract URL links from the IGN website