Infinite loop in a while statement with BeautifulSoup 4

Date: 2017-01-10 15:33:14

Tags: python-3.x beautifulsoup

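I'm trying to use this short script to extract data from YIFY pages (their site lacks some basic filter options). It works perfectly with other pages, but it doesn't show any data for this one. In fact, it runs in an infinite loop.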

import requests
from bs4 import BeautifulSoup

def praca_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.yify-torrent.org/search/1080p/t-" + str(page) + "/"
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': 'mv'}):
            title = link.string
            link_url = link.get('href')
            print(title)
            print(url + link_url)
            page += 1

praca_crawler(4)

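There seem to be two problems here. First, the while loop: page += 1 doesn't increase the page number. Second, the filter for the data. I want to get the movie titles (without any HTML or CSS tags) and the links.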
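The two symptoms are linked. The small offline simulation below uses a hypothetical helper, pages_visited, with no network access and no BeautifulSoup. links_per_page stands in for the number of elements the findAll(...) call matches on each page. It shows what happens to the counter depending on where page += 1 sits. A guard caps the iterations so the simulation itself cannot hang.

```python
def pages_visited(max_pages, links_per_page, increment_inside, guard=10):
    """Simulate the crawler's while loop and return the page numbers it visits.

    links_per_page: hypothetical number of elements the selector matches per page.
    increment_inside: True reproduces the question's placement of page += 1.
    guard: iteration cap so this simulation always terminates.
    """
    visited = []
    page = 1
    iterations = 0
    while page <= max_pages and iterations < guard:
        iterations += 1
        visited.append(page)
        for _ in range(links_per_page):  # stands in for the findAll(...) loop
            if increment_inside:
                page += 1  # question's placement: one increment per matched link
        if not increment_inside:
            page += 1  # corrected placement: one increment per page
    return visited


print(pages_visited(4, 0, True))   # page never advances; stops only at the guard
print(pages_visited(4, 6, True))   # jumps past max_pages after the first page
print(pages_visited(4, 0, False))  # [1, 2, 3, 4]
```

With zero matches the counter never moves, which is the infinite loop. With several matches per page it overshoots and skips pages. Incrementing once per page avoids both.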

1 Answer

Answer (score: 0)

import requests
from bs4 import BeautifulSoup

def praca_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.yify-torrent.org/search/1080p/t-" + str(page) + "/"
        source_code = requests.get(url)
        source_code.raise_for_status()
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for div in soup.findAll('div', {'class': 'mv'}):
            title = div.a.string
            link_url = div.a.get('href')
            print(title)
            print(url + link_url)

        page += 1

praca_crawler(4)

Output:

Kommissar Maigret: Ein toter Mann (2016) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50486/download-kommissar-maigret-ein-toter-mann-2016-1080p-mp4-yify-torrent.html
Love Me (2014) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50485/download-love-me-2014-1080p-mp4-yify-torrent.html
Blood Car (2007) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50484/download-blood-car-2007-1080p-mp4-yify-torrent.html
SS Experiment Love Camp (1976) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50481/download-ss-experiment-love-camp-1976-1080p-mp4-yify-torrent.html
Paper Tiger (1975) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50479/download-paper-tiger-1975-1080p-mp4-yify-torrent.html
The Soft Skin (1964) 1080p
https://www.yify-torrent.org/search/1080p/t-1//movie/50477/download-the-soft-skin-1964-1080p-mp4-yify-torrent.html
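The printed links keep the search path and contain a doubled slash (t-1//movie/...). This happens because url + link_url appends a root-relative href ("/movie/...") to the search-page URL. The minimal sketch below uses the standard library's urllib.parse.urljoin, which resolves the href the way a browser would. The helper name absolute_link is hypothetical. It assumes the hrefs on the page stay root-relative, as in the markup quoted in this answer.

```python
from urllib.parse import urljoin


def absolute_link(page_url, href):
    """Resolve an href scraped from page_url into an absolute URL."""
    # urljoin follows standard reference resolution: an href starting with "/"
    # replaces the whole path of page_url and keeps only its scheme and host.
    return urljoin(page_url, href)


page_url = "https://www.yify-torrent.org/search/1080p/t-1/"
href = "/movie/50486/download-kommissar-maigret-ein-toter-mann-2016-1080p-mp4-yify-torrent.html"
print(page_url + href)                # plain concatenation keeps "/search/1080p/t-1//movie/..."
print(absolute_link(page_url, href))  # resolved against the host root
```

In the crawler, print(urljoin(url, link_url)) would replace print(url + link_url).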

Problems:

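  1. Move page += 1 outside the for loop. The counter should advance once per page you crawl, not every time you print a title. In the question's code it sits inside the for loop, so when findAll matches nothing (as happens here, see point 2), page never changes and the while loop runs forever.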
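  2. The class value "mv" is on the div, not on the a tag:

     <div class="mv"><h3><a href="/movie/50486/download-kommissar-maigret-ein-toter-mann-2016-1080p-mp4-yify-torrent.html" target="_blank" title="Kommissar Maigret: Ein toter Mann (2016) 1080p">Kommissar Maigret: Ein toter Mann (2016) 1080p</a></h3>

     So search for div tags with class "mv" and take the a inside each one (div.a), instead of searching for a tags with class "mv".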
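The sketch below checks point 2 offline against a trimmed copy of the markup quoted above. The closing </div> is an assumption, because the quoted snippet is cut off. The helper extract_movies is hypothetical. It uses get_text(strip=True) to return only the title text, without tags or surrounding whitespace, which the question asked for.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one search result from the markup quoted above
# (the closing </div> is assumed).
SAMPLE_HTML = (
    '<div class="mv"><h3>'
    '<a href="/movie/50486/download-kommissar-maigret-ein-toter-mann-2016-1080p-mp4-yify-torrent.html" '
    'target="_blank" title="Kommissar Maigret: Ein toter Mann (2016) 1080p">'
    'Kommissar Maigret: Ein toter Mann (2016) 1080p</a></h3></div>'
)


def extract_movies(html):
    """Return (title, href) pairs for every div.mv result block in the page."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for div in soup.find_all("div", class_="mv"):
        link = div.a  # first <a> anywhere inside the div (here, inside <h3>)
        if link is None:
            continue
        # get_text(strip=True) keeps only the text content, without tags
        movies.append((link.get_text(strip=True), link.get("href")))
    return movies


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(len(soup.find_all("a", class_="mv")))    # 0: no <a> carries class "mv"
print(len(soup.find_all("div", class_="mv")))  # 1
print(extract_movies(SAMPLE_HTML))
```

Because the question's selector matches zero elements on this markup, its inner for loop never runs. That is also why the misplaced page += 1 never executes.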