Python Web Crawler不打印任何结果

时间:2015-04-15 10:00:22

标签: python web-crawler

我正在尝试在Python中创建一个简单的Web爬虫,当我运行它时,它没有显示任何错误,但它也没有按预期打印任何结果。 我已将当前的代码放在下面,有人可以指出我的方向吗?

import requests
from bs4 import BeautifulSoup

def stepashka_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "http://online.stepashka.com/filmy/#/page/" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for resoult in soup.findAll("a", {"class": "video-title"}):
            href = resoult.get(href)
            print(href)
        page += 1

stepashka_spider(1)

1 个答案:

答案 0 :(得分:6)

"video-title"位于div标记中,您还需要传递字符串"href"

def stepashka_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "http://online.stepashka.com/filmy/#/page/" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for resoult in soup.findAll("div", {"class": "video-title"}):
            a_tag = resoult.a
            print(a_tag["href"])
        page += 1

stepashka_spider(1)

输出:

http://online.stepashka.com/filmy/komedii/37878-klub-grust.html
http://online.stepashka.com/filmy/dramy/37875-kadr.html
http://online.stepashka.com/filmy/multfilmy/37874-betmen-protiv-robina.html
http://online.stepashka.com/filmy/fantastika/37263-hrustalnye-cherepa.html
http://online.stepashka.com/filmy/dramy/34369-bozhiy-syn.html
http://online.stepashka.com/filmy/trillery/37873-horoshee-ubiystvo.html
http://online.stepashka.com/filmy/trillery/34983-zateryannaya-reka.html
http://online.stepashka.com/filmy/priklucheniya/37871-totem-volka.html
http://online.stepashka.com/filmy/fantastika/35224-zheleznaya-shvatka.html
http://online.stepashka.com/filmy/dramy/37870-bercy.html

您实际上使用了错误的网址格式,我们也可以使用范围而不是循环:

def stepashka_spider(max_pages):
    for page in range(1,max_pages+1):
        url = "http://online.stepashka.com/filmy/page/{}/".format(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        print("Movies for page {}".format(page))
        for resoult in soup.findAll("div", {"class": "video-title"}):
            a_tag = resoult.a
            print(a_tag["href"])
        print()

输出:

Movies for page 1
http://online.stepashka.com/filmy/dramy/37895-raskop.html
http://online.stepashka.com/filmy/semejnyj/36275-domik-v-serdce.html
http://online.stepashka.com/filmy/dramy/35371-enni.html
http://online.stepashka.com/filmy/trillery/37729-igra-na-vyzhivanie.html
http://online.stepashka.com/filmy/trillery/37893-vosstavshie-mertvecy.html
http://online.stepashka.com/filmy/semejnyj/30104-sedmoy-syn-seventh-son-2013-treyler.html
http://online.stepashka.com/filmy/dramy/37892-sekret-schastya.html
http://online.stepashka.com/filmy/uzhasy/37891-davayte-poohotimsya.html
http://online.stepashka.com/filmy/multfilmy/3404-specagent-archer-archer-archer-2010-2013.html
http://online.stepashka.com/filmy/trillery/37334-posledniy-reys.html

Movies for page 2
http://online.stepashka.com/filmy/komedii/37890-top-5.html
http://online.stepashka.com/filmy/komedii/37889-igra-v-doktora.html
http://online.stepashka.com/filmy/dramy/36651-vrozhdennyy-porok.html
http://online.stepashka.com/filmy/komedii/37786-superforsazh.html
http://online.stepashka.com/filmy/fantastika/35003-voshozhdenie-yupiter.html
http://online.stepashka.com/filmy/sport/37888-ufc-on-fox-15-machida-vs-rockhold.html
http://online.stepashka.com/filmy/semejnyj/37558-prizrak.html
http://online.stepashka.com/filmy/boeviki/36865-mordekay.html
http://online.stepashka.com/filmy/dramy/37884-stanovlenie-legendy.html
http://online.stepashka.com/filmy/trillery/37883-tainstvo.html

Movies for page 3
http://online.stepashka.com/filmy/dramy/37551-nochnoy-beglec.html
http://online.stepashka.com/filmy/dramy/37763-mech-drakona.html
http://online.stepashka.com/filmy/trillery/36471-paren-po-sosedstvu.html
http://online.stepashka.com/filmy/dramy/36652-amerikanskiy-snayper.html
http://online.stepashka.com/filmy/dramy/37555-feniks.html
http://online.stepashka.com/filmy/semejnyj/35156-gnezdo-drakona-vosstanie-chernogo-drakona.html
http://online.stepashka.com/filmy/kriminal/37882-ch-b.html
http://online.stepashka.com/filmy/priklucheniya/37881-admiral-bitva-za-men-ryan.html
http://online.stepashka.com/filmy/trillery/37880-malyshka.html
http://online.stepashka.com/filmy/trillery/36417-poteryannyy-ray.html