How do I scrape an image from Google with Beautiful Soup?

Asked: 2015-10-30 18:45:12

Tags: python beautifulsoup mechanize

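I know this is naughty, but I'm only doing it to learn. I know how to get a single link via XPath and how to get all links by tag, but I'm trying to get a single image link by tag. This is what I have so far:

It returns neither an error nor a link.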

import urllib
import mechanize
from bs4 import BeautifulSoup
from urlparse import urlparse

def get_pic(search):

    try:
        browser = mechanize.Browser()
        browser.set_handle_robots(False)
        browser.addheaders = [('user-agent','Mozilla')]

        htmltext = browser.open("https://www.google.com/search?site=imghp&tbm=isch&source=hp&biw=1366&bih=648&q=" + search)
        img_urls = []
        soup = BeautifulSoup(htmltext)

        iti = 0

        for link in soup.find_all("a"):
            img = link.get('href')
            img_urls.append(img)
            iti += 1
            if iti == 25:
                break


        print img_urls[24]

    except:
        print "error"

get_pic("ccd")

2 Answers:

Answer 0 (score: 0)

Try using link['src'] or link.get('src') instead of href. Here is the difference between src and href: Difference between SRC and HREF
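
One caveat with this suggestion: `src` is an attribute of `<img>` tags, while the `<a>` tags matched by `find_all("a")` normally carry only `href`, so the loop would probably also need to search for `img` elements. A minimal sketch on a made-up HTML snippet (the markup and the example.com URLs are assumptions for illustration, not Google's actual markup):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a results page.
html = """
<a href="/imgres?id=1"><img src="https://example.com/thumb1.jpg" alt="one"></a>
<a href="/imgres?id=2"><img src="https://example.com/thumb2.jpg" alt="two"></a>
<a href="/about">About</a>
"""

soup = BeautifulSoup(html, "html.parser")

# <a> tags have href but no src, so .get("src") gives None for each of them.
anchor_srcs = [a.get("src") for a in soup.find_all("a")]

# <img> tags are where src lives.
img_srcs = [img.get("src") for img in soup.find_all("img")]

print(anchor_srcs)
print(img_srcs)
```

Using `.get("src")` rather than `img["src"]` avoids a `KeyError` on any image tag that lacks the attribute.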

Answer 1 (score: 0)

I would go with the requests module, though it would be better to try selenium (for testing purposes only).

import requests
from bs4 import BeautifulSoup

data = {
    "Host": "www.google.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0",
    "Accept": "image/png,image/*;q=0.8,*/*;q=0.5",
    "Accept-Language": "en-US,am;q=0.7,zh-HK;q=0.3",
    "Accept-Encoding": "gzip, deflate",
    "Referer": "https://www.google.com",
    "Cookie": "PREF=ID=1111111111111111:FF=0:LD=en:CR=2:TM=1439993585:LM=1445565646:V=1:S=cGak3Dk6YKLPadm7; NID=72=Kd8i3KUAz64m6HZ9YDGnSAXDTGzj5YOqDuEIq52mLqcOwyyp4LXeUfoK_S76eoOys8GQu0k26e7DCMj2N48l75-mdQDKXzLghZPQMzPGiYH7wt4yVAVDjJ4WrGba5VhogYWnEoDVb3IJbcRJgAxkS29vEYFQGxOkZ_PGtvrFWg_5oR9rbc1XNysRIS0rZGTgBkI0L-FwD-tJSqvHS7R4zKMbxfCZv9u0pIeFbA; OGP=-5061451:",
    "Connection": "keep-alive",
}
def get_pic(search):
    img_urls = []
    try:
        res = requests.get("https://www.google.com/search?site=imghp&tbm=isch&source=hp&biw=1366&bih=648&q=" + search,params=data)
        soup = BeautifulSoup(res.content,'html.parser')
        for link in soup.find_all("a"):
            img = link.get('href')
            img_urls.append(img)
        for i in img_urls:
            print i

    except:
        print "error"

get_pic("ccd")
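
A side note on the code above: the `data` dictionary holds what look like HTTP request headers, but it is passed as `params=`, which in requests appends the entries to the query string rather than sending them as headers. That is likely why several of the printed links below carry `Accept-Language=...` and `Referer=...` in their URLs. A small sketch of the difference, using `requests.Request(...).prepare()` to build the request without sending it (the shortened `query` and `header_like` dictionaries are placeholders, not the answer's values):

```python
import requests

url = "https://www.google.com/search"
query = {"tbm": "isch", "q": "ccd"}
header_like = {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US"}

# Variant 1: header-like values passed as params end up in the URL.
merged = dict(query)
merged.update(header_like)
as_params = requests.Request("GET", url, params=merged).prepare()

# Variant 2: the same values passed as headers stay out of the URL.
as_headers = requests.Request("GET", url, params=query, headers=header_like).prepare()

print(as_params.url)
print(as_headers.url)
print(as_headers.headers["User-Agent"])
```

If the intent was to send these values as headers, the call in the answer would presumably use `headers=data` instead of `params=data`.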

The answer's code as posted prints:

https://www.google.com/search?biw=1366&bih=648&q=ccd&Accept-Language=en-US,am%3Bq%3D0.7,zh-HK%3Bq%3D0.3&Connection=keep-alive&Accept=image/png,image/*%3Bq%3D0.8,*/*%3Bq%3D0.5&Host=www.google.com&Referer=https://www.google.com&um=1&ie=UTF-8&sa=N&tab=iw
https://maps.google.com/maps?biw=1366&bih=648&q=ccd&Accept-Language=en-US,am%3Bq%3D0.7,zh-HK%3Bq%3D0.3&Connection=keep-alive&Accept=image/png,image/*%3Bq%3D0.8,*/*%3Bq%3D0.5&Host=www.google.com&Referer=https://www.google.com&um=1&ie=UTF-8&hl=en&sa=N&tab=il
https://play.google.com/?biw=1366&bih=648&q=ccd&Accept-Language=en-US,am%3Bq%3D0.7,zh-HK%3Bq%3D0.3&Connection=keep-alive&Accept=image/png,image/*%3Bq%3D0.8,*/*%3Bq%3D0.5&Host=www.google.com&Referer=https://www.google.com&um=1&ie=UTF-8&hl=en&sa=N&tab=i8
https://www.youtube.com/results?biw=1366&bih=648&q=ccd&Accept-Language=en-US,am%3Bq%3D0.7,zh-HK%3Bq%3D0.3&Connection=keep-alive&Accept=image/png,image/*%3Bq%3D0.8,*/*%3Bq%3D0.5&Host=www.google.com&Referer=https://www.google.com&um=1&ie=UTF-8&sa=N&tab=i1
https://news.google.com/nwshp?hl=en&tab=in
https://mail.google.com/mail/?tab=im
https://drive.google.com/?tab=io
https://www.google.com/intl/en/options/
http://www.google.com/history/optout?hl=en
/preferences?hl=en
https://accounts.google.com/ServiceLogin?hl=en&continue=https://www.google.com/search%3Fsite%3Dimghp%26tbm%3Disch%26source%3Dhp%26biw%3D1366%26bih%3D648%26q%3Dccd%26Accept-Language%3Den-US,am%253Bq%253D0.7,zh-HK%253Bq%253D0.3%26Connection%3Dkeep-alive%26Accept%3Dimage/png,image/*%253Bq%253D0.8,*/*%253Bq%253D0.5%26Host%3Dwww.google.com%26Referer%3Dhttps://www.google.com
/webhp?hl=en
/preferences?q=ccd&biw=1366&bih=648&ie=UTF-8&tbm=isch&sa=F
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&source=lnms&sa=X&ved=0CAQQ_AVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=vid&source=lnms&sa=X&ved=0CAYQ_AVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=nws&source=lnms&sa=X&ved=0CAcQ_AVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=shop&source=lnms&sa=X&ved=0CAgQ_AVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
https://maps.google.com/maps?biw=1366&bih=648&q=ccd&um=1&ie=UTF-8&sa=X&ved=0CAkQ_AVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=bks&source=lnms&sa=X&ved=0CAoQ_AVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=isz:l&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=isz:m&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=isz:i&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=ic:color&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=ic:gray&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=ic:trans&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=itp:face&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=itp:photo&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=itp:clipart&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=itp:lineart&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=itp:animated&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=qdr:d&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=qdr:w&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=sur:fmc&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=sur:fc&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=sur:fm&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&source=lnt&tbs=sur:f&sa=X&ved=0CA8QpwVqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&tbas=0&sa=X&ved=0CBAQuAtqFQoTCKSnl6P76sgCFYETpgodBT0O-g
/url?q=https://en.wikipedia.org/wiki/Charge-coupled_device&sa=U&ved=0CBYQwW4wAGoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNFxLr62u--uN3Gn8LXaisGss7tjpA
/url?q=https://en.wikipedia.org/wiki/Charge-coupled_device&sa=U&ved=0CBgQwW4wAWoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNFxLr62u--uN3Gn8LXaisGss7tjpA
/url?q=http://www.olympusmicro.com/primer/digitalimaging/concepts/fullframe.html&sa=U&ved=0CBoQwW4wAmoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNGRbaM6tPF3q0Lu0KbaQ_3lVdS_NQ
/url?q=http://www.globalspec.com/learnmore/video_imaging_equipment/video_cameras_accessories/ccd_cameras&sa=U&ved=0CBwQwW4wA2oVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNETFZVB3HYh4YhkQf7nLoN7k-TRNQ
/url?q=http://www.gxccd.com/art%3Fid%3D374&sa=U&ved=0CB4QwW4wBGoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNG520BsoU3bRh3HwMvtFRUCSAx9SQ
/url?q=http://hamamatsu.magnet.fsu.edu/articles/frametransfer.html&sa=U&ved=0CCAQwW4wBWoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNGEsiVVqmZA109KsbZDTp1gDGc7AA
/url?q=http://www.directindustry.com/prod/dalsa/product-25439-1173223.html&sa=U&ved=0CCIQwW4wBmoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNEt8IenufbdqPkp2EaBOmt-ET4-gA
/url?q=https://www.microscopyu.com/articles/digitalimaging/digitalintro.html&sa=U&ved=0CCQQwW4wB2oVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNHxJf1kM4zJrMojSCn977uhHS3uig
/url?q=https://commons.wikimedia.org/wiki/File:IR.Lowpass.Filter.CCD.jpg&sa=U&ved=0CCYQwW4wCGoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNFWfyl-cnHiTg094w5GJBE6VpQp7g
/url?q=http://oneslidephotography.com/ccd-vs-cmos-dslr-camera-wich-one-is-better/&sa=U&ved=0CCgQwW4wCWoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNGyPh6c7w-CHCIiSUQDpqcO9ZQsyg
/url?q=https://www.microscopyu.com/articles/digitalimaging/ccdintro.html&sa=U&ved=0CCoQwW4wCmoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNEBGoOu8i-EJpb_S04_s67hKww88Q
/url?q=http://www.2mcctv.com/blog/2012_07_25-ccd-vs-cmos-image-sensor-technology/&sa=U&ved=0CCwQwW4wC2oVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNHCRYMh0bSBgdrUlM0WN_RV48x_ZQ
/url?q=http://www.globalspec.com/learnmore/video_imaging_equipment/video_cameras_accessories/ccd_cameras&sa=U&ved=0CC4QwW4wDGoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNETFZVB3HYh4YhkQf7nLoN7k-TRNQ
/url?q=http://www.astrosurf.com/cavadore/technical/detectors/chungara/&sa=U&ved=0CDAQwW4wDWoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNGzkrvZ13Dmp4239t-puzALRnJpEg
/url?q=https://commons.wikimedia.org/wiki/File:CCD_sensor.JPG&sa=U&ved=0CDIQwW4wDmoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNHh50a1Hi0xQiBOQtOb8BKOTQEqFw
/url?q=http://thegadgetsquare.com/1539/difference-between-cmos-and-ccd-image-sensors/&sa=U&ved=0CDQQwW4wD2oVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNHKbhLuw1N3MH1uS_nqHU9QX7Sx9g
/url?q=http://www.digitalbolex.com/global-shutter/&sa=U&ved=0CDYQwW4wEGoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNErEeyDj4kT5z8PAVFDQpB4xuTeZA
/url?q=http://www.olympusmicro.com/primer/digitalimaging/concepts/ccdanatomy.html&sa=U&ved=0CDgQwW4wEWoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNHF8Vz-uXRmyIv9egrSpoYYB3NWvA
/url?q=http://kcs.kcjh.ptc.edu.tw/~spt/computer/digital-image/CCD-CMOS.htm&sa=U&ved=0CDoQwW4wEmoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNHFRugkrrJ_MNdqfmGhbKdTzAc9_Q
/url?q=https://commons.wikimedia.org/wiki/File:CCD_Image_sensor.jpg&sa=U&ved=0CDwQwW4wE2oVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNHkZhfp_OdU2HyR0RzHp6hkSL6baA
/search?biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&q=colony+collapse+disorder&revid=483561372&sa=X&ved=0CD8Q1QIoAGoVChMIpKeXo_vqyAIVgROmCh0FPQ76
/search?biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&q=ccd+catholic&revid=483561372&sa=X&ved=0CEAQ1QIoAWoVChMIpKeXo_vqyAIVgROmCh0FPQ76
/search?biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&q=ccd+school&revid=483561372&sa=X&ved=0CEEQ1QIoAmoVChMIpKeXo_vqyAIVgROmCh0FPQ76
/search?biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&q=colony+collapse+disorder+empty+hive&revid=483561372&sa=X&ved=0CEIQ1QIoA2oVChMIpKeXo_vqyAIVgROmCh0FPQ76
/search?biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&q=ccd+bees&revid=483561372&sa=X&ved=0CEMQ1QIoBGoVChMIpKeXo_vqyAIVgROmCh0FPQ76
/search?biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&q=carbonate+compensation+depth&revid=483561372&sa=X&ved=0CEQQ1QIoBWoVChMIpKeXo_vqyAIVgROmCh0FPQ76
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&ei=b8kzVuSzD4GnmAWF-rjQDw&start=20&sa=N
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&ei=b8kzVuSzD4GnmAWF-rjQDw&start=40&sa=N
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&ei=b8kzVuSzD4GnmAWF-rjQDw&start=60&sa=N
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&ei=b8kzVuSzD4GnmAWF-rjQDw&start=80&sa=N
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&ei=b8kzVuSzD4GnmAWF-rjQDw&start=100&sa=N
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&ei=b8kzVuSzD4GnmAWF-rjQDw&start=120&sa=N
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&ei=b8kzVuSzD4GnmAWF-rjQDw&start=140&sa=N
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&ei=b8kzVuSzD4GnmAWF-rjQDw&start=160&sa=N
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&ei=b8kzVuSzD4GnmAWF-rjQDw&start=180&sa=N
/search?q=ccd&biw=1366&bih=648&site=imghp&ie=UTF-8&tbm=isch&ei=b8kzVuSzD4GnmAWF-rjQDw&start=20&sa=N
https://www.google.com/advanced_image_search?biw=1366&bih=648&q=ccd&tbm=isch
https://www.google.com/imghp?hl=en
http://images.google.com/support/?hl=en
/tools/feedback/survey/html?productId=196&hl=en&query=ccd
/
/intl/en/ads
/services
/intl/en/policies/privacy/
/intl/en/policies/terms/
/intl/en/about.html
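
Many of the hrefs in this output are either relative paths or `/url?q=...` redirects that wrap the real destination. A possible post-processing step, sketched with the standard library only, pulls the target out of the `q` parameter and makes everything else absolute. The `resolve` helper and the `BASE` constant are hypothetical names, and the sketch assumes Python 3 (`urllib.parse`), whereas the code above is Python 2:

```python
from urllib.parse import parse_qs, urljoin, urlparse

BASE = "https://www.google.com"  # assumed base for relative hrefs


def resolve(href):
    """Return the wrapped destination of a /url?q=... href, else an absolute URL."""
    parsed = urlparse(href)
    if parsed.path == "/url":
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]  # parse_qs also percent-decodes the value
    return urljoin(BASE, href)


# Sample hrefs taken from the output above.
samples = [
    "/url?q=http://www.gxccd.com/art%3Fid%3D374&sa=U&ved=0CB4QwW4wBGoVChMIpKeXo_vqyAIVgROmCh0FPQ76&usg=AFQjCNG520BsoU3bRh3HwMvtFRUCSAx9SQ",
    "/preferences?hl=en",
    "https://drive.google.com/?tab=io",
]
for href in samples:
    print(resolve(href))
```

Note that this only recovers the pages linked from the results, not image files; whether those pages lead to the wanted image still depends on the markup Google returns.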