How do I scrape image-src in BeautifulSoup

Asked: 2019-03-27 19:29:30

Tags: python beautifulsoup

I am trying to get the image-src value from this tag:

<img alt='Original Xiaomi Redmi Note 5 4GB RAM 64GB ROM Snapdragon S636 Octa Core Mobile Phone MIUI9 5.99" 2160*1080 4000mAh 12.0+5.0MP(China)' class="picCore" id="limage_32856997152" image-src="//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg" itemprop="image"/>

I tried this code, but it does not work:

  

images = soup.find('img').get('image-src')

Usually I use `get('src')` and it works, but here I need image-src, and that does not work.

4 answers:

Answer 0: (score: 1)

If you want to get src, you can do it like this:

# "attribute" and "name_attr" are placeholders for a real attribute name and value
new_var = soup.find(attrs={"attribute": "name_attr"})
imageItem = new_var.get('src')
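
The same idea can be pointed at the attribute from the question. This is only a minimal sketch: the markup is a shortened, hypothetical stand-in for the tag above (the id and URL are placeholders). Passing `True` as the attribute value asks `find` for any tag that carries that attribute at all.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup modeled on the tag in the question.
html_doc = '<img class="picCore" id="demo" image-src="//example.com/a.jpg"/>'

soup = BeautifulSoup(html_doc, "html.parser")

# attrs={"image-src": True} matches any tag that has an image-src attribute.
tag = soup.find(attrs={"image-src": True})

# find() returns None when nothing matches, so guard before calling .get().
image_item = tag.get("image-src") if tag is not None else None
print(image_item)
```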

Answer 1: (score: 0)

You can access a tag's attributes by treating the tag like a dictionary. You can also access that dictionary directly as .attrs:

soup.find('img').attrs['image-src']
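
To make the dictionary behavior concrete, here is a small sketch on a hypothetical one-tag document (the URL is a placeholder). It compares subscripting, `.attrs`, and `.get()`, and shows what each does when the attribute is absent.

```python
from bs4 import BeautifulSoup

# Hypothetical single-tag document; the URL is a placeholder.
html_doc = '<img class="picCore" image-src="//example.com/a.jpg"/>'
soup = BeautifulSoup(html_doc, "html.parser")
img = soup.find("img")

# tag[...] is shorthand for tag.attrs[...].
direct = img["image-src"]
via_attrs = img.attrs["image-src"]

# .get() returns None instead of raising when the attribute is absent.
missing = img.get("src")

# Subscripting a missing attribute raises KeyError, like a normal dict.
try:
    img["src"]
    raised = False
except KeyError:
    raised = True

print(direct, via_attrs, missing, raised)
```

When the attribute may be absent on some tags, `.get()` is the safer choice of the two.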

Answer 2: (score: 0)

Looking through the documentation, I found the find_all method, which fits this case:

This worked for me:

for link in soup.find_all('img'):
    print(link.get('image-src'))

Here is my full code:

from bs4 import BeautifulSoup

html_doc = """
<img alt='Original Xiaomi Redmi Note 5 4GB RAM 64GB ROM Snapdragon S636 Octa Core Mobile Phone MIUI9 5.99" 2160*1080 4000mAh 12.0+5.0MP(China)' class="picCore" id="limage_32856997152" image-src="//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg" itemprop="image"/>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

for link in soup.find_all('img'):
    print(link.get('image-src'))

And the result:

//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg  
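
The question does not show the full page, so the following is only a guess at why `find` might fail where `find_all` succeeds: `find('img')` returns just the first `<img>` in the document, and if that tag has no image-src, `.get('image-src')` returns None. The two-tag document below is hypothetical and only illustrates that difference.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the first <img> has only src, the second has image-src.
html_doc = """
<img src="/static/logo.png" alt="logo"/>
<img class="picCore" image-src="//example.com/item.jpg"/>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# find() stops at the first <img>, which lacks the attribute.
first_only = soup.find("img").get("image-src")

# find_all() visits every <img>; tags without the attribute yield None.
all_values = [img.get("image-src") for img in soup.find_all("img")]

# Keep only the tags that actually carry image-src.
present = [value for value in all_values if value is not None]

print(first_only)
print(present)
```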

Answer 3: (score: 0)

If the id is static, you can use a CSS id selector to select the element and then subscript it to get the image-src attribute:

from bs4 import BeautifulSoup as bs

html = '''
<img alt='Original Xiaomi Redmi Note 5 4GB RAM 64GB ROM Snapdragon S636 Octa Core Mobile Phone MIUI9 5.99" 2160*1080 4000mAh 12.0+5.0MP(China)' class="picCore" id="limage_32856997152" image-src="//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg" itemprop="image"/>
'''
soup = bs(html, 'lxml')
print(soup.select_one('#limage_32856997152')['image-src'])

If the id is not static and there is more than one element to target, you may want a class selector combined with an attribute selector:

srcs = [ img['image-src'] for img in soup.select('.picCore[image-src]')]
print(srcs)

For any element with image-src, just use an attribute selector:

srcs = [img['image-src'] for img in soup.select('[image-src]')]
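
To see how the three selectors differ, here is a sketch that runs them side by side on a hypothetical multi-image document. The ids, class names, and URLs are made up for illustration, and `html.parser` is used instead of `lxml` only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: ids, classes, and URLs are placeholders.
html_doc = """
<img class="picCore" id="limage_1" image-src="//example.com/1.jpg"/>
<img class="picCore" id="limage_2" image-src="//example.com/2.jpg"/>
<img class="thumb" image-src="//example.com/3.jpg"/>
<img class="picCore" src="/no-image-src.png"/>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Id selector: exactly one known element.
by_id = soup.select_one("#limage_1")["image-src"]

# Class plus attribute selector: picCore images that have image-src.
by_class = [img["image-src"] for img in soup.select(".picCore[image-src]")]

# Attribute selector alone: every element that has image-src.
by_attr = [img["image-src"] for img in soup.select("[image-src]")]

print(by_id)
print(by_class)
print(by_attr)
```

Note that `select_one` returns None when nothing matches, so subscripting its result directly raises a TypeError if the id is not on the page.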