How do I find all image URIs in a text using Python?

Asked: 2015-12-08 04:53:03

Tags: python regex

I want to extract the img URIs from text like this:

    hello bla

    <br> <img src="/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif" class="someclass" />
    some blablabla 
    <br> <img src="/media/photos/344/tgrfgregfwe_540.jpg" class="otherclass" /> 
    </br>
    more blabla

So the result should be:

['/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif', '/media/photos/344/tgrfgregfwe_540.jpg']

2 answers:

Answer 0 (score: 2)

Try BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> # html is a string holding the text from the question
>>> soup = BeautifulSoup(html, "html.parser")
>>> for i in soup.find_all('img'):
...     print(i.get('src'))
...
/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif
/media/photos/344/tgrfgregfwe_540.jpg
>>> [i.get('src') for i in soup.find_all('img')]
['/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif', '/media/photos/344/tgrfgregfwe_540.jpg']
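
Since the question is tagged regex, the same extraction can also be sketched with the standard `re` module when installing BeautifulSoup is not an option. This is only a minimal sketch: it assumes every `src` value is quoted and that the markup is as regular as in the question, and the names `IMG_SRC_RE` and `find_img_srcs` are hypothetical, not part of any library.

```python
import re

# Assumption: src values are always wrapped in single or double quotes.
# Group 1 captures the quote character, group 2 the URI itself.
IMG_SRC_RE = re.compile(r'<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)


def find_img_srcs(text):
    """Return the src value of every <img> tag in text, in document order."""
    return [match.group(2) for match in IMG_SRC_RE.finditer(text)]


text = """hello bla

<br> <img src="/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif" class="someclass" />
some blablabla
<br> <img src="/media/photos/344/tgrfgregfwe_540.jpg" class="otherclass" />
</br>
more blabla"""

print(find_img_srcs(text))
```

A regular expression is brittle against unquoted attributes, commented-out tags, or a `>` inside an attribute value, so a real parser such as BeautifulSoup is usually the safer choice for arbitrary HTML.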

Answer 1 (score: 0)

XML parsers can make this easy.

from xml.dom import minidom

# minidom requires well-formed XML, so parse a single self-closed <img> element
image = "<img src='/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif' class='someclass' />"
xml_object = minidom.parseString(image)
image_tags = xml_object.getElementsByTagName('img')
list_of_srcs = []
for image_tag in image_tags:
    # Read the value of the src attribute from each <img> element
    list_of_srcs.append(image_tag.getAttributeNode('src').value)
print(list_of_srcs)
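
Note that `minidom.parseString` only accepts well-formed XML, so passing it the full text from the question (bare text around the tags and an unclosed `<br>`) would raise a parse error. A more tolerant route that stays within the standard library is `html.parser.HTMLParser`. The following is a minimal sketch assuming Python 3; `ImgSrcCollector` and `collect_img_srcs` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closed tags such as <img ... />, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.srcs.append(value)


def collect_img_srcs(text):
    """Return all <img> src values found in text, in document order."""
    collector = ImgSrcCollector()
    collector.feed(text)
    collector.close()
    return collector.srcs


text = """hello bla

<br> <img src="/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif" class="someclass" />
some blablabla
<br> <img src="/media/photos/344/tgrfgregfwe_540.jpg" class="otherclass" />
</br>
more blabla"""

print(collect_img_srcs(text))
```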