Parsing text from an href

Time: 2014-08-28 00:12:28

Tags: python parsing web beautifulsoup

I want to parse the text from an href. The code on the website looks like this:

<ul class="ListSearches">
<li>
<a href="/example.com">Textiwant</a>
</li>

I tried it with something like this:

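from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.example.com")

data = r.text

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(data, "html.parser")

# Prints every <li> element in full, including ones that contain no link.
for li in soup.find_all('li'):
    print(li)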

I got this output:

<li><button class="button grey" id="btnEurope">Europe</button></li>
<li><button class="button grey" id="btnAsia">Asia</button></li>

But I only want the text from the href.

1 Answer:

Answer 0 (score: 0)

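from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.example.com")

data = r.text

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(data, "html.parser")

for li in soup.find_all('li'):
    try:
        # Take the first <a> inside this <li> and print its href attribute.
        print(li.find_all("a")[0]['href'])
    except (IndexError, KeyError):
        # This <li> has no <a>, or the <a> has no href attribute.
        print("sorry Kenny94 gulliUser. this time soup failed  ")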
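If the goal is the visible link text ("Textiwant") rather than the href value, something along these lines might work. This is only a sketch: it parses the snippet from the question inline instead of fetching a page, the closing </ul> and the extra button <li> are added here for illustration, and the selector assumes the real page has the same ul.ListSearches structure.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question; the closing </ul> and the
# button <li> outside the list are assumptions added for illustration.
html = """
<ul class="ListSearches">
<li>
<a href="/example.com">Textiwant</a>
</li>
</ul>
<li><button class="button grey" id="btnEurope">Europe</button></li>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: only <a> tags that carry an href and sit inside
# an <li> of <ul class="ListSearches">. Button-only items are skipped.
links = [
    (a.get_text(strip=True), a["href"])
    for a in soup.select("ul.ListSearches li a[href]")
]

for text, href in links:
    print(text, href)
```

Restricting the search to ul.ListSearches avoids the try/except entirely, because list items without a link never match the selector in the first place.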