BeautifulSoup body tag is corrupted

Time: 2015-11-13 18:29:56

Tags: python python-2.7 beautifulsoup

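I want to display the contents of this web page as a table: http://movie.webindia123.com/movie/showtimes/asp/search_result.asp?language=57&district_name=42&city_name=118 However, when I parse it with BeautifulSoup, the body tag appears to be corrupted, with a space between every character. The source code I used: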

from bs4 import BeautifulSoup
import requests

url="http://movie.webindia123.com/movie/showtimes/asp/search_result.asp?language=57&district_name=42&city_name=118"
r = requests.get(url)
soup = BeautifulSoup(r.text)
print soup

for hit in soup.findAll(attrs={'class' :'section group'}):
   text=hit
   print text.get_text()

1 answer:

Answer 0 (score: 0)

Fetch the web document with the mechanize module first, then parse it with BeautifulSoup. A code snippet is given below:

# Get the HTML with mechanize (Python 2).
# `url` and `site` are not defined in this snippet; they must be set beforehand.
import mechanize
from bs4 import BeautifulSoup

browser = mechanize.Browser()
cj = mechanize.LWPCookieJar()  # cookie jar so cookies persist across requests
browser.set_cookiejar(cj)
browser.open(url)
# Fill in and submit the form named "trace"
browser.select_form(name="trace")
browser["mobilenumber"] = str(site)
browser.submit()
html = browser.response().read()
# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html, "lxml")
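
Text that shows a gap between every character is often a sign that UTF-16 bytes were decoded with a single-byte encoding. Whether that is what happens with this particular page is an assumption, not something the question confirms. The sketch below uses only the standard library to reproduce the symptom and shows one possible way to decode the raw bytes by checking for a byte order mark. The helper name `decode_html` is hypothetical and is not part of requests or BeautifulSoup.

```python
import codecs


def decode_html(raw, fallback="utf-8"):
    """Decode raw response bytes, honoring a UTF-16 or UTF-8 byte order mark.

    Hypothetical helper for illustration only; pages without a BOM fall
    through to `fallback`, which may still be the wrong encoding.
    """
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM to pick the byte order and strips it.
        return raw.decode("utf-16")
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")
    return raw.decode(fallback, "replace")


# Simulate a page served as UTF-16 (the "utf-16" codec prepends a BOM).
raw = "<body>hi</body>".encode("utf-16")

# Decoding with a single-byte encoding leaves a NUL after every character,
# which many terminals render as a blank between characters.
wrong = raw.decode("latin-1")

# Decoding according to the BOM recovers the original markup.
right = decode_html(raw)
```

With requests, the undecoded bytes are available as `r.content`, so the result of `decode_html(r.content)` could be passed to `BeautifulSoup`. Passing `r.content` directly is another option, since BeautifulSoup accepts bytes and attempts its own encoding detection.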