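Extracting data from a web page with BeautifulSoup in Python

Time: 2013-05-30 19:29:56

Tags: python beautifulsoup

Hi everyone, I am writing a Python script that needs to extract data from a website and store the data in sqlite3. I am having trouble with the content extraction. Here is my code: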

#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
import urllib2
import re

url="http://m.harveynorman.com.au/tv-audio/portable-audio/ipods"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
A=soup.findAll('strong',{'class':'name fn'})
for B in A:
    # print the text inside each matching <strong> tag
    print B.renderContents()

and the output looks like this:

 "iPod touch 16GB - White   
   iPod touch 4th Gen 32GB  
 Apple iPod Shuffle 2GB  
 iPod touch 16GB - Black  
 iPod nano 16GB  
  iPod touch 32GB"   

I tried using

   print B.renderContents()[0]

to get one specific item to insert into sqlite3, but the output is as follows:

i 
i
A
i
i
i

So my question is: how do I extract one specific item (e.g. iPod touch 16GB - White)?

1 answer:

Answer 0 (score: 0)

from BeautifulSoup import BeautifulSoup
import urllib2
import re

url="http://m.harveynorman.com.au/tv-audio/portable-audio/ipods"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
A = soup.findAll('strong',{'class':'name fn'})[0]
print(A.renderContents())

This yields:

iPod touch 16GB - White
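
With a single name in hand, the remaining step from the question is inserting it into sqlite3. The following is only a minimal sketch of that step: the table name products, its columns, and the helper store_name are hypothetical, the database is in-memory, and the string is a stand-in for the value returned by A.renderContents().

```python
import sqlite3


def store_name(conn, name):
    """Insert one product name and return the id of the new row."""
    # Hypothetical schema: table and column names are assumptions.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(id INTEGER PRIMARY KEY, name TEXT)"
    )
    # The ? placeholder lets sqlite3 handle quoting of the value.
    cur = conn.execute(
        "INSERT INTO products (name) VALUES (?)", (name.strip(),)
    )
    conn.commit()
    return cur.lastrowid


# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Stand-in for the string returned by A.renderContents().
name = "iPod touch 16GB - White"

row_id = store_name(conn, name)
stored = conn.execute(
    "SELECT name FROM products WHERE id = ?", (row_id,)
).fetchone()[0]
print(stored)
```

Passing the value as a query parameter instead of formatting it into the SQL string avoids quoting problems with names that contain apostrophes.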

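In the question's code, the loop

for B in A:
    print B.renderContents()[0]

prints only the first character of each of the following lines, because renderContents() returns a string and [0] indexes into that string rather than into the list returned by findAll: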
iPod touch 16GB - White
iPod touch 4th Gen 32GB
Apple iPod Shuffle 2GB
iPod touch 16GB - Black
iPod nano 16GB
iPod touch 32GB
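
The difference between indexing the list and indexing each string can be reproduced without any network access. This is a sketch under one assumption: the list names below is a hand-written stand-in for the strings that renderContents() returns for each matched tag.

```python
# Stand-in for [B.renderContents() for B in A] from the question.
names = [
    "iPod touch 16GB - White",
    "iPod touch 4th Gen 32GB",
    "Apple iPod Shuffle 2GB",
    "iPod touch 16GB - Black",
    "iPod nano 16GB",
    "iPod touch 32GB",
]

# Indexing the list selects one whole item.
first_item = names[0]

# Indexing each string selects one character per item,
# which is what the loop in the question does.
first_chars = [name[0] for name in names]

print(first_item)
print(first_chars)
```

So applying [0] to the result of findAll selects one product, while applying [0] to renderContents() selects one character.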