Python 3: decoding the byte array returned by urlopen

Time: 2013-06-10 19:41:10

Tags: python python-3.x web-crawler urlopen utf8-decode

I'm trying to use Python to find some words on a web page (just for practice), but I keep running into a problem. Here is the code:

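from urllib import request
from urllib.request import urlopen

url = 'someWikipage'
# Send a browser-like User-Agent header with the request
hdrs = {'User-Agent': "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"}
req = request.Request(url, None, hdrs)
response = urlopen(req)
htmlBytes = response.read()  # raw bytes of the response body
htmlBytes.decode('utf-8')    # decode the bytes into a str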

It breaks on the last line and gives me an error (a common one):

UnicodeEncodeError: 'charmap' codec can't encode character '\u2010' in position 18573: character maps to <undefined>
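
For context, a UnicodeEncodeError is raised when text is encoded to bytes, not when bytes are decoded, so the failure probably comes from the decoded string being echoed or printed to a console whose encoding lacks U+2010. A minimal sketch that reproduces the message, assuming a cp1252 console (the sample string and variable names are made up for illustration):

```python
# Assumption: the console uses cp1252, a "charmap" codec with no U+2010 (HYPHEN).
html_bytes = "well\u2010known".encode("utf-8")  # stand-in for response.read()

text = html_bytes.decode("utf-8")  # decoding itself succeeds
decoded_ok = "\u2010" in text

try:
    # Roughly what happens when the string is written to a cp1252 console
    text.encode("cp1252")
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

print(decoded_ok, failed)
print(message)
```

If this assumption holds, the decode call is fine and only the display step needs attention.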

Any ideas on how to prevent or ignore this?
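
One possible way to sidestep the error, sketched under the assumption that it comes from printing to a console that cannot represent the character rather than from the decoding. `to_console_safe` is a hypothetical helper introduced here for illustration; it is not part of urllib:

```python
import sys


def to_console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent
    replaced by backslash escapes (hypothetical helper)."""
    # Fall back to the console encoding, then to UTF-8, if none is given.
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Stand-in for htmlBytes.decode('utf-8'); cp1252 is passed explicitly
# so the behaviour does not depend on the machine running the sketch.
text = "well\u2010known"
safe = to_console_safe(text, "cp1252")
print(safe)
```

Passing `errors="replace"` or `errors="ignore"` instead would substitute a question mark or drop the character. Searching for words can be done on the decoded string directly, without printing all of it.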

0 answers:

No answers yet.