URL-encoding a string - Python

Time: 2015-01-20 03:38:52

Tags: python web-scraping urlencode

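I have the following string that I need to URL-encode: This is currently the top headline on Reddit TIL Pimps wear lots of gold jewelry bought at pawn shops to “re-pawn” for bail money since cash is confiscated upon arrest but jewelry is not

I'm running into a problem because this string contains Unicode characters, specifically the curly quotation marks.

I have tried urllib.quote_plus(message), but it raises the following exception: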

Traceback (most recent call last):
  File "testProgram.py", line 44, in <module>
    main()                                      # Run
  File "testProgram.py", line 41, in main
    testProgram(headline)                                   # Make phone call
  File "testProgram.py", line 31, in testProgram
    urllib.quote_plus(message)
  File "/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1293, in quote_plus
    s = quote(s, safe + ' ')
  File "/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1288, in quote
    return ''.join(map(quoter, s))
KeyError: u'\u201c'

Does anyone know why this happens?

1 Answer:

Answer 0: (score: 4)

If message is a Unicode string, try:

urllib.quote_plus(message.encode('utf-8'))
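
UTF-8 is not used universally for URLs (I don't think there is a universally accepted standard, alas), but it is very common because of its "universal" nature: every Unicode character can be represented in UTF-8, which is not true of many other popular encodings.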
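
For illustration, here is a minimal sketch of that approach applied to a fragment of the headline. The helper name encode_for_url is hypothetical, and the try/except import is only an assumption made so the same snippet also runs on Python 3, where quote_plus lives in urllib.parse. The text is encoded to UTF-8 bytes first, so quote_plus only ever sees byte values.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote_plus          # Python 2, as in the question
except ImportError:
    from urllib.parse import quote_plus    # Python 3 location of the same function


def encode_for_url(message):
    """Percent-encode text for a URL, assuming UTF-8 is the wanted encoding.

    encode_for_url is a hypothetical helper name, not part of urllib.
    """
    # Unicode text is turned into UTF-8 bytes; byte strings pass through as-is.
    if not isinstance(message, bytes):
        message = message.encode('utf-8')
    return quote_plus(message)


# A fragment of the headline, with the curly quotes written as escapes.
headline = u'bought at pawn shops to \u201cre-pawn\u201d for bail money'
print(encode_for_url(headline))
# bought+at+pawn+shops+to+%E2%80%9Cre-pawn%E2%80%9D+for+bail+money
```

Each curly quote becomes three percent-escaped bytes (%E2%80%9C and %E2%80%9D), and spaces become plus signs, which is what quote_plus does in addition to plain quote.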