Beautiful Soup: decoding a string

Time: 2014-05-25 12:31:33

Tags: python beautifulsoup

I am trying to extract and decode a phone number from the following:

<script>eval(unescape("document.write('%u003c%u0062%u0072%u003e%u003c%u0064%u0069%u0076%u0020%u0063%u006c%u0061%u0073%u0073%u003d%u0022%u0074%u0065%u006c%u0065%u0066%u006f%u006e%u006f%u0073%u0022%u003e%u0020%u003c%u0069%u006d%u0067%u0020%u0077%u0069%u0064%u0074%u0068%u003d%u0022%u0031%u0032%u0022%u0020%u0068%u0065%u0069%u0067%u0068%u0074%u003d%u0022%u0031%u0030%u0022%u0020%u0073%u0072%u0063%u003d%u0022%u0068%u0074%u0074%u0070%u003a%u002f%u002f%u0038%u0039%u002e%u0032%u0030%u0032%u002e%u0031%u0036%u0032%u002e%u0036%u0030%u002f%u0069%u006d%u0061%u0067%u0065%u006e%u0065%u0073%u002f%u0074%u0065%u0066%u002e%u0067%u0069%u0066%u0022%u003e%u0036%u0033%u0036%u0030%u0034%u0039%u0039%u0031%u0038%u003c%u002f%u0064%u0069%u0076%u003e')"))</script>  

How can I decode the string and extract the phone number (636049918)?

Thanks!

1 answer:

Answer 0: (score: 1)

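s = "%u003c%u0062%u0072%u003e%u003c%u0064%u0069%u0076%u0020%u0063%u006c......"
# Turn each %uXXXX into the \uXXXX form that the unicode_escape codec understands.
s = s.replace("%", "\\")

# Python 3: str has no .decode(), so go through bytes first.
print(s.encode("ascii").decode("unicode_escape"))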
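
The replace-and-decode trick above only works while the input contains nothing but `%uXXXX` escapes. As a hedged alternative, here is a minimal sketch that decodes the escapes directly with a regular expression and also handles two-digit `%XX` escapes, roughly the way JavaScript's `unescape()` does. The helper name `js_unescape` is hypothetical and not part of any library.

```python
import re


def js_unescape(text):
    """Decode %uXXXX and %XX escapes, roughly like JavaScript's unescape()."""
    def replace_escape(match):
        # Group 1 holds the four hex digits of %uXXXX, group 2 the two of %XX.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return re.sub(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})", replace_escape, text)


# The first four escapes from the question's string.
print(js_unescape("%u003c%u0062%u0072%u003e"))  # <br>
```

Going through a regex leaves any literal backslashes in the input untouched, which the `unicode_escape` route would not.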

And to parse out the number:

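from bs4 import BeautifulSoup

s = "%u003c%u0062%u0072%u003e%u003c%u0064%u0069%u0076%u0020%u0063%u006c......"
s = s.replace("%", "\\")

html = s.encode("ascii").decode("unicode_escape")
soup = BeautifulSoup(html, "html.parser")

# <img> is a void element with no text of its own, so read the enclosing <div class="telefonos">.
print(soup.find("div", class_="telefonos").get_text(strip=True))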
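
If the starting point is the raw `<script>` contents rather than an already isolated string, the whole pipeline can be sketched with the standard library alone. This is only an illustration under stated assumptions: `phone_from_script` and `TextCollector` are hypothetical names, the sample script is a shortened payload assembled from escapes that appear in the question, and the code assumes the payload sits inside a single `document.write('...')` call with no embedded single quotes.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-blank text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def phone_from_script(script_text):
    """Return the text written by document.write('...'), or None if absent."""
    match = re.search(r"document\.write\('([^']*)'\)", script_text)
    if match is None:
        return None
    # Decode each %uXXXX escape into the character it stands for.
    decoded = re.sub(
        r"%u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        match.group(1),
    )
    collector = TextCollector()
    collector.feed(decoded)
    collector.close()
    return "".join(collector.chunks) or None


# Shortened sample: decodes to <div>636049918</div>.
script = (
    'eval(unescape("document.write(\''
    "%u003c%u0064%u0069%u0076%u003e"
    "%u0036%u0033%u0036%u0030%u0034%u0039%u0039%u0031%u0038"
    "%u003c%u002f%u0064%u0069%u0076%u003e"
    '\')"))'
)
print(phone_from_script(script))  # 636049918
```

With Beautiful Soup available, the `TextCollector` step could be swapped for a `find` on the decoded markup; the extraction and decoding steps stay the same.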