Strip header response - Python

Asked: 2016-03-03 07:07:23

Tags: python html http parsing

A typical HTTP 1.0 header looks like this:

Server: nginx/1.6.2 (Ubuntu)
Date: Thu, 03 Mar 2016 07:00:00 GMT
Content-Type: text/html
Content-Length: 13471
Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
Connection: close
ETag: "5674c418-349f"
Cache-Control: no-store
Accept-Ranges: bytes

<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.

What is the simplest way to separate the beginning of the page (marked by <!doctype html> or <!DOCTYPE html>) from the headers returned for the HTTP request? What I have tried so far does not work well. I am looking for a way to split the response into its first half (the response headers) and its second half (the body).

1 answer:

Answer 0 (score: 2)

You can call find() on a lowercased copy of the response, which makes the search for the doctype case-insensitive, like this:

response = """
Server: nginx/1.6.2 (Ubuntu)
Date: Thu, 03 Mar 2016 07:00:00 GMT
Content-Type: text/html
Content-Length: 13471
Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
Connection: close
ETag: "5674c418-349f"
Cache-Control: no-store
Accept-Ranges: bytes

<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.
"""

# Search a lowercased copy so the match ignores case, then slice the original string.
print(response[response.lower().find('<!doctype html>'):])

This will print:

<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.
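
Since the question asks for both halves, the same idea can be wrapped in a small helper. This is only a sketch: the name split_at_doctype and the sample string are made up for illustration, and it assumes the text before the marker is plain ASCII (as HTTP headers normally are) so that positions in the lowercased copy line up with the original. It also handles the case where find() returns -1 because no doctype is present.

```python
def split_at_doctype(response, marker='<!doctype html>'):
    """Split a raw response into (headers, body) at the first doctype marker.

    The search ignores case; the returned pieces keep their original case.
    If the marker is absent, everything is returned as headers and the
    body is an empty string.
    """
    start = response.lower().find(marker.lower())
    if start == -1:
        return response, ''
    return response[:start], response[start:]


# Hypothetical sample response, shortened for the example.
raw = (
    "Content-Type: text/html\r\n"
    "Connection: close\r\n"
    "\r\n"
    "<!DOCTYPE html>\n<html></html>\n"
)
headers, body = split_at_doctype(raw)
print(body)
```

Note that the headers half still ends with the blank line that separates it from the body; call headers.strip() if that is not wanted.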

Alternatively, search for just <!doctype, which also matches doctype declarations other than the HTML5 one.
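
Searching for a doctype only works when the body is an HTML page that declares one. A more general sketch, which is not part of the original answer, relies on the fact that HTTP ends the header section with an empty line, so the response can be split at the first blank line instead. The function name split_headers_body and the sample response are assumptions made for illustration; the pattern also accepts bare newlines in case the line endings were normalised.

```python
import re


def split_headers_body(raw):
    """Split a raw HTTP response at the first blank line.

    Returns (headers, body), where headers is a dict keyed by lowercased
    header name. Lines without a colon (such as the status line) are skipped.
    """
    # maxsplit=1 keeps any later blank lines inside the body intact.
    parts = re.split(r'\r?\n\r?\n', raw, maxsplit=1)
    head = parts[0]
    body = parts[1] if len(parts) > 1 else ''

    headers = {}
    for line in head.splitlines():
        name, sep, value = line.partition(':')
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers, body


# Hypothetical sample response for the example.
sample = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Date: Thu, 03 Mar 2016 07:00:00 GMT\r\n"
    "\r\n"
    "<!doctype html>\n\n<p>hi</p>\n"
)
parsed_headers, page = split_headers_body(sample)
print(parsed_headers['content-type'])
print(page)
```

If the same header name appears more than once, this sketch keeps only the last value; a list per name would be needed to preserve all of them.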