Saving page content using requests in Python

Time: 2018-12-10 15:50:31

Tags: python python-3.x python-2.7 web-scraping python-requests

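I have been trying to use the requests module to access a .txt file on a website. When I log in manually with my username and password, I can see the actual data in the browser: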

Point Code  Issue Date  Trade Date  Region  Pricing Point   Low High    Average Volume  Deals   Delivery Start Date Delivery End Date
RMTNWW  2018-10-09  2018-10-08  Rocky Mountains Northwest Wyoming Pool  2.910   2.955   2.935   323 44  2018-10-09  2018-10-09
RMTOPAL 2018-10-09  2018-10-08  Rocky Mountains Opal    2.925   3.050   2.965   209 40  2018-10-09  2018-10-09

But when I try to access the same page from a script and print the content with

print(page.content)

the output is the HTML source:

   b'<!DOCTYPE html>\n<html>\n<head>\n\n<meta name="csrf-param" content="authenticity_token"/>\n<meta name="csrf-token" content="s35g4TAUN6+5V8Xi8x7u6f2FwziX3gbW9iY9D45HnEw="/>\n<meta http-equiv="content-type" content="text/html;charset=utf-8">
\n<meta name="description" content="Natural Gas Intelligence">\n<meta name="keywords" content="gas, natural gas, natural gas prices, enery prices, NYMEX, nymex settlement, aga, storage, natural gas data, henry hub, ferc, power, electricity, electric, megawatt, methane, reliability, inside, ngi">\n\n\n\n<meta content="false" name="has-log-view" />\n<!--<meta content="IE=EmulateIE7" http-equiv="X-UA-Compatible"/>
    .
    .
    .

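Nothing in this HTML contains any of the column labels shown above (Point Code, Issue Date, etc.), so I think this may be a login problem. The login URL is https://www.naturalgasintel.com/user/login, and the file is located at https://www.naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/10/20181009td.txt.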
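One way to test the login hypothesis is to check whether the response is an HTML page at all before looking for the column labels. The helper below is only a rough heuristic sketch; the name `looks_like_html` is hypothetical and not part of requests.

```python
def looks_like_html(content_type, body):
    """Heuristic: True when a response looks like an HTML page rather than plain text.

    content_type is the value of the Content-Type header (may be empty),
    body is the raw response body as bytes.
    """
    if "html" in content_type.lower():
        return True
    # Fall back to sniffing the first bytes of the body.
    head = body[:200].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

With the `page` object from the script, this could be called as `looks_like_html(page.headers.get("Content-Type", ""), page.content)`. Printing `page.status_code`, `page.url` and `page.history` alongside it would also show whether the request was redirected, for example to a login page.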

My script is:

import requests
with requests.Session() as c:
    data_url = 'https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/'
    username = ''
    password = ''
    login_data = dict(username=username, password=password)
    c.post(data_url, data=login_data, headers={'Referer':'https://www.naturalgasintel.com/'})
    page = c.get('https://www.naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/10/20181009td.txt', stream=True)
    print(page.content)
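Two things in the script above look suspicious: the credentials are posted to the data directory rather than to the login URL, and the HTML output shows `csrf-param` and `csrf-token` meta tags, which suggests the login form may expect an `authenticity_token` field. The following is a minimal sketch of that flow, not a verified solution for this site. The form field names `username` and `password` are assumptions carried over from the script (the real names would have to be read from the login form's HTML), and `extract_csrf` and `login_and_fetch` are hypothetical helper names.

```python
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.meta[attrs["name"]] = attrs["content"]


def extract_csrf(html):
    """Return (form_field_name, token) from the csrf-param / csrf-token meta tags.

    Either value is None when the corresponding tag is missing.
    """
    collector = _MetaCollector()
    collector.feed(html)
    return collector.meta.get("csrf-param"), collector.meta.get("csrf-token")


def login_and_fetch(session, login_url, file_url, username, password,
                    user_field="username", pass_field="password"):
    """Log in through the login URL, then request the data file.

    `session` is expected to be a requests.Session so cookies persist
    between calls. user_field / pass_field are ASSUMED form field names.
    """
    # Load the login page first to obtain a fresh CSRF token and cookies.
    login_page = session.get(login_url)
    login_page.raise_for_status()
    param, token = extract_csrf(login_page.text)

    form = {user_field: username, pass_field: password}
    if param and token:
        form[param] = token  # e.g. authenticity_token=<token>

    reply = session.post(login_url, data=form, headers={"Referer": login_url})
    reply.raise_for_status()

    # The session now carries whatever cookies the login set.
    page = session.get(file_url)
    page.raise_for_status()
    return page
```

It would be used inside `with requests.Session() as c:` by passing `c` as `session`, together with the login URL and the file URL given earlier. If the returned page is still HTML, the field names or the form's target URL are probably different from what is assumed here.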

I would like to save the actual .txt content of the page rather than the HTML source, using open and write to store it in a file, with something like:

localfile = 'output_{}.csv'
# page is a requests Response; write its decoded text, not the object itself
with open(localfile, "w", encoding="utf-8") as datafile:
    datafile.write(page.text)
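For the saving step itself, a small sketch is shown below. It writes the raw bytes of the response (`page.content`) in binary mode so that no re-encoding takes place. Deriving the local file name from the last segment of the URL is an assumption made for illustration, and `local_name_for` and `save_body` are hypothetical helper names.

```python
import os
from urllib.parse import urlparse


def local_name_for(url):
    """Use the last path segment of the URL as the local file name (an assumption)."""
    return os.path.basename(urlparse(url).path)


def save_body(body, path):
    """Write the raw response bytes (e.g. page.content) to disk unchanged."""
    with open(path, "wb") as datafile:
        datafile.write(body)
    return path
```

With the script's `page` object, this could be called as `save_body(page.content, local_name_for(page.url))`. Note that it saves whatever the server returned, so it only produces the expected data once the request is actually authenticated.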

How can I get this content instead of the HTML source?

0 answers:

No answers