Question

url = "http://" + str(input)
t = urllib.request.urlopen(url)

How can I save the source code of any website to a .txt file? I am using Python 3.

Answer 1

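There are several ways to do this.

Step 1: Get the data

You can do this with any library you like. My personal favorite is requests, and the code looks like this: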

import requests

# The standard header name is 'User-Agent'; some sites reject requests without one
headers = {'User-Agent': 'Mozilla/5.0'}
html_data = requests.get('Your url goes here', headers=headers)

This stores a response object in html_data. To get the data as text, use:

html_data = html_data.text

Step 2: Save this data to a text file on your local machine

file = open('your file path goes here', 'ab')  # opens the file at the given path; 'ab' appends bytes, use 'wb' to overwrite instead
file.write(html_data.encode('UTF-8'))  # html_data is a str at this point; encode it to UTF-8 bytes because the file is open in binary mode
file.close()  # close the file so everything is flushed to disk and the handle is released

This saves all of the site's HTML to the text file at the path you specified.
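Step 2 can also be written with a `with` block, which closes the file for you even if the write fails. This is only a minimal sketch: the helper name `save_text` and the sample string are made up for illustration, and it assumes you want to overwrite the file rather than append to it.

```python
def save_text(text, path):
    """Write a str to path as UTF-8, replacing any existing file."""
    # Text mode with an explicit encoding does the str -> bytes conversion;
    # newline="" leaves the page's own line endings untouched.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Stand-in for html_data from Step 1 (hypothetical sample markup)
html_data = "<html><body><p>Hello</p></body></html>"
```

You would then call something like `save_text(html_data, 'your file path goes here')`.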

If you specifically mean saving only the visible data on the site, try a parser library. I recommend BeautifulSoup.
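To give an idea of what "visible data" means, here is a rough sketch. Note that it deliberately swaps in Python's built-in `html.parser` module instead of BeautifulSoup, so that it runs without installing anything; the class name `VisibleText` and the sample markup are hypothetical. BeautifulSoup has a `get_text()` method for a similar purpose.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


# Hypothetical sample page
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
parser = VisibleText()
parser.feed(sample)
parser.close()
visible = "\n".join(parser.parts)  # one text fragment per line
```

The `visible` string can then be written to the .txt file in place of the raw HTML.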

Below are links to the official documentation for the libraries used and recommended above.

Lib - requests - link to the documentation
Lib - BeautifulSoup - link to the documentation

Answer 2

There are plenty of videos and tutorials on this, but anyway:

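import urllib.request  # in Python 3, urlopen lives in urllib.request

# url is the address to fetch, e.g. built as in the question
t = urllib.request.urlopen(url).read()  # read() returns bytes in Python 3

# Open in binary mode ('wb') because t is bytes, not str
with open("c:\\source_code.txt", 'wb') as source_code:
    source_code.write(t)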
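If you want the page as a str rather than raw bytes (for example, to search it before saving), you have to decode it. Here is one way that might look, as a sketch only: the function name `fetch_text` and the UTF-8 fallback are my own assumptions, and a page can still declare a different encoding inside its HTML that this does not look at.

```python
import urllib.request


def fetch_text(url, fallback="utf-8"):
    """Download url and decode the body to a str."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes
        # Charset from the Content-Type header, if the server sent one
        charset = response.headers.get_content_charset() or fallback
    # errors="replace" avoids a crash on bytes that do not fit the charset
    return raw.decode(charset, errors="replace")
```

The returned text can then be written with `open(path, 'w', encoding='utf-8')`.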

Answer 3

This is the quickest way:

import urllib.request
a = str(input())
url = "http://" + a
urllib.request.urlretrieve(url, 'page.txt')

Note that a site's address may not always start with http://, and that input always has to be called with parentheses: input().
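One way to deal with the http:// caveat is to add the prefix only when the user did not type a scheme. This is just a sketch under my own assumptions: the helper name `with_scheme` is hypothetical, and it only recognizes http and https.

```python
from urllib.parse import urlparse


def with_scheme(raw, default="http"):
    """Return raw unchanged if it already starts with http/https, else prepend default."""
    raw = raw.strip()
    if urlparse(raw).scheme in ("http", "https"):
        return raw
    return f"{default}://{raw}"
```

You could then write `url = with_scheme(input())` and pass `url` to `urllib.request.urlretrieve(url, 'page.txt')` as above.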
