I am using the following code to scrape specific data from 2 sites.
firstclass = input("First class: ")
nestedclass = input("Nested class: ")
classend = input("Class close tag: ")
exportlist = []

def getNames(i):  # i is the HTML string.
    i = str(i)
    check = i.find(firstclass)
    while check != -1:
        logging("Making new loop...")  # function to show the message together with time in the console
        i = str(i)
        i = i.replace(firstclass, '\n', 1)
        logging("progress = 25%")
        i = i.split('\n')
        i = str(i[1])
        logging("progress = 50%")
        i = i.replace(nestedclass, '\n', 1)
        i = i.split('\n')
        logging("progress = 75%")
        i = str(i[1])
        i = i.replace(classend, '\n', 1)
        logging("Loop done ! ")
        i = i.split('\n')
        exportlist.append(i[0])
        i = str(i[1])
        check = i.find(firstclass)
        if check < 500 and check != -1:  # This part removes the next data piece,
            logging("In short Check")    # if it's very close to the previous one.
            i = str(i)                   # In case of double data in short distance
            i = i.replace(firstclass, '\n', 1)
            i = i.split('\n')
            i = str(i[1])
            i = i.replace(nestedclass, '\n', 1)
            i = i.split('\n')
            i = str(i[1])
            i = i.replace(classend, '\n', 1)
            i = i.split('\n')
            i = str(i[1])
            check = i.find(firstclass)
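This is the part of the code that gives me the most trouble. For the last 2 weeks it ran slowly, but well enough: even though it took 10 to 20 minutes, I got correct results. Now, perhaps because the file size keeps growing, I get this when I run it:
Memory error.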
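I tried removing the first class step, but the code does not work properly without it, because the nested class also occurs outside the first class and produces wrong results. So, are there any suggestions for writing this code better?
Also, I am using 64-bit Python and have 64 GB of RAM, which I think should be enough in this case. If there is a way to increase the memory Python can use, I am ready to try it.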
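For reference, the extraction described above (find the first class, then the nested class inside it, then the close tag, and skip one block that follows too closely) can probably be done in a single pass with `str.find` and a start offset. Every `replace`/`split`/`str` step in the loop above builds a new copy of the remaining page, while `find` with an offset only reads the original string. This is a minimal sketch, not tested on the real pages: `find_block`, `extract_names`, `min_gap` and the sample markup are hypothetical names and data, and `min_gap=500` is assumed to mirror the `check < 500` rule.

```python
def find_block(html, pos, firstclass, nestedclass, classend):
    """Return (start, end) of the text in the next block at or after pos, or None."""
    outer = html.find(firstclass, pos)
    if outer == -1:
        return None
    inner = html.find(nestedclass, outer + len(firstclass))
    if inner == -1:
        return None
    start = inner + len(nestedclass)
    end = html.find(classend, start)
    if end == -1:
        return None
    return start, end


def extract_names(html, firstclass, nestedclass, classend, min_gap=500):
    """Collect the text between nestedclass and classend inside each firstclass block."""
    names = []
    pos = 0
    block = find_block(html, pos, firstclass, nestedclass, classend)
    while block is not None:
        start, end = block
        names.append(html[start:end])
        pos = end + len(classend)
        # Assumed rule: a block starting fewer than min_gap characters after
        # the previous one is a duplicate, so step over it without recording.
        nxt = html.find(firstclass, pos)
        if nxt != -1 and nxt - pos < min_gap:
            dup = find_block(html, pos, firstclass, nestedclass, classend)
            if dup is None:
                break
            pos = dup[1] + len(classend)
        block = find_block(html, pos, firstclass, nestedclass, classend)
    return names


# Hypothetical sample page: one nested class outside the first class,
# one duplicate right after the first hit, and one distant block.
sample = (
    'junk<span class="n">Wrong</span>'
    '<div class="a"><span class="n">Alice</span></div>'
    '<div class="a"><span class="n">Alice again</span></div>'
    + "x" * 600
    + '<div class="a"><span class="n">Bob</span></div>'
)
print(extract_names(sample, '<div class="a">', '<span class="n">', '</span>'))
# ['Alice', 'Bob']
```

Unlike the loop above, this version stops quietly when a marker is missing instead of raising an `IndexError`, and it returns the list rather than appending to a global.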