Using Python

Date: 2018-01-30 20:19:50

Tags: python python-3.x file text

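I have a text file in which every line ends with the same character (N), which marks the system's progress. If the program ends because of an error or some other interruption, I want to change the end character to "Y", so that when the program is restarted it searches until it finds a line ending in "N" and resumes from there. Below are my code and a sample of the text file.

Updated code: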

def GeoCode():
    f = open("geocodeLongLat.txt", "a")
    with open("CstoGC.txt",'r') as file:
        print("Geocoding...")
        new_lines = []
        for line in file.readlines():
            check = line.split('~')
            print(check)
            if 'N' in check[-1]:
                geolocator = Nominatim()
                dot_number, entry_name, PHY_STREET,PHY_CITY,PHY_STATE,PHY_ZIP = check[0],check[1],check[2],check[3],check[4],check[5] 
                address = PHY_STREET + " " + PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
                f.write(dot_number + '\n')
                try:
                    location = geolocator.geocode(address)
                    f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
                except AttributeError:
                    try:
                        address = PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
                        location = geolocator.geocode(address)
                        f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
                    except AttributeError:
                        print("Cannot Geocode")
            check[-1] = check[-1].replace('N','Y')
        new_lines.append('~'.join(check))

    with open('CstoGC.txt','r+') as file: # IMPORTANT to open as 'r+' mode as 'w/w+' will truncate your file!
        for line in new_lines:
            file.writelines(line)        

    f.close()

Output:

2967377~DARIN COLE~22112 TWP RD 209~ALVADA~OH~44802~Y
WAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
143608~LARRY A PETERSON & DONNA M PETERSON~W6359 450TH AVE~ELLSWORTH~WI~54011~N
635528~JAMES E WEBB~3926 GREEN ROAD~SPRINGFIELD~TN~37172~N
805496~WAYNE MLADY~22272 135TH ST~CRESCO~IA~52136~N
704996~SAVINA C MUNIZ~814 W LA QUINTA DR~PHARR~TX~78577~N
893169~BINDEWALD MAINTENANCE INC~213 CAMDEN DR~SLIDELL~LA~70459~N
948130~LOGISTICIZE LTD~861 E PERRY ST~PAULDING~OH~45879~N
438760~SMOOTH OPERATORS INC~W8861 CREEK ROAD~DARIEN~WI~53114~N
518872~A B C RELOCATION SERVICES INC~12 BOCKES ROAD~HUDSON~NH~03051~N
576143~E B D ENTERPRISES INC~29 ROY ROCHE DRIVE~WINNIPEG~MB~R3C 2E6~N
968264~BRIAN REDDEMANN~706 WESTGOR STREET~STORDEN~MN~56174-0220~N
721468~QUALITY LOGISTICS INC~645 LEONARD RD~DUNCAN~SC~29334~N

As you can see, I have been using x to keep track of which line I am on. Should I be using something like file.readlines() instead?

Text file sample:

570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N

Thanks!

Edit: Thanks, @idlehands.

Updated the code.

2 Answers:

Answer 0 (score: 1)

There are a few ways to do this.

Option #1

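My original idea was to use the tell() and seek() methods to step back a few positions, but it quickly becomes clear that you cannot do this conveniently when the file is not opened in bytes mode, and definitely not inside a for loop over readlines(). You can see the reference threads here: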

Is it possible to modify lines in a file in-place?
How to solve "OSError: telling position disabled by next() call"

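The investigation led to this code:

with open('file.txt','rb+') as file:
    line = file.readline()  # initiate the loop
    while line:  # readline() returns b'' at end of file, which ends the loop
        print(line)
        check = line.split(b'~')[-1]
        if check.startswith(b'N'):  # check still carries the line ending, e.g. b'N\n'
            # ... do stuff ... #
            file.seek(-len(check), 1)  # move back to the start of the flag field
            file.write(check.replace(b'N', b'Y'))  # replace "N" with "Y"
        line = file.readline()  # read next line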

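In the first referenced thread, one of the answers mentions that this could lead to potential problems, and modifying bytes directly in the buffer while reading from it is probably considered a bad idea. A lot of pros will probably scold me for even suggesting it.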
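As a rough sketch of how Option #1 could be wrapped up for the resume-after-interruption case, the function below flips each flag right after its line has been handled, so a crash leaves the earlier lines marked Y. The names mark_lines_done and process are hypothetical (they are not from the question), and the sketch assumes the flag is the single byte N at the start of the last field.

```python
import os
import tempfile


def mark_lines_done(path, process):
    """Flip the trailing b'N' flag to b'Y' in place, one line at a time.

    `process` is a hypothetical callback that receives the fields of one
    line as a list of bytes. If it raises, lines handled earlier stay Y.
    """
    with open(path, "rb+") as fh:
        line = fh.readline()
        while line:  # readline() returns b"" at end of file
            fields = line.split(b"~")
            flag = fields[-1]  # e.g. b"N\n"
            if flag.startswith(b"N"):
                process(fields)
                # Step back over the flag field and overwrite it with a
                # value of the same length, so no other bytes shift.
                fh.seek(-len(flag), os.SEEK_CUR)
                fh.write(b"Y" + flag[1:])
                fh.flush()  # push the change to disk before moving on
            line = fh.readline()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"1~A~x~N\n2~B~y~N\n3~C~z~N\n")

    def fail_on_third(fields):
        # Simulated interruption on the line whose first field is b"3".
        if fields[0] == b"3":
            raise RuntimeError("simulated interruption")

    try:
        mark_lines_done(tmp.name, fail_on_third)
    except RuntimeError:
        pass
    with open(tmp.name, "rb") as fh:
        print(fh.read())
    os.remove(tmp.name)
```

On a second run, lines already marked Y are skipped by the startswith check, so work resumes at the first line still ending in N.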

Option #2a

(if the file size is not horrendously large)

with open('file.txt','r') as file:
    new_lines = []
    for line in file.readlines():
        check = line.split('~')
        if 'N' in check[-1]:
            # ... do stuff ... #
            check[-1] = check[-1].replace('N','Y')
        new_lines.append('~'.join(check))

with open('file.txt','r+') as file:  # IMPORTANT to open as 'r+' mode as 'w/w+' will truncate your file!
    for line in new_lines:
        file.write(line)  # each item is one complete line

This approach loads all the lines into memory first, so you make the changes in memory and leave the file buffer alone. You then reopen the file and write the changed lines back. The caveat is that, technically, you are rewriting the entire file line by line, not just the string N, even if that is the only thing that changed.

Option #2b

Technically you could open the file in r+ mode from the start and then, after the iteration has finished, do this (still inside the with block, but outside the loop):

        # ...
        new_lines.append('~'.join(check))
    # loop finished: rewind to the start of the file before writing
    file.seek(0)
    for line in new_lines:
        file.write(line)

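I'm not sure what distinguishes this from Option #1, since you are still reading and modifying the file at the same time. If someone more proficient in IO/buffer/memory management wants to chime in, please do.

The disadvantage of Option #2a/b is that you always end up storing and rewriting the lines in the file, even if only a few lines are left that need to change from 'N' to 'Y'.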

Result (for all solutions):

570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~Y
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~Y
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~Y
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~Y
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~Y

If you were to, say, hit a break at a line partway through the file, the lines before the break would end in Y and the lines from that point on would still end in N.
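The partial-update case can be checked with a small sketch of the Option #2a pattern. Opening in 'r+' overwrites from the start of the file without truncating it, and each rewritten line has the same length as the original, so breaking out of the loop early leaves the remaining lines ending in N. The function name mark_until and its stop_at parameter are hypothetical, used only to simulate the break.

```python
import os
import tempfile


def mark_until(path, stop_at):
    """Mark lines Y until the line whose first field equals `stop_at`.

    `stop_at` is a hypothetical stand-in for wherever the real loop
    would break (for example, on an error).
    """
    new_lines = []
    with open(path, "r") as fh:
        for line in fh:
            fields = line.split("~")
            if fields[0] == stop_at:
                break  # simulated interruption
            if "N" in fields[-1]:
                fields[-1] = fields[-1].replace("N", "Y")
            new_lines.append("~".join(fields))
    # 'r+' does not truncate, so lines after the break survive unchanged.
    with open(path, "r+") as fh:
        fh.writelines(new_lines)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("1~A~N\n2~B~N\n3~C~N\n")
    mark_until(tmp.name, "3")
    with open(tmp.name) as fh:
        print(fh.read())
    os.remove(tmp.name)
```

This only works because N and Y have the same length. If a replacement changed the line length, an 'r+' overwrite would leave stray bytes behind.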

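Each of these approaches has pros and cons. Try them and see which one fits your use case best.

Answer 1 (score: 0)

I would read the entire input file into a list and .pop() the lines off one at a time. If an error occurs, append the popped item back onto the list and write the list out, overwriting the input file. That way the file is always up to date and you don't need any other logic.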
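A minimal sketch of that idea might look like the following. It deviates slightly from the description: it uses a collections.deque and pops from the front so that lines are handled in file order, whereas a plain list.pop() would take lines from the end. The names process_with_pop and process are hypothetical, and with this approach the N/Y flag is no longer needed, because finished lines simply disappear from the input file.

```python
import os
import tempfile
from collections import deque


def process_with_pop(path, process):
    """Pop lines off the pending queue; on failure, write the rest back.

    `process` is a hypothetical per-line callback that may raise.
    """
    with open(path, "r") as fh:
        pending = deque(fh.readlines())
    try:
        while pending:
            line = pending.popleft()
            try:
                process(line)
            except Exception:
                pending.appendleft(line)  # put the failed line back
                raise
    finally:
        # Whatever is still pending becomes the new input file.
        with open(path, "w") as fh:
            fh.writelines(pending)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("1~A~N\n2~B~N\n3~C~N\n")

    def fail_on_second(line):
        # Simulated error on the line starting with "2".
        if line.startswith("2"):
            raise ValueError("simulated error")

    try:
        process_with_pop(tmp.name, fail_on_second)
    except ValueError:
        pass
    with open(tmp.name) as fh:
        print(fh.read())
    os.remove(tmp.name)
```

The finally block only covers ordinary exceptions. A hard kill of the process would skip the write-back, so rewriting the file after every line would be the more cautious (and slower) variant.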
