嗨,我有一个Windows路径对象列表,我正在运行if语句。背景:我有几个csv文件。我的代码检查了这些csv文件。如果csv文件正确,则脚本会将文件移动到名为“存档”的目录中。如果有错误,则移至“错误”,如果为空,则移至“空”。
所以我有一个文件已移至存档。我将此文件复制回基本目录以供脚本处理。 但是,应该捕获该重复项的if语句不会执行,而是脚本尝试将文件移至存档目录。发生这种情况时,由于我使用Path.rename()方法移动文件,因此出现以下错误: FileExistsError:[WinError 183]该文件已存在时无法创建文件:'C:\ Users \ sys_nsgprobeingestio \ Documents \ dozie \ odfs \ odfshistory \ 06_17_2020_FMGN520.csv'->'C:\ Users \ sys_nsgprobeingestio \ Documents \ dozie \ odfs \ odfshistory \ archive \ 06_17_2020_FMGN520.csv'
这些是涉及的功能。有人知道为什么会这样吗?:
def make_dict_of_csvprocessing_dirs():
dir_dict = process_dirconfig_file(dirconfig_file)
# print(dir_dict)
dictofpdir_flist = {} #dictionary of lists of files in different processing dirs
csvbase_file_dir = dir_dict["base_dir"]
csvhistory_Phandler = Path(csvbase_file_dir)
csvbase_path_list = [file for file in csvhistory_Phandler.glob("*.*")]
dictofpdir_flist["csvbase_path_list"] = csvbase_path_list
archive_dir = dir_dict["archive_dir"]
archive_Phandler = Path(archive_dir)
archivefiles_path_set = {file for file in archive_Phandler.rglob("*.*")}
dictofpdir_flist["archivefiles_path_set"] = archivefiles_path_set
发生错误的函数:
def odf_history_from_csv_to_dbtable(db_instance):
odfsdict = db_instance['odfs_tester_history']
#table_row = {}
totalresult_list = []
dir_dict, dictofpdir_flist = make_dict_of_csvprocessing_dirs()
print(dir_dict)
csvbase_path_list = dictofpdir_flist["csvbase_path_list"]
archivefiles_path_set = dictofpdir_flist["archivefiles_path_set"]
for csv in csvbase_path_list: # is there a faster way to compare the list of files in archive and history?
if csv in archivefiles_path_set:
print(csv.name + " is in archive folder already")
else:
csvhistoryfilelist_to_dbtable(csv, db_instance)
df_tuple = process_csv_formatting(csv)
df_cnum, odfscsv_df = df_tuple
if df_cnum == 1:
trg_path = Path(dir_dict['empty_dir'])
csv.rename(trg_path.joinpath(csv.name))
return totalresult_list
当我调试Pycharm时,会得到以下值:请注意目录列表的勾号是如何反转的。我想知道这是否是问题吗?:
archivefiles_path_set={WindowsPath('C:/Users/sys_nsgprobeingestio/Documents/dozie/odfs/odfshistory/archive/06_17_2020_FMGN520.csv')}
csv = {WindowsPath}C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\06_17_2020_FMGN520.csv
csvbase_path_list =
[WindowsPath('C:/Users/sys_nsgprobeingestio/Documents/dozie/odfs/odfshistory/06_17_2020_FMGN520.csv')]
答案 0 :(得分:1)
获取要复制的文件的最快方法(如果您是同时访问两个目录的唯一进程):
from os import listdir
basedir = r"c:/temp/csvs"
archdir = os.path.join(basedir,"temp")
def what_to_copy(frm_dir, to_dir):
return set(os.listdir(frm_dir)).difference(os.listdir(to_dir))
copy_names = what_to_copy(basedir, archdir)
print(copy_names) # you need to prepend the dirs when copying, use os.path.join
看来,您的代码很复杂(将大量内容存储在字典中,以进行转移以再次获取它),只需要完成少量任务即可。这就是它的工作方式:
import os
# boiler plate code to create files and make some of them already "archived"
names = [ f"file_{i}.csv" for i in range(10,60)]
basedir = r"c:/temp/csvs"
archdir = os.path.join(basedir,"temp")
os.makedirs(basedir, exist_ok = True)
os.makedirs(archdir, exist_ok = True)
def create_files():
for idx, fn in enumerate(names):
# create all files in basedir
with open(os.path.join(basedir,fn),"w") as f:
f.write(" ")
# every 3rd file goes into archdir as well
if idx%3 == 0:
with open(os.path.join(archdir,fn),"w") as f:
f.write(" ")
create_files()
“不”复制文件的功能:
def copy_from_to_if_not_exists(frm,to):
"""'frm' full path to file, 'to' directory to copy to"""
# norm paths so they compare equally regardless of C:/temp or C:\\temp
frm = os.path.normpath(frm)
to = os.path.normpath(to)
fn = os.path.basename(frm)
dir = os.path.dirname(frm)
if dir != to:
if fn in os.listdir(to):
print(fn, " -> already exists!")
else:
# you would copy the file instead ...
print(fn, " -> could be copied")
# print whats in the basedir as well as the archivedir (os.walk descends subdirs)
for root,dirs,files in os.walk(basedir):
print(root + ":", files, sep="\n")
for file in os.listdir(basedir):
copy_from_to_if_not_exists(os.path.join(basedir,file),archdir)
如果硬盘驱动器的读取缓存优化不足以满足您的需求,则可以缓存os.listdir(to)
的结果,但它可能仍然可以使用。
输出:
c:/temp/csvs:
['file_10.csv','file_11.csv','file_12.csv','file_13.csv','file_14.csv','file_15.csv',
'file_16.csv','file_17.csv','file_18.csv','file_19.csv','file_20.csv','file_21.csv',
'file_22.csv','file_23.csv','file_24.csv','file_25.csv','file_26.csv','file_27.csv',
'file_28.csv','file_29.csv','file_30.csv','file_31.csv','file_32.csv','file_33.csv',
'file_34.csv','file_35.csv','file_36.csv','file_37.csv','file_38.csv','file_39.csv',
'file_40.csv','file_41.csv','file_42.csv','file_43.csv','file_44.csv','file_45.csv',
'file_46.csv','file_47.csv','file_48.csv','file_49.csv','file_50.csv','file_51.csv',
'file_52.csv','file_53.csv','file_54.csv','file_55.csv','file_56.csv','file_57.csv',
'file_58.csv','file_59.csv']
c:/temp/csvs\temp:
['file_10.csv','file_13.csv','file_16.csv','file_19.csv','file_22.csv','file_25.csv',
'file_28.csv','file_31.csv','file_34.csv','file_37.csv','file_40.csv','file_43.csv',
'file_46.csv','file_49.csv','file_52.csv','file_55.csv','file_58.csv']
file_10.csv -> already exists!
file_11.csv -> could be copied
file_12.csv -> could be copied
file_13.csv -> already exists!
file_14.csv -> could be copied
file_15.csv -> could be copied
file_16.csv -> already exists!
file_17.csv -> could be copied
file_18.csv -> could be copied
[...snipp...]
file_55.csv -> already exists!
file_56.csv -> could be copied
file_57.csv -> could be copied
file_58.csv -> already exists!
file_59.csv -> could be copied
有关lru_cache缓存功能结果的方法,请参见{{3}}-如果IO读取成为瓶颈,则考虑将os.listdir(archdir)
放入缓存结果的函数中(首先测量,然后优化)>