Question

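Hi, I have a list of Windows path objects that I am running an if statement on. Background: I have several csv files, and my code checks them. If a csv file is correct, the script moves it to a directory called "archive". If it has errors, it is moved to "error", and if it is empty, it is moved to "empty".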

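So I have a file that has already been moved to archive. I copied this file back into the base directory for the script to process. However, the if statement that should catch this duplicate does not execute; instead the script tries to move the file to the archive directory. When this happens, because I am using the Path.rename() method to move the file, I get the following error: FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\06_17_2020_FMGN520.csv' -> 'C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\archive\06_17_2020_FMGN520.csv'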

These are the functions involved. Does anyone know why this is happening?

def make_dict_of_csvprocessing_dirs():
    dir_dict = process_dirconfig_file(dirconfig_file)
    # print(dir_dict)
    dictofpdir_flist = {} #dictionary of lists of files in different processing dirs
    csvbase_file_dir = dir_dict["base_dir"]
    csvhistory_Phandler = Path(csvbase_file_dir)
    csvbase_path_list = [file for file in csvhistory_Phandler.glob("*.*")]
    dictofpdir_flist["csvbase_path_list"] = csvbase_path_list

    archive_dir = dir_dict["archive_dir"]
    archive_Phandler = Path(archive_dir)
    archivefiles_path_set = {file for file in archive_Phandler.rglob("*.*")}
    dictofpdir_flist["archivefiles_path_set"] = archivefiles_path_set

    return dir_dict, dictofpdir_flist

The function where the error occurs:

def odf_history_from_csv_to_dbtable(db_instance):
    odfsdict = db_instance['odfs_tester_history']
    #table_row = {}
    totalresult_list = []

    dir_dict, dictofpdir_flist = make_dict_of_csvprocessing_dirs()
    print(dir_dict)
    csvbase_path_list = dictofpdir_flist["csvbase_path_list"]
    archivefiles_path_set = dictofpdir_flist["archivefiles_path_set"]

    for csv in csvbase_path_list:  # is there a faster way to compare the list of files in archive and history?
        if csv in archivefiles_path_set:
            print(csv.name + " is in archive folder already")
        else:
            csvhistoryfilelist_to_dbtable(csv, db_instance)
            df_tuple = process_csv_formatting(csv)
            df_cnum, odfscsv_df = df_tuple
            if df_cnum == 1:
                trg_path = Path(dir_dict['empty_dir'])
                csv.rename(trg_path.joinpath(csv.name))

    return totalresult_list

When I debug in PyCharm, I get the following values. Notice how the slashes in the directory listings are reversed. I wonder if that is the problem?

archivefiles_path_set={WindowsPath('C:/Users/sys_nsgprobeingestio/Documents/dozie/odfs/odfshistory/archive/06_17_2020_FMGN520.csv')}

csv = {WindowsPath}C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\06_17_2020_FMGN520.csv

csvbase_path_list = 
[WindowsPath('C:/Users/sys_nsgprobeingestio/Documents/dozie/odfs/odfshistory/06_17_2020_FMGN520.csv')]

Answer 1

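The fastest way to get the files that still need to be copied (if you are the only process accessing both directories at the same time):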

import os

basedir = r"c:/temp/csvs"
archdir = os.path.join(basedir,"temp")

def what_to_copy(frm_dir, to_dir):
    return set(os.listdir(frm_dir)).difference(os.listdir(to_dir))

copy_names = what_to_copy(basedir, archdir)
print(copy_names) # you need to prepend the dirs when copying, use os.path.join
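
For the pathlib-based code in the question, the same idea can be sketched as follows. A path in the base directory never compares equal to a path in the archive directory, because equality looks at the whole path and not just the file name, so `csv in archivefiles_path_set` is always False. The slash direction shown by the debugger is not the cause. The directory names below are assumptions taken from this answer's examples, and PureWindowsPath is used only so the snippet behaves the same on any OS:

```python
from pathlib import PureWindowsPath

# Hypothetical paths for illustration: same file name, different directories
base_file = PureWindowsPath(r"C:\temp\csvs\06_17_2020_FMGN520.csv")
archived = {PureWindowsPath("C:/temp/csvs/archive/06_17_2020_FMGN520.csv")}

# Full paths differ (one contains "archive"), so the membership test fails
print(base_file in archived)             # False

# Comparing only the file names finds the duplicate
archived_names = {p.name for p in archived}
print(base_file.name in archived_names)  # True

# Slash direction does not matter: both spellings are the same path
print(PureWindowsPath("C:/temp/csvs") == PureWindowsPath(r"C:\temp\csvs"))  # True
```

In the question's loop this would mean building a set of `file.name` values from the archive directory and testing `csv.name` against it.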

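Your code seems quite complex (storing lots of things in dictionaries to pass them around, only to fetch them out again) for what is a small task. This is how it could work: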

import os

# boiler plate code to create files and make some of them already "archived"
names = [ f"file_{i}.csv" for i in range(10,60)]
basedir = r"c:/temp/csvs"
archdir = os.path.join(basedir,"temp")

os.makedirs(basedir, exist_ok = True)
os.makedirs(archdir, exist_ok = True)

def create_files():
    for idx, fn in enumerate(names):
        # create all files in basedir
        with open(os.path.join(basedir,fn),"w") as f:
            f.write(" ")
        # every 3rd file goes into archdir as well
        if idx%3 == 0:
            with open(os.path.join(archdir,fn),"w") as f:
                f.write(" ")


create_files()

The function that does "not" copy the files (it only prints what it would do):

def copy_from_to_if_not_exists(frm,to):
    """'frm' full path to file, 'to' directory to copy to"""
    # norm paths so they compare equally regardless of C:/temp or C:\\temp
    frm = os.path.normpath(frm)
    to =  os.path.normpath(to)

    fn = os.path.basename(frm)
    src_dir = os.path.dirname(frm)  # named src_dir to avoid shadowing the built-in dir()

    if src_dir != to:
        if fn in os.listdir(to):
            print(fn, " -> already exists!")
        else:
            # you would copy the file instead ...
            print(fn, " -> could be copied")

# print what's in basedir as well as in archdir (os.walk descends into subdirs)
for root,dirs,files in os.walk(basedir):
    print(root + ":", files, sep="\n")

for file in os.listdir(basedir):
    copy_from_to_if_not_exists(os.path.join(basedir,file),archdir)

If your hard drive's read-cache optimization is not good enough for your needs, you can cache the result of os.listdir(to), but it is probably fine as is.

Output:

c:/temp/csvs:
['file_10.csv','file_11.csv','file_12.csv','file_13.csv','file_14.csv','file_15.csv',
 'file_16.csv','file_17.csv','file_18.csv','file_19.csv','file_20.csv','file_21.csv',
 'file_22.csv','file_23.csv','file_24.csv','file_25.csv','file_26.csv','file_27.csv',
 'file_28.csv','file_29.csv','file_30.csv','file_31.csv','file_32.csv','file_33.csv',
 'file_34.csv','file_35.csv','file_36.csv','file_37.csv','file_38.csv','file_39.csv', 
 'file_40.csv','file_41.csv','file_42.csv','file_43.csv','file_44.csv','file_45.csv',
 'file_46.csv','file_47.csv','file_48.csv','file_49.csv','file_50.csv','file_51.csv', 
 'file_52.csv','file_53.csv','file_54.csv','file_55.csv','file_56.csv','file_57.csv',
 'file_58.csv','file_59.csv']

c:/temp/csvs\temp:
['file_10.csv','file_13.csv','file_16.csv','file_19.csv','file_22.csv','file_25.csv', 
 'file_28.csv','file_31.csv','file_34.csv','file_37.csv','file_40.csv','file_43.csv',
 'file_46.csv','file_49.csv','file_52.csv','file_55.csv','file_58.csv']

file_10.csv  -> already exists!
file_11.csv  -> could be copied
file_12.csv  -> could be copied
file_13.csv  -> already exists!
file_14.csv  -> could be copied
file_15.csv  -> could be copied
file_16.csv  -> already exists!
file_17.csv  -> could be copied
file_18.csv  -> could be copied
[...snipp...]
file_55.csv  -> already exists!
file_56.csv  -> could be copied
file_57.csv  -> could be copied
file_58.csv  -> already exists!
file_59.csv  -> could be copied
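
If you want to actually move the files with pathlib, as in the question, a minimal sketch could look like this. The helper name move_if_absent is made up for illustration, and the check-then-rename is only safe under the assumption stated at the top (no other process touches the directories in between):

```python
from pathlib import Path

def move_if_absent(src, target_dir):
    """Move src into target_dir unless a file with that name is already there.

    Returns True if the file was moved, False if it was skipped.
    """
    src = Path(src)
    target = Path(target_dir) / src.name
    if target.exists():
        # Same file name already present in the target directory: leave src alone
        print(src.name, " -> already exists!")
        return False
    # Path.rename moves the file; on Windows it raises FileExistsError
    # if the target exists, which the check above avoids
    src.rename(target)
    return True
```

Called as `move_if_absent(csv, dir_dict['archive_dir'])`, this would skip the duplicate instead of raising WinError 183.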

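See the documentation for functools.lru_cache for a way to cache function results. If IO reads become a bottleneck, consider putting os.listdir(archdir) into a function that caches its result (measure first, then optimize).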
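
A rough sketch of that caching idea, in case it turns out to be needed. The function names cached_names and already_exists are hypothetical; the important caveat is that the cached listing goes stale as soon as files are added to the directory, so it has to be cleared after each batch of moves:

```python
import os
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_names(directory):
    """Return the file names in directory; repeated calls reuse the first result."""
    return frozenset(os.listdir(directory))

def already_exists(file_name, directory):
    """Check a file name against the cached listing instead of hitting the disk."""
    return file_name in cached_names(directory)

# After copying or moving files into the directory, drop the stale listing:
# cached_names.cache_clear()
```

Without the cache_clear() call, a file moved into the archive during the current run would still be reported as missing.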

If statement condition met but not executing (Python)