Comparing part of each string in a list

Time: 2019-09-20 15:42:00

Tags: python list

I have a list of strings:

fileList = ['YMML.2019.09.10-Run.1-Final.pdf',
            'YMML.2019.09.10-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.1-Final.pdf',
            'YMML.2019.09.12-Run.2-Initial.pdf',
            'YMML.2019.09.13-Run.2-Initial.pdf',
            'YMML.2019.09.12-Run.1-Final.pdf',
            'YMML.2019.09.13-Run.1-Final.pdf',
            'YMML.2019.09.14-Run.1-Final.pdf',]

I want to confirm that every date has both a Run.1-Final and a Run.2-Initial file.

I tried something like this:

for i in range(len(directoryList)):
    if directoryList[i][5:15] != directoryList[i + 1][5:15]:
        print(directoryList[i] + ' is missing.')
    i += 2

I would like the output to be

'YMML.2019.09.14-Run.2-Initial.pdf is missing'

Or perhaps something like

dates = [directoryList[i][5:15] for i in range(len(directoryList))]
counter = collections.Counter(dates)

but I'm having trouble extracting the results from the dictionary.

4 answers:

Answer 0: (score: 1)

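Here is an O(n) solution that groups the items by date in a defaultdict, keeps the dates with fewer than two files, and recovers the original name from each remaining value: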

from collections import defaultdict

files = [
    'YMML.2019.09.10-Run.1-Final.pdf',
    'YMML.2019.09.10-Run.2-Initial.pdf',
    'YMML.2019.09.11-Run.2-Initial.pdf',
    'YMML.2019.09.11-Run.1-Final.pdf',
    'YMML.2019.09.12-Run.2-Initial.pdf',
    'YMML.2019.09.13-Run.2-Initial.pdf',
    'YMML.2019.09.12-Run.1-Final.pdf',
    'YMML.2019.09.13-Run.1-Final.pdf',
    'YMML.2019.09.14-Run.1-Final.pdf',
]

seen = defaultdict(list)

for x in files:
    seen[x[5:15]].append(x)

missing = [v[0] for k, v in seen.items() if len(v) < 2]
print(missing) # => ['YMML.2019.09.14-Run.1-Final.pdf']

The name of the missing partner file can then be built with a conditional expression:

names = [
    x[:20] + "2-Initial.pdf" if x[20] == "1" else
    x[:20] + "1-Final.pdf" for x in missing
]
print(names) # => ['YMML.2019.09.14-Run.2-Initial.pdf']
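
The Counter idea from the question can be finished along the same lines. What follows is only a minimal sketch, assuming every date should have exactly two files; the name incomplete_dates is introduced here for illustration. It shows which dates are short, but not which of the two files is absent, so the conditional above is still needed for that.

```python
from collections import Counter

files = [
    'YMML.2019.09.10-Run.1-Final.pdf',
    'YMML.2019.09.10-Run.2-Initial.pdf',
    'YMML.2019.09.11-Run.2-Initial.pdf',
    'YMML.2019.09.11-Run.1-Final.pdf',
    'YMML.2019.09.12-Run.2-Initial.pdf',
    'YMML.2019.09.13-Run.2-Initial.pdf',
    'YMML.2019.09.12-Run.1-Final.pdf',
    'YMML.2019.09.13-Run.1-Final.pdf',
    'YMML.2019.09.14-Run.1-Final.pdf',
]

# Count how many files exist for each date (characters 5 through 14 of the name).
counts = Counter(name[5:15] for name in files)

# A Counter is a dict subclass, so .items() yields (date, count) pairs.
# Assumption: a complete date has exactly two files.
incomplete_dates = [date for date, count in counts.items() if count < 2]
print(incomplete_dates)  # => ['2019.09.14']
```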

Answer 1: (score: 1)

To make it more readable, you can first build a collection of the dates and then loop over those dates.

file_list = ['YMML.2019.09.10-Run.1-Final.pdf',
            'YMML.2019.09.10-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.1-Final.pdf',
            'YMML.2019.09.12-Run.2-Initial.pdf',
            'YMML.2019.09.13-Run.2-Initial.pdf',
            'YMML.2019.09.12-Run.1-Final.pdf',
            'YMML.2019.09.13-Run.1-Final.pdf',
            'YMML.2019.09.14-Run.1-Final.pdf',]

dates = set([item[5:15] for item in file_list])

for date in dates:
    if 'YMML.' + date + '-Run.1-Final.pdf' not in file_list:
        print('YMML.' + date + '-Run.1-Final.pdf is missing')
    if 'YMML.' + date + '-Run.2-Initial.pdf' not in file_list:
        print('YMML.' + date + '-Run.2-Initial.pdf is missing')

set() keeps only the unique values from the list, so each date is checked once rather than twice.
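
One thing to keep in mind is that a set has no guaranteed iteration order, so the messages may come out in a different order from run to run. The sketch below is one possible variant, not part of the original answer: it sorts the dates for stable output and also stores the file names in a set so each membership test is O(1) on average. The function name find_missing is made up for this example.

```python
def find_missing(file_list):
    """Return the expected file names that are absent, in date order."""
    present = set(file_list)  # set membership tests are O(1) on average
    suffixes = ('-Run.1-Final.pdf', '-Run.2-Initial.pdf')
    missing = []
    # sorted() makes the output order deterministic; plain set iteration is not.
    for date in sorted({name[5:15] for name in file_list}):
        for suffix in suffixes:
            expected = 'YMML.' + date + suffix
            if expected not in present:
                missing.append(expected)
    return missing


file_list = ['YMML.2019.09.10-Run.1-Final.pdf',
             'YMML.2019.09.10-Run.2-Initial.pdf',
             'YMML.2019.09.11-Run.2-Initial.pdf',
             'YMML.2019.09.11-Run.1-Final.pdf',
             'YMML.2019.09.12-Run.2-Initial.pdf',
             'YMML.2019.09.13-Run.2-Initial.pdf',
             'YMML.2019.09.12-Run.1-Final.pdf',
             'YMML.2019.09.13-Run.1-Final.pdf',
             'YMML.2019.09.14-Run.1-Final.pdf']

for name in find_missing(file_list):
    print(name + ' is missing')  # prints: YMML.2019.09.14-Run.2-Initial.pdf is missing
```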

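Answer 2: (score: 1)

I'm late to this, but here is the simplest approach I found, though perhaps not the most efficient. Note that it prints the file that has no partner, not the name of the missing file: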

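for file in fileList:
    # A Final file whose matching Initial file for the same date is absent.
    if file[20:27] == "1-Final":
        if (file[0:20] + "2-Initial.pdf") not in fileList:
            print(file)
    # An Initial file whose matching Final file is absent.
    # Strings must be compared with ==, not `is`, and the suffix starts at index 20.
    elif file[20:] == "2-Initial.pdf":
        if (file[0:20] + "1-Final.pdf") not in fileList:
            print(file)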

Answer 3: (score: 0)

This works:

fileList = ['YMML.2019.09.10-Run.1-Final.pdf',
            'YMML.2019.09.10-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.1-Final.pdf',
            'YMML.2019.09.12-Run.2-Initial.pdf',
            'YMML.2019.09.13-Run.2-Initial.pdf',
            'YMML.2019.09.12-Run.1-Final.pdf',
            'YMML.2019.09.13-Run.1-Final.pdf',
            'YMML.2019.09.14-Run.1-Final.pdf',]

initial_set = {filename[:15] for filename in fileList if 'Initial' in filename}
final_set = {filename[:15] for filename in fileList if 'Final' in filename}

for filename in final_set - initial_set:
    print(filename + '-Run.2-Initial.pdf is missing.')
for filename in initial_set - final_set:
    print(filename + '-Run.1-Final.pdf is missing.')
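
All of the approaches in this thread rely on fixed character offsets such as [5:15] or [:15], which only hold while the prefix stays four characters long. If that cannot be guaranteed, the same set-difference idea could be combined with a regular expression. This is only a sketch: the pattern, the REQUIRED set, and the function name missing_runs are assumptions made for illustration, based on the file names shown in the question.

```python
import re

# Assumed name layout: '<prefix>.<YYYY.MM.DD>-Run.<n>-<label>.pdf'
NAME_RE = re.compile(r'^(?P<stem>.+\.\d{4}\.\d{2}\.\d{2})-(?P<run>Run\.\d+-\w+)\.pdf$')

# Assumption: every date needs exactly these two runs.
REQUIRED = {'Run.1-Final', 'Run.2-Initial'}


def missing_runs(names):
    """Return the sorted names of the required files that are absent."""
    runs_by_stem = {}
    for name in names:
        match = NAME_RE.match(name)
        if match is None:
            continue  # ignore names that do not follow the assumed layout
        runs_by_stem.setdefault(match['stem'], set()).add(match['run'])
    # For each stem, the set difference gives the required runs not seen.
    return sorted(
        f'{stem}-{run}.pdf'
        for stem, runs in runs_by_stem.items()
        for run in REQUIRED - runs
    )


fileList = ['YMML.2019.09.10-Run.1-Final.pdf',
            'YMML.2019.09.10-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.1-Final.pdf',
            'YMML.2019.09.12-Run.2-Initial.pdf',
            'YMML.2019.09.13-Run.2-Initial.pdf',
            'YMML.2019.09.12-Run.1-Final.pdf',
            'YMML.2019.09.13-Run.1-Final.pdf',
            'YMML.2019.09.14-Run.1-Final.pdf']

for name in missing_runs(fileList):
    print(name + ' is missing.')  # prints: YMML.2019.09.14-Run.2-Initial.pdf is missing.
```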