Check whether a substring exists in a file in Python

Asked: 2017-11-16 06:55:38

Tags: python search substring

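I have 2 files. One is my "key file" and the other is the "lookup file". I am trying to check whether the lines from the key file are present in the lookup file. Here is my code snippet: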

lookupfile = open("riskeng_recon_e_mso_transact_db_msoinputapplication_t2.txt","r")

with open("1.txt","r") as my_file:
    for line in my_file:
        print "-------------checking for "+line+"-----------"
        for x in lookupfile:
            #print x
            if str(line) in str(x):
                print "Line present"+line           

My 2 files contain records in this format.

Lookupfile:

1234asfd
32453sdfvs
sfgagss234

keyfile:

123
3245
124

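My problem is that after the first record from the key file has been compared against the lookup file, the remaining key records are never checked: the inner loop over lookupfile finds no more lines.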

3 Answers:

Answer 0 (score: 1)

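As written, you exhaust the lookup iterator during the first iteration of the outer loop. The nested loops also have a time complexity of O(M*N*L) for M keys and N lookup lines, where L is the average length of a lookup line, which may be too much for two long files. You can instead build a sorted suffix array of the lookup strings and binary-search it for each key: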

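from bisect import bisect_left

with open("1.txt") as myfile, open('...') as lookup:
    # sorted array of every suffix of every lookup line (newlines stripped)
    l_u = sorted(l[i:] for l in map(str.strip, lookup) for i in range(len(l)))
    for line in myfile:
        key = line.strip()
        if not key:
            continue  # skip blank key lines
        # first suffix >= key; key is a substring iff that suffix starts with it
        pos = bisect_left(l_u, key)
        if pos < len(l_u) and l_u[pos].startswith(key):
            print('Line "{}" exists'.format(key))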

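The time complexity is now O(N*L*log(N*L) + M*log(N*L)). For large files with relatively short lines (L*log(N*L) much smaller than M and N), this should be considerably better than the O(M*N*L) nested loops.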
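
The first point of this answer, that a file object can only be iterated once, can be seen in a small sketch. It uses io.StringIO as an in-memory stand-in for the real lookup file (an assumption made so the example is self-contained); the sample rows are the ones from the question.

```python
import io

# In-memory stand-in for the lookup file (sample data from the question).
lookupfile = io.StringIO("1234asfd\n32453sdfvs\nsfgagss234\n")

# Pass 1: iterating a file object consumes it.
first_pass = [row.strip() for row in lookupfile]

# Pass 2: the iterator is already at end-of-file, so nothing is yielded.
# This is what happens to the inner loop for every key after the first.
second_pass = [row.strip() for row in lookupfile]

# Rewinding with seek(0) makes the file iterable again.
lookupfile.seek(0)
third_pass = [row.strip() for row in lookupfile]

print(first_pass)
print(second_pass)
print(third_pass)
```

Calling seek(0) before each inner loop would make the original code work, at the cost of re-reading the file once per key; reading the lookup lines into a list once avoids that.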

Answer 1 (score: 0)

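You can also do this while reading the files. Since a pairwise (line-by-line) comparison is not what you want, we need to store the lines of the lookup file in a list so that they can be reused for every key.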

with open("file1.txt", "w") as f:
    f.write("""\
1234asfd
32453sdfvs
sfgagss234""")

with open("file2.txt", "w") as f:
    f.write("""\
123
3245
124
324""")


with open("file1.txt") as f1, open("file2.txt") as f2:
    # Store lookupfile in list  
    lookup = f1.read().split("\n")

    # Loop lookupfile for every line in keyfile
    for idx, line in enumerate(f2,1):
        for idy, row in enumerate(lookup,1):
            # Look for match
            if line.strip() in row:
                print("line {} present on line {}".format(idx,idy))

Prints:

line 1 present on line 1
line 2 present on line 2
line 4 present on line 2
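
For reuse, the same nested scan could be wrapped in a function that returns the matches instead of printing them. This is only a sketch: find_matches is a hypothetical helper name, and it works on in-memory lists rather than open files. It also skips blank keys, because an empty string counts as a substring of every row.

```python
def find_matches(keys, lookup_rows):
    """Return (key_line_no, lookup_line_no) pairs, both 1-based."""
    matches = []
    for idx, key in enumerate(keys, 1):
        key = key.strip()
        if not key:  # "" is "in" every string, so ignore blank keys
            continue
        for idy, row in enumerate(lookup_rows, 1):
            if key in row:
                matches.append((idx, idy))
    return matches


# Same sample data as file1.txt (lookup) and file2.txt (keys) above.
lookup_rows = ["1234asfd", "32453sdfvs", "sfgagss234"]
keys = ["123\n", "3245\n", "124\n", "324"]

matches = find_matches(keys, lookup_rows)
for idx, idy in matches:
    print("line {} present on line {}".format(idx, idy))
```

With this sample data the three printed lines are the same as the output shown above.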

Answer 2 (score: -1)

Create two lists, listkf and listlookup, by reading the files:

listkf=[]
with open("keyfile", 'r') as kf:
    for line in kf:
        listkf.append(line.strip())    # adding key to list after stripping 

listlookup=[]
with open("Lookupfile", 'r') as lf:
    for line in lf:
        listlookup.append(line)

Assuming you need to match line by line:

# zip stops at the shorter list, so files of different lengths cannot raise IndexError
for key, row in zip(listkf, listlookup):
    if key in row:
        print("key exists")
    else:
        print("key does not exist")

If you want to look for each key from keyfile anywhere in Lookupfile:

for x in listkf:
    for y in listlookup:
        if x in str(y):
            print("key ", x, " exists")
            break
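
The loop-and-break pattern above could also be written with the built-in any(), which stops at the first matching row just like break does. The sketch below additionally collects the keys that were not found; split_keys is a hypothetical helper name, and the lists stand in for listkf and listlookup.

```python
def split_keys(keys, lookup_rows):
    """Partition keys into those found in some lookup row and those not found."""
    found, missing = [], []
    for key in keys:
        # any() short-circuits on the first row that contains the key
        if any(key in row for row in lookup_rows):
            found.append(key)
        else:
            missing.append(key)
    return found, missing


# Sample data from the question, already stripped of newlines.
listkf = ["123", "3245", "124"]
listlookup = ["1234asfd", "32453sdfvs", "sfgagss234"]

found, missing = split_keys(listkf, listlookup)
print("found:", found)
print("missing:", missing)
```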