How to split a string into repeated substrings

Date: 2016-07-27 08:39:34

Tags: python

I have some strings, each of which consists of one or more copies of some substring. For example:

L = "hellohellohello"
M = "good"
N = "wherewhere"
O = "antant"

I want to split these strings into a list so that each element contains only the repeated part. For example:

splitstring(L) ---> ["hello", "hello", "hello"]
splitstring(M) ---> ["good"]
splitstring(N) ---> ["where", "where"]
splitstring(O) ---> ["ant", "ant"]

Since the strings are about 1000 characters long, it would also be great if this were reasonably fast.
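
Since speed matters, one well-known trick (not taken from any of the answers below) may be worth sketching: if `s` consists of copies of a unit `u`, the first index after 0 at which `s` occurs inside `s + s` is `len(u)`, and it is `len(s)` when `s` does not repeat at all. The helper name `splitstring_fast` is hypothetical; the sketch relies only on the built-in `str.find`.

```python
def splitstring_fast(s):
    """Split s into copies of its shortest repeating unit (hypothetical helper name)."""
    if not s:
        return []
    # The first match of s inside s + s, searching from index 1, sits at the
    # length of the shortest repeating unit; it is len(s) when s does not repeat.
    period = (s + s).find(s, 1)
    return [s[:period]] * (len(s) // period)


if __name__ == '__main__':
    for text in ("hellohellohello", "good", "wherewhere", "antant"):
        print(splitstring_fast(text))
```

This avoids any Python-level loop, so the work happens inside the C implementation of `str.find`.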

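Note that in my case the repetitions always start at the beginning of the string and there are no gaps between them, so this is much simpler than the general problem of finding the largest repetition in a string.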
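
Because every repetition starts at index 0 and there are no gaps, the repeating unit must be a prefix of the string whose length divides `len(s)`. A minimal brute-force sketch of that observation (the name `splitstring_divisors` is hypothetical) tries each such prefix from shortest to longest:

```python
def splitstring_divisors(s):
    """Return s split into copies of its shortest repeating prefix."""
    n = len(s)
    for size in range(1, n + 1):
        # Only prefixes whose length divides n can tile the whole string.
        if n % size:
            continue
        unit = s[:size]
        if unit * (n // size) == s:
            return [unit] * (n // size)
    # Only reached for the empty string.
    return []
```

The full string comparison is only made for lengths that divide `len(s)`, so for strings of about 1000 characters the amount of work stays small.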

How can this be done?

5 answers:

Answer 0 (score: 4)

Use a regular expression to find the repeated word, then simply build a list of the appropriate length:

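import re

def splitstring(string):
    # the lazy group captures the shortest prefix that the backreference can repeat up to the end
    match = re.match(r'(.*?)(?:\1)*$', string)
    word = match.group(1)
    return [word] * (len(string) // len(word))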
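
A variant of the same regular-expression idea, offered as a sketch rather than as the answerer's code: `re.fullmatch` (Python 3.4+) anchors both ends, so a trailing newline cannot slip past `$`, and `(.+?)` rules out the empty group, which would otherwise cause a division by zero for an empty input. `re.DOTALL` lets `.` match newlines as well.

```python
import re

# (.+?) lazily captures the shortest candidate unit (at least one character);
# \1* then requires the rest of the string to consist of copies of that unit.
UNIT_RE = re.compile(r'(.+?)\1*', re.DOTALL)

def splitstring(string):
    match = UNIT_RE.fullmatch(string)
    if match is None:
        # Only the empty string fails to match, since (.+?) needs one character.
        return []
    word = match.group(1)
    return [word] * (len(string) // len(word))

print(splitstring("wherewhere"))
```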

Answer 1 (score: 1)

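Try this. Rather than cutting up your string, it focuses on finding the shortest pattern and then builds a new list by repeating that pattern the appropriate number of times.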

def splitstring(s):
    if not s:
        return []
    # search for the number of characters to split on by growing a candidate prefix
    proposed_pattern = s[0]
    for c in s[1:]:
        n = len(proposed_pattern)
        # accept the candidate only if repeating it rebuilds the whole string
        if len(s) % n == 0 and proposed_pattern * (len(s) // n) == s:
            break
        proposed_pattern += c
    # if no shorter pattern was found, proposed_pattern is now the whole string
    # generating the list
    n = len(proposed_pattern)
    return [proposed_pattern] * (len(s) // n)


if __name__ == '__main__':
    L = 'hellohellohellohello'
    print(splitstring(L))  # prints ['hello', 'hello', 'hello', 'hello']

Answer 2 (score: 0)

The approach I would use:

Here it is, applied to each of the corresponding variables:

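import re

L = "hellohellohello"
M = "good"
N = "wherewhere"

def splitstring(s):
    for i in range(1, len(s) + 1):
        prefix = s[:i]
        # count non-overlapping occurrences; re.escape makes the prefix match literally
        cnt = len(re.findall(re.escape(prefix), s))
        # the prefix is the repeating unit once its occurrences cover the whole string
        if cnt * i == len(s):
            return [prefix] * cnt
    return []

for s in (L, M, N):
    print(splitstring(s))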

Answer 3 (score: 0)

Assuming the repeated word is longer than one character, this will work:

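a = "hellohellohello"

def splitstring(string):
    n = len(string)
    for number in range(1, n):
        # the prefix is the repeating unit only if repeating it rebuilds the whole string
        if n % number == 0 and string[:number] * (n // number) == string:
            return [string[:number]] * (n // number)
    # in case there is no repetition
    return [string]

print(splitstring(a))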
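
The prefix-comparison idea in this answer can be pushed to a guaranteed linear-time check with the prefix function from the Knuth-Morris-Pratt algorithm (a swapped-in technique, not part of the answer). If `pi[-1]` is the length of the longest proper prefix of `s` that is also a suffix, then `p = len(s) - pi[-1]` is the shortest period, and `s` is an exact repetition of `s[:p]` precisely when `p` divides `len(s)`. The names `prefix_function` and `splitstring_kmp` are hypothetical.

```python
def prefix_function(s):
    """pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        # Fall back through shorter borders until the next character extends one.
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def splitstring_kmp(s):
    if not s:
        return []
    n = len(s)
    period = n - prefix_function(s)[-1]
    # s is a whole number of copies of s[:period] only if period divides n;
    # otherwise s has no shorter repeating unit.
    if n % period != 0:
        period = n
    return [s[:period]] * (n // period)


print(splitstring_kmp("hellohellohello"))
```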

Answer 4 (score: 0)
