Question

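Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() that returns all found indexes (not only the first from the beginning or the first from the end).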

For example:

string = "test test test test"

print(string.find('test')) # 0
print(string.rfind('test')) # 15

# this is the goal
print(string.find_all('test')) # [0,5,10,15]

Answer 1

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

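re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which will be more efficient if you're only iterating through the results once.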
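One caveat with the regex approach: the search text is interpreted as a pattern, so a substring containing characters such as . or ( needs escaping first. A minimal sketch, assuming a helper named find_all_regex (a made-up name for illustration, not part of the re module) that wraps re.escape and optionally applies the lookahead trick for overlapping matches:

```python
import re

def find_all_regex(text, sub, overlapping=False):
    """Return an iterator over the start indices of sub in text.

    find_all_regex is a hypothetical helper name, not a standard function.
    """
    if not sub:
        raise ValueError("sub must be a non-empty string")
    pattern = re.escape(sub)  # treat sub as literal text, not as a regex
    if overlapping:
        pattern = "(?=%s)" % pattern  # zero-width lookahead, so matches may overlap
    return (m.start() for m in re.finditer(pattern, text))

print(list(find_all_regex('test test test test', 'test')))  # [0, 5, 10, 15]
print(list(find_all_regex('ttt', 'tt', overlapping=True)))  # [0, 1]
print(list(find_all_regex('a.b a.b axb', 'a.b')))           # [0, 4]
```

Without re.escape, the last call would also report index 8, because an unescaped . matches any character.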

Answer 2

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

So, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
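One edge case worth knowing about: with an empty sub, str.find returns the start position itself, so stepping by len(sub) == 0 never advances and the generator yields 0 forever. A possible variant, assuming a hypothetical name find_all_checked and an overlapping flag that are my own additions rather than part of the answer:

```python
def find_all_checked(a_str, sub, overlapping=False):
    """Generate start indices of sub in a_str (hypothetical variant of find_all)."""
    if not sub:
        # str.find('', start) returns start itself, so a step of len(sub) == 0
        # would never advance; this sketch simply refuses an empty substring.
        # Because this is a generator, the error surfaces on first iteration.
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all_checked('spam spam spam spam', 'spam')))  # [0, 5, 10, 15]
print(list(find_all_checked('ttt', 'tt', overlapping=True)))  # [0, 1]
```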

Answer 3

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

Answer 4

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

But it won't work for:

In [1]: aString="ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
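If the overlapping spans are needed as well, one option (a sketch, not something from the answer itself) is to wrap the pattern in a lookahead with a capturing group, so each match is zero-width but group 1 still records where the text starts and ends:

```python
import re

a_string = "ababa"
# (?=(aba)) matches the empty string wherever "aba" begins;
# group 1 captures the actual text, so its span gives start and end.
spans = [(m.start(1), m.end(1)) for m in re.finditer(r"(?=(aba))", a_string)]
print(spans)  # [(0, 3), (2, 5)]
```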

Answer 5

Again, old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

Answer 6

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.

Answer 7

If you're just looking for a single character, this would work:

from functools import reduce  # reduce is no longer a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant.
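Note that both snippets above produce a count rather than positions. If positions are what you're after, the split() idea can be stretched a bit further. A rough sketch, assuming a helper called positions_from_split (my own name, not a built-in):

```python
from itertools import accumulate

def positions_from_split(string, match):
    """Recover non-overlapping match positions from the pieces split() returns."""
    pieces = string.split(match)[:-1]  # the last piece has no match after it
    # Running total of "piece + match" lengths gives the end of each match.
    ends = accumulate(len(piece) + len(match) for piece in pieces)
    return [end - len(match) for end in ends]

print(positions_from_split("test test test test", "test"))  # [0, 5, 10, 15]
print(positions_from_split("dooobiedoobiedoobie", "o"))     # [1, 2, 3, 8, 9, 14, 15]
```

This still builds all the temporary pieces, so it is unlikely to be faster than a str.find loop.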

Answer 8

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

Answer 9

This thread is a little old, but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

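marker = 0
while marker < len(numberString):
    try:
        # look the position up once and use testString instead of a repeated literal
        position = numberString.index(testString, marker)
        print(position)
        marker = position + 1
    except ValueError:
        # index() raises ValueError once there are no further matches
        print("String not found")
        marker = len(numberString)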

Answer 10

This works for me using re.finditer

import re

text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

#  find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))

Answer 11

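The solutions provided by others are all based on the available find() method or other available methods.

What is the core basic algorithm for finding all occurrences of a substring in a string?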

def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes

You can also inherit from the str class in a new class and use the function as a method, as below.

class newstr(str):
    def find_all(string, substring):  # string plays the role of self here
        """
        Function: Returning all the index of substring in a string
        Arguments: String and the search string
        Return:Returning a list
        """
        length = len(substring)
        c=0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c=c+1
        return indexes

Calling the method

newstr.find_all('Do you find this answer helpful? then upvote this!', 'this')

Answer 12

This function does not look at every position in the string, so it does not waste compute resources. My try:

def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions

To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')

Answer 13

If you only want to use numpy, here is a solution

import numpy as np

S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)

Answer 14

When looking for a large number of keywords in a document, use flashtext

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regex on large sets of search words.
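For comparison, the regex side of that claim would typically be a single alternation over all keywords. A stdlib-only sketch of that baseline; the \b word boundaries and the (word, start, end) tuple layout are assumptions made for this illustration, not something taken from flashtext:

```python
import re

words = ['test', 'exam', 'quiz']
txt = 'this is a test'

# One alternation over all escaped keywords; \b restricts matches to whole words
# (an assumption of this sketch).
pattern = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, words)))
result = [(m.group(), m.start(), m.end()) for m in pattern.finditer(txt)]
print(result)  # [('test', 10, 14)]
```

The alternation grows with the keyword list, which is the situation where the answer suggests flashtext pulls ahead.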

Answer 15

You can easily use:

string.count('test')!

https://www.programiz.com/python-programming/methods/string/count

Cheers!
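Keep in mind that str.count returns how many non-overlapping occurrences there are, not where they are. A small sketch of the difference, using a startswith scan as the overlapping comparison:

```python
text = "ttt"

non_overlapping = text.count("tt")  # count() skips past each match it finds
overlapping = sum(1 for i in range(len(text)) if text.startswith("tt", i))

print(non_overlapping, overlapping)  # 1 2
```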

Answer 16

This is a solution to a similar question from hackerrank. I hope this helps you.

import re
a = input()
b = input()
if b not in a:
    print((-1,-1))
else:
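    # collect the start index of every (possibly overlapping) match;
    # re.escape keeps special characters in b from being read as regex syntax
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]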
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))

Output:

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)

Answer 17

src = input() # we will find substring in this string
sub = input() # substring

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)

Answer 18

def find_index(string, let):
    enumerated = [place  for place, letter in enumerate(string) if letter == let]
    return enumerated

For example:

find_index("hey doode find d", "d")

returns:

[4, 7, 13, 15]

Answer 19

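Not exactly what OP asked, but you could also use the split function to get a list of everything where the substring does not occur. OP didn't specify the end goal of the code, but if your goal is to remove the substrings anyway, this could be a simple one-liner. There are probably more efficient ways to do this for larger strings; regular expressions would be preferable in that case.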

# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']

# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'

I only skimmed the other answers, so apologies if this is already up there.

Answer 20

def count_substring(string, sub_string):
    c=0
    for i in range(len(string) - len(sub_string) + 1):  # every start where sub_string still fits
        if string[i:i+len(sub_string)] == sub_string:
            c+=1
    return c

if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    
    count = count_substring(string, sub_string)
    print(count)

Answer 21

The pythonic way would be:

mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s represents the search character (c[x] == s compares one character at a time)
# c represents the string being searched

find_all(mystring,'o')    # will return all positions of 'o'

[4, 7, 20, 26]

Answer 22

By slicing, we find all possible combinations, append them to a list, and find the number of occurrences using the count function.
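The answer's own code is not shown here, so the following is only a guess at what the slicing-and-count idea could look like. The name count_by_slicing is made up for this sketch:

```python
def count_by_slicing(text, sub):
    """Count occurrences of sub by listing every slice of text (quadratic memory)."""
    slices = [text[i:j]
              for i in range(len(text))
              for j in range(i + 1, len(text) + 1)]
    return slices.count(sub)  # overlapping occurrences are counted too

print(count_by_slicing('test test test test', 'test'))  # 4
print(count_by_slicing('ttt', 'tt'))                    # 2
```

Like str.count, this gives a number rather than the indexes, and it builds every possible slice along the way, so it only makes sense for short strings.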

Answer 23

I had the same problem and did this:

hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []

while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break

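I'm pretty new at coding, so you can probably simplify it (and if you plan to use it repeatedly, of course make it a function).

All in all, it works as intended for what I was doing.

Edit: Please consider that this is for single characters only, and it will change your variable, so you have to create a copy of the string in a new variable to save it. I didn't put that in the code because it's easy and the code is only meant to show how I made it work.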

Answer 24

Please look at the code below

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))

How to find all occurrences of a substring?

24 answers:
