Split text at the first match of any item in a list

Time: 2018-09-15 08:48:43

Tags: python list parsing

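I am looking for an elegant way to find the first match from a list of prepositions in a text, so that I can parse text like "add shoes behind the window" and get the result ["shoes", "behind the window"].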

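It works as long as the text does not contain more than one preposition:

my keys behind the window    before: my keys    after: behind the window

my keys under the table in the kitchen    before: my keys under the table    after: in the kitchen

my keys in the box under the table in the kitchen    before: my keys    after: in the box under the table in the kitchen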

In the second example, the result should be ["my keys", "under the table in the kitchen"].

What is an elegant way to find the first match of any of the words in the list?

def get_text_after_preposition_of_place(text):
    """Returns the texts before[0] and after[1] <preposition of place>"""

    prepositions_of_place = ["in front of","behind","in","on","under","near","next to","between","below","above","close to","beside"]
    textres = ["",""]

    # check each preposition in list order; only the first match is recorded
    for key in prepositions_of_place:
        if textres[0] == "":
            if key in text:
                textres[0] = text.split(key, 1)[0].strip()
                textres[1] = key + " " + text.split(key, 1)[1].strip()
    return textres
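A minimal, self-contained reproduction shows the two reasons the loop above misbehaves: it picks the first preposition in *list* order rather than the one that appears first in the text, and the substring test `key in text` also matches inside other words. The helper name `first_in_list_order` is hypothetical and only mirrors the lookup part of the question's function.

```python
# Hypothetical helper reproducing the question's lookup: it scans the
# prepositions in list order and uses a plain substring test.
prepositions_of_place = ["in front of", "behind", "in", "on", "under", "near",
                         "next to", "between", "below", "above", "close to", "beside"]


def first_in_list_order(text):
    """Return the first preposition, by position in the list, found anywhere in text."""
    for key in prepositions_of_place:
        if key in text:  # substring test: no notion of word boundaries
            return key
    return None


# "under" appears first in the sentence, but "in" comes first in the list.
print(first_in_list_order("my keys under the table in the kitchen"))  # in

# The substring test also matches "in" inside "window".
print("in" in "the window")  # True
```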

1 answer:

Answer 0 (score: 3)

You can use re.split:

import re

def get_text_after_preposition_of_place(text):
    """Returns the texts before[0] and after[1] <preposition of place>"""

    prepositions_of_place = ["in front of","behind","in","on","under","near","next to","between","below","above","close to","beside"]
    # builds \b(in front of|behind|in|...)\b
    preps_re = re.compile(r'\b(' + '|'.join(prepositions_of_place) + r')\b')

    split = preps_re.split(text, maxsplit=1)
    if len(split) < 3:  # no preposition found, avoid an IndexError
        return text, ""
    return split[0], split[1]+split[2]

print(get_text_after_preposition_of_place('The cat in the box on the table'))
# ('The cat ', 'in the box on the table')
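Applied to the sentences from the question, the same `re.split` approach gives the split the question asks for. This sketch additionally strips surrounding whitespace so the result has the `["my keys", "under the table in the kitchen"]` shape; the function name `split_at_first_preposition` and the constant names are hypothetical.

```python
import re

PREPOSITIONS_OF_PLACE = ["in front of", "behind", "in", "on", "under", "near",
                         "next to", "between", "below", "above", "close to", "beside"]

# Same pattern as in the answer: word boundaries plus one capturing group.
PREPS_RE = re.compile(r'\b(' + '|'.join(PREPOSITIONS_OF_PLACE) + r')\b')


def split_at_first_preposition(text):
    """Return [before, preposition + rest], split at the earliest preposition in text."""
    parts = PREPS_RE.split(text, maxsplit=1)
    if len(parts) < 3:  # no preposition of place in the text
        return [text.strip(), ""]
    before, prep, rest = parts
    return [before.strip(), (prep + rest).strip()]


for sentence in ["my keys behind the window",
                 "my keys under the table in the kitchen",
                 "my keys in the box under the table in the kitchen"]:
    print(split_at_first_preposition(sentence))
# ['my keys', 'behind the window']
# ['my keys', 'under the table in the kitchen']
# ['my keys', 'in the box under the table in the kitchen']
```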

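First, we build a regular expression that looks like \b(in|on|under)\b. The \b word boundaries stop a short preposition such as "in" from matching inside words like "window" or "kitchen". Note the parentheses: they form a capturing group, and re.split includes the text matched by a capturing group in its result, so the preposition we split on is kept in the output.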
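The effect of the capturing group and of the word boundaries can be checked directly on a small example; the three-word list here is only for illustration.

```python
import re

words = ["in", "on", "under"]
pattern = r'\b(' + '|'.join(words) + r')\b'
print(pattern)  # \b(in|on|under)\b

text = "cat in box on table"

# With a capturing group, the matched separator is kept in the result.
print(re.split(pattern, text, maxsplit=1))  # ['cat ', 'in', ' box on table']

# With a non-capturing group (?:...), the separator is dropped.
print(re.split(r'\b(?:in|on|under)\b', text, maxsplit=1))  # ['cat ', ' box on table']

# Word boundaries keep "in" from matching inside "window".
print(re.search(pattern, "the window"))  # None
```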

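Then we split with maxsplit=1, so the text is split at most once, at the earliest preposition, and join the last two parts: the preposition and the rest of the string.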
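One detail worth knowing: at the first position where any alternative can match, the regex engine tries the alternatives from left to right, so a multi-word preposition such as "in front of" has to come before "in" in the list, as it does in the answer. Sorting the list longest-first is one way to make this independent of the list order; the sketch below does that and also wraps each entry in re.escape, which is not needed for these plain words but is a safe habit if the list ever contains regex metacharacters.

```python
import re

text = "shoes in front of the door"

# "in" listed first: the engine accepts the shorter alternative.
print(re.split(r'\b(in|in front of)\b', text, maxsplit=1))
# ['shoes ', 'in', ' front of the door']

# "in front of" listed first: the longer phrase wins.
print(re.split(r'\b(in front of|in)\b', text, maxsplit=1))
# ['shoes ', 'in front of', ' the door']

# Sorting longest-first makes the result independent of the list order.
preps = ["in", "on", "in front of", "next to"]
ordered = sorted(preps, key=len, reverse=True)
pattern = r'\b(' + '|'.join(map(re.escape, ordered)) + r')\b'
print(re.split(pattern, text, maxsplit=1))
# ['shoes ', 'in front of', ' the door']
```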