How can I pull out some words around a specific word in Python?

Time: 2017-08-10 13:57:34

Tags: python

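I need a simple way to pull out the 10 words before and the 10 words after a search term in a paragraph, and extract all of them as a single sentence.

Example: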

  

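paragraph = 'The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes.'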

Input

  
    

  

Output

  
    

most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to

  

3 answers:

Answer 0: (score: 4)

paragraph = 'The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes.'
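word = "wolf"
wordlist = paragraph.split(" ")

# Position of the first token that is exactly "wolf" ("wolf-like" is a different token).
# list.index raises ValueError if the word is not in the list.
index = wordlist.index(word)
# Clamp the start at 0: a negative start would count from the end of the list.
first_part = wordlist[max(0, index - 10):index]
second_part = wordlist[index:index + 11]  # the word itself plus the 10 words after it
print("%s %s" % (" ".join(first_part), " ".join(second_part)))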

Output:

most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to
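The same split-and-slice idea can be wrapped in a function that also copes with a missing word and with punctuation attached to a token. This is only a sketch: the name `words_around` and its parameters are made up for illustration and are not part of the answer above.

```python
import string

def words_around(text, target, n=10):
    """Return up to n words on each side of the first token equal to target."""
    tokens = text.split()
    for i, token in enumerate(tokens):
        # Compare with surrounding punctuation stripped, so "wolf," matches "wolf".
        # "wolf-like" still does not match, because the hyphen is inside the token.
        if token.strip(string.punctuation).lower() == target.lower():
            start = max(0, i - n)  # clamp so a negative index never wraps around
            return " ".join(tokens[start:i + n + 1])
    return None  # target not present as a standalone token

sample = "The dog and the extant gray wolf are sister taxa, with modern wolves"
print(words_around(sample, "wolf", n=3))
```

Returning `None` instead of raising leaves it to the caller to decide what a missing search term should mean.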

Answer 1: (score: 2)

Here is a regular expression that can help you extract the text you need:

(?:[^ ]+ ){0,10}wolf(?: [^ ]+){0,10}

Here is a Python example as well, although I can't test it right now:

import re

t = "The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes"

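# Raw string so that \s reaches the regex engine unchanged.
# Requiring whitespace right after "wolf" skips "wolf-like" and "wolves".
m = re.search(r"(?:[^ ]+ ){0,10}wolf\s(?:[^ ]+ ){0,10}", t)

if m:
    print(m.group(0))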
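One thing to watch with the bare pattern is that `wolf` also matches inside `wolf-like`, which comes first in this paragraph. A possible variant, sketched here with a hypothetical helper named `regex_context`, uses lookarounds to require whitespace or a string edge on both sides of the search term and escapes the term with `re.escape`. As written it will not match a term that is directly followed by punctuation, such as `wolf.`.

```python
import re

def regex_context(text, target, n=10):
    # (?<!\S) and (?!\S) require whitespace or a string edge on both sides,
    # so "wolf" does not match inside "wolf-like" or "wolves".
    pattern = r"(?:\S+\s+){0,%d}(?<!\S)%s(?!\S)(?:\s+\S+){0,%d}" % (
        n, re.escape(target), n)
    match = re.search(pattern, text)
    return match.group(0) if match else None

sample = "part of the wolf-like canids, and the gray wolf are sister taxa"
print(regex_context(sample, "wolf", n=2))
```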

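Answer 2: (score: 0)

You could find the position of the target word and then take substrings on either side of it. Have you tried writing any code so far?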
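A rough sketch of that substring idea, assuming `str.find` is used to locate the term (the function name `substring_context` is invented for this example). Note that `find` matches the first occurrence of the characters anywhere, including inside a longer word such as "wolf-like", so it suits search terms that are known to stand alone.

```python
def substring_context(text, target, n=10):
    pos = text.find(target)  # character offset of the first hit, or -1 if absent
    if pos == -1:
        return None
    before = text[:pos].split()
    after = text[pos + len(target):].split()
    # Keep the last n words before the hit and the first n words after it.
    before = before[max(0, len(before) - n):]
    return " ".join(before + [target] + after[:n])

sample = "the extant gray wolf are sister taxa"
print(substring_context(sample, "wolf", n=2))
```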