Finding noun-verb combinations in a sentence using POS tags

Time: 2017-10-26 06:57:36

Tags: python nlp feature-extraction pos-tagger

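I want to extract some features from text using POS tagging. My goal is to retrieve the noun-verb combinations as a list. For POS tagging I used spaCy. Right now my code looks like this: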

import spacy

# spaCy 1.x offered `from spacy.de import German`; current versions need a
# trained German pipeline to get POS tags (assumed here: de_core_news_sm)
nlp = spacy.load("de_core_news_sm")
Verb = ["VERB"]
NN = ["NOUN"]

sentence = [["Du musst folgendes tun: Scheibe schließen, Tuer oeffnen, Fenster"], ["Das ist deine Loesung: Sitz zurückstellen"]]

texts = somePreprocessing(sentence)  # tokenization, stopword removal (helper not shown)

list2 = []
pairlist = []

for text in texts:
    docs = []  # was `list`, which shadowed the built-in and was never initialized
    for s in text:
        st = nlp(s)  # Python 3 str is already Unicode, so unicode() is not needed
        docs.append(st)
        # collect the verbs and nouns of this piece of text
        verblist = [word for word in st if word.pos_ in Verb]
        nounlist = [word for word in st if word.pos_ in NN]
        if verblist and nounlist:
            pairlist.append((verblist, nounlist))
    list2.append(docs)

print(pairlist)

The output should look like this: [["Scheibe", "schließen", "Tuer", "oeffnen", "Fenster", "anheben"], ["Sitz", "zurückstellen"]]

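To sum up:
Given a list of sentences such as [["This is an example sentence"], ["This is another example sentence"]], my goal is to retrieve, based on the POS tags, a list like [["Noun", "Verb", "Noun", "Verb", "Noun", "Verb"], ["Noun", "Verb"]].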

listOfSentences = [[".."], [".."]]
pos = posTagger(listOfSentences)
result = matchingNounVerb(pos)
print(result)
=> [["Noun", "Verb", "...", "...", "...", "..."], ["Noun", "Verb"]]
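Reading the expected output above, the goal seems to be every noun that is immediately followed by a verb (which would explain why "Loesung" and "tun" do not appear). A minimal sketch of `matchingNounVerb` under that assumption is below. It works on already-tagged `(word, tag)` pairs rather than calling spaCy directly, and the tags in the sample data are hand-assigned Universal POS tags for illustration, not actual spaCy output.

```python
from typing import List, Tuple

# One tagged sentence as (token text, coarse POS tag) pairs, e.g. what
# [(t.text, t.pos_) for t in doc] would give for a spaCy Doc.
TaggedSentence = List[Tuple[str, str]]


def matchingNounVerb(pos: List[TaggedSentence]) -> List[List[str]]:
    """Per sentence, return the words of every NOUN directly followed by a VERB."""
    result = []
    for sentence in pos:
        pairs = []
        # compare each token with its right-hand neighbour
        for (word, tag), (next_word, next_tag) in zip(sentence, sentence[1:]):
            if tag == "NOUN" and next_tag == "VERB":
                pairs.extend([word, next_word])
        result.append(pairs)
    return result


# Hand-assigned tags (assumption); a real tagger may label some tokens differently.
tagged = [
    [("Du", "PRON"), ("musst", "AUX"), ("folgendes", "PRON"), ("tun", "VERB"),
     (":", "PUNCT"), ("Scheibe", "NOUN"), ("schließen", "VERB"), (",", "PUNCT"),
     ("Tuer", "NOUN"), ("oeffnen", "VERB"), (",", "PUNCT"), ("Fenster", "NOUN")],
    [("Das", "PRON"), ("ist", "AUX"), ("deine", "DET"), ("Loesung", "NOUN"),
     (":", "PUNCT"), ("Sitz", "NOUN"), ("zurückstellen", "VERB")],
]

print(matchingNounVerb(tagged))
# → [['Scheibe', 'schließen', 'Tuer', 'oeffnen'], ['Sitz', 'zurückstellen']]
```

Because the first sample sentence ends in "Fenster" with no following verb, that noun is dropped. Keeping the words in sentence order, rather than collecting all verbs and all nouns into two separate lists, is what preserves the interleaved Noun, Verb, Noun, Verb shape.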

Thanks for your help ;)

0 answers