I'm new to NLTK. This is the code I used:
from nltk import RegexpParser, pos_tag, sent_tokenize

text = "The pizza was 66 and brilliant"
pattern = r"""
P: {<NN>+<VBD>+<CD>+}
"""
for sent in sent_tokenize(text):
    sentence = sent.split()
    PChunker = RegexpParser(pattern)
    output = PChunker.parse(pos_tag(sentence))
    print(output)
I got this output:
(S The/DT (P pizza/NN was/VBD 66/CD) and/CC brilliant/VB)
The output I need is:
pizza was 66
How can I get this?
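Answer 0 (score: 0)
The output of RegexpParser.parse is a tree, which you can loop over with tree.subtrees. Try the following, which filters directly for the non-terminal nodes you are interested in (P in your case):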
from nltk import sent_tokenize
from nltk import RegexpParser
from nltk import pos_tag

text = "The pizza was 66 and brilliant"
pattern = r"""
P: {<NN>+<VBD>+<CD>+}
"""
for sent in sent_tokenize(text):
    sentence = sent.split()
    PChunker = RegexpParser(pattern)
    output = PChunker.parse(pos_tag(sentence))
    print(output)
    # Visit only the subtrees labelled P, skipping the root S node.
    for subtree in output.subtrees(filter=lambda t: t.label() == 'P'):
        print(subtree)
        # Each child of a P chunk is a (word, tag) pair; keep the words.
        print(' '.join([x[0] for x in subtree]))
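If you want the chunk text as a list of strings rather than printed output, the same subtree filter can be wrapped in a small helper. This is only a sketch: `extract_chunks` is a name I made up, not part of NLTK, and the tagged tokens are copied by hand from the output in the question so the example does not depend on a tagger model being downloaded.

```python
from nltk import RegexpParser


def extract_chunks(tagged_tokens, grammar, label):
    """Return the text of every chunk carrying `label` (hypothetical helper)."""
    tree = RegexpParser(grammar).parse(tagged_tokens)
    return [
        # leaves() yields the (word, tag) pairs under the chunk.
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


# Assumption: these tags mirror what pos_tag produced in the question.
tagged = [("The", "DT"), ("pizza", "NN"), ("was", "VBD"),
          ("66", "CD"), ("and", "CC"), ("brilliant", "VB")]
grammar = "P: {<NN>+<VBD>+<CD>+}"

chunks = extract_chunks(tagged, grammar, "P")
print(chunks)
```

With real text you would pass `pos_tag(sentence)` instead of the hand-written `tagged` list; the tags you get back may differ between tagger versions, so the pattern may need adjusting.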