Extracting key/value pairs from a tree

Date: 2014-11-12 06:49:41

Tags: python tree nltk

I get a tree structure from NLTK, and when I access the tree's values I get the following:

(NE Stallone/NNP)
('jason', 'NN')
("'s", 'POS')
('film', 'NN')
(NE Rocky/NNP)
('was', 'VBD')
('inducted', 'VBN')
('into', 'IN')
('the', 'DT')
(NE National/NNP Film/NNP Registry/NNP)
('as', 'IN')
('well', 'RB')
('as', 'IN')
('having', 'VBG')
('its', 'PRP$')
('film', 'NN')
('props', 'NNS')
('placed', 'VBN')
('in', 'IN')
('the', 'DT')
(NE Smithsonian/NNP Museum/NNP)
('.', '.')
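The printout above mixes two kinds of top-level elements: plain (word, tag) tuples, and NE subtrees that group one or more such tuples. The sketch below rebuilds the first few elements by hand and tells the two kinds apart. It assumes a hypothetical Chunk class as a stand-in for nltk.Tree (which is a list subclass with a label() method), so it runs without NLTK installed; on the real ne_chunk result the same isinstance check should apply, because the chunks there are nltk.Tree objects and the plain tokens are tuples.

```python
class Chunk(list):
    """Hypothetical stand-in for nltk.Tree: a list of children plus a label."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


# First few top-level elements of the printed tree, rebuilt by hand.
sample = [
    Chunk('NE', [('Stallone', 'NNP')]),
    ('jason', 'NN'),
    ("'s", 'POS'),
    ('film', 'NN'),
    Chunk('NE', [('Rocky', 'NNP')]),
    ('was', 'VBD'),
]


def describe(elements):
    """Classify each top-level element as a plain token or a chunk."""
    result = []
    for x in elements:
        if isinstance(x, tuple):
            # Plain token: x[0] is the word, x[1] is the POS tag.
            result.append(('token', x[0], x[1]))
        else:
            # Chunk: its children are (word, tag) tuples.
            words = ' '.join(word for word, tag in x)
            result.append((x.label(), words))
    return result


for entry in describe(sample):
    print(entry)
```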

How can I retrieve only the NN and VBN values?

I tried it this way:

import nltk

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."

tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary=True)
print(namedEnt)
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt if x == "NN"]

for x in namedEnt:
    if x[0] == 'NN':
        print(x[1])

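The line np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt if x == "NN"] correctly gave me the NE labels, but I can't get NN, NNP and NNS individually. If there is another way to do this, please let me know.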
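Why the loop in the question prints nothing: in each plain element the tag is the second item, so x[0] is the word, and a whole tuple never equals a bare string, so x == "NN" is always false. A minimal sketch on a short list of tokens copied by hand from the printout (the list itself is an assumption, not NLTK output):

```python
# Plain tokens copied from the printed tree: each is a (word, tag) tuple.
tokens = [('jason', 'NN'), ("'s", 'POS'), ('film', 'NN'),
          ('was', 'VBD'), ('inducted', 'VBN')]

# x[0] is the word, so comparing it with a tag never matches.
by_word = [x[1] for x in tokens if x[0] == 'NN']

# A tuple never equals a bare string, so this filter is always empty too.
by_tuple = [x for x in tokens if x == 'NN']

# Compare the tag x[1] and keep the word x[0]; a set handles several tags.
wanted = {'NN', 'VBN'}
by_tag = [word for word, tag in tokens if tag in wanted]

print(by_word)   # []
print(by_tuple)  # []
print(by_tag)    # ['jason', 'film', 'inducted']
```

If the NE chunks are not needed at all, the same tag filter can be applied directly to the flat tagged list returned by nltk.pos_tag, since it contains only (word, tag) tuples. The word, tag unpacking should not be used directly on namedEnt, because an NE chunk does not generally unpack into a (word, tag) pair.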

1 answer:

Answer 0 (score: 1)

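It looks like you just need to swap the key and value in your lookup: in each (word, tag) tuple the tag is x[1] and the word is x[0]. You also have to handle, with a try/except, elements that have only a single item, such as the one-word chunk (NE Stallone/NNP), where x[1] raises an IndexError. Here is a small function that retrieves the values you want from the tree: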

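def values_for(tree, tag):
    """Return the words of the top-level (word, tag) tuples tagged `tag`."""
    ret = []
    for x in tree:
        try:
            # In a (word, tag) tuple the tag is x[1] and the word is x[0].
            if x[1] == tag:
                ret.append(x[0])
        except IndexError:
            # A one-word NE chunk such as (NE Stallone/NNP) has no x[1].
            pass
    return ret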
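To see how the try/except copes with both kinds of NE chunk, here is the function above run on a small hand-built sample. Chunk is a hypothetical stand-in for nltk.Tree so the sketch runs without NLTK. A one-word chunk has no x[1] and raises IndexError, while in a multi-word chunk x[1] is a tuple that never equals a tag string, so chunks are skipped either way.

```python
class Chunk(list):
    """Hypothetical stand-in for nltk.Tree: a list of children plus a label."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


def values_for(tree, tag):
    """Return the words of the top-level (word, tag) tuples tagged `tag`."""
    ret = []
    for x in tree:
        try:
            if x[1] == tag:
                ret.append(x[0])
        except IndexError:
            # One-word chunk: it has no second child.
            pass
    return ret


sample = [
    Chunk('NE', [('Stallone', 'NNP')]),   # x[1] raises IndexError
    ('jason', 'NN'),
    ('film', 'NN'),
    # Multi-word chunk: x[1] is the tuple ('Film', 'NNP'), never a tag string.
    Chunk('NE', [('National', 'NNP'), ('Film', 'NNP'), ('Registry', 'NNP')]),
    ('inducted', 'VBN'),
]

print(values_for(sample, 'NN'))   # ['jason', 'film']
print(values_for(sample, 'VBN'))  # ['inducted']
print(values_for(sample, 'NNP'))  # []
```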

You should then be able to filter for the nodes you want:

>>> import nltk
>>> text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
>>> tokenized = nltk.word_tokenize(text)
>>> tagged = nltk.pos_tag(tokenized)
>>> namedEnt = nltk.ne_chunk(tagged, binary=True)
>>> values_for(namedEnt, 'NN')
['jason', 'film', 'film']
>>> values_for(namedEnt, 'VBN')
['inducted', 'placed']
>>> values_for(namedEnt, 'NNP')
[]
>>> values_for(namedEnt, 'NNS')
['props']
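values_for(namedEnt, 'NNP') returns an empty list because every NNP token in this sentence sits inside an NE chunk, and values_for only looks at top-level elements. Below is a sketch of a variant that descends into the chunks and accepts several tags at once; values_for_nested and the Chunk stand-in for nltk.Tree are hypothetical names, not part of NLTK.

```python
class Chunk(list):
    """Hypothetical stand-in for nltk.Tree: a list of children plus a label."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


def values_for_nested(tree, *tags):
    """Collect words whose tag is in `tags`, descending into chunks."""
    found = []
    for x in tree:
        if isinstance(x, tuple):
            word, tag = x
            if tag in tags:  # tuple membership, not substring matching
                found.append(word)
        else:
            # A chunk: recurse into its children with the same tags.
            found.extend(values_for_nested(x, *tags))
    return found


sample = [
    Chunk('NE', [('Stallone', 'NNP')]),
    ('jason', 'NN'),
    ('film', 'NN'),
    Chunk('NE', [('National', 'NNP'), ('Film', 'NNP'), ('Registry', 'NNP')]),
    ('inducted', 'VBN'),
    ('props', 'NNS'),
]

print(values_for_nested(sample, 'NNP'))        # ['Stallone', 'National', 'Film', 'Registry']
print(values_for_nested(sample, 'NN', 'VBN'))  # ['jason', 'film', 'inducted']
print(values_for_nested(sample, 'NNS'))        # ['props']
```

Taking the tags as *tags keeps the membership test a tuple lookup, so passing 'NN' cannot accidentally match 'N' by substring. With real NLTK objects, a shorter route should be the tree's own leaves() method, which the question already calls on subtrees: namedEnt.leaves() returns every (word, tag) tuple, including those inside NE chunks, so [w for w, t in namedEnt.leaves() if t == 'NNP'] collects the proper nouns as well.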

Hope this helps. Cheers!