Finding proper nouns using NLTK WordNet

Time: 2013-07-16 06:57:32

Tags: python nltk wordnet

Is there any way to find proper nouns using NLTK WordNet? That is, can I tag possessive nouns using NLTK WordNet?

2 answers:

Answer 0 (score: 47)

I don't think you need WordNet to find proper nouns. I suggest using the part-of-speech tagger, pos_tag.

To find proper nouns, look for the NNP tag:

from nltk.tag import pos_tag

sentence = "Michael Jackson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

propernouns = [word for word,pos in tagged_sent if pos == 'NNP']
# ['Michael','Jackson', 'McDonalds']

You may not be entirely satisfied, since Michael and Jackson are split into 2 tokens. In that case you might need something more complex, such as a named entity tagger.
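Short of a full named entity tagger, one cheap workaround for the split-token problem is to merge runs of adjacent proper-noun tags into a single string. The sketch below is only an illustration: the helper name group_proper_nouns is made up for this example, and the tagged list is copied by hand from the output above so that it runs without the tagger. Note that it would also glue together two different names that happen to sit next to each other.

```python
from itertools import groupby

def group_proper_nouns(tagged_sent):
    """Join runs of adjacent NNP/NNPS tokens into single strings."""
    phrases = []
    # groupby splits the sentence into alternating proper / non-proper runs
    for is_proper, run in groupby(tagged_sent, key=lambda pair: pair[1] in ("NNP", "NNPS")):
        if is_proper:
            phrases.append(" ".join(word for word, _ in run))
    return phrases

# Tagged output copied from the pos_tag example above (not recomputed here)
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'),
               ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

print(group_proper_nouns(tagged_sent))
# ['Michael Jackson', 'McDonalds']
```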

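Strictly speaking, as documented in the Penn Treebank tagset (http://www.mozart-oz.org/mogul/doc/lager/brill-tagger/penn.html), for possessive nouns you can simply look for the POS tag. But often the tagger doesn't emit POS when the word is tagged NNP.

To find possessive nouns, look for str.endswith("'s") or str.endswith("s'"):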

from nltk.tag import pos_tag

sentence = "Michael Jackson took Daniel Jackson's hamburger and Agnes' fries"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

# iterate over the tagged tokens, not over the characters of the raw string
possessives = [word for word, pos in tagged_sent if word.endswith("'s") or word.endswith("s'")]
# ["Jackson's", "Agnes'"]
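For comparison, the POS-tag route mentioned earlier only works when the tokenizer splits the possessive marker off as its own token. The sketch below shows the lookup under that assumption. The tagged list is written by hand and is hypothetical: it is not real tagger output, and an actual tokenizer and tagger may handle the bare apostrophe in "Agnes'" differently. The helper name possessors is also made up for this example.

```python
def possessors(tagged_tokens):
    """Return the word immediately before each token tagged POS."""
    owners = []
    # walk over (current, next) pairs and keep words followed by a POS token
    for (word, _tag), (_next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if next_tag == "POS":
            owners.append(word)
    return owners

# Hypothetical, hand-written input: assumes "Jackson's" was split into
# "Jackson" + "'s" and that the marker was tagged POS.
tagged_tokens = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'),
                 ('Daniel', 'NNP'), ('Jackson', 'NNP'), ("'s", 'POS'),
                 ('hamburger', 'NN'), ('and', 'CC'), ('Agnes', 'NNP'),
                 ("'", 'POS'), ('fries', 'NNS')]

print(possessors(tagged_tokens))
# ['Jackson', 'Agnes']
```

Unlike the endswith check, this returns the bare noun without the apostrophe, which may or may not be what you want.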

Alternatively, you can use NLTK's ne_chunk, but it doesn't seem to do much else unless you care about what kind of proper noun you get from the sentence:

>>> from itertools import chain
>>> from nltk.tree import Tree; from nltk.chunk import ne_chunk
>>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
[Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Jackson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
>>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
['Michael', 'Jackson', 'Daniel']

Using ne_chunk is a little verbose, and it doesn't get you the possessives.
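If you do care about the entity type, the chunk tree can be flattened into (label, name) pairs. The following is a minimal sketch, assuming NLTK 3, where the node label is read with Tree.label(). The tree is built by hand as a stand-in for ne_chunk output so that no models are needed; real ne_chunk output for this sentence may group the tokens differently. The helper name entity_names is made up for this example.

```python
from nltk.tree import Tree

def entity_names(chunked):
    """Collect (entity label, joined words) for each named-entity subtree."""
    names = []
    for node in chunked:
        # plain (word, tag) tuples are ordinary tokens; subtrees are entities
        if isinstance(node, Tree):
            words = [word for word, _tag in node.leaves()]
            names.append((node.label(), " ".join(words)))
    return names

# Hypothetical, hand-built tree standing in for ne_chunk(tagged_sent)
chunked = Tree('S', [
    Tree('PERSON', [('Michael', 'NNP'), ('Jackson', 'NNP')]),
    ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'),
    Tree('ORGANIZATION', [('McDonalds', 'NNP')]),
])

print(entity_names(chunked))
# [('PERSON', 'Michael Jackson'), ('ORGANIZATION', 'McDonalds')]
```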

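Answer 1 (score: 2)

I think what you need is a tagger, specifically a part-of-speech tagger. This tool assigns a part-of-speech tag (e.g., proper noun, possessive pronoun, etc.) to each word in a sentence.

NLTK includes some taggers: http://nltk.org/book/ch05.html

There is also the Stanford Part-Of-Speech Tagger (open source too, with better performance).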
