How do I find the path similarity of one word with all other words?

Time: 2018-03-26 07:21:36

Tags: python nltk wordnet

synset1=wordnet.synsets(answer)
for word in words.words():
    synset2=wordnet.synsets(word);
    d=synset1.path_similarity(synset2)

I'm new to WordNet. What is wrong with this code? I'm trying to find the path similarity of one word with every word in English. Is there a way to do this?

1 Answer:

Answer 0 (score: 1)

WordNet is indexed by synsets (senses/concepts), not by words/lemmas.

>>> from nltk.corpus import wordnet as wn
>>> dogs = wn.synsets('dog')
>>> cats = wn.synsets('cats')

>>> for ss in cats:
...     print(ss, ss.definition())
... 
(Synset('cat.n.01'), u'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats')
(Synset('guy.n.01'), u'an informal term for a youth or man')
(Synset('cat.n.03'), u'a spiteful woman gossip')
(Synset('kat.n.01'), u'the leaves of the shrub Catha edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric stimulant')
(Synset('cat-o'-nine-tails.n.01'), u'a whip with nine knotted cords')
(Synset('caterpillar.n.02'), u'a large tracked vehicle that is propelled by two endless metal belts; frequently used for moving earth in construction and farm work')
(Synset('big_cat.n.01'), u'any of several large cats typically able to roar and living in the wild')
(Synset('computerized_tomography.n.01'), u'a method of examining body organs by scanning them with X rays and using a computer to construct a series of cross-sectional scans along a single axis')
(Synset('cat.v.01'), u"beat with a cat-o'-nine-tails")
(Synset('vomit.v.01'), u'eject the contents of the stomach through the mouth')

>>> for ss in dogs:
...     print(ss, ss.definition())
... 
(Synset('dog.n.01'), u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds')
(Synset('frump.n.01'), u'a dull unattractive unpleasant girl or woman')
(Synset('dog.n.03'), u'informal term for a man')
(Synset('cad.n.01'), u'someone who is morally reprehensible')
(Synset('frank.n.02'), u'a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll')
(Synset('pawl.n.01'), u'a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward')
(Synset('andiron.n.01'), u'metal supports for logs in a fireplace')
(Synset('chase.v.01'), u'go after with the intent to catch')

So a path similarity based directly on words/lemmas may NOT be possible.
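One workaround, offered here as an assumption rather than something the answer prescribes, is to score a pair of words by the best-scoring pair of their synsets. The sketch below does that for one target word against a vocabulary. The helper names `best_pair_score` and `similarity_to_vocabulary` are hypothetical, and `lookup` stands for any function that maps a word to a list of synsets (NLTK's `wn.synsets` fits that shape).

```python
from itertools import product


def best_pair_score(synsets_a, synsets_b):
    """Highest path similarity over every pair of synsets, or None."""
    scores = (a.path_similarity(b) for a, b in product(synsets_a, synsets_b))
    # path_similarity may return None when two synsets are not connected,
    # so those pairs are dropped before taking the maximum.
    return max((s for s in scores if s is not None), default=None)


def similarity_to_vocabulary(target, vocabulary, lookup):
    """Map each vocabulary word to its best score against `target`.

    Words with no synsets, or with no comparable synset pair, are skipped.
    """
    target_synsets = lookup(target)  # look the target word up only once
    results = {}
    for word in vocabulary:
        score = best_pair_score(target_synsets, lookup(word))
        if score is not None:
            results[word] = score
    return results
```

With NLTK and its `wordnet` and `words` corpora downloaded, a call might look like `similarity_to_vocabulary('dog', words.words(), wn.synsets)`. Expect that to be slow on the full word list, since every word can contribute several synset pairs. Taking the maximum is only one choice; the mean, or the first listed sense of each word, are other possible aggregates.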

But to get the path similarity between two synsets, for example:

>>> first_dog = dogs[0]
>>> first_cat = cats[0]
>>> type(first_dog)
<class 'nltk.corpus.reader.wordnet.Synset'>
>>> type(first_dog), type(first_cat)
(<class 'nltk.corpus.reader.wordnet.Synset'>, <class 'nltk.corpus.reader.wordnet.Synset'>)
>>> first_dog.path_similarity(first_cat)
0.2
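The 0.2 above fits the usual description of path similarity: a score between 0 and 1 based on the shortest path that connects two senses in the is-a (hypernym/hyponym) taxonomy, computed as 1 / (shortest path length + 1). The sketch below reproduces that arithmetic on a small hand-written taxonomy. The node names and edges are illustrative assumptions typed in by hand, not data read from WordNet, and `toy_path_similarity` is a hypothetical helper.

```python
from collections import deque

# Hand-written is-a edges (child -> parents). Illustrative assumption only.
TOY_HYPERNYMS = {
    "dog": ["canine"],
    "canine": ["carnivore"],
    "cat": ["feline"],
    "feline": ["carnivore"],
    "carnivore": [],
}


def shortest_path_length(start, goal, hypernyms):
    """Breadth-first search over the taxonomy, treating is-a edges as undirected."""
    neighbours = {node: set() for node in hypernyms}
    for child, parents in hypernyms.items():
        for parent in parents:
            neighbours[child].add(parent)
            neighbours.setdefault(parent, set()).add(child)

    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected


def toy_path_similarity(a, b, hypernyms=TOY_HYPERNYMS):
    """1 / (shortest path length + 1), or None if no path exists."""
    dist = shortest_path_length(a, b, hypernyms)
    return None if dist is None else 1.0 / (dist + 1)


print(toy_path_similarity("dog", "cat"))  # path of 4 edges -> 1 / 5
```

In this toy graph, dog and cat are four edges apart (dog, canine, carnivore, feline, cat), which gives 1 / (4 + 1) = 0.2. A node compared with itself has distance 0 and scores 1.0.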

Take a look at the following: