
Question

I have been trying to pull out all the nouns, verbs, etc. from the Brown corpus, so I tried the code

brown.all_synsets('n')

But apparently this code only works for WordNet. I'm using Python 3.4, by the way.

EDITED

@alvas's answer works. But when I use it with random, I get an error. Take a look.

nn = {word for word, pos in brown.tagged_words() if pos.startswith('NN')}
print(nn)

Output

{'such', 'rather', 'Quite', 'Such', 'quite'}

But when I use

random.choice(nn)

I get

Traceback (most recent call last):
  File "/home/aziz/Desktop/2222.py", line 5, in <module>
    print(random.choice(NN))
  File "/usr/lib/python3.4/random.py", line 256, in choice
    return seq[i]
TypeError: 'set' object does not support indexing

Answer 1

TL;DR

>>> from nltk.corpus import brown
>>> {word for word, pos in brown.tagged_words() if pos.startswith('NN')}

In long

Iterate through the output of the .tagged_words() function, which returns a list of ('word', 'POS') tuples:

>>> from nltk.corpus import brown
>>> brown.tagged_words()
[(u'The', u'AT'), (u'Fulton', u'NP-TL'), ...]

Please read this chapter to understand how the NLTK corpora API works: http://www.nltk.org/book/ch02.html
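Before filtering, it can help to see which tags actually occur. The snippet below is a minimal sketch that counts tags with collections.Counter; the `sample` list is a hand-written stand-in (an assumption, not real corpus output) in the same (word, tag) shape that brown.tagged_words() returns.

```python
from collections import Counter

# Hand-written sample in the same (word, tag) shape as brown.tagged_words().
sample = [('The', 'AT'), ('Fulton', 'NP-TL'), ('jury', 'NN'),
          ('said', 'VBD'), ('election', 'NN')]

# Count how often each POS tag appears; the word itself is ignored here.
tag_counts = Counter(pos for word, pos in sample)
print(tag_counts.most_common(3))

# With the real corpus (requires NLTK and the downloaded Brown corpus):
# from nltk.corpus import brown
# tag_counts = Counter(pos for word, pos in brown.tagged_words())
```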

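Then do a set comprehension over it to collect the set (i.e. the unique words) of words tagged with noun tags, e.g. NN, NNS, etc. Note that the Brown tagset marks proper nouns as NP (as in the NP-TL tag above), so the 'NN' prefix does not match them.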

>>> {word for word, pos in brown.tagged_words() if pos.startswith('NN')}

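Note that the output might not be what you expect, because words that are POS-tagged as syntactic/grammatical nouns are not necessarily semantic arguments/entities.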

Also, I don't think the words you extracted are correct. Double-check the set:

>>> nouns = {word for word, pos in brown.tagged_words() if pos.startswith('NN')} 
>>> 'rather' in nouns
False
>>> 'such' in nouns
False
>>> 'Quite' in nouns
False
>>> 'quite' in nouns
False
>>> 'Such' in nouns
False

Output of the comprehension: http://pastebin.com/bJaPdpUk
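The same filter extends to other parts of speech. Here is a rough sketch, assuming the Brown tagset's VB prefix for verbs and JJ prefix for adjectives; the helper name `words_with_tag_prefix` and the `sample` list are hypothetical and only for illustration.

```python
def words_with_tag_prefix(tagged_words, prefixes):
    """Return the set of words whose POS tag starts with any of the prefixes."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return {word for word, pos in tagged_words if pos.startswith(prefixes)}

# Hand-written sample in the same (word, tag) shape as brown.tagged_words().
sample = [
    ('The', 'AT'), ('jury', 'NN'), ('said', 'VBD'), ('recent', 'JJ'),
    ('elections', 'NNS'), ('produced', 'VBD'), ('Fulton', 'NP-TL'),
]

nouns = words_with_tag_prefix(sample, ['NN'])
verbs = words_with_tag_prefix(sample, ['VB'])
adjectives = words_with_tag_prefix(sample, ['JJ'])

# With the real corpus (requires NLTK and the downloaded Brown corpus):
# from nltk.corpus import brown
# verbs = words_with_tag_prefix(brown.tagged_words(), ['VB'])
```

Passing several prefixes, e.g. ['NN', 'NP'], would also pick up the proper nouns that the 'NN' prefix alone skips.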

Why does random.choice(nn) fail when nn is a set?

The input to random.choice() is a sequence (see https://docs.python.org/2/library/random.html#random.choice).

random.choice(seq)

Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.

The sequence types in Python are:

Python 2: str, unicode, list, tuple, bytearray, buffer, xrange (see https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange)
Python 3: list, tuple, range (see https://docs.python.org/3.6/library/stdtypes.html#sequence-types-list-tuple-range), plus the binary sequence types bytes, bytearray, memoryview and the text sequence type str

Since a set is not a sequence, random.choice() cannot index into it, and you get the TypeError shown in your traceback.
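One way around this is to convert the set to a sequence before choosing. The snippet below is a small sketch; the contents of `nn` are a hypothetical stand-in for the set built from the corpus.

```python
import random

# Hypothetical stand-in for the noun set built from brown.tagged_words().
nn = {'time', 'year', 'jury', 'election'}

# Passing the set directly fails, because sets cannot be indexed.
try:
    random.choice(nn)
except TypeError as err:
    print('choice on a set failed:', err)

# Converting to a list (sorted here, for a stable order) gives a real sequence.
noun_list = sorted(nn)
picked = random.choice(noun_list)
print(picked)
```

If you need many random picks, build the list once and reuse it rather than converting the set on every call.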

How do I get the verbs, nouns, and adjectives from the Brown corpus?
