How can I fix an NLTK chunking error?

Time: 2016-02-20 07:50:58

Tags: python python-2.7 python-3.x nltk

I am trying to train my own NLTK chunker by following the tutorial at http://streamhacker.com/2008/12/29/how-to-train-a-nltk-chunker/

I wrote the code below, but I am getting an error:

>>> import nltk
>>> import nltk.chunk
>>> def conll_tag_chunks(chunk_sents):
    tag_sents = [nltk.chunk.tree2conlltags(tree) for tree in chunk_sents]
    return [[(t, c) for (w, t, c) in chunk_tags] for chunk_tags in tag_sents]

>>> import nltk.corpus, nltk.tag
>>> from nltk.metrics import accuracy
>>> def ubt_conll_chunk_accuracy(train_sents, test_sents):
        train_chunks = conll_tag_chunks(train_sents)
        test_chunks = conll_tag_chunks(test_sents)

        u_chunker = nltk.tag.UnigramTagger(train_chunks)
        print 'u:', accuracy(u_chunker, test_chunks)

        ub_chunker = nltk.tag.BigramTagger(train_chunks, backoff=u_chunker)
        print 'ub:', accuracy(ub_chunker, test_chunks)

        ubt_chunker = nltk.tag.TrigramTagger(train_chunks, backoff=ub_chunker)
        print 'ubt:', accuracy(ubt_chunker, test_chunks)

        ut_chunker = nltk.tag.TrigramTagger(train_chunks, backoff=u_chunker)
        print 'ut:', accuracy(ut_chunker, test_chunks)

        utb_chunker = nltk.tag.BigramTagger(train_chunks, backoff=ut_chunker)
        print 'utb:', accuracy(utb_chunker, test_chunks)


>>> conll_train = nltk.corpus.conll2000.chunked_sents('train.txt')
>>> conll_test = nltk.corpus.conll2000.chunked_sents('test.txt')
>>> ubt_conll_chunk_accuracy(conll_train, conll_test)

Could anyone suggest how I can resolve this error? Thanks in advance. I am using NLTK 3.1 with Python 2.7.11 on MS Windows 10.

1 answer:

Answer 0: (score: 0)

Look at the documentation for NLTK's accuracy function:
  

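nltk.metrics.scores.accuracy(reference, test)

Given a list of reference values and a corresponding list of test values, return the fraction of corresponding values that are equal. In particular, return the fraction of indices 0<i<=len(test) such that test[i] == reference[i].

Parameters:
    - reference (list) - An ordered list of reference values.
    - test (list) - A list of values to compare against the corresponding reference values.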
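
In other words, the function compares two parallel lists of values, whereas the code in the question passes a tagger object as the first argument. The sketch below illustrates the documented behaviour without requiring NLTK. Note that `fraction_equal` is a hypothetical stand-in written for this example, not NLTK's own implementation, and the chunk tags are made-up sample data.

```python
def fraction_equal(reference, test):
    """Hypothetical stand-in mirroring the documented behaviour of
    accuracy(reference, test): the fraction of positions where the
    two lists hold equal values."""
    if len(reference) != len(test):
        raise ValueError("Lists must have the same length.")
    matches = sum(1 for r, t in zip(reference, test) if r == t)
    return matches / float(len(test))


# Made-up gold chunk tags and made-up tagger output for two sentences.
gold_sents = [["B-NP", "I-NP", "B-VP"], ["B-NP", "B-VP", "O"]]
predicted_sents = [["B-NP", "I-NP", "B-VP"], ["B-NP", "I-NP", "O"]]

# Flatten the per-sentence lists into two parallel flat lists,
# which is the shape of input the documentation describes.
reference = [tag for sent in gold_sents for tag in sent]
test = [tag for sent in predicted_sents for tag in sent]

score = fraction_equal(reference, test)
print(score)
```

With NLTK installed, nltk.metrics.accuracy(reference, test) should accept the same two flat lists. For the question's code, another option is to call the tagger's own evaluate method, for example u_chunker.evaluate(test_chunks), which in NLTK 3.1 tags the gold sentences and compares the result against them for you.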