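NLTK: corpus-level BLEU vs sentence-level BLEU score

Time: 2016-11-11 06:44:48

Tags: python machine-learning nlp nltk bleu

I have imported nltk in Python to calculate BLEU scores on Ubuntu. I understand how the sentence-level BLEU score works, but I don't understand how the corpus-level BLEU score works.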

Here is my code for the corpus-level BLEU score:

import nltk

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights = [1])
print(BLEUscore)

For some reason, the BLEU score for the above code is 0. I was expecting a corpus-level BLEU score of at least 0.5.

Here is my code for the sentence-level BLEU score:

import nltk

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights = [1])
print(BLEUscore)

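The sentence-level BLEU score here is 0.71, which makes sense given the brevity penalty and the missing word "a". However, I don't understand how the corpus-level BLEU score works.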
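The 0.71 figure can be checked by hand. Below is a minimal sketch of the unigram-only calculation (the weights = [1] case) for a single reference, assuming the standard BLEU definition from Papineni et al. 2002. The helper name unigram_bleu is made up for illustration and is not part of NLTK.

```python
import math
from collections import Counter


def unigram_bleu(reference, hypothesis):
    """Hypothetical helper: unigram-only BLEU against a single reference."""
    if not hypothesis:
        return 0.0
    hyp_counts = Counter(hypothesis)
    ref_counts = Counter(reference)
    # Clipped matches: a token counts at most as often as it occurs in the reference.
    matches = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
    precision = matches / len(hypothesis)
    # Brevity penalty: exp(1 - r/c) when the hypothesis is not longer than the reference.
    r, c = len(reference), len(hypothesis)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * precision


hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
print(round(unigram_bleu(reference, hypothesis), 4))  # 0.7165
```

All three hypothesis tokens appear in the reference, so the precision is 3/3 = 1, and the score comes entirely from the brevity penalty exp(1 - 4/3).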

Any help would be appreciated.

2 answers:

Answer 0: (score: 17)

TL;DR

>>> import nltk
>>> hypothesis = ['This', 'is', 'cat'] 
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference] # list of references for 1 sentence.
>>> list_of_references = [references] # list of references for all sentences in corpus.
>>> list_of_hypotheses = [hypothesis] # list of hypotheses that corresponds to list of references.
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453

(Note: you have to pull the latest version of NLTK on the develop branch in order to get a stable version of the BLEU score implementation.)
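The extra level of nesting is the key point. As a sketch in plain Python (no NLTK call involved), the following shows how the question's input [reference] would be read if each element of list_of_references is taken as the list of references for one sentence: every "reference sentence" is then a single word, and iterating over a string yields characters, so no hypothesis token can match. That would explain a score of 0, although the exact behaviour may depend on the NLTK version. The helper unigram_overlap is hypothetical.

```python
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

# What the question passed: list(list(str)), one level too shallow.
too_shallow = [reference]
# What the snippet above builds: list(list(list(str))).
list_of_references = [[reference]]


def unigram_overlap(references_for_one_sentence, hypothesis):
    """Hypothetical helper: hypothesis tokens that occur in any reference."""
    seen = set()
    for ref in references_for_one_sentence:
        seen.update(ref)  # iterating over a str yields single characters
    return [token for token in hypothesis if token in seen]


print(unigram_overlap(too_shallow[0], hypothesis))         # []
print(unigram_overlap(list_of_references[0], hypothesis))  # ['This', 'is', 'cat']
```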

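In Long

Actually, if there is only one reference and one hypothesis in your whole corpus, both corpus_bleu() and sentence_bleu() should return the same value, as shown in the example above.

In the code, we see that sentence_bleu is actually a thin wrapper around corpus_bleu: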

def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)

And if we look at the parameters for sentence_bleu:

def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """

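The input for sentence_bleu's references is a list(list(str)).

So if you have a sentence string, e.g. "This is a cat", you have to tokenize it to get a list of strings, ["This", "is", "a", "cat"]. Since multiple references are allowed, the argument has to be a list of lists of strings. For example, if you have a second reference, "This is a feline", your input to sentence_bleu() would be: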

from nltk.translate.bleu_score import sentence_bleu

references = [ ["This", "is", "a", "cat"], ["This", "is", "a", "feline"] ]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
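A minimal sketch of getting from raw strings to that shape, assuming plain whitespace tokenization with str.split() is good enough for the example (a real tokenizer would treat punctuation differently). The helper name to_tokens is hypothetical.

```python
def to_tokens(sentence):
    """Hypothetical helper: whitespace tokenization, for illustration only."""
    return sentence.split()


reference_strings = ["This is a cat", "This is a feline"]
hypothesis_string = "This is cat"

# references: one list of tokens per reference sentence, i.e. list(list(str))
references = [to_tokens(s) for s in reference_strings]
# hypothesis: a single list of tokens, i.e. list(str)
hypothesis = to_tokens(hypothesis_string)

print(references)  # [['This', 'is', 'a', 'cat'], ['This', 'is', 'a', 'feline']]
print(hypothesis)  # ['This', 'is', 'cat']
```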

When it comes to the list_of_references parameter of corpus_bleu(), it is basically a list of whatever sentence_bleu() takes as references:

def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """

Other than looking at the doctest within nltk/translate/bleu_score.py, you can also take a look at the unit tests at nltk/test/unit/translate/test_bleu_score.py to see how to use each of the components within bleu_score.py.

By the way, since sentence_bleu is imported as bleu in nltk.translate.__init__.py (https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21), using

from nltk.translate import bleu 

would be the same as:

from nltk.translate.bleu_score import sentence_bleu

and in code:

>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False

Answer 1: (score: 5)

Let's take a look:

>>> help(nltk.translate.bleu_score.corpus_bleu)
Help on function corpus_bleu in module nltk.translate.bleu_score:

corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None)
    Calculate a single corpus-level BLEU score (aka. system-level BLEU) for all 
    the hypotheses and their respective references.  

    Instead of averaging the sentence level BLEU scores (i.e. macro-average 
    precision), the original BLEU metric (Papineni et al. 2002) accounts for 
    the micro-average precision (i.e. summing the numerators and denominators
    for each hypothesis-reference(s) pairs before the division).
    ...
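For what it's worth, the micro/macro distinction in that docstring can be made concrete with a toy sketch. It covers unigram precision only, on a two-sentence corpus invented for this example; it ignores the brevity penalty and higher-order n-grams, and the function name clipped_counts is illustrative, not NLTK's.

```python
from collections import Counter
from fractions import Fraction


def clipped_counts(references, hypothesis):
    """Return (numerator, denominator) of the clipped unigram precision for one sentence."""
    hyp_counts = Counter(hypothesis)
    # For each token, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for token, count in Counter(ref).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)
    numerator = sum(min(count, max_ref_counts[token]) for token, count in hyp_counts.items())
    return numerator, len(hypothesis)


# Toy corpus (made up for this example): one reference per sentence.
list_of_references = [
    [['This', 'is', 'a', 'cat']],
    [['The', 'dog', 'barks', 'loudly', 'at', 'night']],
]
hypotheses = [
    ['This', 'is', 'cat'],                         # 3 of 3 tokens match
    ['A', 'dog', 'barks', 'loud', 'at', 'night'],  # 4 of 6 tokens match
]

pairs = [clipped_counts(refs, hyp) for refs, hyp in zip(list_of_references, hypotheses)]

# Macro-average: compute each sentence's precision, then average the ratios.
macro = sum(Fraction(n, d) for n, d in pairs) / len(pairs)
# Micro-average: sum numerators and denominators first, then divide once.
micro = Fraction(sum(n for n, _ in pairs), sum(d for _, d in pairs))

print(pairs)  # [(3, 3), (4, 6)]
print(macro)  # 5/6
print(micro)  # 7/9
```

The two averages differ because the micro-average weights each sentence by its length, so the longer, less accurate hypothesis pulls the corpus-level figure down further.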

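You are in a better position than I am to understand the description of the algorithm, so I won't try to "explain" it to you. If the docstring does not clear things up enough, take a look at the source itself. Or find it locally: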

>>> nltk.translate.bleu_score.__file__
'.../lib/python3.4/site-packages/nltk/translate/bleu_score.py'