Question

Suppose you have a toy grammar such as the following (updated so that the output looks more natural):

S -> ${NP} ${VP} | ${S} and ${S} | ${S}, after which ${S}

NP -> the ${N} | the ${A} ${N} | the ${A} ${A} ${N}

VP -> ${V} ${NP}

N -> dog | fish | bird | wizard

V -> kicks | meets | marries

A -> red | striped | spotted

For example: "the dog kicks the red wizard", "the bird meets the spotted fish or the wizard marries the striped dog".

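How can you generate a sentence from this grammar under the constraint that it must contain exactly n Vs + As + Ns in total? Given an integer n, the sentence must contain that many terminals. (Of course, in this grammar the smallest possible n is 3.)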
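To make the constraint concrete, here is a minimal sketch of how n could be measured for a finished sentence. The names `COUNTED` and `counted_words` are hypothetical helpers introduced only for illustration; the sketch assumes that function words such as "the", "and", "after" and "which" do not count toward n.

```python
# Hypothetical helper: the words that count toward n (all Ns, Vs and As).
COUNTED = {
    "dog", "fish", "bird", "wizard",   # N
    "kicks", "meets", "marries",       # V
    "red", "striped", "spotted",       # A
}

def counted_words(sentence):
    """Return how many Ns + Vs + As the sentence contains."""
    # Treat the comma in "..., after which ..." as a separator.
    words = sentence.replace(",", " ").split()
    return sum(1 for word in words if word in COUNTED)

print(counted_words("the dog kicks the red wizard"))
```

The shortest sentences have the shape "the N V the N", which is why n cannot be smaller than 3.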

Answer 1

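The following Python code generates a random sentence with the given number of terminals. It works by counting the number of ways to produce a sentence of a given length, generating a large random number, and computing the sentence that the number indicates.

The count is done recursively, with memoization. An empty right-hand side produces 1 sentence if n is 0 and 0 sentences otherwise. To count the sentences produced by a nonempty right-hand side, sum over i, the number of terminals used by the first symbol of the right-hand side. For each i, multiply the number of possibilities for the rest of the right-hand side by the number of possibilities for the first symbol. If the first symbol is a terminal, there is 1 possibility if i is 1 and 0 otherwise. If the first symbol is a nonterminal, sum the possibilities over each of its alternatives.

To avoid infinite loops, we have to be careful to prune the recursive calls when a quantity is 0. The code can still loop forever if the grammar has infinitely many derivations of a single sentence. For example, in the grammar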

S -> S S
S ->

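there are infinitely many derivations of the empty sentence: S => ; S => S S => ; S => S S => S S S => ; and so on. The code that finds a particular sentence is a straightforward modification of the code that counts them. The code is reasonably efficient: it generates 100 sentences with 100 terminals each in under a second.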

import collections
import random

class Grammar:
    def __init__(self):
        self.prods = collections.defaultdict(list)  # nonterminal -> right-hand sides
        self.numsent = {}  # memo: (rhs, n) -> number of derivations
        self.weight = {}   # terminal -> weight toward n (default 1)

    def prod(self, lhs, *rhs):
        """Add the production lhs -> rhs and invalidate the cached counts."""
        self.prods[lhs].append(rhs)
        self.numsent.clear()

    def countsent(self, rhs, n):
        """Count the derivations of rhs whose terminals have total weight n."""
        if n < 0:
            return 0
        elif not rhs:
            return 1 if n == 0 else 0
        args = (rhs, n)
        if args not in self.numsent:
            sym = rhs[0]
            rest = rhs[1:]
            total = 0
            if sym in self.prods:
                # Nonterminal: let it use i of the n terminals.
                for i in range(1, n + 1):
                    numrest = self.countsent(rest, n - i)
                    # Prune when the rest is impossible; this avoids infinite recursion.
                    if numrest > 0:
                        for rhs1 in self.prods[sym]:
                            total += self.countsent(rhs1, i) * numrest
            else:
                # Terminal: it consumes its weight (1 unless overridden).
                total += self.countsent(rest, n - self.weight.get(sym, 1))
            self.numsent[args] = total
        return self.numsent[args]

    def getsent(self, rhs, n, j):
        """Return derivation number j (0-based) of rhs with total weight n."""
        assert 0 <= j < self.countsent(rhs, n)
        if not rhs:
            return ()
        sym = rhs[0]
        rest = rhs[1:]
        if sym in self.prods:
            # Walk the same cases as countsent, skipping dj derivations at a time.
            for i in range(1, n + 1):
                numrest = self.countsent(rest, n - i)
                if numrest > 0:
                    for rhs1 in self.prods[sym]:
                        dj = self.countsent(rhs1, i) * numrest
                        if dj > j:
                            # Split j into an index for sym and an index for the rest.
                            j1, j2 = divmod(j, numrest)
                            return self.getsent(rhs1, i, j1) + self.getsent(rest, n - i, j2)
                        j -= dj
            assert False
        else:
            return (sym,) + self.getsent(rest, n - self.weight.get(sym, 1), j)

    def randsent(self, sym, n):
        """Return a uniformly random derivation of sym with total weight n."""
        # random.randrange raises ValueError if no such derivation exists.
        return self.getsent((sym,), n, random.randrange(self.countsent((sym,), n)))

if __name__ == '__main__':
    g = Grammar()
    g.prod('S', 'NP', 'VP')
    g.prod('S', 'S', 'and', 'S')
    g.prod('S', 'S', 'after', 'which', 'S')
    g.prod('NP', 'the', 'N')
    g.prod('NP', 'the', 'A', 'N')
    g.prod('NP', 'the', 'A', 'A', 'N')
    g.prod('VP', 'V', 'NP')
    g.prod('N', 'dog')
    g.prod('N', 'fish')
    g.prod('N', 'bird')
    g.prod('N', 'wizard')
    g.prod('V', 'kicks')
    g.prod('V', 'meets')
    g.prod('V', 'marries')
    g.prod('A', 'red')
    g.prod('A', 'striped')
    g.prod('A', 'spotted')
    # Function words do not count toward n.
    g.weight.update({'and': 0, 'after': 0, 'which': 0, 'the': 0})
    for i in range(100):
        print(' '.join(g.randsent('S', 3)))
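The step from "a large random number" to "the indicated sentence" may be easier to see in isolation. The sketch below is not part of the answer's code: it hard-codes the only sentence shape possible for n = 3, "the N V the N", and decodes an index with `divmod` in the same spirit as `getsent`. The function name `unrank` and the word lists are assumptions made for illustration.

```python
NOUNS = ["dog", "fish", "bird", "wizard"]
VERBS = ["kicks", "meets", "marries"]

def unrank(j):
    """Map an index 0 <= j < 48 to one sentence of the form 'the N V the N'."""
    total = len(NOUNS) * len(VERBS) * len(NOUNS)  # 4 * 3 * 4 = 48
    if not 0 <= j < total:
        raise IndexError(j)
    # Peel off the subject first: the quotient picks the subject, the
    # remainder indexes the rest (compare divmod(j, numrest) in getsent).
    subj, j = divmod(j, len(VERBS) * len(NOUNS))
    verb, obj = divmod(j, len(NOUNS))
    return f"the {NOUNS[subj]} {VERBS[verb]} the {NOUNS[obj]}"

print(unrank(0))
print(unrank(47))
```

Because every index maps to a different sentence, drawing j uniformly from the count gives a uniform choice among the derivations.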

Answer 2

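Perhaps not the best solution, but I would recursively work my way through each rule of the grammar until I exceed the constraint, then pop back and explore another path through the grammar. Keep every sentence that meets the constraint and throw out every sentence that does not.

For example, with n = 3: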

S -> (${NP} ${VP}) -> ((${N}) ${VP}) -> (((dog) ${VP})) -> ... -> (((dog) ((kicks) (${NP})))) -> (((dog) ((kicks) ((dog)))))

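Then pop back:

(((dog) ((kicks) (${N})))) -> (((dog) ((kicks) ((fish)))))

A while later...

(((dog) (${V} ${N}))) -> (((dog) ((meets) ${N}))) -> (((dog) ((meets) (dog))))

and so on.

Essentially this is a depth-first graph search, except that you build the graph as you search it (and you stop building the parts that exceed the constraint).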
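The search described above might be sketched as follows. This is one possible reading, not the answerer's code. The grammar is encoded as in Answer 1 (function words are free, and the comma before "after which" is dropped), and the table `MIN_YIELD` is an added assumption: without a lower bound on what each pending nonterminal must still produce, the left-recursive rule S -> S and S would keep expanding without ever exceeding the constraint.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("S", "and", "S"), ("S", "after", "which", "S")],
    "NP": [("the", "N"), ("the", "A", "N"), ("the", "A", "A", "N")],
    "VP": [("V", "NP")],
    "N": [("dog",), ("fish",), ("bird",), ("wizard",)],
    "V": [("kicks",), ("meets",), ("marries",)],
    "A": [("red",), ("striped",), ("spotted",)],
}
FREE = {"and", "after", "which", "the"}  # words that do not count toward n
# Assumption: the fewest counted words each nonterminal can yield.
MIN_YIELD = {"S": 3, "NP": 1, "VP": 2, "N": 1, "V": 1, "A": 1}

def lower_bound(form):
    """Fewest counted words any completion of this sentential form can have."""
    total = 0
    for sym in form:
        if sym in GRAMMAR:
            total += MIN_YIELD[sym]
        elif sym not in FREE:
            total += 1
    return total

def sentences(n, form=("S",)):
    """Yield, depth first, every derivation with exactly n counted words."""
    if lower_bound(form) > n:
        return  # constraint exceeded: pop back and try another path
    for pos, sym in enumerate(form):
        if sym in GRAMMAR:
            # Expand the leftmost nonterminal with each alternative in turn.
            for rhs in GRAMMAR[sym]:
                yield from sentences(n, form[:pos] + rhs + form[pos + 1:])
            return
    # Only terminals remain; keep the sentence if it meets the constraint.
    if lower_bound(form) == n:
        yield " ".join(form)

for sentence in list(sentences(3))[:3]:
    print(sentence)
```

Unlike the counting approach, this enumerates every candidate, so it is only practical for small n. For larger n the same sentence can also appear more than once, because a string such as "S and S and S" has several derivations.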

Answer 3

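This question contains a category error. The grammar you have specified has the appearance of a context-free grammar, but the requirement that there be a specific number of terminal nodes calls for a recursively enumerable grammar.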

Producing sentences from a grammar with a given number of terminals
