Question

I recently tried computing Catalan numbers in Python, using two different approaches:

import numpy as np

dp = np.zeros(160)
dp[0] = 1
for i in range(1, 100):
    for j in range(i):
        dp[i] += dp[j] * dp[i - j - 1]

and

import math

n = int(raw_input())
catalan_n = int(math.factorial(n * 2) / math.factorial(n) / math.factorial(n + 1))

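According to the wiki, these should produce the same answer, but on my computer they give different results once n is around 31 or larger.

For example, with n = 31 the first implementation yields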

14544636039226908

while the second yields

14544636039226909

The difference grows as n gets larger.

So what is the cause, and how can I make both implementations give the same (and correct) result?

Answer 1

Avoid numpy, avoid floating point, and simply let Python use its native integers:

dp = [0] * 160
dp[0] = 1
for i in range(1, 100):
    for j in range(i):
        dp[i] += dp[j] * dp[i - j - 1]

You get the desired result:

>>> dp[31]
14544636039226909
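
The same loop can be wrapped in a function and checked against the factorial formula from the question. This is only a minimal sketch; the names catalan_dp and catalan_closed are made up for illustration and do not come from the original post.

```python
import math


def catalan_dp(n):
    """Return the n-th Catalan number using the recurrence, with plain Python ints."""
    dp = [0] * (n + 1)  # a list of Python ints, not a float array
    dp[0] = 1
    for i in range(1, n + 1):
        for j in range(i):
            dp[i] += dp[j] * dp[i - j - 1]
    return dp[n]


def catalan_closed(n):
    """Return the n-th Catalan number as (2n)! / (n! * (n+1)!), using floor division."""
    return math.factorial(2 * n) // (math.factorial(n) * math.factorial(n + 1))


# The two approaches agree exactly for every n checked here.
for n in range(100):
    assert catalan_dp(n) == catalan_closed(n)

print(catalan_dp(31))  # 14544636039226909, the value shown above
```

Because both functions stay in integer arithmetic, the comparison is exact rather than approximate.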

Answer 2

Use integer division (//; it works in both Python 2.x and Python 3.x, see PEP-0238):

>>> n = 31
>>> math.factorial(n * 2) // math.factorial(n) // math.factorial(n + 1)
14544636039226909

As long as you stick to integer arithmetic, Python can handle integers of arbitrary length.

I tested this in Python 3.4:

>>> n = 300
>>> math.factorial(n * 2) // math.factorial(n) // math.factorial(n + 1)
448863594671741755862042783981742625904431712455792292112842929523169934910317996551330498997589600726489482164006103817421596314821101
633539230654646302151568026806610883615856

Larger still:

>>> n = 3000
>>> math.factorial(n * 2) // math.factorial(n) // math.factorial(n + 1)
519462652919542881721365123011179975310102937604940266719385892606880110765316718891395071497514229126429925976055679251223128074749037
835401036449153787085998615080079472024510673995437465556202913988662201476481724554419588352460788248600870845757882846138810676725538
563107883030181266599172195406194674262178494218158106628185084640318133660880669410879631422165901582338980573378926964500556169385404
736100270128669761789892432503454091737948987203916800528049625631943853069946630768308689117691085645832918187925556506072761147675438
429882843604702193420613753732662694259398583327509305925877958076192508779774600671550059625449220766972323426048569573870742646138682
330665271970741737026351041002094725570021658043868050133870464978010862336227347394228402203592519509440711956260056901367528427111161
296369965071015622062369906953928825160542499316029260848901981705520546040735573456838161278143205046287274001985209051501791057064860
777924614712880895844889661062906810651227996795699200705689167041491295132678905362506739442596941049468768934515387686685216725429630
569388433843181310525905915079353425197760576036382793301451923253554632457764696533239230792374371551049829770586784317601794822668699
762524880276131689250405042237665587324829345738473826128110671929192283799781962486065016982222602138402014572024398921586637930463872
133232259555872008143437104541075975585105539708870387267774173630656199269799668692949281254988538412342931876350743005256155083395855
293674222742887729441736406441460871100319788599494948199980318713167545334283812660431840713561226653525108082181718879207846399491603
046897066186692086000900551598963656721594748873629207464689206076706897152859647808013130407215834207952366890322422542440601278699142
2249907274578524259056058561900439043252745600

It really does work at any length.

The implementation using math.factorial is also orders of magnitude faster than the iterative version:

p1 = '''
dp = [1] + [0] * n
for i in range(1, 1 + n):
    for j in range(i):
        dp[i] += dp[j] * dp[i - j - 1]
'''
p2 = '''
math.factorial(n * 2) // math.factorial(n) // math.factorial(n + 1)
'''

# benchmarks:
import timeit
>>> timeit.timeit(p1, 'n=300', number=1000)
14.639895505999448
>>> timeit.timeit(p2, 'import math; n=300', number=1000)
0.06054379299166612
>>> timeit.timeit(p1, 'n=3000', number=10)
207.88161920005223
>>> timeit.timeit(p2, 'import math; n=3000', number=10)
0.042887639498803765
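
The difference between / and // that this answer relies on can be seen in a small sketch. It assumes Python 3 semantics, where / between two integers is true division and returns a float; the variable names are illustrative only.

```python
import math

n = 31
numerator = math.factorial(2 * n)

# Floor division keeps everything in exact, arbitrary-precision integers.
exact = numerator // math.factorial(n) // math.factorial(n + 1)

# True division (Python 3 semantics) produces a float,
# which has only a 53-bit significand.
approx = numerator / math.factorial(n) / math.factorial(n + 1)

print(type(exact).__name__, exact)
print(type(approx).__name__, approx)

# An odd integer above 2**53 cannot be stored in a float, so the
# float result can never equal the exact value here.
print(int(approx) == exact)  # False
```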

Answer 3

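The reason is that numpy.zeros uses float as its default element dtype. You didn't show how you retrieve the resulting number in the first version, but I assume you convert it to int; otherwise you would see the result as 1.45446360392e+16 or something similar: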

def your_version(n):
    dp = np.zeros(n + 1)
    dp[0] = 1
    for i in range(1, n + 1):
        for j in range(i):
            dp[i] += dp[j] * dp[i - j - 1]
    return int(dp[n])

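If you tell Numpy to use integers, the rounding error disappears: with np.zeros(n + 1, dtype=np.uint64) the result is correct. Keep in mind that uint64 is a fixed-width type, so this only holds while the values fit in 64 bits.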
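
The rounding can be reproduced without Numpy, since a Python float is the same 64-bit double that np.zeros uses by default. The sketch below only illustrates the representation limit for the n = 31 value from the question; it does not replay the full loop.

```python
exact = 14544636039226909  # correct value for n = 31 (from the question)

# A 64-bit float has a 53-bit significand, so integers are only
# guaranteed to be exact up to 2**53.
limit = 2 ** 53
print(limit)          # 9007199254740992
print(exact > limit)  # True

# Between 2**53 and 2**54 only even integers are representable,
# so the odd exact value is rounded to a neighbouring even one.
as_float = float(exact)
print(int(as_float))  # 14544636039226908, the value the float version reported
```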

By the way, @dlask is right to suggest avoiding Numpy. You can find memoized versions of the same formulas on Rosetta Code, and they are all faster than the Numpy version:

# The recurrent formula, memoized
In [9]: %timeit catalan.catalan(31)
The slowest run took 7.65 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 249 ns per loop

# The factorial formula, memoized
In [10]: %timeit catalan.cat_direct(31)
The slowest run took 10.21 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.19 µs per loop

# The recurrent formula with Numpy
In [11]: %timeit catalan.your_version(31)
1000 loops, best of 3: 259 µs per loop

# The factorial version without memoization
In [12]: %timeit catalan.your_other_version(31)
100000 loops, best of 3: 6.98 µs per loop
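
For reference, a memoized version of the recurrence might look like the sketch below. It is not the Rosetta Code listing timed above; it uses functools.lru_cache as a stand-in for memoization, and the name catalan_memo is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def catalan_memo(n):
    """n-th Catalan number via the recurrence, caching every intermediate result."""
    if n == 0:
        return 1
    return sum(catalan_memo(j) * catalan_memo(n - j - 1) for j in range(n))


print(catalan_memo(31))  # 14544636039226909
```

Repeated calls with the same argument are answered from the cache, which is consistent with the %timeit warning above that an intermediate result may be cached.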

Answer 4

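I'm fairly sure this is just a division problem, like 5/2 giving you 2. Changing it to float won't fix it either, because a float can only hold a limited number of digits (maybe 10? I don't remember).

You could try computing the whole numerator first, then the whole denominator, and only then dividing. That way you can also check the remainder with %. I don't remember exactly how Catalan numbers work, but that should fix it for you. I hope this helps.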
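
The suggestion above (whole numerator, whole denominator, then one division with a remainder check) could be sketched as follows. The variable names are illustrative, and divmod is used here to get the quotient and the % remainder in one step.

```python
import math

n = 31
numerator = math.factorial(2 * n)
denominator = math.factorial(n) * math.factorial(n + 1)

# divmod returns the integer quotient and the remainder together.
quotient, remainder = divmod(numerator, denominator)

print(remainder)  # 0, so the division is exact
print(quotient)   # 14544636039226909
```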
