Question

I'm writing a multi-precision library in C++, using a base of 2^64, and currently I'm working on the mod operation. I'm using Algorithm D described in Donald E. Knuth's 1998 edition of "The Art Of Computer Programming" Vol. 2, Section 4.3.1, for division, which yields a quotient and a remainder. For the mod operation, I'm performing a division, throwing away the quotient in the end. Although Knuth's Algorithm D is very fast if implemented in C++ with some ASM enhancements for the partial division and the concurrent multi-precision multiplication/subtraction in each step, I'm not sure if there is a better way, since throwing away a painstakingly computed result doesn't seem efficient to me.

Unfortunately, it's not possible to get rid of the partial division in Algorithm D, because the partial quotient is required to compute the remainder, by subtracting the product of the partial quotient and the divisor from the dividend iteratively.

I've searched the Internet for alternative solutions, and found the influential papers written by Paul Barrett and Peter L. Montgomery. However, the fancy tricks they use seem to pay off only if lots of mod operations are performed in a row with the same modulus, since they involve heavy precomputations. This is the case in complex operations like modular exponentiation, where the mod of several squares and products is required for a single modulus. Barrett starts with the basic definition of the remainder, r = a - b * (a / b), and changes the division to a multiplication with the reciprocal of b. Then he presents an efficient way to compute this multiplication, which pays off if the reciprocal is computed once for several similar computations. Montgomery transforms the operands into a completely different residue system, where modular arithmetic is cheap, but for the price of transformations to and fro.

Additionally, Both algorithms introduce some restrictions, which need to be met for correct operation. Montgomery, for instance, usually requires the operands to be odd, which is the case in RSA calculations with primes, but which cannot be assumed in the general case. Outside these restrictions, even more overhead for normalizations is required.

So what I need, is an efficient one-shot mod function without overhead and special restrictions. Hence my question is: Is it possible to compute a remainder without computing the quotient in the first place, in a way that is more efficient than division?

Answer 1

一个建议是编写一个简单的函数，该函数将计算A%B=C并将A，B和C的值存储到数组中，然后存储所有结果成向量。然后将它们打印出来以查看所有输入和输出值的关系。

可以做一件事来简化某些工作，即了解mod函数的某些属性。这两个语句将帮助您使用该功能。

 0 mod N = 0
 N mod 0 = undefined

自0 mod N = 0起，我们可以为A放置一个测试用例，如果它是0，我们可以使用它来填充数组。同样，如果B = 0，我们可以用C填充数组的-1值，以表示未定义，因为您不能执行A mod 0，因为编译将由于失败除以0。

我写了这个函数来做到这一点；然后我针对A中的B和[0,15]进行循环运行。

#include <array>
#include <vector>
#include <iostream>

std::array<int, 3> calculateMod(int A, int B) {
    std::array<int, 3 > res;
    if (A == 0) {       
        res = std::array<int, 3>{ 0, B, 0 };
    }
    else if (B == 0) {
        res = std::array<int, 3>{ A, 0, -1 };
    }
    else {
        res = std::array<int, 3>{ A, B, A%B };
    }
    return res;
}

int main() {
    std::vector<std::array<int, 3>> results;

    int N = 15; 
    for (int A = 0; A <= N; A++) {
        for (int B = 0; B <= N; B++) {
            results.push_back(calculateMod(A, B));
        }
    }

    // Now print out the results in a table form:
    int i = 0; // Index for formatting output
    for (auto& res : results) {
        std::cout << res[0] << " % " << res[1] << " = " << res[2] << '\n';

        // just for formatting output data to make it easier to read.
        i++;
        if ( i > N ) {
            std::cout << '\n';
            i = 0;
        }
    }
    return 0;
}

这是输出：

0 % 0 = 0
0 % 1 = 0
0 % 2 = 0
0 % 3 = 0
0 % 4 = 0
0 % 5 = 0
0 % 6 = 0
0 % 7 = 0
0 % 8 = 0
0 % 9 = 0
0 % 10 = 0
0 % 11 = 0
0 % 12 = 0
0 % 13 = 0
0 % 14 = 0
0 % 15 = 0

1 % 0 = -1
1 % 1 = 0
1 % 2 = 1
1 % 3 = 1
1 % 4 = 1
1 % 5 = 1
1 % 6 = 1
1 % 7 = 1
1 % 8 = 1
1 % 9 = 1
1 % 10 = 1
1 % 11 = 1
1 % 12 = 1
1 % 13 = 1
1 % 14 = 1
1 % 15 = 1

2 % 0 = -1
2 % 1 = 0
2 % 2 = 0
2 % 3 = 2
2 % 4 = 2
2 % 5 = 2
2 % 6 = 2
2 % 7 = 2
2 % 8 = 2
2 % 9 = 2
2 % 10 = 2
2 % 11 = 2
2 % 12 = 2
2 % 13 = 2
2 % 14 = 2
2 % 15 = 2

3 % 0 = -1
3 % 1 = 0
3 % 2 = 1
3 % 3 = 0
3 % 4 = 3
3 % 5 = 3
3 % 6 = 3
3 % 7 = 3
3 % 8 = 3
3 % 9 = 3
3 % 10 = 3
3 % 11 = 3
3 % 12 = 3
3 % 13 = 3
3 % 14 = 3
3 % 15 = 3

4 % 0 = -1
4 % 1 = 0
4 % 2 = 0
4 % 3 = 1
4 % 4 = 0
4 % 5 = 4
4 % 6 = 4
4 % 7 = 4
4 % 8 = 4
4 % 9 = 4
4 % 10 = 4
4 % 11 = 4
4 % 12 = 4
4 % 13 = 4
4 % 14 = 4
4 % 15 = 4

5 % 0 = -1
5 % 1 = 0
5 % 2 = 1
5 % 3 = 2
5 % 4 = 1
5 % 5 = 0
5 % 6 = 5
5 % 7 = 5
5 % 8 = 5
5 % 9 = 5
5 % 10 = 5
5 % 11 = 5
5 % 12 = 5
5 % 13 = 5
5 % 14 = 5
5 % 15 = 5

6 % 0 = -1
6 % 1 = 0
6 % 2 = 0
6 % 3 = 0
6 % 4 = 2
6 % 5 = 1
6 % 6 = 0
6 % 7 = 6
6 % 8 = 6
6 % 9 = 6
6 % 10 = 6
6 % 11 = 6
6 % 12 = 6
6 % 13 = 6
6 % 14 = 6
6 % 15 = 6

7 % 0 = -1
7 % 1 = 0
7 % 2 = 1
7 % 3 = 1
7 % 4 = 3
7 % 5 = 2
7 % 6 = 1
7 % 7 = 0
7 % 8 = 7
7 % 9 = 7
7 % 10 = 7
7 % 11 = 7
7 % 12 = 7
7 % 13 = 7
7 % 14 = 7
7 % 15 = 7

8 % 0 = -1
8 % 1 = 0
8 % 2 = 0
8 % 3 = 2
8 % 4 = 0
8 % 5 = 3
8 % 6 = 2
8 % 7 = 1
8 % 8 = 0
8 % 9 = 8
8 % 10 = 8
8 % 11 = 8
8 % 12 = 8
8 % 13 = 8
8 % 14 = 8
8 % 15 = 8

9 % 0 = -1
9 % 1 = 0
9 % 2 = 1
9 % 3 = 0
9 % 4 = 1
9 % 5 = 4
9 % 6 = 3
9 % 7 = 2
9 % 8 = 1
9 % 9 = 0
9 % 10 = 9
9 % 11 = 9
9 % 12 = 9
9 % 13 = 9
9 % 14 = 9
9 % 15 = 9

10 % 0 = -1
10 % 1 = 0
10 % 2 = 0
10 % 3 = 1
10 % 4 = 2
10 % 5 = 0
10 % 6 = 4
10 % 7 = 3
10 % 8 = 2
10 % 9 = 1
10 % 10 = 0
10 % 11 = 10
10 % 12 = 10
10 % 13 = 10
10 % 14 = 10
10 % 15 = 10

11 % 0 = -1
11 % 1 = 0
11 % 2 = 1
11 % 3 = 2
11 % 4 = 3
11 % 5 = 1
11 % 6 = 5
11 % 7 = 4
11 % 8 = 3
11 % 9 = 2
11 % 10 = 1
11 % 11 = 0
11 % 12 = 11
11 % 13 = 11
11 % 14 = 11
11 % 15 = 11

12 % 0 = -1
12 % 1 = 0
12 % 2 = 0
12 % 3 = 0
12 % 4 = 0
12 % 5 = 2
12 % 6 = 0
12 % 7 = 5
12 % 8 = 4
12 % 9 = 3
12 % 10 = 2
12 % 11 = 1
12 % 12 = 0
12 % 13 = 12
12 % 14 = 12
12 % 15 = 12

13 % 0 = -1
13 % 1 = 0
13 % 2 = 1
13 % 3 = 1
13 % 4 = 1
13 % 5 = 3
13 % 6 = 1
13 % 7 = 6
13 % 8 = 5
13 % 9 = 4
13 % 10 = 3
13 % 11 = 2
13 % 12 = 1
13 % 13 = 0
13 % 14 = 13
13 % 15 = 13

14 % 0 = -1
14 % 1 = 0
14 % 2 = 0
14 % 3 = 2
14 % 4 = 2
14 % 5 = 4
14 % 6 = 2
14 % 7 = 0
14 % 8 = 6
14 % 9 = 5
14 % 10 = 4
14 % 11 = 3
14 % 12 = 2
14 % 13 = 1
14 % 14 = 0
14 % 15 = 14

15 % 0 = -1
15 % 1 = 0
15 % 2 = 1
15 % 3 = 0
15 % 4 = 3
15 % 5 = 0
15 % 6 = 3
15 % 7 = 1
15 % 8 = 7
15 % 9 = 6
15 % 10 = 5
15 % 11 = 4
15 % 12 = 3
15 % 13 = 2
15 % 14 = 1
15 % 15 = 0

从上面的数据中我们可以看到，如果A == B的结果将是0。我们还可以看到，如果B > A，则B == A。最终，我们看到odd的{{1}}和even的{{1}}和A值之间存在模式。如果您能理解这些模式，那么大多数将成为代数运算。从这里开始，下一步将是创建一个算法，该算法将获取所有这些数据并将其转换为等效的二进制数据。

出于某种原因，我选择了上面B < A的值作为N。这是由于在再次重复之前，所有可能的二进制数字组合的二进制表示。我们知道一个字节的数据是8位。我们知道[0,15]中的值将适合其中的一半；例如：

在这15个不同的0和1序列之后，这些模式将重复。因此，通过上面的表格，您可以将它们转换为二进制表示。现在，一旦您以二进制形式检查了binary byte: hex decimal 0000 0000 0x00 0 ... 0000 1111 0xFF 15输入及其A & B输出的表示形式，并了解了我上面提到的结果的3个属性；您应该能够设计一种算法，以非常容易地快速计算任何C A组合的模数。要记住的一个技巧是要考虑其他三件事。首先是用户B所说的话：

“特别是，幂为2的模可以用按位运算代替。”

接下来的是偶数或奇数值，因为当eerokia时，偶数和奇数情况的确表现出A mod B的不同模式。

我已经为您提供了一些信息工具，但是剩下的我会留给您，包括转换B < A，A和B的任务。值转换为其二进制表示形式。

一旦您了解了C和A输入根据其B输出的二进制模式，并且您了解了逻辑门的真值表-诸如{{1 }}，C，And - &，Or - |，Nand - (!&) Nor - (!|)和Xor - ^以及称赞Xnor - (!^)。您应该能够高效地设计算法。

Is it possible to efficiently compute A % B without having to compute A / B?

1 个答案: