Question

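I am using the softfloat library (http://www.jhauser.us/arithmetic/SoftFloat.html) to implement a single-precision division algorithm, and I am trying to understand the reciprocal approximation function implemented as part of the library (see the code below). Can anyone explain how they came up with the LUT? It looks like a combination of a LUT and a Newton-Raphson (NR) approximation, but a detailed explanation would certainly help.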

/*
  Returns an approximation to the reciprocal of the number represented by `a',
  where `a' is interpreted as an unsigned fixed-point number with one integer
  bit and 31 fraction bits.  The `a' input must be "normalized", meaning that
  its most-significant bit (bit 31) must be 1.  Thus, if A is the value of
  the fixed-point interpretation of `a', then 1 <= A < 2.  The returned value
  is interpreted as a pure unsigned fraction, having no integer bits and 32
  fraction bits.  The approximation returned is never greater than the true
  reciprocal 1/A, and it differs from the true reciprocal by at most 2.006 ulp 
  (units in the last place).
*/

uint32_t softfloat_approxRecip32_1( uint32_t a )
{
    int index;
    uint16_t eps;
    static const uint16_t k0s[] = {
      0xFFC4, 0xF0BE, 0xE363, 0xD76F, 0xCCAD, 0xC2F0, 0xBA16, 0xB201,
      0xAA97, 0xA3C6, 0x9D7A, 0x97A6, 0x923C, 0x8D32, 0x887E, 0x8417
    };
    static const uint16_t k1s[] = {
      0xF0F1, 0xD62C, 0xBFA1, 0xAC77, 0x9C0A, 0x8DDB, 0x8185, 0x76BA,
      0x6D3B, 0x64D4, 0x5D5C, 0x56B1, 0x50B6, 0x4B55, 0x4679, 0x4211
    };

    uint16_t r0;
    uint32_t delta0;
    uint_fast32_t r;
    uint32_t sqrDelta0;

    index = a>>27 & 0xF;
    eps = (uint16_t) (a>>11);
    r0 = k0s[index] - ((k1s[index] * (uint_fast32_t) eps)>>20);
    delta0 = ~(uint_fast32_t) ((r0 * (uint_fast64_t) a)>>7);
    r = ((uint_fast32_t) r0<<16) + ((r0 * (uint_fast64_t) delta0)>>24);
    sqrDelta0 = ((uint_fast64_t) delta0 * delta0)>>32;
    r += ((uint32_t) r * (uint_fast64_t) sqrDelta0)>>48;
    return r;

}

Answer 1

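The initial approximation r0 is computed by piecewise linear approximation using sixteen intervals, from [1, 17/16) to [31/16, 2), selected by the four most significant fraction bits of the 1.31 fixed-point argument. This initial estimate is then refined with the generalized Newton iteration for the reciprocal, $r_{new} = r_{old} + r_{old}(1 - a r_{old}) + r_{old}(1 - a r_{old})^2 + \dots + r_{old}(1 - a r_{old})^k$ [see paper by Liddicoat and Flynn]. delta0 is $\delta_0 = 1 - a r_0$. Keeping the first three terms of the expansion gives $r = r_0 + r_0\delta_0 + r_0\delta_0^2$, an iteration with cubic convergence that triples the number of correct bits in each iteration. Note that the code evaluates the series in factored form: it first computes $r_1 = r_0 + r_0\delta_0$ and then adds $r_1\delta_0^2$ (sqrDelta0 holds $\delta_0^2$), so the result is $r_0(1 + \delta_0)(1 + \delta_0^2) = r_0(1 + \delta_0 + \delta_0^2 + \delta_0^3)$, which includes the fourth term as well. The error left by the series is therefore of order $\delta_0^4$, and the final accuracy is limited mainly by truncation in the fixed-point arithmetic. In this implementation, the worst-case relative error in r0 is about 9.44e-4, while the worst-case relative error in the final result r is about 9.32e-10.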

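The scaling factors in the fixed-point computations are chosen to maximize the accuracy of intermediate computations (by retaining as many bits as possible) and to make the binary point fall conveniently on a word boundary, as in the computation of delta0, where the 1 in 1 - a * r0 can be omitted: since a * r0 is just below 1, the discarded high-order bits of (r0 * a) >> 7 are all ones, and the ones' complement of the low 32 bits yields 1 - a * r0 scaled by 2^32 · 2^8 = 2^40, to within one unit.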

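The code requires delta0 to be positive, so r0 must always underestimate the mathematical result 1/a. The linear approximation for each interval therefore cannot be a minimax approximation. Instead, the slope between the values of 1/a at the two endpoints of each interval is computed, and its absolute value, scaled by $2^{16}$, is stored in k1s, meaning the array elements are 0.16 fixed-point numbers. Starting from the function value at the midpoint of each interval, the slope is then applied to find the intercept at the left endpoint of the interval. This value is likewise scaled by $2^{16}$ and stored in k0s, so it also holds 0.16 fixed-point numbers. In the code, eps holds the 16 bits of a that follow the four index bits, i.e. the offset of A from the left endpoint scaled so that a whole interval corresponds to $2^{16}$; (k1s[index] * eps) >> 20 is then the slope times that offset in 0.16 format, and r0 = k0s[index] - ((k1s[index] * eps) >> 20) evaluates the line.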

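From my analysis, the float-to-fixed conversion of the k0s entries truncates after subtracting 0.9999, which (unless the fractional part is at least 0.9999) gives one unit less than plain truncation, i.e. round toward zero, and so keeps r0 safely below 1/a. The conversion of the k1s entries rounds toward positive infinity, which makes each slope slightly steeper. The following program implements the algorithm described above and generates table entries identical to those used in the code in the question.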

#include <stdlib.h>
#include <stdio.h>

int main (void)
{
    printf ("interval  k0    k1\n");
    for (int i = 0; i < 16; i++) {
        double x0 = 1.0+i/16.0;       // left endpoint of interval
        double x1 = 1.0+(i+1)/16.0;   // right endpoint of interval
        double f0 = 1.0 / x0;
        double f1 = 1.0 / x1;
        double df = f0 - f1;
        double sl = df * 16.0;        // slope across interval
        double mp = (x0 + x1) / 2.0;  // midpoint of interval
        double fm = 1.0 / mp;
        double ic = fm + df / 2.0;    // intercept at start of interval

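        // k0: truncate after subtracting 0.9999 (one unit below round toward
        // zero), so r0 stays under 1/a; k1: round toward +infinity
        printf ("%5d     %04x  %04x\n",
                i, (int)(ic * 65536.0 - 0.9999), (int)(sl * 65536.0 + 0.9999));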
    }
    return EXIT_SUCCESS;
}

The output of the above program should be as follows:

interval  k0    k1
    0     ffc4  f0f1
    1     f0be  d62c
    2     e363  bfa1
    3     d76f  ac77
    4     ccad  9c0a
    5     c2f0  8ddb
    6     ba16  8185
    7     b201  76ba
    8     aa97  6d3b
    9     a3c6  64d4
   10     9d7a  5d5c
   11     97a6  56b1
   12     923c  50b6
   13     8d32  4b55
   14     887e  4679
   15     8417  4211

Floating-point reciprocal approximation using the softfloat library