
Question

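I am looking for the fastest way to convert a stream of integers into a list that counts the consecutive ones and zeros.

For example, the integers [4294967295,4194303,3758096384]

are, at the bit level: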

11111111111111111111111111111111
11111111111111111111110000000000
00000000000000000000000000000111

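(each bit string is written in little-endian order, i.e. least significant bit first)

So the program should output three values: [54 39 3]. There are 54 ones, followed by 39 zeros, and finally 3 ones.

I have been looking at these algorithms: http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightLinear

I will probably need to write something along these lines: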

i=(the first bit of the first integer)
repeat till the end
    find the number of consecutive i's in this integer
    if we reach the end of the integer, continue with the next
    else i = (not)i

But I was wondering whether anyone can come up with a better way to do this.

At the moment the function is built in Matlab, as follows:

%get all bits in a long vector
data = uint32([4294967295,4194303,3758096384]);
logi = false([1,length(data)*32]);
for ct = 1:length(data)
    logi(1+32*(ct-1):ct*32)=bitget(data(1+(ct-1)),1:32);
end
%count consecutive 1s and 0s
Lct=1;
L=1;i = logi(1);
for ct = 2:length(logi)
    if logi(ct)==i
        L(Lct)=L(Lct)+1;
    else
        i=logi(ct);
        Lct=Lct+1;
        L(Lct)=1;
    end
end

>> L = 54    39     3

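Note: it took me some time to get the question right, hence the comments about the language and the exact nature of the problem. Hopefully (after a number of edits) the question is now in a form where it can be found, and the answers can be useful to others as well.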

Answer 1

Earlier I misunderstood the question. Now I understand what you are asking. This should work; I have tested it:

#include <cstdint>
#include <cstdio>
#include <deque>

using namespace std;

// Run lengths are stored as uint32_t: a uint8_t counter would wrap around
// for any run longer than 255 bits.

//old version for whole collection
void ConsecutiveOnesAndZeros(const deque<uint32_t> &values, deque<uint32_t> &outCount)
{
    if (values.empty()) return;
    uint32_t count = 0;
    uint8_t lastBit = (values[0] & 1);
    for (uint32_t value : values)
    {
        if (value == 0 && lastBit == 0)
        {
            // shortcut: an all-zero word only extends the current run of zeros
            count += 32;
            continue;
        }
        for (int i = 0; i < 32; i++)
        {
            if (lastBit != uint8_t((value >> i) & 1))
            {
                outCount.push_back(count);
                count = 0;
                lastBit = !lastBit;
            }
            count++;
        }
    }
    outCount.push_back(count);
}

//stream version for receiving integer
void ConsecutiveOnesAndZeros(uint32_t value, uint32_t &count, uint8_t &lastBit, deque<uint32_t> &outCount)
{
    if (value == 0 && lastBit == 0)
    {
        // shortcut: an all-zero word only extends the current run of zeros
        count += 32;
        return;
    }
    for (int i = 0; i < 32; i++)
    {
        if (lastBit != uint8_t((value >> i) & 1))
        {
            if (count) outCount.push_back(count);
            count = 0;
            lastBit = !lastBit;
        }
        count++;
    }
}

int main()
{
    deque<uint32_t> outCount;
    deque<uint32_t> stream = { 4294967295u,4194303u,3758096384u };

    ConsecutiveOnesAndZeros(stream, outCount);
    for (auto res : outCount) {
        printf("%u,", static_cast<unsigned>(res));
    }
    printf("\n");

    // feed the same values one at a time, as they would arrive from a stream
    uint32_t count = 0;
    uint8_t bit = 0;
    outCount.clear();
    for (auto val : stream)
        ConsecutiveOnesAndZeros(val, count, bit, outCount);
    if (count) outCount.push_back(count);

    for (auto res : outCount) {
        printf("%u,", static_cast<unsigned>(res));
    }
    printf("\n");
}

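Update: I added a small optimization that checks whether the value is zero. I also split ConsecutiveOnesAndZeros into two functions, so that the second one can be fed the next integer as it is received from the stream.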

Answer 2

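Well, you could try to speed this up by splitting the first part across threads.

For example, if you have a function like the one you described, you can launch several instances of it as std::thread or std::future, depending on how you want to approach it. When they finish, you compare the two boundary bits (the one at the end of the previous chunk and the one at the start of the next chunk). If they are equal, you add the first count of the next result to the last count of the previous result; otherwise you append that first count as a new entry. All other parts of the next result are appended to the previous one without any comparison.

If your input is short, this is of course overkill.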
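
The chunk-and-merge idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names ChunkRuns, encodeChunk, appendChunk and encodeParallel are hypothetical, words are 32 bits wide and read from the least significant bit, and std::async is used as one possible way to run the chunks concurrently. On some toolchains the program has to be linked with thread support (for example -pthread).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical result type for one chunk: alternating run lengths,
// plus the bit value of the first run.
struct ChunkRuns {
    int firstBit = 0;
    std::vector<uint32_t> runs;
};

// Run-length encode words[begin, end), reading each word LSB first.
ChunkRuns encodeChunk(const std::vector<uint32_t>& words, std::size_t begin, std::size_t end) {
    ChunkRuns out;
    if (begin >= end) return out;
    int last = static_cast<int>(words[begin] & 1u);
    out.firstBit = last;
    uint32_t count = 0;
    for (std::size_t w = begin; w < end; ++w) {
        for (int i = 0; i < 32; ++i) {
            int bit = static_cast<int>((words[w] >> i) & 1u);
            if (bit != last) {          // run ended: store it and start a new one
                out.runs.push_back(count);
                count = 0;
                last = bit;
            }
            ++count;
        }
    }
    out.runs.push_back(count);
    return out;
}

// Append the runs of `next` to `acc`, comparing only the two boundary bits.
void appendChunk(ChunkRuns& acc, const ChunkRuns& next) {
    if (next.runs.empty()) return;
    if (acc.runs.empty()) { acc = next; return; }
    // Runs alternate, so the bit of the last run follows from the run count.
    int accLastBit = (acc.runs.size() % 2 == 1) ? acc.firstBit : 1 - acc.firstBit;
    std::size_t start = 0;
    if (accLastBit == next.firstBit) {  // same bit on both sides: merge the runs
        acc.runs.back() += next.runs[0];
        start = 1;
    }
    acc.runs.insert(acc.runs.end(), next.runs.begin() + start, next.runs.end());
}

// Split the input into at most `chunks` pieces, encode them concurrently,
// then merge the partial results in their original order.
std::vector<uint32_t> encodeParallel(const std::vector<uint32_t>& words, std::size_t chunks) {
    if (chunks == 0) chunks = 1;
    std::size_t per = (words.size() + chunks - 1) / chunks;
    std::vector<std::future<ChunkRuns>> jobs;
    for (std::size_t b = 0; b < words.size(); b += per) {
        std::size_t e = std::min(b + per, words.size());
        jobs.push_back(std::async(std::launch::async,
                                  [&words, b, e] { return encodeChunk(words, b, e); }));
    }
    ChunkRuns acc;
    for (auto& job : jobs) appendChunk(acc, job.get());
    return acc.runs;
}
```

Merging only ever touches the first run of each chunk, so the sequential part stays small compared with the per-chunk scanning.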

Answer 3

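First of all, let me say that your sample numbers are wrong: the second bit pattern has its most significant bit set, so it should be at least 2147483648 (2^31), but it is only 4194303, and the third one should be 7, so I guess you reversed the bit positions when converting them to decimal. See the comment at the beginning of main() in the full code at the end of this answer for how the numbers were determined. The numbers that correspond to the bit patterns as they appear in your example are (hex / dec):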

[0xffffffff/4294967295][0xfffffc00/4294966272][0x00000007/7]

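(If we put the more significant digits on the left in decimal, why shouldn't we do the same in binary?)

To solve your problem, consider that when a number has n consecutive ones in its least significant part and you add 1 to it, all of those consecutive ones switch to zeros (through carry propagation) and the next zero becomes a one. Likewise, if you have n consecutive zeros and decrement the value, all of those zeros become ones... well, plus one bit more, because the borrow cascades one bit further. The idea is to check which bit we have in the LSB and, depending on it, increment or decrement the value, and then XOR the result with the original value. What you get is a number that has as many one bits in its low part as there were consecutive bits equal to the LSB, plus one. For example: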

 1100100011111111

Since the LSB is 1, we increment it:

 1100100100000000
        ^^^^^^^^^ changed bits.

If we now XOR this value with the previous one:

 0000000111111111  => 9 "1" bits, that indicate that 8 "1" consecutive bits were present
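
The increment-and-XOR step from this worked example can be sketched in C++ as follows. The function name changedBits is hypothetical and the sketch assumes 32-bit words; it only reproduces the step shown above, not the complete solution.

```cpp
#include <cstdint>

// Hypothetical helper: returns the mask of bits that change when the value is
// incremented (LSB == 1) or decremented (LSB == 0).
// The mask has one more 1 bit than the length of the run at the low end.
// For 0 and 0xffffffff the arithmetic wraps around, so those two values
// need separate handling by the caller.
uint32_t changedBits(uint32_t value) {
    uint32_t next = (value & 1u) ? value + 1u : value - 1u;
    return value ^ next;
}
```

For the example value 1100100011111111 (0xC8FF) this yields 0x01FF, i.e. nine one bits for a run of eight ones.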

If we prepare a switch statement with all the possible values this operation can produce, we can obtain the result in a very efficient way:

 int get_consecutive_bits(unsigned value)
 {
     unsigned next = value;
     switch (value) {
     case 0: case ~0u: return 32; /* these are special cases, see below */
     }
     switch (value & 1) { /* get the lower bit */
     case 0: next--; break; /* decrement */
     case 1: next++; break; /* increment */
     }
     switch (value ^ next) { /* make the xor */
     case 0x00000003: return 1;
     case 0x00000007: return 2;
     case 0x0000000f: return 3;
     case 0x0000001f: return 4;
     case 0x0000003f: return 5;
     case 0x0000007f: return 6;
     /* ... */
     case 0xffffffff: return 31;
     } /* switch */
 }

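Now you have to accumulate that value, in case the next array cell begins with the same bit value the previous one ended with. The reason there is no case for 0x00000001 is that we force a carry into the second bit, so we always get a value of 0x00000003 or more, with at least two bits changed (...0000001 => ...0000010 => ...0000011 and ...11111110 => ...11111101 => ...00000011). This also means that for the values 0000...0000 and 1111...1111 we would get a value one bit longer than the word size, which makes these values special (the carry goes into the bit after the MSB, the 33rd bit), so we check for them first.

This is a very efficient way of doing the task in chunks of one array cell. When the value you get includes the MSB, you have to accumulate, because the next word may start with the same bit you ended with.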

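The following code should illustrate the algorithm. Each run is printed as [length/bit value]; for the corrected sample values the program output is:

[3/1][39/0][54/1]

Note:

Since we need to find out how far the carry propagates in a single increment, we have to work from the least significant bit towards the most significant one, which makes the output come out in the reverse order of the one you were trying to get, but I'm sure you will be able to change the order so that it appears as stated in your question.

pru_49297910.c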

/* pru_49297910.c -- answer to https://stackoverflow.com/questions/49297910/
 * Author: Luis Colorado <luiscoloradourcola@gmail.com>
 * Date: Wed Apr 24 11:12:21 EEST 2019
 * Copyright: (C) Luis Colorado.  All rights reserved.
 * License: BSD.  Open source.
 */

#include <cassert>
#include <iostream>

#define BITS_PER_ELEMENT    32

int get_consecutive_bits(unsigned value)
{
    switch (value) {
    case 0: case ~0u: /* these are special cases, see below */
            return BITS_PER_ELEMENT;
    }
    unsigned next = value;
    switch (value & 1) { /* get the lower bit */
    case 0: next--; break; /* decrement */
    case 1: next++; break; /* increment */
    }
    switch (value ^ next) { /* make the xor */
    case 0x00000003: return 1;      case 0x00000007: return 2;
    case 0x0000000f: return 3;      case 0x0000001f: return 4;
    case 0x0000003f: return 5;      case 0x0000007f: return 6;
    case 0x000000ff: return 7;      case 0x000001ff: return 8;
    case 0x000003ff: return 9;      case 0x000007ff: return 10;
    case 0x00000fff: return 11;     case 0x00001fff: return 12;
    case 0x00003fff: return 13;     case 0x00007fff: return 14;
    case 0x0000ffff: return 15;     case 0x0001ffff: return 16;
    case 0x0003ffff: return 17;     case 0x0007ffff: return 18;
    case 0x000fffff: return 19; case 0x001fffff: return 20;
    case 0x003fffff: return 21; case 0x007fffff: return 22;
    case 0x00ffffff: return 23; case 0x01ffffff: return 24;
    case 0x03ffffff: return 25; case 0x07ffffff: return 26;
    case 0x0fffffff: return 27; case 0x1fffffff: return 28;
    case 0x3fffffff: return 29; case 0x7fffffff: return 30;
    case 0xffffffff: return 31;
    } /* switch */
    assert(!"Impossible");
    return 0;
}

#define FLUSH() do{                         \
        runlen(accum, state);               \
        state ^= 1;                         \
        accum = 0;                          \
    } while (0)

void run_runlen_encoding(unsigned array[], int n, void (*runlen)(int, unsigned))
{
    int state = 0; /* always begin in 0 */
    int accum = 0; /* accumulated bits */
    while (n--) {
        /* see if we have to change */
        if (state ^ (array[n] & 1)) /* we changed state */
            FLUSH();
        int nb = BITS_PER_ELEMENT; /* number of bits to check */
        unsigned w = array[n]; /* unsigned, so that >> shifts in zeros portably */
        while (nb > 0) {
            int b = get_consecutive_bits(w);
            if (b < nb) {
                accum += b;
                FLUSH();
                w >>= b;
                nb -= b;
            } else {  /* b >= nb, we only accumulate nb */
                accum += nb;
                nb = 0;
            }
        }
    }
    if (accum)
        FLUSH();
} /* run_runlen_encoding */

void output_runlen(int n, unsigned kind)
{
    if (n) { /* don't print for n == 0 */
            static int i = 0;
            std::cout << "[" << n << "/" << kind << "]";
            if (!(++i % 10))
                    std::cout << std::endl;
    }
} /* output_runlen */

int main()
{
     /* 0b1111_1111_1111_1111_1111_1111_1111_1111, 0b1111_1111_1111_1111_1111_1100_0000_0000, 0b0000_0000_0000_0000_0000_0000_0000_0111 */
     /*    0xf____f____f____f____f____f____f____f,    0xf____f____f____f____f____c____0____0,    0x0____0____0____0____0____0____0____7 */
     /*                                0xffffffff,                                0xfffffc00,                                0x00000007 */
    unsigned int array[] =
#if 1
        { 0xffffffff, 0xfffffc00, 0x00000007 }; /* correct values for your example */
#else
            { 4294967295, 4194303, 3758096384 }; /* original values, only first matches. */
#endif
    size_t array_n = sizeof array / sizeof array[0];

    run_runlen_encoding(array, array_n, output_runlen);
    std::cout << std::endl;
} /* main */
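
As a variation on the listing above, here is a minimal sketch that emits the runs in the order asked for in the question. It assumes the question's convention (each 32-bit word is read from its least significant bit, and the array is walked from its first element), so it works on the question's original values. It also swaps the switch table for a bit count via std::bitset, which is a different lookup technique than the one in the listing. The names lowRunLength and encodeRuns are hypothetical.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Length of the run of equal bits at the low end of a 32-bit word.
int lowRunLength(uint32_t value) {
    if (value == 0u || value == 0xffffffffu) return 32;  // whole word is one run
    uint32_t next = (value & 1u) ? value + 1u : value - 1u;
    // value ^ next has (run length + 1) one bits
    return static_cast<int>(std::bitset<32>(value ^ next).count()) - 1;
}

// Run lengths of alternating bit values, first word first, LSB first.
std::vector<uint32_t> encodeRuns(const std::vector<uint32_t>& words) {
    std::vector<uint32_t> runs;
    if (words.empty()) return runs;
    uint32_t state = words[0] & 1u;   // bit value of the run being accumulated
    uint32_t accum = 0;               // length of that run so far
    for (uint32_t w : words) {
        int remaining = 32;           // bits of this word not yet consumed
        while (remaining > 0) {
            if ((w & 1u) != state) {  // bit value changed: close the current run
                runs.push_back(accum);
                accum = 0;
                state ^= 1u;
            }
            int b = lowRunLength(w);
            if (b > remaining) b = remaining;  // ignore zeros shifted in at the top
            accum += static_cast<uint32_t>(b);
            remaining -= b;
            w = (b < 32) ? (w >> b) : 0u;      // shifting by 32 would be undefined
        }
    }
    runs.push_back(accum);
    return runs;
}
```

Calling encodeRuns on {4294967295u, 4194303u, 3758096384u} gives the three run lengths 54, 39 and 3 from the question.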

Find consecutive ones and zeros
