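How to search for "n bits" in a byte array?

Time: 2010-05-31 12:51:25

Tags: c++ c

I have an array of bytes. Now I need to know the number of occurrences of a bit pattern of length N.

For example, my byte array is "00100100 10010010" and the pattern is "001". Here N = 3 and the count is 5.

Dealing with bits has always been my weak point.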

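4 answers:

Answer 0: (score: 7)

You can always XOR the first N bits with the pattern; if the result is 0, you have a match. Then shift the searched bit "stream" one bit to the left and repeat. That assumes you want to count matches where the sub-patterns overlap. Otherwise, shift by the pattern length whenever you get a match.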
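As a rough illustration of this XOR-and-shift idea, here is a minimal C++ sketch. It makes several assumptions that are not in the answer: the pattern fits in 32 bits and is stored right-aligned in an integer, bits are read most significant bit first, and overlapping matches are counted. The name countBitPattern is hypothetical. Instead of shifting the whole stream left, it shifts one new bit at a time into an N-bit window, which amounts to the same comparison.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts overlapping occurrences of the n lowest bits of `pattern`
// in `data`, reading each byte most significant bit first.
// Assumption: 1 <= n <= 32 and `pattern` has no bits set above bit n-1.
std::size_t countBitPattern(const std::vector<unsigned char>& data,
                            std::uint32_t pattern, unsigned n) {
    if (n == 0 || n > 32) return 0;
    const std::uint64_t mask = (std::uint64_t{1} << n) - 1;
    std::uint64_t window = 0;   // the last n bits seen so far
    std::size_t bitsSeen = 0;
    std::size_t count = 0;
    for (unsigned char byte : data) {
        for (int b = 7; b >= 0; --b) {
            // Shift the next bit of the stream into the window.
            window = ((window << 1) | ((byte >> b) & 1u)) & mask;
            ++bitsSeen;
            // XOR yields 0 exactly when window and pattern agree.
            if (bitsSeen >= n && (window ^ pattern) == 0) ++count;
        }
    }
    return count;
}
```

For the example from the question, the call would be countBitPattern({0x24, 0x92}, 0x1, 3). To count only non-overlapping matches, one option is to reset bitsSeen to 0 after each match.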

Answer 1: (score: 1)

If N may be arbitrarily large, you can store the bit pattern in a vector

vector<unsigned char> pattern;

The size of the vector should be

(N + 7) / 8

Store the pattern right-aligned. By this I mean that, for example, if N == 19, your vector should look like this:

|<-    v[0]   ->|<-    v[1]   ->|<-    v[2]   ->|
 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1
|         |<-             pattern             ->|

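If your pattern is originally left-aligned, you can use the function presented further down to shift the bits to the right.

Define a vector of bytes, of the same length as the pattern, to hold the part of the bit stream that is compared with the pattern. I'll call it window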
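To make the right-aligned layout concrete, here is a small sketch that packs a pattern given as a string of '0' and '1' characters into such a vector. The string input format and the name packRightAligned are assumptions made for illustration; the answer does not say how the pattern is obtained.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Packs a bit string (leftmost character = most significant bit)
// into (N + 7) / 8 bytes, right-aligned, so that any unused bits
// are the leftmost bits of the first byte and are left as 0.
std::vector<unsigned char> packRightAligned(const std::string& bits) {
    const std::size_t n = bits.size();
    std::vector<unsigned char> out((n + 7) / 8, 0);
    for (std::size_t i = 0; i < n; ++i) {
        // i is the bit position counted from the right, starting at 0.
        if (bits[n - 1 - i] == '1') {
            out[out.size() - 1 - i / 8] |=
                static_cast<unsigned char>(1u << (i % 8));
        }
    }
    return out;
}
```

Applied to the 19-bit pattern from the diagram, "0110001001001001001", this should produce the three bytes 0x03, 0x12, 0x49.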

vector<unsigned char> window;

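If N is not an integer multiple of 8, you will need to mask some of the leftmost bits in window when comparing it with the pattern. You can define the mask this way:

unsigned char mask = (1 << (N % 8)) - 1;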

Now, assuming window contains the bits it should, you could in theory compare the pattern with window using the vector's operator == like this:

window[0] &= mask;
bool isMatch = (window == pattern);
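The masked comparison can be wrapped in a small helper. The sketch below is one possible reading of the answer, with a hypothetical function name. It takes window by value so the caller's copy is not modified, and it skips the mask when N is a multiple of 8, because (1 << (N % 8)) - 1 would then be 0 and would clear the whole first byte.

```cpp
#include <cstddef>
#include <vector>

// Compares a window against a right-aligned pattern of n bits.
// Assumption: both vectors hold (n + 7) / 8 bytes.
bool windowMatches(std::vector<unsigned char> window,  // by value: masked copy
                   const std::vector<unsigned char>& pattern,
                   std::size_t n) {
    if (window.empty() || window.size() != pattern.size()) return false;
    const unsigned rem = static_cast<unsigned>(n % 8);
    if (rem != 0) {
        // Keep only the rem lowest bits of the first (leftmost) byte.
        const unsigned char mask = static_cast<unsigned char>((1u << rem) - 1);
        window[0] &= mask;
    }
    return window == pattern;
}
```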

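But there are good reasons to be a bit more sophisticated. If N is large and the byte array you are searching is significantly larger, it is worth preprocessing the pattern to build a vector of size N + 1:

vector<int> shifts;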

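This vector stores how many bits to shift the bit stream by before the next comparison, based on the position at which the current window mismatches.

Consider the pattern 0001001100. You should compare its bits with window from right to left. If there is a mismatch at the first bit, you know that the window bit is 1, and the first occurrence of 1 in your pattern is at position 2, counting from 0 and going from right to left. So in that case there is no point in making a comparison if fewer than 2 new bits have been shifted from the bit stream into window. Similarly, if the mismatch occurs at the third bit (position 2 counting from 0), window should be moved by 7, because the 3 consecutive zeros in the pattern are at its left end. If the mismatch is at position 4, you can move window by 8, and so on. The shifts vector holds, at index i, the number of bits by which to move window if the mismatch occurs at position i. If there is a match, window should be moved by the number of bits stored in shifts[N]. In the example above, a match means a shift by 8.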
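The answer does not show how the shifts vector is computed. The following sketch is one interpretation, not the author's code: for a mismatch at position i, the bits at positions 0 to i-1 are known to equal the pattern and the bit at position i is known to be its complement, so the shift is the smallest s for which the pattern, moved by s, agrees with all of those known bits. The pattern is taken as a '0'/'1' string and construction is brute force, both assumptions chosen for readability rather than speed. It reproduces the values quoted above for 0001001100 (2, 7, 8, and 8 for a full match).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Bit of the pattern at position `pos`, counted from the right from 0.
static bool bitFromRight(const std::string& p, std::size_t pos) {
    return p[p.size() - 1 - pos] == '1';
}

// shifts[i]: bits to move the window after a mismatch at position i;
// shifts[N]: bits to move the window after a full match.
std::vector<int> buildShifts(const std::string& pattern) {
    const std::size_t n = pattern.size();
    // Moving by the whole pattern length is always consistent.
    std::vector<int> shifts(n + 1, static_cast<int>(n));
    for (std::size_t i = 0; i <= n; ++i) {
        for (std::size_t s = 1; s < n; ++s) {
            bool consistent = true;
            // Known window bits at positions 0..i land on positions s..s+i.
            for (std::size_t j = 0; j <= i && j < n && j + s < n; ++j) {
                bool known = bitFromRight(pattern, j);
                if (j == i) known = !known;  // the mismatching bit
                if (bitFromRight(pattern, j + s) != known) {
                    consistent = false;
                    break;
                }
            }
            if (consistent) {
                shifts[i] = static_cast<int>(s);
                break;
            }
        }
    }
    return shifts;
}
```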

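In practice, of course, you compare whole bytes of the pattern with the bytes from window (going from right to left), and if there is a mismatch you examine the bits in the byte to find the mismatch position

if(window[i] != pattern[i])
{
    int j = 0;
    unsigned char mismatches = window[i] ^ pattern[i];
    while((mismatches & 1) == 0)
    {
        mismatches >>= 1;
        ++j;
    }
    mismatch_position = 8 * (window.size() - i - 1) + j;
}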

Here is a function that may come in handy when you need to shift some bits from your bit stream into window. I wrote it in C#, but converting it to C++ should be trivial. C# requires some casts that are probably not necessary in C++. Use unsigned char instead of byte, vector<unsigned char> & instead of byte [], size() instead of Length, and maybe make a few other minor tweaks. The function is probably more general than your scenario needs, since it does not exploit the fact that consecutive calls retrieve consecutive chunks of your byte array, which might make it a bit simpler, but I don't think that hurts. In its current form, it can retrieve an arbitrary bit substring from the byte array.

public static void shiftBitsIntoWindow_MSbFirst(byte[] window, byte[] source,
                                                int startBitPosition, int numberOfBits)
{
    int nob = numberOfBits / 8;  // number of full bytes from the source
    int ntsh = numberOfBits % 8; // number of bits by which to shift the left part of the window
                                 // when numberOfBits is not an integer multiple of 8
    int nfstbb = (8 - startBitPosition % 8); // number of bits from the start to the first byte boundary
    // The value is in the range [1, 8], which comes in handy
    // when checking whether the substring of the first ntsh bits
    // crosses a byte boundary in the source, by evaluating
    // the expression ntsh <= nfstbb.
    int nfbbte = (startBitPosition + numberOfBits) % 8; // number of bits from the last byte boundary to the end
    int sbtci; // index of the first byte in the source from which to start
               // copying nob bytes from the source

    // The way in which the sbtci index is calculated depends on
    // whether nob < window.Length
    if(nob < window.Length) // part of the window will be replaced
    // with bits from the source, but some part will remain in the
    // window, only moved to the beginning and possibly shifted
    {
        sbtci = (startBitPosition + ntsh) / 8;

        // The loop below moves bits from the end of the window to the front,
        // making room for new bits that will come from the source.
        // In the corner case when the number by which to shift (ntsh)
        // is zero, the expression (window[i + nob + 1] >> (8 - ntsh)) is
        // zero and the loop just moves whole bytes.
        for(int i = 0; i < window.Length - nob - 1; ++i)
        {
            window[i] = (byte)((window[i + nob] << ntsh)
                | (window[i + nob + 1] >> (8 - ntsh)));
        }

        // At this point, the left part of the window contains all the
        // bytes that could be constructed solely from the bytes
        // contained in the right part of the window. The next byte in the
        // window may contain bits from up to 3 different bytes: one byte
        // from the right edge of the window and one or two bytes from
        // the source. If the substring of the first ntsh bits crosses a
        // byte boundary in the source, it's two.

        int si = startBitPosition / 8; // index of the byte in the source
                                       // where the bit stream starts

        byte byteSecondPart; // temporary variable to store the bits
        // that come from the source, to combine them later with the bits
        // from the right edge of the window

        int mask = (1 << ntsh) - 1; // the mask of the form 0 0 1 1 1 1 1 1
                                    //                         |<- ntsh ->|

        if(ntsh <= nfstbb) // the substring of the first ntsh bits
                           // doesn't cross a byte boundary in the source
        {
            byteSecondPart = (byte)((source[si] >> (nfstbb - ntsh)) & mask);
        }
        else // the substring of the first ntsh bits crosses a byte boundary
             // in the source
        {
            byteSecondPart = (byte)(((source[si] << (ntsh - nfstbb))
                | (source[si + 1] >> (8 - ntsh + nfstbb))) & mask);
        }

        // The bits that go into one byte but come from two sources
        // (the right edge of the window and the source) are combined below
        window[window.Length - nob - 1] = (byte)((window[window.Length - 1] << ntsh)
            | byteSecondPart);

        // At this point, nob whole bytes in the window need to be filled
        // with the remaining bits from the source. This is done by a loop
        // common to both cases (nob < window.Length) and (nob >= window.Length)
    }
    else // !(nob < window.Length) - all bits of the window will be replaced
         // with bits from the source. In this case, only the appropriate
         // variables are set and the copying is done by the loop common to
         // both cases
    {
        sbtci = (startBitPosition + numberOfBits) / 8 - window.Length;
        nob = window.Length;
    }

    if(nfbbte > 0) // The bit substring copied into one byte in the
                   // window crosses a byte boundary in the source, so it has
                   // to be combined from bits coming from two consecutive
                   // bytes in the source
    {
        for(int i = 0; i < nob; ++i)
        {
            window[window.Length - nob + i] = (byte)((source[sbtci + i] << nfbbte)
                | (source[sbtci + 1 + i] >> (8 - nfbbte)));
        }
    }
    else // The bit substring copied into one byte in the window
         // doesn't cross a byte boundary in the source, so whole bytes
         // are simply copied
    {
        for(int i = 0; i < nob; ++i)
        {
            window[window.Length - nob + i] = source[sbtci + i];
        }
    }
}

Answer 2: (score: 0)

Assuming your array fits into an unsigned int:

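#include <stdio.h>

int main (void) {
    unsigned int curnum;
    unsigned int num = 0x2492;    /* 00100100 10010010 */
    unsigned int pattern = 0x1;   /* 001 */
    unsigned int mask = 0;
    unsigned int n = 3;
    unsigned int count = 0;
    int i;  /* signed: with an unsigned i, "i >= 0" is always true */

    /* Build a mask with the n lowest bits set. */
    for (i = 0; i < (int)n; i++) {
        mask |= 1u << i;
    }

    /* Slide an n-bit window over every bit of num, including its
       leading zero bits, starting from the most significant end. */
    for (i = (int)(8 * sizeof(num) - n); i >= 0; i--) {
        curnum = (num >> i) & mask;
        if (! (curnum ^ pattern)) {
            count++;
        }
    }

    printf("%u\n", count);
    return 0;
}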

Answer 3: (score: 0)

Convert your byte array to a std::vector<bool>, then call std::search(source.begin(), source.end(), pattern.begin(), pattern.end());. Despite the idiosyncrasies of vector<bool>, this should still work.
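A single std::search call only finds the first occurrence, so counting needs a loop. The sketch below shows one way to do that, assuming bits are taken most significant bit first and overlapping matches count. The helper names toBits and countOccurrences are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Expands each byte into 8 bools, most significant bit first.
std::vector<bool> toBits(const std::vector<unsigned char>& bytes) {
    std::vector<bool> bits;
    bits.reserve(bytes.size() * 8);
    for (unsigned char byte : bytes) {
        for (int b = 7; b >= 0; --b) {
            bits.push_back(((byte >> b) & 1) != 0);
        }
    }
    return bits;
}

// Counts overlapping occurrences of `pattern` in `source`.
std::size_t countOccurrences(const std::vector<bool>& source,
                             const std::vector<bool>& pattern) {
    if (pattern.empty()) return 0;
    std::size_t count = 0;
    auto it = source.begin();
    while (true) {
        it = std::search(it, source.end(), pattern.begin(), pattern.end());
        if (it == source.end()) break;
        ++count;
        ++it;  // resume one bit later so overlapping matches are found
    }
    return count;
}
```

For the question's example, source would be toBits({0x24, 0x92}) and pattern would be {false, false, true}. This trades memory and speed for simplicity, since every bit is stored and compared individually.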