Question

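Example: I have an integer whose binary representation is 0000010010001110.

How can I mask these bits with 110..... 0.......? I need to keep the zeros in the mask and keep all the significant bits in the following integer: 110010001110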

I'm new to bitwise operations, so please give me some ideas or advice. Thanks.

UPD. I need to mask a wchar_t and output it in its Unicode (UTF-8) representation.

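Read the UTF-8 specification for more details, but at a high level:

Code points 0 - 007F are stored as regular, single-byte ASCII. Code points 0080 and above are converted to binary and stored (encoded) in a series of bytes. The first "count" byte indicates the number of bytes for the code point, including the count byte. These bytes start with 11..0:

110xxxxx (the leading "11" indicates 2 bytes in sequence, including the "count" byte)

1110xxxx (1110 -> 3 bytes in sequence)

11110xxx (11110 -> 4 bytes in sequence)

Bytes starting with 10 are "data" bytes and contain information for the code point. A 2-byte example looks like this:

110xxxxx 10xxxxxx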

Answer 1

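I need to mask a wchar_t and output it in its Unicode (UTF-8) representation

Have you read the official specification for UTF-8 in the Unicode standard (Section 3.9 - Unicode Encoding Forms), or RFC 3629, or even the UTF-8 documentation on Wikipedia?

They describe the algorithm needed to split a 21-bit codepoint number into an encoded byte sequence. Note that wchar_t is 16 bits (UTF-16) on Windows but 32 bits (UTF-32) on most other platforms. Converting between UTFs is fairly straightforward, but you have to take into account which UTF you are actually starting from, because converting UTF-16 to UTF-8 is slightly different from converting UTF-32 to UTF-8.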
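When wchar_t is 16 bits, the codepoint first has to be recovered from UTF-16 before it can be re-encoded. A minimal sketch of that step, assuming the two code units have already been read; `decode_surrogate_pair` and `INVALID_CODEPOINT` are hypothetical names, not part of any standard library:

```c
#include <stdint.h>

/* Hypothetical sentinel for "not a valid surrogate pair". */
#define INVALID_CODEPOINT 0xFFFFFFFFu

/* Combine a UTF-16 high surrogate (0xD800-0xDBFF) and low surrogate
   (0xDC00-0xDFFF) into the codepoint they encode (0x10000-0x10FFFF). */
uint32_t decode_surrogate_pair(uint16_t hi, uint16_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return INVALID_CODEPOINT;

    /* Each surrogate carries 10 payload bits; the pair is offset by 0x10000. */
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}
```

For instance, `decode_surrogate_pair(0xD83D, 0xDE00)` yields 0x1F600. Code units outside the surrogate range are already codepoints and need no decoding.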

In short, you need something like this:

uint32_t codepoint = ...;
// This is the actual codepoint number, decoded from 1 or 2 wchar_t
// elements, depending on the UTF encoding of the wchar_t sequence.
// In UTF-32, the characters are the actual codepoint numbers as-is.
// In UTF-16, only the characters <= 0xFFFF are the actual codepoint
// numbers, the rest are encoded using surrogate pairs that you would
// have to decode to get the actual codepoint numbers...

uint8_t bytes[4];
int numBytes = 0;

if (codepoint <= 0x7F)
{
    bytes[0] = (uint8_t) codepoint;
    numBytes = 1;
}
else if (codepoint <= 0x7FF)
{
    bytes[0] = 0xC0 | (uint8_t) ((codepoint >> 6) & 0x1F);
    bytes[1] = 0x80 | (uint8_t) (codepoint & 0x3F);
    numBytes = 2;
}
else if (codepoint <= 0xFFFF)
{
    bytes[0] = 0xE0 | (uint8_t) ((codepoint >> 12) & 0x0F);
    bytes[1] = 0x80 | (uint8_t) ((codepoint >> 6) & 0x3F);
    bytes[2] = 0x80 | (uint8_t) (codepoint & 0x3F);
    numBytes = 3;
}
else if (codepoint <= 0x10FFFF)
{
    bytes[0] = 0xF0 | (uint8_t) ((codepoint >> 18) & 0x07);
    bytes[1] = 0x80 | (uint8_t) ((codepoint >> 12) & 0x3F);
    bytes[2] = 0x80 | (uint8_t) ((codepoint >> 6) & 0x3F);
    bytes[3] = 0x80 | (uint8_t) (codepoint & 0x3F);
    numBytes = 4;
}
else
{
    // illegal!
}

// use bytes[] up to numBytes as needed...

This can be simplified to:

uint32_t codepoint = ...; // decoded from wchar_t sequence...

uint8_t bytes[4];
int numBytes = 0;

if (codepoint <= 0x7F)
{
    bytes[0] = 0x00;
    numBytes = 1;
}
else if (codepoint <= 0x7FF)
{
    bytes[0] = 0xC0;
    numBytes = 2;
}
else if (codepoint <= 0xFFFF)
{
    bytes[0] = 0xE0;
    numBytes = 3;
}
else if (codepoint <= 0x10FFFF)
{
    bytes[0] = 0xF0;
    numBytes = 4;
}
else
{
    // illegal!
}

for(int i = 1; i < numBytes; ++i)
{
    bytes[numBytes-i] = 0x80 | (uint8_t) (codepoint & 0x3F);
    codepoint >>= 6;
}

bytes[0] |= (uint8_t) codepoint;

// use bytes[] up to numBytes as needed...

In your example, 0000010010001110 is decimal 1166, hex 0x48E. Codepoint U+048E is encoded in UTF-8 as the bytes 0xD2 0x8E, e.g.:

0000010010001110b -> 010010b 001110b
0xC0 or 010010b -> 0xD2
0x80 or 001110b -> 0x8E
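To check this worked example in code, the loop-based encoding can be wrapped in a function. This is only a sketch: `utf8_encode` is a hypothetical name, and the rejection of the surrogate range D800-DFFF is an addition that is not in the snippets above (RFC 3629 excludes those values from UTF-8).

```c
#include <stdint.h>

/* Encode one codepoint as UTF-8 into bytes[0..3].
   Returns the number of bytes written, or 0 if the codepoint is illegal. */
int utf8_encode(uint32_t codepoint, uint8_t bytes[4])
{
    int numBytes;
    uint8_t lead;   /* prefix bits of the "count" byte */

    if (codepoint <= 0x7F)          { lead = 0x00; numBytes = 1; }
    else if (codepoint <= 0x7FF)    { lead = 0xC0; numBytes = 2; }
    else if (codepoint >= 0xD800 && codepoint <= 0xDFFF) return 0; /* surrogates */
    else if (codepoint <= 0xFFFF)   { lead = 0xE0; numBytes = 3; }
    else if (codepoint <= 0x10FFFF) { lead = 0xF0; numBytes = 4; }
    else return 0;                                                 /* out of range */

    /* Fill the "data" bytes from last to first, 6 bits at a time. */
    for (int i = numBytes - 1; i > 0; --i) {
        bytes[i] = 0x80 | (uint8_t)(codepoint & 0x3F);
        codepoint >>= 6;
    }

    /* Whatever bits remain belong in the first byte. */
    bytes[0] = lead | (uint8_t)codepoint;
    return numBytes;
}
```

With this, `utf8_encode(0x48E, buf)` returns 2 and fills `buf` with 0xD2 0x8E.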

Answer 2

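It is not clear exactly what you need, but if you need to "recognize" the type of the "count" byte and the "data" bytes, take the given example:

1100000110100000 (11000001 10100000)

To "recognize" the "count" byte you can use:

#define BIT_MASK 0x8000 // which gives 1000 0000 0000 0000

Then use the & operator to check whether a bit is set, a counter to count how many bits are set, and the << operator to shift left (at most 8 times). If an unset bit shows up, break.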

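#include <stdio.h>
#include <stdint.h>

#define BIT_MASK        0x8000  // 1000 0000 0000 0000: the top bit of a 16-bit value
#define MAX_LEFT_SHIFT  8

int main(void)
{
    uint16_t exm_num = 49568; // for example 11000001 10100000 in binary
    int i, count = 0;

    // Count the leading 1 bits; stop at the first 0 bit.
    for (i = 0; i < MAX_LEFT_SHIFT; ++i) {
        if (exm_num & BIT_MASK)
            ++count;
        else
            break;
        exm_num = (uint16_t)(exm_num << 1);
    }

    printf("%d\n", count);
    return 0;
}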

You can then use the final value of count to recognize the type.

For the given example, the final value of count is 2.
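Since the prefix patterns are fixed, the byte type can also be tested directly with a mask and a compare instead of a counting loop. A sketch along those lines; `utf8_sequence_length` and `utf8_is_continuation` are hypothetical helper names, and the checks only look at the prefix bits, not at whether the whole sequence is valid UTF-8:

```c
#include <stdint.h>

/* Returns the sequence length announced by a first byte:
   1 for ASCII (0xxxxxxx), 2 for 110xxxxx, 3 for 1110xxxx, 4 for 11110xxx.
   Returns 0 for a "data" byte (10xxxxxx) or any other pattern. */
int utf8_sequence_length(uint8_t b)
{
    if ((b & 0x80) == 0x00) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Returns 1 if b is a "data" byte (10xxxxxx), otherwise 0. */
int utf8_is_continuation(uint8_t b)
{
    return (b & 0xC0) == 0x80;
}
```

For the example above, `utf8_sequence_length(0xC1)` gives 2 and `utf8_is_continuation(0xA0)` gives 1.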

Bit masks - bitwise operations in C