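C++: optimizing an array of ints

Date: 2012-03-12 20:50:54

Tags: c++ arrays optimization multidimensional-array

I have a two-dimensional lookup table of int16_t values.

int16_t my_array[37][73] = {{**DATA HERE**}};

The values are a mix: some fall above the range of int8_t, some fall just below it, and some values repeat. I am trying to reduce the size of this lookup table.

What I have done so far is split each int16_t value into two int8_t values to reveal the wasted bytes.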

int8_t part_1 = original_value >> 8;     // upper byte
int8_t part_2 = original_value & 0x00FF; // lower byte

// If the upper 8 bits of the original_value were empty
if(part_1 == 0) wasted_bytes_count++;

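I can easily remove the zero-valued int8_t bytes that waste a byte of space, and I can also remove the duplicate values, but my question is: how do I remove these values while keeping the ability to look up by the two indices?

I was planning to convert this into a one-dimensional array and follow each duplicated value with a number giving how many duplicates were removed, but I am struggling with how I would then tell which entries are lookup values and which are duplicate counts. Stripping out the wasted zero int8_t bytes complicates it further.

EDIT: This array is already stored in ROM. RAM is even more limited than ROM, which is why it is stored in ROM.

EDIT: I will post a bounty on this question as soon as I can. I need a complete answer on how to store the information AND how to retrieve it. It does not need to be a 2D array as long as I can get the same values out.

EDIT: Adding the actual array below: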

{150,145,140,135,130,125,120,115,110,105,100,95,90,85,80,75,70,65,60,55,50,45,40,35,30,25,20,15,10,5,0,-4,-9,-14,-19,-24,-29,-34,-39,-44,-49,-54,-59,-64,-69,-74,-79,-84,-89,-94,-99,104,109,114,119,124,129,134,139,144,149,154,159,164,169,174,179,175,170,165,160,155,150}, \
{143,137,131,126,120,115,110,105,100,95,90,85,80,75,71,66,62,57,53,48,44,39,35,31,27,22,18,14,9,5,1,-3,-7,-11,-16,-20,-25,-29,-34,-38,-43,-47,-52,-57,-61,-66,-71,-76,-81,-86,-91,-96,101,107,112,117,123,128,134,140,146,151,157,163,169,175,178,172,166,160,154,148,143}, \
{130,124,118,112,107,101,96,92,87,82,78,74,70,65,61,57,54,50,46,42,38,34,31,27,23,19,16,12,8,4,1,-2,-6,-10,-14,-18,-22,-26,-30,-34,-38,-43,-47,-51,-56,-61,-65,-70,-75,-79,-84,-89,-94,100,105,111,116,122,128,135,141,148,155,162,170,177,174,166,159,151,144,137,130}, \
{111,104,99,94,89,85,81,77,73,70,66,63,60,56,53,50,46,43,40,36,33,30,26,23,20,16,13,10,6,3,0,-3,-6,-9,-13,-16,-20,-24,-28,-32,-36,-40,-44,-48,-52,-57,-61,-65,-70,-74,-79,-84,-88,-93,-98,103,109,115,121,128,135,143,152,162,172,176,165,154,144,134,125,118,111}, \
{85,81,77,74,71,68,65,63,60,58,56,53,51,49,46,43,41,38,35,32,29,26,23,19,16,13,10,7,4,1,-1,-3,-6,-9,-13,-16,-19,-23,-26,-30,-34,-38,-42,-46,-50,-54,-58,-62,-66,-70,-74,-78,-83,-87,-91,-95,100,105,110,117,124,133,144,159,178,160,141,125,112,103,96,90,85}, \
{62,60,58,57,55,54,52,51,50,48,47,46,44,42,41,39,36,34,31,28,25,22,19,16,13,10,7,4,2,0,-3,-5,-8,-10,-13,-16,-19,-22,-26,-29,-33,-37,-41,-45,-49,-53,-56,-60,-64,-67,-70,-74,-77,-80,-83,-86,-89,-91,-94,-97,101,105,111,130,109,84,77,74,71,68,66,64,62}, \
{46,46,45,44,44,43,42,42,41,41,40,39,38,37,36,35,33,31,28,26,23,20,16,13,10,7,4,1,-1,-3,-5,-7,-9,-12,-14,-16,-19,-22,-26,-29,-33,-36,-40,-44,-48,-51,-55,-58,-61,-64,-66,-68,-71,-72,-74,-74,-75,-74,-72,-68,-61,-48,-25,2,22,33,40,43,45,46,47,46,46}, \
{36,36,36,36,36,35,35,35,35,34,34,34,34,33,32,31,30,28,26,23,20,17,14,10,6,3,0,-2,-4,-7,-9,-10,-12,-14,-15,-17,-20,-23,-26,-29,-32,-36,-40,-43,-47,-50,-53,-56,-58,-60,-62,-63,-64,-64,-63,-62,-59,-55,-49,-41,-30,-17,-4,6,15,22,27,31,33,34,35,36,36}, \
{30,30,30,30,30,30,30,29,29,29,29,29,29,29,29,28,27,26,24,21,18,15,11,7,3,0,-3,-6,-9,-11,-12,-14,-15,-16,-17,-19,-21,-23,-26,-29,-32,-35,-39,-42,-45,-48,-51,-53,-55,-56,-57,-57,-56,-55,-53,-49,-44,-38,-31,-23,-14,-6,0,7,13,17,21,24,26,27,29,29,30}, \
{25,25,26,26,26,25,25,25,25,25,25,25,25,26,25,25,24,23,21,19,16,12,8,4,0,-3,-7,-10,-13,-15,-16,-17,-18,-19,-20,-21,-22,-23,-25,-28,-31,-34,-37,-40,-43,-46,-48,-49,-50,-51,-51,-50,-48,-45,-42,-37,-32,-26,-19,-13,-7,-1,3,7,11,14,17,19,21,23,24,25,25}, \
{21,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,21,20,18,16,13,9,5,1,-3,-7,-11,-14,-17,-18,-20,-21,-21,-22,-22,-22,-23,-23,-25,-27,-29,-32,-35,-37,-40,-42,-44,-45,-45,-45,-44,-42,-40,-36,-32,-27,-22,-17,-12,-7,-3,0,3,7,9,12,14,16,18,19,20,21,21}, \
{18,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,18,17,16,14,10,7,2,-1,-6,-10,-14,-17,-19,-21,-22,-23,-24,-24,-24,-24,-23,-23,-23,-24,-26,-28,-30,-33,-35,-37,-38,-39,-39,-38,-36,-34,-31,-28,-24,-19,-15,-10,-6,-3,0,1,4,6,8,10,12,14,15,16,17,18,18}, \
{16,16,17,17,17,17,17,17,17,17,17,16,16,16,16,16,16,15,13,11,8,4,0,-4,-9,-13,-16,-19,-21,-23,-24,-25,-25,-25,-25,-24,-23,-21,-20,-20,-21,-22,-24,-26,-28,-30,-31,-32,-31,-30,-29,-27,-24,-21,-17,-13,-9,-6,-3,-1,0,2,4,5,7,9,10,12,13,14,15,16,16}, \
{14,14,14,15,15,15,15,15,15,15,14,14,14,14,14,14,13,12,11,9,5,2,-2,-6,-11,-15,-18,-21,-23,-24,-25,-25,-25,-25,-24,-22,-21,-18,-16,-15,-15,-15,-17,-19,-21,-22,-24,-24,-24,-23,-22,-20,-18,-15,-12,-9,-5,-3,-1,0,1,2,4,5,6,8,9,10,11,12,13,14,14}, \
{12,13,13,13,13,13,13,13,13,13,13,13,12,12,12,12,11,10,9,6,3,0,-4,-8,-12,-16,-19,-21,-23,-24,-24,-24,-24,-23,-22,-20,-17,-15,-12,-10,-9,-9,-10,-12,-13,-15,-17,-17,-18,-17,-16,-15,-13,-11,-8,-5,-3,-1,0,1,1,2,3,4,6,7,8,9,10,11,12,12,12}, \
{11,11,11,11,11,12,12,12,12,12,11,11,11,11,11,10,10,9,7,5,2,-1,-5,-9,-13,-17,-20,-22,-23,-23,-23,-23,-22,-20,-18,-16,-14,-11,-9,-6,-5,-4,-5,-6,-8,-9,-11,-12,-12,-12,-12,-11,-9,-8,-6,-3,-1,0,0,1,1,2,3,4,5,6,7,8,9,10,11,11,11}, \
{10,10,10,10,10,10,10,10,10,10,10,10,10,10,9,9,9,7,6,3,0,-3,-6,-10,-14,-17,-20,-21,-22,-22,-22,-21,-19,-17,-15,-13,-10,-8,-6,-4,-2,-2,-2,-2,-4,-5,-7,-8,-8,-9,-8,-8,-7,-5,-4,-2,0,0,1,1,1,2,2,3,4,5,6,7,8,9,10,10,10}, \
{9,9,9,9,9,9,9,10,10,9,9,9,9,9,9,8,8,6,5,2,0,-4,-7,-11,-15,-17,-19,-21,-21,-21,-20,-18,-16,-14,-12,-10,-8,-6,-4,-2,-1,0,0,0,-1,-2,-4,-5,-5,-6,-6,-5,-5,-4,-3,-1,0,0,1,1,1,1,2,3,3,5,6,7,8,8,9,9,9}, \
{9,9,9,9,9,9,9,9,9,9,9,9,8,8,8,8,7,5,4,1,-1,-5,-8,-12,-15,-17,-19,-20,-20,-19,-18,-16,-14,-11,-9,-7,-5,-4,-2,-1,0,0,1,1,0,0,-2,-3,-3,-4,-4,-4,-3,-3,-2,-1,0,0,0,0,0,1,1,2,3,4,5,6,7,8,8,9,9}, \
{9,9,9,8,8,8,9,9,9,9,9,8,8,8,8,7,6,5,3,0,-2,-5,-9,-12,-15,-17,-18,-19,-19,-18,-16,-14,-12,-9,-7,-5,-4,-2,-1,0,0,1,1,1,1,0,0,-1,-2,-2,-3,-3,-2,-2,-1,-1,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,8,9}, \
{8,8,8,8,8,8,9,9,9,9,9,9,8,8,8,7,6,4,2,0,-3,-6,-9,-12,-15,-17,-18,-18,-17,-16,-14,-12,-10,-8,-6,-4,-2,-1,0,0,1,2,2,2,2,1,0,0,-1,-1,-1,-2,-2,-1,-1,0,0,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,8}, \
{8,8,8,8,9,9,9,9,9,9,9,9,9,8,8,7,5,3,1,-1,-4,-7,-10,-13,-15,-16,-17,-17,-16,-15,-13,-11,-9,-6,-5,-3,-2,0,0,0,1,2,2,2,2,1,1,0,0,0,-1,-1,-1,-1,-1,0,0,0,0,-1,-1,-1,-1,-1,0,0,1,3,4,5,7,7,8}, \
{8,8,9,9,9,9,10,10,10,10,10,10,10,9,8,7,5,3,0,-2,-5,-8,-11,-13,-15,-16,-16,-16,-15,-13,-12,-10,-8,-6,-4,-2,-1,0,0,1,2,2,3,3,2,2,1,0,0,0,0,0,0,0,0,0,0,-1,-1,-2,-2,-2,-2,-2,-1,0,0,1,3,4,6,7,8}, \
{7,8,9,9,9,10,10,11,11,11,11,11,10,10,9,7,5,3,0,-2,-6,-9,-11,-13,-15,-16,-16,-15,-14,-13,-11,-9,-7,-5,-3,-2,0,0,1,1,2,3,3,3,3,2,2,1,1,0,0,0,0,0,0,0,-1,-1,-2,-3,-3,-4,-4,-4,-3,-2,-1,0,1,3,5,6,7}, \
{6,8,9,9,10,11,11,12,12,12,12,12,11,11,9,7,5,2,0,-3,-7,-10,-12,-14,-15,-16,-15,-15,-13,-12,-10,-8,-7,-5,-3,-1,0,0,1,2,2,3,3,4,3,3,3,2,2,1,1,1,0,0,0,0,-1,-2,-3,-4,-4,-5,-5,-5,-5,-4,-2,-1,0,2,3,5,6}, \
{6,7,8,10,11,12,12,13,13,14,14,13,13,11,10,8,5,2,0,-4,-8,-11,-13,-15,-16,-16,-16,-15,-13,-12,-10,-8,-6,-5,-3,-1,0,0,1,2,3,3,4,4,4,4,4,3,3,3,2,2,1,1,0,0,-1,-2,-3,-5,-6,-7,-7,-7,-6,-5,-4,-3,-1,0,2,4,6}, \
{5,7,8,10,11,12,13,14,15,15,15,14,14,12,11,8,5,2,-1,-5,-9,-12,-14,-16,-17,-17,-16,-15,-14,-12,-11,-9,-7,-5,-3,-1,0,0,1,2,3,4,4,5,5,5,5,5,5,4,4,3,3,2,1,0,-1,-2,-4,-6,-7,-8,-8,-8,-8,-7,-6,-4,-2,0,1,3,5}, \
{4,6,8,10,12,13,14,15,16,16,16,16,15,13,11,9,5,2,-2,-6,-10,-13,-16,-17,-18,-18,-17,-16,-15,-13,-11,-9,-7,-5,-4,-2,0,0,1,3,3,4,5,6,6,7,7,7,7,7,6,5,4,3,2,0,-1,-3,-5,-7,-8,-9,-10,-10,-10,-9,-7,-5,-4,-1,0,2,4}, \
{4,6,8,10,12,14,15,16,17,18,18,17,16,15,12,9,5,1,-3,-8,-12,-15,-18,-19,-20,-20,-19,-18,-16,-15,-13,-11,-8,-6,-4,-2,-1,0,1,3,4,5,6,7,8,9,9,9,9,9,9,8,7,5,3,1,-1,-3,-6,-8,-10,-11,-12,-12,-11,-10,-9,-7,-5,-2,0,1,4}, \
{4,6,8,11,13,15,16,18,19,19,19,19,18,16,13,10,5,0,-5,-10,-15,-18,-21,-22,-23,-22,-22,-20,-18,-17,-14,-12,-10,-8,-5,-3,-1,0,1,3,5,6,8,9,10,11,12,12,13,12,12,11,9,7,5,2,0,-3,-6,-9,-11,-12,-13,-13,-12,-11,-10,-8,-6,-3,-1,1,4}, \
{3,6,9,11,14,16,17,19,20,21,21,21,19,17,14,10,4,-1,-8,-14,-19,-22,-25,-26,-26,-26,-25,-23,-21,-19,-17,-14,-12,-9,-7,-4,-2,0,1,3,5,7,9,11,13,14,15,16,16,16,16,15,13,10,7,4,0,-3,-7,-10,-12,-14,-15,-14,-14,-12,-11,-9,-6,-4,-1,1,3}, \
{4,6,9,12,14,17,19,21,22,23,23,23,21,19,15,9,2,-5,-13,-20,-25,-28,-30,-31,-31,-30,-29,-27,-25,-22,-20,-17,-14,-11,-9,-6,-3,0,1,4,6,9,11,13,15,17,19,20,21,21,21,20,18,15,11,6,2,-2,-7,-11,-13,-15,-16,-16,-15,-13,-11,-9,-7,-4,-1,1,4}, \
{4,7,10,13,15,18,20,22,24,25,25,25,23,20,15,7,-2,-12,-22,-29,-34,-37,-38,-38,-37,-36,-34,-31,-29,-26,-23,-20,-17,-13,-10,-7,-4,-1,2,5,8,11,13,16,18,21,23,24,26,26,26,26,24,21,17,12,5,0,-6,-10,-14,-16,-16,-16,-15,-14,-12,-10,-7,-4,-1,1,4}, \
{4,7,10,13,16,19,22,24,26,27,27,26,24,19,11,-1,-15,-28,-37,-43,-46,-47,-47,-45,-44,-41,-39,-36,-32,-29,-26,-22,-19,-15,-11,-8,-4,-1,2,5,9,12,15,19,22,24,27,29,31,33,33,33,32,30,26,21,14,6,0,-6,-11,-14,-15,-16,-15,-14,-12,-9,-7,-4,-1,1,4}, \
{6,9,12,15,18,21,23,25,27,28,27,24,17,4,-14,-34,-49,-56,-60,-60,-60,-58,-56,-53,-50,-47,-43,-40,-36,-32,-28,-25,-21,-17,-13,-9,-5,-1,2,6,10,14,17,21,24,28,31,34,37,39,41,42,43,43,41,38,33,25,17,8,0,-4,-8,-10,-10,-10,-8,-7,-4,-2,0,3,6}, \
{22,24,26,28,30,32,33,31,23,-18,-81,-96,-99,-98,-95,-93,-89,-86,-82,-78,-74,-70,-66,-62,-57,-53,-49,-44,-40,-36,-32,-27,-23,-19,-14,-10,-6,-1,2,6,10,15,19,23,27,31,35,38,42,45,49,52,55,57,60,61,63,63,62,61,57,53,47,40,33,28,23,21,19,19,19,20,22}, \
{168,173,178,176,171,166,161,156,151,146,141,136,131,126,121,116,111,106,101,-96,-91,-86,-81,-76,-71,-66,-61,-56,-51,-46,-41,-36,-31,-26,-21,-16,-11,-6,-1,3,8,13,18,23,28,33,38,43,48,53,58,63,68,73,78,83,88,93,98,103,108,113,118,123,128,133,138,143,148,153,158,163,168}, \

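Thanks for your time.

12 answers:

Answer 0 (score: 31)

I see several options for compressing the array.

1. Separate 8-bit and 1-bit arrays

You can split the array into two parts: the first stores the 8 low-order bits of each original value, and the second stores a '1' if the value does not fit in 8 bits and a '0' otherwise. That takes 9 bits per value (the same as nightcracker's approach, but a bit simpler). To read a value back from these two arrays, do the following: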

int8_t array8[37*73] = {...};
uint16_t array1[(37*73+15)/16] = {...};
size_t offset = 37 * x + y;
int16_t item = static_cast<int16_t>(array8[offset]); // sign extend
int16_t overflow = ((array1[offset/16] >> (offset%16)) & 0x0001) << 7;
item ^= overflow;
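
A minimal round-trip sketch of this option, for anyone who wants to generate the two arrays. The snippet above only shows the read side, so the encoder here is an assumption: for the XOR with `overflow` to restore the original value, a value that does not fit in int8_t is assumed to be stored with bit 7 flipped. The names `SplitTable`, `encode` and `decode` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two arrays described above.
struct SplitTable {
    std::vector<int8_t> low;         // one byte per value
    std::vector<uint16_t> overflow;  // one flag bit per value, 16 flags per word
};

// Assumed encoder: values outside [-128, 127] are stored with bit 7 flipped,
// so that the decoder's XOR with (flag << 7) restores them.
SplitTable encode(const std::vector<int16_t>& values) {
    SplitTable t;
    t.low.resize(values.size());
    t.overflow.assign((values.size() + 15) / 16, 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        int16_t v = values[i];
        assert(v >= -256 && v <= 255);  // must fit in 9 signed bits
        if (v < -128 || v > 127) {
            v ^= 0x80;  // result now fits in int8_t
            t.overflow[i / 16] |= static_cast<uint16_t>(1u << (i % 16));
        }
        t.low[i] = static_cast<int8_t>(v);
    }
    return t;
}

// Decoder: same steps as the snippet above.
int16_t decode(const SplitTable& t, std::size_t offset) {
    int16_t item = static_cast<int16_t>(t.low[offset]);  // sign extend
    int16_t flag = (t.overflow[offset / 16] >> (offset % 16)) & 0x0001;
    item ^= static_cast<int16_t>(flag << 7);
    return item;
}
```

With this pairing, `decode(encode(values), i)` returns `values[i]` for every value in the 9-bit signed range, which covers the +-179 range of the posted table.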

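2. Approximation

If you can approximate the array with some efficiently computed function (such as a polynomial or an exponential), you can store in the array only the difference between each value and its approximation. That may need only 8 bits per value, or even fewer.

3. Delta encoding

If your data is smooth enough, then in addition to applying either of the previous methods you can store a shorter table containing only some of the data values, plus a second table containing the differences between every value that is absent from the first table and a value from the first table. This takes fewer bits per value.

For example, you can store every fifth value, plus the differences for the other values: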

  Original array: 0 0 1 1 2 2 2 2 2 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6 6 6 7 7 7
     Short array: 0         2         3         5         6         6
Difference array:   0 1 1 2   0 0 0 1   0 1 1 2   0 0 0 1   0 0 0 0   0 1 1 1

Alternatively, you can use differences from the previous value, which takes even fewer bits per value:

  Original array: 0 0 1 1 2 2 2 2 2 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6 6 6 7 7 7
     Short array: 0         2         3         5         6         6
     Delta array:   0 1 0 1   0 0 0 1   0 1 0 1   0 0 0 1   0 0 0 0   0 1 0 0
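
The second layout (anchor values plus deltas to the previous value) can be sketched as follows. This is only an illustration under simple assumptions: deltas are kept in whole int8_t bytes rather than packed bit fields, the stride is fixed at five as in the example, and the names `DeltaTable`, `encode` and `lookup` are invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kStride = 5;  // one full value every five entries

struct DeltaTable {
    std::vector<int16_t> anchors;  // the "short array"
    std::vector<int8_t> deltas;    // difference to the previous value,
                                   // kStride - 1 entries per anchor
};

DeltaTable encode(const std::vector<int16_t>& values) {
    DeltaTable t;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i % kStride == 0) {
            t.anchors.push_back(values[i]);
        } else {
            const int diff = values[i] - values[i - 1];
            assert(diff >= -128 && diff <= 127);
            t.deltas.push_back(static_cast<int8_t>(diff));
        }
    }
    return t;
}

// Start from the anchor of the group and add up to kStride - 1 deltas.
int16_t lookup(const DeltaTable& t, std::size_t index) {
    const std::size_t group = index / kStride;
    const std::size_t within = index % kStride;
    const std::size_t first = group * (kStride - 1);  // first delta of this group
    int value = t.anchors[group];
    for (std::size_t k = 0; k < within; ++k) value += t.deltas[first + k];
    return static_cast<int16_t>(value);
}
```

A lookup costs at most four additions here. A larger stride saves more space but makes each lookup walk a longer chain of deltas.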

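The delta-array approach can be implemented efficiently with bitwise operations if a group of delta values fits exactly into an int16_t.

Initialization

For option #2, the preprocessor can be used. For the other options the preprocessor is possible but probably not very convenient (the preprocessor is not good at handling long lists of values). Some combination of the preprocessor and variadic templates may work better. Or it may be easier to use a text-processing script.

Update

After looking at the actual data, I can give more detail. Option #2 (approximation) is not very convenient for your data. Option #1 seems better. Or you can use Mark Ransom's or nightcracker's approach. It does not matter which: in every case you save 7 bits out of 16.

Option #3 (delta encoding) saves more space. It cannot be used directly, because the data changes abruptly in some cells of the array. But as far as I can tell, these large jumps happen at most once per row. That can be handled with one additional column holding the full data value, and one special value in the delta array.

I noticed that (ignoring these abrupt changes) the difference between neighboring values is never more than +/- 32. That takes 6 bits to encode each delta value, which works out to 6.6 bits per value: 58% compression, about 2400 bytes. (Not much, but a little better than the 2464K in your comments.)

The middle part of the array is much smoother. It needs only 5 bits per value when encoded separately, which saves another 300..400 bytes. It is probably a good idea to split the array into several parts and encode each part differently.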

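Answer 1 (score: 19)

As nightcracker has noted, your values fit in 9 bits. There is a simpler way to store them, though: put the absolute values in a byte array and the sign bits in a separate packed bit array.

uint8_t my_array[37][73] = {{**DATA ABSOLUTE VALUES HERE**}};  // magnitudes go up to 179, so the bytes must be unsigned
uint8_t my_signs[37][10] = {{**SIGN BITS HERE**}};             // 73 sign bits per row, rounded up to 10 bytes
int16_t my_value = my_array[i][j];
if (my_signs[i][j/8] & (1 << j%8))
    my_value = -my_value;

That cuts the original table size by about 43% (3,071 bytes instead of 5,402) without much effort.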
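
To fill the two tables in the first place, something along these lines should work. It is a sketch, not part of the answer: `store` and `load` are hypothetical helper names, and the magnitudes are kept in unsigned bytes because they reach 179.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

constexpr int kRows = 37;
constexpr int kCols = 73;
constexpr int kSignBytes = (kCols + 7) / 8;  // 10 bytes of sign bits per row

uint8_t magnitudes[kRows][kCols];   // absolute values, 0..255
uint8_t signs[kRows][kSignBytes];   // bit j%8 of byte j/8 is set for negative entries

// Hypothetical helper: split one value into magnitude and sign bit.
void store(int i, int j, int16_t value) {
    assert(value >= -255 && value <= 255);
    magnitudes[i][j] = static_cast<uint8_t>(std::abs(value));
    const uint8_t bit = static_cast<uint8_t>(1u << (j % 8));
    if (value < 0)
        signs[i][j / 8] |= bit;
    else
        signs[i][j / 8] &= static_cast<uint8_t>(~bit);
}

// Same lookup as in the answer.
int16_t load(int i, int j) {
    int16_t value = magnitudes[i][j];
    if (signs[i][j / 8] & (1 << (j % 8)))
        value = static_cast<int16_t>(-value);
    return value;
}
```

On a real target the tables would be generated offline and emitted as const initializers so that they end up in ROM; the writable arrays here are only for demonstrating the round trip.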

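Answer 2 (score: 16)

I know from experience that visualizing things can help in finding a good solution to a problem. Since it is not clear what your data actually represents (so we know little or nothing about the problem domain), we may not come up with "the best" solution (if one exists at all). So I took the liberty and visualized the data; as the saying goes, a picture is worth 1000 words :-)

Sorry, I do not have a better solution than those already posted, but I thought the plot might help someone (or me) find one.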

[Image: plot of the lookup table data]

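Answer 3 (score: 8)

The range you need is +-179. That means 360 values are enough to cover it, and 360 unique values can be represented in 9 bits. Here is an example of a 9-bit integer lookup table: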

// size is ceil(37 * 73 * 9 / 16)
uint16_t my_array[1520];

int16_t get_lookup_item(int x, int y) {
    // calculate bitoffset
    size_t bitoffset = (37 * x + y) * 9;

    // calculate difference with 16 bit array offset
    size_t diff = bitoffset % 16;

    uint16_t item;

    // our item doesn't overlap a 16 bit boundary
    if (diff <= (16 - 9)) {
        item = my_array[bitoffset / 16]; // get item
        item >>= diff;
        item &= (1 << 9) - 1;

    // our item does overlap a 16 bit boundary
    } else {
        item = my_array[bitoffset / 16];
        item >>= diff; // low (16 - diff) bits of the item
        item &= (1 << (16 - diff)) - 1;
        // the remaining high bits come from the next word and must be shifted into place
        item += (my_array[bitoffset / 16 + 1] & ((1 << (9 - 16 + diff)) - 1)) << (16 - diff);
    }

    // we now have the unsigned item, subtract 179 to bring it into the correct range
    return item - 179;
}
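
For completeness, here is a sketch of how such a 9-bit table could be written as well as read. It is not the code from the answer: it assumes each item is stored low bits first with a bias of 179, and it reads through a 32-bit window instead of handling the two cases separately. `put`, `get` and the constants are illustrative names, and the caller is assumed to pass a flat index such as row * 73 + col.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCount = 37 * 73;
constexpr unsigned kBits = 9;
constexpr int kBias = 179;  // maps -179..179 to 0..358

uint16_t packed[(kCount * kBits + 15) / 16];  // 1520 words, zero-initialized

// Write one value at the given flat index.
void put(std::size_t index, int16_t value) {
    assert(index < kCount && value >= -kBias && value <= kBias);
    const uint32_t item = static_cast<uint32_t>(value + kBias);
    const std::size_t word = index * kBits / 16;
    const unsigned shift = static_cast<unsigned>(index * kBits % 16);
    const bool spans = shift + kBits > 16;  // item crosses into the next word
    const uint32_t mask = ((1u << kBits) - 1) << shift;
    uint32_t window = packed[word];
    if (spans) window |= static_cast<uint32_t>(packed[word + 1]) << 16;
    window = (window & ~mask) | (item << shift);
    packed[word] = static_cast<uint16_t>(window);
    if (spans) packed[word + 1] = static_cast<uint16_t>(window >> 16);
}

// Read one value back.
int16_t get(std::size_t index) {
    assert(index < kCount);
    const std::size_t word = index * kBits / 16;
    const unsigned shift = static_cast<unsigned>(index * kBits % 16);
    uint32_t window = packed[word];
    if (shift + kBits > 16) window |= static_cast<uint32_t>(packed[word + 1]) << 16;
    const int item = static_cast<int>((window >> shift) & ((1u << kBits) - 1));
    return static_cast<int16_t>(item - kBias);
}
```

The table occupies 1520 * 2 = 3040 bytes, against 5402 bytes for the plain int16_t array.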

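Answer 4 (score: 6)

Here is another approach, completely different from my first one, which is why it is a separate answer.

If the number of values that do not fit in 8 bits is less than 1/8 of the total, you can spend an entire extra byte on each of them and the result will still be smaller than keeping a separate 1-bit array.

For simplicity and speed I wanted to stick with whole byte values rather than bit packing. You never said whether speed is a constraint here, but decoding the whole table just to look up one value seems wasteful. If that really is not a problem for you, your best results will probably come from implementing the decoding half of some readily available open-source compression utility.

For this implementation I kept the encoding very simple. First I delta-encoded the data as suggested by Evgeny Kluev, starting over at each row; your data is very well suited to this. Each byte is then encoded by the following rules:

  • Absolute values >= 97 are given a prefix byte of 97, followed by the value minus 97. The threshold was found by trying different values and picking the one that produced the smallest result.
  • Run lengths are only checked for values between -96 and 96. Run lengths between 3 and 32 are encoded as 98 to 127, and run lengths between 33 and 64 as -97 to -128.
  • Finally, values between -96 and 96 are output as-is.

This produces an encoded array of 2014 bytes, plus another 36 bytes for the index to the start of each row, for a total of 2050 bytes.

The complete implementation is at http://ideone.com/SNdRI. Its output is identical to the table posted in the question.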

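Answer 5 (score: 4)

As others have suggested, you can save a lot of space by storing the absolute value of each entry in an array of 8-bit integers and the sign bits in a separate packed bit array. Mark Ransom's solution is simple, performs well, and shrinks the size from 5,402 bytes to 3,071 bytes, a saving of 43.1%.

If you really want to squeeze out every last byte, you can do better by exploiting the characteristics of this data set. In particular, note that the values are mostly positive and that there are runs of values with the same sign. Instead of tracking the sign of every value in the "my_signs" array, you can track only the runs of negative values, each as a starting index (two bytes, range [0..2701]) and a run length (one byte, since the longest run is 37 entries long). For this data set that reduces the sign table from 370 bytes to 168 bytes. Total storage is 2,869 bytes, a saving of 46.8% over the original (2,533 bytes fewer).

Here is code that implements this strategy: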

uint8_t my_array[37][73] = {{ /* ABSOLUTE VALUES OF ORIGINAL ARRAY HERE */ }};

// Sign bits for the values in my_array.  The data is arranged in groups of
// three bytes.  The first two give the starting index of a run of negative
// values.  The third gives the length of the run.  To determine if a given
// value should be negated, compute its index as (row * 73) + col, then scan this
// table to see if that index appears in any of the runs.  If it does, the value
// should be negated.

uint8_t my_signs[168]    = {
    0x00, 0x1f, 0x14, 0x00, 0x68, 0x15, 0x00, 0xb1, 0x16, 0x00, 0xfa, 0x18, 
    0x01, 0x42, 0x1a, 0x01, 0x8b, 0x1e, 0x01, 0xd2, 0x23, 0x02, 0x1a, 0x24, 
    0x02, 0x62, 0x24, 0x02, 0xaa, 0x25, 0x02, 0xf2, 0x25, 0x03, 0x3a, 0x25, 
    0x03, 0x83, 0x25, 0x03, 0xcb, 0x25, 0x04, 0x14, 0x24, 0x04, 0x5c, 0x24, 
    0x04, 0xa5, 0x23, 0x04, 0xee, 0x14, 0x05, 0x05, 0x0c, 0x05, 0x36, 0x14, 
    0x05, 0x50, 0x0a, 0x05, 0x7f, 0x13, 0x05, 0x9a, 0x09, 0x05, 0xc8, 0x12, 
    0x05, 0xe4, 0x07, 0x06, 0x10, 0x12, 0x06, 0x2f, 0x05, 0x06, 0x38, 0x05, 
    0x06, 0x59, 0x12, 0x06, 0x7f, 0x08, 0x06, 0xa2, 0x11, 0x06, 0xc7, 0x0b, 
    0x06, 0xeb, 0x11, 0x07, 0x10, 0x0c, 0x07, 0x34, 0x11, 0x07, 0x59, 0x0d, 
    0x07, 0x7c, 0x12, 0x07, 0xa2, 0x0d, 0x07, 0xc5, 0x12, 0x07, 0xeb, 0x0e, 
    0x08, 0x0e, 0x13, 0x08, 0x34, 0x0e, 0x08, 0x57, 0x13, 0x08, 0x7e, 0x0e, 
    0x08, 0x9f, 0x14, 0x08, 0xc7, 0x0e, 0x08, 0xe8, 0x14, 0x09, 0x10, 0x0e, 
    0x09, 0x30, 0x16, 0x09, 0x5a, 0x0d, 0x09, 0x78, 0x17, 0x09, 0xa4, 0x0c, 
    0x09, 0xc0, 0x18, 0x09, 0xef, 0x09, 0x0a, 0x04, 0x1d, 0x0a, 0x57, 0x14
};

int getSign(int row, int col)
{
    int want = (row * 73) + col;
    for (int i = 0 ; i < 168 ; i += 3) {
        int16_t start = (my_signs[i] << 8) | my_signs[i + 1];
        if (start > want) {
            // Not going to find it, so may as well stop now.

            break;
        }

        int runlength = my_signs[i + 2];
        if (want < start + runlength) {
            // Found this index in the signs array, so this entry is negative.

            return -1;
        }
    }
    return 1;
}

int16_t getValue(int row, int col)
{
    return getSign(row, col) * my_array[row][col];
}

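In fact, you can do slightly better still, at the cost of more complicated code, by recognizing that the run-length-encoded sign table really only needs 12 bits for the start index and 6 bits for the run length, 18 bits in total (compared with the 24 bits used by the simple implementation above). That shaves off another 42 bytes, bringing the size down to 2,827 bytes, 47.6% smaller than the original (2,575 bytes fewer).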
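
One possible way to pack the runs into 18 bits each is sketched below. The bit order (most significant bit first, start index before run length) and the names `Run`, `packRuns`, `unpackRun` and `isNegative` are assumptions made for this illustration; the answer does not specify a layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Run {
    unsigned start;   // flat index of the first negative entry, needs 12 bits
    unsigned length;  // number of negative entries in the run, needs 6 bits
};

constexpr unsigned kRunBits = 18;

// Pack each run as an 18-bit field, most significant bit first.
std::vector<uint8_t> packRuns(const std::vector<Run>& runs) {
    std::vector<uint8_t> out((runs.size() * kRunBits + 7) / 8, 0);
    std::size_t bit = 0;
    for (const Run& r : runs) {
        assert(r.start < (1u << 12) && r.length < (1u << 6));
        const uint32_t field = (r.start << 6) | r.length;
        for (unsigned b = 0; b < kRunBits; ++b, ++bit)
            if (field & (1u << (kRunBits - 1 - b)))
                out[bit / 8] |= static_cast<uint8_t>(0x80u >> (bit % 8));
    }
    return out;
}

// Extract run number k from the packed table.
Run unpackRun(const std::vector<uint8_t>& packed, std::size_t k) {
    uint32_t field = 0;
    std::size_t bit = k * kRunBits;
    for (unsigned b = 0; b < kRunBits; ++b, ++bit)
        field = (field << 1) | ((packed[bit / 8] >> (7 - bit % 8)) & 1u);
    return Run{static_cast<unsigned>(field >> 6), static_cast<unsigned>(field & 0x3Fu)};
}

// Same scan as getSign above, but over the packed table.
bool isNegative(const std::vector<uint8_t>& packed, std::size_t runCount, unsigned index) {
    for (std::size_t k = 0; k < runCount; ++k) {
        const Run r = unpackRun(packed, k);
        if (r.start > index) break;  // runs are sorted by start index
        if (index < r.start + r.length) return true;
    }
    return false;
}
```

For the 56 runs in the table above this gives 56 * 18 / 8 = 126 bytes, which is the 42-byte saving mentioned. The bit-by-bit loops are written for clarity rather than speed.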

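Answer 6 (score: 4)

Inspecting the actual array shows that the data is quite smooth and can be compressed significantly. Simple methods do not save much more space once the 16-bit values have been encoded in 9 bits, because the data has different characteristics in different parts of the array. Splitting the array into several parts and encoding them differently may reduce the size further, but that is more complicated and increases code size.

The approach described here encodes variable-length blocks of data while still allowing relatively fast access to the original values (though slower than the simple methods). The compression ratio improves significantly at the price of speed.

The main idea is delta encoding. But compared with the straightforward algorithm in the previous post, variable block length and variable bit depth are possible. This allows, for example, a bit depth of zero for the deltas of repeated values, which means only a fixed header and no delta values (similar to run-length encoding).

All deltas in a block share one base value. This allows linearly changing data (which is quite common in the actual array) to be encoded with the base value alone, again spending zero space on delta values. In other cases it slightly decreases the average bit depth.

The compressed data is stored in an array of bitstreams and accessed through a bitstream reader. To get quick access to the start of each bitstream, an index table is used (just an array of 37 16-bit indexes).

Each bitstream starts with the number of blocks in the stream (5 bits), followed by the block index and finally the data blocks. The block index provides a way to skip unneeded data blocks during a search. Each index entry contains: the number of elements in the block (4 bits, which encode 9 to 24 delta values plus the starting value), the size of the base value for all deltas (1 bit, selecting a size of 4 or 6 bits), and the size of the deltas (2 bits, for sizes 0..3 if the base size is 4, or sizes 2..5 if the base size is 6). These particular bit depths are probably close to optimal, but they may be changed to trade some speed for some space or to adapt the algorithm to a different data array.

A data block contains the starting value (9 bits), the base value for the deltas (4 or 6 bits), and the delta values (0..3 or 2..5 bits each).

Here is the function that extracts an original value from the compressed data: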

int get(size_t row, unsigned col)
{
  BitstreamReader bsr(indexTable[row]);
  unsigned blocks = bsr.getUI(5);

  unsigned block = 0;
  unsigned start = 0;
  unsigned nextStart = 0;
  unsigned offset = 0;
  unsigned nextOffset = 0;
  unsigned blockSize = 0;
  unsigned baseSize = 0;
  unsigned deltaSize = 0;
  while (col >= nextStart) // 3 iterations on average
  {
    start = nextStart;
    offset = nextOffset;
    ++block;
    blockSize = bsr.getUI(4) + 9;
    nextStart += blockSize;
    baseSize = bsr.getUI(1)*2 + 4;
    deltaSize = bsr.getUI(2) + baseSize - 4;
    nextOffset += deltaSize * blockSize + baseSize + 9;
  }
  -- block;

  bsr.skip((blocks - block) * 7 + offset);
  int value = bsr.getI(9);
  int base = bsr.getI(baseSize);

  while(col-- > start) // 12 iterations on average
  {
    int delta = base + bsr.getUI(deltaSize);
    value += delta;
  }

  return value;
}

Here is the implementation of the bitstream reader:

  class BitstreamReader
  {
  public:
    BitstreamReader(size_t start): word_(start), bit_(0) {}

    void skip(unsigned offset)
    {
      // advance by whole words, including the carry out of the current bit position
      word_ += (bit_ + offset) / 16;
      bit_ = (bit_ + offset) % 16;
    }

    unsigned getUI(unsigned size)
    {
      unsigned old = bit_;
      unsigned result = dataTable[word_] >> bit_;
      result &= ((1 << size) - 1);
      bit_ += size;

      if (bit_ >= 16)
      {
        ++word_;
        bit_ -= 16;

        if (bit_ > 0)
        {
          result += (dataTable[word_] & ((1 << bit_) - 1)) << (16 - old);
        }
      }

      return result;
    }

    int getI(unsigned size)
    {
      int result = static_cast<int>(getUI(size));
      return result | -(result & (1 << (size - 1)));
    }

  private:
    size_t word_;
    unsigned bit_;
  };

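I computed some estimates for the size of the resulting data. (I am not posting the code that let me do this, because its quality is very low.) The result is 1250 bytes. That is larger than what the best compression programs would produce, but much smaller than any of the simple methods.

Update

1250 bytes is not the limit. The algorithm can be improved to compress the data harder and to work faster.

I noticed that the block count (5 bits) can be moved from the bitstream into the unused bits of the row index table. That saves about 30 bytes.

To save another 20 bytes, you can store the bitstreams in bytes rather than uint16, which saves the space taken by padding bits.

So we are at about 1200 bytes, which is not exact. The size may be somewhat underestimated, because I did not account for the fact that not every bit depth can be encoded in the row index. It may also be overestimated, because the only heuristic assumed for the encoder is to compute the bit depth for the first 9 values and to limit the block size only when that bit depth would have to grow by more than 2 bits. Of course the encoder could be smarter than that.

Decoding speed can also be improved. If we move the 9th bit of the original value into the row index, each index entry is exactly 8 bits. That lets the bitstream start with a set of bytes, each of which can be decoded faster than through the general-purpose bitstream accessors. For the same purpose, the remaining 8 bits of the original values can be moved to a position right after the row index. Or, optionally, they can be included in each index entry so that the index consists of 16-bit values. After these modifications, the bitstream contains only variable-length data fields.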

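Answer 7 (score: 3)

1049 bytes

I noticed that most of the runs are linear. That is why I decided to encode not the delta values but the deltas of the deltas. Think of it as the second derivative. This lets me store the values -1, 0, and 1 most of the time, with a few notable exceptions.

Second, I made the data one-dimensional. Converting it back to two dimensions is easy, but keeping it in one dimension allows a compressed run to span multiple rows.

The compressed data is organized in blocks of varying size. Each block starts with a header:

  • 9 bits - absolute value, the value of input[x]
  • 7 bits - difference, the value of input[x+1]-input[x]
  • 7 bits - difference, the value of input[x+2]-input[x+1]
  • 9 bits - length of the second-derivative data that follows
  • 2 bits each - the array of second derivatives

The runs of second derivatives in this data set are very long, even though only the values -2, -1, 0, and 1 can be stored.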
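
To make the "second derivative" idea concrete, here is a small sketch that computes second differences and rebuilds the values from them. It is a simplification of the block format above: it works on plain ints rather than 2-bit fields and keeps only one first difference as the seed. `secondDifferences` and `rebuild` are names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// d2[k] = (v[k+2] - v[k+1]) - (v[k+1] - v[k]); empty if v has fewer than 3 items.
std::vector<int> secondDifferences(const std::vector<int16_t>& v) {
    std::vector<int> d2;
    for (std::size_t i = 2; i < v.size(); ++i)
        d2.push_back((v[i] - v[i - 1]) - (v[i - 1] - v[i - 2]));
    return d2;
}

// Rebuild the sequence from the first value, the first difference,
// and the list of second differences.
std::vector<int16_t> rebuild(int16_t first, int firstDiff, const std::vector<int>& d2) {
    std::vector<int16_t> v{first, static_cast<int16_t>(first + firstDiff)};
    int diff = firstDiff;
    for (int step : d2) {
        diff += step;  // running first difference
        v.push_back(static_cast<int16_t>(v.back() + diff));
    }
    return v;
}
```

The start of the first row, 150, 145, 140, 135, ..., has second differences that are all 0, and the start of the second row, 143, 137, 131, 126, 120, 115, gives 0, 1, -1, 1. That is why 2 bits per entry are enough for most of the table.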

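Below I provide complete, compilable code. It contains:

  • C (GCC) code. No C++ constructs.
  • The input array you provided
  • Visualization functions for printing the contents of an array
  • The compression function (in case the input changes slightly)
  • The getter function, which extracts an element from the compressed array
  • In the main function: I compress, decompress, and run a check

Have fun!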

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef int16_t Arr[37][73];
typedef int16_t ArrFlat[37*73];
typedef int16_t* ArrPtr;

Arr input = { {150,145,140,135,130,125,120,115,110,105,100,95,90,85,80,75,70,65,60,55,50,45,40,35,30,25,20,15,10,5,0,-4,-9,-14,-19,-24,-29,-34,-39,-44,-49,-54,-59,-64,-69,-74,-79,-84,-89,-94,-99,104,109,114,119,124,129,134,139,144,149,154,159,164,169,174,179,175,170,165,160,155,150}, \
{143,137,131,126,120,115,110,105,100,95,90,85,80,75,71,66,62,57,53,48,44,39,35,31,27,22,18,14,9,5,1,-3,-7,-11,-16,-20,-25,-29,-34,-38,-43,-47,-52,-57,-61,-66,-71,-76,-81,-86,-91,-96,101,107,112,117,123,128,134,140,146,151,157,163,169,175,178,172,166,160,154,148,143}, \
{130,124,118,112,107,101,96,92,87,82,78,74,70,65,61,57,54,50,46,42,38,34,31,27,23,19,16,12,8,4,1,-2,-6,-10,-14,-18,-22,-26,-30,-34,-38,-43,-47,-51,-56,-61,-65,-70,-75,-79,-84,-89,-94,100,105,111,116,122,128,135,141,148,155,162,170,177,174,166,159,151,144,137,130}, \
{111,104,99,94,89,85,81,77,73,70,66,63,60,56,53,50,46,43,40,36,33,30,26,23,20,16,13,10,6,3,0,-3,-6,-9,-13,-16,-20,-24,-28,-32,-36,-40,-44,-48,-52,-57,-61,-65,-70,-74,-79,-84,-88,-93,-98,103,109,115,121,128,135,143,152,162,172,176,165,154,144,134,125,118,111}, \
{85,81,77,74,71,68,65,63,60,58,56,53,51,49,46,43,41,38,35,32,29,26,23,19,16,13,10,7,4,1,-1,-3,-6,-9,-13,-16,-19,-23,-26,-30,-34,-38,-42,-46,-50,-54,-58,-62,-66,-70,-74,-78,-83,-87,-91,-95,100,105,110,117,124,133,144,159,178,160,141,125,112,103,96,90,85}, \
{62,60,58,57,55,54,52,51,50,48,47,46,44,42,41,39,36,34,31,28,25,22,19,16,13,10,7,4,2,0,-3,-5,-8,-10,-13,-16,-19,-22,-26,-29,-33,-37,-41,-45,-49,-53,-56,-60,-64,-67,-70,-74,-77,-80,-83,-86,-89,-91,-94,-97,101,105,111,130,109,84,77,74,71,68,66,64,62}, \
{46,46,45,44,44,43,42,42,41,41,40,39,38,37,36,35,33,31,28,26,23,20,16,13,10,7,4,1,-1,-3,-5,-7,-9,-12,-14,-16,-19,-22,-26,-29,-33,-36,-40,-44,-48,-51,-55,-58,-61,-64,-66,-68,-71,-72,-74,-74,-75,-74,-72,-68,-61,-48,-25,2,22,33,40,43,45,46,47,46,46}, \
{36,36,36,36,36,35,35,35,35,34,34,34,34,33,32,31,30,28,26,23,20,17,14,10,6,3,0,-2,-4,-7,-9,-10,-12,-14,-15,-17,-20,-23,-26,-29,-32,-36,-40,-43,-47,-50,-53,-56,-58,-60,-62,-63,-64,-64,-63,-62,-59,-55,-49,-41,-30,-17,-4,6,15,22,27,31,33,34,35,36,36}, \
{30,30,30,30,30,30,30,29,29,29,29,29,29,29,29,28,27,26,24,21,18,15,11,7,3,0,-3,-6,-9,-11,-12,-14,-15,-16,-17,-19,-21,-23,-26,-29,-32,-35,-39,-42,-45,-48,-51,-53,-55,-56,-57,-57,-56,-55,-53,-49,-44,-38,-31,-23,-14,-6,0,7,13,17,21,24,26,27,29,29,30}, \
{25,25,26,26,26,25,25,25,25,25,25,25,25,26,25,25,24,23,21,19,16,12,8,4,0,-3,-7,-10,-13,-15,-16,-17,-18,-19,-20,-21,-22,-23,-25,-28,-31,-34,-37,-40,-43,-46,-48,-49,-50,-51,-51,-50,-48,-45,-42,-37,-32,-26,-19,-13,-7,-1,3,7,11,14,17,19,21,23,24,25,25}, \
{21,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,21,20,18,16,13,9,5,1,-3,-7,-11,-14,-17,-18,-20,-21,-21,-22,-22,-22,-23,-23,-25,-27,-29,-32,-35,-37,-40,-42,-44,-45,-45,-45,-44,-42,-40,-36,-32,-27,-22,-17,-12,-7,-3,0,3,7,9,12,14,16,18,19,20,21,21}, \
{18,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,18,17,16,14,10,7,2,-1,-6,-10,-14,-17,-19,-21,-22,-23,-24,-24,-24,-24,-23,-23,-23,-24,-26,-28,-30,-33,-35,-37,-38,-39,-39,-38,-36,-34,-31,-28,-24,-19,-15,-10,-6,-3,0,1,4,6,8,10,12,14,15,16,17,18,18}, \
{16,16,17,17,17,17,17,17,17,17,17,16,16,16,16,16,16,15,13,11,8,4,0,-4,-9,-13,-16,-19,-21,-23,-24,-25,-25,-25,-25,-24,-23,-21,-20,-20,-21,-22,-24,-26,-28,-30,-31,-32,-31,-30,-29,-27,-24,-21,-17,-13,-9,-6,-3,-1,0,2,4,5,7,9,10,12,13,14,15,16,16}, \
{14,14,14,15,15,15,15,15,15,15,14,14,14,14,14,14,13,12,11,9,5,2,-2,-6,-11,-15,-18,-21,-23,-24,-25,-25,-25,-25,-24,-22,-21,-18,-16,-15,-15,-15,-17,-19,-21,-22,-24,-24,-24,-23,-22,-20,-18,-15,-12,-9,-5,-3,-1,0,1,2,4,5,6,8,9,10,11,12,13,14,14}, \
{12,13,13,13,13,13,13,13,13,13,13,13,12,12,12,12,11,10,9,6,3,0,-4,-8,-12,-16,-19,-21,-23,-24,-24,-24,-24,-23,-22,-20,-17,-15,-12,-10,-9,-9,-10,-12,-13,-15,-17,-17,-18,-17,-16,-15,-13,-11,-8,-5,-3,-1,0,1,1,2,3,4,6,7,8,9,10,11,12,12,12}, \
{11,11,11,11,11,12,12,12,12,12,11,11,11,11,11,10,10,9,7,5,2,-1,-5,-9,-13,-17,-20,-22,-23,-23,-23,-23,-22,-20,-18,-16,-14,-11,-9,-6,-5,-4,-5,-6,-8,-9,-11,-12,-12,-12,-12,-11,-9,-8,-6,-3,-1,0,0,1,1,2,3,4,5,6,7,8,9,10,11,11,11}, \
{10,10,10,10,10,10,10,10,10,10,10,10,10,10,9,9,9,7,6,3,0,-3,-6,-10,-14,-17,-20,-21,-22,-22,-22,-21,-19,-17,-15,-13,-10,-8,-6,-4,-2,-2,-2,-2,-4,-5,-7,-8,-8,-9,-8,-8,-7,-5,-4,-2,0,0,1,1,1,2,2,3,4,5,6,7,8,9,10,10,10}, \
{9,9,9,9,9,9,9,10,10,9,9,9,9,9,9,8,8,6,5,2,0,-4,-7,-11,-15,-17,-19,-21,-21,-21,-20,-18,-16,-14,-12,-10,-8,-6,-4,-2,-1,0,0,0,-1,-2,-4,-5,-5,-6,-6,-5,-5,-4,-3,-1,0,0,1,1,1,1,2,3,3,5,6,7,8,8,9,9,9}, \
{9,9,9,9,9,9,9,9,9,9,9,9,8,8,8,8,7,5,4,1,-1,-5,-8,-12,-15,-17,-19,-20,-20,-19,-18,-16,-14,-11,-9,-7,-5,-4,-2,-1,0,0,1,1,0,0,-2,-3,-3,-4,-4,-4,-3,-3,-2,-1,0,0,0,0,0,1,1,2,3,4,5,6,7,8,8,9,9}, \
{9,9,9,8,8,8,9,9,9,9,9,8,8,8,8,7,6,5,3,0,-2,-5,-9,-12,-15,-17,-18,-19,-19,-18,-16,-14,-12,-9,-7,-5,-4,-2,-1,0,0,1,1,1,1,0,0,-1,-2,-2,-3,-3,-2,-2,-1,-1,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,8,9}, \
{8,8,8,8,8,8,9,9,9,9,9,9,8,8,8,7,6,4,2,0,-3,-6,-9,-12,-15,-17,-18,-18,-17,-16,-14,-12,-10,-8,-6,-4,-2,-1,0,0,1,2,2,2,2,1,0,0,-1,-1,-1,-2,-2,-1,-1,0,0,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,8}, \
{8,8,8,8,9,9,9,9,9,9,9,9,9,8,8,7,5,3,1,-1,-4,-7,-10,-13,-15,-16,-17,-17,-16,-15,-13,-11,-9,-6,-5,-3,-2,0,0,0,1,2,2,2,2,1,1,0,0,0,-1,-1,-1,-1,-1,0,0,0,0,-1,-1,-1,-1,-1,0,0,1,3,4,5,7,7,8}, \
{8,8,9,9,9,9,10,10,10,10,10,10,10,9,8,7,5,3,0,-2,-5,-8,-11,-13,-15,-16,-16,-16,-15,-13,-12,-10,-8,-6,-4,-2,-1,0,0,1,2,2,3,3,2,2,1,0,0,0,0,0,0,0,0,0,0,-1,-1,-2,-2,-2,-2,-2,-1,0,0,1,3,4,6,7,8}, \
{7,8,9,9,9,10,10,11,11,11,11,11,10,10,9,7,5,3,0,-2,-6,-9,-11,-13,-15,-16,-16,-15,-14,-13,-11,-9,-7,-5,-3,-2,0,0,1,1,2,3,3,3,3,2,2,1,1,0,0,0,0,0,0,0,-1,-1,-2,-3,-3,-4,-4,-4,-3,-2,-1,0,1,3,5,6,7}, \
{6,8,9,9,10,11,11,12,12,12,12,12,11,11,9,7,5,2,0,-3,-7,-10,-12,-14,-15,-16,-15,-15,-13,-12,-10,-8,-7,-5,-3,-1,0,0,1,2,2,3,3,4,3,3,3,2,2,1,1,1,0,0,0,0,-1,-2,-3,-4,-4,-5,-5,-5,-5,-4,-2,-1,0,2,3,5,6}, \
{6,7,8,10,11,12,12,13,13,14,14,13,13,11,10,8,5,2,0,-4,-8,-11,-13,-15,-16,-16,-16,-15,-13,-12,-10,-8,-6,-5,-3,-1,0,0,1,2,3,3,4,4,4,4,4,3,3,3,2,2,1,1,0,0,-1,-2,-3,-5,-6,-7,-7,-7,-6,-5,-4,-3,-1,0,2,4,6}, \
{5,7,8,10,11,12,13,14,15,15,15,14,14,12,11,8,5,2,-1,-5,-9,-12,-14,-16,-17,-17,-16,-15,-14,-12,-11,-9,-7,-5,-3,-1,0,0,1,2,3,4,4,5,5,5,5,5,5,4,4,3,3,2,1,0,-1,-2,-4,-6,-7,-8,-8,-8,-8,-7,-6,-4,-2,0,1,3,5}, \
{4,6,8,10,12,13,14,15,16,16,16,16,15,13,11,9,5,2,-2,-6,-10,-13,-16,-17,-18,-18,-17,-16,-15,-13,-11,-9,-7,-5,-4,-2,0,0,1,3,3,4,5,6,6,7,7,7,7,7,6,5,4,3,2,0,-1,-3,-5,-7,-8,-9,-10,-10,-10,-9,-7,-5,-4,-1,0,2,4}, \
{4,6,8,10,12,14,15,16,17,18,18,17,16,15,12,9,5,1,-3,-8,-12,-15,-18,-19,-20,-20,-19,-18,-16,-15,-13,-11,-8,-6,-4,-2,-1,0,1,3,4,5,6,7,8,9,9,9,9,9,9,8,7,5,3,1,-1,-3,-6,-8,-10,-11,-12,-12,-11,-10,-9,-7,-5,-2,0,1,4}, \
{4,6,8,11,13,15,16,18,19,19,19,19,18,16,13,10,5,0,-5,-10,-15,-18,-21,-22,-23,-22,-22,-20,-18,-17,-14,-12,-10,-8,-5,-3,-1,0,1,3,5,6,8,9,10,11,12,12,13,12,12,11,9,7,5,2,0,-3,-6,-9,-11,-12,-13,-13,-12,-11,-10,-8,-6,-3,-1,1,4}, \
{3,6,9,11,14,16,17,19,20,21,21,21,19,17,14,10,4,-1,-8,-14,-19,-22,-25,-26,-26,-26,-25,-23,-21,-19,-17,-14,-12,-9,-7,-4,-2,0,1,3,5,7,9,11,13,14,15,16,16,16,16,15,13,10,7,4,0,-3,-7,-10,-12,-14,-15,-14,-14,-12,-11,-9,-6,-4,-1,1,3}, \
{4,6,9,12,14,17,19,21,22,23,23,23,21,19,15,9,2,-5,-13,-20,-25,-28,-30,-31,-31,-30,-29,-27,-25,-22,-20,-17,-14,-11,-9,-6,-3,0,1,4,6,9,11,13,15,17,19,20,21,21,21,20,18,15,11,6,2,-2,-7,-11,-13,-15,-16,-16,-15,-13,-11,-9,-7,-4,-1,1,4}, \
{4,7,10,13,15,18,20,22,24,25,25,25,23,20,15,7,-2,-12,-22,-29,-34,-37,-38,-38,-37,-36,-34,-31,-29,-26,-23,-20,-17,-13,-10,-7,-4,-1,2,5,8,11,13,16,18,21,23,24,26,26,26,26,24,21,17,12,5,0,-6,-10,-14,-16,-16,-16,-15,-14,-12,-10,-7,-4,-1,1,4}, \
{4,7,10,13,16,19,22,24,26,27,27,26,24,19,11,-1,-15,-28,-37,-43,-46,-47,-47,-45,-44,-41,-39,-36,-32,-29,-26,-22,-19,-15,-11,-8,-4,-1,2,5,9,12,15,19,22,24,27,29,31,33,33,33,32,30,26,21,14,6,0,-6,-11,-14,-15,-16,-15,-14,-12,-9,-7,-4,-1,1,4}, \
{6,9,12,15,18,21,23,25,27,28,27,24,17,4,-14,-34,-49,-56,-60,-60,-60,-58,-56,-53,-50,-47,-43,-40,-36,-32,-28,-25,-21,-17,-13,-9,-5,-1,2,6,10,14,17,21,24,28,31,34,37,39,41,42,43,43,41,38,33,25,17,8,0,-4,-8,-10,-10,-10,-8,-7,-4,-2,0,3,6}, \
{22,24,26,28,30,32,33,31,23,-18,-81,-96,-99,-98,-95,-93,-89,-86,-82,-78,-74,-70,-66,-62,-57,-53,-49,-44,-40,-36,-32,-27,-23,-19,-14,-10,-6,-1,2,6,10,15,19,23,27,31,35,38,42,45,49,52,55,57,60,61,63,63,62,61,57,53,47,40,33,28,23,21,19,19,19,20,22}, \
{168,173,178,176,171,166,161,156,151,146,141,136,131,126,121,116,111,106,101,-96,-91,-86,-81,-76,-71,-66,-61,-56,-51,-46,-41,-36,-31,-26,-21,-16,-11,-6,-1,3,8,13,18,23,28,33,38,43,48,53,58,63,68,73,78,83,88,93,98,103,108,113,118,123,128,133,138,143,148,153,158,163,168} };

void visual(Arr arr) {
  int row;
  int col;
  for (row=0; row<37; ++row) {
    for (col=0; col<73; ++col)
      printf("%3d",arr[row][col]);
    printf("\n");
  }
}

void visualFlat(ArrFlat arr) {
  int cell;
  for (cell=0; cell<37*73; ++cell) {
    printf("%3d",arr[cell]);
  }
  printf("\n");
}

typedef struct {
  int16_t absolute:9;
  int16_t adiff:7;
  int16_t diff:7;
  unsigned short diff2_length:9;
} __attribute__((packed)) Header;

typedef union {
  struct {
  int16_t diff2_a:2;
  int16_t diff2_b:2;
  int16_t diff2_c:2;
  int16_t diff2_d:2;
  } __attribute__((packed));
  unsigned char all;
} Chunk;

int16_t chunkGet(Chunk k, int16_t offset) {
  switch (offset) {
    case 0 : return k.diff2_a;
    case 1 : return k.diff2_b;
    case 2 : return k.diff2_c;
    case 3 : return k.diff2_d;
  }
  return 0; /* not reached for offsets 0..3; avoids falling off the end */
}

void chunkSet(Chunk *k, int16_t offset, int16_t value) {
  switch (offset) {
    case 0 : k->diff2_a=value; break;
    case 1 : k->diff2_b=value; break;
    case 2 : k->diff2_c=value; break;
    case 3 : k->diff2_d=value; break;
    default: printf("Invalid offset %hd\n", offset);
  }
}

unsigned char data[1049];

void compress (ArrFlat src) {
  Chunk diffData;
  int16_t headerIdx=0;
  int16_t diffIdx;
  int16_t currentDiffValue;
  int16_t length=-3;
  int16_t shift=0;
  Header h;
  int16_t position=0;
  while (position<37*73) {
    if (length==-3) { //encode the absolute value
      h.absolute=currentDiffValue=src[position];
      ++position;
      ++length;
      continue;
    }
    if (length==-2) { //encode the first diff value
      h.adiff=currentDiffValue=src[position]-src[position-1];
      if (currentDiffValue<-64 || currentDiffValue>+63)
        printf("\nDIFF TOO BIG\n");
      ++position;
      ++length;
      continue;
    }
    if (length==-1) { //encode the second diff value
      h.diff=currentDiffValue=src[position]-src[position-1];
      if (currentDiffValue<-64 || currentDiffValue>+63)
        printf("\nDIFF TOO BIG\n");
      ++position;
      ++length;
      diffData.all=0;
      diffIdx=headerIdx+sizeof(Header);
      shift=0;
      continue;
    }
    //compute the diff2
    int16_t diff=src[position]-src[position-1];
    int16_t diff2=diff-currentDiffValue;
    if (diff2>1 || diff2<-2) { //big change - restart with header
      if (length>511)
        printf("\nLENGTH TOO LONG\n");
      if (shift!=0) { //store partial byte
        data[diffIdx]=diffData.all;
        diffData.all=0;
        ++diffIdx;
      }
      h.diff2_length=length;
      memcpy(data+headerIdx,&h,sizeof(Header));
      headerIdx=diffIdx;
      length=-3;
      continue;
    }
    chunkSet(&diffData,shift,diff2);
    shift+=1;
    currentDiffValue=diff;
    ++position;
    ++length;
    if (shift==4) {
      data[diffIdx]=diffData.all;
      diffData.all=0;
      ++diffIdx;
      shift=0;
    }
  }
  if (shift!=0) { //finalize
    data[diffIdx]=diffData.all;
    ++diffIdx;
  }
  h.diff2_length=length;
  memcpy(data+headerIdx,&h,sizeof(Header));
  headerIdx=diffIdx;
  printf("Ending byte=%hd\n",headerIdx);
}

int16_t get(int row, int col) {
  int idx=row*73+col;
  int dataIdx=0;
  int pos=0;
  int16_t absolute;
  int16_t diff;
  Header h;
  while (1) {
    memcpy(&h, data+dataIdx, sizeof(Header));
    if (idx==pos) return h.absolute;
    absolute=h.absolute+h.adiff;
    if (idx==pos+1) return absolute;
    diff=h.diff;
    absolute+=diff;
    if (idx==pos+2) return absolute;
    dataIdx+=sizeof(Header);
    pos+=3;
    if (pos+h.diff2_length <= idx) {
      pos+=h.diff2_length;
      dataIdx+=(h.diff2_length+3)/4;
    } else break;
  }
  int shift=4;
  Chunk diffData;
  while (pos<=idx) {
    if (shift==4) {
      diffData.all=data[dataIdx];
      ++dataIdx;
      shift=0;
    }
    diff+=chunkGet(diffData,shift);
    absolute+=diff;
    ++shift;
    ++pos;
  }
  return absolute;
}

int main() {
  printf("Input:\n");
  visual(input);
  int row;
  int col;
  ArrPtr flatInput=(ArrPtr)input;
  printf("sizeof(Header)=%zu\n",sizeof(Header));
  printf("sizeof(Chunk)=%zu\n",sizeof(Chunk));
  compress(flatInput);
  ArrFlat re;
  for (row=0; row<37; ++row)
    for (col=0; col<73; ++col) {
      int cell=row*73+col;
      re[cell]=get(row,col);
      if (re[cell]!=flatInput[cell])
        printf("ERROR DETECTED IN CELL %d\n",cell);
    }
  visual((int16_t (*)[73])re); /* view the flat array as 37 rows of 73 */
  return 0;
}

Visual Studio version (compiled with VS 2010)

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef int16_t Arr[37][73];
typedef int16_t ArrFlat[37*73];
typedef int16_t* ArrPtr;

Arr input = { [... your array as above ...] };

void visual(Arr arr) {
    int row;
    int col;
    for (row=0; row<37; ++row) {
        for (col=0; col<73; ++col)
            printf("%3d",arr[row][col]);
        printf("\n");
    }
}

void visualFlat(ArrFlat arr) {
    int cell;
    for (cell=0; cell<37*73; ++cell) {
        printf("%3d",arr[cell]);
    }
    printf("\n");
}

#pragma pack(1)
typedef struct {
    int16_t absolute:9;
    int16_t adiff:7;
    int16_t diff:7;
    unsigned short diff2_length:9;
} Header;

#pragma pack(1)
typedef union {
    struct {
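        /* signed char: each field must hold -2..1 even where plain char is unsigned */
        signed char diff2_a:2;
        signed char diff2_b:2;
        signed char diff2_c:2;
        signed char diff2_d:2;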
    };
    unsigned char all;
} Chunk;

int16_t chunkGet(Chunk k, int16_t offset) {
    switch (offset) {
    case 0 : return k.diff2_a;
    case 1 : return k.diff2_b;
    case 2 : return k.diff2_c;
    case 3 : return k.diff2_d;
    default: return 0; /* not reached for offsets 0..3; avoids falling off the end */
    }
}

void chunkSet(Chunk *k, int16_t offset, int16_t value) {
    switch (offset) {
    case 0 : k->diff2_a=value; break;
    case 1 : k->diff2_b=value; break;
    case 2 : k->diff2_c=value; break;
    case 3 : k->diff2_d=value; break;
    default: printf("Invalid offset %hd\n", offset);
    }
}

unsigned char data[1049];

void compress (ArrFlat src) {
    Chunk diffData;
    int16_t headerIdx=0;
    int16_t diffIdx;
    int16_t currentDiffValue;
    int16_t length=-3;
    int16_t shift=0;
    int16_t diff;
    int16_t diff2;
    Header h;
    int16_t position=0;
    while (position<37*73) {
        if (length==-3) { //encode the absolute value
            h.absolute=currentDiffValue=src[position];
            ++position;
            ++length;
            continue;
        }
        if (length==-2) { //encode the first diff value
            h.adiff=currentDiffValue=src[position]-src[position-1];
            if (currentDiffValue<-64 || currentDiffValue>+63)
                printf("\nDIFF TOO BIG\n");
            ++position;
            ++length;
            continue;
        }
        if (length==-1) { //encode the second diff value
            h.diff=currentDiffValue=src[position]-src[position-1];
            if (currentDiffValue<-64 || currentDiffValue>+63)
                printf("\nDIFF TOO BIG\n");
            ++position;
            ++length;
            diffData.all=0;
            diffIdx=headerIdx+sizeof(Header);
            shift=0;
            continue;
        }
        //compute the diff2
        diff=src[position]-src[position-1];
        diff2=diff-currentDiffValue;
        if (diff2>1 || diff2<-2) { //big change - restart with header
            if (length>511)
                printf("\nLENGTH TOO LONG\n");
            if (shift!=0) { //store partial byte
                data[diffIdx]=diffData.all;
                diffData.all=0;
                ++diffIdx;
            }
            h.diff2_length=length;
            memcpy(data+headerIdx,&h,sizeof(Header));
            headerIdx=diffIdx;
            length=-3;
            continue;
        }
        chunkSet(&diffData,shift,diff2);
        shift+=1;
        currentDiffValue=diff;
        ++position;
        ++length;
        if (shift==4) {
            data[diffIdx]=diffData.all;
            diffData.all=0;
            ++diffIdx;
            shift=0;
        }
    }
    if (shift!=0) { //finalize
        data[diffIdx]=diffData.all;
        ++diffIdx;
    }
    h.diff2_length=length;
    memcpy(data+headerIdx,&h,sizeof(Header));
    headerIdx=diffIdx;
    printf("Ending byte=%hd\n",headerIdx);
}

int16_t get(int row, int col) {
    int idx=row*73+col;
    int dataIdx=0;
    int pos=0;
    int16_t absolute;
    int16_t diff;
    int shift;
    Header h;
    Chunk diffData;
    while (1) {
        memcpy(&h, data+dataIdx, sizeof(Header));
        if (idx==pos) return h.absolute;
        absolute=h.absolute+h.adiff;
        if (idx==pos+1) return absolute;
        diff=h.diff;
        absolute+=diff;
        if (idx==pos+2) return absolute;
        dataIdx+=sizeof(Header);
        pos+=3;
        if (pos+h.diff2_length <= idx) {
            pos+=h.diff2_length;
            dataIdx+=(h.diff2_length+3)/4;
        } else break;
    }
    shift=4;

    while (pos<=idx) {
        if (shift==4) {
            diffData.all=data[dataIdx];
            ++dataIdx;
            shift=0;
        }
        diff+=chunkGet(diffData,shift);
        absolute+=diff;
        ++shift;
        ++pos;
    }
    return absolute;
}

int main() {
    int row;
    int col;
    ArrPtr flatInput=(ArrPtr)input;
    ArrFlat re;

    printf("Input:\n");
    visual(input);
    printf("sizeof(Header)=%lu\n",(unsigned long)sizeof(Header));
    printf("sizeof(Chunk)=%lu\n",(unsigned long)sizeof(Chunk));
    compress(flatInput);

    for (row=0; row<37; ++row)
        for (col=0; col<73; ++col) {
            int cell=row*73+col;
            re[cell]=get(row,col);
            if (re[cell]!=flatInput[cell])
                printf("ERROR DETECTED IN CELL %d\n",cell);
        }
    visual((int16_t (*)[73])re); /* view the flat array as 37 rows of 73 */
    return 0;
}

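Answer 8 (score: 1)

Here is another possibility:

  • Keep two arrays: a main one and an overflow one.
  • Each element of the main array holds 7 bits of actual data plus 1 "status" bit.
  • If the status bit is clear, the value fits in the remaining 7 bits.
  • If the status bit is set, part of the value is still held in those 7 bits, but the remaining bits are stored in the overflow array.
  • The index into the overflow array is found by counting all preceding elements of the main array that have the status bit set.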
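
The scheme in the list above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the names `main_arr`, `overflow`, `pack` and `lookup`, the table size `N`, and the choice of an `int8_t` overflow element (which limits values to the range -16384..16383) are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

enum { N = 8 };                 /* hypothetical table size for the sketch */

static uint8_t main_arr[N];     /* bit 7 = status bit, bits 0..6 = low 7 bits of the value */
static int8_t  overflow[N];     /* high part of each value that did not fit in 7 bits */

/* Pack src[0..n) into the two arrays; returns the number of overflow entries used.
   Assumes n <= N and every value lies in -16384..16383. */
static size_t pack(const int16_t *src, size_t n) {
    size_t used = 0;
    for (size_t i = 0; i < n; ++i) {
        int v = src[i];
        int low = v & 0x7F;                      /* low 7 bits, always 0..127 */
        main_arr[i] = (uint8_t)low;
        if (v < -64 || v > 63) {                 /* does not fit in 7 signed bits */
            main_arr[i] |= 0x80;                 /* set the status bit */
            overflow[used++] = (int8_t)((v - low) / 128);
        }
    }
    return used;
}

/* Read back element i. Fast when the status bit is clear, linear otherwise. */
static int16_t lookup(size_t i) {
    int low = main_arr[i] & 0x7F;
    if (!(main_arr[i] & 0x80))
        return (int16_t)(low >= 64 ? low - 128 : low);   /* sign-extend 7 bits */
    size_t k = 0;                                        /* index into overflow[] */
    for (size_t j = 0; j < i; ++j)
        k += main_arr[j] >> 7;                           /* count preceding status bits */
    return (int16_t)(overflow[k] * 128 + low);
}
```

For the 37x73 table, `get(row, col)` would simply call `lookup(row*73 + col)`. The linear scan inside `lookup` is the slow path listed among the disadvantages of this approach.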


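This has the following advantages:

  • Very fast lookup of values that fit in 7 bits.
  • Can handle an unlimited range of values (by using suitably large elements in the overflow array, or by repeating the algorithm and stacking another overflow array on top, and so on).
  • On the other hand, if you know the values always fit in 9 bits, you can use 2-bit elements in the overflow array to save additional space (this takes some bit-twiddling, but is doable).
  • For some data distributions it may use less space than plain 9-bit elements (in a single array, or in an 8-bit array + 1-bit array), namely when most values fit in 7 bits.
  • Fairly simple to implement, so the code size will not eat up the savings in data.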

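Disadvantages:

  • Slow lookup of values that do not fit in 7 bits. Accessing such a value requires a linear pass over all preceding elements of the main array (checking their status bits) to determine the index into the overflow array.
  • For some other data distributions it may use more space than the 9-bit approach, namely when many values do not fit in 7 bits.
  • Not as simple as the 8-bit array + 1-bit array approach, so the code will be slightly larger than that, though still not very large.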

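Answer 9 (score: 1)

726 bytes

This algorithm encodes the difference between the actual value and the value produced by linear extrapolation from the previous values. In other words, it uses a first-order Taylor series or, as CygnusX1 calls it, delta-delta.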
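
As a rough illustration of the extrapolation step only (the arithmetic coder is left out, and this is not the answerer's encoder), the residuals could be computed as below. The helper names `predict`, `to_residuals` and `from_residuals` are made up for this sketch, and the handling of the first two elements is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Linear extrapolation from the two previous values: v[i-1] + (v[i-1] - v[i-2]).
   Assumption for the sketch: predict 0 for the first element and v[0] for the second. */
static int predict(const int16_t *v, size_t i) {
    if (i == 0) return 0;
    if (i == 1) return v[0];
    return 2 * v[i - 1] - v[i - 2];
}

/* res[i] = actual value minus extrapolated value. */
static void to_residuals(const int16_t *src, int16_t *res, size_t n) {
    for (size_t i = 0; i < n; ++i)
        res[i] = (int16_t)(src[i] - predict(src, i));
}

/* Inverse transform: rebuild the values from the residuals, left to right. */
static void from_residuals(const int16_t *res, int16_t *dst, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int16_t)(res[i] + predict(dst, i));
}
```

For the first six values of the second row of the table (143, 137, 131, 126, 120, 115) this yields the residuals 143, -6, 0, 1, -1, 1, which shows why most of the stream ends up close to zero.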

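After this extrapolation coding, most values fall in the range [-1 .. 1]. That is a good reason to use arithmetic coding or range encoding. I used the arithmetic coder implementation by Arturo San Emeterio Campos. A range coder algorithm by the same author is also available.

Small values in the range [-2 .. 2] are compressed by the arithmetic coder, while larger values are packed as 4-bit nibbles.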

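A few more optimizations are used to pack it tighter:

  • All values are compressed into one continuous stream.
  • The last column is not encoded at all, since it equals the first column.
  • While encoding the first column, the history is only partially updated, to improve the results for the second column.
  • The few cases where the value jumps from -100 to 100 are handled differently.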

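The algorithm is slow: extracting a single value takes up to 8000 32-bit integer divisions and a lot of bit manipulation. But it packs the data into a 726-byte array, and the code is not very large.

Speed can be improved (to about 2800 32-bit integer divisions) if the frequency table is scaled properly. Using range encoding instead of arithmetic coding could also improve speed. Space can be reduced if both the arithmetic coder data and the nibbles are packed into byte arrays instead of uint16 arrays (saving 2 bytes), and if up to two leading zero bytes are overlapped with the end of some other data structure (saving 1..2 bytes). Using a second-order Taylor series gained no space, but other extrapolation methods might bring some improvement.

The complete implementation can be found here: encoder, decoder and a test. Tested on GCC.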

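Answer 10 (score: 0)

If the sum of code size and data size matters, do not forget to check the size of the compiled code. Below is an example that uses a plain 8-bit encoding for the data (a 50% gain) and optimizes for code size.

We store 8-bit values for each row: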

    unsigned char *row_data = &compressed_data[row*73]; /* start of this row in the flat byte array */
    int value = row_data[column];

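For the first rows, split each row into two parts. The first value is encoded directly. The rest of the first part is stored as a negative delta from the first value. The second part is encoded as a positive delta from 100.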

    if (row <= 4) {
        char brk = break_point[row];   /* renamed: "break" is a C keyword */
        if (column >= brk) return 100 + value;
        if (column == 0) return value;
        return row_data[0] - value;
    }

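break_point holds the positions of 104, 101, 100, 103, 110 in the first five rows. I have not checked whether it can be computed instead of stored. Could it be 51 + row?

After row 5 the values become smoother, and we can store them as 8-bit two's complement. The exception is the last row.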

    if (row != 36) return (signed char) value;

The last row can be encoded like this, without any data (saving 73 bytes):

    value = 168+5*column;
    if (value <= 178) return value;
    /* x is not declared in this fragment; presumably it is the 168+5*column computed above */
    value = 359 - x; /* 359 = 176 + 183 */
    if (value >= 101) return value;
    value = -x;
    if (value > 0) x--;
    return value;

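This takes about 2640 bytes, but access is very fast and the code is compact.

The first rows could be encoded similarly to the last row (an increment of -5, a sign change at -104, and the 359-x "flip" at 184), which would save 70 bytes of data at some cost in code size.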

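Answer 11 (score: 0)

If the duplicates are consecutive and you have CPU cycles to spare, you can use run-length encoding.

Unfortunately the data set looks too dense for a DFA... but you could certainly get one working. It requires preprocessing and is super fast. The generated code may well be larger than the 4K data set, so it may not be an option.

Assuming only a few of your values need 16 bits, a hash may work for the oversized entries (see: google sparsehash)... with an overhead of 1 bit or more per entry.

You could also use 9-bit values and manage the byte boundaries by hand, which has the same overhead as a separate bit array... possibly more.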
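
The last suggestion, 9-bit values with hand-managed byte boundaries, might look roughly like the sketch below. The function names `put9` and `get9` and the little-endian bit order within the stream are assumptions chosen for the example, not something the answer specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v (-256..255) as a 9-bit two's-complement field at slot i of buf.
   buf must hold at least (count*9 + 7) / 8 bytes. */
static void put9(uint8_t *buf, size_t i, int v) {
    unsigned u = (unsigned)v & 0x1FFu;           /* keep the low 9 bits */
    size_t bit = i * 9;                          /* position of the field in the bit stream */
    for (int b = 0; b < 9; ++b, ++bit) {
        uint8_t mask = (uint8_t)(1u << (bit & 7));
        if (u & (1u << b)) buf[bit >> 3] |= mask;
        else               buf[bit >> 3] &= (uint8_t)~mask;
    }
}

/* Read slot i back and sign-extend it from 9 bits. */
static int get9(const uint8_t *buf, size_t i) {
    size_t bit = i * 9;
    size_t byte = bit >> 3;
    unsigned off = (unsigned)(bit & 7);
    /* a 9-bit field always straddles exactly two consecutive bytes */
    unsigned two = buf[byte] | ((unsigned)buf[byte + 1] << 8);
    unsigned u = (two >> off) & 0x1FFu;
    return u >= 256 ? (int)u - 512 : (int)u;
}
```

With this layout the 37*73 = 2701 entries would occupy 3039 bytes instead of 5402, and a lookup would be `get9(packed, row*73 + col)`, where `packed` is a hypothetical name for the ROM array.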