Run-length encoding (integers)

Date: 2016-11-07 20:07:48

Tags: c++ encoding integer run-length-encoding

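In class we covered RLE, and our professor showed us the following code. I have tried to understand it, but I don't quite get it, so I would be very grateful if someone could explain how RLE works in this example. I do understand how the data is compressed in principle, but I don't understand how the program implements it. My questions are in the comments.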

// Example implementation of a simple variant of
// run-length encoding and decoding of a byte sequence

#include <iostream> 
#include <cassert>

// PRE: 0 <= value <= 255 
// POST: returns true if value is first byte of a tuple, otherwise false 

bool is_tuple_start(const unsigned int value) 
{ 
    assert(0 <= value && value <= 255);
    return value >= 128; //Why is it: value>=128 for first Byte of tuple?
}

// PRE: 1 <= runlength <= 127 //Why must runlength be in this range?
// POST: returns encoded runlength byte 

unsigned int make_tuple_start(const unsigned int run_length) 
{ 
    assert(1 <= run_length && run_length <= 127);
    return run_length + 128; //Why do I add 128?
}

// PRE: n/a 
// POST: returns true if value equals the maximal run-length 

bool is_max_runlength(const unsigned int value)  
{
    return value == 127; //same question: why is max. range 127?
}

// PRE: 128 <= value <= 255 //Why this range for value?
// POST: returns runlength of tuple 

unsigned int get_runlength(const unsigned int value) 
{ 
    assert(128 <= value && value <= 255);
    return value - 128; //Why -128?
}

// PRE: n/a 
// POST: outputs value and adds a newline 

void out_byte(const unsigned int value) 
{ 
    std::cout << value << "\n"; 
}

// PRE: 1 <= runlength <= 127 and 0 <= value <= 255 
// POST: outputs run length encoded bytes of tuple 

void output(const unsigned int run_length, const unsigned int value) 
{ 
    assert(1 <= run_length && run_length <= 127); 
    assert(0 <= value && value <= 255); //Why is value now between 0 and 255?

    if (run_length == 1 && !is_tuple_start(value)) 
        { 
            out_byte(value); 
        } 
    else 
        { 
            out_byte(make_tuple_start(run_length)); 
            out_byte(value); 
        }
}

// PRE: n/a 
// POST: returns true if 0 <= value <= 255, otherwise false 

bool is_byte(const int value) 
{ 
    return 0 <= value && value <= 255; 
}

// PRE: n/a 
// POST: outputs error if value does not indicate end of sequence 

void check_end_of_sequence(const int value) 
{ 
    if (value != -1) 
        { 
            std::cout << "error\n"; 
        } 
}

// PRE: n/a 
// POST: reads byte sequence and outputs encoded bytes 

void encode() 
{ 
    std::cout << "--- encoding: enter byte sequence, terminate with -1\n";
    int value;

    std::cin >> value;

    if (is_byte(value)) 
        { 
            int prev_value = value; //When/Where does value Change?
            unsigned int run_length = 1;

            while(true) 
                {
                    // read next byte, stop if invalid or end of sequence 

                    std::cin >> value; 
                    if (!is_byte(value)) 
                        { break; }

                    // output if value has changed or maximal runlength is reached 
                    // otherwise increase length of current run 

                    if (value != prev_value || is_max_runlength(run_length)) 
                        { 
                            output(run_length, prev_value); 
                            run_length = 1; 
                            prev_value = value; 
                        } 
                    else { ++run_length; }
                }
            output(run_length, prev_value);
        }

    // output "error" if sequence terminated incorrectly 

    check_end_of_sequence(value);
}

// PRE: n/a 
// POST: reads byte sequence and outputs decoded bytes 

void decode() 
{ 
    std::cout << "--- decoding: enter byte sequence, terminate with -1\n";
    int value; 

    while(true) {

        // read next byte, stop if invalid or end of sequence 

        std::cin >> value; //is value only a Byte? Or the whole sequence?

        if (!is_byte(value)) 
            { break; }

        // if this is a tuple output read next byte, otherwise output directly 

        if (is_tuple_start(value)) 
            {
                unsigned int run_length = get_runlength(value);

                // next must be a valid byte, otherwise this is an error 
                std::cin >> value; 

                if (!is_byte(value)) 
                    { 
                        // force check_end_of_sequence() to report "error",
                        // even if the missing byte was the terminator -1
                        value = 0;
                        break; 
                    }

                // output uncompressed tuple 

                for(unsigned int i = 0; i < run_length; ++i) 
                    { 
                        out_byte(value); 
                    }
            } 

        else { out_byte(value); }
    }

    // output "error" if sequence terminated incorrectly 

    check_end_of_sequence(value);
}


int main() 
{ 
    std::cout << "--- select mode: 0 = encode / 1 = decode\n"; 

    unsigned int mode; 
    std::cin >> mode;

    if (mode == 0) 
        { 
            encode(); 
        } 
    else if (mode == 1) 
        { 
            decode();
        } 
    else 
        { 
            std::cout << "--- unknown mode, must be 0 (encode) or 1 (decode)\n"; 
        }
}

I'd like answers to my questions. The code should be readable; it is basically a copy+paste from my lecture notes.

1 Answer

Answer 0 (score: 2)

This encoding stores a run of repeated values as:

<length> <value>

and a non-repeated value simply as:

<value>

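But when you see a number in the encoded sequence, how do you know whether it is the <length> part of the first format or just a non-repeated value? The encoding solves this with a rule: 128 is added to the length before it is written. So any number >= 128 is a <length> byte that starts the first format, and the actual run length is that number minus 128.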
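A minimal sketch of the decoder side of this rule, assuming the encoded bytes are already in a std::vector<unsigned int> instead of being read from std::cin. rle_decode is a hypothetical name, not part of the lecture code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector-based decoder (the lecture code reads from std::cin).
// Any byte >= 128 is a length byte: length = byte - 128, and the next byte
// is the value to repeat. Any byte < 128 is a literal value.
std::vector<unsigned int> rle_decode(const std::vector<unsigned int>& encoded)
{
    std::vector<unsigned int> out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        const unsigned int b = encoded[i];
        assert(b <= 255);
        if (b >= 128) {
            assert(i + 1 < encoded.size());   // a length byte needs a value after it
            const unsigned int run_length = b - 128;
            const unsigned int value = encoded[++i];
            // append run_length copies of value
            out.insert(out.end(), static_cast<std::size_t>(run_length), value);
        } else {
            out.push_back(b);                  // literal byte, copied as-is
        }
    }
    return out;
}
```

For example, the encoded sequence 131 7 3 decodes to 7 7 7 3, because 131 - 128 = 3 and 3 is below 128.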

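But what if a non-repeated value is 128 or higher? The solution is to use the first format for such large values as well, treating each one as a repeated value with runlength = 1.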
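The encoder side of that decision can be sketched as below. emit_run is a hypothetical helper that follows the same logic as the professor's output(), but appends to a std::vector instead of printing:

```cpp
#include <cassert>
#include <vector>

// Emit one run of `value` repeated `run_length` times (hypothetical helper).
// A single value < 128 is written as-is. A single value >= 128 would look
// like a length byte, so it is written as the tuple <128 + 1> <value>.
void emit_run(std::vector<unsigned int>& out, unsigned int run_length, unsigned int value)
{
    assert(1 <= run_length && run_length <= 127);
    assert(value <= 255);
    if (run_length == 1 && value < 128) {
        out.push_back(value);              // plain literal
    } else {
        out.push_back(128 + run_length);   // flagged length byte
        out.push_back(value);              // value to repeat
    }
}
```

With this rule a single 5 becomes 5, a single 200 becomes 129 200, and a run of three 7s becomes 131 7.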

That should answer most of your questions, since they are all about the ranges and the adding or subtracting of 128.

Why must runlength be in this range?

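We store everything as bytes between 0 and 255. If the length were greater than 127, adding 128 to it would give a number > 255, which does not fit in a byte. The lower bound is 1 because a run of length 0 would encode nothing.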
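The arithmetic can be checked directly. kFlag, kMaxRun, length_byte and length_byte_unchecked are illustrative names, not from the lecture code; the unchecked function deliberately skips the range check to show what would go wrong with a run of 128:

```cpp
#include <cassert>
#include <cstdint>

// The length byte is 128 + run_length and must still fit into 0..255.
constexpr unsigned int kFlag = 128;
constexpr unsigned int kMaxRun = 255 - kFlag;   // largest run that still fits

static_assert(kMaxRun == 127, "127 + 128 == 255, the largest byte value");

// Checked version, matching the precondition of make_tuple_start().
std::uint8_t length_byte(unsigned int run_length)
{
    assert(1 <= run_length && run_length <= kMaxRun);
    return static_cast<std::uint8_t>(kFlag + run_length);
}

// Ignoring the limit: 128 + 128 = 256 wraps to 0 in a uint8_t, and a 0 would
// be decoded as a literal value instead of a length.
std::uint8_t length_byte_unchecked(unsigned int run_length)
{
    return static_cast<std::uint8_t>(kFlag + run_length);
}
```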

Is value only a byte? Or the whole sequence?

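It is declared as int value;, so it holds just a single number. Each cin >> value; reads the next byte of the sequence into it, overwriting the previous one. It is an int rather than an unsigned type so that it can also hold the terminator -1.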
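To see that each extraction reads exactly one number, here is a small sketch that uses std::istringstream as a stand-in for std::cin. read_until_terminator is a hypothetical helper, not part of the lecture code:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Each `in >> value` extracts exactly one whitespace-separated integer.
// value is an int, not unsigned, so the terminator -1 can be read as well.
std::vector<int> read_until_terminator(std::istream& in)
{
    std::vector<int> values;
    int value;
    while (in >> value && value != -1) {
        values.push_back(value);   // one extraction, one byte of the sequence
    }
    return values;
}
```

Reading from std::istringstream s("131 7 3 -1") yields 131, 7 and 3 one at a time, and the loop stops at -1.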

Why is value now between 0 and 255?

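A value may always be any full byte (0 to 255); only lengths are limited to 127, because 128 is added to them. As explained above, high values are always encoded as a tuple with the length first.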