java - What is special about TIFF 5.0-style LZW compression?

Date: 2014-10-14 17:25:06

Tags: java decode tiff lzw

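I am writing a TIFF decoder. The LZW decoder I use works for all LZW-compressed GIF and TIFF images except one, which overflows the buffer that holds the decoded code strings. I tested that image with TIFFLZWDecompressor from the com.sun.media.imageioimpl.plugins.tiff package, and it throws the following exception: "java.lang.UnsupportedOperationException: TIFF 5.0-style LZW codes are not supported".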

I have been trying to find out what is special about 5.0-style LZW, without success. Does anyone have any idea?

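Note: according to the TIFFLZWDecompressor source code, the indicator of TIFF 5.0-style LZW compression is that the first two bytes of the compressed data are {0x00, 0x01}.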
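Going only by the note above, such a check could look roughly like the sketch below. The class and method names are hypothetical, and this is not a copy of the actual TIFFLZWDecompressor source.

```java
final class LzwStyleCheck {
    /**
     * Returns true if the compressed data starts with the two bytes
     * 0x00, 0x01 that the note above describes as the indicator.
     * (Hypothetical helper, written for illustration only.)
     */
    static boolean looksLikeTiff5Lzw(byte[] data) {
        return data != null
                && data.length >= 2
                && data[0] == 0x00
                && data[1] == 0x01;
    }
}
```

A decoder could call this on the first strip or tile before decoding and fall back to a different code path instead of failing.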

2 answers:

Answer 0: (score: 3)

I've bumped into the same problem recently while writing a TIFF LZW encoder. A TIFF check tool complained about "old-style LZW codes", while decoding the image properly. After some research, I found out that there has been a change in the implementation of the LZW compressor. The original ("old-style") format used exactly the same mode of operation as the GIF LZW compressor. Actually, you can use a working GIF compressor and snap it into a TIFF implementation without much effort, and it will yield files that are accepted by most TIFF readers. (One notable exception I've found was Corel PaintShop Pro X7.)

The difference between "old-style" and "new-style" applies to two encoding details:

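LZW codes are written to the stream in reversed bit order: "new-style" packs each code most-significant bit first, whereas "old-style" (like GIF) packs it least-significant bit first.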
"New-style" increases the code size one symbol earlier than "old-style" (so-called "Early Change").

Clever TIFF decoders inspect the first two bytes of the bit stream to detect "old-style" encoding. This is possible because the first symbol emitted is always the 9-bit clear code 0x100. In an "old-style" stream the eight low-order zero bits are written first, so the first byte is 0x00 and the leading 1 bit ends up as the lowest bit of the second byte. A "new-style" stream starts with the leading 1 bit, so its first byte is 0x80.
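
To see where those leading bytes come from, one can pack the 9-bit clear code 0x100 in both bit orders. The following is a small self-contained sketch; `ClearCodeDemo` and `pack` are names invented for this example.

```java
final class ClearCodeDemo {
    /** Packs fixed-width codes into bytes, either MSB-first or LSB-first. */
    static byte[] pack(int[] codes, int width, boolean msbFirst) {
        byte[] out = new byte[(codes.length * width + 7) / 8];
        int bitPos = 0;
        for (int code : codes) {
            for (int i = 0; i < width; i++, bitPos++) {
                // MSB-first emits the top bit of the code first, LSB-first the bottom bit.
                int bit = msbFirst ? (code >> (width - 1 - i)) & 1 : (code >> i) & 1;
                // MSB-first fills each byte from bit 7 down, LSB-first from bit 0 up.
                int shift = msbFirst ? 7 - (bitPos & 7) : (bitPos & 7);
                out[bitPos >> 3] |= (byte) (bit << shift);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] codes = {0x100}; // the clear code, 9 bits wide
        byte[] newStyle = pack(codes, 9, true);
        byte[] oldStyle = pack(codes, 9, false);
        System.out.printf("new-style: %02X %02X%n", newStyle[0] & 0xFF, newStyle[1] & 0xFF);
        System.out.printf("old-style: %02X %02X%n", oldStyle[0] & 0xFF, oldStyle[1] & 0xFF);
        // prints:
        // new-style: 80 00
        // old-style: 00 01
    }
}
```

In a real stream the remaining seven bits of the second byte belong to the next code, so only the first byte and the lowest bit of the second byte are fixed for an "old-style" stream.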

Answer 1: (score: 2)

The TIFF 6.0 specification says:

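It is also possible to implement a version of LZW in which the LZW character depth equals BitsPerSample, as described in Draft 2 of Revision 5.0. But there is a major problem with this approach. If the BitsPerSample is greater than 11, we can not use 12-bit-maximum codes, and the resulting LZW table is unacceptably large.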

(TIFF6.pdf, pages 58-59)

This may be what they are referring to.

On the other hand... in my own reader, I found:

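Note: this is a specification violation. However, libTiff reads such files. The TIFF 6.0 specification, Section 13: "LZW Compression"/"The Algorithm", page 61, says: "LZW compression codes are stored into bytes in high-to-low-order fashion, i.e., FillOrder is assumed to be 1. The compressed codes are written as bytes (not words) so that the compressed data will be identical whether it is an 'II' or 'MM' file."

The thing about 0x00, 0x01 is that it is in fact the "Clear Code" in "reverse" (i.e. following the byte order, rather than ignoring it as the specification says).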