Remove images from a PDF

Time: 2011-06-24 11:17:48

Tags: image pdf ghostscript

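I read "Create a tiff with only text and no images from a postscript file with ghostscript" and tried KenS's answer. But that method only removes "black" images, that is, images that contain data only in the black channel (the PDF uses the CMYK color space). How can I remove all images in my case?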

2 answers:

Answer 0: (score: 2)

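This does a better job, but it is incomplete. For example, it does not handle images that use multiple data sources. It is essentially untested, except that I tried it on your smaller file (pages.pdf): I converted the file to PostScript with ps2write, then ran the result through the PostScript program below with the pdfwrite device to convert it back to PDF.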

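The first thing you will notice is that nearly all the text has vanished from your document. That is because the fonts you are using are bitmap fonts, and the program cannot tell the difference between a bitmap that represents a character and any other kind of bitmap. For this file you can solve that by removing the definition of imagemask, because all the characters use imagemask while the other images use 'image'.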
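If you would rather not edit the program, one alternative is to drop the imagemask override again once the program has been loaded. This is only a minimal sketch and untested against your file. It assumes the redefinitions end up in userdict (the default when the program runs at the top level) and that the interpreter supports the Level 2 `undef` operator.

```postscript
% Run after the image-removal program and before the document itself.
% Assumption: /imagemask was redefined in userdict. Removing that entry
% lets name lookup fall through to the built-in operator in systemdict,
% so bitmap-font glyphs painted with imagemask are drawn again.
userdict /imagemask known {
  userdict /imagemask undef
} if
```

The 'image' and 'colorimage' overrides stay in place, so ordinary images are still dropped.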

I have a sneaking suspicion that the formatting of the program is going to get messed up here :-(

8<------------------------------8<--------------------------8<-------------------------
%!

% 
% numbytes -file- ConsumeFileData -
%
/ConsumeFileData {
  userdict begin
  /DataString 256 string def
  /DataFile exch def
  /BytesToRead exch def

%(BytesToRead = ) print BytesToRead ==
  mark
  {
    DataFile DataString readstring {                    % read bytes
      /BytesToRead BytesToRead 256 sub def              % not EOF subtract 256 from required amount.
%(Read 256 bytes) ==
%(BytesToRead now = ) print BytesToRead ==
    } {
      length 
%(Read ) print dup 256 string cvs print (bytes) ==
      BytesToRead exch sub /BytesToRead exch def % Reached EOF, subtract length read from required amount
%(BytesToRead now = ) print BytesToRead ==
      exit                                              % and exit loop 
    } ifelse
  } loop

%BytesToRead ==
  BytesToRead 0 gt {
    (Ran out of image data reading from DataSource\n) ==
  } if
  cleartomark
  end
} bind def

% 
% numbytes -proc- ConsumeProcData -
%
/ConsumeProcData {
userdict begin
  /DataProc exch def
  /BytesToRead exch def

  {
    DataProc exec                                     % returns a string
    length BytesToRead exch sub                       % subtract # bytes read
    /BytesToRead exch def
    BytesToRead 0 le {
      exit                                            % exit when read enough
    } if
  } loop
end
} bind def

/image {
 (image) ==
 dup type /dicttype eq { 
  dup /MultipleDataSources known {
    dup /MultipleDataSources get {
      (Can't handle image with multiple sources!) ==
    } if
  } if
  dup /Width get                 % stack = -dict- width
  exch dup /BitsPerComponent get % stack = width -dict- bpc
  exch dup /Decode get           % stack = width bpc -dict- decode
  length 2 div                   % decode = 2 * num components
  exch 4 1 roll                  % stack = -dict- width bpc ncomps
  mul mul                        % stack = -dict- width*bpc*ncomps
  7 add cvi 8 idiv               % stack = -dict- width(bytes) 
  exch dup /Height get           % stack = width -dict- height
  exch /DataSource get           % stack = width height DataSource
  3 1 roll                       % stack = DataSource width height
  mul                            % stack = DataSource width(bytes)*height
  exch                           % stack = size DataSource
 } {
  exch pop                  % throw away matrix; stack = w h bps DataSource
  4 1 roll                  % stack = DataSource w h bps
  3 -1 roll mul             % stack = DataSource h w*bps (bits per row)
  7 add cvi 8 idiv          % bytes per row, rows are padded to a byte boundary
  mul exch                  % stack = size DataSource
 } ifelse

 dup type /filetype eq { 
  ConsumeFileData
 } {
   dup type /arraytype eq
   1 index type /packedarraytype eq or {
    ConsumeProcData
   } {
    pop pop                  % Remove DataSource and size
   } ifelse
 } ifelse
} bind def

/imagemask {
(imagemask)==
 dup type /dicttype eq { 
  dup /MultipleDataSources known {
    dup /MultipleDataSources get {
      (Can't handle imagemask with multiple sources!) ==
    } if
  } if
  dup /Width get                 % stack = -dict- width
  7 add cvi 8 idiv             % size in bytes of width floor(bits+7 / 8)
  exch dup /Height get           % stack = width -dict- height
  exch /DataSource get           % stack = width height DataSource
  3 1 roll                       % stack = DataSource width height
  mul                            % stack = DataSource width*height
  exch                           % stack = size DataSource
 } {
  exch pop                  % throw away matrix; stack = w h polarity DataSource
  exch pop                  % throw away polarity; stack = w h DataSource
  3 1 roll                  % stack = DataSource w h
  exch 7 add cvi 8 idiv     % bytes per row at 1 bit per sample
  mul exch                  % stack = size DataSource
 } ifelse

 dup type /filetype eq { 
  ConsumeFileData
 } {
   dup type /arraytype eq
   1 index type /packedarraytype eq or {
    ConsumeProcData
   } {
    pop pop                  % Remove DataSource and size
   } ifelse
 } ifelse
} bind def

/colorimage {
(colorimage)==
  exch {                       % 'multi' is true: one data source per component
    (Can't handle colorimage with multiple sources!) ==
    4 add { pop } repeat       % discard w h bpc m and the ncomp data sources
    0 null                     % placeholder size and DataSource, popped below
  } {
                               % stack: w h bpc m d ncomp
    3 -1 roll pop              % stack: w h bpc d ncomp
    exch 5 1 roll              % stack: d w h bpc ncomp
    mul 3 -1 roll mul          % stack: d h w*bpc*ncomp (bits per row)
    7 add cvi 8 idiv mul exch  % stack: bytes datasource
  } ifelse

 dup type /filetype eq { 
  ConsumeFileData
 } {
   dup type /arraytype eq
   1 index type /packedarraytype eq or {
    ConsumeProcData
   } {
    pop pop                  % Remove DataSource and size
   } ifelse
 } ifelse
} bind def
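
To make the size arithmetic in the program concrete, here is a small worked example. The procedure name `ImageBytes` and the sample dimensions are made up for illustration and are not part of the program above. The only assumption is the usual PostScript rule that each image row is padded to a whole number of bytes.

```postscript
%!
% width height bpc ncomps ImageBytes bytes
% Hypothetical helper, shown only to illustrate the byte count.
/ImageBytes {
  4 -1 roll          % stack: height bpc ncomps width
  mul mul            % stack: height bits-per-row
  7 add 8 idiv       % stack: height bytes-per-row (rounded up)
  mul                % stack: total bytes
} bind def

% 100 x 50 CMYK image, 8 bits per component: 400 bytes per row
100 50 8 4 ImageBytes =     % prints 20000

% 9 x 4 mask-like image, 1 bit per sample: 9 bits round up to 2 bytes per row
9 4 1 1 ImageBytes =        % prints 8
```

The second case shows why the rounding has to happen per row: 36 bits in total would fit in 5 bytes, but the data source actually supplies 8.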

Answer 1: (score: 1)

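The technique works for images of any color, because the image operator is used for both color and monochrome images. The exception is a file that uses the obsolete level 1.5 'colorimage' operator. I don't remember whether I redefined that operator in the example; if not, you can redefine it in a similar way.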

In fact, I see that I provided redefinitions of image, colorimage and imagemask, so all image types should be omitted. Perhaps you could share an example?
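
One way to check which of these operators a given file actually calls is to wrap each one in a procedure that reports its name and then runs the built-in operator. This is a rough diagnostic sketch, not something from the original example. It assumes the file has already been converted to PostScript and that the built-in operators can be fetched from systemdict.

```postscript
%!
% Load ahead of the document. Each wrapper prints the operator name and
% then executes the original from systemdict, so the output is unchanged.
/image      { (image) =      systemdict /image get exec }      bind def
/imagemask  { (imagemask) =  systemdict /imagemask get exec }  bind def
/colorimage { (colorimage) = systemdict /colorimage get exec } bind def
```

If the remaining images show up under a name that has been redefined, the problem is more likely in how the data is consumed than in which operator is overridden.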