Question

Summary: How can I reduce the time it takes to convert TIFs to PDF using iTextSharp?

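Background: I'm using C# and iTextSharp to convert some fairly large TIFs to PDF, and the performance is extremely bad. The TIF files are about 50 KB each, and some documents have up to 150 separate TIF files (each one representing a page). For one 132-page document (~6500 KB), the conversion took about 13 minutes. During the conversion, the single-CPU server it runs on sat at 100%, which leads me to believe the process is CPU bound. The output PDF file was 3.5 MB. I'm fine with the size, but the time seems a bit high to me.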

Code:

private void CombineAndConvertTif(IList<FileInfo> inputFiles, FileInfo outputFile)
{
    using (FileStream fs = new FileStream(outputFile.FullName, FileMode.Create, FileAccess.ReadWrite, FileShare.None))
    {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);
        PdfWriter writer = PdfWriter.GetInstance(document, fs);
        document.Open();
        PdfContentByte cb = writer.DirectContent;

        foreach (FileInfo inputFile in inputFiles)
        {
            using (Bitmap bm = new Bitmap(inputFile.FullName))
            {
                int total = bm.GetFrameCount(FrameDimension.Page);

                for (int k = 0; k < total; ++k)
                {
                    bm.SelectActiveFrame(FrameDimension.Page, k);
                    //Testing shows that this line takes the lion's share (80%) of the time involved.
                    iTextSharp.text.Image img =
                        iTextSharp.text.Image.GetInstance(bm, null, true);
                    img.ScalePercent(72f / 200f * 100);
                    img.SetAbsolutePosition(0, 0);

                    cb.AddImage(img);
                    document.NewPage();
                }
            }
        }

        document.Close();
        writer.Close();
    }

}

Answer 1

Change the arguments of the GetInstance call to

GetInstance(bm, ImageFormat.Tiff)

This may improve performance:

iTextSharp.text.Image img =  iTextSharp.text.Image.GetInstance(bm, ImageFormat.Tiff);

Answer 2

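I'm not sure what was available when this question was originally posted, but iText 5.x appears to offer more for converting TIFF to PDF. There is also a basic code sample in iText in Action 2nd Edition, "part3.chapter10.PagedImages", and I haven't noticed any performance problems with it. However, the sample doesn't handle scaling well, so I changed it: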

public static void AddTiff(Document pdfDocument, Rectangle pdfPageSize, String tiffPath)
{
    RandomAccessFileOrArray ra = new RandomAccessFileOrArray(tiffPath);
    int pageCount = TiffImage.GetNumberOfPages(ra);

    for (int i = 1; i <= pageCount; i++) 
    {
        Image img = TiffImage.GetTiffImage(ra, i);

        if (img.ScaledWidth > pdfPageSize.Width || img.ScaledHeight > pdfPageSize.Height)
        {
            if (img.DpiX != 0 && img.DpiY != 0 && img.DpiX != img.DpiY)
            {
                img.ScalePercent(100f);
                float percentX = (pdfPageSize.Width * 100) / img.ScaledWidth;
                float percentY = (pdfPageSize.Height * 100) / img.ScaledHeight;

                img.ScalePercent(percentX, percentY);
                img.WidthPercentage = 0;
            }
            else
            {
                img.ScaleToFit(pdfPageSize.Width, pdfPageSize.Height);
            }
        }

        Rectangle pageRect = new Rectangle(0, 0, img.ScaledWidth, img.ScaledHeight);

        pdfDocument.SetPageSize(pageRect);
        pdfDocument.SetMargins(0, 0, 0, 0);
        pdfDocument.NewPage();
        pdfDocument.Add(img);
    }
}

Answer 3

The problem is the time iTextSharp takes to finish processing your System.Drawing.Image object.

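To bring the time down to 10 seconds in some of my tests, I had to save the selected frame to a memory stream and then pass the resulting byte array directly to iTextSharp's GetInstance method, as shown below: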

bm.SelectActiveFrame(FrameDimension.Page, k);

iTextSharp.text.Image img;
using(System.IO.MemoryStream mem = new System.IO.MemoryStream())
{
    // Saving the frame as PNG bypasses the built-in processing that iTextSharp
    // would otherwise perform on the bitmap. The resulting PDF will be larger, though.
    bm.Save(mem, System.Drawing.Imaging.ImageFormat.Png);
    img = iTextSharp.text.Image.GetInstance(mem.ToArray());
}

img.ScalePercent(72f / 200f * 100);

Answer 4

You're crunching a lot of data, so if the PDF export process is slow and you're not using a fast PC, you may be stuck with this kind of performance.

The most obvious way to speed this up on a multi-core system is to multi-thread it.

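Break the code into two phases. In the first, a set of images is converted and stored in a list; in the second, the list is written out to the PDF. With the file sizes you're talking about, holding the entire document in memory during processing shouldn't be a problem.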

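You can then make the first phase multi-threaded: start a thread-pool thread for each image that needs converting, capping the number of active threads (one per CPU core is enough; more won't gain you much). An alternative is to split your input list into n lists (again, one list per CPU core) and fire off threads that each process only their own list. This reduces the threading overhead, but some threads may finish long before others (if their workload turns out to be much lighter), so it may not always finish as quickly.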
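As a rough illustration of that first phase, the sketch below runs a conversion delegate over the inputs with at most one worker per core and keeps the results in input order. It also shows the alternative of dealing the inputs into n lists. The names `TwoPhaseConversion`, `ConvertAll`, and `SplitInto` are made up for this example, and `convert` stands in for whatever per-page work you do (for instance, encoding one frame to a byte array). It assumes each call to `convert` opens its own Bitmap, since GDI+ objects shouldn't be shared between threads, and that the PDF itself is still written on a single thread afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TwoPhaseConversion
{
    // Phase 1: apply `convert` to every input, using at most one worker per core.
    // Each result is stored at its input's index, so output order matches input order.
    public static TOut[] ConvertAll<TIn, TOut>(IList<TIn> inputs, Func<TIn, TOut> convert)
    {
        var results = new TOut[inputs.Count];
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.For(0, inputs.Count, options, i =>
        {
            results[i] = convert(inputs[i]);
        });

        return results;
    }

    // Alternative: deal the inputs round-robin into n lists, one per worker thread.
    public static List<List<T>> SplitInto<T>(IList<T> inputs, int n)
    {
        var parts = new List<List<T>>();
        for (int p = 0; p < n; p++)
        {
            parts.Add(new List<T>());
        }
        for (int i = 0; i < inputs.Count; i++)
        {
            parts[i % n].Add(inputs[i]);
        }
        return parts;
    }
}
```

The second phase would then loop over the returned array on one thread and add each converted page to the document.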

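Splitting the work into two passes may also improve performance even without multithreading: doing all the input processing and then all the output processing as separate stages will probably reduce the disk seeking involved (depending on how much RAM your PC has available for disk caching).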

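Note that if you only have a single-core CPU, multithreading won't be of much use (you could still see gains in the parts of the process that are I/O bound, but it sounds like you're mainly CPU bound).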

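You could also experiment with resizing the bitmaps using something other than iTextSharp calls. I don't know anything about iTextSharp, but its image-conversion code may be slow, or may not use graphics hardware in the way other scaling techniques might. There may also be scaling options you can set to trade quality against speed.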

Answer 5

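I ran into this same problem. I ended up using Adobe Acrobat's batch-processing feature, which worked well. I just set up a new batch process that converts all the TIFFs in a target folder into PDFs written to a destination folder, and started it. It was easy to set up, but processing took longer than I would have liked. It did get the job done.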

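Unfortunately, Adobe Acrobat isn't free, but you should consider it (weigh the cost of the time spent developing a 'free' solution against the cost of the software).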

Answer 6

//Testing shows that this line takes the lion's share (80%) of the time involved.
iTextSharp.text.Image img =
  iTextSharp.text.Image.GetInstance(bm, null, true);

Possibly a silly suggestion (I don't have a large test set here to try it locally right now), but give me the benefit of the doubt:

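You're looping through a multi-page TIFF here, selecting frame after frame. bm is this (huge, 6.5 MB) image, in memory. I don't know much about iTextSharp's internal image handling, but maybe you can help it here by providing a single-page image? Could you try creating a new Bitmap of the desired size, drawing bm onto it (check the Graphics object's options for speed-related properties, e.g. InterpolationMode), and passing in this single image instead of the huge one on every call?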
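A minimal sketch of that idea, assuming the target width and height are already known: it draws the currently selected frame onto a fresh single-page Bitmap with the Graphics settings turned toward speed. The `FrameCopy` class and `CopyActiveFrame` method are names invented for this example, and it is untested against the original workload, so whether it actually helps iTextSharp is an open question. Note that a Bitmap created this way is 32-bit colour, which may make the PDF larger for black-and-white scans.

```csharp
using System.Drawing;
using System.Drawing.Drawing2D;

public static class FrameCopy
{
    // Draws the active frame of `source` onto a new single-page bitmap of the
    // requested size, favouring speed over quality. The caller disposes the result.
    public static Bitmap CopyActiveFrame(Image source, int width, int height)
    {
        var page = new Bitmap(width, height);
        using (Graphics g = Graphics.FromImage(page))
        {
            // Speed-oriented settings; raise these if the output looks too rough.
            g.InterpolationMode = InterpolationMode.NearestNeighbor;
            g.PixelOffsetMode = PixelOffsetMode.HighSpeed;
            g.CompositingQuality = CompositingQuality.HighSpeed;
            g.SmoothingMode = SmoothingMode.HighSpeed;

            g.DrawImage(source, 0, 0, width, height);
        }
        return page;
    }
}
```

Inside the question's loop this would be called right after `bm.SelectActiveFrame(FrameDimension.Page, k)`, and the returned bitmap passed to `GetInstance` in place of `bm`.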

Answer 7

Based on your sample, I wrote a function with a simple enum that lets you choose the working mode. Here it is:

private static void CombineAndConvertTif(FileInfo inputFile, FileInfo outputFile)
    {
        Encoder myEncoder = Encoder.Quality;
        EncoderParameters myEncoderParameters = new EncoderParameters(1);
        EncoderParameter myEncoderParameter = new EncoderParameter(myEncoder, 50L);
        myEncoderParameters.Param[0] = myEncoderParameter;
        ImageCodecInfo jgpEncoder = GetEncoder(ImageFormat.Jpeg);

        Console.Write("Converting {0} to {1}... ", inputFile.Name, outputFile.Name);
        Stopwatch sw = Stopwatch.StartNew();

        using (
            FileStream fs = new FileStream(
                outputFile.FullName, FileMode.Create, FileAccess.ReadWrite, FileShare.None))
        {
            Document document = new Document(PageSize.A4, 50, 50, 50, 50);

            PdfWriter writer = PdfWriter.GetInstance(document, fs);

            // iText compression levels run from 0 (none) to 9 (best).
            writer.CompressionLevel = PdfStream.BEST_COMPRESSION;
            writer.SetFullCompression();

            document.Open();
            PdfContentByte cb = writer.DirectContent;

            using (Bitmap bm = new Bitmap(inputFile.FullName))
            {
                int pages = bm.GetFrameCount(FrameDimension.Page);

                for (int currentPage = 0; currentPage < pages; ++currentPage)
                {
                    bm.SelectActiveFrame(FrameDimension.Page, currentPage);
                    bm.SetResolution(96, 96);

                    // Fully qualified to avoid a clash with System.Drawing.Image.
                    iTextSharp.text.Image img;
                    if (QualityMode == QualityMode.Slow)
                    {
                        #region Low speed, smaller files
                        img = iTextSharp.text.Image.GetInstance(bm, null, true);
                        #endregion
                    }
                    else
                    {
                        #region Fast speed, bigger files
                        using (MemoryStream mem = new MemoryStream())
                        {
                            bm.Save(mem, jgpEncoder, myEncoderParameters);
                            img = iTextSharp.text.Image.GetInstance(mem.ToArray());
                        }
                        #endregion
                    }

                    img.ScalePercent(72f / 200f * 100);
                    img.SetAbsolutePosition(0, 0);

                    cb.AddImage(img);
                    document.NewPage();
                }
            }

            document.Close();
            writer.Close();
        }

        sw.Stop();
        Console.WriteLine(" time: {0}", sw.Elapsed);
    }

The enum is:

internal enum QualityMode
{
    /// <summary>
    /// Process images quickly but
    /// produces bigger PDFs
    /// </summary>
    Fast,
    /// <summary>
    /// Process images slower but
    /// produces smaller PDFs
    /// </summary>
    Slow
}
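
The function above calls a `GetEncoder` helper that isn't included in the answer. Here is a minimal sketch of what such a helper might look like, assuming it simply looks up the installed GDI+ encoder for a given image format (the `EncoderLookup` class name is invented here). The `QualityMode` value the function tests is likewise not shown; presumably it is a static field or property of the enum type.

```csharp
using System.Drawing.Imaging;

public static class EncoderLookup
{
    // Returns the installed encoder whose format matches `format`,
    // or null if no such encoder is available.
    public static ImageCodecInfo GetEncoder(ImageFormat format)
    {
        foreach (ImageCodecInfo codec in ImageCodecInfo.GetImageEncoders())
        {
            if (codec.FormatID == format.Guid)
            {
                return codec;
            }
        }
        return null;
    }
}
```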

Poor performance converting TIF to PDF using iTextSharp

7 answers: