Converting a scanned PDF to an image

Time: 2016-11-30 17:54:51

Tags: c# pdf itext tesseract

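I can use Tesseract to scan JPG images, and I can use iTextSharp to read regular PDFs and get the text out of them. But I can't find a way to get text from a scanned PDF with a .PDF extension, or to convert the PDF into images so that I can scan them with Tesseract. Do I have any options? Thanks!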

1 answer:

Answer 0: (score: 0)

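Assuming you have a scanned PDF document, and assuming it contains only text, you can use the following method to generate an image from text (the method takes the text as a string):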

using System;
using System.Drawing;

private Image DrawText(String text, Font font, Color textColor, Color backColor)
{
    SizeF textSize;

    //first, create a dummy bitmap just to get a graphics object,
    //then measure the string to see how big the image needs to be
    using (Image dummy = new Bitmap(1, 1))
    using (Graphics measuring = Graphics.FromImage(dummy))
    {
        textSize = measuring.MeasureString(text, font);
    }

    //create a new image of the right size; round up so the text is not
    //clipped, and keep at least 1x1 because Bitmap rejects zero dimensions
    int width = Math.Max(1, (int)Math.Ceiling(textSize.Width));
    int height = Math.Max(1, (int)Math.Ceiling(textSize.Height));
    Image img = new Bitmap(width, height);

    using (Graphics drawing = Graphics.FromImage(img))
    using (Brush textBrush = new SolidBrush(textColor))
    {
        //paint the background
        drawing.Clear(backColor);

        //draw the text starting at the top-left corner
        drawing.DrawString(text, font, textBrush, 0, 0);
    }

    //the caller owns the returned image and should dispose it
    return img;
}
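
As a usage sketch, the method above could be called like this to write a PNG file, which is the kind of image Tesseract can then scan. The class name `DrawTextDemo`, the sample text, the font, and the output path are all assumptions made for illustration, and `DrawText` is repeated here in condensed, static form only so the example compiles on its own. It also assumes `System.Drawing` is available, which on recent .NET versions generally means running on Windows with the System.Drawing.Common package.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public static class DrawTextDemo
{
    // Condensed stand-in for the answer's DrawText method
    // (made public static here so Main can call it directly).
    public static Image DrawText(string text, Font font, Color textColor, Color backColor)
    {
        SizeF size;
        using (var probe = new Bitmap(1, 1))
        using (var g = Graphics.FromImage(probe))
        {
            size = g.MeasureString(text, font);
        }

        var img = new Bitmap(Math.Max(1, (int)Math.Ceiling(size.Width)),
                             Math.Max(1, (int)Math.Ceiling(size.Height)));
        using (var g = Graphics.FromImage(img))
        using (var brush = new SolidBrush(textColor))
        {
            g.Clear(backColor);
            g.DrawString(text, font, brush, 0, 0);
        }
        return img;
    }

    public static void Main()
    {
        // Hypothetical output path and sample text, chosen only for this demo.
        string path = Path.Combine(Path.GetTempPath(), "drawtext-demo.png");

        using (var font = new Font("Arial", 24))
        using (Image img = DrawText("Hello, OCR", font, Color.Black, Color.White))
        {
            // Save as PNG (lossless), so the rendered text keeps sharp edges.
            img.Save(path, ImageFormat.Png);
            Console.WriteLine("Saved {0}x{1} image to {2}", img.Width, img.Height, path);
        }
    }
}
```

Black text on a white background is used here because high contrast tends to suit OCR; both the font and the image are disposed once the file has been written.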

Reference: How to generate an image from text on fly at runtime