How do I make a searchable PDF from a PDF of scanned pages?

Asked: 2015-09-01 11:11:17

Tags: java pdf tesseract

How can I use tesseract in my Java application to make a searchable PDF from a PDF of scanned pages?

2 answers:

Answer 0: (score: 1)

You can use Gnostice XtremeDocumentStudio (for Java): http://www.gnostice.com/nl_article.asp?id=289&t=How_to_convert_scanned_images_to_searchable_PDF_using_OCR_in_Java

DocumentConverter dc = new DocumentConverter();
// Configure the digitizer (OCR) settings on the converter's preferences
DigitizerSettings ds = dc.getPreferences().getDigitizerSettings();
ds.setDigitizationMode(DigitizationMode.ALL_IMAGES);
ds.setRecognizeElementTypes(RecognizeElementTypes.TEXT);

try {
  // Input image path first, output PDF path second
  dc.convertToFile(
    "H:\\Screenshot-2.png",
    "e:\\converted_image.pdf");
} catch (FormatNotSupportedException e) {
  e.printStackTrace();
} catch (ConverterException e) {
  e.printStackTrace();
} catch (XDocException e) {
  e.printStackTrace();
}

Disclaimer: I work for Gnostice.
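
For the tesseract route that the question asks about, one possible approach is to call the tesseract command-line tool from Java, since its `pdf` output config writes a PDF with an invisible text layer over the page image. The sketch below is only an illustration under several assumptions: the `tesseract` executable is installed and on the PATH, the scanned pages have already been exported from the source PDF as image files (tesseract does not read PDF input itself), and the class name `TesseractPdfSketch` and its helper methods are hypothetical names made up for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of any library.
public class TesseractPdfSketch {

    // Builds: tesseract <image> <outputBase> -l <lang> pdf
    // tesseract appends ".pdf" to outputBase by itself.
    static List<String> buildCommand(Path image, Path outputBase, String lang) {
        return Arrays.asList(
            "tesseract",
            image.toString(),
            outputBase.toString(),
            "-l", lang,
            "pdf");
    }

    // Runs tesseract on one page image and returns the path of the PDF it should have written.
    // Assumes the tesseract executable is on the PATH.
    static Path ocrToPdf(Path image, Path outputBase, String lang)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, outputBase, lang))
            .inheritIO()   // forward tesseract's own messages to this process's console
            .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return Paths.get(outputBase.toString() + ".pdf");
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: TesseractPdfSketch <page-image> <output-base> [lang]");
            return;
        }
        String lang = args.length > 2 ? args[2] : "eng";
        Path pdf = ocrToPdf(Paths.get(args[0]), Paths.get(args[1]), lang);
        System.out.println("Wrote " + pdf);
    }
}
```

This handles one page image per call; turning a multi-page scanned PDF into a single searchable PDF would additionally need a step that rasterizes the pages beforehand and, depending on how tesseract is invoked, a step that merges the per-page results, neither of which is shown here.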

Answer 1: (score: 0)

private decimal _cost;
public decimal Cost
{
    get { return _cost; }
    set
    {
        if (_cost != value)
        {
            _cost = value;
            NotifyOfPropertyChange("Cost");

            if (_cost > 0)
            {
                _price = Math.Round(_cost * ((decimal)1.50), 2);
                NotifyOfPropertyChange("Price");
            }

        }
    }
}

private decimal _price;
public decimal Price
{
    get { return _price; }
    set
    {
        if (_price != value)
        {
            _price = value;
            NotifyOfPropertyChange("Price");

            if (_price > 0)
            {
                _cost = Math.Round(_price / (decimal)(1.55), 2);
                NotifyOfPropertyChange("Cost");
            }                    
        }
    }
}