How to extract PDF data into Excel?

Posted: 2017-12-15 15:45:40

Tags: excel pdf data-analysis

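I want to convert PDF data into Excel data. I have already converted the PDF to a text file and removed the unnecessary text from the .txt file, but the data is now laid out in rows, and I want it laid out in columns.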

PDF file: chemistry-chemists.com/chemister/Spravochniki/handbook-of-aqueous-solubility-data-2010.pdf

Current state of the Excel file:

[screenshot not available]

Required state of the Excel file:

[screenshot not available]
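If the cleaned .txt file holds one field per line, the remaining step is to regroup consecutive lines into rows. Below is a minimal C# sketch, assuming every record occupies exactly the same number of non-empty lines (`columns`) in a fixed order; the class and method names (`LinesToColumns`, `Regroup`, `EscapeCsv`, `WriteCsv`) are hypothetical. It writes a CSV file, which Excel opens directly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: turns "one field per line" text into CSV rows.
// ASSUMPTION: each record spans exactly `columns` non-empty lines, in the same order.
public static class LinesToColumns
{
    public static List<string[]> Regroup(IEnumerable<string> lines, int columns)
    {
        if (columns <= 0) throw new ArgumentOutOfRangeException(nameof(columns));

        // Trim each line and drop blank lines left over from the PDF-to-text step.
        List<string> fields = lines.Select(l => l.Trim()).Where(l => l.Length > 0).ToList();

        var rows = new List<string[]>();
        for (int i = 0; i < fields.Count; i += columns)
        {
            var row = new string[columns];
            // If the last record is incomplete, pad the missing cells with "".
            for (int c = 0; c < columns; c++)
                row[c] = i + c < fields.Count ? fields[i + c] : "";
            rows.Add(row);
        }
        return rows;
    }

    // Quote a field if it contains a comma, quote or line break, doubling embedded quotes (RFC 4180 style).
    public static string EscapeCsv(string field) =>
        field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;

    public static void WriteCsv(string inputTxt, string outputCsv, int columns)
    {
        var csvLines = Regroup(File.ReadLines(inputTxt), columns)
            .Select(row => string.Join(",", row.Select(f => EscapeCsv(f))));

        // UTF-8 with a BOM so Excel recognizes non-ASCII characters (e.g. chemical names).
        File.WriteAllLines(outputCsv, csvLines, new UTF8Encoding(true));
    }
}
```

For example, `LinesToColumns.WriteCsv("data.txt", "data.csv", 3);` would produce one CSV row per three non-empty lines (the file names and the value 3 are placeholders). If records do not all span the same number of lines, this fixed-width grouping will misalign columns, and the split rule has to be based on the actual content instead.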

3 answers:

Answer 0 (score: 1)

PDFTables.com specializes in extracting tables from PDFs into Excel. It should be able to do what you want :)

Answer 1 (score: 0)

In ASP.NET, you can use this code:

    <div>
        Upload PDF File: <asp:FileUpload ID="fuPdfUpload" runat="server" />
        <asp:Button ID="btnExportToExcel" Text="Export To Excel" OnClick="ExportToExcel" runat="server" />
    </div>

You must install the iTextSharp package from NuGet.

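    // At the top of the code-behind file:
    using System;
    using System.Text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    // Inside the page class:
    protected void ExportToExcel(object sender, EventArgs e)
    {
        if (this.fuPdfUpload.HasFile)
        {
            // PostedFile.FileName is the file name on the client machine, not a path
            // on the server, so read the uploaded bytes directly instead.
            this.ExportPDFToExcel(this.fuPdfUpload.FileBytes);
        }
    }

    private void ExportPDFToExcel(byte[] pdfBytes)
    {
        StringBuilder text = new StringBuilder();
        PdfReader pdfReader = new PdfReader(pdfBytes);
        try
        {
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                // LocationTextExtractionStrategy orders text by its position on the page.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                // The result is already a .NET string; re-encoding it would corrupt non-ASCII characters.
                text.AppendLine(currentText);
            }
        }
        finally
        {
            pdfReader.Close();
        }

        // Note: this sends plain text under an .xls name, not a real workbook.
        // Excel warns that the format and extension don't match, then loads each line as a row.
        Response.Clear();
        Response.Buffer = true;
        Response.AddHeader("content-disposition", "attachment;filename=ReceiptExport.xls");
        Response.Charset = "";
        Response.ContentType = "application/vnd.ms-excel";
        Response.Write(text.ToString());
        Response.Flush();
        Response.End();
    }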

Answer 2 (score: 0)

Take a look at Tabula, a very effective tool for extracting tables from PDFs (it exports them as CSV, which Excel can open): https://github.com/tabulapdf/tabula
