XWPFTable does not recognize the table in a Word document

Posted: 2017-09-11 11:42:00

Tags: java apache-poi xwpf

I used ABBYY FineReader to convert a PDF document into a Word document. XWPFTable (Apache POI) does not recognize the table in the resulting Word document.

The table looks like this:

Heading1        Heading2        Heading3   Heading4
Sub-heading1    Sub-heading2
2011            36.66           ABC        24,000 C
2012            46.90           ABC        78,000 C
                                ABC        90,000 D

Here is my code:

import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.BodyElementType;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

public class TableExtraction {
  public static void main(String[] args) {
    // try-with-resources closes both the stream and the document
    try (FileInputStream fis = new FileInputStream("<path to docx file>");
         XWPFDocument xdoc = new XWPFDocument(fis)) {
      // Walk the top-level body elements (paragraphs, tables, content controls) in document order
      for (IBodyElement element : xdoc.getBodyElements()) {
        if (element.getElementType() == BodyElementType.TABLE) {
          // The element itself is the table; element.getBody().getTables() would return
          // every table in the body each time a single table is encountered
          XWPFTable table = (XWPFTable) element;
          System.out.println("Table Data");
          System.out.println("Total Number of Rows of Table:" + table.getNumberOfRows());
          System.out.println(table.getText());
        } else {
          System.out.println("Not a Table Data");
        }
      }
    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }
}

Output:

Not a Table Data
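The output only says that none of the top-level body elements is a table. To see what the converter wrote there instead, one option is to look at the raw XML without POI. The following is a minimal, stdlib-only diagnostic sketch, not part of the original question; it assumes the main document part is stored at word/document.xml (the usual location), and the class name BodyElementLister is hypothetical. It returns the local names of the direct children of w:body, such as p (paragraph), tbl (table) or sectPr (section properties).

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class BodyElementLister {
    // Main WordprocessingML namespace (the "w:" prefix in document.xml)
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Local names of the direct children of w:body, in document order. */
    public static List<String> topLevelElements(String docxPath) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            // Assumption: the main document part is stored at word/document.xml
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalArgumentException("word/document.xml not found in " + docxPath);
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc;
            try (InputStream in = zip.getInputStream(entry)) {
                doc = factory.newDocumentBuilder().parse(in);
            }
            List<String> names = new ArrayList<>();
            Node body = doc.getElementsByTagNameNS(W_NS, "body").item(0);
            if (body == null) {
                return names;
            }
            for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(n.getLocalName()); // e.g. "p", "tbl", "sdt", "sectPr"
                }
            }
            return names;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(topLevelElements(args[0]));
    }
}
```

If the list contains no tbl where the table should be, getBodyElements() has no table element to return either, so the problem is in the file rather than in the loop.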

1 Answer

Answer 0 (score: 0)

I tried your code on a Word table of my own and it didn't work. Assuming it is a regular Word table, you can iterate over the tables directly, like this:

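import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class TableExtraction {
    // Placeholder: path to the converted .docx file
    private static final String FILE_NAME = "<path to docx file>";

    public static void main(String[] args) throws IOException {
        try (FileInputStream fis = new FileInputStream(FILE_NAME);
             XWPFDocument xdoc = new XWPFDocument(fis)) {
            // getTables() returns the tables at the top level of the document body
            for (XWPFTable table : xdoc.getTables()) {
                System.out.println(table.getRows().size());

                // in case you want to do more with the table cells...
                for (XWPFTableRow row : table.getRows()) {
                    for (XWPFTableCell cell : row.getTableCells()) {
                        for (XWPFParagraph para : cell.getParagraphs()) {
                            System.out.println(para.getText());
                        }
                    }
                }
            }
        }
    }
}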

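If this doesn't work either, the problem most likely comes from the PDF conversion: the converted document may not contain a real Word table (a w:tbl element) at all.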
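To check whether the conversion produced a Word table at all, here is a further minimal, stdlib-only sketch (same word/document.xml assumption; the class name RawTableCheck is hypothetical). It collects every w:tbl element anywhere in the document, including tables nested in cells or placed inside text boxes, which xdoc.getTables() does not return directly, and returns each one as rows of cell text. Cell text is just the concatenated w:t runs, so tabs, line breaks and field results are ignored, and a table inside a text box may be counted twice because Word can store a text box both as DrawingML and as a VML fallback.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class RawTableCheck {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Every w:tbl in word/document.xml, each as a list of rows of cell texts. */
    public static List<List<List<String>>> tables(String docxPath) throws Exception {
        Document doc;
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml"); // assumed part name
            if (entry == null) {
                throw new IllegalArgumentException("word/document.xml not found in " + docxPath);
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            try (InputStream in = zip.getInputStream(entry)) {
                doc = factory.newDocumentBuilder().parse(in);
            }
        }
        List<List<List<String>>> result = new ArrayList<>();
        // Searches all descendants, so nested tables and tables in text boxes are included
        NodeList tbls = doc.getElementsByTagNameNS(W_NS, "tbl");
        for (int i = 0; i < tbls.getLength(); i++) {
            List<List<String>> rows = new ArrayList<>();
            for (Element tr : children((Element) tbls.item(i), "tr")) {
                List<String> cells = new ArrayList<>();
                for (Element tc : children(tr, "tc")) {
                    cells.add(text(tc));
                }
                rows.add(cells);
            }
            result.add(rows);
        }
        return result;
    }

    // Direct w:<localName> children only, so rows of a nested table are not mixed in
    private static List<Element> children(Element parent, String localName) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && W_NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                out.add((Element) n);
            }
        }
        return out;
    }

    // Concatenated text of all w:t elements below the given element
    private static String text(Element e) {
        StringBuilder sb = new StringBuilder();
        NodeList ts = e.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < ts.getLength(); i++) {
            sb.append(ts.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        List<List<List<String>>> tables = tables(args[0]);
        System.out.println("w:tbl elements found: " + tables.size());
        for (List<List<String>> table : tables) {
            for (List<String> row : table) {
                System.out.println(String.join(" | ", row));
            }
            System.out.println();
        }
    }
}
```

If this reports 0 tables for the converted file, the converter most likely laid the data out with something other than a Word table (for example tab-aligned paragraphs), and no XWPFTable-based code will find it; re-exporting with a table-preserving setting or parsing the paragraphs would be the way forward.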