How do I read a doc file using POI?

Date: 2016-01-21 13:05:52

Tags: java ms-word apache-poi docx

I am trying to view a Word file in an editor pane. I tried these lines:

import java.awt.Dimension;
import java.awt.GridLayout;
import java.io.File;
import java.io.FileInputStream;
import javax.swing.JEditorPane;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public class editorpane extends JEditorPane
{
public editorpane(File file)
{

    try
    {
        FileInputStream fis = new FileInputStream(file.getAbsolutePath());
        HWPFDocument hwpfd = new HWPFDocument(fis);
        WordExtractor we = new WordExtractor(hwpfd);
        String[] array = we.getParagraphText();
        for (int i = 0; i < array.length; i++)
        {
            this.setPage(array[i]);
        }

    } catch (Exception e)
    {
        e.printStackTrace();
    }
}
}

But it gives me:

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
at frame1.editorpane.<init>(editorpane.java:24)

at this line:

HWPFDocument hwpfd = new HWPFDocument(fis);

How can I fix this?

Also, I am not sure about these lines:

for (int i = 0; i < array.length; i++)
        {
            this.setPage(array[i]);
        }

Are they correct?

1 Answer:

Answer 0 (score: 2)

You are trying to open a .docx file (XWPF) with code written for .doc (HWPF) files. For .docx files you can use XWPFWordExtractor.
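The stack trace shows the exception being raised while POI reads the file header: a .doc file is an OLE2 compound document, whereas a .docx file is a ZIP package. As a rough illustration of that difference, here is a minimal sketch that inspects the leading bytes of a file. `WordFormatSniffer`, `Format`, and `sniff` are hypothetical names made up for this example and are not part of POI, and a ZIP signature only suggests an OOXML file rather than proving it.

```java
import java.io.IOException;
import java.io.InputStream;

/** Illustrative helper (not part of POI): guesses the container format from the first bytes. */
final class WordFormatSniffer {

    enum Format { OLE2_DOC, OOXML_ZIP, UNKNOWN }

    // Signature of an OLE2 compound document (legacy .doc, the format HWPF reads).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4" (.docx is a ZIP package, the format XWPF reads).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a buffer holding the first bytes of a file. */
    static Format sniff(byte[] head) {
        if (startsWith(head, head.length, OLE2_MAGIC)) {
            return Format.OLE2_DOC;
        }
        if (startsWith(head, head.length, ZIP_MAGIC)) {
            return Format.OOXML_ZIP;
        }
        return Format.UNKNOWN;
    }

    /** Reads up to eight bytes from the stream and classifies them. */
    static Format sniff(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int filled = 0;
        while (filled < head.length) {
            int n = in.read(head, filled, head.length - filled);
            if (n < 0) {
                break; // stream ended before eight bytes were available
            }
            filled += n;
        }
        byte[] seen = new byte[filled];
        System.arraycopy(head, 0, seen, 0, filled);
        return sniff(seen);
    }

    private static boolean startsWith(byte[] data, int length, byte[] prefix) {
        if (length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller could use the result to choose between HWPFDocument and XWPFDocument. Sniffing consumes bytes from the stream, so the file would need to be reopened before handing it to POI.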

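Alternatively, you can use ExtractorFactory to let POI decide which format applies and open the file with the correct class. In that case you cannot iterate by page, because only the generic getText() method is available.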

Use it like this:

// ExtractorFactory picks the matching extractor based on the file's actual format.
POITextExtractor extractor = ExtractorFactory.createExtractor(file);
String text = extractor.getText();
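On the second part of the question: JEditorPane.setPage(String) treats its argument as a URL to load, not as text to display, and each call replaces whatever the pane showed before, so the loop in the question would not show the paragraphs. A minimal sketch of one alternative, assuming plain-text display is enough, joins the paragraphs and calls setText once. `ParagraphViewer`, `joinParagraphs`, and `show` are hypothetical names used only for illustration.

```java
import javax.swing.JEditorPane;

/** Illustrative helper: shows extracted paragraphs as plain text in a JEditorPane. */
final class ParagraphViewer {

    /** Joins paragraphs into one string, making sure each one ends with a line break. */
    static String joinParagraphs(String[] paragraphs) {
        StringBuilder sb = new StringBuilder();
        for (String p : paragraphs) {
            if (p == null) {
                continue; // skip missing entries rather than printing "null"
            }
            sb.append(p);
            if (!p.endsWith("\n")) {
                sb.append('\n');
            }
        }
        return sb.toString();
    }

    /** Replaces the pane's content with the joined paragraphs. */
    static void show(JEditorPane pane, String[] paragraphs) {
        pane.setContentType("text/plain");
        pane.setText(joinParagraphs(paragraphs));
        pane.setEditable(false);
    }
}
```

Inside the constructor from the question, this could be called as `ParagraphViewer.show(this, array);` in place of the loop. With the ExtractorFactory approach, the single string returned by getText() can be passed to setText directly.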