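How do I convert .docx to .txt, or get the text from a .docx file, in Android?

Time: 2019-07-18 04:25:17

Tags: android apache apache-tika

I am using the OneDrive SDK to download a .docx file from OneDrive. The download succeeds, but I need to convert the file to .txt format and have not been able to.

Does anyone know how to convert a .docx file, or extract its text, in Android?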

I can get the InputStream of the .docx file.

Here is the code that downloads the file from OneDrive:

// Request the file content from OneDrive as a stream
InputStream inputStream = iOneDriveClient.getDrive().getItems(fileID).getContent().buildRequest().get();
// Copy the stream to a local file at mPath, 1 KB at a time
OutputStream out = new FileOutputStream(mPath);
int read;
byte[] bytes = new byte[1024];
while ((read = inputStream.read(bytes)) != -1) {
    out.write(bytes, 0, read);
}
out.flush();
out.close();
inputStream.close();

This code is already in …
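The copy loop above leaves both streams open if a read or write throws. A minimal sketch of the same loop with try-with-resources, which closes them in every case, might look like this. The class and method names (StreamCopySketch, copy, saveTo) are made up for illustration, and the OneDrive request itself is unchanged and not shown.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of the OneDrive SDK.
final class StreamCopySketch {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    /** Saves source to the file at path; both streams are closed even if the copy fails. */
    static long saveTo(InputStream source, String path) throws IOException {
        try (InputStream in = source;
             OutputStream out = new FileOutputStream(path)) {
            return copy(in, out);
        }
    }
}
```

With the names from the question, the call would be StreamCopySketch.saveTo(inputStream, mPath).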

Edit

I added the Apache POI library, but the project does not compile.

I get conflicts in many files.

Here is my doInBackground

build.gradle

The conflict error is:

Duplicate class com.fasterxml.jackson.core.Base64Variant found in modules docx4j-6.1.1-SNAPSHOT-shaded.jar (docx4j-6.1.1-SNAPSHOT-shaded.jar) and jackson-core-2.9.6.jar (com.fasterxml.jackson.core:jackson-core:2.9.6)

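1 Answer:

Answer 0 (score: 0)

You can use Apache POI.

From the documentation:

For .doc files from Word 97 - Word 2003, in scratchpad you will find org.apache.poi.hwpf.extractor.WordExtractor, which will return text for your document.

Here is an example from the docs on Google: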

FileInputStream fis = new FileInputStream(inputFile);
POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
// Firstly, get an extractor for the Workbook
POIOLE2TextExtractor oleTextExtractor = 
   ExtractorFactory.createExtractor(fileSystem);
// Then a List of extractors for any embedded Excel, Word, PowerPoint
// or Visio objects embedded into it.
POITextExtractor[] embeddedExtractors =
   ExtractorFactory.getEmbededDocsTextExtractors(oleTextExtractor);
for (POITextExtractor textExtractor : embeddedExtractors) {
   // If the embedded object was an Excel spreadsheet.
   if (textExtractor instanceof ExcelExtractor) {
      ExcelExtractor excelExtractor = (ExcelExtractor) textExtractor;
      System.out.println(excelExtractor.getText());
   }
   // A Word Document
   else if (textExtractor instanceof WordExtractor) {
      WordExtractor wordExtractor = (WordExtractor) textExtractor;
      String[] paragraphText = wordExtractor.getParagraphText();
      for (String paragraph : paragraphText) {
         System.out.println(paragraph);
      }
      // Display the document's header and footer text
      System.out.println("Footer text: " + wordExtractor.getFooterText());
      System.out.println("Header text: " + wordExtractor.getHeaderText());
   }
   // PowerPoint Presentation.
   else if (textExtractor instanceof PowerPointExtractor) {
      PowerPointExtractor powerPointExtractor =
         (PowerPointExtractor) textExtractor;
      System.out.println("Text: " + powerPointExtractor.getText());
      System.out.println("Notes: " + powerPointExtractor.getNotes());
   }

}
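The example above targets legacy .doc files, while the question is about .docx. A .docx file is a ZIP archive whose main text lives in word/document.xml, so one possible approach is to read that entry directly. The following is a minimal sketch of that idea, not the Apache POI API: it uses only java.util.zip and the standard DOM parser, so it adds no jar that could clash with existing dependencies. The class name DocxTextSketch and its methods are made up for illustration, and the sketch assumes the common (transitional) WordprocessingML namespace. It ignores headers, footers, footnotes, tabs and line breaks.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper: a dependency-free alternative, not Apache POI.
final class DocxTextSketch {

    // Assumption: the document uses the transitional WordprocessingML namespace.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one line per paragraph. */
    static String extractText(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                    factory.setNamespaceAware(true);
                    // The zip stream reports end-of-stream at the end of this entry.
                    Document xml = factory.newDocumentBuilder().parse(zip);
                    StringBuilder out = new StringBuilder();
                    collect(xml, out);
                    return out.toString();
                }
            }
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Could not parse word/document.xml", e);
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    /** Depth-first walk: appends the text of w:t elements and a newline after each w:p. */
    static void collect(Node node, StringBuilder out) {
        boolean isWordElement = W_NS.equals(node.getNamespaceURI());
        String name = node.getLocalName();
        if (isWordElement && "t".equals(name)) {
            out.append(node.getTextContent());
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
        if (isWordElement && "p".equals(name)) {
            out.append('\n');
        }
    }
}
```

In the asker's setup this could be called from doInBackground as String text = DocxTextSketch.extractText(inputStream), and the returned string written to a .txt file. If Apache POI is available, its counterpart for .docx is org.apache.poi.xwpf.extractor.XWPFWordExtractor.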