Question

I am trying to extract all the text from PPT files and separate it using paragraphs and line breaks.

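import java.io.BufferedReader;
import java.io.StringReader;
import java.text.Normalizer;
import org.apache.poi.hslf.extractor.PowerPointExtractor;
import org.apache.poi.poifs.filesystem.NPOIFSFileSystem;

// inputStream is the InputStream of the .ppt file
NPOIFSFileSystem poifs = new NPOIFSFileSystem(inputStream);
PowerPointExtractor extractor = new PowerPointExtractor(poifs);
StringBuilder sb = new StringBuilder();
// Read the extracted text line by line (was extractor2, which is never declared)
BufferedReader bufReader = new BufferedReader(new StringReader(extractor.getText()));

String line = null;
while ((line = bufReader.readLine()) != null) {
    // Skip lines that are empty or only a couple of characters long
    if (line.trim().length() > 2) {
        line = line.replaceAll("  ", "<br />");   // double space becomes a line break
        line = line.replaceAll("\\s+", " ");      // collapse remaining whitespace runs
        line = Normalizer.normalize(line, Normalizer.Form.NFD);
        sb.append("<p>").append(line).append("</p>\r\n");
    }
}
System.out.println(sb.toString());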

However, suppose a particular slide contains a table with multiple cells, like this:

With the code above, the output looks like this:

Is there a way to walk through each slide properly and then extract and separate the text by container?

Example

<p>IBM IBM IBM MS IBM MS</p> 

will become 

<p>IBM</p><p>IBM</p><p>IBM</p><p>MS</p><p>IBM</p><p>MS</p>
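
To make the desired result concrete, here is a minimal sketch of the formatting step only. It assumes the text of each container (for example, each table cell) has already been collected into a list; the POI traversal that would fill that list is not shown, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: formats already-separated container texts as HTML paragraphs.
final class ContainerParagraphs {

    // Wraps each non-blank container text in its own <p> element.
    static String toParagraphs(List<String> containerTexts) {
        StringBuilder html = new StringBuilder();
        for (String text : containerTexts) {
            if (text == null) {
                continue; // nothing to emit for a missing container
            }
            // Collapse whitespace runs inside one container to a single space.
            String cleaned = text.trim().replaceAll("\\s+", " ");
            if (!cleaned.isEmpty()) {
                html.append("<p>").append(cleaned).append("</p>");
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // Assumed input: one entry per table cell, in reading order.
        List<String> cells = Arrays.asList("IBM", "IBM", "IBM", "MS", "IBM", "MS");
        System.out.println(toParagraphs(cells));
    }
}
```

With one list entry per cell, each value ends up in its own paragraph instead of being joined into a single line.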

Extracting and separating text from PPT slides using Apache POI

0 answers: